# Tracking Article Processing Charges (APCs) for a given institution

In this notebook, we will query the OpenAlex API to answer the following questions:  

1. **How much are researchers at my institution paying in APCs?**
2. **Which journals/publishers are collecting the most APCs from researchers at my institution?**
3. **How much money are my organization’s researchers saving in discounted APC charges from our transformative/read-publish agreements?**

Most organizations do not have an effective way of tracking the APCs that their researchers pay to publish in open access journals.  By estimating how much money is going to APCs each year, and which publishers are collecting the most APCs, libraries can make more informed decisions around the details of the read-publish agreements they have with various publishers.  

### APC-able Works

Before starting this analysis, it is important to define which types of works are subject to APCs and which are not.   

While a work may include contributions from a number of different institutions, the APC is typically the responsibility of the work's *corresponding author*.  

In addition, works with a Gold or Hybrid OA status (open access articles in fully OA or hybrid subscription journals) are subject to APCs, while Green OA (repository copies), Diamond OA (no-fee journals), and Bronze OA works are not.  

Finally, APCs are not typically charged for editorial content submitted to an open access journal.  

Thus, for the purposes of this notebook, *APC-able works* must have the following characteristics:
- Original articles or reviews
- Published in a Gold or Hybrid OA journal
- Corresponding author is a researcher at our institution.
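These three criteria can be expressed as a single check on a work record. The sketch below assumes a dict shaped like an OpenAlex work object and reads its `type`, `open_access.oa_status` and `corresponding_institution_ids` fields; the function name `is_apc_able` is hypothetical. In this notebook the same criteria are applied server-side through API filters.

```python
def is_apc_able(work: dict, institution_id: str) -> bool:
    """Return True if a work record meets the three APC-able criteria (sketch)."""
    # 1. original articles or reviews only
    if work.get("type") not in {"article", "review"}:
        return False
    # 2. gold or hybrid OA status only
    oa_status = (work.get("open_access") or {}).get("oa_status")
    if oa_status not in {"gold", "hybrid"}:
        return False
    # 3. a corresponding author is affiliated with our institution
    return institution_id in (work.get("corresponding_institution_ids") or [])


example = {
    "type": "article",
    "open_access": {"oa_status": "gold"},
    "corresponding_institution_ids": ["https://openalex.org/I98251732"],
}
print(is_apc_able(example, "https://openalex.org/I98251732"))  # True
```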

## Surveying APCs by journal/publisher

### Steps

1. We need to get all APC-able works whose corresponding author is affiliated with the institution
2. We get the journal/publisher and APC for each publication
3. We sum the APCs (by journal/publisher)

### Input

For inputs, we first need to identify the Research Organization Registry (ROR) ID for our institution. In this example we will use the ROR ID for McMaster University ([https://ror.org/02fa3aq29](https://ror.org/02fa3aq29)). You can search and substitute your own institution's ROR here: [https://ror.org/search](https://ror.org/search).  

Next, we identify the publication year we are interested in analyzing. If the details of your institution's transformative/read-publish agreements change from year to year, you will want to limit your analysis to a single year.  

Finally, because editorial content is not typically subject to APCs, we will limit our search to works with the publication types "article" or "review" and, following the APC-able criteria above, to works with the OA status "gold" or "hybrid".  

In [1]:
import requests
import numpy as np
import pandas as pd

SAVE_CSV = False  # flag to determine whether to save the output as a CSV file

# input
ror_id = "https://ror.org/02fa3aq29"
publication_year = 2024
publication_types = ["article", "review"]
publication_oa_statuses = ["gold", "hybrid"]

### Get OpenAlex ID of the given institution

We only want publications whose corresponding authors are affiliated with McMaster University. Because OpenAlex currently does not support filtering by corresponding institution using a ROR ID, we first need to look up McMaster's OpenAlex ID using the [`institutions`](https://docs.openalex.org/api-entities/institutions) entity type.  

Our search criteria are as follows:  
- `ror`: ROR ID of the institution, `ror:https://ror.org/02fa3aq29`

Now we need to build a URL for the query from the following parameters:  
- Starting point is the base URL of the OpenAlex API: `https://api.openalex.org/`
- We append the entity type to it: `https://api.openalex.org/institutions`
- All criteria need to go into the query parameter filter that is added after a question mark: `https://api.openalex.org/institutions?filter=`
- To construct the filter value we take the criteria we specified and concatenate them using commas as separators: `https://api.openalex.org/institutions?filter=ror:https://ror.org/02fa3aq29`
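The URL-building rules above can be captured in a small helper. This is a sketch: `build_filter_url` is a hypothetical name, and it performs no URL-encoding, on the assumption that the raw characters work as in the example URLs shown in this notebook.

```python
def build_filter_url(entity: str, filters: dict) -> str:
    """Build an OpenAlex URL: base URL + entity + '?filter=' + comma-joined criteria.

    List values are OR-ed together with '|' (e.g. type:article|review).
    """
    parts = []
    for key, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        parts.append(f"{key}:{value}")
    return f"https://api.openalex.org/{entity}?filter={','.join(parts)}"


url = build_filter_url("institutions", {"ror": "https://ror.org/02fa3aq29"})
print(url)  # https://api.openalex.org/institutions?filter=ror:https://ror.org/02fa3aq29
```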

```py
# construct the url using the provided ror id
url = f"https://api.openalex.org/institutions?filter=ror:{ror_id}"

# send a get request to the constructed url
response = requests.get(url)

# parse the response json data
json_data = response.json()

# extract the institution id from the first result
institution_id = json_data["results"][0]["id"]  # https://openalex.org/I98251732
```

In [2]:
from openalex_helpers.institutions import get_institution_by_ror
institution = get_institution_by_ror(ror_id)
institution_id = institution["id"]  # https://openalex.org/I98251732
institution_id


'https://openalex.org/I98251732'

### Get all APC-able works published by researchers at the institution

Our search criteria are as follows:  
- `corresponding_institution_ids`: institution affiliated with the corresponding authors of a work (OpenAlex ID), `corresponding_institution_ids:https://openalex.org/I98251732`
- `publication_year`: the year the work was published, `publication_year:2024`
- [`type`](https://docs.openalex.org/api-entities/works/work-object#type): the type of the work, `type:article|review`
- [`oa_status`](https://docs.openalex.org/api-entities/works/work-object#open_access): the OA status of the work, `oa_status:gold|hybrid`

Now we need to build a URL for the query from the following parameters:  
- Starting point is the base URL of the OpenAlex API: `https://api.openalex.org/`
- We append the entity type to it: `https://api.openalex.org/works`
- All criteria need to go into the query parameter filter that is added after a question mark: `https://api.openalex.org/works?filter=`
- To construct the filter value we take the criteria we specified and concatenate them using commas as separators. The pagination parameters `page` and `per-page` are separate query parameters, appended with `&`: `https://api.openalex.org/works?filter=corresponding_institution_ids:https://openalex.org/I98251732,publication_year:2024,type:article|review,oa_status:gold|hybrid&page=1&per-page=50`
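Because results come back one page at a time, it can help to know how many pages a query spans. OpenAlex responses include a `meta` object whose `count` field holds the total number of matching records; the sketch below (hypothetical helper `pages_needed`) turns that into a page count for a given `per-page` value.

```python
import math


def pages_needed(meta: dict, per_page: int = 50) -> int:
    """Number of pages required to fetch meta['count'] results, per_page at a time."""
    return math.ceil(meta["count"] / per_page)


print(pages_needed({"count": 1234}, per_page=50))  # 25
```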


```py
def get_works_by_corresponding_institution(institution_id, publication_year, publication_types, publication_oa_statuses, items_per_page=50):
    frames = []
    page = 1
    while True:
        # construct the api url with the given institution id, publication year, publication types, oa statuses, page number, and items per page
        url = f"https://api.openalex.org/works?filter=corresponding_institution_ids:{institution_id},publication_year:{publication_year},type:{'|'.join(publication_types)},oa_status:{'|'.join(publication_oa_statuses)}&page={page}&per-page={items_per_page}"

        # send a GET request to the api, fail loudly on HTTP errors, and parse the json response
        response = requests.get(url)
        response.raise_for_status()
        json_data = response.json()

        # an empty page means there are no more results
        if not json_data["results"]:
            break

        # convert this page of results to a dataframe and move on to the next page
        frames.append(pd.DataFrame.from_dict(json_data["results"]))
        page += 1

    # combine all pages into one dataframe (empty if nothing matched)
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```
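The function above uses basic (page-number) paging, which the OpenAlex documentation describes as limited to the first 10,000 results of a query. For larger result sets, the documented alternative is cursor paging: request `cursor=*`, then pass each response's `meta.next_cursor` back until it is null. The sketch below shows that loop with the HTTP call abstracted into a `fetch_page` callable (a hypothetical parameter), so it can be demonstrated against a fake two-page response.

```python
from typing import Callable, Iterator, Optional


def iter_results(fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every result by following OpenAlex-style cursors.

    fetch_page(cursor) must return a parsed JSON response containing 'results'
    and 'meta' -> 'next_cursor' (None when there are no more pages).
    """
    cursor: Optional[str] = "*"  # '*' requests the first page
    while cursor:
        data = fetch_page(cursor)
        yield from data["results"]
        cursor = data["meta"].get("next_cursor")


# a fake two-page "API" for demonstration only
pages = {
    "*": {"results": [{"id": "W1"}, {"id": "W2"}], "meta": {"next_cursor": "abc"}},
    "abc": {"results": [{"id": "W3"}], "meta": {"next_cursor": None}},
}
print([w["id"] for w in iter_results(pages.__getitem__)])  # ['W1', 'W2', 'W3']
```

Against the live API, `fetch_page` would wrap a `requests.get(...)` call on the works URL with `&cursor=...` appended; that wrapper is not shown here.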

In [3]:
from openalex_helpers.works import get_works_by_corresponding_institution
df_works = get_works_by_corresponding_institution(institution_id, publication_year, publication_types, publication_oa_statuses)
if SAVE_CSV:
    df_works.to_csv(f"institution_works_{publication_year}.csv", index=True)

df_works

Unnamed: 0,id,doi,title,display_name,publication_year,publication_date,ids,language,primary_location,type,...,referenced_works_count,referenced_works,related_works,abstract_inverted_index,abstract_inverted_index_v3,cited_by_api_url,counts_by_year,updated_date,created_date,is_authors_truncated
0,https://openalex.org/W4391340020,https://doi.org/10.1002/adfm.202314520,Borophene Based 3D Extrusion Printed Nanocompo...,Borophene Based 3D Extrusion Printed Nanocompo...,2024,2024-01-30,{'openalex': 'https://openalex.org/W4391340020...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,72,"[https://openalex.org/W1963693172, https://ope...","[https://openalex.org/W4313334364, https://ope...","{'Abstract': [0], 'Herein,': [1], 'a': [2, 10,...",,https://api.openalex.org/works?filter=cites:W4...,"[{'year': 2025, 'cited_by_count': 15}, {'year'...",2025-05-09T13:22:32.231856,2024-01-31,
1,https://openalex.org/W4391115198,https://doi.org/10.1002/anie.202318665,Development of Better Aptamers: Structured Lib...,Development of Better Aptamers: Structured Lib...,2024,2024-01-23,{'openalex': 'https://openalex.org/W4391115198...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",review,...,204,"[https://openalex.org/W1509198841, https://ope...","[https://openalex.org/W4313237059, https://ope...","{'Abstract': [0], 'Systematic': [1], 'evolutio...",,https://api.openalex.org/works?filter=cites:W4...,"[{'year': 2025, 'cited_by_count': 20}, {'year'...",2025-05-03T01:52:40.470226,2024-01-23,
2,https://openalex.org/W4392384326,https://doi.org/10.1016/s2666-7568(24)00007-2,Prevalence of multimorbidity and polypharmacy ...,Prevalence of multimorbidity and polypharmacy ...,2024,2024-03-04,{'openalex': 'https://openalex.org/W4392384326...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",review,...,127,"[https://openalex.org/W1056226603, https://ope...","[https://openalex.org/W3196849760, https://ope...","{'Multimorbidity': [0], '(multiple': [1, 5], '...",,https://api.openalex.org/works?filter=cites:W4...,"[{'year': 2025, 'cited_by_count': 22}, {'year'...",2025-05-14T01:32:53.765504,2024-03-05,
3,https://openalex.org/W4391180419,https://doi.org/10.1016/j.molliq.2024.124105,Rapid and effective antibiotics elimination fr...,Rapid and effective antibiotics elimination fr...,2024,2024-01-24,{'openalex': 'https://openalex.org/W4391180419...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,62,"[https://openalex.org/W1988636074, https://ope...","[https://openalex.org/W888754083, https://open...","{'Tetracycline': [0], '(TC)': [1], 'and': [2, ...",,https://api.openalex.org/works?filter=cites:W4...,"[{'year': 2025, 'cited_by_count': 12}, {'year'...",2025-04-30T03:50:33.571084,2024-01-25,
4,https://openalex.org/W4400340432,https://doi.org/10.1016/j.eswa.2024.124678,Physics-informed machine learning: A comprehen...,Physics-informed machine learning: A comprehen...,2024,2024-07-05,{'openalex': 'https://openalex.org/W4400340432...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",review,...,156,"[https://openalex.org/W1581242383, https://ope...","[https://openalex.org/W4394896187, https://ope...","{'Condition': [0], 'monitoring': [1, 21, 233, ...",,https://api.openalex.org/works?filter=cites:W4...,"[{'year': 2025, 'cited_by_count': 17}, {'year'...",2025-05-05T21:24:10.172974,2024-07-06,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34,https://openalex.org/W4405961485,https://doi.org/10.1093/geroni/igae098.1958,APPLES TO APPLES? DISCORDANT DEFINITIONS STILL...,APPLES TO APPLES? DISCORDANT DEFINITIONS STILL...,2024,2024-12-01,{'openalex': 'https://openalex.org/W4405961485...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,0,[],"[https://openalex.org/W4389568370, https://ope...","{'Abstract': [0], 'The': [1, 21, 146], 'defini...",,https://api.openalex.org/works?filter=cites:W4...,[],2025-05-13T15:17:39.531933,2025-01-01,
35,https://openalex.org/W4405976689,https://doi.org/10.1093/geroni/igae098.0740,UNIFIED FRAMEWORK FOR MEASURING MOBILITY IN OL...,UNIFIED FRAMEWORK FOR MEASURING MOBILITY IN OL...,2024,2024-12-01,{'openalex': 'https://openalex.org/W4405976689...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,0,[],"[https://openalex.org/W4402993591, https://ope...","{'Abstract': [0], 'In': [1, 171, 216], 'the': ...",,https://api.openalex.org/works?filter=cites:W4...,[],2025-05-14T13:06:11.432616,2025-01-02,
36,https://openalex.org/W4406500031,https://doi.org/10.1109/cascon62161.2024.10838142,Translating Formal Specs: Event-B to English,Translating Formal Specs: Event-B to English,2024,2024-11-11,{'openalex': 'https://openalex.org/W4406500031...,en,"{'is_oa': False, 'landing_page_url': 'https://...",article,...,0,[],"[https://openalex.org/W4396701345, https://ope...",,,https://api.openalex.org/works?filter=cites:W4...,[],2025-05-14T11:48:37.553638,2025-01-17,
37,https://openalex.org/W4407883950,https://doi.org/10.12927/hcpol.2024.27482,Commentary: Reducing the Mortality Gap for the...,Commentary: Reducing the Mortality Gap for the...,2024,2024-11-30,{'openalex': 'https://openalex.org/W4407883950...,en,"{'is_oa': False, 'landing_page_url': 'https://...",article,...,0,[],"[https://openalex.org/W4391375266, https://ope...","{'The': [0], 'mortality': [1], 'gap': [2], 'fa...",,https://api.openalex.org/works?filter=cites:W4...,[],2025-05-10T12:58:30.404897,2025-02-25,


### Get Journals/Publishers and APCs in USD

In a `work` entity object, there is information about the journal (`primary_location`) and the journal's APC list price ([`apc_list`](https://docs.openalex.org/api-entities/works/work-object#apc_list)).  

The `apc_list` value is derived from the [Directory of Open Access Journals (DOAJ)](https://doaj.org/), which compiles the APC data currently listed on publishers' websites.  

It should be noted that not all publishers list APC prices on their websites, meaning that not all works will have an `apc_list` price in OpenAlex.  In these cases, we will infer the APC price based on the mean APC prices of those works for which `apc_list` data is available.  
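The imputation rule is simple enough to state in a few lines. A plain-Python sketch on made-up values, where `None` stands for a missing `apc_list` and `fill_missing_apcs` is a hypothetical name:

```python
from statistics import mean


def fill_missing_apcs(apcs):
    """Replace None entries with the mean of the known APC values.

    Raises statistics.StatisticsError if no APC value is known at all.
    """
    known = [a for a in apcs if a is not None]
    fallback = mean(known)
    return [fallback if a is None else a for a in apcs]


print(fill_missing_apcs([3000.0, None, 2000.0]))  # [3000.0, 2500.0, 2000.0]
```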

In addition, even when an APC price is listed on a publisher's website, there is no guarantee that it is the final APC our authors paid for publication.  

For these reasons, results of this notebook must be understood as best available estimates.  

In [4]:
# extract 'value_usd' from 'apc_list' if it is a dictionary (i.e. 'apc_list' exists in the work record); otherwise, set to null
df_works["apc_list_usd"] = df_works["apc_list"].apply(lambda apc_list: apc_list["value_usd"] if isinstance(apc_list, dict) else np.nan)

# extract 'id' and 'name' from 'source' within 'primary_location' if 'source' exists; otherwise, set to null
df_works["source_id"] = df_works["primary_location"].apply(lambda location: location["source"]["id"] if location["source"] else np.nan)
df_works["source_name"] = df_works["primary_location"].apply(lambda location: location["source"]["display_name"] if location["source"] else np.nan)

# extract 'host_organization' from 'source' within 'primary_location' if 'source' exists; otherwise, set to null
df_works["source_host_organization"] = df_works["primary_location"].apply(lambda location: location["source"]["host_organization"] if location["source"] else np.nan)

# extract 'issn' and 'issn_l' from 'source' within 'primary_location' if 'source' exists; otherwise, set to null
df_works["source_issn"] = df_works["primary_location"].apply(lambda location: location["source"]["issn"] if location["source"] else np.nan)
df_works["source_issn_l"] = df_works["primary_location"].apply(lambda location: location["source"]["issn_l"] if location["source"] else np.nan)

In [5]:
# calculate the average apc where 'apc_list_usd' is not null
apc_mean = df_works[df_works["apc_list_usd"].notnull()]["apc_list_usd"].mean()

# fill null values in 'apc_list_usd' with the calculated average
df_works["apc_list_usd"] = df_works["apc_list_usd"].fillna(apc_mean)

# fill null values in 'source_id', 'source_name', 'source_host_organization', 'source_issn' and 'source_issn_l'
df_works["source_id"] = df_works["source_id"].fillna("unknown source")
df_works["source_name"] = df_works["source_name"].fillna("unknown source")
df_works["source_host_organization"] = df_works["source_host_organization"].fillna("unknown source")
df_works["source_issn"] = df_works["source_issn"].fillna("unknown source")
df_works["source_issn_l"] = df_works["source_issn_l"].fillna("unknown source")

### Get Publisher Display Name (Optional)

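OpenAlex identifies each source's host organization with a unique identifier called an OpenAlex ID. Publisher IDs start with `P`, while institution IDs (for sources hosted by, e.g., a university or a government agency) start with `I`. The following code translates these OpenAlex IDs into display names for easier analysis.  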
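Since the two kinds of host organization are looked up through different entity endpoints, the IDs first have to be split by the letter after `https://openalex.org/`. A sketch of that split using a prefix check (`split_host_ids` is a hypothetical helper; the sample IDs appear in this notebook's lookup output). Anything else, such as the `unknown source` placeholder, is skipped.

```python
def split_host_ids(host_ids):
    """Split OpenAlex host-organization IDs into publisher (P...) and institution (I...) lists."""
    publishers, institutions = [], []
    for host_id in host_ids:
        if host_id.startswith("https://openalex.org/P"):
            publishers.append(host_id)
        elif host_id.startswith("https://openalex.org/I"):
            institutions.append(host_id)
        # anything else (e.g. the "unknown source" placeholder) is skipped
    return publishers, institutions


ids = ["https://openalex.org/P4310320990", "https://openalex.org/I1299303238", "unknown source"]
publishers, institutions = split_host_ids(ids)
print(publishers)    # ['https://openalex.org/P4310320990']
print(institutions)  # ['https://openalex.org/I1299303238']
```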

In [6]:
import re

CHUNK_SIZE = 5

def get_source_host_organization_display_name(publisher_ids):
    def get_source_host_organization_publisher_display_name(publisher_ids):
        def get_source_host_organization_publisher_display_name_by_chunk(publisher_ids_chunk):
            # construct the api url using the chunk of publisher ids
            url = f"https://api.openalex.org/publishers?filter=ids.openalex:{'|'.join(publisher_ids_chunk)}"

            # send a GET request to the api and parse the json response
            response = requests.get(url)
            json_data = response.json()

            # convert the json response to a dataframe and return the relevant columns
            df_json = pd.DataFrame.from_dict(json_data["results"])
            return df_json[["id", "display_name"]]
        
        # check if the length of 'publisher_ids' is less than 1
        if len(publisher_ids) < 1:
            # if true, return an empty dataframe
            return pd.DataFrame()

        # split the publisher ids into chunks and apply the function to each chunk
        chunks = np.array_split(publisher_ids, np.ceil(len(publisher_ids) / CHUNK_SIZE))
        df_chunks = pd.DataFrame({"chunk": chunks})
        return pd.concat(df_chunks["chunk"].apply(get_source_host_organization_publisher_display_name_by_chunk).tolist())

    def get_source_host_organization_institution_display_name(institution_ids):
        def get_source_host_organization_institution_display_name_by_chunk(institution_ids_chunk):
            # construct the api url using the chunk of institution ids
            url = f"https://api.openalex.org/institutions?filter=id:{'|'.join(institution_ids_chunk)}"

            # send a GET request to the api and parse the json response
            response = requests.get(url)
            json_data = response.json()

            # convert the json response to a dataframe and return the relevant columns
            df_json = pd.DataFrame.from_dict(json_data["results"])
            return df_json[["id", "display_name"]]

        # check if the length of 'institution_ids' is less than 1
        if len(institution_ids) < 1:
            # if true, return an empty dataframe
            return pd.DataFrame()

        # split the institution ids into chunks and apply the function to each chunk
        chunks = np.array_split(institution_ids, np.ceil(len(institution_ids) / CHUNK_SIZE))
        df_chunks = pd.DataFrame({"chunk": chunks})
        return pd.concat(df_chunks["chunk"].apply(get_source_host_organization_institution_display_name_by_chunk).tolist())

    # filter the publisher ids to get only publisher urls
    publishers = list(filter(lambda s: re.search(r"https:\/\/openalex\.org\/P", s), publisher_ids))
    # filter the institution ids to get only institution urls
    institutions = list(filter(lambda s: re.search(r"https:\/\/openalex\.org\/I", s), publisher_ids))

    # create a dataframe with a default entry for unknown source
    df_lookup = pd.DataFrame.from_dict({"id": ["unknown source"], "display_name": ["unknown source"]})
    # concatenate the dataframes with publisher and institution display names
    df_lookup = pd.concat([df_lookup, get_source_host_organization_publisher_display_name(publishers), get_source_host_organization_institution_display_name(institutions)], ignore_index=True)
    return df_lookup

In [7]:
# get the display names for unique source_host_organization ids in df_works
df_lookup = get_source_host_organization_display_name(df_works["source_host_organization"].unique())
if SAVE_CSV:
    df_lookup.to_csv(f"source_host_organization_lookup.csv", index=True)

df_lookup

| | id | display_name |
|---|---|---|
| 0 | unknown source | unknown source |
| 1 | https://openalex.org/P4310320990 | Elsevier BV |
| 2 | https://openalex.org/P4310320595 | Wiley |
| 3 | https://openalex.org/P4310310987 | Multidisciplinary Digital Publishing Institute |
| 4 | https://openalex.org/P4310319908 | Nature Portfolio |
| ... | ... | ... |
| 106 | https://openalex.org/P4310319827 | Franz Steiner Verlag |
| 107 | https://openalex.org/I1299303238 | National Institutes of Health |
| 108 | https://openalex.org/I67581229 | Istanbul University |
| 109 | https://openalex.org/I2750212522 | Cold Spring Harbor Laboratory |


In [8]:
# update the 'source_host_organization' with the corresponding display names
df_works["source_host_organization"] = df_works["source_host_organization"].apply(lambda publisher: df_lookup[df_lookup["id"] == publisher]["display_name"].squeeze())  

In [9]:
# rename the 'apc_list_usd' column to 'apc_usd'
df_apc = df_works.rename(columns={"apc_list_usd": "apc_usd"})
if SAVE_CSV:
    df_apc.to_csv(f"apc_usd.csv", index=True)

df_apc

Unnamed: 0,id,doi,title,display_name,publication_year,publication_date,ids,language,primary_location,type,...,counts_by_year,updated_date,created_date,is_authors_truncated,apc_usd,source_id,source_name,source_host_organization,source_issn,source_issn_l
0,https://openalex.org/W4391340020,https://doi.org/10.1002/adfm.202314520,Borophene Based 3D Extrusion Printed Nanocompo...,Borophene Based 3D Extrusion Printed Nanocompo...,2024,2024-01-30,{'openalex': 'https://openalex.org/W4391340020...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,"[{'year': 2025, 'cited_by_count': 15}, {'year'...",2025-05-09T13:22:32.231856,2024-01-31,,5250.000000,https://openalex.org/S135204980,Advanced Functional Materials,Wiley,"[1616-301X, 1616-3028]",1616-301X
1,https://openalex.org/W4391115198,https://doi.org/10.1002/anie.202318665,Development of Better Aptamers: Structured Lib...,Development of Better Aptamers: Structured Lib...,2024,2024-01-23,{'openalex': 'https://openalex.org/W4391115198...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",review,...,"[{'year': 2025, 'cited_by_count': 20}, {'year'...",2025-05-03T01:52:40.470226,2024-01-23,,5250.000000,https://openalex.org/S67393510,Angewandte Chemie International Edition,Wiley,"[1433-7851, 1521-3773]",1433-7851
2,https://openalex.org/W4392384326,https://doi.org/10.1016/s2666-7568(24)00007-2,Prevalence of multimorbidity and polypharmacy ...,Prevalence of multimorbidity and polypharmacy ...,2024,2024-03-04,{'openalex': 'https://openalex.org/W4392384326...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",review,...,"[{'year': 2025, 'cited_by_count': 22}, {'year'...",2025-05-14T01:32:53.765504,2024-03-05,,5000.000000,https://openalex.org/S4210210722,The Lancet Healthy Longevity,Elsevier BV,[2666-7568],2666-7568
3,https://openalex.org/W4391180419,https://doi.org/10.1016/j.molliq.2024.124105,Rapid and effective antibiotics elimination fr...,Rapid and effective antibiotics elimination fr...,2024,2024-01-24,{'openalex': 'https://openalex.org/W4391180419...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,"[{'year': 2025, 'cited_by_count': 12}, {'year'...",2025-04-30T03:50:33.571084,2024-01-25,,3380.000000,https://openalex.org/S189140625,Journal of Molecular Liquids,Elsevier BV,"[0167-7322, 1873-3166]",0167-7322
4,https://openalex.org/W4400340432,https://doi.org/10.1016/j.eswa.2024.124678,Physics-informed machine learning: A comprehen...,Physics-informed machine learning: A comprehen...,2024,2024-07-05,{'openalex': 'https://openalex.org/W4400340432...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",review,...,"[{'year': 2025, 'cited_by_count': 17}, {'year'...",2025-05-05T21:24:10.172974,2024-07-06,,3220.000000,https://openalex.org/S13144211,Expert Systems with Applications,Elsevier BV,"[0957-4174, 1873-6793]",0957-4174
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34,https://openalex.org/W4405961485,https://doi.org/10.1093/geroni/igae098.1958,APPLES TO APPLES? DISCORDANT DEFINITIONS STILL...,APPLES TO APPLES? DISCORDANT DEFINITIONS STILL...,2024,2024-12-01,{'openalex': 'https://openalex.org/W4405961485...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,[],2025-05-13T15:17:39.531933,2025-01-01,,3615.000000,https://openalex.org/S2735557189,Innovation in Aging,University of Oxford,[2399-5300],2399-5300
35,https://openalex.org/W4405976689,https://doi.org/10.1093/geroni/igae098.0740,UNIFIED FRAMEWORK FOR MEASURING MOBILITY IN OL...,UNIFIED FRAMEWORK FOR MEASURING MOBILITY IN OL...,2024,2024-12-01,{'openalex': 'https://openalex.org/W4405976689...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,[],2025-05-14T13:06:11.432616,2025-01-02,,3615.000000,https://openalex.org/S2735557189,Innovation in Aging,University of Oxford,[2399-5300],2399-5300
36,https://openalex.org/W4406500031,https://doi.org/10.1109/cascon62161.2024.10838142,Translating Formal Specs: Event-B to English,Translating Formal Specs: Event-B to English,2024,2024-11-11,{'openalex': 'https://openalex.org/W4406500031...,en,"{'is_oa': False, 'landing_page_url': 'https://...",article,...,[],2025-05-14T11:48:37.553638,2025-01-17,,2872.307924,unknown source,unknown source,unknown source,unknown source,unknown source
37,https://openalex.org/W4407883950,https://doi.org/10.12927/hcpol.2024.27482,Commentary: Reducing the Mortality Gap for the...,Commentary: Reducing the Mortality Gap for the...,2024,2024-11-30,{'openalex': 'https://openalex.org/W4407883950...,en,"{'is_oa': False, 'landing_page_url': 'https://...",article,...,[],2025-05-10T12:58:30.404897,2025-02-25,,2872.307924,https://openalex.org/S4210211288,Healthcare policy,Longwoods Publishing,"[1715-6572, 1715-6580]",1715-6572


### Aggregate APCs Data (Optional)

Here, we build a dataframe containing the number of APC-able works and the estimated total APC cost for each journal.

In [10]:
# group the dataframe by 'source_id' and 'source_issn_l'
# and aggregate 'source_issn' by taking the maximum value (in this case the common issn list of strings)
# and aggregate 'source_host_organization' by taking the maximum value (in this case the common string name of the source's host organization)
# and aggregate 'source_name' by taking the maximum value (in this case the common string name of the source)
# and 'id' by counting
# and 'apc_list_usd' by summing
df_apc = df_works.groupby(["source_id", "source_issn_l"]).agg({"source_issn": "max", "source_name": "max", "source_host_organization": "max", "id": "count", "apc_list_usd": "sum"})
# rename the 'id' column to 'num_publications' and 'apc_list_usd' column to 'apc_usd'
df_apc.rename(columns={"id": "num_publications", "apc_list_usd": "apc_usd"}, inplace=True)
if SAVE_CSV:
    df_apc.to_csv(f"apc_usd_by_source.csv", index=True)

df_apc

| source_id | source_issn_l | source_issn | source_name | source_host_organization | num_publications | apc_usd |
|---|---|---|---|---|---|---|
| https://openalex.org/S100014455 | 1756-0500 | [1756-0500] | BMC Research Notes | BioMed Central | 1 | 1361.000000 |
| https://openalex.org/S10012645 | 0363-9061 | [0363-9061, 1096-9853] | International Journal for Numerical and Analyt... | Wiley | 1 | 4530.000000 |
| https://openalex.org/S100299040 | 0017-9078 | [0017-9078, 1538-5159] | Health Physics | Lippincott Williams & Wilkins | 1 | 2872.307924 |
| https://openalex.org/S100662246 | 1748-2623 | [1748-2623, 1748-2631] | International Journal of Qualitative Studies o... | Taylor & Francis | 1 | 1790.000000 |
| https://openalex.org/S100695177 | 0004-6256 | [0004-6256, 1538-3881] | The Astronomical Journal | Institute of Physics | 1 | 4499.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| https://openalex.org/S99498898 | 1567-5394 | [1567-5394, 1878-562X] | Bioelectrochemistry | Elsevier BV | 1 | 3370.000000 |
| https://openalex.org/S99546260 | 1836-9561 | [1836-9561, 1836-9553] | Journal of physiotherapy | Elsevier BV | 1 | 3450.000000 |
| https://openalex.org/S99961174 | 1363-2469 | [1363-2469, 1559-808X] | Journal of Earthquake Engineering | Taylor & Francis | 1 | 2872.307924 |
| https://openalex.org/S99985186 | 1360-8592 | [1360-8592, 1532-9283] | Journal of Bodywork and Movement Therapies | Elsevier BV | 1 | 2670.000000 |
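To answer question 2 directly, the per-journal totals can be rolled up by host organization and ranked. With this notebook's dataframe, one way is `df_apc.groupby("source_host_organization")["apc_usd"].sum().sort_values(ascending=False)`; the sketch below shows the same roll-up in plain Python on made-up rows (`rank_publishers` and the publisher names are hypothetical).

```python
from collections import Counter

# made-up per-journal rows: (publisher, num_publications, apc_usd)
rows = [
    ("Publisher X", 3, 9000.0),
    ("Publisher Y", 1, 1800.0),
    ("Publisher X", 1, 2500.0),
    ("Publisher Z", 2, 6000.0),
]


def rank_publishers(rows, top_n=3):
    """Sum the APC totals per publisher and return the top_n, largest first."""
    totals = Counter()
    for publisher, _num_publications, apc_usd in rows:
        totals[publisher] += apc_usd
    return totals.most_common(top_n)


print(rank_publishers(rows))  # [('Publisher X', 11500.0), ('Publisher Z', 6000.0), ('Publisher Y', 1800.0)]
```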


### Estimating the total (non-discounted) APC spend

In [11]:
total_apc = df_apc["apc_usd"].sum()
print(f"Estimated total (non-discounted) APC cost in {publication_year}: USD {round(total_apc, 2)}.")

Estimated total (non-discounted) APC cost in 2024: USD 4276866.5.


## Calculating Discounted APCs

### Steps

1. We load the list of read-publish agreement discounts
2. We match each journal against the list by ISSN
3. We apply the matching discount to each journal's estimated APC to get the discounted APC cost
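Written out per journal, the calculation looks as follows. For journal $j$, let $n_j$ be the number of APC-able works and $A_j$ the estimated total APC (the `apc_usd` column). This assumes that a percentage discount $p_j$ is stored as a fraction and that a flat-rate discount $d_j$ applies once per article; journals with no matching ISSN keep $A'_j = A_j$. The total saving $S$ is the difference between the non-discounted and discounted totals.

$$
\begin{aligned}
A'_j &= \begin{cases}
A_j \, (1 - p_j) & \text{percentage discount } p_j \in [0, 1] \\
A_j - n_j \, d_j & \text{flat-rate discount } d_j \text{ per article}
\end{cases} \\
S &= \sum_j \left( A_j - A'_j \right)
\end{aligned}
$$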

### Input

Assume we have a list of read-publish agreement discounts in CSV format, `discount-list.csv`, with the following required columns:  
- `issn`: ISSN of a journal covered by the agreement
- `discount`: value of the discount, either an amount in USD (flat rate) or a fraction of the APC (percentage, e.g. `0.15` for 15%)
- `is_flatrate`: flag indicating whether the discount is a flat-rate discount or a percentage discount
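To make the expected layout concrete, here is a sketch that parses a file with these columns using only the standard library. The ISSNs and amounts are placeholders, not real agreement terms, and `load_discounts` is a hypothetical helper; the notebook itself reads the file with `pd.read_csv`.

```python
import csv
import io

# placeholder rows in the discount-list.csv layout described above
sample = """issn,discount,is_flatrate
0000-0000,0.15,False
1111-1111,500,True
"""


def load_discounts(text):
    """Parse discount rows into {issn: (discount, is_flatrate)}."""
    discounts = {}
    for row in csv.DictReader(io.StringIO(text)):
        is_flatrate = row["is_flatrate"].strip().lower() == "true"
        discounts[row["issn"]] = (float(row["discount"]), is_flatrate)
    return discounts


print(load_discounts(sample))  # {'0000-0000': (0.15, False), '1111-1111': (500.0, True)}
```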

You can download the template `data/discount-list.csv` and update it with the details of your institution's own APC discounts.  

In [12]:
# input
df_discount = pd.read_csv("data/discount-list.csv")

In [13]:
import typing

def get_discount(issn: typing.List[str] | str, apc: float) -> float:
    # check if issn is a string, if so, convert it to a list
    if isinstance(issn, str):
        issn = [issn]

    # filter the discount dataframe to get rows where 'issn' is in the provided issn list
    discount_rows = df_discount[df_discount["issn"].isin(issn)]
    
    # if no discount rows are found, return the original apc
    if discount_rows.empty:
        return apc

    # get the first row from the filtered discount rows
    discount_row = discount_rows.iloc[0]
    
    # if the discount is a flat rate, subtract the discount from the apc
    if discount_row["is_flatrate"]:
        return apc - discount_row["discount"]
    else:
        # if the discount is a percentage, apply the discount to the apc
        return apc * (1 - discount_row["discount"])

### Apply Discounts to APC Data

Here, we apply the APC discounts to the APC data. This produces a dataframe (and, if `SAVE_CSV` is set, a `.csv` file) that includes the number of APC-able publications and the discounted APC cost for each journal.  

In [14]:
# apply the discount per article: discount each journal's mean apc, then multiply by its number of publications,
# so a flat-rate discount is subtracted once per article (percentage discounts give the same result either way)
df_apc["discounted_apc_usd"] = df_apc.apply(lambda x: get_discount(issn=x["source_issn"], apc=x["apc_usd"] / x["num_publications"]) * x["num_publications"], axis=1)
if SAVE_CSV:
    df_apc.to_csv(f"apc_usd_with_discounts.csv", index=True)

df_apc

| source_id | source_issn_l | source_issn | source_name | source_host_organization | num_publications | apc_usd | discounted_apc_usd |
|---|---|---|---|---|---|---|---|
| https://openalex.org/S100014455 | 1756-0500 | [1756-0500] | BMC Research Notes | BioMed Central | 1 | 1361.000000 | 1361.000000 |
| https://openalex.org/S10012645 | 0363-9061 | [0363-9061, 1096-9853] | International Journal for Numerical and Analyt... | Wiley | 1 | 4530.000000 | 4530.000000 |
| https://openalex.org/S100299040 | 0017-9078 | [0017-9078, 1538-5159] | Health Physics | Lippincott Williams & Wilkins | 1 | 2872.307924 | 2872.307924 |
| https://openalex.org/S100662246 | 1748-2623 | [1748-2623, 1748-2631] | International Journal of Qualitative Studies o... | Taylor & Francis | 1 | 1790.000000 | 1790.000000 |
| https://openalex.org/S100695177 | 0004-6256 | [0004-6256, 1538-3881] | The Astronomical Journal | Institute of Physics | 1 | 4499.000000 | 4499.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| https://openalex.org/S99498898 | 1567-5394 | [1567-5394, 1878-562X] | Bioelectrochemistry | Elsevier BV | 1 | 3370.000000 | 3370.000000 |
| https://openalex.org/S99546260 | 1836-9561 | [1836-9561, 1836-9553] | Journal of physiotherapy | Elsevier BV | 1 | 3450.000000 | 3450.000000 |
| https://openalex.org/S99961174 | 1363-2469 | [1363-2469, 1559-808X] | Journal of Earthquake Engineering | Taylor & Francis | 1 | 2872.307924 | 2872.307924 |
| https://openalex.org/S99985186 | 1360-8592 | [1360-8592, 1532-9283] | Journal of Bodywork and Movement Therapies | Elsevier BV | 1 | 2670.000000 | 2670.000000 |


### Estimating the total discounted APC cost

In [15]:
total_apc_discount = df_apc["discounted_apc_usd"].sum()
print(f"Estimated APC cost (including discounts) for {publication_year}: USD {round(total_apc_discount, 2)}.")

Estimated APC cost (including discounts) for 2024: USD 4016812.84.


### Estimating the total APC savings of our institution's read-publish agreements

In [16]:
print(f"Estimated APC saving in {publication_year}: USD {round(total_apc - total_apc_discount, 2)}.")

Estimated APC saving in 2024: USD 260053.65.
