<div align="center" style="border:solid 1px gray;">
    <a href="https://openalex.org/">
        <img src="../../resources/img/OpenAlex-banner.png" alt="OpenAlex banner" width="300">
    </a>
</div>

# Calculate the Journal Impact Factor for a given journal

<div style='background:#e7edf7'>
    In this notebook we will use the OpenAlex API to determine:
    <blockquote>
        <b><i>What is the two-year Journal Impact Factor (JIF) for a given journal?</i></b>
    </blockquote>
    To answer this question, we will use the following API features: 
    <a href="https://docs.openalex.org/api/get-lists-of-entities#filter">filtering</a>, 
    <a href="https://docs.openalex.org/api/get-lists-of-entities/filter-entity-lists#logical-expressions">logical expressions</a>,
    and <a href="https://docs.openalex.org/api#paging">paging</a>.
</div>
<br>

....

Let's start by dividing the process into smaller, more manageable steps:
1. First we need to get the number of articles published in the *venue* within the previous two years.
2. Next we need to count the citations these articles received in the current year.
3. Finally we can calculate the JIF by dividing the second number by the first.
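
Written as a formula, the three steps amount to the following. The symbols $C_y$ and $N_y$ are introduced here for illustration only and do not appear elsewhere in this notebook; the formula reflects the usual two-year definition, not any OpenAlex-specific metric.

$$
\mathrm{JIF}_y = \frac{C_y}{N_{y-1} + N_{y-2}}
$$

Here $C_y$ is the number of citations received in year $y$ by articles the journal published in years $y-1$ and $y-2$, and $N_{y-1} + N_{y-2}$ is the number of articles the journal published in those two years. For $y = 2021$, the denominator counts the articles from 2019 and 2020.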

---

In [1]:
# input
jif_year = 2021
journal_issn = '0012-3692'

<hr>

## 1. Get all articles published within the previous two years
The first step in querying OpenAlex is to build the URL to get exactly the data we need.  
We need to ask two things:

1. About which entity type (author, concept, institution, venue, work) do we want data?  
 --> Since we want to query for "_all articles_", the entity type should be `works`.

2. What are the criteria the works need to fulfill to fit our purpose?  
   Here we need to look into the list of available [filters for works](https://docs.openalex.org/api/get-lists-of-entities/filter-entity-lists#works-filters) and select the appropriate ones.  
 --> We want to query for "_all articles published in the venue within the previous two years_",  
so we will filter for works that:
    * are specified as journal articles:   
    `type:journal-article`,
    * were published in the *venue*:  
    `host_venue.issn:0012-3692`,
    * within the previous two years:   
    `publication_year:2019|2020`

Now we need to put the URL together from these parts as follows:  
* Starting point is the base URL of the OpenAlex API: `https://api.openalex.org/`
* We append the entity type to it: `https://api.openalex.org/works`
* All criteria need to go into the query parameter `filter` that is added after a question mark: `https://api.openalex.org/works?filter=`
* To construct the filter value we take the criteria we specified before and concatenate them using commas as separators:  
`https://api.openalex.org/works?filter=host_venue.issn:0012-3692,publication_year:2019|2020,type:journal-article`

In [2]:
def build_url_works_from_previous_two_years(year: int, issn: str):
    # specify endpoint
    endpoint = 'works'

    # build the 'filter' parameter
    filters = (
        f'host_venue.issn:{issn}',
        f'publication_year:{year-2}|{year-1}',
        'type:journal-article'
    )

    # put the URL together
    return f'https://api.openalex.org/{endpoint}?filter={",".join(filters)}'

previous_years_works_url = build_url_works_from_previous_two_years(jif_year, journal_issn)
print(f'URL for works from previous two years in given journal:\n{previous_years_works_url}\n')

URL for works from previous two years in given journal:
https://api.openalex.org/works?filter=host_venue.issn:0012-3692,publication_year:2019|2020,type:journal-article



With this URL we can get all articles published in the *venue* within the previous two years!  
In this step we are only interested in their total number, so we use the URL to query OpenAlex and extract their `count` from the `meta` section:

In [3]:
import requests
response = requests.get(previous_years_works_url).json()

previous_years_works_count = response['meta']['count']
print(f"number of articles published in {jif_year-2}-{jif_year-1}: {previous_years_works_count}")

number of articles published in 2019-2020: 6278


<hr>

## 2. Get this year's works that cite one of the articles of the two previous years
So far we've got the URL to find all articles in the *venue* for the two previous years and their total number.

Next we would like to determine all works from this year citing one of the articles. 

We start again by building a URL to query the OpenAlex API:

1. About which entity type (author, concept, institution, venue, work) do we want data?  
 --> Since we want to query for "_all works_", the entity type should be `works`.

2. What are the criteria the works need to fulfill to fit our purpose?  
--> We want to query for "_all works from this year citing one of the articles_",  
so we will filter for works that:
    * were published this year:   
    `publication_year:2021`
    * [citing](https://docs.openalex.org/api/get-lists-of-entities/filter-entity-lists#cites) one of the articles:  
    `cites:{}` 

The value for the `cites` filter is not known yet, so for now we put `{}` in its place as a placeholder.

Putting the complete URL together gives us:

In [5]:
def build_url_works_citing_in_this_year(year: int):
    # specify endpoint
    endpoint = 'works'

    # build the 'filter' parameter
    filters = (
        f'publication_year:{year}',
        'cites:{}'
    )

    # put the URL together
    return f'https://api.openalex.org/{endpoint}?filter={",".join(filters)}'

# put together complete URL
this_years_citing_works_url = build_url_works_citing_in_this_year(jif_year)
print(f'URL for works citing in selected year:\n{this_years_citing_works_url}\n')

URL for works citing in selected year:
https://api.openalex.org/works?filter=publication_year:2021,cites:{}
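
The `{}` left in this URL is an ordinary `str.format` placeholder, so it can be filled in later once the work IDs are known. The snippet below is a minimal sketch of that step. The names `citing_url_template` and `example_ids` are illustrative only, and the two IDs are taken from the output shown later in this notebook.

```python
# URL with the still-open placeholder, exactly as printed above
citing_url_template = 'https://api.openalex.org/works?filter=publication_year:2021,cites:{}'

# two example OpenAlex work IDs (illustrative subset)
example_ids = ['W3014561994', 'W3045413314']

# several values for one filter are combined with '|' (logical OR)
filled_url = citing_url_template.format('|'.join(example_ids))
print(filled_url)
```

Because the template is a plain string rather than an f-string, the braces survive until `.format()` is called.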



## So let's take care of the placeholder.

The documentation states that the `cites` filter needs an OpenAlex ID for a work entity as input and returns the list of works citing it. So now we need not only the total number of all articles from the two previous years but also each of their OpenAlex IDs.

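The API returns the articles one page at a time, so in order to loop through all of them we need to use paging. For each page we join the OpenAlex IDs of its articles with `|` (logical OR) and pass them to the `cites` filter in a single request, then add up the counts.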

In [12]:
import requests

# url with a placeholder for cursor
previous_years_works_url_with_cursor = previous_years_works_url + '&cursor={}&per_page=50'

# loop through pages
this_years_citations_count = 0
cursor = '*'
while cursor:
    
    # set cursor value and request page from OpenAlex
    url = previous_years_works_url_with_cursor.format(cursor)
    page_with_results = requests.get(url).json()
    
    # loop through partial list of results
    # and extract each OpenAlex ID
    results = page_with_results['results']
    openalex_ids = [work['id'].replace("https://openalex.org/", "") for work in results]
    
    if results:
        cites_value = "|".join(openalex_ids)
        citing_url = this_years_citing_works_url.format(cites_value)
        print(citing_url)

        this_years_citations = requests.get(citing_url).json()
        count = this_years_citations['meta']['count']
        print(f"count: {count}")
        this_years_citations_count += count

    # update cursor to meta.next_cursor
    cursor = page_with_results['meta']['next_cursor']

print(f"total number of this year's citations: {this_years_citations_count}")

https://api.openalex.org/works?filter=publication_year:2021,cites:W3014561994|W3045413314|W3015671971|W2899867564|W3014168462|W2911671206|W2904396437|W3012816616|W3015413039|W2938870751|W2911099424|W2896302998|W3017400328|W3089952857|W3039445169|W2889367571|W3045013692|W3049226000|W3042834885|W2896189647|W2955848963|W2894202568|W3011612860|W2895152564|W2912904466|W2945420558|W2987935015|W2911500694|W3020551346|W2981981172|W3015924558|W2990335548|W2991287546|W2872695204|W3025274738|W2946321184|W3013011550|W2965372996|W3023787396|W3049147554|W2897417210|W3040196187|W2894718719|W2898394729|W2943909458|W2971657164|W3019054910|W2939873104|W3045096104|W2910341328
count: 2598
https://api.openalex.org/works?filter=publication_year:2021,cites:W2942079884|W2976953338|W2886737174|W2911574428|W2963011520|W2986026259|W3020374319|W3014940845|W2900579861|W2908468049|W2923372709|W2906869853|W2918762320|W2964795278|W3027088329|W3033767101|W2977278846|W3018959136|W2937862696|W2945165979|W2951843570|W299
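
The cursor loop above can also be factored into a small generator, which keeps the paging logic separate from what is done with each page. This is only a sketch under stated assumptions: `iter_result_pages`, `fake_fetch` and `fake_pages` are hypothetical names, and the fake responses merely imitate the `results` / `meta.next_cursor` shape used in this notebook so that the example runs without network access.

```python
from typing import Callable, Iterator, List


def iter_result_pages(base_url: str,
                      fetch_json: Callable[[str], dict],
                      per_page: int = 50) -> Iterator[List[dict]]:
    """Yield the `results` list of each page, following `meta.next_cursor`."""
    cursor = '*'  # '*' requests the first page
    while cursor:
        page = fetch_json(f'{base_url}&cursor={cursor}&per_page={per_page}')
        if page['results']:
            yield page['results']
        # next_cursor is None once the last page has been reached
        cursor = page['meta']['next_cursor']


# Hypothetical canned responses standing in for the API (assumption, not real data)
fake_pages = {
    '*': {'results': [{'id': 'https://openalex.org/W1'},
                      {'id': 'https://openalex.org/W2'}],
          'meta': {'next_cursor': 'abc'}},
    'abc': {'results': [{'id': 'https://openalex.org/W3'}],
            'meta': {'next_cursor': 'end'}},
    'end': {'results': [], 'meta': {'next_cursor': None}},
}


def fake_fetch(url: str) -> dict:
    # look up the canned page by the cursor value found in the URL
    cursor = url.split('cursor=')[1].split('&')[0]
    return fake_pages[cursor]


base = ('https://api.openalex.org/works?filter=host_venue.issn:0012-3692,'
        'publication_year:2019|2020,type:journal-article')
ids = [work['id'].replace('https://openalex.org/', '')
       for page in iter_result_pages(base, fake_fetch)
       for work in page]
print(ids)
```

Against the real API, `fetch_json` could be `lambda url: requests.get(url).json()`, which is the same call the loop above makes.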

## 3. Calculate the JIF

In [14]:
jif = this_years_citations_count / previous_years_works_count
print(f"The JIF for the year {jif_year} for the given venue is {jif}")

The JIF for the year 2021 for the given venue is 1.1148454921949666
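
One caveat about the numerator, offered as an observation rather than a correction: `meta.count` for a `cites:A|B|...` request counts the citing works that match, not the individual citation links. A work that cites two articles from the same batch is therefore counted once, while a work that cites articles from two different batches is counted twice. The toy example below illustrates the difference. All IDs and the `citing_works` mapping are made up for illustration and do not come from OpenAlex.

```python
# Hypothetical toy data: for each citing work, the previous-years articles it references
citing_works = {
    'C1': {'A1', 'A2'},  # cites two articles that fall into the same batch
    'C2': {'A1', 'A3'},  # cites articles from two different batches
    'C3': {'A4'},
}
batches = [['A1', 'A2'], ['A3', 'A4']]  # stands in for the pages of 50 IDs

# what the batched loop adds up: citing works per batch
batched_total = sum(
    sum(1 for refs in citing_works.values() if refs & set(batch))
    for batch in batches
)

all_articles = {article for batch in batches for article in batch}
# every (citing work, cited article) pair
citation_links = sum(len(refs & all_articles) for refs in citing_works.values())
# each citing work counted once overall
distinct_citing_works = sum(1 for refs in citing_works.values() if refs & all_articles)

print(batched_total, citation_links, distinct_citing_works)  # 4 5 3
```

If every citation link should count, one option is to query `cites` once per article instead of per batch, at the cost of many more requests.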


Happy exploring! 😎