# Setup


In [None]:
# @title Setup
%pip install --upgrade pip
%pip install pandas altair tabulate matplotlib seaborn

In [None]:

import os

current_dir = os.getcwd()
git_repo = None

is_local_clone = False

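while True:
    if os.path.exists(os.path.join(current_dir, ".git")):
        git_repo = current_dir
        break
    parent_dir = os.path.dirname(current_dir)
    # Stop at the filesystem root (on Windows the root is not "/")
    if parent_dir == current_dir:
        break
    current_dir = parent_dir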

if git_repo is not None:
    print(f"Found git repo at {git_repo}")
    os.chdir(git_repo)
    openalex_remote = "github.com/Mearman/openalex-python"
    remotes = os.popen("git remote -v").read()
    print("remotes:")
    print(remotes)
    if openalex_remote in remotes:
        print("Found valid openalex-python repo")
        is_local_clone = True

if is_local_clone:
    print("Installing from local clone")
    # os.chdir(git_repo) above already switched to the repository root
    %pip install -e .
else:
    print("Installing from github")
    %pip install --upgrade --no-cache-dir "git+https://github.com/Mearman/openalex-python.git"

In [None]:
import openalex_api
print(f"OpenAlex API Client version: {openalex_api.__version__}")

In [None]:
import altair as alt
import pandas as pd
import openalex_api

configuration = openalex_api.Configuration(
    host="https://api.openalex.org"
)

authors_api = openalex_api.AuthorsApi(
    openalex_api.ApiClient(configuration))
works_api = openalex_api.WorksApi(
    openalex_api.ApiClient(configuration))
concepts_api = openalex_api.ConceptsApi(
    openalex_api.ApiClient(configuration))
institutions_api = openalex_api.InstitutionsApi(
    openalex_api.ApiClient(configuration))
sources_api = openalex_api.SourcesApi(
    openalex_api.ApiClient(configuration))
publishers_api = openalex_api.PublishersApi(
    openalex_api.ApiClient(configuration))
funders_api = openalex_api.FundersApi(
    openalex_api.ApiClient(configuration))
info_api = openalex_api.InfoApi(
    openalex_api.ApiClient(configuration))

# [OpenAlex Python](https://github.com/Mearman/openalex-python) API Example

Documentation: https://github.com/Mearman/openalex-python


## Introduction

This notebook demonstrates how to use an OpenAlex Python API Client to query the OpenAlex database and generate visualizations.

The OpenAlex database is a database of academic publications and their citations.

The OpenAlex API is a REST API that allows you to query the OpenAlex database. The Documentation for the API can be found here: https://docs.openalex.org/

The OpenAlex Python API Client is a Python package that makes it easier to use the OpenAlex API. The package can be found here: https://github.com/Mearman/openalex-python

The specification from which the OpenAlex Python API Client was generated can be found here: https://github.com/Mearman/openalex-api-spec

An explorer for the REST API can be found here:

[![Open in](https://img.shields.io/badge/Open%20in-Swagger%20UI-85EA2D?style=for-the-badge&logo=Swagger&link=https://mearman.github.io/openalex-swagger-ui-react/)](https://mearman.github.io/openalex-swagger-ui-react/)


### [OpenAlex API documentation](https://docs.openalex.org/)

#### Overview

<!-- https://github.com/ourresearch/openalex-docs/blob/main/.gitbook/assets/OpenAlex-logo-5.png -->
<!-- https://github.com/ourresearch/openalex-docs/blob/main/.gitbook/assets/openalex_logo_text_transparent_20240117.png -->

<img src="https://raw.githubusercontent.com/ourresearch/openalex-docs/main/.gitbook/assets/openalex_logo_text_transparent_20240117.png" style="background-color:white" width="400px">

[**OpenAlex**](https://openalex.org) is a fully open catalog of the global research system. It's named after the [ancient Library of Alexandria](https://en.wikipedia.org/wiki/Library_of_Alexandria) and made by the nonprofit [OurResearch](https://ourresearch.org/).

This is the technical documentation for the **OpenAlex API**. Here, you can learn how to set up your code to access OpenAlex's data. If you want to explore the data as a human, you may be more interested in [**OpenAlex Web**](https://help.openalex.org).

#### Data

The OpenAlex dataset describes scholarly [_entities_](https://docs.openalex.org/api-entities/entities-overview) and how those entities are connected to each other. Types of entities include [works](https://docs.openalex.org/api-entities/works), [authors](https://docs.openalex.org/api-entities/authors), [sources](https://docs.openalex.org/api-entities/sources), [institutions](https://docs.openalex.org/api-entities/institutions), [concepts](https://docs.openalex.org/api-entities/concepts), [publishers](https://docs.openalex.org/api-entities/publishers), and [funders](https://docs.openalex.org/api-entities/funders).

Together, these make a huge web (or more technically, heterogeneous directed [graph](https://en.wikipedia.org/wiki/Graph_theory)) of hundreds of millions of entities and billions of connections between them all.

#### Access

The API is the primary way to get OpenAlex data. It's free and requires no authentication. The limit is 100,000 API requests per user per day. For best performance, [add your email](https://docs.openalex.org/how-to-use-the-api/rate-limits-and-authentication#the-polite-pool) to all API requests, like `mailto=example@domain.com`. [Learn more](https://docs.openalex.org/how-to-use-the-api/api-overview).

There is also a complete database snapshot available to download. [Learn more about the data snapshot here.](https://docs.openalex.org/download-all-data/openalex-snapshot)

The API has a limit of 100,000 calls per day, and the snapshot is updated monthly. If you need a higher limit, or more frequent updates, please look into [**OpenAlex Premium.**](https://openalex.org/pricing)
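As a rough worked example of what the daily limit means in practice, the sketch below converts the 100,000 requests per day quoted above into an average request rate. The constant and function names are illustrative, and spreading requests evenly over the day is only an assumption about how a client might pace itself, not a rule imposed by the API.

```python
# Daily request allowance quoted in the text above.
DAILY_LIMIT = 100_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def min_interval_seconds(daily_limit: int = DAILY_LIMIT) -> float:
    """Average spacing between requests that uses up exactly the daily limit."""
    return SECONDS_PER_DAY / daily_limit


interval = min_interval_seconds()
print(f"{interval:.3f} s between requests")       # 0.864 s between requests
print(f"{1 / interval:.2f} requests per second")  # 1.16 requests per second
```

A long-running harvest that averages more than roughly one request per second would therefore exhaust the allowance before the day is over.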

#### Why OpenAlex?

OpenAlex offers an open replacement for industry-standard scientific knowledge bases like Elsevier's Scopus and Clarivate's Web of Science. [Compared to](https://openalex.org/about#comparison) these paywalled services, OpenAlex offers significant advantages in terms of inclusivity, affordability, and availability.

Many people and organizations have already found great value using OpenAlex. Have a look at the [Testimonials](https://openalex.org/testimonials) to hear what they've said!

#### Contact

For tech support and bug reports, please visit the [help page](https://openalex.org/help). You can also join the [OpenAlex user group](https://groups.google.com/g/openalex-users), and follow on [Twitter (@OpenAlex_org)](https://twitter.com/openalex_org) and [Mastodon](https://mastodon.social/@OpenAlex).

#### Citation

If you use OpenAlex in research, please cite [this paper](https://arxiv.org/abs/2205.01833):

> Priem, J., Piwowar, H., & Orr, R. (2022). _OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts_. ArXiv. https://arxiv.org/abs/2205.01833


### [Entities](https://docs.openalex.org/)

The OpenAlex dataset describes scholarly _entities_ and how those entities are connected to each other. Together, these make a huge web (or more technically, heterogeneous directed [graph](https://en.wikipedia.org/wiki/Graph_theory)) of hundreds of millions of entities and billions of connections between them all.

<figure><img src="https://raw.githubusercontent.com/ourresearch/openalex-docs/main/.gitbook/assets/entities.png" alt="Entity relation diagram for OpenAlex"><figcaption></figcaption></figure>
Learn more about the OpenAlex entities:

- [Works](https://docs.openalex.org/api-entities/works): Scholarly documents like journal articles, books, datasets, and theses
- [Authors](https://docs.openalex.org/api-entities/authors): People who create works
- [Sources](https://docs.openalex.org/api-entities/sources): Where works are hosted (such as journals, conferences, and repositories)
- [Institutions](https://docs.openalex.org/api-entities/institutions): Universities and other organizations to which authors claim affiliations
- [Concepts](https://docs.openalex.org/api-entities/concepts): Topics assigned to works
- [Publishers](https://docs.openalex.org/api-entities/publishers): Companies and organizations that distribute works
- [Funders](https://docs.openalex.org/api-entities/funders): Organizations that fund research
- [Geo](https://docs.openalex.org/api-entities/geo): Where things are in the world
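Each entity is identified by an OpenAlex ID, which appears in responses as a URL such as `https://openalex.org/W1775749144`. The sketch below shows one way to reduce such a URL to its short key and to guess the entity type from the leading letter. The helper names are hypothetical, and the prefix table is an assumption based on the IDs seen in API responses; Geo has no entry because it is not covered by that assumption.

```python
# Assumed mapping from the leading letter of an OpenAlex ID to its entity type.
ENTITY_PREFIXES = {
    "W": "works",
    "A": "authors",
    "S": "sources",
    "I": "institutions",
    "C": "concepts",
    "P": "publishers",
    "F": "funders",
}


def short_id(openalex_id: str) -> str:
    """Return the trailing key of an ID, e.g. 'W2741809807' from its full URL."""
    return openalex_id.rstrip("/").split("/")[-1].upper()


def entity_type(openalex_id: str) -> str:
    """Guess the entity type from the first letter of the short ID."""
    key = short_id(openalex_id)
    try:
        return ENTITY_PREFIXES[key[0]]
    except (KeyError, IndexError):
        raise ValueError(f"Unrecognised OpenAlex ID: {openalex_id!r}") from None


print(short_id("https://openalex.org/W2741809807"))  # W2741809807
print(entity_type("W2741809807"))                     # works
```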


## [OpenAlex API Tutorial Notebooks](https://github.com/ourresearch/openalex-api-tutorials)

- getting-started

  - get-random-entity:
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/getting-started/get-random-entity.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/getting-started/get-random-entity.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/getting-started/get-random-entity.ipynb)
  - paging
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/getting-started/paging.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/getting-started/paging.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/getting-started/paging.ipynb)
  - premium
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/getting-started/premium.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/getting-started/premium.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/getting-started/premium.ipynb)

- authors:

  - hirsch index:
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/authors/hirsch-index.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/authors/hirsch-index.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/authors/hirsch-index.ipynb)

- works

  - openalex works:
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/openalex_works/openalex_works.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/openalex_works/openalex_works.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/openalex_works/openalex_works.ipynb)

- institutions:
  - japan sources:
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/institutions/japan_sources.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/institutions/japan_sources.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/institutions/japan_sources.ipynb)
  - oa percentage
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/institutions/oa-percentage.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/institutions/oa-percentage.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/institutions/oa-percentage.ipynb)
  - uw collaborators:
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/institutions/uw-collaborators.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/institutions/uw-collaborators.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/institutions/uw-collaborators.ipynb)
  - uw collaborators copy:
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/institutions/uw-collaborators%20copy.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/institutions/uw-collaborators%20copy.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/institutions/uw-collaborators%20copy.ipynb)
- data questions
  - counts within country
    - [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ourresearch/openalex-api-tutorials/main?filepath=notebooks/data_questions/counts_within_country.ipynb) [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ourresearch/openalex-api-tutorials/blob/main/notebooks/data_questions/counts_within_country.ipynb) [![Deepnote](https://deepnote.com/buttons/launch-in-deepnote-small.svg)](https://www.deepnote.com/launch?url=https://www.github.com/ourresearch/openalex-api-tutorials/blob/main/notebooks/data_questions/counts_within_country.ipynb)


## [OpenAlex API Python Client](https://github.com/Mearman/openalex-python)

This is a Python client for the OpenAlex API. It was generated from the reverse-engineered [OpenAlex API specification](https://github.com/Mearman/openalex-api-spec).

The API specifications and the derived software packages and documentation are in no way affiliated with OpenAlex or OurResearch and are not official OpenAlex products.
This is an open-source project maintained by [Joseph Mearman](https://github.com/Mearman) and is supplied as-is with no warranty.

For any questions or issues, please [open an issue](https://github.com/Mearman/openalex-api-spec/issues/new).

To contribute, please fork and [open a pull request](https://github.com/Mearman/openalex-api-spec/compare).

If you have found the OpenAlex data useful, don't forget to cite the [OpenAlex paper](https://arxiv.org/abs/2205.01833).


## API Entities


### [Works](https://docs.openalex.org/api-entities/works)

Works are scholarly documents like journal articles, books, datasets, and theses.
OpenAlex indexes over 240M works, with about 50,000 added daily.
You can access works in the OpenAlex API like this:

- Get a list of OpenAlex works: `https://api.openalex.org/works`

That will return a list of [Work](https://docs.openalex.org/api-entities/works/work-object) objects, each describing everything OpenAlex knows about that work. We collect new works from many sources, including Crossref, PubMed, and institutional and discipline-specific repositories (e.g., arXiv).
Many older works come from the now-defunct Microsoft Academic Graph (MAG).
Works are linked to other works via the [`referenced_works`](https://docs.openalex.org/api-entities/works/work-object#referenced_works) (outgoing citations), [`cited_by_api_url`](https://docs.openalex.org/api-entities/works/work-object#cited_by_api_url) (incoming citations), and [`related_works`](https://docs.openalex.org/api-entities/works/work-object#related_works) properties.


#### [Get a single Work](https://docs.openalex.org/api-entities/works/get-a-single-work)


In [None]:
# Define the ID of the work to retrieve
id = "W2741809807"  # @param

# Retrieve the work using the works_api
work = works_api.get_work(id=id)

# Print the attributes of the work object
for key in work.__dict__.keys():
    # if value is not None:
    value = getattr(work, key)
    if value is not None:
        print(f"{key}:\t{value}")

##### Extracting attributes from an entity


In [None]:
# within that work object there is an ids object which contains all of the IDs from various sources
work_ids = work.ids

print("Work IDs:\n")
for key in work_ids.__dict__.keys():
    print(f"{key}:\t{getattr(work_ids, key)}")

##### Searching by external ID


In [None]:
print("Original search result:")
print(f"\tOA\t{work.ids.openalex}")
print(f"\tDOI\t{work.ids.doi}")
print(f"\tTitle\t{work.title}")

# it is also possible to search OpenAlex by "external" IDs, for example DOI
work_searched_by_doi = works_api.get_work(work.ids.doi)
print("Work searched by DOI:")
print(f"\tOA\t{work_searched_by_doi.ids.openalex}")
print(f"\tDOI\t{work_searched_by_doi.ids.doi}")
print(f"\tTitle\t{work_searched_by_doi.title}")

##### Selecting fields of interest

When querying the API, you can select which fields you want to receive in the response. This can be useful if you only need a few fields, or if you want to reduce the size of the response. See [here](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/select-fields) for more information.


In [None]:
# Define the fields to be selected in the query
select = ",".join([
    "id",
    "title",
    "authorships",
    "publication_year",
    "ids"
])  # Join the elements of the list with a comma separator

# Retrieve the work with the selected fields from the works API
work_with_selected_fields = works_api.get_work(
    id=id,
    select=select
)

# Print all keys that are not None
for key in work_with_selected_fields.__dict__.keys():
    value = getattr(work_with_selected_fields, key)
    if value is not None:
        print(f"{key}:\t{value}")

In [None]:
# Create a dataframe of (key, value) rows from the work_with_selected_fields data
df = pd.DataFrame(
    list(work_with_selected_fields.to_dict().items()),
    columns=["key", "value"]
)

# Set the "key" column as the index of the dataframe
df = df.set_index("key")

# Drop rows with missing values
df = df.dropna()

# Display the dataframe
display(df)

#### [Get lists of works](https://docs.openalex.org/api-entities/works/get-lists-of-works)

You can get lists of works:


##### Get all of the works in OpenAlex

Returns a response like this:

```json
{
	"meta": {
		"count": 245684392,
		"db_response_time_ms": 929,
		"page": 1,
		"per_page": 25
	},
	"results": [
		{
			"id": "https://openalex.org/W1775749144",
			"doi": "https://doi.org/10.1016/s0021-9258(19)52451-6",
			"title": "PROTEIN MEASUREMENT WITH THE FOLIN PHENOL REAGENT"
			// more fields (removed to save space)
		},
		{
			"id": "https://openalex.org/W2100837269",
			"doi": "https://doi.org/10.1038/227680a0",
			"title": "Cleavage of Structural Proteins during the Assembly of the Head of Bacteriophage T4"
			// more fields (removed to save space)
		}
		// more results (removed to save space)
	],
	"group_by": []
}
```


In [None]:
import json
get_works_response = works_api.get_works()
print(json.dumps(get_works_response.to_dict(), indent=2))

##### Page and sort works

You can [page through](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/paging) works and change the default number of results returned with the `page` and `per-page` parameters:


###### Get a second page of results with 50 results per page

https://api.openalex.org/works?per-page=50&page=2


In [None]:
works_response = works_api.get_works(
    per_page=50,
    page=2
)
print(json.dumps(works_response.to_dict(), indent=2))
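The `page` and `per-page` parameters shown above can be combined into a loop that walks through a result set. The following is a minimal sketch that runs offline: `iter_pages` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that only mimics the `meta` and `results` layout of a list response rather than calling the API.

```python
import math
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], dict], per_page: int = 25) -> Iterator[List[dict]]:
    """Yield the results of each page until `meta.count` results are covered."""
    page = 1
    while True:
        response = fetch_page(page, per_page)
        yield response["results"]
        total_pages = math.ceil(response["meta"]["count"] / per_page)
        if page >= total_pages:
            break
        page += 1


# Hypothetical stand-in for the API: 120 fake records served page by page.
RECORDS = [{"id": f"W{i}"} for i in range(120)]


def fake_fetch(page: int, per_page: int) -> dict:
    start = (page - 1) * per_page
    return {
        "meta": {"count": len(RECORDS), "page": page, "per_page": per_page},
        "results": RECORDS[start:start + per_page],
    }


pages = list(iter_pages(fake_fetch, per_page=50))
print([len(p) for p in pages])  # [50, 50, 20]
```

Writing the loop as a generator lets the caller stop early, for example once enough results have been collected, without requesting the remaining pages.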

You can [sort results](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/sort-entity-lists) with the `sort` parameter:


###### Sort works by publication year

https://api.openalex.org/works?sort=publication_year


In [None]:
works_response = works_api.get_works(
    sort="publication_year",
)
print(json.dumps(works_response.to_dict(), indent=2))

Continue on to learn how you can [filter](https://docs.openalex.org/api-entities/works/filter-works) and [search](https://docs.openalex.org/api-entities/works/search-works) lists of works.
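A filter is passed as a single string of comma-separated `attribute:value` pairs, such as `open_access.is_oa:true,publication_year:2021`. As an illustration, the sketch below builds that string from a dictionary; `build_filter` is a hypothetical helper, not part of the client, and it does no validation of attribute names.

```python
from typing import Mapping


def build_filter(conditions: Mapping[str, object]) -> str:
    """Join {attribute: value} pairs into 'attribute:value,attribute:value'."""
    parts = []
    for key, value in conditions.items():
        # Booleans are written in lower case, as in `open_access.is_oa:true`.
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{key}:{value}")
    return ",".join(parts)


print(build_filter({"open_access.is_oa": True, "publication_year": 2021}))
# open_access.is_oa:true,publication_year:2021
```

The resulting string can then be passed as the `filter` argument of a list call such as `works_api.get_works`.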


##### Sample works

You can use `sample` to get a random batch of works. Read more about sampling and how to add a `seed` value [here](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/sample-entity-lists).


###### Get 20 random works

https://api.openalex.org/works?sample=20


In [None]:
works_response = works_api.get_works(
    sample=20
)
print(json.dumps(works_response.to_dict(), indent=2))

##### Select fields

You can use `select` to limit the fields that are returned in a list of works. More details are [here](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/select-fields).
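The `select` value is a comma-separated list of field names. The sketch below, using a hypothetical `build_select` helper, normalises a list of names into that form by trimming stray whitespace and dropping duplicates; it does not check that the fields exist.

```python
from typing import Iterable


def build_select(fields: Iterable[str]) -> str:
    """Return a comma-separated field list with blanks and duplicates removed."""
    seen = []
    for field in fields:
        name = field.strip()
        if name and name not in seen:
            seen.append(name)
    return ",".join(seen)


print(build_select(["id", " display_name", "id"]))  # id,display_name
```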


###### Display only the `id` and `display_name` within works results

https://api.openalex.org/works?select=id,display_name


In [None]:
works_response = works_api.get_works(
    select="id,display_name"
)
print(json.dumps(works_response.to_dict(), indent=2))

##### A function to get all pages of a query

In [None]:
import math
%pip install tqdm
from tqdm import tqdm


def get_all_results(api_function, **kwargs):
    # Paging is controlled here, so ignore any caller-supplied page
    kwargs.pop("page", None)

    # Fetch the first page to learn how many results there are in total
    api_response = api_function(page=1, **kwargs)

    # Get the meta information of the 'api_response' object
    meta = api_response.meta
    # Get the total number of results
    count = meta.count
    # Get the number of results per page
    per_page = meta.per_page

    total_pages = math.ceil(count / per_page)

    # Start from the results already returned on the first page
    results = list(api_response.results)

    # Create a progress bar and record the first page
    progress_bar = tqdm(total=total_pages, desc="Progress", unit="page")
    progress_bar.update(1)

    # Iterate over the remaining pages
    for page in range(2, total_pages + 1):
        # Retrieve the results for the current page
        api_response = api_function(
            page=page,
            **kwargs
        )
        # Append the results to the list
        results.extend(api_response.results)

        # Update the progress bar
        progress_bar.set_description(f"Results: {len(results)}/{count} Page: {page}/{total_pages}")
        progress_bar.update(1)

    # Close the progress bar
    progress_bar.close()

    return results

In [None]:
# Retrieve a list of works using the works_api.get_works() method.
get_works_result = get_all_results(
    works_api.get_works,  # the API function to call
    search="OpenAlex"  # the search query
    # additional parameters can be passed to the API function e.g. select, filter, etc.
)

### [Authors](https://docs.openalex.org/api-entities/authors)


In [None]:
authors = authors_api.get_authors(
    per_page=100,
    page=1,
    sort="cited_by_count:desc"
).results
display(pd.DataFrame(authors))

### Author Counts by Year


In [None]:
author_counts_by_year = pd.DataFrame(
    [{
        "display_name": author["display_name"],
        "id": author["ids"]["openalex"].split("/")[-1],
        "country_code": author["last_known_institution"]["country_code"] if author["last_known_institution"] else None,
        "year": entry["year"],
        "cited_by_count": entry["cited_by_count"],
        "works_count": entry["works_count"],
        "name_and_id": f"{author['display_name']} ({author['ids']['openalex'].split('/')[-1]})",
    } for author in authors for entry in author["counts_by_year"]],
)
display(author_counts_by_year)

#### Filter out current year


In [None]:
# filter out the current (incomplete) year
current_year = pd.Timestamp.now().year
author_counts_by_year = author_counts_by_year[author_counts_by_year["year"] < current_year]

#### Plot Citations vs Year


In [None]:
# plot with altair, marking each year on the x axis
plot_citations_vs_year = alt.Chart(
    author_counts_by_year[
        # filter out zero values
        (author_counts_by_year["cited_by_count"] > 0) & (
            author_counts_by_year["works_count"] > 0)
    ]
).mark_line().encode(
    alt.X(
        "year:O",
        axis=alt.Axis(
            labelAngle=0,
            title="Year",
            titleFontSize=14,
            titleFontWeight="bold",
            titleColor="gray"
        )
    ),
    alt.Y(
        "cited_by_count:Q",
        scale=alt.Scale(type='log'),
        axis=alt.Axis(
            title="Citations",
            titleFontSize=14,
            titleFontWeight="bold",
            titleColor="gray",
        )
    ),
    alt.Color(
        "name_and_id:N"
    )
).properties(
    title="Citations vs Year"
)
display(plot_citations_vs_year)

#### Scatter Plot of Citations vs Works Count


In [None]:
scatter_citations_vs_year = alt.Chart(
    author_counts_by_year[
        (
            author_counts_by_year['cited_by_count'] > 0
        ) & (
            author_counts_by_year['works_count'] > 0
        )]
).mark_circle(size=60).encode(
    x=alt.X(
        'cited_by_count:Q',
        scale=alt.Scale(type='log'),
        title='Cited by Count (Log Scale)'
    ),
    y=alt.Y(
        'works_count:Q',
        scale=alt.Scale(type='log'),
        title='Works Count (Log Scale)'
    ),
    # color='display_name:N',
    # color='country_code:N',
    color=alt.Color(
        'country_code:N',
        sort=alt.SortField(
            'cited_by_count',
            order='descending'
        ),
        legend=alt.Legend(
            title='Country',
            titleFontSize=14,
            titleFontWeight='bold',
            titleColor='gray',
            labelFontSize=14,
            labelFontWeight='bold',
            labelColor='gray',
        )
    ),
    # color='year:N',
    tooltip=[
        'display_name',
        'year',
        'cited_by_count',
        'works_count',
        "id",
        "country_code"
    ]
).properties(
    title='Relationship between Cited by Count and Works Count'
)
scatter_citations_vs_year

## Concepts


### Search for Concepts


In [None]:
concepts = pd.DataFrame(
    concepts_api.get_concepts(
        search="Machine Learning",
        sort="relevance_score:desc",
    ).results
)
display(concepts)

#### Get top concept from search


In [None]:
machine_learning_concept = concepts.sort_values(
    "relevance_score", ascending=False).iloc[0]
display(machine_learning_concept)

#### Extract raw concept ID


In [None]:
machine_learning_concept_id = machine_learning_concept["ids"]["openalex"].split(
    "/")[-1]
display(machine_learning_concept_id)

## Works


### Search for Works by Concept


In [None]:
filters = ",".join(
    [f"{key}:{value}" for key, value in {
        "concepts.id": machine_learning_concept_id,
        "publication_year": ">1970"
    }.items()]
)

works = works_api.get_works(
    sort="cited_by_count:desc",
    filter=filters,
    per_page=100,
).results

display(
    pd.DataFrame(
        works
    )
)

### Get counts by year for each work


In [None]:
works_with_counts_by_year = pd.DataFrame(
    [{
        "title": work["title"],
        "id": work["id"],
        "publication_year": work["publication_year"],
        "publication_date": work["publication_date"],
        "referenced_works_count": work["referenced_works_count"],
        "cited_by_count_year": entry["year"],
        "cited_by_count": entry["cited_by_count"],
        "ratio": entry["cited_by_count"] / work["referenced_works_count"] if work["referenced_works_count"] else 0,
    } for work in works for entry in work["counts_by_year"]],
)
display(works_with_counts_by_year)

### Get most recent count for each work


In [None]:
works_with_most_recent_counts_by_year = works_with_counts_by_year.sort_values(
    "cited_by_count_year",
    ascending=False
).groupby(
    "id"
).first().reset_index()

display(works_with_most_recent_counts_by_year)

#### Plot Citations vs Publication Date


In [None]:
alt.Chart(
    works_with_most_recent_counts_by_year[
        (works_with_most_recent_counts_by_year['cited_by_count'] > 0) & (
            works_with_most_recent_counts_by_year['referenced_works_count'] > 0)
    ]
).mark_circle(size=60).encode(
    x=alt.X(
        "publication_date:T",
        title="Publication Date"
    ),
    y=alt.Y(
        'cited_by_count:Q',
        scale=alt.Scale(
            type='log',
        ),
        title='Citations (Log Scale)'
    ),
    color=alt.Color(
        'ratio:Q',
        title='Citations / References',
        scale=alt.Scale(
            type='log',
            scheme='yellowgreenblue',
        ),
    ),
    size=alt.Size(
        "referenced_works_count:Q",
        scale=alt.Scale(
            type='log',
        ),
        title='References (Log Scale)',
    ),
    # tooltip=[
    # 	"title:N",
    # 	# "publication_year:N",
    # 	"publication_date:T",
    # 	"referenced_works_count:Q",
    # 	"cited_by_count:Q",
    # 	"ratio:Q",
    # ],
    tooltip=alt.Tooltip(
        shorthand="title:N",
    )
).properties(
    title='Citability of Works over Time'
)

## Search for Works


In [None]:
search = "Machine Learning"
machine_learning_works_search_result = works_api.get_works(
    search=search,
    sort="relevance_score:desc",
    per_page=100,
).results
display(
    pd.DataFrame(
        machine_learning_works_search_result
    )
)

### Aggregate works over multiple pages


In [None]:
target = 1000
page = 1
machine_learning_works_search_result = []

# "cited_by_percentile_year.min:90"
# referenced_works_count>0

filters = ",".join(
    [f"{key}:{value}" for key, value in {
        "referenced_works_count": ">0",
        "cited_by_count": ">0"
    }.items()]
)
while len(machine_learning_works_search_result) < target:
    page_results = works_api.get_works(
        search=search,
        sort="relevance_score:desc",
        page=page,
        per_page=100,  # 25 is the default
        filter=filters
    ).results
    # Stop once the API returns no more results, otherwise the loop never ends
    if not page_results:
        break
    machine_learning_works_search_result += page_results

    print(
        f"Page {page}: {len(page_results)} results "
        f"({len(machine_learning_works_search_result)} total)"
    )
    page += 1

display(
    pd.DataFrame(
        machine_learning_works_search_result
    )
)

# Converting an abstract inverted index

```json
{
	"abstract_inverted_index": {
		"This": [0, 66],
		"paper": [1],
		"examines": [2]
	}
}
```
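The structure above maps each word to the list of positions at which it occurs in the abstract. To make the shape concrete, the sketch below builds an index of the same form from plain text. `build_inverted_index` is a hypothetical helper, and splitting on whitespace is an assumption; it is not necessarily how OpenAlex tokenises abstracts.

```python
def build_inverted_index(text: str) -> dict:
    """Map each whitespace-separated word to the positions where it occurs."""
    index = {}
    for position, word in enumerate(text.split()):
        index.setdefault(word, []).append(position)
    return index


print(build_inverted_index("to be or not to be"))
# {'to': [0, 4], 'be': [1, 5], 'or': [2], 'not': [3]}
```

Reversing the process means placing every word back at each of its positions, which is only well defined if the words are ordered by position rather than by the order of the dictionary keys.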


In [None]:
# {'The': [0, 74], 'automated': [1], 'categoriza...
# Convert an abstract inverted index back into the abstract.
# The index is a dictionary whose keys are the words in the abstract and whose
# values are lists of the positions at which each word occurs.
def convert_inverted_index_to_abstract(inverted_index):
    if not inverted_index:
        print("inverted index is empty")
        return ""
    # Collect every (position, word) pair, then order the words by position
    positioned_words = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(positioned_words))


random_work = works_api.get_work(
    id="random"
)

print(random_work.display_name)
print(random_work.id)
print(
    convert_inverted_index_to_abstract(
        random_work.abstract_inverted_index
    )
)

# How to use the API


### [API Overview](https://docs.openalex.org/how-to-use-the-api/api-overview)

The API is the primary way to get OpenAlex data. It's free and requires no authentication.
The limit is 100,000 API requests per user per day.
For best performance, add your email to all API requests, like `mailto=example@domain.com`.

#### Using the `mailto` parameter to access the `polite pool`

```python
info_api.get_root(
    mailto="example@domain.com"
)
```

Find out more [here](https://docs.openalex.org/how-to-use-the-api/rate-limits-and-authentication#the-polite-pool).
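When calling the REST API directly rather than through the client, the same parameter goes in the query string. The sketch below uses only the standard library to add (or replace) `mailto` on an existing URL; `with_mailto` is a hypothetical helper name and the email address is a placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_mailto(url: str, email: str) -> str:
    """Return `url` with a `mailto` query parameter added or replaced."""
    parts = urlsplit(url)
    # Drop any existing mailto so the parameter appears exactly once.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "mailto"]
    query.append(("mailto", email))
    # Keep '@', ',' and ':' readable instead of percent-encoding them.
    new_query = urlencode(query, safe="@,:")
    return urlunsplit(parts._replace(query=new_query))


print(with_mailto("https://api.openalex.org/works?sample=20", "example@domain.com"))
# https://api.openalex.org/works?sample=20&mailto=example@domain.com
```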


### [Get single entities](https://docs.openalex.org/how-to-use-the-api/get-single-entities)

#### [Random result](https://docs.openalex.org/how-to-use-the-api/get-single-entities/random-result)

#### [Select fields](https://docs.openalex.org/how-to-use-the-api/get-single-entities/select-fields)


### [Get lists of entities](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities)


#### [Paging](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/paging)


#### [Filtering](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/filter-entity-lists)


#### [Searching](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/search-entities)


#### [Sorting](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/sort-entity-lists)


#### [Selecting](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/select-fields)


#### [Sampling](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/sample-entity-lists)

You can use `sample` to get a random list of up to 10,000 results.


##### Get 100 random works

https://api.openalex.org/works?sample=100&per-page=100


In [None]:
works = works_api.get_works(
    sample=100,
    per_page=100
)
display(works.meta)
display(pd.DataFrame(works.results))

##### Get 50 random works that are open access and published in 2021

https://api.openalex.org/works?filter=open_access.is_oa:true,publication_year:2021&sample=50&per-page=50


In [None]:
works = works_api.get_works(
    filter="open_access.is_oa:true,publication_year:2021",
    sample=50,
    per_page=50
)

display(works.meta)
display(pd.DataFrame(works.results))

##### Get 20 random works with a seed value

https://api.openalex.org/works?sample=20&seed=1234

You can add a seed value in order to retrieve the same set of random records, in the same order, multiple times.


In [None]:
works = works_api.get_works(
    sample=20,
    seed=1234,
)
display(works.meta)
display(pd.DataFrame(works.results))

##### [Limitations](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/sample-entity-lists#limitations)

- The sample size is limited to 10,000 results.
- You must provide a seed value when paging beyond the first page of results. Without a seed value, you might get duplicate records in your results.
- You must use [basic paging](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/paging#basic-paging) when sampling. Cursor pagination is not supported.
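The first two limitations can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; `sample_params` and `MAX_SAMPLE_SIZE` are hypothetical names, and the function only builds a parameter dictionary rather than calling the API.

```python
from typing import Optional

MAX_SAMPLE_SIZE = 10_000  # limit stated in the list above


def sample_params(sample: int, page: int = 1, seed: Optional[int] = None) -> dict:
    """Build query parameters for a sampled request, enforcing the listed limits."""
    if not 1 <= sample <= MAX_SAMPLE_SIZE:
        raise ValueError(f"sample must be between 1 and {MAX_SAMPLE_SIZE}")
    if page > 1 and seed is None:
        raise ValueError("a seed is required when paging beyond the first page")
    params = {"sample": sample, "page": page}
    if seed is not None:
        params["seed"] = seed
    return params


print(sample_params(20, seed=123))  # {'sample': 20, 'page': 1, 'seed': 123}
```

Failing early like this avoids silently collecting duplicate records when a later page is requested without a seed.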


#### [Autocomplete](https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/autocomplete-entities)

### [Get groups of entities](https://docs.openalex.org/how-to-use-the-api/get-groups-of-entities)

### [Rate limits and authentication](https://docs.openalex.org/how-to-use-the-api/rate-limits-and-authentication)

### [Tutorials](https://docs.openalex.org/additional-help/tutorials)
