[![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-181717?style=for-the-badge&logo=GitHub&link=https://github.com/Mearman/openalex-docs)](https://github.com/Mearman/openalex-docs)[![Open in GitHub](https://img.shields.io/badge/Open%20in-GitHub-181717?style=for-the-badge&logo=github&link=https://github.com/Mearman/openalex-docs/blob/main/how-to-use-the-api/get-lists-of-entities/sample-entity-lists.ipynb)](https://github.com/Mearman/openalex-docs/blob/main/how-to-use-the-api/get-lists-of-entities/sample-entity-lists.ipynb)[![Open in Colab](https://img.shields.io/badge/Open%20in-Colab-F9AB00?style=for-the-badge&logo=Google%20Colab&link=https://colab.research.google.com/github/Mearman/openalex-docs/blob/main/how-to-use-the-api/get-lists-of-entities/sample-entity-lists.ipynb)](https://colab.research.google.com/github/Mearman/openalex-docs/blob/main/how-to-use-the-api/get-lists-of-entities/sample-entity-lists.ipynb)

In [None]:
%pip install --upgrade "git+https://github.com/Mearman/openalex-python-pydantic-v1.git"
%pip install pandasai

In [None]:
import json
import pandas as pd
import numpy as np
from openalex_api import Configuration, ApiClient, AutocompleteApi, AuthorsApi, ConceptsApi, FundersApi, InstitutionsApi, PublishersApi, SourcesApi, WorksApi

configuration = Configuration(host="https://api.openalex.org")
autocomplete_api = AutocompleteApi(ApiClient(configuration))
authors_api = AuthorsApi(ApiClient(configuration))
concepts_api = ConceptsApi(ApiClient(configuration))
funders_api = FundersApi(ApiClient(configuration))
institutions_api = InstitutionsApi(ApiClient(configuration))
publishers_api = PublishersApi(ApiClient(configuration))
sources_api = SourcesApi(ApiClient(configuration))
works_api = WorksApi(ApiClient(configuration))

from pandasai import SmartDataframe
from pandasai.llm import OpenAI

In [None]:
# @title  { run: "auto", display-mode: "form" }
# Optional OpenAI API token, used only by the pandasai plotting cells below.
openapi_token = "" # @param {type:"string"}

# Sample entity lists

You can use `sample` to get a random list of up to 10,000 results.

* Get 100 random works\
  [https://api.openalex.org/works?sample=100\&per-page=100](https://api.openalex.org/works?sample=100\&per-page=100)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/works?sample=100&per-page=100
sample=100 # @param 100 {type: "integer"},
per_page=100 # @param 100 {type: "integer"}

response = works_api.get_works(
	sample=sample,
	per_page=per_page
)

df = pd.DataFrame(response.results)
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

try:
	llm = OpenAI(api_token = openapi_token)
	sdf = SmartDataframe(numeric_df, config = { "llm": llm })
	sdf.chat("Plot a chart of this data")
except Exception as error:
	# Catch Exception rather than using a bare except, so Ctrl+C still interrupts the cell.
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print(f"Error when creating SmartDataframe: {error}")

* Get 50 random works that are open access and published in 2021\
  [https://api.openalex.org/works?filter=open\_access.is\_oa:true,publication\_year:2021\&sample=50\&per-page=50](https://api.openalex.org/works?filter=open\_access.is\_oa:true,publication\_year:2021\&sample=50\&per-page=50)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/works?filter=open_access.is_oa:true,publication_year:2021&sample=50&per-page=50
filter="open_access.is_oa:true,publication_year:2021" # @param "open_access.is_oa:true,publication_year:2021" {type: "string"},
sample=50 # @param 50 {type: "integer"},
per_page=50 # @param 50 {type: "integer"}

response = works_api.get_works(
	filter=filter,
	sample=sample,
	per_page=per_page
)

df = pd.DataFrame(response.results)
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

try:
	llm = OpenAI(api_token = openapi_token)
	sdf = SmartDataframe(numeric_df, config = { "llm": llm })
	sdf.chat("Plot a chart of this data")
except Exception as error:
	# Catch Exception rather than using a bare except, so Ctrl+C still interrupts the cell.
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print(f"Error when creating SmartDataframe: {error}")

You can add a `seed` value in order to retrieve the same set of random records, in the same order, multiple times.

* Get 20 random sources with a seed value\
  [https://api.openalex.org/sources?sample=20\&seed=123](https://api.openalex.org/sources?sample=20\&seed=123)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/sources?sample=20&seed=123
sample=20 # @param 20 {type: "integer"},
seed=123 # @param 123 {type: "integer"}

response = sources_api.get_sources(
	sample=sample,
	seed=seed
)

df = pd.DataFrame(response.results)
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

try:
	llm = OpenAI(api_token = openapi_token)
	sdf = SmartDataframe(numeric_df, config = { "llm": llm })
	sdf.chat("Plot a chart of this data")
except Exception as error:
	# Catch Exception rather than using a bare except, so Ctrl+C still interrupts the cell.
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print(f"Error when creating SmartDataframe: {error}")

{% hint style="info" %}
Depending on your query, random results with a seed value _may_ change over time due to new records coming into OpenAlex.
{% endhint %}
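
The request URLs shown above can also be assembled without the client library. The following is a minimal sketch using only Python's standard library. `build_sample_url` is a hypothetical helper written for this page, not part of `openalex_api`; the query parameter names (`sample`, `seed`, `filter`, `per-page`) are taken from the example URLs above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"
MAX_SAMPLE = 10_000  # sample size limit stated on this page


def build_sample_url(entity, sample, seed=None, filter_expr=None, per_page=None):
    """Build a sampling URL for an entity type such as "works" or "sources".

    Hypothetical helper for illustration only; it builds the URL and does
    not send a request.
    """
    if not 1 <= sample <= MAX_SAMPLE:
        raise ValueError(f"sample must be between 1 and {MAX_SAMPLE}")

    params = {}
    if filter_expr is not None:
        params["filter"] = filter_expr
    params["sample"] = sample
    if seed is not None:
        params["seed"] = seed
    if per_page is not None:
        params["per-page"] = per_page

    # Keep ":" and "," unescaped so the filter reads like the documented URLs.
    return f"{BASE_URL}/{entity}?{urlencode(params, safe=':,')}"


seeded_sources_url = build_sample_url("sources", 20, seed=123)
open_access_works_url = build_sample_url(
    "works",
    50,
    filter_expr="open_access.is_oa:true,publication_year:2021",
    per_page=50,
)
print(seeded_sources_url)
print(open_access_works_url)
```

Because the seed is part of the URL, requesting the same seeded URL again should return the same records in the same order, subject to the caveat in the hint above.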

## Limitations

* The sample size is limited to 10,000 results.
* You must provide a `seed` value when paging beyond the first page of results. Without a seed value, you might get duplicate records in your results.
* You must use [basic paging](./paging.md#basic-paging) when sampling. Cursor pagination is not supported.
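
Taken together, these limitations suggest a fixed paging plan for any sample larger than one page: choose a seed once, then request each page with the same `sample` and `seed`. The sketch below only computes the query parameters for each page. `sample_page_params` is a hypothetical helper, and the `page` parameter name is an assumption based on basic paging rather than something shown on this page.

```python
import math

MAX_SAMPLE = 10_000  # sample size limit from the list above


def sample_page_params(sample, seed, per_page=100):
    """Return one query-parameter dict per page of a seeded sample.

    Hypothetical helper for illustration only. The same seed is reused on
    every page so that pages do not overlap.
    """
    if not 1 <= sample <= MAX_SAMPLE:
        raise ValueError(f"sample must be between 1 and {MAX_SAMPLE}")
    if per_page < 1:
        raise ValueError("per_page must be at least 1")

    page_count = math.ceil(sample / per_page)
    return [
        {
            "sample": sample,
            "seed": seed,
            "per-page": per_page,
            "page": page,  # assumed basic-paging parameter name
        }
        for page in range(1, page_count + 1)
    ]


plan = sample_page_params(sample=250, seed=42, per_page=100)
for params in plan:
    print(params)
```

Each dictionary can be used as the query string of a request to an entity endpoint with an HTTP client of your choice.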