[![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-181717?style=for-the-badge&logo=GitHub&link=https://github.com/Mearman/openalex-docs)](https://github.com/Mearman/openalex-docs)[![Open in GitHub](https://img.shields.io/badge/Open%20in-GitHub-181717?style=for-the-badge&logo=github&link=https://github.com/Mearman/openalex-docs/blob/main/api-entities/sources/filter-sources.ipynb)](https://github.com/Mearman/openalex-docs/blob/main/api-entities/sources/filter-sources.ipynb)[![Open in Colab](https://img.shields.io/badge/Open%20in-Colab-F9AB00?style=for-the-badge&logo=Google%20Colab&link=https://colab.research.google.com/github/Mearman/openalex-docs/blob/main/api-entities/sources/filter-sources.ipynb)](https://colab.research.google.com/github/Mearman/openalex-docs/blob/main/api-entities/sources/filter-sources.ipynb)

In [None]:
%pip install --upgrade "git+https://github.com/Mearman/openalex-python-pydantic-v1.git"
%pip install pandasai

In [None]:
import json
import pandas as pd
import numpy as np
from openalex_api import Configuration, ApiClient, AutocompleteApi, AuthorsApi, ConceptsApi, FundersApi, InstitutionsApi, PublishersApi, SourcesApi, WorksApi

configuration = Configuration(host="https://api.openalex.org")
autocomplete_api = AutocompleteApi(ApiClient(configuration))
authors_api = AuthorsApi(ApiClient(configuration))
concepts_api = ConceptsApi(ApiClient(configuration))
funders_api = FundersApi(ApiClient(configuration))
institutions_api = InstitutionsApi(ApiClient(configuration))
publishers_api = PublishersApi(ApiClient(configuration))
sources_api = SourcesApi(ApiClient(configuration))
works_api = WorksApi(ApiClient(configuration))

from pandasai import SmartDataframe
from pandasai.llm import OpenAI

In [None]:
# @title  { run: "auto", display-mode: "form" }
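# OpenAI API key used by pandasai to draw the optional charts; if left blank, those cells print an error instead.
openapi_token = "" # @param {type:"string"}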

# Filter sources

You can filter sources with the `filter` parameter:

* Get sources that have an ISSN:
  [`https://api.openalex.org/sources?filter=has_issn:true`](https://api.openalex.org/sources?filter=has_issn:true)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/sources?filter=has_issn:true
filter="has_issn:true" # @param "has_issn:true" {type: "string"}

response = sources_api.get_sources(
	filter=filter
)

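# Convert each pydantic model to a dict so pandas creates one column per field
df = pd.DataFrame([result.dict() for result in response.results])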
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

try:
	llm = OpenAI(api_token = openapi_token)
	sdf = SmartDataframe(numeric_df, config = { "llm": llm })
	sdf.chat("Plot a chart of this data")
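except Exception as error:
	# Catch ordinary errors only (not KeyboardInterrupt) and report the likely cause
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print(f"Error when creating or querying SmartDataframe: {error}")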

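The cell above keeps numeric columns by comparing each dtype against the strings `'int64'` and `'float64'`, so it skips other numeric dtypes such as `int32` or pandas' nullable `Int64`. A minimal alternative sketch uses `DataFrame.select_dtypes(include="number")`; the helper name `numeric_summary` is hypothetical, and the toy frame only stands in for real API results.

```python
import pandas as pd

def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return id and display_name plus every numeric column except relevance_score."""
    # include="number" matches int64/float64 as well as int32 and nullable Int64;
    # boolean columns such as is_oa are not treated as numbers.
    numeric_cols = [
        col
        for col in df.select_dtypes(include="number").columns
        if col not in ("id", "display_name", "relevance_score")
    ]
    return df[["id", "display_name"] + numeric_cols]

# Toy data for illustration only; these are not real OpenAlex records.
toy = pd.DataFrame({
    "id": ["S1", "S2"],
    "display_name": ["Source A", "Source B"],
    "works_count": pd.array([10, 20], dtype="Int64"),
    "cited_by_count": [5, 7],
    "relevance_score": [0.5, 0.4],
    "is_oa": [True, False],
})
print(numeric_summary(toy).columns.tolist())
# ['id', 'display_name', 'works_count', 'cited_by_count']
```

In the notebook you would call `numeric_summary(df)` in place of the list comprehension.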
{% hint style="info" %}
It's best to [read about filters](./../../how-to-use-the-api/get-lists-of-entities/filter-entity-lists.ipynb) before trying these out. That page shows how to combine filters to build AND, OR, and negation queries.
{% endhint %}
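As a quick illustration of that syntax, here is a minimal sketch that assembles a filter string, assuming the conventions described on the filters page: commas between clauses mean AND, `|` between values of one attribute means OR, and a leading `!` on a value negates it. The helper names `build_filter` and `sources_url` are hypothetical, and the URL is only built, not requested.

```python
from typing import Dict, List, Union
from urllib.parse import urlencode

SOURCES_URL = "https://api.openalex.org/sources"

def build_filter(clauses: Dict[str, Union[str, List[str]]]) -> str:
    """Join attribute:value clauses into one filter string."""
    parts = []
    for attribute, value in clauses.items():
        if isinstance(value, list):
            value = "|".join(value)  # OR: any of these values for one attribute
        parts.append(f"{attribute}:{value}")
    return ",".join(parts)  # AND: every clause must match

def sources_url(filter_string: str) -> str:
    # Leave the filter punctuation unescaped so the URL stays readable.
    return f"{SOURCES_URL}?{urlencode({'filter': filter_string}, safe=':,|!')}"

flt = build_filter({
    "type": ["journal", "repository"],  # journal OR repository
    "country_code": "!us",              # NOT based in the US
    "is_in_doaj": "true",               # AND indexed in DOAJ
})
print(flt)
print(sources_url(flt))
```

The resulting string can be passed straight to `sources_api.get_sources(filter=flt)`.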

### `/sources` attribute filters

You can filter using these attributes of the `Source` entity object (click each one to view its documentation on the [`Source`](./source-object.ipynb) object page):

* [`apc_prices.currency`](./source-object.md#apc_prices)
* [`apc_prices.price`](./source-object.md#apc_prices)
* [`apc_usd`](./source-object.md#apc_usd)
* [`cited_by_count`](./source-object.md#cited_by_count)
* [`country_code`](./source-object.md#country_code)
* [`host_organization`](./source-object.md#host_organization) (alias: `host_organization.id`)
* [`host_organization_lineage`](./source-object.md#host_organization_lineage): use this with a publisher ID to find sources from that publisher and all of its children.
* [`ids.openalex`](./source-object.md#ids) (alias: `openalex`)
* [`is_in_doaj`](./source-object.md#is_in_doaj)
* [`is_oa`](./source-object.md#is_oa)
* [`issn`](./source-object.md#issn)
* [`publisher`](./source-object.md#publisher): requires an exact match. Use the [`host_organization_lineage`](./source-object.md#host_organization_lineage) filter instead if you want to find sources from a publisher and all of its children.
* [`summary_stats.2yr_mean_citedness`](./source-object.md#summary_stats) (accepts a float, `null`, or `!null`; supports range queries with `<` and `>`)
* [`summary_stats.h_index`](./source-object.md#summary_stats) (accepts an integer, `null`, or `!null`; supports range queries)
* [`summary_stats.i10_index`](./source-object.md#summary_stats) (accepts an integer, `null`, or `!null`; supports range queries)
* [`type`](./source-object.md#type)
* [`works_count`](./source-object.md#works_count)
* [`x_concepts.id`](./source-object.md#x_concepts) (alias: `concepts.id` or `concept.id`)
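For example, the range and null syntax noted for the `summary_stats` fields can be combined into a single filter. This sketch only builds the URL; the thresholds are arbitrary, and the commented-out call assumes the `sources_api` client created at the top of this notebook.

```python
from urllib.parse import urlencode

# Each clause uses a syntax listed above for the summary_stats filters.
clauses = [
    "summary_stats.h_index:>100",              # range: h-index greater than 100
    "summary_stats.i10_index:<5000",           # range: i10-index less than 5000
    "summary_stats.2yr_mean_citedness:!null",  # only sources that have this value
]
filter_string = ",".join(clauses)  # commas combine clauses with AND
url = "https://api.openalex.org/sources?" + urlencode({"filter": filter_string}, safe=":,!<>")
print(url)

# With the notebook's client (not run here):
# response = sources_api.get_sources(filter=filter_string)
```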

{% hint style="info" %}
Want to filter by `host_organization.display_name`? This is a two-step process:

1. Find the host organization's ID by searching by `display_name` in Publishers or Institutions, depending on which type you are looking for.
2. Filter sources by `host_organization.id`.

To learn more about why we do it this way, see [why you can't search by the name of a related entity](./../works/search-works.md#why-cant-i-search-by-name-of-related-entity-author-name-institution-name-etc.).
{% endhint %}
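The two steps can be sketched as URLs. The helper names and the placeholder ID `P0000000000` are hypothetical; in practice you would copy the `id` of the matching result from step 1. The `include_children` option switches to the `host_organization_lineage` filter listed above, which also covers a publisher's child organizations.

```python
from urllib.parse import urlencode

API = "https://api.openalex.org"

def publisher_search_url(name: str) -> str:
    """Step 1: search Publishers by name to find the host organization's ID."""
    return f"{API}/publishers?{urlencode({'search': name})}"

def sources_by_host_url(openalex_id: str, include_children: bool = False) -> str:
    """Step 2: filter sources by that ID."""
    short_id = openalex_id.rsplit("/", 1)[-1]  # "https://openalex.org/P123" -> "P123"
    attribute = "host_organization_lineage" if include_children else "host_organization.id"
    filter_string = f"{attribute}:{short_id}"
    return f"{API}/sources?{urlencode({'filter': filter_string}, safe=':')}"

print(publisher_search_url("example press"))
# Placeholder ID for illustration only; substitute the ID returned by step 1.
print(sources_by_host_url("https://openalex.org/P0000000000"))
print(sources_by_host_url("https://openalex.org/P0000000000", include_children=True))
```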

### `/sources` convenience filters

These filters aren't attributes of the [`Source`](./source-object.ipynb) object, but they're included to address some common use cases:

#### `continent`

Value: a String with a valid [continent filter](./../geo/continents.md#filter-by-continent)

Returns: sources that are associated with the chosen continent.

* Get sources that are associated with Asia:
  [`https://api.openalex.org/sources?filter=continent:asia`](https://api.openalex.org/sources?filter=continent:asia)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/sources?filter=continent:asia
filter="continent:asia" # @param "continent:asia" {type: "string"}

response = sources_api.get_sources(
	filter=filter
)

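# Convert each pydantic model to a dict so pandas creates one column per field
df = pd.DataFrame([result.dict() for result in response.results])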
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

try:
	llm = OpenAI(api_token = openapi_token)
	sdf = SmartDataframe(numeric_df, config = { "llm": llm })
	sdf.chat("Plot a chart of this data")
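except Exception as error:
	# Catch ordinary errors only (not KeyboardInterrupt) and report the likely cause
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print(f"Error when creating or querying SmartDataframe: {error}")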

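To see which countries a continent filter actually covered, you can tally the `country_code` column of the returned DataFrame. This is a minimal sketch with a hypothetical helper, `country_breakdown`; the toy rows are placeholders rather than real results.

```python
import pandas as pd

def country_breakdown(df: pd.DataFrame) -> pd.Series:
    """Count sources per country_code, grouping missing codes under 'unknown'."""
    return df["country_code"].fillna("unknown").value_counts()

# Toy rows for illustration only; in the notebook, pass the `df` built above.
toy = pd.DataFrame({
    "display_name": ["Source A", "Source B", "Source C", "Source D"],
    "country_code": ["JP", "IN", "JP", None],
})
counts = country_breakdown(toy)
print(counts)
```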
#### `default.search`

Value: a search string

This works the same as using the [`search` parameter](./search-sources.md#search-sources) for Sources.
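Because the two forms are equivalent, the sketch below builds both URLs for the same query; the filter form is handy when you want the search inside one `filter` string alongside other clauses. The helper names are hypothetical. Since commas separate filter clauses, a query that itself contains a comma may be misread in the filter form; the `search` parameter avoids that.

```python
from urllib.parse import urlencode

SOURCES_URL = "https://api.openalex.org/sources"

def search_param_url(query: str) -> str:
    """Use the dedicated search parameter."""
    return f"{SOURCES_URL}?{urlencode({'search': query})}"

def default_search_filter_url(query: str, extra_filters: str = "") -> str:
    """Put the same query in the filter parameter, optionally ANDed with other clauses."""
    filter_string = f"default.search:{query}"
    if extra_filters:
        filter_string += "," + extra_filters
    return f"{SOURCES_URL}?{urlencode({'filter': filter_string}, safe=':,')}"

print(search_param_url("neurology"))
print(default_search_filter_url("neurology"))
print(default_search_filter_url("neurology", extra_filters="is_oa:true"))
```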

#### `display_name.search`

Value: a search string

Returns: sources with a [`display_name`](./source-object.md#display_name) containing the given string; see the [search page](./search-sources.ipynb) for details.

* Get sources with names containing "Neurology":
  [`https://api.openalex.org/sources?filter=display_name.search:Neurology`](https://api.openalex.org/sources?filter=display_name.search:Neurology)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/sources?filter=display_name.search:Neurology
filter="display_name.search:Neurology" # @param "display_name.search:Neurology" {type: "string"}

response = sources_api.get_sources(
	filter=filter
)

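# Convert each pydantic model to a dict so pandas creates one column per field
df = pd.DataFrame([result.dict() for result in response.results])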
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

try:
	llm = OpenAI(api_token = openapi_token)
	sdf = SmartDataframe(numeric_df, config = { "llm": llm })
	sdf.chat("Plot a chart of this data")
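except Exception as error:
	# Catch ordinary errors only (not KeyboardInterrupt) and report the likely cause
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print(f"Error when creating or querying SmartDataframe: {error}")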

{% hint style="info" %}
In most cases, you should use the [`search`](./search-sources.md#sources-full-search) parameter instead of this filter because it uses a better search algorithm.
{% endhint %}

#### `has_issn`

Value: a Boolean (`true` or `false`)

Returns: sources that have or lack an [ISSN](./source-object.md#issn), depending on the given value.

* Get sources without ISSNs:
  [`https://api.openalex.org/sources?filter=has_issn:false`](https://api.openalex.org/sources?filter=has_issn:false)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/sources?filter=has_issn:false
filter="has_issn:false" # @param "has_issn:false" {type: "string"}

response = sources_api.get_sources(
	filter=filter
)

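# Convert each pydantic model to a dict so pandas creates one column per field
df = pd.DataFrame([result.dict() for result in response.results])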
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

try:
	llm = OpenAI(api_token = openapi_token)
	sdf = SmartDataframe(numeric_df, config = { "llm": llm })
	sdf.chat("Plot a chart of this data")
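except Exception as error:
	# Catch ordinary errors only (not KeyboardInterrupt) and report the likely cause
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print(f"Error when creating or querying SmartDataframe: {error}")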

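If you want to double-check the results client-side, note that the [`issn`](./source-object.md#issn) field of a source holds a list of ISSNs. This minimal sketch assumes a source "has an ISSN" exactly when that list is non-empty; `issn_mask` is a hypothetical helper and the toy rows are placeholders.

```python
import pandas as pd

def issn_mask(df: pd.DataFrame) -> pd.Series:
    """True where the issn column holds a non-empty list, False for None or []."""
    return df["issn"].apply(lambda value: isinstance(value, list) and len(value) > 0)

# Toy rows for illustration only; in the notebook, pass the `df` built above.
toy = pd.DataFrame({
    "display_name": ["Source A", "Source B", "Source C"],
    "issn": [["0000-0000"], None, []],
})
print(issn_mask(toy).tolist())  # [True, False, False]
```

With `has_issn:false`, you would expect `issn_mask(df).any()` to be `False` under this assumption.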
#### `is_global_south`

Value: a Boolean (`true` or `false`)

Returns: sources that are associated with the [Global South](./../geo/regions.md#global-south).

* Get sources that are located in the Global South:
  [`https://api.openalex.org/sources?filter=is_global_south:true`](https://api.openalex.org/sources?filter=is_global_south:true)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/sources?filter=is_global_south:true
filter="is_global_south:true" # @param "is_global_south:true" {type: "string"}

response = sources_api.get_sources(
	filter=filter
)

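# Convert each pydantic model to a dict so pandas creates one column per field
df = pd.DataFrame([result.dict() for result in response.results])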
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

try:
	llm = OpenAI(api_token = openapi_token)
	sdf = SmartDataframe(numeric_df, config = { "llm": llm })
	sdf.chat("Plot a chart of this data")
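except Exception as error:
	# Catch ordinary errors only (not KeyboardInterrupt) and report the likely cause
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print(f"Error when creating or querying SmartDataframe: {error}")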