[![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-181717?style=for-the-badge&logo=GitHub&link=https://github.com/Mearman/openalex-docs/tree/develop)](https://github.com/Mearman/openalex-docs/tree/develop)[![Open in GitHub](https://img.shields.io/badge/Open%20in-GitHub-181717?style=for-the-badge&logo=github&link=https://github.com/Mearman/openalex-docs/blob/develop/api-entities/publishers/filter-publishers.ipynb)](https://github.com/Mearman/openalex-docs/blob/develop/api-entities/publishers/filter-publishers.ipynb)[![Open in Colab](https://img.shields.io/badge/Open%20in-Colab-F9AB00?style=for-the-badge&logo=Google%20Colab&link=https://colab.research.google.com/github/Mearman/openalex-docs/blob/develop/api-entities/publishers/filter-publishers.ipynb)](https://colab.research.google.com/github/Mearman/openalex-docs/blob/develop/api-entities/publishers/filter-publishers.ipynb)

In [None]:
%pip install --upgrade "git+https://github.com/Mearman/openalex-python-pydantic-v1.git"
%pip install pandasai

In [None]:
import json
import pandas as pd
import numpy as np
from openalex_api import Configuration, ApiClient, AutocompleteApi, AuthorsApi, ConceptsApi, FundersApi, InstitutionsApi, PublishersApi, SourcesApi, WorksApi

configuration = Configuration(host="https://api.openalex.org")
autocomplete_api = AutocompleteApi(ApiClient(configuration))
authors_api = AuthorsApi(ApiClient(configuration))
concepts_api = ConceptsApi(ApiClient(configuration))
funders_api = FundersApi(ApiClient(configuration))
institutions_api = InstitutionsApi(ApiClient(configuration))
publishers_api = PublishersApi(ApiClient(configuration))
sources_api = SourcesApi(ApiClient(configuration))
works_api = WorksApi(ApiClient(configuration))

from pandasai import SmartDataframe
from pandasai.llm import OpenAI

In [None]:
# @title  { run: "auto", display-mode: "form" }
openapi_token = "" # @param {type:"string"}

# Filter publishers

You can filter publishers with the `filter` parameter:

* Get publishers that are hierarchy level 0
  [`https://api.openalex.org/publishers?filter=hierarchy_level:0`](https://api.openalex.org/publishers?filter=hierarchy_level:0)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/publishers?filter=hierarchy_level:0
filter="hierarchy_level:0" # @param {type: "string"}

response = publishers_api.get_publishers(
	filter=filter
)

df = pd.DataFrame(response.results)
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

In [None]:
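try:
	prompt = "Visualize this data" # @param {type:"string"}
	if not openapi_token:
		# Fail early with a clear message instead of attempting the OpenAI call
		raise ValueError("openapi_token not set")
	SmartDataframe(
		numeric_df,
		config={"llm": OpenAI(api_token=openapi_token)},
	).chat(prompt)
except Exception as error:
	# Catch Exception rather than using a bare except, so KeyboardInterrupt still propagates
	print(f"Error: {error}")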

{% hint style="info" %}
It's best to [read about filters](./../../how-to-use-the-api/get-lists-of-entities/filter-entity-lists.ipynb) before trying these out. That page shows how to combine filters and how to build AND, OR, and negation queries.
{% endhint %}
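
As a minimal sketch of the combinations that page describes, the helpers below assemble filter strings by hand. The function names (`and_filters`, `or_values`, `negate`) are hypothetical and not part of `openalex_api`. The sketch assumes the OpenAlex filter conventions of `,` for AND, `|` for OR within a single attribute, and a leading `!` for negation.

```python
def and_filters(*filters: str) -> str:
    """Join complete `attribute:value` filters with commas (logical AND)."""
    return ",".join(filters)


def or_values(attribute: str, *values: str) -> str:
    """Match any one of several values for a single attribute (logical OR)."""
    return f"{attribute}:{'|'.join(values)}"


def negate(attribute: str, value: str) -> str:
    """Exclude entities whose attribute equals the given value."""
    return f"{attribute}:!{value}"


# Top-level publishers located in the US or GB (illustrative values)
combined = and_filters("hierarchy_level:0", or_values("country_codes", "US", "GB"))
print(combined)

# Publishers not located in the US
print(negate("country_codes", "US"))
```

The resulting string can be passed as the `filter` argument of `publishers_api.get_publishers`, in the same way as the literal filter strings used in the cells on this page.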

### `/publishers` attribute filters

You can filter using these attributes of the `Publisher` entity object (click each one to view its documentation on the [`Publisher`](./publisher-object.ipynb) object page):

* [`cited_by_count`](./publisher-object.md#cited_by_count)
* [`country_codes`](./publisher-object.md#country_codes)
* [`hierarchy_level`](./publisher-object.md#hierarchy_level)
* [`ids.openalex`](./publisher-object.md#ids) (alias: `openalex`)
* [`ids.ror`](./publisher-object.md#ids) (alias: `ror`)
* [`ids.wikidata`](./publisher-object.md#ids) (alias: `wikidata`)
* [`lineage`](./publisher-object.md#lineage) — Use this with a publisher ID to find that publisher and all of its children
* [`parent_publisher`](./publisher-object.md#parent_publisher)
* [`summary_stats.2yr_mean_citedness`](./publisher-object.md#summary_stats) (accepts a float, `null`, or `!null`; supports range queries such as `<` and `>`)
* [`summary_stats.h_index`](./publisher-object.md#summary_stats) (accepts an integer, `null`, or `!null`; supports range queries)
* [`summary_stats.i10_index`](./publisher-object.md#summary_stats) (accepts an integer, `null`, or `!null`; supports range queries)
* [`works_count`](./publisher-object.md#works_count)
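
The range and `null` syntax noted for the `summary_stats` filters, and the `lineage` filter, can be sketched as follows. The thresholds are arbitrary, `PUBLISHER_ID` is a placeholder rather than a real publisher, and `publishers_url` is a hypothetical helper that only builds the request URL so you can inspect it before calling the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/publishers"

# Placeholder ID for illustration only; substitute a real OpenAlex publisher ID.
PUBLISHER_ID = "P0000000000"


def publishers_url(filter_string: str) -> str:
    """Build a /publishers URL, leaving filter punctuation unescaped for readability."""
    query = urlencode({"filter": filter_string}, safe=":,|!<>")
    return f"{BASE_URL}?{query}"


example_filters = {
    "h-index above 100": "summary_stats.h_index:>100",
    "2-year mean citedness below 1.5": "summary_stats.2yr_mean_citedness:<1.5",
    "i10-index is present": "summary_stats.i10_index:!null",
    "publisher and all of its children": f"lineage:{PUBLISHER_ID}",
}

for description, filter_string in example_filters.items():
    print(f"{description}: {publishers_url(filter_string)}")
```

Each filter string in `example_filters` can also be passed directly as the `filter` argument of `publishers_api.get_publishers`.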

### `/publishers` convenience filters

These filters aren't attributes of the [`Publisher`](./publisher-object.ipynb) object, but they're included to address some common use cases:

#### `continent`

Value: a string with a valid [continent filter](./../geo/continents.md#filter-by-continent)

Returns: publishers that are located in the chosen continent.

* Get publishers that are located in South America
  [`https://api.openalex.org/publishers?filter=continent:south_america`](https://api.openalex.org/publishers?filter=continent:south_america)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/publishers?filter=continent:south_america
filter="continent:south_america" # @param {type: "string"}

response = publishers_api.get_publishers(
	filter=filter
)

df = pd.DataFrame(response.results)
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

In [None]:
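try:
	prompt = "Visualize this data" # @param {type:"string"}
	if not openapi_token:
		# Fail early with a clear message instead of attempting the OpenAI call
		raise ValueError("openapi_token not set")
	SmartDataframe(
		numeric_df,
		config={"llm": OpenAI(api_token=openapi_token)},
	).chat(prompt)
except Exception as error:
	# Catch Exception rather than using a bare except, so KeyboardInterrupt still propagates
	print(f"Error: {error}")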

#### `default.search`

Value: a search string

This works the same as using the [`search` parameter](./search-publishers.md#search-publishers) for Publishers.
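
To make the equivalence concrete, the sketch below builds both forms of the request for the same term. Based on the statement above, the two URLs are expected to return the same publishers. `search_urls` is a hypothetical helper and the search term is only an example.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org/publishers"


def search_urls(term: str) -> tuple[str, str]:
    """Return the filter-based and parameter-based URLs for the same search term."""
    filter_query = urlencode({"filter": f"default.search:{term}"}, safe=":")
    param_query = urlencode({"search": term})
    return f"{BASE_URL}?{filter_query}", f"{BASE_URL}?{param_query}"


via_filter, via_param = search_urls("oxford university")
print(via_filter)  # https://api.openalex.org/publishers?filter=default.search:oxford+university
print(via_param)   # https://api.openalex.org/publishers?search=oxford+university
```

The filter form is mainly useful when the search needs to be combined with other filters in a single `filter` string.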

#### `display_name.search`

Value: a search string

Returns: publishers with a [`display_name`](./publisher-object.md#display_name) containing the given string; see the [search page](./search-publishers.md#search-a-specific-field) for details.

* Get publishers with names containing "elsevier":
  [`https://api.openalex.org/publishers?filter=display_name.search:elsevier`](https://api.openalex.org/publishers?filter=display_name.search:elsevier)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/publishers?filter=display_name.search:elsevier
filter="display_name.search:elsevier" # @param {type: "string"}

response = publishers_api.get_publishers(
	filter=filter
)

df = pd.DataFrame(response.results)
display(df)

In [None]:
numeric_df = df[['id', 'display_name'] +
	[col for col in df.columns if df[col].dtype in ['int64', 'float64'] and col != 'relevance_score']]
display(numeric_df)

In [None]:
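try:
	prompt = "Visualize this data" # @param {type:"string"}
	if not openapi_token:
		# Fail early with a clear message instead of attempting the OpenAI call
		raise ValueError("openapi_token not set")
	SmartDataframe(
		numeric_df,
		config={"llm": OpenAI(api_token=openapi_token)},
	).chat(prompt)
except Exception as error:
	# Catch Exception rather than using a bare except, so KeyboardInterrupt still propagates
	print(f"Error: {error}")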

{% hint style="info" %}
In most cases, you should use the [`search` parameter](./search-publishers.ipynb) instead of this filter because it uses a better search algorithm.
{% endhint %}