[![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-181717?style=for-the-badge&logo=GitHub&link=https://github.com/Mearman/openalex-docs/tree/develop)](https://github.com/Mearman/openalex-docs/tree/develop)[![Open in GitHub](https://img.shields.io/badge/Open%20in-GitHub-181717?style=for-the-badge&logo=github&link=https://github.com/Mearman/openalex-docs/blob/develop/how-to-use-the-api/get-groups-of-entities.ipynb)](https://github.com/Mearman/openalex-docs/blob/develop/how-to-use-the-api/get-groups-of-entities.ipynb)[![Open in Colab](https://img.shields.io/badge/Open%20in-Colab-F9AB00?style=for-the-badge&logo=Google%20Colab&link=https://colab.research.google.com/github/Mearman/openalex-docs/blob/develop/how-to-use-the-api/get-groups-of-entities.ipynb)](https://colab.research.google.com/github/Mearman/openalex-docs/blob/develop/how-to-use-the-api/get-groups-of-entities.ipynb)

In [None]:
%pip install --upgrade "git+https://github.com/Mearman/openalex-python-pydantic-v1.git"
%pip install pandasai

In [None]:
import json
import pandas as pd
import numpy as np
from openalex_api import Configuration, ApiClient, AutocompleteApi, AuthorsApi, ConceptsApi, FundersApi, InstitutionsApi, PublishersApi, SourcesApi, WorksApi

configuration = Configuration(host="https://api.openalex.org")
autocomplete_api = AutocompleteApi(ApiClient(configuration))
authors_api = AuthorsApi(ApiClient(configuration))
concepts_api = ConceptsApi(ApiClient(configuration))
funders_api = FundersApi(ApiClient(configuration))
institutions_api = InstitutionsApi(ApiClient(configuration))
publishers_api = PublishersApi(ApiClient(configuration))
sources_api = SourcesApi(ApiClient(configuration))
works_api = WorksApi(ApiClient(configuration))

from pandasai import SmartDataframe
from pandasai.llm import OpenAI

In [None]:
# @title  { run: "auto", display-mode: "form" }
openapi_token = "" # @param {type:"string"}

# Get groups of entities

Sometimes, instead of just listing entities, you want to _group them_ into facets and count how many entities are in each group. For example, you might want to count the number of `Works` by [open access status](./../api-entities/works/work-object/README.md#open_access). To do that, call the entity endpoint with the `group_by` parameter. Example:

* Get counts of works by type:
  [`https://api.openalex.org/works?group_by=type`](https://api.openalex.org/works?group_by=type)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/works?group_by=type
group_by="type" # @param {type: "string"}

response = works_api.get_works(
	group_by=group_by
)

df = pd.DataFrame(response.group_by)
display(df)

In [None]:
numeric_df = df.set_index('key')
display(numeric_df)

In [None]:
try:
	prompt = "Visualize this data" # @param {type:"string"}
	SmartDataframe(
		numeric_df,
		config={"llm": (OpenAI(api_token=openapi_token))},
	).chat(prompt)
except Exception:
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print("Error when creating SmartDataframe")

This returns a `meta` object with details about the query, and a `group_by` object with the groups you've asked for:

```json
{
    "meta": {
        "count": 246136992,
        "db_response_time_ms": 271,
        "page": 1,
        "per_page": 200,
        "groups_count": 15
    },
    "group_by": [
        {
            "key": "article",
            "key_display_name": "article",
            "count": 202814957
        },
        {
            "key": "book-chapter",
            "key_display_name": "book-chapter",
            "count": 21250659
        },
        {
            "key": "dissertation",
            "key_display_name": "dissertation",
            "count": 6055973
        },
        {
            "key": "book",
            "key_display_name": "book",
            "count": 5400871
        },
        ...
    ]
}
```

From this we can see that the majority of works (202,814,957 of 246,136,992) are of type `article`, another 21,250,659 are of type `book-chapter`, and so forth.
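
To make the "majority" claim concrete, here is a minimal sketch that turns the group counts into shares of `meta.count`. It works on a plain dictionary copied from the sample response above (truncated to the four groups shown) and makes no network call. The helper name `group_shares` is hypothetical, and using `meta.count` as the denominator is an assumption that suits single-valued properties such as `type`.

```python
# Sample copied from the response above; only the four groups shown there.
sample_response = {
    "meta": {"count": 246136992, "groups_count": 15},
    "group_by": [
        {"key": "article", "key_display_name": "article", "count": 202814957},
        {"key": "book-chapter", "key_display_name": "book-chapter", "count": 21250659},
        {"key": "dissertation", "key_display_name": "dissertation", "count": 6055973},
        {"key": "book", "key_display_name": "book", "count": 5400871},
    ],
}


def group_shares(response):
    """Return {key: fraction of meta.count} for each group in the response."""
    total = response["meta"]["count"]
    return {g["key"]: g["count"] / total for g in response["group_by"]}


shares = group_shares(sample_response)
for key, share in shares.items():
    print(f"{key}: {share:.1%}")
```

For multi-valued properties (a work can have authors from several countries, for example), a work may plausibly be counted in more than one group, so shares computed this way need not sum to 1.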

You can group by most of the same properties that you can [filter](./get-lists-of-entities/filter-entity-lists.ipynb) by, and you can combine grouping with filtering.
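
As a rough illustration of combining grouping with filtering, the sketch below builds request URLs with the standard library. The helper `build_group_by_url` is hypothetical, and the filter `publication_year:2020` is an assumed example; check the filter documentation for the properties each entity supports.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openalex.org"


def build_group_by_url(entity, group_by, filter_expr=None):
    """Build a group_by URL, optionally combined with a filter expression."""
    params = {}
    if filter_expr:
        params["filter"] = filter_expr
    params["group_by"] = group_by
    # Keep ":" and "," unescaped so the URL reads like the documented examples.
    return f"{BASE_URL}/{entity}?{urlencode(params, safe=':,')}"


print(build_group_by_url("works", "type"))
print(build_group_by_url("works", "type", filter_expr="publication_year:2020"))
```

The same two values can also be passed to the client as keyword arguments if the generated `get_works` method accepts a `filter` parameter alongside `group_by`; that is not confirmed here.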

## Group properties

Each group object in the `group_by` list contains three properties:

#### `key`

Value: a string; the [OpenAlex ID](./get-single-entities/README.md#the-openalex-id) or raw value of the `group_by` parameter for members of this group. See details on [`key` and `key_display_name`](./get-groups-of-entities.md#key-and-key_display_name).

#### `key_display_name`

Value: a string; the `display_name` or raw value of the `group_by` parameter for members of this group. See details on [`key` and `key_display_name`](./get-groups-of-entities.md#key-and-key_display_name).

#### `count`

Value: an integer; the number of entities in the group.
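
The three properties map naturally onto a small typed record. The sketch below is one possible representation, not part of the client library: the `Group` class and its `from_dict` constructor are hypothetical names, and the input dictionaries are taken from the sample response earlier on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    key: str               # OpenAlex ID or raw value
    key_display_name: str  # display_name or raw value
    count: int             # number of entities in the group

    @classmethod
    def from_dict(cls, raw):
        """Build a Group from one item of the `group_by` list."""
        return cls(
            key=str(raw["key"]),
            key_display_name=str(raw["key_display_name"]),
            count=int(raw["count"]),
        )


raw_groups = [
    {"key": "article", "key_display_name": "article", "count": 202814957},
    {"key": "book-chapter", "key_display_name": "book-chapter", "count": 21250659},
]
groups = [Group.from_dict(raw) for raw in raw_groups]
largest = max(groups, key=lambda g: g.count)
print(largest.key_display_name, largest.count)
```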

## "Unknown" groups

The "unknown" group is hidden by default. To include it in the response, append `:include_unknown` to the value of the `group_by` parameter.

* Group works by [`authorships.countries`](./../api-entities/works/work-object/authorship-object.md#countries) (unknown group hidden):
  [`https://api.openalex.org/works?group_by=authorships.countries`](https://api.openalex.org/works?group_by=authorships.countries)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/works?group_by=authorships.countries
group_by="authorships.countries" # @param {type: "string"}

response = works_api.get_works(
	group_by=group_by
)

df = pd.DataFrame(response.group_by)
display(df)

In [None]:
numeric_df = df.set_index('key')
display(numeric_df)

In [None]:
try:
	prompt = "Visualize this data" # @param {type:"string"}
	SmartDataframe(
		numeric_df,
		config={"llm": (OpenAI(api_token=openapi_token))},
	).chat(prompt)
except Exception:
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print("Error when creating SmartDataframe")

* Group works by [`authorships.countries`](./../api-entities/works/work-object/authorship-object.md#countries) (includes unknown group):
  [`https://api.openalex.org/works?group_by=authorships.countries:include_unknown`](https://api.openalex.org/works?group_by=authorships.countries:include_unknown)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/works?group_by=authorships.countries:include_unknown
group_by="authorships.countries:include_unknown" # @param {type: "string"}

response = works_api.get_works(
	group_by=group_by
)

df = pd.DataFrame(response.group_by)
display(df)

In [None]:
numeric_df = df.set_index('key')
display(numeric_df)

In [None]:
try:
	prompt = "Visualize this data" # @param {type:"string"}
	SmartDataframe(
		numeric_df,
		config={"llm": (OpenAI(api_token=openapi_token))},
	).chat(prompt)
except Exception:
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print("Error when creating SmartDataframe")
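
The two requests in this section differ only in the `:include_unknown` suffix, so toggling it can be wrapped in a small helper. This is a sketch; `with_unknown` is a hypothetical name, not a function of the client library.

```python
UNKNOWN_SUFFIX = ":include_unknown"


def with_unknown(group_by, include_unknown=True):
    """Add or remove the `:include_unknown` suffix on a group_by value."""
    has_suffix = group_by.endswith(UNKNOWN_SUFFIX)
    if include_unknown and not has_suffix:
        return group_by + UNKNOWN_SUFFIX
    if not include_unknown and has_suffix:
        return group_by[: -len(UNKNOWN_SUFFIX)]
    return group_by


print(with_unknown("authorships.countries"))
print(with_unknown("authorships.countries:include_unknown", include_unknown=False))
```

The result can be passed as the `group_by` argument in the cells above.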

## `key` and `key_display_name`

If the value being grouped by is an OpenAlex `Entity`, the [`key`](./get-groups-of-entities.md#key) and [`key_display_name`](./get-groups-of-entities.md#key_display_name) properties will be that `Entity`'s `id` and `display_name`, respectively.

* Group `Works` by `Institution`:
  [`https://api.openalex.org/works?group_by=authorships.institutions.id`](https://api.openalex.org/works?group_by=authorships.institutions.id)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/works?group_by=authorships.institutions.id
group_by="authorships.institutions.id" # @param {type: "string"}

response = works_api.get_works(
	group_by=group_by
)

df = pd.DataFrame(response.group_by)
display(df)

In [None]:
numeric_df = df.set_index('key')
display(numeric_df)

In [None]:
try:
	prompt = "Visualize this data" # @param {type:"string"}
	SmartDataframe(
		numeric_df,
		config={"llm": (OpenAI(api_token=openapi_token))},
	).chat(prompt)
except Exception:
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print("Error when creating SmartDataframe")

* For one group, `key` is "[https://openalex.org/I136199984](https://openalex.org/I136199984)" and `key_display_name` is "Harvard University".

Otherwise, `key` is the same as `key_display_name`; both are the raw value of the `group_by` parameter for this group.

* Group `Concepts` by [`level`](./../api-entities/concepts/concept-object.md#level):
  [`https://api.openalex.org/concepts?group_by=level`](https://api.openalex.org/concepts?group_by=level)

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/concepts?group_by=level
group_by="level" # @param {type: "string"}

response = concepts_api.get_concepts(
	group_by=group_by
)

df = pd.DataFrame(response.group_by)
display(df)

In [None]:
numeric_df = df.set_index('key')
display(numeric_df)

In [None]:
try:
	prompt = "Visualize this data" # @param {type:"string"}
	SmartDataframe(
		numeric_df,
		config={"llm": (OpenAI(api_token=openapi_token))},
	).chat(prompt)
except Exception:
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print("Error when creating SmartDataframe")

* For one group, both `key` and `key_display_name` are "3".
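
When labelling groups, for instance on a chart axis, it can help to tell the two cases apart. The sketch below uses a simple heuristic, assumed here rather than documented: a `key` that starts with `https://openalex.org/` is treated as an entity ID. The helper `describe_group` is a hypothetical name, and the two inputs are the examples given above.

```python
OPENALEX_PREFIX = "https://openalex.org/"


def describe_group(group):
    """Return a readable label: 'display name (short ID)' for entities, raw value otherwise."""
    key = str(group["key"])
    if key.startswith(OPENALEX_PREFIX):
        short_id = key[len(OPENALEX_PREFIX):]
        return f"{group['key_display_name']} ({short_id})"
    return key


entity_group = {
    "key": "https://openalex.org/I136199984",
    "key_display_name": "Harvard University",
}
raw_group = {"key": "3", "key_display_name": "3"}

print(describe_group(entity_group))
print(describe_group(raw_group))
```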

## Group-by `meta` properties

`meta.count` is the total number of works (this will be all works if no filter is applied). `meta.groups_count` is the count of groups (in the current page).

If there are no groups in the response, `meta.groups_count` is `null`.

Due to a technical limitation, we can only report the number of groups *in the current page*, not the total number of groups.
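
Because `meta.groups_count` can be `null` and only describes the current page, client code should handle both cases. Here is a minimal sketch with hypothetical helper names. It assumes `meta` is available as a plain dictionary and treats a full page (200 groups, the maximum stated under Paging) as a hint that more groups may exist.

```python
def groups_on_page(meta):
    """Return the number of groups on the current page, treating null as 0."""
    count = meta.get("groups_count")
    return 0 if count is None else count


def may_have_more_groups(meta, max_groups_per_page=200):
    """A full page suggests, but does not prove, that further groups exist."""
    return groups_on_page(meta) >= max_groups_per_page


print(groups_on_page({"count": 246136992, "groups_count": 15}))
print(groups_on_page({"count": 0, "groups_count": None}))
print(may_have_more_groups({"groups_count": 200}))
```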

## Paging

The maximum number of groups returned is 200. If you want to get more than 200 groups, you can use cursor pagination. This works the same as it does when getting lists of entities, so [head over to the section on paging through lists of results](./get-lists-of-entities/paging.md#cursor-paging) to learn how.

Due to technical constraints, when paging, results are sorted by key, rather than by count.
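
A sketch of collecting every group with cursor paging and then restoring the by-count ordering is shown below. It assumes the cursor conventions from the paging section for lists also apply here: start with `cursor=*` and follow `meta.next_cursor` until it is empty. The function `collect_all_groups` and its `fetch_page` argument are hypothetical; `fetch_page` stands for whatever performs the request and returns the response as a dictionary. The pages in the demo are made-up placeholders, not real API data.

```python
def collect_all_groups(fetch_page, max_pages=100):
    """Follow cursors and return all groups, sorted by count (descending).

    fetch_page(cursor) must return a dict with "meta" and "group_by" keys.
    """
    groups = []
    cursor = "*"  # assumed starting cursor, as for lists of entities
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        page = fetch_page(cursor)
        groups.extend(page.get("group_by") or [])
        cursor = (page.get("meta") or {}).get("next_cursor")
        if not cursor:
            break
    # Paged results arrive sorted by key, so re-sort by count locally.
    return sorted(groups, key=lambda g: g["count"], reverse=True)


# Stand-in for real HTTP calls: placeholder pages for illustration only.
fake_pages = {
    "*": {
        "meta": {"next_cursor": "abc"},
        "group_by": [{"key": "a", "key_display_name": "a", "count": 1}],
    },
    "abc": {
        "meta": {"next_cursor": None},
        "group_by": [{"key": "b", "key_display_name": "b", "count": 5}],
    },
}
all_groups = collect_all_groups(fake_pages.__getitem__)
print([g["key"] for g in all_groups])
```

Injecting `fetch_page` keeps the paging logic independent of the HTTP client, so it can be exercised without network access.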