[![GitHub Repository](https://img.shields.io/badge/GitHub-Repository-181717?style=for-the-badge&logo=GitHub&link=https://github.com/Mearman/openalex-docs/tree/main)](https://github.com/Mearman/openalex-docs/tree/main)[![Open in GitHub](https://img.shields.io/badge/Open%20in-GitHub-181717?style=for-the-badge&logo=github&link=https://github.com/Mearman/openalex-docs/blob/main/api-entities/works/group-works.ipynb)](https://github.com/Mearman/openalex-docs/blob/main/api-entities/works/group-works.ipynb)[![Open in Colab](https://img.shields.io/badge/Open%20in-Colab-F9AB00?style=for-the-badge&logo=Google%20Colab&link=https://colab.research.google.com/github/Mearman/openalex-docs/blob/main/api-entities/works/group-works.ipynb)](https://colab.research.google.com/github/Mearman/openalex-docs/blob/main/api-entities/works/group-works.ipynb)

In [None]:
%pip install --upgrade "git+https://github.com/Mearman/openalex-python-pydantic-v1.git"
%pip install pandasai

In [None]:
import json
import pandas as pd
import numpy as np
from openalex_api import Configuration, ApiClient, AutocompleteApi, AuthorsApi, ConceptsApi, FundersApi, InstitutionsApi, PublishersApi, SourcesApi, WorksApi

configuration = Configuration(host="https://api.openalex.org")
api_client = ApiClient(configuration)  # one client shared by every endpoint wrapper
autocomplete_api = AutocompleteApi(api_client)
authors_api = AuthorsApi(api_client)
concepts_api = ConceptsApi(api_client)
funders_api = FundersApi(api_client)
institutions_api = InstitutionsApi(api_client)
publishers_api = PublishersApi(api_client)
sources_api = SourcesApi(api_client)
works_api = WorksApi(api_client)

from pandasai import SmartDataframe
from pandasai.llm import OpenAI

In [None]:
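# @title  { run: "auto", display-mode: "form" }
openapi_token = "" # @param {type:"string"}
# This is an OpenAI API key used only by the pandasai cell below; the OpenAlex requests in this notebook do not need it.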
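Pasting a key into a form field leaves it in the saved notebook. A minimal sketch of an alternative, assuming you export the key as an environment variable named `OPENAI_API_KEY` (the convention used by OpenAI's own client libraries; the helper `load_openai_token` is hypothetical):

```python
import os


def load_openai_token(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `env_var`, or "" if it is unset.

    Hypothetical helper; the variable name is an assumption, not something
    pandasai or this notebook requires.
    """
    return os.environ.get(env_var, "").strip()


token = load_openai_token()
print("OpenAI key found" if token else "No OpenAI key in the environment")
```

If a key is found, assign it to `openapi_token` before running the pandasai cell.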

# Group works

You can group works with the `group_by` parameter:

* Get counts of works by Open Access status:
  [`https://api.openalex.org/works?group_by=oa_status`](https://api.openalex.org/works?group_by=oa_status)
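The same request can be made without the generated client, since it is just an HTTP GET. A minimal sketch using only the standard library; `build_group_by_url` and `fetch_json` are hypothetical helpers, and the optional `filter` argument assumes you want to narrow the works before grouping them:

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OPENALEX_WORKS_URL = "https://api.openalex.org/works"


def build_group_by_url(attribute: str, filter_expr: Optional[str] = None) -> str:
    """Build a /works URL that groups results by `attribute`.

    `filter_expr` is an optional OpenAlex filter string such as
    "publication_year:2020".
    """
    params = {"group_by": attribute}
    if filter_expr:
        params["filter"] = filter_expr
    # Leave ':', ',' and '|' unescaped so the filter syntax stays readable.
    return f"{OPENALEX_WORKS_URL}?{urlencode(params, safe=':,|')}"


def fetch_json(url: str) -> dict:
    """GET `url` and decode the JSON body (performs a network request)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


print(build_group_by_url("oa_status"))
# prints https://api.openalex.org/works?group_by=oa_status
```

Calling `fetch_json(build_group_by_url("oa_status"))` returns the raw JSON, whose `group_by` array is what the client cell below turns into a DataFrame.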

In [None]:
# @title { run: "auto", vertical-output: false }
# https://api.openalex.org/works?group_by=oa_status
group_by="oa_status" # @param {type: "string"}

response = works_api.get_works(
	group_by=group_by
)

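# Convert each pydantic group model to a dict: one row per group (key, key_display_name, count)
df = pd.DataFrame([group.dict() for group in response.group_by])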
display(df)
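In the raw JSON response, each entry of the `group_by` array carries a `key`, a human-readable `key_display_name`, and a `count`. A minimal sketch of flattening that array without pandas, assuming the response has already been parsed into a Python dict; `group_rows` is a hypothetical helper:

```python
from typing import Any, Dict, List, Tuple


def group_rows(payload: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Flatten a /works group_by response into (key, key_display_name, count)
    tuples, largest group first."""
    rows = [
        (
            str(group["key"]),
            # Fall back to the raw key if no display name is present.
            str(group.get("key_display_name", group["key"])),
            int(group["count"]),
        )
        for group in payload.get("group_by", [])
    ]
    # Sort by count descending; break ties alphabetically by key.
    rows.sort(key=lambda row: (-row[2], row[0]))
    return rows
```

Sorting locally like this is only a convenience; the API's own sorting options are covered in the group-by guide linked further down.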

In [None]:
numeric_df = df.set_index('key')[['count']]  # keep only the numeric column, indexed by group key
display(numeric_df)

In [None]:
try:
	prompt = "Visualize this data" # @param {type:"string"}
	SmartDataframe(
		numeric_df,
		config={"llm": OpenAI(api_token=openapi_token)},
	).chat(prompt)
except Exception as error:
	if not openapi_token:
		print("Error: openapi_token not set")
	else:
		print(f"Error while running SmartDataframe: {error}")
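Without an OpenAI key, the pandasai cell only prints an error. A minimal, LLM-free fallback sketch that renders the counts as a plain-text bar chart with each group's share of the total; `text_bar_chart` is a hypothetical helper, not part of pandasai:

```python
from typing import Dict, List


def text_bar_chart(counts: Dict[str, int], width: int = 40) -> List[str]:
    """Render group counts as text bars scaled so the largest group spans
    `width` characters, followed by the count and its percentage share."""
    if not counts:
        return []
    total = sum(counts.values())
    largest = max(counts.values())
    label_width = max(len(key) for key in counts)
    lines = []
    # Largest group first; ties broken alphabetically.
    for key, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * (round(width * count / largest) if largest else 0)
        share = 100 * count / total if total else 0.0
        lines.append(f"{key:<{label_width}} {bar} {count} ({share:.1f}%)")
    return lines
```

With the DataFrame above, `print("\n".join(text_bar_chart(numeric_df["count"].to_dict())))` prints one bar per Open Access status.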

You can also group by any of the attributes below.

{% hint style="info" %}
It's best to [read about group by](./../../how-to-use-the-api/get-groups-of-entities.ipynb) before trying these out. It will show you how results are formatted, the number of results returned, and how to sort results.
{% endhint %}

### `/works` group_by attributes

{% hint style="danger" %}
The `host_venue` and `alternate_host_venues` properties have been deprecated in favor of [`primary_location`](./work-object/README.md#primary_location) and [`locations`](./work-object/README.md#locations). The attributes `host_venue` and `alternate_host_venues` are no longer available in the Work object, and trying to access them in filters or group-bys will return an error.
{% endhint %}
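Because group-bys on the deprecated properties return an error, it can help to reject them before sending a request. A minimal sketch, assuming a simple prefix check is enough; `check_group_by` is a hypothetical helper, and it only names the replacement property rather than guessing the exact replacement attribute:

```python
DEPRECATED_PREFIXES = {
    "host_venue": "primary_location",
    "alternate_host_venues": "locations",
}


def check_group_by(attribute: str) -> str:
    """Return `attribute` unchanged, or raise ValueError if it starts with a
    deprecated venue property."""
    root = attribute.split(".", 1)[0]
    if root in DEPRECATED_PREFIXES:
        raise ValueError(
            f"{attribute!r} uses the deprecated {root!r} property; "
            f"group by a {DEPRECATED_PREFIXES[root]!r} attribute instead"
        )
    return attribute
```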

* [`authors_count`](./filter-works.md#authors_count)
* [`authorships.author.id`](./work-object/README.md#author) (alias `author.id`)
* [`authorships.author.orcid`](./work-object/README.md#author) (alias `author.orcid`)
* [`authorships.countries`](./work-object/authorship-object.md#countries)
* [`authorships.institutions.country_code`](./work-object/README.md#institutions) (alias `institutions.country_code`)
* [`authorships.institutions.continent`](./filter-works.md#authorships.institutions.continent-alias-institutions.continent) (alias `institutions.continent`)
* [`authorships.institutions.is_global_south`](./filter-works.md#authorships.institutions.is_global_south-alias-institutions.is_global_south)
* [`authorships.institutions.id`](./work-object/README.md#institutions) (alias `institutions.id`)
* [`authorships.institutions.lineage`](./work-object/authorship-object.md#institutions)
* [`authorships.institutions.ror`](./work-object/README.md#institutions) (alias `institutions.ror`)
* [`authorships.institutions.type`](./work-object/README.md#institutions) (alias `institutions.type`)
* [`authorships.is_corresponding`](./work-object/authorship-object.md#is_corresponding) (alias `is_corresponding`): marks whether or not we have corresponding author information for a given work
* [`apc_list.value`](./work-object/README.md#apc_list)
* [`apc_list.currency`](./work-object/README.md#apc_list)
* [`apc_list.provenance`](./work-object/README.md#apc_list)
* [`apc_list.value_usd`](./work-object/README.md#apc_list)
* [`apc_paid.value`](./work-object/README.md#apc_paid)
* [`apc_paid.currency`](./work-object/README.md#apc_paid)
* [`apc_paid.provenance`](./work-object/README.md#apc_paid)
* [`apc_paid.value_usd`](./work-object/README.md#apc_paid)
* [`best_oa_location.is_accepted`](./work-object/README.md#best_oa_location)
* [`best_oa_location.is_published`](./work-object/README.md#best_oa_location)
* [`best_oa_location.license`](./work-object/README.md#best_oa_location)
* [`best_oa_location.source.host_organization`](./work-object/README.md#best_oa_location)
* [`best_oa_location.source.id`](./work-object/README.md#best_oa_location)
* [`best_oa_location.source.is_in_doaj`](./work-object/README.md#best_oa_location)
* [`best_oa_location.source.issn`](./work-object/README.md#best_oa_location)
* [`best_oa_location.source.type`](./work-object/README.md#best_oa_location)
* [`best_oa_location.version`](./work-object/README.md#best_oa_location)
* [`best_open_version`](./filter-works.md#best_open_version)
* [`cited_by_count`](./work-object/README.md#cited_by_count)
* [`cites`](./filter-works.md#cites)
* [`concepts_count`](./filter-works.md#concepts_count)
* [`concepts.id`](./work-object/README.md#concepts)
* [`concepts.wikidata`](./work-object/README.md#concepts)
* [`corresponding_author_ids`](./work-object/README.md#corresponding_author_ids)
* [`corresponding_institution_ids`](./work-object/README.md#corresponding_institution_ids)
* [`countries_distinct_count`](./work-object/README.md#countries_distinct_count)
* [`fulltext_origin`](./work-object/README.md#fulltext_origin)
* [`grants.award_id`](./work-object/README.md#grants)
* [`grants.funder`](./work-object/README.md#grants)
* [`has_abstract`](./filter-works.md#has_abstract)
* [`has_doi`](./filter-works.md#has_doi)
* [`has_fulltext`](./work-object/README.md#has_fulltext)
* [`has_orcid`](./filter-works.md#has_orcid)
* [`has_pmid`](./filter-works.md#has_pmid)
* [`has_pmcid`](./filter-works.md#has_pmcid)
* [`has_ngrams`](./filter-works.md#has_ngrams) (DEPRECATED)
* [`has_references`](./filter-works.md#has_references)
* [`is_retracted`](./work-object/README.md#is_retracted)
* [`is_paratext`](./work-object/README.md#is_paratext)
* [`journal`](./filter-works.md#journal)
* [`keywords.keyword`](./work-object/README.md#keywords)
* [`language`](./work-object/README.md#language)
* [`locations.is_accepted`](./work-object/README.md#locations)
* [`locations.is_published`](./work-object/README.md#locations)
* [`locations.source.host_institutions_lineage`](./filter-works.md#locations.source.host_institution_lineage)
* [`locations.source.is_in_doaj`](./work-object/README.md#locations)
* [`locations.source.publisher_lineage`](./filter-works.md#locations.source.publisher_lineage)
* [`locations_count`](./work-object/README.md#locations_count)
* [`open_access.any_repository_has_fulltext`](./work-object/README.md#open_access)
* [`open_access.is_oa`](./work-object/README.md#is_oa-1) (alias `is_oa`)
* [`open_access.oa_status`](./work-object/README.md#oa_status) (alias `oa_status`)
* [`primary_location.is_accepted`](./work-object/README.md#primary_location)
* [`primary_location.is_oa`](./work-object/README.md#primary_location)
* [`primary_location.is_published`](./work-object/README.md#primary_location)
* [`primary_location.license`](./work-object/README.md#primary_location)
* [`primary_location.source.has_issn`](./work-object/README.md#primary_location)
* [`primary_location.source.host_organization`](./work-object/README.md#primary_location)
* [`primary_location.source.id`](./work-object/README.md#primary_location)
* [`primary_location.source.is_in_doaj`](./work-object/README.md#primary_location)
* [`primary_location.source.issn`](./work-object/README.md#primary_location)
* [`primary_location.source.publisher_lineage`](./filter-works.md#primary_location.source.publisher_lineage)
* [`primary_location.source.type`](./work-object/README.md#primary_location)
* [`primary_location.version`](./work-object/README.md#primary_location)
* [`publication_year`](./work-object/README.md#publication_year)
* [`repository`](./filter-works.md#repository)
* [`sustainable_development_goals.id`](./work-object/README.md#sustainable_development_goals)
* [`type`](./work-object/README.md#type)
* [`type_crossref`](./work-object/README.md#type_crossref)
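Several attributes above accept a shorter alias. If you cache or compare group-by results, mapping each alias to its full name keeps `oa_status` and `open_access.oa_status` from being treated as different queries. A minimal sketch built only from the aliases listed above; `canonical_group_by` is a hypothetical helper:

```python
# Alias -> full attribute name, taken from the list above.
GROUP_BY_ALIASES = {
    "author.id": "authorships.author.id",
    "author.orcid": "authorships.author.orcid",
    "institutions.country_code": "authorships.institutions.country_code",
    "institutions.continent": "authorships.institutions.continent",
    "institutions.id": "authorships.institutions.id",
    "institutions.ror": "authorships.institutions.ror",
    "institutions.type": "authorships.institutions.type",
    "is_corresponding": "authorships.is_corresponding",
    "is_oa": "open_access.is_oa",
    "oa_status": "open_access.oa_status",
}


def canonical_group_by(attribute: str) -> str:
    """Return the full attribute name for an alias; other names pass through."""
    return GROUP_BY_ALIASES.get(attribute, attribute)
```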