@jm-rivera
Contributor

This PR introduces the DataCommonsClient class to interact with the Data Commons API, along with various utility functions and tests.

DataCommonsClient

The DataCommonsClient class would be the main entry point for most users. As described in the design doc:

The datacommons entrypoint would coordinate interactions with and between classes.
It would simply provide convenient access to the different endpoints, centralising where the api_key and dc_instance are provided, and binding a specific ‘client’ instance to a specific knowledge graph and API key.
It will also add convenient methods to perform additional, higher-level interactions with the data/api. For example, it can implement a method to get data as a nicely structured Pandas DataFrame (if the optional dependency is installed).

This implementation achieves those objectives, but it shouldn't be seen as the end state of the class. The ultimate aim is to provide a range of convenience tools and functionality. For now, I wanted to keep this PR small and focused on defining how it will all come together, including clearly showcasing how the optional Pandas dependency will work.
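To make the "single entry point" idea concrete, here is a minimal sketch of the pattern, not the code in this PR. `ConnectionConfig`, `Endpoint`, and `SketchClient` are hypothetical names invented for illustration, and no network calls are made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnectionConfig:
    """Hypothetical holder for the settings shared by every endpoint."""

    dc_instance: str
    api_key: Optional[str] = None


class Endpoint:
    """Hypothetical endpoint wrapper; a real one would issue HTTP requests."""

    def __init__(self, name: str, config: ConnectionConfig) -> None:
        self.name = name
        self.config = config

    def describe(self) -> str:
        # Report which instance this endpoint is bound to (no network call).
        return f"{self.name} endpoint on {self.config.dc_instance}"


class SketchClient:
    """Facade: configuration is supplied once and shared by all endpoints."""

    def __init__(self, dc_instance: str, api_key: Optional[str] = None) -> None:
        config = ConnectionConfig(dc_instance=dc_instance, api_key=api_key)
        # Every endpoint receives the same config object.
        self.node = Endpoint("node", config)
        self.observation = Endpoint("observation", config)


client = SketchClient(dc_instance="datacommons.org", api_key="api-key-string")
print(client.node.describe())
```

Because each endpoint holds the same config object, the instance and API key only have to be provided in one place.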

For example, connecting to base DC:

from datacommons_client.client import DataCommonsClient

client = DataCommonsClient(dc_instance="datacommons.org", api_key="api-key-string")

Using that client, we can get all property labels for a given node (docs example):

response = client.node.fetch_property_labels(node_dcids="geoId/06", out=False)

This returns a NodeResponse object wrapping the API response:

NodeResponse(
data={'geoId/06': Properties(properties=['affectedPlace',
                                                      'containedInPlace',
                                                      'location',
                                                      'member',
                                                      'overlapsWith'])},
nextToken=None
)

The response can also produce a flattened list of properties for each entity:

properties = response.get_properties()

{'geoId/06': ['affectedPlace',
              'containedInPlace',
              'location',
              'member',
              'overlapsWith']}
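The flattening step can be pictured with a small sketch. `SketchProperties` and `SketchNodeResponse` are simplified stand-ins written for this illustration, assuming only the field names visible in the printed response above; the real classes may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchProperties:
    """Stand-in for the per-node container shown in the response above."""

    properties: List[str] = field(default_factory=list)


@dataclass
class SketchNodeResponse:
    """Stand-in response: a mapping of node DCID to its properties."""

    data: Dict[str, SketchProperties] = field(default_factory=dict)
    nextToken: Optional[str] = None

    def get_properties(self) -> Dict[str, List[str]]:
        # Unwrap each container into a plain list keyed by node DCID.
        return {dcid: list(p.properties) for dcid, p in self.data.items()}


sketch_response = SketchNodeResponse(
    data={
        "geoId/06": SketchProperties(
            properties=[
                "affectedPlace",
                "containedInPlace",
                "location",
                "member",
                "overlapsWith",
            ]
        )
    }
)
flat = sketch_response.get_properties()
print(flat)
```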

As another example, getting statistical observations as a Pandas DataFrame from a custom DC instance:

client = DataCommonsClient(dc_instance="datacommons.one.org")

response = client.observations_dataframe(
    variable_dcids="sdg/SI_POV_DAY1",
    date="latest",
    entity_dcids=["country/GTM", "country/NGA"],
)

That will return a pandas DataFrame shaped as follows:

| | date | entity | variable | value | facetId | importName | measurementMethod | observationPeriod | provenanceUrl | unit |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018 | country/NGA | sdg/SI_POV_DAY1 | 30.9 | 3549866825 | UN_SDG | SDG_G_G | | https://unstats.un.org/sdgs/dataportal | SDG_PERCENT |
| 1 | 2014 | country/GTM | sdg/SI_POV_DAY1 | 9.5 | 3549866825 | UN_SDG | SDG_G_G | | https://unstats.un.org/sdgs/dataportal | SDG_PERCENT |
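One way to picture how nested observations become flat rows is the sketch below. The nested input shape (variable, then entity, then a list of observations) is an assumption made for illustration, and `to_records` is a hypothetical helper, not a function from this PR. Only the date and value columns are reproduced, using the two rows shown above.

```python
from typing import Any, Dict, List

# Assumed (hypothetical) nested shape: variable -> entity -> observations.
nested: Dict[str, Dict[str, List[Dict[str, Any]]]] = {
    "sdg/SI_POV_DAY1": {
        "country/NGA": [{"date": "2018", "value": 30.9}],
        "country/GTM": [{"date": "2014", "value": 9.5}],
    }
}


def to_records(data: Dict[str, Dict[str, List[Dict[str, Any]]]]) -> List[Dict[str, Any]]:
    """Flatten the nested mapping into one dict per observation."""
    rows: List[Dict[str, Any]] = []
    for variable, by_entity in data.items():
        for entity, observations in by_entity.items():
            for obs in observations:
                rows.append(
                    {
                        "date": obs["date"],
                        "entity": entity,
                        "variable": variable,
                        "value": obs["value"],
                    }
                )
    return rows


records = to_records(nested)
for row in records:
    print(row)
```

A list of dicts like `records` can be passed directly to `pandas.DataFrame(records)` when pandas is installed, which is why the sketch itself needs no pandas import.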

We could also get all data for African countries for a specific indicator:

response = client.observations_dataframe(
    variable_dcids="sdg/SI_POV_DAY1",
    date="all",
    entity_type="Country",
    parent_entity="africa",
)

response.head(5)
| | date | entity | variable | value | facetId | importName | measurementMethod | observationPeriod | provenanceUrl | unit |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1996 | country/CMR | sdg/SI_POV_DAY1 | 50.4 | 3549866825 | UN_SDG | SDG_G_G | | https://unstats.un.org/sdgs/dataportal | SDG_PERCENT |
| 1 | 2001 | country/CMR | sdg/SI_POV_DAY1 | 25.7 | 3549866825 | UN_SDG | SDG_G_G | | https://unstats.un.org/sdgs/dataportal | SDG_PERCENT |
| 2 | 2007 | country/CMR | sdg/SI_POV_DAY1 | 31.4 | 3549866825 | UN_SDG | SDG_G_G | | https://unstats.un.org/sdgs/dataportal | SDG_PERCENT |
| 3 | 2014 | country/CMR | sdg/SI_POV_DAY1 | 25.7 | 3549866825 | UN_SDG | SDG_G_G | | https://unstats.un.org/sdgs/dataportal | SDG_PERCENT |
| 4 | 1992 | country/CAF | sdg/SI_POV_DAY1 | 82.2 | 3549866825 | UN_SDG | SDG_G_G | | https://unstats.un.org/sdgs/dataportal | SDG_PERCENT |

Many more convenience features are possible, but this PR does not include them; they will be the subject of future PRs.

These examples only scratch the surface, to showcase the idea. Let me know if more would be helpful!

Adds convenience method to be able to get data for specific `entity_dcids` for specific `variable_dcids` for specific dates
Adds a decorator that can be added to any methods or functions that require Pandas to work.
Adds return types
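
The Pandas decorator mentioned in the commits above could look roughly like the following. This is a sketch of the general optional-dependency pattern rather than the code merged here; `requires_module` and the two demo functions are hypothetical names, and the demo uses stdlib `json` plus a deliberately missing module so it runs without pandas.

```python
import functools
import importlib.util
from typing import Any, Callable


def requires_module(module_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Build a decorator that fails early if a top-level module is missing."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # find_spec returns None when a top-level module cannot be found.
            if importlib.util.find_spec(module_name) is None:
                raise ImportError(
                    f"'{module_name}' is required to call {func.__name__}(); "
                    "install it to use this feature."
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires_module("json")
def dump_ok() -> str:
    import json  # imported lazily, only when the function is called

    return json.dumps({"ok": True})


@requires_module("module_that_is_not_installed_xyz")
def needs_missing() -> str:
    return "unreachable"


print(dump_ok())
try:
    needs_missing()
except ImportError as exc:
    print(f"ImportError: {exc}")
```

Checking at call time rather than import time keeps the package importable without the optional dependency; only the decorated methods raise.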
@jm-rivera jm-rivera requested a review from hqpho February 12, 2025 01:42
@jm-rivera jm-rivera self-assigned this Feb 12, 2025
@jm-rivera jm-rivera requested a review from keyurva February 12, 2025 18:55
@kmoscoe kmoscoe requested review from kmoscoe and removed request for kmoscoe February 13, 2025 17:30
Collaborator

@hqpho hqpho left a comment

Love the example calls in the comments and PR description!!

@jm-rivera jm-rivera merged commit 56173fc into datacommonsorg:master Feb 14, 2025
2 checks passed
@jm-rivera jm-rivera deleted the add-client-class branch February 14, 2025 14:42