# Discovering content of interest in the Data Observatory

The Discovery API lets you explore the datasets available in our data lake. Its methods let you navigate the datasets and their properties, so you can tell in advance which sources may be of interest before even requesting access to them.

## Catalog: the first step for discovery

The Catalog class is the starting point for discovery. For example, it lets you get the complete list of countries covered by the available datasets.


### Get the list of countries

In [1]:
from cartoframes.data.observatory.catalog import Catalog
from cartoframes.data.observatory.country import Country, Countries

countries = Catalog.countries()

countries

[{'country_iso_code3': 'spain'}, {'country_iso_code3': 'usa'}]

In [3]:
isinstance(countries, Countries)

True
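The printed result suggests that each country is a record keyed by `country_iso_code3`. As a minimal sketch, assuming a plain list of dicts as a stand-in for the `Countries` object (the real class is not used here and may behave differently), the identifiers can be collected like this:

```python
# Stand-in for the Catalog.countries() output shown above: a plain list of
# dicts. This is an assumption for illustration, not the real Countries class.
country_records = [{'country_iso_code3': 'spain'}, {'country_iso_code3': 'usa'}]

# Collect the identifier of every country record.
country_ids = [record['country_iso_code3'] for record in country_records]
print(country_ids)
```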

We can also obtain the countries as a pandas DataFrame:

In [4]:
countries_df = countries.to_dataframe()
countries_df

  country_iso_code3
0             spain
1               usa


In [5]:
import pandas as pd

isinstance(countries_df, pd.DataFrame)

True

### Filter one country 

The list of countries can be indexed like a regular Python list, so we can pick a single country by its position.

In [6]:
filtered_country = countries[0]
filtered_country

<cartoframes.data.observatory.country.Country at 0x7fadb64abcc0>

In [7]:
isinstance(filtered_country, Country)

True

We can obtain the country as a pandas Series:

In [8]:
country_series = filtered_country.to_series()
country_series


country_iso_code3    spain
dtype: object

In [9]:
import pandas as pd

isinstance(country_series, pd.Series)

True

## Get a particular country

If we already know that a particular country is present in the Catalog, we can retrieve it directly by its id.

In [2]:
country1 = Catalog.countries().get('spain')

country1

<cartoframes.data.observatory.country.Country at 0x7f478a628e80>

In [11]:
isinstance(country1, Country)

True
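Looking up a record by id can be pictured with plain Python. The sketch below uses a hypothetical helper, `find_country`, over a list of dicts that mirrors the output shown earlier. It is not the implementation of `get`, and this notebook does not show how the real `get` behaves for an unknown id, so returning `None` here is an assumption.

```python
from typing import Optional

# Stand-in records mirroring the countries output shown earlier (assumption).
country_records = [{'country_iso_code3': 'spain'}, {'country_iso_code3': 'usa'}]


def find_country(records: list, country_id: str) -> Optional[dict]:
    """Hypothetical helper: return the first record with a matching id, else None."""
    for record in records:
        if record['country_iso_code3'] == country_id:
            return record
    return None


print(find_country(country_records, 'spain'))
print(find_country(country_records, 'france'))
```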

### Get the datasets for that country

Once we have a Country we can use the discovery methods to get the datasets related to that country.

In [4]:
esp_datasets = country1.datasets()
esp_datasets

[{'id': 'carto-do.bbva.financial_destinationdistribution_spain_zipcodes_2017_monthly_2017',
  'name': 'Destination Distribution',
  'description': None,
  'provider_id': 'bbva',
  'category_id': 'financial',
  'data_source_id': 'destinationdistribution',
  'country_iso_code3': 'spain',
  'language_iso_code3': None,
  'geography_id': 'carto-do.bbva.geography_spain_zipcodes_2017',
  'temporal_aggregation': None,
  'time_coverage': None,
  'update_frequency': None,
  'version': '2017',
  'is_public_data': None,
  'summary_jsonb': {'counts': {'rows': 465569,
    'cells': 7449104,
    'null_cells': 23402,
    'null_cells_percent': 0.31415858873765223},
   'glimpses': {'head': {'Txs': [27214,
      27214,
      27214,
      27214,
      27214,
      27214,
      27214,
      27214,
      27214,
      27214],
     'Date': ['2017-01',
      '2017-01',
      '2017-01',
      '2017-01',
      '2017-01',
      '2017-01',
      '2017-01',
      '2017-01',
      '2017-01',
      '2017-01'],
     'Z

In [None]:
from cartoframes.data.observatory.dataset import Datasets

isinstance(esp_datasets, Datasets)
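The `counts` entry in `summary_jsonb` gives a quick sense of a dataset's size and completeness. The relationships below are inferred from the numbers printed above rather than taken from documentation, so treat them as a sketch: the cell count appears to be rows times columns, and `null_cells_percent` appears to be the share of null cells expressed as a percentage.

```python
import math

# Counts copied from the summary_jsonb output shown above.
counts = {
    'rows': 465569,
    'cells': 7449104,
    'null_cells': 23402,
    'null_cells_percent': 0.31415858873765223,
}

# Inferred relationship: cells = rows * columns.
columns = counts['cells'] // counts['rows']

# Inferred relationship: percentage of cells that are null.
null_percent = counts['null_cells'] / counts['cells'] * 100

print(columns)
print(math.isclose(null_percent, counts['null_cells_percent'], rel_tol=1e-9))
```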

Again, we can export the data as a pandas DataFrame. Here we take its first row:

In [6]:
d1 = esp_datasets.to_dataframe().iloc[0]
d1

id                      carto-do.bbva.financial_destinationdistributio...
name                                             Destination Distribution
description                                                          None
provider_id                                                          bbva
category_id                                                     financial
data_source_id                                    destinationdistribution
country_iso_code3                                                   spain
language_iso_code3                                                   None
geography_id                  carto-do.bbva.geography_spain_zipcodes_2017
temporal_aggregation                                                 None
time_coverage                                                        None
update_frequency                                                     None
version                                                              2017
is_public_data                        

Just like a Country, a Dataset can be used to retrieve related entities, such as its variables:

In [7]:
vars1 = esp_datasets[0].variables()
vars1

[{'id': 'carto-do.bbva.financial_destinationdistribution_spain_zipcodes_2017_monthly_2017.Date',
  'name': 'Date',
  'description': 'The date the data refers to (YYYY-MM format for month and YYYY-MM-DD for day).',
  'column_name': 'Date',
  'db_type': 'STRING',
  'dataset_id': 'carto-do.bbva.financial_destinationdistribution_spain_zipcodes_2017_monthly_2017',
  'agg_method': None,
  'variable_group_id': None,
  'starred': False,
  'summary_jsonb': {'head': ['2017-01',
    '2017-01',
    '2017-01',
    '2017-01',
    '2017-01',
    '2017-01',
    '2017-01',
    '2017-01',
    '2017-01',
    '2017-01'],
   'tail': ['2017-12',
    '2017-12',
    '2017-12',
    '2017-12',
    '2017-12',
    '2017-12',
    '2017-12',
    '2017-12',
    '2017-12',
    '2017-12'],
   'counts': {'all': 465569,
    'null': 0,
    'distinct': 12,
    'null_percent': 0,
    'distinct_percent': 0.0025774911989415103},
   'top_values': [{'count': 40826, 'value': '2017-08'},
    {'count': 40320, 'value': '2017-07'},

In [8]:
from cartoframes.data.observatory.variable import Variable, Variables

isinstance(vars1, Variables)

True
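In the output above, the variable `id` looks like the `dataset_id` followed by a dot and the `column_name`. Assuming that pattern holds in general (it is inferred from this single record, not from documentation), a variable id can be split back into its parts as sketched here:

```python
# Variable id copied from the output shown above.
variable_id = (
    'carto-do.bbva.financial_destinationdistribution_'
    'spain_zipcodes_2017_monthly_2017.Date'
)

# Split on the last dot only, because the dataset id itself contains dots.
dataset_id, column_name = variable_id.rsplit('.', 1)

print(dataset_id)
print(column_name)
```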