# Discovering content of interest in the Data Observatory

The Discovery API is a tool for exploring the datasets available in our data lake. Its methods let you navigate the datasets and their properties, so you can tell in advance which sources may be of interest before you request access to them.

## Catalog: the first step for discovery

The Catalog class is the starting point for discovery. For example, it lets you get the complete list of categories related to the available datasets.


### Get the list of categories

In [None]:
from cartoframes.data.observatory import Catalog
from cartoframes.data.observatory import Category

catalog = Catalog()
categories = catalog.categories

categories

In [None]:
isinstance(categories[0], Category)

We can also obtain the categories as a pandas DataFrame:

In [None]:
categories_df = categories.to_dataframe()
categories_df

In [None]:
catalog.country('usa').category('demographics').datasets.get('od_acs_13345497')

### Filter one category 

The list of categories can be indexed by position, so we can pick a single category from it and explore it on its own.

In [None]:
filtered_category = categories[0]
filtered_category

In [None]:
isinstance(filtered_category, Category)

We can obtain the category as a pandas Series:

In [None]:
category_series = filtered_category.to_series()
category_series


In [None]:
import pandas as pd

isinstance(category_series, pd.Series)

## Explore a particular category

If we already know that a particular category is present in the Catalog, we can retrieve it directly by its id.

In [None]:
category1 = catalog.categories.get('demographics')

category1

In [None]:
isinstance(category1, Category)

And we can access its different properties:

In [None]:
category1.name

### Get the datasets for that category

Once we have a Category we can use the discovery methods to get the datasets related to that category.

In [None]:
demographics_datasets = category1.datasets
demographics_datasets

In [None]:
from cartoframes.data.observatory import Dataset

Dataset.get('od_acs_181619a3')

Again, we can export the data as a pandas DataFrame:

In [None]:
demographics_datasets.to_dataframe().iloc[0]

As with a Category, we can access the properties of a Dataset:

In [None]:
d1 = demographics_datasets[0]

d1.country

In [None]:
vars1 = d1.variables
vars1

In [None]:
from cartoframes.data.observatory import Variable

Variable.get('mortgaged_housi_1741ccf')

In [None]:
Variable('households_publ_14c36c88','None')

In [None]:
from cartoframes.data.observatory import Variable

isinstance(vars1[0], Variable)

In [None]:
Dataset.get('od_acs_181619a3').to_dict()

If we have a list of ids or slugs, we can fetch all of them from the catalog in a single call:

In [None]:
Dataset.get_list(['od_acs_181619a3', 'od_acs_13345497'])
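
When several entities come back at once, it can help to compare only a few of their fields side by side. The following is a minimal sketch, not part of cartoframes: `pick_fields` is a hypothetical helper, and it assumes each entity exposes `to_dict()` as used above, that `to_dict()` returns a plain dict, and that the result of `get_list` can be iterated. The field names you pass in (for example `'id'` or `'name'`) are assumptions; check the actual keys returned by `to_dict()` first.

```python
def pick_fields(entities, fields):
    """Return one dict per entity, keeping only the requested keys.

    Hypothetical helper: it relies only on each entity having a
    to_dict() method that returns a plain dict.
    """
    rows = []
    for entity in entities:
        data = entity.to_dict()
        # Keys that are absent become None instead of raising KeyError.
        rows.append({field: data.get(field) for field in fields})
    return rows
```

Under those assumptions, a call such as `pick_fields(Dataset.get_list(['od_acs_181619a3', 'od_acs_13345497']), ['id', 'name'])` would return one small dict per dataset.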

## Navigate through the catalog with filters

It is possible to navigate the catalog by adding nested filters that will be applied when requesting lists of entities.

For example, we can add a country filter before requesting the list of datasets and then we will only obtain the datasets from that country:

In [None]:
from cartoframes.data.observatory import Catalog

catalog = Catalog()

In [None]:
catalog.country('esp').datasets

Applied filters are saved in that catalog instance, so if we now add a new filter the list of datasets will be affected by both filters:

In [None]:
catalog.category('demographics').datasets

We can also nest the filters in the same call with the same result:

In [None]:
catalog2 = Catalog()
catalog2.country('esp').category('demographics').datasets

To remove all past filters, we simply call the method to clear them:

In [None]:
catalog.clear_filters()
catalog.datasets
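
The way filters accumulate on one instance until they are cleared can be pictured with a small stand-in. This is only an illustrative sketch: `MiniCatalog`, its sample records, and their keys are made up for this example and say nothing about how cartoframes implements the Catalog internally.

```python
class MiniCatalog:
    """Hypothetical stand-in for illustration; not the cartoframes Catalog."""

    def __init__(self, datasets):
        # Each dataset is a plain dict with made-up keys.
        self._datasets = list(datasets)
        self._filters = {}

    def country(self, country_id):
        self._filters['country'] = country_id
        return self  # returning self is what allows chained calls

    def category(self, category_id):
        self._filters['category'] = category_id
        return self

    def clear_filters(self):
        self._filters = {}

    @property
    def datasets(self):
        # Keep only the datasets that match every stored filter.
        return [
            d['id'] for d in self._datasets
            if all(d[key] == value for key, value in self._filters.items())
        ]


mini = MiniCatalog([
    {'id': 'a', 'country': 'esp', 'category': 'demographics'},
    {'id': 'b', 'country': 'esp', 'category': 'environmental'},
    {'id': 'c', 'country': 'usa', 'category': 'demographics'},
])

spain_only = mini.country('esp').datasets                  # ['a', 'b']
spain_demographics = mini.category('demographics').datasets  # ['a']
mini.clear_filters()
everything = mini.datasets                                 # ['a', 'b', 'c']
```

The second query returns only `'a'` because the country filter set by the first call is still stored on the instance.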

Nested filters can also be applied when requesting countries, categories, or geographies:

In [None]:
catalog.clear_filters()
catalog.category('demographics').countries

In [None]:
catalog.clear_filters()
catalog.country('esp').categories

In [None]:
catalog.clear_filters()
catalog.country('usa').category('demographics').geographies

And we can use all the filters at the same time:

In [None]:
catalog.clear_filters()
catalog.country('usa').category('demographics').geography('ags_blockgroup_1c63771c').datasets

In [None]:
Catalog().country('usa').category('demographics').geography('od_countyclipp_caef1ec9').datasets
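
Because every example above starts by clearing the filters, the pattern can be wrapped in a small function. The sketch below is a hypothetical convenience helper, not part of cartoframes. It uses only the calls shown in this notebook (`clear_filters`, `country`, `category`, `geography`, and `datasets`) and assumes that each filter method returns an object that can be chained, as in the cells above.

```python
def filtered_datasets(catalog, country=None, category=None, geography=None):
    """Clear old filters, apply the given ones, and return the datasets.

    Hypothetical helper: `catalog` is expected to behave like the
    Catalog instances used in this notebook.
    """
    catalog.clear_filters()  # start from a clean state
    query = catalog
    if country is not None:
        query = query.country(country)
    if category is not None:
        query = query.category(category)
    if geography is not None:
        query = query.geography(geography)
    return query.datasets
```

For instance, `filtered_datasets(Catalog(), country='usa', category='demographics')` is intended to match the chained call `Catalog().country('usa').category('demographics').datasets`.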