
This module implements functions to discover the data exposed by ISTAT. To do so, `istatapi` makes metadata requests to the API endpoints. The `Discovery` module provides methods to parse and analyze the API's metadata responses. It uses `pandas` and returns data as `DataFrame` objects, which makes it convenient for interactive and exploratory analysis in Jupyter Notebooks.

The main class implemented in the `Discovery` module is [`DataSet`](https://Attol8.github.io/istatapi/discovery.html#dataset).


---

[source](https://github.com/Attol8/istatapi/blob/master/istatapi/discovery.py#L13){target="_blank" style="float:right; font-size:smaller"}

### parse_dataflows

>      parse_dataflows (response)

Parse the `response` containing all the available datasets and return a list of dataflows.
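To illustrate what parsing such a response involves, here is a minimal sketch that uses only the Python standard library. It assumes the response is an SDMX-ML 2.1 structure message (the XML format served by SDMX REST APIs such as ISTAT's). The `SAMPLE` XML, including its Italian name, and the function `parse_dataflows_sketch` are hypothetical and much simpler than a real ISTAT response; this is not the library's own implementation.

```python
import xml.etree.ElementTree as ET

# SDMX-ML 2.1 namespaces (assumed; a real ISTAT response declares more of them).
NS = {
    "mes": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message",
    "str": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure",
    "com": "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common",
}
# ElementTree exposes the xml:lang attribute under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, heavily simplified response describing a single dataflow.
SAMPLE = """<mes:Structure
    xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:str="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
    xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common">
  <mes:Structures>
    <str:Dataflows>
      <str:Dataflow id="101_1015" agencyID="IT1" version="1.3">
        <com:Name xml:lang="it">Coltivazioni</com:Name>
        <com:Name xml:lang="en">Crops</com:Name>
        <str:Structure>
          <Ref id="DCSP_COLTIVAZIONI" version="1.3" agencyID="IT1"/>
        </str:Structure>
      </str:Dataflow>
    </str:Dataflows>
  </mes:Structures>
</mes:Structure>"""


def parse_dataflows_sketch(xml_text: str, lang: str = "en") -> list:
    """Return one dict per Dataflow element, using the columns of all_available()."""
    root = ET.fromstring(xml_text)
    dataflows = []
    for flow in root.iter(f"{{{NS['str']}}}Dataflow"):
        # Map each language code to the dataflow's name in that language.
        names = {n.get(XML_LANG): n.text for n in flow.findall("com:Name", NS)}
        # The referenced data structure definition provides df_structure_id.
        ref = flow.find("str:Structure/Ref", NS)
        dataflows.append({
            "df_id": flow.get("id"),
            "version": flow.get("version"),
            "df_description": names.get(lang),
            "df_structure_id": ref.get("id") if ref is not None else None,
        })
    return dataflows


print(parse_dataflows_sketch(SAMPLE))
```

Selecting the English name by default mirrors the English descriptions shown in the tables on this page.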

The simplest way to get a full list of the dataflows provided by ISTAT is to call `all_available()`, which returns all the explorable dataflows together with their IDs and descriptions. By default (`dataframe=True`) the result is a pandas `DataFrame`.


---

[source](https://github.com/Attol8/istatapi/blob/master/istatapi/discovery.py#L46){target="_blank" style="float:right; font-size:smaller"}

### all_available

>      all_available (dataframe=True)

Return all available dataflows

```python
from istatapi.discovery import all_available

available_datasets = all_available()
available_datasets.head()
```

|   | df_id | version | df_description | df_structure_id |
|---|---|---|---|---|
| 0 | 101_1015 | 1.3 | Crops | DCSP_COLTIVAZIONI |
| 1 | 101_1030 | 1.0 | PDO, PGI and TSG quality products | DCSP_DOPIGP |
| 2 | 101_1033 | 1.0 | slaughtering | DCSP_MACELLAZIONI |
| 3 | 101_1039 | 1.2 | Agritourism - municipalities | DCSP_AGRITURISMO_COM |
| 4 | 101_1077 | 1.0 | PDO, PGI and TSG products: operators - munici... | DCSP_DOPIGP_COM |


```python
print(f'number of available datasets: {len(available_datasets)}')
```

```
number of available datasets: 509
```


```python
from fastcore.test import test_eq, test_fail

test_eq(available_datasets.columns, ['df_id', 'version', 'df_description', 'df_structure_id'])
```


---

[source](https://github.com/Attol8/istatapi/blob/master/istatapi/discovery.py#L59){target="_blank" style="float:right; font-size:smaller"}

### search_dataset

>      search_dataset (keyword)

Search available dataflows that contain `keyword`. Return these dataflows in a DataFrame

This function looks for `keyword` inside the descriptions of all datasets. By default, `keyword` needs to be an English word: an Italian keyword such as `"disoccupazione"` raises an error.
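The output below suggests that the match is case-insensitive ("Tax" also matches "tax rates" and "taxable"). A minimal sketch of such a search over plain records, assuming a substring match on the lower-cased description: `search_dataset_sketch` and `DATAFLOWS` are hypothetical (the records reuse rows shown on this page), and the `ValueError` is an assumption, since the source only shows that an unmatched keyword fails.

```python
def search_dataset_sketch(dataflows: list, keyword: str) -> list:
    """Return the records whose df_description contains keyword, ignoring case."""
    kw = keyword.lower()
    matches = [df for df in dataflows if kw in df["df_description"].lower()]
    if not matches:
        # Assumed failure mode: raise instead of returning an empty result.
        raise ValueError(f"no dataflow description contains {keyword!r}")
    return matches


# Hypothetical records with the same four fields as the all_available() table.
DATAFLOWS = [
    {"df_id": "101_1015", "version": "1.3", "df_description": "Crops",
     "df_structure_id": "DCSP_COLTIVAZIONI"},
    {"df_id": "30_1008", "version": "1.1",
     "df_description": "Irpef taxable incomes (Ipef) - municipalities",
     "df_structure_id": "MEF_REDDITIIRPEF_COM"},
]

print([df["df_id"] for df in search_dataset_sketch(DATAFLOWS, "Tax")])  # ['30_1008']

try:
    search_dataset_sketch(DATAFLOWS, "disoccupazione")
except ValueError as err:
    print(err)  # no dataflow description contains 'disoccupazione'
```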

```python
from istatapi.discovery import search_dataset

df = search_dataset(keyword="Tax")
df.head()
```

|   | df_id | version | df_description | df_structure_id |
|---|---|---|---|---|
| 168 | 168_261 | 1.1 | Hicp - at constant tax rates annual data(base ... | DCSP_IPCATC2 |
| 169 | 168_306 | 1.2 | Hicp - at constant tax rates monthly data (bas... | DCSP_IPCATC1 |
| 172 | 168_756 | 1.4 | Hicp - at constant tax rates monthly data (bas... | DCSP_IPCATC1B2015 |
| 173 | 168_757 | 1.1 | Hicp- at constant tax rates annual data (base ... | DCSP_IPCATC2B2015 |
| 267 | 30_1008 | 1.1 | Irpef taxable incomes (Ipef) - municipalities | MEF_REDDITIIRPEF_COM |


```python
test_fail(lambda: search_dataset(keyword="disoccupazione"))
```

## Data Structures and Information about available Datasets


---

[source](https://github.com/Attol8/istatapi/blob/master/istatapi/discovery.py#L73){target="_blank" style="float:right; font-size:smaller"}

### DataSet

>      DataSet (dataflow_identifier)

Class that implements methods to retrieve information (metadata) about a dataset.

The `dataflow_identifier` argument accepts a dataflow's `df_id`, `df_structure_id` or `df_description`. All three values can be found with the `all_available()` function.
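One way to accept any of the three identifiers is a lookup over the catalogue of dataflows that returns the full matching record. The sketch below assumes exactly that; `resolve_identifiers` and `CATALOGUE` are hypothetical, and the record values are taken from the tests below (including the double spaces in the ISTAT descriptions).

```python
# Hypothetical catalogue entries, reusing values checked in the tests below.
CATALOGUE = [
    {"df_id": "151_914", "df_description": "Unemployment  rate",
     "df_structure_id": "DCCV_TAXDISOCCU1"},
    {"df_id": "22_289", "df_description": "Resident population  on 1st January",
     "df_structure_id": "DCIS_POPRES1"},
]


def resolve_identifiers(dataflow_identifier: str, catalogue: list) -> dict:
    """Return the record whose df_id, df_structure_id or df_description matches."""
    for record in catalogue:
        candidates = (record["df_id"], record["df_structure_id"], record["df_description"])
        if dataflow_identifier in candidates:
            return record
    raise ValueError(f"unknown dataflow: {dataflow_identifier!r}")


print(resolve_identifiers("DCIS_POPRES1", CATALOGUE)["df_id"])  # 22_289
```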

```python
from istatapi.discovery import DataSet

ds = DataSet(dataflow_identifier="151_914")
test_eq(ds.identifiers['df_id'], '151_914')
test_eq(ds.identifiers['df_description'], 'Unemployment  rate')
test_eq(ds.identifiers['df_structure_id'], 'DCCV_TAXDISOCCU1')
```

```python
ds2 = DataSet(dataflow_identifier="22_289")
test_eq(ds2.identifiers['df_id'], '22_289')
test_eq(ds2.identifiers['df_description'], 'Resident population  on 1st January')
test_eq(ds2.identifiers['df_structure_id'], 'DCIS_POPRES1')
```

```python
ds2.dimensions_info()
```

|   | dimension | dimension_ID | description |
|---|---|---|---|
| 0 | FREQ | CL_FREQ | Frequency |
| 1 | ETA | CL_ETA1 | Age class |
| 2 | ITTER107 | CL_ITTER107 | Territory |
| 3 | SESSO | CL_SEXISTAT1 | Gender |
| 4 | STACIVX | CL_STATCIV2 | Marital status |
| 5 | TIPO_INDDEM | CL_TIPO_DATO15 | Data type 15 |


We can also list the dimensions of a dataflow by accessing its `dimensions` attribute. However, this attribute does not include the dimensions' descriptions.
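The difference can be sketched with plain Python: a list of names such as the one `dimensions` holds carries no descriptions, which have to be joined in from per-dimension metadata to obtain rows like those returned by `dimensions_info()`. The lookup dict `codelists` and the function `dimensions_info_sketch` are hypothetical; the values are copied from the `ds2.dimensions_info()` output above.

```python
# Dimension names of dataflow 22_289, as listed in the output above.
dimensions = ["FREQ", "ETA", "ITTER107", "SESSO", "STACIVX", "TIPO_INDDEM"]

# Hypothetical metadata lookup: dimension -> (codelist ID, description).
codelists = {
    "FREQ": ("CL_FREQ", "Frequency"),
    "ETA": ("CL_ETA1", "Age class"),
    "ITTER107": ("CL_ITTER107", "Territory"),
    "SESSO": ("CL_SEXISTAT1", "Gender"),
    "STACIVX": ("CL_STATCIV2", "Marital status"),
    "TIPO_INDDEM": ("CL_TIPO_DATO15", "Data type 15"),
}


def dimensions_info_sketch(dimensions: list, codelists: dict) -> list:
    """Build one (dimension, dimension_ID, description) row per dimension name."""
    return [(dim, *codelists[dim]) for dim in dimensions]


for row in dimensions_info_sketch(dimensions, codelists):
    print(row)
```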


---

[source](https://github.com/Attol8/istatapi/blob/master/istatapi/discovery.py#L135){target="_blank" style="float:right; font-size:smaller"}

### DataSet.dimensions_info

>      DataSet.dimensions_info (dataframe=True, description=True)

Return the dimensions of a specific dataflow and their `descriptions`.

To look at the dimensions together with their descriptions, we can use the `dimensions_info()` method, which returns an easy-to-read pandas `DataFrame`.

```python
dimensions_df = ds.dimensions_info()
test_eq(dimensions_df.columns, ['dimension', 'dimension_ID', 'description'])
dimensions_df
```

|   | dimension | dimension_ID | description |
|---|---|---|---|
| 0 | FREQ | CL_FREQ | Frequency |
| 1 | CITTADINANZA | CL_CITTADINANZA | Citizenship |
| 2 | DURATA_DISOCCUPAZ | CL_DURATA | Duration |
| 3 | CLASSE_ETA | CL_ETA1 | Age class |
| 4 | ITTER107 | CL_ITTER107 | Territory |
| 5 | SESSO | CL_SEXISTAT1 | Gender |
| 6 | TIPO_DATO | CL_TIPO_DATO_FOL | Data type FOL |
| 7 | TITOLO_STUDIO | CL_TITOLO_STUDIO | Level of education |


We can also explore the values that each dimension can take. The `available_values` attribute is a dictionary whose keys are the dimensions of the dataset. Each value is itself a dictionary with two keys: `values_ids`, which holds the IDs of the dimension's values, and `values_description`, which holds the descriptions of those values.
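Because the IDs and descriptions are stored as two lists, looking up the description of a single value means finding its position in `values_ids`. The sketch below assumes the two lists are parallel (same order and length), which is consistent with the test and the `get_dimension_values` output on this page; `describe_value` is a hypothetical helper, and the excerpt of `available_values` is copied from the test below.

```python
# Excerpt of ds.available_values, copied from the test below.
available_values = {
    "DURATA_DISOCCUPAZ": {
        "values_ids": ["TOTAL", "M_GE12"],
        "values_description": ["total", "12 months and over"],
    },
}


def describe_value(available_values: dict, dimension: str, value_id: str) -> str:
    """Return the description of value_id within dimension."""
    entry = available_values[dimension]
    idx = entry["values_ids"].index(value_id)
    # Assumes IDs and descriptions are parallel lists, so the same index applies.
    return entry["values_description"][idx]


print(describe_value(available_values, "DURATA_DISOCCUPAZ", "M_GE12"))  # 12 months and over
```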

```python
values_dict = ds.available_values
test_eq(isinstance(values_dict, dict), True)
# list.sort() sorts in place and returns None, so compare sorted copies instead
test_eq(sorted(values_dict.keys()), sorted(ds.dimensions))
test_eq(values_dict['DURATA_DISOCCUPAZ']['values_ids'], ['TOTAL', 'M_GE12'])
test_eq(values_dict['DURATA_DISOCCUPAZ']['values_description'], ['total', '12 months and over'])
```


---

[source](https://github.com/Attol8/istatapi/blob/master/istatapi/discovery.py#L220){target="_blank" style="float:right; font-size:smaller"}

### DataSet.get_dimension_values

>      DataSet.get_dimension_values (dimension, dataframe=True)

Return the available values of a single `dimension` in the dataset

```python
ds.get_dimension_values('DURATA_DISOCCUPAZ')
```

|   | values_ids | values_description |
|---|---|---|
| 0 | TOTAL | total |
| 1 | M_GE12 | 12 months and over |



---

[source](https://github.com/Attol8/istatapi/blob/master/istatapi/discovery.py#L241){target="_blank" style="float:right; font-size:smaller"}

### DataSet.set_filters

>      DataSet.set_filters (**kwargs)

Set filters for the dimensions of the dataset by passing `dimension_name=value`.

With `DataSet.set_filters()` we can restrict the dimensions of the dataset to the values we pass in, either a single value or a list of values per dimension. The dataset will then only return data that matches these filters. The selected filters are stored as a dictionary in the `DataSet.filters` attribute.

**Note** that the keyword arguments of [`DataSet.set_filters`](https://Attol8.github.io/istatapi/discovery.html#dataset.set_filters) are written in lower case, but in `DataSet.filters` they are converted to upper case to match the dimension names used by the ISTAT API.
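The key conversion described in the note amounts to upper-casing each keyword argument's name while leaving its value untouched. A minimal sketch, where `normalize_filters` is a hypothetical helper and not part of `istatapi`:

```python
def normalize_filters(**kwargs) -> dict:
    """Upper-case dimension names; keep single values and lists of values as given."""
    return {name.upper(): value for name, value in kwargs.items()}


filters = normalize_filters(freq="M", tipo_dato=["ISAV", "ESAV"], paese_partner="WORLD")
print(filters)  # {'FREQ': 'M', 'TIPO_DATO': ['ISAV', 'ESAV'], 'PAESE_PARTNER': 'WORLD'}
```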

```python
dz = DataSet(dataflow_identifier="139_176")
dz.set_filters(freq="M", tipo_dato=["ISAV", "ESAV"], paese_partner="WORLD")

test_eq(dz.filters['FREQ'], 'M')
test_eq(dz.filters['TIPO_DATO'], ["ISAV", "ESAV"])
test_fail(lambda: dz.filters['freq'])  # the filter key is not stored in lower case
```
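SDMX REST APIs such as ISTAT's typically express filters like these as a data-query key: the selected values of each dimension, in the order defined by the data structure, separated by dots, with `+` joining multiple values and an empty position meaning "all values". The sketch below builds such a key from a filters dictionary. `build_sdmx_key` and the dimension order (including the placeholder `OTHER_DIM`) are hypothetical, and this is not necessarily how `istatapi` constructs its requests.

```python
def build_sdmx_key(dimensions: list, filters: dict) -> str:
    """Join filter values in dimension order: '.' between dimensions, '+' between values."""
    parts = []
    for dim in dimensions:
        value = filters.get(dim, "")  # unfiltered dimension -> empty wildcard
        if isinstance(value, (list, tuple)):
            value = "+".join(value)
        parts.append(value)
    return ".".join(parts)


# Hypothetical dimension order; the real structure of 139_176 may differ.
dims = ["FREQ", "OTHER_DIM", "TIPO_DATO", "PAESE_PARTNER"]
filters = {"FREQ": "M", "TIPO_DATO": ["ISAV", "ESAV"], "PAESE_PARTNER": "WORLD"}
print(build_sdmx_key(dims, filters))  # M..ISAV+ESAV.WORLD
```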