Explore and use Datadex datasets in your preferred tools!

## 🔍 Explore

You can get a sense of what the datasets look like by [exploring them on Hugging Face](https://huggingface.co/datonic). With each commit to `main`, Datadex pushes a new version of the datasets as Parquet files.

These are the available datasets:

In [8]:
# | echo: false

from IPython.display import Markdown, display
from huggingface_hub import HfApi

api = HfApi()

datasets = []
for dataset in api.list_datasets(author="datonic"):
    display(
        Markdown(
            # Single quotes inside the f-string keep this valid on Python versions before 3.12.
            f"  * [{dataset.id.split('/')[1]}](https://huggingface.co/datasets/{dataset.id})"
        )
    )
    datasets.append(dataset.id)

  * [wikidata_asteroids](https://huggingface.co/datasets/datonic/wikidata_asteroids)

  * [country_year_indicators](https://huggingface.co/datasets/datonic/country_year_indicators)

  * [threatened_animal_species](https://huggingface.co/datasets/datonic/threatened_animal_species)

  * [spain_ipc](https://huggingface.co/datasets/datonic/spain_ipc)

  * [spain_water_reservoirs_data](https://huggingface.co/datasets/datonic/spain_water_reservoirs_data)

  * [spain_energy_demand](https://huggingface.co/datasets/datonic/spain_energy_demand)

  * [spain_aemet_historical_weather](https://huggingface.co/datasets/datonic/spain_aemet_historical_weather)

## 🔧 Use

Since the datasets are just Parquet files, you can use pretty much any tool or framework to explore them.
Let's look at the [Spain IPC dataset](https://huggingface.co/datasets/datonic/spain_ipc) with Polars.

In [12]:
import polars as pl

df = pl.read_parquet(
    "https://huggingface.co/datasets/datonic/spain_ipc/resolve/refs%2Fconvert%2Fparquet/default/main/0000.parquet"
)
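
The URL passed to `pl.read_parquet` above embeds the dataset id, so the same pattern may work for the other datasets listed earlier. The following is a minimal sketch of a helper that rebuilds that URL. It assumes every dataset exposes its converted files under the same `refs/convert/parquet` revision and `default/main/` path as Spain IPC, which is an extrapolation from this single URL. The names `parquet_url` and `HF_BASE` are hypothetical and not part of Datadex or `huggingface_hub`.

```python
from urllib.parse import quote

# Base URL for dataset repositories, taken from the Spain IPC link above.
HF_BASE = "https://huggingface.co/datasets"


def parquet_url(dataset_id: str, shard: int = 0) -> str:
    """Build the URL of a converted Parquet shard for a dataset.

    Hypothetical helper: assumes the same path layout as the Spain IPC URL.
    """
    # The revision name contains slashes, so it is percent-encoded in full.
    revision = quote("refs/convert/parquet", safe="")
    # Shards are assumed to be numbered with four digits (0000, 0001, ...).
    return f"{HF_BASE}/{dataset_id}/resolve/{revision}/default/main/{shard:04d}.parquet"


url = parquet_url("datonic/spain_ipc")
print(url)
```

The resulting string can be passed directly to `pl.read_parquet`. If a dataset uses a different configuration or split name, the path segments would need adjusting.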

In [21]:
df.sample(4)

| periodo | clases | indice | variacion_mensual | variacion_anual | variacion_en_lo_que_va_de_ano |
| --- | --- | --- | --- | --- | --- |
| date | str | f64 | f64 | f64 | f64 |
| 2018-05-01 | "1254 Seguros relacionados con …" | 96.323 | 0.0 | 1.5 | 0.0 |
| 2010-04-01 | "0312 Prendas de vestir" | 98.85 | 10.3 | -1.2 | -4.8 |
| 2014-08-01 | "0952 Prensa" | 86.534 | 0.6 | 1.7 | 0.9 |
| 2013-08-01 | "0942 Servicios culturales" | 99.054 | 0.3 | 10.7 | 1.4 |


In [22]:
ipc_prensa = df.filter(pl.col("clases") == "0952 Prensa")

In [27]:
import altair as alt

alt.Chart(ipc_prensa).mark_line().encode(
    x=alt.X("periodo", title="Period", axis=alt.Axis(labelAngle=-45)),
    y=alt.Y("indice", title="Index"),
    tooltip=["periodo", "indice"],
).properties(width="container", height=400, title="IPC Prensa Over Time").interactive()