# 02 - The Data Model
In this module we explore the TEEHR data model and file formats. The idea is that a scientist or researcher formats their data to the TEEHR data model and caches it for exploration and evaluation. The data model is intended to be simple to understand and easy to load data into.

![data_model.png](../images/data_model.png)

In [None]:
import duckdb
import pandas as pd
import geopandas as gpd
import hvplot.pandas
from pathlib import Path

# Explore the Data Model
Let's first specify an example cache file for each of the timeseries, geospatial, crosswalk, and attribute data. We will explore each file one at a time, examining the data model, the Parquet schema, what it looks like when opened in Pandas, and an example of the data.

In [None]:
CACHE_DIR = Path(Path.home(), "shared", "rti-eval")
STUDY_DIR = Path(CACHE_DIR, "post-event-example")
GEOMETRY = Path(STUDY_DIR, "geo", "usgs_geometry.parquet")
TIMESERIES = Path(STUDY_DIR, "timeseries", "short_range", "20221218T00Z.parquet")
CROSSWALK = Path(STUDY_DIR, "geo", "usgs_nwm22_crosswalk.parquet")
ATTRIBUTE = Path(STUDY_DIR, "geo", "usgs_attr_upstream_area.parquet")
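
Because every later cell reads from these paths, it can help to confirm the files exist before opening them. The following is a minimal sketch, not part of TEEHR: `missing_files` is a hypothetical helper, and the demo uses a temporary directory with placeholder files so it runs anywhere.

```python
from pathlib import Path
import tempfile


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Demo with placeholder files in a temporary directory (not the real cache).
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp, "usgs_geometry.parquet")
    present.touch()  # create an empty stand-in file
    absent = Path(tmp, "usgs_nwm22_crosswalk.parquet")  # never created
    result = missing_files([present, absent])
    print([p.name for p in result])
```

In this notebook the same helper could be called as `missing_files([GEOMETRY, TIMESERIES, CROSSWALK, ATTRIBUTE])`; an empty list would mean all four cache files were found.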

### Geometry

In [None]:
duckdb.query(f"SELECT name, type, logical_type FROM parquet_schema('{GEOMETRY}')")

In [None]:
geom_gdf = gpd.read_parquet(GEOMETRY)
geom_gdf.info()

In [None]:
geom_gdf.head()

### Attribute

In [None]:
duckdb.query(f"SELECT name, type, logical_type FROM parquet_schema('{ATTRIBUTE}')")

In [None]:
attr_df = pd.read_parquet(ATTRIBUTE)
attr_df.info()

In [None]:
attr_df.head()

### Crosswalk

In [None]:
duckdb.query(f"SELECT name, type, logical_type FROM parquet_schema('{CROSSWALK}')")

In [None]:
xwalk_df = pd.read_parquet(CROSSWALK)
xwalk_df.info()

In [None]:
xwalk_df.head()

### Timeseries

In [None]:
duckdb.query(f"SELECT name, type, logical_type FROM parquet_schema('{TIMESERIES}')")

In [None]:
ts_df = pd.read_parquet(TIMESERIES)
ts_df.info()

In [None]:
ts_df.head()
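
Once each table has been inspected, the crosswalk is what ties them together: it maps the location IDs used in one dataset to those used in another. The sketch below illustrates that idea on two tiny, made-up tables. The column names (`primary_location_id`, `secondary_location_id`, `location_id`, `value_time`, `value`) and the ID values are assumptions for illustration only; check them against the `info()` output above before adapting this to the real files.

```python
import pandas as pd

# Hypothetical miniature crosswalk; column names and IDs are assumed.
xwalk = pd.DataFrame({
    "primary_location_id": ["usgs-01", "usgs-02"],
    "secondary_location_id": ["nwm22-101", "nwm22-202"],
})

# Hypothetical miniature timeseries keyed by the secondary (model) IDs.
ts = pd.DataFrame({
    "location_id": ["nwm22-101", "nwm22-101", "nwm22-202"],
    "value_time": pd.to_datetime(
        ["2022-12-18 01:00", "2022-12-18 02:00", "2022-12-18 01:00"]
    ),
    "value": [1.5, 1.7, 0.4],
})

# A left join keeps every timeseries row and attaches the matching primary ID.
joined = ts.merge(
    xwalk,
    left_on="location_id",
    right_on="secondary_location_id",
    how="left",
)
print(joined[["primary_location_id", "location_id", "value_time", "value"]])
```

A left join is used here so that timeseries rows without a crosswalk entry would still appear, with a missing primary ID, rather than being silently dropped.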