# 02 - The Data Cache
In this module we will explore the cached data structure using DuckDB. The timeseries, geospatial, crosswalk, and attribute data were fetched from different sources, formatted to the TEEHR data model, and cached for exploration and evaluation.

```python
import duckdb
from pathlib import Path
```

## Example study directory structure
```text
.
├── geo
│   ├── huc10_geometry.parquet
│   ├── huc10_huc10_crosswalk.parquet
│   ├── usgs_attr_ecoregions.parquet
│   ├── usgs_attr_retro_100yr_flow.parquet
│   ├── usgs_attr_retro_10yr_flow.parquet
│   ├── usgs_attr_retro_2yr_flow.parquet
│   ├── usgs_attr_stream_order.parquet
│   ├── usgs_attr_upstream_area.parquet
│   ├── usgs_geometry.parquet
│   ├── usgs_huc12_crosswalk.parquet
│   └── usgs_nwm22_crosswalk.parquet
├── study_definition.json
└── timeseries
    ├── analysis_assim
    │   ├── 20221217T21Z.parquet
    │   ├── ...
    │   └── 20230118T20Z.parquet
    ├── forcing_analysis_assim
    │   ├── 20221218.parquet
    │   ├── ...
    │   └── 20230126.parquet
    ├── forcing_medium_range
    │   ├── 20221218T00Z.parquet
    │   ├── ...
    │   └── 20230118T18Z.parquet
    ├── forcing_short_range
    │   ├── 20221218T00Z.parquet
    │   ├── ...
    │   └── 20230118T23Z.parquet
    ├── medium_range_mem1
    │   ├── 20221218T00Z.parquet
    │   ├── ...
    │   └── 20230116T18Z.parquet
    ├── short_range
    │   ├── 20221218T00Z.parquet
    │   ├── ...
    │   └── 20230118T23Z.parquet
    └── usgs
        ├── 20221218.parquet
        ├── ...
        └── 20230126.parquet
```
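The file names under `timeseries/` appear to follow two conventions: `YYYYMMDDTHHZ` for sources written once per reference time (e.g. `20221217T21Z.parquet`) and `YYYYMMDD` for daily files (e.g. `20221218.parquet`). Assuming that reading of the tree is correct, a minimal sketch like the one below can report the period each source covers without opening any Parquet files. The helpers `parse_stem` and `summarize_sources` are hypothetical names for illustration, not part of TEEHR.

```python
from datetime import datetime
from pathlib import Path

# Assumed naming conventions, inferred from the example tree above
STEM_FORMATS = ("%Y%m%dT%HZ", "%Y%m%d")


def parse_stem(stem: str) -> datetime:
    """Convert a file stem such as '20221217T21Z' or '20221218' to a datetime."""
    for fmt in STEM_FORMATS:
        try:
            return datetime.strptime(stem, fmt)
        except ValueError:
            continue  # stem does not match this format; try the next one
    raise ValueError(f"Unrecognized file name: {stem!r}")


def summarize_sources(timeseries_dir: Path) -> dict[str, tuple[int, datetime, datetime]]:
    """Map each source subdirectory to (file count, earliest time, latest time)."""
    summary = {}
    for source_dir in sorted(p for p in timeseries_dir.iterdir() if p.is_dir()):
        times = [parse_stem(f.stem) for f in source_dir.glob("*.parquet")]
        if times:  # skip empty source directories
            summary[source_dir.name] = (len(times), min(times), max(times))
    return summary
```

Calling `summarize_sources` on a study's `timeseries` directory returns one entry per source, such as `usgs` or `short_range`, which is a quick way to confirm that every source spans the expected event window.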

First, we need to specify where the timeseries, geospatial, crosswalk, and attribute data are cached.

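```python
# Root of the shared cache and the study to explore
CACHE_DIR = Path(Path.home(), "shared", "rti-eval")
STUDY_DIR = Path(CACHE_DIR, "post-event-example")

# Glob matching every medium-range (member 1) forecast file in the study
MEDIUM_RANGE_MEM1 = Path(STUDY_DIR, "timeseries", "medium_range_mem1", "*.parquet")
```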
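The same pattern points DuckDB at any other source in the study. As a sketch, a small hypothetical helper can build the glob for a named source and reject names that do not appear in the example directory listing. The `TIMESERIES_SOURCES` tuple is copied from that listing and is an assumption; other studies may contain a different set of sources.

```python
from pathlib import Path

# Source directories shown in the example study tree (assumption: other
# studies may contain a different set)
TIMESERIES_SOURCES = (
    "analysis_assim",
    "forcing_analysis_assim",
    "forcing_medium_range",
    "forcing_short_range",
    "medium_range_mem1",
    "short_range",
    "usgs",
)


def timeseries_glob(study_dir: Path, source: str) -> Path:
    """Return a glob matching every Parquet file for one timeseries source."""
    if source not in TIMESERIES_SOURCES:
        raise ValueError(f"Unknown source {source!r}; expected one of {TIMESERIES_SOURCES}")
    return Path(study_dir, "timeseries", source, "*.parquet")
```

For example, `timeseries_glob(STUDY_DIR, "usgs")` yields the USGS pattern, which can be dropped into the same `read_parquet('...')` call used with `MEDIUM_RANGE_MEM1`.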

```python
# Parquet-level schema: one row per schema element for each matched file
duckdb.query(f"""
    SELECT * FROM parquet_schema('{MEDIUM_RANGE_MEM1}')
""").df()
```

```python
# Read a single row to see the column dtypes pandas receives from DuckDB
duckdb.query(f"SELECT * FROM read_parquet('{MEDIUM_RANGE_MEM1}') LIMIT 1").df().info()
```