## Exploring an Existing Evaluation Dataset

In [None]:
from pathlib import Path

from teehr import Evaluation

Set the path to the existing dataset directory. Here the dataset is expected in a `test_study` folder under the current working directory.

In [None]:
TEST_STUDY_DIR = Path(Path().resolve(), "test_study")

The evaluation dataset contains the following directory structure:

In [None]:
!tree $TEST_STUDY_DIR -d -L 2
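
Where the `tree` utility is not installed, a similar directories-only listing can be sketched with the standard library. The helper `list_dirs` below is hypothetical (it is not part of TEEHR) and simply returns subdirectory paths relative to the root, down to a given depth.

```python
from pathlib import Path


def list_dirs(root, max_depth=2):
    """Return subdirectories of root (relative, POSIX-style) up to max_depth levels deep."""
    root = Path(root)
    found = []

    def walk(current, depth):
        # Stop once we are past the requested depth.
        if depth > max_depth:
            return
        # Visit only directories, in sorted order for a stable listing.
        for child in sorted(p for p in current.iterdir() if p.is_dir()):
            found.append(child.relative_to(root).as_posix())
            walk(child, depth + 1)

    walk(root, 1)
    return found
```

Calling `list_dirs(TEST_STUDY_DIR)` would give roughly the same information as `tree -d -L 2`, as a flat list rather than a drawn tree.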

Initialize the Evaluation object and optionally enable logging.

In [None]:
# Create an Evaluation object
ev = Evaluation(dir_path=TEST_STUDY_DIR)

# Enable logging
ev.enable_logging()

Now we can access the data as pandas DataFrames, GeoPandas GeoDataFrames, or PySpark DataFrames.

##### Timeseries data

Get the primary_timeseries data ("observations") as a Pandas DataFrame:

In [None]:
primary_timeseries_df = ev.primary_timeseries.to_pandas()
primary_timeseries_df.head()

Or as a GeoDataFrame:

In [None]:
primary_timeseries_gdf = ev.primary_timeseries.to_geopandas()
primary_timeseries_gdf.head()

Get the secondary_timeseries data ("predictions"):

In [None]:
secondary_timeseries_df = ev.secondary_timeseries.to_pandas()
secondary_timeseries_df.head()

##### Location data

User-defined attributes describing the locations of the primary timeseries are stored in the location_attributes table:

In [None]:
location_attributes_df = ev.location_attributes.to_pandas()
location_attributes_df.head()

A crosswalk table links the primary location IDs to the secondary location IDs:

In [None]:
location_crosswalks_df = ev.location_crosswalks.to_pandas()
location_crosswalks_df.head()
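
To illustrate what the crosswalk is for, here is a minimal pandas sketch using toy data. The IDs are made up, and the column names (`primary_location_id`, `secondary_location_id`, `location_id`, `value`) are assumptions for illustration; check the actual columns of your tables before adapting it.

```python
import pandas as pd

# Toy crosswalk: one row per primary/secondary ID pair (illustrative values).
crosswalk = pd.DataFrame({
    "primary_location_id": ["gage-A", "gage-B"],
    "secondary_location_id": ["model-1", "model-2"],
})

# Toy secondary ("predictions") records keyed by the secondary ID.
secondary = pd.DataFrame({
    "location_id": ["model-1", "model-2", "model-3"],
    "value": [1.5, 2.5, 3.5],
})

# Attach the matching primary ID to each secondary record.
# The inner join drops "model-3", which has no crosswalk entry.
linked = secondary.merge(
    crosswalk,
    left_on="location_id",
    right_on="secondary_location_id",
    how="inner",
)
print(linked[["primary_location_id", "location_id", "value"]])
```

Secondary locations without a crosswalk entry cannot be paired with observations, which is why they fall out of the inner join.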

##### Joined timeseries table

Get the joined_timeseries data. This table combines the primary_timeseries, secondary_timeseries, and location_attributes data, joined on location ID. Any column in this table can be used for grouping and filtering when calculating performance metrics:

In [None]:
joined_timeseries_df = ev.joined_timeseries.to_pandas()
joined_timeseries_df.head()
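
As a rough illustration of grouping and filtering on a joined table, the sketch below computes a mean absolute error per location with plain pandas. It uses toy data, not the real table, and the column names (`primary_location_id`, `primary_value`, `secondary_value`) are assumptions. This is not TEEHR's metrics interface.

```python
import pandas as pd

# Toy joined table: paired observed (primary) and predicted (secondary) values.
joined = pd.DataFrame({
    "primary_location_id": ["gage-A", "gage-A", "gage-B", "gage-B"],
    "primary_value": [10.0, 12.0, 5.0, 7.0],
    "secondary_value": [11.0, 15.0, 4.0, 7.0],
})

# Filter: keep only rows where the observed value is at least 6.0.
high_flow = joined[joined["primary_value"] >= 6.0]

# Group: mean absolute error per primary location.
abs_error = (high_flow["secondary_value"] - high_flow["primary_value"]).abs()
mae_by_location = (
    high_flow.assign(abs_error=abs_error)
    .groupby("primary_location_id")["abs_error"]
    .mean()
)
print(mae_by_location)
```

Because attributes are already joined into the table, the same pattern would work with an attribute column as the filter or the grouping key.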

##### Domain tables

The domain tables contain “lookup” data that is referenced in the data and location tables. These tables serve as a way to keep the data consistent across an Evaluation.

Units:

In [None]:
units_df = ev.units.to_pandas()
units_df.head()

Variable names:

In [None]:
variables_df = ev.variables.to_pandas()
variables_df.head()

Configuration names:

In [None]:
configurations_df = ev.configurations.to_pandas()
configurations_df.head()

Attribute names:

In [None]:
attributes_df = ev.attributes.to_pandas()
attributes_df.head()
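
The consistency role of the domain tables can be illustrated with a small check: every value used in a data table should appear in the corresponding lookup table. The sketch below uses toy data, and the column names (`name` in the units table, `unit_name` in the data table) are assumptions for illustration.

```python
import pandas as pd

# Toy domain table of allowed unit names (illustrative values).
units = pd.DataFrame({"name": ["m^3/s", "ft^3/s"]})

# Toy data table that references units by name; "cms" is not in the domain.
timeseries = pd.DataFrame({
    "location_id": ["gage-A", "gage-B", "gage-C"],
    "unit_name": ["m^3/s", "ft^3/s", "cms"],
})

# Flag rows whose unit is missing from the domain table.
is_known = timeseries["unit_name"].isin(units["name"])
unknown_units = sorted(timeseries.loc[~is_known, "unit_name"].unique().tolist())
print(unknown_units)
```

The same membership test applies to variable, configuration, and attribute names against their respective domain tables.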