### Querying RAS XS through PyIceberg

This notebook walks through querying RAS cross-section (XS) data through PyIceberg.

For the warehouse path, use your S3 Tables URI.

Requires a `.env` file containing the `test` account credentials and a default region: `AWS_DEFAULT_REGION="us-east-1"`.
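
Before loading the catalog, it can help to confirm that the expected variables are actually set. The sketch below is a minimal check and is not part of `icefabric`. It assumes the `.env` supplies the standard AWS variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` alongside `AWS_DEFAULT_REGION`; `missing_aws_settings` is a hypothetical helper name.

```python
import os
from collections.abc import Mapping

# Assumed variable names: the standard AWS credential variables plus the
# default region this notebook requires. Adjust if your .env uses a profile.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment
example_env = {"AWS_ACCESS_KEY_ID": "abc", "AWS_DEFAULT_REGION": "us-east-1"}
print(missing_aws_settings(example_env))
```

Calling `missing_aws_settings()` with no argument after `load_creds(...)` checks the real environment; an empty list means all three variables are present.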

In [None]:
from pathlib import Path

from pyiceberg.catalog import load_catalog

In [None]:
from icefabric.helpers import load_creds, to_geopandas

# dir is where the .env file is located
load_creds(dir=Path.cwd().parents[1])

In [None]:
import os

os.environ["PYICEBERG_HOME"]

In [None]:
catalog = load_catalog("glue", **{"type": "glue", "glue.region": "us-east-1"})
catalog.list_tables("mip_xs")[40:50]

Using `catalog.load_table()`, we can load the XS data directly. Each table holds the data for one HUC8 and is named by its HUC8 code.

In [None]:
# Reading MIP XS
namespace = "mip_xs"
huc_number = "02040106"
df = catalog.load_table(f"{namespace}.{huc_number}").scan().to_pandas()
gdf = to_geopandas(df)
gdf.head()
# gdf.explore()
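
Because the table name is the HUC8 code itself, the code has to stay a string so that a leading zero (as in `02040106`) is preserved. The following is a small sketch of a guard for that; `xs_table_identifier` is a hypothetical helper, not something provided by PyIceberg or `icefabric`.

```python
import re

# A HUC8 code is exactly eight digits; keep it as text to preserve leading zeros.
HUC8_PATTERN = re.compile(r"\d{8}")


def xs_table_identifier(namespace: str, huc8: str) -> str:
    """Build '<namespace>.<huc8>', rejecting anything that is not 8 digits."""
    if not HUC8_PATTERN.fullmatch(huc8):
        raise ValueError(f"expected an 8-digit HUC8 string, got {huc8!r}")
    return f"{namespace}.{huc8}"


print(xs_table_identifier("mip_xs", "02040106"))  # mip_xs.02040106
```

The returned string can be passed straight to `catalog.load_table()`.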

To query individual river reaches, we can pass a row filter to `scan()`. Filters reference the column names in the table schema, so inspect the schema first:

In [None]:
# schema is a method on the PyIceberg table, so it must be called
catalog.load_table(f"{namespace}.{huc_number}").schema()

Let's query by the river name

In [None]:
from pyiceberg.expressions import EqualTo

df = (
    catalog.load_table(f"{namespace}.{huc_number}")
    .scan(row_filter=EqualTo("river", "Lehigh River"))
    .to_pandas()
)
display(df.tail())
# to_geopandas(df).explore()
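
As an alternative to building expression objects, `scan()` also accepts `row_filter` as a string. The sketch below assembles such a string from column/value pairs so that several conditions can be combined. It assumes the installed PyIceberg version parses `==` and `and` in string filters; `build_row_filter` is a hypothetical helper that handles only plain strings and numbers.

```python
def build_row_filter(conditions: dict[str, object]) -> str:
    """Join column == value pairs with 'and' into one filter string."""
    if not conditions:
        raise ValueError("at least one condition is required")
    parts = []
    for column, value in conditions.items():
        if isinstance(value, str):
            # Quote escaping is not handled in this sketch, so reject it outright
            if "'" in value:
                raise ValueError("values containing single quotes are not supported")
            parts.append(f"{column} == '{value}'")
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            parts.append(f"{column} == {value}")
        else:
            raise TypeError(f"unsupported value type for {column}: {type(value).__name__}")
    return " and ".join(parts)


row_filter = build_row_filter({"river": "Lehigh River"})
print(row_filter)  # river == 'Lehigh River'

# Usage against the table loaded above (requires the catalog from earlier cells):
# df = catalog.load_table(f"{namespace}.{huc_number}").scan(row_filter=row_filter).to_pandas()
```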

Now, let's query by an individual river station ID

In [None]:
from pyiceberg.expressions import EqualTo

df = (
    catalog.load_table(f"{namespace}.{huc_number}")
    .scan(row_filter=EqualTo("river_station", 573.0077))
    .to_pandas()
)
display(df.head())
# to_geopandas(df).explore()
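
Exact equality on a floating-point column can return no rows if the stored value differs from the literal in its trailing digits. If that happens, one option is to filter on a narrow range instead. This is a sketch under assumptions: `station_bounds` is a hypothetical helper, and the default tolerance of `1e-4` is an arbitrary choice, not a value taken from the dataset.

```python
def station_bounds(station: float, tolerance: float = 1e-4) -> tuple[float, float]:
    """Return (lower, upper) bounds around a river station value."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return station - tolerance, station + tolerance


lower, upper = station_bounds(573.0077)

# Usage with PyIceberg expressions (requires the catalog from earlier cells):
# from pyiceberg.expressions import And, GreaterThanOrEqual, LessThanOrEqual
# row_filter = And(
#     GreaterThanOrEqual("river_station", lower),
#     LessThanOrEqual("river_station", upper),
# )
# df = catalog.load_table(f"{namespace}.{huc_number}").scan(row_filter=row_filter).to_pandas()
```

A range this narrow can still match more than one cross-section if stations are closely spaced, so check the number of rows returned.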

To extend this notebook to other HUCs, change `huc_number`, then update the XS references in the filters (river name and river station) to values that exist in that HUC's table.
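
When several HUCs are needed at once, the same load-and-scan pattern can be wrapped in a loop. The sketch below is one way to do that; `load_xs_by_huc` is a hypothetical helper that relies only on the `load_table()`, `scan()`, and `to_pandas()` calls already used in this notebook.

```python
def load_xs_by_huc(catalog, namespace, huc_numbers, row_filter=None):
    """Return a dict mapping each HUC8 code to the result of scanning its table."""
    frames = {}
    for huc in huc_numbers:
        table = catalog.load_table(f"{namespace}.{huc}")
        # Only pass row_filter when one was given, so the default full scan is kept
        scan = table.scan(row_filter=row_filter) if row_filter is not None else table.scan()
        frames[huc] = scan.to_pandas()
    return frames


# Usage (requires the catalog from earlier cells; add further HUC8 codes as needed):
# frames = load_xs_by_huc(catalog, "mip_xs", ["02040106"])
# frames["02040106"].head()
```

Keeping one DataFrame per HUC avoids assuming that every HUC table shares exactly the same columns.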