# Step 1: Domain Setup and Container Creation

This notebook loads and examines the field shapefile, then creates a SwimContainer to hold all project data.

The SwimContainer is the heart of SWIM-RS data management. It stores all project data (geometries, meteorology, remote sensing, properties, derived products) in a single `.swim` file with full provenance tracking.

In this notebook you will:
1. Load and explore the shapefile
2. Create a SwimContainer from the shapefile
3. Understand the container structure and API

## 1. Load Necessary Libraries

To work with geospatial data, we'll primarily use `geopandas` along with `matplotlib` for visualization.

In [None]:
import os
import sys

import geopandas as gpd
import matplotlib.pyplot as plt

root = os.path.abspath('../..')
sys.path.append(root)

%matplotlib inline

## 2. Load the Shapefile

Using `geopandas`, we can load the shapefile and inspect its structure. It contains data clipped from the Montana [Statewide Irrigation Dataset](https://mslservices.mt.gov/Geographic_Information/Data/DataList/datalist_Details.aspx?did=%7Bf33bc611-8d4e-4d92-ae99-49762dec888b%7D).

In [None]:
project_dir = os.path.abspath('.')
shapefile_path = os.path.join(project_dir, 'data', 'gis', 'mt_sid_boulder.shp')

gdf = gpd.read_file(shapefile_path)

# Report the number of fields, then show the first few rows to examine structure and attributes
print(f'{gdf.shape[0]} fields')
gdf.head()

## 3. Display the Shapefile Geometry

Now, we'll plot the shapefile to get a visual overview of the spatial data it contains.

In [None]:
gdf.plot(figsize=(10, 10), edgecolor='black')
plt.title('Shapefile Geometry')
# The data are in a projected CRS (EPSG:5071), so the axes are in meters, not degrees
plt.xlabel('Easting (m)')
plt.ylabel('Northing (m)')
plt.show()

## 4. Check the EPSG Code

To confirm the projection, we'll display the shapefile's EPSG code. The project uses an Albers Equal Area projection, so I reprojected the SID to EPSG:5071 in a GIS beforehand.

In [None]:
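# gdf.crs is the full CRS object; to_epsg() returns the integer EPSG code (or None if no match is found)
epsg_code = gdf.crs.to_epsg()
print(f"EPSG Code: {epsg_code}")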
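If a shapefile has not already been reprojected in a GIS, the same step can be done in `geopandas`. The following is a minimal sketch, not part of the SWIM-RS workflow: the helper name `ensure_epsg` and the single-point toy GeoDataFrame are hypothetical and exist only to make the example runnable on its own.

```python
import geopandas as gpd
from shapely.geometry import Point

TARGET_EPSG = 5071  # Albers Equal Area, the projection used in this project


def ensure_epsg(gdf, epsg=TARGET_EPSG):
    """Return gdf in the requested EPSG, reprojecting only when needed (hypothetical helper)."""
    if gdf.crs is None:
        raise ValueError("GeoDataFrame has no CRS; assign one before reprojecting")
    if gdf.crs.to_epsg() == epsg:
        return gdf  # already in the target projection
    return gdf.to_crs(epsg=epsg)


# Toy data standing in for the real shapefile: one point in geographic coordinates
toy = gpd.GeoDataFrame({"FID_1": [1]}, geometry=[Point(-112.1, 46.2)], crs="EPSG:4326")
toy_albers = ensure_epsg(toy)
print(toy_albers.crs.to_epsg())
```

With the real data, the equivalent call would be `gdf = ensure_epsg(gdf)` before creating the container.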

## 5. List the Attributes (Fields) in the Shapefile

List all fields (attributes) in the shapefile to see the available data. There is valuable information here, but the only attribute we need is a unique ID, for which we use 'FID_1'.

In [None]:
attributes = gdf.columns
print('Attributes in shapefile:')
for attribute in attributes:
    print(attribute)
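Because 'FID_1' will serve as the unique ID, it is worth confirming that it really is unique and has no missing values before building the container. The sketch below is one way to check this with `pandas`; the helper `check_uid_column` and the toy DataFrame are hypothetical, written only for illustration.

```python
import pandas as pd


def check_uid_column(df, column):
    """Return a list of problems found in a candidate unique-ID column (hypothetical helper)."""
    if column not in df.columns:
        return [f"column {column!r} not found"]
    problems = []
    values = df[column]
    if values.isna().any():
        problems.append(f"{int(values.isna().sum())} missing value(s)")
    if values.duplicated().any():
        # keep=False flags every occurrence of a repeated value, not just the later ones
        dupes = values[values.duplicated(keep=False)].dropna().unique().tolist()
        problems.append(f"duplicate value(s): {dupes}")
    return problems


# Toy table with a repeated ID, standing in for the shapefile attributes
toy = pd.DataFrame({"FID_1": [1, 2, 2, 4]})
print(check_uid_column(toy, "FID_1"))
```

An empty list means the column is safe to use as an ID; on the real data the call would be `check_uid_column(gdf, "FID_1")`.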

## 6. Display Shapefile with a Basemap

In this cell, we plot the fields we already loaded over a basemap for better geographic context. To add the basemap, we use `contextily`, which fetches tiles from various web-based map providers.

In [None]:
import contextily as ctx

# contextily basemaps are served in Web Mercator (EPSG:3857), so reproject a copy for plotting
gdf_plot = gdf.copy()
if gdf_plot.crs.to_epsg() != 3857:
    gdf_plot = gdf_plot.to_crs(epsg=3857)  # Reproject to Web Mercator if needed

fig, ax = plt.subplots(figsize=(10, 10))
gdf_plot.plot(ax=ax, edgecolor='black', alpha=0.5)

# Hybrid basemap: satellite imagery + labels overlay
ctx.add_basemap(ax, source=ctx.providers.Esri.WorldImagery)
ctx.add_basemap(ax, source=ctx.providers.CartoDB.PositronOnlyLabels, alpha=0.8)

plt.title("Shapefile with Basemap")
# Axes are Web Mercator coordinates in meters, not degrees
plt.xlabel("Easting (m)")
plt.ylabel("Northing (m)")
plt.show()

## 7. Create the SwimContainer

Now we create a SwimContainer from our shapefile. The container will:
- Store the field geometries
- Define the time range for our analysis
- Serve as the central data store for all subsequent data (meteorology, remote sensing, etc.)

The container uses the Zarr storage format internally, which provides efficient access to large arrays through chunking and compression.

In [None]:
from swimrs.container import SwimContainer

container_path = os.path.join(project_dir, 'data', '1_Boulder.swim')

container = SwimContainer.create(
    container_path,
    fields_shapefile=shapefile_path,
    uid_column="FID_1",
    start_date="2004-01-01",
    end_date="2022-12-31",
    project_name="1_Boulder",
    overwrite=True,
)

print('Ignore the UnstableSpecificationWarning; it should clear up in future releases of the zarr Python package')

## 8. Explore the Container

Let's examine what the container knows about our project.

In [None]:
print(f"Project: {container.project_name}")
print(f"Number of fields: {container.n_fields}")
print(f"Date range: {container.start_date} to {container.end_date}")
print(f"Number of days: {container.n_days}")
print(f"\nField UIDs (first 10): {container.field_uids[:10]}")
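As a cross-check on the reported number of days, the date range can be counted independently with the standard library. This sketch assumes the container counts both the start and end dates (an inclusive range); that is an assumption about SwimContainer, not something confirmed here, so compare the result with `container.n_days` rather than treating it as authoritative.

```python
from datetime import date

# Same dates passed to SwimContainer.create above
start = date.fromisoformat("2004-01-01")
end = date.fromisoformat("2022-12-31")

# Inclusive count: the difference in days plus one for the start date itself
n_days_inclusive = (end - start).days + 1
print(n_days_inclusive)
```

If `container.n_days` is one less than this value, the container is treating the end date as exclusive.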

## 9. Check Container Status

The container provides a status query to show what data has been ingested. Right now it should be empty except for the geometry.

In [None]:
print(container.query.status())

## 10. Save and Close

Save the container and close it. We'll reopen it in the next notebooks to ingest data.

In [None]:
container.save()
container.close()

print(f"Container saved to: {container_path}")
print("\nNext: Run notebook 02 to extract data (if you have Earth Engine access)")
print("      Or skip to notebook 03 to ingest pre-built data")