# Run the `wapor_zonal` function from wapordl

 ---
 ## 1. Install Miniconda and Set Up Environment
 Conda is a package manager useful for managing dependencies and creating isolated environments. 
 
 **Steps**:
 - Install Miniconda if you haven’t already, following the instructions [here](https://docs.anaconda.com/miniconda/miniconda-install/).
 - Create a Conda environment with the necessary packages.
 - Install the local version of wapordl.
 - Install Jupyter Notebook.
 - Run the notebook.
 
### Install Conda Environment with Packages

 Note: *Run these commands in your terminal, not in Jupyter.*

 Change to the repo folder (the top-level repo folder, not the package folder inside it):
 ```bash
 cd your_repo_path
 ```
 Create the `wapor_env` conda environment needed to run the notebook:
 ```bash
 conda env create -f wapor_env.yaml
 ```
 Once done, activate the environment:
 ```bash
 conda activate wapor_env
 ```
 Install the local version of wapordl from the repo folder:
 ```bash
 pip install .
 ```
 To install Jupyter Notebook in this environment:
 ```bash
 conda install -c conda-forge notebook
 ```

 ## 2. Running a Jupyter Notebook from the Environment
 Once the environment is created and activated, launch Jupyter Notebook from within the repo by running:
 ```bash
 jupyter notebook
 ```
 This opens the Jupyter file browser in your web browser. Select this notebook and you're ready to go.
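
 If the import test in the next cell fails, it can help to first see which of the required modules the active kernel can find at all. The sketch below is a hypothetical helper (`check_modules` is not part of wapordl); it uses the standard-library `importlib.util.find_spec`, and assumes `osgeo` (GDAL's Python bindings) and `wapordl` are the modules that matter, as in the import test.

```python
import importlib.util

# Modules the import test relies on; "osgeo" is the module name
# under which GDAL's Python bindings are installed.
REQUIRED_MODULES = ["osgeo", "wapordl"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in check_modules(REQUIRED_MODULES).items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

 A `MISSING` entry usually means the notebook kernel is not running inside the activated `wapor_env` environment, or that `pip install .` has not been run yet.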

In [None]:
# Test that the required packages import correctly

from osgeo import gdal
from wapordl.main import wapor_zonal

print('imports successful')

## 3. Run `wapor_zonal`

### Parameters

- **target_polygons** (`Union[str, List[float], None]`):  
  The geographical areas (polygons) of interest for which statistics are calculated, given as the file path to a vector file containing the polygons, such as a `.shp` file.
  - **Example**: `target_polygons="path/to/shapefile.shp"`

- **id_column** (`str`):  
  Specifies the column in `target_polygons` used to uniquely identify each polygon, so statistics can be grouped by each polygon in the output files.
  - **Example**: `id_column="region_id"`

- **variables** (`list[str]`):  
  List of variable names to download and process for each polygon. Names follow the WaPOR v3 and AgERA5 dataset codes and cover variables such as precipitation, evapotranspiration, and net primary production.
  - **Example**: `variables=["L1-PCP-E", "L2-AETI-D", "L2-NPP-D"]`

- **period** (`list`):  
  The time period for which data will be downloaded, given as start and end dates. The period should be specified as a list with two date strings in `"YYYY-MM-DD"` format.
  - **Example**: `period=["2018-01-01", "2023-12-31"]`

- **folder** (`str`):  
  The folder path where downloaded data will be saved. This path must already exist on your file system.
  - **Example**: `folder="wapor_output_data"`

- **overview** (`Union[str, int]`, optional):  
  Specifies which overview of the Cloud-Optimized GeoTIFF (COG) files to use. If `None` (the default), the original resolution is used.

- **unit_conversion** (`str`, optional):  
  Defines the unit conversion for the data, if needed. Options are `"day"`, `"dekad"`, `"month"`, or `"year"`. This allows data to be standardized across different timeframes.
  - **Example**: `unit_conversion="month"`

- **req_stats** (`list`, optional):  
  List of statistics to include in the output, such as `"mean"`, `"std"`, `"maximum"`, and `"minimum"`. The default is `["mean", "std"]`.
  - **Example**: `req_stats=["mean", "std", "minimum"]`

- **skip_if_exists** (`bool`, optional):  
  If `True`, skips downloading files that already exist in the folder. This is useful for resuming processes without duplicating work.
  - **Example**: `skip_if_exists=True`

- **split_by_year** (`bool`, optional):  
  If `True`, splits data processing by year during retrieval, which can help manage large datasets.
  - **Example**: `split_by_year=True`

- **output_gpkg** (`bool`, optional):  
  If `True`, outputs the zonal statistics as a GeoPackage (GPKG) file, a format for storing vector data with attribute tables. Not recommended for large requests.
  - **Example**: `output_gpkg=True`
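
Several of the constraints above (a two-date `period` in `"YYYY-MM-DD"` format, a `folder` that must already exist, the four `unit_conversion` options, and unique values in `id_column`) can be checked before starting a long download. The sketch below is a hypothetical pre-flight helper, not part of wapordl; `validate_inputs` and its arguments are illustrative names, and `ids` is assumed to be the values of `id_column`, for example read with GeoPandas as `geopandas.read_file(path)[id_column]`.

```python
import os
from collections import Counter
from datetime import date

# The four unit_conversion options listed in the parameter descriptions.
UNIT_OPTIONS = {"day", "dekad", "month", "year"}


def validate_inputs(period, folder, unit_conversion=None, ids=None):
    """Return a list of problems found; an empty list means every check passed.

    Hypothetical pre-flight helper, not part of wapordl.
    """
    problems = []

    # period: exactly two "YYYY-MM-DD" strings, start not after end
    if len(period) != 2:
        problems.append("period must contain exactly two dates")
    else:
        try:
            start, end = (date.fromisoformat(p) for p in period)
            if start > end:
                problems.append("period start is after period end")
        except ValueError as err:
            problems.append(f"period has an invalid date: {err}")

    # folder: must already exist on the file system
    if not os.path.isdir(folder):
        problems.append(f"folder does not exist: {folder}")

    # unit_conversion: None or one of the documented options
    if unit_conversion is not None and unit_conversion not in UNIT_OPTIONS:
        problems.append(f"unknown unit_conversion: {unit_conversion}")

    # ids: values of id_column, which should identify each polygon uniquely
    if ids is not None:
        dupes = [value for value, count in Counter(ids).items() if count > 1]
        if dupes:
            problems.append(f"duplicate ids in id_column: {dupes}")

    return problems
```

An empty list means the inputs pass these basic checks. If only the folder check fails, `os.makedirs(folder, exist_ok=True)` creates the folder.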

In [None]:
### Define Inputs for wapor_zonal
# Adjust the following parameters based on your own data paths and choices.
variables = ["L1-NPP-D", "L1-AETI-D"]
folder = r"path_to_your_folder"
period = ["2019-01-01", "2019-12-31"]
region1 = r"path_to_your_polygon_file"

# Run wapor_zonal function
df = wapor_zonal(
    target_polygons=region1,
    id_column="fid",
    variables=variables,
    period=period,
    req_stats=["mean"],
    folder=folder,
    skip_if_exists=True,
    split_by_year=True,
)

# The `df` variable will contain the output DataFrame with calculated zonal statistics.
df.head()  # Display first few rows of the DataFrame
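
The example requests dekadal variables (the `-D` suffix in the WaPOR v3 variable names). If you aggregate dekadal values yourself rather than through `unit_conversion`, keep in mind that dekads are not all ten days long: each month is split into days 1 to 10, 11 to 20, and 21 to month end. Assuming a layer is expressed as a daily rate (for example mm/day, which is typical for WaPOR dekadal layers), a dekadal total is the rate multiplied by the dekad's length. The sketch below illustrates only that arithmetic; `dekad_length` and `dekad_total` are hypothetical helpers, not how `wapor_zonal` implements its unit conversion.

```python
import calendar
from datetime import date


def dekad_length(d):
    """Number of days in the dekad that contains date d.

    Dekads split each month into days 1-10, 11-20 and 21 to month end,
    so the third dekad has 8 to 11 days.
    """
    if d.day <= 20:
        return 10
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return days_in_month - 20


def dekad_total(rate_per_day, day_in_dekad):
    """Convert a daily rate (e.g. mm/day) to a total over one dekad."""
    return rate_per_day * dekad_length(day_in_dekad)


# Hypothetical value: 3.0 mm/day in the third dekad of February 2019 (8 days)
print(dekad_total(3.0, date(2019, 2, 21)))  # 24.0
```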
