Fetching of WikiSRAT data
===

In this notebook, we fetch WikiSRAT data for analysis and visualization in subsequent notebooks.

Run this notebook within the conda environment specified in the included `environment.yml` file.

Create the environment either with the Import button on the Environments tab of Anaconda Navigator, or by running the following Conda command in your terminal or console. Replace `path/environment.yml` with the full path to the `environment.yml` file in your local clone of the repository.

```bash
conda env create --file path/environment.yml
```

To update your environment, either use Anaconda Navigator or run the following command:

```bash
conda env update --file path/environment.yml --prune
```

or, to remove the existing environment and recreate it from scratch:

```bash
conda env create --file path/environment.yml --force
```

# Setup
* Import packages
* Load data from parquet files

In [1]:
# packages for data requests
import json
import requests
from requests.auth import HTTPBasicAuth
import psycopg2

# packages for data handling and file paths
import os
from pathlib import Path
import numpy as np
import pandas as pd

# geo packages
import geopandas as gpd

In [None]:
print("Geopandas: ", gpd.__version__)
# print("spatialpandas: ", spd.__version__)
# print("datashader: ", ds.__version__)
# print("pygeos: ", pygeos.__version__)

Geopandas:  0.9.0


## Read Parquet Files
- https://geopandas.readthedocs.io/en/latest/docs/reference/api/geopandas.read_parquet.html

In [3]:
# Find current working directory
Path.cwd()

WindowsPath('C:/Users/sjordan/Documents/GitHub/pollution-assessment/stage1')

If the data-loading cells below raise a `FileNotFoundError`, make sure you've navigated to the `stage2` folder so that the relative `data/` path resolves.

In [4]:
# Relative path: works for anyone who runs this notebook from its folder in a clone of the GitHub repository
data_folder = Path('data/')
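Because `data_folder` is relative, it is resolved against the current working directory, which is the usual cause of the error mentioned above. A minimal sketch of a guard you could run before loading anything; `check_data_folder` is a hypothetical helper, not part of this repository:

```python
from pathlib import Path


def check_data_folder(data_folder: Path = Path("data")) -> Path:
    """Return the resolved data folder, or raise a descriptive error.

    A relative path such as Path("data") is resolved against the current
    working directory, so this fails when the notebook runs from elsewhere.
    """
    resolved = data_folder.resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(
            f"Expected a data folder at {resolved}; "
            f"current working directory is {Path.cwd()}"
        )
    return resolved
```

Calling `check_data_folder(data_folder)` right after defining `data_folder` reports both the expected location and the actual working directory, which is quicker to diagnose than a failure on the first `read_parquet` call.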

Key to the prefixes and suffixes in the table (file) names below:
* `base_` indicates base case
* `rest_` indicates with restoration
* `prot_` indicates protection projects
* `catch` indicates catchment-level data
* `reach` indicates reach-level data
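The naming key above can also be decoded programmatically. A minimal sketch, assuming only the markers listed above; `parse_table_name` is a hypothetical helper, and names without a marker (e.g. `cluster_df.parquet`) simply return `None`:

```python
from pathlib import Path
from typing import Optional, Tuple

# Marker meanings taken from the key above.
SCENARIOS = {
    "base": "base case",
    "rest": "with restoration",
    "prot": "protection projects",
}
LEVELS = {
    "catch": "catchment",
    "reach": "reach",
}


def parse_table_name(filename: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (scenario, level) decoded from a parquet file name.

    The scenario comes from the first underscore-separated token; the
    level from any token equal to 'catch' or 'reach'. Either element is
    None when the name carries no such marker.
    """
    tokens = Path(filename).stem.split("_")
    scenario = SCENARIOS.get(tokens[0])
    level = next((LEVELS[t] for t in tokens if t in LEVELS), None)
    return scenario, level
```

For example, `parse_table_name('base_df_catch.parquet')` gives `('base case', 'catchment')`, while `parse_table_name('prot_proj_df.parquet')` gives `('protection projects', None)`.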

**Clusters** are geographic units. The Delaware River Basin (DRB) includes eight clusters: Poconos-Kittatinny, Upper Lehigh, New Jersey Highlands, Middle Schuylkill, Schuylkill Highlands, Upstream Suburban Philadelphia, Brandywine-Christina, and Kirkwood-Cohansey Aquifer. Together, these priority locations span the pristine headwaters and working forests of the upper watershed; the farmlands, suburbs, and industrial and urban centers downstream; and the coastal plain, where the river and emerging groundwater empty into either the Delaware Bay or the Atlantic Coast.

**Focus areas** are smaller geographic units within clusters. 

In [5]:
%%time
# read data from parquet files
# catchment- and reach-level data: base case and with restoration
base_catch_gdf = gpd.read_parquet(data_folder / 'base_df_catch.parquet')
base_reach_gdf = gpd.read_parquet(data_folder / 'base_df_reach.parquet')

rest_catch_gdf = gpd.read_parquet(data_folder / 'rest_df_catch.parquet')
rest_reach_gdf = gpd.read_parquet(data_folder / 'rest_df_reach.parquet')

# point sources
point_src_gdf = gpd.read_parquet(data_folder / 'point_source_df.parquet')

# protection and restoration projects
proj_prot_gdf = gpd.read_parquet(data_folder / 'prot_proj_df.parquet')
proj_rest_gdf = gpd.read_parquet(data_folder / 'rest_proj_df.parquet')

# clusters
cluster_gdf = gpd.read_parquet(data_folder / 'cluster_df.parquet')

# HUC12 loads from MMW, read as a plain DataFrame with pandas
mmw_huc12_loads_df = pd.read_parquet(data_folder / 'mmw_huc12_loads_df.parquet')

Wall time: 2.2 s
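The cell above reads each file with its own line. If the list of files grows, a table-driven loop can check for every missing file up front and report them together. A minimal sketch, assuming the file names above; the short labels and the `load_tables` helper are hypothetical, and the reader function is passed in so the same loop works with `gpd.read_parquet`:

```python
from pathlib import Path
from typing import Any, Callable, Dict, Mapping

# File names as they appear in the read cell above; the labels are
# hypothetical keys chosen for this sketch.
GEO_PARQUET_FILES = {
    "base_catch": "base_df_catch.parquet",
    "base_reach": "base_df_reach.parquet",
    "rest_catch": "rest_df_catch.parquet",
    "rest_reach": "rest_df_reach.parquet",
    "point_src": "point_source_df.parquet",
    "proj_prot": "prot_proj_df.parquet",
    "proj_rest": "rest_proj_df.parquet",
    "cluster": "cluster_df.parquet",
}


def load_tables(
    folder: Path,
    reader: Callable[[Path], Any],
    files: Mapping[str, str] = GEO_PARQUET_FILES,
) -> Dict[str, Any]:
    """Read every file in `files` from `folder` with `reader`, keyed by label."""
    # Collect all missing files first so one error lists them all.
    missing = [name for name in files.values() if not (folder / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing in {folder}: {', '.join(missing)}")
    return {label: reader(folder / name) for label, name in files.items()}
```

In this notebook the call would be `geo_tables = load_tables(data_folder, gpd.read_parquet)`, with `mmw_huc12_loads_df` still read separately via `pd.read_parquet` because it is loaded as a plain DataFrame.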


In [6]:
focusarea_gdf = gpd.read_parquet(data_folder / 'fa_phase2_df.parquet')
# update name for consistency with other files
focusarea_gdf['cluster'] = focusarea_gdf['cluster'].replace('Kirkwood Cohansey Aquifer', 'Kirkwood - Cohansey Aquifer')
focusarea_gdf.set_index('name', inplace=True)
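The rename above matters because cluster names are matched as exact strings across files. A minimal sketch of applying a rename map and then listing any names that still have no match; `CLUSTER_RENAMES`, `normalize_names`, and `find_unmatched` are hypothetical helpers, and the sample names come from the rename in the cell above:

```python
from typing import Dict, Iterable, List

# Known spelling differences between files (from the rename above).
CLUSTER_RENAMES: Dict[str, str] = {
    "Kirkwood Cohansey Aquifer": "Kirkwood - Cohansey Aquifer",
}


def normalize_names(names: Iterable[str],
                    renames: Dict[str, str] = CLUSTER_RENAMES) -> List[str]:
    """Replace each name found in `renames`; leave all others unchanged."""
    return [renames.get(name, name) for name in names]


def find_unmatched(names: Iterable[str], reference: Iterable[str]) -> List[str]:
    """Return the distinct values in `names` that do not occur in `reference`."""
    return sorted(set(names) - set(reference))
```

An empty result from `find_unmatched` means every name lines up. Against the notebook data, a check might look like `find_unmatched(focusarea_gdf['cluster'], cluster_gdf['cluster'])`, assuming `cluster_gdf` also stores names in a `cluster` column (not confirmed here).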

Follow this notebook with `WikiSRAT_Analysis.ipynb` to analyze the fetched data.