## Ongoing Notes:
Key Problems:
1. Which dataset or datasets even contain the information we want?
2. How do we determine when measurements were taken in the WQP dataset?
3. How do we determine which param_codes to use (there are ~10,000!) via NWIS?
4. How do we match streamflow gauges to their closest water quality gauge, if one exists (lat/long data is available)?
5. How do we work with the extremely reduced number of sites that have these niche water quality metrics?<br>
    - It seems very unlikely that most sites, if any, will have all of the metrics listed in the proposal in one place

...in other words, how do we spatially and temporally match streamflow data with water quality data?
<br><br>
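
One possible approach to the spatial half of this question (problem 4) is a nearest-neighbor search on great-circle distance. The sketch below is only an illustration under stated assumptions: the column names `lat` and `lon`, the `max_km` cutoff, the helper names, and the coordinates in the toy tables are all hypothetical and are not taken from the project data.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km. Inputs are decimal degrees and may be broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def match_nearest(df_flow, df_wq, max_km=10.0):
    """For each streamflow gauge, find the closest water quality site within max_km."""
    # Pairwise distance matrix: rows are streamflow gauges, columns are water quality sites
    dist = haversine_km(
        df_flow['lat'].to_numpy()[:, None], df_flow['lon'].to_numpy()[:, None],
        df_wq['lat'].to_numpy()[None, :], df_wq['lon'].to_numpy()[None, :],
    )
    nearest = dist.argmin(axis=1)
    out = df_flow[['site_no']].copy()
    out['wq_site_no'] = df_wq['site_no'].to_numpy()[nearest]
    out['dist_km'] = dist[np.arange(len(df_flow)), nearest]
    # Gauges whose closest water quality site is too far away get no match
    out.loc[out['dist_km'] > max_km, 'wq_site_no'] = None
    return out

# Toy tables with made-up IDs and coordinates (hypothetical, for illustration only)
df_flow = pd.DataFrame({'site_no': ['F1', 'F2'], 'lat': [37.0, 39.0], 'lon': [-120.0, -121.5]})
df_wq = pd.DataFrame({'site_no': ['W1', 'W2'], 'lat': [37.01, 38.0], 'lon': [-120.01, -121.0]})

matches = match_nearest(df_flow, df_wq, max_km=10.0)
print(matches)
```

The full distance matrix is fine for a few hundred sites; for national-scale matching a spatial index would likely be a better fit. Straight-line distance also ignores whether two sites sit on the same stream, so any match would still need a hydrologic sanity check.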

TODO:
1. Determine relevant gauge counts for at least some of these metrics, per state
2. Determine type of data returned, time-series, or otherwise
<br><br>
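
For TODO item 2, one rough way to tell whether a site behaves like a time series or a handful of discrete samples is to look at the gaps between sample dates. This is a minimal sketch, assuming the sample dates are available as a list or column of date strings; the function name `sampling_summary` and the example dates are hypothetical.

```python
import pandas as pd

def sampling_summary(sample_dates):
    """Summarize how regularly a site was sampled, based on gaps between unique sample dates."""
    dates = pd.to_datetime(pd.Series(sample_dates)).sort_values().drop_duplicates()
    # Gap in whole days between consecutive sample dates
    gaps = dates.diff().dropna().dt.days
    return {
        'n_samples': len(dates),
        'first': dates.iloc[0],
        'last': dates.iloc[-1],
        'median_gap_days': float(gaps.median()) if len(gaps) else None,
        'max_gap_days': int(gaps.max()) if len(gaps) else None,
    }

# Made-up dates: three roughly monthly samples followed by a long gap
summary = sampling_summary(['2001-01-10', '2001-02-09', '2001-03-11', '2003-06-01'])
print(summary)
```

A small median gap with a small maximum gap suggests something close to a regular series, while a large maximum gap points to sporadic sampling campaigns.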

Comments:<br>
How granular will the water quality data ultimately be? Is the focus on building a robust water quality profile for a few gauges in locations where EAR is viable, or on having a general idea of an entire watershed region's water quality?

## Initial Water Quality Data Exploration

In [16]:
#Python3.10
import os
import pandas as pd 
import numpy as np
import seaborn as sns
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import contextily as cx
from importlib import reload
from typing import IO
from IPython.display import display
from collections import Counter
import warnings

from datetime import datetime, timedelta

# USGS data retrieval tool
from dataretrieval import nwis, utils, codes

# Custom modules are imported in multiple locations to facilitate easy reloading when edits are made to their respective files
import Src.classes as cl
import Src.func as fn
reload(cl)
reload(fn)

# TODO: Look into the warning this disables (pandas' SettingWithCopyWarning). It doesn't appear to be significant for the purposes of this code, but it should be understood
pd.options.mode.chained_assignment = None

#pd.options.mode.chained_assignment = 'warn'

In [18]:
#'01578310'
test_aquifer = 'Central Valley aquifer system'

df_sites = pd.read_excel('Prelim_Data/_National_Metrics/National_Metrics_30_90.xlsx', sheet_name='site_metrics', dtype=fn.DATASET_DTYPES)
df_sites = df_sites[df_sites['within_aq'] == test_aquifer]
df_sites = df_sites[df_sites['valid'] == True]
print(len(df_sites))

20


In [19]:
site_list = df_sites['site_no'].to_list()
counter = Counter()

for site in site_list:
    try:
        print(f'Trying site: {site}')
        
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            df, metadata = nwis.get_qwdata(sites=site, start='1990-10-01', end='2020-09-30')            
            counter.update(df.columns)
            
    except Exception as e:
        print(f'ERROR: {site} - {e}')   
        
print(counter) 

Trying site: 11251000
Trying site: 11253310
ERROR: 11253310 - No sites/data found using the selection criteria specified in url: https://nwis.waterdata.usgs.gov/nwis/qwdata?site_no=11253310&begin_date=1990-10-01&end_date=2020-09-30&qw_sample_wide=qw_sample_wide&agency_cd=USGS&format=rdb&pm_cd_compare=Greater+than&inventory_output=0&rdb_inventory_output=file&TZoutput=0&rdb_qw_attributes=expanded&date_format=YYYY-MM-DD&rdb_compression=value&submitted_form=brief_list
Trying site: 11261100
Trying site: 11262900
Trying site: 11274000
Trying site: 11274500
Trying site: 11274630
Trying site: 11290000
Trying site: 11303000
Trying site: 11303500
Trying site: 11313405
Trying site: 11379500
Trying site: 11389500
Trying site: 11390500
Trying site: 11421000
Trying site: 11424000
Trying site: 11425500
Trying site: 11446500
Trying site: 11447650
Trying site: 11452500
Counter({'agency_cd': 19, 'site_no': 19, 'sample_dt': 19, 'sample_tm': 19, 'sample_end_dt': 19, 'sample_end_tm': 19, 'sample_start_time

In [20]:
print(counter)

Counter({'agency_cd': 19, 'site_no': 19, 'sample_dt': 19, 'sample_tm': 19, 'sample_end_dt': 19, 'sample_end_tm': 19, 'sample_start_time_datum_cd': 19, 'tm_datum_rlbty_cd': 19, 'coll_ent_cd': 19, 'medium_cd': 19, 'project_cd': 19, 'aqfr_cd': 19, 'tu_id': 19, 'body_part_id': 19, 'hyd_cond_cd': 19, 'samp_type_cd': 19, 'hyd_event_cd': 19, 'sample_lab_cm_txt': 19, 'p00061': 17, 'p30209': 17, 'p50280': 17, 'p00010': 16, 'p00095': 16, 'p80154': 16, 'p80155': 16, 'p00028': 16, 'p71999': 15, 'p82398': 14, 'p00025': 14, 'p00191': 14, 'p00300': 14, 'p00301': 14, 'p00400': 14, 'p00600': 14, 'p00602': 14, 'p00605': 14, 'p00607': 14, 'p00608': 14, 'p00613': 14, 'p00618': 14, 'p00623': 14, 'p00625': 14, 'p00631': 14, 'p00660': 14, 'p00665': 14, 'p00666': 14, 'p00671': 14, 'p71846': 14, 'p71851': 14, 'p71856': 14, 'p00405': 13, 'p70331': 12, 'p00681': 12, 'p00065': 11, 'p00689': 11, 'p30207': 11, 'p49270': 11, 'p49271': 11, 'p49272': 11, 'p00004': 10, 'p99111': 10, 'p00063': 9, 'p84164': 9, 'p00029': 
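
The column counts above give one way into the param_code question (problem 3): keep only the columns that look like parameter codes and rank them by how many sites returned them. The sketch below is a hedged illustration, not the project's method. It assumes parameter columns follow the `p` plus five digits pattern seen in the output, uses 19 as the number of sites that returned data (20 tried, 1 error), and copies only a few entries from the printed counter; `rank_param_codes` and `min_fraction` are hypothetical names.

```python
import re
from collections import Counter

# Parameter columns in the output above look like 'p' followed by five digits
PARAM_COL = re.compile(r'^p\d{5}$')

def rank_param_codes(counter, n_sites, min_fraction=0.0):
    """Return (param_code, site_count, fraction_of_sites) tuples, most widely available first."""
    rows = [
        (col[1:], n, n / n_sites)  # strip the leading 'p' to get the bare code
        for col, n in counter.items()
        if PARAM_COL.match(col) and n / n_sites >= min_fraction
    ]
    return sorted(rows, key=lambda r: (-r[1], r[0]))

# A small subset of the counter printed above
sample_counter = Counter({
    'agency_cd': 19, 'site_no': 19,
    'p00061': 17, 'p00010': 16, 'p00095': 16, 'p00300': 14, 'p00400': 14,
})

ranked = rank_param_codes(sample_counter, n_sites=19, min_fraction=0.8)
for code, n, frac in ranked:
    print(f'{code}: {n} sites ({frac:.0%})')
```

Note that a column appearing for a site only shows the parameter was reported at least once in the query window, not that it was sampled often enough to be useful.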