
### Assignment 3: Access Streamflow Data via NWIS

In this lab, you will use the `dataretrieval.nwis` module to access streamflow data. According to its GitHub repository, `dataretrieval` was created to simplify the process of loading hydrologic data. It is designed to retrieve the major types of U.S. Geological Survey (USGS) hydrology data available on the Web, as well as data from the Water Quality Portal (WQP), which currently houses water quality data from the Environmental Protection Agency (EPA), the U.S. Department of Agriculture (USDA), and the USGS. USGS data are obtained directly from a service called the National Water Information System (NWIS).

For this lab, you will need to install `dataretrieval`. For more information, see the [GitHub repository](https://github.com/DOI-USGS/dataretrieval-python).

In [7]:
!pip install dataretrieval -q

In [8]:
import dataretrieval.nwis as nwis
import pandas as pd

#### NWIS services available
* site info = 'site'
* instantaneous values = 'iv'
* daily values = 'dv'
* statistics = 'stat'
* discharge peaks = 'peaks'
* discharge measurements = 'measurements'
* water quality samples = 'qwdata'

Example query:

`df = nwis.get_record(sites = site, service = service, start = 'yyyy-mm-dd', end = 'yyyy-mm-dd')`
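Under the hood, `nwis.get_record()` sends an HTTP request to a USGS web service. The sketch below builds such a request URL by hand with only the standard library, to show which pieces of the query map to which arguments. It assumes the legacy USGS Water Services endpoint `https://waterservices.usgs.gov/nwis/` and its `sites`, `startDT`, `endDT`, and `format` query parameters; `build_nwis_url` is a hypothetical helper, not part of `dataretrieval`, and it only makes sense for time-series services such as `'dv'` and `'iv'`.

```python
from urllib.parse import urlencode

# Assumption: base URL of the legacy USGS Water Services REST API.
BASE_URL = "https://waterservices.usgs.gov/nwis"


def build_nwis_url(service, sites, start, end, fmt="json"):
    """Build a request URL for a time-series service such as 'dv' or 'iv'.

    Hypothetical helper for illustration; dataretrieval builds its own requests.
    """
    if isinstance(sites, str):
        sites = [sites]
    params = {
        "format": fmt,
        "sites": ",".join(sites),  # multiple sites are joined with commas
        "startDT": start,          # start date, yyyy-mm-dd
        "endDT": end,              # end date, yyyy-mm-dd
    }
    return f"{BASE_URL}/{service}/?{urlencode(params)}"


url = build_nwis_url("dv", "07106500", "2024-03-01", "2024-03-31")
print(url)
# https://waterservices.usgs.gov/nwis/dv/?format=json&sites=07106500&startDT=2024-03-01&endDT=2024-03-31
```

Pasting a URL like this into a browser is a quick way to inspect the raw response that `dataretrieval` parses into a DataFrame.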

In [14]:
# site info, drought well near pueblo, se colorado
site_a = '382323104200701'
pueblo = nwis.get_record(sites = site_a, service = 'site')

pueblo

| | agency_cd | site_no | station_nm | site_tp_cd | lat_va | long_va | dec_lat_va | dec_long_va | coord_meth_cd | coord_acy_cd | ... | reliability_cd | gw_file_cd | nat_aqfr_cd | aqfr_cd | aqfr_type_cd | well_depth_va | hole_depth_va | depth_src_cd | project_no | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USGS | 382323104200701 | SC01906221AAA DROUGHT WELL NEAR PUEBLO, CO | GW | 382322.82 | 1042008.94 | 38.389672 | -104.335817 | M | S | ... | C | YYNYNYNN | N100ALLUVL | 112TERC | U | 90.0 | 90 | D | 858200240 | POINT (-104.33582 38.38967) |
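The site table reports the well's coordinates twice. Reading `lat_va` and `long_va` as packed degrees-minutes-seconds (DDMMSS.ss for latitude, DDDMMSS.ss for longitude, with west longitude stored as a positive number) is an assumption, but it reproduces the decimal-degree columns above: 38° 23′ 22.82″ = 38 + 23/60 + 22.82/3600 ≈ 38.389672. A minimal sketch of that conversion, where `dms_to_decimal` is a hypothetical helper:

```python
def dms_to_decimal(dms: str, west: bool = False) -> float:
    """Convert a packed DMS string such as '382322.82' to decimal degrees.

    Assumes the last four digits before the decimal point are MMSS and
    everything before them is degrees ('1042008.94' is 104 deg, 20 min, 8.94 sec).
    """
    whole, _, frac = dms.partition(".")
    degrees = int(whole[:-4])
    minutes = int(whole[-4:-2])
    seconds = float(whole[-2:] + "." + (frac or "0"))
    value = degrees + minutes / 60 + seconds / 3600
    # West longitudes are negative in decimal degrees
    return -value if west else value


print(round(dms_to_decimal("382322.82"), 6))              # 38.389672
print(round(dms_to_decimal("1042008.94", west=True), 6))  # -104.335817
```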


In [16]:
pueblo_dv = nwis.get_record(sites = site_a,
                            service = 'dv',
                            start = '2024-03-01',
                            end = '2024-03-31')

| datetime | site_no | 72019_Mean | 72019_Mean_cd |
|---|---|---|---|
| 2024-03-01 00:00:00+00:00 | 382323104200701 | 21.86 | A |
| 2024-03-02 00:00:00+00:00 | 382323104200701 | 21.86 | A |
| 2024-03-03 00:00:00+00:00 | 382323104200701 | 21.86 | A |
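In this wide format, each value column appears to be named `<parameter code>_<statistic>`, with a companion `<parameter code>_<statistic>_cd` column holding qualification codes (here `A`, which USGS uses for approved data). For this well, parameter code `72019` is depth to water level below land surface, in feet. The hypothetical helper below splits such names apart; it assumes five-digit parameter codes and returns `None` for other columns such as `site_no`.

```python
import re

# Assumed naming pattern: five-digit parameter code, statistic, optional "_cd"
COLUMN_PATTERN = re.compile(r"^(\d{5})_(\w+?)(_cd)?$")


def parse_nwis_column(name):
    """Return (parameter_code, statistic, is_qualifier), or None if no match."""
    match = COLUMN_PATTERN.match(name)
    if match is None:
        return None
    pcode, statistic, suffix = match.groups()
    return pcode, statistic, suffix is not None


for col in ["site_no", "72019_Mean", "72019_Mean_cd"]:
    print(col, parse_nwis_column(col))
# site_no None
# 72019_Mean ('72019', 'Mean', False)
# 72019_Mean_cd ('72019', 'Mean', True)
```

The same pattern covers the streamflow columns retrieved later, such as `00060_Mean` or `00010_Maximum`.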


In [20]:
pueblo_dv.reset_index(inplace = True)
pueblo_dv.head()

| | datetime | site_no | 72019_Mean | 72019_Mean_cd |
|---|---|---|---|---|
| 0 | 2024-03-01 00:00:00+00:00 | 382323104200701 | 21.86 | A |
| 1 | 2024-03-02 00:00:00+00:00 | 382323104200701 | 21.86 | A |
| 2 | 2024-03-03 00:00:00+00:00 | 382323104200701 | 21.86 | A |
| 3 | 2024-03-04 00:00:00+00:00 | 382323104200701 | 21.86 | A |
| 4 | 2024-03-05 00:00:00+00:00 | 382323104200701 | 21.86 | A |


In [23]:
# slice, rename cols
pueblo_dv = pueblo_dv[['datetime', '72019_Mean']]
pueblo_dv.columns = ['datetime', 'gwl']
pueblo_dv.head()

| | datetime | gwl |
|---|---|---|
| 0 | 2024-03-01 00:00:00+00:00 | 21.86 |
| 1 | 2024-03-02 00:00:00+00:00 | 21.86 |
| 2 | 2024-03-03 00:00:00+00:00 | 21.86 |
| 3 | 2024-03-04 00:00:00+00:00 | 21.86 |
| 4 | 2024-03-05 00:00:00+00:00 | 21.86 |
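Slicing and then assigning `.columns` works, but it silently depends on column order. `DataFrame.rename` with a `columns` mapping makes the old-to-new correspondence explicit. A minimal sketch on a small hypothetical frame shaped like `pueblo_dv` after `reset_index()` (two rows copied from the output above; `frame` and `gwl` are illustrative names):

```python
import pandas as pd

# Hypothetical frame mimicking pueblo_dv after reset_index()
frame = pd.DataFrame({
    "datetime": pd.to_datetime(["2024-03-01", "2024-03-02"], utc=True),
    "site_no": ["382323104200701", "382323104200701"],
    "72019_Mean": [21.86, 21.86],
    "72019_Mean_cd": ["A", "A"],
})

# Keep only the columns of interest, then rename by mapping old -> new
gwl = frame[["datetime", "72019_Mean"]].rename(columns={"72019_Mean": "gwl"})
print(gwl.columns.tolist())  # ['datetime', 'gwl']
```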


For this lab, you need to retrieve streamflow data from March 1, 2024, to March 31, 2024, and calculate the mean of the streamflow data. Use the starter code to calculate your mean value.



In [27]:
# Define site number and date range
site_no = '07106500'
start_date = '2024-03-01'
end_date = '2024-03-31'
serv_type = 'dv'

# Retrieve the data. Note you will need to add the get_record() parameters.
data = nwis.get_record(sites = site_no,
                       service = serv_type,
                       start = start_date,
                       end = end_date)

data.describe()
# The data is already in a DataFrame format.
# The parameter code for streamflow (discharge) is '00060'; for daily values, use the '00060_Mean' column.
# Find the mean of the streamflow data for March 2024, i.e., the mean of the '00060_Mean' column.

#mean_streamflow = data.describe().loc['mean', '00060_Mean']

#print(mean_streamflow)

| | 00010_Maximum | 00010_Minimum | 00010_Mean | 00060_Mean | 00095_Maximum | 00095_Minimum | 00095_Mean | 80154_Mean | 80155_Mean |
|---|---|---|---|---|---|---|---|---|---|
| count | 31.0 | 31.0 | 31.0 | 31.0 | 31.0 | 31.0 | 31.0 | 31.0 | 31.0 |
| mean | 12.867742 | 4.429032 | 8.345161 | 212.903226 | 1152.096774 | 1090.129032 | 1119.290323 | 638.709677 | 583.451613 |
| std | 2.908652 | 1.852241 | 2.071206 | 90.091566 | 64.85592 | 79.82136 | 65.402953 | 771.871068 | 1069.765701 |
| min | 6.7 | 1.0 | 4.3 | 116.0 | 985.0 | 851.0 | 969.0 | 100.0 | 46.0 |
| 25% | 11.25 | 2.8 | 6.95 | 136.0 | 1120.0 | 1070.0 | 1090.0 | 200.0 | 85.0 |
| 50% | 13.4 | 4.2 | 8.2 | 200.0 | 1150.0 | 1100.0 | 1120.0 | 300.0 | 190.0 |
| 75% | 14.9 | 6.0 | 9.8 | 243.5 | 1175.0 | 1120.0 | 1155.0 | 550.0 | 370.0 |
| max | 17.1 | 7.5 | 11.7 | 453.0 | 1280.0 | 1240.0 | 1250.0 | 3000.0 | 4400.0 |
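`describe()` returns a DataFrame indexed by statistic name (`count`, `mean`, `std`, ...) with one column per numeric column, so a single value is read with label-based `.loc[row_label, column_label]`; positional lookups need `.iloc` instead. Parameter code `00060` is discharge in cubic feet per second. A minimal sketch on a small hypothetical frame (`sample` and its values are placeholders, not the March 2024 record) showing that the summary-table lookup and a direct `Series.mean()` agree:

```python
import pandas as pd

# Hypothetical daily discharge values (cubic feet per second); placeholders only
sample = pd.DataFrame({"00060_Mean": [120.0, 150.0, 210.0, 300.0]})

# Label-based lookup in the summary table: row 'mean', column '00060_Mean'
mean_from_describe = sample.describe().loc["mean", "00060_Mean"]

# Direct computation on the column; NaN values are skipped by default
mean_direct = sample["00060_Mean"].mean()

print(mean_from_describe, mean_direct)  # 195.0 195.0
```

Computing the mean directly avoids building the full summary table when only one statistic is needed.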


Now that you have calculated the mean, use the following quiz to check your answer.