## Working with USGS data

*Olm* contains a variety of functions for retrieving discharge and water quality data from NWIS, the USGS water database. These functions are contained within the [`olm.USGS`](https://olm.readthedocs.io/en/master/olm.USGS.html) package.

In [None]:
#Check whether we are running on Colab or locally.
try:
    import google.colab
    IN_COLAB = True
    base_path = 'https://raw.githubusercontent.com/CovingtonResearchGroup/olm-examples/main/'
except ImportError:
    IN_COLAB = False
    base_path = './'
print('Base working path for data files is',base_path)


In [None]:
#If olm isn't already installed (or if you're running in Colab), then run this cell of code.
!pip install olm-karst

In [None]:
#We will run in pylab mode, to import plotting functions.
%pylab inline

#### Obtaining metadata about a USGS site

In [None]:
from olm.USGS.DataRetrieval import GetSiteData
#Use a USGS site number to retrieve the data
site_no = 'USGS-07056000' #Buffalo River near St Joe, AR

StJoe_meta = GetSiteData(site_no)
StJoe_meta

#### Obtaining daily discharge values

You can obtain the daily average discharge for a given date using [`olm.USGS.DataRetrieval.GetDailyDischarge()`](https://olm.readthedocs.io/en/master/olm.USGS.DataRetrieval.GetDailyDischarge.html#olm.USGS.DataRetrieval.GetDailyDischarge).

In [None]:
from olm.USGS.DataRetrieval import GetDailyDischarge
Q = GetDailyDischarge(site_no, '2021-01-01')
#Discharge and some additional metadata are returned in a dictionary
print(Q)

In [None]:
print('Mean discharge on January 1, 2021 was',Q['discharge'], 'cfs')

You can obtain a mean daily discharge record for a longer period using [`olm.USGS.DataRetrieval.GetDailyDischargeRecord()`](https://olm.readthedocs.io/en/master/olm.USGS.DataRetrieval.GetDailyDischargeRecord.html#olm.USGS.DataRetrieval.GetDailyDischargeRecord).

In [None]:
from olm.USGS.DataRetrieval import GetDailyDischargeRecord
StJoe_Q = GetDailyDischargeRecord(site_no, '2010-01-01', '2020-12-31')
#The record supports .plot(), so we can plot it directly on a log axis
StJoe_Q.plot(logy=True)
ylabel('Stream flow (cfs)')

### Automatic queries of water quality data

The most powerful functionality of the `olm.USGS` package lies in its ability to query, download, and process large sets of water quality data from a list of USGS sites. To set up one of these queries, you need to create or modify two files.
1. Create a text file that contains a list of the site numbers of interest, each on its own line.
2. Modify the runWQXtoPandas Excel file (called a start file) to provide the desired chemical parameters to query and a variety of other settings that control the query and how the data are stored (open the [provided Excel file](https://raw.githubusercontent.com/CovingtonResearchGroup/olm-examples/main/USGS/runWQXtoPandas-Buffalo-Start-File.xls) to see an example).
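
The site list described in step 1 is a plain text file, so it can also be written from Python. The following is a minimal sketch: the file name `example_sites.txt` is hypothetical, and it assumes the site numbers are written in the same `USGS-` prefixed form used elsewhere in this notebook. Compare your file against the provided `Buffalo.txt` before running a query.

```python
from pathlib import Path

# The two Buffalo River sites used in this notebook.
# The 'USGS-' prefix is an assumption; match the format in Buffalo.txt.
site_numbers = ['USGS-07056000', 'USGS-07055646']

# Hypothetical output file name, chosen for illustration only.
site_list_path = Path('example_sites.txt')

# Write one site number per line.
site_list_path.write_text('\n'.join(site_numbers) + '\n')

# Read the file back to confirm its contents.
sites_read = site_list_path.read_text().split()
print(sites_read)
```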

In [None]:
#If we are running in Colab, we need to create local files that contain 
#the site number list and start file.
#We will download these from Github.
if IN_COLAB:
    import requests
    %mkdir USGS
    res = requests.get(base_path + 'USGS/Buffalo.txt')
    with open('USGS/Buffalo.txt', 'w') as f:
        f.write(res.text)
    res = requests.get(base_path + 'USGS/runWQXtoPandas-Buffalo-Start-File.xls')
    with open('USGS/runWQXtoPandas-Buffalo-Start-File.xls', 'wb') as f:
        f.write(res.content)

In [None]:
from olm.USGS.WQXtoPandas import runWQXtoPandas

print('*********************************')
print('**This will take a while to run**')
print('*********************************')

#This function is run on the start file
runWQXtoPandas('USGS/runWQXtoPandas-Buffalo-Start-File.xls')


### Analyzing USGS data retrieved via *Olm*

To load the data from all sites in a query, provide the processed site data directory to [`olm.USGS.loadWaterQualityData.loadSiteListData()`](https://olm.readthedocs.io/en/master/olm.USGS.loadWaterQualityData.loadSiteListData.html#olm.USGS.loadWaterQualityData.loadSiteListData).

In [None]:
from olm.USGS.loadWaterQualityData import loadSiteListData

sitesDict = loadSiteListData(processedSitesDir='Buffalo/')

In [None]:
#Data are accessed for each site using a dictionary key that is the site number
sitesDict.keys()

In [None]:
#For each site, the site number retrieves a DataFrame from sitesDict with MultiIndex columns
mi = sitesDict['USGS-07056000'].columns

print('Chemical parameters are:')
print('------------------------')
for prm in mi.levels[1]:
  print(prm)
print('')
print('For each chemical parameter the following are stored:')
print('-----------------------------------------------------')
for prm in mi.levels[0]:
  print(prm)
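
To make the two-level column layout concrete, here is a small self-contained sketch built from made-up numbers rather than real NWIS data. It assumes only pandas. The `'data'` label matches the key used in this notebook, while `'qualifier'` and all values are hypothetical stand-ins for whatever the real query stores.

```python
import pandas as pd

# Two-level columns: level 0 is the kind of stored value,
# level 1 is the chemical parameter.
# 'qualifier' is a hypothetical level-0 label used for illustration.
columns = pd.MultiIndex.from_product(
    [['data', 'qualifier'], ['Calcium', 'pH']]
)

# Made-up values for two sampling dates.
demo = pd.DataFrame(
    [[35.0, 7.9, '', ''],
     [41.0, 8.1, '', '<']],
    index=pd.to_datetime(['2020-01-01', '2020-02-01']),
    columns=columns,
)

# Selecting a level-0 label returns a DataFrame with one column
# per chemical parameter.
demo_data = demo['data']
print(demo_data)
```

Selecting on the first level this way is how a frame holding only the measured values can be pulled out of a per-site DataFrame.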


In [None]:
#Normally, we only need the measured values, stored under 'data'.
#The quality flags and other metadata may also be useful and can
#be obtained with the corresponding column name.
StJoe = sitesDict['USGS-07056000']['data']
Boxley = sitesDict['USGS-07055646']['data']

In [None]:
#Each site has a DataFrame containing chemical parameters
Boxley.head()

#### Making a basic plot using the data retrieved

We will examine the relationship between Ca and discharge at the two sites.

In [None]:
loglog(StJoe['Stream flow, instantaneous'], StJoe['Calcium'], '.')
loglog(Boxley['Stream flow, instantaneous'], Boxley['Calcium'], '.')
ylabel('[Ca] (mg/L)')
xlabel('Stream flow (cfs)')
legend(['St Joe','Boxley'])

#### Basic carbonate calculations using the USGS data

In [None]:
from olm.calcite import solutionFrompHCaRelaxed, concCaEqFromSolution
from olm.general import molL_to_mgL

Boxley_sols = solutionFrompHCaRelaxed(Boxley['Calcium'], Boxley['pH'], T_C=Boxley['Temperature, water'])
#Calculate saturation concentration of Ca and convert to mg/L
Boxley_CaEq = molL_to_mgL(concCaEqFromSolution(Boxley_sols), 'Ca')
Boxley_sat = Boxley['Calcium']/Boxley_CaEq

StJoe_sols = solutionFrompHCaRelaxed(StJoe['Calcium'], StJoe['pH'], T_C=StJoe['Temperature, water'])
#Calculate saturation concentration of Ca and convert to mg/L
StJoe_CaEq = molL_to_mgL(concCaEqFromSolution(StJoe_sols), 'Ca')
StJoe_sat = StJoe['Calcium']/StJoe_CaEq
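
The saturation ratio computed above is the measured Ca concentration divided by the equilibrium Ca concentration, so values below 1 indicate water that is undersaturated with respect to calcite and values above 1 indicate supersaturation. The sketch below illustrates that interpretation with made-up concentrations. The function `saturation_state` and its tolerance `tol` are hypothetical helpers and are not part of *Olm*.

```python
def saturation_state(ca_mgL, ca_eq_mgL, tol=0.05):
    """Return the ratio [Ca]/[Ca]_eq and a text label for it.

    Hypothetical helper for illustration; tol sets how close to 1
    the ratio must be to count as 'near equilibrium'.
    """
    ratio = ca_mgL / ca_eq_mgL
    if ratio < 1 - tol:
        state = 'undersaturated'
    elif ratio > 1 + tol:
        state = 'supersaturated'
    else:
        state = 'near equilibrium'
    return ratio, state

# Made-up concentrations in mg/L, not measured values.
low_ratio, low_state = saturation_state(20.0, 40.0)
high_ratio, high_state = saturation_state(50.0, 40.0)
print(low_ratio, low_state)
print(high_ratio, high_state)
```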

In [None]:
semilogx(Boxley['Stream flow, instantaneous'], Boxley_sat, '.')
semilogx(StJoe['Stream flow, instantaneous'], StJoe_sat, '.')
legend(['Boxley', 'St Joe'], loc='upper left')
xlabel('Stream flow (cfs)')
ylabel(r'$\rm{[Ca]/[Ca]_{eq}}$');