## <span style="color:green"><h1><center>Calculation of Return Period Flow from Peak Flow Data </center></h1></span>
<center>Prepared by <br>
    <b>Jibin Joseph and Venkatesh Merwade</b><br> 
Lyles School of Civil Engineering, Purdue University<br>
joseph57@purdue.edu, vmerwade@purdue.edu<br>
<b><br>
    FAIR Science in Water Resources</b><br></center>

## <span style="color:green">Introduction</span>
This program downloads peak flow data from the USGS Surface Data Portal for a user-specified station and calculates the flows 
corresponding to different return periods.

This code is written in Python 3.

Revision No: 06

Last Revised : 2023-08-05

## <span style="color:green">Import the packages/modules required for this exercise</span>

<p> We need the following packages: urllib.parse, urllib.request, math, matplotlib.pyplot (plt), numpy (np), os, and scipy.stats (including gamma and invgamma). The parentheses contain the commonly used short forms for these libraries.</p>

In [None]:
## CELL-01

## Import the required Modules/Packages for obtaining the data from portal
import urllib.parse
import urllib.request

## Import the required Modules/Packages for calculating return period flow
import math
import matplotlib.pyplot as plt

import numpy as np
#import csv

import os
from scipy.stats import gamma
from scipy.stats import invgamma
import scipy.stats

## <span style="color:green">Definition of Function for retrieval of Peak Flow Data</span> 

<p style='text-align: justify;'>Let us define the first function in Python to collect the data from the USGS web link using the urllib package. This function is called later in the code.</p>
<ul>
<li>Step 1: <span style="color:red">Build the url using the station code.</span></li>
<li>Step 2: <span style="color:red">Access the data using the url and gather the data (date, flow data, station name).</span></li>
<li>Step 3: <span style="color:red">Decode the data and extract only the required data.</span></li>
<li>Step 4: <span style="color:red">Return the flow data and station name.</span></li>
</ul>
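
The steps above can be tried offline before any download is attempted. The sketch below is only an illustration: `build_peak_url`, `parse_peak_text` and `sample_text` are hypothetical names that do not appear elsewhere in this notebook, and the station number, station name and flow values in `sample_text` are made up. It assumes the tab-separated layout described in the function comments (date in the third column, peak flow in the fifth).

```python
import urllib.parse

def build_peak_url(station_number):
    """Step 1: build the peak-flow URL for one station (no download happens here)."""
    query = urllib.parse.urlencode({'site_no': station_number})
    return ('https://nwis.waterdata.usgs.gov/nwis/peak?' + query
            + '&agency_cd=USGS&format=rdb')

def parse_peak_text(text):
    """Steps 3 and 4: extract dates, peak flows and the station name from decoded text."""
    station_name = None
    dates, flows = [], []
    for line in text.splitlines():
        if line.startswith('#'):
            ## The station name is on the first comment line starting with "#  USGS"
            if station_name is None and line.startswith('#  USGS'):
                station_name = line[3:]
            continue
        cols = line.split('\t')
        ## Keep only data rows; the column-name and column-format rows are skipped
        if len(cols) < 5 or cols[0] != 'USGS':
            continue
        dates.append(cols[2])   ## date of peak streamflow (YYYY-MM-DD)
        flows.append(cols[4])   ## peak streamflow in cfs, still a string
    return dates, flows, station_name

## Made-up text that imitates the layout of the downloaded file (illustration only)
sample_text = (
    "#  USGS 00000000 EXAMPLE RIVER AT EXAMPLE TOWN\n"
    "agency_cd\tsite_no\tpeak_dt\tpeak_tm\tpeak_va\n"
    "5s\t15s\t10d\t6s\t8s\n"
    "USGS\t00000000\t2001-05-01\t\t1200\n"
    "USGS\t00000000\t2002-04-15\t\t1850\n"
)

print(build_peak_url('00000000'))
print(parse_peak_text(sample_text))
```

Note that the flow values come back as strings, so they need to be converted to numbers before any calculation.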

In [None]:
## CELL-02

## Define a function for obtaining the peak flow data from USGS NWIS Web Interface
## Input (Arguments) - station number and folder name
## Output (Return) - peak streamflow and station name
def GetPeakFlowData_func(station_number,FolderName):
    ## Building URLs
    var1 = {'site_no': station_number}
    part1 = 'https://nwis.waterdata.usgs.gov/nwis/peak?'
    part2 = '&agency_cd=USGS&format=rdb'
    link = (part1 + urllib.parse.urlencode(var1) + part2)
    print("The USGS Link is: \n",link)
    
    ## Opening the link & retrieving data
    response = urllib.request.urlopen(link)
    page_data = response.read()
    
    ## File name assigning & storing the raw data as text file
    ## w - Open a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
    ## b - Opens in binary mode.
    with open(FolderName+'/Data_' + station_number + '_raw'  + '.txt', 'wb') as f1:
        f1.write(page_data)
    ## A file opened with a plain "open" statement must be closed with f1.close().
    ## With "with open", the file is closed automatically when the with block is exited.
    
    ## Check if the file has been automatically closed.
    #f1.closed
     
    
    print("\nDownload complete for USGS Station Number: ", station_number)
    
    ## Converts page data from bytes class to str class
    html = page_data.decode()
    ## Splits the string into a list of lines (handles both \n and \r\n line endings)
    html2 = html.splitlines()
    
    ## To extract the station name for returning from the function call
    ## Fallback in case no station-name line is found in the header
    station_name = station_number
    for header_line in html2:
        ## Check if the first seven characters (use 0:7) are "#  USGS"
        if header_line[0:7]=="#  USGS":
            station_name=header_line[3:]
            break
    
    ## Define an empty string and list
    reqd_data = '' ## for storing data in the folder
    reqd_flow_list=[] ## to return from the function
    
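    for line in html2:
        ## Skip the header comment lines (starting with #) instead of a fixed line count
        if line.startswith('#'):
            continue
        ## Splits each line to col by tab separator
        cols = line.split('\t')
        ## Keep only data rows (first column is the agency code "USGS");
        ## this also skips blank lines and the column-name and column-format rows
        if len(cols) < 5 or cols[0] != 'USGS':
            continue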
        ## Joins only date and peakflow
        ## cols[2] corresponds to date of peak streamflow (format YYYY-MM-DD)
        ## cols[4] corresponds to annual peak streamflow value in cfs
        newline = ','.join([cols[2],(cols[4])])
        reqd_data += newline + '\n'
        
        ## Append the flow value to the list and return from the function call
        reqd_flow_list.append((cols[4]))

    
    ## Converts reqd_data from str class to bytes class
    reqd_data = reqd_data.encode()
    
    ## Saves the date and peakflow into a new file
    with open(FolderName+'/Data_' + station_number + '_reqd'  + '.txt', 'wb') as f2:
        f2.write(reqd_data)
        
    ## Check if the file has been automatically closed.
    #f2.closed

    print("\nRaw Data and Processed Data are stored in the folder for station: ", station_name)
    
    ## Returns the peak streamflow data as a list (for calculation of the return period in the turn-in part) 
    ## and the station name (for using in plots)
    return (reqd_flow_list,station_name)

## <span style="color:green">MAIN CODE</span> 
Now, the user has to enter the station number of the desired USGS station. The code then calls the function defined above and stores the data in the results folder.

In [None]:


## CELL-03

## WRITE THE CODE BELOW
station_number=input("Enter USGS Station Number of the Required Station (USGS Station Number/site_no) \t")

print('\t')
FolderName="./Results"

## Make folder to save the results (no error if it already exists)
os.makedirs(FolderName, exist_ok=True)

peakflow_list_wb,station_name=GetPeakFlowData_func(station_number,FolderName)
print("\nThe station name is:", station_name,"\n")

## <span style="color:green">Time Series Plot</span>

In [None]:
## CELL-04

## To create a time series plot of peak flow data by opening the saved file
import pandas as pd
## Assigning column names
colnames=['Date','PeakFlow']
df = pd.read_csv(FolderName+'/Data_' + station_number + '_reqd'  + '.txt',
                 header=None,names=colnames,parse_dates=[0])

## Setting the index of dataframe as the Date column
df=df.set_index(['Date'])

plt.plot(df['PeakFlow'], 
         marker='*',
         linestyle='dashed',
         color = 'r',
         markersize=8)
plt.ylabel("Discharge (in cfs)")
plt.xlabel("Time (in Years)")
plt.title("Time Series Plot of Peak Flows\n("+station_name+")")

## <span style="color:green">Years for Analysis</span> 

Now, the user has to enter four years: the starting and ending years of the data period and of the analysis period. These must be entered correctly; otherwise you will get the error message "Error in length of data and check whether it is continuous".
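
One possible way to check the four years is sketched below. It is not the check used in this notebook: `check_analysis_years` is a hypothetical helper, and it assumes exactly one peak flow value per year with no gaps inside the data period. The example values are made up.

```python
def check_analysis_years(n_values, data_start_year, data_end_year,
                         analysis_start_year, analysis_end_year):
    """Return the (start, stop) slice indices of the analysis period.

    Assumes one peak flow value per year and a continuous record
    from data_start_year to data_end_year (inclusive).
    """
    expected = data_end_year - data_start_year + 1
    if n_values != expected:
        raise ValueError("Error in length of data and check whether it is continuous")
    if not (data_start_year <= analysis_start_year
            <= analysis_end_year <= data_end_year):
        raise ValueError("The analysis period must lie within the data period")
    start = analysis_start_year - data_start_year
    stop = analysis_end_year - data_start_year + 1
    return start, stop

## Made-up example: ten annual peaks for 2001-2010, analysis period 2003-2007
example_flows = [1200, 1850, 900, 2300, 1500, 1750, 2100, 1300, 1950, 1600]
start, stop = check_analysis_years(len(example_flows), 2001, 2010, 2003, 2007)
print(example_flows[start:stop])
```

Returning slice indices keeps the check separate from the data, so the same helper could be reused for any list of annual peaks.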

In [None]:
## CELL-05

## Enter the four years for carrying out the analysis
## Input data & analysis years

#data_start_year=int(input("Enter the starting year of DATA PERIOD (excluding initial break period):"))
print('\t')
#data_end_year=int(input("Enter the ending year of DATA PERIOD:"))
print('\t')
#analysis_start_year=int(input("Enter the starting year of ANALYSIS PERIOD:"))
print('\t')
#analysis_end_year=int(input("Enter the ending year of ANALYSIS PERIOD:"))
print('\t')

## <span style="color:green">Calculation of Return Period</span> 
Next, we have to write the code that calculates the return period flow with the Log-Pearson Type III (LP3) method, using the moving average method.
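
As a starting point, the LP3 calculation for a single set of peak flows can be sketched with the frequency factor form $\log_{10} Q_T = \bar{y} + K_T \, s_y$, where $\bar{y}$ and $s_y$ are the mean and standard deviation of the log-transformed flows and $K_T$ depends on their skew and on the return period $T$. The sketch below is an assumption, not the solution expected in CELL-06: `lp3_return_period_flow` is a hypothetical name, it obtains $K_T$ from `scipy.stats.pearson3` rather than from the `gamma` and `invgamma` functions imported in CELL-01, it uses the sample skew without any regional weighting, and it does not include the moving average step. The flow values are made up.

```python
import numpy as np
from scipy.stats import pearson3, skew

def lp3_return_period_flow(peak_flows, return_periods):
    """Estimate return period flows with a Log-Pearson Type III fit.

    peak_flows     : annual peak flows (positive numbers, e.g. in cfs)
    return_periods : return periods in years (each greater than 1)
    Returns a dict {return period: flow}.
    """
    log_q = np.log10(np.asarray(peak_flows, dtype=float))
    mean_log = log_q.mean()
    std_log = log_q.std(ddof=1)          ## sample standard deviation
    skew_log = skew(log_q, bias=False)   ## bias-corrected sample skew

    flows = {}
    for T in return_periods:
        ## Non-exceedance probability for return period T
        prob = 1.0 - 1.0 / T
        ## Frequency factor: quantile of the standardized Pearson III distribution
        k_t = pearson3.ppf(prob, skew_log)
        flows[T] = 10 ** (mean_log + k_t * std_log)
    return flows

## Made-up annual peaks (cfs), for illustration only
example_peaks = [1000, 2000, 1500, 3000, 2500, 1800, 2200, 4000, 1200, 2700]
example_result = lp3_return_period_flow(example_peaks, [2, 10, 100])
for T, q in example_result.items():
    print(T, "year flow:", round(q, 1), "cfs")
```

The list returned by `GetPeakFlowData_func` holds strings, so it would need to be converted to floats before being passed to a function like this.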

In [None]:
## CELL-06

## WRITE YOUR CODE HERE


In [None]:
type(flow_matrix)