## <span style="color:green">Introduction</span>
This program downloads annual peak streamflow data from the USGS Surface Data Portal for a user-specified USGS gage station. It then stores the data as text files (.txt) in a specified folder in the Jupyter Notebook directory.

This code is written in Python 3.

Revision No: 09

Last Revised: 2022-03-30

## <span style="color:green">Import the packages/modules required for this exercise</span>

We need a few packages for this module: urllib.parse, urllib.request, and os.

For plotting, we also need pandas and pyplot (from matplotlib).
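
As a minimal sketch, the standard-library imports named above could be written as follows. The plotting packages are left out here so that the block runs with the standard library alone.

```python
# Standard-library modules: URL building, data download, and folder handling
import os
import urllib.parse
import urllib.request

# Quick check that the functions used later in the notebook are available
print(callable(urllib.parse.urlencode), callable(urllib.request.urlopen))
```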

In [None]:
## Import the required Modules/Packages for obtaining the data from USGS NWIS Web Interface

## WRITE YOUR CODE BELOW


## <span style="color:green">Function for data access and data storage</span>
Let us write a function that takes the station number as input and stores the
results in a user-specified folder. The function has two arguments: the USGS
station number and the name of the folder where the output is to be stored.
It builds a link to the NWIS Web Interface, retrieves the data through that link,
and stores the raw data in the user-specified folder. It also saves the date and
peak streamflow columns to a second file, and returns the peak streamflow as a list
together with the station name.
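
The link-building step can be tried on its own, without any download. The sketch below assumes a hypothetical helper name, build_peak_url, and uses 01646500 only as an example station number. It passes all three query parameters through urlencode, which should give the same link as joining the parts by hand.

```python
import urllib.parse

def build_peak_url(station_number):
    """Return the NWIS peak-streamflow link for one station (hypothetical helper)."""
    base = 'https://nwis.waterdata.usgs.gov/nwis/peak?'
    # urlencode turns the dictionary into a 'key=value&key=value' query string
    query = urllib.parse.urlencode({
        'site_no': station_number,
        'agency_cd': 'USGS',
        'format': 'rdb',
    })
    return base + query

# 01646500 is used here only as an example station number
print(build_peak_url('01646500'))
```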

In [None]:
## Define a function for obtaining the peak flow data from USGS NWIS Web Interface
## Input (Arguments) - station number and folder name
## Output (Return) - peak streamflow and station name
def GetPeakFlowData_func(station_number,FolderName):
    ## Building URLs
    var1 = {'site_no': station_number}
    part1 = 'https://nwis.waterdata.usgs.gov/nwis/peak?'
    part2 = '&agency_cd=USGS&format=rdb'
    link = (part1 + urllib.parse.urlencode(var1) + part2)
    print("The USGS Link is: \n",link)
    
    ## Opening the link & retrieving data
    response = urllib.request.urlopen(link)
    page_data = response.read()
    
    ## File name assigning & storing the raw data as text file
    ## w - Open a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
    ## b - Opens in binary mode.
    with open(FolderName+'/Data_' + station_number + '_raw'  + '.txt', 'wb') as f1:
        f1.write(page_data)
    ## A file opened with a plain "open" call must be closed explicitly with f1.close().
    ## With "with open", the file is closed automatically when the with block is exited.
    
    ## Check if the file has been automatically closed.
    #f1.closed
     
    
    print("\nDownload complete for USGS Station Number: ", station_number)
    
    ## Converts the page data from bytes class to str class
    html = page_data.decode()
    ## Splits the string into a list of lines (handles both \n and \r\n line endings)
    html2 = html.splitlines()
    
    ## To extract the station name for returning from the function call
    station_name = ''  ## fallback if no station line is found
    for line in html2:
        ## Check if the first seven characters (slice 0:7) are "#  USGS"
        if line[0:7]=="#  USGS":
            station_name=line[3:]
            break
    
    ## Define an empty string and list
    reqd_data = '' ## for storing data in the folder
    reqd_flow_list=[] ## to return from the function
    
    ## Skips the first 74 lines, which are assumed to hold the file header
    ## (this offset is hard-coded and depends on the layout of the downloaded file)
    for line in html2[74:]:
        ## Splits each line into columns by tab separator
        cols = line.split('\t')
        ## Skips blank or incomplete lines
        if len(cols) < 5:
            continue
        ## Joins only date and peakflow
        ## cols[2] corresponds to date of peak streamflow (format YYYY-MM-DD)
        ## cols[4] corresponds to annual peak streamflow value in cfs
        newline = ','.join([cols[2],cols[4]])
        reqd_data += newline + '\n'
        
        ## Appends the flow value (still a string) to the list returned by the function
        reqd_flow_list.append(cols[4])

    
    ## Converts reqd_data from str class to bytes class
    reqd_data = reqd_data.encode()
    
    ## Saves the date and peakflow into a new file
    with open(FolderName+'/Data_' + station_number + '_reqd'  + '.txt', 'wb') as f2:
        f2.write(reqd_data)
        
    ## Check if the file has been automatically closed.
    #f2.closed

    print("\nRaw Data and Processed Data are stored in the folder for station: ", station_name)
    
    ## Returns the peak streamflow data as a list (for calculation of the return period in the turn-in part) and the station name 
    ## (for using in plots)
    return (reqd_flow_list,station_name)
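
The function above skips a fixed number of header lines. As an alternative sketch, not the method used in this notebook, the header could be found by skipping every line that starts with '#', then reading the column-name row and dropping the format row below it. The helper name parse_peak_rdb is hypothetical, the column names peak_dt and peak_va are assumed to match the downloaded file, and the sample text is made up for illustration.

```python
def parse_peak_rdb(text):
    """Return a list of (date, flow) string pairs from tab-separated RDB text."""
    # Keep only non-empty lines that are not '#' comment lines
    rows = [ln.split('\t') for ln in text.splitlines()
            if ln.strip() and not ln.startswith('#')]
    if len(rows) < 2:
        return []
    header = rows[0]        # column names
    data_rows = rows[2:]    # rows[1] is the column-format row (e.g. 5s, 15s)
    date_col = header.index('peak_dt')   # assumed name of the date column
    flow_col = header.index('peak_va')   # assumed name of the flow column
    needed = max(date_col, flow_col)
    return [(r[date_col], r[flow_col]) for r in data_rows if len(r) > needed]

# Made-up sample in the assumed layout (not real measurements)
sample = (
    "# illustrative header comment\n"
    "agency_cd\tsite_no\tpeak_dt\tpeak_tm\tpeak_va\tpeak_cd\n"
    "5s\t15s\t10d\t6s\t8s\t33s\n"
    "USGS\t00000000\t2001-03-15\t\t1200\t\n"
    "USGS\t00000000\t2002-06-02\t\t950\t\n"
)
print(parse_peak_rdb(sample))
```

Looking up the columns by name keeps the parsing independent of how many comment lines a particular station file contains.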

## <span style="color:green">Main Code</span>

This part takes two inputs from the user: a USGS station number and the folder location where the data
will be stored. These two parameters are then passed to the function defined above. (Note: make sure the folder name matches your directory structure.) The function returns the peak streamflow data (as a list) and the station name.
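
One possible way to prepare the two inputs is sketched below. Both helper names are hypothetical, "Data" is a placeholder folder name, and the digits-only check is an assumption about how station numbers are written. The interactive lines are left as comments so the block runs without prompting.

```python
import os

def prepare_output_folder(folder_name):
    """Create the folder if it does not exist and return its absolute path."""
    os.makedirs(folder_name, exist_ok=True)
    return os.path.abspath(folder_name)

def clean_station_number(raw_text):
    """Strip whitespace and check that only digits remain (assumed format)."""
    station = raw_text.strip()
    if not station.isdigit():
        raise ValueError("Station number should contain digits only: " + repr(raw_text))
    return station

# In the notebook, the main cell could then look like this:
# station_number = clean_station_number(input("Enter the USGS station number: "))
# FolderName = prepare_output_folder("Data")   # "Data" is a placeholder folder name
# peak_flow, station_name = GetPeakFlowData_func(station_number, FolderName)
```

Keeping the station number as a string preserves any leading zero, which the link to the NWIS Web Interface needs.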

In [None]:
## Main Code
## WRITE YOUR CODE BELOW


## <span style="color:green">Time Series Plot</span>

In [None]:
## To create a time series plot of peak flow data by opening the saved file
import pandas as pd
## Assigning column names
colnames=['Date','PeakFlow']
df = pd.read_csv(FolderName+'/Data_' + station_number + '_reqd'  + '.txt',
                 header=None,names=colnames,parse_dates=[0])

## Setting the index of dataframe as the Date column
df=df.set_index(['Date'])

## Plotting
## WRITE YOUR CODE BELOW
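
A minimal sketch of the plotting step is given below. It builds a small dataframe from made-up values in place of the saved file so that it runs on its own. The axis labels, title, and figure size are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for running outside Jupyter; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the dataframe read from the saved file
df = pd.DataFrame(
    {'PeakFlow': [1200.0, 950.0, 1430.0]},
    index=pd.to_datetime(['2001-03-15', '2002-06-02', '2003-04-20']),
)
df.index.name = 'Date'

# Line plot of peak flow against date, with a marker at each annual peak
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(df.index, df['PeakFlow'], marker='o')
ax.set_xlabel('Date')
ax.set_ylabel('Annual peak streamflow (cfs)')
ax.set_title('Annual peak streamflow')
fig.tight_layout()
```

With the dataframe read from the saved file, the same plotting calls apply unchanged, since its index is also the Date column.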