<h1>Code Revision 9/11/17</h1>

I rewrote all of Ed's code up to the point where he starts using HTML, so tomorrow I will start working on the HTML. The raw code is displayed first. After it, I go through the code piece by piece to explain how it works and to test it by executing it, from top to bottom, in notebook cells. The 'In' chunks show the input code; underneath each 'In' chunk I have printed the main data structures created by that chunk of code.

I think this will be helpful as a reference when I move on to the HTML portion of Ed's code. If I am not able to complete the entire task, it may also help bring someone who can help me up to speed quickly, and it could serve as a reference if I try to create an equivalent file in R.


<h2>Entire Raw Code File  <a name="rawcode"></a></h2>

```python
#Importing the packages pandas and numpy
import pandas as pd
import numpy as np


def TMDL_list_of_tuples(file,columns = None):
    '''argument:
            file: a .csv file
            columns: (optional; list of subset of columns to use)
       returns:
             a list of tuples: each tuple in the list represents
                               a row in the csv. If you specify a 
                               list of columns to keep, each tuple 
                               will contain only the values in those 
                               columns.
                                     
    '''
    dataframe = pd.read_csv(file)
    if columns is not None:
        dataframe = dataframe.loc[:,columns]
    return([tuple(x) for x in dataframe.values])

def TMDL_df(file, columns = None):
    '''
    arguments:
        file: the file containing the data in the form of a csv
        columns: list of a subset of columns to be imported
    returns: 
        dataframe of the subset of columns from the imported file
        
    '''
    dataframe = pd.read_csv(file)
    if columns is not None:
        dataframe = dataframe.loc[:,columns]
    return(dataframe)
    
def TMDL_df_to_list_of_tuples(dataframe):
     return([tuple(x) for x in dataframe.values])
```

```python

#Creation of dataframe from Mun_Lakes file
Lakes_DF = TMDL_df('Mun_Lakes_NoDups_NewLink_20160223.csv')

#Creation of list of tuples from Mun_Lakes file 
LakeTmdlList = TMDL_list_of_tuples('Mun_Lakes_NoDups_NewLink_20160223.csv')

#Dataframe from Mun_Shellfish file
ShellFish_DF = TMDL_df('Mun_Shellfish_NoDups_NewLink_20160223.csv')

#Creation of list of tuples from Mun_Shellfish file
ShellTmdlList  = TMDL_list_of_tuples("Mun_Shellfish_NoDups_NewLink_20160223.csv")

Stream_DF = TMDL_df('Mun_Streams_NoDups_NewLink_20160223.csv')
StreamTmdlList = TMDL_list_of_tuples('Mun_Streams_NoDups_NewLink_20160223.csv')

#Creation of unique list MunCodes

#creation of dataframe containing the single column 'Muncode'
Munis = TMDL_df('Muni_HUC14_MunCode_20160511.csv',columns = ['Muncode'])

#Dropping duplicates
Munis = Munis.drop_duplicates()

#Turning munis into a sorted list of muncode values
Munis = list(np.sort(Munis['Muncode'].values))

#Creation of MuniLbls DataFrame. 
#Each Row contains a Muncode, MunName, and the County. 
#Used the optional columns argument, so that the HUC14 column was not included in the dataframe.
MuniLbls_DF = TMDL_df('Muni_HUC14_MunCode_20160511.csv',
                      columns = ['Muncode','MunName','County'])

#The County names were in all uppercase. 
#The following line converts the County Names to Title Case
MuniLbls_DF.County = MuniLbls_DF['County'].str.title()

#Removal of Duplicates
MuniLbls_DF = MuniLbls_DF.drop_duplicates()

#Sorting the dataframe by Muncode.
#  Sorting keeps each row's original index value, so the index is reset in the next step.
MuniLbls_DF = MuniLbls_DF.sort_values(by = 'Muncode')

#Resets the index so it starts from 0
MuniLbls_DF = MuniLbls_DF.reset_index(drop=True)

#Creation of MuniLbls list of tuples from MuniLbls_DF
MuniLbls = TMDL_df_to_list_of_tuples(MuniLbls_DF)
```

<h2>Data Structures Used <a name="data-structures"></a></h2>
- **List of Tuples**: A tuple is an immutable sequence of values. It begins with '(' and ends with ')', and its values are separated by commas. For example, you could represent the point in a Cartesian coordinate system at x = 3, y = 4 as the tuple (3,4). Ed's code read in the text files as lists of tuples, so each row of a text file was turned into a tuple. A list of tuples is a list whose elements are tuples.

    For example, if I have the two points (0,5) and (0,7), I could store them in a list of tuples as [(0,5), (0,7)]. (0,5) is the first element of the list and (0,7) is the second.

    csv files are easier to work with in Python than txt files, so I changed Ed's .txt files to .csv files. I also added a header row (a row with column names). I created the function **TMDL_list_of_tuples** to import a csv file and create a list of tuples, with each row becoming one tuple. It takes two arguments: 'file', which is the location of the file, and 'columns', which is optional. If you only want to use a subset of the columns, pass a list of the column names you want to extract data from. If you don't include this argument, all the columns in the csv will be included in the list of tuples.


- **Pandas DataFrames**: Lists of tuples look like a mess when printed, so I created the function **TMDL_df** to create a dataframe from a csv file instead of a list of tuples. It takes two arguments: 'file', which is the location of the csv file you want to import, and 'columns', which is optional. If you only want to use a subset of the columns, pass a list of the column names you want to extract. If you don't include this argument, all the columns in the csv will be included in the dataframe.

    I also created a function **TMDL_df_to_list_of_tuples** to convert dataframes already created with **TMDL_df** into lists of tuples. The only argument it takes is a dataframe.
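
As a small illustration of the two structures described above, here is a minimal sketch that uses the same example points. The variable names are made up for this example and are not part of Ed's code.

```python
# Two points stored as tuples inside a list (the example from the text).
points = [(0, 5), (0, 7)]

first_point = points[0]   # the first element of the list: (0, 5)
x, y = points[1]          # tuple unpacking: x is 0, y is 7

# Lists are mutable, so another tuple can be appended.
points.append((3, 4))

# Tuples are immutable, so assigning to one of their elements fails.
try:
    first_point[0] = 99
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

print(points)              # [(0, 5), (0, 7), (3, 4)]
print(tuple_was_modified)  # False
```

The list can grow or shrink, but each row stored as a tuple stays fixed once it has been created.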

<h2>Breaking Down the Code Piece by Piece</h2>

<h3>Functions Created to Make Data Structures From .csv Files</h3>

In [2]:
#Importing the packages pandas and numpy, and the function display from the IPython.display module
import pandas as pd
import numpy as np
from IPython.display import display

def TMDL_list_of_tuples(file,columns = None):
    '''arguments:
            file: a .csv file
            columns: (optional; list of subset of columns to use)
       returns:
            a list of tuples: each tuple in the list represents
            a row in the csv. If you specify a list of columns to keep,
            each tuple will contain only the values in those columns.

    '''
    dataframe = pd.read_csv(file)
    if columns is not None:
        dataframe = dataframe.loc[:,columns]
    return([tuple(x) for x in dataframe.values])

def TMDL_df(file, columns = None):
    '''
    arguments:
        file: the file containing the data in the form of a csv
        columns: list of a subset of columns to be imported
    returns: 
        dataframe of the subset of columns from the imported file
        
    '''
    dataframe = pd.read_csv(file)
    if columns is not None:
        dataframe = dataframe.loc[:,columns]
    return(dataframe)

def TMDL_df_to_list_of_tuples(dataframe):
     return([tuple(x) for x in dataframe.values])
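
The functions above read the whole file and then keep the requested columns with `.loc`. A possible alternative, sketched here under assumptions, is to let `pd.read_csv` filter the columns while parsing through its `usecols` parameter. The function name `read_subset` and the small in-memory CSV are hypothetical and only stand in for the real files.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for one of the real files.
csv_text = (
    "Muncode,MunName,County,HUC14\n"
    "101,Absecon City,ATLANTIC,A1\n"
    "102,Atlantic City,ATLANTIC,A2\n"
)

def read_subset(file, columns=None):
    """Hypothetical variant of TMDL_df that filters columns while parsing."""
    dataframe = pd.read_csv(file, usecols=columns)
    if columns is not None:
        # usecols ignores the order of the list, so reorder explicitly.
        dataframe = dataframe[columns]
    return dataframe

subset = read_subset(io.StringIO(csv_text), columns=['County', 'Muncode'])
rows = [tuple(x) for x in subset.values]

print(list(subset.columns))  # ['County', 'Muncode']
print(rows)
```

For files of this size the difference is unlikely to matter, so this is a matter of taste rather than a needed change.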
    
    

### Creation of DataFrame and List of Tuples From Mun_Lakes File <a name="lakes"></a> 

In [2]:
#Creation of dataframe from Mun_Lakes file
Lakes_DF = TMDL_df('Mun_Lakes_NoDups_NewLink_20160223.csv')

#Creation of list of tuples from Mun_Lakes file 
LakeTmdlList = TMDL_list_of_tuples('Mun_Lakes_NoDups_NewLink_20160223.csv')


print('FIRST FIVE ROWS OF LAKES_DF')
display(Lakes_DF.head())
print("\nFirst Four Tuples in LakeTmdlList\n")
print(LakeTmdlList[0:4])

FIRST FIVE ROWS OF LAKES_DF

| | LakeMuncode | Lakeshed | LakeParameter | LakeTMDLTitle | TmdlYear | TmdlDoc_DEP |
|---|---|---|---|---|---|---|
| 0 | 804 | Iona Lake | Fecal Coliform | Total Maximum Daily Loads for Pathogens to Add... | 2007 | http://www.nj.gov/dep/wms/bears/docs/adopted_l... |
| 1 | 805 | Braddock Lake | Fecal Coliform | Total Maximum Daily Loads for Pathogens to Add... | 2007 | http://www.nj.gov/dep/wms/bears/docs/adopted_a... |
| 2 | 805 | Burnt Mill Pond | Total Phosphorus | Total Maximum Daily Loads for Phosphorus To Ad... | 2003 | http://www.nj.gov/dep/wms/bears/docs/Lower%20D... |
| 3 | 805 | Cushman Lake | Fecal Coliform | Total Maximum Daily Loads for Pathogens to Add... | 2007 | #http://www.nj.gov/dep/wms/bears/docs/adopted_... |
| 4 | 805 | Eastern Gate Lake | Fecal Coliform | Total Maximum Daily Loads for Pathogens to Add... | 2007 | http://www.nj.gov/dep/wms/bears/docs/adopted_l... |

First Four Tuples in LakeTmdlList

[(804, 'Iona Lake', 'Fecal Coliform', 'Total Maximum Daily Loads for Pathogens to Address 17 Lakes in the Lower Delaware Water Region', 2007, 'http://www.nj.gov/dep/wms/bears/docs/adopted_lowerdelaware_fecal_lake.pdf'), (805, 'Braddock Lake', 'Fecal Coliform', 'Total Maximum Daily Loads for Pathogens to Address 18 Lakes in the Atlantic Coastal Water Region', 2007, 'http://www.nj.gov/dep/wms/bears/docs/adopted_atlantic_fecal_lake.pdf'), (805, 'Burnt Mill Pond', 'Total Phosphorus', 'Total Maximum Daily Loads for Phosphorus To Address 13 Eutrophic Lakes in the Lower Delaware Water Region', 2003, 'http://www.nj.gov/dep/wms/bears/docs/Lower%20Delaware%20Lakes.pdf'), (805, 'Cushman Lake', 'Fecal Coliform', 'Total Maximum Daily Loads for Pathogens to Address 18 Lakes in the Atlantic Coastal Water Region', 2007, '#http://www.nj.gov/dep/wms/bears/docs/adopted_atlantic_fecal_lake.pdf#')]
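
The cell above parses the same CSV twice, once for the dataframe and once for the list of tuples. The sketch below suggests that converting the existing dataframe (which is what **TMDL_df_to_list_of_tuples** does) should give the same list without a second read. The in-memory CSV and the variable names are assumptions made for this example, with only three of the real columns.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Mun_Lakes file, reduced to three columns.
csv_text = (
    "LakeMuncode,Lakeshed,TmdlYear\n"
    "804,Iona Lake,2007\n"
    "805,Braddock Lake,2007\n"
)

# Path 1: read the file once, then convert the dataframe.
lakes_df = pd.read_csv(io.StringIO(csv_text))
from_dataframe = [tuple(x) for x in lakes_df.values]

# Path 2: read the file a second time and convert that result.
second_read = pd.read_csv(io.StringIO(csv_text))
from_second_read = [tuple(x) for x in second_read.values]

print(from_dataframe == from_second_read)  # True
```

With the real data this would correspond to `LakeTmdlList = TMDL_df_to_list_of_tuples(Lakes_DF)`.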


### Creation of DataFrame and List of Tuples From Mun_Shellfish File <a name="shellfish"></a> 

In [3]:
#Dataframe from Mun_Shellfish file
ShellFish_DF = TMDL_df('Mun_Shellfish_NoDups_NewLink_20160223.csv')

#Creation of list of tuples from Mun_Shellfish file
ShellTmdlList  = TMDL_list_of_tuples("Mun_Shellfish_NoDups_NewLink_20160223.csv")



print('FIRST FIVE ROWS OF ShellFish_DF')
display(ShellFish_DF.head())
print("\nFirst Four Tuples in ShellTmdlList\n")
print(ShellTmdlList[0:4])


FIRST FIVE ROWS OF ShellFish_DF

| | ShellMuncode | Shellfishgroup | ShellParameter | ShellDoc_DEP | ShellTmdlTitle |
|---|---|---|---|---|---|
| 0 | 101 | absecon bay-a, absecon bay-c | Total coliform | http://www.nj.gov/dep/wms/bears/docs/Coastal_P... | Six Total Maximum Daily Loads for Total Colifo... |
| 1 | 101 | absecon bay-a, absecon bay-c, absecon creek-a | Total coliform | http://www.nj.gov/dep/wms/bears/docs/Coastal_P... | Six Total Maximum Daily Loads for Total Colifo... |
| 2 | 101 | absecon bay-b | Total coliform | http://www.nj.gov/dep/wms/bears/docs/Coastal_P... | Six Total Maximum Daily Loads for Total Colifo... |
| 3 | 101 | cordery creek-a | Total coliform | http://www.nj.gov/dep/wms/bears/docs/Coastal_P... | Six Total Maximum Daily Loads for Total Colifo... |
| 4 | 102 | absecon bay-a, absecon bay-c | Total coliform | http://www.nj.gov/dep/wms/bears/docs/Coastal_P... | Six Total Maximum Daily Loads for Total Colifo... |

First Four Tuples in ShellTmdlList

[(101, 'absecon bay-a, absecon bay-c', 'Total coliform', 'http://www.nj.gov/dep/wms/bears/docs/Coastal_Pathogen_TMDLs_WMA15.pdf', 'Six Total Maximum Daily Loads for Total Coliform to Address Shellfish-Impaired Waters in Watershed Management Area 15'), (101, 'absecon bay-a, absecon bay-c, absecon creek-a', 'Total coliform', 'http://www.nj.gov/dep/wms/bears/docs/Coastal_Pathogen_TMDLs_WMA15.pdf', 'Six Total Maximum Daily Loads for Total Coliform to Address Shellfish-Impaired Waters in Watershed Management Area 15'), (101, 'absecon bay-b', 'Total coliform', 'http://www.nj.gov/dep/wms/bears/docs/Coastal_Pathogen_TMDLs_WMA15.pdf', 'Six Total Maximum Daily Loads for Total Coliform to Address Shellfish-Impaired Waters in Watershed Management Area 15'), (101, 'cordery creek-a', 'Total coliform', 'http://www.nj.gov/dep/wms/bears/docs/Coastal_Pathogen_TMDLs_WMA15.pdf', 'Six Total Maximum Daily Loads for Total Coliform to Address Shellfish-Impaired Waters in W

### Creation of DataFrame and List of Tuples From Mun_Stream File <a name="stream"></a> 

In [4]:
Stream_DF = TMDL_df('Mun_Streams_NoDups_NewLink_20160223.csv')
StreamTmdlList = TMDL_list_of_tuples('Mun_Streams_NoDups_NewLink_20160223.csv')

print('FIRST FIVE ROWS OF Stream_DF')
display(Stream_DF.head())
print("\nFirst Four Tuples in StreamTmdlList\n")
print(StreamTmdlList[0:4])


FIRST FIVE ROWS OF Stream_DF

| | StreamMuncode | StreamParameter | StreamTmdlName | StreamTmdlDate | StreamDoc_DEP | StreamTmdlTitle |
|---|---|---|---|---|---|---|
| 0 | 235 | Fecal Coliform | W Br Saddle, Saddle R at Ridgewood, Lodi & Fai... | 2003 | http://www.nj.gov/dep/wms/bears/docs/Northeast... | Total Maximum Daily Loads for Fecal Coliform t... |
| 1 | 235 | Total Phosphorus | Goffle Brook | 2008 | http://www.nj.gov/dep/wms/bears/docs/passaic_t... | Total Maximum Daily Load Report for the Non-Ti... |
| 2 | 235 | Total Phosphorus | Passaic R Lwr (Fair Lawn Ave to Goffle) | 2008 | http://www.nj.gov/dep/wms/bears/docs/passaic_t... | Total Maximum Daily Load Report for the Non-Ti... |
| 3 | 236 | Fecal Coliform | Hackensack R | 2003 | http://www.nj.gov/dep/wms/bears/docs/Northeast... | Total Maximum Daily Loads for Fecal Coliform t... |
| 4 | 236 | Fecal Coliform | Pascack Brook/Musquapsink Brook | 2003 | http://www.nj.gov/dep/wms/bears/docs/Northeast... | Total Maximum Daily Loads for Fecal Coliform t... |

First Four Tuples in StreamTmdlList

[(235, 'Fecal Coliform', 'W Br Saddle, Saddle R at Ridgewood, Lodi & Fairlaw', 2003, 'http://www.nj.gov/dep/wms/bears/docs/Northeast%20FC.pdf', 'Total Maximum Daily Loads for Fecal Coliform to Address 32 Streams in the Northeast Water Region'), (235, 'Total Phosphorus', 'Goffle Brook', 2008, 'http://www.nj.gov/dep/wms/bears/docs/passaic_tmdl.pdf', 'Total Maximum Daily Load Report for the Non-Tidal Passaic River Basin Addressing Phosphorus Impairments'), (235, 'Total Phosphorus', 'Passaic R Lwr (Fair Lawn Ave to Goffle)', 2008, 'http://www.nj.gov/dep/wms/bears/docs/passaic_tmdl.pdf', 'Total Maximum Daily Load Report for the Non-Tidal Passaic River Basin Addressing Phosphorus Impairments'), (236, 'Fecal Coliform', 'Hackensack R', 2003, 'http://www.nj.gov/dep/wms/bears/docs/Northeast%20FC.pdf', 'Total Maximum Daily Loads for Fecal Coliform to Address 32 Streams in the Northeast Water Region')]


### Creation of List of Unique Municipality Codes <a name="muncodes"></a> 

In [5]:
#Creation of unique list MunCodes

#creation of dataframe containing the single column 'Muncode'
Munis = TMDL_df('Muni_HUC14_MunCode_20160511.csv',columns = ['Muncode'])

#Dropping duplicates
Munis = Munis.drop_duplicates()

#Turning munis into a sorted list of muncode values
Munis = list(np.sort(Munis['Muncode'].values))

print('The Sorted Munis List, First 10 Values')
Munis[0:10]

The Sorted Munis List, First 10 Values


[101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
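
The three steps in the cell above (drop duplicates, sort, convert to a list) can also be written as one call. The sketch below compares the two on a small made-up column, assuming only that the column is called 'Muncode' as in the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical Muncode column with duplicates, in unsorted order.
munis_df = pd.DataFrame({'Muncode': [103, 101, 102, 101, 103]})

# The steps used above: drop duplicates, sort, convert to a list.
deduplicated = munis_df.drop_duplicates()
step_by_step = list(np.sort(deduplicated['Muncode'].values))

# A shorter equivalent: np.unique returns the sorted unique values,
# and .tolist() converts them to plain Python ints.
one_call = np.unique(munis_df['Muncode'].values).tolist()

print(one_call)  # [101, 102, 103]
```

Both versions give the same codes in the same order. The second also stores plain Python integers rather than NumPy integers, which may print more cleanly later on.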

### Creation of MuniLbls DataFrame, Which Contains Unique Muncodes With Corresponding Name and County <a name="mundata"></a> 

In [6]:
#Creation of MuniLbls DataFrame. Each Row contains a Muncode, MunName, and the County. 
#Used the optional columns argument, so that the HUC14 column was not included in the dataframe
MuniLbls_DF = TMDL_df('Muni_HUC14_MunCode_20160511.csv',columns = ['Muncode','MunName','County'])

#The County names were in all uppercase. The following line converts the County Names to Title Case
MuniLbls_DF.County = MuniLbls_DF['County'].str.title()

#Removal of Duplicates
MuniLbls_DF = MuniLbls_DF.drop_duplicates()

#Sorting the dataframe by Muncode. The index is not reset here, so each row keeps its pre-sort index value.
MuniLbls_DF = MuniLbls_DF.sort_values(by = 'Muncode')

print('First Five Rows of the Sorted MuniLbls_DF')
MuniLbls_DF.head()

First Five Rows of the Sorted MuniLbls_DF

| | Muncode | MunName | County |
|---|---|---|---|
| 3109 | 101 | Absecon City | Atlantic |
| 186 | 102 | Atlantic City | Atlantic |
| 226 | 103 | Brigantine City | Atlantic |
| 257 | 104 | Buena Borough | Atlantic |
| 996 | 105 | Buena Vista Township | Atlantic |


Notice that the index (the column of numbers on the far left) is '3109, 186, 226, 257, 996, ...' when it should be '0, 1, 2, ...'. These are the index values the municipalities had before sorting by Muncode. The index can be fixed as follows:

In [7]:
MuniLbls_DF = MuniLbls_DF.reset_index(drop=True)

print('First Five Rows of MuniLbls_DF With Fixed Index')
MuniLbls_DF.head()

First Five Rows of MuniLbls_DF With Fixed Index


| | Muncode | MunName | County |
|---|---|---|---|
| 0 | 101 | Absecon City | Atlantic |
| 1 | 102 | Atlantic City | Atlantic |
| 2 | 103 | Brigantine City | Atlantic |
| 3 | 104 | Buena Borough | Atlantic |
| 4 | 105 | Buena Vista Township | Atlantic |
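
The sort and the index reset can also be combined. The sketch below reproduces the behaviour on a small made-up dataframe and compares the two-step version with the `ignore_index=True` option of `sort_values`, which assumes pandas 1.0 or later. The dataframe contents are illustrative only.

```python
import pandas as pd

# Hypothetical labels in unsorted order.
labels = pd.DataFrame({
    'Muncode': [102, 101, 103],
    'MunName': ['Atlantic City', 'Absecon City', 'Brigantine City'],
})

# Sorting alone keeps each row's original index value.
sorted_only = labels.sort_values(by='Muncode')
print(list(sorted_only.index))    # [1, 0, 2]

# Two-step version: sort, then reset the index.
two_step = sorted_only.reset_index(drop=True)

# One-step version (assumes pandas 1.0 or later).
one_step = labels.sort_values(by='Muncode', ignore_index=True)

print(list(one_step.index))       # [0, 1, 2]
print(two_step.equals(one_step))  # True
```

Without `drop=True`, `reset_index` would keep the old index as a new column, which is not wanted here.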


### Creation of List of Tuples From MuniLbls DataFrame <a name="municipalities_list"></a> 

In [8]:
#Creation of MuniLbls list of tuples from MuniLbls_DF
MuniLbls = TMDL_df_to_list_of_tuples(MuniLbls_DF)

print('List Containing First 5 Tuples in the MuniLbls list\n')
MuniLbls[0:5]

List Containing First 5 Tuples in the MuniLbls list



[(101, 'Absecon City', 'Atlantic'),
 (102, 'Atlantic City', 'Atlantic'),
 (103, 'Brigantine City', 'Atlantic'),
 (104, 'Buena Borough', 'Atlantic'),
 (105, 'Buena Vista Township', 'Atlantic')]
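
For reference, pandas also offers `DataFrame.itertuples`, which can produce the same kind of list without going through `.values`. This is a minimal sketch on a two-row stand-in for MuniLbls_DF, not a change to the code above, and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical two-row stand-in for MuniLbls_DF.
labels = pd.DataFrame({
    'Muncode': [101, 102],
    'MunName': ['Absecon City', 'Atlantic City'],
    'County': ['Atlantic', 'Atlantic'],
})

# The approach used in TMDL_df_to_list_of_tuples.
via_values = [tuple(x) for x in labels.values]

# Alternative: index=False leaves the index out of each tuple,
# and name=None yields plain tuples instead of namedtuples.
via_itertuples = list(labels.itertuples(index=False, name=None))

print(via_itertuples)
print(via_values == via_itertuples)  # True
```

Either form gives one tuple per municipality in the order (Muncode, MunName, County).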