# In this notebook we work through the code needed to scrape the average daily water temperatures for the Great Lakes from NOAA's Great Lakes CoastWatch website

## This notebook has two parts: an [initial exploration](#exploring) of the scraping process using the single year 1995, and the [full scrape](#scraping) of all the average daily temperatures from 1995 through 2018.

### Data Source

https://coastwatch.glerl.noaa.gov/statistic/statistic.html

In [103]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

import matplotlib.pyplot as plt

%matplotlib inline

<a id='exploring'></a>
## Average daily water temperature, 1995

In [2]:
url = 'https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/1995/glsea-temps1995_1024.dat'

In [3]:
res = requests.get(url)

### Checking the status code.

In [4]:
res.status_code

200
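
A status code of 200 means the request succeeded. As a small aside, the standard library's `http.HTTPStatus` enum can translate a code into its standard phrase, which may help when a request returns something other than 200. This is only an illustrative sketch; `describe_status` is a hypothetical helper and not part of `requests`.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable label for an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    return f"{status.value} {status.phrase}"


print(describe_status(200))  # 200 OK
print(describe_status(404))  # 404 Not Found
```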

In [5]:
res.content

b'  Daily Lake Average Surface Water Temperature\n                     From\n Great Lakes Surface Environmental Analysis maps\n  \n--------------------------------------------------------\n               Surf. Water Temp. (degrees C)\n   \nYear Day    Sup.   Mich.   Huron    Erie    Ont.  St.Clr\n--------------------------------------------------------\n  \n1995 001    3.29    5.02    4.50    4.70    4.08    1.87\n1995 002    3.28    4.95    4.45    4.64    4.05    1.83\n1995 003    0.20    0.20    0.20    0.20    0.20    0.20\n1995 004    0.20    0.20    0.20    0.20    0.20    0.20\n1995 005    0.20    0.20    0.20    0.20    0.20    0.20\n1995 006    0.20    0.20    0.20    0.20    0.20    0.20\n1995 007    0.20    0.20    0.20    0.20    0.20    0.20\n1995 008    0.20    0.20    0.20    0.20    0.20    0.20\n1995 009    3.02    4.10    3.59    3.38    3.88    0.20\n1995 010    2.95    3.97    3.25    2.70    3.87    0.20\n1995 011    2.89    3.86    3.03    2.31    3.85    0.20\n19
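
The `b'...'` prefix shows that `res.content` is a `bytes` object. Because the file appears to be plain text rather than HTML, decoding the bytes directly is one way to get a string. A minimal sketch, using a short sample copied from the output above instead of a live request, and assuming the file is ASCII-compatible:

```python
# Two rows copied from the response shown above (not a live request).
sample = (
    b'1995 001    3.29    5.02    4.50    4.70    4.08    1.87\n'
    b'1995 002    3.28    4.95    4.45    4.64    4.05    1.83\n'
)

# Assumption: the file is ASCII-compatible, so UTF-8 decoding is safe.
text = sample.decode('utf-8')

lines = text.splitlines()
print(type(text).__name__)  # str
print(lines[0])
```

With `requests`, the `res.text` attribute returns an already decoded string, which may be a convenient alternative.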

### Instantiate a Beautiful Soup object.

In [6]:
soup = BeautifulSoup(res.content, 'lxml')

In [7]:
soup

<html><body><p>Daily Lake Average Surface Water Temperature
                     From
 Great Lakes Surface Environmental Analysis maps
  
--------------------------------------------------------
               Surf. Water Temp. (degrees C)
   
Year Day    Sup.   Mich.   Huron    Erie    Ont.  St.Clr
--------------------------------------------------------
  
1995 001    3.29    5.02    4.50    4.70    4.08    1.87
1995 002    3.28    4.95    4.45    4.64    4.05    1.83
1995 003    0.20    0.20    0.20    0.20    0.20    0.20
1995 004    0.20    0.20    0.20    0.20    0.20    0.20
1995 005    0.20    0.20    0.20    0.20    0.20    0.20
1995 006    0.20    0.20    0.20    0.20    0.20    0.20
1995 007    0.20    0.20    0.20    0.20    0.20    0.20
1995 008    0.20    0.20    0.20    0.20    0.20    0.20
1995 009    3.02    4.10    3.59    3.38    3.88    0.20
1995 010    2.95    3.97    3.25    2.70    3.87    0.20
1995 011    2.89    3.86    3.03    2.31    3.85    0.20
1995 012    

In [8]:
p = soup.find('p')

In [9]:
p.attrs

{}

In [10]:
p.text

'Daily Lake Average Surface Water Temperature\n                     From\n Great Lakes Surface Environmental Analysis maps\n  \n--------------------------------------------------------\n               Surf. Water Temp. (degrees C)\n   \nYear Day    Sup.   Mich.   Huron    Erie    Ont.  St.Clr\n--------------------------------------------------------\n  \n1995 001    3.29    5.02    4.50    4.70    4.08    1.87\n1995 002    3.28    4.95    4.45    4.64    4.05    1.83\n1995 003    0.20    0.20    0.20    0.20    0.20    0.20\n1995 004    0.20    0.20    0.20    0.20    0.20    0.20\n1995 005    0.20    0.20    0.20    0.20    0.20    0.20\n1995 006    0.20    0.20    0.20    0.20    0.20    0.20\n1995 007    0.20    0.20    0.20    0.20    0.20    0.20\n1995 008    0.20    0.20    0.20    0.20    0.20    0.20\n1995 009    3.02    4.10    3.59    3.38    3.88    0.20\n1995 010    2.95    3.97    3.25    2.70    3.87    0.20\n1995 011    2.89    3.86    3.03    2.31    3.85    0.20\n1995 

In [11]:
print(p.text)

Daily Lake Average Surface Water Temperature
                     From
 Great Lakes Surface Environmental Analysis maps
  
--------------------------------------------------------
               Surf. Water Temp. (degrees C)
   
Year Day    Sup.   Mich.   Huron    Erie    Ont.  St.Clr
--------------------------------------------------------
  
1995 001    3.29    5.02    4.50    4.70    4.08    1.87
1995 002    3.28    4.95    4.45    4.64    4.05    1.83
1995 003    0.20    0.20    0.20    0.20    0.20    0.20
1995 004    0.20    0.20    0.20    0.20    0.20    0.20
1995 005    0.20    0.20    0.20    0.20    0.20    0.20
1995 006    0.20    0.20    0.20    0.20    0.20    0.20
1995 007    0.20    0.20    0.20    0.20    0.20    0.20
1995 008    0.20    0.20    0.20    0.20    0.20    0.20
1995 009    3.02    4.10    3.59    3.38    3.88    0.20
1995 010    2.95    3.97    3.25    2.70    3.87    0.20
1995 011    2.89    3.86    3.03    2.31    3.85    0.20
1995 012    2.82    3.74   

In [13]:
type(p.text)

str

In [12]:
type(soup.text)

str

In [14]:
print(soup.text)

Daily Lake Average Surface Water Temperature
                     From
 Great Lakes Surface Environmental Analysis maps
  
--------------------------------------------------------
               Surf. Water Temp. (degrees C)
   
Year Day    Sup.   Mich.   Huron    Erie    Ont.  St.Clr
--------------------------------------------------------
  
1995 001    3.29    5.02    4.50    4.70    4.08    1.87
1995 002    3.28    4.95    4.45    4.64    4.05    1.83
1995 003    0.20    0.20    0.20    0.20    0.20    0.20
1995 004    0.20    0.20    0.20    0.20    0.20    0.20
1995 005    0.20    0.20    0.20    0.20    0.20    0.20
1995 006    0.20    0.20    0.20    0.20    0.20    0.20
1995 007    0.20    0.20    0.20    0.20    0.20    0.20
1995 008    0.20    0.20    0.20    0.20    0.20    0.20
1995 009    3.02    4.10    3.59    3.38    3.88    0.20
1995 010    2.95    3.97    3.25    2.70    3.87    0.20
1995 011    2.89    3.86    3.03    2.31    3.85    0.20
1995 012    2.82    3.74   

In [15]:
len(soup.text)

21151

In [16]:
raw_text = soup.text

In [17]:
raw_text.split('\n')[0]

'Daily Lake Average Surface Water Temperature'

In [18]:
for i in range(10):
    print(raw_text.split('\n')[i])

Daily Lake Average Surface Water Temperature
                     From
 Great Lakes Surface Environmental Analysis maps
  
--------------------------------------------------------
               Surf. Water Temp. (degrees C)
   
Year Day    Sup.   Mich.   Huron    Erie    Ont.  St.Clr
--------------------------------------------------------
  


### Create a dictionary that will house the data.

In [19]:
raw_text.split('\n')[10:][0].split()

['1995', '001', '3.29', '5.02', '4.50', '4.70', '4.08', '1.87']

In [20]:
raw_text.split('\n')[10:][364].split()

['1995', '365', '1.93', '2.59', '1.44', '0.95', '2.58', '0.20']

In [22]:
headers = raw_text.split('\n')[7].split()
headers

['Year', 'Day', 'Sup.', 'Mich.', 'Huron', 'Erie', 'Ont.', 'St.Clr']
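
The cells above call `raw_text.split('\n')` repeatedly. One possible tidy-up is to split once and then slice the resulting list. The sketch below works on a shortened sample with the same layout as the file (column labels at line index 7, data starting at index 10); `sample_text` is hand-built for illustration.

```python
sample_text = "\n".join([
    "Daily Lake Average Surface Water Temperature",
    "                     From",
    " Great Lakes Surface Environmental Analysis maps",
    "  ",
    "--------------------------------------------------------",
    "               Surf. Water Temp. (degrees C)",
    "   ",
    "Year Day    Sup.   Mich.   Huron    Erie    Ont.  St.Clr",
    "--------------------------------------------------------",
    "  ",
    "1995 001    3.29    5.02    4.50    4.70    4.08    1.87",
    "1995 002    3.28    4.95    4.45    4.64    4.05    1.83",
    "",
])

lines = sample_text.split("\n")   # split once, reuse everywhere
column_names = lines[7].split()   # column labels from the header block
# data rows start at index 10; blank lines (such as the trailing one) are skipped
rows = [line.split() for line in lines[10:] if line.strip()]

print(column_names)
print(rows[0])
```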

In [23]:
for header in headers:
    print(header)

Year
Day
Sup.
Mich.
Huron
Erie
Ont.
St.Clr


In [24]:
raw_text.split('\n')[10:][0].split()

['1995', '001', '3.29', '5.02', '4.50', '4.70', '4.08', '1.87']

In [25]:


Year = []
Day = []
Superior = []
Michigan = []
Huron = []
Erie = []
Ontario = []
StClr = []

headers = ['Year', 'Day', 'Superior', 'Michigan', 'Huron', 'Erie', 'Ontario', 'StClr']

# map each header to its list so values can be appended without eval()
columns = dict(zip(headers, [Year, Day, Superior, Michigan, Huron, Erie, Ontario, StClr]))

# split the text once, skip the 10 header lines, and ignore blank lines
for line in raw_text.split('\n')[10:]:
    if line.strip():
        # pair each field in the row with its column name
        for name, value in zip(headers, line.split()):
            columns[name].append(value)

In [31]:
Michigan

['5.02',
 '4.95',
 '0.20',
 '0.20',
 '0.20',
 '0.20',
 '0.20',
 '0.20',
 '4.10',
 '3.97',
 '3.86',
 '3.74',
 '3.63',
 '3.52',
 '3.41',
 '3.32',
 '3.23',
 '3.14',
 '3.03',
 '2.93',
 '2.84',
 '2.75',
 '2.67',
 '2.63',
 '2.59',
 '2.47',
 '2.39',
 '2.32',
 '2.25',
 '2.20',
 '2.12',
 '2.07',
 '2.02',
 '1.99',
 '1.96',
 '1.93',
 '1.91',
 '1.83',
 '1.78',
 '1.71',
 '1.66',
 '1.57',
 '1.53',
 '1.50',
 '1.45',
 '1.43',
 '1.41',
 '1.41',
 '1.41',
 '1.42',
 '1.40',
 '1.39',
 '1.90',
 '2.04',
 '2.03',
 '1.86',
 '1.76',
 '1.68',
 '1.38',
 '1.31',
 '1.21',
 '1.25',
 '1.22',
 '1.18',
 '1.12',
 '1.06',
 '1.10',
 '0.92',
 '0.86',
 '0.78',
 '0.71',
 '0.83',
 '0.89',
 '0.94',
 '0.99',
 '1.07',
 '1.15',
 '1.25',
 '1.34',
 '1.44',
 '1.56',
 '1.70',
 '1.92',
 '1.97',
 '1.89',
 '1.85',
 '1.81',
 '1.75',
 '1.70',
 '1.68',
 '1.70',
 '1.76',
 '1.80',
 '1.81',
 '1.79',
 '2.04',
 '2.08',
 '2.08',
 '2.08',
 '2.12',
 '2.15',
 '2.21',
 '2.24',
 '1.92',
 '2.19',
 '2.21',
 '2.32',
 '2.36',
 '2.39',
 '2.40',
 '2.42',
 

In [32]:
len(Year)

365
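
Every value collected so far is a string (for example `'5.02'`). If numeric work is planned, converting early may help. A sketch under the assumption that the first two fields are integers and the remaining fields are floats; `parse_row` is a hypothetical helper.

```python
def parse_row(fields):
    """Convert one split row into (year, day, [temperatures])."""
    year = int(fields[0])
    day = int(fields[1])  # '001' -> 1
    temps = [float(value) for value in fields[2:]]
    return year, day, temps


row = ['1995', '001', '3.29', '5.02', '4.50', '4.70', '4.08', '1.87']
year, day, temps = parse_row(row)
print(year, day, temps)
```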

In [33]:
for i in range(8):
    print(headers[i])

Year
Day
Superior
Michigan
Huron
Erie
Ontario
StClr


In [34]:
# print the indexing expression for each of the eight header columns
for i in range(8):
    print(f"raw_text.split('\\n')[7].split()[{i}]")

raw_text.split('\n')[7].split()[0]
raw_text.split('\n')[7].split()[1]
raw_text.split('\n')[7].split()[2]
raw_text.split('\n')[7].split()[3]
raw_text.split('\n')[7].split()[4]
raw_text.split('\n')[7].split()[5]
raw_text.split('\n')[7].split()[6]
raw_text.split('\n')[7].split()[7]


In [36]:
data['Michigan']

['5.02',
 '4.95',
 '0.20',
 '0.20',
 '0.20',
 '0.20',
 '0.20',
 '0.20',
 '4.10',
 '3.97',
 '3.86',
 '3.74',
 '3.63',
 '3.52',
 '3.41',
 '3.32',
 '3.23',
 '3.14',
 '3.03',
 '2.93',
 '2.84',
 '2.75',
 '2.67',
 '2.63',
 '2.59',
 '2.47',
 '2.39',
 '2.32',
 '2.25',
 '2.20',
 '2.12',
 '2.07',
 '2.02',
 '1.99',
 '1.96',
 '1.93',
 '1.91',
 '1.83',
 '1.78',
 '1.71',
 '1.66',
 '1.57',
 '1.53',
 '1.50',
 '1.45',
 '1.43',
 '1.41',
 '1.41',
 '1.41',
 '1.42',
 '1.40',
 '1.39',
 '1.90',
 '2.04',
 '2.03',
 '1.86',
 '1.76',
 '1.68',
 '1.38',
 '1.31',
 '1.21',
 '1.25',
 '1.22',
 '1.18',
 '1.12',
 '1.06',
 '1.10',
 '0.92',
 '0.86',
 '0.78',
 '0.71',
 '0.83',
 '0.89',
 '0.94',
 '0.99',
 '1.07',
 '1.15',
 '1.25',
 '1.34',
 '1.44',
 '1.56',
 '1.70',
 '1.92',
 '1.97',
 '1.89',
 '1.85',
 '1.81',
 '1.75',
 '1.70',
 '1.68',
 '1.70',
 '1.76',
 '1.80',
 '1.81',
 '1.79',
 '2.04',
 '2.08',
 '2.08',
 '2.08',
 '2.12',
 '2.15',
 '2.21',
 '2.24',
 '1.92',
 '2.19',
 '2.21',
 '2.32',
 '2.36',
 '2.39',
 '2.40',
 '2.42',
 

In [37]:
data_df = pd.DataFrame(data)

In [38]:
pd.options.display.max_rows = 500

In [39]:
data_df

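|   | Year | Day | Superior | Michigan | Huron | Erie | Ontario | StClr |
|---|---|---|---|---|---|---|---|---|
| 0 | 1995 | 1 | 3.29 | 5.02 | 4.5 | 4.7 | 4.08 | 1.87 |
| 1 | 1995 | 2 | 3.28 | 4.95 | 4.45 | 4.64 | 4.05 | 1.83 |
| 2 | 1995 | 3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 3 | 1995 | 4 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 4 | 1995 | 5 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 5 | 1995 | 6 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 6 | 1995 | 7 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 7 | 1995 | 8 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 8 | 1995 | 9 | 3.02 | 4.1 | 3.59 | 3.38 | 3.88 | 0.2 |
| 9 | 1995 | 10 | 2.95 | 3.97 | 3.25 | 2.7 | 3.87 | 0.2 |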
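
The `Day` column is a day-of-year number rather than a calendar date. If dates are needed later, `datetime.strptime` with the `%j` directive can do the conversion. A minimal sketch; `to_date` is a hypothetical helper that accepts the string values as they were scraped.

```python
from datetime import datetime


def to_date(year, day_of_year):
    """Combine a year and a day-of-year (1-366) into a calendar date."""
    # zero-pad the day so it matches the three-digit %j format
    stamp = f"{year} {int(day_of_year):03d}"
    return datetime.strptime(stamp, "%Y %j").date()


print(to_date('1995', '001'))  # 1995-01-01
print(to_date('1995', '365'))  # 1995-12-31
```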


### Now, create a loop to catch all of the values from the website over the course of HISTORY! (Really, just since 1995.  But, you know)

In [64]:
# First, let's pare the code down to the bare essentials.

In [65]:
url = 'https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/1995/glsea-temps1995_1024.dat'

In [66]:
res = requests.get(url)

In [67]:
res.status_code

200

In [68]:
soup = BeautifulSoup(res.content, 'lxml')

In [69]:
title = soup.text.split('\n')[0]
title

'Daily Lake Average Surface Water Temperature'

In [70]:
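Year = []
Day = []
Superior = []
Michigan = []
Huron = []
Erie = []
Ontario = []
StClr = []

headers = ['Year', 'Day', 'Superior', 'Michigan', 'Huron', 'Erie', 'Ontario', 'StClr']

# map each header to its list so values can be appended without eval()
columns = dict(zip(headers, [Year, Day, Superior, Michigan, Huron, Erie, Ontario, StClr]))

raw_text = soup.text

# skip the 10 header lines and any blank lines, then append each field to its column
for line in raw_text.split('\n')[10:]:
    if line.strip():
        for name, value in zip(headers, line.split()):
            columns[name].append(value)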


In [71]:
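# pair each header with its list directly instead of looking names up with eval()
data = dict(zip(headers, [Year, Day, Superior, Michigan, Huron, Erie, Ontario, StClr]))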

In [72]:
data.keys()

dict_keys(['Year', 'Day', 'Superior', 'Michigan', 'Huron', 'Erie', 'Ontario', 'StClr'])

In [73]:
data_df = pd.DataFrame(data)
data_df.head()

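|   | Year | Day | Superior | Michigan | Huron | Erie | Ontario | StClr |
|---|---|---|---|---|---|---|---|---|
| 0 | 1995 | 1 | 3.29 | 5.02 | 4.5 | 4.7 | 4.08 | 1.87 |
| 1 | 1995 | 2 | 3.28 | 4.95 | 4.45 | 4.64 | 4.05 | 1.83 |
| 2 | 1995 | 3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 3 | 1995 | 4 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 4 | 1995 | 5 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |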


In [74]:
# This single-year dataset is not needed, so the saved file was deleted. The code is kept
# for completeness, but commented out so that it does not run.

# # saving the initial data in a csv
# data_df.to_csv('data/1995_avg_water_temp.csv', index=False)

In [75]:
data_df.shape

(365, 8)

<a id='scraping'></a>
## Here is the code that pulls the average daily temperature readings for each of the Great Lakes from https://coastwatch.glerl.noaa.gov/statistic/statistic.html

In [93]:
for i in range(1995, 2019):
    url = f'https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/{i}/glsea-temps{i}_1024.dat'
    print(url)

https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/1995/glsea-temps1995_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/1996/glsea-temps1996_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/1997/glsea-temps1997_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/1998/glsea-temps1998_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/1999/glsea-temps1999_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/2000/glsea-temps2000_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/2001/glsea-temps2001_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/2002/glsea-temps2002_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/2003/glsea-temps2003_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/2004/glsea-temps2004_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/2005/glsea-temps2005_1024.dat
https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/2006/glsea-temps2006_1024.dat
http
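
Since the URLs differ only by year, the pattern can be wrapped in a small function. `build_url` and `BASE` are hypothetical names introduced for illustration; the pattern itself is the one used in the loop above.

```python
BASE = 'https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps'


def build_url(year):
    """Return the URL of the average-temperature file for one year."""
    return f'{BASE}/{year}/glsea-temps{year}_1024.dat'


# range(1995, 2019) stops before 2019, so the last year is 2018
urls = [build_url(year) for year in range(1995, 2019)]
print(len(urls))  # 24
print(urls[0])
```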

### This is our magic code that will grab the data for average water temperatures from 1995 until 2018 and put it into a dataframe.

In [97]:
import time
start_time = time.time()


Year = []
Day = []
Superior = []
Michigan = []
Huron = []
Erie = []
Ontario = []
StClr = []

headers = ['Year', 'Day', 'Superior', 'Michigan', 'Huron', 'Erie', 'Ontario', 'StClr']

# map each header to its list so values can be appended without eval()
columns = dict(zip(headers, [Year, Day, Superior, Michigan, Huron, Erie, Ontario, StClr]))

for year in range(1995, 2019):

    time.sleep(3)  # pause between requests; use at least 3 seconds
    url = f'https://coastwatch.glerl.noaa.gov/ftp/glsea/avgtemps/{year}/glsea-temps{year}_1024.dat'

    res = requests.get(url)

    # check the status of the requested url; skip the year if the request failed
    if res.status_code == 200:
        print(f'Status Code for {year} checks out.')
    else:
        print(f'Status Code error for {year}')
        continue

    # instantiate a Beautiful Soup object to convert the response into text Python can work with
    soup = BeautifulSoup(res.content, 'lxml')

    raw_text = soup.text

    # skip the 10 header lines and any blank lines, then append each field to its column
    for line in raw_text.split('\n')[10:]:
        if line.strip():
            for name, value in zip(headers, line.split()):
                columns[name].append(value)

# saving all the data in a dictionary
data = columns


end_time = round(time.time() - start_time, 3)
print(f'time: {end_time} seconds')
end_time_minutes = int(end_time/ 60)
end_time_seconds = round(end_time % 60, 3)

print(f'time: {end_time_minutes} minutes, {end_time_seconds} seconds')

Status Code for 1995 checks out.
Status Code for 1996 checks out.
Status Code for 1997 checks out.
Status Code for 1998 checks out.
Status Code for 1999 checks out.
Status Code for 2000 checks out.
Status Code for 2001 checks out.
Status Code for 2002 checks out.
Status Code for 2003 checks out.
Status Code for 2004 checks out.
Status Code for 2005 checks out.
Status Code for 2006 checks out.
Status Code for 2007 checks out.
Status Code for 2008 checks out.
Status Code for 2009 checks out.
Status Code for 2010 checks out.
Status Code for 2011 checks out.
Status Code for 2012 checks out.
Status Code for 2013 checks out.
Status Code for 2014 checks out.
Status Code for 2015 checks out.
Status Code for 2016 checks out.
Status Code for 2017 checks out.
Status Code for 2018 checks out.
time: 86.144 seconds
time: 1 minutes, 26.144 seconds
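
The minutes and seconds split can also be written with `divmod`, which returns the quotient and remainder in one call. A sketch using the elapsed time printed above as a fixed input:

```python
elapsed = 86.144  # seconds, taken from the output above

# divmod returns (whole minutes as a float, leftover seconds)
minutes, seconds = divmod(elapsed, 60)
print(f'time: {int(minutes)} minutes, {seconds:.3f} seconds')
```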


In [98]:
Avg_Water_Temps = pd.DataFrame(data)
Avg_Water_Temps.shape

(8766, 8)
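
The row count can be sanity-checked against the calendar: 1995 through 2018 spans 24 years, some of them leap years. A quick check with the standard library, assuming each file holds exactly one row per day:

```python
import calendar

years = range(1995, 2019)
leap_years = [year for year in years if calendar.isleap(year)]
expected_rows = sum(366 if calendar.isleap(year) else 365 for year in years)

print(leap_years)     # [1996, 2000, 2004, 2008, 2012, 2016]
print(expected_rows)  # 8766
```

The expected total matches the 8766 rows in the DataFrame, which suggests that no days were dropped or duplicated during the scrape.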

In [99]:
Avg_Water_Temps.to_csv('data/lake_michigan/avg_water_temps.csv', index=False)

In [102]:
Avg_Water_Temps.head(20)

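|   | Year | Day | Superior | Michigan | Huron | Erie | Ontario | StClr |
|---|---|---|---|---|---|---|---|---|
| 0 | 1995 | 1 | 3.29 | 5.02 | 4.5 | 4.7 | 4.08 | 1.87 |
| 1 | 1995 | 2 | 3.28 | 4.95 | 4.45 | 4.64 | 4.05 | 1.83 |
| 2 | 1995 | 3 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 3 | 1995 | 4 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 4 | 1995 | 5 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 5 | 1995 | 6 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 6 | 1995 | 7 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 7 | 1995 | 8 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
| 8 | 1995 | 9 | 3.02 | 4.1 | 3.59 | 3.38 | 3.88 | 0.2 |
| 9 | 1995 | 10 | 2.95 | 3.97 | 3.25 | 2.7 | 3.87 | 0.2 |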
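
Days 3 through 8 of 1995 show 0.20 for every lake. The notebook does not say what these values represent. If they turn out to be placeholders (an assumption, not something the source states), rows in which all six readings are identical could be flagged along these lines; `is_uniform` is a hypothetical helper and `rows` is a hand-copied sample.

```python
rows = [
    ['1995', '002', '3.28', '4.95', '4.45', '4.64', '4.05', '1.83'],
    ['1995', '003', '0.20', '0.20', '0.20', '0.20', '0.20', '0.20'],
    ['1995', '009', '3.02', '4.10', '3.59', '3.38', '3.88', '0.20'],
]


def is_uniform(row):
    """True when every temperature field in the row has the same value."""
    return len(set(row[2:])) == 1


# collect the day numbers of the rows that look suspicious
flagged = [row[1] for row in rows if is_uniform(row)]
print(flagged)  # ['003']
```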
