# Week 7 Assignment

In this week's assignment, you will be working with NOAA's weather API. This API allows you to retrieve a variety of data from one or more weather stations of your choice.

API Documentation: https://www.ncdc.noaa.gov/cdo-web/webservices/v2#gettingStarted

As the API documentation page states, you will need to register for your own credentials. Follow the instructions at https://www.ncdc.noaa.gov/cdo-web/token to register.

<div class="alert alert-block alert-danger">
<b>Important:</b> You can remove the following cell and use the commented-out cell just below to load your NOAA credentials. The auth2.csv file will not be provided to you. Note that the individual credential fields are stored as strings.
</div>

In [10]:
# Use your own NOAA token here (stored as a string).
# Reading it from an environment variable keeps the secret out of the notebook;
# NOAA_TOKEN is only an example variable name.
import os
my_token = os.environ.get('NOAA_TOKEN', 'YOUR_NOAA_TOKEN')

Next, choose the weather station whose data you want to retrieve. Use the following link to look up the ID of a NOAA weather station: https://www.ncdc.noaa.gov/cdo-web/datatools/findstation

Fill out all fields based on your preferences. I used:
   * Location: CO
   * Dataset: Daily Summaries
   * Date Range: 2019-11-01 to 2019-11-30
   * Data Category: Air Temperature


<img align="left" style="padding-right:10px;" src="figures_7/NOAA_find_station_query.png" >

#### Click on 'Full Details' to see all the information
<img align="left" style="padding-right:10px;" src="figures_7/NOAA_find_station_result.png" ><br>

From the Find a Station results, capture the following details:
   * The values of the 'Network' and 'ID' fields (shown together in the second cell from the top; split on ':')
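As a small sketch, assuming the Full Details page shows the combined value in the form `GHCND:USW00023129`, the split on ':' can be done with `str.partition`:

```python
# Value copied from the 'Full Details' page (network and ID joined by ':')
full_id = 'GHCND:USW00023129'

# partition splits on the first ':' only and returns (before, separator, after)
network, _, ID = full_id.partition(':')
print(network, ID)  # GHCND USW00023129
```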

In [11]:
# variables based on my station search
network = 'GHCND'
ID = 'USW00023129'

# station_id = network:ID
station_id = network + ':' + ID
print(station_id)

GHCND:USW00023129


### What type of data are we looking for?
At this point we need to decide what type of data we want to retrieve. We can use NOAA's API itself to see which datatypes are available for this station.

The documentation page https://www.ncdc.noaa.gov/cdo-web/webservices/v2#dataTypes shows how to query for the datatypes available for the station chosen above.

As we saw in the FTE, we can build a dictionary of parameters to be used in our request.
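To see what happens to such a dictionary, the standard library's `urllib.parse.urlencode` builds essentially the same query string that `requests` appends to the URL when you pass `params=`. This is only an illustration; `requests` does the encoding for you.

```python
from urllib.parse import urlencode

params = {'limit': '1000', 'datasetid': 'GHCND', 'stationid': 'GHCND:USW00023129'}

# Each key/value pair becomes key=value, joined by '&'; ':' is percent-encoded as %3A
query = urlencode(params)
print('https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes?' + query)
# https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes?limit=1000&datasetid=GHCND&stationid=GHCND%3AUSW00023129
```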

In [12]:
import requests
import json

# building the parameter dictionary
# 'limit = 1000' --> What does this do? Look at the NOAA API documentation
# note: the API expects the parameter name 'stationid' (no underscore)
data = {'limit': '1000', 'datasetid': network, 'stationid': station_id}

# calling NOAA API to get the available datatypes for this specific station
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes', params=data, headers={'token': my_token})

Next, convert the JSON text in the response into a Python dictionary that is easier to inspect.

In [13]:
# JSON to dictionary
datatypes_dict = json.loads(r.text)

# need the keys from this dictionary
datatypes_dict.keys()


dict_keys(['metadata', 'results'])
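The `metadata` key describes the result set as a whole. A minimal sketch, assuming the response has the shape `{'metadata': {'resultset': {'offset': ..., 'count': ..., 'limit': ...}}, 'results': [...]}` (the sample text below is made up), shows how `count` and `limit` tell you whether one request returned everything. If more pages are needed, the API's `offset` parameter can be used to request the later ones.

```python
import json

# Hypothetical response text shaped like a NOAA reply (values made up, records omitted)
response_text = '''
{"metadata": {"resultset": {"offset": 1, "count": 1500, "limit": 1000}},
 "results": []}
'''

response = json.loads(response_text)           # JSON text -> Python dict
resultset = response['metadata']['resultset']

# 'count' is the total number of matching records; 'limit' caps one page
total = resultset['count']
page_size = resultset['limit']
pages_needed = -(-total // page_size)          # ceiling division
print(sorted(response.keys()), pages_needed)   # ['metadata', 'results'] 2
```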

I'm going to guess that the information we are after is stored under the 'results' key. Let's look at the first 10 entries and see if we are right.

In [14]:
datatypes_dict['results'][:10]

[{'mindate': '1994-03-19',
  'maxdate': '1996-05-28',
  'name': 'Average cloudiness midnight to midnight from 30-second ceilometer data',
  'datacoverage': 1,
  'id': 'ACMC'},
 {'mindate': '1965-01-01',
  'maxdate': '2005-12-31',
  'name': 'Average cloudiness midnight to midnight from manual observations',
  'datacoverage': 1,
  'id': 'ACMH'},
 {'mindate': '1994-02-01',
  'maxdate': '1996-05-28',
  'name': 'Average cloudiness sunrise to sunset from 30-second ceilometer data',
  'datacoverage': 1,
  'id': 'ACSC'},
 {'mindate': '1965-01-01',
  'maxdate': '2005-12-31',
  'name': 'Average cloudiness sunrise to sunset from manual observations',
  'datacoverage': 1,
  'id': 'ACSH'},
 {'mindate': '1982-01-01',
  'maxdate': '2019-12-12',
  'name': 'Average wind speed',
  'datacoverage': 1,
  'id': 'AWND'},
 {'mindate': '1948-08-02',
  'maxdate': '2012-07-23',
  'name': 'Number of days included in the multiday evaporation total (MDEV)',
  'datacoverage': 1,
  'id': 'DAEV'},
 {'mindate': '1832-0

So, the results appear to be a list of dictionaries. 

<div class="alert alert-block alert-warning">
<b>Note:</b> I'll leave parsing through all of these as an exercise for you. I already did this separately and decided to use the 'TAVG' datatype (average temperature), which is available for 2018.
</div>
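One way to start that exercise, sketched here with a few entries copied from the output above (the helper name `covers_year` is hypothetical): keep only the datatypes whose `mindate`/`maxdate` span the whole year you want. Zero-padded ISO dates (`YYYY-MM-DD`) sort chronologically, so plain string comparison is enough.

```python
# A few entries copied from the datatypes output above
results = [
    {'mindate': '1994-03-19', 'maxdate': '1996-05-28', 'id': 'ACMC'},
    {'mindate': '1965-01-01', 'maxdate': '2005-12-31', 'id': 'ACMH'},
    {'mindate': '1982-01-01', 'maxdate': '2019-12-12', 'id': 'AWND'},
    {'mindate': '1948-08-02', 'maxdate': '2012-07-23', 'id': 'DAEV'},
]

def covers_year(entry, year):
    """True if the datatype's date range includes every day of `year`."""
    return entry['mindate'] <= f'{year}-01-01' and entry['maxdate'] >= f'{year}-12-31'

available = [d['id'] for d in results if covers_year(d, 2018)]
print(available)  # ['AWND']
```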

In [15]:
data = {}
data = {'limit':'1000', 'datasetid': network, 'stationid': station_id}


# append additional parameters to data dictionary
data.update({'datatypeid': 'TAVG'})
data.update({'startdate': '2018-01-01'})
data.update({'enddate': '2018-12-31'})
data.update({'units':'standard'})
data

{'limit': '1000',
 'datasetid': 'GHCND',
 'stationid': 'GHCND:USW00023129',
 'datatypeid': 'TAVG',
 'startdate': '2018-01-01',
 'enddate': '2018-12-31',
 'units': 'standard'}

In [16]:
# make the request to get our year of data
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',params = data, headers = {'token':my_token})

# parse the API's JSON response into a dictionary
avg_temp_2018_dict = json.loads(r.text)

In [17]:
# look at the first 10 records of our data
avg_temp_2018_dict['results'][:10]

[{'date': '2018-01-01T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 56.0},
 {'date': '2018-01-02T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 60.0},
 {'date': '2018-01-03T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 61.0},
 {'date': '2018-01-04T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 62.0},
 {'date': '2018-01-05T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 63.0},
 {'date': '2018-01-06T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 62.0},
 {'date': '2018-01-07T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 63.0},
 {'date': '2018-01-08T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:U

It looks like we have daily data, and the 'value' key holds a number that is reasonable for a temperature in degrees Fahrenheit (we requested `units='standard'`).

Let's verify that we got a record for every day of 2018.

In [18]:
# there were 365 days in 2018
len(avg_temp_2018_dict['results'])

365

In [19]:
# look at the first and last day
print(avg_temp_2018_dict['results'][0])
print(avg_temp_2018_dict['results'][364])

{'date': '2018-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00023129', 'attributes': 'H,,S,', 'value': 56.0}
{'date': '2018-12-31T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00023129', 'attributes': 'H,,S,', 'value': 55.0}
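A count of 365 shows the number of records is right, but not that each record is a distinct day. A stricter check, sketched below with a hypothetical helper `missing_days`, compares the returned dates against every calendar day of the year; applied to `avg_temp_2018_dict['results']` it should return an empty list.

```python
from datetime import date, timedelta

def missing_days(records, year):
    """Return the ISO dates in `year` with no record (record dates look like 'YYYY-MM-DDT00:00:00')."""
    have = {r['date'][:10] for r in records}
    day = date(year, 1, 1)
    missing = []
    while day.year == year:
        if day.isoformat() not in have:
            missing.append(day.isoformat())
        day += timedelta(days=1)
    return missing

# Tiny made-up example: only two records for 2018
sample = [{'date': '2018-01-01T00:00:00', 'value': 56.0},
          {'date': '2018-01-03T00:00:00', 'value': 61.0}]
gaps = missing_days(sample, 2018)
print(len(gaps), gaps[0])  # 363 2018-01-02
```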


### Requirements for the assignment
Using the NOAA API, retrieve data for a weather station of your choice. Based on the station you pick,
   * Determine an appropriate dataset
   * Determine an appropriate datatype
   * Pull at least 3 years' worth of data.<br>
     Note: if you pick an annual dataset, you will need to pull at least 25 years' worth of data.
   * Organize your results into a meaningful representation
   * Store your results in one of the following formats:
      - csv file
      - json file
      - relational database
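For the relational-database option, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `observations` and its schema are assumptions for illustration, not part of the assignment; the two records are copied from the output above.

```python
import sqlite3

# Records shaped like the API 'results' entries (copied from the output above)
records = [
    {'date': '2018-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00023129',
     'attributes': 'H,,S,', 'value': 56.0},
    {'date': '2018-01-02T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00023129',
     'attributes': 'H,,S,', 'value': 60.0},
]

conn = sqlite3.connect(':memory:')  # pass a file name such as 'weather.db' to persist
conn.execute('''CREATE TABLE IF NOT EXISTS observations (
                    station TEXT, date TEXT, datatype TEXT, attributes TEXT, value REAL,
                    PRIMARY KEY (station, date, datatype))''')

# Named placeholders read each column straight from the record dictionaries
conn.executemany('INSERT OR REPLACE INTO observations VALUES '
                 '(:station, :date, :datatype, :attributes, :value)', records)
conn.commit()

summary = conn.execute('SELECT COUNT(*), AVG(value) FROM observations').fetchone()
print(summary)  # (2, 58.0)
```

The primary key on (station, date, datatype) means re-running the load with `INSERT OR REPLACE` overwrites rows instead of duplicating them.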






<div class="alert alert-block alert-danger">
<b>Important:</b> You MAY NOT reuse the station or datatype demonstrated above. This means the following are off limits:
    
   * ID = 'USW00023129'
   * datatypeid = 'TAVG'

</div>

<div class="alert alert-block alert-warning">
<b>Hint:</b> The NOAA API will only allow you to pull one year of data at a time.
</div>
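Since each request covers at most one year, a multi-year pull becomes a loop over yearly windows. A minimal sketch of generating those windows (the helper name `year_ranges` is hypothetical); each pair would supply the `startdate` and `enddate` parameters of one request.

```python
def year_ranges(first_year, last_year):
    """Yield (startdate, enddate) strings, one pair per calendar year, both ends inclusive."""
    for year in range(first_year, last_year + 1):
        yield f'{year}-01-01', f'{year}-12-31'

windows = list(year_ranges(2016, 2018))
print(windows)
# [('2016-01-01', '2016-12-31'), ('2017-01-01', '2017-12-31'), ('2018-01-01', '2018-12-31')]
```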

In [20]:
# variables based on my station search: chose the station closest to my house in Peoria, AZ
network = 'GHCND'
ID = 'US1AZMR0291'

# station_id = network:ID
station_id = network + ':' + ID
print(station_id)

GHCND:US1AZMR0291


In [21]:
# chose precipitation (PRCP) since it's rare to see in Arizona
data = {'limit': '1000', 'datasetid': network, 'stationid': station_id,
        'datatypeid': 'PRCP', 'units': 'standard'}
start = '-01-01'
end = '-12-31'
data_list = []

# the API returns at most one year of data per request, so loop over 2010-2018
for x in range(2010, 2019):
    data.update({'startdate': str(x) + start, 'enddate': str(x) + end})
    r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data', params=data, headers={'token': my_token})
    r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    dictionary_load = r.json()
    # a year with no observations may come back without a 'results' key
    data_list.append(dictionary_load.get('results', []))
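NOAA's documentation states that each token is limited to five requests per second and 10,000 requests per day, so a loop over many years may benefit from a short pause between calls. A sketch, assuming a hypothetical `fetch_year` function that wraps the `requests.get` call above and returns the decoded JSON; the 0.25-second delay is an arbitrary choice.

```python
import time

def fetch_all_years(fetch_year, years, delay=0.25):
    """Call fetch_year(year) for each year and concatenate the 'results' lists.

    fetch_year is assumed to return the decoded JSON dict for one year;
    a year with no observations may come back without a 'results' key.
    """
    records = []
    for year in years:
        payload = fetch_year(year)
        records.extend(payload.get('results', []))  # extend keeps the list flat
        time.sleep(delay)  # pause to stay under the per-second request limit
    return records

# Stand-in for the real API call so the sketch runs offline
fake = {2010: {'results': [{'date': '2010-03-03T00:00:00', 'value': 0.0}]},
        2011: {}}
rows = fetch_all_years(lambda y: fake[y], [2010, 2011], delay=0)
print(len(rows))  # 1
```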

In [22]:
data_list[:1]

[[{'date': '2010-03-03T00:00:00',
   'datatype': 'PRCP',
   'station': 'GHCND:US1AZMR0291',
   'attributes': ',,N,',
   'value': 0.0},
  {'date': '2010-03-04T00:00:00',
   'datatype': 'PRCP',
   'station': 'GHCND:US1AZMR0291',
   'attributes': ',,N,',
   'value': 0.0},
  {'date': '2010-03-05T00:00:00',
   'datatype': 'PRCP',
   'station': 'GHCND:US1AZMR0291',
   'attributes': ',,N,',
   'value': 0.0},
  {'date': '2010-03-06T00:00:00',
   'datatype': 'PRCP',
   'station': 'GHCND:US1AZMR0291',
   'attributes': ',,N,',
   'value': 0.0},
  {'date': '2010-03-07T00:00:00',
   'datatype': 'PRCP',
   'station': 'GHCND:US1AZMR0291',
   'attributes': ',,N,',
   'value': 0.03},
  {'date': '2010-03-08T00:00:00',
   'datatype': 'PRCP',
   'station': 'GHCND:US1AZMR0291',
   'attributes': ',,N,',
   'value': 1.01},
  {'date': '2010-03-09T00:00:00',
   'datatype': 'PRCP',
   'station': 'GHCND:US1AZMR0291',
   'attributes': ',,N,',
   'value': 0.0},
  {'date': '2010-03-10T00:00:00',
   'datatype': 'PRC

In [23]:
# the loop produced a list of lists (one inner list per year), so flatten it into a single list of records
flat_list = []
for sublist in data_list:
    for item in sublist:
        flat_list.append(item)
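The same flattening can be written with `itertools.chain.from_iterable` from the standard library; shown here on a small made-up nested list shaped like `data_list`:

```python
from itertools import chain

# A list of lists, like data_list (one inner list per year; a year may be empty)
nested = [[{'date': '2010-03-03T00:00:00'}, {'date': '2010-03-04T00:00:00'}],
          [],
          [{'date': '2011-01-01T00:00:00'}]]

flat = list(chain.from_iterable(nested))
print(len(flat))  # 3
```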

In [24]:
flat_list[:5]

[{'date': '2010-03-03T00:00:00',
  'datatype': 'PRCP',
  'station': 'GHCND:US1AZMR0291',
  'attributes': ',,N,',
  'value': 0.0},
 {'date': '2010-03-04T00:00:00',
  'datatype': 'PRCP',
  'station': 'GHCND:US1AZMR0291',
  'attributes': ',,N,',
  'value': 0.0},
 {'date': '2010-03-05T00:00:00',
  'datatype': 'PRCP',
  'station': 'GHCND:US1AZMR0291',
  'attributes': ',,N,',
  'value': 0.0},
 {'date': '2010-03-06T00:00:00',
  'datatype': 'PRCP',
  'station': 'GHCND:US1AZMR0291',
  'attributes': ',,N,',
  'value': 0.0},
 {'date': '2010-03-07T00:00:00',
  'datatype': 'PRCP',
  'station': 'GHCND:US1AZMR0291',
  'attributes': ',,N,',
  'value': 0.03}]

In [25]:
import pandas as pd
df = pd.DataFrame(flat_list) 
df.head()

|   | date | datatype | station | attributes | value |
|---|---|---|---|---|---|
| 0 | 2010-03-03T00:00:00 | PRCP | GHCND:US1AZMR0291 | ,,N, | 0.0 |
| 1 | 2010-03-04T00:00:00 | PRCP | GHCND:US1AZMR0291 | ,,N, | 0.0 |
| 2 | 2010-03-05T00:00:00 | PRCP | GHCND:US1AZMR0291 | ,,N, | 0.0 |
| 3 | 2010-03-06T00:00:00 | PRCP | GHCND:US1AZMR0291 | ,,N, | 0.0 |
| 4 | 2010-03-07T00:00:00 | PRCP | GHCND:US1AZMR0291 | ,,N, | 0.03 |
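One meaningful representation for precipitation is the total per year. A standard-library sketch over records shaped like `flat_list` (the sample values are made up) sums `value`, which is in inches with `units='standard'`, by the year prefix of `date`. With pandas, roughly the same result comes from `df.groupby(df['date'].str[:4])['value'].sum()`.

```python
from collections import defaultdict

# Made-up records shaped like the entries in flat_list
records = [
    {'date': '2010-03-07T00:00:00', 'value': 0.03},
    {'date': '2010-03-08T00:00:00', 'value': 1.01},
    {'date': '2011-07-15T00:00:00', 'value': 0.5},
]

totals = defaultdict(float)
for rec in records:
    totals[rec['date'][:4]] += rec['value']  # group by the 'YYYY' prefix of the date

for year, inches in sorted(totals.items()):
    print(year, round(inches, 2))
# 2010 1.04
# 2011 0.5
```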


In [26]:
# write the DataFrame to a CSV file in the current working directory (adjust the path as needed)
df.to_csv('weather_results.csv', index=False, header=True)
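For the JSON option, the list of record dictionaries can be written directly with the standard `json` module, without going through pandas (pandas also offers `df.to_json(..., orient='records')`). A minimal sketch; the file name `weather_results.json` is arbitrary.

```python
import json

# A record shaped like the entries in flat_list (copied from the output above)
records = [
    {'date': '2010-03-03T00:00:00', 'datatype': 'PRCP', 'station': 'GHCND:US1AZMR0291',
     'attributes': ',,N,', 'value': 0.0},
]

# Write the list of dictionaries as a JSON array
with open('weather_results.json', 'w') as f:
    json.dump(records, f, indent=2)

# Reading it back returns an equal list of dictionaries
with open('weather_results.json') as f:
    loaded = json.load(f)
print(loaded == records)  # True
```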