## Assignment/Goal:
"This week's assignment you will be working with NOAAs weather API. This API will allow you to retrieve a variety of data from a specific weather station(s), of your choice.

API Documentation: https://www.ncdc.noaa.gov/cdo-web/webservices/v2#gettingStarted

As the API documentation page states, you will need to register for your own credentials. Follow the instructions at https://www.ncdc.noaa.gov/cdo-web/token to register.

Now we need to determine a weather station that we would like to retrieve our data for. Use the following link to get the id for a NOAA weather station. https://www.ncdc.noaa.gov/cdo-web/datatools/findstation"


### Requirements for the assignment
Using the NOAA API, retrieve data for a weather station of your choice.  Based on the station you pick, 
   * Determine an appropriate dataset 
   * Determine an appropriate dataset type
   * Pull at least 3 years' worth of data.<br>
     Note: if you pick an annual dataset, you will need to pull at least 25 years' worth of data.
   * Organize your results into a meaningful representation
   * Store your result in one of the following formats:
      - csv file
      - json file
      - relational database

### About the dataset:
I chose to use a dataset that is relevant to me. The station ID and info come from the NOAA station finder: https://www.ncdc.noaa.gov/cdo-web/datatools/findstation.

I chose a local Lakewood, Colorado station (station id: GHCND:USC00055402) and its daily maximum temperatures (TMAX). Since we need 3 years of data, I pulled daily data for the years 2016-2018.


## Setup/Intro: 

#### Determine an appropriate dataset/Determine an appropriate dataset type -
Before starting, we need to make sure that we have an appropriate dataset. We will use data from a local Lakewood, CO station and look at daily maximum temperature (TMAX). This datatype is similar in structure to the average-temperature dataset outlined in the demo, but different enough to be worth working through, and it is interesting to me personally. The available datatypes are listed here: https://www.ncdc.noaa.gov/cdo-web/webservices/v2#dataTypes. We will pull 3 years of data, spanning 2016-2018. Since we can only pull 1 year of daily data at a time, we will pull each of the 3 years separately and then combine them into one master dataframe before writing it to a CSV file at the end of the assignment.

The dataset type we will be using is daily maximum temperature from a local NOAA station for 2016-2018. Before starting, we also need to make sure that we can work with the data once we pull it. The API returns JSON, which we can easily convert to a Python dict. We will use json.loads(r.text) to parse the response and datatypes_dict.keys() to inspect its top-level keys. Later on, we will use pd.DataFrame() to convert the results to a dataframe. Now that we know the dataset type, that we can work with it, and that the data is appropriate, we will move forward and outline our steps as we go.
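
As a minimal sketch of the JSON-to-dict step described above, the snippet below parses a hard-coded string shaped like the API's response. The `sample_text` value is an illustrative stand-in for `r.text`: its `metadata` is left as an empty placeholder, and its single record is copied from the 2018 output shown later in this notebook.

```python
import json

# Illustrative stand-in for r.text; a real response carries many more records
# and a populated 'metadata' entry (left empty here as a placeholder).
sample_text = '''
{"metadata": {},
 "results": [{"date": "2018-01-01T00:00:00", "datatype": "TMAX",
              "station": "GHCND:USC00055402", "attributes": ",,7,0700",
              "value": 27.0}]}
'''

sample_dict = json.loads(sample_text)      # JSON text -> Python dict
print(list(sample_dict.keys()))            # top-level keys of the response
print(sample_dict["results"][0]["value"])  # one reading from the results list
```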

In [1]:
# importing the packages we need
import requests # to get info from the NOAA site
import json # to parse the JSON response into a dict

In [2]:
# my personal access token for the NOAA API, obtained by registering at https://www.ncdc.noaa.gov/cdo-web/token
my_token = 'AURUFnnABXultuNXLbnkzxoTVrNLYuRZ'
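
A token written directly in a notebook is visible to anyone the notebook is shared with. One possible alternative, sketched here under the assumption that the token has been stored in an environment variable, is to read it at run time. The variable name `NOAA_TOKEN` and the helper `load_token` are hypothetical names chosen for this example.

```python
import os

# NOAA_TOKEN is a hypothetical environment variable name chosen for this sketch.
def load_token(env_var="NOAA_TOKEN"):
    """Return the API token from the environment, or raise a clear error."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your NOAA token")
    return token
```

With the variable set, `my_token = load_token()` would take the place of the hard-coded assignment.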

In [3]:
# variables based on my station search -- this station is in Lakewood, CO -- we found the station and confirmed it meets the dataset type requirements here: https://www.ncdc.noaa.gov/cdo-web/datatools/findstation
network = 'GHCND'
ID = 'USC00055402'

# station_id = network:ID
station_id = network + ':' + ID
print(station_id)

GHCND:USC00055402


In [4]:
# building the parameter dictionary, as outlined in the demo this week -- the private token is passed separately in the headers of requests.get()
# 'limit': '1000' --> asks for up to 1000 records per call; we want to follow the guidelines in the NOAA API documentation and not abuse our access, or we could get cut off from the data
# note: the data requests later in this notebook spell this parameter 'stationid'; with the key 'station_id' the station filter may not be applied to this call
data = {'limit':'1000', 'datasetid': network, 'station_id': station_id}

# calling NOAA API to get the available datatypes for this specific station
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes',params = data, headers = {'token':my_token})
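
To see roughly what a parameter dictionary turns into when it is sent, the standard library's `urlencode` can build the query string by hand. This is only an illustration, not how `requests` is implemented internally, and it uses the `stationid` spelling that the later data requests in this notebook use.

```python
from urllib.parse import urlencode

station_id = "GHCND:USC00055402"
params = {"limit": "1000", "datasetid": "GHCND", "stationid": station_id}

# urlencode percent-encodes the ':' in the station id as %3A
query = urlencode(params)
print("https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes?" + query)
```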

In [5]:
# parse the JSON response text into a dictionary
datatypes_dict = json.loads(r.text)

# look at the top-level keys of the dictionary with .keys()
datatypes_dict.keys()

dict_keys(['metadata', 'results'])

In [6]:
# checking the first record of data -- looks great -- we can move forward and start pulling our desired data by year
datatypes_dict['results'][:1]

[{'mindate': '1994-03-19',
  'maxdate': '1996-05-28',
  'name': 'Average cloudiness midnight to midnight from 30-second ceilometer data',
  'datacoverage': 1,
  'id': 'ACMC'}]
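
The first record alone does not show whether TMAX is among the returned datatypes. A small check like the one below could confirm it. The `sample_results` list is an illustrative stand-in for `datatypes_dict['results']`, and its TMAX and TMIN entries are assumed for this sketch rather than copied from a real response.

```python
# Illustrative stand-in for datatypes_dict['results'];
# the TMAX and TMIN entries are assumed for this sketch.
sample_results = [
    {"id": "ACMC", "name": "Average cloudiness midnight to midnight from 30-second ceilometer data"},
    {"id": "TMAX", "name": "Maximum temperature"},
    {"id": "TMIN", "name": "Minimum temperature"},
]

# Collect every datatype id, then test membership
available_ids = {item["id"] for item in sample_results}
print("TMAX" in available_ids)
```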

Now that the NOAA API is successfully returning data, we can start pulling data for our station by year and by our selected variable, max temperature (TMAX), in the next steps.

## 2018 Max temp. data setup:
Let's start by pulling data for the year 2018. We will pull from station: GHCND:USC00055402 with datatypeid: TMAX from Jan. 1 - Dec. 31, 2018.

In [7]:
# here we set the parameters of our API request -- station: GHCND:USC00055402 with datatypeid: TMAX from Jan. 1 - Dec. 31, 2018
data = {'limit':'1000', 'datasetid': network, 'stationid': station_id}


# append additional parameters to data dictionary
data.update({'datatypeid': 'TMAX'})
data.update({'startdate': '2018-01-01'})
data.update({'enddate': '2018-12-31'})
data.update({'units':'standard'})
data

{'limit': '1000',
 'datasetid': 'GHCND',
 'stationid': 'GHCND:USC00055402',
 'datatypeid': 'TMAX',
 'startdate': '2018-01-01',
 'enddate': '2018-12-31',
 'units': 'standard'}

In [8]:
# here is another cell highlighted in the demo this week -- it makes the request to get our year of data from the API endpoint: https://www.ncdc.noaa.gov/cdo-web/api/v2/data
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',params = data, headers = {'token':my_token})

#load the api response as a json
max_temp_2018_dict = json.loads(r.text)

In [9]:
# look at the first five records of our data
max_temp_2018_dict['results'][:5]

[{'date': '2018-01-01T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 27.0},
 {'date': '2018-01-02T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 32.0},
 {'date': '2018-01-03T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 44.0},
 {'date': '2018-01-04T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 53.0},
 {'date': '2018-01-05T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 50.0}]

In [10]:
# there were 356 days recorded in 2018
len(max_temp_2018_dict['results'])

356

In [11]:
# look at the first and last day
print(max_temp_2018_dict['results'][0])
print(max_temp_2018_dict['results'][355])

{'date': '2018-01-01T00:00:00', 'datatype': 'TMAX', 'station': 'GHCND:USC00055402', 'attributes': ',,7,0700', 'value': 27.0}
{'date': '2018-12-31T00:00:00', 'datatype': 'TMAX', 'station': 'GHCND:USC00055402', 'attributes': ',,7,0700', 'value': 54.0}
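
The count of 356 records means some days of 2018 have no reading. A possible way to find which ones, sketched with the standard library only: `missing_dates` is a hypothetical helper, and the three-record `sample` list is made up to show the call.

```python
from datetime import date, timedelta

def missing_dates(records, year):
    """Return the dates in `year` that have no record, in calendar order."""
    seen = {r["date"][:10] for r in records}   # keep only the 'YYYY-MM-DD' part
    day = date(year, 1, 1)
    missing = []
    while day.year == year:
        if day.isoformat() not in seen:
            missing.append(day.isoformat())
        day += timedelta(days=1)
    return missing

# Tiny made-up sample with a gap on Jan 3, just to show the call
sample = [{"date": "2018-01-01T00:00:00"},
          {"date": "2018-01-02T00:00:00"},
          {"date": "2018-01-04T00:00:00"}]
gaps = missing_dates(sample, 2018)
print(len(gaps), gaps[0])
```

Calling `missing_dates(max_temp_2018_dict['results'], 2018)` on the real results should list the days behind the difference between 365 and the 356 records returned.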


## 2017 Max temp. data setup:
Pull from station: GHCND:USC00055402 with datatypeid: TMAX from Jan. 1 - Dec. 31, 2017.

In [12]:
# here we set the parameters of our API request -- station: GHCND:USC00055402 with datatypeid: TMAX from Jan. 1 - Dec. 31, 2017
data1 = {'limit':'1000', 'datasetid': network, 'stationid': station_id}


# append additional parameters to data dictionary
data1.update({'datatypeid': 'TMAX'})
data1.update({'startdate': '2017-01-01'})
data1.update({'enddate': '2017-12-31'})
data1.update({'units':'standard'})
data1

{'limit': '1000',
 'datasetid': 'GHCND',
 'stationid': 'GHCND:USC00055402',
 'datatypeid': 'TMAX',
 'startdate': '2017-01-01',
 'enddate': '2017-12-31',
 'units': 'standard'}

In [13]:
# here is another cell highlighted in the demo this week -- it makes the request to get our year of data from the API endpoint: https://www.ncdc.noaa.gov/cdo-web/api/v2/data
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',params = data1, headers = {'token':my_token})

#load the api response as a json
max_temp_2017_dict = json.loads(r.text)

In [14]:
# look at the first five records of our data
max_temp_2017_dict['results'][:5]

[{'date': '2017-01-01T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 39.0},
 {'date': '2017-01-02T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 49.0},
 {'date': '2017-01-03T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 52.0},
 {'date': '2017-01-04T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 28.0},
 {'date': '2017-01-05T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 14.0}]

In [15]:
# there were 361 days recorded in 2017
len(max_temp_2017_dict['results'])

361

In [16]:
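# look at the first and last day
# note: this cell re-prints the 2018 results; the 2017 equivalents are max_temp_2017_dict['results'][0] and max_temp_2017_dict['results'][360]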
print(max_temp_2018_dict['results'][0])
print(max_temp_2018_dict['results'][355])

{'date': '2018-01-01T00:00:00', 'datatype': 'TMAX', 'station': 'GHCND:USC00055402', 'attributes': ',,7,0700', 'value': 27.0}
{'date': '2018-12-31T00:00:00', 'datatype': 'TMAX', 'station': 'GHCND:USC00055402', 'attributes': ',,7,0700', 'value': 54.0}


## 2016 Max temp. data setup:
Pull from station: GHCND:USC00055402 with datatypeid: TMAX from Jan. 1 - Dec. 31, 2016.

In [17]:
# here we set the parameters of our API request -- station: GHCND:USC00055402 with datatypeid: TMAX from Jan. 1 - Dec. 31, 2016
data2 = {'limit':'1000', 'datasetid': network, 'stationid': station_id}


# append additional parameters to data dictionary
data2.update({'datatypeid': 'TMAX'})
data2.update({'startdate': '2016-01-01'})
data2.update({'enddate': '2016-12-31'})
data2.update({'units':'standard'})
data2

{'limit': '1000',
 'datasetid': 'GHCND',
 'stationid': 'GHCND:USC00055402',
 'datatypeid': 'TMAX',
 'startdate': '2016-01-01',
 'enddate': '2016-12-31',
 'units': 'standard'}

In [18]:
# here is another cell highlighted in the demo this week -- it makes the request to get our year of data from the API endpoint: https://www.ncdc.noaa.gov/cdo-web/api/v2/data
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',params = data2, headers = {'token':my_token})

#load the api response as a json
max_temp_2016_dict = json.loads(r.text)

In [19]:
# look at the first five records of our data
max_temp_2016_dict['results'][:5]

[{'date': '2016-01-01T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 28.0},
 {'date': '2016-01-02T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 39.0},
 {'date': '2016-01-03T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 45.0},
 {'date': '2016-01-04T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 50.0},
 {'date': '2016-01-05T00:00:00',
  'datatype': 'TMAX',
  'station': 'GHCND:USC00055402',
  'attributes': ',,7,0700',
  'value': 48.0}]

In [20]:
# there were 346 days recorded in 2016
len(max_temp_2016_dict['results'])

346

In [21]:
# look at the first and last day
print(max_temp_2016_dict['results'][0])
print(max_temp_2016_dict['results'][345])

{'date': '2016-01-01T00:00:00', 'datatype': 'TMAX', 'station': 'GHCND:USC00055402', 'attributes': ',,7,0700', 'value': 28.0}
{'date': '2016-12-31T00:00:00', 'datatype': 'TMAX', 'station': 'GHCND:USC00055402', 'attributes': ',,7,0700', 'value': 64.0}
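
The three parameter dictionaries above differ only in the year, so they could also be produced by one small function. The sketch below is one way to do that; `build_params` is a hypothetical helper name, and the default station values are the ones used throughout this notebook.

```python
def build_params(year, network="GHCND", station_id="GHCND:USC00055402"):
    """Build the request parameters for one calendar year of daily TMAX."""
    return {
        "limit": "1000",
        "datasetid": network,
        "stationid": station_id,
        "datatypeid": "TMAX",
        "startdate": f"{year}-01-01",
        "enddate": f"{year}-12-31",
        "units": "standard",
    }

# One parameter dictionary per year, keyed by year
yearly_params = {year: build_params(year) for year in (2016, 2017, 2018)}
print(yearly_params[2017]["startdate"], yearly_params[2017]["enddate"])
```

Each dictionary could then be passed as `params` to the same `requests.get` call used in the cells above.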


## Putting all data together into pandas dataframes -- 2016-2018 Max Temp. 

In [22]:
# importing pandas to set up our dataframes and help with our final step of storing the data in CSV format
import pandas as pd

In [23]:
# double-checking our current datatype -- it is a dict, parsed from the original JSON format as outlined above
type(max_temp_2016_dict)

dict

In [24]:
# pulling 2018 data into its own dataframe & checking that it looks good/all data is there
df8 = pd.DataFrame(max_temp_2018_dict['results'])
df8

Unnamed: 0,date,datatype,station,attributes,value
0,2018-01-01T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",27.0
1,2018-01-02T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",32.0
2,2018-01-03T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",44.0
3,2018-01-04T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",53.0
4,2018-01-05T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",50.0
...,...,...,...,...,...
351,2018-12-27T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",37.0
352,2018-12-28T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",29.0
353,2018-12-29T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",23.0
354,2018-12-30T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",42.0


In [25]:
# pulling 2017 data into its own dataframe & checking that it looks good/all data is there
df7 = pd.DataFrame(max_temp_2017_dict['results'])
df7

Unnamed: 0,date,datatype,station,attributes,value
0,2017-01-01T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",39.0
1,2017-01-02T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",49.0
2,2017-01-03T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",52.0
3,2017-01-04T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",28.0
4,2017-01-05T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",14.0
...,...,...,...,...,...
356,2017-12-27T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",14.0
357,2017-12-28T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",54.0
358,2017-12-29T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",55.0
359,2017-12-30T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",62.0


In [26]:
# pulling 2016 data into its own dataframe & checking that it looks good/all data is there
df6 = pd.DataFrame(max_temp_2016_dict['results'])
df6

Unnamed: 0,date,datatype,station,attributes,value
0,2016-01-01T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",28.0
1,2016-01-02T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",39.0
2,2016-01-03T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",45.0
3,2016-01-04T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",50.0
4,2016-01-05T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",48.0
...,...,...,...,...,...
341,2016-12-27T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",42.0
342,2016-12-28T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",54.0
343,2016-12-29T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",51.0
344,2016-12-30T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",50.0


## Combine separate 2016-2018 dfs into 1 master dataframe & convert 'date' column to datetime dtype:

In [27]:
# collecting the 3 dfs in a 'frames' list that we will use to concat them together
# https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
frames = [df6, df7, df8]

In [28]:
# concatenating the 2016-2018 data (3 separate dfs) into 1 master dataframe, stacked together
df = pd.concat(frames)
df

Unnamed: 0,date,datatype,station,attributes,value
0,2016-01-01T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",28.0
1,2016-01-02T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",39.0
2,2016-01-03T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",45.0
3,2016-01-04T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",50.0
4,2016-01-05T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",48.0
...,...,...,...,...,...
351,2018-12-27T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",37.0
352,2018-12-28T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",29.0
353,2018-12-29T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",23.0
354,2018-12-30T00:00:00,TMAX,GHCND:USC00055402,",,7,0700",42.0


In [29]:
# checking the column dtypes of the dataframe -- we see the 'date' column is not in datetime format -- this can be problematic later on, so we will change that now while it is easy
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1063 entries, 0 to 355
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   date        1063 non-null   object 
 1   datatype    1063 non-null   object 
 2   station     1063 non-null   object 
 3   attributes  1063 non-null   object 
 4   value       1063 non-null   float64
dtypes: float64(1), object(4)
memory usage: 49.8+ KB


In [30]:
#converting 'date' column into datetime64[ns] for dtype that will be easier to work with in the future
df['date'] = pd.to_datetime(df['date'])

In [31]:
#checking the new datetime type for 'date' column -- looks great
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1063 entries, 0 to 355
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   date        1063 non-null   datetime64[ns]
 1   datatype    1063 non-null   object        
 2   station     1063 non-null   object        
 3   attributes  1063 non-null   object        
 4   value       1063 non-null   float64       
dtypes: datetime64[ns](1), float64(1), object(3)
memory usage: 49.8+ KB
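
To illustrate why a datetime type is easier to work with than a string, here is a sketch of the same kind of conversion for a single value, using the standard library's `datetime.fromisoformat` rather than pandas. The `raw` string is in the format of the API's `date` field.

```python
from datetime import datetime

raw = "2018-01-01T00:00:00"              # format of the API's 'date' field
parsed = datetime.fromisoformat(raw)     # string -> datetime object

# Once parsed, the parts of the date are directly accessible
print(parsed.year, parsed.month, parsed.day)
print(parsed.weekday())                  # 0 means Monday
```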


In [32]:
# quick summary statistics for the data values -- we can see some valuable info that can be investigated much further, such as:
# lowest daily high = 4 degrees
# highest daily high = 100 degrees
# average daily high was 65.87 degrees

df.describe()

Unnamed: 0,value
count,1063.0
mean,65.87206
std,17.998812
min,4.0
25%,53.0
50%,66.0
75%,81.0
max,100.0
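
For a sense of what these summary figures mean, the sketch below computes the count, mean, min and max by hand for a handful of values with the standard library. The `values` list holds only the first five 2018 daily highs shown earlier in this notebook, so the numbers will not match the full-dataset summary above.

```python
import statistics

# First five 2018 daily highs from the output earlier in this notebook
values = [27.0, 32.0, 44.0, 53.0, 50.0]

summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "min": min(values),
    "max": max(values),
}
print(summary)
```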


## Converting final dataframe with all data (2016-2018 daily MAX temp. in Lakewood, CO) into CSV file

In [33]:
df.to_csv('MaxTemp_16.17.18.csv')
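
The `attributes` field itself contains commas, so it has to be quoted when written as CSV. The sketch below uses the standard library's `csv` module and an in-memory buffer, not pandas, to show what the rows look like. The two `records` are copied from the 2018 output earlier in this notebook.

```python
import csv
import io

# Two records copied from the 2018 output earlier in this notebook
records = [
    {"date": "2018-01-01T00:00:00", "datatype": "TMAX",
     "station": "GHCND:USC00055402", "attributes": ",,7,0700", "value": 27.0},
    {"date": "2018-01-02T00:00:00", "datatype": "TMAX",
     "station": "GHCND:USC00055402", "attributes": ",,7,0700", "value": 32.0},
]

buffer = io.StringIO()                    # in-memory stand-in for a file
writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
writer.writeheader()
writer.writerows(records)

lines = buffer.getvalue().splitlines()
print(lines[0])                           # header row
print(lines[1])                           # first data row, 'attributes' quoted
```

By default `to_csv` also writes the dataframe's index as an unnamed first column; passing `index=False` leaves it out if it is not wanted.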

## Summary/ Conclusion
In conclusion, we were able to pull 3 years of daily max temperature data for a local Lakewood, CO station from the NOAA API using our private access token. We pulled the TMAX data for 2016-2018 using the requests and json packages: each response arrived in JSON format from requests.get() and was parsed with json.loads(r.text) into a dictionary. Because only 1 year could be pulled at a time, each year was stored separately in its own dict.

Using pandas, we converted each year's results into a dataframe with pd.DataFrame(), then stacked the 3 years into one master dataframe with pd.concat(frames). Finally, we stored the master dataframe in a CSV file as a meaningful representation of our data. This CSV file is attached, and a new copy is generated every time the program is run.

We can successfully say we met all requirements of this assignment: an appropriate dataset and dataset type were selected, 3 years of data were pulled and organized into a meaningful representation, a quick summary of findings was given, and the result was stored in CSV format as requested. With this data cleaned and pulled together, it should be set up for much more advanced analysis if needed :)