# Collecting weather data from an API

## About the data

In this notebook, we will be collecting daily weather data from the National Centers for Environmental Information (NCEI) API. We will use the Global Historical Climatology
Network - Daily (GHCND) data set; see the documentation here.

Note: The NCEI is part of the National Oceanic and Atmospheric Administration (NOAA) and, as you can see from the URL for the API, this resource was created when the
NCEI was called the NCDC. Should the URL for this resource change in the future, you can search for the NCEI weather API to find the updated one.

## Using the NCEI API

Paste your token below.

In [None]:
import requests

def make_request(endpoint, payload=None):
  """
  Make a GET request to the NCEI API using the third-party
  requests library.

  * endpoint - the part of the API to query (e.g. 'data' or
               'stations'); it is appended to the base URL.
  * payload  - optional dictionary of query parameters, passed
               to requests.get() as params.

  The API token is sent in the request headers.
  """
  return requests.get(
      f'https://www.ncei.noaa.gov/cdo-web/api/v2/{endpoint}',
      headers = {
          'token':'anPgrgSWrWgUYFXULhgBXbwIMcbuJWps'
          # token issued by ncei.noaa.gov
      },
      params=payload)
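
To make the role of `payload` more concrete, the sketch below shows roughly how a dictionary of parameters ends up in the query string of the request URL. It uses the standard library's `urllib.parse.urlencode` as a stand-in for the encoding that `requests` performs, so it illustrates the idea rather than the library's exact code path. `build_url` and `BASE_URL` are hypothetical names introduced for this example, and no request is sent.

```python
from urllib.parse import urlencode

# same base URL as in make_request() above
BASE_URL = 'https://www.ncei.noaa.gov/cdo-web/api/v2'

def build_url(endpoint, payload=None):
  """Return the URL that a GET request for this endpoint and payload would target."""
  url = f'{BASE_URL}/{endpoint}'
  if payload:
    # each key/value pair becomes key=value, and the pairs are joined with '&'
    url += '?' + urlencode(payload)
  return url

example_url = build_url(
    'data',
    {'datasetid': 'GHCND', 'startdate': '2018-01-01', 'limit': 1000}
)
print(example_url)
```

Building the URL by hand like this is only useful for inspection; in the notebook, `requests.get()` handles the encoding when given `params`.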

## Collect All Data Points for 2018 In NYC (Various Stations)

We can write a loop that queries for the data points one day at a time and collects all the results in a list:

In [None]:
import datetime

from IPython import display # lets us overwrite the status message in place

current = datetime.date(2018, 1, 1) # start date; advanced by one day per iteration
end = datetime.date(2019, 1, 1) # end date (exclusive)

# list for all the data gathered for 2018
results = []

while current < end: # loop until current reaches end
  display.clear_output(wait=True)
  display.display(f'Gathering Data for {current}')

  response = make_request(
      'data',
      {
          'datasetid' : 'GHCND',
          'locationid': 'CITY:US360019', # NYC City ID
          'startdate': current, # same start and end date, so each
          'enddate': current,   # request covers a single day
          'units':'metric',
          'limit':1000
      }
  )

  if response.ok: # only keep results from successful requests
    # append this day's records to the results list
    results.extend(response.json()['results'])

  # a timedelta represents a duration; adding one day to current
  # moves the loop on to the next date
  current += datetime.timedelta(days=1)
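
Note that the loop above silently skips any day whose request fails. As a minimal sketch of one way to keep track of those days, the example below separates the date iteration from the fetching and records failures for a later retry. `daterange`, `collect`, and `fake_fetch` are hypothetical helpers introduced for illustration; `fake_fetch` stands in for the API call so the example runs without a token.

```python
import datetime

def daterange(start, end):
  """Yield each date from start (inclusive) to end (exclusive)."""
  day = start
  while day < end:
    yield day
    day += datetime.timedelta(days=1)

def collect(fetch, start, end):
  """
  Call fetch(day) for every day in the range.

  fetch is any callable that returns a list of records,
  or None when the request failed.
  """
  records, failed = [], []
  for day in daterange(start, end):
    day_records = fetch(day)
    if day_records is None:
      failed.append(day) # remember the day so it can be retried
    else:
      records.extend(day_records)
  return records, failed

# stand-in for the API call: pretend the request for January 3 fails
def fake_fetch(day):
  if day == datetime.date(2018, 1, 3):
    return None
  return [{'date': day.isoformat(), 'value': 0}]

sample_results, failed_days = collect(
    fake_fetch, datetime.date(2018, 1, 1), datetime.date(2018, 1, 6)
)
print(len(sample_results), failed_days)
```

With the real API, `fetch` could wrap `make_request()` and return `None` whenever `response.ok` is false.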


Now, we can create a dataframe with all this data. Notice there are multiple stations with values for each datatype on a given day. We don't know what the stations are, but we
can look them up and add them to the data.

In [None]:
import pandas as pd

df = pd.DataFrame(results) # list of records to dataframe
df.head()

Save this data to a file:

In [None]:
# save the dataframe to a CSV file without the index
df.to_csv("/content/nyc_weather_2018.csv", index=False)

And write it to the database:

In [None]:
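import sqlite3

# connect to (or create) the SQLite database file; the with block
# commits the transaction when it exits without an error
with sqlite3.connect('/content/weather.db') as connection:
  df.to_sql(
    'weather', connection, index=False, if_exists='replace'
  ) # write the dataframe to the weather table, replacing it if it exists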
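
One way to confirm the write is to read the table back with a query. The sketch below does this against an in-memory database with two made-up rows, so it runs without `weather.db`. The column names are an assumption about the fields in the API results; against the real file, you would connect to `/content/weather.db` instead and skip the table creation.

```python
import sqlite3

# made-up rows for illustration only, not real observations
sample_rows = [
    ('2018-01-01T00:00:00', 'TMAX', 'GHCND:EXAMPLE001', -5.0),
    ('2018-01-01T00:00:00', 'TMIN', 'GHCND:EXAMPLE001', -12.0),
]

check_conn = sqlite3.connect(':memory:')
check_conn.execute(
    'CREATE TABLE weather (date TEXT, datatype TEXT, station TEXT, value REAL)'
)
check_conn.executemany('INSERT INTO weather VALUES (?, ?, ?, ?)', sample_rows)

# count the rows per datatype to confirm the table holds what we expect
counts = dict(check_conn.execute(
    'SELECT datatype, COUNT(*) FROM weather GROUP BY datatype'
).fetchall())

check_conn.close() # close explicitly; a with block only manages the transaction
print(counts)
```

The explicit `close()` matters because using a `sqlite3` connection as a context manager commits or rolls back the transaction but does not close the connection.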

To practice merging dataframes later, we will also collect the data that maps station IDs to information about each station:

In [None]:
response = make_request(
    'stations',
    {
        'datasetid': 'GHCND',
        'locationid': 'CITY:US360019',
        'limit':1000
    }
)

stations = pd.DataFrame(response.json()['results'])[['id', 'name','latitude','longitude', 'elevation']]

stations.to_csv('/content/weather_stations.csv', index=False)

with sqlite3.connect('/content/weather.db') as connection:
  stations.to_sql(
      'stations', connection, index=False, if_exists='replace'
  )
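
With both tables in the same database, the station details can be attached to the weather records by joining on the station ID. The sketch below shows the idea with an in-memory database and made-up rows; it assumes the weather records carry the station ID in a `station` column that matches `id` in the stations table, which should be checked against the real data.

```python
import sqlite3

join_conn = sqlite3.connect(':memory:')
join_conn.executescript("""
    CREATE TABLE weather (date TEXT, datatype TEXT, station TEXT, value REAL);
    CREATE TABLE stations (id TEXT, name TEXT);
    -- made-up rows for illustration only
    INSERT INTO weather VALUES ('2018-01-01', 'TMAX', 'GHCND:EXAMPLE001', -5.0);
    INSERT INTO weather VALUES ('2018-01-01', 'TMAX', 'GHCND:EXAMPLE002', -4.0);
    INSERT INTO stations VALUES ('GHCND:EXAMPLE001', 'EXAMPLE STATION ONE');
""")

# LEFT JOIN keeps every weather row, even when no station matches
joined = join_conn.execute("""
    SELECT w.date, w.datatype, w.value, s.name
    FROM weather AS w
    LEFT JOIN stations AS s ON w.station = s.id
    ORDER BY w.station
""").fetchall()

join_conn.close()
print(joined)
```

The second weather row has no matching station, so its name comes back as `None`. The same join can be done in pandas with `merge()`, using `left_on` and `right_on` to name the two key columns.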

# END