__W205, Fall 2016__   
__Final Project:__ Solar Fields and Weather   
__Group:__ Boris Kletser, Maya Miller-Vedam, Geoff Striling, Laura Williams   

# Initial Data Exploration

__OVERVIEW:__ This file contains example API calls to www.eia.gov and www.noaa.gov, along with a brief exploration of the available data, its schema, and its contents.

In [1]:
# imports
import os
import requests
import numpy as np
import pandas as pd

In [None]:
# global vars -- you will need the following keys in order to run the code below
EIA_API_KEY = ''
NOAA_CDO_TOKEN = ''
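
One way to avoid pasting keys into the notebook is to read them from environment variables. This is a minimal sketch, not part of the original workflow: the environment variable names and the `missing_keys` helper are assumptions chosen to match the globals above.

```python
import os

# Assumed variable names: export these in your shell before starting the notebook
EIA_API_KEY = os.environ.get('EIA_API_KEY', '')
NOAA_CDO_TOKEN = os.environ.get('NOAA_CDO_TOKEN', '')

def missing_keys(**keys):
    """Return the names of any keys that are still empty (hypothetical helper)."""
    return sorted(name for name, value in keys.items() if not value)

# List whichever keys still need to be set before running the cells below
unset = missing_keys(EIA_API_KEY=EIA_API_KEY, NOAA_CDO_TOKEN=NOAA_CDO_TOKEN)
```

If `unset` is not empty, the API calls below will most likely fail with an authorization error.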

### I. Laura's Demo Code

Getting EIA data via API Call:  


__NOTE:__  
* SEGS I -  no power generated in 2016  
* SEGS II - no power generated since 2014  
* SEGS III link: http://www.eia.gov/opendata/qb.cfm?category=4246&sdid=ELEC.PLANT.GEN.10439-SUN-ALL.M  

In [2]:
# Net generation for a specific plant (SEGS III)
url = ('http://api.eia.gov/series/?api_key=' + EIA_API_KEY +
       '&series_id=ELEC.PLANT.GEN.10439-SUN-ALL.M')
segs3 = requests.get(url)
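
As an alternative to concatenating the query string by hand, the URL can be assembled with the standard library's `urlencode`. This is only a sketch: `eia_series_url` and the `'YOUR_KEY'` placeholder are hypothetical names, and the endpoint is the one used in the cell above.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

EIA_SERIES_ENDPOINT = 'http://api.eia.gov/series/'

def eia_series_url(series_id, api_key):
    """Build the series query URL (hypothetical helper)."""
    params = [('api_key', api_key), ('series_id', series_id)]
    return EIA_SERIES_ENDPOINT + '?' + urlencode(params)

# 'YOUR_KEY' is a placeholder, not a real key
example_url = eia_series_url('ELEC.PLANT.GEN.10439-SUN-ALL.M', 'YOUR_KEY')
```

Swapping in a different plant then only requires changing the `series_id` argument.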

In [3]:
# Can check if data downloaded properly by checking the status_code
segs3.status_code

200

In [6]:
# Can turn the data into a json style dictionary
segs3_dict = segs3.json()

In [10]:
# create Dataframe
segs3_df = pd.io.json.json_normalize(segs3_dict['series'])
segs3_df

Unnamed: 0,copyright,data,description,end,f,geography,iso3166,lat,latlon,lon,name,series_id,source,start,units,updated
0,,"[[201608, 6272], [201607, 6351], [201606, 6609...",All solar powered electricity generation (incl...,201608,M,USA-CA,USA-CA,35.00694,"35.00694,-117.555768",-117.555768,Net generation : SEGS III (10439) : solar : al...,ELEC.PLANT.GEN.10439-SUN-ALL.M,"EIA, U.S. Energy Information Administration",200101,megawatthours,2016-10-25T13:28:18-0400


In [12]:
# A closer look at the data: 188 monthly observations
# (the 'megawatts' column holds megawatthours, per the 'units' field above)
data_df = pd.DataFrame(segs3_df['data'][0], columns=['date', 'megawatts'])
print(data_df.describe())
data_df.tail()

         megawatts
count   188.000000
mean   4842.345745
std    2910.088007
min       5.000000
25%    2024.750000
50%    5405.000000
75%    7092.750000
max    9759.000000


Unnamed: 0,date,megawatts
183,200105,8523
184,200104,6181
185,200103,4888
186,200102,1805
187,200101,774
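
The `date` values are `YYYYMM` periods and the series arrives newest-first. The sketch below shows one way to parse the periods and sort them chronologically. It uses plain Python on a few rows copied from the output above so that it stands alone; `parse_monthly` is a hypothetical helper name.

```python
from datetime import datetime

# A few [period, value] pairs copied from the output above
sample = [[201608, 6272], [201607, 6351], [200102, 1805], [200101, 774]]

def parse_monthly(rows):
    """Convert YYYYMM periods to datetimes and sort oldest-first."""
    parsed = [(datetime.strptime(str(period), '%Y%m'), value)
              for period, value in rows]
    return sorted(parsed)

monthly = parse_monthly(sample)
```

Wrapping the period in `str()` means the helper accepts the period whether the API delivers it as a string or as an integer.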


### II. Weather Data From NOAA's Online Data Center

First, just to get my feet wet with the API, here are a few queries.

In [14]:
# Defining a Token Authorization Class (needed for noaa API calls)
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    """Attaches Token Authentication to the Request Object"""
    def __init__(self, token):
        self.token = token
        
    def __call__(self, r):
        # modify and return the request
        r.headers['Token'] = self.token
        return r

In [15]:
# Fetch all available datasets
noaa = requests.get('http://www.ncdc.noaa.gov/cdo-web/api/v2/datasets', auth=TokenAuth(NOAA_CDO_TOKEN))
print(noaa.status_code)

200


In [16]:
# convert them into a dataframe and take a look:
noaa_datasets_df = pd.io.json.json_normalize(noaa.json()['results'])
noaa_datasets_df.head()

Unnamed: 0,datacoverage,id,maxdate,mindate,name,uid
0,1.0,GHCND,2016-11-07,1763-01-01,Daily Summaries,gov.noaa.ncdc:C00861
1,1.0,GSOM,2016-10-01,1763-01-01,Global Summary of the Month,gov.noaa.ncdc:C00946
2,1.0,GSOY,2016-01-01,1763-01-01,Global Summary of the Year,gov.noaa.ncdc:C00947
3,0.95,NEXRAD2,2016-11-07,1991-06-05,Weather Radar (Level II),gov.noaa.ncdc:C00345
4,0.95,NEXRAD3,2016-11-05,1994-05-20,Weather Radar (Level III),gov.noaa.ncdc:C00708


In [17]:
# note that the requests API call above was really slow...
# curl provides a command-line alternative whose output can be redirected to a file,
# for example:
! curl -H "token:$NOAA_CDO_TOKEN" "http://www.ncdc.noaa.gov/cdo-web/api/v2/datasets" > test.txt
# WARNING: running this cell will create a file named test.txt
# in the current directory that contains the query results

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1625    0  1625    0     0  13233      0 --:--:-- --:--:-- --:--:-- 13319



_For exploratory purposes I used_ [this noaa tool](http://www.ncdc.noaa.gov/cdo-web/datatools/findstation) _to manually locate a weather station close to the energy plant above._

__Weather Station:__
Bakersfield Airport, CA  
Lat/long: 35.4344, -119.0542  
Station ID: GHCND:USW00023155  

In [18]:
# Fetch data from the Bakersfield station
bakersfield = requests.get('http://www.ncdc.noaa.gov/cdo-web/api/v2/stations/GHCND:USW00023155', auth=TokenAuth(NOAA_CDO_TOKEN))
print(bakersfield.status_code)

200


In [19]:
# convert it into a dataframe and take a look:
# (the station endpoint returns a single JSON object, not a 'results' list)
bakersfield_df = pd.io.json.json_normalize(bakersfield.json())
bakersfield_df.head()


In [20]:
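# fetching daily summaries for May 1 2010
# NOTE: the CDO v2 docs name the station filter `stationid`; the `station=` parameter
# used here does not appear to restrict the results to Bakersfield (see the stations returned)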
url = 'http://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&station=GHCND:USW00023155&startdate=2010-05-01&enddate=2010-05-01'
daily = requests.get(url, auth=TokenAuth(NOAA_CDO_TOKEN))
print(daily.status_code)

200
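
A minimal sketch of the same daily-summary query restricted to a single station, assuming the `stationid` filter name from the CDO v2 documentation. `cdo_data_url` is a hypothetical helper; the endpoint, dataset, station ID, and dates are the ones used in this notebook.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

CDO_DATA_ENDPOINT = 'http://www.ncdc.noaa.gov/cdo-web/api/v2/data'

def cdo_data_url(datasetid, stationid, startdate, enddate):
    """Build a CDO v2 data query for one station (hypothetical helper)."""
    params = [('datasetid', datasetid),
              ('stationid', stationid),   # assumed filter name, per the CDO v2 docs
              ('startdate', startdate),
              ('enddate', enddate)]
    return CDO_DATA_ENDPOINT + '?' + urlencode(params)

bakersfield_url = cdo_data_url('GHCND', 'GHCND:USW00023155',
                               '2010-05-01', '2010-05-01')
```

`urlencode` percent-encodes the colon in the station ID. The resulting URL can be passed to `requests.get` with the same `TokenAuth` object as above.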


In [22]:
# convert it into a dataframe and take a look:
daily_df = pd.io.json.json_normalize(daily.json()['results'])
daily_df

Unnamed: 0,attributes,datatype,date,station,value
0,",,S,",PRCP,2010-05-01T00:00:00,GHCND:AE000041196,0
1,"H,,S,",TAVG,2010-05-01T00:00:00,GHCND:AE000041196,324
2,",,S,",TMAX,2010-05-01T00:00:00,GHCND:AE000041196,397
3,",,S,",TMIN,2010-05-01T00:00:00,GHCND:AE000041196,227
4,",,S,",PRCP,2010-05-01T00:00:00,GHCND:AEM00041194,0
5,"H,,S,",TAVG,2010-05-01T00:00:00,GHCND:AEM00041194,341
6,",,S,",TMAX,2010-05-01T00:00:00,GHCND:AEM00041194,387
7,",,S,",TMIN,2010-05-01T00:00:00,GHCND:AEM00041194,293
8,"H,,S,",TAVG,2010-05-01T00:00:00,GHCND:AEM00041217,327
9,",,S,",TMAX,2010-05-01T00:00:00,GHCND:AEM00041217,383
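
The raw `value` column is easier to read after rescaling. The sketch below assumes the GHCND default units, in which temperatures are reported in tenths of a degree Celsius and precipitation in tenths of a millimeter. That assumption should be checked against the dataset documentation before it is relied on. The rows are copied from the output above, and `scale_value` is a hypothetical helper.

```python
# Rows copied from the output above: (datatype, station, raw value)
rows = [('PRCP', 'GHCND:AE000041196', 0),
        ('TAVG', 'GHCND:AE000041196', 324),
        ('TMAX', 'GHCND:AE000041196', 397),
        ('TMIN', 'GHCND:AE000041196', 227)]

# Assumption: these GHCND datatypes are stored in tenths of their unit
TENTHS_DATATYPES = {'PRCP', 'TAVG', 'TMAX', 'TMIN'}

def scale_value(datatype, raw):
    """Rescale a raw GHCND value to whole units when it is stored in tenths."""
    return raw / 10.0 if datatype in TENTHS_DATATYPES else raw

scaled = [(datatype, station, scale_value(datatype, raw))
          for datatype, station, raw in rows]
```

Under that assumption, the TAVG of 324 for the first station corresponds to 32.4 degrees Celsius.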
