# Gathering Weather Data

#### Imports

In [1]:
# Basic imports
import pandas as pd
import numpy as np
import datetime
import urllib.request
import urllib.error
import json
import time
import sys
from ipywidgets import IntProgress
from IPython.display import display

# Custom imports
import api 
import utils

#### Index

1. [Introduction](#intro)
2. [Loading Data](#ld)
3. [API Call](#api)
4. [Saving Data](#saving)

---
<a id='intro'></a>
## Introduction

This notebook is dedicated entirely to collecting weather data for the sample of wildfires that we have created. We will collect the weather data through calls to <a href='https://www.visualcrossing.com/'>*Visual Crossing*</a>, a weather API that provides detailed weather information for a specific latitude and longitude. For each wildfire, the API call will collect the daily data for the 7-day window ending on the day the wildfire was discovered (the 6 preceding days plus the discovery date itself), storing these values in a list so we may access them at a later stage, whilst also calculating their average. The weather information that we will collect is the following:

<table>
  <tr>
    <th style="text-align: left; background: lightgrey">Weather Metric</th>
    <th style="text-align: left; background: lightgrey">Description</th>
    <th style="text-align: left; background: lightgrey">Unit</th>
  </tr>
  <tr>
    <td style="text-align: left"> <code>tempmax</code> </td>
    <td style="text-align: left">Maximum Temperature</td>
    <td style="text-align: left">Degrees Celsius</td>
  </tr>
  <tr>
    <td style="text-align: left"><code>temp</code></td>
    <td style="text-align: left">Temperature (or mean temperature)</td>
    <td style="text-align: left">Degrees Celsius</td>
  </tr>
  <tr>
    <td style="text-align: left"><code>humidity</code></td>
    <td style="text-align: left">Relative Humidity</td>
    <td style="text-align: left">%</td>
  </tr>
  <tr>
    <td style="text-align: left"><code>precip</code></td>
    <td style="text-align: left">Precipitation</td>
    <td style="text-align: left">Millimetres</td>
  </tr>
  <tr>
    <td style="text-align: left"><code>dew</code></td>
    <td style="text-align: left">Dew Point</td>
    <td style="text-align: left">Degrees Celsius</td>
  </tr>
  <tr>
    <td style="text-align: left"><code>windspeed</code></td>
    <td style="text-align: left">Wind Speed</td>
    <td style="text-align: left">Kilometres Per Hour</td>
  </tr>
  <tr>
    <td style="text-align: left"><code>winddir</code></td>
    <td style="text-align: left">Wind Direction</td>
    <td style="text-align: left">Degrees</td>
  </tr>
  <tr>
    <td style="text-align: left"><code>pressure</code></td>
    <td style="text-align: left">Sea Level Pressure</td>
    <td style="text-align: left">Millibars (Hectopascals)</td>
  </tr>
</table>

Once we have collected the data, we will store it as a `.pkl` file so we may access it again in other sections of the project.
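One practical reason to prefer pickle here is that several of the new columns will hold a Python list per row. The minimal sketch below uses a toy DataFrame (not the real sample) to show that `to_pickle`/`read_pickle` round-trips a list-valued cell unchanged, whereas a CSV round-trip turns it into a string that would need re-parsing.

```python
import os
import tempfile

import pandas as pd

# Toy frame with a list-valued column, mimicking the weather columns built later
toy = pd.DataFrame({'temp': [[1.0, 2.0, 3.0]], 'avg_temp': [2.0]})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, 'toy.pkl')
    csv_path = os.path.join(tmp, 'toy.csv')

    # Pickle stores the Python objects themselves
    toy.to_pickle(pkl_path)
    from_pkl = pd.read_pickle(pkl_path)

    # CSV stores only the text representation of each list
    toy.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

print(type(from_pkl.loc[0, 'temp']))  # <class 'list'>
print(type(from_csv.loc[0, 'temp']))  # <class 'str'>
```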

---

<a id='ld'></a>
## Loading Data

As we have previously created a sample of wildfires, we will load it and collect the weather data for the latitude and longitude of each fire. The sample can be loaded from the `.pkl` file that we saved in a previous notebook.

In [3]:
# Load the data
df = pd.read_pickle('sample_data/sample.pkl')

In [4]:
# Check the output
df.head()

|   | index | DATE | FIRE_YEAR | DISCOVERY_DOY | FIRE_SIZE | FIRE_SIZE_CLASS | LATITUDE | LONGITUDE | STATE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 46 | 1992-01-01 | 1992 | 1 | 0.1 | A | 43.325 | -101.0185 | SD |
| 1 | 0 | 1992-01-01 | 1992 | 1 | 3.0 | B | 33.0634 | -90.120813 | MS |
| 2 | 36 | 1992-01-01 | 1992 | 1 | 1.0 | B | 33.058333 | -79.979167 | SC |
| 3 | 132 | 1992-01-02 | 1992 | 2 | 0.25 | A | 40.775 | -74.85416 | NJ |
| 4 | 215 | 1992-01-03 | 1992 | 3 | 0.5 | B | 29.79 | -82.37 | FL |


In [5]:
# Check the shape
df.shape

(30000, 9)

---
<a id='api'></a>
## API Call

The API call collects the weather information for a given latitude and longitude over a range of dates. Given the discovery date of a wildfire, we request the daily weather for the 7-day window ending on that date (the 6 preceding days plus the day itself). For each metric, we save the 7 daily values as a list and calculate their average. All of the collected data is appended to a dictionary which, once every row of the sample DataFrame has been processed, is converted into a DataFrame and concatenated column-wise with the sample, giving a single DataFrame that holds the wildfire data alongside the relevant weather data.
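As a sketch of the request for a single fire: subtracting 6 days from the discovery date gives a 7-day window that includes the day of the fire. The helper `build_timeline_url` is hypothetical; it assembles the same kind of URL as the loop in this section (the base URL and the `unitGroup`, `include`, `key` and `contentType` parameters are taken from that code) but lets the standard library's `urllib.parse` handle the encoding. It assumes the API accepts the location as `lat,lon` without the space the original query includes, and `'YOUR_KEY'` is a placeholder.

```python
from urllib.parse import quote, urlencode

import pandas as pd

BASE_URL = ('https://weather.visualcrossing.com/VisualCrossingWebServices'
            '/rest/services/timeline/')


def build_timeline_url(latitude, longitude, date, api_key, window_days=7):
    """Hypothetical helper: Timeline API URL for the window ending on `date`."""
    end_date = pd.Timestamp(date)
    # window_days - 1 days before the fire, plus the day of the fire itself
    start_date = end_date - pd.Timedelta(days=window_days - 1)
    # Location path segment "lat,lon" with the comma percent-encoded
    location = quote(f'{latitude},{longitude}', safe='')
    params = urlencode({
        'unitGroup': 'metric',
        'include': 'days',
        'key': api_key,
        'contentType': 'json',
    })
    return f'{BASE_URL}{location}/{start_date.date()}/{end_date.date()}?{params}'


# First fire in the sample (South Dakota, discovered 1992-01-01)
url = build_timeline_url(43.325, -101.0185, '1992-01-01', 'YOUR_KEY')
print(url)

# The window covers 7 calendar days, inclusive of both ends
print(len(pd.date_range('1991-12-26', '1992-01-01')))  # 7
```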

A few questions arise regarding the API call. The first is why we store the daily weather data for the week at all, rather than calculating the important variables from it directly. We keep the values for the 7 days so that we can access them if they are required at later stages of the project. Although we have already identified some variables that might be interesting to calculate from the historical weather data (the variance, and the change in the metric from the first day of the window to the day of the wildfire), there may be other useful metrics that we have not yet recognised.
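As an illustration of the later features the stored lists make possible, the sketch below computes the two examples mentioned above from one row's 7 daily values. The function name `window_features`, the use of the sample variance (`ddof=1`), and the NaN handling (missing days are skipped for the variance, while a missing endpoint makes the change NaN) are assumptions for this sketch, not decisions taken in the project.

```python
import numpy as np


def window_features(values):
    """Hypothetical helper: variance and first-to-last change of a daily list."""
    arr = np.asarray(values, dtype=float)
    observed = arr[~np.isnan(arr)]
    # Sample variance over the days with data (needs at least two values)
    variance = observed.var(ddof=1) if observed.size >= 2 else np.nan
    # Change from the first day of the window to the day of the fire;
    # NaN propagates if either endpoint is missing
    change = arr[-1] - arr[0]
    return variance, change


# tempmax values collected for the first fire in the sample
tempmax = [6.7, 6.7, 1.7, 7.2, 8.4, 1.7, 4.4]
variance, change = window_features(tempmax)
print(round(change, 1))  # -2.3
```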

A second question is why we only collect the weather information for the past 7 days. The answer has multiple dimensions, but at its core, 7 days seemed to offer the best balance between capturing the conditions leading up to each fire and keeping the API calls fast and inexpensive to run.
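The cost side of that trade-off can be made concrete with simple arithmetic. Assuming (this is not stated in the original) that both cost and transfer time scale with the number of daily records returned, and with one request per fire as in the loop of this section, the 30,000-fire sample requires the following; `records_needed` is a hypothetical helper.

```python
N_FIRES = 30_000  # rows in the sample, from df.shape above


def records_needed(n_fires, window_days):
    """Daily records returned when each fire gets one request of window_days days."""
    return n_fires * window_days


for window_days in (3, 7, 14, 30):
    print(f'{window_days:>2}-day window: {N_FIRES:,} requests, '
          f'{records_needed(N_FIRES, window_days):,} daily records')
```

The number of requests stays at one per fire whatever the window length; what grows linearly is the volume of daily records, 210,000 for the 7-day window used here.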

Having discussed some of the reasoning behind the API call, we can now implement it with the following code:

In [6]:
# Get the API key from the local api module (keeps the key out of the notebook)
api_key = api.API_KEY

# Base URL of the Visual Crossing Timeline API
BaseURL = 'https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline/'

# Data dictionary into which the weather data will be appended:
# one list-valued column and one average column per metric
data = {
    'tempmax': [],
    'avg_tempmax': [],
    'temp': [],
    'avg_temp': [],
    'humidity': [],
    'avg_humidity': [],
    'precip': [],
    'avg_precip': [],
    'dew': [],
    'avg_dew': [],
    'windspeed': [],
    'avg_windspeed': [],
    'winddir': [],
    'avg_winddir': [],
    'pressure': [],
    'avg_pressure': []
}

# Instantiate and display the progress bar
progress = IntProgress(min=0, max=df.shape[0])
display(progress)

# Iterate through the DataFrame and collect the weather data for each fire
for _, row in df.iterrows():

    # Update progress bar
    progress.value += 1

    # 7-day window ending on the discovery date (6 days before + the day itself)
    end_date = row['DATE']
    start_date = end_date - pd.Timedelta(6, 'days')
    latitude = row['LATITUDE']
    longitude = row['LONGITUDE']

    # Create the API query (location is the URL-encoded "lat, lon")
    query = (f'{latitude}%2C%20{longitude}/{start_date.date()}/{end_date.date()}'
             f'?unitGroup=metric&include=days&key={api_key}&contentType=json')
    url = BaseURL + query

    try:
        with urllib.request.urlopen(url) as response:
            # Parse the results as JSON
            jsonData = json.loads(response.read().decode('utf-8'))
    except urllib.error.HTTPError as e:
        ErrorInfo = e.read().decode()
        print('Error code: ', e.code, ErrorInfo)
        sys.exit()

    # Lists for the daily values of each metric over the 7-day window
    variables = {
        'tempmax': [],
        'temp': [],
        'humidity': [],
        'precip': [],
        'dew': [],
        'windspeed': [],
        'winddir': [],
        'pressure': []
    }

    # Parse the JSON data: one entry per day in the window
    for daily_data in jsonData['days']:
        for variable in variables:
            # Missing or null values are stored as NaN
            value = daily_data.get(variable)
            variables[variable].append(value if value is not None else np.nan)

    # Append the daily lists and their averages to the data dictionary
    # (loop variable is not called `key`, so it cannot shadow the API key)
    for column in data:
        if column.startswith('avg_'):
            metric = column[len('avg_'):]
            data[column].append(utils.mean(variables[metric]))
        else:
            data[column].append(variables[column])


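The parsing step inside the loop can be checked offline by pulling it into a pure function. The sketch below is a hypothetical refactoring: `parse_days` and `nan_mean` are illustrative names, and `nan_mean` stands in for `utils.mean`, whose source is not shown here. It assumes the behaviour suggested by the output further down (row 4's `avg_precip` is 0.0 despite four missing days, while row 1's is NaN): missing days are ignored and a window with no data averages to NaN. The two-day payload is made-up data shaped like the `days` list the loop reads.

```python
import math

import numpy as np

METRICS = ['tempmax', 'temp', 'humidity', 'precip',
           'dew', 'windspeed', 'winddir', 'pressure']


def nan_mean(values):
    """Mean of the non-NaN values; NaN if there are none (stand-in for utils.mean)."""
    observed = [v for v in values if not math.isnan(v)]
    return sum(observed) / len(observed) if observed else np.nan


def parse_days(json_data, metrics=METRICS):
    """Return {metric: daily values} and {'avg_' + metric: mean} for one response."""
    daily = {metric: [] for metric in metrics}
    for day in json_data['days']:
        for metric in metrics:
            # Missing or null values become NaN, as in the loop above
            value = day.get(metric)
            daily[metric].append(np.nan if value is None else value)
    averages = {f'avg_{metric}': nan_mean(vals) for metric, vals in daily.items()}
    return daily, averages


# Made-up two-day payload: precip is null on day 2, pressure is absent entirely
payload = {'days': [
    {'tempmax': 10.0, 'temp': 5.0, 'humidity': 80.0, 'precip': 1.0,
     'dew': 2.0, 'windspeed': 12.0, 'winddir': 90.0},
    {'tempmax': 12.0, 'temp': 7.0, 'humidity': 70.0, 'precip': None,
     'dew': 3.0, 'windspeed': 8.0, 'winddir': 180.0},
]}
daily, averages = parse_days(payload)
print(daily['precip'], averages['avg_precip'])
```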
---
<a id='saving'></a>
## Saving Data

Now that the weather data has been collected into a dictionary, we can convert it into a DataFrame, concatenate it with `df` and save the result as a `.pkl` file.
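`pd.concat(..., axis=1)` pairs rows by index label, not by position. A DataFrame built from a dictionary of lists gets a fresh 0 to n-1 index, so the concatenation only lines up if `df` has one too; the `df.head()` output above shows a 0-based index (with the earlier index kept in the `index` column), which suggests it does. A minimal sketch with toy frames shows what happens otherwise, and the `reset_index(drop=True)` guard that avoids it:

```python
import pandas as pd

# Toy fires whose index is NOT 0..n-1 (e.g. left over from sampling)
fires = pd.DataFrame({'STATE': ['SD', 'MS', 'SC']}, index=[46, 0, 36])
# Weather rows built from a dict of lists always get a 0..n-1 index
weather = pd.DataFrame({'avg_temp': [-2.9, 8.5, 9.9]})

# Aligning on labels: only label 0 matches, so the result has 5 rows with NaNs
misaligned = pd.concat([fires, weather], axis=1)
print(misaligned.shape)  # (5, 2)

# Resetting the index first pairs the rows by position instead
aligned = pd.concat([fires.reset_index(drop=True), weather], axis=1)
print(aligned.shape)  # (3, 2)
```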

In [7]:
# Convert dictionary into DataFrame
weather_info = pd.DataFrame(data)

In [8]:
weather_info.head()

|   | tempmax | avg_tempmax | temp | avg_temp | humidity | avg_humidity | precip | avg_precip | dew | avg_dew | windspeed | avg_windspeed | winddir | avg_winddir | pressure | avg_pressure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [6.7, 6.7, 1.7, 7.2, 8.4, 1.7, 4.4] | 5.257143 | [-3.0, -4.9, -0.3, -1.4, -3.9, -5.3, -1.7] | -2.928571 | [77.0, 78.1, 85.9, 79.3, 74.1, 85.1, 88.5] | 81.142857 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.0 | [-6.9, -8.7, -2.4, -5.0, -8.6, -7.5, -3.4] | -6.071429 | [13.0, 22.3, 31.7, 20.5, 11.2, 18.7, 11.2] | 18.371429 | [295.9, 179.8, 186.4, 218.7, 263.2, 173.2, 247.3] | 223.5 | [1026.9, 1029.2, 1017.7, 1012.2, 1019.1, 1024.... | 1021.671429 |
| 1 | [16.7, 13.4, 10.7, 11.7, 12.2, 10.7, 15.7] | 13.014286 | [7.6, 11.0, 9.4, 9.1, 8.2, 6.2, 8.0] | 8.5 | [64.5, 73.7, 91.7, 63.9, 71.7, 76.1, 79.1] | 74.385714 | [nan, nan, nan, nan, nan, nan, nan] | NaN | [0.2, 6.4, 7.9, 2.3, 3.2, 2.2, 4.2] | 3.771429 | [14.8, 18.4, 14.8, 14.8, 18.4, 22.3, 11.2] | 16.385714 | [25.9, 46.9, 303.3, 10.4, 29.3, 36.9, 70.2] | 74.7 | [1023.4, 1023.0, 1019.5, 1020.8, 1024.5, 1027.... | 1023.442857 |
| 2 | [10.8, 11.2, 16.0, 15.9, 13.9, 12.6, 15.5] | 13.7 | [9.6, 10.3, 10.8, 11.0, 8.9, 8.5, 10.1] | 9.885714 | [74.6, 94.7, 95.8, 89.3, 74.4, 69.3, 71.9] | 81.428571 | [4.4, 2.9, 22.0, 0.0, 0.0, 0.0, 0.0] | 4.185714 | [5.1, 9.5, 10.2, 9.3, 4.3, 3.1, 5.2] | 6.671429 | [22.4, 24.7, 21.9, 23.9, 19.6, 27.7, 24.4] | 23.514286 | [43.5, 23.8, 35.0, 281.4, 335.4, 37.3, 28.6] | 112.142857 | [1025.3, 1025.6, 1021.5, 1014.8, 1020.6, 1028.... | 1023.357143 |
| 3 | [7.8, 8.0, 7.5, 6.4, 2.3, 5.0, 7.7] | 6.385714 | [2.3, 1.4, 5.1, 2.7, -2.7, -1.6, 3.9] | 1.585714 | [64.4, 65.8, 82.2, 67.6, 49.4, 65.1, 75.8] | 67.185714 | [0.0, 0.0, 8.93, 0.83, 0.0, 0.0, 0.0] | 1.394286 | [-4.0, -4.8, 2.2, -3.0, -12.2, -7.9, 0.0] | -4.242857 | [25.0, 12.2, 15.5, 29.2, 14.7, 16.1, 14.4] | 18.157143 | [316.5, 260.2, 4.8, 2.4, 28.0, 259.6, 61.9] | 133.342857 | [1031.2, 1029.6, 1010.7, 1018.4, 1036.6, 1035.... | 1027.4 |
| 4 | [18.4, 18.9, 9.5, 15.1, 13.9, 22.2, 17.2] | 16.457143 | [16.0, 15.8, 7.1, 9.5, 11.9, 16.9, 14.0] | 13.028571 | [98.3, 86.8, 91.7, 91.0, 92.8, 87.4, 82.4] | 90.057143 | [0.0, 0.0, 0.0, nan, nan, nan, nan] | 0.0 | [15.8, 13.5, 5.7, 8.0, 10.8, 14.5, 10.7] | 11.285714 | [20.1, 22.2, 18.4, 29.5, 22.3, 27.7, 27.7] | 23.985714 | [71.4, 295.1, 330.1, 29.3, 39.6, 57.4, 300.6] | 160.5 | [1018.6, 1016.6, 1021.1, 1023.5, 1022.3, 1016.... | 1018.357143 |

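One caveat about these averages: `avg_winddir` is an arithmetic mean of compass bearings, which breaks down at the 0°/360° wrap-around. In row 1, six of the seven bearings lie between 10.4° and 70.2° and the seventh is 303.3° (a north-westerly), yet the arithmetic mean is 74.7°. A circular mean (average the unit vectors, then take the angle of the result) is the usual alternative and gives roughly 26° for the same values. The sketch below is a suggestion rather than what the project computes; `circular_mean_deg` is a hypothetical name, and weighting every day equally (ignoring wind speed) is an assumption.

```python
import numpy as np


def circular_mean_deg(angles_deg):
    """Mean direction in degrees [0, 360) of compass bearings, ignoring NaNs."""
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    angles = angles[~np.isnan(angles)]
    if angles.size == 0:
        return np.nan
    # Average the unit vectors, then convert the mean vector back to an angle
    mean_angle = np.arctan2(np.sin(angles).mean(), np.cos(angles).mean())
    return float(np.rad2deg(mean_angle) % 360)


# 350° and 10° average to due north, not to 180° as the arithmetic mean says
print(circular_mean_deg([350, 10]))

# winddir list for row 1 of the output above
winddir_row1 = [25.9, 46.9, 303.3, 10.4, 29.3, 36.9, 70.2]
print(round(circular_mean_deg(winddir_row1), 1))
```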

In [9]:
weather_info.shape

(30000, 16)

In [10]:
# Concatenate
df_with_weather = pd.concat([df, weather_info], axis=1)

In [11]:
# Check that the DataFrames were concatenated correctly
df_with_weather.head()

|   | index | DATE | FIRE_YEAR | DISCOVERY_DOY | FIRE_SIZE | FIRE_SIZE_CLASS | LATITUDE | LONGITUDE | STATE | tempmax | ... | precip | avg_precip | dew | avg_dew | windspeed | avg_windspeed | winddir | avg_winddir | pressure | avg_pressure |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 46 | 1992-01-01 | 1992 | 1 | 0.1 | A | 43.325 | -101.0185 | SD | [6.7, 6.7, 1.7, 7.2, 8.4, 1.7, 4.4] | ... | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] | 0.0 | [-6.9, -8.7, -2.4, -5.0, -8.6, -7.5, -3.4] | -6.071429 | [13.0, 22.3, 31.7, 20.5, 11.2, 18.7, 11.2] | 18.371429 | [295.9, 179.8, 186.4, 218.7, 263.2, 173.2, 247.3] | 223.5 | [1026.9, 1029.2, 1017.7, 1012.2, 1019.1, 1024.... | 1021.671429 |
| 1 | 0 | 1992-01-01 | 1992 | 1 | 3.0 | B | 33.0634 | -90.120813 | MS | [16.7, 13.4, 10.7, 11.7, 12.2, 10.7, 15.7] | ... | [nan, nan, nan, nan, nan, nan, nan] | NaN | [0.2, 6.4, 7.9, 2.3, 3.2, 2.2, 4.2] | 3.771429 | [14.8, 18.4, 14.8, 14.8, 18.4, 22.3, 11.2] | 16.385714 | [25.9, 46.9, 303.3, 10.4, 29.3, 36.9, 70.2] | 74.7 | [1023.4, 1023.0, 1019.5, 1020.8, 1024.5, 1027.... | 1023.442857 |
| 2 | 36 | 1992-01-01 | 1992 | 1 | 1.0 | B | 33.058333 | -79.979167 | SC | [10.8, 11.2, 16.0, 15.9, 13.9, 12.6, 15.5] | ... | [4.4, 2.9, 22.0, 0.0, 0.0, 0.0, 0.0] | 4.185714 | [5.1, 9.5, 10.2, 9.3, 4.3, 3.1, 5.2] | 6.671429 | [22.4, 24.7, 21.9, 23.9, 19.6, 27.7, 24.4] | 23.514286 | [43.5, 23.8, 35.0, 281.4, 335.4, 37.3, 28.6] | 112.142857 | [1025.3, 1025.6, 1021.5, 1014.8, 1020.6, 1028.... | 1023.357143 |
| 3 | 132 | 1992-01-02 | 1992 | 2 | 0.25 | A | 40.775 | -74.85416 | NJ | [7.8, 8.0, 7.5, 6.4, 2.3, 5.0, 7.7] | ... | [0.0, 0.0, 8.93, 0.83, 0.0, 0.0, 0.0] | 1.394286 | [-4.0, -4.8, 2.2, -3.0, -12.2, -7.9, 0.0] | -4.242857 | [25.0, 12.2, 15.5, 29.2, 14.7, 16.1, 14.4] | 18.157143 | [316.5, 260.2, 4.8, 2.4, 28.0, 259.6, 61.9] | 133.342857 | [1031.2, 1029.6, 1010.7, 1018.4, 1036.6, 1035.... | 1027.4 |
| 4 | 215 | 1992-01-03 | 1992 | 3 | 0.5 | B | 29.79 | -82.37 | FL | [18.4, 18.9, 9.5, 15.1, 13.9, 22.2, 17.2] | ... | [0.0, 0.0, 0.0, nan, nan, nan, nan] | 0.0 | [15.8, 13.5, 5.7, 8.0, 10.8, 14.5, 10.7] | 11.285714 | [20.1, 22.2, 18.4, 29.5, 22.3, 27.7, 27.7] | 23.985714 | [71.4, 295.1, 330.1, 29.3, 39.6, 57.4, 300.6] | 160.5 | [1018.6, 1016.6, 1021.1, 1023.5, 1022.3, 1016.... | 1018.357143 |


In [12]:
df_with_weather.shape

(30000, 25)

In [13]:
# Save the DataFrame as a .pkl
df_with_weather.to_pickle('sample_data/30k_samples_with_weather.pkl')