# Wildfires Caused by the Weather
## Part 2a: Data Cleaning - Weather API

At this point, our data is clean and ready for the weather data to be added. <br>
This section uses the Visual Crossing API to retrieve historical weather data.<br>
The columns we will add to our current data frame are:<br>
• Temperature <br>
• MaxTemperature <br>
• MinTemperature <br>
• WindSpeed <br>
• WindDirection <br>
• Humidity <br>
Afterward, we will clean the data again to ensure that the whole data frame is complete.

#### Preceding Step - import modules (packages)
This step is necessary in order to use external packages. 

In [1]:
import pandas as pd
import numpy as np
import requests
import json

#### Global variables and constants
Here we define the global variables and constants used in this notebook.

In [11]:
BASE_URL = 'https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/weatherdata/history'
API_KEY = ''  # Value removed: a leaked key could incur usage costs
BASE_CSV = 'Wildfire_history_outliers_handling.csv'
FINAL_CSV = 'Wildfire_history_final.csv'
ATTRIBUTES = ['temp', 'maxt', 'mint', 'wspd', 'wdir', 'humidity']
NEW_COLS = ['Temperature', 'MaxTemperature', 'MinTemperature', 'WindSpeed', 'WindDirection', 'Humidity']
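
`ATTRIBUTES` and `NEW_COLS` are paired by position: the first API field fills the first new column, and so on. As a minimal sketch, the pairing can be made explicit and checked with a dictionary. The name `COLUMN_MAP` is hypothetical and is not used elsewhere in this notebook.

```python
# API field names and the data frame columns they fill, as defined above.
ATTRIBUTES = ['temp', 'maxt', 'mint', 'wspd', 'wdir', 'humidity']
NEW_COLS = ['Temperature', 'MaxTemperature', 'MinTemperature',
            'WindSpeed', 'WindDirection', 'Humidity']

# Both lists must have the same length, otherwise the pairing is broken.
assert len(ATTRIBUTES) == len(NEW_COLS)

# COLUMN_MAP is a hypothetical helper name, not part of the notebook.
COLUMN_MAP = dict(zip(ATTRIBUTES, NEW_COLS))

for api_field, column in COLUMN_MAP.items():
    print(f"{api_field:>8} -> {column}")
```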

### Get the data from the weather API
In this section, we loop over the data frame and call the weather API for each row to retrieve its weather data.

In [3]:
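def getJsonReponse(date, latitude, longitude):
    # Return the daily weather record for one date and location.
    # latitude and longitude must already be strings.
    locations = latitude + ',' + longitude
    params = {'startDateTime': date,
              'endDateTime': date,
              'locations': locations,
              'key': API_KEY,
              'aggregateHours': '24',
              'unitGroup': 'uk',
              'contentType': 'json'}
    res = requests.get(BASE_URL, params=params)
    res.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    res_json = res.json()
    # The response is keyed by the same "lat,lon" string sent in the request
    return res_json['locations'][locations]['values'][0]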
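
The function indexes into the response as `['locations'][<"lat,lon">]['values'][0]`. The sketch below shows that lookup on a hand-made payload, so it runs without an API key. The helper name `extract_daily_values` is hypothetical, the payload only mimics the nesting the code above relies on, and the numbers are copied from the first row of the output table further down.

```python
def extract_daily_values(res_json, locations):
    """Return the first daily record for `locations`, or None if absent."""
    try:
        return res_json['locations'][locations]['values'][0]
    except (KeyError, IndexError, TypeError):
        # Missing location, empty values list, or a null section
        return None

# Hand-made payload that mimics the nesting used above (not real API output).
sample = {
    'locations': {
        '45.78496,-104.4958': {
            'values': [{'temp': 25.9, 'maxt': 34.0, 'mint': 16.5,
                        'wspd': 21.9, 'wdir': 206.63, 'humidity': 46.67}]
        }
    }
}

record = extract_daily_values(sample, '45.78496,-104.4958')
missing = extract_daily_values(sample, '0.0,0.0')
print(record['temp'], missing)
```

Note that the key must match the request string exactly. Because the coordinates are built with `str()`, a longitude of `-104.49580` becomes `'-104.4958'`.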

In [4]:
def addWeatherColumns(df):
    new_df = df.copy()
    for col in NEW_COLS:
        new_df[col] = np.nan
    return new_df

In [5]:
def fillWeatherColumns(df, file_name):
    new_df = addWeatherColumns(df)
    chunks = 0  # rows processed since the last backup
    file_num = 1
    backup_file_name = "backup/backup_num"

    for row in new_df.index:
        date = new_df.loc[row, 'FireDiscoveryDateTime']
        latitude = str(new_df.loc[row, 'InitialLatitude'])
        longitude = str(new_df.loc[row, 'InitialLongitude'])
        try:
            # Save a backup every 1000 rows (the backup/ directory must exist)
            if chunks == 1000:
                backup_path = backup_file_name+"_"+str(file_num)+".csv"
                new_df.to_csv(backup_path, index=False)
                chunks = 0
                file_num += 1
            res = getJsonReponse(date, latitude, longitude)
            # .loc writes into new_df itself; chained indexing may write to a copy
            for i in range(len(NEW_COLS)):
                new_df.loc[row, NEW_COLS[i]] = res[ATTRIBUTES[i]]
            chunks += 1
        except Exception:
            # The row keeps its NaN values and is dropped in the cleaning step
            print("Failed to load row num {}".format(row))
            chunks += 1
            continue

    new_df.to_csv(file_name, index=False)
    return new_df
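
The row-by-row fill can be tried offline by passing the fetch function in as an argument. This is a sketch under stated assumptions, not the notebook's method: `fill_rows` and `fake_fetch` are hypothetical names, the attribute list is shortened to two fields, and the stand-in fetcher deliberately fails for one row to show that failed rows keep their NaN values.

```python
import numpy as np
import pandas as pd

ATTRIBUTES = ['temp', 'humidity']        # shortened for the example
NEW_COLS = ['Temperature', 'Humidity']

def fill_rows(df, fetch):
    """Fill NEW_COLS row by row; rows whose fetch fails stay NaN."""
    out = df.copy()
    for col in NEW_COLS:
        out[col] = np.nan
    failed = []
    for row in out.index:
        try:
            res = fetch(out.loc[row, 'FireDiscoveryDateTime'],
                        str(out.loc[row, 'InitialLatitude']),
                        str(out.loc[row, 'InitialLongitude']))
            for attr, col in zip(ATTRIBUTES, NEW_COLS):
                out.loc[row, col] = res[attr]
        except Exception:
            failed.append(row)
    return out, failed

def fake_fetch(date, latitude, longitude):
    # Hypothetical stand-in for the API call: fails for one location.
    if latitude.startswith('48'):
        raise ValueError('no data')
    return {'temp': 25.9, 'humidity': 46.67}

fires = pd.DataFrame({
    'FireDiscoveryDateTime': ['2020-08-06T18:58:00', '2017-10-17T20:20:24'],
    'InitialLatitude': [45.78496, 48.07167],
    'InitialLongitude': [-104.49580, -114.83030],
})

filled, failed = fill_rows(fires, fake_fetch)
print(filled[NEW_COLS])
print('failed rows:', failed)
```

Collecting the failed row labels in a list, rather than only printing them, would make it possible to retry just those rows later.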

### Clean NaN data
In this section, we will remove the rows with missing data.

In [16]:
def handleMissingData(df, file_name):    
    df.dropna(inplace=True)
    df.to_csv(file_name, index=False)
    return df
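
As a small illustration of what `dropna` does here, the sketch below counts the missing values per column before dropping, so the number of lost rows is visible. The three-row frame is made up for the example and reuses a few values from the output table.

```python
import numpy as np
import pandas as pd

df_example = pd.DataFrame({
    'POOCounty': ['Carter', 'Flathead', 'Perry'],
    'Temperature': [25.9, np.nan, 10.3],
    'Humidity': [46.67, 64.20, 78.07],
})

# Count gaps per column before dropping anything.
print(df_example.isna().sum())

# Default how='any': a row is dropped if any of its columns is NaN.
cleaned = df_example.dropna()
print(len(df_example), '->', len(cleaned))
```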

### Handle Outliers
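In this section, we remove outliers from the new weather columns: any value more than 1.5 × IQR below the first quartile or above the third quartile is set to NaN, and the affected rows are then dropped.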

In [19]:
def handleWeatherOutliers(df, file_name):
    for col in NEW_COLS:
        Q1 = np.percentile(df[col], 25)
        Q3 = np.percentile(df[col], 75)
        IQR = Q3 - Q1
        # Mark values outside the 1.5*IQR fences as missing
        outliers = (df[col] < Q1 - 1.5*IQR) | (df[col] > Q3 + 1.5*IQR)
        df.loc[outliers, col] = np.nan

    df.dropna(inplace=True)
    df.to_csv(file_name, index=False)
    return df
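
To make the IQR rule concrete, here is a worked example on a single made-up wind speed series. The values are illustrative: most are taken from the output table, and 60.0 is added as an obvious outlier.

```python
import numpy as np
import pandas as pd

wind = pd.Series([7.0, 11.2, 11.4, 12.1, 13.9, 14.3, 16.1, 60.0])

# Quartiles and interquartile range
q1, q3 = np.percentile(wind, [25, 75])
iqr = q3 - q1

# Fences: anything outside [lower, upper] counts as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = wind[(wind < lower) | (wind > upper)]
print(f"Q1={q1:.2f} Q3={q3:.2f} IQR={iqr:.2f}")
print(f"fences: [{lower:.2f}, {upper:.2f}]")
print(outliers.tolist())
```

Only 60.0 falls outside the fences. The lowest value, 7.0, stays because it is still above the lower fence.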

## Main Program
In this section, we will run the weather API process.

In [None]:
df=pd.read_csv(BASE_CSV)
fillWeatherColumns(df, FINAL_CSV)

In [17]:
df = pd.read_csv(FINAL_CSV)
handleMissingData(df, FINAL_CSV)

Unnamed: 0,UniqueFireIdentifier,FireDiscoveryDateTime,FireOutDateTime,POOCounty,InitialLatitude,InitialLongitude,FireCause,FireDuration,CausedByWeather,Temperature,MaxTemperature,MinTemperature,WindSpeed,WindDirection,Humidity
0,2020-MTLG42-000224,2020-08-06T18:58:00,2020-08-12T14:00:00,Carter,45.78496,-104.49580,2,6,0,25.9,34.0,16.5,21.9,206.63,46.67
1,2017-MTNWS-000878,2017-10-17T20:20:24,2017-11-09T21:59:59,Flathead,48.07167,-114.83030,2,23,0,6.1,12.9,-3.3,13.9,119.55,64.20
2,2020-MSMNF-000308,2020-11-23T19:17:00,2020-11-30T14:29:59,Perry,31.06819,-89.06972,2,7,0,10.3,22.7,0.6,11.2,104.88,78.07
3,2019-UTUWF-000883,2019-10-26T21:29:00,2019-11-13T00:14:59,Utah,40.07631,-111.41820,4,18,0,2.2,6.3,-0.2,23.4,332.67,61.95
4,2020-MTCES-006641,2020-08-27T14:06:38,2020-08-27T20:52:59,Beaverhead,44.65363,-111.56360,1,0,1,11.5,21.8,0.7,7.0,125.22,75.11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55958,2021-AZASF-000906,2021-07-20T00:26:06,2021-08-02T21:00:00,Greenlee,33.28745,-109.36730,1,13,1,30.4,36.7,25.0,16.1,115.71,50.50
55959,2021-COUMA-000801,2021-07-20T02:18:00,2021-07-20T16:47:59,Montezuma,37.19284,-108.61710,1,0,1,24.0,31.3,16.3,11.4,80.65,51.24
55960,2021-AZASF-000908,2021-07-20T02:57:41,2021-07-23T23:05:00,Coconino,34.36512,-110.94670,1,3,1,26.9,33.4,21.1,12.1,180.14,45.00
55961,2021-IDIPF-000526,2021-07-19T21:42:30,2021-07-21T00:59:59,Shoshone,47.22436,-115.60630,1,2,1,17.6,20.7,13.4,6.9,166.67,56.00


In [20]:
handleWeatherOutliers(df, FINAL_CSV)

Unnamed: 0,UniqueFireIdentifier,FireDiscoveryDateTime,FireOutDateTime,POOCounty,InitialLatitude,InitialLongitude,FireCause,FireDuration,CausedByWeather,Temperature,MaxTemperature,MinTemperature,WindSpeed,WindDirection,Humidity
0,2020-MTLG42-000224,2020-08-06T18:58:00,2020-08-12T14:00:00,Carter,45.78496,-104.49580,2,6,0,25.9,34.0,16.5,21.9,206.63,46.67
1,2017-MTNWS-000878,2017-10-17T20:20:24,2017-11-09T21:59:59,Flathead,48.07167,-114.83030,2,23,0,6.1,12.9,-3.3,13.9,119.55,64.20
2,2020-MSMNF-000308,2020-11-23T19:17:00,2020-11-30T14:29:59,Perry,31.06819,-89.06972,2,7,0,10.3,22.7,0.6,11.2,104.88,78.07
4,2020-MTCES-006641,2020-08-27T14:06:38,2020-08-27T20:52:59,Beaverhead,44.65363,-111.56360,1,0,1,11.5,21.8,0.7,7.0,125.22,75.11
5,2015-CAYNP-000028,2015-06-09T03:36:00,2015-06-09T16:57:00,Mariposa,37.63880,-119.69320,2,0,0,17.0,23.2,11.8,14.3,195.67,47.75
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55958,2021-AZASF-000906,2021-07-20T00:26:06,2021-08-02T21:00:00,Greenlee,33.28745,-109.36730,1,13,1,30.4,36.7,25.0,16.1,115.71,50.50
55959,2021-COUMA-000801,2021-07-20T02:18:00,2021-07-20T16:47:59,Montezuma,37.19284,-108.61710,1,0,1,24.0,31.3,16.3,11.4,80.65,51.24
55960,2021-AZASF-000908,2021-07-20T02:57:41,2021-07-23T23:05:00,Coconino,34.36512,-110.94670,1,3,1,26.9,33.4,21.1,12.1,180.14,45.00
55961,2021-IDIPF-000526,2021-07-19T21:42:30,2021-07-21T00:59:59,Shoshone,47.22436,-115.60630,1,2,1,17.6,20.7,13.4,6.9,166.67,56.00
