# Silver Dataset Generation using Wildfire Spatial Data
<font size=3>**Author:** Ashkan Soltanieh<br>
**Date:** Jan. 8, 2022</font>

## Table of Contents

<div class="alert alert-success mt-20">
    <ul>
        <li><a href="#Approach">Approach</a></li>
        <li><a href="#Area of Burn Data">Area of Burn Data</a></li>
        <li><a href="#Characteristics Data">Characteristics Data</a></li>
        <li><a href="#Metadata">Metadata</a></li>
    </ul>
</div>

## Approach
The raw, uncleaned (bronze) dataset is refined further in this notebook. The wildfire data, stored in two separate CSV files, are merged and aggregated so that each row represents an individual fire incident.

The latitude and longitude values are rounded to the nearest quarter of a degree to match the scraped weather data, so rounding the lat and lng columns is part of cleaning the characteristics data.
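
The quarter-degree rounding described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation (that lives in the `wildfire` module, which is not shown in this notebook). The helper name `round_to_quarter` is hypothetical, and rounding ties upward is an assumption.

```python
import math


def round_to_quarter(value: float) -> float:
    """Round a coordinate to the nearest multiple of 0.25 (ties round up)."""
    # Scale so that quarter steps become whole numbers, round half up,
    # then scale back to degrees.
    return math.floor(value * 4 + 0.5) / 4


# Sample coordinates taken from the characteristics table in this notebook.
samples = [(50.388, -127.29), (49.199, -124.104)]
rounded = [(round_to_quarter(lat), round_to_quarter(lon)) for lat, lon in samples]
print(rounded)  # [(50.5, -127.25), (49.25, -124.0)]
```

`math.floor(value * 4 + 0.5)` is used instead of the built-in `round` because `round` sends exact ties to the nearest even number, which would make the result depend on which quarter step the tie falls between.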

## Area of Burn Data
The area of burn (AoB) data are stored as Polygon geometries, and the AoB has been selected as the label. Each Polygon object has an `area` property, which is used to obtain the burned area. No single column uniquely identifies a row of the AoB data, so the area values are aggregated by UID_Fire, Year, and REF_ID (the unique identifier). The unused variables listed below are dropped from the dataset:<br>

**Dropped Columns:**<br>
> Area of Burn(AoB) dataset:
>> **FD_Agency:** Redundant, since all records in the current dataset are from Canada<br>
>> **JD, date_src, Year:** Date-related fields are covered by the characteristics dataset. Only Map_Date is kept.<br>
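
The aggregation step described in this section can be sketched in plain Python as shown below. The rows, identifiers, and area values are hypothetical, the `area` numbers stand in for the `area` property of each Polygon, and summing the areas that share a key is an assumption; the actual logic is in `make_silver_dataframes`, which is not shown here.

```python
from collections import defaultdict

# Hypothetical bronze rows: several polygons can belong to the same fire.
# Each "area" stands in for the `area` property of one Polygon object.
rows = [
    {"UID_Fire": 1, "Year": 2010, "REF_ID": "REF-A", "area": 0.010},
    {"UID_Fire": 1, "Year": 2010, "REF_ID": "REF-A", "area": 0.005},
    {"UID_Fire": 2, "Year": 2010, "REF_ID": "REF-B", "area": 0.250},
]

# Assumption: areas that share the same (UID_Fire, Year, REF_ID) key are summed.
totals = defaultdict(float)
for row in rows:
    key = (row["UID_Fire"], row["Year"], row["REF_ID"])
    totals[key] += row["area"]

for key, area in totals.items():
    print(key, round(area, 6))
```

With a pandas DataFrame, the same idea would typically be expressed with `groupby` on the three key columns followed by a sum over `area`.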

## Characteristics Data
Most of the variables in this dataset will be selected as features. In this notebook, the variables with missing or redundant information are dropped. Below is the list of the dropped variables and the reason behind making this decision:

**Dropped Columns:**<br>
>Characteristics dataset:
>> **FD_Agency:** Redundant, since all records in the current dataset are from Canada<br>
>> **dn:** This variable is missing for observations before 2016. It is dropped for consistency across all observations<br>
>> **HHMM:** The time variable is not used as an index, because the fire data are aggregated daily, like the weather data<br>
>> **sample:** Other identifiers are used instead of this variable<br>
>> **type:** Redundant for the Alberta and British Columbia dataset, as only type zero (presumed vegetation fire) exists in the table<br>
>> **geometry:** The lat/lng columns already hold the EPSG 4326 representation of the point.<br>
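
As a rough illustration of the column drop listed above, the sketch below removes those fields from a single record. It uses plain dictionaries so that it runs without any dependencies; the helper `drop_columns` and the sample record are hypothetical, and the project's real cleaning code is not shown in this notebook.

```python
# Columns listed above as dropped from the characteristics data.
DROPPED = {"FD_Agency", "dn", "HHMM", "sample", "type", "geometry"}


def drop_columns(record: dict, dropped: set) -> dict:
    """Return a copy of `record` without the keys named in `dropped`."""
    return {key: value for key, value in record.items() if key not in dropped}


# Hypothetical bronze record; the values are illustrative only.
record = {
    "Date": "2010-08-16",
    "lat": 50.388,
    "lon": -127.29,
    "HHMM": 1930,
    "dn": None,
    "type": 0,
}
cleaned = drop_columns(record, DROPPED)
print(cleaned)  # {'Date': '2010-08-16', 'lat': 50.388, 'lon': -127.29}
```

On a pandas DataFrame the equivalent operation would be `DataFrame.drop(columns=[...])`.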

In [1]:
import os
import pandas as pd
from shapely.wkt import loads
import sys
from IPython.display import display

In [2]:
#bronze datasets path for wildfire
path_aob = os.path.abspath(os.path.join(os.getcwd(), '../data/processed/wildfire/bronze/bronze_AoB.csv'))
path_characteristics = os.path.abspath(os.path.join(os.getcwd(), '../data/processed/wildfire/bronze/bronze_chracteristics.csv'))

In [3]:
# create the silver dataframe using custom script
sys.path.insert(1, os.path.abspath(os.path.join(os.getcwd(),"..","src/data")))
from wildfire import make_silver_dataframes

df_aob, df_characteristics = make_silver_dataframes(path_aob, path_characteristics)

In [4]:
# save extracted cleaned dataframe into silver datasets
path_aob = os.path.abspath(
    os.path.join(os.getcwd(), "../data/processed/wildfire/silver/silver_AoB.csv"))
path_characteristics = os.path.abspath(
    os.path.join(os.getcwd(), "../data/processed/wildfire/silver/silver_chracteristics.csv"))

df_aob.to_csv(path_aob, index=False)
df_characteristics.to_csv(path_characteristics, index=False)

## Metadata

In [15]:
display(df_aob.head())
display(df_aob.shape)
display(df_characteristics.head())
display(df_characteristics.shape)

| | UID_Fire | REF_ID | Map_Date | area |
|---|---|---|---|---|
| 0 | 193 | BC-2010-C10060 | 2010-06-01 | 0.019264 |
| 1 | 208 | BC-2010-C10258 | 2010-07-31 | 0.248266 |
| 2 | 215 | BC-2010-C10320 | 2010-08-02 | 1.755061 |
| 3 | 222 | BC-2010-C20018 | 2010-07-06 | 1.979787 |
| 4 | 245 | BC-2010-C20293 | 2010-07-31 | 0.158772 |


(440523, 4)

| | Date | sat | lat | lon | T21 | T31 | FRP | conf | UID_Fire | Status | REF_ID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2010-08-16 | T | 50.388 | -127.29 | 307.3 | 293.0 | 9.3 | 71 | 547 | primary | BC-2010-V90985 |
| 1 | 2010-07-21 | A | 49.199 | -124.104 | 317.4 | 286.1 | 110.9 | 95 | 541 | primary | BC-2010-V70506 |
| 2 | 2010-07-21 | A | 49.201 | -124.111 | 305.8 | 285.8 | 32.3 | 66 | 541 | primary | BC-2010-V70506 |
| 3 | 2010-08-14 | T | 48.492 | -124.029 | 301.9 | 290.2 | 5.1 | 45 | 538 | primary | BC-2010-V60945 |
| 4 | 2010-08-14 | T | 48.491 | -124.043 | 357.2 | 294.9 | 84.1 | 100 | 538 | primary | BC-2010-V60945 |


(164240, 11)