# Silver Dataset Generation using Weather Data
<font size=3><strong>Author:</strong> Ashkan Soltanieh<br>
<strong>Date:</strong> Jan. 13, 2022</font>

## Table of Contents

<div class="alert alert-success mt-20">
    <ul>
        <li><a href="#Approach">Approach</a></li>
        <li><a href="#Temperature-Standardization">Temperature Standardization</a></li>
        <li><a href="#Metadata">Metadata</a></li>
    </ul>
</div>

## Approach
The bronze dataset is further refined in this notebook. The weather data are stored as a CSV file containing over 121 million rows and 10 columns of hourly data, already filtered to the locations and dates of the fire records.

Because the data are recorded hourly and our processing resources are limited, I aggregated the hourly data to daily values, taking the mean and standard deviation for each day.

Here is the maximum daily standard deviation of each weather variable, computed from the bronze dataset:

| Variable | Max_Daily_Std |
| --- | --- |
| t2m | 10.886982 |
| wind_speed | 7.205235 |
| cape | 3083.404515 |
| d2m | 11.848188 |
| tp | 0.005628 |
| tcc | 0.509919 |
| cvh | 0.000000 |
| cvl | 0.000000 |
| swvl1 | 0.151427 |

The table shows that most variables have a high maximum daily standard deviation relative to their overall range, while high and low vegetation cover (**cvh**, **cvl**) have a standard deviation of zero. For these two variables we keep only the daily mean; for the rest we keep both the daily mean and the daily standard deviation.
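
The aggregation rule described above can be sketched with pandas named aggregation. This is a minimal illustration, not the code in `src/data/weather.py`: the helper name `aggregate_daily` and the hourly timestamp column name `time` are assumptions, and the grouping keys are taken from the index shown in the Metadata section.

```python
import pandas as pd

# Variables with zero daily standard deviation keep only their mean.
MEAN_ONLY = ["cvh", "cvl"]
# All other variables keep both the daily mean and the daily standard deviation.
MEAN_AND_STD = ["t2m", "wind_speed", "cape", "d2m", "tp", "tcc", "swvl1"]


def aggregate_daily(hourly: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per (latitude, longitude, date)."""
    df = hourly.copy()
    # Assumption: hourly timestamps live in a column named "time".
    df["date"] = pd.to_datetime(df["time"]).dt.date

    # Build a named-aggregation spec: output column -> (input column, function).
    agg_spec = {}
    for var in MEAN_AND_STD:
        agg_spec[f"{var}_mean"] = (var, "mean")
        agg_spec[f"{var}_std"] = (var, "std")
    for var in MEAN_ONLY:
        agg_spec[f"{var}_mean"] = (var, "mean")

    return df.groupby(["latitude", "longitude", "date"]).agg(**agg_spec)
```

With seven mean-and-std variables and two mean-only variables, the result has 16 columns. Note that pandas computes the sample standard deviation by default, so a day with a single hourly record yields `NaN` in the `_std` columns. On a frame produced this way, `daily.filter(like="_std").max()` gives the maximum daily standard deviation per variable.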

## Temperature Standardization
Temperature (t2m) and dew point temperature (d2m) are recorded in kelvin. For easier interpretation, the values are converted to degrees Celsius. The conversion is implemented in the <code>src/data/weather.py</code> module.
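
The unit conversion amounts to subtracting 273.15 from each kelvin value. The snippet below is a sketch of that step only; the helper name `kelvin_to_celsius` is hypothetical and is not taken from `src/data/weather.py`.

```python
import pandas as pd

KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def kelvin_to_celsius(df: pd.DataFrame, columns=("t2m", "d2m")) -> pd.DataFrame:
    """Hypothetical helper: return a copy with the given columns converted from K to °C."""
    out = df.copy()
    for col in columns:
        out[col] = out[col] - KELVIN_OFFSET
    return out
```

Because the conversion is a constant shift, it changes the daily means but leaves the daily standard deviations unchanged.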

In [1]:
import os
import pandas as pd
import sys
from IPython.display import display

In [2]:
path = os.path.abspath(os.path.join(os.getcwd(), '../data/processed/weather/bronze/bronze_weather-synced-with-fires.csv'))

In [3]:
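# Make src/data importable so the project's weather module can be loaded
sys.path.insert(1, os.path.abspath(os.path.join(os.getcwd(), "..", "src/data")))
from weather import make_silver_dataframe

# Build the silver dataframe from the bronze CSV
dfw = make_silver_dataframe(path)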

In [5]:
path_silver = os.path.abspath(
    os.path.join(os.getcwd(), "../data/processed/weather/silver/silver_weather-daily-mean-std.csv")
)

dfw.to_csv(path_silver, index=True)
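
Since the file is written with `index=True`, the three index levels shown in the Metadata section (latitude, longitude, date) become the first three CSV columns. A later notebook could restore them on load along these lines; `load_silver` is a hypothetical helper name, and the sketch assumes the index levels carry exactly those names.

```python
import pandas as pd


def load_silver(path: str) -> pd.DataFrame:
    """Hypothetical helper: read the silver CSV and restore its three-level index."""
    return pd.read_csv(
        path,
        # Assumption: these are the index level names written by to_csv.
        index_col=["latitude", "longitude", "date"],
        parse_dates=["date"],
    )
```

For example, `load_silver(path_silver)` would return a frame indexed the same way as `dfw`, with the `date` level parsed as timestamps.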

## Metadata

In [4]:
display(dfw.head())
display(dfw.shape)

| latitude | longitude | date | t2m_mean | t2m_std | cape_mean | cape_std | d2m_mean | d2m_std | tp_mean | tp_std | tcc_mean | tcc_std | cvh_mean | cvl_mean | swvl1_mean | swvl1_std | wind_speed_mean | wind_speed_std |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 48.5 | -131.5 | 2010-01-12 | 9.042482 | 0.168771 | 11.88623 | 15.840455 | 8.050577 | 0.304868 | 0.000275 | 0.000331 | 0.992729 | 0.013689 | 0.0 | 0.0 | 2.190471e-06 | 8e-06 | 7.198469 | 1.179787 |
| 48.5 | -131.5 | 2010-01-13 | 8.115329 | 0.278717 | 1.458984 | 1.908876 | 5.957505 | 0.844338 | 1e-05 | 1.6e-05 | 0.529885 | 0.371077 | 0.0 | 0.0 | 1.199544e-06 | 6e-06 | 4.140918 | 1.112024 |
| 48.5 | -131.5 | 2010-01-18 | 7.956377 | 0.460117 | 10.562988 | 14.215463 | 5.145894 | 1.463816 | 0.000494 | 0.000792 | 0.905539 | 0.239645 | 0.0 | 0.0 | 4.667789e-06 | 6e-06 | 8.20651 | 3.104146 |
| 48.5 | -131.5 | 2010-01-22 | 7.634823 | 0.581308 | 9.37557 | 10.393382 | 5.235029 | 0.741708 | 0.000455 | 0.000298 | 0.805192 | 0.250606 | 0.0 | 0.0 | -1.277774e-06 | 6e-06 | 9.242948 | 4.102592 |
| 48.5 | -131.5 | 2010-02-08 | 7.284132 | 0.453648 | 5.213623 | 7.223079 | 3.8243 | 1.375603 | 5.4e-05 | 0.000127 | 0.466078 | 0.303536 | 0.0 | 0.0 | -2.868472e-07 | 8e-06 | 5.110262 | 1.573757 |


(5054004, 16)