# Bronze Dataset Generation using Weather Data
<font size=3><strong>Author:</strong> <a href="https://www.linkedin.com/in/~ashkan/" target="_blank">Ashkan Soltanieh</a><br>
<strong>Date:</strong>  Jan. 13, 2022</font>

## Table of Contents

<div class="alert alert-success mt-20">
    <ul>
        <li><a href="#Overview">Overview</a></li>
        <li><a href="#Approach">Approach</a></li>
        <li><a href="#Metadata">Metadata Review</a></li>
    </ul>
</div>

## Overview
The weather data are downloaded from the [Climate Data Store (CDS) API](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=form) for western Canada, covering 2010 to 2019. The script that requests the data from the API is located at <code>src/data/get_weather_data.py</code>.
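
As a rough illustration of what such a request can look like, the sketch below builds the request dictionary for one variable. It is not the content of `get_weather_data.py`: the helper name `build_era5_request`, the request keys (which follow the form-generated CDS request layout and may differ between API versions), and the bounding box are all assumptions made for this example.

```python
def build_era5_request(variable, years, area):
    """Build a request dict for one ERA5 single-level variable (hypothetical helper).

    `area` is [north, west, south, east] in degrees.
    """
    return {
        "product_type": "reanalysis",
        "format": "netcdf",
        "variable": variable,
        "year": [str(y) for y in years],
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": area,
    }


# Illustrative bounding box only; the real study area is defined in the project script.
request = build_era5_request(
    "2m_temperature", range(2010, 2020), [60.0, -131.5, 48.5, -110.0]
)

# With CDS credentials configured, the dict could be submitted like this:
#   import cdsapi
#   cdsapi.Client().retrieve(
#       "reanalysis-era5-single-levels", request, "data-2m_temperature.nc"
#   )
```

In practice a ten-year hourly request is likely to exceed the per-request size limit of the service, so a real script would probably split the download by year or month.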

According to Auburn University's [Weather Elements that Affect Fire Behavior](https://www.auburn.edu/academic/forestry_wildlife/fire/weather_elements.htm), the weather elements that directly affect fire behavior are:

* Temperature
* Wind
* Stability of Atmosphere
* Relative Humidity
* Precipitation
* Cloud Development and Fronts
* Drought

Based on these elements, the following variables are selected for download from the weather API:<br>
<code>['2m_temperature', '10m_v_component_of_wind', '10m_u_component_of_wind', 'convective_available_potential_energy', '2m_dewpoint_temperature', 'total_precipitation', 'total_cloud_cover', 'high_vegetation_cover', 'low_vegetation_cover', 'volumetric_soil_water_layer_1']</code>

## Approach
The dataset selected for this project is [ERA5 hourly data on single levels from 1979 to present](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview). The downloaded data are constrained by geographical location and time for this study.
To reduce memory usage and save disk space, the weather data are kept only for the **locations and dates of the fires** recorded in the wildfire dataset.
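
The filtering step can be pictured as an inner join between the weather table and the distinct (location, date) pairs of the fires. The sketch below is only one way to do it, not the logic inside `make_bronze_dataframe`: the function name, the column names, and the assumption that fire coordinates have already been snapped to the weather grid are all hypothetical.

```python
import pandas as pd


def filter_weather_by_fires(weather, fires):
    """Keep weather rows whose grid cell and calendar date match a fire record."""
    # One row per distinct (cell, day) that had a fire
    keys = fires[["latitude", "longitude", "date"]].drop_duplicates()
    # Derive the calendar date of each hourly weather record
    weather = weather.assign(date=weather["time"].dt.normalize())
    matched = weather.merge(keys, on=["latitude", "longitude", "date"], how="inner")
    return matched.drop(columns="date")


# Tiny made-up example: three hourly records, one fire
weather = pd.DataFrame({
    "latitude": [60.0, 60.0, 59.75],
    "longitude": [-131.5, -131.5, -131.5],
    "time": pd.to_datetime(
        ["2010-01-12 00:00", "2010-01-13 00:00", "2010-01-12 00:00"]
    ),
    "t2m": [250.5, 251.0, 252.0],
})
fires = pd.DataFrame({
    "latitude": [60.0],
    "longitude": [-131.5],
    "date": pd.to_datetime(["2010-01-12"]),
})

filtered = filter_weather_by_fires(weather, fires)
```

Only the record that shares both the grid cell and the day with the fire survives the join, which is what keeps the bronze dataset small relative to the full download.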

Wind speed is calculated from the u and v components of the wind and stored as a variable in the dataset.
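
The wind speed is the magnitude of the horizontal wind vector, that is, the square root of u² + v². A minimal sketch of the calculation follows; the function name and the sample values are illustrative and not taken from the project code.

```python
import numpy as np


def wind_speed(u10, v10):
    """Magnitude of the horizontal 10 m wind from its eastward and northward parts."""
    return np.hypot(u10, v10)


# Made-up component values in metres per second
u10 = np.array([3.0, 0.0, -1.5])
v10 = np.array([4.0, -2.0, 2.0])
speed = wind_speed(u10, v10)
```

`np.hypot` works element-wise, so the same call applies to whole DataFrame columns or xarray variables.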


In [1]:
import os
import sys
import pandas as pd
# Make the modules under src/data importable from the notebook directory
sys.path.insert(1, os.path.abspath(os.path.join(os.getcwd(), "..", "src", "data")))
from weather import make_bronze_dataframe

In [2]:
# ERA5 variables to load; one NetCDF file per variable is expected in data/raw/weather
elements = ['2m_temperature', '10m_v_component_of_wind', '10m_u_component_of_wind', 'convective_available_potential_energy', '2m_dewpoint_temperature', 'total_precipitation', 'total_cloud_cover', 'high_vegetation_cover', 'low_vegetation_cover', 'volumetric_soil_water_layer_1']
paths = [
    os.path.abspath(os.path.join(os.getcwd(), '..', 'data', 'raw', 'weather', f'data-{element}.nc'))
    for element in elements
]

# Build the bronze dataframe from the per-variable files (see Approach)
dfw = make_bronze_dataframe(paths)

In [4]:
path_bronze = os.path.abspath(
        os.path.join(os.getcwd(), "../data/processed/weather/bronze/bronze_weather-synced-with-fires.csv"))

# Keep the index: it holds the latitude, longitude, and time levels
dfw.to_csv(path_bronze, index=True)
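
When the CSV is read back, the three index levels come in as ordinary columns and have to be restored. The sketch below shows one way to do that on a small in-memory sample. The two rows are copied from the data preview in this notebook, and the column list is trimmed for brevity.

```python
import io

import pandas as pd

# Stand-in for the bronze CSV: same header layout, fewer columns and rows
csv_text = """latitude,longitude,time,t2m,wind_speed
60.0,-131.5,2010-01-12 00:00:00,250.46933,3.023788
60.0,-131.5,2010-01-12 01:00:00,250.247498,3.058896
"""

bronze = (
    pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
    .set_index(["latitude", "longitude", "time"])
)
```

For the real file, `io.StringIO(csv_text)` would be replaced by the path stored in `path_bronze`. Given the size of the dataset, reading it in chunks may be preferable.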

## Metadata Review

In [5]:
# Display filtered weather data
display(dfw.head())
display(dfw.tail())
display(dfw.shape)

| latitude | longitude | time | t2m | cape | d2m | tp | tcc | cvh | cvl | swvl1 | wind_speed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 60.0 | -131.5 | 2010-01-12 00:00:00 | 250.46933 | 0.542969 | 247.229904 | 9.2e-05 | 1.0 | 0.809577 | 0.190423 | 0.239689 | 3.023788 |
| 60.0 | -131.5 | 2010-01-12 01:00:00 | 250.247498 | 0.271484 | 246.957611 | 9.1e-05 | 1.0 | 0.809577 | 0.190423 | 0.239689 | 3.058896 |
| 60.0 | -131.5 | 2010-01-12 02:00:00 | 250.044037 | 0.271484 | 246.778137 | 5.6e-05 | 1.0 | 0.809577 | 0.190423 | 0.239689 | 3.064803 |
| 60.0 | -131.5 | 2010-01-12 03:00:00 | 249.805267 | 0.542969 | 246.54248 | 4.3e-05 | 0.999969 | 0.809577 | 0.190423 | 0.239701 | 3.135798 |
| 60.0 | -131.5 | 2010-01-12 04:00:00 | 250.340744 | 2.985352 | 246.969833 | 3.1e-05 | 0.999969 | 0.809577 | 0.190423 | 0.239689 | 3.173725 |


| latitude | longitude | time | t2m | cape | d2m | tp | tcc | cvh | cvl | swvl1 | wind_speed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 48.5 | -110.0 | 2019-11-01 19:00:00 | 277.805969 | 0.0 | 268.718475 | 0.0 | 0.0 | 0.0 | 1.0 | 0.106368 | 5.316845 |
| 48.5 | -110.0 | 2019-11-01 20:00:00 | 280.041168 | 0.0 | 266.26059 | 0.0 | 0.0 | 0.0 | 1.0 | 0.105464 | 5.240007 |
| 48.5 | -110.0 | 2019-11-01 21:00:00 | 281.25766 | 0.0 | 264.493774 | 0.0 | 0.0 | 0.0 | 1.0 | 0.10456 | 5.093779 |
| 48.5 | -110.0 | 2019-11-01 22:00:00 | 277.661865 | 0.0 | 266.165344 | 0.0 | 0.0 | 0.0 | 1.0 | 0.101882 | 4.861839 |
| 48.5 | -110.0 | 2019-11-01 23:00:00 | 277.77771 | 0.0 | 267.121399 | 0.0 | 0.003174 | 0.0 | 1.0 | 0.101264 | 3.683271 |


(121296096, 9)

Descriptions of the selected elements, retrieved from [ERA5 hourly data on single levels from 1979 to present](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview):
> 2m Temperature:
>> This parameter is the temperature of air at 2m above the surface of land, sea or inland waters. 2m temperature is calculated by interpolating between the lowest model level and the Earth's surface, taking account of the atmospheric conditions. This parameter has units of kelvin (K). Temperature measured in kelvin can be converted to degrees Celsius (°C) by subtracting 273.15.<br>
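
The conversion mentioned above is a single subtraction. The short sketch below applies it to the first `t2m` value shown in the data preview; the function name is illustrative.

```python
def kelvin_to_celsius(temp_k):
    """Convert a temperature from kelvin to degrees Celsius."""
    return temp_k - 273.15


# First t2m value from the preview, in kelvin
t2m_c = kelvin_to_celsius(250.46933)  # roughly -22.68 degrees Celsius
```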

> 10m v-component of Wind: **(Used for calculating wind_speed variable and then dropped)**
>> This parameter is the northward component of the 10m wind. It is the horizontal speed of air moving towards the north, at a height of ten metres above the surface of the Earth, in metres per second. Care should be taken when comparing this parameter with observations, because wind observations vary on small space and time scales and are affected by the local terrain, vegetation and buildings that are represented only on average in the ECMWF Integrated Forecasting System (IFS). This parameter can be combined with the U component of 10m wind to give the speed and direction of the horizontal 10m wind.

> 10m u-component of Wind: **(Used for calculating wind_speed variable and then dropped)**
>> This parameter is the eastward component of the 10m wind. It is the horizontal speed of air moving towards the east, at a height of ten metres above the surface of the Earth, in metres per second. Care should be taken when comparing this parameter with observations, because wind observations vary on small space and time scales and are affected by the local terrain, vegetation and buildings that are represented only on average in the ECMWF Integrated Forecasting System (IFS). This parameter can be combined with the V component of 10m wind to give the speed and direction of the horizontal 10m wind.
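
Besides speed, the two components also give the wind direction. The sketch below uses the meteorological convention (the direction the wind blows from, in degrees clockwise from north). The notebook itself only stores the speed, so this function is an illustrative addition and its name is an assumption.

```python
import math


def wind_direction_deg(u10, v10):
    """Direction the wind blows FROM, in degrees clockwise from north."""
    return (180.0 + math.degrees(math.atan2(u10, v10))) % 360.0


# Air moving towards the east (u > 0, v = 0) is a westerly wind
westerly = wind_direction_deg(5.0, 0.0)
# Air moving towards the south (u = 0, v < 0) is a northerly wind
northerly = wind_direction_deg(0.0, -5.0)
```

Note the argument order `atan2(u, v)`: the angle is measured from the north axis rather than from the east axis.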

> Convective Available Potential Energy:
>> This is an indication of the instability (or stability) of the atmosphere and can be used to assess the potential for the development of convection, which can lead to heavy rainfall, thunderstorms and other severe weather. In the ECMWF Integrated Forecasting System (IFS), CAPE is calculated by considering parcels of air departing at different model levels below the 350 hPa level. If a parcel of air is more buoyant (warmer and/or with more moisture) than its surrounding environment, it will continue to rise (cooling as it rises) until it reaches a point where it no longer has positive buoyancy. CAPE is the potential energy represented by the total excess buoyancy. The maximum CAPE produced by the different parcels is the value retained. Large positive values of CAPE indicate that an air parcel would be much warmer than its surrounding environment and therefore, very buoyant. CAPE is related to the maximum potential vertical velocity of air within an updraft; thus, higher values indicate greater potential for severe weather. Observed values in thunderstorm environments often may exceed 1000 joules per kilogram (J kg-1), and in extreme cases may exceed 5000 J kg-1. The calculation of this parameter assumes: (i) the parcel of air does not mix with surrounding air; (ii) ascent is pseudo-adiabatic (all condensed water falls out) and (iii) other simplifications related to the mixed-phase condensational heating.

> 2m Dewpoint Temperature:
>> This parameter is the temperature to which the air, at 2 metres above the surface of the Earth, would have to be cooled for saturation to occur. It is a measure of the humidity of the air. Combined with temperature and pressure, it can be used to calculate the relative humidity. 2m dew point temperature is calculated by interpolating between the lowest model level and the Earth's surface, taking account of the atmospheric conditions. This parameter has units of kelvin (K). Temperature measured in kelvin can be converted to degrees Celsius (°C) by subtracting 273.15.
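
A common shortcut for estimating relative humidity from `t2m` and `d2m` is the Magnus approximation of saturation vapour pressure, which does not require pressure. The sketch below uses one widely quoted set of Magnus coefficients over liquid water. It is an approximation swapped in for illustration, not a method used in this notebook, and it becomes less accurate well below freezing.

```python
import math


def saturation_vapour_pressure_hpa(temp_k):
    """Magnus approximation over liquid water; input in kelvin, output in hPa."""
    t_c = temp_k - 273.15
    return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))


def relative_humidity(t2m_k, d2m_k):
    """Relative humidity in percent from 2 m temperature and 2 m dew point."""
    actual = saturation_vapour_pressure_hpa(d2m_k)
    saturated = saturation_vapour_pressure_hpa(t2m_k)
    return 100.0 * actual / saturated


# First t2m and d2m values from the data preview, both in kelvin
rh = relative_humidity(250.46933, 247.229904)
```

When the dew point equals the air temperature, the function returns 100, which is a quick sanity check on the implementation.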

> Total Precipitation:
>> This parameter is the accumulated liquid and frozen water, comprising rain and snow, that falls to the Earth's surface. It is the sum of large-scale precipitation and convective precipitation. Large-scale precipitation is generated by the cloud scheme in the ECMWF Integrated Forecasting System (IFS). The cloud scheme represents the formation and dissipation of clouds and large-scale precipitation due to changes in atmospheric quantities (such as pressure, temperature and moisture) predicted directly by the IFS at spatial scales of the grid box or larger. Convective precipitation is generated by the convection scheme in the IFS, which represents convection at spatial scales smaller than the grid box. This parameter does not include fog, dew or the precipitation that evaporates in the atmosphere before it lands at the surface of the Earth. This parameter is accumulated over a particular time period which depends on the data extracted. For the reanalysis, the accumulation period is over the 1 hour ending at the validity date and time. For the ensemble members, ensemble mean and ensemble spread, the accumulation period is over the 3 hours ending at the validity date and time. The units of this parameter are depth in metres of water equivalent. It is the depth the water would have if it were spread evenly over the grid box. Care should be taken when comparing model parameters with observations, because observations are often local to a particular point in space and time, rather than representing averages over a model grid box.
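
Because each reanalysis value is the accumulation over the preceding hour, in metres, hourly values can be summed to get a total for a longer period and multiplied by 1000 to express it in millimetres. The sketch below does this for the five `tp` values shown in the first preview table; the function name is illustrative.

```python
def total_precipitation_mm(hourly_tp_m):
    """Sum hourly accumulations given in metres and return the total in millimetres."""
    return sum(hourly_tp_m) * 1000.0


# The five hourly tp values from the first preview table, in metres
hourly_tp = [9.2e-05, 9.1e-05, 5.6e-05, 4.3e-05, 3.1e-05]
total_mm = total_precipitation_mm(hourly_tp)  # about 0.313 mm over five hours
```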

> Total Cloud Cover:
>> This parameter is the proportion of a grid box covered by cloud. Total cloud cover is a single level field calculated from the cloud occurring at different model levels through the atmosphere. Assumptions are made about the degree of overlap/randomness between clouds at different heights. Cloud fractions vary from 0 to 1.

> High Vegetation Cover:
>> This parameter is the fraction of the grid box that is covered with vegetation that is classified as "high". The values vary between 0 and 1 but do not vary in time. This is one of the parameters in the model that describes land surface vegetation. "High vegetation" consists of evergreen trees, deciduous trees, mixed forest/woodland, and interrupted forest.

> Low Vegetation Cover:
>> This parameter is the fraction of the grid box that is covered with vegetation that is classified as "low". The values vary between 0 and 1 but do not vary in time. This is one of the parameters in the model that describes land surface vegetation. "Low vegetation" consists of crops and mixed farming, irrigated crops, short grass, tall grass, tundra, semidesert, bogs and marshes, evergreen shrubs, deciduous shrubs, and water and land mixtures.

> Volumetric Soil Water Layer 1:
>> This parameter is the volume of water in soil layer 1 (0 - 7cm, the surface is at 0cm). The ECMWF Integrated Forecasting System (IFS) has a four-layer representation of soil: Layer 1: 0 - 7cm, Layer 2: 7 - 28cm, Layer 3: 28 - 100cm, Layer 4: 100 - 289cm. Soil water is defined over the whole globe, even over ocean. Regions with a water surface can be masked out by only considering grid points where the land-sea mask has a value greater than 0.5. The volumetric soil water is associated with the soil texture (or classification), soil depth, and the underlying groundwater level.
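
The masking rule described above can be sketched as follows. The land-sea mask is not among the variables downloaded for this notebook, so the `lsm` values here are hypothetical, as are the function name and the third `swvl1` value.

```python
import numpy as np


def mask_water(swvl1, lsm, threshold=0.5):
    """Set soil-water values to NaN where the land-sea mask is not above the threshold."""
    return np.where(lsm > threshold, swvl1, np.nan)


# First two values come from the data preview; the third is made up
swvl1 = np.array([0.239689, 0.106368, 0.30])
# Hypothetical land-sea mask values for the same three grid points
lsm = np.array([1.0, 0.8, 0.1])

masked = mask_water(swvl1, lsm)
```

Using NaN rather than dropping the points keeps the array shape intact, which is convenient when the result is written back to a gridded dataset.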

## <h3 align="center"> Copyright © 2022 - All rights reserved by the author.</h3>