#### The Objective

The ultimate aim of the challenge is to **predict the area of wildfires in 7 regions in Australia for February 2021** using historical data, before the fires have happened!

There are three submissions:
- 1) Predict wildfires in February 2020.
- 2) Predict wildfires in the 3rd and 4th weeks of January 2021.
- 3) Predict wildfires in February 2021.

#### 1.4 Historical Weather Forecasts

This file contains the same variables as the weather_data, but these are predicted forecasts rather than observations. An extra column, `Lead time`, gives the number of days the forecast is valid for. The example below explains this:
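As a minimal sketch of the layout, assuming the column names shown later in this notebook: the same `Date`, `Region` and `Parameter` combination can appear once per lead time (5, 10 and 15 in the rows printed further down). The frame `toy` and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only; the numbers are not from the dataset.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2020-10-31"] * 3),
    "Region": ["WA"] * 3,
    "Parameter": ["Temperature"] * 3,
    "Lead time": [5, 10, 15],
    "mean()": [26.7, 27.0, 27.4],
})

# One row per lead time for the same Date, Region and Parameter.
rows_per_key = toy.groupby(["Date", "Region", "Parameter"]).size()
lead_times = sorted(toy["Lead time"].unique().tolist())
print(rows_per_key)
print(lead_times)
```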

#### Steps:
[1. Load Packages](#LoadPackages) 

[2. Descriptive Stats](#DescriptiveStats) 

[3. Evaluating for Missing Values (no missing values)](#MissingValues) 

[4. Checking for Duplicates (no duplicates)](#Duplicates) 

[5. Rearranging Table via Pivot](#PivotTable) 

[6. Evaluate Re-Arranged Parameter Columns for Missing and Duplicates](#RearrangedTable) 

[7. Weather Forecast Data Review](#DataReview) 

[8. Save out Pre-Processed "C&P_Forecasts" CSV File](#PreprocessedWeather) 

#### Load Packages <a class="anchor" id="LoadPackages"></a>

In [1]:
# Import the necessary packages for analysis and visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt, mpld3
%matplotlib inline
import json
import datetime

from shapely.geometry import Polygon, mapping
import geopandas as gpd
import folium
from folium.plugins import TimeSliderChoropleth
import seaborn as sns
import plotly.express as px

sns.set_style("whitegrid")

import warnings
warnings.filterwarnings("ignore")

In [2]:
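# Raw string so the backslashes in the Windows path are not read as escape sequences
hwforecasts = r"P:\Wildfires_Australia\cfc_wildfireforecastforAustralia\COPY\H_WeatherForecasts.csv"
print("Reading file: '{}'".format(hwforecasts))
# Date is the first column (index 0); it is converted to datetime explicitly further down
hwforecasts_df = pd.read_csv(hwforecasts)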

print("Loaded...")

hwforecasts_df.head()

Reading file: 'P:\Wildfires_Australia\cfc_wildfireforecastforAustralia\COPY\H_WeatherForecasts.csv'
Loaded...


Unnamed: 0,Date,Region,Parameter,Lead time,count()[unit: km^2],min(),max(),mean(),variance()
0,2014-01-01,NSW,RelativeHumidity,5,803768.2,7.482927,85.021118,28.223569,353.620815
1,2014-01-01,NSW,SolarRadiation,5,803768.2,24.865765,33.557598,31.647308,2.276068
2,2014-01-01,NSW,Temperature,5,803768.2,21.243755,36.929035,30.893523,17.918553
3,2014-01-01,NSW,WindSpeed,5,803768.2,1.593531,6.989559,3.958822,1.334834
4,2014-01-01,NT,RelativeHumidity,5,1349817.0,14.796251,73.601479,39.799856,189.805002


#### Notes:

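* The columns are renamed to more descriptive names.
* The `Date` column is converted to datetime so that it matches the other datasets.
* No duplicate rows were found. Seven outlier precipitation records are dropped later in the notebook.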

In [3]:
# Rename column names
hwforecasts_df.columns = ['Date', 'Region', 'Parameter', 'Lead time', 'area', 'min_forcast', 'max_forcast', 'mean_forcast', '2nd_moment_forcast']
hwforecasts_df.head()

Unnamed: 0,Date,Region,Parameter,Lead time,area,min_forcast,max_forcast,mean_forcast,2nd_moment_forcast
0,2014-01-01,NSW,RelativeHumidity,5,803768.2,7.482927,85.021118,28.223569,353.620815
1,2014-01-01,NSW,SolarRadiation,5,803768.2,24.865765,33.557598,31.647308,2.276068
2,2014-01-01,NSW,Temperature,5,803768.2,21.243755,36.929035,30.893523,17.918553
3,2014-01-01,NSW,WindSpeed,5,803768.2,1.593531,6.989559,3.958822,1.334834
4,2014-01-01,NT,RelativeHumidity,5,1349817.0,14.796251,73.601479,39.799856,189.805002
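The cell above renames by position, which relies on the column order staying fixed. A minimal sketch of the same rename with an explicit old-to-new mapping follows; the one-row frame `raw` is a stand-in built from the first values shown above, and `rename_map` is a name introduced here for illustration.

```python
import pandas as pd

# Stand-in frame with the raw column names shown in the first output.
raw = pd.DataFrame({
    "Date": ["2014-01-01"],
    "Region": ["NSW"],
    "Parameter": ["Temperature"],
    "Lead time": [5],
    "count()[unit: km^2]": [803768.2],
    "min()": [21.243755],
    "max()": [36.929035],
    "mean()": [30.893523],
    "variance()": [17.918553],
})

# Explicit old -> new mapping; columns not listed keep their original label.
rename_map = {
    "count()[unit: km^2]": "area",
    "min()": "min_forcast",
    "max()": "max_forcast",
    "mean()": "mean_forcast",
    "variance()": "2nd_moment_forcast",
}
renamed = raw.rename(columns=rename_map)
print(list(renamed.columns))
```

With a mapping, a reordered or extended source file would leave unmatched columns untouched instead of silently mislabeling them.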


#### Descriptive Stats <a class="anchor" id="DescriptiveStats"></a>

In [4]:
hwforecasts_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 217212 entries, 0 to 217211
Data columns (total 9 columns):
 #   Column              Non-Null Count   Dtype  
---  ------              --------------   -----  
 0   Date                217212 non-null  object 
 1   Region              217212 non-null  object 
 2   Parameter           217212 non-null  object 
 3   Lead time           217212 non-null  int64  
 4   area                217212 non-null  float64
 5   min_forcast         217212 non-null  float64
 6   max_forcast         217212 non-null  float64
 7   mean_forcast        217212 non-null  float64
 8   2nd_moment_forcast  217212 non-null  float64
dtypes: float64(5), int64(1), object(3)
memory usage: 14.9+ MB


In [5]:
hwforecasts_df.shape

(217212, 9)

In [6]:
# Convert Date to datetime for consistency across all datasets
hwforecasts_df['Date'] = pd.to_datetime(hwforecasts_df['Date'])
hwforecasts_df['Date'].dtype.name

'datetime64[ns]'

#### Evaluating for Missing Values <a class="anchor" id="MissingValues"></a>

In [7]:
hwforecasts_df.isna().sum()

Date                  0
Region                0
Parameter             0
Lead time             0
area                  0
min_forcast           0
max_forcast           0
mean_forcast          0
2nd_moment_forcast    0
dtype: int64

#### Checking for Duplicates <a class="anchor" id="Duplicates"></a>

In [8]:
hwforecasts_df.duplicated().sum()

0
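`duplicated()` with no arguments only flags rows that match in every column. A stricter check, sketched here on made-up data, looks for repeated keys instead. Treating `Date`, `Region`, `Parameter` and `Lead time` as the key is an assumption about the data, not something the source states.

```python
import pandas as pd

# Made-up rows: the first two share the same key but differ in value.
toy = pd.DataFrame({
    "Date": ["2014-01-01", "2014-01-01", "2014-01-01"],
    "Region": ["NSW", "NSW", "NT"],
    "Parameter": ["Temperature", "Temperature", "Temperature"],
    "Lead time": [5, 5, 5],
    "mean_forcast": [30.9, 31.2, 33.4],
})

key_cols = ["Date", "Region", "Parameter", "Lead time"]  # assumed key
full_row_dupes = int(toy.duplicated().sum())              # identical rows only
key_dupes = int(toy.duplicated(subset=key_cols).sum())    # repeated keys
print(full_row_dupes, key_dupes)
```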

In [9]:
hwforecasts_df.dtypes

Date                  datetime64[ns]
Region                        object
Parameter                     object
Lead time                      int64
area                         float64
min_forcast                  float64
max_forcast                  float64
mean_forcast                 float64
2nd_moment_forcast           float64
dtype: object

In [10]:
hwforecasts_df.describe().transpose()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Lead time,217212.0,9.660976,4.135758,5.0,5.0,10.0,15.0,15.0
area,217212.0,1101102.0,799723.923013,65671.416905,230045.695238,979710.252773,1736319.0,2542548.0
min_forcast,217212.0,10.97838,14.678897,-5.01816,0.805676,5.66342,15.82764,254.0969
max_forcast,217212.0,33.13941,55.526747,0.0,9.633714,23.250812,37.972,10001.39
mean_forcast,217212.0,20.20531,25.018023,0.0,3.962613,15.144843,29.22472,5079.335
2nd_moment_forcast,217212.0,367.2121,77323.513214,0.0,1.842921,6.909953,29.77374,23730570.0
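In the summary above, the maximum of `max_forcast` (about 10,001) is far above its 75th percentile (about 38). A minimal sketch of tracing such an extreme back to its row with `idxmax`, using made-up values in a frame called `toy`:

```python
import pandas as pd

# Made-up values; only the shape of the check matters here.
toy = pd.DataFrame({
    "Parameter": ["Temperature", "Precipitation", "WindSpeed"],
    "Lead time": [5, 15, 10],
    "max_forcast": [36.9, 10001.4, 7.0],
})

# idxmax returns the index label of the largest value; loc fetches that row.
extreme_row = toy.loc[toy["max_forcast"].idxmax()]
print(extreme_row["Parameter"], extreme_row["Lead time"])
```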


In [11]:
print("Number of records: {}".format(len(hwforecasts_df)))
print("Number of regions: {}\n".format(len(hwforecasts_df['Region'].unique())))
print(hwforecasts_df['Region'].unique())
print(hwforecasts_df['Parameter'].unique())

Number of records: 217212
Number of regions: 7

['NSW' 'NT' 'QL' 'SA' 'TA' 'VI' 'WA']
['RelativeHumidity' 'SolarRadiation' 'Temperature' 'WindSpeed'
 'Precipitation']


#### Re-arranging Table via Pivot Function <a class="anchor" id="PivotTable"></a>

In [12]:
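# Rearrange the Parameter values into columns. Lead time is not in the index here,
# so pivot_table averages across lead times (its default aggfunc is the mean).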
df_pivot = hwforecasts_df.pivot_table(values=[ 'min_forcast', 'max_forcast', 'mean_forcast', '2nd_moment_forcast'], index=['Date','Region', 'area'], columns=['Parameter'])
df_pivot

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,2nd_moment_forcast,2nd_moment_forcast,2nd_moment_forcast,2nd_moment_forcast,2nd_moment_forcast,max_forcast,max_forcast,max_forcast,max_forcast,max_forcast,mean_forcast,mean_forcast,mean_forcast,mean_forcast,mean_forcast,min_forcast,min_forcast,min_forcast,min_forcast,min_forcast
Unnamed: 0_level_1,Unnamed: 1_level_1,Parameter,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed
Date,Region,area,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
2014-01-01,NSW,8.037682e+05,,353.620815,2.276068,17.918553,1.334834,,85.021118,33.557598,36.929035,6.989559,,28.223569,31.647308,30.893523,3.958822,,7.482927,24.865765,21.243755,1.593531
2014-01-01,NT,1.349817e+06,,189.805002,6.573279,6.605778,4.871655,,73.601479,32.766205,39.907539,12.156700,,39.799856,28.016282,33.379110,4.740253,,14.796251,18.851019,25.955570,1.192912
2014-01-01,QL,1.736319e+06,,470.624907,7.079362,22.150075,1.352935,,83.466888,33.518051,40.503181,7.026765,,40.504877,30.337689,32.331884,4.178836,,7.028183,17.929157,22.155766,1.559428
2014-01-01,SA,9.797103e+05,,275.718715,30.069684,15.097683,10.752086,,75.290993,33.439438,38.834274,15.038714,,26.038432,27.126219,34.074006,8.630796,,6.606842,11.718054,20.889954,2.831450
2014-01-01,TA,6.567142e+04,,12.570180,9.588782,2.292068,1.000159,,92.093201,22.283730,20.020470,8.109127,,85.672655,17.549968,14.569532,4.831787,,68.435989,12.340322,11.401472,3.079223
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-10-31,QL,1.736319e+06,14.986532,380.939464,19.213265,11.243581,1.586587,29.090143,90.172732,38.616156,32.907190,7.578119,1.118654,41.996622,33.750577,26.856760,4.035015,0.000000,13.588029,18.501085,16.184907,0.679350
2020-10-31,SA,9.797103e+05,0.592746,193.923947,29.354453,10.887814,1.288522,2.898387,83.288211,38.391802,28.583809,8.654681,0.180554,38.729378,33.814225,22.343187,4.209881,0.000000,19.076419,19.093437,13.297380,1.290624
2020-10-31,TA,6.567142e+04,0.640148,8.139720,16.065145,1.197304,0.950032,2.406794,83.861537,34.327941,13.944504,7.087545,0.478931,77.457429,29.091940,11.475015,2.323866,0.000000,71.230115,20.606527,9.069527,0.789637
2020-10-31,VI,2.300457e+05,22.698446,128.169341,30.996366,7.111611,1.155491,26.487278,94.456088,35.922787,21.131717,5.890976,1.902180,73.666598,28.344835,14.950023,2.982499,0.003576,46.044953,14.276971,9.318666,0.719432


In [13]:
hwforecasts_df.loc[(hwforecasts_df['Date'] == '2017-10-06') & (hwforecasts_df['Lead time'] == 15) &
                ((hwforecasts_df['Parameter'] == 'Precipitation')), :]

Unnamed: 0,Date,Region,Parameter,Lead time,area,min_forcast,max_forcast,mean_forcast,2nd_moment_forcast
100641,2017-10-06,NSW,Precipitation,15,772526.1,250.288422,8210.618164,316.54793,290270.2
100656,2017-10-06,NT,Precipitation,15,915251.0,252.940247,10001.387695,1671.411758,10770770.0
100671,2017-10-06,QL,Precipitation,15,459084.0,254.096924,10000.420898,5079.335413,23730570.0
100686,2017-10-06,SA,Precipitation,15,757387.8,250.923462,10000.200195,1975.757703,13195890.0
100701,2017-10-06,TA,Precipitation,15,65671.42,249.333389,255.05513,250.576076,0.9660292
100716,2017-10-06,VI,Precipitation,15,230045.7,250.512894,258.626038,254.752662,3.58046
100731,2017-10-06,WA,Precipitation,15,1037480.0,252.199997,10000.639648,3748.751125,21103310.0


The Precipitation forecasts for 2017-10-06 with a Lead time of 15 look implausible (minimums near 250 and maximums near 10,000 in several regions), so these seven records are removed.

In [14]:
hwforecasts_df.drop(hwforecasts_df.index[(hwforecasts_df['Date'] == '2017-10-06') &
                                    (hwforecasts_df['Lead time'] == 15) &
                                    ((hwforecasts_df['Parameter'] == 'Precipitation'))], inplace=True)
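The same removal can be sketched with a named boolean mask, which makes it easy to count the affected rows before dropping them. The frame `toy` is made up, and the names `mask`, `cleaned` and `removed` are introduced here for illustration.

```python
import pandas as pd

# Made-up rows: only the first one matches all three conditions.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2017-10-06", "2017-10-06", "2017-10-06", "2017-10-07"]),
    "Region": ["NSW", "NSW", "NSW", "NSW"],
    "Parameter": ["Precipitation", "Precipitation", "Temperature", "Precipitation"],
    "Lead time": [15, 10, 15, 15],
})

mask = (
    (toy["Date"] == pd.Timestamp("2017-10-06"))
    & (toy["Lead time"] == 15)
    & (toy["Parameter"] == "Precipitation")
)
removed = int(mask.sum())                       # rows that will be dropped
cleaned = toy.loc[~mask].reset_index(drop=True)  # keep everything else
print(removed, len(cleaned))
```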

In [15]:
num_rows, num_cols = hwforecasts_df.shape
print("There are total {} records in the following {} columns:\n".format(num_rows, num_cols))
print("\n".join(list(hwforecasts_df.columns)))
# 7 outlier records were removed above (the previous record count was 217212)

There are total 217205 records in the following 9 columns:

Date
Region
Parameter
Lead time
area
min_forcast
max_forcast
mean_forcast
2nd_moment_forcast


Now rearrange the data so that each Parameter value becomes a set of columns holding its 'min', 'max', 'mean' and '2nd_moment' values, while keeping 'Date', 'Region', 'Lead time' and 'area' as the row keys.

In [16]:
# Rearrange the Parameter values into columns
df_pivot = hwforecasts_df.pivot_table(values=['min_forcast', 'max_forcast', 'mean_forcast', '2nd_moment_forcast'], index=['Date', 'Region', 'Lead time', 'area'], columns=['Parameter'])
df_pivot

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,2nd_moment_forcast,2nd_moment_forcast,2nd_moment_forcast,2nd_moment_forcast,2nd_moment_forcast,max_forcast,max_forcast,max_forcast,max_forcast,max_forcast,mean_forcast,mean_forcast,mean_forcast,mean_forcast,mean_forcast,min_forcast,min_forcast,min_forcast,min_forcast,min_forcast
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Parameter,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed
Date,Region,Lead time,area,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2,Unnamed: 23_level_2
2014-01-01,NSW,5,8.037682e+05,,353.620815,2.276068,17.918553,1.334834,,85.021118,33.557598,36.929035,6.989559,,28.223569,31.647308,30.893523,3.958822,,7.482927,24.865765,21.243755,1.593531
2014-01-01,NT,5,1.349817e+06,,189.805002,6.573279,6.605778,4.871655,,73.601479,32.766205,39.907539,12.156700,,39.799856,28.016282,33.379110,4.740253,,14.796251,18.851019,25.955570,1.192912
2014-01-01,QL,5,1.736319e+06,,470.624907,7.079362,22.150075,1.352935,,83.466888,33.518051,40.503181,7.026765,,40.504877,30.337689,32.331884,4.178836,,7.028183,17.929157,22.155766,1.559428
2014-01-01,SA,5,9.797103e+05,,275.718715,30.069684,15.097683,10.752086,,75.290993,33.439438,38.834274,15.038714,,26.038432,27.126219,34.074006,8.630796,,6.606842,11.718054,20.889954,2.831450
2014-01-01,TA,5,6.567142e+04,,12.570180,9.588782,2.292068,1.000159,,92.093201,22.283730,20.020470,8.109127,,85.672655,17.549968,14.569532,4.831787,,68.435989,12.340322,11.401472,3.079223
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-10-31,VI,10,2.300457e+05,0.226294,71.399972,15.359641,7.669216,1.141684,3.754553,91.215370,38.469955,22.857519,6.038062,0.276201,75.933565,33.088057,15.726723,2.701240,0.0,47.217869,20.501947,10.892990,0.609430
2020-10-31,VI,15,2.300457e+05,1.064420,231.553974,25.315183,9.185775,0.332452,5.357795,93.267899,39.743999,24.375984,4.464130,0.819959,65.747567,31.603136,16.975974,1.919621,0.0,28.085405,19.440273,9.519750,0.374749
2020-10-31,WA,5,2.542548e+06,1.479139,200.508811,8.996331,14.749877,2.626650,9.332361,81.588310,30.319111,33.862797,9.138333,0.590085,34.693232,27.266754,26.657820,5.245600,0.0,16.358620,10.524326,15.775144,1.311029
2020-10-31,WA,10,2.542548e+06,0.146198,214.113833,2.844737,14.610838,2.232168,2.732273,79.339310,42.067039,34.801270,10.549594,0.170651,35.824960,39.744260,26.999880,5.451390,0.0,14.909077,24.397556,16.062984,0.365395
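A small long-to-wide example may help show what the pivot does and where the NaN entries come from. This is a sketch on made-up rows (`long_df`), not the notebook's data: a key that has no row for some Parameter gets NaN in that Parameter's column.

```python
import pandas as pd

# Made-up long-format rows; NT has no WindSpeed row.
long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-01"] * 3),
    "Region": ["NSW", "NSW", "NT"],
    "Lead time": [5, 5, 5],
    "Parameter": ["Temperature", "WindSpeed", "Temperature"],
    "mean_forcast": [30.0, 4.0, 33.0],
})

wide = long_df.pivot_table(
    values=["mean_forcast"],
    index=["Date", "Region", "Lead time"],
    columns=["Parameter"],
)
# Two key rows, two parameter columns, one missing cell (NT / WindSpeed).
missing_cells = int(wide.isna().to_numpy().sum())
print(wide)
```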


In [17]:
# Reset dataframe index
df_pivot.reset_index(inplace=True)
df_pivot.head()

Unnamed: 0_level_0,Date,Region,Lead time,area,2nd_moment_forcast,2nd_moment_forcast,2nd_moment_forcast,2nd_moment_forcast,2nd_moment_forcast,max_forcast,...,mean_forcast,mean_forcast,mean_forcast,mean_forcast,mean_forcast,min_forcast,min_forcast,min_forcast,min_forcast,min_forcast
Parameter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed,Precipitation,...,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed,Precipitation,RelativeHumidity,SolarRadiation,Temperature,WindSpeed
0,2014-01-01,NSW,5,803768.2,,353.620815,2.276068,17.918553,1.334834,,...,,28.223569,31.647308,30.893523,3.958822,,7.482927,24.865765,21.243755,1.593531
1,2014-01-01,NT,5,1349817.0,,189.805002,6.573279,6.605778,4.871655,,...,,39.799856,28.016282,33.37911,4.740253,,14.796251,18.851019,25.95557,1.192912
2,2014-01-01,QL,5,1736319.0,,470.624907,7.079362,22.150075,1.352935,,...,,40.504877,30.337689,32.331884,4.178836,,7.028183,17.929157,22.155766,1.559428
3,2014-01-01,SA,5,979710.3,,275.718715,30.069684,15.097683,10.752086,,...,,26.038432,27.126219,34.074006,8.630796,,6.606842,11.718054,20.889954,2.83145
4,2014-01-01,TA,5,65671.42,,12.57018,9.588782,2.292068,1.000159,,...,,85.672655,17.549968,14.569532,4.831787,,68.435989,12.340322,11.401472,3.079223


In [18]:
# Flatten the two-level column names into '<Parameter>_<statistic>'
df_pivot.columns = [col[0] if not(col[1]) else '{1}_{0}'.format(*col) for col in df_pivot.columns.values]
df_pivot.head()

Unnamed: 0,Date,Region,Lead time,area,Precipitation_2nd_moment_forcast,RelativeHumidity_2nd_moment_forcast,SolarRadiation_2nd_moment_forcast,Temperature_2nd_moment_forcast,WindSpeed_2nd_moment_forcast,Precipitation_max_forcast,...,Precipitation_mean_forcast,RelativeHumidity_mean_forcast,SolarRadiation_mean_forcast,Temperature_mean_forcast,WindSpeed_mean_forcast,Precipitation_min_forcast,RelativeHumidity_min_forcast,SolarRadiation_min_forcast,Temperature_min_forcast,WindSpeed_min_forcast
0,2014-01-01,NSW,5,803768.2,,353.620815,2.276068,17.918553,1.334834,,...,,28.223569,31.647308,30.893523,3.958822,,7.482927,24.865765,21.243755,1.593531
1,2014-01-01,NT,5,1349817.0,,189.805002,6.573279,6.605778,4.871655,,...,,39.799856,28.016282,33.37911,4.740253,,14.796251,18.851019,25.95557,1.192912
2,2014-01-01,QL,5,1736319.0,,470.624907,7.079362,22.150075,1.352935,,...,,40.504877,30.337689,32.331884,4.178836,,7.028183,17.929157,22.155766,1.559428
3,2014-01-01,SA,5,979710.3,,275.718715,30.069684,15.097683,10.752086,,...,,26.038432,27.126219,34.074006,8.630796,,6.606842,11.718054,20.889954,2.83145
4,2014-01-01,TA,5,65671.42,,12.57018,9.588782,2.292068,1.000159,,...,,85.672655,17.549968,14.569532,4.831787,,68.435989,12.340322,11.401472,3.079223
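The flattening step can be seen in isolation on a tiny frame. In this sketch (made-up values, hypothetical frame `toy`), key columns carry an empty second level after `reset_index`, so they keep their name, while the value columns become `<Parameter>_<statistic>`.

```python
import pandas as pd

# Two-level columns as they look after reset_index on the pivoted table.
cols = pd.MultiIndex.from_tuples([
    ("Date", ""),
    ("Region", ""),
    ("mean_forcast", "Temperature"),
    ("mean_forcast", "WindSpeed"),
])
toy = pd.DataFrame([["2014-01-01", "NSW", 30.0, 4.0]], columns=cols)

# Keep the top label when the second level is empty, otherwise join the levels.
flat = [top if not sub else "{}_{}".format(sub, top) for top, sub in toy.columns.values]
toy.columns = flat
print(flat)
```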


In [19]:
# Keep the four key columns first, then the parameter columns in alphabetical order
params = df_pivot.columns.tolist()[4:]
params.sort()
forecasts_data = df_pivot[df_pivot.columns.tolist()[:4] + params].copy()
forecasts_data.head()

Unnamed: 0,Date,Region,Lead time,area,Precipitation_2nd_moment_forcast,Precipitation_max_forcast,Precipitation_mean_forcast,Precipitation_min_forcast,RelativeHumidity_2nd_moment_forcast,RelativeHumidity_max_forcast,...,SolarRadiation_mean_forcast,SolarRadiation_min_forcast,Temperature_2nd_moment_forcast,Temperature_max_forcast,Temperature_mean_forcast,Temperature_min_forcast,WindSpeed_2nd_moment_forcast,WindSpeed_max_forcast,WindSpeed_mean_forcast,WindSpeed_min_forcast
0,2014-01-01,NSW,5,803768.2,,,,,353.620815,85.021118,...,31.647308,24.865765,17.918553,36.929035,30.893523,21.243755,1.334834,6.989559,3.958822,1.593531
1,2014-01-01,NT,5,1349817.0,,,,,189.805002,73.601479,...,28.016282,18.851019,6.605778,39.907539,33.37911,25.95557,4.871655,12.1567,4.740253,1.192912
2,2014-01-01,QL,5,1736319.0,,,,,470.624907,83.466888,...,30.337689,17.929157,22.150075,40.503181,32.331884,22.155766,1.352935,7.026765,4.178836,1.559428
3,2014-01-01,SA,5,979710.3,,,,,275.718715,75.290993,...,27.126219,11.718054,15.097683,38.834274,34.074006,20.889954,10.752086,15.038714,8.630796,2.83145
4,2014-01-01,TA,5,65671.42,,,,,12.57018,92.093201,...,17.549968,12.340322,2.292068,20.02047,14.569532,11.401472,1.000159,8.109127,4.831787,3.079223


In [20]:
num_rows, num_cols = forecasts_data.shape
print("There are total {} records in the following {} columns:\n".format(num_rows, num_cols))
print("\n".join(list(forecasts_data.columns)))

There are total 44620 records in the following 24 columns:

Date
Region
Lead time
area
Precipitation_2nd_moment_forcast
Precipitation_max_forcast
Precipitation_mean_forcast
Precipitation_min_forcast
RelativeHumidity_2nd_moment_forcast
RelativeHumidity_max_forcast
RelativeHumidity_mean_forcast
RelativeHumidity_min_forcast
SolarRadiation_2nd_moment_forcast
SolarRadiation_max_forcast
SolarRadiation_mean_forcast
SolarRadiation_min_forcast
Temperature_2nd_moment_forcast
Temperature_max_forcast
Temperature_mean_forcast
Temperature_min_forcast
WindSpeed_2nd_moment_forcast
WindSpeed_max_forcast
WindSpeed_mean_forcast
WindSpeed_min_forcast


#### Evaluate Re-Arranged Parameter Columns for Missing and Duplicates <a class="anchor" id="RearrangedTable"></a>

In [21]:
forecasts_data.isna().sum()

Date                                      0
Region                                    0
Lead time                                 0
area                                      0
Precipitation_2nd_moment_forcast       4692
Precipitation_max_forcast              4692
Precipitation_mean_forcast             4692
Precipitation_min_forcast              4692
RelativeHumidity_2nd_moment_forcast     279
RelativeHumidity_max_forcast            279
RelativeHumidity_mean_forcast           279
RelativeHumidity_min_forcast            279
SolarRadiation_2nd_moment_forcast       328
SolarRadiation_max_forcast              328
SolarRadiation_mean_forcast             328
SolarRadiation_min_forcast              328
Temperature_2nd_moment_forcast          275
Temperature_max_forcast                 275
Temperature_mean_forcast                275
Temperature_min_forcast                 275
WindSpeed_2nd_moment_forcast            321
WindSpeed_max_forcast                   321
WindSpeed_mean_forcast          

Cross-checking the NULL values in the rearranged data, taking the RelativeHumidity_mean_forcast column as an example.

In [22]:
forecasts_data.loc[forecasts_data['RelativeHumidity_mean_forcast'].isna(), :]

Unnamed: 0,Date,Region,Lead time,area,Precipitation_2nd_moment_forcast,Precipitation_max_forcast,Precipitation_mean_forcast,Precipitation_min_forcast,RelativeHumidity_2nd_moment_forcast,RelativeHumidity_max_forcast,...,SolarRadiation_mean_forcast,SolarRadiation_min_forcast,Temperature_2nd_moment_forcast,Temperature_max_forcast,Temperature_mean_forcast,Temperature_min_forcast,WindSpeed_2nd_moment_forcast,WindSpeed_max_forcast,WindSpeed_mean_forcast,WindSpeed_min_forcast
560,2014-03-23,NSW,5,8.037682e+05,,,,,,,...,19.355496,7.199873,13.017521,28.743320,21.209778,11.813959,3.114025,7.779472,3.270816,0.703011
561,2014-03-23,NT,5,1.349817e+06,,,,,,,...,24.110669,17.686712,3.612621,33.679714,30.377331,25.807884,1.862877,7.517807,5.187270,2.139800
562,2014-03-23,QL,5,1.736319e+06,,,,,,,...,24.263057,9.167148,11.560332,33.592918,27.291569,17.680374,1.508545,7.050274,3.630924,0.893402
563,2014-03-23,SA,5,9.797103e+05,,,,,,,...,21.799047,10.947695,21.358818,32.686142,22.979985,14.914130,1.418101,7.940142,6.061781,1.961252
564,2014-03-23,TA,5,6.567142e+04,,,,,,,...,9.425926,2.684387,5.340168,16.724670,10.503687,6.034809,2.712482,11.102831,4.651797,2.021668
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
43809,2020-09-23,QL,15,1.736319e+06,0.013461,1.299296,0.027631,0.0,,,...,,,,,,,,,,
43812,2020-09-23,SA,15,9.797103e+05,0.000949,0.578502,0.003454,0.0,,,...,,,,,,,,,,
43815,2020-09-23,TA,15,6.567142e+04,0.232065,1.476720,0.434436,0.0,,,...,,,,,,,,,,
43818,2020-09-23,VI,15,2.300457e+05,0.009802,0.551880,0.075300,0.0,,,...,,,,,,,,,,


RelativeHumidity_mean_forcast has NULL values. Pick the date 2014-03-23 and verify it against the original data.

In [23]:
hwforecasts_df.loc[hwforecasts_df['Date'] == '2014-03-23', :]

Unnamed: 0,Date,Region,Parameter,Lead time,area,min_forcast,max_forcast,mean_forcast,2nd_moment_forcast
2240,2014-03-23,NSW,SolarRadiation,5,803768.2,7.199873,24.159389,19.355496,14.676007
2241,2014-03-23,NSW,Temperature,5,803768.2,11.813959,28.74332,21.209778,13.017521
2242,2014-03-23,NSW,WindSpeed,5,803768.2,0.703011,7.779472,3.270816,3.114025
2243,2014-03-23,NT,SolarRadiation,5,1349817.0,17.686712,26.653238,24.110669,3.156213
2244,2014-03-23,NT,Temperature,5,1349817.0,25.807884,33.679714,30.377331,3.612621
2245,2014-03-23,NT,WindSpeed,5,1349817.0,2.1398,7.517807,5.18727,1.862877
2246,2014-03-23,QL,SolarRadiation,5,1736319.0,9.167148,26.346916,24.263057,3.060256
2247,2014-03-23,QL,Temperature,5,1736319.0,17.680374,33.592918,27.291569,11.560332
2248,2014-03-23,QL,WindSpeed,5,1736319.0,0.893402,7.050274,3.630924,1.508545
2249,2014-03-23,SA,SolarRadiation,5,979710.3,10.947695,24.474842,21.799047,6.550752
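A set difference is a compact way to confirm which parameters have no rows for a given date. The sketch below uses a stand-in frame (`long_df`) holding only the three NSW rows listed above; `expected` is the set of five parameter names printed earlier in the notebook.

```python
import pandas as pd

# Stand-in for the long-format data: the three NSW rows shown for 2014-03-23.
long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-03-23"] * 3),
    "Region": ["NSW"] * 3,
    "Parameter": ["SolarRadiation", "Temperature", "WindSpeed"],
})

expected = {"Precipitation", "RelativeHumidity", "SolarRadiation", "Temperature", "WindSpeed"}
on_date = long_df["Date"] == pd.Timestamp("2014-03-23")
present = set(long_df.loc[on_date, "Parameter"])

# Parameters with no forecast rows on that date.
missing = sorted(expected - present)
print(missing)
```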


We can see that there are no forecasts for RelativeHumidity or Precipitation on 2014-03-23 in any region. These values are therefore NULL in the rearranged table because the original data has no records for them. Next, fill all NULL values in the data with zeros.

In [24]:
forecasts_data = forecasts_data.fillna(0).copy()
forecasts_data.head()

Unnamed: 0,Date,Region,Lead time,area,Precipitation_2nd_moment_forcast,Precipitation_max_forcast,Precipitation_mean_forcast,Precipitation_min_forcast,RelativeHumidity_2nd_moment_forcast,RelativeHumidity_max_forcast,...,SolarRadiation_mean_forcast,SolarRadiation_min_forcast,Temperature_2nd_moment_forcast,Temperature_max_forcast,Temperature_mean_forcast,Temperature_min_forcast,WindSpeed_2nd_moment_forcast,WindSpeed_max_forcast,WindSpeed_mean_forcast,WindSpeed_min_forcast
0,2014-01-01,NSW,5,803768.2,0.0,0.0,0.0,0.0,353.620815,85.021118,...,31.647308,24.865765,17.918553,36.929035,30.893523,21.243755,1.334834,6.989559,3.958822,1.593531
1,2014-01-01,NT,5,1349817.0,0.0,0.0,0.0,0.0,189.805002,73.601479,...,28.016282,18.851019,6.605778,39.907539,33.37911,25.95557,4.871655,12.1567,4.740253,1.192912
2,2014-01-01,QL,5,1736319.0,0.0,0.0,0.0,0.0,470.624907,83.466888,...,30.337689,17.929157,22.150075,40.503181,32.331884,22.155766,1.352935,7.026765,4.178836,1.559428
3,2014-01-01,SA,5,979710.3,0.0,0.0,0.0,0.0,275.718715,75.290993,...,27.126219,11.718054,15.097683,38.834274,34.074006,20.889954,10.752086,15.038714,8.630796,2.83145
4,2014-01-01,TA,5,65671.42,0.0,0.0,0.0,0.0,12.57018,92.093201,...,17.549968,12.340322,2.292068,20.02047,14.569532,11.401472,1.000159,8.109127,4.831787,3.079223


In [25]:
forecasts_data.isna().sum()

Date                                   0
Region                                 0
Lead time                              0
area                                   0
Precipitation_2nd_moment_forcast       0
Precipitation_max_forcast              0
Precipitation_mean_forcast             0
Precipitation_min_forcast              0
RelativeHumidity_2nd_moment_forcast    0
RelativeHumidity_max_forcast           0
RelativeHumidity_mean_forcast          0
RelativeHumidity_min_forcast           0
SolarRadiation_2nd_moment_forcast      0
SolarRadiation_max_forcast             0
SolarRadiation_mean_forcast            0
SolarRadiation_min_forcast             0
Temperature_2nd_moment_forcast         0
Temperature_max_forcast                0
Temperature_mean_forcast               0
Temperature_min_forcast                0
WindSpeed_2nd_moment_forcast           0
WindSpeed_max_forcast                  0
WindSpeed_mean_forcast                 0
WindSpeed_min_forcast                  0
dtype: int64

#### Weather Forecast Data Review <a class="anchor" id="DataReview"></a>

In [26]:
forecasts_data.dtypes

Date                                   datetime64[ns]
Region                                         object
Lead time                                       int64
area                                          float64
Precipitation_2nd_moment_forcast              float64
Precipitation_max_forcast                     float64
Precipitation_mean_forcast                    float64
Precipitation_min_forcast                     float64
RelativeHumidity_2nd_moment_forcast           float64
RelativeHumidity_max_forcast                  float64
RelativeHumidity_mean_forcast                 float64
RelativeHumidity_min_forcast                  float64
SolarRadiation_2nd_moment_forcast             float64
SolarRadiation_max_forcast                    float64
SolarRadiation_mean_forcast                   float64
SolarRadiation_min_forcast                    float64
Temperature_2nd_moment_forcast                float64
Temperature_max_forcast                       float64
Temperature_mean_forcast    

In [27]:
# Frequencies for the Region column
forecasts_data.pivot_table(index= ['Region'], aggfunc='size')

Region
NSW    6373
NT     6376
QL     6376
SA     6373
TA     6373
VI     6373
WA     6376
dtype: int64
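Per-region counts like these can also be obtained with `groupby(...).size()` or `value_counts()`, which should give the same numbers as the `pivot_table(..., aggfunc='size')` call above. A minimal sketch on a made-up `Region` column:

```python
import pandas as pd

# Made-up region labels for illustration.
toy = pd.DataFrame({"Region": ["NSW", "NT", "NSW", "WA", "NT", "NSW"]})

by_groupby = toy.groupby("Region").size()                  # rows per region
by_value_counts = toy["Region"].value_counts().sort_index()  # same counts, sorted by label
print(by_groupby)
```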

#### Saving Out the Final forecasts_data CSV File <a class="anchor" id="PreprocessedWeather"></a>

In [28]:
final_file = "C&P_Forecasts.csv"
print("Saving file: '{}'".format(final_file))
forecasts_data.to_csv(final_file, index=False, encoding='utf-8')
print("File Saved...")

Saving file: 'C&P_Forecasts.csv'
File Saved...


In [29]:
# Check the exported DataFrame (raw string keeps the Windows backslashes literal)
df = pd.read_csv(r"P:\Wildfires_Australia\cfc_wildfireforecastforAustralia\COPY\C&P_Forecasts.csv")
df['Date'] = pd.to_datetime(df['Date'])

In [30]:
df.head()

Unnamed: 0,Date,Region,Lead time,area,Precipitation_2nd_moment_forcast,Precipitation_max_forcast,Precipitation_mean_forcast,Precipitation_min_forcast,RelativeHumidity_2nd_moment_forcast,RelativeHumidity_max_forcast,...,SolarRadiation_mean_forcast,SolarRadiation_min_forcast,Temperature_2nd_moment_forcast,Temperature_max_forcast,Temperature_mean_forcast,Temperature_min_forcast,WindSpeed_2nd_moment_forcast,WindSpeed_max_forcast,WindSpeed_mean_forcast,WindSpeed_min_forcast
0,2014-01-01,NSW,5,803768.2,0.0,0.0,0.0,0.0,353.620815,85.021118,...,31.647308,24.865765,17.918553,36.929035,30.893523,21.243755,1.334834,6.989559,3.958822,1.593531
1,2014-01-01,NT,5,1349817.0,0.0,0.0,0.0,0.0,189.805002,73.601479,...,28.016282,18.851019,6.605778,39.907539,33.37911,25.95557,4.871655,12.1567,4.740253,1.192912
2,2014-01-01,QL,5,1736319.0,0.0,0.0,0.0,0.0,470.624907,83.466888,...,30.337689,17.929157,22.150075,40.503181,32.331884,22.155766,1.352935,7.026765,4.178836,1.559428
3,2014-01-01,SA,5,979710.3,0.0,0.0,0.0,0.0,275.718715,75.290993,...,27.126219,11.718054,15.097683,38.834274,34.074006,20.889954,10.752086,15.038714,8.630796,2.83145
4,2014-01-01,TA,5,65671.42,0.0,0.0,0.0,0.0,12.57018,92.093201,...,17.549968,12.340322,2.292068,20.02047,14.569532,11.401472,1.000159,8.109127,4.831787,3.079223


In [31]:
df.shape

(44620, 24)
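Beyond comparing shapes, a round-trip check can confirm that the saved file matches the frame in memory. This is a sketch under stated assumptions: it writes a small made-up frame to a temporary directory (the file name `roundtrip_example.csv` is hypothetical) rather than touching the notebook's real output file.

```python
import os
import tempfile

import pandas as pd

# Small made-up frame standing in for forecasts_data.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-01", "2014-01-01"]),
    "Region": ["NSW", "NT"],
    "Lead time": [5, 5],
    "Temperature_mean_forcast": [30.893523, 33.37911],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "roundtrip_example.csv")
    toy.to_csv(path, index=False, encoding="utf-8")
    reloaded = pd.read_csv(path, parse_dates=["Date"])  # read while the file still exists

same_shape = reloaded.shape == toy.shape
same_columns = list(reloaded.columns) == list(toy.columns)
same_dates = bool((reloaded["Date"] == toy["Date"]).all())
same_regions = reloaded["Region"].tolist() == toy["Region"].tolist()
print(same_shape, same_columns, same_dates, same_regions)
```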

In [32]:
df.isna().sum()

Date                                   0
Region                                 0
Lead time                              0
area                                   0
Precipitation_2nd_moment_forcast       0
Precipitation_max_forcast              0
Precipitation_mean_forcast             0
Precipitation_min_forcast              0
RelativeHumidity_2nd_moment_forcast    0
RelativeHumidity_max_forcast           0
RelativeHumidity_mean_forcast          0
RelativeHumidity_min_forcast           0
SolarRadiation_2nd_moment_forcast      0
SolarRadiation_max_forcast             0
SolarRadiation_mean_forcast            0
SolarRadiation_min_forcast             0
Temperature_2nd_moment_forcast         0
Temperature_max_forcast                0
Temperature_mean_forcast               0
Temperature_min_forcast                0
WindSpeed_2nd_moment_forcast           0
WindSpeed_max_forcast                  0
WindSpeed_mean_forcast                 0
WindSpeed_min_forcast                  0
dtype: int64