# Morocco Wildfire Predictions: 2010-2022 ML Dataset
## Comprehensive ML-Ready Tabular Data for Wildfire Analysis
About this Dataset
This dataset, titled "Morocco Wildfire Predictions: 2010-2022 ML Dataset", offers an extensive collection of data tailored for machine learning applications aiming to predict wildfire occurrences in Morocco from 2010 to 2022. It integrates diverse data sources, including meteorological conditions from the Global Surface Summary of the Day (GSOD) dataset provided by NOAA, wildfire instances from NASA's Fire Information for Resource Management System (FIRMS), human population density figures adjusted from the UN WPP, and environmental factors such as soil moisture and vegetation greenness (NDVI) from NASA GES DISC and ORNL DAAC, respectively.  
  
The dataset has been meticulously prepared to facilitate the development of predictive models that can aid in wildfire management and mitigation efforts. Key features include standardized column names, interpolated missing values, and enriched data points for a comprehensive analysis. Additionally, the dataset is balanced to include equal instances of wildfire and non-wildfire occurrences, and features a variety of indicators such as temperature, precipitation, wind speed, humidity, NDVI, soil moisture levels, and population density, alongside temporal and spatial data.  
  
This dataset was inspired by the increasing need for advanced predictive models that can provide accurate and timely predictions of wildfire occurrences, leveraging machine learning techniques. It is intended for researchers, data scientists, and policymakers who are focused on environmental protection, disaster management, and the development of AI-driven solutions for predicting and mitigating wildfire risks.

This dataset was obtained through Kaggle.
# Dataset Link: https://www.kaggle.com/code/ayoubjadouli/ml-wildfire-morocco

# Objective: Data Down-Sampling
Data down-sampling is the process of reducing the number of rows (or samples) in a dataset. This is typically done by selecting a subset of the data according to some criteria, or by aggregating groups of rows into summary rows, which is the approach taken in this notebook.  
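The two strategies can be contrasted on a tiny, made-up table. This is a minimal sketch: the column names mirror a few real ones, but the values are invented, and the 50% sampling fraction is an arbitrary illustrative choice. Stratified sampling keeps the class ratio by sampling inside each `is_fire` group, while aggregation collapses rows that share a date and class.

```python
import pandas as pd

# Toy stand-in for the wildfire table (invented values, real column names).
toy = pd.DataFrame({
    "acq_date": pd.to_datetime(["2013-01-31"] * 4 + ["2013-02-01"] * 4),
    "latitude": [31.0, 31.2, 33.8, 33.9, 31.5, 31.6, 34.0, 34.2],
    "is_fire": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
})

# Option 1: subset selection. Keep 50% of the rows within each class,
# so the fire / non-fire ratio is preserved (stratified sampling).
sampled = toy.groupby("is_fire").sample(frac=0.5, random_state=0)

# Option 2: aggregation. Collapse all rows sharing a date and class
# into one row holding the mean of every other column.
aggregated = toy.groupby(["acq_date", "is_fire"]).mean().reset_index()

print(len(toy), len(sampled), len(aggregated))  # 8 4 4
```

Sampling keeps real, individual observations; aggregation keeps information from every row but replaces it with per-group averages.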

This dataset contains 934,586 samples, close to a million. We therefore reduce the number of samples to make exploratory data analysis and data visualization easier. A smaller dataset also lowers memory use and computation time, making more efficient use of our resources.

In [22]:
# Import the pandas (Python Data Analysis) library.
import pandas as pd

In [2]:
# Read the Parquet file (the path points to the author's local machine).
df = pd.read_parquet(r"D:\AISSMS IOIT - AI&DS (628299510)\General\Hackathons\Prasunethon\dataset.parquet")
df.head()

Unnamed: 0,acq_date,latitude,longitude,is_holiday,day_of_week,day_of_year,is_weekend,NDVI,SoilMoisture,sea_distance,...,wind_gust_quarterly_mean,dew_point_quarterly_mean,average_temperature_yearly_mean,maximum_temperature_yearly_mean,minimum_temperature_yearly_mean,precipitation_yearly_mean,snow_depth_yearly_mean,wind_gust_yearly_mean,dew_point_yearly_mean,is_fire
0,2015-05-28,31.390602,-4.254445,0.0,3.0,148.0,0.0,1139.0,7.0,464731.9375,...,882.085571,27.641111,71.703011,82.031784,58.668766,2.21326,999.900024,857.071777,32.760273,1.0
1,2017-12-05,33.832943,-5.188356,0.0,1.0,339.0,0.0,3223.0,31.0,186799.984375,...,936.817383,52.452175,65.621719,80.128555,53.295628,2.256421,999.900024,864.04834,47.819126,1.0
2,2021-11-19,35.385689,-5.684218,0.0,4.0,323.0,0.0,4987.0,30.0,44937.300781,...,884.289124,65.556519,66.108742,74.353828,58.815575,0.050738,999.900024,806.456848,56.298634,1.0
3,2014-04-19,30.122351,-7.498038,0.0,5.0,109.0,1.0,991.0,12.5,231336.125,...,839.094421,21.176111,69.008766,82.65918,54.666027,0.005205,999.900024,771.132629,26.421097,0.0
4,2014-04-11,30.221554,-9.154314,0.0,4.0,101.0,0.0,2171.0,18.0,51333.945312,...,945.704468,46.40889,66.195618,79.167122,54.768494,0.032082,999.900024,951.709595,52.842464,1.0


In [3]:
df.tail()

Unnamed: 0,acq_date,latitude,longitude,is_holiday,day_of_week,day_of_year,is_weekend,NDVI,SoilMoisture,sea_distance,...,wind_gust_quarterly_mean,dew_point_quarterly_mean,average_temperature_yearly_mean,maximum_temperature_yearly_mean,minimum_temperature_yearly_mean,precipitation_yearly_mean,snow_depth_yearly_mean,wind_gust_yearly_mean,dew_point_yearly_mean,is_fire
934581,2013-05-04,35.243923,-6.036503,0.0,5.0,124.0,1.0,5212.0,56.0,16203.352539,...,999.900024,50.740002,65.564072,72.516533,56.366531,0.05209,999.900024,999.900024,55.318714,1.0
934582,2019-07-12,27.80945,-10.815974,0.0,4.0,193.0,0.0,931.0,12.0,98430.929688,...,925.520874,55.751648,66.890686,73.265617,59.702328,0.283699,999.900024,981.435059,54.126575,0.0
934583,2014-01-04,31.45772,-8.303036,0.0,5.0,4.0,1.0,1263.0,21.5,144579.828125,...,946.797852,49.169567,69.292877,83.434517,56.006577,0.018247,999.900024,807.17041,49.37315,0.0
934584,2018-09-06,32.085369,-6.435242,0.0,3.0,249.0,0.0,3889.0,13.0,252267.078125,...,989.197815,52.209888,73.755753,83.011505,84.859451,5.478904,999.900024,960.210693,43.917946,0.0
934585,2020-09-24,35.171848,-5.836794,0.0,3.0,268.0,0.0,1393.0,28.5,40485.554688,...,47.949451,57.345055,65.277397,70.919182,56.512054,0.321589,999.900024,475.121796,109.153427,1.0


In [30]:
print("Range of latitude:", df["latitude"].min(), "-", df["latitude"].max())

Range of latitude: 27.074949264526367 - 35.91529083251953


In [31]:
print("Range of longitude:", df["longitude"].min(), "-", df["longitude"].max())

Range of longitude: -12.044235229492188 - -1.6565592288970947


As we can see, the ranges of latitude and longitude are consistent with the geographic extent of Morocco.  
This is a useful sanity check on the coordinates, although it does not by itself prove the dataset's authenticity.
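The same sanity check can be made explicit by counting rows that fall outside a bounding box. This is a sketch only: the bounds below are loose, illustrative assumptions and not an official border definition, and the sample rows are invented (two of them reuse the extremes printed above).

```python
import pandas as pd

# Approximate bounding box, assumed for illustration only.
LAT_MIN, LAT_MAX = 27.0, 36.0
LON_MIN, LON_MAX = -13.5, -1.0

def outside_bbox(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose coordinates fall outside the assumed box."""
    inside = (
        frame["latitude"].between(LAT_MIN, LAT_MAX)
        & frame["longitude"].between(LON_MIN, LON_MAX)
    )
    return frame[~inside]

sample = pd.DataFrame({
    "latitude": [27.07, 35.92, 40.0],
    "longitude": [-12.04, -1.66, -5.0],
})
print(len(outside_bbox(sample)))  # 1: only the (40.0, -5.0) row
```

On the real data, `outside_bbox(df)` returning an empty frame would confirm that every row lies inside the chosen box.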

# These are the columns of our dataset.

acq_date   
latitude  
longitude  
is_holiday  
day_of_week  
day_of_year  
is_weekend  
NDVI  
SoilMoisture  
sea_distance  
station_lat  
station_lon  
average_temperature_lag_1  
average_temperature_lag_2  
average_temperature_lag_3  
average_temperature_lag_4  
average_temperature_lag_5  
average_temperature_lag_6  
average_temperature_lag_7  
average_temperature_lag_8  
average_temperature_lag_9  
average_temperature_lag_10  
average_temperature_lag_11  
average_temperature_lag_12  
average_temperature_lag_13  
average_temperature_lag_14  
average_temperature_lag_15  
maximum_temperature_lag_1  
maximum_temperature_lag_2  
maximum_temperature_lag_3  
maximum_temperature_lag_4  
maximum_temperature_lag_5  
maximum_temperature_lag_6  
maximum_temperature_lag_7  
maximum_temperature_lag_8  
maximum_temperature_lag_9  
maximum_temperature_lag_10  
maximum_temperature_lag_11  
maximum_temperature_lag_12  
maximum_temperature_lag_13  
maximum_temperature_lag_14  
maximum_temperature_lag_15  
minimum_temperature_lag_1  
minimum_temperature_lag_2  
minimum_temperature_lag_3  
minimum_temperature_lag_4  
minimum_temperature_lag_5  
minimum_temperature_lag_6  
minimum_temperature_lag_7  
minimum_temperature_lag_8  
minimum_temperature_lag_9  
minimum_temperature_lag_10  
minimum_temperature_lag_11  
minimum_temperature_lag_12  
minimum_temperature_lag_13  
minimum_temperature_lag_14  
minimum_temperature_lag_15  
precipitation_lag_1  
precipitation_lag_2  
precipitation_lag_3  
precipitation_lag_4  
precipitation_lag_5  
precipitation_lag_6  
precipitation_lag_7  
precipitation_lag_8  
precipitation_lag_9  
precipitation_lag_10  
precipitation_lag_11  
precipitation_lag_12  
precipitation_lag_13  
precipitation_lag_14  
precipitation_lag_15  
snow_depth_lag_1  
snow_depth_lag_2  
snow_depth_lag_3  
snow_depth_lag_4  
snow_depth_lag_5  
snow_depth_lag_6  
snow_depth_lag_7  
snow_depth_lag_8  
snow_depth_lag_9  
snow_depth_lag_10  
snow_depth_lag_11  
snow_depth_lag_12  
snow_depth_lag_13  
snow_depth_lag_14  
snow_depth_lag_15  
wind_speed_lag_1  
wind_speed_lag_2  
wind_speed_lag_3  
wind_speed_lag_4  
wind_speed_lag_5  
wind_speed_lag_6  
wind_speed_lag_7  
wind_speed_lag_8  
wind_speed_lag_9  
wind_speed_lag_10  
wind_speed_lag_11  
wind_speed_lag_12  
wind_speed_lag_13  
wind_speed_lag_14  
wind_speed_lag_15  
maximum_sustained_wind_speed_lag_1  
maximum_sustained_wind_speed_lag_2  
maximum_sustained_wind_speed_lag_3  
maximum_sustained_wind_speed_lag_4  
maximum_sustained_wind_speed_lag_5  
maximum_sustained_wind_speed_lag_6  
maximum_sustained_wind_speed_lag_7  
maximum_sustained_wind_speed_lag_8  
maximum_sustained_wind_speed_lag_9  
maximum_sustained_wind_speed_lag_10  
maximum_sustained_wind_speed_lag_11  
maximum_sustained_wind_speed_lag_12  
maximum_sustained_wind_speed_lag_13  
maximum_sustained_wind_speed_lag_14  
maximum_sustained_wind_speed_lag_15  
wind_gust_lag_1  
wind_gust_lag_2  
wind_gust_lag_3  
wind_gust_lag_4  
wind_gust_lag_5  
wind_gust_lag_6  
wind_gust_lag_7  
wind_gust_lag_8  
wind_gust_lag_9  
wind_gust_lag_10  
wind_gust_lag_11  
wind_gust_lag_12  
wind_gust_lag_13  
wind_gust_lag_14  
wind_gust_lag_15  
dew_point_lag_1  
dew_point_lag_2  
dew_point_lag_3  
dew_point_lag_4  
dew_point_lag_5  
dew_point_lag_6  
dew_point_lag_7  
dew_point_lag_8  
dew_point_lag_9  
dew_point_lag_10  
dew_point_lag_11  
dew_point_lag_12  
dew_point_lag_13  
dew_point_lag_14  
dew_point_lag_15  
fog_lag_1  
fog_lag_2  
fog_lag_3  
fog_lag_4  
fog_lag_5  
fog_lag_6  
fog_lag_7  
fog_lag_8  
fog_lag_9  
fog_lag_10  
fog_lag_11  
fog_lag_12  
fog_lag_13  
fog_lag_14  
fog_lag_15  
thunder_lag_1  
thunder_lag_2  
thunder_lag_3  
thunder_lag_4  
thunder_lag_5  
thunder_lag_6  
thunder_lag_7  
thunder_lag_8  
thunder_lag_9  
thunder_lag_10  
thunder_lag_11  
thunder_lag_12  
thunder_lag_13  
thunder_lag_14  
thunder_lag_15  
lat_lag_1  
lat_lag_2  
lat_lag_3  
lat_lag_4  
lat_lag_5  
lat_lag_6  
lat_lag_7  
lat_lag_8  
lat_lag_9  
lat_lag_10  
lat_lag_11  
lat_lag_12  
lat_lag_13  
lat_lag_14  
lat_lag_15  
lon_lag_1  
lon_lag_2  
lon_lag_3  
lon_lag_4  
lon_lag_5  
lon_lag_6  
lon_lag_7  
lon_lag_8  
lon_lag_9  
lon_lag_10  
lon_lag_11  
lon_lag_12  
lon_lag_13  
lon_lag_14  
lon_lag_15  
average_temperature_weekly_mean  
maximum_temperature_weekly_mean  
minimum_temperature_weekly_mean  
precipitation_weekly_mean  
snow_depth_weekly_mean  
wind_gust_weekly_mean  
dew_point_weekly_mean  
average_temperature_last_1_year  
average_temperature_last_2_year  
average_temperature_last_3_year  
maximum_temperature_last_1_year  
maximum_temperature_last_2_year  
maximum_temperature_last_3_year  
minimum_temperature_last_1_year  
minimum_temperature_last_2_year  
minimum_temperature_last_3_year  
precipitation_last_1_year  
precipitation_last_2_year  
precipitation_last_3_year  
snow_depth_last_1_year  
snow_depth_last_2_year  
snow_depth_last_3_year  
wind_gust_last_1_year  
wind_gust_last_2_year  
wind_gust_last_3_year  
dew_point_last_1_year  
dew_point_last_2_year  
dew_point_last_3_year  
average_temperature_monthly_mean  
maximum_temperature_monthly_mean  
minimum_temperature_monthly_mean  
precipitation_monthly_mean  
snow_depth_monthly_mean  
wind_gust_monthly_mean  
dew_point_monthly_mean  
average_temperature_last_1_year_monthly_mean  
average_temperature_last_2_year_monthly_mean  
average_temperature_last_3_year_monthly_mean  
maximum_temperature_last_1_year_monthly_mean  
maximum_temperature_last_2_year_monthly_mean  
maximum_temperature_last_3_year_monthly_mean  
minimum_temperature_last_1_year_monthly_mean  
minimum_temperature_last_2_year_monthly_mean  
minimum_temperature_last_3_year_monthly_mean  
precipitation_last_1_year_monthly_mean  
precipitation_last_2_year_monthly_mean  
precipitation_last_3_year_monthly_mean  
snow_depth_last_1_year_monthly_mean  
snow_depth_last_2_year_monthly_mean  
snow_depth_last_3_year_monthly_mean  
wind_gust_last_1_year_monthly_mean  
wind_gust_last_2_year_monthly_mean  
wind_gust_last_3_year_monthly_mean  
dew_point_last_1_year_monthly_mean  
dew_point_last_2_year_monthly_mean  
dew_point_last_3_year_monthly_mean  
average_temperature_quarterly_mean  
maximum_temperature_quarterly_mean  
minimum_temperature_quarterly_mean  
precipitation_quarterly_mean  
snow_depth_quarterly_mean  
wind_gust_quarterly_mean  
dew_point_quarterly_mean  
average_temperature_yearly_mean  
maximum_temperature_yearly_mean  
minimum_temperature_yearly_mean  
precipitation_yearly_mean  
snow_depth_yearly_mean  
wind_gust_yearly_mean  
dew_point_yearly_mean  
is_fire  
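With 278 columns, it can help to summarize them by feature family. The sketch below strips time-window suffixes and counts columns per suffix; the suffix patterns are inferred from the names listed above and are an assumption, not a documented schema. The helper name `column_families` is hypothetical.

```python
import re
from collections import Counter

# Suffix patterns inferred from the column list (an assumption).
SUFFIX = re.compile(
    r"(_lag_\d+|_last_\d+_year_monthly_mean|_last_\d+_year"
    r"|_(weekly|monthly|quarterly|yearly)_mean)$"
)

def column_families(columns) -> Counter:
    """Count columns per time-window suffix; unsuffixed names are base features."""
    families = Counter()
    for col in columns:
        match = SUFFIX.search(col)
        suffix = match.group(0) if match else ""
        # Collapse numbered windows such as "_lag_7" into "_lag_N".
        families[re.sub(r"\d+", "N", suffix) or "(base feature)"] += 1
    return families

toy_columns = ["acq_date", "is_fire", "fog_lag_1", "fog_lag_2",
               "dew_point_weekly_mean", "dew_point_last_1_year",
               "dew_point_last_2_year_monthly_mean"]
print(column_families(toy_columns))
```

Applied as `column_families(df.columns)`, this turns the long list into a handful of counts (one per suffix family) that should add up to the 278 columns.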

In [4]:
df.shape

(934586, 278)

In [6]:
# info() prints its report itself and returns None, so no print() is needed.
df.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 934586 entries, 0 to 934585
Columns: 278 entries, acq_date to is_fire
dtypes: datetime64[ns](1), float32(277)
memory usage: 994.7 MB
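The reported memory usage follows directly from the row count and dtypes: each `float32` value takes 4 bytes and each `datetime64[ns]` value takes 8. The arithmetic below reproduces the figure, ignoring the `RangeIndex`, which adds only a few bytes. pandas divides by 1024 at each step, so its "MB" is a binary megabyte (MiB).

```python
# Reproduce the "memory usage" figure from the dtypes reported by df.info().
n_rows = 934_586
n_float32_cols = 277   # 4 bytes each
n_datetime_cols = 1    # datetime64[ns] is 8 bytes

bytes_per_row = n_float32_cols * 4 + n_datetime_cols * 8
total_bytes = n_rows * bytes_per_row
mib = total_bytes / 2**20  # pandas reports 1024-based units

print(bytes_per_row, total_bytes, round(mib, 1))  # 1116 1042997976 994.7
```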


## Sorting Dataset with "acq_date" as the key
Since our dataset is not ordered by time, we begin by sorting it on the "acq_date" column.
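Whether a date column is already sorted, and whether a sort worked, can be checked with `is_monotonic_increasing`. The sketch below uses three invented rows (dates borrowed from the `head()` output); `kind="stable"` is an optional choice that keeps rows sharing the same date in their original relative order.

```python
import pandas as pd

frame = pd.DataFrame({
    "acq_date": pd.to_datetime(["2015-05-28", "2013-01-31", "2021-11-19"]),
    "is_fire": [1.0, 0.0, 1.0],
})

print(frame["acq_date"].is_monotonic_increasing)         # False: unsorted
frame_sorted = frame.sort_values(by="acq_date", kind="stable")
print(frame_sorted["acq_date"].is_monotonic_increasing)  # True
```

The same check on the real data would be `df_sorted["acq_date"].is_monotonic_increasing`.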

In [23]:
df_sorted = df.sort_values(by = "acq_date")

In [24]:
df_sorted.head(10)

Unnamed: 0,acq_date,latitude,longitude,is_holiday,day_of_week,day_of_year,is_weekend,NDVI,SoilMoisture,sea_distance,...,wind_gust_quarterly_mean,dew_point_quarterly_mean,average_temperature_yearly_mean,maximum_temperature_yearly_mean,minimum_temperature_yearly_mean,precipitation_yearly_mean,snow_depth_yearly_mean,wind_gust_yearly_mean,dew_point_yearly_mean,is_fire
154140,2013-01-31,31.769793,-8.215368,0.0,3.0,31.0,0.0,1212.0,19.5,140188.390625,...,936.328247,50.915218,69.876778,84.312019,56.015846,0.019235,999.900024,813.05603,47.915302,0.0
273837,2013-01-31,32.52211,-4.844104,0.0,3.0,31.0,0.0,2113.0,12.0,312051.5625,...,999.900024,34.893478,60.979782,71.897408,47.901638,0.019918,999.900024,999.900024,34.069534,0.0
182655,2013-01-31,33.849125,-4.860106,0.0,3.0,31.0,0.0,5315.0,34.0,201680.078125,...,863.400024,47.572826,63.222542,76.387978,50.511066,0.875246,999.900024,893.730347,45.193169,1.0
713354,2013-01-31,31.027136,-4.541747,0.0,3.0,31.0,0.0,901.0,1.0,483727.0,...,910.639648,34.379349,71.847404,80.993034,84.816803,0.295382,999.900024,852.085632,31.956148,0.0
498881,2013-01-31,33.752628,-5.987223,0.0,3.0,31.0,0.0,5906.0,42.0,113376.828125,...,947.098938,48.127174,64.375816,76.444397,53.086067,1.145929,999.900024,957.453308,47.770901,0.0
122442,2013-01-31,33.849846,-4.857559,0.0,3.0,31.0,0.0,5315.0,34.0,201543.203125,...,863.400024,47.572826,63.222542,76.387978,50.511066,0.875246,999.900024,893.730347,45.193169,1.0
273764,2013-01-31,27.965025,-11.720864,0.0,3.0,31.0,0.0,1177.0,9.5,22365.439453,...,999.900024,215.986954,68.038254,76.220352,60.385792,0.004454,999.900024,999.900024,95.312019,0.0
330782,2013-01-31,32.331696,-6.938021,0.0,3.0,31.0,0.0,4162.0,25.0,196585.65625,...,999.900024,50.217392,69.400276,81.856422,52.913799,0.043975,999.900024,999.900024,102.98497,0.0
399118,2013-01-31,31.628965,-6.49552,0.0,3.0,31.0,0.0,2481.0,34.111111,300748.25,...,999.900024,50.217392,69.400276,81.856422,52.913799,0.043975,999.900024,999.900024,102.98497,1.0
193010,2013-01-31,28.060123,-10.299895,0.0,3.0,31.0,0.0,1217.0,8.0,116260.335938,...,999.900024,215.986954,68.038254,76.220352,60.385792,0.004454,999.900024,999.900024,95.312019,0.0


In [25]:
df_sorted.tail(10)

Unnamed: 0,acq_date,latitude,longitude,is_holiday,day_of_week,day_of_year,is_weekend,NDVI,SoilMoisture,sea_distance,...,wind_gust_quarterly_mean,dew_point_quarterly_mean,average_temperature_yearly_mean,maximum_temperature_yearly_mean,minimum_temperature_yearly_mean,precipitation_yearly_mean,snow_depth_yearly_mean,wind_gust_yearly_mean,dew_point_yearly_mean,is_fire
411198,2022-12-23,31.844727,-7.981693,0.0,4.0,357.0,0.0,825.0,19.0,163439.203125,...,627.806519,58.136955,69.567673,84.6063,56.657536,0.012178,999.900024,864.437256,49.907806,0.0
344069,2022-12-23,32.775856,-5.372431,0.0,4.0,357.0,0.0,3044.0,35.0,246579.828125,...,999.900024,39.713043,62.992603,74.068359,49.056438,0.017849,997.16272,999.900024,32.57452,0.0
450392,2022-12-23,32.010033,-5.656669,0.0,4.0,357.0,0.0,1386.0,31.5,308563.90625,...,905.44458,50.809784,70.802742,82.505203,108.412605,4.931863,999.900024,976.021912,46.451508,0.0
160750,2022-12-23,31.408916,-5.966709,0.0,4.0,357.0,0.0,1337.0,23.5,354627.15625,...,554.955444,33.838043,69.711647,82.831917,55.05452,0.286767,999.900024,871.874268,26.710958,0.0
228355,2022-12-23,33.755226,-4.441535,0.0,4.0,357.0,0.0,2283.0,36.0,212615.890625,...,904.881531,52.642391,64.400955,78.330551,51.95726,3.342329,999.900024,970.732849,48.288082,0.0
411013,2022-12-23,34.801178,-5.194282,0.0,4.0,357.0,0.0,3159.0,51.5,98151.09375,...,999.900024,58.77174,65.949043,76.781372,51.52,0.066507,999.900024,999.900024,242.498215,0.0
9243,2022-12-23,34.857826,-3.85281,0.0,4.0,357.0,0.0,1404.0,44.5,71063.710938,...,775.702148,65.113045,66.009865,74.550682,58.045479,0.030137,999.900024,900.844116,56.96767,0.0
315408,2022-12-23,33.769985,-4.683769,0.0,4.0,357.0,0.0,2285.0,33.0,210250.25,...,904.881531,52.642391,64.400955,78.330551,51.95726,3.342329,999.900024,970.732849,48.288082,0.0
278062,2022-12-23,27.940552,-10.470703,0.0,4.0,357.0,0.0,960.0,8.0,113639.65625,...,999.900024,65.811958,67.709038,74.781372,59.776165,0.00526,999.900024,997.261658,54.124111,0.0
507652,2022-12-23,31.293976,-3.927056,0.0,4.0,357.0,0.0,967.0,6.5,498007.09375,...,695.239136,34.810871,71.861916,83.816849,59.49041,0.007699,999.900024,806.690674,28.587397,0.0


In [26]:
# We suspect that there are multiple values for the same date with different latitudes and longitudes.
# Hence we perform a check on the "acq_date" column.
date_counts = df.groupby('acq_date').size().reset_index(name='counts')
date_counts

Unnamed: 0,acq_date,counts
0,2013-01-31,238
1,2013-02-01,142
2,2013-02-02,133
3,2013-02-03,313
4,2013-02-04,184
...,...,...
3534,2022-12-19,234
3535,2022-12-20,173
3536,2022-12-21,150
3537,2022-12-22,149


In [13]:
df["acq_date"].min()

Timestamp('2013-01-31 00:00:00')

In [14]:
df["acq_date"].max()

Timestamp('2022-12-23 00:00:00')

In [15]:
date_counts.shape

(3539, 2)
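The date range 2013-01-31 to 2022-12-23 spans more calendar days than the 3,539 distinct dates counted, so some days have no records at all. A minimal sketch for listing those gaps follows; `missing_dates` is a hypothetical helper, and the toy dates are invented.

```python
import pandas as pd

def missing_dates(dates: pd.Series) -> pd.DatetimeIndex:
    """Calendar days between the first and last date that have no records."""
    full_range = pd.date_range(dates.min(), dates.max(), freq="D")
    return full_range.difference(pd.DatetimeIndex(dates.unique()))

toy = pd.Series(pd.to_datetime(
    ["2013-01-31", "2013-01-31", "2013-02-01", "2013-02-04"]))
gaps = missing_dates(toy)
print(list(gaps.strftime("%Y-%m-%d")))  # ['2013-02-02', '2013-02-03']
```

On the real data, `missing_dates(df["acq_date"])` would list the days absent from `date_counts`.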

### We have now confirmed that multiple records exist for the same date.
Moreover, according to the dataset description, the dataset is balanced and contains equal numbers of fire and non-fire instances.
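The balance claim comes from the dataset description and is not checked in this notebook. A quick check is the normalized class share, which should be 0.5 for each class if the data is balanced. The sketch uses invented labels; `class_balance` is a hypothetical helper. The last line divides the row count by the number of distinct dates from the outputs above.

```python
import pandas as pd

def class_balance(labels: pd.Series) -> pd.Series:
    """Share of rows per class; a balanced binary label gives 0.5 / 0.5."""
    return labels.value_counts(normalize=True).sort_index()

toy_labels = pd.Series([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])
print(class_balance(toy_labels).to_dict())  # {0.0: 0.5, 1.0: 0.5}

# Average number of records per date: 934,586 rows over 3,539 dates.
print(round(934_586 / 3_539, 1))  # 264.1
```

Running `class_balance(df["is_fire"])` on the full table would confirm or refute the description's claim.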
  
Given these multiple records per date, we group by "is_fire" as a second key for every date in the "acq_date" column and compute the mean of every other feature.
The reasoning is that fire detections recorded on the same date at different latitudes and longitudes often lie close to each other, because wildfires do not occur at a single geographical point but spread across an area.  
  
Grouping them therefore summarizes each day's fire (and non-fire) records in a single row. Note that the mean also merges detections that are far apart on the same day, so the averaged latitude and longitude describe a daily centroid rather than the origin of a specific fire.
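The effect of grouping on coordinates can be seen on a toy example: three invented fire detections on one day, two close together and one far away. The grouped row sits at their average position, a point that none of the detections actually occupied.

```python
import pandas as pd

# Three invented fire detections on one day: two close, one far away.
toy = pd.DataFrame({
    "acq_date": pd.to_datetime(["2013-01-31"] * 3),
    "is_fire": [1.0, 1.0, 1.0],
    "latitude": [33.85, 33.85, 28.0],
    "longitude": [-4.86, -4.86, -10.3],
})

daily = toy.groupby(["acq_date", "is_fire"]).mean().reset_index()
# One row; latitude is about 31.9 and longitude about -6.67.
print(daily[["latitude", "longitude"]].round(2).to_dict("records"))
```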

In [27]:
# Group by "acq_date" and "is_fire", and compute the mean of every other column
grouped_df = df.groupby(["acq_date", "is_fire"]).mean().reset_index()

# Sort by "acq_date" and "is_fire" for readability
# (groupby already sorts by its keys, so this mainly documents the intended order)
grouped_df = grouped_df.sort_values(by=["acq_date", "is_fire"])

In [28]:
grouped_df.head()

Unnamed: 0,acq_date,is_fire,latitude,longitude,is_holiday,day_of_week,day_of_year,is_weekend,NDVI,SoilMoisture,...,snow_depth_quarterly_mean,wind_gust_quarterly_mean,dew_point_quarterly_mean,average_temperature_yearly_mean,maximum_temperature_yearly_mean,minimum_temperature_yearly_mean,precipitation_yearly_mean,snow_depth_yearly_mean,wind_gust_yearly_mean,dew_point_yearly_mean
0,2013-01-31,0.0,31.287281,-6.995301,0.0,3.0,31.0,0.0,2323.204834,17.211515,...,999.900024,946.35791,81.221893,67.510727,78.131996,61.211182,0.526038,999.900024,923.268799,74.629219
1,2013-01-31,1.0,33.349586,-5.102332,0.0,3.0,31.0,0.0,3824.991455,35.953068,...,999.900024,934.883728,50.842926,65.921844,76.639679,52.941498,1.358508,999.900024,948.962402,68.137794
2,2013-02-01,0.0,31.884949,-6.043734,0.0,4.0,32.0,0.0,2442.943848,19.78075,...,999.900024,935.881592,65.702271,66.930008,78.381691,60.120266,0.85546,999.900024,920.485229,68.063812
3,2013-02-01,1.0,33.911613,-5.494411,0.0,4.0,32.0,0.0,5753.0,37.0,...,999.900024,947.098877,48.127174,64.375816,76.444397,53.086067,1.145929,999.900024,957.453369,47.770901
4,2013-02-02,0.0,31.857609,-5.985701,0.0,5.0,33.0,1.0,2438.428467,17.860897,...,999.900024,940.024414,68.386024,67.123344,77.616066,62.042496,0.6594,999.900024,922.55304,69.138527


In [29]:
grouped_df.tail()

Unnamed: 0,acq_date,is_fire,latitude,longitude,is_holiday,day_of_week,day_of_year,is_weekend,NDVI,SoilMoisture,...,snow_depth_quarterly_mean,wind_gust_quarterly_mean,dew_point_quarterly_mean,average_temperature_yearly_mean,maximum_temperature_yearly_mean,minimum_temperature_yearly_mean,precipitation_yearly_mean,snow_depth_yearly_mean,wind_gust_yearly_mean,dew_point_yearly_mean
6298,2022-12-19,1.0,30.115797,-9.507822,0.0,0.0,353.0,0.0,1860.579712,16.804348,...,999.900024,794.978943,61.585144,68.732635,80.1922,58.058659,0.00907,999.900024,924.112,51.802086
6299,2022-12-20,0.0,31.007114,-7.137991,0.0,1.0,354.0,0.0,1481.867065,21.274815,...,999.900024,779.042175,97.728912,68.483192,79.401939,65.349136,1.044128,999.789307,751.882812,100.389267
6300,2022-12-21,0.0,31.630329,-6.800214,1.0,2.0,355.0,0.0,1669.186646,23.10157,...,999.900024,775.292847,53.510265,67.901039,79.067284,62.639503,0.864148,999.626221,785.699829,64.890205
6301,2022-12-22,0.0,30.786356,-7.419971,0.0,3.0,356.0,0.0,1388.187866,19.763521,...,999.900085,760.960083,54.459949,68.349213,79.527412,65.550713,1.051775,999.753052,711.901123,63.346725
6302,2022-12-23,0.0,31.48361,-7.023698,0.0,4.0,357.0,0.0,1609.934204,23.911671,...,999.899963,782.935852,53.837486,68.303474,79.314964,67.435989,1.296888,999.791931,781.474609,60.064159
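The tail above shows only non-fire rows for 2022-12-20 through 2022-12-23, so some dates have just one class after grouping. A sketch for listing such dates follows; `single_class_dates` is a hypothetical helper and the toy rows are invented.

```python
import pandas as pd

def single_class_dates(grouped: pd.DataFrame) -> pd.DatetimeIndex:
    """Dates that appear with only one is_fire value after grouping."""
    per_date = grouped.groupby("acq_date")["is_fire"].nunique()
    return per_date[per_date == 1].index

toy = pd.DataFrame({
    "acq_date": pd.to_datetime(
        ["2022-12-19", "2022-12-19", "2022-12-20", "2022-12-21"]),
    "is_fire": [0.0, 1.0, 0.0, 0.0],
})
print(list(single_class_dates(toy).strftime("%Y-%m-%d")))
# ['2022-12-20', '2022-12-21']
```

Applied as `single_class_dates(grouped_df)`, this shows which dates lack either a fire or a non-fire row.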


#### Finally, we export grouped_df to a CSV file

In [20]:
grouped_df.to_csv(r"D:\AISSMS IOIT - AI&DS (628299510)\General\Hackathons\Prasunethon\reduced.csv", index = False)
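CSV stores dates as plain text, so reading the file back needs `parse_dates` to restore `acq_date` as a datetime column; numeric columns come back as `float64` rather than `float32`. The round trip below is a sketch using a temporary directory and a hypothetical two-row frame in place of the real output path.

```python
import tempfile
from pathlib import Path

import pandas as pd

grouped = pd.DataFrame({
    "acq_date": pd.to_datetime(["2013-01-31", "2013-01-31"]),
    "is_fire": [0.0, 1.0],
    "latitude": [31.287281, 33.349586],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reduced.csv"  # hypothetical output location
    grouped.to_csv(path, index=False)
    # Parse the text dates back into a datetime column on read.
    reloaded = pd.read_csv(path, parse_dates=["acq_date"])

print(reloaded.shape)  # (2, 3)
```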

# Conclusion
We have thus down-sampled our dataset from 934,586 samples to 6,303, about 0.67% of the original rows.
Since 6,303 is fewer than 2 × 3,539 = 7,078, not every date has both a fire and a non-fire row.

## Next Steps include: Feature Engineering and Feature Extraction
  
Thank you!!!