# Forest Fire Size and Temperature Analysis

In this activity, we will use pandas to derive some insights from a forest fire dataset: the mean size of the forest fires, the largest fire recorded in the dataset, and whether the number of forest fires in each month rises with that month's mean temperature.

Our forest fires dataset has the following structure:

- X: X-axis spatial coordinate within the Montesinho park map: 1 to 9
- Y: Y-axis spatial coordinate within the Montesinho park map: 2 to 9
- month: Month of the year: 'jan' to 'dec'
- day: Day of the week: 'mon' to 'sun'
- FFMC: FFMC index from the FWI system: 18.7 to 96.20
- DMC: DMC index from the FWI system: 1.1 to 291.3
- DC: DC index from the FWI system: 7.9 to 860.6
- ISI: ISI index from the FWI system: 0.0 to 56.10
- temp: Temperature in degrees Celsius: 2.2 to 33.30
- RH: Relative humidity in %: 15.0 to 100
- wind: Wind speed in km/h: 0.40 to 9.40
- rain: Outside rain in mm/m2: 0.0 to 6.4
- area: The burned area of the forest (in ha): 0.00 to 1090.84

Note: We will only be using the month, temp, and area columns in this activity.

In [1]:
import pandas as pd

In [4]:
# Load the dataset from the CSV file into a DataFrame
df = pd.read_csv('../../Datasets/forestfires.csv')
df

,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
0,7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.00
1,7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0.00
2,7,4,oct,sat,90.6,43.7,686.9,6.7,14.6,33,1.3,0.0,0.00
3,8,6,mar,fri,91.7,33.3,77.5,9.0,8.3,97,4.0,0.2,0.00
4,8,6,mar,sun,89.3,51.3,102.2,9.6,11.4,99,1.8,0.0,0.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...
512,4,3,aug,sun,81.6,56.7,665.6,1.9,27.8,32,2.7,0.0,6.44
513,2,4,aug,sun,81.6,56.7,665.6,1.9,21.9,71,5.8,0.0,54.29
514,7,4,aug,sun,81.6,56.7,665.6,1.9,21.2,70,6.7,0.0,11.16
515,1,4,aug,sat,94.4,146.0,614.7,11.3,25.6,42,4.0,0.0,0.00
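Since only the month, temp, and area columns are used in this activity, the load could be restricted to them. The following is a minimal sketch, assuming the CSV file has a header row with the column names listed above. The names `sample_csv` and `df_small` are hypothetical, and the in-memory sample (four rows copied from the output above) only stands in for the real file.

```python
import io
import pandas as pd

# Four rows copied from the output above; an in-memory stand-in for forestfires.csv
# (assumption: the real file has a header row with these column names)
sample_csv = io.StringIO(
    "X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area\n"
    "7,5,mar,fri,86.2,26.2,94.3,5.1,8.2,51,6.7,0.0,0.00\n"
    "7,4,oct,tue,90.6,35.4,669.1,6.7,18.0,33,0.9,0.0,0.00\n"
    "4,3,aug,sun,81.6,56.7,665.6,1.9,27.8,32,2.7,0.0,6.44\n"
    "2,4,aug,sun,81.6,56.7,665.6,1.9,21.9,71,5.8,0.0,54.29\n"
)

# usecols tells read_csv to parse only the listed columns
df_small = pd.read_csv(sample_csv, usecols=["month", "temp", "area"])
print(df_small)
```

With the real file, the same `usecols` argument could be passed to the `pd.read_csv` call shown above.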


## Derive insights from the sizes of forest fires

In [7]:
# Filter the dataset so that it only contains entries that have an area larger than 0
df_area = df[df.area > 0]
df_area

,X,Y,month,day,FFMC,DMC,DC,ISI,temp,RH,wind,rain,area
138,9,9,jul,tue,85.8,48.3,313.4,3.9,18.0,42,2.7,0.0,0.36
139,1,4,sep,tue,91.0,129.5,692.6,7.0,21.7,38,2.2,0.0,0.43
140,2,5,sep,mon,90.9,126.5,686.5,7.0,21.9,39,1.8,0.0,0.47
141,1,2,aug,wed,95.5,99.9,513.3,13.2,23.3,31,4.5,0.0,0.55
142,8,6,aug,fri,90.1,108.0,529.8,12.5,21.2,51,8.9,0.0,0.61
...,...,...,...,...,...,...,...,...,...,...,...,...,...
509,5,4,aug,fri,91.0,166.9,752.6,7.1,21.1,71,7.6,1.4,2.17
510,6,5,aug,fri,91.0,166.9,752.6,7.1,18.2,62,5.4,0.0,0.43
512,4,3,aug,sun,81.6,56.7,665.6,1.9,27.8,32,2.7,0.0,6.44
513,2,4,aug,sun,81.6,56.7,665.6,1.9,21.9,71,5.8,0.0,54.29
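The filter above relies on boolean indexing: the comparison produces a Series of True/False values, and indexing with it keeps only the True rows. A small sketch of the two steps, using a hypothetical `df_demo` built from four rows of the output above:

```python
import pandas as pd

# Hypothetical demo frame; the values are copied from rows shown above
df_demo = pd.DataFrame({
    "month": ["mar", "oct", "aug", "aug"],
    "area": [0.00, 0.00, 6.44, 54.29],
})

# Step 1: the comparison yields a boolean Series aligned with the rows
mask = df_demo["area"] > 0
print(mask)

# Step 2: indexing with the mask keeps the rows where it is True;
# .loc additionally allows selecting columns in the same call
burned = df_demo.loc[mask, ["month", "area"]]
print(burned)
```

Note that the original row labels are preserved, which is why the filtered output above starts at index 138 rather than 0.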


In [16]:
# Summary statistics of the numeric columns, transposed so that each column becomes a row
df_area.describe().T

,count,mean,std,min,25%,50%,75%,max
X,270.0,4.807407,2.383326,1.0,3.0,5.0,7.0,9.0
Y,270.0,4.366667,1.17074,2.0,4.0,4.0,5.0,9.0
FFMC,270.0,91.034074,3.70902,63.5,90.325,91.7,92.975,96.2
DMC,270.0,114.707778,61.78652,3.2,82.9,111.7,141.3,291.3
DC,270.0,570.867037,229.981242,15.3,486.5,665.6,721.325,860.6
ISI,270.0,9.177037,4.14735,0.8,6.8,8.4,11.375,22.7
temp,270.0,19.311111,6.179444,2.2,16.125,20.1,23.4,33.3
RH,270.0,43.733333,15.080059,15.0,33.0,41.0,53.0,96.0
wind,270.0,4.112963,1.884573,0.4,2.7,4.0,4.9,9.4
rain,270.0,0.028889,0.398392,0.0,0.0,0.0,0.0,6.4


In [19]:
# mean of the area column
print(f'mean: {df_area.area.mean():.2f}')
# min of the area column
print(f'min: {df_area.area.min():.2f}')
# max of the area column
print(f'max: {df_area.area.max():.2f}')
# std of the area column
print(f'std: {df_area.area.std():.2f}')

mean: 24.60
min: 0.09
max: 1090.84
std: 86.50
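The four statistics can also be requested in a single call with `Series.agg`. This is only a sketch on a small sample of area values copied from the filtered output shown earlier, so the numbers it prints differ from the full-column results above; `areas` and `stats` are hypothetical names.

```python
import pandas as pd

# A small sample of area values copied from the filtered output (not the full column)
areas = pd.Series([0.36, 0.43, 0.47, 0.55, 0.61, 2.17, 0.43, 6.44, 54.29], name="area")

# Passing a list of function names returns a Series indexed by those names
stats = areas.agg(["mean", "min", "max", "std"])
print(stats.round(2))
```

On the activity's data, the equivalent call would presumably be `df_area.area.agg(["mean", "min", "max", "std"])`.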


In [22]:
# Sort the filtered dataset by the area column and show the last 20 entries with tail()
# to see how many very large values it holds.
df_area.sort_values(by=["area"])[["area"]].tail(20)

,area
469,61.13
228,64.1
473,70.32
392,70.76
229,71.3
457,82.75
293,86.45
230,88.49
231,95.18
232,103.39
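As an alternative to sorting and taking the tail, `DataFrame.nlargest` returns the rows with the largest values directly. A brief sketch using the ten area values printed above; the frame name `areas` is hypothetical.

```python
import pandas as pd

# The ten rows printed above, with their original index labels
areas = pd.DataFrame(
    {"area": [61.13, 64.1, 70.32, 70.76, 71.3, 82.75, 86.45, 88.49, 95.18, 103.39]},
    index=[469, 228, 473, 392, 229, 457, 293, 230, 231, 232],
)

# nlargest returns the rows in descending order of the given column
top3 = areas.nlargest(3, "area")

# sort_values(...).tail(...) returns the same rows in ascending order
tail3 = areas.sort_values(by="area").tail(3)

print(top3)
print(tail3)
```

Both approaches select the same rows; they differ only in the order in which the rows are returned.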


In [24]:
# Get the median of the area column and compare it with the mean value printed above.
df_area.area.median()

6.37
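The median (6.37) is far below the mean (24.60), which is what one would expect when a few very large fires pull the mean upward. The sketch below illustrates the effect on a small sample of area values copied from the filtered output; `areas` and `trimmed` are hypothetical names, and the sample is not representative of the full column.

```python
import pandas as pd

# A small sample of area values copied from the filtered output (not the full column)
areas = pd.Series([0.36, 0.43, 0.47, 0.55, 0.61, 2.17, 0.43, 6.44, 54.29], name="area")

# Drop the single largest value to see how strongly it influences each statistic
trimmed = areas.drop(areas.idxmax())

print(f"with largest value:    mean={areas.mean():.2f}, median={areas.median():.2f}")
print(f"without largest value: mean={trimmed.mean():.2f}, median={trimmed.median():.2f}")
```

The mean changes substantially when the largest value is removed, while the median barely moves, which is why the median is often the more informative summary for skewed data such as burned areas.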

## Finding the month with the most forest fires

In [29]:
# Get a list of unique values from the month column of the dataset
months = list(df['month'].unique())
months

['mar',
 'oct',
 'aug',
 'sep',
 'apr',
 'jun',
 'jul',
 'feb',
 'jan',
 'dec',
 'may',
 'nov']

In [28]:
# Get the number of entries for the month of March from the first element of the DataFrame's shape attribute.
df[df.month == 'mar'].shape[0]

54
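The count for a single month can be generalized to all months at once with `Series.value_counts`. A minimal sketch on a hypothetical `months_demo` Series, with made-up values chosen only to show the mechanics:

```python
import pandas as pd

# Hypothetical month column; values are illustrative, not taken from the dataset
months_demo = pd.Series(["mar", "oct", "oct", "mar", "mar", "aug"], name="month")

# Count for one month: comparing yields booleans, and True counts as 1 when summed
march_count = (months_demo == "mar").sum()

# Counts for every month in one call, sorted from most to least frequent
counts = months_demo.value_counts()

print(march_count)
print(counts)
print(counts.idxmax())  # label of the most frequent month
```

Applied to the activity's data, `df['month'].value_counts()` would list the number of entries per month without filtering each month separately.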

In [40]:
# Iterate over all the months, filter the dataset for the rows of each month, and
# print the month, the number of fires, and the mean temperature.
for month in months:
    print(month)
    df_temp = df[df.month == month]
    print(f'number of fires: {df_temp.shape[0]}')
    print(f'mean temperature: {df_temp["temp"].mean():.2f}')
    print('\n')

mar
number of fires: 54
mean temperature: 13.08


oct
number of fires: 15
mean temperature: 17.09


aug
number of fires: 184
mean temperature: 21.63


sep
number of fires: 172
mean temperature: 19.61


apr
number of fires: 9
mean temperature: 12.04


jun
number of fires: 17
mean temperature: 20.49


jul
number of fires: 32
mean temperature: 22.11


feb
number of fires: 20
mean temperature: 9.63


jan
number of fires: 2
mean temperature: 5.25


dec
number of fires: 9
mean temperature: 4.52


may
number of fires: 2
mean temperature: 14.65


nov
number of fires: 1
mean temperature: 11.80
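The loop above can also be expressed as a single `groupby` with named aggregation. The sketch below runs on a hypothetical `df_demo` built from nine rows shown in the first output, so its counts and means differ from the full results; the output column names `fires` and `mean_temp` are assumptions chosen for illustration.

```python
import pandas as pd

# Nine (month, temp) pairs copied from the rows shown in the first output
df_demo = pd.DataFrame({
    "month": ["mar", "oct", "oct", "mar", "mar", "aug", "aug", "aug", "aug"],
    "temp": [8.2, 18.0, 14.6, 8.3, 11.4, 27.8, 21.9, 21.2, 25.6],
})

# One row per month: 'size' counts the rows, 'mean' averages the temperature
summary = df_demo.groupby("month")["temp"].agg(fires="size", mean_temp="mean")
print(summary.round(2))
```

Unlike the loop, which follows the order in which months first appear, `groupby` sorts the groups by key by default; passing `sort=False` keeps the order of first appearance.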

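To examine whether the number of fires rises with the monthly mean temperature, one option is to compute a correlation over the twelve printed results. The following is a rough sketch, not a definitive analysis: it uses the counts and means printed above, the hypothetical name `summary`, and a rank-based (Spearman-style) coefficient computed as the Pearson correlation of the ranks.

```python
import pandas as pd

# Per-month results copied from the printed output above
summary = pd.DataFrame({
    "month": ["mar", "oct", "aug", "sep", "apr", "jun",
              "jul", "feb", "jan", "dec", "may", "nov"],
    "fires": [54, 15, 184, 172, 9, 17, 32, 20, 2, 9, 2, 1],
    "mean_temp": [13.08, 17.09, 21.63, 19.61, 12.04, 20.49,
                  22.11, 9.63, 5.25, 4.52, 14.65, 11.80],
}).set_index("month")

# Pearson correlation measures the strength of a linear relationship
pearson = summary["fires"].corr(summary["mean_temp"])

# Rank-based correlation: Pearson correlation of the ranks, less sensitive to outliers
spearman = summary["fires"].rank().corr(summary["mean_temp"].rank())

# Sorting by temperature makes the pattern easier to inspect by eye
print(summary.sort_values("mean_temp"))
print(f"Pearson: {pearson:.2f}, rank-based: {spearman:.2f}")
```

With only twelve data points, and August and September dominating the counts, any coefficient should be read cautiously: June and July are among the warmest months but show far fewer fires than September, so the relationship does not look strictly proportional.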

