# Exploratory Data Analysis Report
## Algerian Forest Fires Dataset

---

This report presents key findings from an exploratory data analysis of the Algerian Forest Fires dataset (June to September 2012).

**Key Findings:**
- **Skewness present** in FWI distribution → Log transformation recommended
- **High multicollinearity** detected (BUI & DMC: 0.98, BUI & DC: 0.94)
- **Strong predictors identified:** ISI (r=0.89), FFMC (r=0.87), DMC (r=0.81)
- **Weak predictors:** Wind Speed (r=0.02)
- **Seasonality observed:** Peak fire activity in August-September
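
The multicollinearity finding above can be checked by listing feature pairs whose absolute Pearson correlation exceeds a cutoff. The following is a minimal sketch, not the code used for this report: the helper name `high_correlation_pairs`, the 0.9 threshold, and the synthetic data are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """List feature pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.corr(numeric_only=True)
    rows = [
        (a, b, corr.loc[a, b])
        for a, b in combinations(corr.columns, 2)  # each pair is visited once
        if abs(corr.loc[a, b]) > threshold
    ]
    return pd.DataFrame(rows, columns=["feature_a", "feature_b", "r"])


# Synthetic stand-in data (NOT the real dataset): BUI is built mostly from DMC,
# so the pair should be flagged, while Ws is independent of everything else.
rng = np.random.default_rng(0)
dmc = rng.gamma(2.0, 7.0, size=200)
dc = rng.gamma(2.0, 25.0, size=200)
demo = pd.DataFrame({
    "DMC": dmc,
    "DC": dc,
    "BUI": 0.9 * dmc + 0.05 * dc + rng.normal(0, 0.5, size=200),
    "Ws": rng.normal(15, 3, size=200),
})

pairs = high_correlation_pairs(demo, threshold=0.9)
print(pairs)
```

On the real data, the same helper would be called on the cleaned feature columns; the coefficients it prints there will differ from this synthetic example.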

**Modeling Implications:**
- Regularization techniques (Ridge) recommended due to multicollinearity
- Log transformation applied to target variable (FWI)
- Feature selection needed to eliminate redundant variables (Lasso)
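
As a rough illustration of why these two regularizers are suggested, the sketch below fits Ridge and Lasso (scikit-learn) on synthetic data with one nearly duplicated feature and one irrelevant feature. The data, the feature names `x1` to `x3`, and the `alpha` values are assumptions chosen for the demonstration, not settings taken from this analysis.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real dataset): x2 nearly duplicates x1,
# x3 is pure noise, and only x1 truly drives the target.
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # collinear with x1
x3 = rng.normal(size=n)                   # irrelevant feature
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + rng.normal(scale=0.1, size=n)

# Standardize first so the penalty treats every feature on the same scale.
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)

ridge_coef = ridge[-1].coef_
lasso_coef = lasso[-1].coef_
print("Ridge coefficients:", ridge_coef.round(3))
print("Lasso coefficients:", lasso_coef.round(3))
```

Ridge tends to share weight across the collinear pair and keeps the coefficients stable, while Lasso can set coefficients exactly to zero, which is what makes it usable for feature selection. In practice `alpha` would be tuned, for example by cross-validation.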

---

## 1. Dataset Overview

### 1.1 Dataset Description

- **Source:** Algerian Forest Fires Dataset, Kaggle
- **Time Period:** June 1 - September 30, 2012
- **Regions:** Bejaia and Sidi Bel-abbes, Algeria
- **Total Instances:** 244 observations
- **Features:** 12 variables (after cleaning)

### 1.2 Feature Categories

**Weather Variables:**
- Temperature (°C)
- RH - Relative Humidity (%)
- Ws - Wind Speed (km/h)
- Rain - Precipitation (mm)

**Canadian Forest Fire Weather Index Components:**
- FFMC - Fine Fuel Moisture Code
- DMC - Duff Moisture Code
- DC - Drought Code
- ISI - Initial Spread Index
- BUI - Buildup Index

**Target Variables:**
- FWI - Fire Weather Index (original)
- FWI-log - Log-transformed FWI (created during preprocessing)
- Classes - Binary fire occurrence (0: No fire, 1: Fire)

**Temporal Variables:**
- Month
- Year
- Day

---

## 2. Distribution Analysis

### 2.1 Target Variable Distribution (FWI)

**Finding: Right-skewed distribution observed**

**Problem:**
- Majority of values concentrated at lower end
- Long tail extending to high values
- Skewness impacts linear regression assumptions

**Solution Applied:**
- Created FWI-log, a log-transformed copy of FWI

**Impact:**
- More symmetric distribution
- Better suited for linear regression & tree models
- Reduces influence of extreme values
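
The effect of the transformation can be checked by comparing skewness before and after. This is a minimal sketch on synthetic data: the report does not state which log variant was used, so `np.log1p` (that is, log(1 + x)) is an assumption here, chosen because it stays defined when FWI is 0.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed stand-in for FWI (NOT the real dataset).
rng = np.random.default_rng(1)
df = pd.DataFrame({"FWI": rng.exponential(scale=7.0, size=244)})

# log1p computes log(1 + x), which remains defined at FWI = 0.
df["FWI-log"] = np.log1p(df["FWI"])

skew_before = df["FWI"].skew()
skew_after = df["FWI-log"].skew()
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")

# Predictions made on the log scale are mapped back with the inverse, expm1.
restored = np.expm1(df["FWI-log"])
```

If a model is trained on FWI-log, its predictions need the inverse transform (`np.expm1` under this assumption) before they are compared with FWI on the original scale.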

### 2.2 Feature Distributions

**Observations:**
- Most features show reasonable distributions
- Some outliers are present, reflecting extreme weather conditions (for example, on forest fire days)
- Rain is heavily concentrated at 0 mm (summer season)
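
Both observations can be quantified with a zero-share calculation and a standard 1.5 × IQR outlier rule. The sketch below uses a handful of invented rainfall values, and the helper `iqr_outliers` is a hypothetical name; the report does not say which outlier rule was applied.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]


# Hand-made illustrative values in mm (NOT the real dataset).
rain = pd.Series([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 1.2, 16.8], name="Rain")

zero_share = (rain == 0).mean()  # fraction of completely dry days
outliers = iqr_outliers(rain)

print(f"share of days with 0 mm rain: {zero_share:.0%}")
print(outliers)
```

For a variable that is mostly zero, the IQR is very narrow, so this rule flags nearly every rainy day. Points flagged this way are better read as rare weather events than as data errors.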

---

## 3. Feature Relationship Insights

### 3.1 Fire Weather Index Components

- DC, DMC, and BUI show similar relationships with fire count, as seen in the visualizations and in their high collinearity. BUI is derived from DMC and DC, so BUI alone is enough to assess the relationship with fire occurrence.
- FFMC (Fine Fuel Moisture Code) ranges from **28.6** to **96**; observations with FFMC > **75** have a higher chance of fire.
- FWI (Fire Weather Index) ranges from 0 to 31.1:
    - Lower fire chance for FWI < 3 and for FWI above roughly 22-23
    - Higher fire chance for FWI between 3 and 22
- BUI (Buildup Index) ranges from 1.1 to 68; a lower fire chance is observed for BUI < 5 and BUI > 32.
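
Threshold statements like these can be reproduced by binning an index and summarizing fire occurrence per band. The sketch below uses the FWI cut points quoted above on a few invented rows; the column name `FWI_band` and the data are assumptions. It reports both the fire count and the fire rate, because a low count in a band can simply reflect few observations in that band.

```python
import pandas as pd

# Hand-made illustrative rows (NOT the real dataset).
demo = pd.DataFrame({
    "FWI":     [0.5, 1.2, 2.8, 4.0, 7.5, 12.0, 18.0, 21.0, 25.0, 30.0],
    "Classes": [0,   0,   0,   1,   1,   1,    1,    1,    1,    1],
})

# Bands follow the thresholds quoted above: below 3, 3 up to 22, and 22 or more.
bins = [-float("inf"), 3, 22, float("inf")]
labels = ["<3", "3-22", ">22"]
demo["FWI_band"] = pd.cut(demo["FWI"], bins=bins, labels=labels, right=False)

summary = demo.groupby("FWI_band", observed=False)["Classes"].agg(
    fire_count="sum",  # number of fire observations in the band
    n_obs="count",     # number of observations in the band
    fire_rate="mean",  # share of observations in the band that are fires
)
print(summary)
```

The same pattern applies to BUI or FFMC by swapping the column and the bin edges.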

### 3.2 Weather Variables

- Rain: The bar plot shows most fires occurring on dry days; the fire count drops as rainfall increases.
- Temperature: The highest fire counts occur between 31 °C and 36 °C.

### 3.3 Temporal

- July and August have the highest forest fire counts
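
Monthly fire counts of this kind come from a simple group-by on the month column. The sketch below assumes columns named `Month` and `Classes` (1 = fire) and uses invented rows, so its numbers say nothing about the real seasonal pattern.

```python
import pandas as pd

# Hand-made illustrative rows (NOT the real dataset).
demo = pd.DataFrame({
    "Month":   [6, 6, 7, 7, 7, 8, 8, 8, 9, 9],
    "Classes": [0, 1, 1, 1, 0, 1, 1, 1, 0, 1],
})

# Summing the binary fire flag per month gives the fire count per month.
monthly_fires = demo.groupby("Month")["Classes"].sum()
print(monthly_fires)
```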

---