# Fire Classification Week 1 Notebook

This notebook analyzes MODIS fire data from **2021–2023** for India.  
We perform:
- Dataset loading & merging
- Basic cleaning (duplicates)
- Exploratory Data Analysis (EDA)
- Visualization of fire types and confidence values


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Show plots inline
%matplotlib inline

In [None]:
# Load MODIS fire data
df1 = pd.read_csv('modis_2021_India.csv')
df2 = pd.read_csv('modis_2022_India.csv')
df3 = pd.read_csv('modis_2023_India.csv')

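# Merge the three yearly datasets, then drop exact duplicate rows.
# ignore_index=True keeps the row index contiguous after each step.
df = pd.concat([df1, df2, df3], ignore_index=True)
df = df.drop_duplicates(ignore_index=True)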

print('Data Shape:', df.shape)
df.head()
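
To see how much the duplicate-removal step actually changes the data, the row count can be compared before and after. The following is a minimal sketch, not part of the original notebook: `merge_and_dedupe` is a hypothetical helper, and the two small frames are made-up stand-ins for the yearly CSV files.

```python
import pandas as pd

def merge_and_dedupe(frames):
    """Concatenate frames, drop exact duplicate rows, and report how many were removed."""
    merged = pd.concat(frames, ignore_index=True)
    cleaned = merged.drop_duplicates(ignore_index=True)
    return cleaned, len(merged) - len(cleaned)

# Made-up rows for illustration only (not real MODIS detections).
demo_2021 = pd.DataFrame({"latitude": [28.1, 28.1], "confidence": [60, 60], "type": [0, 0]})
demo_2022 = pd.DataFrame({"latitude": [19.5], "confidence": [85], "type": [2]})

demo_df, n_removed = merge_and_dedupe([demo_2021, demo_2022])
print("Rows kept:", len(demo_df), "| duplicates removed:", n_removed)
```

With the real data, the same helper could be called as `merge_and_dedupe([df1, df2, df3])`.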

In [None]:
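# Check dataset info (prints column dtypes and non-null counts)
df.info()

# Describe numerical columns; display() is needed here because a cell
# only renders its last expression automatically
display(df.describe())

# Fire type counts
df['type'].value_counts()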

In [None]:
# Count plot of fire types
plt.figure(figsize=(6,4))
sns.countplot(x='type', data=df)
plt.title('Fire Type Distribution')
plt.show()

# Histogram of confidence
plt.figure(figsize=(6,4))
sns.histplot(df['confidence'], bins=20, kde=True)
plt.title('Confidence Distribution')
plt.show()

# Observations

- After merging and removing duplicates, the dataset has **271,217 rows and 15 columns**.
- Fire type distribution: {0: 257625, 2: 13550, 3: 42}.
- No missing values found in the dataset.
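- `type 0` (presumed vegetation fire) is dominant at about 95% of rows, while `type 2` (other static land source, about 5%) and `type 3` (offshore, under 0.02%) are rare.
- Confidence values range from 0 to 100, and the histogram shows its highest counts at values of about 50 and above.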
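
The missing-value and class-share observations above can be checked in code rather than read off the printed output. The sketch below is an assumption-laden illustration: `summarize_fire_frame` and `TYPE_LABELS` are hypothetical names, the label text follows the type codes in the FIRMS attribute documentation, and the small frame is made-up data. The only real figures are the three counts copied from the observations.

```python
import pandas as pd

# Type codes as described in the FIRMS attribute documentation.
TYPE_LABELS = {
    0: "presumed vegetation fire",
    1: "active volcano",
    2: "other static land source",
    3: "offshore",
}

def summarize_fire_frame(frame):
    """Return the total number of missing cells and a per-type count/share table."""
    missing = int(frame.isnull().sum().sum())
    counts = frame["type"].value_counts().sort_index()
    summary = counts.rename("count").to_frame()
    summary["share_pct"] = (summary["count"] / len(frame) * 100).round(2)
    summary["label"] = summary.index.map(TYPE_LABELS)
    return missing, summary

# Made-up frame for illustration only.
demo = pd.DataFrame({"type": [0, 0, 0, 2], "confidence": [50, 70, 90, 100]})
missing_cells, type_summary = summarize_fire_frame(demo)
print("Missing cells:", missing_cells)
print(type_summary)

# Shares computed from the counts reported in the observations.
reported = pd.Series({0: 257625, 2: 13550, 3: 42})
reported_share = (reported / reported.sum() * 100).round(2)
print(reported_share)
```

Calling `summarize_fire_frame(df)` on the merged frame should return a missing-cell count of 0 if the "no missing values" observation holds.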
