In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Analysis of the California Fire Incidents Dataset

### 1. Data Overview
- **Total Records (Entries):** 1,636
- **Total Variables (Columns):** 40
- **Dataset Type:** CSV file

### 2. Data Provenance
- **Source:** The dataset appears to be compiled from official fire department records, including CAL FIRE and other agencies. It was obtained from Kaggle: https://www.kaggle.com/datasets/ananthu017/california-wildfire-incidents-20132020
- **Collection Method:** Likely aggregated from fire incident reports submitted by various firefighting units and government agencies.

### 3. Census or Sample?
- This dataset appears to be a **census of reported fire incidents** in California rather than a sample, covering archive years 2013 to 2019.

### 4. Data Processing
- Some fields, such as **AcresBurned**, **Fatalities**, and **PersonnelInvolved**, appear to be numeric.
- Others, like **ConditionStatement** and **Status**, contain text descriptions.
- **Dates** are in an ISO 8601 format (e.g., `2013-08-17T15:25:00Z`).
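- Many values are missing, indicating incomplete reporting: fields such as `AirTankers`, `Fatalities`, and `FuelType` are empty in more than 95% of the 1,636 rows.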
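The numeric-versus-text split described above can be checked programmatically rather than by eye. The following is a minimal sketch on a toy frame: the column names match the dataset, but every value is made up for illustration. Note that date columns load as plain strings unless they are parsed explicitly, so they land in the non-numeric group.

```python
import pandas as pd

# Toy rows (values are invented; only the column names come from the dataset).
toy = pd.DataFrame({
    "AcresBurned": [120.0, 35.0, None],
    "Fatalities": [None, 1.0, None],
    "Status": ["Finalized", "Finalized", "Finalized"],
    "Started": ["2013-08-17T15:25:00Z", "2014-05-01T10:00:00Z", None],
})

# Columns pandas stores as numbers versus everything else (text, unparsed dates).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("other:  ", other_cols)
```

Running the same two `select_dtypes` calls on `df` would give the actual split for all 40 columns.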

### 5. Selection/Sampling Criteria
- The dataset includes **all reported incidents** rather than a sampled subset.
- Inclusion may still depend on a reporting threshold (size, impact, or agency), so the smallest fires may be absent.

### 6. Units and Sample Size
- **Unit of Analysis:** Each row represents an individual fire incident.
- **Sample Size (Total Fires Recorded):** 1,636 incidents.

### 7. Key Variables (40 total)
- **Fire Characteristics:** `AcresBurned`, `PercentContained`, `FuelType`
- **Location:** `Latitude`, `Longitude`, `Counties`
- **Response Details:** `Engines`, `AirTankers`, `Helicopters`, `CrewsInvolved`
- **Impact:** `Fatalities`, `Injuries`, `StructuresDamaged`, `StructuresDestroyed`
- **Dates:** `Started`, `Extinguished`, `Updated`
- **Incident Management:** `AdminUnit`, `MajorIncident`, `CalFireIncident`



## Fire Characteristics

### AcresBurned
**Description:** Total area burned in acres.

**Type:** Numerical (Continuous)

**Range:** 0 - 410,203 acres

### PercentContained
**Description:** Percentage of fire containment.

**Type:** Numerical (Continuous)

**Range:** 0 - 100% (every non-missing value in this dataset is 100)

### FuelType
**Description:** Type of fuel that contributed to the fire.

**Type:** Categorical

**Levels:** Grass, Brush, Timber, Mixed

## Location

### Latitude
**Description:** Geographic coordinate (latitude) of fire incident.

**Type:** Numerical (Continuous)

**Range:** -90 to 90 by definition; the raw column runs from -120.258 to 5487.0, so some entries are invalid

### Longitude
**Description:** Geographic coordinate (longitude) of fire incident.

**Type:** Numerical (Continuous)

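**Range:** -180 to 180 by definition; the raw column runs from -124.19629 to 118.9082, so some entries have the wrong sign for California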

### Counties
**Description:** County where the fire occurred.

**Type:** Categorical

**Levels:** Various California counties
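
The summary statistics in this notebook report a latitude minimum of -120.258 and maximum of 5487.0, and a longitude maximum of 118.9082, none of which can fall inside California. One way to flag such rows is a bounding-box check. This is a sketch only: the box limits are an assumed, slightly padded approximation of the state's extent, `flag_bad_coordinates` is a hypothetical helper, and the toy rows pair the extreme values with made-up partners.

```python
import pandas as pd

# Approximate bounding box for California (an assumption, padded outward a little).
LAT_MIN, LAT_MAX = 32.0, 42.5
LON_MIN, LON_MAX = -125.0, -114.0

def flag_bad_coordinates(frame):
    """Return a boolean Series that is True where a row falls outside the box."""
    lat_ok = frame["Latitude"].between(LAT_MIN, LAT_MAX)
    lon_ok = frame["Longitude"].between(LON_MIN, LON_MAX)
    # Missing coordinates compare as False in between(), so they are flagged too.
    return ~(lat_ok & lon_ok)

# Toy rows: one plausible point and three implausible ones.
toy = pd.DataFrame({
    "Latitude": [37.86, -120.258, 5487.0, 34.17],
    "Longitude": [-120.09, 38.5, -121.0, 118.9082],
})

bad = flag_bad_coordinates(toy)
print(toy[bad])
```

Flagged rows can then be inspected by hand; some may simply have latitude and longitude swapped or a missing minus sign.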

## Response Details

### Engines
**Description:** Number of fire engines deployed.

**Type:** Numerical (Discrete)

**Values:** 0, 1, 2, ...

### AirTankers
**Description:** Number of air tankers used.

**Type:** Numerical (Discrete)

**Values:** 0, 1, 2, ...

### Helicopters
**Description:** Number of helicopters involved.

**Type:** Numerical (Discrete)

**Values:** 0, 1, 2, ...

### CrewsInvolved
**Description:** Number of firefighting crews involved.

**Type:** Numerical (Discrete)

**Values:** 0, 1, 2, ...

## Impact

### Fatalities
**Description:** Number of reported fatalities due to the fire.

**Type:** Numerical (Discrete)

**Values:** 0, 1, 2, ...

### Injuries
**Description:** Number of people injured due to the fire.

**Type:** Numerical (Discrete)

**Values:** 0, 1, 2, ...

### StructuresDamaged
**Description:** Number of structures damaged.

**Type:** Numerical (Discrete)

**Values:** 0, 1, 2, ...

### StructuresDestroyed
**Description:** Number of structures completely destroyed.

**Type:** Numerical (Discrete)

**Values:** 0, 1, 2, ...

## Dates

### Started
**Description:** Date and time when the fire started.

**Type:** DateTime

**Format:** ISO 8601 (YYYY-MM-DDTHH:MM:SSZ)

### Extinguished
**Description:** Date and time when the fire was fully extinguished.

**Type:** DateTime

**Format:** ISO 8601 (YYYY-MM-DDTHH:MM:SSZ)

### Updated
**Description:** Last update timestamp of the fire record.

**Type:** DateTime

**Format:** ISO 8601 (YYYY-MM-DDTHH:MM:SSZ)
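
Because the three date columns are ISO 8601 strings, they need to be parsed before any time arithmetic. The sketch below derives a fire duration from `Started` and `Extinguished`. `DurationDays` and `add_duration_days` are hypothetical names introduced here, and the toy timestamps are made up.

```python
import pandas as pd

def add_duration_days(frame):
    """Return a copy with a DurationDays column (Extinguished minus Started)."""
    out = frame.copy()
    # errors="coerce" turns missing or malformed timestamps into NaT.
    started = pd.to_datetime(out["Started"], utc=True, errors="coerce")
    ended = pd.to_datetime(out["Extinguished"], utc=True, errors="coerce")
    out["DurationDays"] = (ended - started).dt.total_seconds() / 86400
    return out

# Toy rows (invented values in the dataset's timestamp format).
toy = pd.DataFrame({
    "Started": ["2013-08-17T15:25:00Z", "2014-01-01T00:00:00Z"],
    "Extinguished": ["2013-09-06T18:30:00Z", None],
})

with_duration = add_duration_days(toy)
print(with_duration["DurationDays"])
```

Rows with a missing `Extinguished` value get a `NaN` duration rather than raising an error.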

## Incident Management

### AdminUnit
**Description:** Fire management agency responsible for the fire.

**Type:** Categorical

**Levels:** Various fire management agencies

### MajorIncident
**Description:** Whether the fire was classified as a major incident.

**Type:** Boolean

**Levels:** True, False

### CalFireIncident
**Description:** Whether CAL FIRE was involved in managing the incident.

**Type:** Boolean

**Levels:** True, False



In [None]:
df = pd.read_csv("California_Fire_Incidents.csv")
df

In [4]:
df.describe()

Unnamed: 0,AcresBurned,AirTankers,ArchiveYear,CrewsInvolved,Dozers,Engines,Fatalities,Helicopters,Injuries,Latitude,Longitude,PercentContained,PersonnelInvolved,StructuresDamaged,StructuresDestroyed,StructuresEvacuated,StructuresThreatened,WaterTenders
count,1633.0,28.0,1636.0,171.0,123.0,191.0,21.0,84.0,120.0,1636.0,1636.0,1633.0,204.0,67.0,175.0,0.0,30.0,146.0
mean,4589.443968,4.071429,2016.608802,11.561404,7.585366,23.565445,8.619048,5.357143,3.5,37.203975,-108.082642,100.0,328.553922,67.970149,271.788571,,522.8,7.815068
std,27266.337722,6.399818,1.84534,14.455633,14.028616,41.004424,18.529642,7.265437,3.806231,135.40138,37.006927,0.0,521.138789,155.771975,1557.255963,,739.586856,12.719251
min,0.0,0.0,2013.0,0.0,0.0,0.0,1.0,0.0,0.0,-120.258,-124.19629,100.0,0.0,0.0,0.0,,0.0,1.0
25%,35.0,2.0,2015.0,2.5,1.0,5.0,1.0,1.0,1.0,34.165891,-121.768358,100.0,55.0,1.0,1.0,,0.0,2.0
50%,100.0,2.0,2017.0,6.0,2.0,11.0,3.0,2.0,3.0,37.104065,-120.46156,100.0,151.5,6.0,7.0,,45.0,4.0
75%,422.0,4.0,2018.0,13.5,5.0,24.0,6.0,5.0,4.0,39.086808,-117.474073,100.0,350.0,49.5,41.5,,1043.75,6.0
max,410203.0,27.0,2019.0,82.0,76.0,256.0,85.0,29.0,26.0,5487.0,118.9082,100.0,3100.0,783.0,18804.0,,2600.0,79.0
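
The summary above shows a mean `AcresBurned` of about 4,589 acres against a median of 100, which points to a strongly right-skewed distribution. A common way to work with such a column is to compare mean and median directly and to apply a `log1p` transform before plotting. The sketch below uses only the five quantile values from the table as stand-in data points, so its mean is not the dataset's mean.

```python
import numpy as np
import pandas as pd

# Stand-in data: the min, quartiles, and max of AcresBurned from the summary table.
acres = pd.Series([0.0, 35.0, 100.0, 422.0, 410203.0], name="AcresBurned")

summary = {
    "mean": acres.mean(),
    "median": acres.median(),
    # log1p(x) = log(1 + x) handles zero-acre records without producing -inf.
    "log1p_mean": np.log1p(acres).mean(),
}
print(summary)
```

On the full data, `np.log1p(df["AcresBurned"])` would be the analogous transform.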


In [9]:
df.info()

Check missing values

In [10]:
df.isnull().sum()


AcresBurned                3
Active                     0
AdminUnit                  0
AirTankers              1608
ArchiveYear                0
CalFireIncident            0
CanonicalUrl               0
ConditionStatement      1352
ControlStatement        1531
Counties                   0
CountyIds                  0
CrewsInvolved           1465
Dozers                  1513
Engines                 1445
Extinguished              59
Fatalities              1615
Featured                   0
Final                      0
FuelType                1624
Helicopters             1552
Injuries                1516
Latitude                   0
Location                   0
Longitude                  0
MajorIncident              0
Name                       0
PercentContained           3
PersonnelInvolved       1432
Public                     0
SearchDescription         17
SearchKeywords           203
Started                    4
Status                     0
StructuresDamaged       1569
StructuresDest
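
Raw counts are easier to interpret as a share of rows: `AirTankers`, for example, is missing in 1,608 of 1,636 records, roughly 98%. The following sketch builds such a report. `missing_report` is a hypothetical helper and the toy frame contains invented values.

```python
import pandas as pd

def missing_report(frame):
    """Return per-column missing counts and shares, most incomplete first."""
    report = pd.DataFrame({
        "missing": frame.isnull().sum(),
        "share": frame.isnull().mean(),  # the mean of a boolean mask is a fraction
    })
    return report.sort_values("share", ascending=False)

# Toy rows (invented values; column names come from the dataset).
toy = pd.DataFrame({
    "AirTankers": [None, None, None, 2.0],
    "AcresBurned": [120.0, 35.0, 9.0, None],
    "Name": ["A", "B", "C", "D"],
})

report = missing_report(toy)
print(report)
```

Calling `missing_report(df)` would rank all 40 columns by how incomplete they are.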

In [None]:
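# dropna() removes every row with at least one missing value. StructuresEvacuated
# is missing in all 1,636 rows, so df0 ends up with no rows.
df0 = df.dropna()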


Unnamed: 0,AcresBurned,Active,AdminUnit,AirTankers,ArchiveYear,CalFireIncident,CanonicalUrl,ConditionStatement,ControlStatement,Counties,...,SearchKeywords,Started,Status,StructuresDamaged,StructuresDestroyed,StructuresEvacuated,StructuresThreatened,UniqueId,Updated,WaterTenders
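
The header-only output above reflects that `StructuresEvacuated` has no non-missing values, so a plain `dropna()` discards every row. Two gentler alternatives are sketched below on a toy frame with invented values: restricting the check to a few core columns with `subset`, or first dropping columns that are entirely empty. Which columns count as "core" is an assumption made here for illustration.

```python
import pandas as pd

# Toy rows (invented); StructuresEvacuated is entirely missing, as in the real data.
toy = pd.DataFrame({
    "AcresBurned": [120.0, None, 35.0],
    "Started": ["2013-08-17T15:25:00Z", "2014-05-01T10:00:00Z", None],
    "StructuresEvacuated": [None, None, None],
})

# Plain dropna(): any missing value disqualifies a row, so nothing survives.
all_dropped = toy.dropna()

# Option 1: require only the columns the analysis actually needs.
core = toy.dropna(subset=["AcresBurned", "Started"])

# Option 2: remove columns that are empty everywhere, then drop incomplete rows.
trimmed = toy.dropna(axis=1, how="all")
complete = trimmed.dropna()

print(len(all_dropped), len(core), len(complete))
```

Either option keeps usable rows; the right choice depends on which variables a given analysis relies on.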


In [14]:
df0.isnull().sum()


AcresBurned             0
Active                  0
AdminUnit               0
AirTankers              0
ArchiveYear             0
CalFireIncident         0
CanonicalUrl            0
ConditionStatement      0
ControlStatement        0
Counties                0
CountyIds               0
CrewsInvolved           0
Dozers                  0
Engines                 0
Extinguished            0
Fatalities              0
Featured                0
Final                   0
FuelType                0
Helicopters             0
Injuries                0
Latitude                0
Location                0
Longitude               0
MajorIncident           0
Name                    0
PercentContained        0
PersonnelInvolved       0
Public                  0
SearchDescription       0
SearchKeywords          0
Started                 0
Status                  0
StructuresDamaged       0
StructuresDestroyed     0
StructuresEvacuated     0
StructuresThreatened    0
UniqueId                0
Updated     

In [6]:
n = len(df)
n

1636