<a href="https://colab.research.google.com/github/SeanG347/kaggle-projects/blob/main/American_Wildfire_Geospatial_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.
import kagglehub
rtatman_188_million_us_wildfires_path = kagglehub.dataset_download('rtatman/188-million-us-wildfires')

print('Data source import complete.')


# Imports

In [None]:
import math
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import sqlite3
import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster
import os

# Connecting to Database, Creating DataFrame

In [None]:
# Open a SQLite connection to the database, create a cursor, and run a first
# query that lists the name of every object (tables, indexes, etc.) stored
# in the database.

# Build the path from the directory returned by kagglehub.dataset_download above.
filepath = os.path.join(rtatman_188_million_us_wildfires_path, 'FPA_FOD_20170508.sqlite')

sqliteConnection = sqlite3.connect(filepath)
cursor = sqliteConnection.cursor()

query = 'SELECT Name FROM sqlite_master;'

cursor.execute(query)
cursor.fetchall()

- It turns out that this database contains a fairly large number of tables. There are a couple of options at this point:
    - Inspect the tables by hand until we find one with 'latitude' and 'longitude' attributes.
    - Write a loop that checks each table in turn until it finds one with the 'latitude' and 'longitude' attributes.
- The answer turned out to be fairly obvious, however: the location data is in the 'Fires' table.
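The loop option described above could look something like the sketch below. The helper name `tables_with_columns` and the toy in-memory database are assumptions made for illustration; they are not part of the wildfire dataset. Tables that cannot be inspected are skipped rather than treated as errors.

```python
import sqlite3

def tables_with_columns(conn, required):
    """Return the tables whose columns include every name in `required`.

    Hypothetical helper: column names are compared case-insensitively.
    """
    required = {name.lower() for name in required}
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    matches = []
    for (table_name,) in cur.fetchall():
        try:
            # PRAGMA table_info returns one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        except sqlite3.OperationalError:
            # Skip any table that SQLite cannot describe.
            continue
        names = {column[1].lower() for column in columns}
        if required <= names:
            matches.append(table_name)
    return matches

# Toy database standing in for the real file.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE Fires (FIRE_SIZE REAL, LATITUDE REAL, LONGITUDE REAL)')
demo.execute('CREATE TABLE Agencies (name TEXT)')

found = tables_with_columns(demo, ['latitude', 'longitude'])
print(found)
```

With the real database, the same function could be called on `sqliteConnection` instead of the toy connection.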

In [None]:
# This query returns the column metadata (name, type, etc.) for the 'Fires' table.

query = 'PRAGMA table_info (Fires)'

cursor.execute(query)
cursor.fetchall()

In [None]:
# Define and execute the query that retrieves the longitude, latitude, and
# fire size attributes from the 'Fires' table.
# The results are assigned to the temporary 'table' variable.

query = 'SELECT longitude, latitude, fire_size FROM Fires'

cursor.execute(query)
table = cursor.fetchall()

In [None]:
# The temporary 'table' variable is a list of (longitude, latitude, fire_size)
# tuples, in the same order as the columns in the SELECT statement, so it can
# be passed straight to the DataFrame constructor with matching column names.

df = pd.DataFrame(table, columns=['Longitude', 'Latitude', 'fire_size'])

# Because the dataset is so large, it makes sense to limit the number of rows.
# This reduces runtime and also makes the maps less cluttered.
# df2 keeps the 100000 largest fires, sorted from largest to smallest.


df2 = df.sort_values(by=['fire_size'], ascending=False)[0:100000]
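
As an alternative to sorting in pandas, the limiting could be pushed into the SQL query so that only the largest rows are ever loaded. This is a minimal sketch using a toy in-memory table; the sample rows and the `top_n` value are made up for illustration, and the column names simply mirror the ones used in this notebook.

```python
import sqlite3
import pandas as pd

# Toy table standing in for the real 'Fires' table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Fires (LONGITUDE REAL, LATITUDE REAL, FIRE_SIZE REAL)')
conn.executemany(
    'INSERT INTO Fires VALUES (?, ?, ?)',
    [(-150.0, 65.0, 500000.0),
     (-120.0, 40.0, 10.0),
     (-100.0, 31.0, 2500.0),
     (-82.0, 29.0, 0.5)],
)

top_n = 2  # illustrative; the notebook keeps 100000 rows

# ORDER BY ... LIMIT lets SQLite return only the largest fires.
query = '''
    SELECT LONGITUDE AS Longitude, LATITUDE AS Latitude, FIRE_SIZE AS fire_size
    FROM Fires
    ORDER BY FIRE_SIZE DESC
    LIMIT ?
'''
largest = pd.read_sql_query(query, conn, params=(top_n,))
print(largest)
```

This avoids building a DataFrame of every record when only the largest fires are needed, although the full table is still required for the heatmap.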


# First map: 50 largest wildfires recorded in the database.


In [None]:
# Creating a map with markers indicating the location and size of the 50 biggest wildfires.

M1 = folium.Map(location=[50,-120], tiles='openstreetmap', zoom_start=3.35)

# df2 is sorted by size, so head(50) selects the 50 largest fires.
for idx, row in df2.head(50).iterrows():
    Circle(location=[row['Latitude'], row['Longitude']],
           radius=row['fire_size'] / 5,
           color='red',
           fill=True,
           fill_opacity=0.4
          ).add_to(M1)

M1

- Although there are some 1.88 million records in the database, drawing 1.88 million markers on the map is impractical, so this is instead a map of the 50 largest wildfires in the database, with the size of each marker scaled to the fire's recorded size.
- Most of the largest wildfires appear to be in Alaska, which is likely due to the sheer extent of undeveloped land there (lack of urbanization).
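
The map above scales each radius as `fire_size / 5`, which is a visual scaling rather than a physical one. If a circle whose area matches the burned area is wanted, the radius can be derived from the area. The sketch below assumes `fire_size` is recorded in acres and that the circle radius is given in meters; the function name is hypothetical.

```python
import math

SQ_METERS_PER_ACRE = 4046.8564224  # one international acre in square meters

def radius_m_for_acres(acres):
    """Radius in meters of a circle whose area equals `acres` (assumed unit: acres)."""
    area_m2 = acres * SQ_METERS_PER_ACRE
    # area = pi * r^2, so r = sqrt(area / pi)
    return math.sqrt(area_m2 / math.pi)

# Example: a fire of 100000 acres corresponds to a radius of roughly 11 km.
r = radius_m_for_acres(100_000)
print(round(r))
```

A value like this could be passed as `radius` to `Circle` in place of `row['fire_size'] / 5`, at the cost of small fires becoming very hard to see at a continental zoom level.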

# Second map: Using MarkerClusters to plot the 100000 largest wildfires

In [None]:
# This will create a map with MarkerClusters showing the 100000 largest wildfires

M2 = folium.Map(location=[50,-120], tiles='openstreetmap', zoom_start=3.35)

MC = MarkerCluster()

for idx, row in df2.iterrows():
    if not math.isnan(row['Latitude']) and not math.isnan(row['Longitude']):
        MC.add_child(Marker([row['Latitude'],row['Longitude']]))

M2.add_child(MC)

- This map uses MarkerClusters to show the locations of the 100000 biggest wildfires.

- The map suggests that the majority of large-scale wildfires in the US occur in the southeastern portion of the map, i.e., from Texas east to Florida. Of the 100000 largest wildfires, approximately 50000 occur in this band. There are also many wildfires in the northwestern portion of the map, i.e., Northern California, Oregon, and Utah.
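
The count read off the clusters could be checked numerically with a bounding box. This is only a sketch: the box coordinates below are a rough, hypothetical outline of the Texas-to-Florida band, and the sample rows are made up. Rows with missing coordinates count as outside the box.

```python
import pandas as pd

# Hypothetical bounding box roughly covering Texas east to Florida.
SOUTHEAST_BOX = {'lat_min': 25.0, 'lat_max': 37.0,
                 'lon_min': -107.0, 'lon_max': -75.0}

def share_in_box(frame, box):
    """Return (count, fraction) of rows whose coordinates fall inside `box`."""
    inside = (frame['Latitude'].between(box['lat_min'], box['lat_max'])
              & frame['Longitude'].between(box['lon_min'], box['lon_max']))
    return int(inside.sum()), float(inside.mean())

# Made-up sample: two points inside the box, two outside.
sample = pd.DataFrame({'Latitude': [30.0, 33.5, 64.0, 45.0],
                       'Longitude': [-95.0, -84.0, -150.0, -120.0]})

count, share = share_in_box(sample, SOUTHEAST_BOX)
print(count, share)
```

Calling `share_in_box(df2, SOUTHEAST_BOX)` would give the corresponding figures for the 100000 largest fires, subject to how the box is drawn.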

# Third map: Heatmap with all 1.88 million wildfires

In [None]:
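# Heatmap of every fire location in the database. Rows with missing
# coordinates are dropped first, since HeatMap does not accept NaN values.

M3 = folium.Map(location=[50,-120], tiles='openstreetmap', zoom_start=3.35)

HeatMap(data=df[['Latitude', 'Longitude']].dropna(), radius=8).add_to(M3)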

M3