The contents of this notebook show how I've come up with my Smoke Estimate.

# Section 1: Importing Libraries and Initial Setup

In [3]:
import pandas as pd
import re

In [4]:
filter_df = pd.read_csv('/Users/aviva/Desktop/MSDS/quarter_4/Human Centered Data Science/Part 1/GeoJSON Exports/intermediate_data/filter.csv')

Trying to see the contents of my Overlap Flag column to see how I can leverage it.

In [5]:
print(filter_df['OverlapFlags'].unique())

[nan
 'Caution, this Wildfire in 1963 overlaps with a Wildfire that occurred in 1964 (1 year difference). The overlapping fire overlaps by 11.9% (592.0 acres). Overlapping fire USGS Assigned ID: 14618.'
 'Caution, this Wildfire in 1963 overlaps with a Wildfire that occurred in 1962 (1 year difference). The overlapping fire overlaps by 69.4% (220.0 acres). Overlapping fire USGS Assigned ID: 14005.'
 ...
 'Caution, this Prescribed Fire in 2020 overlaps with a Prescribed Fire that occurred in 2018 (2 year difference). The overlapping fire overlaps by 100.0% (20.0 acres). Overlapping fire USGS Assigned ID: 124961.'
 'Caution, this Prescribed Fire in 2020 overlaps with a Prescribed Fire that occurred in 2019 (1 year difference). The overlapping fire overlaps by 100.0% (7.0 acres). Overlapping fire USGS Assigned ID: 135002.'
 'Caution, this Prescribed Fire in 2020 overlaps with a Prescribed Fire that occurred in 2019 (1 year difference). The overlapping fire overlaps by 100.0% (9.0 acres). O

In [6]:
filter_df.head()

Unnamed: 0,OBJECTID,FireType,FireYear,GISAcres,OverlapFlags,shortest_dist
0,14299,Wildfire,1963,40992.458271,,1045.62
1,14300,Wildfire,1963,25757.090203,,1074.55
2,14301,Wildfire,1963,45527.210986,,1038.72
3,14302,Wildfire,1963,10395.010334,,990.02
4,14303,Wildfire,1963,9983.605738,,1034.55


The objective is now to estimate the impact of smoke from a wildfire on Salina.

We have two primary pieces of information: the number of acres burned by the wildfire `GISAcres` and the distance between the wildfire and the city `shortest_dist`.     

The first step is to acknowledge that several elements, including as plant type, fire intensity, wind direction, weather, and terrain, affect how much smoke is created and spread. This intricate problem is made simpler by introducing a proportional connection. It is expected that the amount of smoke that reaches the city is inversely related to the distance from the fire to the city and directly proportional to the size of the fire (measured in acres).   

`Smoke Factor`  
To account for the variation in smoke production per acre due to factors like vegetation type and fire intensity, a constant factor called Smoke_Factor is introduced. This factor provides a simplification of various elements that affect the amount of smoke produced per acre.  

This factor takes into account Acres Burned and Distance from Salina. More about this:  

`Acre Classification`  
We categorize wildfires into three size categories: "small fire," "medium fire," and "large fire." These categories are determined based on the number of acres burned by the wildfire. For example, a "small fire" could represent fires with fewer acres burned, while a "large fire" could represent massive fires.  

`Distance Classification`    
Wildfires are also categorized by their proximity to the city, resulting in classifications like "close fire," "intermediate fire," and "far fire."   

Based on these classification values, we then define a mapping of Smoke_Factor based on combinations of acre classification and dist classification. For instance:  

- A "small fire" that is "close to the city" may have a Smoke Factor of 0.1.  
- A "medium fire" that is "far from the city" may have a Smoke Factor of 0.3.  
- A "large fire" that is "close to the city" may have a Smoke Factor of 0.9.  

`Overlap Factor`     
We also introduce an Overlap_Factor, which considers the cumulative effects of previous fires and the extent to which they overlap with the current fire's location and time. This factor helps account for how multiple fires, over time and space, may contribute to the overall smoke production and dispersion. This factor takes into account both the time since the previous burn `Years_Since_Previous_Burn` and the extent of the overlap between the wildfires `Overlap_Percentage` which are extracted from the `OverlapFlags` column.    

**Finally, we construct the formula:**.      

`Smoke_Estimate = (Burned_Acres / Distance_from_City) * Smoke_Factor * (1 + Overlap_Factor)`   

The Overlap_Factor is computed using information from the 'OverlapFlags' column. If no overlap information is available, it is set to 0. If overlap information is present, the code extracts the years since the previous burn and the overlap percentage. The Overlap_Factor is then computed as    

`(Years_Since_Previous_Burn + 1) * (1 + (Overlap_Percentage / 100))`   



Some readings which contributed to this estimation:  
1. [Wildland fire growth prediction method based on Multiple Overlapping Solution](https://www.sciencedirect.com/science/article/abs/pii/S1877750310000463)
2. [Vegetation response to a short interval between high-severity wildfires in a mixed-evergreen forest](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/j.1365-2745.2008.01456.x)
3. [Integrating multiple factors to optimize watchtower deployment for wildfire detection](https://www.sciencedirect.com/science/article/pii/S0048969720330783?casa_token=RwM-EUbbC6MAAAAA:XmccsTeNzvDZeBSFQOWeFB2Ew78fxUXJemoPj1xep32vsfMeOrAExhkgiTSBdra3zri671-yGF4)

4. [Fire behavior and smoke modeling: Model improvement and measurement needs for next-generation smoke research and forecasting systems](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7336523/#:~:text=Fire%20behavior%20and%20smoke%20models%20are%20numerical%20tools%20that%20provide,of%20wildland%20fires%20and%20develop)

# Section 2: Function to calculate Overlap Factor 

Extracting information about the number of years it has been since previous burn and the overlap percentage present in the OverlapFlags column

In [7]:
def calculate_overlap_factor(overlap_flags):
    if pd.isna(overlap_flags):
        return 0  

    if "No overlap information available" in overlap_flags:
        return 0 
    year_pattern = r'Wildfire in (\d{4}) overlaps with a Wildfire that occurred in (\d{4})'
    overlap_pattern = r'overlaps by ([\d.]+)%'

    years_match = re.search(year_pattern, overlap_flags)
    overlap_match = re.search(overlap_pattern, overlap_flags)

    if years_match and overlap_match:
        current_year = int(years_match.group(1))
        previous_year = int(years_match.group(2))
        overlap_percentage = float(overlap_match.group(1))

        years_since_previous_burn = current_year - previous_year

        if years_since_previous_burn < 0:
            years_since_previous_burn = 0  

        overlap_factor = (years_since_previous_burn + 1) * (1 + (overlap_percentage / 100))
        return years_since_previous_burn, overlap_percentage, overlap_factor
    else:
        return 0, 0, 0  

Computing Overlap Factor

In [8]:
filter_df['Years_Since_Previous_Burn'] = filter_df['OverlapFlags'].apply(calculate_overlap_factor)
filter_df['Overlap_Percentage'] = filter_df['OverlapFlags'].apply(calculate_overlap_factor)
filter_df['Overlap_Factor'] = filter_df['OverlapFlags'].apply(calculate_overlap_factor)

filter_df.head()

Unnamed: 0,OBJECTID,FireType,FireYear,GISAcres,OverlapFlags,shortest_dist,Years_Since_Previous_Burn,Overlap_Percentage,Overlap_Factor
0,14299,Wildfire,1963,40992.458271,,1045.62,0,0,0
1,14300,Wildfire,1963,25757.090203,,1074.55,0,0,0
2,14301,Wildfire,1963,45527.210986,,1038.72,0,0,0
3,14302,Wildfire,1963,10395.010334,,990.02,0,0,0
4,14303,Wildfire,1963,9983.605738,,1034.55,0,0,0


In [9]:
filter_df.count()

OBJECTID                     91781
FireType                     91781
FireYear                     91781
GISAcres                     91781
OverlapFlags                 13475
shortest_dist                91781
Years_Since_Previous_Burn    91781
Overlap_Percentage           91781
Overlap_Factor               91781
dtype: int64

Sanity checking the Overlap Factor

In [10]:
filter_df['Years_Since_Previous_Burn'] = filter_df['Years_Since_Previous_Burn'].apply(lambda x: x[0] if isinstance(x, tuple) else x)
filter_df['Overlap_Percentage'] = filter_df['Overlap_Percentage'].apply(lambda x: x[0] if isinstance(x, tuple) else x)
filter_df['Overlap_Factor'] = filter_df['Overlap_Factor'].apply(lambda x: x[0] if isinstance(x, tuple) else x)

print("Number of rows with Overlap_Factor equal to 0:", len(filter_df[filter_df['Overlap_Factor'] == 0]))
print("Number of rows with negative Overlap_Factor:", len(filter_df[filter_df['Overlap_Factor'] < 0]))
print("Number of rows with positive Overlap_Factor:", len(filter_df[filter_df['Overlap_Factor'] > 0]))

Number of rows with Overlap_Factor equal to 0: 90164
Number of rows with negative Overlap_Factor: 0
Number of rows with positive Overlap_Factor: 1617


# Section 3: Calculating the Smoke Factor

Checking the data in the GISAcres and Shortest_Distance columns

In [11]:
gisacres_range = (filter_df['GISAcres'].min(), filter_df['GISAcres'].max())
shortest_distance_range = (filter_df['shortest_dist'].min(), filter_df['shortest_dist'].max())

print("Range of GISAcres: ({}, {})".format(gisacres_range[0], gisacres_range[1]))
print("Range of Shortest_Distance: ({}, {})".format(shortest_distance_range[0], shortest_distance_range[1]))

Range of GISAcres: (2.2753461981788584e-08, 1566273.1853343395)
Range of Shortest_Distance: (5.51, 1249.99)


Coming up with acre classification and distance classification 

In [12]:
acre_boundaries = [0, 50000, 500000, filter_df['GISAcres'].max()]
dist_boundaries = [0, 200, 600, filter_df['shortest_dist'].max()]

acre_labels = ['small fire', 'medium fire', 'large fire']
dist_labels = ['close fire', 'intermediate fire', 'far fire']

filter_df['acre classification'] = pd.cut(filter_df['GISAcres'], bins=acre_boundaries, labels=acre_labels)
filter_df['dist classification'] = pd.cut(filter_df['shortest_dist'], bins=dist_boundaries, labels=dist_labels)

filter_df.head()

Unnamed: 0,OBJECTID,FireType,FireYear,GISAcres,OverlapFlags,shortest_dist,Years_Since_Previous_Burn,Overlap_Percentage,Overlap_Factor,acre classification,dist classification
0,14299,Wildfire,1963,40992.458271,,1045.62,0,0,0,small fire,far fire
1,14300,Wildfire,1963,25757.090203,,1074.55,0,0,0,small fire,far fire
2,14301,Wildfire,1963,45527.210986,,1038.72,0,0,0,small fire,far fire
3,14302,Wildfire,1963,10395.010334,,990.02,0,0,0,small fire,far fire
4,14303,Wildfire,1963,9983.605738,,1034.55,0,0,0,small fire,far fire


Assigning smoke factor values to combinations of acre and distance classification

In [13]:
smoke_factor_mapping = {
    ('small fire', 'close fire'): 0.1,             #smaller fires close to the city
    ('small fire', 'intermediate fire'): 0.2,      #Smaller fires at an intermediate distance
    ('small fire', 'far fire'): 0.01,              #Smaller fires far from the city (minimum value)
    ('medium fire', 'close fire'): 0.5,            #Medium-sized fires close to the city
    ('medium fire', 'intermediate fire'): 0.4,     #Medium-sized fires at an intermediate distance
    ('medium fire', 'far fire'): 0.3,              #Medium-sized fires far from the city
    ('large fire', 'close fire'): 0.9,             #Larger fires close to the city (maximum value)
    ('large fire', 'intermediate fire'): 0.8,      #Larger fires at an intermediate distance
    ('large fire', 'far fire'): 0.7,               #Larger fires far from the city
}

filter_df['Smoke_Factor'] = filter_df.apply(lambda row: smoke_factor_mapping.get((row['acre classification'], row['dist classification'])), axis=1)

In [14]:
filter_df.head()

Unnamed: 0,OBJECTID,FireType,FireYear,GISAcres,OverlapFlags,shortest_dist,Years_Since_Previous_Burn,Overlap_Percentage,Overlap_Factor,acre classification,dist classification,Smoke_Factor
0,14299,Wildfire,1963,40992.458271,,1045.62,0,0,0,small fire,far fire,0.01
1,14300,Wildfire,1963,25757.090203,,1074.55,0,0,0,small fire,far fire,0.01
2,14301,Wildfire,1963,45527.210986,,1038.72,0,0,0,small fire,far fire,0.01
3,14302,Wildfire,1963,10395.010334,,990.02,0,0,0,small fire,far fire,0.01
4,14303,Wildfire,1963,9983.605738,,1034.55,0,0,0,small fire,far fire,0.01


Applying final formula to calculate the Smoke Estimate

# Section 4: Calculating the Smoke Estimate

In [15]:
filter_df['Smoke_Estimate'] = (filter_df['GISAcres'] / filter_df['shortest_dist']) * filter_df['Smoke_Factor'] * (1 + filter_df['Overlap_Factor'])
filter_df

Unnamed: 0,OBJECTID,FireType,FireYear,GISAcres,OverlapFlags,shortest_dist,Years_Since_Previous_Burn,Overlap_Percentage,Overlap_Factor,acre classification,dist classification,Smoke_Factor,Smoke_Estimate
0,14299,Wildfire,1963,40992.458271,,1045.62,0,0,0,small fire,far fire,0.01,0.392040
1,14300,Wildfire,1963,25757.090203,,1074.55,0,0,0,small fire,far fire,0.01,0.239701
2,14301,Wildfire,1963,45527.210986,,1038.72,0,0,0,small fire,far fire,0.01,0.438301
3,14302,Wildfire,1963,10395.010334,,990.02,0,0,0,small fire,far fire,0.01,0.104998
4,14303,Wildfire,1963,9983.605738,,1034.55,0,0,0,small fire,far fire,0.01,0.096502
...,...,...,...,...,...,...,...,...,...,...,...,...,...
91776,135052,Prescribed Fire,2020,60.879054,,1115.61,0,0,0,small fire,far fire,0.01,0.000546
91777,135056,Prescribed Fire,2020,14.545208,,1111.90,0,0,0,small fire,far fire,0.01,0.000131
91778,135058,Prescribed Fire,2020,7.050837,"Caution, this Prescribed Fire in 2020 overlaps...",1117.25,0,0,0,small fire,far fire,0.01,0.000063
91779,135059,Prescribed Fire,2020,9.342668,"Caution, this Prescribed Fire in 2020 overlaps...",1117.31,0,0,0,small fire,far fire,0.01,0.000084


In [16]:
selected_columns = ['FireYear', 'GISAcres', 'shortest_dist', 'Smoke_Estimate']
filter_df = filter_df[selected_columns]

In [18]:
filter_df.to_csv('/Users/aviva/Desktop/HCDS Project/salina_wildfire_analysis/intermediate_data/smoke_est_without_agg.csv', index=False)

In [17]:
filter_df

Unnamed: 0,FireYear,GISAcres,shortest_dist,Smoke_Estimate
0,1963,40992.458271,1045.62,0.392040
1,1963,25757.090203,1074.55,0.239701
2,1963,45527.210986,1038.72,0.438301
3,1963,10395.010334,990.02,0.104998
4,1963,9983.605738,1034.55,0.096502
...,...,...,...,...
91776,2020,60.879054,1115.61,0.000546
91777,2020,14.545208,1111.90,0.000131
91778,2020,7.050837,1117.25,0.000063
91779,2020,9.342668,1117.31,0.000084


Grouping the mean smoke estimate by year and taking the mean of the remaining columns

# Section 5: Final Dataframe with Smoke Estimate Data grouped by year

In [18]:
selected_columns = ['FireYear', 'GISAcres', 'shortest_dist', 'Smoke_Estimate']
smoke_est_df = filter_df[selected_columns]
smoke_est_df = smoke_est_df.groupby('FireYear').mean().reset_index()
smoke_est_df

Unnamed: 0,FireYear,GISAcres,shortest_dist,Smoke_Estimate
0,1963,766.773922,896.624518,0.022618
1,1964,1146.301446,870.262828,0.153983
2,1965,587.560328,992.53375,0.149189
3,1966,1397.433572,932.50522,0.111709
4,1967,1175.659287,1010.487956,0.098337
5,1968,778.607539,1047.34041,0.009388
6,1969,697.467811,1012.400827,0.009837
7,1970,1606.377982,870.597663,0.226435
8,1971,1403.523162,811.794534,0.256539
9,1972,788.254539,935.641934,0.404458


In [19]:
smoke_est_df.to_csv('/Users/aviva/Desktop/HCDS Project/salina_wildfire_analysis/intermediate_data/smoke_est.csv', index=False)