# Part 1 - Common Analysis

This notebook lays the groundwork for the project deliverables of the DATA 512 course project. It covers acquiring and preparing data from the combined wildfire dataset published by the US Geological Survey [1]. The project focuses on the city of Cheyenne, Wyoming.

### Preliminaries *
We import the libraries needed for data collection and processing.

In [240]:
#    Import some standard python modules
import json
#
#    The module pyproj is a standard module that can be installed using pip or your other favorite
#    installation tool. This module provides tools to convert between different geodesic coordinate systems
#    and for calculating distances between points (coordinates) in a specific geodesic system.
#
from pyproj import Transformer, Geod

import geojson

from tqdm import tqdm

# Pandas is a library for data manipulation, it would require installation if you do not have it already.
import pandas as pd
import os

In [2]:
#
#    CONSTANTS
#

EXTRACT_FILENAME = "USGS_Wildland_Fire_Combined_Dataset.json"
FILENAME = os.path.join(EXTRACT_FILENAME)
print(f"{FILENAME=}")


# Setting CITY_LOCATION to Cheyenne, WY
CITY_LOCATION = {
    'cheyenne': {'city': 'Cheyenne',
                 'latlon': [41.1400, -104.8202] }
}


FILENAME='USGS_Wildland_Fire_Combined_Dataset.json'


## Example 1. Load the wildfire data using the geojson module*

In this example we use the GeoJSON module ([documentation](https://pypi.org/project/geojson/), [GitHub repo](https://github.com/jazzband/geojson)) to load the wildfire file. This module works mostly the way you would expect. GeoJSON is mostly just JSON, so you don't strictly need the GeoJSON module to read it. The module does convert some geographic types into more useful objects, but this example, and the examples that follow, do not rely on those geo-specific features.


In [4]:
#
#    Open a file, load it with the geojson loader
#
print(f"Attempting to open '{FILENAME}'")
with open(FILENAME,"r") as f:
    gj_data = geojson.load(f)


Attempting to open 'USGS_Wildland_Fire_Combined_Dataset.json'


In [5]:
#    Print the keys from the object
#
gj_keys = list(gj_data.keys())
print("The loaded JSON dictionary has the following keys:")
print(gj_keys)
print()

The loaded JSON dictionary has the following keys:
['displayFieldName', 'fieldAliases', 'geometryType', 'spatialReference', 'fields', 'features']



## Example 3. Distance computations with Pyproj *

One issue in performing geodetic computation is that every geographic coordinate system ultimately describes positions on the surface of the earth, which is not flat. That means the distance between two points is measured along an arc, not a straight line. Further, the earth is not a true sphere; it is a type of ellipsoid. The amount of curvature therefore varies with where you are on the surface and with the direction of travel, which changes the distance.

Lucky for us there are geographers who like to write code and have built systems to simplify the computation of distances over the earth's surface. One of those systems is called [Pyproj](https://pyproj4.github.io/pyproj/stable/index.html). It has functions that will convert coordinate points between (almost) any two of the many different geographic coordinate systems. As well, Pyproj provides ways to compute distances between two points (mostly assuming the points are already in the same coordinate system).

This example uses the Geod() object to calculate the distance between a selected starting city and each of the cities defined in our CITY_LOCATION dictionary (see CONSTANTS above).

The example calls the computed distances 'straight line' distances, because that is the phrase you would have to use to find the distance between two cities using Google. Without language like that, Google would map roads between the source and the destination, and the result would never match our calculation.
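
As a rough cross-check on ellipsoidal distances, the spherical haversine formula can be written with the standard library alone. This is only a sketch: it assumes a perfectly spherical earth with a mean radius of 3958.8 miles, so its results will differ slightly from what pyproj's Geod computes on an ellipsoid. The Denver coordinates and the function name `haversine_miles` are illustrative and are not part of the original notebook.

```python
import math

# Assumption: spherical earth with this mean radius (not the WGS84 ellipsoid)
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


cheyenne = (41.1400, -104.8202)   # same latlon as CITY_LOCATION['cheyenne']
denver = (39.7392, -104.9903)     # illustrative second city, not in the notebook

approx_miles = haversine_miles(*cheyenne, *denver)
print(f"Cheyenne to Denver (spherical approximation): {approx_miles:.1f} miles")
```

A value in the high nineties of miles is what one would expect here; a large disagreement with a Geod result would usually point to swapped latitude and longitude arguments.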


## Example 4. Convert points between geodetic coordinate systems *

One of the constraints in doing geodetic computations is that most of the time we need to have our points (the coordinates for places) in the same geographic coordinate system. There are tons and tons of coordinate systems. You can find descriptions of many of them at [EPSG.io](https://epsg.io).

Looking at the wildfire header information (its keys are listed in the output of Example 1), we can see fields named "geometryType" and "spatialReference". They look like this:

        "geometryType": "esriGeometryPolygon",
        "spatialReference": {
            "wkid": 102008,
            "latestWkid": 102008
        },

This indicates that the geometry of our wildfire data are generic polygons and that they are expressed in a coordinate system with the well-known ID (WKID) 102008. This coordinate system is also known as [ESRI:102008](https://epsg.io/102008).

The Geod() object used for the distance computations described in Example 3 is created by a line of code that says:

    geocalc = Geod(ellps='WGS84')         # Use WGS84 ellipsoid representation of the earth

That string, 'WGS84', names a representation of the earth (an ellipsoid) that also underlies a well-known coordinate system sometimes called 'decimal degrees' (DD). That decimal degrees system has an official name (or WKID) of [EPSG:4326](https://epsg.io/4326).

For the example below, what we're going to do is take the geometry of a fire feature, extract the largest ring (i.e., the largest boundary of the fire) and convert all of the points in that ring from the ESRI:102008 coordinate system to EPSG:4326 coordinates.


In [6]:
#
#    Transform feature geometry data
#
#    The function takes one parameter, a list of ESRI:102008 coordinates that will be transformed to EPSG:4326
#    The function returns a list of coordinates in EPSG:4326
def convert_ring_to_epsg4326(ring_data=None):
    converted_ring = list()
    #
    # We use a pyproj transformer that converts from ESRI:102008 to EPSG:4326 to transform the list of coordinates
    to_epsg4326 = Transformer.from_crs("ESRI:102008","EPSG:4326")
    # We'll run through the list transforming each ESRI:102008 x,y coordinate into a decimal degree lat,lon
    for coord in ring_data:
        lat,lon = to_epsg4326.transform(coord[0],coord[1])
        new_coord = lat,lon
        converted_ring.append(new_coord)
    return converted_ring

## Example 5. Compute distance between a place and a wildfire*

The basic problem is knowing how far away a fire is from some location (like a city). One issue is that fires are irregularly shaped, so the answer depends on the exact shape and on how you want to think about the notion of 'distance'. For example, should we just find the closest point on the perimeter of a fire and call that the distance? Maybe we should find the centroid of the region, identify that as a geolocation (coordinate) and then calculate the distance to that? We can come up with numerous other ways.

The first approach, implemented in the code below, finds the point on the perimeter with the shortest distance to the city (place) and returns the distance as well as the lat,lon of that perimeter point.

A second approach, not used in the analysis below, calculates the average distance of all perimeter points to the city (place) and returns that average as the distance. This is not quite what the distance to the centroid would be, but it is probably fairly close.

These are two reasonable ways to think about possible distance to a fire. But both require computing distance to a whole set of points on the perimeter of a fire.
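
The averaging approach described above is not implemented in this notebook, so here is a minimal sketch of how it might look. To keep the sketch independent of pyproj, the distance calculation is passed in as a function; in the notebook that function would presumably wrap `Geod.inv` and convert meters to miles. The names `average_distance_to_perimeter`, `distance_fn` and `planar_distance` are hypothetical, and the planar distance is only a toy stand-in for a geodesic one.

```python
import math


def average_distance_to_perimeter(place, ring, distance_fn):
    """Mean of distance_fn(place, point) over the perimeter points of a ring."""
    # A closed polygon ring repeats its first point at the end; drop that duplicate
    # so the same perimeter point is not counted twice in the average.
    points = ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else ring
    distances = [distance_fn(place, point) for point in points]
    return sum(distances) / len(distances)


def planar_distance(a, b):
    """Toy stand-in for a geodesic distance: straight-line distance on a flat plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Made-up closed ring with two distinct points, 5 and 10 units away from the origin
toy_ring = [(3, 4), (6, 8), (3, 4)]
avg = average_distance_to_perimeter((0, 0), toy_ring, planar_distance)
print(avg)
```

Compared with the shortest-distance measure, the average is less sensitive to a single perimeter point that happens to reach toward the city, but it still requires a distance calculation for every point on the ring.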


In [7]:
#    
#    The function takes two parameters
#        place - a coordinate point (list or tuple with two items, (lat,lon)) in decimal degrees EPSG:4326
#        ring_data - a list of ESRI:102008 coordinates for the fire boundary (converted to decimal degrees inside the function)
#
#    The function returns a list containing the shortest distance to the perimeter and the point where that is
#
def shortest_distance_from_place_to_fire_perimeter(place=None,ring_data=None):
    # convert the ring data to the right coordinate system
    ring = convert_ring_to_epsg4326(ring_data)    
    # create a Geod object on the WGS84 ellipsoid, which is the ellipsoid EPSG:4326 is based on
    geodcalc = Geod(ellps='WGS84')
    closest_point = list()
    # run through each point in the converted ring data
    for point in ring:
        # calculate the distance
        d = geodcalc.inv(place[1],place[0],point[1],point[0])
        # convert the distance to miles
        distance_in_miles = d[2]*0.00062137
        # if it's closer to the city than the point we have, save it
        if not closest_point:
            closest_point.append(distance_in_miles)
            closest_point.append(point)
        elif closest_point and closest_point[0]>distance_in_miles:
            closest_point = list()
            closest_point.append(distance_in_miles)
            closest_point.append(point)
    return closest_point


In [8]:
#    Get the city from the CITY_LOCATION constant as the starting position
place = CITY_LOCATION["cheyenne"]
attributes_list = []
count_curveRings = 0

for feature in tqdm(gj_data['features']):
    try:
        wf_year = feature['attributes']['Fire_Year']
        if 1963 <= wf_year <= 2023:
            ring_data = feature['geometry']['rings'][0]
        
            # Compute using the shortest distance to any point on the perimeter
            distance = shortest_distance_from_place_to_fire_perimeter(place['latlon'],ring_data)

            if distance[0] <= 1250.00:
                feature_attributes = feature['attributes']
                feature_attributes['Distance'] = distance[0]
                attributes_list.append(feature_attributes)
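    except Exception:
        # Any feature that fails above (typically one whose geometry has no 'rings' key,
        # e.g. it is stored under 'curveRings') is counted and skipped
        count_curveRings += 1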

print(f"Number of curveRings : {count_curveRings}")


# Create a DataFrame from the list of feature dictionaries
df = pd.DataFrame(attributes_list)


100%|███████████████████████████████████████████████████████████████████████████████████████| 135061/135061 [45:58<00:00, 48.96it/s]


Number of curveRings : 35


In [241]:
df = pd.DataFrame(attributes_list)

In [242]:
df.to_csv('data_with_distance.csv', index=False)

In [243]:
df.describe()

Unnamed: 0,OBJECTID,USGS_Assigned_ID,Fire_Year,Fire_Polygon_Tier,GIS_Acres,GIS_Hectares,Circleness_Scale,Circle_Flag,Shape_Length,Shape_Area,Distance
count,100569.0,100569.0,100569.0,100569.0,100569.0,100569.0,100569.0,7852.0,100569.0,100569.0,100569.0
mean,74393.4868,74393.4868,2002.541529,3.116557,1749.352,707.9376,0.479948,1.0,10612.0,7079376.0,721.045515
std,35642.763884,35642.763884,14.190653,2.709905,13439.42,5438.738,0.260132,0.0,65429.74,54387380.0,252.351801
min,14299.0,14299.0,1963.0,1.0,6.558795e-07,2.65425e-07,5.1e-05,1.0,0.3082688,0.00265425,9.434425
25%,43612.0,43612.0,1993.0,1.0,13.85425,5.606616,0.272176,1.0,1170.598,56066.16,551.850005
50%,72660.0,72660.0,2006.0,1.0,81.689,33.05836,0.448062,1.0,3281.98,330583.6,761.608554
75%,106720.0,106720.0,2014.0,7.0,638.8205,258.5215,0.657171,1.0,10008.75,2585215.0,903.421013
max,135061.0,135061.0,2020.0,8.0,1566273.0,633848.3,0.999917,1.0,17579480.0,6338483000.0,1249.997689


In [244]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100569 entries, 0 to 100568
Data columns (total 31 columns):
 #   Column                        Non-Null Count   Dtype  
---  ------                        --------------   -----  
 0   OBJECTID                      100569 non-null  int64  
 1   USGS_Assigned_ID              100569 non-null  int64  
 2   Assigned_Fire_Type            100569 non-null  object 
 3   Fire_Year                     100569 non-null  int64  
 4   Fire_Polygon_Tier             100569 non-null  int64  
 5   Fire_Attribute_Tiers          100569 non-null  object 
 6   GIS_Acres                     100569 non-null  float64
 7   GIS_Hectares                  100569 non-null  float64
 8   Source_Datasets               100569 non-null  object 
 9   Listed_Fire_Types             100569 non-null  object 
 10  Listed_Fire_Names             100569 non-null  object 
 11  Listed_Fire_Codes             100569 non-null  object 
 12  Listed_Fire_IDs               99745 non-null

In [245]:
df['Listed_Fire_Dates'] = df['Listed_Fire_Dates'].astype(str)

In [246]:
# Checking for null values in each column
df.isnull().sum()

OBJECTID                            0
USGS_Assigned_ID                    0
Assigned_Fire_Type                  0
Fire_Year                           0
Fire_Polygon_Tier                   0
Fire_Attribute_Tiers                0
GIS_Acres                           0
GIS_Hectares                        0
Source_Datasets                     0
Listed_Fire_Types                   0
Listed_Fire_Names                   0
Listed_Fire_Codes                   0
Listed_Fire_IDs                   824
Listed_Fire_IRWIN_IDs           30417
Listed_Fire_Dates                   0
Listed_Fire_Causes                  0
Listed_Fire_Cause_Class             0
Listed_Rx_Reported_Acres        79650
Listed_Map_Digitize_Methods     12627
Listed_Notes                    29712
Processing_Notes                28175
Wildfire_Notice                     0
Prescribed_Burn_Notice              0
Wildfire_and_Rx_Flag            99223
Overlap_Within_1_or_2_Flag      87686
Circleness_Scale                    0
Circle_Flag 

In [247]:
# Use a regex to extract the candidate start and end dates (YYYY-MM-DD) of each fire within range of Cheyenne

import re
def extract_dates(date_str):
    dates = re.findall(r'\d{4}-\d{2}-\d{2}', date_str)
    return dates

df['Dates'] = df['Listed_Fire_Dates'].apply(extract_dates)

In [248]:
# Split the dates into start and end dates
df['Start Date'] = df['Dates'].apply(lambda x: x[0] if len(x) > 0 else None)
df['End Date'] = df['Dates'].apply(lambda x: x[1] if len(x) > 1 else x[0] if len(x) > 0 else None)

# Drop the 'Dates' column
df.drop('Dates', axis=1, inplace=True)
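
To see what the two cells above do to a single value, the sketch below applies the same regex and the same start/end rule to one string. The sample string is made up for illustration and only imitates the general shape of the 'Listed_Fire_Dates' field; `split_start_end` is a hypothetical helper that mirrors the lambda expressions above.

```python
import re


def extract_dates(date_str):
    """Return every YYYY-MM-DD date found in the string, in order of appearance."""
    return re.findall(r'\d{4}-\d{2}-\d{2}', date_str)


def split_start_end(dates):
    """First date is the start; second date (if any) is the end, else the start is reused."""
    start = dates[0] if len(dates) > 0 else None
    end = dates[1] if len(dates) > 1 else start
    return start, end


# Made-up sample value, not copied from the dataset
sample = ("Listed Wildfire Discovery Date(s): 1963-08-06 (3) | "
          "Listed Other Fire Date(s): 1963-12-31 (1)")

dates = extract_dates(sample)
start, end = split_start_end(dates)
print(dates, start, end)
```

Note that the rule simply takes the second date in the string as the end date, whatever label precedes it, which is why some extracted end dates need to be treated with caution later on.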

In [249]:
# Save this file with the start and end dates extracted earlier
df.to_csv("start_end_dates_file.csv", index=False)

In [258]:
# Some extracted start or end dates fall before 1963 (earlier than any 'Fire_Year' in the selected range) or could not be
# parsed as dates at all (coerced to NaT). These entries are dropped.
df['Start Date'] = pd.to_datetime(df['Start Date'], format='%Y-%m-%d', errors='coerce')
df['End Date'] = pd.to_datetime(df['End Date'], format='%Y-%m-%d', errors='coerce')

threshold_year = pd.to_datetime('1963-01-01')

# Filter the DataFrame
filtered_df = df[(df['Start Date'] >= threshold_year) & (df['End Date'] >= threshold_year)]

# Reset the index of the filtered DataFrame
filtered_df.reset_index(drop=True, inplace=True)

In [259]:
# Quick Inspection of the Filtered Dataframe
filtered_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86562 entries, 0 to 86561
Data columns (total 33 columns):
 #   Column                        Non-Null Count  Dtype         
---  ------                        --------------  -----         
 0   OBJECTID                      86562 non-null  int64         
 1   USGS_Assigned_ID              86562 non-null  int64         
 2   Assigned_Fire_Type            86562 non-null  object        
 3   Fire_Year                     86562 non-null  int64         
 4   Fire_Polygon_Tier             86562 non-null  int64         
 5   Fire_Attribute_Tiers          86562 non-null  object        
 6   GIS_Acres                     86562 non-null  float64       
 7   GIS_Hectares                  86562 non-null  float64       
 8   Source_Datasets               86562 non-null  object        
 9   Listed_Fire_Types             86562 non-null  object        
 10  Listed_Fire_Names             86562 non-null  object        
 11  Listed_Fire_Codes           

In [260]:
# Keep only the calendar date of 'Start Date' (datetime.date objects, without the time component).
# filtered_df was created by boolean indexing, so take an explicit copy first to avoid
# pandas' SettingWithCopyWarning.
filtered_df = filtered_df.copy()
filtered_df['Start Date'] = filtered_df['Start Date'].dt.date


In [261]:
# Keep only the calendar date of 'End Date' (datetime.date objects, without the time component).
# An explicit copy avoids pandas' SettingWithCopyWarning when assigning the column.
filtered_df = filtered_df.copy()
filtered_df['End Date'] = filtered_df['End Date'].dt.date


In [262]:
filtered_df.head()

Unnamed: 0,OBJECTID,USGS_Assigned_ID,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,Fire_Attribute_Tiers,GIS_Acres,GIS_Hectares,Source_Datasets,Listed_Fire_Types,...,Wildfire_and_Rx_Flag,Overlap_Within_1_or_2_Flag,Circleness_Scale,Circle_Flag,Exclude_From_Summary_Rasters,Shape_Length,Shape_Area,Distance,Start Date,End Date
0,14299,14299,Wildfire,1963,1,"1 (1), 3 (3)",40992.458271,16589.059302,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (1), Likely Wildfire (3)",...,,,0.385355,,No,73550.428118,165890600.0,632.041602,1963-08-06,1963-12-31
1,14300,14300,Wildfire,1963,1,"1 (1), 3 (3)",25757.090203,10423.524591,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (2), Likely Wildfire (2)",...,,,0.364815,,No,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13
2,14301,14301,Wildfire,1963,1,"1 (5), 3 (15), 5 (1)",45527.210986,18424.208617,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (6), Likely Wildfire (15)",...,,,0.320927,,No,84936.82781,184242100.0,625.424025,1963-08-06,1963-12-31
3,14302,14302,Wildfire,1963,1,"1 (1), 3 (3), 5 (1)",10395.010334,4206.711433,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (2), Likely Wildfire (3)",...,,,0.428936,,No,35105.903602,42067110.0,576.211068,1963-08-06,1963-12-31
4,14303,14303,Wildfire,1963,1,"1 (1), 3 (3)",9983.605738,4040.2219,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (1), Likely Wildfire (3)",...,,,0.703178,,No,26870.456126,40402220.0,620.880899,1963-08-06,1963-12-31


#### Dropping Irrelevant Features

In [263]:

# Identifier, provenance and geometry-quality columns are not needed for the analysis
relevant_features = filtered_df.drop(columns=['OBJECTID', 'USGS_Assigned_ID', 'Fire_Attribute_Tiers', 'Source_Datasets', 'Overlap_Within_1_or_2_Flag', 'Circleness_Scale', 'Circle_Flag', 'Exclude_From_Summary_Rasters'])


In [264]:
relevant_features.drop(columns=['Listed_Fire_Codes', 'Listed_Fire_IDs', 'Listed_Fire_IRWIN_IDs', 'Wildfire_and_Rx_Flag'], inplace=True)

In [265]:
relevant_features.drop(columns=['Listed_Notes', 'Processing_Notes'], inplace=True)

In [266]:
relevant_features.drop(columns=['Listed_Fire_Causes', 'Listed_Fire_Cause_Class', 'Listed_Map_Digitize_Methods', 'Listed_Rx_Reported_Acres'], inplace=True)

In [267]:
relevant_features.head()

Unnamed: 0,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,GIS_Acres,GIS_Hectares,Listed_Fire_Types,Listed_Fire_Names,Listed_Fire_Dates,Wildfire_Notice,Prescribed_Burn_Notice,Shape_Length,Shape_Area,Distance,Start Date,End Date
0,Wildfire,1963,1,40992.458271,16589.059302,"Wildfire (1), Likely Wildfire (3)",RATTLESNAKE (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,73550.428118,165890600.0,632.041602,1963-08-06,1963-12-31
1,Wildfire,1963,1,25757.090203,10423.524591,"Wildfire (2), Likely Wildfire (2)","McChord Butte (2), No Fire Name Provided (1), ...",Listed Wildfire Discovery Date(s): 1963-07-28 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13
2,Wildfire,1963,1,45527.210986,18424.208617,"Wildfire (6), Likely Wildfire (15)","WILLOW CREEK (16), EAST CRANE CREEK (4), Crane...",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,84936.82781,184242100.0,625.424025,1963-08-06,1963-12-31
3,Wildfire,1963,1,10395.010334,4206.711433,"Wildfire (2), Likely Wildfire (3)","SOUTH CANYON CREEK (4), No Fire Name Provided (1)",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,35105.903602,42067110.0,576.211068,1963-08-06,1963-12-31
4,Wildfire,1963,1,9983.605738,4040.2219,"Wildfire (1), Likely Wildfire (3)",WEBB CREEK (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,26870.456126,40402220.0,620.880899,1963-08-06,1963-12-31


#### Mapping the fire type to a severity score

In [297]:
fire_type_mapping = {
    'Wildfire': 5,
    'Likely Wildfire': 4,
    'Unknown - Likely Wildfire': 3,
    'Prescribed Fire': 2,
    'Unknown - Likely Prescribed Fire': 1
}

# Extract the first fire type that appears in 'Listed_Fire_Types'
# (e.g. 'Wildfire' from "Wildfire (1), Likely Wildfire (3)")
relevant_features['First_Fire_Type'] = relevant_features['Listed_Fire_Types'].str.extract(f'({"|".join(fire_type_mapping.keys())})', expand=False)

# Use the replace method to map that first fire type to its severity score
relevant_features['Fire_Severity'] = relevant_features['First_Fire_Type'].replace(fire_type_mapping)

# Drop the helper 'First_Fire_Type' column, which is no longer needed
relevant_features.drop('First_Fire_Type', axis=1, inplace=True)

# Display the DataFrame with the 'Fire_Severity' column
relevant_features.head()

Unnamed: 0,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,GIS_Acres,GIS_Hectares,Listed_Fire_Types,Listed_Fire_Names,Listed_Fire_Dates,Wildfire_Notice,Prescribed_Burn_Notice,Shape_Length,Shape_Area,Distance,Start Date,End Date,Label,Fire_Duration,Fire_Severity
0,Wildfire,1963,1,40992.458271,16589.059302,"Wildfire (1), Likely Wildfire (3)",RATTLESNAKE (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,73550.428118,165890600.0,632.041602,1963-08-06,1963-12-31,5,148,5
1,Wildfire,1963,1,25757.090203,10423.524591,"Wildfire (2), Likely Wildfire (2)","McChord Butte (2), No Fire Name Provided (1), ...",Listed Wildfire Discovery Date(s): 1963-07-28 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13,5,1,5
2,Wildfire,1963,1,45527.210986,18424.208617,"Wildfire (6), Likely Wildfire (15)","WILLOW CREEK (16), EAST CRANE CREEK (4), Crane...",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,84936.82781,184242100.0,625.424025,1963-08-06,1963-12-31,5,148,5
3,Wildfire,1963,1,10395.010334,4206.711433,"Wildfire (2), Likely Wildfire (3)","SOUTH CANYON CREEK (4), No Fire Name Provided (1)",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,35105.903602,42067110.0,576.211068,1963-08-06,1963-12-31,5,148,5
4,Wildfire,1963,1,9983.605738,4040.2219,"Wildfire (1), Likely Wildfire (3)",WEBB CREEK (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,26870.456126,40402220.0,620.880899,1963-08-06,1963-12-31,5,148,5
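
The severity score above depends on which fire type the regular expression finds first in 'Listed_Fire_Types'. The sketch below reproduces that matching with the standard library's `re` module on a few made-up strings, to show that the score reflects the first listed type rather than the most severe one. The helper name `first_fire_severity` is hypothetical.

```python
import re

fire_type_mapping = {
    'Wildfire': 5,
    'Likely Wildfire': 4,
    'Unknown - Likely Wildfire': 3,
    'Prescribed Fire': 2,
    'Unknown - Likely Prescribed Fire': 1
}

# Same alternation pattern as the one passed to str.extract above
pattern = re.compile("|".join(fire_type_mapping.keys()))


def first_fire_severity(listed_types):
    """Severity of the first fire type found in the string, or None if nothing matches."""
    match = pattern.search(listed_types)
    return fire_type_mapping[match.group(0)] if match else None


# Made-up examples in the style of the 'Listed_Fire_Types' column
examples = [
    "Wildfire (1), Likely Wildfire (3)",
    "Likely Wildfire (3), Wildfire (1)",
    "Unknown - Likely Prescribed Fire (2)",
    "Prescribed Fire (1), Wildfire (2)",
]
scores = [first_fire_severity(text) for text in examples]
print(scores)
```

The regex engine takes the leftmost match, so a string that starts with "Likely Wildfire" scores 4 even if "Wildfire" appears later in the same string. If the most severe listed type were wanted instead, the maximum over all matches would have to be taken.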


In [269]:
relevant_features.head()

Unnamed: 0,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,GIS_Acres,GIS_Hectares,Listed_Fire_Types,Listed_Fire_Names,Listed_Fire_Dates,Wildfire_Notice,Prescribed_Burn_Notice,Shape_Length,Shape_Area,Distance,Start Date,End Date,Label
0,Wildfire,1963,1,40992.458271,16589.059302,"Wildfire (1), Likely Wildfire (3)",RATTLESNAKE (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,73550.428118,165890600.0,632.041602,1963-08-06,1963-12-31,"Wildfire (1), Likely Wildfire (3)"
1,Wildfire,1963,1,25757.090203,10423.524591,"Wildfire (2), Likely Wildfire (2)","McChord Butte (2), No Fire Name Provided (1), ...",Listed Wildfire Discovery Date(s): 1963-07-28 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13,"Wildfire (2), Likely Wildfire (2)"
2,Wildfire,1963,1,45527.210986,18424.208617,"Wildfire (6), Likely Wildfire (15)","WILLOW CREEK (16), EAST CRANE CREEK (4), Crane...",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,84936.82781,184242100.0,625.424025,1963-08-06,1963-12-31,"Wildfire (6), Likely Wildfire (15)"
3,Wildfire,1963,1,10395.010334,4206.711433,"Wildfire (2), Likely Wildfire (3)","SOUTH CANYON CREEK (4), No Fire Name Provided (1)",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,35105.903602,42067110.0,576.211068,1963-08-06,1963-12-31,"Wildfire (2), Likely Wildfire (3)"
4,Wildfire,1963,1,9983.605738,4040.2219,"Wildfire (1), Likely Wildfire (3)",WEBB CREEK (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,26870.456126,40402220.0,620.880899,1963-08-06,1963-12-31,"Wildfire (1), Likely Wildfire (3)"


In [270]:
relevant_features['First_Fire_Type'] = relevant_features['Listed_Fire_Types'].str.extract(f'({"|".join(fire_type_mapping.keys())})', expand=False)

# Use the replace method to apply the mapping based on the first keyword
relevant_features['Label'] = relevant_features['First_Fire_Type'].replace(fire_type_mapping)

# Drop the 'First_Fire_Type' column if not needed
relevant_features.drop('First_Fire_Type', axis=1, inplace=True)

# Display the DataFrame with the 'Label' column
relevant_features.head()

Unnamed: 0,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,GIS_Acres,GIS_Hectares,Listed_Fire_Types,Listed_Fire_Names,Listed_Fire_Dates,Wildfire_Notice,Prescribed_Burn_Notice,Shape_Length,Shape_Area,Distance,Start Date,End Date,Label
0,Wildfire,1963,1,40992.458271,16589.059302,"Wildfire (1), Likely Wildfire (3)",RATTLESNAKE (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,73550.428118,165890600.0,632.041602,1963-08-06,1963-12-31,5
1,Wildfire,1963,1,25757.090203,10423.524591,"Wildfire (2), Likely Wildfire (2)","McChord Butte (2), No Fire Name Provided (1), ...",Listed Wildfire Discovery Date(s): 1963-07-28 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13,5
2,Wildfire,1963,1,45527.210986,18424.208617,"Wildfire (6), Likely Wildfire (15)","WILLOW CREEK (16), EAST CRANE CREEK (4), Crane...",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,84936.82781,184242100.0,625.424025,1963-08-06,1963-12-31,5
3,Wildfire,1963,1,10395.010334,4206.711433,"Wildfire (2), Likely Wildfire (3)","SOUTH CANYON CREEK (4), No Fire Name Provided (1)",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,35105.903602,42067110.0,576.211068,1963-08-06,1963-12-31,5
4,Wildfire,1963,1,9983.605738,4040.2219,"Wildfire (1), Likely Wildfire (3)",WEBB CREEK (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,26870.456126,40402220.0,620.880899,1963-08-06,1963-12-31,5


### Calculating the number of days a fire has been burning

If the extracted start and end dates are the same day, the duration is set to 1.
If the extracted end date falls in a different year than the start date, the duration is also set to 1. This is because the second extracted date may belong to the "listed other fire dates" rather than to the end of the same fire. Treating it as the end date would misrepresent some fires as burning for decades, even though that was not the case.

Finally, if the start and end dates are in the same year but not the same day, the duration is the number of days between them, counting both the start day and the end day.

In [313]:
relevant_features['Start Date'] = pd.to_datetime(relevant_features['Start Date'])
relevant_features['End Date'] = pd.to_datetime(relevant_features['End Date'])

# Define a function to calculate the fire duration in days, following the rules described above
def days_in_start_year(start_date, end_date):
    if start_date == end_date:
        return 1
    elif start_date.year != end_date.year:
        return 1
    else:
        # Count both the start and the end day; abs() guards against an end date that precedes the start date
        return abs((end_date - start_date).days) + 1

# Calculate the fire duration and add it as a new column
relevant_features['Fire_Duration'] = relevant_features.apply(lambda row: days_in_start_year(row['Start Date'], row['End Date']), axis=1)

relevant_features.head()

Unnamed: 0,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,GIS_Acres,GIS_Hectares,Listed_Fire_Types,Listed_Fire_Names,Listed_Fire_Dates,Wildfire_Notice,Prescribed_Burn_Notice,Shape_Length,Shape_Area,Distance,Start Date,End Date,Label,Fire_Duration,Fire_Severity
0,Wildfire,1963,1,40992.458271,16589.059302,"Wildfire (1), Likely Wildfire (3)",RATTLESNAKE (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,73550.428118,165890600.0,632.041602,1963-08-06,1963-12-31,5,148,5
1,Wildfire,1963,1,25757.090203,10423.524591,"Wildfire (2), Likely Wildfire (2)","McChord Butte (2), No Fire Name Provided (1), ...",Listed Wildfire Discovery Date(s): 1963-07-28 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13,5,1,5
2,Wildfire,1963,1,45527.210986,18424.208617,"Wildfire (6), Likely Wildfire (15)","WILLOW CREEK (16), EAST CRANE CREEK (4), Crane...",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,84936.82781,184242100.0,625.424025,1963-08-06,1963-12-31,5,148,5
3,Wildfire,1963,1,10395.010334,4206.711433,"Wildfire (2), Likely Wildfire (3)","SOUTH CANYON CREEK (4), No Fire Name Provided (1)",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,35105.903602,42067110.0,576.211068,1963-08-06,1963-12-31,5,148,5
4,Wildfire,1963,1,9983.605738,4040.2219,"Wildfire (1), Likely Wildfire (3)",WEBB CREEK (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,26870.456126,40402220.0,620.880899,1963-08-06,1963-12-31,5,148,5


In [272]:
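# If the extracted start date does not fall in the fire's 'Fire_Year', the extracted dates are treated as unreliable
# and the duration is set to 1
relevant_features['Fire_Duration'] = relevant_features.apply(lambda row: 1 if row['Start Date'].year != row['Fire_Year'] else row['Fire_Duration'], axis=1)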

In [273]:
relevant_features.columns

Index(['Assigned_Fire_Type', 'Fire_Year', 'Fire_Polygon_Tier', 'GIS_Acres',
       'GIS_Hectares', 'Listed_Fire_Types', 'Listed_Fire_Names',
       'Listed_Fire_Dates', 'Wildfire_Notice', 'Prescribed_Burn_Notice',
       'Shape_Length', 'Shape_Area', 'Distance', 'Start Date', 'End Date',
       'Label', 'Fire_Duration'],
      dtype='object')

### Dropping more irrelevant columns

In [274]:
final_data = relevant_features.drop(columns=['Assigned_Fire_Type', 'Listed_Fire_Names', 'Wildfire_Notice', 'Prescribed_Burn_Notice', 'Fire_Polygon_Tier'])

In [277]:
final_data.to_csv("Final_Data.csv", index = False)

In [278]:
final_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86562 entries, 0 to 86561
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Fire_Year          86562 non-null  int64         
 1   GIS_Acres          86562 non-null  float64       
 2   GIS_Hectares       86562 non-null  float64       
 3   Listed_Fire_Types  86562 non-null  object        
 4   Listed_Fire_Dates  86562 non-null  object        
 5   Shape_Length       86562 non-null  float64       
 6   Shape_Area         86562 non-null  float64       
 7   Distance           86562 non-null  float64       
 8   Start Date         86562 non-null  datetime64[ns]
 9   End Date           86562 non-null  datetime64[ns]
 10  Fire_Severity      86562 non-null  int64         
 11  Fire_Duration      86562 non-null  int64         
dtypes: datetime64[ns](2), float64(5), int64(3), object(2)
memory usage: 7.9+ MB


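### Computing the Smoke Estimate Using pandasql

The section below calculates the smoke estimate (SE), which is defined as follows:

`SE = (GIS_Hectares * 100 / (Distance * Distance)) + (Fire_Severity * Fire_Severity)`

The first term grows with the burned area and shrinks with the square of the `Distance` value; the second term adds the square of the fire severity score.

I used pandasql here because expressing the calculation in SQL seemed much easier than writing it in conventional pandas.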

In [317]:
# pandasql is a library that allows you to run SQL queries on Pandas. You will need to install it if you haven't already.
from pandasql import sqldf

# Initialize the pandasql environment
pysqldf = lambda q: sqldf(q, globals())

#query = "SELECT Fire_Year, (GIS_Hectares*Fire_Severity*100/(Distance*Distance)) AS 'Smoke_Estimate' FROM final_data GROUP BY Fire_Year"
query = "SELECT Fire_Year, ((GIS_Hectares*100/(Distance*Distance))+(Fire_Severity*Fire_Severity)) AS 'Smoke_Estimate' FROM final_data GROUP BY Fire_Year"
result = pysqldf(query)

print(result)

    Fire_Year  Smoke_Estimate
0        1963       29.152699
1        1964       28.119148
2        1965       25.684264
3        1966       38.789540
4        1967       26.178780
5        1968       27.573815
6        1969       30.306756
7        1970       34.252991
8        1971       43.883030
9        1972      122.541505
10       1973       35.231876
11       1974       30.602351
12       1975       28.121000
13       1976       45.948106
14       1977       32.729191
15       1978       26.284318
16       1979       33.693399
17       1980       29.967571
18       1981       62.905778
19       1982       27.307752
20       1983       29.956787
21       1984       35.886498
22       1985       33.450799
23       1986       31.460369
24       1987       29.106540
25       1988      226.359028
26       1989      174.212598
27       1990       31.763400
28       1991       45.066107
29       1992       55.454219
30       1993       27.049711
31       1994       44.342491
32       1
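
One caveat about the query above: pandasql runs on SQLite by default, and in SQLite a `GROUP BY` without an aggregate function returns the value from an arbitrary row of each group, not a yearly summary. As a minimal sketch, assuming the intent is one average estimate per year (the notebook does not state which aggregation is wanted), the same formula in plain pandas might look like the following. The frame `toy_data` and its values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for final_data; the values are invented for illustration only.
toy_data = pd.DataFrame({
    "Fire_Year": [1963, 1963, 1964],
    "GIS_Hectares": [100.0, 400.0, 900.0],
    "Distance": [10.0, 20.0, 30.0],
    "Fire_Severity": [5, 4, 5],
})

# Per-fire smoke estimate, using the same formula as the SQL query.
toy_data["Smoke_Estimate"] = (
    toy_data["GIS_Hectares"] * 100 / toy_data["Distance"] ** 2
    + toy_data["Fire_Severity"] ** 2
)

# Explicit aggregation: mean estimate per year (an assumed choice).
yearly_estimate = (
    toy_data.groupby("Fire_Year", as_index=False)["Smoke_Estimate"].mean()
)
print(yearly_estimate)
```

Swapping `.mean()` for `.sum()` or `.max()` would give a different yearly figure, so the choice of aggregate should follow from what the estimate is meant to represent.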

## Fire Season-Specific Computation

The fire season is defined as the period from May 1 to October 31 of each year. The data is filtered down to that date range, and the previous steps are then repeated on the filtered data.
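
The filtering step can be sketched on a small made-up frame as follows. This is an illustration under stated assumptions, not the notebook's exact cell: `toy_fires`, `season_df`, and the month constants are hypothetical names, and the dates are invented. Working on an explicit `.copy()` is one common way to avoid pandas' `SettingWithCopyWarning` when assigning columns to a filtered frame.

```python
import pandas as pd

# Toy stand-in for the fire table; dates are invented for illustration.
toy_fires = pd.DataFrame({
    "Start Date": ["1963-07-28", "1963-03-10", "1963-09-30", "not a date"],
    "End Date": ["1963-08-02", "1963-06-01", "1963-11-05", "1963-08-01"],
})

# Work on an explicit copy so that column assignment is unambiguous.
season_df = toy_fires.copy()
for col in ("Start Date", "End Date"):
    season_df[col] = pd.to_datetime(season_df[col], format="%Y-%m-%d", errors="coerce")

SEASON_START_MONTH = 5   # May
SEASON_END_MONTH = 10    # October

# Keep fires that start in May or later and end in October or earlier.
# Rows with unparseable dates (NaT) compare as False and are dropped.
in_season = (
    (season_df["Start Date"].dt.month >= SEASON_START_MONTH)
    & (season_df["End Date"].dt.month <= SEASON_END_MONTH)
)
season_only = season_df[in_season]
print(season_only)
```

Because only the months are compared, a fire whose end date falls in a later year still passes the filter as long as both months lie within May through October.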

In [298]:
# Filtering the data for data only specific to the fire season
filtered_df['Start Date'] = pd.to_datetime(df['Start Date'], format='%Y-%m-%d', errors='coerce')
filtered_df['End Date'] = pd.to_datetime(df['End Date'], format='%Y-%m-%d', errors='coerce')

start_date_season = pd.to_datetime('05-01', format='%m-%d')
end_date_season = pd.to_datetime('10-31', format='%m-%d')


time_filtered_df = filtered_df[(filtered_df['Start Date'].dt.month >= start_date_season.month) & (filtered_df['End Date'].dt.month <= end_date_season.month)]

time_filtered_df.head()


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  filtered_df['Start Date'] = pd.to_datetime(df['Start Date'], format='%Y-%m-%d', errors='coerce')
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  filtered_df['End Date'] = pd.to_datetime(df['End Date'], format='%Y-%m-%d', errors='coerce')


Unnamed: 0,OBJECTID,USGS_Assigned_ID,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,Fire_Attribute_Tiers,GIS_Acres,GIS_Hectares,Source_Datasets,Listed_Fire_Types,...,Wildfire_and_Rx_Flag,Overlap_Within_1_or_2_Flag,Circleness_Scale,Circle_Flag,Exclude_From_Summary_Rasters,Shape_Length,Shape_Area,Distance,Start Date,End Date
1,14300,14300,Wildfire,1963,1,"1 (1), 3 (3)",25757.090203,10423.524591,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (2), Likely Wildfire (2)",...,,,0.364815,,No,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13
6,14305,14305,Wildfire,1963,1,"1 (3), 3 (3)",4995.910129,2021.773099,Comb_National_NIFC_Interagency_Fire_Perimeter_...,Wildfire (6),...,,,0.994948,1.0,Yes,15979.785579,20217730.0,384.787789,2018-05-02,2018-05-02
7,14306,14306,Wildfire,1963,1,"1 (1), 3 (1)",4995.253626,2021.507422,Comb_National_NIFC_Interagency_Fire_Perimeter_...,Wildfire (2),...,,"Caution, this Wildfire in 1963 overlaps with a...",0.994707,1.0,Yes,15980.673439,20215070.0,334.891985,2018-05-02,2018-05-02
28,14343,14343,Wildfire,1963,1,"1 (1), 3 (2)",632.34351,255.900339,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (1), Likely Wildfire (2)",...,,,0.432292,,No,8624.859107,2559003.0,492.083557,1963-07-15,1963-07-15
37,14358,14358,Wildfire,1963,1,"1 (1), 3 (2)",452.466868,183.106845,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (1), Likely Wildfire (2)",...,,,0.277739,,No,9102.046989,1831068.0,505.11585,1963-08-12,1963-08-12


### Removing Irrelevant Columns

In [299]:
time_relevant_features = time_filtered_df.drop(columns=['OBJECTID', 'USGS_Assigned_ID', 'Fire_Attribute_Tiers', 'Source_Datasets', 'Overlap_Within_1_or_2_Flag', 'Circleness_Scale', 'Circle_Flag', 'Exclude_From_Summary_Rasters'])


In [300]:
time_relevant_features.drop(columns=['Listed_Fire_Codes', 'Listed_Fire_IDs', 'Listed_Fire_IRWIN_IDs', 'Wildfire_and_Rx_Flag'], inplace=True)

In [301]:
time_relevant_features.drop(columns=['Listed_Notes', 'Processing_Notes'], inplace=True)

In [302]:
time_relevant_features.drop(columns=['Listed_Fire_Causes', 'Listed_Fire_Cause_Class', 'Listed_Map_Digitize_Methods', 'Listed_Rx_Reported_Acres'], inplace=True)

In [303]:
time_relevant_features.head()

Unnamed: 0,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,GIS_Acres,GIS_Hectares,Listed_Fire_Types,Listed_Fire_Names,Listed_Fire_Dates,Wildfire_Notice,Prescribed_Burn_Notice,Shape_Length,Shape_Area,Distance,Start Date,End Date
1,Wildfire,1963,1,25757.090203,10423.524591,"Wildfire (2), Likely Wildfire (2)","McChord Butte (2), No Fire Name Provided (1), ...",Listed Wildfire Discovery Date(s): 1963-07-28 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13
6,Wildfire,1963,1,4995.910129,2021.773099,Wildfire (6),No Fire Name Provided (6),Listed Other Fire Date(s): 2018-05-02 - NIFC D...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,15979.785579,20217730.0,384.787789,2018-05-02,2018-05-02
7,Wildfire,1963,1,4995.253626,2021.507422,Wildfire (2),No Fire Name Provided (2),Listed Other Fire Date(s): 2018-05-02 - NIFC D...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,15980.673439,20215070.0,334.891985,2018-05-02,2018-05-02
28,Wildfire,1963,1,632.34351,255.900339,"Wildfire (1), Likely Wildfire (2)",No Fire Name Provided (3),Listed Wildfire Discovery Date(s): 1963-12-31 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,8624.859107,2559003.0,492.083557,1963-07-15,1963-07-15
37,Wildfire,1963,1,452.466868,183.106845,"Wildfire (1), Likely Wildfire (2)",No Fire Name Provided (3),Listed Wildfire Discovery Date(s): 1963-12-31 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,9102.046989,1831068.0,505.11585,1963-08-12,1963-08-12


### Performing Fire Severity Mapping for Fire Season Data

In [304]:
fire_type_mapping = {
    'Wildfire': 5,
    'Likely Wildfire': 4,
    'Unknown - Likely Wildfire': 3,
    'Prescribed Fire': 2,
    'Unknown - Likely Prescribed Fire': 1
}

time_relevant_features['First_Fire_Type'] = time_relevant_features['Listed_Fire_Types'].str.extract(f'({"|".join(fire_type_mapping.keys())})', expand=False)

# Use the replace method to apply the mapping based on the first keyword
time_relevant_features['Fire_Severity'] = time_relevant_features['First_Fire_Type'].replace(fire_type_mapping)

# Drop the 'First_Fire_Type' column if not needed
time_relevant_features.drop('First_Fire_Type', axis=1, inplace=True)

relevant_features.head()

Unnamed: 0,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,GIS_Acres,GIS_Hectares,Listed_Fire_Types,Listed_Fire_Names,Listed_Fire_Dates,Wildfire_Notice,Prescribed_Burn_Notice,Shape_Length,Shape_Area,Distance,Start Date,End Date,Label,Fire_Duration,Fire_Severity
0,Wildfire,1963,1,40992.458271,16589.059302,"Wildfire (1), Likely Wildfire (3)",RATTLESNAKE (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,73550.428118,165890600.0,632.041602,1963-08-06,1963-12-31,5,148,5
1,Wildfire,1963,1,25757.090203,10423.524591,"Wildfire (2), Likely Wildfire (2)","McChord Butte (2), No Fire Name Provided (1), ...",Listed Wildfire Discovery Date(s): 1963-07-28 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13,5,1,5
2,Wildfire,1963,1,45527.210986,18424.208617,"Wildfire (6), Likely Wildfire (15)","WILLOW CREEK (16), EAST CRANE CREEK (4), Crane...",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,84936.82781,184242100.0,625.424025,1963-08-06,1963-12-31,5,148,5
3,Wildfire,1963,1,10395.010334,4206.711433,"Wildfire (2), Likely Wildfire (3)","SOUTH CANYON CREEK (4), No Fire Name Provided (1)",Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,35105.903602,42067110.0,576.211068,1963-08-06,1963-12-31,5,148,5
4,Wildfire,1963,1,9983.605738,4040.2219,"Wildfire (1), Likely Wildfire (3)",WEBB CREEK (4),Listed Wildfire Discovery Date(s): 1963-08-06 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,26870.456126,40402220.0,620.880899,1963-08-06,1963-12-31,5,148,5
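
To see what the severity mapping does, the following sketch applies it to a few made-up `Listed_Fire_Types` strings. The series `listed_types` and its values are invented for illustration. Sorting the labels longest-first and escaping them with `re.escape` are defensive additions that are not in the original cell; with the five labels used here they do not change the result.

```python
import re
import pandas as pd

fire_type_mapping = {
    "Wildfire": 5,
    "Likely Wildfire": 4,
    "Unknown - Likely Wildfire": 3,
    "Prescribed Fire": 2,
    "Unknown - Likely Prescribed Fire": 1,
}

# Toy values in the style of Listed_Fire_Types (invented for illustration).
listed_types = pd.Series([
    "Wildfire (2), Likely Wildfire (2)",
    "Likely Wildfire (3)",
    "Unknown - Likely Prescribed Fire (1)",
    "Prescribed Fire (4), Wildfire (1)",
])

# Try longer labels first so that a short label can never win over a longer
# one starting at the same position, and escape the labels for regex use.
labels = sorted(fire_type_mapping, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(label) for label in labels) + ")"

# Extract the first fire type mentioned in each string, then map it to a score.
first_type = listed_types.str.extract(pattern, expand=False)
severity = first_type.map(fire_type_mapping)
print(severity.tolist())
```

Only the first fire type listed in each string determines the score, so a polygon listed as "Prescribed Fire (4), Wildfire (1)" is scored as a prescribed fire.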


### Calculating Fire Duration for Fire Season Specific Data

In [306]:
time_relevant_features['Start Date'] = pd.to_datetime(relevant_features['Start Date'])
time_relevant_features['End Date'] = pd.to_datetime(relevant_features['End Date'])

time_relevant_features['Fire_Duration'] = abs(relevant_features.apply(lambda row: days_in_start_year(row['Start Date'], row['End Date']), axis=1))

time_relevant_features.head()

Unnamed: 0,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,GIS_Acres,GIS_Hectares,Listed_Fire_Types,Listed_Fire_Names,Listed_Fire_Dates,Wildfire_Notice,Prescribed_Burn_Notice,Shape_Length,Shape_Area,Distance,Start Date,End Date,Fire_Severity,Fire_Duration
1,Wildfire,1963,1,25757.090203,10423.524591,"Wildfire (2), Likely Wildfire (2)","McChord Butte (2), No Fire Name Provided (1), ...",Listed Wildfire Discovery Date(s): 1963-07-28 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,59920.576713,104235200.0,661.238055,1963-07-28,2019-09-13,5,1
6,Wildfire,1963,1,4995.910129,2021.773099,Wildfire (6),No Fire Name Provided (6),Listed Other Fire Date(s): 2018-05-02 - NIFC D...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,15979.785579,20217730.0,384.787789,2018-05-02,2018-05-02,5,1
7,Wildfire,1963,1,4995.253626,2021.507422,Wildfire (2),No Fire Name Provided (2),Listed Other Fire Date(s): 2018-05-02 - NIFC D...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,15980.673439,20215070.0,334.891985,2018-05-02,2018-05-02,5,1
28,Wildfire,1963,1,632.34351,255.900339,"Wildfire (1), Likely Wildfire (2)",No Fire Name Provided (3),Listed Wildfire Discovery Date(s): 1963-12-31 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,8624.859107,2559003.0,492.083557,1963-12-31,1963-12-31,5,1
37,Wildfire,1963,1,452.466868,183.106845,"Wildfire (1), Likely Wildfire (2)",No Fire Name Provided (3),Listed Wildfire Discovery Date(s): 1963-12-31 ...,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,9102.046989,1831068.0,505.11585,1963-12-31,1963-12-31,5,1


In [307]:
time_relevant_features['Fire_Duration'] = time_relevant_features.apply(lambda row: 1 if row['Start Date'].year != row['Fire_Year'] else row['Fire_Duration'], axis=1)

In [309]:
time_final_data = time_relevant_features.drop(columns=['Assigned_Fire_Type', 'Listed_Fire_Names', 'Wildfire_Notice', 'Prescribed_Burn_Notice', 'Fire_Polygon_Tier'])

In [310]:
time_final_data.to_csv("Fire_Season_Data.csv", index = False)

In [311]:
time_final_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 48475 entries, 1 to 86546
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   Fire_Year          48475 non-null  int64         
 1   GIS_Acres          48475 non-null  float64       
 2   GIS_Hectares       48475 non-null  float64       
 3   Listed_Fire_Types  48475 non-null  object        
 4   Listed_Fire_Dates  48475 non-null  object        
 5   Shape_Length       48475 non-null  float64       
 6   Shape_Area         48475 non-null  float64       
 7   Distance           48475 non-null  float64       
 8   Start Date         48475 non-null  datetime64[ns]
 9   End Date           48475 non-null  datetime64[ns]
 10  Fire_Severity      48475 non-null  int64         
 11  Fire_Duration      48475 non-null  int64         
dtypes: datetime64[ns](2), float64(5), int64(3), object(2)
memory usage: 4.8+ MB


In [316]:
# Performing the Smoke Estimation On the Fire Season Data
pysqldf = lambda q: sqldf(q, globals())

#query = "SELECT Fire_Year, (GIS_acres*100/Distance) AS 'Smoke_Estimate' FROM time_final_data GROUP BY Fire_Year"
query = "SELECT Fire_Year, ((GIS_Hectares*100/(Distance*Distance))+(Fire_Severity*Fire_Severity)) AS 'Smoke_Estimate' FROM time_final_data GROUP BY Fire_Year"
# Execute the SQL query; the result is stored under the name used by the cells below
result_fire_season = pysqldf(query)

# Display the result
print(result_fire_season)

    Fire_Year  Smoke_Estimate
0        1963       27.383960
1        1964       28.119148
2        1965       38.290024
3        1966       26.480105
4        1967       25.894638
5        1968       27.573815
6        1969       30.306756
7        1970       27.136854
8        1971       43.883030
9        1972      122.541505
10       1973       26.191835
11       1974       28.348991
12       1975       25.801608
13       1976       45.948106
14       1977       32.729191
15       1978       25.484703
16       1979       26.667167
17       1980       27.106922
18       1981       29.258299
19       1982       25.796083
20       1983       29.956787
21       1984       26.521806
22       1985       33.531702
23       1986       30.896192
24       1987       29.106540
25       1988      136.774432
26       1989       27.571061
27       1990       26.841689
28       1991       28.471299
29       1992       55.454219
30       1993       27.049711
31       1994       44.342491
32       1

In [318]:
fire_season_estimate = pd.DataFrame(result_fire_season)

In [320]:
fire_season_estimate.to_csv('fire_season_estimates.csv', index = False)

## References

[1] US Geological Survey, https://www.sciencebase.gov/catalog/item/61aa537dd34eb622f699df81

#### * This snippet was taken from the example notebooks provided by Dr. David McDonald