## Introduction:
The data I used come from the public NASA data repository earthdata.nasa.gov; the dataset is titled "ABoVE: Synthesis of Burned and Unburned Forest Site Data, AK and Canada, 1983-2016".
The questions I set out to answer:
1. How does the average total carbon combusted vary across different ecoregions?
2. What is the relationship between stand age and total carbon combusted?
3. What is the distribution of total carbon combusted across all sites?
4. How do fire danger indices, such as the Fire Weather Index (FWI), change over time?
5. What are the temporal trends in total carbon combusted over the years?
6. How do soil carbon levels relate to total carbon combusted during fires?
7. What is the average stand age across different ecoregions?
8. What are the correlations between key variables such as total carbon combusted, stand age, and fire weather indices?
9. How does the burn depth impact soil carbon content?
10. How has precipitation varied over time in areas affected by fires?


In [1]:
import pandas as pd
import numpy as np

In [2]:
# Load the datasets into pandas DataFrames
burned_plot_data = pd.read_csv("AK_CA_Burned_Plot_Data_1983_2016.csv")
soil_profile_data = pd.read_csv("AK_CA_Soil_Profile_Synthesis.csv")

In [None]:
burned_plot_data.head()

In [None]:
soil_profile_data.head()

## STRUCTURE OF THE DATA:

In [None]:
# Check the structure and missing data
burned_plot_data.info()

In [None]:
soil_profile_data.info()

## SUMMARY STATISTICS:

In [None]:
# Get a quick summary of numerical data
burned_plot_data.describe()

In [None]:
soil_profile_data.describe()

## DATA CLEANING:

In [11]:
# Replace the -9999.0 fill value with NaN so pandas treats those entries as missing
burned_plot_data.replace(-9999.0, np.nan, inplace=True)
soil_profile_data.replace(-9999.0, np.nan, inplace=True)
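Another option is to declare the sentinel when reading the CSV, through the `na_values` parameter of `pd.read_csv`, so it never enters the DataFrame as a number. The sketch below assumes -9999 is the only fill value in these files. The inline CSV text is made-up illustration data, not an ABoVE file.

```python
import io

import pandas as pd

# Made-up CSV text standing in for one of the ABoVE files (illustration only)
raw_csv = io.StringIO(
    "site,stand_age,ag_c_combusted\n"
    "A,-9999,1.2\n"
    "B,85,-9999.0\n"
    "C,120,0.8\n"
)

# Treat -9999 (written either as an integer or a float) as missing on load
df = pd.read_csv(raw_csv, na_values=[-9999, -9999.0])

# Count missing values per column to confirm the sentinel was caught
print(df.isna().sum())
```

For files that use -9999 the way the example does, this should produce the same missing values as the `replace` call above. Declaring the sentinel at load time keeps the cleaning rule next to the data source.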

In [None]:
# Check the number of missing values per column
burned_plot_data.isnull().sum()

In [None]:
soil_profile_data.isnull().sum()

In [15]:
# Drop columns with more than 50% missing values
burned_plot_data = burned_plot_data.dropna(thresh=len(burned_plot_data) * 0.5, axis=1)
soil_profile_data = soil_profile_data.dropna(thresh=len(soil_profile_data) * 0.5, axis=1)
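`dropna(thresh=...)` keeps a column only if it has at least `thresh` non-missing values. With `thresh=len(df) * 0.5`, it therefore keeps columns that are at most 50% missing. The sketch below uses a made-up three-column frame to check that this matches an explicit missing-fraction filter. The helper name `drop_sparse_columns` is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Keep columns whose fraction of missing values is at most max_missing (hypothetical helper)."""
    return df.loc[:, df.isna().mean() <= max_missing]


# Made-up example with 4 rows: 'b' is 50% missing, 'c' is 75% missing
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, np.nan, 3.0, np.nan],
    "c": [np.nan, np.nan, np.nan, 4.0],
})

# Same call as in the notebook: keep columns with at least len(demo) * 0.5 non-missing values
via_thresh = demo.dropna(thresh=len(demo) * 0.5, axis=1)
via_fraction = drop_sparse_columns(demo)

print("Kept:", list(via_thresh.columns))
print("Dropped:", list(demo.columns.difference(via_thresh.columns)))
```

A column that is exactly 50% missing is kept, so the rule matches the comment "more than 50% missing". Later cells rely on columns such as `stand_age` and `precipitation`, so it may be worth printing which real columns this step removes.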

In [None]:
# Verify the updated structure
burned_plot_data.info()

In [None]:
soil_profile_data.info()

In [18]:
# Remove duplicates
burned_plot_data.drop_duplicates(inplace=True)
soil_profile_data.drop_duplicates(inplace=True)

## STANDARDIZING DATA:

In [19]:
# Strip leading/trailing whitespace from 'site' so IDs match when the datasets are merged
burned_plot_data['site'] = burned_plot_data['site'].str.strip()
soil_profile_data['site'] = soil_profile_data['site'].str.strip()
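The two tables are later joined on `site`, so it helps to check how many site IDs they actually share once whitespace is stripped. The sketch below runs on made-up site IDs. Both the IDs and the helper `site_overlap` are hypothetical.

```python
import pandas as pd


def site_overlap(left: pd.Series, right: pd.Series) -> dict:
    """Summarize which whitespace-stripped site IDs appear in one or both tables (hypothetical helper)."""
    left_ids = set(left.dropna().astype(str).str.strip())
    right_ids = set(right.dropna().astype(str).str.strip())
    return {
        "shared": sorted(left_ids & right_ids),
        "only_left": sorted(left_ids - right_ids),
        "only_right": sorted(right_ids - left_ids),
    }


# Made-up IDs; note the stray spaces that strip() removes
burned_sites = pd.Series(["SITE1", " SITE2", "SITE3 "])
soil_sites = pd.Series(["SITE2", "SITE3", "SITE4"])

summary = site_overlap(burned_sites, soil_sites)
print(summary)
```

On the notebook data the call would be `site_overlap(burned_plot_data['site'], soil_profile_data['site'])`. IDs listed under `only_left` or `only_right` are the ones an inner merge drops silently.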

In [20]:
# Save cleaned datasets to new CSV files
burned_plot_data.to_csv("cleaned_burned_plot_data.csv", index=False)
soil_profile_data.to_csv("cleaned_soil_profile_data.csv", index=False)

## DATA VISUALIZATION:

In [21]:
import seaborn as sns
import matplotlib.pyplot as plt

Histogram of Total Carbon Combusted

In [24]:
burned_plot_data['total_c_combusted'] = (
    burned_plot_data['ag_c_combusted'] + burned_plot_data['bg_c_combusted']
)
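With `+`, the total is NaN whenever either component is missing. That is usually the safer choice, because treating a missing component as zero would understate combustion. The sketch below uses made-up values to contrast `+` with `Series.add(..., fill_value=0)`, so the choice is explicit.

```python
import numpy as np
import pandas as pd

# Made-up above-ground (ag) and below-ground (bg) carbon combusted values
demo = pd.DataFrame({
    "ag_c_combusted": [0.5, np.nan, 0.7],
    "bg_c_combusted": [2.0, 1.5, np.nan],
})

# '+' propagates NaN: a plot is only totalled when both components are known
strict_total = demo["ag_c_combusted"] + demo["bg_c_combusted"]

# fill_value=0 treats a single missing component as zero (a row where both are NaN stays NaN)
lenient_total = demo["ag_c_combusted"].add(demo["bg_c_combusted"], fill_value=0)

print(strict_total.tolist())
print(lenient_total.tolist())
```

Running `burned_plot_data['total_c_combusted'].isna().sum()` shows how many plots the strict definition leaves out of the histogram.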

In [None]:
sns.histplot(burned_plot_data['total_c_combusted'], kde=True)
plt.title("Distribution of Total Carbon Combusted")
plt.xlabel("Total Carbon Combusted")
plt.ylabel("Frequency")
plt.show()

In [None]:
burned_plot_data[['ag_c_combusted', 'bg_c_combusted', 'total_c_combusted']].head()

Scatter Plot: Stand Age vs. Total Carbon Combusted

In [None]:
# Scatter plot of stand age vs. total carbon combusted
sns.scatterplot(x='stand_age', y='total_c_combusted', data=burned_plot_data)
plt.title("Stand Age vs Total Carbon Combusted")
plt.xlabel("Stand Age")
plt.ylabel("Total Carbon Combusted")
plt.show()
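A scatter plot by itself can make a weak trend look strong or hide a real one. One way to put a number on the stand age relationship (question 2) is Spearman's rank correlation, which does not assume the relationship is linear. The sketch below computes it as the Pearson correlation of the ranks, using only rows where both values are present. The data are made up, and the helper name `spearman_rho` is hypothetical.

```python
import pandas as pd


def spearman_rho(df: pd.DataFrame, x: str, y: str) -> float:
    """Spearman rank correlation between two columns, using only rows where both are present."""
    pair = df[[x, y]].dropna()
    # Spearman's rho is the Pearson correlation of the (average-tie) ranks
    return pair[x].rank().corr(pair[y].rank())


# Made-up values: combustion rises with stand age, but not linearly
demo = pd.DataFrame({
    "stand_age": [20, 45, 60, 90, 130, None],
    "total_c_combusted": [0.5, 0.9, 2.5, 2.6, 8.0, 3.0],
})

rho = spearman_rho(demo, "stand_age", "total_c_combusted")
pearson = demo["stand_age"].corr(demo["total_c_combusted"])
print(round(rho, 3), round(pearson, 3))
```

In this example the relationship is perfectly monotonic but curved. The rank correlation reaches 1, while the Pearson coefficient stays below it.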

Average Total Carbon Combusted by Ecoregion

In [None]:
ecoregion_combustion = burned_plot_data.groupby('ecoregion_name_l2')['total_c_combusted'].mean()
print(ecoregion_combustion)
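A group mean is only as reliable as the number of plots behind it. The sketch below uses made-up ecoregions and values to report each ecoregion's plot count and standard deviation alongside its mean with `groupby(...).agg(...)`, sorted from highest to lowest mean.

```python
import pandas as pd

# Made-up plot-level data (ecoregion names and values are illustrative only)
demo = pd.DataFrame({
    "ecoregion_name_l2": ["Region A", "Region A", "Region B", "Region B", "Region B", "Region C"],
    "total_c_combusted": [3.0, 5.0, 1.0, 2.0, 3.0, 9.0],
})

# Mean, spread, and sample size per ecoregion, highest mean first
summary = (
    demo.groupby("ecoregion_name_l2")["total_c_combusted"]
    .agg(["mean", "std", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

Here Region C has the highest mean but only one plot, so its ranking rests on a single observation (and its standard deviation is undefined). Apply the same check before concluding that any one ecoregion combusts the most carbon.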

In [None]:
# Plot the results
ecoregion_combustion.plot(kind='bar', figsize=(10, 6))
plt.title('Average Total Carbon Combusted by Ecoregion')
plt.xlabel('Ecoregion')
plt.ylabel('Average Total Carbon Combusted')
plt.show()

## Correlation Analysis

Correlation Heatmap for Numerical Variables

In [37]:
# Select only numeric columns
numeric_data = burned_plot_data.select_dtypes(include=['float64', 'int64'])

In [None]:
# Compute correlation and plot heatmap
plt.figure(figsize=(12, 10))  # Increase figure size
sns.heatmap(numeric_data.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title('Correlation Heatmap for Numerical Variables')
plt.xticks(rotation=45)
plt.yticks(rotation=0)
plt.show()

In [None]:
# Restrict the heatmap to a few key variables for readability
key_vars = [
    'ag_c_combusted', 'bg_c_combusted', 'total_c_combusted',
    'prefire_sol', 'mean_residual_org_layer_depth',
    'fire_weather_index', 'drought_code', 'stand_age'
]
sns.heatmap(burned_plot_data[key_vars].corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title('Correlation Heatmap (Key Variables)')
plt.xticks(rotation=45)
plt.show()
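`DataFrame.corr()` excludes missing values pair by pair, so each cell of the heatmap can rest on a different number of rows. The minimal sketch below counts the rows behind each pair, using made-up data.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in different places
demo = pd.DataFrame({
    "stand_age": [50, 80, np.nan, 120, 60],
    "drought_code": [300, np.nan, 410, 380, np.nan],
    "total_c_combusted": [2.1, 3.4, 4.0, 5.2, 1.8],
})

# 1 where a value is present; the matrix product counts rows where both columns are present
present = demo.notna().astype(int)
pair_counts = present.T @ present
print(pair_counts)
```

Applied to `burned_plot_data[key_vars]`, this shows which coefficients rest on only a handful of rows and should be read with caution.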

## Temporal Analysis

Trend of Total Carbon Combusted Over Time

In [None]:
# Temporal trend of total carbon combusted
temporal_trend = burned_plot_data.groupby('burn_year')['total_c_combusted'].mean()
temporal_trend.plot(kind='line', figsize=(10, 6))
plt.title('Trend of Total Carbon Combusted Over Time')
plt.xlabel('Year')
plt.ylabel('Average Total Carbon Combusted')
plt.show() 
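The yearly mean line does not show how many plots burned in each year, and a few sparsely sampled years can dominate its shape. The sketch below uses made-up values to report per-year counts. It then fits an ordinary least-squares slope with `np.polyfit`, a simple linear trend chosen here for illustration rather than a significance test.

```python
import numpy as np
import pandas as pd

# Made-up plot-level records
demo = pd.DataFrame({
    "burn_year": [2004, 2004, 2005, 2010, 2010, 2014],
    "total_c_combusted": [2.0, 4.0, 3.0, 4.0, 6.0, 7.0],
})

# Per-year mean and number of plots behind it
yearly = demo.groupby("burn_year")["total_c_combusted"].agg(["mean", "count"])

# Degree-1 polynomial fit of yearly mean vs year: returns [slope, intercept]
slope, intercept = np.polyfit(yearly.index.to_numpy(dtype=float), yearly["mean"].to_numpy(), 1)
print(yearly)
print(f"Linear trend: {slope:.3f} units per year")
```

Fitting on the yearly means weights every year equally, which matches the plotted line. Fitting on the plot-level rows would instead weight years by how many plots they contain.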

Trend of Fire Weather Index Over Time

In [None]:
# Temporal trend of Fire Weather Index
fwi_trend = burned_plot_data.groupby('burn_year')['fire_weather_index'].mean()
fwi_trend.plot(kind='line', figsize=(10, 6), color='orange')
plt.title('Trend of Fire Weather Index Over Time')
plt.xlabel('Year')
plt.ylabel('Average Fire Weather Index')
plt.show()

Scatter Plot: Soil Carbon vs. Total Carbon Combusted

In [None]:
# Merge burned_plot_data and soil_profile_data on 'site'
merged_data = pd.merge(burned_plot_data, soil_profile_data, on='site', how='inner')

# Scatter plot of soil carbon against total carbon combusted
sns.scatterplot(x='carbon', y='total_c_combusted', data=merged_data)
plt.title('Soil Carbon vs Total Carbon Combusted')
plt.xlabel('Soil Carbon')
plt.ylabel('Total Carbon Combusted')
plt.show()
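The soil profile table may hold several rows per site, for example one per soil horizon. This is an assumption about the file's layout, worth checking with `soil_profile_data['site'].value_counts()`. If it holds, the inner merge pairs every plot with every soil row at its site and duplicates plots in the scatter plot. The sketch below uses made-up values to collapse soil carbon to one value per site before merging. It then lets `pd.merge(..., validate='many_to_one')` raise an error if the right-hand table still has duplicate keys. Summing rows is one illustrative choice; whether a sum or a mean is appropriate depends on what the `carbon` column measures.

```python
import pandas as pd

# Made-up tables: two plots at S1, one at S2; several soil rows per site
plots = pd.DataFrame({"site": ["S1", "S1", "S2"], "total_c_combusted": [2.0, 3.0, 5.0]})
soil = pd.DataFrame({"site": ["S1", "S1", "S2", "S2", "S2"], "carbon": [1.0, 2.0, 0.5, 0.5, 1.0]})

# Naive merge: each plot is repeated once per soil row at its site
naive = pd.merge(plots, soil, on="site", how="inner")

# Collapse soil rows to one value per site (sum is an illustrative choice)
soil_per_site = soil.groupby("site", as_index=False)["carbon"].sum()

# validate='many_to_one' raises MergeError if 'site' is not unique in soil_per_site
merged = pd.merge(plots, soil_per_site, on="site", how="inner", validate="many_to_one")

print(len(naive), len(merged))
print(merged)
```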

Average Stand Age by Ecoregion

In [None]:
# Average stand age by ecoregion
avg_stand_age = burned_plot_data.groupby('ecoregion_name_l2')['stand_age'].mean()
avg_stand_age.plot(kind='bar', figsize=(10, 6), color='green')
plt.title('Average Stand Age by Ecoregion')
plt.xlabel('Ecoregion')
plt.ylabel('Average Stand Age')
plt.show()

Density Plot of Fire Weather Index

In [None]:
sns.kdeplot(burned_plot_data['fire_weather_index'], fill=True, color='purple')
plt.title('Density Plot of Fire Weather Index')
plt.xlabel('Fire Weather Index')
plt.ylabel('Density')
plt.show()

Correlation Heatmap for Soil and Fire Characteristics

In [None]:
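# Strip stray whitespace from column names (merged_data already exists, so clean it too)
burned_plot_data.columns = burned_plot_data.columns.str.strip()
soil_profile_data.columns = soil_profile_data.columns.str.strip()
merged_data.columns = merged_data.columns.str.strip()

# Soil and fire variables to correlate (assumed list, drawn from columns used elsewhere in this notebook)
relevant_columns = [
    'carbon', 'burn_depth', 'total_c_combusted', 'prefire_sol',
    'mean_residual_org_layer_depth', 'fire_weather_index', 'drought_code', 'stand_age'
]
# Keep only the columns that actually exist after cleaning and merging
existing_columns = [col for col in relevant_columns if col in merged_data.columns]
print(f"Using columns: {existing_columns}")

# Keep rows complete for all selected columns, then compute the correlation matrix
merged_data_cleaned = merged_data[existing_columns].dropna()
corr_matrix = merged_data_cleaned.corr()
sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="coolwarm")
plt.title('Correlation Heatmap: Soil and Fire Characteristics')
plt.show()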

Burn Depth vs. Soil Carbon Content

In [None]:
sns.scatterplot(x='burn_depth', y='carbon', data=merged_data)
plt.title('Burn Depth vs Soil Carbon Content')
plt.xlabel('Burn Depth (cm)')
plt.ylabel('Soil Carbon Content')
plt.show()

Trend of Precipitation Over Time

In [None]:
# Average precipitation by year
precipitation_trend = burned_plot_data.groupby('burn_year')['precipitation'].mean()
precipitation_trend.plot(kind='line', figsize=(10, 6), color='blue')
plt.title('Trend of Precipitation Over Time')
plt.xlabel('Year')
plt.ylabel('Average Precipitation')
plt.show()

Boxplot of Total Carbon Combusted

In [None]:
sns.boxplot(x='total_c_combusted', data=burned_plot_data)
plt.title('Boxplot of Total Carbon Combusted')
plt.xlabel('Total Carbon Combusted')
plt.show()
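By default, the boxplot's whiskers (`whis=1.5`) reach the most extreme points within 1.5 × IQR of the quartiles, and anything beyond them is drawn as an outlier. The sketch below lists those points explicitly so they can be inspected rather than only seen. It runs on made-up values, and the helper name `iqr_outliers` is hypothetical.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return values lying more than k * IQR below the first quartile or above the third."""
    clean = values.dropna()
    q1, q3 = clean.quantile(0.25), clean.quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return clean[(clean < low) | (clean > high)]


# Made-up combustion values with one extreme plot
demo = pd.Series([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 20.0], name="total_c_combusted")
print(iqr_outliers(demo))
```

On the notebook data, `iqr_outliers(burned_plot_data['total_c_combusted'])` would list the plots drawn as individual points. Flagged points are candidates for a closer look, not automatically errors.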

## Conclusion

This analysis explored burned and unburned boreal forest sites in Alaska and Canada, focusing on carbon combustion, soil properties, and fire weather indices. The key findings were:

1. Ecoregion Analysis: 
   - The average total carbon combusted varied considerably across ecoregions, with the Boreal Cordillera showing the highest average combustion.
   - This suggests that geographic and ecological factors shape fire behavior.

2. Stand Age and Carbon Combustion:
   - A positive relationship between stand age and total carbon combusted was observed, suggesting that older stands may store more biomass and release more carbon during fires.

3. Temporal Trends:
   - The mean Fire Weather Index (FWI) varied from year to year, with some years showing markedly higher fire danger than others.
   - Average total carbon combusted also varied across burn years, possibly driven by climatic and ecological changes.

4. Soil Properties and Fire Impact:
   - Soil carbon content correlated with total carbon combusted, suggesting a link between soil carbon stocks and the amount of carbon released by fire.
   - Burn depth was inversely related to soil carbon content, consistent with deeper burns leaving less residual soil carbon.

5. Weather Factors and Fire Behavior:
   - Weather variables, including the Drought Code (a fire danger index) and precipitation, showed relationships with total carbon combusted, underlining the influence of weather on fire intensity and impact.

This study provides an overview of the interactions among fire, vegetation, soil, and weather in boreal forests. These patterns offer insight into forest fire dynamics that can support forest management and fire mitigation strategies.
