## Problem Statement

### Student Name: Elana von der Heyden

In the last few years, wildfires have broken out in several European countries and have ravaged many homes and large areas of land. Although none have been very close to my home in Munich, Germany, my family has seen some in Greece, and multiple other countries have been affected. The fires pose an economic, health, and safety threat to families and individuals, and are likely an indicator of rising global temperatures and underlying climate change.

[NASA Fire Information for Resource Management System (FIRMS)](https://firms.modaps.eosdis.nasa.gov/) uses near real-time data from the Moderate Resolution Imaging Spectroradiometer (MODIS), among other instruments (listed in the dataframe below), to capture fires as they happen worldwide, and publishes the data for public access through an API key. Using longitude, latitude, and other features of the published data, it may be possible to use machine learning to attempt to predict where new wildfires might start.

[Hundreds of firefighters battle a deadly forest fire raging in southern Greece for the third day (AP News)](https://apnews.com/article/greece-wildfire-peloponnese-forest-cfeb415e491edbdce660490bae5aeb3f)

[Europe's wildfires in 2023 were among the worst this century, report says (Reuters)](https://www.reuters.com/world/europe/europes-wildfires-2023-were-among-worst-this-century-report-says-2024-04-10/)


The data contains the following features:

- longitude (numeric)
- latitude (numeric)
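- brightness (numeric) - the brightness temperature of the fire pixel in Kelvin, which serves as a proxy for fire intensity (e.g. higher = more intense fire)
- confidence (numeric for MODIS) - how confident the detection algorithm is that a pixel contains an active fire (0 to 100 for MODIS; the VIIRS data below reports a category such as `n` instead)
- scan (numeric) - the size of the imaged pixel in the along-scan direction (`track` gives the along-track size), so it describes the footprint of the detection rather than the size of the fire itself
- satellite (categorical) - the satellite that collected the detection (Aqua or Terra for MODIS data)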
- acq_date and acq_time (date & time) - could be useful for understanding wildfire detection throughout the year and how trends might change seasonally
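
Since seasonal analysis depends on usable timestamps, here is a minimal sketch of how `acq_date` and `acq_time` could be combined into one datetime column. It assumes `acq_time` is an integer in HHMM form (e.g. `837` means 08:37) and that the times are in UTC; the helper name `add_acq_datetime` and the `acq_datetime` and `month` columns are my own additions, not part of the FIRMS data.

```python
import pandas as pd

def add_acq_datetime(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with combined 'acq_datetime' and 'month' columns."""
    out = df.copy()
    # acq_time arrives as an integer such as 837; zero-pad it to the string "0837".
    hhmm = out["acq_time"].astype(int).astype(str).str.zfill(4)
    # Assumption: acquisition times are UTC.
    out["acq_datetime"] = pd.to_datetime(
        out["acq_date"].astype(str) + " " + hhmm,
        format="%Y-%m-%d %H%M",
        utc=True,
    )
    # A month column makes seasonal grouping straightforward later on.
    out["month"] = out["acq_datetime"].dt.month
    return out

# Two date/time pairs in the same form as the FIRMS output.
sample = pd.DataFrame({"acq_date": ["2024-10-19", "2024-10-21"], "acq_time": [837, 1214]})
parsed = add_acq_datetime(sample)
print(parsed)
```

With a full year of data, `parsed.groupby("month").size()` would then give a first look at how detections vary by season.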

### Interesting questions may include:
- Where are wildfires most likely to occur? Is there one area where wildfires tend to cluster?
- Would it be possible to predict approximate wildfire locations based on certain features of the data, such as scan, longitude, latitude, and brightness?
- Is either the Terra or Aqua satellite more accurate in capturing wildfire location based on confidence rating?
- Are wildfires more common during certain times of the year? Can we predict wildfire location and severity based on the time of year?

### Possible ML uses:
For exploring the questions above, there are several ML concepts that may be applicable. For understanding wildfire 'hotspots', K-means clustering could be used to help identify patterns in the data points. Time series analysis could be used for understanding and predicting the seasonality of wildfires. Regression could also be used to test whether geographical location affects fire intensity (brightness) and whether location could be used to predict that intensity.
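
As a rough illustration of the hotspot idea, the sketch below runs scikit-learn's `KMeans` on synthetic latitude/longitude points. The points, the two centres, and the choice of `n_clusters=2` are all made up for the example and are not FIRMS data; treating raw degrees as Euclidean coordinates is also only an approximation that would need revisiting over large areas.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic (latitude, longitude) points scattered around two made-up centres.
centres = np.array([[38.0, 23.0], [41.0, 26.0]])
points = np.vstack([c + rng.normal(scale=0.2, size=(50, 2)) for c in centres])

# The number of clusters is a guess here; on real data it would need tuning.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

print(model.cluster_centers_)  # estimated hotspot centres
print(np.bincount(model.labels_))  # number of points assigned to each cluster
```

On the real data, `points` would be replaced by something like `df[["latitude", "longitude"]].to_numpy()`.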

In [3]:
import pandas as pd

In [4]:
key = '996b2af0b72c07a0264d19e7d07e2bd5'

In [5]:
da_url = f'https://firms.modaps.eosdis.nasa.gov/api/data_availability/csv/{key}/all'
df = pd.read_csv(da_url)
display(df)

Unnamed: 0,data_id,min_date,max_date
0,MODIS_NRT,2024-07-01,2024-10-22
1,MODIS_SP,2000-11-01,2024-06-30
2,VIIRS_NOAA20_NRT,2019-12-04,2024-10-22
3,VIIRS_NOAA21_NRT,2024-01-17,2024-10-22
4,VIIRS_SNPP_NRT,2024-05-01,2024-10-22
5,VIIRS_SNPP_SP,2012-01-20,2024-04-30
6,LANDSAT_NRT,2022-06-20,2024-10-22
7,GOES_NRT,2022-08-09,2024-10-22
8,BA_MODIS,2000-11-01,2024-05-01
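
The request URLs in the following cells all follow the same pattern, so a small helper could keep them consistent. This is only a sketch based on the URL shapes used in this notebook; the function names `country_url` and `area_url` are hypothetical, and the `west,south,east,north` ordering for the area box is my understanding of the FIRMS area endpoint and should be checked against its documentation.

```python
FIRMS_BASE = "https://firms.modaps.eosdis.nasa.gov/api"

def country_url(key: str, source: str, country: str, days: int) -> str:
    """Build a country query URL, e.g. source='MODIS_NRT', country='GRC'."""
    return f"{FIRMS_BASE}/country/csv/{key}/{source}/{country}/{days}"

def area_url(key: str, source: str, west: float, south: float,
             east: float, north: float, days: int) -> str:
    """Build an area query URL; the box order is assumed to be west,south,east,north."""
    return f"{FIRMS_BASE}/area/csv/{key}/{source}/{west},{south},{east},{north}/{days}"

# 'KEY' is a placeholder, not a real API key.
print(country_url("KEY", "MODIS_NRT", "GRC", 4))
print(area_url("KEY", "VIIRS_NOAA20_NRT", -10, 36, 40, 71, 3))
```

Naming the four corners explicitly makes it harder to swap longitude and latitude by accident.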


In [6]:
# The last four days of MODIS data for Greece
greece_url = f'https://firms.modaps.eosdis.nasa.gov/api/country/csv/{key}/MODIS_NRT/GRC/4'
df_greece = pd.read_csv(greece_url)
df_greece

Unnamed: 0,country_id,latitude,longitude,brightness,scan,track,acq_date,acq_time,satellite,instrument,confidence,version,bright_t31,frp,daynight
0,GRC,41.06916,26.32377,316.89,1.24,1.1,2024-10-19,837,Terra,MODIS,43,6.1NRT,289.91,16.37,D
1,GRC,40.92032,26.29618,303.05,1.08,1.04,2024-10-19,1214,Aqua,MODIS,40,6.1NRT,292.05,3.87,D
2,GRC,40.96034,24.60342,314.7,1.0,1.0,2024-10-19,1214,Aqua,MODIS,75,6.1NRT,289.14,11.87,D
3,GRC,40.9691,24.60123,304.82,1.0,1.0,2024-10-19,1214,Aqua,MODIS,41,6.1NRT,288.35,4.19,D
4,GRC,40.97202,26.3422,309.2,1.08,1.04,2024-10-19,1214,Aqua,MODIS,48,6.1NRT,291.1,7.2,D
5,GRC,41.11651,26.31739,309.53,1.08,1.04,2024-10-19,1214,Aqua,MODIS,66,6.1NRT,292.65,7.34,D
6,GRC,40.96405,26.3505,300.44,4.57,1.96,2024-10-20,917,Terra,MODIS,36,6.1NRT,288.56,24.66,D
7,GRC,40.92245,24.80128,307.61,1.08,1.04,2024-10-21,820,Terra,MODIS,55,6.1NRT,294.22,6.6,D
8,GRC,39.49736,22.17327,302.06,1.85,1.33,2024-10-21,1157,Aqua,MODIS,21,6.1NRT,289.3,5.98,D
9,GRC,41.11521,26.31254,304.38,1.09,1.04,2024-10-21,1157,Aqua,MODIS,40,6.1NRT,292.71,3.96,D
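
To start on the Terra versus Aqua question, the confidence values can be summarised per satellite. The sketch below uses only the ten `satellite` and `confidence` values from the Greece output above; with so few rows it shows the mechanics rather than a conclusion, and a higher confidence score is not the same thing as higher accuracy.

```python
import pandas as pd

# satellite and confidence values copied from the ten Greece rows above.
sample = pd.DataFrame({
    "satellite": ["Terra", "Aqua", "Aqua", "Aqua", "Aqua",
                  "Aqua", "Terra", "Terra", "Aqua", "Aqua"],
    "confidence": [43, 40, 75, 41, 48, 66, 36, 55, 21, 40],
})

# Count, mean, and median confidence for each satellite.
summary = sample.groupby("satellite")["confidence"].agg(["count", "mean", "median"])
print(summary)
```

On the full dataframe the same line would be `df_greece.groupby("satellite")["confidence"].agg(["count", "mean", "median"])`.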


In [7]:
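# The last 3 days of VIIRS NOAA-20 data for a bounding box.
# Note: FIRMS reads the box as west,south,east,north, so 36,-10,40,71 covers longitude 36 to 40
# and latitude -10 to 71 (consistent with the output below); mainland Europe would be roughly -10,36,40,71.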
area_url = f'https://firms.modaps.eosdis.nasa.gov/api/area/csv/{key}/VIIRS_NOAA20_NRT/36,-10,40,71/3'
df_area = pd.read_csv(area_url)

df_area

Unnamed: 0,latitude,longitude,bright_ti4,scan,track,acq_date,acq_time,satellite,instrument,confidence,version,bright_ti5,frp,daynight
0,56.31750,36.46303,297.07,0.48,0.65,2024-10-20,113,N20,VIIRS,n,2.0NRT,272.34,0.65,N
1,59.13087,37.78569,304.77,0.40,0.60,2024-10-20,113,N20,VIIRS,n,2.0NRT,274.00,1.22,N
2,59.13162,37.82133,298.24,0.40,0.60,2024-10-20,113,N20,VIIRS,n,2.0NRT,276.61,0.75,N
3,59.14788,37.85181,305.26,0.40,0.60,2024-10-20,113,N20,VIIRS,n,2.0NRT,279.32,2.07,N
4,59.14828,37.86960,301.16,0.40,0.60,2024-10-20,113,N20,VIIRS,n,2.0NRT,279.12,1.56,N
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3126,-3.90197,37.05228,345.02,0.75,0.77,2024-10-22,1151,N20,VIIRS,n,2.0NRT,296.58,12.09,D
3127,-3.90076,37.05918,337.32,0.75,0.77,2024-10-22,1151,N20,VIIRS,n,2.0NRT,297.54,13.99,D
3128,-3.06336,36.03642,333.96,0.66,0.73,2024-10-22,1151,N20,VIIRS,n,2.0NRT,301.22,3.52,D
3129,-2.09234,36.80939,338.34,0.77,0.78,2024-10-22,1151,N20,VIIRS,n,2.0NRT,296.67,16.15,D
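
The output above includes rows with negative latitudes, which are well south of Europe, so it seems worth checking which region a query actually returned. A minimal sketch follows, using four latitude/longitude pairs copied from the output; the helper names `coordinate_ranges` and `in_bbox` are hypothetical, and the Europe box of longitude -10 to 40 and latitude 36 to 71 is an assumption based on the coordinates used in the request.

```python
import pandas as pd

def coordinate_ranges(df: pd.DataFrame) -> pd.DataFrame:
    """Min and max of latitude and longitude, to see which region the data covers."""
    return df[["latitude", "longitude"]].agg(["min", "max"])

def in_bbox(df: pd.DataFrame, west: float, south: float,
            east: float, north: float) -> pd.DataFrame:
    """Keep only rows whose coordinates fall inside the given box (inclusive)."""
    mask = df["longitude"].between(west, east) & df["latitude"].between(south, north)
    return df[mask]

# First two and last two coordinate pairs from the output above.
sample = pd.DataFrame({
    "latitude": [56.31750, 59.13087, -3.90197, -2.09234],
    "longitude": [36.46303, 37.78569, 37.05228, 36.80939],
})

print(coordinate_ranges(sample))
europe = in_bbox(sample, west=-10, south=36, east=40, north=71)
print(len(europe), "of", len(sample), "sample rows fall inside the assumed Europe box")
```

Running `coordinate_ranges(df_area)` on the full dataframe would show the actual extent of the returned detections.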


In [9]:
df_area.corr(numeric_only=True)

Unnamed: 0,latitude,longitude,bright_ti4,scan,track,acq_time,bright_ti5,frp
latitude,1.0,-0.00508,-0.336589,-0.118848,-0.173112,0.0854,-0.4131,-0.140873
longitude,-0.00508,1.0,-0.062594,-0.500463,-0.433395,-0.065737,0.055798,-0.118785
bright_ti4,-0.336589,-0.062594,1.0,0.12702,0.222474,-0.450898,0.466298,0.361319
scan,-0.118848,-0.500463,0.12702,1.0,0.797265,-0.045725,-0.158143,0.212742
track,-0.173112,-0.433395,0.222474,0.797265,1.0,-0.122244,-0.280838,0.169757
acq_time,0.0854,-0.065737,-0.450898,-0.045725,-0.122244,1.0,-0.312076,-0.131008
bright_ti5,-0.4131,0.055798,0.466298,-0.158143,-0.280838,-0.312076,1.0,0.2124
frp,-0.140873,-0.118785,0.361319,0.212742,0.169757,-0.131008,0.2124,1.0
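
The correlation matrix is easier to read if the strongest pairs are pulled out. The sketch below does this for a small three-column excerpt of the matrix above (`scan`, `track`, `frp`); the helper name `top_correlations` is my own. In the full matrix, `scan` and `track` have the largest correlation (about 0.80), which is plausible given that both describe pixel size.

```python
import numpy as np
import pandas as pd

def top_correlations(corr: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n feature pairs with the largest absolute correlation."""
    # Keep the strict upper triangle so each pair appears once and the diagonal is dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Sort by absolute value so strong negative correlations are ranked too.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Excerpt of the correlation matrix above.
names = ["scan", "track", "frp"]
corr = pd.DataFrame(
    [[1.0, 0.797265, 0.212742],
     [0.797265, 1.0, 0.169757],
     [0.212742, 0.169757, 1.0]],
    index=names, columns=names,
)

result = top_correlations(corr, n=2)
print(result)
```

On the notebook's data this would be called as `top_correlations(df_area.corr(numeric_only=True))`.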


In [18]:
import plotly.express as px

px.scatter(data_frame=df_area, x='longitude', y='latitude', color='bright_ti5')