# Phase 2 DS3000 Project

## Group Members: Anya Wild, Janina Kurowski, Michael Hrinda, Elana von der Heyden

### Project Phase 1

In the last few years, wildfires have broken out in several European countries and have ravaged many homes and much land. The fires pose an economic, health, and safety threat to families and individuals, and are likely an indicator of increasing global temperatures and underlying climate change.

[NASA Fire Information for Resource Management System (FIRMS)](https://www.earthdata.nasa.gov/learn/find-data/near-real-time/firms/viirs-i-band-375-m-active-fire-data) uses Visible Infrared Imaging Radiometer Suite (VIIRS) 375 m imaging to capture fires as they happen worldwide, and publishes the data for public access with the use of an API key. Although NASA uses other sensors concurrently, we decided that the VIIRS sensor is the most accurate and up-to-date, and would serve us best for this project.

![Comparison of daily fire spread mapped by 1km Aqua/MODIS (left), 750m VIIRS (center) and 375m VIIRS (right) data](https://www.earthdata.nasa.gov/s3fs-public/imported/Brazil_MODIS_VIIRS_1.jpg?VersionId=GCOBEkFW7SVYsKIwwssJrV_MtjTopKdn)

Pictured above is a comparison of daily fire spread mapped by 1km Aqua/MODIS (left), 750m VIIRS (center) and 375m VIIRS (right) data. As you can see, the 375m VIIRS data provides the most detailed image and coherent fire spread compared to MODIS and 750m VIIRS imaging.

Using the longitude and latitude fields together with other features of the published data, it may be possible to use machine learning to predict where new wildfires might start.


[Hundreds of firefighters battle a deadly forest fire raging in southern Greece for the third day (AP News)](https://apnews.com/article/greece-wildfire-peloponnese-forest-cfeb415e491edbdce660490bae5aeb3f)

[Europe's wildfires in 2023 were among the worst this century, report says (Reuters)](https://www.reuters.com/world/europe/europes-wildfires-2023-were-among-worst-this-century-report-says-2024-04-10/)


The data contains the following features:

- longitude (numeric)
- latitude (numeric)
- brightness (numeric) - this represents the intensity of the fire (e.g. higher = more intense fire)
- confidence (numeric) - this represents the confidence for a pixel from the images accurately detecting an active fire
- scan (numeric) - represents the approximate size of the imaged fire
- satellite (categorical) (either collected via Aqua or Terra satellite)
- acq_date and acq_time (date & time) - could be useful for understanding wildfire detection throughout the year and how trends might change seasonally
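
Because `acq_date` and `acq_time` arrive as separate fields, seasonal analysis is easier once they are combined into a single timestamp. The sketch below assumes `acq_time` is an integer in HHMM form without leading zeros (so `218` would mean 02:18); the helper name `to_timestamp` is hypothetical and only meant as an illustration.

```python
from datetime import datetime

def to_timestamp(acq_date: str, acq_time: int) -> datetime:
    """Combine a 'YYYY-MM-DD' date string and an HHMM integer into one datetime.

    Assumes acq_time is HHMM without leading zeros (e.g. 218 -> 02:18).
    """
    hhmm = f"{int(acq_time):04d}"  # left-pad so 218 becomes "0218"
    return datetime.strptime(f"{acq_date} {hhmm}", "%Y-%m-%d %H%M")

ts = to_timestamp("2020-06-01", 218)
print(ts.isoformat())  # 2020-06-01T02:18:00
print(ts.month)        # 6
```

With a proper timestamp, grouping detections by month or by hour of day becomes a one-line operation.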


### Interesting questions may include:
- Where are wildfires most likely to occur? Is there one area where wildfires tend to cluster?
- Would it be possible to predict approximate wildfire locations based on certain features of the data, such as scan, longitude, latitude, and brightness?
- Is either the Terra or Aqua satellite more accurate in capturing wildfire location based on confidence rating?
- Are wildfires more common during certain times of the year? Can we predict wildfire location and severity based on the time of year?

### Possible ML uses:
For exploring the questions above, there are a couple of different ML concepts that may be applicable. For understanding wildfire 'hotspots', K-means clustering could possibly be used to help identify patterns in the data points. Time series analysis could be used for understanding and predicting the seasonality of wildfires. Regression could also be used to understand if geographical location / territory affects fire intensity (brightness) and whether that could be used to predict fire intensity. 
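
As a rough illustration of the hotspot idea, the sketch below runs a small K-means (Lloyd's algorithm) on (latitude, longitude) pairs using only the standard library. The function name `kmeans`, the choice of `k`, and the toy coordinates are assumptions for illustration. A real analysis would more likely use a library implementation such as scikit-learn's `KMeans`, and plain Euclidean distance on degrees is only an approximation of distance on the ground.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct starting points
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# toy (latitude, longitude) detections: two in one region, two in another
points = [(47.34, 9.62), (47.55, 9.79), (58.35, 11.43), (58.36, 12.38)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The two nearby pairs should end up sharing a label; which cluster is numbered 0 or 1 depends on the random starting points.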

### Data Collection:

In [1]:
# takes ~3 min to run

import pandas as pd
import requests

key = '996b2af0b72c07a0264d19e7d07e2bd5'

# list of European countries to get available data
euro_c = ['AUT', 'BEL', 'BGR', 'CYP', 'CZE', 'DEU', 'DNK', 'ESP', 'EST', 'FIN', 'FRA',
 'GRC', 'HRV', 'HUN', 'IRL', 'ITA', 'LTU', 'LUX', 'LVA', 'MLT', 'NLD', 'POL',
 'PRT', 'ROU', 'SVK', 'SVN', 'SWE']

# initial df
euro_c_df = pd.DataFrame()

# loop through European countries
for country in euro_c:

    # loop through every other year from 2014 to 2022, for the first of March, June, September, December
    for year in ['2014', '2016', '2018', '2020', '2022']:
        for month in ['03','06','09','12']:

            # format date and url
            date = year + '-' + month + '-01'
            url = f'https://firms.modaps.eosdis.nasa.gov/api/country/csv/{key}/VIIRS_NOAA20_NRT/{country}/1/{date}'

            # add to overall df if no error is produced and the sub df is not empty
            try:
                country_data = pd.read_csv(url)
                if country_data.empty:
                    continue
                euro_c_df = pd.concat([euro_c_df, country_data], ignore_index=True)

            except Exception:
                # print any countries and dates with errors
                print(country + ' ' + date)

BEL 2018-12-01
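
The request URLs in the loop above follow a fixed pattern, so the date grid and URL construction can be factored into small pure functions that are easy to check without calling the API. The sketch below is one possible arrangement, not the notebook's actual code: the helper names `quarter_start_dates` and `build_firms_url` and the environment variable `FIRMS_MAP_KEY` are hypothetical, and reading the key from the environment is just one way to keep it out of a shared notebook.

```python
import os

BASE_URL = "https://firms.modaps.eosdis.nasa.gov/api/country/csv"

def quarter_start_dates(years, months=("03", "06", "09", "12")):
    """Return 'YYYY-MM-01' strings for every year/month combination."""
    return [f"{year}-{month}-01" for year in years for month in months]

def build_firms_url(key, country, date, source="VIIRS_NOAA20_NRT", day_range=1):
    """Assemble a country-level CSV URL in the same pattern as the loop above."""
    return f"{BASE_URL}/{key}/{source}/{country}/{day_range}/{date}"

# FIRMS_MAP_KEY is a hypothetical variable name; fall back to a placeholder
key = os.environ.get("FIRMS_MAP_KEY", "YOUR_KEY")
dates = quarter_start_dates(["2014", "2016", "2018", "2020", "2022"])
print(len(dates))  # 20 dates per country
print(build_firms_url(key, "GRC", dates[0]))
```

Each URL could then be passed to `pd.read_csv` as in the loop above, collecting the non-empty frames in a list and calling `pd.concat` once at the end rather than on every iteration.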


### Data Cleaning:

In [2]:
# check for empty values
print(euro_c_df.isnull().values.any())

# drop columns that contain the same data for all rows (satellite, instrument, confidence, version)
euro_c_df_cleaned = euro_c_df[
    ['country_id', 'latitude','longitude','bright_ti4','bright_ti5','scan','track','acq_date','acq_time','frp','daynight']]
euro_c_df_cleaned

False


| | country_id | latitude | longitude | bright_ti4 | bright_ti5 | scan | track | acq_date | acq_time | frp | daynight |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AUT | 47.34311 | 9.62378 | 328.50 | 290.20 | 0.53 | 0.42 | 2020-03-01 | 1230 | 4.40 | D |
| 1 | AUT | 47.54527 | 9.78854 | 331.00 | 287.00 | 0.54 | 0.42 | 2020-03-01 | 1230 | 5.00 | D |
| 2 | AUT | 47.54559 | 9.78841 | 329.30 | 286.50 | 0.54 | 0.42 | 2020-03-01 | 1230 | 3.70 | D |
| 3 | AUT | 48.27758 | 14.34202 | 331.80 | 276.30 | 0.59 | 0.53 | 2020-03-01 | 1230 | 3.80 | D |
| 4 | AUT | 48.27502 | 14.33618 | 300.40 | 279.20 | 0.58 | 0.70 | 2020-06-01 | 218 | 1.90 | N |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2402 | SWE | 58.34693 | 11.42657 | 304.60 | 285.45 | 0.50 | 0.41 | 2022-09-01 | 206 | 1.11 | N |
| 2403 | SWE | 58.35924 | 12.38133 | 319.60 | 282.55 | 0.53 | 0.42 | 2022-09-01 | 206 | 2.49 | N |
| 2404 | SWE | 58.68283 | 17.12900 | 295.36 | 280.27 | 0.48 | 0.48 | 2022-09-01 | 206 | 1.06 | N |
| 2405 | SWE | 58.35765 | 12.38209 | 329.25 | 295.88 | 0.44 | 0.38 | 2022-09-01 | 1157 | 4.09 | D |
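
The cleaning cell above lists the columns to keep by hand. As a cross-check, columns that hold a single value in every row can also be found programmatically. The sketch below uses a small made-up frame rather than the real FIRMS data, and the helper name `drop_constant_columns` is hypothetical.

```python
import pandas as pd

def drop_constant_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns that have at most one distinct value."""
    n_unique = df.nunique(dropna=False)   # distinct values per column
    keep = n_unique[n_unique > 1].index   # columns that actually vary
    return df.loc[:, keep].copy()

# toy data for illustration only (not real FIRMS rows)
toy = pd.DataFrame({
    "latitude": [47.3, 48.2, 58.3],
    "frp": [4.4, 3.8, 1.1],
    "instrument": ["VIIRS", "VIIRS", "VIIRS"],  # constant, so it is dropped
})
cleaned = drop_constant_columns(toy)
print(list(cleaned.columns))  # ['latitude', 'frp']
```

Comparing the result against the hand-picked column list is a quick way to confirm that nothing informative was dropped by mistake.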
