# Notebook 01: FIRMS Fire Detection Data Extraction (Europe)

This notebook retrieves and prepares European fire detection data from the NASA FIRMS API for downstream spatial analysis.

## Contents
1. Project setup
2. Configuration and parameters
3. FIRMS API request construction
4. Multi-year data retrieval and storage
5. Output validation

## Objectives of this notebook
- Download detection-level fire points for Europe from NASA FIRMS
- Preserve latitude and longitude for spatial mapping
- Save a multi-year raw detection dataset for:
  - Spatial fire-risk analysis
  - Overlay with protected areas
  - Future prediction modelling

## Inputs
- FIRMS API configuration JSON
- VIIRS_SNPP_SP (science-grade fire product)

## Outputs
- Raw detection dataset (`data/raw/*.csv.gz`)
- Extraction log file

## Notes
- Data are requested in multi-day windows for efficiency.
- Each detection retains its true acquisition date (`acq_date`).

## 1. Project Setup

## Import libraries

Importing the libraries required for this notebook:

- The `pathlib` library is used to manage folder structures and file paths clearly.
- `pandas` is used for structuring and analysing tabular data.
- The `io` module is used to enable pandas to read CSV data directly from API responses without saving temporary files.
- `matplotlib` is used for creating exploratory visualisations.
- The `requests` library is used to retrieve data from NASA’s FIRMS API.
- The `json` library is used to load API request parameters from a configuration file.
- The `datetime` module is used for handling date-based tasks when looping through time periods.
- The `time` module is used to introduce short pauses between API requests to avoid exceeding transaction limits.


In [1]:
from pathlib import Path
import pandas as pd
import io 
import matplotlib.pyplot as plt
import requests
import json
from datetime import datetime, timedelta
import time

## Project directory
The code below sets the project root directory, defines standard project folders, and loads the FIRMS API configuration file to ensure reproducible data extraction.

In [2]:
from pathlib import Path
import json

# Set project root (moves up from /notebooks)
PROJECT_DIR = Path.cwd()
if PROJECT_DIR.name == "notebooks":
    PROJECT_DIR = PROJECT_DIR.parent

print("Project root:", PROJECT_DIR)

# Define main folders
RAW_DIR = PROJECT_DIR / "data" / "raw"
PROCESSED_DIR = PROJECT_DIR / "data" / "processed"
OUTPUTS_DIR = PROJECT_DIR / "outputs"
FIGURES_DIR = OUTPUTS_DIR / "figures"
TABLES_DIR = OUTPUTS_DIR / "tables"
RESULTS_DIR = OUTPUTS_DIR / "model_results"
CONFIG_DIR = PROJECT_DIR / "config"

# Create folders if they don’t exist
for folder in [RAW_DIR, PROCESSED_DIR, FIGURES_DIR, TABLES_DIR, RESULTS_DIR, CONFIG_DIR]:
    folder.mkdir(parents=True, exist_ok=True)

# Load configuration file
config_path = CONFIG_DIR / "firms_europe_full.json"

with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)

# Define output path for raw FIRMS data
out_path = RAW_DIR / Path(config["output_raw_csv_gz"]).name

print("Config loaded successfully")
print("Date range:", config["start_date"], "to", config["end_date"])
print("Raw data will be saved to:", out_path)



Project root: c:\Users\Surface\Documents\capstone_project
Config loaded successfully
Date range: 2020-10-01 to 2025-09-30
Raw data will be saved to: c:\Users\Surface\Documents\capstone_project\data\raw\europe_firms_viirs_snpp_sp_2020_2025.csv.gz
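
The config file itself is not shown in this notebook. As a minimal sketch, the keys that the cells here read from `config` can be checked up front, so a missing or malformed entry fails early rather than partway through the extraction. The helper name `validate_config` and all example values are illustrative assumptions; only the key names are taken from the code in this notebook.

```python
from datetime import date

# Keys that the cells in this notebook read from the config dictionary.
REQUIRED_KEYS = {
    "api_base_url", "api_key", "source", "area",
    "day_range", "start_date", "end_date", "output_raw_csv_gz",
}


def validate_config(cfg: dict) -> list:
    """Return a list of problems found in cfg (an empty list means it looks usable)."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(cfg))]
    if problems:
        return problems

    try:
        start = date.fromisoformat(cfg["start_date"])
        end = date.fromisoformat(cfg["end_date"])
    except (TypeError, ValueError):
        return ["start_date and end_date must be YYYY-MM-DD strings"]
    if start > end:
        problems.append("start_date is after end_date")

    try:
        if int(cfg["day_range"]) < 1:
            problems.append("day_range must be a positive integer")
    except (TypeError, ValueError):
        problems.append("day_range must be a positive integer")

    if not cfg["api_key"]:
        problems.append("api_key is empty")
    return problems


# Hypothetical example values, for illustration only.
example_cfg = {
    "api_base_url": "https://example.invalid/api/area/csv",
    "api_key": "YOUR_MAP_KEY",
    "source": "VIIRS_SNPP_SP",
    "area": "-31.5,34,40.5,72",
    "day_range": 5,
    "start_date": "2020-10-01",
    "end_date": "2025-09-30",
    "output_raw_csv_gz": "europe_firms_viirs_snpp_sp_2020_2025.csv.gz",
}
print(validate_config(example_cfg))
```

Calling `validate_config(config)` directly after `json.load` would surface problems such as a removed API key before any request is sent.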


## 2. Configuration and parameters

## Define analysis date range

The analysis uses VIIRS_SNPP_SP (science-grade standard processing) FIRMS fire detection data.

To identify robust wildfire patterns, a **five-year period from October 2020 to September 2025** is analysed.  
This range was selected to provide:

- Multiple complete annual fire seasons for trend comparison  
- Fully published, quality-controlled satellite observations (Data from 01.09.25 onwards is unverified).
- Consistent temporal coverage across all years  
- Avoidance of data gaps affecting very recent unreleased data  

Using a multi-year window strengthens the reliability of seasonal trend detection and supports more confident interpretation of changing wildfire behaviour across Europe.


In [3]:
print("Analysis period:", config["start_date"], "to", config["end_date"])


Analysis period: 2020-10-01 to 2025-09-30


## Retrieve detection-level fire points for Europe

The FIRMS API returns a CSV containing **individual fire detections** for a given date window and region.  
Full detection records are kept (including **latitude and longitude**) so I can:

- map fires across Europe
- overlay detections with protected areas
- build spatial features for prediction modelling

A helper function is created to:

- construct the FIRMS API request URL for a given start date
- request data from the API
- load the returned CSV into a pandas DataFrame
- add basic metadata (query start date, source, bbox)
- return the **full detection dataset** (not aggregated counts)

Notes:

- Daily fire counts can still be created later by grouping the detection data by `acq_date`.
- The bounding box for Europe is set to **-31.5, 34, 40.5, 72** (west, south, east, north, in decimal degrees) for this FIRMS extraction and for both overlay datasets in later notebooks.
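
The note about daily counts can be illustrated with a short sketch. The sample rows below are hand-made stand-ins for the detection table, not real FIRMS output, and the column name `n_detections` is an arbitrary choice.

```python
import pandas as pd

# Tiny hand-made sample standing in for the detection table (values are illustrative).
detections = pd.DataFrame({
    "latitude": [58.36, 58.65, 59.06, 40.10],
    "longitude": [12.37, 30.29, 28.13, -3.70],
    "acq_date": ["2020-10-01", "2020-10-01", "2020-10-02", "2020-10-02"],
})

# One row per acquisition date with the number of detections on that date.
daily_counts = (
    detections.groupby("acq_date")
    .size()
    .rename("n_detections")
    .reset_index()
)
print(daily_counts)
```

Because the raw file keeps one row per detection, this aggregation stays optional and can be repeated with other groupings (for example by month or by region) without re-downloading anything.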

In [4]:
def build_firms_url(date_str: str) -> str:
    base = config["api_base_url"].rstrip("/")
    key = config["api_key"]
    source = config["source"]
    area = config["area"]
    day_range = config["day_range"]
    return f"{base}/{key}/{source}/{area}/{day_range}/{date_str}"
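
The URL built above has the layout `base/key/source/area/day_range/date`. Since the API key sits in the URL path, it is worth masking it before a URL is printed or written to a log. The sketch below mirrors the same path layout with the config passed in explicitly; `build_url`, `mask_key` and every value in `demo_cfg` are hypothetical placeholders, not the project's real settings.

```python
def build_url(cfg: dict, date_str: str) -> str:
    """Same path layout as build_firms_url, but taking the config explicitly."""
    base = cfg["api_base_url"].rstrip("/")
    parts = [base, cfg["api_key"], cfg["source"], cfg["area"], str(cfg["day_range"]), date_str]
    return "/".join(parts)


def mask_key(url: str, key: str) -> str:
    """Replace the API key in a URL so the URL is safe to print or log."""
    return url.replace(key, "***") if key else url


# Placeholder values for illustration; the real values live in the config JSON.
demo_cfg = {
    "api_base_url": "https://example.invalid/api/area/csv/",
    "api_key": "DEMO_KEY",
    "source": "VIIRS_SNPP_SP",
    "area": "-31.5,34,40.5,72",
    "day_range": 5,
}
url = build_url(demo_cfg, "2020-10-01")
print(mask_key(url, demo_cfg["api_key"]))
```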

## 3. FIRMS API request construction

In [5]:

def get_fire_detections(date_str: str) -> pd.DataFrame:
    """
    Fetch FIRMS detections for [date_str .. date_str + day_range - 1].
    Returns a dataframe of detection points (includes lat/lon).
    """
    url = build_firms_url(date_str)
    r = requests.get(url, timeout=120)
    text = (r.text or "").strip()

    # Guard against failed requests: FIRMS can return a plain-text error message
    # (for example one starting with "Invalid") instead of CSV data.
    if r.status_code != 200 or len(text) == 0 or text.lower().startswith("invalid"):
        print(f"Failed {date_str} | HTTP {r.status_code} | msg: {(text[:120] if text else 'EMPTY')}")
        return pd.DataFrame()

    # Convert the API response text into a DataFrame.
    # If the response contains only a header row (no detections), pandas returns an empty DataFrame.
    df = pd.read_csv(io.StringIO(text))

    # Add query metadata (helpful for auditing)
    df["query_start_date"] = date_str
    df["source"] = config["source"]
    df["bbox"] = config["area"]

    return df
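
`get_fire_detections` returns an empty DataFrame when the HTTP status is not 200, but an exception raised by `requests.get` itself (for example a timeout) would propagate and stop a long run. One possible way to tolerate transient failures is a small retry wrapper with a doubling delay. This is a sketch only: `call_with_retries` and the `flaky` stand-in are hypothetical, and the attempt count and delays are arbitrary.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it succeeds or attempts run out, doubling the wait each time."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # in practice, narrow this to requests.RequestException
            if attempt == attempts:
                raise
            wait = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {wait:.1f}s")
            sleep(wait)


# Demo with a stand-in function that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result)
```

In the extraction loop this could be used as `call_with_retries(lambda: get_fire_detections(date_str))`, leaving the helper itself unchanged.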

### Testing and Validation Steps

Before running the full data extraction process, I carried out a series of small test checks to confirm that my API connection and data-handling logic worked as expected.

I first tested the FIRMS URL builder function using a single known date to confirm that the generated request URL matched the expected format. I then sent a test API request for an individual date and checked the response status code to confirm that the connection to the FIRMS service was successful.

Next, I inspected the returned text content to verify that valid CSV data was being received. I tested how my code behaved when detections were present and when no detections were returned, confirming that pandas correctly created an empty DataFrame in the no-data case.

Finally, I verified that additional metadata columns (query date, data source, and bounding box) were being added correctly to the output DataFrame.

Once these checks were complete and the functions behaved as intended, I removed the individual test code cells and proceeded with the full automated data collection workflow.
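
The removed test cells are not reproduced here. As a rough offline sketch of the no-data check described above, the parsing behaviour can be confirmed without calling the API, assuming that a "no detections" response consists of a CSV header line with no data rows. The column subset and values are illustrative.

```python
import io
import pandas as pd

# Assumed shape of a "no detections" response: a header line and no data rows.
header_only = "latitude,longitude,acq_date,acq_time,confidence\n"
df_empty = pd.read_csv(io.StringIO(header_only))
print(df_empty.empty, list(df_empty.columns))

# A response with one detection row parses to a one-row DataFrame.
one_row = header_only + "58.35962,12.37329,2020-10-01,0,n\n"
df_one = pd.read_csv(io.StringIO(one_row))
print(len(df_one))

# A completely empty string is different: pandas raises EmptyDataError,
# which is why get_fire_detections checks for empty text before parsing.
try:
    pd.read_csv(io.StringIO(""))
    raised_empty = False
except pd.errors.EmptyDataError:
    raised_empty = True
print(raised_empty)
```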

## 4. Multi-year data retrieval & storage

### Full FIRMS Data Extraction (Multi-Year Run)

In this section, I run the full FIRMS data extraction across my configured multi-year period. My configuration file has been updated to cover the complete five-year extraction range. I have replaced the earlier test code cells with markdown to keep the notebook tidy and avoid confusion in the final workflow.

The script below loops through the date range in multi-day windows and downloads detection-level FIRMS fire points for Europe. Each API response is processed and appended into a single gzip-compressed CSV file in my raw data folder.

This output dataset will be used in the next stages of the project for:

- Spatial mapping of fire detections across Europe
- Overlaying fire locations with protected area boundaries
- Feature engineering and prediction modelling
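
The windowing idea can be sketched with the standard library alone. The extraction cell steps through the period in fixed `day_range` increments, so when the period length is not a multiple of `day_range` the final request may reach past `end_date`. The hypothetical helper `iter_windows` below shows one way to clip the last window; it is an illustration, not the code used for the actual run.

```python
from datetime import date, timedelta


def iter_windows(start: date, end: date, day_range: int):
    """Yield (window_start, n_days) pairs covering start..end inclusive."""
    current = start
    while current <= end:
        remaining = (end - current).days + 1
        yield current, min(day_range, remaining)
        current += timedelta(days=day_range)


windows = list(iter_windows(date(2020, 10, 1), date(2020, 10, 12), 5))
for window_start, n_days in windows:
    print(window_start.isoformat(), n_days)
```

An alternative that keeps the request logic unchanged is to filter the saved rows on `acq_date` afterwards, since each detection retains its true acquisition date.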

In [None]:
# Full FIRMS extraction run. This cell will not run as-is because the API key has been
# removed from the config JSON for the Capstone Project submission.

import time
import pandas as pd

print(f"Starting FIRMS extraction: {config['start_date']} to {config['end_date']}")
print(f"Saving output to: {out_path}")

start = pd.to_datetime(config["start_date"])
end = pd.to_datetime(config["end_date"])
day_range = int(config["day_range"])

# Generate window start dates
window_starts = pd.date_range(start, end, freq=f"{day_range}D")

# Track whether header has been written
wrote_header = out_path.exists()
total_rows = 0

for dt in window_starts:
    date_str = dt.strftime("%Y-%m-%d")

    df_win = get_fire_detections(date_str)

    # Pause after every request, including empty or failed ones, to respect API limits
    time.sleep(0.8)

    if df_win.empty:
        print(f"{date_str} → no data")
        continue

    df_win.to_csv(
        out_path,
        mode="a",
        index=False,
        header=not wrote_header,
        compression="gzip"
    )

    wrote_header = True
    total_rows += len(df_win)

    print(f"{date_str} → {len(df_win)} rows saved")

print(f"\nExtraction complete. Total rows saved: {total_rows}")
print(f"File location: {out_path}")
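
Appending to one `.csv.gz` file window by window works because a gzip file may consist of several compressed members written back to back, and readers decompress them as a single stream. The sketch below demonstrates this with the standard library on a temporary file; the file name and rows are made up for the demonstration.

```python
import gzip
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.csv.gz"

    # Each append writes a separate gzip member to the end of the same file.
    with gzip.open(path, "at", encoding="utf-8") as f:
        f.write("latitude,longitude\n58.36,12.37\n")
    with gzip.open(path, "at", encoding="utf-8") as f:
        f.write("59.06,28.13\n")

    # Reading decompresses all members in sequence as one stream.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines)
```

Because the extraction cell opens the file in append mode and starts with `wrote_header = out_path.exists()`, re-running it against an existing output file would add the same windows again. Removing or renaming the old file first avoids duplicated rows.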

## 5. Output validation

In [None]:
df_check = pd.read_csv(out_path, compression="gzip")
print("Rows loaded:", len(df_check))
print("Columns:", df_check.columns[:10])
df_check.head()


Rows loaded: 2680935
Columns: Index(['latitude', 'longitude', 'bright_ti4', 'scan', 'track', 'acq_date',
       'acq_time', 'satellite', 'instrument', 'confidence'],
      dtype='object')


| | latitude | longitude | bright_ti4 | scan | track | acq_date | acq_time | satellite | instrument | confidence | version | bright_ti5 | frp | daynight | type | query_start_date | source | bbox |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58.35962 | 12.37329 | 299.82 | 0.59 | 0.7 | 2020-10-01 | 0 | N | VIIRS | n | 2 | 279.14 | 1.05 | N | 2 | 2020-10-01 | VIIRS_SNPP_SP | -11343172 |
| 1 | 58.65267 | 30.28964 | 296.89 | 0.44 | 0.39 | 2020-10-01 | 0 | N | VIIRS | n | 2 | 278.56 | 0.6 | N | 0 | 2020-10-01 | VIIRS_SNPP_SP | -11343172 |
| 2 | 59.06306 | 28.13327 | 301.22 | 0.51 | 0.41 | 2020-10-01 | 0 | N | VIIRS | n | 2 | 280.91 | 0.98 | N | 2 | 2020-10-01 | VIIRS_SNPP_SP | -11343172 |
| 3 | 59.38455 | 28.46527 | 301.06 | 0.5 | 0.41 | 2020-10-01 | 0 | N | VIIRS | n | 2 | 278.51 | 0.94 | N | 2 | 2020-10-01 | VIIRS_SNPP_SP | -11343172 |
| 4 | 59.38541 | 28.45637 | 295.39 | 0.5 | 0.41 | 2020-10-01 | 0 | N | VIIRS | n | 2 | 278.94 | 0.94 | N | 0 | 2020-10-01 | VIIRS_SNPP_SP | -11343172 |
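
Beyond row and column counts, a few range checks can catch windowing or bounding-box problems early. The sketch below is one possible set of checks, not part of the original workflow: `check_detections` is a hypothetical helper, the sample rows are invented, and the duplicate test on four columns is only a rough indicator, since it assumes those fields identify a detection.

```python
import pandas as pd


def check_detections(df, start_date, end_date, bbox):
    """Return simple counts of rows that look out of range or duplicated."""
    west, south, east, north = bbox
    dates = pd.to_datetime(df["acq_date"])
    in_dates = dates.between(pd.Timestamp(start_date), pd.Timestamp(end_date))
    in_bbox = df["longitude"].between(west, east) & df["latitude"].between(south, north)
    dupes = df.duplicated(subset=["latitude", "longitude", "acq_date", "acq_time"])
    return {
        "rows": len(df),
        "outside_date_range": int((~in_dates).sum()),
        "outside_bbox": int((~in_bbox).sum()),
        "duplicate_rows": int(dupes.sum()),
    }


# Illustrative sample: the third row repeats the first, the fourth is out of range.
sample = pd.DataFrame({
    "latitude": [58.36, 59.06, 58.36, 10.00],
    "longitude": [12.37, 28.13, 12.37, 12.00],
    "acq_date": ["2020-10-01", "2020-10-02", "2020-10-01", "2026-01-01"],
    "acq_time": [0, 0, 0, 0],
})
report = check_detections(sample, "2020-10-01", "2025-09-30", (-31.5, 34, 40.5, 72))
print(report)
```

On the real file the same function could be called as `check_detections(df_check, config["start_date"], config["end_date"], (-31.5, 34, 40.5, 72))`.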


## End of Notebook 01

The full five-year FIRMS detection-level fire dataset for Europe has now been successfully extracted and saved to the raw data folder.  

This dataset contains spatial fire detection points (latitude and longitude) along with temporal and intensity attributes, and forms the foundation for all subsequent analysis.

The next notebook focuses on data inspection, cleaning, and spatial exploratory analysis of these fire detections.
