<a href="https://colab.research.google.com/github/boroju/aidl-upc-winter2024-satellite-imagery/blob/main/notebooks/segementation_model_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Download for Paradise (California) and Cameron Peak (Colorado) Wildfires

Welcome to this notebook dedicated to the essential steps involved in downloading geospatial data from Google Earth Engine for the wildfires in Paradise, California, and Cameron Peak, Colorado. These wildfires have been integral to our deep learning segmentation model for wildfire prediction.

## 🚀 Purpose of This Notebook

The primary objective of this notebook is to guide you through the process of acquiring satellite imagery and related data corresponding to the specified wildfire locations. The data retrieved from Google Earth Engine plays a critical role in training and validating our wildfire prediction model, ensuring accurate and informed insights.

**Wildfires of Interest:**

1. **Paradise, California:** Explore the steps to obtain high-resolution aerial imagery and associated datasets for the Paradise wildfire. The unique characteristics of this region pose challenges that our segmentation model is adept at addressing.

2. **Cameron Peak, Colorado:** Dive into the process of downloading satellite data specific to the Cameron Peak wildfire. The mountainous terrain and distinct environmental factors contribute to the complexity of this dataset.


## 🌐 Geospatial Data Challenges

This project uses data from the National Agriculture Imagery Program (NAIP) and MOD14A2.061: Terra Thermal Anomalies & Fire 8-Day Global 1km, both processed within Google Earth Engine (GEE). Working with geospatial data raises distinctive challenges at every stage, from acquisition through preprocessing and augmentation to splitting for model training. The points below summarize the main difficulties of handling large-scale, multispectral, multi-temporal satellite datasets in deep learning applications.

**1. Multispectral Complexity:**
   - Satellite images often consist of multiple spectral bands, each capturing information at distinct wavelengths. Managing and interpreting this multispectral complexity is essential for meaningful analysis.

**2. Spatial and Temporal Variability:**
   - Geospatial datasets exhibit significant variability in both spatial and temporal dimensions. Different satellites offer varying spatial resolutions, and temporal components, such as revisits, add an extra layer of complexity.

**3. File Formats and Libraries:**
   - Datasets are distributed in diverse file formats such as GeoTIFF. Handling these formats requires specialized libraries to ensure proper loading and processing.

**4. Coordinate Reference Systems (CRS):**
   - Earth's 3D nature mandates projection onto a 2D representation. Aligning images with different CRS, along with the challenge of equal area projections, demands careful consideration to avoid misalignments.

**5. Large Image Sizes:**
   - Satellite images can be extensive, with dimensions like 10K x 10K pixels. Processing such large images poses computational challenges, necessitating strategies like tiling or patch extraction.

**6. Data Labeling:**
   - Manual labeling of wildfire-affected and non-affected areas involves careful examination of imagery, adding a subjective layer to the process. Achieving consistency in labeling is crucial for model training.
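
The tiling strategy mentioned under *Large Image Sizes* can be sketched in a few lines. This is only an illustration: `tile_windows` is a hypothetical helper (not part of this notebook's pipeline), and the 512-pixel tile size is an assumed value chosen for the example.

```python
def tile_windows(height, width, tile, stride=None):
    """Yield (row, col, row_end, col_end) windows that cover a height x width image.

    Hypothetical helper for illustration; edge windows are cropped to the image.
    """
    stride = stride or tile  # no overlap unless a smaller stride is given
    for row in range(0, height, stride):
        for col in range(0, width, stride):
            yield row, col, min(row + tile, height), min(col + tile, width)


# A 10K x 10K image split into 512-pixel tiles (assumed tile size)
windows = list(tile_windows(10_000, 10_000, tile=512))
print(len(windows), "windows; last one:", windows[-1])
```

Windows on the right and bottom edges are smaller than the tile size, so a real pipeline would need to pad or discard them.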



## Environment setup

In [16]:
%pip install geedim -q
%pip install geojson -q
%pip install geopandas -q

In [2]:
import numpy as np
import geojson
import geopandas as gpd
import os
import geemap

from google.colab import drive
drive.mount('/content/drive')

import ee

try:
    ee.Initialize(project='ee-joseramoncajide')
except Exception:
    # Not authenticated yet: run the interactive flow, then retry
    ee.Authenticate()
    ee.Initialize(project='ee-joseramoncajide')

Mounted at /content/drive


### Code for plotting the wildfires

In [3]:
def plot_map(fire_coordinates, no_fire_coordinates, fire_start, fire_end):

  fire_start = ee.Date(fire_start)
  fire_end = ee.Date(fire_end)

  # Define a function to create a bounding box around a point
  def bounding_box_func(feature):
      intermediate_buffer = feature.buffer(5000)  # Buffer radius, half your box width in meters
      intermediate_box = intermediate_buffer.bounds()  # Draw a bounding box around the circle
      return intermediate_box  # Return the bounding box

  # Create a list of features from the coordinates
  def create_point_feature(coord):
      lon, lat = coord
      point = ee.Geometry.Point(lon, lat)
      return ee.Feature(point)

  fire_point_list = list(map(create_point_feature, fire_coordinates))
  no_fire_point_list = list(map(create_point_feature, no_fire_coordinates))

  # Create a FeatureCollection from the lists of coordinates
  fire_point_collection = ee.FeatureCollection(fire_point_list)
  no_fire_point_collection = ee.FeatureCollection(no_fire_point_list)

  # Apply the bounding box function to create bounding boxes around each point
  fire_bounding_boxes = fire_point_collection.map(bounding_box_func)
  no_fire_bounding_boxes = no_fire_point_collection.map(bounding_box_func)

  # Print the bounding boxes
  # print("Fire Bounding Boxes:", fire_bounding_boxes.getInfo())
  # print("No Fire Bounding Boxes:", no_fire_bounding_boxes.getInfo())

  # Create a Map
  Map = geemap.Map()

  # Add the MODIS MCD64A1 burned-area layer (BurnDate band)
  modis_col = ee.ImageCollection('MODIS/061/MCD64A1').filter(ee.Filter.date(fire_start, fire_end))
  burnedArea = modis_col.select('BurnDate')
  burnedAreaVis = {
      'min': 30.0,
      'max': 341.0,
      'palette': ['4e0400', '951003', 'c61503', 'ff1901'],
  }
  Map.addLayer(burnedArea, burnedAreaVis, 'Burned Area')

  # Convert the bounding boxes to a client-side object for visualization
  fire_boxes_client = fire_bounding_boxes.getInfo()
  no_fire_boxes_client = no_fire_bounding_boxes.getInfo()

  # Add the bounding boxes to the map (orange = fire, green = no fire)
  Map.addLayer(geemap.geojson_to_ee(fire_boxes_client), {'color': '#FFA500', 'fillColor': '#FFA500'}, 'Fire Bounding Boxes')
  Map.addLayer(geemap.geojson_to_ee(no_fire_boxes_client), {'color': '#00FF00', 'fillColor': '#00FF00'}, 'No Fire Bounding Boxes')

  # Center the map
  Map.centerObject(fire_point_collection, zoom=8)

  # Display the map
  return Map
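
`plot_map` turns each point into a box by buffering it by 5,000 m and taking the bounds, which gives a box roughly 10 km across. As a rough client-side cross-check, the same box can be approximated in degrees. This is a sketch under stated assumptions: `approx_bbox_degrees` is a hypothetical helper, and the 111,320 m per degree figure is a spherical approximation, so Earth Engine's own result will differ slightly.

```python
import math


def approx_bbox_degrees(lon, lat, half_width_m=5000.0):
    """Approximate (west, south, east, north) of a box centred on a lon/lat point.

    Hypothetical helper: uses a spherical approximation, not Earth Engine.
    """
    meters_per_degree = 111_320.0  # rough length of one degree of latitude
    dlat = half_width_m / meters_per_degree
    # A degree of longitude shrinks with the cosine of the latitude
    dlon = half_width_m / (meters_per_degree * math.cos(math.radians(lat)))
    return lon - dlon, lat - dlat, lon + dlon, lat + dlat


# First annotated fire point for California, rounded
west, south, east, north = approx_bbox_degrees(-121.69102, 39.76665)
print(f"width: {east - west:.4f} deg, height: {north - south:.4f} deg")
```

At these latitudes the box spans more degrees of longitude than of latitude, even though it is square on the ground.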


## 🌍 Data Selection Process: Manual Annotation in Google Earth Engine

This section describes how the data points were selected. Wildfire-affected and non-affected areas were annotated manually in Google Earth Engine (GEE) by visually exploring the BurnDate band of MODIS/061/MCD64A1, which records when each pixel burned.

**BurnDate Mask Exploration:**

The BurnDate mask, derived from MODIS/061/MCD64A1, represents the temporal distribution of burned areas. By visualizing this mask within GEE, we gained valuable insights into the spatiotemporal patterns of wildfires. The mask highlights regions affected by fire events, forming the basis for subsequent manual annotation.

![Paradise Wild Fire](https://github.com/boroju/aidl-upc-winter2024-satellite-imagery/blob/main/notebooks/jose/assets/paradise_wildfire.png?raw=true)

**Manual Annotation:**

With the aid of GEE's visualization capabilities, 145 points corresponding to wildfire-affected areas and 182 points representing non-affected areas were manually annotated. This meticulous process involved zooming into the regions of interest and labeling points based on the BurnDate mask observations.

**Technical Considerations:**

- **Annotation Consistency:** The manual annotation aimed for consistency, ensuring that labeled points accurately represented the presence or absence of wildfires. This step is crucial for the model to learn from reliable ground truth data.

- **Balanced Dataset:** Care was taken to maintain a balanced dataset by selecting an appropriate number of points from both wildfire-affected and non-affected areas. This balance contributes to robust model training and helps avoid biases.

**Visual Representation:**
  
Here is a visual representation of the annotated points for the Paradise wildfire:
  
![Paradise data points](https://github.com/boroju/aidl-upc-winter2024-satellite-imagery/blob/main/notebooks/jose/assets/paradise_wildfire_data_points.png?raw=true)

These annotated points serve as the labeled dataset for training, validating, and testing the wildfire prediction model. The manual curation ensures the model's exposure to accurate and representative instances, contributing to the reliability and effectiveness of the subsequent deep learning segmentation model.

### Annotated points

In [4]:
colorado_no_fire_coordinates=np.array([[-105.32243760850653, 40.69596972749644], [-106.08049424913152, 40.55004201021333], [-106.26726182725652, 40.51246621930688], [-106.25078233506902, 40.38287725980034], [-106.41557725694403, 40.5959393594144], [-106.11894639756903, 40.74176686525013], [-105.94865831163153, 40.80832463070462], [-105.72893174913153, 40.883122399211715], [-105.56413682725653, 40.90803622631216], [-105.26201280381903, 40.407978493015975], [-105.22905381944403, 40.328459157067854], [-105.46525987413153, 40.328459157067854], [-105.82231553819402, 40.412161122097416], [-106.05852159288153, 40.474869360207734], [-106.91545518663153, 40.46651164195792], [-107.03630479600652, 40.5959393594144], [-107.07475694444403, 40.75009023230904], [-106.88249620225653, 40.96613199742417], [-106.61333116319402, 40.88727535528754], [-106.58586534288153, 40.70846298060243], [-106.76713975694402, 40.84157850575792], [-106.65178331163153, 40.533344259989924], [-106.71220811631903, 40.483226038269365], [-106.42656358506902, 41.007597691365106], [-106.53093370225653, 41.14424943492612], [-106.16289171006902, 40.97857444339958], [-105.88278631704806, 41.2494030635121], [-105.8141217662668, 40.80859849086779], [-106.40463690298554, 40.88131938326643], [-106.41836981314181, 40.34967013933125], [-106.5666852428293, 40.498122859254494], [-106.49252752798554, 40.73579781673847], [-106.5007672740793, 40.856395495381314], [-106.1711774303293, 40.8439300335939], [-106.04208807486056, 40.90415803877896], [-105.9239850475168, 40.941513377087304], [-105.4405866100168, 40.88131938326643], [-105.39938787954806, 40.80651957863989], [-105.65207342642306, 40.941513377087304], [-106.17392401236054, 40.537794062472614], [-106.27829412954806, 40.70457311229531], [-106.28378729361054, 40.80651957863989], [-106.36069159048556, 40.70665521493373], [-106.14096502798556, 39.90866890528188], [-105.97617010611054, 39.91920217251271], [-105.91849188345431, 39.85386980595421], [-106.1546979381418, 
39.83067238963683], [-105.83609442251681, 39.771589063211785]])

In [5]:
colorado_fire_coordinates=np.array([[-105.55589708116277, 40.62408805358858], [-105.71519883897528, 40.51559833996386], [-105.86626085069402, 40.59489658748228], [-105.76738389756903, 40.66785134080637], [-105.73991807725652, 40.63034170866966], [-105.65477403428778, 40.605323574924086], [-105.46800645616278, 40.54482441039179], [-105.42680772569402, 40.488448433815876], [-105.32243760850653, 40.49053727825598], [-105.59984239366277, 40.6115789866961], [-105.54216417100653, 40.59489658748228], [-106.77812608506903, 40.65118299614114], [-106.76713975694402, 40.61783381292418], [-106.73418077256902, 40.59072533711839], [-106.31395372178777, 40.9485011822901], [-106.25078233506902, 40.965095021007095], [-106.19035753038152, 41.054215426352506], [-106.08598741319402, 41.10390462417455], [-106.12718614366277, 41.19077031535627], [-106.17387803819402, 41.15976006494941], [-106.24528917100653, 41.1349412955242], [-106.34691270616277, 41.20110380308646], [-106.42107042100653, 41.215567945250264], [-106.43754991319403, 41.137009885106785], [-106.41283067491278, 41.093555805778344], [-106.31395372178777, 41.114251811984545], [-106.28374131944402, 41.07492382384571], [-106.36888536241278, 41.062499567880764], [-106.31944688585027, 41.04178725783712], [-106.16563829210028, 41.14114686857016], [-106.25352891710027, 41.05835762753501], [-105.67674669053777, 40.70950397925226], [-105.58610948350652, 40.734483064563484], [-105.61906846788153, 40.70950397925226], [-105.67125352647528, 40.69076351378151], [-105.63280137803777, 40.6824327256989], [-106.03929551866277, 39.84832057056646], [-106.00084337022528, 39.78503132029205], [-105.34441026475652, 40.157594842454124], [-107.15990098741277, 39.628668753029785], [-107.25877794053778, 39.60327927782878], [-107.20659288194402, 39.571529339012116], [-108.67876085069403, 39.46877166186242], [-108.55791124131903, 39.481492191503314], [-108.45354112413152, 39.42635309942997], [-108.53593858506903, 39.34993458613414], [-108.66228135850652, 
39.42635309942997], [-108.60185655381903, 39.332941331024294]])

In [6]:
california_no_fire_coordinates=np.array(( [[-121.85993896484375, 39.691345948326905], [-121.80500732421875, 39.65435007197939], [-121.76655517578125, 39.620507934098555], [-121.97529541015625, 39.681834615387814], [-121.75556884765625, 39.865488095353165], [-121.66630493164062, 39.90974481056605], [-121.599013671875, 39.97291911943351], [-121.51386962890625, 39.97607630368619], [-121.4287255859375, 39.97397153038964], [-121.34495483398437, 39.952920232420816], [-121.2749169921875, 39.91711815190346], [-121.5166162109375, 39.5654791496585], [-121.51386962890625, 39.42453364918935], [-121.63197265625, 39.408619611696544], [-121.81599365234375, 39.37571908969214], [-121.92585693359375, 39.41922937368349], [-122.07279907226562, 39.639546174127624], [-123.86854642948475, 39.89528279436593], [-124.00999540409413, 40.0342186293671], [-123.96330350956288, 40.16027882735746], [-123.88639921268788, 40.19385543181088], [-123.92485136112538, 40.00056314355636], [-123.80674833378163, 39.82359963181943], [-123.75318998417225, 39.72121615770172], [-123.7119912537035, 39.60069460850108], [-123.68589872440663, 39.512814285258216], [-123.52934354862538, 39.54247310342824], [-123.58839506229725, 39.723328694540626], [-123.66529935917225, 39.89949711891423], [-123.75318998417225, 40.0373729801269], [-123.85618681034413, 40.10358063803625], [-123.72709745487538, 40.21692970887466], [-123.65568632206288, 40.16027882735746], [-123.60624784550038, 40.02265142834034], [-123.508744183391, 39.90687156325932], [-123.43046659550038, 39.80355713046257], [-123.36180204471913, 39.6990306137821], [-123.30961698612538, 39.55729776160525], [-123.21074003300038, 39.77401016130511], [-123.26567167362538, 39.92161807074487], [-123.48814481815663, 40.167625129255825], [-123.18876737675038, 39.47678297697318], [-123.57054227909413, 39.448156577080454], [-121.33977939187842, 39.43718616384962], [-121.16399814187842, 39.48807742027464], [-121.02117587625342, 39.55799212336968], [-120.90581943094092, 
39.62572096646533], [-120.82616855203467, 39.699723554607615], [-120.65038730203467, 39.779979470823044], [-120.59820224344092, 39.938104918060766], [-120.75201083719092, 39.99494085966642], [-120.91131259500342, 40.0517295408251], [-121.01293613015967, 39.95495012308857], [-121.05138827859717, 39.91914910474144], [-121.16125155984717, 39.921255565427415], [-121.08709384500342, 40.0790553787244], [-121.21618320047217, 40.10847094509346], [-121.38647128640967, 40.171461481880826], [-121.53204013406592, 40.15257045275016], [-121.67760898172217, 40.13577401423981], [-121.75725986062842, 40.05803647625943], [-121.19421054422217, 39.40536023398564], [-121.05138827859717, 39.44142852313434], [-120.93328525125342, 39.54740351101674], [-120.82342197000342, 39.56434451502359], [-120.74926425515967, 39.67224613017621], [-120.75475741922217, 39.809523876628106], [-120.57073642312842, 39.893866522458325], [-123.23492099344092, 39.67647413794972], [-123.28985263406592, 39.460515942305676], [-123.52056552469092, 39.43294354618846], [-123.73754550515967, 39.379889027988995], [-123.60021640359717, 39.661674978575334], [-123.63592197000342, 40.31192942779252], [-123.55901767312842, 40.47509195744745], [-123.69360019265967, 40.4270210332179], [-123.81444980203467, 40.38937632651852], [-123.64416171609717, 40.61492860599191], [-123.37499667703467, 40.635774704235004], [-123.32006503640967, 40.56278489753336], [-123.22118808328467, 40.45419578083658], [-123.08660556375342, 40.45001576543892], [-123.07836581765967, 40.59824704292653], [-122.96575595437842, 40.500158787751324], [-122.85589267312842, 40.48344860780257], [-123.08385898172217, 40.83348800494938], [-123.46288730203467, 40.74406950774432], [-123.79797030984717, 40.72325725320105], [-124.10152905372668, 40.321400231046994], [-124.16470044044543, 40.193541295884586], [-124.23336499122668, 40.342337657827144], [-123.96969311622668, 40.33396346662936], [-123.94497387794543, 40.524219075314456], [-123.80489819435168, 
40.634779171197714], [-124.07406323341418, 40.67228630966529], [-123.21712963966418, 39.38312091733061], [-123.39016430763293, 39.30240450033656], [-123.41763012794543, 39.497662789771326], [-123.55495922950793, 39.30452981067661], [-123.64010327247668, 39.427687360530456], [-123.69228833107043, 39.29390261364971], [-123.47256176857043, 39.16625045103867], [-123.30776684669543, 39.20882697083235], [-123.63186352638293, 39.19605672314975], [-123.60439770607043, 39.02983232871187], [-123.34621899513293, 39.04903207395762], [-123.24184887794543, 39.11512458725789], [-123.46981518653918, 38.96152432199161], [-123.51925366310168, 38.869631640550345], [-123.25283520607043, 38.86107743586702], [-123.22536938575793, 39.01062736650262], [-122.94521801857043, 38.91025005559934], [-123.12649243263293, 38.854661106931935], [-123.40115063575793, 38.867493185871574], [-123.30502026466418, 38.76477173000658], [-120.51723950294543, 38.6047841403534], [-120.51174633888293, 38.757014322122515], [-120.34695141700793, 38.82979857636388], [-120.22335522560168, 38.83193816449679], [-120.21786206153918, 38.73987781560972], [-120.34145825294543, 38.67343500915181], [-120.29476635841418, 38.570434051773276], [-120.45406811622668, 38.51028186442572], [-120.41561596778918, 38.407046834111], [-120.18215649513293, 38.51458011740762], [-120.19588940528918, 38.39197938415092], [-120.02834790138293, 38.338141430479105], [-120.17666333107043, 38.20876702444559], [-120.60787670997668, 38.490936550741495], [-120.66830151466418, 38.71845139861011], [-120.78365795997668, 38.80197807878271], [-120.74245922950793, 38.964466987219716], [-120.51998608497668, 39.08608967761208], [-120.55294506935168, 38.95592420554094]]))

In [7]:
california_fire_coordinates = np.array( [[-121.69102416992187, 39.76665054706106], [-121.6704248046875, 39.74975906209016], [-121.65669189453125, 39.73708772972515], [-121.64570556640625, 39.71596366466098], [-121.6264794921875, 39.71173807504374], [-121.60038696289062, 39.69271971858151], [-121.599013671875, 39.673696121027014], [-121.54682861328125, 39.724414067312416], [-121.577041015625, 39.729695043137625], [-121.61, 39.74553554354117], [-121.63609252929687, 39.77509473591258], [-121.60862670898437, 39.78248255145433], [-121.58390747070312, 39.80991606524739], [-121.577041015625, 39.7867038042539], [-121.55918823242187, 39.76981723926326], [-121.54545532226562, 39.69588980864089], [-121.54545532226562, 39.69588980864089], [-121.49189697265625, 39.70645572401305], [-121.511123046875, 39.75292653108903], [-121.4726708984375, 39.75292653108903], [-121.4781640625, 39.80358622561076], [-121.53172241210937, 39.80569623690878], [-121.53172241210937, 39.80569623690878], [-121.55918823242187, 39.83101132156629], [-121.51249633789062, 39.843665366807244], [-121.39439331054687, 39.83101132156629], [-121.51386962890625, 39.740255781265844], [-121.405361315643, 39.62580710803471], [-121.3559228390805, 39.606765087325805], [-121.2460595578305, 39.63215328412945], [-121.26940550509613, 39.593009259116606], [-121.295498034393, 39.66176107536845], [-121.34356321993988, 39.665989724821614], [-121.31335081759613, 39.64484388995978], [-121.32983030978363, 39.742061157676815], [-121.33807005587738, 39.79061843717199], [-121.29412474337738, 39.85601101092627], [-121.34768309298676, 39.85390254451801], [-121.3119775265805, 39.82648658726979], [-121.38750853243988, 39.801169835652864], [-121.43694700900238, 39.780065420043464], [-121.3449365109555, 39.72621985867283], [-121.29275145236176, 39.721994897425965], [-121.25567259493988, 39.813829377060884], [-121.240566393768, 39.74417305580157], [-121.22820677462738, 39.5983002855989], [-121.14855589572113, 39.63321092355171], 
[-121.20348753634613, 39.71460059234517], [-121.24193968478363, 39.6913561870348], [-121.1911279172055, 39.66176107536845], [-121.09362425509613, 39.74628488919979], [-121.08401121798676, 39.70509246382162], [-121.17190184298676, 39.79061843717199], [-121.28176512423676, 39.784286821125576], [-120.90411009493988, 39.85917358908918], [-120.867031237518, 39.88657649455318], [-120.9164697140805, 39.83281431811572], [-120.9494286984555, 39.80327992110169], [-120.98376097384613, 39.776899199285005], [-121.04143919650238, 39.74100518434228], [-121.07851805392426, 39.695583024919046], [-121.13894285861176, 39.69875298347924], [-121.1361962765805, 39.777954622389686], [-121.0373193234555, 39.79905968545377], [-121.09087767306488, 39.82015827357699], [-121.1142236203305, 39.6913561870348], [-121.17602171603363, 39.76951078437701], [-121.29961790743988, 39.583484393027646], [-122.67432521533992, 39.149260821431305], [-122.67432521533992, 39.149260821431305], [-122.81165431690242, 39.26418600061108], [-122.90503810596492, 39.27694385485862], [-122.78968166065242, 39.4892326327325], [-122.69080470752742, 39.667055885854616], [-122.88855861377742, 39.56126366673013], [-122.92151759815242, 39.743126143973065], [-122.78418849658992, 39.8823703864065], [-122.98194240283992, 39.94978118098179], [-123.19068263721492, 40.04656792015103], [-123.03687404346492, 40.13481858107534], [-123.11377834033992, 40.04236265385457], [-123.17420314502742, 40.31097647583238], [-123.28955959033992, 40.2565003646542], [-123.28955959033992, 40.13901813907329], [-122.93250392627742, 39.932934703108536], [-122.78418849658992, 39.76001925544739], [-122.76770900440242, 39.709327494071296], [-122.98194240283992, 39.62898931292536], [-122.81165431690242, 39.24717191540899], [-122.71277736377742, 39.29395071391799], [-122.77869533252742, 39.213131359885736], [-122.77320216846492, 39.16629863888204], [-122.69629787158992, 39.63237380530314], [-122.77869533252742, 39.65352345390929], [-122.70179103565242, 
39.57735445134847], [-123.00391505908992, 39.5731203827938], [-122.94898341846492, 39.539238524846404]])

In [8]:
colorado = np.concatenate((colorado_no_fire_coordinates, colorado_fire_coordinates), axis=0)
california = np.concatenate((california_no_fire_coordinates, california_fire_coordinates), axis=0)

no_fire_coordinates = np.concatenate((california_no_fire_coordinates, colorado_no_fire_coordinates), axis=0)
fire_coordinates = np.concatenate((california_fire_coordinates, colorado_fire_coordinates), axis=0)

In [9]:
print(f' * colorado_no_fire_coordinates = {len(colorado_no_fire_coordinates)}')
print(f' * colorado_fire_coordinates = {len(colorado_fire_coordinates)}')
print(f' * california_no_fire_coordinates = {len(california_no_fire_coordinates)}')
print(f' * california_fire_coordinates = {len(california_fire_coordinates)}')
print()
print(f' * colorado = {len(colorado)}')
print(f' * california = {len(california)}')
print()
print(f' * burned areas = {len(fire_coordinates)}')
print(f' * not burned areas = {len(no_fire_coordinates)}')

 * colorado_no_fire_coordinates = 48
 * colorado_fire_coordinates = 48
 * california_no_fire_coordinates = 134
 * california_fire_coordinates = 97

 * colorado = 96
 * california = 231

 * burned areas = 145
 * not burned areas = 182
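
The counts printed above can be turned into class shares to check how balanced the annotation is, overall and per region. The snippet below is a small sketch that hard-codes those counts; `fire_share` is a hypothetical helper.

```python
# Counts copied from the output above
counts = {
    "colorado": {"fire": 48, "no_fire": 48},
    "california": {"fire": 97, "no_fire": 134},
}


def fire_share(region_counts):
    """Fraction of points labeled as fire (hypothetical helper)."""
    total = region_counts["fire"] + region_counts["no_fire"]
    return region_counts["fire"] / total


total_fire = sum(c["fire"] for c in counts.values())
total_no_fire = sum(c["no_fire"] for c in counts.values())
overall_share = total_fire / (total_fire + total_no_fire)

for region, region_counts in counts.items():
    print(f" * {region}: fire share = {fire_share(region_counts):.3f}")
print(f" * overall: fire share = {overall_share:.3f}")
```

Colorado is evenly split, while California has more non-affected than affected points, so the combined dataset leans slightly toward the non-fire class.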


### California Paradise Wildfire

You can see how data was processed in Google Earth Engine with [this link](https://code.earthengine.google.com/?scriptPath=users%2Fjrcf%2Fupc_project%3Aparadise_wildfire_datapoints).

In [10]:
# California Paradise Wildfire
plot_map(california_fire_coordinates, california_no_fire_coordinates, '2018-06-02', '2020-09-30')

Map(center=[39.72495794572433, -121.80441055370922], controls=(WidgetControl(options=['position', 'transparent…

### Colorado Cameron Peak Fire


In [11]:
# Colorado Cameron Peak Fire
plot_map(colorado_fire_coordinates, colorado_no_fire_coordinates, '2020-08-01', '2020-10-30')

Map(center=[40.535832827424606, -106.40831597615868], controls=(WidgetControl(options=['position', 'transparen…

## Data Download Process

**1. Exporting Annotated Data from Google Earth Engine**

The manually annotated wildfire data, consisting of 145 wildfire-affected points and 182 non-affected points, was exported from Google Earth Engine (GEE) in CSV format. The exported file includes geographic coordinates (latitude and longitude) corresponding to each annotated point.

**2. Conversion to NumPy Array**

Following the export, the CSV file underwent processing to convert the annotated data into a NumPy array. This format facilitates efficient handling and manipulation within a Python environment.

**3. GeoJSON Conversion and GeoPandas Integration**

For enhanced geospatial capabilities, the NumPy array data was further transformed into GeoJSON format. GeoJSON, a standard for encoding geographic data structures, was then integrated with GeoPandas.

**4. Simplifying Data Iteration with GeoPandas**

The conversion to GeoJSON and integration with GeoPandas streamline the iteration and manipulation of geospatial data. GeoPandas offers convenient tools for working with geometric objects, simplifying the process of handling and downloading the annotated data.

By adopting this approach, the annotated data is prepared for subsequent steps in the wildfire prediction model pipeline, ensuring accessibility and efficiency in further analyses.
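
Steps 1 and 2 can be sketched as follows. The export itself is not shown in this notebook, so the column names (`longitude`, `latitude`, `label`) and the `read_points` helper are assumptions made for illustration; the two sample rows reuse points annotated above, rounded to six decimals.

```python
import csv
import io

# Assumed CSV layout; the real export's column names may differ
SAMPLE_CSV = """longitude,latitude,label
-121.859939,39.691346,no_fire
-121.691024,39.766651,fire
"""


def read_points(csv_text):
    """Group [lon, lat] pairs by label (hypothetical helper)."""
    by_label = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        pair = [float(row["longitude"]), float(row["latitude"])]
        by_label.setdefault(row["label"], []).append(pair)
    return by_label


points = read_points(SAMPLE_CSV)
print(points)
```

Each list of pairs can then be passed to `np.array(...)` to obtain arrays shaped like the coordinate arrays defined earlier.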

In [12]:
no_fire_coordinates = no_fire_coordinates.tolist()
fire_coordinates = fire_coordinates.tolist()

# Create a GeoJSON FeatureCollection with attributes for "no_fire"
no_fire_features = []
for coord in no_fire_coordinates:
    point = geojson.Point(coord)
    feature = geojson.Feature(geometry=point, properties={"label": "no_fire"})
    no_fire_features.append(feature)

# Create a GeoJSON FeatureCollection with attributes for "fire"
fire_features = []
for coord in fire_coordinates:
    point = geojson.Point(coord)
    feature = geojson.Feature(geometry=point, properties={"label": "fire"})
    fire_features.append(feature)

# Combine both FeatureCollections
all_features = no_fire_features + fire_features
feature_collection = geojson.FeatureCollection(all_features)

# Convert to GeoJSON string
geojson_str = geojson.dumps(feature_collection, sort_keys=True)

print(geojson_str)

# Save to the same file
with open("data_points.json", "w") as geojson_file:
    geojson_file.write(geojson_str)

print("GeoJSON with attributes for both 'no_fire' and 'fire' saved to data_points.json")

{"features": [{"geometry": {"coordinates": [-121.859939, 39.691346], "type": "Point"}, "properties": {"label": "no_fire"}, "type": "Feature"}, {"geometry": {"coordinates": [-121.805007, 39.65435], "type": "Point"}, "properties": {"label": "no_fire"}, "type": "Feature"}, {"geometry": {"coordinates": [-121.766555, 39.620508], "type": "Point"}, "properties": {"label": "no_fire"}, "type": "Feature"}, {"geometry": {"coordinates": [-121.975295, 39.681835], "type": "Point"}, "properties": {"label": "no_fire"}, "type": "Feature"}, {"geometry": {"coordinates": [-121.755569, 39.865488], "type": "Point"}, "properties": {"label": "no_fire"}, "type": "Feature"}, {"geometry": {"coordinates": [-121.666305, 39.909745], "type": "Point"}, "properties": {"label": "no_fire"}, "type": "Feature"}, {"geometry": {"coordinates": [-121.599014, 39.972919], "type": "Point"}, "properties": {"label": "no_fire"}, "type": "Feature"}, {"geometry": {"coordinates": [-121.51387, 39.976076], "type": "Point"}, "properties"

You can preview the data points [here](https://github.com/boroju/aidl-upc-winter2024-satellite-imagery/blob/main/data/data_points.json).
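
The two labeling loops in the cell above share the same structure, so they could be folded into one helper. The following is a minimal sketch using only the standard `json` module rather than the `geojson` package; `labeled_features` and `build_collection` are hypothetical names, and the sample points are two of the annotated coordinates, rounded.

```python
import json


def labeled_features(coordinates, label):
    """Build GeoJSON point features that all carry the same label."""
    return [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"label": label},
        }
        for lon, lat in coordinates
    ]


def build_collection(fire, no_fire):
    """Combine both classes into one FeatureCollection, no_fire first."""
    features = labeled_features(no_fire, "no_fire") + labeled_features(fire, "fire")
    return {"type": "FeatureCollection", "features": features}


collection = build_collection(
    fire=[[-121.691024, 39.766651]],
    no_fire=[[-121.859939, 39.691346]],
)
text = json.dumps(collection, sort_keys=True)
print(text)
```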

In [13]:
# Load GeoJSON file
geojson_path = "data_points.json"
gdf = gpd.read_file(geojson_path)

print(gdf.shape)
gdf.head()

(327, 2)


| | label | geometry |
|---|---|---|
| 0 | no_fire | POINT (-121.85994 39.69135) |
| 1 | no_fire | POINT (-121.80501 39.65435) |
| 2 | no_fire | POINT (-121.76655 39.62051) |
| 3 | no_fire | POINT (-121.97530 39.68183) |
| 4 | no_fire | POINT (-121.75557 39.86549) |


In [14]:
# Test the download on a few data points only: keep the first two rows per label
gdf = gdf.groupby('label', sort=True).head(2).reset_index(drop=True)
gdf

| | label | geometry |
|---|---|---|
| 0 | no_fire | POINT (-121.85994 39.69135) |
| 1 | no_fire | POINT (-121.80501 39.65435) |
| 2 | fire | POINT (-121.69102 39.76665) |
| 3 | fire | POINT (-121.67042 39.74976) |


In [15]:
out_dir = '/content/drive/My Drive/upc_project_dataset/test'
print('* out_dir:', out_dir)

* out_dir: /content/drive/My Drive/upc_project_dataset/test


In [16]:
%mkdir -p "{out_dir}"

### Retrieving National Agriculture Imagery Program (NAIP) satellite images and feature engineering

The provided code is designed to retrieve and process National Agriculture Imagery Program (NAIP) satellite images for a specified location and time range. Let's break down the critical steps conceptually:

1. **Definition of Spectral Indices:**
   - The `addIndices` function is defined to calculate spectral indices, specifically the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI). These indices provide valuable information about vegetation and water content in the imagery.

2. **Image Retrieval and Region of Interest (ROI) Specification:**
   - The `get_raster` function takes as input a point location (longitude, latitude), a start date, and an end date. It buffers the point by half the side of a 50-square-kilometer square and takes the bounding box of that buffer, which yields a square region of about 50 square kilometers centered on the point. This region is then used to filter the NAIP image collection by location and date.

3. **Image Collection Filtering and Selection:**
   - The code filters the NAIP image collection using the specified bounds and date range. It selects only the relevant spectral bands ('R', 'G', 'B', 'N') needed for subsequent analysis.

4. **Mapping Spectral Indices:**
   - The `addIndices` function is mapped over the image collection using the `map` function. This applies the defined spectral index calculations to each image in the collection, enhancing each image with additional bands for NDVI and NDWI.

5. **Mosaic Creation:**
   - The code then builds a composite from the processed images: `max()` takes the pixel-wise maximum of each band across the collection. The composite keeps the 'R', 'G', 'B', 'NDVI', and 'NDWI' bands (the raw 'N' band is dropped) and is clipped to the region of interest.

In summary, the critical steps involve defining spectral indices, filtering and selecting relevant NAIP images, mapping spectral indices over the collection, and creating a mosaic of processed images for the specified region and time range. This process ensures that the selected NAIP images are enriched with additional spectral information, setting the stage for subsequent analysis and model training.
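
To make the index arithmetic concrete, here is a plain-Python sketch of what a normalized difference and a per-band maximum compute for individual pixels. It does not call Earth Engine: the helper names and the pixel values are made up for illustration, and returning `None` when the denominator is zero is a choice of this sketch.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), or None when the sum is zero (sketch's own choice)."""
    total = a + b
    return (a - b) / total if total else None


def indices(pixel):
    """NDVI and NDWI with the same band pairs as `addIndices`."""
    return {
        "NDVI": normalized_difference(pixel["N"], pixel["R"]),
        "NDWI": normalized_difference(pixel["G"], pixel["N"]),
    }


def per_band_max(pixels):
    """Maximum of each band taken independently across overlapping pixels."""
    return {band: max(p[band] for p in pixels) for band in pixels[0]}


# Illustrative values only: strong near-infrared response, as over vegetation
vegetated = {"R": 50, "G": 80, "B": 40, "N": 200}
print(indices(vegetated))

# Two made-up overlapping pixels; the composite mixes values from both
composite = per_band_max([{"R": 1, "N": 5}, {"R": 3, "N": 2}])
print(composite)
```

Because the maximum is taken per band, a composite pixel can combine values that come from different source images.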

In [17]:
# Function to calculate NDWI for an image.
def addNDWI(img):
    ndwi = img.normalizedDifference(['G', 'N']).rename('ndwi')
    return ndwi

# Define the function to calculate NDVI and NDWI, and add them to the image
def addIndices(image):
    ndvi = image.normalizedDifference(['N', 'R']).rename('NDVI')
    ndwi = image.normalizedDifference(['G', 'N']).rename('NDWI')  # Calculate NDWI
    return image.addBands([ndvi, ndwi])

# Define the function to calculate NDVI
def addNDVI(image):
    ndvi = image.normalizedDifference(['N', 'R']).rename('NDVI')
    return image.addBands(ndvi)

def get_raster(point, start_date, end_date):

  start = ee.Date(start_date)
  end = ee.Date(end_date)

  print(f"* Processing point: {tuple(point)}")

  point = ee.Geometry.Point(tuple(point))
  area = 50e6  # (50 square kilometers)
  region = point.buffer(ee.Number(area).sqrt().divide(2), 1).bounds()
  print('area (m2)', region.area(1).getInfo())
  print('error (m2)', region.area(1).subtract(area).getInfo())

  collection = (
    ee.ImageCollection('USDA/NAIP/DOQQ')
    .filterBounds(region)
    .filterDate(start, end)
    .select(['R', 'G', 'B', 'N'])
  )

  # Get the number of images in the collection.
  print('* Number of images:', collection.size().getInfo())

  # Map the function over the collection to add NDVI and NDWI
  collection_with_indices = collection.map(addIndices)

  # Create a mosaic of the images with NDVI and NDWI
  img = collection_with_indices.max().select(['R', 'G', 'B', 'NDVI', 'NDWI']).clip(region)

  # Print the resulting image
  print("Mosaic with NDVI bands:", img.getInfo())

  # Get the number of images in collection_with_indices.
  print('* Number of images with NDVI:', collection_with_indices.size().getInfo())

  # Get the CRS.
  collection_crs = collection.first().geometry().projection().getInfo()['crs']
  print('* collection_crs:', collection_crs)

  return (img, collection_crs, region)

## Calculating the MOD14A2.061 fire masks

The provided code is focused on obtaining a mask for wildfire occurrences based on the MOD14A2.061: Terra Thermal Anomalies & Fire 8-Day Global 1km dataset. Let's break down the critical steps conceptually:

1. **Definition of Region of Interest (ROI):**
   - The `get_mask` function takes a point location (as a longitude, latitude pair), a start date, and an end date. As in `get_raster`, it buffers the point and takes the bounding box of the buffer, which gives a roughly square region of about 50 square kilometers centered on the point. This region is then used, together with the date range, to filter the MODIS fire dataset.

2. **Image Collection Filtering and Selection:**
   - The code filters the MODIS fire dataset using the specified bounds and date range. It selects only the relevant band ('FireMask') needed for identifying thermal anomalies and fires.

3. **Label Filtering:**
   - The code maps a function over the dataset that binarizes the FireMask: pixels with class 7, 8, or 9 (fire detected with low, nominal, or high confidence) become 1, and all other pixels become 0.

4. **Maximum FireMask Generation:**
   - The code generates a maximum composite of the filtered FireMask images to obtain a comprehensive representation of thermal anomalies and fires in the specified region and time range.

5. **Clipping to ROI:**
   - The maximum composite is clipped to the specified region of interest so that the mask aligns with the defined area. In the code, this image is stored in `fires` and is not used further.

6. **Additional Image Generation:**
   - A second image (`img`) is created from the binarized collection with `qualityMosaic('FireMask')`, clipped to the region (MODIS images are otherwise unbounded), and renamed to 'mask'. This is the image that `get_mask` returns.

In summary, the critical steps involve defining a region of interest, filtering and selecting the MODIS fire dataset, mapping functions to filter labels and generate composite images, and finally clipping the results to the specified region. This process produces a mask highlighting areas with thermal anomalies and fires, essential for subsequent wildfire prediction model training and evaluation.
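
The label filtering and compositing steps can be sketched on small nested lists. This is only an illustration of the logic behind `img.gte(7)` and a maximum composite: the tile values are invented, and `binarize_fire_mask` and `max_composite` are hypothetical helper names rather than Earth Engine functions.

```python
def binarize_fire_mask(fire_mask, threshold=7):
    """Map FireMask class codes to 1 (code >= threshold) or 0 (everything else)."""
    return [[1 if value >= threshold else 0 for value in row] for row in fire_mask]

def max_composite(masks):
    """Combine binary masks so a pixel is 1 if it is 1 in any of them."""
    return [
        [max(pixels) for pixels in zip(*rows)]
        for rows in zip(*masks)
    ]

# Two hypothetical 2x3 FireMask tiles from consecutive 8-day periods.
period_1 = [[3, 5, 7],
            [5, 5, 4]]
period_2 = [[5, 9, 5],
            [5, 8, 5]]

binary = [binarize_fire_mask(tile) for tile in (period_1, period_2)]
burned = max_composite(binary)
print(burned)
```

A pixel ends up as 1 in the composite if a fire class was reported for it in at least one period of the date range.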

In [18]:
def get_mask(point, start_date, end_date):

  start = ee.Date(start_date)
  end = ee.Date(end_date)

  print(f"* Processing point: {tuple(point)}")

  point = ee.Geometry.Point(tuple(point))
  area = 50e6  # (50 square kilometers)
  region = point.buffer(ee.Number(area).sqrt().divide(2), 1).bounds()
  print('area (m2)', region.area(1).getInfo())
  print('error (m2)', region.area(1).subtract(area).getInfo())

  # MOD14A1.061: Terra Thermal Anomalies & Fire Daily Global 1km
  # ee.ImageCollection('MODIS/061/MOD14A1')
  # MOD14A2.061: Terra Thermal Anomalies & Fire 8-Day Global 1km
  # ee.ImageCollection("MODIS/061/MOD14A2")

  dataset = (  ee.ImageCollection('MODIS/061/MOD14A2')
  .filterBounds(region)
  .filterDate(start, end)
  .select(['FireMask'])
  )
  # Keep only labels 7, 8 and 9 and convert them to 1. Other pixels will be 0.
  dataset = dataset.map(lambda img: img.gte(7).copyProperties(img, img.propertyNames()))

  # display(dataset)

  # Get the max for the fire mask and clip to the ROI
  fires = dataset.max().clip(region)

  # Generate an image from the FireMask band of the collection. Clip to the ROI because MODIS is unbounded
  img = dataset.qualityMosaic('FireMask').clip(region).rename('mask')

  # Get the number of images in the collection.
  print('* Number of images:', dataset.size().getInfo())
  collection_crs = dataset.first().geometry().projection().getInfo()['crs']
  print('* collection_crs:', collection_crs)

  return (img, collection_crs, region)


The overall workflow iterates over each data point, leveraging Google Earth Engine (GEE) for processing, and subsequently downloads both the satellite images and corresponding masks to local storage. The following summarizes the key aspects of this comprehensive data retrieval and processing operation:

1. **Iterative Data Processing:**
   - The provided code iterates over each data point, ensuring that the specified region around the point is processed individually. This approach allows for a detailed and focused analysis of wildfire occurrences in diverse geographic locations.

2. **Google Earth Engine Processing:**
   - The processing of both satellite images from the National Agriculture Imagery Program (NAIP) and masks from MODIS FireMask is conducted within the Google Earth Engine environment. GEE's powerful capabilities enable efficient querying, spectral index calculations, and mask generation for each selected region.

3. **Satellite Image and Mask Downloads:**
   - Following the processing steps in GEE, the resultant satellite images and masks are downloaded to the local storage. This ensures that the processed data is accessible for subsequent analysis, model training, and validation.

4. **Processing Time Consideration:**
   - It is crucial to note that the entire process, from data iteration to GEE processing and local storage downloads, may take a considerable amount of time. Specifically, due to the complex calculations involved in generating masks, the overall execution time could extend up to approximately 18 hours. Users are advised to plan accordingly and be patient during the execution of the notebook.

5. **Scale Factor Adjustment:**
   - To balance computational efficiency and data quality, the downloads use a fixed scale. Both calls to `geemap.download_ee_image` pass `scale=5`, requesting a pixel size of 5 meters. This reduces the final size of each image while keeping enough spatial detail for model training.
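
To get a feel for how the scale affects image size, the sketch below estimates pixel counts for the roughly 7 km square region at a few scales. It is a planar back-of-the-envelope estimate under the assumption of square pixels in meters; the dimensions of the actual downloads depend on the CRS and on how the region is reprojected, and `estimated_pixels` is a hypothetical helper name.

```python
import math

def estimated_pixels(side_m, scale_m):
    """Estimate (width, height, total) pixel counts for a square region at a given scale."""
    per_side = math.ceil(side_m / scale_m)
    return per_side, per_side, per_side * per_side

side = math.sqrt(50e6)  # about 7071 m, side of the 50 km^2 region

for scale in (1, 5, 10):
    width, height, total = estimated_pixels(side, scale)
    print(f"scale={scale:>2} m -> {width} x {height} px ({total / 1e6:.2f} M pixels per band)")
```

Under these assumptions, moving from a 1 m to a 5 m pixel size cuts the number of pixels per band by a factor of about 25.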


In [19]:
def download_raster_files(out_dir, point, label, index):
  """Download the fire mask and the NAIP raster for one point, skipping files that already exist."""

  # Stays None when the mask file is already on disk and get_mask is not called
  collection_crs = None

  out_filename = f'{label}_cp_mask_{index:03}.tif'
  out_path = os.path.join(out_dir, out_filename)
  print(out_path)

  if not os.path.isfile(out_path):

    start_date = '2019-08-01'
    end_date = '2020-10-30'

    # Use the point passed to the function rather than the global loop variable
    img, collection_crs, region = get_mask(point = point, start_date = start_date, end_date = end_date)

    display(img)
    display(collection_crs)
    display(region)

    geemap.download_ee_image(img, out_path, crs=collection_crs, region=region, scale=5)

  out_filename = f'{label}_cp_landcover_{index:03}.tif'
  out_path = os.path.join(out_dir, out_filename)
  print(out_path)

  if not os.path.isfile(out_path):

    pre_fire_start = '2018-04-01'
    pre_fire_end = '2020-04-01'

    raster_img, raster_collection_crs, raster_region = get_raster(point = point, start_date = pre_fire_start, end_date = pre_fire_end)

    display(raster_img)
    display(raster_collection_crs)
    display(raster_region)
    # Prefer the mask CRS; fall back to the raster CRS if the mask was not generated in this call
    crs = collection_crs or raster_collection_crs
    geemap.download_ee_image(raster_img, out_path, crs=crs, region=raster_region, scale=5)


# Group by the "label" column
grouped_gdf = gdf.groupby("label")

# Iterate through groups
for label, group in grouped_gdf:
    print(f"Processing data for label: {label}")

    # Reset index for the group
    group = group.reset_index(drop=True)

    # Iterate through rows in the group
    for index, row in group.iterrows():
        # Point geometries expose longitude as x and latitude as y
        coordinates = row["geometry"].x, row["geometry"].y

        download_raster_files(out_dir = out_dir, point = coordinates, label = label, index=index)

Processing data for label: fire
/content/drive/My Drive/upc_project_dataset/test/fire_cp_mask_000.tif
* Processing point: (-121.691024, 39.766651)
area (m2) 50059124.2955644
error (m2) 59124.29556439817
* Number of images: 57
* collection_crs: EPSG:4326


'EPSG:4326'

fire_cp_mask_000.tif: |          | 0.00/2.61M (raw) [  0.0%] in 00:00 (eta:     ?)

There is no STAC entry for: None


/content/drive/My Drive/upc_project_dataset/test/fire_cp_landcover_000.tif
* Processing point: (-121.691024, 39.766651)
area (m2) 50059124.2955644
error (m2) 59124.29556439817
* Number of images: 4
Mosaic with NDVI bands: {'type': 'Image', 'bands': [{'id': 'R', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'G', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'B', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'NDVI', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': -1, 'max': 1}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform':

'EPSG:4326'

fire_cp_landcover_000.tif: |          | 0.00/52.3M (raw) [  0.0%] in 00:00 (eta:     ?)

/content/drive/My Drive/upc_project_dataset/test/fire_cp_mask_001.tif
* Processing point: (-121.670425, 39.749759)
area (m2) 50059124.395039275
error (m2) 59124.39503927529
* Number of images: 57
* collection_crs: EPSG:4326


'EPSG:4326'

fire_cp_mask_001.tif: |          | 0.00/2.61M (raw) [  0.0%] in 00:00 (eta:     ?)

/content/drive/My Drive/upc_project_dataset/test/fire_cp_landcover_001.tif
* Processing point: (-121.670425, 39.749759)
area (m2) 50059124.395039275
error (m2) 59124.39503927529
* Number of images: 4
Mosaic with NDVI bands: {'type': 'Image', 'bands': [{'id': 'R', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'G', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'B', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'NDVI', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': -1, 'max': 1}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform

'EPSG:4326'

fire_cp_landcover_001.tif: |          | 0.00/52.3M (raw) [  0.0%] in 00:00 (eta:     ?)

Processing data for label: no_fire
/content/drive/My Drive/upc_project_dataset/test/no_fire_cp_mask_000.tif
* Processing point: (-121.859939, 39.691346)
area (m2) 50059124.73856181
error (m2) 59124.73856180906
* Number of images: 57
* collection_crs: EPSG:4326


'EPSG:4326'

no_fire_cp_mask_000.tif: |          | 0.00/2.61M (raw) [  0.0%] in 00:00 (eta:     ?)

/content/drive/My Drive/upc_project_dataset/test/no_fire_cp_landcover_000.tif
* Processing point: (-121.859939, 39.691346)
area (m2) 50059124.73856181
error (m2) 59124.73856180906
* Number of images: 4
Mosaic with NDVI bands: {'type': 'Image', 'bands': [{'id': 'R', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'G', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'B', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'NDVI', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': -1, 'max': 1}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transfo

'EPSG:4326'

no_fire_cp_landcover_000.tif: |          | 0.00/52.2M (raw) [  0.0%] in 00:00 (eta:     ?)

/content/drive/My Drive/upc_project_dataset/test/no_fire_cp_mask_001.tif
* Processing point: (-121.805007, 39.65435)
area (m2) 50059124.95582524
error (m2) 59124.95582523942
* Number of images: 57
* collection_crs: EPSG:4326


'EPSG:4326'

no_fire_cp_mask_001.tif: |          | 0.00/2.61M (raw) [  0.0%] in 00:00 (eta:     ?)

/content/drive/My Drive/upc_project_dataset/test/no_fire_cp_landcover_001.tif
* Processing point: (-121.805007, 39.65435)
area (m2) 50059124.95582524
error (m2) 59124.95582523942
* Number of images: 6
Mosaic with NDVI bands: {'type': 'Image', 'bands': [{'id': 'R', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'G', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'B', 'data_type': {'type': 'PixelType', 'precision': 'int', 'min': 0, 'max': 255}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transform': [1, 0, 0, 0, 1, 0]}, {'id': 'NDVI', 'data_type': {'type': 'PixelType', 'precision': 'float', 'min': -1, 'max': 1}, 'dimensions': [1, 1], 'origin': [-122, 39], 'crs': 'EPSG:4326', 'crs_transfor

'EPSG:4326'

no_fire_cp_landcover_001.tif: |          | 0.00/52.2M (raw) [  0.0%] in 00:00 (eta:     ?)

In [20]:
%ls "{out_dir}"

fire_cp_landcover_000.tif  fire_cp_mask_001.tif          no_fire_cp_mask_000.tif
fire_cp_landcover_001.tif  no_fire_cp_landcover_000.tif  no_fire_cp_mask_001.tif
fire_cp_mask_000.tif       no_fire_cp_landcover_001.tif
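
Since each sample consists of a mask and a landcover file that share a label and index, it can be useful to confirm that every sample is complete before training. The following is a minimal sketch that works on a list of file names such as the listing above; the regular expression is inferred from the naming pattern used in `download_raster_files`, and `pair_samples` is a hypothetical helper that is not part of the notebook.

```python
import re
from collections import defaultdict

# Matches names like "no_fire_cp_mask_000.tif" or "fire_cp_landcover_001.tif".
FILENAME_PATTERN = re.compile(
    r"^(?P<label>.+)_cp_(?P<kind>mask|landcover)_(?P<index>\d{3})\.tif$"
)

def pair_samples(filenames):
    """Group file names by (label, index); return complete samples and incomplete keys."""
    samples = defaultdict(dict)
    for name in filenames:
        match = FILENAME_PATTERN.match(name)
        if match:
            key = (match["label"], int(match["index"]))
            samples[key][match["kind"]] = name
    complete = {
        key: files for key, files in samples.items()
        if {"mask", "landcover"} <= set(files)
    }
    incomplete = sorted(key for key in samples if key not in complete)
    return complete, incomplete

listing = [
    "fire_cp_landcover_000.tif", "fire_cp_mask_001.tif", "no_fire_cp_mask_000.tif",
    "fire_cp_landcover_001.tif", "no_fire_cp_landcover_000.tif", "no_fire_cp_mask_001.tif",
    "fire_cp_mask_000.tif", "no_fire_cp_landcover_001.tif",
]

complete, incomplete = pair_samples(listing)
print(len(complete), "complete samples; incomplete:", incomplete)
```

In practice the list of names could come from `os.listdir(out_dir)`; any key reported as incomplete points to a download that should be retried.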


**Final Note:**
For a visual preview and demonstration of the generated images, you can refer to the [segmentation_model_training_demo.ipynb](https://github.com/boroju/aidl-upc-winter2024-satellite-imagery/blob/main/notebooks/segmentation_model_training_demo.ipynb) notebook. This companion notebook provides an interactive showcase, allowing you to explore sample images and masks, understand the processing pipeline, and gain insights into the data that will be utilized for training the wildfire prediction segmentation model. Feel free to explore the demo notebook to enhance your understanding of the geospatial data and the subsequent model training process.