# Sentinel-2 Methane Emission Detection and Analysis Tool

## Overview 
This notebook provides a comprehensive workflow for detecting and analysing methane plumes from oil and gas facilities using Sentinel-2 satellite data. It combines satellite imagery processing, wind speed analysis, and regression modelling to estimate methane emission rates accurately. Key functionalities include:

1. **SWIR Analysis**: Utilises Sentinel-2's Short-Wave Infrared (SWIR) bands to highlight methane plumes by comparing active and non-active regions.
2. **Plume Detection and Tagging**: Supports manual tagging of plume locations and segmentation for precise analysis.
3. **Regression-Based Emission Estimation**: Employs a trained regression model to estimate methane emission rates based on plume characteristics and wind speed data.
4. **Interactive Visualisation**: Creates interactive maps to visualise true-colour images, SWIR-derived plumes, and analysis results.
5. **Dynamic Model Updates**: Facilitates the addition of new training data to refine the regression model for improved predictions.

This tool is designed for researchers, policymakers, and environmental analysts aiming to quantify and monitor methane emissions efficiently.

The section below imports the packages needed to run the script.

In [1]:
# Core System and Numerical Operations
import os  # For file path and system-level operations
import numpy as np  # For numerical operations and array manipulations
import pandas as pd  # For handling tabular data (e.g., CSV files)

# File Handling and Temporary Files
from tempfile import NamedTemporaryFile  # For creating temporary files

# Machine Learning and Statistical Analysis
from sklearn.linear_model import LinearRegression  # Regression model for methane emission estimation
from sklearn.metrics import mean_squared_error, r2_score  # Metrics for model evaluation
from sklearn.decomposition import PCA  # For principal component analysis (PCA)

# Data Analysis and Manipulation
import xarray as xr  # For working with multidimensional arrays (e.g., NetCDF files)
import cdsapi  # For accessing the Copernicus Climate Data Store API
import requests  # For making HTTP requests (e.g., downloading data)
import openeo  # For cloud-based geospatial data processing

# Geospatial Data Handling
import geopandas as gpd  # For working with GeoJSON and vector geospatial data
import rasterio  # For working with raster data
from rasterio.enums import Resampling  # For resampling raster data
from rasterio.plot import show  # For visualising raster data
from rasterio.transform import from_origin  # For creating geospatial transformations
from rasterio.warp import calculate_default_transform, reproject  # For reprojection of raster data
from shapely.geometry import Point, LineString  # For geometric operations in geospatial analysis

# Image Processing
from skimage import exposure  # For adjusting image exposure and contrast
from PIL import Image  # For basic image manipulation
from rasterio.features import rasterize


# Mathematical and Geometric Computations
from scipy.ndimage import label  # For segmentation and labelling of regions
from scipy.spatial import ConvexHull  # For calculating convex hulls of shapes

# Interactive Maps and Visualisation
import folium  # For creating interactive maps
from folium import Map, GeoJson, LayerControl, LatLngPopup  # Map features and interactions
from folium.raster_layers import ImageOverlay  # Overlay raster images on maps
from folium import FeatureGroup  # For grouping map layers
import matplotlib.pyplot as plt  # For plotting and visualisation

# Jupyter Notebook Integration
from IPython.display import display as ipy_display  # For displaying outputs in notebooks



## Connect to OpenEO

The code below establishes a connection with the Copernicus openEO platform, which provides a wide variety of Earth observation datasets.

- If this does not read as 'Authorised successfully' or 'Authenticated using refresh token', then please ensure that you have completed the setup steps as outlined in section 2.3.6 of the how-to guide.

- If you have followed the steps in section 2.3.6 correctly and the problem persists, please look at https://dataspace.copernicus.eu/news for any information about service interruptions. 

- If there is no news of service problems you can raise a ticket here: https://helpcenter.dataspace.copernicus.eu/hc/en-gb/requests/new

In [2]:
connection = openeo.connect(url="openeo.dataspace.copernicus.eu")
connection.authenticate_oidc()

Authenticated using refresh token.


<Connection to 'https://openeo.dataspace.copernicus.eu/openeo/1.2/' with OidcBearerAuth>

## Display field names and select site_id

This loads the list of Algerian oil and gas fields. Hassi Messaoud is site 86. If you are interested in a different field, please look up its id number.


In [3]:
studysite_csv = pd.read_csv(r'C:\GIS_Course\Methane_Point_Detection\Sentinel-2_Algeria_Methane\Data\Algerian_Oil_and_Gas_Fields.csv')
pd.set_option('display.max_rows', None)
print(studysite_csv.to_string(index=False))

 id                   name      west     south      east     north
  1           Gour Mahmoud  2.545738 26.832422  2.689150 26.980223
  2          Hassi Hassine  2.481688 26.814902  2.556715 26.878123
  3              In Sallah  2.451958 27.033471  2.528314 27.187470
  4        Mahbes Guenatir  3.166512 26.177879  3.197605 26.203481
  5           Djebel Thara  3.062622 25.988381  3.102147 26.022814
  6                Unknown  3.243313 26.027347  3.265408 26.045501
  7         Krebb Ed Douro  2.583645 25.944587  2.628648 25.980437
  8              Tibardine  2.210629 25.929342  2.342364 26.046971
  9            Oued Djaret  2.305817 26.640617  2.357529 26.703343
 10      Djebel Mouahdrine  2.976608 26.088212  3.050097 26.165123
 11          Hassi Moumene  2.458900 27.402452  2.631038 27.597842
 12       Garet El Befinat  2.248445 27.743627  2.337012 27.842359
 13                    Reg  1.914508 28.002131  2.231025 28.293102
 14               Bouteraa  1.764153 27.805731  1.964158 27.96

# Selecting the Field

In the code box below, specify the ID number of the field you want to analyse.

<p style="text-align: center;"><b>site_id</b> = 86</p>

In [4]:
site_id = 86  # Specify the oil and gas field ID for the field you want to examine.

# Retrieve the name of the field from the dataset
field_name = studysite_csv[studysite_csv['id'] == site_id].iloc[0]['name']

# Print a confirmation message
print(f"Site {site_id} ({field_name}) loaded correctly.")

Site 86 (Hassi Messaoud) loaded correctly.


# Multi-Band Multi-Pass Analysis

Varon et al. (2021) showed that methane plumes from point sources can be imaged by differencing Sentinel-2’s SWIR-1 and SWIR-2 bands. The tool uses a multi-band-multi-pass (MBMP) retrieval method.

First, it computes a multi-band-single-pass (MBSP) difference for both the active emission date and the no emission date. The two resulting datasets are then combined in the multi-band-multi-pass step.
The multi-band-single-pass equation is as follows: 


<div align="center"><b>MBSP = B11 - cB12</b></div>

Where:
- <b>B12</b> is the Sentinel-2 SWIR-2 band.
- <b>B11</b> is the Sentinel-2 SWIR-1 band. 
- <b>c</b> is calculated by least-squares fitting B12 to B11 across the scene.  

Once active emission and no emission scenes have been calculated, the following equation is used to calculate the multi-band-multi-pass raster. 

<div align="center"><b>MBMP = ActiveMBSP − NoMBSP</b></div>

Where:
- <b>ActiveMBSP</b> is the multiband single pass for the active emission scene
- <b>NoMBSP</b> is the multiband single pass for the no emission scene.  
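
The two equations above can be sketched on small synthetic arrays. This is a minimal illustration, not the notebook's own implementation: the helper names `mbsp` and `mbmp` are hypothetical, the pixel values are made up, and `c` is assumed here to be a zero-intercept least-squares scale that maps B12 onto B11.

```python
import numpy as np

def mbsp(b11, b12):
    """Multi-band-single-pass difference B11 - c*B12 for one scene."""
    b11 = np.asarray(b11, dtype=float)
    b12 = np.asarray(b12, dtype=float)
    valid = np.isfinite(b11) & np.isfinite(b12)
    # Assumption: c is the zero-intercept least-squares scale fitting B12 to B11.
    c = np.sum(b11[valid] * b12[valid]) / np.sum(b12[valid] ** 2)
    return b11 - c * b12

def mbmp(active_b11, active_b12, no_b11, no_b12):
    """Multi-band-multi-pass difference: ActiveMBSP - NoMBSP."""
    return mbsp(active_b11, active_b12) - mbsp(no_b11, no_b12)

# Synthetic 3x3 scenes (made-up digital numbers).
no_b12 = np.array([[1000.0, 1100.0, 1200.0],
                   [1300.0, 1400.0, 1500.0],
                   [1600.0, 1700.0, 1800.0]])
no_b11 = 1.2 * no_b12
active_b12 = no_b12.copy()
active_b11 = 1.2 * active_b12
# Simulate a plume pixel by darkening B12 at the centre
# (B12 is assumed here to be the more methane-sensitive band).
active_b12[1, 1] *= 0.9

result = mbmp(active_b11, active_b12, no_b11, no_b12)
print(np.round(result, 1))
```

In this synthetic case the centre pixel is the largest value in `result`, while the background pixels stay close to zero.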

The active emission scene and no emission scene are considered in this analysis to be one satellite pass apart. To begin this process, we need to determine which days have available satellite data.

# Available dates for the analysis. 

Sentinel-2 provides data approximately once every 2 to 3 days, so not every date you enter into this tool will have imagery. The code below will tell you which dates are available for the oil/gas field of your choice.

The one parameter you need to modify before running the code is: 

- <b>temporal_extent</b> = ["2020-01-01", "2020-01-31"] (change this to your chosen date range using "YYYY-MM-DD" format.)

Once you have done this run the code and the available dates should appear below in a matter of seconds. 

In [5]:
def get_spatial_extent(site_id):
    site = studysite_csv[studysite_csv['id'] == site_id].iloc[0]
    return {
        "west": site['west'],
        "south": site['south'],
        "east": site['east'],
        "north": site['north']
    }

def fetch_available_dates(site_id, temporal_extent):
    spatial_extent = get_spatial_extent(site_id)
    catalog_url = f"https://catalogue.dataspace.copernicus.eu/resto/api/collections/Sentinel2/search.json?box={spatial_extent['west']}%2C{spatial_extent['south']}%2C{spatial_extent['east']}%2C{spatial_extent['north']}&sortParam=startDate&sortOrder=ascending&page=1&maxRecords=1000&status=ONLINE&dataset=ESA-DATASET&productType=L2A&startDate={temporal_extent[0]}T00%3A00%3A00Z&completionDate={temporal_extent[1]}T00%3A00%3A00Z&cloudCover=%5B0%2C{cloud_cover}%5D"
    response = requests.get(catalog_url)
    response.raise_for_status()
    catalog = response.json()
    dates = [date.split('T')[0] for date in map(lambda x: x['properties']['startDate'], catalog['features'])]
    return dates

# Please enter your parameters here.
temporal_extent = ["2021-08-15", "2021-09-15"]  # Specify the date range you want to check for available data.
cloud_cover = 5  # Maximum cloud cover (%) accepted in the catalogue search

available_dates = fetch_available_dates(site_id, temporal_extent)
print("Available dates:", available_dates)

Available dates: ['2021-08-16', '2021-08-16', '2021-08-16', '2021-08-16', '2021-08-19', '2021-08-19', '2021-08-19', '2021-08-19', '2021-08-21', '2021-08-21', '2021-08-21', '2021-08-24', '2021-08-24', '2021-08-24', '2021-08-24', '2021-08-26', '2021-08-26', '2021-08-26', '2021-08-29', '2021-08-29', '2021-08-29', '2021-08-29', '2021-08-31', '2021-08-31', '2021-08-31', '2021-08-31', '2021-09-03', '2021-09-03', '2021-09-05', '2021-09-05', '2021-09-08', '2021-09-10', '2021-09-10', '2021-09-10', '2021-09-10', '2021-09-13', '2021-09-13', '2021-09-13', '2021-09-13']
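
The list above repeats a date once for every catalogue product returned for that day. If a shorter list is easier to read, the duplicates can be collapsed. The following is a minimal sketch; `unique_dates` is a hypothetical helper that is not part of the notebook.

```python
def unique_dates(dates):
    """Return each ISO-formatted date once, in chronological order."""
    # ISO "YYYY-MM-DD" strings sort chronologically when sorted as text.
    return sorted(set(dates))

example = ['2021-08-16', '2021-08-16', '2021-08-19', '2021-08-19', '2021-08-21']
print(unique_dates(example))
```

It could be applied as `unique_dates(available_dates)` after the cell above has run.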


## Choosing the "Active Emission" Date

A so-called active emission date must be chosen from the available dates listed above. This is the day on which we will look for plumes.

As before, the one parameter you need to modify before running the code is:

<p style="text-align: center;"><b>active_temporal_extent</b> = ["2020-01-17", "2020-01-17"]</p>

Change this to your chosen date using "YYYY-MM-DD" format.

Please note that the two temporal extent dates <b><u>MUST BE IDENTICAL</u></b> because we are only choosing a single date.

If you receive a 'NoDataAvailable' error message, please check the list of available dates above and try again.

In [None]:
def active_emission(site_id, active_temporal_extent):
    site = studysite_csv[studysite_csv['id'] == site_id].iloc[0]

    active_emission = connection.load_collection(
        "SENTINEL2_L2A",
        temporal_extent=active_temporal_extent,
        spatial_extent={
            "west": site['west'],
            "south": site['south'],
            "east": site['east'],
            "north": site['north']
        },
        bands=["B11", "B12"],
    )
    active_emission.download("Sentinel-2_active_emissionMBMP.Tiff")

# Enter parameters for the active emission day
active_temporal_extent = ["2019-11-20", "2019-11-20"]  # Both dates must be identical

active_emission(site_id, active_temporal_extent)


## Choosing the "No Emission" Date

Next we choose the no emission date using the same process. This is the dataset that the "Active Emission" one will be compared to. The recommended choice is the satellite overpass immediately before the "Active Emission" one.

<b>So if your active emission day is 2020-01-17, your no emission day would be 2020-01-14</b>

In an ideal world, the "No Emission" day should contain no emissions, but in fields with a lot of activity like Hassi Messaoud, this may not be possible. Such an instance will not cause problems in most cases. The emissions for these dates will simply appear as dark clouds on the SWIR data and can be ignored in the analysis. 

The one parameter you need to modify before running the code is:

<p style="text-align: center;"><b>no_temporal_extent</b> = ["2020-01-14", "2020-01-14"]</p>

The temporal extent dates <b><u>MUST BE IDENTICAL</u></b>

If you receive a 'NoDataAvailable' error message, please check the list of available dates above and try again.
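
The recommended choice (the overpass immediately before the active emission date) can also be looked up programmatically from a list of available dates. This is a minimal sketch under the assumption that the dates are "YYYY-MM-DD" strings; `previous_overpass` is a hypothetical helper, not a function the notebook defines.

```python
from datetime import date

def previous_overpass(available_dates, active_date):
    """Return the latest available date strictly before active_date, or None."""
    active = date.fromisoformat(active_date)
    earlier = [d for d in set(available_dates) if date.fromisoformat(d) < active]
    return max(earlier, key=date.fromisoformat) if earlier else None

dates = ['2021-08-16', '2021-08-16', '2021-08-19', '2021-08-21']
print(previous_overpass(dates, '2021-08-19'))
```

The returned date can then be used for both entries of the no emission temporal extent.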


In [None]:
def no_emission(site_id, no_temporal_extent):
    site = studysite_csv[studysite_csv['id'] == site_id].iloc[0]

    no_emission = connection.load_collection(
        "SENTINEL2_L2A",
        temporal_extent=no_temporal_extent,
        spatial_extent={
            "west": site['west'],
            "south": site['south'],
            "east": site['east'],
            "north": site['north']
        },
        bands=["B11", "B12"],
    )
    no_emission.download("Sentinel-2_no_emissionMBMP.Tiff")

# Enter parameters for the no emission day
no_temporal_extent = ["2019-11-18", "2019-11-18"]

no_emission(site_id, no_temporal_extent)

## Choosing a Background Satellite Image

This section helps locate the source of an emission by displaying a true-colour satellite image of the oil/gas field, over which the data will be superimposed. This helps distinguish true emissions from clouds that can be seen in the visible spectrum. It is recommended that you choose the same date as your active emission.

In [None]:
def reproject_to_epsg4326(data, meta):
    """
    Reprojects the given raster data to EPSG:4326 and returns the updated data and metadata.
    """
    target_crs = "EPSG:4326"
    
    # Calculate transform and metadata for the target CRS
    transform, width, height = calculate_default_transform(
        meta['crs'], target_crs, meta['width'], meta['height'], *meta['bounds']
    )
    
    # Update metadata for the new projection
    new_meta = meta.copy()
    new_meta.pop("bounds", None)  # Helper entry only; not needed when writing the file
    new_meta.update({
        "crs": target_crs,
        "transform": transform,
        "width": width,
        "height": height,
    })
    
    # Prepare an in-memory array for reprojected data
    reprojected_data = []
    for i in range(meta['count']):
        # Create an empty numpy array to store the reprojected data for the band
        destination = np.empty((height, width), dtype=data[i].dtype)
        reproject(
            source=data[i],
            destination=destination,
            src_transform=meta['transform'],
            src_crs=meta['crs'],
            dst_transform=transform,
            dst_crs=target_crs,
            resampling=Resampling.nearest
        )
        reprojected_data.append(destination)
    
    return reprojected_data, new_meta

def truecolour_image(site_id, temporal_extent):
    """
    Downloads and reprojects Sentinel-2 true-colour images for a given site and temporal extent.
    """
    site = studysite_csv[studysite_csv['id'] == site_id].iloc[0]

    truecolour_image = connection.load_collection(
        "SENTINEL2_L2A",
        temporal_extent=temporal_extent,
        spatial_extent={
            "west": site['west'],
            "south": site['south'],
            "east": site['east'],
            "north": site['north']
        },
        bands=["B02", "B03", "B04"],
    )
    # Download the true colour image
    file_path = "Sentinel-2_truecolourMBMP.Tiff"
    truecolour_image.download(file_path)
    
    # Read the file into memory
    with rasterio.open(file_path) as src:
        data = [src.read(i) for i in range(1, src.count + 1)]
        meta = src.meta.copy()
        meta['bounds'] = src.bounds

    # Reproject the data in memory
    reprojected_data, reprojected_meta = reproject_to_epsg4326(data, meta)
    
    # Save the reprojected file
    output_file = "Sentinel-2_truecolour_reprojected.Tiff"
    with rasterio.open(output_file, "w", **reprojected_meta) as dest:
        for i, band in enumerate(reprojected_data, start=1):
            dest.write(band, i)
    
    # Print the CRS of the output file
    with rasterio.open(output_file) as reprojected_file:
        print("CRS of the reprojected file:", reprojected_file.crs)

# Use the active emission date for the true-colour background image
temporal_extent = active_temporal_extent

truecolour_image(site_id, temporal_extent)


## Running Plume Visualiser Analysis
The code below uses the satellite data to display plumes above 1,400 kg/h in ideal conditions. Provided all the cells above have run correctly, this section should take only moments to complete.

In [None]:
# TODO: find a way to mask (set to NaN) the urban areas in the SWIR_diff raster

# Function to get bounds from the Oil and Gas Field bounding file
def get_bounds(site_id, csv_path):
    df = pd.read_csv(csv_path)
    site = df[df['id'] == site_id]
    if site.empty:
        raise ValueError(f"Site ID {site_id} not found in the CSV file.")
    site = site.iloc[0]
    return [[site['south'], site['west']], [site['north'], site['east']]]

csv_path = r'C:\GIS_Course\Methane_Point_Detection\Sentinel-2_Algeria_Methane\Data\Algerian_Oil_and_Gas_Fields.csv'
bounds = get_bounds(site_id, csv_path)

# Define file paths
Active_Multiband = "Sentinel-2_active_emissionMBMP.Tiff"
No_Multiband = "Sentinel-2_no_emissionMBMP.Tiff"
output_file = "SWIR_diff_4326.tiff"

# Define a function for least squares fitting
def least_squares_fit(x, y):
    mask = ~np.isnan(x) & ~np.isnan(y)
    x_valid = x[mask]
    y_valid = y[mask]
    A = np.vstack([x_valid, np.ones_like(x_valid)]).T
    m, c = np.linalg.lstsq(A, y_valid, rcond=None)[0]
    return m, c

# Open datasets and perform least squares fitting
with rasterio.open(Active_Multiband) as Active_img, rasterio.open(No_Multiband) as No_img:
    Active_B11 = Active_img.read(1)
    Active_B12 = Active_img.read(2)
    No_B11 = No_img.read(1)
    No_B12 = No_img.read(2)

    m_active, c_active = least_squares_fit(Active_B11.flatten(), Active_B12.flatten())
    Corrected_Active_B12 = m_active * Active_B12 + c_active

    m_no, c_no = least_squares_fit(No_B11.flatten(), No_B12.flatten())
    Corrected_No_B12 = m_no * No_B12 + c_no

    SWIR_diff = (Active_B11 - Corrected_Active_B12) - (No_B11 - Corrected_No_B12)
    min_value = np.min(SWIR_diff)
    if min_value < 0:
        SWIR_diff = SWIR_diff - min_value

# Reproject and save SWIR_diff to EPSG:4326
with rasterio.open(Active_Multiband) as src:
    target_crs = "EPSG:4326"
    transform, width, height = calculate_default_transform(
        src.crs, target_crs, src.width, src.height, *src.bounds
    )
    meta = src.meta.copy()
    meta.update({
        "crs": target_crs,
        "transform": transform,
        "width": width,
        "height": height,
        "count": 1,
        "dtype": SWIR_diff.dtype
    })
    with rasterio.open(output_file, "w", **meta) as dest:
        reproject(
            source=SWIR_diff,
            destination=rasterio.band(dest, 1),
            src_transform=src.transform,
            src_crs=src.crs,
            dst_transform=transform,
            dst_crs=target_crs,
            resampling=Resampling.nearest
        )

# Calculate center for map
center_lat = (bounds[0][0] + bounds[1][0]) / 2
center_lon = (bounds[0][1] + bounds[1][1]) / 2

# Create Folium map
m = Map(location=[center_lat, center_lon], zoom_start=10, control_scale=True)

# Load and stretch SWIR_diff for visualization
with rasterio.open(output_file) as src:
    swir_bounds = [[src.bounds.bottom, src.bounds.left], [src.bounds.top, src.bounds.right]]
    swir_data = src.read(1)
    nodata_value = src.nodata if src.nodata is not None else -9999
    swir_data = np.ma.masked_equal(swir_data, nodata_value)

    # Apply stretching logic
    mean = np.nanmean(swir_data)
    std = np.nanstd(swir_data)
    std_factor = 2  # Stretch factor
    lower_bound = mean - std_factor * std
    upper_bound = mean + std_factor * std
    normalized_swir_data = (swir_data - lower_bound) / (upper_bound - lower_bound)
    normalized_swir_data = np.clip(normalized_swir_data, 0, 1)

    # Apply colormap
    cmap = plt.get_cmap('plasma')
    rgb_data = (cmap(normalized_swir_data)[:, :, :3] * 255).astype(np.uint8)

# Load the true color image
truecolour_sat = 'Sentinel-2_truecolour_reprojected.Tiff'
img = rasterio.open(truecolour_sat)
blue, green, red = img.read(1), img.read(2), img.read(3)

# Adjust brightness dynamically
brightness_factor = 0.03
blue = np.clip(blue * brightness_factor, 0, 255)
green = np.clip(green * brightness_factor, 0, 255)
red = np.clip(red * brightness_factor, 0, 255)

# Stack bands to create RGB image
rgb = np.dstack((red, green, blue))
rgb = rgb / rgb.max()
rgb = np.log1p(rgb)
rgb = rgb / rgb.max()

# Add true color image overlay
truecolour_overlay = ImageOverlay(
    image=rgb,
    bounds=swir_bounds,
    opacity=1,  # Fully opaque base layer; lower this value to blend with the SWIR overlay
    interactive=True,
    cross_origin=False,
    zindex=1,  # Lower zindex to place below SWIR overlay
)
truecolour_overlay.add_to(m)

# Add SWIR_diff overlay to map
swir_overlay = ImageOverlay(
    image=rgb_data,
    bounds=swir_bounds,
    opacity=1,  # Adjust opacity for visibility
    interactive=True,
    zindex=2  # Ensure SWIR overlay is above other layers
)
swir_overlay.add_to(m)

# Add GeoJSON data as a layer group
vector_point_path = r"C:\GIS_Course\Methane_Point_Detection\Sentinel-2_Algeria_Methane\Data\known_point_sources.geojson"
gdf = gpd.read_file(vector_point_path)
geojson_layer = FeatureGroup(name="Known Point Sources", show=False)
GeoJson(gdf.to_json()).add_to(geojson_layer)
geojson_layer.add_to(m)

# Layer control
LayerControl().add_to(m)
m.add_child(LatLngPopup())

# Display map
display(m)

## Plume tagging

Over an area the size of an oil and gas field, many objects can erroneously show up as methane-like signals if a method such as thresholding is used. These include urban areas, agricultural irrigation projects and new construction. To deal with this problem, we select the plumes in the image using a manual tagging system.

To do this, click on a plume somewhere along its length, then copy the latitude and longitude coordinates that appear.

Manually input plume source coordinates below in the format (latitude, longitude), for example:  
<p style="text-align: center;">(31.6887, 5.8102),  # Plume 1 (latitude, longitude)</p> <p style="text-align: center;">(31.7910, 5.8263),  # Plume 2 (latitude, longitude)</p> 

Additional lines for more plumes can be added as needed.

In [None]:
plume_coords = [
    (31.7693, 6.0030),  # Plume 1 (latitude, longitude)
    (31.7791, 5.9952),  # Plume 2 (latitude, longitude)
    (31.7983, 6.0107),  # Plume 3 (latitude, longitude)
]
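
The tagged coordinates are later converted to raster row and column indices using the raster's transform. For a north-up raster in EPSG:4326 with square pixels, that conversion reduces to the arithmetic sketched below. The function `latlon_to_rowcol` and the raster origin and pixel size in the example are hypothetical values chosen for illustration only.

```python
import math

def latlon_to_rowcol(lat, lon, west, north, pixel_size_deg):
    """Map a (lat, lon) point to (row, col) in a north-up raster.

    west / north: coordinates of the raster's top-left corner (degrees).
    pixel_size_deg: pixel width and height in degrees (assumed equal).
    """
    col = math.floor((lon - west) / pixel_size_deg)
    row = math.floor((north - lat) / pixel_size_deg)
    return row, col

# Hypothetical raster: top-left corner at (32.0 N, 5.0 E), 0.01 degree pixels.
print(latlon_to_rowcol(31.7693, 6.0030, west=5.0, north=32.0, pixel_size_deg=0.01))
```

Rows increase southwards and columns increase eastwards, which is why latitude is subtracted from the northern edge.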


## Regression Model Development

A regression model is a statistical tool used to predict a dependent variable (here, methane emission rate in kg/h) based on independent variables. It works by identifying relationships in the training data and using these to estimate outcomes for new data.

To train the model, data from methane plumes with emission rates documented in peer-reviewed studies was collected (Gorroño et al., 2023; Pandey et al., 2023; Varon et al., 2021; Wang et al., 2023; Sanchez-Garcia et al., 2021). These plumes were then found using the MBMP Plume Visualiser. Each plume was measured for:

- **Adjusted CS Sum**: The plume intensity in its cross-section after subtracting background values.
- **Plume Width**: The plume's width in pixels, measured perpendicular to its direction of travel.
- **Wind Speed**: ERA5 reanalysis data for the wind speed at the time of observation.

The regression analysis identifies how these factors relate to emission rates, allowing the model to predict methane emissions for other plumes based on their characteristics.
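
As a small worked illustration of the background subtraction behind the Adjusted CS Sum, the sketch below removes a background level from every pixel along a cross-section before summing. In the notebook's analysis code the background is the dataset median; the function name `adjusted_cross_section_sum` and the pixel values here are hypothetical.

```python
def adjusted_cross_section_sum(pixel_values, background):
    """Sum of cross-section pixel values after removing a per-pixel background."""
    return sum(pixel_values) - background * len(pixel_values)

# Four made-up cross-section pixels over a background level of 10.
print(adjusted_cross_section_sum([12.0, 15.0, 20.0, 13.0], background=10.0))  # 20.0
```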

Below are the data collected for the regression analysis. More data can be added here to improve the model, should more studies become available. The data used for the model as of publication are listed below.

| Plume | Longitude | Latitude  | Date       | Emission Rate (Q) (kg/h) | Variance (kg/h) | CS Sum (digital numbers) | Width (px) | Wind Speed (m/s) | Source                                         |
|:-----:|:---------:|:---------:|:----------:|:------------------------:|:---------------:|:------------------------:|:----------:|:----------------:|:----------------------------------------------:|
|   1   | 6.154881  | 31.805489 | 31/08/2021 |           5453           |     ± 2200      |       1157.180431        |     13     |       4.37       | Gorroño et al., 2023                           |
|   2   | 5.9968    | 31.7775   | 04/01/2020 |          21000           |     ± 6000      |       5595.239521        |     74     |       3.65       | Pandey et al., 2023                            |
|   3   | 5.9053    | 31.6585   | 20/11/2019 |           8500           |     ± 5700      |       4520.473715        |     42     |       0.51       | Varon et al., 2021, Pandey et al., 2023        |
|   4   | 6         | 31.78     | 19/08/2021 |           4326           |     ± 2453      |       320.936628         |     13     |       0.92       | Wang et al., 2023, Sanchez-Garcia et al., 2021 |
|   5   | 5.9951    | 31.7789   | 19/08/2021 |           2160           |     ± 1108      |       343.933175         |     8      |       0.92       | Wang et al., 2023, Sanchez-Garcia et al., 2021 |
|   6   | 6.0107    | 31.798    | 19/08/2021 |           2757           |     ± 1297      |       366.843504         |     7      |       0.92       | Wang et al., 2023, Sanchez-Garcia et al., 2021 |


In [None]:
# Initial dataset for regression model (Add new plumes directly here as needed)
initial_data = {
    "Cross_sectional_Adjusted_Sum": [245.985248, 5626.58803, 4471.054728, 319.004091, 342.789593, 365.813576],
    "Wind_speed": [4.37, 3.65, 0.51, 0.92, 0.92, 0.92],
    "Emission_rate_kg_h": [5453, 21000, 8500, 4326, 2160, 2757],
}
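
The notebook's own model-fitting cell is not shown at this point, so the following is only a sketch of one way such data could be fitted. It assumes an ordinary least-squares model with an intercept, with the adjusted cross-sectional sum and wind speed as predictors, and it uses `numpy.linalg.lstsq` rather than scikit-learn's `LinearRegression` (which the notebook imports) to stay self-contained. The "new plume" values are made up.

```python
import numpy as np

# Same values as the initial dataset above.
initial_data = {
    "Cross_sectional_Adjusted_Sum": [245.985248, 5626.58803, 4471.054728, 319.004091, 342.789593, 365.813576],
    "Wind_speed": [4.37, 3.65, 0.51, 0.92, 0.92, 0.92],
    "Emission_rate_kg_h": [5453, 21000, 8500, 4326, 2160, 2757],
}

# Design matrix: two predictors plus a column of ones for the intercept.
X = np.column_stack([
    initial_data["Cross_sectional_Adjusted_Sum"],
    initial_data["Wind_speed"],
    np.ones(len(initial_data["Wind_speed"])),
])
y = np.array(initial_data["Emission_rate_kg_h"], dtype=float)

# Ordinary least-squares fit (assumed model form, see the note above).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ coef

# Coefficient of determination on the training data.
ss_res = np.sum((y - predicted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print("Coefficients (sum, wind, intercept):", coef)
print(f"Training R^2: {r2:.3f}")

# Estimate for a hypothetical new plume: adjusted sum 400, wind speed 1.5 m/s.
new_plume = np.array([400.0, 1.5, 1.0])
print(f"Estimated emission rate: {new_plume @ coef:.0f} kg/h")
```

With only six training plumes, any fit of this kind is weakly constrained, so a training R² should be read with caution.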

## Determining wind speed

Wind speed is a crucial factor in determining emission rate. This next code box determines the wind speed on the "Active Emission" date as part of the gas flux calculation. Several warning messages will appear but these can be ignored. 

In [None]:
# Function to extract bounding box and calculate center from site_id
def get_location_from_site_id(site_id, csv_path):
    """
    Extract center latitude and longitude for a site based on site_id.

    Args:
    - site_id (int): The ID of the site to extract.
    - csv_path (str): Path to the CSV containing site boundaries.

    Returns:
    - dict: Dictionary with latitude and longitude of the center.
    """
    df = pd.read_csv(csv_path)
    site = df[df['id'] == site_id]
    if site.empty:
        raise ValueError(f"Site ID {site_id} not found in the CSV file.")
    site = site.iloc[0]
    center_lat = (site['south'] + site['north']) / 2
    center_lon = (site['west'] + site['east']) / 2
    return {'latitude': center_lat, 'longitude': center_lon}

# Get the location for the ERA5 data request
location = get_location_from_site_id(site_id, csv_path)

# Initialize the CDS API client
c = cdsapi.Client()

# Define parameters for the data request
# Use the first element of active_temporal_extent as the single date of interest
date = active_temporal_extent[0]

# Retrieve ERA5 data and store it in a temporary file
with NamedTemporaryFile(suffix='.nc') as tmp_file:
    result = c.retrieve(
        'reanalysis-era5-single-levels',
        {
            'product_type': 'reanalysis',
            'variable': ['10m_u_component_of_wind', '10m_v_component_of_wind'],
            'year': date.split('-')[0],
            'month': date.split('-')[1],
            'day': date.split('-')[2],
            'time': ['10:00'],  # Specify time of interest
            'format': 'netcdf',  # NetCDF format
            'area': [
                location['latitude'] + 0.25, location['longitude'] - 0.25,
                location['latitude'] - 0.25, location['longitude'] + 0.25,
            ],  # Small bounding box around the location
        }
    )
    # Download data to the temporary file
    result.download(tmp_file.name)
    
    # Load the dataset fully into memory so it stays usable after the temporary file is deleted
    with xr.open_dataset(tmp_file.name) as tmp_ds:
        ds = tmp_ds.load()

# Extract u and v components
u10 = ds['u10'].sel(latitude=location['latitude'], longitude=location['longitude'], method='nearest')
v10 = ds['v10'].sel(latitude=location['latitude'], longitude=location['longitude'], method='nearest')

# Calculate wind speed
wind_speed = np.sqrt(u10**2 + v10**2)

# Report the wind speed and keep a scalar value for later use
if 'time' in u10.dims:
    # A time dimension is present (possible even when only 10:00 is requested)
    for time, speed in zip(u10.time.values, wind_speed.values):
        print(f"{time}: Wind Speed = {speed:.2f} m/s")
    wind_speed_value = float(wind_speed.values[0])  # First (and only expected) timestep
else:
    # Single timestep
    wind_speed_value = wind_speed.values.item()  # Convert array to scalar
    print(f"Wind Speed at 10:00 on {date}: {wind_speed_value:.2f} m/s")
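
The wind speed calculation above is simply the magnitude of the horizontal wind vector formed by the eastward (u) and northward (v) components. A minimal stand-alone sketch with made-up component values; `wind_speed_from_components` is a hypothetical helper name.

```python
import math

def wind_speed_from_components(u, v):
    """Wind speed (m/s) from eastward (u) and northward (v) components (m/s)."""
    return math.hypot(u, v)

print(wind_speed_from_components(3.0, 4.0))  # 5.0
```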

## Running the tagged plume analysis

This section loads the SWIR data and the colourmap in preparation for the analysis. It also reports the median value of the dataset, which is used as the background level when measuring how far a plume rises above it.

In [None]:
swir_diff_path = r'C:\GIS_Course\Methane_Point_Detection\Sentinel-2_Algeria_Methane\SWIR_diff_4326.tiff'


with rasterio.open(swir_diff_path) as tiff_file:
    raster_data = tiff_file.read(1)  # Read the first band
    nodata_value = tiff_file.nodata if tiff_file.nodata is not None else -9999
    bounds = tiff_file.bounds
    transform = tiff_file.transform

# Mask nodata values
masked_data = np.ma.masked_equal(raster_data, nodata_value)

# Calculate statistical values
mean, std = np.nanmean(masked_data), np.nanstd(masked_data)
std_factor = 2
lower_bound, upper_bound = mean - std_factor * std, mean + std_factor * std
normalized_data = (masked_data - lower_bound) / (upper_bound - lower_bound)
normalized_data = np.clip(normalized_data, 0, 1)  # Clip to [0, 1]

# Calculate the median value of the dataset
dataset_median_value = np.ma.median(masked_data)
print(f"Median value of the dataset: {dataset_median_value}")

# Apply colormap
cmap = plt.get_cmap('plasma')
rgb_data = (cmap(normalized_data)[:, :, :3] * 255).astype(np.uint8)

# Initialize the map
center_lat = (bounds.top + bounds.bottom) / 2
center_lon = (bounds.left + bounds.right) / 2
m = folium.Map(location=[center_lat, center_lon], zoom_start=11, control_scale=True)

# Define helper functions
def get_raster_center(tiff_path):
    with rasterio.open(tiff_path) as tiff_file:
        bounds = tiff_file.bounds
        center_lat = (bounds.top + bounds.bottom) / 2
        center_lon = (bounds.left + bounds.right) / 2
    return center_lat, center_lon

def calculate_plume_width_pixels(plume_pixels, perp_direction):
    perp_vector = np.array(perp_direction)
    perp_vector = perp_vector / np.linalg.norm(perp_vector)
    projections = plume_pixels @ perp_vector
    return np.ptp(projections)  # Peak-to-peak width in the projection space

def bresenham_line(x0, y0, x1, y1):
    points = []
    dx = abs(x1 - x0)
    dy = abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx - dy

    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = err * 2
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy
    return points

def get_line_pixel_values(start, end, masked_data):
    """Sample masked_data along the (row, col) line from start to end, skipping out-of-bounds pixels."""
    n_rows, n_cols = masked_data.shape
    line_pixels = bresenham_line(int(start[0]), int(start[1]), int(end[0]), int(end[1]))
    return [masked_data[row, col] for row, col in line_pixels if 0 <= row < n_rows and 0 <= col < n_cols]

def count_line_pixels(start, end):
    line_pixels = bresenham_line(int(start[0]), int(start[1]), int(end[0]), int(end[1]))
    return len(line_pixels)
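
# Illustrative sketch, not part of the original workflow: for an 8-connected
# Bresenham line like the one above, every step advances one pixel along the
# longer axis, so the pixel count can be obtained without rasterizing the line.
# The helper name `closed_form_line_pixel_count` is hypothetical.
def closed_form_line_pixel_count(start, end):
    """Number of pixels on an 8-connected line between two (row, col) points."""
    d_row = abs(int(end[0]) - int(start[0]))
    d_col = abs(int(end[1]) - int(start[1]))
    return max(d_row, d_col) + 1  # +1 because both endpoints are included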

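def analyze_plume_with_cross_section_sum(masked_data, plume_coords, transform, initial_center):
    """Measure each tagged plume's cross-section and draw it on a new folium map.

    Returns a list of per-plume result dicts and the map.
    """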
    plume_map = folium.Map(location=initial_center, zoom_start=11, control_scale=True)
    plume_results = []
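    # Label connected regions above the 80th percentile; masked (nodata) pixels count as background
    plume_threshold = np.percentile(masked_data.compressed(), 80)
    labeled_array, _ = label((masked_data > plume_threshold).filled(False))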

    for i, (lat, lon) in enumerate(plume_coords):
        try:
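            row, col = rasterio.transform.rowcol(transform, lon, lat)
            row, col = int(row), int(col)
            # Negative indices would silently wrap around, so reject points outside the raster
            if not (0 <= row < labeled_array.shape[0] and 0 <= col < labeled_array.shape[1]):
                plume_results.append({"Plume": i + 1, "Location (lat, lon)": (lat, lon), "Status": "Outside raster extent"})
                continue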
            plume_label = labeled_array[row, col]
            if plume_label == 0:
                plume_results.append({"Plume": i + 1, "Location (lat, lon)": (lat, lon), "Status": "No plume detected"})
                continue

            plume_region = labeled_array == plume_label
            plume_pixels = np.column_stack(np.where(plume_region))
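            # The first principal component follows the plume's long axis;
            # rotating it by 90 degrees gives the cross-plume direction
            pca = PCA(n_components=2)
            pca.fit(plume_pixels)
            perp_direction = [-pca.components_[0, 1], pca.components_[0, 0]]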

            plume_width_pixels = calculate_plume_width_pixels(plume_pixels, perp_direction)

            centroid = plume_pixels.mean(axis=0)
            perp_line_coords = [
                (centroid[0] - perp_direction[0] * plume_width_pixels / 2, centroid[1] - perp_direction[1] * plume_width_pixels / 2),
                (centroid[0] + perp_direction[0] * plume_width_pixels / 2, centroid[1] + perp_direction[1] * plume_width_pixels / 2),
            ]

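            line_pixel_values = get_line_pixel_values(perp_line_coords[0], perp_line_coords[1], masked_data)
            # Drop masked (nodata) samples: adding np.ma.masked would mask the whole sum
            line_pixel_values = [v for v in line_pixel_values if v is not np.ma.masked]
            # Count only the pixels actually summed, so the background subtraction
            # below covers the same pixels (the line may extend past the raster edge)
            num_intersecting_pixels = len(line_pixel_values)

            pixel_value_sum = sum(line_pixel_values)
            adjusted_sum = pixel_value_sum - (dataset_median_value * num_intersecting_pixels)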

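            # rasterio.transform.xy returns (x, y), i.e. (lon, lat) for a geographic CRS;
            # folium expects (lat, lon), so the pairs are swapped below
            perp_line_lonlat = [
                rasterio.transform.xy(transform, int(pt[0]), int(pt[1])) for pt in perp_line_coords
            ]
            folium.PolyLine(
                locations=[(pt_lat, pt_lon) for pt_lon, pt_lat in perp_line_lonlat],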
                color="red",
                weight=2,
                opacity=1,
                tooltip=f"Plume {i + 1} Width Measurement",
            ).add_to(plume_map)

            hull = ConvexHull(plume_pixels)
            hull_coords = [(plume_pixels[vertex][0], plume_pixels[vertex][1]) for vertex in hull.vertices]
            hull_lonlat = [rasterio.transform.xy(transform, int(pt[0]), int(pt[1])) for pt in hull_coords]
            folium.Polygon(
                locations=[(pt_lat, pt_lon) for pt_lon, pt_lat in hull_lonlat],
                color="green",
                weight=3,
                fill=False,
                opacity=1,
                popup=f"Plume {i + 1} region",
            ).add_to(plume_map)

            plume_results.append({
                "Plume": i + 1,
                "Location (lat, lon)": (lat, lon),
                "Intersecting Pixels": num_intersecting_pixels,
                "Pixel Value Sum": pixel_value_sum,
                "Adjusted Sum": adjusted_sum
            })
        except Exception as e:
            plume_results.append({"Plume": i + 1, "Location (lat, lon)": (lat, lon), "Status": f"Error: {e}"})
    return plume_results, plume_map
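
# Illustrative sketch (hypothetical helper, synthetic numbers only): the
# "Adjusted Sum" above subtracts one background value per sampled pixel, which
# is the same as summing each pixel's excess over the background.
import numpy as np

def demo_adjusted_cross_section_sum(values, background):
    """Sum of values along a cross-section after removing a constant background."""
    values = np.asarray(values, dtype=float)
    return float(values.sum() - background * values.size)

# Example: a cross-section that rises above a background level of 1.0
#   demo_adjusted_cross_section_sum([1.0, 3.0, 5.0, 3.0, 1.0], background=1.0)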

def add_swir_data_to_map(map_object, tiff_path):
    with rasterio.open(tiff_path) as tiff_file:
        swir_data = tiff_file.read(1)
        bounds = tiff_file.bounds
        nodata_value = tiff_file.nodata if tiff_file.nodata is not None else -9999
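    # Mask nodata and non-finite pixels, then take statistics over valid pixels only
    masked_data = np.ma.masked_invalid(np.ma.masked_equal(swir_data.astype(float), nodata_value))
    mean, std = masked_data.mean(), masked_data.std()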
    lower_bound, upper_bound = mean - 2 * std, mean + 2 * std
    normalized_data = (masked_data - lower_bound) / (upper_bound - lower_bound)
    normalized_data = np.clip(normalized_data, 0, 1)
    cmap = plt.get_cmap("plasma")
    swir_rgb = (cmap(normalized_data)[:, :, :3] * 255).astype(np.uint8)
    image_bounds = [[bounds.bottom, bounds.left], [bounds.top, bounds.right]]
    ImageOverlay(image=swir_rgb, bounds=image_bounds, opacity=1).add_to(map_object)
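
# Illustrative sketch, not part of the original workflow: the mean +/- k*std
# linear stretch used above, isolated so it can be tried on synthetic arrays.
# The helper name `demo_stretch_to_unit_range` and the handling of flat inputs
# are assumptions made for this example.
import numpy as np

def demo_stretch_to_unit_range(data, std_factor=2.0):
    """Map values to [0, 1] so that mean - k*std -> 0 and mean + k*std -> 1."""
    data = np.ma.masked_invalid(np.ma.asarray(data, dtype=float))
    center, spread = data.mean(), data.std()
    if spread == 0:
        # A flat input has no contrast to stretch; return zeros with the same mask
        return np.ma.array(np.zeros(data.shape), mask=np.ma.getmaskarray(data))
    lower = center - std_factor * spread
    upper = center + std_factor * spread
    scaled = (data - lower) / (upper - lower)
    return scaled.clip(0.0, 1.0)  # Values beyond the stretch limits saturate

# Example: the mean maps to 0.5 and a strong outlier saturates at 1.0
#   demo_stretch_to_unit_range([0.0, 1.0, 2.0, 3.0, 4.0])
#   demo_stretch_to_unit_range([0.0] * 99 + [100.0])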

# Convert the dataset into a DataFrame
model_df = pd.DataFrame(initial_data)

# Function to fit and update the regression model using Model 2
def update_model(df):
    """
    Update the regression model based on the current dataset.

    Args:
    - df: DataFrame containing the plume data.

    Returns:
    - Updated regression model parameters as a dictionary.
    """
    # Prepare features (X) and target (y)
    X = df[["Cross_sectional_Adjusted_Sum", "Wind_speed"]]
    y = df["Emission_rate_kg_h"]

    # Fit Model 2 (multiple linear regression with two predictors)
    reg = LinearRegression()
    reg.fit(X, y)

    # Evaluate the model on its own training data (in-sample fit, not a validation score)
    y_pred = reg.predict(X)
    print("Model Evaluation:")
    print(f"Mean Squared Error (MSE): {mean_squared_error(y, y_pred):.2f}")
    print(f"R-squared (R²): {r2_score(y, y_pred):.2f}")

    # Plot actual vs predicted emission rates
    plt.figure(figsize=(8, 6))
    plt.scatter(y, y_pred, color="blue", label="Predicted vs Actual")
    plt.plot([min(y), max(y)], [min(y), max(y)], color="red", label="Ideal Fit Line")
    plt.xlabel("Actual Emission Rate (kg/h)")
    plt.ylabel("Predicted Emission Rate (kg/h)")
    plt.title("Regression Analysis: Actual vs Predicted")
    plt.legend()
    plt.grid()
    plt.show()

    # Return updated model parameters
    return {
        "intercept": reg.intercept_,
        "CS_Sum_coef": reg.coef_[0],
        "Wind_speed_coef": reg.coef_[1],
    }
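
# Illustrative sketch, not part of the original workflow: the same ordinary
# least-squares fit written with np.linalg.lstsq, to show what the returned
# parameters mean. The helper name `demo_fit_two_predictor_ols` is hypothetical,
# and the example values in the comment below are synthetic, not measurements.
import numpy as np

def demo_fit_two_predictor_ols(cs_sum, wind_speed, emission_rate):
    """Fit emission_rate = intercept + a * cs_sum + b * wind_speed by least squares."""
    cs_sum = np.asarray(cs_sum, dtype=float)
    wind_speed = np.asarray(wind_speed, dtype=float)
    emission_rate = np.asarray(emission_rate, dtype=float)
    # The column of ones lets the solver estimate the intercept
    design = np.column_stack([np.ones_like(cs_sum), cs_sum, wind_speed])
    coeffs, *_ = np.linalg.lstsq(design, emission_rate, rcond=None)
    return {
        "intercept": float(coeffs[0]),
        "CS_Sum_coef": float(coeffs[1]),
        "Wind_speed_coef": float(coeffs[2]),
    }

# Example: data generated from 10 + 3 * cs_sum + 2 * wind_speed is recovered
#   cs = [1, 2, 3, 4, 5]; wind = [2, 1, 4, 3, 5]
#   rate = [10 + 3 * c + 2 * w for c, w in zip(cs, wind)]
#   demo_fit_two_predictor_ols(cs, wind, rate)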

# Update the model with the initial data
model_params = update_model(model_df)

# Get the center of the SWIR TIFF
center_coords = get_raster_center(swir_diff_path)

# Perform plume analysis, centering the map on the SWIR TIFF
plume_analysis_results, plume_map = analyze_plume_with_cross_section_sum(masked_data, plume_coords, transform, center_coords)

# Predict an emission rate for each detected plume from the fitted model parameters
wind_speed_value = 2.0  # Example wind speed value; use the same units as the Wind_speed training column
for plume in plume_analysis_results:
    if "Adjusted Sum" in plume:  # Skip plumes that were not detected or raised an error
        adjusted_sum = plume["Adjusted Sum"]
        emission_rate = (
            model_params["intercept"]
            + model_params["CS_Sum_coef"] * adjusted_sum
            + model_params["Wind_speed_coef"] * wind_speed_value
        )
        plume["Predicted Emission Rate (kg/h)"] = emission_rate

# Convert updated results to a DataFrame
plume_df = pd.DataFrame(plume_analysis_results)
plume_df.set_index("Plume", inplace=True)

# Display updated DataFrame with predicted emission rates
print("Plume Analysis Results with Predicted Emission Rates:")
print(plume_df)

# Add SWIR overlay to the map
add_swir_data_to_map(plume_map, swir_diff_path)

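# Add the colormapped overlay built earlier in this cell
# (rgb_data is the plasma rendering of normalized_data, not a true color image)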
truecolor_overlay = ImageOverlay(
    image=rgb_data,
    bounds=[[bounds.bottom, bounds.left], [bounds.top, bounds.right]],
    opacity=1,
    interactive=True,
    cross_origin=False,
    zindex=1,
)
truecolor_overlay.add_to(plume_map)

# Add a layer control to toggle map layers
LayerControl().add_to(plume_map)

# Display the map with updated analysis
ipy_display(plume_map)

