# Sentinel-2 Plume Finder Tool

## Overview 
This notebook provides a comprehensive workflow for detecting and analysing methane plumes from oil and gas facilities using Sentinel-2 satellite data. It combines satellite imagery processing, wind speed analysis, and regression modelling to estimate methane emission rates accurately. Key functionalities include:

1. **SWIR Analysis**: Uses Sentinel-2's Short-Wave Infrared (SWIR) bands to detect methane plumes.
2. **Plume Detection and Tagging**: Uses user-driven tagging of plume locations.
3. **Regression-Based Emission Estimation**: Employs an XGBoost regression model to estimate methane emission rates based on plume characteristics and wind speed data.
4. **Dynamic Model Updates**: Facilitates the addition of new training data to refine the XGBoost model for improved predictions.
5. **Interactive Visualisation**: Creates an interactive map to visualise SWIR-derived plumes, and provides an estimate of their emission rates in kg/h.

This tool is designed for researchers, policymakers, and environmental analysts aiming to quantify and monitor methane emissions efficiently.

The section below imports the packages needed to run the script.

In [None]:
# Connecting to Sentinel-2 data
import openeo

# Available Date finder imports
import requests
import pandas as pd

# SWIR and Truecolour processing imports
import numpy as np
import geopandas as gpd
import rasterio
from rasterio.features import geometry_mask
from rasterio.warp import calculate_default_transform, reproject, Resampling

# Interactive Maps and Visualisation
import folium  # For creating interactive maps
from folium import Map, LayerControl, LatLngPopup, Rectangle  # Map features and interactions
from folium.raster_layers import ImageOverlay  # Overlay raster images on maps
from folium import FeatureGroup  # For grouping map layers
import matplotlib.pyplot as plt  # For plotting and visualisation

# Wind Speed imports
import cdsapi 
from tempfile import NamedTemporaryFile  
import xarray as xr

# Plume analysis imports
from scipy.ndimage import label  # For segmentation and labelling of regions
from scipy.spatial import ConvexHull  # For calculating convex hulls of shapes
from scipy.spatial.distance import pdist

# Imports related to predictive model  
from sklearn.decomposition import PCA 
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor

## Connect to OpenEO

The code below establishes a connection with the Copernicus openEO platform, which provides a wide variety of Earth observation datasets.

- If this does not read as 'Authorised successfully' or 'Authenticated using refresh token', then please ensure that you have completed the setup steps as outlined in section 2.3.6 of the how to guide. 

- If you have followed the steps in section 2.3.6 correctly and the problem persists, please look at https://dataspace.copernicus.eu/news for any information about service interruptions. 

- If there is no news of service problems you can raise a ticket here: https://helpcenter.dataspace.copernicus.eu/hc/en-gb/requests/new

In [None]:
connection = openeo.connect(url="openeo.dataspace.copernicus.eu")
connection.authenticate_oidc()

## Display Field Names

This loads the oil and gas field list. Hassi Messaoud is site 86. If you are interested in a different field, please look up its id number in the list printed below.
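
If the list is long, the id can also be found by name rather than by eye. The sketch below is one possible way to do that with the standard library. Only the column names (`id`, `name`, `west`, `south`, `east`, `north`) are taken from the notebook; the rows, the coordinates and the helper `find_field_id` are hypothetical.

```python
import csv
import io

# Hypothetical two-row extract; only the column names are taken from the notebook.
sample_csv = """id,name,west,south,east,north
85,Example Field,5.0,31.0,5.5,31.5
86,Hassi Messaoud,5.7,31.5,6.3,32.0
"""


def find_field_id(csv_text, name):
    """Return the integer id of the first field whose name matches, ignoring case."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["name"].lower() == name.lower():
            return int(row["id"])
    return None


print(find_field_id(sample_csv, "hassi messaoud"))  # 86
```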


In [None]:
studysite_csv = pd.read_csv(r'C:\GIS_Course\Methane_Point_Detection\Sentinel-2_Algeria_Methane\Data\Algerian_Oil_and_Gas_Fields.csv')
pd.set_option('display.max_rows', None)
print(studysite_csv.to_string(index=False))

## Site selection

In the code box below, specify the field number we are interested in for analysis. 

<p style="text-align: center;"><b>site_id</b> = 86</p>

In [None]:
site_id = 86  # Specify the oil and gas field ID for the field you want to examine.

# Retrieve the name of the field from the dataset
field_name = studysite_csv[studysite_csv['id'] == site_id].iloc[0]['name']

# Confirmation message
print(f"Site {site_id} ({field_name}) loaded correctly.")

## Multi-Band Multi-Pass Analysis

Varon et al. (2021) showed that methane plumes from point sources could be imaged by differencing Sentinel-2’s SWIR-1 and SWIR-2 bands. The tool runs an analysis using a multi-band-multi-pass (MBMP) retrieval method.

First, it performs a multi-band-single-pass (MBSP) calculation for both the active emission and no emission dates. The two resulting datasets are then combined in the multi-band-multi-pass calculation.
The multi-band-single-pass equation is as follows:


$$ MBSP = \frac{B12 - B11}{B11} $$


Where:
- $B12$ is the Sentinel-2 SWIR band 12.
- $B11$ is the Sentinel-2 SWIR band 11.

Once active emission and no emission scenes have been calculated, the following equation is used to calculate the multi-band-multi-pass raster. 

$$ MBMP = ActiveMBSP - NoMBSP $$

Where:
- $ActiveMBSP$ is the multiband single pass for the active emission scene
- $NoMBSP$ is the multiband single pass for the no emission scene.
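
As a rough worked example of the two equations above, the sketch below applies them to a single pixel. The reflectance values are invented for illustration, and the helper names `mbsp` and `mbmp` are hypothetical; the notebook itself performs the same arithmetic on whole raster arrays with NumPy.

```python
def mbsp(b11: float, b12: float) -> float:
    """Multi-band-single-pass value for one pixel: (B12 - B11) / B11."""
    return (b12 - b11) / b11


def mbmp(active_mbsp: float, no_mbsp: float) -> float:
    """Multi-band-multi-pass value: active-scene MBSP minus no-emission MBSP."""
    return active_mbsp - no_mbsp


# Hypothetical reflectances: B12 is lower on the active day than on the reference day.
active = mbsp(b11=0.40, b12=0.30)     # about -0.25
reference = mbsp(b11=0.40, b12=0.32)  # about -0.20
print(round(mbmp(active, reference), 4))  # -0.05
```

In this toy case the MBMP value is negative because B12 dropped relative to B11 on the active day. The notebook later negates the raster so that such pixels appear as positive values on the map.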

The active emission scene and no emission scene are considered in this analysis to be one satellite pass apart unless there is a large amount of interference from features such as clouds or other plumes, in which case an earlier date should be selected.

A final step shifts the MBMP dataset so that its median (taken as the background) is zero, adjusting all other valid pixel values by the same amount. This is done to account for seasonal variations in solar radiation levels that may affect the measurements of this tool.
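
The background adjustment described above can be sketched as follows. This assumes the shift is based on the median of the valid pixels, as in the notebook's code, and that masked pixels carry the value -32768. The pixel values and the helper `centre_on_median` are hypothetical, and a flat list stands in for the raster.

```python
from statistics import median

NODATA = -32768  # same sentinel value the notebook writes for masked pixels


def centre_on_median(values, nodata=NODATA):
    """Shift valid values so their median is zero; leave no-data entries untouched."""
    valid = [v for v in values if v != nodata]
    shift = -median(valid)
    return [v if v == nodata else v + shift for v in values]


# Hypothetical MBMP pixel values with one masked pixel.
pixels = [0.02, 0.03, NODATA, 0.04, 0.10]
print(centre_on_median(pixels))
```

Because every valid pixel moves by the same amount, the contrast between a plume and its surroundings is unchanged; only the background level moves.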

To begin this process we need to determine which days have available satellite data.

## Available dates for the analysis

Sentinel-2 provides data approximately once every 2-3 days, so not every date you enter will have data. The code below lists the dates that are available for the oil/gas field of your choice.

The one parameter you need to modify before running the code is: 

- <b>temporal_extent</b> = ["2020-01-01", "2020-01-31"] (change this to your chosen date range using "YYYY-MM-DD" format.)

Once you have done this run the code and the available dates should appear below in a matter of seconds. 
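
The catalogue returns one entry per product, so the same date can appear more than once in the printed list. The sketch below shows one way the dates could be pulled out and de-duplicated. It assumes the response has the `features[...]['properties']['startDate']` shape that the cell below relies on; the sample response and the helper `extract_dates` are hypothetical.

```python
def extract_dates(catalog: dict) -> list[str]:
    """Return the sorted, de-duplicated acquisition dates (YYYY-MM-DD) in a search response."""
    dates = {feature["properties"]["startDate"].split("T")[0] for feature in catalog["features"]}
    return sorted(dates)


# Hypothetical, heavily trimmed response: two products from one day plus a later one.
sample = {
    "features": [
        {"properties": {"startDate": "2020-01-04T10:24:19.024Z"}},
        {"properties": {"startDate": "2020-01-04T10:24:19.024Z"}},
        {"properties": {"startDate": "2020-01-07T10:34:11.024Z"}},
    ]
}
print(extract_dates(sample))  # ['2020-01-04', '2020-01-07']
```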

In [None]:
# Specify the date range you want to check for available data.
temporal_extent = ["2020-12-20", "2021-01-31"] 

def get_spatial_extent(site_id):
    site = studysite_csv[studysite_csv['id'] == site_id].iloc[0]
    return {
        "west": site['west'],
        "south": site['south'],
        "east": site['east'],
        "north": site['north']
    }

def fetch_available_dates(site_id, temporal_extent):
    spatial_extent = get_spatial_extent(site_id)
    catalog_url = f"https://catalogue.dataspace.copernicus.eu/resto/api/collections/Sentinel2/search.json?box={spatial_extent['west']}%2C{spatial_extent['south']}%2C{spatial_extent['east']}%2C{spatial_extent['north']}&sortParam=startDate&sortOrder=ascending&page=1&maxRecords=1000&status=ONLINE&dataset=ESA-DATASET&productType=L2A&startDate={temporal_extent[0]}T00%3A00%3A00Z&completionDate={temporal_extent[1]}T00%3A00%3A00Z&cloudCover=%5B0%2C{cloud_cover}%5D"
    response = requests.get(catalog_url)
    response.raise_for_status()
    catalog = response.json()
    dates = [date.split('T')[0] for date in map(lambda x: x['properties']['startDate'], catalog['features'])]
    return dates
 
cloud_cover = 5 # Only scenes with 0-5% cloud cover are returned by the catalogue query

available_dates = fetch_available_dates(site_id, temporal_extent)
print("Available dates:", available_dates)

## Choosing the "Active Emission" Date

A so-called active emission date must be chosen from the list of available dates. This is the day on which we will look for plumes.

Like before, the one parameter you need to modify before running the code is:

<p style="text-align: center;"><b>active_temporal_extent</b> = ["2020-01-17", "2020-01-17"]</p>

Change this to your chosen date range using "YYYY-MM-DD" format. 

Please note that the temporal extent dates <b><u>MUST BE IDENTICAL</u></b> because we are only choosing a single date.

If you receive an error message of 'NoDataAvailable' then please check the list of available data above and try again.
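
A small check before downloading can catch the two most common input mistakes: a start and end date that differ, and a date with no data. The sketch below is one possible way to do this; `check_single_day` is a hypothetical helper, and the list of dates stands in for the output of the available-dates cell.

```python
from datetime import date


def check_single_day(extent, available_dates):
    """Raise ValueError unless extent is one valid, available date repeated twice."""
    start, end = extent
    if start != end:
        raise ValueError(f"Start and end must be identical, got {start} and {end}.")
    date.fromisoformat(start)  # raises ValueError if the string is not a valid ISO date
    if start not in available_dates:
        raise ValueError(f"{start} is not in the list of available dates.")
    return start


# Hypothetical list, standing in for the output of the available-dates cell.
available = ["2020-01-14", "2020-01-17"]
print(check_single_day(["2020-01-17", "2020-01-17"], available))  # 2020-01-17
```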

In [None]:
active_temporal_extent = ["2019-11-20", "2019-11-20"] # Enter parameters for the active emission day

def active_emission(site_id, active_temporal_extent):
    site = studysite_csv[studysite_csv['id'] == site_id].iloc[0]

    active_emission = connection.load_collection(
        "SENTINEL2_L2A",
        temporal_extent=active_temporal_extent,
        spatial_extent={
            "west": site['west'],
            "south": site['south'],
            "east": site['east'],
            "north": site['north']
        },
        bands=["B11", "B12"],
    )
    active_emission.download("Sentinel-2_active_emissionMBMP.Tiff")

active_emission(site_id, active_temporal_extent)


## Choosing the "No Emission" Date

Next we choose the no emission date using the same process. This is the dataset we will compare the "Active Emission" one to. The recommended choice is the satellite overpass immediately before the "Active Emission" one.

<b>If your active emission day is 2020-01-17, it is suggested that your no emission day would be 2020-01-14. However, if background values are raised, this may indicate that another no emission day should be chosen to get a better reading.</b>

The one parameter you need to modify before running the code is:

<p style="text-align: center;"><b>no_temporal_extent</b> = ["2020-01-14", "2020-01-14"]</p>

The temporal extent dates <b><u>MUST BE IDENTICAL</u></b>.

If you receive an error message of 'NoDataAvailable' then please check the list of available data above and try again.
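
Picking the overpass immediately before the active emission date can be sketched as below. This assumes the dates are "YYYY-MM-DD" strings, as in the available-dates list, so they can be compared as text. The helper `previous_overpass` and the sample dates are hypothetical. Note that the available-dates list is filtered by cloud cover, so the result is the previous low-cloud overpass rather than necessarily the previous overpass of any kind.

```python
def previous_overpass(active_date, available_dates):
    """Return the latest available date strictly before active_date, or None."""
    earlier = [d for d in available_dates if d < active_date]  # ISO dates sort as strings
    return max(earlier) if earlier else None


# Hypothetical list of available dates.
available = ["2020-01-09", "2020-01-14", "2020-01-17", "2020-01-19"]
print(previous_overpass("2020-01-17", available))  # 2020-01-14
```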


In [None]:
no_temporal_extent = ["2019-11-18", "2019-11-18"] # Enter parameters for the no emission day

def no_emission(site_id, no_temporal_extent):
    site = studysite_csv[studysite_csv['id'] == site_id].iloc[0]

    no_emission = connection.load_collection(
        "SENTINEL2_L2A",
        temporal_extent=no_temporal_extent,
        spatial_extent={
            "west": site['west'],
            "south": site['south'],
            "east": site['east'],
            "north": site['north']
        },
        bands=["B11", "B12"],
    )
    no_emission.download("Sentinel-2_no_emissionMBMP.Tiff")

no_emission(site_id, no_temporal_extent)

## Downloading Background Satellite Image

This section helps with locating the source of the emission by displaying a true colour satellite image of the oil/gas field, over which the data will be superimposed. This helps distinguish true emissions from clouds that are visible in the visual spectrum. It is recommended that you choose the same date as your active emission.

In [None]:
""" The truecolour raster needs to be reprojected to WGS84 to line up correctly with the folium map.
The same will be done later for the SWIR dataset.
"""
def reproject_to_epsg4326(data, meta):
    target_crs = "EPSG:4326"
    
    # Calculate transform and metadata for the target CRS
    transform, width, height = calculate_default_transform(
        meta['crs'], target_crs, meta['width'], meta['height'], *meta['bounds']
    )
    
    # Update metadata for the new projection
    new_meta = meta.copy()
    new_meta.update({
        "crs": target_crs,
        "transform": transform,
        "width": width,
        "height": height,
    })
    
    # Prepare an array for reprojected data
    reprojected_data = []
    for i in range(meta['count']):
        # Create an empty numpy array to store the reprojected data for the band
        destination = np.empty((height, width), dtype=data[i].dtype)
        reproject(
            source=data[i],
            destination=destination,
            src_transform=meta['transform'],
            src_crs=meta['crs'],
            dst_transform=transform,
            dst_crs=target_crs,
            resampling=Resampling.nearest
        )
        reprojected_data.append(destination)
    
    return np.array(reprojected_data), new_meta  # Returning as numpy array instead of writing to file


# The truecolour download uses the same date as the active_emission function.
def truecolour_image(site_id, temporal_extent):

    site = studysite_csv[studysite_csv['id'] == site_id].iloc[0]

    truecolour_image = connection.load_collection(
        "SENTINEL2_L2A",
        temporal_extent=temporal_extent,
        spatial_extent={
            "west": site['west'],
            "south": site['south'],
            "east": site['east'],
            "north": site['north']
        },
        bands=["B02", "B03", "B04"],
    )
    
    # Download the true colour image
    file_path = "Sentinel-2_truecolourMBMP.Tiff"
    truecolour_image.download(file_path)
    
    # Read the file into memory
    with rasterio.open(file_path) as src:
        data = [src.read(i) for i in range(1, src.count + 1)]
        meta = src.meta.copy()
        meta['bounds'] = src.bounds

    # Reproject the data in memory
    reprojected_data, reprojected_meta = reproject_to_epsg4326(data, meta)
    
    # Return reprojected data and metadata
    return reprojected_data, reprojected_meta

# Run and store the reprojected image as a variable
temporal_extent = active_temporal_extent
reprojected_image_data, reprojected_image_meta = truecolour_image(site_id, temporal_extent)


## Running Plume Visualiser Analysis
The code below will use the satellite data to display plumes above 2000 kg/h in ideal conditions. Provided all the cells above have run correctly, this next section should take moments to complete.

In [None]:
# to align the datasets on the folium map
def get_bounds(site_id, csv_path):
    df = pd.read_csv(csv_path)
    site = df[df['id'] == site_id]
    if site.empty:
        raise ValueError(f"Site ID {site_id} not found in the CSV file.")
    site = site.iloc[0]
    return [[site['south'], site['west']], [site['north'], site['east']]]

csv_path = r'C:\GIS_Course\Methane_Point_Detection\Sentinel-2_Algeria_Methane\Data\Algerian_Oil_and_Gas_Fields.csv'
bounds = get_bounds(site_id, csv_path)

# File path definitions
Active_Multiband = "Sentinel-2_active_emissionMBMP.Tiff"
No_Multiband = "Sentinel-2_no_emissionMBMP.Tiff"
output_file = "SWIR_diff.tiff"
masked_output_file = "SWIR_diff_masked_urban.tiff"
urban_geojson = r"C:\GIS_Course\Methane_Point_Detection\Sentinel-2_Algeria_Methane\hassi_messaoud_urban.geojson"

# The main MBMP calculations begin here 
with rasterio.open(Active_Multiband) as Active_img, rasterio.open(No_Multiband) as No_img:
    
    # Dividing by 10000 converts the Sentinel-2 L2A digital numbers to reflectance.
    Active_B11 = Active_img.read(1).astype(float) / 10000.0 
    Active_B12 = Active_img.read(2).astype(float) / 10000.0
    No_B11 = No_img.read(1).astype(float) / 10000.0
    No_B12 = No_img.read(2).astype(float) / 10000.0

    # This performs two MBSP calculations, one for each satellite pass.
    MBSP_active = (Active_B12 - Active_B11) / Active_B11
    MBSP_no = (No_B12 - No_B11) / No_B11

    # This performs the MBMP calculation.
    SWIR_diff = MBSP_active - MBSP_no

# Reproject and save SWIR_diff to EPSG:4326 for the folium map
with rasterio.open(Active_Multiband) as src:
    target_crs = "EPSG:4326"
    transform, width, height = calculate_default_transform(
        src.crs, target_crs, src.width, src.height, *src.bounds
    )
    meta = src.meta.copy()
    meta.update({
        "crs": target_crs,
        "transform": transform,
        "width": width,
        "height": height,
        "count": 1,
        "dtype": SWIR_diff.dtype
    })
    with rasterio.open(output_file, "w", **meta) as dest:
        reproject(
            source=SWIR_diff,
            destination=rasterio.band(dest, 1),
            src_transform=src.transform,
            src_crs=src.crs,
            dst_transform=transform,
            dst_crs=target_crs,
            resampling=Resampling.nearest
        )
"""
Urban areas proved to be a problem when segmenting the plume from the scene. 
These have been masked but this is an imperfect solution as plume length is 
a strong predictor of emission rate and this can be cut off by the mask. 
It should be assumed that plumes that cross urban areas are underestimated.
"""
# Load GeoJSON and create urban mask
urban_areas = gpd.read_file(urban_geojson)
with rasterio.open(output_file) as src:
    urban_areas = urban_areas.to_crs(src.crs)

    # Rasterize the urban areas
    urban_mask = geometry_mask(
        [feature["geometry"] for feature in urban_areas.to_crs(src.crs).__geo_interface__["features"]],
        out_shape=(src.height, src.width),
        transform=src.transform,
        invert=True
    )

    # Apply the urban area mask to SWIR_diff
    swir_diff = src.read(1)
    swir_diff_masked = np.where((urban_mask) | (swir_diff == -0.0), -32768, -swir_diff)
    swir_diff_masked = np.where(swir_diff_masked > 3000, -32768, swir_diff_masked)
    """ 
    to account for seasonality, the median value of the dataset (i.e. the background) has been 
    adjusted to zero
    """
    
    target_median = 0.0  # Define target median value

    # Compute the current median (ignoring NoData values)
    current_median = np.median(swir_diff_masked[(swir_diff_masked > -3000) & (swir_diff_masked < 3000)])

    # Compute shift needed
    shift_value = target_median - current_median

    # Apply shift to all valid pixels
    swir_diff_masked = np.where(
        (swir_diff_masked > -3000) & (swir_diff_masked < 3000),
        swir_diff_masked + shift_value,
        -32768
    )

    print(f"Adjusting median from {current_median} to {target_median}, shifting by {shift_value}")
    
    # Save the masked SWIR_diff to a new file
    meta = src.meta.copy()
    meta.update(dtype=rasterio.float32, nodata=-32768)  # nodata matches the sentinel value written above
    with rasterio.open(masked_output_file, "w", **meta) as dest:
        dest.write(swir_diff_masked.astype(rasterio.float32), 1)

# Calculate centre for map using masked SWIR_diff raster bounds
with rasterio.open(masked_output_file) as src:
    map_bounds = src.bounds
    centre_lat = (map_bounds.top + map_bounds.bottom) / 2
    centre_lon = (map_bounds.left + map_bounds.right) / 2

# Create Folium map
m = Map(location=[centre_lat, centre_lon], zoom_start=10, control_scale=True)

# Use the reprojected image stored in memory instead of loading from a file
blue, green, red = reprojected_image_data[0], reprojected_image_data[1], reprojected_image_data[2]

# this is to adjust the brightness of the truecolour image as it can be a little dark sometimes.
brightness_factor = 0.03 # only change this number
blue = np.clip(blue * brightness_factor, 0, 255)
green = np.clip(green * brightness_factor, 0, 255)
red = np.clip(red * brightness_factor, 0, 255)

# Stack bands to create RGB image
rgb = np.dstack((red, green, blue))
rgb = rgb / rgb.max()
rgb = np.log1p(rgb)
rgb = rgb / rgb.max()

# Add true colour image overlay
with rasterio.open(masked_output_file) as src:
    swir_bounds = [[src.bounds.bottom, src.bounds.left], [src.bounds.top, src.bounds.right]]

truecolour_overlay = ImageOverlay(
    name="Truecolour",
    image=rgb,
    bounds=swir_bounds,
    opacity=1,  # Fully opaque base layer; lower this to blend with other layers
    interactive=True,
    zindex=1,  # Lower zindex to place below SWIR overlay
)
truecolour_overlay.add_to(m)

# Load and stretch SWIR_diff for visualization
with rasterio.open(masked_output_file) as src:
    swir_bounds = [[src.bounds.bottom, src.bounds.left], [src.bounds.top, src.bounds.right]]
    swir_data = src.read(1)

    # Mask invalid data and any values outside the valid range (-3000, 3000)
    swir_data = np.ma.masked_invalid(swir_data)
    swir_data = np.ma.masked_where((swir_data <= -3000) | (swir_data >= 3000), swir_data)

    # Calculate mean and std only for valid data
    filtered_swir_data = swir_data[swir_data >= -3000]  # Ignore values below -3000
    mean = np.nanmean(filtered_swir_data)
    std = np.nanstd(filtered_swir_data)
    std_factor = 2  # Stretch factor

    # Calculate stretching bounds within the valid data range
    lower_bound = max(mean - std_factor * std, swir_data.min())
    upper_bound = min(mean + std_factor * std, swir_data.max())

    # Normalize the data to [0, 1]
    normalized_swir_data = np.clip((swir_data - lower_bound) / (upper_bound - lower_bound), 0, 1)

    # Apply colourmap
    cmap = plt.get_cmap('viridis')
    rgb_data = (cmap(normalized_swir_data.filled(0))[:, :, :3] * 255).astype(np.uint8)

# Add SWIR_diff overlay to map
swir_overlay = ImageOverlay(
    name="SWIR Data",
    image=rgb_data,
    bounds=swir_bounds,
    opacity=1,  # Adjust opacity for visibility
    interactive=True,
    zindex=2  # Ensure SWIR overlay is above other layers
)
swir_overlay.add_to(m)

""" 
The next section adds known plume locations as red boxes around the sites. 
"""
# Add GeoJSON data as a layer group
vector_point_path = r"C:\GIS_Course\Methane_Point_Detection\Sentinel-2_Algeria_Methane\Data\known_point_sources.geojson"
gdf = gpd.read_file(vector_point_path)
geojson_layer = FeatureGroup(name="Known Point Sources", show=True)
for _, row in gdf.iterrows():
    lat, lon = row.geometry.y, row.geometry.x
    box_size = 0.002  # Approximate size for 20x20 pixels (adjust if needed)
    bounds = [[lat - box_size, lon - box_size], [lat + box_size, lon + box_size]]
    rect = Rectangle(
        bounds=bounds,
        color="red",
        fill=False,
    )
    geojson_layer.add_child(rect)
geojson_layer.add_to(m)

""" 
This creates a clickable lat long popup event on the 
map that will be used for tagging the plumes
"""
LayerControl().add_to(m)
m.add_child(LatLngPopup())

# Display map
display(m)

## Plume tagging

Over an area the size of an oil and gas field, many objects can erroneously show up as methane-like signals if a method such as thresholding is used. These include urban areas, agricultural irrigation projects and new constructions.

To deal with this problem, 11 known plume locations have been programmed into the system, which will automatically detect plumes at those locations (marked as red squares on the map above). Should a plume be located away from these predefined areas, a manual tagging system can be used by following the guide below.

![Plume Identification](Data/Plume_Identification.jpg)

| Scenario | True-colour scene                                | MBMP/SWIR scene                                   | CH4 Plume? |
|----------|--------------------------------------------------|--------------------------------------------------|------------|
| A        | Plume visible                                    | No plume visible                                 | No         |
| B        | No plume visible                                 | Bright four-pointed star like diffraction spike  | No         |
| C        | No plume visible                                 | Plume visible with four-pointed diffraction spike| Yes        |
| D        | No plume visible                                 | Plume visible                                    | Yes        |


To tag a plume that is not near one of the predefined pins, click on a plume somewhere along its length, and then copy the given latitude and longitude coordinates into the code box below, using the following format:
<p style="text-align: center;"># User inputted plumes go below this message.</p>
<p style="text-align: center;">(31.6887, 5.8102),  # User plume 1 (latitude, longitude)</p> <p style="text-align: center;">(31.7910, 5.8263),  # User plume 2 (latitude, longitude)</p> 

Additional lines for more plumes can be added as needed.
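
A common slip when copying coordinates is swapping latitude and longitude. The sketch below shows one way such entries could be caught by checking each pair against the field's bounding box. The bounding box values and the helper `outside_bounds` are hypothetical; in the notebook the bounds would come from the field CSV.

```python
def outside_bounds(coords, south, west, north, east):
    """Return the (latitude, longitude) pairs that fall outside the bounding box."""
    return [
        (lat, lon) for lat, lon in coords
        if not (south <= lat <= north and west <= lon <= east)
    ]


# Hypothetical bounding box; the second pair has latitude and longitude swapped.
tagged = [(31.6887, 5.8102), (5.8263, 31.7910)]
print(outside_bounds(tagged, south=31.5, west=5.7, north=32.0, east=6.3))
# [(5.8263, 31.791)]
```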

In [None]:
plume_coords = [
    (31.6584, 5.9054),  # Site 1 DO NOT DELETE OR MODIFY THESE!
    (31.6174, 5.9671),  # Site 2
    (31.7419, 5.8949),  # Site 3
    (31.7570, 5.9423),  # Site 4
    (31.7341, 5.9670),  # Site 5
    (31.7678, 5.9999),  # Site 6
    (31.7777, 5.9957),  # Site 7
    (31.7975, 6.0109),  # Site 8
    (31.7570, 6.1692),  # Site 9
    (31.8054, 6.1551),  # Site 10
    (31.8640, 6.1733),  # Site 11

    # User inputted plumes go below this message. Add more lines as needed
    #(31.6231, 5.9601),  # User plume 1
    #(31.7186, 5.9735),  # User plume 2
    #(31.6597, 5.8990),  # User plume 3 
    (31.8580, 6.2153),  # User plume 4 
    ]

## Regression Model

A regression model is a statistical tool used to predict a dependent variable (here, methane emission rate in kg/h or "Q") based on independent variables. It works by identifying relationships in the training data and using these to estimate outcomes for new data.

To train the model, data from methane plumes with emission rates documented in peer-reviewed studies was collected (Gorroño et al., 2023; Pandey et al., 2023; Varon et al., 2021; Wang et al., 2023; Sanchez-Garcia et al., 2021). These plumes were then found using the MBMP Plume Visualiser. Each plume was measured for:

- **CS Sum**: The plume intensity in its cross-section.
- **Plume Length**: The plume's length in pixels.
- **Wind Speed**: ERA5 reanalysis data for the wind speed at the time of observation.

The regression analysis identifies how these factors relate to emission rates, allowing the model to predict methane emissions for other plumes based on their characteristics.

The data used to train the model as of publication are listed below.

| Source                            | Long      | Lat       | Date       | Q (kg/h) | C/S Sum   | Wind (m/s) | Length (px) | Width (px) |
|-----------------------------------|-----------|-----------|------------|----------|-----------|------------|------------|------------|
| Gorroño et al., 2023              | 6.1545    | 31.8066   | 2021-08-31 | 5453     | 0.066746  | 4.45       | 16.125     | 7          |
| Pandey et al., 2023               | 6.1736    | 31.8647   | 2020-01-04 | 21000    | 0.574297  | 3.65       | 294.544    | 47         |
| Varon et al., 2021                | 5.9053    | 31.6585   | 2019-11-20 | 8497     | 0.682773  | 0.49       | 106.231    | 38         |
| Sanchez-Garcia et al., 2021       | 6.0015    | 31.769    | 2021-08-19 | 4326     | 0.137231  | 0.96       | 43.174     | 13         |
| Sanchez-Garcia et al., 2021       | 5.9952    | 31.7789   | 2021-08-19 | 2160     | 0.095425  | 0.96       | 13.601     | 11         |
| Sanchez-Garcia et al., 2021       | 6.0107    | 31.7981   | 2021-08-19 | 2757     | 0.100792  | 0.96       | 16.124     | 8          |
| Radman et al. 2023                | 5.9055    | 31.659    | 2020-01-07 | 8240     | 0.692199  | 1.44       | 164.125    | 64         |
| Carbon Mapper Website             | 5.9954    | 31.7775   | 2023-01-31 | 3400     | 0.124114  | 2.3        | 18.788     | 8          |
| Carbon Mapper Website             | 5.9934    | 31.7772   | 2024-09-29 | 3000     | 0.071961  | 8.88       | 20.125     | 7          |
| Naus et al. 2023                  | 6.1684    | 31.7571   | 2020-01-14 | 3700     | 0.332890  | 1.92       | 53.460     | 30         |
| Naus et al. 2023                  | 5.9674    | 31.6172   | 2020-01-02 | 3600     | 0.206687  | 1.33       | 38.013     | 13         |
| Naus et al. 2023                  | 5.9917    | 31.7776   | 2020-08-06 | 4800     | 0.070314  | 5.66       | 18.439     | 8          |
| Naus et al. 2023                  | 5.9987    | 31.7692   | 2020-08-14 | 3400     | 0.100685  | 5.13       | 21.540     | 13         |
| Naus et al. 2023                  | 5.9677    | 31.7341   | 2020-02-28 | 2700     | 0.275828  | 0.22       | 60.745     | 30         |
| Naus et al. 2023                  | 5.9422    | 31.7569   | 2020-02-28 | 2100     | 0.067852  | 0.22       | 8.485      | 6          |
| Naus et al. 2023                  | 5.8986    | 31.66     | 2020-07-30 | 14800    | 0.430201  | 5.51       | 72.173     | 35         |




## How to improve this model

Should more studies become available, further example plumes can be added to the lists below to improve the model. Each new plume needs one value appended to every list, in the same position.
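
Every list in the training dictionary must stay the same length, otherwise it cannot be turned into a table. The sketch below shows one way to append a plume to all columns at once and check the result. The helper `add_plume` and the appended values are hypothetical and for illustration only; the two starting rows are copied from the table above.

```python
def add_plume(data, emission_rate, cs_sum, length, width, wind_speed):
    """Append one plume to every column, then check the columns are still equal length."""
    new_row = {
        "Emission_rate_kg_h": emission_rate,
        "Cross_sectional_Adjusted_Sum": cs_sum,
        "Plume_length": length,
        "Width": width,
        "Wind_speed": wind_speed,
    }
    for column, value in new_row.items():
        data[column].append(value)
    lengths = {len(values) for values in data.values()}
    if len(lengths) != 1:
        raise ValueError(f"Columns have unequal lengths: {lengths}")
    return data


# Two rows copied from the training table, then one made-up plume for illustration only.
example = {
    "Emission_rate_kg_h": [5453, 21000],
    "Cross_sectional_Adjusted_Sum": [0.066746, 0.574297],
    "Plume_length": [16.125, 294.544],
    "Width": [7, 47],
    "Wind_speed": [4.45, 3.65],
}
add_plume(example, emission_rate=4000, cs_sum=0.12, length=25.0, width=9, wind_speed=2.0)
print(len(example["Wind_speed"]))  # 3
```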

In [None]:
initial_data = {
    "Emission_rate_kg_h": [5453, 21000, 8497, 4326, 2160, 2757, 8240, 3400, 3000, 
                            3700, 3600, 4800, 3400, 2700, 2100, 14800],
    "Cross_sectional_Adjusted_Sum": [0.06674600, 0.57429700, 0.68277300, 0.13723100, 0.09542500, 
                                     0.10079200, 0.69219900, 0.12411400, 0.07196100, 0.33289000, 
                                     0.20668700, 0.07031400, 0.10068500, 0.27582800, 0.06785200, 
                                     0.43020100],
    "Plume_length": [16.125, 294.544, 106.231, 43.174, 13.601, 16.125, 164.125, 18.788, 20.125, 
                     53.460, 38.013, 18.439, 21.541, 60.745, 8.485, 72.173],
    "Width": [7.000, 47.000, 38.000, 13.000, 11.000, 8.000, 64.000, 8.000, 7.000, 
              30.000, 13.000, 8.000, 13.000, 30.000, 6.000, 35.000],
    "Wind_speed": [4.45, 3.65, 0.49, 0.96, 0.96, 0.96, 1.44, 2.3, 8.88, 1.92, 1.33, 
                   5.66, 5.13, 0.22, 0.22, 5.51]
}


## Determining wind speed

Wind speed is a crucial factor in determining emission rate. The next code box uses the Climate Data Store API to determine the wind speed 10 m above the ground on the "Active Emission" date, as part of the gas flux calculation.

Access to the API requires some initial setup, details of which can be found in this software's accompanying how-to guide. Several warning messages will appear, but these can be ignored.
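
ERA5 reports the 10 m wind as two components, eastward (`u10`) and northward (`v10`), and the speed is the length of the vector they form. A minimal sketch of that calculation is shown below; the component values are invented, and the function name `wind_speed` is hypothetical.

```python
import math


def wind_speed(u10: float, v10: float) -> float:
    """Horizontal wind speed in m/s from the eastward (u10) and northward (v10) components."""
    return math.hypot(u10, v10)


# Hypothetical ERA5 components in m/s.
print(wind_speed(3.0, -4.0))  # 5.0
```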



In [None]:
"""
The first function calculates a centroid for the wind speed location 
using the bounding box of the study area as the ERA5 API requires a 
point location. 
"""
def get_location_from_site_id(site_id, csv_path):
    df = pd.read_csv(csv_path)
    site = df[df['id'] == site_id]
    site = site.iloc[0]
    centre_lat = (site['south'] + site['north']) / 2
    centre_lon = (site['west'] + site['east']) / 2
    return {'latitude': centre_lat, 'longitude': centre_lon}

# Get the location for the ERA5 data request
location = get_location_from_site_id(site_id, csv_path)

"""
The cdsapi needs to be set up as per the instructions in the How
to Guide or this will not work!
"""
c = cdsapi.Client()

date = active_temporal_extent[0]  # this takes the date to be the same as the active_emission function. 

"""
This tool is hard coded to retrieve data for 10:00am, as Sentinel-2
overpasses occur at around 10:30am. If this tool is reconfigured for
another region of the world, this may need to be adjusted. 
"""
# Retrieve ERA5 data and store it in a temporary file
with NamedTemporaryFile(suffix='.nc') as tmp_file:
    result = c.retrieve(
        'reanalysis-era5-single-levels',
        {
            'product_type': 'reanalysis',
            'variable': ['10m_u_component_of_wind', '10m_v_component_of_wind'],
            'year': date.split('-')[0],
            'month': date.split('-')[1],
            'day': date.split('-')[2],
            'time': ['10:00'],  # Sentinel 2 overpasses are at around 10:30 am over Algeria. 
            'format': 'netcdf',  # NetCDF format
            'area': [
                location['latitude'] + 0.25, location['longitude'] - 0.25,
                location['latitude'] - 0.25, location['longitude'] + 0.25,
            ],  
        }
    )
    # Download data to the temporary file
    result.download(tmp_file.name)
    
    # Load the dataset fully into memory before the temporary file is removed
    ds = xr.open_dataset(tmp_file.name).load()

"""
ERA5 data provides the wind as an east/west component (u10) and a
north/south component (v10). Positive u10 and v10 mean the wind is blowing
towards the east and north respectively. Negative values are the reverse.
"""
# Extract u and v components
u10 = ds['u10'].sel(latitude=location['latitude'], longitude=location['longitude'], method='nearest')
v10 = ds['v10'].sel(latitude=location['latitude'], longitude=location['longitude'], method='nearest')

""" 
u10 and v10 form two sides of a right angled triangle so we can calculate
the wind speed using the A^2 + B^2 = C^2 (Pythagoras, 530 BCE). With the 
wind variables this would be: u10^2 + v10^2 = windspeed^2. So this can be 
rearranged as wind speed = the square root of (u10² + v10²).
"""
# Wind speed calculation
wind_speed = np.sqrt(u10**2 + v10**2)

# Extract wind speed value
wind_speed_value = wind_speed.values.item() 
print(f"Wind Speed at 10:00 on {date}: {wind_speed_value:.2f} m/s")

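As a quick check of the calculation above, the sketch below computes the speed from a pair of u/v components and also derives the meteorological direction (the bearing the wind blows from). The function name `wind_from_components` and the sample values are illustrative only and are not part of the notebook.

```python
import numpy as np

def wind_from_components(u, v):
    """Return (speed in m/s, bearing in degrees that the wind blows FROM)."""
    speed = np.hypot(u, v)  # sqrt(u**2 + v**2)
    # Meteorological convention: 0 deg = from the north, 90 deg = from the east.
    direction_from = np.degrees(np.arctan2(-u, -v)) % 360
    return float(speed), float(direction_from)

# u = 3 m/s towards the east, v = 4 m/s towards the north
speed, direction = wind_from_components(3.0, 4.0)
print(f"{speed:.2f} m/s from {direction:.1f} deg")  # 5.00 m/s from 216.9 deg
```

The direction is not used by the regression in this notebook, but it can help when judging whether a tagged plume is aligned with the wind.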

## Running the tagged plume analysis

The next code box analyses methane plumes we tagged earlier and provides the following information:

- **Plume Insights**: Locations, sizes, and predicted methane emission rates (kg/h) visualised on an interactive map and summarised in a table.
- **Model Evaluation**: Details on the regression model used to estimate emissions, including its performance metrics.

There are two variables that can be adjusted should the model not identify plumes adequately. The first is "window_size", which is the size in pixels of the search box around the coordinate you selected. If a bright non-plume object is being picked up instead of the plume you clicked, this can be reduced. 

        # This defines a 50 pixel by 50 pixel search box
        window_size = 50  <---- change this if needed  
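To see how "window_size" translates into a search box, the sketch below computes the clipped row/column bounds around a tagged pixel using the same arithmetic as the analysis cell. `search_window` is a hypothetical helper name used only for illustration.

```python
def search_window(row, col, shape, window_size=50):
    """Return (row_start, row_end, col_start, col_end), clipped to the raster."""
    half_window = window_size // 2
    row_start = max(0, row - half_window)
    row_end = min(shape[0], row + half_window + 1)
    col_start = max(0, col - half_window)
    col_end = min(shape[1], col + half_window + 1)
    return row_start, row_end, col_start, col_end

# Well inside a 1000 x 1000 raster (end indices are exclusive).
print(search_window(500, 500, (1000, 1000)))  # (475, 526, 475, 526)
# Near an edge the box is clipped rather than wrapped.
print(search_window(10, 995, (1000, 1000)))   # (0, 36, 970, 1000)
```

Because the box extends `window_size // 2` pixels either side of the tagged pixel, a value of 50 gives a box of 51 by 51 pixels away from the raster edges.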

Secondly, for a plume to be counted as a plume, a sufficient number of high-value pixels need to be adjacent to one another. The code below uses a minimum cluster size of 20, which may mean very small plumes are not picked up. You can reduce "min_cluster_size" to deal with this, but the system will misidentify more features. 

        # Set a minimum threshold for a valid plume detection
        min_cluster_size = 20  # <---- change this if needed 

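The effect of "min_cluster_size" can be seen on a toy example. The sketch below assumes `label` is `scipy.ndimage.label` and uses a made-up binary mask; `has_plume` is a hypothetical helper that mirrors the size check in the analysis cell.

```python
import numpy as np
from scipy.ndimage import label

# Toy binary mask: one 6-pixel blob and one isolated pixel (illustrative data).
mask = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
])

labeled, num_clusters = label(mask)        # edge-connected pixels share a label
sizes = np.bincount(labeled.ravel())[1:]   # drop the background count (index 0)

def has_plume(cluster_sizes, min_cluster_size):
    """True if the largest connected cluster reaches the minimum size."""
    return bool(cluster_sizes.size > 0 and cluster_sizes.max() >= min_cluster_size)

print(num_clusters, sorted(sizes.tolist()))        # 2 [1, 6]
print(has_plume(sizes, 5), has_plume(sizes, 20))   # True False
```

Lowering the threshold lets the 6-pixel blob count as a plume, but the same change lets small bright noise features through as well.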

In [None]:
swir_diff_path = r'C:\GIS_Course\Methane_Point_Detection\Sentinel-2_Algeria_Methane\SWIR_diff_masked_urban.tiff'

# Open the swir_diff_path raster and read its data, bounds and transform for analysis.
with rasterio.open(swir_diff_path) as tiff_file:
    raster_data = tiff_file.read(1)  # Read the first band
    bounds = tiff_file.bounds
    transform = tiff_file.transform

    # Define nodata_value properly
    nodata_value = tiff_file.nodata  # Extract from metadata if available
    if nodata_value is None:
        nodata_value = -32768  

# Converts raster data into a masked array and hides the no data pixels.
masked_data = np.ma.masked_equal(raster_data, nodata_value)

# Compute min and max from remaining pixels.
lower_bound = masked_data.min()  
upper_bound = masked_data.max()  
normalized_data = (masked_data - lower_bound) / (upper_bound - lower_bound)

# function to centre the folium map on the raster
def get_raster_centre(tiff_path):
    with rasterio.open(tiff_path) as tiff_file:
        bounds = tiff_file.bounds
        centre_lat = (bounds.top + bounds.bottom) / 2
        centre_lon = (bounds.left + bounds.right) / 2
    return centre_lat, centre_lon

""" 
This function uses Bresenham's line algorithm to find all the pixel 
coordinates that form a straight line between two points. 
    
    x0, y0: Start point coordinates.
    x1, y1: End point coordinates.

It returns the list of pixel coordinates. The pixel values themselves
are looked up afterwards by get_line_pixel_values. 
"""
def bresenham_line(x0, y0, x1, y1):
    points = []  # Stores the pixels forming the line
    dx = abs(x1 - x0)
    dy = abs(y1 - y0)
    
    # Determine movement direction (+1 or -1 for each axis)
    step_x = 1 if x0 < x1 else -1
    step_y = 1 if y0 < y1 else -1
    
    # keeps track of how far the current point is from the ideal straight line
    err = dx - dy

    while (x0, y0) != (x1, y1):  # Stop when reaching the endpoint
        points.append((x0, y0))  # Store the current pixel coordinate
        
        # double_err is used to decide whether to move horizontally, vertically, or diagonally.
        double_err = err * 2
        if double_err > -dy:
            err -= dy
            x0 += step_x
        if double_err < dx:
            err += dx
            y0 += step_y

    points.append((x1, y1)) 
    return points
    
# This function takes the bresenham_line and then applies it to the masked dataset.
def get_line_pixel_values(start, end, masked_data):
    line_pixels = bresenham_line(int(start[0]), int(start[1]), int(end[0]), int(end[1]))
    pixel_values = [masked_data[row, col] for row, col in line_pixels if 0 <= row < masked_data.shape[0] and 0 <= col < masked_data.shape[1]]
    return pixel_values, line_pixels

""" calculate_plume_dimensions determines:
    - plume_length: The longest distance between any two points in the plume (float).
    - plume_width: The perpendicular width of the plume using PCA (float)."""
def calculate_plume_dimensions(plume_pixels):
    # Compute the convex hull
    hull = ConvexHull(plume_pixels)
    hull_points = plume_pixels[hull.vertices]

    # Compute the plume length as the maximum pairwise distance between hull points
    plume_length = pdist(hull_points).max()

    # Use PCA to determine the major axis of the plume
    pca = PCA(n_components=2)
    pca.fit(plume_pixels)
    main_direction = pca.components_[0]
    perp_direction = np.array([-main_direction[1], main_direction[0]])  # Perpendicular vector

    # Project plume pixels onto the perpendicular vector to compute width
    projections = plume_pixels @ perp_direction
    plume_width = projections.max() - projections.min()

    return plume_length, plume_width

""" analyze_plume_with_cross_section_sum performs the main plume analysis. For each
tagged plume it will return its:
    - C/S Sum
    - Length
    - Width
    - emission rate (Q), left as a placeholder here and filled in after the regression
    
It also creates the folium map and a dataframe to show the data"""
def analyze_plume_with_cross_section_sum(masked_data, plume_coords, transform, initial_centre):
    # Folium map parameters
    plume_map = folium.Map(location=initial_centre, zoom_start=11, control_scale=True)
    
    plume_results = [] # A place to store the plume results.

    """Labeled_array identifies plume regions using an absolute SWIR threshold 
    (`absolute_threshold`). This ensures consistent detection across different 
    scenes, independent of pixel distribution. Adjacent pixels are then grouped 
    and labeled as individual plumes."""
    absolute_threshold = 0.009  # pixels above this value are marked as plumes
    labeled_array, _ = label(masked_data > absolute_threshold)

    # Loops through each suspected plume location.
    for i, (lat, lon) in enumerate(plume_coords):
        try:
            # Convert Lat/Long to raster grid coordinates.
            row, col = rasterio.transform.rowcol(transform, lon, lat)
            row, col = int(row), int(col)

            # This defines a 50 pixel by 50 pixel search box around the plume coordinate 
            window_size = 50  
            half_window = window_size // 2  

            # Get the boundary of the search area in row/col coordinates
            row_start, row_end = max(0, row - half_window), min(masked_data.shape[0], row + half_window + 1)
            col_start, col_end = max(0, col - half_window), min(masked_data.shape[1], col + half_window + 1)

            # Label at the tagged pixel itself (0 = background); superseded by the cluster check below.
            plume_label = labeled_array[row, col] 

            # Extract a local window of labels around (row, col)
            local_window = labeled_array[row_start:row_end, col_start:col_end]

            """Because of the slightly noisy quality of the data, individual pixels might
            trigger the plume detection code. To minimise the risk of this, the following code
            specifies that 20 adjacent plume pixels are required to be counted as a plume. """
            
            # Identify groups of connected plume pixels
            binary_window = (local_window > 0).astype(int)  # Convert plume labels to binary (1 = plume, 0 = no plume)
            labeled_clusters, num_clusters = label(binary_window)  # Label connected components

            # Find the largest connected cluster size in the search box
            cluster_sizes = np.bincount(labeled_clusters.ravel())[1:]  # Ignore background (index 0)
            largest_cluster_size = cluster_sizes.max() if len(cluster_sizes) > 0 else 0

            # Set a minimum threshold for a valid plume detection
            min_cluster_size = 20  # This could be changed to a higher value if false positives persist. 

            if largest_cluster_size >= min_cluster_size:
                # in the event of an overlapping plume, this selects the one with the largest pixel area.
                # the smaller plume will need to be tagged by the user
                unique_labels, counts = np.unique(local_window[local_window > 0], return_counts=True)
                plume_label = unique_labels[np.argmax(counts)] if len(unique_labels) > 0 else 0
            else:
                plume_label = 0  # No valid plume detected

            if plume_label == 0:
                plume_results.append({"Plume": i + 1,"Location": (lat, lon), "Status": "No plume"})
                continue

            # Extract plume pixels.
            plume_region = labeled_array == plume_label
            plume_pixels = np.column_stack(np.where(plume_region))

            # Compute plume width & length.
            plume_length, plume_width = calculate_plume_dimensions(plume_pixels)

            # Compute the plume centroid to serve as the cross section point.
            centroid = plume_pixels.mean(axis=0)

            # Draw a perpendicular cross-section line.
            pca = PCA(n_components=2)
            pca.fit(plume_pixels)
            perp_direction = [-pca.components_[0, 1], pca.components_[0, 0]]
            
            # Define the perpendicular line in pixel coordinates.
            perp_line_coords = [
                (centroid[0] - perp_direction[0] * plume_width / 2, centroid[1] - perp_direction[1] * plume_width / 2),
                (centroid[0] + perp_direction[0] * plume_width / 2, centroid[1] + perp_direction[1] * plume_width / 2),
            ]

            # Convert line coordinates to latitude/longitude for mapping
            perp_line_latlon = [
                rasterio.transform.xy(transform, int(pt[0]), int(pt[1])) for pt in perp_line_coords
            ]
            
            # Add the cross-sectional line to the Folium map
            folium.PolyLine(
                locations=[(lat, lon) for lon, lat in perp_line_latlon],
                color="blue",
                weight=2,
                popup=f"Cross Section for Plume {i + 1}"
            ).add_to(plume_map)

            # Get pixel values along the perpendicular line.
            line_pixel_values, line_pixels = get_line_pixel_values(perp_line_coords[0], perp_line_coords[1], masked_data)
            num_intersecting_pixels = len(line_pixels)

            # Compute cross-sectional sum.
            adjusted_sum = sum(abs(value) for value in line_pixel_values)

            # Create a convex hull to outline the plume for visualisation.
            hull = ConvexHull(plume_pixels)
            hull_coords = [(plume_pixels[vertex][0], plume_pixels[vertex][1]) for vertex in hull.vertices]
            hull_latlon = [rasterio.transform.xy(transform, int(pt[0]), int(pt[1])) for pt in hull_coords]

            # Add the plume outline on the map with the site label
            site_label = f"Site {i + 1}" if i < 11 else f"User {i - 10}"
            folium.Polygon(
                locations=[(lat, lon) for lon, lat in hull_latlon],
                color="red",
                weight=3,
                fill=True,
                fill_opacity=0.2,
                popup=site_label,  # Use site label instead of "Plume X"
            ).add_to(plume_map)

            plume_results.append({
                "Plume": i + 1,
                "Location": (lat, lon),
                "Width (px)": num_intersecting_pixels if plume_label > 0 else 0,
                "C/S Sum": adjusted_sum if plume_label > 0 else 0,
                "Length (px)": plume_length if plume_label > 0 else 0,
                "Q (kg/h)": None,  # Placeholder for emission rate
                "Status": "Detected" if plume_label > 0 else "No plume",
            })

  
        # If a plume measurement encounters an error, this will allow any other plumes to be measured.
        except Exception as e:
            plume_results.append({"Plume": i + 1, "Location": (lat, lon), "Status": f"Error: {e}"})
    
    # Return measured plume stats and folium map.    
    return plume_results, plume_map

""" add_swir_data_to_map loads, normalizes, and adds a Short-Wave 
Infrared (SWIR) raster layer to an interactive Folium map."""
def add_swir_data_to_map(map_object, tiff_path):
    with rasterio.open(tiff_path) as tiff_file: 
        swir_data = tiff_file.read(1)
        bounds = tiff_file.bounds
        nodata_value = -32768.0  # NoData value to be ignored. 

    # Mask NoData pixels and normalise data.
    masked_data = np.ma.masked_equal(swir_data, nodata_value)
    filtered_masked_data = masked_data[masked_data >= -3000]  # Ignore values below -3000
    mean, std = np.nanmean(filtered_masked_data), np.nanstd(filtered_masked_data)
    lower_bound, upper_bound = mean - std_factor * std, mean + std_factor * std
    normalized_data = (masked_data - lower_bound) / (upper_bound - lower_bound)
    normalized_data = np.clip(normalized_data, 0, 1)

    # Convert SWIR data to RGB for mapping.
    cmap = plt.get_cmap("viridis")
    swir_rgb = (cmap(normalized_data)[:, :, :3] * 255).astype(np.uint8)
    image_bounds = [[bounds.bottom, bounds.left], [bounds.top, bounds.right]]

    # Add SWIR overlay to the map.
    swir_overlay = ImageOverlay(
        name="SWIR Data",
        image=swir_rgb,
        bounds=image_bounds,
        opacity=1,  # Match Truecolour opacity
        interactive=True,  # Allow user interaction
        zindex=2  # Place above Truecolour
    )
    swir_overlay.add_to(map_object)

# Convert the dataset into a DataFrame
model_df = pd.DataFrame(initial_data)

""" Update_model trains and evaluates a XGBoost regression model 
to predict methane emission rates based on the known plume data shown
earlier."""

def update_model(df):
    # Extract independent variables (X) and dependent variable (y)
    X = df[["Cross_sectional_Adjusted_Sum", "Wind_speed", "Plume_length", "Width"]]  
    y = df["Emission_rate_kg_h"]

    # Log-transform y; adding 1 prevents log(0) errors
    y_log = np.log(y + 1)

    # Standardise the features
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # Train XGBoost model with optimized settings
    xgb_model = XGBRegressor(
        objective="reg:squarederror",
        n_estimators=500,
        learning_rate=0.01,
        max_depth=5,
        colsample_bytree=0.8,
        subsample=0.8,
        gamma=0.1,
        reg_alpha=0.1,
        reg_lambda=0.1,
        random_state=42 # what is the meaning of Life the Universe and Everything?
    )

    # Train the model on the log-transformed y
    xgb_model.fit(X_scaled, y_log)

    return X_scaled, y, scaler, xgb_model  

# Train the improved XGBoost model
X_scaled, y, scaler, xgb_model = update_model(model_df)

# Get the centre of the SWIR TIFF
centre_coords = get_raster_centre(swir_diff_path)

# Perform plume analysis, centreing the map on the SWIR TIFF
plume_analysis_results, plume_map = analyze_plume_with_cross_section_sum(masked_data, plume_coords, transform, centre_coords)

# Predict the emission rate for each detected plume, using the wind speed as a feature
for plume in plume_analysis_results:
    if plume["Status"] == "Detected":  # Only predict for detected plumes
        adjusted_sum = plume["C/S Sum"]
        plume_length = plume["Length (px)"]
        width_value = plume["Width (px)"]  # Extract width from the plume data

        # Convert input features to a DataFrame to maintain feature names
        input_features = pd.DataFrame([[adjusted_sum, wind_speed_value, plume_length, width_value]], 
                                      columns=["Cross_sectional_Adjusted_Sum", "Wind_speed", "Plume_length", "Width"])
        # Scale the input data using the same scaler
        input_features_scaled = scaler.transform(input_features)
        # Use the trained XGBoost model for prediction
        log_prediction = xgb_model.predict(input_features_scaled)
        # Convert back from log scale
        emission_rate = np.exp(log_prediction[0]) - 1
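        # Store the emission rate in the "Q (kg/h)" placeholder created during the analysis,
        # so the results table does not end up with two columns of the same name
        plume["Q (kg/h)"] = emission_rate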

# Convert updated results to a DataFrame
plume_df = pd.DataFrame(plume_analysis_results)
# Rename the "Plume" index to "Site X" or "User X"
plume_labels = [f"Site {i + 1}" if i < 11 else f"User {i - 10}" for i in range(len(plume_df))]

# Set the custom labels as the new index
plume_df.index = plume_labels

plume_df = plume_df.rename(columns={
    "Cross_sectional_Adjusted_Sum": "C/S Sum",
    "Plume_length": "Length (px)",
    "Width": "Width (px)",
    "Predicted Emission Rate (kg/h)": "Q (kg/h)"
})

# Dataframe column order
column_order = ["Status", "Location", "Width (px)", "C/S Sum", "Length (px)", "Q (kg/h)"]
existing_columns = [col for col in column_order if col in plume_df.columns]
plume_df = plume_df[existing_columns]

# Hiding the dataframe keyfield as it isn't needed
if "Plume" in plume_df.columns: plume_df.drop(columns=["Plume"], inplace=True)

# Report the model fit (R² scored on the training data, in log space) and the run details
r_squared = xgb_model.score(X_scaled, np.log(model_df["Emission_rate_kg_h"] + 1))
print(f"Model R-squared (R²): {r_squared:.4f}")
print(f"Active Emission Date: {active_temporal_extent[0]}")
print(f"No Emission Date: {no_temporal_extent[0]}")
print(f"Wind Speed: {wind_speed_value:.2f} m/s")
pd.set_option("display.width", 200)  #Controls dataframe width
pd.set_option("display.max_columns", None)  # "None" puts all a plume's results on one line
# Replace NaN values with an empty string for readability
plume_df = plume_df.fillna("")

# Display the updated DataFrame
print(plume_df)

# Load the true colour image (the context manager closes the file after reading)
truecolour_sat = 'Sentinel-2_truecolour_reprojected.Tiff'
with rasterio.open(truecolour_sat) as img:
    blue, green, red = img.read(1), img.read(2), img.read(3)

# Adjust brightness of truecolour image
brightness_factor = 0.03 # only change this if the truecolour image is too dark or bright
blue = np.clip(blue * brightness_factor, 0, 255)
green = np.clip(green * brightness_factor, 0, 255)
red = np.clip(red * brightness_factor, 0, 255)

# Stack bands to create RGB image
rgb = np.dstack((red, green, blue))
rgb = rgb / rgb.max()
rgb = np.log1p(rgb)
rgb = rgb / rgb.max()

# Add true colour image overlay
truecolour_overlay = ImageOverlay(
    name= "Truecolour",
    image=rgb,
    bounds=swir_bounds,
    opacity=1,  # Fully opaque; the SWIR overlay is drawn above this layer
    interactive=True,
    zindex=1,  
)
truecolour_overlay.add_to(plume_map)

# Add SWIR overlay to the map
add_swir_data_to_map(plume_map, swir_diff_path)

# Add a layer control to toggle map layers
LayerControl().add_to(plume_map)

# Display the map with updated analysis
display(plume_map)
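The length and width figures in the table rest on a principal-axis fit of the plume pixels. The sketch below reproduces the idea with NumPy only, on a synthetic rectangular strip. It measures the extent along each principal axis rather than the convex-hull distance used in `calculate_plume_dimensions`, so its length figure will differ slightly from the notebook's. `principal_extents` is a hypothetical helper.

```python
import numpy as np

def principal_extents(pixels):
    """Return (extent along the major axis, extent along the minor axis) in pixels."""
    centred = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    along = centred @ vt[0]
    across = centred @ vt[1]
    return along.max() - along.min(), across.max() - across.min()

# Synthetic plume: a strip 3 pixels wide and 20 pixels long (illustrative data).
rows, cols = np.meshgrid(np.arange(3), np.arange(20), indexing="ij")
pixels = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)

length, width = principal_extents(pixels)
print(f"length = {length:.1f} px, width = {width:.1f} px")  # length = 19.0 px, width = 2.0 px
```

The extents are distances between pixel centres, which is why a strip 20 pixels long reports 19.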

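The regression above is trained on log(y + 1), and its predictions are mapped back to kg/h with exp(p) - 1. A minimal round-trip check is sketched below using NumPy's `log1p` and `expm1`, which compute the same expressions as the notebook, on made-up emission rates.

```python
import numpy as np

# Made-up emission rates in kg/h, including zero (illustrative values only).
rates = np.array([0.0, 250.0, 1200.0, 15000.0])

log_rates = np.log1p(rates)      # same as np.log(rates + 1)
recovered = np.expm1(log_rates)  # same as np.exp(log_rates) - 1

print(np.allclose(recovered, rates))  # True
```

Because the model works in log space, the printed R² describes the fit to log-transformed rates. An error of 0.1 in log space corresponds to roughly a 10% error in kg/h.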
## References

1. **Gorroño, J., Varon, D.J., Irakulis-Loitxate, I. and Guanter, L., 2022.** Understanding the potential of Sentinel-2 for monitoring methane point emissions. *Atmospheric Measurement Techniques Discussions, 2022*, pp.1-25.

2. **Pandey, S., van Nistelrooij, M., Maasakkers, J.D., Sutar, P., Houweling, S., Varon, D.J., Tol, P., Gains, D., Worden, J. and Aben, I., 2023.** Daily detection and quantification of methane leaks using Sentinel-3: a tiered satellite observation approach with Sentinel-2 and Sentinel-5p. *Remote Sensing of Environment, 296*, p.113716.

3. **Pythagoras, 530 BCE.** A squared plus B squared equals C squared: The definitive guide to right angled triangles. Ancient Greece: Croton Press.

4. **Naus, S., Maasakkers, J.D., Gautam, R., Omara, M., Stikker, R., Veenstra, A.K., Nathan, B., Irakulis-Loitxate, I., Guanter, L., Pandey, S. and Girard, M., 2023.** Assessing the relative importance of satellite-detected methane superemitters in quantifying total emissions for oil and gas production areas in Algeria. *Environmental Science & Technology, 57*(48), pp.19545-19556.

5. **Radman, A., Mahdianpari, M., Varon, D.J. and Mohammadimanesh, F., 2023.** S2MetNet: A novel dataset and deep learning benchmark for methane point source quantification using Sentinel-2 satellite imagery. *Remote Sensing of Environment, 295*, p.113708.

6. **Varon, D.J., Jervis, D., McKeever, J., Spence, I., Gains, D. and Jacob, D.J., 2020.** High-frequency monitoring of anomalous methane point sources with multispectral Sentinel-2 satellite observations. *Atmospheric Measurement Techniques Discussions, 2020*, pp.1-21.