# Export Sentinel-2 data for surface solar irradiance model

This Jupyter notebook implements an export pipeline for Sentinel-2 cloud mask images using Google Earth Engine. All subsequent modeling steps are performed in Python, outside Google Earth Engine, using the exported files.

The notebook is designed to be reproducible and can be executed by any user with a Google account and a
Google Cloud project linked to Earth Engine.

---

## Prerequisites

To run this notebook, you need:

1. **A Google account**
2. **Access to Google Earth Engine**
3. **A Google Cloud project linked to Earth Engine**
4. **A local Python environment with the required packages installed**


---

## Step 1 – Create a Google Earth Engine Account

1. Go to the Earth Engine signup page:  
   https://earthengine.google.com/

2. Click **“Sign up”** and log in with your Google account.

3. Request access to **Earth Engine for research / non-commercial use** and wait for approval.

---

## Step 2 – Create a Google Cloud Project

Earth Engine requires a Google Cloud project for authentication and billing association
(no charges are incurred for typical research use).

1. Go to the Google Cloud Console:  
   https://console.cloud.google.com/

2. Create a **new project** and choose a project name.

3. Note the **Project ID** - this will be required for authentication.

---

## Step 3 – Link the Cloud Project to Earth Engine

1. Open the Earth Engine Code Editor:  
   https://code.earthengine.google.com/

2. Click the ⚙️ **Settings** icon (top right).

3. Under **Cloud Projects**, select or add your newly created Google Cloud project.

4. Save the settings.

This links Earth Engine to your Cloud project and enables API access from Python.

---

## Step 4 – Set Up the Local Python Environment

Install the required Python packages (example using `pip`):

```bash
pip install earthengine-api geemap numpy pandas xarray rasterio
```


In [20]:
import pandas as pd
import ee
import geemap
import math

In [21]:
# Authenticate and initialize Earth Engine
# Note: you must create your own Google Cloud project (see Step 2)
ee.Authenticate()

# Initialize Earth Engine with your Cloud Project ID
my_project_name = 'sample-project-452812'  # replace with the Project ID of your own Google Cloud project
ee.Initialize(project=my_project_name)

In [22]:
# -------------------------------------------
# Definition of study area 
# -------------------------------------------

# Convert degree to radian
def deg2rad(deg):
    return deg * math.pi / 180

# Define the center of the bounding box (Bergen, Norway)
CENTER_LAT = 60.39
CENTER_LON = 5.33

# Approximate degree offsets for a 25 km x 25 km box (12.5 km from the center in each direction)
DEG_LAT_TO_KM = 111.412  # 1 degree latitude at 60° converted to km (https://en.wikipedia.org/wiki/Latitude)
DEG_LON_TO_KM = 111.317 * math.cos(deg2rad(CENTER_LAT))  # 1 degree longitude converted to km
LAT_OFFSET = 12.5 / DEG_LAT_TO_KM  # north/south
LON_OFFSET = 12.5 / DEG_LON_TO_KM  # east/west 

# Define the bounding box
BBOX = {
    "north": CENTER_LAT + LAT_OFFSET,
    "south": CENTER_LAT - LAT_OFFSET,
    "west": CENTER_LON - LON_OFFSET,
    "east": CENTER_LON + LON_OFFSET
}

print(BBOX)

# Define large bounding box for cloud mask retrieval 
# These values are defined specifically for Bergen after analyzing the satellite and solar angles
# over Bergen at 11 UTC (Sentinel-2 overpass time) and must be adapted for other regions of the world.
LSOUTH_OFFSET = 100 / DEG_LAT_TO_KM  # add another 100 km south
LWEST_OFFSET =  20 / DEG_LON_TO_KM  # add 20 km west 
LEAST_OFFSET = 20 / DEG_LON_TO_KM  # add 20 km east 

# Define the large bounding box
LBOX = {
    "north": CENTER_LAT + LAT_OFFSET,
    "south": BBOX["south"] - LSOUTH_OFFSET,
    "west": BBOX["west"] - LWEST_OFFSET,
    "east": BBOX["east"] + LEAST_OFFSET
}

print(LBOX)

# Geometry Rectangle small and large region of interest (roi)
bergen_small_roi = ee.Geometry.Rectangle([BBOX["west"], BBOX["south"], BBOX["east"], BBOX["north"]])
bergen_large_roi = ee.Geometry.Rectangle([LBOX["west"], LBOX["south"], LBOX["east"], LBOX["north"]])

Map = geemap.Map(center=[CENTER_LAT, CENTER_LON], zoom=10)

# Add the geometry to the map
Map.addLayer(bergen_small_roi, {"color": "red"}, "Bergen ROI")
Map.addLayer(bergen_large_roi, {"color": "blue"}, "Bergen large ROI")

# Display the weather stations
stations = {
    "Flesland Bergen": (60.292792, 5.222689),
    "Florida": (60.3833, 5.3333)
}

for name, (lat, lon) in stations.items():
    point = ee.Geometry.Point([lon, lat])
    Map.addLayer(point, {"color": "red"}, name)
    
# Add elevation map from Hoydedata National Elevation Project 1m
dsm = ee.Image("projects/sample-project-452812/assets/bergen_dsm_1m_zip").clip(bergen_small_roi)

elevation_vis = {
    'min': 0,
    'max': 800,
    'palette': ['blue', "green", 'brown', 'white']
}

Map.addLayer(dsm, elevation_vis, 'Elevation (DSM 1m)')

# Display the map
Map

{'north': 60.50219617276416, 'south': 60.27780382723584, 'west': 5.10273148294384, 'east': 5.55726851705616}
{'north': 60.50219617276416, 'south': 59.3802344451226, 'west': 4.739101855653983, 'east': 5.920898144346017}
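
The box construction in the cell above can be checked without Earth Engine. The following is a minimal sketch that reuses the same constants; the helper names `bbox_from_center` and `extend_bbox` are hypothetical and not part of the notebook.

```python
import math

DEG_LAT_TO_KM = 111.412          # km per degree of latitude near 60° N (value used above)
EQUATOR_DEG_LON_TO_KM = 111.317  # km per degree of longitude, scaled by cos(latitude) below


def bbox_from_center(lat, lon, half_width_km):
    """Return a lat/lon box reaching half_width_km from the center in each direction."""
    deg_lon_to_km = EQUATOR_DEG_LON_TO_KM * math.cos(math.radians(lat))
    lat_offset = half_width_km / DEG_LAT_TO_KM
    lon_offset = half_width_km / deg_lon_to_km
    return {
        "north": lat + lat_offset,
        "south": lat - lat_offset,
        "west": lon - lon_offset,
        "east": lon + lon_offset,
    }


def extend_bbox(bbox, center_lat, south_km=0.0, west_km=0.0, east_km=0.0):
    """Grow a box by the given distances (km) to the south, west, and east."""
    deg_lon_to_km = EQUATOR_DEG_LON_TO_KM * math.cos(math.radians(center_lat))
    return {
        "north": bbox["north"],
        "south": bbox["south"] - south_km / DEG_LAT_TO_KM,
        "west": bbox["west"] - west_km / deg_lon_to_km,
        "east": bbox["east"] + east_km / deg_lon_to_km,
    }


small = bbox_from_center(60.39, 5.33, 12.5)
large = extend_bbox(small, 60.39, south_km=100, west_km=20, east_km=20)
print(small)
print(large)
```

With a half-width of 12.5 km, the small box is roughly 25 km on each side, and the large box adds 100 km to the south and 20 km to both the west and the east.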



In [None]:
# ------------------------------------
# Preprocess DSM 
# ------------------------------------

# Load original 1 m DSM and clip to ROI
# DSM originally downloaded from https://hoydedata.no/LaserInnsyn2/ and uploaded to Earth Engine as asset
dsm = ee.Image("projects/sample-project-452812/assets/bergen_dsm_1m_zip").clip(bergen_small_roi)

# Reproject DSM to EPSG:4326
dsm_1m = dsm.reproject(crs="EPSG:4326", scale=1)

# Resample to 10m and reproject
dsm_10m = dsm.reproject(crs="EPSG:4326", scale=10)
dsm_10m_mean = (dsm
           .reduceResolution(reducer=ee.Reducer.mean(), maxPixels=1024)
           .reproject(crs="EPSG:4326", scale=10))

# Export 1m DSM to Google Drive
task1 = ee.batch.Export.image.toDrive(
    image=dsm_1m,
    description="DSM_Bergen_1m_WGS84",
    folder="EarthEngineExports",
    fileNamePrefix="bergen_dsm_1m_epsg4326",
    scale=1,
    region=bergen_small_roi,
    crs="EPSG:4326",
    maxPixels=1e13
)

# Export 10m DSM to Google Drive
task2 = ee.batch.Export.image.toDrive(
    image=dsm_10m_mean,  # mean-aggregated 10 m DSM, matching the export name
    description="DSM_Bergen_10m_WGS84_mean",
    folder="EarthEngineExports",
    fileNamePrefix="bergen_dsm_10m_epsg4326_reducer_mean",
    scale=10,
    region=bergen_small_roi,
    crs="EPSG:4326",
    maxPixels=1e13
)

# Start both tasks
task1.start()
task2.start()

print("Both exports started. Check the Earth Engine Tasks tab or Google Drive when finished.")

Both exports started. Check the Earth Engine Tasks tab or Google Drive when finished.
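
Conceptually, `reduceResolution` with a mean reducer averages the fine pixels that fall into each coarse output pixel. The following is a minimal pure-Python sketch of that idea on a toy grid, assuming each output pixel covers an exactly aligned 2 x 2 block. Earth Engine's handling of partially overlapping pixels is not reproduced here. The helper `block_mean` and the height values are hypothetical.

```python
def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2D list of numbers."""
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Collect all fine pixels that belong to this coarse pixel
            block = [
                grid[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Toy 4 x 4 surface heights in metres (invented values)
heights = [
    [10.0, 12.0, 30.0, 30.0],
    [14.0, 16.0, 30.0, 34.0],
    [0.0, 0.0, 5.0, 5.0],
    [0.0, 4.0, 5.0, 9.0],
]

coarse = block_mean(heights, 2)
print(coarse)  # [[13.0, 31.0], [1.0, 6.0]]
```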


In [24]:
# ----------------------------------------------------------------------
# Helper functions for preprocessing and exporting Sentinel-2 images 
# ----------------------------------------------------------------------

# Binary mask for cloud mask
CLD_PRB_THRESH = 40 

# Turn int data to binary using recommended threshold, 
# from s2cloudless paper: https://www.sciencedirect.com/science/article/pii/S0034425722001043?via%3Dihub 
def add_cloud_mask(image):
    cloud_mask = image.select('probability').gt(CLD_PRB_THRESH).rename('cloud_mask')
    # drop band probability immediately for performance reasons
    return image.addBands(cloud_mask).select(['cloud_mask'])

# Count how many unmasked pixels of the image fall inside the given roi
def has_valid_pixels(img, roi):
    # Reduce over ROI, count non-masked pixels
    count = img.reduceRegion(
        reducer=ee.Reducer.count(),
        geometry=roi,
        scale=img.projection().nominalScale(),
        maxPixels=1e9
    ).values().get(0)  # get the first band's count

    # Set property to image
    return img.set('validPixelCount', count)

def add_date(img):
    # Add YYYY-MM-dd as property
    return img.set('date', img.date().format('YYYY-MM-dd'))

# Combine multiple images into one per day (to cover whole Bergen area in one image)
def daily_mosaic(ic):
    # Get unique dates
    dates = ic.aggregate_array('date').distinct()
    
    def make_mosaic(d):
        d = ee.String(d)
        daily_ic = ic.filter(ee.Filter.eq('date', d))
        
        # Get list of acquisition times
        times = daily_ic.aggregate_array('system:time_start')
        min_time = ee.Number(times.reduce(ee.Reducer.min()))
        max_time = ee.Number(times.reduce(ee.Reducer.max()))
        time_range = max_time.subtract(min_time)
        
        mosaic = daily_ic.mosaic()
        
        return (mosaic
                .set('date', d)
                .set('system:time_start', min_time)   # earliest time
                .set('start_time_range', time_range)) # span of times
    
    return ee.ImageCollection(dates.map(make_mosaic))
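
The grouping logic of `daily_mosaic` can be illustrated locally. The sketch below groups acquisition timestamps by UTC date and derives the earliest time and the time span per day, mirroring the `system:time_start` and `start_time_range` properties set above. The helpers `to_ms` and `group_by_day` are hypothetical, and the timestamps are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_ms(year, month, day, hour, minute, second):
    """Build an epoch-millisecond timestamp from a UTC date and time."""
    dt = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def group_by_day(timestamps_ms):
    """Group epoch-millisecond timestamps by UTC date.

    Returns {date: (earliest_ms, span_ms)} with dates in ascending order.
    """
    by_day = defaultdict(list)
    for ts in timestamps_ms:
        day = datetime.fromtimestamp(ts / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
        by_day[day].append(ts)
    return {day: (min(v), max(v) - min(v)) for day, v in sorted(by_day.items())}


# Two tiles acquired 14 s apart on one day, and a single tile on another day
times = [
    to_ms(2015, 7, 29, 11, 5, 20),
    to_ms(2015, 7, 29, 11, 5, 6),
    to_ms(2015, 8, 8, 11, 6, 0),
]
summary = group_by_day(times)
for day, (earliest, span) in summary.items():
    print(day, earliest, span)
```

A span of a few seconds suggests that the tiles in a daily mosaic come from the same overpass, while a much larger span would point to acquisitions from different overpasses being merged.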

In [25]:
# =========================================================
# Export pipeline for Sentinel 2 viewing angles
# =========================================================

# -----------------------------
# Parameters
# -----------------------------
start_date_total = ee.Date("2015-06-01")
end_date_total   = ee.Date("2025-08-31")
batch_months = 6   # adjust for your case
drive_folder = "S2_viewing_angles"

# -----------------------------
# Properties of interest
# -----------------------------
# All 13 Sentinel-2 MSI bands (a superset of the bands used by the s2cloudless algorithm)
bands = ['B1','B2','B3','B4','B5','B6','B7','B8','B8A','B9','B10','B11','B12']
azimuth_keys = [f'MEAN_INCIDENCE_AZIMUTH_ANGLE_{b}' for b in bands]
zenith_keys  = [f'MEAN_INCIDENCE_ZENITH_ANGLE_{b}' for b in bands]
all_keys = azimuth_keys + zenith_keys
all_keys = all_keys + ['system:time_start', 'DATASTRIP_ID', 'DATATAKE_IDENTIFIER', 
                'GRANULE_ID', 'MGRS_TILE', 'PRODUCT_ID', 'SENSING_ORBIT_DIRECTION', 
                'SENSING_ORBIT_NUMBER', 'SPACECRAFT_NAME']  # include timestamp and granule identifiers

# Convert image → feature (table of properties)
def img_to_feature(img):
    props = img.toDictionary(all_keys)
    return ee.Feature(None, props)

# Export FeatureCollection to Drive
def export_table(fc, description):
    task = ee.batch.Export.table.toDrive(
        collection=fc,
        description=description,
        folder=drive_folder,
        fileFormat='CSV'
    )
    task.start()
    print(f"Export started: {description}")

# -----------------------------
# Batch processing and export
# -----------------------------
current_start = start_date_total

while current_start.millis().getInfo() < end_date_total.millis().getInfo():
    # Define batch end date
    current_end = current_start.advance(batch_months, 'month')
    if current_end.millis().getInfo() > end_date_total.millis().getInfo():
        current_end = end_date_total

    print(f"Processing batch: {current_start.format('YYYY-MM-dd').getInfo()} "
          f"to {current_end.format('YYYY-MM-dd').getInfo()}")

    # Load S2 harmonized collection for large ROI
    s2_harm_large_roi = (
        ee.ImageCollection("COPERNICUS/S2_HARMONIZED")
        .filterDate(current_start, current_end)
        .filterBounds(bergen_large_roi)
        .select("B1")  # select only one band for memory efficiency
    )

    # Convert to feature collection
    fc = s2_harm_large_roi.map(img_to_feature)

    # Export batch table
    export_table(fc, f"S2_viewing_angles_large_{current_start.format('YYYY-MM').getInfo()}")

    # Move to next batch
    current_start = current_end


Processing batch: 2015-06-01 to 2015-12-01
Export started: S2_viewing_angles_large_2015-06
Processing batch: 2015-12-01 to 2016-06-01
Export started: S2_viewing_angles_large_2015-12
Processing batch: 2016-06-01 to 2016-12-01
Export started: S2_viewing_angles_large_2016-06
Processing batch: 2016-12-01 to 2017-06-01
Export started: S2_viewing_angles_large_2016-12
Processing batch: 2017-06-01 to 2017-12-01
Export started: S2_viewing_angles_large_2017-06
Processing batch: 2017-12-01 to 2018-06-01
Export started: S2_viewing_angles_large_2017-12
Processing batch: 2018-06-01 to 2018-12-01
Export started: S2_viewing_angles_large_2018-06
Processing batch: 2018-12-01 to 2019-06-01
Export started: S2_viewing_angles_large_2018-12
Processing batch: 2019-06-01 to 2019-12-01
Export started: S2_viewing_angles_large_2019-06
Processing batch: 2019-12-01 to 2020-06-01
Export started: S2_viewing_angles_large_2019-12
Processing batch: 2020-06-01 to 2020-12-01
Export started: S2_viewing_angles_large_2020-06
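
The batch loop above calls `getInfo()` several times per iteration, and each call is a blocking request to the Earth Engine servers. One alternative is to compute the batch windows on the client with the standard library and pass the resulting ISO date strings to `ee.Date`. The sketch below assumes batches start on the first day of a month; for other start days its day clamping may differ from `ee.Date.advance`. The helpers `add_months` and `batch_windows` are hypothetical.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def batch_windows(start, end, batch_months):
    """Return (window_start, window_end) pairs that cover the period from start to end."""
    windows = []
    current = start
    while current < end:
        # Clip the last window to the overall end date
        window_end = min(add_months(current, batch_months), end)
        windows.append((current, window_end))
        current = window_end
    return windows


windows = batch_windows(date(2015, 6, 1), date(2025, 8, 31), 6)
print(len(windows))
print(windows[0][0].isoformat(), windows[0][1].isoformat())
print(windows[-1][0].isoformat(), windows[-1][1].isoformat())
```

Each pair can then be used as `ee.Date(window_start.isoformat())` and `ee.Date(window_end.isoformat())` inside the export loop, so no server round trip is needed for the loop control itself.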

In [26]:
# =====================================================================
# Export pipeline for cloud cover information large and small roi
# =====================================================================

# -----------------------------
# Parameters
# -----------------------------
start_date_total = ee.Date("2015-06-01")
end_date_total   = ee.Date("2025-08-31")
batch_months = 3   # small batch size enables faster processing 
drive_folder = "S2_cloud_cover_tables_thresh_40"

# -----------------------------
# Helper functions
# -----------------------------

def add_constant(image): 
    valid_pixel = image.select('cloud_mask').gte(0).rename('valid_pixel')
    return image.addBands(valid_pixel)

# Add cloud cover to both large and small roi 
def add_cloud_cover(img, roi):
    # Assume cloud_mask is a binary mask: 1 = cloud, 0 = clear
    cloud_mask = img.select('cloud_mask').eq(1)

    # Count total pixels 
    total_pixels = img.select('valid_pixel').reduceRegion(
        reducer=ee.Reducer.sum(),
        geometry=roi,
        scale=10,
        maxPixels=1e10
    ).get('valid_pixel')

    # Count cloudy pixels (value = 1)
    cloudy_pixels = cloud_mask.reduceRegion(
        reducer=ee.Reducer.sum(),
        geometry=roi,
        scale=10,
        maxPixels=1e10
    ).get('cloud_mask')  
    
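    # Calculate cloud cover in %. No zero-division guard is applied here; the pipeline
    # relies on images without valid pixels in the ROI having been filtered out beforehand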
    cloud_cover = ee.Number(cloudy_pixels).divide(ee.Number(total_pixels)).multiply(100)
        
    # Attach as image property
    return img.set({'cloud_cover': cloud_cover.round(), # round to nearest integer
                    'cloudy_pixels': cloudy_pixels,
                    'total_pixels': total_pixels}) 

# Convert image to feature
def img_to_feature(img): 
    return ee.Feature(None, {'cloud_cover': img.get('cloud_cover'), 
                             'cloudy_pixels': img.get('cloudy_pixels'),
                             'total_pixels': img.get('total_pixels'),
                             'system:time_start' : img.get('system:time_start'),
                             'date': img.get('date'),
                             'start_time_range': img.get('start_time_range')})
    
# Export to Drive
def export_table(fc, description):
    task = ee.batch.Export.table.toDrive(
        collection=fc,
        description=description,
        folder=drive_folder,
        fileFormat='CSV'
    )
    task.start()
    print(f"Export started: {description}")

# -----------------------------
# Batch processing and export
# -----------------------------
current_start = start_date_total

while current_start.millis().getInfo() < end_date_total.millis().getInfo():
    # Define batch end date
    current_end = current_start.advance(batch_months, 'month')
    if current_end.millis().getInfo() > end_date_total.millis().getInfo():
        current_end = end_date_total

    print(f"Processing batch: {current_start.format('YYYY-MM-dd').getInfo()} to {current_end.format('YYYY-MM-dd').getInfo()}")

    # ----------------- Small ROI -----------------
    s2_cloud_small_roi = (ee.ImageCollection("COPERNICUS/S2_CLOUD_PROBABILITY")
        .filterDate(current_start, current_end)
        .filterBounds(bergen_small_roi)
        .map(add_cloud_mask)
        .map(lambda img: has_valid_pixels(img, bergen_small_roi))
        .filter(ee.Filter.gt('validPixelCount', 0))
        .map(add_date)
    )
    
    # Mosaic tiles into one image per day 
    s2_cloud_small_roi = daily_mosaic(s2_cloud_small_roi) \
                        .map(add_constant) \
                        .map(lambda img: add_cloud_cover(img, bergen_small_roi))
    

    fc_small = s2_cloud_small_roi.map(img_to_feature)
    export_table(fc_small, f"S2_cloud_cover_small_thresh_40_{current_start.format('YYYY-MM').getInfo()}")

    # ----------------- Large ROI -----------------
    s2_cloud_large_roi = (ee.ImageCollection("COPERNICUS/S2_CLOUD_PROBABILITY")
        .filterDate(current_start, current_end)
        .filterBounds(bergen_large_roi)
        .map(add_cloud_mask)
        .map(lambda img: has_valid_pixels(img, bergen_large_roi))
        .filter(ee.Filter.gt('validPixelCount', 0))
        .map(add_date)
    )
    
    # Mosaic tiles into one image per day 
    s2_cloud_large_roi = daily_mosaic(s2_cloud_large_roi) \
                        .map(add_constant) \
                        .map(lambda img: add_cloud_cover(img, bergen_large_roi))

    fc_large = s2_cloud_large_roi.map(img_to_feature)
    export_table(fc_large, f"S2_cloud_cover_large_thresh_40_{current_start.format('YYYY-MM').getInfo()}")

    # Move to next batch
    current_start = current_end

Processing batch: 2015-06-01 to 2015-09-01
Export started: S2_cloud_cover_small_thresh_40_2015-06
Export started: S2_cloud_cover_large_thresh_40_2015-06
Processing batch: 2015-09-01 to 2015-12-01
Export started: S2_cloud_cover_small_thresh_40_2015-09
Export started: S2_cloud_cover_large_thresh_40_2015-09
Processing batch: 2015-12-01 to 2016-03-01
Export started: S2_cloud_cover_small_thresh_40_2015-12
Export started: S2_cloud_cover_large_thresh_40_2015-12
Processing batch: 2016-03-01 to 2016-06-01
Export started: S2_cloud_cover_small_thresh_40_2016-03
Export started: S2_cloud_cover_large_thresh_40_2016-03
Processing batch: 2016-06-01 to 2016-09-01
Export started: S2_cloud_cover_small_thresh_40_2016-06
Export started: S2_cloud_cover_large_thresh_40_2016-06
Processing batch: 2016-09-01 to 2016-12-01
Export started: S2_cloud_cover_small_thresh_40_2016-09
Export started: S2_cloud_cover_large_thresh_40_2016-09
Processing batch: 2016-12-01 to 2017-03-01
Export started: S2_cloud_cover_small_th
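
The cloud cover computation can be illustrated on a toy grid outside Earth Engine. This minimal sketch applies the same strict threshold as `add_cloud_mask` (probability greater than 40) and the same ratio as `add_cloud_cover`, with `None` standing in for masked pixels. The function `cloud_cover_percent` and the probability values are hypothetical.

```python
def cloud_cover_percent(probabilities, threshold=40):
    """Return (cloud cover in %, cloudy pixel count, valid pixel count).

    `probabilities` is a 2D list of cloud probabilities (0-100). None marks a
    masked pixel, which is excluded from both counts.
    """
    valid = [p for row in probabilities for p in row if p is not None]
    if not valid:
        return None, 0, 0  # no valid pixels: cloud cover is undefined
    cloudy = sum(1 for p in valid if p > threshold)
    return round(100 * cloudy / len(valid)), cloudy, len(valid)


# Toy tile with three masked pixels (invented values)
tile = [
    [5, 40, 41, None],
    [90, 10, None, None],
]

result = cloud_cover_percent(tile)
print(result)  # (40, 2, 5)
```

A probability of exactly 40 counts as clear because the comparison is strict, and the explicit check for an empty `valid` list shows one way to handle the zero-division case directly.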

In [None]:
# =============================================================================
# Pipeline for exporting large roi cloud mask images for shadow computation 
# =============================================================================

# -----------------------------
# Parameters
# -----------------------------
CLD_PRB_THRESH = 40
mixed_thresh = 1
overcast_thresh = 99
drive_folder = "S2_cloud_mask_daily_large_ROI_thresh_40"
cloud_cover_filepath = "../../data/processed/s2_cloud_cover_table_small_and_large_with_stations_data.csv"

# Read cloud cover table and extract observations with mixed sky type 
cloud_cover_table = pd.read_csv(cloud_cover_filepath)
cloud_cover_mixed = cloud_cover_table[
    (cloud_cover_table["cloud_cover_large"] > mixed_thresh) &
    (cloud_cover_table["cloud_cover_large"] < overcast_thresh)
]

# Unique dates for mixed sky days
unique_dates = cloud_cover_mixed['date'].unique()

# -----------------------------
# Helper functions
# -----------------------------
def export_image(image, description):
    task = ee.batch.Export.image.toDrive(
        image=image,
        description=description,
        folder=drive_folder,
        fileNamePrefix=description,
        region=bergen_large_roi,
        scale=10,
        maxPixels=1e13
    )
    task.start()

# -----------------------------
# Export loop for unique dates
# -----------------------------
for date_str in unique_dates:
    print(f"Processing date: {date_str}")

    start_date = ee.Date(date_str)
    end_date = start_date.advance(1, 'day')

    # Load image collection for this date
    s2_cloud_ic = (ee.ImageCollection("COPERNICUS/S2_CLOUD_PROBABILITY")
                   .filterDate(start_date, end_date) # end date is exclusive
                   .filterBounds(bergen_large_roi)
                   .map(add_cloud_mask)
                )

    # Check if there is any valid image
    n_images = s2_cloud_ic.size().getInfo()
    if n_images == 0:
        print(f"⚠️ No valid image for {date_str}, skipping...")
        continue

    # Create daily mosaic
    daily_image = s2_cloud_ic.mosaic().set('date', date_str)

    # Export
    description = f"S2_cloud_mask_large_{date_str}"
    export_image(daily_image, description)
    print(f"Exporting {description}...")


Processing date: 2015-07-29
Exporting S2_cloud_mask_large_2015-07-29...
Processing date: 2015-08-08
Exporting S2_cloud_mask_large_2015-08-08...
Processing date: 2015-08-21
Exporting S2_cloud_mask_large_2015-08-21...
Processing date: 2015-08-28
Exporting S2_cloud_mask_large_2015-08-28...
Processing date: 2015-08-31
Exporting S2_cloud_mask_large_2015-08-31...
Processing date: 2015-09-17
Exporting S2_cloud_mask_large_2015-09-17...
Processing date: 2015-10-20
Exporting S2_cloud_mask_large_2015-10-20...
Processing date: 2015-11-16
Exporting S2_cloud_mask_large_2015-11-16...
Processing date: 2016-02-04
Exporting S2_cloud_mask_large_2016-02-04...
Processing date: 2016-02-14
Exporting S2_cloud_mask_large_2016-02-14...
Processing date: 2016-03-05
Exporting S2_cloud_mask_large_2016-03-05...
Processing date: 2016-03-25
Exporting S2_cloud_mask_large_2016-03-25...
Processing date: 2016-04-27
Exporting S2_cloud_mask_large_2016-04-27...
Processing date: 2016-05-07
Exporting S2_cloud_mask_large_2016-0
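
The selection of mixed-sky days uses strict inequalities, so days with a large-ROI cloud cover of exactly 1 % or 99 % are excluded. The following sketch shows this on plain dictionaries instead of a pandas DataFrame. The function `mixed_sky_dates` and the rows are hypothetical toy values, and the result is sorted, whereas `unique()` in the cell above keeps the order of first appearance.

```python
MIXED_THRESH = 1      # lower bound (exclusive), as in the cell above
OVERCAST_THRESH = 99  # upper bound (exclusive), as in the cell above


def mixed_sky_dates(rows, low=MIXED_THRESH, high=OVERCAST_THRESH):
    """Return sorted unique dates whose large-ROI cloud cover lies strictly between low and high."""
    return sorted({r["date"] for r in rows if low < r["cloud_cover_large"] < high})


# Invented toy rows for illustration only
rows = [
    {"date": "2015-07-29", "cloud_cover_large": 57},
    {"date": "2015-07-30", "cloud_cover_large": 1},   # at the lower bound: excluded
    {"date": "2015-08-01", "cloud_cover_large": 99},  # at the upper bound: excluded
    {"date": "2015-08-08", "cloud_cover_large": 12},
    {"date": "2015-08-08", "cloud_cover_large": 12},  # duplicate date: kept once
]

selected = mixed_sky_dates(rows)
print(selected)  # ['2015-07-29', '2015-08-08']
```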