## Training Materials Outline: Downloading and Visualizing Satellite Data

This outline covers downloading and visualizing Sentinel-1 and Sentinel-2 data using the Descartes Labs platform and the Copernicus Data Space Ecosystem (CDSE) catalogue.

**I. Introduction**

* Overview of satellite data and its applications.
* Introduction to Descartes Labs and CDSE platforms.
* Explanation of the core workflow for downloading and visualizing satellite imagery.

**II. Descartes Labs Platform: Downloading and Visualizing Sentinel Data**

* **A. Setting up the Environment**
    * Installing necessary libraries: `descarteslabs`, `shapely`, `cartopy`, `geojson`, `fiona`, `gdal`, `h5py`, `matplotlib`, `numpy`, etc.
    * Setting API key for Descartes Labs.

In [None]:
import warnings
warnings.filterwarnings('ignore')
#
import os
import sys
import json
import itertools
import pickle
from pprint import pprint
#
import numpy as np
import shapely
import cartopy
import geojson
import fiona
from osgeo import gdal, ogr  # GDAL/OGR Python bindings are provided by the osgeo package
import h5py
import matplotlib.pyplot as plt
import math
from scipy import stats
import collections

import descarteslabs as dl

ULU_REPO = os.environ["ULU_REPO"]
sys.path.append(ULU_REPO+'/utils')
sys.path.append(ULU_REPO)
print(sys.path)

import util_vectors
import util_descartes
import util_imagery

* **B. Defining the Study Area**
    * Specifying the location of interest (e.g., 'sitapur').
    * Loading the study area shapefile.
    * Visualizing the study area using `cartopy` and `matplotlib`.
    * Defining data storage paths.

In [None]:
place = 'sitapur'

data_root='/data/phase_iv/'
data_path=data_root+place+'/'

tile_resolution = 5
tile_size = 256
tile_pad = 32

processing_level = None
source = 's2'

bands=['blue','green','red','nir','swir1','swir2','alpha']; suffix='BGRNS1S2A'  # S2, Lx
resolution=tile_resolution  # Lx:15 S2:10

In [None]:
print(place, place.title()) # capitalized version of place name
place_title = place.title()
place_shapefile = data_path+place_title+"_studyAreaEPSG4326.shp"

util_vectors.info_studyareas(data_path, place)

shape = util_vectors.load_shape(place_shapefile)
polygon = shape['geometry']['coordinates']
place_bbox = shape['bbox']

lonlat_crs = cartopy.crs.PlateCarree()
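# bbox is ordered (min_lon, min_lat, max_lon, max_lat), so indices 0 and 2 are longitudes
clon, clat = (place_bbox[0]+place_bbox[2])/2.0, (place_bbox[1]+place_bbox[3])/2.0
print("center coordinates (lat, lon)", clat, clon)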
albers = cartopy.crs.AlbersEqualArea(central_latitude=clat, central_longitude=clon)

fig = plt.figure(figsize=(6,6))
ax = plt.subplot(projection=albers) # Specify projection of the map here
shp = shapely.geometry.shape(shape['geometry'])
ax.add_geometries([shp], lonlat_crs)
ax.set_extent((place_bbox[0], place_bbox[2], place_bbox[1], place_bbox[3]), crs=lonlat_crs)
ax.gridlines(crs=lonlat_crs)
plt.show()
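
The centre of the study area depends on the ordering of the bounding box. A minimal sketch, assuming the box follows the GeoJSON convention `(min_lon, min_lat, max_lon, max_lat)`; `bbox_center` is a hypothetical helper (not part of `util_vectors`) and the coordinates are illustrative values, not the real study-area extent:

```python
def bbox_center(bbox):
    """Return (lon, lat) of the centre of a (min_lon, min_lat, max_lon, max_lat) box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (min_lon + max_lon) / 2.0, (min_lat + max_lat) / 2.0


# Illustrative values only, not the real study-area bounding box
example_bbox = (80.5, 27.4, 80.9, 27.7)
center_lon, center_lat = bbox_center(example_bbox)
print("center lon/lat:", center_lon, center_lat)
```

Returning the pair in `(lon, lat)` order and unpacking it into clearly named variables makes it harder to pass a longitude as `central_latitude` by accident.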

* **C. Generating Tiles**
    * Utilizing `dl.raster.dltiles_from_shape` to divide the study area into tiles.
    * Visualizing the tiles overlaid on the study area.

In [None]:
tiles = dl.raster.dltiles_from_shape(tile_resolution, tile_size, tile_pad, shape)
single_tile_id = 22
highlights = {single_tile_id:'green'}
util_vectors.draw_tiled_area(shape, tiles, albers, lonlat_crs, highlights=highlights)
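
To get a feel for what the three tiling parameters mean on the ground, the arithmetic can be sketched as follows. This assumes that `tile_pad` pixels are added on every side of a `tile_size` x `tile_size` core; `tile_footprint` is a hypothetical helper, not a Descartes Labs function:

```python
def tile_footprint(resolution_m, tile_size_px, pad_px):
    """Return pixel and metre dimensions (per side) of the core and the padded tile."""
    padded_px = tile_size_px + 2 * pad_px  # assumption: padding is applied on every side
    return {
        "core_px": tile_size_px,
        "core_m": tile_size_px * resolution_m,
        "padded_px": padded_px,
        "padded_m": padded_px * resolution_m,
    }


# Values used in this outline: 5 m pixels, 256-pixel tiles, 32-pixel padding
footprint = tile_footprint(resolution_m=5, tile_size_px=256, pad_px=32)
print(footprint)
```

Under that assumption each tile covers 1280 m per side, and the downloaded raster, including padding, is 320 pixels (1600 m) per side.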

* **D. Searching for Imagery**
    * Specifying the satellite product (e.g., 'sentinel-2:L1C').
    * Defining the time range for the search.
    * Setting cloud cover limits.
    * Using `dl.metadata.search` to query for available imagery.
    * Inspecting metadata of returned images.

In [None]:
product = u'sentinel-2:L1C'
satellite='S2A'

feature_collection = dl.metadata.search(product=[product], start_time='2019-01-01', end_time='2019-11-01', 
                                        cloud_fraction_0=0.5, limit=75, geom=shape['geometry'])
s2_ids = [f['id'] for f in feature_collection['features']]
s2_ids.sort()
print (len(s2_ids), s2_ids)

* **E. Previewing Imagery**
    * Displaying image extents on a map using `dl.metadata.get` and `shapely`.
    * Visualizing RGB and other band combinations using `util_descartes.show_scene`.

In [None]:
s2_imgs = s2_ids[55:56]

fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(projection=albers) # Specify projection of the map here

shapes = []
for s2_img in s2_imgs:
    metadata = dl.metadata.get(s2_img)
    shapes.append(shapely.geometry.shape(metadata['geometry']))

ax.add_geometries(shapes, lonlat_crs, alpha=0.3, color='orange')
ax.add_geometries([shapely.geometry.shape(shape['geometry'])],
                   lonlat_crs, alpha=0.5, color='blue')

union = shapely.geometry.MultiPolygon(polygons=shapes)
bbox = union.bounds
ax.set_extent((bbox[0], bbox[2], bbox[1], bbox[3]),crs=lonlat_crs)
ax.gridlines(crs=lonlat_crs)

plt.show()

In [None]:
util_descartes.show_scene(s2_imgs[:],bands=['red','green','blue','alpha'],scales=[[0,3000],[0,3000],[0,3000],], geom=shape['geometry'], resolution=80)

* **F. Downloading Imagery**
    * Using specific image IDs or search results.
    * Using `util_imagery.download_imagery` to download tiled imagery.
    * Specifying bands to download.

In [None]:
s2_dict = {}
s2_dict['ZZ'] = [u'sentinel-2:L1C:2019-06-18_44RMR_83_S2B_v1']

util_imagery.download_imagery(data_root, place, 's2', bands, shape, tiles, s2_dict, processing_level=processing_level)
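
Scene IDs like the one above pack several pieces of information into one string. The following sketch splits such an ID into its parts. The interpretation of the fields (date, MGRS tile, satellite) is an assumption based on how the example ID looks, and `parse_scene_id` is a hypothetical helper; fields whose meaning the source does not state are returned untouched:

```python
from datetime import date


def parse_scene_id(scene_id):
    """Split an ID such as 'sentinel-2:L1C:2019-06-18_44RMR_83_S2B_v1' into labelled parts."""
    product, _, scene = scene_id.rpartition(":")
    fields = scene.split("_")
    return {
        "product": product,
        "date": date.fromisoformat(fields[0]),
        "mgrs_tile": fields[1],                 # assumption: second field is the MGRS tile
        "platform": fields[3],                  # assumption: fourth field is the satellite
        "remaining": [fields[2]] + fields[4:],  # meaning not stated in the source
    }


info = parse_scene_id("sentinel-2:L1C:2019-06-18_44RMR_83_S2B_v1")
print(info["product"], info["date"], info["mgrs_tile"], info["platform"])
```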

* **G. Post-processing (Optional)**
    * Reprojecting downloaded tiles to UTM using `gdalwarp`.
    * Merging tiles into a mosaic using `gdal_merge.py`.

In [None]:
image_suffix = 'ZZ'

pad   = int(tiles['features'][0]['properties']['pad'])
if resolution==10: 
    zfill=3
elif resolution==5:
    zfill=4
elif resolution==2:
    zfill=5    
else:
    raise Exception('bad resolution: '+str(resolution))
for tile_id in range(len(tiles['features'])):
    # The '+' before the line continuation joins the directory and the file name
    path = data_root+place+'/imagery/'+str(processing_level).lower()+'/'+\
        place+'_'+source+'_'+image_suffix+'_'+str(resolution)+'m'+'_'+'p'+str(pad)+'_'+'tile'+str(tile_id).zfill(zfill)+'.tif'
    os.environ['ZRESULTTILE'] = path
    gdal_info_output = !gdalinfo -proj4 $ZRESULTTILE
    for line in gdal_info_output:
        if "PROJCS" in line:
            print(line)

In [None]:
index_start = 0
index_stop  = 6

In [None]:
for t in range(index_start, index_stop):
    # Index the paths with the loop variable t (tile_id would be a stale value from an earlier cell)
    os.environ['ZTILESOURCE'] = data_root+place+'/imagery/'+str(processing_level).lower()+'/'+\
        place+'_'+source+'_'+image_suffix+'_'+str(resolution)+'m'+'_'+'p'+str(pad)+'_'+'tile'+str(t).zfill(zfill)+'.tif'
    os.environ['ZTILERESULT'] = data_root+place+'/imagery/'+str(processing_level).lower()+'/'+\
        place+'_'+source+'_'+image_suffix+'_'+str(resolution)+'m'+'_'+'p'+str(pad)+'_'+'tile'+str(t).zfill(zfill)+'_warp.tif'
    !gdalwarp -t_srs '+proj=utm +zone=44 +datum=WGS84 +units=m +no_defs ' $ZTILESOURCE $ZTILERESULT
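
The `gdalwarp` call hard-codes UTM zone 44. For other study areas the zone can be derived from the longitude. A minimal sketch using the standard 6-degree zone rule (it ignores the special zones around Norway and Svalbard); `utm_zone` and `utm_proj4` are hypothetical helpers and the coordinates are illustrative:

```python
import math


def utm_zone(lon_deg):
    """UTM zone number (1-60) for a longitude in degrees."""
    zone = int(math.floor((lon_deg + 180.0) / 6.0)) + 1
    return min(max(zone, 1), 60)  # lon == 180 would otherwise give 61


def utm_proj4(lon_deg, lat_deg):
    """PROJ string in the same form as the one passed to gdalwarp above."""
    proj = f"+proj=utm +zone={utm_zone(lon_deg)} +datum=WGS84 +units=m +no_defs"
    if lat_deg < 0:
        proj += " +south"  # southern-hemisphere zones need the +south flag
    return proj


# Illustrative coordinates in northern India
print(utm_proj4(80.7, 27.55))
```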

In [None]:
for t in range(index_start, index_stop):
    # Replace each original tile with its reprojected version; paths are indexed by t
    os.environ['ZTILESOURCE'] = data_root+place+'/imagery/'+str(processing_level).lower()+'/'+\
        place+'_'+source+'_'+image_suffix+'_'+str(resolution)+'m'+'_'+'p'+str(pad)+'_'+'tile'+str(t).zfill(zfill)+'_warp.tif'
    os.environ['ZTILERESULT'] = data_root+place+'/imagery/'+str(processing_level).lower()+'/'+\
        place+'_'+source+'_'+image_suffix+'_'+str(resolution)+'m'+'_'+'p'+str(pad)+'_'+'tile'+str(t).zfill(zfill)+'.tif'
    !mv $ZTILESOURCE $ZTILERESULT

In [None]:
qmarks = '?????'[0:zfill]  # one '?' wildcard per digit of the tile number
path_template = data_root+place+'/imagery/'+str(processing_level).lower()+'/'+\
    place+'_'+source+'_'+image_suffix+'_'+str(resolution)+'m'+'_'+'p'+str(pad)+'_'+'tile'+qmarks+'.tif'
path_destination = data_root+place+'/imagery/'+str(processing_level).lower()+'/'+\
    place+'_'+source+'_'+image_suffix+'_'+str(resolution)+'m'+'_'+'p'+str(pad)+'_'+'complete'+'.tif'
!gdal_merge.py -n 255 -a_nodata 255 -o {path_destination} {path_template}
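
The same tile path is assembled by string concatenation in several cells, which makes small mistakes easy to introduce. One possible refactoring is a single helper that follows the naming scheme used above; `tile_path` and `ZFILL_BY_RESOLUTION` are hypothetical names, not part of the `util_*` modules:

```python
import os

# Zero-padding width per tile resolution, mirroring the if/elif chain above
ZFILL_BY_RESOLUTION = {10: 3, 5: 4, 2: 5}


def tile_path(data_root, place, source, image_suffix, resolution, pad, tile_id,
              processing_level=None, warped=False):
    """Build the GeoTIFF path for one tile using the naming scheme of this outline."""
    if resolution not in ZFILL_BY_RESOLUTION:
        raise ValueError(f"bad resolution: {resolution}")
    tile_number = str(tile_id).zfill(ZFILL_BY_RESOLUTION[resolution])
    name = f"{place}_{source}_{image_suffix}_{resolution}m_p{pad}_tile{tile_number}"
    if warped:
        name += "_warp"
    return os.path.join(data_root, place, "imagery",
                        str(processing_level).lower(), name + ".tif")


example_path = tile_path("/data/phase_iv/", "sitapur", "s2", "ZZ", 5, 32, 22)
print(example_path)
```

With such a helper the reprojection loop would only need `tile_path(..., tile_id=t)` and `tile_path(..., tile_id=t, warped=True)`.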

**III. Copernicus Data Space Ecosystem (CDSE) Catalogue: Downloading Sentinel Data**

* **A. Setting up the Environment**
    * Importing the required modules: the third-party packages `requests`, `pandas`, `numpy`, `rasterio`, and `matplotlib` must be installed; `json`, `xml.etree.ElementTree`, `os`, `re`, `sys`, `random`, and `pathlib` ship with Python.
    * Setting CDSE username and password as environment variables.

In [None]:
import requests
import json
import xml.etree.ElementTree as ET
import os
import re
import sys
import random
import pandas as pd
import numpy as np
import rasterio
import matplotlib.pyplot as plt
import matplotlib.image
from rasterio.windows import Window
from pathlib import Path

* **B. Querying the Catalogue**
    * Constructing the OData query URL with search parameters (collection, product type, cloud cover, area of interest, time range).
    * Sending the query using `requests.get` and parsing the JSON response.
    * Refining search results based on cloud cover.

In [None]:
catalogue_odata_url = "https://catalogue.dataspace.copernicus.eu/odata/v1"
collection_name = "SENTINEL-2"
product_type = "S2MSI1C"
max_cloud_cover = 1
aoi = "POLYGON((20.888443 52.169721,21.124649 52.169721,21.124649 52.271099,20.888443 52.271099,20.888443 52.169721))"
search_period_start = "2023-06-01T00:00:00.000Z"
search_period_end = "2023-06-10T00:00:00.000Z"

In [None]:
search_query = f"{catalogue_odata_url}/Products?$filter=Collection/Name eq '{collection_name}' and Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' and att/OData.CSC.StringAttribute/Value eq '{product_type}') and OData.CSC.Intersects(area=geography'SRID=4326;{aoi}') and ContentDate/Start gt {search_period_start} and ContentDate/Start lt {search_period_end}"
print(f"""\n{search_query.replace(' ', "%20")}\n""")

In [None]:
response = requests.get(search_query).json()
result = pd.DataFrame.from_dict(response["value"])
result.head(3)

In [None]:
search_query = f"{search_query} and Attributes/OData.CSC.DoubleAttribute/any(att:att/Name eq 'cloudCover' and att/OData.CSC.DoubleAttribute/Value le {max_cloud_cover})"
print(f"""\n{search_query.replace(' ', "%20")}\n""")

response = requests.get(search_query).json()
result = pd.DataFrame.from_dict(response["value"])
result.head(3)
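
The long `$filter` expression is easier to read and extend when it is assembled from individual clauses. A sketch under the assumption that the clause syntax is exactly the one used in the cells above; `build_filter` is a hypothetical helper and performs no network request:

```python
from urllib.parse import quote


def build_filter(collection, product_type, aoi_wkt, start, end, max_cloud=None):
    """Join the individual OData clauses with ' and ' into one $filter string."""
    clauses = [
        f"Collection/Name eq '{collection}'",
        "Attributes/OData.CSC.StringAttribute/any(att:att/Name eq 'productType' "
        f"and att/OData.CSC.StringAttribute/Value eq '{product_type}')",
        f"OData.CSC.Intersects(area=geography'SRID=4326;{aoi_wkt}')",
        f"ContentDate/Start gt {start}",
        f"ContentDate/Start lt {end}",
    ]
    if max_cloud is not None:
        clauses.append(
            "Attributes/OData.CSC.DoubleAttribute/any(att:att/Name eq 'cloudCover' "
            f"and att/OData.CSC.DoubleAttribute/Value le {max_cloud})"
        )
    return " and ".join(clauses)


odata_filter = build_filter(
    "SENTINEL-2", "S2MSI1C",
    "POLYGON((20.888443 52.169721,21.124649 52.169721,21.124649 52.271099,"
    "20.888443 52.271099,20.888443 52.169721))",
    "2023-06-01T00:00:00.000Z", "2023-06-10T00:00:00.000Z", max_cloud=1,
)
encoded = quote(odata_filter)  # percent-encoded form, e.g. for printing a clickable URL
print(odata_filter)
```

The resulting string can be appended after `Products?$filter=` in the same way as in the query cells above.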

* **C. Authentication and Download**
    * Obtaining an authentication token using CDSE credentials.
    * Establishing an authenticated session using `requests.Session`.
    * Retrieving the product manifest (MTD_MSIL1C.xml) using the authenticated session.
    * Parsing the manifest file to locate band files.
    * Downloading individual band files (.jp2) using the authenticated session.

In [None]:
username = os.environ["CDSE_USERNAME"]
password = os.environ["CDSE_PASSWORD"]
auth_server_url = "https://identity.dataspace.copernicus.eu/auth/realms/CDSE/protocol/openid-connect/token"
data = {
    "client_id": "cdse-public",
    "grant_type": "password",
    "username": username,
    "password": password,
}
response = requests.post(auth_server_url, data=data, verify=True, allow_redirects=False)
access_token = json.loads(response.text)["access_token"]
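
If the credentials are wrong, the token response contains no `access_token` and the cell above fails with a bare `KeyError`. A small sketch of more explicit handling, assuming the server returns OAuth 2.0 style error bodies with `error` and `error_description` fields; `extract_access_token` is a hypothetical helper and the payloads below are made up so that the example runs without network access:

```python
def extract_access_token(status_code, payload):
    """Return the access token from a token response, or raise with a readable message."""
    if status_code != 200:
        # OAuth 2.0 error bodies typically carry 'error' and 'error_description'
        detail = payload.get("error_description") or payload.get("error") or "unknown error"
        raise RuntimeError(f"Token request failed ({status_code}): {detail}")
    if "access_token" not in payload:
        raise RuntimeError("Token response did not contain 'access_token'")
    return payload["access_token"]


# Simulated responses with made-up values
token = extract_access_token(200, {"access_token": "abc123", "expires_in": 600})
print("token received:", bool(token))

try:
    extract_access_token(401, {"error": "invalid_grant",
                               "error_description": "Invalid user credentials"})
except RuntimeError as exc:
    error_message = str(exc)
    print(error_message)
```

In the notebook this would be called as `extract_access_token(response.status_code, response.json())`.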

In [None]:
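# Select by column name rather than position so a change in column order cannot break the lookup
product_identifier = result.iloc[0]["Id"]
product_name = result.iloc[0]["Name"]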
session = requests.Session()
session.headers["Authorization"] = f"Bearer {access_token}"

In [None]:
url = f"{catalogue_odata_url}/Products({product_identifier})/Nodes({product_name})/Nodes('MTD_MSIL1C.xml')/$value"
response = session.get(url, allow_redirects=False)
# Follow redirects manually so the session's Authorization header is sent on every hop
while response.status_code in (301, 302, 303, 307):
    url = response.headers["Location"]
    response = session.get(url, allow_redirects=False)
response.raise_for_status()
# The last response already holds the file; no second request with verify=False is needed
outfile = Path.home() / "MTD_MSIL1C.xml"
outfile.write_bytes(response.content)

In [None]:
tree = ET.parse(str(outfile))
root = tree.getroot()
band_location = []
band_location.append(f"{product_name}/{root[0][0][12][0][0][1].text}.jp2".split("/"))
band_location.append(f"{product_name}/{root[0][0][12][0][0][2].text}.jp2".split("/"))
band_location.append(f"{product_name}/{root[0][0][12][0][0][3].text}.jp2".split("/"))
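
Positional indexing such as `root[0][0][12][0][0][1]` breaks as soon as the manifest layout shifts. A sketch of a lookup by element name instead, assuming the band paths are stored in `IMAGE_FILE` elements without an XML namespace and end in `_B02`, `_B03`, `_B04`. The XML below is a heavily simplified, made-up stand-in for `MTD_MSIL1C.xml`, and `find_band_files` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Simplified, made-up stand-in for the real manifest
SAMPLE_MANIFEST = """<Product>
  <Granule_List>
    <Granule>
      <IMAGE_FILE>GRANULE/L1C_EXAMPLE/IMG_DATA/T00XXX_20230601T000000_B01</IMAGE_FILE>
      <IMAGE_FILE>GRANULE/L1C_EXAMPLE/IMG_DATA/T00XXX_20230601T000000_B02</IMAGE_FILE>
      <IMAGE_FILE>GRANULE/L1C_EXAMPLE/IMG_DATA/T00XXX_20230601T000000_B03</IMAGE_FILE>
      <IMAGE_FILE>GRANULE/L1C_EXAMPLE/IMG_DATA/T00XXX_20230601T000000_B04</IMAGE_FILE>
    </Granule>
  </Granule_List>
</Product>"""


def find_band_files(root, band_names=("B02", "B03", "B04")):
    """Return the .jp2 paths of the requested bands, in the order of band_names."""
    found = {}
    for element in root.iter("IMAGE_FILE"):
        path = (element.text or "").strip()
        for band in band_names:
            if path.endswith("_" + band):
                found[band] = path + ".jp2"
    missing = [band for band in band_names if band not in found]
    if missing:
        raise KeyError(f"bands not found in manifest: {missing}")
    return [found[band] for band in band_names]


band_paths = find_band_files(ET.fromstring(SAMPLE_MANIFEST))
print(band_paths)
```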

In [None]:
bands = []
for band_file in band_location:
    # band_file starts with the product name, so chain one Nodes(...) segment per path component
    nodes = "".join(f"/Nodes({part})" for part in band_file)
    url = f"{catalogue_odata_url}/Products({product_identifier}){nodes}/$value"
    response = session.get(url, allow_redirects=False)
    # Follow redirects manually so the session's Authorization header is sent on every hop
    while response.status_code in (301, 302, 303, 307):
        url = response.headers["Location"]
        response = session.get(url, allow_redirects=False)
    response.raise_for_status()
    outfile = Path.home() / band_file[-1]
    outfile.write_bytes(response.content)
    bands.append(str(outfile))
    print("Saved:", band_file[-1])

* **D. Image Processing and Visualization**
    * Cropping and creating image patches using `rasterio`.
    * Creating a true color composite image from RGB bands.
    * Normalizing pixel values and applying gain adjustments.
    * Displaying and saving the image using `matplotlib`.
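
The normalization and gain step listed above can be sketched on a tiny array. This assumes the reflectance scale of 10000 and the gain of 2 used in this outline; the pixel values are made up and `to_display_range` is a hypothetical helper:

```python
import numpy as np


def to_display_range(band, gain=2.0, scale=10000.0):
    """Scale raw digital numbers to [0, 1]; converting to float first avoids uint16 overflow."""
    return np.clip(band.astype(np.float32) / scale * gain, 0.0, 1.0)


# Made-up uint16 patches standing in for the red, green and blue bands
red = np.array([[0, 2500], [5000, 60000]], dtype=np.uint16)
green = np.array([[1000, 1000], [1000, 1000]], dtype=np.uint16)
blue = np.array([[0, 0], [0, 0]], dtype=np.uint16)

rgb = np.dstack([to_display_range(b) for b in (red, green, blue)])
print(rgb.shape, rgb.dtype)
```

Values above `scale / gain` (here 5000) saturate at 1, so a higher gain brightens dark scenes at the cost of clipping bright surfaces such as clouds.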

In [None]:
xsize, ysize = 1000, 1000  # patch width and height in pixels
window = None  # chosen once so that every band is cropped to the same patch
n = 2  # number of the first band (B02)

for band_file in bands:
    with rasterio.open(band_file, driver="JP2OpenJPEG") as full_band:
        if window is None:
            # Pick a random top-left corner that keeps the patch inside the image
            xoff = random.randint(0, full_band.width - xsize)
            yoff = random.randint(0, full_band.height - ysize)
            window = Window(xoff, yoff, xsize, ysize)
        profile = full_band.profile
        # height is the number of rows (ysize), width the number of columns (xsize)
        profile.update({"height": ysize, "width": xsize,
                        "transform": full_band.window_transform(window)})
        with rasterio.open(
            f"{Path.home()}/patch_band_{n}.jp2", "w", **profile
        ) as patch_band:
            patch_band.write(full_band.read(window=window))
    print(f"Patch for band {n} created")
    n += 1

In [None]:
band2 = rasterio.open(f"{Path.home()}/patch_band_2.jp2", driver="JP2OpenJPEG")  # blue
band3 = rasterio.open(f"{Path.home()}/patch_band_3.jp2", driver="JP2OpenJPEG")  # green
band4 = rasterio.open(f"{Path.home()}/patch_band_4.jp2", driver="JP2OpenJPEG")  # red

red = band4.read(1)
green = band3.read(1)
blue = band2.read(1)

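gain = 2
# Divide before applying the gain so the arithmetic runs in float and cannot overflow uint16
red_n = np.clip(red / 10000 * gain, 0, 1)
green_n = np.clip(green / 10000 * gain, 0, 1)
blue_n = np.clip(blue / 10000 * gain, 0, 1)

rgb_composite_n = np.dstack((red_n, green_n, blue_n))

plt.imshow(rgb_composite_n)

composite_path = Path.home() / "Sentinel2_true_color.jpeg"
matplotlib.image.imsave(str(composite_path), rgb_composite_n)
print("Saved as:", composite_path)  # report the composite, not the last downloaded band file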

**IV. Advanced Topics (Optional)**

* **A. Training Data Download Pipeline (Sentinel Hub)**
    * Downloading training data plots with specific dimensions.
    * Handling cloud cover and shadows using the `s2cloudless` library and custom functions.
    * Data interpolation and smoothing techniques.
    * Calculating vegetation indices (EVI, BI, MSAVI2, SI).
    * Super-resolution using DSen2 model.
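
As an illustration of the index calculations, the commonly cited formulas for EVI and MSAVI2 can be written for single reflectance values as below. This is a sketch only: the functions in `../src/utils-bilinear.py` are shown here as stubs, so their exact definitions (and those of BI and SI) may differ, and the input reflectances are made up:

```python
import math


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the usual coefficients G=2.5, C1=6, C2=7.5, L=1."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def msavi2(nir, red):
    """Second Modified Soil-Adjusted Vegetation Index."""
    term = 2.0 * nir + 1.0
    return (term - math.sqrt(term ** 2 - 8.0 * (nir - red))) / 2.0


# Made-up surface reflectances in the range 0-1
nir, red, blue = 0.5, 0.1, 0.05
evi_value = evi(nir, red, blue)
msavi2_value = msavi2(nir, red)
print(round(evi_value, 4), round(msavi2_value, 4))
```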

In [None]:
import pandas as pd
import numpy as np
from random import shuffle
from osgeo import ogr, osr
from sentinelhub import WmsRequest, WcsRequest, MimeType, CRS, BBox, constants, DataSource, CustomUrlParam
from s2cloudless import S2PixelCloudDetector, CloudMaskRequest
import logging
from collections import Counter
import datetime
import os
import yaml

import scipy.sparse as sparse
from scipy.sparse.linalg import splu

with open("../config.yaml", 'r') as stream:
    key = yaml.safe_load(stream)
    API_KEY = key['key']
DATA_LOCATION = '../data/train-csv/malawi-train.csv'
OUTPUT_FOLDER = '../data/train-new-shadow/'
EPSG = CRS.WGS84
IMSIZE = 48
# IDs of plots already downloaded: file names minus their 4-character extension
existing = [int(x[:-4]) for x in os.listdir(OUTPUT_FOLDER) if ".DS" not in x]

In [None]:
# Contents of ../src/slope.py (replace with actual code)
def calcSlope(dem, res_x, res_y, zScale = 1, minSlope = 0.02):
    # some code here
    return slope
# Contents of ../src/utils-bilinear.py (replace with actual code)
def evi(x, append = False):
    # some code here
    return x, amin
def bi(x, append = False):
    # some code here
    return x
def msavi2(x, append = False):
    # some code here
    return x
def si(x, append = False):
    # some code here
    return x
# Contents of ../src/dsen2/utils/DSen2Net.py (replace with actual code)
from tensorflow import keras
class s2model(keras.Model):
    # some code here
    pass

In [None]:
# Code from previous cells should be included here for execution
# ... (all code from previous cells related to Advanced Topics)

**V. Conclusion**

* Recap of the key steps.
* Resources for further learning.
* Troubleshooting tips.