# Querying Satellite Data

## ❓ Questions
-  Where can I find open-access satellite data?
-  How do I search for satellite imagery?
-  How do I fetch remote raster datasets using Python?


## ❗ Objectives
-  Search public STAC repositories of satellite imagery using Python.
-  Inspect a search result’s metadata.
-  Download (a subset of) the assets available for a satellite scene.
-  Open satellite imagery as raster data and save it to disk.

---

# Introduction
Many satellites take snapshots of the Earth’s surface from space. The images recorded by these remote sensors are a valuable data source for any activity that involves monitoring changes on Earth.

Satellite imagery is typically provided in the form of geospatial raster data, with the measurements in each grid cell (“pixel”) associated with accurate geographic coordinate information.

In this lesson we will explore how to access open satellite data using Python. In particular, we will consider the Sentinel-2 data collections hosted at SARA and NCI. This dataset consists of multi-band optical images acquired by the satellites of the Sentinel-2 mission, and it is continuously updated with new images.

## APIs
An API is an Application Programming Interface.

It is a predefined way for one application to talk (interface) with another. In our case, requests are sent to web addresses and the responses come back as JSON data; our library, `EODAG`, handles both for us.

We will:
1. First initialise `EODAG`.
2. Give it the information it needs.
3. Ask it to send the request.

`EODAG` will:
1. Receive the information.
2. Interpret it for us.

A useful resource will be the [EODAG documentation on searching](https://eodag.readthedocs.io/en/stable/notebooks/api_user_guide/4_search.html).
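The steps above can be sketched in plain Python to show what a library like EODAG does on our behalf: encode the search parameters into a web address, then interpret the JSON that comes back. The endpoint, parameter names and sample reply below are hypothetical placeholders (not SARA's real API), and no request is actually sent.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not SARA's real API)
BASE_URL = "https://example.org/api/search"


def build_request_url(params):
    """Encode search parameters into a web address (query string)."""
    return f"{BASE_URL}?{urlencode(params)}"


def interpret_response(body):
    """Parse a JSON reply and pull out each result's title."""
    data = json.loads(body)
    return [feature["properties"]["title"] for feature in data["features"]]


url = build_request_url({"productType": "S2_MSI_L2A", "start": "2023-08-24"})
print(url)  # https://example.org/api/search?productType=S2_MSI_L2A&start=2023-08-24

# A made-up JSON reply shaped like a GeoJSON FeatureCollection
sample_body = """
{"type": "FeatureCollection",
 "features": [{"properties": {"title": "scene_A"}},
              {"properties": {"title": "scene_B"}}]}
"""
print(interpret_response(sample_body))  # ['scene_A', 'scene_B']
```

A real provider adds authentication, pagination and many more parameters, which is exactly the bookkeeping EODAG takes off our hands.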


# Authenticating

In [None]:
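import os
from getpass import getpass

# Folder where downloaded products will be saved
workspace = './workshop_data'

# EODAG reads provider settings from EODAG__<PROVIDER>__... environment variables,
# so set them before creating the EODataAccessGateway below
os.environ["EODAG__SARA__AUTH__CREDENTIALS__USERNAME"] = input("Enter SARA username: ")
os.environ["EODAG__SARA__AUTH__CREDENTIALS__PASSWORD"] = getpass("Enter SARA password: ")
os.environ["EODAG__SARA__DOWNLOAD__OUTPUTS_PREFIX"] = os.path.abspath(workspace)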
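The variable names in the cell above follow a fixed pattern: `EODAG`, the provider name, then each level of the provider's configuration (for example auth, credentials, username), all upper case and joined by double underscores. The helper below is a hypothetical convenience function, not part of EODAG, that builds such names and can help avoid typos when configuring other providers.

```python
def eodag_env_var(provider, *config_path):
    """Build an EODAG-style environment variable name.

    Hypothetical helper for illustration; EODAG itself only reads the variables.
    """
    parts = ["EODAG", provider, *config_path]
    # Upper-case every level and separate levels with a double underscore
    return "__".join(part.upper() for part in parts)


print(eodag_env_var("sara", "auth", "credentials", "username"))
print(eodag_env_var("sara", "download", "outputs_prefix"))
```

Both printed names match the ones set in the authentication cell.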

# Initialising EODAG

In [None]:
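from eodag import EODataAccessGateway
from eodag import setup_logging

# Verbosity 2 prints INFO-level log messages (0 = silent, 3 = DEBUG)
setup_logging(2)

# Create the gateway and make SARA the default provider for searches
dag = EODataAccessGateway()
dag.set_preferred_provider("sara")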

# Exploring EODAG

In [None]:
# List the product types available from the SARA provider
dag.list_product_types("sara")

In [None]:
# List the providers that offer Sentinel-2 Level-2A products (S2_MSI_L2A)
dag.available_providers("S2_MSI_L2A")

## Search Criteria
We're going to set out our search criteria in advance in a Python dictionary. Later we'll pass it to the search function with `**`, which unpacks the dictionary's keys and values into keyword arguments.

In [None]:
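# Bounding box of the area of interest, in decimal degrees (longitude/latitude).
# In the southern hemisphere the minimum latitude is the more negative number.
lonmin = 116.2
lonmax = 116.5
latmin = -32.0
latmax = -31.5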

default_search_criteria = {
    "productType": "S2_MSI_L2A",
    "start": "2023-08-24",
    "end": "2023-08-26",
    "geom": {"lonmin": lonmin,  "lonmax": lonmax, "latmin": latmin, "latmax": latmax},
    # "cloudCover": 15,  # uncomment to keep only scenes with at most 15% cloud cover
}
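Passing `**default_search_criteria` to a function unpacks the dictionary, so each key fills the keyword argument of the same name. The sketch below demonstrates this with a stand-in function (`fake_search` is hypothetical, not EODAG's API) and adds a small hypothetical `check_bbox` helper, since swapping a minimum and maximum is an easy mistake with negative, southern-hemisphere latitudes.

```python
def fake_search(productType, start, end, geom, **other_filters):
    """Stand-in for a search function, showing how ** maps keys to arguments."""
    return f"{productType} from {start} to {end}, extra filters: {other_filters}"


def check_bbox(geom):
    """Raise ValueError if a lonmin/lonmax/latmin/latmax box is inverted."""
    if geom["lonmin"] >= geom["lonmax"]:
        raise ValueError("lonmin must be smaller than lonmax")
    if geom["latmin"] >= geom["latmax"]:
        raise ValueError("latmin must be smaller than latmax (e.g. -32 < -31.5)")


criteria = {
    "productType": "S2_MSI_L2A",
    "start": "2023-08-24",
    "end": "2023-08-26",
    "geom": {"lonmin": 116.2, "lonmax": 116.5, "latmin": -32.0, "latmax": -31.5},
    "cloudCover": 15,
}

check_bbox(criteria["geom"])  # passes silently for a correctly ordered box
# Equivalent to fake_search(productType="S2_MSI_L2A", start="2023-08-24", ...)
print(fake_search(**criteria))
```

Keys that don't match a named parameter (here `cloudCover`) end up in `**other_filters`, which is how one dictionary can carry both required and optional criteria.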

# Pagination - A common pitfall
When you run a web search, the service does not send you every possible result at once. That would take a long time to transfer, and most of the results would never be looked at. Instead, most services (such as Google, Bing, X and Facebook) use a technique called `pagination`: results are sent in chunks, or 'pages'. Search engines used to show 10 to 20 results per page with a 'next page' link. Many sites now load the next page automatically: if you watch the scroll bar while scrolling down, you will see it jump up and get shorter every now and then.

This happens because your web browser notices that you are nearing the end of what it has loaded and asks the website for more. The website then sends the next page.

The web-based APIs we use also send results one page at a time. They usually let us choose how many results a page should hold, up to a maximum.  
If you're an advanced user and care about latency, you may want to load one page at a time; in that case, use the `dag.search()` function (see the EODAG documentation).

If you simply want all the results, as we do here, use the `dag.search_all()` function, which keeps requesting pages until every result has been retrieved.
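Conceptually, a "fetch everything" function like `search_all()` keeps asking for the next page until the server returns a short or empty one. Below is a minimal sketch of that loop against a fake, in-memory paginated service; `fetch_page`, `fetch_all` and their parameters are hypothetical and do not reflect EODAG's internals.

```python
# A fake "server" holding 23 results, standing in for a remote catalogue
ALL_RESULTS = [f"scene_{i:02d}" for i in range(23)]


def fetch_page(page, items_per_page):
    """Hypothetical paginated API: return one page (1-indexed) of results."""
    start = (page - 1) * items_per_page
    return ALL_RESULTS[start:start + items_per_page]


def fetch_all(items_per_page=10):
    """Request page after page until the server runs out of results."""
    results, page = [], 1
    while True:
        batch = fetch_page(page, items_per_page)
        results.extend(batch)
        # A short (or empty) page means we have reached the end
        if len(batch) < items_per_page:
            break
        page += 1
    return results


everything = fetch_all(items_per_page=10)
print(len(everything))  # 23, collected over 3 requests
```

Collecting everything is convenient but costs one request per page, which is why fetching a single page with `dag.search()` can be faster when you only need the first few results.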


In [None]:
all_products = dag.search_all(**default_search_criteria) 
all_products

## Exploring the results
Let's open up the results and explore them!

In [None]:
product = all_products[0]
product

In [None]:
# The footprint is a shapely geometry; also try .bounds or .exterior.coords.xy
product.geometry
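Because the footprint is a shapely geometry, `.bounds` returns a `(minx, miny, maxx, maxy)` tuple, which for longitude/latitude data means `(lonmin, latmin, lonmax, latmax)`. To show what that means, here is a plain-Python equivalent computed from a list of polygon corners; the `bounds` function and the footprint coordinates are made up for illustration.

```python
def bounds(coords):
    """Return (minx, miny, maxx, maxy) for a sequence of (x, y) pairs,
    in the same order as shapely's .bounds."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical footprint ring as (lon, lat), closed by repeating the first point
footprint = [(115.9, -32.5), (117.1, -32.5), (117.1, -31.5), (115.9, -31.5), (115.9, -32.5)]
print(bounds(footprint))  # (115.9, -32.5, 117.1, -31.5)
```

On a real product you would simply use `product.geometry.bounds`, or `product.geometry.exterior.coords.xy` to get separate x and y arrays when the footprint is a single polygon.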

In [None]:
# A few other key pieces of info
product.provider, product.product_type, product.search_kwargs

In [None]:
# Use pretty print (pprint) to print out the properties
from pprint import pprint

pprint(product.properties)
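Because `product.properties` is an ordinary dictionary, search results can be filtered with plain Python. The sketch below keeps only scenes under a cloud-cover threshold. It uses hypothetical dictionaries in place of real `properties` and assumes the provider fills in a numeric `cloudCover` key, so check the printed properties above before relying on it.

```python
# Hypothetical stand-ins for product.properties of three search results
scenes = [
    {"title": "scene_A", "cloudCover": 3.2},
    {"title": "scene_B", "cloudCover": 48.0},
    {"title": "scene_C", "cloudCover": None},  # some products may lack a value
]


def low_cloud(properties_list, max_cloud=15.0):
    """Keep entries whose cloudCover is known and below max_cloud (percent)."""
    return [
        p for p in properties_list
        if p.get("cloudCover") is not None and p["cloudCover"] < max_cloud
    ]


print([p["title"] for p in low_cloud(scenes)])  # ['scene_A']
```

With real results, the same test would read `product.properties.get("cloudCover")` for each product in `all_products`.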

# Viewing thumbnails
One of the product properties is a 'quicklook' (also called a thumbnail): a small preview of the whole scene that you can use to spot problems, such as heavy cloud cover, before downloading the full product.

Let's view these for each of the products. For this, we will use a library called `matplotlib`.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

for product in all_products:
    quicklook_path = product.get_quicklook()
    img = mpimg.imread(quicklook_path)
    plt.imshow(img)
    plt.title(f"Image {product.properties['title']}")
    plt.show()

# Folium
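Folium is a Python library for building interactive web maps, based on the Leaflet JavaScript library. We'll use it to plot the footprints of our search results together with our search area.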

In [None]:
import folium

In [None]:
geometry = default_search_criteria["geom"]
geometry

In [None]:
# Create a map zoomed over the search area
fmap = folium.Map(location=[(latmin + latmax)/2, (lonmin + lonmax)/2], zoom_start=7)

# Add a GeoJson of the Search Results
folium.GeoJson(
    data=all_products,  # SearchResult has a __geo_interface__ interface used by folium to get its GeoJSON representation
    tooltip=folium.GeoJsonTooltip(fields=["title"])
).add_to(fmap)

# Add a Rectangle of our search
folium.Rectangle(
    bounds=[[latmin, lonmin], [latmax, lonmax]],
    color="red",
    tooltip="Search extent"
).add_to(fmap)

fmap
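One detail worth noticing in the cell above: Folium (like Leaflet) expects `[lat, lon]` order, while GeoJSON and EODAG's `geom` box put longitude first. The small helpers below are hypothetical, not part of either library; they convert our bounding-box dictionary into both forms so that the axis order is handled in one place.

```python
def bbox_to_leaflet_bounds(geom):
    """[[south, west], [north, east]] in (lat, lon) order, as folium.Rectangle expects."""
    return [[geom["latmin"], geom["lonmin"]], [geom["latmax"], geom["lonmax"]]]


def bbox_to_geojson(geom):
    """A closed GeoJSON Polygon in (lon, lat) order with a counter-clockwise ring."""
    ring = [
        [geom["lonmin"], geom["latmin"]],
        [geom["lonmax"], geom["latmin"]],
        [geom["lonmax"], geom["latmax"]],
        [geom["lonmin"], geom["latmax"]],
        [geom["lonmin"], geom["latmin"]],  # repeat the first point to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}


box = {"lonmin": 116.2, "lonmax": 116.5, "latmin": -32.0, "latmax": -31.5}
print(bbox_to_leaflet_bounds(box))  # [[-32.0, 116.2], [-31.5, 116.5]]
print(bbox_to_geojson(box)["coordinates"][0][0])  # [116.2, -32.0]
```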

# Authentication & Downloading from THREDDS
If you have trouble authenticating with AusCopHub, you can download the same product straight from the same repository through NCI's THREDDS Data Server.

In [None]:
pprint(product.properties)

## Download
Now that we have found the available products, let's download one of them.

In [None]:
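# Download with EODAG; returns the local path of the product, saved under
# the outputs_prefix folder set in the authentication cell
file_downloaded = all_products[0].download()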


In [None]:
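# Alternative: download the same product from NCI's THREDDS Data Server.
# nci_downloader is a workshop helper module, not part of EODAG.
from nci_downloader import download_product_thredds

file_downloaded = download_product_thredds(all_products[0], './data')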
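Under the hood, a download helper typically streams the remote file to disk in chunks rather than loading it all into memory, which matters for Sentinel-2 products that can be hundreds of megabytes. The function below is a hypothetical, minimal sketch of that pattern using only the standard library; it is not the implementation of `download_product_thredds` or of EODAG, and it has no authentication or retry logic. The demo "downloads" a local file through a `file://` URL so it runs offline.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def stream_download(url, dest_dir, chunk_size=1024 * 1024):
    """Save the resource at `url` into dest_dir, copying chunk_size bytes at a time."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last part of the URL path
    target = dest / Path(unquote(urlparse(url).path)).name
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        # Reads and writes one chunk at a time instead of the whole file
        shutil.copyfileobj(response, out, chunk_size)
    return target


# Self-contained demo: "download" a local file through a file:// URL
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "example_product.zip"
    source.write_bytes(b"placeholder bytes standing in for a product archive")
    saved = stream_download(source.as_uri(), Path(tmp) / "downloads")
    saved_name = saved.name
    contents_match = saved.read_bytes() == source.read_bytes()

print(saved_name, contents_match)  # example_product.zip True
```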


## Storing our downloaded filenames for future notebooks
Let's store the path of the downloaded product in a text file so that later notebooks can find it.

In [None]:
product_directory = file_downloaded
dir_text_filename = "product_dir.txt"

# Write the product path as a single line of text
with open(dir_text_filename, 'w') as f:
    f.write(str(product_directory))
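In a later notebook, the stored path can be read back before opening the product. A minimal sketch, assuming the text file holds a single path; `save_product_dir` and `load_product_dir` are hypothetical helper names, and the round trip below uses a temporary directory and a made-up product folder name so it can run on its own.

```python
import tempfile
from pathlib import Path


def save_product_dir(product_path, txt_file):
    """Write a single product path to a text file."""
    Path(txt_file).write_text(str(product_path))


def load_product_dir(txt_file):
    """Read the path back as a Path, ignoring any trailing newline or spaces."""
    return Path(Path(txt_file).read_text().strip())


with tempfile.TemporaryDirectory() as tmp:
    record = Path(tmp) / "product_dir.txt"
    # Hypothetical product folder name, for illustration only
    save_product_dir("workshop_data/S2B_MSIL2A_example", record)
    restored = load_product_dir(record)

print(restored)
```

In the next notebook, calling `load_product_dir("product_dir.txt")` followed by `.exists()` is a quick way to confirm the download is still where you left it.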

# Other options
Another option for downloading satellite data with Python is [pystac-client](https://pystac-client.readthedocs.io/en/latest/quickstart.html).

pystac-client takes a bit more know-how about APIs to use, but it also gives you more freedom.  
It also supports COGs (Cloud Optimized GeoTIFFs), which are currently supported only by the bleeding-edge `eodag-cube` library rather than by `eodag` itself.  
SARA and NCI (where we're getting satellite data from in this workshop) supply Sentinel-2 data as zip files rather than COGs, so we chose to simply use EODAG for this lesson.