# API Image Extraction (via Sentinel-2)

This tutorial moves on to a more advanced topic: the automated extraction of imagery.

Whereas in the previous class we manually extracted satellite imagery from various sources (e.g. Sentinel Hub or Earth Explorer), today we will focus on how to automatically extract satellite imagery from an Application Programming Interface (API).

You may be surprised to find that you already engage with APIs on a daily basis when carrying out your online activity!

## What is an API?

Collecting satellite imagery via an API makes the process of image collection much easier and more efficient - thankfully!

That said, there can be a bit of a leap in terms of technical understanding, especially if you haven't used Python much before.

Consequently, this entire lecture is spent helping you understand how to extract imagery via an API. 

APIs let a product or service communicate with other products and services without needing to know specifically how each one works. Metaphorically, APIs are sometimes thought of as contracts, with documentation that represents an agreement between parties: If party 1 sends a remote request structured in a particular way, this is how party 2’s software will respond. 

For example, we send a request for certain satellite images to a server holding imagery data, and the requested data is collected on the server, processed, potentially compressed, and then sent (probably over a computer network such as the Internet) to our machine. We do not need to know how the server is set up or operated to make this request. The server owner can even completely change the server's setup and operation without us needing to change our code, all thanks to our handy API.

Essentially, APIs in this case simplify how we connect to an existing computer infrastructure. Indeed, public APIs in general represent unique societal value because they can simplify and expand how we all connect with data (e.g. the Google Maps API is a popular example).
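
To make the "contract" idea concrete, here is a minimal sketch of how a client might assemble a structured request. The endpoint `https://example.com/api/search`, the parameter names, and the helper `build_request_url` are hypothetical placeholders rather than a real imagery service. The point is only that the client needs to know the agreed request format, not how the server works internally. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only
BASE_URL = "https://example.com/api/search"


def build_request_url(platform, start_date, end_date):
    """Assemble a search URL in the format the (hypothetical) API expects."""
    params = {
        "platform": platform,
        "start": start_date,
        "end": end_date,
    }
    # urlencode turns the dictionary into 'key=value&key=value' text
    return f"{BASE_URL}?{urlencode(params)}"


url = build_request_url("Sentinel-2", "20220718", "20220720")
print(url)
```

If the server owner later rewrote everything behind that URL, this client code would not need to change as long as the request format stayed the same.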

Watch the video overview below to get more of a flavor:

In [None]:
%%HTML

<iframe width="560" height="315" src="https://www.youtube.com/embed/s7wmiS2mSXY" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

# Sentinelsat

`Sentinelsat` provides us with a very flexible and easy-to-use API which we can use to search, download and retrieve Sentinel satellite images and metadata via the Copernicus Open Access Hub.

The codebase is available in this GitHub repo: https://github.com/sentinelsat/sentinelsat
       
The developers also provide a handy Read the Docs page, which contains all the info you need to get started in extracting Sentinel-2 imagery: https://sentinelsat.readthedocs.io/en/stable/

## Installation

First, you need to install the `sentinelsat` package in your current environment. 

Just to recap, make sure you have activated your conda environment before taking this step, so that `sentinelsat` is installed there (e.g. in the `sia` conda environment).
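
If you are unsure which environment the notebook is running in, a quick check like the sketch below can help. It assumes you are using conda, which sets the `CONDA_DEFAULT_ENV` environment variable when an environment is activated; if you launched Jupyter some other way, the variable may simply be missing.

```python
import os
import sys

# Path of the Python interpreter running this notebook kernel
print(sys.executable)

# conda sets CONDA_DEFAULT_ENV when an environment is activated;
# it may be absent if you are not using conda
active_env = os.environ.get("CONDA_DEFAULT_ENV", "(no conda environment detected)")
print(active_env)
```

If the printed name is not the one you expect (e.g. `sia`), activate the right environment and restart the notebook before installing.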


In [None]:
import sys
!{sys.executable} -m pip install sentinelsat

## Required functions

Now that you've installed `sentinelsat`, we need to import the class and functions we require, which include:
- `SentinelAPI` - Class to connect to Copernicus Open Access Hub, search and download imagery.
- `read_geojson` - Read a GeoJSON file into a GeoJSON object.
- `geojson_to_wkt` - Convert a GeoJSON object to Well-Known Text. Intended for use with OpenSearch queries. 3D points are converted to 2D.

(see the API reference here for all functions: https://sentinelsat.readthedocs.io/en/stable/api_reference.html)

Let's now import them into our current notebook session:

In [None]:
import sentinelsat
from sentinelsat import SentinelAPI, read_geojson, geojson_to_wkt
from datetime import date

## Registration

You then need to register with the Copernicus SciHub so that you have an account for accessing imagery.

https://scihub.copernicus.eu/dhus/#/home

Once you've completed this process, you should have a username and a password. Keep these handy, as you will need both of them below.


## API object
We are now going to create an object called `api` using the `SentinelAPI` class.

You will need to provide your username and password. 

This is so that when you eventually send your request to the server, the server knows who you are, via your existing account. 
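
Rather than typing your password directly into a notebook that may later be shared, one option is to read the credentials from environment variables. The sketch below assumes two variable names, `COPERNICUS_USER` and `COPERNICUS_PASSWORD`, which are made up for this example; `get_credentials` is likewise a hypothetical helper, not part of `sentinelsat`.

```python
import os


def get_credentials(env=os.environ):
    """Return (username, password) read from environment variables.

    The variable names are assumptions chosen for this example.
    """
    user = env.get("COPERNICUS_USER")
    password = env.get("COPERNICUS_PASSWORD")
    if not user or not password:
        raise ValueError("Set COPERNICUS_USER and COPERNICUS_PASSWORD first")
    return user, password


# Demonstration with a stand-in dictionary instead of the real environment
demo_env = {"COPERNICUS_USER": "my_username", "COPERNICUS_PASSWORD": "my_password"}
username, password = get_credentials(demo_env)
print(username)
```

The two returned values could then be passed as the first two arguments when creating the `SentinelAPI` object.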

In [None]:
# Example
api = SentinelAPI(
    'your_username_goes_here', 
    'your_password_goes_here', 
    'https://apihub.copernicus.eu/apihub'
)
api

You will need to make sure you can pass a geometry to the API, so that images which intersect this boundary can be retrieved. 

The easiest way for us to do this is to import the `gmu_geojson.geojson` file that has been pre-saved to the `shapes` folder in the main repo.

Remember, our path here starts from the directory in which this notebook is saved in the GitHub repo file structure, goes up two folders (`../../`), and then into the `shapes` directory (thus, the path is `../../shapes/gmu_geojson.geojson`).
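
As a sketch of how such a relative path resolves, the example below joins it onto a notebook location and collapses the `..` steps. The folder `/repo/notebooks/week5` is a made-up location used only for illustration; your notebook will live somewhere else.

```python
import posixpath

# Hypothetical location of this notebook inside the repo (an assumption)
notebook_dir = "/repo/notebooks/week5"
relative_path = "../../shapes/gmu_geojson.geojson"

# Join the two, then collapse each '..' against the folder before it
resolved = posixpath.normpath(posixpath.join(notebook_dir, relative_path))
print(resolved)  # /repo/shapes/gmu_geojson.geojson
```

Each `..` removes one folder from the end of the notebook's directory, which is why two of them take us from the notebook's folder back to the top of the repo.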


In [None]:
# Example
path = '../../shapes/gmu_geojson.geojson'
footprint = geojson_to_wkt(read_geojson(path))
footprint

The GeoJSON is then converted into the Well-Known Text (WKT) format.

"Well-known text is a text markup language for representing vector geometry objects. A binary equivalent, known as well-known binary, is used to transfer and store the same information in a more compact form convenient for computer processing but that is not human-readable"

https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry

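Thus, we use the `geojson_to_wkt` function to convert our GeoJSON object into WKT, the text format used to pass the search area in the API query. Note that both GeoJSON and WKT are human-readable; it is the binary form (well-known binary) that is not.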
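
To show what this conversion involves, here is a minimal hand-written sketch for the simplest case, a 2D Polygon. The helper `polygon_to_wkt` is hypothetical and is not how `sentinelsat` implements `geojson_to_wkt`; the library's output may also differ in spacing and number formatting.

```python
def polygon_to_wkt(geometry):
    """Convert a 2D GeoJSON Polygon geometry (a dict) to a WKT string."""
    if geometry["type"] != "Polygon":
        raise ValueError("This sketch only handles Polygon geometries")
    rings = []
    for ring in geometry["coordinates"]:
        # WKT separates x and y with a space, and points with a comma
        points = ", ".join(f"{x} {y}" for x, y in ring)
        rings.append(f"({points})")
    return f"POLYGON ({', '.join(rings)})"


# A small made-up square, not the GMU footprint
square = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
}
print(polygon_to_wkt(square))  # POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))
```

The same coordinates appear in both formats; only the surrounding syntax changes.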

Now we can begin specifying the API query (i.e., what we want to order from the menu).

In this first example, we will use the `api.query` method to search for images intersecting with our polygon, for a specific set of dates, for a specific earth observation platform.

- `footprint` is the polygon boundary.
- `date` is the range of dates.
- `platformname` is the earth observation platform.

We are then provided with all the metadata for the relevant images, within the date ranges and other criteria we specified. 
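
The date range is a pair of start and end values, written in this tutorial as `'YYYYMMDD'` strings. As a sketch, the hypothetical helper `date_range_strings` below builds such a pair from a start date and a number of days, which can be handy when you want to shift the search window without retyping both strings.

```python
from datetime import date, timedelta


def date_range_strings(start, days):
    """Return (start, end) as 'YYYYMMDD' strings, where end = start + days."""
    end = start + timedelta(days=days)
    return start.strftime("%Y%m%d"), end.strftime("%Y%m%d")


date_window = date_range_strings(date(2022, 7, 18), days=2)
print(date_window)  # ('20220718', '20220720')
```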

In [None]:
# Example
# Search by polygon, time, and SciHub query keywords
image_metadata = api.query(
    footprint,
    date=('20220718', '20220720'),  # (start, end); date objects, e.g. date(2015, 12, 29), also work
    platformname='Sentinel-2'                    
)
image_metadata

We can actually save this information to a dataframe.

Remember that we covered pandas dataframes in previous classes. We can now use that data structure to export this metadata and inspect/interrogate the results.

In [None]:
# Example
import pandas as pd
image_metadata_df = api.to_dataframe(image_metadata)
image_metadata_df.to_csv('metadata.csv')

It is now possible for us to sort these images using the available metadata.

Let us now sort this data by the cloud cover of each image, in ascending order so that the least cloudy images come first.

In [None]:
# Example
image_metadata_df_sorted = image_metadata_df.sort_values(
    ['cloudcoverpercentage'], ascending=[True])
image_metadata_df_sorted.to_csv('metadata_sorted.csv')

We can now subset this sorted dataframe using the `.head()` method, keeping only the first (least cloudy) row.
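
If the sort-then-subset logic feels opaque, the plain-Python sketch below mirrors it without pandas. The three records and their cloud cover values are made up for illustration; only the key name `cloudcoverpercentage` is taken from the metadata used in this tutorial.

```python
# Made-up records standing in for rows of image metadata
records = [
    {"title": "image_a", "cloudcoverpercentage": 42.0},
    {"title": "image_b", "cloudcoverpercentage": 3.5},
    {"title": "image_c", "cloudcoverpercentage": 17.2},
]

# Sort ascending so the least cloudy image comes first
records_sorted = sorted(records, key=lambda r: r["cloudcoverpercentage"])

# Keep only the first n records, like .head(n) on a dataframe
n = 1
best = records_sorted[:n]
print([r["title"] for r in best])  # ['image_b']
```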

In [None]:
# Example
image_metadata_df_sorted = image_metadata_df_sorted.head(1)
len(image_metadata_df_sorted)

Now we can begin to download our imagery by using the `api.download_all` method.

As you will have seen from the metadata, each multispectral image may be >1 GB in size, so it is wise to limit the number of downloads at first, until you are sure you know what you need.
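
One way to sanity-check a download before starting it is to add up the reported sizes. The sketch below assumes the sizes are available as text such as `'806.65 MB'` or `'1.08 GB'` (check how the `size` values appear in your own metadata, as this format is an assumption); the helper `size_to_gb` and the sample values are made up for illustration.

```python
# Conversion factors to gigabytes for the units this sketch understands
UNIT_TO_GB = {"KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0}


def size_to_gb(size_text):
    """Convert a string such as '806.65 MB' to a number of gigabytes."""
    value, unit = size_text.split()
    return float(value) * UNIT_TO_GB[unit.upper()]


# Made-up sizes, assumed to look like the size values in the metadata
sizes = ["1.08 GB", "512 MB", "1.5 GB"]
total_gb = sum(size_to_gb(s) for s in sizes)
print(f"Estimated download: {total_gb:.2f} GB")
```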

In [None]:
# Example
api.download_all(image_metadata_df_sorted.index)

You can now unzip your downloaded data and inspect the results.

As there is a large file structure, follow these instructions:

- Firstly, navigate to the `GRANULE` folder.
- Then follow through the directories to the `IMG_DATA` folder. 
- Finally, open `T18STJ_20220719T154951_TCI.jp2` in a GIS to inspect (e.g., QGIS).
    
TCI is a True Color Image built from RGB layers (red, green and blue).
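
Instead of clicking through the folders by hand, you could search for the file from Python. The sketch below uses a hypothetical helper, `find_tci_files`, and assumes the True Color Image file name ends in `_TCI.jp2`, as it does for the file named above; the folder name `unzipped` is also an assumption, so replace it with wherever you extracted your download.

```python
from pathlib import Path


def find_tci_files(root):
    """Return all files ending in '_TCI.jp2' anywhere below root, sorted."""
    return sorted(Path(root).rglob("*_TCI.jp2"))


# 'unzipped' is an assumed folder name; nothing is printed if it
# does not exist or contains no matching files
for tci_path in find_tci_files("unzipped"):
    print(tci_path)
```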

## Exercise

Now that you have the basic code to extract an image from the Sentinel-2 API, follow this exercise, writing your code in the box below:

- Try to download a set of images for the GMU campus for the first full week of September (for clarity, the 5th-9th Sept. 2022).
- Extract the metadata and inspect the available images, downloading the best two images, in terms of cloud cover.
- Next, download the image with the highest vegetation percentage.
- Critically review your results - they might surprise you! Think about some of the issues you may encounter in your own coursework projects. 

In [None]:
# Enter your attempt here


## Bounding box geometry

Often we need to specify a geometry for the area where we want to extract the imagery.

For familiarity, let's create a bounding box around GMU's campus.

You can find the actual shape file in the `shapes` folder of the GitHub repo.

We will produce a GeoJSON object similar to the bounding box you see below.

In [None]:
from IPython.display import Image
Image("images/gmu.png")

### The GeoJSON structure 

GeoJSON is a format for encoding a variety of geographic data structures.

GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, and MultiPolygon. Geometric objects with additional properties are Feature objects. Sets of features are contained by FeatureCollection objects.

https://geojson.org/
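
As a small sketch of how these pieces nest, the example below wraps two Feature objects in a FeatureCollection and serializes the result with Python's standard `json` module. The two points and their names are made up for illustration.

```python
import json

# Two made-up point features; coordinates are [longitude, latitude]
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-77.31, 38.83]},
        "properties": {"name": "point_a"},
    },
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-77.30, 38.84]},
        "properties": {"name": "point_b"},
    },
]

# A FeatureCollection simply wraps a list of Feature objects
collection = {"type": "FeatureCollection", "features": features}

# GeoJSON is ordinary JSON, so the standard library can serialize it
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```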


### The GeoJSON Specification (RFC 7946)

In 2015, the Internet Engineering Task Force (IETF), in conjunction with the original specification authors, formed a GeoJSON WG to standardize GeoJSON. RFC 7946 was published in August 2016 and is the new standard specification of the GeoJSON format, replacing the 2008 GeoJSON specification.

In [None]:
# Example
# Here is a point locating GMU's Exploratory Hall
my_point = {
  "type": "Feature", 
  "geometry": {
    "type": "Point",
    "coordinates": [-77.305626, 38.829897]
  },
  "properties": {
    "name": "Exploratory Hall"
  }
}
my_point

You can see that each GeoJSON includes:
- `type`, which specifies whether the GeoJSON is a Feature or a FeatureCollection.
- `geometry`, which contains:
    - `type`, which is one of seven case-sensitive strings: "Point", "MultiPoint", "LineString", "MultiLineString", "Polygon", "MultiPolygon", or "GeometryCollection".
    - `coordinates`, an array containing the all-important geographic coordinates.
- `properties`, which contains any affiliated information, e.g. an object identifier.
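
As a sketch of how these three members fit together for a bounding box, the hypothetical helper `bbox_to_feature` below builds a Feature from the minimum and maximum longitude and latitude. The coordinates passed in are rounded, illustrative values near the GMU campus rather than the exact footprint.

```python
def bbox_to_feature(min_lon, min_lat, max_lon, max_lat, properties=None):
    """Build a GeoJSON Feature for a rectangular bounding box."""
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # repeat the first point to close the ring
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties or {},
    }


# Rounded, illustrative coordinates near the GMU campus
feature = bbox_to_feature(-77.3154, 38.8240, -77.2957, 38.8393, {"id": "GMU"})
print(feature["geometry"]["coordinates"][0])
```

Note that a polygon ring lists five points for a four-cornered box, because the first point is repeated at the end to close the shape.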

Let us now define our polygon bounding box of GMU in GeoJSON format:

In [None]:
# Example 
my_geojson = {
    # Let's define our GeoJSON type. As it's a single geometry, it's just a single 'Feature'
    "type": "Feature",
    "geometry": {
        # Let's define our geometry type, which as we have a square, is a polygon
        "type": "Polygon",
        # Here are our actual geometry coordinates
        "coordinates": [
            [
                [-77.3153999999999968, 38.8239999999999981],
                [-77.2956694620074671, 38.8239999999999981],
                [-77.2956694620074671, 38.8392882996798647],
                [-77.3153999999999968, 38.8392882996798647],
                [-77.3153999999999968, 38.8239999999999981],
            ],
        ],
    },
    # And an example geometry ID, although not strictly necessary for this task
    "properties": {"id": "GMU"},
}
my_geojson

## Unzipping your downloaded imagery

Before you can begin using your imagery, you should unzip and correctly file your data. 

In the code below you have all the basic processing to:

- Obtain the name of all files in your directory.
- Create a new list to hold the filenames for the files to be unzipped.
- Loop over all files and append only those that are .zip to the new list.
- Create a new folder called `unzipped` for your data, if it doesn't already exist.
- Loop over your files, extracting the data and placing them in the `unzipped` folder. 
- Remove the .zip files once unpacked.


In [None]:
import os
import zipfile

# Get a list of all filenames in our directory
all_filenames_in_folder = os.listdir() 

# Create an empty list for the filenames we want to unzip
filenames_to_unzip = []

# Loop over filenames and put .zip files in our list
for filename in all_filenames_in_folder:
    if filename.endswith('.zip'): # Only let .zip files append 
        filenames_to_unzip.append(filename)

# Let's create a new folder for our unzipped files
folder = 'unzipped'
if not os.path.exists(folder):
    os.mkdir(folder) # Make the folder if it does not exist 

for filename in filenames_to_unzip:
    
    # Unzip the zip file and put it in the 'unzipped' folder
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall(folder)

    os.remove(filename)

## Wrapping up

Once complete, you should be all set to begin exploring your Sentinel-2 imagery - woohoo!

Congratulations, you have completed the API tutorial.

This is not an easy topic to get to grips with, but is incredibly powerful if you can learn to master it.

Now you should have a good tool to utilize in your group coursework projects for GGS416.
