# Open SAR Toolkit (OST) - Jupyter Notebook
## Sentinel-1 data inventory and download for large-scale time-series/timescan production

This notebook helps you to find specific Sentinel-1 data products for a given area and time of interest.

**The general idea is a 3-step approach:**
- **Step 1:** Data inventory from full catalogue creating a preliminary inventory file of available acquisitions
- **Step 2:** Refine the inventory for homogeneous coverage compatible with the further processing logic
- **Step 3:** Download the data to your local machine/AWS instance

The **output** is either a shapefile or a table in a PostgreSQL/PostGIS database (which needs to be configured beforehand). It is the basis for the subsequent steps of data refinement, download and processing.

### 1. (just execute) Add libraries needed to make the notebook work

Please execute this cell (Shift + Enter) in order to add the necessary functionality to make the subsequent commands work.

In [None]:
# Add some standard python libs 
import os

# Add the Open SAR Toolkit libs 
from ost.helpers import scihub, vector as vec
from ost.s1 import search, refine, download

### 2. Create a project directory and sub-folders, where all of the data and metadata will be organized

Each processing run should go into a separate project folder. Inside a project folder, a pre-defined set of subfolders will be created. These subfolders store the different data products generated during the complete workflow (e.g. downloaded products, inventory shapefiles, etc.).

Make sure that your project folder is located on a disk with enough free space. One Sentinel-1 GRD frame is about 1 GB, and all data will be downloaded before processing.
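Since everything is downloaded before processing starts, it can help to check the free space up front. The sketch below uses the standard library's `shutil.disk_usage`; the helper name `enough_disk_space` and the 1.5 safety margin are assumptions, and the 1 GB per frame is only the rough estimate given above. None of this is part of OST.

```python
import shutil

GB = 1024 ** 3  # bytes per gigabyte (binary)


def enough_disk_space(path, n_frames, frame_size_gb=1.0, margin=1.5):
    """Hypothetical helper: True if `path` has room for `n_frames` downloads.

    Assumes ~1 GB per Sentinel-1 GRD frame and an arbitrary safety margin
    for later processing outputs; both numbers are rough guesses.
    """
    free_bytes = shutil.disk_usage(path).free
    needed_bytes = n_frames * frame_size_gb * margin * GB
    return free_bytes >= needed_bytes


# e.g. check whether the current directory could hold 200 frames
print(enough_disk_space('.', 200))
```

In the notebook you would pass the project directory (`prjDir`) once it exists, together with the number of scenes found by the search.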

If you are familiar with Python, you can also move the subfolders to different locations, but you need to keep track of them for the processing (Sentinel-1 large-scale Time-series/Timescan processing notebook).

In [None]:
# the main project directory (adjust this path to your own machine)
prjDir = '/home/avollrath/OSTdemo'

#----------------------------------------------
# do not edit this part
os.makedirs(prjDir, exist_ok=True)

# this is where we download the original scenes
dwnDir = '{}/download'.format(prjDir)
os.makedirs(dwnDir, exist_ok=True)

# this folder will be used for the inventory shape files
invDir = '{}/inventory'.format(prjDir)
os.makedirs(invDir, exist_ok=True)

# this folder will be used for the processed data
prcDir = '{}/processed'.format(prjDir)
os.makedirs(prcDir, exist_ok=True)

### 3. Defining the search parameters

Similar to the Copernicus scihub interface, you need to define some search parameters.
A special feature is that you can search directly for the available data of a specific country by using its ISO3 country code. A list of ISO3 country codes can be found here: https://unstats.un.org/unsd/tradekb/knowledgebase/country-code

Also note that, except for the output parameter, all parameters can be commented out. In this case, a wildcard operator is used to disable the filter for that specific search parameter.

In [None]:
#----------------------------
# Area of interest
#----------------------------

# Here we can either point to a shapefile or use
# an ISO3 country code to select national boundaries.
# To search the whole globe, comment out the aoi line.

#aoi = '/path/to/a/shapefile'     # absolute path to a shapefile
aoi = 'ECU'                       # alternatively, an ISO3 country code


#----------------------------
# Time of interest
#----------------------------

# Here we set the start and end date for the time period of interest. 
# If you comment both out, the full mission time period up to today will be considered. 
startDate = '2017-01-01'                              # date format (YYYY-MM-DD)
endDate = '2017-12-31'                                # date format (YYYY-MM-DD)

#----------------------------
# Output file/table
#----------------------------

# This can either be a shapefile, in which case it needs to end with '.shp',
# or point to an existing or non-existing PostgreSQL table
output = '{}/fullInventory.shp'.format(invDir)         # name of a PostgreSQL table or a shapefile


#----------------------------
# Product Type Specification 
#----------------------------

# Here we define what kinds of products we are looking for. You can comment out the ones for which you 
# want to retrieve all types of products.
prdType = 'GRD'                                       # RAW, SLC, GRD or * (for all)
polarisation = '*'                                    # VV, VH, HH, HV or * (for all)
beamMode = 'IW'                                       # IW, EW, SM or * for all
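To illustrate how the parameters above, including the `'*'` wildcard, might be combined into a single catalogue query, here is a hypothetical query-string builder. It is not OST's `scihub.createQuery`, which may build the string differently. The keywords (`platformname`, `producttype`, `polarisationmode`, `sensoroperationalmode`, `beginposition`, `footprint`) follow the Copernicus Open Access Hub OpenSearch syntax as we understand it, and the fallback start date is the Sentinel-1A launch date (3 April 2014).

```python
def build_s1_query(prd_type='*', polarisation='*', beam_mode='*',
                   start_date=None, end_date=None, aoi_wkt=None):
    """Hypothetical helper: join the search parameters into one query string.

    '*' acts as the wildcard that disables a filter. Missing dates fall back
    to the Sentinel-1A launch date and NOW (full mission period up to today).
    """
    start = ('{}T00:00:00.000Z'.format(start_date) if start_date
             else '2014-04-03T00:00:00.000Z')
    end = '{}T23:59:59.999Z'.format(end_date) if end_date else 'NOW'
    parts = [
        'platformname:Sentinel-1',
        'producttype:{}'.format(prd_type),
        'polarisationmode:{}'.format(polarisation),
        'sensoroperationalmode:{}'.format(beam_mode),
        'beginposition:[{} TO {}]'.format(start, end),
    ]
    if aoi_wkt:  # no AOI means no spatial filter, i.e. the whole globe
        parts.append('footprint:"Intersects({})"'.format(aoi_wkt))
    return ' AND '.join(parts)


# illustrative only: a rough box around mainland Ecuador
print(build_s1_query('GRD', '*', 'IW', '2017-01-01', '2017-12-31',
                     'POLYGON((-81 -5, -75 -5, -75 1.5, -81 1.5, -81 -5))'))
```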

### 4. (just execute) Trigger the search

You **SHOULD NOT** change anything in this cell. Just execute it with Shift+Enter.
Your _**Copernicus scihub credentials**_ will be asked and you will need a working internet connection.

In case you do not have a scihub account, please go here to register: https://scihub.copernicus.eu/dhus/#/home

**PLEASE NOTE** that OST actually queries the Copernicus Apihub (i.e. a different server), to which user credentials are transferred only about a week after registration to the standard open scihub. So you may need to wait several days after your first registration before it works. For more info, go here:
https://scihub.copernicus.eu/twiki/do/view/SciHubWebPortal/APIHubDescription

In [None]:
# construct the search command (do not change)
aoiStr = scihub.createAoiWkt(aoi)
toiStr = scihub.createToiStr(startDate, endDate)
prodSpecsStr = scihub.createS1ProdSpecs(prdType, polarisation, beamMode)
query = scihub.createQuery('Sentinel-1', aoiStr, toiStr, prodSpecsStr)
uname, pword = scihub.askScihubCreds()

# execute Search
search.s1Scihub(query, output, uname, pword)

### 5. (just execute) Display the initial results

Execute this cell and it will tell you how many scenes have been found and display a map with the AOI and the footprints.

In [None]:
%matplotlib inline
import matplotlib.pylab as pylab
pylab.rcParams['figure.figsize'] = (13, 13)

# re-read output file into a GeoDataFrame for further steps
footprintGdf = refine.readS1Inventory(output)

print(' INFO: Found {} products'
      ' for {}'
      ' between {} and {}'.format(len(footprintGdf), aoi, startDate, endDate))

vec.plotInv(aoi, footprintGdf)

### 6. (just execute) Search Refinement

The results returned by the search on Copernicus scihub might not fully match what we are looking for. In this step we refine the results, addressing possible issues and reducing later processing needs.

A first step **splits the data** by **orbit direction** (i.e. ascending and descending) and **polarization mode** (i.e. VV, VV/VH, HH, HH/HV) and checks the AOI coverage of each resulting combination (e.g. descending VV/VH). If a combination does not fully cover the AOI, it is discarded and no further steps are applied to it. If full coverage is possible, the following refinement steps are taken:

1. Some of the acquisition frames might have been processed and/or stored **more than once** in the ESA ground segment. They therefore appear twice, with scene identifiers that differ only in the last 4 digits. Those scenes need to be identified in order to avoid redundancy; we keep the one with the latest ingestion date to ensure that the latest processor version is used.

2. Some of the scenes returned by the search query actually **do not overlap the AOI**. This is because the search algorithm checks for data within the rectangle defined by the outer bounds of the AOI geometry, not within the AOI itself. The refinement keeps only the frames that overlap the AOI in order to reduce unnecessary processing later on.

3. In the case of **ascending tracks that cross the equator**, the **orbit number** of the frames will **increase by 1** even though they are practically from the same acquisition. During processing the frames need to be merged and the relative orbit numbers (i.e. tracks) should be the same. The metadata in the inventory is therefore updated in order to normalize the relative orbit number.

4. (optional) The tracks of Sentinel-1 overlap to a certain degree. The data inventory might return tracks that only **marginally cross the AOI**, while their AOI overlap is already covered by the adjacent track. Thus, tracks that do not contribute to the overall coverage of the AOI are disregarded.

5. (optional) Some acquisitions might **not cross the entire AOI**. This becomes problematic for the subsequent time-series/timescan processing, since the time-series generation only considers the region covered by all acquisitions of a track.

6. A similar issue appears when one track **crosses the AOI twice**. In other words, some of the frames in the middle of the track do not overlap the AOI and have already been disregarded in step 2. Assembling the non-consecutive frames during processing would then fail. The metadata in the inventory is consequently updated: the relative orbit number of the first part is renamed to XXX.1, that of the second part to XXX.2, and so on. During processing, those acquisitions are handled as separate tracks and only merged during the final mosaicking.

7. (optional) A final step ensures that each mosaic in time, which consists of different tracks, is covered only once by each track.
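Steps 1 and 6 of the list above can be sketched in plain Python. This is not OST's `refine.searchRefinement`; it assumes each product is a dict with hypothetical `identifier` and `ingestiondate` keys (ISO 8601 strings, so string comparison orders them in time), and that the frames of one track can be ordered by a hypothetical along-track slice number. The identifiers in the example are made up.

```python
from itertools import groupby


def drop_reprocessed(products):
    """Step 1: keep only the latest-ingested copy of each acquisition frame.

    Copies of the same frame are assumed to differ only in the last 4
    characters of their (hypothetical) 'identifier' value.
    """
    latest = {}
    for prod in products:
        key = prod['identifier'][:-4]
        if key not in latest or prod['ingestiondate'] > latest[key]['ingestiondate']:
            latest[key] = prod
    return list(latest.values())


def split_track(track, slice_numbers):
    """Step 6: label each run of consecutive along-track slices XXX.1, XXX.2, ...

    A gap in the slice numbers means the track left the AOI and re-entered it.
    """
    labels = {}
    ordered = sorted(slice_numbers)
    # within a run of consecutive numbers, value minus position is constant
    runs = groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0])
    for part, (_, run) in enumerate(runs, start=1):
        for _, slice_no in run:
            labels[slice_no] = '{}.{}'.format(track, part)
    return labels


# made-up example data
products = [
    {'identifier': 'FRAME_A_8A3F', 'ingestiondate': '2017-01-01T12:00:00'},
    {'identifier': 'FRAME_A_1B2C', 'ingestiondate': '2017-03-05T08:00:00'},
    {'identifier': 'FRAME_B_77D0', 'ingestiondate': '2017-01-01T12:05:00'},
]
print([p['identifier'] for p in drop_reprocessed(products)])
# ['FRAME_A_1B2C', 'FRAME_B_77D0']
print(split_track('142', [3, 4, 5, 9, 10]))
# {3: '142.1', 4: '142.1', 5: '142.1', 9: '142.2', 10: '142.2'}
```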

In [None]:
# do the search refinement and save all mosaic combinations to 
# the inventory folder inside your project directory
invDict, covDict = refine.searchRefinement(aoi, footprintGdf, invDir)

# summing up information
print('')
print('--------------------------------------------')
print(' Summing up the info about mosaics')
print('--------------------------------------------')
for key in invDict:
    print(' {} mosaics for {}'.format(covDict[key], key))

### 7. Select best combination and visualize the footprints

The above search refinement checks all possible combinations of orbit direction (i.e. ascending or descending) and polarisation mode (i.e. single-pol VV, dual-pol VV&VH, single-pol HH or dual-pol HH&HV).

Outside Europe, usually there is only sufficient data for one orbit direction. Based on the above information, choose the best combination, e.g. DESCENDING_VVVH.
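If you prefer to pick the combination programmatically, a short sketch follows. It assumes that `covDict` maps each combination key (e.g. `'DESCENDING_VVVH'`) to a number of mosaics, as suggested by the summary printed by the refinement cell; the function name `best_combination` is hypothetical and the counts in the example are made up.

```python
def best_combination(cov_dict):
    """Hypothetical helper: return the key with the most mosaics.

    Assumes cov_dict values are mosaic counts; on a tie, max() returns the
    first key encountered.
    """
    return max(cov_dict, key=cov_dict.get)


# made-up counts for illustration
print(best_combination({'ASCENDING_VVVH': 2, 'DESCENDING_VVVH': 11, 'DESCENDING_VV': 4}))
```

In the notebook, `mosaicKey = best_combination(covDict)` would replace the hard-coded key, although checking the plotted footprints by eye is still worthwhile.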

In [None]:
# choose the orbit direction and polarisation combination with the most mosaics
mosaicKey = 'DESCENDING_VVVH'

# do not edit below
if mosaicKey not in invDict.keys():
    print(' ERROR: the combination is not available.'
          ' Make sure the spelling is correct and that sufficient data has been found before.')

else: 
    
    pylab.rcParams['figure.figsize'] = (18, 18)
    vec.plotInv(aoi, invDict[mosaicKey])

### 8. (just execute) Download data

Now that we have a refined selection of the scenes we want to process, we can go on and download them. 

The main entry point is the official ESA scihub catalogue. However, it is limited to 2 concurrent downloads.
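The effect of a limit on concurrent downloads can be sketched with the standard library's `ThreadPoolExecutor`: however many scenes are queued, at most `max_workers` transfers run at once. `fake_download` is a placeholder that only sleeps; this does not reproduce OST's `download.downloadS1`, and how that function applies its `concurrent` argument for each mirror is not shown here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial

MAX_CONCURRENT = 2  # the scihub limit mentioned above


def fake_download(scene, state):
    """Placeholder for a real download; records how many run at once."""
    with state['lock']:
        state['active'] += 1
        state['peak'] = max(state['peak'], state['active'])
    time.sleep(0.05)  # pretend to transfer data
    with state['lock']:
        state['active'] -= 1
    return scene


def download_all(scenes, max_workers=MAX_CONCURRENT):
    """Run the placeholder downloads with at most max_workers in parallel."""
    state = {'lock': threading.Lock(), 'active': 0, 'peak': 0}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        done = list(pool.map(partial(fake_download, state=state), scenes))
    return done, state['peak']


done, peak = download_all(['scene_{}'.format(i) for i in range(6)])
print(len(done), peak)  # peak never exceeds MAX_CONCURRENT
```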

A good alternative is the download mirror of the Alaska Satellite Facility (ASF), which provides the full archive of Sentinel-1 data. To register, go to their data portal at https://vertex.daac.asf.alaska.edu. If you already have a NASA Earthdata account, make sure you have signed the EULAs needed to access the Copernicus data. A good practice is to try a download directly from the Vertex data portal to ensure that everything works.

By executing the following cell, you can select the data portal from which you want to download.

In [None]:
# the OST download routine 
# TODO: add a routine that checks the integrity of the downloaded zip files
download.downloadS1(invDict[mosaicKey], dwnDir, concurrent=10)
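The comment in the download cell notes that a zip check routine is still missing. A minimal sketch using the standard `zipfile` module follows: `ZipFile.testzip()` reads every member and returns the name of the first one with a bad CRC (or `None`), while a truncated download typically fails earlier with `BadZipFile`. The helper name `find_broken_zips` is hypothetical; in this notebook you could call `find_broken_zips(dwnDir)` and re-download the listed scenes. Reading every member of 1 GB archives takes a while.

```python
import os
import tempfile
import zipfile


def find_broken_zips(directory):
    """Hypothetical helper: paths of .zip files under `directory` that fail an integrity test."""
    broken = []
    for root, _, files in os.walk(directory):
        for name in files:
            if not name.lower().endswith('.zip'):
                continue
            path = os.path.join(root, name)
            try:
                with zipfile.ZipFile(path) as zf:
                    if zf.testzip() is not None:  # name of the first corrupt member
                        broken.append(path)
            except zipfile.BadZipFile:  # truncated file or not a zip at all
                broken.append(path)
    return broken


# demo on a temporary folder with one valid and one truncated archive
with tempfile.TemporaryDirectory() as tmp:
    with zipfile.ZipFile(os.path.join(tmp, 'good.zip'), 'w') as zf:
        zf.writestr('manifest.safe', 'ok')
    with open(os.path.join(tmp, 'truncated.zip'), 'wb') as fh:
        fh.write(b'PK\x03\x04 not really a zip')
    broken_names = [os.path.basename(p) for p in find_broken_zips(tmp)]

print(broken_names)  # ['truncated.zip']
```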