## Downloading rasters from the WAPOR data portal

#### Introduction

The waterpip package is built around the retrieval and analysis of rasters from the WAPOR portal. Data is downloaded from the WAPOR portal with the script:

*waterpip\scripts\retrieval\wapor_retrieval.py*

This notebook guides you through that first important step, downloading the data, using the class **WaporRetrieval** found in the script *wapor_retrieval.py*.

### **Steps**:<br>

1. Import the modules and functions needed<br><br> 

2. Get a download API token from the WAPOR portal<br><br> 

3. Activate (initiate) the class **WaporRetrieval**: this Python class holds all the functions used to interact with the WAPOR portal and retrieve information from it. It is built on top of the class **WaporAPI**, originally written by Bich Tran at IHE Delft for the various open source WAPOR packages released by IHE Delft.<br><br>  

4. Run the function *download_wapor_rasters*: this function downloads rasters from the WAPOR portal according to the user's requirements, then processes, masks and stores them accordingly.<br><br>  

5. Find where the retrieved data was stored<br><br> 

6. Visualise the data

NOTE: If this is your first time running this notebook, please read the instructions below and follow the steps. Otherwise feel free to use the notebook as you wish.

***

NOTE: Reading the following is not required but it is advised

## A quick guide to the waterpip package scripts and the automatic folder structure used in the classes.

#### Scripts:

When you run the functions in some of the scripts in this package, specifically those belonging to the classes, the files used and created are stored and retrieved in a standardised way. File names and paths, as well as folder names and paths, follow a fixed convention. This is necessary to automate much of the process when using the pipelines provided.

This standardised structure is specified in the class **WaporStructure**, found in the file *waterpip\scripts\structure\wapor_structure.py*, if you want to take a closer look.

To put it simply, all the scripts containing classes follow the standardised file and folder structure set by **WaporStructure**:

- **WaporStructure**: *waterpip\scripts\structure\wapor_structure.py*
- **WaporRetrieval**: *waterpip\scripts\retrieval\wapor_retrieval.py*
- **WaporAnalysis**: *waterpip\scripts\analysis\wapor_analysis.py*

All the general functions (tools) that support these classes can also be used outside them:

- *waterpip\scripts\support\raster.py*
- *waterpip\scripts\support\vector.py*
- *waterpip\scripts\support\statistics.py*

Feel free to use them however you want.

#### Automatic folder structure:

All downloaded and created files are stored in subdirectories under the directory specified on class activation (*waterpip_directory*). File names are also generated automatically from the inputs given by the user.

The main folder structure is as follows:

- **main directory** : working directory specified by the user <br>

    - *<user_specified_waterpip_directory>*<br><br> 

- **metadata folder** : directory made to hold general WAPOR information (metadata) that is used during downloads and may be useful for the user to check, such as the WAPOR catalogue <br>

    - *<user_specified_waterpip_directory>\metadata*<br><br> 

- **project directory** : directory specific to the project or analysis the user is carrying out. In most cases it is suggested that this relates to a specific area/region/bounding box <br>

    - *<user_specified_waterpip_directory>\<user_specified_project_name>*<br><br> 

- **wapor level** : level-specific directory within the project directory, used to split the data between WAPOR levels. Each analysis is specific to one WAPOR level (1, 2 or 3) <br>

    - *<user_specified_waterpip_directory>\<user_specified_project_name>\l<user_specified_wapor_level>*<br><br> 

In each wapor level directory there are 7 standard subdirectories used to organise and hold any data created:

    - *00_reference*: reference data is stored here (masks, related shapefiles etc.) (tiff's, csv's and shp's)
    - *01_download*: raw downloaded rasters are stored here (tiff's and vrt's)
    - *02_processed*: processed rasters are stored here (tiff's and vrt's)
    - *03_masked*: masked rasters are stored here (tiff's and vrt's)
    - *04_analysis*: analysis outputs (halfway products) are stored here (tiff's and vrt's, shp's and xls)
    - *05_results*: results (products) are stored here (tiff's and vrt's, shp's and xls)
    - *06_images*: any image results are stored here (png's, jpeg's)

- NOTE:<br><br> 

- folders 01 and 02 are input folders and data is stored there according to wapor naming standards.<br><br> 

- folders 00 and 03 -> 06 are output folders and all data is stored there in subfolders according to the mask provided during each analysis
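
The folder layout described above can be sketched with `pathlib`. This is only an illustration and not the code used by **WaporStructure**: the helper `expected_folders` is hypothetical, and the level folder is assumed here to be named `L<level>` (the exact name the package uses may differ).

```python
from pathlib import Path

# sub-folder names taken from the list above
STANDARD_SUBFOLDERS = [
    "00_reference", "01_download", "02_processed", "03_masked",
    "04_analysis", "05_results", "06_images",
]


def expected_folders(waterpip_directory, project_name, wapor_level):
    """Hypothetical helper: build the folder paths described in this section."""
    root = Path(waterpip_directory)
    # assumption: the level directory is named L1, L2 or L3
    level_dir = root / project_name / f"L{wapor_level}"
    folders = {"metadata": root / "metadata"}
    for name in STANDARD_SUBFOLDERS:
        folders[name] = level_dir / name
    return folders


folders = expected_folders("waterpip_data", "test", 3)
for name, path in folders.items():
    print(f"{name}: {path}")
```

Nothing is created on disk here; the sketch only shows how the main directory, project name and wapor level combine into paths.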

***

## 1. Import modules/libraries

In [None]:
import os
from datetime import datetime
from waterpip.scripts.retrieval.wapor_retrieval import WaporRetrieval
print('class imported successfully, you are at the starting line')

***
## 2. Get a download token from the WAPOR website

Get your API token from https://wapor.apps.fao.org/profile. Once you have it, pass it as an argument when initiating the class below, as api_token='<your_token_goes_here>'. Remember to use quotes ('') so that it is recognized as a string object.
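
If you prefer not to paste the token directly into the notebook, one option is to read it from an environment variable. The sketch below is a suggestion only: the variable name `WAPOR_API_TOKEN` and the helper `read_api_token` are assumptions and are not part of the waterpip package.

```python
import os


def read_api_token(env_var="WAPOR_API_TOKEN"):
    """Hypothetical helper: read the token from an environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise ValueError(
            "no token found, set the environment variable {} first".format(env_var))
    return token


try:
    api_token = read_api_token()
except ValueError as error:
    # no token set yet: fall back to passing it as a plain string instead
    api_token = None
    print(error)
```

The resulting string can then be passed as `api_token=api_token` when initiating the class.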


***
## 3. Initiate/activate the class **WaporRetrieval**. 

**Background info**: 

The class **WaporRetrieval** is built on top of (inherits from) an edited version of the class **WaporAPI**, originally written by Bich Tran at IHE Delft for the various open source WAPOR packages released by IHE Delft. It is this class that allows access to the data on the WAPOR portal.

It is a great package for accessing the WAPOR data via the API. If you want more flexibility in your implementation, or want to dive into the code directly, I recommend you check out the original code available via their packages on Git. You can also check out the edited version of their **WaporAPI** class included in this package.

### **Activating the class**:

To initiate the class, enter or edit the following inputs:

#### Required Inputs:

- **waterpip_directory**: path to the directory where the project-specific directory will be created. On activation the class *WaporRetrieval* automatically creates a new directory named after the input *project_name*, along with subfolders to organise the data. The functions that follow automatically use these folders.<br><br> 

- **shapefile_path**: path to the shapefile that specifies the area to download data for, as well as the projection (crs) to output it in. The function retrieves the data for the area(s) shown in the shapefile.<br>

    - **Note**: A shapefile is required and provides much of the info needed for the project, including the extent and the output projection. Any projection (crs) is accepted. WAPOR data is always downloaded in EPSG:4326, and the shapefile bounding box is transformed as needed to match. The retrieved data is then transformed again, if needed, to match the projection (crs) of the input shapefile.<br><br>  

- **wapor_level**: level of WAPOR data to download. There are 3 levels: low resolution 250m (1), mid resolution 100m (2) and high resolution 30m (3). All of Africa and part of the Middle East is available at level 1. Specific countries are available at level 2. Only some specific locations, around the size of valleys or hydrosheds, are available at level 3. For more info on the levels please see: https://wapor.apps.fao.org/home/WAPOR_2/1. <br> 

    - **Note**: When running, a spatial check is carried out on the download area specified in your shapefile to see if data is available for it at the given level (only level 1 and 3 spatial checks exist currently). Error messages provide details.<br><br> 

- **api_token**: the API token retrieved from the WAPOR site goes here. See the instructions above on how to retrieve a token from the WAPOR website.<br><br>

#### Optional Inputs:

The following inputs are optional. They can also be provided to many of the class functions when running them.

The advantage of passing them during class setup/initialisation is that it is easy to use the class functions repeatedly with the same parameters and inputs. That way you are assured they will always run the same way.

The advantage of passing them to the class functions when running them is flexibility. By changing a few of the optional inputs you can retrieve a different set of data each time you run a function, while maintaining the required class structure (folder structure, wapor level, area of interest (shapefile) and API token).

- **project_name**: name of the directory that will be created. All data retrieved and analysed can be found in here. Automatically set to *test* if not provided.<br><br> 

- **period_start**: date you want to start your data download from, entered as a datetime object. This can also be provided later when running the class functions. Automatically set to 30 days before the day the code is run if not provided.<br><br> 

- **period_end**: date you want to end your data download at, entered as a datetime object. This can also be provided later when running the class functions. Automatically set to the day the code is run if not provided. <br>

    - **datetime objects**: A specific way of representing dates in Python. It consists of the function datetime followed by the date in brackets, split into the sections: year (4 digits), month (1 or 2 digits), day (1 or 2 digits). (Search for "python datetime object" for more details.)<br>

        - *Example*: November 4th 2020 or 4-11-2020: datetime(2020,11,4)<br>

        - *Note*: do not use leading zeros for single-digit values (1 not 01).<br><br>  

- **return_period**: return period to download data for, given as a single letter code. Available periods include: I: Daily, D: Dekadal, S: Seasonal, A: Annual (yearly). This can also be provided later when running the class functions. Automatically set to Dekadal (D) if not provided.<br><br> 

- **datacomponents**: datacomponents (parameters of interest such as transpiration and net primary productivity) to download data for. These are given as short code strings in a list, separated by a ',', such as: ['T', 'NPP']. If you set the datacomponents input to ['ALL'], all datacomponents available for that return period and level at that location are downloaded. This can also be provided later when running the class functions. Automatically set to ['ALL'] if not provided.<br><br> 

- **silent**: boolean option automatically set to False. If set to True the more general messages shared with the user when running the class will be turned off.<br><br> 
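
As a small illustration of the optional inputs described above, the sketch below builds the period inputs as datetime objects and reproduces the documented defaults (end on the day of running, start 30 days earlier). How the class computes its defaults internally is an assumption here; only the standard library is used.

```python
from datetime import datetime, timedelta

# November 4th 2020, written without leading zeros
period_start = datetime(2020, 11, 4)
# an end date 30 days after the start date
period_end = period_start + timedelta(days=30)

# assumption: the documented defaults correspond to something like this
default_period_end = datetime.now()
default_period_start = default_period_end - timedelta(days=30)

# codes taken from the descriptions above
return_period = 'D'              # Dekadal
datacomponents = ['T', 'NPP']    # list of code strings

print('custom period: {} to {}'.format(period_start.date(), period_end.date()))
print('default period: {} to {}'.format(
    default_period_start.date(), default_period_end.date()))
```

These variables could then be passed as `period_start=period_start`, `period_end=period_end`, `return_period=return_period` and `datacomponents=datacomponents`, either when initiating the class or when running its functions.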

In [None]:
# activation of the wapor retrieval class 
retrieval = WaporRetrieval(            
    waterpip_directory=r'directory to store your projects',
    shapefile_path=r"path to the shapefile containing the analysis area",
    wapor_level=3,
    project_name='name for the project specific folder',
    api_token='your api token goes here')

***
### 3.1 Check out the level catalogues and availability shapefile

#### Wapor Catalogs:

- When you run the class **WaporRetrieval** for the first time, it automatically downloads a catalog of the data available at levels 1, 2 and 3 as .csv files and stores them in:<br>

    - *<user_specified_waterpip_directory>\metadata*<br><br>

    - These catalogs are useful for finding out what data is available on the WAPOR portal, as well as which codes represent which datasets/countries/time periods. When a code is passed incorrectly to a function of the **WaporRetrieval** class, the user is also given feedback on which codes are available.

#### Wapor level 3 availability shapefile:

- When you run the class **WaporRetrieval** for the first time, it automatically generates a level 3 availability shapefile and also stores it in:<br>

    - *<user_specified_waterpip_directory>\metadata*<br><br>

    - This shapefile shows the areas for which WAPOR level 3 data is available. It is also used to check whether level 3 data is available for a given area when attempting to download level 3 data using a shapefile, and it provides the level 3 country code required by the **WaporAPI** to download data for that area if it is available. <br><br>
    
- NOTE: On activating the class these files are automatically checked for and downloaded again if they are missing (for example because they were deleted). If the files are older than 2 months they are also downloaded again.
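
The refresh rule in the note above (download again if a file is missing or older than 2 months) can be sketched as follows. This is not the package's own check: the helper `needs_refresh` and the file name are hypothetical, and "2 months" is approximated as 60 days.

```python
import os
import tempfile
import time

MAX_AGE_DAYS = 60  # assumption: "2 months" approximated as 60 days


def needs_refresh(file_path, max_age_days=MAX_AGE_DAYS):
    """Hypothetical helper: True if the file is missing or older than max_age_days."""
    if not os.path.exists(file_path):
        return True
    age_seconds = time.time() - os.path.getmtime(file_path)
    return age_seconds > max_age_days * 24 * 60 * 60


with tempfile.TemporaryDirectory() as metadata_dir:
    catalogue = os.path.join(metadata_dir, 'catalogue_example.csv')
    print('missing file:', needs_refresh(catalogue))

    with open(catalogue, 'w') as handle:
        handle.write('code,caption\n')
    print('file just written:', needs_refresh(catalogue))

    # pretend the file was last modified 90 days ago
    ninety_days_ago = time.time() - 90 * 24 * 60 * 60
    os.utime(catalogue, (ninety_days_ago, ninety_days_ago))
    print('file 90 days old:', needs_refresh(catalogue))
```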

***
## 4. Download data from the WAPOR portal

After activating the class **WaporRetrieval** it is possible to download data from the WAPOR portal using the function: *download_wapor_rasters*. 

### Description

*download_wapor_rasters* is made up of two sub functions: *retrieve_wapor_download_info* and *retrieve_wapor_rasters*. To help you understand what is going on inside both, here is some more info.<br><br>

- *retrieve_wapor_download_info*: for each raster to be downloaded, sets up a download and preprocessing dictionary containing all the info needed to retrieve that raster from the WAPOR portal, including what to call each file and where to store it, the preprocessing info and the download url <br><br>

    - **NOTE**: if you like, you can call this function multiple times in a loop with different parameters and combine the output lists using the Python list method extend() to make one list for input into the follow-up function *retrieve_wapor_rasters*<br><br>

- *retrieve_wapor_rasters*: retrieves the actual rasters using the url provided by *retrieve_wapor_download_info* and carries out all preprocessing of the rasters according to the information found in the dictionaries returned by *retrieve_wapor_download_info*. The standardised file paths provided in the dictionaries also allow previously downloaded files to be found and skipped. <br><br> 

The reason why *download_wapor_rasters* is split into two sub functions (aside from better coding practice) is so that you can retrieve different sets of download info and group them together. That way you can make multiple calls to *retrieve_wapor_download_info* with different parameters and then retrieve and format all the rasters at the same time, in the same way, using *retrieve_wapor_rasters*.
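
The pattern of calling the info function several times and combining the results with extend() can be sketched as below. To keep the sketch runnable without portal access it uses a stand-in function: `stand_in_download_info`, its arguments and the dictionary keys are all assumptions, not the real signature or output of *retrieve_wapor_download_info*.

```python
def stand_in_download_info(datacomponents, return_period):
    """Stand-in for retrieve_wapor_download_info, assumed to return a list of dictionaries."""
    return [
        {'datacomponent': datacomponent, 'return_period': return_period}
        for datacomponent in datacomponents
    ]


# different parameter sets to collect download info for
requests = [(['T'], 'D'), (['NPP'], 'D'), (['AETI'], 'A')]

download_info = []
for datacomponents, return_period in requests:
    # extend() adds the items of each returned list to one combined list
    download_info.extend(stand_in_download_info(datacomponents, return_period))

print('{} download dictionaries collected'.format(len(download_info)))
```

In practice the combined list would then be handed to *retrieve_wapor_rasters*, so that all rasters are retrieved and preprocessed in one go.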

***
### 4.1 download rasters from the WAPOR portal

To run the **WaporRetrieval** class function *download_wapor_rasters* you need to provide the following inputs:

#### Required Inputs:

- **None**: all inputs can be supplied beforehand when activating the class, so there are no required inputs. You may often want to change one of those inputs when rerunning, however, so see the optional list for details.<br>

#### Optional Inputs:

- **period_start**: date you want to start your data download from, entered as a datetime object. This could also have been provided when initiating the class.<br><br>

- **period_end**: date you want to end your data download at, entered as a datetime object. This could also have been provided when initiating the class.<br>

    - NOTE: see the class explanation above for more details on *datetime objects*<br><br>

- **return_period**: return period to download data for, given as a single letter code. Available periods include: I: Daily, D: Dekadal, S: Seasonal, A: Annual (yearly). This could also have been provided when initiating the class.<br><br>

- **datacomponents**: datacomponents (parameters of interest such as transpiration and net primary productivity) to download data for. These are given as short code strings in a list, separated by a ',', such as: ['T', 'NPP']. If you set the datacomponents input to ['ALL'], all datacomponents available for that return period and level at that location are downloaded. This could also have been provided when initiating the class.<br><br>

- **template_raster_path**: if provided, the raster is used as a template: the dimensions of all retrieved rasters are matched to this raster and all retrieved rasters are also masked to it. If not provided, the first downloaded raster in the download list is automatically used as the template raster.<br>

    - NOTE: make sure you provide a matching mask_folder name if you provide a template raster yourself<br><br>

- **mask_folder**: the subfolder where processed data is stored. If no name is provided it is automatically set to nomask. If there is already data in the mask folder the download will not occur, as it is assumed the data already exists. 

    - NOTE: The purpose of the mask_folder is to let you carry out an analysis in the same area (bbox) for multiple different masks while skipping the download. Downloaded rasters are deleted after preprocessing but preprocessed rasters are kept, so for a new mask in the same area the code reuses the existing preprocessed rasters, masks them to the new mask and stores them in the new mask subfolder.<br><br>

- **output_nodata**: nodata value to use for the retrieved rasters, automatically set to -9999 if not provided<br>

#### Output:

- a Python dictionary of dictionaries. Each inner dictionary is named after a datacomponent and contains the list of rasters downloaded for that datacomponent (*raster_list*) and the path to a vrt that compiles all those rasters into one (*vrt_path*)

    - NOTE: a raster list consists of all the rasters found for a datacomponent between the two period dates given, at the interval specified by the return period.

In [None]:
# run the code and download rasters from the wapor portal
retrieved_rasters = retrieval.download_wapor_rasters(
    datacomponents=['T'])

# see next code cell for the results 

In [None]:
# print the list of retrieved rasters for T (the datacomponent downloaded above)
print('retrieved T rasters:\n {} \n'.format(retrieved_rasters['T']['raster_list']))

# print the path to the T vrt
print('path to the retrieved T vrt:\n {}'.format(retrieved_rasters['T']['vrt_path']))
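
If several datacomponents were downloaded, the output dictionary can be looped over instead of indexing one datacomponent at a time. The sketch below uses a mock dictionary with placeholder file names; only the keys *raster_list* and *vrt_path* come from this notebook, and the helper `summarise_retrieved` is hypothetical.

```python
def summarise_retrieved(retrieved):
    """Hypothetical helper: number of rasters and vrt path per datacomponent."""
    summary = {}
    for datacomponent, output in retrieved.items():
        summary[datacomponent] = (len(output['raster_list']), output['vrt_path'])
    return summary


# mock output with placeholder file names, shaped like the dictionary described above
example_output = {
    'T': {'raster_list': ['T_raster_1.tif', 'T_raster_2.tif'], 'vrt_path': 'T.vrt'},
    'NPP': {'raster_list': ['NPP_raster_1.tif'], 'vrt_path': 'NPP.vrt'},
}

for datacomponent, (count, vrt_path) in summarise_retrieved(example_output).items():
    print('{}: {} rasters, vrt: {}'.format(datacomponent, count, vrt_path))
```

Replacing `example_output` with the dictionary returned by *download_wapor_rasters* should give the same kind of summary for the real data.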

***
## 5. Check out the data 

If the code ran successfully you should be able to find the data in the subfolders under the folders: <br>

- *<wapor_directory>/<project_name>/L<number>/02_processed* <br><br>

- *<wapor_directory>/<project_name>/L<number>/03_masked*<br>

There is also the folder: <br>

- *<wapor_directory>/<project_name>/L<number>/01_download*<br>

Unedited data is placed here while downloading. If the download process is successful the data here is automatically deleted, so if an error occurs during the download, part of the data may still be found here.
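
To check what ended up in these folders without opening a file browser, the rasters can be listed with `pathlib`. This is a minimal sketch using a temporary folder so that it runs anywhere: the helper `list_rasters` and the example file names are hypothetical, and in practice you would point it at your own *02_processed* or *03_masked* folder.

```python
import tempfile
from pathlib import Path

RASTER_SUFFIXES = {'.tif', '.tiff', '.vrt'}


def list_rasters(folder):
    """Hypothetical helper: all raster files under folder (subfolders included), sorted."""
    return sorted(
        path for path in Path(folder).rglob('*')
        if path.is_file() and path.suffix.lower() in RASTER_SUFFIXES
    )


with tempfile.TemporaryDirectory() as tmp:
    # stand-in for <wapor_directory>/<project_name>/L<number>/03_masked/<mask_folder>
    masked = Path(tmp) / 'test' / 'L3' / '03_masked' / 'nomask'
    masked.mkdir(parents=True)
    (masked / 'example_raster.tif').touch()
    (masked / 'notes.txt').touch()

    for raster in list_rasters(Path(tmp) / 'test' / 'L3' / '03_masked'):
        print(raster.name)
```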

***
## 6. Visualise the data

You can check the data using a program such as QGIS or ArcGIS, or in any other way you prefer.


***
## The next step: statistics

To analyse the data retrieved using this notebook and produce some statistics, check out the notebook *01B_waterpip_statistics_basics.ipynb*.