# WaterPIP Download: Basics of retrieving data from the WAPOR data portal

#### Background

All the waterpip functions are organized into classes to automate as much of the process as possible. Most importantly, folder structuring and naming are automated by initiating the *WaporStructure* class in the background. A user sets their project directory using the *waterpip_directory* and *project_name* inputs and the functions take care of the rest.

##### NOTE: 
If this is your first time running this notebook, please read the instructions below and then follow the steps to download the data.

## 1 Import modules/libraries

In [1]:
import os
from datetime import datetime
from waterpip.scripts.retrieval.wapor_retrieval import WaporRetrieval
print('class imported succesfully, you are at the starting line')

class imported succesfully, you are at the starting line


## 2 Get a download token from the WAPOR website

Get your API token from https://wapor.apps.fao.org/profile. Once you have it, pass it as an argument when initiating the class below,
as api_token='<your_token_goes_here>'. Remember to use quotes ('') so that it is recognized as a string object.


## 3 Initiate the data retrieval class

#### 3.1 Initiate/activate the class *WaporRetrieval*. 

Background info: *WaporRetrieval* uses (inherits) the class *WaporAPI*, originally written by Bich Tran at IHE Delft for the various open source WAPOR packages released by IHE Delft. It is a great package for accessing the WAPOR data via the API. If you want more flexibility in your implementation, or want to dive into the code directly, I recommend you check out the original code available via their Git repositories. You can also check out the edited version of their *WaporAPI* class that can be found in this package.

To initiate the class you need to enter/edit the following inputs:

#### Required Inputs:

- **waterpip_directory**: path to the directory where the project-specific directory will be created. On activation the class *WaporRetrieval* automatically creates a new directory named after the input *project_name*, along with subfolders to organise the data. The functions that follow automatically use these folders (**required**).

- **shapefile_path**: path to the shapefile that specifies the location to download data for as well as the projection to output it in. The function retrieves the data for the area(s) shown in the shapefile (**required**).

**Note**: A shapefile is required and provides a lot of the required info for the project, including the extent and the output projection. Any projection (crs) is accepted. WAPOR data is always downloaded in EPSG:4326, so the shapefile bounding box is transformed as needed to match it, and the retrieved data is then transformed again if needed to match the projection (crs) of the input shapefile.

- **wapor_level**: level of WAPOR data to download. There are 3 levels: low resolution 250m (1), mid resolution 100m (2) and high resolution 30m (3). All of Africa and part of the Middle East is available at level 1. Specific countries are available at level 2. Only some specific locations, around the size of valleys or hydrosheds, are available at level 3. For more info on the levels please see: https://wapor.apps.fao.org/home/WAPOR_2/1  (**required**).

**Note**: A spatial check is carried out on the download area specified in your shapefile to see if data is available for it at the given level when running (only level 1 and 3 spatial checks exist currently). Error messages provide details.

- **api_token**: the API token retrieved from the WAPOR site goes here. See the instructions above on how to retrieve a token from the WAPOR website (**required**).

- **project_name**: name of the directory that will be created. All data retrieved and analysed can be found in here. Auto sets to *test* if not provided.

#### Optional Inputs:

The following inputs are optional. They can also be provided when running the class functions for more flexibility. The advantage of passing them during class setup/initialisation is that it is easy to repeatedly use the class functions with the same inputs, so you are assured every run uses the same inputs. The advantage of passing them to the class functions is flexibility: by changing only a few inputs you can retrieve different sets of data each time while keeping the same required class inputs (folder structure, wapor level, area of interest (shapefile) etc.).

- **period_start**: date you want to start your data download from, enter as a datetime object. This can also be provided later when running the class functions. Auto sets to the before running if not provided.

- **period_end**: date you want to end your data download at, enter as a datetime object. This can also be provided later when running the class functions. Auto sets to the day of running if not provided.

**datetime objects**: A specific way of formatting dates for python. It is made up of the function datetime followed by the date in brackets split into the sections: Year (4 digits), month (2 or 1 digit), day (2 or 1 digits). (google python datetime object for more details)

*Example*: November 4th 2020 or 4-11-2020: datetime(2020,11,4)  

*Note*: do not use leading zeros for single digit dates (1 not 01). 

- **return_period**: return period to download data for, given as a single letter code. Available periods include: I: Daily, D: Dekadal, S: Seasonal, A: Annual (yearly). This can also be provided later when running the class functions. Auto sets to Dekadal (D) if not provided.

- **datacomponents**: datacomponents (parameters of interest such as transpiration and net primary productivity) to download data for. These are input as short code strings separated by a ',' in a list such as: ['T', 'NPP']. If you set the datacomponents input to ['ALL'] it will download all datacomponents available for that return period and level at that location. This can also be provided later when running the class functions. Auto sets to ['ALL'] if not provided.
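
As a minimal illustration of the datetime format described above, the sketch below builds two datetime objects and runs a quick sanity check on them. The dates are only examples, and the variable names simply mirror the inputs used in this notebook.

```python
from datetime import datetime

# November 4th 2020: year, month, day (no leading zeros on single digits)
period_start = datetime(2020, 11, 4)
period_end = datetime(2020, 12, 31)

# datetime objects can be compared, which is a quick way to check
# that the period runs forwards in time before starting a download
if period_start >= period_end:
    raise ValueError('period_start must be before period_end')

print(period_start.strftime('%Y%m%d'))
print((period_end - period_start).days, 'days in the period')
```
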

In [2]:
# if you are not using an optional input, delete that line
retrieval = WaporRetrieval(            
    waterpip_directory=r'C:\Users\Safi\eLEAF_IPA\Bekaa',
    shapefile_path=r"C:\DATA\eLEAF\WaPOR\ProjectGezira\Pojects\AETIWaPORpath\ForAnalysis.shp",
    wapor_level=3,
    project_name='BekaaTest',
    api_token='<your_token_goes_here>')

bbox shapefile based on the input shapefile can be found at: C:\Users\Safi\eLEAF_IPA\Bekaa\BekaaTest\L3\00_reference\ForAnalysis_bbox.shp
running check for all wapor wapor_level catalogues and downloading as needed:
Loading WaPOR catalog for wapor_level: 1
catalogue location: C:\Users\Safi\eLEAF_IPA\Bekaa\metadata\wapor_catalogue_L1.csv
Loading WaPOR catalog for wapor_level: 2
catalogue location: C:\Users\Safi\eLEAF_IPA\Bekaa\metadata\wapor_catalogue_L2.csv
Loading WaPOR catalog for wapor_level: 3
catalogue location: C:\Users\Safi\eLEAF_IPA\Bekaa\metadata\wapor_catalogue_L3.csv
wapor_level 3 location shapefile exists skipping retrieval
wapor_level 3 location shapefile: C:\Users\Safi\eLEAF_IPA\Bekaa\metadata\wapor_L3_locations.shp
loading wapor catalogue for this run:
Loading WaPOR catalog for wapor_level: 3
catalogue location: C:\Users\Safi\eLEAF_IPA\Bekaa\metadata\wapor_catalogue_L3.csv


#### 3.2 Check out the level catalogues and availability shapefile

When you run the class for the first time, it also automatically downloads a catalogue of the data available at levels 1, 2 and 3 as .csv and Excel files, as well as a shapefile of the areas where data is available at level 3. These are written to the metadata folder that is automatically created under the specified *waterpip_directory*. This makes the first run take slightly longer. You can use the files to check for details on available datacomponents, return periods and areas.

These files are automatically downloaded again if they are deleted, if the waterpip directory changes, or if the files are found to be older than 2 months.
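
The refresh rule above can be pictured with a small sketch. This is not the code waterpip uses internally: the helper name `catalogue_needs_refresh` is hypothetical, and the 60 day cut-off is only an assumed approximation of "2 months".

```python
import os
import tempfile
from datetime import datetime, timedelta


def catalogue_needs_refresh(catalogue_path, max_age_days=60):
    """Return True if the file is missing or older than max_age_days.

    Hypothetical helper: 60 days is an assumed stand-in for "2 months".
    """
    if not os.path.exists(catalogue_path):
        return True
    modified = datetime.fromtimestamp(os.path.getmtime(catalogue_path))
    return datetime.now() - modified > timedelta(days=max_age_days)


# usage with a throwaway file standing in for a catalogue csv
with tempfile.TemporaryDirectory() as tmp_dir:
    fresh_file = os.path.join(tmp_dir, 'wapor_catalogue_L3.csv')
    with open(fresh_file, 'w') as f:
        f.write('placeholder')
    fresh_result = catalogue_needs_refresh(fresh_file)
    missing_result = catalogue_needs_refresh(os.path.join(tmp_dir, 'missing.csv'))

print('fresh file needs refresh:', fresh_result)
print('missing file needs refresh:', missing_result)
```
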

If you want, you can now edit and run the code using the codes and info found in the catalogues to retrieve different data.

## 4 Retrieve data from the WAPOR portal

The fourth step is to retrieve data from the WAPOR portal. This is split into two parts: retrieval of the download info with the class function *retrieve_wapor_download_info*, and retrieval of the rasters with the class function *retrieve_wapor_rasters*. It is split this way so that you can retrieve different sets of download info and group them together: you can make multiple calls to *retrieve_wapor_download_info* with different parameters and then retrieve and format all the rasters at the same time, in the same way, using *retrieve_wapor_rasters*.

### 4.1 Retrieve download info from the WAPOR portal

The function *retrieve_wapor_download_info* outputs a list of dictionaries. It retrieves all download urls available for the given input parameters and stores them in dictionaries, with one dictionary per download url (raster). Each dictionary also contains useful info on where the raster should be downloaded to and how it should be formatted. This list of dictionaries is required as an input to the function *retrieve_wapor_rasters*.

To run the class function you need to provide the following inputs:

- **period_start**: date you want to start your data download from, enter as a datetime object. This could also have been provided when initiating the class.

- **period_end**: date you want to end your data download at, enter as a datetime object. This could also have been provided when initiating the class.

**datetime objects**: A specific way of formatting dates for python. It is made up of the function datetime followed by the date in brackets split into the sections: Year (4 digits), month (2 or 1 digit), day (2 or 1 digits). (google python datetime object for more details)

*Example*: November 4th 2020 or 4-11-2020: datetime(2020,11,4)  

*Note*: do not use leading zeros for single digit dates (1 not 01). 

- **return_period**: return period to download data for, given as a single letter code. Available periods include: I: Daily, D: Dekadal, S: Seasonal, A: Annual (yearly). This could also have been provided when initiating the class.

- **datacomponents**: datacomponents (parameters of interest such as transpiration and net primary productivity) to download data for. These are input as short code strings separated by a ',' in a list such as: ['T', 'NPP']. If you set the datacomponents input to ['ALL'] it will download all datacomponents available for that return period and level at that location. This could also have been provided when initiating the class.

**NOTE**: you can call this function multiple times, for example in a loop over different parameters, and extend the output list using the python list method extend() to make one list for input into the follow-up function *retrieve_wapor_rasters*.
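
The loop-and-extend pattern from the note can be sketched as follows. To keep the sketch runnable without a WAPOR connection it uses a stand-in function, `fake_retrieve`, which is not part of waterpip. The helper `collect_download_info` is also hypothetical. Only the keyword names (*period_start*, *period_end*, *return_period*, *datacomponents*) come from this notebook.

```python
from datetime import datetime


def collect_download_info(retrieve_func, component_groups, **shared_inputs):
    """Call retrieve_func once per group of datacomponents and merge the results."""
    combined = []
    for components in component_groups:
        info = retrieve_func(datacomponents=components, **shared_inputs)
        # extend() adds the dictionaries themselves, not a nested list
        combined.extend(info)
    return combined


# Stand-in for retrieval.retrieve_wapor_download_info (NOT part of waterpip),
# returning one small dictionary per requested datacomponent
def fake_retrieve(period_start, period_end, return_period, datacomponents):
    return [{'component': c, 'return_period': return_period}
            for c in datacomponents]


combined_info = collect_download_info(
    fake_retrieve,
    component_groups=[['AETI'], ['T', 'NPP']],
    period_start=datetime(2020, 3, 5),
    period_end=datetime(2020, 5, 6),
    return_period='D')

print(len(combined_info), 'entries in the combined list')
```

With the real class you would pass `retrieval.retrieve_wapor_download_info` in place of `fake_retrieve` and hand the combined list to *retrieve_wapor_rasters*.
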


In [3]:
retrieval_info = retrieval.retrieve_wapor_download_info(
    period_start=datetime(2020,3,5),
    period_end=datetime(2020,5,6),
    return_period='D',
    datacomponents=['AETI'])

print(retrieval_info)

retrieving download info for component: AETI
retrieving download info for wapor_level 3 region: GEZ
attempting to retrieve donwload info for 7 rasters from wapor
Download Info Progress: |██████████████████████████████████████████████████| 100.0% Complete: 7 out of 7
[{'component': 'AETI', 'cube_code': 'L3_GEZ_AETI_D', 'period_str': '20200301_20200311', 'period_start': datetime.datetime(2020, 3, 1, 0, 0), 'period_end': datetime.datetime(2020, 3, 11, 0, 0), 'return_period': 'D', 'raster_id': 'L3_GEZ_AETI_2007', 'multiplier': 0.1, 'download_file': 'C:\\Users\\Safi\\eLEAF_IPA\\Bekaa\\BekaaTest\\L3\\01_download\\L3_AETI_D\\L3_GEZ_AETI_2007.tif', 'download': False, 'preprocessed_file': 'C:\\Users\\Safi\\eLEAF_IPA\\Bekaa\\BekaaTest\\L3\\02_processed\\L3_AETI_D\\L3_AETI_2007_temp.tif', 'preprocess': False, 'processed_file': 'C:\\Users\\Safi\\eLEAF_IPA\\Bekaa\\BekaaTest\\L3\\02_processed\\L3_AETI_D\\L3_AETI_2007.tif', 'process': False, 'url': None}, {'component': 'AETI', 'cube_code': 'L3_GEZ_AE
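
The printed list is hard to read in raw form. The sketch below shows one way to summarise it. The sample entry is copied from the output above with some keys left out for brevity, and the helper `summarise_download_info` is a hypothetical name, not a waterpip function.

```python
from datetime import datetime

# one entry copied from the printed output above (some keys omitted)
retrieval_info_sample = [{
    'component': 'AETI',
    'cube_code': 'L3_GEZ_AETI_D',
    'period_str': '20200301_20200311',
    'period_start': datetime(2020, 3, 1, 0, 0),
    'period_end': datetime(2020, 3, 11, 0, 0),
    'return_period': 'D',
    'raster_id': 'L3_GEZ_AETI_2007',
    'multiplier': 0.1,
    'url': None,
}]


def summarise_download_info(info_list):
    """Return one short, readable line per raster in the list."""
    return [
        '{} {} ({} to {})'.format(
            entry['component'],
            entry['raster_id'],
            entry['period_start'].date(),
            entry['period_end'].date())
        for entry in info_list]


for line in summarise_download_info(retrieval_info_sample):
    print(line)
```
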


### 4.2 Retrieve the rasters from the WAPOR portal and format and store them

The function *retrieve_wapor_rasters* writes rasters to the drive and returns a list of the stored raster paths.
It uses the output of the function *retrieve_wapor_download_info* as its main input. It retrieves all the rasters specified by the download urls in that list from the WAPOR portal and stores them. The function then matches all rasters to the dimensions of the template raster if provided, and otherwise to the first raster in the list. All rasters are reprojected to the projection of the input shapefile if needed. Temporal vrts (raster stacks) are also created by combining all rasters of the same datacomponent.

To run the class function you need to provide the following inputs:

- **wapor_list**: list of dictionaries produced by *retrieve_wapor_download_info* (**required**)

- **create_vrt**: True or False. If True it creates vrts, if False it does not. Auto sets to True.

- **template_raster_path**: path to the raster to use as template when formatting all retrieved rasters.

- **mask_to_template**: True or False. If True it masks all retrieved rasters to the template raster. Auto sets to False.

- **output_nodata**: output nodata value to use for all retrieved rasters. Auto sets to -9999.
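
As a sketch of how the optional inputs listed above might be combined, the wrapper below passes all of them explicitly. The keyword names are taken from the list above. The wrapper `retrieve_matched_rasters`, the `FakeRetrieval` stand-in class and the path 'template.tif' are hypothetical and exist only so the sketch runs without waterpip installed.

```python
def retrieve_matched_rasters(retrieval, wapor_list, template_raster_path):
    """Retrieve rasters, matching and masking them to a template raster."""
    return retrieval.retrieve_wapor_rasters(
        wapor_list=wapor_list,
        create_vrt=True,
        template_raster_path=template_raster_path,
        mask_to_template=True,
        output_nodata=-9999)


# Stand-in object (NOT part of waterpip) that records what it was called with
class FakeRetrieval:
    def retrieve_wapor_rasters(self, **kwargs):
        self.received = kwargs
        return []


fake = FakeRetrieval()
result = retrieve_matched_rasters(
    fake, wapor_list=[], template_raster_path='template.tif')

print(sorted(fake.received))
```

With the real class, `retrieval` would be the *WaporRetrieval* object created in step 3 and `wapor_list` the output of *retrieve_wapor_download_info*.
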

In [4]:
# retrieve the rasters
raster_paths = retrieval.retrieve_wapor_rasters(
        wapor_list=retrieval_info)

print(raster_paths)

attempting to retrieve 7 rasters from wapor
Download Raster Progress: |--------------------------------------------------| 0.0% Complete: 0 out of 7
 preprocessed file already exists skipping: C:\Users\Safi\eLEAF_IPA\Bekaa\BekaaTest\L3\02_processed\L3_AETI_D\L3_AETI_2007_temp.tif
Download Raster Progress: |███████-------------------------------------------| 14.3% Complete: 1 out of 7
 preprocessed file already exists skipping: C:\Users\Safi\eLEAF_IPA\Bekaa\BekaaTest\L3\02_processed\L3_AETI_D\L3_AETI_2008_temp.tif
Download Raster Progress: |██████████████------------------------------------| 28.6% Complete: 2 out of 7
 preprocessed file already exists skipping: C:\Users\Safi\eLEAF_IPA\Bekaa\BekaaTest\L3\02_processed\L3_AETI_D\L3_AETI_2009_temp.tif
Download Raster Progress: |█████████████████████-----------------------------| 42.9% Complete: 3 out of 7
 preprocessed file already exists skipping: C:\Users\Safi\eLEAF_IPA\Bekaa\BekaaTest\L3\02_processed\L3_AETI_D\L3_AETI_2010_temp.tif
Downl

## 5 Check out the data 

If the code ran successfully you should be able to find the data in the subfolders under the folder:
*<waterpip_directory>/<project_name>/L<number>/02_processed*

There is also the folder:
*<waterpip_directory>/<project_name>/L<number>/01_download*

Unedited data is placed here while downloading. If the download process is successful the data here is automatically deleted, so after an error during the download part of the data may still be found here.
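
A quick way to check what was written to the processed folder is to list the .tif files below it. The sketch assumes the folder layout described above. The helper `list_processed_rasters` is a hypothetical name, and the demonstration runs on a temporary folder rather than a real project.

```python
import tempfile
from pathlib import Path


def list_processed_rasters(waterpip_directory, project_name, wapor_level):
    """Return the sorted .tif paths found below the 02_processed folder."""
    processed_dir = (Path(waterpip_directory) / project_name
                     / 'L{}'.format(wapor_level) / '02_processed')
    return sorted(processed_dir.rglob('*.tif'))


# demonstration on a temporary folder that mimics the layout described above
with tempfile.TemporaryDirectory() as tmp_dir:
    subfolder = Path(tmp_dir) / 'BekaaTest' / 'L3' / '02_processed' / 'L3_AETI_D'
    subfolder.mkdir(parents=True)
    (subfolder / 'L3_AETI_2007.tif').touch()
    found_names = [p.name for p in list_processed_rasters(tmp_dir, 'BekaaTest', 3)]

print(found_names)
```
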

## 6 Visualise the data

You can check the data using a program such as QGIS or ArcGIS, or any other tool you prefer.

## 7 Analyse the data

To analyse the data retrieved using this notebook, check out the notebook *03_waterpip_analysis_basics.ipynb* on the basics of analysing the data (it requires the folder structure built by this class to run).
