## Calculating Performance Assessment Indicators (PAIs)

#### Introduction

The 01 notebooks guide users through the basics of downloading data from the wapor portal using the class **WaporRetrieval**, calculating statistics from the retrieved files and plotting them.

The goal of the 02 notebooks is to show an example of how the functions available in the waporact package can be combined into a function or a pipeline to achieve a specific result. 

In *02A_step_by_step_yield_calculation* we showed you how to use the waporact functions to calculate yield and water productivity as well as an example of how to do it all in one function.

In this notebook we provide an example of a full pipeline that can be used to calculate Performance Assessment Indicators (PAIs). In this pipeline the classes and functions from the waporact package have been organized into a new class *WaporPAI* with the specific purpose of calculating PAIs. <br>

Performance Assessment Indicators: PAIs are statistics that have been developed specifically to provide information on agricultural productivity and water use efficiency, using information taken from satellite imagery.

Pipeline definition: a pipeline is a set of steps carried out one after the other in order to achieve a specific result. Because the steps are standardised, the process is often automated, allowing for quick and repeated use. 
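
As a minimal illustration of the idea (this is not waporact code), a pipeline can be sketched as a list of steps where each step receives the output of the previous one. The step functions `retrieve`, `summarise` and `report` are hypothetical placeholders made up for this example.

```python
from functools import reduce


def retrieve(area):
    # placeholder step: pretend to retrieve values for an area
    return {"area": area, "values": [2.0, 4.0, 6.0]}


def summarise(data):
    # placeholder step: add the mean of the retrieved values
    data["mean"] = sum(data["values"]) / len(data["values"])
    return data


def report(data):
    # placeholder step: turn the result into a short message
    return "{}: mean={:.1f}".format(data["area"], data["mean"])


def run_pipeline(steps, start):
    # feed the output of each step into the next one
    return reduce(lambda result, step: step(result), steps, start)


message = run_pipeline([retrieve, summarise, report], "field_1")
print(message)
```

Because the steps are kept in a list, the same chain can be rerun for another area by changing only the starting value.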

#### Performance Assessment Indicators (PAIs) in detail

Increasing competition for and limited availability of water and land resources put a serious constraint on agricultural production systems. Sustainable land and water management practices will be critical to expand production efficiently and address food insecurity while limiting impact on the ecosystem. This requires a good understanding of how agricultural systems are performing and their potential for improvement. Variables affecting the performance of agricultural systems are both biophysical (climate, soil, topography) and socio-ecological (market, infrastructure, farm management, available inputs). The proposed approach is built on performance assessment indicators that look at satellite observations of the actual crop production and water consumption from the WaPOR database. The indicators focus on the actual performance of the agricultural system and the underlying biophysical factors, but because the approach is satellite-based it cannot provide information on the underlying socio-ecological variables.

The approach is based on a number of Performance Assessment Indicators (PAIs) that are derived from FAO WaPOR data on crop, water consumption and growth. The indicators estimate Water Productivity (WP) and Land Productivity (yield) performance at various levels for specific crop types for a selected area and time period. 

### **Steps**:<br>

1. Importing of the modules and functions needed<br><br> 

2. Get a download api token from the WAPOR portal if not done previously, see notebook *01A_waporact_download_basics* for details <br><br> 

3. Activating/initiating the class **WaporPAI**: this Python class holds the pipeline used to calculate 
the PAIs and is built on top of **WaporRetrieval**, allowing the user to interact with the WAPOR portal and retrieve information from it.<br> 

    - NOTE: **WaporRetrieval** makes use of the **WaporAPI** class originally written by Bich Tran at IHE Delft 
    for use by the various open source WAPOR packages released by IHE Delft.<br><br>  

4. Creation of raster and shapefile mask for use during the analysis.<br><br>  

5. Calculation of relative evapotranspiration, a PAI.<br><br> 

6. Calculation of all available PAIs.<br><br> 

7. Rinse and repeat for all PAIs combined, or run it for one of the other available PAIs.<br><br> 

NOTE: If this is your first time running this please read the instructions below and follow the steps, otherwise feel free to use the notebook as you wish. If you have not run any of the notebooks before I recommend you start with the 01 series such as *01A_downloading_from_wapor*. 

***

NOTE: Reading the following is not required but it is advised

### A quick guide to the waporact package scripts and the automatic folder structure used in the classes can be found via the links below:

- [automated folder structure explained](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-4.-Automated-Folder-Structure-Explained)

- [waporact package structure further explained](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-2.-WaPORAct-Toolset)

***

## 1. Import modules/libraries

In [None]:
import os
from datetime import datetime
from waporact.scripts.pipelines.wapor_pai import WaporPAI
print('class imported successfully, you are at the starting line')

***
## 2. Get a download token from the WAPOR website if not already done

Get your API token from https://wapor.apps.fao.org/profile. Once you have it, pass it as an argument below when initiating the class
as api_token='<your_token_goes_here>'. Remember to use quotes ('') so that it is recognized as a string object.
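
If you prefer not to type the token into the notebook itself, one option is to read it from an environment variable. This is only a sketch: the variable name `WAPOR_API_TOKEN` and the helper `get_api_token` are hypothetical choices made for this example, not something the waporact package defines.

```python
import os


def get_api_token(env_var="WAPOR_API_TOKEN"):
    # env_var is a hypothetical name, chosen for this example only
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise ValueError("no token found, set {} first".format(env_var))
    return token
```

The returned string can then be passed on when initiating the class, for example `api_token=get_api_token()`.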

***
## 3. Initiate/activate the class **WaporPAI**. 

**Background info**: 

The class **WaporPAI** is used to calculate performance indicators for areas specified using a shapefile. To allow access to the wapor data it uses (inherits) **WaporRetrieval**, the class in the waporact package that allows access to the wapor portal.<br> 

- NOTE: **WaporRetrieval** itself is built on top of (inherits) the class **WaporAPI** originally written by Bich Tran at IHE Delft for the various open source WAPOR packages released by IHE Delft. It is a great package for accessing the WAPOR data via API. If you want more flexibility in your implementation or want to dive into the code directly, I recommend you check out the original code available via their packages on Git. You can also check out the edited version of their **WaporAPI** class that can be found in this package.

### **Activating the class**:

To initiate the class you need to enter/edit the following inputs below:<br>

- NOTE: Initiation of **WaporPAI** is exactly the same as **WaporRetrieval**.<br>

#### Required Inputs:

- **waporact_directory**: path to the directory where the project specific directory will be created. The class *WaporRetrieval* automatically creates a new directory using the input *project_name* on activation and creates subfolders to organise the data as well. The functions that follow automatically use these folders.<br><br> 

- **shapefile_path**: path to the shapefile that specifies the location to download data for as well as the projection to output it in. The function retrieves the data for the area(s) shown in the shapefile.<br>

    - **Note**: A shapefile is required and provides a lot of the required info for the project, including the extent and the output projection. Any projection (crs) is accepted. Wapor data is always downloaded in epsg: 4326 and the shapefile bounding box is transformed as needed to match. If needed, the retrieved data is then transformed to match the original projection of the input shapefile.<br><br>  

- **wapor_level**: level of WAPOR data to download. There are 3 levels from low resolution 250m (1) and mid resolution 100m (2) to high resolution 30m (3). All of Africa and part of the Middle East is available at level 1. Specific countries are available at level 2. Only some specific locations around the size of valleys or hydrosheds are available at level 3. For more info on the levels please see: https://wapor.apps.fao.org/home/WAPOR_2/1. <br> 

    - **Note**: A spatial check is carried out on the download area specified in your shapefile to see if data is available for it at the given level when running (only level 1 and 3 spatial checks exist currently). Error messages provide details.<br><br> 

- **api_token**: the api token retrieved from the WAPOR site goes here. See the instructions above on how to retrieve a token from the WAPOR website.<br><br>

- **period_start**: date you want to start your data download from, enter as a datetime object. This can be provided again later in other functions overwriting the stored period_start.<br><br> 

- **period_end**: date you want to end your data download at, enter as a datetime object. This can be provided again later in other functions overwriting the stored period_end. <br><br>

    - **datetime objects**: A specific way of formatting dates for python. It is made up of the function datetime followed by the date in brackets split into the sections: year (4 digits), month (1 or 2 digits), day (1 or 2 digits). (google python datetime object for more details)<br>

        - *Example*: November 4th 2020 or 4-11-2020: datetime(2020,11,4)<br>

        - *Note*: do not use leading zeros for single digit dates (1 not 01).<br><br>  
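
The short example below shows the datetime formatting described above using only the Python standard library. The dates are arbitrary examples. Note that the leading-zero rule applies to numbers typed directly into `datetime(...)`: in Python 3, `datetime(2020, 01, 4)` is a syntax error, while a zero inside a text string that is parsed with `strptime` is fine.

```python
from datetime import datetime

# November 4th 2020, written as year, month, day
period_start = datetime(2020, 11, 4)

# the same date parsed from text, if that is easier to read
parsed = datetime.strptime("2020-11-04", "%Y-%m-%d")

# an end date later than the start date
period_end = datetime(2021, 3, 31)

print(period_start == parsed)
print((period_end - period_start).days)
```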


#### Optional Inputs:

The following inputs are optional and have default values. They can also be provided later when running the class functions. 

- **project_name**: name of the directory that will be created, all data retrieved and analysed can be found in here, auto set to *test* if not provided.<br><br> 

- **return_period**: return period to download data for, given as a single letter code. Available periods include: I: Daily, D: Dekadal, S: Seasonal, A: Annual (yearly). This can also be provided later when running the class functions. Auto set to Dekadal (D) if not provided.<br><br> 

- **silent**: boolean option automatically set to False. If set to True the more general messages shared with the user when running the class will be turned off.<br><br> 

    - **Note**: class arguments period_start, period_end and return_period are stored in the class and the user does not need to provide them anymore afterwards. If the argument is provided again while using a function (example: *download_wapor_rasters*) the argument stored in the class instance is replaced and the new version is used moving forward.
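
The stored-argument behaviour described in the note above can be pictured with a small stand-in class. `StoredPeriod` and its `run` method are hypothetical and exist only for this illustration; they are not part of waporact and do not show how the package implements this internally.

```python
from datetime import datetime


class StoredPeriod:
    """Hypothetical stand-in showing how stored arguments get replaced."""

    def __init__(self, period_start, period_end, return_period="D"):
        self.period_start = period_start
        self.period_end = period_end
        self.return_period = return_period

    def run(self, period_start=None, period_end=None, return_period=None):
        # an argument passed here replaces the stored value from now on
        if period_start is not None:
            self.period_start = period_start
        if period_end is not None:
            self.period_end = period_end
        if return_period is not None:
            self.return_period = return_period
        return self.period_start, self.period_end, self.return_period


settings = StoredPeriod(datetime(2021, 1, 1), datetime(2021, 12, 1))
first = settings.run()                    # uses the stored values
second = settings.run(return_period="A")  # replaces the stored return period
third = settings.run()                    # "A" is still in use
```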

NOTE: if still confused please read and run the notebook *01A_downloading_from_wapor* first

In [None]:
# activation of the wapor analysis class
pai = WaporPAI(
    waporact_directory=r'<insert_directory_path_here>',
    vector_path=r"insert_git_directory_path_here\waporact\samples\shapefile\gezira_test_set.shp",
    wapor_level=3,
    period_start=datetime(2021,1,1),
    period_end=datetime(2021,12,1),
    project_name='waporact_test',
    api_token='<insert_api_token_here>')

***
## 4. Create a crop mask file for use during analysis

Most of the functions in **WaporPAI** require a field raster mask and a matching field shapefile (shapefile with a unique id per field) to function.<br>

You can provide these yourself if you wish. Make sure you have a shapefile that contains a unique id per polygon (field) and a matching 0,1 raster mask.<br>

Alternatively, we provide two functions that can be used to create the necessary shapefile and raster.<br> 

Below is a description of each; you can choose whichever one you want to use:

#### What both masking functions have in common:

Both functions provided are from the class **WaporRetrieval** that is inherited by **WaporPAI**. Output from both functions is stored in a mask subfolder in the reference folder of the project directory. The user has to provide a mask name to create the folder and name the mask files:

- *<user_specified_waporact_directory>/<user_specified_project_name>/ <br>l<user_specified_wapor_level>/00_reference/<mask_name>/* 

A new shapefile is made in both cases and stored here together with the raster mask produced. The new shapefile will also contain an automatically generated id column called *wpid* that holds a unique id per polygon. A unique id column is required when calculating zonal statistics.
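
As a rough sketch, the mask folder location described above can be assembled with `pathlib`. The helper `mask_folder` is hypothetical, and the `L<level>` folder name is an assumption based on the results paths shown later in this notebook, so check your own project directory for the exact name.

```python
from pathlib import PurePosixPath


def mask_folder(waporact_directory, project_name, wapor_level, mask_name):
    # the "L<level>" folder name is an assumption, see the text above
    return (PurePosixPath(waporact_directory) / project_name
            / "L{}".format(wapor_level) / "00_reference" / mask_name)


# example values only, replace them with your own directory and names
folder = mask_folder("/data/waporact", "waporact_test", 3, "waporact_example")
print(folder)
```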

**create_raster_mask_from_shapefile**: The function creates a new shapefile and a mask based on a shapefile that the user provides. Of the two methods this is the most reliable as the user determines where the polygons (fields) are. If you are certain that your polygons are correct this way is best. The function generates a unique id named *wpid* per polygon and uses the shapefile to generate a matching mask.

**create_raster_mask_from_wapor_lcc**: The second function uses the land cover classification rasters available via the WAPOR portal to generate the mask files by masking to the categories (crops etc) specified by the user and vectorizing the result. The categories given are also attached to the shapefile as a new column. 

- NOTE: It is recommended you use one of the two methods as it prevents alteration of your original files.

***
NOTE: If masking using a WAPOR LCC raster you can skip this step and follow step 4.2

### 4.1 Using a user defined shapefile to produce a mask<br>

In detail this function uses the shapefile given by the user in combination with a template raster (if not provided by the user this is retrieved from the wapor portal) to create a new shapefile and raster mask as described above<br>

#### Required Inputs:<br>

- **mask_name**: name used in the mask files generated and it is also the name given to the mask subfolder generated.<br>

    - WARNING: the mask name given here needs to be provided as the input for the mask_folder input when running other **WaporPAI** functions. (**IMPORTANT**)<br>

#### Optional Inputs:<br>

- **input_shapefile_path**: path to the shapefile holding the fields/polygons that will be masked to.<br>

    - NOTE: This is only optional as you can also use the shapefile provided on activation of the class **WaporPAI** (this is automatically selected if no shapefile is provided).<br><br>

- **template_raster_path**: path to the raster used as a template. The raster provides the metadata (dimensions, resolution etc) used to make the raster mask. If not provided the input shapefile is used to retrieve a random raster from the wapor portal to use as a template (the raster retrieved will match the area and wapor level requirements).<br> 

#### Output:

- a tuple containing the path to the mask raster created and the path to the mask shapefile created

In [None]:
# currently working on this code. A template raster is now a requirement, you can use a raster that you retrieve using WaporRetrieval

# method one using the shapefile (if not using this method skip this cell)
mask_raster_path, mask_shape_path = pai.create_raster_mask_from_shapefile(
        input_vector_path=r"insert_git_directory_path_here\waporact\samples\shapefile\gezira_test_set.shp",
        template_raster_path=r"insert_chosen_template_raster_here",
        mask_name='waporact_example')

***
NOTE: If masking using your own shapefile please follow the step 4.1

### 4.2 Using a WAPOR land cover classification (LCC) raster to produce a mask

In detail this function retrieves all the land cover classification rasters available within the period specified by the user at the interval specified by the user and within the area specified by the shapefile given by the user. 

The retrieved LCC rasters are then used to calculate the most commonly occurring category per cell across time from all the retrieved rasters, producing a most common land classification raster for that period (this whole step is skipped if only one LCC raster is found). <br> 

This raster is then masked to the categories provided by the user, maintaining the values in the unmasked cells. This raster is then also masked to 0,1. This is considered the raw mask file.<br> 

The values raster is vectorized to produce a raw mask shapefile of the fields. The categories are added as a column to the shapefile.<br> 

As a secondary step this shapefile is cleaned up (the geometries are filtered and fixed to produce a new shapefile containing geometries that may better fit the fields).<br> 

Lastly the cleaned shapefile is rasterized to produce a matching 'cleaned' mask.<br> 

It is up to the user to select either the 'raw' or 'cleaned' combination of files to use during the analysis. Whether the cleaned file is better than the raw file is subjective.<br>  
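
The first two steps described above (most common category per cell, then masking to the chosen categories and to 0,1) can be sketched in plain Python on tiny grids. This is an illustration of the logic only, not the waporact implementation: the helper names, the category codes and the grids are made up, and the vectorizing and cleaning steps are left out.

```python
from collections import Counter


def most_common_per_cell(rasters):
    # rasters: list of equally sized 2D lists holding a category code per cell
    rows, cols = len(rasters[0]), len(rasters[0][0])
    return [[Counter(r[i][j] for r in rasters).most_common(1)[0][0]
             for j in range(cols)] for i in range(rows)]


def mask_to_categories(raster, categories, nodata=-9999):
    # keep the category value where it matches, nodata elsewhere
    values = [[v if v in categories else nodata for v in row] for row in raster]
    # 0,1 version of the same mask
    binary = [[1 if v != nodata else 0 for v in row] for row in values]
    return values, binary


# three tiny hypothetical LCC rasters (codes are made up for the example)
lcc_rasters = [
    [[42, 42], [41, 30]],
    [[42, 41], [41, 30]],
    [[42, 41], [30, 30]],
]
common = most_common_per_cell(lcc_rasters)
values, binary = mask_to_categories(common, categories={42})
```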

#### Required Inputs:<br>

- **mask_name**: name used in the mask files generated and it is also the name given to the mask subfolder generated.<br>

    - WARNING: the mask name given here needs to be provided as the input for the mask_folder input when running other **WaporPAI** functions. (**IMPORTANT**)<br><br>

- **lcc_categories**: list of crops/land classification categories to mask. Categories have to match those used in the wapor database land cover classification codes exactly. If they do not, a list of options will be returned as an error.<br>

    - NOTE: In the case that the categories given do not exist in the area an error will be raised sharing the categories that are available<br><br>

    - NOTE: Some aggregate categories are also being developed to speed up the process, such as all crops. All crop categories are then retrieved if this argument is provided (**in development**).<br>

#### Optional Inputs:<br>

- **input_shapefile_path**: path to the shapefile holding the fields/polygons that will be masked to.<br>

    - NOTE: This is only optional as you can also use the shapefile provided on activation of the class **WaporPAI** (this is automatically selected if no shapefile is provided).<br><br>

- **period_start**: date you want to start your data download from, enter as a datetime object. Uses the period_start set when initialising the class if not provided.<br><br> 

- **period_end**: date you want to end your data download at, enter as a datetime object. Uses the period_end set when initialising the class if not provided. <br>

    - NOTE: see the class description in step 3 for details on datetime formatting<br><br>

- **area_threshold_multiplier**: area threshold with which to filter out too small polygons when cleaning up the raw shapefile  (single cell area * area_threshold_multiplier sets the threshold)<br>

    - WARNING: for level 3 the area_threshold_multiplier is set to a minimum of 1.5, you have to alter the code to change this<br><br>

- **output_nodata**: nodata value to use for all rasters made aside from the 0,1 mask rasters, auto set to -9999<br>
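
To make the **area_threshold_multiplier** input above more concrete, here is a minimal sketch of how a threshold of single cell area * multiplier could be used to drop small polygons. The helper names and the polygon areas are hypothetical, and whether a polygon exactly at the threshold is kept is an assumption made for this example.

```python
def area_threshold(cell_area, multiplier, wapor_level):
    # at level 3 the multiplier is not allowed below 1.5 (see the warning above)
    if wapor_level == 3:
        multiplier = max(multiplier, 1.5)
    return cell_area * multiplier


def drop_small_polygons(polygon_areas, threshold):
    # assumption: polygons exactly at the threshold are kept
    return {pid: area for pid, area in polygon_areas.items() if area >= threshold}


# hypothetical polygon areas in square metres, 30 m cells (900 m2 each)
areas = {1: 800.0, 2: 1350.0, 3: 5000.0}
threshold = area_threshold(cell_area=900.0, multiplier=1.0, wapor_level=3)
kept = drop_small_polygons(areas, threshold)
```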

#### Output:<br>

- a tuple containing the path to the mask raster created and the path to the mask shapefile created<br>

    - NOTE: the paths to the raw files made are not returned but they can be found in the same location as the files specified above<br>

In [None]:
# create crop mask using the land classification raster from wapor (if not using this method please use the one above and skip this cell)
mask_raster_path_lcc, mask_shape_path_lcc = pai.create_raster_mask_from_wapor_lcc(
    mask_name='irrigated_cropland',
    lcc_categories=['cropland_irrigated'],
)

# Warning overwrites previous mask function results, use either option

***
### 4.3 Check out the mask raster and shapefile created (if you like)

Take the time to check out the mask raster and shapefile you have produced in QGIS or ArcGIS etc if you like.

In [None]:
# print out the paths to the files produced
print('mask raster:{}\n'.format(mask_raster_path))

print('mask shapefile:{}'.format(mask_shape_path))

***
## 5. Calculate WAPOR based PAIs 

Once you have initiated the class **WaporPAI** and you have a mask raster and shapefile with a unique id column it is possible to calculate PAIs using WaPOR data. <br>

This is possible for the following PAIs:<br>

- beneficial fraction (bf): measure of efficiency

    - formula: Sum of Transpiration / Sum of Evapotranspiration

- coefficient of variation (cov): measure of equity

    - formula: standard deviation of summed Evapotranspiration per field / mean of summed Evapotranspiration per field

- crop water deficit (cwd): measure of adequacy

    - formula: Potential evapotranspiration - Sum of Evapotranspiration

- relative evapotranspiration (ret): measure of adequacy

    - formula: Sum of Evapotranspiration / Potential evapotranspiration

- temporal relative evapotranspiration (tret): measure of reliability

    - formula: Sum of Evapotranspiration / Potential evapotranspiration per return period as a time series
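
The formulas above can be tried out on a few numbers without any rasters. The sketch below is a plain Python illustration, not the waporact implementation: the function names and input values are hypothetical, and the use of the population standard deviation for cov is an assumption made for this example.

```python
from statistics import mean, pstdev


def beneficial_fraction(transpiration_sum, evapotranspiration_sum):
    return transpiration_sum / evapotranspiration_sum


def coefficient_of_variation(field_et_sums):
    # population standard deviation is an assumption made for this sketch
    return pstdev(field_et_sums) / mean(field_et_sums)


def crop_water_deficit(potential_et, evapotranspiration_sum):
    return potential_et - evapotranspiration_sum


def relative_evapotranspiration(evapotranspiration_sum, potential_et):
    return evapotranspiration_sum / potential_et


# hypothetical seasonal sums in mm for one field
t_sum, et_sum, pot_et = 300.0, 400.0, 500.0
bf = beneficial_fraction(t_sum, et_sum)
cwd = crop_water_deficit(pot_et, et_sum)
ret = relative_evapotranspiration(et_sum, pot_et)

# hypothetical summed ET per field for three fields
cov = coefficient_of_variation([300.0, 400.0, 500.0])
```

The temporal version (tret) repeats the ret calculation once per return period, giving a time series instead of a single value.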

Each PAI function can be considered a small processing chain/pipeline in and of itself as it calls on multiple subfunctions to do its task in an organized manner. The option is provided to calculate them all via one function or separately.    

Each PAI can be calculated separately by calling on its specific function, such as *calc_crop_water_deficit* for crop water deficit. Or all of them can be calculated in one go by using the function: *calc_wapor_performance_indicators* 

In the example below we will calculate relative evapotranspiration.

To run the **WaporPAI** class function *calc_relative_evapotranspiration* you need to provide the following inputs:

NOTE: datacomponents is not an argument. Since we know exactly what we need to calculate a PAI, we can automate the datacomponent chosen.

#### Required Inputs:

- **mask_raster_path**: path to a mask raster to mask output to, also acts as the template raster for the crs, resolution and dimensions of any output rasters etc. (this can be provided by using the mask functions described above).<br><br>

- **aoi_name**: area of interest (aoi) name to use for the mask folder auto set to nomask if not provided.<br>

    - WARNING: If using a mask generated by the internal mask functions this input should match the *mask_name* input provided there. (**IMPORTANT**)<br>

#### Optional Inputs:

- **period_start**: date you want to start your data download from (period of analysis such as a growth season), enter as a datetime object. This could also have been provided when initiating the class.<br><br>

- **period_end**: date you want to end your data download at (period of analysis such as a growth season), enter as a datetime object. This could also have been provided when initiating the class.<br>

    - NOTE: see the class explanation above for more details on *datetime objects*<br><br>

- **return_period**: return period to download data for, given as a single letter code. Available periods include: I: Daily, D: Dekadal, S: Seasonal, A: Annual (yearly). This could also have been provided when initiating the class.<br><br>

- **percentile**: percentile of evapotranspiration values to choose as the potential evapotranspiration value (potet calculated is relative not absolute)<br><br>

- **id_key**: name of the shapefile column key providing the feature indices.  wpid is a reliable autogenerated index provided while making the in house mask<br><br>

- **fields_shapefile_path**: if the path to the fields shapefile path is provided then the field level statistics are also calculated otherwise only the raster and a plot of the raster are produced<br><br>

- **field_stats**: list of statistics to carry out during the field level analysis, also used in the column names  <br><br>

- **output_static_map**: boolean option if True outputs a static map if field level statistics are calculated. auto set to True<br><br>

- **output_interactive_map**: boolean option if True outputs a html interactive map if field level statistics are calculated. auto set to True<br><br>

- **output_csv**: boolean option if True outputs a csv if field level statistics are calculated. auto set to False<br><br>

- **output_nodata**: nodata value to use for the retrieved rasters auto set to -9999<br><br>

#### Output:

- a  tuple that holds the path to the relative evapotranspiration raster and a dict of the field level statistics if available
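
The **percentile** input listed above means that potential evapotranspiration is taken from the evapotranspiration values themselves. A minimal sketch of that idea follows. It is not the waporact implementation: the nearest-rank percentile method, the helper name, the cell values and the percentile of 90 are all assumptions chosen for this example.

```python
import math


def percentile_value(values, percentile):
    # nearest-rank percentile, chosen here for simplicity
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# hypothetical summed ET values (mm) for the cells inside the mask
et_cells = [310.0, 350.0, 400.0, 420.0, 450.0, 480.0, 500.0, 520.0, 560.0, 600.0]

# the percentile used here (90) is only an example value
potential_et = percentile_value(et_cells, 90)
ret_cells = [et / potential_et for et in et_cells]
```

In this sketch any cell above the chosen percentile ends up with a value above 1, which is one way to see why the potential evapotranspiration calculated this way is relative and not absolute.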


      

In [None]:
# calculate relative evapotranspiration
outputs = pai.calc_relative_evapotranspiration(
    fields_shapefile_path=mask_shape_path,
    mask_raster_path=mask_raster_path,
    aoi_name='waporact_example'
    )

# organise the outputs:

# paths to the raster created
pai_ret_raster_path = outputs[0]
print('path to the relative evapotranspiration raster: {}'.format(pai_ret_raster_path))

# dict of field level statistics created
pai_ret_dict = outputs[1]
print('ret field dict:')
print(pai_ret_dict)

### 5.1 As an extra we can export the calculated dictionary of statistics to a shapefile

In [None]:
# output dict to shapefile using field shapefile
from waporact.scripts.tools.vector import records_to_vector

records_to_vector(
    field_records=pai_ret_dict,
    output_vector_path=r"<insert_directory_path_here>\waporact_example_ret.shp",
    fields_vector_path=mask_shape_path,
    union_key="wpid")

***
## 6. Calculate all PAIs

The option is also available to calculate all the PAIs, which we do below. The method is nearly the same as that for *calc_relative_evapotranspiration* except that fields_shapefile_path is required and not optional. 

This is because it is a requirement for the calculation of temporal variation of relative evapotranspiration (tret) and coefficient of variation (cov).

The function returns a tuple: list of produced rasters, dictionary of calculated PAI statistics. 

In [None]:
outputs = pai.calc_wapor_performance_indicators(
    fields_vector_path=mask_shape_path,
    mask_raster_path=mask_raster_path,
    aoi_name='waporact_example'
    )


## 7. Check out the data 
If the code ran successfully you should be able to find some results and images in the folders: <br><br>

*{wapor_directory}/{project_name}/L{number}/05_results*<br>
*{wapor_directory}/{project_name}/L{number}/06_images*<br><br>
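
A quick way to see what was written to these folders is to list their contents with `pathlib`. The helper `list_outputs` below is a hypothetical convenience function written for this notebook, not part of waporact. It simply follows the two folder paths given above.

```python
from pathlib import Path


def list_outputs(waporact_directory, project_name, wapor_level):
    # folder names follow the two paths given above
    level_dir = Path(waporact_directory) / project_name / "L{}".format(wapor_level)
    found = {}
    for sub in ("05_results", "06_images"):
        folder = level_dir / sub
        if folder.is_dir():
            # rglob also looks inside any subfolders
            found[sub] = sorted(p for p in folder.rglob("*") if p.is_file())
        else:
            # a missing folder gives an empty list instead of an error
            found[sub] = []
    return found
```

Calling it with the directory, project name and level used when initiating the class returns a dict with one list of file paths per folder.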

## 8. Visualise the data

You can check the data out further using a program such as QGIS or ArcGIS, or however you want.

## 9. Rinse and Repeat  

Now that you have had an introduction to the functions and pipelines available in the waporact package I hope you have found it to be useful. Feel free to repeat the notebooks and play around with the parameters. If you feel like it I highly recommend you dive into the code itself and see what you can code, run, retrieve and analyse! 
