## Basic statistics of the WaPOR data

#### Background

All the waterpip functions have been organized into classes to automate as much of the process as possible. Most importantly, folder structuring and naming are automated by initiating the *WaporStructure* class in the background. A user sets their project directory using the *waterpip_directory* and *project_name* inputs and the functions take care of the rest.

##### WARNING: 
This analysis example requires the folder structure that is built when downloading data with the WaporRetrieval class. Please see 01_waterpip_download_basics for more info. Running this notebook on data that was not downloaded with WaporRetrieval is not recommended (but it is possible if you get into the code).

##### NOTE: 
If this is your first time running this notebook, please read the instructions below and then follow the steps to analyse the data.

## 1 Import modules/libraries

In [34]:
import os
from datetime import datetime
from waterpip.scripts.analysis.wapor_analysis import WaporAnalysis
print('class imported successfully, you are at the starting line')

class imported successfully, you are at the starting line


## 2 Initiate/activate WaporAnalysis class to start analysis 

This class was made to make analysing data retrieved from the WaPOR portal easy. To initiate the class, enter or edit the inputs below:

##### Note: 
Use the same inputs as for the class *WaporRetrieval*, except that the api_token is only required when retrieving data. When using data that was retrieved with WaporRetrieval, the class inputs here should be exactly the same.

#### Required Inputs:

- **waterpip_directory**: path to the directory where the project-specific directory will be created. The class *WaporRetrieval* automatically creates a new directory using the input *project_name* on activation and creates subfolders to organise the data as well. The functions that follow automatically use these folders (**required**).

- **shapefile_path**: path to the shapefile that specifies the area(s) to download data for, as well as the projection of the output. The function retrieves the data for the area(s) in the shapefile (**required**). ***Note***: the shapefile provides a lot of the required info for the project, including the extent and the output projection. Any projection (crs) is accepted. WaPOR data is always downloaded in EPSG:4326, so the shapefile bounding box is transformed as needed to match; the retrieved data is then transformed again, if needed, to match the projection (crs) of the input shapefile.

- **wapor_level**: level of WaPOR data to download. There are 3 levels: low resolution 250m (1), mid resolution 100m (2) and high resolution 30m (3). All of Africa and part of the Middle East are available at level 1. Specific countries are available at level 2. Only some specific locations around the size of valleys or hydrosheds are available at level 3. For more info on the levels please see: https://wapor.apps.fao.org/home/WAPOR_2/1 (**required**).

**Note**: When the class is run, a spatial check is carried out on the download area specified in your shapefile to see whether data is available for it at the given level (currently only level 1 and level 3 spatial checks exist). Error messages provide details.

- **project_name**: name of the directory that will be created; all data retrieved and analysed can be found in here. Defaults to *test* if not provided.
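
The relationship between level and resolution described for *wapor_level* can be written down as a small lookup. This is only an illustrative sketch: `WAPOR_LEVEL_RESOLUTION_M` and `resolution_for_level` are hypothetical names and are not part of waterpip.

```python
# Hypothetical lookup, not part of waterpip: nominal pixel size in metres per
# WaPOR level, taken from the description of wapor_level (250 m, 100 m, 30 m).
WAPOR_LEVEL_RESOLUTION_M = {1: 250, 2: 100, 3: 30}


def resolution_for_level(wapor_level: int) -> int:
    """Return the nominal resolution in metres for a WaPOR level (1, 2 or 3)."""
    try:
        return WAPOR_LEVEL_RESOLUTION_M[wapor_level]
    except KeyError:
        valid = sorted(WAPOR_LEVEL_RESOLUTION_M)
        raise ValueError(
            f"wapor_level must be one of {valid}, got {wapor_level!r}"
        ) from None


print(resolution_for_level(3))
```

Failing early on an unknown level gives a clearer message than a failed download later on.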

In [35]:
analysis = WaporAnalysis(
        waterpip_directory=r'C:/path/to/the/directory',
        shapefile_path=r"C:\path\to\the\shapefile.shp",
        wapor_level=3,
        project_name='',
    )
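
To get a feel for where the outputs end up, the sketch below rebuilds the analysis folder path from the class inputs. It assumes the layout mentioned later in this notebook (`<project_name>/L<number>/03_analysis` under the chosen directory) and the *test* default for *project_name*. The helper `expected_analysis_folder` is hypothetical and does not call waterpip.

```python
from pathlib import PurePosixPath


def expected_analysis_folder(
    waterpip_directory: str, wapor_level: int, project_name: str = "test"
) -> PurePosixPath:
    # Hypothetical helper: it only mirrors the folder layout described in this
    # notebook, it does not create folders or use waterpip itself.
    return (
        PurePosixPath(waterpip_directory)
        / project_name
        / f"L{wapor_level}"
        / "03_analysis"
    )


print(expected_analysis_folder("C:/path/to/the/directory", wapor_level=3))
```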

## 3 Analyse the data 

After setting up the class you can analyse some data. There are multiple functions that can be used for analysis; this notebook shows one of the basic ones. Each analysis function can be accessed from within the class *WaporAnalysis*. Let's start with the most basic one:

### 3.1 Calculate crop field statistics

To calculate field statistics from a raster you can use the class function *calc_field_statistics*. This function takes a raster and calculates statistics from it for each of the fields (geometries) found in the shapefile provided. Therefore the shapefile provided must include the crop field boundaries. 
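
The idea behind per-field (zonal) statistics can be illustrated without any GIS libraries. The sketch below is not how *calc_field_statistics* is implemented; it uses a made-up pixel grid and a matching grid of field ids, and it assumes the population standard deviation for `stddev`.

```python
from statistics import mean, pstdev

# Toy 3x4 "raster" of pixel values and a matching grid of field ids
# (0 means the pixel lies outside every field). Both are made-up data.
raster = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
field_ids = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 0, 2, 0],
]


def zonal_statistics(raster, field_ids, outside_id=0):
    """Group pixel values by field id and summarise each group."""
    pixels_per_field = {}
    for value_row, id_row in zip(raster, field_ids):
        for value, field_id in zip(value_row, id_row):
            if field_id != outside_id:
                pixels_per_field.setdefault(field_id, []).append(value)

    return {
        field_id: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "sum": sum(values),
            # Assumption: population standard deviation; the library may
            # use a different definition.
            "stddev": pstdev(values),
        }
        for field_id, values in pixels_per_field.items()
    }


stats = zonal_statistics(raster, field_ids)
print(stats[1])
```

Each field ends up with one value per statistic, which is why the real output has one row per field.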

To run the class function you need to provide the following inputs:

**REQUIRED**:

- **fields_shapefile_path**: path to the shapefile containing the fields.

NOTE: It is recommended that the shapefile that is automatically produced on creating the cropmask raster is used if available as this contains an automatically generated id_key column that is needed. 

- **input_rasters**: list of paths to the rasters to analyse and retrieve field statistics for.

NOTE: *calc_field_statistics* accepts multiple rasters and/or vrts and will calculate the field statistics for each raster provided, automatically generating names for the columns of output produced to distinguish them. Also works with a single raster provided in list format. 

- **template_raster_path**: path to the raster used to determine the analysis dimensions. It can be the same as one of the input rasters (all rasters must have the same dimensions in any case).
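
The automatically generated column names mentioned in the note on *input_rasters* follow a `<name>_<statistic>` pattern in the output of this notebook (for example `AETI_WaPOR_min`). The sketch below reproduces that pattern under the assumption that the prefix is the raster file name without its extension; the helper `stat_column_names` and the example file name are hypothetical, and the library may derive the prefix differently.

```python
from pathlib import PureWindowsPath


def stat_column_names(raster_path, field_stats):
    # Assumption: the column prefix is the raster file name without extension.
    prefix = PureWindowsPath(raster_path).stem
    return [f"{prefix}_{stat}" for stat in field_stats]


# Hypothetical raster path, used only to show the naming pattern.
print(stat_column_names(r"C:\path\to\the\AETI_WaPOR.tif", ["min", "max", "mean"]))
```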

**OPTIONAL**:

- **crop**: string used for the output file name. If not provided, 'crop' is used.

- **field_stats**: list of field statistics to calculate (checked against a list of accepted keywords). If not provided, the default set is used: ['min', 'max', 'mean', 'sum', 'stddev'].

- **analysis_name**: name used in the output file name in combination with crop.

- **period_start**: start of the analysis period the raster covers, used in the output name.

- **period_end**: end of the analysis period the raster covers, used in the output name.

- **out_dict**: boolean; if set to True, the data is output as a dict instead of a dataframe.

- **id_key**: name of the column in the *fields_shapefile_path* used to identify the different fields/geometries. It defaults to 'wpid', assuming that you are using the auto-generated cropmask shapefile, but it can be set to any column in the shapefile. NOTE: the id has to be unique per field and is required (**IMPORTANT**).
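
Two of the inputs above come with constraints that are cheap to check before running the analysis: *field_stats* must only contain accepted keywords, and the ids in the *id_key* column must be unique. The sketch below shows one way to check both. The names are hypothetical, and the accepted keywords are assumed to be the default set, since the full list used by waterpip is not given here.

```python
from collections import Counter

# Assumption: only the default set is treated as accepted here; the real list
# of accepted keywords inside waterpip may be longer.
ACCEPTED_FIELD_STATS = ("min", "max", "mean", "sum", "stddev")


def check_field_stats(field_stats):
    """Raise ValueError if any requested statistic is not an accepted keyword."""
    unknown = [stat for stat in field_stats if stat not in ACCEPTED_FIELD_STATS]
    if unknown:
        raise ValueError(f"unsupported field statistics: {unknown}")
    return list(field_stats)


def duplicated_ids(ids):
    """Return the ids that occur more than once, in first-seen order."""
    counts = Counter(ids)
    return [field_id for field_id, count in counts.items() if count > 1]


print(check_field_stats(["min", "mean"]))
print(duplicated_ids([1, 2, 2, 3, 3, 3]))
```

An empty list from `duplicated_ids` means the chosen id column is safe to use as *id_key*.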

**OUTPUT**:

The output of the function *calc_field_statistics* is a tuple containing both the dataframe/dict produced and the path to where the dataframe is automatically stored as a .csv.
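
Because the return value is a tuple, it can be unpacked in one statement, and the stored .csv can be read back later without rerunning the analysis. The sketch below uses a stand-in tuple and a tiny made-up .csv (the column names `wpid` and `mean` are placeholders) so that it runs without waterpip.

```python
import csv
import os
import tempfile

# Stand-in for the (data, csv_path) tuple described above: a tiny made-up table
# written to a temporary .csv so the example runs without waterpip.
csv_path = os.path.join(tempfile.mkdtemp(), "field_stats_example.csv")
with open(csv_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["wpid", "mean"])
    writer.writerows([[1, 10.5], [2, 12.0]])
result = ({"placeholder": "dataframe or dict"}, csv_path)

# Unpack both parts of the tuple in one statement.
field_stats_data, field_stats_csv_path = result

# Read the stored .csv back; every value arrives as a string.
with open(field_stats_csv_path, newline="") as handle:
    rows = list(csv.DictReader(handle))

print(rows[0])
```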


In [36]:
field_stats = analysis.calc_field_statistics(
    fields_shapefile_path=r"C:\path\to\the\shapefile.shp",
    input_rasters=[r"C:\path\to\the\raster.tif"],
    template_raster_path=r"C:\path\to\the\raster.tif",
    crop="wheat",
    field_stats=['min', 'max', 'mean', 'sum', 'stddev'],
    analysis_name='First',
    id_key='id',
    out_dict=False
)

# access the tuple and print the dataframe and path to the .csv

field_stats_df = field_stats[0]

field_stats_csv_path = field_stats[1]

print(field_stats_df)

print(field_stats_csv_path)

attempting to claculate zonal stats for a single raster
calculating all feature statistics...
(     AETI_WaPOR_min  AETI_WaPOR_max  AETI_WaPOR_mean  AETI_WaPOR_sum  \
1        575.599976      970.900024       832.701172    67448.796875   
2        247.100006     1000.700012       848.698547   112876.906250   
3        547.700012     1020.000000       915.415466   183083.093750   
4        458.200012     1095.400024       739.995789   278238.406250   
5        224.399994     1159.000000       846.208435   479800.187500   
..              ...             ...              ...             ...   
96       618.400024      969.500000       788.987732    38660.398438   
97       497.299988     1045.900024       888.793518   123542.296875   
98       470.100006     1221.400024       917.367188   125679.304688   
99       689.299988     1034.300049       872.895081    70704.500000   
100      689.299988      951.299988       836.816223    61924.398438   

     AETI_WaPOR_stddev  
1            92

## 4 Output to shapefile

To be added soon.

## 5 Check out the data

If the code ran successfully you should be able to find a csv in the subfolders under the folder:
*<waterpip_directory>/<project_name>/L<number>/03_analysis*

## 6 Visualise the data

You can check the data using a program such as Excel, QGIS or ArcGIS, or however you want.

## 7 Rinse and Repeat

Now that you know how to retrieve and analyse data, feel free to repeat the notebooks *02_waterpip_download_basics* and *03_waterpip_analysis_basics* and play around with the parameters. If you feel like it you can even get into the code itself and see what you can code, run, retrieve and analyse!

## 8 Producing Performance Assessment Indicators (PAIs) for an area

If you feel like it you can also take a look at notebook *04_waterpip_analysis_PAIs.ipynb*, where we walk you through the process of producing some more informative statistics, *Performance Assessment Indicators (PAIs)*, for an area from download to analysis.