## Basic statistics and WAPOR data

#### Introduction

The waporact package includes a set of statistical functions and visualisation tools that can be used to analyse any raster or set of rasters that the user provides. These functions can be found in the scripts: <br>
*waporact\scripts\tools\statistics.py* <br>
*waporact\scripts\tools\plots.py* <br>
- NOTE: These functions can be used on any file of the correct type; however, it is easier to use them on files retrieved using the **WaporRetrieval** class. <br>

In this notebook we will walk you through two simple analyses of WAPOR data, from retrieving the data to calculating statistics to visualising the results. The retrieval steps are a copy of those carried out in the notebook: *waporact\tutorials\01_Basics\01A_downloads\01A_downloading_from_wapor.ipynb*

### **Steps**:<br>

1. Importing of the modules and functions needed<br><br> 

2. Retrieve and analyse a land cover classification raster

    2.1) retrieve a land cover classification raster. <br><br> 
    
    2.2) retrieve the WAPOR land cover classification categories dict (optional). <br><br>
    
    2.3) run the function *raster_count_statistics* to analyse the percentage of each land cover found in the raster. <br><br>
    
    2.4) plot the land cover classification counts in a piechart. <br><br> 

3. Retrieve and analyse an evapotranspiration raster from WAPOR using the function *calc_field_statistics*, which calculates per field statistics from a raster or set of rasters using a shapefile to determine the fields/areas.  <br><br> 

    3.1) retrieve an evapotranspiration raster. <br><br> 
    
    3.2) run the function *calc_field_statistics* to analyse the raster and generate basic field statistics. <br><br>

    3.3) plot the max evapotranspiration of all fields in a barchart. <br><br>

    3.4) plot the field results in an interactive choropleth map. <br><br> 

4. Export the calculated field statistics to a shapefile <br><br> 

5. Examine the data<br><br> 

6. Rinse and Repeat<br><br> 

NOTE: If this is your first time running this please read the instructions below and follow the steps, otherwise feel free to use the notebook as you wish.
***

NOTE: Reading the following is not required but it is advised
if you did not do it previously 

### A quick guide to the waporact package scripts and the automatic folder structure used in the classes can be found via the links below:

- [automated folder structure explained](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-4.-Automated-Folder-Structure-Explained)

- [waporact package structure further explained](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-2.-WaPORAct-Toolset)

***

## 1. Import modules/libraries

In [None]:
import os
from datetime import datetime

# import retrieval class
from waporact.scripts.retrieval.wapor_retrieval import WaporRetrieval
print('retrieval class successfully imported')

# import statistics functions
from waporact.scripts.tools import statistics
print('statistics functions successfully imported')

# import vector functions
from waporact.scripts.tools import vector
print('vector functions successfully imported')

# import plotting functions
from waporact.scripts.tools import plots
print('plotting functions successfully imported')

print('all scripts imported successfully, you are at the starting line')

***
## 2. Count the raster values in a categorical raster (land cover classification)

As a first step carry out a count of the different values that exist in a categorical raster such as a land cover classification. This can be done using the function *raster_count_statistics*. To do this you can either provide your own categorical raster or you can retrieve and use the WAPOR land cover classification raster. <br>


***
### 2.1 Retrieve a WAPOR land cover classification raster

Retrieve the LCC raster from WAPOR for your given area. The steps taken below are the same as those described in the tutorial notebook:

 *waporact\tutorials\01_Basics\01A_downloads\01A_downloading_from_wapor.ipynb* <br>

 The only difference is that the datacomponents argument has changed to: **LCC**

 We also set the period_start and period_end to **2020/1/1 -> 2020/2/1** to make sure data is available.
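
The file names used later in this notebook (for example *L3_LCC_20200101_20200111*) suggest that the rasters are delivered per dekad, that is in periods starting on day 1, 11 and 21 of each month. Assuming that is the case, the minimal sketch below lists the dekads that start within a given period, which gives a rough idea of how many rasters to expect. The function `dekad_starts` is hypothetical and is not part of waporact.

```python
from datetime import datetime


def dekad_starts(period_start, period_end):
    """List the dekad start dates that fall in [period_start, period_end).

    Assumption: a dekad starts on day 1, 11 or 21 of a month.
    """
    starts = []
    year, month = period_start.year, period_start.month
    while datetime(year, month, 1) < period_end:
        for day in (1, 11, 21):
            candidate = datetime(year, month, day)
            if period_start <= candidate < period_end:
                starts.append(candidate)
        # move on to the next month
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return starts


for start in dekad_starts(datetime(2020, 1, 1), datetime(2020, 2, 1)):
    print(start.strftime('%Y%m%d'))
```

For the period used in this notebook the sketch lists three dekads, all in January 2020.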


In [None]:
# activation of the wapor retrieval class 
retrieval = WaporRetrieval(            
    waporact_directory=r'<insert_directory_path_here>',
    shapefile_path=r"<insert_git_directory_path_here>\waporact\samples\shapefile\gezira_test_set.shp",
    wapor_level=3,
    project_name='waporact_test',
    api_token='<insert_api_token_here>')

### NOTE: The retrieval function uses a bbox to retrieve data 

The bbox is constructed from the shapefile given as input to the class and then made slightly larger to make sure all the data falls within it. This bbox is stored internally in the class as self.bbox. A shapefile version of it is accessible at:

*<user_specified_waporact_directory>/<user_specified_project_name>/l<user_specified_wapor_level>/00_reference/*
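
To illustrate the idea of a slightly enlarged bbox, here is a minimal sketch. It is not the code used inside **WaporRetrieval**: the function `expand_bbox`, the margin value and the coordinates are all made up for the example.

```python
def expand_bbox(bbox, margin):
    """Grow a (min_x, min_y, max_x, max_y) bbox by `margin` on every side."""
    min_x, min_y, max_x, max_y = bbox
    return (min_x - margin, min_y - margin, max_x + margin, max_y + margin)


# made-up extent of a shapefile, in the units of its coordinate system
shapefile_bbox = (33.0, 14.0, 33.5, 14.5)

print(expand_bbox(shapefile_bbox, margin=0.25))
```

Any geometry that touches the edge of the original extent now lies fully inside the enlarged bbox.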

In [None]:
# run the code and download rasters from the wapor portal
retrieved_rasters = retrieval.download_wapor_rasters(
    datacomponents=['LCC'],
    period_start=datetime(2020,1,1),
    period_end=datetime(2020,2,1))
 
# print the list of retrieved rasters for LCC
print('retrieved LCC rasters:\n {} \n'.format(retrieved_rasters['LCC']['raster_list']))

# print the path to the LCC vrt
print('path to the retrieved LCC vrt:\n {}'.format(retrieved_rasters['LCC']['vrt_path']))

***
### 2.2 Retrieving WAPOR land cover classification categories dict

NOTE: this step is only applicable if carrying out *raster_count_statistics* on the land cover classification raster (LCC) retrieved from the WAPOR portal. If analysing your own categorical raster you can skip this step.

To add categories to the WAPOR LCC we provide a WAPOR LCC categories dict. This can be retrieved from the script: 

*waporact\scripts\retrieval\wapor_land_cover_classification_codes.py*
using the following function: *wapor_lcc* 

To use it, import the function and, when running it, provide the wapor level (1,2,3) matching the wapor level you used when retrieving the WAPOR LCC raster.

In [None]:
# retrieve the wapor LCC categories dict (OPTIONAL)

from waporact.scripts.retrieval.wapor_land_cover_classification_codes import wapor_lcc

categories = wapor_lcc(wapor_level=3)

***
### 2.3 Run *raster_count_statistics* on the retrieved categorical raster

Use the function *raster_count_statistics* to count the unique values found in a raster and to calculate the percentage of non-NaN cells that each value makes up, as well as the area each value covers. <br><br>

The minimum needed to run the function is a raster. We will be using one of the LCC rasters retrieved in the previous step.

#### Required Inputs:<br>

- **input_raster_path**: path to the input raster holding the values to count<br>

The following optional inputs are also available, we will be using the category dict retrieved in the previous step as the *categories_dict* input

#### Optional Inputs:<br>

- **output_csv**: if the path to an output csv is provided then a csv and an excel file of the calculated output are made<br><br>

- **categories_dict**: if a dict of categories is provided uses the dict to assign names/categories 
to the values found.<br>

    - NOTE: the categories_dict has to be formatted so that the dictionary keys are the categories (names) 
and the values are the values found in the raster that the categories/names have to match<br><br>

#### Outputs: <br>

The function returns a tuple of a dataframe/dict and the path to a csv, if one was provided on input. Each contains 
the same information on the values counted in the raster. 

For more details see: [statistics wiki](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-3.-Function-and-Class-Descriptions-6.-statistics)
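
To make the description above concrete, the following is a minimal sketch of the kind of calculation involved, written with the standard library only. It is not the implementation of *raster_count_statistics*: the function `count_values_sketch`, the tiny grid, the cell area and the category names are all assumptions made for illustration.

```python
import math
from collections import Counter


def count_values_sketch(grid, cell_area, categories_dict=None):
    """Count the unique non-NaN values in a 2D grid (a list of rows).

    Returns one dict per value holding its count, its percentage of the
    non-NaN cells and the area it covers (count * cell_area).
    """
    # the categories dict maps name -> value, invert it for lookups
    names = {value: name for name, value in (categories_dict or {}).items()}
    cells = [v for row in grid for v in row if not math.isnan(v)]
    counts = Counter(cells)
    total = len(cells)
    return [
        {
            'value': value,
            'name': names.get(value, str(value)),
            'count': count,
            'percentage': 100 * count / total,
            'area': count * cell_area,
        }
        for value, count in sorted(counts.items())
    ]


nan = float('nan')
example_grid = [[1, 1, 2],
                [2, 2, nan]]
# hypothetical categories: keys are the names, values are the raster values
example_categories = {'cropland': 1, 'water': 2}

for row in count_values_sketch(example_grid, cell_area=100,
                               categories_dict=example_categories):
    print(row)
```

Note that the NaN cell is left out of the total, so the percentages of the remaining values add up to 100.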

In [None]:
# assign path to output the count csv to
count_csv = r'<insert_directory_path_here>\waporact_test\L3\04_analysis\L3_LCC_20200101_20200111_count_stats.csv'

# count cells in the raster
statistics.raster_count_statistics(
    input_raster_path=retrieved_rasters['LCC']['raster_list'][0],
    output_csv=count_csv,
    categories_dict=categories 
)

***
### 2.4 Plot the land cover classification count data in a piechart


Use the function *piechart* to plot and visualize the csv outputted above (using the dataframe is also possible). <br><br>

The minimum inputs needed to run the function are an input table provided either as a dataframe or as the path to a csv/excel, the name of the column containing the categories to plot, the name of the column containing the values to plot and the title of the plot.

#### Required Inputs:<br>

- **input_table**: dataframe or path to the file to create graph from<br><br> 

- **names**: column holding the names for the pie slices<br><br>  

- **values**: column holding the value for the slices<br><br>

- **title**: title of the plot<br>  


Optional inputs are also available, we will be providing an output path for the html and png. These are only made if paths are provided. For info on all the optional inputs please see the link below:

#### Optional Inputs:<br>

- **output_png_path**: if provided outputs the generated file to static png<br><br>

- **output_html_path**: if provided outputs the generated file to interactive html<br>


#### Outputs: <br>

The function returns nothing; it outputs the plot directly to the specified locations. However, if show figure is true it also shows the plot on completion.

For more details see: [plots wiki](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-3.-Function-and-Class-Descriptions-7.-plots)


In [None]:
# create counts png and html paths
count_png = r'<insert_directory_path_here>\waporact_test\L3\05_results\L3_LCC_20200101_20200111_count_stats.png'
count_html = r'<insert_directory_path_here>\waporact_test\L3\05_results\L3_LCC_20200101_20200111_count_stats.html'

# plot count data
plots.piechart(
    input_table=count_csv,
    names='landcover',
    values='percentage',
    title='gezira landcover classification crop percentages',
    output_html_path=count_html,
    output_png_path=count_png,
    )

***
## 3. Calculate field based statistics from a raster or a set of rasters

In the waporact package we provide a set of statistical tools that you can use to analyse rasters. One of these is the 
function *calc_field_statistics*. It allows you to carry out zonal statistics on a single raster or a set of rasters using a shapefile to determine the fields/zones/geometries for which to calculate statistics.<br>

***
### 3.1 Retrieve evapotranspiration rasters to calculate field based zonal statistics from

The retrieval class has already been activated, so all you need to do is run the code below. For more details see the earlier instructions.

In [None]:
# run the code and download rasters from the wapor portal
retrieved_AETI_rasters = retrieval.download_wapor_rasters(    
    period_start=datetime(2020,1,1),
    period_end=datetime(2020,2,1),
    datacomponents=['AETI'])


***
### 3.2 Calculate field based statistics using calc_field_statistics

Use the function *calc_field_statistics* to carry out zonal statistics on a single raster or a set of rasters, using a shapefile to determine the fields/zones/geometries for which to calculate statistics.<br>

- NOTE: When running the function for a single raster the name of each column is taken from the statistic being calculated. However, in the case of multiple rasters this is not feasible, so the name of each input raster or vrt band in combination with the statistic calculated is taken as the column name.<br>

    - WARNING: For csv and excel the names generated above are fine; however, shapefile column names are limited to 10 characters. So before outputting to shapefile, csvs/excels may need editing. The input option *waterpip_files* attempts to automate this by keeping the most important parts of the names when running the script for raster files with the standardised waporact file names. (**IMPORTANT**) <br>

To run the function you need to provide the following inputs:<br>
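
If you prefer to shorten the column names yourself before writing a shapefile, one possible approach is sketched below. The helper `shorten_column_names` is hypothetical and not part of waporact. It truncates each name to `max_len` characters (10 by default, the field name limit of the dBASE table that stores shapefile attributes) and adds a numeric suffix when two names would otherwise collide.

```python
def shorten_column_names(names, max_len=10):
    """Map each name to a unique name of at most `max_len` characters."""
    short_names = {}
    used = set()
    for name in names:
        candidate = name[:max_len]
        counter = 1
        # on a collision replace the tail of the name with _1, _2, ...
        while candidate in used:
            suffix = '_{}'.format(counter)
            candidate = name[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        short_names[name] = candidate
    return short_names


columns = ['wpid',
           'max_L3_AETI_20200101_20200111',
           'max_L3_AETI_20200111_20200121']

for original, short in shorten_column_names(columns).items():
    print('{} -> {}'.format(original, short))
```

Keep the returned mapping (or the csv) at hand so you can still tell which shortened column belongs to which raster.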

**Required Inputs**:<br>

- **fields_shapefile_path**: path to the shapefile containing the fields used to designate the zones of analysis. <br><br>

    - NOTE: if working with wapor data it is recommended to use the mask shapefile made when running the function *create_raster_mask_from_shapefile* or *create_raster_mask_from_wapor_lcc* from **WaporRetrieval** or **WaporPAI**. However any correctly formatted shapefile is acceptable.<br><br>

- **input_rasters**: list of paths to the rasters that are to be analyzed. For one raster just provide a list of one raster<br>

    - NOTE: *calc_field_statistics* accepts multiple rasters and/or vrts and will calculate the field statistics for each raster provided, automatically generating names for the columns of output produced to distinguish them. Also works with a single raster provided in list format. <br>

**Optional Inputs**:<br>

- **output_csv_path**: path to the csv where the calculated statistics are outputted to, if provided<br><br>

- **field_stats**: list of field statistics to calculate (checked against a list of accepted keywords), if not provided uses the default set: ['min', 'max', 'mean', 'sum', 'stddev']<br><br>

- **id_key**: identifies the column in the *fields_shapefile_path* used to mark/identify each field in the shapefile. This input is autoset to 'wpid' in the assumption that you are using a mask shapefile produced using *create_raster_mask_from_shapefile* or *create_raster_mask_from_wapor_lcc*. <br>

    - WARNING: the id has to be unique per field and has to exist in the shapefile (**IMPORTANT**)<br><br>

- **out_dict**: boolean option if set to True outputs the data in a dict instead of a dataframe.<br><br>

    - NOTE: only relevant when running for multiple rasters <br>

**Output**:

The output for the function *calc_field_statistics* is a tuple containing both the dataframe/dict produced and the path to where the dataframe is automatically stored as a .csv if a path is provided

For more details see: [statistics wiki](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-3.-Function-and-Class-Descriptions-6.-statistics)
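
For readers new to zonal statistics, the sketch below shows the principle on two tiny grids using only the standard library. It is not how *calc_field_statistics* is implemented: the function `zonal_statistics_sketch`, the grids and the choice of the population standard deviation for 'stddev' are assumptions made for illustration.

```python
import math
import statistics as py_statistics  # stdlib module, aliased so it does not shadow waporact's statistics


def zonal_statistics_sketch(zone_grid, value_grid, no_zone=0):
    """Return {zone_id: {statistic: value}} for two equally shaped 2D grids."""
    values_per_zone = {}
    for zone_row, value_row in zip(zone_grid, value_grid):
        for zone, value in zip(zone_row, value_row):
            if zone == no_zone or math.isnan(value):
                continue  # skip cells outside any field and cells without data
            values_per_zone.setdefault(zone, []).append(value)
    return {
        zone: {
            'min': min(values),
            'max': max(values),
            'mean': py_statistics.mean(values),
            'sum': sum(values),
            'stddev': py_statistics.pstdev(values),
        }
        for zone, values in values_per_zone.items()
    }


# each cell of zone_grid holds the id of the field covering it (0 = no field)
zone_grid = [[1, 1, 2],
             [2, 2, 0]]
value_grid = [[1.0, 3.0, 2.0],
              [4.0, 6.0, 9.0]]

for field_id, field_stats_row in zonal_statistics_sketch(zone_grid, value_grid).items():
    print(field_id, field_stats_row)
```

In the real function the zones come from the geometries in the shapefile rather than from a ready-made grid of ids.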

In [None]:
# assign path to output the field statistics csv to
field_csv = r'<insert_directory_path_here>\waporact_test\L3\04_analysis\L3_LCC_20200121_20200201_field_stats.csv'

field_stats = statistics.calc_field_statistics(
    fields_shapefile_path=r"<insert_git_directory_path_here>\waporact\samples\shapefile\gezira_test_set.shp",
    input_rasters=[retrieved_AETI_rasters['AETI']['raster_list'][0]],
    output_csv_path=field_csv,
    field_stats=['min', 'max', 'mean', 'sum', 'stddev'],
    id_key='wpid',
    out_dict=False
)

# access the tuple and print the dataframe and path to the .csv
field_stats_df = field_stats[0]

field_stats_csv_path = field_stats[1]

print(field_stats_df)

print(field_stats_csv_path)

***
### 3.3 Visualize the data using a bargraph

Use the function *bargraph* to plot and visualize the csv outputted above (using the dataframe is also possible). <br><br>

The minimum inputs needed to run the function are an input table provided either as a dataframe or as the path to a csv/excel, the name of the column containing the x values, the name of the column containing the y values and the title of the plot.

#### Required Inputs:<br>

- **input_table**: dataframe or path to the file to create graph from<br><br> 

- **x**: column holding the x values<br><br>  

- **y**: column holding the y values<br><br>

- **title**: title of the plot<br>  


Optional inputs are also available, we will be providing an output path for the html and png. These are only made if paths are provided. For info on all the optional inputs please see the link below:

#### Optional Inputs:<br>

- **output_png_path**: if provided outputs the generated file to static png<br><br>

- **output_html_path**: if provided outputs the generated file to interactive html<br>


#### Outputs: <br>

The function returns nothing; it outputs the plot directly to the specified locations. However, if show figure is true it also shows the plot on completion.

For more details see: [plots wiki](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-3.-Function-and-Class-Descriptions-7.-plots)

In [None]:
# create bargraph png and html paths
aeti_bar_png = r'<insert_directory_path_here>\waporact_test\L3\05_results\L3_AETI_20200101_20200111_bar.png'
aeti_bar_html = r'<insert_directory_path_here>\waporact_test\L3\05_results\L3_AETI_20200101_20200111_bar.html'

plots.bargraph(
    input_table=field_stats_csv_path,
    x='wpid',
    y='max_L3_AETI_20200101_20200111',
    title='per field AETI Gezira area',
    output_html_path=aeti_bar_html,
    output_png_path=aeti_bar_png)


***
### 3.4 Visualize the data using an interactive choropleth map


Use the function *interactive_choropleth_map* to plot and visualize the csv outputted above as a map, using a shapefile for the geometries. <br><br>

The minimum inputs needed to run the function are an input shapefile, a matching input csv that contains the data, the z_column and a label for the z column.

#### Required Inputs:<br>

- **input_shapefile_path**:  path to the input shapefile<br><br> 

- **input_csv_path**: path to the input csv<br><br> 

- **z_column**: name of the column in the csv to use for the z value<br><br>  

- **z_label**: label for the z value column<br><br>

Optional inputs are also available. We will be providing an output path for the html, which is only made if a path is provided. For info on all the optional inputs please see the link below:

#### Optional Inputs:<br>

- **output_html_path**: if provided outputs the generated file to interactive html<br>

WARNING: union_key is assumed to be **wpid** this means it is assumed that there is a wpid column in both the shapefile and the csv that can be used to link fields to data  

#### Outputs: <br>

The function returns nothing; it outputs the map directly to the specified locations. However, if show figure is true it also shows the map on completion.

For more details see: [plots wiki](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-3.-Function-and-Class-Descriptions-7.-plots)
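
Because the map relies on a shared **wpid** column, it can be worth checking that the ids in the shapefile and the csv actually match before plotting. The sketch below shows one way to do so on plain lists of ids. The helper `find_unmatched_ids` and the example ids are hypothetical, and how you read the ids from your files is left open.

```python
def find_unmatched_ids(shapefile_ids, csv_ids):
    """Report the ids that occur in only one of the two sources."""
    shapefile_ids, csv_ids = set(shapefile_ids), set(csv_ids)
    return {
        'missing_in_csv': sorted(shapefile_ids - csv_ids),
        'missing_in_shapefile': sorted(csv_ids - shapefile_ids),
    }


# made-up wpid values for illustration
ids_in_shapefile = [1, 2, 3, 4]
ids_in_csv = [2, 3, 4, 5]

print(find_unmatched_ids(ids_in_shapefile, ids_in_csv))
```

Two empty lists mean every field can be linked to a row of data and the other way around.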

In [None]:
# create map html path
aeti_map_html = r'<insert_directory_path_here>\waporact_test\L3\05_results\L3_AETI_20200101_20200111_map.html'

plots.interactive_choropleth_map(
    input_shapefile_path=r"<insert_git_directory_path_here>\waporact\samples\shapefile\gezira_test_set.shp",
    input_csv_path=field_stats_csv_path,
    z_column='max_L3_AETI_20200101_20200111',
    z_label='max_AETI',
    output_html_path=aeti_map_html)

***
## 4. Output to shapefile

As a last step we can output the calculated field statistics to a shapefile so that they can be visualised in QGIS or ArcGIS as the user wants.<br>

**Required Inputs**:<br>

- **records**: the dictionary or dataframe containing the records/info that is to be outputted to shapefile.<br><br>

- **output_shapefile_path**: path to output the created shapefile to<br><br>

- **fields_shapefile_path**: path to the shapefile holding the reference fields/geometries to which the data should be attached. For example the input shapefile used to generate the data, or the reference shapefile generated by the crop mask function of wapor analysis.<br><br> 

- **union_key**: identifies the column in the *fields_shapefile_path* and in the records used to combine the two. If working with a shapefile generated by the crop mask script, 'wpid' is suggested. Otherwise another column/key can also be used.<br>

**Optional Inputs**:<br>

- **output_crs**: if provided warps the shapefile to match this crs<br><br>

WARNING: long column names (like those currently autogenerated in the creation of pai csvs/excels) will be truncated. Use the csv to match which column is which, or edit the csv to have shorter column names.

For more details see: [vector wiki](https://github.com/eLEAF-Github/WAPORACT/wiki/2.-The-WaPORAct-Package-3.-Function-and-Class-Descriptions-5.-vector)
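
The role of the *union_key* can be pictured as a join between the attribute rows of the fields and the records. The sketch below shows such a join on plain lists of dicts. It is an illustration only: the helper `join_on_key` is hypothetical, and dropping fields that have no matching record is an assumption made here, not a statement about how *records_to_shapefile* behaves.

```python
def join_on_key(fields, records, union_key):
    """Attach each record to the field sharing the same union_key value.

    Assumption: fields without a matching record are left out.
    """
    records_by_key = {record[union_key]: record for record in records}
    joined = []
    for field in fields:
        record = records_by_key.get(field[union_key])
        if record is not None:
            joined.append({**field, **record})
    return joined


# made-up rows for illustration
example_fields = [{'wpid': 1, 'geometry': 'polygon A'},
                  {'wpid': 2, 'geometry': 'polygon B'}]
example_records = [{'wpid': 2, 'max': 4.2}]

print(join_on_key(example_fields, example_records, union_key='wpid'))
```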


In [None]:
aeti_shape_path = r'<insert_directory_path_here>\waporact_test\L3\05_results\L3_AETI_20200101_20200111.shp'
fields_shape_path = r"<insert_git_directory_path_here>\waporact\samples\shapefile\gezira_test_set.shp"
  
vector.records_to_shapefile(
    records=field_stats_df,
    output_shapefile_path=aeti_shape_path,
    fields_shapefile_path=fields_shape_path,
    union_key="wpid")

***
## 5. Examine the data

Beyond the visualisation methods provided above, feel free to check out the data using a program such as Excel, QGIS or ArcGIS, or however you want. We highly recommend it, as that way you gain a further understanding of what you have produced.

***
## 6. Rinse and Repeat  

Now that you know how to retrieve data and analyse data feel free to repeat the notebooks *01A_downloading_from_wapor* and *01B_basic_statistical_analysis* and play around with the parameters. If you feel like it you can even get into the code itself and see what you can code, run, retrieve and analyse! 

***
## The next step: Yield Calculation In Steps 

If you feel like it you can also take a look at notebook *01C_step_by_step_yield_calculation.ipynb* where we walk you through the process of producing yield step by step from a coding perspective for an area, from download to analysis.