# EchoPro Python Workflow <a class="tocSkip">

# Import libraries and configure the Jupyter notebook

In [None]:
# libraries used in the Notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd 

# Python version of EchoPro
import EchoPro

# Allows us to grab the SemiVariogram class so we can use its models
from EchoPro.semivariogram import SemiVariogram as SV

# Allows us to easily use matplotlib widgets in our Notebook
%matplotlib widget
# %matplotlib inline

In [None]:
%%time
survey_2019 = EchoPro.Survey(init_file_path='../config_files/initialization_config.yml',
                             survey_year_file_path='../config_files/survey_year_2019_config.yml',
                             source=3, 
                             exclude_age1=True)

In [None]:
%%time 
survey_2019.load_survey_data() 

In [None]:
TX_selected = [4,5,11,14,17,19,20,22,34,39,47,49,53,55,61,69,79,82,85,
               87,89,93,95,97,100,103,105,111,123,125,131,133,135,137,
               140,13,18,31,50,73,74,77,80,38,66]

## Compute the normalized biomass density


When the `selected_transects` argument is supplied, we must first select a subset of the data. We do this as follows: 
1. Use `gear_df` (the gear file) to create a mapping between transects and hauls.
   - The gear file is used because it provides a one-to-many mapping from transects to hauls and contains more transect-haul relations than the other files.
   
2. Using the mapping from item 1, keep only the entries whose transects are in `selected_transects`.
   - This gives us only the hauls that belong to the `selected_transects`.
   
3. Select the subsets of `length_df`, `specimen_df`, and `strata_df` that correspond to the hauls found in item 2. 
   - Note: `strata_df` has the multi-index `Haul, stratum`. Because the subset is selected by haul, some `stratum` values can be removed completely.
   
4. Select a subset of `nasc_df` using `selected_transects` directly, since `nasc_df` has the transect as a column.
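
The four steps above can be sketched with plain pandas. This is a minimal illustration on made-up toy tables, not EchoPro's implementation: the column names (`Transect`, `Haul`, `wt`, `Length`) and all values are assumptions chosen for the example.

```python
import pandas as pd

# Toy stand-ins for the survey tables; names and values are hypothetical.
gear_df = pd.DataFrame({"Transect": [1, 1, 2, 3], "Haul": [10, 11, 12, 13]})
length_df = pd.DataFrame(
    {"Length": [40.0, 42.0, 38.0, 45.0]},
    index=pd.Index([10, 11, 12, 13], name="Haul"),
)
strata_df = pd.DataFrame(
    {"wt": [1.0, 1.0, 1.0, 1.0]},
    index=pd.MultiIndex.from_tuples(
        [(10, 1), (11, 1), (12, 2), (13, 3)], names=["Haul", "stratum"]
    ),
)
nasc_df = pd.DataFrame({"Transect": [1, 1, 2, 3], "NASC": [5.0, 7.0, 3.0, 9.0]})

selected_transects = [1, 3]

# Step 1: one-to-many mapping from transect to hauls, taken from the gear file
transect_to_hauls = gear_df.groupby("Transect")["Haul"].apply(list)

# Step 2: keep only the selected transects and flatten to a list of hauls
selected_hauls = sorted(
    haul for hauls in transect_to_hauls.loc[selected_transects] for haul in hauls
)

# Step 3: subset the haul-indexed tables
length_sub = length_df.loc[length_df.index.isin(selected_hauls)]
strata_sub = strata_df.loc[
    strata_df.index.get_level_values("Haul").isin(selected_hauls)
]

# Step 4: subset nasc_df directly by its transect column
nasc_sub = nasc_df.loc[nasc_df["Transect"].isin(selected_transects)]

# Stratum 2 disappears because its only haul (12) belongs to transect 2
remaining_strata = sorted(strata_sub.index.get_level_values("stratum").unique())
print(selected_hauls, remaining_strata)
```

In this toy case, dropping transect 2 removes haul 12 and, with it, stratum 2 from `strata_sub`.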


To compute the biomass density estimate, we first need the backscattering cross-section (`sigma_b`). We compute the mean differential backscattering cross-section for each haul from `length_df` and `specimen_df`. The `sigma_b` value for each KS stratum is then the mean over all hauls within that stratum, where the relation between haul and KS stratum is given by `strata_df`.  
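
As a rough sketch of this two-level averaging (per haul, then per stratum), the example below uses made-up fish lengths. It is not EchoPro's code: the target-strength relation `TS = 20*log10(L) - 68`, the conversion `sigma_b = 4*pi*sigma_bs`, and the names `fish`, `length_cm`, and `haul_to_stratum` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical per-fish lengths (cm) keyed by haul; values are made up.
fish = pd.DataFrame({
    "Haul": [10, 10, 11, 12, 12],
    "length_cm": [40.0, 44.0, 42.0, 38.0, 50.0],
})

# Hypothetical haul -> KS stratum relation, as strata_df would provide
haul_to_stratum = pd.Series({10: 1, 11: 1, 12: 2}, name="stratum")

# Assumed TS-length relation (dB re 1 m^2)
fish["TS"] = 20.0 * np.log10(fish["length_cm"]) - 68.0

# Convert to the linear domain before averaging; dB values are not averaged
fish["sigma_bs"] = 10.0 ** (fish["TS"] / 10.0)
sigma_bs_haul = fish.groupby("Haul")["sigma_bs"].mean()

# Assumed conversion from differential to total backscattering cross-section
sigma_b_haul = 4.0 * np.pi * sigma_bs_haul

# Stratum value: mean over all hauls assigned to that stratum
sigma_b_stratum = sigma_b_haul.groupby(haul_to_stratum).mean()
print(sigma_b_stratum)
```

Note that the stratum mean weights every haul equally, regardless of how many fish each haul contains.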

Once `sigma_b` has been computed for each stratum, we can calculate the nautical areal density as follows:
1. Compute `mix_sa_ratio`, the weight of each haul as given by `strata_df`, where the `haul` value for each row comes from a column in `nasc_df`.
2. Multiply `mix_sa_ratio` by `nasc_df.NASC` (the NASC value at the corresponding `haul` in `mix_sa_ratio`).
3. Divide the result of item 2 by the `sigma_b` value of the corresponding `stratum`, as given by `nasc_df`.
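
A minimal sketch of these three steps on toy data follows. The `haul_weight` series stands in for the haul weights that would come from `strata_df`; that name, the column names, and all numbers are assumptions made for illustration.

```python
import pandas as pd

# Made-up sigma_b per stratum and haul weights (hypothetical values)
sigma_b_stratum = pd.Series({1: 2.0e-3, 2: 4.0e-3})
haul_weight = pd.Series({10: 1.0, 12: 0.5})

# Toy NASC table: each row carries its haul and stratum
nasc_df = pd.DataFrame({
    "haul": [10, 10, 12],
    "stratum": [1, 1, 2],
    "NASC": [100.0, 50.0, 80.0],
})

# Step 1: look up the weight of the haul attached to each NASC row
mix_sa_ratio = nasc_df["haul"].map(haul_weight)

# Step 2: weight the NASC values
weighted_nasc = mix_sa_ratio * nasc_df["NASC"]

# Step 3: divide by the sigma_b of the stratum attached to each NASC row
areal_density = weighted_nasc / nasc_df["stratum"].map(sigma_b_stratum)
print(areal_density)
```

If a row's `stratum` has no entry in `sigma_b_stratum`, `map` returns `NaN` and the density for that row is undefined.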

Item 3 of the areal density calculation is where the subsetting approach described above causes a problem. The subset of `nasc_df` can contain `stratum` values for which we cannot calculate a `sigma_b`, because every haul in that stratum was removed. Chu's approach is to create artificial `sigma_b` values for those strata. In most scenarios he appears to copy the `sigma_b` value from the closest stratum (with respect to stratum number) that has the most data. 
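
One possible reading of this fill rule is sketched below: pick the donor stratum with the smallest difference in stratum number, and break ties by the number of hauls. This is an interpretation for illustration only, not Chu's actual procedure; `haul_counts` and `nearest_stratum` are hypothetical names and the values are made up.

```python
import pandas as pd

# Made-up sigma_b values; strata 3 and 5 lost all their hauls in the subset
sigma_b_stratum = pd.Series({1: 2.0e-3, 2: 4.0e-3, 4: 3.0e-3})

# Hypothetical proxy for "amount of data": number of hauls per stratum
haul_counts = pd.Series({1: 4, 2: 9, 4: 2})

# Strata that appear in the NASC subset and therefore need a sigma_b
needed_strata = [1, 2, 3, 4, 5]


def nearest_stratum(missing, available, counts):
    """Closest stratum by number; ties go to the stratum with more hauls."""
    return min(available, key=lambda s: (abs(s - missing), -counts[s]))


filled = sigma_b_stratum.reindex(needed_strata)
for stratum in filled.index[filled.isna()]:
    donor = nearest_stratum(stratum, sigma_b_stratum.index, haul_counts)
    filled.loc[stratum] = sigma_b_stratum.loc[donor]

print(filled)
```

Here stratum 3 is equally close to strata 2 and 4 and takes its value from stratum 2, which has more hauls; stratum 5 takes its value from stratum 4.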


One remedy we tried was to select the subset of `nasc_df` using the haul values present in the subset of `strata_df`. However, with the 2019 data, the resulting subset of `nasc_df` can contain several transects that are not in `selected_transects`. This suggests that the haul values in `nasc_df` do not correctly correspond to the transects, or vice versa. This is worrisome because these mappings are crucial to the biomass density calculation. 
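
A quick way to look for such inconsistencies is to compare the transect-haul pairs in `nasc_df` against those in the gear file. The sketch below does this with a left merge on toy tables; the column names and values are assumptions, and the real tables may need renaming first.

```python
import pandas as pd

# Toy tables; in this made-up example nasc_df assigns haul 11 to transect 2,
# while the gear file assigns it to transect 1.
gear_df = pd.DataFrame({"Transect": [1, 1, 2, 3], "Haul": [10, 11, 12, 13]})
nasc_df = pd.DataFrame({"Transect": [1, 2, 2, 3], "Haul": [10, 12, 11, 13]})

gear_pairs = gear_df[["Transect", "Haul"]].drop_duplicates()
nasc_pairs = nasc_df[["Transect", "Haul"]].drop_duplicates()

# indicator=True adds a "_merge" column telling where each pair was found
check = nasc_pairs.merge(
    gear_pairs, on=["Transect", "Haul"], how="left", indicator=True
)

# Pairs present in nasc_df but absent from the gear file
mismatches = check.loc[check["_merge"] == "left_only", ["Transect", "Haul"]]
print(mismatches)
```

An empty `mismatches` table would mean every transect-haul pair in `nasc_df` is also present in the gear file.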

In [None]:
%%time
survey_2019.compute_biomass_density(selected_transects=TX_selected)

In [None]:
survey_2019.bio_calc.final_biomass_table.head()

In [None]:
# survey_2019.bio_calc.final_biomass_table["Stratum"].groupby(level=0).mean()

In [None]:
# survey_2019.bio_calc.final_biomass_table.index.unique()

In [None]:
survey_2019.bio_calc.final_biomass_table["normalized_biomass_density"].sum()
# normal 298438694.4207147

# Jolly-Hampton CV Analysis

* Compute the mean Jolly-Hampton CV value on data that has not been kriged
* Note: the algorithm used to compute this value is stochastic, so the result varies slightly between runs
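
To illustrate where the randomness comes from, the sketch below applies a length-weighted transect estimator of the kind described by Jolly and Hampton (1990) to random subsets of transects in a single stratum and averages the resulting CVs. It is not EchoPro's implementation: the single-stratum setup, the subset size `n_keep`, the number of replicates, and all data values are assumptions made for illustration.

```python
import numpy as np


def jolly_hampton_cv(density, length):
    """CV of the length-weighted mean density for one stratum."""
    density = np.asarray(density, dtype=float)
    length = np.asarray(length, dtype=float)
    n = density.size
    w = length / length.mean()            # transect weights, sum to n
    mean_density = np.sum(w * density) / n
    variance = np.sum(w**2 * (density - mean_density) ** 2) / (n * (n - 1))
    return np.sqrt(variance) / mean_density


# Made-up mean density and length for six transects in one stratum
density = np.array([1.0, 2.0, 3.0, 2.5, 1.5, 4.0])
length = np.array([10.0, 12.0, 9.0, 11.0, 10.0, 8.0])

rng = np.random.default_rng(seed=0)       # fixed seed for reproducibility
n_keep = 4                                # hypothetical subset size
cv_reps = []
for _ in range(200):
    picked = rng.choice(density.size, size=n_keep, replace=False)
    cv_reps.append(jolly_hampton_cv(density[picked], length[picked]))

cv_mean = float(np.mean(cv_reps))
print(f"mean CV over replicates: {cv_mean:.4f}")
```

Without a fixed seed, each run draws different subsets and the mean CV changes slightly.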

In [None]:
%%time
CV_JH_mean = survey_2019.run_cv_analysis(kriged_data=False)

In [None]:
print(f"Mean Jolly-Hampton CV: {CV_JH_mean:.4f}")

# Obtain Kriging Mesh Data

## Access Kriging mesh object
* Reads mesh data files specified by `survey_2019` 

In [None]:
krig_mesh = survey_2019.get_kriging_mesh()

### Plot the Mesh, Transects and smoothed isobath contour

* Generate interactive map using the Folium package
* Mesh points are in gray
* Transect points are represented by a changing color gradient
* Smoothed contour points (200 m isobath) are in blue 

In [None]:
# fmap = krig_mesh.plot_layered_points()
# fmap

## Apply coordinate transformations
* Longitude transformation
* Lat/Lon to distance

### Transect points

In [None]:
krig_mesh.apply_coordinate_transformation(coord_type='transect')

### Mesh points

In [None]:
krig_mesh.apply_coordinate_transformation(coord_type='mesh')

In [None]:
# plot the transformed mesh points 
# plt.plot(krig_mesh.transformed_mesh_df.x_mesh, 
#          krig_mesh.transformed_mesh_df.y_mesh, 'r*', markersize=1.25)
# plt.show()

# Compute biomass density Semi-Variogram and fit a model

* Compute the normalized semi-variogram using the normalized biomass density
* Fit a model to the semi-variogram values
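
For orientation, here is a minimal NumPy sketch of an isotropic empirical semi-variogram binned with `nlag` lags of width `lag_res`. It is a textbook estimator, not EchoPro's algorithm; in particular, normalizing by the sample variance is an assumption, and the function name and data are made up.

```python
import numpy as np


def empirical_semivariogram(x, y, z, nlag, lag_res):
    """Return lag centers, variance-normalized semi-variogram, pair counts."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))

    # All unique point pairs (i < j)
    i, j = np.triu_indices(x.size, k=1)
    dist = np.hypot(x[i] - x[j], y[i] - y[j])
    sq_diff = (z[i] - z[j]) ** 2

    # Assign each pair to a lag bin and drop pairs beyond the last lag
    lag_bin = np.floor(dist / lag_res).astype(int)
    keep = lag_bin < nlag
    counts = np.bincount(lag_bin[keep], minlength=nlag)
    sums = np.bincount(lag_bin[keep], weights=sq_diff[keep], minlength=nlag)

    # gamma(h) = sum of squared differences / (2 * number of pairs)
    gamma = np.full(nlag, np.nan)
    filled = counts > 0
    gamma[filled] = sums[filled] / (2.0 * counts[filled])

    lags = (np.arange(nlag) + 0.5) * lag_res
    return lags, gamma / z.var(), counts


# Made-up points on a small normalized domain
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 0.05, size=200)
y = rng.uniform(0.0, 0.05, size=200)
z = rng.normal(size=200)
lags, gamma, counts = empirical_semivariogram(x, y, z, nlag=30, lag_res=0.002)
```

Lags with no point pairs are returned as `NaN` so they can be excluded when fitting a model.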

## Compute the semi-variogram

### Initialize semi-variogram calculation
* Transformed transect points
* Parameters specific to semi-variogram algorithm

In [None]:
semi_vario = survey_2019.get_semi_variogram(
    krig_mesh,
    params=dict(nlag=30, lag_res=0.002),
)

### Compute the normalized semi-variogram

In [None]:
%%time
semi_vario.calculate_semi_variogram()

## Fit a model to the semi-variogram

* A widget to easily fit the model

In [None]:
semi_vario.get_widget()

# Perform Ordinary Kriging of biomass density

* transformed mesh points
* semi-variogram model
* normalized biomass density 
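
The sketch below shows how these three inputs combine in textbook ordinary kriging at a single target point, using the `k_max` nearest data points. It is not EchoPro's routine: `gen_exp_model` is an assumed form of the generalized exponential model with the hole-effect term switched off (`ls_hole_eff = 0`), the other kriging parameters (`k_min`, `R`, `ratio`) are not modeled, and the data are made up.

```python
import numpy as np


def gen_exp_model(h, nugget, sill, ls, exp_pow):
    """Assumed generalized exponential semi-variogram (no hole effect)."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-((h / ls) ** exp_pow)))


def ordinary_krige_point(xy, z, target, k_max, model):
    """Ordinary kriging estimate at one point from its k_max nearest data."""
    dist = np.linalg.norm(xy - target, axis=1)
    near = np.argsort(dist)[:k_max]
    pts, vals = xy[near], z[near]
    k = near.size

    # Kriging system: semi-variogram matrix bordered by the unbiasedness row
    pair_dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    A = np.ones((k + 1, k + 1))
    A[:k, :k] = model(pair_dist)
    A[k, k] = 0.0
    b = np.ones(k + 1)
    b[:k] = model(dist[near])

    sol = np.linalg.solve(A, b)
    weights, lagrange = sol[:k], sol[k]
    estimate = weights @ vals
    variance = weights @ b[:k] + lagrange
    return estimate, variance, weights


def model(h):
    # Semi-variogram parameters taken from the notebook's s_v_params
    return gen_exp_model(h, nugget=0.0, sill=0.95279, ls=0.0075429, exp_pow=1.5)


# Made-up data points on a small normalized domain
rng = np.random.default_rng(seed=1)
xy = rng.uniform(0.0, 0.02, size=(25, 2))
z = rng.normal(loc=1.0, scale=0.3, size=25)

estimate, variance, weights = ordinary_krige_point(
    xy, z, np.array([0.01, 0.01]), k_max=10, model=model
)
print(estimate, variance, weights.sum())
```

The weights sum to one by construction, and with a zero nugget the estimate at a data location reproduces the observed value.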

## Initialize Kriging routine

In [None]:
kriging_params = dict(
    # kriging parameters
    k_max=10,
    k_min=3,
    R=0.0226287,
    ratio=0.001,
    
    # parameters for semi-variogram model
    s_v_params={'nugget': 0.0, 'sill': 0.95279, 'ls': 0.0075429,
                'exp_pow': 1.5, 'ls_hole_eff': 0.0},
    
    # grab appropriate semi-variogram model
    s_v_model=SV.generalized_exp_bessel
)

# uncomment to use widget values 
# kriging_params.update(semi_vario.get_params_for_kriging())

# initialize kriging routine
krig = survey_2019.get_kriging(kriging_params)

## Perform Kriging
* Also generates total biomass at mesh points

In [None]:
%%time
krig.run_biomass_kriging(krig_mesh)

In [None]:
print(f"Total Kriged Biomass Estimate: {1e-6*survey_2019.krig_results_gdf.krig_biomass_vals.sum():.3f} kmt")

## Plot Kriged Biomass estimate in kmt

In [None]:
# plot all mesh points
# convert a copy to kmt for plotting, so re-running this cell does not rescale the stored results
plot_gdf = survey_2019.krig_results_gdf.copy()
plot_gdf["krig_biomass_vals"] = 1e-6 * plot_gdf["krig_biomass_vals"]
krig.plot_kriging_results(plot_gdf, krig_field_name="krig_biomass_vals")