# EchoPro Python: Minimal workflow

2022-7-6

Emilio. Adapted from Brandon's `echopro_workflow.ipynb`

Note: Depending on your operating system, you may see the warning `OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.` This warning does **not** affect the output of this notebook and can be ignored.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd

from EchoPro import EchoPro
# grab the SemiVariogram class so we can use its models
from EchoPro.semivariogram import SemiVariogram

%matplotlib widget
#%matplotlib inline

## Load and Process Data 

In this section we use the configuration files and initialization parameters to load all files that are necessary for the biomass density calculation. Additionally, using the prepared files we compute the normalized biomass density of the raw data.

The following variables representing the data are constructed:
* `params` -- a dictionary of all parameters from the configuration files
* `strata_df` -- a minimal DataFrame of the data contained in `filename_strata`
* `strata_ds` -- an Xarray Dataset containing the `strata_df` data and computed quantities
* `geo_strata_df` -- a minimal DataFrame of the data contained in `stratification_filename`
* `length_df` -- a minimal DataFrame of the data contained in `filename_length_US/CAN` and computed quantities for the provided `species_code_ID`
* `specimen_df` -- a minimal DataFrame of the data contained in `filename_specimen_US/CAN` for the provided `species_code_ID`
* `nasc_df` -- a minimal DataFrame of the data contained in the appropriate NASC file, e.g. `filename_processed_data_no_age1` or `filename_processed_data_all_ages`
* `final_biomass_table` -- a DataFrame containing a subset of data from `nasc_df` and the calculated normalized biomass density

All of these variables can be accessed through `epro_2019`, e.g. `epro_2019.strata_df`.

Note: Once `epro_2019` has been created, all computational routines can be accessed using this object.

Note: The run below prints the statements `A check of the initialization file needs to be done!`, `A check of the survey year file needs to be done!`, and `We are using our own biomass density calculation!`. These are reminders and can be ignored.

In [2]:
epro_2019 = EchoPro(
    init_file_path='./config_files/initialization_config.yml',
    survey_year_file_path='./config_files/survey_year_2019_config.yml',
    source=3,
    bio_data_type=1,
    age_data_status=1, 
    exclude_age1=True
)

A check of the initialization file needs to be done!
A check of the survey year file needs to be done!


In [3]:
%%time
epro_2019.load_data(file_types='all')

CPU times: user 3.57 s, sys: 32.1 ms, total: 3.6 s
Wall time: 3.65 s


In [4]:
%%time
epro_2019.compute_biomass_density()

We are using our own biomass density calculation!
CPU times: user 28.9 s, sys: 36.5 ms, total: 28.9 s
Wall time: 28.9 s


## Jolly-Hampton CV Analysis

Here we compute the mean Jolly-Hampton CV for data that has not been Kriged.

Note: the algorithm used to compute this value is random in nature. Thus, different runs can produce slightly different values.

In [5]:
%%time
lat_INPFC = [-np.inf, 36, 40.5, 43.000, 45.7667, 48.5, 55.0000]  # INPFC latitude boundaries
CV_JH_mean = epro_2019.run_cv_analysis(lat_INPFC, kriged_data=False)

CPU times: user 4.35 s, sys: 7.96 ms, total: 4.36 s
Wall time: 4.36 s


In [6]:
# The output should be approximately CV_JH_mean = 0.1337
CV_JH_mean

0.13377903047551687
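
The latitude list passed to `run_cv_analysis` reads naturally as a set of bin edges that split the survey area into latitudinal zones. The sketch below illustrates that reading with `pandas.cut`. It is an assumption about how the edges are meant to be interpreted, not a reproduction of what `run_cv_analysis` does internally, and the sample latitudes are made up for illustration.

```python
import numpy as np
import pandas as pd

# Bin edges taken from the cell above; -inf makes the first zone open-ended.
lat_edges = [-np.inf, 36, 40.5, 43.0, 45.7667, 48.5, 55.0]

# Hypothetical sample latitudes (degrees north), one per zone.
latitudes = pd.Series([34.4, 38.0, 41.2, 44.0, 47.0, 49.1], name="Latitude")

# labels=False returns the 0-based index of the (lower, upper] interval
# each latitude falls into; values outside all intervals become NaN.
stratum_idx = pd.cut(latitudes, bins=lat_edges, labels=False)

print(pd.DataFrame({"Latitude": latitudes, "zone": stratum_idx}))
```

With six intervals defined by seven edges, every point south of 55°N receives a zone index between 0 and 5.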

## Obtain Kriging Mesh Data

Here we obtain the mesh and associated data needed to compute the semi-variogram and perform the Kriging.

Running this line produces the following variables:
* `mesh_gdf` -- A GeoPandas Dataframe obtained from data in `filename_grid_cell`
* `smoothed_contour_gdf` -- A GeoPandas Dataframe obtained from data in `filename_smoothed_contour`

Additionally, this initialization creates routines that can plot and transform the mesh data.

In [7]:
krig_mesh = epro_2019.get_kriging_mesh()

### Plot the mesh, transects, and smoothed contour

* Transect points are represented by a changing color gradient (these can be seen by zooming in)
* The full mesh points are red (this layer is commented out in the cell below)
* The smoothed contour points are blue

**TODO:** Handle the `.reset_index()` part within `plot_points`

In [8]:
# import folium

In [9]:
# fmap = krig_mesh.get_folium_map()
# transects_poly_geojson = gpd.GeoSeries(krig_mesh.get_polygon_of_transects(epro_2019.final_biomass_table, 2)).to_json()
# folium.GeoJson(transects_poly_geojson, name="transects multipoly").add_to(fmap)

# Plots the transect points on the folium map
fmap = krig_mesh.plot_points(epro_2019.final_biomass_table.reset_index(), 
                             cmap_column='Transect', color='hex')

# Plot full mesh points 
# fmap = krig_mesh.plot_points(krig_mesh.mesh_gdf, 
#                              fmap=fmap, color='red')

# Plot smoothed contour points 
fmap = krig_mesh.plot_points(krig_mesh.smoothed_contour_gdf, 
                             fmap=fmap, color='blue')

# display the folium map
fmap

## Apply coordinate transformations

The semi-variogram and Kriging calculations require transformed longitude/latitude points. Below we demonstrate a convenience routine, accessible via `krig_mesh`, that performs this transformation on the transect and mesh points.

### Transect points

In [10]:
krig_mesh.apply_longitude_distance_transform(
    epro_2019.final_biomass_table, 
    transform_type='transect'
)

In [11]:
krig_mesh.transect_transf_df.head()

| Transect | Latitude | Longitude | Stratum | Spacing | geometry | normalized_biomass_density | longitude_transformed | x_transect | y_transect |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 34.397267 | -121.143005 | 1 | 10.0 | POINT (-125.49372 34.39727) | 0.0 | -125.493716 | -0.143604 | -0.519568 |
| 1 | 34.397391 | -121.133196 | 1 | 10.0 | POINT (-125.48324 34.39739) | 0.0 | -125.483236 | -0.141486 | -0.519562 |
| 1 | 34.397435 | -121.123057 | 1 | 10.0 | POINT (-125.47286 34.39744) | 0.0 | -125.472856 | -0.139387 | -0.51956 |
| 1 | 34.397394 | -121.112871 | 1 | 10.0 | POINT (-125.46289 34.39739) | 0.0 | -125.462891 | -0.137373 | -0.519562 |
| 1 | 34.397437 | -121.102888 | 1 | 10.0 | POINT (-125.45268 34.39744) | 0.0 | -125.452676 | -0.135307 | -0.51956 |


### Mesh points

In [12]:
krig_mesh.apply_longitude_distance_transform(
    krig_mesh.mesh_gdf, 
    transform_type='mesh'
)
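
The transformed tables contain dimensionless `x`/`y` columns alongside the geographic coordinates. A common way to obtain such coordinates is to scale east-west offsets by the cosine of latitude and then divide both axes by a reference extent. The sketch below shows only that generic scaling step. The function name, the reference point, and the extents are hypothetical; the longitude shift that produces `longitude_transformed` is not reproduced, and none of this is confirmed to match `apply_longitude_distance_transform`.

```python
import numpy as np

def to_normalized_xy(lat, lon, lat_ref, lon_ref, lon_span, lat_span):
    """Map lat/lon in degrees to dimensionless x/y around a reference point.

    All reference values are illustrative assumptions, not EchoPro settings.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # A degree of longitude covers less ground at higher latitude, so
    # east-west offsets are shrunk by cos(latitude) before normalizing.
    x = np.cos(np.deg2rad(lat)) * (lon - lon_ref) / lon_span
    y = (lat - lat_ref) / lat_span
    return x, y

# Hypothetical reference point and extents (degrees).
x, y = to_normalized_xy(
    lat=[45.0, 65.0, 60.0],
    lon=[-124.8, -124.8, -120.8],
    lat_ref=45.0, lon_ref=-124.8, lon_span=4.0, lat_span=20.0,
)
print(x, y)
```

Working in normalized coordinates keeps lag distances and length scales on a comparable footing in both directions, which is why parameters such as `lag_res` in the next section are small dimensionless numbers.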

## Compute Semi-Variogram and fit a model

Below we demonstrate how to compute the normalized semi-variogram for the transect points using the normalized biomass density. We then show how to fit a model to the normalized semi-variogram data. 

### Compute the semi-variogram

In [13]:
# initialize semi-variogram class using the transect points; set up bins
semi_vario = epro_2019.get_semi_variogram(
    krig_mesh,
    params=dict(nlag=30, lag_res=0.002)
)

In [14]:
%%time
# run the semi-variogram calculation 
semi_vario.calculate_semi_variogram()

CPU times: user 6.45 s, sys: 10.5 s, total: 17 s
Wall time: 15.3 s
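
For readers unfamiliar with the calculation, the following is a minimal sketch of the classical (Matheron) empirical semi-variogram, binned into `nlag` lags of width `lag_res` as in the parameters above. It is not EchoPro's implementation: the function name is hypothetical, it uses a brute-force pair enumeration, and it omits the normalization that `calculate_semi_variogram` applies.

```python
import numpy as np

def empirical_semivariogram(x, y, z, nlag, lag_res):
    """Classical estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h))."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    # Indices of all unique point pairs (i < j).
    i, j = np.triu_indices(len(z), k=1)
    dist = np.hypot(x[i] - x[j], y[i] - y[j])
    sq_diff = (z[i] - z[j]) ** 2
    # Assign each pair to a lag bin; pairs beyond the last bin are dropped.
    bin_idx = np.floor(dist / lag_res).astype(int)
    keep = bin_idx < nlag
    counts = np.bincount(bin_idx[keep], minlength=nlag)
    sums = np.bincount(bin_idx[keep], weights=sq_diff[keep], minlength=nlag)
    # Bins without any pairs stay NaN.
    gamma = np.full(nlag, np.nan)
    nonempty = counts > 0
    gamma[nonempty] = sums[nonempty] / (2.0 * counts[nonempty])
    lag_centers = (np.arange(nlag) + 0.5) * lag_res
    return lag_centers, gamma, counts

# Tiny made-up example: three collinear points one unit apart.
lags, gamma, counts = empirical_semivariogram(
    x=[0.0, 1.0, 2.0], y=[0.0, 0.0, 0.0], z=[0.0, 1.0, 3.0],
    nlag=3, lag_res=1.0,
)
print(lags, gamma, counts)
```

The pair enumeration grows quadratically with the number of points, so for a full transect table a chunked or tree-based approach would be preferable.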


### Fit a model to the semi-variogram

To run Kriging, we need to fit a model to the normalized semi-variogram values. We provide a widget to display this model and allow one to actively change parameters within the model. 

Note: When you run the least-squares fit, all model parameters are updated and the fitted model is plotted in red. The apply model button plots the model for the values currently in the box. Each time you change those values, deselect and reselect the apply model button to display the updated model.

In [15]:
semi_vario.view_semi_variogram()

GridspecLayout(children=(Dropdown(description='Semi-variogram model', index=1, layout=Layout(grid_area='widget…
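
Outside the widget, a least-squares fit of this kind can be sketched with `scipy.optimize.curve_fit`. The example below fits a generalized exponential model to synthetic, noise-free values. The model form is an assumption based on the parameter names used later in this notebook (`nugget`, `sill`, `ls`, `exp_pow`); the name `generalized_exp_bessel` suggests an additional Bessel hole-effect factor, which this sketch omits. The "true" parameters are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

EXP_POW = 1.5  # held fixed here to keep the fit well conditioned

def gen_exp_model(h, nugget, sill, ls):
    """Assumed model: nugget + (sill - nugget) * (1 - exp(-(h / ls)^p))."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-(h / ls) ** EXP_POW))

# Lag centers matching nlag=30 and lag_res=0.002 from the cell above.
lags = (np.arange(30) + 0.5) * 0.002

# Synthetic "observations" generated from made-up parameters.
true_params = (0.1, 0.9, 0.01)
gamma_obs = gen_exp_model(lags, *true_params)

# Bounds keep the length scale strictly positive during the search.
fitted, _ = curve_fit(
    gen_exp_model, lags, gamma_obs,
    p0=(0.05, 1.0, 0.005),
    bounds=([0.0, 0.0, 1e-4], [np.inf, np.inf, np.inf]),
)
print(dict(zip(("nugget", "sill", "ls"), fitted)))
```

With real semi-variogram values the fit is sensitive to the starting guess, which is one reason an interactive view of the model is useful.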

## Perform Kriging of biomass density

Below we perform Ordinary Kriging using the constructed transformed mesh points, the semi-variogram model, and the normalized biomass density.   

### Setup preliminary variables necessary for Kriging and initialize kriging routine

In [16]:
kriging_params = dict(
    # kriging parameters
    k_max=10,
    k_min=3,
    R=0.0226287,
    ratio=0.001,
    # parameters for semi-variogram model
    s_v_params={'nugget': 0.0, 'sill': 0.95279, 'ls': 0.0075429,
                'exp_pow': 1.5, 'ls_hole_eff': 0.0},
    # grab appropriate semi-variogram model
    s_v_model=SemiVariogram.generalized_exp_bessel
)

# initialize kriging routine
krig = epro_2019.get_kriging(kriging_params)
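
To make the role of the semi-variogram model concrete, here is a minimal sketch of Ordinary Kriging at a single location using the textbook linear system. It is not EchoPro's routine: the function names are hypothetical, only a nearest-neighbour limit analogous to `k_max` is modelled (`k_min`, `R`, and `ratio` are not), the Bessel term is omitted, and the sample points are made up. The model parameters are the ones from `s_v_params` above.

```python
import numpy as np

def gen_exp_variogram(h, nugget, sill, ls, exp_pow):
    """Generalized exponential model (assumed form); gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-(h / ls) ** exp_pow))
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging_point(xy, z, xy0, variogram, k_max=10):
    """Ordinary Kriging estimate at xy0 from its k_max nearest samples."""
    xy, z, xy0 = np.asarray(xy, float), np.asarray(z, float), np.asarray(xy0, float)
    d0 = np.linalg.norm(xy - xy0, axis=1)
    nearest = np.argsort(d0)[:k_max]
    pts, vals, d0 = xy[nearest], z[nearest], d0[nearest]
    n = len(vals)
    # Kriging system: [[Gamma, 1], [1^T, 0]] @ [w, mu] = [gamma0, 1],
    # where the last row forces the weights to sum to one.
    A = np.ones((n + 1, n + 1))
    pair_dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    A[:n, :n] = variogram(pair_dist)
    A[n, n] = 0.0
    b = np.append(variogram(d0), 1.0)
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = weights @ vals
    variance = weights @ b[:n] + mu  # Kriging variance
    return estimate, variance, weights

def variogram(h):
    # Parameter values copied from s_v_params in the cell above.
    return gen_exp_variogram(h, nugget=0.0, sill=0.95279, ls=0.0075429, exp_pow=1.5)

# Made-up sample points in normalized coordinates.
xy = np.array([[0.0, 0.0], [0.005, 0.0], [0.0, 0.005], [0.005, 0.005], [0.01, 0.0025]])
z = np.array([1.0, 2.0, 0.0, 3.0, 5.0])
estimate, variance, weights = ordinary_kriging_point(xy, z, xy[3], variogram)
print(estimate, variance, weights.sum())
```

Because the nugget is zero, Kriging at a sampled location returns the sampled value with zero variance, which is a convenient sanity check for any implementation.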

### Perform Ordinary Kriging

Below we perform Ordinary Kriging on the normalized biomass density using the established parameters. The routine returns the mesh data with the Kriging results added as columns (`mesh_krigbiomass_gdf`), including the total biomass for each mesh point (cell).

In [17]:
%%time
mesh_krigbiomass_gdf = krig.run_biomass_kriging(krig_mesh)

CPU times: user 11.6 s, sys: 8.16 s, total: 19.7 s
Wall time: 17.6 s


In [18]:
mesh_krigbiomass_gdf.head()

| | Latitude of centroid | Longitude of centroid | Area (km^2) | Cell portion | geometry | krig_biomass_vp | krig_biomass_ep | krig_biomass_eps | area_calc | krig_biomass_vals |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 49.099727 | -126.024144 | 21.4369 | 1.0 | POINT (-126.02414 49.09973) | 0.0 | 0.295113 | | 6.25 | 0.0 |
| 1 | 49.057959 | -126.024127 | 21.4369 | 1.0 | POINT (-126.02413 49.05796) | 0.0 | 0.029303 | | 6.25 | 0.0 |
| 2 | 49.016196 | -126.02411 | 21.4369 | 1.0 | POINT (-126.02411 49.01620) | 0.0 | 0.264295 | | 6.25 | 0.0 |
| 3 | 48.974438 | -126.024093 | 21.4369 | 1.0 | POINT (-126.02409 48.97444) | 0.0 | 0.566269 | | 6.25 | 0.0 |
| 4 | 48.932686 | -126.024076 | 21.4369 | 1.0 | POINT (-126.02408 48.93269) | 50128.409807 | 0.717068 | 1.234471 | 6.25 | 0.313303 |


The cell below should report a total Kriged biomass estimate of 1725.0331199094 kmt.

In [19]:
print(f"Total Kriged Biomass Estimate {mesh_krigbiomass_gdf.krig_biomass_vals.sum():.2f} (kmt)")

Total Kriged Biomass Estimate 1725.03 (kmt)
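
The columns in the table above appear to be related by simple arithmetic: the cell area in km² converts to 6.25 square nautical miles, and multiplying `krig_biomass_vp` by that area and by 1e-6 reproduces `krig_biomass_vals`. The sketch below checks this against the five displayed rows. The relationship is inferred from the displayed output only, and the unit interpretation (density in kg per square nautical mile, biomass in kmt) and the use of `Cell portion` as a multiplier are assumptions.

```python
import numpy as np

NMI_TO_KM = 1.852  # one nautical mile in kilometres

# Values copied from the five displayed rows of mesh_krigbiomass_gdf.
area_km2 = np.array([21.4369] * 5)
cell_portion = np.ones(5)
krig_biomass_vp = np.array([0.0, 0.0, 0.0, 0.0, 50128.409807])

# Convert cell area to square nautical miles (assumed meaning of area_calc).
area_nmi2 = cell_portion * area_km2 / NMI_TO_KM**2

# Density times area, scaled by 1e-6 (assumed kg -> kmt conversion).
biomass_kmt = krig_biomass_vp * area_nmi2 * 1e-6

print(area_nmi2)
print(biomass_kmt)
```

Summing `krig_biomass_vals` over all mesh points, as the cell above does, then gives the survey-wide total in kmt.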


### Plot Kriged Biomass estimate in kmt

Plot only the non-zero mesh points, for clarity and faster rendering.

In [20]:
mesh_krigbiomass_gt0_gdf = mesh_krigbiomass_gdf[mesh_krigbiomass_gdf["krig_biomass_vals"] > 0]

# To plot all mesh points (nearly 20,000), use mesh_krigbiomass_gdf 
krig.plot_kriging_results(mesh_krigbiomass_gt0_gdf, krig_fieldname="krig_biomass_vals")