# **Example overview of the `echopop` dataflow**

## **`Survey`-class initialization**

Import the latest version of `echopop`.

In [1]:
import pprint

from echopop.survey import Survey

Initialize the `Survey` object by loading in input data (`Survey.input`) and configuration settings (`Survey.config`). The former reads in data from all the defined input files contained within the `./config_files/survey_year_2019_config.yml` configuration file. The latter reads in various arguments as well as the file paths that point to the input files. 

In [2]:
survey = Survey( init_config_path = "C:/Users/Brandyn/Documents/GitHub/echopop/config_files/initialization_config.yml" ,
                 survey_year_config_path = "C:/Users/Brandyn/Documents/GitHub/echopop/config_files/survey_year_2019_config.yml" )

Not only are all the necessary acoustic, biological, kriging, and stratification data imported and contained within `survey`, but they can also be parsed in a relatively straightforward manner. There are five `Survey`-class attributes to be aware of: 
* `Survey.meta`: this is currently undeveloped, but this is where necessary information such as the date the object was created and general data workflow/provenance would be collected.
* `Survey.config`: this stores the background configuration settings. 
* `Survey.input`: this contains the imported acoustic, biological, kriging, and stratification data. This can be further investigated via the various nested dictionaries that correspond to specific dataset types. 
* `Survey.analysis`: this is the working store of intermediate data products and calculations that may be of interest to the user and/or are required for later calculations. 
* `Survey.results`: this stores the overall results of each analysis. 
  
## **Initial data processing**

**`Survey.meta`**

As previously mentioned, `Survey.meta` is undeveloped, but the `provenance` key will be iteratively updated with the performed analyses. Additional metadata can also be appended to this attribute.

In [3]:
pprint.pprint( survey.meta )

{'provenance': {'date': '2024-09-09 09:24:56', 'imported_datasets': set()}}


**`Survey.config`**

This attribute contains a variety of nested dictionaries. They organize the configuration entries so that it is unambiguous how to access each value. The top-level dictionaries can be listed via `survey.config.keys()`:

In [4]:
survey.config.keys( )

dict_keys(['stratified_survey_mean_parameters', 'nasc_exports', 'haul_to_transect_mapping', 'transect_region_mapping', 'TS_length_regression_parameters', 'geospatial', 'kriging_parameters', 'survey_year', 'species', 'CAN_haul_offset', 'data_root_dir', 'biological', 'stratification', 'NASC', 'export_regions', 'gear_data', 'kriging', 'biometrics'])

The overall dictionary structure of `survey.config` can also be printed. The `pprint` library is not required for printing the values in this attribute. It is, however, helpful for formatting nested dictionaries legibly in both the console and interactive notebooks. 

In [5]:
pprint.pprint(survey.config)

{'CAN_haul_offset': 200,
 'NASC': {'all_ages': {'filename': 'Exports/US_CAN_detailsa_2019_table1y+_ALL_final '
                                   '- updated.xlsx',
                       'sheetname': 'Sheet1'},
          'no_age1': {'filename': 'Exports/US_CAN_detailsa_2019_table2y+_ALL_final '
                                  '- updated.xlsx',
                      'sheetname': 'Sheet1'}},
 'TS_length_regression_parameters': {'pacific_hake': {'TS_L_intercept': -68.0,
                                                      'TS_L_slope': 20.0,
                                                      'length_units': 'cm',
                                                      'number_code': 22500}},
 'biological': {'catch': {'CAN': {'filename': 'Biological/CAN/2019_biodata_catch_CAN.xlsx',
                                  'sheetname': 'biodata_catch_CAN'},
                          'US': {'filename': 'Biological/US/2019_biodata_catch.xlsx',
                                 'sheetname': 'biod
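
The `TS_length_regression_parameters` entry above holds the coefficients of a target strength (TS) to length regression for Pacific hake (`number_code` 22500, lengths in cm). These values could be read from `survey.config["TS_length_regression_parameters"]["pacific_hake"]`. The sketch below assumes they enter the conventional form TS = `TS_L_slope` × log10(L) + `TS_L_intercept`; that pairing is an assumption, not something this page confirms. The helper names are hypothetical:

```python
import math

# Coefficients copied from the printed configuration ('pacific_hake').
TS_L_SLOPE = 20.0
TS_L_INTERCEPT = -68.0

def target_strength(length_cm):
    """TS in dB re 1 m^2, assuming TS = slope * log10(length_cm) + intercept."""
    return TS_L_SLOPE * math.log10(length_cm) + TS_L_INTERCEPT

def sigma_bs(length_cm):
    """Backscattering cross-section in m^2, the linear form of TS: 10 ** (TS / 10)."""
    return 10.0 ** (target_strength(length_cm) / 10.0)

print(round(target_strength(50.0), 2))  # -34.02
```

Working in the linear domain (`sigma_bs`) matters when averaging over many fish, because TS values in dB should not be averaged directly.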

**`Survey.input`**

Similar to `Survey.config`, the input data are grouped into various nested dictionaries. Data contained within the `Survey.input` attribute are specifically stored in four general nested dictionaries: `acoustics`, `biology`, `spatial`, and `statistics`. 

In [6]:
survey.input.keys()

dict_keys(['acoustics', 'biology', 'spatial', 'statistics'])

This results in the following branched data structure for `Survey.input`:
* `acoustics`
  * `nasc_df`: acoustic trawl data (all age and age-2+ NASC)
* `biology`
  * `catch_df`: unaged haul weight totals
  * `distributions`
    * `age_bins_df`: age distribution histogram bins
    * `length_bins_df`: length distribution histogram bins
  * `haul_to_transect_df`: haul-to-transect key that links haul numbers to their respective transects
  * `length_df`: unaged length measurements
  * `specimen_df`: aged length and weight measurements
* `spatial`
  * `strata_df`: the `KS` stratum definitions and fraction of hake for each haul
  * `geo_strata_df`: latitudinal limits of the `KS` strata
  * `inpfc_strata_df`: the `INPFC` stratum definitions and their respective latitudinal limits
* `statistics`
  * `kriging`
    * `mesh_df`: kriging mesh
    * `isobath_200m_df`: 200 m isobath coordinates
    * `model_config`: dictionary comprising all required arguments for the kriging analysis
  * `variogram`
    * `model_config`: dictionary comprising all required arguments for the variogram analysis
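
A small recursive helper can print the keys of any nested dictionary, which makes this branched structure easy to explore interactively. The `key_tree` helper below is hypothetical and not part of `echopop`. It stops at non-dictionary values such as DataFrames, so calling `print("\n".join(key_tree(survey.input)))` should produce an outline similar to the list above:

```python
def key_tree(obj, indent=0):
    """Return the nested-dictionary key structure as a list of indented bullet lines."""
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append("  " * indent + f"* {key}")
            # Recurse only into dictionaries; DataFrames and other leaves end the branch
            lines.extend(key_tree(value, indent + 1))
    return lines

# Toy stand-in for part of `survey.input`
toy_input = {
    "acoustics": {"nasc_df": None},
    "biology": {"catch_df": None, "distributions": {"age_bins_df": None}},
}
print("\n".join(key_tree(toy_input)))
```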

## **`Survey.load_acoustic_data(...)`**

`````{admonition} Type-hinting
:class: tip
Hover your cursor over the various functions included in the code blocks below to get additional type hints and context for usage.
`````

The method `Survey.load_acoustic_data(...)` ingests and preprocesses acoustic backscatter data in several forms, including consolidated `*.xlsx` files defined in the `Survey`-class configuration file (`survey_year_config_path`). This class-method currently takes six user arguments:

* `index_variable (string, list)`: Index columns used for defining discrete acoustic backscatter samples and vertical integration (default: `["transect_num", "interval"]`).
* `ingest_exports ('echoview', 'echopype', None)`: The type of acoustic backscatter exports required for generating the associated consolidated `*.xlsx` files (default: `None`). When `ingest_exports = "echoview"`, this searches a directory defined within `init_config_path` for associated Echoview exports (`layers`, `intervals`, `analysis`, `cells`). 
* `region_class_column (string)`: Dataframe column denoting the Echoview export region class such as "zooplankton" (default: `"region_class"`). 
* `transect_pattern (string)`: A (raw) string that corresponds to the transect number embedded within the base name of the file path associated with each export file (default: ``r'T(\\d+)'``).
* `unique_region_id (string)`: Dataframe column that denotes region-specific names and identifiers (default: `"region_id"`).
* `verbose (boolean)`: dialogue messages will appear in the console including a summary report of the results when this is set to `True` (default: `True`) 
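
Two of these arguments describe simple operations that can be sketched without `echopop`. The sketch makes two assumptions:

* `transect_pattern` is a regular expression whose first capture group holds the transect digits (written here as `T(\d+)`).
* Vertical integration means summing NASC over depth layers within each unique `index_variable` combination.

The file name, the `nasc` column, and both helper functions are hypothetical:

```python
import re
from collections import defaultdict

# Assumed equivalent of the default `transect_pattern`: "T" followed by digits.
TRANSECT_PATTERN = re.compile(r"T(\d+)")

def transect_number(filename):
    """Extract the transect number embedded in an export file name."""
    match = TRANSECT_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"No transect number found in {filename!r}")
    return int(match.group(1))

def integrate_nasc(rows, index_variable=("transect_num", "interval")):
    """Sum layer NASC within each unique combination of the index columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[column] for column in index_variable)
        totals[key] += row["nasc"]
    return dict(totals)

print(transect_number("hake_2019_T012_intervals.csv"))  # 12
layers = [
    {"transect_num": 1, "interval": 1, "layer": 1, "nasc": 10.0},
    {"transect_num": 1, "interval": 1, "layer": 2, "nasc": 5.0},
    {"transect_num": 1, "interval": 2, "layer": 1, "nasc": 2.5},
]
print(integrate_nasc(layers))  # {(1, 1): 15.0, (1, 2): 2.5}
```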

In [7]:
survey.load_acoustic_data()

## **`Survey.load_survey_data(...)`**

The method `Survey.load_survey_data(...)` ingests and preprocesses the remaining biological and spatial data files defined for the `Survey`-class object. This class-method currently takes one user argument:

* `verbose (boolean)`: dialogue messages will appear in the console including a summary report of the results when this is set to `True` (default: `True`) 

```{warning}
This method creates intermediate `*.xlsx` files representing a haul-to-transect mapping key that links the acoustic backscatter and biological datasets. These files are configured by the associated key within `init_config_path`; however, the filenames of these outputs must be defined within the `survey_year_config_path` configuration file. Successful file creation is indicated via console messages.
```

In [8]:
survey.load_survey_data()

Haul-to-transect mapping file for 'US' saved at 'C:\Users\Brandyn\Documents\GitHub\EchoPro_data\2019_consolidated_files\Biological\US\haul_to_transect_mapping_2019_US.xlsx'.
Haul-to-transect mapping file for 'CAN' saved at 'C:\Users\Brandyn\Documents\GitHub\EchoPro_data\2019_consolidated_files\Biological\CAN\haul_to_transect_mapping_2019_CAN.xlsx'.
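
The haul-to-transect key links each haul number to a transect so that biological samples can be matched with acoustic data. US and Canadian hauls are numbered separately, and the configuration includes `CAN_haul_offset` (200). The sketch below assumes, without confirmation from this page, that the offset is added to Canadian haul numbers to keep them distinct from US hauls before the two keys are merged. The helper names and toy keys are hypothetical:

```python
CAN_HAUL_OFFSET = 200  # value of `CAN_haul_offset` in the printed configuration

def combine_haul_keys(us_key, can_key, can_offset=CAN_HAUL_OFFSET):
    """Merge US and CAN {haul: transect} keys, offsetting CAN haul numbers."""
    combined = dict(us_key)
    for haul, transect in can_key.items():
        offset_haul = haul + can_offset
        if offset_haul in combined:
            raise ValueError(f"Haul {offset_haul} collides with an existing haul number")
        combined[offset_haul] = transect
    return combined

def assign_transects(haul_numbers, haul_key):
    """Look up the transect for each haul (None when the haul is unmapped)."""
    return [haul_key.get(haul) for haul in haul_numbers]

# Hypothetical keys: US hauls 1-2 and CAN haul 1 (which becomes 201).
haul_key = combine_haul_keys({1: 10, 2: 11}, {1: 100})
print(assign_transects([1, 201, 999], haul_key))  # [10, 100, None]
```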


## **`Survey.transect_analysis(...)`**

`````{admonition} Type-hinting
:class: tip
Hover your cursor over the various functions included in the code blocks below to get additional type hints and context for usage.
`````

The method `Survey.transect_analysis(...)` populates various analysis variables (`Survey.analysis`) and results (`Survey.results`). This class-method currently takes four user arguments:

* `species_id (integer, list)`: the species number code(s) (default: `22500`)
* `exclude_age1 (boolean)`: whether age-1 fish should be excluded from the analysis (default: `True`)
* `stratum (string)`: the stratum used for the various acoustic and biological calculations (default: `'ks'`)
* `verbose (boolean)`: dialogue messages will appear in the console including a summary report of the results when this is set to `True` (default: `True`) 
  
This is the primary biological data processing workhorse that is further used for later analyses, such as computing the number and weight proportions across all animals.

In [9]:
survey.transect_analysis( species_id = 22500 , exclude_age1 = True , stratum = 'ks' , verbose = True )

--------------------------------
TRANSECT RESULTS
--------------------------------
| Variable: Biomass (kmt)
| Age-1 fish excluded: True
| Stratum definition: KS
--------------------------------
GENERAL RESULTS
--------------------------------
| Total biomass: 1651.1 kmt
    Age-1: 7.9 kmt
    Age-2+: 1643.2 kmt
| Total female biomass: 832.2 kmt
    Age-1: 4.0 kmt
    Age-2+: 828.2 kmt
| Total male biomass: 818.5 kmt
    Age-1: 3.9 kmt
    Age-2+: 814.6 kmt
| Total unsexed biomass: 0.4 kmt
| Total mixed biomass: 36.8 kmt
--------------------------------
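
The number and weight proportions mentioned above can be illustrated with a deliberately simplified sketch. It computes sex-specific proportions from hypothetical aged specimen records and optionally drops age-1 fish, as `exclude_age1 = True` does. The real calculation in `Survey.transect_analysis(...)` is further partitioned by stratum and length/age bins, which this sketch does not attempt. `sex_proportions` is a hypothetical name:

```python
from collections import defaultdict

def sex_proportions(specimens, exclude_age1=True):
    """Return (number_proportions, weight_proportions) keyed by sex."""
    counts = defaultdict(int)
    weights = defaultdict(float)
    for fish in specimens:
        if exclude_age1 and fish["age"] == 1:
            continue  # mimic exclude_age1 = True by dropping age-1 fish
        counts[fish["sex"]] += 1
        weights[fish["sex"]] += fish["weight"]
    n_total = sum(counts.values())
    w_total = sum(weights.values())
    if n_total == 0 or w_total == 0:
        return {}, {}
    number_props = {sex: n / n_total for sex, n in counts.items()}
    weight_props = {sex: w / w_total for sex, w in weights.items()}
    return number_props, weight_props

# Hypothetical aged specimen records (weights in kg).
specimens = [
    {"sex": "female", "age": 2, "weight": 1.0},
    {"sex": "female", "age": 3, "weight": 2.0},
    {"sex": "male", "age": 2, "weight": 1.0},
    {"sex": "male", "age": 1, "weight": 0.2},
]
number_props, weight_props = sex_proportions(specimens)
print(weight_props)  # {'female': 0.75, 'male': 0.25}
```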


A variety of intermediate data products are stored in `Survey.analysis`, currently under four nested dictionaries: 
* `kriging`: intermediate results specific to the kriging analysis (`Survey.kriging_analysis(...)`)
* `settings`: this provides a full recording of user-inputs and other variable definitions used for each analysis to improve replicability
* `stratified`: intermediate results specific to the stratified sampling analysis (`Survey.stratified_analysis(...)`)
* `transect`: intermediate results specific to the transect analysis (`Survey.transect_analysis(...)`)

In [10]:
survey.analysis.keys( )

dict_keys(['transect', 'settings', 'stratified'])

Since `Survey.transect_analysis(...)` was run, the specific arguments used for the analysis can be directly accessed via:

In [11]:
pprint.pprint( survey.analysis[ 'settings' ][ 'transect' ] )

{'age_group_columns': {'haul_id': 'haul_no_age1',
                       'nasc_id': 'NASC_no_age1',
                       'stratum_id': 'stratum_no_age1'},
 'exclude_age1': True,
 'species_id': 22500,
 'stratum': 'ks',
 'stratum_name': 'stratum_num'}


The intermediate data products can be similarly accessed under the `transect` dictionary within `Survey.analysis`: 

In [12]:
survey.analysis[ 'transect' ].keys()

dict_keys(['acoustics', 'biology', 'coordinates'])

The results from each analysis are then stored within the `Survey.results` attribute:

In [13]:
survey.results.keys()

dict_keys(['transect', 'stratified', 'kriging', 'variogram'])

All results recorded within `Survey.results` can be printed at once. The results specific to `Survey.transect_analysis(...)` can be accessed via the `transect` key:

In [14]:
pprint.pprint( survey.results )

{'kriging': {},
 'stratified': {},
 'transect': {'biomass_summary_df':        sex  biomass_age1  biomass_adult   biomass_all
0      all  7.869992e+06   1.643215e+09  1.651085e+09
1   female  3.950822e+06   8.282280e+08  8.321788e+08
2     male  3.919170e+06   8.146258e+08  8.185449e+08
3  unsexed  0.000000e+00   3.609296e+05  3.609296e+05
4    mixed -4.656613e-10   3.680784e+07  3.680784e+07},
 'variogram': {}}


In [15]:
survey.results[ 'transect' ]

{'biomass_summary_df':        sex  biomass_age1  biomass_adult   biomass_all
 0      all  7.869992e+06   1.643215e+09  1.651085e+09
 1   female  3.950822e+06   8.282280e+08  8.321788e+08
 2     male  3.919170e+06   8.146258e+08  8.185449e+08
 3  unsexed  0.000000e+00   3.609296e+05  3.609296e+05
 4    mixed -4.656613e-10   3.680784e+07  3.680784e+07}
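
The values in `biomass_summary_df` appear to be in kilograms. Dividing by 10^6 (1 kmt = 1,000 metric tons = 10^6 kg) reproduces the kmt totals in the console summary. The printed numbers also show that the `all` row matches the sum of the `female`, `male`, and `unsexed` rows; the `mixed` row is not part of that sum. A quick check using the printed values:

```python
# Values copied from the printed `biomass_summary_df` (column `biomass_all`, kg).
biomass_all_kg = {
    "all": 1.651085e09,
    "female": 8.321788e08,
    "male": 8.185449e08,
    "unsexed": 3.609296e05,
}
KG_PER_KMT = 1e6  # 1 kmt = 1,000 metric tons = 1,000,000 kg

total_kmt = biomass_all_kg["all"] / KG_PER_KMT
sex_sum = biomass_all_kg["female"] + biomass_all_kg["male"] + biomass_all_kg["unsexed"]
print(round(total_kmt, 1))  # 1651.1
# Relative difference is tiny; it only reflects rounding in the printed values.
print(abs(sex_sum - biomass_all_kg["all"]) / biomass_all_kg["all"] < 1e-5)  # True
```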

## **`Survey.fit_variogram(...)`**

`````{admonition} Optimizing variogram parameters
:class: important
This is an optional method in the general workflow that can be skipped if the defined variograms contained within the configured settings files are desired.
`````

The method `Survey.fit_variogram(...)` uses a non-linear least squares optimizer to estimate best-fit variogram parameters. It does this by fitting the theoretical variogram model to the empirical variogram computed from the dataset. The method also populates various analysis variables (`Survey.analysis`) and results (`Survey.results`). This class-method currently takes ten user arguments:

* `variogram_parameters (VariogramBase)`: A dictionary comprising various arguments required for computing the model variogram (default: `{}`). The allowed variogram parameters include: `["sill", "nugget", "correlation_range", "hole_effect_range", "decay_power", "enhance_semivariance"]`; however, the exact parameters required depend on the chosen semivariogram model. 
* `optimization_parameters (VariogramOptimize)`: A dictionary comprising various arguments for optimizing the variogram fit via non-linear least squares (default: `{}`).
* `initialize_variogram (VariogramInitial)`: A dictionary or list that indicates how each variogram parameter is configured for optimization (default: `["nugget", "sill", "correlation_range", "hole_effect_range", "decay_power"]`). When parameter names are supplied as a list, their default initial values are imported from the associated file defined in the configuration `*.yaml`. The same occurs when `initialize_variogram` is formatted as a dictionary and the `'value'` key is not present for a defined parameter. Parameters excluded from either the list or the dictionary keys are held at fixed values.
* `model (list, string)`: A string or list of model names. A single name represents a single family model. Two inputs represent the desired composite model (e.g. the composite J-Bessel and exponential model) (default: `["bessel", "exponential"]`).
* `azimuth_range (float)`: The total azimuth angle range that is allowed for constraining the relative angles between spatial points, particularly for cases where a high degree of directionality is assumed (default: `360.0`).
* `n_lags (int)`: The number of lags (default: `30`).
* `force_lag_zero (boolean)`: Force the semivariance at the zeroth lag to be 0.0 (default: `True`).
* `standardize_coordinates (boolean)`: When set to `True`, transect coordinates are standardized using reference coordinates (default: `True`).
* `variable ('biomass', 'abundance')`: Transect data values used for fitting the variogram (default: `"biomass"`). The two options, `"abundance"` and `"biomass"`, correspond to fitting the empirical and theoretical variograms on "number density" and "biomass density", respectively.
* `verbose (boolean)`: dialogue messages will appear in the console including a summary report of the results when this is set to `True` (default: `True`) 

```{warning}
**The order of variables defined within `initialize_variogram` can affect the model fitting due to how the optimizer functions.**
```
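
For intuition about what the optimizer is fitting, consider the classical (Matheron) estimator of the empirical variogram. It averages half the squared differences between all pairs of points whose separation falls in each lag bin. The sketch below illustrates that estimator together with the `n_lags`, `lag_resolution`, and `force_lag_zero` ideas. It is a generic textbook version, not `echopop`'s implementation. It ignores `azimuth_range`, and its bin-assignment rule (nearest multiple of `lag_resolution`) is an assumption:

```python
import math
from itertools import combinations

def empirical_variogram(coords, values, lag_resolution, n_lags, force_lag_zero=True):
    """Classical (Matheron) semivariance for lags 0, 1, ..., n_lags - 1.

    Each pair is assigned to the lag bin nearest to its separation distance
    divided by `lag_resolution`; pairs beyond the last bin are ignored.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(values)), 2):
        k = int(round(math.dist(coords[i], coords[j]) / lag_resolution))
        if k < n_lags:
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    # gamma(h) = sum of squared differences / (2 * number of pairs in the bin)
    gamma = [s / (2 * c) if c else math.nan for s, c in zip(sums, counts)]
    if force_lag_zero:
        gamma[0] = 0.0  # analogous to force_lag_zero = True
    return gamma

coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(empirical_variogram(coords, [0.0, 1.0, 2.0, 3.0], lag_resolution=1.0, n_lags=4))
# [0.0, 0.5, 2.0, 4.5]
```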

In [16]:
survey.fit_variogram(model=["bessel", "exponential"], n_lags=30, initialize_variogram=["decay_power", "nugget", "sill", "correlation_range", "hole_effect_range"])

Longitude and latitude coordinates (WGS84) converted to standardized coordinates (x and y).
-----------------------------
VARIOGRAM OPTIMIZATION
-----------------------------
| See `self.analysis['settings']['variogram']['optimization'] for parameter settings.
-----------------------------
| Variogram model: ['bessel', 'exponential'] (composite family)
-----------------------------
| Initial fit -> Optimized fit
-----------------------------
Overall fit [MAD]: 0.00127 -> 0.000865
Decay power: 1.5 -> 1.52
Nugget: 0.0 -> 1e-10
Sill: 0.91 -> 0.945
Correlation range: 0.007 -> 0.00795
Hole effect range: 0.0 -> 1e-10
-----------------------------
| Results stored in `self.results['variogram']
-----------------------------


In [17]:
survey.results["variogram"]

{'model_fit': {'decay_power': 1.515771020973907,
  'nugget': 9.999999999970078e-11,
  'sill': 0.9452901787383056,
  'correlation_range': 0.007947505231867843,
  'hole_effect_range': 1e-10},
 'model': ['bessel', 'exponential']}

### **`VariogramBase`, `VariogramOptimize`, `VariogramInitial`**

All variables required for computing the empirical and theoretical variograms are encapsulated within the `VariogramBase`, `VariogramOptimize`, and `VariogramInitial` classes. These combine user inputs with required default values. They also define which keys are allowed in the `variogram_parameters`, `optimization_parameters`, and `initialize_variogram` arguments of `Survey.fit_variogram(...)`.

In [18]:
from echopop.utils.validate import VariogramBase, VariogramOptimize, VariogramInitial

VariogramBase.create(**{})

{'model': ['bessel', 'exponential'],
 'n_lags': 30,
 'lag_resolution': 0.002,
 'max_range': None,
 'sill': 0.91,
 'nugget': 0.0,
 'hole_effect_range': 0.0,
 'correlation_range': 0.007,
 'enhance_semivariance': None,
 'decay_power': 1.5}

In [19]:
VariogramOptimize.create(**{})

{'max_fun_evaluations': 500,
 'cost_fun_tolerance': 1e-06,
 'solution_tolerance': 1e-06,
 'gradient_tolerance': 0.0001,
 'finite_step_size': 1e-08,
 'trust_region_solver': 'exact',
 'x_scale': 'jacobian',
 'jacobian_approx': 'central'}

In [20]:
VariogramInitial.create(["sill", "nugget", "correlation_range", "hole_effect_range", "decay_power"])

{'sill': {'min': 0.0, 'value': 0.0, 'max': inf},
 'nugget': {'min': 0.0, 'value': 0.0, 'max': inf},
 'correlation_range': {'min': 0.0, 'value': 0.0, 'max': inf},
 'hole_effect_range': {'min': 0.0, 'value': 0.0, 'max': inf},
 'decay_power': {'min': 0.0, 'value': 0.0, 'max': inf}}
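
These classes layer user inputs over defaults and restrict which keys are accepted. The plain-dictionary sketch below mirrors that behavior using the defaults printed by `VariogramBase.create(**{})`. It is not how the `echopop` classes are implemented, and `merge_with_defaults` is a hypothetical helper:

```python
# Defaults copied from the printed `VariogramBase.create(**{})` output above.
VARIOGRAM_DEFAULTS = {
    "model": ["bessel", "exponential"],
    "n_lags": 30,
    "lag_resolution": 0.002,
    "max_range": None,
    "sill": 0.91,
    "nugget": 0.0,
    "hole_effect_range": 0.0,
    "correlation_range": 0.007,
    "enhance_semivariance": None,
    "decay_power": 1.5,
}

def merge_with_defaults(user_params, defaults=VARIOGRAM_DEFAULTS):
    """Overlay user-supplied values on the defaults, rejecting unknown keys."""
    unknown = sorted(set(user_params) - set(defaults))
    if unknown:
        raise KeyError(f"Unrecognized variogram parameter(s): {unknown}")
    return {**defaults, **user_params}

merged = merge_with_defaults({"sill": 0.95, "n_lags": 40})
print(merged["sill"], merged["n_lags"], merged["nugget"])  # 0.95 40 0.0
```

Failing loudly on unknown keys catches typos (e.g. `"sil"`) that would otherwise be silently ignored.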

### **`Survey.variogram_gui()`**

An alternative to `Survey.fit_variogram(...)` is available in the form of a GUI that allows manual editing of various parameters. Note that this GUI can **only** be run within a Jupyter notebook at present.

In [21]:
survey.variogram_gui()


Once you have found a fit that works, click **`Save fit`** under the **`Optimize variogram parameters`** tab to add the updated results to your `Survey`-class object.

In [23]:
survey.results["variogram"]

{'model_fit': {'nugget': 9.999999999969277e-11,
  'sill': 0.9426139842158918,
  'correlation_range': 0.007951858895596499,
  'decay_power': 1.48803677228496,
  'hole_effect_range': 1e-10},
 'optimization_settings': {'max_fun_evaluations': 500,
  'cost_fun_tolerance': 1e-06,
  'solution_tolerance': 1e-06,
  'gradient_tolerance': 0.0001,
  'finite_step_size': 1e-08,
  'trust_region_solver': 'exact',
  'x_scale': 'jacobian',
  'jacobian_approx': 'central'},
 'model': ['bessel', 'exponential']}

## **`Survey.stratified_analysis(...)`**

`Survey.stratified_analysis(...)` computes various stratified statistics, including coefficient of variation (*CV*) estimates using the Jolly and Hampton (1990) stratified sampling method. This method takes the following arguments: 
* `dataset ('transect', 'kriging')`: data input selection (default: `'transect'`). This will use either the results of `Survey.transect_analysis(...)` or `Survey.kriging_analysis(...)`
* `stratum ('ks','inpfc')`: the stratum used for the various acoustic and biological calculations (default: `'inpfc'`)
* `variable ('abundance', 'biomass', 'nasc')`: the data variable that will be used for the stratified resampling analysis (default: `'biomass'`)
* `bootstrap_ci`: the confidence interval (default: `0.95`) used for computing the uncertainty intervals around population and coefficient of variation (*CV*) estimates
* `bootstrap_ci_method`: the specific method/algorithm used for computing the bootstrap intervals (default: `'BCa'`)
* `bootstrap_ci_method_alt`: an optional argument that provides an alternative `bootstrap_ci_method` in case of skewness issues
* `bootstrap_adjust_bias`: a boolean argument (default: `True`) that determines whether the bootstrap intervals should be adjusted to account for the bootstrap bias
* `verbose (boolean)`: dialogue messages will appear in the console including a summary report of the results when this is set to `True` (default: `True`)

There are also analysis-specific optional arguments that are used depending on how `dataset` is defined:

* `mesh_transect_per_latitude (integer)`: the number of virtual transects per degree latitude when `dataset = 'kriging'`
* `transect_sample`: the resampling proportion used to resample transects within each stratum without replacement (default: inherits value from `Survey.config['stratified_survey_mean_parameters']`)
* `transect_replicates`: the number of resampling iterations that will be run (default: inherits value from `Survey.config['stratified_survey_mean_parameters']`)
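
The sketch below follows the general form of the Jolly and Hampton (1990) estimator. Within each stratum, it computes a weighted mean transect density and the variance of that mean. It then combines strata using area weights to give a survey mean and *CV*. Each replicate subsamples a proportion of transects without replacement, analogous to `transect_sample` and `transect_replicates`. The transect weights (e.g., transect lengths), the input layout, the "keep at least two transects" rule, and the helper names are all assumptions. This is not `echopop`'s implementation:

```python
import math
import random

def jolly_hampton(strata, areas):
    """Area-weighted survey mean density and CV, Jolly and Hampton (1990) form.

    strata: {stratum: [(transect_mean_density, transect_weight), ...]}
    areas:  {stratum: stratum_area}
    """
    total_area = sum(areas[k] for k in strata)
    survey_mean, survey_var = 0.0, 0.0
    for k, transects in strata.items():
        n = len(transects)
        w_sum = sum(w for _, w in transects)
        # Weighted mean density within the stratum
        mean_k = sum(d * w for d, w in transects) / w_sum
        # Variance of that weighted mean (zero when only one transect remains)
        var_k = 0.0
        if n > 1:
            var_k = n / (n - 1) * sum((w * (d - mean_k)) ** 2 for d, w in transects) / w_sum ** 2
        frac = areas[k] / total_area
        survey_mean += frac * mean_k
        survey_var += frac ** 2 * var_k
    cv = math.sqrt(survey_var) / survey_mean if survey_mean else math.nan
    return survey_mean, cv

def resampled_cvs(strata, areas, transect_sample=0.75, transect_replicates=1000, seed=None):
    """Recompute the CV after subsampling transects without replacement per stratum."""
    rng = random.Random(seed)
    cvs = []
    for _ in range(transect_replicates):
        subset = {}
        for k, transects in strata.items():
            # Assumption: keep at least two transects so a variance can be computed
            n = min(len(transects), max(2, round(transect_sample * len(transects))))
            subset[k] = rng.sample(transects, n)
        cvs.append(jolly_hampton(subset, areas)[1])
    return cvs

# Hypothetical strata: (mean density, transect length) per transect.
strata = {1: [(1.0, 1.0), (3.0, 1.0)], 2: [(2.0, 1.0), (2.0, 1.0), (2.0, 1.0)]}
areas = {1: 1.0, 2: 1.0}
print(jolly_hampton(strata, areas))  # (2.0, 0.25)
```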


In [41]:
survey.stratified_analysis( dataset = 'transect' , stratum = 'inpfc' , variable = 'biomass' , bootstrap_ci = 0.95 , bootstrap_ci_method = "BCa" , bootstrap_ci_method_alt = "t-jackknife", verbose = True )

--------------------------------
 STRATIFIED RESULTS (TRANSECT)
--------------------------------
| Stratified variable: Biomass (kmt)
| Number of transects: 113
| Number of strata (INPFC): 6
| Total area coverage: 53509.0 nmi^2
| Age-1 fish excluded: True
| Bootstrap replicates: 10000 samples
| Resampling proportion: 0.75
| Bootstrap interval method: BCa (CI: 95.0%)
--------------------------------
STRATUM-SPECIFIC ESTIMATES
--------------------------------
| Stratum area coverage (n = 6):
    4246.0 | 10042.0 | 5774.0 | 7060.0 | 7068.0 | 19319.0 nmi^2
| Stratum mean biomass density (kmt/nmi^2):
    0.002 [-0.0, 0.003] | 0.041 [0.03, 0.046] | 0.057 [0.037, 0.067]
    0.063 [0.046, 0.076] | 0.038 [0.025, 0.045] | 0.01 [0.005, 0.013]
| Stratum mean biomass (kmt):
    8.2 [-0.5, 11.0] | 417.3 [309.2, 462.2] | 327.3 [214.6, 386.6]
    446.5 [326.2, 542.4] | 267.3 [178.8, 318.6] | 176.5 [75.1, 232.1]
--------------------------------
SURVEY RESULTS
--------------------------------
| Survey m

```{warning}
You cannot run `Survey.stratified_analysis( dataset = 'kriging' , ... )` unless you have already computed the kriging results via `Survey.kriging_analysis(...)`.
```

The intermediate and final results are stored in separate sub-dictionaries according to how `dataset` is parameterized. This allows the outputs from `dataset = 'transect'` and `dataset = 'kriging'` to be compared. In `Survey.analysis`, these sub-dictionaries sit directly below the `stratified` key: 

In [26]:
survey.analysis[ 'stratified' ].keys( )

dict_keys(['transect'])

Here the resampled distributions of multiple statistics can be directly accessed for additional uncertainty analyses and visualizing the underlying statistical distributions: 

In [27]:
survey.analysis[ 'stratified' ][ 'transect' ].keys()

dict_keys(['stratified_replicates_df'])

In [28]:
survey.analysis[ 'stratified' ][ 'transect' ][ 'stratified_replicates_df' ]

| | realization | unweighted_survey_density | unweighted_survey_total | weighted_survey_total | weighted_survey_variance | survey_cv |
|---|---|---|---|---|---|---|
| 0 | 1 | 31655.306244 | 1.693844e+09 | 8.965528e+11 | 1.350835e+22 | 0.129636 |
| 1 | 2 | 28991.180148 | 1.551289e+09 | 8.308540e+11 | 1.147280e+22 | 0.128917 |
| 2 | 3 | 30398.111427 | 1.626572e+09 | 8.521054e+11 | 1.271536e+22 | 0.132334 |
| 3 | 4 | 29589.815642 | 1.583321e+09 | 7.952773e+11 | 1.207085e+22 | 0.138150 |
| 4 | 5 | 29297.810263 | 1.567696e+09 | 8.139463e+11 | 1.184596e+22 | 0.133718 |
| ... | ... | ... | ... | ... | ... | ... |
| 9995 | 9996 | 28333.429587 | 1.516093e+09 | 7.876042e+11 | 1.106522e+22 | 0.133559 |
| 9996 | 9997 | 33293.751742 | 1.781515e+09 | 9.262805e+11 | 1.517008e+22 | 0.132969 |
| 9997 | 9998 | 27751.804583 | 1.484971e+09 | 7.491701e+11 | 1.145281e+22 | 0.142848 |
| 9998 | 9999 | 28916.711688 | 1.547304e+09 | 8.315393e+11 | 1.179917e+22 | 0.130630 |
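
A simple way to summarize these replicates is a percentile interval. The sketch below computes one using linear interpolation between order statistics. This is *not* the bias-corrected and accelerated (`BCa`) method used by default, and `percentile_interval` is a hypothetical helper. With `echopop` loaded, it could be applied to, e.g., `survey.analysis['stratified']['transect']['stratified_replicates_df']['survey_cv'].tolist()`:

```python
import math

def percentile_interval(samples, ci=0.95):
    """Two-sided percentile interval with linear interpolation between order statistics."""
    xs = sorted(samples)
    alpha = (1.0 - ci) / 2.0

    def quantile(p):
        pos = p * (len(xs) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

    return quantile(alpha), quantile(1.0 - alpha)

print(percentile_interval([0.12, 0.13, 0.14, 0.15, 0.16], ci=0.5))  # (0.13, 0.15)
```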


The final results stored within `Survey.results` are formatted in an identical way:

In [29]:
survey.results[ 'stratified' ].keys( )

dict_keys(['transect'])

In [30]:
survey.results[ 'stratified' ][ 'transect' ].keys()

dict_keys(['variable', 'ci_percentile', 'num_transects', 'stratum_area', 'total_area', 'estimate', 'ci', 'bias'])

In [31]:
pprint.pprint( survey.results[ 'stratified' ][ 'transect' ])

{'bias': {'strata': {'density': array([   193.96325158, -12560.33414499,  11929.59836116,   1989.40748471,
         2628.43292714,  -2751.97667212]),
                     'proportion': array([ 0.00039268,  0.0006178 ,  0.00653998, -0.00532383, -0.00268681,
        0.00046018]),
                     'total': array([ 5.69616189e+05, -1.31391125e+08,  7.05242016e+07,  1.00116984e+07,
        1.84513426e+07, -3.33495466e+07])},
          'survey': {'cv': 0.0,
                     'density': -5596.023793829252,
                     'total': -65183813.17214823}},
 'ci': {'strata': {'density': [array([-193.96325158, 2499.53181979]),
                               array([30382.17342179, 45486.02397884]),
                               array([37368.14422708, 67098.09159068]),
                               array([45619.40207899, 76325.09536548]),
                               array([25303.0768114 , 45079.95029436]),
                               array([ 4939.20833666, 13024.90836066])],
     

## **`Survey.kriging_analysis(...)`**

`Survey.kriging_analysis(...)` computes the kriged estimates for the target variable via ordinary kriging with an adaptive search radius. The arguments to `Survey.kriging_analysis(...)` include:
* `coordinate_transform (boolean)`: when `True`, the transect and mesh longitude/latitude coordinates are transformed to a standardized format as x/y (default: `True`)
* `crop_method ('transect_ends', 'convex_hull')`: when `extrapolate = False`, this determines the method used for cropping the kriging mesh. Setting `crop_method = 'transect_ends'` (*default*) resamples the latitudinal resolution of the mesh grid and interpolates over the extent of the eastern and western endpoints of each transect line. This is conducted in a piece-wise fashion to account for the island of Haida Gwaii. Setting `crop_method = 'convex_hull'` uses a polygon-based approach for cropping the mesh grid based on the survey extent.
* `extrapolate (boolean)`: when `True`, the entire kriging mesh is used. Otherwise, different methods are used to crop the kriging mesh to limit extrapolation beyond the extent of the survey transects. 
* `stratum ('ks','inpfc')`: the stratum used for mapping the defined kriged `variable` (default: `'ks'`) 
* `variable (string)`: the data variable that will be used for the kriging analysis (default: `'biomass_density'`)
* `verbose (boolean)`: dialogue messages will appear in the console including a summary report of the results when this is set to `True` (default: `True`)

There are also analysis-specific optional arguments that are used depending on how `crop_method` is defined:
* When `crop_method = 'transect_ends'`:
  * `latitude_resolution (float)`: the updated latitudinal resolution (**in nmi**) used for interpolation
* When `crop_method = 'convex_hull'`:
  * `mesh_buffer_distance`: this is a dilation factor (**in nmi**) that expands/buffers the extent of the polygon defining the survey extent (default: `1.25`)
  * `num_nearest_transects`: this defines the number of nearest neighboring transects used for generating smaller polygons that are then constructed into the survey-wide polygon
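
The `'convex_hull'` option can be approximated with a textbook convex hull. The sketch below builds the hull of all transect points using Andrew's monotone chain, then keeps the mesh points that fall inside or on it. It omits the `mesh_buffer_distance` dilation and the `num_nearest_transects` polygon construction described above. It also assumes coordinates are already projected or standardized x/y. All names are hypothetical:

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def crop_mesh(mesh_points, transect_points):
    """Keep mesh points inside (or on the boundary of) the transect convex hull."""
    hull = convex_hull(transect_points)
    edges = list(zip(hull, hull[1:] + hull[:1]))
    # For a counter-clockwise hull, interior points lie left of (or on) every edge
    return [p for p in mesh_points if all(cross(a, b, p) >= 0 for a, b in edges)]

transect_points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(crop_mesh([(1, 1), (3, 3), (0, 0), (2, 1)], transect_points))  # [(1, 1), (0, 0), (2, 1)]
```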

Lastly, there are additional arguments that are optional since they are otherwise inherited from various parts of the `Survey` object: 
* `kriging_parameters (dictionary)`: a dictionary containing various kriging parameter variables and arguments
* `projection (string)`: an EPSG string that defines the mapping projection
* `variogram_parameters (dictionary)`: a dictionary containing various variogram parameter variables and arguments
* `best_fit_variogram (boolean)`: a boolean argument that dictates whether to use optimized variogram parameters (see above details for `Survey.fit_variogram()` and `Survey.variogram_gui()`)
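
The sketch below shows textbook ordinary kriging at a single mesh point. The weights solve a linear system built from semivariances between samples, with a Lagrange multiplier forcing them to sum to one. It is a simplified stand-in, not `Survey.kriging_analysis(...)`:

* It uses every sample as a neighbor rather than an adaptive search radius.
* It uses a single exponential semivariogram rather than the composite Bessel-exponential model.

All function names and parameter values are hypothetical:

```python
import math

def exponential_semivariance(h, sill=1.0, nugget=0.0, correlation_range=1.0):
    """Exponential semivariogram; gamma(0) = 0 so kriging honors the data exactly."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-h / correlation_range))

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(coords, values, target, variogram=exponential_semivariance):
    """Return (estimate, kriging variance, weights) at `target`."""
    n = len(values)
    # System: [gamma_ij  1; 1  0] [lambda; mu] = [gamma_i0; 1]
    a = [[variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(c, target)) for c in coords] + [1.0]
    solution = solve(a, b)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights

coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
estimate, variance, weights = ordinary_kriging(coords, [1.0, 2.0, 3.0], (1.0, 0.0))
print(round(estimate, 6))  # 2.0 (kriging reproduces the value at a sampled location)
```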

In [18]:
survey.kriging_analysis( bearing_tolerance = 15.0 , coordinate_transform = True , crop_method = 'transect_ends' , extrapolate = False , latitude_resolution = 1.25 , stratum = 'ks' , variable = 'biomass_density' , verbose = True )

Longitude and latitude coordinates (WGS84) converted to standardized coordinates (x and y).
Extrapolation applied to kriging mesh points (81 of 9463):
            * 77 points had 0 valid range estimates without extrapolation
            * 4 points had at least 1 valid point but fewer than 3 valid neighbors
Imputed apportioned unaged male biomass at length bins:
(17.0, 19.0], (59.0, 61.0], (61.0, 63.0], (63.0, 65.0], (65.0, 67.0], (67.0, 69.0], (69.0, 71.0], (71.0, 73.0], (73.0, 75.0], (75.0, 77.0]
Imputed apportioned unaged female biomass at length bins:
(17.0, 19.0], (73.0, 75.0], (75.0, 77.0]
--------------------------------
KRIGING RESULTS (MESH)
--------------------------------
| Kriged variable: Biomass density (kg/nmi^2)
| Age-1 fish excluded: True
| Stratum definition: KS
| Mesh extrapolation: False
    Mesh cropping method: Transect ends
| Mesh and transect coordinate standardization: True
--------------------------------
GENERAL RESULTS
--------------------------------
| Mean 

There are then various results stored within `Survey.results[ 'kriging' ]`:

In [25]:
survey.results[ 'kriging' ].keys()

dict_keys(['variable', 'survey_mean', 'survey_estimate', 'survey_cv', 'mesh_results_df', 'tables'])

Several of these entries are scalar values:

In [26]:
pprint.pprint( [survey.results['kriging'].get(key) for key in ['variable' , 'survey_mean' , 'survey_estimate' , 'survey_cv'] ] )

['biomass_density', 27807.10994608753, 1644447357.8872852, 0.02693646385644838]


The meshed results can also be retrieved:

In [27]:
survey.results[ 'kriging' ][ 'mesh_results_df' ]

| | latitude | longitude | area | kriged_mean | kriged_variance | sample_variance | sample_cv | biomass | stratum_num |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 49.057959 | -126.024127 | 6.250000 | 0.00000 | 0.027817 | | 0.007911 | 0.000000 | 7 |
| 2 | 49.016196 | -126.024110 | 6.250000 | 0.00000 | 0.246388 | | 0.023545 | 0.000000 | 7 |
| 3 | 48.974438 | -126.024093 | 6.250000 | 0.00000 | 0.530815 | | 0.034559 | 0.000000 | 7 |
| 4 | 48.932686 | -126.024076 | 6.250000 | 51334.44202 | 0.669093 | 1.164446 | 0.038800 | 320840.262622 | 7 |
| 5 | 48.890939 | -126.024060 | 6.250000 | 0.00000 | 0.711263 | | 0.040004 | 0.000000 | 8 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19804 | 52.895008 | -132.337719 | 0.011343 | 0.00000 | 0.902214 | | 0.045055 | 0.000000 | 1 |
| 19806 | 52.813140 | -132.260812 | 0.009924 | 0.00000 | 0.487711 | | 0.033126 | 0.000000 | 1 |
| 19814 | 38.025533 | -123.013372 | 0.006006 | 0.00000 | 0.298523 | | 0.025917 | 0.000000 | 5 |
| 19830 | 35.646423 | -121.257388 | 0.001815 | 0.00000 | 0.312910 | | 0.026534 | 0.000000 | 3 |
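
In the mesh rows printed above, `biomass` equals `kriged_mean` × `area`. For example, 51334.44202 × 6.25 ≈ 320840.26, with `kriged_mean` in kg/nmi^2 and `area` assumed to be in nmi^2. This suggests a simple way to aggregate mesh results by stratum. The sketch below uses three rows copied from the table; `biomass_by_stratum` is a hypothetical helper:

```python
from collections import defaultdict

# Three rows copied from the printed `mesh_results_df` (area assumed in nmi^2,
# kriged_mean in kg/nmi^2 as reported in the kriging summary).
mesh_rows = [
    {"kriged_mean": 0.0, "area": 6.25, "stratum_num": 7},
    {"kriged_mean": 51334.44202, "area": 6.25, "stratum_num": 7},
    {"kriged_mean": 0.0, "area": 6.25, "stratum_num": 8},
]

def biomass_by_stratum(rows):
    """Sum cell biomass (kriged_mean * area) within each stratum."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["stratum_num"]] += row["kriged_mean"] * row["area"]
    return dict(totals)

print(biomass_by_stratum(mesh_rows))
```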


The `tables` sub-dictionary includes the sum of each variable distributed over age, length, and sex. In this case, `variable = 'biomass_density'` produces estimates of kriged `biomass` for these tables.

Biomass distributed over age, length, and sex for aged fish:

In [28]:
survey.results['kriging']['tables'][ 'aged_tbl' ]

sex,length_bin,"(0.5, 1.5]","(1.5, 2.5]","(2.5, 3.5]","(3.5, 4.5]","(4.5, 5.5]","(5.5, 6.5]","(6.5, 7.5]","(7.5, 8.5]","(8.5, 9.5]","(9.5, 10.5]",...,"(12.5, 13.5]","(13.5, 14.5]","(14.5, 15.5]","(15.5, 16.5]","(16.5, 17.5]","(17.5, 18.5]","(18.5, 19.5]","(19.5, 20.5]","(20.5, 21.5]","(21.5, 22.5]"
female,"(1.0, 3.0]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
female,"(3.0, 5.0]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
female,"(5.0, 7.0]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
female,"(7.0, 9.0]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
female,"(9.0, 11.0]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
male,"(71.0, 73.0]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
male,"(73.0, 75.0]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
male,"(75.0, 77.0]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
male,"(77.0, 79.0]",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Biomass distributed over length and sex for unaged fish:

In [29]:
survey.results['kriging']['tables']['unaged_tbl']

length_bin,female,male
"(1.0, 3.0]",0.0,0.0
"(3.0, 5.0]",0.0,0.0
"(5.0, 7.0]",0.0,0.0
"(7.0, 9.0]",0.0,0.0
"(9.0, 11.0]",0.0,0.0
"(11.0, 13.0]",0.0,0.0
"(13.0, 15.0]",0.0,0.0
"(15.0, 17.0]",0.0,0.0
"(17.0, 19.0]",5687.932,6783.916
"(19.0, 21.0]",1397287.0,1623948.0


Combined biomass from both the aged and unaged fish distributed over length, age, and sex: 

In [30]:
survey.results['kriging']['tables']['overall_apportionment_df']

,age_bin,sex,length_bin,biomass_apportioned
0,"(0.5, 1.5]",all,"(1.0, 3.0]",0.0
1,"(0.5, 1.5]",female,"(1.0, 3.0]",0.0
2,"(0.5, 1.5]",male,"(1.0, 3.0]",0.0
3,"(1.5, 2.5]",all,"(1.0, 3.0]",0.0
4,"(1.5, 2.5]",female,"(1.0, 3.0]",0.0
...,...,...,...,...
2635,"(20.5, 21.5]",female,"(79.0, 81.0]",0.0
2636,"(20.5, 21.5]",male,"(79.0, 81.0]",0.0
2637,"(21.5, 22.5]",all,"(79.0, 81.0]",0.0
2638,"(21.5, 22.5]",female,"(79.0, 81.0]",0.0
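
Unlike the wide `aged_tbl`, this table is in long format with one row per (length bin, age bin, sex) combination. Judging from the printed rows, the length bin varies slowest, then the age bin, then the sex (`all`, `female`, `male`). The sketch below rebuilds that row order from the bin labels visible in the output; the `interval_labels` helper is hypothetical and only mimics the `(lo, hi]` labels shown above.

```python
from itertools import product


def interval_labels(start: float, stop: float, width: float) -> list[str]:
    """Hypothetical helper: build right-closed '(lo, hi]' labels like those printed."""
    labels = []
    lo = start
    while lo < stop:
        labels.append(f"({lo}, {lo + width}]")
        lo += width
    return labels


age_bins = interval_labels(0.5, 22.5, 1.0)     # (0.5, 1.5] ... (21.5, 22.5]
length_bins = interval_labels(1.0, 81.0, 2.0)  # (1.0, 3.0] ... (79.0, 81.0]
sexes = ["all", "female", "male"]

# Length bin varies slowest, then age bin, then sex
rows = list(product(length_bins, age_bins, sexes))
print(len(rows))
print(rows[2635])
```

With 40 length bins, 22 age bins, and 3 sex categories this yields 2640 rows (indices 0 through 2639), and `rows[2635]` is `('(79.0, 81.0]', '(20.5, 21.5]', 'female')`, matching the printed row 2635.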


With the kriging results computed, `Survey.stratified_analysis( dataset = 'kriging' , ...)` can now run the stratified bootstrap resampling analysis on the kriged estimates:

In [31]:
survey.stratified_analysis( 'kriging' )

--------------------------------
 STRATIFIED RESULTS (KRIGING)
--------------------------------
| Stratified variable: Biomass (kmt)
| Number of virtual transects: 102
| Number of strata (INPFC): 6
| Total area coverage: 35290.0 nmi^2
| Age-1 fish excluded: True
| Bootstrap replicates: 10000 samples
| Resampling proportion: 0.75
| Bootstrap interval method: BCa (CI: 95.0%)
--------------------------------
STRATUM-SPECIFIC ESTIMATES
--------------------------------
| Stratum area coverage (n = 6):
    2580.0 | 5614.0 | 3241.0 | 3313.0 | 3841.0 | 16701.0 nmi^2
| Stratum mean biomass density (kmt/nmi^2):
    0.002 [-0.001, 0.003] | 0.035 [0.027, 0.039] | 0.057 [0.036, 0.066]
    0.065 [0.046, 0.077] | 0.036 [0.019, 0.046] | 0.008 [0.007, 0.009]
| Stratum mean biomass (kmt):
    7.9 [0.0, 10.6] | 369.4 [324.3, 394.0] | 364.1 [295.6, 392.8]
    438.5 [375.5, 479.5] | 267.3 [204.8, 306.8] | 197.3 [178.3, 213.2]
--------------------------------
SURVEY RESULTS
--------------------------------
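
The stratum-level point estimates printed above appear to be additive: the six stratum areas sum to the reported total area coverage, and the six stratum mean biomass values sum to roughly the kriged survey estimate of 1644.45 kmt. A quick check using the printed values (copied verbatim, so the sums carry the 0.1-kmt display rounding):

```python
# Stratum-level values copied from the kriging stratified summary above
stratum_area_nmi2 = [2580.0, 5614.0, 3241.0, 3313.0, 3841.0, 16701.0]
stratum_biomass_kmt = [7.9, 369.4, 364.1, 438.5, 267.3, 197.3]

total_area = sum(stratum_area_nmi2)
total_biomass = sum(stratum_biomass_kmt)

print(f"Total area coverage: {total_area:.1f} nmi^2")
print(f"Sum of stratum biomass: {total_biomass:.1f} kmt")
```

This prints 35290.0 nmi^2 and 1644.5 kmt. The bracketed BCa intervals, however, should not be summed the same way: bootstrap interval bounds for individual strata do not, in general, combine into an interval for the survey total.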


## **Other 'useful' features**

Although a summary of the results is printed to the console when `verbose = True`, re-running the entire analysis just to regenerate the same message is tedious. The `Survey.summary(...)` method addresses this and takes a single argument:
* `results_name (string)`: the name of the analysis results to print to the console. This can be either a single name (e.g., `'transect'` or `'kriging'`) or a nested name (e.g., `'stratified:transect'`), where a colon (`:`) separates the two result-layer names.
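
To illustrate the colon-delimited naming convention, the sketch below resolves a name such as `'stratified:kriging'` against a nested dictionary. Both `resolve_results` and `toy_results` are hypothetical: this is not how `echopop` necessarily parses `results_name` internally, and the real nesting of `Survey.results` may differ. The two values in the toy dictionary are taken from the printed summaries.

```python
from typing import Any, Mapping


def resolve_results(results: Mapping[str, Any], results_name: str) -> Any:
    """Hypothetical helper: look up a nested entry from a ':'-delimited name.

    'kriging'            -> results['kriging']
    'stratified:kriging' -> results['stratified']['kriging']
    """
    node: Any = results
    for key in results_name.split(":"):
        node = node[key]  # raises KeyError for an unknown layer name
    return node


# Toy stand-in for Survey.results (assumed structure)
toy_results = {
    "kriging": {"variable": "biomass_density"},
    "stratified": {
        "transect": {"n_transects": 113},
        "kriging": {"n_virtual_transects": 102},
    },
}

print(resolve_results(toy_results, "kriging")["variable"])
print(resolve_results(toy_results, "stratified:kriging")["n_virtual_transects"])
```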

In [32]:
survey.summary( 'transect' )

--------------------------------
TRANSECT RESULTS
--------------------------------
| Variable: Biomass (kmt)
| Age-1 fish excluded: True
| Stratum definition: KS
--------------------------------
GENERAL RESULTS
--------------------------------
| Total biomass: 1651.1 kmt
    Age-1: 7.9 kmt
    Age-2+: 1643.2 kmt
| Total female biomass: 832.2 kmt
    Age-1: 4.0 kmt
    Age-2+: 828.2 kmt
| Total male biomass: 818.5 kmt
    Age-1: 3.9 kmt
    Age-2+: 814.6 kmt
| Total unsexed biomass: 0.4 kmt
| Total mixed biomass: 36.8 kmt
--------------------------------
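
The transect summary can be sanity-checked by hand. The printed numbers suggest that female, male, and unsexed biomass sum to the total, as do the age-1 and age-2+ components; the 36.8 kmt of "mixed" biomass appears to be reported separately rather than as an additional additive component, so it is left out here (an assumption based only on these printed values).

```python
import math

# Totals (kmt) copied from the transect summary above
total = 1651.1
by_sex = {"female": 832.2, "male": 818.5, "unsexed": 0.4}
by_age = {"age_1": 7.9, "age_2_plus": 1643.2}

for name, parts in [("sex", by_sex), ("age", by_age)]:
    subtotal = sum(parts.values())
    # Allow for the 0.1-kmt rounding of each printed component
    assert math.isclose(subtotal, total, abs_tol=0.2), name
    print(f"Sum over {name}: {subtotal:.1f} kmt")
```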


In [42]:
survey.summary( 'stratified:transect' )

--------------------------------
 STRATIFIED RESULTS (TRANSECT)
--------------------------------
| Stratified variable: Biomass (kmt)
| Number of transects: 113
| Number of strata (INPFC): 6
| Total area coverage: 53509.0 nmi^2
| Age-1 fish excluded: True
| Bootstrap replicates: 10000 samples
| Resampling proportion: 0.75
| Bootstrap interval method: BCa (CI: 95.0%)
--------------------------------
STRATUM-SPECIFIC ESTIMATES
--------------------------------
| Stratum area coverage (n = 6):
    4246.0 | 10042.0 | 5774.0 | 7060.0 | 7068.0 | 19319.0 nmi^2
| Stratum mean biomass density (kmt/nmi^2):
    0.002 [-0.0, 0.003] | 0.041 [0.03, 0.046] | 0.057 [0.037, 0.067]
    0.063 [0.046, 0.076] | 0.038 [0.025, 0.045] | 0.01 [0.005, 0.013]
| Stratum mean biomass (kmt):
    8.2 [-0.5, 11.0] | 417.3 [309.2, 462.2] | 327.3 [214.6, 386.6]
    446.5 [326.2, 542.4] | 267.3 [178.8, 318.6] | 176.5 [75.1, 232.1]
--------------------------------
SURVEY RESULTS
--------------------------------
| Survey m

In [39]:
survey.summary( 'stratified:kriging' )

--------------------------------
 STRATIFIED RESULTS (KRIGING)
--------------------------------
| Stratified variable: Biomass (kmt)
| Number of virtual transects: 102
| Number of strata (INPFC): 6
| Total area coverage: 35290.0 nmi^2
| Age-1 fish excluded: True
| Bootstrap replicates: 10000 samples
| Resampling proportion: 0.75
| Bootstrap interval method: BCa (CI: 95.0%)
--------------------------------
STRATUM-SPECIFIC ESTIMATES
--------------------------------
| Stratum area coverage (n = 6):
    2580.0 | 5614.0 | 3241.0 | 3313.0 | 3841.0 | 16701.0 nmi^2
| Stratum mean biomass density (kmt/nmi^2):
    0.002 [-0.001, 0.003] | 0.035 [0.027, 0.039] | 0.057 [0.036, 0.066]
    0.065 [0.046, 0.077] | 0.036 [0.019, 0.046] | 0.008 [0.007, 0.009]
| Stratum mean biomass (kmt):
    7.9 [0.0, 10.6] | 369.4 [324.3, 394.0] | 364.1 [295.6, 392.8]
    438.5 [375.5, 479.5] | 267.3 [204.8, 306.8] | 197.3 [178.3, 213.2]
--------------------------------
SURVEY RESULTS
--------------------------------


In [40]:
survey.summary( 'kriging' )

--------------------------------
KRIGING RESULTS (MESH)
--------------------------------
| Kriged variable: Biomass density (kg/nmi^2)
| Age-1 fish excluded: True
| Stratum definition: KS
| Mesh extrapolation: False
    Mesh cropping method: Interpolation
| Mesh and transect coordinate standardization: True
--------------------------------
GENERAL RESULTS
--------------------------------
| Mean biomassdensity: 27807.11 kg/nmi^2
| Total survey biomass estimate: 1644.45 kmt
| Mean mesh sample CV: 0.0241
| Overall survey CV: 0.0269
| Total area coverage: 58186.9 nmi^2
--------------------------------
