## Statistics by Zone

Basic descriptive statistics are computed separately for the freshwater zone, the mixing zone, and the saltwater zone. To do this, the profile is split at the breakpoint positions along `Vertical Position [m]`.

> **Important**: By default, the program uses **2 breakpoints**.

---

### Import Libraries

```python
import sys
import os

# Make the project root (parent folder) importable so that `modules` can be found
root = os.path.abspath('..')
sys.path.append(root)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from modules import load, plots, analysis, utils
from modules import statistics_zones as sz

# styles
plt.style.use('seaborn-v0_8-white')
```

---

### Load data

1. Basic parameters: define the profile to analyze (file name stem) and the columns to explore. Uncomment the line of the desired profile and modify as needed.

```python
#name = 'AW1D_YSI_20230826'
#name = 'AW2D_YSI_20230815'
#name = 'AW5D_YSI_20230824'
#name = 'AW6D_YSI_20230815'
#name = 'AW7D_YSI_20230814'
#name = 'BW1D_YSI_20230824'
#name = 'BW2D_YSI_20230819'
#name = 'BW3D_YSI_20230818'
#name = 'BW4D_YSI_20230816'
name = 'BW5D_YSI_20230822'
#name = 'BW6D_YSI_20230826'
#name = 'BW7D_YSI_20230826'
#name = 'BW8D_YSI_20230823'
#name = 'BW9D_YSI_20230823'
#name = 'BW10D_YSI_20230825'
#name = 'BW11D_YSI_20230823'
#name = 'LRS33D_YSI_20230822'
#name = 'LRS69D_YSI_20230818'
## name = 'LRS70D_YSI_20230822'  # why is the best BIC zero?
## name = 'LRS75D_YSI_20230819'
#name = 'LRS79D_YSI_20230827'
#name = 'LRS81D_YSI_20230823'
#name = 'LRS89D_YSI_20230825'
#name = 'LRS90D_YSI_20230827'
```

```python
path = f'{root}/data/rawdy/{name}_rowdy.csv'
path_json = f'{root}/data/results/{name}_results.json'
x_column = 'Vertical Position [m]'
y_column = 'Corrected sp Cond [uS/cm]'
```
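
When switching between profiles, a missing or misspelled file only surfaces later as a read error. A minimal sketch that builds both paths with `pathlib` and fails early; it assumes the `data/rawdy/<name>_rowdy.csv` and `data/results/<name>_results.json` layout used in the cell above, and `build_paths` is a hypothetical helper, not part of `modules`.

```python
from pathlib import Path


def build_paths(root, name):
    """Return (csv_path, json_path) for a profile and raise if either file is missing.

    Hypothetical helper: mirrors the folder layout assumed in this notebook.
    """
    root = Path(root)
    csv_path = root / "data" / "rawdy" / f"{name}_rowdy.csv"
    json_path = root / "data" / "results" / f"{name}_results.json"
    for p in (csv_path, json_path):
        if not p.is_file():
            raise FileNotFoundError(f"Missing input file: {p}")
    return csv_path, json_path
```

For example, `path, path_json = build_paths(root, name)`. `pd.read_csv` accepts `Path` objects; wrap them in `str()` if a project helper expects a plain string.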

2. Load the data profile.

```python
df = pd.read_csv(path)
df = df[[x_column, y_column]]

X = np.array(df[x_column])
Y = np.array(df[y_column])
```
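
Raw sonde exports can contain empty rows or non-numeric entries. A minimal sketch of a more defensive loader, assuming you want to drop rows where either column is missing and sort by vertical position; whether the project pipeline already cleans the `rawdy` files this way is not stated, so `load_profile` is a hypothetical helper.

```python
import pandas as pd


def load_profile(path, x_col, y_col):
    """Read only the two columns of interest, coerce them to numbers,
    drop incomplete rows and sort by vertical position (hypothetical helper)."""
    df = pd.read_csv(path, usecols=[x_col, y_col])
    df = df.apply(pd.to_numeric, errors="coerce")  # non-numeric entries become NaN
    df = df.dropna().sort_values(x_col).reset_index(drop=True)
    return df, df[x_col].to_numpy(), df[y_col].to_numpy()
```

Sorting is optional; it does not affect the zone statistics, but it makes line plots of the profile draw in depth order.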

3. Load the results of the segmented-regression fit (one column per trial).

```python
df_results = load.load_data(filepath=path_json, json=True)
df_results
```

| Field | trial_1 | trial_2 | trial_3 | trial_4 | trial_5 |
|---|---|---|---|---|---|
| df | `{'bic': {'0': 93804.3535159611, '1': 89367.769...` | `{'bic': {'0': 93804.3535159611, '1': 89367.769...` | `{'bic': {'0': 93804.3535159611, '1': 89367.769...` | `{'bic': {'0': 93804.3535159611, '1': 89367.769...` | `{'bic': {'0': 93804.3535159611, '1': 89367.769...` |
| best_n_breakpoint_bic | 2 | 2 | 2 | 2 | 2 |
| min_bic_n_breakpoint | 10 | 10 | 9 | 10 | 10 |
| best_n_breakpoint_rss | 2 | 2 | 2 | 2 | 2 |
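
The table reports, per trial, the number of breakpoints preferred by BIC and by RSS. For a least-squares fit with `n` observations, residual sum of squares `RSS` and `k` parameters, a common Gaussian form of the criterion is `BIC = n * ln(RSS / n) + k * ln(n)`. A minimal sketch, assuming `k = 2 + 2 * n_breakpoints` (intercept and initial slope, plus a slope change and a position per breakpoint); the exact parameter count and selection rule behind `best_n_breakpoint_bic` are not documented here. Note that `best_n_breakpoint_bic` (2) differs from `min_bic_n_breakpoint` (9 or 10) in every trial, so the project's choice is not simply the BIC minimum.

```python
import numpy as np


def bic_segmented(rss, n_obs, n_breakpoints):
    """Gaussian BIC for a continuous piecewise-linear fit (assumed parameter count)."""
    k = 2 + 2 * n_breakpoints  # assumption: intercept + slope + (slope change, position) per breakpoint
    return n_obs * np.log(rss / n_obs) + k * np.log(n_obs)


def min_bic(rss_by_n, n_obs):
    """Return the number of breakpoints with the lowest BIC, plus all BIC values.

    rss_by_n: dict mapping n_breakpoints -> RSS of the corresponding fit.
    """
    bics = {n: bic_segmented(rss, n_obs, n) for n, rss in rss_by_n.items()}
    return min(bics, key=bics.get), bics
```

The `k * ln(n)` penalty grows with every extra breakpoint, so a more complex model is only preferred when it reduces RSS enough to offset that penalty.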


---

### Segment the profile into three zones

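1. Locate the breakpoints along `Vertical Position [m]`: select the best trial, rebuild the segmented model with `N_BREAKPOINT` breakpoints, and extract the breakpoint positions.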

```python
trial = analysis.select_best_trial(path_json)
trial_select = df_results[trial[0]]

N_BREAKPOINT = 2  # Change this parameter if a different number of breakpoints is desired.

params_ms = utils.get_breakpoint_data(trial_select['df'], N_BREAKPOINT)
ms = utils.rebuild_model(X, Y, params_ms)

breakpoints = analysis.extract_breakpoints(ms)
breakpoints
```

| Breakpoint | Breakpoint X Position | Breakpoint Y Position | Confidence Interval (X) |
|---|---|---|---|
| 1 | 11.527189 | 11217.493488 | (11.503393966076617, 11.550983387883202) |
| 2 | 16.422381 | 52924.044107 | (16.402760859809554, 16.442001620394137) |
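
`utils.rebuild_model` is not shown here. For intuition, a continuous piecewise-linear model with breakpoints `A` and `B` is often written in hinge form, `y = b0 + b1*x + b2*max(0, x - A) + b3*max(0, x - B)`, where `b2` and `b3` are the slope changes at each breakpoint. A minimal sketch of that standard parameterization; whether `rebuild_model` uses exactly this form is an assumption, and the helper names are hypothetical. Once the breakpoint positions are fixed, the remaining coefficients are an ordinary linear least-squares problem.

```python
import numpy as np


def hinge_design(x, breakpoints):
    """Design matrix with columns [1, x, max(0, x - bp1), max(0, x - bp2), ...]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(0.0, x - bp) for bp in breakpoints]
    return np.column_stack(cols)


def fit_fixed_breakpoints(x, y, breakpoints):
    """Least-squares coefficients of the hinge model for *fixed* breakpoint positions."""
    design = hinge_design(x, breakpoints)
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(x, breakpoints, coef):
    """Evaluate the fitted piecewise-linear model at x."""
    return hinge_design(x, breakpoints) @ coef
```

For example, `fit_fixed_breakpoints(X, Y, [A, B])` with the two `Breakpoint X Position` values returns the intercept, the initial slope and the two slope changes.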


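2. Visually check that the breakpoint locations are correct. The interactive plot provides an `n_breakpoints` slider (up to 10) to compare the fits for different numbers of breakpoints.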

```python
df_ms = pd.DataFrame({'n_breakpoints': trial_select['df']['n_breakpoints'],
                      'estimates': trial_select['df']['estimates']})

plots.interactive_segmented_regression(x=X, y=Y, df=df_ms, title=name)
```

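The interactive widget needs a live Jupyter kernel. For a static figure (for example, in an exported report), a minimal matplotlib sketch that draws the profile and marks each breakpoint with its confidence interval; `plot_breakpoints` is a hypothetical helper, and it assumes the `breakpoints` columns shown above with the confidence interval stored as a `(low, high)` tuple.

```python
import matplotlib.pyplot as plt


def plot_breakpoints(x, y, breakpoints, title=None, ax=None):
    """Scatter the profile and mark each breakpoint and its confidence interval."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    ax.plot(x, y, ".", ms=2, color="0.5", label="profile")
    for _, row in breakpoints.iterrows():
        lo, hi = row["Confidence Interval (X)"]  # assumed (low, high) tuple
        ax.axvspan(lo, hi, color="tab:red", alpha=0.2)  # shaded confidence interval
        ax.axvline(row["Breakpoint X Position"], color="tab:red", lw=1)
    ax.set_xlabel("Vertical Position [m]")
    ax.set_ylabel("Corrected sp Cond [uS/cm]")
    if title:
        ax.set_title(title)
    return ax
```

For example, `plot_breakpoints(X, Y, breakpoints, title=name)`. If the interval comes back as a string (e.g. after a JSON round trip), parse it with `ast.literal_eval` first.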

3. Segment the profile into three zones at the breakpoint locations.

> **Important**: The segmentation supports exactly two breakpoints (three zones), as discussed in the virtual meeting. The function could be extended to split the profile into more parts.

```python
A = breakpoints['Breakpoint X Position'].iloc[0]
B = breakpoints['Breakpoint X Position'].iloc[1]

segments = sz.split_by_points(A, B, df, x_col=x_column, y_col=y_column)
```
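
`sz.split_by_points` is not shown here. A minimal pandas sketch of a two-breakpoint split; how the original function assigns points that fall exactly on a breakpoint is not documented, so the half-open intervals below (`x < A`, `A <= x < B`, `x >= B`) are an assumption. It also assumes that vertical position increases from the freshwater zone to the saltwater zone, which matches the breakpoint table (conductivity rises from about 11,200 to about 52,900 uS/cm between the two breakpoints). `split_two_breakpoints` is a hypothetical helper.

```python
import pandas as pd


def split_two_breakpoints(df, a, b, x_col):
    """Split a profile into three zones at two positions (half-open intervals)."""
    a, b = sorted((a, b))  # tolerate breakpoints given in either order
    x = df[x_col]
    freshwater = df[x < a]
    mixing = df[(x >= a) & (x < b)]
    saltwater = df[x >= b]
    return freshwater, mixing, saltwater
```

With half-open intervals every row lands in exactly one zone, so the three zone sizes always add up to the length of the profile.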

---

### Calculate Statistics by Zone

```python
stats_df = sz.compute_segment_statistics(segments[0],  # freshwater zone
                                         segments[1],  # mixing zone
                                         segments[2],  # saltwater zone
                                         y_column)
stats_df
```

| Zone | mean | std | cv | min | max | median | 25% | 50% | 75% | iqr |
|---|---|---|---|---|---|---|---|---|---|---|
| Freshwater zone | 7136.249656 | 2558.421794 | 0.358511 | 1763.2 | 13662.6 | 6741.7 | 5537.6 | 6741.7 | 8596.55 | 3058.95 |
| Mixing zone | 33173.982437 | 11381.135981 | 0.343074 | 13668.8 | 49965.4 | 34298.3 | 26224.05 | 34298.3 | 43088.4 | 16864.35 |
| Saltwater zone | 54240.572021 | 1025.543784 | 0.018907 | 49990.5 | 55091.8 | 54758.3 | 54175.5 | 54758.3 | 54778.3 | 602.8 |
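
The columns follow standard definitions: `cv` is `std / mean` (dimensionless; all other columns are in uS/cm), `iqr` is `75% - 25%`, and `50%` duplicates `median`. The table is consistent with this: for the freshwater zone, 2558.42 / 7136.25 ≈ 0.3585 and 8596.55 - 5537.6 = 3058.95. A minimal pandas sketch of the same summary, assuming the sample standard deviation (pandas' default `ddof=1`); whether `sz.compute_segment_statistics` uses `ddof=1` or `ddof=0` is not stated. `zone_statistics` is a hypothetical helper.

```python
import pandas as pd


def zone_statistics(zones, y_col):
    """Summary statistics of y_col for each zone.

    zones: dict mapping zone name -> DataFrame that contains y_col.
    """
    rows = {}
    for zone, data in zones.items():
        s = data[y_col].dropna()
        q1, q2, q3 = s.quantile([0.25, 0.5, 0.75])  # linear interpolation (pandas default)
        rows[zone] = {
            "mean": s.mean(),
            "std": s.std(),            # sample std, ddof=1 (assumption)
            "cv": s.std() / s.mean(),  # coefficient of variation
            "min": s.min(),
            "max": s.max(),
            "median": s.median(),
            "25%": q1,
            "50%": q2,
            "75%": q3,
            "iqr": q3 - q1,
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

Assuming `segments` holds three DataFrames, a usage example is `zone_statistics(dict(zip(['Freshwater zone', 'Mixing zone', 'Saltwater zone'], segments)), y_column)`.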


---

### Boxplot by Zone

```python
# Boxplot of y_column for each zone, with a custom title and external annotations.
sz.plot_boxplot_segments(segments[0], segments[1], segments[2],
                         variable=y_column,
                         segment_names=['Freshwater zone', 'Mixing zone', 'Saltwater zone'],
                         show_outliers=True,
                         title=f"Boxplot of the different zones: <b>{name}</b>")
```
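
The `<b>` tags in the title suggest that `sz.plot_boxplot_segments` uses an HTML-aware backend such as Plotly. If a static image is needed instead, a minimal matplotlib sketch; `boxplot_zones` is a hypothetical helper, and `whis=1.5` (Tukey whiskers, the matplotlib default) is an assumption about how outliers should be defined.

```python
import matplotlib.pyplot as plt


def boxplot_zones(segments, y_col, names, show_outliers=True, title=None):
    """Static boxplot of y_col with one box per zone DataFrame."""
    data = [seg[y_col].dropna().to_numpy() for seg in segments]
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.boxplot(data, showfliers=show_outliers, whis=1.5)  # whiskers at 1.5 * IQR
    ax.set_xticks(range(1, len(names) + 1), labels=names)
    ax.set_ylabel(y_col)
    if title:
        ax.set_title(title)
    return fig, ax
```

For example, `boxplot_zones(segments, y_column, ['Freshwater zone', 'Mixing zone', 'Saltwater zone'], title=name)`.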
