(number-proportions)=
# Computing age- and length-binned number proportions

```python
%run ./stratify_data.ipynb
```
## Setting up the environment

With all of the data ingested, preprocessed, and stratified, we can begin to run the transect-based analysis for estimating spatial distributions of fish. The first steps of this part of the analysis leverage functions from the `get_proportions` module.

```python
from echopop.nwfsc_feat import get_proportions
```

## Animal counts over age and length bins

The first step is to compute count distributions over length bins and, where ages are available, age bins for each sex within each stratum (here, `"stratum_ks"` defines the strata). Aged and unaged fish are binned separately because they are processed somewhat differently throughout this workflow. A single function, `compute_binned_counts` from the `get_proportions` module, handles both cases because it accepts any number of grouping columns through its `groupby_cols` argument. Aged specimens are counted by row (`agg_func="size"`), whereas the unaged length data are tallied by summing the `length_count` column (`agg_func="sum"`).

```python
# Dictionary for number counts
dict_df_counts = {}

# Aged
dict_df_counts["aged"] = get_proportions.compute_binned_counts(
    data=dict_df_bio["specimen"].dropna(subset=["age", "length", "weight"]),
    groupby_cols=["stratum_ks", "length_bin", "age_bin", "sex"],
    count_col="length",
    agg_func="size"
)

# Unaged
dict_df_counts["unaged"] = get_proportions.compute_binned_counts(
    data=dict_df_bio["length"].copy().dropna(subset=["length"]),
    groupby_cols=["stratum_ks", "length_bin", "sex"],
    count_col="length_count",
    agg_func="sum"
)
```
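The difference between the two calls can be illustrated with plain pandas. The following is a minimal sketch, not the `echopop` implementation: the helpers `add_bins` and `binned_counts_sketch`, the bin edges `LENGTH_EDGES` and `AGE_EDGES`, and the toy data are all hypothetical. It assumes lengths and ages are binned with `pd.cut` and that each unaged row carries a `length_count` tally. Grouping on categorical bins with `observed=False` keeps empty bin combinations as zero-count rows, which resembles the layout of the outputs shown in this section.

```python
import pandas as pd

LENGTH_EDGES = [1.0, 3.0, 5.0, 7.0]   # assumed 2 cm bins: (1, 3], (3, 5], (5, 7]
AGE_EDGES = [0.5, 1.5, 2.5]           # assumed 1-year bins: (0.5, 1.5], (1.5, 2.5]
SEXES = ["female", "male", "unsexed"]


def add_bins(df):
    """Hypothetical helper: attach categorical length/age bins and a categorical sex."""
    out = df.copy()
    out["length_bin"] = pd.cut(out["length"], LENGTH_EDGES)
    if "age" in out:
        out["age_bin"] = pd.cut(out["age"], AGE_EDGES)
    out["sex"] = pd.Categorical(out["sex"], categories=SEXES)
    return out


def binned_counts_sketch(data, groupby_cols, count_col, agg_func):
    """Hypothetical stand-in for a binned-count helper (not the echopop API)."""
    # observed=False keeps every combination of categorical levels, so empty bins get 0
    grouped = data.groupby(groupby_cols, observed=False)[count_col]
    if agg_func == "size":      # one row per fish: count the rows
        counts = grouped.size()
    elif agg_func == "sum":     # rows carry a pre-tallied number of fish: add them up
        counts = grouped.sum()
    else:
        raise ValueError(f"unsupported agg_func: {agg_func!r}")
    return counts.rename("count").reset_index()


# Toy aged specimens: one row per fish
aged = add_bins(pd.DataFrame({
    "stratum_ks": [1, 1, 1],
    "length": [2.0, 4.0, 4.5],
    "age": [1.0, 2.0, 2.0],
    "sex": ["female", "male", "male"],
}))

# Toy unaged length samples: each row records how many fish had that length
unaged = add_bins(pd.DataFrame({
    "stratum_ks": [1, 1],
    "length": [2.0, 6.0],
    "length_count": [5, 3],
    "sex": ["female", "unsexed"],
}))

aged_counts = binned_counts_sketch(
    aged, ["stratum_ks", "length_bin", "age_bin", "sex"], "length", "size"
)
unaged_counts = binned_counts_sketch(
    unaged, ["stratum_ks", "length_bin", "sex"], "length_count", "sum"
)
print(aged_counts[aged_counts["count"] > 0])
```

Counting rows versus summing a tally column is the only difference between the two toy calls, which is why one parameterized helper can serve both aged and unaged data.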

The `compute_binned_counts` function returns a `pandas.DataFrame` containing the `groupby_cols` columns and their associated `count` totals. For instance, the first 10 rows of `dict_df_counts["aged"]` look like this:

```python
from IPython.display import display

display(dict_df_counts["aged"].head(10))
```

| | stratum_ks | length_bin | age_bin | sex | count |
|---|---|---|---|---|---|
| 0 | 0 | (1.0, 3.0] | (0.5, 1.5] | female | 0 |
| 1 | 0 | (1.0, 3.0] | (0.5, 1.5] | male | 0 |
| 2 | 0 | (1.0, 3.0] | (0.5, 1.5] | unsexed | 0 |
| 3 | 0 | (1.0, 3.0] | (1.5, 2.5] | female | 0 |
| 4 | 0 | (1.0, 3.0] | (1.5, 2.5] | male | 0 |
| 5 | 0 | (1.0, 3.0] | (1.5, 2.5] | unsexed | 0 |
| 6 | 0 | (1.0, 3.0] | (2.5, 3.5] | female | 0 |
| 7 | 0 | (1.0, 3.0] | (2.5, 3.5] | male | 0 |
| 8 | 0 | (1.0, 3.0] | (2.5, 3.5] | unsexed | 0 |
| 9 | 0 | (1.0, 3.0] | (3.5, 4.5] | female | 0 |
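Every row shown here has a zero count, which suggests that the output enumerates all stratum, bin, and sex combinations, including empty ones. A small self-contained sketch of two ordinary pandas ways to inspect a long-format table laid out like this: dropping the zero rows, and pivoting into a length-by-age grid. The toy `counts` frame is made up and uses strings in place of the real interval bins.

```python
import pandas as pd

# Toy long-format counts laid out like dict_df_counts["aged"] (values are made up)
counts = pd.DataFrame({
    "stratum_ks": [0, 0, 0, 0, 0, 0],
    "length_bin": ["(1.0, 3.0]", "(1.0, 3.0]", "(3.0, 5.0]",
                   "(3.0, 5.0]", "(3.0, 5.0]", "(3.0, 5.0]"],
    "age_bin": ["(0.5, 1.5]", "(1.5, 2.5]", "(0.5, 1.5]",
                "(1.5, 2.5]", "(0.5, 1.5]", "(1.5, 2.5]"],
    "sex": ["female", "female", "female", "female", "male", "male"],
    "count": [0, 0, 4, 1, 2, 0],
})

# Keep only the populated bin combinations
nonzero = counts[counts["count"] > 0]

# Sum over strata and sexes into a length-by-age grid of total counts;
# observed=False (no effect on these string columns) keeps empty categorical bins
grid = counts.pivot_table(
    index="length_bin", columns="age_bin", values="count",
    aggfunc="sum", observed=False,
)
print(nonzero)
print(grid)
```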


This contrasts with `dict_df_counts["unaged"]`, which lacks the `age_bin` column:

```python
display(dict_df_counts["unaged"].head(10))
```

| | stratum_ks | length_bin | sex | count |
|---|---|---|---|---|
| 0 | 1 | (1.0, 3.0] | female | 0 |
| 1 | 1 | (1.0, 3.0] | male | 0 |
| 2 | 1 | (1.0, 3.0] | unsexed | 0 |
| 3 | 1 | (3.0, 5.0] | female | 0 |
| 4 | 1 | (3.0, 5.0] | male | 0 |
| 5 | 1 | (3.0, 5.0] | unsexed | 0 |
| 6 | 1 | (5.0, 7.0] | female | 0 |
| 7 | 1 | (5.0, 7.0] | male | 0 |
| 8 | 1 | (5.0, 7.0] | unsexed | 0 |
| 9 | 1 | (7.0, 9.0] | female | 0 |


## Convert counts into number proportions

These binned counts are normalized into proportions with the `number_proportions` function. It takes the entire dictionary of aged and unaged counts and computes both the within-group proportions (i.e., relative to the counts within just `aged` or just `unaged`) and the overall proportions (i.e., relative to the summed counts across all aged and unaged animals). This adds two new columns: `proportion` (within-group) and `proportion_overall` (overall). Here, `group_columns` is set to `["stratum_ks"]`, and `exclude_filters` is set to exclude unsexed fish from the aged counts.
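Written out, and assuming (as `group_columns=["stratum_ks"]` suggests, though the text does not state it) that both proportions are normalized within each stratum, the two columns can be sketched as follows. Here {math}`n_{s,g,k}` is the count in stratum {math}`s` for group {math}`g \in \{\text{aged}, \text{unaged}\}` and cell {math}`k` (a sex and length bin, plus an age bin for aged fish); how excluded rows enter each denominator is not specified here.

```{math}
p_{s,g,k} = \frac{n_{s,g,k}}{\sum_{k'} n_{s,g,k'}},
\qquad
p^{\text{overall}}_{s,g,k} = \frac{n_{s,g,k}}{\sum_{g'} \sum_{k'} n_{s,g',k'}}
```

Within each stratum, {math}`p_{s,g,k}` sums to one over the cells of a single group, while {math}`p^{\text{overall}}_{s,g,k}` sums to one only across the aged and unaged groups combined.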

```python
# Compute number proportions
dict_df_number_proportion = get_proportions.number_proportions(
    data=dict_df_counts,
    group_columns=["stratum_ks"],
    exclude_filters={"aged": {"sex": "unsexed"}},
)
```
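To make the two normalizations concrete, here is a minimal pandas sketch under stated assumptions; it is not the `echopop` implementation. The function `number_proportions_sketch` and the toy `counts` dictionary are hypothetical. The sketch assumes each DataFrame has a `count` column, that `exclude_filters` drops rows whose column equals the given value before either total is taken, and that both proportions are normalized within each combination of `group_columns`.

```python
import pandas as pd


def number_proportions_sketch(data, group_columns, exclude_filters=None):
    """Hypothetical sketch of within-group and overall number proportions."""
    exclude_filters = exclude_filters or {}

    # Drop rows matching the per-group exclusion filters (e.g. unsexed aged fish)
    filtered = {}
    for name, df in data.items():
        keep = pd.Series(True, index=df.index)
        for col, value in exclude_filters.get(name, {}).items():
            keep &= df[col] != value
        filtered[name] = df.loc[keep].copy()

    # Denominator for `proportion_overall`: counts summed across ALL groups
    overall_totals = (
        pd.concat([df[group_columns + ["count"]] for df in filtered.values()])
        .groupby(group_columns, as_index=False)["count"]
        .sum()
        .rename(columns={"count": "total_overall"})
    )

    result = {}
    for name, df in filtered.items():
        # Denominator for `proportion`: counts summed within this group only
        # (a group whose total is zero would yield NaN here)
        df["proportion"] = df["count"] / df.groupby(group_columns)["count"].transform("sum")
        df = df.merge(overall_totals, on=group_columns, how="left")
        df["proportion_overall"] = df["count"] / df["total_overall"]
        result[name] = df.drop(columns="total_overall")
    return result


# Toy counts (made up) with a reduced set of columns
counts = {
    "aged": pd.DataFrame({
        "stratum_ks": [1, 1, 1, 2],
        "sex": ["female", "male", "unsexed", "female"],
        "count": [3, 1, 4, 2],
    }),
    "unaged": pd.DataFrame({
        "stratum_ks": [1, 1, 2, 2],
        "sex": ["female", "male", "female", "male"],
        "count": [4, 2, 6, 2],
    }),
}
props = number_proportions_sketch(
    counts, ["stratum_ks"], exclude_filters={"aged": {"sex": "unsexed"}}
)
print(props["aged"])
```

In this toy example, stratum 1 has 4 aged (after dropping the unsexed row) and 6 unaged fish, so the aged males get `proportion` 1/4 but `proportion_overall` 1/10; summing `proportion_overall` over both groups within a stratum gives 1.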