# Fitting the von Bertalanffy growth function to the modeling data
We tried fitting several growth curves to our modeling data. We settled on the von Bertalanffy (1938) growth function because it offers the best fit among the alternatives we considered: linear, quadratic, cubic, exponential, and the Janoschek (1957) equation. For simplicity, we present only the von Bertalanffy (1938) results here.

This notebook takes the Part I dataset "data_part1_1250.csv" and produces the growth parameters dataset "growth_params_1250.csv".

Due to the long run time, by default this notebook will not run when you press **Reproducible Run** on CodeOcean. 

This document shows only the general workflow, with ample annotations. For the full technical details, please refer to helper.py.

## Import libraries

In [1]:
import numpy as np
import pandas as pd
from tqdm import tqdm
from helper import RawData, GrowthModel

## Import Part I raw data 
The Part I data cover a control parameter range that produces reasonably decent reading outcomes.

In [2]:
raw = RawData("../../data/data_part1_1250.csv")

## Create von Bertalanffy (1938) function

In [3]:
def von_bertalanffy(x:float, max_acc:float, k:float, x0:float) -> float:
    """ von Bertalanffy (1938) growth function
    This function was originally used to describe the growth of an organism.
    It assumes that the growth rate of an organism declines as its size
    (in our case, accuracy) approaches its maximum,
    so that the rate of change in length, l, can be described by:
    dl/dt = K (L_inf - l) or, in our context: dy/dx = k (max_acc - y)
    max_acc: maximum accuracy / upper asymptote
    k: growth rate
    x0: x value (epoch) at which the model starts to learn (predicted accuracy is 0 at x = x0)
    """
    return max_acc * (1 - np.exp(-k * (x - x0)))
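
As a quick sanity check on the docstring, the sketch below numerically compares the derivative of the closed-form curve with the right-hand side of dy/dx = k (max_acc - y). The parameter values are illustrative assumptions, not fitted values from our data, and the function is redefined so that the snippet runs on its own.

```python
import numpy as np

def von_bertalanffy(x, max_acc, k, x0):
    """Same closed form as the function defined above."""
    return max_acc * (1 - np.exp(-k * (x - x0)))

# Illustrative parameter values (assumptions, not estimates from the data)
max_acc, k, x0 = 0.9, 0.5, 0.2

x = np.linspace(x0, 10, 2001)
y = von_bertalanffy(x, max_acc, k, x0)

# Left-hand side: numerical derivative of the curve
dy_dx = np.gradient(y, x, edge_order=2)
# Right-hand side of the differential equation
ode_rhs = k * (max_acc - y)

max_gap = np.max(np.abs(dy_dx - ode_rhs))
print(f"largest gap between dy/dx and k * (max_acc - y): {max_gap:.2e}")
print(f"value at x0: {y[0]:.3f}, value at x = 10: {y[-1]:.3f}")
```

The gap should be small, limited only by the finite-difference error. The curve starts at 0 when x = x0 and rises toward max_acc without exceeding it.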

## Fit one model for demo
To demonstrate what a fitted growth model looks like, we fit and plot a single model:
1. Get the required data from the raw data.
2. Fit the von_bertalanffy function to the data; *bounds* sets the accepted range for each parameter.
3. Visualize the result by comparing the actual data against the predicted values.
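
The internals of `GrowthModel` live in helper.py and are not shown in this notebook. As a rough, self-contained illustration of steps 1 to 3, the sketch below fits the same function with `scipy.optimize.curve_fit`, using the same bounds as the demo. This is an assumption about one reasonable way to do the fit, not a description of what helper.py does. The data are synthetic, and `true_params`, `epoch`, `score`, and the starting values `p0` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def von_bertalanffy(x, max_acc, k, x0):
    return max_acc * (1 - np.exp(-k * (x - x0)))

# Step 1: synthetic stand-in for one model's learning curve (hypothetical values)
rng = np.random.default_rng(0)
true_params = (0.85, 0.6, 0.3)          # max_acc, k, x0
epoch = np.linspace(0.5, 10, 40)
score = von_bertalanffy(epoch, *true_params) + rng.normal(0, 0.01, epoch.size)

# Step 2: bounded least-squares fit
# lower bound 0 for every parameter; upper bounds 1, inf, 1 as in the demo
params, _ = curve_fit(
    von_bertalanffy,
    epoch,
    score,
    p0=[0.5, 1.0, 0.1],                 # starting guess inside the bounds
    bounds=(0, [1, np.inf, 1]),
)

# Step 3: compare the actual data against the predicted values
pred = von_bertalanffy(epoch, *params)
mse = np.mean((score - pred) ** 2)
print("fitted max_acc, k, x0:", params)
print("mean squared error:", mse)
```

Plotting `score` and `pred` against `epoch` would give a figure comparable to the one produced by `demo.plot()`.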

In [4]:
# Get one set of data for demo
df = raw.get(code_name=65317510, cond='HF_CON', remove_zero=True)
demo = GrowthModel(growth_function=von_bertalanffy, xdata=df.epoch, ydata=df.score, name="von Bertalanffy")
demo.fit(bounds=(0, [1, np.inf, 1]))
demo.plot()

- This figure shows the result of one growth model.
- We chose this growth function (von Bertalanffy) because it fits our data well.
- You can explore other simulation models / conditions by changing the input dataframe (df).

## Fitting this model to the entire simulation dataset
Depending on your computer's speed, the chunk below may take more than 10 minutes to run. The progress bar shows an estimate of the time required.

## Fit model to all Part I simulation data
This simply repeats the demo above 1250 * 6 times (once for every combination of simulation and condition) and collects each model's parameters for later use.

In [5]:
rows = []
all_code_names = raw.df.code_name.unique()
all_conds = raw.df.cond.unique()

# Iterate over all simulation IDs (code_name) and conditions
for m in tqdm(all_code_names):
    for c in all_conds:
        this_df = raw.get(code_name=m, cond=c, remove_zero=True)
        model = GrowthModel(von_bertalanffy, this_df.epoch, this_df.score, "von Bertalanffy")
        model.fit(bounds=(0, [1, np.inf, 1]))

        # Collect the fit error and the fitted parameters for this model
        rows.append(
            {
                "code_name": m,
                "cond": c,
                "mse": model.mse,
                "max_acc": model.params[0],
                "k": model.params[1],
                "x0": model.params[2]
            }
        )

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
results = pd.DataFrame(rows)

100%|██████████| 1250/1250 [06:59<00:00,  2.98it/s]


## Merge control parameter settings into results and export

In [12]:
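# numeric_only=True skips non-numeric columns such as cond, which newer pandas versions no longer drop silently
control_parameters_settings = raw.df.groupby("code_name").mean(numeric_only=True).reset_index()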
control_parameters_settings.drop(columns=["epoch", "score"], inplace=True)
results = control_parameters_settings.merge(results, on=["code_name"], how="right")
results.to_csv("../../data/growth_params_1250.csv", index=False)