# Creating the Dataset

## Creating Parameter CSV file

### First step of creating the Dataset

This first step defines ranges for the parameters of the different elements of the cardiac circulation, based on preset parameter values from the literature. Some parameters are given a range and others are held constant. Values for the ranged parameters are then randomised using Latin hypercube sampling, a statistical method for generating a near-random sample of parameter values.
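
To show roughly what Latin hypercube sampling does, here is a minimal NumPy sketch. It is not the autoemulate implementation used below, and the function name `latin_hypercube` and its arguments are hypothetical. Each parameter range is split into as many equal strata as there are samples, each stratum is sampled exactly once, and the strata are shuffled independently per parameter.

```python
import numpy as np


def latin_hypercube(bounds, n_samples, rng=None):
    """Draw n_samples points from a Latin hypercube over the given bounds.

    bounds: sequence of (low, high) pairs, one per parameter.
    Returns an array of shape (n_samples, len(bounds)).
    """
    rng = np.random.default_rng(rng)
    bounds = np.asarray(bounds, dtype=float)
    n_dims = len(bounds)
    # Split [0, 1) into n_samples equal strata and draw one point per stratum
    # in every dimension: (stratum index + uniform offset) / n_samples.
    strata = np.tile(np.arange(n_samples), (n_dims, 1))  # shape (n_dims, n_samples)
    unit = (strata + rng.random((n_dims, n_samples))) / n_samples
    # Shuffle the strata independently in each dimension (rows are views,
    # so this shuffles `unit` in place) to randomise the pairing.
    for row in unit:
        rng.shuffle(row)
    # Rescale each column from [0, 1) to its (low, high) range.
    low, high = bounds[:, 0], bounds[:, 1]
    return low + unit.T * (high - low)


# Example: ranges of ao.r (240 x [0.5, 1.5]) and T (800 x [0.375, 1.5])
samples = latin_hypercube([(120.0, 360.0), (300.0, 1200.0)], n_samples=5, rng=0)
print(samples.shape)  # (5, 2)
```

Unlike independent uniform draws, this guarantees that every part of each parameter's range is represented, even with few samples.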

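Start by importing the necessary modules:
* `LatinHypercube` from the `experimental_design` submodule of `autoemulate`
* `NaghaviModelParameters` from the `ModularCirc.Models.NaghaviModel` submodule
* `circ_utils`, a local `.py` file with the functions needed to run the simulations and load their output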


In [53]:
%load_ext autoreload
%autoreload 2

import os
import json

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from autoemulate.experimental_design import LatinHypercube
from ModularCirc.Models.NaghaviModel import NaghaviModelParameters

# circ_utils is a local .py file with the functions that run the simulations
# and load their output; it must be importable (e.g. in the working directory
# or on sys.path).
import circ_utils

# Show the current working directory (this should be the Tutorials folder)
!pwd

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
/Users/aalexander-ikwue/Documents/GitHub/SensitivityAnalysis/Tutorials


Define the main folder path, adapting it as necessary for your machine.

In [54]:
# Run this notebook from the Tutorials folder so that the relative paths below resolve
main_path = os.getcwd()
# Alternatively, set an absolute path for your machine, e.g.:
# main_path = '/Users/aalexander-ikwue/Documents/GitHub/modular-circ-test'

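Import a dictionary of parameters for the heart chambers, valves and vessels, based on the Naghavi model parameter values. Parameters that are allowed to vary are stored as `[value, [lower_factor, upper_factor]]`, where the factors multiply the reference value to give the sampling range. Constant parameters are stored as `[value]` and have no range.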
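
As a worked example of this format, the sketch below converts a few entries copied from the dictionary into sampling bounds. The `excerpt`, `bounds` and `constants` names are illustrative only. Note that some reference values are themselves wrapped in a list (`art.r` and `T`), so the value is flattened before multiplying.

```python
import numpy as np

# Entries copied from parameters.json (flattened to "component.parameter" keys)
excerpt = {
    "ao.r": [240, [0.5, 1.5]],
    "art.r": [[1125], [0.5, 1.5]],
    "T": [[800], [0.375, 1.5]],
    "ao.v_ref": [100],
}

bounds, constants = {}, {}
for name, entry in excerpt.items():
    if len(entry) > 1:
        value, factors = entry
        # np.ravel treats 240 and [1125] alike: take the single reference value
        reference = float(np.ravel(value)[0])
        bounds[name] = tuple(float(f) * reference for f in factors)
    else:
        constants[name] = entry[0]

print(bounds)     # {'ao.r': (120.0, 360.0), 'art.r': (562.5, 1687.5), 'T': (300.0, 1200.0)}
print(constants)  # {'ao.v_ref': 100}
```

For example, the cardiac cycle length `T` is sampled between 300 ms and 1200 ms.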

In [55]:

# Load the parameter dictionary (parameters.json must be in the working directory)
with open('parameters.json', 'r') as file:
    dict_parameters = json.load(file)

print(dict_parameters)

{'VESSELS': {'ao': {'r': [240, [0.5, 1.5]], 'c': [0.3, [0.5, 1.5]], 'l': [0], 'v_ref': [100], 'v': [130]}, 'art': {'r': [[1125], [0.5, 1.5]], 'c': [3, [0.5, 1.5]], 'l': [0], 'v_ref': [900], 'v': [1092]}, 'ven': {'r': [9, [0.5, 1.5]], 'c': [133.3, [0.5, 1.5]], 'l': [0.0], 'v_ref': [2800], 'v': [378]}}, 'VALVES': {'av': {'r': [6, [0.5, 1.5]]}, 'mv': {'r': [4.1, [0.5, 1.5]]}}, 'CHAMBERS': {'la': {'E_pas': [0.44, [0.5, 1.5]], 'E_act': [0.45, [0.5, 1.5]], 'v_ref': [10, [0.5, 1.5]], 'k_pas': [0.05, [0.333, 1.5]], 'v': [93], 'delay': [150], 't_tr': [225], 'tau': [25], 't_max': [150]}, 'lv': {'E_pas': [1.0, [0.5, 1.5]], 'E_act': [3, [0.5, 1.5]], 'v_ref': [10.0, [0.5, 1.5]], 'k_pas': [0.03, [0.333, 1.5]], 'delay': [0], 't_tr': [420], 'tau': [25], 't_max': [280]}}, 'T': [[800], [0.375, 1.5]]}


In [56]:
# Flatten the nested parameter dictionary into "component.parameter" keys
# (e.g. "ao.r") and separate the parameters that have a range from the
# constant ones:
#   [value, [low_factor, high_factor]] -> (value * low_factor, value * high_factor)
#   [value]                            -> value

dict_parameters_condensed_range = dict()
dict_parameters_condensed_single = dict()

def condense_dict_parameters(dict_param: dict, prev=""):
    for key, val in dict_param.items():
        # Prefix the key with the innermost group name, e.g. "ao" + "r" -> "ao.r"
        if len(prev) > 0:
            new_key = prev.split('.')[-1] + '.' + key
        else:
            new_key = key
        if isinstance(val, dict):
            # Nested group (e.g. VESSELS -> ao): recurse into it
            condense_dict_parameters(val, new_key)
        elif len(val) > 1:
            value, r = val
            dict_parameters_condensed_range[new_key] = tuple(np.array(r) * value)
        else:
            dict_parameters_condensed_single[new_key] = val[0]
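
The function above fills module-level dictionaries, so re-running it after editing `parameters.json` can leave stale entries behind. A side-effect-free variant that returns new dictionaries might look like the sketch below; `flatten_parameters` is a hypothetical name, and it also converts the bounds to plain Python floats.

```python
import numpy as np


def flatten_parameters(dict_param, prev=""):
    """Return (ranged, constant) dictionaries keyed as "component.parameter"."""
    ranged, constant = {}, {}
    for key, val in dict_param.items():
        # Keep only the innermost group name, as condense_dict_parameters does
        new_key = f"{prev.split('.')[-1]}.{key}" if prev else key
        if isinstance(val, dict):
            sub_ranged, sub_constant = flatten_parameters(val, new_key)
            ranged.update(sub_ranged)
            constant.update(sub_constant)
        elif len(val) > 1:
            value, factors = val
            # Works whether the reference value is a scalar or a 1-element list
            ranged[new_key] = tuple(float(x) for x in np.array(factors) * value)
        else:
            constant[new_key] = val[0]
    return ranged, constant


example = {
    "VALVES": {"av": {"r": [6, [0.5, 1.5]]}},
    "CHAMBERS": {"lv": {"delay": [0]}},
    "T": [[800], [0.375, 1.5]],
}
ranged, constant = flatten_parameters(example)
print(ranged)    # {'av.r': (3.0, 9.0), 'T': (300.0, 1200.0)}
print(constant)  # {'lv.delay': 0}
```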

Apply the function to the loaded dictionary to split the parameters into the two groups (ranged or constant).

In [57]:
condense_dict_parameters(dict_parameters)

Choose the number of samples to generate, draw a Latin hypercube sample of the ranged parameters, and store the results in a dataframe.

In [58]:
N_samples = 50
lhd = LatinHypercube(dict_parameters_condensed_range.values())
sample_array = lhd.sample(N_samples)
sample_df    = pd.DataFrame(sample_array, columns=dict_parameters_condensed_range.keys())
sample_df.index.name = 'Index'
sample_df.head()

| Index | ao.r | ao.c | art.r | art.c | ven.r | ven.c | av.r | mv.r | la.E_pas | la.E_act | la.v_ref | la.k_pas | lv.E_pas | lv.E_act | lv.v_ref | lv.k_pas | T |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 274.988084 | 0.209519 | 1558.051337 | 2.749337 | 8.735608 | 69.557549 | 3.627426 | 4.374678 | 0.586213 | 0.651466 | 5.784654 | 0.035784 | 0.984795 | 3.692434 | 12.891043 | 0.03522 | 505.757418 |
| 1 | 141.032157 | 0.165468 | 1011.805037 | 1.716107 | 10.848311 | 68.152175 | 4.230966 | 4.637202 | 0.354824 | 0.400141 | 9.963051 | 0.060433 | 0.962821 | 2.524167 | 6.274446 | 0.025017 | 772.620159 |
| 2 | 287.546329 | 0.423674 | 933.534917 | 4.308887 | 6.881699 | 125.800255 | 6.014242 | 5.401198 | 0.470881 | 0.293268 | 13.428154 | 0.053925 | 0.749092 | 2.746146 | 8.22289 | 0.016817 | 538.739535 |
| 3 | 257.986308 | 0.272594 | 1627.626405 | 3.901524 | 12.509697 | 117.43965 | 3.333434 | 3.313422 | 0.454794 | 0.462709 | 10.902334 | 0.067563 | 0.876184 | 4.075654 | 9.662654 | 0.020261 | 827.736235 |
| 4 | 159.651817 | 0.261205 | 1184.785864 | 1.867056 | 6.70458 | 145.679557 | 3.208325 | 3.48201 | 0.56304 | 0.414446 | 14.362614 | 0.024941 | 1.177014 | 3.924868 | 9.528478 | 0.011246 | 884.591241 |
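
If autoemulate is not available, SciPy's `scipy.stats.qmc` module offers a Latin hypercube sampler that can produce the same kind of dataframe. This is an alternative sketch, not the sampler used in this tutorial, and `sample_parameters` is a hypothetical helper that takes a dictionary of `(low, high)` bounds such as `dict_parameters_condensed_range`.

```python
import pandas as pd
from scipy.stats import qmc


def sample_parameters(bounds, n_samples):
    """Latin hypercube sample of the ranged parameters as a DataFrame.

    bounds: dict mapping "component.parameter" names to (low, high) tuples.
    """
    names = list(bounds)
    low = [bounds[name][0] for name in names]
    high = [bounds[name][1] for name in names]
    sampler = qmc.LatinHypercube(d=len(names))
    unit_sample = sampler.random(n=n_samples)   # points in the unit hypercube
    scaled = qmc.scale(unit_sample, low, high)  # rescale each column to its bounds
    df = pd.DataFrame(scaled, columns=names)
    df.index.name = "Index"
    return df


example = sample_parameters({"ao.r": (120.0, 360.0), "T": (300.0, 1200.0)}, n_samples=5)
print(example.head())
```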


Add the previously separated constants back into the sampled dataframe; each constant becomes a column with the same value in every row.

In [59]:
for key, val in dict_parameters_condensed_single.items():
    print(key, val)
    # The scalar value is broadcast to every row of the new column
    sample_df[key] = val

sample_df.head()

ao.l 0
ao.v_ref 100
ao.v 130
art.l 0
art.v_ref 900
art.v 1092
ven.l 0.0
ven.v_ref 2800
ven.v 378
la.v 93
la.delay 150
la.t_tr 225
la.tau 25
la.t_max 150
lv.delay 0
lv.t_tr 420
lv.tau 25
lv.t_max 280


| Index | ao.r | ao.c | art.r | art.c | ven.r | ven.c | av.r | mv.r | la.E_pas | la.E_act | ... | ven.v | la.v | la.delay | la.t_tr | la.tau | la.t_max | lv.delay | lv.t_tr | lv.tau | lv.t_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 274.988084 | 0.209519 | 1558.051337 | 2.749337 | 8.735608 | 69.557549 | 3.627426 | 4.374678 | 0.586213 | 0.651466 | ... | 378 | 93 | 150 | 225 | 25 | 150 | 0 | 420 | 25 | 280 |
| 1 | 141.032157 | 0.165468 | 1011.805037 | 1.716107 | 10.848311 | 68.152175 | 4.230966 | 4.637202 | 0.354824 | 0.400141 | ... | 378 | 93 | 150 | 225 | 25 | 150 | 0 | 420 | 25 | 280 |
| 2 | 287.546329 | 0.423674 | 933.534917 | 4.308887 | 6.881699 | 125.800255 | 6.014242 | 5.401198 | 0.470881 | 0.293268 | ... | 378 | 93 | 150 | 225 | 25 | 150 | 0 | 420 | 25 | 280 |
| 3 | 257.986308 | 0.272594 | 1627.626405 | 3.901524 | 12.509697 | 117.43965 | 3.333434 | 3.313422 | 0.454794 | 0.462709 | ... | 378 | 93 | 150 | 225 | 25 | 150 | 0 | 420 | 25 | 280 |
| 4 | 159.651817 | 0.261205 | 1184.785864 | 1.867056 | 6.70458 | 145.679557 | 3.208325 | 3.48201 | 0.56304 | 0.414446 | ... | 378 | 93 | 150 | 225 | 25 | 150 | 0 | 420 | 25 | 280 |
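
The same step can also be written without a loop using pandas `DataFrame.assign`, which returns a new dataframe instead of modifying `sample_df` in place. In this sketch, `sampled` and `constants` are stand-ins for `sample_df` and `dict_parameters_condensed_single`, filled with made-up values.

```python
import pandas as pd

# Made-up stand-ins for sample_df and dict_parameters_condensed_single
sampled = pd.DataFrame({"ao.r": [240.0, 180.0], "T": [800.0, 600.0]})
sampled.index.name = "Index"
constants = {"ao.l": 0, "la.delay": 150, "lv.t_tr": 420}

# assign() broadcasts each scalar to every row. Unpacking a dict allows column
# names such as "ao.l" that are not valid Python identifiers.
full = sampled.assign(**constants)
print(full)
```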


Define a function that rescales the time-related parameters to each sample's cardiac cycle length `T`.

In [60]:
def scale_time_parameters_and_asign_to_components(df):
    # The time parameters are defined for a reference cardiac cycle of 800 ms;
    # rescale them in proportion to each sample's cycle length T (in place).
    scale = df['T'] / 800.

    df['la.delay'] = df['la.delay'] * scale

    df['la.t_tr'] = df['la.t_tr'] * scale
    df['lv.t_tr'] = df['lv.t_tr'] * scale

    df['la.tau'] = df['la.tau'] * scale
    df['lv.tau'] = df['lv.tau'] * scale

    df['la.t_max'] = df['la.t_max'] * scale
    df['lv.t_max'] = df['lv.t_max'] * scale

In [61]:
scale_time_parameters_and_asign_to_components(sample_df)
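
Each timing parameter is rescaled in proportion to the sampled cycle length:

t_scaled = t_ref × T / 800

For example, `la.t_max` = 150 ms becomes 75 ms when T = 400 ms. A loop-free sketch of the same rescaling follows. It returns a copy rather than modifying the dataframe in place, and the helper name and constants are hypothetical. As in the function above, `lv.delay` (which is 0) is left unscaled.

```python
import pandas as pd

REFERENCE_PERIOD = 800.0  # cycle length (ms) that the reference timings assume
TIME_COLUMNS = ["la.delay", "la.t_tr", "lv.t_tr", "la.tau", "lv.tau", "la.t_max", "lv.t_max"]


def scale_time_columns(df, columns=TIME_COLUMNS, reference=REFERENCE_PERIOD):
    """Return a copy of df with each time column multiplied by T / reference."""
    out = df.copy()
    factor = out["T"] / reference
    # mul(..., axis=0) multiplies every column of a row by that row's factor
    out[columns] = out[columns].astype(float).mul(factor, axis=0)
    return out


example = pd.DataFrame({
    "T": [400.0, 1200.0],
    "la.delay": [150, 150], "la.t_tr": [225, 225], "lv.t_tr": [420, 420],
    "la.tau": [25, 25], "lv.tau": [25, 25],
    "la.t_max": [150, 150], "lv.t_max": [280, 280],
})
print(scale_time_columns(example)["la.t_max"].tolist())  # [75.0, 225.0]
```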

Save the complete parameter set to the `data/input` folder. This file is the input for running the simulations in the next step and for emulation later on.

In [62]:
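# Uncomment to write the samples to disk. This overwrites any existing file
# with the same name, and the data/input folder must already exist.
# sample_df.to_csv(f'{main_path}/data/input/input_parameters_{N_samples}.csv')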
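
`to_csv` writes the dataframe index as the first column, labelled with the index name (`Index`). This is why the file is read back in the next step with `index_col="Index"`. The small round trip below uses an in-memory buffer in place of a file, with made-up values.

```python
import io

import pandas as pd

df = pd.DataFrame({"ao.r": [120.5, 300.25], "T": [400.0, 900.0]})
df.index.name = "Index"

buffer = io.StringIO()  # stands in for the CSV file on disk
df.to_csv(buffer)       # header line: Index,ao.r,T
buffer.seek(0)

restored = pd.read_csv(buffer, index_col="Index")
print(restored.equals(df))  # True
```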

## Running Simulations

### Second step of creating the dataset

The randomised input parameters created in the first step are now run through ModularCirc simulations until a steady state is reached, and the resulting pressure pulse traces are saved as output.
One simulation is run for each randomised parameter set, generating the corresponding target variables (outputs).
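
This tutorial does not show ModularCirc's own convergence criterion. As an illustration only, one common way to decide that a periodic simulation has settled is to compare consecutive cycles. The function `reached_steady_state` and its tolerance are hypothetical.

```python
import numpy as np


def reached_steady_state(trace, samples_per_cycle, rtol=1e-3):
    """Hypothetical cycle-to-cycle convergence test for a periodic signal.

    Compares the last two complete cycles of `trace` and returns True when the
    largest point-wise difference is below rtol times the signal's range.
    """
    trace = np.asarray(trace, dtype=float)
    if len(trace) < 2 * samples_per_cycle:
        return False  # need at least two full cycles to compare
    last = trace[-samples_per_cycle:]
    previous = trace[-2 * samples_per_cycle:-samples_per_cycle]
    scale = np.ptp(last) or 1.0  # avoid a zero tolerance for a flat signal
    return bool(np.max(np.abs(last - previous)) < rtol * scale)


# Synthetic example: a periodic signal plus a decaying start-up transient,
# 800 samples per cycle
t = np.arange(30 * 800)
signal = 80 + 40 * np.sin(2 * np.pi * t / 800) + 20 * np.exp(-t / 2000)
print(reached_steady_state(signal[:3 * 800], samples_per_cycle=800))  # False
print(reached_steady_state(signal, samples_per_cycle=800))            # True
```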

Define the input file and output folder paths:

In [66]:
output_path = f"../Tutorials/data/pressure_traces_{N_samples}"
input_file = f"../Tutorials/data/input/input_parameters_{N_samples}.csv"
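
The same paths can be built with `pathlib`, which joins path components portably. The helper `tutorial_paths` is hypothetical and assumes the same `../Tutorials/data` layout as above.

```python
from pathlib import Path


def tutorial_paths(n_samples, data_dir=Path("..") / "Tutorials" / "data"):
    """Build the input-file and output-folder paths used in this step."""
    data_dir = Path(data_dir)
    input_file = data_dir / "input" / f"input_parameters_{n_samples}.csv"
    output_path = data_dir / f"pressure_traces_{n_samples}"
    return input_file, output_path


input_file, output_path = tutorial_paths(50)
print(input_file.as_posix())  # ../Tutorials/data/input/input_parameters_50.csv
```

`pd.read_csv` accepts `Path` objects directly. Calling `output_path.mkdir(parents=True, exist_ok=True)` creates the output folder if it is missing. Whether `circ_utils` accepts `Path` objects is not shown here, so passing `str(output_path)` is the safer choice.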

In [70]:
parameter_data_frame = pd.read_csv(input_file, index_col="Index")
parameter_data_frame.head()

| Index | ao.r | ao.c | art.r | art.c | ven.r | ven.c | av.r | mv.r | la.E_pas | la.E_act | ... | ven.v | la.v | la.delay | la.t_tr | la.tau | la.t_max | lv.delay | lv.t_tr | lv.tau | lv.t_max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 348.304569 | 0.306675 | 942.533811 | 2.964606 | 7.969164 | 197.985741 | 8.648488 | 3.324483 | 0.599901 | 0.603122 | ... | 378 | 93 | 67.915655 | 101.873483 | 11.319276 | 67.915655 | 0 | 190.163835 | 11.319276 | 126.77589 |
| 1 | 304.468405 | 0.241801 | 1126.440555 | 3.866698 | 8.323828 | 112.030187 | 3.681135 | 4.106844 | 0.584611 | 0.450121 | ... | 378 | 93 | 71.577585 | 107.366377 | 11.929597 | 71.577585 | 0 | 200.417237 | 11.929597 | 133.611491 |
| 2 | 172.936062 | 0.290886 | 1157.472104 | 3.189631 | 10.237013 | 196.821325 | 8.351444 | 2.757103 | 0.382176 | 0.445517 | ... | 378 | 93 | 139.676237 | 209.514355 | 23.279373 | 139.676237 | 0 | 391.093463 | 23.279373 | 260.728976 |
| 3 | 185.414668 | 0.335397 | 1001.671278 | 3.761802 | 11.991618 | 134.87475 | 8.47927 | 4.301734 | 0.619516 | 0.525006 | ... | 378 | 93 | 220.304846 | 330.457269 | 36.717474 | 220.304846 | 0 | 616.853569 | 36.717474 | 411.235713 |
| 4 | 178.269221 | 0.225611 | 1509.389605 | 4.012232 | 4.568441 | 153.983975 | 7.484241 | 4.905259 | 0.596946 | 0.434038 | ... | 378 | 93 | 126.805335 | 190.208002 | 21.134222 | 126.805335 | 0 | 355.054938 | 21.134222 | 236.703292 |


Run the simulations. The pressure traces they produce are the target variables corresponding to the input parameters generated in the first step.

In [None]:
import glob

dt = 1.         # time resolution of the results
N_cycles = 30   # maximum number of heartbeats simulated per parameter set

# Script version of the parallel run: joblib runs `run_case` on the rows of
# parameter_data_frame in 5 parallel jobs, with a tqdm progress bar.
# successful_runs = joblib.Parallel(n_jobs=5)(joblib.delayed(run_case)(row, path_out, N_cycles, dt) for _, row in tqdm(parameter_data_frame.loc[args.restart_from:].iterrows(), total=len(parameter_data_frame.loc[args.restart_from:])))

# Clear any previous output files, then run the simulations
os.makedirs(output_path, exist_ok=True)
for old_file in glob.glob(os.path.join(output_path, '*')):
    os.remove(old_file)
test = circ_utils.run_in_parallel(output_path, N_cycles, dt, parameter_data_frame)

Load the pressure traces produced by the simulations into a dataframe:

In [None]:
pressure_traces_df = circ_utils.simulation_loader(output_path)

Plot the first 100 time points of each pressure trace:

In [None]:
for _, row in pressure_traces_df.iterrows():
    plt.plot(row.values[:100])
    
plt.show()