# Segmenting Cardiomyocyte Model Parameters

### *Personal Note*

*The data and analysis included here were part of a project in which I evaluated predictions of cardiomyocyte action potential models. The objective here is to separate different hypotheses from a population of models. Additionally, I review several clustering strategies. The model features have biophysical and physiological meaning, but I also include a principal component step. The data set is the output from training multiple populations of models using evolutionary optimization algorithms. I hope you find it interesting.*

## Introduction

The cardiac action potential (AP) is the electrical waveform that is conducted through individual cells during a cardiac cycle. Cardiac electrophysiology is a complex system consisting of electrically active proteins, organelles, and tonal signalling molecules that work in conjunction to produce the AP. Each individual cell expresses different quantities of each active component, which influences the risk of an arrhythmia occurring. It is quite challenging to determine the component quantities in individual cells, so it is useful to train AP models on data sets collected from specific cells to estimate these quantities. I used an evolutionary algorithm (EA) to optimize 14 parameters and pooled the best-performing models into a single data set. In this analysis I use clustering algorithms to segment the solutions proposed by the EA. I expect that independent EA optimizations will produce independent solutions, but if multiple EA runs converge on the same solution, that might suggest that a global minimum was discovered.

Let's look at the data!

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

# 'cluster_tools' is a collection of functions with my preferred defaults.
import cluster_tools as ct

In [6]:
c5_params = pd.read_csv('./c5_EA_params_20220222.txt', delimiter=' ')
c5_params.head()

| Unnamed: 0 | phi | G_K1 | G_Kr | G_Ks | G_to | P_CaL | G_CaT | G_Na | G_F | K_NaCa | P_NaK | G_b_Na | G_b_Ca | G_PCa | fitness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.775628 | 0.675315 | 2.650434 | 4.334555 | 0.009196563 | 0.288796 | 0.389463 | 0.335001 | 1.512508 | 2.888285 | 0.391604 | 0.035992 | 0.586479 | 0.609346 | 132.473587 |
| 1 | 0.775628 | 0.675315 | 2.650434 | 4.334555 | 0.02450242 | 0.292615 | 0.419559 | 0.349077 | 1.596385 | 2.992886 | 0.438024 | 0.035335 | 0.526647 | 0.609346 | 133.175263 |
| 2 | 0.775628 | 0.675315 | 2.650434 | 4.334555 | 0.01432695 | 0.293383 | 0.406567 | 0.335001 | 1.512135 | 2.901646 | 0.370172 | 0.035933 | 0.496448 | 0.609346 | 133.427304 |
| 3 | 0.775628 | 0.25414 | 2.650434 | 4.334555 | 8.637277e-17 | 0.315902 | 0.475337 | 0.310485 | 1.703607 | 2.99882 | 0.408051 | 0.020221 | 0.154906 | 0.748181 | 133.932624 |
| 4 | 0.775628 | 0.675315 | 2.640649 | 4.850079 | 0.05976922 | 0.30542 | 0.500353 | 0.341133 | 1.579939 | 2.898543 | 0.324433 | 0.025676 | 0.168743 | 0.734143 | 135.36983 |
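
One way to quantify how many values are shared between rows is to count the distinct values in each column. The sketch below is not part of the original analysis. It uses a small stand-in DataFrame, `head` (a hypothetical name), built from three columns of the five rows shown above; the same calls should work on the full `c5_params` table.

```python
import pandas as pd

# Hypothetical stand-in for c5_params: three columns copied from the
# five rows displayed above.
head = pd.DataFrame({
    "phi":  [0.775628, 0.775628, 0.775628, 0.775628, 0.775628],
    "G_K1": [0.675315, 0.675315, 0.675315, 0.25414, 0.675315],
    "G_Kr": [2.650434, 2.650434, 2.650434, 2.650434, 2.640649],
})

# Number of distinct values per parameter; a low count means many rows
# share exactly the same value.
distinct = head.nunique()

# Fraction of rows carrying the most common value of each parameter.
# value_counts sorts in descending order, so the first entry is the largest.
top_share = head.apply(lambda col: col.value_counts(normalize=True).iloc[0])

print(distinct)
print(top_share)
```

Exact equality is used here because the repeated values are presumably copied from parent to offspring rather than recomputed; if values differed only by rounding, the columns would need to be rounded before counting.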


### Data definitions and observations

There are several things to point out here that will help build an understanding of the data. First, the first five values of the variable *phi* ($\phi$) are identical up to the sixth digit, and four of the five *G_K1* values are also identical. This similarity between individuals holds for the other variables as well, and is expected from optimizations using evolutionary algorithms. As the name of the algorithm implies, EAs select the fittest individuals to reproduce and populate the next generation. The last column, named *fitness*, is the