dynamical-landscape-inference

This repository demonstrates the application of the PLNN architecture described in "Dynamical systems theory informed learning of cellular differentiation landscapes" and implemented in a separate GitHub repository: https://github.com/AddisonHowe/plnn.

Setup

Basic setup, without GPU acceleration:

conda create -p ./env python=3.9 jax=0.4 numpy=1.26 matplotlib=3.8 scikit-learn=1.5 pytorch=2.0 torchvision equinox=0.11 optax=0.1 pyyaml=6.0 tqdm ipykernel pytest
conda activate ./env
pip install diffrax==0.6.0

For GPU support:

conda create -p ./env python=3.9 numpy=1.25 matplotlib=3.8 scikit-learn=1.5 pytest=7.4 cuda-compat=12.4 pyyaml=6.0 tqdm ipykernel ipywidgets --yes
conda activate ./env
pip install --upgrade pip
pip install jax[cuda12] optax==0.1.7 diffrax==0.6.0 equinox==0.11.5 torch==2.0.1 torchvision torchaudio

Then, install the PLNN project at https://github.com/AddisonHowe/plnn:

pip install git+https://github.com/AddisonHowe/plnn.git@v0.1.0-alpha

Synthetic data generation

We first generate synthetic data using the shell scripts available in data/training_data/basic. These scripts run the PLNN module plnn/data_generation/generate_data.py. Each data-generating script is described below; each produces a subdirectory of data/training_data/basic containing three datasets: training, validation, and testing.

Binary choice landscape

The binary choice landscape is given by $$\phi(x,y;\boldsymbol{\tau})=x^4+y^4+y^3-4x^2y+y^2+\tau_1x+\tau_2y.$$ We assume that two signals, $s_1$ and $s_2$, map identically to the tilt parameters, so that in terms of the signal $$\phi(x,y;\boldsymbol{s})=x^4+y^4+y^3-4x^2y+y^2+s_1x+s_2y.$$

  • data_phi1_1[abc]
    $T=100$,
    $\Delta T=[10,50,100]$,
    $\sigma=0.1$,
    $N_{cells}=500$,
    $N_{train}=[100,500,1000]$,
    $N_{valid}=N_{test}=0.3N_{train}$,
    signal switch range: $[0.1T, 0.9T]$,
    $T_{burn}=0.1T$,
    $s_{burn}=s|_{t=0}$,
    $x_0=(0, -0.5)$

  • data_phi1_2[abc]
    $T=20$,
    $\Delta T=[5,10,20]$,
    $\sigma=0.1$,
    $N_{cells}=500$,
    $N_{train}=[100,200,400]$,
    $N_{valid}=N_{test}=0.2N_{train}$,
    signal switch range: $[0.1T, 0.9T]$,
    $T_{burn}=0.1T$,
    $s_{burn}=s|_{t=0}$,
    $x_0=(0, -0.5)$

  • data_phi1_3[abc]
    $T=20$,
    $\Delta T=[5,10,20]$,
    $\sigma=0.1$,
    $N_{cells}=50$,
    $N_{train}=[50,100,200]$,
    $N_{valid}=N_{test}=0.2N_{train}$,
    signal switch range: $[0.1T, 0.15T]$,
    $T_{burn}=0.1T$,
    $s_{burn}=s|_{t=0}$,
    $x_0=(0, -0.5)$

  • data_phi1_4[abc]
    $T=20$,
    $\Delta T=[5,10,20]$,
    $\sigma=0.1$,
    $N_{cells}=200$,
    $N_{train}=[50,100,200]$,
    $N_{valid}=N_{test}=0.2N_{train}$,
    signal switch range: $[0.1T, 0.15T]$,
    $T_{burn}=0.1T$,
    $s_{burn}=s|_{t=0}$,
    $x_0=(0, -0.5)$
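For reference, the kind of simulation these scripts perform can be sketched in a few lines of numpy. This is a minimal illustration of Euler–Maruyama integration of $dX = -\nabla\phi\,dt + \sigma\,dW$ on the binary choice landscape with a constant signal; it is not the project's generate_data.py, which has its own CLI and handles signal switching, burn-in, and dataset output.

```python
import numpy as np

def grad_phi1(x, y, s1, s2):
    """Gradient of the binary choice landscape phi1."""
    dx = 4 * x**3 - 8 * x * y + s1
    dy = 4 * y**3 + 3 * y**2 - 4 * x**2 + 2 * y + s2
    return dx, dy

def simulate(n_cells=500, T=20.0, dt=0.01, sigma=0.1,
             s=(0.0, 0.0), x0=(0.0, -0.5), seed=0):
    """Euler-Maruyama integration of dX = -grad(phi) dt + sigma dW."""
    rng = np.random.default_rng(seed)
    xs = np.tile(np.asarray(x0, dtype=float), (n_cells, 1))
    for _ in range(int(T / dt)):
        dx, dy = grad_phi1(xs[:, 0], xs[:, 1], *s)
        drift = -np.stack([dx, dy], axis=1)
        noise = sigma * np.sqrt(dt) * rng.standard_normal(xs.shape)
        xs = xs + drift * dt + noise
    return xs

cells = simulate()
print(cells.shape)  # (500, 2)
```

With zero signal, cells initialized at $(0, -0.5)$ drift toward one of the two symmetric attractors of the landscape, illustrating the binary choice.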

Binary flip landscape

The binary flip landscape is given by $$\phi(x,y;\boldsymbol{\tau})=x^4+y^4+x^3-2xy^2-x^2+\tau_1x+\tau_2y.$$ Again, we assume that two signals, $s_1$ and $s_2$, map identically to the tilt parameters, so that $$\phi(x,y;\boldsymbol{s})=x^4+y^4+x^3-2xy^2-x^2+s_1x+s_2y.$$

  • data_phi2_1[abc]
    $T=100$,
    $\Delta T=[10,50,100]$,
    $\sigma=0.3$,
    $N_{cells}=500$,
    $N_{train}=[100,500,1000]$,
    $N_{valid}=N_{test}=0.3N_{train}$,
    signal switch range: $[0.1T, 0.9T]$,
    $T_{burn}=0.05T$,
    $s_{burn}=(-0.25, 0)$,
    $x_0=(-1, 0)$
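To see why this landscape is called a "flip", one can locate its attractors numerically. The following is a quick sketch (not part of the repository) that runs plain gradient descent on $\phi_2$ with zero signal; it finds one minimum on the left at $y=0$ and a symmetric pair of minima on the right:

```python
import numpy as np

def grad_phi2(p, s1=0.0, s2=0.0):
    """Gradient of the binary flip landscape phi2."""
    x, y = p
    dx = 4 * x**3 + 3 * x**2 - 2 * y**2 - 2 * x + s1
    dy = 4 * y**3 - 4 * x * y + s2
    return np.array([dx, dy])

def descend(p0, lr=0.01, steps=5000):
    """Plain gradient descent to a local minimum of phi2."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = p - lr * grad_phi2(p)
    return p

# Zero signal: one attractor on the left, a symmetric pair on the right.
left = descend([-1.0, 0.1])
up = descend([0.5, 0.5])
down = descend([0.5, -0.5])
print(np.round(left, 3), np.round(up, 3), np.round(down, 3))
```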

Quadratic potential

The quadratic potential is given by $$\phi(x,y;\boldsymbol{\tau})=\frac{1}{4}x^2 + \frac{1}{9}y^2 +\tau_1x+\tau_2y.$$ Here too we assume that two signals, $s_1$ and $s_2$, map identically to the tilt parameters, so that $$\phi(x,y;\boldsymbol{s})=\frac{1}{4}x^2 + \frac{1}{9}y^2 +s_1x+s_2y.$$

  • data_phiq_1a
    $T=100$,
    $\Delta T=10$,
    $\sigma=0.5$,
    $N_{cells}=500$,
    $N_{train}=100$,
    $N_{valid}=N_{test}=0.3N_{train}$,
    signal switch range: $[0.1T, 0.9T]$,
    $T_{burn}=0.1T$,
    $s_{burn}=s|_{t=0}$,
    $x_0=(0, -0.5)$

  • data_phiq_2a
    $T=100$,
    $\Delta T=10$,
    $\sigma=0.1$,
    $N_{cells}=500$,
    $N_{train}=100$,
    $N_{valid}=N_{test}=0.3N_{train}$,
    signal switch range: $[0.1T, 0.9T]$,
    $T_{burn}=0.1T$,
    $s_{burn}=s|_{t=0}$,
    $x_0=(0, -0.5)$
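Because the quadratic potential is an exactly solvable (Ornstein–Uhlenbeck) case, its minimum is available in closed form: setting $\nabla\phi_q = 0$ gives $x^* = -2s_1$, $y^* = -\tfrac{9}{2}s_2$. A quick consistency check, not part of the repository:

```python
import numpy as np

def grad_phiq(p, s1, s2):
    """Gradient of the quadratic potential phiq."""
    x, y = p
    return np.array([x / 2 + s1, 2 * y / 9 + s2])

def minimizer(s1, s2):
    """Closed-form minimizer: grad = 0 gives x = -2 s1, y = -9 s2 / 2."""
    return np.array([-2 * s1, -9 * s2 / 2])

p_star = minimizer(0.3, -0.2)
print(p_star, grad_phiq(p_star, 0.3, -0.2))  # gradient is exactly zero
```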

PLNN model training and evaluation

We apply the PLNN model training procedure to the synthetic datasets. The directories within data/model_training_args contain a number of tsv files, each specifying the arguments for one instance of model training. The list of these tsv files is included below, and a table summarizing the arguments contained in each can be found at data/model_training_args/README.md. (This table can be regenerated from the argument files with the command sh data/model_training_args/_update_readme.sh.)

The shell script scripting/echo_training_cmd_from_runargs.sh takes as input a path specifying one of these argument files, and prints the corresponding python command that will run the training procedure. Note: The argument files contained here assume access to a GPU.
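The idea of turning an argument file into a runnable command can be illustrated with a short Python sketch. Note that the two-column `flag<TAB>value` layout and the program name below are hypothetical, chosen only for illustration; the repository's actual format is defined by scripting/echo_training_cmd_from_runargs.sh and the files in data/model_training_args.

```python
import csv
import io

def args_to_command(tsv_text, program="python train_model.py"):
    """Build a command line from a hypothetical two-column (flag, value) TSV."""
    parts = [program]
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        flag, value = row[0], row[1] if len(row) > 1 else ""
        parts.append(f"--{flag} {value}".strip())
    return " ".join(parts)

example = "num_epochs\t500\nlearning_rate\t1e-3"
print(args_to_command(example))
# python train_model.py --num_epochs 500 --learning_rate 1e-3
```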

For each argument file, we can train multiple models in order to assess the variation across training runs. A full list of the trained models can be found at data/trained_models/README.md. The training process generates an output model directory containing log files and data saved over the course of training.

Model training argument files

Binary choice landscape

Binary flip landscape

Quadratic potential

Model evaluation

After the training process, we evaluate the resulting model using the notebooks contained in notebooks/model_evaluation/. These notebooks create a number of plots detailing the evolution of the model over the course of training and the final inferred system. The bash scripts located in scripting/model_evaluation/ automate this process, and can be run as follows:

# Evaluate models listed in...

# scripting/model_evaluation/arglist_nb_eval_model_plnn_synbindec.tsv
sh scripting/model_evaluation/run_all_nb_eval_model_plnn_synbindec.sh

# scripting/model_evaluation/arglist_nb_eval_model_plnn_quadratic.tsv
sh scripting/model_evaluation/run_all_nb_eval_model_plnn_quadratic.sh

# scripting/model_evaluation/arglist_nb_eval_model_alg_synbindec.tsv
sh scripting/model_evaluation/run_all_nb_eval_model_alg_synbindec.sh

# scripting/model_evaluation/arglist_nb_eval_model_alg_quadratic.tsv
sh scripting/model_evaluation/run_all_nb_eval_model_alg_quadratic.sh

These commands will run the model evaluation notebook on all models listed in the corresponding arglist file, and generate output in the directory data/model_evaluation/.

We may also wish to examine the performance of a trained model on a particular dataset. The model_eval entrypoint allows us to do so, as follows:

model_eval \
    --basedir data/trained_models/<plnn_synbindec> \  # location of trained model
    --modeldir <modelname> \                     # the model directory
    --datdirbase data/training_data/<subdir> \   # base directory containing data
    --datdir <datdir> \                          # data directory
    --dataset <train-test-valid> \               # subdirectory within <datdir>
    --dt0 <dt0> \                                # initial model timestep to use
    --batch_size 20 \       # batch size
    --nresamp 1 \           # num resamplings to perform for each datapoint
    --nreps 10 \            # num simulations to run for each (re)sampled datapoint
    --outdir <outdir> \     # directory to save output
    --nosuboutdir           # save output directly in <outdir>

Automated figure generation

The images generated in the evaluation process can be arranged into a set of pdfs in an automated procedure, using the scripts available in the scripting/autofig/ directory. First, create or modify an environment file .env, and set the following environment variables:

# Filename: dynamical-landscape-inference/.env
ILLUSTRATOR_PATH=<path/to/adobe/illustrator.app>
PROJ_DIR_TILDE=<~/path/to/dynamical-landscape-inference/>  # using tilde for home directory

Then, run the figure generation scripts:

sh scripting/autofig/alg_quadratic/generate_ai_files_from_template.sh
sh scripting/autofig/alg_synbindec/generate_ai_files_from_template.sh
sh scripting/autofig/facs/generate_ai_files_from_template.sh
sh scripting/autofig/plnn_quadratic/generate_ai_files_from_template.sh
sh scripting/autofig/plnn_synbindec/generate_ai_files_from_template.sh

Distorted landscapes

By transforming the synthetic data generated above, we can produce training data that corresponds to observing 2-dimensional gradient dynamics within a curved submanifold of $\mathbb{R}^3$. The datasets located in data/training_data/distortions/paraboloids/ are derived from the original datasets in data/training_data. The values stored in the file xs.npy in each original dataset are treated as $(u,v)$ points, and we transform these points into $(x,y)$ values using the following transformation: $$ \begin{aligned} x &= \sinh\left(\sqrt{\frac{2}{\kappa_1}}\,u\right) + \sinh\left(\sqrt{\frac{2}{\kappa_2}}\,v\right), \\ y &= \sinh\left(\sqrt{\frac{2}{\kappa_1}}\,u\right) - \sinh\left(\sqrt{\frac{2}{\kappa_2}}\,v\right). \end{aligned} $$ This transformation corresponds to a parameterization of the hyperbolic paraboloid, or saddle, $z=h(x,y)=\frac{1}{2}\kappa_1 x^2 - \frac{1}{2}\kappa_2 y^2$. In the $(u,v)$ space, the dynamics are derived from the gradient of a potential. The observed data, however, are constructed by projecting the transformed $(x,y,z)$ points onto the $(x,y)$ plane, thereby introducing a distortion due to the curvature of the surface.
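The forward map and its inverse follow directly from the formulas: since $x+y$ and $x-y$ isolate the two $\sinh$ terms, the inverse applies $\operatorname{arcsinh}$ to each half-sum. A minimal sketch (function names are ours, not the repository's):

```python
import numpy as np

def uv_to_xy(u, v, k1, k2):
    """Forward map (u, v) -> (x, y) used to distort the datasets."""
    a = np.sinh(np.sqrt(2 / k1) * u)
    b = np.sinh(np.sqrt(2 / k2) * v)
    return a + b, a - b

def xy_to_uv(x, y, k1, k2):
    """Inverse map, recovering the original (u, v) coordinates."""
    u = np.sqrt(k1 / 2) * np.arcsinh((x + y) / 2)
    v = np.sqrt(k2 / 2) * np.arcsinh((x - y) / 2)
    return u, v

u, v = 0.4, -0.7
x, y = uv_to_xy(u, v, k1=1.0, k2=2.0)
print(np.allclose(xy_to_uv(x, y, 1.0, 2.0), (u, v)))  # True
```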

Figure generation

The figures in the manuscript and supplement can be reproduced using the scripts available in the figures/ directory. In figures/manuscript/ there are a number of python scripts that generate the individual plots appearing in the main figures of the manuscript. The corresponding shell scripts run these python files with the appropriate hyperparameters specified. Note that in order to generate the plots for Figures 6 and 7, one must set the environment variable MESC_PROJ_PATH in the file .env to point to the mescs-invitro-facs project directory (available at https://github.com/AddisonHowe/mescs-invitro-facs) containing the FACS data used in those plots.

# Generate all plots appearing in primary figures
sh figures/make_all_manuscript_plots.sh

# The command above runs the following scripts:
# sh figures/manuscript/run_make_fig1_landscape_models.sh
# sh figures/manuscript/run_make_fig3_synthetic_training.sh
# sh figures/manuscript/run_make_fig4_sampling_rate_sensitivity.sh
# sh figures/manuscript/run_make_fig5_dimred_schematic.sh
# export MESC_PROJ_PATH=<path/to/mesc-invitro-facs>
# sh figures/manuscript/run_make_fig6_facs_training.sh
# sh figures/manuscript/run_make_fig7_facs_evaluation.sh

A similar set of commands is available to generate plots appearing in the supplementary information.

# Generate all plots appearing in supplemental figures
sh figures/make_all_supplement_plots.sh

# The command above runs the following scripts:
# sh run_make_figure_s1

Acknowledgments

This work was inspired by the work of Sáez et al. in Statistically derived geometrical landscapes capture principles of decision-making dynamics during cell fate transitions.

References

[1] Sáez M, Blassberg R, Camacho-Aguilar E, Siggia ED, Rand DA, Briscoe J. Statistically derived geometrical landscapes capture principles of decision-making dynamics during cell fate transitions. Cell Syst. 2022 Jan 19;13(1):12-28.e3. doi: 10.1016/j.cels.2021.08.013. Epub 2021 Sep 17. PMID: 34536382; PMCID: PMC8785827.
