# Dimension Changer

Prerequisites:
- You have preprocessed your adata with a time-smoothing method and saved it in the data/processed directory [notebook: 3-time-pseudotime], with:
    - gene values on _X_ (adata.X)
    - PHATE embeddings of the data on _obsm_ (ex.: adata.obsm['X_phate'])
    - any extra metadata on _obs_ (ex.: adata.obs['time_label'], adata.obs['disease_progression'], etc.)
    - pseudotime on _obs_ (ex.: adata.obs['pseudotime'])


In this notebook, we will:
- Train a Dimension Changer network.
- This network will allow us to project from the 50-dimensional PCA space into the 2D PHATE space.
- This network is important for visualization, since we want to retrieve our trajectories over the PHATE dimensions.

Dimension Changer specifications:
- The dimension changer is a simple encoder-decoder that converts from PHATE space into PCA space and vice versa.
- Since we are using a network to learn this mapping, we need to define a loss function. In our case we use a weighted MSE loss, where the weights are the variance of each PC dimension (adata.uns['pca']['variance_ratio']). This pushes the network to prioritize reconstructing the leading PC dimensions.
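The weighting idea described above can be sketched in a few lines of NumPy. This is only an illustration and not the DimChanger implementation: the function name `weighted_mse` and the choice to normalize the weights so they sum to 1 are assumptions made for this example.

```python
import numpy as np

def weighted_mse(pred, target, var_ratio):
    """Squared error per PC dimension, weighted by that dimension's variance ratio.

    pred, target: arrays of shape (n_cells, n_pcs)
    var_ratio:    array of shape (n_pcs,)
    """
    weights = np.asarray(var_ratio, dtype=float)
    weights = weights / weights.sum()  # assumption: weights normalized to sum to 1
    sq_err = (np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)) ** 2
    # Weighted sum over PC dimensions, then mean over cells.
    return float((sq_err * weights).sum(axis=1).mean())

# Toy example: the same unit error costs more on PC1 than on PC3.
target = np.zeros((1, 3))
toy_var_ratio = np.array([0.6, 0.3, 0.1])
err_pc1 = weighted_mse(np.array([[1.0, 0.0, 0.0]]), target, toy_var_ratio)
err_pc3 = weighted_mse(np.array([[0.0, 0.0, 1.0]]), target, toy_var_ratio)
```

With these toy weights, an error on the first PC is penalized six times more than the same error on the third, which is the behavior the weighted loss is meant to encourage.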

In [1]:
import scanpy as sc
import os

PROCESSED_DATA_DIR = os.path.join('../../data', 'processed')
DIMENSION_CHANGER_DIR = os.path.join('../../dimension_changer')
os.makedirs(DIMENSION_CHANGER_DIR, exist_ok=True)

print(PROCESSED_DATA_DIR)
print(DIMENSION_CHANGER_DIR)

../../data/processed
../../dimension_changer


We start by loading the data, which contains all the necessary variables.

In [2]:
adata = sc.read(os.path.join(PROCESSED_DATA_DIR, 'adata_time.h5ad'))
adata

AnnData object with n_obs × n_vars = 17944 × 18019
    obs: 'time_label', 'pseudotime'
    uns: 'pca'
    obsm: 'X_pca', 'X_phate'
    varm: 'PCs'

Now we define the variables needed to train the Dimension Changer.

In [3]:
X_phate = adata.obsm['X_phate']
X_pca = adata.obsm['X_pca']
var_ratio = adata.uns['pca']['variance_ratio']
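Before training, it may help to confirm that the three arrays line up. The helper below is a hypothetical sketch (`check_dimchanger_inputs` is not part of omics_toolbox) and is demonstrated on synthetic arrays with the shapes this notebook expects; in practice it would be called on `X_phate`, `X_pca`, and `var_ratio`.

```python
import numpy as np

def check_dimchanger_inputs(X_phate, X_pca, var_ratio):
    """Validate shapes and return (phate_dim, pca_dim). Hypothetical helper."""
    X_phate = np.asarray(X_phate)
    X_pca = np.asarray(X_pca)
    var_ratio = np.asarray(var_ratio)
    if X_phate.ndim != 2 or X_pca.ndim != 2:
        raise ValueError("embeddings must be 2D arrays (cells x dimensions)")
    if X_phate.shape[0] != X_pca.shape[0]:
        raise ValueError("X_phate and X_pca must describe the same cells")
    if var_ratio.shape != (X_pca.shape[1],):
        raise ValueError("var_ratio needs one entry per PC dimension")
    if not (np.isfinite(X_phate).all() and np.isfinite(X_pca).all()):
        raise ValueError("embeddings contain NaN or infinite values")
    return X_phate.shape[1], X_pca.shape[1]

# Synthetic stand-ins: 100 cells, 2 PHATE dimensions, 50 PCs.
rng = np.random.default_rng(0)
toy_phate = rng.normal(size=(100, 2))
toy_pca = rng.normal(size=(100, 50))
toy_var_ratio = np.full(50, 1 / 50)
dims = check_dimchanger_inputs(toy_phate, toy_pca, toy_var_ratio)
```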

The DimChanger is currently defined in omics_toolbox/dimchanger.py.

The trained model is saved under DIMENSION_CHANGER_DIR.

In [4]:
from omics_toolbox.dimchanger import DimChanger

dimchanger = DimChanger.train(
    X_phate,
    X_pca,
    None,
    var_ratio,
    save_dir=DIMENSION_CHANGER_DIR,
    train_reducer=False,
)


Seed set to 42
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/Users/joaofelipe/miniconda3/envs/omics_toolbox/lib/python3.10/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:654: Checkpoint directory /Users/joaofelipe/Yale/Omics Toolbox/4_Code/Omics Toolbox/notebooks/MIOFlow/checkpoints exists and is not empty.

  | Name    | Type       | Params | Mode 
-----------------------------------------------
0 | net     | Sequential | 2.9 K  | train
1 | loss_fn | MSELoss    | 0      | train
-----------------------------------------------
2.9 K     Trainable params
0         Non-trainable params
2.9 K     Total params
0.012     Total estimated model params size (MB)
11        Modules in train mode
0         Modules in eval mode


/Users/joaofelipe/miniconda3/envs/omics_toolbox/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:420: Consider setting `persistent_workers=True` in 'val_dataloader' to speed up the dataloader worker initialization.
/Users/joaofelipe/miniconda3/envs/omics_toolbox/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:420: Consider setting `persistent_workers=True` in 'train_dataloader' to speed up the dataloader worker initialization.

Metric decoder/val_loss improved. New best score: 0.349
Metric decoder/val_loss improved by 0.038 >= min_delta = 0.0001. New best score: 0.311
Metric decoder/val_loss improved by 0.013 >= min_delta = 0.0001. New best score: 0.298
Metric decoder/val_loss improved by 0.012 >= min_delta = 0.0001. New best score: 0.287
Metric decoder/val_loss improved by 0.004 >= min_delta = 0.0001. New best score: 0.282
Metric decoder/val_loss improved by 0.001 >= min_delta = 0.0001. New best score: 0.281
Metric decoder/val_loss improved by 0.001 >= min_delta = 0.0001. New best score: 0.280
Metric decoder/val_loss improved by 0.005 >= min_delta = 0.0001. New best score: 0.275
Metric decoder/val_loss improved by 0.001 >= min_delta = 0.0001. New best score: 0.275
Metric decoder/val_loss improved by 0.001 >= min_delta = 0.0001. New best score: 0.273
Metric decoder/val_loss improved by 0.005 >= min_delta = 0.0001. New best score: 0.268

Monitored metric decoder/val_loss did not improve in the last 10 records. Best score: 0.268. Signaling Trainer to stop.
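The log above shows training stopping once decoder/val_loss fails to improve by at least min_delta = 0.0001 for 10 consecutive validation checks. The stopping rule can be sketched in plain Python as follows. The function name `early_stopping_epoch` is hypothetical, the patience and min_delta values are read off the log rather than from the DimChanger source, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=10, min_delta=1e-4):
    """Return (stop_index, best_loss); stop_index is None if training never stops early."""
    best = float("inf")
    wait = 0  # consecutive checks without a sufficient improvement
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best
    return None, best

# Illustrative losses: three improving checks, then a plateau.
toy_losses = [0.5, 0.4, 0.3] + [0.3] * 10
stop_epoch, best_loss = early_stopping_epoch(toy_losses)
```

Here the plateau begins at index 3, so the tenth non-improving check falls at index 12 and the best loss is the last improving value.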


