Example notebook demonstrating scMaui on single-cell multi-omics toy data.

In [1]:
import os
import pkg_resources
from scmaui.data import load_data
from scmaui.data import SCDataset
from scmaui.utils import get_model_params
from scmaui.ensembles import EnsembleVAE

## 1) Loading data

In [2]:
data_path = pkg_resources.resource_filename('scmaui', 'resources/')
gtx = os.path.join(data_path, 'gtx.h5ad')
peaks = os.path.join(data_path, 'peaks.h5ad')

`peaks.h5ad` and `gtx.h5ad` each contain 100 cells, 50 of which are shared between the two files.

In [3]:
adatas = load_data([gtx, peaks], names=['gtx', 'peaks'])
adatas

{'input': [AnnData object with n_obs × n_vars = 100 × 35300
      var: 'interval', 'genename', 'ensid', 'genome', 'feature_type', 'chrom', 'start', 'end', 'view'
      uns: 'view'
      obsm: 'mask',
  AnnData object with n_obs × n_vars = 100 × 95482
      obs: 'logreads', 'sample'
      var: 'interval', 'genename', 'ensid', 'genome', 'feature_type', 'chrom', 'start', 'end', 'view'
      uns: 'view'
      obsm: 'mask'],
 'output': [AnnData object with n_obs × n_vars = 100 × 35300
      var: 'interval', 'genename', 'ensid', 'genome', 'feature_type', 'chrom', 'start', 'end', 'view'
      uns: 'view'
      obsm: 'mask',
  AnnData object with n_obs × n_vars = 100 × 95482
      obs: 'logreads', 'sample'
      var: 'interval', 'genename', 'ensid', 'genome', 'feature_type', 'chrom', 'start', 'end', 'view'
      uns: 'view'
      obsm: 'mask']}

We can construct a dataset that considers only the intersection of cells (`union=False`):

In [4]:
dataset = SCDataset(adatas, losses=['negbinom', 'negmul'], union=False)
dataset

Inputs: non-missing/samples x features
	gtx: 50/50 x 35300
	peaks: 50/50 x 95482
Outputs:
	gtx: 50/50 x 35300
	peaks: 50/50 x 95482
0 Adversarials: []
0 Conditionals: []

Alternatively, we can use the union of cells (`union=True`), in which case cells with a missing modality are kept for the analysis.

We can also specify covariates to be used as conditional inputs or as adversarial training labels:

In [5]:
dataset = SCDataset(adatas, losses=['negbinom', 'negmul'], union=True, adversarial=['logreads'], conditional=['sample'])
dataset

Inputs: non-missing/samples x features
	gtx: 100/150 x 35300
	peaks: 100/150 x 95482
Outputs:
	gtx: 100/150 x 35300
	peaks: 100/150 x 95482
1 Adversarials: ['logreads']
1 Conditionals: ['sample']

In [6]:
dataset = SCDataset(adatas, losses=['negbinom', 'negmul'], union=True)
dataset

Inputs: non-missing/samples x features
	gtx: 100/150 x 35300
	peaks: 100/150 x 95482
Outputs:
	gtx: 100/150 x 35300
	peaks: 100/150 x 95482
0 Adversarials: []
0 Conditionals: []
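
The sample counts in the three summaries above follow from set arithmetic on the cell barcodes. The sketch below reproduces them with plain Python sets. The barcode names are made up for illustration, and the code does not call scMaui.

```python
# Hypothetical barcodes: 100 cells per modality, 50 of them shared.
gtx_cells = {f"cell{i}" for i in range(100)}          # cell0 .. cell99
peaks_cells = {f"cell{i}" for i in range(50, 150)}    # cell50 .. cell149

shared = gtx_cells & peaks_cells      # cells kept with union=False
all_cells = gtx_cells | peaks_cells   # cells kept with union=True

n_intersection = len(shared)
n_union = len(all_cells)

# With union=True, each modality is observed for only 100 of the 150 cells,
# which corresponds to the "100/150" entries in the dataset summary.
n_missing_gtx = len(all_cells - gtx_cells)
n_missing_peaks = len(all_cells - peaks_cells)

print(n_intersection, n_union, n_missing_gtx, n_missing_peaks)
```

The intersection has 50 cells (the `50/50` entries), while the union has 100 + 100 - 50 = 150 cells, of which 100 are non-missing in each modality.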

## 2) Instantiate a scMaui model

First we obtain some default parameters for the model, which are informed by the dataset dimensions:

In [7]:
params = get_model_params(dataset)
params

OrderedDict([('nunits_encoder', 32),
             ('nlayers_encoder', 5),
             ('nunits_decoder', 20),
             ('nlayers_decoder', 1),
             ('dropout_input', 0.1),
             ('dropout_encoder', 0.0),
             ('dropout_decoder', 0.0),
             ('nunits_adversary', 128),
             ('nlayers_adversary', 2),
             ('nlatent', 10),
             ('nmixcomp', 1),
             ('input_modality', ['gtx', 'peaks']),
             ('output_modality', ['gtx', 'peaks']),
             ('adversarial_name', []),
             ('adversarial_dim', []),
             ('adversarial_type', []),
             ('conditional_name', []),
             ('conditional_dim', []),
             ('conditional_type', []),
             ('losses', ['negbinom', 'negmul'])])

You can adjust the default settings by overwriting entries in this dictionary before passing it to the model.
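
As a minimal sketch of such an override, the block below uses a stand-in dictionary holding a few of the defaults printed above, so it runs without scMaui. With the real package, the `params` object returned by `get_model_params` would be modified in the same way. The new values (20 latent dimensions, 2 decoder layers) are arbitrary choices for illustration, not recommendations.

```python
from collections import OrderedDict

# Stand-in for the dictionary returned by get_model_params(dataset);
# the entries are copied from the output shown above.
params = OrderedDict([
    ('nunits_encoder', 32),
    ('nlayers_encoder', 5),
    ('nunits_decoder', 20),
    ('nlayers_decoder', 1),
    ('nlatent', 10),
])

# Overwrite selected entries in place (illustrative values only).
params['nlatent'] = 20
params['nlayers_decoder'] = 2

print(params['nlatent'], params['nlayers_decoder'])
```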

In [8]:
ensemble = EnsembleVAE(params=params)

using vae


## 3) Fit a model

In [9]:
ensemble.fit(dataset, epochs=10)

Run model 1
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


[<tensorflow.python.keras.callbacks.History at 0x7f888edf41d0>]

In [10]:
ensemble.summary()

Model: "encoder"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
modality_gtx (InputLayer)       [(None, 35300)]      0                                            
__________________________________________________________________________________________________
modality_peaks (InputLayer)     [(None, 95482)]      0                                            
__________________________________________________________________________________________________
dropout (Dropout)               (None, 35300)        0           modality_gtx[0][0]               
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 95482)        0           modality_peaks[0][0]             
____________________________________________________________________________________________

## 4) Obtain latent features

In [11]:
latent, latent_list = ensemble.encode(dataset)

In [12]:
latent.head()

                          D0-0        D0-1       D0-2       D0-3       D0-4        D0-5      D0-6       D0-7        D0-8       D0-9
AACGACAAGGACCGCT-1  -22.525711  -23.246502  11.873528  17.557114   5.350068   -4.672046  6.651594  -4.901463  -11.669211  31.645901
AACTCACAGGGCTAAA-1   -25.07139  -23.022873  18.849289  23.621029    2.92356   -2.218641  9.376145  -6.429822   -7.987369  21.951593
AACTTAGTCACTTCAT-1  -20.167135  -18.211704  12.633711   6.566017   4.202653   -1.967656  -0.58773  -3.013824   -4.554582  10.412749
AAGTAGCCAGTTTGGC-1  -21.676241  -13.835787   8.214723  14.386309   9.693688  -14.296661  1.591737   -3.87244  -15.005081  29.036224
AAACCGCGTGAGGTAG-1  -40.491158  -40.159657  19.214733  30.098549  11.585256  -10.581953  9.407765  -10.81687  -22.702629  60.785679


## 5) Obtain feature imputation

In [13]:
predicted = ensemble.impute(dataset)

In [14]:
predicted[1].shape

(150, 95482)

## 6) Obtain a feature importance attribution

In [15]:
selected_cells = latent.index.tolist()[:10]

In [16]:
attributed = ensemble.explain(dataset, cellids=selected_cells)

In [17]:
attributed[0]

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])