This notebook provides an example of downstream analysis of a trained MOFA+ model in Python.

The [PBMC10K](https://support.10xgenomics.com/single-cell-gene-expression/datasets) dataset is used as an example; it may be familiar to users of Seurat or scanpy. It is a 3' single-cell RNA sequencing dataset, so only one layer of information (view) is available: gene expression.

For this tutorial, only a trained PBMC10k model is needed, and it can be downloaded [here](https://github.com/gtca/mpp/tree/master/data/models/pbmc10k.hdf5).

In [1]:
# Set root folder as a working directory
import os
os.chdir("../")

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
from matplotlib import rcParams
rcParams['figure.dpi'] = 200

# Connect to the model

In [4]:
import mpp
m = mpp.mofa_model("data/models/pbmc10k.hdf5")

This creates a connection to an HDF5 file. It is good practice to close the connection by calling `m.close()` once access is no longer needed.
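
One way to make sure `close()` is always called, even if an error occurs in between, is `contextlib.closing` from the standard library, which works with any object exposing a `close()` method. The sketch below uses a hypothetical stand-in class (`StandInModel`, not part of `mpp`) so that it runs without the model file; whether the model object itself can be used directly in a `with` statement is not assumed here.

```python
from contextlib import closing


class StandInModel:
    """Hypothetical stand-in for a model object; it only mimics .close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real model would release its HDF5 file handle here
        self.closed = True


# closing() calls .close() on exit from the block, even if an exception is raised
with closing(StandInModel()) as model:
    pass  # work with the model here

print(model.closed)  # True
```

With the real model, the same pattern would presumably read `with closing(mpp.mofa_model("data/models/pbmc10k.hdf5")) as m: ...`. The rest of this notebook keeps `m` open, since later cells rely on it.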

## Basic model features

We can quickly check the overall dimensions of the model:

In [5]:
print(f"""\
Cells: {m.shape[0]}
Features: {m.shape[1]}
Groups of cells: {', '.join(m.groups)}
Views: {', '.join(m.views)}
""")

Cells: 10636
Features: 1921
Groups of cells: 0, 1, 2, 3, 4, 5
Views: rna



The core of the trained model consists of two matrices: Z (factors) and W (weights, or loadings). The expectations for Z and W are easily accessible in different formats:

In [6]:
# HDF5 group
print("HDF5 group:\n", m.weights)

# np.ndarray
print("\nnp.ndarray:\n", m.get_weights()[:3,:5])

# pd.DataFrame
print("\npd.DataFrame:\n", m.get_weights(df=True).iloc[:3,:5])

HDF5 group:
 <HDF5 group "/expectations/W" (1 members)>

np.ndarray:
 [[ 6.67683589e-03  1.74740592e-03  5.62395311e-04  1.28811446e-03
  -1.71834763e-04]
 [ 1.71035753e+00 -2.06513377e+00  6.00713378e+00  9.03268548e-03
   1.17228368e-01]
 [ 2.52907104e+00 -1.53323801e+00  1.94592505e+00  3.07668137e+00
   1.54433896e+00]]

pd.DataFrame:
          Factor1   Factor2   Factor3   Factor4   Factor5
KLHL17  0.006677  0.001747  0.000562  0.001288 -0.000172
HES4    1.710358 -2.065134  6.007134  0.009033  0.117228
ISG15   2.529071 -1.533238  1.945925  3.076681  1.544339


The same works for factors: try executing `m.factors`, `m.get_factors()`, and `m.get_factors(df=True)`.
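
To make the roles of Z and W more concrete: in a factor model such as MOFA+, the data matrix is approximated by the product of the factors and the transposed weights. The sketch below illustrates this with small random matrices rather than the trained model. The toy sizes and variable names are made up for illustration, and the cells x factors orientation of Z is an assumption (the features x factors orientation of W matches the weights output above).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration only, not the PBMC10k dimensions
n_cells, n_features, n_factors = 6, 4, 2

Z = rng.normal(size=(n_cells, n_factors))     # factors: cells x factors (assumed orientation)
W = rng.normal(size=(n_features, n_factors))  # weights: features x factors

# Low-rank reconstruction of the data matrix: cells x features
Y_hat = Z @ W.T

print(Y_hat.shape)  # (6, 4)
```

Under the same assumption about orientation, `m.get_factors() @ m.get_weights().T` should give the corresponding reconstruction for the trained model, with a shape matching `m.shape`.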