# mESC analysis using the object-oriented core

We redesigned the core of Cyclum as a friendlier, object-oriented core. It is still under active development, but the major functions already work.

We still use the mESC dataset. For simplicity, we have converted the dataset to TPM.
The original count data are available at ArrayExpress: [E-MTAB-2805](https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2805/). Tools to transform data are also provided and explained in the following sections.
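For readers unfamiliar with TPM, the conversion from counts can be sketched as follows. This is a minimal illustration, not the Cyclum transformation tool: the function name `counts_to_tpm`, the toy counts, and the gene lengths are all hypothetical, and the sketch assumes a cells x genes matrix in which every cell has at least one nonzero count.

```python
import numpy as np

def counts_to_tpm(counts, gene_lengths_bp):
    """Convert a cells x genes count matrix to TPM (illustrative helper).

    gene_lengths_bp holds one length in base pairs per gene (assumed input).
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1000.0
    # Reads per kilobase: normalize each gene by its length.
    rpk = counts / lengths_kb
    # Rescale each cell (row) so that its values sum to one million.
    per_cell_total = rpk.sum(axis=1, keepdims=True)
    return rpk / per_cell_total * 1e6

# Hypothetical toy data: 2 cells, 3 genes.
toy_counts = np.array([[10, 20, 30],
                       [0, 5, 15]])
toy_lengths = np.array([1000, 2000, 3000])
toy_tpm = counts_to_tpm(toy_counts, toy_lengths)
print(toy_tpm.sum(axis=1))  # each cell sums to one million
```

Because each cell is rescaled to the same total, TPM values are comparable across cells with different sequencing depth.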

## Import necessary packages

In [11]:
%matplotlib inline
%load_ext autoreload
%autoreload 1



In [12]:
import pandas as pd
import numpy as np
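import sklearn as skl
import sklearn.preprocessing  # import the submodule explicitly so that skl.preprocessing is available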

In [13]:
import cyclum.tuning
import cyclum.models
from cyclum import writer


## Read data
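The dataset comes with labels, so we load both the expression data and the labels. The labels are used to fit the supervised LDA model below; the unsupervised ICA model does not use them.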

In [14]:
input_file_mask = '/home/shaoheng/Documents/data/mESC/mesc-tpm'

def preprocess(input_file_mask):
    """
    Read in data and perform log transform (log2(x+1)), centering (mean = 0) and scaling (sd = 1).
    """
    tpm = writer.read_df_from_binary(input_file_mask).T
    # log2(x + 1) transform, then standardize each gene (column)
    sttpm = pd.DataFrame(
        data=skl.preprocessing.scale(np.log2(tpm.values + 1)),
        index=tpm.index,
        columns=tpm.columns,
    )

    label = pd.read_csv(input_file_mask + '-label.txt', sep="\t", index_col=0).T
    return sttpm, label

sttpm, label = preprocess(input_file_mask)

In [15]:
sttpm.shape

(288, 38293)

There is no universal convention on whether cells should be columns or rows. Here we require cells to be rows, so `sttpm` has 288 cells (rows) and 38293 genes (columns).
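The orientation and standardization steps can be checked on a small example. The sketch below uses a hypothetical genes x cells table (the gene names, cell names, and values are made up) and applies the same log2(x + 1) transform and `sklearn.preprocessing.scale` call as `preprocess`, without reading any Cyclum files.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale

# Toy TPM table stored as genes x cells (hypothetical names and values).
genes_by_cells = pd.DataFrame(
    [[0.0, 3.0, 15.0, 1.0],
     [7.0, 7.0, 31.0, 0.0],
     [1.0, 0.0, 3.0, 63.0]],
    index=["gene_a", "gene_b", "gene_c"],
    columns=["cell_1", "cell_2", "cell_3", "cell_4"],
)

# Transpose so that each row is a cell and each column is a gene.
cells_by_genes = genes_by_cells.T

# log2(x + 1), then standardize each gene (column) to mean 0 and sd 1.
standardized = pd.DataFrame(
    scale(np.log2(cells_by_genes.values + 1)),
    index=cells_by_genes.index,
    columns=cells_by_genes.columns,
)

print(standardized.shape)  # (4, 3): 4 cells, 3 genes
# Per-gene mean and (population) standard deviation after scaling.
print(np.allclose(standardized.mean(axis=0), 0.0))
print(np.allclose(standardized.std(axis=0, ddof=0), 1.0))
```

Note that `scale` works column by column, which is why the matrix has to be oriented with genes as columns before standardizing.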

## Set up and fit the models

In [None]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

lda_model = LDA(n_components=1)
lda_model.fit(sttpm, label)

In [None]:
lda_pseudotime = lda_model.transform(sttpm)
lda_pseudotime.shape
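As a rough illustration of what the LDA cells produce, the sketch below fits the same scikit-learn class on synthetic data. The 60 x 5 matrix, the three stage names, and the shift applied to the first gene are assumptions made for the example, not properties of the mESC data. LDA is supervised, so the labels enter the fit, and with `n_components=1` the transform returns one value per cell.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Toy data: 60 cells x 5 genes with three hypothetical stage labels.
stages = np.repeat(["G1", "S", "G2M"], 20)
shift = {"G1": -2.0, "S": 0.0, "G2M": 2.0}
expression = rng.normal(size=(60, 5))
# Make the first gene depend on the stage so the classes are separable.
expression[:, 0] += np.array([shift[s] for s in stages])

# Fit with labels, then project every cell onto the single discriminant axis.
toy_lda = LinearDiscriminantAnalysis(n_components=1)
toy_lda.fit(expression, stages)
toy_projection = toy_lda.transform(expression)
print(toy_projection.shape)  # (60, 1)
```

The number of discriminant axes is limited to the number of classes minus one, so three classes allow at most two components.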


In [None]:
from cyclum.hdfrw import mat2hdf
mat2hdf(lda_pseudotime, '/home/shaoheng/Documents/data/EMTAB2805/lda-pseudotime.h5')

In [32]:
from sklearn.decomposition import FastICA as ICA

ica_model = ICA(n_components=4)
ica_model.fit(sttpm)

ica_pseudotime = ica_model.transform(sttpm)
ica_pseudotime.shape


from cyclum.hdfrw import mat2hdf
mat2hdf(ica_pseudotime, '/home/shaoheng/Documents/data/EMTAB2805/ica-pseudotime.h5')
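The ICA step can be tried on synthetic data in the same way. In this sketch the hidden sources, the mixing matrix, and the noise level are invented for illustration. Unlike LDA, `FastICA` is unsupervised and takes no labels, and with `n_components=4` it returns four values per cell.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Toy data: 50 cells x 10 genes built from 4 non-Gaussian hidden sources.
sources = rng.uniform(-1.0, 1.0, size=(50, 4))
mixing = rng.normal(size=(4, 10))
toy_expression = sources @ mixing + 0.01 * rng.normal(size=(50, 10))

# Fix random_state so that repeated runs give the same components.
toy_ica = FastICA(n_components=4, random_state=0, max_iter=1000)
toy_components = toy_ica.fit_transform(toy_expression)
print(toy_components.shape)  # (50, 4)
```

ICA components come in no particular order and with arbitrary sign, so any one of them has to be inspected before it is interpreted as a pseudotime.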
