# Stage 1: Model Construction

In this tutorial, we show how to train an LTNN model, which is used to predict the origin and end cells. The model learns to regress each cell's scVelo latent time from its LSI embedding.

If you want to try the scLTNN algorithm directly, you can use our [pre-trained model](https://github.com/Starlitnightly/scltnn/tree/main/model), which was trained on bladder data. Although our tests show that it generalises across species and tissues, we still recommend training your own LTNN model.

In [1]:
import scltnn
import scanpy as sc
import scvelo as scv
import anndata
import numpy as np

In [2]:
sc.settings.verbosity = 3             # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80, facecolor='white')

scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.23.2 scipy==1.9.3 pandas==1.5.1 scikit-learn==1.1.3 statsmodels==0.13.5 python-igraph==0.10.2 pynndescent==0.5.8


## Data preparation

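We first select the highly variable genes from the scRNA-seq AnnData object and then compute the latent semantic indexing (LSI) embedding of the cells.

Note: RNA velocity and latent time must be computed beforehand, because the model is trained to predict `adata.obs['latent_time']`. See [scVelo's tutorial](https://scvelo.readthedocs.io/) for the detailed calculations.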
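
Since the model is trained on `adata.obs['latent_time']`, a quick check that this column exists and holds sensible values can catch a missing scVelo step before training starts. The helper below is a hypothetical sketch (`check_training_target` is not part of scLTNN or scVelo); it assumes the values come from scVelo's `scv.tl.latent_time`, which scales latent time to [0, 1].

```python
import numpy as np

def check_training_target(obs, key="latent_time"):
    """Hypothetical sanity check on the training target before building X/Y.

    `obs` is anything that supports `key in obs` and `obs[key]`,
    e.g. a pandas DataFrame such as adata.obs, or a plain dict.
    """
    if key not in obs:
        raise KeyError(f"'{key}' not found; compute latent time with scVelo first")
    t = np.asarray(obs[key], dtype=float)
    if not np.all(np.isfinite(t)):
        raise ValueError(f"'{key}' contains NaN or infinite values")
    # scVelo's latent time is scaled to the interval [0, 1]
    if t.min() < 0.0 or t.max() > 1.0:
        raise ValueError(f"'{key}' is expected to lie in [0, 1]")
    return t
```

Calling `check_training_target(adata.obs)` would return the latent times as a float array, or raise if the column is missing or malformed.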

In [3]:
adata=sc.read_h5ad('/Users/fernandozeng/Desktop/velo_git/data/tsp1_bladder_scvelo.h5ad')
adata

AnnData object with n_obs × n_vars = 3795 × 58870
    obs: 'organ_tissue', 'method', 'donor', 'anatomical_information', 'n_counts_UMIs', 'n_genes', 'cell_ontology_class', 'free_annotation', 'manually_annotated', 'compartment', 'gender', 'manual_annotation', 'latent_time'
    var: 'gene_symbol', 'feature_type', 'ensemblid', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: '_scvi', '_training_mode', 'cell_ontology_class_colors', 'dendrogram_cell_type_tissue', 'dendrogram_computational_compartment_assignment', 'dendrogram_consensus_prediction', 'dendrogram_tissue_cell_type', 'donor_colors', 'donor_method_colors', 'hvg', 'method_colors', 'neighbors', 'organ_tissue_colors', 'sex_colors', 'tissue_colors', 'umap'
    obsm: 'X_pca', 'X_scvi', 'X_scvi_umap', 'X_umap'
    layers: 'decontXcounts', 'raw_counts'
    obsp: 'connectivities', 'distances'

In [4]:
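# flavor="seurat_v3" expects raw counts in adata.X and requires the scikit-misc package
sc.pp.highly_variable_genes(adata, n_top_genes=10000, flavor="seurat_v3")
# keep only the 10,000 highly variable genes
adata=adata[:,adata.var['highly_variable']]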
adata

If you pass `n_top_genes`, all cutoffs are ignored.
extracting highly variable genes
--> added
    'highly_variable', boolean vector (adata.var)
    'highly_variable_rank', float vector (adata.var)
    'means', float vector (adata.var)
    'variances', float vector (adata.var)
    'variances_norm', float vector (adata.var)


View of AnnData object with n_obs × n_vars = 3795 × 10000
    obs: 'organ_tissue', 'method', 'donor', 'anatomical_information', 'n_counts_UMIs', 'n_genes', 'cell_ontology_class', 'free_annotation', 'manually_annotated', 'compartment', 'gender', 'manual_annotation', 'latent_time'
    var: 'gene_symbol', 'feature_type', 'ensemblid', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std', 'highly_variable_rank', 'variances', 'variances_norm'
    uns: '_scvi', '_training_mode', 'cell_ontology_class_colors', 'dendrogram_cell_type_tissue', 'dendrogram_computational_compartment_assignment', 'dendrogram_consensus_prediction', 'dendrogram_tissue_cell_type', 'donor_colors', 'donor_method_colors', 'hvg', 'method_colors', 'neighbors', 'organ_tissue_colors', 'sex_colors', 'tissue_colors', 'umap'
    obsm: 'X_pca', 'X_scvi', 'X_scvi_umap', 'X_umap'
    layers: 'decontXcounts', 'raw_counts'
    obsp: 'connectivities', 'distances'
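
`flavor="seurat_v3"` ranks genes by a normalized variance that corrects for the mean-variance relationship of count data. The sketch below illustrates the idea in plain NumPy; it is a simplified approximation, not scanpy's implementation: it replaces the loess fit used by seurat_v3 with a quadratic fit in log space, and `hvg_mask` is a hypothetical name.

```python
import numpy as np

def hvg_mask(counts, n_top):
    """Boolean mask of the n_top most variable genes in a cells x genes count matrix.

    Simplified seurat_v3-style ranking: a quadratic fit of log10(variance) on
    log10(mean) stands in for the loess fit used by scanpy/Seurat.
    """
    X = np.asarray(counts, dtype=float)
    n_cells, n_genes = X.shape
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)
    expressed = var > 0  # constant genes are never selected

    # expected variance for a gene of a given mean, from the mean-variance trend
    log_mean = np.log10(mean[expressed])
    coef = np.polyfit(log_mean, np.log10(var[expressed]), deg=2)
    expected_sd = np.sqrt(10 ** np.polyval(coef, log_mean))

    # standardize each gene by its expected sd and clip outliers at sqrt(n_cells)
    z = (X[:, expressed] - mean[expressed]) / expected_sd
    z = np.minimum(z, np.sqrt(n_cells))

    # normalized variance: close to 1 for genes that follow the trend
    norm_var = np.zeros(n_genes)
    norm_var[expressed] = (z ** 2).sum(axis=0) / (n_cells - 1)

    mask = np.zeros(n_genes, dtype=bool)
    mask[np.argsort(-norm_var)[:n_top]] = True
    return mask
```

Genes whose variance exceeds what their mean predicts get a normalized variance well above 1 and are ranked first, which is why a bursty, rarely expressed gene can be selected over a uniformly high-expressed one.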

In [5]:
# compute a 100-dimensional LSI embedding, stored in adata.obsm['X_lsi']
scltnn.utils.lsi(adata, n_components=100, n_iter=15)
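
LSI is commonly computed as a TF-IDF transform of the count matrix followed by a truncated singular value decomposition. The sketch below shows that recipe in NumPy under that assumption; `lsi_embedding` is a hypothetical helper, and the exact normalization inside `scltnn.utils.lsi` may differ. Its `n_iter` argument is typical of a randomized SVD solver, whereas this sketch uses an exact SVD for simplicity.

```python
import numpy as np

def lsi_embedding(counts, n_components):
    """TF-IDF followed by truncated SVD of a cells x genes count matrix (a sketch)."""
    X = np.asarray(counts, dtype=float)
    n_cells = X.shape[0]
    # term frequency: each cell's counts divided by its library size
    tf = X / np.maximum(X.sum(axis=1, keepdims=True), 1e-12)
    # inverse document frequency: genes detected in fewer cells get larger weights
    idf = np.log1p(n_cells / np.maximum((X > 0).sum(axis=0), 1))
    tfidf = np.log1p(tf * idf * 1e4)
    # keep the leading singular components; rows of U * S are the cell embeddings
    U, S, _ = np.linalg.svd(tfidf, full_matrices=False)
    return U[:, :n_components] * S[:n_components]
```

Each row of the result plays the role of one cell's entry in `adata.obsm['X_lsi']`, so with `n_components=100` the network input has 100 features.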

## Train/test split

We randomly select 80% of the cells as the training set and use the remaining 20% as the test set.
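
Drawing the training cells without replacement guarantees that the two sets are disjoint and that every cell lands in exactly one of them. One way to do this is to shuffle the cell IDs once and cut the permutation; `train_test_split_cells` below is a hypothetical helper, not part of scLTNN.

```python
import numpy as np

def train_test_split_cells(cell_ids, train_frac=0.8, seed=0):
    """Split cell IDs into disjoint training and test sets."""
    ids = np.asarray(cell_ids)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(ids))  # shuffle once, so no cell is drawn twice
    n_train = int(round(train_frac * len(ids)))
    return ids[perm[:n_train]], ids[perm[n_train:]]
```

With the AnnData object above, `ran, ran_r = train_test_split_cells(adata.obs_names)` should be a drop-in replacement for the two sampling lines in the next cell.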

In [6]:

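cells=adata.obs.index.tolist()
np.random.seed(0)  # fix the seed so the split is reproducible
# sample 80% of the cells WITHOUT replacement; the remaining cells form the test set
ran=np.random.choice(cells, 8*(len(cells)//10), replace=False)
ran_r=list(set(cells)-set(ran))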

X_train=adata[ran].obsm['X_lsi']
Y_train=adata.obs.loc[ran,'latent_time']
X_test=adata[ran_r].obsm['X_lsi']
Y_test=adata.obs.loc[ran_r,'latent_time']

In [8]:
model=scltnn.models.creat_scltnn_model(100)
print('......lsi calculate',100)
print('......model fit',100)
history=model.fit(X_train, Y_train.values,
              batch_size=30,
              epochs=100,
              verbose=0,
              validation_data=(X_test, Y_test.values))
score = model.evaluate(X_test, Y_test, verbose=0)
print(score)

Model: "scltnn_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_4 (Dense)             (None, 512)               51712     
                                                                 
 dense_5 (Dense)             (None, 512)               262656    
                                                                 
 dense_6 (Dense)             (None, 512)               262656    
                                                                 
 dense_7 (Dense)             (None, 1)                 513       
                                                                 
Total params: 577,537
Trainable params: 577,537
Non-trainable params: 0
_________________________________________________________________
None
......lsi calculate 100
......model fit 100




[0.0033960642758756876, 0.04198100045323372]
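
The layer sizes and parameter counts in the model summary above follow directly from a fully connected network 100 → 512 → 512 → 512 → 1: each `Dense` layer has `n_in * n_out` weights plus `n_out` biases, e.g. 100 × 512 + 512 = 51,712. The NumPy sketch below rebuilds a network of that shape to make the arithmetic and the forward pass concrete. The ReLU hidden activations and He-style initialization are assumptions, not taken from scLTNN's source; `init_mlp`, `count_params` and `forward` are hypothetical names.

```python
import numpy as np

# 100 LSI components in, three hidden layers of 512 units, one latent-time output
LAYER_SIZES = [100, 512, 512, 512, 1]

def init_mlp(sizes, seed=0):
    """One (weights, bias) pair per Dense layer, He-style random init (assumed)."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def count_params(params):
    """Total trainable parameters: n_in * n_out weights + n_out biases per layer."""
    return sum(W.size + b.size for W, b in params)

def forward(params, X):
    """Predict one scalar per cell from a cells x components matrix."""
    h = np.asarray(X, dtype=float)
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers (assumption)
    return h[:, 0]
```

`count_params(init_mlp(LAYER_SIZES))` gives 577,537, matching the "Total params" line of the summary.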


## Saving the model

We now save the trained model for downstream analysis.

In [None]:
model.save('Best_model.h5')

To predict LTNN time with the trained model, please refer to [Latent time predicted by scLTNN](https://scltnn.readthedocs.io/en/latest/Tutorials/human_CD8.html).