# Preparation

Mount your Google Drive when using Google Colab. This is not needed when running this notebook locally.

In [1]:
# Load the Drive helper and mount
from google.colab import drive

# This will prompt for authorization.
drive.mount('/content/drive')

Mounted at /content/drive
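To keep one notebook usable both in Colab and locally, the mount step can be guarded. The sketch below assumes that the `google.colab` module is importable only inside Colab; the helper name `mount_drive_if_colab` is hypothetical and not part of any package.

```python
def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; do nothing elsewhere."""
    try:
        # Assumption: this import succeeds only inside Google Colab.
        from google.colab import drive
    except ImportError:
        return False
    drive.mount(mount_point)  # prompts for authorization in Colab
    return True

mounted = mount_drive_if_colab()
print("Drive mounted:", mounted)
```

When run locally, the function returns `False` and the rest of the notebook can use local paths instead.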


Download the sparsenet package from GitHub. You need Python 3, NumPy, and TensorFlow 2.0 or above installed first. The commands below work under Linux and Colab:

In [6]:
%%bash
# pip install --upgrade tensorflow
cd /content/drive/My\ Drive/packages
git clone https://github.com/datapplab/sparsenet.git

Cloning into 'sparsenet'...


Alternatively, you can pip install the sparsenet package directly from GitHub. This also installs the dependencies, e.g. TensorFlow and NumPy.

In [None]:
!pip install git+https://github.com/datapplab/sparsenet

Import the dependencies for this notebook. Edit the path so that it points to your downloaded copy of the sparsenet package. You don't need the `sys.path.append` line if the package was installed with pip instead.

In [7]:
import pandas as pd
import numpy as np
import gc
import random
import os

# import tf 2.x
import tensorflow as tf
print(tf.__version__)

import sys
sys.path.append('/content/drive/My Drive/packages/sparsenet')
from sparsenet.core import sparse

2.3.0
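Since TensorFlow 2.0 or above is required, the printed version can also be checked programmatically. This is a minimal sketch that assumes a `major.minor[.patch]` version string; the helper name `meets_min_version` is hypothetical.

```python
def meets_min_version(version, minimum=(2, 0)):
    """Return True if a 'major.minor[.patch]' version string is >= minimum."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= minimum

print(meets_min_version("2.3.0"))   # the version printed above
print(meets_min_version("1.15.2"))  # an older 1.x release
```

In the notebook itself, the same check could be applied to `tf.__version__` before building any model.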


# Real single-cell RNA-Seq dataset: BSEQ

[BSEQ](https://shenorrlab.github.io/bseqsc/vignettes/bseq-sc.html) is a scRNA-Seq dataset of individual human pancreatic islet cells. The processed dataset includes 1822 cells and the 1000 genes with the highest variance.

## Data preparation

Load and prepare the BSEQ dataset, then split it into training and validation sets at a ratio of 2/3 to 1/3. Note that the test set is the same as the validation set here. Again, edit the path to match your own installation.

In [9]:
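# change the path to your own
infile = "/content/drive/My Drive/packages/sparse/data/bseq.tsv"
bseq1 = pd.read_csv(infile, sep="\t", header=0, index_col=0)
X = bseq1.values
nfs = X.shape[1]
# the last column holds the class labels; subtract 1 so they start at 0
y = X[:, nfs-1] - 1
X = X[:, :nfs-1]
y = y.astype(int)
ns = y.size
# shuffle the cells with a fixed seed so the split is reproducible
idx = np.arange(ns)
np.random.seed(1)
idx1 = np.random.choice(idx, size=ns, replace=False)
X = X[idx1]
y = y[idx1]
# first 2/3 of the shuffled cells for training, the next 1/3 for validation
n_train = round(ns*2/3)
n_val = round(ns*1/3)
X_train = X[:n_train]
X_val = X[n_train:n_train+n_val]
# the test set reuses the validation set
X_test = X_val
y_train = y[:n_train]
y_val = y[n_train:n_train+n_val]
y_test = y_val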
n_values = np.max(y_train) + 1
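
The shuffle-and-split step above can also be wrapped in a small reusable function. The sketch below is an illustration on toy data, not the notebook's own code: the name `shuffle_split` is hypothetical, and it uses `np.random.default_rng`, so it will not reproduce the exact permutation produced by `np.random.seed(1)` above.

```python
import numpy as np

def shuffle_split(X, y, train_frac=2/3, seed=1):
    """Shuffle rows of X and y together, then split into train and validation parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))          # random ordering of row indices
    n_train = round(len(y) * train_frac)
    train_idx, val_idx = order[:n_train], order[n_train:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# toy data standing in for the expression matrix and the labels
X_demo = np.arange(24).reshape(12, 2)
y_demo = np.arange(12)
X_tr, y_tr, X_va, y_va = shuffle_split(X_demo, y_demo)
print(X_tr.shape, X_va.shape)
```

Taking everything after the first `n_train` rows as the validation part guarantees that no cell is left out when the two rounded sizes do not add up to the total.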


## Classic dense neural network or multilayer perceptron (MLP)

Build a `tf.keras.Sequential` model with two dense layers. Alternatively, you can use 1 or 2 sparse layers in place of the dense layer(s), as shown in the commented-out code. Choose an optimizer and loss function for training.

In [10]:
tf.keras.backend.clear_session()
gc.collect()

nunits=250
dens=0.1
model = tf.keras.models.Sequential([
  tf.keras.layers.Input(shape=(X_train.shape[1],)),
  tf.keras.layers.Dense(nunits, activation=None),
  tf.keras.layers.Dropout(0.2),
  # sparse(units=nunits, density=dens, activation=None),
  tf.keras.layers.Dense(10, activation='softmax')
  # sparse(units=10, density=0.4, activation='softmax'),
])

lr=1e-3
optimizer = tf.keras.optimizers.Nadam(learning_rate=lr)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=64, 
          validation_data=(X_test, y_test))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f6a26688d30>

Model summary. Note that the number of parameters is much larger than in the model with sparse layers below.

In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 250)               250250    
_________________________________________________________________
dropout (Dropout)            (None, 250)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2510      
Total params: 252,760
Trainable params: 252,760
Non-trainable params: 0
_________________________________________________________________
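
The parameter counts in this summary follow from the layer sizes: a fully connected layer has one weight per input-unit pair plus one bias per unit. A short check of that arithmetic (the helper name `dense_params` is only for illustration):

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(1000, 250)  # 1000 genes into 250 hidden units
output = dense_params(250, 10)    # 250 hidden units into 10 output units
print(hidden, output, hidden + output)  # 250250 2510 252760
```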


## Sparse neural network or multilayer perceptron (MLP)

Build a `tf.keras.Sequential` model with two sparse layers. Alternatively, you can use 1 dense layer as in the commented-out code, or 2 dense layers as shown above.

In [11]:
tf.keras.backend.clear_session()
gc.collect()

nunits=250
dens1=0.1
dens2=0.4
model = tf.keras.models.Sequential([
  tf.keras.layers.Input(shape=(X_train.shape[1],)),
  sparse(units=nunits, density=dens1, activation=None),
  # tf.keras.layers.Dense(10, activation='softmax')
  sparse(units=10, density=dens2, activation='softmax'),
])

lr=1e-3
optimizer = tf.keras.optimizers.Nadam(learning_rate=lr)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=10, batch_size=64, 
          validation_data=(X_test, y_test))

weight_type used:  1
weight_type used:  1
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f6a2645db00>

Model summary. Note that the number of parameters is about 1/10 of that in the model with dense layers above.

In [8]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
sparse (sparse)              (None, 250)               25250     
_________________________________________________________________
sparse_1 (sparse)            (None, 10)                1010      
Total params: 26,260
Trainable params: 26,260
Non-trainable params: 0
_________________________________________________________________
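
The counts in this summary are consistent with each sparse layer keeping a `density` fraction of the full weight matrix plus one bias per unit. That formula is an assumption inferred from the numbers shown here, not taken from the sparsenet source; the helper name `sparse_params` is hypothetical.

```python
def sparse_params(n_inputs, n_units, density):
    """Assumed count: density * n_inputs * n_units weights plus one bias per unit."""
    return round(density * n_inputs * n_units) + n_units

hidden = sparse_params(1000, 250, 0.1)  # first sparse layer, density 0.1
output = sparse_params(250, 10, 0.4)    # second sparse layer, density 0.4
total = hidden + output
print(hidden, output, total)

# ratio to the 252,760 parameters reported for the dense model
print(total / 252760)
```

Under this assumption the two layers come to 25,250 and 1,010 parameters, matching the summary, and the ratio to the dense model is close to 0.1.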


## Sparse autoencoder

A simple autoencoder built from sparse layers. An alternative autoencoder built from dense layers is given in the commented-out code.

In [5]:
tf.keras.backend.clear_session()
gc.collect()

nin=X_train.shape[1]
nunits=128
dens=0.5
model = tf.keras.models.Sequential([
  tf.keras.layers.Input(shape=(nin,)),
  # tf.keras.layers.Dense(nunits, activation=None),
  # tf.keras.layers.Dropout(0.2),
  # tf.keras.layers.Dense(nin, activation=None)
  sparse(units=nunits, density=dens, activation=None),
  sparse(units=nin, density=dens, activation=None),
])

lr=1e-3
optimizer = tf.keras.optimizers.Nadam(learning_rate=lr)
model.compile(optimizer=optimizer,
              loss='mse',
              metrics=['accuracy'])

# ns=12000
model.fit(X_train, X_train, epochs=20, batch_size=64, 
          validation_data=(X_test, X_test))

weight_type used:  1
weight_type used:  1
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<tensorflow.python.keras.callbacks.History at 0x7f91f6f68e80>
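
Because the autoencoder is trained with an `mse` loss, one way to inspect the result is the reconstruction error per cell. The sketch below uses toy arrays; in the notebook, the reconstruction would come from something like `model.predict(X_test)`. The helper name `per_cell_mse` is hypothetical.

```python
import numpy as np

def per_cell_mse(X, X_hat):
    """Mean squared reconstruction error for each row (cell)."""
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    return np.mean((X - X_hat) ** 2, axis=1)

# toy input and reconstruction, not data from this notebook
X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
X_hat_demo = np.array([[1.0, 2.0], [2.0, 6.0]])
errors = per_cell_mse(X_demo, X_hat_demo)
print(errors)  # first cell is reconstructed exactly, second is not
```

Cells with unusually large errors are the ones the bottleneck layer represents least well.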

# Simulated single-cell RNA-Seq dataset



## Data preparation

A similar analysis using a simulated scRNA-Seq dataset. Read in and prepare the input data first. Note that you need to modify the file paths to match your own setup.

In [12]:
# import pandas as pd
expr = pd.read_csv('/content/drive/My Drive/packages/sparsenet/data/Dataset1/counts.csv',index_col=0)
expr_true = pd.read_csv('/content/drive/My Drive/packages/sparsenet/data/Dataset1/truecounts.csv',index_col=0)
cellinfo = pd.read_csv('/content/drive/My Drive/packages/sparsenet/data/Dataset1/cellinfo.csv',index_col=0)
X = expr.values #Splash generated scRNA-seq data with dropout
X_true = expr_true.values #Splash generated scRNA-seq data without dropout (ground truth)
Y = cellinfo['Group'].values #cell type label
cnames, Y1 = np.unique(Y, return_inverse=True)
unique_class = np.unique(Y)
celltypes = Y
nclass = len(unique_class)
ncell,ngene = X.shape
print('{} genes, {} cells in {} groups'.format(ngene,ncell,nclass))

938 genes, 500 cells in 6 groups
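
The integer labels `Y1` used for training come from `np.unique(..., return_inverse=True)`, which returns the sorted unique labels together with an integer code for each cell. A small illustration with made-up group names (they are not taken from the dataset):

```python
import numpy as np

# hypothetical labels standing in for cellinfo['Group'].values
labels = np.array(["Group2", "Group1", "Group3", "Group1"])
names, codes = np.unique(labels, return_inverse=True)

print(names)               # sorted unique labels
print(codes)               # index into names for every cell
print(np.bincount(codes))  # number of cells per class
```

Indexing `names[codes]` recovers the original labels, so predicted class indices can be mapped back to group names the same way.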


## Classic neural network or multilayer perceptron (MLP)

Build a `tf.keras.Sequential` model with two dense layers. Alternatively, you can use 1 or 2 sparse layers in place of the dense layer(s).

In [13]:
nn=125
nclass=6
nepoch=10
bs=32
lr=1e-3
model = tf.keras.models.Sequential([
  tf.keras.layers.Input(shape=(X.shape[1],)),
  tf.keras.layers.Dense(units=nn, activation=None),#'relu'
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(units=nclass, activation='softmax'),
])

optimizer = tf.keras.optimizers.Nadam(learning_rate=lr)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
ns=400
hist=model.fit(X[:ns], Y1[:ns], epochs=nepoch, batch_size=bs, validation_data=(X[ns:], Y1[ns:]))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
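
The `History` object returned by `model.fit` keeps the per-epoch metrics in its `.history` dictionary, with keys such as `accuracy` and `val_accuracy` when the model is compiled with `metrics=['accuracy']`. The sketch below picks the best epoch from such a dictionary; the numbers are made up for illustration and are not results from this notebook, and the helper name `best_epoch` is hypothetical.

```python
def best_epoch(history, key="val_accuracy"):
    """Return (1-based epoch, value) where the tracked metric is highest."""
    values = history[key]
    best = max(range(len(values)), key=values.__getitem__)
    return best + 1, values[best]

# made-up values shaped like hist.history
demo_history = {
    "accuracy": [0.50, 0.80, 0.90],
    "val_accuracy": [0.55, 0.85, 0.80],
}
print(best_epoch(demo_history))  # (2, 0.85)
```

In the notebook, the call would be `best_epoch(hist.history)`, which makes it easy to compare the dense and sparse models on the same validation cells.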


## Sparse neural network or multilayer perceptron (MLP)

Build a `tf.keras.Sequential` model with two sparse layers. Alternatively, you can use 1 dense layer as in the commented-out code, or 2 dense layers as shown above.

In [14]:
tf.keras.backend.clear_session()
gc.collect()

dens1=0.2
dens2=0.4
model = tf.keras.models.Sequential([
  tf.keras.layers.Input(shape=(X.shape[1],)),
  sparse(units=nn, density=dens1, activation=None),
  # tf.keras.layers.Dense(nclass, activation='softmax')
  sparse(units=nclass, density=dens2, activation='softmax'),
])

optimizer = tf.keras.optimizers.Nadam(learning_rate=lr)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

hist=model.fit(X[:ns], Y1[:ns], epochs=nepoch, batch_size=bs, validation_data=(X[ns:], Y1[ns:]))

weight_type used:  1
weight_type used:  1
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
