## This dataset is much larger than the 3,000-cell PBMC sample: it contains about 160,000 PBMCs from a CITE-seq experiment (a version down-sampled to 50,000 cells is also available). We run the visualizer with sparse data in the expression field (`.X`) of an AnnData object as input, and performance is still good

 - Data: https://atlas.fredhutch.org/nygc/multimodal-pbmc/
 - Pre-print: https://www.biorxiv.org/content/10.1101/2020.10.12.335331v1.full
 - Cell publication: https://www.sciencedirect.com/science/article/pii/S0092867421005833#undfig1
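The 50K-cell file is provided ready-made, and the source does not say how it was sampled. If you want to draw your own subset, here is a minimal sketch, assuming a uniform random sample of cells without replacement, run on a toy sparse matrix. `subsample_rows` is a hypothetical helper, not part of sciviewer; scanpy's `sc.pp.subsample` serves the same purpose on a whole AnnData object.

```python
import numpy as np
import scipy.sparse as sparse

def subsample_rows(X, n_obs, seed=0):
    """Uniformly sample n_obs rows (cells) of sparse X without replacement.
    Returns the subset and the sorted row indices (sorted to keep the original order)."""
    rng = np.random.default_rng(seed)
    idx = np.sort(rng.choice(X.shape[0], size=n_obs, replace=False))
    # Row selection is efficient in CSR (Compressed Sparse Row) format
    return sparse.csr_matrix(X)[idx], idx

# Toy cells x genes matrix standing in for the real data
rng = np.random.default_rng(1)
dense = rng.random((1000, 200)).astype(np.float32)
dense[dense < 0.9] = 0.0  # roughly 10% nonzero
X = sparse.csc_matrix(dense)
X_sub, idx = subsample_rows(X, 300)
print(X_sub.shape)
```

Sorting the sampled indices keeps cells in their original order, which makes it easy to line the subset up with per-cell metadata.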

In [1]:
%gui osx
%load_ext py5

In [2]:
import pandas as pd
import numpy as np
from scipy.io import mmread
import scipy.sparse as sparse
import scanpy as sc
from sciviewer import SCIViewer

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
print('DOWNLOADING AND EXTRACTING EXAMPLE DATA')
! mkdir -p ../data

## Full-sized dataset of 160K cells
! wget https://www.dropbox.com/s/5knwop1zs7uaww4/CiteSeqPBMC160K_ProteinAndRNA_merged.h5ad -O ../data/CiteSeqPBMC160K_ProteinAndRNA_merged.h5ad

## Sub-sampled dataset of 50K cells
! wget https://www.dropbox.com/s/fyusuk61kmnbyv4/CiteSeqPBMC160K_ProteinAndRNA_merged_sub50k.h5ad -O ../data/CiteSeqPBMC160K_ProteinAndRNA_merged_sub50k.h5ad

DOWNLOADING AND EXTRACTING EXAMPLE DATA

In [5]:
data = sc.read('../data/CiteSeqPBMC160K_ProteinAndRNA_merged.h5ad')

In [7]:
data

AnnData object with n_obs × n_vars = 161764 × 17516
    obs: 'nCount_ADT', 'nFeature_ADT', 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'lane', 'donor', 'time', 'celltype.l1', 'celltype.l2', 'celltype.l3', 'Phase', 'nCount_SCT', 'nFeature_SCT', 'n_counts'
    var: 'features'
    obsm: 'X_apca', 'X_aumap', 'X_pca', 'X_spca', 'X_umap', 'X_wnn.umap'

In [8]:
data.X

<161764x17516 sparse matrix of type '<class 'numpy.float32'>'
	with 370421313 stored elements in Compressed Sparse Column format>
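The `data.X` output above gives enough numbers to see why keeping the matrix sparse matters. Below is a back-of-the-envelope estimate, assuming float32 values and 32-bit row indices (the actual index dtype depends on how the file was written and loaded). `sparse_vs_dense_bytes` is a hypothetical helper.

```python
def sparse_vs_dense_bytes(n_obs, n_vars, nnz, value_bytes=4, index_bytes=4):
    """Estimate memory in bytes for a CSC matrix vs. a dense array of the same shape.
    CSC stores: data (nnz values), indices (nnz row indices), indptr (n_vars + 1 offsets)."""
    dense = n_obs * n_vars * value_bytes
    csc = nnz * (value_bytes + index_bytes) + (n_vars + 1) * index_bytes
    return dense, csc

# Shape and stored-element count taken from the data.X output above
n_obs, n_vars, nnz = 161764, 17516, 370421313
density = nnz / (n_obs * n_vars)
dense_b, csc_b = sparse_vs_dense_bytes(n_obs, n_vars, nnz)
print(f"density: {density:.1%}")
print(f"dense float32: {dense_b / 1e9:.1f} GB, CSC: {csc_b / 1e9:.1f} GB")
```

With these inputs the matrix is about 13% nonzero, and a dense float32 copy would need about 11.3 GB versus about 3.0 GB for the CSC arrays.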

#### We call SCIViewer with the optional parameter `embedding_name` to select the UMAP computed from the weighted nearest neighbor (WNN) graph, which uses information from each cell's nearest neighbors to decide how to weight its protein and RNA features
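The WNN graph comes from the multimodal analysis described in the pre-print and Cell paper linked above. Its central idea, per-cell weights that set how much each modality contributes to a cell's neighbor similarities, can be sketched as below. This is a simplified illustration, not Seurat's implementation. The per-cell RNA weights are taken as given rather than learned, the affinities use a fixed exponential kernel for simplicity, and `weighted_knn` is a hypothetical helper.

```python
import numpy as np

def weighted_knn(dist_rna, dist_prot, w_rna, k=5):
    """Simplified WNN-style neighbors (hypothetical sketch, not Seurat's code).

    dist_rna, dist_prot: (n, n) pairwise cell distances in each modality.
    w_rna: (n,) RNA weight of each cell in [0, 1]; its protein weight is 1 - w_rna.
    Returns an (n, k) array with the k most similar cells for each cell.
    """
    # Fixed exponential kernel turns distances into affinities in (0, 1]
    aff_rna = np.exp(-dist_rna)
    aff_prot = np.exp(-dist_prot)
    # Each row i is weighted by cell i's own modality weights
    w = np.asarray(w_rna, dtype=float)[:, None]
    combined = w * aff_rna + (1.0 - w) * aff_prot
    np.fill_diagonal(combined, -np.inf)  # a cell is not its own neighbor
    # Highest combined affinity first
    return np.argsort(-combined, axis=1)[:, :k]

# Toy data: 30 cells with 5 RNA features and 3 protein features
rng = np.random.default_rng(0)
n = 30
rna, prot = rng.random((n, 5)), rng.random((n, 3))
d_rna = np.linalg.norm(rna[:, None, :] - rna[None, :, :], axis=-1)
d_prot = np.linalg.norm(prot[:, None, :] - prot[None, :, :], axis=-1)
w_rna = rng.random(n)  # stand-in weights; the published method learns these per cell
nbrs = weighted_knn(d_rna, d_prot, w_rna, k=4)
print(nbrs.shape)
```

Setting a cell's RNA weight to 1 makes its neighbors identical to an RNA-only search, and setting it to 0 makes them protein-only; intermediate weights blend the two.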

In [9]:
svobj = SCIViewer(data, embedding_name='X_wnn.umap')
svobj.explore_data()

Create renderer
Start thread
Finish thread
Setting up...


In [10]:
## Or load the down-sampled dataset
data_sub = sc.read('../data/CiteSeqPBMC160K_ProteinAndRNA_merged_sub50k.h5ad')

  utils.warn_names_duplicates("obs")


In [11]:
data_sub

AnnData object with n_obs × n_vars = 50000 × 17516
    obs: 'nCount_ADT', 'nFeature_ADT', 'nCount_RNA', 'nFeature_RNA', 'orig.ident', 'lane', 'donor', 'time', 'celltype.l1', 'celltype.l2', 'celltype.l3', 'Phase', 'nCount_SCT', 'nFeature_SCT', 'n_counts'
    var: 'features'
    obsm: 'X_apca', 'X_aumap', 'X_pca', 'X_spca', 'X_umap', 'X_wnn.umap'

In [12]:
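# Store .X in Compressed Sparse Column (CSC) format, as in the full dataset;
# CSC makes per-gene (column) access efficient
data_sub.X = sparse.csc_matrix(data_sub.X)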

EXPORTING DATA...
BYE
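In CSC format each gene's stored values sit contiguously, which suits per-gene operations such as a min/max expression readout like the one the viewer logs for a selected gene. Below is a sketch of reading one gene directly from the CSC arrays, assuming no duplicate entries (true for matrices built from dense arrays or after `sum_duplicates()`). `gene_column` and `gene_min_max` are hypothetical helpers, and this is not necessarily how sciviewer reads genes.

```python
import numpy as np
import scipy.sparse as sparse

def gene_column(X_csc, j):
    """Nonzero values of column (gene) j and the rows (cells) they belong to.
    In CSC format, column j occupies the slice data[indptr[j]:indptr[j+1]]."""
    start, end = X_csc.indptr[j], X_csc.indptr[j + 1]
    return X_csc.data[start:end], X_csc.indices[start:end]

def gene_min_max(X_csc, j):
    """Min/max expression of gene j without densifying the column.
    Assumes no duplicate entries. Cells missing from the stored entries are implicit zeros."""
    values, _ = gene_column(X_csc, j)
    if len(values) < X_csc.shape[0]:
        # at least one cell has an implicit (unstored) zero
        values = np.append(values, 0.0)
    return values.min(), values.max()

# Toy cells x genes matrix (not the real data)
rng = np.random.default_rng(0)
dense = rng.random((50, 4)).astype(np.float32)
dense[dense < 0.6] = 0.0             # make it sparse
dense[:, 1] = rng.random(50) + 0.5   # one gene expressed in every cell
X = sparse.csc_matrix(dense)
print(gene_min_max(X, 1))
```

The implicit-zero check matters: a gene detected in only some cells has a true minimum of 0 even though no zero is stored.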




In [13]:
svobj = SCIViewer(data_sub, embedding_name='X_wnn.umap')
svobj.explore_data()

Create renderer
Start thread
Finish thread
Setting up...
0.35808730125427246 seconds to select and project cells
Selected 3562 cells
Calculating correlations...
1.2121360301971436 seconds to calculate correlations. Sparsity:  True
Selected gene RPS19
Min/max expression level for gene RPS19 0.0 5.5310163
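The log above reports two steps on sparse input: projecting the selected cells and calculating correlations. The sketch below shows one way such steps could be done. It assumes the selected cells are projected onto a direction in the 2-D embedding and each gene is correlated (Pearson) with that projection coordinate; this is an assumption about what the viewer computes, not sciviewer's actual code. `project_onto_direction` and `sparse_pearson` are hypothetical helpers.

```python
import numpy as np
import scipy.sparse as sparse

def project_onto_direction(coords, start, end):
    """Position of each 2-D point along the line from start (0) to end (1)."""
    start = np.asarray(start, dtype=float)
    direction = np.asarray(end, dtype=float) - start
    return (np.asarray(coords, dtype=float) - start) @ direction / (direction @ direction)

def sparse_pearson(X, y):
    """Pearson r between each column (gene) of sparse X (cells x genes) and vector y,
    computed without densifying X. Genes with zero variance get r = 0."""
    X = sparse.csc_matrix(X, dtype=np.float64)
    n = X.shape[0]
    y_c = np.asarray(y, dtype=np.float64) - np.mean(y)
    # sum_i (x_ij - mean_j) * y_c_i == sum_i x_ij * y_c_i, since y_c sums to zero
    cov = X.T @ y_c
    mean = np.asarray(X.mean(axis=0)).ravel()
    ss_x = np.asarray(X.multiply(X).sum(axis=0)).ravel() - n * mean**2
    ss_y = y_c @ y_c
    with np.errstate(divide="ignore", invalid="ignore"):
        r = cov / np.sqrt(ss_x * ss_y)
    return np.where(ss_x > 0, r, 0.0)

# Toy example: 200 cells with 2-D embedding coordinates and 6 genes
rng = np.random.default_rng(0)
coords = rng.random((200, 2))
selected = coords[:, 0] < 0.5                      # hypothetical user selection
t = project_onto_direction(coords[selected], [0, 0], [0, 1])
dense = rng.random((200, 6))
dense[dense < 0.7] = 0.0
dense[:, 0] += coords[:, 1]                        # gene 0 tracks the direction
X_sel = sparse.csr_matrix(dense)[np.flatnonzero(selected)]
r = sparse_pearson(X_sel, t)
print(np.argsort(-r)[:3])  # genes most positively correlated with the projection
```

Expanding the covariance this way avoids centering `X`, which would turn every implicit zero into a stored nonzero and destroy sparsity.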
