# Init 

In [1]:
import stlearn as st
import scanpy as sc
import numpy as np
import pandas as pd

# AnnData
AnnData provides a scalable way of keeping track of data and learned annotations.
https://anndata.readthedocs.io/en/stable/index.html

<b>Both Scanpy and stLearn uses AnnData as their data objects (but in different ways)</b>

# Scanpy
Scanpy using AnnData for their objects, and consists 4 objects:

    1) X - A #observations × #variables data matrix (Scipy sparse matrix)
    2) obs - Key-indexed one-dimensional observations annotation of length #observations. (Pandas DataFrame)
    obs features: ['in_tissue', 'array_row', 'array_col']
    3) var - Key-indexed one-dimensional variables annotation of length #variables. (Pandas DataFrame)
    var features: ['gene_ids', 'feature_types', 'genome']
    4) uns - Key-indexed unstructured annotation. (AnnData.OverloadedDict object)

API - https://scanpy.readthedocs.io/en/stable/api.html
    
![title](https://falexwolf.de/img/scanpy/anndata.svg)

## Load from Scanpy's datasets

In [2]:
# Visium - Processed Visium Spatial Gene Expression data from 10x Genomics.
adata = sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")
uns_copy = adata.uns
adata

Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


AnnData object with n_obs × n_vars = 4035 × 36601
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

## Reading local files 
** There are lot more options in the API docs

In [None]:
# 10x-Genomics-formatted visum dataset
adata = sc.read_visium(path='', genome=None, *, count_file='filtered_feature_bc_matrix.h5', library_id=None, load_images=True, source_image_path=None)

# h5ad-formatted hdf5 file
adata = sc.read_h5ad(filename='', backed=None, *, as_sparse=(), as_sparse_fmt=<class 'scipy.sparse.csr.csr_matrix'>, chunk_size=6000)

# loom-formatted hdf5 file
adata = sc.read_loom(filename='', *, sparse=True, cleanup=False, X_name='spliced', obs_names='CellID', obsm_names=None, var_names='Gene', varm_names=None, dtype='float32', obsm_mapping=mappingproxy({}), varm_mapping=mappingproxy({}), **kwargs)

## Generate Scanpy-AnnData object

In [57]:
# Generate sample data
from string import ascii_uppercase
n_obs = 1000 # number of observations
obs = pd.DataFrame({'time': np.random.choice(['day 1', 'day 2', 'day 4', 'day 8'], n_obs)})
var_names = [i*letter for i in range(1, 10) for letter in ascii_uppercase]
n_vars = len(var_names)  # number of variables
var = pd.DataFrame({'a':[0]*len(var_names)}, index=var_names)  # dataframe for annotating the variables
X = np.arange(n_obs*n_vars).reshape(n_obs, n_vars)  # the data matrix of shape n_obs x n_vars

# Todo: Build uns ourselfs

# Create the AnnData object
adata = sc.AnnData(X, obs=obs, var=var, uns=uns_copy, dtype='int32')



# stLearn
stLearn using AnnData for their objects, but stores it in a different way from Scanpy.
It consists 4 objects:

    1) X - A #observations × #variables data matrix (Scipy sparse matrix)
    2) obs - Key-indexed one-dimensional observations annotation of length #observations. (Pandas DataFrame)
    obs features: ['in_tissue', 'array_row', 'array_col', 'imagecol', 'imagerow']
    3) var - Key-indexed one-dimensional variables annotation of length #variables. (Pandas DataFrame)
    var features: ['gene_ids', 'feature_types', 'genome']
    4) uns - Key-indexed unstructured annotation. (AnnData.OverloadedDict object)
    
API - https://stlearn.readthedocs.io/en/latest/api.html#module-stlearn.spatial.clustering

## Transform Scanpy object

In [55]:
sc_adata = sc.datasets.visium_sge()
st_adata = st.convert_scanpy(sc_adata)
st_adata

Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.


AnnData object with n_obs × n_vars = 3798 × 36601
    obs: 'in_tissue', 'array_row', 'array_col', 'imagecol', 'imagerow'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'

## Load from stLearn's datasets 
Download processed BCBA data (10X genomics published data)

In [52]:
adata = stlearn.datasets.example_bcba()

0.00B [00:00, ?B/s]

OSError: Unable to open file (file signature not found)

## Load local files
** There are other options in the API (ReadOldST, ReadSlideSeq, ReadMERFISH)

In [56]:
# Read Visium data from 10X (wrap read_visium from scanpy) - 
# In addition to reading regular 10x output, this looks for the spatial folder and loads images, 
# coordinates and scale factors.
adata = st.Read10X("data\\\V1_Human_Lymph_Node\\", count_file = "filtered_feature_bc_matrix.h5", load_images = True)

# Read SeqFish data
adata = st.ReadSeqFish(count_matrix_file="cortex_svz_counts.matrix", spatial_file="cortex_svz_cellcentroids.csv")

OSError: ``/Users/guyattia/PycharmProjects/MSC-Final-Project-ST/data\\V1_Human_Lymph_Node\/filtered_feature_bc_matrix.h5`` does not exist

## Generate stLearn-AnnData object
https://stlearn.readthedocs.io/en/latest/tutorials/Read_any_data.html

In [79]:
# Generate sample data
from string import ascii_uppercase
n_obs = 1000 # number of observations
obs = pd.DataFrame({'time': np.random.choice(['day 1', 'day 2', 'day 4', 'day 8'], n_obs)})
var_names = [i*letter for i in range(1, 10) for letter in ascii_uppercase]
var = pd.DataFrame({'a':[0]*len(var_names)}, index=var_names)  # dataframe for annotating the variables
n_vars = len(var_names)  # number of variables
X = np.arange(n_obs*n_vars).reshape(n_obs, n_vars)  # the data matrix of shape n_obs x n_vars

# count – Pandas Dataframe of count matrix with rows as barcodes and columns as gene names
df_count = pd.DataFrame(X, columns=var_names, index=obs)
# spatial – Pandas Dataframe of spatial location of cells/spots.
df_spatial = pd.DataFrame({'imagecol': np.random.randn(n_obs),
                          'imagerow': np.random.randn(n_obs)})

adata = st.create_stlearn(count=df_count, spatial=df_spatial, library_id="Sample_test")
adata

Observation names are not unique. To make them unique, call `.obs_names_make_unique`.


AnnData object with n_obs × n_vars = 1000 × 234
    obs: 'imagecol', 'imagerow'
    uns: 'spatial'
    obsm: 'spatial'