# SpatialDE

SpatialDE is a method to identify genes whose expression significantly depends on spatial coordinates in non-linear and non-parametric ways. The intended applications are spatially resolved RNA-sequencing, e.g. from Spatial Transcriptomics, or _in situ_ gene expression measurements, e.g. from SeqFISH or MERFISH.

This repository contains both the implementation of our method and case studies applying it.

The key features of our method are:

  - Unsupervised - No need to define spatial regions
  - Non-parametric and non-linear expression patterns
  - Extremely fast - Transcriptome-wide tests take only a few minutes on a standard computer

The primary implementation is a Python 3 package, which can be installed from the command line with:

```
$ pip install spatialde
```

Below is a typical usage example in interactive (notebook) form.

```python
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hide the top and right axis spines in all plots
plt.rcParams['axes.spines.right'] = False
plt.rcParams['axes.spines.top'] = False

import NaiveDE
import SpatialDE
```


```python
# Rows are spots (labelled by their coordinates), columns are genes
counts = pd.read_csv('Analysis/MouseOB/data/Rep11_MOB_0.csv', index_col=0)
counts = counts.T[counts.sum(0) >= 3].T  # Drop genes with fewer than 3 total counts across all spots

counts.iloc[:5, :5]
```

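|               | Nrf1 | Zbtb5 | Ccnl1 | Lrrfip1 | Bbs1 |
|---------------|------|-------|-------|---------|------|
| 16.92x9.015   | 1    | 1     | 1     | 2       | 1    |
| 16.945x11.075 | 0    | 0     | 3     | 2       | 2    |
| 16.97x10.118  | 0    | 1     | 1     | 0       | 0    |
| 16.939x12.132 | 1    | 0     | 1     | 0       | 4    |
| 16.949x13.055 | 0    | 0     | 0     | 3       | 0    |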
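The filter above transposes the matrix twice so that boolean row indexing selects genes. A minimal sketch of an equivalent, more direct column selection with `.loc`, shown on a made-up toy matrix (the gene names `GeneA`, `GeneB` and `GeneC` are hypothetical):

```python
import pandas as pd

# Toy count matrix in the same orientation as above: spots as rows, genes as columns.
# Gene names and values are made up for illustration.
counts = pd.DataFrame(
    {"GeneA": [1, 0, 2], "GeneB": [0, 0, 1], "GeneC": [3, 4, 0]},
    index=["s1", "s2", "s3"],
)

# Keep only genes (columns) with at least 3 total counts across all spots.
keep = counts.sum(axis=0) >= 3
filtered = counts.loc[:, keep]

# Same result as the transpose-based expression used above.
assert filtered.equals(counts.T[counts.sum(0) >= 3].T)
print(list(filtered.columns))  # ['GeneA', 'GeneC']
```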


```python
sample_info = pd.read_csv('Analysis/MouseOB/MOB_sample_info.csv', index_col=0)
counts = counts.loc[sample_info.index]  # Align count matrix with metadata table

sample_info.head(5)
```

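|               | x      | y      | total_counts |
|---------------|--------|--------|--------------|
| 16.92x9.015   | 16.92  | 9.015  | 18790        |
| 16.945x11.075 | 16.945 | 11.075 | 36990        |
| 16.97x10.118  | 16.97  | 10.118 | 12471        |
| 16.939x12.132 | 16.939 | 12.132 | 22703        |
| 16.949x13.055 | 16.949 | 13.055 | 18641        |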
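The spot labels in the index appear to encode each spot's coordinates as `<x>x<y>` (compare the index with the `x` and `y` columns above). Assuming that holds for every spot, a quick sanity check after the `.loc` alignment is to parse the labels back into coordinates and compare them with `sample_info`. The three labels below are copied from the tables above; in a real session you would use `counts.index` and the full `sample_info` instead.

```python
import numpy as np
import pandas as pd

# Three spot labels and coordinates copied from the tables above.
# Assumption: every label has the form "<x>x<y>".
index = pd.Index(["16.92x9.015", "16.945x11.075", "16.97x10.118"])
sample_info = pd.DataFrame(
    {"x": [16.92, 16.945, 16.97], "y": [9.015, 11.075, 10.118]}, index=index
)

# Split each label at the "x" separator and convert both parts to floats.
parsed = index.to_series().str.split("x", expand=True).astype(float)
parsed.columns = ["x", "y"]

# The coordinates encoded in the labels should match the metadata columns.
coords_match = np.allclose(
    parsed[["x", "y"]].to_numpy(), sample_info[["x", "y"]].to_numpy()
)
print(coords_match)  # True
```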


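```python
# Variance-stabilize the raw counts (NaiveDE expects genes as rows, hence the transposes)
norm_expr = NaiveDE.stabilize(counts.T).T
# Regress out the effect of sequencing depth (log total counts per spot)
resid_expr = NaiveDE.regress_out(sample_info, norm_expr.T, 'np.log(total_counts)').T
```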
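The `regress_out` step removes the part of each gene's stabilized expression that is explained by sequencing depth. The general idea can be sketched with ordinary least squares in NumPy: fit every gene on an intercept plus log total counts and keep the residuals. This illustrates the technique on synthetic data and is not NaiveDE's implementation, which may differ in details (for instance in whether the intercept is kept); the helper name `regress_out_covariate` is hypothetical.

```python
import numpy as np


def regress_out_covariate(expr: np.ndarray, covariate: np.ndarray) -> np.ndarray:
    """Return OLS residuals of each column of `expr` after fitting on [1, covariate].

    expr: (n_spots, n_genes) array of stabilized expression values.
    covariate: (n_spots,) array, e.g. log total counts per spot.
    """
    design = np.column_stack([np.ones_like(covariate), covariate])
    # Solve design @ beta ~= expr for all genes at once.
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ beta


# Synthetic example: 50 spots, 3 genes whose expression partly tracks depth.
rng = np.random.default_rng(0)
total_counts = rng.integers(10_000, 40_000, size=50).astype(float)
log_tc = np.log(total_counts)
expr = 0.8 * log_tc[:, None] + rng.normal(size=(50, 3))

resid = regress_out_covariate(expr, log_tc)
# Residuals are (numerically) uncorrelated with the covariate.
print(np.abs(resid.T @ (log_tc - log_tc.mean())).max() < 1e-8)  # True
```

Because the fit is done for all genes in one `lstsq` call, the cost is essentially one small regression regardless of the number of genes.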

For the sake of this example, let's just run the test on 1000 randomly chosen genes.

```python
sample_resid_expr = resid_expr.sample(n=1000, axis=1, random_state=24)

X = sample_info[['x', 'y']]  # Spatial coordinates of each spot
results = SpatialDE.run(X, sample_resid_expr)
```

```
INFO:root:Performing DE test
INFO:root:Pre-calculating USU^T = K's ...
INFO:root:Done: 0.076s
INFO:root:Fitting gene models
INFO:root:Model 1 of 10
INFO:root:Model 2 of 10
INFO:root:Model 3 of 10
INFO:root:Model 4 of 10
INFO:root:Model 5 of 10
INFO:root:Model 6 of 10
INFO:root:Model 7 of 10
INFO:root:Model 8 of 10
INFO:root:Model 9 of 10
INFO:root:Model 10 of 10
```

The result is a pandas DataFrame with a P-value and other model statistics for each gene.

The most important columns are:

  - `g` - The name of the gene
  - `pval` - The P-value for spatial differential expression
  - `qval` - Significance after correcting for multiple testing
  - `l` - A parameter indicating the distance scale over which a gene's expression changes
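The length scale `l` belongs to the spatial covariance function of the model fitted to each gene; the `model` column in the results below reads `SE`, which we take to denote a squared exponential kernel. Assuming the common parameterization k(x, x') = exp(-|x - x'|^2 / (2 l^2)), a minimal sketch shows how `l` controls how quickly the similarity between two spots decays with distance. The coordinates come from the `sample_info` table above and the two `l` values from the results tables below; `se_kernel` is a hypothetical helper, and SpatialDE's own kernel code may be parameterized differently.

```python
import numpy as np


def se_kernel(coords: np.ndarray, l: float) -> np.ndarray:
    """Squared exponential kernel matrix for an (n, 2) array of spot coordinates.

    Assumed form: k(x, x') = exp(-||x - x'||^2 / (2 * l^2)).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2 * l ** 2))


# Three spots from the sample_info table.
coords = np.array([[16.92, 9.015], [16.945, 11.075], [16.97, 10.118]])

# A short length scale: spots about 2 units apart are nearly uncorrelated.
K_short = se_kernel(coords, l=0.402001)
# A longer length scale: the same spots remain substantially correlated.
K_long = se_kernel(coords, l=1.907609)
print(np.round(K_short, 3))
print(np.round(K_long, 3))
```

A gene with a small `l` therefore varies over short distances, while a gene with a large `l` follows a broad, smooth spatial trend.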

```python
results.head().T
```

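|              | 0          | 1         | 2             | 3          | 4          |
|--------------|------------|-----------|---------------|------------|------------|
| Gower        | 0.999295   | 0.999295  | 0.999295      | 0.999295   | 0.999295   |
| M            | 4          | 4         | 4             | 4          | 4          |
| g            | Tinagl1    | Vstm2l    | 6330415B21Rik | Galnt4     | Leng8      |
| l            | 0.402001   | 0.402001  | 0.402001      | 0.402001   | 0.402001   |
| max_delta    | 0.00628877 | 0.0484324 | 0.837928      | 0.00806104 | 0.975425   |
| max_ll       | 11.5958    | -125.505  | 232.757       | 91.4048    | -87.1177   |
| max_mu_hat   | 0.025265   | -4.84373  | 0.877441      | 0.611605   | -1.56887   |
| max_s2_t_hat | 0.0540122  | 19.2345   | 0.386566      | 0.34356    | 1.19909    |
| model        | SE         | SE        | SE            | SE         | SE         |
| n            | 260        | 260       | 260           | 260        | 260        |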


```python
results.sort_values('qval').head(10)[['g', 'l', 'qval']]
```

|     | g       | l        | qval         |
|-----|---------|----------|--------------|
| 720 | Cck     | 1.13519  | 8.802861e-07 |
| 865 | Ptn     | 1.907609 | 8.162537e-05 |
| 530 | Prokr2  | 1.13519  | 0.00191611   |
| 505 | Nr2f2   | 1.13519  | 0.004790035  |
| 495 | Frzb    | 1.13519  | 0.01317798   |
| 180 | Olfr635 | 0.675535 | 0.01963151   |
| 437 | Map1b   | 1.13519  | 0.0495525    |
| 615 | Agt     | 1.13519  | 0.06470005   |
| 351 | Cpne4   | 1.13519  | 0.07150909   |
| 397 | Sncb    | 1.13519  | 0.08444712   |


We detected a few spatially differentially expressed genes among the 1000 tested: seven have a q-value below 0.05, with _Cck_ and _Ptn_ the most significant.
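To list every gene passing a chosen false discovery rate, rather than only the top ten, the results can be filtered on `qval`. A minimal sketch with pandas; the 0.05 cut-off is an illustrative choice, and the toy DataFrame copies only a few rows from the table above so the snippet runs on its own (in a real session, use `results` directly).

```python
import pandas as pd

# A few rows copied from the table above; in practice, use the `results` DataFrame.
results = pd.DataFrame({
    "g": ["Cck", "Ptn", "Map1b", "Agt"],
    "l": [1.13519, 1.907609, 1.13519, 1.13519],
    "qval": [8.802861e-07, 8.162537e-05, 0.0495525, 0.06470005],
})

# Genes significant at a 5% false discovery rate, most significant first.
significant = results.query("qval < 0.05").sort_values("qval")
print(significant[["g", "l", "qval"]])
```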