# R Based Single Cell (Xena, Seurat)

This notebook demonstrates getting data from a functional genomics server (Xena), and preparing those data for analysis in Seurat.

## Acquiring Data from Xena

Xena provides an HTTP interface that accepts AST in a lisp-like syntax. Also included are some domain specific language (DSL) functions for working with functional genomics data, as well as an SQL interface.

Providing named client functions makes working with these data easier, however, for demonstration we will show how you can use the Xena query interface execute an arbitrary query.

In [187]:
# A library for sending/receiving HTTP requests
library('httr')

# The URL for the xena data we are after
hub_url <- "https://toil.xenahubs.net/data/"

# A simple query, should return 2
query <- "(+ 1 1)"

In [188]:
response <- POST(hub_url, body = query, content_type = "text/plain")
content(response)

### Writing Xena Queries

The Xena data model can be accessed using a lisp-like DSL. When creating a Seurat analysis, we will need a list of samples, genes, and the expressions between them.

We can also get a list of gene-names to gene-identifiers to make reading our results easier.

In [189]:
# This library allows one to perform nice string templating
library(gsubfn)

# A named dataset that contains gene-wise expression counts
dataset <- "tcga_RSEM_Hugo_norm_count"

#### Query templates

Each of these literals are a query template. Backticks are used to create logical scope for interpolating strings.

These queries will be used below to get our data. In the future, these functions could be accessed by named methods of a Xena client, as in the python client.

In [190]:
cohort_template <- '(map :cohort (query {:select [:%distinct.cohort]
                     :from [:dataset]
                     :where [:not [:is nil :cohort]]}))'

In [191]:
fetch_template <- '(fetch [{:table "`dataset`"
                               :samples ["`paste(samples_query, collapse = \'", "\')`"]
                               :columns ["`paste(features_query, collapse = \'", "\')`"]}])])'

In [192]:
samples_template <- '(map :value (query {:select [:value]
            :from [:dataset]
            :join [:field [:= :dataset.id :dataset_id]
            :code [:= :field.id :field_id]]
            :where [:and
            [:= :dataset.name "`dataset`"]
            [:= :field.name "sampleID"]]}))'

In [193]:
features_template <- '(map :name (query {:select [:field.name]
             :from [:dataset]
             :join [:field [:= :dataset.id :dataset_id]]
             :where [:= :dataset.name "`dataset`"]}))'

### Getting Data

#### Getting Features
First, we'll get the list of featurees for the dataset. We'll print out the query that will be sent to Xena (including newline characters).

In [194]:
query <- fn$identity(features_template)
query

*Note that we have interpolated in the dataset name to constrain our search.*

In [195]:
response <- POST(hub_url, body = query, content_type = "text/plain")

We can look at the response attributes expecting a Status 200 with some size that seems reasonable for the number of expected samples.

In [196]:
response
features = content(response)

Response [https://toil.xenahubs.net/data/]
  Date: 2017-05-12 16:39
  Status: 200
  Content-Type: application/json;charset=UTF-8
  Size: 682 kB


In [197]:
# Print out some of the features
features_table <- matrix(features, length(features))
features_table[0:10]

We then write the table to file so that it can be used by Seurat (and others). 

In [198]:
write.table(features_table, 'genes.tsv', sep = '\t', append=F, quote=F, col.names=F, row.names=F)

#### Getting Samples

Now we will query the server for the available samples in the dataset.

In [199]:
query <- fn$identity(samples_template)
query
response <- POST(hub_url, body = query, content_type = "text/plain")
response
samples = content(response)
samples_table = matrix(samples, length(samples))
samples_table[0:10]

Response [https://toil.xenahubs.net/data/]
  Date: 2017-05-12 16:39
  Status: 200
  Content-Type: application/json;charset=UTF-8
  Size: 190 kB


And then write the table to file in a similar fashion to the gene list.

In [200]:
write.table(samples_table, 'barcodes.tsv', sep = '\t', append=F, quote=F, col.names=F, row.names=F)

#### Getting Expression Data

Now that we have the list of samples and genes quantified, we can select all, or a subset of the samples from Xena.

In [201]:
features_query <- features_table[7998:8003]
samples_query <- samples_table[0:10]
query <- fn$identity(fetch_template)

Now, with a fully formed query, we can request the weights.

In [217]:
response <- POST(hub_url, body = query, content_type = "text/plain")
response
weights <- content(response)
weights <- matrix(weights, length(weights))
weights <- apply(1:5, 0,as.numeric)
# weights_table
# dim(weights_table)

Response [https://toil.xenahubs.net/data/]
  Date: 2017-05-12 16:56
  Status: 200
  Content-Type: application/json;charset=UTF-8
  Size: 496 B


ERROR: Error in apply(1:5, 0, as.numeric): dim(X) must have a positive length


Now, instead of writing to a table, Seurat will expect the MatrixMarket format.

In [147]:
    #weights_table <- sapply(weights_table,as.numeric)
#weights_table[2]
#weights_matrix <- as(weights_table, "CsparseMatrix")
#weights_matrix <- provideDimnames(weights_matrix, sep = "", base = list(LETTERS), unique = TRUE)
weights_matrix <- as.data.frame(lapply(weights_table, as.numeric))
weights_matrix
writeMM(weights_matrix, "matrix.mtx")

ERROR: Error in lapply(weights_table, as.numeric): (list) object cannot be coerced to type 'double'


# Loading Data into Seurat

Now that we have arranged all of the data we would like from Xena, we can load it into a Seurat object to begin analysis.

In [80]:
library(Seurat)
data <- readMM('matrix.mtx')
data
#pbmc.data <- Read10X('./')

60 x 1 sparse Matrix of class "dgTMatrix"
            
 [1,] 0.9986
 [2,] 2.5138
 [3,] .     
 [4,] 2.4520
 [5,] 0.8849
 [6,] .     
 [7,] 1.0116
 [8,] 0.6755
 [9,] 1.1454
[10,] 2.4370
[11,] .     
[12,] .     
[13,] .     
[14,] .     
[15,] .     
[16,] .     
[17,] .     
[18,] .     
[19,] .     
[20,] .     
[21,] 5.1671
[22,] 5.4964
[23,] 6.7408
[24,] 4.5110
[25,] 4.6269
[26,] 6.2184
[27,] 5.5929
[28,] 6.6627
[29,] 6.2811
[30,] 4.7015
[31,] .     
[32,] .     
[33,] .     
[34,] .     
[35,] .     
[36,] .     
[37,] .     
[38,] .     
[39,] .     
[40,] .     
[41,] .     
[42,] .     
[43,] .     
[44,] .     
[45,] .     
[46,] .     
[47,] .     
[48,] .     
[49,] .     
[50,] .     
[51,] .     
[52,] .     
[53,] .     
[54,] .     
[55,] .     
[56,] .     
[57,] .     
[58,] .     
[59,] .     
[60,] .     