# Generate Cell-type Specific Networks Using TF Motifs

**Authorship:**
Author, *MM/DD/YYYY*
***
**Description:**
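Scans the peaks of a single-sample linked Seurat object (RNA + ATAC) for human JASPAR2020 TF motifs, attaches the motif matches to the `peaks` assay, and tests linked peaks for motif enrichment.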
***
**TODOs:**
 - <font color='green'> Done TODO </font>
 - <font color='orange'> WIP TODO </font>
 - <font color='red'> Queued TODO </font>
***

## Set-up

In [3]:
# Set-up reticulate for running Python in R, takes about 10s
Sys.setenv(RETICULATE_PYTHON="/cellar/users/aklie/opt/miniconda3/envs/Renv/bin/python")
library(reticulate)
reticulate::use_python("/cellar/users/aklie/opt/miniconda3/envs/Renv/bin/python")
reticulate::use_condaenv("/cellar/users/aklie/opt/miniconda3/envs/Renv")
reticulate::py_module_available(module='leidenalg') #needs to be TRUE
reticulate::import('leidenalg') #good to make sure this doesn't error

Module(leidenalg)

In [4]:
# Load libraries, takes between 20s and 1 min
suppressMessages(library(hdf5r))
suppressMessages(library(Seurat))
suppressMessages(library(SeuratDisk))
suppressMessages(library(SeuratData))
suppressMessages(library(Signac))
suppressMessages(library(EnsDb.Hsapiens.v86))
suppressMessages(library(BSgenome.Hsapiens.UCSC.hg38))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
suppressMessages(library(Matrix))
suppressMessages(library(harmony))
suppressMessages(library(data.table))
suppressMessages(library(ggpubr))

# TF binding site (TFBS) motif database and scanning tools
suppressMessages(library(JASPAR2020))
suppressMessages(library(TFBSTools))
suppressMessages(library(motifmatchr))

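# Save the current warning level so it can be restored later, then silence warnings
warnLevel <- getOption('warn')
options(warn = -1)
set.seed(1234)

# Run future-based code on a single worker and allow globals up to 50 GiB
library(future)
plan("multicore", workers = 1)
options(future.globals.maxSize = 50 * 1024 ^ 3)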

In [5]:
sample <- "R207"
wd <- sample  # Working directory
# Build the input path once and reuse it for the message and the load
rds.path <- file.path("indv_sample_networks", sprintf("%s.indv.linked.rds", sample))
print(sprintf("Loading from %s", rds.path))
adata <- readRDS(file = rds.path)

[1] "Loading from indv_sample_networks/R207.indv.linked.rds"


In [6]:
pfm <- getMatrixSet(
  x = JASPAR2020,
  opts = list(species = 9606) # 9606 is the NCBI taxonomy ID for human
)

In [7]:
# Scan the DNA sequence of each peak for the presence of each motif
motif.matrix <- CreateMotifMatrix(
  features = granges(adata),
  pwm = pfm,
  genome = BSgenome.Hsapiens.UCSC.hg38,
  score = TRUE
)

In [12]:
# Create a new Motif object to store the results
# Note: `score = TRUE` belongs to CreateMotifMatrix(); passing it to
# CreateMotifObject() raises an "unused argument" error
motif <- CreateMotifObject(
  data = motif.matrix,
  pwm = pfm
)


In [9]:
# Attach the motif object to the object's default assay (expected to be "peaks")
Motifs(adata) <- motif

In [28]:
DefaultAssay(adata) <- "peaks"

In [29]:
adata <- RunTFIDF(adata)

Performing TF-IDF normalization



In [38]:
adata@assays

$RNA
Assay data with 36601 features for 8611 cells
First 10 features:
 MIR1302-2HG, FAM138A, OR4F5, AL627309.1, AL627309.3, AL627309.2,
AL627309.5, AL627309.4, AP006222.2, AL732372.1 

$ATAC
ChromatinAssay data with 108859 features for 8611 cells
Variable features: 0 
Genome: hg38 
Annotation present: TRUE 
Motifs present: FALSE 
Fragment files: 1 

$SCT
SCTAssay data with 24250 features for 8611 cells, and 1 SCTModel(s) 
Top 10 variable features:
 SST, GP2, CUZD1, NEAT1, PPY, ZEB2, REG1A, PDE4C, CADM2, CA12 

$peaks
ChromatinAssay data with 131761 features for 8611 cells
Variable features: 0 
Genome: hg38 
Annotation present: TRUE 
Motifs present: TRUE 
Fragment files: 1 


In [34]:
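# For each linked peak, report the motif(s) with the highest score
# NOTE: `delta.sample` is not created anywhere in this notebook and must exist before this cell runs
apply(motif.matrix[Links(delta.sample)$peak, ], 1, function(x) colnames(motif.matrix)[which(x == max(x))])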

ERROR: Error in h(simpleError(msg, call)): error in evaluating the argument 'i' in selecting a method for function '[': object 'delta.sample' not found
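The per-peak lookup attempted above can be tried on toy data without `delta.sample`. The following is a minimal sketch, assuming a small made-up peak-by-motif score matrix (`toy`, with hypothetical peak and motif names) in place of `motif.matrix`. It uses `lapply` rather than `apply` so the result is always a list, even when some peaks have tied top motifs and others do not.

```r
library(Matrix)

# Toy peak-by-motif score matrix; names and values are invented for illustration
toy <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 3, 1, 3),
  x = c(10, 12, 9, 11, 11),
  dims = c(3, 3),
  dimnames = list(c("peak1", "peak2", "peak3"), c("motifA", "motifB", "motifC"))
)

# Densify the (small) subset before row-wise work
dense <- as.matrix(toy)

# For each peak, keep every motif that reaches the row maximum (ties are kept)
top.motifs <- lapply(
  setNames(rownames(dense), rownames(dense)),
  function(peak) colnames(dense)[dense[peak, ] == max(dense[peak, ])]
)

print(top.motifs)
```

Densifying is only reasonable for a handful of peaks, such as the 12 linked peaks used later in this notebook; the full peak set would be better handled with sparse-aware operations.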


In [40]:
# Pull the TF-IDF-normalized peak-by-cell matrix ("data" slot after RunTFIDF)
scaled.counts <- GetAssayData(object = adata, assay = "peaks", slot = "data")

In [41]:
dim(scaled.counts)

In [44]:
dim(t(motif.matrix))

In [45]:
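# Motif-by-cell matrix: for each motif and cell, sum (motif score * normalized accessibility) over peaks
test <- t(motif.matrix) %*% scaled.counts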
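To make the matrix product above concrete, here is a minimal sketch on toy data. The matrices `toy.motifs` and `toy.access` and all of their values are invented for illustration; they stand in for `motif.matrix` (peaks by motifs) and `scaled.counts` (peaks by cells).

```r
library(Matrix)

# Toy peak-by-motif score matrix (3 peaks x 2 motifs); values are made up
toy.motifs <- sparseMatrix(
  i = c(1, 2, 3),
  j = c(1, 1, 2),
  x = c(10, 12, 8),
  dims = c(3, 2),
  dimnames = list(c("peak1", "peak2", "peak3"), c("motifA", "motifB"))
)

# Toy peak-by-cell accessibility matrix (3 peaks x 2 cells); values are made up
toy.access <- sparseMatrix(
  i = c(1, 2, 3, 3),
  j = c(1, 1, 1, 2),
  x = c(1, 0.5, 2, 1),
  dims = c(3, 2),
  dimnames = list(c("peak1", "peak2", "peak3"), c("cell1", "cell2"))
)

# Motif-by-cell scores: sum over peaks of (motif score * accessibility)
toy.scores <- t(toy.motifs) %*% toy.access
print(toy.scores)
```

Each entry is a sum over peaks, so the two matrices must list the same peaks in the same row order. For example, the `motifA`/`cell1` entry is `10 * 1 + 12 * 0.5`.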

In [46]:
dim(test)

In [47]:
head(test)

   [[ suppressing 8611 column names 'AAACAGCCAAACGGGC-1', 'AAACAGCCACAAAGAC-1', 'AAACAGCCAGCAAGTG-1' ... ]]



6 x 8611 sparse Matrix of class "dgCMatrix"
                                                                          
MA0030.1 33038.27 22154.04  9093.290 12761.534  9275.449 26560.10 28017.67
MA0031.1 28646.41 18897.77  8694.099 11918.110  9042.087 23967.96 23695.37
MA0051.1 32309.39 23566.34 11357.034 14667.079 11581.503 26345.76 27668.74
MA0057.1 45811.32 39243.20 22967.437 23725.666 19707.182 38638.53 42673.58
MA0059.1 18731.48 15663.16  9416.318 10893.833  8668.800 15992.99 18801.67
MA0066.1 21228.38 17669.95  8416.267  9920.737  8655.932 19639.07 20289.34
                                                                       
MA0030.1 27410.28 13340.10 24582.89 20867.77 22306.15 29691.43 26193.13
MA0031.1 24313.78 12280.58 21329.07 17211.42 20639.57 25774.81 23899.11
MA0051.1 29054.85 16499.67 27894.62 23612.37 24413.42 29418.79 28252.98
MA0057.1 46871.99 32558.48 43654.83 38334.83 33713.58 44536.18 40864.56
MA0059.1 17449.60 13659.64 19032.50 15894.70 15079.56 19249.95 16945.25

In [11]:
head(motif.matrix)

   [[ suppressing 633 column names 'MA0030.1', 'MA0031.1', 'MA0051.1' ... ]]



6 x 633 sparse Matrix of class "dgCMatrix"
                                                                              
chr1-9920-10525    . . .        . . .  .       . .  .      17.62553 .  .      
chr1-180694-181606 . . .        . . .  .       . .  .      16.86232 .  .      
chr1-184037-184465 . . 9.639221 . . .  .       . .  .       .       .  .      
chr1-190728-191868 . . .        . . . 14.05013 . .  .       .       . 13.71324
chr1-267842-268165 . . .        . . .  .       . . 12.2184  .       .  .      
chr1-276134-276382 . . .        . . . 11.18238 . .  .       .       .  .      
                                                                           
chr1-9920-10525    . . .  .       .  .       .  .       .        .        .
chr1-180694-181606 . . .  .       .  .       . 10.88946 .        .        .
chr1-184037-184465 . . . 12.17962 . 11.19787 .  .       .        .        .
chr1-190728-191868 . . .  .       .  .       .  .       1.836019 8.193306 .
chr1-267842-268165 . . .

In [36]:
motif.matrix[Links(delta.sample)$peak, ]

   [[ suppressing 633 column names 'MA0030.1', 'MA0031.1', 'MA0051.1' ... ]]



12 x 633 sparse Matrix of class "dgCMatrix"
                                                                       
chr2-1869786-1870458     10.94778  .        .        .       .  .      
chr2-2463941-2465039      .        .        .       12.45733 .  .      
chr2-2526923-2527511      .        .        .        .       . 10.80528
chr2-2665138-2665621      .        .        .        .       .  .      
chr2-208265033-208266811  .        .        .       11.48351 .  .      
chr8-6405618-6407375      .        .        .       13.56425 .  .      
chr8-6648503-6649676      .       12.20899  .       11.63926 .  .      
chr8-6656144-6656829     12.31322  .       11.96229  .       .  .      
chr8-6800113-6801601      .        .        .        .       .  .      
chr8-31794746-31795581    .        .        .        .       .  .      
chr8-31810626-31811298    .        .        .        .       .  .      
chr11-2211023-2211778     .        .        .        .       . 11.58297
                    

In [32]:
enriched.motifs <- FindMotifs(
  object = delta.sample,
  features = Links(delta.sample)$peak
)

Selecting background regions to match input sequence characteristics

Matching GC.percent distribution

Testing motif enrichment in 12 regions



In [33]:
enriched.motifs

| | motif | observed | background | percent.observed | percent.background | fold.enrichment | pvalue | motif.name |
|---|---|---|---|---|---|---|---|---|
| | &lt;chr&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;dbl&gt; | &lt;chr&gt; |
| MA0638.1 | MA0638.1 | 4 | 996 | 33.33333 | 2.4900 | 13.386881 | 0.000161221 | CREB3 |
| MA0844.1 | MA0844.1 | 3 | 949 | 25.00000 | 2.3725 | 10.537408 | 0.002495055 | XBP1 |
| MA1466.1 | MA1466.1 | 3 | 1084 | 25.00000 | 2.7100 | 9.225092 | 0.003635477 | ATF6 |
| MA1120.1 | MA1120.1 | 4 | 2311 | 33.33333 | 5.7775 | 5.769508 | 0.003780596 | SOX13 |
| MA0143.4 | MA0143.4 | 4 | 2354 | 33.33333 | 5.8850 | 5.664118 | 0.004041297 | SOX2 |
| MA1474.1 | MA1474.1 | 3 | 1212 | 25.00000 | 3.0300 | 8.250825 | 0.004973213 | CREB3L4 |
| MA1421.1 | MA1421.1 | 5 | 4251 | 41.66667 | 10.6275 | 3.920646 | 0.005634812 | TCF7L1 |
| MA0776.1 | MA0776.1 | 2 | 387 | 16.66667 | 0.9675 | 17.226529 | 0.005779291 | MYBL1 |
| MA0839.1 | MA0839.1 | 3 | 1294 | 25.00000 | 3.2350 | 7.727975 | 0.005969358 | CREB3L1 |
| MA0593.1 | MA0593.1 | 6 | 6454 | 50.00000 | 16.1350 | 3.098853 | 0.006722048 | FOXP2 |
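The columns of the enrichment table can be related to each other with a short calculation. This is a sketch under two assumptions: that the test behind `pvalue` is a one-sided hypergeometric test, as the Signac documentation for `FindMotifs` describes, and that 40,000 background peaks were used, which is inferred here from the CREB3 row (996 background hits at 2.49%) rather than stated in the notebook. The variable names are illustrative only.

```r
# Values taken from the CREB3 (MA0638.1) row of the table above
n.query      <- 12     # regions tested ("Testing motif enrichment in 12 regions")
observed     <- 4      # query regions containing the motif
background   <- 996    # background regions containing the motif
n.background <- 40000  # ASSUMPTION: inferred from 996 / 0.0249

# Percentages and fold enrichment as reported in the table
percent.observed   <- 100 * observed / n.query
percent.background <- 100 * background / n.background
fold.enrichment    <- percent.observed / percent.background

# One-sided hypergeometric test: P(X >= observed) when drawing n.query regions
pvalue <- phyper(
  q = observed - 1,
  m = background,
  n = n.background - background,
  k = n.query,
  lower.tail = FALSE
)

print(c(fold.enrichment = fold.enrichment, pvalue = pvalue))
```

With only 12 query regions, a difference of one or two observed hits moves the p-value substantially, so the ranking in the table should be read with that in mind.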


## Part 1
Description

## Part 2
Description

# Scratch
Place for old or testing code
