# RanCh (Integrative Analysis of RNA-Seq and ChIP-Seq Data)

Tools

1. DESeq2 - Differential Gene Expression Analysis
2. BETA - Integrative Analysis of RNA-Seq and ChIP-Seq Data

## RNA-Seq

In [1]:
# Imports
library(DESeq2)
library(dplyr)
library(stringr)
library(tidyr)

In [4]:
# Load the feature count matrix and the sample metadata
countData <- read.csv("Analysis/E-GEOD-47399-raw-counts.tsv", sep="\t", header=TRUE, row.names=1)
metaData <- read.csv("Analysis/E-GEOD-47399-experiment-design.tsv", sep="\t", header=TRUE, row.names=1)
# Reorder the metadata rows to match the column order of the count matrix
metaData <- metaData[match(colnames(countData), row.names(metaData)), , drop = FALSE]
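
DESeq2 expects the rows of the sample table to line up with the columns of the count matrix, and `match()` silently returns `NA` for any sample it cannot find. The sketch below shows one way to check the alignment before building the dataset. The toy `counts` and `meta` objects and their sample names are hypothetical stand-ins, not the E-GEOD-47399 data.

```r
# Toy stand-ins for countData and metaData (hypothetical sample names).
counts <- data.frame(S1 = c(10, 0), S2 = c(5, 3), S3 = c(7, 1),
                     row.names = c("geneA", "geneB"))
meta <- data.frame(phenotype = c("mutant", "wild type", "mutant"),
                   row.names = c("S3", "S1", "S2"))

# Position of each count-matrix column in the metadata rows.
idx <- match(colnames(counts), rownames(meta))

# match() yields NA for samples absent from the metadata, so stop early.
if (anyNA(idx)) {
  stop("Samples missing from metadata: ",
       paste(colnames(counts)[is.na(idx)], collapse = ", "))
}

# drop = FALSE keeps a data frame even when there is a single column.
meta <- meta[idx, , drop = FALSE]

# The sample order now agrees between the two objects.
stopifnot(identical(rownames(meta), colnames(counts)))
```

Running the same check on the real objects before `DESeqDataSetFromMatrix()` would surface a non-sample column or a misspelled sample name as an explicit error.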

In [6]:
colnames(metaData)

In [10]:
# Create the DESeq2 dataset and run the differential expression pipeline
dds <- DESeqDataSetFromMatrix(countData = countData, colData = metaData, design = ~Factor.Value.phenotype.)
dds <- DESeq(dds)

“some variables in design formula are characters, converting to factors”
  Note: levels of factors in the design contain characters other than
  letters, numbers, '_' and '.'. It is recommended (but not required) to use
  only letters, numbers, and delimiters '_' or '.', as these are safe characters

estimating size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

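
The note in the output above is triggered by factor levels that contain characters such as spaces or hyphens. A minimal sketch of one way to avoid it is to convert the design column to a factor with syntactically safe levels before creating the dataset. The `phenotype` labels below are hypothetical; the real values of `Factor.Value.phenotype.` are not shown in this notebook.

```r
# Hypothetical phenotype labels containing a space and a hyphen.
phenotype <- c("wild type", "knock-down", "wild type", "knock-down")

# make.names() replaces characters that are invalid in R names with ".".
safe <- factor(make.names(phenotype))

# Put the control group first so fold changes are reported relative to it.
safe <- relevel(safe, ref = "wild.type")

print(levels(safe))
```

Setting the reference level explicitly also fixes the direction of the reported log2 fold changes, which otherwise follows the alphabetical order of the levels.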

In [11]:
# Extract the results table; it already carries the columns
# baseMean, log2FoldChange, lfcSE, stat, pvalue and padj
res <- results(dds)
resdf <- as.data.frame(res)
# Keep only the fold change and the raw p-value for BETA
resdf <- resdf[c("log2FoldChange", "pvalue")]
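
The cell above drops the `padj` column, so the later filtering step works on raw p-values. As an illustration of how much a multiple-testing correction can change the picture, the sketch below applies the Benjamini-Hochberg method with base R's `p.adjust()` to a hypothetical vector of p-values. DESeq2 computes `padj` with the same method by default, although it also applies independent filtering, so the numbers here are only illustrative.

```r
# Hypothetical raw p-values for five genes.
pvalue <- c(0.001, 0.012, 0.04, 0.045, 0.2)

# Benjamini-Hochberg adjustment controls the false discovery rate.
padj <- p.adjust(pvalue, method = "BH")

# Compare how many genes pass a 0.05 cut-off before and after adjustment.
n_raw <- sum(pvalue < 0.05)
n_adj <- sum(padj < 0.05)
cat("raw p < 0.05:", n_raw, " adjusted p < 0.05:", n_adj, "\n")
```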

In [29]:
# Remove the rows with NA values
resdf <- na.omit(resdf)

# Filter the results based on the pvalue and log2FoldChange
resdf <- resdf[which(resdf$pvalue < 0.05), ]
resdf <- resdf[which(abs(resdf$log2FoldChange) > 1), ]

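# Write the results as an unquoted, headerless, tab-delimited file
# (the file keeps its .csv extension): gene ID, log2FoldChange, pvalue
write.table(resdf, file = 'Analysis/BETA_Demo/results.csv', sep = '\t', quote = FALSE, col.names = FALSE)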
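
The `-k BSF` option used in the next section refers to BETA's own three-column layout: gene ID, regulatory status (here the log2 fold change) and a statistic (here the p-value). By default `write.table()` wraps character fields, including row names, in double quotes, and passing `quote = FALSE` avoids that. The sketch below writes a small hypothetical table to a temporary file and checks the layout; the gene names and values are made up for illustration.

```r
# Hypothetical filtered results; gene IDs live in the row names.
toy <- data.frame(log2FoldChange = c(2.5, -1.8),
                  pvalue = c(0.001, 0.02),
                  row.names = c("GENE1", "GENE2"))

# Write tab-delimited, unquoted, without a header line.
out_file <- tempfile(fileext = ".txt")
write.table(toy, file = out_file, sep = "\t", quote = FALSE, col.names = FALSE)

# Read the file back and confirm every line has exactly three fields.
lines <- readLines(out_file)
fields <- strsplit(lines, "\t", fixed = TRUE)
stopifnot(all(lengths(fields) == 3))
print(lines)
```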

## Integration with ChIP-Seq

In [39]:
system('/home/shrey/miniconda3/envs/beta/bin/BETA basic -p Analysis/MAX_HeLa_S3_chip_seq.bed -e Analysis/BETA_Demo/results.csv -g hg38 -k BSF -n demo -o Analysis/BETA_Demo --gname2 -c 0.05', intern = TRUE)

“running command '/home/shrey/miniconda3/envs/beta/bin/BETA basic -p Analysis/MAX_HeLa_S3_chip_seq.bed -e Analysis/BETA_Demo/results.csv -g hg38 -k BSF -n demo -o Analysis/BETA_Demo --gname2 -c 0.05' had status 1”
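
The warning above only reports that the command exited with status 1; it does not show what the tool printed. A minimal sketch of one way to capture both the exit status and the console output uses base R's `system2()`. The `run_cli` helper is hypothetical, and `Rscript` stands in for the BETA executable so that the example runs without BETA installed.

```r
# Hypothetical helper: run a command, capture stdout and stderr,
# and return the exit status alongside the captured text.
run_cli <- function(command, args = character()) {
  out <- suppressWarnings(
    system2(command, args, stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")  # only set when the exit code is non-zero
  list(status = if (is.null(status)) 0L else as.integer(status),
       output = as.character(out))
}

# Stand-in command that prints a message to stderr and exits with status 1.
rscript <- file.path(R.home("bin"), "Rscript")
res <- run_cli(rscript,
               c("-e", shQuote('message("simulated failure"); quit(status = 1)')))

# Show the captured output whenever the command fails.
if (res$status != 0L) {
  cat("Command failed with status", res$status, "\n")
  cat(res$output, sep = "\n")
}
```

Calling the helper with the BETA path as `command` and its options as a character vector in `args` would print BETA's own error text, which is usually the quickest route to the cause of a non-zero exit.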
