Issue with hovering for standardized parallel coordinate plot for RNA seq #15

Open
Galaxy154 opened this issue May 23, 2023 · 1 comment


@Galaxy154

Hi there,

I love this tool, but I am having an issue creating the interactive parallel coordinate plot, both with my data and with the example data. I cannot get the gene names to show up when hovering over the individual lines. I have followed the exact steps outlined here: https://lindsayrutter.github.io/bigPint/articles/pipeline.html#step-5-deg-litre-plots-2, but I am unable to get it to work, even with the example data. Instead, when hovering over the lines, I just get the sample names. I think this is really cool functionality and would appreciate any help with getting the gene names to show up!

Below is an example image, along with the code I used to generate it:

[Image: screenshot of the interactive plot, showing sample names instead of gene names on hover]

library(bigPint)
library(dplyr)
library(ggplot2)
library(plotly)

data = data %>% select(ID, starts_with("B"), starts_with("L"))
str(data, strict.width = "wrap")

data_st <- as.data.frame(t(apply(as.matrix(data[,-1]), 1, scale)))
data_st$ID <- as.character(data$ID)
data_st <- data_st[,c(length(data_st), 1:length(data_st)-1)]
colnames(data_st) <- colnames(data)
nID <- which(is.nan(data_st[,2]))
data_st[nID,2:length(data_st)] <- 0

library(edgeR)
library(data.table)

rownames(data) = data[,1]

y = DGEList(counts=data[,-1])
group = c(1,1,1,1,2,2,2,2)

y = DGEList(counts=y, group=group)
Group = factor(c(rep("B",4), rep("L",4)))
design <- model.matrix(~0+Group, data=y$samples)
colnames(design) <- levels(Group)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)
dataMetrics <- list()

contrast=rep(0,ncol(fit))
contrast[1]=1
contrast[2]=-1
lrt <- glmLRT(fit, contrast=contrast)
lrt <- topTags(lrt, n = nrow(y[[1]]))[[1]]

lrt <- setDT(lrt, keep.rownames = TRUE)[]
colnames(lrt)[1] = "ID"
lrt <- as.data.frame(lrt)

dataMetrics[[paste0(colnames(fit)[1], "_", colnames(fit)[2])]] <- lrt

ret <- plotPCP(data=data_st, saveFile = FALSE)
ret[["B_L"]]

ret <- plotPCP(data_st, dataMetrics, threshVal = 0.1, lineSize = 0.3,
  lineColor = "magenta", saveFile = FALSE)
ret[["B_L"]] + ggtitle("DEGs (FDR < 0.1)")

#Making the plot
ret <- plotClusters(data_st, dataMetrics, threshVal = 0.1, nC = 2,
  colList = c("#00A600FF", "#CC00FFFF"), lineSize = 0.5, verbose = TRUE)
plot(ret[["B_L_2"]])

ret <- plotPCP(data_st, dataMetrics, threshVal = 0.2, lineSize = 0.5,
  lineColor = "magenta", saveFile = FALSE, hover = TRUE)
ret[["B_L"]] %>% layout(title="DEGs (FDR < 0.2)")
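The standardization step in the script (building data_st) z-scores each gene's counts across its samples, then sets genes whose counts are identical in every sample to 0, because their standard deviation is zero and scale() returns NaN for them. A minimal base-R sketch of the same idea on a made-up three-gene table (the values are illustrative only) is below. It writes the column reordering as 1:(n - 1): the script's 1:length(data_st)-1 parses as (1:n) - 1, which only works because R silently drops a 0 index.

```r
# Toy count table: one ID column plus four samples (made-up values)
counts <- data.frame(
  ID  = c("geneA", "geneB", "geneC"),
  B.1 = c(10, 5, 7),
  B.2 = c(12, 5, 7),
  L.1 = c(30, 6, 7),
  L.2 = c(28, 4, 7)
)

# Row-wise z-scores: scale() each gene's row, then transpose back to genes x samples
z <- as.data.frame(t(apply(as.matrix(counts[, -1]), 1, scale)))
z$ID <- as.character(counts$ID)

# Move ID to the front; the parentheses in 1:(n - 1) make the intent explicit
n <- length(z)
z <- z[, c(n, 1:(n - 1))]
colnames(z) <- colnames(counts)

# A gene with identical counts in every sample has sd = 0, so scale() gives NaN;
# set those rows to 0, as the script does
nanRows <- which(is.nan(z[, 2]))
z[nanRows, 2:n] <- 0

print(z)
```

Each non-constant row of z then has mean 0 and standard deviation 1, which is what puts all genes on a common scale in the parallel coordinate plot.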

@lindsayrutter
Owner

lindsayrutter commented May 25, 2023

Hi Galaxy154,

Thanks for your inquiry. I see that your data has variable names "B" and "L" (instead of "S.1" and "S.2", as in the toy soybean_cn_sub data in the bigPint package) and that you have four samples for each variable (instead of 3 samples per variable, as in the toy soybean_cn_sub data in the bigPint package).

So, I tried to reproduce similar code from the website you were following (https://lindsayrutter.github.io/bigPint/articles/pipeline.html#step-5-deg-litre-plots-2), and altered the example data (soybean_cn_sub) by changing its variable names to "B" and "L" and creating a fourth sample for each variable, using the add_column() function from the tibble package. I think this should create a dataset similar to the one you are working on. After that, I ran your exact code, and it seemed to generate the output you intended. See the code below:

library(bigPint)
library(dplyr)
library(ggplot2)
library(plotly)
library(tibble) # to use add_column() function

# Create a dataset similar to Galaxy154's
data("soybean_cn_sub")
data = soybean_cn_sub %>% select(ID, starts_with("S1"), starts_with("S3"))
names(data) = c("ID", "B.1", "B.2", "B.3", "L.1", "L.2", "L.3")
data = add_column(data, B.4 = data[,4], .after = 4) # Add a fourth sample to "B"
data = add_column(data, L.4 = data[,8], .after = 8) # Add a fourth sample to "L"

# Check the structure of the simulated data
str(data)

# The rest of the code is Galaxy154's original code
data_st <- as.data.frame(t(apply(as.matrix(data[,-1]), 1, scale)))
data_st$ID <- as.character(data$ID)
data_st <- data_st[,c(length(data_st), 1:length(data_st)-1)]
colnames(data_st) <- colnames(data)
nID <- which(is.nan(data_st[,2]))
data_st[nID,2:length(data_st)] <- 0

library(edgeR)
library(data.table)

rownames(data) = data[,1]

y = DGEList(counts=data[,-1])
group = c(1,1,1,1,2,2,2,2)

y = DGEList(counts=y, group=group)
Group = factor(c(rep("B",4), rep("L",4)))
design <- model.matrix(~0+Group, data=y$samples)
colnames(design) <- levels(Group)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)
dataMetrics <- list()

contrast=rep(0,ncol(fit))
contrast[1]=1
contrast[2]=-1
lrt <- glmLRT(fit, contrast=contrast)
lrt <- topTags(lrt, n = nrow(y[[1]]))[[1]]

lrt <- setDT(lrt, keep.rownames = TRUE)[]
colnames(lrt)[1] = "ID"
lrt <- as.data.frame(lrt)

dataMetrics[[paste0(colnames(fit)[1], "_", colnames(fit)[2])]] <- lrt

ret <- plotPCP(data=data_st, saveFile = FALSE)
ret[["B_L"]]

ret <- plotPCP(data_st, dataMetrics, threshVal = 0.1, lineSize = 0.3, lineColor = "magenta", saveFile = FALSE)
ret[["B_L"]] + ggtitle("DEGs (FDR < 0.1)")

#Making the plot
ret <- plotClusters(data_st, dataMetrics, threshVal = 0.1, nC = 2, colList = c("#00A600FF", "#CC00FFFF"), lineSize = 0.5, verbose = TRUE)
plot(ret[["B_L_2"]])

ret <- plotPCP(data_st, dataMetrics, threshVal = 0.2, lineSize = 0.5, lineColor = "magenta", saveFile = FALSE, hover = TRUE)
ret[["B_L"]] %>% layout(title="DEGs (FDR < 0.2)")
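In both scripts, the name of the dataMetrics entry ("B_L") is built from the design-matrix column names after they are renamed to the factor levels, and the contrast vector c(1, -1) makes edgeR's log fold change equal to B minus L. The base-R sketch below (no edgeR required) reproduces only that bookkeeping, using the same grouping as the scripts; it is an illustration, not part of bigPint.

```r
# Same grouping as in the scripts: four B samples followed by four L samples
Group <- factor(c(rep("B", 4), rep("L", 4)))

# The scripts also pass data = y$samples; Group is found in the workspace either way
design <- model.matrix(~0 + Group)
print(colnames(design))            # model.matrix names the columns "GroupB", "GroupL"
colnames(design) <- levels(Group)  # rename them to "B", "L"

# Contrast vector: +1 for B, -1 for L, i.e. B relative to L
contrast <- rep(0, ncol(design))
contrast[1] <- 1
contrast[2] <- -1

# Key under which the scripts store the results in dataMetrics
key <- paste0(colnames(design)[1], "_", colnames(design)[2])
print(key)
```

With variable names other than B and L, the key (and therefore the name used in ret[["B_L"]]) would change accordingly.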

Your code, run on this simulated data, does seem to create what you intend, i.e. an interactive plot that displays the gene names (instead of the sample names) when hovering over the lines.

[Image: output of the interactive plot, with gene names shown on hover]
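To check which labels a plotly figure will show on hover without relying on a screenshot, one can resolve the figure with plotly::plotly_build() and look at the text attached to each trace. The sketch below does this on a small made-up ggplot whose text aesthetic is the gene ID; it only illustrates the inspection step and is not a description of how plotPCP() builds its hover labels. The same lapply() line could be applied to plotly_build(ret[["B_L"]]) to see whether its traces carry gene IDs or sample names.

```r
library(ggplot2)
library(plotly)

# Made-up standardized values for two genes across three samples
toy <- data.frame(
  ID     = rep(c("gene1", "gene2"), each = 3),
  Sample = rep(c("B.1", "B.2", "L.1"), times = 2),
  Value  = c(-1, 0, 1, 1, 0, -1)
)

# Map the gene ID to the 'text' aesthetic so it can be used as the tooltip
# (ggplot2 may warn about an unknown 'text' aesthetic; ggplotly() still uses it)
p <- ggplot(toy, aes(x = Sample, y = Value, group = ID, text = ID)) +
  geom_line()
fig <- ggplotly(p, tooltip = "text")

# plotly_build() resolves the figure into its final list of traces;
# each trace's 'text' field holds what is displayed on hover
built <- plotly_build(fig)
hoverText <- unlist(lapply(built$x$data, function(tr) tr$text))
print(unique(hoverText))
```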

Does my reproduction with the simulated data work for you too? If so, the trick might be to determine how your data differs from the toy data, which I simulated to resemble what I believe your data looks like.

If you are still stuck, please let me know what you get when you run the command:

str(data)

on your data frame. If your data frame has a different structure than the simulated toy data (i.e. str() gives a different output than it did when I ran str(data) on the toy data), that may point us to the source of the problem.
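Besides reading two str() printouts side by side, a small helper can line up column names and classes directly. compare_structure() below is a hypothetical base-R function (not part of bigPint), shown on two made-up data frames; in practice its two arguments would be your data frame and the simulated toy data.

```r
# Hypothetical helper: lists each column of two data frames with its class,
# and flags columns that are missing from one side or differ in class
compare_structure <- function(a, b) {
  cols <- union(names(a), names(b))
  classOf <- function(df, col) {
    if (col %in% names(df)) paste(class(df[[col]]), collapse = "/") else NA_character_
  }
  out <- data.frame(
    column = cols,
    a      = vapply(cols, function(cl) classOf(a, cl), character(1), USE.NAMES = FALSE),
    b      = vapply(cols, function(cl) classOf(b, cl), character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
  out$same <- !is.na(out$a) & !is.na(out$b) & out$a == out$b
  out
}

# Made-up example: 'mine' stores ID as a factor and lacks the L.4 column
toyData <- data.frame(ID = c("g1", "g2"), B.1 = c(1L, 2L), L.4 = c(3L, 4L),
                      stringsAsFactors = FALSE)
mine    <- data.frame(ID = factor(c("g1", "g2")), B.1 = c(1L, 2L))

print(compare_structure(mine, toyData))
cat("Same object class:", identical(class(mine), class(toyData)), "\n")
```

The last line also catches cases where one object is, for example, a tibble and the other a plain data.frame, which str() reports only in its first line.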

Thanks again!
