Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Next steps for analyzing differential gene expression in mussel feet #1774

Closed
graceleuchtenberger opened this issue Jan 2, 2024 · 16 comments
Assignees

Comments

@graceleuchtenberger
Copy link

graceleuchtenberger commented Jan 2, 2024

Hi all,

I did a preliminary DGE between foot and gill tissue just to see what I could get, my code is here.

I'm thinking in the treatment info file (the file DEseq uses to categorize samples by treatment group) I'll add another column with day of treatment (either day 0 or 3), and another column with experimental treatment (control, hypoxia, ocean warming, ocean acidification). I'm guessing that DESeq will struggle with having four treatment groups, I can keep looking at Sam Bogan's multifactorial gene expression tutorial on Marineomics, but any initial recommendations? Does it seem like I'm on track?

Thanks!

@sr320
Copy link
Member

sr320 commented Jan 3, 2024

I would start out doing single comparison within tissue type.
For instance next do

gill - control v hypoxia,
gill - control v ow,
gill - control v oa
foot - control v hypoxia..
etc

@mattgeorgephd
Copy link
Contributor

mattgeorgephd commented Jan 3, 2024

Hi Grace,

Steven beat me to it, but yes - single comparisons will give you the DEG lists you are after.

The design may look a bit complicated at first, but really all treatments within a given tissue type can be compared to their respective control treatment at day 0 (the treatment control, naïve mussels that were samples from the wild). The samples labeled control at day 3 act as an laboratory control. With a laboratory control you can determine what the impact of holding the mussels in the lab for 3 days was; does being in the lab mimic the effects of one of the treatments? With this in mind, we have four "treatments" and one true control:

Control:

  1. Control at day 0 - laboratory control (TC; no lab exposure)

Treatments:

  1. Ocean Acidification (OA; 3 day exposure)
  2. Hypoxia or low dissolved oxygen (DO; 3 day exposure)
  3. Ocean Warming (OW; 3 day exposure)
  4. Treatment control (LC; 3 days in laboratory)

When constructing your colData input for DESEQ (I think you called it tissuesub), I would have have three columns:

mussel_ID tissue treatment

You can then filter your count matrix by tissue or treatments before inputting into the DESeqDataSetFromMatrix argument like this:

# DEG in the foot after exposure to ocean acidification

# Filter data
tissuesub <- tissuesub %>% filter(tissue == "foot" | treatment == "TC" | treatment == "OA")
countmatrix_2 <- subset(countmatrix_2, select=row.names(tissuesub))

# Calculate DESeq object
dds <- DESeqDataSetFromMatrix(countData = countmatrix_2,
                              colData = tissuesub,
                              design = ~ treatment)

dds <- DESeq(dds)
resultsNames(dds) # lists the coefficients

# Filtering: keep genes that have at least 10 counts across 1/3 of the samples - https://support.bioconductor.org/p/110307/
keep <- rowSums(DESeq2::counts(dds) >= 10) >= ncol(countmatrix_2)/3
dds <- dds[keep,]

# Generate Contrasts
contrast_list    <- c("treatment", "OA", "TC") # order is important: factor, treatment group, control
res_table        <- results(dds, contrast=contrast_list, alpha = 0.05)
res_table_normal <- lfcShrink(dds,
                              coef=2, 
                              type="normal") # lfcThreshold = 0.585)  # a lfc threshold of 1 = 2-fold change, 0.585 = 1.5-fold change
res_table_apeglm <- lfcShrink(dds,
                              coef=2, 
                              type="apeglm") # lfcThreshold = 0.585)  # a lfc threshold of 1 = 2-fold change, 0.585 = 1.5-fold change
res_table_ashr   <- lfcShrink(dds,
                              coef=2, 
                              type="ashr")

The order of the contrast_list argument is important as it sets which "treatment" is the control or baseline condition
The different result tables are with various shrinkage estimators, as we discussed earlier. Apeglm worked for the best for the oyster project.

Before doing pairwise comparisons though, I would throw all the data from a single tissue type in an visualize using the PCA or biplots like this:

# Visualize differences between all treatments, foot tissue only

# Filter data
tissuesub <- tissuesub %>% filter(tissue == "foot")
countmatrix_2 <- subset(countmatrix_2, select=row.names(tissuesub))

# Calculate DESeq object
dds <- DESeqDataSetFromMatrix(countData = countmatrix_2,
                              colData = tissuesub,
                              design = ~ treatment)

dds <- DESeq(dds)
resultsNames(dds) # lists the coefficients

# Plot PCA
p <- pca(assay(vst(dds)), metadata = tissuesub)
screeplot(p, axisLabSize = 18, titleLabSize = 22)

bp0 <- biplot(p, showLoadings = FALSE, colby = 'treatment',
    labSize = 5, pointSize = 5, sizeLoadingsNames = 5)

bp0

#Plot biplot

bp1 <- biplot(p, x = "PC1", y = "PC2",
              colby = 'treatment',
              colkey = c('TC' = 'royalblue1' , 
                         'OA' = 'green1'     ,
                         'OW' = 'yellow2' ,
                         'DO' = 'orange' ,
                         'LC' = 'orangered1'),
              # shape = 'tissue',
              # shapekey = c('foot' = 19,  # circle
              #             'gill' = 17), # triangle
              pointSize = 4,
              showLoadings = FALSE,
              lab = NULL,
              drawConnectors = FALSE,
              ellipse = TRUE,
              ellipseLevel = 0.95,
              ellipseFill = FALSE,
              # ellipseFillKey = c('foot' = 'blue', 'gill' = 'green'),
              ellipseAlpha = 1/4,
              ellipseLineSize = 1,
              # xlim = c(-150,150), ylim = c(-80, 50),
              gridlines.major = FALSE,
              gridlines.minor = FALSE,
              borderWidth = 1,
              legendPosition = 'top', legendLabSize = 16, legendIconSize = 6.0) + 
              theme(axis.text.x       = element_text(size=16,color="black"),
                    axis.text.y       = element_text(size=16,color="black"),
                    axis.ticks.x      = element_line(color="black"),
                    axis.ticks.y      = element_line(color="black"))
bp1

Looking at the biplot should help identify outliers as well

Here is the oyster code that I referenced earlier that you can use as a guide.

@graceleuchtenberger
Copy link
Author

Okay I've been poking around with it and for some reason when I try to create the p object (for pca) I keep getting this error:

Error: useNames = NA is defunct. Instead, specify either useNames = TRUE or useNames = FALSE.

Do I not have the correct packages installed? What do I need to run this command?

@kubu4
Copy link
Contributor

kubu4 commented Jan 13, 2024

Link (and/or copy/paste) code you're trying to run?

Also, the message tells you what to do. Instead of useNames = NA, you'll need to replace that with either useNames = TRUE OR useNames = FALSE.

@mattgeorgephd
Copy link
Contributor

mattgeorgephd commented Jan 13, 2024

The pca() function used here is from the bioconductor package PCAtools.

Try BiocManager::install("PCAtools") if you haven't already installed it.

For reference, here is the installed packages that I used in the oyster code. Running the sapply chunk will load/install dependencies after you have installed the bioconductor packages manually with the commented out lines at the top.

### Load libraries
```{r load_libraries, inlcude = TRUE}

## clear
rm(list=ls())

## Install Rtools directly from (https://cran.r-project.org/bin/windows/Rtools/), then install these on first run:
# install.packages("BiocManager")
# BiocManager::install("DESeq2")
# BiocManager::install("vsn")
# BiocManager::install("tidybulk")
# BiocManager::install("goseq")
# BiocManager::install("affycoretools")
# BiocManager::install("EnhancedVolcano")
# BiocManager::install("pcaExplorer")
# BiocManager::install("apeglm")
# BiocManager::install("PCAtools")


# List of packages we want to install (run every time)
load.lib<-c("DESeq2","edgeR","goseq","dplyr","GenomicFeatures","data.table","calibrate","affycoretools","data.table","vsn","tidybulk","ggplot2","cowplot","pheatmap","gplots","RColorBrewer","EnhancedVolcano","pcaExplorer","readxl","apeglm","ashr","tibble","plotly","sqldf","PCAtools","ggpubr","beepr","genefilter","ComplexHeatmap","circlize","scales")

# Select only the packages that aren't currently installed (run every time)
install.lib <- load.lib[!load.lib %in% installed.packages()]

# And finally we install the missing packages, including their dependency.
for(lib in install.lib) install.packages(lib,dependencies=TRUE)
# After the installation process completes, we load all packages.
sapply(load.lib,require,character=TRUE)
                        

@graceleuchtenberger
Copy link
Author

graceleuchtenberger commented Jan 24, 2024

Okay I installed everything, I'm still getting that same error. Also, I should've included this before, but I've tried using the useNames argument (both False and True) and I keep getting this error:

Error in pca(assay(dds2vst), metadata = treatmentinfo.F, useNames = TRUE) :
unused argument (useNames = TRUE)

So if I include it the code won't run, but if I don't include it the code won't run. R did yell at me when I installed the packages though because the version of R on Raven is not the most current (and therefore isn't compatible with the newest version of Bioconductor), could that be the culprit?

@kubu4
Copy link
Contributor

kubu4 commented Jan 24, 2024

Possibly try solution here: satijalab/seurat#7501 (comment)

For this:

R did yell at me when I installed the packages though because the version of R on Raven is not the most current (and therefore isn't compatible with the newest version of Bioconductor), could that be the culprit?

Please post commands and screenshot(s) of warnings/error messages.

@graceleuchtenberger
Copy link
Author

Thank you, I tried that solution unfortunately to no avail. R is stubborn.
Also, this happened at the beginning of the analysis a few weeks ago, so I unfortunately can't bring up the error message anymore (even with reinstalling Bioconductor), but I did find this comment from another thread and the same user talking about the same issue:
satijalab/seurat#8183 (comment)

@kubu4
Copy link
Contributor

kubu4 commented Jan 24, 2024

Did you try that second option (saw that, too, but wanted you to try the first possible solution).

Sorry, this isn't clear to me:

so I unfortunately can't bring up the error message anymore (even with reinstalling Bioconductor)

So, are you saying you are not getting an error message any more?

@graceleuchtenberger
Copy link
Author

Screenshot 2024-01-23 at 5 32 00 PM

This is the error I get when I try to install the most recent version of Bioconductor. Although, when I install Biocmanager, it's fine? However, I get warnings like this though when I use it to download Bioconductor packages:

Screenshot 2024-01-23 at 5 35 44 PM

@kubu4
Copy link
Contributor

kubu4 commented Jan 24, 2024

Try restarting your R session, try again, and please report back.

@sr320
Copy link
Member

sr320 commented Jan 24, 2024

@graceleuchtenberger can you provide a url to the code on github

Sorry - just saw link at top

@graceleuchtenberger
Copy link
Author

Got the same useNames error after restarting my session. Also, for when I was installing the packages (see screenshot below), I kept clicking "a" before (update all packages), but I just tried doing "n" and got the same result.
Screenshot 2024-01-24 at 12 11 36 PM

Thank you guys for your help!

@sr320
Copy link
Member

sr320 commented Jan 24, 2024

Can you add me as collaborator so I can push - and to clarify the current issue is you can create a PCA?

@graceleuchtenberger
Copy link
Author

Steven I should've just added you, check for an invite and let me know.
The issue is that I cannot create a pca via PCAtools right now (Matt's code that he gave above in his first reply), maybe I can do it another way but I don't know how.

@kubu4
Copy link
Contributor

kubu4 commented Jan 25, 2024

@graceleuchtenberger - I've created a separate issue for resolving the package installation issue here: #1790. Please switch to that issue to deal with this specific problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants