extract brain tissue from gtex gene count #5

ghost · 2022-02-15T00:51:22Z

Hello,
I have downloaded GTEx RNA-seq data from this site:
wget https://storage.googleapis.com/gtex_analysis_v8/rna_seq_data/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_reads.gct.gz
I want to extract only the brain tissue gene expression data from this file.
I checked the yarn package. Does it have a function that can do this?
checkTissuesToMerge(obj, majorGroups, minorGroups, filterFun = NULL, plotFlag = TRUE, ...)
But this function merges tissues based on the gene expression file.
I specifically want to extract only the brain tissue gene expression data.
Could you please give me some guidance on how to proceed?
Thank you

Xenophong · 2022-02-23T20:11:16Z

Hi, I am not the author of this package, but I think this is a more general question. You have to download the sample attribute manifest file, find which sample IDs belong to the brain tissues, and then extract those samples from your gene expression file.
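
A minimal sketch of those steps in R, using small made-up tables in place of the real GTEx files. The sample IDs, gene IDs and counts below are invented for illustration; SAMPID and SMTSD are the column names of the GTEx sample attributes file mentioned later in this thread.

```r
# Toy stand-in for the sample attribute file (made-up rows)
attrs <- data.frame(
  SAMPID = c("GTEX-AAAA-0001", "GTEX-AAAA-0002", "GTEX-BBBB-0001"),
  SMTSD  = c("Brain - Cortex", "Whole Blood", "Brain - Amygdala"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the gene count table:
# two annotation columns, then one column per sample
counts <- data.frame(
  Name = c("GENE1", "GENE2"),
  Description = c("geneA", "geneB"),
  "GTEX-AAAA-0001" = c(10L, 0L),
  "GTEX-AAAA-0002" = c(5L, 7L),
  "GTEX-BBBB-0001" = c(3L, 12L),
  check.names = FALSE,  # keep the dashes in the sample IDs
  stringsAsFactors = FALSE
)

# 1. Find the sample IDs annotated as brain tissue
brain_ids <- attrs$SAMPID[grepl("Brain", attrs$SMTSD)]

# 2. Keep only the IDs that are actually present in the count table
brain_ids <- intersect(brain_ids, names(counts))

# 3. Extract the annotation columns plus the brain samples
brain_counts <- counts[, c("Name", "Description", brain_ids)]
print(brain_counts)
```

The intersect() step is there because not every sample in the attribute file necessarily has a column in the count file.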

marouenbg · 2023-10-22T01:12:44Z

Hi,
We moved the software to netZooR and will be happy to help you here: https://github.com/netZoo/netZooR

gmillerscripps · 2023-11-09T22:04:53Z

I don't know if you've looked into the netZooR package that @marouenbg mentioned (I didn't know that support had transitioned to netZooR until I read this comment today), but yes @Xenophong is correct that you need the sample attribute data. For version 8 it would be:
https://storage.googleapis.com/adult-gtex/annotations/v8/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt

You can then look at column SMTSD and grepl for "Brain" (at least that is what I did), and then match the SAMPIDs in the phenodata to the column names of the gene reads data.

I just downloaded the files from GTEx to my local machine and read them in from there. Most of this is taken directly from yarn::downloadGTEx, but I had to modify some things.


  library(dplyr)  # provides %>%, left_join(), filter() and pull()

  # Location and names of the two downloaded files (adjust data.dir to your setup)
  data.dir <- "."
  samp_pheno <- "GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt"
  gene_file <- "GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_reads.gct.gz"

  # Reading in Sample Phenotype Data
  # Data should contain:
  #' SAMPID
  #' SMTSD
  pd <- read.delim(file.path(data.dir, samp_pheno))

  message("Sample Phenotype Found")

  # Set as matrix and get row names
  pd <- as.matrix(pd)
  rownames(pd) <- pd[, "SAMPID"]

  # Subset if tissue subset given
  tissue_subset <- "Brain"

  # Look for any matches on the tissue subset (case-insensitive)
  tissue_pattern <- paste0(tolower(tissue_subset), collapse = "|")
  subset_rows <- which(grepl(tissue_pattern, tolower(pd[, "SMTSD"])))
  pd <- pd[subset_rows, , drop = FALSE]

  tissues <- unique(pd[, "SMTSD"])
  message("Unique Tissues:")
  message(paste0(tissues, collapse = "\n"))

  # Get IDs (first two dash-separated fields of SAMPID); not used further below
  ids <- sapply(strsplit(pd[, "SAMPID"], "-"), function(i) paste(i[1:2], collapse = "-"))

  # Read in the counts data
  #' Assuming data is in .gct format
  message("Reading in count data")

  # skip = 2 skips the two GCT header lines that precede the column names;
  # reading the .gz file directly requires the R.utils package
  cnts_raw <- data.table::fread(file = file.path(data.dir, gene_file), skip = 2)
  print(dim(cnts_raw))

  # Get unique samples
  cnt_names <- unique(names(cnts_raw))

  # Join samples with sample phenotype data
  sample_info <- data.frame(SAMPID = cnt_names) %>%
    left_join(
      as.data.frame(pd[, c("SAMPID", "SMTSD"), drop = FALSE]), by = "SAMPID"
    )

  # Subset to tissue subset if provided
  subset_samples <- sample_info %>%
    filter(grepl(tissue_pattern, tolower(SMTSD))) %>%
    pull(SAMPID)
  subset_samples <- subset_samples[grepl("GTEX", subset_samples)]

  # Subset to given subset
  # fread() keeps the dashes in the column names, so no dash/dot conversion is needed
  cnts <- cnts_raw[, c("Name", "Description", subset_samples), with = FALSE]

  # Get unique genes
  genes <- cnts[["Name"]]
  geneNames <- cnts[["Description"]]

  # Keep only the sample columns: drop the two annotation columns (Name, Description).
  # Dropping three columns here would also discard the first sample.
  counts <- as.matrix(cnts[, -c(1:2)])
  mode(counts) <- "integer"

  # Set the names of the counts to those genes
  rownames(counts) <- genes
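
If the phenotype rows also need to be in the same order as the columns of the count matrix, match() can do that alignment. This is only a minimal sketch on made-up data: the IDs and values are invented, and pd_toy and counts_toy are hypothetical stand-ins for pd and counts.

```r
# Toy count matrix: genes in rows, samples in columns (made-up values)
counts_toy <- matrix(
  c(10L, 0L, 3L, 12L), nrow = 2,
  dimnames = list(c("GENE1", "GENE2"), c("GTEX-BBBB-0001", "GTEX-AAAA-0001"))
)

# Toy phenotype matrix in a different order, with one extra sample
pd_toy <- cbind(
  SAMPID = c("GTEX-AAAA-0001", "GTEX-CCCC-0001", "GTEX-BBBB-0001"),
  SMTSD  = c("Brain - Cortex", "Brain - Cortex", "Brain - Amygdala")
)
rownames(pd_toy) <- pd_toy[, "SAMPID"]

# Position of each count column in the phenotype table
# (NA would mean a sample without a phenotype row)
idx <- match(colnames(counts_toy), pd_toy[, "SAMPID"])
stopifnot(!anyNA(idx))

# Reorder the phenotype rows to follow the count columns
pd_aligned <- pd_toy[idx, , drop = FALSE]
print(pd_aligned)
```

After this step, row i of pd_aligned describes column i of counts_toy, and samples without counts are dropped.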

marouenbg · 2023-11-10T04:16:02Z

Hi @gmillerscripps ,
Apologies for the confusion: yarn is not maintained in netZooR but is only called as a dependency. I will have to check which repo holds the source code for the Bioconductor package and will let you know ASAP.
Best,
Marouen

marouenbg · 2024-03-03T02:16:42Z

Hi @gmillerscripps et al.,
This took too long, but we have now moved YARN into netZooR to maintain it long term: netZoo/netZooR#314
Please let me know if you want anything changed/updated.

Marouen
