TCGA biolinks recognize already downloaded data? #153

NatashaJorge · 2017-10-03T13:40:52Z

Hello,

I'm a postdoc and just started using TCGAbiolinks; it's been really helpful! Many thanks!! :)

I'm currently working with data that a colleague already downloaded and saved in the same file structure that GDCdownload and GDCprepare use. Is there a way for TCGAbiolinks to recognize this already-downloaded data?

Thanks for your time and attention,
Best wishes!

tiagochst · 2017-10-03T14:23:34Z

Hello,

It would be possible if you have the query object.

GDCquery returns a table with the query results (data category, data type, file ID, file name, platform, workflow, etc.) that is used to prepare the data correctly. Several fields are used: whether the data is from the legacy archive, data category, data type and platform, among others. But to find the data, it will use the following pattern:
Root directory/project/source/data_category/data_type/file_id/file_name

Example: GDCdata/TCGA-GBM/harmonized/DNA_Methylation/Methylation_Beta_Value/079fcaff-3ae6-4150-b2e6-2b7330ffbcd9/jhu-usc.edu_GBM.HumanMethylation450.10.lvl-3.TCGA-19-A6J5-01A-21D-A33U-05.gdc_hg38.txt

Code: https://github.com/BioinformaticsFMRP/TCGAbiolinks/blob/master/R/prepare.R#L90-L94

If you have the query object it should be easy to read the data already downloaded. You would just need to be in the same directory as the root download folder (default is 'GDCdata').
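
As a rough illustration of the pattern above, here is a minimal base R sketch that assembles the expected location of one file and checks whether it exists. `expected_path()` is a hypothetical helper, not a TCGAbiolinks function, and the replacement of spaces by underscores is an assumption inferred from the example path (e.g. `DNA_Methylation`).

```r
# Hypothetical helper mirroring the pattern
# root/project/source/data_category/data_type/file_id/file_name
expected_path <- function(root, project, source, data_category,
                          data_type, file_id, file_name) {
  # Assumption: spaces in the category and type names become underscores
  underscore <- function(x) gsub(" ", "_", x, fixed = TRUE)
  file.path(root, project, source, underscore(data_category),
            underscore(data_type), file_id, file_name)
}

# Rebuild the example path from its fields
p <- expected_path(
  root          = "GDCdata",
  project       = "TCGA-GBM",
  source        = "harmonized",
  data_category = "DNA Methylation",
  data_type     = "Methylation Beta Value",
  file_id       = "079fcaff-3ae6-4150-b2e6-2b7330ffbcd9",
  file_name     = "jhu-usc.edu_GBM.HumanMethylation450.10.lvl-3.TCGA-19-A6J5-01A-21D-A33U-05.gdc_hg38.txt"
)

# TRUE only if the file is found relative to the current working directory
found <- file.exists(p)
```

Because the path is relative, the result of `file.exists()` depends on `getwd()`, which is why the working directory has to be the folder that contains the root download folder.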

Recreating the query object by hand would require you to know the data and some of the field values. I don't believe it is worth creating it by hand. I would ask your colleague for the query object they used.

Best regards,
Tiago

NatashaJorge · 2017-10-04T00:53:25Z

Dear Tiago,

Thank you for your reply.

My file structure is exactly as your example, and I have the command line she used to get the query and download the data.

So, what command lines should I use? Just the GDCquery and GDCprepare?

Thanks again,
Best wishes,
Natasha

tiagochst · 2017-10-04T01:01:24Z

You only need to use GDCquery and GDCprepare. But if you use GDCdownload it should say that all samples were already downloaded.
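
A minimal sketch of that two-step workflow, assuming the query is rebuilt with the same arguments as the original download. `prepare_existing()` is a hypothetical wrapper; only `GDCquery()` and `GDCprepare()` come from TCGAbiolinks. Note that `GDCquery()` still contacts the GDC, so a network connection is needed, and the rebuilt query may list different file IDs if the GDC has published newer versions of the files since the download.

```r
# Hypothetical wrapper: rebuild the query, then read files already on disk
prepare_existing <- function(project, data_category, data_type,
                             directory = "GDCdata") {
  if (!requireNamespace("TCGAbiolinks", quietly = TRUE)) {
    stop("TCGAbiolinks is not installed")
  }
  # Use the same arguments as the query behind the original download
  query <- TCGAbiolinks::GDCquery(
    project       = project,
    data.category = data_category,
    data.type     = data_type
  )
  # `directory` is the root download folder, relative to getwd()
  TCGAbiolinks::GDCprepare(query = query, directory = directory)
}

# Example call (not run here):
# seg <- prepare_existing("TCGA-SKCM", "Copy Number Variation",
#                         "Copy Number Segment")
```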

NatashaJorge · 2017-10-17T14:14:11Z

Dear Tiago,

I'm still having problems with TCGAbiolinks recognizing the already downloaded data. I've checked the directory structure and it is the same as the one requested by TCGAbiolinks.

If I use GDCquery and GDCprepare together, it gives me the following error:
"Error in GDCprepare(cnv) :
I couldn't find all the files from the query. Please check if the directory parameter right or GDCdownload downloaded the samples."

If I try GDCdownload, it starts downloading, but then gives me a warning saying the file or directory does not exist:
"<simpleWarning in file.create(to[okay]): cannot create file 'GDCdata/TCGA-SKCM/harmonized/Copy_Number_Variation/Copy_Number_Segment/c70811e8-deb7-4a96-9d70-b322c15fe1a4/SPICS_p_TCGA_B_314_315_316_NSP_GenomeWideSNP_6_G06_1361494.grch38.seg.txt', reason 'No such file or directory'>"

However, when I check the file, it is there and in the same directory requested:
[natasha@lobster MetilTCGA]$ ls GDCdata/TCGA-SKCM/harmonized/Copy_number_variation/Copy_Number_Segment/c70811e8-deb7-4a96-9d70-b322c15fe1a4/
SPICS_p_TCGA_B_314_315_316_NSP_GenomeWideSNP_6_G06_1361494.grch38.seg.txt

Do you know where I am going wrong?

Best wishes,
Natasha

huwenhuo · 2017-10-20T15:46:38Z

Hi Natasha, I just tried this and it works fine for me. One thing to point out: your current working directory needs to be the folder that contains GDCdata, not GDCdata itself. If you want to see the details of GDCprepare, just type GDCprepare at the R prompt to print its source code.
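
One way to narrow this down is to walk the expected path one component at a time, starting from the working directory. In the output quoted earlier in this thread, the warning names `Copy_Number_Variation` while the `ls` command lists `Copy_number_variation`; if that difference is real and not a typo in the post, it would matter on a case-sensitive file system. The sketch below is base R only, and `check_path_components()` is a hypothetical helper, not part of TCGAbiolinks.

```r
# Walk a relative path component by component and report the first component
# that is missing on disk, plus any entry that differs from it only in case.
check_path_components <- function(path, root = getwd()) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  parts <- parts[nzchar(parts)]
  current <- root
  for (part in parts) {
    # Non-recursive listing: includes both files and subdirectories
    entries <- list.files(current, all.files = TRUE, no.. = TRUE)
    if (!(part %in% entries)) {
      near <- entries[tolower(entries) == tolower(part)]
      return(list(ok = FALSE, parent = current,
                  missing = part, case_variants = near))
    }
    current <- file.path(current, part)
  }
  list(ok = TRUE, parent = current,
       missing = NA_character_, case_variants = character(0))
}

# Example call (not run here), using the path from the warning:
# check_path_components("GDCdata/TCGA-SKCM/harmonized/Copy_Number_Variation")
```

If `ok` is `FALSE` at the very first component (`GDCdata`), the working directory is probably wrong; a non-empty `case_variants` points to a capitalization mismatch instead.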

modarzi · 2019-03-06T12:33:13Z

@NatashaJorge
Hi Natasha

I have this problem too. Were you able to solve it?
Thanks
Mohammad

latifizadehhabib · 2021-06-01T14:02:16Z

Hi, I have the same issue. Could you please share the solution if you found one? Thank you.

HuaZou · 2021-06-08T02:16:45Z

After passing directory = Outdir to the GDCprepare function as well, my problem was solved. The code is as follows:

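library(TCGAbiolinks)

# Project ID; assumed here from the output file name used at the end
cancer_type <- "TCGA-KIRC"

# Query, download and prepare one data type for a project.
# Outdir selects the query and is also used as the download directory;
# the same directory must be passed to both GDCdownload and GDCprepare.
get_OmicsData <- function(project = cancer_type,
                          Outdir  = "mRNA"){
  if(Outdir == "mRNA"){
    query_Data <- GDCquery(project = project,
                           data.category = "Transcriptome Profiling",
                           data.type = "Gene Expression Quantification",
                           workflow.type = "HTSeq - Counts")
  }else if(Outdir == "miRNA"){
    query_Data <- GDCquery(project = project,
                           data.category = "Transcriptome Profiling",
                           data.type = "miRNA Expression Quantification",
                           workflow.type = "BCGSC miRNA Profiling")
  }else if(Outdir == "CNV"){
    query_Data <- GDCquery(project = project,
                           data.category = "Copy Number Variation",
                           data.type = "Copy Number Segment")
  }else if(Outdir == "DNA_Methylation"){
    query_Data <- GDCquery(project = project,
                           data.category = "DNA methylation",
                           legacy = TRUE)
  }else{
    # Fail early instead of reaching GDCdownload without a query
    stop("Unknown Outdir: ", Outdir)
  }

  # Download into Outdir instead of the default "GDCdata"
  GDCdownload(query = query_Data,
              method = "api",
              files.per.chunk = 60,
              directory = Outdir)

  # Read the files back from the same directory
  expdat <- GDCprepare(query = query_Data,
                       directory = Outdir)
  return(expdat)
}

dat_mRNA <- get_OmicsData(project = cancer_type,
                          Outdir = "mRNA")
saveRDS(dat_mRNA, file = "TCGA-KIRC_mRNA.RDS")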

Hope this is helpful to you.

tiagochst · 2022-04-19T15:02:03Z

@HuaZou Thank you

DzenisKoca · 2023-01-12T15:39:44Z

I am probably late with this answer. Windows limits the length of a full path (including the file name) to 260 characters. For example, the maximum path on drive D is "D:\some 256-character path string". To work around this, try creating or moving the project folder directly onto the drive, e.g. "D:\path_to_project_that_is_less_than_60_characters_long", or to some other short path (less than 60 characters, since the path from the project folder to the file that contains the data is around 199 characters by default).
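
A small base R sketch for checking this before downloading: it joins a project folder with the relative file paths and flags any full path that reaches the limit. `flag_long_paths()` is a hypothetical helper, and the default limit of 260 is taken from the description above; it counts characters only and does not query Windows itself.

```r
# Flag full paths whose length reaches the Windows path limit.
# relative_paths: paths below the project folder (e.g. "GDCdata/...")
# project_dir:    folder the script runs in
flag_long_paths <- function(relative_paths, project_dir = getwd(),
                            limit = 260L) {
  full <- file.path(project_dir, relative_paths)
  data.frame(path      = full,
             n_chars   = nchar(full),
             too_long  = nchar(full) >= limit,
             stringsAsFactors = FALSE)
}

# Example call (not run here):
# flag_long_paths(list.files("GDCdata", recursive = TRUE, full.names = TRUE))
```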

alopehba · 2023-02-01T16:55:38Z

I have the same problem; I tried this but still can't solve it. 555

alopehba · 2023-02-02T04:24:35Z

The funny thing is, I went through the whole 'GDCquery-GDCdownload-GDCprepare' process again today and it worked, even though it didn't work last time. GDCdata is stored in the same place as before (the same directory as the script), all the files in GDCdata are exactly the same, and the Windows path had already been shortened to within 60 characters.
The only difference is that last time I did not run GDCquery, GDCdownload and GDCprepare back to back: in between, I manually created an empty GDCdata folder, and when I ran GDCdownload the subfolders and files were created inside it one after another. The contents of the files were exactly the same, but they just weren't recognized, and the download ended with "reason 'No such file or directory'".
This time, I deleted the 'GDCdata' folder and everything in it, and then strictly followed the 'GDCquery-GDCdownload-GDCprepare' process. Then the data was recognized.

tiagochst closed this as completed Apr 19, 2022
