TCGA biolinks recognize already downloaded data? #153
Comments
Hello, It is possible if you have the query object. GDCquery returns a table with the query results (data category, data type, file id, file, platform, workflow, etc.) that is used to prepare the data correctly. Several fields are used: whether the data comes from the legacy archive, the data category, the data type, and the platform, among others. To find the data, it uses the following pattern. Example: GDCdata/TCGA-GBM/harmonized/DNA_Methylation/Methylation_Beta_Value/079fcaff-3ae6-4150-b2e6-2b7330ffbcd9/jhu-usc.edu_GBM.HumanMethylation450.10.lvl-3.TCGA-19-A6J5-01A-21D-A33U-05.gdc_hg38.txt Code: https://github.com/BioinformaticsFMRP/TCGAbiolinks/blob/master/R/prepare.R#L90-L94 If you have the query object, it should be easy to read the data already downloaded. You just need to be in the directory that contains the root download folder (default is 'GDCdata'). Recreating the query object by hand would require you to know the data and some of the field values, so I don't believe it is worth creating it by hand. I would ask your colleague for the query object they used. Best regards, |
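As a rough illustration of the pattern described above, the sketch below rebuilds the expected location of one file from its query fields using base R only. The helper name `expected_gdc_path` is hypothetical and not part of TCGAbiolinks, the "legacy" folder name and the replacement of spaces with underscores are assumptions inferred from the example path, and the linked prepare.R remains the authoritative logic.

```r
# Hypothetical helper (not a TCGAbiolinks function): rebuild the path where a
# downloaded file is expected, following the example layout
# <directory>/<project>/<source>/<data category>/<data type>/<file id>/<file name>.
expected_gdc_path <- function(project, data_category, data_type,
                              file_id, file_name,
                              legacy = FALSE, directory = "GDCdata") {
  # "harmonized" appears in the example path; "legacy" is an assumed folder name
  source_dir <- if (legacy) "legacy" else "harmonized"
  # Assumption: spaces in the category and type are replaced by underscores
  file.path(directory, project, source_dir,
            gsub(" ", "_", data_category),
            gsub(" ", "_", data_type),
            file_id, file_name)
}

path <- expected_gdc_path(
  project       = "TCGA-GBM",
  data_category = "DNA Methylation",
  data_type     = "Methylation Beta Value",
  file_id       = "079fcaff-3ae6-4150-b2e6-2b7330ffbcd9",
  file_name     = "jhu-usc.edu_GBM.HumanMethylation450.10.lvl-3.TCGA-19-A6J5-01A-21D-A33U-05.gdc_hg38.txt"
)
print(path)
# This is only TRUE when run from the folder that contains GDCdata
print(file.exists(path))
```

If `file.exists()` reports FALSE for a file you can see on disk, the working directory or one of the folder names is the first thing to compare against the printed path.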
Dear Tiago, Thank you for your reply. My file structure is exactly as your example, and I have the command line she used to get the query and download the data. So, what command lines should I use? Just the GDCquery and GDCprepare? Thanks again, |
You only need to use GDCquery and GDCprepare. But if you use GDCdownload it should say that all samples were already downloaded. |
Dear Tiago, I'm still having problems with TCGAbiolinks recognizing the already downloaded data. I've checked the directory structure and it is the same as the one requested by TCGAbiolinks. If I use GDCquery and GDCprepare together, it gives me the following error: If I try GDCdownload, it starts downloading, but it gives me an error saying the file or directory does not exist: However, when I check the file, it is there and in the same directory requested: Please, do you know where I am going wrong? Best wishes, |
Hi Natasha, I just tried this and it works fine for me. One thing to point out: your current working directory must be the folder above GDCdata, not GDCdata itself. If you want to know the details of GDCprepare, just type GDCprepare at the R prompt to see its source code. |
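A quick way to confirm the working-directory point above is to check whether GDCdata is a direct sub-folder of the current directory. This is a minimal base R sketch; `has_download_root` is a hypothetical helper name, not part of TCGAbiolinks.

```r
# Hypothetical helper: TRUE when `root` is a sub-folder of the current working directory
has_download_root <- function(root = "GDCdata") {
  dir.exists(file.path(getwd(), root))
}

if (has_download_root()) {
  message("Found GDCdata in ", getwd())
} else {
  message("No GDCdata folder in ", getwd(),
          "; use setwd() to move to the folder that contains it.")
}
```

Running this from inside GDCdata reports that the folder is missing, which is the situation described above.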
@NatashaJorge I have this problem too. Were you able to solve it? |
Hi, I have the same issue. If you found a solution, could you please share it? Thank you. |
After adding directory = Outdir to the GDCprepare call, my problem was solved. The code is as follows:
library(TCGAbiolinks)

# Project to fetch; "TCGA-KIRC" is inferred from the output file name used below
cancer_type <- "TCGA-KIRC"

# Query, download, and prepare one data type for a project.
# Outdir selects the data type and is also used as the download directory.
get_OmicsData <- function(project = cancer_type,
                          Outdir = "mRNA"){
  if(Outdir == "mRNA"){
    query_Data <- GDCquery(project = project,
                           data.category = "Transcriptome Profiling",
                           data.type = "Gene Expression Quantification",
                           workflow.type = "HTSeq - Counts")
  }else if(Outdir == "miRNA"){
    query_Data <- GDCquery(project = project,
                           data.category = "Transcriptome Profiling",
                           data.type = "miRNA Expression Quantification",
                           workflow.type = "BCGSC miRNA Profiling")
  }else if(Outdir == "CNV"){
    query_Data <- GDCquery(project = project,
                           data.category = "Copy Number Variation",
                           data.type = "Copy Number Segment")
  }else if(Outdir == "DNA_Methylation"){
    query_Data <- GDCquery(project = project,
                           data.category = "DNA methylation",
                           legacy = TRUE)
  }else{
    # Without this branch, query_Data would be undefined below
    stop("Unsupported Outdir: ", Outdir)
  }
  GDCdownload(query = query_Data,
              method = "api",
              files.per.chunk = 60,
              directory = Outdir)
  # directory must match the one passed to GDCdownload,
  # otherwise GDCprepare looks in the default "GDCdata" folder
  expdat <- GDCprepare(query = query_Data,
                       directory = Outdir)
  return(expdat)
}

dat_mRNA <- get_OmicsData(project = cancer_type,
                          Outdir = "mRNA")
saveRDS(dat_mRNA, file = "TCGA-KIRC_mRNA.RDS")
Hope it is helpful to you. |
@HuaZou Thank you |
I am probably late with this answer. Windows limits the length of a full path (including the file name) to 260 characters. For example, the maximum path on drive D is "D:\some 256-character path string". To solve this issue, try creating the project folder directly on the drive, or moving it there, as in "D:\path_to_project_that_is_less_than_60_characters_long", or use some other shorter path (less than 60 characters, since the length from the project folder to the file that contains the data is around 199 characters by default). |
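To see whether the path limit described above is actually being hit, one option is to list the downloaded files whose absolute path reaches the limit. This is a minimal base R sketch; `long_paths` is a hypothetical helper name, and the default of 260 simply reuses the figure quoted above.

```r
# Hypothetical helper: return files under `root` whose absolute path has
# `limit` or more characters (260 is the Windows figure quoted above).
long_paths <- function(root = "GDCdata", limit = 260) {
  files <- list.files(root, recursive = TRUE, full.names = TRUE)
  # Expand to absolute paths; on Windows use backslashes as the separator
  full <- normalizePath(files, winslash = "\\", mustWork = FALSE)
  full[nchar(full) >= limit]
}

if (dir.exists("GDCdata")) {
  too_long <- long_paths()
  message(length(too_long), " file(s) at or beyond the 260-character limit")
  print(too_long)
}
```

If any files are listed, moving the project folder to a shorter location, as suggested above, is the simplest thing to try.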
I have the same problem; I tried but still can't solve it. 555 |
The funny thing is, I went through it again today (GDCquery, GDCdownload, GDCprepare) and found no difference between where I store GDCdata now and where I stored it before: it is in the same directory as the script, and all the files in GDCdata are exactly the same. The Windows path has also been shortened to within 60 characters. |
Hello,
I'm a post doc and just started using TCGAbiolinks, it's been really helpful! Many thanks!! :)
I'm currently working with some data already downloaded by a colleague and saved in the same file structure used by GDCdownload and GDCprepare. I was wondering, is there a way for TCGAbiolinks to recognize this already downloaded data?
Thanks for your time and attention,
Best wishes!