Error in if (ret == 1) break : argument is of length zero while downloading SNV data #512

Open
swapnil-nis opened this issue May 2, 2022 · 16 comments

Comments

@swapnil-nis

I have been trying to download SNV data for TCGA-HNSC for the last two months, but every attempt fails with the same error, shown below:

query <- GDCquery(
  project = "TCGA-HNSC",
  data.category = "Simple Nucleotide Variation",
  data.type = "Annotated Somatic Mutation"
)

GDCdownload(query)
Downloading data for project TCGA-HNSC
GDCdownload will download 5120 files. A total of 2.598036157 GB
The total size of files is big. We will download files in chunks
Downloading chunk 1 of 3 (1970 files, size = 1.002520368 GB) as Mon_May__2_21_37_55_2022_0.tar.gz
|======================================================================| 100%
/bin/tar: This does not look like a tar archive

gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 3 (1970 files, size = 1.002520368 GB) as Mon_May__2_21_37_55_2022_0.tar.gz
|======================================================================| 100%
/bin/tar: This does not look like a tar archive

gzip: stdin: not in gzip format
/bin/tar: Child returned status 1
/bin/tar: Error is not recoverable: exiting now
Download completed
Error in if (ret == 1) break : argument is of length zero
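
The tar and gzip messages in the log above say that the file saved as .tar.gz was not a gzip archive. The log does not show what the server actually returned, so the following is only a minimal base R sketch of how one might check a downloaded file before extracting it. `is_gzip` is a hypothetical helper, not part of TCGAbiolinks, and the two temporary files stand in for a real download.

```r
# Hypothetical helper: TRUE if the file starts with the gzip magic bytes 1f 8b.
is_gzip <- function(path) {
  # readBin() opens a file path in binary mode, so no decompression happens.
  magic <- readBin(path, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

# Stand-in for a valid download: a small gzip-compressed file.
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
writeLines("hello", con)
close(con)

# Stand-in for a failed download: plain text saved under an archive name.
txt_path <- tempfile(fileext = ".tar.gz")
writeLines('{"message": "placeholder error body"}', txt_path)

print(is_gzip(gz_path))
print(is_gzip(txt_path))
```

If a downloaded chunk fails this check, opening it in a text editor may show what was returned instead of the archive.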

@tiagochst
Contributor

Hi,
Which version of TCGAbiolinks are you using?

This data is controlled, so it should not be accessible without a token.
Also, the data seems to be either a MAF or a VCF file.
[Screenshot: Screen Shot 2022-05-02 at 3 04 27 PM]

@Mo7tafa

Mo7tafa commented May 24, 2023

I have the same issue!
I have checked the samples I want to download a couple of times, and there seems to be nothing wrong with the samples I chose. I still can't figure out what is going wrong.

@tiagochst
Contributor

@Mo7tafa This is controlled data. You need to pass the token.file parameter to GDCdownload, or export the manifest and use a GDC token to download the controlled data.
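
A minimal sketch of the first option, passing token.file to GDCdownload. The wrapper name `download_with_token` and the token path are hypothetical, and the sketch assumes the token was already downloaded from the GDC portal. The check runs before any network call, so a missing token file fails early with a clear message.

```r
# Hypothetical wrapper around GDCdownload() for controlled-access data.
download_with_token <- function(query, token_path) {
  # Fail early with a clear message if the token file is missing.
  if (!file.exists(token_path)) {
    stop("Token file not found: ", token_path)
  }
  # token.file is the GDCdownload() parameter named in the comment above.
  TCGAbiolinks::GDCdownload(query, token.file = token_path)
}

# Demonstration of the early failure, using a path that does not exist.
result <- tryCatch(
  download_with_token(query = NULL, token_path = tempfile()),
  error = function(e) conditionMessage(e)
)
print(result)
```

With a real query, the call would look like download_with_token(query, "path/to/token.txt"), where the path is a placeholder.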

@Mo7tafa

Mo7tafa commented May 25, 2023

I've checked my data and none of it is controlled! I also used the 'access = "Open"' argument in GDCquery to make sure the query does not involve controlled data!

@tiagochst
Contributor

tiagochst commented May 25, 2023 via email

@Mo7tafa

Mo7tafa commented May 26, 2023

@tiagochst
I am using R 4.2.3 on Windows.

library("TCGAbiolinks")

NewData <- GDCquery(project = "TCGA-UVM",data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification", workflow.type = "STAR - Counts", access = "Open")

GDCdownload(NewData, method = "api")

@Mo7tafa

Mo7tafa commented May 26, 2023

I also tried

GDCdownload(query = NewData, method = "api", files.per.chunk = 1)

but I still get the same problem:
Error in if (ret == 1) break : argument is of length zero
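
For readers unfamiliar with the message itself: R raises "argument is of length zero" when the condition of an `if` is an empty vector. The sketch below reproduces that in base R, assuming (this is not confirmed anywhere in the thread) that `ret` ends up NULL or empty when a chunk fails. The variable is only a stand-in for whatever TCGAbiolinks stores internally.

```r
# Stand-in for the value tested in `if (ret == 1) break`; assumed to be NULL.
ret <- NULL

# Comparing NULL with a number gives a zero-length logical vector.
cmp <- ret == 1
print(length(cmp))

# `if` requires a length-one condition, so the comparison above makes it fail.
msg <- tryCatch(
  if (ret == 1) "matched" else "not matched",
  error = function(e) paste("error:", conditionMessage(e))
)
print(msg)

# A defensive form: isTRUE() returns FALSE for NULL or empty input.
safe <- isTRUE(ret == 1)
print(safe)
```

This only explains the wording of the error. It does not say why the download itself failed.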

@tiagochst
Contributor

@Mo7tafa Which TCGAbiolinks version are you using? Your code works for me, but I am on Mac/Linux.
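
A small sketch for collecting the details asked for here, using only base R functions. It assumes nothing about the package beyond its name and skips the version lookup if TCGAbiolinks is not installed in the current library.

```r
pkg <- "TCGAbiolinks"

# Report the installed package version, if the package is available.
if (requireNamespace(pkg, quietly = TRUE)) {
  pkg_version <- as.character(utils::packageVersion(pkg))
} else {
  pkg_version <- NA_character_
}

# R version and operating system family ("unix" or "windows").
r_version <- R.version.string
os_type <- .Platform$OS.type

print(pkg_version)
print(r_version)
print(os_type)
```

Pasting the output of sessionInfo() into the issue gives the same information in more detail.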

@Mo7tafa

Mo7tafa commented May 28, 2023

@tiagochst I updated the package yesterday, so I am sure it is the latest version. The strange part is that when I tested the code on another computer (also Windows, same R version, same internet connection) it worked well, but it still does not run on mine. I don't know where the problem is.

@Mo7tafa

Mo7tafa commented May 28, 2023

[Screenshot attached; image not available]

@jowkar

jowkar commented Jun 2, 2023

I have a similar issue. Even using the example provided in the documentation for the function GDCdownload gives this error:

query <- GDCquery(
  project = "TCGA-ACC",
  data.category = "Copy Number Variation",
  data.type = "Copy Number Segment"
)

GDCdownload(query, files.per.chunk = 1)

Output:

Downloading data for project TCGA-ACC
GDCdownload will download 180 files. A total of 6.3478 MB
Downloading chunk 1 of 180 (1 files, size = 42.731 KB) as Fri_Jun__2_16_22_52_2023_0.tar.gz
  |====================================================================================================| 100%
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 180 (1 files, size = 42.731 KB) as Fri_Jun__2_16_22_52_2023_0.tar.gz
  |====================================================================================================| 100%
Download completed
Error in if (ret == 1) break : argument is of length zero

I also get this error when trying to download open-access gene expression quantification data from another cohort.

I have tried both the GitHub development version and the current release on Bioconductor (v2.28.2).

@tiagochst
Contributor

@jowkar If you set GDCdownload(query, files.per.chunk = 2), does it work?

@jowkar

jowkar commented Jun 3, 2023

No, I get the same error. I've tried various values for this parameter, including 5 and 100, as well as the default. This is the output for files.per.chunk = 2:

Downloading data for project TCGA-ACC
GDCdownload will download 180 files. A total of 6.3478 MB
Downloading chunk 1 of 90 (2 files, size = 63.956 KB) as Sat_Jun__3_13_23_15_2023_0.tar.gz
Downloading: 15 kB     Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 90 (2 files, size = 63.956 KB) as Sat_Jun__3_13_23_15_2023_0.tar.gz
Downloading: 15 kB     Download completed
Error in if (ret == 1) break : argument is of length zero

@Mo7tafa

Mo7tafa commented Jun 7, 2023

I have come to realize that this error can arise in three situations:

  1. Your data is controlled. Setting the access argument to "Open" in GDCquery avoids this.
  2. Your internet connection is unstable and the chunks are too large. Set files.per.chunk to a small value, such as 2 or a little more.
  3. The storage (cache) available on the machine you are using is not enough. Set your working directory to an empty drive, with the GDCdata folder inside it.
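
The third suggestion can be sketched in base R as follows. `prepare_download_dir` is a hypothetical helper, and `tempdir()` stands in for the empty drive. The commented call uses the directory argument of GDCdownload and assumes a `query` object created earlier with GDCquery. Whether this resolves the error is not confirmed in the thread.

```r
# Hypothetical helper: create a GDCdata folder under base_dir and verify
# that it is writable before starting a download.
prepare_download_dir <- function(base_dir) {
  target <- file.path(base_dir, "GDCdata")
  dir.create(target, recursive = TRUE, showWarnings = FALSE)
  # file.access() returns 0 when the requested access (mode 2 = write) is allowed.
  if (file.access(target, mode = 2) != 0) {
    stop("Download directory is not writable: ", target)
  }
  target
}

download_dir <- prepare_download_dir(tempdir())
print(download_dir)

# Assuming `query` was created with GDCquery():
# TCGAbiolinks::GDCdownload(query, directory = download_dir, files.per.chunk = 2)
```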

@jowkar

jowkar commented Jun 8, 2023

With the latest development version from GitHub, the TCGA-ACC query for CNV data now works for me. However, the query I actually want to run is the one below, and it still fails on a laptop with a high-speed wired internet connection (about 280 Mbps download speed) and 400 GB of free disk space. I also installed the same version of TCGAbiolinks on a separate server, where the query downloaded the data successfully (attempts with a previous version of the package did not work on that server, however). So although I managed to get the data on the other system, I still think there is a bug here. Note that I both pass access = "open" and set files.per.chunk = 473, which is the total number of files in the cohort, so only one chunk (about 2 GB) is downloaded, and it still fails.

query <- GDCquery(
  project = "TCGA-SKCM", 
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  access = "open"
)

GDCdownload(query, method = "api", files.per.chunk = 473)

Downloading data for project TCGA-SKCM
GDCdownload will download 473 files. A total of 2.002491876 GB
Downloading chunk 1 of 1 (473 files, size = 2.002491876 GB) as Thu_Jun__8_13_46_20_2023_0.tar.gz
Downloading: 480 MB     Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 1 (473 files, size = 2.002491876 GB) as Thu_Jun__8_13_46_20_2023_0.tar.gz
Downloading: 480 MB     Download completed
Error in if (ret == 1) break : argument is of length zero

@ankushs0128

I am also having the same issue with MMRF-COMMPASS data.

MM <- GDCquery(
  project = "MMRF-COMMPASS",
  data.category = "Transcriptome Profiling",
  data.type = "Gene Expression Quantification",
  workflow.type = "STAR - Counts",
  access = "open"
)
GDCdownload(MM, method = "api", files.per.chunk = 1)
data <- GDCprepare(MM)

The error appears when the third iteration of the download starts:

Downloading data for project MMRF-COMMPASS
GDCdownload will download 859 files. A total of 3.625715802 GB
Downloading chunk 1 of 859 (1 files, size = 4.223307 MB) as Tue_Sep__5_10_16_36_2023_0.tar.gz
  |============================================================================================================================================| 100%
Download completed
At least one of the chunks download was not correct. We will retry
Downloading chunk 1 of 859 (1 files, size = 4.223307 MB) as Tue_Sep__5_10_16_36_2023_0.tar.gz
  |============================================================================================================================================| 100%
Download completed
Error in if (ret == 1) break : argument is of length zero

Any pointers on how to resolve this error?
