download-files: initial release #22

Closed
tiborsimko opened this issue Aug 6, 2020 · 3 comments · Fixed by #34
@tiborsimko
Member

If a user wishes to download files belonging to a record, the current technique is to list file locations:

$ cernopendata-client get-file-locations --recid 5500 --protocol http
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/HiggsDemoAnalyzer.cc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/List_indexfile.txt
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall.cc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall_lvl3.cc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3MC.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3data.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4MC.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4data.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.pdf
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.png

and then launch wget or curl commands to download them.

The goal of this issue is to simplify this process by introducing a new command, download-files, that would do this for the user.

Possible options:

$ cernopendata-client download-files --recid 5500 --protocol http --parallel-processes 2

This would launch two parallel download processes, using a suitable Python library, to download the files into the current directory.
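To make the idea concrete, here is a minimal sketch of what such a parallel download could look like. It is an illustration only, not the implementation chosen for this issue: it uses a thread pool from the standard library rather than separate processes, it assumes Python 3, and the names download_one, download_files and workers are hypothetical.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
from urllib.request import urlopen


def download_one(url, target_dir="."):
    """Download a single URL into target_dir and return the local path."""
    # Hypothetical helper: the file name is taken from the last URL path segment.
    filename = os.path.basename(urlparse(url).path)
    local_path = os.path.join(target_dir, filename)
    # Stream the response to disk instead of reading it into memory at once.
    with urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path


def download_files(urls, target_dir=".", workers=2):
    """Download all URLs with at most `workers` downloads running at once."""
    os.makedirs(target_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(lambda u: download_one(u, target_dir), urls))
```

Here workers=2 would play the role of --parallel-processes 2. Threads are used in this sketch because downloads are I/O-bound; whether processes or threads are preferable is left open.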

P.S. The MVP is simply to download files; resuming interrupted downloads will be part of another issue, but it is good to think about this functionality upfront.

P.S. An option --target-directory could be introduced which would recreate the directory structure known from the original record. This will be important for AOD files which have a subdirectory structure such as this one, so the corresponding subdirectories would have to be created in the target directory.
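One possible way to map a file location to a path under the target directory is sketched below. This is only an illustration under stated assumptions: the helper names local_path_for and ensure_parent_dir are hypothetical, and stripping the /eos/opendata/ prefix is a guess at what "directory structure known from the original record" should mean, based on the URLs listed above.

```python
import os
import posixpath
from urllib.parse import urlparse


def local_path_for(url, target_dir, strip_prefix="/eos/opendata/"):
    """Return the local path for url, keeping its remote subdirectories."""
    remote_path = urlparse(url).path
    # Assumption: the common storage prefix is not wanted on the local side.
    if remote_path.startswith(strip_prefix):
        remote_path = remote_path[len(strip_prefix):]
    # Normalise the path and refuse anything that would escape target_dir.
    relative = posixpath.normpath(remote_path.lstrip("/"))
    if relative == ".." or relative.startswith("../"):
        raise ValueError("refusing path outside target directory: %s" % url)
    return os.path.join(target_dir, *relative.split("/"))


def ensure_parent_dir(local_path):
    """Create the subdirectories needed to write local_path."""
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
```

With this mapping, the BuildFile.xml URL from record 5500 and a target directory named downloads would land in downloads/cms/software/HiggsExample20112012/BuildFile.xml.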

@tiborsimko
Member Author

Example record to support in this issue: 5500.

The "harder" use case of record 1 with index files was separated into issue #25.

@ParthS007 ParthS007 self-assigned this Sep 1, 2020
@ParthS007
Member

ParthS007 commented Sep 1, 2020

Documenting here the options to test for the download-files functionality:

  • Pycurl
  • Requests with chunk sizes
  • Asyncio

I will go ahead with these options and check, for each one, the time taken to download and whether it is compatible with both Python 2 and Python 3.
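A small timing harness along these lines could be used for the comparison. This is a sketch only: it uses urllib from the standard library as a stand-in rather than any of the three options listed above, it assumes Python 3, and the names download_chunked and time_download are hypothetical.

```python
import time
from urllib.request import urlopen


def download_chunked(url, local_path, chunk_size=64 * 1024):
    """Copy url to local_path in chunk_size pieces; return bytes written."""
    total = 0
    with urlopen(url) as response, open(local_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                # An empty read means the end of the response body.
                break
            out.write(chunk)
            total += len(chunk)
    return total


def time_download(url, local_path, chunk_size):
    """Return (bytes_written, elapsed_seconds) for one chunked download."""
    start = time.perf_counter()
    size = download_chunked(url, local_path, chunk_size)
    return size, time.perf_counter() - start
```

Calling time_download with several chunk sizes on the same file gives a rough picture of how chunk size affects throughput; each candidate library would need its own download function for a like-for-like comparison.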

@tiborsimko Can you please provide a recid with large files, maybe around 5 GB in total? I will test with 5500 first.

@tiborsimko
Member Author

@tiborsimko Can you please provide a recid with large files, maybe around 5 GB in total? I will test with 5500 first.

  • one O(5GB) file: record ID 3851 or 3853
  • one O(20GB) file: record ID 15007
  • several O(20GB) files in one record: record ID 15001
