This repository has been archived by the owner on Dec 29, 2023. It is now read-only.

Tool to download genome data files from EMBL-EBI


BigCheeze45/genomedownloader


Genome Downloader

genomedownload.py is a Python script that can be used to download any file by URL, but it was written specifically to download genome data files from the EMBL-EBI FTP server. It accepts either a single URL or a file containing a list of URLs.

Note: This script is written for Python 3 or later. This was cowboyed together and has no tests. It is intended as both a starting point for future development and a useful example.

Setup

The following instructions assume you're on a Unix-like system with Python 3 or later.

  1. Clone the repository: git clone git@github.com:BigCheeze45/genomedownloader.git
  2. Create and activate a Python virtual environment
    1. python3 -m venv env
    2. source /path/to/env/bin/activate
  3. Install the required packages: pip install -r /path/to/genomedownloader/requirements.txt

Once installation is complete the script is ready to use!

Using genomedownload.py

genomedownload.py takes a single URL or a list of URLs of genome data files to download. You can also provide an output directory to place the downloaded data.

If using a file, make sure each URL is on its own line.
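For illustration, a URL list in that format could be read with something like the sketch below. This is not taken from genomedownload.py: the function name `read_urls` is hypothetical, and skipping blank lines and trimming whitespace are assumptions about how a forgiving reader might behave.

```python
from pathlib import Path


def read_urls(path):
    """Return the URLs listed in a plain text file, one per line.

    Hypothetical helper, not the script's actual code. Assumes blank
    lines should be ignored and surrounding whitespace trimmed.
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # skip empty lines
            urls.append(line)
    return urls
```

Each returned string could then be handed to whatever routine performs the download.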

# Single URL, no output folder specified
python genomedownload.py --url ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR752/ERR752938/ERR752938_1.fastq.gz

# Single URL with output folder
python genomedownload.py --url ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR752/ERR752938/ERR752938_1.fastq.gz -o output/

# List of URLs with output folder
python genomedownload.py -f yeastDNAlinks.txt -o output/


# Complete usage guide
usage: genomedownload.py [-h] [--url URL | -f FILE] [-o OUTPUT]

Tool to download genome data files from EMBL-EBI

optional arguments:
  -h, --help            show this help message and exit
  --url URL             The absolute URL to the genome data you want
                        downloaded
  -f FILE, --file FILE  Path to a plain text file containing full URLs to the
                        genome data you want downloaded
  -o OUTPUT, --output OUTPUT
                        Location you want to store the downloaded genome data.
                        Default to current working directory
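
For readers extending the script, the interface in the usage guide above could be reproduced with argparse roughly as follows. This is a sketch reconstructed from the help text, not the script's actual source: the function name `build_parser` is hypothetical, and using "." as the default output is an assumption standing in for "current working directory".

```python
import argparse


def build_parser():
    """Build a parser matching the documented usage line.

    Hypothetical reconstruction from the help output, not the
    author's code.
    """
    parser = argparse.ArgumentParser(
        prog="genomedownload.py",
        description="Tool to download genome data files from EMBL-EBI",
    )
    # --url and -f/--file appear as alternatives in the usage line,
    # so they are placed in a mutually exclusive group.
    source = parser.add_mutually_exclusive_group()
    source.add_argument(
        "--url",
        help="The absolute URL to the genome data you want downloaded",
    )
    source.add_argument(
        "-f", "--file",
        help="Path to a plain text file containing full URLs",
    )
    # Assumption: "." represents the current working directory default.
    parser.add_argument(
        "-o", "--output",
        default=".",
        help="Location you want to store the downloaded genome data",
    )
    return parser
```

With this layout, passing both --url and -f in the same invocation makes argparse exit with a usage error, which matches the `[--url URL | -f FILE]` form shown in the guide.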

To-dos

  • Publish to PyPI for easier end-user installation
  • Progress bar
  • Refine logging control

Contributing

Submit a pull request with your changes. Open an issue if you find any problems!
