
GTP-programmers/KARAJ

========================================================================================================================================================================================

A command-line software to streamline acquiring biological data

We developed KARAJ, a fast and flexible Linux command-line tool that automates the end-to-end process of querying and downloading a wide range of file formats containing genomic and transcriptomic sequence data. KARAJ takes a list of PMCIDs or publication URLs and automates four main tasks. First, it gives a summary list of the accessible datasets generated by or used in the given articles and lets users select the ones they want to download. Second, it calculates the size of the selected files and checks the local drive to ensure that there is enough free space on the local disk. Third, it generates a metadata table containing the sample information and experimental design of the corresponding study. Lastly, it enables users to download the supplementary data tables attached to the publications. KARAJ is publicly available for researchers to use.
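The examples further down rely on KARAJ mining article text for accession numbers. As a rough illustration of that idea only (this is not KARAJ's implementation), the shell sketch below pulls candidate identifiers out of plain text with `grep`. The function name `extract_accessions` is hypothetical, and the set of prefixes (GSE, SRR, SRP, PRJNA) is an assumption drawn from the identifiers used in this README.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; this is not KARAJ's implementation.
# Prints the distinct candidate accession numbers found in text read from stdin.
# Assumption: only the GSE, SRR, SRP and PRJNA prefixes seen in this README are matched.
extract_accessions() {
  # -o: print each match on its own line; -w: match whole words only;
  # -E: extended regular expressions.
  grep -owE '(GSE|SRR|SRP|PRJNA)[0-9]+' | sort -u
}

text='Raw reads were deposited under GSE126379 and PRJNA427709; see also GSE126379.'
printf '%s\n' "$text" | extract_accessions
# prints GSE126379 and PRJNA427709, one per line
```

Real accession formats are more varied than this pattern, so treat the output as candidates to be checked against the respective database.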

--------------------------------------------------------------------------------------------------------------------------------------------

Installation

KARAJ runs on Linux. Install the package from GitHub using the following commands:

wget https://github.com/GTP-programmers/KARAJ/archive/refs/heads/main.zip
unzip main.zip
cd KARAJ-main/src
chmod +x Installer.sh
./Installer.sh
chmod +x KARAJ.sh
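The installation commands above use `wget` and `unzip`. A minimal sketch for checking that these tools are on your `PATH` before starting is shown below. The helper `missing_tools` is hypothetical, and it does not check any additional dependencies that `Installer.sh` may install.

```shell
#!/usr/bin/env bash
# Sketch: print the name of every tool given as an argument that is not on PATH.
# Hypothetical helper; the tool list below comes from the installation commands.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools wget unzip)
if [ -n "$missing" ]; then
  # Unquoted on purpose so that several names are joined by spaces.
  echo "Please install first:" $missing >&2
else
  echo "wget and unzip are available"
fi
```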


Arguments

Flag | Description | Default | Syntax
-l | Passing article URL(s) | empty | ./KARAJ.sh -l [URL1 URL2 URL3 ... URLn]
-p | Passing PMCID(s) | empty | ./KARAJ.sh -p [PMCID1 PMCID2 PMCID3 ... PMCIDn]
-o | Output working directory | The current working directory | ./KARAJ.sh -o [directory/output]
-t | Specifying the file format(s) to download | KARAJ downloads all file types | ./KARAJ.sh -t [bam/vcf/fastq]
-s | Downloading supplementary tables | 0 | ./KARAJ.sh -s [1/0]
-f | Passing a list of URL(s), PMCID(s) or accession number(s) from a file. Value 1 corresponds to a file named “PMCIDS” for passing a list of PMCIDs. Value 2 corresponds to a file named “ACCESSIONS” for passing a list of accession numbers. Value 3 corresponds to a file named “URLS” for passing a list of URL(s). These files must be created in the working directory, and each line in “PMCIDS”, “URLS” or “ACCESSIONS” must contain only one entity. | KARAJ downloads based on the -l, -p or -i flags | ./KARAJ.sh -f [1/2/3]
-i | Passing accession number(s) | KARAJ downloads all accession IDs | ./KARAJ.sh -i [SRR/SRP/PRJ/PRJNA]
-d | Selecting the accession number(s) to download. Value 1 opens the selection module before downloading files; value 0 downloads all files. This option must be passed along with option -l, -p, -f or -i. | KARAJ downloads all files associated with the -l, -p, -f or -i flags | ./KARAJ.sh -d [1/0]
-m | Downloading metadata. This option must be passed along with option -l, -p, -f or -i. | empty | ./KARAJ.sh -m [0/1]
-h | Help | empty | ./KARAJ.sh -h
-u | Usage and examples | empty | ./KARAJ.sh -u
-j | Number of cores | Number of accessible cores minus one | ./KARAJ.sh -j [core]
-n | Retrieving the processed data of the selected accession numbers. Passing 1 downloads only the processed data; 0 skips downloading processed data. This option must be used along with one of the options -p, -l or -f. | 0 | ./KARAJ.sh -n [0/1]
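For the -f option, each list file must hold one entity per line. Below is a minimal sketch of a hypothetical helper, `write_list` (not part of KARAJ), that writes such a file while dropping blank and repeated entries. The target name is up to you: the table above names the files PMCIDS, ACCESSIONS and URLS, while the examples use a .txt suffix, so check which form your copy of KARAJ expects. The sketch writes to a temporary directory only for demonstration; for real use the file belongs in the working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of KARAJ: write each argument on its own line
# into the given list file, skipping blank entries and repeated entries.
write_list() {
  local file=$1
  shift
  # grep drops empty or whitespace-only lines; awk keeps the first copy of each line.
  printf '%s\n' "$@" | grep -v '^[[:space:]]*$' | awk '!seen[$0]++' > "$file"
}

# Demonstration in a temporary directory; for KARAJ the file goes in the working directory.
workdir=$(mktemp -d)
write_list "$workdir/PMCIDS" PMC7182534 PMC6492329 "" PMC6492329
cat "$workdir/PMCIDS"
```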

Common Error messages

Error message | Description
Invalid option, please check help using -h option | A wrong option was passed.
Value is missing for option [X] | No value was given for a particular option; X stands for the option concerned.
One of the obligatory options is missing | None of the required options -p, -l, -f or -i was passed.
Conciliatory options have passed | More than one of the options -p, -l or -f was passed; only one of them can be used at a time.
No sample found. Either the provided accession number is invalid or raw data was not provided for this record | The provided accession number is not registered in the respective database, or no raw data is available for it.
The needed memory size for downloading specified datasets is larger than the free space available on the designated local directory. Please change the directory. | There is not enough free space in the designated directory to download all of the specified datasets.
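The last error message above comes from KARAJ's free-space check. If you want to check a target directory yourself before a large download, a sketch along the following lines compares the available space reported by `df` with an estimate you supply. The helper name `has_space` and the 1 GiB figure are illustrative assumptions; this is not KARAJ's own check.

```shell
#!/usr/bin/env bash
# Sketch only, not KARAJ's own check: succeed if directory $1 has at least
# $2 KiB of free space. The required size is an estimate you supply.
has_space() {
  local dir=$1 need_kib=$2 avail_kib
  # -P: one POSIX-format line per file system; -k: sizes in 1024-byte blocks.
  # Field 4 of the data line (line 2) is the available space.
  avail_kib=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kib" ] && [ "$avail_kib" -ge "$need_kib" ]
}

# Example: require 1 GiB (1048576 KiB) in the current directory,
# which is KARAJ's default output directory.
if has_space . 1048576; then
  echo "At least 1 GiB free in $(pwd)"
else
  echo "Less than 1 GiB free; choose another directory with -o" >&2
fi
```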

Examples

1. Command for downloading sequence data of one accession number:
$ ./KARAJ.sh -i GSE126379   

2. Command for downloading sequence data of multiple accession numbers:
$ ./KARAJ.sh -i GSE126379 GSE92521 PRJNA427709 SRR10668798 GSE115469 

3. Command for downloading sequence data of a list of accession numbers:
First, make a file in the working directory entitled “ACCESSIONS.txt” containing the list of accession numbers. Then run the following command
$ ./KARAJ.sh -f 1 

4. Command for mining the text of an article for accession numbers and downloading the sequence data corresponding to them – using the PMCID of the article:
$ ./KARAJ.sh -p PMC6492329 -s 0

5. Command for mining the text of multiple articles for accession numbers and downloading the sequence data corresponding to them – using the PMCIDs of the articles:
$ ./KARAJ.sh -p PMC7182534 PMC6492329 PMC8000127 PMC6957475 PMC8455923 PMC8844275 PMC8426200 PMC7789210 -s 0

6. Command for mining a list of articles for accession numbers and downloading the sequence data corresponding to them – using the PMCIDs of the articles:
First, make a file in the working directory entitled “PMCIDS.txt” containing the list of article PMCIDs. Then run the following command:
$ ./KARAJ.sh -f 2 -s 0

7. Command for mining the text of an article for accession numbers and downloading the sequence data corresponding to them – using the URL of the article:
$ ./KARAJ.sh -l https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6492329/ -s 0

8. Command for mining the text of multiple articles for accession numbers and downloading the sequence data corresponding to them – using the URLs of the articles:
$ ./KARAJ.sh -l https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6492329/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7182534/ -s 0

9. Command for mining the text of a list of articles for accession numbers and downloading the sequence data corresponding to them – using the article URLs: 
First, make a file in the working directory entitled “URLS.txt” containing the list of article URLs. Then run the following command.
$ ./KARAJ.sh -f 3 -s 0

10. Command for downloading supplementary files using article URL:
$ ./KARAJ.sh -l https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6492329/ -s 1

11. Command for downloading supplementary files using the article PMCID (see Supplementary Figure 3):
$ ./KARAJ.sh -p PMC6492329 -s 1

12. Command for downloading the supplementary tables of multiple articles using their PMCIDs (see Supplementary Figure 3):
$ ./KARAJ.sh -p PMC7182534 PMC6492329 PMC8000127 PMC6957475 PMC8455923 PMC8844275 PMC8426200 PMC7789210 -s 1

13. Command for downloading supplementary tables of a list of articles using PMCID:
First, make a text file in the working directory entitled “PMCIDS.txt” containing the list of article PMCIDs. Then run the following command
$ ./KARAJ.sh -f 2 -s 1

14. Command for downloading supplementary tables of a list of articles using article URL:
First, make a file in the working directory entitled “URLS.txt” containing the list of article URLs. Then run the following command. 
$ ./KARAJ.sh -f 3 -s 1

15. Command for downloading the metadata table for the sequence data of an accession number:
$ ./KARAJ.sh -i GSE126379 -s 0 -m 1
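KARAJ can read a PMCID list itself through -f, but when working with such a list it can help to see the equivalent single command line before running anything. The dry-run sketch below uses hypothetical helpers, `default_cores` and `build_command` (not part of KARAJ): it reads PMCIDs from a file, one per line, and prints a command in the style of examples 5 and 12. It fills in -j with the number of available cores minus one, at least 1, mirroring the default stated in the arguments table, so passing -j this way is optional. Nothing is downloaded.

```shell
#!/usr/bin/env bash
# Dry-run sketch with hypothetical helpers; nothing here calls KARAJ.

# Number of available cores minus one, but never less than 1.
default_cores() {
  local n
  n=$(( $(nproc) - 1 ))
  [ "$n" -ge 1 ] || n=1
  echo "$n"
}

# Print a KARAJ command that passes every PMCID in file $1 to -p,
# with -s set to $2 (1 = also download supplementary tables, 0 = do not).
build_command() {
  local list_file=$1 supp=$2 ids
  # Skip blank lines and join the remaining PMCIDs with spaces.
  ids=$(grep -v '^[[:space:]]*$' "$list_file" | tr '\n' ' ')
  printf './KARAJ.sh -p %s-s %s -j %s\n' "$ids" "$supp" "$(default_cores)"
}

list=$(mktemp)
printf 'PMC7182534\nPMC6492329\n' > "$list"
build_command "$list" 1
```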



Reference

Please consider citing the following paper when you use this code:

Mahdieh Labani, Amin Beheshti, Nigel Lovell, Hamid Alinejad-Rokny, Ali Afrasiabi. KARAJ: a command-line software to streamline acquiring biological data.

Contacts

I will be pleased to address any questions or concerns about the KARAJ package. For queries, please email m.labani@unsw.edu.au or a.afrasiabi@unsw.edu.au.


Author Info

People who contributed to the KARAJ idea and code:


Acknowledgements

This research received no external funding. Mahdieh Labani was supported by a Macquarie University PhD Scholarship. Hamid Alinejad-Rokny was funded by a University of New South Wales (UNSW) Scientia Program Fellowship and an Australian Research Council Discovery Early Career Researcher Award. Ali Afrasiabi was supported by an Australian Government Research Training Program (RTP) scholarship. This project would not have been possible without High Performance Computing resources provided by the BioMedical Machine Learning Lab, UNSW Graduate School of Biomedical Engineering.


License

This package is free software; you can redistribute it and/or modify it under the terms of the MIT License.
