Project description: A CLI for creating, querying, and parsing a BLAST database built from FASTA files. The CLI also lets the user create a PostgreSQL database in which to store the parsed results.
Date: September 2020
Researcher(s):
- Alexandre Jousset (A.L.C.Jousset@uu.nl)
Research Software Engineer(s):
- Casper Kaandorp (c.s.kaandorp@uu.nl)
- Leonardo Vida (l.j.vida@uu.nl)
To get a local copy up and running follow these simple steps.
Make sure the project dependencies are available. This project requires Python ^3.7 and pip or poetry (or any other dependency manager) to install the project's dependencies.
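The version requirement above can be checked before installing anything. The snippet below is a minimal sketch, assuming the caret constraint ^3.7 is read the way Poetry reads it (at least 3.7, below 4.0); the function name `python_is_supported` is hypothetical and not part of this package.

```python
import sys

MINIMUM_PYTHON = (3, 7)  # lower bound implied by "python ^3.7"


def python_is_supported(version_info=sys.version_info):
    """Return True if the version satisfies ^3.7, read as >=3.7 and <4.0."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and (major, minor) >= MINIMUM_PYTHON


if __name__ == "__main__":
    status = "supported" if python_is_supported() else "not supported"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}")
```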
This is a CLI, a program controlled from the terminal. Before you can use it, you only need to install the Python package. The package is not (yet) distributed on PyPI, so you will need to use the wheel package or the tar package available in the dist folder of this repository.
- Create a project folder.
- Either download the wheel package or tar package and move it to the project folder.
- Install the package. You can use pip or any other package and dependency manager, such as Anaconda or Poetry (highly recommended!). From the project folder in which you placed the downloaded package, run:
pip install microbiome-0.1.0-py3-none-any.whl
or
poetry add ./microbiome-0.1.0-py3-none-any.whl
Install PostgreSQL. On macOS, with brew:
brew update
brew install postgres
On Ubuntu:
sudo apt update
sudo apt install postgresql postgresql-contrib
- Create the user (run from the system shell)
sudo -u postgres createuser <username>
- Create the database (run from the system shell)
sudo -u postgres createdb <dbname>
- Enter the PostgreSQL shell
sudo -u postgres psql
- Make the user a superuser (run inside the PostgreSQL shell)
ALTER USER <username> WITH SUPERUSER;
- Grant the user rights on the database (run inside the PostgreSQL shell)
GRANT ALL PRIVILEGES ON DATABASE <dbname> TO <username>;
- Reload and restart the service:
sudo /etc/init.d/postgresql reload && sudo /etc/init.d/postgresql start
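The user and database steps above can be collected into one helper that fills in the placeholders. This is a minimal sketch, not something the package provides: the function name `postgres_setup_commands` is hypothetical, and passing the SQL statements through `psql -c` is one possible alternative to typing them in the PostgreSQL shell.

```python
def postgres_setup_commands(username, dbname):
    """Return the setup commands with <username> and <dbname> filled in.

    The commands are only rendered as strings; nothing is executed.
    """
    for name in (username, dbname):
        # Conservative check so the names can be pasted without quoting.
        if not name.isidentifier():
            raise ValueError(f"unsupported name: {name!r}")
    return [
        f"sudo -u postgres createuser {username}",
        f"sudo -u postgres createdb {dbname}",
        f'sudo -u postgres psql -c "ALTER USER {username} WITH SUPERUSER;"',
        f'sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE {dbname} TO {username};"',
    ]


if __name__ == "__main__":
    # "alice" and "microbiome_db" are example values only.
    for command in postgres_setup_commands("alice", "microbiome_db"):
        print(command)
```

Printing the commands instead of running them keeps the sketch safe to try; each line can be reviewed and pasted into the terminal by hand.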
Install BLAST. On macOS, with brew:
brew install blast
On Ubuntu
- In-depth instructions are available on the official website
Within the project folder, create the following folders, which are used to create the database, retrieve the queries, and store the results.
data
  database
  queries
  results
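The folder layout above can also be created programmatically. The following is a minimal sketch using only the standard library; the function name `create_data_folders` is hypothetical, and the layout (database, queries, and results inside data) is taken from the paths used later in this document.

```python
import tempfile
from pathlib import Path

SUBFOLDERS = ("database", "queries", "results")


def create_data_folders(project_dir):
    """Create data/database, data/queries and data/results under project_dir."""
    data_dir = Path(project_dir) / "data"
    created = []
    for name in SUBFOLDERS:
        folder = data_dir / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    # A temporary directory stands in for the real project folder.
    with tempfile.TemporaryDirectory() as project_dir:
        for folder in create_data_folders(project_dir):
            print(folder.relative_to(project_dir))
```

To use it on a real project, pass the path of the project folder you created during installation instead of the temporary directory.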
Now that you have installed the CLI, we can move on to explaining its usage.
The CLI is available by calling microbiome on your terminal, although this might differ if you installed it within a virtual environment. In that case, I trust that you know what you are doing and therefore also how to access it.
Running microbiome on its own displays all the available functions and a short description of what each function does.
If you need more information about a specific function, you can request it using microbiome <function-name> --help
Before any action can be carried out, you need to set up your environment variables by calling: microbiome setup
You will need to insert the path to the folder you created before and choose the names of the blast database and the PostgreSQL database. You can use the same name from the example below.
Now you can create the blast database by typing microbiome create-blast-database.
Remember: you will need to run microbiome create-blast-database each time you want to add new .fa files to the blast database. Luckily, this process does not take too long.
The Blast database will be created automatically in the data/database folder you previously created.
Once the database is created, you can query it using other .fa files. To do this:
- Add one or more files to the data/queries folder you previously created.
- Run microbiome blast-query and follow the instructions:
  - Type Y(es) if you have more than one file to query, N(o) if you only have one file.
  - You can add an evalue (in the first place) and outfmt (in the second place) value to improve your query. This can be done in the following way: microbiome blast-query 0.0001 2 for querying the blast database with an evalue = 0.0001 and an outfmt = 2.
- The results (.xml) of the blasting will be available in the data/results folder.
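The query steps above can be prepared from a script. This is a minimal sketch, not part of the package: the helper names are hypothetical, and it assumes that outfmt can only be given after evalue, as in the example call microbiome blast-query 0.0001 2. It only builds the command and suggests the Y/N answer; it does not run the CLI.

```python
import tempfile
from pathlib import Path


def list_query_files(project_dir):
    """Return the names of the .fa files found in data/queries, sorted."""
    queries_dir = Path(project_dir) / "data" / "queries"
    return sorted(path.name for path in queries_dir.glob("*.fa"))


def multiple_files_answer(files):
    """Answer to the prompt: Y for more than one query file, N otherwise."""
    return "Y" if len(files) > 1 else "N"


def blast_query_command(evalue=None, outfmt=None):
    """Build the argument list, with evalue first and outfmt second."""
    command = ["microbiome", "blast-query"]
    if evalue is not None:
        command.append(str(evalue))
        # Assumption: outfmt is positional, so it needs evalue before it.
        if outfmt is not None:
            command.append(str(outfmt))
    return command


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project_dir:
        queries_dir = Path(project_dir) / "data" / "queries"
        queries_dir.mkdir(parents=True)
        (queries_dir / "sample_a.fa").write_text(">seq\nACGT\n")
        (queries_dir / "sample_b.fa").write_text(">seq\nTTGA\n")
        files = list_query_files(project_dir)
        print(" ".join(blast_query_command(0.0001, 2)))
        print("Answer to the prompt:", multiple_files_answer(files))
```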
To parse the query results and add them to the database, together with information about the match, the original .fa file, and the .xml file created by the gapseq algorithm on that .fa file, you can use microbiome blast-parse.
The command can also receive two additional parameters:
- add_to_db, loads the parsed results into the database and can be True or False. By default it is True.
- top_k, defines the number of top results to parse and eventually add to the database. By default it is 3.
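To illustrate what a top_k setting means, here is a minimal sketch of keeping only the best matches from a list of parsed hits. It is not the package's implementation: the function name `top_hits`, the dictionary layout, and the choice to rank by bitscore and number the matches from 1 are all assumptions made for the example.

```python
def top_hits(hits, top_k=3):
    """Return the top_k hits by bitscore, each tagged with its rank.

    `hits` is a list of dicts that contain at least a "bitscore" key.
    The input list and its dicts are left unchanged.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    ranked = sorted(hits, key=lambda hit: hit["bitscore"], reverse=True)
    return [
        dict(hit, order_match=rank)
        for rank, hit in enumerate(ranked[:top_k], start=1)
    ]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example_hits = [
        {"full_name": "hit_a", "bitscore": 50.0},
        {"full_name": "hit_b", "bitscore": 200.0},
        {"full_name": "hit_c", "bitscore": 120.0},
        {"full_name": "hit_d", "bitscore": 10.0},
    ]
    for hit in top_hits(example_hits, top_k=3):
        print(hit["order_match"], hit["full_name"], hit["bitscore"])
```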
The results are automatically added to the PostgreSQL database, from which they can be retrieved.
The following variables are currently added to the database for each result:
- id, sequence, the primary key
- full_name, the name of the result (ID)
- bitscore, the score of the match as from the blast call
- evalue, the evalue score of the match as from the blast call
- order_match, the order of the match. This was created as we currently retrieve the top 3 matches (but this can be configured in the blast-parse call)
- query_range, the range of the match
- hit_range, the hit range of the match
- smbl_xml, the smbl file created by gapseq
- fasta, the fasta file that produced the smbl
- created, the date of creation
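The variables listed above can be pictured as one table with one row per match. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without a server. The table name `blast_results`, the column types, and the sample rows are assumptions made for illustration; the real schema may differ.

```python
import sqlite3

# Hypothetical table name; column types are guesses from the descriptions.
SCHEMA = """
CREATE TABLE blast_results (
    id INTEGER PRIMARY KEY,
    full_name TEXT,
    bitscore REAL,
    evalue REAL,
    order_match INTEGER,
    query_range TEXT,
    hit_range TEXT,
    smbl_xml TEXT,
    fasta TEXT,
    created TEXT
)
"""

# Made-up rows, for illustration only (id is filled in automatically).
DEMO_ROWS = [
    ("hit_B", 180.0, 1e-40, 2, "0-300", "10-310", "sample.xml", "sample.fa", "2020-09-01"),
    ("hit_A", 250.0, 1e-60, 1, "0-300", "5-305", "sample.xml", "sample.fa", "2020-09-01"),
    ("hit_C", 90.0, 1e-10, 1, "0-120", "0-120", "other.xml", "other.fa", "2020-09-01"),
]


def build_demo_database(rows):
    """Create the in-memory table and insert the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO blast_results (full_name, bitscore, evalue, order_match, "
        "query_range, hit_range, smbl_xml, fasta, created) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def matches_for(conn, fasta):
    """Return (order_match, full_name, bitscore) for one fasta file, best first."""
    cursor = conn.execute(
        "SELECT order_match, full_name, bitscore FROM blast_results "
        "WHERE fasta = ? ORDER BY order_match",
        (fasta,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = build_demo_database(DEMO_ROWS)
    for row in matches_for(connection, "sample.fa"):
        print(row)
    connection.close()
```

Ordering by order_match returns the matches of a given fasta file from best to worst, which is the main reason that column is useful when retrieving results.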
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Leonardo Vida - @leonardojvida - l.j.vida@uu.nl
Project Link: https://github.com/UtrechtUniversity/microbiome