Microbiome CLI

About the Project

Project description: A CLI for creating, querying, and parsing a BLAST database from FASTA files. The CLI can also create a PostgreSQL database in which to store the parsed results.

Date: September 2020

Researcher(s):

Alexandre Jousset (A.L.C.Jousset@uu.nl)

Research Software Engineer(s):

Casper Kaandorp (c.s.kaandorp@uu.nl)
Leonardo Vida (l.j.vida@uu.nl)

Built with

Getting Started

To get a local copy up and running follow these simple steps.

Prerequisites

Make sure the project dependencies are available. You need Python ^3.7 and pip or Poetry (or any other dependency manager) to install the project's dependencies.

Installation

1. Install the CLI package

This is a CLI, a program controlled from the terminal. Before you can use it, you only need to install the Python package. The package is not (yet) distributed on PyPI, so you will need to use the wheel package or the tar package available in the dist folder of this repository.

1. Create a project folder.
2. Download either the wheel package or the tar package and move it to the project folder.
3. Install the package. You can use pip or any other dependency management system you prefer, such as Anaconda or Poetry (highly recommended!). From the project folder in which you placed the downloaded package:

pip install microbiome-0.1.0-py3-none-any.whl

or

poetry add ./microbiome-0.1.0-py3-none-any.whl

2. Install and set up a PostgreSQL database

On macOS, with brew:

brew update
brew install postgres

On Ubuntu:

sudo apt update
sudo apt install postgresql postgresql-contrib

Create the user and the database from the terminal, then finish the setup in the PostgreSQL shell:

1. Create the user: sudo -u postgres createuser <username>
2. Create the database: sudo -u postgres createdb <dbname>
3. Enter the PostgreSQL shell: sudo -u postgres psql
4. Make the user a superuser: ALTER USER <username> WITH SUPERUSER;
5. Grant the user rights on the database: GRANT ALL PRIVILEGES ON DATABASE <dbname> TO <username>;
6. Leave the shell with \q, then reload and start the service: sudo /etc/init.d/postgresql reload && sudo /etc/init.d/postgresql start

3. Install NCBI BLAST+

On macOS, with brew:

brew install blast

On Ubuntu:

In-depth instructions are available on the official website.

4. Create folders

Within the project folder, create the following folders, which are used to create the database, retrieve the queries, and store the results.

data
- database
- queries
- results
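The layout above can also be created from Python. This is a minimal sketch using only the standard library; the function name create_project_folders and its project_dir argument are illustrative and are not part of the microbiome package.

```python
from pathlib import Path


def create_project_folders(project_dir="."):
    """Create data/database, data/queries and data/results under project_dir."""
    data_dir = Path(project_dir) / "data"
    created = []
    for name in ("database", "queries", "results"):
        folder = data_dir / name
        # parents=True also creates data/ itself; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_project_folders() from inside the project folder creates the three folders and returns their paths; calling it again leaves existing folders untouched.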

Usage

Now that you have installed the CLI, we can move on to explaining its usage.

The CLI is available by calling microbiome in your terminal. If you installed it inside a virtual environment, you need to call it from within that environment.

`microbiome`: Main menu

Running microbiome displays all the available functions, each with a short description of what it does.

If you need more information about a specific function, you can request it using microbiome <function-name> --help

`microbiome setup`: Set variables

Before any action can be carried out, you need to set up your environment variables by calling: microbiome setup

You will need to enter the path to the folder you created earlier and choose names for the BLAST database and the PostgreSQL database. You can use the same names as in the example below.

`microbiome blast-create-database`: Create Blast database

Now you can create the BLAST database by typing microbiome create-blast-database.

Remember: you will need to run microbiome create-blast-database each time you want to add new .fa files to the BLAST database. Luckily, this process does not take long.

The BLAST database is created automatically in the data/database folder you previously created.

`microbiome blast-query`: Query blast database with .fa files

Once the database is created, you can query it using other .fa files. To do this:

1. Add one or more files to the data/queries folder you previously created.
2. Run microbiome blast-query and follow the instructions:
   1. Type Y(es) if you have more than one file to query, N(o) if you only have one file.
   2. Optionally, pass an evalue (first argument) and an outfmt (second argument) to refine your query. For example, microbiome blast-query 0.0001 2 queries the BLAST database with evalue = 0.0001 and outfmt = 2.

The results (.xml) of the query will be available in the data/results folder.

`microbiome blast-parse`: Parse result and load to database

To parse the results and add them to the database, together with information about the match, the original .fa file, and the .xml file created by the gapseq algorithm from that .fa file, use microbiome blast-parse.

The command also accepts two additional parameters:

add_to_db, loads the parsed results into the database and can be True or False. By default it is True.
top_k, defines the number of top results to parse and, if add_to_db is True, to add to the database. By default it is 3.
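To illustrate what top_k selects, the sketch below reads a BLAST XML report (the format BLAST+ writes with outfmt = 5) using Python's standard library and keeps the first top_k hits of each query. It is not the package's implementation: the function name top_hits, the keys of the returned dictionaries, and the sample report are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written report with two hits for one query (illustrative data only).
SAMPLE_REPORT = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>subject_A</Hit_def>
          <Hit_hsps><Hsp>
            <Hsp_bit-score>180.5</Hsp_bit-score>
            <Hsp_evalue>1e-50</Hsp_evalue>
          </Hsp></Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>subject_B</Hit_def>
          <Hit_hsps><Hsp>
            <Hsp_bit-score>90.2</Hsp_bit-score>
            <Hsp_evalue>2e-20</Hsp_evalue>
          </Hsp></Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def top_hits(report_xml, top_k=3):
    """Return the first top_k hits of every query in a BLAST XML report."""
    root = ET.fromstring(report_xml)
    results = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hits = iteration.findall("Iteration_hits/Hit")
        # BLAST reports hits from most to least significant,
        # so slicing keeps the best top_k matches.
        for order, hit in enumerate(hits[:top_k], start=1):
            hsp = hit.find("Hit_hsps/Hsp")  # first alignment of this hit
            results.append({
                "query": query,
                "full_name": hit.findtext("Hit_def"),
                "bitscore": float(hsp.findtext("Hsp_bit-score")),
                "evalue": float(hsp.findtext("Hsp_evalue")),
                "order_match": order,
            })
    return results
```

With the sample report, top_hits(SAMPLE_REPORT, top_k=1) keeps only the subject_A hit, while the default top_k=3 returns both hits because the report contains only two.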

As visible in the screenshot above, the results are automatically added to the PostgreSQL database from which they can be retrieved.

The following variables are currently added to the database for each result:

id, the primary key (a sequence)
full_name, the name of the result (ID)
bitscore, the score of the match as returned by the blast call
evalue, the evalue score of the match as returned by the blast call
order_match, the rank of the match. This exists because we currently retrieve the top 3 matches (this can be configured in the blast-parse call)
query_range, the query range of the match
hit_range, the hit range of the match
smbl_xml, the SBML (.xml) file created by gapseq
fasta, the fasta file that produced the SBML file
created, the date of creation
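For orientation, one stored result can be pictured as the following Python record. This is only a sketch: the field names follow the list above, but the class name BlastResult, all field types, and the example values are assumptions, and the actual table definition in the package may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class BlastResult:
    """One parsed match, mirroring the columns listed above (types are assumed)."""
    id: int            # primary key
    full_name: str     # name (ID) of the result
    bitscore: float    # score of the match from the blast call
    evalue: float      # evalue of the match from the blast call
    order_match: int   # rank of the match, 1 = best
    query_range: str   # range of the match on the query
    hit_range: str     # range of the match on the hit
    smbl_xml: str      # content of the file created by gapseq
    fasta: str         # content of the fasta file behind that model
    created: datetime = field(default_factory=datetime.now)


# Illustrative values only; they do not come from a real run.
example = BlastResult(
    id=1,
    full_name="subject_A",
    bitscore=180.5,
    evalue=1e-50,
    order_match=1,
    query_range="[0:250]",
    hit_range="[10:260]",
    smbl_xml="<sbml>...</sbml>",
    fasta=">query_1\nACGT",
)
```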

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

Contact

Leonardo Vida - @leonardojvida - l.j.vida@uu.nl

Project Link: https://github.com/UtrechtUniversity/microbiome

