Skip to content

AppliedBioinformatics/gene-daisychain

Repository files navigation

gene-daisychain

Linking several gene annotations in an easy to use web interface. It's a neo4j graph database with a web interface which stores different and connects annotations using MCL.

The web interface uses a JavaScript library from KnetMiner: https://github.com/Rothamsted/knetminer, https://knetminer.com, https://onlinelibrary.wiley.com/doi/10.1111/pbi.13583

Usage

Let's walk through a simple usage example. Users can, depending on the dataset used, select an example gene ID to search Daisychain:

image

This will display all genes with similar gene IDs:

image

Selecting this gene and clicking on 'Show graph' will reveal the gene-view, where the user can right-mouse click on genes to see more options:

image

Selecting 'Show homologs' will then display all homologs for the current gene:

image

3'/5' genes and more homologs can be displayed. Clicking on the 'refresh button' (arrow in a circle) will refresh the layout of the currently displayed genes.

Setup

You have three things that have to be running, could be on the same machine, could be on two. There is a Daisychain_config.txt file in every folder, make sure that the servers' IPs and ports are correct, and that these ports are open to each other.

In the following we have two machines: a server which runs the graph database, and a client which runs the web-frontend and sends user queries to the graph database running on server.

Daisychain server

This is the neo4j graph database containing the links between genes running on server. It needs to be up and running and visible to the world, in a screen session or as a daemon.

python3 /mnt/Daisychain_server/Daisychain_server.py

Daisychain gateway

There's a client script which facilitates communication between the server and the web-server (=client). It also has to be up and running on the client.

python3 /mnt/Daisychain_Gateway/Daisychain_Client.py

Daisychain web frontent

This is the web server running on client. There's a bash script in Daisychain_knet_web which will host the server using npm's serve, but feel free to use Apache too.

bash start_server.sh

Setting up a new database

The script Daisychain_admin/Daisychain_admin.py is used to build new databases. Run this (with the correct config file):

python3 Daisychain_admin/Daisychain_admin.py 

Press 1 to see all databases, press 2 to create new databases, :

image

Press (2) to make a new project, give it a name, then (3) to import data into this project. Here is an example CSV for two assemblies that step (3) will take:

Brassica napus,ZS11,genome,/some/location/brassica_napus/Brassica_napus_ZS11_genome_assemblyV201608.fa
Brassica napus,ZS11,annotation,/some/location/brassica_napus/Brassica_napus_ZS11_GenesetV201608_head.gff
Brassica napus,Darmorv5,genome,/some/location/brassica_napus/Brassica_napus_v4.1.chromosomes.fa
Brassica napus,Darmorv5,annotation,/some/location/brassica_napus/Brassica_napus.annotation_v5_head.gff3

Copy-paste the paste to this CSV, and the clustering and import step will begin. The running server will print status updates, example:

Importing now
Brassica napus,ZS11,genome,/some/location/brassica_napus/Brassica_napus_ZS11_genome_assemblyV201608.fa
Importing now
Brassica napus,ZS11,annotation,/some/location/brassica_napus/Brassica_napus_ZS11_GenesetV201608_head.gff
Importing now

Finished importing

Once Finished importing has printed, we can build the database. Go to the admin script and press (4), to build a projects database. Choose the right project ID. It will print how many genomes and annotations it has found (2 in the above example).

One can then choose whether admin should guess as much as possible from the gff, or set fields to manual:

image

(a)utomatic mode works with many gffs. After a few seconds, the parser will print potential annotations it has found:

image

Choose one that makes sense via 1, 2, 3, etc., and proceed through all assemblies in this way. Then wait for the clustering to end, the Server window will provide constant updates.

You can use the admin menu to check on the status of the database via the (1) option, List available projects, it should look like this once it's finished:

image

While the clustering is running, this status window will display INIT_SUCCESS.

There is E. coli example input data in the folder example in this repo. Building this database should only take 5 minutes.

About

A web server linking genome annotations. Aimed at researchers working on single genes that want to know what the homolog in other published assemblies is.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published