gene-daisychain

Daisychain links several gene annotations in an easy-to-use web interface. It is a Neo4j graph database with a web front end that stores different annotations and connects them using MCL (Markov clustering).

The web interface uses a JavaScript library from KnetMiner: https://github.com/Rothamsted/knetminer, https://knetminer.com, https://onlinelibrary.wiley.com/doi/10.1111/pbi.13583

Usage

Let's walk through a simple usage example. Users can, depending on the dataset used, select an example gene ID to search Daisychain:

This will display all genes with similar gene IDs:

Selecting this gene and clicking on 'Show graph' will reveal the gene view, where the user can right-click on genes to see more options:

Selecting 'Show homologs' will then display all homologs for the current gene:

3'/5' genes and more homologs can be displayed as well. Clicking the refresh button (an arrow in a circle) will refresh the layout of the currently displayed genes.

Setup

Three components have to be running: the Daisychain server, the Daisychain gateway, and the web frontend. They can all run on the same machine or be split across two. There is a Daisychain_config.txt file in every folder; make sure that the servers' IPs and ports in it are correct, and that these ports are open to each other.

In the following we have two machines: a server, which runs the graph database, and a client, which runs the web frontend and sends user queries to the graph database running on the server.
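Before starting the components, it can help to confirm that the ports named in Daisychain_config.txt are actually reachable between the two machines. The following is a minimal sketch using only Python's standard library; the function names are hypothetical and not part of Daisychain, and the hosts and ports must be taken from your own config file.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def check_endpoints(endpoints):
    """Map each label to True/False for a dict of label -> (host, port)."""
    return {label: port_is_open(host, port)
            for label, (host, port) in endpoints.items()}
```

Run on the client, a call such as `check_endpoints({"server": ("<server IP>", <server port>)})` reports whether the server's port can be reached. A `False` usually points to a firewall rule, a wrong IP or port in the config, or a component that has not been started yet.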

Daisychain server

This is the Neo4j graph database containing the links between genes. It runs on the server and needs to be up, running, and visible to the world, so start it in a screen session or as a daemon.

python3 /mnt/Daisychain_server/Daisychain_server.py

Daisychain gateway

The gateway is a client script which relays communication between the Daisychain server and the web server on the client. It also has to be up and running, on the client machine.

python3 /mnt/Daisychain_Gateway/Daisychain_Client.py

Daisychain web frontend

This is the web server running on the client. There's a bash script in Daisychain_knet_web which will host the site using npm's serve, but feel free to use Apache instead.

bash start_server.sh

Setting up a new database

The script Daisychain_admin/Daisychain_admin.py is used to build new databases. Run this (with the correct config file):

python3 Daisychain_admin/Daisychain_admin.py

The admin menu will appear: press (1) to see all databases, or press (2) to create a new one.

Press (2) to make a new project, give it a name, then (3) to import data into this project. Here is an example CSV for two assemblies that step (3) will take:

Brassica napus,ZS11,genome,/some/location/brassica_napus/Brassica_napus_ZS11_genome_assemblyV201608.fa
Brassica napus,ZS11,annotation,/some/location/brassica_napus/Brassica_napus_ZS11_GenesetV201608_head.gff
Brassica napus,Darmorv5,genome,/some/location/brassica_napus/Brassica_napus_v4.1.chromosomes.fa
Brassica napus,Darmorv5,annotation,/some/location/brassica_napus/Brassica_napus.annotation_v5_head.gff3
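Since a malformed row will only surface once the import has started, it can be worth checking the CSV first. Below is a rough sketch of such a check. It assumes, based only on the example above, that each row has four fields (species, assembly name, file type, path), that the file type is either `genome` or `annotation`, and that every assembly should have one of each. `check_import_rows` is a hypothetical helper, not part of Daisychain_admin.py.

```python
import csv
import io
from collections import defaultdict

# Assumption: these are the only two file types, as seen in the example CSV.
VALID_TYPES = {"genome", "annotation"}


def check_import_rows(csv_text):
    """Return a list of problems found in the CSV text (empty if none)."""
    problems = []
    seen = defaultdict(set)  # (species, assembly) -> file types seen so far
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not row:
            continue  # ignore blank lines
        if len(row) != 4:
            problems.append(f"line {line_no}: expected 4 fields, got {len(row)}")
            continue
        species, assembly, file_type, _path = (field.strip() for field in row)
        if file_type not in VALID_TYPES:
            problems.append(f"line {line_no}: unknown file type {file_type!r}")
            continue
        seen[(species, assembly)].add(file_type)
    # Assumption: each assembly needs both a genome and an annotation row.
    for (species, assembly), types in seen.items():
        for missing in sorted(VALID_TYPES - types):
            problems.append(f"{species} {assembly}: no {missing} row")
    return problems
```

Calling `check_import_rows(open("my_import.csv").read())` on the example above would return an empty list. The sketch does not check that the paths exist, because they have to be valid on the machine that performs the import, which may not be the one where you run the check.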

Copy-paste the path to this CSV, and the clustering and import step will begin. The running server will print status updates, for example:

Importing now
Brassica napus,ZS11,genome,/some/location/brassica_napus/Brassica_napus_ZS11_genome_assemblyV201608.fa
Importing now
Brassica napus,ZS11,annotation,/some/location/brassica_napus/Brassica_napus_ZS11_GenesetV201608_head.gff
Importing now

Finished importing
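If you have captured the server's output to a text file (for instance from a screen log), a few lines of Python can tell you which rows were reported and whether the import has completed. This is only a sketch that assumes the output has exactly the shape shown above: an `Importing now` line followed by the CSV row, and a final `Finished importing` line. `summarize_import_log` is a hypothetical helper, not part of Daisychain.

```python
def summarize_import_log(log_text):
    """Return (rows reported after 'Importing now', whether the import finished)."""
    lines = [line.strip() for line in log_text.splitlines()]
    imported = []
    # Look at each line together with the line that follows it.
    for prev, cur in zip(lines, lines[1:]):
        if prev == "Importing now" and cur and cur != "Importing now":
            imported.append(cur)
    finished = "Finished importing" in lines
    return imported, finished
```

Comparing the returned rows against your CSV is a quick way to spot a file that the server never reported.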

Once "Finished importing" has been printed, we can build the database. Go back to the admin script and press (4) to build a project's database. Choose the right project ID. It will print how many genomes and annotations it has found (2 in the above example).

You can then choose whether the admin script should guess as much as possible from the GFF, or whether to set the fields manually:

(a)utomatic mode works with many GFFs. After a few seconds, the parser will print potential annotations it has found:

Choose one that makes sense via 1, 2, 3, etc., and proceed through all assemblies in this way. Then wait for the clustering to end; the server window will print regular updates.

You can use the admin menu to check on the status of the database via option (1), "List available projects". It should look like this once it's finished:

While the clustering is running, this status window will display INIT_SUCCESS.

There is E. coli example input data in the folder example in this repo. Building this database should only take 5 minutes.
