Skip to content

Open source research project to find viral photosystem genes in publicly available data

Notifications You must be signed in to change notification settings

bartaelterman/find-viral-photosystem-genes

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

78 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Photosystem genes in cyanophages

Background

This project was based on the results published in the following article: Photosystem I gene cassettes are present in marine virus genomes by Sharon et al. The authors identified a number of photosystem I genes in marine virus genomes by performing a tBLASTx search for photosynthesis genes in the contigs of the Global Ocean Sampling project.

Data

The analysis performed in the article was done on publicly available data from the Global Ocean Sampling project. The sequences used were contigs, so the assembly was already done by the GOS researchers. The data can be downloaded from the CAMERA portal. You can obviously use your own data if you like.

Goal

Here we will try to reproduce the results from the published article.

Run the analysis

If you want to run the analysis yourself, on the same or on your own data, you can clone this repository.

Dependencies

  • Python (I use 2.7)
  • A local version of the NCBI BLAST+ tools with the executables in your local PATH variable
  • gnu make

What is in this repository?

  • results/ : directory containing results of experiments. You'll find experiment-specific code there.
  • README file: this file
  • notes: notes taken while developing this project
  • Makefile: contains all commands needed to run this analysis. To run all commands, do make all.
  • data/: directory containing data. This mainly includes a collection of photosynthesis genes that are used as query for BLAST searches.

What is not in this repository?

  • The Global Ocean Sampling dataset, as these files are too large to put them here. You can download them yourself from the CAMERA portal and create a simlink in the data/ folder of this repo called input.fasta referencing to the Global Ocean Sampling fastafile. The one I use contains the assembled sequences. There is a rule in the makefile that will create a blastable database from that file.

Get the code

git clone thisrepo

Run the analysis

make all

View the Makefile for more documentation.

Contribute

Please feel free to contribute to this project. You can track my progress and you can fork this repository or propose changes if you like.

About

Open source research project to find viral photosystem genes in publicly available data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published