Skip to content

Bioinformatics research project on metagenomic in samples from TaraOceans expediction

Notifications You must be signed in to change notification settings

dalmolingroup/meta_taraoceans

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 

Repository files navigation

meta_taraocean

This repository is the result of a master's project in Bioinformatics, focusing on ocean metagenomics. For this analysis, data obtained from the Tara Oceans Expedition and the MEDUSA open source pipeline were used.

To reproduce the results, it is important to correctly follow the steps present in this README or even those present in each directory of this repository.

Repository structure

  • doc: Tables used in the project.
    • tables/original: Tables retrieved directly from the Tara Oceans website.
    • tables/used: Tables containing only information about the samples selected for the present work.
  • data: All material developed in this project.
    • pipeline_workflow: MEDUSA pipeline configuration and execution files.
    • results: Code and results obtained with the output of the MEDUSA pipeline. Each subfolder of these repositories has as respective subfolders "biodiversity", "functional" and "taxonomic", with their due explanations present in the README.md files present in each one of them.
      • results/code: All the code created to build tables, graphs and figures for interpreting the results.
      • results/input: Input files for analyzing the results. These files are the MEDUSA pipeline output files.
      • results/output: Output files with the results obtained in the analyses, such as tables, graphs and figures.

Preparation of the working environment

Here, we used the MEDUSA, a tool for metagenomics analysis. The code used in this project followed the step-by-step guide described on the Supplementary Material of the MEDUSA paper.

  1. Go to data/pipeline_workflow/00-default
cd meta_taraoceans/data/pipeline_workflow/00-default
  1. Adjust the paths of variables present in 00-var.sh to your local computer
nano 00-var.sh
  1. Follow the steps present in 00-environment.sh

Pipeline execution

  1. Activate the environment
conda activate medusaPipeline
  1. Execute, in order, each step of the pipeline. Note: only start the next one after finishing the previous one.
nohup 01-raw_data.sh &
nohup 02-preprocessing.sh &
nohup 03-alignment.sh &
nohup 04-taxonomic_classification.sh &
nohup 05-functional_annotation.sh &

Results

  • Taxonomic Classification
cd meta_taraoceans/data/results/code/taxonomic
Rscript taxonomic.R
  • Biodiversity
cd meta_taraoceans/data/results/code/biodiversity
Rscript biodiversity.R
  • Functional Annotation
cd meta_taraoceans/data/results/code/functional
Rscript functional.R

For more information about the MEDUSA pipeline, check:

Morais, D.A.A.; Cavalcante, J.V.F.; Monteiro, S.S.; Pasquali, M.A.B.; Dalmolin, R.J.S. MEDUSA: A Pipeline for Sensitive Taxonomic Classification and Flexible Functional Annotation of Metagenomic Shotgun Sequences, 2022. https://doi.org/10.3389/fgene.2022.814437.

About

Bioinformatics research project on metagenomic in samples from TaraOceans expediction

Resources

Stars

Watchers

Forks

Languages

  • R 85.4%
  • Shell 14.6%