Skip to content

amartinsan/MetabolicProfile_Inferring

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A pipeline to process metagenome samples

Workflow diagram

Plot

List of programs needed

Quality and filter

Assembly and binning

Functional Assingment

Mapping of reads to contigs

• bowtie2 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

Extras

For gene predicition, functional annotation or pipeline manager

Docker for reproducibility

The Dockerfile for making a container:

example

  docker build -t my_dockerpipeline my_FOLDER/

for using the docker you have to use a docker share folder

 mkdir $PATH/dockershare
 cd $PATH/dockershare       

Run docker example

  cd $HOME/dockershare
  docker run -it --rm -v $PATH/dockershare:/data  -i /data/"FOLDER WITH SAMPLPES" -o /data/"FOLDER WITH SAMPLES"
  #Check for installed image
  docker images

Also the container is already ready for download from dockerhub

https://hub.docker.com/r/amartinsan/metagenompipeline

  #You can pull de image from dockerhub, REMEMBER to specifty version or tag (1.0 in this case)
  
   docker pull amartinsan/metagenompipeline:1.0

General idea

Quality assesment

Instead of fastqc/multiqc and trimmomatic, the idea is to use fastp to filter low quality reads from pair end data.

Also a CD-HIT-DUP pass can be implemented to futher filter, althought it is very time consuming and not always recommended.

output :
  • Quality proceessed seqs with a "Q_" prefix

  • quality folder with orignal seqs and .json and .html outputs of fastqc quality metrics

Assembly

Mestaspades or megagit assembly, followed by metaQuast ond contigs and scaffolds.

Also a cd-hit-est pass can be done, althought it takes a good ammount of time (not recomended).

output:

  • Directory named : Q_"SAMPLE NAME"_SPADES_ASSEMBLY or_MEHAGIT_ASSEMBLY

It contains the assembly results.

  • Directory named: contigs_Quast_"Q_SAMPLE_NAME"

It contains the metrics of MetaQuast for the assembly.

Binning

Taking the output of the assembly (scaffolds) and uses bowtie2 to generate the index, .sam and .bam files necessary for the binning with metabat.

Also it uses checkm to check the quality of the bins.

output:

All the results are inside the assembly folder.
  • bowtieINDEX folder with bowtie results.

  • scaffolds.sort.bam (and sam) and stats.

  • bins folder with assembled bins.

Gene prediction

Using the contigs.fasta file from the assembly and the program Prodigal.

output:

All the results are inside the assembly folder.
  • prodigal folder with the predicted proteins and genes.

Functional assingment

This it where it gets tricky, depending on the program and database used.

Here we are using mmseq2 with eggNOG emapper. It uses the protein_spades.fasta file from the Prodigal prediction

Other functional assingment strategies

As always, getting the databes is the issue

Blastp against Swiisprot (uniprot)

Download : https://www.uniprot.org/help/downloads

A regular blastp of the obtained protein_spades.fasta of the assembly.

   blastp -query protein_spades.fasta  -db uniprot_sprot.pep  -num_threads $THREADS   -out swissblast.fasta

Diamond alingment with KEGG db

For diamond the chosen database has to be in a reference format

    /diamond makedb --in reference.fastaCHOSEN-DATABASE -d referenceDB
    
    # running a search in blastp mode
    
    ./diamond blastp -d referenceDB -q protein_spades.fasta -o matches.tsv

Hmmer of the obtained fasta

Firs the db has to be ini a profile form (recommended uniprot or interpro)

    hmmaling --amino hmmer_profileDATABASE proteins_spades.fasta 

###############################################################

Support and Finance

This project has been funded by CONACyT in its Basic Science and/or Frontier Science convening.

Modality: Paradigms and Controversies of Science 2022

Project 319234 awarded to Dr. Rosa María Gutierrez Ríos

#################################################################

Este proyecto ha sido financiado por CONACyT en su Convocatoria de Ciencia Básica y/o Ciencia de Frontera.

Modalidad: Paradigmas y Controversias de la Ciencia 2022

Proyecto 319234 otorgado a la Dra. Rosa María Gutierrez Ríos

Releases

No releases published

Packages

No packages published