Genomic insights into the coupling of a Chlorella-like microeukaryote and sulfur bacteria in the chemocline of permanently stratified Lake Cadagno

Saini et al. ISMEJ 2023 https://www.nature.com/articles/s41396-023-01396-y

Introduction

Meromictic Lake Cadagno is situated at 1921 m within Swiss Alps. Lake Cadagno is permanently stratisfied into three zones called 1) mixolimnion (upper oxic), 2) monimolimnion (lower anoxic), and 3) in between oxic-anoxic interface called chemocline which harbours a persistent microbial bloom. This study focussed on eukaryotic microbes, and investigated the microbial bloom using Shotgun DNA sequencing (Illumina HighSeq 4000). Following pipeline was used to obtain near complete genome of Chlorella-like microbial eukaryote from metagenomics dataset.

Microbial Eukaryotes Metagenomics Pipeline

R Code and Data availibility

Raw reads are available at the NCBI under the sequence read archive SUB11916861 and under the accessions SRR21025699, SRR21025700, SRR21025701, and SRR21025702. The GenBank ID for the Chlorella-like MAG is JAOAOU000000000.1. The assembled contigs from raw reads, MAGs, R scriptsincluding files to generate figures for the manuscript and other data were deposited in the Zenodo (https://zenodo.org/record/7505505).

Step by step guide

There are total eight metagenomics samples collected from the water column of Lake Cadagno. Three from mixolimnion (5m, 9m, 11m), four from chemocline (13m, 15mw, 15mm, 15.5m), and one from monimolimnion (17m). The samples were collected from different meter depths of the Lake Cadagno. Chemocline metagenomics samples were primarily used in this study to obtain the near complete genome of Chlorella-like microbial eukaryote.

1-3 Sample collection

For metagenomics sequencing, 20L lake water was collected from the chemocline (oxic-anoxic boundry; 13-15.5 m) of Lake Cadagno. The biomass was captured on 0.2μm filters, and after DNA extractions samples were sent for Shotgun sequencing. Metagenomics sample collection is in parallel to 16S rRNA gene sequencing mentioned in ASM mbio article (ASM mBIO).

4-5 Pre-processing, and assembly of raw reads

Required tools with installation links: Anvio: https://anvio.org/install/ | BBtools https://jgi.doe.gov/data-and-tools/software-tools/bbtools/ | Spades https://github.com/ablab/spades

4.1 Trimming of reads prior to assembly to remove low quality calls

     bbduk.sh in1=read1.fq in2=read2.fq out1=clean1.fq out2=clean2.fq

4.2 Error-correction and normalization of reads

     bbnorm.sh in=<input> out=<reads to keep> outt=<reads to toss> hist=<histogram output>

5.1 Assembly of raw reads using SPAdes

     spades.py -1 R1.fastq -2 R2.fastq -o name_of_output_folder -t 32 --meta

5.2 Contig names were simplified using Anvio with minimum length of 1KB

     anvi-script-reformat-fasta ./spades.contigs.fasta -o ./renamed.contigs.fa --min-len 1000 --simplify-names --report ./Spades_13m/name_conversions.txt

6 Competitive and Non Competitive Binning from Assemblies

6.1 CONCOCT based competitive binning using Anvio

Required tools with installation links: Anvio: https://anvio.org/install/ | Concoct https://anaconda.org/bioconda/concoct | Bowtie2 and Samtools (Already installed with Anvio).

Following is the example of Spades Assembly of 15.5m sample of Lake Cadagno mapped to different samples (5m, 9m, 11m, 13m, 15mw, 15mm, 15.5m, 17m). There are total of eight metagenomics samples collected from the Lake. Four samples are from the chemocline (13m, 15mw, 15mm, 17m).

  bowtie2-build ../Anvio.15_5m.S.contigs.fa   ./Anvio.15_5m.S.contigs
  
  #15.5m Assembly mapped to 5m sample raw reads
  bowtie2 --threads 32 -x ./Anvio.15_5m.S.contigs -1 Sample_119834_R1.fastq \
                                          -2 Sample_119834_R2.fastq \
                                           -S lib_15_5m_mapped_5m_r.sam
  samtools view -F 4 -bS lib_15_5m_mapped_5m_r.sam > lib_15_5m_mapped_5m_r-RAW.bam
  
  #15.5m Assembly mapped to 9m sample raw reads
  bowtie2 --threads 32 -x ./Anvio.15_5m.S.contigs -1 Sample_119835_R1.fastq \
                                          -2 Sample_119835_R2.fastq \
                                           -S lib_15_5m_mapped_9m_r.sam
  samtools view -F 4 -bS lib_15_5m_mapped_9m_r.sam > lib_15_5m_mapped_9m_r-RAW.bam
  
  #15.5m Assembly mapped to 11m sample raw reads
  bowtie2 --threads 32 -x ./Anvio.15_5m.S.contigs -1 Sample_119836_R1.fastq \
                                            -2 Sample_119836_R2.fastq \
                                            -S lib_15_5m_mapped_11m_r.sam
  samtools view -F 4 -bS lib_15_5m_mapped_11m_r.sam > lib_15_5m_mapped_11m_r-RAW.bam
  
  
  #15.5m Assembly mapped to 13m sample raw reads
  bowtie2 --threads 32 -x ./Anvio.15_5m.S.contigs -1 Sample_119837_R1.fastq \
                                            -2 Sample_119837_R2.fastq \
                                           -S lib_15_5m_mapped_13m_r.sam                                 
  samtools view -F 4 -bS lib_15_5m_mapped_13m_r.sam > lib_15_5m_mapped_13m_r-RAW.bam
  
  #15.5m Assembly mapped to 15mw sample raw reads (whole water sample)
  bowtie2 --threads 32 -x ./Anvio.15_5m.S.contigs -1 Sample_119838_R1.fastq \
                                            -2 Sample_119838_R2.fastq \
                                            -S lib_15_5m_mapped_15mw_r.sam
  samtools view -F 4 -bS lib_15_5m_mapped_15mw_r.sam > lib_15_5m_mapped_15mw_r-RAW.bam

  #15.5m Assembly mapped to 15mm sample raw reads 
  bowtie2 --threads 32 -x ./Anvio.15_5m.S.contigs -1 Sample_119839_R1.fastq \
                                                 -2 Sample_119839_R2.fastq \
                                           -S lib_15_5m_mapped_15mm_r.sam
  samtools view -F 4 -bS lib_15_5m_mapped_15mm_r.sam > lib_15_5m_mapped_15mm_r-RAW.bam
  
  #15.5m Assembly mapped to 15.5m sample raw reads 
  bowtie2 --threads 32 -x ./Anvio.15_5m.S.contigs -1 Sample_119840_R1.fastq \
                                            -2  Sample_119840_R2.fastq \
                                            -S lib_15_5m_mapped_15_5m_r.sam
  samtools view -F 4 -bS lib_15_5m_mapped_15_5m_r.sam > lib_15_5m_mapped_15_5m_r-RAW.bam


  #15.5m Assembly mapped to 17m sample raw reads 
  bowtie2 --threads 32 -x ./Anvio.15_5m.S.contigs -1 Sample_119841_R1.fastq \
                                                 -2 Sample_119841_R2.fastq \
                                                 -S lib_15_5m_mapped_17m_r.sam
  samtools view -F 4 -bS lib_15_5m_mapped_17m_r.sam > lib_15_5m_mapped_17m_r-RAW.bam


  #Sorting RAW bam using Anvio
   anvi-init-bam lib_15_5m_mapped_15mw_r-RAW.bam -o lib_15_5m_mapped_15mw_r.bam
   anvi-init-bam lib_15_5m_mapped_15mm_r-RAW.bam -o lib_15_5m_mapped_15mm_r.bam
   anvi-init-bam lib_15_5m_mapped_15_5m_r-RAW.bam -o lib_15_5m_mapped_15_5m_r.bam
   anvi-init-bam lib_15_5m_mapped_17m_r-RAW.bam -o lib_15_5m_mapped_17m_r.bam
   anvi-init-bam lib_15_5m_mapped_9m_r-RAW.bam -o lib_15_5m_mapped_9m_r.bam
   anvi-init-bam lib_15_5m_mapped_5m_r-RAW.bam -o lib_15_5m_mapped_5m_r.bam
   anvi-init-bam lib_15_5m_mapped_13m_r-RAW.bam -o lib_15_5m_mapped_13m_r.bam
   anvi-init-bam lib_15_5m_mapped_11m_r-RAW.bam -o lib_15_5m_mapped_11m_r.bam

  #Generating Anvio profiles using BAM files

   anvi-profile -i lib_15_5m_mapped_11m_r.bam -c ../Anvio.15_5m.S.contigs.db -T 32
   anvi-profile -i lib_15_5m_mapped_13m_r.bam -c ../Anvio.15_5m.S.contigs.db  -T 32
   anvi-profile -i lib_15_5m_mapped_15mm_r.bam -c ../Anvio.15_5m.S.contigs.db -T 32
   anvi-profile -i lib_15_5m_mapped_15mw_r.bam -c ../Anvio.15_5m.S.contigs.db  -T 32
   anvi-profile -i lib_15_5m_mapped_15_5m_r.bam -c ../Anvio.15_5m.S.contigs.db  -T 32
   anvi-profile -i lib_15_5m_mapped_17m_r.bam -c ../Anvio.15_5m.S.contigs.db  -T 32
   anvi-profile -i lib_15_5m_mapped_5m_r.bam -c ../Anvio.15_5m.S.contigs.db  -T 32
   anvi-profile -i lib_15_5m_mapped_9m_r.bam -c ../Anvio.15_5m.S.contigs.db - -T 32

   anvi-merge ./lib_15_5m_mapped_5m_r/PROFILE.db ./lib_15_5m_mapped_9m_r/PROFILE.db ./lib_15_5m_mapped_11m_r/PROFILE.db ./lib_15_5m_mapped_13m_r/PROFILE.db \
      ./lib_15_5m_mapped_15mm_r/PROFILE.db  ./lib_15_5m_mapped_15mw_r/PROFILE.db ./lib_15_5m_mapped_15_5m_r/PROFILE.db \
      ./lib_15_5m_mapped_17m_r/PROFILE.db  -c ../Anvio.15_5m.S.contigs.db  -o ../All_SAMPLES-MERGED_P 

  #Concoct binning inside Anvio using coverage information stored from all the samples. 
  anvi-cluster-contigs -p ./All_SAMPLES-MERGED_P/PROFILE.db  -c ../Anvio.15_5m.S.contigs.db -C Bins_concoct_15_5m --driver concoct --just-do-it -T 32

6.2 CONCOCT based non competitive binning (optional)

Required tools with installation links: Anvio: https://anvio.org/install/ | Concoct | Bowtie2 and Samtools (Already installed with Anvio).

   #Mapping followed by Binning

         #Creating BAM file
         bowtie2-build ./renamed.contigs.fa ./output_file

         bowtie2 --threads 16 -x ./output_file -1 R1.fastq -2 R2.fastq -S output_file.sam

         samtools view -F 4 -bS ./output_file.sam > ./output_file-RAW.bam
   
         #Sorting and indexing BAM files
         anvi-init-bam ./output_file-RAW.bam -o ./output_file.bam
         

   #CONCOCT based non-competitve binning steps

       cut_up_fasta.py ./renamed.contigs.fa -c 10000 -o 0 --merge_last -b renamed.contigs_10k.bed > renamed.contigs_10k.fa

       concoct_coverage_table.py renamed.contigs_10k.bed ./output_file.bam > coverage_table.tsv

       concoct --composition_file renamed.contigs_10k.fa --coverage_file coverage_table.tsv  -b ./output_folder_name/ -t 8

       merge_cutup_clustering.py ./output_folder_name/clustering_gt1000.csv > ./output_folder_name/clustering_merged.csv

       extract_fasta_bins.py ./renamed.contigs.fa ./output_folder_name/clustering_merged.csv --output_path ./output_folder_name/

7-8 Classification, Quality and Abundance Assessment of Eukaryotic and Prokaryotic MAGs

Required tools with installation links: CAT: https://github.com/dutilh/CAT | BUSCO: https://busco.ezlab.org/busco_userguide.html | Phyloflash http://hrgv.github.io/phyloFlash/usage.html | CoverM https://github.com/wwood/CoverM

7.1 CAT BAT Taxonomic Classification

     #Requires high memory: 250GB RAM with 16 cores
     CAT bins -b /folder_containing_bins/ -s .fa -d ../CAT_database.2021-07-24/ -t ../CAT_taxonomy.2021-07-24/ -n 16 --block_size 20 --index_chunks 1

     #Getting taxonomy for each MAG
     CAT add_names -i  out.BAT.bin2classification.txt -o GT_tax.txt -t ../CAT_taxonomy.2021-07-24/ --only_official

8.1 Classification and Quality Assessment of MAGs using BUSCO

     busco --in /folder_containing_bins/ --mode genome --cpu 16 --out busco_output_folder

8.2 CoverM based Relative Abundance assessment of MAGs

    coverm genome --coupled  /Sample_R1.fastq \
                            /Sample_R2.fastq \
                           --genome-fasta-extension .fa --genome-fasta-directory /Coverm_bins/ -m relative_abundance --threads 16 -o Main_MAGs_abundance.tsv

8.3 Overall Eukaryotic Community Composition using 18S

 phyloFlash.pl -lib run01 -read1 reads_F.fq.gz -read2 reads_R.fq.gz -readlength 150

Important: After running busco, we detected the Chlorophyta genome in all four samples of the chemocline (>90% complete). This is the stage you can detect if there are eukaryotic MAGs in your samples

9 Extracting raw reads of Chlorophyta Eukaryote MAG of interest

Required tools with installation links: dRep: https://drep.readthedocs.io/en/latest/installation.html | Bedtools https://anaconda.org/bioconda/bedtools | Bowtie2, Samtools, Spades (Already installed previously).

9.1 dRep MESH clustering of Eukaryotic MAGs detected by BUSCO.

     dRep compare output_directory -g path/to/genomes/*.fasta

9.2 Concatenation of similar eukaryotic genomes

     #Added the sample name to contigs of MAGs before concatenation
     sed ’s/^>/>15m_/g’ finename.fna > Chlorophyta_file_modified.fna 
     
     #Concatenate all Eukaryotic MAGs together
     cat  Chlorophyta_13m_164_meta_mod.fa  Chlorophyta_15mw_87_mod.fa  Chlorophyta_15mm_94_meta_mod.fa  Chlorophyta_15_5m_12_meta_mod.fa > Compiled_Chlorophyta.fa

9.3 Mapping Chlorophyta specific raw reads from 13m, 15mw, 15mm, and 15.5m sample

     bowtie2-build ./Compiled_Chlorophyta.fa ./Compiled_Chlorophyta

     bowtie2 --threads 16 -x ./Compiled_Chlorophyta -1 13m_R1.fastq -2  13m_R2.fastq -S  13m_output_file.sam
     samtools view -F 4 -bS 13m_output_file.sam > 13m_output_file-RAW.sam
     #Perform the same step with other three samples (15mw, 15mm, and 15.5m)

9.4 Extracting Chlorophyta specific raw reads (R1 and R2) from BAM files

     bamToFastq -i output_file-RAW.bam -fq lib_13m_mapped.1.fastq -fq2 lib_13m_mapped.2.fastq

9.5 Assembling Eukaryotic Genome of Interest raw reads (We only used 13m, and 15.5m because of high N50 value).

     #raw reads from two samples was taken
     spades.py -1 lib_13m_mapped.1.fastq -2 lib_13m_mapped.2.fastq -1 lib_15_5m_mapped.1.fastq -2 lib_15_5m_mapped.2.fastq  -o ./Anvio2/Spades_13_15p5_Map_GBII

10 Rebinning and visualization of Chlorophyta Genome Assembly

Required tools with installation links: Anvio | Concoct | Bowtie2, Samtools | BBMap (Already installed with Anvio).

10.1 Anvio based rebinning of Chlorophyta Genome

     #Generate contig database of Chlorophyta Assembly
     anvi-gen-contigs-database -f ./Spades_13_15p5_Map_GBII/Anvio_13_15p5_GBII.contigs.fa -o ./mapping2/contigs_Anvio13m_run2.db -n 'An example contigs database'
     
     #HMMS RUN
     anvi-run-hmms -c ./mapping2/contigs_Anvio13m_run2.db
     
     #Create profile database with each sample
     anvi-profile -i ./mapping2/lib_13m_mapped_F_13_15p5.bam -c ./mapping2/contigs_Anvio13m_run2.db -o ./mapping2/P_An_13_w13_run2 -T 16
     anvi-profile -i ./mapping2/lib_15mw_mapped_F_13_15p5.bam -c ./mapping2/contigs_Anvio13m_run2.db -o ./mapping2/P_An_13_w13_15mw_run2 -T 16
     anvi-profile -i ./mapping2/lib_15_5m_mapped_F_13_15p5.bam -c ./mapping2/contigs_Anvio13m_run2.db -o ./mapping2/P_An_13_15p5_run2 -T 4
     anvi-profile -i ./mapping2/lib_15mm_mapped_F_13_15p5.bam -c ./mapping2/contigs_Anvio13m_run2.db -o ./mapping2/P_An_13_15mm_run2 -T 16
 
     #Merge Anvio Profiles
     anvi-merge ./mapping2/P_An_13_w13_run2/PROFILE.db ./mapping2/P_An_13_w13_15mw_run2/PROFILE.db ./mapping2/P_An_13_15p5_run2/PROFILE.db ./mapping2/P_An_13_15mm_run2/PROFILE.db -o ./mapping2/SAMPLES-MERGED_run2 -c ./mapping2/contigs_Anvio13m_run2.db
             
     #Refine assembly by Anvio interface
     anvi-refine -p ./mapping2/SAMPLES-MERGED_run2/PROFILE.db -c ./mapping2/contigs_Anvio13m_run2.db --server-only -P 8080

     anvi-cluster-contigs  -p ./mapping2/SAMPLES-MERGED_run2/PROFILE.db -c ./mapping2/contigs_Anvio13m_run2.db -C CONCOCT2 --driver CONCOCT -T 8 --just-do-it
     anvi-summarize -p ./mapping2/SAMPLES-MERGED_run2/PROFILE.db -c ./mapping2/contigs_Anvio13m_run2.db -o ./mapping2/SAMPLES-SUMMARY_run23 -C CONCOCT2

10.2 Removal of contigs using command line (optional)

     #filter minimum length 2500
     reformat.sh in=Chlophyta_2ks_nohit_filtered_bin.fa out=Chlophyta_2ks_nohit_filtered_bin2.fa minlength=2500
  
     #Remove contigs with no-hits
     filterbyname.sh in=Chlophyta_2ksfiltered_bin.fa out=Chlophyta_2ks_nohit_filtered_bin.fa names=c_000000000762,c_000000000733,
     c_000000000503,c_000000000488,c_000000000717,c_000000000753,c_000000000834,c_000000000754,c_000000000786,c_000000000711,
     c_000000000784,c_000000000836,c_000000000832,c_000000000779,c_000000000781,c_000000000633,c_000000000726,c_000000000773,
     c_000000000683,c_000000000795,c_000000000725,c_000000000837,c_000000000770,c_000000000621,c_000000000742,c_000000000830,
     c_000000000748,c_000000000755,c_000000000732,c_000000000701,c_000000000330,c_000000000746,c_000000000602,c_000000000661,
     c_000000000769,c_000000000768,c_000000000715,c_000000000819,c_000000000841,c_000000000588,c_000000000644,c_000000000827,
     c_000000000844,c_000000000232,c_000000000824,c_000000000718,c_000000000817,
     c_000000000806,c_000000000314,c_000000000758,c_000000000774,c_000000000838,c_000000000804,c_000000000767,
     c_000000000678,c_000000000820,c_000000000568,c_000000000818 include=f

10.3 Visualization of eukaryotic MAG using Blobtools

Required tools with installation links: Blobtools2 https://blobtoolkit.genomehubs.org/blobtools2/ | Diamond (Install with Anvio installation)

10.3.1 Performing Blastx required for blobtools.

     diamond blastx -query /Refined_Eukaryotic_genome.fa --db ./nr_Final2.dmnd --outfmt 6 qseqid staxids bitscore qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue --sensitive --max-target-seqs 1 --evalue 1e-25 --threads 16 > /home/users/s/saini7/scratch/MS2/Anvio_P/Refined_Eukaryotic_genome_dmnd.blastx.out

10.3.2 Blobtools command line for generating plots

     ~/blobtoolkit/blobtools2/blobtools create --fasta ../Chlophyta_2ks_nohit_filtered_bin/Chlophyta_2ks_nohit_filtered_bin2.fa  --cov ../lib_13_15_5m_mapped_F_13_15p5_filter_nohit.bam --hits ../Chlophyta_2ks_nohit_filtered_bin/Chlophyta_2ks_nohit_filtered_blast.out --taxdump ~/blobtoolkit/taxdump --threads 4 --replace /home/users/s/saini7/scratch/MS2/Anvio2/refined_bin/bin_by_bin/Chlorophyta_1/busco_chlo2k/busco/blob2

11. Organelle (Chloroplast) hunting and Visualization

Required tools with installation links: Blast (Already installed with Anvio) | Seqtk https://github.com/lh3/seqtk | Novoplasty https://github.com/ndierckx/NOVOPlasty | Geseq https://chlorobox.mpimp-golm.mpg.de/geseq.html

11.1 Performing BLAST using available NCBI chloroplast genomes

     #Blast using NCBI Parachlorella kessleri chloroplast genome (NC_012978.1) 
     blastn -db ~/scratch/Spades_15mw2/Anvio.15mw.S.contigs.fa -query NC_012978.1_para_chlorella.fa -outfmt 6 -max_target_seqs 1 > 15mw_PC_chloroplast.out

     #Blast using NCBI Cryptomonas curvata chloroplast genome (KY856939.1) 
     blastn -db ~/scratch/15mm/Spades_15mm/Anvio.15mm.S.contigs.fa -query KY856939.1.fasta -outfmt 6 -max_target_seqs 1 > 15m_GT_chloroplast.out

11.2 Extraction of contigs/prospective chloroplast genomes from main assemblies

     #Contig with promising hit against Parachlorella kessleri chloroplast genome (NC_012978.1) 
     printf "c_000000000152" | seqtk subseq ~/scratch/Spades_15mw2/Anvio.15mw.S.contigs.fa  - > c_000000000152_PC_O_15mw.fa

     #Contig with promising hit against Cryptomonas curvata chloroplast genome (KY856939.1) 
     printf "c_000000000134" | seqtk subseq ~/scratch/15mm/Spades_15mm/Anvio.15mm.S.contigs.fa  - > c_000000000134_GT_15mm.fa

11.3 Circularization of Chloroplast genomes by NOVOPlasty (Template)

     #Example script
     Project:
     -----------------------
     Project name          = CC_15_5m
     Type                  = chloro
     Genome Range          = 80000-200000
     K-mer                 = 33
     Max memory            =
     Extended log          = 0
     Save assembled reads  = no
     Seed Input            = ./c_000000000134_GT_15mm.fa
     Extend seed directly  = yes
     Reference sequence    = ./KY856939.1.fasta 
     Variance detection    =
     Chloroplast sequence  =

     Dataset 1:
     -----------------------
     Read Length           = 151
     Insert size           = 300
     Platform              = illumina
     Single/Paired         = PE
     Combined reads        =
     Forward reads         = /R1.fastq
     Reverse reads         = /R2.fastq
     Store Hash            =

11.4 Visualization of Circularized Chloroplast Genomes using online GeSeq Platform
```
     https://chlorobox.mpimp-golm.mpg.de/geseq.html
```

12. Proteome Prediction and functional annotation of Microbial Eukaryotic Genomes

Required tools with installation links: EukMetaSanity https://github.com/cjneely10/EukMetaSanity | Orthologer https://orthologer.ezlab.org/ | KEGG https://www.kegg.jp/ghostkoala/ | Anvio | Eggnog https://github.com/eggnogdb/eggnog-mapper

12.1 EukMetaSanity was used for gene prediction

      yapim run -i directory_containing_genome -c run-config.yaml -p $EukMS_run -o name_of_out_put_directory #(needs to make the output directory manually prior to running code)

12.2 Orthologer annotations of Chlorella-like MAG:

       docker run -u $(id -u) -v $(pwd)/odb:/odbwork ezlabgva/orthologer:v3.0.2 orthomapper -c run -p LC_MAG -f input/Chlorella-like_MAG.faa -n 3041

12.3 EggNOG annotations of Chlorella genomes:

     python ../emapper.py -m diamond --sensmode more-sensitive -i GCA_002245835.2_unplaced.scaf.1.Tier.faa --decorate_gff GCA_002245835.2_unplaced.scaf.1.Tier.gff3 -o GCA_002245835.2_unplaced.scaf.1.Tier --cpu 32 --target_taxa 3041

#Optionally

12.4 Mapping of protein coding gene sequences (.faa) from EukMetaSanity to KEGG Pathways using GhostKoala Online
```
      https://www.kegg.jp/ghostkoala/
```

OR

12.5 Anvio Metabolism Prediction

      anvi-setup-kegg-kofams --reset

      anvi-gen-contigs-database -f input.fa -o input.db -n 'An example contigs database'

      anvi-run-hmms -c input.db -T 4

      anvi-run-kegg-kofams -c input.db -T 16 --just-do-it
           
      anvi-estimate-metabolism -c input.db  -p ./All_SAMPLES-MERGED_P/PROFILE.db  --add-coverage -O kegg_out

      anvi-interactive -c ./contigs_197.db -p ./All_SAMPLES-MERGED_P/PROFILE.db --server-only -P 8008

13. Phylogenomic analyses of Chloroplasts and Microbial Eukaryote Genomes

Required tools with installation links:

13.1 Chloroplasts

     #Concatenation of 18 marker genes were aligned using MAFFT and ambigious sequences were removed from Gblocks.List of Marker genes ATP synthase (atpA, atpB, atpC), large ribosomal subunits (rpl2, rpl5, rpl12, rpl14, rpl19, rpl23) and small ribosomal subunits (rps3, rps8, rps9, rps19), photosystem I (psaC) and photosystem II (psbA, psbB, psbE, psbH).

     https://mafft.cbrc.jp/alignment/software/

13.2 Microbial Eukaryote Nuclear Genome

     #BUSCO (v5.2.2) was run in genome mode (BUSCO_Metaeuk workflow) on each genome assembly using the chlorophyta_odb10 dataset (1,519 markers). The identified single-copy genes that were shared across 100% of the species (with no duplicates across all species) were extracted. For each orthologous group, proteins were aligned using MAFFT (v7.505) 89 and trimmed using trimAl (v1.4 rev15) 90. The single alignments were concatenated with AMAS (v1.0) 91, and the resulting super-alignment was used to infer a maximum likelihood phylogeny with IQ-TREE (v2.1.2) 92
     
     [selected_markers_with_links_and_description.txt]    (https://github.com/JSSaini/Eukaryotic_Metagenomic_Lake_Cadagno/files/8986487/selected_markers_with_links_and_description.txt)

Name		Name	Last commit message	Last commit date
Latest commit History 116 Commits
Microbial_Eukaryote_M_Github.png		Microbial_Eukaryote_M_Github.png
README.md		README.md
R_script_03_03_2023_euk_metagenomics.R		R_script_03_03_2023_euk_metagenomics.R
R_script_05_01_2023_euk_metagenomics.R		R_script_05_01_2023_euk_metagenomics.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Microbial_Eukaryote_M_Github.png

Microbial_Eukaryote_M_Github.png

README.md

README.md

R_script_03_03_2023_euk_metagenomics.R

R_script_03_03_2023_euk_metagenomics.R

R_script_05_01_2023_euk_metagenomics.R

R_script_05_01_2023_euk_metagenomics.R

Repository files navigation

Genomic insights into the coupling of a Chlorella-like microeukaryote and sulfur bacteria in the chemocline of permanently stratified Lake Cadagno

Introduction

Microbial Eukaryotes Metagenomics Pipeline

R Code and Data availibility

Step by step guide

1-3 Sample collection

4-5 Pre-processing, and assembly of raw reads

6 Competitive and Non Competitive Binning from Assemblies

7-8 Classification, Quality and Abundance Assessment of Eukaryotic and Prokaryotic MAGs

9 Extracting raw reads of Chlorophyta Eukaryote MAG of interest

10 Rebinning and visualization of Chlorophyta Genome Assembly

11. Organelle (Chloroplast) hunting and Visualization

12. Proteome Prediction and functional annotation of Microbial Eukaryotic Genomes

13. Phylogenomic analyses of Chloroplasts and Microbial Eukaryote Genomes

About

Releases

Packages

Languages

JSSaini/Pipeline_For_Lake_Cadagno_Eukaryotic_Metagenomics

Folders and files

Latest commit

History

Repository files navigation

Genomic insights into the coupling of a Chlorella-like microeukaryote and sulfur bacteria in the chemocline of permanently stratified Lake Cadagno

Introduction

Microbial Eukaryotes Metagenomics Pipeline

R Code and Data availibility

Step by step guide

1-3 Sample collection

4-5 Pre-processing, and assembly of raw reads

6 Competitive and Non Competitive Binning from Assemblies

7-8 Classification, Quality and Abundance Assessment of Eukaryotic and Prokaryotic MAGs

9 Extracting raw reads of Chlorophyta Eukaryote MAG of interest

10 Rebinning and visualization of Chlorophyta Genome Assembly

11. Organelle (Chloroplast) hunting and Visualization

12. Proteome Prediction and functional annotation of Microbial Eukaryotic Genomes

13. Phylogenomic analyses of Chloroplasts and Microbial Eukaryote Genomes

About

Resources

Stars

Watchers

Forks

Languages