Mini-Metagenomic_Analyses

This repository contains the original code used to produce the results in the following publication: http://biorxiv.org/content/early/2017/03/07/114496

Disclaimer:

All scripts are provided for reference purposes only. File paths may be hard-coded into the scripts, so they may not work on computing clusters that are set up differently. The bioinformatic pipeline has since changed significantly as a result of

including newer versions of bioinformatic tools,
incorporating new functionalities, and
taking advantage of more powerful computing infrastructure at Stanford.

As a result, the older version of the bioinformatics pipeline is no longer maintained and is provided as is, so that the reader can get a general idea of the bioinformatic processes.

Contig Assembly From Raw Sequencing Reads

Required tools

Trimmomatic-0.30

Fastqc

SPAdes-3.5.0

Quast-2.3

Dnaclust

Fastx

Blast-2.2.30

Bowtie2-2.1.0

Samtools-0.1.19

Snakemake

Process Flow

Prepare a seed file listing the locations of all sub-sample fastq files

python generate_snakemake_seed.py biosample_directory output_file_name read_threshold

Fill in the parameters file. A sample is provided

parameters.txt

Run Snakefile_toplevel1.py to produce sub-sample contigs and combined corrected reads. The following command submits at most 20 jobs at a time to a cluster managed by Slurm. root_folder is specified as the biosample directory.

snakemake -j 20 -w 600 -k --config location=$root_folder --cluster "sbatch --job-name={params.name} --ntasks=1 --cpus-per-task={threads} --partition={params.partition} --mem={params.mem} -o {params.name}_%j.log" --rerun-incomplete -s Snakefile_toplevel1.py

Perform joint assembly using SPAdes separately on the large-memory node. Then continue with the second part of the analysis, which includes aligning sub-sample reads back to the combined contigs

snakemake -j 20 -w 600 -k --config location=$root_folder --cluster "sbatch --job-name={params.name} --ntasks=1 --cpus-per-task={threads} --partition={params.partition} --mem={params.mem} -o {params.name}_%j.log" --rerun-incomplete -s Snakefile_toplevel2.py
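The two snakemake commands above differ only in the Snakefile passed to -s. As a rough sketch, the shared invocation can be assembled in Python so that each flag is annotated once. The function name build_snakemake_command and its keyword arguments are hypothetical and not part of this repository. The --cluster option belongs to the older Snakemake releases this pipeline was written for; newer releases may handle cluster submission differently.

```python
import shlex


def build_snakemake_command(snakefile, root_folder, max_jobs=20, latency_wait=600):
    """Return the snakemake argument list used in the process flow.

    Hypothetical helper for illustration; it only mirrors the commands
    shown in this README and does not run anything.
    """
    # sbatch template; the {params.*} and {threads} placeholders are
    # filled in by Snakemake for each rule, %j by Slurm with the job id.
    cluster = (
        "sbatch --job-name={params.name} --ntasks=1 "
        "--cpus-per-task={threads} --partition={params.partition} "
        "--mem={params.mem} -o {params.name}_%j.log"
    )
    return [
        "snakemake",
        "-j", str(max_jobs),         # at most this many jobs submitted at a time
        "-w", str(latency_wait),     # seconds to wait for output files to appear
        "-k",                        # keep going with independent jobs after a failure
        "--config", "location=" + root_folder,  # biosample directory
        "--cluster", cluster,
        "--rerun-incomplete",        # redo jobs whose output is incomplete
        "-s", snakefile,
    ]


if __name__ == "__main__":
    # The path below is a placeholder for the biosample directory.
    for snakefile in ("Snakefile_toplevel1.py", "Snakefile_toplevel2.py"):
        print(shlex.join(build_snakemake_command(snakefile, "/path/to/biosample")))
```

Printing the command instead of running it makes it easy to check the quoting of the sbatch template before submitting anything to the cluster.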

The final output files are

super_contigs.[biosample_id].fasta
super_contigs.[biosample_id].alignment_report.txt
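For the first step of the process flow, the seed file lists the sub-sample fastq files under the biosample directory. The sketch below illustrates that idea only; it is not the code in generate_snakemake_seed.py. It assumes uncompressed files ending in .fastq, assumes that read_threshold is a minimum read count per file, and writes one path per line, which may not match the real seed file format. The function names are hypothetical.

```python
import os


def count_fastq_reads(path):
    """Count reads in an uncompressed FASTQ file (4 lines per read)."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def build_seed_lines(biosample_dir, read_threshold):
    """Return sorted paths of .fastq files with at least read_threshold reads.

    Assumption: files below the threshold are skipped entirely.
    """
    selected = []
    for root, _dirs, files in os.walk(biosample_dir):
        for name in files:
            if name.endswith(".fastq"):
                path = os.path.join(root, name)
                if count_fastq_reads(path) >= read_threshold:
                    selected.append(path)
    selected.sort()
    return selected


def write_seed_file(biosample_dir, output_file_name, read_threshold):
    """Write one selected FASTQ path per line (assumed format)."""
    with open(output_file_name, "w") as out:
        for path in build_seed_lines(biosample_dir, read_threshold):
            out.write(path + "\n")
```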

Contig Analysis and Plotting

After assembly and alignment, all contig analysis and plotting is done in MATLAB. First, alignment-based contig occurrence and p-values from Fisher's Exact Test are computed using

snakemake_result_analysis_V2.m
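To make the statistical step concrete, here is a minimal Python sketch of a one-sided Fisher's Exact Test on the co-occurrence of two contigs across sub-samples. It is not a translation of the MATLAB script. It assumes that occurrence is a binary presence/absence value per sub-sample and that the test is applied to the 2x2 table of a contig pair; the function names are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided p-value for the 2x2 table [[a, b], [c, d]].

    a: sub-samples containing both contigs
    b: sub-samples containing only the first contig
    c: sub-samples containing only the second contig
    d: sub-samples containing neither
    Returns the probability of seeing at least `a` co-occurrences
    when the row and column totals are held fixed.
    """
    n = a + b + c + d
    row1 = a + b          # sub-samples with the first contig
    col1 = a + c          # sub-samples with the second contig
    max_a = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme.
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, max_a + 1)
    )
    return numerator / comb(n, col1)


def cooccurrence_p_value(occ1, occ2):
    """Build the 2x2 table from two presence/absence vectors and test it."""
    a = sum(1 for x, y in zip(occ1, occ2) if x and y)
    b = sum(1 for x, y in zip(occ1, occ2) if x and not y)
    c = sum(1 for x, y in zip(occ1, occ2) if not x and y)
    d = sum(1 for x, y in zip(occ1, occ2) if not x and not y)
    return fisher_exact_greater(a, b, c, d)
```

A small p-value for a pair of contigs suggests they appear in the same sub-samples more often than chance would predict, which is the kind of signal that can be used to group contigs from the same genome.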

tSNE plots, functional annotation, SNP, and abundance analyses are carried out with the MATLAB scripts in the following file

cluster_supercontigs.m

A folder containing helper functions for exploring contigs and genomes is also included.

Questions and Comments

For questions or comments, please contact Brian Yu
