This repository contains scripts for processing data for the Volkova & Ruggles paper:
"Metagenomic Analysis of Autoimmune Disease Identifies Robust Autoimmunity and Disease Specific Microbial Signatures".
Brief description of each of the scripts:
barplots_genus_16s_for_fig2.R makes genus-level bar plots for each of the 16S studies.
barplots_genus_meta_for_fig2.R makes genus-level bar plots for each of the metagenomics studies.
barplots_info_16S_for_fig2.R makes horizontal bars and bar plots of the number of healthy and disease samples in each of the 16S studies.
barplots_info_meta_for_fig2.R makes horizontal bars and bar plots of the number of healthy and disease samples in each of the metagenomics studies.
compute_f1.R computes F1 macro score.
correlate_w_metabolomics.R computes Spearman correlations between iHMP metagenomics and metabolomics data for bacteria found to be predictive of disease.
make_lollipop_plots.R makes lollipop and log fold change (logFC) plots of ranked features.
make_venn_diagrams.R makes three Venn diagrams for IBD, MS and RA and calculates how many features overlap between models that include one of these diseases.
merge_mapping_w_abundance.R merges the mapping file with the abundance tables and converts absolute abundance to relative abundance.
metagenomics_paired.sh processes paired-read metagenomics data; the accession numbers are supplied in a txt file.
metagenomics_single.sh processes single-read metagenomics data; the accession numbers are supplied in a txt file.
ml_functions.R contains machine learning functions for running random forest, XGBoost, SVM with RFE and ridge regression.
ml_performance_heatmap.R makes a heatmap of AUCs and F1 scores.
pcoa_16s.R creates PCoA plots from 16S relative abundance tables.
pcoa_meta.R creates PCoA plots from metagenomics relative abundance tables.
piecharts_excluded_studies.R makes pie charts to show how the studies were excluded based on different criteria.
prepare_for_ml_disease_vs_disease.R prepares a table with metadata and microbial abundance for machine learning for disease vs disease models.
prepare_for_ml_disease_vs_healthy.R prepares a table with metadata and microbial abundance for machine learning.
qiime2_454.sh processes 454 fastq files whose paths are provided in a manifest file.
qiime2_paired.sh processes paired-read Illumina fastq files whose paths are provided in a manifest file.
qiime2_single.sh processes single-read Illumina and Ion Torrent fastq files whose paths are provided in a manifest file.
qiime2_train_taxonomy.sh trains a taxonomy classifier on full-length 16S rRNA sequences from the Greengenes 13_8 database.
rank_features.R ranks the features by mean importance from all four models.
run_ml_disease_vs_disease.R runs machine learning on the tables prepared by prepare_for_ml_disease_vs_disease.R.
run_ml_disease_vs_healthy.R runs machine learning on the tables prepared by prepare_for_ml_disease_vs_healthy.R.
visualize_metabolomics_correlation.R plots a heatmap and the metabolites correlated with individual bacteria, colored by p-value.
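
Two of the calculations named above, the conversion from absolute to relative abundance (merge_mapping_w_abundance.R) and the F1 macro score (compute_f1.R), can be illustrated with a minimal R sketch. This is not the code from those scripts: the function names to_relative_abundance and macro_f1 are hypothetical, the samples-in-rows layout is an assumption, and the toy data is made up for illustration only.

```r
# Illustrative sketch only; not taken from the repository's scripts.

# Convert absolute abundance to relative abundance.
# Assumes a numeric matrix with samples in rows and taxa in columns.
to_relative_abundance <- function(counts) {
  # Divide every row by its row total so each sample sums to 1
  sweep(counts, 1, rowSums(counts), "/")
}

# Macro F1: compute F1 separately for each class, then take the
# unweighted mean across classes.
macro_f1 <- function(truth, predicted) {
  truth <- as.character(truth)
  predicted <- as.character(predicted)
  classes <- sort(unique(c(truth, predicted)))
  f1_per_class <- vapply(classes, function(cl) {
    tp <- sum(predicted == cl & truth == cl)  # true positives
    fp <- sum(predicted == cl & truth != cl)  # false positives
    fn <- sum(predicted != cl & truth == cl)  # false negatives
    2 * tp / (2 * tp + fp + fn)               # F1 for this class
  }, numeric(1))
  mean(f1_per_class)
}

# Toy data (made up)
counts <- rbind(sample1 = c(10, 30, 60), sample2 = c(5, 5, 0))
rel <- to_relative_abundance(counts)

truth     <- c("disease", "disease", "disease", "healthy", "healthy", "healthy")
predicted <- c("disease", "disease", "healthy", "healthy", "healthy", "disease")
score <- macro_f1(truth, predicted)  # 2/3 for this toy example
```

Macro averaging gives every class the same weight regardless of how many samples it has, which matters when the healthy and disease groups in a study differ in size.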