Code accompanying the Scientific Data manuscript that publishes the shotgun metagenomic sequencing data of allo-HCT patients. We also demonstrate functional analyses of these data, such as profiling antibiotic resistance genes.
- Creates the t-SNE plot calculated from the Bray-Curtis dissimilarity matrix of the 16S data and marks the samples that have shotgun metagenomic sequencing.
- Compares read counts across stool consistency categories.
- *CAUTION: Recalculating the t-SNE takes more than 30 minutes. A saved result is available in 'savedMat' and can be loaded directly. Set the parameter 'calculateTSNE' to 1 to recalculate the t-SNE scores; otherwise the script loads the saved data.
- Plots and compares the stool composition obtained from 16S and shotgun metagenomic sequencing for the samples of a single patient.
- Compares the relative abundance of each taxon across stool consistency categories.
- *CAUTION: This script loads the kraken2 output produced by PATRIC and creates a .mat file for each sample. This step takes more than 1 hour. Once it has finished, set the variable 'rewriteShotgunAbundances' to any value other than 1 and the script will read from a saved csv file instead.
- *NOTE: This script does not include the (U)nclassified reads from the PATRIC output.
- Calculates and plots the correlation between 16S and shotgun metagenomic sequencing.
- Calculates the alpha diversity using the Shannon index and compares it across stool consistency categories.
- Plots the vanA PCR results on the t-SNE plot (the same as in Figure 1) using the saved .mat file.
- Compares the relative abundance of the vanA gene between the PCR(+) and PCR(-) groups. Examines the correlation between the vanA and vanB genes in the shotgun metagenomes.
- Plots the phylogenetic tree built from Enterococcus isolates from the stool of an HCT patient and metagenome-assembled genomes from the same samples.
- Generates the output file cardTbl.csv.
- Generates the output file vfdbTbl_2021.csv.
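
For readers unfamiliar with the alpha-diversity step in the list above, the Shannon index of one sample can be sketched as follows. This is a minimal illustration in Python, not the repository's MATLAB code. The function name `shannon_index`, the use of the natural logarithm, and the toy counts are assumptions made for this example.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    Assumption: natural logarithm; the README does not state the log base.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero reads contribute nothing
            p = c / total
            h -= p * math.log(p)
    return h


# Toy read counts for illustration only (not data from the study).
even_sample = [25, 25, 25, 25]
dominated_sample = [97, 1, 1, 1]

print(round(shannon_index(even_sample), 3))  # 1.386, i.e. ln(4)
print(shannon_index(dominated_sample) < shannon_index(even_sample))  # True
```

A sample dominated by a single taxon scores lower than an evenly distributed one, which is the property the comparison across stool consistency categories relies on.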
-
Contains the .csv files used for the metagenome analysis.
-
Contains data saved during figure generation to avoid recalculation. This directory contains the t-SNE scores calculated from the Bray-Curtis dissimilarity matrix of the 16S data, the correlation between 16S and shotgun sequencing, and the Shannon index.
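
As background for the saved t-SNE input, the Bray-Curtis dissimilarity between two samples is the sum of absolute abundance differences divided by the sum of all abundances. The sketch below is an illustrative Python version, not the repository's MATLAB code. The function names and toy vectors are hypothetical, and whether the authors compute the matrix on raw counts or relative abundances is not stated here.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def dissimilarity_matrix(samples):
    """Symmetric pairwise Bray-Curtis matrix with zeros on the diagonal."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = bray_curtis(samples[i], samples[j])
    return d


# Toy abundance vectors for illustration only (not data from the study).
toy_samples = [
    [10, 0, 5],   # sample 1
    [10, 0, 5],   # sample 2, identical to sample 1
    [0, 15, 0],   # sample 3, shares no taxa with samples 1 and 2
]
for row in dissimilarity_matrix(toy_samples):
    print(row)
```

Identical samples get a dissimilarity of 0 and samples with no shared taxa get 1. A matrix of this kind is what a t-SNE implementation then embeds in two dimensions.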
-
Files from our previous 16S data paper, used in this study for comparison. The tblASVsamples table from our other project is included with an additional column, AccessionShotgun, which holds the SRA accession of the shotgun fastq files (only available for 395 samples).
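
Because AccessionShotgun is filled in for only a subset of samples, a typical first step is to keep just the rows that have it. The following Python sketch shows one way to do that. Only the column name AccessionShotgun comes from this README; the SampleID column, the placeholder values, and the assumption that missing accessions appear as empty fields are hypothetical.

```python
import csv
import io

# Made-up rows for illustration only; the values are placeholders.
toy_csv = """SampleID,AccessionShotgun
sampleA,PLACEHOLDER_1
sampleB,
sampleC,PLACEHOLDER_2
"""


def rows_with_shotgun(csv_text):
    """Return the rows whose AccessionShotgun field is non-empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if (row.get("AccessionShotgun") or "").strip()]


shotgun_rows = rows_with_shotgun(toy_csv)
print([row["SampleID"] for row in shotgun_rows])  # ['sampleA', 'sampleC']
```

If the real file encodes missing accessions differently (for example as NaN), the filter condition would need to be adjusted accordingly.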
-
Directories containing the kraken2, CARD and VFDB output generated by PATRIC for all shotgun samples. Only the files used for data analysis are included because of size limitations.
-
Directories with the full PATRIC output for 10 samples, including kraken2, CARD and VFDB, and a .txt file listing the names of the 10 samples. This folder gives a complete view of the PATRIC output and allows users to try the data analysis on a small dataset.
-
Our scripts require the following third-party functions:
- - violinplot: Bechtold, Bastian, 2016. Violin Plots for Matlab, Github Project
https://github.com/bastibe/Violinplot-Matlab, DOI: 10.5281/zenodo.4559847
- - beeswarm: https://github.com/ihstevenson/beeswarm
- - distinguishable_colors: https://www.mathworks.com/matlabcentral/fileexchange/29702-generate-maximally-perceptually-distinct-colors
- - brewermap: https://www.mathworks.com/matlabcentral/fileexchange/45208-colorbrewer-attractive-and-distinctive-colormaps
- - bh-tsne: https://lvdmaaten.github.io/tsne/
-
Color legends used for the major taxa in the 16S gene and shotgun metagenome sequencing.