PLEASE data analysis by Eric Z Chen
This repo accompanies my paper, "Inflammation, Antibiotics, and Diet as Environmental Stressors of the Gut Microbiome in Pediatric Crohn’s Disease" (Cell Host & Microbe, 2015). It includes the preprocessed data used in that paper.
We refer to the data used in this paper as the PLEASE data; they are shotgun metagenomic sequencing data. For details, please refer to the paper.
Raw fastq files
All the fastq files are saved in the folder 1_COMBO_PLEASE on the server. This folder contains four sub-folders; the three described here are:
1_Raw_Fastq: the raw fastq files.
2_Quality_Filter: the fastq files after removing low-quality reads.
3_Remove_Human_DNA: the fastq files after removing both low-quality reads and human reads.
Clinical data and sample information
- The raw clinical data are in the folder
- The processed clinical data and sample information are in the folder
Bacterial abundance by MetaPhlAn
The MetaPhlAn outputs for the COMBO and PLEASE samples are in the following folders. The "unclassified" taxa were removed, and the relative abundances in each sample were renormalized to sum to one. The prefix "P", "F", "G", or "S" at the beginning of each file name indicates the taxonomic level: phylum, family, genus, or species.
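The renormalization described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the paper: the function name `renormalize`, the dictionary layout, and the example values are all assumptions.

```python
def renormalize(profile, drop=("unclassified",)):
    """Drop unwanted taxa and rescale the remaining abundances to sum to 1."""
    kept = {taxon: value for taxon, value in profile.items()
            if taxon.lower() not in drop}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no classified abundance left to normalize")
    return {taxon: value / total for taxon, value in kept.items()}


# Hypothetical genus-level profile (values in percent), stored as a plain dict.
sample = {"Bacteroides": 60.0, "Escherichia": 20.0, "unclassified": 20.0}
normalized = renormalize(sample)
```

After this step the "unclassified" entry is gone and the remaining values are proportions of the classified total, so samples with different unclassified fractions become comparable.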
Phylogenetic trees by PhyloPhlAn
I used the phylogenetic trees from PhyloPhlAn to calculate phylogenetic diversity. However, the trees produced by PhyloPhlAn cannot be used directly (they caused errors), so I processed the tree files first. "P", "F", "G", "S" indicate the taxonomic levels phylum, family, genus, and species.
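To make "phylogenetic diversity" concrete, here is a minimal sketch of one common definition, Faith's PD: the total branch length of the tree spanned by the taxa observed in a sample, including the path to the root. This repo does not state which measure was used, so treat the choice as an assumption. The toy tree, the `parent`/`branch_length` dictionaries, and the function name `faith_pd` are hypothetical and are not the PhyloPhlAn tree format.

```python
def faith_pd(parent, branch_length, observed):
    """Sum the lengths of all branches on paths from observed tips to the root."""
    visited = set()
    total = 0.0
    for tip in observed:
        node = tip
        # Walk towards the root, counting each branch only once.
        while node in parent and node not in visited:
            visited.add(node)
            total += branch_length[node]
            node = parent[node]
    return total


# Hypothetical toy tree:
#   root -> A (1.0), root -> X (0.5), X -> B (1.0), X -> C (2.0)
parent = {"A": "root", "X": "root", "B": "X", "C": "X"}
branch_length = {"A": 1.0, "X": 0.5, "B": 1.0, "C": 2.0}

pd_bc = faith_pd(parent, branch_length, ["B", "C"])
```

Here the shared branch leading to X is counted once, which is what distinguishes this measure from simply adding up per-taxon path lengths.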
Fastq data processing
- The raw fastq data were processed with FASTX to remove low-quality reads.
- The filtered reads were then aligned to the human genome with DeconSeq to remove human reads. The code is in
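
For readers unfamiliar with quality filtering, the idea behind the first step can be illustrated in plain Python: keep a read only if a large enough share of its bases reach a minimum Phred score. This is a sketch, not the FASTX code or the settings used for the paper; the thresholds (`min_score=20`, `min_percent=80`), the Phred+33 offset, and the function names are assumptions.

```python
def passes_quality(quality, min_score=20, min_percent=80, offset=33):
    """Return True if at least min_percent of bases have Phred score >= min_score."""
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_score)
    return 100.0 * good / len(quality) >= min_percent


def filter_fastq(lines):
    """Yield 4-line FASTQ records whose quality string passes the filter."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        if passes_quality(record[3]):
            yield record


# Two hypothetical reads: the first is high quality, the second is not.
reads = [
    "@read1", "ACGT", "+", "IIII",   # Phred 40 at every base
    "@read2", "ACGT", "+", "!!!I",   # only 1 of 4 bases reaches Phred 20
]
kept = list(filter_fastq(reads))
```

Removing human reads is a separate alignment step and is not covered by this sketch.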
Bacterial abundance quantification by MetaPhlAn
- The processed reads were used as input for MetaPhlAn. Since MetaPhlAn can only take single-end reads as input and our data are paired-end reads, the two reads of each pair (R1 and R2) were provided to MetaPhlAn as if they were independent single-end reads. I also tested MetaPhlAn on the first read of each pair (R1) only, and the results were quite similar to those obtained with both reads.
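
One simple way to present paired-end data as single-end reads is to write the R1 file followed by the R2 file into a single FASTQ file. The sketch below shows that idea only; this README does not say exactly how the reads were passed to MetaPhlAn, so the approach, the function name `concatenate_fastq`, and the file names are assumptions. It also assumes uncompressed files that end with a newline.

```python
import os
import shutil
import tempfile


def concatenate_fastq(r1_path, r2_path, out_path):
    """Write R1 followed by R2 into one file, treating every read as single-end."""
    with open(out_path, "wb") as out:
        for path in (r1_path, r2_path):
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


# Tiny demonstration with two one-read files (hypothetical names and reads).
workdir = tempfile.mkdtemp()
r1 = os.path.join(workdir, "sample_R1.fastq")
r2 = os.path.join(workdir, "sample_R2.fastq")
merged = os.path.join(workdir, "sample_single_end.fastq")

with open(r1, "w") as handle:
    handle.write("@read1/1\nACGT\n+\nIIII\n")
with open(r2, "w") as handle:
    handle.write("@read1/2\nTTGA\n+\nIIII\n")

concatenate_fastq(r1, r2, merged)

# Each FASTQ record spans four lines.
with open(merged) as handle:
    n_reads = sum(1 for _ in handle) // 4
```

Because the pairing information is discarded, each mate contributes to the abundance estimate independently, which is consistent with the R1-only test giving similar results.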