# Microbiome Abundance Analysis — Gut Microbial Community Profiling 
**Author:** Sahil Bilal  
**Affiliation:** B.Sc. Medical, Cluster University Srinagar  
**Email:** sahiilbilal19@gmail.com  
**Project:** Genus-level gut microbiome exploration


## Overview
I use a genus-level gut microbiome abundance table (100 samples × 40 genera) to demonstrate a full exploratory data analysis (EDA) workflow:
- Data loading and quality control (QC)
- Alpha diversity (Shannon, Simpson)
- Centered log-ratio (CLR) transform and PCA (beta diversity)
- Top-taxa visualization and correlation heatmaps


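```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Large fonts and a white grid for all figures
sns.set_theme(style='whitegrid', context='talk')
# Jupyter magic: render figures inline in the notebook
%matplotlib inline
```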

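```python
# Pre-computed tables; each has one row per sample
counts = pd.read_csv('genus_counts_realstyle.csv', index_col=0)      # raw read counts (samples x genera)
rel = pd.read_csv('genus_rel_abundance_realstyle.csv', index_col=0)  # relative abundances
alpha = pd.read_csv('genus_alpha_metrics.csv', index_col=0)          # alpha-diversity metrics
clr = pd.read_csv('genus_clr_matrix.csv', index_col=0)               # CLR-transformed abundances
pca = pd.read_csv('genus_pca_coords.csv', index_col=0)               # PC1/PC2 sample coordinates

# Sanity-check dimensions: 100 samples x 40 genera
print('Counts shape:', counts.shape)
print('Rel shape:', rel.shape)
print('Alpha shape:', alpha.shape)
pca.head()
```

```
Counts shape: (100, 40)
Rel shape: (100, 40)
Alpha shape: (100, 3)
```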


| Sample | PC1 | PC2 |
|---|---:|---:|
| Sample_001 | 2.757662 | -2.085201 |
| Sample_002 | 0.330694 | -0.484705 |
| Sample_003 | -0.691464 | 1.07038 |
| Sample_004 | 1.430152 | 1.674001 |
| Sample_005 | 2.296618 | -0.491002 |
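
The overview lists QC as the first step. One minimal way to check that the loaded tables agree with each other is sketched below. It assumes samples are rows, genera are columns, and `rel` stores proportions (rows summing to 1) rather than percentages, which the file names alone do not confirm. `qc_tables` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def qc_tables(counts: pd.DataFrame, rel: pd.DataFrame, atol: float = 1e-6) -> dict:
    """Consistency checks between a count table and a relative-abundance table.

    Assumes samples are rows, genera are columns, and `rel` holds proportions.
    """
    report = {}
    # Both tables should describe the same samples and genera, in the same order
    report["same_samples"] = counts.index.equals(rel.index)
    report["same_genera"] = counts.columns.equals(rel.columns)
    # No missing values anywhere, and counts cannot be negative
    report["no_missing"] = not (counts.isna().to_numpy().any() or rel.isna().to_numpy().any())
    report["non_negative"] = bool((counts.to_numpy() >= 0).all())
    # Each relative-abundance profile should sum to 1
    report["rel_rows_sum_to_1"] = bool(np.allclose(rel.sum(axis=1), 1.0, atol=atol))
    # rel should equal counts divided by each sample's library size
    # (aligned on labels; a sample with zero reads yields NaN and fails this check)
    expected = counts.div(counts.sum(axis=1), axis=0).reindex(index=rel.index, columns=rel.columns)
    report["rel_matches_counts"] = bool(
        np.allclose(expected.to_numpy(dtype=float), rel.to_numpy(dtype=float), atol=atol)
    )
    return report


if __name__ == "__main__":
    # Toy data for illustration only (not from the study)
    toy_counts = pd.DataFrame({"GenusA": [10, 0], "GenusB": [30, 5]}, index=["S1", "S2"])
    toy_rel = toy_counts.div(toy_counts.sum(axis=1), axis=0)
    print(qc_tables(toy_counts, toy_rel))
```

On the notebook's data this would be called as `qc_tables(counts, rel)`; any `False` entry points to a table that deserves a closer look.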


## Library size distribution

![Library size](library_size_hist.png)
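
Library size is the total number of reads per sample, i.e. the row sums of `counts`. A sketch that computes it and summarises its spread, assuming samples are rows (`library_size_summary` is a hypothetical helper):

```python
import numpy as np
import pandas as pd


def library_size_summary(counts: pd.DataFrame) -> pd.Series:
    """Summarise total reads per sample (row sums of a samples x genera table)."""
    lib = counts.sum(axis=1)
    smallest = lib.min()
    return pd.Series({
        "min": float(smallest),
        "median": float(lib.median()),
        "max": float(lib.max()),
        # Fold difference between the deepest and shallowest sample;
        # a wide spread is one reason to compare relative rather than raw abundances
        "max_min_ratio": float(lib.max() / smallest) if smallest > 0 else np.inf,
    })


if __name__ == "__main__":
    # Toy data for illustration only
    toy_counts = pd.DataFrame({"GenusA": [10, 0, 5], "GenusB": [30, 5, 5]})
    print(library_size_summary(toy_counts))
```

One way to draw a histogram of this kind is `counts.sum(axis=1).plot.hist(bins=20)`; the bin count is an arbitrary choice.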

## Alpha diversity — Shannon 

![Shannon](shannon_hist.png)
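
Shannon entropy is $H = -\sum_i p_i \ln p_i$ and the Gini-Simpson index is $1 - \sum_i p_i^2$, where $p_i$ is the relative abundance of genus $i$ in a sample. The sketch below computes both from raw counts. The log base and the Simpson variant are conventions (some tools use $\log_2$, and "Simpson" can also mean $\sum_i p_i^2$ or its inverse), so its values need not match the three columns of `genus_alpha_metrics.csv`; `alpha_diversity` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def alpha_diversity(counts: pd.DataFrame) -> pd.DataFrame:
    """Shannon (natural log) and Gini-Simpson indices for each sample (row)."""
    p = counts.div(counts.sum(axis=1), axis=0).to_numpy(dtype=float)
    # 0 * log(0) is taken as 0, so absent genera contribute nothing
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    shannon = -plogp.sum(axis=1)
    simpson = 1.0 - (p ** 2).sum(axis=1)
    return pd.DataFrame({"shannon": shannon, "simpson": simpson}, index=counts.index)


if __name__ == "__main__":
    # Toy data: a perfectly even sample and a single-genus sample
    toy = pd.DataFrame({"G1": [5, 10], "G2": [5, 0], "G3": [5, 0], "G4": [5, 0]},
                       index=["even", "single"])
    print(alpha_diversity(toy))
```

For the even toy sample Shannon equals $\ln 4$ (the maximum for four genera), while the single-genus sample scores zero on both indices.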

## Top 15 genera

![Top15](top15_genera.png)
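
A sketch of how the top genera could be selected, assuming they are ranked by mean relative abundance across samples (median abundance or prevalence are other reasonable choices, and the figure does not say which was used). The optional `collapse_to_top` helper, also hypothetical, pools the remaining genera into an "Other" column, which is convenient for stacked bar charts.

```python
import pandas as pd


def top_genera(rel: pd.DataFrame, n: int = 15) -> pd.Series:
    """Genera ranked by mean relative abundance across samples (rows)."""
    return rel.mean(axis=0).sort_values(ascending=False).head(n)


def collapse_to_top(rel: pd.DataFrame, n: int = 15) -> pd.DataFrame:
    """Keep the top-n genera as columns and pool all others into 'Other'."""
    top = top_genera(rel, n).index
    out = rel.loc[:, top].copy()
    # Row sums of the remaining genera; rows still sum to the original total
    out["Other"] = rel.drop(columns=top).sum(axis=1)
    return out


if __name__ == "__main__":
    # Toy relative abundances for illustration only
    toy = pd.DataFrame({"A": [0.5, 0.6], "B": [0.3, 0.2], "C": [0.2, 0.2]}, index=["S1", "S2"])
    print(top_genera(toy, 2))
    print(collapse_to_top(toy, 2))
```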

## PCA (CLR-transformed)

![PCA](pca_clr.png)
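
The CLR transform maps each composition $x = (x_1, \dots, x_D)$ to $\operatorname{clr}(x)_i = \ln x_i - \frac{1}{D}\sum_{j=1}^{D} \ln x_j$, so each transformed sample sums to zero. Because the log of zero is undefined, a pseudocount is added first; the value 0.5 used below is an assumption, and `genus_clr_matrix.csv` may have been built with a different zero-handling rule. The PCA step uses NumPy's SVD on the column-centered matrix, which should give the same scores and explained-variance ratios as `sklearn.decomposition.PCA` up to the sign of each component. Whether the original analysis standardised the CLR values (the notebook imports `StandardScaler`) is not stated, so no scaling is applied here. Both helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def clr_transform(counts: pd.DataFrame, pseudocount: float = 0.5) -> pd.DataFrame:
    """Centered log-ratio transform of each sample (row); the pseudocount is an assumption."""
    logx = np.log(counts.to_numpy(dtype=float) + pseudocount)
    # Subtract each sample's mean log value (the log of its geometric mean)
    clr = logx - logx.mean(axis=1, keepdims=True)
    return pd.DataFrame(clr, index=counts.index, columns=counts.columns)


def pca_scores(X: pd.DataFrame, n_components: int = 2):
    """PCA via SVD of the column-centered matrix; returns scores and explained-variance ratios."""
    Xc = X.to_numpy(dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Sample scores; the rows of Vt[:n_components] are the genus loadings
    scores = U[:, :n_components] * S[:n_components]
    ratio = S ** 2 / np.sum(S ** 2)
    cols = [f"PC{i + 1}" for i in range(n_components)]
    return pd.DataFrame(scores, index=X.index, columns=cols), ratio[:n_components]


if __name__ == "__main__":
    # Random toy counts for illustration only (not from the study)
    rng = np.random.default_rng(0)
    toy = pd.DataFrame(rng.poisson(20, size=(10, 5)),
                       index=[f"S{i}" for i in range(10)],
                       columns=[f"G{j}" for j in range(5)])
    coords, ratio = pca_scores(clr_transform(toy))
    print(coords.head())
    print("Explained variance ratio:", ratio)
```

Because CLR rows sum to zero, the transformed matrix has rank at most $D - 1$, so the last principal component carries no variance.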

## Sample correlation heatmap

![Corr](corr_clr.png)
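
A sample-by-sample correlation matrix can be computed from the CLR table by correlating rows. `DataFrame.corr` works on columns, so the sketch transposes first. Pearson correlation is an assumption, since the heatmap does not state the method, and `sample_correlation` is a hypothetical helper.

```python
import pandas as pd


def sample_correlation(clr: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Sample-by-sample correlation of CLR profiles (samples are rows)."""
    # DataFrame.corr correlates columns, so transpose to put samples in columns
    return clr.T.corr(method=method)


if __name__ == "__main__":
    # Toy CLR-like profiles (each row sums to zero); illustration only
    toy = pd.DataFrame({"G1": [1.0, -0.5, 0.2], "G2": [-1.0, 0.5, 0.1], "G3": [0.0, 0.0, -0.3]},
                       index=["S1", "S2", "S3"])
    print(sample_correlation(toy))
```

The resulting matrix can be drawn with `sns.heatmap(corr, cmap='vlag', vmin=-1, vmax=1)`, or with `sns.clustermap` and the same arguments to reorder samples by hierarchical clustering.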

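## Gist
- Shannon diversity combines richness (how many genera are present) and evenness (how evenly reads are spread across them); Simpson's index is more sensitive to the most abundant genera.
- In many gut datasets a few dominant genera account for much of the between-sample variation; the PCA loadings show which genera drive each axis.
- Microbiome abundances are compositional (each sample is constrained to a fixed total), which distorts distances computed on raw or relative abundances. Applying CLR before PCA helps mitigate this compositional bias and makes clustering more reliable.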
