### 🌐 Group-wise Microbiome Co-abundance Network Construction

This script generates **group-specific co-abundance networks** for microbial taxa using Pearson correlation coefficients. Samples are assigned to groups by the alphabetic prefix of their names.

#### ✅ Features

- Automatically detects groups from sample name prefixes (e.g., "Ctrl1" → `Ctrl`, "Treat2" → `Treat`)
- Computes:
  - Pearson correlation matrix
  - P-value matrix
- Keeps only edges whose correlation p-value is below a fixed threshold (0.5 in the code, which is permissive; 0.05 is the conventional cutoff)
- Outputs for each group:
  - Correlation matrix (`*_correlations.csv`)
  - P-value matrix (`*_pvalues.csv`)
  - Network edge list with weights and p-values (`*_edges.csv`)
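
For a Pearson r computed from n samples, the two-sided p-value returned by `scipy.stats.pearsonr` depends only on r and n. It is equivalent to a t-test with t = r·sqrt((n − 2) / (1 − r²)) and n − 2 degrees of freedom. The whole p-value matrix can therefore be derived from the correlation matrix in one vectorized step instead of a double loop over taxon pairs. The following is a minimal sketch of that alternative. It assumes no missing values, so every pair shares the same n. `pvalues_from_r` is a hypothetical helper that is not part of the script, and the toy data are random.

```python
import numpy as np
import pandas as pd
from scipy import stats
from scipy.stats import pearsonr


def pvalues_from_r(corr: pd.DataFrame, n: int) -> pd.DataFrame:
    """Two-sided p-values for a Pearson correlation matrix from n samples (hypothetical helper)."""
    # Clip tiny floating-point overshoots such as 1.0000000000000002
    r = np.clip(corr.to_numpy(dtype=float), -1.0, 1.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        t_stat = r * np.sqrt((n - 2) / (1.0 - r**2))
    p = 2 * stats.t.sf(np.abs(t_stat), df=n - 2)
    np.fill_diagonal(p, np.nan)  # like the script: no p-value for a taxon with itself
    return pd.DataFrame(p, index=corr.index, columns=corr.columns)


# Toy abundance table for one group: rows = taxa, columns = samples
rng = np.random.default_rng(0)
group_data = pd.DataFrame(rng.random((4, 6)),
                          index=["TaxonA", "TaxonB", "TaxonC", "TaxonD"],
                          columns=[f"Ctrl{i}" for i in range(1, 7)])

p_fast = pvalues_from_r(group_data.T.corr(), n=group_data.shape[1])

# Cross-check one pair against scipy's pearsonr
_, p_ref = pearsonr(group_data.loc["TaxonA"], group_data.loc["TaxonB"])
print(p_fast.loc["TaxonA", "TaxonB"], p_ref)
```

The two printed values should agree to within floating-point error. This shortcut holds only when every pair uses the same samples. With missing values, n differs per pair, and the per-pair `pearsonr` loop is the safer choice.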

#### 📥 Input

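- CSV file containing a relative abundance matrix:
  - First column = taxon names (used as the row index)
  - Rows = taxa (e.g., genera)
  - Columns = samples, each named with a group prefix followed by a replicate identifier (e.g., `Ctrl1`, `Ctrl2`, `Treat1`)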
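
The sketch below builds a tiny made-up table to illustrate the expected layout, with taxon names in the first column and one column per sample. It then applies a prefix rule that matches the script's intent: the leading run of letters in each sample name, anchored with `^`. The taxa, sample names and values are invented for illustration, and `index_col=0` is used here as a shorthand for the script's `set_index` on the first column.

```python
import io

import pandas as pd

# Made-up relative-abundance table in the expected layout:
# first column = taxon names, remaining columns = samples named <Group><Replicate>
csv_text = """Genus,Ctrl1,Ctrl2,Ctrl3,Treat1,Treat2,Treat3
Bacteroides,0.30,0.25,0.28,0.10,0.12,0.15
Prevotella,0.05,0.07,0.06,0.20,0.22,0.18
Faecalibacterium,0.10,0.12,0.11,0.05,0.04,0.06
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Group label = leading run of letters in each sample name ("Ctrl1" -> "Ctrl")
groups = df.columns.str.extract(r"^([A-Za-z]+)", expand=False)

for group in groups.unique():
    print(group, list(df.columns[groups == group]))
# Ctrl ['Ctrl1', 'Ctrl2', 'Ctrl3']
# Treat ['Treat1', 'Treat2', 'Treat3']
```

Samples sharing a prefix are processed together, so names like `Ctrl_A1` also fall under `Ctrl`. A name with no leading letters gets no group label.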

#### 📤 Output

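For each group, the script writes three files to `output_dir`:

- `<group>_correlations.csv`: taxon × taxon Pearson correlation matrix
- `<group>_pvalues.csv`: taxon × taxon p-value matrix (diagonal left empty)
- `<group>_edges.csv`: edge list with columns `Node1`, `Node2`, `Weight` (Pearson r) and `P_value`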

import os
import pandas as pd
import networkx as nx
from scipy.stats import pearsonr

# File path settings
file_path = r'..relative_abundance.csv'  # Replace with your input file path
output_dir = r'..group results'  # Replace with your output directory path
os.makedirs(output_dir, exist_ok=True)  # Create output folder if it doesn't exist

# Load data
df = pd.read_csv(file_path)
df.set_index(df.columns[0], inplace=True)  # Use first column (e.g. taxa names) as row index
df = df.apply(pd.to_numeric, errors='coerce')  # Coerce values to numeric; non-numeric entries become NaN

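# Extract group labels from the leading letters of each sample name (e.g. "Ctrl1" -> "Ctrl")
samples = df.columns
groups = samples.str.extract(r'^([A-Za-z]+)', expand=False)

# Process data group by group (samples without a letter prefix are ignored)
for group in groups.dropna().unique():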
    print(f"Processing group: {group}")
    
    group_columns = df.columns[groups == group]
    group_data = df[group_columns]

    # Remove constant samples first, then taxa with zero variance across the
    # remaining samples (Pearson r is undefined for a constant vector)
    group_data = group_data.loc[:, group_data.std(axis=0) > 0]  # Samples with zero variance
    group_data = group_data.loc[group_data.std(axis=1) > 0]  # Taxa with zero variance

    if group_data.shape[1] < 2 or group_data.shape[0] < 2:
        print(f"Not enough data to compute correlations for group {group}. Skipping.")
        continue

    # Pearson correlation between taxa (transpose so that taxa become columns)
    correlation_matrix = group_data.T.corr()

    # Two-sided p-value for each pair of taxa: fill the upper triangle,
    # mirror it into the lower triangle, and leave the diagonal as NaN
    taxa = group_data.index
    pvalue_matrix = pd.DataFrame(float('nan'), index=taxa, columns=taxa)
    for i, taxon1 in enumerate(taxa):
        for taxon2 in taxa[i + 1:]:
            _, p = pearsonr(group_data.loc[taxon1], group_data.loc[taxon2])
            pvalue_matrix.loc[taxon1, taxon2] = p
            pvalue_matrix.loc[taxon2, taxon1] = p

    # Save correlation and p-value matrices
    correlation_matrix.to_csv(os.path.join(output_dir, f'{group}_correlations.csv'))
    pvalue_matrix.to_csv(os.path.join(output_dir, f'{group}_pvalues.csv'))
    print(f"Correlation and p-value matrices saved for group {group}.")

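    # Build the co-abundance network: connect two taxa when their correlation
    # p-value is below 0.5 (a permissive cutoff; 0.05 is the conventional choice).
    # The r and p values computed above are reused instead of calling pearsonr again.
    G = nx.Graph()
    taxa = group_data.index
    for i, taxon1 in enumerate(taxa):
        for taxon2 in taxa[i + 1:]:
            p = pvalue_matrix.loc[taxon1, taxon2]
            if p < 0.5:
                G.add_edge(taxon1, taxon2,
                           weight=correlation_matrix.loc[taxon1, taxon2],
                           p_value=p)

    # Save the edge list as CSV: one row per edge with Node1, Node2, Weight (r), P_value
    edges_df = (
        nx.to_pandas_edgelist(G, source='Node1', target='Node2')
        .rename(columns={'weight': 'Weight', 'p_value': 'P_value'})
        .reindex(columns=['Node1', 'Node2', 'Weight', 'P_value'])
    )
    edges_df.to_csv(os.path.join(output_dir, f'{group}_edges.csv'), index=False)
    print(f"Network edges saved for group {group} ({G.number_of_edges()} edges).")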
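
The saved edge lists can be read back into networkx for downstream analysis. For example, you can separate positively and negatively correlated pairs, or apply a stricter p-value cutoff than the one used when the file was written. The following is a minimal sketch. It assumes the column names written by the script (`Node1`, `Node2`, `Weight`, `P_value`). The three-edge table is made up so the example runs without the script's output.

```python
import networkx as nx
import pandas as pd

# In practice: edges_df = pd.read_csv(os.path.join(output_dir, "Ctrl_edges.csv"))
# Made-up edge table with the same columns the script writes
edges_df = pd.DataFrame({
    "Node1": ["Bacteroides", "Bacteroides", "Prevotella"],
    "Node2": ["Prevotella", "Faecalibacterium", "Faecalibacterium"],
    "Weight": [-0.92, 0.85, -0.40],
    "P_value": [0.01, 0.03, 0.30],
})

G = nx.from_pandas_edgelist(edges_df, source="Node1", target="Node2",
                            edge_attr=["Weight", "P_value"])

# Split edges by the sign of the correlation coefficient
positive = [(u, v) for u, v, w in G.edges(data="Weight") if w > 0]
negative = [(u, v) for u, v, w in G.edges(data="Weight") if w < 0]

# Tighten the cutoff after the fact, e.g. keep only edges with P_value < 0.05
strict = G.edge_subgraph([(u, v) for u, v, p in G.edges(data="P_value") if p < 0.05])

print(G.number_of_nodes(), G.number_of_edges(), strict.number_of_edges())
print(dict(G.degree()))
print("positive:", positive)
print("negative:", negative)
```

Filtering after loading lets you keep the permissive edge files on disk and compare networks at several thresholds without rerunning the correlation step.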