### 🧬 Non-Zero Abundance Counts per Group

This script analyzes an OTU (or taxonomic) abundance table and counts how many samples in each experimental group have **non-zero abundance** for each taxon (e.g., genus).

#### ✅ Functionality

- Reads a taxonomic relative abundance table (CSV format).
- Extracts **group names** from sample column prefixes (e.g., "Control1", "Control2" → "Control").
- For each taxon, counts how many samples in each group show **non-zero** abundance.
- Appends a final row with the total number of samples in each group.

#### 🛠️ Requirements

- Input CSV file: rows = taxa (e.g., genera), columns = sample names.
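- Sample names must begin with an alphabetic group prefix (e.g., `Control1`, `Control2`). The prefix ends at the first non-letter character, so a name like `Group1_sample1` is assigned to group `Group`, not `Group1`.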
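
To make the prefix rule concrete, the sketch below applies the script's regular expression to a few made-up sample names. The names are hypothetical and chosen only to show what the pattern captures.

```python
import pandas as pd

# Hypothetical sample names, used only to show what the pattern captures
samples = pd.Index(["Control1", "Control2", "Test1", "Group1_sample1", "Group2_sample1"])

# Same pattern as the script: the leading run of letters becomes the group label
groups = samples.str.extract(r"^([A-Za-z]+)")[0]

labels = groups.tolist()
print(labels)
```

Names such as `Group1_sample1` and `Group2_sample1` both reduce to `Group`, because digits are not part of the captured prefix. If groups are distinguished by a number, the pattern would need to be adapted.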

#### 📤 Output

- A table where rows = taxa, columns = groups
- Each cell = number of samples in that group with non-zero relative abundance
- One final row `"Total Samples"` reports group sizes
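
The layout described above can be illustrated with a toy table. The taxa (`g__A`, `g__B`) and sample names are invented for the example, and the loop mirrors the counting logic rather than reproducing the script verbatim.

```python
import pandas as pd

# Toy relative abundance table: rows = taxa, columns = samples (values invented)
toy = pd.DataFrame(
    {"Control1": [0.2, 0.0], "Control2": [0.0, 0.0], "Test1": [0.5, 0.1]},
    index=["g__A", "g__B"],
)

# Group label = leading letters of each sample name
groups = toy.columns.str.extract(r"^([A-Za-z]+)")[0].to_numpy()

counts = pd.DataFrame(index=toy.index)
group_sizes = {}
for group in pd.unique(groups):
    cols = toy.columns[groups == group]
    counts[group] = (toy[cols] > 0).sum(axis=1)  # samples with non-zero abundance
    group_sizes[group] = len(cols)               # total samples in the group

# Final row: group sizes, so each count can be read as "k out of n samples"
counts.loc["Total Samples"] = pd.Series(group_sizes)
print(counts)
```

Here `g__A` is present in one of the two `Control` samples, `g__B` in none of them, and the last row records that `Control` has two samples and `Test` has one.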

#### 📁 Example output file

In [None]:
import pandas as pd

# Load OTU table (taxa abundance matrix)
file_path = r'..abundance.csv'  # Replace with your actual file path
otu_df = pd.read_csv(file_path)

# Set the taxonomy column (e.g. genus names) as index
otu_df.set_index(otu_df.columns[0], inplace=True)

# Extract sample names and determine group labels (based on prefix)
samples = otu_df.columns
groups = samples.str.extract(r'^([A-Za-z]+)')[0]  # Extract the alphabetical prefix as group label

# Define function: count how many samples in each group have non-zero abundance per taxon
def count_nonzeros_by_group(data, groups):
    result = pd.DataFrame(index=data.index)  # Store per-group counts per taxon
    group_sample_counts = {}  # Keep track of how many samples in each group

    for group in groups.unique():
        group_columns = data.columns[groups == group]
        result[group] = (data[group_columns] > 0).sum(axis=1)  # Count non-zero samples
        group_sample_counts[group] = len(group_columns)  # Record total number of samples per group

    # Append a row showing how many total samples per group
    result.loc["Total Samples"] = pd.Series(group_sample_counts)
    return result

# Apply function to compute per-group nonzero counts
nonzero_counts = count_nonzeros_by_group(otu_df, groups)

# Print table
print("Non-zero abundance count table:")
print(nonzero_counts)

# Save result to file
output_file = r'..OTUTaxonAnalysis_Genus_MW_nonzero_counts_by_group.csv'  # Replace with output path
nonzero_counts.to_csv(output_file)

print(f"Count table saved to: {output_file}")



### 🧬 Zero-Inflated Log-Normal (ZILN) Imputation for OTU Tables

This module imputes zero values in a relative abundance OTU table using the **Zero-Inflated Log-Normal (ZILN)** model. It fits a log-normal distribution to the non-zero values for each taxon *within each group*, and replaces zeros with random values drawn from the fitted distribution.
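
In symbols, a ZILN model treats each abundance as a mixture of a point mass at zero and a log-normal component. The notation ($\pi$, $\mu$, $\sigma$) is introduced here for illustration and does not appear in the script. Note that the script never estimates $\pi$: it fits only the log-normal part and overwrites every zero.

$$
X =
\begin{cases}
0 & \text{with probability } \pi \\
e^{Y},\quad Y \sim \mathcal{N}(\mu, \sigma^2) & \text{with probability } 1 - \pi
\end{cases}
$$

With the location fixed at zero (as `floc=0` does in `lognorm.fit`), the maximum-likelihood estimates from the $n$ non-zero values $x_1, \dots, x_n$ of one taxon in one group are

$$
\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \ln x_i,
\qquad
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left( \ln x_i - \hat{\mu} \right)^2 ,
$$

which correspond to SciPy's `scale` $= e^{\hat{\mu}}$ and `shape` $= \hat{\sigma}$.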

#### ✅ Functionality

- Reads a relative abundance table (rows = taxa, columns = samples)
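- Sample names should start with a group prefix (e.g., "Control1", "Test2")
- For each taxon, within each group:
  - If all values in the group are zero → the taxon is excluded from the whole table (as implemented, one all-zero group is enough)
  - Else → fit a log-normal to the group's non-zero values and replace its zeros with random draws from the fit
- Removes taxa whose fit fails in any group or that are all-zero in any group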
- Saves a cleaned, filled OTU table
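
The fit-and-impute step listed above can be sketched for a single taxon within a single group. This is a minimal illustration, assuming a toy abundance vector (`row`) and a fixed seed (`rng`), both made up here. The notebook's own cell draws without a seed, so its imputed values change between runs.

```python
import numpy as np
from scipy.stats import lognorm

# Toy relative abundances for one taxon in one group (values invented)
row = np.array([0.0, 0.02, 0.0, 0.05, 0.01])
nonzero = row[row > 0]

# Fit a log-normal to the non-zero values, with the location fixed at 0
shape, loc, scale = lognorm.fit(nonzero, floc=0)

# Seeded generator (an assumption of this sketch) so the draws are repeatable
rng = np.random.default_rng(0)

# Replace each zero with a draw from the fitted distribution
filled = row.copy()
n_zeros = int((row == 0).sum())
filled[row == 0] = lognorm.rvs(shape, loc=loc, scale=scale, size=n_zeros, random_state=rng)

print(filled)
```

The observed non-zero values are left untouched and only the zero positions receive draws. Because log-normal draws are strictly positive, no zeros remain after this step.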

#### 📥 Input
- A `.csv` file of relative abundance data (e.g., genus-level)
- Sample columns must be grouped by consistent name prefixes

#### 📤 Output
- A filled `.csv` file with zeros replaced (`*_filled_otu_table.csv`)
- A list of removed taxa (`*_failed_taxa.csv`)

#### 🧪 Use Case
Use this step before network analysis or dimensionality reduction, to avoid sparsity artifacts while preserving group-specific abundance patterns.


In [22]:
import numpy as np
import pandas as pd
from scipy.stats import lognorm

# Load OTU relative abundance data
file_path = r'..relative_abundance.csv'  # Replace with your file path
otu_df = pd.read_csv(file_path)

# Set taxonomy column (e.g., genus names) as row index
otu_df.set_index(otu_df.columns[0], inplace=True)

# Extract sample names and determine group labels (based on sample prefix)
samples = otu_df.columns
groups = samples.str.extract(r'^([A-Za-z]+)')[0]  # Extract group prefix from sample names

# Define function: Impute zero values per group using Zero-Inflated Log-Normal (ZILN) model
def ziln_fill(data, groups):
    filled_data = pd.DataFrame(index=data.index, columns=data.columns, dtype=float)
    failed_taxa = []       # Taxa for which the log-normal fit failed in at least one group
    all_zero_taxa = []     # Taxa with zero abundance in every sample of at least one group

    for group in groups.unique():
        group_columns = data.columns[groups == group]
        group_data = data[group_columns]

        for taxon in group_data.index:
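            row = group_data.loc[taxon].copy()  # Copy so the imputation never writes back into the input

            # Skip taxa with all-zero abundance in this group.
            # Note: the taxon is then dropped from the whole table, even if it is present in other groups.
            if row.sum() == 0:
                all_zero_taxa.append(taxon)
                continue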

            non_zero_values = row[row > 0]

            if len(non_zero_values) > 0:
                try:
                    # Fit log-normal distribution to non-zero values
                    shape, loc, scale = lognorm.fit(non_zero_values, floc=0)

                    # Generate replacement values for zero entries
                    zero_indices = row[row == 0].index
                    num_zeros = len(zero_indices)

                    if num_zeros > 0:
                        random_values = lognorm.rvs(shape, loc=loc, scale=scale, size=num_zeros)
                        row[zero_indices] = random_values

                except ValueError:
                    print(f"Failed to fit lognormal for {taxon} in group {group}. Adding to failed list.")
                    failed_taxa.append(taxon)
                    continue

            # Save the filled row
            filled_data.loc[taxon, group_columns] = row

    # Remove taxa whose fit failed or that were all-zero in at least one group
    taxa_to_remove = list(set(all_zero_taxa + failed_taxa))
    filled_data = filled_data.drop(index=taxa_to_remove, errors='ignore')

    # Save removed taxa list
    failed_output_file = r'..failed_taxa.csv'
    pd.DataFrame(taxa_to_remove, columns=["Removed Taxa"]).to_csv(failed_output_file, index=False)
    print(f"Taxa that were removed (failed fit or all-zero) saved to: {failed_output_file}")

    return filled_data

# Apply the ZILN imputation function
otu_filled = ziln_fill(otu_df, groups)

# Print and save the result
print("ZILN-filled OTU table:")
print(otu_filled)

output_file = r'..filled_otu_table.csv'
otu_filled.to_csv(output_file)

print(f"Filled OTU table saved to: {output_file}")


Failed to fit lognormal for g__unclassified_o__Subgroup_2 in group HNG. Adding to failed list.
Failed to fit lognormal for g__Paludibaculum in group HNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_o__Elev-16S-573 in group HNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_c__Subgroup_20 in group HNG. Adding to failed list.
Failed to fit lognormal for g__Smaragdicoccus in group HNG. Adding to failed list.
Failed to fit lognormal for g__Williamsia in group HNG. Adding to failed list.
Failed to fit lognormal for g__Angustibacter in group HNG. Adding to failed list.
Failed to fit lognormal for g__Oerskovia in group HNG. Adding to failed list.
Failed to fit lognormal for g__Pseudactinotalea in group HNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Intrasporangiaceae in group HNG. Adding to failed list.
Failed to fit lognormal for g__Amnibacterium in group HNG. Adding to failed list.
Failed to fit lognormal for g__Curtoba

Failed to fit lognormal for g__unclassified_c__Omnitrophia in group LCG. Adding to failed list.
Failed to fit lognormal for g__Candidatus_Xiphinematobacter in group LCG. Adding to failed list.
Failed to fit lognormal for g__Opitutus in group LCG. Adding to failed list.
Failed to fit lognormal for g__Coraliomargarita in group LCG. Adding to failed list.
Failed to fit lognormal for g__ADurb.Bin063-1 in group LCG. Adding to failed list.
Failed to fit lognormal for g__Ellin516 in group LCG. Adding to failed list.
Failed to fit lognormal for g__Roseibacillus in group LCG. Adding to failed list.
Failed to fit lognormal for g__unclassified_p__WPS-2 in group LCG. Adding to failed list.
Failed to fit lognormal for g__unclassified_p__WS4 in group LCG. Adding to failed list.
Failed to fit lognormal for g__unclassified_o__Elev-16S-573 in group HCG. Adding to failed list.
Failed to fit lognormal for g__unclassified_c__Blastocatellia in group HCG. Adding to failed list.
Failed to fit lognormal for g

Failed to fit lognormal for g__unclassified_o__Rhodospirillales in group HCG. Adding to failed list.
Failed to fit lognormal for g__Rickettsia in group HCG. Adding to failed list.
Failed to fit lognormal for g__Croceicoccus in group HCG. Adding to failed list.
Failed to fit lognormal for g__unclassified_o__Zavarziniales in group HCG. Adding to failed list.
Failed to fit lognormal for g__Verticiella in group HCG. Adding to failed list.
Failed to fit lognormal for g__Burkholderia-Caballeronia-Paraburkholderia in group HCG. Adding to failed list.
Failed to fit lognormal for g__Hydrogenophaga in group HCG. Adding to failed list.
Failed to fit lognormal for g__Pelomonas in group HCG. Adding to failed list.
Failed to fit lognormal for g__Rubrivivax in group HCG. Adding to failed list.
Failed to fit lognormal for g__Variovorax in group HCG. Adding to failed list.
Failed to fit lognormal for g__OM43_clade in group HCG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Neiss

Failed to fit lognormal for g__Roseomonas in group LNG. Adding to failed list.
Failed to fit lognormal for g__Hirschia in group LNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Holosporaceae in group LNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Fodinicurvataceae in group LNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_o__Micavibrionales in group LNG. Adding to failed list.
Failed to fit lognormal for g__Methylorosula in group LNG. Adding to failed list.
Failed to fit lognormal for g__Hyphomicrobium in group LNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Hyphomicrobiaceae in group LNG. Adding to failed list.
Failed to fit lognormal for g__Aureimonas in group LNG. Adding to failed list.
Failed to fit lognormal for g__Pseudaminobacter in group LNG. Adding to failed list.
Failed to fit lognormal for g__Bradyrhizobium in group LNG. Adding to failed list.
Failed to fit lognormal for g_

Failed to fit lognormal for g__unclassified_o__Entomoplasmatales in group MNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Paenibacillaceae in group MNG. Adding to failed list.
Failed to fit lognormal for g__Laceyella in group MNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Thermoactinomycetaceae in group MNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Christensenellaceae in group MNG. Adding to failed list.
Failed to fit lognormal for g__Clostridium_sensu_stricto_2 in group MNG. Adding to failed list.
Failed to fit lognormal for g__Clostridium_sensu_stricto_8 in group MNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Clostridiaceae in group MNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_f__Lachnospiraceae in group MNG. Adding to failed list.
Failed to fit lognormal for g__unclassified_o__Lachnospirales in group MNG. Adding to failed list.
Failed to fit logn