# BioRemPP: Technical Demonstration of Command-Line Interface

## Comprehensive Analysis Framework for Bioremediation Potential Assessment via CLI

---

### Introduction

The **Bioremediation Potential Profile (BioRemPP)** is a computational framework designed for systematic analysis of the biotechnological potential of microbial, fungal, and plant organisms in environmental remediation applications. This notebook covers the command-line interface (CLI) and shows how it integrates multiple specialized databases and analytical workflows.

### Scientific Context

Environmental contamination poses significant challenges to ecosystem health and human welfare. Bioremediation, the use of biological systems to remove or neutralize pollutants, has emerged as a sustainable and cost-effective approach for environmental restoration. However, identifying and characterizing organisms with bioremediation potential requires computational tools capable of integrating diverse biological databases and analytical methodologies.

BioRemPP addresses this need by providing:

1. **Multi-Database Integration**: Seamless access to the BioRemPP core database, KEGG (Kyoto Encyclopedia of Genes and Genomes), HADEG (Hydrocarbon Aerobic Degradation Enzymes and Genes), and ToxCSM (Comprehensive Prediction of Small Molecule Toxicity Profiles)

2. **Standardized Analytical Workflows**: Consistent processing pipelines with error handling and comprehensive validation

3. **Scalable Architecture**: Optimized for both individual sequence analysis and large-scale datasets

4. **Command-Line Interface**: Command-line tools designed for integration into bioinformatics pipelines and automated workflows

### Methodology Overview

This demonstration covers:
- Installation and environment setup procedures
- Database accessibility and information retrieval
- Single and multi-database analytical workflows
- Output interpretation and downstream analysis considerations

## Citation

When using the BioRemPP API in academic research, please cite:

```
BioRemPP: Bioremediation Potential Profile — a computational framework
for bioremediation analysis. Version <your installed version, e.g. 0.7.0.post1.dev2>
```


---

## 1. Installation and Environment Setup

### 1.1 Package Installation

BioRemPP is distributed through the Python Package Index (PyPI) and can be installed using standard Python package management tools. For this demonstration, we utilize the test PyPI repository to access the latest development version.

In [1]:
# Install BioRemPP from test PyPI repository
# The --extra-index-url ensures access to all required dependencies
!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ biorempp==0.7.0.post1.dev2

Looking in indexes: https://test.pypi.org/simple/, https://pypi.org/simple/


### 1.2 Installation Verification

Following installation, we verify the package accessibility and version information to ensure proper setup.

In [2]:
# Verify BioRemPP installation and display version information
import biorempp
print(f"BioRemPP version: {biorempp.__version__}")
print(f"Installation path: {biorempp.__file__}")
print("✅ Installation verified successfully")

BioRemPP version: 0.7.0.post1.dev2
Installation path: /usr/local/lib/python3.11/dist-packages/biorempp/__init__.py
✅ Installation verified successfully
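
For scripted setups, the installed version can also be checked without importing the package, using the standard library's `importlib.metadata`. The sketch below is a minimal example; the helper name `installed_version` is illustrative and not part of BioRemPP.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("biorempp")
    print(f"biorempp: {found or 'not installed'}")
```

Returning `None` instead of raising lets a setup script decide whether to install the package or stop.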


### 1.3 Environment Configuration

The CLI is accessed through the `biorempp` command. We begin by examining the help system to understand the available functionality.

---

## 2. Command-Line Interface Overview

### 2.1 Primary Help System

The BioRemPP CLI implements a comprehensive help system providing detailed information about available commands, parameters, and usage patterns.

In [3]:
# Display primary help information
# This provides an overview of all available commands and global options
!biorempp --help

usage: biorempp [-h] [--input INPUT] [--output-dir OUTPUT_DIR]
                [--all-databases] [--database {biorempp,hadeg,kegg,toxcsm}]
                [--list-databases]
                [--database-info {biorempp,hadeg,kegg,toxcsm}]
                [--quiet | --verbose | --debug]

BioRemPP: Bioremediation Potential Profile

options:
  -h, --help            show this help message and exit
  --input INPUT         Path to the input biological data file
  --output-dir OUTPUT_DIR
                        Directory for output files (default:
                        outputs/results_tables)

Database Options:
  --all-databases       Merge input with ALL databases (biorempp, hadeg, kegg,
                        toxcsm)
  --database {biorempp,hadeg,kegg,toxcsm}
                        Merge with specific database only

Information Commands:
  --list-databases      List all available databases
  --database-info {biorempp,hadeg,kegg,toxcsm}
                        Show detailed information abou

### 2.2 Command Structure Analysis

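The help output shows that the CLI is a single `biorempp` command controlled by flags rather than subcommands. The flags fall into three groups:

- **`--database-info {biorempp,hadeg,kegg,toxcsm}`**: Detailed information about one database
- **`--list-databases`**: Enumeration of the available databases
- **`--input` with `--database` or `--all-databases`**: Core analysis, which merges the input KO identifiers with one database or with all of them

The `--output-dir` flag sets where results are written, and the mutually exclusive `--quiet`, `--verbose`, and `--debug` flags control how much is printed.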
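
When the CLI is driven from a script, the analysis flags listed in the help output can be assembled programmatically. The following is a minimal sketch; the function name `build_command` and its parameters are illustrative assumptions, and only flags that appear in the help output above are used.

```python
from typing import List, Optional

# Database names accepted by --database, as listed in the help output
DATABASES = ("biorempp", "hadeg", "kegg", "toxcsm")


def build_command(input_path: str,
                  database: Optional[str] = None,
                  output_dir: Optional[str] = None) -> List[str]:
    """Assemble a biorempp argument list; database=None selects --all-databases."""
    if database is not None and database not in DATABASES:
        raise ValueError(f"unknown database: {database!r}")
    cmd = ["biorempp", "--input", input_path]
    cmd += ["--database", database] if database else ["--all-databases"]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    return cmd


print(build_command("sample_data.txt", database="kegg"))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.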

---

## 3. Database Information and Discovery

### 3.1 Available Database Enumeration

BioRemPP integrates multiple specialized databases for comprehensive bioremediation analysis. The `--list-databases` flag enumerates the available resources.

In [5]:
# Enumerate all available databases
# This command provides essential metadata for experimental design
!biorempp --list-databases


[DATABASES] Available Databases:

[DB] BIOREMPP
   Name: BioRemPP Core Database
   Description: Bioremediation Potential Profile Database (6,623 records)
   File: database_biorempp.csv (0.69 MB)

[DB] HADEG
   Name: HADEG Database
   Description: Hydrocarbon Aerobic Degradation Enzymes and Genes (1,168 records)
   File: database_hadeg.csv (0.04 MB)

[DB] KEGG
   Name: KEGG Pathways
   Description: 20 KEGG for xenobiotic biodegradation pathways (871 records)
   File: kegg_degradation_pathways.csv (0.02 MB)

[DB] TOXCSM
   Name: ToxCSM Database
   Description: Comprehensive Prediction of Small Molecule Toxicity Profiles (323 records, 66 endpoints)
   File: database_toxcsm.csv (0.18 MB)

[SAMPLE] Example Input Data:
   File: sample_data.txt (0.18 MB)
   Content: 10 organisms with 23,663 KO identifiers
   Format: Organism headers (>) and KO entries

[USAGE] Usage Examples:
   biorempp --input sample_data.txt --all-databases
   biorempp --input sample_data.txt --database biorempp
   biorem

### 3.2 Detailed Database Information

The `--database-info` flag prints metadata for a single database, including:
- Record count, file size, and file format
- Column schema and key content statistics
- Primary usage and example commands


### 3.3 Database-Specific Information

For targeted analysis, individual database information can be retrieved using specific identifiers. This is particularly useful for experimental planning and method selection.

In [8]:
# Retrieve specific information about the BioRemPP core database
!biorempp --database-info biorempp


 BioRemPP Core Database
 Description: Bioremediation Potential Profile
 Size: 6,623 records (0.69 MB)
[FORMAT] Format: CSV with semicolon separator

🔍 Database Schema:
    1. ko
    2. genesymbol
    3. genename
    4. cpd
    5. compoundclass
    6. referenceAG
    7. compoundname
    8. enzyme_activity

⭐ Key Features:
   • 986 unique KEGG Orthology (KO) identifiers
   • 323 unique compounds across 12 chemical classes
   • 978 unique enzyme gene symbols
   • 150 different enzyme activities

🎯 Primary Usage:
   Primary database for bioremediation analysis

[USAGE] Usage Examples:
   biorempp --input sample_data.txt --database biorempp
   biorempp --input sample_data.txt --all-databases
   biorempp --list-databases



In [11]:
# Examine KEGG database specifications
!biorempp --database-info kegg


 KEGG Degradation Pathways
 Description: KEGG-derived biodegradation pathway information
 Size: 871 records (0.02 MB)
[FORMAT] Format: CSV with semicolon separator

🔍 Database Schema:
    1. ko
    2. pathname
    3. genesymbol

⭐ Key Features:
   • 517 unique KO identifiers
   • 20 degradation pathways (Naphthalene, Aromatic, Toluene, etc.)
   • 513 unique gene symbols
   • Focus on xenobiotic degradation

🎯 Primary Usage:
   Pathway enrichment analysis and degradation route identification

[USAGE] Usage Examples:
   biorempp --input sample_data.txt --database kegg
   biorempp --input sample_data.txt --all-databases
   biorempp --list-databases



In [12]:
# Examine HADEG database specifications
!biorempp --database-info hadeg


 Hydrocarbon Aerobic Degradation Enzymes and Genes
 Description: manually curated database containing sequences of experimentally validated
 Size: 1,168 records (0.04 MB)
[FORMAT] Format: CSV with semicolon separator

🔍 Database Schema:
    1. Gene
    2. ko
    3. Pathway
    4. compound_pathway

⭐ Key Features:
   • 323 unique genes involved in degradation
   • 339 unique KO identifiers
   • 71 distinct metabolic pathways
   • 5 major compound pathway categories (Alkanes, Aromatics, etc.)

🎯 Primary Usage:
   Specific biodegradation pathway analysis and gene-pathway mapping

[USAGE] Usage Examples:
   biorempp --input sample_data.txt --database hadeg
   biorempp --input sample_data.txt --all-databases
   biorempp --list-databases



In [13]:
# Examine ToxCSM database specifications
!biorempp --database-info toxcsm


 ToxCSM Toxicity Database
 Description: Comprehensive toxicity prediction database
 Size: 323 records (0.18 MB)
[FORMAT] Format: CSV with semicolon separator

🔍 Database Schema:
    1. SMILES
    2. cpd
    3. ChEBI
    4. compoundname
    5. 66 toxicity endpoints
    6. Nuclear receptor (NR_*), Stress response (SR_*), Genotoxicity (Gen_*)
    7. Environmental (Env_*), Organ toxicity (Org_*) assessments

⭐ Key Features:
   • 314 unique SMILES molecular structures
   • 66 toxicity endpoints with value/label pairs
   • Multiple toxicity categories: Nuclear receptors, Stress response, Genotoxicity, Environmental, Organ-specific
   • ChEBI identifiers for chemical standardization

🎯 Primary Usage:
   Comprehensive toxicity evaluation and safety assessment

[USAGE] Usage Examples:
   biorempp --input sample_data.txt --database toxcsm
   biorempp --input sample_data.txt --all-databases
   biorempp --list-databases



---

## 4. Sample Data Preparation

### 4.1 Input Data Requirements

BioRemPP accepts input data in a FASTA-like header format with KO (KEGG Orthology) identifiers. For demonstration purposes, we first present a **mock example** that illustrates the expected structure:

```
>SampleX1
K00031
K00032
K00090
K00042
K00052
>SampleX2
K00031
K00032
K00090
K00042
K00052
```

This simplified structure shows how each sample begins with a header line (`>` followed by the sample name), followed by one KO identifier per line.
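
To make the format concrete, the sketch below parses this header-plus-KO layout into a dictionary that maps each sample name to its KO identifiers. It is an illustration only, not BioRemPP's own parser; the function name `parse_samples` is hypothetical, and merging repeated headers into one sample is an assumption.

```python
from typing import Dict, List


def parse_samples(text: str) -> Dict[str, List[str]]:
    """Map each '>' header to the list of KO identifiers that follow it."""
    samples: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:].strip()
            samples.setdefault(current, [])  # a repeated header reuses the same entry
        elif current is None:
            raise ValueError("KO identifier found before any sample header")
        else:
            samples[current].append(line)
    return samples


example = ">SampleX1\nK00031\nK00032\n>SampleX2\nK00090\n"
print(parse_samples(example))
# {'SampleX1': ['K00031', 'K00032'], 'SampleX2': ['K00090']}
```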

### 4.2 Real Demonstration Data

For the actual demonstration, BioRemPP uses **9 representative samples** drawn from three principal groups relevant to bioremediation research: **Bacteria**, **Fungi**, and **Microalgae/Cyanobacteria**. These organisms were selected based on the genera reported in the article:

*Bacteria, Fungi and Microalgae for the Bioremediation of Marine Sediments Contaminated by Petroleum Hydrocarbons in the Omics Era*
*Microorganisms 2021, 9, 1695.* [https://doi.org/10.3390/microorganisms9081695](https://doi.org/10.3390/microorganisms9081695)

### 4.3 Selected Organisms

The dataset includes **three representatives from each group**, totaling 9 organisms:

* **Bacteria**

  * *Acinetobacter baumannii* — `acb`
  * *Enterobacter asburiae* — `eau`
  * *Pseudomonas aeruginosa* — `pae`

* **Fungi**

  * *Aspergillus nidulans* — `ani`
  * *Fusarium graminearum* — `fgr`
  * *Cryptococcus gattii* — `cgi`

* **Microalgae/Cyanobacteria**

  * *Chlorella variabilis* — `cvr`
  * *Nannochloropsis gaditana* — `ngd`
  * *Synechocystis sp.* — `syn`

These organisms provide a biologically meaningful subset for testing, reflecting taxa that are actively studied in the context of petroleum hydrocarbon bioremediation.


### 4.4 Input Data Validation

BioRemPP implements comprehensive input validation to ensure data quality and format compliance. This includes:
- Specific format verification
- KO identifier validation
- Character encoding verification
- File accessibility checks
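
A pre-flight check in the same spirit can be run before calling the CLI. The sketch below is a hypothetical helper, not BioRemPP's validation code: it flags any line that is neither a `>` header nor a KO identifier, relying on the standard KEGG Orthology form of `K` followed by five digits.

```python
import re
from typing import List, Tuple

# KEGG Orthology identifiers: the letter K followed by exactly five digits
KO_PATTERN = re.compile(r"K\d{5}")


def find_invalid_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, content) for lines that are neither headers nor KO IDs."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(">"):
            continue  # blank lines and sample headers are accepted as-is
        if not KO_PATTERN.fullmatch(line):
            problems.append((number, line))
    return problems


print(find_invalid_lines(">S1\nK00031\nk0003\nK123456\n"))
# [(3, 'k0003'), (4, 'K123456')]
```

Reporting line numbers makes it straightforward to locate malformed entries in large input files.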

### 4.5 Importing from the Repository

The demonstration dataset is made directly available in the BioRemPP repository. Users can import it from:

```
https://raw.githubusercontent.com/DougFelipe/biorempp/main/src/biorempp/data/sample_data.txt
```

This allows users to load the same **9 representative samples** described above directly from the repository, ensuring reproducibility and consistency with the published demonstration.


In [16]:
import requests
from pathlib import Path

# Raw URL of the file on GitHub
url = "https://raw.githubusercontent.com/DougFelipe/biorempp/main/src/biorempp/data/sample_data.txt"

# Output path
outfile = Path("sample_data.txt")

# Download
response = requests.get(url, timeout=30)
response.raise_for_status()  # raise if request failed

# Save to disk
outfile.write_text(response.text, encoding="utf-8")

print(f"Saved to: {outfile.resolve()}")
print("First 15 lines:\n")
print("\n".join(response.text.splitlines()[:15]))


Saved to: /content/sample_data.txt
First 15 lines:

>Acinetobacter Baumanii - acb
K01704
K10773
K14682
K07462
K03643
K00799
K03799
K00766
K00564
K01735
K03774
K03781
K01652
K03043


---

## 5. Single Database Analysis

### 5.1 BioRemPP Core Database Analysis

Single database analysis allows for focused investigation using specific knowledge bases. This approach is recommended when targeting particular aspects of bioremediation potential.

In [17]:
# Analyze sample data against BioRemPP core database
# This demonstrates targeted bioremediation potential assessment
!biorempp --input sample_data.txt --database biorempp


[BIOREMPP] Processing with BIOREMPP Database

[LOAD] Loading input data...        OK 23,653 identifiers loaded

[CONNECT] Connecting to BIOREMPP...    OK Database available
[PROCESS] Processing data...          #################### 100%
[SAVE] Saving results...            OK BioRemPP_Results.txt

[SUCCESS] Processing completed successfully!
   [RESULTS] Results: 7,613 matches found
   [OUTPUT] Output: BioRemPP_Results.txt (914KB)
   [TIME] Time: 0.2 seconds
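
The match count differs from the number of identifiers loaded because matching behaves like a join on the `ko` column: a KO that is absent from the database yields no rows, while a KO linked to several compounds yields one row per record (the `K00799` rows previewed in Section 7.2 are an example). The sketch below illustrates these join semantics on toy data. It is an assumption about how matching works, not BioRemPP's implementation, and `merge_by_ko` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple


def merge_by_ko(pairs: Iterable[Tuple[str, str]],
                records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Inner-join (sample, ko) pairs with database records that share the KO."""
    by_ko: Dict[str, List[Dict[str, str]]] = {}
    for record in records:
        by_ko.setdefault(record["ko"], []).append(record)
    merged = []
    for sample, ko in pairs:
        for record in by_ko.get(ko, []):  # KOs absent from the database are dropped
            merged.append({"sample": sample, **record})
    return merged


pairs = [("acb", "K00799"), ("acb", "K01704")]
records = [
    {"ko": "K00799", "compoundname": "Alachlor"},
    {"ko": "K00799", "compoundname": "Chlorpyrifos"},
]
print(len(merge_by_ko(pairs, records)))  # 2: K00799 matched two records, K01704 none
```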



### 5.2 KEGG Database Analysis

KEGG database analysis provides pathway-level information essential for understanding metabolic capabilities and bioremediation mechanisms.

In [7]:
# Perform KEGG database analysis
# This provides metabolic pathway and functional annotation information
!biorempp --input sample_data.txt --database kegg


[BIOREMPP] Processing with KEGG Database

[LOAD] Loading input data...        OK 23,653 identifiers loaded

[CONNECT] Connecting to KEGG...    OK Database available
[PROCESS] Processing data...          #################### 100%
[SAVE] Saving results...            OK KEGG_Results.txt

[SUCCESS] Processing completed successfully!
   [RESULTS] Results: 731 matches found
   [OUTPUT] Output: KEGG_Results.txt (38KB)
   [TIME] Time: 0.1 seconds



### 5.3 HADEG Database Analysis

HADEG (Hydrocarbon Aerobic Degradation Enzymes and Genes) analysis focuses on genes and enzymes involved in hydrocarbon and plastic degradation and in biosurfactant production.

In [5]:
# Analyze against HADEG database for hydrocarbon degradation potential
!biorempp --input sample_data.txt --database hadeg


[BIOREMPP] Processing with HADEG Database

[LOAD] Loading input data...        OK 23,653 identifiers loaded

[CONNECT] Connecting to HADEG...    OK Database available
[PROCESS] Processing data...          #################### 100%
[SAVE] Saving results...            OK HADEG_Results.txt

[SUCCESS] Processing completed successfully!
   [RESULTS] Results: 1,737 matches found
   [OUTPUT] Output: HADEG_Results.txt (107KB)
   [TIME] Time: 0.1 seconds



### 5.4 ToxCSM Database Analysis

ToxCSM analysis provides toxicity prediction capabilities essential for safety assessment in bioremediation applications.

In [20]:
# Perform ToxCSM analysis for toxicity assessment
!biorempp --input sample_data.txt --database toxcsm


[BIOREMPP] Processing with TOXCSM Database

[LOAD] Loading input data...        OK 23,653 identifiers loaded

[CONNECT] Connecting to TOXCSM...    OK Database available
[PROCESS] Processing data...          #################### 100%
[SAVE] Saving results...            OK ToxCSM.txt

[SUCCESS] Processing completed successfully!
   [RESULTS] Results: 7,624 matches found
   [OUTPUT] Output: ToxCSM.txt (5MB)
   [TIME] Time: 0.5 seconds



---

## 6. Comprehensive Multi-Database Analysis

### 6.1 All-Database Integration

Comprehensive analysis utilizing all available databases provides the most complete assessment of bioremediation potential. This approach is recommended for systematic screening and comparative studies.

In [21]:
# Perform comprehensive analysis across all databases
# This provides the most complete bioremediation potential assessment
!biorempp --input sample_data.txt --all-databases


[BIOREMPP] Processing with ALL Databases

[LOAD] Loading input data...        OK 23,653 KO identifiers loaded

[PROCESS] Processing databases [1/4]:
   [DB] BioRemPP Database...      OK 7,613 matches -> BioRemPP_Results.txt

[PROCESS] Processing databases [2/4]:
   [DB] HAdeg Database...      OK 1,737 matches -> HADEG_Results.txt

[PROCESS] Processing databases [3/4]:
   [DB] KEGG Database...      OK 731 matches -> KEGG_Results.txt

[PROCESS] Processing databases [4/4]:
   [DB] ToxCSM Database...      OK 7,624 matches -> ToxCSM.txt

[SUCCESS] All databases processed successfully!
   [RESULTS] Total results: 17,705 matches across 4 databases
   [OUTPUT] Location: outputs/results_tables/
   [TIME] Total time: 0.9 seconds



---

## 7. Output Analysis and Interpretation

### 7.1 Result File Examination

BioRemPP writes one result table per database to the output directory. As the preview in Section 7.2 shows for `BioRemPP_Results.txt`, each row pairs an input sample and KO identifier with the columns of a matching database record.

In [30]:
import os
import glob

# Directory where results are stored
results_dir = "/content/outputs/results_tables"

# List all generated output files
output_files = glob.glob(os.path.join(results_dir, '*'))

report = ["Generated output files:"]

if not output_files:
    report.append("  No files found in results directory.")
else:
    for file in sorted(output_files):
        size = os.path.getsize(file)
        report.append(f"  {os.path.basename(file)} ({size} bytes)")
    # Report success only when at least one result file exists
    report.append("\n" + "="*50)
    report.append("Output files successfully generated")
    report.append("Ready for downstream analysis and interpretation")

# Join everything and print properly
summary = "\n".join(report)
print(summary)


Generated output files:
  BioRemPP_Results.txt (936203 bytes)
  HADEG_Results.txt (110438 bytes)
  KEGG_Results.txt (39456 bytes)
  ToxCSM.txt (5283101 bytes)

Output files successfully generated
Ready for downstream analysis and interpretation


### 7.2 Sample Output Content Analysis

To demonstrate the analytical value of BioRemPP output, we examine representative results from our comprehensive analysis.

In [39]:
import os
import glob
import pandas as pd

# Directory containing the result tables
results_dir = "/content/outputs/results_tables"

# Number of lines to preview from each file
head_rows = 5

# Find all result tables with .csv or .txt extensions
table_files = sorted(
    glob.glob(os.path.join(results_dir, "*.csv")) +
    glob.glob(os.path.join(results_dir, "*.txt"))
)

print("Sample Output Content Analysis")
print("=" * 50)

if not table_files:
    print("No table files found in results directory.")
else:
    # Improve console display formatting
    pd.set_option("display.max_colwidth", 200)
    pd.set_option("display.width", None)

    for file in table_files:
        print(f"\n📄 File: {os.path.basename(file)}")
        print("-" * 50)

        # Simple heuristic to detect delimiter (semicolon, comma, tab, or pipe)
        try:
            with open(file, "rb") as fh:
                sample = fh.read(4096).decode("utf-8", errors="ignore")
            counts = {sep: sample.count(sep) for sep in [",", ";", "\t", "|"]}
            # Pick the most frequent separator; if none found, let pandas infer
            sep = max(counts, key=counts.get) if max(counts.values()) > 0 else None

            # Attempt to read the file with pandas
            df = pd.read_csv(
                file,
                sep=sep if sep else None,
                engine="python",
                nrows=head_rows,
                on_bad_lines="skip",  # use error_bad_lines=False for older pandas
            )
            print(df.to_string(index=False))
        except Exception as e:
            # Fallback: show raw text preview if parsing fails
            print(f"Could not parse as CSV/TSV ({e}). Raw preview (first {head_rows} lines):")
            try:
                with open(file, "r", encoding="utf-8", errors="replace") as f:
                    for i, line in enumerate(f):
                        if i >= head_rows:
                            break
                        print(line.rstrip())
            except Exception as ie:
                print(f"Also failed to open raw text: {ie}")


Sample Output Content Analysis

📄 File: BioRemPP_Results.txt
--------------------------------------------------
                      sample     ko genesymbol                   genename    cpd    compoundclass referenceAG    compoundname enzyme_activity
Acinetobacter Baumanii - acb K00799        GST glutathione S-transferase  C10928      Chlorinated         WFD        Alachlor     transferase
Acinetobacter Baumanii - acb K00799        GST glutathione S-transferase  C14322      Chlorinated         WFD    Chlorpyrifos     transferase
Acinetobacter Baumanii - acb K00799        GST glutathione S-transferase  C06790      Chlorinated      CONAMA Trichloroethene     transferase
Acinetobacter Baumanii - acb K00799        GST glutathione S-transferase  C10928      Chlorinated         EPC        Alachlor     transferase
Acinetobacter Baumanii - acb K00799        GST glutathione S-transferase  C14322 Organophosphorus         EPC    Chlorpyrifos     transferase

📄 File: HADEG_Results.txt
---------
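
Once a result table is loaded, a common first summary is the number of matches per sample. The sketch below uses only the standard library and assumes a semicolon-delimited table with a `sample` column, as in the preview above. The delimiter is an assumption carried over from the database files, so pass a different `delimiter` if your output differs. The function name `matches_per_sample` is illustrative.

```python
import csv
import io
from collections import Counter


def matches_per_sample(table_text: str, delimiter: str = ";") -> Counter:
    """Count result rows per value of the 'sample' column."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter=delimiter)
    return Counter(row["sample"] for row in reader)


demo = (
    "sample;ko;compoundname\n"
    "acb;K00799;Alachlor\n"
    "acb;K00799;Chlorpyrifos\n"
    "pae;K00799;Alachlor\n"
)
print(matches_per_sample(demo).most_common())
# [('acb', 2), ('pae', 1)]
```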



## 8. Advanced Usage Patterns

### 8.1 Customizing the Output Path

By default, BioRemPP writes all result tables to:

```
outputs/results_tables
```

You can change this location using the `--output-dir` flag.

**Example:**

```bash
!biorempp --input sample_data.txt --all-databases --output-dir /content/my_results
```

➡️ The command above will run BioRemPP with all databases and store the generated output files inside:

```
/content/my_results
```

---

### 8.2 Controlling Verbosity Levels

BioRemPP provides three verbosity options to control how much information is displayed during execution:

* `--quiet` or `-q` → Silent mode. Only errors are shown.
* `--verbose` or `-v` → Verbose mode (default). Shows detailed progress and processing steps.
* `--debug` → Debug mode. Prints technical details, internal operations, and generates log traces.

**Examples:**

1. Run in **silent mode** (no progress messages):

```bash
!biorempp --input sample_data.txt --all-databases --quiet
```

2. Run in **verbose mode** (see detailed steps of execution):

```bash
!biorempp --input sample_data.txt --all-databases --verbose
```

3. Run in **debug mode** (for troubleshooting, with technical logs):

```bash
!biorempp --input sample_data.txt --all-databases --debug
```


---

## 9. Integration with Bioinformatics Workflows

### 9.1 Pipeline Integration Considerations

BioRemPP is designed for seamless integration into existing bioinformatics pipelines. Key considerations include:

- **Standardized Input Formats**: Compatible with common annotation pipeline outputs
- **Structured Output**: Machine-readable formats suitable for downstream analysis
- **Error Codes**: Appropriate exit codes for pipeline automation
- **Performance Optimization**: Efficient processing for large-scale datasets
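
In an automated pipeline, a step would typically be run as a subprocess and its exit code checked before continuing. The sketch below shows that pattern with a stand-in command so it can run anywhere. It assumes only that `biorempp` returns a non-zero exit code on failure; the specific codes are not documented in this notebook, and `run_step` is a hypothetical helper.

```python
import subprocess
import sys
from typing import List, Tuple


def run_step(cmd: List[str]) -> Tuple[int, str]:
    """Run a command, returning its exit code and combined stdout/stderr."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Stand-in command; in a pipeline this could be
    # ["biorempp", "--input", "sample_data.txt", "--all-databases", "--quiet"]
    code, output = run_step([sys.executable, "-c", "print('ok')"])
    if code != 0:
        sys.exit(f"step failed with exit code {code}:\n{output}")
```

Capturing the output alongside the exit code keeps the tool's own error message available for the pipeline log.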

### 9.2 Reproducibility and Documentation

Scientific reproducibility requires comprehensive documentation of analytical parameters and software versions. BioRemPP supports this through:

- Detailed logging capabilities
- Version tracking and metadata inclusion
- Parameter validation and documentation
- Standardized output formats
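
Alongside the tool's own logging, a small run manifest can record what is needed to repeat an analysis. The sketch below is one possible approach and not a BioRemPP feature: the function names and manifest fields are assumptions chosen for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def build_manifest(tool_version: str, command: List[str], input_path: str) -> Dict[str, object]:
    """Collect the details needed to repeat a run: version, command, input checksum."""
    digest = hashlib.sha256(Path(input_path).read_bytes()).hexdigest()
    return {
        "tool": "biorempp",
        "version": tool_version,
        "command": command,
        "input": {"path": input_path, "sha256": digest},
    }


def to_json(manifest: Dict[str, object]) -> str:
    """Serialize the manifest with stable key order for easy diffing."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

Passing `biorempp.__version__` as `tool_version` and saving the JSON next to the result tables ties each output to the exact input file and command that produced it.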

---

## 10. Conclusion and Best Practices

### 10.1 Summary of Demonstrated Capabilities

This technical demonstration has showcased the comprehensive functionality of the BioRemPP command-line interface, including:

1. **Installation and Setup**: Proper package installation and environment configuration
2. **Information Discovery**: Database enumeration and detailed metadata retrieval
3. **Single Database Analysis**: Targeted analysis using specific knowledge bases
4. **Multi-Database Integration**: Comprehensive analysis across all available databases
5. **Output Management**: Structured result generation and interpretation
6. **Error Handling**: Robust validation and error management systems

### 10.2 Recommended Workflows

For optimal results in bioremediation potential assessment:

1. **Exploratory Analysis**: Begin with the `--list-databases` and `--database-info` flags
2. **Targeted Investigation**: Use single database analysis for specific research questions
3. **Comprehensive Assessment**: Apply multi-database analysis for complete evaluation
4. **Quality Control**: Implement appropriate validation and verification procedures
5. **Documentation**: Maintain comprehensive records of analytical parameters and results

### 10.3 Future Directions

BioRemPP represents a foundational tool for computational bioremediation analysis. Future developments may include:

- Enhanced database integration and updates
- Advanced statistical analysis capabilities
- Machine learning-based prediction models
- Extended output formats and visualization tools
- Performance optimizations for large-scale genomic datasets

### 10.4 Support and Documentation

Comprehensive documentation, including API references, usage examples, and troubleshooting guides, is available through https://biorempp.readthedocs.io/en/latest/.

The development team maintains active support channels for technical assistance and feature requests.

---

**Acknowledgments**: BioRemPP development follows open-source principles and welcomes community contributions to enhance capabilities and broaden applicability in environmental bioinformatics research.