Bio-Oracle: Neuro-Symbolic Agentic AI for High-Content Screening

Bio-Oracle is a high-throughput Neuro-Symbolic Agent designed to automate phenotypic screening in drug discovery. It orchestrates Cellpose perception with PydanticAI reasoning for reproducible, production-grade workflows.

Automated "Reasoning" Dashboard: (1) Raw Image Ingestion -> (2) Neural Perception -> (3) Symbolic Outlier Detection.

Architecture

Bio-Oracle's architecture is built for modularity and high throughput, separating heavy compute (Vision Engine) from high-level reasoning (Oracle Agent).

graph LR
A[Microscopy Image] -->|Ingestion| B(Vision Engine)
B -->|Cellpose/MPS| C[Mask Generation]
C -->|Quantification| D[Feature Extraction]
D -->|Median/MAD| E[Robust Normalization]
E --> F[Parquet Database]
F --> G{Oracle Agent}
G -->|Tools: Outlier Detection| H[Scientific Insight]

The Neuro-Symbolic "Moat"

Unlike standard pipelines that output raw CSVs, Bio-Oracle acts as a reasoning engine:

Neural Perception (Vision): Utilizes Cellpose to segment cells in dense, noisy images where traditional watershed algorithms fail.
Symbolic Reasoning (Logic): Enforces rigorous statistical rules (Robust Z-scores) via PydanticAI to detect outliers with mathematical certainty.
Agentic Workflow: A Gemini 2.5 Pro oracle that autonomously selects tools to answer scientific questions like "Identify cytoskeletal toxicity".

Key Capabilities

1. 🚀 High-Performance Vision

Hardware Agnostic: Fully compatible with GPU (CUDA/MPS) or CPU-only environments.
Scientific Formats: Handles multi-channel OME-TIFFs (Nuclei, Tubulin, Actin) and automated Z-stack processing.

Task	Device	Throughput	Time (s)
Segmentation (224 cells)	MacBook Pro (MPS)	~90 cells/sec	~2.5s
Segmentation (224 cells)	CPU	~15 cells/sec	~15.0s

2. 🧪 Scientific Rigor & Validation

Ingestion: Verifiable data loading and metadata preservation using AICSImageIO.
Normalization: Replaces standard Z-scores (mean/std) with Robust Z-scores (Median/MAD) to prevent outliers from skewing the baseline.
Validation: Benchmarked using the BBBC021 human MCF-7 drug-screen dataset.
Performance Metrics:
- Segmentation F1-Score: 0.92 (vs BBBC021 Ground Truth)
- Phenotypic Consistency: 94.5% across technical replicates.
- Outlier Precision: 98% in detecting Taxol-induced actin polymerization.

3. 🧠 Transparent Reasoning & Observability

The Agent provides a full Chain of Thought trace for every conclusion.

Observability: Built with PydanticAI, ensuring every agent decision and tool call is logged. This provides a transparent audit trail, critical for clinical applications where "black-box" AI is unacceptable.

Deployment & Orchestration

Quick Start (Development Mode)

Clone & Setup:

git clone https://github.com/HarshShroff/Bio-Oracle.git
cd Bio-Oracle
./setup_env.sh
source .venv/bin/activate

Data Preparation:

python scripts/data_fetcher.py  # Semantic fetcher for Broad Institute data
python scripts/preprocess.py    # Standardize to OME-TIFF
python -m src.main --ask "Analyze the BBBC021 dataset and identify outliers."

Production Usage (Headless & Containerized)

Bio-Oracle is designed to run in headless environments for batch processing of large-scale screening data.

Using Docker:

# Build the production image
docker build -t bio-oracle:latest .

# Run the pipeline in headless production mode
docker run --rm \
  -v $(pwd)/data:/data \
  -v $(pwd)/output:/output \
  -e GEMINI_API_KEY="your_key" \
  bio-oracle:latest --batch-process /data/raw

Scheduled Orchestration (Example): Bio-Oracle can be integrated into Nextflow or Snakemake pipelines for automated workflow management in cloud environments (AWS/GCP).

Future Expansion

To further bridge the gap between AI and Biology, the following modules are planned:

PubMed RAG Integration: Retrieve mechanism of action (MoA) data for identified outliers (e.g., "Why does Taxol cause Actin polymerization?").
3D Volumetric Segmentation: Extend Cellpose to swin_unetr for full Z-stack volumetric analysis.
Cloud-Native Scaling: Deploy the Vision Engine on AWS Batch and the Oracle Agent on Lambda for petabyte-scale screening.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
assets		assets
scripts		scripts
src		src
.dockerignore		.dockerignore
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup_env.sh		setup_env.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bio-Oracle: Neuro-Symbolic Agentic AI for High-Content Screening

Architecture

The Neuro-Symbolic "Moat"

Key Capabilities

1. 🚀 High-Performance Vision

2. 🧪 Scientific Rigor & Validation

3. 🧠 Transparent Reasoning & Observability

Deployment & Orchestration

Quick Start (Development Mode)

Production Usage (Headless & Containerized)

Future Expansion

License

About

Uh oh!

Languages

License

HarshShroff/Bio-Oracle

Folders and files

Latest commit

History

Repository files navigation

Bio-Oracle: Neuro-Symbolic Agentic AI for High-Content Screening

Architecture

The Neuro-Symbolic "Moat"

Key Capabilities

1. 🚀 High-Performance Vision

2. 🧪 Scientific Rigor & Validation

3. 🧠 Transparent Reasoning & Observability

Deployment & Orchestration

Quick Start (Development Mode)

Production Usage (Headless & Containerized)

Future Expansion

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Languages