BioCodeTeacher

v0.45 — An AI-powered educational tool that helps graduate students and postdocs understand bioinformatics code — not just what it does, but why it matters biologically. Provides deep, context-aware explanations for single-cell RNA-seq (Scanpy, Seurat), spatial transcriptomics (Squidpy, BANKSY), and digital pathology (OpenSlide, CLAM, PathML) codebases.

Download

Platform	Download	Size
Windows	BioCodeTeacher-v0.45-windows.zip	19 MB
macOS	BioCodeTeacher.dmg	21 MB

No Python or Node.js required. Extract and run, then paste your API key (OpenRouter, OpenAI, or Anthropic). That's it.

macOS: right-click > Open on first launch (unsigned app).

Freely distributed for non-commercial use under CC BY-NC 4.0. No warranty expressed or implied.

Why BioCodeTeacher?

We can run tutorial code but can't adapt it to our own data. Some copy-paste from Scanpy/Seurat tutorials, tweak parameters blindly, and don't understand the statistical or biological reasoning behind each step.

A generic code explainer says:

"Calls normalize_total on adata with target_sum parameter..."

BioCodeTeacher says:

"CPM normalization — scales each cell's counts so they sum to 10,000, making cells comparable despite different sequencing depths. This compensates for the fact that some cells are captured with more mRNA molecules than others. The 1e4 value is conventional; the exact number doesn't matter because you'll log-transform next..."

Every explanation includes biological motivation, data structure changes (AnnData slots, Seurat assays), parameter guidance with sensible ranges, pipeline position, common mistakes, and cross-tool equivalents (Scanpy ↔ Seurat).

Features

Three-Tier Context System

Tier 1 — Repository Overview: Load a repo and BioCodeTeacher analyzes it using a 4-phase agentic process (structure scan, key file identification, deep analysis, synthesis). Identifies analysis pipeline stages, frameworks (Snakemake, Nextflow, nf-core), data formats, and biological decision points. Cached to disk for instant subsequent loads.

Tier 2 — File Summary: Click any file for an AI-generated educational summary: which pipeline stage it implements, what data objects it consumes/produces, key biological decisions encoded in parameters, and connections to upstream/downstream analysis steps.

Tier 3 — Line Explanation: Click any line (or drag to select a range) for a detailed explanation with:

Biological Significance — why this step matters biologically
Data State — what changed in the AnnData/Seurat object
Parameter Guide — what each parameter controls, sensible ranges, behavior at extremes
Pipeline Position — where this falls in the canonical pipeline (QC → normalization → HVG → PCA → integration → clustering → annotation → DE → trajectory)
Common Mistakes — bioinformatics-specific gotchas
Cross-Tool Reference — equivalent in the other ecosystem (Scanpy ↔ Seurat)

Breadcrumb navigation (repo > file > line) lets you move between tiers.

Bioinformatics-Aware Analysis

Pipeline detection: Recognizes Snakemake, Nextflow, nf-core, CWL, Cell Ranger, and R targets workflows
Framework recognition: scverse ecosystem (Scanpy, AnnData, scVI), Seurat/Bioconductor, spatial tools (Squidpy, BANKSY), pathology tools (CLAM, PathML)
30+ file formats: .h5ad (AnnData), .rds (Seurat), .fasta, .fastq, .vcf, .gff/.gtf, .sam, .bed, .smk, .nf, .cwl, .wdl, .ipynb, .Rmd, .svs, and more
Directory patterns: Automatically identifies workflows/, rules/, notebooks/, analysis/, pipelines/ structures

Core Functionality

Splash Screen: Shows version, changelog, and license on launch. Press SPACE twice to dismiss.
Auto-Analysis on Load: Architecture analysis starts automatically when you open a repo
Persistent Analysis Cache: Analysis stored to C:\BioCodeTeacher\ (Windows) or ~/.biocodeteacher/ (Linux/Mac). Survives server restarts.
Multi-Line Range Explanation: Click and drag to select multiple lines for combined explanations
Interactive Chat: Context-aware bioinformatics conversations enriched with repo architecture, file summaries, and live API documentation. Quick action buttons for common tasks.
Folder Browser: Navigate and select repositories visually instead of typing paths
On-Demand Deep Analysis: For files not in the initial analysis set, a "Deep analyze" button adds them to the architecture index with one LLM call
Cache Staleness Detection: Warns when analyzed files have been modified since the last analysis

AI Models

Choose from 7 pre-configured frontier models via OpenRouter:

Claude Opus 4.6 (default) — Anthropic's strongest model, 1M context
Claude Sonnet 4.6 — Fast and capable, 200K context
GPT-5.4 — OpenAI with medium reasoning effort, 1M context
GPT-5.4 Azure ZDR — GPT-5.4 via Azure with zero data retention
GLM-5 Turbo — High-speed, competitive performance
Gemini 3.1 Pro — Google's flagship with thinking support, 1M context
Gemini 3.0 Flash — High-speed for quick explanations, 1M context

Add any OpenRouter-compatible model through Settings.

Visual Design

"Research Lab" aesthetic inspired by fluorescence microscopy — teal/cyan primary (GFP channel), indigo accent (DAPI channel), and amber warnings (PE channel) on deep dark backgrounds. Instrument Sans + Plus Jakarta Sans typography. Microscope icon wordmark with ambient glow effects. Supports dark and light modes.

Architecture

Frontend (React 18 / TypeScript / Vite)     Backend (FastAPI / Python 3.10-3.13)
├── App.tsx (root, Allotment split)          ├── main.py (entry, CORS, static)
├── components/                               ├── routers/
│   ├── splash/SplashScreen.tsx              │   ├── repos.py   (load, browse)
│   ├── context/                             │   ├── files.py   (content, tree)
│   │   ├── ContextPanel.tsx (3-tier)        │   ├── explain.py (explain, WS)
│   │   ├── RepoOverview.tsx (tier 1)        │   └── chat.py    (chat stream)
│   │   ├── FileSummary.tsx  (tier 2)        ├── services/
│   │   └── LineExplanation.tsx (tier 3)     │   ├── openrouter.py (AI + bio prompts)
│   ├── architecture/PhaseTracker.tsx        │   ├── architecture_agent.py (4-phase)
│   ├── chat/ChatBox.tsx                     │   ├── architecture_store.py (memory+disk)
│   ├── code/CodeEditor.tsx                  │   ├── persistent_store.py (disk cache)
│   └── layout/Header.tsx                    │   ├── doc_search.py (API doc fetcher)
├── hooks/useArchitectureAnalysis.ts         │   ├── repo_manager.py (Git/local)
├── store/codeStore.ts (Zustand)             │   ├── code_parser.py (imports)
├── config/version.ts (version + changelog)  │   └── explanation_cache.py (LRU)
├── services/api.ts (REST + WebSocket)       └── models/schemas.py (Pydantic)
└── styles/theme.css (Research Lab theme)

Quick Start

Option A: Standalone Application (no setup)

Windows:

Download the latest BioCodeTeacher.zip from Releases
Extract and double-click BioCodeTeacher.exe
Browser opens automatically — paste your OpenRouter API key and go

macOS:

Download BioCodeTeacher.dmg from Releases
Open the DMG and drag BioCodeTeacher to Applications
Right-click > Open on first launch (unsigned app — bypasses Gatekeeper)
Browser opens automatically — paste your OpenRouter API key and go

No Python or Node.js required on either platform.

Option B: From Source (all platforms)

Prerequisites: Python 3.10–3.13 (not 3.14+), Node.js 18+, OpenRouter API Key

git clone https://github.com/alexs42/BioCodeTeacher.git
cd BioCodeTeacher

Linux / macOS: ./start.sh Windows: start.bat

This sets up venvs, installs deps, and starts both servers. Open http://localhost:5173.

	Linux / macOS	Windows
Both servers	`./start.sh`	`start.bat`
Backend only	`./start-backend.sh`	`start-backend.bat`
Frontend only	`./start-frontend.sh`	`start-frontend.bat`
Verify project	`./verify-project.sh`	`verify-project.bat`

First-Time Setup

Enter your OpenRouter API key (required, stored locally in browser)
Select an AI model (optional, defaults to Claude Opus 4.6)
Optionally add a GitHub token for private repos
Click Get Started

Usage

Loading a Repository

Click "Open Repository" in the header. Either type a path / GitHub URL, or click the browse button to navigate folders visually. Once loaded, the right panel immediately begins architecture analysis (or loads cached results instantly).

BioCodeTeacher works best with bioinformatics codebases — Scanpy/Seurat analysis scripts, Snakemake/Nextflow pipelines, Jupyter notebooks with single-cell workflows, or digital pathology projects — but it can explain any codebase.

Three-Tier Navigation

Repo loaded, no file selected → Repository Overview with pipeline stages, framework detection, architecture diagrams
File selected, no line clicked → File Summary with pipeline role, data flow, biological decisions, learning objectives
Line or range selected → Line Explanation with biological significance, data state changes, parameter guides, cross-tool references

Click breadcrumbs (repo > file > line) to navigate back up.

Chat

The chat panel is open and ready at the bottom — no expand click needed. The chat assistant has deep knowledge of single-cell analysis, spatial transcriptomics, and digital pathology. Chat is enriched with 4 tiers of context: (1) file-specific architecture role, (2) project-level context block, (3) cached file summaries, and (4) live API documentation fetched from ReadTheDocs for referenced functions. Educational prompt suggestions get you started immediately:

Teach me this repo — architecture walkthrough with pros/cons
Critique this code — review with concrete improvement suggestions
Tutorial mode — step-by-step file walkthrough
Analyze architecture — trigger the 4-phase agentic analysis
Explain with examples — concrete usage examples
Find potential bugs — code review for issues
Create diagram — generate Mermaid diagrams

Or ask domain-specific questions like "Why use Leiden over Louvain?" or "What resolution should I use for clustering?"

Building

macOS

./build.sh
# Output: dist/BioCodeTeacher.app + dist/BioCodeTeacher.dmg

Builds frontend, creates Python venv, runs PyInstaller to produce a .app bundle, then wraps it in a .dmg with an Applications shortcut for drag-and-drop install. No code signing — first launch requires right-click > Open to bypass Gatekeeper.

Windows

build.bat
REM Output: dist\BioCodeTeacher\BioCodeTeacher.exe

Uses the py launcher to find Python 3.10–3.13, builds frontend, bundles with PyInstaller. Includes retry loops for Dropbox file locks.

Both platforms use biocodeteacher.spec — a cross-platform PyInstaller spec with platform.system() detection. UPX compression is disabled on macOS (breaks Gatekeeper).

API Endpoints

Endpoint	Method	Purpose
`/api/repos/browse`	GET	Browse filesystem directories
`/api/repos/load`	POST	Load repository (returns `has_cached_analysis`)
`/api/files/content`	GET	Get file content with language detection
`/api/files/tree`	GET	Get repository file tree
`/api/explain/line`	POST	Explain single line
`/api/explain/range`	POST	Explain code range
`/api/explain/architecture-content/{repo_id}`	GET	Get cached architecture overview
`/api/explain/architecture-status/{repo_id}`	GET	Check analysis existence + staleness
`/api/explain/file-context/{repo_id}`	GET	File-specific architecture context
`/api/explain/file-summary/{repo_id}`	GET	Cached file summary
`/api/explain/stream`	WS	Streaming (line, range, architecture_agent, file_summary, analyze_file)
`/api/chat/stream`	WS	Streaming chat responses

Swagger UI at http://localhost:8000/docs when backend is running.

Persistent Storage

Analysis is cached to disk so it survives server restarts:

Windows: C:\BioCodeTeacher\repos\<hash>\
Linux/Mac: ~/.biocodeteacher/repos/<hash>/

Each repo directory contains architecture.json, architecture_display.md, meta.json, and file_summaries/. File summaries are invalidated when the file content changes.

Development

Testing

Backend (179 tests):

cd backend && pytest -q

Suite	Tests	Coverage
test_repos.py	13	Repository API + browse + cache
test_files.py	7	File operations
test_services.py	33	ArchitectureStore, RepoManager, CodeParser, Cache
test_schemas.py	28	Pydantic schema validation
test_openrouter_service.py	21	OpenRouter, reasoning, payloads
test_explain_router.py	14	Explain endpoints + WebSocket
test_chat_router.py	15	Chat endpoints + context injection
test_persistent_store.py	16	Disk persistence, path hashing
test_doc_search.py	32	Doc search: library detection, HTML extraction, caching

Frontend (31 unit tests):

cd frontend && npx vitest run

E2E (Playwright):

npx playwright test

Type checking:

cd frontend && npx tsc --noEmit

Versioning

Version is tracked in frontend/src/config/version.ts. The splash screen displays the current version on every launch.

Increment 0.01 for small changes (bug fixes, minor UI tweaks)
Increment 0.1 for big changes (new features, major refactors)
Update APP_VERSION and CHANGELOG before each build

Troubleshooting

Problem	Solution
"API key is required"	Enter OpenRouter key in setup modal or settings
Backend connection error	Check port 8000 is free, backend is running
Python 3.14+ build failure	Install Python 3.12 or 3.13 alongside 3.14
Dropbox locks `dist/` during build	Pause Dropbox sync or move project outside Dropbox
macOS "app is damaged"	Right-click > Open to bypass Gatekeeper
Slow explanations	Switch to Gemini 3.0 Flash for speed
"Model not found"	Verify model ID at OpenRouter Models

Configuration

Model Configuration

Edit frontend/src/config/models.ts to add pre-configured models. For reasoning models, also add the ID to REASONING_MODELS in backend/services/openrouter.py.

Storage

API keys: Browser localStorage only (never sent to our servers)
Model preferences: Browser localStorage
Architecture analysis: Disk cache (C:\BioCodeTeacher\ or ~/.biocodeteacher/)
File summaries: Disk cache (per-repo, invalidated by file path hash)
Documentation cache: Disk cache (global doc_cache/, 24h TTL per entry)
Explanation cache: In-memory (cleared on restart)
Chat history: Session only (cleared on refresh)

Security & Privacy

API keys stored locally in your browser only
No data sent to our servers — all communication is direct to OpenRouter
Code only leaves your machine when sent to OpenRouter for explanation
Analysis cache stored locally on your filesystem
Open source — audit the code yourself

Origin

BioCodeTeacher is a specialized fork of CodeTeacher, rewritten with domain-aware prompts, bioinformatics framework detection, extended file format support, and a research-inspired visual theme. The core engine (FastAPI streaming, Monaco editor, three-tier context, persistent storage) is shared; everything user-facing is specialized for computational biology education.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

You are free to use, share, and adapt this software for non-commercial purposes. This software is provided "as is" without warranty of any kind. No license beyond CC BY-NC 4.0 is provided or implied. See the LICENSE file for full terms and disclaimer.

Acknowledgments

OpenRouter for unified AI model access
Anthropic for Claude models
Monaco Editor for the code editor
FastAPI for the backend framework
The scverse community for the single-cell analysis ecosystem
Seurat / Bioconductor for the R single-cell ecosystem

BioCodeTeacher — understand the biology behind the code.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
backend		backend
e2e		e2e
frontend		frontend
.gitignore		.gitignore
DEPLOYMENT_GUIDE.md		DEPLOYMENT_GUIDE.md
LICENSE		LICENSE
OPENROUTER_MODEL_SELECTION_PLAN.md		OPENROUTER_MODEL_SELECTION_PLAN.md
README.md		README.md
SYSTEM_OVERVIEW.md		SYSTEM_OVERVIEW.md
TEST_SUMMARY.md		TEST_SUMMARY.md
biocodeteacher.spec		biocodeteacher.spec
build.bat		build.bat
build.sh		build.sh
package-lock.json		package-lock.json
package.json		package.json
playwright.config.ts		playwright.config.ts
start-backend.bat		start-backend.bat
start-backend.sh		start-backend.sh
start-frontend.bat		start-frontend.bat
start-frontend.sh		start-frontend.sh
start.bat		start.bat
start.sh		start.sh
verify-project.bat		verify-project.bat
verify-project.sh		verify-project.sh

Folders and files

Latest commit

History

Repository files navigation

BioCodeTeacher

Download

Why BioCodeTeacher?

Features

Three-Tier Context System

Bioinformatics-Aware Analysis

Core Functionality

AI Models

Visual Design

Architecture

Quick Start

Option A: Standalone Application (no setup)

Option B: From Source (all platforms)

First-Time Setup

Usage

Loading a Repository

Three-Tier Navigation

Chat

Building

macOS

Windows

API Endpoints

Persistent Storage

Development

Testing

Versioning

Troubleshooting

Configuration

Model Configuration

Storage

Security & Privacy

Origin

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages