stanobj

Standardize heterogeneous single-cell transcriptomics datasets into canonical .h5ad

What is this?

stanobj is a data standardization tool (implemented as a Claude Code skill) that converts single-cell transcriptomics datasets from many source formats into a canonical, analysis-ready .h5ad representation.

This is not just file-format conversion. It is a semantic harmonization pipeline that:

Detects and validates matrix orientation (cells x genes vs genes x cells)
Classifies matrix content (raw counts, normalized, log1p, scaled)
Standardizes cell and gene metadata with canonical column names
Preserves all original information while adding structured provenance
Surfaces ambiguities explicitly rather than guessing silently

Supported Formats

Format	Extension	Notes
Matrix Market / 10x	`.mtx`, directory	Auto-detects triplet structure
CSV	`.csv`, `.csv.gz`	Orientation heuristics + decision protocol
TSV	`.tsv`, `.tsv.gz`	Same as CSV with tab delimiter
10x Genomics HDF5	`.h5`	Reads `matrix/` group directly
Generic HDF5	`.h5`, `.hdf5`	Inspects structure, asks if ambiguous
Existing h5ad	`.h5ad`	Re-standardization (not assumed canonical)
Loom	`.loom`	h5py-based reading
Seurat (R)	`.rds`	R subprocess, handles v5 + legacy
SingleCellExperiment (R)	`.rds`	R subprocess, auto-detected

Archives: .tar.gz, .tgz, .tar.bz2, .tar, .zip Compression: .gz, .bz2 (transparent for most formats)

How It Works

Input (any format)
  │
  ├── 1. Decompress / extract archives
  ├── 2. Auto-detect format
  ├── 3. Read with format-specific reader
  ├── 4. Classify matrix type (counts? normalized? scaled?)
  ├── 5. Check for mixed modalities (RNA vs ADT vs CRISPR)
  ├── 6. Standardize metadata (obs / var / obsm)
  ├── 7. Assign layers (counts → layers["counts"])
  ├── 8. Validate (structural + semantic checks)
  └── 9. Write canonical h5ad + report + audit log

When the pipeline encounters ambiguity (e.g., unclear matrix orientation, multiple Seurat assays), it pauses and asks rather than guessing.

Installation

Claude Code plugin

/plugins add chansigit/stanobj

Once installed, ask Claude to "convert to h5ad" or "standardize single-cell data" and the skill activates automatically.

Usage

As a Claude Code Skill

The skill is triggered automatically when you ask Claude Code to convert single-cell data:

"Convert this Seurat RDS to h5ad" "Standardize these 10x files" "Convert expression.csv to canonical h5ad"

Direct CLI

# Basic conversion
python stanobj.py <input> -o <output.h5ad>

# With explicit format
python stanobj.py data.h5 -o result.h5ad --format 10x_h5

# Supply decisions for ambiguous inputs
python stanobj.py data.csv -o result.h5ad \
  --decision matrix_orientation=cells_x_genes \
  --decision matrix_type=counts

Exit Codes

Code	Meaning
`0`	Success. Outputs written.
`1`	Fatal error.
`10`	Decision needed. JSON on stdout with options.

Outputs

Each conversion produces three files:

output.h5ad              # Canonical AnnData
output_report.json       # Machine-readable conversion report
output_audit.log         # Human-readable audit log

Canonical Schema

adata.X                    # Main analysis matrix
adata.layers["counts"]     # Raw counts (when available)
adata.obs["cell_id"]       # Original cell barcode
adata.obs["dataset"]       # Dataset name
adata.obs["cell_type"]     # Standardized from celltype/CellType/annotation/...
adata.obs["sample"]        # Standardized from orig.ident/sample_id/...
adata.var["gene_symbol"]   # Gene symbols
adata.var["gene_id"]       # Ensembl IDs (when available)
adata.var["feature_type"]  # Gene Expression, Antibody Capture, etc.
adata.obsm["X_pca"]        # PCA embedding (when available)
adata.obsm["X_umap"]       # UMAP embedding (when available)
adata.uns["stanobj"]       # Full conversion provenance

Decision Protocol

When stanobj can't proceed confidently, it exits with code 10 and emits structured JSON:

{
  "status": "decision_needed",
  "decision_type": "assay_selection",
  "context": "Seurat object has 3 assays: RNA (32738), SCT (3000), ADT (142)",
  "options": ["RNA", "SCT", "ADT"],
  "recommendation": "RNA",
  "reason": "RNA has the most features and contains raw counts"
}

Known decision points:

Type	Trigger
`assay_selection`	Multiple assays in Seurat/SCE
`matrix_orientation`	Ambiguous CSV/TSV layout
`matrix_type`	Can't classify matrix confidently
`modality_filter`	Mixed RNA + non-RNA features
`format_detection`	Unrecognized file structure

Architecture

stanobj/
├── SKILL.md                 # Claude Code skill definition
├── scripts/
│   ├── stanobj.py           # Main orchestrator CLI
│   ├── detection.py         # Format detection + matrix classification
│   ├── validation.py        # Structural + semantic checks
│   ├── standardize.py       # Metadata harmonization
│   ├── report.py            # JSON report + audit log
│   ├── utils.py             # Compression, sampling, formatting
│   └── readers/
│       ├── mtx_reader.py    # Matrix Market / 10x triplet
│       ├── csv_reader.py    # CSV / TSV
│       ├── h5_reader.py     # 10x h5 + generic HDF5
│       ├── h5ad_reader.py   # Existing h5ad
│       ├── loom_reader.py   # Loom
│       ├── r_bridge.py      # Python-R bridge
│       ├── seurat_reader.R  # Seurat export
│       └── sce_reader.R     # SCE export
├── references/              # Schema docs, format guide
├── examples/                # Sample report
└── tests/                   # 198 tests

Requirements

Python: anndata, scanpy, h5py, scipy, loompy, pandas, numpy

R (optional, for Seurat/SCE): Seurat, SeuratObject, SeuratDisk, SingleCellExperiment

If R is unavailable, stanobj provides manual conversion instructions as fallback.

Related Tools

stangene — Gene identifier harmonization (Ensembl, symbols, aliases)
Together: stanobj standardizes the container, stangene standardizes the gene identifiers

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.claude-plugin		.claude-plugin
docs/superpowers		docs/superpowers
examples		examples
hooks		hooks
references		references
scripts		scripts
skills/stanobj		skills/stanobj
tests		tests
.gitignore		.gitignore
README.md		README.md
SKILL.md		SKILL.md
logo.svg		logo.svg
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

stanobj

What is this?

Supported Formats

How It Works

Installation

Claude Code plugin

Usage

As a Claude Code Skill

Direct CLI

Exit Codes

Outputs

Canonical Schema

Decision Protocol

Architecture

Requirements

Related Tools

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

stanobj

What is this?

Supported Formats

How It Works

Installation

Claude Code plugin

Usage

As a Claude Code Skill

Direct CLI

Exit Codes

Outputs

Canonical Schema

Decision Protocol

Architecture

Requirements

Related Tools

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages