Author: Charles Dana · Monce SAS
SAT-based explainable multiclass classifier. Zero dependencies. Pure Python.
Snake is a SAT-based lookalike voting classifier. For each prediction, it finds training samples that "look alike" via Boolean clause matching, then votes by their labels. The result: a fully explainable classifier where every prediction comes with a human-readable audit trail.
Input X → Match SAT clauses → Find lookalikes → Vote → Prediction + Audit
Predicted outcome:
setosa (93.3%). Because: "petal_length" <= 2.45 AND "petal_width" <= 0.8. Matched 15 lookalikes, all class setosa.
pip install git+https://github.com/Monce-AI/algorithmeai-snake.git

Dev install:
git clone https://github.com/Monce-AI/algorithmeai-snake.git
cd algorithmeai-snake
pip install -e .

Python 3.9+. Zero dependencies — uses only the standard library.
from algorithmeai import Snake
# Train from a list of dicts (production pattern)
data = [
{"species": "setosa", "petal_length": 1.4, "petal_width": 0.2},
{"species": "setosa", "petal_length": 1.3, "petal_width": 0.3},
{"species": "versicolor", "petal_length": 4.5, "petal_width": 1.5},
{"species": "versicolor", "petal_length": 4.1, "petal_width": 1.3},
{"species": "virginica", "petal_length": 5.2, "petal_width": 2.0},
{"species": "virginica", "petal_length": 5.0, "petal_width": 1.9},
# ... more rows
]
model = Snake(data, n_layers=5, bucket=250)
# Predict
X = {"petal_length": 4.3, "petal_width": 1.4}
print(model.get_prediction(X)) # "versicolor"
print(model.get_probability(X)) # {"setosa": 0.0, "versicolor": 0.87, "virginica": 0.13}
print(model.get_audit(X)) # Full reasoning trace
# Save & reload
model.to_json("model.json")
model = Snake("model.json") # Auto-detected by .json extension
print(model.get_prediction(X)) # Same result

Snake accepts five input formats. The first key/column is the target by default.
| Format | Example | Notes |
|---|---|---|
| List of dicts | Snake([{"label": "A", ...}]) | Production pattern. First key = target |
| CSV file | Snake("data.csv", target_index=3) | Pandas-formatted CSV |
| DataFrame | Snake(df, target_index="species") | Duck-typed — no pandas dependency |
| List of tuples | Snake([("cat", 4, "small"), ...]) | First element = target, auto-headers |
| List of scalars | Snake(["apple", "banana", ...]) | Self-classing, dedupes to unique |
List of dicts (recommended):
model = Snake([
{"survived": 1, "class": 3, "sex": "male", "age": 22},
{"survived": 0, "class": 1, "sex": "female", "age": 38},
])

CSV file:
model = Snake("titanic.csv", target_index=0)

DataFrame:
model = Snake(df, target_index="survived")

List of tuples:
model = Snake([("cat", 4, "small"), ("dog", 40, "large"), ("cat", 5, "small")])

List of scalars (self-classing — useful for synonym deduplication):
model = Snake(["44.2 LowE", "44.2 bronze", "Float 4mm clair"])

Complex targets (dict/list values as targets):
data = [
{"label": {"color": "red", "size": "big"}, "feature": "round"},
{"label": {"color": "blue", "size": "small"}, "feature": "square"},
]
model = Snake(data, n_layers=5)
pred = model.get_prediction({"feature": "round"})  # returns {"color": "red", "size": "big"}

Snake(Knowledge, target_index=0, excluded_features_index=(), n_layers=5, bucket=250, noise=0.25, vocal=False, saved=False, progress_file=None, workers=1, oppose_profile="auto")

| Parameter | Type | Default | Description |
|---|---|---|---|
| Knowledge | str / list / DataFrame | — | CSV path, JSON model path, list of dicts/tuples/scalars, or DataFrame |
| target_index | int / str | 0 | Target column index or name |
| excluded_features_index | tuple / list | () | Column indices to exclude from training |
| n_layers | int | 5 | Number of SAT layers to build (more = more accurate, slower) |
| bucket | int | 250 | Max samples per bucket before splitting |
| noise | float | 0.25 | Cross-bucket noise ratio for regularization |
| vocal | bool | False | Print training progress |
| saved | bool | False | Auto-save model after training (CSV flow only) |
| progress_file | str / None | None | File path for JSON training progress updates |
| workers | int | 1 | Parallel workers for layer construction (>1 uses multiprocessing) |
| oppose_profile | str | "auto" | Literal generation strategy: auto, balanced, linguistic, industrial, cryptographic, scientific, categorical |
| Method | Returns | Description |
|---|---|---|
| get_prediction(X) | value | Most probable class |
| get_probability(X) | dict | {class: probability} for all classes |
| get_lookalikes(X) | list | [[index, class, condition], ...] matched training samples |
| get_lookalikes_labeled(X) | list | [[index, class, condition, origin], ...] with "c" (core) or "n" (noise) |
| get_augmented(X) | dict | Input enriched with Lookalikes, Probability, Prediction, Audit |
| get_audit(X) | str | Full human-readable reasoning trace |
X = {"petal_length": 4.3, "petal_width": 1.4}
model.get_prediction(X) # "versicolor"
model.get_probability(X) # {"setosa": 0.0, "versicolor": 0.87, "virginica": 0.13}
model.get_lookalikes(X) # [[42, "versicolor", [0, 5]], [87, "versicolor", [3]]]
model.get_augmented(X) # {**X, "Lookalikes": ..., "Probability": ..., "Prediction": ..., "Audit": ...}

Audit output (Routing AND + Lookalike AND):
### BEGIN AUDIT ###
Prediction: versicolor
Layers: 5, Lookalikes: 47
LOOKALIKE SUMMARY
================================================
versicolor 87.2% (41/47) █████████████████░░░
e.g. sample with petal_length 4.5
virginica 10.6% (5/47) ██░░░░░░░░░░░░░░░░░░
e.g. sample with petal_length 5.0
PROBABILITY
================================================
P(versicolor) = 87.2% █████████████████░░░
P(virginica) = 10.6% ██░░░░░░░░░░░░░░░░░░
================================================
LAYER 0
================================================
Routing AND (bucket 1/2, 78 members):
"petal_width" > 0.8
Lookalike AND (12 matches):
Lookalike #42 [versicolor]: 4.5
AND: "petal_length" <= 5.0 AND "petal_width" <= 1.7
...
>> PREDICTION: versicolor
### END AUDIT ###
Each lookalike carries an origin label: core (c) = routed to the bucket by condition, noise (n) = randomly injected from the full population for regularization. This splits Snake's probability into independent signals.
model = Snake(data, target_index="label", n_layers=77, bucket=150, noise=0.40, workers=10)
# Labeled lookalikes — each entry: [global_idx, target_value, condition, origin]
lookalikes = model.get_lookalikes_labeled(X)
for idx, target, cond, origin in lookalikes:
    print(f"#{idx} [{target}] ({origin})")  # e.g. "#42 [1] (c)", "#8 [0] (n)"

# Weighted probability — trust core more than noise
def weighted_prob(lookalikes, target_class, w_c=2, w_n=1):
    total = sum(w_c if la[3] == "c" else w_n for la in lookalikes)
    hits = sum((w_c if la[3] == "c" else w_n) for la in lookalikes if str(la[1]) == str(target_class))
    return hits / total if total > 0 else 0.5

# Split signals
core_only = [la for la in lookalikes if la[3] == "c"]
noise_only = [la for la in lookalikes if la[3] == "n"]

Key finding: the optimal weight ratio (w_c, w_n) depends on n_layers. At low layer counts, core dominates — noise is a distraction. At high layer counts, noise becomes a genuine complementary signal because full-population diversity compounds across stochastic layers.
| Config | Core AUROC | Noise AUROC | Divergence winner | Best weighting |
|---|---|---|---|---|
| 7 layers, noise=0.25 | 0.895 | 0.768 | Core 81% | Pure core |
| 77 layers, noise=0.40 | 0.891 | 0.877 | Noise 59% | Noise-heavy |
Backwards compatible — old models without origins default to "c" for all lookalikes.
Snake now supports 7 oppose profiles — each a tuned literal generation strategy for a data archetype. The oppose() function is untouched; profiles are substitute functions that control which literal types get generated and at what probability.
# Auto (default) — Snake scans your data and picks the best profile
model = Snake(data, oppose_profile="auto")
# Explicit — you know your data
model = Snake(data, oppose_profile="cryptographic")
model = Snake(data, oppose_profile="linguistic")

| Profile | Best for | Key literal types | Speed |
|---|---|---|---|
| auto | Any data — scans population, picks one | Depends on detection | — |
| balanced | Unknown data, mixed types | Equal weight across all 24 types | Medium |
| linguistic | NLP, free text, author attribution | LEV (edit distance), JAC (bigram similarity), PFX/SFX | Slower |
| industrial | Product codes, SKUs, short labels | T (substring), TN/TLN (structural), splits | Fast |
| cryptographic | Hashes, IDs, encoded data | ENT (entropy), HEX (hex ratio), CFC (char freq), REP (repeat) | Medium |
| scientific | Measurements, sensors, lab data | NZ (z-score), NL (log-scale), NMG (magnitude) | Fast |
| categorical | Surveys, tags, enums | TWS/TPS/TSS (splits) + T (substring) at 60% combined | Fast |
24 literal types (was 7): the original 7 (T, TN, TLN, TWS, TPS, TSS, N) plus 17 new types across distance, positional, charclass, crypto, and scientific families. Each literal is still [index, value, negat, tag] — same format, same apply_literal.
Auto-detection scans text features for avg length, length variance, digit ratio, uppercase ratio, special char ratio, and delimiter density. Pure numeric data → scientific. Long varied text → linguistic. Short codes with digits → industrial. High special chars → cryptographic. Many delimiters → categorical. Mixed or unclear → balanced.
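The detection heuristic can be sketched in plain Python. The thresholds and the `pick_profile` helper below are illustrative guesses, not Snake's actual cutoffs:

```python
import statistics

def pick_profile(values):
    """Illustrative profile picker — thresholds are invented, not Snake's real ones."""
    if all(isinstance(v, (int, float)) for v in values):
        return "scientific"                      # pure numeric data
    texts = [str(v) for v in values]
    lengths = [len(t) for t in texts]
    avg_len = statistics.mean(lengths)
    var_len = statistics.pvariance(lengths)
    total = max(1, sum(lengths))
    digit_ratio = sum(c.isdigit() for t in texts for c in t) / total
    special_ratio = sum(not c.isalnum() and not c.isspace() for t in texts for c in t) / total
    delim_density = sum(t.count(",") + t.count(";") + t.count("|") for t in texts) / len(texts)
    if avg_len > 20 and var_len > 25:
        return "linguistic"                      # long, varied free text
    if special_ratio > 0.15:
        return "cryptographic"                   # hash-like content
    if delim_density > 0.5:
        return "categorical"                     # delimiter-heavy tags
    if avg_len <= 12 and digit_ratio > 0.3:
        return "industrial"                      # short codes with digits
    return "balanced"

numeric_pick = pick_profile([1.2, 3.4, 5.6])              # → "scientific"
code_pick = pick_profile(["SKU-4421", "SKU-9913", "REF-0040"])  # → "industrial"
```

The real scan also weighs uppercase ratio; this sketch only shows the shape of the decision chain.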
Profile benchmark (500 train / 500 test, 3 layers, bucket=50):
| Dataset | Auto picks | Auto Acc | Best Profile | Best Acc |
|---|---|---|---|---|
| Cryptographic | balanced | 100.0% | balanced | 100.0% |
| Linguistic | linguistic | 99.4% | industrial | 100.0% |
| Scientific | scientific | 99.6% | balanced | 99.8% |
| Categorical | balanced | 77.6% | cryptographic | 81.2% |
| Industrial | balanced | 100.0% | industrial | 100.0% |
| Mixed | balanced | 99.6% | balanced | 99.6% |
Spaceship Titanic benchmark (6954 train / 1739 test, 5 layers, workers=10, optimal threshold):
| Profile | AUROC | Opt Accuracy | Train | Infer |
|---|---|---|---|---|
| industrial | 0.8038 | 78.0% | 771ms | 1089ms |
| scientific | 0.7987 | 78.0% | 835ms | 1184ms |
| original | 0.7985 | 77.2% | 944ms | 1093ms |
| balanced | 0.8093 | 75.9% | 1654ms | 1357ms |
Backwards compatible — old models without oppose_profile default to the original oppose(). New literal types return False for unknown tags (graceful degradation).
# Save
model.to_json("model.json")
# Load (auto-detected by .json extension)
model = Snake("model.json")

JSON structure (v5.2.0):
{
"version": "5.2.0",
"population": [...],
"header": ["target", "f1", ...],
"target": "target",
"targets": [...],
"datatypes": ["T", "N", ...],
"config": {"n_layers": 5, "bucket": 250, "noise": 0.25, "vocal": false, "workers": 1, "oppose_profile": "balanced"},
"layers": [...],
"log": "..."
}

Datatype codes: B = binary, I = integer, N = numeric, T = text, J = complex JSON (dict/list).
Backwards compatible — v0.1 flat JSON files (with clauses + lookalikes at top level) are automatically wrapped into the bucketed format on load.
# Score each layer on validation data, keep the top N
model.make_validation(val_data, pruning_coef=0.5)
# pruning_coef=0.5 keeps the best 50% of layers
# Save the pruned model
model.to_json("model_pruned.json")

val_data is a list of dicts (same format as training data; it must include the target field).
Snake auto-detects types for each column at training time.
Target column (checked in priority order):
| Priority | Condition | Type Code | Storage |
|---|---|---|---|
| 1 | Any value is dict or list | J (Complex JSON) | Raw Python objects |
| 2 | Values are exactly {"0", "1"} | B (Binary) | int |
| 3 | Values are {"True", "False"} | B (Binary) | int |
| 4 | All chars in 0-9 only | I (Integer) | int |
| 5 | All chars in +-.0123456789e | N (Numeric) | float |
| 6 | Otherwise | T (Text) | str |
Feature columns: Numeric (N) vs Text (T) only — no Binary/Integer distinction.
Watch out:
"3", "4", "5" → Integer (I), but "3.0", "4.0" → Numeric (N). Target type affects how predictions are compared. float conversion silently converts unparseable strings ("N/A", "") to 0.0.
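The priority chain above can be sketched as a standalone function. This is a hypothetical reimplementation for illustration, not Snake's internal code:

```python
def detect_target_type(values):
    """Illustrative type detection following the priority table; not Snake's internals."""
    strs = [str(v) for v in values]
    if any(isinstance(v, (dict, list)) for v in values):
        return "J"                                   # complex JSON, stored raw
    if set(strs) == {"0", "1"} or set(strs) == {"True", "False"}:
        return "B"                                   # binary, stored as int
    if all(s and all(c in "0123456789" for c in s) for s in strs):
        return "I"                                   # integer
    if all(s and all(c in "+-.0123456789e" for c in s) for s in strs):
        return "N"                                   # numeric, stored as float
    return "T"                                       # text fallback

t_binary  = detect_target_type(["True", "False", "True"])  # → "B"
t_integer = detect_target_type(["3", "4", "5"])            # → "I"
t_numeric = detect_target_type(["3.0", "4.0"])             # → "N"
t_text    = detect_target_type(["cat", "dog"])             # → "T"
```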
Both CSV and list[dict] flows deduplicate by hashing all feature values (not the target). If two rows have identical features but different targets, the second is dropped with a log message. This is intentional — conflicting training data degrades SAT clause quality.
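A minimal sketch of that feature-hash deduplication (assumed behavior reconstructed from the description; field names are illustrative):

```python
def dedupe_by_features(rows, target_key):
    """Keep the first row per unique feature combination; drop later conflicts."""
    seen = set()
    kept, dropped = [], []
    for row in rows:
        # Hash all feature values, excluding the target
        key = hash(tuple(sorted((k, str(v)) for k, v in row.items() if k != target_key)))
        if key in seen:
            dropped.append(row)          # same features, possibly a different target
        else:
            seen.add(key)
            kept.append(row)
    return kept, dropped

rows = [
    {"label": "A", "x": 1, "y": "red"},
    {"label": "B", "x": 1, "y": "red"},   # identical features, conflicting target
    {"label": "A", "x": 2, "y": "blue"},
]
kept, dropped = dedupe_by_features(rows, "label")
```

Here the second row is dropped even though its target differs, mirroring Snake's "second conflicting row loses" rule.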
# Train
snake train data.csv --layers 5 --bucket 250 --noise 0.25 -o model.json --vocal
# Predict
snake predict model.json -q '{"petal_length": 4.3, "petal_width": 1.4}'
snake predict model.json -q '{"petal_length": 4.3}' --audit
# Model info
snake info model.json

Same data, same split (80/20, seed=42), zero preprocessing. Snake uses 15 layers, bucket=250, workers=10. RF/GB use 100 estimators (sklearn defaults). No feature engineering on either side.
Pure numeric (float matrices):
| Dataset | Features | Classes | Random Forest | GradBoost | Snake (best profile) | Profile |
|---|---|---|---|---|---|---|
| Iris | 4 | 3 | 100.0% | 100.0% | 100.0% | all tie |
| Wine | 13 | 3 | 100.0% | 94.4% | 100.0% | original |
| Breast Cancer | 30 | 2 | AUROC 0.997 | AUROC 0.995 | AUROC 0.999 | original |
| Digits | 64 | 10 | 97.2% | 96.9% | 96.4% | original |
Mixed text + numeric (the Snake sweet spot):
| Dataset | Features | Classes | Snake AUROC | Snake Acc | Best profile | vs original |
|---|---|---|---|---|---|---|
| Classic Titanic (w/ Names) | 8 | 2 | 0.924 | 87.2% | cryptographic | +3.8pp |
| Spaceship Titanic | 12 | 2 | 0.840 | 78.6% | balanced | +3.7pp |
Snake beats RF and GB on Breast Cancer AUROC (0.999 vs 0.997 vs 0.995). Ties on Iris/Wine. Within 0.8pp on Digits. On mixed text+numeric data, profiles add up to +3.8pp AUROC over the original oppose — no preprocessing required.
Accuracy on classic sklearn datasets + Kaggle Spaceship Titanic. 80/20 train/test split (seed=42), bucket=250, noise=0.25. Run python benchmarks.py to reproduce.
n_layers=15 (recommended)
| Dataset | Type | Samples | Features | Classes | Test Acc | AUROC | Train Time |
|---|---|---|---|---|---|---|---|
| Iris | Multi | 150 | 4 | 3 | 100.0% | — | 0.02s |
| Wine | Multi | 178 | 13 | 3 | 100.0% | — | 0.02s |
| Breast Cancer | Binary | 569 | 30 | 2 | 98.2% | 0.999 | 0.05s |
| Digits | Multi | 1797 | 64 | 10 | 96.4% | — | 0.6s |
| Classic Titanic | Binary | 891 | 8 | 2 | 87.2% | 0.924 | 0.2s |
| Spaceship Titanic | Binary | 8693 | 12 | 2 | 78.6% | 0.840 | 1.7s |
More layers improve accuracy at the cost of training time. Benchmark scripts require pandas and scikit-learn for data loading only — Snake itself has zero dependencies.
Input X
│
▼
┌─────────────────────────────────────────┐
│ Bucket Chain (IF/ELIF/ELSE) │
│ │
│ IF condition_0(X): → Bucket 0 │
│ ELIF condition_1(X): → Bucket 1 │
│ ELIF condition_2(X): → Bucket 2 │
│ ELSE: → Bucket N │
│ │
│ Each condition = AND of SAT literals │
│ Each bucket ≤ 250 samples │
└─────────────┬───────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Local SAT (per bucket) │
│ │
│ For each target class: │
│ Build minimal clauses separating │
│ positive from negative samples │
│ │
│ Discriminating literals (24 types): │
│ T — substring present/absent │
│ TN — string length threshold │
│ TLN — alphabet size threshold │
│ TWS/TPS/TSS — split counts │
│ N — numeric threshold │
│ LEV/JAC — edit distance/bigrams │
│ PFX/SFX — prefix/suffix length │
│ ENT/HEX/REP/CFC — crypto features │
│ NZ/NL/NMG — z-score/log/magnitude │
└─────────────┬───────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Lookalike Voting │
│ │
│ Find training samples matching X │
│ via SAT clause satisfaction │
│ │
│ Vote by target labels → probability │
│ Return max probability class │
└─────────────────────────────────────────┘
Repeated across n_layers independent layers. Final prediction aggregates all lookalikes across all layers.
Complexity (n = samples, m = features):
- Training: O(n_layers * n_buckets * m * bucket_size²) — dominated by SAT clause construction
- Inference: O(n_layers * n_clauses_in_matched_bucket) — fast for small buckets
- Memory: entire population stored in memory; model JSON includes full training data
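The training bound turns into a quick back-of-envelope estimate. This is purely illustrative arithmetic from the stated formula, not a profiler:

```python
import math

def estimated_training_ops(n_samples, n_features, n_layers=5, bucket=250):
    """Rough op count from O(n_layers * n_buckets * m * bucket_size^2)."""
    n_buckets = max(1, math.ceil(n_samples / bucket))
    return n_layers * n_buckets * n_features * bucket ** 2

small = estimated_training_ops(500, 10)       # 2 buckets  → 6,250,000 ops
large = estimated_training_ops(10_000, 10)    # 40 buckets → 125,000,000 ops
ratio = large / small                         # cost grows linearly with bucket count
```

Because n_buckets ≈ n / bucket, training cost scales roughly linearly with sample count at a fixed bucket size, which is why the scaling table below suggests larger buckets past 5,000 samples.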
Scaling guidance:
| Dataset Size | Recommendation |
|---|---|
| < 500 samples | bucket=250 works fine, single bucket per layer |
| 500–5,000 | bucket=250 creates 2–20 buckets per layer, good performance |
| 5,000+ | Training gets slow — consider n_layers=3-5, bucket=500 |
| 10,000+ | Segment by a key field and train per-segment models |
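The per-segment pattern is just group-then-train. The grouping half can be sketched with the standard library; the commented `Snake(...)` call per group is where the hypothetical per-segment models would be built:

```python
from collections import defaultdict

def split_by_segment(rows, segment_key):
    """Group training rows by a segment field, dropping the key from each row."""
    segments = defaultdict(list)
    for row in rows:
        seg = row[segment_key]
        segments[seg].append({k: v for k, v in row.items() if k != segment_key})
    return dict(segments)

rows = [
    {"region": "EU", "label": "A", "x": 1},
    {"region": "EU", "label": "B", "x": 2},
    {"region": "US", "label": "A", "x": 3},
]
segments = split_by_segment(rows, "region")
# models = {seg: Snake(data, n_layers=5) for seg, data in segments.items()}  # one model per segment
```

At prediction time you would route each query to its segment's model by the same field.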
Inference latency (from benchmarks):
| Configuration | Latency |
|---|---|
| 150 samples, 5 layers | ~0.2ms |
| 1,797 samples, 50 layers | ~12ms |
| 8,693 samples, 50 layers | ~7.4ms |
model = Snake("model.json")
prob = model.get_probability(X)
confidence = max(prob.values())
prediction = model.get_prediction(X)
if confidence >= 0.51:
    return prediction          # Snake is confident
else:
    return fuzzy_fallback(X)   # Fall back to another matcher

model = Snake("model.json")
results = []
for row in batch:
    features = {k: v for k, v in row.items() if k != "target"}
    results.append({
        "prediction": model.get_prediction(features),
        "probability": model.get_probability(features),
    })

audit = model.get_audit(X)
# Multi-line string with:
# - Lookalike summary with examples per class
# - Probability distribution
# - Per-layer: Routing AND + Lookalike AND explanations
# - Final prediction
# Feed to an LLM for explanation generation.

Snake's boolean test language is fully specified in oppose_types.snake. Every literal is:
MEASURE(field) > threshold (negat flips to <=)
The file defines:
- §1 Measures — 20 primitive functions (identity, len, entropy, levenshtein, zscore, ...)
- §2 Literals — 30 boolean test types with oppose rules, eval rules, and format templates
- §3 Profiles — weight vectors over literal types
New literal types are added as cartridges: define the measure, the oppose rule (how to compute threshold from T and F), and the eval rule (how to test a new field). The .snake format is human-readable and serves as the single source of truth for Snake's discriminator library.
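A toy evaluator for that literal shape — [index, value, negat, tag] tested as MEASURE(field) > threshold, with negat flipping to <=. The measures here are simplified stand-ins, not Snake's real cartridges:

```python
import math
from collections import Counter

# Simplified stand-ins for a few measures; Snake's real ones live in oppose_types.snake
def shannon_entropy(s):
    counts = Counter(s)
    n = len(s) or 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

MEASURES = {
    "TN": len,                 # string length
    "ENT": shannon_entropy,    # character entropy
    "N": float,                # numeric identity
}

def eval_literal(field, literal):
    """literal = [index, threshold, negat, tag]; index resolution is left to the caller."""
    _, threshold, negat, tag = literal
    value = MEASURES[tag](field)
    return value <= threshold if negat else value > threshold

hit  = eval_literal("abcdef", [0, 4, False, "TN"])  # len 6 > 4  → True
miss = eval_literal("abcdef", [0, 4, True, "TN"])   # negat: len 6 <= 4 → False
num  = eval_literal("3.5", [0, 2.0, False, "N"])    # 3.5 > 2.0 → True
```

An AND clause is then just `all(eval_literal(row[lit[0]], lit) for lit in clause)` over the resolved fields.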
- Target type must match at prediction time. If training targets were int (e.g., 0, 1), predictions return int. If str (e.g., "cat"), predictions return str. Mixing types silently fails to match.
- Feature types are fixed at training time. A feature detected as Numeric (N) does numeric comparison. Passing a string at prediction time for that feature causes a TypeError.
- CSV must be pandas-formatted. The CSV parser handles quoted fields with commas but expects pandas.DataFrame.to_csv() format. Non-pandas CSVs may parse incorrectly.
- No incremental training. To add data, rebuild from scratch. make_validation only prunes layers.
- Model JSON contains the full training population. A model trained on 5,000 rows produces a large JSON file. Be mindful of disk/memory.
- excluded_features_index only works with the CSV flow. For list[dict] and DataFrame, filter your data before passing it in.
- Binary True/False targets become int 0/1. Compare predictions against 0/1 (int), not "True"/"False" (str).
Snake has minimal error handling by design:
| Situation | What Happens | How to Avoid |
|---|---|---|
| Empty list | ValueError | Check before constructing |
| Empty DataFrame | ValueError | Check len(df) > 0 |
| Wrong file extension | Crashes | Use .csv or .json extensions |
| File not found | FileNotFoundError | Check path exists |
| Malformed JSON | json.JSONDecodeError | Validate JSON before loading |
| target_index not found | ValueError / IndexError | Check column name/index exists |
| Only 1 unique target | Trains but trivially predicts that class | Need at least 2 classes |
| Prediction with {} | Uniform probability | Pass at least some features |
| Unknown keys in prediction | Ignored silently | Use same key names as training |
No exceptions during prediction. get_prediction, get_probability, get_lookalikes, get_audit all handle edge cases gracefully (worst case: uniform probability, empty lookalikes).
Snake training is non-deterministic. The oppose() function uses random.choice extensively. Two runs on the same data produce different models with different clause sets. This is by design — the power comes from stochastic clause generation + deterministic selection.
There is no random.seed() call in the Snake code. If you need reproducibility:
import random
random.seed(42)
model = Snake(data, n_layers=5)

Test assertions are probabilistic (e.g., "at least 50% of training data has >50% confidence") rather than exact value checks.
Snake uses Python's logging module. Each instance gets its own logger (snake.<id>).
- Buffer handler — always attached at DEBUG level, captures everything to self.log. This is how to_json() persists the training log.
- Console handler — attached only when vocal:
  - vocal=True: INFO level (training progress)
  - vocal=2: DEBUG level (per-target SAT progress)
  - vocal=False: no console output
The banner only prints when vocal=True.
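The always-on buffer handler can be reproduced with the standard logging module. This is a generic sketch of the pattern, not Snake's code — `make_instance_logger` and the StringIO buffer are illustrative:

```python
import io
import logging

def make_instance_logger(instance_id, vocal=False):
    """Per-instance logger: DEBUG buffer always attached, console only when vocal."""
    logger = logging.getLogger(f"snake.{instance_id}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    buffer = io.StringIO()
    buf_handler = logging.StreamHandler(buffer)   # captures everything, always
    buf_handler.setLevel(logging.DEBUG)
    logger.addHandler(buf_handler)
    if vocal:
        console = logging.StreamHandler()
        console.setLevel(logging.DEBUG if vocal == 2 else logging.INFO)
        logger.addHandler(console)
    return logger, buffer

logger, buffer = make_instance_logger("demo", vocal=False)
logger.debug("layer 0 built")
logger.info("training done")
log_text = buffer.getvalue()   # the kind of string to_json() would persist
```

With vocal=False nothing reaches stdout, yet the buffer still holds the full DEBUG trace for serialization.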
Snake includes optional Cython-accelerated hot paths for apply_literal, apply_clause, and traverse_chain. When compiled, these provide significant speedups for both training and inference.
# Install with Cython support
pip install -e ".[fast]"
python setup.py build_ext --inplace

Without Cython, Snake runs in pure Python with identical behavior. The Cython extension is auto-detected at import time.
pytest # all 236 tests
pytest tests/test_snake.py # input modes, save/load, augmented, vocal, dedup, parallel training
pytest tests/test_buckets.py # bucket chain, noise, routing, audit, dedup
pytest tests/test_core_algorithm.py # oppose, construct_clause, construct_sat
pytest tests/test_validation.py # make_validation / pruning
pytest tests/test_edge_cases.py # errors, type detection, extreme params
pytest tests/test_cli.py # CLI train/predict/info via subprocess
pytest tests/test_logging.py # logging buffer, JSON persistence, banner
pytest tests/test_audit.py # Routing AND, Lookalike AND, audit end-to-end
pytest tests/test_stress.py # stress tests, batch equivalence
pytest tests/test_ultimate_stress.py # extended stress tests
pytest tests/test_oppose_profiles.py # oppose profiles, new literal types, auto-detection, JSON roundtrip

236 tests across 11 files. Tests use tests/fixtures/sample.csv (15 rows, 3 classes) with small n_layers (1–3) and bucket (3–5) for speed.
- O(n) string distance: Levenshtein uses exact DP for strings ≤32 chars, O(n) char-frequency distance for longer strings. No truncation — signal preserved, compute bounded
- Dataset-specific profiles: 8 .snake profile files in profiles/ with empirical annotations (PIMA, Breast Cancer, Titanic, Spaceship, Wine Quality, Mushroom, Digits, Adult Income)
- Snake vs RF/GB benchmark: Snake beats Random Forest on Breast Cancer AUROC (0.999 vs 0.997). Ties on Iris/Wine. Within 0.8pp on Digits. Same data, zero preprocessing
- Classic Titanic with Names: cryptographic profile achieves 0.924 AUROC / 87.2% accuracy (+3.8pp over original)
- Meta classifier removed from codebase
- Linguistic profile deprecated: never won where expected, blocks training on long text. Pure NLP is out of Snake's scope
- Cython bool-safe: str(field) casts in _accel.pyx handle bool/None values without TypeError
- 7 oppose profiles: auto, balanced, linguistic, industrial, cryptographic, scientific, categorical. Each profile is a tuned literal generation strategy — weighted random draws across 6 text families + 5 numeric families. oppose() itself is untouched
- 23 new literal types (30 total): distance (LEV, JAC), positional (PFX, SFX), charclass (TUC, TDC, TSC), crypto (ENT, HEX, REP, CFC), numeric extended (ND, NZ, NL, NMG), exact match (TEQ), affix (TSW, TEW), zero test (NZR), range (NRG), vowel ratio (TVR)
- FA/TA single-char matching: _gen_text_substring now includes character-level discrimination (chars unique to T or F), matching original oppose()'s most powerful pattern
- Oppose type formalism: oppose_types.snake — complete specification of all 30 literal types + 7 profiles in a human-readable DSL. Defines measures, oppose rules, eval rules, and format templates
- Auto-detection: scans population text features (avg length, variance, digit ratio, special ratio, delimiter density). Pure numeric → scientific, long varied text → linguistic, short codes → industrial
- Spaceship Titanic: industrial profile achieves 78.0% optimal accuracy (vs original 77.2%), 0.8038 AUROC. Balanced achieves best AUROC (0.8093). Breast Cancer: scientific hits 98.2% / 0.9987 AUROC (vs original 96.5%)
- Meta classifier removed — was experimental, unused in production
- Cython support: all 30 literal types in _accel.pyx with C-level helpers. Bool-safe str(field) casts
- 236 tests across 11 files (62 new profile tests, 20 Meta tests removed)
- Benchmark scripts: benchmark_profiles.py (6 synthetic archetypes), profile comparison on sklearn + Spaceship Titanic
- Lookalike origin labeling: every lookalike now carries "c" (core) or "n" (noise) origin. New get_lookalikes_labeled(X) method returns [index, class, condition, origin] per match. Enables weighted probability with (w_c, w_n) tuning
- Full-population noise: noise sourced from the entire population minus core (was: remaining minus core). Deep-chain buckets now access global diversity
- Origins in JSON: each bucket stores "origins" parallel to "members". Backwards compatible — old models default to all-core
- Regime discovery: core vs noise signal quality depends on n_layers. Low layers = trust core. High layers = blend both
- 194 tests across 11 files
- Perfected audit system: two clean AND statements per layer — Routing AND (explains bucket routing) and Lookalike AND (per-sample clause negation). Replaces the stub from v4.3.x
- Parallel training: workers=N enables multiprocessing for layer construction. Each worker gets a unique RNG seed
- Progress tracking: progress_file parameter writes JSON progress updates during training
- Cython batch acceleration: batch_get_lookalikes_fast amortizes routing by grouping queries per bucket per layer
- 174 tests: extended from 151 across 10 files (added audit tests, parallel training, batch equivalence)
- Cython infinite loop fix: fixed oppose() infinite loop in Cython hot paths
- oppose() canaries: added safety canaries to detect stuck loops
- 151 tests: extended test suite from 92 to 151 tests across 9 files (added stress + ultimate stress tests)
- Logging migration: replaced print() + string accumulation with Python logging. Per-instance logger, buffer handler always captures, StreamHandler to stdout only when vocal
- Extensive test suite: 92 tests across 7 files (was ~41). New: core algorithm, validation, edge cases, CLI, logging
- Cython training acceleration: 4 new functions in _accel.pyx for construct_clause, build_condition, _construct_local_sat
- Bug fix — Binary True/False targets: float conversion ("True") returned 0.0, collapsing all True/False targets to 0. Fixed in both flows
- Spaceship Titanic benchmark: 78.4% test accuracy (Kaggle top ~80–81%)
- __main__.py: enables python -m algorithmeai as CLI entry point
Proprietary. Source code is available for viewing and reference only.
See LICENSE for details. For licensing inquiries: contact@monce.ai