Wire Protocol

Slaze uses a zero-JSON wire protocol for all data paths. Votes are packed into URL path segments. Batch reads use binary application/octet-stream. Single ratings return 204 No Content with the verdict in response headers. Only bootstrapping endpoints (token creation, Clerk linking) use JSON.

This document explains the full protocol grammar, the rationale for every design choice, and the exact byte-level format.

Why Zero JSON

The bandwidth argument

A typical JSON vote request would be:

{
  "platform": "reddit.com",
  "postId": "t3_abc123def456",
  "categories": [0, 2, 6],
  "context": "detail",
  "upvoteBucket": 3,
  "ageBucket": 8,
  "dwellMs": 12000
}

That's ~200 bytes. With HTTP headers, ~500 bytes total.

The compact URL-path version:

POST /v1/ratings/reddit.com/t3_abc123def456/vote/v026p1u3t8d12000

That's 18 bytes of payload. 10× smaller.

For a browser extension that fires on every Reddit page view (25+ posts), this matters. A single page view with 25 votes would be 25 × 500 = 12.5 KB with JSON. With compact encoding: 25 × 200 = 5 KB. Over millions of daily users, this is significant bandwidth savings.

The latency argument

JSON parsing in JavaScript (JSON.parse()) is fast but not free — ~0.1ms per parse. More importantly, it allocates objects that must be garbage collected. The compact protocol uses string slicing and integer conversion, which are register-level operations.

On the server side, Go's encoding/json allocates for every field. The compact parser (ParseVotePayload) allocates zero memory (all stack-allocated integers).

The simplicity argument

With no request body, there's nothing to:

Validate content-type
Read and buffer
Deserialize
Handle partial parsing errors on

The vote data is in the URL. If the URL is malformed, the request never reaches the handler (404 from the router). If the payload is malformed, ParseVotePayload returns a clear error. No partial states, no "body was valid JSON but missing required fields."

Why not Protocol Buffers / FlatBuffers / Cap'n Proto?

These are excellent for service-to-service RPC. For a browser extension calling an HTTP API:

Protobuf requires a .proto schema, a code generator, and a runtime library (~50 KB). The extension's bundle budget is tight.
The compact URL encoding is self-describing (human-readable for debugging).
The binary batch protocol achieves Protobuf-level density with zero schema tooling.
HTTP caching (ETag, Cache-Control, CDN) works transparently — Protobuf responses would need custom CDN configuration.

Compact Vote Payload

Grammar

payload    = "v" categories "p" context "u" upvote "t" age [dwell]
categories = category { category }        ; 1–3 unique digits, sorted ascending
category   = "0".."8"                     ; category index (0–8)
context    = "0" | "1"                    ; 0=feed, 1=detail-page
upvote     = "0".."9"                     ; upvote count bucket
age        = "0".."9"                     ; post age bucket
dwell      = "d" digit { digit }          ; 1–7 digit milliseconds

Validation Rules

The parser (ParseVotePayload in models.go:121) enforces:

Rule	Check	Error
Length	8–18 bytes	"invalid payload length"
Start marker	Must begin with `v`	"payload must start with v"
Category count	1–3 digits in [0,8]	"at most 3 categories are allowed"
Category order	Strictly increasing	"categories must be unique and sorted"
Markers	`p`, `u`, `t` in order	"payload markers must be p/u/t"
Context	`p0` or `p1`	"context must be p0 or p1"
Buckets	`u0`–`u9`, `t0`–`t9`	"u bucket must be 0-9"
Dwell (optional)	`d` + 1–7 numeric digits	"dwell must be 1-7 digits"

Why sorted categories?

The parser enforces that categories are sorted ascending (v026 is valid, v620 is not). This:

Reduces payload variants: Without sorting, v026, v062, v206, v260, v602, v620 all mean the same vote (categories 0, 2, 6). With sorting, only v026 is valid. This makes testing, logging, and debugging predictable.
Enables string equality deduplication: The extension can compare payload strings directly to detect re-votes without parsing.
One canonical form: Every vote for (0, 2, 6) is represented identically. The database stores a bitmask (derived from the digits), which is order-independent — but the canonical URL form is useful for client-side caching and logging.

Why 1–3 categories (not 1–9)?

Allowing all 9 categories would make votes meaningless ("this post is everything"). Limiting to 3 forces the user to identify the MOST relevant categories. It also keeps the payload compact (max 3 digits in the category list).

The "max 3" limit is enforced at both the payload parser AND the database CHECK constraint (category_mask BETWEEN 1 AND 511 — 511 = 9 bits, but popCount of the mask is checked at the parser level).

Dwell time encoding

Dwell is optional and variable-length (1–7 digits, representing 0–9,999,999 ms). Most votes won't have extreme dwell values. The variable-length encoding means a typical 4-digit dwell (e.g., d5000 = 5 seconds) adds 5 bytes to the payload. A 7-digit dwell (e.g., d9999999 = ~2.78 hours — unlikely but valid) adds 8 bytes.

The dwell floor (2500ms) and the fact that values above ~30 seconds are capped in practice means the typical dwell encoding is 3–5 digits.

Feature flags

VoteFeatureFlags in models.go:82 control which signals are persisted:

Flag	Default	Controls
`SLAZE_ENABLE_CONTEXT_SIGNAL`	`true`	Whether `context_code` is stored
`SLAZE_ENABLE_U_BUCKET_SIGNAL`	`true`	Whether `u_bucket` is stored
`SLAZE_ENABLE_T_BUCKET_SIGNAL`	`true`	Whether `t_bucket` is stored

When a flag is false, the corresponding field is zeroed in ApplyFeatureFlags() before persistence. This allows A/B testing signal importance without changing the extension.

ETag Verdict Format

ETag: "c<category>p<percent-3digits>"

Example	Meaning
`"c0p091"`	Dominant category 0 (genuine), 91%
`"c3p067"`	Dominant category 3 (ad-promo), 67%

Design decisions

Why pack into ETag? ETag is a standard HTTP caching header. CDNs, browsers, and proxies all understand it. If-None-Match → 304 Not Modified is built into every HTTP client. By packing the verdict into the ETag, we get:

Free conditional requests (no custom logic needed)
Free CDN caching (Cloudflare caches by ETag automatically)
Compact response (6 bytes of ETag vs ~200 bytes of JSON)

Why not include the full verdict in ETag? The ETag is 6 bytes. The full verdict (signature state, 3 categories, label phrase, subtext, weighted votes) is ~100 bytes. The ETag carries the minimum needed for conditional requests: "has the dominant verdict changed?" The full verdict is in response headers (X-Slaze-*) for clients that need it, or in the JSON body (?full=1) for the website.

Why category + percent, not a hash? A hash (e.g., ETag: "a1b2c3") would be opaque — the extension would need to store and compare hashes without knowing what changed. The packed format tells the extension (and debugging tools) exactly what the current dominant category and percent are, without parsing. The extension can display the badge directly from the ETag.

304 Not Modified flow

Client: GET /v1/ratings/reddit.com/t3_abc
        If-None-Match: "c3p067"

Server: rating.DominantCategory == 3 && rating.DominantPercent == 67
        → 304 Not Modified (zero body, zero processing)

This saves:

Server CPU (no rating serialization)
Bandwidth (0 bytes body)
Client CPU (no parsing)

For the extension, which polls ratings for the same posts as the user scrolls, 304 responses are the common case. A post's verdict changes only when someone votes on it — rare for old posts.

Verdict Response Headers

Header	Type	Example	Description
`X-Slaze-State`	int (0–3)	`1`	0=SPARSE, 1=CLEAR, 2=SPLIT, 3=DISPUTED
`X-Slaze-Sig`	3 ints	`3,-1,-1`	Signature categories C1,C2,C3 (-1=absent)
`X-Slaze-Label`	string	`Stealth Sell`	Human-readable verdict (ASCII-safe)
`X-Slaze-Subtext`	string	`Looks organic, smells commercial.`	One-line tooltip
`X-Slaze-WVotes`	int	`38`	Effective weighted vote count
`X-Slaze-Engine`	int	`1`	Verdict engine version

ASCII sanitization

HTTP headers are interpreted as ISO-8859-1 by most clients (Chromium's Fetch API decodes header bytes as Latin-1). UTF-8 punctuation like em-dashes and curly quotes would render as mojibake (â€" etc.).

asciiSafeHeader in handlers.go:463 transliterates:

Em-dash (—) → --
Curly single quotes (' ') → '
Curly double quotes (" ") → "
Ellipsis (…) → ...
Non-breaking space → regular space
Non-printable characters → dropped

The JSON body (?full=1) keeps the original UTF-8. Only the header fast path is sanitized.

Why custom headers instead of a response body?

Headers are available before the body is downloaded. The extension can start rendering the badge as soon as headers arrive, without waiting for the body. For a 204 No Content response, there IS no body — the headers are the entire response.

Binary Batch Protocol (`POST /v1/b`)

The binary batch protocol is the most bandwidth-efficient way to fetch multiple ratings. It's used by the extension's content script to fetch ratings for all visible posts on a page in a single request.

Request (`application/octet-stream`)

[N: 1 byte]         Number of posts (1–50)
repeat N:
  [platform: 1]     Wire byte → platform host + prefix
  [idLen: 1]        Length of stripped post ID (1–500)
  [id: idLen]       Raw UTF-8 post ID bytes

Total request size: 1 + N × (2 + avg_id_length).

For 30 Reddit posts with average ID length of 8 bytes: 1 + 30 × 10 = 301 bytes.

Platform wire bytes

Byte	Host	Prefix	Example	Notes
0	`reddit.com`	`t1_`	Comment IDs	`t1_abc123`
1	`reddit.com`	`t3_`	Link/post IDs	`t3_abc123`
2	`x.com`	—	Tweet IDs	Numeric string
3	`twitter.com`	—	Tweet IDs	Legacy domain

Why strip the prefix? Reddit post IDs are stored as t3_abc123 in the database. The t3_ prefix is constant — transmitting it for every post in a batch wastes 3 bytes per post. The extension strips it before encoding; the server adds it back from the platform table. For 30 posts, that's 90 bytes saved.

Why 1-byte platform codes instead of strings? A platform string like "reddit.com" is 10 bytes. 30 posts × 10 bytes = 300 bytes just for platform names. The 1-byte code reduces this to 1 byte per post.

Response v2 (`application/octet-stream`)

[version: 1 byte] = 2
N records of 15 bytes each:
  [dominant_category: 1]    0xFF = not found
  [dominant_percent: 1]     0–100
  [c0..c6: 7 bytes]         Category percents (0–100 each)
  [signature_state: 1]      0=SPARSE, 1=CLEAR, 2=SPLIT, 3=DISPUTED
  [sig_c1: 1]               -1→0xFF (absent)
  [sig_c2: 1]               -1→0xFF
  [sig_c3: 1]               -1→0xFF

Total response size: 1 + N × 15.

For 30 posts: 1 + 30 × 15 = 451 bytes.

Record size formula

recordSize = 2 + CategoryCount + 4
           = 2 + 7 + 4    (v2: only 7 active display categories)
           = 13 bytes per record  (v1, without verdict sig)
           = 15 bytes per record  (v2, with verdict sig)

Wait — why 13/15 bytes but the response says 7 category bytes? Let me re-derive:

v1 record (11 bytes):
  dominant_category: 1
  dominant_percent:  1
  c0..c8:            9  ← all 9 engine categories
  Total: 11

v2 record (15 bytes):
  dominant_category: 1
  dominant_percent:  1
  c0..c8:            9
  signature_state:   1
  sig_c1, c2, c3:    3
  Total: 15

The v2 format includes the full verdict signature, allowing the extension to look up the human-readable label from its bundled catalog without a second server round-trip.

0xFF sentinel

Categories are 0–8. 0xFF (255) is used as the "not found" sentinel for both dominant_category (post has no rating) and sig_c1/c2/c3 (absent signature slots).

In Go, this is uint8(int8(-1)) = 255 = 0xFF. In JavaScript, (sigC1 === -1 ? 0xFF : sigC1) or sigC1 & 0xFF at the receiving end.

Why binary and not JSON?

The JSON batch endpoint (POST /v1/ratings/batch) returns:

{"reddit.com/t3_abc": "c3p067", "reddit.com/t3_xyz": "c0p091"}

That's ~50 bytes per post (key string + value string + JSON syntax). The binary format is 15 bytes per post. 3.3× smaller. For 30 posts: 451 bytes (binary) vs ~1500 bytes (JSON).

For the extension, which makes this request on every page view, the savings compound. With 100 page views per user per day and 100K users, that's 100K × 100 × 1 KB = 10 GB/day of bandwidth saved.

JSON Batch (Legacy)

POST /v1/ratings/batch is a JSON alternative to the binary batch. It accepts {"ids": ["reddit.com/t3_abc", "x.com/123456"]} and returns {"reddit.com/t3_abc": "c3p067"}.

This endpoint exists for:

Debugging: Curl-friendly, human-readable
Third-party clients: Easier to integrate than binary protocol
Website: The "Check a post" feature uses this endpoint

Validation

Max 50 IDs per batch
Malformed IDs (no / separator, empty parts, unsupported platform) → silently skipped
Duplicates → silently deduplicated
Unrated posts → omitted from response (not null)
All skipped/deduplicated → returns {} (not 404)

Why silent skip instead of error?

Batch requests come from the extension scanning a Reddit page. Some elements might not be actual posts (ads, promoted content, deleted posts). The extension sends their IDs optimistically. Returning errors for each invalid ID would be noisy and unhelpful. Silent skip means the extension gets ratings for the valid posts and ignores the rest.

Quota Headers

Every API response includes quota headers. The extension tracks usage from these — zero extra API calls.

Header	Example	Description
`X-Quota-Tier`	`email`	Token tier
`X-Quota-Plan`	`free`	Active plan name
`X-Quota-Limit`	`20`	Limit for the current operation type
`X-Quota-Used`	`8`	Current counter after increment
`X-Quota-Remaining`	`12`	`limit - used` (0 on 429)

When each header is set

Response	Headers Set
Read success (200/204)	All 5 headers
Vote success (204)	All 5 headers
Quota exceeded (429)	All 5 headers + `Retry-After` + `X-Slaze-Error`
Auth endpoints	`X-Quota-Tier`, `X-Quota-Plan`, `X-Quota-Limit`
Unauthenticated	None (no token context)

Why headers instead of a dedicated /quota endpoint?

A dedicated endpoint would add one API call per page view just to check quota. With headers on existing responses, the extension gets quota updates for free. The extension's updatePlanFromHeaders() parses headers from every response — whether it's a rating fetch, a vote, or a batch request. The popup's quota display is always current from the last API call.

Error Responses

All errors return {"error": "<message>"} with the appropriate HTTP status. The error message is also echoed in X-Slaze-Error header for clients that don't parse JSON bodies (like the extension on 429 responses, which are zero-byte).

Status	Meaning	When
400	Validation failure	Bad payload, unknown platform, invalid post ID
401	Auth failure	Missing/invalid token, invalid HMAC signature, expired Clerk JWT
402	Payment required	Anonymous token attempted to vote without Clerk identity
404	Not found	Post has no ratings
409	Conflict	Clerk user already linked to another device token
415	Unsupported media type	Binary batch with wrong Content-Type
429	Quota/rate exceeded	Daily/monthly/hourly limit hit, or rate limit hit
500	Internal error	Unexpected server failure (DB down, panic recovered, etc.)

Platform Encoding

Canonical platform names

Host	Canonical Name	Notes
`reddit.com`	`reddit.com`	Both old and new Reddit
`x.com`	`x.com`	Twitter rebranded
`twitter.com`	`twitter.com`	Legacy domain, still in use

x.com and twitter.com are treated as separate platforms in the database. A post viewed via x.com will have a different rating than the same post viewed via twitter.com. This is intentional: the platform is part of the composite key, and the two domains may have different user bases.

Protocol Versioning

The binary batch response has a version byte:

Version	Record Size	Includes
1	11 bytes	dominant + percents (no verdict sig)
2	15 bytes	dominant + percents + verdict signature

The client (extension) must check the version byte and parse accordingly. Version 2 clients can parse version 1 responses (by ignoring the extra bytes), but version 1 clients cannot parse version 2 responses (they'd misalign on record boundaries). The extension always sends binary batch requests; the server always responds with v2.

When to bump the version

Adding fields to the record → bump version
Changing field semantics → bump version
Removing fields → bump version (clients must explicitly handle)
Changing the record size → bump version

Wire Protocol

Table of Contents

Why Zero JSON

The bandwidth argument

The latency argument

The simplicity argument

Why not Protocol Buffers / FlatBuffers / Cap'n Proto?

Compact Vote Payload

Grammar

Validation Rules

Why sorted categories?

Why 1–3 categories (not 1–9)?

Dwell time encoding

Feature flags

ETag Verdict Format

Design decisions

304 Not Modified flow

Verdict Response Headers

ASCII sanitization

Why custom headers instead of a response body?

Binary Batch Protocol (POST /v1/b)

Request (application/octet-stream)

Platform wire bytes

Response v2 (application/octet-stream)

Record size formula

0xFF sentinel

Why binary and not JSON?

JSON Batch (Legacy)

Validation

Why silent skip instead of error?

Quota Headers

When each header is set

Why headers instead of a dedicated /quota endpoint?

Error Responses

Platform Encoding

Canonical platform names

Protocol Versioning

When to bump the version

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally

Binary Batch Protocol (`POST /v1/b`)

Request (`application/octet-stream`)

Response v2 (`application/octet-stream`)