search_by_prefix: implement relevance scoring function

## Context

Identified during review of #16 (case-insensitive prefix search). The current `search_by_prefix()` sorts results by `(is_alt_label, len(matched_key))`, which is a significant improvement over the prior `len`-only sort (see #16 discussion), but still produces suboptimal ranking for common queries.

## Problem

The sort heuristic doesn't account for:
- **Branch popularity** — Areas of Law and Jurisdictions are far more commonly queried than reporter codes or geographic subdivisions, but rank equally
- **Exact-prefix vs interior-prefix** — a label that *is* the prefix (exact match) should outrank one that merely starts with it
- **Ontology depth** — top-level concepts (California, Tax Law) are more likely targets than deep leaves (California Superior Court - Kern Cty.)

### Examples (after #16 merges)

```
search_by_prefix("Cal"):
  0. Caldas          (Colombian department)
  6. California      (U.S. state — most users want this)

search_by_prefix("Tax"):
  0. Tax Law         ✓ (correct — primary label, short)
  3. tax_type        (property name, unlikely search target)
```

## Proposed approach

Implement a scoring function that considers multiple signals:

```
score = w1 * is_primary_label
      + w2 * (1 / label_length)
      + w3 * branch_boost(class)
      + w4 * exact_prefix_bonus
      + w5 * (1 / ontology_depth)
```

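Sort by score descending instead of the current tuple sort. Both `label_length` and `ontology_depth` need to be at least 1 so the reciprocal terms are always defined.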
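A minimal sketch of how the formula above might translate to Python. Everything here is an assumption for illustration: the `Candidate` container, the weight values, the convention that top-level concepts have depth 1, and the depth assigned to each example label are all hypothetical and not taken from `folio/graph.py` or from FOLIO itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for one prefix match; not an existing FOLIO type."""
    label: str
    is_primary_label: bool
    branch_boost: float   # e.g. 1.0 for default branches, 0.0 for penalized ones
    ontology_depth: int   # assumed to be 1 for top-level concepts


# Placeholder weights for w1..w5; real values would need tuning on known queries.
W_PRIMARY, W_LENGTH, W_BRANCH, W_EXACT, W_DEPTH = 1.0, 1.0, 1.0, 2.0, 1.0


def relevance_score(query: str, c: Candidate) -> float:
    """Weighted sum of the five signals; higher means more relevant."""
    # Exact-match bonus (case-sensitive here; the insensitive path would casefold both sides).
    exact = 1.0 if c.label == query else 0.0
    return (
        W_PRIMARY * float(c.is_primary_label)
        + W_LENGTH / max(len(c.label), 1)       # guard against empty labels
        + W_BRANCH * c.branch_boost
        + W_EXACT * exact
        + W_DEPTH / max(c.ontology_depth, 1)    # guard against depth 0
    )


def rank(query: str, candidates: list[Candidate]) -> list[Candidate]:
    """Sort candidates by score, best first."""
    return sorted(candidates, key=lambda c: relevance_score(query, c), reverse=True)


# Illustrative inputs only: the depths are invented, not read from the ontology.
caldas = Candidate("Caldas", True, 1.0, 3)
california = Candidate("California", True, 1.0, 1)
ranked = rank("Cal", [caldas, california])
```

With these made-up inputs the depth term is what lifts `California` above the shorter `Caldas`. Whether that holds on the real ontology depends on the actual depths and on how the weights are tuned.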

### Branch boost / penalty model

Rather than boosting every branch, apply a **penalty to low-utility branches** — branches that rarely represent what a user is actually searching for. All other branches receive the default (boosted) treatment.

**Penalized branches** (less commonly the search target):
- Language
- Location
- Standards Compatibility
- System Identifiers
- Matter Narrative
- Currency
- Data Format

**Default (boosted) branches** (everything else — the ones users typically want):
- Area of Law
- Jurisdiction / Forum Venue
- Legal Authority
- Legal Entity
- Actor / Player
- Service
- Document / Artifact
- Industry
- Event
- Engagement Terms
- Objective
- Asset Type
- Communication Modality
- Governmental Body
- Matter Narrative Format
- *(and any other branches not in the penalty list)*

This "penalty" framing is simpler to maintain: new branches get the boost by default, and only demonstrably low-utility branches are explicitly penalized. Implementation could be as simple as a `set` (or `frozenset`) of penalized `FOLIOTypes` values checked during scoring.
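One way the penalty set could look, as a sketch only. The `Branch` enum below is a hypothetical stand-in for `FOLIOTypes` (its member names are invented here and list only a few branches), and the `boost` and `penalty` values are placeholders.

```python
from enum import Enum


class Branch(Enum):
    """Hypothetical stand-in for FOLIOTypes; member names are assumptions."""
    AREA_OF_LAW = "Area of Law"
    LEGAL_ENTITY = "Legal Entity"
    LANGUAGE = "Language"
    LOCATION = "Location"
    STANDARDS_COMPATIBILITY = "Standards Compatibility"
    SYSTEM_IDENTIFIERS = "System Identifiers"
    MATTER_NARRATIVE = "Matter Narrative"
    CURRENCY = "Currency"
    DATA_FORMAT = "Data Format"


# Only the low-utility branches are listed; everything else is boosted by default.
PENALIZED_BRANCHES = frozenset({
    Branch.LANGUAGE,
    Branch.LOCATION,
    Branch.STANDARDS_COMPATIBILITY,
    Branch.SYSTEM_IDENTIFIERS,
    Branch.MATTER_NARRATIVE,
    Branch.CURRENCY,
    Branch.DATA_FORMAT,
})


def branch_boost(branch: Branch, boost: float = 1.0, penalty: float = 0.0) -> float:
    """Return the penalty for listed branches and the default boost otherwise."""
    return penalty if branch in PENALIZED_BRANCHES else boost
```

Because membership is checked against the penalty set rather than an allow-list, a branch added to the enum later is boosted without any change to the scoring code.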

## Scope

- New scoring function in `folio/graph.py`
- Applied to both `_search_by_prefix_sensitive` and `_search_by_prefix_insensitive`
- Backward compatible — no API changes
- Tests for ranking quality on known queries (Cal, Mich, Tax, etc.)

## References

- PR #16 review discussion: https://github.com/alea-institute/folio-python/pull/16#issuecomment-4204345398
- @mjbommar's Option 3 proposal in that comment
