Add VA-Spec Annotation Dumps to Public Data Export #762

## Summary

The public data export (`export_public_data.py`) currently produces CSV files for scores, counts, and VEP/gnomAD/ClinGen annotations. It does not include VA-Spec annotations. This issue tracks adding a per-score-set VA-Spec dump to the ZIP archive, where each variant is annotated at the **highest level it individually supports**.

## Problem

The existing streaming endpoints (`GET /score-sets/{urn}/annotations/pathogenicity` and `GET /score-sets/{urn}/annotations/functional`) allow per-score-set access to VA-Spec annotations as NDJSON, but this data is absent from the bulk data export used for distribution and archival.

Additionally, emitting all three annotation levels (StudyResult, FunctionalStatement, PathogenicityStatement) per variant would produce large amounts of redundant data: the higher-level objects already embed the lower-level evidence structures. The dump should instead emit only the highest level each variant can support.

## Proposed Behavior

For each published, CC0 score set included in the data export ZIP:

1. **For each variant**, determine the highest annotation level it individually supports:
   - **PathogenicityStatement** — variant has a non-null score, has been successfully mapped to VRS coordinates, and the score set has calibrations with `acmg_classification` defined (checked via `can_annotate_variant_for_pathogenicity_evidence()`)
   - **FunctionalStatement** — variant has a non-null score, has been mapped, and the score set has calibrations with `functional_classifications` defined (checked via `can_annotate_variant_for_functional_statement()`)
   - **StudyResult** — variant has been successfully mapped but does not meet the score/calibration requirements above
   - **`null`** — variant is unmapped

2. **Emit one annotation per variant** at its highest supported level as NDJSON (one JSON object per line), with `null` for variants that cannot be annotated.

3. **Add the file to the ZIP** alongside the existing CSV files using the naming convention `[URN].va-spec.ndjson`. Emit the file for all score sets that have at least one mapped variant.

4. **Update the manifest** (`main.json`) to include a `max_va_spec_annotation_level` field on each score set entry (`"study_result"`, `"functional_statement"`, `"pathogenicity_statement"`, or `null`). The value reflects the ceiling set by the score set's calibration configuration, or is `null` when the score set has no mapped variants. Consumers can use this field to learn the best-case level before downloading.
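
The per-variant rule in step 1 can be sketched as a top-down cascade. This is a minimal sketch, not the MaveDB implementation: `VariantFacts`, `can_annotate_pathogenicity()`, and `can_annotate_functional()` are hypothetical stand-ins for the real variant model and for `can_annotate_variant_for_pathogenicity_evidence()` / `can_annotate_variant_for_functional_statement()`, whose actual signatures may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AnnotationLevel(str, Enum):
    # String values match the ones proposed for `max_va_spec_annotation_level`.
    STUDY_RESULT = "study_result"
    FUNCTIONAL_STATEMENT = "functional_statement"
    PATHOGENICITY_STATEMENT = "pathogenicity_statement"


@dataclass(frozen=True)
class VariantFacts:
    # Hypothetical flattened view of one variant plus its score set's calibrations.
    score: Optional[float]
    is_mapped: bool
    has_acmg_calibration: bool        # some calibration defines acmg_classification
    has_functional_calibration: bool  # some calibration defines functional_classifications


def can_annotate_pathogenicity(v: VariantFacts) -> bool:
    # Hypothetical stand-in for can_annotate_variant_for_pathogenicity_evidence().
    return v.score is not None and v.is_mapped and v.has_acmg_calibration


def can_annotate_functional(v: VariantFacts) -> bool:
    # Hypothetical stand-in for can_annotate_variant_for_functional_statement().
    return v.score is not None and v.is_mapped and v.has_functional_calibration


def highest_level(v: VariantFacts) -> Optional[AnnotationLevel]:
    # Check from the highest level down; the first check that passes wins.
    if can_annotate_pathogenicity(v):
        return AnnotationLevel.PATHOGENICITY_STATEMENT
    if can_annotate_functional(v):
        return AnnotationLevel.FUNCTIONAL_STATEMENT
    if v.is_mapped:
        return AnnotationLevel.STUDY_RESULT
    return None  # unmapped: written as a JSON `null` line
```

Checking from the top down means a mapped variant with a null score falls through both score-dependent checks and lands on `StudyResult`, regardless of what other variants in the same score set produce.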

## Acceptance Criteria

- [ ] The export ZIP contains a `[URN].va-spec.ndjson` file for every published score set that has at least one mapped variant.
- [ ] Each line is the highest-level VA-Spec annotation that variant individually supports, or `null` if it cannot be annotated (unmapped).
- [ ] Within a score set, variants with null scores that are otherwise mapped produce a `StudyResult` rather than a higher-level statement, even if other variants in the same score set produce `PathogenicityStatement` or `FunctionalStatement` objects.
- [ ] Within a score set, variants with non-null scores but no successful VRS mapping produce `null`.
- [ ] Score sets with no mapped variants at all are skipped (no `.va-spec.ndjson` file; `max_va_spec_annotation_level` is `null` in `main.json`).
- [ ] Each line of the NDJSON file is either a valid serialized VA-Spec object or `null` (matching the pattern already used by `_stream_generated_annotations()`).
- [ ] The script runs to completion without errors on the full production dataset.
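
A structural check along these lines could back the NDJSON criteria in an automated test. It is a sketch under assumptions: `check_va_spec_lines()` is a hypothetical helper, and it only verifies that each line is a JSON object or `null` and that there is exactly one line per variant. Validating each object against the VA-Spec schema would be a separate step.

```python
import json
from typing import Iterable


def check_va_spec_lines(lines: Iterable[str], expected_count: int) -> None:
    """Hypothetical helper: raise ValueError if the lines break the NDJSON criteria."""
    count = 0
    for lineno, line in enumerate(lines, start=1):
        # json.JSONDecodeError is a ValueError subclass, so blank or malformed lines fail too.
        obj = json.loads(line)
        if obj is not None and not isinstance(obj, dict):
            raise ValueError(f"line {lineno}: expected a JSON object or null")
        count += 1
    # One line per variant, including the `null` lines for unmapped variants.
    if count != expected_count:
        raise ValueError(f"expected {expected_count} lines, got {count}")
```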

## Implementation Notes

**Per-variant level determination**

The eligibility logic already exists in `src/mavedb/lib/annotation/util.py` (`can_annotate_variant_for_functional_statement()`, `can_annotate_variant_for_pathogenicity_evidence()`). These functions check both per-variant conditions (non-null score, successful mapping) and score-set-level conditions (calibration configuration). They already operate at the variant level, so no new abstraction is needed — the existing checks drive the branching logic per variant.

The score set's calibration configuration is the *ceiling*: if a score set has no `acmg_classification` calibrations, no variant in it can ever produce a `PathogenicityStatement`. But variants can still fall below that ceiling individually due to null scores or mapping failures.
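
The ceiling itself can be derived from the calibrations plus whether any variant is mapped. A minimal sketch, assuming a hypothetical flattened `Calibration` record that exposes `acmg_classification` and `functional_classifications` directly (the real calibration model may nest these fields differently), with `score_set_ceiling()` likewise a hypothetical name:

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass(frozen=True)
class Calibration:
    # Hypothetical minimal view of a score calibration.
    functional_classifications: Optional[Sequence[Any]] = None
    acmg_classification: Optional[Any] = None


def score_set_ceiling(
    calibrations: Sequence[Calibration], has_mapped_variant: bool
) -> Optional[str]:
    """Best-case level for any variant in the score set (the manifest value)."""
    if not has_mapped_variant:
        return None  # no file is written and the manifest records null
    if any(c.acmg_classification is not None for c in calibrations):
        return "pathogenicity_statement"
    # An empty list of functional classifications counts as "not defined".
    if any(c.functional_classifications for c in calibrations):
        return "functional_statement"
    return "study_result"
```

Individual variants can only match or fall below this value, because the per-variant checks add score and mapping conditions on top of the same calibration conditions.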

**Serialization**

The streaming endpoints in `src/mavedb/routers/score_sets.py` (`_stream_generated_annotations()`) already implement the per-variant NDJSON loop using `variant_study_result()`, `variant_functional_impact_statement()`, and `variant_pathogenicity_statement()` from `src/mavedb/lib/annotation/annotate.py`. Note, however, that the streaming endpoints emit a single fixed level for every variant in a request (either all pathogenicity or all functional statements). The dump logic must instead branch per variant to select the highest level that variant supports, rather than applying one function uniformly.
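
The per-variant branching could be expressed as a dispatch table keyed by level. In this sketch, `generate_va_spec_lines()`, `level_of`, and `builders` are hypothetical: in the export script they might wrap the eligibility checks and the three `annotate.py` functions plus whatever serialization the streaming endpoints already apply, but that wiring is an assumption.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, Mapping, Optional, TypeVar

V = TypeVar("V")


def generate_va_spec_lines(
    variants: Iterable[V],
    level_of: Callable[[V], Optional[str]],
    builders: Mapping[str, Callable[[V], Dict[str, Any]]],
) -> Iterator[str]:
    """Hypothetical generator: one NDJSON line per variant at its own highest level."""
    for variant in variants:
        level = level_of(variant)
        if level is None:
            yield "null\n"  # unmapped variant
        else:
            # Unlike the fixed-level endpoints, the builder is chosen per variant.
            yield json.dumps(builders[level](variant)) + "\n"
```

Because this is a generator, each line can be written out as soon as it is built, which fits the streaming approach described under "Memory and streaming".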

**File placement and naming**

Current ZIP structure:
```
export.zip
├── main.json
├── LICENSE.txt
└── csv/
    ├── {urn}.scores.csv
    ├── {urn}.counts.csv
    └── {urn}.annotations.csv   ← only when mapped
```

Proposed addition:
```
export.zip
├── main.json
├── LICENSE.txt
└── csv/
    ├── {urn}.scores.csv
    ├── {urn}.counts.csv
    ├── {urn}.annotations.csv   ← only when mapped
    └── {urn}.va-spec.ndjson    ← only when mapped
```


If the directory name `csv/` no longer fits, it could be renamed `data/` as part of this change, but this is a breaking change to the archive structure and should be decided explicitly.

**Memory and streaming**

The existing export script streams CSVs row-by-row to avoid loading entire score sets into memory. The same approach should be applied to the VA-Spec files — write each NDJSON line as it is generated rather than collecting all annotations in memory first.
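
Python's `zipfile.ZipFile.open(name, mode="w")` returns a writable binary stream, so NDJSON lines can be compressed into the archive as they are generated. A sketch assuming a hypothetical `write_va_spec_member()` helper: the caller is expected to skip score sets whose ceiling is `null`, and `directory` is a parameter only so that a possible `csv/` to `data/` rename stays a one-line change.

```python
import zipfile
from typing import Iterable


def write_va_spec_member(
    archive: zipfile.ZipFile,
    urn: str,
    lines: Iterable[str],
    directory: str = "csv",
) -> str:
    """Hypothetical helper: stream lines into {directory}/{urn}.va-spec.ndjson."""
    name = f"{directory}/{urn}.va-spec.ndjson"
    # The member size is unknown up front, so force ZIP64 in case it exceeds 2 GiB.
    with archive.open(name, mode="w", force_zip64=True) as member:
        for line in lines:
            # Each line is written (and compressed) immediately; nothing is buffered here.
            member.write(line.encode("utf-8"))
    return name
```

The archive must be opened in a writing mode (for example `zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED)`), and only one member can be open for writing at a time, which matches writing score sets one after another.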

**`main.json` schema change**

Add a `max_va_spec_annotation_level` field to the per-score-set entry in `main.json`. Valid values: `"study_result"`, `"functional_statement"`, `"pathogenicity_statement"`, `null`. This reflects the score-set ceiling (calibration-determined), not the level of any individual variant. Consumers should expect that some variants in the file may produce lower-level annotations or `null` even for score sets with a high ceiling value.
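
On the consumer side, the field supports filtering before any NDJSON file is read. This sketch assumes the per-score-set entries have already been pulled out of `main.json` as dicts; how they are nested in the manifest is not specified here, so the hypothetical `score_sets_with_ceiling_at_least()` simply accepts an iterable of entries.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping, Optional

# Ordering of the proposed values; null (no annotations at all) ranks lowest.
LEVEL_RANK: Dict[Optional[str], int] = {
    None: 0,
    "study_result": 1,
    "functional_statement": 2,
    "pathogenicity_statement": 3,
}


def score_sets_with_ceiling_at_least(
    entries: Iterable[Mapping[str, Any]], minimum: str
) -> Iterator[Mapping[str, Any]]:
    """Hypothetical filter: yield entries whose ceiling is at or above `minimum`.

    A high ceiling is only a best case; individual lines in the matching
    .va-spec.ndjson file may still be lower-level objects or null.
    """
    threshold = LEVEL_RANK[minimum]
    for entry in entries:
        if LEVEL_RANK[entry.get("max_va_spec_annotation_level")] >= threshold:
            yield entry
```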
