Align ParseResult ZIP models with current worker output

## Goal

Align the Python SDK parse-result ZIP parser and public result models with the current Knowhere worker output contract.

This issue is limited to parsing completed job result ZIPs. It should not reimplement features that already exist on `origin/main`.

## Current SDK Baseline

The Python SDK already has the following on `origin/main`:

- `Knowhere.parse(...)` and `AsyncKnowhere.parse(...)` high-level helpers.
- `ParsingParams`.
- `jobs.create(...)`, file upload, polling, and result loading.
- Retrieval query options including channels, rerank, filters, signal paths, thresholds, document exclusions, and section exclusions.
- Document lifecycle resource methods.
- Default base URL `https://api.knowhereto.ai`.

Those areas are out of scope for this issue unless a regression is found while implementing the ZIP parser updates.

## Current Worker ZIP Contract

Current worker output, verified against staging result `job_922b79256307` and the worker contract test, emits:

- `manifest.json`
- `chunks.json`
- `full.md`
- `doc_nav.json`
- `images/*`
- `tables/*`

The current worker contract does not emit:

- `chunks_slim.json`
- `hierarchy.json`
- `hierarchy_slim.json`
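
A small helper can check a test fixture against this contract. It is a sketch, not SDK code. `check_contract` is a hypothetical name. It assumes the files sit at the ZIP root, and it treats `images/` and `tables/` as optional because a document may have neither. It only inspects fixtures. The parser itself must still tolerate a missing `doc_nav.json`, per requirement 1.

```python
import io
import zipfile

# File names taken from the contract listed above.
CURRENT_REQUIRED = {"manifest.json", "chunks.json", "full.md", "doc_nav.json"}
CURRENT_ASSET_DIRS = ("images/", "tables/")  # optional: present only when the document has assets
LEGACY_FILES = {"chunks_slim.json", "hierarchy.json", "hierarchy_slim.json"}


def check_contract(zip_bytes: bytes) -> dict[str, set[str]]:
    """Report missing current files, unexpected legacy files, and asset entries."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = set(zf.namelist())
    return {
        "missing": CURRENT_REQUIRED - names,
        "legacy": LEGACY_FILES & names,
        # str.startswith accepts a tuple of prefixes.
        "assets": {n for n in names if n.startswith(CURRENT_ASSET_DIRS)},
    }
```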

## Current SDK Mismatches

The Python SDK currently still models and/or reads legacy result files:

- `ParseResult.hierarchy` is populated from `hierarchy.json`, which current worker output no longer emits.
- `ParseResult.chunks_slim` and `SlimChunk` are tied to `chunks_slim.json`, which current worker output no longer emits.
- `TableChunk.table_type` is exposed, but current table chunk metadata does not include `table_type`.

The SDK also does not expose these current worker outputs:

- `doc_nav.json`, which contains the canonical navigation tree and resource summaries.
- The `HIERARCHY` field in `manifest.json`.
- The `metadata.document_top_summary` field on chunks in `chunks.json`.

## Requirements

### 1. Parse and expose `doc_nav.json`

Add a typed public representation for `doc_nav.json` and expose it from `ParseResult`.

Expected shape:

- `sections`
  - `title`
  - `path`
  - `level`
  - `summary`
  - `chunk_count`
  - `children`
- `resources.images`
  - `path`
  - `summary`
- `resources.tables`
  - `path`
  - `summary`
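
One way to model this shape is shown below. It uses standard-library dataclasses so that it runs without dependencies, although the SDK's public models would more likely be Pydantic. The class names `DocNav`, `DocNavSection`, and `DocNavResource` are hypothetical. The field types, defaults, and optionality are also assumptions, because the issue lists only the key names.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DocNavSection:
    title: str
    path: str
    level: int  # assumed integer depth
    summary: Optional[str] = None
    chunk_count: int = 0
    children: list["DocNavSection"] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "DocNavSection":
        # Recurse into nested sections, so every depth of the tree is typed.
        return cls(
            title=data["title"],
            path=data["path"],
            level=data["level"],
            summary=data.get("summary"),
            chunk_count=data.get("chunk_count", 0),
            children=[cls.from_dict(c) for c in data.get("children", [])],
        )


@dataclass
class DocNavResource:
    path: str
    summary: Optional[str] = None


@dataclass
class DocNav:
    sections: list[DocNavSection]
    images: list[DocNavResource]
    tables: list[DocNavResource]

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "DocNav":
        resources = data.get("resources", {})
        return cls(
            sections=[DocNavSection.from_dict(s) for s in data.get("sections", [])],
            images=[DocNavResource(r["path"], r.get("summary")) for r in resources.get("images", [])],
            tables=[DocNavResource(r["path"], r.get("summary")) for r in resources.get("tables", [])],
        )
```

Recursing through `children` keeps nested sections typed instead of leaving them as raw dictionaries below the first level.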

Acceptance criteria:

- Given a result ZIP with `doc_nav.json`, when `parseResultZip()` loads it, then callers can access the parsed navigation object from `ParseResult`.
- Given a result ZIP without `doc_nav.json`, when `parseResultZip()` loads it, then parsing still succeeds and the navigation field is `None`.
- `ParseResult.save()` writes `doc_nav.json` when the parsed navigation object exists.
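
The sketch below illustrates how the optional-file and save behavior in these criteria could work. `ParseResultSketch` and `parse_result_zip_sketch` are hypothetical stand-ins, not the SDK's `ParseResult` or `parseResultZip()`. To keep the example short, `doc_nav` is stored as a raw dictionary.

```python
import io
import json
import zipfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class ParseResultSketch:
    """Hypothetical stand-in for ParseResult; only doc_nav handling is shown."""
    doc_nav: Optional[dict[str, Any]] = None

    def save(self, out_dir: Path) -> None:
        out_dir.mkdir(parents=True, exist_ok=True)
        # Write doc_nav.json only when the source ZIP actually contained one.
        if self.doc_nav is not None:
            (out_dir / "doc_nav.json").write_text(
                json.dumps(self.doc_nav, ensure_ascii=False), encoding="utf-8"
            )


def parse_result_zip_sketch(zip_bytes: bytes) -> ParseResultSketch:
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = set(zf.namelist())
        # doc_nav.json is optional: a ZIP without it yields doc_nav=None instead of an error.
        doc_nav = json.loads(zf.read("doc_nav.json")) if "doc_nav.json" in names else None
    return ParseResultSketch(doc_nav=doc_nav)
```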

### 2. Expose manifest hierarchy from `manifest.json`

Represent the worker-emitted `HIERARCHY` field in the public `Manifest` model.

Acceptance criteria:

- Given `manifest.json` contains `HIERARCHY`, when `parseResultZip()` loads it, then Python callers can access the field without reading raw manifest dictionaries.
- The Pydantic model should handle the all-caps input key. An idiomatic model field such as `hierarchy = Field(default=None, alias="HIERARCHY")` is acceptable if serialization behavior is covered by tests.
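
The sketch below shows the alias approach and assumes Pydantic v2. `populate_by_name=True` lets callers construct the model with `hierarchy=`, while input from `manifest.json` still uses `HIERARCHY`. `extra="allow"` is an added assumption that preserves other manifest keys during validation, since this issue does not enumerate them. The value is typed `Any` because the issue does not describe the structure of `HIERARCHY`.

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class Manifest(BaseModel):
    # Accept both the worker's all-caps key and the Python attribute name on input.
    # extra="allow" (an assumption) preserves manifest keys not modeled here.
    model_config = ConfigDict(populate_by_name=True, extra="allow")

    # Structure of HIERARCHY is not specified in the issue, so it is typed loosely.
    hierarchy: Any = Field(default=None, alias="HIERARCHY")
```

`model_dump()` emits `hierarchy` by default. To round-trip the all-caps key when writing `manifest.json`, pass `by_alias=True`. Tests should pin down exactly this serialization difference.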

### 3. Surface `document_top_summary` on chunks

Expose `metadata.document_top_summary` as an optional chunk field.

Acceptance criteria:

- Given that the metadata of a text, image, or table chunk includes `document_top_summary`, when `parseResultZip()` loads `chunks.json`, then the corresponding chunk model exposes `document_top_summary`.
- Chunk parsing continues to succeed when `document_top_summary` is absent, and the field is left as `None`.
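
The summary lives under `metadata`, so each chunk model can read it with a `.get()` that falls back to `None`. In this sketch, `ChunkSketch` and `chunk_from_dict` are hypothetical, and the `chunk_id` and `type` keys are assumed for illustration only. The real SDK has separate text, image, and table chunk models, and each would gain the same optional field.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChunkSketch:
    """Hypothetical minimal chunk; the SDK's text/image/table models differ."""
    chunk_id: str  # assumed key
    type: str      # assumed key
    document_top_summary: Optional[str] = None


def chunk_from_dict(raw: dict[str, Any]) -> ChunkSketch:
    # Treat a missing or null metadata object as empty.
    metadata = raw.get("metadata") or {}
    return ChunkSketch(
        chunk_id=raw["chunk_id"],
        type=raw["type"],
        # A missing key maps to None instead of raising.
        document_top_summary=metadata.get("document_top_summary"),
    )
```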

### 4. Remove or clearly deprecate legacy/ghost fields

Clean up public result models and tests that still imply the current worker emits removed files or fields.

Acceptance criteria:

- `chunks_slim.json` and `hierarchy.json` are no longer described as current worker outputs.
- Tests no longer require `chunks_slim.json` or `hierarchy.json` to exist for the current contract fixture.
- `TableChunk.table_type` is either removed or marked deprecated with documentation that the current worker does not emit `table_type`.
- Any backward-compatible legacy parsing that remains is explicitly documented as legacy-only.
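
If deprecation is chosen over removal, a property that warns on access is one way to implement it with the standard library. `TableChunkSketch` and its `_legacy_table_type` field are hypothetical. The value is populated only from legacy results, and every read emits a `DeprecationWarning`.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableChunkSketch:
    """Hypothetical stand-in for TableChunk showing a deprecation path."""
    content: str
    _legacy_table_type: Optional[str] = None  # set only when parsing legacy results

    @property
    def table_type(self) -> Optional[str]:
        """Deprecated and legacy-only: the current worker does not emit table_type."""
        warnings.warn(
            "TableChunk.table_type is deprecated: current worker output does not include it.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._legacy_table_type
```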

## Out of Scope

- Adding high-level parse APIs, parsing params, job polling, retrieval options, document lifecycle resources, or base URL changes. These already exist on `origin/main`.
- Changing the worker result ZIP contract.
- Adding `table_type` to worker table metadata.
- Removing backward compatibility unless the implementation chooses a major-version cleanup.

## Suggested Verification

- Add or update result parser tests using a fixture that matches the current worker ZIP contract.
- Assert that current-contract ZIPs parse without `hierarchy.json` or `chunks_slim.json`.
- Assert that `doc_nav.json`, manifest `HIERARCHY`, and chunk `document_top_summary` are exposed.
- Run the Python SDK type checks and test suite used by this repository.
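
A minimal in-memory fixture for these tests could look like the sketch below. `build_current_contract_zip` is a hypothetical helper, and every value in it is a placeholder. It also assumes that `chunks.json` is a top-level JSON array, so adjust it to match the worker's real layout. Only the file names and the `HIERARCHY`, `metadata.document_top_summary`, `sections`, and `resources` keys come from this issue.

```python
import io
import json
import zipfile


def build_current_contract_zip() -> bytes:
    """Build a minimal result ZIP shaped like the current worker contract.

    All values are placeholders; no legacy files (hierarchy.json,
    chunks_slim.json, hierarchy_slim.json) are written.
    """
    json_files = {
        "manifest.json": {"HIERARCHY": {}},
        # Assumed layout: chunks.json as a top-level array.
        "chunks.json": [
            {"type": "text", "content": "Hello",
             "metadata": {"document_top_summary": "Placeholder summary"}},
        ],
        "doc_nav.json": {"sections": [], "resources": {"images": [], "tables": []}},
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in json_files.items():
            zf.writestr(name, json.dumps(payload))
        zf.writestr("full.md", "# Placeholder\n\nHello\n")
    return buf.getvalue()
```

In the repository's tests, pass these bytes to `parseResultZip()` in whatever form it accepts, then assert on the resulting `ParseResult`.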


Align ParseResult ZIP models with current worker output #21
