diff --git a/docs/OSL.md b/docs/OSL.md
index 498abf2..da3b4b7 100644
--- a/docs/OSL.md
+++ b/docs/OSL.md
@@ -7,6 +7,49 @@ An OSL JSON file is a single JSON object with dataset metadata, a label schema,
 and a `data` array of samples. Each sample points to one or more media inputs and
 can carry task-specific annotations.
 
+## Minimal Valid File
+
+This is the smallest practical shape for a dataset with one video sample:
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "dataset_name": "minimal-demo",
+  "description": "",
+  "modalities": ["video"],
+  "metadata": {},
+  "labels": {},
+  "data": [
+    {
+      "id": "clip_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "clips/clip_0001.mp4"
+        }
+      ]
+    }
+  ]
+}
+```
+
+!!! note "Relative paths"
+    Relative `inputs[].path` values are resolved from the folder that contains
+    the JSON file. If you move the JSON without moving its media folders,
+    playback can fail.
+
+## Common Mistakes
+
+| Mistake | Result | Fix |
+|---|---|---|
+| Root JSON is an array | The app rejects the file. | Use one root object with a `data` array. |
+| `data` is missing or not a list | The app rejects the file. | Set `data` to `[]` or a list of sample objects. |
+| Using top-level `questions` for Q/A | Legacy question banks are dropped on save. | Store Q/A in each sample's grouped `answers[]`. |
+| Dense captions use `start_ms`/`end_ms` only | The current dense editor expects point timestamps. | Use `dense_captions[].position_ms`. |
+| Annotation head names do not match root `labels` | Controls may not show the expected labels. | Keep `data[].labels` keys and `events[].head` values aligned with root `labels`. |
+| Relative media paths no longer point to files | Samples load but playback cannot find media. | Keep media beside the JSON or resave after correcting paths. |
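+
+The first rows above can be caught before opening a file in the app. The
+sketch below is a hypothetical pre-flight check, not the app's own loader:
+`check_osl_root` and its messages are illustrative names, and it covers only
+the root-level rules listed in the table.
+
+```python
+import json
+
+
+def check_osl_root(doc):
+    """Return a list of problems found in a parsed OSL JSON document."""
+    if not isinstance(doc, dict):
+        return ["root must be a JSON object, not " + type(doc).__name__]
+    problems = []
+    if not isinstance(doc.get("data"), list):
+        problems.append("`data` must be a list")
+    if "questions" in doc:
+        problems.append("top-level `questions` is legacy; use grouped `answers[]`")
+    return problems
+
+
+print(check_osl_root(json.loads('{"data": []}')))  # prints []
+```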
+
 ## Top-Level Object
 
 The smallest useful file is a JSON object with `data` as a list. When loading,
diff --git a/docs/annotating.md b/docs/annotating.md
index 16f963b..c265ad4 100644
--- a/docs/annotating.md
+++ b/docs/annotating.md
@@ -1,38 +1,70 @@
 # Annotating
 
+All annotation tabs work on the currently selected sample from the Dataset
+Explorer. The JSON field names below match the canonical [OSL JSON Format](OSL.md)
+page.
+
 ## Classification
 
-1. Select a sample in the Dataset Explorer.
+Use `CLS` for clip-level labels.
+
+1. Select a sample.
 2. Open `CLS`.
-3. Choose labels in each head.
-4. Changes persist immediately when they are effective.
+3. Add or choose label heads and labels.
+4. Select the label values for the current sample.
+
+Manual changes that actually alter a value are saved immediately into the
+sample's `labels` object. Single-label heads write `{"label": "..."}` and
+multi-label heads write `{"labels": [...]}`. Smart predictions carry a
+`confidence_score` until they are confirmed or rejected.
 
 ## Localization
 
+Use `LOC` for point events on the timeline.
+
 1. Select a sample and open `LOC`.
-2. Use spotting buttons to add events at current time.
-3. Edit or delete events from the event table.
-4. Optional: run smart inference for a selected head.
+2. Choose a label head and label.
+3. Move the playhead to the event time.
+4. Use the spotting controls to add the event.
+5. Edit or delete rows in the event table when needed.
+
+Events are stored in `events[]` with `head`, `label`, and `position_ms`.
+Smart inference can add predicted rows with `confidence_score`; confirming a row
+keeps the event and removes only the confidence marker.
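+
+As a rough illustration of the stored shape, the sketch below builds one
+predicted event and confirms it. `confirm_event` is a hypothetical helper, not
+an app function, and the head, label, and score values are made up.
+
+```python
+def confirm_event(event):
+    """Return a copy of the event without its confidence marker."""
+    return {key: value for key, value in event.items() if key != "confidence_score"}
+
+
+predicted = {
+    "head": "action",  # assumed head name; it should exist in root `labels`
+    "label": "pass",  # assumed label value
+    "position_ms": 12500,
+    "confidence_score": 0.87,
+}
+print(confirm_event(predicted))
+# prints {'head': 'action', 'label': 'pass', 'position_ms': 12500}
+```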
 
 ## Description
 
+Use `DESC` for one clip-level caption.
+
 1. Select a sample and open `DESC`.
-2. Edit caption text.
-3. Autosave stores the caption in `captions`.
+2. Enter or edit the caption text.
+3. Wait for autosave or save the project.
+
+The text is stored in `captions[]`. Manual description edits currently write an
+English caption entry with `lang` set to `en`.
 
 ## Dense Description
 
+Use `DENSE` for timestamped text descriptions.
+
 1. Select a sample and open `DENSE`.
-2. Click **Add New Description**.
-3. Enter text in the modal; event is stored at current `position_ms`.
-4. Edit time/text from the table when needed.
+2. Move the playhead to the desired timestamp.
+3. Click **Add New Description**.
+4. Enter text in the modal.
+5. Edit time or text from the table when needed.
+
+Dense descriptions are stored in `dense_captions[]` with `position_ms`, `lang`,
+and `text`. The table keeps rows ordered by timestamp.
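+
+The same ordering can be reproduced outside the app when post-processing a
+file. A minimal sketch, assuming each row already carries `position_ms`; the
+caption texts are invented examples:
+
+```python
+dense_captions = [
+    {"position_ms": 9000, "lang": "en", "text": "Shot on goal."},
+    {"position_ms": 1500, "lang": "en", "text": "Kick-off."},
+]
+
+# Sort rows by timestamp, matching the order shown in the table.
+dense_captions.sort(key=lambda row: row["position_ms"])
+print([row["position_ms"] for row in dense_captions])  # prints [1500, 9000]
+```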
 
 ## Question/Answer
 
-1. Open `Q/A`.
-2. Add or select a sample question group.
-3. Use the add/edit dialog to choose a previous dataset question or enter custom text.
-4. Double-click a question group to edit it, or right-click it to edit/remove it.
-5. Click **Answer** to add an answer in a multiline dialog.
-6. Double-click an answer to edit it, or right-click it to edit/remove it.
-7. Answers are stored as grouped `answers` with `question` and `answers[]`.
+Use `Q/A` for grouped questions and one or more answers per question.
+
+1. Select a sample and open `Q/A`.
+2. Click **Add** to create a question group.
+3. Choose a previous dataset question or enter custom question text.
+4. Click **Answer** to add answer text.
+5. Double-click or right-click a question or answer to edit or remove it.
+
+Answers are stored as grouped `answers[]` entries with `question` and
+`answers[]`. The app does not write a top-level `questions` bank.
diff --git a/docs/batch_tools.md b/docs/batch_tools.md
index de355e0..7adade0 100644
--- a/docs/batch_tools.md
+++ b/docs/batch_tools.md
@@ -1,45 +1,59 @@
-# Batch Tools
+# Data Transfer and Batch Tools
 
-The app supports Hugging Face dataset transfer from the **Data** menu and script/API workflows for batch conversion.
+The app supports Hugging Face dataset transfer from the **Data** menu and
+script/API workflows for batch conversion. Dataset JSON inputs follow the
+[OSL JSON Format](OSL.md).
 
 ## In-App Data Menu
 
 ### Download Dataset from HF...
 
-- Opens a dialog for:
-  - repo ID
-  - branch/revision
-  - split
-  - format
-  - output directory
-  - optional token
-  - dry-run mode
-- Supports:
-  - JSON split downloads (`<split>.json`)
-  - Parquet split downloads (`<split>/`)
-- Writes files under `<output directory>/<revision>/<split>`.
-- On successful non-dry-run JSON download, source metadata is written into the JSON root:
-  - `hf_repo_id`
-  - `hf_branch`
-  - `hf_split`
+The download dialog asks for:
+
+- repo ID
+- branch/revision
+- split
+- format
+- output directory
+- optional token
+- dry-run mode
+
+It supports JSON split downloads (`<split>.json`) and Parquet split downloads
+(`<split>/`). Files are written under `<output directory>/<revision>/<split>`.
+
+For successful non-dry-run JSON downloads, source metadata is written into the
+JSON root:
+
+- `hf_repo_id`
+- `hf_branch`
+- `hf_split`
+
+!!! note "Dry-run support"
+    Dry-run size estimation is available for JSON downloads. Parquet downloads
+    run as real downloads/conversions.
 
 ### Upload Dataset to HF...
 
-Requires an opened dataset JSON from disk.
+Upload requires an opened dataset JSON from disk.
 
 Upload modes:
 
-- **Upload as JSON**: uploads current dataset JSON plus files referenced by `data[].inputs[].path` in one commit.
-- **Parquet + WebDataset**: converts locally, then uploads generated Parquet/shards (shard size configurable).
+- **Upload as JSON** uploads the current dataset JSON plus every file referenced
+  by `data[].inputs[].path`.
+- **Parquet + WebDataset** converts locally, then uploads generated
+  Parquet/WebDataset artifacts.
 
-If repository/branch is missing, the app can prompt to create it and retry.
+If the target repository or branch is missing, the app can prompt to create it
+and retry.
 
 ## CLI Scripts
 
-### Download referenced files
+Run commands from the repository root.
+
+### Download Referenced Files
 
 ```bash
-python test_data/download_osl_hf.py \
+python tools/download_osl_hf.py \
   --repo-id <org/repo> \
   --revision main \
   --split test \
@@ -48,14 +62,32 @@ python test_data/download_osl_hf.py \
   --dry-run
 ```
 
-### Upload referenced files
+### Upload Referenced Files
 
 ```bash
-python test_data/upload_osl_hf.py \
+python tools/upload_dataset_to_hf.py \
   --repo-id <org/repo> \
   --json-path <local_dataset.json> \
   --split test \
-  --revision main
+  --revision main \
+  --format json
+```
+
+### Convert JSON to Parquet + WebDataset
+
+```bash
+python tools/osl_json_to_parquet_webdataset.py \
+  annotations.json \
+  /path/to/media/root \
+  /path/to/output_dataset
+```
+
+### Convert Parquet + WebDataset Back to JSON
+
+```bash
+python tools/parquet_webdataset_to_osl_json.py \
+  /path/to/output_dataset \
+  reconstructed.json
 ```
 
 ## Python Conversion API
@@ -66,3 +98,5 @@ from opensportslib.tools import convert_json_to_parquet, convert_parquet_to_json
 convert_json_to_parquet(json_path="annotations.json", media_root=".", output_dir="out_parquet")
 convert_parquet_to_json(dataset_dir="out_parquet", output_json_path="reconstructed.json")
 ```
+
+For full script options, run any tool with `--help`.
diff --git a/docs/changelog.md b/docs/changelog.md
index 825c32f..86baad5 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -1 +1,14 @@
 # Changelog
+
+Release notes for packaged builds are published on GitHub Releases:
+
+- https://github.com/OpenSportsLab/VideoAnnotationTool/releases
+
+## Documentation Notes
+
+- The public site now treats [OSL JSON Format](OSL.md) as the canonical dataset
+  schema reference.
+- Workflow pages link back to the OSL format page instead of duplicating long
+  schema examples.
+- Saving/loading docs describe the current grouped Q/A format and no longer list
+  legacy top-level `questions` as persisted project data.
diff --git a/docs/getting_started.md b/docs/getting_started.md
index f25559f..8a781aa 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -1,5 +1,9 @@
 # Getting Started
 
+This walkthrough takes you from an empty project to a saved OSL JSON dataset.
+For field-level JSON details, use [OSL JSON Format](OSL.md) as the canonical
+reference.
+
 ## 1. Launch
 
 Start the app from the repository root:
@@ -10,33 +14,52 @@ python annotation_tool/main.py
 
 You will land on the **Welcome** screen.
 
-## 2. Create Or Open A Dataset
+## 2. Create or Open a Dataset
+
+- Choose **Create New Dataset** to start with a blank OSL JSON project.
+- Choose **Load Dataset** to open an existing `.json` file.
+- Reopen known files from the recent-datasets list when available.
 
-- **Create New Dataset** for a blank OSL dataset.
-- **Load Dataset** to open an existing JSON.
-- You can also reopen files from the recent-datasets list.
+!!! warning "Keep JSON and media paths together"
+    OSL input paths are usually relative to the dataset JSON file. If you move a
+    JSON file without moving the referenced media folders, playback may fail
+    until the paths are fixed or the dataset is saved again in the expected
+    location.
 
 ## 3. Add Samples
 
 In the Dataset Explorer:
 
-- Click **Add Data**.
-- Select files or folders.
-- Selected folders are treated as multi-input samples (for multi-view workflows).
+1. Click **Add Data**.
+2. Select one or more files, or select folders that contain supported files.
+3. Review the sample rows that appear in the tree.
+
+Selected files become separate samples. Selected folders are treated as
+multi-input samples for multi-view workflows. The app stores each input under
+`data[].inputs[]` and infers the input type from the file extension when needed.
 
 ## 4. Annotate
 
-Use the right-side tabs:
+Select a sample in the Dataset Explorer, then use the right-side annotation tabs:
 
-- `CLS` for classification labels
-- `LOC` for timestamped events
-- `DESC` for clip-level captions
-- `DENSE` for timestamped dense captions
-- `Q/A` for per-sample question groups and answers
+| Tab | Use it for | JSON field |
+|---|---|---|
+| `CLS` | Clip-level classification labels | `labels` |
+| `LOC` | Timestamped events | `events` |
+| `DESC` | Clip-level text captions | `captions` |
+| `DENSE` | Timestamped dense captions | `dense_captions` |
+| `Q/A` | Per-sample question groups and answers | `answers` |
+
+See [Annotating](annotating.md) for the per-mode workflow.
 
 ## 5. Save
 
-- `Ctrl+S` saves to the current JSON path.
-- `Ctrl+Shift+S` saves as a new JSON file.
+- **Save Dataset** (`Ctrl+S`) writes to the current JSON path.
+- **Save Dataset As** (`Ctrl+Shift+S`) writes to a new JSON path.
+
+On save, the app normalizes sample IDs, removes empty optional task blocks, and
+rewrites `data[].inputs[].path` relative to the saved JSON location when
+possible. See [Saving and Loading](saving_loading.md) for the full save behavior.
 
-When you close with unsaved changes, you can **Save**, **Save As**, **Close Without Saving**, or **Cancel**.
+When you close with unsaved changes, choose **Save**, **Save As**,
+**Close Without Saving**, or **Cancel**.
diff --git a/docs/saving_loading.md b/docs/saving_loading.md
index e195b78..740642a 100644
--- a/docs/saving_loading.md
+++ b/docs/saving_loading.md
@@ -1,21 +1,42 @@
-# Saving And Loading
+# Saving and Loading
+
+This page covers project file behavior. For the exact JSON structure, see
+[OSL JSON Format](OSL.md).
 
 ## Loading
 
 Use **File > Load Dataset** (`Ctrl+O`) to open an OSL JSON file.
 
-The app normalizes loaded content (for example, missing IDs) while preserving unknown root/sample fields when possible.
+On load, the app validates that the root is a JSON object and that top-level
+`data` is a list. It also fills missing standard root fields, normalizes missing
+or duplicate sample IDs, canonicalizes known input types, and preserves unknown
+root/sample fields when possible.
+
+!!! note "Relative input paths"
+    Relative `data[].inputs[].path` values are resolved from the directory that
+    contains the opened JSON file.
 
 ## Saving
 
-- **Save Dataset** (`Ctrl+S`): writes to current path.
-- **Save Dataset As** (`Ctrl+Shift+S`): writes to a new path.
+- **Save Dataset** (`Ctrl+S`) writes to the current path.
+- **Save Dataset As** (`Ctrl+Shift+S`) writes to a new path.
+
+On write, paths in `data[].inputs[].path` are rewritten relative to the chosen
+save location when possible.
+
+Save/export also:
 
-On write, paths in `data[].inputs[].path` are rewritten relative to the chosen save location.
+- Ensures sample IDs are unique.
+- Recomputes `modalities` from sample inputs.
+- Removes empty optional sample blocks such as `labels`, `events`, `captions`,
+  `dense_captions`, `answers`, and `metadata`.
+- Normalizes Q/A payloads to grouped `answers[]` entries with non-empty answer
+  text.
+- Removes retired smart keys such as `smart_labels` and `smart_events`.
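+
+The removal of empty blocks can be approximated in a script. This is a sketch
+of the listed behavior only, not the app's save code: `drop_empty_blocks` is a
+hypothetical name, and the tuple of block names is taken from the list above,
+which may not be exhaustive.
+
+```python
+OPTIONAL_BLOCKS = ("labels", "events", "captions", "dense_captions", "answers", "metadata")
+
+
+def drop_empty_blocks(sample):
+    """Return a copy of a sample without empty optional task blocks."""
+    return {
+        key: value
+        for key, value in sample.items()
+        if not (key in OPTIONAL_BLOCKS and not value)
+    }
+
+
+sample = {"id": "clip_0001", "inputs": [], "labels": {}, "events": []}
+print(drop_empty_blocks(sample))  # prints {'id': 'clip_0001', 'inputs': []}
+```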
 
 ## Close With Unsaved Changes
 
-If the dataset is dirty and you close/quit, the app prompts:
+If the dataset has unsaved changes and you close or quit, the app prompts:
 
 - **Save**
 - **Save As**
@@ -24,9 +45,12 @@ If the dataset is dirty and you close/quit, the app prompts:
 
 ## What Is Persisted
 
-- Core OSL fields (`labels`, `events`, `captions`, `dense_captions`, `questions`, `answers`)
-- Unknown/custom root/sample fields
+- Standard OSL fields such as `labels`, `events`, `captions`,
+  `dense_captions`, and grouped `answers`.
+- Unknown/custom root and sample fields when they do not conflict with retired
+  fields.
 
-### Not persisted in dataset JSON
+### Not Persisted in Dataset JSON
 
-- Localization `label_colors` (stored in app settings)
+- Legacy top-level `questions` and per-answer `question_id` entries.
+- Localization `label_colors`, which are stored in app settings.
diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
index 1ae040f..159fceb 100644
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@@ -1,6 +1,6 @@
 # Troubleshooting
 
-## App Fails To Start
+## App Fails to Start
 
 - Confirm your environment is active.
 - Reinstall dependencies:
@@ -9,6 +9,32 @@
 pip install -r requirements.txt
 ```
 
+## Dataset JSON Does Not Load
+
+- Confirm the file is valid JSON.
+- Confirm the root value is a JSON object, not an array.
+- Confirm top-level `data` is a list.
+- Check the expected structure in [OSL JSON Format](OSL.md).
+
+Legacy VQA files that use top-level `questions` and per-answer `question_id`
+entries should be converted before editing:
+
+```bash
+python tools/convert_legacy_vqa_to_grouped.py \
+  --input-json old_vqa.json \
+  --output-json grouped_vqa.json
+```
+
+## Media Files Are Missing After Loading
+
+Relative paths in `data[].inputs[].path` are resolved from the directory that
+contains the dataset JSON. If the JSON was moved separately from its media
+folders, move the folders back beside the JSON or update the paths.
+
+!!! tip "Saving can repair path layout"
+    After opening a dataset from the intended folder, **Save Dataset As** can
+    rewrite input paths relative to the new JSON location.
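+
+To list which inputs no longer resolve, a small script can apply the same
+resolution rule. This is a sketch under the assumption that every input has a
+`path` key; `find_missing_inputs` is a hypothetical helper, not part of the
+repository.
+
+```python
+import json
+from pathlib import Path
+
+
+def find_missing_inputs(json_path):
+    """Return (sample id, path) pairs whose media file does not exist."""
+    json_path = Path(json_path)
+    doc = json.loads(json_path.read_text(encoding="utf-8"))
+    missing = []
+    for sample in doc.get("data", []):
+        for item in sample.get("inputs", []):
+            # Relative paths resolve from the folder containing the JSON;
+            # joining with an absolute path leaves it unchanged.
+            resolved = json_path.parent / item["path"]
+            if not resolved.exists():
+                missing.append((sample.get("id"), item["path"]))
+    return missing
+```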
+
 ## Hugging Face Transfer Errors
 
 - Ensure `huggingface_hub` is installed.
@@ -19,15 +45,18 @@ huggingface-cli login
 ```
 
 - For upload failures:
-  - `Repository Not Found`: create repo or let the app create it from the prompt.
-  - `Revision/Branch Not Found`: create branch or let the app create it from the prompt.
+  - `Repository Not Found`: create the repo or let the app create it from the prompt.
+  - `Revision/Branch Not Found`: create the branch or let the app create it from the prompt.
 
 ## Download URL 404 / Not Found
 
-- Verify the URL points to the intended file or folder in the dataset repo.
-- If a previously successful URL is now invalid, reselect/correct it in the dialog.
+- Verify the repo ID, revision, split, and format in the dialog.
+- JSON mode expects `<split>.json`.
+- Parquet mode expects a `<split>/` folder.
+- If a previously successful URL is now invalid, reselect or correct it in the
+  dialog.
 
-## Video Playback Error Or Black Screen
+## Video Playback Error or Black Screen
 
 Some codecs are not decoded by your platform backend. Convert to H.264/AAC MP4:
 
@@ -35,6 +64,9 @@ Some codecs are not decoded by your platform backend. Convert to H.264/AAC MP4:
 ffmpeg -i input.mp4 -vcodec libx264 -acodec aac output.mp4
 ```
 
-## No Playback After Selecting A Row
+## No Playback After Selecting a Row
 
-If the selected input is non-video (for example text metadata), this is expected: the row stays selectable but media playback is not started.
+If the selected input is not playable media for the current backend, the row can
+stay selected while playback does not start. For example, `frames_npy` and
+`tracking_parquet` inputs use specialized renderers, and unsupported text or
+metadata files are not played as video.
diff --git a/mkdocs.yml b/mkdocs.yml
index 601c6ac..9887af4 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -3,6 +3,9 @@ site_description: A PyQt6 GUI tool for analyzing and annotating OSL datasets (Op
 site_author: OpenSportsLab
 repo_url: https://github.com/OpenSportsLab/VideoAnnotationTool
 repo_name: OpenSportsLab/VideoAnnotationTool
+exclude_docs: |
+  README.md
+  assets/README.md
 
 theme:
   name: material