OpenSportsLab · SilvioGiancola · May 19, 2026
diff --git a/docs/OSL.md b/docs/OSL.md
@@ -7,6 +7,49 @@ An OSL JSON file is a single JSON object with dataset metadata, a label schema,
 and a `data` array of samples. Each sample points to one or more media inputs and
 can carry task-specific annotations.
 
+## Minimal Valid File
+
+This is the smallest practical shape for a dataset with one video sample:
+
+```json
+{
+  "version": "2.0",
+  "date": "2026-05-19",
+  "dataset_name": "minimal-demo",
+  "description": "",
+  "modalities": ["video"],
+  "metadata": {},
+  "labels": {},
+  "data": [
+    {
+      "id": "clip_0001",
+      "inputs": [
+        {
+          "type": "video",
+          "path": "clips/clip_0001.mp4"
+        }
+      ]
+    }
+  ]
+}
+```
+
+!!! note "Relative paths"
+    Relative `inputs[].path` values are resolved from the folder that contains
+    the JSON file. If you move the JSON without moving its media folders,
+    playback can fail.
+
+## Common Mistakes
+
+| Mistake | Result | Fix |
+|---|---|---|
+| Root JSON is an array | The app rejects the file. | Use one root object with a `data` array. |
+| `data` is missing or not a list | The app rejects the file. | Set `data` to `[]` or a list of sample objects. |
+| Using top-level `questions` for Q/A | Legacy question banks are dropped on save. | Store Q/A in each sample's grouped `answers[]`. |
+| Dense captions use `start_ms`/`end_ms` only | The current dense editor expects point timestamps. | Use `dense_captions[].position_ms`. |
+| Annotation head names do not match root `labels` | Controls may not show the expected labels. | Keep `data[].labels` keys and `events[].head` values aligned with root `labels`. |
+| Relative media paths no longer point to files | Samples load but playback cannot find media. | Keep media beside the JSON or resave after correcting paths. |
+
 ## Top-Level Object
 
 The smallest useful file is a JSON object with `data` as a list. When loading,

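Note on the Common Mistakes table in `docs/OSL.md`: most rows can be checked before opening a file in the app. The following is a minimal sketch, not the app's own loader. The function name `check_osl` is hypothetical, and it assumes that the keys of the root `labels` object are the head names that `data[].labels` and `events[].head` must match.

```python
from pathlib import Path


def check_osl(root, json_dir=None):
    """Return a list of problems found in a parsed OSL document.

    `root` is the result of json.load(); `json_dir` is the folder that
    contains the JSON file (media paths are only checked when it is given).
    """
    if not isinstance(root, dict):
        return ["root JSON must be an object with a `data` array"]
    data = root.get("data")
    if not isinstance(data, list):
        return ["`data` is missing or not a list"]

    problems = []
    if "questions" in root:
        problems.append("top-level `questions` is legacy; use per-sample `answers[]`")

    # Assumption: head names are the keys of the root `labels` object.
    heads = set(root.get("labels") or {})

    for index, sample in enumerate(data):
        if not isinstance(sample, dict):
            problems.append(f"data[{index}] is not an object")
            continue
        sample_id = sample.get("id", index)
        for head in sample.get("labels") or {}:
            if head not in heads:
                problems.append(f"{sample_id}: label head {head!r} not in root `labels`")
        for event in sample.get("events") or []:
            if event.get("head") not in heads:
                problems.append(f"{sample_id}: event head {event.get('head')!r} not in root `labels`")
        for caption in sample.get("dense_captions") or []:
            if "position_ms" not in caption:
                problems.append(f"{sample_id}: dense caption without `position_ms`")
        if json_dir is not None:
            for item in sample.get("inputs") or []:
                # Relative paths resolve from the folder that holds the JSON.
                media = Path(json_dir) / item.get("path", "")
                if not media.is_file():
                    problems.append(f"{sample_id}: media not found at {media}")
    return problems
```

Calling `check_osl(json.load(f), Path(json_path).parent)` on the minimal file shown in the diff would report only missing media, if any. An empty list means none of the listed mistakes were detected.
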
diff --git a/docs/annotating.md b/docs/annotating.md
@@ -1,38 +1,70 @@
 # Annotating
 
+All annotation tabs work on the currently selected sample from the Dataset
+Explorer. The JSON field names below match the canonical [OSL JSON Format](OSL.md)
+page.
+
 ## Classification
 
-1. Select a sample in the Dataset Explorer.
+Use `CLS` for clip-level labels.
+
+1. Select a sample.
 2. Open `CLS`.
-3. Choose labels in each head.
-4. Changes persist immediately when they are effective.
+3. Add or choose label heads and labels.
+4. Select the label values for the current sample.
+
+Manual changes that actually alter a label are saved immediately into the
+sample's `labels` object. Single-label heads write `{"label": "..."}` and
+multi-label heads write `{"labels": [...]}`. Smart predictions carry a
+`confidence_score` until they are confirmed or rejected.
 
 ## Localization
 
+Use `LOC` for point events on the timeline.
+
 1. Select a sample and open `LOC`.
-2. Use spotting buttons to add events at current time.
-3. Edit or delete events from the event table.
-4. Optional: run smart inference for a selected head.
+2. Choose a label head and label.
+3. Move the playhead to the event time.
+4. Use the spotting controls to add the event.
+5. Edit or delete rows in the event table when needed.
+
+Events are stored in `events[]` with `head`, `label`, and `position_ms`.
+Smart inference can add predicted rows with `confidence_score`; confirming a row
+keeps the event and removes only its `confidence_score`.
 
 ## Description
 
+Use `DESC` for one clip-level caption.
+
 1. Select a sample and open `DESC`.
-2. Edit caption text.
-3. Autosave stores the caption in `captions`.
+2. Enter or edit the caption text.
+3. Wait for autosave or save the project.
+
+The text is stored in `captions[]`. Manual description edits currently write an
+English caption entry with `lang` set to `en`.
 
 ## Dense Description
 
+Use `DENSE` for timestamped text descriptions.
+
 1. Select a sample and open `DENSE`.
-2. Click **Add New Description**.
-3. Enter text in the modal; event is stored at current `position_ms`.
-4. Edit time/text from the table when needed.
+2. Move the playhead to the desired timestamp.
+3. Click **Add New Description**.
+4. Enter text in the modal.
+5. Edit time or text from the table when needed.
+
+Dense descriptions are stored in `dense_captions[]` with `position_ms`, `lang`,
+and `text`. The table keeps rows ordered by timestamp.
 
 ## Question/Answer
 
-1. Open `Q/A`.
-2. Add or select a sample question group.
-3. Use the add/edit dialog to choose a previous dataset question or enter custom text.
-4. Double-click a question group to edit it, or right-click it to edit/remove it.
-5. Click **Answer** to add an answer in a multiline dialog.
-6. Double-click an answer to edit it, or right-click it to edit/remove it.
-7. Answers are stored as grouped `answers` with `question` and `answers[]`.
+Use `Q/A` for grouped questions and one or more answers per question.
+
+1. Select a sample and open `Q/A`.
+2. Click **Add** to create a question group.
+3. Choose a previous dataset question or enter custom question text.
+4. Click **Answer** to add answer text.
+5. Double-click or right-click a question or answer to edit or remove it.
+
+Each entry in the sample's `answers[]` array groups one `question` with its own
+nested `answers[]` list. The app does not write a top-level `questions` bank.
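
Note on `docs/annotating.md`: the per-tab storage rules are easier to compare side by side in one sample. The sketch below builds such a sample as a Python dict. The head names, label values, and texts are invented for illustration. The `text` key on `captions[]` entries and the use of plain strings inside the nested `answers[]` list are assumptions, since the diff names only `lang` for captions and does not show the answer element type. `confirm_prediction` is a hypothetical helper that mirrors the described confirm behavior.

```python
def confirm_prediction(entry):
    """Return a copy of a predicted entry without its `confidence_score`."""
    return {key: value for key, value in entry.items() if key != "confidence_score"}


sample = {
    "id": "clip_0001",
    "inputs": [{"type": "video", "path": "clips/clip_0001.mp4"}],
    # CLS: one entry per head; the value shape depends on the head type.
    "labels": {
        "action": {"label": "pass"},                  # single-label head
        "attributes": {"labels": ["long", "aerial"]},  # multi-label head
    },
    # LOC: point events; predicted rows carry a confidence_score.
    "events": [
        {"head": "action", "label": "pass", "position_ms": 12400},
        {"head": "action", "label": "shot", "position_ms": 15100, "confidence_score": 0.87},
    ],
    # DESC: clip-level caption (the `text` key is an assumption).
    "captions": [{"lang": "en", "text": "A long pass followed by a shot."}],
    # DENSE: timestamped descriptions.
    "dense_captions": [
        {"position_ms": 15100, "lang": "en", "text": "Shot on goal."},
        {"position_ms": 12400, "lang": "en", "text": "Long pass."},
    ],
    # Q/A: one group per question (string answers are an assumption).
    "answers": [{"question": "Who takes the shot?", "answers": ["The striker."]}],
}

# Confirming a predicted event keeps the event and drops only the score.
sample["events"][1] = confirm_prediction(sample["events"][1])

# The dense table keeps rows ordered by timestamp.
sample["dense_captions"].sort(key=lambda row: row["position_ms"])
```
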
diff --git a/docs/batch_tools.md b/docs/batch_tools.md
@@ -1,45 +1,59 @@
-# Batch Tools
+# Data Transfer and Batch Tools
 
-The app supports Hugging Face dataset transfer from the **Data** menu and script/API workflows for batch conversion.
+The app supports Hugging Face dataset transfer from the **Data** menu and
+script/API workflows for batch conversion. Dataset JSON inputs follow the
+[OSL JSON Format](OSL.md).
 
 ## In-App Data Menu
 
 ### Download Dataset from HF...
 
-- Opens a dialog for:
-  - repo ID
-  - branch/revision
-  - split
-  - format
-  - output directory
-  - optional token
-  - dry-run mode
-- Supports:
-  - JSON split downloads (`<split>.json`)
-  - Parquet split downloads (`<split>/`)
-- Writes files under `<output directory>/<revision>/<split>`.
-- On successful non-dry-run JSON download, source metadata is written into the JSON root:
-  - `hf_repo_id`
-  - `hf_branch`
-  - `hf_split`
+The download dialog asks for:
+
+- repo ID
+- branch/revision
+- split
+- format
+- output directory
+- optional token
+- dry-run mode
+
+It supports JSON split downloads (`<split>.json`) and Parquet split downloads
+(`<split>/`). Files are written under `<output directory>/<revision>/<split>`.
+
+For successful non-dry-run JSON downloads, source metadata is written into the
+JSON root:
+
+- `hf_repo_id`
+- `hf_branch`
+- `hf_split`
+
+!!! note "Dry-run support"
+    Dry-run size estimation is available only for JSON downloads. Parquet
+    downloads always run as real downloads and conversions.
 
 ### Upload Dataset to HF...
 
-Requires an opened dataset JSON from disk.
+Upload requires an opened dataset JSON from disk.
 
 Upload modes:
 
-- **Upload as JSON**: uploads current dataset JSON plus files referenced by `data[].inputs[].path` in one commit.
-- **Parquet + WebDataset**: converts locally, then uploads generated Parquet/shards (shard size configurable).
+- **Upload as JSON** uploads the current dataset JSON plus every file referenced
+  by `data[].inputs[].path`.
+- **Parquet + WebDataset** converts locally, then uploads generated
+  Parquet/WebDataset artifacts.
 
-If repository/branch is missing, the app can prompt to create it and retry.
+If the target repository or branch is missing, the app can prompt to create it
+and retry.
 
 ## CLI Scripts
 
-### Download referenced files
+Run commands from the repository root.
+
+### Download Referenced Files
 
 ```bash
-python test_data/download_osl_hf.py \
+python tools/download_osl_hf.py \
   --repo-id <org/repo> \
   --revision main \
   --split test \
@@ -48,14 +62,32 @@ python test_data/download_osl_hf.py \
   --dry-run
 ```
 
-### Upload referenced files
+### Upload Referenced Files
 
 ```bash
-python test_data/upload_osl_hf.py \
+python tools/upload_dataset_to_hf.py \
   --repo-id <org/repo> \
   --json-path <local_dataset.json> \
   --split test \
-  --revision main
+  --revision main \
+  --format json
+```
+
+### Convert JSON to Parquet + WebDataset
+
+```bash
+python tools/osl_json_to_parquet_webdataset.py \
+  annotations.json \
+  /path/to/media/root \
+  /path/to/output_dataset
+```
+
+### Convert Parquet + WebDataset Back to JSON
+
+```bash
+python tools/parquet_webdataset_to_osl_json.py \
+  /path/to/output_dataset \
+  reconstructed.json
 ```
 
 ## Python Conversion API
@@ -66,3 +98,5 @@ from opensportslib.tools import convert_json_to_parquet, convert_parquet_to_json
 convert_json_to_parquet(json_path="annotations.json", media_root=".", output_dir="out_parquet")
 convert_parquet_to_json(dataset_dir="out_parquet", output_json_path="reconstructed.json")
 ```
+
+For full script options, run any tool with `--help`.
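
Note on `docs/batch_tools.md`: two rules in this page are simple enough to express in code, namely the download output layout (`<output directory>/<revision>/<split>`) and the set of files that **Upload as JSON** sends alongside the dataset. The sketch below is illustrative only and is not taken from the scripts in `tools/`. Both function names are hypothetical.

```python
from pathlib import Path


def download_target(output_dir, revision, split):
    """Folder where a split is written: <output directory>/<revision>/<split>."""
    return Path(output_dir) / revision / split


def referenced_files(dataset, json_path):
    """List every file referenced by data[].inputs[].path, without duplicates.

    Relative paths are resolved from the folder that contains the JSON file.
    """
    base = Path(json_path).parent
    files = []
    for sample in dataset.get("data", []):
        for item in sample.get("inputs", []):
            path = item.get("path")
            if not path:
                continue
            resolved = base / path
            if resolved not in files:
                files.append(resolved)
    return files
```

A pre-upload check could call `referenced_files(...)` and confirm that each returned path exists before starting the transfer.
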
diff --git a/docs/changelog.md b/docs/changelog.md
@@ -1 +1,14 @@
 # Changelog
+
+Release notes for packaged builds are published on GitHub Releases:
+
+- https://github.com/OpenSportsLab/VideoAnnotationTool/releases
+
+## Documentation Notes
+
+- The public site now treats [OSL JSON Format](OSL.md) as the canonical dataset
+  schema reference.
+- Workflow pages link back to the OSL format page instead of duplicating long
+  schema examples.
+- Saving/loading docs describe the current grouped Q/A format and no longer list
+  legacy top-level `questions` as persisted project data.
diff --git a/docs/getting_started.md b/docs/getting_started.md
@@ -1,5 +1,9 @@
 # Getting Started
 
+This walkthrough takes you from an empty project to a saved OSL JSON dataset.
+For field-level JSON details, use [OSL JSON Format](OSL.md) as the canonical
+reference.
+
 ## 1. Launch
 
 Start the app from the repository root:
@@ -10,33 +14,52 @@ python annotation_tool/main.py
 
 You will land on the **Welcome** screen.
 
-## 2. Create Or Open A Dataset
+## 2. Create or Open a Dataset
+
+- Choose **Create New Dataset** to start with a blank OSL JSON project.
+- Choose **Load Dataset** to open an existing `.json` file.
+- Reopen known files from the recent-datasets list when available.
 
-- **Create New Dataset** for a blank OSL dataset.
-- **Load Dataset** to open an existing JSON.
-- You can also reopen files from the recent-datasets list.
+!!! warning "Keep JSON and media paths together"
+    OSL input paths are usually relative to the dataset JSON file. If you move a
+    JSON file without moving the referenced media folders, playback may fail
+    until the paths are fixed or the dataset is saved again in the expected
+    location.
 
 ## 3. Add Samples
 
 In the Dataset Explorer:
 
-- Click **Add Data**.
-- Select files or folders.
-- Selected folders are treated as multi-input samples (for multi-view workflows).
+1. Click **Add Data**.
+2. Select one or more files, or select folders that contain supported files.
+3. Review the sample rows that appear in the tree.
+
+Selected files become separate samples. Selected folders are treated as
+multi-input samples for multi-view workflows. The app stores each input under
+`data[].inputs[]` and infers the input type from the file extension when needed.
 
 ## 4. Annotate
 
-Use the right-side tabs:
+Select a sample in the Dataset Explorer, then use the right-side annotation tabs:
 
-- `CLS` for classification labels
-- `LOC` for timestamped events
-- `DESC` for clip-level captions
-- `DENSE` for timestamped dense captions
-- `Q/A` for per-sample question groups and answers
+| Tab | Use it for | JSON field |
+|---|---|---|
+| `CLS` | Clip-level classification labels | `labels` |
+| `LOC` | Timestamped events | `events` |
+| `DESC` | Clip-level text captions | `captions` |
+| `DENSE` | Timestamped dense captions | `dense_captions` |
+| `Q/A` | Per-sample question groups and answers | `answers` |
+
+See [Annotating](annotating.md) for the per-mode workflow.
 
 ## 5. Save
 
-- `Ctrl+S` saves to the current JSON path.
-- `Ctrl+Shift+S` saves as a new JSON file.
+- **Save Dataset** (`Ctrl+S`) writes to the current JSON path.
+- **Save Dataset As** (`Ctrl+Shift+S`) writes to a new JSON path.
+
+On save, the app normalizes sample IDs, removes empty optional task blocks, and
+rewrites `data[].inputs[].path` relative to the saved JSON location when
+possible. See [Saving and Loading](saving_loading.md) for the full save behavior.
 
-When you close with unsaved changes, you can **Save**, **Save As**, **Close Without Saving**, or **Cancel**.
+When you close with unsaved changes, choose **Save**, **Save As**,
+**Close Without Saving**, or **Cancel**.
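
Note on the save behavior described in `docs/getting_started.md`: the path rewrite and the removal of empty task blocks can be sketched as below. This is an illustration under stated assumptions, not the app's save code. The function name is hypothetical, and the list of optional blocks is assumed from the JSON fields in the tab table. Sample ID normalization is left out because the diff does not describe it.

```python
import os

# Assumption: these are the optional per-sample task blocks.
OPTIONAL_BLOCKS = ("labels", "events", "captions", "dense_captions", "answers")


def prepare_sample_for_save(sample, old_json_dir, new_json_dir):
    """Return a copy of `sample` ready to be written next to a new JSON path."""
    out = dict(sample)

    # Remove empty optional task blocks such as "events": [].
    for key in OPTIONAL_BLOCKS:
        if key in out and not out[key]:
            del out[key]

    if "inputs" in sample:
        rewritten = []
        for item in sample["inputs"]:
            item = dict(item)
            # Resolve against the old JSON folder; absolute paths pass through.
            absolute = os.path.normpath(os.path.join(old_json_dir, item["path"]))
            try:
                relative = os.path.relpath(absolute, new_json_dir)
                item["path"] = relative.replace(os.sep, "/")
            except ValueError:
                # No relative path exists (for example, different Windows drives).
                item["path"] = absolute
            rewritten.append(item)
        out["inputs"] = rewritten
    return out
```

For a dataset at `/data/project` saved as a new file under `/data/project/export`, an input path of `clips/a.mp4` would become `../clips/a.mp4`, so playback still finds the media without moving it.
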