menily/toolkit

The Python reference implementation of menily/schema. Three adapters (POV video, VR hand-tracking, motion capture), together with an RLDS importer for existing robot teleoperation data, convert heterogeneous raw data into unified task-level VLA demonstration data.

What this is

menily/toolkit is a Python library that ingests four classes of raw data:

📹 First-person video (smartphone / GoPro / Vision Pro)
🎮 VR hand-tracking logs (Meta Quest Pro / Apple Vision Pro / PICO 4U)
🎭 Motion capture (BVH / FBX / C3D)
🤖 Robot teleoperation traces (HDF5 / pickle / RLDS)

…and converts every one of them into the same task-level format defined by menily/schema v1.

Why it exists

Training a VLA (vision–language–action) model requires millions of task-level demonstrations — not raw video frames, not isolated motion clips, but trajectories annotated with the task being executed, the visual context, the action sequence, and the body morphology at a resolution that downstream policies can actually learn from.

Most teams today end up in one of three dead ends:

Hand-label video frame by frame (slow, expensive, non-scalable)
Train only on simulation (domain gap, unrealistic motion)
Depend on a proprietary robot teleoperation lab (geographically concentrated, politically fragile)

menily/toolkit is the preprocessing layer that lets all four raw data sources feed into the same VLA training pipeline without per-source glue code.

Architecture

raw input                    adapter              task-level output
──────────                   ────────              ──────────────────
smartphone video   ─┐                      
VR demonstration   ─┼─► segmentation ──► alignment ──► menily/schema v1
motion capture     ─┤    ▲                  ▲                ▲
teleoperation      ─┘    │                  │                │
                    language prompts   action space    VLA training

Each adapter produces the same Task object conforming to menily/schema v1 — one file, one task, fully self-contained.
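To make "one file, one task, fully self-contained" concrete, here is a minimal sketch of what such a task object could look like. The class name `Task`, its fields (`task_id`, `language`, `body_morphology`, `fps`, `actions`, `visual_context`) and the `save`/`load` methods are hypothetical illustrations built from the ingredients listed under Why it exists. They are not the published menily/schema v1 field set or the toolkit's actual API.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Task:
    """Hypothetical, simplified task unit: one task per self-contained JSON file."""

    task_id: str
    language: str                 # natural-language description of the task
    body_morphology: str          # e.g. "bimanual_humanoid"
    fps: float                    # sampling rate of the action sequence
    actions: list[list[float]]    # one action vector per timestep
    visual_context: list[str] = field(default_factory=list)  # e.g. frame file paths

    def save(self, out_dir: str | Path) -> Path:
        """Write this task to <out_dir>/<task_id>.json and return the path."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        path = out / f"{self.task_id}.json"
        path.write_text(json.dumps(asdict(self), ensure_ascii=False, indent=2),
                        encoding="utf-8")
        return path

    @classmethod
    def load(cls, path: str | Path) -> Task:
        """Rebuild a Task from a file written by save()."""
        return cls(**json.loads(Path(path).read_text(encoding="utf-8")))


if __name__ == "__main__":
    import tempfile

    demo = Task("pour_water_000", "Pour water from the blue cup into the kettle.",
                "bimanual_humanoid", 30.0, [[0.0, 0.0, 0.1], [0.0, 0.1, 0.1]])
    with tempfile.TemporaryDirectory() as tmp:
        saved = demo.save(tmp)
        print(saved.name, Task.load(saved) == demo)
```

Keeping everything a task needs inside one file means a training job can shard, shuffle, or drop tasks independently without consulting a side index.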

Installation

⚠️ menily-toolkit is in internal alpha, and a PyPI release is planned (see Status and roadmap). For early access, install from source:

# Future (PyPI)
pip install menily-toolkit

# Today (from source, internal alpha)
git clone https://github.com/MenilyIntelligence/toolkit
cd toolkit
pip install -e .

Dependencies: Python 3.10+, NumPy, PyTorch, mediapipe (for POV video hand-keypoint detection), ffmpeg (for video I/O).
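Before installing, a quick preflight check can confirm that the dependencies listed above are present. This is a small standalone script, not part of menily-toolkit. It only assumes the standard import names `numpy`, `torch` (PyTorch) and `mediapipe`, and looks for an `ffmpeg` executable on `PATH`.

```python
import importlib.util
import shutil
import sys

# Import names for the Python dependencies; PyTorch is imported as "torch".
REQUIRED_MODULES = ("numpy", "torch", "mediapipe")


def missing_dependencies() -> list[str]:
    """Return human-readable names of dependencies that are not available."""
    missing = []
    if sys.version_info < (3, 10):
        missing.append(f"Python 3.10+ (found {sys.version.split()[0]})")
    for module in REQUIRED_MODULES:
        # find_spec returns None when the module cannot be imported
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg (not on PATH)")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else "Missing: " + ", ".join(problems))
```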

Quick start

from menily.toolkit import pov

# POV video → task-level demonstration(s)
tasks = pov.segment(
    video_path="./demo_pour_water.mp4",
    language="Pour water from the blue cup into the kettle.",
    # Paraphrases of the same instruction (Chinese and English)
    language_variants=[
        "把蓝色杯子里的水倒进水壶里",
        "Fill the kettle with water from the blue cup",
    ],
    fps=30,                               # resample frames to 30 Hz
    viewpoint="ego",                      # egocentric (first-person) camera
    body_morphology="bimanual_humanoid",
    collection_region="SEA",
)

# Validate + save each segmented task; stop on the first schema violation
for task in tasks:
    report = task.validate()
    if not report.passed:
        raise ValueError(f"task failed menily/schema v1 validation: {report}")
    task.save_as(schema="menily.task-demo/1", out_dir="./out/")

Output: one JSON file per task under ./out/, each conforming to menily/schema v1.
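The Quick start relies only on `report.passed`. The sketch below shows one plausible shape for that report and the kind of structural checks a schema validator might run. `ValidationReport`, `validate_task` and every individual check are assumptions made for illustration; the actual menily/schema v1 rules may differ.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Hypothetical report: validation passes when no errors were collected."""

    errors: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.errors


def validate_task(language: str, fps: float, timestamps: list[float],
                  actions: list[list[float]]) -> ValidationReport:
    """Run illustrative structural checks and collect every failure."""
    report = ValidationReport()
    if not language.strip():
        report.errors.append("language instruction is empty")
    if fps <= 0:
        report.errors.append(f"fps must be positive, got {fps}")
    if len(timestamps) != len(actions):
        report.errors.append(
            f"{len(timestamps)} timestamps but {len(actions)} action vectors")
    if any(b <= a for a, b in zip(timestamps, timestamps[1:])):
        report.errors.append("timestamps are not strictly increasing")
    if any(not math.isfinite(v) for step in actions for v in step):
        report.errors.append("actions contain NaN or infinite values")
    dims = {len(step) for step in actions}
    if len(dims) > 1:
        report.errors.append(f"inconsistent action dimensions: {sorted(dims)}")
    return report
```

Collecting all errors instead of failing on the first one makes a report useful for batch triage of thousands of segmented tasks.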

Adapters in detail

`toolkit.pov` — First-person video

Input: MP4/MOV from smartphone, GoPro, Vision Pro, or any egocentric camera.

Pipeline:

Frame sampling (resample to the target fps, default 30 Hz)
Hand keypoint detection (MediaPipe or HaMeR)
Trajectory reconstruction (keypoints → 6-DoF end-effector pose)
Task segmentation (optical flow + action-energy + language-timestamp anchors)
Per-segment task object emission
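Two of the steps above can be sketched in a few lines: resampling to a target frame rate, and cutting a trajectory into segments. This is a simplified illustration, not the toolkit's pipeline. `resample_indices` picks the nearest source frame for each tick of a uniform grid, and `segment_by_energy` treats squared wrist speed as the "action energy" and keeps sustained high-energy runs. It ignores optical flow and language-timestamp anchors entirely, and the function names, `threshold` and `min_len` are hypothetical.

```python
from __future__ import annotations

import math


def resample_indices(n_frames: int, src_fps: float, dst_fps: float) -> list[int]:
    """Nearest source-frame index for each tick of a uniform dst_fps time grid."""
    n_out = math.floor(n_frames / src_fps * dst_fps)
    return [min(n_frames - 1, math.floor(i * src_fps / dst_fps + 0.5))
            for i in range(n_out)]


def segment_by_energy(positions: list[tuple[float, float, float]], fps: float,
                      threshold: float, min_len: int = 3) -> list[tuple[int, int]]:
    """Return [start, end) frame spans where 'action energy' stays above threshold.

    Action energy is approximated here as squared wrist speed (units^2 / s^2).
    Runs shorter than min_len frames are discarded as jitter.
    """
    # energy[i] measures motion between frame i-1 and frame i; frame 0 has none
    energy = [0.0]
    for p, q in zip(positions, positions[1:]):
        energy.append(sum((b - a) ** 2 for a, b in zip(p, q)) * fps * fps)

    segments: list[tuple[int, int]] = []
    start = None
    for i, e in enumerate(energy):
        if e > threshold and start is None:
            start = i                      # motion begins
        elif e <= threshold and start is not None:
            if i - start >= min_len:       # motion ends; keep it if long enough
                segments.append((start, i))
            start = None
    if start is not None and len(energy) - start >= min_len:
        segments.append((start, len(energy)))  # motion runs to the last frame
    return segments
```

A pure speed threshold splits on pauses, so two tasks performed back to back without stopping would merge into one segment; that is the gap language-timestamp anchors are meant to close.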

from menily.toolkit import pov

tasks = pov.segment(
    video_path="./raw/demo.mp4",
    language="...",
    fps=30,
    viewpoint="ego",
    body_morphology="bimanual_humanoid",
    collection_region="SEA",
)
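For the trajectory-reconstruction step (keypoints → 6-DoF end-effector pose), one common approach is to build a hand frame from three palm keypoints. The sketch assumes MediaPipe's 21-landmark hand model, where the wrist, index-finger MCP and pinky MCP are landmarks 0, 5 and 17. The axis convention (x toward the knuckles, z along the palm normal) and the function `hand_pose` are choices made for this example, not the toolkit's definition.

```python
from __future__ import annotations

import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a: Vec3) -> Vec3:
    n = math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)
    if n < 1e-9:
        raise ValueError("degenerate keypoints: cannot build a hand frame")
    return (a[0] / n, a[1] / n, a[2] / n)


def hand_pose(wrist: Vec3, index_mcp: Vec3,
              pinky_mcp: Vec3) -> tuple[Vec3, list[list[float]]]:
    """Return (position, 3x3 rotation matrix) of a right-handed hand frame.

    x: wrist -> midpoint of the index and pinky knuckles
    z: palm normal, x cross (index - pinky)
    y: z cross x, completing the frame
    """
    knuckle_mid = tuple((i + p) / 2 for i, p in zip(index_mcp, pinky_mcp))
    x = _unit(_sub(knuckle_mid, wrist))
    z = _unit(_cross(x, _sub(index_mcp, pinky_mcp)))
    y = _cross(z, x)  # unit length already, since z and x are orthonormal
    rotation = [[x[r], y[r], z[r]] for r in range(3)]  # columns are x, y, z
    return wrist, rotation
```

Applied per frame, this turns a keypoint track into a sequence of (position, orientation) pairs; the orientation can then be converted to quaternions or axis-angle for the action vector.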

`toolkit.vr` — VR hand-tracking

Input: JSON or binary logs from Meta Quest Pro, Apple Vision Pro, or PICO 4U VR devices.

from menily.toolkit import vr

tasks = vr.from_quest_log(
    log_path="./raw/quest_session.json",
    language="Assemble the blue widget onto the base plate.",
    fps=60,
    viewpoint="ego",
    body_morphology="bimanual",
    calibration={
        "origin": "room_center",
        "scale_to_robot": 0.9,
    },
)
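The `calibration` block is not documented further. Under one plausible reading, `origin` re-centers positions on a reference point in the room and `scale_to_robot` shrinks human reach to the robot's workspace. The sketch below applies exactly that to wrist positions; both the semantics and the `apply_calibration` helper are assumptions.

```python
from __future__ import annotations

Vec3 = tuple[float, float, float]


def apply_calibration(positions: list[Vec3], origin: Vec3,
                      scale_to_robot: float) -> list[Vec3]:
    """Re-express positions relative to `origin`, then scale them uniformly.

    Assumed semantics: `origin` is the room-center point expressed in the
    headset's world coordinates; `scale_to_robot` maps human reach onto the
    robot's workspace (e.g. 0.9 shrinks every offset by 10%).
    """
    if scale_to_robot <= 0:
        raise ValueError("scale_to_robot must be positive")
    ox, oy, oz = origin
    return [((x - ox) * scale_to_robot,
             (y - oy) * scale_to_robot,
             (z - oz) * scale_to_robot)
            for x, y, z in positions]
```

A uniform scale leaves hand orientations unchanged, so in this sketch only positions need the transform.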

Strength: native 60–90 Hz sampling with sub-centimeter trajectory precision. Weakness: the visual context is virtual, so downstream teams usually pair these logs with a separately rendered RGB stream.
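Because headsets sample hands at 60 to 90 Hz while the example above requests `fps=60`, logs often need resampling onto a uniform grid. A minimal sketch using linear interpolation on one coordinate at a time (apply it per axis); `resample_linear` is a hypothetical helper, and the toolkit's actual resampling method is not documented here. Orientation channels should use quaternion slerp rather than per-component interpolation.

```python
from __future__ import annotations

from bisect import bisect_right


def resample_linear(timestamps: list[float], values: list[float],
                    dst_fps: float) -> tuple[list[float], list[float]]:
    """Linearly interpolate one channel onto a uniform dst_fps grid.

    timestamps must be strictly increasing (seconds); the output grid starts at
    timestamps[0] and never extrapolates past timestamps[-1].
    """
    if len(timestamps) != len(values) or len(timestamps) < 2:
        raise ValueError("need at least two samples with matching lengths")
    t0, t_end = timestamps[0], timestamps[-1]
    n_out = int((t_end - t0) * dst_fps) + 1
    out_t: list[float] = []
    out_v: list[float] = []
    for k in range(n_out):
        t = min(t0 + k / dst_fps, t_end)  # clamp floating-point round-off
        # i = last sample at or before t, kept in range so that i + 1 exists
        i = min(max(bisect_right(timestamps, t) - 1, 0), len(timestamps) - 2)
        w = (t - timestamps[i]) / (timestamps[i + 1] - timestamps[i])
        out_t.append(t)
        out_v.append(values[i] + w * (values[i + 1] - values[i]))
    return out_t, out_v
```

Working from timestamps rather than sample indices also tolerates dropped or jittered tracking frames, which are common in headset logs.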

`toolkit.mocap` — Motion capture

Input: BVH / FBX / C3D from OptiTrack, Vicon, Xsens.

from menily.toolkit import mocap

tasks = mocap.from_bvh(
    bvh_path="./raw/optitrack.bvh",
    segmentation_file="./raw/task_segments.json",
    body_morphology="humanoid",
    retarget_to="unitree_g1",
    retarget_backend="adamorph",   # or "omniretarget" / "spark" / "kdmr" / "custom"
    physics_filter=True,
)
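`from_bvh` reads the frame data plus a separate `segmentation_file`. The sketch below parses the `MOTION` section of a BVH file (a `Frames:` line and a `Frame Time:` line followed by one line of channel values per frame) and slices the frames per task. The JSON layout for the segmentation file (`segments` entries with `language`, `start_frame` and an exclusive `end_frame`) is a hypothetical example; the real file format is not specified in this README.

```python
from __future__ import annotations

import json


def parse_bvh_motion(text: str) -> tuple[float, list[list[float]]]:
    """Return (frame_time_seconds, frames) from the MOTION section of a BVH file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = lines.index("MOTION")
    n_frames = int(lines[start + 1].split(":", 1)[1])       # "Frames: N"
    frame_time = float(lines[start + 2].split(":", 1)[1])   # "Frame Time: t"
    frames = [[float(v) for v in ln.split()]
              for ln in lines[start + 3:start + 3 + n_frames]]
    if len(frames) != n_frames:
        raise ValueError(f"header says {n_frames} frames, found {len(frames)}")
    return frame_time, frames


def slice_segments(frames: list[list[float]], segmentation_json: str) -> list[dict]:
    """Cut frames into per-task chunks using a hypothetical segmentation layout:
    {"segments": [{"language": str, "start_frame": int, "end_frame": int}]}
    where end_frame is exclusive.
    """
    tasks = []
    for seg in json.loads(segmentation_json)["segments"]:
        s, e = seg["start_frame"], seg["end_frame"]
        if not 0 <= s < e <= len(frames):
            raise ValueError(f"segment {seg} is out of range for {len(frames)} frames")
        tasks.append({"language": seg["language"], "frames": frames[s:e]})
    return tasks
```

The frame time gives the capture rate directly (for example, 0.008333 s per frame is 120 Hz), which downstream resampling needs.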

Retargeting backends (AdaMorph, OmniRetarget, SPARK, KDMR) are pluggable: the toolkit does not reimplement retargeting but composes existing research implementations.
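One way to make backends pluggable is a name-keyed registry, so that a string such as `retarget_backend="adamorph"` resolves to whichever class registered that name. The registry functions, the `RetargetBackend` protocol and the toy `identity` backend below are hypothetical; they show the composition pattern only and do not wrap any of the named research codebases.

```python
from __future__ import annotations

from typing import Callable, Protocol


class RetargetBackend(Protocol):
    """Anything with a retarget() method can serve as a backend."""

    def retarget(self, frames: list[list[float]], robot: str) -> list[list[float]]: ...


# name -> zero-argument factory (usually the backend class itself)
_BACKENDS: dict[str, Callable[[], RetargetBackend]] = {}


def register_backend(name: str):
    """Class decorator that makes a backend selectable by name."""
    def decorator(cls):
        if name in _BACKENDS:
            raise ValueError(f"retarget backend {name!r} is already registered")
        _BACKENDS[name] = cls
        return cls
    return decorator


def get_backend(name: str) -> RetargetBackend:
    """Instantiate the backend registered under `name`."""
    factory = _BACKENDS.get(name)
    if factory is None:
        raise ValueError(f"unknown retarget backend {name!r}; "
                         f"available: {sorted(_BACKENDS)}")
    return factory()


@register_backend("identity")
class IdentityBackend:
    """Toy backend that copies frames unchanged; a real one would call into
    an external retargeting library."""

    def retarget(self, frames: list[list[float]], robot: str) -> list[list[float]]:
        return [list(frame) for frame in frames]
```

A registry keeps optional, heavy research dependencies out of the core install: each backend module registers itself only when imported.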

Interoperability

| Direction | Format | Method |
|---|---|---|
| Export downstream | RLDS (Open X-Embodiment) | `Task.to_rlds()` |
| Export downstream | HuggingFace `datasets.Dataset` | `Task.to_hf_dataset()` |
| Import upstream | Existing RLDS / Open X-Embodiment | `from_rlds(path)` |
| Import upstream | BONES-SEED (motion data) | `mocap.from_bones_seed(path)` (planned) |
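For the RLDS export direction: an RLDS episode is a sequence of steps, each carrying `observation`, `action`, `reward`, `discount` and the boolean flags `is_first`, `is_last` and `is_terminal`. The sketch packs one task into that shape as plain Python dicts. `to_rlds_episode` is a hypothetical helper, not `Task.to_rlds()`. The per-step `language_instruction` key mirrors a convention used by many Open X-Embodiment datasets (exact keys vary by dataset), and the sparse reward on the final step is an assumption.

```python
from __future__ import annotations


def to_rlds_episode(observations: list[dict], actions: list[list[float]],
                    language: str) -> dict:
    """Pack one task into an RLDS-style episode dict (plain Python, no TensorFlow)."""
    if len(observations) != len(actions) or not actions:
        raise ValueError("need one observation per action and at least one step")
    n = len(actions)
    steps = []
    for i, (obs, act) in enumerate(zip(observations, actions)):
        last = i == n - 1
        steps.append({
            "observation": obs,
            "action": act,
            "reward": 1.0 if last else 0.0,  # assumed sparse success reward
            "discount": 1.0,
            "is_first": i == 0,
            "is_last": last,
            "is_terminal": last,             # the demo ends because the task completed
            "language_instruction": language,
        })
    return {"steps": steps}
```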

Status and roadmap

| Component | Status | Estimated PyPI release |
|---|---|---|
| `toolkit.core`: `Task` object, validation, I/O | Stable | 2–3 weeks |
| `toolkit.pov`: first-person video adapter | Internal alpha | 4–6 weeks |
| `toolkit.vr`: VR hand-tracking adapter | Internal alpha | 4–6 weeks |
| `toolkit.mocap`: motion capture adapter | Design finalized | 8–10 weeks |
| Reference dataset card on HuggingFace | Pending | After PyPI |

We build in the open but stage releases: the schema stabilizes first, then core, then each adapter. If you are building a VLA, VLM, or world-model pipeline and want early access or a specific adapter prioritized, email Masashi@Menily.AI.

Related projects

| Project | Description |
|---|---|
| menily/schema | The specification this toolkit implements |
| menily/research | Research notes and design rationale |
| menily.ai | Organization site: team, publications, contact |

License

Apache License 2.0. See LICENSE (to be added with the first tagged release).

Contributing

🐛 Bug reports → open an Issue
💡 API design discussion → PRs welcome; discuss in an Issue first for significant changes
📧 Early-access requests / specific-adapter prioritization → Masashi@Menily.AI
🌐 Organization → github.com/MenilyIntelligence

Citation

@misc{menily2026toolkit,
  author       = {Masashi},
  title        = {menily/toolkit: Python Reference Implementation for
                  Task-Level VLA Demonstration Data},
  year         = {2026},
  howpublished = {Menily Intelligence, Apache-2.0 open source},
  url          = {https://github.com/MenilyIntelligence/toolkit}
}

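Summary

menily/toolkit is the Python reference implementation of menily/schema.

Three adapters:

toolkit.pov: first-person video (smartphone, GoPro, Vision Pro) → task-level demonstration data
toolkit.vr: VR hand-tracking (Quest / Vision Pro / PICO) → end-effector trajectories
toolkit.mocap: motion capture (BVH / FBX / C3D) → full-body motion sequences (including retargeting)

All outputs are task units conforming to menily/schema v1. They can be fed directly into a VLA training pipeline or converted to and from Open X-Embodiment / RLDS and HuggingFace Datasets.

The project is currently pre-MVP, and PyPI releases are rolling out in stages. For targeted early access, contact Masashi@Menily.AI.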
