Waypoint-1.5 Support#33

Open
lapp0 wants to merge 42 commits into main from wp-1.5

Conversation


@lapp0 lapp0 commented Mar 18, 2026

Waypoint-1.5 required changes

  • TAEHV integration
  • Update the KV cache for Waypoint-1.5 `frame_idx` (backwards compatible with Waypoint-1), plus a `pos_id` change including `f_pos` (incremental frame position)
  • `load_state_dict` converts the weights to a format compatible with the inference engine if the model is WP1.5
  • `auto_aspect_ratio` defaults to `True`; impacts WP1.5 only, enforcing that inputs/outputs are 720p or 360p
  • Update the README to document Waypoint-1.5

Misc changes not specific to Waypoint-1.5

  • `COMPILE_OPTIONS` for more throughput
  • Enable "state snapshots" (or "game checkpoints") via `get_state` and `load_state`
  • Load directly to GPU to minimize CPU memory overhead
  • Clydes dynamic angle computation, allowing infinite-length generation with no memory impact
  • Allow `load_weights=False` to create a randomly initialized model for benchmarking
  • Fix a torch warning by converting controller inputs with `torch.as_tensor`
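The snapshot mechanism can be sketched with a toy stand-in; the `get_state`/`load_state` names come from this PR, but the engine internals below are purely illustrative:

```python
import copy


class ToyEngine:
    """Stand-in for the inference engine; only the snapshot API mirrors the PR."""

    def __init__(self):
        self.frames = []

    def append_frame(self, frame):
        self.frames.append(frame)

    def get_state(self):
        # Deep-copy so later generation can't mutate the checkpoint
        return copy.deepcopy(self.frames)

    def load_state(self, state):
        self.frames = copy.deepcopy(state)


engine = ToyEngine()
engine.append_frame("seed")
checkpoint = engine.get_state()   # "game checkpoint" before a risky branch
engine.append_frame("explored")
engine.load_state(checkpoint)     # rewind; engine.frames is back to ["seed"]
```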

examples/

MalarzDawid and others added 30 commits February 22, 2026 11:54

* fix: uv sync issue with python version 3.9

* fix: VRAM explosion

* refactor: init on gpu device directly

* fix: don't use fbgemm on windows for now

* feat: OrthoRoPEAngles

* fix: NoCastModule OrthoRoPEAngles

* fix: remove pos_ids from args

* fix: remove old src rope replacement patch

* fix: remove out of scope ae changes

* fix: remove out of scope text encoder changes

* fix: patch_model pos_ids

---------

Co-authored-by: Philpax <me@philpax.me>
feat: use built triton-windows fork to fix long-path issue
@lapp0 lapp0 marked this pull request as ready for review March 19, 2026 20:03
## Waypoint-1.5 Behavior
All interfaces are identical between Waypoint-1 (or 1.1) and Waypoint-1.5 **except** the following:

In Waypoint-1.5, the `img` passed to `append_frame(...)` and returned by `gen_frame(...)` is now a sequence of 4 frames. Waypoint-1.5 applies temporal compression and generates 4 frames for every controller input.
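For frame pacing, this means each `gen_frame` call yields a block of frames that should be displayed one at a time at the original frame interval, not all at once. A minimal sketch (the 60 fps figure and the helper function are illustrative, not part of the engine API):

```python
def frame_timestamps(block_idx, n=4, fps=60.0):
    """Display times (seconds) for the n frames produced by one gen_frame call.

    block_idx: how many gen_frame calls have completed before this one.
    """
    return [(block_idx * n + i) / fps for i in range(n)]


# First block plays at t = 0, 1/60, 2/60, 3/60; the next starts at 4/60.
first = frame_timestamps(0)
second = frame_timestamps(1)
```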

@philpax philpax Mar 19, 2026


describe the implications this has for frame pacing; what's the correct way to feed inputs and display the rendered frames to the user?

@@ -77,14 +77,25 @@ for controller_input in [
img = engine.gen_frame(ctrl=controller_input)


this probably needs to be updated for 4-frame use, or this snippet should be deleted entirely and pointed at one of the examples

Collaborator Author


IMO, the Waypoint-1.5 clarification below on the nature of img is sufficient



I'd add a comment pointing to the clarification below, so they have an idea of what to expect for the shape of img

"https://gist.github.com/user-attachments/assets/68c943a4-008a-4c25-948c-c81ab4c47d21",
])
frame = cv2.imdecode(np.frombuffer(urllib.request.urlopen(url).read(), np.uint8), cv2.IMREAD_COLOR)
engine.append_frame(torch.from_numpy(np.repeat(frame[None], 4, axis=0)))


branch on whether it's a WP-1 or WP-1.5 model and change the append behaviour accordingly; add a comment indicating that we're repeating to meet the 4-frame requirement for WP-1.5
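A minimal sketch of that branching; a `temporal_compression` config attribute is an assumption here, and plain lists stand in for the real frame tensors:

```python
class DummyConfig:
    """Stand-in config; temporal_compression is a hypothetical attribute name."""
    temporal_compression = 4


def seed_frames(frame, config):
    """Repeat a single seed frame to fill one temporal block (1 for WP-1)."""
    n = getattr(config, "temporal_compression", 1)
    # With the real engine this would be np.repeat(frame[None], n, axis=0)
    return [frame] * n


wp15_seed = seed_frames("img", DummyConfig())  # 4 copies for WP-1.5
wp1_seed = seed_frames("img", object())        # single frame for WP-1
```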

Collaborator Author


IMHO, new users shouldn't be directed towards WP1 at all. Make integration possible, but move away from backwards compatible examples.



Hmm; yeah, that's fine.

However, at the risk of potentially causing you a lot of grief, perhaps the solution is to make WP-1 usage work against [1, H, W, 3], so that consuming code always works with [N, H, W, 3] shapes, where N is 1 for WP-1, or config.temporal_compression for newer models? (including updating all of the examples to work over config.temporal_compression/the tensor shape, as opposed to a hard-coded 4)

This is already a breaking change for consumers, so this kind of unification makes sense to me, and it means that downstream code should Just Work:tm: (in the sense that you pace out at gentime/N, which gracefully degrades when N=1).
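That unification might look like the following sketch, where consumer code only ever sees a leading block axis; pure-Python lists stand in for [N, H, W, 3] tensors:

```python
def write_block(out, block):
    """Consume one generation step's output.

    block: a length-N sequence of frames; N == 1 for WP-1,
    config.temporal_compression for newer models.
    """
    for frame in block:  # degrades gracefully when N == 1
        out.append(frame)


frames_out = []
write_block(frames_out, ["f0", "f1", "f2", "f3"])  # WP-1.5-style block
write_block(frames_out, ["f4"])                    # WP-1-style block
```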

if __name__ == "__main__":
gen_vid()
# Set seed frame
url = random.choice([


can we include seed images within the repo and point to local files instead? easier to hack / see what's going on

Collaborator Author


Thoughts on me pointing to Biome repo images?



That's fine once we've locked them in, which we haven't quite yet done (stalled on generating decently-ID images). Ok to leave this as-is for now, and then we can replace them just before release, I think.

with iio.imopen("out.mp4", "w", plugin="pyav") as out:
out.write(engine.gen_frame().cpu().numpy(), fps=60, codec="libx264")
for ctrl in controller_sequence:
out.write(engine.gen_frame(ctrl=ctrl).cpu().numpy())


not super obvious how the inputs map to frames here, especially in the 4-frame model; does this mean that we're supplying one of the inputs once every four frames? how would I do multiple different inputs within that four-frame block?

Collaborator Author


Is this sufficient?

    four_frames = engine.gen_frame().cpu().numpy()  # int8 [4, H, W, 3]
    out.write(four_frames, fps=60, codec="libx264")



I'd use config.temporal_compression for clarity



Er, wait, that suggestion's for the seed frame; I think that still doesn't address my issue, which is "how do the inputs that I, as a user, get mapped to inputs under the temporally compressed regime?" Do I bundle together the last four frames of inputs? Do I only send inputs from the current frame, so that only one-fourth of the inputs make it through?

@@ -1,12 +1,17 @@
"""
Additional Dependencies: pytest-benchmark


can we move all of the additional deps into pyproject.toml using https://docs.astral.sh/uv/concepts/projects/dependencies/#dependency-groups so that users can do uv run --dev pytest examples/benchmark.py? ditto for the other examples

we want getting up to speed with WE to be as easy as possible; ideally, you clone a repo and run uv run --dev examples/gen_sample.py Overworld/Waypoint-1.5-1B with no additional steps to see the WE do its thing (uv run should do all the intermediate work)

Collaborator Author

@lapp0 lapp0 Mar 20, 2026


Updated pyproject, now the following comments work:

# MODEL_URI="Overworld/Waypoint-1.5-1B" uv run --dev pytest examples/benchmark.py
# uv run --dev examples/gen_sample.py Overworld/Waypoint-1.5-1B

(Model should be Overworld-Models/MR160k for now)

else None
)
w_amax = lin.weight.data.clone().amax().float().squeeze()
w_amax = lin.weight.data.abs().amax()
Collaborator Author


Out of scope, minor bug fix
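For context on the fix, a pure-Python analogue (the real code operates on a torch weight tensor): `amax()` without `abs()` returns the largest signed value, which can miss a negative weight of larger magnitude:

```python
weights = [-3.5, 1.2, 0.8]

buggy = max(weights)                  # 1.2 -- largest signed value only
fixed = max(abs(w) for w in weights)  # 3.5 -- true maximum magnitude
```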


@ScottieFox ScottieFox left a comment


So far, the branch is stable in anticipation of WP1.5 model behavior and its communication with server.py as loaded into the .stream service. The .stream product is not exposed to the end user, so all further changes should take BIOME as their primary consideration, as long as functional compatibility exists between both.
