Conversation
* fix: uv sync issue with python version 3.9
* fix: VRAM explosion
* refactor: init on gpu device directly
* fix: don't use fbgemm on windows for now
* feat: orthoropeangles
* fix: NoCastModule OrthoRoPEAngles
* fix: remove pos_ids from args
* fix: remove old src rope replacement patch
* fix: remove out of scope ae changes
* fix: remove out of scope text encoder changes
* fix: patch_model pos_ids

---------

Co-authored-by: Philpax <me@philpax.me>
feat: use built triton-windows fork to fix long-path issue
> ## Waypoint-1.5 Behavior
>
> All interfaces are identical between Waypoint-1 (or 1.1) and Waypoint-1.5 **except** the following:
>
> In Waypoint-1.5, the `img` passed to `append_frame(...)` and returned by `gen_frame(...)` is now a sequence of 4 frames. Waypoint-1.5 applies temporal compression and generates 4 frames for every controller input.
describe the implications this has for frame pacing; what's the correct way to feed inputs and display the rendered frames to the user?
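For illustration, one possible pacing loop looks like this (a sketch only; the 4-frames-per-call behavior and the 60 fps target are taken from this thread, while the pacing strategy itself is an assumption, not a confirmed design): call `gen_frame(...)` once per controller input, then pace the resulting block out one frame at a time at the display rate.

```python
import time

TEMPORAL_COMPRESSION = 4   # assumed: frames produced per gen_frame() call
TARGET_FPS = 60            # assumed display rate from the examples

def present_block(frames, display):
    """Pace one generated block out one frame at a time at TARGET_FPS."""
    frame_time = 1.0 / TARGET_FPS
    for frame in frames:
        start = time.monotonic()
        display.append(frame)          # stand-in for actual presentation
        elapsed = time.monotonic() - start
        time.sleep(max(0.0, frame_time - elapsed))

# Usage sketch: one controller input yields one 4-frame block, so inputs
# are effectively sampled once per block rather than once per frame.
display = []
fake_block = ["f0", "f1", "f2", "f3"]  # stands in for an int8 [4, H, W, 3] array
present_block(fake_block, display)
```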
```diff
@@ -77,14 +77,25 @@ for controller_input in [
     img = engine.gen_frame(ctrl=controller_input)
```
this probably needs to be updated for 4-frame use, or this snippet should be deleted entirely and pointed at one of the examples
IMO, the Waypoint-1.5 clarification below on the nature of img is sufficient
I'd add a comment pointing to the clarification below, so they have an idea of what to expect for the shape of img
**examples/gen_sample.py** (Outdated)
```python
    "https://gist.github.com/user-attachments/assets/68c943a4-008a-4c25-948c-c81ab4c47d21",
])
frame = cv2.imdecode(np.frombuffer(urllib.request.urlopen(url).read(), np.uint8), cv2.IMREAD_COLOR)
engine.append_frame(torch.from_numpy(np.repeat(frame[None], 4, axis=0)))
```
branch on whether it's a WP-1 or WP-1.5 model and change the append behaviour accordingly; add a comment indicating that we're repeating to meet the 4-frame requirement for WP-1.5
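A minimal sketch of that branch, assuming the model config exposes a `temporal_compression` field (1 for WP-1, 4 for WP-1.5) — the field name is this thread's suggestion, not a confirmed API:

```python
import numpy as np

def prepare_seed(frame, temporal_compression):
    """Adapt a single [H, W, 3] seed frame to the model's frame-block size.

    WP-1 (temporal_compression == 1) takes the frame as-is; WP-1.5 needs
    the frame repeated to fill a 4-frame block.
    """
    if temporal_compression == 1:
        return frame[None]  # WP-1: [1, H, W, 3]
    # repeating to meet the N-frame requirement for WP-1.5
    return np.repeat(frame[None], temporal_compression, axis=0)

frame = np.zeros((360, 640, 3), dtype=np.uint8)  # dummy decoded seed frame
assert prepare_seed(frame, 1).shape == (1, 360, 640, 3)
assert prepare_seed(frame, 4).shape == (4, 360, 640, 3)
```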
IMHO, new users shouldn't be directed towards WP1 at all. Make integration possible, but move away from backwards compatible examples.
Hmm; yeah, that's fine.
However, at the risk of potentially causing you a lot of grief, perhaps the solution is to make WP-1 use work against [1, H, W, 3], so that consuming code always works with [N, H, W, 3] shapes, where N is 1 for WP-1, or config.temporal_compression for newer models? (including updating all of the examples to work over config.temporal_compression/tensor shape as opposed to 4)
This is already a breaking change for consumers, so this kind of unification makes sense to me, and it means that downstream code should Just Work™ (in the sense that you pace out at gentime/N, which gracefully degrades when N=1).
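The unification proposed above can be sketched like this (block shapes and the N-frames-per-call framing are assumptions drawn from this comment, not the shipped API):

```python
import numpy as np

def per_frame_budget(block, block_gen_time):
    """Display budget per frame for an [N, H, W, 3] block.

    Pacing at block_gen_time / N degrades gracefully to per-frame pacing
    when N == 1 (WP-1), so consumer code never special-cases the model.
    """
    return block_gen_time / block.shape[0]

wp1_block = np.zeros((1, 360, 640, 3), np.uint8)   # WP-1: N == 1
wp15_block = np.zeros((4, 360, 640, 3), np.uint8)  # WP-1.5: N == temporal_compression
assert per_frame_budget(wp1_block, 16.0) == 16.0   # one frame gets the whole budget
assert per_frame_budget(wp15_block, 16.0) == 4.0   # 4 frames share the budget
```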
**examples/gen_sample.py** (Outdated)
```python
if __name__ == "__main__":
    gen_vid()

# Set seed frame
url = random.choice([
```
can we include seed images within the repo and point to local files instead? easier to hack / see what's going on
Thoughts on me pointing to Biome repo images?
That's fine once we've locked them in, which we haven't quite yet done (stalled on generating decently-ID images). Ok to leave this as-is for now, and then we can replace them just before release, I think.
**examples/gen_sample.py** (Outdated)
```python
with iio.imopen("out.mp4", "w", plugin="pyav") as out:
    out.write(engine.gen_frame().cpu().numpy(), fps=60, codec="libx264")
    for ctrl in controller_sequence:
        out.write(engine.gen_frame(ctrl=ctrl).cpu().numpy())
```
not super obvious how the inputs map to frames here, especially in the 4-frame model; does this mean that we're supplying one of the inputs once every four frames? how would I do multiple different inputs within that four-frame block?
Is this sufficient?
```python
four_frames = engine.gen_frame().cpu().numpy()  # int8 [4, H, W, 3]
out.write(four_frames, fps=60, codec="libx264")
```
I'd use config.temporal_compression for clarity
Er, wait, that suggestion's for the seed frame; I think that still doesn't address my issue, which is "how do the inputs that I, as a user, get mapped to inputs under the temporally compressed regime?" Do I bundle together the last four frames of inputs? Do I only send inputs from the current frame, so that only one-fourth of the inputs make it through?
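For concreteness, this is what the "one input per block" reading would imply (an assumption to illustrate the open question, not the thread's answer; `FakeEngine` is a stand-in, not the real API):

```python
class FakeEngine:
    """Stand-in engine; the real gen_frame returns an int8 [4, H, W, 3] tensor."""
    temporal_compression = 4

    def gen_frame(self, ctrl=None):
        # one controller input produces a whole 4-frame block
        return [f"frame({ctrl})"] * self.temporal_compression

engine = FakeEngine()
written = []
for ctrl in ["up", "up", "left"]:
    written.extend(engine.gen_frame(ctrl=ctrl))

# Under this reading, 3 inputs yield 12 displayed frames: input changes
# at sub-block granularity cannot be expressed.
assert len(written) == 12
```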
**examples/benchmark.py** (Outdated)
```diff
@@ -1,12 +1,17 @@
 """
 Additional Dependencies: pytest-benchmark
```
can we move all of the additional deps into pyproject.toml using https://docs.astral.sh/uv/concepts/projects/dependencies/#dependency-groups so that users can do `uv run --dev pytest examples/benchmark.py`? ditto for the other examples
we want getting up to speed with WE to be as easy as possible; ideally, you clone a repo and run `uv run --dev examples/gen_sample.py Overworld/Waypoint-1.5-1B` with no additional steps to see the WE do its thing (`uv run` should do all the intermediate work)
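A sketch of what that `pyproject.toml` change could look like (the `[dependency-groups]` table follows uv's dependency-groups docs; the exact package list here is an assumption, not the actual pyproject):

```toml
[dependency-groups]
dev = [
    "pytest",
    "pytest-benchmark",  # examples/benchmark.py
    "imageio",           # examples/gen_sample.py video output
    "av",                # backend for imageio's pyav plugin
]
```

With a group like this in place, `uv run --dev ...` resolves and installs the group automatically before running the target script.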
Updated pyproject; the commands in the following comments now work:

```shell
# MODEL_URI="Overworld/Waypoint-1.5-1B" uv run --dev pytest examples/benchmark.py
# uv run --dev examples/gen_sample.py Overworld/Waypoint-1.5-1B
```

(Model should be `Overworld-Models/MR160k` for now)
```diff
         else None
     )
-    w_amax = lin.weight.data.clone().amax().float().squeeze()
+    w_amax = lin.weight.data.abs().amax()
```
Out of scope, minor bug fix
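The bug is easy to see with a toy weight vector (numpy stands in for the torch calls here): a plain `amax()` over signed weights misses large-magnitude negative values, while the quantization scale needs the true absolute maximum, as `abs().amax()` computes.

```python
import numpy as np

w = np.array([-3.0, 0.5, 1.0], dtype=np.float32)  # toy linear-layer weights

buggy_scale = w.max()          # 1.0 — ignores the -3.0 outlier
fixed_scale = np.abs(w).max()  # 3.0 — true absolute maximum
```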
**ScottieFox** left a comment:
So far, the branch is stable in anticipation of WP1.5 model behavior and its communication to server.py as loaded into the .stream service. The .stream product is not exposed to the end user, so all further changes should have BIOME as their primary consideration, as long as functional compatibility exists between both.
**Waypoint-1.5 required changes**

* `f_pos` (incremental frame position)
* `load_state_dict` converts the weights to a format compatible with the inference engine if the model is WP1.5
* `auto_aspect_ratio` defaults to True; impacts WP1.5 only; enforces that inputs / outputs are 720p or 360p

**Misc changes not specific to Waypoint-1.5**

* `COMPILE_OPTIONS` for more throughput
* `get_state` and `load_state`
* `load_weights=False`, to create a randomly initialized model for benchmarking
* `torch.as_tensor`
* `examples/gen_sample.py` for WP1.5