Add Mamba (minimal) #918

Closed
wants to merge 27 commits into from

Conversation

@swfsql (Contributor) commented Feb 2, 2024

Ports a minimal (non-optimized) implementation of Mamba (paper submitted 2023-12), which is closely related to S4 (paper submitted 2021-10).

In short, Mamba is an alternative (with trade-offs) to the attention mechanism. It can be used like an RNN that steps over a single sequence point at a time, instead of needing to observe multiple sequence points at once, although it must carry the previous state forward. As a result, its memory and time requirements are fixed per sequence point.
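To make the fixed per-token cost concrete, here is a rough plain-Rust sketch of one discretized state-space recurrence step (illustrative only; the names, shapes, and the selection mechanism are simplified and do not mirror this PR's actual code):

```rust
// Minimal sketch of one discretized state-space step per channel.
//   h[n] <- a_bar[n] * h[n] + b_bar[n] * x   (state update)
//   y    <- sum_n c[n] * h[n] + d * x        (output)
fn ssm_step(h: &mut [f32], a_bar: &[f32], b_bar: &[f32], c: &[f32], d: f32, x: f32) -> f32 {
    let mut y = 0.0;
    for n in 0..h.len() {
        h[n] = a_bar[n] * h[n] + b_bar[n] * x;
        y += c[n] * h[n];
    }
    y + d * x
}

fn main() {
    // The hidden state is the only thing carried across tokens, so each new
    // token costs the same amount of memory and time.
    let mut h = vec![0.0_f32; 16];
    let (a_bar, b_bar, c) = (vec![0.9_f32; 16], vec![0.1_f32; 16], vec![0.05_f32; 16]);
    for &x in &[0.3_f32, -1.2, 0.7] {
        let y = ssm_step(&mut h, &a_bar, &b_bar, &c, 1.0, x);
        println!("y = {y}");
    }
}
```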

Implementation references:

This PR requires others (some of which are drafts or are useful for an app using this Module):

The commits specific to this Mamba PR are:

Tasks

  • Add an example that generates token predictions by loading the hf/state-spaces models, in a similar vein to the candle example.
    • Example here: https://github.com/swfsql/mamba-minimal-dfdx-example - draft state.
    • Port the token de/encode architecture.
    • Decide whether to download the models automatically (like the candle example) or to require the user to place the files manually.
    • Port the token selection mechanism.
  • Add initialization (how to determine the default initial values for a random model).
  • Be able to train (forward_mut and backpropagation). Note: This stateless interface is appropriate for training only, not for inference.
  • Add an extra stateful version for the calls, requiring the state cache alongside the usual input.
    • This requires constant memory and time for each new token prediction. On my home computer, the mamba-130m f32 model generates ~23 tokens/s on the CPU.
    • Add a forward_mut counterpart for training. Note: this stateful interface is appropriate for inference only, not for training.
  • Find a way to avoid the Vec conversion near the end of selective_scan for the stateless version.
    • Add a softplus tensor operation that takes the threshold into account to avoid precision loss. This could be an inline function; it does not need to be an actual new operator (see the softplus sketch after this list).
  • Support CUDA (indirectly, through the layers used; this is not intended to be an optimized/fused CUDA kernel).
  • Add tests.
  • Test an optimization.
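For the softplus task above, a minimal sketch of the thresholded formulation (my illustration, not code from this PR; the 20.0 threshold is an assumption following common practice):

```rust
/// Numerically stable softplus: ln(1 + e^x).
/// Above `threshold`, softplus(x) equals x to within f32 precision,
/// so returning x directly avoids overflowing exp(x).
#[inline]
fn softplus(x: f32, threshold: f32) -> f32 {
    if x > threshold {
        x
    } else {
        x.exp().ln_1p()
    }
}

fn main() {
    for &x in &[-5.0_f32, 0.0, 5.0, 50.0] {
        println!("softplus({x}) = {}", softplus(x, 20.0));
    }
}
```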

YouTube Videos

S4
Mamba

rainiwu and others added 14 commits January 26, 2024 00:29
- Makes the safetensors module private.
  - It is no longer exported in the prelude, avoiding a naming clash with the external safetensors crate.
- Change how and when the period separator is inserted.
  - This should make it closer to how the fields are accessed in the code.
- Add the try_normalize_rms related functions.
- Add the `LayerRMSNorm1D` module (see the RMS-norm sketch after this list).
- Add `TrySplitShapeAlong` and `TrySplitTensorAlong`.
- Minor linting and docs fixes.
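For reference, `LayerRMSNorm1D` presumably follows the standard RMS-normalization formula y = γ · x / sqrt(mean(x²) + ε); a plain-Rust sketch of that math (not the dfdx implementation):

```rust
// Sketch of RMS normalization over the last dimension (illustrative only).
//   y[i] = gamma[i] * x[i] / sqrt(mean(x^2) + eps)
fn rms_norm(x: &[f32], gamma: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter()
        .zip(gamma.iter())
        .map(|(xi, gi)| gi * xi * inv_rms)
        .collect()
}

fn main() {
    let x = [1.0_f32, -2.0, 3.0, 0.5];
    let gamma = [1.0_f32; 4];
    println!("{:?}", rms_norm(&x, &gamma, 1e-5));
}
```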

TODO
- Check if the tape should be returned. If not, it can be removed from the interface.
- Add a CUDA kernel.
- Consider a different interface, where the tensor could be split into more than two tensors, possibly returned in a Vec.
  That way it would be closer to the PyTorch interface (chunks); see the sketch below, after the note.
- Also added `from_fn` for Arrays.

Note: the interface currently requires two passes for construction, one for creating a list of tensors with NoneTape and another for putting tapes into those tensors.
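Regarding the chunks-style interface considered in the TODO above, a minimal sketch of the idea, with plain slices standing in for tensors (not the dfdx API):

```rust
// Split one axis into several pieces of the given sizes and return them in a
// Vec, similar in spirit to PyTorch's `chunk`/`split` (plain slices only).
fn split_along<'a>(row: &'a [f32], sizes: &[usize]) -> Vec<&'a [f32]> {
    let mut out = Vec::with_capacity(sizes.len());
    let mut start = 0;
    for &len in sizes {
        out.push(&row[start..start + len]);
        start += len;
    }
    out
}

fn main() {
    let row = [0.0_f32, 1.0, 2.0, 3.0, 4.0];
    for part in split_along(&row, &[2, 3]) {
        println!("{part:?}");
    }
}
```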
@swfsql force-pushed the mamba-minimal branch 2 times, most recently from cadf65c to 9a2cf25 on February 7, 2024 22:02
This alternative method:
- Requires load/read to decide whether it should skip missing tensors;
- Requires load/read/save/write to decide how should keys be mapped.
@swfsql force-pushed the mamba-minimal branch 2 times, most recently from 1207867 to ce6d624 on February 9, 2024 17:27
@swfsql force-pushed the mamba-minimal branch 2 times, most recently from 3f392a6 to 165abc9 on February 9, 2024 17:55
- Add stateless forward impl.
  - Efficient for training (but training is not yet implemented).
  - The input requires the entire sequence and no state cache.
  - Generates one output for each point of the input sequence.
- Add stateful forward impl (the two call shapes are sketched below).
  - Efficient for inference.
  - The input requires only the last sequence point, plus the last state cache.
  - Generates a single output referring to the last input.
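As a rough sketch of how the two call shapes differ (illustrative signatures only; the PR's real modules use dfdx tensors and tapes rather than plain Vecs, and the names below are hypothetical):

```rust
// Illustrative stand-ins: real dfdx tensors, shapes, and tapes are omitted.
struct MambaStateCache {
    conv_state: Vec<f32>,
    ssm_state: Vec<f32>,
}

struct MambaBlockSketch;

impl MambaBlockSketch {
    /// Stateless: the whole sequence in, one output per sequence point.
    fn forward_stateless(&self, sequence: &[Vec<f32>]) -> Vec<Vec<f32>> {
        sequence.to_vec() // placeholder body
    }

    /// Stateful: the last sequence point plus the previous cache in,
    /// a single output plus the updated cache out.
    fn forward_stateful(
        &self,
        last_point: Vec<f32>,
        cache: MambaStateCache,
    ) -> (Vec<f32>, MambaStateCache) {
        (last_point, cache) // placeholder body
    }
}

fn main() {
    let block = MambaBlockSketch;
    let outputs = block.forward_stateless(&[vec![0.0; 4], vec![1.0; 4]]);
    let cache = MambaStateCache { conv_state: vec![0.0; 8], ssm_state: vec![0.0; 16] };
    let (_last_output, _cache) = block.forward_stateful(outputs[1].clone(), cache);
}
```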
@swfsql (Contributor, Author) commented Mar 1, 2024

I'll prioritize moving this experiment to a separate crate, but feel free to ping me in case anyone has a question or suggestion.
Edit: moved to here.

@swfsql swfsql closed this Mar 1, 2024