<a href="https://colab.research.google.com/github/LoniQin/Large-Language-Models-A-Collection-of-Resources/blob/main/LLM_with_Mamba.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## LLM with Mamba

Mamba is a sequence-modeling architecture built on selective state space models; it uses neither attention nor separate MLP blocks. According to the paper, it achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, the authors report that their Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.


## References
* [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752)

## Installation

In [12]:
# Quote the requirement so the shell does not treat ">" as an output redirect
!pip install "causal-conv1d>=1.1.0"
!pip install mamba-ssm

Collecting argparse (from buildtools->causal-conv1d>=1.1.0->mamba-ssm)
  Using cached argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Installing collected packages: argparse
Successfully installed argparse-1.4.0


## Import packages

In [26]:
import torch
import gc
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

## Load Mamba

In [5]:
device = "cuda"
model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-2.8b", device=device, dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [13]:
model

MambaLMHeadModel(
  (backbone): MixerModel(
    (embedding): Embedding(50280, 2560)
    (layers): ModuleList(
      (0-63): 64 x Block(
        (mixer): Mamba(
          (in_proj): Linear(in_features=2560, out_features=10240, bias=False)
          (conv1d): Conv1d(5120, 5120, kernel_size=(4,), stride=(1,), padding=(3,), groups=5120)
          (act): SiLU()
          (x_proj): Linear(in_features=5120, out_features=192, bias=False)
          (dt_proj): Linear(in_features=160, out_features=5120, bias=True)
          (out_proj): Linear(in_features=5120, out_features=2560, bias=False)
        )
        (norm): RMSNorm()
      )
    )
    (norm_f): RMSNorm()
  )
  (lm_head): Linear(in_features=2560, out_features=50280, bias=False)
)
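
The printed shapes are enough for a rough parameter count. The sketch below is plain arithmetic on those numbers, with two assumptions that the printout does not confirm: each mixer also holds an `A_log` matrix of shape `(d_inner, d_state)` and a `D` vector of length `d_inner` (raw parameters do not appear in the module listing), and `lm_head` shares its weight with the embedding.

```python
# Shapes copied from the printed model above.
d_model, d_inner, n_layers, vocab = 2560, 5120, 64, 50280
dt_rank, x_proj_out, conv_kernel = 160, 192, 4

# Assumption: x_proj emits dt (dt_rank values) plus B and C (d_state values each).
d_state = (x_proj_out - dt_rank) // 2

per_block = (
    d_model * 2 * d_inner              # in_proj (no bias)
    + d_inner * conv_kernel + d_inner  # depthwise conv1d weight + bias
    + d_inner * x_proj_out             # x_proj (no bias)
    + dt_rank * d_inner + d_inner      # dt_proj weight + bias
    + d_inner * d_model                # out_proj (no bias)
    + d_inner * d_state + d_inner      # assumed A_log and D parameters (not printed)
    + d_model                          # RMSNorm weight
)

# Assumption: lm_head is tied to the embedding, so that matrix is counted once.
# The trailing d_model term is the final RMSNorm (norm_f).
total = n_layers * per_block + vocab * d_model + d_model

print(f"d_state = {d_state}")
print(f"parameters per block = {per_block:,}")
print(f"total parameters = {total:,} (~{total / 1e9:.2f}B)")
print(f"float16 weights = {total * 2 / 2**30:.2f} GiB")
```

Under these assumptions the total comes to about 2.77 billion, which is consistent with the `mamba-2.8b` checkpoint name.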

In [14]:
# Work around the "libcuda.so cannot be found" error in Google Colab
!echo /usr/lib64-nvidia/ >/etc/ld.so.conf.d/libcuda.conf; ldconfig

/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link

/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link



This pretrained model has not been fine-tuned for conversation, so I wouldn't call it a chatbot. The cells below only ask it to continue a prompt.

In [30]:
%%time
question = """
Hi, my name is Tom.
"""
tokens = tokenizer(question, return_tensors="pt")
input_ids = tokens.input_ids.to(device=device)
attn_mask = tokens.attention_mask.to(device=device)
generated_ids = model.generate(
  input_ids=input_ids,
  max_length=200
)
text = tokenizer.batch_decode(generated_ids)[0]
print(text)
gc.collect()


Hi, my name is Tom.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm a software engineer.
I'm
CPU times: user 8.98 s, sys: 3.42 ms, total: 8.99 s
Wall time: 8.97 s


660
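
The prompt cells in this notebook repeat the same tokenize, generate, and decode steps. A small helper could wrap them. This is only a sketch: `complete` is a hypothetical name, not part of `mamba_ssm` or `transformers`, and it uses the same calls as the cell above.

```python
def complete(model, tokenizer, prompt, device="cuda", max_length=200):
    """Tokenize `prompt`, generate a continuation, and return the decoded text."""
    tokens = tokenizer(prompt, return_tensors="pt")
    input_ids = tokens.input_ids.to(device=device)
    # Same generate() call as in the cells of this notebook.
    generated_ids = model.generate(input_ids=input_ids, max_length=max_length)
    # batch_decode returns one string per sequence; there is only one here.
    return tokenizer.batch_decode(generated_ids)[0]

# Usage in this notebook, once `model` and `tokenizer` are loaded:
# print(complete(model, tokenizer, "Hi, my name is Tom."))
```

Passing `model` and `tokenizer` explicitly keeps the helper free of notebook globals, so it can be reused with another checkpoint.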

In [31]:
%%time
question = """
Ths is my new year plan:
"""
tokens = tokenizer(question, return_tensors="pt")
input_ids = tokens.input_ids.to(device=device)
attn_mask = tokens.attention_mask.to(device=device)
generated_ids = model.generate(
  input_ids=input_ids,
  max_length=200
)
text = tokenizer.batch_decode(generated_ids)[0]
print(text)
gc.collect()


Ths is my new year plan:

1. Get a new job.

2. Get a new car.

3. Get a new house.

4. Get a new girlfriend.

5. Get a new job.

6. Get a new car.

7. Get a new house.

8. Get a new girlfriend.

9. Get a new job.

10. Get a new car.

11. Get a new house.

12. Get a new girlfriend.

13. Get a new job.

14. Get a new car.

15. Get a new house.

16. Get a new girlfriend.

17. Get a new job.

18. Get a new car.

19. Get a new house.

20. Get a new girlfriend.

21. Get a new job.


CPU times: user 9.22 s, sys: 1.44 ms, total: 9.22 s
Wall time: 9.2 s


600

In [33]:
%%time
question = "Following books could be used to learn deep learning:"
tokens = tokenizer(question, return_tensors="pt")
input_ids = tokens.input_ids.to(device=device)
attn_mask = tokens.attention_mask.to(device=device)
generated_ids = model.generate(
  input_ids=input_ids,
  max_length=200
)
text = tokenizer.batch_decode(generated_ids)[0]
print(text)
gc.collect()

Following books could be used to learn deep learning:

Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville

Deep Learning by Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov

Deep Learning by Yann LeCun, Yoshua Bengio, Aaron Courville

Deep Learning by Andrew Ng

Deep Learning by Jeff Dean, Geoffrey Hinton

Deep Learning by Yann LeCun, Yoshua Bengio, Aaron Courville

Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville

Deep Learning by Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov

Deep Learning by Yann LeCun, Yoshua Bengio, Aaron Courville

Deep Learning by
CPU times: user 8.79 s, sys: 3.23 ms, total: 8.79 s
Wall time: 8.77 s


360

In [42]:
%%time
question = "Jack: What is the best book in the world? Alice: "
tokens = tokenizer(question, return_tensors="pt")
input_ids = tokens.input_ids.to(device=device)
attn_mask = tokens.attention_mask.to(device=device)
generated_ids = model.generate(
  input_ids=input_ids,
  max_length=200
)
text = tokenizer.batch_decode(generated_ids)[0]
print(text)
gc.collect()

Jack: What is the best book in the world? Alice: erm... I don't know. Jack: What is the best book in the world? Alice: I don't know. Jack: What is the best book in the world? Alice: I don't know. Jack: What is the best book in the world? Alice: I don't know. Jack: What is the best book in the world? Alice: I don't know. Jack: What is the best book in the world? Alice: I don't know. Jack: What is the best book in the world? Alice: I don't know. Jack: What is the best book in the world? Alice: I don't know. Jack: What is the best book in the world? Alice: I don't know. Jack: What is the best book in the world? Alice: I don't know. Jack: What is the best book in the world? Alice: I don't know
CPU times: user 9.07 s, sys: 8.19 ms, total: 9.08 s
Wall time: 9.06 s


360
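
The looping outputs above are typical of greedy decoding, where the most likely token is chosen at every step. Sampling from the top few candidates is a common way to break such loops. The sketch below illustrates the idea in plain Python on made-up logits; it is not the `mamba_ssm` implementation. As far as I can tell, `model.generate` also accepts `top_k`, `top_p`, and `temperature` keyword arguments, but treat that as an assumption to check against your installed version.

```python
import math
import random

def sample_top_k(logits, k=3, temperature=0.8, rng=random):
    """Sample a token index from the k highest logits after temperature scaling."""
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]

logits = [2.0, 1.8, 0.5, -1.0]  # made-up scores for a 4-token vocabulary

# Greedy decoding always picks the single highest-scoring token.
greedy = max(range(len(logits)), key=lambda i: logits[i])

# Top-k sampling spreads picks across the k best candidates.
rng = random.Random(0)
samples = [sample_top_k(logits, k=3, rng=rng) for _ in range(1000)]

print("greedy pick:", greedy)
print("distinct sampled tokens:", sorted(set(samples)))
```

With `k=1` the sampler reduces to greedy decoding, which is why a prompt can fall into a repeating continuation when only the top token is ever taken.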