This notebook was executed in Google Colab on an A100 GPU.

### Start of execution

In [1]:
import time

In [2]:
start = time.time()

# 1. Setting up the environment

In [3]:
!pip install -q datasets==2.20.0
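
To confirm that the pinned version is the one actually active in the runtime, a quick check along these lines can help. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of the original notebook, and it only reads package metadata from the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# `installed_version` is an illustrative helper, not from the original notebook.
print("datasets:", installed_version("datasets"))
```

If the printed version differs from the pin, restarting the Colab runtime after the install is a common remedy.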

# 2. Import libraries

In [4]:
import warnings
warnings.filterwarnings("ignore")

In [5]:
import pandas as pd
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    MambaConfig,
    MambaForCausalLM,
    Trainer,
    TrainingArguments,
)

# 3. Load model

In [6]:
model = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

The fast path is not available because on of `(selective_state_update, selective_scan_fn, causal_conv1d_fn, causal_conv1d_update, mamba_inner_fn)` is None. Falling back to the naive implementation. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d
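
The warning above indicates that the optional kernel packages are not installed, so the model runs on the slower, naive implementation. The sketch below checks whether those packages are importable. The import names `mamba_ssm` and `causal_conv1d` are assumed from the two projects linked in the warning, and `fast_path_packages` is a hypothetical helper name.

```python
import importlib.util


def fast_path_packages():
    """Report whether the optional kernel packages can be imported."""
    # Import names assumed from the two projects linked in the warning.
    names = ("mamba_ssm", "causal_conv1d")
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Each value is True only if the corresponding package is installed.
print(fast_path_packages())
```

If either value is `False`, the installation instructions at the two URLs in the warning are the place to start.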


In [7]:
tokenizer = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


# 4. Inference

In [8]:
text = "Hey how are you doing?"

In [9]:
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]

In [10]:
out = model.generate(input_ids, max_new_tokens=10)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token.As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
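
The warning above appears because only `input_ids` was passed to `generate`. One way to address it, sketched here under the assumption that the installed `transformers` version accepts an `attention_mask` argument for this model, is to forward the mask that the tokenizer already returns. The helper name `generate_with_mask` is hypothetical.

```python
def generate_with_mask(model, tokenizer, text, max_new_tokens=10):
    """Generate a continuation while passing the attention mask explicitly."""
    # The tokenizer output is expected to hold `input_ids` and `attention_mask`.
    inputs = tokenizer(text, return_tensors="pt")
    return model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
```

With the objects defined in this notebook, the call would be `out = generate_with_mask(model, tokenizer, text)`, and the result can be decoded with `tokenizer.batch_decode(out)` as before.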


In [11]:
print(tokenizer.batch_decode(out))

["Hey how are you doing?\n\nI'm so glad you're here."]


### End of execution

In [12]:
end = time.time()

delta = (end - start)

hours = int(delta/3_600)
mins = int((delta - hours*3_600)/60)
secs = int(delta - hours*3_600 - mins*60)

print(f'Hours: {hours}, Minutes: {mins}, Seconds: {secs}')

Hours: 0, Minutes: 0, Seconds: 13
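
The hour, minute, and second arithmetic above can also be written with `divmod`, which returns the quotient and remainder in one step. This is a minimal sketch with a hypothetical helper name, `format_elapsed`. It assumes a non-negative duration, for which it should give the same result as the cell above.

```python
def format_elapsed(delta):
    """Split a duration in seconds into whole hours, minutes, and seconds."""
    total = int(delta)  # drop the fractional part of the duration
    hours, remainder = divmod(total, 3_600)
    mins, secs = divmod(remainder, 60)
    return f"Hours: {hours}, Minutes: {mins}, Seconds: {secs}"


print(format_elapsed(3_725.9))  # prints "Hours: 1, Minutes: 2, Seconds: 5"
```

In the notebook this would be called as `print(format_elapsed(end - start))`.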
