## Import and install dependencies

In [1]:
!pip install "causal-conv1d>=1.2.0"
!pip install mamba-ssm

Collecting mamba-ssm
  Downloading mamba_ssm-1.2.0.post1.tar.gz (34 kB)
  Preparing metadata (setup.py) ... done
Collecting einops (from mamba-ssm)
  Downloading einops-0.8.0-py3-none-any.whl.metadata (12 kB)
Collecting triton (from mamba-ssm)
  Downloading triton-2.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Downloading einops-0.8.0-py3-none-any.whl (43 kB)
Downloading triton-2.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (168.1 MB)
Building wheels for collected packages: mamba-ssm
  Building wheel for mamba-ssm (setup.py) ... done
  Created wheel for mamba-ssm: filename=mamba_ssm-1.2.0.post1-cp310-cp310-linux_x86_64.whl size=137581036 sha256=37a782

In [2]:
import json
import numpy as np
import random
import torch
import time
import pandas as pd
import gc

Make the notebook deterministic by seeding the Python, NumPy, and PyTorch random number generators.

In [3]:
def fix_random(seed: int) -> None:
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)

    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True


fix_random(seed=42)
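
As a small illustration of what seeding buys, the sketch below reseeds only the standard-library `random` generator and checks that the same seed reproduces the same draws. The helper name `seeded_draws` is hypothetical and not part of the notebook; the NumPy and PyTorch seeds set in `fix_random` follow the same idea, although some GPU kernels may still be nondeterministic.

```python
import random


def seeded_draws(seed: int, n: int = 3) -> list:
    """Reseed the stdlib generator and return the next n draws."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = seeded_draws(42)
second = seeded_draws(42)
other = seeded_draws(43)

# The same seed yields the same sequence; a different seed does not.
print(first == second)
print(first == other)
```

Calling `fix_random` once at the top of the notebook only guarantees reproducibility if the cells are then run in the same order.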

In [4]:
data_path = "/kaggle/input/squad0"

This function loads a small SQuAD subset for evaluation.

In [5]:
def load(path):
    data = []
    # Resolve the file name relative to the dataset directory.
    with open(f"{data_path}/{path}", "r") as file:
        for line in file:
            try:
                data.append(json.loads(line))
            except Exception as e:
                # Report malformed lines and skip them.
                print("json processing exception", e)
                continue
    return data

In [6]:
data = load('squad_val_1k.jsonl')
print(data[0])

{'context': 'The Panthers beat the Seattle Seahawks in the divisional round, running up a 31–0 halftime lead and then holding off a furious second half comeback attempt to win 31–24, avenging their elimination from a year earlier. The Panthers then blew out the Arizona Cardinals in the NFC Championship Game, 49–15, racking up 487 yards and forcing seven turnovers.', 'prompt': 'How many yards did the Panthers get for the division championshipt game?', 'response': '487'}
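
Each record carries three keys: `context`, `prompt`, and `response`. The sketch below shows one possible way to parse such a JSON Lines stream while skipping malformed or incomplete lines. The function `parse_jsonl`, the constant `REQUIRED_KEYS`, and the in-memory sample are illustrative assumptions, not part of the notebook.

```python
import io
import json

# Keys every record is expected to have (taken from the printed example).
REQUIRED_KEYS = {"context", "prompt", "response"}


def parse_jsonl(stream):
    """Return the records that parse and contain all required keys."""
    records = []
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            print(f"line {line_number}: skipped ({error.msg})")
            continue
        if isinstance(record, dict) and REQUIRED_KEYS.issubset(record):
            records.append(record)
        else:
            print(f"line {line_number}: skipped (missing keys)")
    return records


# Hypothetical sample: one valid line, one malformed, one incomplete.
sample = io.StringIO(
    '{"context": "C1", "prompt": "Q1", "response": "A1"}\n'
    "not json\n"
    '{"context": "C2", "prompt": "Q2"}\n'
)
records = parse_jsonl(sample)
print(len(records))
```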


These functions are needed to evaluate the models. Each one is described in the comment above its definition.

In [7]:
# This function generates the prompts. If few_shot is True, each prompt is prefixed with
# three worked examples (few-shot prompting); otherwise only the instruction and the question are used.
def generate_prompt(few_shot, question_with_context):
    questions = []
    if few_shot:
        three_shot_prompting = [
        {
            "context" : "In 1993, the FCC repealed the Financial Interest and Syndication Rules, once again allowing networks to hold interests in television production studios. That same year, Capital Cities/ABC purchased the French animation studio DIC Entertainment; it also signed an agreement with Time Warner Cable to carry its owned-and-operated television stations on the provider's systems in ABC O&O markets. By that year, ABC had a total viewership share of 23.63% of American households, just below the limit of 25% imposed by the FCC.",
            "question": "What French animation studio did ABC purchase in 1993?",
            "answer": "DIC Entertainment",
        },
        {
            "context" : "On April 12, 1961, Soviet cosmonaut Yuri Gagarin became the first person to fly in space, reinforcing American fears about being left behind in a technological competition with the Soviet Union. At a meeting of the US House Committee on Science and Astronautics one day after Gagarin's flight, many congressmen pledged their support for a crash program aimed at ensuring that America would catch up. Kennedy was circumspect in his response to the news, refusing to make a commitment on America's response to the Soviets.",
            "question": "How many days after Gagarin's flight did the US House Committee on Science and Astronautics meet?",
            "answer": "one day",
        },
        {
            "context" : "The customary law of Normandy was developed between the 10th and 13th centuries and survives today through the legal systems of Jersey and Guernsey in the Channel Islands. Norman customary law was transcribed in two customaries in Latin by two judges for use by them and their colleagues: These are the Très ancien coutumier (Very ancient customary), authored between 1200 and 1245; and the Grand coutumier de Normandie (Great customary of Normandy, originally Summa de legibus Normanniae in curia laïcali), authored between 1235 and 1245.",
            "question": "Where are Jersey and Guernsey?",
            "answer": "Channel Islands",
        }
        ]
        for item in question_with_context:
            prompt = f"You are a question answering bot. Your task is to answer the questions based on the appropriate contexts and your own knowledge. Your answers should contain only the most important things and should be as short as possible."
            prompt = f"{prompt}\n\n" + "\n\n".join([f"{p['context']}\n\nQ: {p['question']}\n\nA: {p['answer']}" for p in three_shot_prompting])
            prompt = f"{prompt}\n\n{item}\n\nA:"
            questions.append(prompt)
    else:
        for item in question_with_context:
            prompt = f"You are a question answering bot. Your task is to answer the questions based on the appropriate contexts and your own knowledge. Your answers should contain only the most important things and should be as short as possible."
            prompt = f"{prompt}\n\n{item}\n\nA:"
            questions.append(prompt)
    return questions

# This function generates the answers for a batch of prompts.
def run_with_SQuAD(model, tokenizer, question, few_shot):
    prompt = generate_prompt(few_shot, question)
    
    inputs = tokenizer(prompt, padding=True, truncation=True, return_tensors="pt")
    input_ids = inputs["input_ids"].cuda()
    attention_mask = inputs["attention_mask"].cuda()
    model.cuda()

    out = model.generate(
        input_ids=input_ids,
        max_length=input_ids.shape[1] + 128,
        eos_token_id=tokenizer.eos_token_id,
        attention_mask=attention_mask
    )
    
    answers = []
    num_tokens = []
    
    # Decode only the newly generated tokens (everything after the prompt).
    generated = out[:, input_ids.shape[1]:]
    decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)

    for sequence, decodedText in zip(generated, decoded):
        # Count tokens rather than characters; pad/EOS tokens are excluded.
        num_tokens.append(int((sequence != tokenizer.pad_token_id).sum()))
        answer = decodedText.split("\n\n")[0].strip()
        answers.append(answer)

    return answers, num_tokens

# This function evaluates every data point in batches and collects statistics
# (accuracy, number of generated tokens, elapsed time) during generation.
def eval(data, model, tokenizer, output_file, few_shot, batch_size):
    numberOfTokens = 0
    timeOfStart = 0
    results = []
    total_qs = len(data)
    correct = 0
    i = 0

    for batch_start in range(0, total_qs, batch_size):
        batch_end = min(batch_start + batch_size, total_qs)
        batch_data = data[batch_start:batch_end] 

        # Prepare batch inputs
        questions = [item['prompt'] for item in batch_data]
        contexts = [item['context'] for item in batch_data]
        answers = [item['response'] for item in batch_data]
        inputs = [f"{ctx}\n\nQ: {que}" for ctx, que in zip(contexts, questions)]
        
        # Generate outputs in a batch
        start_time = time.time()
        batch_guesses, batch_num_tokens = run_with_SQuAD(model, tokenizer, inputs, few_shot)
        end_time = time.time()
        timeOfStart += end_time - start_time

        # Process results 
        for guess, num_tokens, answer, question, context in zip(batch_guesses, batch_num_tokens, answers, questions, contexts):
            if guess and guess[-1] == '.' and answer[-1] != '.':
                guess = guess[:-1]
            numberOfTokens += num_tokens
            tkps = num_tokens / (end_time - start_time)
            is_correct = (answer.strip().lower() == guess.strip().lower())
            print(f"Question {i+1}/{total_qs}")
            print(f"Q: {question}")
            print(f"A: {answer}")
            print(f"?: {guess}")
            if is_correct:
                print(f"✅")
                correct += 1
            else:
                print(f"❌")
            print("="*80)
            result = {
                "idx": i,
                "question": question,
                "context": context,
                "answer": answer,
                "guess": guess,
                "is_correct": is_correct,
                "time": end_time - start_time,
                "num_tokens": num_tokens,
                "tokens_per_sec": tkps
            }
            results.append(result)
            i += 1
            
            if len(results) % 20 == 0:
                write_results(results, output_file)

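    # Write once more so results from a final partial chunk (fewer than 20) are not lost.
    write_results(results, output_file)

    print(f"Accuracy: {correct / total_qs * 100}% -- {correct} correct and {total_qs - correct} incorrect")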
    print(f"Number of tokens generated: {numberOfTokens} -- Time: {timeOfStart} -- Tokens-Per-Sec: {numberOfTokens / timeOfStart}")
    
# This function writes the results to a JSON Lines file.
def write_results(results, output_file):
    df = pd.DataFrame(results)
    df = df[["idx", "question", "context", "answer", "guess", "is_correct", "time", "num_tokens", "tokens_per_sec"]]
    print(f"Writing {output_file}")
    df.to_json(output_file, orient="records", lines=True)
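
To make the prompt layout concrete, here is a minimal sketch of how the pieces are joined: the instruction, then zero or more worked examples, then the context and question, ending in an open `A:`. The name `build_prompt`, the shortened `INSTRUCTION` string, and the toy inputs are assumptions for illustration; `generate_prompt` in the cell above remains the version the notebook actually uses.

```python
# Shortened placeholder for the instruction text used in the notebook.
INSTRUCTION = "You are a question answering bot."


def build_prompt(item: str, shots: list) -> str:
    """Join instruction, optional worked examples, and the query with blank lines."""
    parts = [INSTRUCTION]
    parts += [
        f"{s['context']}\n\nQ: {s['question']}\n\nA: {s['answer']}" for s in shots
    ]
    # The prompt ends with an open "A:" for the model to complete.
    parts.append(f"{item}\n\nA:")
    return "\n\n".join(parts)


query = "Some context.\n\nQ: Some question?"
shot = {"context": "C", "question": "Q", "answer": "A"}

zero_shot = build_prompt(query, [])
few_shot = build_prompt(query, [shot])
print(few_shot)
```

Because every section is separated by a blank line, taking the text before the first blank line of the completion is a natural way to cut the answer off, which is what `run_with_SQuAD` does.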

## 130M model

In [8]:
from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer

tokenizer130m = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf", padding_side='left')
model130m = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")
tokenizer130m.eos_token = "<|endoftext|>"
tokenizer130m.pad_token = tokenizer130m.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [9]:
eval(data[:100], model130m, tokenizer130m, "evalMamba130M.json", few_shot=False, batch_size=32)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: The Panthers got a total of 7.5 yards per carry
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: The city's population grew rapidly and the city's economy was heavily reliant on tourism. The city's population grew from about 1,000 in 1846 to about 2,000 in 1877. The city's economy was heavily reliant on tourism. The city's economy was heavily reliant on tourism. The city's economy was heavily reliant on tourism. The city's economy was heavily reliant on tourism. The city's economy was heavily reliant on tourism. The city's economy was heavily reliant on tourism. The city's economy was heavily reliant on tourism. The city's economy was heavily reliant on tourism
❌
Question 3/100
Q: What are pharmacists in the United Kingdom being increasingly paid for?
A: medicine use reviews
?: Pharmacist
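
The ✅/❌ marks come from an exact string comparison in `eval`: a trailing period is dropped from the guess when the reference has none, and both strings are stripped and lower-cased. The sketch below isolates that rule; the function name `is_exact_match` and the sample pairs are illustrative assumptions. It shows why a guess that contains the right answer inside a longer sentence is still counted as wrong.

```python
def is_exact_match(answer: str, guess: str) -> bool:
    """Case-insensitive exact match, ignoring one trailing period on the guess."""
    if guess and guess[-1] == "." and not answer.endswith("."):
        guess = guess[:-1]
    return answer.strip().lower() == guess.strip().lower()


pairs = [
    ("yellow fever outbreaks", "Yellow fever outbreaks"),
    ("yellow fever outbreaks", "The yellow fever outbreaks in the late 19th century"),
    ("487", "487."),
    ("males in the labor market", "men"),
]
verdicts = [is_exact_match(answer, guess) for answer, guess in pairs]
for (answer, guess), verdict in zip(pairs, verdicts):
    print(f"{answer!r} vs {guess!r}: {verdict}")
```

Exact match is a strict metric, so the reported accuracy is best read as a lower bound on how often the model's answer is semantically right.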

In [10]:
eval(data[:100], model130m, tokenizer130m, "evalMamba130M_Few_shot.json", few_shot=True, batch_size=32)

Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: The Panthers scored a total of 1,923 yards on the ground, including 1,923 yards on the ground in the divisional round
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: The city's population was growing rapidly, and the city's economy was in decline. The city's population was growing at a rate of about 1.5% per year, and the city's population was growing at a rate of about 1.5% per year. The city's population was growing at a rate of about 1.5% per year. The city's population was growing at a rate of about 1.5% per year. The city's population was growing at a rate of about 1.5% per year. The city's population was growing at a rate of about 1.5% per year. The city
❌
Question 3/100
Q: What are pharmacists in the United Kingdom being increasingly paid for?
A: medicine use reviews
?: Pharmacist

In [11]:
tokenizer130m = None
model130m = None

gc.collect()

475
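
The number printed above is the return value of `gc.collect()`, the count of unreachable objects the collector found. The sketch below uses a hypothetical `DummyModel` stand-in and a weak reference to show that rebinding the name to `None` releases the object. The optional `torch.cuda.empty_cache()` call is an addition that the original notebook does not make; it asks PyTorch to return cached GPU memory to the driver.

```python
import gc
import weakref


class DummyModel:
    """Hypothetical stand-in for a large model object."""

    def __init__(self):
        self.weights = [0.0] * 1000


model = DummyModel()
probe = weakref.ref(model)  # lets us observe whether the object is still alive

# Dropping the last reference frees the object in CPython;
# gc.collect() additionally cleans up any reference cycles.
model = None
unreachable = gc.collect()
freed = probe() is None
print(freed)

# Optional, and only relevant when PyTorch with CUDA is available.
try:
    import torch

    if torch.cuda.is_available():
        torch.cuda.empty_cache()
except ImportError:
    pass
```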

## 790M model

In [12]:
tokenizer790m = AutoTokenizer.from_pretrained("state-spaces/mamba-790m-hf", padding_side='left')
model790m = MambaForCausalLM.from_pretrained("state-spaces/mamba-790m-hf")
tokenizer790m.eos_token = "<|endoftext|>"
tokenizer790m.pad_token = tokenizer790m.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [13]:
eval(data[:100], model790m, tokenizer790m, "evalMamba790M.json", few_shot=False, batch_size=32)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: 31
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: The city's population declined from about 100,000 in 1900 to about 50,000 in 1920. The city's economy was based on the railroad and the city's growth was limited by the railroad's expansion. The city's population declined from about 100,000 in 1900 to about 50,000 in 1920. The city's economy was based on the railroad and the city's growth was limited by the railroad's expansion. The city's population declined from about 100,000 in 1900 to about 50,000 in 1920. The city's economy was based on the railroad and the city's growth was limited by the railroad's expansion. The city
❌
Question 3/100
Q: What are pharmacists in the United Kingdom being increasingly paid for?
A: medicine use reviews
?: The United Kingdom has a National Health Servi

In [14]:
eval(data[:100], model790m, tokenizer790m, "evalMamba790m_Few_shot.json", few_shot=True, batch_size=32)

Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: 31
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: Yellow fever
❌
Question 3/100
Q: What are pharmacists in the United Kingdom being increasingly paid for?
A: medicine use reviews
?: Pharmaceutical care
❌
Question 4/100
Q: Geoglyphs dating to what period were found in deforested land along the Amazon River?
A: AD 0–1250
?: Amazon rainforest
❌
Question 5/100
Q: Who does a gender pay gap tend to favor?
A: males in the labor market
?: men
❌
Question 6/100
Q: What is the annual construction industry revenue in 2014?
A: $960 billion
?: $9.1 trillion
❌
Question 7/100
Q: Which sculpture by Michelangelo has a full-size replica in the Cast Courts?
A: David
?: David
✅
Question 8/100
Q: Richard Allen and Absalom Jones were licensed by St. George's Church in what year?
A: 1784
?: 1784
✅
Question 9/1

In [16]:
tokenizer790m = None
model790m = None

gc.collect()

0

## 2.8B model

In [17]:
tokenizer2_8b = AutoTokenizer.from_pretrained("state-spaces/mamba-2.8b-hf", padding_side='left')
model2_8b = MambaForCausalLM.from_pretrained("state-spaces/mamba-2.8b-hf")
tokenizer2_8b.eos_token = "<|endoftext|>"
tokenizer2_8b.pad_token = tokenizer2_8b.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [18]:
eval(data[:100], model2_8b, tokenizer2_8b, "evalMamba2_8b.json", few_shot=False, batch_size=8)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: They got 603 yards
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: The yellow fever outbreaks in the late 19th century
❌
Question 3/100
Q: What are pharmacists in the United Kingdom being increasingly paid for?
A: medicine use reviews
?: In the United Kingdom, pharmacists who undertake additional training are obtaining prescribing rights and this is because of pharmacy education. They are also being paid for consultant pharmacists, who traditionally operated primarily in nursing homes are now expanding into direct consultation with patients, under the banner of "senior care pharmacy."
❌
Question 4/100
Q: Geoglyphs dating to what period were found in deforested land along the Amazon River?
A: AD 0–1250
?: The geoglyphs were discovered in 1977 by Ondemar Dias and Alceu Ranzi
❌
Question 5/1

In [19]:
eval(data[:100], model2_8b, tokenizer2_8b, "evalMamba2_8b_Few_shot.json", few_shot=True, batch_size=8)

Question 1/100
Q: How many yards did the Panthers get for the division championshipt game?
A: 487
?: 31
❌
Question 2/100
Q: What caused Jacksonville's tourism to become less desirable at the latter half of the 19th century?
A: yellow fever outbreaks
?: Yellow fever outbreaks
✅
Question 3/100
Q: What are pharmacists in the United Kingdom being increasingly paid for?
A: medicine use reviews
?: medication reviews
❌
Question 4/100
Q: Geoglyphs dating to what period were found in deforested land along the Amazon River?
A: AD 0–1250
?: 1540s
❌
Question 5/100
Q: Who does a gender pay gap tend to favor?
A: males in the labor market
?: women
❌
Question 6/100
Q: What is the annual construction industry revenue in 2014?
A: $960 billion
?: $960 billion
✅
Question 7/100
Q: Which sculpture by Michelangelo has a full-size replica in the Cast Courts?
A: David
?: David
✅
Question 8/100
Q: Richard Allen and Absalom Jones were licensed by St. George's Church in what year?
A: 1784
?: 1784
✅
Question 9/100

In [20]:
tokenizer2_8b = None
model2_8b = None

gc.collect()

93