## Import and install dependencies

In [1]:
!pip install "causal-conv1d>=1.2.0"
!pip install mamba-ssm

Collecting mamba-ssm
  Downloading mamba_ssm-1.2.0.post1.tar.gz (34 kB)
  Preparing metadata (setup.py) ... done
Collecting einops (from mamba-ssm)
  Downloading einops-0.8.0-py3-none-any.whl.metadata (12 kB)
Collecting triton (from mamba-ssm)
  Downloading triton-2.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.4 kB)
Downloading einops-0.8.0-py3-none-any.whl (43 kB)
Downloading triton-2.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (168.1 MB)
Building wheels for collected packages: mamba-ssm
  Building wheel for mamba-ssm (setup.py) ... done
  Created wheel for mamba-ssm: filename=mamba_ssm-1.2.0.post1-cp310-cp310-linux_x86_64.whl size=137581036 sha256=37a782

In [2]:
import numpy as np
import torch
import random
import gc

Make the notebook deterministic by seeding every random number generator.

In [3]:
def fix_random(seed: int) -> None:
    np.random.seed(seed)
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)

    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True


fix_random(seed=42)

Text can be generated either with 3-shot prompting (`few_shot=True`) or from the bare question alone (`few_shot=False`).

In [4]:
def question(question, model, tokenizer, few_shot=True):
    model.cuda()
    if not few_shot:
        input_ids = tokenizer(question, return_tensors="pt")["input_ids"].cuda()
        out = model.generate(input_ids, max_new_tokens=256)
        response = tokenizer.batch_decode(out)[0]
        print(response)
    else:
        three_shot_prompting = [
            {
                "question": "What is the capital of Hungary?",
                "answer": "Budapest"
            },
            {
                "question": "Which is the most populous country?",
                "answer": "India"
            },
            {
                "question": "Who is Isaac Newton?",
                "answer": "Sir Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher. He was a key figure in the Scientific Revolution and the Enlightenment that followed. "
            }
        ]
        
        prompt = "You are a question answering bot. Please answer the questions to the best of your knowledge."
        prompt = f"{prompt}\n\n" + "\n\n".join([f"Q: {p['question']}\nA: {p['answer']}" for p in three_shot_prompting])
        # Use the `question` argument rather than the global `user_message`.
        prompt = f"{prompt}\n\nQ: {question}"
        input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].cuda()

        out = model.generate(input_ids, max_new_tokens=256)
        response = tokenizer.batch_decode(out)[0]
        # Drop the prompt, then keep only the first Q/A block of the continuation.
        response = question + response.replace(prompt, "")
        responseQuery = "Q: " + response.split("\n\n")[0]
        print(responseQuery)
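
The prompt assembled inside `question` can be factored into small pure functions, which makes the layout easy to inspect without loading a model. This is a minimal sketch: `build_few_shot_prompt` and `first_answer` are hypothetical helper names, not part of this notebook or of `transformers`, and only one of the three examples is repeated here.

```python
SYSTEM = "You are a question answering bot. Please answer the questions to the best of your knowledge."


def build_few_shot_prompt(question, examples, system=SYSTEM):
    """Return the system line, the worked Q/A pairs, and the open question."""
    shots = "\n\n".join(f"Q: {ex['question']}\nA: {ex['answer']}" for ex in examples)
    return f"{system}\n\n{shots}\n\nQ: {question}"


def first_answer(decoded, prompt, question):
    """Strip the prompt from the decoded text and keep only the first Q/A block."""
    continuation = decoded.replace(prompt, "")
    return "Q: " + (question + continuation).split("\n\n")[0]


examples = [{"question": "What is the capital of Hungary?", "answer": "Budapest"}]
prompt = build_few_shot_prompt("Who is Albert Einstein?", examples)
print(prompt)
```

Because the prompt ends on an open `Q:` line, the model is expected to continue with an `A:` line; anything it writes after the next blank line is discarded by `first_answer`.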

Each model is asked up to three questions, and its responses are shown below the corresponding cell.

## Mamba 130m model

In [5]:
from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer

tokenizer130m = AutoTokenizer.from_pretrained("state-spaces/mamba-130m-hf")
model130m = MambaForCausalLM.from_pretrained("state-spaces/mamba-130m-hf")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [6]:
user_message = f"Who is Albert Einstein?"
question(user_message, model130m, tokenizer130m, few_shot=False)

Who is Albert Einstein?

Einstein was born in 1869 in Berlin, Germany. He was the son of a Jewish family. He was educated at the University of Berlin and the University of Vienna. He was a member of the Royal Academy of Sciences and the Royal Society of London. He was a member of the Royal Society of London and the Royal Society of Edinburgh. He was a member of the Royal Society of New Zealand. He was a member of the Royal Society of South Africa. He was a member of the Royal Society of Canada. He was a member of the Royal Society of Australia. He was a member of the Royal Society of Edinburgh. He was a member of the Royal Society of London. He was a member of the Royal Society of South Africa. He was a member of the Royal Society of New Zealand. He was a member of the Royal Society of Australia. He was a member of the Royal Society of Canada. He was a member of the Royal Society of Edinburgh. He was a member of the Royal Society of South Africa. He was a member of the Royal Society of

In [7]:
user_message = f"Who is Albert Einstein?"
question(user_message, model130m, tokenizer130m, few_shot=True)

Q: Who is Albert Einstein?
A: Albert Einstein was an English physicist and mathematician who was described in his time as a natural philosopher. He was a key figure in the Scientific Revolution and the Enlightenment that followed. 


In [8]:
user_message = f"Could you write me a python function that calculates the fibonacci numbers?"
question(user_message, model130m, tokenizer130m, few_shot=True)

Q: Could you write me a python function that calculates the fibonacci numbers?
A: I would like to write a function that calculates the Fibonacci numbers.


In [9]:
tokenizer130m = None
model130m = None

gc.collect()

443
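
Rebinding the names to `None` and calling `gc.collect()` drops the Python references to the model. PyTorch's caching allocator may still keep the freed GPU blocks reserved, so tools such as `nvidia-smi` can keep reporting them as used until `torch.cuda.empty_cache()` is called. The following is a sketch of a hypothetical `release` helper (the name and its signature are assumptions, not part of the notebook) that combines both steps and also runs where PyTorch is not installed.

```python
import gc


def release(namespace, *names):
    """Remove `names` from `namespace`, collect garbage, and empty the CUDA cache."""
    for name in names:
        namespace.pop(name, None)
    collected = gc.collect()
    try:
        import torch
    except ImportError:  # Assumption: fall back to plain garbage collection without PyTorch.
        torch = None
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
    return collected


# Demo on a plain dict; in the notebook one would pass globals() instead.
demo = {"model": object(), "tokenizer": object(), "keep": 1}
release(demo, "model", "tokenizer")
print(sorted(demo))
```

In a notebook cell this would be called as `release(globals(), "model130m", "tokenizer130m")`.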

## Mamba 790m model

In [10]:
tokenizer790m = AutoTokenizer.from_pretrained("state-spaces/mamba-790m-hf")
model790m = MambaForCausalLM.from_pretrained("state-spaces/mamba-790m-hf")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [11]:
user_message = f"Who is Albert Einstein?"
question(user_message, model790m, tokenizer790m, few_shot=False)

Who is Albert Einstein?

Einstein was a German physicist who made a major contribution to the field of physics. He was born in 1879 in Ulm, Germany. He was a professor of physics at the University of Zurich, Switzerland, from 1905 to 1921. He was also a professor of physics at the University of Berlin, Germany, from 1921 to 1933. He was a professor of physics at the University of Heidelberg, Germany, from 1933 to 1945. He was a professor of physics at the University of Göttingen, Germany, from 1945 to 1955. He was a professor of physics at the University of Hamburg, Germany, from 1955 to 1961. He was a professor of physics at the University of Bonn, Germany, from 1961 to 1965. He was a professor of physics at the University of Frankfurt, Germany, from 1965 to 1969. He was a professor of physics at the University of Munich, Germany, from 1969 to 1973. He was a professor of physics at the University of Vienna, Austria, from 1973 to 1975. He was a professor of physics at the University of

In [12]:
user_message = f"Who is Albert Einstein?"
question(user_message, model790m, tokenizer790m, few_shot=True)

Q: Who is Albert Einstein?
A: Albert Einstein was a German physicist and mathematician who is considered to be one of the greatest scientists of all time. He is best known for his work in theoretical physics, special relativity, and general relativity. He was awarded the Nobel Prize in Physics in 1921.


In [13]:
user_message = f"Could you write me a python function that calculates the fibonacci numbers?"
question(user_message, model790m, tokenizer790m, few_shot=False)

Could you write me a python function that calculates the fibonacci numbers?

A:

You can use the built-in fibonacci function:
>>> fibonacci(10)
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1617, 2688, 4365, 7152, 11773, 19104, 31357, 51232, 83701, 136736, 224801, 370976, 614789, 998304, 1637025, 2685704, 4412289, 7256064, 11984801, 19785504, 32848569, 54274536, 89181825, 147557856, 243957625, 402795488, 670954401, 1109479968, 1835954433, 3029398896, 4947998895, 8159988992, 13259988889, 21759988896, 35491998893, 5849988889, 96919988894, 159919988896, 263919988897, 439919988898, 718199888989,


In [14]:
tokenizer790m = None
model790m = None

gc.collect()

0

## Mamba 2.8B model

In [15]:
tokenizer2_8b = AutoTokenizer.from_pretrained("state-spaces/mamba-2.8b-hf")
model2_8b = MambaForCausalLM.from_pretrained("state-spaces/mamba-2.8b-hf")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [16]:
user_message = f"Who is Albert Einstein?"
question(user_message, model2_8b, tokenizer2_8b, few_shot=False)

Who is Albert Einstein?

Albert Einstein was a German-born theoretical physicist who is widely regarded as one of the most influential scientists of the 20th century. He is best known for his special theory of relativity, which describes the relationship between mass, energy, and spacetime, and his theory of general relativity, which describes the relationship between gravity, spacetime, and the curvature of space-time.

Einstein was born in Ulm, Germany, on March 14, 1879. He was the son of a Jewish father and a Christian mother. He was the youngest of three children. His father, Hermann Einstein, was a successful businessman and a member of the city council. His mother, Pauline Einstein, was a devout Christian.

Einstein attended the University of Zurich, where he studied physics and mathematics. He graduated in 1902 with a degree in physics. He then went to the University of Berlin, where he studied mathematics and physics. He received his doctorate in 1905.

Einstein was a professo

In [17]:
user_message = f"Who is Albert Einstein?"
question(user_message, model2_8b, tokenizer2_8b, few_shot=True)

Q: Who is Albert Einstein?
A: Albert Einstein was a theoretical physicist who developed the theory of relativity and is widely regarded as one of the most influential scientists in history.


In [18]:
user_message = "Could you write me a python function that calculates the fibonacci numbers?"
question(user_message, model2_8b, tokenizer2_8b, few_shot=True)

Q: Could you write me a python function that calculates the fibonacci numbers?
A: fibonacci(n) = fibonacci(n-1) + fibonacci(n-2)


In [19]:
tokenizer2_8b = None
model2_8b = None

gc.collect()

21

## Mamba chat

In [20]:
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"

tokenizer_chat = AutoTokenizer.from_pretrained("havenhq/mamba-chat")
tokenizer_chat.eos_token = "<|endoftext|>"
tokenizer_chat.pad_token = tokenizer_chat.eos_token
tokenizer_chat.chat_template = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta").chat_template

model_chat = MambaLMHeadModel.from_pretrained("havenhq/mamba-chat", device="cuda", dtype=torch.float16)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [21]:
def generate(messages, question, model, tokenizer):
    # Add the new user turn to the running conversation.
    messages.append(dict(
        role="user",
        content=question
    ))

    # Render the whole conversation with the chat template and open an assistant turn.
    input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to("cuda")

    out = model.generate(input_ids=input_ids, max_length=2000, temperature=0.9, top_p=0.7, eos_token_id=tokenizer.eos_token_id)

    # The reply is everything after the last assistant marker in the decoded text.
    decoded = tokenizer.batch_decode(out)
    reply = decoded[0].split("<|assistant|>\n")[-1]
    messages.append(dict(
        role="assistant",
        content=reply
    ))

    print("Model:", reply)
    return messages
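
`generate` recovers the reply by splitting the decoded text on the `<|assistant|>\n` marker and keeping the last piece. The sketch below isolates that step in a hypothetical `extract_reply` helper and also strips a trailing end-of-sequence token. The sample string is hand-written for illustration and is an assumption about the layout, not actual tokenizer output.

```python
def extract_reply(decoded, marker="<|assistant|>\n", eos_token="<|endoftext|>"):
    """Return the text after the last assistant marker, without a trailing EOS token."""
    reply = decoded.split(marker)[-1]
    if reply.endswith(eos_token):
        reply = reply[: -len(eos_token)]
    return reply.strip()


# Hand-written sample conversation (illustrative only).
sample = "<|user|>\nHi<|endoftext|>\n<|assistant|>\nHello there.<|endoftext|>"
print(extract_reply(sample))
```

Storing the cleaned reply in `messages` would keep the end-of-sequence token from being fed back into the next turn's prompt.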

In [22]:
messages = []

user_message = f"Who is Albert Einstein?"
messages = generate(messages, user_message, model_chat, tokenizer_chat)

Model: Albert Einstein was a German-born theoretical physicist who made significant contributions to the fields of relativity, quantum mechanics, and the theory of general relativity. He is widely regarded as one of the greatest scientists of all time and is considered one of the most influential figures in the history of science.

Einstein was born in Ulm, Germany, in 1879. He studied at the University of Zurich and the University of Munich before moving to Berlin in 1905 to work as a research assistant at the Kaiser Wilhelm Institute for Physics. In 1905, Einstein published his theory of special relativity, which explained the effects of motion on the speed of light and the constancy of the speed of light.

In the following years, Einstein continued to make significant contributions to physics, including his theory of general relativity, which explained the curvature of space-time and the relationship between gravity and mass. He also developed the theory of the photoelectric effect,

In [23]:
user_message = "Could you write me a python function that calculates the fibonacci numbers?"
messages = generate(messages, user_message, model_chat, tokenizer_chat)

Model: Sure, here's a Python function that calculates the fibonacci numbers:

```python
def fibonacci(n):
    if n < 2:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)
```

This function takes an integer `n` as an argument and returns the fibonacci number `n`. The function first checks if `n` is less than 2, in which case it returns `n`. Otherwise, it recursively calls itself with the previous two fibonacci numbers to calculate the next fibonacci number.

To use this function, you can call it with a positive integer `n` as an argument and it will return the fibonacci number `n`. For example, here's how you can use it to calculate the fibonacci numbers up to 10:

```python
fibonacci_numbers = [0, 1]
for n in range(10):
    fibonacci_numbers.append(fibonacci(n))
```

This code creates an empty list `fibonacci_numbers` and then calls the `fibonacci` function with each number from 0 to 10 as an argument. The function then appends each fibonacci number to the `fibo

In [24]:
tokenizer_chat = None
model_chat = None

gc.collect()

694

## RWKV 169m model

In [25]:
from transformers import AutoModelForCausalLM

model_rwkv169m = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-4-world-169m", trust_remote_code=True).to(torch.float32)
tokenizer_rwkv169m = AutoTokenizer.from_pretrained("RWKV/rwkv-4-world-169m", trust_remote_code=True)

A new version of the following files was downloaded from https://huggingface.co/RWKV/rwkv-4-world-169m:
- tokenization_rwkv5.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'GPTNeoXTokenizerFast'. 
The class this function is called from is 'Rwkv5Tokenizer'.


In [26]:
def questionRWKV(question, model, tokenizer, few_shot=True):
    model.cuda()
    if not few_shot:
        input_ids = tokenizer(question, return_tensors="pt")["input_ids"].cuda()
       
        out = model.generate(input_ids, max_new_tokens=256)
        response = tokenizer.decode(out[0].tolist(), skip_special_tokens=True)
        print(response)
    else:
        three_shot_prompting = [
            {
                "question": "What is the capital of Hungary?",
                "answer": "Budapest"
            },
            {
                "question": "Which is the most populous country?",
                "answer": "India"
            },
            {
                "question": "Who is Isaac Newton?",
                "answer": "Sir Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher. He was a key figure in the Scientific Revolution and the Enlightenment that followed. "
            }
        ]
        
        prompt = f"You are a question answering bot. Please answer the questions to the best of your knowledge."
        prompt = f"{prompt}\n\n" + "\n\n".join([f"Question: {p['question']}\n\nAnswer: {p['answer']}" for p in three_shot_prompting])
        prompt = f"{prompt}\n\n{question}"
        input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"].cuda()
        
        out = model.generate(input_ids, max_new_tokens=256)
        response = tokenizer.batch_decode(out, skip_special_tokens=True)[0]
        response = response.replace(prompt, "")
        response = response.split("\n\n")[0]
        responseQuery = f"Question: {question}" + response
        print(responseQuery)

In [27]:
def generate_prompt(instruction):
    instruction = instruction.strip().replace('\r\n','\n').replace('\n\n','\n')
    return f"""Question: {instruction}

Answer:"""
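
`generate_prompt` already prepends `Question: `, while the few-shot branch of `questionRWKV` prints `f"Question: {question}"`, so the few-shot outputs below begin with `Question: Question:`. A minimal sketch of one way to avoid the doubled label, using a hypothetical `label_once` helper that is not part of the notebook:

```python
def label_once(question, label="Question: "):
    """Prefix `question` with `label` unless it already starts with it."""
    return question if question.startswith(label) else label + question


# The string below mirrors what generate_prompt returns for this question.
wrapped = "Question: Who is Albert Einstein?\n\nAnswer:"
print(label_once(wrapped))
print(label_once("Who is Albert Einstein?"))
```

Replacing `f"Question: {question}"` with `label_once(question)` in `questionRWKV` would print the label a single time whether or not the question was wrapped beforehand.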

In [28]:
user_message = f"Who is Albert Einstein?"
user_message = generate_prompt(user_message)
questionRWKV(user_message, model_rwkv169m, tokenizer_rwkv169m, few_shot=False)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
2024-05-08 13:44:23.886944: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-08 13:44:23.887077: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-08 13:44:24.033274: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


Question: Who is Albert Einstein?

Answer: Albert Einstein is a German-American physicist who is best known for his work on the theory of relativity. He is also known for his work on the theory of relativity and for his work on the theory of relativity.


In [29]:
user_message = f"Who is Albert Einstein?"
user_message = generate_prompt(user_message)
questionRWKV(user_message, model_rwkv169m, tokenizer_rwkv169m, few_shot=True)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Question: Question: Who is Albert Einstein?

Answer: Albert Einstein is a German-American physicist who is best known for his work on the theory of relativity. He is also known for his work on quantum mechanics and quantum mechanics.


In [30]:
user_message = "Could you write me a python function that calculates the fibonacci numbers?"
user_message = generate_prompt(user_message)
questionRWKV(user_message, model_rwkv169m, tokenizer_rwkv169m, few_shot=True)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Question: Question: Could you write me a python function that calculates the fibonacci numbers?

Answer: Sure! Here's the Python function that calculates the fibonacci numbers:
```
def fibonacci(n):
    if n <= 0:
        return 0
    else:
        return fibonacci(n-1) + fibonacci(n-2)
```
This function takes two arguments, `n` and `n`, as arguments. It then calculates the Fibonacci number using the `fibonacci` function. The function is called `fibonacci` if the fibonacci number is greater than or equal to `n`.


In [31]:
tokenizer_rwkv169m = None
model_rwkv169m = None

gc.collect()

30

## RWKV 3b model

In [32]:
!pip install flash-rwkv

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting flash-rwkv
  Downloading flash_rwkv-0.3.0-py3-none-any.whl.metadata (87 bytes)
Downloading flash_rwkv-0.3.0-py3-none-any.whl (8.4 kB)
Installing collected packages: flash-rwkv
Successfully installed flash-rwkv-0.3.0


In [33]:
model_rwkv_3b = AutoModelForCausalLM.from_pretrained("RWKV/rwkv-5-world-3b", trust_remote_code=True).to(torch.float32)
tokenizer_rwkv_3b = AutoTokenizer.from_pretrained("RWKV/rwkv-5-world-3b", trust_remote_code=True, padding_side='left', pad_token="<s>")

A new version of the following files was downloaded from https://huggingface.co/RWKV/rwkv-5-world-3b:
- configuration_rwkv5.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/RWKV/rwkv-5-world-3b:
- modeling_rwkv5.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Creating extension directory /root/.cache/torch_extensions/py310_cu121/flash_rwkv_5...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/flash_rwkv_5/build.ninja...
Building extension module flash_rwkv_5...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)


[1/3] c++ -MMD -MF wkv5_op.o.d -DTORCH_EXTENSION_NAME=flash_rwkv_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -c /opt/conda/lib/python3.10/site-packages/flash_rwkv/rwkv5/wkv5_op.cpp -o wkv5_op.o 
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=flash_rwkv_5 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packag

Loading extension module flash_rwkv_5...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
The input conditions for extension module flash_rwkv_5 have changed. Bumping to version 1 and re-building as flash_rwkv_5_v1...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/flash_rwkv_5/build.ninja...
Building extension module flash_rwkv_5_v1...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)


[1/3] c++ -MMD -MF wkv6_op.o.d -DTORCH_EXTENSION_NAME=flash_rwkv_5_v1 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -c /opt/conda/lib/python3.10/site-packages/flash_rwkv/rwkv6/wkv6_op.cpp -o wkv6_op.o 
[2/3] /usr/local/cuda/bin/nvcc  -DTORCH_EXTENSION_NAME=flash_rwkv_5_v1 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1013\" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-

Loading extension module flash_rwkv_5_v1...


A new version of the following files was downloaded from https://huggingface.co/RWKV/rwkv-5-world-3b:
- tokenization_rwkv5.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


In [34]:
user_message = f"Who is Albert Einstein?"
user_message = generate_prompt(user_message)
questionRWKV(user_message, model_rwkv_3b, tokenizer_rwkv_3b, few_shot=False)

Question: Who is Albert Einstein?

Answer: Albert Einstein was a German-born theoretical physicist who developed the theory of relativity. He is widely regarded as one of the most influential scientists of the 20th century.


In [35]:
user_message = f"Who is Albert Einstein?"
user_message = generate_prompt(user_message)
questionRWKV(user_message, model_rwkv_3b, tokenizer_rwkv_3b, few_shot=True)

Question: Question: Who is Albert Einstein?

Answer: Albert Einstein was a German-born theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics. He is best known for his mass–energy equivalence formula E = mc2, which has been dubbed "the world's most famous equation".


In [36]:
user_message = "Could you write me a python function that calculates the fibonacci numbers?"
user_message = generate_prompt(user_message)
questionRWKV(user_message, model_rwkv_3b, tokenizer_rwkv_3b, few_shot=True)

Question: Question: Could you write me a python function that calculates the fibonacci numbers?

Answer: def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)


In [37]:
tokenizer_rwkv_3b = None
model_rwkv_3b = None

gc.collect()

44

## Transformer-based LLM - GPT-2 124M

In [38]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer_gpt2 = GPT2Tokenizer.from_pretrained('gpt2')
model_gpt2 = GPT2LMHeadModel.from_pretrained('gpt2')

In [39]:
user_message = f"Who is Albert Einstein?"
question(user_message, model_gpt2, tokenizer_gpt2, few_shot=False)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Who is Albert Einstein?

Albert Einstein is the most famous physicist of all time. He was born in 1859 in the city of Zurich, Switzerland. He was educated at the University of Zurich and at the University of Chicago. He was a member of the Swiss National Academy of Sciences and the National Academy of Sciences of the United States of America. He was a member of the National Academy of Sciences of the United States of America. He was a member of the National Academy of Sciences of the United States of America. He was a member of the National Academy of Sciences of the United States of America. He was a member of the National Academy of Sciences of the United States of America. He was a member of the National Academy of Sciences of the United States of America. He was a member of the National Academy of Sciences of the United States of America. He was a member of the National Academy of Sciences of the United States of America. He was a member of the National Academy of Sciences of the U

In [40]:
user_message = f"Who is Albert Einstein?"
question(user_message, model_gpt2, tokenizer_gpt2, few_shot=True)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Q: Who is Albert Einstein?


In [41]:
user_message = "Could you write me a python function that calculates the fibonacci numbers?"
question(user_message, model_gpt2, tokenizer_gpt2, few_shot=True)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Q: Could you write me a python function that calculates the fibonacci numbers?


In [42]:
tokenizer_gpt2 = None
model_gpt2 = None

gc.collect()

0

## Transformer-based LLM - GPT-Neo 2.7B

In [43]:
tokenizer_gpt = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
model_gpt = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")

In [44]:
user_message = f"Who is Albert Einstein?"
question(user_message, model_gpt, tokenizer_gpt, few_shot=False)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Who is Albert Einstein?

Albert Einstein was a German-born theoretical physicist who is best known for his work on the theory of relativity. He is also known for his contributions to the field of quantum mechanics, and for his work on the photoelectric effect.

Einstein was born in Ulm, Germany, on March 14, 1879. He was the son of a Jewish family. His father, a bookkeeper, died when Albert was only three years old. His mother, who was a housewife, died when he was seven.

Einstein was educated at home, and at the age of 12 he was sent to a private school in Ulm. He was a brilliant student, and was awarded a scholarship to study at the University of Berlin. He graduated in 1900 with a degree in physics.

Einstein was a member of the German Physical Society, and was elected to the German Academy of Sciences in 1905. He was also a member of the French Academy of Sciences.

Einstein was married to Mileva Maric, with whom he had two children, a son, born in 1905, and a daughter, born in 19

In [45]:
user_message = f"Who is Albert Einstein?"
question(user_message, model_gpt, tokenizer_gpt, few_shot=True)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Q: Who is Albert Einstein?
A: Albert Einstein was a German-born theoretical physicist who is widely regarded as one of the most influential scientists of the 20th century. He is best known for his contributions to the theory of relativity, which he developed while working at the patent office in Bern, Switzerland. He is also known for his contributions to the field of quantum mechanics, which he developed while working at the patent office in Bern, Switzerland. 


In [46]:
user_message = "Could you write me a python function that calculates the fibonacci numbers?"
question(user_message, model_gpt, tokenizer_gpt, few_shot=True)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Q: Could you write me a python function that calculates the fibonacci numbers?
A: Yes, but it would take a long time.


In [47]:
tokenizer_gpt = None
model_gpt = None

gc.collect()

0