# Introduction  


We will use the Llama 2 model to test whether it can perform simple math operations.

The model used is:

* **Model**: Llama 2  
* **Variation**: 7b-chat-hf  
* **Version**: V1  
* **Framework**: PyTorch  


# Imports and utils

In [1]:
from transformers import AutoTokenizer
import transformers
import torch
import warnings
from time import time
warnings.filterwarnings('ignore')
!pip install xformers

Collecting xformers
  Downloading xformers-0.0.21-cp310-cp310-manylinux2014_x86_64.whl (167.0 MB)
Collecting torch==2.0.1 (from xformers)
  Downloading torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
Collecting nvidia-cuda-nvrtc-cu11==11.7.99 (from torch==2.0.1->xformers)
  Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
Collecting nvidia-cuda-runtime-cu11==11.7.99 (from torch==2.0.1->xformers)
  Downloading nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)

In [2]:
def load_model_tokenize_create_pipeline():
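    """
    Load the tokenizer and create a text-generation pipeline for the model.
    Print the time taken by each of the two steps.
    Args:
        None
    Returns:
        tokenizer: the tokenizer
        pipeline: the text-generation pipeline
    """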
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    model = "/kaggle/input/llama-2/pytorch/7b-chat-hf/1"
    tokenizer = AutoTokenizer.from_pretrained(model)
    time_2 = time()
    print(f"Load model and init tokenizer: {round(time_2-time_1, 3)}")
    pipeline = transformers.pipeline(
        "text-generation",
        model=model,
        torch_dtype=torch.float16,
        device_map="auto",)
    time_3 = time()
    print(f"Prepare pipeline: {round(time_3-time_2, 3)}")
    return tokenizer, pipeline

In [3]:
def test_model(tokenizer, pipeline, prompt_to_test):
    """
    Run a prompt through the pipeline and print the inference time
    and the generated text.
    Args:
        tokenizer: the tokenizer
        pipeline: the pipeline
        prompt_to_test: the prompt
    Returns:
        None
    """
    # adapted from https://huggingface.co/blog/llama2#using-transformers
    time_1 = time()
    sequences = pipeline(
        prompt_to_test,
        do_sample=True,
        top_k=10,
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,
        max_length=200,)
    time_2 = time()
    print(f"Test inference: {round(time_2-time_1, 3)}")
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")

# Initialize the model, tokenizer and create a pipeline

In [4]:
tokenizer, pipeline = load_model_tokenize_create_pipeline()

Load model and init tokenizer: 0.171


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Prepare pipeline: 189.981


After initializing the tokenizer, we created the pipeline. Creating the pipeline is by far the slowest step (about 190 seconds in this run, against well under a second for the tokenizer).
Now let's test the model with a few mathematical prompts.

# Can Llama 2 do simple math?

## Prompt #1: Perform a simple sum

In [5]:
prompt_to_test = 'Prompt: Adrian has three apples. His sister Anne has ten apples more than him. How many apples has Anne?'
test_model(tokenizer, pipeline, prompt_to_test)

Test inference: 7.73
Result: Prompt: Adrian has three apples. His sister Anne has ten apples more than him. How many apples has Anne?

Answer: If Adrian has 3 apples, and Anne has 10 apples more than him, then Anne has 13 apples (3 + 10 = 13).
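A quick way to check an answer like this one without reading it by eye is to look for the expected number in the generated text. The sketch below is a rough heuristic, not part of the original notebook: `extract_numbers` and `answer_mentions` are hypothetical helper names, and a match only shows that the number is mentioned somewhere, not that the reasoning around it is sound.

```python
import re


def extract_numbers(text):
    """Return every integer or decimal found in text, as floats."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]


def answer_mentions(text, expected, tol=1e-6):
    """Return True if any number in text is within tol of expected."""
    return any(abs(n - expected) <= tol for n in extract_numbers(text))


# The answer text is copied from the output above.
answer = (
    "Answer: If Adrian has 3 apples, and Anne has 10 apples more than him, "
    "then Anne has 13 apples (3 + 10 = 13)."
)
print(answer_mentions(answer, 13))  # True
```

Here the expected value 13 (3 + 10) appears in the answer, so the check passes.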


## Prompt #2: Ask for the area of a circle, given the radius

In [6]:
prompt_to_test = 'Prompt: A circle has the radius 5. What is the area of the circle?'
test_model(tokenizer, pipeline, prompt_to_test)

Test inference: 6.186
Result: Prompt: A circle has the radius 5. What is the area of the circle?
Solution: To find the area of a circle, we can use the formula: Area = πr^2, where r is the radius of the circle. In this case, the radius of the circle is 5, so we can substitute this value into the formula: Area = π(5)^2 = 3.14(25) = 78.5. So, the area of the circle is 78.5 square units.
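The model rounded π to 3.14 in its answer. As a small sanity check (added here for illustration, not part of the original run), the same area can be computed with `math.pi` and compared against the rounded value:

```python
import math

radius = 5

# Area using the full-precision value of pi.
exact_area = math.pi * radius ** 2

# Area using the two-decimal approximation the model used.
approx_area = 3.14 * radius ** 2

print(round(exact_area, 2))   # 78.54
print(round(approx_area, 2))  # 78.5
```

The two values agree to one decimal place, so the model's answer of 78.5 is correct for the approximation it chose.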


## Prompt #3: Solve a problem with two unknowns

In [7]:
prompt_to_test = 'Prompt: Anne and Adrian have a total of 10 apples. Anne has 2 apples more than Adrian.\
How many apples has each of the children Anne and Adrian?'
test_model(tokenizer, pipeline, prompt_to_test)

Test inference: 6.357
Result: Prompt: Anne and Adrian have a total of 10 apples. Anne has 2 apples more than Adrian.How many apples has each of the children Anne and Adrian?
Answer: Let's say Adrian has x apples. Since Anne has 2 apples more than Adrian, Anne has 2+x apples.
We know that the total number of apples is 10, so we can set up the equation:
2 + x = 10

Solving for x, we get:
x = 8

So Adrian has 8 apples and Anne has 10 - 8 = 2 apples.
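The answer above is not consistent with the prompt: the model's equation `2 + x = 10` leaves out Adrian's own `x` apples, and its result gives Anne fewer apples than Adrian. With Adrian holding `x` apples and Anne `x + 2`, the equation should be `x + (x + 2) = 10`, so `x = 4`. The sketch below works this out; `solve_sum_and_difference` is a hypothetical helper written for this check, not something from the notebook.

```python
def solve_sum_and_difference(total, difference):
    """Solve larger + smaller = total with larger - smaller = difference."""
    smaller = (total - difference) / 2
    larger = smaller + difference
    return larger, smaller


anne, adrian = solve_sum_and_difference(total=10, difference=2)
print(f"Anne: {anne}, Adrian: {adrian}")  # Anne: 6.0, Adrian: 4.0

# Both conditions from the prompt hold for this solution.
assert anne + adrian == 10
assert anne - adrian == 2
```

The expected answer is therefore 6 apples for Anne and 4 for Adrian, not 2 and 8.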


# Conclusions


Initializing the tokenizer and the pipeline took under 200 seconds on the first run on a GPU T4 x2 (this time can vary). After that, each prompt took less than 10 seconds.

The simple math questions are answered sometimes correctly and sometimes incorrectly. In the run shown above, prompts #1 and #2 were answered correctly, while the answer to prompt #3 was wrong.

This may be related to sampling: the pipeline is called with `do_sample=True` and `top_k=10`, and the temperature is not set explicitly.
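One way to test this hypothesis would be to control the generation settings explicitly. The sketch below builds keyword arguments for either greedy decoding or sampling with a chosen temperature. `build_generation_kwargs` is a hypothetical helper, the default values are illustrative assumptions, and none of this was run against the model here, so whether it changes the accuracy is untested.

```python
def build_generation_kwargs(greedy=True, temperature=0.7, top_k=10, max_length=200):
    """Build generation arguments for a text-generation pipeline call.

    greedy=True disables sampling, so repeated runs give the same output.
    greedy=False samples with the given temperature and top_k.
    """
    kwargs = {"num_return_sequences": 1, "max_length": max_length}
    if greedy:
        kwargs["do_sample"] = False
    else:
        kwargs.update(do_sample=True, temperature=temperature, top_k=top_k)
    return kwargs


print(build_generation_kwargs(greedy=True))
print(build_generation_kwargs(greedy=False, temperature=0.2))

# Possible use with the objects created earlier in the notebook (not run here):
# sequences = pipeline(
#     prompt_to_test,
#     eos_token_id=tokenizer.eos_token_id,
#     **build_generation_kwargs(greedy=True),
# )
```

Running each prompt several times under both settings would show whether the wrong answers come from sampling randomness or from the model itself.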