# Llama

In this notebook, the original dataset is run through the Llama model in order to obtain artificially generated text.

### Install missing dependencies

[llama-cpp-python](https://github.com/abetlen/llama-cpp-python) provides Python bindings for [llama.cpp](https://github.com/ggerganov/llama.cpp).
It makes llama.cpp functionality available from Python code.

In [4]:
pip install llama-cpp-python==0.1.71

Collecting llama-cpp-python==0.1.71
  Downloading llama_cpp_python-0.1.71.tar.gz (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 2.3 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.1.71-cp311-cp311-macosx_14_0_arm64.whl size=226040 sha256=98ddaaec94f2c81273b7c06d311ddee4d95a83b8043691540d3463ce22b5dbfb
  Stored in directory: /Users/d3lph1/Library/Caches/pip/wheels/ed/34/05/2f85e98dce83c341e1483c2d5f9c31134c73abb51100bd519d
Successfully built llama-cpp-python
Installing collected packages: llama-cpp-python
  Attempting uninstall: llama-cpp-python
    Found existing installation

In [86]:
import numpy as np
import pandas as pd
from tqdm import tqdm
import math

from llama_cpp import Llama

### Read original dataset and add some columns

The `another` column is reserved for the text generated by the second model (Phi). This notebook was written before the second LLM had been chosen, hence the generic column name.

In [95]:
df = pd.read_parquet('./datasets/dataset.parquet')

# Drop columns that are not needed for generation
df = df.drop(columns=['nbestanswers', 'main_category'])
# Empty placeholder columns for the generated answers
df['llama'] = ''
df['another'] = ''

In [96]:
df.to_csv('./datasets/container.csv')

In [105]:
df = pd.read_csv('./datasets/container.csv')
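# Empty cells are read back as NaN; casting to str turns them into the
# literal 'nan' that the generation loop checks for
df['llama'] = df['llama'].astype(str)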
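
The round trip through CSV has two side effects that are visible later in this notebook: empty strings come back as missing values, and the saved index comes back as an `Unnamed: 0` column. The sketch below reproduces both on a two-row toy frame (the values are made up, and an in-memory buffer stands in for the file on disk). It also shows `index=False` as one possible way to avoid the extra column; this is an illustration, not what the notebook does.

```python
import io

import pandas as pd

# Toy stand-in for the real dataset (hypothetical values)
toy = pd.DataFrame({"question": ["q1", "q2"], "llama": ["", "some answer"]})

# Round trip as in the notebook: the index is written as an unnamed first
# column, and the empty string is read back as a missing value.
buf = io.StringIO()
toy.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf)

print(list(restored.columns))
print(restored["llama"].isna().tolist())

# Writing without the index avoids the extra column on reload.
buf_no_index = io.StringIO()
toy.to_csv(buf_no_index, index=False)
buf_no_index.seek(0)
clean = pd.read_csv(buf_no_index)

print(list(clean.columns))
```

Checking `isna()` on the reloaded column is an alternative to comparing against the string `'nan'` and does not depend on how the column is cast.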

### Load llama.cpp model (Llama 7B quantized to 4 bits)

In [77]:
llm = Llama(model_path="./llama/llama.cpp/models/7B/ggml-model-q4_0.bin", verbose=False)

llama.cpp: loading model from ./llama/llama.cpp/models/7B/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: mem required  = 5439.94 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  =  256.00 MB


### Evaluate the dataset via the model

In [None]:
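for index, row in tqdm(df.iterrows(), total=len(df)):
    # Only process rows that have no answer yet (NaN was cast to the string 'nan')
    if row['llama'] == 'nan':
        question = row['question']
        output = llm(f"Q: {question} A: ", max_tokens=128, stop=["Q:", "\n"], echo=False)

        choice = output['choices'][0]

        # Keep the answer only if generation finished on its own,
        # not because it ran into the max_tokens limit
        if choice['finish_reason'] == 'stop':
            df.at[index, 'llama'] = choice['text']

    # Checkpoint every 100 rows
    if index != 0 and index % 100 == 0:
        df.to_csv('./datasets/llama.csv')

# Final save so that rows processed after the last checkpoint are not lost
df.to_csv('./datasets/llama.csv')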
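
The loop keeps a completion only when `finish_reason` is `'stop'`. Completions that were cut off at `max_tokens` are understood to be reported by llama-cpp-python as `'length'`, and the loop leaves those rows unanswered. A minimal sketch of that filtering logic, separated from the model call so it can be tried without loading the weights; `build_prompt` and `extract_answer` are hypothetical helper names, and the two response dicts are hand-written examples in the shape shown at the end of this notebook.

```python
from typing import Optional


def build_prompt(question: str) -> str:
    """Format a question with the same Q/A template the loop uses."""
    return f"Q: {question} A: "


def extract_answer(output: dict) -> Optional[str]:
    """Return the completion text only if generation was not cut off."""
    choice = output["choices"][0]
    if choice["finish_reason"] == "stop":
        return choice["text"]
    return None


# Hand-written responses (made-up text), not real model output
finished = {"choices": [{"text": " Call a beekeeper.", "index": 0,
                         "logprobs": None, "finish_reason": "stop"}]}
truncated = {"choices": [{"text": " It is a long story that", "index": 0,
                          "logprobs": None, "finish_reason": "length"}]}

print(repr(build_prompt("How to get rid of a beehive?")))
print(extract_answer(finished))
print(extract_answer(truncated))
```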

In [64]:
df

| Unnamed: 0.1 | Unnamed: 0 | id | question | answer | llama | another |
|---|---|---|---|---|---|---|
| 0 | 0 | 2020338 | Why did the U.S Invade Iraq ? | A small group of politicians believed strongly... | 1) To control the oil supply, (2)To make the w... | |
| 1 | 1 | 2874684 | How to get rid of a beehive? | Call an area apiarist. They should be able to... | | |
| 2 | 2 | 4193114 | Why don't European restaurants serve water? | There's a general belief in Europe (and in fac... | | |
| 3 | 3 | 1908421 | Why hybrid cars gas mileage is better in city ? | hybrid cars save energy in two ways: 1.by stor... | | |
| 4 | 4 | 3608897 | Can someone explain the theory of e=mc2? | In general it means that in a very high speed ... | | |
| ... | ... | ... | ... | ... | ... | ... |
| 87357 | 87357 | 1134376 | How did the phrase "(someone) has gone missing... | I think it's strange, too.. It sounds very col... | | |
| 87358 | 87358 | 945182 | Why in the world do I have to press 1 to get E... | Because there are some Spanish people. If it a... | | |
| 87359 | 87359 | 3433624 | Why chemicals should never be placed directly ... | Because chemicals react with eachother, it is ... | | |
| 87360 | 87360 | 2900989 | what do you consider..........? | treat others as you want to be treated. becaus... | | |


In [107]:
output = llm(f"Q: {question} A: ", max_tokens=128, stop=["Q:", "\n"], echo=False)

In [108]:
output

{'id': 'cmpl-5b3b3772-33aa-46ba-85cf-059bdc15e55c',
 'object': 'text_completion',
 'created': 1701124200,
 'model': './llama/llama.cpp/models/7B/ggml-model-q4_0.bin',
 'choices': [{'text': ' So far as I know, the Chinese military has been in touch with other military forces of the region since at least 1952. They started their peace keeping forces in Korea after 1950 and sent soldiers to Vietnam to help them against US invaders in 1964. In the 1980s they helped Ethiopia against Somalia, and later in the same decade they provided military assistance to Angola (against South Africa) and Zimbabwe (against Rhodesia).',
   'index': 0,
   'logprobs': None,
   'finish_reason': 'stop'}],
 'usage': {'prompt_tokens': 52, 'completion_tokens': 114, 'total_tokens': 166}}
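
The `usage` block makes it possible to check how close a completion came to the limits set earlier: `max_tokens=128` in the call and `n_ctx = 512` in the model load log. A small sketch, where `summarize` is a hypothetical helper and the sample dict reuses only the token counts from the output above (the text is a placeholder):

```python
def summarize(output: dict, n_ctx: int = 512, max_tokens: int = 128) -> dict:
    """Report whether the completion hit max_tokens and how much context is left."""
    usage = output["usage"]
    return {
        "text": output["choices"][0]["text"].strip(),
        "hit_token_limit": usage["completion_tokens"] >= max_tokens,
        "context_left": n_ctx - usage["total_tokens"],
    }


# Token counts copied from the output above; the text is a placeholder
sample = {
    "choices": [{"text": " placeholder answer", "index": 0,
                 "logprobs": None, "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 52, "completion_tokens": 114, "total_tokens": 166},
}

summary = summarize(sample)
print(summary)
```

Here 114 completion tokens is below the limit of 128, which is consistent with the `finish_reason` of `'stop'`, and 166 total tokens leaves 346 tokens of the 512-token context unused.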