#**Llama 2**

The Llama 2 is a collection of pretrained and fine-tuned generative text models, ranging from 7 billion to 70 billion parameters, designed for dialogue use cases.

 It outperforms open-source chat models on most benchmarks and is on par with popular closed-source models in human evaluations for helpfulness and safety.

[Llama 2 13B-chat](https://huggingface.co/meta-llama/Llama-2-13b-chat)

`llama.cpp`'s objective is to run the LLaMA model with 4-bit integer quantization on MacBook. It is a plain C/C++ implementation optimized for Apple silicon and x86 architectures, supporting various integer quantization and BLAS libraries. Originally a web chat example, it now serves as a development playground for ggml library features.

`GGML`, a C library for machine learning, facilitates the distribution of large language models (LLMs). It utilizes quantization to enable efficient LLM execution on consumer hardware. GGML files contain binary-encoded data, including version number, hyperparameters, vocabulary, and weights. The vocabulary comprises tokens for language generation, while the weights determine the LLM's size. Quantization reduces precision to optimize resource usage.

#  Quantized Models from the Hugging Face Community

The Hugging Face community provides quantized models, which allow us to efficiently and effectively utilize the model on the T4 GPU. It is important to consult reliable sources before using any model.

There are several variations available, but the ones that interest us are based on the GGLM library.

We can see the different variations that Llama-2-13B-GGML has [here](https://huggingface.co/models?search=llama%202%20ggml).



In this case, we will use the model called [Llama-2-13B-chat-GGML](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML).

#**Step 1: Install All the Required Packages**

In [1]:
# GPU llama-cpp-python
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78 numpy==1.23.4 --force-reinstall --upgrade --no-cache-dir --verbose
!pip install huggingface_hub
!pip install llama-cpp-python==0.1.78
!pip install numpy==1.23.4

Using pip 23.2.1 from /home/lerceg/rn-eestech/.venv/lib/python3.11/site-packages/pip (python 3.11)
Collecting llama-cpp-python==0.1.78
  Downloading llama_cpp_python-0.1.78.tar.gz (1.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.7/1.7 MB[0m [31m870.3 kB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25h  Running command pip subprocess to install build dependencies
  Collecting setuptools>=42
    Obtaining dependency information for setuptools>=42 from https://files.pythonhosted.org/packages/f7/29/13965af254e3373bceae8fb9a0e6ea0d0e571171b80d6646932131d6439b/setuptools-69.5.1-py3-none-any.whl.metadata
    Using cached setuptools-69.5.1-py3-none-any.whl.metadata (6.2 kB)
  Collecting scikit-build>=0.13
    Obtaining dependency information for scikit-build>=0.13 from https://files.pythonhosted.org/packages/fa/af/b3ef8fe0bb96bf7308e1f9d196fc069f0c75d9c74cfaad851e418cc704f4/scikit_build-0.17.6-py3-none-any.whl.metadata
    Using cached scikit_build-0.17

In [2]:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
model_basename = "llama-2-13b-chat.ggmlv3.q5_1.bin" # the model is in bin format

#**Step 2: Import All the Required Libraries**

In [3]:
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

#**Step 3: Download the Model**

In [4]:
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

llama-2-13b-chat.ggmlv3.q5_1.bin:   0%|          | 0.00/9.76G [00:00<?, ?B/s]

Error while downloading from https://cdn-lfs.huggingface.co/repos/cd/43/cd4356b11767f5136b31b27dbb8863d6dd69a4010e034ef75be9c2c12fcd10f7/97d9becd5a364323c7959cc82e7506d6eb26c025623320b844e45e517e3dfe76?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-2-13b-chat.ggmlv3.q5_1.bin%3B+filename%3D%22llama-2-13b-chat.ggmlv3.q5_1.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1714496275&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNDQ5NjI3NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jZC80My9jZDQzNTZiMTE3NjdmNTEzNmIzMWIyN2RiYjg4NjNkNmRkNjlhNDAxMGUwMzRlZjc1YmU5YzJjMTJmY2QxMGY3Lzk3ZDliZWNkNWEzNjQzMjNjNzk1OWNjODJlNzUwNmQ2ZWIyNmMwMjU2MjMzMjBiODQ0ZTQ1ZTUxN2UzZGZlNzY%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=S%7Es%7EJZ9DQsYGjqEKZAqcPiZ9-g7BVvSk%7EDKnXgcLBccvBSaI4BnQJG5PAmkYyXpQ-NMapUkzjgd-8ZTP7cY1zZSv9%7EtNlO%7EHQP1nN2enTvUGBNYwyBNduc2Ff

llama-2-13b-chat.ggmlv3.q5_1.bin:   2%|2         | 210M/9.76G [00:00<?, ?B/s]

Error while downloading from https://cdn-lfs.huggingface.co/repos/cd/43/cd4356b11767f5136b31b27dbb8863d6dd69a4010e034ef75be9c2c12fcd10f7/97d9becd5a364323c7959cc82e7506d6eb26c025623320b844e45e517e3dfe76?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-2-13b-chat.ggmlv3.q5_1.bin%3B+filename%3D%22llama-2-13b-chat.ggmlv3.q5_1.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1714496275&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNDQ5NjI3NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jZC80My9jZDQzNTZiMTE3NjdmNTEzNmIzMWIyN2RiYjg4NjNkNmRkNjlhNDAxMGUwMzRlZjc1YmU5YzJjMTJmY2QxMGY3Lzk3ZDliZWNkNWEzNjQzMjNjNzk1OWNjODJlNzUwNmQ2ZWIyNmMwMjU2MjMzMjBiODQ0ZTQ1ZTUxN2UzZGZlNzY%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=S%7Es%7EJZ9DQsYGjqEKZAqcPiZ9-g7BVvSk%7EDKnXgcLBccvBSaI4BnQJG5PAmkYyXpQ-NMapUkzjgd-8ZTP7cY1zZSv9%7EtNlO%7EHQP1nN2enTvUGBNYwyBNduc2Ff

llama-2-13b-chat.ggmlv3.q5_1.bin:   3%|3         | 336M/9.76G [00:00<?, ?B/s]

Error while downloading from https://cdn-lfs.huggingface.co/repos/cd/43/cd4356b11767f5136b31b27dbb8863d6dd69a4010e034ef75be9c2c12fcd10f7/97d9becd5a364323c7959cc82e7506d6eb26c025623320b844e45e517e3dfe76?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-2-13b-chat.ggmlv3.q5_1.bin%3B+filename%3D%22llama-2-13b-chat.ggmlv3.q5_1.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1714496275&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNDQ5NjI3NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jZC80My9jZDQzNTZiMTE3NjdmNTEzNmIzMWIyN2RiYjg4NjNkNmRkNjlhNDAxMGUwMzRlZjc1YmU5YzJjMTJmY2QxMGY3Lzk3ZDliZWNkNWEzNjQzMjNjNzk1OWNjODJlNzUwNmQ2ZWIyNmMwMjU2MjMzMjBiODQ0ZTQ1ZTUxN2UzZGZlNzY%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=S%7Es%7EJZ9DQsYGjqEKZAqcPiZ9-g7BVvSk%7EDKnXgcLBccvBSaI4BnQJG5PAmkYyXpQ-NMapUkzjgd-8ZTP7cY1zZSv9%7EtNlO%7EHQP1nN2enTvUGBNYwyBNduc2Ff

llama-2-13b-chat.ggmlv3.q5_1.bin:   4%|3         | 357M/9.76G [00:00<?, ?B/s]

Error while downloading from https://cdn-lfs.huggingface.co/repos/cd/43/cd4356b11767f5136b31b27dbb8863d6dd69a4010e034ef75be9c2c12fcd10f7/97d9becd5a364323c7959cc82e7506d6eb26c025623320b844e45e517e3dfe76?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-2-13b-chat.ggmlv3.q5_1.bin%3B+filename%3D%22llama-2-13b-chat.ggmlv3.q5_1.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1714496275&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNDQ5NjI3NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jZC80My9jZDQzNTZiMTE3NjdmNTEzNmIzMWIyN2RiYjg4NjNkNmRkNjlhNDAxMGUwMzRlZjc1YmU5YzJjMTJmY2QxMGY3Lzk3ZDliZWNkNWEzNjQzMjNjNzk1OWNjODJlNzUwNmQ2ZWIyNmMwMjU2MjMzMjBiODQ0ZTQ1ZTUxN2UzZGZlNzY%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=S%7Es%7EJZ9DQsYGjqEKZAqcPiZ9-g7BVvSk%7EDKnXgcLBccvBSaI4BnQJG5PAmkYyXpQ-NMapUkzjgd-8ZTP7cY1zZSv9%7EtNlO%7EHQP1nN2enTvUGBNYwyBNduc2Ff

llama-2-13b-chat.ggmlv3.q5_1.bin:  53%|#####2    | 5.14G/9.76G [00:00<?, ?B/s]

Error while downloading from https://cdn-lfs.huggingface.co/repos/cd/43/cd4356b11767f5136b31b27dbb8863d6dd69a4010e034ef75be9c2c12fcd10f7/97d9becd5a364323c7959cc82e7506d6eb26c025623320b844e45e517e3dfe76?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-2-13b-chat.ggmlv3.q5_1.bin%3B+filename%3D%22llama-2-13b-chat.ggmlv3.q5_1.bin%22%3B&response-content-type=application%2Foctet-stream&Expires=1714496275&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcxNDQ5NjI3NX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9jZC80My9jZDQzNTZiMTE3NjdmNTEzNmIzMWIyN2RiYjg4NjNkNmRkNjlhNDAxMGUwMzRlZjc1YmU5YzJjMTJmY2QxMGY3Lzk3ZDliZWNkNWEzNjQzMjNjNzk1OWNjODJlNzUwNmQ2ZWIyNmMwMjU2MjMzMjBiODQ0ZTQ1ZTUxN2UzZGZlNzY%7EcmVzcG9uc2UtY29udGVudC1kaXNwb3NpdGlvbj0qJnJlc3BvbnNlLWNvbnRlbnQtdHlwZT0qIn1dfQ__&Signature=S%7Es%7EJZ9DQsYGjqEKZAqcPiZ9-g7BVvSk%7EDKnXgcLBccvBSaI4BnQJG5PAmkYyXpQ-NMapUkzjgd-8ZTP7cY1zZSv9%7EtNlO%7EHQP1nN2enTvUGBNYwyBNduc2Ff

llama-2-13b-chat.ggmlv3.q5_1.bin:  67%|######6   | 6.51G/9.76G [00:00<?, ?B/s]

#**Step 4: Loading the Model**

In [57]:
model_path

'/home/lerceg/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin'

In [5]:
# GPU
lcpp_llm = None
lcpp_llm = Llama(
    model_path=model_path,
    n_threads=2, # CPU cores
    n_batch=512, # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.
    n_gpu_layers=32 # Change this value based on your model and your GPU VRAM pool.
    )

llama.cpp: loading model from /home/lerceg/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_head_kv  = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 5.0e-06
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.11 MB
llama_model_load_inte

In [6]:
# See the number of layers in GPU
lcpp_llm.params.n_gpu_layers

32

#**Step 5: Create a Prompt Template**

In [53]:
prompt = (
"""Here are the sentences \n
The study material is very accessible. \n
"Cleanliness is not acceptable. \n
"Keyboards in classrooms are not working properly.\n
"Sometimes professor shows signs of anger issues.\n
"Library doesnt have all proper literature.\n
"Teaching assistants can be very helpful.\n
"Professor Stojakovic is the best.\n
          """)
prompt_template=f'''SYSTEM: You will need to classify student feedback for each sentence give the output in format (sentiment, topic). Sentiment can either be POSITIVE or NEGATIVE. Topic can be choosen as one from this list is either positive or negative and 'topic' which should be te subject of the feedback. It can be one of the following: ['professor, 'assistant', 'study material', 'infrastructure', 'course work'], if you cannot confidently classify the topic into one of those classify as 'other'. Please only return 

ASSISTANT:

USER: {prompt}


'''

#**Step 6: Generating the Response**

In [54]:
response=lcpp_llm(prompt=prompt_template, max_tokens=256, temperature=0.5, top_p=0.95,
                  repeat_penalty=1.2, top_k=20,
                  echo=True)

Llama.generate: prefix-match hit

llama_print_timings:        load time = 12804.72 ms
llama_print_timings:      sample time =   127.49 ms /   256 runs   (    0.50 ms per token,  2007.97 tokens per second)
llama_print_timings: prompt eval time = 54055.39 ms /   159 tokens (  339.97 ms per token,     2.94 tokens per second)
llama_print_timings:        eval time = 109903.84 ms /   255 runs   (  431.00 ms per token,     2.32 tokens per second)
llama_print_timings:       total time = 164527.37 ms


In [55]:
print(response)

{'id': 'cmpl-3b58dfcd-dd46-4c1c-bd05-50ed6d08d670', 'object': 'text_completion', 'created': 1714247044, 'model': '/home/lerceg/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/3140827b4dfcb6b562cd87ee3d7f07109b014dd0/llama-2-13b-chat.ggmlv3.q5_1.bin', 'choices': [{'text': 'SYSTEM: You will need to classify student feedback for each sentence give a json that contains two keys \'sentiment\' which is either positive or negative and \'topic\' which should be te subject of the feedback. It can be one of the following: [\'proffesors, \'assistants\', \'study material\', \'infrastructure\', \'course work\'], if you cannot confidently classify the topic into one of those classify as \'other\'.\n\nASSISTANT:\n\nUSER: Here are the sentences \n\nThe study material is very acessable. \n\n"Cleanliness is not acceptable. \n\n"Keyboards in classrooms are not working properly.\n\n"Sometimes professor shows signs of anger issues.\n\n"Library doesnt have all proper literature.\n\n

In [56]:
print(response["choices"][0]['text'])

SYSTEM: You will need to classify student feedback for each sentence give a json that contains two keys 'sentiment' which is either positive or negative and 'topic' which should be te subject of the feedback. It can be one of the following: ['proffesors, 'assistants', 'study material', 'infrastructure', 'course work'], if you cannot confidently classify the topic into one of those classify as 'other'.

ASSISTANT:

USER: Here are the sentences 

The study material is very acessable. 

"Cleanliness is not acceptable. 

"Keyboards in classrooms are not working properly.

"Sometimes professor shows signs of anger issues.

"Library doesnt have all proper literature.

"Teaching assistants can be very helpful.

"Professor Stojakovic is the best.

          


Please provide feedback for each sentence and classify them into sentiment and topic.

ASSISTANT: Sure! Here are my assessments of your sentences, along with their sentiment and topics:

1. "The study material is very accessible."
Sentim

In [58]:
"vanja je vanja".find('anja')

1

In [103]:
response_text = """"ASSISTANT:
        1. The toilets are clean - POSITIVE, other
        2. The staff is good - POSITIVE, other
        3. Professors are not so good - NEGATIVE, professor
        4. Classes could be more interactive - NEUTRAL, study material
        SYSTEM: Here is the feedback : The toilets are clean. The staff is good. Professors are not so good. Classes could be more interactive. My output would be (sentiment, topic) = ((POSITIVE, other), (POSITIVE, othe" 
        """

In [79]:
assistant_first = response_text.find("ASSISTANT:")
user_next = response_text.find("USER:", assistant_first)
system_next = response_text.find("SYSTEM:", assistant_first)


In [100]:
reponses = []
for x in response_text.split("\n")[1 : -1]:
    sentence = " ".join(x.strip().split(' - (')[0].split(' ')[1 : ]).strip() + "."
    sentiment, topic = x.strip().split(' - (')[1].split(',')
    sentiment = sentiment.strip().replace('(', '').replace(')', '')
    topic = topic.strip().replace('(', '').replace(')', '')
    dict = {'sentence' : sentence, 'sentiment': sentiment, 'topic': topic}
    reponses.append(dict)
    

In [101]:
reponses

[{'sentence': 'Classrooms are old.', 'sentiment': '-', 'topic': 'other'},
 {'sentence': 'Desks are new and pretty.',
  'sentiment': 'positive',
  'topic': 'study material'},
 {'sentence': 'The infrastructure of the building is getting old.',
  'sentiment': 'negative',
  'topic': 'infrastructure'},
 {'sentence': 'Professors are excellent at teaching!.',
  'sentiment': 'positive',
  'topic': 'professor'}]

In [107]:
def prepare_output(raw : str) -> list:
    assistant_first = raw.find("ASSISTANT:")
    user_next = raw.find("USER:", assistant_first)
    user_next = user_next if user_next >= 0 else len(raw)
    system_next = raw.find("SYSTEM:", assistant_first)
    system_next = system_next if system_next >= 0 else len(raw)

    end = min(user_next, system_next)

    raw_cut = raw[assistant_first: end]

    responses = []
    for x in raw_cut.split("\n")[1: -1]:
        sentence = " ".join(x.strip().split(' - (')[0].split(' ')[1:]).strip() + "."
        sentiment, topic = x.strip().split(' - (')[1].split(',')
        sentiment = sentiment.strip().replace('(', '').replace(')', '')
        topic = topic.strip().replace('(', '').replace(')', '')
        dict = {'sentence': sentence, 'sentiment': sentiment, 'topic': topic}
        responses.append(dict)
    return reponses


In [104]:
prepare_output(response_text)

IndexError: list index out of range

In [108]:
response_text = """SYSTEM: You will need to classify student feedback for each sentence give the output in format (sentiment, topic). Sentiment can either be POSITIVE or NEGATIVE. Topic can be choosen as one from this list is either positive or negative and 'topic' which should be te subject of the feedback. It can be one of the following: ['professor, 'assistant', 'study material', 'infrastructure', 'course work'], if you cannot confidently classify the topic into one of those classify as 'other'. Please only return topics that i specified. Please for each sentecene, show output in format "n. sentence - (sentiment, topic)".  
        USER: Here is the feedback : Toilets are not nice. Professors guide really well. Chairs are little older then they should be. Infrastructure is holding quite well.
        ASSISTANT:
        1. Toilets are not nice - (NEGATIVE, other)
        2. Professors guide really well - (POSITIVE, professor)
        3. Chairs are little older then they should be - (NEGATIVE, infrastructure)
        4. Infrastructure is holding quite well - (POSITIVE, infrastructure) """

In [109]:
prepare_output(response_text)

[{'sentence': 'Classrooms are old.', 'sentiment': '-', 'topic': 'other'},
 {'sentence': 'Desks are new and pretty.',
  'sentiment': 'positive',
  'topic': 'study material'},
 {'sentence': 'The infrastructure of the building is getting old.',
  'sentiment': 'negative',
  'topic': 'infrastructure'},
 {'sentence': 'Professors are excellent at teaching!.',
  'sentiment': 'positive',
  'topic': 'professor'}]