# Run Llama 2 with Python

This Jupyter notebook is meant to be used in Google Colab, but it can easily be run locally.  
Its aim is to offer an easy, direct way to start using Llama 2 with Python.

Based on this post from SWHarden.com:
https://swharden.com/blog/2023-07-29-ai-chat-locally-with-python/

In [1]:
!pip install llama-cpp-python==0.1.78

Collecting llama-cpp-python==0.1.78
  Downloading llama_cpp_python-0.1.78.tar.gz (1.7 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting diskcache>=5.6.1 (from llama-cpp-python==0.1.78)
  Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)

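If something goes wrong here or later, check the latest version of llama-cpp-python at the following link. Keep in mind that the version is pinned to 0.1.78 for a reason: releases from 0.1.79 onward expect GGUF model files rather than the GGML file used in this notebook.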

https://pypi.org/project/llama-cpp-python/

In [2]:
from llama_cpp import Llama

from IPython.display import display, HTML
import json
import time
import pathlib

In [3]:
!pip install wget

Collecting wget
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9655 sha256=b14ca093b9facff43df509c939003c8c0ed8c171941f1614a66d38be4437f64c
  Stored in directory: /root/.cache/pip/wheels/8b/f1/7f/5c94f0a7a505ca1c81cd1d9208ae2064675d97582078e6c769
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2


Next we download the Q8_0 quantization of Llama 2 7B Chat and load it. Other options exist, including newer models that may perform better.
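Other quantizations of the same model can be fetched by changing the file name in the URL. The following is a minimal sketch, assuming the repository's other files follow the same naming pattern as the Q8_0 file used in this notebook. The `model_filename` and `model_url` helpers are illustrative names, not part of any library, so check the repository's file list before relying on a given label.

```python
# Hypothetical helpers: build a download URL for a given quantization label.
REPO = "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main"

def model_filename(quant):
    """File name for a quantization label such as 'q8_0' (assumed naming pattern)."""
    return f"llama-2-7b-chat.ggmlv3.{quant}.bin"

def model_url(quant):
    """Full download URL for the given quantization label."""
    return f"{REPO}/{model_filename(quant)}"

print(model_url("q8_0"))  # reproduces the URL used in this notebook
```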

In [4]:
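import wget

# 8-bit (Q8_0) quantization of Llama 2 7B Chat in GGML v3 format
url = 'https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin'
local_path = '/content/llama-2-7b-chat.ggmlv3.q8_0.bin'  # Colab path; change it when running locally
wget.download(url, local_path)

# n_ctx is the context window size in tokens (prompt plus completion)
MODEL_Q8_0 = Llama(
    model_path=local_path,
    n_ctx=2048)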

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | 
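The model file is several gigabytes, so re-running the download cell is slow. Here is a minimal sketch of a guard that skips the download when the file is already on disk. `ensure_downloaded` is a hypothetical helper, not part of `wget` or llama-cpp-python, and it only checks that a non-empty file exists, not that the file is complete.

```python
import pathlib

def ensure_downloaded(url, local_path, download):
    """Download url to local_path unless a non-empty file is already there.

    `download` is any callable taking (url, path), for example wget.download.
    Returns True if a download happened, False if the existing file was reused.
    """
    path = pathlib.Path(local_path)
    if path.exists() and path.stat().st_size > 0:
        return False
    # Create the target directory if needed, then fetch the file
    path.parent.mkdir(parents=True, exist_ok=True)
    download(url, str(path))
    return True
```

In this notebook it could be called as `ensure_downloaded(url, local_path, wget.download)` in place of the direct `wget.download` call.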


In [5]:
def query(model, question):
    """Ask the model one question and display the answer with timing."""
    model_name = pathlib.Path(model.model_path).name
    time_start = time.time()
    prompt = f"Q: {question} A:"
    # max_tokens <= 0 means no fixed cap: generation is limited only by n_ctx
    output = model(prompt=prompt, max_tokens=0)
    response = output["choices"][0]["text"]
    time_elapsed = time.time() - time_start
    display(HTML(f'<code>{model_name} response time: {time_elapsed:.02f} sec</code>'))
    display(HTML(f'<strong>Question:</strong> {question}'))
    display(HTML(f'<strong>Answer:</strong> {response}'))
    # Dump the raw completion (id, choices, token usage) for inspection
    print(json.dumps(output, indent=2))

In the last cell we pass the prompt as a string in quotation marks. If the prompt is too long it can exceed the token limit, in which case we get an error instead of a result.  
To fix this, shorten the prompt text.
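The token limit here is the `n_ctx=2048` context window, which the prompt and the generated answer share. Below is a rough sketch of checking a prompt before sending it. `fits_context` and its `reserve` default are assumptions made for illustration, and the word-based tokenizer is only a stand-in so the example runs without a model.

```python
def fits_context(prompt, tokenize, n_ctx=2048, reserve=256):
    """Return (ok, n_tokens): whether the prompt leaves `reserve` tokens for the answer.

    `tokenize` is any callable mapping bytes to a list of tokens.
    """
    n_tokens = len(tokenize(prompt.encode("utf-8")))
    return n_tokens + reserve <= n_ctx, n_tokens

# Stand-in tokenizer for illustration only: one token per whitespace-separated word.
def word_tokenizer(data):
    return data.split()

ok, n_tokens = fits_context("Q: Tell me a story using 100 words A:", word_tokenizer)
print(ok, n_tokens)
```

With the model loaded, passing `MODEL_Q8_0.tokenize` instead of `word_tokenizer` should give the real token count, assuming `tokenize` accepts bytes as it does in the pinned version.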

In [6]:
query(MODEL_Q8_0, "Tell me a story using 100 words")

{
  "id": "cmpl-77f2c6a5-1fd9-4a62-a37c-fa16657704ba",
  "object": "text_completion",
  "created": 1694873617,
  "model": "/content/llama-2-7b-chat.ggmlv3.q8_0.bin",
  "choices": [
    {
      "text": " Sure! Here is a story in exactly 100 words:\n\nOnce upon a time, there was a little girl named Lily. She lived in a tiny house with her mother and father. One day, Lily found a hidden garden behind their house filled with the most beautiful flowers she had ever seen. She picked one and gave it to her mother, who smiled and said, \"Thank you, dear.\" From that day on, Lily tended the garden every day, and it became her happy place.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 107,
    "total_tokens": 122
  }
}
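
The raw output is a plain dictionary, so the answer and the token counts can be read from it directly. Here is a small sketch, assuming the dictionary has the shape shown above. `summarize` is a hypothetical helper, the sample text is trimmed, and the 60-second timing is made up for illustration.

```python
def summarize(output, elapsed_sec):
    """Pull the answer text and token counts out of a completion dict."""
    choice = output["choices"][0]
    usage = output["usage"]
    return {
        "text": choice["text"].strip(),
        "finish_reason": choice["finish_reason"],
        "completion_tokens": usage["completion_tokens"],
        "tokens_per_sec": usage["completion_tokens"] / elapsed_sec,
    }

# Trimmed copy of the output above; elapsed_sec is an invented value.
sample = {
    "choices": [{"text": " Sure! Here is a story...", "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 15, "completion_tokens": 107, "total_tokens": 122},
}
summary = summarize(sample, elapsed_sec=60.0)
print(summary["finish_reason"], round(summary["tokens_per_sec"], 2))
```

A `finish_reason` of `"stop"` indicates the model ended the answer on its own; as far as I know, a value of `"length"` would suggest the answer was cut off by a token limit.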
