<a href="https://colab.research.google.com/github/EdBerg21/AI-Professional-Prompts/blob/main/prontoggufllama2_quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to Run Llama 2 Locally with Python (Quickstart)

This Jupyter notebook accompanies a blog post on https://swharden.com:

https://swharden.com/blog/2023-07-29-ai-chat-locally-with-python/

In [None]:
!pip install llama-cpp-python

Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.25.tar.gz (8.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 23.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.25-cp310-cp310-manylinux_2_35_x86_64.whl size=2092950 sha256=0d836cc176ce1b98ccae64b476fe64b67834b51a1e005848d70a4a247178751b
  Stored in directory: /root/.cache/pip/wheels/6f/7e/23/5a9b41241b41025d10c13e31d005d6c1a6bce58fa02870ee3a
Successfully built llama-cpp-python
Installing collected packages: llama-cpp-python
Successfully installed llama-cpp-python-0.2.25


In [None]:
from llama_cpp import Llama

from IPython.display import display, HTML
import json
import time
import pathlib

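Load two model instances so we can compare their responses to the same prompt. Both instances below load the same `llama-2-7b-chat.Q4_K_S.gguf` file, so any difference between their responses most likely reflects sampling randomness rather than the model file. To compare quantizations, download a second GGUF file and point one `model_path` at it.

Note that `n_ctx` is the size of the context window in tokens. The prompt and the response share this window, so increasing `n_ctx` raises the maximum length of a response.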
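The relationship between `n_ctx`, prompt length, and response length comes down to simple arithmetic. The sketch below is only an illustration: `response_token_budget` is a hypothetical helper, not part of llama-cpp-python, and the prompt token count is an assumed number (in practice it would come from the model's tokenizer).

```python
def response_token_budget(n_ctx: int, prompt_tokens: int) -> int:
    """Return how many tokens remain for the response.

    Hypothetical helper for illustration: the prompt and the response
    share a single context window of n_ctx tokens.
    """
    if prompt_tokens > n_ctx:
        raise ValueError("prompt does not fit in the context window")
    return n_ctx - prompt_tokens


# Assumed numbers: a 2048-token window and a 15-token prompt.
budget = response_token_budget(n_ctx=2048, prompt_tokens=15)
print(budget)  # 2033
```

A longer prompt leaves less room for the answer, which is one reason long responses can end abruptly.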

In [None]:
!wget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_S.gguf

--2023-12-25 16:40:05--  https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_S.gguf
Resolving huggingface.co (huggingface.co)... 18.172.134.124, 18.172.134.24, 18.172.134.4, ...
Connecting to huggingface.co (huggingface.co)|18.172.134.124|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/b0/ca/b0cae82fd4b3a362cab01d17953c45edac67d1c2dfb9fbb9e69c80c32dc2012e/632fa75f94b46960de3caf623e79b1d0a4c8564b39a5faacbb8c1fdbd9ba3f52?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27llama-2-7b-chat.Q4_K_S.gguf%3B+filename%3D%22llama-2-7b-chat.Q4_K_S.gguf%22%3B&Expires=1703781605&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwMzc4MTYwNX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy5odWdnaW5nZmFjZS5jby9yZXBvcy9iMC9jYS9iMGNhZTgyZmQ0YjNhMzYyY2FiMDFkMTc5NTNjNDVlZGFjNjdkMWMyZGZiOWZiYjllNjljODBjMzJkYzIwMTJlLzYzMmZhNzVmOTRiNDY5NjBkZTNjYWY2MjNlNzliMWQwYTR
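Before loading a multi-gigabyte download, it can help to confirm that the file really is a GGUF model and not, say, an HTML error page. The sketch below relies on the fact that GGUF files begin with the four ASCII bytes `GGUF`. The helper name `looks_like_gguf` is made up for this example, and the check does not detect a download that was truncated partway through.

```python
import pathlib

GGUF_MAGIC = b"GGUF"  # GGUF files begin with these four bytes


def looks_like_gguf(path) -> bool:
    """Return True if the file exists and starts with the GGUF magic bytes.

    Hypothetical helper for illustration; it inspects only the first four
    bytes, so it cannot tell whether the rest of the file is complete.
    """
    p = pathlib.Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == GGUF_MAGIC


# wget saves into the current working directory by default.
print(looks_like_gguf("llama-2-7b-chat.Q4_K_S.gguf"))
```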

In [None]:
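# NOTE: despite the variable names, both instances load the Q4_K_S file
# downloaded above. Point model_path at different GGUF files to compare
# quantizations.
MODEL_Q8_0 = Llama(
    model_path="/content/llama-2-7b-chat.Q4_K_S.gguf",
    n_ctx=2048)  # context window size, in tokens

MODEL_Q2_K = Llama(
    model_path="/content/llama-2-7b-chat.Q4_K_S.gguf",
    n_ctx=2048)  # context window size, in tokens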

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


In [None]:
def query(model, question):
    """Ask the model a question, then display the answer, timing, and raw output."""
    model_name = pathlib.Path(model.model_path).name
    time_start = time.time()
    prompt = f"Q: {question} A:"
    # max_tokens=0 sets no explicit limit, so generation is bounded by n_ctx
    output = model(prompt=prompt, max_tokens=0)
    response = output["choices"][0]["text"]
    time_elapsed = time.time() - time_start
    display(HTML(f'<code>{model_name} response time: {time_elapsed:.02f} sec</code>'))
    display(HTML(f'<strong>Question:</strong> {question}'))
    display(HTML(f'<strong>Answer:</strong> {response}'))
    # Print the full completion dictionary for inspection
    print(json.dumps(output, indent=2))
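The completion dictionary holds more than the answer text. As a rough sketch, the function below pulls out a few fields that are often useful. The `text` key appears in the output printed in this notebook. The `finish_reason` and `usage` keys are assumptions based on the OpenAI-style completion schema that llama-cpp-python follows, since the printed output here is cut off before them, so they are read with `.get()`. The function name and the sample dictionary are hypothetical.

```python
def summarize_completion(output: dict) -> dict:
    """Extract the answer text and some metadata from a completion dict."""
    choice = output["choices"][0]
    usage = output.get("usage", {})  # assumed key; may be absent
    return {
        "text": choice["text"].strip(),
        # Assumed key: typically "stop" or "length" (token limit reached)
        "finish_reason": choice.get("finish_reason"),
        "completion_tokens": usage.get("completion_tokens"),
    }


# Hand-written sample for illustration, not real model output.
example = {
    "choices": [{"text": " Hello", "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6},
}
print(summarize_completion(example))
```

If `finish_reason` turns out to be `"length"`, the response likely ran into the token limit rather than ending naturally.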

In [None]:
query(MODEL_Q2_K, "Why are Jupyter notebooks difficult to maintain?")

{
  "id": "cmpl-22c9a7aa-a7d0-4680-b5d9-97d4d71e4ce4",
  "object": "text_completion",
  "created": 1703522500,
  "model": "/content/llama-2-7b-chat.Q4_K_S.gguf",
  "choices": [
    {
      "text": " Jupyter notebooks can be challenging to maintain due to several reasons. Here are some of the common issues:\n1. Version control: Jupyter notebooks are generated by running code, which means that each notebook is a snapshot of the code at a particular point in time. This makes it difficult to track changes and manage different versions of the notebook.\n2. File size: Jupyter notebooks can be large files, especially if they contain a lot of code or data. This can make it difficult to transfer or share them with others, and may require special tools or workflows to manage.\n3. Organization: Jupyter notebooks often lack a clear organizational structure, which can make it difficult to find specific sections of the notebook or to understand how different parts of the notebook fit together.\n4. C

In [None]:
query(MODEL_Q8_0, "Why are Jupyter notebooks difficult to maintain?")

{
  "id": "cmpl-cdccf216-fb13-46d0-a4e2-3d61371cd1a5",
  "object": "text_completion",
  "created": 1703522874,
  "model": "/content/llama-2-7b-chat.Q4_K_S.gguf",
  "choices": [
    {
      "text": " Jupyter notebooks can be difficult to maintain for several reasons, including: 1.lax organization: Jupyter notebooks are essentially a collection of code cells, and if these cells are not organized in a consistent manner, it can make it difficult to understand the logic of the code. 2.unstructured data: Jupyter notebooks often contain unstructured data, such as images or tables, which can be challenging to manage and maintain. 3.version control issues: Since Jupyter notebooks are essentially a collection of files, version control can become complex, especially when working with collaborators. 4.dependencies management: Jupyter notebooks rely on a variety of dependencies, including libraries and frameworks, which can make it difficult to manage and maintain. 5.lack of documentation: Without 