Goal: Experiment with llama.cpp

I'll follow the LangChain documentation for the llama.cpp integration (https://python.langchain.com/docs/integrations/llms/llamacpp/).

# Installation

In [12]:
!apt-get update && apt-get upgrade -y && apt-get install -y build-essential

Hit:1 http://deb.debian.org/debian bullseye InRelease
Hit:2 http://deb.debian.org/debian-security bullseye-security InRelease
Hit:3 http://deb.debian.org/debian bullseye-updates InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  libc-bin libc6
2 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 3652 kB of archives.
After this operation, 0 B of additional disk space will be used.
Get:1 http://deb.debian.org/debian-security bullseye-security/main amd64 libc6 amd64 2.31-13+deb11u10 [2825 kB]
Get:2 http://deb.debian.org/debian-security bullseye-security/main amd64 libc-bin amd64 2.31-13+deb11u10 [827 kB]
Fetched 3652 kB in 0s (9683 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
(Reading database ... 7040 files and directories currently installed.)
Preparing to unpack .../libc6_2.3

In [61]:
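import os

# Point the build at the system compilers before installing llama-cpp-python.
# Caveat: as far as I know, CMake itself reads CC/CXX (or -DCMAKE_C_COMPILER=...
# flags passed through CMAKE_ARGS) rather than these variable names, and the
# C++ compiler would normally be g++ (or c++), not gcc.
os.environ['CMAKE_C_COMPILER'] = "/usr/bin/cc"
os.environ['CMAKE_CXX_COMPILER'] = "/usr/bin/gcc"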

In [63]:
!env | grep CMAKE_C

CMAKE_CXX_COMPILER=/usr/bin/gcc
CMAKE_C_COMPILER=/usr/bin/cc
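
Before building the wheel it can help to confirm that a C and a C++ compiler are actually on PATH. The snippet below is a minimal sketch using only the standard library. `find_compilers` and `build_env` are hypothetical helper names of mine, not part of llama-cpp-python, and the use of `CC`/`CXX` assumes the usual CMake behaviour of reading those two variables when it first detects compilers.

```python
import os
import shutil


def find_compilers(c_names=("cc", "gcc", "clang"),
                   cxx_names=("c++", "g++", "clang++")):
    """Return (c_path, cxx_path); either is None if nothing was found on PATH."""
    c_path = next((p for p in map(shutil.which, c_names) if p), None)
    cxx_path = next((p for p in map(shutil.which, cxx_names) if p), None)
    return c_path, cxx_path


def build_env(c_path, cxx_path, base=None):
    """Copy an environment mapping and set CC/CXX for the compilers that exist."""
    env = dict(os.environ if base is None else base)
    if c_path:
        env["CC"] = c_path
    if cxx_path:
        env["CXX"] = cxx_path
    return env


c_path, cxx_path = find_compilers()
print("C compiler:  ", c_path)
print("C++ compiler:", cxx_path)
```

If either value prints as `None`, the `build-essential` install above probably did not complete.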


In [64]:
%pip install --upgrade llama-cpp-python

Collecting llama-cpp-python
  Using cached llama_cpp_python-0.2.69.tar.gz (42.5 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting diskcache>=5.6.1 (from llama-cpp-python)
  Using cached diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Using cached diskcache-5.6.3-py3-none-any.whl (45 kB)
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.69-cp311-cp311-linux_x86_64.whl size=3505657 sha256=26de5ce4403a1035d06f24e70cb6cb9e9f6128bb36ffa6935b4a51f1d56f4baa
  Stored in directory: /root/.cache/pip/wheels/e8/1b/ff/b4dba97fbd16e731705b262602ba8f3b672bf4bde54ea0c104
Successfully built llama-cpp-python
Installing collected packages: diskcache, llama-cpp-python
Succe

# Usage

In [65]:
from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

In [66]:
template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""

prompt = PromptTemplate.from_template(template)
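
To see what the template turns into once the question is filled in, here is a rough stand-in that uses plain `str.format`. This assumes the default f-string style of `PromptTemplate.from_template`, where `{question}` is an ordinary format placeholder. `render` is a hypothetical helper for illustration, not a LangChain function.

```python
template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""


def render(template: str, **variables: str) -> str:
    """Substitute the named placeholders, as a simple f-string-style template would."""
    return template.format(**variables)


text = render(template, question="What is 2 + 2?")
print(text)
```

The rendered string is what would be sent to the model, so the "Answer: ..." line acts as the start of the completion.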

In [67]:
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
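
The idea behind token-wise streaming is that the handler's `on_llm_new_token` method is called once per generated token. Below is a minimal stand-alone sketch of that pattern. `PrintingHandler` and `replay` are hypothetical names of mine that imitate the mechanism without importing LangChain, so this is not the library's implementation.

```python
import io
import sys


class PrintingHandler:
    """Stand-in handler: writes each token to a stream as soon as it arrives."""

    def __init__(self, stream=None):
        self.stream = sys.stdout if stream is None else stream
        self.tokens = []

    def on_llm_new_token(self, token, **kwargs):
        self.tokens.append(token)
        self.stream.write(token)
        self.stream.flush()  # flush so the token shows up immediately


def replay(tokens, handlers):
    """Pretend to be the model: hand every token to every handler in order."""
    for token in tokens:
        for handler in handlers:
            handler.on_llm_new_token(token)


buffer = io.StringIO()
handler = PrintingHandler(stream=buffer)
replay(["Yo", ", ", "John"], [handler])
print(buffer.getvalue())
```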

In [76]:
!apt-get install -y curl

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libcurl4 libnghttp2-14 libpsl5 librtmp1 libssh2-1 publicsuffix
The following NEW packages will be installed:
  curl libcurl4 libnghttp2-14 libpsl5 librtmp1 libssh2-1 publicsuffix
0 upgraded, 7 newly installed, 0 to remove and 0 not upgraded.
Need to get 1096 kB of archives.
After this operation, 2320 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian bullseye/main amd64 libnghttp2-14 amd64 1.43.0-1+deb11u1 [77.2 kB]
Get:2 http://deb.debian.org/debian bullseye/main amd64 libpsl5 amd64 0.21.0-1.2 [57.3 kB]
Get:3 http://deb.debian.org/debian bullseye/main amd64 librtmp1 amd64 2.4+20151223.gitfa8646d.1-2+b2 [60.8 kB]
Get:4 http://deb.debian.org/debian bullseye/main amd64 libssh2-1 amd64 1.9.0-2 [156 kB]
Get:5 http://deb.debian.org/debian bullseye/main amd64 libcurl4 amd64 7.74.0-1.3+deb11u11 [347 kB]
Get:6 http://d

In [78]:
# Download model
!curl -L https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf --output llama-2-7b-chat.Q4_K_M.gguf

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1147  100  1147    0     0  11356      0 --:--:-- --:--:-- --:--:-- 11356
100 3891M  100 3891M    0     0  12.4M      0  0:05:11  0:05:11 --:--:-- 14.3M
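
Since `curl -L` will happily save an HTML error page under the `.gguf` name, a cheap check after the download is to look at the file header. To my understanding a GGUF file starts with the four bytes `GGUF` followed by a 32-bit version number; the sketch below assumes little-endian byte order, and `read_gguf_version` is a hypothetical helper name.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"


def read_gguf_version(path):
    """Return the GGUF version from the file header, or raise ValueError."""
    with open(path, "rb") as f:
        header = f.read(8)  # 4 magic bytes + 4 version bytes
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])  # assumes little-endian
    return version


model_path = Path("llama-2-7b-chat.Q4_K_M.gguf")
if model_path.exists():
    print("size (MiB):", model_path.stat().st_size // (1024 * 1024))
    print("GGUF version:", read_gguf_version(model_path))
```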


In [80]:
!pwd

/home/experiments


In [81]:
llm = LlamaCpp(
    model_path="/home/experiments/llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /home/experiments/llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u
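
The `- kv` lines above carry the model metadata, including the context length the model was trained with. The sketch below pulls those lines into a dictionary so values such as `llama.context_length` can be compared against settings like `max_tokens`. The regular expression is my guess at the line layout based on the output shown here, and `parse_kv` is a hypothetical helper.

```python
import re

# Matches e.g. "llama_model_loader: - kv   2:   llama.context_length u32   = 4096"
KV_LINE = re.compile(
    r"^llama_model_loader: - kv\s+\d+:\s+(\S+)\s+(\S+)\s+=\s+(.*)$"
)


def parse_kv(log_text):
    """Collect key/value pairs from loader log lines; unmatched lines are skipped."""
    meta = {}
    for line in log_text.splitlines():
        match = KV_LINE.match(line.strip())
        if not match:
            continue
        key, type_name, value = match.groups()
        value = value.strip()
        # Convert integer-typed entries (u32, i32, ...) when the value is numeric.
        if type_name.startswith(("u", "i")) and value.lstrip("-").isdigit():
            meta[key] = int(value)
        else:
            meta[key] = value
    return meta


log = """\
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
"""
meta = parse_kv(log)
print(meta["llama.context_length"])
```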

In [82]:
question = """
Question: A rap battle between Stephen Colbert and John Oliver
"""
llm.invoke(question)


Stephen Colbert:
Yo, John, I heard you've been talking smack
About my show, saying it's not as tough
But let me tell

KeyboardInterrupt: 
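
I interrupted the run above by hand. A gentler option is to consume the output as a stream and stop after a time or chunk budget. The helper below is a minimal sketch that works on any iterator of text chunks. `collect_with_budget` is a hypothetical name, and the deadline is only checked between chunks, so it cannot cut short a slow prompt evaluation.

```python
import time


def collect_with_budget(chunks, max_seconds=30.0, max_chunks=None):
    """Join chunks from an iterator until the time or chunk budget runs out."""
    deadline = time.monotonic() + max_seconds
    pieces = []
    for chunk in chunks:
        if time.monotonic() >= deadline:
            break  # out of time: drop this chunk and stop
        pieces.append(chunk)
        if max_chunks is not None and len(pieces) >= max_chunks:
            break  # chunk budget reached
    return "".join(pieces)


# Demo with a fake stream standing in for the model output.
demo = iter(["Stephen ", "Colbert", ":", "\n", "Yo"])
print(collect_with_budget(demo, max_seconds=5.0, max_chunks=3))
```

With the objects from this notebook the call would look like `collect_with_budget(llm.stream(question), max_seconds=60)`, assuming `llm.stream` yields plain string chunks for this LLM class.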