### Tutorial: loading a GGUF Llama model with LangChain in a CPU-only environment

This tutorial is based on the official LangChain Llama.cpp integration documentation [here](https://python.langchain.com/docs/integrations/llms/llamacpp) and on the `llama_cpu.py` example in the llama-docker repository [here](https://github.com/penkow/llama-docker/blob/main/llama_cpu.py).

This tutorial was tested with **langchain==0.0.354** and **llama-cpp-python==0.2.27**.
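
Before running the cells, it can help to confirm which versions are actually installed. The snippet below is a minimal sketch using only the standard library; the helper names `installed_version` and `compare_with_tested` are illustrative and not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions this tutorial reports working with.
TESTED_VERSIONS = {"langchain": "0.0.354", "llama-cpp-python": "0.2.27"}


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_with_tested(tested=TESTED_VERSIONS):
    """Map each package name to (installed_version, matches_tested_version)."""
    result = {}
    for name, wanted in tested.items():
        found = installed_version(name)
        result[name] = (found, found == wanted)
    return result


if __name__ == "__main__":
    for name, (found, ok) in compare_with_tested().items():
        print(f"{name}: installed={found} tested={TESTED_VERSIONS[name]} match={ok}")
```

A mismatch does not necessarily mean the tutorial will fail, only that the behavior may differ from the output recorded here.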

#### step0. prepare the environment with llama_cpp and langchain

In [12]:
from llama_cpp import Llama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import LlamaCpp

#### step1. load the gguf model with llama_cpp

In [13]:
model_path = "./model/llama2-7b-chat-Q4KM/llama-2-7b-chat.Q4_K_M.gguf"

llama_gguf = Llama(model_path=model_path, verbose=True)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from ./model/llama2-7b-chat-Q4KM/llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.he
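
The loader log reports the file as `GGUF V2`. As a quick sanity check before loading a multi-gigabyte model, the file header can be inspected directly. This is a sketch that assumes the standard GGUF layout (4 magic bytes `GGUF` followed by a little-endian 32-bit version); `read_gguf_version` is a hypothetical helper, not a llama-cpp-python function.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"


def read_gguf_version(path):
    """Return the GGUF format version stored in the file header.

    Raises FileNotFoundError if the file is missing and ValueError if the
    file does not start with the GGUF magic bytes.
    """
    path = Path(path)
    with path.open("rb") as f:
        header = f.read(8)  # 4 magic bytes + 4-byte little-endian version
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    (gguf_version,) = struct.unpack("<I", header[4:8])
    return gguf_version
```

Calling `read_gguf_version(model_path)` on the model used here should agree with the version printed by `llama_model_loader`.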

In [22]:
# create the llama-style prompt
system_message = "You are a helpful assistant"
user_message = "Generate a list of 5 funny dog names"

prompt = f"""<s>[INST] <<SYS>>
{system_message}
<</SYS>>
{user_message} [/INST]"""

# Model parameters
max_tokens = 1000

# Run the model
output = llama_gguf(prompt, max_tokens=max_tokens, echo=True)
print(output)

Llama.generate: prefix-match hit


{'id': 'cmpl-85ea91f4-1183-46aa-b30a-55306523c9b9', 'object': 'text_completion', 'created': 1704456582, 'model': './model/llama2-7b-chat-Q4KM/llama-2-7b-chat.Q4_K_M.gguf', 'choices': [{'text': "<s>[INST] <<SYS>>\nYou are a helpful assistant\n<</SYS>>\nGenerate a list of 5 funny dog names [/INST]  Of course, I'd be happy to help you with that! Here are five funny dog names that might make you and your furry friend smile:\n\n1. Captain Fluffy Pants - This name is perfect for a dog that's as fluffy and adorable as it is adventurous and playful.\n2. Sir Bark-a-Lot - For the dog that loves to bark at everything, this name is sure to bring a smile to your face.\n3. Puddles McSplashy - This name is perfect for a dog that loves to splash around in puddles and get wet and muddy.\n4. Barky Boo Boo - For the dog that's always up for a good cuddle, this name is sure to bring a smile to your face.\n5. Sir Whiskerface von Fluffypants - This name is perfect for a dog with a fluffy beard and whiskers,


llama_print_timings:        load time =     223.86 ms
llama_print_timings:      sample time =      46.48 ms /   257 runs   (    0.18 ms per token,  5528.67 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    6831.30 ms /   257 runs   (   26.58 ms per token,    37.62 tokens per second)
llama_print_timings:       total time =    7370.01 ms
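
The prompt in the cell above wraps the system and user messages in `[INST]` and `<<SYS>>` markers. If several prompts are needed, the same template can be factored into a small function. This is a sketch that reproduces the tutorial's string exactly; `build_llama2_prompt` is an illustrative name.

```python
def build_llama2_prompt(system_message: str, user_message: str) -> str:
    """Wrap a system and a user message in the template used in this tutorial."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_message}\n"
        "<</SYS>>\n"
        f"{user_message} [/INST]"
    )


prompt = build_llama2_prompt(
    "You are a helpful assistant",
    "Generate a list of 5 funny dog names",
)
print(prompt)
```

The result can be passed to `llama_gguf(prompt, max_tokens=max_tokens, echo=True)` just like the hand-written f-string.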


In [23]:
# full generated text, including the echoed prompt (because echo=True)
print(output['choices'][0]['text'])

<s>[INST] <<SYS>>
You are a helpful assistant
<</SYS>>
Generate a list of 5 funny dog names [/INST]  Of course, I'd be happy to help you with that! Here are five funny dog names that might make you and your furry friend smile:

1. Captain Fluffy Pants - This name is perfect for a dog that's as fluffy and adorable as it is adventurous and playful.
2. Sir Bark-a-Lot - For the dog that loves to bark at everything, this name is sure to bring a smile to your face.
3. Puddles McSplashy - This name is perfect for a dog that loves to splash around in puddles and get wet and muddy.
4. Barky Boo Boo - For the dog that's always up for a good cuddle, this name is sure to bring a smile to your face.
5. Sir Whiskerface von Fluffypants - This name is perfect for a dog with a fluffy beard and whiskers, and it's sure to make you and your friends laugh.
I hope these suggestions help inspire you to find the perfect funny dog name for your new furry friend!
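
Because the call used `echo=True`, the returned text starts with the prompt itself. When only the model's answer is wanted, the prompt prefix can be removed afterwards (or the call can be made with `echo=False`). The sketch below works on a dictionary with the same shape as the output printed above; `completion_only` is a hypothetical helper and the sample completion is made up for illustration.

```python
def completion_only(output: dict, prompt: str) -> str:
    """Return the generated text without the echoed prompt."""
    text = output["choices"][0]["text"]
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Stand-in for a real llama_cpp result; only the fields used here are filled in.
prompt = (
    "<s>[INST] <<SYS>>\nYou are a helpful assistant\n<</SYS>>\n"
    "Generate a list of 5 funny dog names [/INST]"
)
fake_output = {"choices": [{"text": prompt + "  Of course, here are five names."}]}

print(completion_only(fake_output, prompt))
```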


#### step2. load the model with langchain's built-in LlamaCpp wrapper

In [16]:
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
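
`StreamingStdOutCallbackHandler` prints each token as it arrives. To keep the tokens instead of printing them, a custom handler can be passed to `CallbackManager` in the same way. This is a sketch: `TokenCollector` is a hypothetical class, and the `ImportError` fallback exists only so the example also runs where langchain is not installed.

```python
try:
    from langchain.callbacks.base import BaseCallbackHandler
except ImportError:  # fallback so the sketch runs without langchain installed
    BaseCallbackHandler = object


class TokenCollector(BaseCallbackHandler):
    """Hypothetical handler that stores streamed tokens instead of printing them."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once for every token the model streams back.
        self.tokens.append(token)

    def collected_text(self) -> str:
        """Join all tokens received so far into one string."""
        return "".join(self.tokens)


# Simulate two streamed tokens to show what the handler records.
collector = TokenCollector()
collector.on_llm_new_token("Hello")
collector.on_llm_new_token(" world")
print(collector.collected_text())
```

With langchain installed, `CallbackManager([TokenCollector()])` could be used in place of the stdout handler, under the assumption that the model is created with streaming callbacks enabled as in the next cell.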

In [3]:
llm_cpp = LlamaCpp(
    model_path=model_path,
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from ./model/llama2-7b-chat-Q4KM/llama-2-7b-chat.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.he

In [4]:
llm_cpp

LlamaCpp(callbacks=<langchain_core.callbacks.manager.CallbackManager object at 0x7f2094564be0>, client=<llama_cpp.llama.Llama object at 0x7f2094566530>, model_path='./model/llama2-7b-chat-Q4KM/llama-2-7b-chat.Q4_K_M.gguf', max_tokens=2000, temperature=0.75, top_p=1.0)
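
The model above is created with default context and threading settings. On a CPU-only machine it may be worth setting these explicitly. The sketch below builds keyword arguments for `LlamaCpp` using its `n_ctx`, `n_batch`, `n_threads`, and `n_gpu_layers` fields; the concrete values are assumptions to tune per machine, and `cpu_only_kwargs` is an illustrative helper.

```python
import os


def cpu_only_kwargs(n_ctx: int = 4096, n_batch: int = 512) -> dict:
    """Build keyword arguments for a CPU-only LlamaCpp instance."""
    return {
        # Context window; the loader log reports llama.context_length = 4096.
        "n_ctx": n_ctx,
        # Prompt tokens processed per batch (assumed starting value).
        "n_batch": n_batch,
        # One thread per logical CPU as a starting point; fewer may be faster.
        "n_threads": os.cpu_count() or 1,
        # Keep every layer on the CPU.
        "n_gpu_layers": 0,
    }


print(cpu_only_kwargs())

# Usage with the objects defined in the cells above:
# llm_cpp = LlamaCpp(
#     model_path=model_path,
#     callback_manager=callback_manager,
#     verbose=True,
#     **cpu_only_kwargs(),
# )
```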

In [5]:
# build a prompt template
template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""

prompt = PromptTemplate(template=template, input_variables=["question"])
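
To see the exact string the model will receive, the template can be filled in by hand. For a simple template like this one, plain `str.format` gives the same text as `prompt.format(question=...)`; the sketch below uses `str.format` so it runs without langchain.

```python
template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""

question = "What NBA team won the champion in the year Justin Bieber was born?"

# Plain str.format stands in for prompt.format(question=question) here.
filled = template.format(question=question)
print(filled)
```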

In [6]:
# build a chain with the gguf model and the prompt
llm_chain = LLMChain(prompt=prompt, llm=llm_cpp)

In [7]:
# run the chain with the prompt filled with the user question
question = "What NBA team won the champion in the year Justin Bieber was born?"
llm_chain.run(question)


Justin Bieber was born on March 1, 1994. The NBA (National Basketball Association) season typically runs from October to June of the following year. So if we look at the NBA champions for each year from 1994 to present, we can find out which team won the championship in the year Justin Bieber was born.
Here are the NBA champions for each year since 1994:
* 1994: Chicago Bulls
So, the NBA team that won the championship in the year Justin Bieber was born (1994) is the Chicago Bulls!


llama_print_timings:        load time =      74.52 ms
llama_print_timings:      sample time =      23.16 ms /   136 runs   (    0.17 ms per token,  5871.69 tokens per second)
llama_print_timings: prompt eval time =     418.31 ms /    44 tokens (    9.51 ms per token,   105.19 tokens per second)
llama_print_timings:        eval time =    4210.45 ms /   135 runs   (   31.19 ms per token,    32.06 tokens per second)
llama_print_timings:       total time =    5001.16 ms


'\nJustin Bieber was born on March 1, 1994. The NBA (National Basketball Association) season typically runs from October to June of the following year. So if we look at the NBA champions for each year from 1994 to present, we can find out which team won the championship in the year Justin Bieber was born.\nHere are the NBA champions for each year since 1994:\n* 1994: Chicago Bulls\nSo, the NBA team that won the championship in the year Justin Bieber was born (1994) is the Chicago Bulls!'
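
The `llama_print_timings` lines report elapsed milliseconds and a token count, from which the per-token figures follow: for the eval line above, 4210.45 ms / 135 runs is about 31.19 ms per token, or about 32.06 tokens per second. The sketch below parses such a line and recomputes those numbers; the regular expression is an assumption based on the log format shown in this notebook.

```python
import re

# Matches timing lines that carry a count, e.g. "eval time = 4210.45 ms / 135 runs".
TIMING_RE = re.compile(
    r"llama_print_timings:\s+(?P<name>[a-z ]+?) time =\s+(?P<ms>[\d.]+) ms /"
    r"\s+(?P<count>\d+) (?:runs|tokens)"
)


def parse_timing(line: str):
    """Return name, ms, count, and derived rates, or None if there is no count."""
    m = TIMING_RE.search(line)
    if m is None:
        return None
    ms = float(m["ms"])
    count = int(m["count"])
    return {
        "name": m["name"].strip(),
        "ms": ms,
        "count": count,
        "ms_per_token": ms / count if count else float("inf"),
        "tokens_per_second": count / (ms / 1000.0) if ms > 0 else float("inf"),
    }


line = (
    "llama_print_timings:        eval time =    4210.45 ms /   135 runs   "
    "(   31.19 ms per token,    32.06 tokens per second)"
)
stats = parse_timing(line)
print(stats["name"], round(stats["ms_per_token"], 2), round(stats["tokens_per_second"], 2))
```

Lines without a token count, such as `load time` and `total time`, return `None`.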

In [10]:
# alternatively, call the LlamaCpp model directly with the formatted prompt
llm_cpp(prompt.format(question=question))

Llama.generate: prefix-match hit

 Here are the NBA teams and their championships since 1970 when Justin Bieber was born:

* 1970 - Seattle Super Sonics (won championship)
* 1980 - Los Angeles Lakers (won championship)
* 1990 - Detroit Pistons (won championship)
* 2000 - Los Angeles Lakers (won championship)
* 2010 - Dallas Mavericks (won championship)

So, since Justin Bieber was born in 1994, the NBA team that won the championship in the year he was born is:
Detroit Pistons.


llama_print_timings:        load time =      74.52 ms
llama_print_timings:      sample time =      24.44 ms /   145 runs   (    0.17 ms per token,  5933.14 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =    4467.86 ms /   145 runs   (   30.81 ms per token,    32.45 tokens per second)
llama_print_timings:       total time =    4843.94 ms


' Here are the NBA teams and their championships since 1970 when Justin Bieber was born:\n\n* 1970 - Seattle Super Sonics (won championship)\n* 1980 - Los Angeles Lakers (won championship)\n* 1990 - Detroit Pistons (won championship)\n* 2000 - Los Angeles Lakers (won championship)\n* 2010 - Dallas Mavericks (won championship)\n\nSo, since Justin Bieber was born in 1994, the NBA team that won the championship in the year he was born is:\nDetroit Pistons.'