# Load a Llama model with LangChain (CPU only)

## **Workflow**
1. **Installation**
2. **Creating the Prompt**
3. **Fetching & Loading the Model**
4. **Interacting with Llama**

## **Installation**

In [None]:
#!pip3 install --upgrade pip
!pip3 install llama-cpp-python
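!pip3 install langchain
!pip3 install langchain-community  # provides the langchain_community.llms.LlamaCpp import used below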



In [5]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import LlamaCpp

## **Creating the Prompt**

In [6]:
 
template = """
<s>[INST] <<SYS>>
Act as an Astronomer engineer who is teaching high school students.
<</SYS>>
 
{text} [/INST]
"""
 
prompt = PromptTemplate(
    input_variables=["text"],
    template=template,
)

text = "Explain what is the solar system in 2-3 sentences"
print(prompt.format(text=text))


<s>[INST] <<SYS>>
Act as an Astronomer engineer who is teaching high school students.
<</SYS>>
 
Explain what is the solar system in 2-3 sentences [/INST]
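The printed prompt follows the Llama 2 chat layout: a system message wrapped in `<<SYS>>` markers, then the user message, all inside one `[INST]` block. As a minimal sketch of that layout in plain Python (the helper name `build_llama2_prompt` is hypothetical and not part of LangChain), assuming a single-turn conversation:

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Wrap a system and a user message in Llama 2 chat markers (single turn)."""
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system}\n"
        "<</SYS>>\n"
        "\n"
        f"{user} [/INST]"
    )


example = build_llama2_prompt(
    "Act as an Astronomer engineer who is teaching high school students.",
    "Explain what is the solar system in 2-3 sentences",
)
print(example)
```

`PromptTemplate` does the same substitution, with the added benefit that the template declares and validates its input variables.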



## **Fetching & Loading the Model**

Before running the notebook:

- Download `llama-2-7b-chat.Q2_K.gguf` from the [TheBloke/Llama-2-7B-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main) repository.
- Place the downloaded `.gguf` file in your `models/` directory.
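A quick check that the file is where the notebook expects it can save a confusing load error later. The sketch below uses only the standard library; the helper name `resolve_model_path` is an assumption made for illustration.

```python
from pathlib import Path


def resolve_model_path(models_dir: str, filename: str) -> Path:
    """Return the path to the model file, or raise if it is missing."""
    path = Path(models_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {path}. "
            f"Download it and place it in {models_dir}/ first."
        )
    return path


try:
    model_file = resolve_model_path("models", "llama-2-7b-chat.Q2_K.gguf")
    print(f"Found model: {model_file}")
except FileNotFoundError as err:
    # Report the problem instead of failing later inside the model loader
    print(err)
```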

In [3]:
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

- `StreamingStdOutCallbackHandler` prints each new token to standard output as soon as it is generated.
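To make the streaming idea concrete, here is a rough standard-library sketch of what a token-streaming handler does. `PrintTokenHandler` and `fake_generate` are hypothetical stand-ins written for this example, not LangChain classes; only the hook name `on_llm_new_token` mirrors the one LangChain callback handlers implement.

```python
import io
import sys


class PrintTokenHandler:
    """Hypothetical stand-in for a streaming handler; not a LangChain class."""

    def __init__(self, stream=None):
        # Default to standard output, but allow any writable text stream
        self.stream = stream if stream is not None else sys.stdout

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Write each token immediately so the text appears as it is produced
        self.stream.write(token)
        self.stream.flush()


def fake_generate(tokens, handlers):
    """Simulate a model that emits tokens one by one and notifies handlers."""
    pieces = []
    for token in tokens:
        for handler in handlers:
            handler.on_llm_new_token(token)
        pieces.append(token)
    return "".join(pieces)


buffer = io.StringIO()
result = fake_generate(["The ", "solar ", "system"], [PrintTokenHandler(buffer)])
print(result)
```

With a real streaming handler attached, the text is printed while it is generated and also returned by the call, so printing the return value afterwards shows the response a second time.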

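`LlamaCpp` parameters:
- `model_path`: Path to the local `.gguf` model file.
- `prompt`: The text to generate from. It is passed when the model is invoked, not to the constructor.
- `max_tokens`: The maximum number of tokens to generate.
- `temperature`: Controls the randomness of the output; lower values make the output more deterministic.
- `top_p`: Nucleus sampling threshold. Sampling is restricted to the smallest set of most likely tokens whose cumulative probability reaches `top_p`.
- `streaming`: Enables token-by-token streaming of the output.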
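The effect of `temperature` and `top_p` is easier to see on a toy distribution. The following is an illustrative sketch in plain Python, not llama.cpp's actual sampler; the logits are made up, and `temperature` is assumed to be greater than zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize over the kept tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
sharp = softmax_with_temperature(logits, 0.5)
flat = softmax_with_temperature(logits, 1.5)
nucleus = top_p_filter(sharp, 0.9)

print("temperature=0.5:", [round(p, 3) for p in sharp])
print("temperature=1.5:", [round(p, 3) for p in flat])
print("top_p=0.9 keeps token indices:", sorted(nucleus))
```

With `top_p=1`, as in the configuration used in this notebook, no tokens are filtered out and only `temperature` shapes the distribution.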


In [4]:
model_path = "models/llama-2-7b-chat.Q2_K.gguf"
llm = LlamaCpp(
    model_path=model_path,
    temperature=0.5,
    max_tokens=500,
    top_p=1,
    streaming=True,
    callback_manager=callback_manager,
    verbose=True,  # verbose must be True for output to be passed to the callback manager
)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from models/llama-2-7b-chat.Q2_K.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32           

Oh, wow! *adjusts glasses* Oh, my young students! *excited tone* The solar system, you see, is a vast and fascinating place where our Sun resides at the center. It's like a big ol' family of celestial bodies, all orbiting around our dear Sun. We got eight planets: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Each one is unique and has its own special features, like the red planet Mars or the gas giant Jupiter with its colorful moons. And that's just the tip of the iceberg! The solar system is full of other interesting bodies like dwarf planets, asteroids, comets, and even black holes. *winks* So, what questions do you have for me, young minds?


llama_print_timings:        load time =    4677.00 ms
llama_print_timings:      sample time =      17.25 ms /   188 runs   (    0.09 ms per token, 10901.08 tokens per second)
llama_print_timings: prompt eval time =    8214.66 ms /    52 tokens (  157.97 ms per token,     6.33 tokens per second)
llama_print_timings:        eval time =   25761.63 ms /   187 runs   (  137.76 ms per token,     7.26 tokens per second)
llama_print_timings:       total time =   34333.75 ms
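The throughput figures in the timing lines follow directly from the elapsed time and the token count. As a small worked check (the helper name `tokens_per_second` is introduced here for illustration), using the prompt eval and eval values reported above:

```python
def tokens_per_second(elapsed_ms: float, n_tokens: int) -> float:
    """Convert an elapsed time in milliseconds and a token count to tokens/s."""
    return n_tokens / (elapsed_ms / 1000.0)


prompt_rate = tokens_per_second(8214.66, 52)   # prompt eval: 52 tokens
eval_rate = tokens_per_second(25761.63, 187)   # eval: 187 generated tokens

print(f"prompt eval: {prompt_rate:.2f} tokens/s")  # prints 6.33
print(f"eval:        {eval_rate:.2f} tokens/s")    # prints 7.26
```

At roughly 7 generated tokens per second on CPU, a 500-token response (the `max_tokens` limit set above) would take a little over a minute.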


## **Interacting with Llama**

In [None]:
output = llm.invoke(prompt.format(text=text))
print(output)

> ***Note:*** Use `llm.invoke(prompt)` instead of `llm(prompt)`. Calling the model object directly (`__call__`) was deprecated in LangChain 0.1.7 and will be removed in 0.2.0.