# LLM Inference with llama.cpp and LangChain

## Checking GPU Availability

Before loading the model, we check whether this environment supports offloading model layers to the GPU.


In [7]:
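import pathlib

import llama_cpp
from llama_cpp.llama_cpp import load_shared_library

In [8]:
def is_gpu_available() -> bool:
    # Locate the bundled shared library relative to the installed package
    # rather than hard-coding a site-packages path tied to one Python version
    lib_dir = pathlib.Path(llama_cpp.__file__).parent / "lib"
    lib = load_shared_library("llama", lib_dir)
    # True if the library was built with a GPU backend such as CUDA
    return bool(lib.llama_supports_gpu_offload())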

is_gpu_available()

True
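The check above loads the shared library from an explicit directory. The shorter variant below calls the binding through the package itself. It assumes a llama-cpp-python release that re-exports `llama_supports_gpu_offload` at package level, so verify this against your installed version. It also treats a missing or unloadable package as "no GPU support". The function name `gpu_offload_supported` is hypothetical.

```python
def gpu_offload_supported() -> bool:
    """Return True if the installed llama-cpp-python build can offload layers to a GPU."""
    try:
        # Importing llama_cpp also loads its bundled shared library
        import llama_cpp
    except (ImportError, OSError, RuntimeError):
        # Package missing or its shared library failed to load: treat as no GPU support
        return False
    # Assumption: the low-level binding is re-exported at package level
    return bool(llama_cpp.llama_supports_gpu_offload())


print(gpu_offload_supported())
```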

## Inference with LangChain and llama.cpp

In [9]:
from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate

In [10]:
# CHANGE THE FOLLOWING VARIABLES

# Make sure the model path is correct for your system!
model_path = "ai-models/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf"

# Number of model layers to offload to the GPU (-1 offloads all layers)
n_gpu_layers = -1
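A wrong `model_path` only surfaces as an error when `LlamaCpp` tries to load the file, so it can help to validate the path first. The sketch below is hypothetical. The helper `resolve_model_path` and the `MODEL_PATH` environment variable are illustrative names and are not part of llama.cpp or LangChain.

```python
import os
import pathlib


def resolve_model_path(default: str, env_var: str = "MODEL_PATH") -> pathlib.Path:
    """Return the GGUF model path, preferring an environment override, and fail early if it is unusable."""
    # Hypothetical convention: an environment variable overrides the hard-coded default
    path = pathlib.Path(os.environ.get(env_var, default)).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    if path.suffix != ".gguf":
        raise ValueError(f"Expected a .gguf file, got: {path.name}")
    return path
```

In the cell above this would be used as `model_path = str(resolve_model_path("ai-models/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf"))`, which raises immediately if the file is missing.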

In [11]:
template = """
Question: {question}
"""
prompt = PromptTemplate.from_template(template)

# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
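`PromptTemplate.from_template` uses f-string style `{placeholders}` by default, so Python's `str.format` gives a quick preview of the text the model would receive. The later cell calls `llm.invoke(question)` directly, which bypasses the template. Composing the two as `chain = prompt | llm` and calling `chain.invoke({"question": question})` would apply it. The helper name below is hypothetical.

```python
def preview_prompt(template: str, **values: str) -> str:
    """Fill f-string style {placeholders}, approximating a default PromptTemplate."""
    return template.format(**values)


# A local copy of a question template; strip() removes the newlines around the question text
template = """
Question: {question}
"""
question = """
What is Machine Learning?
"""
print(preview_prompt(template, question=question.strip()))
```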

In [12]:
llm = LlamaCpp(
    model_path=model_path,
    temperature=0.75,
    max_tokens=200,  # maximum number of tokens to generate
    top_p=1,
    n_gpu_layers=n_gpu_layers,
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager
)
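The `temperature=0.75` and `top_p=1` arguments control how each next token is drawn. The pure-Python sketch below shows a simplified version of the two steps:

1. The logits are divided by the temperature and passed through a softmax.
2. Nucleus (top-p) filtering keeps the smallest set of most likely tokens whose probabilities sum to at least `top_p`.

This is not llama.cpp's implementation. The real sampler runs in C++ and chains further samplers (for example top-k) that are omitted here. `sample_top_p` is a hypothetical name. With `top_p=1`, as in the cell above, the filter keeps every token, so only the temperature changes the distribution.

```python
import math
import random


def sample_top_p(logits, temperature=0.75, top_p=1.0, rng=random):
    """Draw a token index from raw logits using temperature scaling and nucleus (top-p) filtering."""
    if temperature <= 0:
        # Treat a non-positive temperature as greedy decoding
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature scaling followed by a numerically stable softmax
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the most likely tokens until their cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # random.choices draws in proportion to the weights, so they need not sum to 1
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Example: three candidate tokens, fixed seed for reproducibility
print(sample_top_p([1.0, 5.0, 2.0], temperature=0.75, top_p=1.0, rng=random.Random(0)))
```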

llama_model_load_from_file_impl: using device CUDA0 (NVIDIA A100 80GB PCIe) - 75853 MiB free
llama_model_loader: loaded meta data with 27 key-value pairs and 771 tensors from ai-models/DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Qwen 32B
llama_model_loader: - kv   3:                       general.organization str              = Deepseek Ai
llama_model_loader: - kv   4:                           general.basename str              = DeepSeek-R1-Distill-Qwen
llama_model_loader: - kv   5:                         general.size_label str              = 32B
...
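The counts in the loader log (GGUF version 3, 771 tensors, 27 metadata key/value pairs) come from the GGUF file header. The sketch below reads that fixed-size header with the standard library. It assumes the GGUF v2+ layout: a 4-byte `GGUF` magic, then a little-endian uint32 version and two uint64 counts. `read_gguf_header` is a hypothetical helper and is not part of llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path: str) -> dict:
    """Read the GGUF header: magic, version, tensor count and metadata key/value count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"Not a GGUF file (magic={magic!r})")
        # Little-endian uint32 version, uint64 tensor_count, uint64 metadata_kv_count (GGUF v2 and later)
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
    return {"version": version, "n_tensors": n_tensors, "n_kv": n_kv}
```

Called on `model_path` from above, it should report the same version and counts as the log.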

In [13]:
question = """
What is Machine Learning?
"""

llm.invoke(question)

</think>

Machine Learning is a subset of Artificial Intelligence (AI) that focuses on the development of algorithms and models that enable computers to learn from and make decisions based on data.

Key characteristics of Machine Learning include:

1. **Data-Driven**: Machine Learning systems rely heavily on data to identify patterns, make predictions, or take actions.

2. **Generalization**: The goal is for a model to generalize well to new, unseen data rather than memorizing the training data.

3. **Adaptability**: Many Machine Learning models can adapt and improve their performance as they are exposed to more data over time.

4. **Automation**: Machine Learning automates the process of identifying patterns in data by building mathematical models from the training data.

5. **Types of Learning**:
   - **Supervised Learning**: The model is trained on labeled data, where each example has an input and a corresponding output (label). The goal is to learn a mapping from inputs to outputs.

llama_perf_context_print:        load time =      63.15 ms
llama_perf_context_print: prompt eval time =      63.08 ms /     7 tokens (    9.01 ms per token,   110.97 tokens per second)
llama_perf_context_print:        eval time =    4660.36 ms /   199 runs   (   23.42 ms per token,    42.70 tokens per second)
llama_perf_context_print:       total time =    5172.80 ms /   206 tokens
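The timing lines can be checked by hand. 199 eval runs in 4660.36 ms is about 23.42 ms per token, or roughly 42.70 tokens per second. The 7 prompt tokens plus 199 runs give the 206-token total. The 199 runs are also consistent with generation stopping at the `max_tokens=200` cap, which would explain why the answer ends mid-list. The hypothetical helper below recomputes throughput from a log line with a regular expression. The pattern matches the line format shown here and may need adjusting for other llama.cpp versions.

```python
import re
from typing import Optional, Tuple

# Matches "prompt eval time = X ms / N tokens" and "eval time = X ms / N runs"
PERF_RE = re.compile(
    r"(?P<stage>prompt eval time|eval time)\s*=\s*(?P<ms>[\d.]+) ms\s*/\s*(?P<count>\d+) (?:tokens|runs)"
)


def throughput(line: str) -> Optional[Tuple[str, float]]:
    """Parse a llama_perf_context_print timing line and return (stage, tokens per second)."""
    m = PERF_RE.search(line)
    if m is None:
        # Lines such as "load time" or "total time" are not matched
        return None
    ms, count = float(m["ms"]), int(m["count"])
    # tokens per second = count / seconds
    return m["stage"], count / (ms / 1000.0)


line = "llama_perf_context_print:        eval time =    4660.36 ms /   199 runs   (   23.42 ms per token,    42.70 tokens per second)"
print(throughput(line))
```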


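DeepSeek-R1 models wrap their reasoning in `<think>` ... `</think>` tags, and the text generated here begins with a `</think>` marker. When only the final answer is needed, a small post-processing step can drop everything up to the last closing tag. The sketch below assumes the closing tag is the literal string `</think>`, as in the output above. `strip_reasoning` is a hypothetical name.

```python
def strip_reasoning(text: str, end_tag: str = "</think>") -> str:
    """Return only the text after the last reasoning end tag, with surrounding whitespace removed."""
    # rpartition returns ("", "", text) when the tag is absent, so untagged text passes through unchanged
    _, _, answer = text.rpartition(end_tag)
    return answer.strip()


print(strip_reasoning("</think>\n\nMachine Learning is a subset of Artificial Intelligence (AI)."))
```

For example, `strip_reasoning(llm.invoke(question))` would return the answer starting at "Machine Learning is a subset...".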