# Load a Llama model with `Ollama` & `LangChain` (CPU only)


## **Workflow**
1. **Installation**
2. **Fetching & Loading the Model**


## **Installation**

In [2]:
#!pip3 install --upgrade pip
!pip3 install langchain langchain-community



Complete these steps before running the notebook:

- Download and run the [Ollama app](https://ollama.ai/download)  
- From the command line, fetch the Llama 2 7B model with `ollama pull llama2`
- While the app is running, all models are served automatically at `http://localhost:11434`
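
Before running the cells below, it can help to confirm that the server is reachable and that the model was pulled. The following is a minimal sketch using only the standard library. It assumes that Ollama's REST endpoint `/api/tags` returns a JSON object with a `models` list whose entries carry a `name` field; the helper names `model_names` and `list_local_models` are hypothetical and not part of any library.

```python
import json
import urllib.error
import urllib.request

# Assumption: the default address mentioned in the steps above.
OLLAMA_URL = "http://localhost:11434"


def model_names(payload):
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in payload.get("models", [])]


def list_local_models(base_url=OLLAMA_URL, timeout=5):
    """Return the names of locally available models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(json.load(resp))
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: the app is probably not running.
        return None
```

Calling `list_local_models()` in a cell should return a list that includes an entry for `llama2` if the pull succeeded, and `None` if the app is not running.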

## **Fetching & Loading the Model**

**Without Streaming**

In [5]:
from langchain_community.llms import Ollama

# Connect to the local Ollama server and request CPU-only inference.
llm = Ollama(
    model="llama2",
    base_url="http://localhost:11434",
    num_gpu=0,  # 0 disables GPU use
)
llm.invoke("Which planet is the closest to the sun?")

'The closest planet to the sun is Mercury. On average, Mercury is about 58 million kilometers (36 million miles) away from the sun.'

**With Streaming**

In [10]:
from langchain_core.callbacks.manager import CallbackManager
from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# The stdout handler prints each token as it is generated;
# invoke() still returns the full response string at the end.
llm = Ollama(
    model="llama2",
    num_gpu=0,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
llm.invoke("Which planet is the closest to the sun?")

The closest planet to the sun is Mercury. On average, Mercury is about 58 million kilometers (36 million miles) away from the sun.

'The closest planet to the sun is Mercury. On average, Mercury is about 58 million kilometers (36 million miles) away from the sun.'
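
As an alternative to a callback handler, LangChain LLM objects also expose a `stream()` method that yields text chunks as they arrive. The sketch below wraps it in a small helper; `stream_answer` and `demo` are hypothetical names introduced here for illustration, and `demo` assumes the Ollama app is running with `llama2` already pulled.

```python
def stream_answer(llm, prompt):
    """Print chunks as they arrive and return the full response text."""
    parts = []
    for chunk in llm.stream(prompt):
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the streamed line
    return "".join(parts)


def demo():
    """Run the helper against the local Ollama server (the app must be running)."""
    from langchain_community.llms import Ollama

    llm = Ollama(model="llama2", num_gpu=0)  # CPU only, as in the cells above
    return stream_answer(llm, "Which planet is the closest to the sun?")
```

Compared with the callback approach, this keeps the printing logic in your own loop, which can be easier to adapt if the chunks need to go somewhere other than stdout.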

By default, Ollama uses the GPU (`num_gpu=1`). To run on the CPU only, set `num_gpu=0`.  
- `num_gpu`: The number of GPUs to use. On macOS it defaults to 1 to enable Metal support; set it to 0 to disable.
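
To see where this setting ends up, the same request can be sent to the Ollama server without LangChain. This is a rough sketch under stated assumptions: that the REST endpoint `/api/generate` accepts an `options` object containing `num_gpu`, and that a non-streaming reply carries the text in a `response` field. The helper names `build_generate_payload` and `generate` are hypothetical.

```python
import json
import urllib.request


def build_generate_payload(prompt, model="llama2", cpu_only=True):
    """Build the JSON body for a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if cpu_only:
        # Assumption: num_gpu=0 in "options" disables GPU use, as in the LangChain wrapper.
        payload["options"] = {"num_gpu": 0}
    return payload


def generate(prompt, base_url="http://localhost:11434", **kwargs):
    """POST the payload to the local server and return the generated text."""
    data = json.dumps(build_generate_payload(prompt, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

With the app running, `generate("Which planet is the closest to the sun?")` should return an answer comparable to the LangChain cells above.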


The LangChain `Ollama` class documentation is available [here](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.ollama.Ollama.html?highlight=ollama#).