# Ollama for Running LLMs Locally

In [11]:
# imports required to communicate with the local API
import requests
import json

## Installing XTerm / Initializing Ollama

Because everything runs locally, I installed Ollama with the macOS installer from the [ollama.com](https://ollama.com) website, which also starts and initializes it. XTerm is not needed to interact with the Ollama models or API either, since this Jupyter notebook runs locally rather than in a Colab environment.
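Before sending requests, it can help to confirm that the local server is reachable. The following is a minimal sketch, not part of the original notebook: `is_ollama_running` is a hypothetical helper, it uses the standard library's `urllib` instead of `requests`, and it assumes the server listens on the same `http://localhost:11434` address used in the cells below.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if something answers HTTP 200 at base_url, else False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_ollama_running()` and getting `False` suggests the app has not been started yet.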

## Using the Ollama API to Run LLMs (Llama 3, Gemma 2) and Multimodal (LLaVA-Llama 3) Models
All models are given the same system prompt: act as a journalist who cites sources in every response.

In [12]:
# the run_ollama.sh script is not needed anymore as it is running and initialized locally
# instead, we can directly communicate with the api using the requests library
data = {
    "name": "local_llama3",
    "modelfile": "FROM llama3\nSYSTEM You are an avid journalist. When questioned about news, you always cite multiple sources as evidence."
}

response_to_create_endpoint = requests.post('http://localhost:11434/api/create', json=data)


In [17]:
response_to_create_endpoint #implies a successful creation of the llama3 model

<Response [200]>

In [18]:
data = {
    "name": "local_gemma2b",
    "modelfile": "FROM gemma2\nSYSTEM You are an avid journalist. When questioned about news, you always cite multiple sources as evidence."
}

response_to_create_endpoint = requests.post('http://localhost:11434/api/create', json=data)

In [19]:
response_to_create_endpoint #implies a successful creation of the gemma2 model

<Response [200]>

In [22]:
data = {
    "name": "local_llava-llama3",
    "modelfile": "FROM llava-llama3\nSYSTEM You are an avid journalist. When questioned about news, you always cite multiple sources as evidence."
}

response_to_create_endpoint = requests.post('http://localhost:11434/api/create', json=data)

In [23]:
response_to_create_endpoint #implies a successful creation of the llava-llama3 model

<Response [200]>
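The three create requests above differ only in the model name and the base model. As a sketch of how the repetition could be factored out, the hypothetical helper `build_create_payload` below builds the same `name`/`modelfile` body used in those cells. It only constructs the dictionaries and sends nothing. The field names mirror the cells above; the request schema may differ across Ollama releases, so check the API reference for the installed version.

```python
JOURNALIST_PROMPT = (
    "You are an avid journalist. When questioned about news, "
    "you always cite multiple sources as evidence."
)


def build_create_payload(name, base_model, system_prompt=JOURNALIST_PROMPT):
    """Build the JSON body used above for POST /api/create."""
    modelfile = f"FROM {base_model}\nSYSTEM {system_prompt}"
    return {"name": name, "modelfile": modelfile}


# The three (name, base model) pairs created in the cells above.
MODELS = [
    ("local_llama3", "llama3"),
    ("local_gemma2b", "gemma2"),
    ("local_llava-llama3", "llava-llama3"),
]
payloads = [build_create_payload(name, base) for name, base in MODELS]
```

Each entry of `payloads` could then be passed as `json=` to the same `requests.post` call shown above.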

## Checking the Status of Models

In [24]:
available_models = requests.get('http://localhost:11434/api/tags')

In [25]:
available_models.json() #implies that the models are available

{'models': [{'name': 'llava-llama3:latest',
   'model': 'llava-llama3:latest',
   'modified_at': '2024-09-12T16:25:45.002747603-07:00',
   'size': 5545682182,
   'digest': '44c161b1f46523301da9c0cc505afa4a4a0cc62f580581d98a430bb21acd46de',
   'details': {'parent_model': '',
    'format': 'gguf',
    'family': 'llama',
    'families': ['llama', 'clip'],
    'parameter_size': '8B',
    'quantization_level': 'Q4_K_M'}},
  {'name': 'local_llava-llama3:latest',
   'model': 'local_llava-llama3:latest',
   'modified_at': '2024-09-12T16:25:45.020100703-07:00',
   'size': 5545682359,
   'digest': 'd625ea08843b09c46106fcfb2f179a551b43cc1851fb44a995bf6be801674342',
   'details': {'parent_model': '',
    'format': 'gguf',
    'family': 'llama',
    'families': ['llama', 'clip'],
    'parameter_size': '8.0B',
    'quantization_level': 'Q4_K_M'}},
  {'name': 'local_gemma2b:latest',
   'model': 'local_gemma2b:latest',
   'modified_at': '2024-09-12T16:11:56.374821872-07:00',
   'size': 5443152592,
   

As the JSON above shows, the locally created models are available, along with the non-local models that were created within the remote Colab environment.
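The raw listing is long, so a compact summary can be easier to scan. The sketch below assumes the response shape shown above (`models`, `name`, `size`, `details.parameter_size`); `summarize_models` is a hypothetical helper, and the sample entry is copied from the output above with most fields omitted.

```python
def summarize_models(tags_payload):
    """Return (name, size in GB, parameter size) for each listed model."""
    rows = []
    for entry in tags_payload.get("models", []):
        details = entry.get("details", {})
        size_gb = round(entry.get("size", 0) / 1e9, 2)
        rows.append((entry.get("name"), size_gb, details.get("parameter_size")))
    return rows


# Sample entry copied from the output above (other fields omitted).
sample = {
    "models": [
        {
            "name": "llava-llama3:latest",
            "size": 5545682182,
            "details": {"parameter_size": "8B", "quantization_level": "Q4_K_M"},
        }
    ]
}
summary = summarize_models(sample)
```

In the notebook, `summarize_models(available_models.json())` would produce one row per installed model.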

## Using the Ollama API to Run Inference on the Models
Although the Ollama API supports streaming, I use non-streaming requests to chat with the models to keep the showcase simple.

In [26]:
llama3_chat_data = {
    "model": "local_llama3:latest",
    "messages": [
        {
            "role": "user",
            "content": "What is the most recent news you have of the 2024 election? Provide the specific dates."
        }
    ],
    "stream": False
}

response_to_llama3_chat = requests.post('http://localhost:11434/api/chat', json=llama3_chat_data)

In [29]:
from IPython.display import display, HTML  # IPython.core.display is deprecated for these imports

response_content = response_to_llama3_chat.json()['message']['content']
display(HTML(response_content))
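Indexing `['message']['content']` directly raises a `KeyError` when the request fails, because a failed request returns a JSON body with an `error` field instead of a `message`. A small defensive wrapper, sketched here with the hypothetical name `extract_chat_content`, makes that case explicit:

```python
def extract_chat_content(payload):
    """Return the assistant text from a non-streaming /api/chat body."""
    if "error" in payload:
        raise RuntimeError(f"Ollama returned an error: {payload['error']}")
    return payload["message"]["content"]
```

In this notebook it would be called as `extract_chat_content(response_to_llama3_chat.json())`.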



In [30]:
gemma2_chat_data = {
    "model": "local_gemma2b:latest",
    "messages": [
        {
            "role": "user",
            "content":  "What is the most recent news you have of the 2024 election? Provide the specific dates."
        }
    ],
    "stream": False
}

response_to_gemma2_chat = requests.post('http://localhost:11434/api/chat', json=gemma2_chat_data)

In [31]:
response_content = response_to_gemma2_chat.json()['message']['content']
display(HTML(response_content))

In [32]:
multimodal_chat_data = {
    "model": "local_llava-llama3:latest",  # the variant created above, so the journalist system prompt applies
    "messages": [
        {
            "role": "user",
            "content":  "What is the most recent news you have of the 2024 election? Provide the specific dates."
        }
    ],
    "stream": False
}

response_to_multimodal_chat = requests.post('http://localhost:11434/api/chat', json=multimodal_chat_data)

In [33]:
response_content = response_to_multimodal_chat.json()['message']['content']
display(HTML(response_content))
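For comparison with the non-streaming cells above, a streamed chat response arrives as newline-delimited JSON objects, each carrying a fragment of the reply in `message.content` and a `done` flag on the last one. The sketch below shows one way to reassemble such a stream. `collect_streamed_chat` is a hypothetical helper, and `fake_stream` is made-up data shaped like a streamed reply, not real model output.

```python
import json


def collect_streamed_chat(lines):
    """Join the content fragments from a streamed /api/chat response."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Hypothetical chunks shaped like a streamed reply, for illustration only.
fake_stream = [
    b'{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    b'{"message": {"role": "assistant", "content": "lo"}, "done": false}',
    b'{"message": {"role": "assistant", "content": ""}, "done": true}',
]
text = collect_streamed_chat(fake_stream)
```

With `requests`, the lines would come from a call made with `"stream": True` in the body and `stream=True` on `requests.post`, iterating over `response.iter_lines()`.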

### Encoding an Image in Base64 for Input to LLaVA-Llama 3

In [34]:
import base64

def encode_image_to_base64(image_path):
    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    return encoded_string

# Example usage:
image_path = './images/debate_image.png'
encoded_image = encode_image_to_base64(image_path)

In [35]:
encoded_image[:10] # Displaying the first 10 characters of the encoded image

'iVBORw0KGg'
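The prefix `iVBORw0KGg` is what the 8-byte PNG file signature looks like once base64-encoded, so it doubles as a quick sanity check that the file was read correctly. A minimal sketch of that check follows; `looks_like_png` is a hypothetical helper, not part of the original notebook.

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(encoded):
    """Check whether a base64 string starts with the 8-byte PNG signature."""
    head = encoded[:12]  # 12 base64 characters decode to 9 bytes
    if len(head) < 12:
        return False
    try:
        return base64.b64decode(head)[:8] == PNG_SIGNATURE
    except binascii.Error:
        return False
```

Here, `looks_like_png(encoded_image)` would be expected to return `True` for the PNG loaded above.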

In [47]:
multimodal_image_chat_data = {
    "model": "llava-llama3:latest",
    "messages": [
        {
            "role": "user",
            "content": "Describe who are the people in this image:",
            "images": [encoded_image]  # the chat endpoint reads attachments from "images", a list of base64 strings
        }
        }
    ],
    "stream": False
}

response_to_multimodal_image_chat = requests.post('http://localhost:11434/api/chat', json=multimodal_image_chat_data)

In [48]:
response_content = response_to_multimodal_image_chat.json()['message']['content']
display(HTML(response_content))