# Localized Large Language Models

In this notebook, we explore localized large language models available through **Ollama**, specifically focusing on **Llama3** and **Mistral**. Ollama facilitates running these models on local machines, enabling customizable AI solutions.

## Ollama Overview

Ollama provides a streamlined interface for running large language models such as **Llama3** and **Mistral** locally. For a detailed guide, you can refer to the [Ollama GitHub page](https://github.com/ollama/ollama).

Since I am on Windows, I used the `.exe` installer available on the Ollama website. Once it is installed, you can follow the instructions below.

## Installation

Windows users can download and install the **Ollama** application from the official website. Follow these steps to get started:

1. **Install Ollama**:
   - Download the `.exe` file and install it on your Windows system.
2. **Verify Installation**:
   - Open a command prompt and type `ollama` to confirm that the installation succeeded.
   - Open [the API endpoint](http://localhost:11434/) in a browser; you should see the output below:
   
![alt text](images/Running_1.jpg)         
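
The browser check above can also be scripted. The following is a minimal sketch using only the Python standard library. The helper name `ollama_is_up` is hypothetical, and the sketch assumes Ollama is listening on its default address `http://localhost:11434`, as in this notebook.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar errors
        return False
```

Calling `ollama_is_up()` should return `True` while the Ollama server is running and `False` when nothing is listening on that port.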

## Using Ollama 

I ran the models through Ollama in two ways:

- `curl` commands - used to test both models and compare them
- Command prompt - used to interact with the models directly, which makes the Q&A feel more intuitive and less programmatic


### Steps to run Llama 3

- Pull the Llama3 model to the local machine
- Let's see which models are available locally

![alt text](images/Running_Ollama1.jpg)

- Let's get Llama3 up and running
- Once Llama3 is running, we can test it with some examples

![alt text](images/Running_Ollama2.jpg)
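
The list of local models is also available over the REST API. The sketch below assumes the `GET /api/tags` endpoint that Ollama documents for listing local models, and the default port. Both helper names are hypothetical.

```python
import json
import urllib.request


def model_names(tags_payload):
    """Extract the model names from a parsed /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_local_models(base_url="http://localhost:11434"):
    """Fetch the names of the models available locally (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))
```

With the server running, `list_local_models()` should include an entry for Llama3 once the model has been pulled.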

- We can also create custom Llama3 models and prompt them to behave differently
- Here is an example I tried out using the modelfile present in the repository
- Running the command below creates a new model:
    
```
ollama create myllama3 -f myllama3.modelfile
```

![alt text](images/Running_Custom_Ollama3.jpg)    
    
- Testing the new model, we can see that our custom AI bot has been deployed
    
    
![alt text](images/Running_Custom_Ollama4.jpg)        
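
The contents of `myllama3.modelfile` are not shown in this notebook. As an illustration only, a Modelfile typically names a base model with `FROM`, sets sampling options with `PARAMETER`, and defines the bot's persona with `SYSTEM`. The sketch below builds such a file from Python and wraps the `ollama create` command used above. The system prompt, temperature, file name, and helper names are all hypothetical, not the ones from the repository.

```python
import subprocess
from pathlib import Path


def build_modelfile(base_model, system_prompt, temperature=0.8):
    """Return Modelfile text for a custom model (illustrative values only)."""
    return (
        f"FROM {base_model}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM """{system_prompt}"""\n'
    )


def create_model(name, modelfile_text, path):
    """Write the Modelfile, then run `ollama create <name> -f <path>`.

    Requires the ollama CLI to be installed and on PATH.
    """
    Path(path).write_text(modelfile_text, encoding="utf-8")
    subprocess.run(["ollama", "create", name, "-f", str(path)], check=True)


text = build_modelfile("llama3", "You are a friendly assistant that answers briefly.")
print(text)
# To actually create the model (hypothetical file name):
# create_model("myllama3", text, "example.modelfile")
```

Keeping the text generation separate from the CLI call makes it easy to inspect the Modelfile before creating anything.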


### Steps to run Mistral

- As with Llama3, we can run Mistral by installing it through Ollama
- When we list the models, we can see that the Mistral model is available to us
    
![alt text](images/Running_Custom_mistral5.jpg)           

- Testing the model
    
![alt text](images/Running_Custom_mistral5_5.jpg)           

- As with our custom Llama3 model, we create a custom Mistral model to see how it performs
- Testing the new model, we can see that our custom Mistral AI bot has been deployed

![alt text](images/Running_Custom_mistral6.jpg)               

   


## Comparing the Large Language Models

We can compare the large language models by sending requests with `curl` and inspecting the metrics in the response.

```
curl -X POST http://localhost:11434/api/generate -d "{\"model\": \"llama3\",  \"prompt\":\"Tell me a fact about Llama?\", \"stream\": false}"
```
Running this API call returns the following parameters:

- `total_duration`: time spent generating the response, in nanoseconds
- `load_duration`: time spent loading the model, in nanoseconds
- `prompt_eval_count`: number of tokens in the prompt
- `prompt_eval_duration`: time spent evaluating the prompt, in nanoseconds
- `eval_count`: number of tokens in the response
- `eval_duration`: time spent generating the response, in nanoseconds
- `context`: an encoding of the conversation used in this response; it can be sent in the next request to keep a conversational memory
- `response`: the full response if the request was not streamed; empty if the response was streamed
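
Because the durations are reported in nanoseconds, a useful derived figure is generation speed in tokens per second, that is, `eval_count` divided by `eval_duration` expressed in seconds. A small sketch follows. The function name and the sample numbers are made up for illustration and are not measurements from this notebook.

```python
def tokens_per_second(response):
    """Generation speed computed from an /api/generate response dict."""
    duration_s = response["eval_duration"] / 1e9  # nanoseconds -> seconds
    if duration_s <= 0:
        return 0.0
    return response["eval_count"] / duration_s


# Hypothetical response fields: 100 tokens generated in 2 seconds
sample = {"eval_count": 100, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))  # prints 50.0
```

Unlike the raw durations, this figure stays comparable across answers of different lengths.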

I then wrote a custom script that runs a set of questions against each model and compares some of these parameters across the models.

Here are some of the outputs I got when comparing the following parameters:

- `prompt_eval_duration`: time spent in nanoseconds evaluating the prompt
- `eval_duration`: time in nanoseconds spent generating the response

           

![alt text](images/Prompt_Eval_Duration_.jpg)             



In [None]:
# -*- coding: utf-8 -*-
"""
Created on Sun Jun 16 15:06:01 2024

@author: abhis
"""


import json
import os

import matplotlib.pyplot as plt
import pandas as pd
import requests
import seaborn as sns

# Set the style of the visualization
sns.set(style="whitegrid")



def getInsights(model, question):
    """Send one prompt to a local Ollama model and return the parsed JSON response.

    The response includes the generated text and the timing metrics.
    Returns None if the request fails.
    """
    url = 'http://localhost:11434/api/generate'

    data = {
        "model": model,
        "prompt": question,
        "stream": False  # ask for a single JSON object instead of a stream
    }

    data_json = json.dumps(data)

    try:
        r = requests.post(url, data=data_json, headers={'Content-Type': 'application/json'})
        r.raise_for_status()
        result = r.json()
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

    return result

def plot_metrics(df, metric):
    """Plot one metric per question, with one line per model, and save it as a PNG."""
    plt.figure(figsize=(10, 6))
    sns.lineplot(data=df, x="Question", y=metric, hue="Model", marker="o")
    plt.title(f"Comparison of {metric} Across Questions")
    plt.xticks(rotation=45, ha='right')
    plt.xlabel("Question")
    # The metric names already carry their unit, e.g. "Eval Duration (s)"
    plt.ylabel(metric)
    plt.legend(title="Model")

    plt.grid(True)
    plt.tight_layout()
    plt.savefig(os.path.join(os.getcwd(), metric + ".png"))
    plt.show()


def nanoseconds_to_seconds(nanoseconds):
    """Convert nanoseconds to seconds for easier reading."""
    return nanoseconds / 1_000_000_000

questions = [
    "Who are you and what is the capital of India?",
    "What is the largest mammal in the world?",
    "How do you make an apple pie?",
    "What are the benefits of exercise?",
    "Explain the theory of relativity.",
]

metrics_data = {
    "Model": [],
    "Question": [],
    "Eval Duration (s)": [],
    "Prompt Eval Duration (s)": []
}

# Collect metrics for both models:
#   mymistral - the custom Mistral model I made using the modelfile
#   myllama3  - the custom Llama3 model I made using the modelfile
for model in ["mymistral", "myllama3"]:
    for question in questions:
        response = getInsights(model, question)
        if response:
            metrics_data["Model"].append(model)
            metrics_data["Question"].append(question)
            metrics_data["Eval Duration (s)"].append(nanoseconds_to_seconds(response['eval_duration']))
            metrics_data["Prompt Eval Duration (s)"].append(nanoseconds_to_seconds(response['prompt_eval_duration']))

df_metrics = pd.DataFrame(metrics_data)
print(df_metrics)

# Plot each metric
for metric in ["Eval Duration (s)", "Prompt Eval Duration (s)"]:
    plot_metrics(df_metrics, metric)
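

The plots compare the models question by question. For a single number per model, the collected durations can also be averaged. The following is a minimal sketch using only the standard library. It takes a dictionary shaped like `metrics_data` above; the helper name and the sample values are hypothetical and are not results from this notebook.

```python
from collections import defaultdict
from statistics import mean


def mean_by_model(metrics, metric):
    """Average one metric per model from a metrics_data-style dictionary."""
    grouped = defaultdict(list)
    for model, value in zip(metrics["Model"], metrics[metric]):
        grouped[model].append(value)
    return {model: mean(values) for model, values in grouped.items()}


# Made-up durations in seconds, for illustration only
example = {
    "Model": ["mymistral", "mymistral", "myllama3", "myllama3"],
    "Eval Duration (s)": [1.0, 3.0, 2.0, 4.0],
}
print(mean_by_model(example, "Eval Duration (s)"))
```

The same call works on the real `metrics_data` once the collection loop has finished.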

