<a href="https://colab.research.google.com/github/Luxadevi/Ollama-Colab-Integration/blob/main/Ollama_publicV2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup Instructions for Ollama API Access and NAT Tunneling

This notebook provides step-by-step instructions for setting up the Ollama service with NAT tunneling. The following features are covered:

- **Tunneling**: Establishing a secure connection to access the Ollama API.
- **Background Processing**: Running Ollama and the tunnel in the background.
- **Monitoring**: Keeping an eye on the status of Ollama and the tunnel.
- **Logging**: Capturing all standard output (stdout) and standard error (stderr) messages from Ollama and natsrv.py.
- **Interactive Modelfile Creator**: Creating custom Modelfiles for tailored Ollama behavior.

## Getting Started

Before proceeding, please provide the required information:

1. **Secret Password**: Enter the secret password for tunnel authentication.
2. **Endpoint IP Address**: Specify the IP address for the NAT tunnel endpoint.

These details are essential for the secure setup of the Ollama service and NAT tunneling.

**Note**: Ensure that you have the necessary dependencies installed before following these instructions. Refer to the "Dependency Installation" section for guidance.
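
As a rough sketch of how the endpoint address could be checked before it is used, the snippet below validates it with the standard `ipaddress` module. The helper name `validate_endpoint_ip` is hypothetical and not part of this notebook, and the addresses are documentation examples.

```python
import ipaddress

def validate_endpoint_ip(value: str) -> str:
    """Return the normalized IP address, or raise ValueError if it is malformed."""
    # ip_address() accepts IPv4 and IPv6 literals and rejects anything else.
    return str(ipaddress.ip_address(value.strip()))

# Example: a well-formed address passes, a typo is rejected.
print(validate_endpoint_ip(" 192.0.2.10 "))
try:
    validate_endpoint_ip("192.0.2.999")
except ValueError as err:
    print(f"Rejected: {err}")
```

For the secret, `getpass.getpass()` from the standard library could replace `input()` so that the value is not echoed into the cell output.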


In [None]:
secret_pass = input("Please enter the secret password for --secret: ")
admin_ip = input("Please enter the IP address for --admin: ")

Please enter the secret password for --secret: 5sdawescvf
Please enter the IP address for --admin: 85.145.210.99


## Dependency Installation

This cell installs the required dependencies:

- CUDA 12.3 drivers and toolkit.
- Ollama.
- The nat-tunnel script.
- pciutils, for retrieving GPU information.
- The model list (`models.json`), loaded into a Python dictionary.

In [None]:
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
!mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
!wget https://developer.download.nvidia.com/compute/cuda/12.3.0/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.0-545.23.06-1_amd64.deb
!dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.0-545.23.06-1_amd64.deb
!cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
!apt-get update
!apt-get -y install cuda-toolkit
!apt-get -y install cuda
!apt-get -y install pciutils
!lspci
!wget https://ollama.ai/install.sh -O install.sh
!chmod +x install.sh
!./install.sh
!git clone https://github.com/rofl0r/nat-tunnel.git
!pip install httpx
!rm cuda-repo-ubuntu2204-12-3-local_12.3.0-545.23.06-1_amd64.deb
# asyncio ships with the Python standard library, so no pip install is needed
import requests
# URL containing the JSON data
url = 'https://raw.githubusercontent.com/Luxadevi/Ollama-Colab-intergration/main/models.json'
# Fetch the JSON data from the URL and fail early on an HTTP error
response = requests.get(url, timeout=30)
response.raise_for_status()
# Parse the JSON content into a Python dictionary (model name -> available tags)
models = response.json()
print(models)
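
The later cells index `models[name]` to fill the "Model Type" dropdown, which suggests that the JSON maps each model name to a list of tags. Under that assumption, a minimal sketch of a shape check could look like this. `check_models` is a hypothetical helper and the sample data is illustrative only.

```python
def check_models(models: dict) -> dict:
    """Verify that models maps each model name to a non-empty list of tag strings."""
    if not isinstance(models, dict) or not models:
        raise ValueError("expected a non-empty JSON object")
    for name, tags in models.items():
        if not isinstance(tags, list) or not tags:
            raise ValueError(f"{name!r} has no tags")
        if not all(isinstance(tag, str) for tag in tags):
            raise ValueError(f"{name!r} contains a non-string tag")
    return models

# Illustrative sample only; the real file may list different models and tags.
sample = {"mistral": ["latest", "instruct"], "llama2": ["latest", "7b"]}
print(sorted(check_models(sample)))
```

Running the check right after the download would surface a malformed file before the widgets are built.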


# Ollama Setup and Tunnel Configuration

## Tunneling Setup Instructions

To get started with tunneling for the Ollama service, follow these steps:

1. **Download the NAT Tunnel Script**:
   Download the script from the GitHub repository.
   - [Nat-Tunnel on GitHub](https://github.com/rofl0r/nat-tunnel)

2. **Port Requirements**:
   Ensure that at least one port is exposed to receive connections.

3. **Server-Side Configuration**:
   On the server side (where you receive the connection), execute the following command:

   ```sh
   python3 natsrv.py --mode server --secret s3cretP4ss --public 0.0.0.0:7000 --admin 0.0.0.0:8000
   ```

**Explanation of Parameters:**
- `--mode server`: This sets the NAT tunnel script to operate in server mode.
- `--secret s3cretP4ss`: A customizable secret code. This code will be used for authentication and should also be provided when prompted in this notebook.
- `--public 0.0.0.0:7000`: Defines port 7000 as the public-facing port for Ollama. This port is necessary for remote access; omit this if you only need local access.
- `--admin 0.0.0.0:8000`: Port 8000 is dedicated to administrative controls. Ensure this port is forwarded for remote management capabilities.
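
To make the flag layout concrete, here is a small sketch that assembles the same server command from its parts. `build_server_command` is a hypothetical helper, not part of nat-tunnel; it only uses the flags listed above.

```python
import shlex

def build_server_command(secret: str, public_port: int = 7000, admin_port: int = 8000) -> list:
    """Build the natsrv.py server-mode argument list from the flags described above."""
    return [
        "python3", "natsrv.py",
        "--mode", "server",
        "--secret", secret,
        "--public", f"0.0.0.0:{public_port}",
        "--admin", f"0.0.0.0:{admin_port}",
    ]

cmd = build_server_command("s3cretP4ss")
# shlex.join quotes each argument, so the printed line is safe to paste into a shell.
print(shlex.join(cmd))
```

Keeping the command as a list also allows passing it to `subprocess.Popen` without `shell=True`, which avoids shell interpretation of the secret.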


In [None]:
import subprocess
import threading
import time
import logging.handlers
import httpx
import sys
import os

def create_logger(name, filename, level, formatter):
    logger = logging.getLogger(name)
    handler = logging.handlers.RotatingFileHandler(filename, maxBytes=5*1024*1024, backupCount=5)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger

status_formatter = logging.Formatter('[%(asctime)s] [%(levelname)s] [%(name)s] - %(message)s')
error_formatter = logging.Formatter('[%(asctime)s] [%(levelname)s] [%(name)s] - %(message)s')

loggers = {
    "Status": create_logger("Status", "status.log", logging.INFO, status_formatter),
    "NatsrvStatus": create_logger("NatsrvStatus", "natsrv.log", logging.INFO, status_formatter),
    "OllamaStatus": create_logger("OllamaStatus", "ollama.log", logging.INFO, status_formatter),
    "Error": create_logger("Error", "error.log", logging.ERROR, error_formatter),
    "NatsrvError": create_logger("NatsrvError", "natsrv_error.log", logging.ERROR, error_formatter),
    "OllamaError": create_logger("OllamaError", "ollama_error.log", logging.ERROR, error_formatter)
}

class ProcessMonitor:
    def __init__(self):
        self.processes = {}
        self.is_monitoring = True

    def handle_output(self, process_name):
        process = self.processes[process_name]
        logger_status = loggers[f"{process_name.capitalize()}Status"]
        for line in iter(process.stdout.readline, b''):
            logger_status.info(line.decode().strip())

    def check_url_and_restart_natsrv(self):
        while self.is_monitoring:
            try:
                response = httpx.get(f"http://{admin_ip}:7000/")
                if response.status_code != 200:
                    raise Exception("Non-200 status code")
            except Exception as e:
                loggers["Error"].error(f"Error accessing the URL: {e}. Restarting natsrv.py...")
                if self.processes.get('natsrv'):
                    self.processes['natsrv'].terminate()
                self.run_natsrv()
            time.sleep(5)

    def run_natsrv(self):
        # Pass the arguments as a list (no shell) so the secret is never interpreted by a shell
        cmd = ["python3", "/content/nat-tunnel/natsrv.py", "--mode", "client",
               "--secret", secret_pass, "--local", "localhost:11434",
               "--admin", f"{admin_ip}:8000"]
        # Pipe stdout and stderr so handle_output can write them to natsrv.log
        self.processes['natsrv'] = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        threading.Thread(target=self.handle_output, args=('natsrv',), daemon=True).start()
        # Log the start without the command line, which contains the secret
        loggers["NatsrvStatus"].info("Started natsrv in client mode")

    def run_ollama(self):
        os.environ["OLLAMA_HOST"] = "0.0.0.0:11434"
        os.environ["OLLAMA_ORIGINS"] = "http://0.0.0.0:*"

        cmd = ["ollama", "serve"]
        # Pipe stdout and stderr so handle_output can write them to ollama.log
        self.processes['ollama'] = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        threading.Thread(target=self.handle_output, args=('ollama',), daemon=True).start()
        loggers["OllamaStatus"].info("Started ollama with command: ollama serve")


    def monitor_process(self, process_name):
        while self.is_monitoring:
            if self.processes[process_name].poll() is not None:
                loggers["Status"].warning(f"{process_name} process has stopped. Restarting...")
                if process_name == 'natsrv':
                    self.run_natsrv()
                else:
                    self.run_ollama()
            time.sleep(5)

    def start(self):
        self.run_ollama()
        time.sleep(2)
        self.run_natsrv()

        threading.Thread(target=self.monitor_process, args=('ollama',)).start()
        threading.Thread(target=self.monitor_process, args=('natsrv',)).start()
        threading.Thread(target=self.check_url_and_restart_natsrv).start()

    def stop(self):
        self.is_monitoring = False
        for p in self.processes.values():
            p.terminate()

if __name__ == '__main__':
    monitor = ProcessMonitor()
    monitor.start()
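
`start()` above waits a fixed two seconds between launching Ollama and the tunnel. One possible alternative, sketched here under the assumption that Ollama listens on TCP port 11434 as configured in `run_ollama`, is to poll the port until it accepts connections. `wait_for_port` is a hypothetical helper, and the demo uses a throwaway local listener rather than a real Ollama process.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False

# Demonstration against a throwaway local listener instead of a real Ollama server.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
listener.listen(1)
free_port = listener.getsockname()[1]
print(wait_for_port("127.0.0.1", free_port, timeout=2.0))
listener.close()
```

In the notebook, a call such as `wait_for_port("127.0.0.1", 11434)` could stand in for `time.sleep(2)` before `run_natsrv()` is invoked.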


# Interactive Modelfile Maker

## Overview
Create your own modelfile with ease using this intuitive tool. Tailor it according to your needs, choose your model and model type, and get started in no time!

### Features
- **Model Selection**: Pick the model that fits your requirements.
- **Modeltype Customization**: Select from various available model types.
- **Naming**: Input fields for naming your model determine how the API will identify it.
- **Parameterization**: Flexibility to use specific PARAMETERS or opt not to use any.
- **Template Variables**: Add custom template variables or choose not to include any.
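
As a minimal sketch of what the tool produces, the function below renders Modelfile text from a base model, a tag, and a few parameters. `render_modelfile` is a hypothetical stand-in for the widget code in this notebook, and the values passed to it are examples only.

```python
def render_modelfile(model: str, tag: str, params: dict, system: str = "") -> str:
    """Render Modelfile text from a base model, PARAMETER values and an optional SYSTEM prompt."""
    lines = [f"FROM {model}:{tag}"]
    for key, value in params.items():
        lines.append(f"PARAMETER {key} {value}")
    if system:
        # Triple quotes let the system prompt span several lines.
        lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines) + "\n"

print(render_modelfile("mistral", "instruct", {"temperature": 0.7, "num_ctx": 4096},
                       system="You are a concise assistant."))
```

Building the text in a pure function keeps it easy to inspect before anything is written to disk or sent to the API.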

### Disclaimer
When generating and deploying, please note that it might take some time before you see the output. For quicker feedback, you can use the following `curl` command:

```sh
curl -X POST http://127.0.0.1:7000/api/create -d '{
  "name": "modelname",
  "path": "/content/Modelfile"
}'
```



In [None]:
import ipywidgets as widgets
from IPython.display import display, FileLink
import os
import requests

# `models` comes from the dependency cell above (model name -> list of tags).

# PARAMETER names accepted in a Modelfile, with the hint shown in each input field
parameters = {
    'mirostat': 'Mirostat sampling: 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0 (default: 0)',
    'mirostat_eta': 'Mirostat eta, the learning rate (default: 0.1)',
    'mirostat_tau': 'Mirostat tau, balance between coherence and diversity (default: 5.0)',
    'num_ctx': 'Size of the context window in tokens (default depends on the Ollama version)',
    'num_gqa': 'Number of GQA groups in the transformer layer (e.g. 8 for llama2:70b)',
    'num_gpu': 'Number of layers to send to the GPU(s)',
    'num_thread': 'Number of threads to use (default: detected automatically)',
    'repeat_last_n': 'How far back to look to prevent repetition (default: 64, 0 = disabled)',
    'repeat_penalty': 'Repeat penalty (default: 1.1)',
    'temperature': 'Sampling temperature (default: 0.8)',
    'seed': 'Random seed (default: 0)',
    'stop': 'Stop sequence for generation',
    'tfs_z': 'Tail free sampling; 1.0 disables it (default: 1)',
    'num_predict': 'Maximum number of tokens to predict (-1 = infinite)',
    'top_k': 'Top-k sampling (default: 40)',
    'top_p': 'Top-p sampling (default: 0.9)'
}
# Create a dropdown for model selection
model_dropdown = widgets.Dropdown(
    options=models.keys(),
    description='Model:',
    disabled=False,
)

# Input field for the name of the Modelfile
modelfile_name_input = widgets.Text(value='', placeholder='Enter Modelfile name', description='Modelfile Name:', layout=widgets.Layout(width='300px'))

# Input field for the name in the data variable
data_name_input = widgets.Text(value='', placeholder='Enter name for data', description='Name :', layout=widgets.Layout(width='300px'))

# Create a dropdown for model type based on selected model
model_type_dropdown = widgets.Dropdown(
    options=models[model_dropdown.value],
    description='Model Type:',
    disabled=False,
)

def update_model_type_options(change):
    model_type_dropdown.options = models[change['new']]

model_dropdown.observe(update_model_type_options, names='value')

# Create checkboxes for PARAMETERS with input fields showing the description
checkboxes = []
input_fields = []
for param, desc in parameters.items():
    checkbox = widgets.Checkbox(value=False, description=param)
    input_field = widgets.Text(value='', placeholder=desc)
    checkboxes.append(checkbox)
    input_fields.append(input_field)

# Function to generate and save Modelfile
def generate_modelfile(btn=None):
    modelfile_content = f"FROM {model_dropdown.value}:{model_type_dropdown.value}\n"

    for checkbox, input_field in zip(checkboxes, input_fields):
        if checkbox.value:
            modelfile_content += f"PARAMETER {checkbox.description} {input_field.value}\n"

    system_value = template_input_fields[0].value.strip()
    prompt_value = template_input_fields[1].value.strip()
    first_value = template_input_fields[2].value.strip()

    if system_value or prompt_value or first_value:
        modelfile_content += 'TEMPLATE """\n'
        if first_value:
            modelfile_content += "{{- if .First }}\n"
            modelfile_content += "### System:\n"
            # Quadruple braces in an f-string emit the literal {{ }} that Go templates expect
            modelfile_content += f"{{{{ {system_value} }}}}\n"
            modelfile_content += "{{- end }}\n"
        modelfile_content += '"""'

    filename = modelfile_name_input.value
    with open(filename, "w") as file:
        file.write(modelfile_content)

    if btn:  # Only display the link if the function was called by a button click
        display(FileLink(filename))

# Function to generate, save, and deploy Modelfile
def generate_and_deploy(btn):
    generate_modelfile()  # Generate and save the Modelfile

    data = {
        "name": data_name_input.value,
        "path": f"/content/{modelfile_name_input.value}"
    }

    response = requests.post("http://localhost:11434/api/create", json=data, headers={"Content-Type": "application/json"})
    print(response.text)  # Display the response in the notebook output

# Button to generate Modelfile
generate_button = widgets.Button(description="Generate Modelfile")
generate_button.on_click(generate_modelfile)

# Button to generate and deploy Modelfile
deploy_button = widgets.Button(description="Generate and Deploy")
deploy_button.on_click(generate_and_deploy)

# Update descriptions for Template Variables Input Fields
template_input_fields = [
    widgets.Text(value='', placeholder='', description='System:'),
    widgets.Text(value='', placeholder='', description='Prompt:'),
    widgets.Text(value='', placeholder='', description='First:')
]

template_input_fields[0].placeholder = "The system prompt used to specify custom behavior. This must also be set in the Modelfile as an instruction."
template_input_fields[1].placeholder = "The incoming prompt. This is not specified in the model file and will be set based on input."
template_input_fields[2].placeholder = "A boolean value used to render specific template information for the first generation of a session."

template_label = widgets.Label(value="Template Variables")
template_container = widgets.VBox(template_input_fields)

# Display widgets
display(modelfile_name_input)
display(data_name_input)  # Display the new input field
display(model_dropdown)
display(model_type_dropdown)

for checkbox, input_field in zip(checkboxes, input_fields):
    display(widgets.HBox([checkbox, input_field]))

# Display Template Variables Input Fields
display(template_label)
display(template_container)

display(widgets.HBox([generate_button, deploy_button]))



# Useful commands

### Hard kill everything


Kill all instances of the background processes.

In [None]:
# Stop the monitor first so its threads do not restart the processes
monitor.stop()
!pkill -f "python3 /content/nat-tunnel/natsrv.py"
!pkill -f "ollama serve"

### Example local model create and generate prompt for quick switching of models

In [None]:
%%shell
curl -X POST http://127.0.0.1:11434/api/create -d '{
  "name": "modelname",
  "path": "/content/Modelfile"
}'
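
The same request can also be prepared from Python. The sketch below builds the POST request without sending it, assuming Ollama is reachable at `http://127.0.0.1:11434` as in the curl command. `build_create_request` is a hypothetical helper.

```python
import json
import urllib.request

def build_create_request(name: str, modelfile_path: str,
                         base_url: str = "http://127.0.0.1:11434") -> urllib.request.Request:
    """Prepare (but do not send) the POST request that the curl command issues."""
    payload = json.dumps({"name": name, "path": modelfile_path}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/create",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_create_request("modelname", "/content/Modelfile")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```

Separating construction from sending makes it possible to inspect the payload before contacting the server.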

In [None]:
%%shell
curl -X POST http://127.0.0.1:11434/api/generate -d '{
  "model": "modelname",
  "prompt": "Why is the sky blue?"
}'
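
By default the generate endpoint streams its answer as newline-delimited JSON objects, each carrying a `response` fragment, with `done` set to true on the last one. A minimal sketch of joining those fragments follows. `collect_response` is a hypothetical helper and the sample lines are hand-written, not captured from a real run.

```python
import json

def collect_response(lines) -> str:
    """Join the "response" fragments of a newline-delimited JSON stream."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between objects
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Hand-written sample lines in the same shape; not captured from a real run.
sample_stream = [
    '{"model": "modelname", "response": "The sky ", "done": false}',
    '{"model": "modelname", "response": "is blue.", "done": false}',
    '{"model": "modelname", "response": "", "done": true}',
]
print(collect_response(sample_stream))
```

Any iterable of text lines works as input, for example the lines of a saved curl output.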

### Experience and behavior

Through testing, I've noticed that loading certain models onto the GPU can be difficult and occasionally leads to crashes. A practical workaround is to create a small dummy model first. Loading it quickly unloads any problematic model, after which you can retry the larger one. Note that if a model loads successfully after a crash, it runs on the CPU only. When that happens, load the small model and then retry the larger one.

A critical point to remember: avoid exceeding 13GB of VRAM usage. Surpassing this limit tends to overheat the system, leading to crashes.

These issues often stem from insufficient RAM or storage to preload the model before it is transferred to the GPU.
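
The 13GB ceiling mentioned above could be checked programmatically. The sketch below parses the CSV that `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits` prints and compares it with the limit. The helper names are hypothetical, the readings are made up, and 13GB is treated as 13 GiB.

```python
import subprocess

VRAM_LIMIT_MIB = 13 * 1024  # the 13GB ceiling, treated as GiB and expressed in MiB

def parse_memory_used(csv_text: str) -> list:
    """Parse `memory.used` values (MiB), one per GPU, from nvidia-smi CSV output."""
    return [int(line.split(",")[0]) for line in csv_text.splitlines() if line.strip()]

def over_limit(csv_text: str, limit_mib: int = VRAM_LIMIT_MIB) -> bool:
    """Return True if any GPU reports more used memory than the limit."""
    return any(used > limit_mib for used in parse_memory_used(csv_text))

def query_gpu_memory() -> str:
    """Run nvidia-smi; requires an NVIDIA driver, so it is not called in this example."""
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout

# Made-up readings in the format nvidia-smi prints with the flags above.
print(over_limit("9120, 15360\n"))
print(over_limit("14200, 15360\n"))
```

On a GPU runtime, `over_limit(query_gpu_memory())` would give a quick warning before a larger model is loaded.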

For enhanced performance at no extra cost, consider using Kaggle, which offers up to 24GB VRAM and additional RAM. For different setups and more information, check out the Kaggle version on my GitHub.



# TODO

* Add dynamic viewing of logging
* More functions for ollama API
