# Convert Fine-tuned PLLuM Model to Ollama (Google Colab Version)

This notebook guides you through the process of converting your fine-tuned PLLuM model (with LoRA adapters) to a format compatible with Ollama for local deployment. It's specifically adapted for running on Google Colab.

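The process consists of the following main steps:
1. Merge LoRA adapters with the base model
2. Convert the merged model to GGUF format (and optionally quantize it)
3. Create an Ollama Modelfile
4. Generate helper scripts (installation, testing, function calling server)
5. Package and download the files for local Ollama deployment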


## Check Colab Environment

First, let's make sure we're running on Colab and verify the GPU.

In [None]:
# Check if we're running on Colab
import sys
IN_COLAB = 'google.colab' in sys.modules
print(f"Running on Google Colab: {IN_COLAB}")

if not IN_COLAB:
    print("Warning: This notebook is designed specifically for Google Colab. Some features may not work elsewhere.")

In [None]:
# Check GPU availability and info
!nvidia-smi

## Mount Google Drive

We'll use Google Drive to store the model files and retrieve your fine-tuned model.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

## Setup

Let's set up our environment and install the required dependencies.

In [None]:
# Install required packages
!pip install -q transformers peft torch huggingface_hub bitsandbytes safetensors tensorboard accelerate sentencepiece

In [None]:
# Additional dependencies for GGUF conversion
!pip install -q llama-cpp-python

In [None]:
import os
import sys
import torch
import logging
from pathlib import Path
import shutil
import tempfile

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Check CUDA availability
logger.info(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    logger.info(f"CUDA device: {torch.cuda.get_device_name(0)}")
    logger.info(f"CUDA version: {torch.version.cuda}")

## Configuration

Set the paths and configuration for the conversion process.

In [None]:
# Create a working directory in Colab
WORK_DIR = "/content/pllum-to-ollama"
os.makedirs(WORK_DIR, exist_ok=True)

# Path to your fine-tuned model with LoRA adapters
# Replace this with the actual path to your fine-tuned model in Google Drive
DRIVE_LORA_MODEL_PATH = "/content/drive/MyDrive/models/pllum-function-calling-20250330_071532"

# Local copy of the model in the Colab environment
LORA_MODEL_PATH = f"{WORK_DIR}/pllum-function-calling-lora"

# Path for the merged model
MERGED_MODEL_PATH = f"{WORK_DIR}/pllum-function-calling-merged"

# Path for the GGUF model
GGUF_MODEL_PATH = f"{WORK_DIR}/pllum-function-calling.gguf"

# Quantization type for the GGUF model (set to None to skip quantization)
QUANT_TYPE = "Q4_K_M"  # Options: Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q4_K_M, etc.

# Ollama model name
OLLAMA_MODEL_NAME = "pllum-fc"

# Path for Ollama Modelfile and related files
OLLAMA_DIR = f"{WORK_DIR}/ollama"

# Create directories if they don't exist
os.makedirs(LORA_MODEL_PATH, exist_ok=True)
os.makedirs(MERGED_MODEL_PATH, exist_ok=True)
os.makedirs(os.path.dirname(GGUF_MODEL_PATH), exist_ok=True)
os.makedirs(OLLAMA_DIR, exist_ok=True)
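
The merged model, the F16 GGUF file, and the quantized GGUF file all end up under `WORK_DIR`, so free disk space can become the limiting factor. The following is a minimal sketch of a pre-flight check; the helper names `free_gib` and `has_room` are hypothetical, and the amount of space you need depends on your model size.

```python
import shutil

def free_gib(path):
    """Return the free disk space at `path` in GiB."""
    return shutil.disk_usage(path).free / (1024 ** 3)

def has_room(path, required_gib):
    """Return True if at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib
```

For example, `has_room("/content", 40)` checks for 40 GiB of free space; the threshold of 40 is only an illustration, not a measured requirement.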

## Copy Fine-tuned Model

First, let's copy the fine-tuned model from Google Drive to the Colab environment.

In [None]:
# Function to copy a directory with progress indication
def copy_directory_with_progress(src_dir, dst_dir):
    """Copy a directory with progress indication."""
    # Get list of files
    files = []
    for root, _, filenames in os.walk(src_dir):
        for filename in filenames:
            files.append(os.path.join(root, filename))

    total_files = len(files)
    logger.info(f"Copying {total_files} files from {src_dir} to {dst_dir}")

    # Copy files with progress
    for i, src_file in enumerate(files, 1):
        rel_path = os.path.relpath(src_file, src_dir)
        dst_file = os.path.join(dst_dir, rel_path)

        # Create directory if it doesn't exist
        os.makedirs(os.path.dirname(dst_file), exist_ok=True)

        # Copy the file
        shutil.copy2(src_file, dst_file)

        # Log progress every 10% or for every 10 files
        if i % max(1, total_files // 10) == 0 or i % 10 == 0:
            logger.info(f"Copied {i}/{total_files} files ({i/total_files*100:.1f}%)")

    logger.info(f"Finished copying {total_files} files")

In [None]:
# Copy the fine-tuned model from Google Drive
if os.path.exists(DRIVE_LORA_MODEL_PATH):
    logger.info(f"Copying fine-tuned model from {DRIVE_LORA_MODEL_PATH} to {LORA_MODEL_PATH}")
    copy_directory_with_progress(DRIVE_LORA_MODEL_PATH, LORA_MODEL_PATH)
else:
    logger.error(f"Fine-tuned model not found at {DRIVE_LORA_MODEL_PATH}")
    logger.info("Please make sure the path is correct and the model is in your Google Drive.")
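
Before merging, it can help to confirm that the copied directory looks like a PEFT adapter. The sketch below assumes the standard PEFT file names (`adapter_config.json` plus `adapter_model.safetensors` or `adapter_model.bin`); the helper `check_adapter_dir` is hypothetical and not part of any library.

```python
import os

# Assumption: standard PEFT file names; adjust if your training run saved others.
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")

def check_adapter_dir(path):
    """Return a list of problems found in a LoRA adapter directory (empty if none)."""
    if not os.path.isdir(path):
        return [f"{path} is not a directory"]
    names = set(os.listdir(path))
    problems = []
    if ADAPTER_CONFIG not in names:
        problems.append(f"missing {ADAPTER_CONFIG}")
    if not any(weights in names for weights in ADAPTER_WEIGHTS):
        problems.append("missing adapter weights (" + " or ".join(ADAPTER_WEIGHTS) + ")")
    return problems
```

Calling `check_adapter_dir(LORA_MODEL_PATH)` should return an empty list when the copy succeeded and the directory holds a complete adapter.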

## Step 1: Merge LoRA Adapters with Base Model

In this step, we'll merge your fine-tuned LoRA adapters with the base PLLuM model to create a complete model.

In [None]:
# Define merge function
def merge_lora_model(input_dir, output_dir):
    """Merge LoRA adapters with base model."""
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    import torch

    logger.info(f"Loading fine-tuned model from {input_dir}")

    # Load the fine-tuned model with LoRA adapters
    model = AutoPeftModelForCausalLM.from_pretrained(
        input_dir,
        device_map="auto",
        torch_dtype=torch.float16
    )

    logger.info("Model loaded. Merging adapters with base model...")

    # Merge adapters with the base model
    merged_model = model.merge_and_unload()

    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    logger.info(f"Saving merged model to {output_dir}")

    # Save the merged model
    merged_model.save_pretrained(
        output_dir,
        safe_serialization=True,
        max_shard_size="2GB"  # Split the weights into shards of at most 2 GB
    )

    # Save the tokenizer
    logger.info("Saving tokenizer")
    tokenizer = AutoTokenizer.from_pretrained(input_dir)
    tokenizer.save_pretrained(output_dir)

    logger.info("Model and tokenizer saved successfully!")
    return output_dir

In [None]:
# Execute merge
try:
    merged_path = merge_lora_model(LORA_MODEL_PATH, MERGED_MODEL_PATH)
    logger.info(f"Merged model saved to {merged_path}")
except Exception as e:
    logger.error(f"Error merging model: {str(e)}")
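
For intuition about what merging does, standard LoRA adds a low-rank update to each adapted weight matrix: `W_merged = W + (lora_alpha / r) * (B @ A)`. The toy example below is a pure-Python sketch of that formula for a single layer; it is not how PEFT implements `merge_and_unload()`, and the function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A) for one layer."""
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x2 layer with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]       # shape (r, in_features) = (1, 2)
B = [[0.5], [0.25]]    # shape (out_features, r) = (2, 1)
merged = merge_lora(W, A, B, lora_alpha=2, r=1)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be converted to GGUF as a regular checkpoint.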

## Step 2: Convert Merged Model to GGUF Format

Now we'll convert the merged model to GGUF format, which is required by Ollama. This step requires llama.cpp.

In [None]:
# Clone llama.cpp repository
llama_cpp_dir = "./llama.cpp"
if not os.path.exists(llama_cpp_dir):
    !git clone https://github.com/ggerganov/llama.cpp {llama_cpp_dir}
else:
    !cd {llama_cpp_dir} && git pull

In [None]:
# Install build dependencies
!apt-get update && apt-get install -y build-essential cmake

In [None]:
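# Build llama.cpp
# Note: `!` shell commands do not raise Python exceptions when they fail,
# so verify the build by checking that the quantize binary exists.
!cd {llama_cpp_dir} && mkdir -p build && cd build && cmake .. && make -j

quantize_bin = os.path.join(llama_cpp_dir, "build", "bin", "llama-quantize")
if os.path.exists(quantize_bin):
    logger.info("llama.cpp build completed successfully")
else:
    logger.error(f"llama.cpp build failed: {quantize_bin} not found")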

In [None]:
print(GGUF_MODEL_PATH)

In [None]:
# Convert to GGUF
# Note: `!` shell commands do not raise Python exceptions when they fail,
# so check that each output file exists before reporting success.

# First convert to F16 GGUF
!cd {llama_cpp_dir} && python3 convert_hf_to_gguf.py {MERGED_MODEL_PATH} --outfile {GGUF_MODEL_PATH} --outtype f16

if os.path.exists(GGUF_MODEL_PATH):
    logger.info(f"Model converted to GGUF: {GGUF_MODEL_PATH}")

    # Then quantize if specified
    if QUANT_TYPE:
        quant_output = GGUF_MODEL_PATH.replace(".gguf", f"-{QUANT_TYPE.lower()}.gguf")
        !cd {llama_cpp_dir} && ./build/bin/llama-quantize {GGUF_MODEL_PATH} {quant_output} {QUANT_TYPE.lower()}
        if os.path.exists(quant_output):
            logger.info(f"Model quantized to {QUANT_TYPE}: {quant_output}")
            GGUF_MODEL_PATH = quant_output
        else:
            logger.error(f"Quantization failed: {quant_output} not found")
else:
    logger.error(f"Error converting model to GGUF: {GGUF_MODEL_PATH} not found")
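
A quick sanity check on the converted file is to read its header: GGUF files begin with the four magic bytes `GGUF`, followed by a little-endian 32-bit version number. The helper below is a minimal sketch (the name `read_gguf_header` is hypothetical); it only inspects the first eight bytes and does not validate the rest of the file.

```python
import struct

def read_gguf_header(path):
    """Return the GGUF version if the file starts with a GGUF header, else None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_header(GGUF_MODEL_PATH)` should return an integer version for a valid file and `None` for a truncated or mis-named one.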

## Step 3: Create Ollama Modelfile

Now we'll create an Ollama Modelfile, which defines how the model should be used.

In [None]:
def create_modelfile(model_filename, output_dir, model_name):
    """Create an Ollama Modelfile."""
    # Create output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Create Modelfile path
    modelfile_path = os.path.join(output_dir, "Modelfile")

    system_prompt = "Jesteś modelem językowym PLLuM, wyspecjalizowanym w przetwarzaniu języka polskiego oraz innych języków słowiańskich i bałtyckich. Twoje umiejętności obejmują generowanie spójnych tekstów, odpowiadanie na pytania, podsumowywanie treści oraz wspieranie aplikacji specjalistycznych, takich jak inteligentni asystenci. Zostałeś wytrenowany na wysokiej jakości korpusach tekstowych i dostosowany do precyzyjnego dopasowania odpowiedzi, uwzględniając specyfikę polskiego języka i kultury. Jeśli nie posiadasz pełnych informacji lub pytanie jest niejasne, zawsze poproś użytkownika o doprecyzowanie."

    template = '''{{- if .Messages }}
{{- if or .System .Tools }}<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}
# Narzędzia
Możesz wywołać jedną lub więcej funkcji, aby pomóc z zapytaniem użytkownika.
Dostępne narzędzia:
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}

Gdy chcesz wywołać funkcję, odpowiedz używając formatu JSON:
[{"name": "nazwa_funkcji", "arguments": {"parametr1": "wartość1", "parametr2": "wartość2"}}]
{{- end }}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ .Content }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
Wynik funkcji:
{{ .Content }}<|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}'''

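    # Create Modelfile content
    # The template spans multiple lines and contains double quotes, so it must be
    # wrapped in triple double quotes; a single-quoted value would end early.
    modelfile_content = f'''FROM ./{model_filename}
PARAMETER temperature 0.1
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"

# System message
SYSTEM "{system_prompt}"

# Template for chat format
TEMPLATE """{template}"""
'''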

    # Write Modelfile
    with open(modelfile_path, "w", encoding="utf-8") as f:
        f.write(modelfile_content)

    logger.info(f"Modelfile created at {modelfile_path}")
    return modelfile_path

In [None]:
# Create Modelfile
try:
    # List the GGUF files in the working directory for reference
    print("Available GGUF files:")
    !ls -la {WORK_DIR}/*.gguf

    # Prefer the file produced in Step 2: GGUF_MODEL_PATH points to the quantized
    # file when quantization ran. WORK_DIR may also hold the larger F16 file, and
    # os.listdir() returns entries in arbitrary order.
    if os.path.exists(GGUF_MODEL_PATH):
        gguf_filepath = GGUF_MODEL_PATH
    else:
        gguf_files = sorted(f for f in os.listdir(WORK_DIR) if f.endswith('.gguf'))
        gguf_filepath = os.path.join(WORK_DIR, gguf_files[0]) if gguf_files else None

    if gguf_filepath:
        gguf_filename = os.path.basename(gguf_filepath)
        print(f"Using GGUF file: {gguf_filepath}")

        # Copy the GGUF model to the Ollama directory
        ollama_model_path = os.path.join(OLLAMA_DIR, gguf_filename)
        shutil.copy(gguf_filepath, ollama_model_path)
        logger.info(f"Copied GGUF model to {ollama_model_path}")

        # Create Modelfile with the correct filename
        modelfile_path = create_modelfile(gguf_filename, OLLAMA_DIR, OLLAMA_MODEL_NAME)
    else:
        logger.error(f"No GGUF files found in {WORK_DIR}")
except Exception as e:
    logger.error(f"Error creating Modelfile: {str(e)}")
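
To see roughly what prompt the template above produces, the sketch below renders a message list into the same `<|im_start|>` / `<|im_end|>` layout in plain Python. It is an approximation for illustration only: it omits the tools section, assumes the last message is not from the assistant, and may differ from Ollama's Go template in whitespace. The function name `render_chatml` is hypothetical.

```python
def render_chatml(messages, system=None):
    """Approximate the chat layout of the Modelfile template (without tools)."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    for message in messages:
        role = message["role"]
        content = message["content"]
        if role == "tool":
            # The template sends tool results back as a user turn
            role = "user"
            content = "Wynik funkcji:\n" + content
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    # Open the assistant turn so the model generates the reply
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)
```

Because generation stops at `<|im_end|>` (the `stop` parameter in the Modelfile), the model's reply ends where the assistant turn would be closed.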

## Step 4: Create Installation Script

Let's create a script to install the model in Ollama.

In [None]:
# Create install script
install_script_path = os.path.join(OLLAMA_DIR, "install_model.sh")
install_script_content = f"""#!/bin/bash
set -e

# Get the directory of this script
SCRIPT_DIR="$( cd "$( dirname "${{BASH_SOURCE[0]}}" )" && pwd )"

# Check if Ollama is installed
if ! command -v ollama &> /dev/null; then
    echo "Ollama is not installed. Please install it first:"
    echo "curl -fsSL https://ollama.com/install.sh | sh"
    exit 1
fi

# Create the model in Ollama
echo "Creating model {OLLAMA_MODEL_NAME} in Ollama..."
cd "$SCRIPT_DIR"
ollama create {OLLAMA_MODEL_NAME} -f ./Modelfile

echo "Model {OLLAMA_MODEL_NAME} has been created in Ollama!"
echo "You can now run it with: ollama run {OLLAMA_MODEL_NAME}"
"""

# Write install script and make it executable
with open(install_script_path, "w", encoding="utf-8") as f:
    f.write(install_script_content)

os.chmod(install_script_path, 0o755)
logger.info(f"Install script created at {install_script_path}")

## Step 5: Create Test Script

Let's create a Python script to test the model once it's installed in Ollama.

In [None]:
# Create test script with proper string escaping
test_script_path = os.path.join(OLLAMA_DIR, "test_model.py")
test_script_content = '''#!/usr/bin/env python3
# A script to test the Ollama model with function calling

import requests
import json
import argparse

def query_ollama(model_name, prompt, temperature=0.1, tools=None):
    # Query the Ollama model
    # Format the prompt with tools if provided
    formatted_prompt = prompt
    if tools:
        formatted_prompt = f"""
Poniżej znajduje się zapytanie i lista dostępnych narzędzi.
Proszę wywołać odpowiednie narzędzie, aby odpowiedzieć na zapytanie użytkownika.

Zapytanie: {prompt}

Dostępne narzędzia:
{json.dumps(tools, indent=2, ensure_ascii=False)}
"""

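    # Call Ollama API
    # "stream": False returns a single JSON object instead of a stream of chunks;
    # sampling parameters must be passed inside "options".
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model_name,
            "prompt": formatted_prompt,
            "stream": False,
            "options": {
                "temperature": temperature,
                "top_p": 0.9
            }
        }
    )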

    if response.status_code != 200:
        print(f"Error: {response.status_code}")
        print(response.text)
        return None

    # Parse the response
    result = response.json()
    response_text = result.get("response", "")

    # Try to parse as JSON if it looks like JSON
    if response_text.strip().startswith("[") and response_text.strip().endswith("]"):
        try:
            return json.loads(response_text)
        except json.JSONDecodeError:
            pass

    return response_text

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Test Ollama model")
    parser.add_argument("--model", default="pllum-fc", help="Ollama model name")
    parser.add_argument("--prompt", required=True, help="Prompt to send to the model")
    parser.add_argument("--temperature", type=float, default=0.1, help="Temperature (0.0-1.0)")
    parser.add_argument("--function-call", action="store_true", help="Format as function calling request")

    args = parser.parse_args()

    if args.function_call:
        # Example weather tool
        tools = [
            {
                "name": "get_weather",
                "description": "Get the current weather for a location",
                "parameters": {
                    "location": {
                        "type": "string",
                        "description": "The city and state or country",
                        "required": True
                    },
                    "unit": {
                        "type": "string",
                        "description": "Unit of temperature: 'celsius' or 'fahrenheit'",
                        "required": False
                    }
                }
            }
        ]
        result = query_ollama(args.model, args.prompt, args.temperature, tools)
    else:
        result = query_ollama(args.model, args.prompt, args.temperature)

    if isinstance(result, (dict, list)):
        print(json.dumps(result, indent=2, ensure_ascii=False))
    else:
        print(result)
'''

# Write test script and make it executable
with open(test_script_path, "w", encoding="utf-8") as f:
    f.write(test_script_content)

os.chmod(test_script_path, 0o755)
logger.info(f"Test script created at {test_script_path}")
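
The test script only checks whether the response looks like a JSON list. A stricter check could validate the structure that the Modelfile template asks the model to produce, namely a list of objects with `name` and `arguments`. The following is a minimal sketch; `parse_function_calls` is a hypothetical helper, not part of the generated scripts.

```python
import json

def parse_function_calls(text):
    """Return a list of {"name", "arguments"} dicts, or None if text is not a valid call list."""
    try:
        data = json.loads(text.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(data, list):
        return None
    calls = []
    for item in data:
        if not isinstance(item, dict) or not isinstance(item.get("name"), str):
            return None
        arguments = item.get("arguments", {})
        if not isinstance(arguments, dict):
            return None
        calls.append({"name": item["name"], "arguments": arguments})
    return calls
```

Returning `None` for anything malformed lets the caller fall back to treating the output as plain text.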

## Step 6: Create Function Calling Server

Let's create a Flask server that handles function calling with Ollama.

In [None]:
# Create function calling server script with proper escaping
server_script_path = os.path.join(OLLAMA_DIR, "function_calling_server.py")
server_script_content = '''#!/usr/bin/env python3
# A Flask server for handling function calling with Ollama

import os
import json
import logging
import requests
from flask import Flask, request, jsonify

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Initialize Flask app
app = Flask(__name__)

# Configuration
OLLAMA_API_HOST = os.environ.get("OLLAMA_API_HOST", "http://localhost:11434")
DEFAULT_MODEL = os.environ.get("OLLAMA_MODEL", "pllum-fc")

@app.route('/function_call', methods=['POST'])
def function_call():
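    # Handle function calling requests
    data = request.get_json(silent=True) or {}
    query = data.get('query')
    tools = data.get('tools', [])
    model = data.get('model', DEFAULT_MODEL)
    temperature = data.get('temperature', 0.1)

    # Reject requests without a query instead of failing later with a server error
    if not query:
        return jsonify({"error": "Missing 'query' in request body"}), 400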

    # Determine if query is Polish or English (very simple detection)
    polish_chars = set('ąćęłńóśźż')
    is_polish = any(char.lower() in polish_chars for char in query)

    # Format the prompt
    if is_polish:
        prefix = "Poniżej znajduje się zapytanie i lista dostępnych narzędzi.\\nProszę wywołać odpowiednie narzędzie, aby odpowiedzieć na zapytanie użytkownika.\\n\\nZapytanie: "
    else:
        prefix = "Below is a query and a list of available tools.\\nPlease call the appropriate tool to respond to the user's query.\\n\\nQuery: "

    suffix = "\\n\\nAvailable tools:\\n"

    prompt = prefix + query + suffix + json.dumps(tools, indent=2, ensure_ascii=False)

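    # Call Ollama API
    # "stream": False returns a single JSON object instead of a stream of chunks;
    # sampling parameters must be passed inside "options".
    response = requests.post(
        f"{OLLAMA_API_HOST}/api/generate",
        json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": temperature}
        }
    )

    if response.status_code != 200:
        logger.error(f"Ollama returned status {response.status_code}")
        return jsonify({"error": response.text}), 502

    result = response.json()
    response_text = result.get("response", "")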

    # Try to parse the JSON response
    try:
        if response_text.strip().startswith("[") and response_text.strip().endswith("]"):
            function_call = json.loads(response_text)
            return jsonify(function_call)
        else:
            return jsonify({"raw_response": response_text})
    except json.JSONDecodeError:
        return jsonify({"raw_response": response_text})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
'''

# Write server script and make it executable
with open(server_script_path, "w", encoding="utf-8") as f:
    f.write(server_script_content)

os.chmod(server_script_path, 0o755)
logger.info(f"Function calling server script created at {server_script_path}")
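
Once the server is running locally, a client sends a JSON body with `query` and `tools` to the `/function_call` route. The sketch below builds such a request with the standard library only; the helper names and the default `base_url` are assumptions that match the server script's default port of 5000.

```python
import json
import urllib.request

def build_function_call_request(query, tools, base_url="http://localhost:5000"):
    """Build a POST request for the server's /function_call route."""
    payload = {"query": query, "tools": tools}
    return urllib.request.Request(
        f"{base_url}/function_call",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def call_server(query, tools):
    """Send the request and return the decoded JSON reply (requires a running server)."""
    with urllib.request.urlopen(build_function_call_request(query, tools)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes it possible to inspect the payload without a running server.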

## Step 7 (optional): Copy to Google Drive

If you want to keep the output files, copy them to your Google Drive.

In [None]:
# Copy the output files to Google Drive with the correct path
DRIVE_OUTPUT_DIR = "/content/drive/MyDrive/pllum-ollama-output"
!mkdir -p {DRIVE_OUTPUT_DIR}

# Copy all contents of the Ollama directory (scripts AND the GGUF file)
!cp -r {OLLAMA_DIR}/* {DRIVE_OUTPUT_DIR}/

print(f"Files saved to Google Drive at: {DRIVE_OUTPUT_DIR}")
print("You can now download them directly from your Google Drive.")

## Step 7 (alternative): Create a ZIP Archive for Download

Alternatively, create a ZIP archive of all the files needed for deployment. Step 8 downloads this archive.

In [None]:
# Create a ZIP archive for download
import zipfile

# Create a zip file
zip_path = f"{WORK_DIR}/pllum-ollama-model.zip"
with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
    # Add all files from the Ollama directory
    for root, _, files in os.walk(OLLAMA_DIR):
        for file in files:
            file_path = os.path.join(root, file)
            # Get path relative to OLLAMA_DIR
            rel_path = os.path.relpath(file_path, OLLAMA_DIR)
            zipf.write(file_path, rel_path)

logger.info(f"Created ZIP archive at {zip_path}")
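
Quantized model weights usually compress poorly, so deflating a multi-gigabyte GGUF file is likely to cost time without saving much space. One possible variation, sketched below under that assumption, stores `.gguf` files uncompressed and deflates only the small text files. The function name `zip_directory` is hypothetical, and `zip_path` should live outside `src_dir` so the archive does not include itself.

```python
import os
import zipfile

def zip_directory(src_dir, zip_path, stored_suffixes=(".gguf",)):
    """Zip src_dir, storing files with the given suffixes uncompressed."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
        for root, _, files in os.walk(src_dir):
            for name in sorted(files):
                file_path = os.path.join(root, name)
                # Store paths relative to src_dir inside the archive
                rel_path = os.path.relpath(file_path, src_dir)
                if name.endswith(stored_suffixes):
                    method = zipfile.ZIP_STORED
                else:
                    method = zipfile.ZIP_DEFLATED
                zipf.write(file_path, rel_path, compress_type=method)
    return zip_path
```

With the notebook's variables, this could be called as `zip_directory(OLLAMA_DIR, zip_path)`.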

## Step 8: Download the ZIP File

Now you can download the ZIP file to your local machine.

In [None]:
from google.colab import files
files.download(zip_path)

## Summary

We've now completed all the steps needed to convert your fine-tuned PLLuM model to an Ollama-compatible format. Here's a summary of what's been accomplished:

1. **Merged LoRA adapters** with the base PLLuM model
2. **Converted the model to GGUF format** for use with Ollama
3. **Created an Ollama Modelfile** with the appropriate configuration
4. **Prepared installation and test scripts**
5. **Added a function calling server** for API access
6. **Created a ZIP archive** for easy download

All the necessary files have been packaged into a ZIP file which you can download and use on your local machine.

## Next Steps

After downloading the ZIP file to your local machine, follow these steps:

1. **Extract the ZIP file** to a directory on your machine
2. **Install Ollama** if you haven't already (visit [ollama.com](https://ollama.com) for instructions)
3. **Run the installation script**:
   ```bash
   cd path/to/extracted/zip
   ./install_model.sh
   ```
4. **Test the model**:
   ```bash
   # For regular chat
   ollama run pllum-fc
   
   # For function calling
   ./test_model.py --prompt "Jaka jest pogoda w Warszawie?" --function-call
   ```
5. **Run the function calling server** (optional):
   ```bash
   pip install flask requests
   python function_calling_server.py
   ```
   Then you can make requests to `http://localhost:5000/function_call`

Enjoy your local deployment of PLLuM with function calling capabilities!