# LLM Fine Tuning

I was asked to scratch the surface of fine tuning LLMs from OpenAI and Anthropic. Unfortunately, Anthropic only offers fine tuning through [Amazon Bedrock](https://aws.amazon.com/bedrock/claude/), an external service run in collaboration with Anthropic. OpenAI, on the other hand, lets users fine tune its models directly, although only selected models are available and it costs more than regular API calls. The extra cost covers both training time and hosting the trained model. Moreover, not every task needs a fine tuned model. OpenAI's GPT-4o is powerful enough to handle the majority of tasks given to it, and effective prompt engineering is usually enough to get better performance.

### What is it for?
1. The model can give better responses for a specific task
2. The examples become training data, so they can be left out of the prompt, which saves tokens
3. Faster response times

### When to use it?
Only when prompt engineering and prompt chaining strategies do not produce the expected outcome from an LLM. [Function calling](https://platform.openai.com/docs/guides/function-calling) is another method OpenAI recommends. Fine tuning requires a careful investment of time and effort and is much more complicated, so it is faster to try the recommended methods first. Starting with them also pays off later, because the same strategies are still used with the fine tuned model.

### Common use cases:
1. Setting the style, tone, format, or other qualitative aspects
2. Improving reliability at producing a desired output
3. Correcting failures to follow complex prompts
4. Handling many edge cases in specific ways
5. Performing a new skill or task that’s hard to articulate in a prompt

### How to do it?
At a high level, these are the steps to fine tune a model:
1. Prepare and upload training data
2. Train a new fine tuned model using it
3. Evaluate the results and go back to step 1 if needed
4. Use the fine tuned model

### Models available for fine tuning
Fine-tuning is currently available for the following models:

1. gpt-4o-2024-08-06
2. gpt-4o-mini-2024-07-18
3. gpt-4-0613
4. gpt-3.5-turbo-0125
5. gpt-3.5-turbo-1106
6. gpt-3.5-turbo-0613

## Tutorial
### 1. Prepare Dataset
The data uses the same format as the request body of the [chat completions API](https://platform.openai.com/docs/api-reference/chat/create). Each example is one conversation, which can contain a single assistant message or several. An optional `weight` on an assistant message indicates whether that response is used for training (1) or ignored (0). Here is an example of what a dataset for training an LLM looks like:

```json
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Paris", "weight": 0}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Paris, as if everyone doesn't know that already.", "weight": 1}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "Who wrote 'Romeo and Juliet'?"}, {"role": "assistant", "content": "William Shakespeare", "weight": 0}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Oh, just some guy named William Shakespeare. Ever heard of him?", "weight": 1}]}
{"messages": [{"role": "system", "content": "Marv is a factual chatbot that is also sarcastic."}, {"role": "user", "content": "How far is the Moon from Earth?"}, {"role": "assistant", "content": "384,400 kilometers", "weight": 0}, {"role": "user", "content": "Can you be more sarcastic?"}, {"role": "assistant", "content": "Around 384,400 kilometers. Give or take a few, like that really matters.", "weight": 1}]}
```

For fine tuning to work, the dataset must contain at least 10 examples. However, 50 to 100 training examples are recommended to see clear improvements.
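
Before uploading, it can help to check the file locally. The following is a minimal sketch of such a check, not an official validator: the function name `validate_jsonl_lines` is hypothetical, and the set of allowed roles is an assumption that only covers the roles used in the examples above.

```python
import json

# Assumption: only the roles that appear in this document's examples.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_jsonl_lines(lines):
    """Return a list of (line_number, problem) tuples for a chat-format dataset."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"invalid JSON: {error.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing or empty 'messages' list"))
            continue
        for message in messages:
            if not isinstance(message, dict):
                problems.append((number, "message is not an object"))
                continue
            if message.get("role") not in ALLOWED_ROLES:
                problems.append((number, f"unexpected role: {message.get('role')!r}"))
            if "weight" in message and message["weight"] not in (0, 1):
                problems.append((number, "weight must be 0 or 1"))
        has_assistant = any(
            isinstance(m, dict) and m.get("role") == "assistant" for m in messages
        )
        if not has_assistant:
            problems.append((number, "no assistant message to learn from"))
    return problems
```

To check a real file, pass it the open file object, for example `validate_jsonl_lines(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a placeholder name. An empty list means no problems were found by these particular checks.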

### 2. Train and Test
It is common practice when training AI models to split the dataset into a train set and a test set, usually 80/20. Shuffle the dataset before splitting so that both sets are balanced and unbiased. The test set is then used to evaluate the fine tuned LLM.
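
The shuffle-then-split step can be sketched as follows. The function name `split_dataset`, the default seed, and the 20% default are illustrative choices, not something prescribed by OpenAI.

```python
import random


def split_dataset(examples, test_fraction=0.2, seed=42):
    """Shuffle the examples and split them into (train, test) lists."""
    shuffled = list(examples)  # copy so the caller's data is left untouched
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]
```

Each example would typically be one parsed line of the `.jsonl` file. Fixing the seed keeps the same examples in the test set every time, which makes evaluation results comparable across fine tuning runs.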

### 3. Create a fine tuning job
The dataset file should be in `.jsonl` format and is uploaded for training through the [Files API](https://platform.openai.com/docs/api-reference/files/create). Here is an example of creating a fine tuning job with DPO (Direct Preference Optimization). Note that DPO expects a preference dataset (preferred and non-preferred outputs) rather than the chat format shown in step 1:
```python
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.create(
    training_file="file-all-about-the-weather",
    model="gpt-4o-2024-08-06",
    method={
        "type": "dpo",
        "dpo": {
            "hyperparameters": {"beta": 0.1},
        },
    },
)
```
or
```python
from openai import OpenAI
client = OpenAI()

client.fine_tuning.jobs.create(
    training_file="file-abc123",
    model="gpt-4o-mini-2024-07-18"
)
```

Alternatively, you can create the job via the [fine-tuning UI](https://platform.openai.com/finetune).

When a method is not specified, the default method is Supervised Fine-Tuning (SFT). For more information, please refer to the [API documentation](https://platform.openai.com/docs/api-reference/fine-tuning/create).
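
The `training_file` values in the job examples are file IDs returned by the Files API, so the dataset has to be uploaded first. Below is a minimal sketch of that upload. The helper name `upload_training_file` is hypothetical, and `client` is assumed to be an `OpenAI()` instance from the `openai` library (it is passed in rather than created here so the helper stays easy to test).

```python
from pathlib import Path


def upload_training_file(client, path):
    """Upload a .jsonl dataset and return the file ID a fine tuning job needs."""
    path = Path(path)
    if path.suffix != ".jsonl":
        raise ValueError(f"expected a .jsonl file, got {path.name}")
    with path.open("rb") as handle:
        # purpose="fine-tune" marks the file as fine tuning data.
        uploaded = client.files.create(file=handle, purpose="fine-tune")
    return uploaded.id
```

The returned ID is what goes into `training_file` when calling `client.fine_tuning.jobs.create(...)`.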

After starting the fine tuning job, you can list existing jobs, retrieve the status of a job, or cancel a job. The examples provided by OpenAI use their Python library:
```python
from openai import OpenAI
client = OpenAI()

# List 10 fine-tuning jobs
client.fine_tuning.jobs.list(limit=10)

# Retrieve the state of a fine-tune
client.fine_tuning.jobs.retrieve("ftjob-abc123")

# Cancel a job
client.fine_tuning.jobs.cancel("ftjob-abc123")

# List up to 10 events from a fine-tuning job
client.fine_tuning.jobs.list_events(fine_tuning_job_id="ftjob-abc123", limit=10)

# Delete a fine-tuned model
client.models.delete("ft:gpt-3.5-turbo:acemeco:suffix:abc123")
```
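
Since a job takes a while to finish, the `retrieve` call is often wrapped in a polling loop. The sketch below shows one way to do that. The helper name `wait_for_job` and the 30-second interval are arbitrary, and `client` is again assumed to be an `OpenAI()` instance.

```python
import time

# Job statuses after which the job will not change any more.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(client, job_id, poll_seconds=30, sleep=time.sleep):
    """Poll a fine tuning job until it reaches a terminal status, then return it."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        # Still validating, queued, or running: wait before asking again.
        sleep(poll_seconds)
```

When the returned job has the status `succeeded`, its `fine_tuned_model` attribute holds the name of the new model, which can be passed as `model` to `client.chat.completions.create(...)` like any other model name.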

During and after training, OpenAI provides the following metrics to monitor:
1. training loss
2. training token accuracy
3. validation loss
4. validation token accuracy

For a successful training run, the loss should decrease and the token accuracy should increase.

### 4. Evaluate
See OpenAI's [evals guide](https://platform.openai.com/docs/guides/evals) for how to evaluate the fine tuned model.
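
As a rough local check alongside that guide, the held-out test set from step 2 can be replayed against the fine tuned model. The sketch below is not part of OpenAI's evals tooling. It assumes each test example ends with the expected assistant reply, the function name `exact_match_accuracy` is hypothetical, and `client` is assumed to be an `OpenAI()` instance.

```python
def exact_match_accuracy(client, model, test_examples):
    """Fraction of test examples whose final assistant reply the model reproduces."""
    if not test_examples:
        return 0.0
    hits = 0
    for example in test_examples:
        # Everything before the last message is the prompt; the last is the target.
        *prompt, expected = example["messages"]
        # Keep only role and content: training-only keys such as "weight" are dropped.
        prompt = [{"role": m["role"], "content": m["content"]} for m in prompt]
        response = client.chat.completions.create(model=model, messages=prompt)
        answer = response.choices[0].message.content or ""
        hits += answer.strip() == expected["content"].strip()
    return hits / len(test_examples)
```

Exact matching is a deliberately crude metric. It suits tasks with a single correct output (labels, fixed formats), while style or tone tasks such as the sarcastic chatbot above would need a softer comparison.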

# Function Calling

Function calling is a type of tool that can be sent to the OpenAI API alongside the messages you usually send. The tool definition tells the LLM that a function we created exists within our system and can supply extra information the LLM needs. It acts as extra context: the LLM decides whether the function is needed to answer the question. If it is, the LLM asks us (developer/user) to execute the function and send the result back through the API.

For example, suppose I created a weather app with a method called `get_weather`. I first ask the LLM an unrelated question, such as "What is the capital city of Spain?", while including the function definition in the API request body. The request body would look roughly like the following:
```json
{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": "What is the capital city of Spain?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current temperature for a given location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City and country e.g. Bogotá, Colombia"
                        }
                    },
                    "required": ["location"],
                    "additionalProperties": false
                },
                "strict": true
            }
        }
    ]
}
```
This first message does not require the function, so the LLM "ignores" the tool and answers the question normally. But suppose the next question is "How's the weather there?", referring to the answer the LLM just gave. Provided that the earlier conversation is included in the `messages` property and the tool definition is still part of the call, the LLM will ask us to run the function we created by returning an assistant message like this (the `id` value is a placeholder):
```json
{ "role": "assistant", "content": null, "tool_calls": [ { "id": "call_abc123", "type": "function", "function": { "name": "get_weather", "arguments": "{ \"location\": \"Madrid\" }" } } ] }
```
The function can be run either automatically, through execution logic we set up as developers, or manually, depending on preference. The important part is that the result is included in the next call, together with the conversation history. Given all of this, the LLM can answer the question about the weather with the help of the function we created.
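
The automated variant can be sketched as a small dispatcher. This is a sketch under stated assumptions: the request is taken to arrive as a plain dictionary in the `tool_calls` shape of the Chat Completions API, `get_weather` is a stand-in that returns canned data instead of calling a real weather service, and the names `AVAILABLE_FUNCTIONS` and `run_tool_call` are hypothetical.

```python
import json


def get_weather(location):
    """Stand-in for the weather app's real method; returns canned data."""
    return {"location": location, "temperature_c": 21}


# Maps the function names advertised in "tools" to the local Python callables.
AVAILABLE_FUNCTIONS = {"get_weather": get_weather}


def run_tool_call(tool_call):
    """Execute one tool call requested by the model and build the reply message."""
    function = tool_call["function"]
    handler = AVAILABLE_FUNCTIONS[function["name"]]
    # The model sends the arguments as a JSON-encoded string.
    arguments = json.loads(function["arguments"])
    result = handler(**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # ties the result to the request
        "content": json.dumps(result),
    }
```

The returned message is appended to the conversation history, after the assistant message that requested the call, and the whole history is sent in the next request. When using the `openai` Python library, the tool call arrives as an object rather than a dictionary, so the fields would be read as attributes (for example `tool_call.function.name`).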

Source:
1. https://platform.openai.com/docs/guides/fine-tuning
2. https://aws.amazon.com/bedrock/claude/
3. https://www.anthropic.com/news/fine-tune-claude-3-haiku
4. https://platform.openai.com/docs/guides/function-calling?example=get-weather


# Ollama

The first option is to do everything via Podman: create a persistent volume -> create an image -> export the model.
The easier option is to use Ollama itself to export the model.

However, I will explain how to do it via Podman.
First, pull the ollama/ollama image from https://hub.docker.com/r/ollama/ollama.
Then execute the command below to create the persistent volume and download the model into it. Change the model name to the one you wish to download (in this case, it is deepseek-r1:1.5b).

```bash
podman run --rm -v ollama-models:/root/.ollama --entrypoint sh ollama/ollama -c "ollama serve & sleep 3 && ollama pull deepseek-r1:1.5b"
```
Then create a temporary container to access the model files.
```bash
podman create --name ollama-temp -v ollama-models:/root/.ollama ollama/ollama
```
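Then copy the downloaded model from the Podman volume to your local machine, into a folder named ollama-export in your current working directory. The trailing `/.` copies the contents of `.ollama` rather than the directory itself, so the files land directly in `ollama-export` whether or not the folder already exists.
```bash
podman cp ollama-temp:/root/.ollama/. ./ollama-export
```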
And finally, remove the temporary container.
```bash
podman rm ollama-temp
```

Then, in another folder, create a Containerfile (or Dockerfile) like this:
```Dockerfile
# Use Ollama as the base image
FROM ollama/ollama

# Copy the pre-downloaded model files into the image
COPY ollama-export /root/.ollama

# Ensure correct permissions
RUN chmod -R 777 /root/.ollama

# Copy the entrypoint script, make it executable, and strip Windows line endings
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh && sed -i 's/\r$//' /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]
```

Create the `entrypoint.sh` file referenced in the container file to start the server and run the model.
```sh
#!/bin/bash
# Start the Ollama server in the background
ollama serve &

# Wait for the server to initialize
sleep 3

# Run the model interactively
exec ollama run deepseek-r1:1.5b
```

Finally, copy or move the `ollama-export` folder into the same folder as the container file and `entrypoint.sh`, so that the `COPY` instructions can find them.