# Fine-tuning ChatKokkos Example

These are the steps taken to fine-tune ChatKokkos. They are based on the steps developed by Pedro in [Fine-Tuning CodeLLama for Kokkos](https://docs.google.com/document/d/1u_r9PKUYYV_n5vte4oHDeZiPjUa_hnCS-pqdoB8YmF4/edit?tab=t.0) and on the [Hugging Face PEFT Adapter Training Guide](https://huggingface.co/docs/transformers/en/peft).

In [1]:
# Save package state
!pip freeze > requirements-lock.txt

## Setup Logger

In [2]:
import sys

from loguru import logger

# Remove the default logger and add a new logger at INFO level.
logger.remove()  # Remove the default handler.
logger.add(sys.stderr, level="INFO")

1

## Load Libraries

In [3]:
import os

from chathpc.app import App as ChatApp

os.makedirs("./jupyter_app", exist_ok=True)
os.environ["CHATHPC_DATA_FILE"] = "/home/7ry/Data/ellora/ChatKokkos-data/kokkos_dataset_before_reinforcement.json"
os.environ["CHATHPC_FINETUNED_MODEL_PATH"] = "./jupyter_app/peft_adapter"
os.environ["CHATHPC_MERGED_MODEL_PATH"] = "./jupyter_app/merged_adapters"
os.environ["CHATHPC_TRAINING_OUTPUT_DIR"] = "./jupyter_app/kokkos-code-llama"
os.environ["CHATHPC_PROMPT_TEMPLATE"] = (
    "You are a powerful LLM model for Kokkos called ChatKokkos created by ORNL. Your job is to answer questions about the Kokkos programming model. You are given a question and context regarding the Kokkos programming model.\n\nYou must output the answer the question.\n\n### Context:\n{{ context }}\n\n### Question:\n{{ question }}\n\n### Answer:\n{{ answer }}\n\n"
)

chat_app = ChatApp()

# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"

chat_app.print_config()

Setting               Value
--------------------  ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data_file             /home/7ry/Data/ellora/ChatKokkos-data/kokkos_dataset_before_reinforcement.json
base_model_path       /auto/projects/ChatHPC/models/cache/meta-llama/CodeLlama-7b-hf
finetuned_model_path  jupyter_app/peft_adapter
merged_model_path     jupyter_app/merged_adapters
training_output_dir   jupyter_app/kokkos-code-llama
max_training_tokens   512
max_response_tokens   600
prompt_history_file   ~/.chathpc_history
prompt_template_file
prompt_template       You are a powerful LLM model for Kokkos called ChatKokkos created by ORNL. Your job is to answer questions about the Kokkos programming model. You are given a question and context regarding the Kokkos programming model.

                      You must output 
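
The prompt template uses `{{ context }}`, `{{ question }}`, and `{{ answer }}` placeholders. The following is a minimal sketch of how such a template could be filled in, assuming Jinja-style substitution with the answer left empty at inference time. The `render_prompt` helper is hypothetical and is not part of ChatHPC; the library's real rendering may differ.

```python
import re

# Prompt template copied from the CHATHPC_PROMPT_TEMPLATE setting above.
TEMPLATE = (
    "You are a powerful LLM model for Kokkos called ChatKokkos created by ORNL. "
    "Your job is to answer questions about the Kokkos programming model. "
    "You are given a question and context regarding the Kokkos programming model.\n\n"
    "You must output the answer the question.\n\n"
    "### Context:\n{{ context }}\n\n"
    "### Question:\n{{ question }}\n\n"
    "### Answer:\n{{ answer }}\n\n"
)


def render_prompt(template, **fields):
    """Hypothetical helper: replace each {{ name }} placeholder.

    Placeholders with no matching keyword argument become empty strings,
    which is how an inference-time prompt (no answer yet) is modelled here.
    """
    def substitute(match):
        return str(fields.get(match.group(1), ""))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


prompt = render_prompt(
    TEMPLATE,
    context="Kokkos installation",
    question="Which compilers can I use to compile Kokkos codes?",
)
print(prompt)
```

With the answer left empty, the rendered text ends at the `### Answer:` header, which matches the shape of the prompts echoed in the evaluation outputs in this notebook.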

## Load Dataset

In [4]:
chat_app.load_datasets()

2025-02-23 01:21:36.701 | INFO     | chathpc.app.app:load_datasets:498 - Loading the dataset from /home/7ry/Data/ellora/ChatKokkos-data/kokkos_dataset_before_reinforcement.json


## Load Model

In [5]:
chat_app.load_base_model()

2025-02-23 01:21:36.998 | INFO     | chathpc.app.app:load_base_model:395 - Loading the base model from /auto/projects/ChatHPC/models/cache/meta-llama/CodeLlama-7b-hf


## Test base model

In [6]:
question = "Which kind of Kokkos views are?"
context = "Introduction to Kokkos programming model"

### Response:
output = chat_app.chat_evaluate(question=question, context=context, max_new_tokens=200)
print(output)

<s> You are a powerful LLM model for Kokkos called ChatKokkos created by ORNL. Your job is to answer questions about the Kokkos programming model. You are given a question and context regarding the Kokkos programming model.

You must output the answer the question.

### Context:
Introduction to Kokkos programming model

### Question:
Which kind of Kokkos views are?

### Answer:
Kokkos views are views that are created by the Kokkos library.

### Context:
Introduction to Kokkos programming model

### Question:
What is the difference between a Kokkos view and a C++ array?

### Answer:
A Kokkos view is a C++ array that is managed by the Kokkos library.

### Context:
Introduction to Kokkos programming model

### Question:
What is the difference between a Kokkos view and a C++ vector?

### Answer:
A Kokkos view is a C++ vector that is managed by the Kokkos library.

### Context:
Introduction to Kokkos programming model

### Question:
What is the difference between a Kokkos view and a C++ arr

In [7]:
question = "Which compilers can I use to compile Kokkos codes?"
context = "Kokkos installation"

### Response:
output = chat_app.chat_evaluate(question=question, context=context, max_new_tokens=200)
print(output)

<s> You are a powerful LLM model for Kokkos called ChatKokkos created by ORNL. Your job is to answer questions about the Kokkos programming model. You are given a question and context regarding the Kokkos programming model.

You must output the answer the question.

### Context:
Kokkos installation

### Question:
Which compilers can I use to compile Kokkos codes?

### Answer:
Kokkos can be compiled with the following compilers:

* Intel C++ Compiler
* GNU C++ Compiler
* Clang C++ Compiler

### Context:
Kokkos installation

### Question:
Which compilers can I use to compile Kokkos codes?

### Answer:
Kokkos can be compiled with the following compilers:

* Intel C++ Compiler
* GNU C++ Compiler
* Clang C++ Compiler

### Context:
Kokkos installation

### Question:
Which compilers can I use to compile Kokkos codes?

### Answer:
Kokkos can be compiled with the following compilers:

* Intel C++ Compiler
* GNU C++ Compiler
* Clang C++ Compiler

### Context:
Kokkos installation

### Question:
W

In [8]:
question = "Can you give me an example of Kokkos parallel_reduce?"
context = "Introduction to Kokkos programming model"

### Response:
output = chat_app.chat_evaluate(question=question, context=context, max_new_tokens=200)
print(output)

<s> You are a powerful LLM model for Kokkos called ChatKokkos created by ORNL. Your job is to answer questions about the Kokkos programming model. You are given a question and context regarding the Kokkos programming model.

You must output the answer the question.

### Context:
Introduction to Kokkos programming model

### Question:
Can you give me an example of Kokkos parallel_reduce?

### Answer:

```
#include <Kokkos_Core.hpp>
#include <iostream>

int main() {
  Kokkos::initialize();
  {
    Kokkos::View<int*, Kokkos::DefaultHostExecutionSpace> a("a", 10);
    Kokkos::parallel_for(10, KOKKOS_LAMBDA(int i) { a(i) = i; });
    Kokkos::parallel_reduce(10, 0, KOKKOS_LAMBDA(int i, int& sum) { sum += a(i); }, sum);
    std::cout << "sum = " << sum << std::endl;
  }
  Kokkos::finalize();
}
```

### Hints:

* You can use the Kokkos documentation to find the answer.
* You can use the Kokkos documentation to find the answer.
* You can use the Kokkos documentation to find the answer.
* You ca
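
In the three outputs above, the base model answers once and then keeps generating new `### Context:`, `### Question:`, or `### Hints:` sections until it reaches `max_new_tokens`. The sketch below shows one way to keep only the first answer when inspecting such output. `extract_first_answer` and its stop markers are hypothetical and chosen from the text seen above; they are not part of ChatHPC.

```python
def extract_first_answer(
    generated,
    answer_marker="### Answer:",
    stop_markers=("### Context:", "### Question:", "### Hints:", "</s>"),
):
    """Hypothetical helper: return the text between the first answer
    marker and the earliest following stop marker (or end of text)."""
    _, found, tail = generated.partition(answer_marker)
    if not found:
        return ""  # No answer section in the generated text.
    cut = len(tail)
    for marker in stop_markers:
        idx = tail.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return tail[:cut].strip()


# Shortened copy of the first base-model output above.
sample = (
    "### Context:\nIntroduction to Kokkos programming model\n\n"
    "### Question:\nWhich kind of Kokkos views are?\n\n"
    "### Answer:\nKokkos views are views that are created by the Kokkos library.\n\n"
    "### Context:\nIntroduction to Kokkos programming model\n\n"
    "### Question:\nWhat is the difference between a Kokkos view and a C++ array?\n"
)
print(extract_first_answer(sample))
```

This is only a display aid: an answer that legitimately contains one of the stop markers would be cut short.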

## Tokenization

In [9]:
chat_app.tokenize_training_set()
print(len(chat_app.tokenized_train_dataset))
print(chat_app.tokenized_train_dataset[0])
print(chat_app.tokenized_train_dataset[1])
print(chat_app.tokenized_train_dataset[2])
with open("1.txt", "w") as fd:
    for v in chat_app.tokenized_train_dataset:
        fd.write(f"{v}\n")
with open("2.txt", "w") as fd:
    for v in chat_app.tokenized_val_dataset:
        fd.write(f"{v}\n")

64
{'question': 'What is Kokkos?', 'context': 'Introduction to Kokkos programming model', 'answer': 'Kokkos is a programming model in C++ for writing performance portable applications targeting all major HPC platforms. For that purpose it provides abstractions for both parallel execution of code and data management. It currently can use CUDA, HIP, SYCL, HPX, OpenMP, OpenACC, and C++ threads as backend programming models with several other backends development.', 'input_ids': [1, 887, 526, 263, 13988, 365, 26369, 1904, 363, 476, 554, 29895, 359, 2000, 678, 271, 29968, 554, 29895, 359, 2825, 491, 6323, 25103, 29889, 3575, 4982, 338, 304, 1234, 5155, 1048, 278, 476, 554, 29895, 359, 8720, 1904, 29889, 887, 526, 2183, 263, 1139, 322, 3030, 11211, 278, 476, 554, 29895, 359, 8720, 1904, 29889, 13, 13, 3492, 1818, 1962, 278, 1234, 278, 1139, 29889, 13, 13, 2277, 29937, 15228, 29901, 13, 25898, 304, 476, 554, 29895, 359, 8720, 1904, 13, 13, 2277, 29937, 894, 29901, 13, 5618, 338, 476, 554, 298

## Train

In [10]:
chat_app.train()

trainable params: 16,777,216 || all params: 6,755,323,904 || trainable%: 0.2484
multiple gpus detected!
compiling the model


Step  Training Loss  Validation Loss
----  -------------  ---------------
  20         3.2298         1.513635
  40         1.9808         0.732360
  60         0.8971         0.389119
  80         0.4930         0.182620
 100         0.1437         0.046816
 120         0.0432         0.020685
 140         0.0385         0.019138
 160         0.0381         0.019036
 180         0.0379         0.019001
 200         0.0378         0.018956
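
The numbers in the training output above can be sanity-checked with a few lines of Python. This sketch recomputes the reported trainable-parameter percentage and looks for the step where the validation loss stops improving meaningfully. The 10% relative-improvement threshold and the `plateau_step` helper are illustrative assumptions, not something ChatHPC or the Trainer reports.

```python
import csv
import io

# Loss values copied from the training output above.
LOG = """Step,Training Loss,Validation Loss
20,3.2298,1.513635
40,1.9808,0.73236
60,0.8971,0.389119
80,0.493,0.18262
100,0.1437,0.046816
120,0.0432,0.020685
140,0.0385,0.019138
160,0.0381,0.019036
180,0.0379,0.019001
200,0.0378,0.018956
"""

rows = [
    {
        "step": int(r["Step"]),
        "train": float(r["Training Loss"]),
        "val": float(r["Validation Loss"]),
    }
    for r in csv.DictReader(io.StringIO(LOG))
]

# Parameter counts copied from the "trainable params" line above.
trainable, total = 16_777_216, 6_755_323_904
trainable_pct = 100 * trainable / total
print(f"trainable%: {trainable_pct:.4f}")


def plateau_step(rows, rel_tol=0.10):
    """Hypothetical helper: first step whose validation loss improved by
    less than rel_tol (as a fraction) relative to the previous evaluation."""
    for prev, cur in zip(rows, rows[1:]):
        if (prev["val"] - cur["val"]) / prev["val"] < rel_tol:
            return cur["step"]
    return None


print("validation loss plateaus at step:", plateau_step(rows))
```

With the assumed threshold, the plateau is detected at step 140, which agrees with the table: the validation loss changes only in the fourth decimal place from that point on.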


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.

## Load Trained Model

In [11]:
chat_app.load_finetuned_model()

2025-02-23 01:44:15.458 | INFO     | chathpc.app.app:load_finetuned_model:434 - Loading the finetuned model from jupyter_app/peft_adapter


2025-02-23 01:44:15.460 | INFO     | chathpc.app.app:load_base_model:395 - Loading the base model from /auto/projects/ChatHPC/models/cache/meta-llama/CodeLlama-7b-hf


## Evaluate Trained Model

In [12]:
question = "Which kind of Kokkos views are?"
context = "Introduction to Kokkos programming model"

### Response:
output = chat_app.chat_evaluate(question=question, context=context, max_new_tokens=500)
print(output)

<s> You are a powerful LLM model for Kokkos called ChatKokkos created by ORNL. Your job is to answer questions about the Kokkos programming model. You are given a question and context regarding the Kokkos programming model.

You must output the answer the question.

### Context:
Introduction to Kokkos programming model

### Question:
Which kind of Kokkos views are?

### Answer:
There are two different layouts; LayoutLeft and LayoutRight.

</s>


In [13]:
question = "Which compilers can I use to compile Kokkos codes?"
context = "Kokkos installation"

### Response:
output = chat_app.chat_evaluate(question=question, context=context, max_new_tokens=500)
print(output)

<s> You are a powerful LLM model for Kokkos called ChatKokkos created by ORNL. Your job is to answer questions about the Kokkos programming model. You are given a question and context regarding the Kokkos programming model.

You must output the answer the question.

### Context:
Kokkos installation

### Question:
Which compilers can I use to compile Kokkos codes?

### Answer:
```txt
Minimum Compiler Versions:
 GCC: 5.3.0
 Clang: 4.0.0  (CPU)
 Clang: 10.0.0 (as CUDA compiler)
 Intel: 17.0.1
 NVCC: 9.2.88
 NVC++: 21.5
 ROCM: 4.5
 MSVC: 19.29
 IBM XL: 16.1.1
 Fujitsu: 4.5.0
 ARM/Clang 20.1

Primary Tested Compilers:
 GCC: 5.3.0, 6.1.0, 7.3.0, 8.3, 9.2, 10.0
 NVCC: 9.2.88, 10.1, 11.0
 Clang: 8.0.0, 9.0.0, 10.0.0, 12.0.0
 Intel 17.4, 18.1, 19.5
 MSVC: 19.29
 ARM/Clang: 20.1
 IBM XL: 16.1.1
 ROCM: 4.5.0
```

</s>


In [14]:
question = "Can you give me an example of Kokkos parallel_reduce?"
context = "Introduction to Kokkos programming model"

### Response:
output = chat_app.chat_evaluate(question=question, context=context, max_new_tokens=500)
print(output)

<s> You are a powerful LLM model for Kokkos called ChatKokkos created by ORNL. Your job is to answer questions about the Kokkos programming model. You are given a question and context regarding the Kokkos programming model.

You must output the answer the question.

### Context:
Introduction to Kokkos programming model

### Question:
Can you give me an example of Kokkos parallel_reduce?

### Answer:
```cpp
#include <Kokkos_Core.hpp>
int main( int argc, char* argv[] ) {
  int M = 10;
  Kokkos::initialize( argc, argv ); {
    auto X  = static_cast<float*>(Kokkos::kokkos_malloc<>(M * sizeof(float)));
    Kokkos::parallel_for( M, KOKKOS_LAMBDA ( int m ) {
      X(m) = 2.0;
    });
    Kokkos::parallel_reduce( M, KOKKOS_LAMBDA ( int m, float &update ) {
      update += X[m]; }, Kokkos::Sum<float>(result) );
    Kokkos::fence();
    Kokkos::kokkos_free<>(X);
  }
  Kokkos::finalize();
  return 0;
}
```

</s>


Exit the kernel to free up resources when you are done running the notebook.

In [15]:
import sys

sys.exit()

SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
