# Finetune Your Chatbot on a Single-Node Xeon SPR

NeuralChat is a customizable chat framework that lets users create their own chatbot within minutes on multiple architectures. This notebook shows how to finetune your chatbot on custom data on a single-node Intel Xeon SPR (Sapphire Rapids) system.

## Prepare Environment

Install Intel Extension for Transformers:

In [None]:
!pip install intel-extension-for-transformers

Install the requirements:

In [None]:
!git clone https://github.com/intel/intel-extension-for-transformers.git

In [None]:
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
!pip install -r requirements.txt
%cd ../../../

In [2]:
!git clone https://github.com/tloen/alpaca-lora.git

Cloning into 'alpaca-lora'...
remote: Enumerating objects: 607, done.
remote: Total 607 (delta 0), reused 0 (delta 0), pack-reused 607
Receiving objects: 100% (607/607), 27.84 MiB | 20.48 MiB/s, done.
Resolving deltas: 100% (358/358), done.


In [1]:
import json
import random

# File paths
input_file = './alpaca-lora/alpaca_data_cleaned_archive.json'
output_file = './alpaca-lora/alpaca_data_subset.json'

# Number of examples you want to keep in the subset
subset_size = 2000  # Adjust this number based on your requirement

# Read the original dataset
with open(input_file, 'r') as f:
    data = json.load(f)

# Shuffle the data and select a subset
random.shuffle(data)
data_subset = data[:subset_size]

# Save the subset to a new file
with open(output_file, 'w') as f:
    json.dump(data_subset, f, indent=4)

print(f"Subset of size {subset_size} saved to {output_file}")


Subset of size 2000 saved to ./alpaca-lora/alpaca_data_subset.json


## Prepare the Dataset
We select three kinds of datasets to finetune the model for different tasks.

1. Text Generation (General domain instruction): We use the [Alpaca dataset](https://github.com/tatsu-lab/stanford_alpaca) from Stanford University as the general domain dataset to fine-tune the model. This dataset is provided in the form of a JSON file, [alpaca_data.json](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). In Alpaca, researchers manually crafted 175 seed tasks to guide `text-davinci-003` in generating 52K instruction-following examples for diverse tasks.

2. Summarization: We use [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail), an English-language dataset containing just over 300k unique news articles written by journalists at CNN and the Daily Mail.

3. Code Generation: To improve the code generation performance of LLMs (Large Language Models), we use the [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) dataset.
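
For reference, each record in an Alpaca-format JSON file is an object with `instruction`, `input`, and `output` fields, where `input` may be an empty string. The following is a minimal sketch that checks a list of records against that layout before training. The helper name `check_alpaca_records` and the two sample records are made up for illustration and are not part of NeuralChat.

```python
import json

# Fields that every Alpaca-format record is expected to carry.
REQUIRED_KEYS = {"instruction", "input", "output"}

def check_alpaca_records(records):
    """Return the indices of records missing a required field (hypothetical helper)."""
    bad = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict) or not REQUIRED_KEYS.issubset(rec):
            bad.append(i)
    return bad

# Illustrative records only; the second one lacks the "input" field.
sample = json.loads("""[
  {"instruction": "Give three tips for staying healthy.", "input": "", "output": "Eat well, exercise, sleep."},
  {"instruction": "Translate to French.", "output": "Bonjour"}
]""")

print(check_alpaca_records(sample))
```

The same function could be run on the list loaded from a dataset file with `json.load` to catch malformed records early.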



## Finetune Your Chatbot

We employ the [LoRA approach](https://arxiv.org/pdf/2106.09685.pdf) to finetune the LLM efficiently.
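
To give a sense of why LoRA is efficient, the sketch below counts the parameters it adds. For each adapted weight matrix of shape `d_out x d_in`, LoRA trains a pair of low-rank factors with `rank * (d_in + d_out)` parameters in total and leaves the original weights frozen. The rank of 8 and the choice of adapting two 4096 x 4096 attention projections per layer are assumptions made for illustration, not settings confirmed by this notebook. Under these assumptions the count agrees with the `trainable params: 4,194,304` line in the training log.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters in one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

HIDDEN_SIZE = 4096      # Llama-2-7B hidden size
NUM_LAYERS = 32         # Llama-2-7B decoder layers
RANK = 8                # assumed LoRA rank
MODULES_PER_LAYER = 2   # assumed: two attention projections adapted per layer

trainable = NUM_LAYERS * MODULES_PER_LAYER * lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
total = 6_742_609_920   # "all params" value taken from the training log

print(f"trainable params: {trainable:,}")
print(f"trainable share:  {100 * trainable / total:.4f}%")
```

Only a small fraction of a percent of the weights receive gradients, which is what makes finetuning a 7B-parameter model practical on a single CPU node.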

Finetune the model on the Alpaca-format dataset for text generation:

In [2]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
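
# Base model to finetune
model_args = ModelArguments(model_name_or_path="meta-llama/Llama-2-7b-chat-hf")
# Train on the 2,000-example subset created above and hold out 1% of it for evaluation
data_args = DataArguments(train_file="./alpaca-lora/alpaca_data_subset.json", validation_split_percentage=1)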
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=1,
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True,
)
finetune_args = FinetuningArguments()
finetune_cfg = TextGenerationFinetuningConfig(
            model_args=model_args,
            data_args=data_args,
            training_args=training_args,
            finetune_args=finetune_args,
        )
finetune_model(finetune_cfg)

2024-07-14 09:11:21.436437: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-14 09:11:22.072533: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-14 09:11:22.246130: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-14 09:11:22.246211: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-14 09:11:22.404468: I tensorflow/core/platform/cpu_feature_gua

Generating train split: 0 examples [00:00, ? examples/s]

Unable to verify splits sizes.
2024-07-14 09:11:32,257 - info_utils.py - datasets.utils.info_utils - INFO - Unable to verify splits sizes.
Dataset json downloaded and prepared to /home/uc651c1f4b4c7f15e851413c0d49c8fa/.cache/huggingface/datasets/json/default-94da19c74cf34be3/0.0.0/7483f22a71512872c377524b97484f6d20c275799bb9e7cd8fb3198178d8220a. Subsequent calls will reuse this data.
2024-07-14 09:11:32,266 - builder.py - datasets.builder - INFO - Dataset json downloaded and prepared to /home/uc651c1f4b4c7f15e851413c0d49c8fa/.cache/huggingface/datasets/json/default-94da19c74cf34be3/0.0.0/7483f22a71512872c377524b97484f6d20c275799bb9e7cd8fb3198178d8220a. Subsequent calls will reuse this data.
Using custom data configuration default-94da19c74cf34be3
2024-07-14 09:11:32,532 - builder.py - datasets.builder - INFO - Using custom data configuration default-94da19c74cf34be3
Loading Dataset Infos from /home/uc651c1f4b4c7f15e851413c0d49c8fa/Training/AI/GenAI/intel-extension-for-transformers/inte

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

[INFO|modeling_utils.py:4364] 2024-07-14 09:12:12,013 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4372] 2024-07-14 09:12:12,015 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-chat-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:955] 2024-07-14 09:12:12,113 >> loading configuration file generation_config.json from cache at /home/uc651c1f4b4c7f15e851413c0d49c8fa/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/generation_config.json
[INFO|configuration_utils.py:1000] 2024-07-14 09:12:12,115 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.

Map:   0%|          | 0/1980 [00:00<?, ? examples/s]

Caching processed dataset at /home/uc651c1f4b4c7f15e851413c0d49c8fa/.cache/huggingface/datasets/json/default-94da19c74cf34be3/0.0.0/7483f22a71512872c377524b97484f6d20c275799bb9e7cd8fb3198178d8220a/cache-4542c0a09c9f1c86.arrow
2024-07-14 09:12:13,594 - arrow_dataset.py - datasets.arrow_dataset - INFO - Caching processed dataset at /home/uc651c1f4b4c7f15e851413c0d49c8fa/.cache/huggingface/datasets/json/default-94da19c74cf34be3/0.0.0/7483f22a71512872c377524b97484f6d20c275799bb9e7cd8fb3198178d8220a/cache-4542c0a09c9f1c86.arrow


Map:   0%|          | 0/20 [00:00<?, ? examples/s]

Caching processed dataset at /home/uc651c1f4b4c7f15e851413c0d49c8fa/.cache/huggingface/datasets/json/default-94da19c74cf34be3/0.0.0/7483f22a71512872c377524b97484f6d20c275799bb9e7cd8fb3198178d8220a/cache-20188ff6328d983f.arrow
2024-07-14 09:12:14,991 - arrow_dataset.py - datasets.arrow_dataset - INFO - Caching processed dataset at /home/uc651c1f4b4c7f15e851413c0d49c8fa/.cache/huggingface/datasets/json/default-94da19c74cf34be3/0.0.0/7483f22a71512872c377524b97484f6d20c275799bb9e7cd8fb3198178d8220a/cache-20188ff6328d983f.arrow
2024-07-14 09:12:15,002 - finetuning.py - intel_extension_for_transformers.transformers.llm.finetuning.finetuning - INFO - Using data collator of type DataCollatorForSeq2Seq


trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199


[INFO|trainer.py:642] 2024-07-14 09:12:15,356 >> Using cpu_amp half precision backend
[INFO|trainer.py:2128] 2024-07-14 09:12:15,644 >> ***** Running training *****
[INFO|trainer.py:2129] 2024-07-14 09:12:15,646 >>   Num examples = 1,980
[INFO|trainer.py:2130] 2024-07-14 09:12:15,647 >>   Num Epochs = 1
[INFO|trainer.py:2131] 2024-07-14 09:12:15,648 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:2134] 2024-07-14 09:12:15,649 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2135] 2024-07-14 09:12:15,651 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2136] 2024-07-14 09:12:15,652 >>   Total optimization steps = 247
[INFO|trainer.py:2137] 2024-07-14 09:12:15,655 >>   Number of trainable parameters = 4,194,304




[INFO|trainer.py:2383] 2024-07-14 09:30:12,965 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:3478] 2024-07-14 09:30:12,970 >> Saving model checkpoint to ./tmp
[INFO|tokenization_utils_base.py:2574] 2024-07-14 09:30:13,004 >> tokenizer config file saved in ./tmp/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-07-14 09:30:13,006 >> Special tokens file saved in ./tmp/special_tokens_map.json
2024-07-14 09:30:13,020 - finetuning.py - intel_extension_for_transformers.transformers.llm.finetuning.finetuning - INFO - *** Evaluate After Training***
[INFO|trainer.py:3788] 2024-07-14 09:30:13,028 >> 
***** Running Evaluation *****
[INFO|trainer.py:3790] 2024-07-14 09:30:13,028 >>   Num examples = 20
[INFO|trainer.py:3793] 2024-07-14 09:30:13,029 >>   Batch size = 4


***** eval metrics *****
  epoch                   =      0.998
  eval_loss               =     1.2172
  eval_ppl                =     3.3778
  eval_runtime            = 0:00:03.45
  eval_samples            =         20
  eval_samples_per_second =      5.789
  eval_steps_per_second   =      1.447
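
The figures in the log can be cross-checked with a little arithmetic. The sketch below assumes that the 1% validation split is taken from the 2,000-example subset, that one optimizer step is taken per full group of accumulated batches, and that `eval_ppl` is the exponential of `eval_loss`. These are inferences from the log above rather than documented behavior.

```python
import math

# Values taken from the notebook cells and the log above.
subset_size = 2000
validation_split_percentage = 1
per_device_train_batch_size = 4
gradient_accumulation_steps = 2
eval_loss = 1.2172

# Split sizes: 1% of the subset is held out for evaluation.
eval_examples = subset_size * validation_split_percentage // 100
train_examples = subset_size - eval_examples

# Batches per epoch, then optimizer steps after gradient accumulation.
batches_per_epoch = math.ceil(train_examples / per_device_train_batch_size)
optimization_steps = batches_per_epoch // gradient_accumulation_steps

# Fraction of the epoch covered by the completed optimizer steps.
epoch_fraction = optimization_steps * gradient_accumulation_steps / batches_per_epoch

# Perplexity as the exponential of the mean evaluation loss.
perplexity = math.exp(eval_loss)

print(f"train/eval examples: {train_examples}/{eval_examples}")
print(f"optimization steps:  {optimization_steps}")
print(f"epoch fraction:      {epoch_fraction:.3f}")
print(f"perplexity:          {perplexity:.4f}")
```

The split sizes, step count, and epoch fraction line up with the logged `Num examples`, `Total optimization steps`, and `epoch` values. The perplexity differs from the logged `eval_ppl` only in the last digit, presumably because the logged loss is rounded.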


Finetune the model on the summarization task:

In [None]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
model_args = ModelArguments(model_name_or_path="meta-llama/Llama-2-7b-chat-hf")
data_args = DataArguments(dataset_name="cnn_dailymail", dataset_config_name="3.0.0")
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True
)
finetune_args = FinetuningArguments(task='summarization')
finetune_cfg = TextGenerationFinetuningConfig(
            model_args=model_args,
            data_args=data_args,
            training_args=training_args,
            finetune_args=finetune_args,
        )
finetune_model(finetune_cfg)

Finetune the model on the code generation task:

In [None]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
model_args = ModelArguments(model_name_or_path="meta-llama/Llama-2-7b-chat-hf")
data_args = DataArguments(dataset_name="theblackcat102/evol-codealpaca-v1")
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True
)
finetune_args = FinetuningArguments(task='code-generation')
finetune_cfg = TextGenerationFinetuningConfig(
            model_args=model_args,
            data_args=data_args,
            training_args=training_args,
            finetune_args=finetune_args,
        )
finetune_model(finetune_cfg)