# Finetune Your Chatbot on a Single Node Xeon SPR

NeuralChat is a customizable chat framework for building your own chatbot within minutes on multiple architectures. This notebook shows how to finetune a chatbot on custom data on a single-node Intel Xeon SPR (Sapphire Rapids) system.

## Prepare Environment

Install Intel Extension for Transformers:

In [None]:
!pip install intel-extension-for-transformers

Collecting intel-extension-for-transformers
  Downloading intel_extension_for_transformers-1.4.2-cp310-cp310-manylinux_2_28_x86_64.whl (45.3 MB)
Collecting schema (from intel-extension-for-transformers)
  Downloading schema-0.7.7-py2.py3-none-any.whl (18 kB)
Collecting neural-compressor (from intel-extension-for-transformers)
  Downloading neural_compressor-2.6-py3-none-any.whl (1.5 MB)
Collecting deprecated>=1.2.13 (from neural-compressor->intel-extension-for-transformers)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Installing collected packages: schema, deprecated, neural-compressor, intel-extension-for-transformers
Successfully installed deprecated-1.2.14 intel-extension-for-transformers-1.4.2 neural-compressor-2.6 schema-0.7.7


Clone the repository and install the NeuralChat requirements:

In [None]:
!git clone https://github.com/intel/intel-extension-for-transformers.git

Cloning into 'intel-extension-for-transformers'...
remote: Enumerating objects: 1681988, done.
remote: Counting objects: 100% (116638/116638), done.
remote: Compressing objects: 100% (12340/12340), done.
remote: Total 1681988 (delta 63056), reused 114702 (delta 61456), pack-reused 1565350
Receiving objects: 100% (1681988/1681988), 594.70 MiB | 26.83 MiB/s, done.
Resolving deltas: 100% (898963/898963), done.
Updating files: 100% (3217/3217), done.


In [None]:
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
!pip install -r requirements.txt
%cd ../../../

/content/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat
Collecting accelerate (from -r requirements.txt (line 1))
  Downloading accelerate-0.32.1-py3-none-any.whl (314 kB)
Collecting cchardet (from -r requirements.txt (line 2))
  Downloading cchardet-2.1.7.tar.gz (653 kB)
  Preparing metadata (setup.py) ... done
Collecting einops (from -r requirements.txt (line 3))
  Downloading einops-0.8.0-py3-none-any.whl (43 kB)
Collecting evaluate (from -r requirements.txt (line 4))
  Downloading evaluate-0.4.2-py3-none-any.whl (84 kB)

/content


## Prepare the Dataset

Text generation (general-domain instruction): we use the [Alpaca dataset](https://github.com/tatsu-lab/stanford_alpaca) from Stanford University as the general-domain dataset for finetuning. It is distributed as a single JSON file, [alpaca_data.json](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). The Alpaca authors hand-wrote 175 seed tasks and used them to prompt `text-davinci-003` to generate 52K instruction-following examples covering diverse tasks. Place `alpaca_data.json` in the working directory, since the finetuning cell below passes it as `train_file`.
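
For reference, each record in an Alpaca-format JSON file is an object with `instruction`, `input`, and `output` string fields, where `input` may be empty. The following sketch checks records against that shape before training. The helper name `check_alpaca_records` and the sample records are illustrative and not part of NeuralChat, and flagging empty `instruction` or `output` strings is a choice made for this sketch.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")

def check_alpaca_records(records):
    """Return the indices of records that are not valid Alpaca-style entries."""
    bad = []
    for i, rec in enumerate(records):
        # Every required key must be present and hold a string.
        ok = isinstance(rec, dict) and all(
            isinstance(rec.get(k), str) for k in REQUIRED_KEYS
        )
        # instruction and output must be non-empty; input may be "".
        if not ok or not rec["instruction"].strip() or not rec["output"].strip():
            bad.append(i)
    return bad

# Illustrative records (made up for this sketch, not taken from the dataset).
sample = json.loads("""
[
  {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
  {"instruction": "Name a primary color.", "input": "", "output": "Red"},
  {"instruction": "Missing output field", "input": ""}
]
""")

bad_indices = check_alpaca_records(sample)
print(bad_indices)  # [2]
```

To check the real file, the same helper could be applied to the list returned by `json.load` on `alpaca_data.json`.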

## Finetune Your Chatbot

We employ the [LoRA approach](https://arxiv.org/pdf/2106.09685.pdf) to finetune the LLM efficiently.
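
As a rough illustration of the idea, LoRA keeps a pretrained weight matrix `W` frozen and learns a low-rank update `B A`, so a layer computes `W x + (alpha / r) * B (A x)`. The sketch below uses plain Python lists and tiny made-up matrices; the function names and all numbers are illustrative, and this is not how NeuralChat or any library implements it.

```python
def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trainable."""
    r = len(A)                       # rank = number of rows in A
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # low-rank path: d_in -> r -> d_out
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

# Toy layer: d_in = 3, d_out = 2, rank r = 1 (all numbers are made up).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]        # r x d_in
B_zero = [[0.0], [0.0]]      # d_out x r, zero-initialised as in the LoRA paper
B_trained = [[0.5], [-1.0]]  # pretend values after some training
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_zero, x, alpha=2.0))     # [7.0, 2.0]
print(lora_forward(W, A, B_trained, x, alpha=2.0))  # [13.0, -10.0]
```

With `B` at zero the adapted layer reproduces the pretrained output exactly, which is why finetuning starts from the base model's behaviour. Only `A` and `B` receive gradients, so the number of trained weights per matrix is `r * (d_in + d_out)` instead of `d_in * d_out`.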

Finetune the model on the Alpaca-format dataset for text generation:

In [None]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model

# Base model to finetune, pulled from the Hugging Face Hub.
model_args = ModelArguments(model_name_or_path="TinyLlama/TinyLlama_v1.1")

# Alpaca-format training file; 1% of it is held out for evaluation.
data_args = DataArguments(train_file="alpaca_data.json", validation_split_percentage=1)

training_args = TrainingArguments(
    output_dir="./tmp",
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    save_strategy="no",             # no intermediate checkpoints
    log_level="info",
    save_total_limit=2,
    bf16=True,                      # bfloat16 mixed precision
)

# Default finetuning settings.
finetune_args = FinetuningArguments()

finetune_cfg = TextGenerationFinetuningConfig(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetune_args=finetune_args,
)
finetune_model(finetune_cfg)

distributed training: True, 16-bits training: True
INFO:intel_extension_for_transformers.transformers.llm.finetuning.finetuning:Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=True,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batche

config.json:   0%|          | 0.00/560 [00:00<?, ?B/s]

[INFO|configuration_utils.py:733] 2024-07-14 08:24:45,166 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--TinyLlama--TinyLlama_v1.1/snapshots/ff3c701f2424c7625fdefb9dd470f45ef18b02d6/config.json
[INFO|configuration_utils.py:796] 2024-07-14 08:24:45,177 >> Model config LlamaConfig {
  "_name_or_path": "TinyLlama/TinyLlama_v1.1",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 5632,
  "max_position_embeddings": 2048,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 22,
  "num_key_value_heads": 4,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.41.2",
  "use_cache": t

tokenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

[INFO|tokenization_utils_base.py:2108] 2024-07-14 08:24:46,603 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--TinyLlama--TinyLlama_v1.1/snapshots/ff3c701f2424c7625fdefb9dd470f45ef18b02d6/tokenizer.model
[INFO|tokenization_utils_base.py:2108] 2024-07-14 08:24:46,605 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2108] 2024-07-14 08:24:46,607 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--TinyLlama--TinyLlama_v1.1/snapshots/ff3c701f2424c7625fdefb9dd470f45ef18b02d6/special_tokens_map.json
[INFO|tokenization_utils_base.py:2108] 2024-07-14 08:24:46,608 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--TinyLlama--TinyLlama_v1.1/snapshots/ff3c701f2424c7625fdefb9dd470f45ef18b02d6/tokenizer_config.json
[INFO|tokenization_utils_base.py:2108] 2024-07-14 08:24:46,609 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/mo

Generating train split: 0 examples [00:00, ? examples/s]

INFO:datasets.utils.info_utils:Unable to verify splits sizes.
INFO:datasets.builder:Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-0a7894dfd4d2d325/0.0.0/7483f22a71512872c377524b97484f6d20c275799bb9e7cd8fb3198178d8220a. Subsequent calls will reuse this data.
INFO:datasets.builder:Using custom data configuration default-0a7894dfd4d2d325
INFO:datasets.info:Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/json
Overwrite dataset info from restored data version if exists.
INFO:datasets.builder:Overwrite dataset inf

pytorch_model.bin:   0%|          | 0.00/4.40G [00:00<?, ?B/s]

[INFO|modeling_utils.py:3474] 2024-07-14 08:25:17,508 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--TinyLlama--TinyLlama_v1.1/snapshots/ff3c701f2424c7625fdefb9dd470f45ef18b02d6/pytorch_model.bin
[INFO|modeling_utils.py:1519] 2024-07-14 08:25:17,566 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:962] 2024-07-14 08:25:17,570 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}

[INFO|modeling_utils.py:4280] 2024-07-14 08:25:30,181 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4288] 2024-07-14 08:25:30,184 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at TinyLlama/TinyLlama_v1.1.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.


generation_config.json:   0%|          | 0.00/129 [00:00<?, ?B/s]

[INFO|configuration_utils.py:917] 2024-07-14 08:25:30,401 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--TinyLlama--TinyLlama_v1.1/snapshots/ff3c701f2424c7625fdefb9dd470f45ef18b02d6/generation_config.json
[INFO|configuration_utils.py:962] 2024-07-14 08:25:30,403 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "max_length": 2048,
  "pad_token_id": 0
}



Map:   0%|          | 0/51482 [00:00<?, ? examples/s]

INFO:datasets.arrow_dataset:Caching processed dataset at /root/.cache/huggingface/datasets/json/default-0a7894dfd4d2d325/0.0.0/7483f22a71512872c377524b97484f6d20c275799bb9e7cd8fb3198178d8220a/cache-54b5bb5eca928a20.arrow


Map:   0%|          | 0/520 [00:00<?, ? examples/s]

INFO:datasets.arrow_dataset:Caching processed dataset at /root/.cache/huggingface/datasets/json/default-0a7894dfd4d2d325/0.0.0/7483f22a71512872c377524b97484f6d20c275799bb9e7cd8fb3198178d8220a/cache-81bedc7f6123cf5c.arrow
INFO:intel_extension_for_transformers.transformers.llm.finetuning.finetuning:Using data collator of type DataCollatorForSeq2Seq


trainable params: 1,126,400 || all params: 1,101,174,784 || trainable%: 0.10229075496156657
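
The trainable-parameter count in the line above can be reproduced from the `LlamaConfig` values printed earlier in the log, assuming rank-8 LoRA adapters on the `q_proj` and `v_proj` projections of every layer. The rank and the target modules are assumptions made for this sketch (they do not appear in the output); they are used here because they match the logged number exactly.

```python
# Values copied from the LlamaConfig printed in the log.
hidden_size = 2048
num_hidden_layers = 22
num_attention_heads = 32
num_key_value_heads = 4

# Assumption for this sketch (not shown in the log): LoRA rank.
lora_rank = 8

head_dim = hidden_size // num_attention_heads  # 64
q_out = num_attention_heads * head_dim         # 2048
v_out = num_key_value_heads * head_dim         # 256 (grouped-query attention)

def lora_params(d_in, d_out, r):
    """A is r x d_in and B is d_out x r, so LoRA adds r * (d_in + d_out) weights."""
    return r * (d_in + d_out)

# Assumption: adapters on q_proj and v_proj only.
per_layer = (lora_params(hidden_size, q_out, lora_rank)
             + lora_params(hidden_size, v_out, lora_rank))
trainable = per_layer * num_hidden_layers
all_params = 1_101_174_784  # "all params" from the log line

print(trainable)                     # 1126400
print(100 * trainable / all_params)  # roughly 0.1023 (percent)
```

In other words, about one weight in a thousand is updated during finetuning, while the remaining base-model weights stay frozen.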


[INFO|trainer.py:642] 2024-07-14 09:09:36,984 >> Using auto half precision backend
[INFO|trainer.py:2128] 2024-07-14 09:09:37,456 >> ***** Running training *****
[INFO|trainer.py:2129] 2024-07-14 09:09:37,457 >>   Num examples = 51,482
[INFO|trainer.py:2130] 2024-07-14 09:09:37,457 >>   Num Epochs = 3
[INFO|trainer.py:2131] 2024-07-14 09:09:37,458 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:2134] 2024-07-14 09:09:37,459 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:2135] 2024-07-14 09:09:37,460 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2136] 2024-07-14 09:09:37,460 >>   Total optimization steps = 19,305
[INFO|trainer.py:2137] 2024-07-14 09:09:37,464 >>   Number of trainable parameters = 1,126,400
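
The step count reported above follows from the dataset size and the batch settings. The sketch below redoes the arithmetic. It assumes a single training process, and that the number of batches per epoch is rounded up while the number of optimizer updates per epoch is rounded down; these assumptions are consistent with the logged totals but are not stated in the output.

```python
import math

num_examples = 51_482            # "Num examples" from the log
eval_examples = 520              # size of the held-out split, from the log
per_device_batch_size = 4
gradient_accumulation_steps = 2
num_epochs = 3
num_devices = 1                  # assumption: a single training process

total_batch_size = per_device_batch_size * num_devices * gradient_accumulation_steps
batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_steps = updates_per_epoch * num_epochs

# Share of the full dataset that was held out for evaluation, in percent.
held_out_pct = 100 * eval_examples / (num_examples + eval_examples)

print(total_batch_size)        # 8
print(batches_per_epoch)       # 12871
print(total_steps)             # 19305
print(round(held_out_pct, 2))  # 1.0
```

The last value matches `validation_split_percentage=1` in the finetuning cell.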


| Step | Training Loss |
|------|---------------|
| 500 | 1.3621 |
| 1000 | 1.3112 |
| 1500 | 1.2914 |
| 2000 | 1.3059 |
| 2500 | 1.2974 |
| 3000 | 1.3077 |
| 3500 | 1.3032 |
| 4000 | 1.289 |
| 4500 | 1.288 |
| 5000 | 1.2903 |


[INFO|trainer.py:2383] 2024-07-14 17:13:00,328 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:3478] 2024-07-14 17:13:00,333 >> Saving model checkpoint to ./tmp
[INFO|tokenization_utils_base.py:2574] 2024-07-14 17:13:00,376 >> tokenizer config file saved in ./tmp/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-07-14 17:13:00,377 >> Special tokens file saved in ./tmp/special_tokens_map.json
[INFO|trainer.py:3788] 2024-07-14 17:13:00,383 >> 
***** Running Evaluation *****
[INFO|trainer.py:3790] 2024-07-14 17:13:00,384 >>   Num examples = 520
[INFO|trainer.py:3793] 2024-07-14 17:13:00,384 >>   Batch size = 4


***** eval metrics *****
  epoch                   =     2.9998
  eval_loss               =     1.2866
  eval_ppl                =     3.6205
  eval_runtime            = 0:00:43.64
  eval_samples            =        520
  eval_samples_per_second =     11.914
  eval_steps_per_second   =      2.978
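
The reported `eval_ppl` is consistent with perplexity computed as the exponential of the mean evaluation loss. A quick check using the rounded values printed above (so the result agrees only to a few decimal places):

```python
import math

eval_loss = 1.2866     # from the metrics above (rounded)
eval_samples = 520
eval_batch_size = 4

perplexity = math.exp(eval_loss)                         # close to the logged 3.6205
eval_steps = math.ceil(eval_samples / eval_batch_size)   # number of evaluation batches

print(round(perplexity, 3))
print(eval_steps)  # 130
```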
