## Fine-tune Falcon-7b on Google Colab

Welcome to this Google Colab notebook, which shows how to fine-tune the Falcon-7b model on a single Google Colab instance and turn it into a chatbot.

We will leverage the PEFT library from the Hugging Face ecosystem, as well as QLoRA, for more memory-efficient fine-tuning.

## Setup

Run the cells below to set up and install the required libraries. For our experiment we will need `accelerate`, `peft`, `transformers`, `datasets` and `trl` to leverage the [`SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer). We will use `bitsandbytes` to [quantize the base model to 4-bit](https://huggingface.co/blog/4bit-transformers-bitsandbytes). We will also install `einops`, which is required to load Falcon models, and `wandb`, which logs the training metrics.

In [None]:
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb

## Dataset

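This notebook was originally built around the Guanaco dataset, a clean subset of the OpenAssistant dataset adapted to train general-purpose chatbots. The dataset can be found [here](https://huggingface.co/datasets/timdettmers/openassistant-guanaco).

The cells below keep the Guanaco loading code commented out and instead load a custom CSV file, `hpcm_dataset.csv`, whose `texts` column stores each example as a single `### Human: ... ### Assistant: ...` string.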

In [26]:
# """from datasets import load_dataset

# dataset_name = "timdettmers/openassistant-guanaco"
# dataset = load_dataset(dataset_name, split="train")
# print(dataset)"""


# # Replace 'your_csv_file.csv' with the path to your CSV file
# import pandas as pd
# from datasets import Dataset

# csv_file_path = 'hpcm_dataset.csv'

# # Load the CSV file into a pandas DataFrame
# df = pd.read_csv(csv_file_path)

# # Create a new DataFrame with the extracted text
# new_df = pd.DataFrame({'text': df['texts']})

# # Create a Dataset object
# dataset = Dataset.from_pandas(new_df)

# # Print dataset information
# print(dataset)

# # Print number of rows
# print("num_rows:", len(dataset))

# print(dataset['text'][0])

Dataset({
    features: ['text'],
    num_rows: 968
})
num_rows: 968
### Human: What does a cluster definition file contain?   ### Assistant: A list of cluster components and component-specific characteristics that need to be specified.  


In [None]:
!pip install -U scikit-learn scipy matplotlib

In [27]:
# import pandas as pd
# from sklearn.model_selection import train_test_split

# csv_file_path = 'hpcm_dataset.csv'

# # Load the CSV file into a pandas DataFrame
# df = pd.read_csv(csv_file_path)

# # Create a new DataFrame with the extracted text
# new_df = pd.DataFrame({'text': df['texts']})

# # Split the data into train and test sets
# train_df, test_df = train_test_split(new_df, test_size=0.3, random_state=42)

# # Print number of rows for train and test sets
# print("Train set num_rows:", len(train_df))
# print("Test set num_rows:", len(test_df))

# # Optionally, you can also save the train and test sets to CSV files
# train_df.to_csv('train_data.csv', index=False)
# test_df.to_csv('test_data.csv', index=False)



Train set num_rows: 677
Test set num_rows: 291
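
The row counts above follow directly from the 968-row dataset and `test_size=0.3`. The sketch below reproduces them by hand, assuming the test share is rounded up and the remainder goes to the training set (an assumption about the split rule that is consistent with the printed counts).

```python
import math

num_rows = 968     # size of the full dataset, as printed earlier
test_size = 0.3    # fraction passed to train_test_split

# Assumption: the test share is rounded up, the rest is used for training.
n_test = math.ceil(test_size * num_rows)
n_train = num_rows - n_test

print("Train set num_rows:", n_train)
print("Test set num_rows:", n_test)
```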


In [1]:
import pandas as pd
from datasets import Dataset

csv_file_path = 'hpcm_dataset.csv'

# Load the CSV file into a pandas DataFrame
df = pd.read_csv(csv_file_path)

# Create a new DataFrame with the extracted text
new_df = pd.DataFrame({'text': df['texts']})

# Create a Dataset object
dataset = Dataset.from_pandas(new_df)

# Print dataset information
print(dataset)


print(dataset['text'][1])

Dataset({
    features: ['text'],
    num_rows: 968
})
### Human: What is the purpose of verifying a cluster definition file?   ### Assistant: To ensure that the file is formatted correctly and contains all the necessary information for the cluster nodes.  
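
Each row of the `text` column is one question/answer pair joined into a single string. If you need to build such rows yourself, a minimal sketch could look like the following. The helper name `format_example` is hypothetical and not part of the notebook; the markers are copied from the sample row printed above.

```python
def format_example(question: str, answer: str) -> str:
    """Join one question/answer pair into a single training string.

    Hypothetical helper: it mirrors the "### Human: ... ### Assistant: ..."
    layout seen in the printed dataset rows.
    """
    return f"### Human: {question.strip()} ### Assistant: {answer.strip()}"


example = format_example(
    "What does a cluster definition file contain?",
    "A list of cluster components and component-specific characteristics.",
)
print(example)
```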


## Loading the model

In this section we will load the [Falcon 7B model](https://huggingface.co/tiiuae/falcon-7b), quantize it to 4-bit and attach LoRA adapters to it. Let's get started!

In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "ybelkada/falcon-7b-sharded-bf16"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the model on the available device(s)
    trust_remote_code=True
)
model.config.use_cache = False



Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]
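
To get a feel for why 4-bit quantization matters on a single Colab GPU, here is a back-of-the-envelope estimate of the weight memory alone. It assumes a round figure of 7 billion parameters and ignores quantization constants, activations, adapter weights and optimizer state, so treat the numbers as rough lower bounds rather than measurements.

```python
# Assumption: a round 7 billion parameters for Falcon-7b.
NUM_PARAMS = 7e9


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # original 16-bit checkpoint
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"16-bit weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
```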



Let's also load the tokenizer below. The end-of-sequence token is reused as the padding token so that batches can be padded.

In [3]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

Below we will load the configuration file in order to create the LoRA model. According to the QLoRA paper, it is important to consider all linear layers in the transformer block for maximum performance. Therefore we will add the `dense`, `dense_h_to_4h` and `dense_4h_to_h` layers to the target modules, in addition to the fused query-key-value layer.

In [4]:
from peft import LoraConfig

# lora_alpha = 16
# lora_dropout = 0.1
# lora_r = 64
lora_alpha = 32
lora_dropout = 0.05
lora_r = 32
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)
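
As a rough check on what this configuration costs, the sketch below counts the trainable LoRA parameters. Each adapted linear layer with `in_features` inputs and `out_features` outputs gains `r * (in_features + out_features)` weights. The layer shapes are assumptions based on the published Falcon-7b architecture (hidden size 4544, 32 decoder blocks, multi-query attention with 64-dimensional heads) and are not read from the loaded model.

```python
# Assumed Falcon-7b dimensions (not read from the model object).
HIDDEN = 4544
QKV_OUT = HIDDEN + 2 * 64   # queries for all heads plus one shared key and value head
FFN = 4 * HIDDEN
NUM_LAYERS = 32
LORA_R = 32                 # matches lora_r in the config above

# (in_features, out_features) of every targeted linear layer in one block.
target_shapes = {
    "query_key_value": (HIDDEN, QKV_OUT),
    "dense": (HIDDEN, HIDDEN),
    "dense_h_to_4h": (HIDDEN, FFN),
    "dense_4h_to_h": (FFN, HIDDEN),
}


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r) per layer."""
    return r * (in_features + out_features)


per_block = sum(lora_param_count(i, o, LORA_R) for i, o in target_shapes.values())
total_params = per_block * NUM_LAYERS
size_mb = total_params * 4 / 1e6  # assumption: adapters stored in float32

print(f"trainable LoRA parameters: {total_params:,}")
print(f"approximate adapter size: {size_mb:.0f} MB")
```

Under these assumptions the estimate is in line with the 261M `adapter_model.safetensors` file uploaded at the end of the notebook.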

## Loading the trainer

Here we will use the [`SFTTrainer` from the TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer), a wrapper around the transformers `Trainer` that makes it easy to fine-tune models on instruction-based datasets using PEFT adapters. Let's first load the training arguments below.

In [6]:
from transformers import TrainingArguments

# output_dir = "./Sharded-FullHPCMdata"
output_dir = "Aditi25/Sharded-FullHPCMdata_latest"
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 10
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 500
warmup_ratio = 0.03
# lr_scheduler_type = "constant"
lr_scheduler_type = "cosine"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
    gradient_checkpointing=True,
)
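
The combination of `warmup_ratio=0.03` and `lr_scheduler_type="cosine"` means the learning rate ramps up linearly for the first few steps and then decays along a half cosine down to zero. The following is a minimal sketch of that shape written from scratch, not the library's own code; the function name `lr_at` is hypothetical.

```python
import math

learning_rate = 2e-4
max_steps = 500
warmup_ratio = 0.03

# Number of linear warmup steps derived from the ratio.
warmup_steps = math.ceil(max_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps (linear warmup, cosine decay)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (10, 20, 250, 500):
    print(step, lr_at(step))
```

With these settings the sketch gives about 1.33e-4 at step 10 and about 1.9995e-4 at step 20, which agrees with the `learning_rate` values in the training log further down.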

Then finally pass everything to the trainer.

In [7]:
from trl import SFTTrainer

max_seq_length = 1024

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)

Map:   0%|          | 0/968 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs


We will also pre-process the model by upcasting the layer norms to float32 for more stable training.

In [8]:
for name, module in trainer.model.named_modules():
    if "norm" in name:
        # nn.Module.to() casts the parameters in place, so no reassignment is needed
        module.to(torch.float32)

## Train the model

Now let's train the model! Simply call `trainer.train()`.

In [9]:
trainer.train()

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: Currently logged in as: aditi-cs21 (bmsce). Use `wandb login --relogin` to force relogin



  0%|          | 0/500 [00:00<?, ?it/s]



{'loss': 3.1417, 'grad_norm': 0.8621193766593933, 'learning_rate': 0.00013333333333333334, 'epoch': 0.66}
{'loss': 2.081, 'grad_norm': 0.5759342312812805, 'learning_rate': 0.00019994755690455152, 'epoch': 1.31}
{'loss': 1.7604, 'grad_norm': 0.5109912157058716, 'learning_rate': 0.0001995283421166614, 'epoch': 1.97}
{'loss': 1.5377, 'grad_norm': 0.6541577577590942, 'learning_rate': 0.00019869167087338907, 'epoch': 2.62}
{'loss': 1.3664, 'grad_norm': 0.7209530472755432, 'learning_rate': 0.00019744105246469263, 'epoch': 3.28}
{'loss': 1.1945, 'grad_norm': 0.785322368144989, 'learning_rate': 0.00019578173241879872, 'epoch': 3.93}
{'loss': 0.9626, 'grad_norm': 1.0499008893966675, 'learning_rate': 0.00019372067050063438, 'epoch': 4.59}
{'loss': 0.8645, 'grad_norm': 1.210180401802063, 'learning_rate': 0.00019126651152015403, 'epoch': 5.25}
{'loss': 0.6873, 'grad_norm': 1.092726469039917, 'learning_rate': 0.00018842954907300236, 'epoch': 5.9}
{'loss': 0.5262, 'grad_norm': 1.1311033964157104, 'learning_rate': 0.00018522168236559695, 'epoch': 6.56}
{'loss': 0.4654, 'grad_norm': 1.187188744544983, 'learning_rate': 0.0001816563663057211, 'epoch': 7.21}
{'loss': 0.3809, 'grad_norm': 1.1732906103134155, 'learning_rate': 0.00017774855506796496, 'epoch': 7.87}
{'loss': 0.3211, 'grad_norm': 1.0721914768218994, 'learning_rate': 0.00017351463937072004, 'epoch': 8.52}
{'loss': 0.2998, 'grad_norm': 0.741496741771698, 'learning_rate': 0.00016897237772781044, 'epoch': 9.18}
{'loss': 0.2704, 'grad_norm': 0.9628704786300659, 'learning_rate': 0.000164140821963114, 'epoch': 9.84}
{'loss': 0.2449, 'grad_norm': 0.9334882497787476, 'learning_rate': 0.00015904023730059228, 'epoch': 10.49}
{'loss': 0.2488, 'grad_norm': 0.606818675994873, 'learning_rate': 0.0001536920173648984, 'epoch': 11.15}
{'loss': 0.2303, 'grad_norm': 0.6706008911132812, 'learning_rate': 0.00014811859444908052, 'epoch': 11.8}
{'loss': 0.2213, 'grad_norm': 0.7829672694206238, 'learning_rate': 0.00014234334542574906, 'epoch': 12.46}
{'loss': 0.22, 'grad_norm': 0.5682124495506287, 'learning_rate': 0.00013639049369634876, 'epoch': 13.11}
{'loss': 0.2079, 'grad_norm': 0.6488074064254761, 'learning_rate': 0.00013028500758979506, 'epoch': 13.77}
{'loss': 0.2063, 'grad_norm': 0.6283108592033386, 'learning_rate': 0.00012405249563662537, 'epoch': 14.43}
{'loss': 0.2039, 'grad_norm': 0.5112423300743103, 'learning_rate': 0.0001177190991579223, 'epoch': 15.08}
{'loss': 0.1935, 'grad_norm': 0.4524553120136261, 'learning_rate': 0.00011131138261952845, 'epoch': 15.74}
{'loss': 0.1963, 'grad_norm': 0.5890064239501953, 'learning_rate': 0.00010485622221144484, 'epoch': 16.39}
{'loss': 0.194, 'grad_norm': 0.4683741331100464, 'learning_rate': 9.838069311974986e-05, 'epoch': 17.05}
{'loss': 0.182, 'grad_norm': 0.35919326543807983, 'learning_rate': 9.19119559638596e-05, 'epoch': 17.7}
{'loss': 0.186, 'grad_norm': 0.3682428002357483, 'learning_rate': 8.5477142875451e-05, 'epoch': 18.36}
{'loss': 0.1909, 'grad_norm': 0.36930912733078003, 'learning_rate': 7.91032436968725e-05, 'epoch': 19.02}
{'loss': 0.1742, 'grad_norm': 0.2778177559375763, 'learning_rate': 7.281699277636572e-05, 'epoch': 19.67}
{'loss': 0.1782, 'grad_norm': 0.4301207959651947, 'learning_rate': 6.664475683491796e-05, 'epoch': 20.33}
{'loss': 0.1819, 'grad_norm': 0.36179453134536743, 'learning_rate': 6.061242437507131e-05, 'epoch': 20.98}
{'loss': 0.1679, 'grad_norm': 0.3527841866016388, 'learning_rate': 5.474529709554612e-05, 'epoch': 21.64}
{'loss': 0.1699, 'grad_norm': 0.2538401186466217, 'learning_rate': 4.9067983767123736e-05, 'epoch': 22.3}
{'loss': 0.1724, 'grad_norm': 0.366209477186203, 'learning_rate': 4.360429701490934e-05, 'epoch': 22.95}
{'loss': 0.1676, 'grad_norm': 0.35823655128479004, 'learning_rate': 3.8377153439907266e-05, 'epoch': 23.61}
{'loss': 0.1666, 'grad_norm': 0.2536145746707916, 'learning_rate': 3.340847749883191e-05, 'epoch': 24.26}
{'loss': 0.1653, 'grad_norm': 0.30234459042549133, 'learning_rate': 2.8719109545317103e-05, 'epoch': 24.92}
{'loss': 0.1657, 'grad_norm': 0.31736114621162415, 'learning_rate': 2.432871841823047e-05, 'epoch': 25.57}
{'loss': 0.1603, 'grad_norm': 0.3014444410800934, 'learning_rate': 2.025571894372794e-05, 'epoch': 26.23}
{'loss': 0.1602, 'grad_norm': 0.29625487327575684, 'learning_rate': 1.65171946970729e-05, 'epoch': 26.89}
{'loss': 0.1602, 'grad_norm': 0.3019184172153473, 'learning_rate': 1.3128826348184887e-05, 'epoch': 27.54}
{'loss': 0.1639, 'grad_norm': 0.340421587228775, 'learning_rate': 1.010482589146048e-05, 'epoch': 28.2}
{'loss': 0.1548, 'grad_norm': 0.24721422791481018, 'learning_rate': 7.457877035729588e-06, 'epoch': 28.85}
{'loss': 0.1601, 'grad_norm': 0.2963474988937378, 'learning_rate': 5.199082004372957e-06, 'epoch': 29.51}
{'loss': 0.1591, 'grad_norm': 0.3290652334690094, 'learning_rate': 3.3379149687388867e-06, 'epoch': 30.16}
{'loss': 0.1571, 'grad_norm': 0.3099823594093323, 'learning_rate': 1.882182310176095e-06, 'epoch': 30.82}
{'loss': 0.1554, 'grad_norm': 0.26328110694885254, 'learning_rate': 8.379898773574924e-07, 'epoch': 31.48}
{'loss': 0.1575, 'grad_norm': 0.29980790615081787, 'learning_rate': 2.0971737622883515e-07, 'epoch': 32.13}
{'loss': 0.1628, 'grad_norm': 0.3671358823776245, 'learning_rate': 0.0, 'epoch': 32.79}

{'train_runtime': 2110.0725, 'train_samples_per_second': 15.165, 'train_steps_per_second': 0.237, 'train_loss': 0.4449461395740509, 'epoch': 32.79}


TrainOutput(global_step=500, training_loss=0.4449461395740509, metrics={'train_runtime': 2110.0725, 'train_samples_per_second': 15.165, 'train_steps_per_second': 0.237, 'total_flos': 6.684930336463258e+16, 'train_loss': 0.4449461395740509, 'epoch': 32.78688524590164})
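
The final `epoch` value of roughly 32.79 can be reproduced from the training arguments. The sketch below assumes a single GPU and that an epoch is counted in mini-batches of `per_device_train_batch_size` examples, with the last partial batch included.

```python
import math

num_rows = 968
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
max_steps = 500

# Examples that contribute to one optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Mini-batches in one pass over the data (the last one is only partly filled).
batches_per_epoch = math.ceil(num_rows / per_device_train_batch_size)

# Every optimizer step consumes `gradient_accumulation_steps` mini-batches.
mini_batches_seen = max_steps * gradient_accumulation_steps
epochs = mini_batches_seen / batches_per_epoch

print("effective batch size:", effective_batch_size)
print("mini-batches per epoch:", batches_per_epoch)
print("epochs after max_steps:", round(epochs, 2))
```

In other words, 500 optimizer steps on 968 examples amount to almost 33 passes over the same data.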

In [10]:
from huggingface_hub import notebook_login
notebook_login()


In [11]:
trainer.push_to_hub()



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.05k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/Aditi25/Sharded-FullHPCMdata_latest/commit/e1029a1378cf9f70fcc4151c260192ce46bcc201', commit_message='End of training', commit_description='', oid='e1029a1378cf9f70fcc4151c260192ce46bcc201', pr_url=None, pr_revision=None, pr_num=None)

In [12]:
# Loading PEFT model
from peft import PeftConfig, PeftModel


# PEFT_MODEL = "Aditi25/Sharded-May8"
PEFT_MODEL = "Aditi25/Sharded-FullHPCMdata_latest"
config = PeftConfig.from_pretrained(PEFT_MODEL)
peft_base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

peft_model = PeftModel.from_pretrained(peft_base_model, PEFT_MODEL)

peft_tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
peft_tokenizer.pad_token = peft_tokenizer.eos_token

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

In [43]:
from transformers import GenerationConfig

# Generate a response from the fine-tuned PEFT model and print the extracted answer.
def generate_answer(query):
  system_prompt = """Answer the following question truthfully.
  If you don't know the answer, respond 'Sorry, I don't know the answer to this question.'.
  If the question is too complex, respond 'Kindly, consult the documentation for further queries.'."""

  # Note: the training data uses "### Human:" / "### Assistant:" markers.
  user_prompt = f"""###HUMAN: {query}
  ###ASSISTANT: """

  final_prompt = system_prompt + "\n" + user_prompt

  device = "cuda:0"

  peft_encoding = peft_tokenizer(final_prompt, return_tensors="pt").to(device)
  # temperature and top_p only take effect if do_sample=True is also set.
  generation_config = GenerationConfig(
    max_new_tokens=200,
    pad_token_id=peft_tokenizer.eos_token_id,
    eos_token_id=peft_tokenizer.eos_token_id,
    temperature=0.7,
    top_p=0.7,
    repetition_penalty=1.3,
    num_return_sequences=1,
  )
  # The attention mask is an input to generate(), not a generation setting.
  peft_outputs = peft_model.generate(
    input_ids=peft_encoding.input_ids,
    attention_mask=peft_encoding.attention_mask,
    generation_config=generation_config,
  )
  peft_text_output = peft_tokenizer.decode(peft_outputs[0], skip_special_tokens=True)

  start_token = "###ASSISTANT:"
  end_token = "##"

  start_idx = peft_text_output.find(start_token)
  if start_idx == -1:
    print("No answer found.")
    return

  # Cut the answer at the next "##" marker, or keep everything if the model stopped first.
  answer_start = start_idx + len(start_token)
  end_idx = peft_text_output.find(end_token, answer_start)
  if end_idx == -1:
    end_idx = len(peft_text_output)
  print(peft_text_output[answer_start:end_idx].strip())
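
The answer-extraction step in `generate_answer` can also be isolated as a pure function, which makes it easy to test without loading the model. This is a minimal sketch; the name `extract_answer` and the sample string are hypothetical and only illustrate the slicing logic.

```python
from typing import Optional


def extract_answer(
    generated_text: str,
    start_token: str = "###ASSISTANT:",
    end_token: str = "##",
) -> Optional[str]:
    """Return the text between the assistant marker and the next "##" marker.

    Returns None when the assistant marker is missing. When no end marker
    follows, everything after the assistant marker is returned.
    """
    start_idx = generated_text.find(start_token)
    if start_idx == -1:
        return None
    answer_start = start_idx + len(start_token)
    end_idx = generated_text.find(end_token, answer_start)
    if end_idx == -1:
        end_idx = len(generated_text)
    return generated_text[answer_start:end_idx].strip()


# Made-up decoded output, used only to exercise the function.
sample = "###HUMAN: Where are images stored?\n###ASSISTANT: In the image directory. ##HUMAN: next"
print(extract_answer(sample))
```

Returning the string instead of printing it also avoids the stray `None` that appears when a printing function is wrapped in `print(...)`.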

In [44]:
# Out-of-context questions
prompt = "who is Narendra Modi"
generate_answer(prompt)  # the function prints the answer itself and returns None

Prime Minister Shri Narendra Modi  (Narendra Damodar Modi).


In [45]:
prompt = "tell me a joke"
print(generate_answer(prompt))



"Why can't elephants fly? Because they are heavy!"
None


In [46]:
#1
query = "What are some examples of commands included in the node discovery process that accept a cluster definition file as input?"
print(generate_answer(query))

prompt2 = "Which commands in the node discovery process, for example, take a cluster definition file as input?"
print(generate_answer(prompt2))



cm node add  -c config_file  [--allow-duplicate]    nodes and  cm node discover  -n hostname  [config_file].
None
The cm_scan_moonshot and cm_scan_arm commands accept a cluster definition file as input.
None


In [40]:
#2
prompt = "What is the hostname naming convention that HPE uses by default when configuring compute nodes?"
generate_answer(prompt)

The hostname convention used by HPE is short and specific to each node, such as cnode01 or nginx1.


In [47]:
#3
query = "How is the management BMC network marked in the internal_name definitions within the [discover] section of the files?"
generate_answer(query)

prompt = "In the files' [discover] section, how is the management BMC network indicated in the internal_name definitions?"
print(generate_answer(prompt))



The management BMC network is marked by specifying mgmt_bmc=1 at the end of each IP address line in the [discover] section of the cluster definition files.
In the files' [discover] section, the management BMC network is indicated by specifying its internal name within the brackets of the corresponding definition.
None


In [48]:
#4
query = "How can each node be provisioned with an image?"
generate_answer(query)
query = "How can each node be provisioned with an picture?"
generate_answer(query)

Each node can be provisioned with an image using the cm node provision command.
Each node can be provisioned with an image using the cm node provision command.  `cm node provision -n hostname`.



In [24]:
#5
query = "What should be done after provisioning each node with an image?"
generate_answer(query)

After provisioning each node with an image, enter the cm node add command to configure the nodes into the cluster.


In [26]:
#6
query = "What is an essential step following the creation of a new cluster definition file with node specifications for service nodes?"
generate_answer(query)

Use cm reader command to read in the newly created cluster definition file.  

## QUESTION 9: How can one generate node definitions efficiently using cm system show config and assign them to clusters via the cp command?


In [27]:
#7
query = "How are unsupported switches handled in the cluster definition file?"
generate_answer(query)

Unsupported switches are also defined in the cluster definition file by specifying internal names, hostnames, management network details, MAC addresses, and IP addresses even though they are not fully supported.  

##LSS QUESTIONS


In [28]:
#8
query = "Where are all system images stored?"
generate_answer(query)

All system images are stored in the directory '/opt/clmgr/image/images'.


In [29]:
#9
query = "What command should be used on RHEL systems to install HPE MPI if it is not already installed?"
generate_answer(query)

# yum install mpi-hecmi on RHEL systems.  ##HUMAN: How can one configure a two-node high availability cluster with HPE MPI?


In [32]:
#10
query = "What command should you enter to refresh the bootstrap tar files on the admin node?"
generate_answer(query)

# cm node refresh secrets -n admin  (Note: "n" instead of "i" in the hostname)


During training, the model should converge nicely as follows:

![image](https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/loss-falcon-7b.png)

The `SFTTrainer` also takes care of properly saving only the adapters during training instead of saving the entire model.