In [None]:
# Efficient Large Language Model

### Components
* Efficient Parameter Training (LLM + Federated Learning): Implement a federated learning trial in a production setting (PS).
* Enhanced Incentive Mechanism Using Shapley Value (FL + Shapley): Incorporate Shapley value to enhance the incentive mechanism in the learning process.

### Objective
* Performance Evaluation Post-Shapley Value Application: Assess whether the integration of Shapley values leads to enhanced performance in the learning model.

### Steps and Challenges
* Fine-Tuning Large Language Models (LLM) in an Academic Context:
  * Data Acquisition: Resolved issue regarding data sourcing.
  * Adaptation for Federated Learning: Transition to parameter-efficient fine-tuning, avoiding full model weight updates.
* Federated Learning for LLM Fine-Tuning:
  * Diverse Data Source Distributions: Address challenges due to heterogeneous data sources.
  * LLM Deployment on End Devices: Explore the feasibility of hosting LLM models on end-user devices.
* Dataset Optimization Using Shapley Values:
  * Data Source Evaluation: Remove data with low Shapley values. However, limited data sources and data quantity pose a challenge.
  * Determining Shapley Value Metrics: Devise methods to calculate Shapley values from model weights and establish performance benchmarks (using a test dataset derived from two sources).
  * Localized Model Training: Train multiple models on local devices, adapting the training set parameters.
  * Benchmarking for Shapley Value: Establish criteria for benchmarking in Shapley value calculations.
* Implementation Process:
  * Central Server Interaction: Central server to initiate requests to client servers (data owners).
  * Client Server Model Training: Client servers receive models from the central server for training with local data.
  * Weight Transmission and Evaluation: Data seekers acquire model weights from data owners (simulated via a database of substantial size), followed by assessment of individual contributions and calculation of their Shapley values.
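
The contribution assessment in the last step can be sketched as an exact Shapley value computation over coalitions of data owners. This is a minimal illustration, not the notebook's implementation: the `shapley_values` helper, the coalition scores, and the second client name (`covid`) used as a player are assumptions made for the example. In practice `utility` would evaluate a model aggregated from the given clients on the shared test set.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley value of each player for a coalition utility function.

    phi_i = sum over S subset of N minus {i} of
            |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight shared by every coalition of size k that excludes p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical test-set scores for each coalition of data owners
scores = {
    frozenset(): 0.0,
    frozenset({"ecommerce"}): 0.6,
    frozenset({"covid"}): 0.4,
    frozenset({"ecommerce", "covid"}): 0.9,
}

phi = shapley_values(["ecommerce", "covid"], lambda s: scores[frozenset(s)])
print(phi)
```

The values sum to the score of the full coalition, so each owner's share can be read as its part of the jointly achieved performance. The exact computation enumerates every coalition and is only practical for a handful of clients.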

In [1]:
!nvidia-smi

Sun Dec  3 23:26:23 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    24W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [1]:
!pip install -Uqqq pip --progress-bar off
!pip install -qqq bitsandbytes==0.39.0
!pip install -qqq torch==2.0.1
!pip install -qqq -U git+https://github.com/huggingface/transformers.git@e03a9cc
!pip install -qqq -U git+https://github.com/huggingface/peft.git@42a184f
!pip install -qqq -U git+https://github.com/huggingface/accelerate.git@c9fbb71
!pip install -qqq datasets==2.12.0
!pip install -qqq loralib==0.1.1
!pip install -qqq einops==0.6.1


In [2]:
# Use the falcon-7b model
import json
import os
from pprint import pprint

import bitsandbytes as bnb
import pandas as pd
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset
from huggingface_hub import notebook_login
from peft import (
    LoraConfig,
    PeftConfig,
    PeftModel,
    get_peft_model,
    prepare_model_for_kbit_training,
)

from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)

os.environ["CUDA_VISIBLE_DEVICES"] = "0"



Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118_nocublaslt.so...


Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.


In [3]:
!gdown 1gYd9Em7_4DW7fY6rWiYv5tZ3EUY1ETzm # download the dataset from google drive

Downloading...
From: https://drive.google.com/uc?id=1gYd9Em7_4DW7fY6rWiYv5tZ3EUY1ETzm
To: /content/Ecommerce_FAQ_Chatbot_dataset.json
  0% 0.00/21.0k [00:00<?, ?B/s]100% 21.0k/21.0k [00:00<00:00, 47.4MB/s]


In [4]:


with open("Ecommerce_FAQ_Chatbot_dataset.json", "r") as f:
    e_dataset = json.load(f)





In [5]:
NUM_DATA_POINTS_ECOMMERCE = len(e_dataset["questions"])
print(NUM_DATA_POINTS_ECOMMERCE)

79


In [6]:
# inspect the data

pprint(e_dataset["questions"][0], sort_dicts=False)

{'question': 'How can I create an account?',
 'answer': "To create an account, click on the 'Sign Up' button on the top "
           'right corner of our website and follow the instructions to '
           'complete the registration process.'}


In [7]:
with open("dataset.json", "w") as f:
    json.dump(e_dataset["questions"], f)

In [8]:
pd.DataFrame(e_dataset["questions"]).head()

Unnamed: 0,question,answer
0,How can I create an account?,"To create an account, click on the 'Sign Up' b..."
1,What payment methods do you accept?,"We accept major credit cards, debit cards, and..."
2,How can I track my order?,You can track your order by logging into your ...
3,What is your return policy?,Our return policy allows you to return product...
4,Can I cancel my order?,You can cancel your order if it has not been s...


In [9]:
MODEL_NAME = "tiiuae/falcon-7b" # Load the 7b falcon model

# Configuration for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

client_node_1 = AutoModelForCausalLM.from_pretrained( # get the base model falcon-7b
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token




Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [10]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad: # get the number of trainable parameters
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [11]:
client_node_1.gradient_checkpointing_enable()
client_node_1 = prepare_model_for_kbit_training(client_node_1)

In [12]:
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

client_node_1 = get_peft_model(client_node_1, config)
print_trainable_parameters(client_node_1)

# Parameter-efficient fine-tuning with LoRA: the original weights are frozen
# and only the low-rank adapter matrices are trained

trainable params: 4718592 || all params: 3613463424 || trainable%: 0.13058363808693696
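
The trainable-parameter count above can be reproduced by hand. The sketch below assumes the Falcon-7B dimensions (hidden size 4544, head dimension 64, 32 decoder layers, multi-query attention with one shared key head and one shared value head); these figures are not printed anywhere in this notebook. The `lora_param_count` helper is hypothetical. Each LoRA adapter adds a `rank x in_features` matrix and an `out_features x rank` matrix per targeted layer.

```python
def lora_param_count(in_features: int, out_features: int, rank: int, num_layers: int) -> int:
    # LoRA adds A (rank x in_features) and B (out_features x rank) per adapted layer
    return num_layers * rank * (in_features + out_features)


hidden_size = 4544   # assumed Falcon-7B hidden size
head_dim = 64        # assumed size of one attention head
# Fused query_key_value projection: all query heads plus one key and one value head
qkv_out_features = hidden_size + 2 * head_dim

trainable = lora_param_count(hidden_size, qkv_out_features, rank=16, num_layers=32)
print(trainable)  # 4718592
```

The reported `all params` figure is smaller than 7 billion, presumably because the 4-bit quantized weights are stored packed, so the percentage is best read as an upper bound on the true trainable fraction.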


In [13]:
prompt = f"""
<human>: How can I create an account?
<assistant>:
""".strip()
print(prompt)

<human>: How can I create an account?
<assistant>:


In [15]:
generation_config = client_node_1.generation_config
generation_config.max_new_tokens = 200
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
generation_config.bos_token_id = 1

In [16]:
generation_config

GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 11,
  "max_new_tokens": 200,
  "pad_token_id": 11,
  "temperature": 0.7,
  "top_p": 0.7,
  "transformers_version": "4.30.0.dev0"
}

In [17]:
prompt

'<human>: How can I create an account?\n<assistant>:'

In [None]:
%%time
device = "cuda:0"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
print(encoding)
# with torch.inference_mode():
with torch.no_grad(): # don't calculate the gradients
    outputs = client_node_1.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )
# Decodes the generated token IDs back to a string and prints it, skipping any special tokens like padding or end-of-sequence tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Without instruction fine-tuning, the base model produces incoherent output

### Build datasets

In [18]:
ecommerce_faq_dataset = load_dataset("json", data_files="dataset.json")

# for federated learning setting, remember to add a test set

Downloading and preparing dataset json/default to /root/.cache/huggingface/datasets/json/default-ce36e806bb615553/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...


Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/json/default-ce36e806bb615553/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4. Subsequent calls will reuse this data.


  0%|          | 0/1 [00:00<?, ?it/s]

In [20]:
ecommerce_faq_dataset['train'][0]

{'answer': "To create an account, click on the 'Sign Up' button on the top right corner of our website and follow the instructions to complete the registration process.",
 'question': 'How can I create an account?'}

In [21]:
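def generate_prompt(data_point):
    # Note: the triple-quoted string keeps its indentation, so the
    # "<assistant>:" line starts with four spaces (strip() only trims the ends).
    return f"""
    <human>: {data_point['question']}
    <assistant>: {data_point['answer']}
    """.strip()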

def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    # tokenize the text generated by prepared data point
    tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
    return tokenized_full_prompt
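
One possible variation on the prompt builder above is to end each training example with the tokenizer's end-of-sequence token, so that the model sees where an answer stops. This is a sketch, not what this notebook does: `generate_prompt_with_eos` is a hypothetical helper, the example answer is shortened, and `"<|endoftext|>"` stands in for `tokenizer.eos_token`.

```python
def generate_prompt_with_eos(data_point: dict, eos_token: str) -> str:
    # Build the prompt line by line so that no source indentation leaks into the text
    return (
        f"<human>: {data_point['question']}\n"
        f"<assistant>: {data_point['answer']}{eos_token}"
    )


example = {
    "question": "How can I create an account?",
    "answer": "Click on the 'Sign Up' button.",  # shortened, illustrative answer
}

# In the notebook, tokenizer.eos_token would be passed instead of a literal string
text = generate_prompt_with_eos(example, "<|endoftext|>")
print(text)
```

A caveat under the same assumptions: when `pad_token` is set to `eos_token`, as it is in this notebook, the language-modeling data collator may mask those positions out of the loss, so a distinct padding token might be needed for the end-of-sequence token to be learned.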


In [22]:
print(generate_prompt(ecommerce_faq_dataset['train'][0]))

<human>: How can I create an account?
    <assistant>: To create an account, click on the 'Sign Up' button on the top right corner of our website and follow the instructions to complete the registration process.


In [23]:
print(generate_and_tokenize_prompt(ecommerce_faq_dataset['train'][0]))

{'input_ids': [39, 15564, 48190, 1265, 418, 295, 1849, 267, 1709, 42, 561, 39, 524, 7893, 48190, 1472, 1849, 267, 1709, 23, 3093, 313, 248, 204, 18, 11181, 3340, 18, 4809, 313, 248, 1246, 894, 5805, 275, 568, 1857, 273, 1122, 248, 7104, 271, 2615, 248, 7799, 1200, 25], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [24]:
ecommerce_faq_dataset = ecommerce_faq_dataset["train"].shuffle().map(generate_and_tokenize_prompt)

Map:   0%|          | 0/79 [00:00<?, ? examples/s]


In [25]:
ecommerce_faq_dataset

Dataset({
    features: ['answer', 'question', 'input_ids', 'attention_mask'],
    num_rows: 79
})

### Training

In [27]:
OUTPUT_DIR = "experiment"

In [None]:
%load_ext tensorboard
%tensorboard --logdir experiment/runs

In [None]:
import copy
# Because of GPU memory limits, each notebook trains only one client node

# Save initial model states for Federated Learning
# initial_client_node_1 = copy.deepcopy(model.state_dict())
# initial_client_node_2 = copy.deepcopy(model.state_dict())

# Clone the model for each client node
# client_node_1 = copy.deepcopy(model)
# client_node_2 = copy.deepcopy(model)

In [29]:
# Simulating client update in FL setting

training_args = transformers.TrainingArguments(
    # per_device_train_batch_size=1,
    auto_find_batch_size=True,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    save_total_limit=3,
    logging_steps=1,
    output_dir=OUTPUT_DIR,
    max_steps=80, # total optimizer steps; takes precedence over num_train_epochs
    # save_strategy='epoch',
    optim="paged_adamw_8bit",
    lr_scheduler_type = 'cosine',
    warmup_ratio = 0.05,
    report_to="tensorboard",
)



In [30]:
client_trainer_1 = transformers.Trainer(
    model=client_node_1,
    train_dataset=ecommerce_faq_dataset,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

# client_trainer_2 = transformers.Trainer(
#     model=client_node_2,
#     train_dataset=covid_faq_dataset,
#     args=training_args,
#     data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
# )

client_node_1.train()  # Ensure the model is in training mode
client_node_1.config.use_cache = False  # silence the warnings. Please re-enable for inference!

# client_node_2.train()  # Ensure the model is in training mode
# client_node_2.config.use_cache = False  # silence the warnings. Please re-enable for inference!



In [31]:
# train the first client node with ecommerce dataset

client_trainer_1.train()

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
1,2.2458
2,2.2383
3,2.225
4,2.2095
5,2.2182
6,2.0769
7,2.1821
8,1.9906
9,1.9766
10,1.8557



TrainOutput(global_step=80, training_loss=0.674393298663199, metrics={'train_runtime': 236.8205, 'train_samples_per_second': 10.81, 'train_steps_per_second': 0.338, 'total_flos': 3213822616822272.0, 'train_loss': 0.674393298663199, 'epoch': 32.0})
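
The `epoch: 32.0` and `train_samples_per_second: 10.81` values in this output follow from the training arguments. The arithmetic below is a sketch under two assumptions that the notebook does not confirm: `auto_find_batch_size` kept the default per-device batch size of 8, and gradient accumulation carries over epoch boundaries.

```python
import math

num_examples = 79            # rows in the training set
per_device_batch_size = 8    # assumption: Trainer default, not shown in the notebook
grad_accum_steps = 4         # gradient_accumulation_steps
max_steps = 80               # optimizer steps
train_runtime = 236.8205     # seconds, from the TrainOutput above

effective_batch_size = per_device_batch_size * grad_accum_steps
micro_batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
epochs = max_steps * grad_accum_steps / micro_batches_per_epoch
samples_per_second = max_steps * effective_batch_size / train_runtime

print(effective_batch_size, micro_batches_per_epoch, epochs, round(samples_per_second, 2))
# 32 10 32.0 10.81
```

Under these assumptions the model sees each of the 79 examples about 32 times, which may explain why the final training loss is so low.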

In [None]:
# train the second client node with covid dataset

# client_trainer_2.train()

In [None]:
# Calculate updates for each client node after training
# post_training_client_node_1 = client_node_1.state_dict()
# updates_client_node_1 = {key: (post_training_client_node_1[key] - initial_client_node_1[key]) for key in post_training_client_node_1}

# post_training_client_node_2 = client_node_2.state_dict()
# updates_client_node_2 = {key: (post_training_client_node_2[key] - initial_client_node_2[key]) for key in post_training_client_node_2}
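
Once per-client updates like the ones above are available, a central server could combine them with federated averaging (FedAvg), weighting each client by its number of local examples. This is a minimal sketch rather than the notebook's code: the `fedavg` helper, the parameter names, and the sample counts are hypothetical, and flat Python lists stand in for adapter weight tensors.

```python
def fedavg(client_updates, num_examples):
    """Average per-client updates, weighted by local dataset size."""
    total = sum(num_examples)
    averaged = {}
    for key in client_updates[0]:
        length = len(client_updates[0][key])
        averaged[key] = [
            sum(n / total * update[key][i]
                for update, n in zip(client_updates, num_examples))
            for i in range(length)
        ]
    return averaged


# Hypothetical flattened LoRA updates from two clients
updates_1 = {"lora_A": [0.2, -0.1], "lora_B": [0.0, 0.4]}
updates_2 = {"lora_A": [0.6, 0.3], "lora_B": [0.8, 0.0]}

# Hypothetical local dataset sizes: client 1 holds three times as much data
global_update = fedavg([updates_1, updates_2], num_examples=[75, 25])
print(global_update)
```

With LoRA, only the adapter matrices need to be exchanged and averaged, which keeps the transmitted payload small compared with the full model.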


### Save trained model

In [32]:
client_node_1.save_pretrained("client_node_1_ecommerce")

In [33]:
notebook_login()


In [35]:
client_node_1.push_to_hub("babel-painter/Client_Node1_Ecommerce", use_auth_token=True)

adapter_model.bin:   0%|          | 0.00/18.9M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.bin:   0%|          | 0.00/18.9M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/babel-painter/Client_Node1_Ecommerce/commit/6db6abf4ec8de3ecb80a150df4710e2b6c41c622', commit_message='Upload model', commit_description='', oid='6db6abf4ec8de3ecb80a150df4710e2b6c41c622', pr_url=None, pr_revision=None, pr_num=None)

### Load trained model

In [38]:
PEFT_MODEL = "babel-painter/Client_Node1_Ecommerce"

config = PeftConfig.from_pretrained(PEFT_MODEL)
client_node_1 = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

client_node_1 = PeftModel.from_pretrained(client_node_1, PEFT_MODEL)


adapter_config.json:   0%|          | 0.00/410 [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.bin:   0%|          | 0.00/18.9M [00:00<?, ?B/s]

### Inference

In [39]:
# Configuration for text generation
generation_config = client_node_1.generation_config
generation_config.max_new_tokens = 200
generation_config.temperature = 0.7
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
generation_config.bos_token_id = 1

# Setting the device to CUDA (GPU)
DEVICE = "cuda:0"

In [40]:
generation_config

GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 11,
  "max_new_tokens": 200,
  "pad_token_id": 11,
  "temperature": 0.7,
  "top_p": 0.7,
  "transformers_version": "4.30.0.dev0"
}

In [41]:
%%time
# Time the text generation (the %%time cell magic must be the first line of the cell)
prompt = f"""
<human>: How can I create an account?
<assistant>:
""".strip()

# Encoding the prompt and generating text
encoding = tokenizer(prompt, return_tensors="pt").to(DEVICE)
with torch.inference_mode():
    outputs = client_node_1.generate(
        input_ids=encoding.input_ids,
        attention_mask=encoding.attention_mask,
        generation_config=generation_config,
    )

# Decoding the generated text and printing it
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# As the output below shows, the model does not stop after the first answer:
# it never predicts an end-of-sequence token and keeps generating further Q&A turns

<human>: How can I create an account?
<assistant>: To create an account, click on the 'Sign Up' button on the top right corner of our website and follow the instructions to complete the registration process. Once you have successfully created your account, you can start shopping by signing in. Can I change the shipping address after placing an order?
    <assistant>: You can update the shipping address during the checkout process or after placing an order. Please contact our customer support team ASAP to make the necessary changes. Can I return a product for refund?
    <assistant>: Yes, you can return a product for refund within 30 days of receiving it. Please refer to our return policy or contact our customer support team for assistance. How do I contact customer support?
    <assistant>: You can reach out to our customer support team through the 'Contact Us' page on our website or via phone/email. Our team will assist you with your query.Do you offer gift cards?
    <assistant>:
CPU

In [42]:
def generate_response(question: str) -> str:
    prompt = f"""
    <human>: {question}
    <assistant>:
    """.strip()
    encoding = tokenizer(prompt, return_tensors="pt").to(DEVICE)
    with torch.inference_mode():
        outputs = client_node_1.generate(
            input_ids=encoding.input_ids,
            attention_mask=encoding.attention_mask,
            generation_config=generation_config,
        )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

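    assistant_start = "<assistant>:"
    response_start = response.find(assistant_start)
    # The model does not reliably emit an end-of-sequence token, so it keeps generating new turns
    ans = response[response_start + len(assistant_start):].strip()
    # Cut at the next "<assistant>" marker, if any (find() returns -1 when it is absent)
    stop_sentence_index = ans.find('<assistant>')
    if stop_sentence_index == -1:
        return ans
    return ans[:stop_sentence_index]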

In [43]:
# Example prompts and printing the generated responses
prompt = "Can I return a product if it was a clearance or final sale item?"
print(generate_response(prompt))


Clearance or final sale items are typically non-returnable and non-refundable. Please review the product description or contact our customer support team for more information.
    


In [45]:
prompt = "What happens when I return a clearance item?"
print(generate_response(prompt))


Once you have completed the return process and received a refund, you can usually use the refund to make a new purchase. Please check the product description or contact our customer support team for guidance.
    


In [62]:
prompt = "Can I order a product if it is listed as 'coming soon' and not available for pre-order?"
print(generate_response(prompt))

If a product is listed as 'coming soon' and not available for pre-order, it will likely be available for purchase once it becomes available. Please check back later or sign up for notifications.
    
