How do we switch between different LoRA configs?

Can we train multiple LoRA adapters for the same task? For different tasks in parallel?

Given a list of adapters, how do we select or load one?




First, let's review how to add different adapters and load them during training and inference.

In [1]:
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification, AutoTokenizer

from peft import LoraConfig, get_peft_model



In [2]:
peft_model_name = 'roberta-base-peft'

base_model = 'roberta-base'

In [3]:
##
model=AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path=base_model)
model

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
             

In [4]:
model_total_parameters=sum(p.numel() for p in model.parameters())
model_parameters_trainable=sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'total parameters: {model_total_parameters} | total_trainable_parameters={model_parameters_trainable}')

total parameters: 124647170 | total_trainable_parameters=124647170


In [5]:
##loading the peft model
peft_config=LoraConfig(inference_mode=False)

peft_model=get_peft_model(model=model,
                          peft_config=peft_config)

In [6]:
peft_model.print_trainable_parameters() ## only ~0.24% of the parameters are trainable

trainable params: 294,912 || all params: 124,942,082 || trainable%: 0.2360389672392365
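The 294,912 trainable parameters can be checked by hand. A minimal sketch, assuming LoRA is applied only to the `query` and `value` projections (the PEFT default target modules for RoBERTa, which the saved config further below also lists as `target_modules={'query', 'value'}`) in all 12 layers, with hidden size 768 and the default rank r=8. Each adapted `d_in x d_out` linear layer gets `A` (r x d_in) and `B` (d_out x r), i.e. r·(d_in + d_out) extra weights; `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in: int, d_out: int, r: int, n_modules: int) -> int:
    """Extra weights LoRA adds: A is (r x d_in) and B is (d_out x r) per adapted module."""
    return n_modules * r * (d_in + d_out)

hidden = 768    # RoBERTa-base hidden size
layers = 12     # encoder layers
targets = 2     # query and value projections in each layer
r = 8           # LoraConfig default rank

lora_params = lora_param_count(hidden, hidden, r, layers * targets)
base_params = 124_647_170          # total printed above for the plain model
total = base_params + lora_params

print(f"trainable params: {lora_params:,} || all params: {total:,} "
      f"|| trainable%: {100 * lora_params / total:.4f}")
# trainable params: 294,912 || all params: 124,942,082 || trainable%: 0.2360
```

The result matches `print_trainable_parameters()` exactly, which also shows that the randomly initialised classifier head is not among the trainable parameters in this setup.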


In [7]:
peft_model

PeftModel(
  (base_model): LoraModel(
    (model): RobertaForSequenceClassification(
      (roberta): RobertaModel(
        (embeddings): RobertaEmbeddings(
          (word_embeddings): Embedding(50265, 768, padding_idx=1)
          (position_embeddings): Embedding(514, 768, padding_idx=1)
          (token_type_embeddings): Embedding(1, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): RobertaEncoder(
          (layer): ModuleList(
            (0-11): 12 x RobertaLayer(
              (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                  (query): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=768, bias=True)
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                    )
                    (lora_A): ModuleDict(
                      (default): Linear

Adding an adapter works much like loading an adapter from the Hub onto a peft_model: each `add_adapter` call registers a new named adapter (its config plus freshly initialised LoRA A/B matrices, visible as extra entries such as `adapter1` in the `ModuleDict`s below). The frozen base weights are not changed. During training and inference we can switch to any of these adapters using the methods explained below. More details are in the previous Colab notebook, 09_PEFT.ipynb.

In [8]:


peft_model.add_adapter('adapter1',peft_config)

In [9]:
peft_model

PeftModel(
  (base_model): LoraModel(
    (model): RobertaForSequenceClassification(
      (roberta): RobertaModel(
        (embeddings): RobertaEmbeddings(
          (word_embeddings): Embedding(50265, 768, padding_idx=1)
          (position_embeddings): Embedding(514, 768, padding_idx=1)
          (token_type_embeddings): Embedding(1, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): RobertaEncoder(
          (layer): ModuleList(
            (0-11): 12 x RobertaLayer(
              (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                  (query): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=768, bias=True)
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                      (adapter1): Identity()
                    )
                    (lora_A): Module

In [10]:
peft_model.print_trainable_parameters()

trainable params: 294,912 || all params: 125,236,994 || trainable%: 0.23548313527870207
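Adding `adapter1` grew "all params" by exactly one adapter's worth while "trainable params" stayed at 294,912. A small arithmetic sketch of that bookkeeping, assuming every adapter uses the same default config (r=8 on `query`/`value`) and that only the single active adapter is trainable, as the output above shows; `param_summary` is a hypothetical helper and the 4-adapter total is a computed expectation, not a printed result.

```python
BASE_PARAMS = 124_647_170      # plain RobertaForSequenceClassification (printed earlier)
LORA_PER_ADAPTER = 294_912     # one r=8 adapter on query/value in 12 layers

def param_summary(n_adapters: int) -> tuple[int, int]:
    """(trainable, total) with n_adapters identical LoRA adapters attached.

    Every adapter adds its weights to the total, but only the active adapter
    is trainable (the classifier head is not in modules_to_save here).
    """
    total = BASE_PARAMS + n_adapters * LORA_PER_ADAPTER
    return LORA_PER_ADAPTER, total

for n in (1, 2, 4):   # 'default'; + adapter1; + adapter2 and adapter3
    trainable, total = param_summary(n)
    print(f"{n} adapter(s): trainable {trainable:,} | all {total:,}")
```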


In [11]:
peft_model.add_adapter('adapter2',peft_config)
peft_model.add_adapter('adapter3',peft_config)

In [12]:
peft_model

PeftModel(
  (base_model): LoraModel(
    (model): RobertaForSequenceClassification(
      (roberta): RobertaModel(
        (embeddings): RobertaEmbeddings(
          (word_embeddings): Embedding(50265, 768, padding_idx=1)
          (position_embeddings): Embedding(514, 768, padding_idx=1)
          (token_type_embeddings): Embedding(1, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): RobertaEncoder(
          (layer): ModuleList(
            (0-11): 12 x RobertaLayer(
              (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                  (query): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=768, bias=True)
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                      (adapter1): Identity()
                      (adapter2): Identity()
             

In [13]:
peft_model.active_adapters

['default']

In [14]:
peft_model.set_adapter('adapter1')  # set_adapter makes 'adapter1' the active adapter (check it with peft_model.active_adapters)

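### peft_model.set_adapter
This method sets the active adapter, so we can pick a different adapter before training or inference. Only one adapter is active at a time; in recent PEFT versions `set_adapter` also marks that adapter as trainable.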

In [15]:
peft_model.set_adapter('adapter1')

In [16]:
peft_model

PeftModel(
  (base_model): LoraModel(
    (model): RobertaForSequenceClassification(
      (roberta): RobertaModel(
        (embeddings): RobertaEmbeddings(
          (word_embeddings): Embedding(50265, 768, padding_idx=1)
          (position_embeddings): Embedding(514, 768, padding_idx=1)
          (token_type_embeddings): Embedding(1, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): RobertaEncoder(
          (layer): ModuleList(
            (0-11): 12 x RobertaLayer(
              (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                  (query): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=768, bias=True)
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                      (adapter1): Identity()
                      (adapter2): Identity()
             

#### Testing different adapter configs during training and inference

In [4]:
import transformers
from transformers import AutoTokenizer
from transformers import DataCollatorWithPadding
import numpy as np

import datasets



In [18]:


tokenizer=AutoTokenizer.from_pretrained(pretrained_model_name_or_path=base_model)
dataset=datasets.load_dataset('ag_news')

def tokenize_example(example_dataset):
    text=example_dataset['text']
    return tokenizer(text,padding=True,truncation=True)

tokenize_dataset=dataset.map(tokenize_example,
                             batched=True,
                             remove_columns=['text'])
num_labels = dataset['train'].features['label'].num_classes
classnames=tokenize_dataset['train'].features['label'].names
print(f"number of labels: {num_labels}")
print(f"the labels: {classnames}")

id2label={i:label for i,label in enumerate(classnames)}
print(f'id2label: {id2label}')

## We pad again here even though tokenize_example already padded. With map(batched=True), each chunk
## (1,000 rows by default) is padded only to its own longest example, so rows from different chunks have
## different lengths. Trainer shuffles rows into new training batches, so DataCollatorWithPadding
## re-pads every batch to its longest sequence.
data_collator=DataCollatorWithPadding(tokenizer=tokenizer,padding=True,return_tensors='pt')

number of labels: 4
the labels: ['World', 'Sports', 'Business', 'Sci/Tech']
id2label: {0: 'World', 1: 'Sports', 2: 'Business', 3: 'Sci/Tech'}
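Why pad twice? A plain-Python sketch of dynamic padding, assuming pad id 1 (RoBERTa's `<pad>`, matching `padding_idx=1` in the embeddings above) and made-up token ids. `pad_examples` and `to_rows` are hypothetical helpers that mimic the idea, not the actual `DataCollatorWithPadding` implementation.

```python
from typing import Dict, List

PAD_ID = 1  # RoBERTa's <pad> id (padding_idx=1 in the embeddings above)

def pad_examples(examples: List[Dict[str, List[int]]], pad_id: int = PAD_ID) -> Dict[str, List[List[int]]]:
    """Pad input_ids and attention_mask to the longest example in *this* group."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch = {"input_ids": [], "attention_mask": []}
    for ex in examples:
        n_pad = max_len - len(ex["input_ids"])
        mask = ex.get("attention_mask", [1] * len(ex["input_ids"]))
        batch["input_ids"].append(ex["input_ids"] + [pad_id] * n_pad)
        batch["attention_mask"].append(mask + [0] * n_pad)
    return batch

def to_rows(batch: Dict[str, List[List[int]]]) -> List[Dict[str, List[int]]]:
    """Split a padded group back into per-example rows, as dataset.map stores them."""
    return [{"input_ids": i, "attention_mask": m}
            for i, m in zip(batch["input_ids"], batch["attention_mask"])]

# Two map() chunks, each padded only to its own longest example
chunk_a = to_rows(pad_examples([{"input_ids": [0, 5, 2]}, {"input_ids": [0, 5, 6, 7, 2]}]))
chunk_b = to_rows(pad_examples([{"input_ids": [0, 9, 9, 9, 9, 9, 2]}]))
print(len(chunk_a[0]["input_ids"]), len(chunk_b[0]["input_ids"]))   # 5 7

# A shuffled training batch mixing rows from both chunks must be padded again
batch = pad_examples([chunk_a[0], chunk_b[0]])
print(batch["input_ids"][0])        # [0, 5, 2, 1, 1, 1, 1]
print(batch["attention_mask"][0])   # [1, 1, 1, 0, 0, 0, 0]
```

A common alternative is to drop `padding=True` from `tokenize_example` and let the collator do all the padding, so short examples are not padded to the longest one in their 1,000-row `map` chunk.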


In [19]:
seed=101
train_dataset=tokenize_dataset['train'].shuffle(seed=seed).select(range(2000))
eval_dataset=tokenize_dataset['test'].shuffle(seed=seed).select(range(2000))

In [20]:
### creating different LoRA adapter configs
peft_config=LoraConfig(inference_mode=False)   # PEFT defaults (r=8, lora_alpha=8, lora_dropout=0.0) -> 'default' adapter
peft_config_1=LoraConfig(r=4,lora_alpha=16,inference_mode=False,lora_dropout=0.1)   # -> 'adapter1'
peft_config_2=LoraConfig(inference_mode=False,r=8,lora_alpha=32,lora_dropout=0.3)   # -> 'adapter2'
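How the three configs differ: a plain-Python sketch, assuming the PEFT defaults (r=8, lora_alpha=8, lora_dropout=0.0) for unspecified fields and the standard LoRA scaling alpha/r (i.e. `use_rslora=False`). The `LoraSettings` dataclass is hypothetical and only mirrors the fields compared here; it is not the PEFT class.

```python
from dataclasses import dataclass

@dataclass
class LoraSettings:
    """Hypothetical mirror of the LoraConfig fields compared here (not the PEFT class)."""
    name: str
    r: int = 8               # LoraConfig default rank
    lora_alpha: int = 8      # LoraConfig default alpha
    lora_dropout: float = 0.0

    @property
    def scaling(self) -> float:
        # Standard LoRA adds (alpha / r) * B @ A to each adapted weight
        return self.lora_alpha / self.r

settings = [
    LoraSettings("default"),                                         # peft_config
    LoraSettings("adapter1", r=4, lora_alpha=16, lora_dropout=0.1),  # peft_config_1
    LoraSettings("adapter2", r=8, lora_alpha=32, lora_dropout=0.3),  # peft_config_2
]
for s in settings:
    print(f"{s.name}: r={s.r}, alpha={s.lora_alpha}, scaling={s.scaling:g}, dropout={s.lora_dropout}")
```

So both extra adapters scale their update 4x more strongly than `default`, while `adapter1` (r=4) has half as many LoRA weights as the r=8 adapters (147,456 vs 294,912 on `query`/`value` in 12 layers).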


In [21]:
model=AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path=base_model,id2label=id2label)
peft_model=get_peft_model(model=model,
                          peft_config=peft_config)


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [22]:
## add adapter
peft_model.add_adapter('adapter1',peft_config=peft_config_1)
peft_model.add_adapter('adapter2',peft_config=peft_config_2)

In [23]:
peft_model

PeftModel(
  (base_model): LoraModel(
    (model): RobertaForSequenceClassification(
      (roberta): RobertaModel(
        (embeddings): RobertaEmbeddings(
          (word_embeddings): Embedding(50265, 768, padding_idx=1)
          (position_embeddings): Embedding(514, 768, padding_idx=1)
          (token_type_embeddings): Embedding(1, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): RobertaEncoder(
          (layer): ModuleList(
            (0-11): 12 x RobertaLayer(
              (attention): RobertaAttention(
                (self): RobertaSelfAttention(
                  (query): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=768, bias=True)
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                      (adapter1): Dropout(p=0.1, inplace=False)
                      (adapter2): Dropo

In [5]:
import evaluate

metrics=evaluate.load('accuracy')

def compute_metrics(eval_pred):
    logits,label=eval_pred
    pred_label=np.argmax(logits,axis=-1)
    return metrics.compute(predictions=pred_label,references=label)

#### Training with LoRA config 1 (adapter1)

In [25]:
from transformers import TrainingArguments,Trainer

In [26]:
peft_model.set_adapter('adapter1')
saved_dire='../saved_weight/12_config1_lora'
args_1=TrainingArguments(output_dir=saved_dire)
trainer_1=Trainer(model=peft_model,
                  args=args_1,
                  train_dataset=train_dataset,
                  eval_dataset=eval_dataset,
                  compute_metrics=compute_metrics,
                  data_collator=data_collator
                  )

trainer_1.train()




| Step | Training Loss |
|------|---------------|
| 500  | 1.3363        |


TrainOutput(global_step=750, training_loss=1.219131795247396, metrics={'train_runtime': 149.721, 'train_samples_per_second': 40.075, 'train_steps_per_second': 5.009, 'total_flos': 957729935645376.0, 'train_loss': 1.219131795247396, 'epoch': 3.0})

In [27]:
trainer_1.evaluate()

{'eval_runtime': 17.9652,
 'eval_samples_per_second': 111.326,
 'eval_steps_per_second': 13.916,
 'epoch': 3.0}

#### Training with LoRA config 2 (adapter2)

In [28]:
peft_model.set_adapter('adapter2')
saved_dire='../saved_weight/12_config2_lora'
args_2=TrainingArguments(output_dir=saved_dire)
trainer_2=Trainer(model=peft_model,
                  args=args_2,
                  train_dataset=train_dataset,
                  eval_dataset=eval_dataset,
                  compute_metrics=compute_metrics,
                  data_collator=data_collator
                  )

trainer_2.train()



| Step | Training Loss |
|------|---------------|
| 500  | 1.1626        |


TrainOutput(global_step=750, training_loss=0.9998443806966146, metrics={'train_runtime': 151.0823, 'train_samples_per_second': 39.713, 'train_steps_per_second': 4.964, 'total_flos': 957729935645376.0, 'train_loss': 0.9998443806966146, 'epoch': 3.0})

In [29]:
trainer_2.evaluate()

{'eval_runtime': 18.1284,
 'eval_samples_per_second': 110.324,
 'eval_steps_per_second': 13.79,
 'epoch': 3.0}

In [7]:
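saved_dire='../saved_weight/12_config_lora'
peft_model.save_pretrained(saved_dire)   # 'default' adapter in saved_dire/, adapter1 and adapter2 in saved_dire/adapter1 and saved_dire/adapter2
tokenizer.save_pretrained(saved_dire)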

## Loading both adapters

In [28]:
from peft import PeftConfig,PeftModel
from transformers import AutoModelForSequenceClassification,AutoTokenizer

In [30]:
## Path of the saved model directory
saved_dire='../saved_weight/12_config_lora'

In [45]:
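## Loading the base model and tokenizer. saved_dire holds only adapter files; the warning below shows the weights come from roberta-base with a freshly initialised classifier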
id2label={0: 'World', 1: 'Sports', 2: 'Business', 3: 'Sci/Tech'}
base_model=AutoModelForSequenceClassification.from_pretrained(saved_dire,id2label= id2label)
tokenizer=AutoTokenizer.from_pretrained(saved_dire)

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [46]:
##loading the adapter1_config and adapter2_config
adapter1_config=PeftConfig.from_pretrained(saved_dire+'/adapter1')
adapter2_config=PeftConfig.from_pretrained(saved_dire+'/adapter2')

In [47]:
print(f'adapter1_config: {adapter1_config}')
print(f'adapter2_config: {adapter2_config}')

adapter1_config: LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping={'base_model_class': 'RobertaForSequenceClassification', 'parent_library': 'transformers.models.roberta.modeling_roberta'}, base_model_name_or_path='roberta-base', revision=None, task_type=None, inference_mode=True, r=4, target_modules={'query', 'value'}, lora_alpha=16, lora_dropout=0.1, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, use_dora=False, layer_replication=None)
adapter2_config: LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping={'base_model_class': 'RobertaForSequenceClassification', 'parent_library': 'transformers.models.roberta.modeling_roberta'}, base_model_name_or_path='roberta-base', revision=None, task_type=None, inference_mode=True, r=8, target_modules={'query', 'value'}, lora_alpha=3

In [48]:

# Wrap the base model and load the adapter saved at the root of saved_dire (named 'default')
peft_model = PeftModel.from_pretrained(base_model, saved_dire)

# Load adapter1 and adapter2
peft_model.load_adapter(saved_dire + '/adapter1', adapter_name='adapter1')
peft_model.load_adapter(saved_dire + '/adapter2', adapter_name='adapter2')

_IncompatibleKeys(missing_keys=['base_model.model.roberta.embeddings.word_embeddings.weight', 'base_model.model.roberta.embeddings.position_embeddings.weight', 'base_model.model.roberta.embeddings.token_type_embeddings.weight', 'base_model.model.roberta.embeddings.LayerNorm.weight', 'base_model.model.roberta.embeddings.LayerNorm.bias', 'base_model.model.roberta.encoder.layer.0.attention.self.query.base_layer.weight', 'base_model.model.roberta.encoder.layer.0.attention.self.query.base_layer.bias', 'base_model.model.roberta.encoder.layer.0.attention.self.query.lora_A.default.weight', 'base_model.model.roberta.encoder.layer.0.attention.self.query.lora_A.adapter1.weight', 'base_model.model.roberta.encoder.layer.0.attention.self.query.lora_B.default.weight', 'base_model.model.roberta.encoder.layer.0.attention.self.query.lora_B.adapter1.weight', 'base_model.model.roberta.encoder.layer.0.attention.self.key.weight', 'base_model.model.roberta.encoder.layer.0.attention.self.key.bias', 'base_mode

In [59]:
import torch
import torch.nn.functional as F

def classify(peft_model, text, adapter_name: str):
    # Activate the requested adapter
    peft_model.set_adapter(adapter_name)
    peft_model.eval()  # disable dropout for inference
    # Tokenize the input text
    inputs = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
    # Forward pass without tracking gradients
    with torch.no_grad():
        output = peft_model(**inputs)
    # Get the predicted class and confidence
    probabilities = F.softmax(output.logits, dim=-1)
    prediction = probabilities.argmax(dim=-1).item()
    confidence = probabilities[0, prediction].item()
    print(f'Adapter: {adapter_name} | Text: {text} | Class: {prediction} | Label: {id2label[prediction]} | Confidence: {confidence:.2%}')



In [60]:
text1="Kederis proclaims innocence Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his ..."
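text2="Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again."  # '\\b' keeps the dataset's literal "\b"; a plain "\b" is a backspace, hence "dwindlinand" in the output below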


In [61]:
classify(peft_model,text1,'adapter1')
classify(peft_model,text1,'adapter2')

Adapter: adapter1 | Text: Kederis proclaims innocence Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his ... | Class: 2 | Label: Business | Confidence: 30.06%
Adapter: adapter2 | Text: Kederis proclaims innocence Olympic champion Kostas Kederis today left hospital ahead of his date with IOC inquisitors claiming his ... | Class: 2 | Label: Business | Confidence: 29.52%


In [58]:
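## Business is the correct label for text2, but both adapters also labelled the Sports story above as
## Business with ~30% confidence (near chance for 4 classes). Likely causes: only 2,000 training examples,
## and the classifier head was never trainable (modules_to_save=None), so it is effectively random.
classify(peft_model,text2,'adapter1')
classify(peft_model,text2,'adapter2')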

Adapter: adapter1 | Text: Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindlinand of ultra-cynics, are seeing green again. | Class: 2 | Label: Business | Confidence: 36.23%
Adapter: adapter2 | Text: Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindlinand of ultra-cynics, are seeing green again. | Class: 2 | Label: Business | Confidence: 31.64%


tensor([[-0.2170, -0.1073,  0.2246,  0.1906]], grad_fn=<AddmmBackward0>)
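What the "Confidence" values mean: applying softmax to the stray logits tensor printed above gives about 30.06% for class 2 (Business), the same confidence printed for text1 with adapter1, with all four probabilities close to the uniform 25%. Logits this close together are what one would expect from a classification head that was never trained, consistent with `task_type=None` and `modules_to_save=None` in the loaded configs. A stdlib-only sketch:

```python
import math

# Stray logits printed above (one text, 4 AG News classes)
logits = [-0.2170, -0.1073, 0.2246, 0.1906]
id2label = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]            # softmax
pred = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 4) for p in probs])   # [0.1933, 0.2157, 0.3006, 0.2905]
print(id2label[pred], f"{probs[pred]:.2%}")   # Business 30.06%
```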

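### What if save_pretrained only saved the adapter weights?

Load the base model and tokenizer from the Hub and keep everything else the same. **Make sure the classifier head is also trained and saved when you train (for example with `task_type="SEQ_CLS"` or `modules_to_save=["classifier"]` in `LoraConfig`); otherwise it is re-initialised randomly every time the model is loaded.**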



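```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base_model_name = 'roberta-base'  # keep the name separate so it is not overwritten by the model object
base_model = AutoModelForSequenceClassification.from_pretrained(base_model_name, id2label=id2label)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
```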


In [64]:

## Path of the saved model directory
# base_model_name = 'roberta-base'
# saved_dire='../saved_weight/12_config_lora'
# ## loading the "Pretrained" base model and "Pretrained" tokenizer
# id2label={0: 'World', 1: 'Sports', 2: 'Business', 3: 'Sci/Tech'}
# ## we will load base_model from hub and only use adapter
# base_model=AutoModelForSequenceClassification.from_pretrained(base_model_name,id2label= id2label)
# tokenizer=AutoTokenizer.from_pretrained(base_model_name)
# ##loading the adapter1_config and adapter2_config
# adapter1_config=PeftConfig.from_pretrained(saved_dire+'/adapter1')
# adapter2_config=PeftConfig.from_pretrained(saved_dire+'/adapter2')
# print(f'adapter1_config: {adapter1_config}')
# print(f'adapter2_config: {adapter2_config}')

# # Wrap the base model and load the 'default' adapter from the root of saved_dire
# peft_model_ = PeftModel.from_pretrained(base_model, saved_dire)

# # Load adapter1 and adapter2
# peft_model_.load_adapter(saved_dire + '/adapter1', adapter_name='adapter1')
# peft_model_.load_adapter(saved_dire + '/adapter2', adapter_name='adapter2')