---
title: "PEFT - Adapter Tuning"
description: Parameter-efficient fine-tuning using adapters
author: "Uday"
date: "2024-09-18"
categories: [NLP, PEFT, Fine Tuning]
image: "images/adapters_1.png"
---

Large pre-trained language models (e.g., BERT, GPT) have revolutionized NLP by learning from massive amounts of unlabeled data. Transfer learning first pre-trains these models on large corpora and then fine-tunes them on smaller, task-specific datasets. However, fine-tuning all the parameters of a model like BERT is computationally expensive and inefficient, particularly when there are many downstream tasks, because each task then needs its own full copy of the model.

# Adapter Layers

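- `Adapters` are small, task-specific modules inserted into each layer of the pre-trained model (in the original paper, twice per Transformer layer: after the attention sub-layer and after the feed-forward sub-layer).
- Instead of fine-tuning all the parameters of the model, only the adapter parameters are updated during training for a specific task (the original paper also trains the layer-normalization parameters and the task-specific output layer). The rest of the model's parameters remain frozen.
- This significantly reduces the number of trainable parameters and, with it, the memory needed for gradients and optimizer state as well as the storage needed for each new task.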
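The freezing step can be sketched in plain Python. This is a minimal illustration, not the paper's code: `Param` is a hypothetical stand-in for a model parameter, and `freeze_all_but_adapters` is a hypothetical helper that keeps trainable only parameters whose name contains `"adapter"`. In PyTorch the same idea is usually expressed by looping over `model.named_parameters()` and setting each parameter's `requires_grad` flag. The layer sizes below (d = 768, feed-forward size 3072, bottleneck m = 64) are assumed for illustration, and the base layer's biases and layer norms are ignored.

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a model parameter: name, element count, trainable flag."""
    name: str
    numel: int
    requires_grad: bool = True


def freeze_all_but_adapters(params: list[Param], keyword: str = "adapter") -> tuple[int, int]:
    """Leave only parameters whose name contains `keyword` trainable.

    Returns (trainable, total) parameter counts.
    """
    for p in params:
        # Frozen unless the name marks it as part of an adapter.
        p.requires_grad = keyword in p.name
    trainable = sum(p.numel for p in params if p.requires_grad)
    total = sum(p.numel for p in params)
    return trainable, total


# Hypothetical single Transformer layer (assumed sizes; base biases and
# layer norms omitted for simplicity) with two bottleneck adapters.
d, d_ff, m = 768, 3072, 64
params = [
    Param("layer0.attention.weight", 4 * d * d),    # Q, K, V and output projections
    Param("layer0.ffn.weight", 2 * d * d_ff),       # two feed-forward matrices
    Param("layer0.adapter_attn", 2 * m * d + d + m),
    Param("layer0.adapter_ffn", 2 * m * d + d + m),
]

trainable, total = freeze_all_but_adapters(params)
print(f"trainable: {trainable} / {total} ({trainable / total:.1%})")
```

With these assumed sizes only about 2.7% of the layer's parameters receive gradient updates. A real setup may leave a few other small pieces trainable as well, such as layer norms or a classification head.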

# Adapter Design

![adapters](images/adapters_1.png "https://arxiv.org/pdf/1902.00751")


- Each adapter consists of a `down-projection`, a `non-linearity`, and an `up-projection`, as shown in the figure above, with a skip connection around the whole module.
- The down-projection maps the original d-dimensional features to a smaller dimension m, the non-linearity is applied, and the up-projection maps the result back to d dimensions. This bottleneck keeps the adapter small and efficient.
- Because of the skip connection, an adapter whose projection weights start near zero behaves approximately like the identity function, so inserting it does not disturb the pre-trained model at the start of training.
- The down-projection has `md` weights and `m` biases and the up-projection has `md` weights and `d` biases, so the total number of parameters added per adapter, including biases, is `2md + d + m`.
- By setting `m << d`, we limit the number of parameters added per task.
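The bottleneck adapter described above can be sketched in pure Python (no tensor library) to make the shapes and the parameter count concrete. This is a hypothetical illustration, not the paper's implementation: the class `Adapter` and the helpers `gelu` and `adapter_param_count` are names introduced here, GELU is an assumed choice of non-linearity, and the small Gaussian initialization scale (`init_std`) is an assumption. The sketch includes the skip connection used by the adapters in the original paper, so the output is the input plus the bottleneck's contribution.

```python
import math
import random


def gelu(x: float) -> float:
    """tanh approximation of GELU (assumed non-linearity for this sketch)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


class Adapter:
    """Hypothetical bottleneck adapter: d -> m -> d plus a skip connection."""

    def __init__(self, d: int, m: int, init_std: float = 1e-3, seed: int = 0):
        rng = random.Random(seed)
        # Down-projection: an m x d weight matrix and m biases.
        self.w_down = [[rng.gauss(0.0, init_std) for _ in range(d)] for _ in range(m)]
        self.b_down = [0.0] * m
        # Up-projection: a d x m weight matrix and d biases.
        self.w_up = [[rng.gauss(0.0, init_std) for _ in range(m)] for _ in range(d)]
        self.b_up = [0.0] * d

    def num_params(self) -> int:
        # Count every weight and bias actually stored in the module.
        return (sum(len(row) for row in self.w_down) + len(self.b_down)
                + sum(len(row) for row in self.w_up) + len(self.b_up))

    def __call__(self, h: list[float]) -> list[float]:
        # Down-project from d to m dimensions, then apply the non-linearity.
        z = [gelu(sum(w * x for w, x in zip(row, h)) + b)
             for row, b in zip(self.w_down, self.b_down)]
        # Up-project from m back to d dimensions.
        up = [sum(w * x for w, x in zip(row, z)) + b
              for row, b in zip(self.w_up, self.b_up)]
        # Skip connection: with near-zero weights the adapter is close to identity.
        return [x + u for x, u in zip(h, up)]


def adapter_param_count(d: int, m: int) -> int:
    """Closed-form count 2md + d + m (weights of both projections plus biases)."""
    return 2 * m * d + d + m


adapter = Adapter(d=16, m=4)
h = [0.1 * i for i in range(16)]
out = adapter(h)
print(adapter.num_params(), adapter_param_count(16, 4))
print(adapter_param_count(768, 64))
```

The stored parameter count matches `2md + d + m` (148 for d = 16, m = 4), and because the weights start small, `out` stays very close to `h`. For an assumed BERT-base hidden size of d = 768 with m = 64, one adapter adds 99,136 parameters.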


# Results


![adapters](images/adapters_2.png "https://arxiv.org/pdf/1902.00751")

![adapters](images/adapters_3.png "https://arxiv.org/pdf/1902.00751")




Reference:

1. Houlsby et al., "Parameter-Efficient Transfer Learning for NLP", ICML 2019. https://arxiv.org/pdf/1902.00751