Adapters

ADAPTER TRAINING GUIDE

===========================================================================

THIS FILE EXPLAINS THE PROCESS OF TRAINING AND EVALUATING AN ADAPTER.

PREREQUISITES

How an adapter setup works: [MAD-X paper](https://arxiv.org/abs/2005.00052)
NER dataset preprocessing: https://www.youtube.com/watch?v=dzyDHMycx_c
Adapters: [Adapter Introduction](https://docs.adapterhub.ml/quickstart.html)

DATASET PREPROCESSING

Preprocessing is divided into three parts:

Language Adapter Preprocessing
Task Adapter Preprocessing
Evaluation Dataset Preprocessing

Set the label IDs for the NER tags: {"O": 0, "B-per": 1, "I-per": 2, "B-org": 3, "I-org": 4, "B-loc": 5, "I-loc": 6}

This is the default tag list for the BIO tagging format.
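A minimal sketch of the mapping above in Python, together with the inverse mapping that turns predicted IDs back into tag names. The helper names `encode_tags` and `decode_tags` are hypothetical and not taken from the notebooks.

```python
# Tag-to-ID mapping from the list above (BIO format).
label2id = {"O": 0, "B-per": 1, "I-per": 2, "B-org": 3, "I-org": 4, "B-loc": 5, "I-loc": 6}
# Inverse mapping, used to turn model predictions back into tag strings.
id2label = {idx: tag for tag, idx in label2id.items()}


def encode_tags(tags):
    """Convert a list of tag strings to integer IDs."""
    return [label2id[tag] for tag in tags]


def decode_tags(ids):
    """Convert a list of integer IDs back to tag strings."""
    return [id2label[i] for i in ids]


if __name__ == "__main__":
    ids = encode_tags(["B-per", "I-per", "O", "B-loc"])
    print(ids)               # [1, 2, 0, 5]
    print(decode_tags(ids))  # ['B-per', 'I-per', 'O', 'B-loc']
```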

Language Adapter Training:

We start by training the language adapter, which is trained on unlabeled text (MAD-X trains language adapters with masked language modeling).
In language.ipynb, set the directory of the unlabeled dataset and run the notebook; it saves the trained language adapter to your folder.
You can change the adapter name and output directory as needed.

NOTE: THE SAME PIPELINE CAN BE USED TO TRAIN THE TARGET LANGUAGE ADAPTER.
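The notebook's training loop is not reproduced here. As a rough illustration of how unlabeled text becomes a training signal, the sketch below shows BERT-style masked language modeling masking in pure Python: each non-special token is selected with probability `mlm_prob` (0.15 by default); a selected token is replaced by the mask ID 80% of the time, by a random token 10% of the time, and left unchanged otherwise. This is an assumption about the objective based on MAD-X, not code from language.ipynb; the token IDs are hypothetical, and in practice Hugging Face's `DataCollatorForLanguageModeling` performs this step.

```python
import random


def mask_tokens(input_ids, mask_id, vocab_size, special_ids, mlm_prob=0.15, rng=None):
    """BERT-style MLM masking (illustrative sketch, not the notebook's code).

    Returns (masked_ids, labels). labels hold the original ID at selected
    positions and -100 elsewhere, so the loss ignores unselected tokens.
    """
    rng = rng or random.Random()
    masked = list(input_ids)
    labels = [-100] * len(input_ids)
    for i, tok in enumerate(input_ids):
        # Never select special tokens such as <s>, </s> or padding.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        r = rng.random()
        if r < 0.8:
            masked[i] = mask_id                    # 80%: replace with the mask token
        elif r < 0.9:
            masked[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return masked, labels


if __name__ == "__main__":
    # Hypothetical IDs: 0 = <s>, 2 = </s>, 4 = <mask>, vocabulary of 100 tokens.
    ids = [0, 17, 23, 42, 57, 61, 88, 2]
    masked, labels = mask_tokens(ids, mask_id=4, vocab_size=100,
                                 special_ids={0, 2}, rng=random.Random(0))
    print(masked)
    print(labels)
```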

Task Adapter Training:

Run task.ipynb to train the TASK ADAPTER.
This adapter is trained on a labeled dataset; this project uses an NER dataset.
Set the path to the NER dataset. The preprocessing pipeline converts it into a Hugging Face DatasetDict, the format used for tokenizing the text. After the first preprocessing step, your output should be:

DatasetDict({ train: Dataset({ features: ['LABEL-1', 'LABEL-2'] }) })

NOTE: You can rename these columns; the NER dataset used here has the columns ['token', 'ner_tags'].
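The notebook's exact loader is not shown here. As an illustration, assuming the raw NER data is a CoNLL-style text file with one `word TAG` pair per line and a blank line between sentences (an assumption; the actual file format may differ), the first preprocessing step can be sketched in plain Python. The function name `read_conll` is hypothetical; the column names follow the note above. With the `datasets` library installed, the returned list can be wrapped as `DatasetDict({"train": Dataset.from_list(examples)})` to obtain the structure shown earlier.

```python
LABEL2ID = {"O": 0, "B-per": 1, "I-per": 2, "B-org": 3, "I-org": 4, "B-loc": 5, "I-loc": 6}


def read_conll(lines, label2id=LABEL2ID):
    """Group 'word TAG' lines into sentences; a blank line ends a sentence.

    Returns a list of {"token": [...], "ner_tags": [...]} dicts with integer tags.
    """
    examples, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            if tokens:  # close the current sentence
                examples.append({"token": tokens, "ner_tags": tags})
                tokens, tags = [], []
            continue
        parts = line.split()
        tokens.append(parts[0])              # first column: the word
        tags.append(label2id[parts[-1]])     # last column: the BIO tag
    if tokens:  # the last sentence may not be followed by a blank line
        examples.append({"token": tokens, "ner_tags": tags})
    return examples


if __name__ == "__main__":
    # For a real file: with open(path, encoding="utf-8") as f: examples = read_conll(f)
    sample = ["Ram B-per", "lives O", "in O", "Pune B-loc", "", "Infosys B-org", "hires O"]
    for ex in read_conll(sample):
        print(ex)
```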

The second preprocessing step tokenizes the text and maps the tokens to their corresponding input_ids. Your output should be:

DatasetDict({ train: Dataset({ features: ['LABEL-1', 'LABEL-2', 'input_ids', 'attention_mask', 'labels'], }) })
For training, remove the original LABEL columns ('LABEL-1', 'LABEL-2') from the dataset, keeping input_ids, attention_mask and labels.
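Because the tokenizer splits words into subwords, the word-level `ner_tags` must be aligned to tokens to build the `labels` column. A common convention, sketched below under that assumption, labels only the first subword of each word and sets special tokens and continuation subwords to -100 so the loss ignores them. `word_ids` stands for the per-token word index returned by a Hugging Face fast tokenizer's `word_ids()` method (with `None` for special tokens) after calling the tokenizer with `is_split_into_words=True`; `align_labels` is a hypothetical helper. In a full pipeline this would run inside `Dataset.map(..., remove_columns=[...])`, which also drops the original columns as described above.

```python
def align_labels(word_ids, word_labels, label_all_subwords=False):
    """Map word-level label IDs onto subword tokens.

    word_ids: word index for each token, None for special tokens.
    word_labels: one label ID per word.
    Returns one label per token, with -100 where the loss should be ignored.
    """
    labels = []
    previous = None
    for wid in word_ids:
        if wid is None:
            labels.append(-100)              # special token (<s>, </s>, padding)
        elif wid != previous:
            labels.append(word_labels[wid])  # first subword of a word
        else:
            # Continuation subword: ignored by default, or given the word's label as-is.
            labels.append(word_labels[wid] if label_all_subwords else -100)
        previous = wid
    return labels


if __name__ == "__main__":
    # Three words; word 0 and word 2 are each split into two subwords.
    word_ids = [None, 0, 0, 1, 2, 2, None]
    print(align_labels(word_ids, [1, 0, 5]))  # [-100, 1, -100, 0, 5, -100, -100]
```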
Load a pretrained model with adapter support and add your task adapter with model.add_adapter("Your_Task_Adapter"). This adapter is then stacked on top of the language adapter.

NOTE: The language adapter must be for the same language as the data used to train the task adapter.
Set your training parameters.
Save your task adapter for later evaluation.
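To make the stacking idea concrete, here is a toy, pure-Python illustration of two residual bottleneck adapters, each computing h + W_up · relu(W_down · h), applied in sequence: the language adapter first and the task adapter second, as in MAD-X. It is a conceptual sketch with made-up weights, not the adapters library's implementation. In the AdapterHub `adapters` library, the same composition is expressed with `model.active_adapters = Stack("lang_adapter", "Your_Task_Adapter")` (with `Stack` from `adapters.composition`), and `model.train_adapter("Your_Task_Adapter")` freezes all weights except the task adapter; the name "lang_adapter" is hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


class BottleneckAdapter:
    """Residual bottleneck adapter: h + W_up . relu(W_down . h)."""

    def __init__(self, w_down, w_up):
        self.w_down = w_down  # shape (bottleneck, hidden)
        self.w_up = w_up      # shape (hidden, bottleneck)

    def __call__(self, h):
        delta = matvec(self.w_up, relu(matvec(self.w_down, h)))
        return [a + d for a, d in zip(h, delta)]


def stack(*adapters):
    """Apply adapters in order: the output of one is the input of the next."""
    def forward(h):
        for adapter in adapters:
            h = adapter(h)
        return h
    return forward


if __name__ == "__main__":
    hidden = [2.0, 1.0]  # toy hidden state of size 2, bottleneck of size 1
    source_lang = BottleneckAdapter([[1.0, 0.0]], [[0.5], [0.0]])
    task = BottleneckAdapter([[0.0, 1.0]], [[0.0], [1.0]])
    print(stack(source_lang, task)(hidden))  # [3.0, 2.0]

    # Evaluation: same task adapter, target-language adapter swapped in.
    target_lang = BottleneckAdapter([[1.0, 0.0]], [[-0.5], [0.0]])
    print(stack(target_lang, task)(hidden))  # [1.0, 2.0]
```

Only the first adapter in the stack differs between the two calls, so the task adapter's weights are reused unchanged while the language-specific transformation is replaced.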

Evaluation:

We evaluate on the target language, using a labeled dataset in that language.
Preprocess this data the same way as the task adapter dataset, repeating the steps up to and including removing the LABEL columns.
Then load the trained TASK ADAPTER and replace the source language adapter with the target language adapter.
Finally, evaluate on the test split of the dataset.
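A sketch of decoding test-split predictions: take the highest-scoring label for each token and skip positions whose gold label is -100 (special tokens and continuation subwords). It assumes `logits` and `labels` are nested Python lists; if they come from Hugging Face `Trainer.predict`, the returned `predictions` and `label_ids` arrays can be converted with `.tolist()`. `decode_predictions` is a hypothetical helper.

```python
def decode_predictions(logits, labels, id2label):
    """Turn per-token scores into tag sequences, dropping ignored (-100) positions.

    logits: [sentence][token][label] scores; labels: [sentence][token] gold IDs.
    Returns (true_tags, pred_tags) as lists of tag-string lists.
    """
    true_tags, pred_tags = [], []
    for seq_logits, seq_labels in zip(logits, labels):
        t_seq, p_seq = [], []
        for scores, gold in zip(seq_logits, seq_labels):
            if gold == -100:
                continue  # not a scored position
            pred = max(range(len(scores)), key=scores.__getitem__)  # argmax
            t_seq.append(id2label[gold])
            p_seq.append(id2label[pred])
        true_tags.append(t_seq)
        pred_tags.append(p_seq)
    return true_tags, pred_tags


if __name__ == "__main__":
    id2label = {0: "O", 1: "B-per", 2: "I-per"}
    logits = [[[0.1, 0.9, 0.0], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]]
    labels = [[-100, 0, 2]]
    print(decode_predictions(logits, labels, id2label))
    # ([['O', 'I-per']], [['B-per', 'I-per']])
```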

SINCE WE USED AN NER DATASET, THE TAGS WERE HEAVILY IMBALANCED, SO DURING EVALUATION WE IGNORE THE DOMINANT TAG SO THAT IT DOES NOT INFLATE THE SCORES.
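The README does not specify the exact metric, so the following is one possible way to ignore the dominant tag: micro-averaged token-level precision, recall and F1 computed only over tags other than `O` (assuming `O` is the dominant tag). The example contrasts it with plain token accuracy, which rewards predicting `O` everywhere. The function names are hypothetical.

```python
def token_accuracy(true_tags, pred_tags):
    """Fraction of tokens whose predicted tag equals the gold tag (all tags counted)."""
    pairs = [(t, p) for ts, ps in zip(true_tags, pred_tags) for t, p in zip(ts, ps)]
    return sum(t == p for t, p in pairs) / len(pairs) if pairs else 0.0


def f1_ignoring(true_tags, pred_tags, ignore="O"):
    """Micro precision/recall/F1 over tokens, skipping the `ignore` tag."""
    tp = pred_count = gold_count = 0
    for t_seq, p_seq in zip(true_tags, pred_tags):
        for t, p in zip(t_seq, p_seq):
            if p != ignore:
                pred_count += 1  # predicted an entity tag
            if t != ignore:
                gold_count += 1  # gold entity tag
                if p == t:
                    tp += 1      # correct entity tag
    precision = tp / pred_count if pred_count else 0.0
    recall = tp / gold_count if gold_count else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [["O", "O", "O", "O", "O", "O", "B-per", "O"]]
    all_o = [["O"] * 8]
    print(token_accuracy(gold, all_o))  # 0.875, despite missing the only entity
    print(f1_ignoring(gold, all_o))     # {'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```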

YOU CAN ALSO TRY OTHER EVALUATION APPROACHES.
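One alternative is entity-level evaluation, as used in the CoNLL NER shared tasks: an entity counts as correct only if both its type and its exact span match the gold annotation. The sketch below extracts spans from BIO tags (treating an I- tag that does not continue an open entity as the start of a new one) and scores them. It is a simplified illustration with hypothetical function names; the `seqeval` package provides a maintained implementation.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from one BIO sequence; end is exclusive."""
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, label = tag.partition("-")
        # Close the open entity if this tag does not continue it.
        if etype is not None and not (prefix == "I" and label == etype):
            entities.add((etype, start, i))
            start, etype = None, None
        # Open a new entity on B-X, or on an I-X that does not continue one.
        if prefix in ("B", "I") and etype is None:
            start, etype = i, label
    return entities


def entity_f1(true_tags, pred_tags):
    """Micro precision/recall/F1 over exact (type, span) entity matches."""
    tp = n_pred = n_gold = 0
    for t_seq, p_seq in zip(true_tags, pred_tags):
        gold, pred = extract_entities(t_seq), extract_entities(p_seq)
        tp += len(gold & pred)
        n_pred += len(pred)
        n_gold += len(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [["B-per", "I-per", "O", "B-loc"]]
    pred = [["B-per", "O", "O", "B-loc"]]  # person span truncated, location correct
    print(entity_f1(gold, pred))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Compared with the token-level score, the truncated person entity earns no credit here, because only exact spans count.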

INSTALL THE REQUIRED PACKAGES FROM 'requirements.txt'

A-Mayank/Adapters

Folders and files

Latest commit

History

Repository files navigation

Adapters

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages