# Cross-lingual Transfer Using Stacked Language Adapters

Google Colab notebook for running the cross-lingual sentence retrieval experiments. Authored by Marcell Fekete.

Before running the notebook, make sure that the `colab` folder from the repository has been uploaded to your Google Drive.

## Install necessary libraries

```
!pip install -U adapter-transformers==2.3.0
!pip install datasets
!pip install faiss-gpu
```


## Import necessary libraries

```
from google.colab import drive
drive.mount("/content/gdrive")
```

Once the drive is mounted, the cell prints `Mounted at /content/gdrive`.
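Once the drive is mounted, a quick check like the following can confirm that the `colab` folder is reachable before any experiment is launched. It is a minimal sketch: the `COLAB_DIR` path and the `colab_folder_ready` helper are hypothetical, so point the path at wherever the folder actually sits in your Drive.

```python
import os

# Hypothetical path: adjust it to where the `colab` folder sits in your Drive.
COLAB_DIR = "/content/gdrive/MyDrive/colab"


def colab_folder_ready(colab_dir):
    """Return True if `colab_dir` exists and contains a config.ini file."""
    return os.path.isfile(os.path.join(colab_dir, "config.ini"))


print("colab folder found:", colab_folder_ready(COLAB_DIR))
```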


## Modify the config file

Modify the `config.ini` file found in the `colab` folder to carry out different experiments.

For the relevant language pair, update the folder paths on the `path_to_source`, `path_to_target`, and `path_to_labels` lines, as well as on the `output_directory` and `save_source_path` lines.

To define experimental setups, list experiment names on the `experiments` line, separated by whitespace. To run several experiments from the same config file, set the `multiconfig` option to `True`.
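As a rough illustration of how these options fit together, the sketch below reads them with Python's standard `configparser`. It assumes the keys sit in the `[DEFAULT]` section; the section name, every value, and the `read_experiment_settings` helper are hypothetical placeholders, and `main.py` may parse the file differently.

```python
import configparser

# Hypothetical config.ini contents: the keys come from this notebook, but the
# section name and all values are illustrative placeholders.
EXAMPLE_CONFIG = """
[DEFAULT]
path_to_source = data/en-hu/source.txt
path_to_target = data/en-hu/target.txt
path_to_labels = data/en-hu/labels.txt
output_directory = colab/output/en-hu
save_source_path = colab/output/en-hu/source
experiments = baseline en_hu en_hu_fi
multiconfig = True
"""

PATH_KEYS = ("path_to_source", "path_to_target", "path_to_labels",
             "output_directory", "save_source_path")


def read_experiment_settings(text, section="DEFAULT"):
    """Parse the path, experiment, and multiconfig options from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    cfg = parser[section]
    return {
        "paths": {key: cfg[key] for key in PATH_KEYS},
        # Experiment names are separated by whitespace.
        "experiments": cfg["experiments"].split(),
        # getboolean accepts True/False, yes/no, on/off and 1/0.
        "multiconfig": cfg.getboolean("multiconfig"),
    }


settings = read_experiment_settings(EXAMPLE_CONFIG)
print(settings["experiments"])  # ['baseline', 'en_hu', 'en_hu_fi']
print(settings["multiconfig"])  # True
```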

On the `source_adapters` and `target_adapters` lines, provide language adapter IDs from [AdapterHub](https://adapterhub.ml/explore/text_lang/). Separate the entries for different experiments with whitespace; the n-th entry on each line is used in the n-th experiment. To stack adapters within one experiment, join their IDs with a comma and no whitespace. If the string `None` is given instead of an adapter ID, no adapter is added.

Example:

```
source_adapters = None en en
target_adapters = None hu hu,fi
```

In this example, no language adapter is loaded to process source or target sentences in the first experiment (`None` and `None`). In the second experiment, source sentences are processed using the English language adapter (`en`), and target sentences are processed using the Hungarian one (`hu`). Finally, in the third experiment, the English adapter is still used to process the source sentences (`en`), but the Finnish language adapter is stacked on top of the Hungarian one to process the Hungarian sentences (`hu,fi`).
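The adapter notation can be unpacked with a few lines of Python. This sketch (the `parse_adapter_line` helper is hypothetical, not taken from `main.py`) turns each line into one list per experiment, where an empty list stands for `None` and a multi-item list is a stack listed bottom-first, so `hu,fi` places `fi` on top of `hu`.

```python
def parse_adapter_line(value):
    """Split a source_adapters/target_adapters value into one stack per experiment.

    Whitespace separates experiments, commas separate stacked adapter IDs,
    and the string "None" means no adapter for that experiment.
    """
    stacks = []
    for entry in value.split():
        if entry == "None":
            stacks.append([])                # no language adapter
        else:
            stacks.append(entry.split(","))  # bottom-first, e.g. ["hu", "fi"]
    return stacks


source = parse_adapter_line("None en en")
target = parse_adapter_line("None hu hu,fi")

# Both lines must describe the same number of experiments.
if len(source) != len(target):
    raise ValueError("source_adapters and target_adapters differ in length")

for i, (src, tgt) in enumerate(zip(source, target), start=1):
    print(f"experiment {i}: source={src} target={tgt}")
# experiment 1: source=[] target=[]
# experiment 2: source=['en'] target=['hu']
# experiment 3: source=['en'] target=['hu', 'fi']
```

With adapter-transformers, a non-empty list could then be activated as a stack, e.g. `model.active_adapters = Stack(*stack)` with `Stack` imported from `transformers.adapters.composition`; whether the project's code does exactly this is an assumption.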

## Run experiments

```
!python gdrive/MyDrive/Marcell_Fekete_Cross-Lingual_Transfer_Using_Stacked_Language_Adapters/src/main.py
```

Sample output (truncated):

```
Some weights of the model checkpoint at bert-base-multilingual-cased were not used when initializing BertModelWithHeads: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModelWithHeads from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModelWithHeads from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Adapter setup   Stack[en]   added and activated
Overwriting existing adapter 'en'.
Adapter setup   Stack[en, vi]   ad
```

Results are placed in the `colab/output` folder.