# Limitations of Using Pre-trained (Fine-tuned) Models

- To enable pretraining on large amounts of data, researchers often scrape all the content they can find on the internet (good and bad).

In [9]:
from transformers import pipeline

unmasker = pipeline(task="fill-mask",
                    model="bert-base-uncased")

result = unmasker("This man works as a [MASK].")
print([r["token_str"] for r in result])

['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']


Running this cell emits two warnings: <br>
1️⃣ First Warning (About `GenerationMixin`)<br>
2️⃣ Second Warning (Unused Weights in `BertForMaskedLM`)<br>

### 1️⃣ About `GenerationMixin`

⚠️ BertForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. 
From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
 <br><br>

- The warning indicates that `BertForMaskedLM` defines `prepare_inputs_for_generation`, a method used for text generation, but does not inherit directly from `GenerationMixin`.
- Why a *warning*?
    - `BertForMaskedLM` wasn't originally designed for text generation, but it does have some generation-related functionality.
    - From **v4.50** of `transformers` onwards, `PreTrainedModel` will no longer inherit from `GenerationMixin`, meaning **BERT models will lose the ability to generate text using `.generate()`**.
    - The `fill-mask` pipeline does not call `.generate()`, so the predictions shown here are unaffected.
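The inheritance change described in the warning can be illustrated with a toy class hierarchy. This is only a sketch: every class name below is hypothetical and none of them is the real `transformers` implementation. It shows why a class that overrides `prepare_inputs_for_generation` still loses `generate` once its base class stops inheriting the mixin.

```python
# Toy classes only: these names are hypothetical, not the transformers classes.
class ToyGenerationMixin:
    """Provides generate(), which relies on prepare_inputs_for_generation()."""

    def generate(self, prompt):
        return self.prepare_inputs_for_generation(prompt)


class OldBase(ToyGenerationMixin):
    """Old behaviour: the base class inherits the mixin for every model."""


class NewBase:
    """New behaviour: the base class no longer inherits the mixin."""


class OldMaskedLM(OldBase):
    def prepare_inputs_for_generation(self, prompt):
        return {"tokens": prompt.split()}


class NewMaskedLM(NewBase):
    # Still overrides the helper, but nothing provides generate() any more.
    def prepare_inputs_for_generation(self, prompt):
        return {"tokens": prompt.split()}


class NewMaskedLMWithMixin(NewBase, ToyGenerationMixin):
    # Inheriting the mixin directly restores generate().
    def prepare_inputs_for_generation(self, prompt):
        return {"tokens": prompt.split()}


print(hasattr(OldMaskedLM(), "generate"))
print(hasattr(NewMaskedLM(), "generate"))
print(hasattr(NewMaskedLMWithMixin(), "generate"))
```

In this toy setup only the middle class lacks `generate`, which mirrors the situation the warning describes for a model class that does not inherit from the mixin directly.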

In [10]:
from transformers import pipeline, AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"
unmasker = pipeline(task="fill-mask",
                    model=AutoModelForMaskedLM.from_pretrained(model_name),
                    tokenizer=AutoTokenizer.from_pretrained(model_name))

result = unmasker("This man works as a [MASK].")
print([r["token_str"] for r in result])

['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']


### 2️⃣ Unused Weights in `BertForMaskedLM`

⚠️ Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM:
- ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
<br>

- The warning indicates that **some weights from `bert-base-uncased` weren't used** when loading `BertForMaskedLM`.
- Why a *warning*?
    - The `bert-base-uncased` checkpoint was pre-trained with two objectives: masked language modeling (MLM) and next sentence prediction (NSP), a sentence-pair classification task.
    - `BertForMaskedLM` only needs **part of the weights**, so the **pooling layer (`bert.pooler.*`) and the next sentence prediction head (`cls.seq_relationship.*`) are ignored.**
- Is this an *issue*?
    - No. This is expected behavior when using a BERT model for `fill-mask` instead of full pre-training.
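As a rough illustration of where such a list of unused weights comes from, the sketch below compares the parameter names in a checkpoint with the names a model expects. It is a simplified assumption about the mechanism, not the actual `transformers` loading code, and the two key sets are heavily abbreviated for illustration; only the four "unused" names are taken from the warning above.

```python
# Names copied from the warning message above.
reported_unused = [
    "bert.pooler.dense.bias",
    "bert.pooler.dense.weight",
    "cls.seq_relationship.bias",
    "cls.seq_relationship.weight",
]

# Abbreviated, illustrative key sets (a real checkpoint has many more entries).
checkpoint_keys = {
    "bert.embeddings.word_embeddings.weight",
    "cls.predictions.bias",
    *reported_unused,
}
model_keys = {
    "bert.embeddings.word_embeddings.weight",
    "cls.predictions.bias",
}

# In the checkpoint but not needed by the model: reported as "not used".
unused = sorted(checkpoint_keys - model_keys)
# Needed by the model but absent from the checkpoint: would be newly initialized.
missing = sorted(model_keys - checkpoint_keys)

print("unused:", unused)
print("missing:", missing)
```

An empty `missing` list is the reassuring part: every weight the masked language model needs was found in the checkpoint. A non-empty one would mean some layers start from random values and need training before use.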

In [11]:
import logging
from transformers import pipeline, AutoModelForMaskedLM, AutoTokenizer

# Suppress warning messages from model loading:
# only records of level ERROR or higher from this logger are shown
logging.getLogger("transformers.modeling_utils").setLevel(logging.ERROR)

model_name = "bert-base-uncased"

unmasker = pipeline(
    task = "fill-mask",
    model = AutoModelForMaskedLM.from_pretrained(model_name),
    tokenizer = AutoTokenizer.from_pretrained(model_name)
)

result = unmasker("This man works as a [MASK].")
print([r["token_str"] for r in result])


['carpenter', 'lawyer', 'farmer', 'businessman', 'doctor']


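Here, we've suppressed the warnings emitted during model loading by raising the level of the `transformers.modeling_utils` logger to `ERROR`. This is safe in this case, because the **unused weights** don't affect the `fill-mask` predictions and the warning is only informational. Keep in mind that this setting also hides any other warning emitted by that logger.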
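The effect of `setLevel(logging.ERROR)` can be checked with the standard library alone. The sketch below uses a hypothetical logger name (`demo.modeling_utils`) and a small list-backed handler so that no model needs to be loaded; it only demonstrates how the level threshold filters records.

```python
import logging

records = []


class ListHandler(logging.Handler):
    """Collects (level, message) pairs so we can inspect what got through."""

    def emit(self, record):
        records.append((record.levelname, record.getMessage()))


# Hypothetical logger name, used only for this demonstration.
logger = logging.getLogger("demo.modeling_utils")
logger.addHandler(ListHandler())
logger.propagate = False  # keep the demo messages away from the root logger

logger.setLevel(logging.WARNING)
logger.warning("Some weights were not used")  # passes the WARNING threshold

logger.setLevel(logging.ERROR)
logger.warning("Some weights were not used")  # dropped: below ERROR
logger.error("Checkpoint file is missing")    # still recorded

print(records)
```

Errors still get through after the change, which is why raising the level is less risky than disabling the logger entirely. The `transformers` library also ships its own helper, `transformers.logging.set_verbosity_error()`, which applies the same idea to the whole library rather than to a single module.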

In [13]:
result = unmasker("This woman works as a [MASK].")
print([r["token_str"] for r in result])

['nurse', 'maid', 'teacher', 'waitress', 'prostitute']


In [14]:
result = unmasker("This girl works as a [MASK].")
print([r["token_str"] for r in result])

['waitress', 'nurse', 'model', 'maid', 'teacher']


In [15]:
result = unmasker("This boy works as a [MASK].")
print([r["token_str"] for r in result])

['teacher', 'carpenter', 'mechanic', 'farmer', 'waiter']


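When asked to fill in the missing word in these four sentences, the model gives only one occupation (teacher) for both male and female subjects, and waiter/waitress appears only in its gendered forms. The other predictions are occupations usually associated with one specific gender, and no occupation is shared between the "man" and "woman" prompts.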
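To make the comparison concrete, the four lists printed above can be compared directly. The sketch below hard-codes the outputs shown in this notebook (so it runs without downloading the model) and computes which occupations are shared between each pair of prompts.

```python
from itertools import combinations

# Top-5 predictions copied from the cell outputs above.
predictions = {
    "man": ["carpenter", "lawyer", "farmer", "businessman", "doctor"],
    "woman": ["nurse", "maid", "teacher", "waitress", "prostitute"],
    "girl": ["waitress", "nurse", "model", "maid", "teacher"],
    "boy": ["teacher", "carpenter", "mechanic", "farmer", "waiter"],
}

# Occupations shared by each pair of prompts.
overlaps = {
    (a, b): sorted(set(predictions[a]) & set(predictions[b]))
    for a, b in combinations(predictions, 2)
}
for pair, shared in overlaps.items():
    print(pair, shared)

# Occupations predicted for at least one male and one female subject.
male = set(predictions["man"]) | set(predictions["boy"])
female = set(predictions["woman"]) | set(predictions["girl"])
shared_across_gender = sorted(male & female)
print("shared across gender:", shared_across_gender)
```

A top-5 overlap is a crude measure and says nothing about the probabilities behind each prediction, but it is enough to show how strongly the suggestions depend on the gender of the subject.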

This happens even though BERT is one of the rare Transformer models not built by scraping data from all over the internet, but rather using apparently neutral data (it’s trained on the English Wikipedia and BookCorpus datasets).

Hence, when using these tools, we need to keep in mind that the original model could very easily generate sexist, racist or homophobic content. Fine-tuning the model on our data won't make this intrinsic bias disappear.