### Masked Language Modeling in Python
Masked Language Modeling (MLM) is the pre-training objective behind models such as BERT (Bidirectional Encoder Representations from Transformers). Some tokens in a sentence are replaced with a [MASK] token, and the model is trained to predict the original tokens from the context on both sides. This is how the model learns contextual representations of words.
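To make the masking step concrete, here is a minimal sketch of BERT-style input corruption on a toy token list. It assumes the recipe from the original BERT setup: select about 15% of tokens, then replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged. The function name `mask_tokens`, the toy vocabulary, and the fixed seed are illustrative assumptions, not part of the Hugging Face API.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted_tokens, labels) for MLM training on a toy token list.

    labels[i] holds the original token at selected positions and None elsewhere,
    so a loss would only be computed where a prediction is requested.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return corrupted, labels

# Toy data (hypothetical; a real setup would use tokenizer output)
tokens = "transformers are a state of the art technique".split()
vocab = sorted(set(tokens))
corrupted, labels = mask_tokens(tokens, vocab, mask_prob=0.15, seed=0)
print(corrupted)
print(labels)
```

Positions whose label is None are never changed, so the model only receives a training signal at the selected positions.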
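### Step 1: Install and Import Required Libraries
First, install the transformers library from Hugging Face together with PyTorch (for example, pip install transformers torch), then import the classes used in this walkthrough.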

In [1]:
import torch
from transformers import BertTokenizer, BertForMaskedLM
from transformers import pipeline




### Step 2: Load Pre-trained BERT Model and Tokenizer
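Load the pre-trained bert-base-uncased checkpoint and its tokenizer. The warning printed below is expected: the checkpoint also contains the pooler and next-sentence-prediction weights, which BertForMaskedLM does not use.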

In [2]:
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


### Step 3: Prepare Text for Masking
Let's prepare a sample sentence and mask one of the words.

In [3]:
text = "Transformers are a state-of-the-art technique in natural language processing."

# Tokenize the text
inputs = tokenizer(text, return_tensors='pt')

# Replace the token at one position with the [MASK] token
masked_index = 3  # Position of "a"; index 0 is the [CLS] token added by the tokenizer
inputs['input_ids'][0][masked_index] = tokenizer.mask_token_id

print(tokenizer.decode(inputs['input_ids'][0]))

[CLS] transformers are [MASK] state - of - the - art technique in natural language processing. [SEP]
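Hardcoding `masked_index = 3` works for this sentence but breaks as soon as the text changes. Below is a minimal sketch of looking the position up instead, written against a plain list of token IDs so it runs without downloading a model. The helper name `mask_first` and the placeholder IDs are assumptions for illustration. With the real tokenizer you would pass `inputs['input_ids'][0].tolist()`, `tokenizer.convert_tokens_to_ids('a')`, and `tokenizer.mask_token_id`.

```python
def mask_first(token_ids, target_id, mask_id):
    """Return (masked_ids, index) with the first occurrence of target_id replaced by mask_id."""
    try:
        index = token_ids.index(target_id)
    except ValueError:
        raise ValueError(f"token id {target_id} not found in the sequence") from None
    masked = list(token_ids)  # copy so the caller's list is left untouched
    masked[index] = mask_id
    return masked, index

# Placeholder IDs (not real BERT vocabulary IDs) standing in for:
# [CLS] transformers are a state [SEP]
CLS, SEP, MASK = 1, 2, 3
TRANSFORMERS, ARE, A, STATE = 10, 11, 12, 13
ids = [CLS, TRANSFORMERS, ARE, A, STATE, SEP]

masked_ids, masked_index = mask_first(ids, target_id=A, mask_id=MASK)
print(masked_index)  # 3
print(masked_ids)    # [1, 10, 11, 3, 13, 2]
```

Returning the index alongside the masked sequence keeps it available for reading the prediction at the same position later.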


### Step 4: Perform Masked Language Modeling
Use the model to predict the masked word.

In [4]:
with torch.no_grad():
    outputs = model(**inputs)

predictions = outputs.logits

# Get the predicted token ID for the masked position (highest logit over the vocabulary)
predicted_token_id = predictions[0, masked_index].argmax(dim=-1).item()

# Decode the predicted token ID to get the predicted word
predicted_token = tokenizer.decode(predicted_token_id)
print(f"Predicted word: {predicted_token}")

Predicted word: a
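Taking the `argmax` returns only the single best token and discards how confident the model was. Probabilities come from applying a softmax to the logits at the masked position; the sketch below shows that step on a toy logit vector in pure Python. The function name `top_k_probs`, the four-token vocabulary, and the logit values are made up for illustration. On the real tensors the equivalent calls would be `torch.softmax(predictions[0, masked_index], dim=-1)` followed by `torch.topk`.

```python
import math

def top_k_probs(logits, id_to_token, k=3):
    """Softmax the logits and return the k most probable (token, probability) pairs."""
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id_to_token[i], probs[i]) for i in ranked[:k]]

# Toy vocabulary and logits (illustrative values, not real model output)
id_to_token = ["a", "the", "another", "an"]
logits = [9.0, 3.0, 2.5, 2.0]

for token, prob in top_k_probs(logits, id_to_token, k=3):
    print(f"{token}: {prob:.4f}")
```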


### Step 5: Using a Pipeline for Convenience
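The transformers library also provides a fill-mask pipeline that wraps tokenization, prediction, and decoding in a single call. It returns the top candidates for the masked position together with their scores.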

In [5]:
mlm_pipeline = pipeline('fill-mask', model=model, tokenizer=tokenizer)
result = mlm_pipeline("Transformers are [MASK] state-of-the-art technique in natural language processing.")
print(result)

[{'score': 0.9940366744995117, 'token': 1037, 'token_str': 'a', 'sequence': 'transformers are a state - of - the - art technique in natural language processing.'},
 {'score': 0.002596732461825013, 'token': 1996, 'token_str': 'the', 'sequence': 'transformers are the state - of - the - art technique in natural language processing.'},
 {'score': 0.0013455171138048172, 'token': 2178, 'token_str': 'another', 'sequence': 'transformers are another state - of - the - art technique in natural language processing.'},
 {'score': 0.0012169501278549433, 'token': 2019, 'token_str': 'an', 'sequence': 'transformers are an state - of - the - art technique in natural language processing.'},
 {'score': 0.0004081668157596141, 'token': 2028, 'token_str': 'one', 'sequence': 'transformers are one state - of - the - art technique in natural language processing.'}]
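The pipeline returns a list of dictionaries sorted by score, so ordinary list handling is enough to post-process it. The sketch below filters candidates by a score threshold. The helper name `confident_fills` and the 0.002 threshold are assumptions for illustration, and the sample data is a trimmed copy of the output above with scores rounded.

```python
def confident_fills(results, min_score=0.002):
    """Return (token_str, score) pairs whose score is at least min_score."""
    return [(r["token_str"], r["score"]) for r in results if r["score"] >= min_score]

# Trimmed copy of the fill-mask output shown above (scores rounded)
result = [
    {"score": 0.9940, "token": 1037, "token_str": "a"},
    {"score": 0.0026, "token": 1996, "token_str": "the"},
    {"score": 0.0013, "token": 2178, "token_str": "another"},
    {"score": 0.0012, "token": 2019, "token_str": "an"},
    {"score": 0.0004, "token": 2028, "token_str": "one"},
]

print(confident_fills(result))  # [('a', 0.994), ('the', 0.0026)]
```

A threshold like this is a judgment call; for a sentence with several plausible fills, the scores are spread more evenly and a lower cutoff may be appropriate.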


### Explanation with Visualization
The steps above can be summarized as a simple flow, which the code below renders as an image:

Input Sentence: "Transformers are a state-of-the-art technique in natural language processing."
Masked Sentence: "Transformers are [MASK] state-of-the-art technique in natural language processing."
Tokenization: Convert words into token IDs.
Masking: Replace the token ID for the word "a" with the mask token ID.
Prediction: Use the model to predict the masked token.
Decoding: Convert the predicted token ID back to a word.

In [6]:
from PIL import Image, ImageDraw, ImageFont

# Create a simple visualization
def create_visualization():
    width, height = 1000, 400  # 800 px can clip the long tokenization line at the default font size
    image = Image.new('RGB', (width, height), 'white')
    draw = ImageDraw.Draw(image)
    font = ImageFont.load_default()

    # Define text and positions
    texts = [
        "Input Sentence: Transformers are a state-of-the-art technique in natural language processing.",
        "Masked Sentence: Transformers are [MASK] state-of-the-art technique in natural language processing.",
        # bert-base-uncased lowercases its input, so the tokens are shown in lowercase
        "Tokenization: ['transformers', 'are', '[MASK]', 'state', '-', 'of', '-', 'the', '-', 'art', 'technique', 'in', 'natural', 'language', 'processing', '.']",
        "Prediction: Model predicts the masked token",
        "Decoding: Predicted word is 'a'"
    ]
    positions = [(10, 10), (10, 50), (10, 90), (10, 130), (10, 170)]

    # Draw text
    for text, position in zip(texts, positions):
        draw.text(position, text, fill='black', font=font)

    return image

visualization = create_visualization()
visualization.show()
