# Mask filling
Mask filling in a **pipeline** means using a **natural language processing (NLP)** model trained for the **fill-mask task** to *predict a missing word* (the **"mask"**) within a sentence.

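This is done by feeding the model a sentence that contains a placeholder token. The exact token depends on the model's tokenizer: BERT-style models use **`[MASK]`**, while RoBERTa-style models use **`<mask>`**.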
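
Because the placeholder differs between model families, one option is to write prompts with a neutral slot and substitute the mask token afterwards. The sketch below assumes a hypothetical helper `make_prompt` and a `{mask}` slot marker, neither of which is part of `transformers`; `load_unmasker` is likewise an illustrative wrapper around the library's `pipeline` function.

```python
def make_prompt(template: str, mask_token: str, slot: str = "{mask}") -> str:
    """Replace the neutral slot marker with a model-specific mask token.

    `make_prompt` and the "{mask}" marker are illustrative names, not part of
    the transformers API.
    """
    # This sketch only handles prompts with a single masked position.
    if template.count(slot) != 1:
        raise ValueError(f"expected exactly one {slot!r} in the template")
    return template.replace(slot, mask_token)


def load_unmasker(model_name: str = "distilbert/distilroberta-base"):
    """Build a fill-mask pipeline (imports lazily, downloads the model on first use)."""
    from transformers import pipeline

    return pipeline("fill-mask", model=model_name)


template = "The capital of France is {mask}."
print(make_prompt(template, "[MASK]"))  # BERT-style placeholder
print(make_prompt(template, "<mask>"))  # RoBERTa-style placeholder
```

With a loaded pipeline, `make_prompt(template, unmasker.tokenizer.mask_token)` should pick the right placeholder for whichever model is in use.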

The **pipeline** then *returns the most probable words to fill that gap*, based on the surrounding context. This technique, also known as a **cloze test**, is used for tasks such as
* language learning,
* content generation, and
* data augmentation.

# How it works
1. **Inputting a masked prompt:** You pass the **pipeline** a sentence in which a special token stands in for the missing word (e.g., *"The capital of France is `<mask>`."*).
2. **Model prediction:** The **fill-mask model** reads the words on both sides of the mask and assigns a probability to every token in its vocabulary as a candidate replacement, based on its training.
3. **Outputting results:** The **pipeline** returns the highest-ranked candidates, each with its probability score and the completed sentence.
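
The prediction and ranking steps can be sketched without loading a model. The example below assumes the usual masked-language-model setup, in which raw scores (logits) at the masked position are turned into probabilities with a softmax and the top `k` are kept. The logits are invented for illustration and cover a five-word toy vocabulary rather than a real one.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {word: math.exp(value - peak) for word, value in logits.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def top_k(probs: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k most probable candidates, best first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical logits for "The capital of France is <mask>." (not model output).
toy_logits = {"Paris": 9.1, "Lyon": 6.3, "France": 4.0, "London": 3.2, "banana": -2.5}

probs = softmax(toy_logits)
for word, p in top_k(probs, 3):
    print(f"{word}: {p:.3f}")
```

A real model does the same ranking over tens of thousands of vocabulary entries, which is why individual scores are often small even for the best candidate.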

# Examples of Use Cases
* **Language Learning:**
Creating fill-in-the-blank exercises for students.
* **Content Generation:**
Assisting writers in completing sentences or suggesting words.
* **Data Augmentation:**
Generating more diverse training data for other **NLP** tasks by predicting masked words in existing sentences.
* **Domain-Specific Tasks:**
Fine-tuning a masked language model on a specialized corpus (such as medical research papers) so that it fills in domain-specific terms correctly.
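
As a rough illustration of the data augmentation use case, the sketch below masks one word of a sentence and collects the variants a fill-mask pipeline proposes. The `augment` helper is hypothetical, and `fake_unmasker` is a stand-in that mimics only the `sequence` field of the pipeline output so the example runs offline; in practice you would pass a real `pipeline("fill-mask")` object instead.

```python
def augment(unmasker, sentence: str, word: str, mask_token: str, top_k: int = 3) -> list[str]:
    """Return variants of `sentence` with the first occurrence of `word` replaced.

    `unmasker` is any callable that behaves like a fill-mask pipeline:
    it takes a masked string and `top_k`, and returns dicts with a "sequence" key.
    """
    if word not in sentence:  # simple substring check, enough for a sketch
        return []
    masked = sentence.replace(word, mask_token, 1)
    predictions = unmasker(masked, top_k=top_k)
    # Drop the candidate that just reproduces the original sentence.
    return [p["sequence"] for p in predictions if p["sequence"] != sentence]


def fake_unmasker(text: str, top_k: int = 3) -> list[dict]:
    """Offline stand-in for a fill-mask pipeline (fixed, made-up candidates)."""
    candidates = ["great", "fantastic", "wonderful", "terrible"][:top_k]
    return [{"sequence": text.replace("<mask>", c)} for c in candidates]


for variant in augment(fake_unmasker, "The movie was great.", "great", "<mask>"):
    print(variant)
```

Substitutions can change the meaning of a sentence (a positive review may become a negative one), so augmented examples usually need filtering before they are reused with the original label.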

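```python
from transformers import pipeline

# No model is named here, so the pipeline falls back to its default checkpoint
# (see the log output below).
unmasker = pipeline("fill-mask")

# Ask for the four most likely replacements for the <mask> token.
unmasker("Amit is the most dynamic, aggressive and intelligent AI <mask>.", top_k=4)
```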

```text
No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu
```


```text
[{'score': 0.04801853001117706,
  'token': 869,
  'token_str': ' player',
  'sequence': 'Amit is the most dynamic, aggressive and intelligent AI player.'},
 {'score': 0.04232377931475639,
  'token': 36749,
  'token_str': ' imaginable',
  'sequence': 'Amit is the most dynamic, aggressive and intelligent AI imaginable.'},
 {'score': 0.04084310308098793,
  'token': 655,
  'token_str': ' ever',
  'sequence': 'Amit is the most dynamic, aggressive and intelligent AI ever.'},
 {'score': 0.037871986627578735,
  'token': 1984,
  'token_str': ' candidate',
  'sequence': 'Amit is the most dynamic, aggressive and intelligent AI candidate.'}]
```
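
To read this output, it helps to pull out just the words and scores. The sketch below copies the `score` and `token_str` values from the result above into a list and uses a hypothetical `summarize` helper. The leading space in each `token_str` most likely comes from the RoBERTa-style tokenizer folding the preceding space into the token, so the helper strips it.

```python
# Scores and token strings copied from the pipeline output above.
results = [
    {"score": 0.04801853001117706, "token_str": " player"},
    {"score": 0.04232377931475639, "token_str": " imaginable"},
    {"score": 0.04084310308098793, "token_str": " ever"},
    {"score": 0.037871986627578735, "token_str": " candidate"},
]


def summarize(results: list[dict]) -> tuple[list[tuple[str, float]], float]:
    """Return (word, score) pairs and the total probability they cover."""
    words = [(r["token_str"].strip(), r["score"]) for r in results]
    covered = sum(score for _, score in words)
    return words, covered


words, covered = summarize(results)
for word, score in words:
    print(f"{word:<12}{score:.1%}")
print(f"top-{len(words)} probability mass: {covered:.1%}")
```

The four candidates together account for only about 17% of the probability, and no single word exceeds 5%. The model is therefore quite uncertain about this sentence, with the remaining probability spread across the rest of its vocabulary.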