# Using pretrained models (PyTorch)

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [1]:
!pip install datasets evaluate transformers[sentencepiece]

Collecting evaluate
  Downloading evaluate-0.4.5-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.5-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.5


## 🤗✨ Using Pretrained Models

In this module, you'll learn how to load state-of-the-art pretrained models from the Hugging Face Hub and apply them to your NLP tasks.  

Pretrained models can be used right away for prediction or exploration, or as a starting point for fine-tuning, which saves time and compute. 🚀  

We’ll cover how to load these models, use their tokenizers, and run inference in a few lines of code.


## 🔎 Fill-Mask Task with CamemBERT
We’ll use Hugging Face’s `pipeline` to perform the **fill-mask task** in French 🇫🇷.  
The pipeline automatically handles tokenization, model loading, and prediction.  



In [None]:
# Import the pipeline function
from transformers import pipeline

# Create a fill-mask pipeline using the CamemBERT model
camembert_fill_mask = pipeline("fill-mask", model="camembert-base")

# Provide an input sentence with <mask> token
results = camembert_fill_mask("Le camembert est <mask> :)")

# Print the predictions: a list of dicts with `score`, `token`, `token_str`, and `sequence` keys
print(results)
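
The raw list printed by a fill-mask pipeline can be hard to scan. The sketch below shows one way to pull out the top candidates. `top_predictions` is a hypothetical helper, and `sample_results` contains made-up placeholder values that only mimic the keys of a fill-mask result (`score`, `token`, `token_str`, `sequence`); it is not real model output.

```python
def top_predictions(results, k=3):
    """Return the k highest-scoring (token_str, score) pairs."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"], round(r["score"], 3)) for r in ranked[:k]]


# Placeholder entries with the same keys as a fill-mask result.
# Scores and token ids are invented for illustration only.
sample_results = [
    {"score": 0.10, "token": 11, "token_str": "bon",
     "sequence": "Le camembert est bon :)"},
    {"score": 0.50, "token": 12, "token_str": "délicieux",
     "sequence": "Le camembert est délicieux :)"},
    {"score": 0.25, "token": 13, "token_str": "excellent",
     "sequence": "Le camembert est excellent :)"},
]

best = top_predictions(sample_results, k=2)
print(best)
```

In the notebook, the same helper could be called on the `results` variable returned by `camembert_fill_mask`.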


## Loading Model with Architecture-Specific Classes
Here we directly use **CamemBERT’s architecture classes**:  
- `CamembertTokenizer` for tokenization  
- `CamembertForMaskedLM` for masked language modeling  

⚠️ Limitation: This ties the code to the **CamemBERT** architecture, so switching to a checkpoint with a different architecture means changing the class names.  


In [None]:
# Import CamemBERT-specific tokenizer and model
from transformers import CamembertTokenizer, CamembertForMaskedLM

# Load the tokenizer and model for CamemBERT
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertForMaskedLM.from_pretrained("camembert-base")
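
To see what a masked-language-modeling head returns, the following sketch builds a very small `CamembertForMaskedLM` from a config and runs one forward pass. It is only an illustration: the sizes are arbitrary assumptions, the weights are random (nothing is downloaded, so the predictions are meaningless), and the input ids are dummy values rather than real tokenizer output.

```python
import torch
from transformers import CamembertConfig, CamembertForMaskedLM

# Tiny, arbitrary sizes chosen only so the example runs quickly (assumption).
config = CamembertConfig(
    vocab_size=100,
    hidden_size=16,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=32,
    max_position_embeddings=32,
)

# Building from a config gives randomly initialized weights, not pretrained ones.
tiny_model = CamembertForMaskedLM(config)
tiny_model.eval()

# Dummy token ids standing in for a tokenized sentence.
input_ids = torch.tensor([[0, 5, 6, 7, 2]])

with torch.no_grad():
    logits = tiny_model(input_ids=input_ids).logits

# Shape is (batch_size, sequence_length, vocab_size):
# one score per vocabulary entry for every input position.
print(logits.shape)
```

With the pretrained `camembert-base` checkpoint, the highest-scoring vocabulary entries at the `<mask>` position are the candidates a fill-mask pipeline reports.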


## Loading Model with Auto Classes (Recommended)
Here we use **Auto classes**:  
- `AutoTokenizer`  
- `AutoModelForMaskedLM`  

✅ Advantage: Auto classes are **architecture-agnostic**, so you can easily switch between checkpoints (e.g., `bert-base-uncased`, `camembert-base`, `roberta-base`) without changing the class names.  


In [None]:
# Import Auto classes
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load tokenizer and model using Auto classes (architecture-agnostic)
tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained("camembert-base")
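
The architecture-agnostic behavior can be observed without downloading anything. In this sketch, `AutoModelForMaskedLM.from_config` is given two different config types and returns a different concrete class for each. The tiny sizes are arbitrary assumptions and the models are randomly initialized; `from_pretrained` makes a comparable choice based on the configuration stored with the checkpoint.

```python
from transformers import AutoModelForMaskedLM, BertConfig, CamembertConfig

# Arbitrary small sizes so that model construction is fast (assumption).
tiny = dict(
    vocab_size=100,
    hidden_size=16,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=32,
    max_position_embeddings=32,
)

configs = [CamembertConfig(**tiny), BertConfig(**tiny)]

# The same Auto class resolves to a different architecture per config type.
class_names = [
    type(AutoModelForMaskedLM.from_config(cfg)).__name__ for cfg in configs
]
print(class_names)
```

The calling code stays the same in both cases; only the config (or, with `from_pretrained`, the checkpoint name) changes.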
