## Classify text with BERT

This tutorial contains complete code to fine-tune BERT to perform sentiment analysis on a dataset of plain-text IMDB movie reviews. In addition to training a model, you will learn how to preprocess text into an appropriate format.

In this notebook, you will:

- Load the IMDB dataset
- Load a BERT model from TensorFlow Hub
- Build your own model by combining BERT with a classifier
- Train your own model, fine-tuning BERT as part of that
- Save your model and use it to classify sentences
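
As a rough illustration of the "appropriate format" mentioned above, the sketch below shows the shape of the inputs a BERT encoder expects: a `[CLS]` token, the tokens of the sentence, a `[SEP]` token, padding up to a fixed length, and an attention mask that marks the real tokens. The vocabulary is a tiny hypothetical one (only the special-token IDs follow `bert-base-uncased`), and whitespace splitting stands in for real WordPiece tokenization. In practice a library tokenizer performs this step.

```python
# Toy illustration of BERT-style input encoding (not a real tokenizer).
# Special-token IDs match bert-base-uncased; the word IDs are made up for this sketch.
SPECIAL = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102}
TOY_VOCAB = {**SPECIAL, "this": 1, "movie": 2, "was": 3, "great": 4}


def encode(text, max_len=8):
    """Return input_ids and attention_mask for a single sentence."""
    # Lowercase and split on whitespace (stand-in for WordPiece).
    tokens = text.lower().split()
    # Reserve two positions for [CLS] and [SEP], truncating if needed.
    tokens = ["[CLS]"] + tokens[: max_len - 2] + ["[SEP]"]
    # Words missing from the toy vocabulary map to [UNK].
    ids = [TOY_VOCAB.get(tok, TOY_VOCAB["[UNK]"]) for tok in tokens]
    mask = [1] * len(ids)
    # Pad both sequences up to max_len; padded positions get mask 0.
    pad = max_len - len(ids)
    return {
        "input_ids": ids + [TOY_VOCAB["[PAD]"]] * pad,
        "attention_mask": mask + [0] * pad,
    }


encoded = encode("This movie was great")
print(encoded["input_ids"])
print(encoded["attention_mask"])
```

The attention mask lets the model ignore the padded positions, so reviews of different lengths can share one batch.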


### Setup


In [49]:
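from transformers import pipeline

# Load a fill-mask pipeline backed by the pretrained bert-base-uncased checkpoint
unmasker = pipeline('fill-mask', model='bert-base-uncased')

# Ask BERT to predict the token hidden behind [MASK]
unmasker("The capital of France is [MASK].")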


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'cls.seq_relationship.weight', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.4167883098125458,
  'token': 3000,
  'token_str': 'paris',
  'sequence': 'the capital of france is paris.'},
 {'score': 0.07141677290201187,
  'token': 22479,
  'token_str': 'lille',
  'sequence': 'the capital of france is lille.'},
 {'score': 0.06339266896247864,
  'token': 10241,
  'token_str': 'lyon',
  'sequence': 'the capital of france is lyon.'},
 {'score': 0.04444762319326401,
  'token': 16766,
  'token_str': 'marseille',
  'sequence': 'the capital of france is marseille.'},
 {'score': 0.03029726631939411,
  'token': 7562,
  'token_str': 'tours',
  'sequence': 'the capital of france is tours.'}]
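
The fill-mask pipeline returns a list of dictionaries, one per candidate token, as the output above shows. Below is a minimal sketch of pulling out the highest-scoring candidate. It uses a literal copy of the first three predictions so that it runs without loading a model, and the helper name `best_candidate` is only illustrative.

```python
# A literal copy of the first three predictions printed above.
predictions = [
    {"score": 0.4167883098125458, "token": 3000, "token_str": "paris",
     "sequence": "the capital of france is paris."},
    {"score": 0.07141677290201187, "token": 22479, "token_str": "lille",
     "sequence": "the capital of france is lille."},
    {"score": 0.06339266896247864, "token": 10241, "token_str": "lyon",
     "sequence": "the capital of france is lyon."},
]


def best_candidate(preds):
    """Return (token_str, score) for the highest-scoring prediction."""
    top = max(preds, key=lambda p: p["score"])
    return top["token_str"], top["score"]


token, score = best_candidate(predictions)
print(f"{token} ({score:.1%})")
```

Here the top candidate is `paris`, with a score of roughly 0.42; the remaining probability is spread over other French cities.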

In [50]:
from transformers import pipeline

# Load a text-generation pipeline with GPT-2
generator = pipeline('text-generation', model='gpt2')

# Define the question to use as the prompt
question = "What is the capital of France?"

# Generate a response
response = generator(question, max_length=50, num_return_sequences=1)

# Print the generated response
print(response[0]['generated_text'])


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


What is the capital of France?

The capital of France is Paris. This is the largest part of France; it's also the busiest in Europe. By the way, the capital was originally named after France's founder, Napoleon Bonaparte
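
Note that `generated_text` contains the prompt followed by the model's continuation, which is why the question is echoed in the output above. Below is a small sketch of separating the two, using a shortened literal copy of that output. The helper `strip_prompt` is hypothetical and not part of `transformers`.

```python
def strip_prompt(prompt, generated_text):
    """Return only the continuation if generated_text starts with prompt."""
    if generated_text.startswith(prompt):
        # Drop the echoed prompt and any leading whitespace or newlines.
        return generated_text[len(prompt):].lstrip()
    # Otherwise leave the text unchanged.
    return generated_text


prompt = "What is the capital of France?"
# Shortened copy of the output printed above (prompt + continuation).
generated = "What is the capital of France?\n\nThe capital of France is Paris."

continuation = strip_prompt(prompt, generated)
print(continuation)
```

Keeping only the continuation makes it easier to judge what GPT-2 actually produced, which matters here because it is a plain language model rather than a question-answering system.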


In [52]:
question = "What is the capital of Egypt?"

# Generate a response
response = generator(question, max_length=50, num_return_sequences=1)

# Print the generated response
print(response[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


What is the capital of Egypt?

This is where we will use the value of all those years for building our cities. In Egypt we've never used capital when we didn't have money. And this gives us the resources we need to get


In [56]:
question1 = 'Who is Victor Hugo'

reponse1 = generator(question1, max_length=50, num_return_sequences=1)

print(reponse1[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Who is Victor Hugo?

(From: The Amazing World of Harry Potter Fan Fic Archive)

Virgil: The Man Who Dived Down [The Man Who Saved the World]

(From: Harry Potter Book
