# Basic BERT models

BERT models can be used on their own to predict missing words.
The first part of this notebook shows how this works with the original
BERT model from Google. The library's abstractions have evolved, so we use
the high-level `pipeline` helper for this.

Apart from predicting missing words, BERT-like models are also suitable
for classification. The second part of this notebook uses a text classification
pipeline to perform (multilingual) sentiment detection.

## BERT models are masked language models

They predict a missing word from the context it appears in.

In [2]:
# Silence advisory warnings and disable progress bars to avoid ipywidgets rendering issues
import os
os.environ['TRANSFORMERS_NO_ADVISORY_WARNINGS'] = 'true'
os.environ['HF_HUB_DISABLE_PROGRESS_BARS'] = '1'

In [3]:
from transformers import pipeline

In [4]:
unmasker = pipeline('fill-mask', model='bert-base-uncased')

BertForMaskedLM LOAD REPORT from: bert-base-uncased
Key                         | Status
----------------------------+-----------
bert.pooler.dense.bias      | UNEXPECTED
bert.pooler.dense.weight    | UNEXPECTED
cls.seq_relationship.weight | UNEXPECTED
cls.seq_relationship.bias   | UNEXPECTED

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.


In [5]:
unmasker("[MASK] is the capital of France.")

[{'score': 0.6210554242134094,
  'token': 3000,
  'token_str': 'paris',
  'sequence': 'paris is the capital of france.'},
 {'score': 0.0738355815410614,
  'token': 2009,
  'token_str': 'it',
  'sequence': 'it is the capital of france.'},
 {'score': 0.03622717037796974,
  'token': 17209,
  'token_str': 'toulouse',
  'sequence': 'toulouse is the capital of france.'},
 {'score': 0.03180911764502525,
  'token': 16766,
  'token_str': 'marseille',
  'sequence': 'marseille is the capital of france.'},
 {'score': 0.031000634655356407,
  'token': 10241,
  'token_str': 'lyon',
  'sequence': 'lyon is the capital of france.'}]
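The pipeline returns a list of candidate fills, each with a `score`, a `token_str`, and the completed `sequence`. As a minimal sketch of how such a result could be post-processed, the block below works on the values copied (and rounded) from the output above instead of calling the model again. The helper `best_fill` and its `min_score` threshold are hypothetical names chosen for illustration; they are not part of `transformers`.

```python
# Candidate fills copied from the output above (scores rounded); no model is loaded here.
predictions = [
    {"score": 0.6211, "token_str": "paris"},
    {"score": 0.0738, "token_str": "it"},
    {"score": 0.0362, "token_str": "toulouse"},
    {"score": 0.0318, "token_str": "marseille"},
    {"score": 0.0310, "token_str": "lyon"},
]


def best_fill(preds, min_score=0.5):
    """Hypothetical helper: return the top token, or None if its score is below min_score."""
    top = max(preds, key=lambda p: p["score"])
    return top["token_str"] if top["score"] >= min_score else None


# Probability mass covered by the five candidates that were returned.
covered = sum(p["score"] for p in predictions)

print(best_fill(predictions))        # paris
print(best_fill(predictions, 0.9))   # None
print(f"{covered:.3f}")              # 0.794
```

The five candidates cover roughly 79% of the probability mass; the rest is spread over tokens that were not returned.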

## Sentiment detection

A BERT model fine-tuned for classification can detect sentiment. The model used here rates a text on a scale from 1 to 5 stars.

Find more sentiment models at https://huggingface.co/models?pipeline_tag=text-classification&sort=trending&search=sentiment


In [9]:
# Load the classification pipeline with the specified model.
# top_k=None returns the scores for all labels instead of only the best one.
sentiment = pipeline("text-classification",
                     model="nlptown/bert-base-multilingual-uncased-sentiment",
                     top_k=None)

In [7]:
# Classify a new sentence
sentiment("I love O'Reilly online courses. I can learn a lot.")

[[{'label': '5 stars', 'score': 0.681428849697113},
  {'label': '4 stars', 'score': 0.29210031032562256},
  {'label': '3 stars', 'score': 0.02321084961295128},
  {'label': '2 stars', 'score': 0.0018246863037347794},
  {'label': '1 star', 'score': 0.0014352959115058184}]]

In [10]:
sentiment("I love O'Reilly online courses! I can learn a lot!")

[[{'label': '5 stars', 'score': 0.8062286376953125},
  {'label': '4 stars', 'score': 0.18060483038425446},
  {'label': '3 stars', 'score': 0.010735545307397842},
  {'label': '1 star', 'score': 0.0012509096413850784},
  {'label': '2 stars', 'score': 0.0011800869833678007}]]

In [12]:
sentiment("I have to work so hard to learn this LLM stuff.")

[[{'label': '2 stars', 'score': 0.4619256556034088},
  {'label': '1 star', 'score': 0.3368138372898102},
  {'label': '3 stars', 'score': 0.17987318336963654},
  {'label': '4 stars', 'score': 0.01453670859336853},
  {'label': '5 stars', 'score': 0.006850594189018011}]]

In [13]:
sentiment("O'Reilly Kurse gefallen mir sehr gut. Ich kann dabei viel lernen!")

[[{'label': '5 stars', 'score': 0.6538269519805908},
  {'label': '4 stars', 'score': 0.31999242305755615},
  {'label': '3 stars', 'score': 0.02167619951069355},
  {'label': '2 stars', 'score': 0.0022884332574903965},
  {'label': '1 star', 'score': 0.0022159903310239315}]]
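Each result above is a distribution over five star labels, sorted by score, so the label order changes from one call to the next. One way to compare sentences is to collapse the distribution into a single probability-weighted star rating. The sketch below does this on values copied (and rounded) from two of the outputs above; `expected_stars` is a hypothetical helper, not something the pipeline provides.

```python
def expected_stars(scores):
    """Hypothetical helper: probability-weighted mean of labels like '1 star' or '5 stars'."""
    # The leading number of each label is the star count, so label order does not matter.
    return sum(int(s["label"].split()[0]) * s["score"] for s in scores)


# Copied from the first English example above (scores rounded).
positive = [
    {"label": "5 stars", "score": 0.6814},
    {"label": "4 stars", "score": 0.2921},
    {"label": "3 stars", "score": 0.0232},
    {"label": "2 stars", "score": 0.0018},
    {"label": "1 star", "score": 0.0014},
]

# Copied from the "work so hard" example above (scores rounded).
negative = [
    {"label": "2 stars", "score": 0.4619},
    {"label": "1 star", "score": 0.3368},
    {"label": "3 stars", "score": 0.1799},
    {"label": "4 stars", "score": 0.0145},
    {"label": "5 stars", "score": 0.0069},
]

print(f"{expected_stars(positive):.2f}")  # 4.65
print(f"{expected_stars(negative):.2f}")  # 1.89
```

The pipeline output shown above is nested (one inner list per input text), so with a live result you would pass `result[0]` to the helper.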