# Notebook 5: BERT Theory (The Lego Bricks)

BERT does not read whole words; it reads **word pieces**, the subword units produced by its WordPiece tokenizer. Think of this as breaking a sentence down into **Lego bricks**.

### Why this is better:
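- It can represent words it has never seen before as combinations of pieces it already knows, instead of throwing them away as unknown.
- Related words can share bricks: depending on the vocabulary, 'helpful' and 'helpless' may both start with the piece 'help'. The splits are learned from word frequencies, not from grammar, so a common word is often kept as a single brick.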
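To see how the bricks are chosen, here is a minimal sketch of the greedy longest-match-first strategy that BERT's WordPiece tokenizer uses: at each position, take the longest piece found in the vocabulary, marking continuation pieces with `##`. `TOY_VOCAB` and `wordpiece` are hypothetical names invented for this example, and the tiny hand-picked vocabulary is an assumption; the real vocabulary is learned from data, so its splits will differ.

```python
# Toy WordPiece tokenizer: greedy longest-match-first.
# TOY_VOCAB is a hypothetical, hand-picked vocabulary for illustration only;
# the real BERT vocabulary is learned from data and is far larger.
TOY_VOCAB = {
    "help", "##ful", "##less", "shop", "##e", "##ase", "is",
    "super", "##cali", "##fra", "##gil", "##istic",
}


def wordpiece(word, vocab, unk_token="[UNK]"):
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation of the word
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word becomes unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


for w in ["helpful", "helpless", "shopease", "xyz"]:
    print(w, "->", wordpiece(w, TOY_VOCAB))
```

With this toy vocabulary, 'helpful' and 'helpless' share the 'help' brick, and only 'xyz', which no piece can cover, falls back to `[UNK]`.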

```python
from transformers import AutoTokenizer

# Load BERT's 'dictionary' (the tokenizer). This DistilBERT checkpoint
# shares the WordPiece vocabulary of multilingual BERT.
model_name = "distilbert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

print("BERT's dictionary is loaded. It knows 100,000+ different bricks!")
```

```
BERT's dictionary is loaded. It knows 100,000+ different bricks!
```


## 1. Breaking Words into Bricks

Watch how BERT handles a made-up word like **'ShopEase'** and a very long word like **'supercalifragilistic'**.

```python
text = "ShopEase is supercalifragilistic"
# Split the sentence into WordPiece tokens ('bricks')
tokens = tokenizer.tokenize(text)

print(f"Original Sentence: {text}")
print(f"BERT's Lego Bricks: {tokens}")
```

```
Original Sentence: ShopEase is supercalifragilistic
BERT's Lego Bricks: ['Shop', '##E', '##ase', 'is', 'super', '##cali', '##fra', '##gil', '##istic']
```
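The `##` prefix marks a brick that continues the previous word, while a brick without it starts a new word. A small sketch in plain Python glues the bricks from the output above back into whole words; `merge_pieces` is a hypothetical helper written for this notebook. The tokenizer offers the same idea through `tokenizer.convert_tokens_to_string`, which returns one string instead of a list.

```python
# Bricks copied from the tokenizer output above.
bricks = ['Shop', '##E', '##ase', 'is', 'super', '##cali', '##fra', '##gil', '##istic']


def merge_pieces(tokens):
    """Rebuild whole words: a '##' piece continues the previous word."""
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]  # glue onto the previous word, minus the '##'
        else:
            words.append(tok)  # a piece without '##' starts a new word
    return words


print(merge_pieces(bricks))  # ['ShopEase', 'is', 'supercalifragilistic']
```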


## 2. Converting Bricks to Numbers

Every brick in BERT's kit has a fixed ID number. The model never works with the text itself: it receives these IDs and turns each one into a list of learned numbers (an embedding) that it computes with.

```python
# Look up the vocabulary ID of each brick
ids = tokenizer.convert_tokens_to_ids(tokens)

for brick, brick_id in zip(tokens, ids):
    print(f"Brick: {brick:<12} | ID: {brick_id}")
```

```
Brick: Shop         | ID: 44132
Brick: ##E          | ID: 11259
Brick: ##ase        | ID: 16896
Brick: is           | ID: 10124
Brick: super        | ID: 25212
Brick: ##cali       | ID: 106407
Brick: ##fra        | ID: 31162
Brick: ##gil        | ID: 32837
Brick: ##istic      | ID: 29025
```
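These IDs are not quite everything the model receives. Calling the tokenizer directly (`tokenizer(text)`) also adds the special bricks `[CLS]` at the start and `[SEP]` at the end; when padding is requested (for example `padding=True` on a batch), shorter inputs are filled with `[PAD]` and an attention mask marks which positions are real. Each ID then selects one row of a learned embedding table. The sketch below imitates those steps in plain Python under stated assumptions: the mini-vocabulary, the random vectors, and the helper `encode` are all hypothetical, and the real IDs and vectors come from the tokenizer and the trained model.

```python
import random

# Hypothetical mini-vocabulary (brick -> ID); real IDs come from the tokenizer.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "Shop": 4, "##E": 5, "##ase": 6, "is": 7}

EMBED_DIM = 4  # real BERT-style models use hundreds of dimensions
rng = random.Random(0)
# One row of numbers per ID; in a trained model these rows are learned.
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]


def encode(tokens, vocab, max_len=8):
    """Wrap tokens in [CLS] ... [SEP], map them to IDs, pad to max_len.

    This sketch pads but does not truncate inputs longer than max_len.
    """
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    attention_mask = [1] * len(ids)  # 1 = real token
    padding = max_len - len(ids)
    ids += [vocab["[PAD]"]] * padding
    attention_mask += [0] * padding  # 0 = padding, ignored by the model
    return ids, attention_mask


ids, mask = encode(["Shop", "##E", "##ase", "is"], vocab)
vectors = [embedding_table[i] for i in ids]  # embedding lookup: one row per ID
print(ids)   # [2, 4, 5, 6, 7, 3, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
print(len(vectors), len(vectors[0]))  # 8 4
```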


## Summary
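By splitting text into WordPieces, BERT can represent almost any word as a combination of bricks from its vocabulary, including words it never saw during training. Our simple Baseline model only understands whole words, so it has no way to represent a word that was missing from its training vocabulary.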
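To make the contrast concrete, here is a sketch of how a whole-word vocabulary behaves. The actual Baseline model is not shown in this notebook, so this assumes it simply keeps the words seen during training; `whole_word_tokens` and the tiny training text are hypothetical. The unseen word 'helpless' collapses into a single unknown token, whereas a WordPiece vocabulary could still break it into known bricks such as 'help' + '##less'.

```python
# Hypothetical whole-word baseline: the vocabulary is just the set of
# words seen in a (tiny, made-up) training text.
training_text = "the shop is helpful the staff is helpful"
word_vocab = set(training_text.split())


def whole_word_tokens(text, vocab, unk_token="[UNK]"):
    """Keep a word only if it appeared in training; otherwise mark it unknown."""
    return [w if w in vocab else unk_token for w in text.split()]


print(whole_word_tokens("the shop is helpless", word_vocab))
# ['the', 'shop', 'is', '[UNK]']
```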