![logo](https://github.com/donatellacea/DL_tutorials/blob/main/notebooks/figures/1128-191-max.png?raw=true)

# XAI in Deep Learning-Based Signal Analysis: Transformers

In this notebook, we introduce transformers in machine learning and highlight their significance in natural language processing (NLP) and computer vision (CV).

Tokenization is a fundamental step in text processing and NLP. It splits text into smaller units called tokens. Tokens are often words, but they can also be characters, subwords, or even sentences, depending on the granularity of the tokenization.

Example (word-level tokenization, punctuation dropped):

- Text: "Natural Language Processing is fascinating."
- Tokens: ["Natural", "Language", "Processing", "is", "fascinating"]
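
The word-level split in the example can be reproduced with a few lines of standard-library Python. This is only a minimal sketch: the regular expression `\w+` is an assumption chosen for illustration, and it is not how the pretrained tokenizers used later in this notebook work (those rely on a learned subword vocabulary).

```python
import re

text = "Natural Language Processing is fascinating."

# Word-level: keep runs of letters, digits, and underscores; punctuation is dropped.
word_tokens = re.findall(r"\w+", text)

# Character-level: every character becomes a token, including spaces and punctuation.
char_tokens = list(text)

print(word_tokens)
print(char_tokens[:7])
```

Subword tokenization sits between these two extremes, but it needs a vocabulary learned from data, so it is not sketched here.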

```python
from transformers import AutoTokenizer

# Load the tokenizer that matches the pretrained DistilBERT sentiment checkpoint.
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
```

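```python
raw_inputs = [
    "I love learning new things!",
    "This tutorial is super cool.",
]
# padding=True pads to the longest sequence in the batch,
# truncation=True cuts sequences that exceed the model's maximum length,
# return_tensors="pt" returns PyTorch tensors.
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)
```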

```
{'input_ids': tensor([[  101,  1045,  2293,  4083,  2047,  2477,   999,   102,     0],
        [  101,  2023, 14924,  4818,  2003,  3565,  4658,  1012,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1]])}
```
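
In the output above, each sequence starts with id 101 (`[CLS]`) and ends with id 102 (`[SEP]`). The first sentence is one token shorter than the second, so it is padded with id 0 (`[PAD]`), and the matching 0 in `attention_mask` tells the model to ignore that position. The sketch below reproduces this padding step in plain Python, without the library. The names `PAD_ID`, `sequences`, and `max_len` are illustrative and not part of the `transformers` API.

```python
PAD_ID = 0  # id of [PAD] in this checkpoint's vocabulary

# Unpadded token ids, copied from the tokenizer output above.
sequences = [
    [101, 1045, 2293, 4083, 2047, 2477, 999, 102],
    [101, 2023, 14924, 4818, 2003, 3565, 4658, 1012, 102],
]

# Pad every sequence on the right up to the length of the longest one.
max_len = max(len(seq) for seq in sequences)
input_ids = [seq + [PAD_ID] * (max_len - len(seq)) for seq in sequences]

# 1 marks a real token, 0 marks padding.
attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]

print(input_ids)
print(attention_mask)
```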


In machine learning, embeddings are representations that map words, phrases, or even entire sentences to vectors of real numbers. The dimensions of such a vector encode different features of the word, capturing its semantic and syntactic properties. Unlike one-hot encoding, which treats each word as independent of every other word, embeddings can capture context and relationships between words, which makes them much more informative.
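
The contrast with one-hot encoding can be made concrete with cosine similarity. The following is a toy sketch: the three-word vocabulary and the 2-dimensional embedding values are hypothetical and hand-picked for illustration, whereas real embeddings are learned from data and have hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["cat", "dog", "car"]

# One-hot: a 1 at the word's own index and 0 everywhere else.
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hypothetical 2-d embeddings, hand-picked for illustration (not learned).
embedding = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
}

# Any two distinct one-hot vectors are orthogonal, so their similarity is 0.
print(cosine(one_hot["cat"], one_hot["dog"]))

# With dense vectors, related words can end up closer than unrelated ones.
print(cosine(embedding["cat"], embedding["dog"]))
print(cosine(embedding["cat"], embedding["car"]))
```

With one-hot vectors, "cat" is exactly as dissimilar from "dog" as from "car". With the toy dense vectors, "cat" and "dog" score higher than "cat" and "car", which is the kind of relationship embeddings are meant to encode.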

Embeddings are crucial because they represent text in a numerical format that machine learning algorithms can process.