# Hugging Face <img src="figs/huggingface_logo-noborder.svg" width="50" title="https://huggingface.co/"/>

https://huggingface.co/course/

# 0. Setup

In [None]:
%pip install tensorflow

In [None]:
%pip install transformers

In [None]:
%pip install torch torchvision torchaudio

# 1. Transformer Models

## What is NLP?

Natural language processing (NLP) is a field of linguistics and machine learning focused on understanding everything related to human language. The aim of NLP tasks is not only to understand single words individually, but also to understand the context of those words.

### Examples of NLP tasks:

- Classifying whole sentences

- Classifying each word in a sentence

- Generating text content

- Extracting an answer from a text

- Generating a new sentence from an input text

## Pipeline function (pipeline())

The pipeline() function connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.

In [None]:
from transformers import pipeline

In [None]:
classifier = pipeline("sentiment-analysis")

In [None]:
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

### Main pipeline steps:

1. The text is preprocessed into a format the model can understand.

2. The preprocessed inputs are passed to the model.

3. The predictions of the model are post-processed, so you can make sense of them.
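
The three steps above can be sketched with a toy example. Everything here is hypothetical: the `VOCAB`, the hand-written `WEIGHTS`, and the function names are stand-ins chosen for illustration, not what pipeline() runs internally. A real pipeline uses a trained tokenizer and a trained model.

```python
import math

# Hypothetical toy vocabulary and per-token weights (not a trained model).
VOCAB = {"<unk>": 0, "i": 1, "love": 2, "hate": 3, "this": 4}
# One (negative, positive) weight pair per token ID.
WEIGHTS = [(0.0, 0.0), (0.0, 0.0), (-2.0, 2.0), (2.0, -2.0), (0.0, 0.0)]
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into token IDs the model can consume."""
    words = text.lower().replace("!", "").replace(".", "").split()
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in words]


def model(token_ids):
    """Step 2: map token IDs to raw scores (logits), one per label."""
    negative = sum(WEIGHTS[i][0] for i in token_ids)
    positive = sum(WEIGHTS[i][1] for i in token_ids)
    return [negative, positive]


def postprocess(logits):
    """Step 3: convert logits to probabilities and pick a readable label."""
    exps = [math.exp(x - max(logits)) for x in logits]
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_pipeline(text):
    return postprocess(model(preprocess(text)))


print(toy_pipeline("I love this!"))
print(toy_pipeline("I hate this."))
```

The returned dictionary mirrors the label/score shape of the sentiment-analysis output, but the numbers come from the made-up weights above and carry no meaning beyond the illustration.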

### Available pipelines

- feature-extraction (get the vector representation of a text)

- fill-mask

- ner (named entity recognition)

- question-answering

- sentiment-analysis

- summarization

- text-generation

- translation

- zero-shot-classification

### Zero-shot classification

The zero-shot-classification pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model.
This pipeline is called zero-shot because you don’t need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!

In [None]:
from transformers import pipeline

In [None]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

### Text generation

The main idea here is that you provide a prompt and the model auto-completes it by generating the remaining text. This is similar to the predictive text feature found on many phones.
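
A minimal sketch of the auto-completion idea, assuming a hypothetical lookup table `NEXT_WORD` in place of a real language model: the loop predicts one next word from the last word and appends it to the text, again and again. A real model predicts a probability distribution over tokens from the whole preceding context, and the pipeline may sample from it, so this shows only the shape of the loop.

```python
# Hypothetical "model": for each word, the single most likely next word.
NEXT_WORD = {
    "how": "to",
    "to": "use",
    "use": "transformers",
    "transformers": "<eos>",
}


def complete(prompt, max_new_words=10):
    """Greedily extend the prompt one word at a time."""
    words = prompt.split()
    if not words:  # nothing to continue from
        return prompt
    for _ in range(max_new_words):
        next_word = NEXT_WORD.get(words[-1].lower(), "<eos>")
        if next_word == "<eos>":  # the toy model signals the end of the text
            break
        words.append(next_word)
    return " ".join(words)


print(complete("We will teach you how"))
```

The `max_new_words` argument plays a role loosely comparable to a length limit in a real generation call: it stops the loop even if the model never signals the end of the text.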

In [None]:
from transformers import pipeline

In [None]:
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

#### Using any model from the Hub in a pipeline

In [None]:
generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

### Mask filling

Fill in the blanks in a given text.

In [None]:
from transformers import pipeline

In [None]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

### Named entity recognition (NER)

NER is a task in which the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.

In [None]:
from transformers import pipeline

In the example below, the model should identify Sylvain as a person (PER), Hugging Face as an organization (ORG), and Brooklyn as a location (LOC).

In [None]:
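# aggregation_strategy="simple" groups sub-word tokens into whole entities;
# it replaces the deprecated grouped_entities=True argument.
ner = pipeline("ner", aggregation_strategy="simple")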
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")

### Question answering

The question-answering pipeline answers questions using information from a given context.

In [None]:
from transformers import pipeline

In [None]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

### Summarization

Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text.

In [None]:
from transformers import pipeline

In [None]:
summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

### Translation

You can use a default model if you provide a language pair in the task name, but the easiest way is to pick the model you want to use on the Model Hub.

In [None]:
from transformers import pipeline

In [None]:
# The input sentence is French, so use a French-to-English checkpoint.
# Passing the checkpoint name lets pipeline() load the matching tokenizer too.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

## Transformer Architecture

The Transformer architecture was introduced in June 2017. The focus of the original research was on translation tasks. Several influential models followed:

- GPT (June 2018)

- BERT (October 2018)

- GPT-2 (February 2019)

- BART/T5 (October 2019)

- GPT-3 (May 2020)

Broadly, they can be grouped into three categories:

1. GPT-like (also called auto-regressive Transformer models)

2. BERT-like (also called auto-encoding Transformer models)

3. BART/T5-like (also called sequence-to-sequence Transformer models)

### Transformers are language models

- All the Transformer models mentioned above have been trained as language models. 

- This means they have been trained on large amounts of raw text in a self-supervised fashion.

- Self-supervised learning is a type of training in which the objective is automatically computed from the inputs of the model.
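
To make "computed from the inputs" concrete, here is a small sketch of how training pairs for next-word prediction could be built from raw text alone, with no human labels. The whitespace tokenization and the helper name `next_word_pairs` are simplifications assumed for illustration; real models operate on subword tokens.

```python
def next_word_pairs(text):
    """Build (context, target) pairs where each target is the next word."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]


for context, target in next_word_pairs("my name is Sylvain"):
    print(context, "->", target)
```

Each target is simply a word that already appears in the text, which is why no manual annotation is needed.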

### Transformers are big models

- The general strategy to achieve better performance is by increasing the models’ sizes as well as the amount of data they are pretrained on.

- Training a model, especially a large one, requires a large amount of data. 

- This becomes very costly in terms of time and compute resources.

<img src="figs/carbon_footprint-dark.svg" width="1000" title="https://huggingface.co/"/>

### Transfer Learning

#### ***Pretraining***

- *Pretraining* is the act of training a model from scratch: the weights are randomly initialized, and the training starts without any prior knowledge.

- *Pretraining* is usually done on a very large corpus of data, and training can take up to several weeks.

<img src="figs/pretraining-dark.svg" width="700" title="https://huggingface.co/"/>

#### ***Fine-tuning***

- *Fine-tuning* is the training done after a model has been pretrained.

- To perform *fine-tuning*, you first acquire a pretrained language model, then perform additional training with a dataset specific to your task.

- The pretrained model was already trained on a dataset that has some similarities with the fine-tuning dataset.

- The fine-tuning process is thus able to take advantage of knowledge acquired by the initial model during pretraining.

- Since the pretrained model was already trained on lots of data, fine-tuning requires far less data to get decent results.

- The amount of time and resources needed to get good results is much lower.

<img src="figs/finetuning-dark.svg" width="700" title="https://huggingface.co/"/>
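
The benefit of starting from pretrained weights can be illustrated with a deliberately tiny example. This is a toy under stated assumptions, not a Transformer: the "model" is a single weight `w` in `y = w * x`, the "pretraining" task is `y = 2x`, and the fine-tuning task is the related `y = 2.2x`. All names and numbers are made up for illustration.

```python
def train(w, data, steps, lr=0.01):
    """Run plain gradient descent on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def mse(w, data):
    """Mean squared error of the one-weight model on the given data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


xs = [1.0, 2.0, 3.0]
pretrain_data = [(x, 2.0 * x) for x in xs]  # toy stand-in for the generic task
finetune_data = [(x, 2.2 * x) for x in xs]  # toy stand-in for the related task

# "Pretraining": start from scratch (w = 0) and train for many steps.
pretrained_w = train(0.0, pretrain_data, steps=200)

# Same small budget of 5 steps on the fine-tuning task, two starting points.
from_scratch = train(0.0, finetune_data, steps=5)
fine_tuned = train(pretrained_w, finetune_data, steps=5)

print("from scratch:", mse(from_scratch, finetune_data))
print("fine-tuned:  ", mse(fine_tuned, finetune_data))
```

With the same five steps, the run that starts from the pretrained weight ends with a lower error, because it begins much closer to the solution of the new task.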

## Transformers Architecture

### Encoder

- Receives an input and builds a representation of it (its features).

- The model is optimized to acquire understanding from the input.

### Decoder

- Uses the encoder’s representation (features) along with other inputs to generate a target sequence.

- The model is optimized for generating outputs.

<img src="figs/transformers_blocks-dark.svg" width="700" title="https://huggingface.co/"/>

- Each of these parts can be used independently, depending on the task:

1. **Encoder-only models**: Good for tasks that require understanding of the input, such as sentence classification and named entity recognition.

2. **Decoder-only models**: Good for generative tasks such as text generation.

3. **Encoder-decoder models** or **sequence-to-sequence models**: Good for generative tasks that require an input, such as translation or summarization.

### Attention layers

## Bias and Limitations

# 3. Fine-Tuning a Pretrained Model

## Processing the data

In [None]:
import torch
from torch.optim import AdamW  # the AdamW class in transformers is deprecated
from transformers import AutoTokenizer, AutoModelForSequenceClassification

In [None]:
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
sequences = [
    "I've been waiting for a HuggingFace course my whole life.",
    "This course is amazing!",
]
batch = tokenizer(sequences, padding=True, truncation=True, return_tensors="pt")
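
A rough sketch of what `padding=True` and `truncation=True` accomplish, written in plain Python with made-up token IDs. The helper `pad_and_truncate`, the `pad_id` of 0, and the `max_length` of 6 are assumptions for illustration; the real tokenizer takes its padding token and maximum length from the checkpoint and also keeps the special end-of-sequence token when truncating, which this toy does not. The point is that every row in the batch ends up the same length, with an attention mask marking real tokens (1) versus padding (0).

```python
def pad_and_truncate(sequences, max_length, pad_id=0):
    """Cut sequences to max_length, then pad them to the longest one left."""
    truncated = [seq[:max_length] for seq in sequences]
    longest = max(len(seq) for seq in truncated)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in truncated]
    attention_mask = [
        [1] * len(seq) + [0] * (longest - len(seq)) for seq in truncated
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token IDs for three sequences of different lengths.
toy_sequences = [
    [101, 7, 8, 102],
    [101, 9, 102],
    [101, 1, 2, 3, 4, 5, 6, 102],
]
toy_batch = pad_and_truncate(toy_sequences, max_length=6)
print(toy_batch["input_ids"])
print(toy_batch["attention_mask"])
```

Equal-length rows are what allow the batch to be stacked into a single rectangular tensor, which is what `return_tensors="pt"` asks for.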

In [None]:
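# Attach one label per sequence (both are set to class 1 here).
batch["labels"] = torch.tensor([1, 1])

optimizer = AdamW(model.parameters())
loss = model(**batch).loss  # passing labels makes the model return a loss
loss.backward()             # compute gradients of the loss
optimizer.step()            # update the model weights
optimizer.zero_grad()       # clear gradients before any further step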