# Hugging Face <img src="figs/huggingface_logo-noborder.svg" width="50" title="https://huggingface.co/"/>

https://huggingface.co/course/

# 0. Setup

In [None]:
%pip install tensorflow

In [None]:
%pip install transformers

In [None]:
import transformers

# 1. Transformer Models

## What is NLP?

Natural language processing (NLP) is a field of linguistics and machine learning focused on understanding everything related to human language. The aim of NLP tasks is not only to understand individual words, but also to understand the context in which those words appear.

### Examples of NLP tasks:

- Classifying whole sentences

- Classifying each word in a sentence

- Generating text content

- Extracting an answer from a text

- Generating a new sentence from an input text

## Pipeline function (pipeline())

The `pipeline()` function connects a model with its necessary preprocessing and postprocessing steps, allowing us to directly input any text and get an intelligible answer.

In [4]:
from transformers import pipeline

In [7]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some layers from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english were not used when initializing TFDistilBertForSequenceClassification: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceCla

In [8]:
classifier(
    ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]
)

[{'label': 'POSITIVE', 'score': 0.9598046541213989},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
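The log above warns against relying on the default checkpoint in production. A minimal sketch of pinning both the model and the revision follows. The model name and revision are copied from that log; the wrapper function `build_sentiment_classifier` is a hypothetical helper, and whether the Hub accepts the short revision hash in every library version is an assumption.

```python
def build_sentiment_classifier():
    """Create the sentiment pipeline with an explicit checkpoint and revision."""
    # Imported lazily so that defining this helper does not load the library.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        # Model name and revision as printed in the log above.
        model="distilbert-base-uncased-finetuned-sst-2-english",
        # Assumption: the short hash is accepted as a revision identifier.
        revision="af0f99b",
    )


# Usage (downloads the checkpoint on first call):
# classifier = build_sentiment_classifier()
# classifier("I hate this so much!")
```

Pinning the checkpoint this way keeps results reproducible even if the library later changes its default model for the task.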

### Main pipeline steps:

1. The text is preprocessed into a format the model can understand.

2. The preprocessed inputs are passed to the model.

3. The predictions of the model are post-processed, so you can make sense of them.
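The three steps above can be sketched with a toy classifier. This is not how the library is implemented: the vocabulary, the per-token scores, and all function names are hypothetical stand-ins for a real tokenizer and model, chosen only to make each step visible.

```python
import math

# Hypothetical toy vocabulary and weights; a real pipeline loads these from a checkpoint.
VOCAB = {"love": 0, "great": 1, "hate": 2, "awful": 3}
# One (negative, positive) score pair per known token.
TOKEN_SCORES = [(-1.0, 2.0), (-0.5, 1.5), (2.0, -1.0), (1.5, -0.5)]
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into token ids the model understands."""
    words = text.lower().replace("!", " ").replace(".", " ").split()
    return [VOCAB[w] for w in words if w in VOCAB]


def forward(token_ids):
    """Step 2: run the 'model', here a sum of per-token scores (logits)."""
    logits = [0.0, 0.0]
    for tid in token_ids:
        neg, pos = TOKEN_SCORES[tid]
        logits[0] += neg
        logits[1] += pos
    return logits


def postprocess(logits):
    """Step 3: convert logits to probabilities and pick a readable label."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}


def toy_classifier(text):
    """Chain the three steps, as pipeline() does for a real model."""
    return postprocess(forward(preprocess(text)))


print(toy_classifier("I hate this so much!"))
```

The returned dictionary has the same `label` and `score` keys as the real pipeline output, which is the point of the post-processing step.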

### Available pipelines

- feature-extraction (get the vector representation of a text)

- fill-mask

- ner (named entity recognition)

- question-answering

- sentiment-analysis

- summarization

- text-generation

- translation

- zero-shot-classification

### Zero-shot classification

The zero-shot-classification pipeline is very powerful: it allows you to specify which labels to use for the classification, so you don’t have to rely on the labels of the pretrained model.
This pipeline is called zero-shot because you don’t need to fine-tune the model on your data to use it. It can directly return probability scores for any list of labels you want!

In [9]:
from transformers import pipeline

In [10]:
classifier = pipeline("zero-shot-classification")
classifier(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

No model was supplied, defaulted to roberta-large-mnli and revision 130fb28 (https://huggingface.co/roberta-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading config.json: 100%|██████████| 688/688 [00:00<00:00, 257kB/s]
Downloading tf_model.h5: 100%|██████████| 1.33G/1.33G [22:24<00:00, 1.06MB/s] 
All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

All the layers of TFRobertaForSequenceClassification were initialized from the model checkpoint at roberta-large-mnli.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForSequenceClassification for predictions without further training.
Downloading vocab.json: 100%|██████████| 878k/878k [00:01<00:00, 763kB/s] 
Downloading merges.txt: 100%|██████████| 446k/446k [00:01<00:00, 393kB/s]  
Downloading tokenizer.json: 100%|██████████| 1.29M/1.29M [00:02<00:00, 530kB/s] 


{'sequence': 'This is a course about the Transformers library',
 'labels': ['education', 'business', 'politics'],
 'scores': [0.9562344551086426, 0.026972182095050812, 0.01679334044456482]}
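The default checkpoint here, roberta-large-mnli, is a natural language inference model. As far as the library's behavior goes, each candidate label is turned into a hypothesis sentence, the model scores how strongly the input entails it, and the scores are normalized across labels, which is why the scores above sum to roughly 1. The sketch below shows only that last ranking step; the logits are invented for illustration and the function names are hypothetical.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into a hypothesis sentence for the NLI model."""
    return [template.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    peak = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = sorted(
        zip(labels, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in scored],
        "scores": [score for _, score in scored],
    }


labels = ["education", "politics", "business"]
# Hypothetical entailment logits, NOT produced by the real model.
fake_logits = [3.0, -1.0, -0.5]

hypotheses = build_hypotheses(labels)
result = rank_labels(labels, fake_logits)
print(hypotheses)
print(result)
```

Because no label-specific training is involved, any list of candidate labels can be passed at call time.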

### Text generation

The main idea here is that you provide a prompt and the model auto-completes it by generating the remaining text. This is similar to the predictive text feature found on many phones.

In [None]:
from transformers import pipeline

In [None]:
generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

#### Using any model from the Hub in a pipeline

In [11]:
generator = pipeline("text-generation", model="distilgpt2")
generator(
    "In this course, we will teach you how to",
    max_length=30,
    num_return_sequences=2,
)

Downloading config.json: 100%|██████████| 762/762 [00:00<00:00, 936kB/s]
Downloading tf_model.h5: 100%|██████████| 313M/313M [04:56<00:00, 1.11MB/s] 
All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at distilgpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.
Downloading vocab.json: 100%|██████████| 0.99M/0.99M [00:02<00:00, 462kB/s]
Downloading merges.txt: 100%|██████████| 446k/446k [00:01<00:00, 412kB/s]  
Downloading tokenizer.json: 100%|██████████| 1.29M/1.29M [00:02<00:00, 587kB/s] 
Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


[{'generated_text': 'In this course, we will teach you how to create one of the most common tools on your network using your smartphone and an app built with Node.'},
 {'generated_text': 'In this course, we will teach you how to apply in a class, especially on special education.”'}]
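Auto-completion can be pictured as a loop that repeatedly appends a next word until a length limit or an end marker is reached. The sketch below is a toy version: `NEXT_WORD` is a hypothetical lookup table standing in for a trained language model, and it always picks a single fixed continuation, whereas the real model works on tokens and may sample among several candidates.

```python
# Hypothetical next-word table standing in for a trained language model.
NEXT_WORD = {
    "how": "to",
    "to": "train",
    "train": "a",
    "a": "model",
    "model": "<eos>",
}


def generate(prompt, max_length=30):
    """Greedy auto-completion: keep appending the next word until done.

    max_length caps the total number of words, prompt included
    (the real pipeline counts tokens rather than words).
    """
    words = prompt.split()
    if not words:
        return ""
    while len(words) < max_length:
        next_word = NEXT_WORD.get(words[-1], "<eos>")
        if next_word == "<eos>":
            break  # the toy model signals the end of the text
        words.append(next_word)
    return " ".join(words)


print(generate("we will teach you how"))
print(generate("we will teach you how", max_length=7))
```

The second call shows the effect of a smaller `max_length`: generation stops early even though the toy model could continue.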

### Mask filling

Fill in the blanks in a given text.

In [None]:
from transformers import pipeline

In [12]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading config.json: 100%|██████████| 480/480 [00:00<00:00, 696kB/s]
Downloading tf_model.h5: 100%|██████████| 465M/465M [07:28<00:00, 1.09MB/s] 
All model checkpoint layers were used when initializing TFRobertaForMaskedLM.

All the layers of TFRobertaForMaskedLM were initialized from the model checkpoint at distilroberta-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFRobertaForMaskedLM for predictions without further training.
Downloading vocab.json: 100%|██████████| 878k/878k [00:01<00:00, 714kB/s]  
Downloading merges.txt: 100%|██████████| 446k/446k [00:00<00:00, 679kB/s] 
Downloading tokenizer.json: 100%|██████████| 1.29M/1.29M [00:02<00:00, 662kB/s] 


[{'score': 0.19619688391685486,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you all about mathematical models.'},
 {'score': 0.040526993572711945,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you all about computational models.'}]
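The `top_k` argument controls how many candidate fillers are returned. A minimal sketch of that selection step follows, assuming the model has already assigned a probability to each candidate. The first two probabilities are rounded from the output above, the third candidate is invented for illustration, and `fill_mask` is a hypothetical helper rather than a library function.

```python
def fill_mask(template, candidates, top_k=2, mask="<mask>"):
    """Return the top_k most probable fillers with the completed sentence."""
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    return [
        {
            "score": score,
            "token_str": word,
            "sequence": template.replace(mask, word.strip()),
        }
        for word, score in ranked[:top_k]
    ]


candidates = {
    " computational": 0.041,
    " mathematical": 0.196,
    " statistical": 0.030,  # hypothetical third candidate
}
results = fill_mask("This course will teach you all about <mask> models.", candidates)
for entry in results:
    print(entry)
```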

### Named entity recognition (NER)

Named entity recognition is a task in which the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.

In [None]:
from transformers import pipeline

In the cell below, the model is expected to identify that Sylvain is a person (PER), Hugging Face an organization (ORG), and Brooklyn a location (LOC).

In [None]:
ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
ner("My name is Sylvain and I work at Hugging Face in Brooklyn.")
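The `ner` cell above asks the pipeline to merge the pieces that belong to the same entity, so that "Hugging Face" comes back as one organization rather than two separate words. The sketch below illustrates the idea on word-level tags. The tagged input is written by hand for illustration, and `group_entities` is a hypothetical helper that is much simpler than the library's own grouping logic.

```python
def group_entities(tagged_words):
    """Merge consecutive words that share an entity tag into one span.

    tagged_words is a list of (position, word, tag) tuples, where
    position is the index of the word in the sentence.
    """
    groups = []
    for position, word, tag in tagged_words:
        last = groups[-1] if groups else None
        if last and last["entity_group"] == tag and last["end_pos"] + 1 == position:
            # Same tag and directly adjacent: extend the current entity.
            last["word"] += " " + word
            last["end_pos"] = position
        else:
            groups.append({"entity_group": tag, "word": word, "end_pos": position})
    # Drop the bookkeeping field before returning.
    return [{"entity_group": g["entity_group"], "word": g["word"]} for g in groups]


# Hand-written tags for "My name is Sylvain and I work at Hugging Face in Brooklyn."
tagged = [
    (3, "Sylvain", "PER"),
    (8, "Hugging", "ORG"),
    (9, "Face", "ORG"),
    (11, "Brooklyn", "LOC"),
]
entities = group_entities(tagged)
print(entities)
```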

### Question answering

The question-answering pipeline answers questions using information from a given context.

In [None]:
from transformers import pipeline

In [13]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Sylvain and I work at Hugging Face in Brooklyn",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Downloading config.json: 100%|██████████| 473/473 [00:00<00:00, 147kB/s]
Downloading tf_model.h5: 100%|██████████| 249M/249M [03:50<00:00, 1.13MB/s] 
Some layers from the model checkpoint at distilbert-base-cased-distilled-squad were not used when initializing TFDistilBertForQuestionAnswering: ['dropout_19']
- This IS expected if you are initializing TFDistilBertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceCl

{'score': 0.6949763894081116, 'start': 33, 'end': 45, 'answer': 'Hugging Face'}
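The `start` and `end` values in this output are character offsets into the context string, which shows that the answer is extracted from the context rather than generated. The short check below uses the context and result copied from the cells above.

```python
context = "My name is Sylvain and I work at Hugging Face in Brooklyn"
result = {
    "score": 0.6949763894081116,
    "start": 33,
    "end": 45,
    "answer": "Hugging Face",
}

# Slice the context with the returned offsets to recover the answer span.
answer_span = context[result["start"]:result["end"]]
print(answer_span)  # Hugging Face
```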

### Summarization

Summarization is the task of reducing a text into a shorter text while keeping all (or most) of the important aspects referenced in the text.

In [None]:
from transformers import pipeline

In [None]:
summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

### Translation

You can use a default model if you provide a language pair in the task name, but the easiest way is to pick the model you want to use on the Model Hub.

In [None]:
from transformers import pipeline

In [14]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

Downloading config.json: 100%|██████████| 1.26k/1.26k [00:00<00:00, 486kB/s]


ValueError: Could not load model Helsinki-NLP/opus-mt-fr-en with any of the following classes: (<class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForSeq2SeqLM'>, <class 'transformers.models.marian.modeling_tf_marian.TFMarianMTModel'>).

## Transformer Architecture

The Transformer architecture was introduced in June 2017. The focus of the original research was on translation tasks. Several influential models followed:

- GPT (June 2018)

- BERT (October 2018)

- GPT-2 (February 2019)

- BART/T5 (October 2019)

- GPT-3 (May 2020)

Broadly, they can be grouped into three categories:

1. GPT-like (also called auto-regressive Transformer models)

2. BERT-like (also called auto-encoding Transformer models)

3. BART/T5-like (also called sequence-to-sequence Transformer models)

### Transformers are language models

- All the Transformer models mentioned above have been trained as language models. 

- This means they have been trained on large amounts of raw text in a self-supervised fashion.

- Self-supervised learning is a type of training in which the objective is automatically computed from the inputs of the model.
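To make "the objective is automatically computed from the inputs" concrete, the sketch below builds training targets from raw text alone, with no human labels. It shows two common objectives in simplified form: predicting the next word and recovering a masked word. Splitting on whitespace and the function names are simplifying assumptions; real models operate on subword tokens.

```python
def next_token_pairs(text):
    """Build (context, target) pairs: each word is predicted from the words before it."""
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_pair(text, mask_index, mask="<mask>"):
    """Hide one word and keep it as the target the model must recover."""
    tokens = text.split()
    target = tokens[mask_index]
    tokens[mask_index] = mask
    return " ".join(tokens), target


sentence = "Transformers are language models"
for context, target in next_token_pairs(sentence):
    print(context, "->", target)
print(masked_pair(sentence, 2))
```

In both cases the target comes straight from the text itself, which is what allows training on very large unlabeled corpora.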

### Transformers are big models

- The general strategy to achieve better performance is by increasing the models’ sizes as well as the amount of data they are pretrained on.

- Training a model, especially a large one, requires a large amount of data. 

- This becomes very costly in terms of time and compute resources.

<img src="figs/carbon_footprint-dark.svg" width="1000" title="https://huggingface.co/"/>

### Transfer Learning

#### ***Pretraining***

- *Pretraining* is the act of training a model from scratch: the weights are randomly initialized, and the training starts without any prior knowledge.

- *Pretraining* is usually done on a very large corpus of data, and training can take up to several weeks.

<img src="figs/pretraining-dark.svg" width="700" title="https://huggingface.co/"/>

#### ***Fine-tuning***

- *Fine-tuning* is the training done after a model has been pretrained.

- To perform *fine-tuning*, you first acquire a pretrained language model, then perform additional training with a dataset specific to your task.

- The pretrained model was already trained on a dataset that has some similarities with the fine-tuning dataset.

- The fine-tuning process is thus able to take advantage of knowledge acquired by the initial model during pretraining.

- Since the pretrained model was already trained on lots of data, fine-tuning requires far less data to get decent results.

- The amount of time and resources needed to get good results is therefore much lower.

<img src="figs/finetuning-dark.svg" width="700" title="https://huggingface.co/"/>
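The benefit of starting from pretrained weights can be illustrated with a deliberately tiny example. The sketch below is not a Transformer: it is a one-parameter model `y = weight * x` trained by gradient descent, and the datasets, learning rate, and step counts are all invented for illustration. It compares a short training run on a small dataset starting from a "pretrained" weight against the same run starting from scratch.

```python
def train(weight, data, steps, lr=0.01):
    """Gradient descent on mean squared error for the model y = weight * x."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def loss(weight, data):
    """Mean squared error of the model on a dataset."""
    return sum((weight * x - y) ** 2 for x, y in data) / len(data)


# "Pretraining": many examples of a general task (y = 2x), starting from scratch.
pretrain_data = [(x, 2.0 * x) for x in range(1, 11)]
pretrained = train(0.0, pretrain_data, steps=200)

# "Fine-tuning": a few examples of a related task (y = 2.2x) and only a few steps.
finetune_data = [(1, 2.2), (2, 4.4), (3, 6.6)]
fine_tuned = train(pretrained, finetune_data, steps=5)
from_scratch = train(0.0, finetune_data, steps=5)

print("fine-tuned loss:  ", loss(fine_tuned, finetune_data))
print("from-scratch loss:", loss(from_scratch, finetune_data))
```

With the same small dataset and the same number of steps, the run that starts from the pretrained weight ends with a lower loss, because it begins close to a good solution for the related task.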