In this notebook we'll see how the different types of NLP pipelines in Hugging Face behave:  
feature-extraction (get the vector representation of a text)  
fill-mask  
ner (named entity recognition)  
question-answering  
sentiment-analysis  
summarization  
text-generation  
translation  
zero-shot-classification  


## Logging in to Hugging Face

In [1]:
from huggingface_hub import login



In [2]:
login("")
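
A token typed directly into a notebook cell is easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the token is stored in an `HF_TOKEN` environment variable; the helper names `read_hf_token` and `login_from_env` are hypothetical and not part of `huggingface_hub`:

```python
import os


def read_hf_token(env=os.environ):
    """Return the token stored under HF_TOKEN, or None if it is missing or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def login_from_env():
    """Log in with the token from the environment (hypothetical helper)."""
    # Imported lazily so read_hf_token can be used without huggingface_hub installed.
    from huggingface_hub import login

    token = read_hf_token()
    if token is None:
        raise RuntimeError("Set the HF_TOKEN environment variable first.")
    login(token=token)
```

Calling `login_from_env()` in place of `login("")` keeps the secret out of the notebook file.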

## Importing libraries and checking that torch works

In [4]:
import sys
import torch
import transformers

print("Python version:", sys.version)
print("Torch version:", torch.__version__)
print("Transformers version:", transformers.__version__)
print("Is CUDA available (torch):", torch.cuda.is_available())
print("Is torch available (transformers):", transformers.is_torch_available())

Python version: 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]
Torch version: 2.5.1+cu118
Transformers version: 4.48.1
Is CUDA available (torch): True
Is torch available (transformers): True
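
The result of `torch.cuda.is_available()` can also be used to choose where a pipeline runs. A minimal sketch, assuming a single-GPU machine; `pick_device` is a hypothetical helper, and it relies on the `device` argument of `transformers.pipeline`, where `-1` means CPU and a non-negative integer is a GPU index:

```python
def pick_device(cuda_available: bool) -> int:
    """Map a CUDA availability flag to a pipeline `device` value."""
    # 0 selects the first GPU, -1 selects the CPU.
    return 0 if cuda_available else -1


# Intended use (not executed here, since it loads a model):
#   device = pick_device(torch.cuda.is_available())
#   classifier = transformers.pipeline("sentiment-analysis", device=device)
print(pick_device(True), pick_device(False))
```

With the versions printed above the pipeline already selects `cuda:0` on its own, so passing `device` is mostly useful to force CPU execution or to target another GPU.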


## Learning the pipeline API

### Sentiment analysis

In [6]:
classifier = transformers.pipeline("sentiment-analysis")
classifier("At the end of this course I'll be really good at NLP")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.999765932559967}]
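
The log above warns against relying on the default model. A minimal sketch of pinning it explicitly, using the model name and revision copied from that warning; `PINNED` and `build_classifier` are hypothetical names introduced for illustration:

```python
# Model id and revision taken from the "No model was supplied" warning.
PINNED = {
    "task": "sentiment-analysis",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "714eb0f",
}


def build_classifier():
    """Create the sentiment pipeline with an explicit model and revision."""
    # Imported lazily: building the pipeline downloads or loads the model.
    import transformers

    return transformers.pipeline(**PINNED)
```

`build_classifier()` should behave like the default pipeline in this notebook while keeping the model fixed if the library's default changes.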

### Sentiment analysis on several sentences

In [7]:
classifier(
    ["I love Hugging Face because it's open source", "I hate Open AI because they are not sharing their models"]
)

[{'label': 'POSITIVE', 'score': 0.9997205138206482},
 {'label': 'NEGATIVE', 'score': 0.999228835105896}]
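
When a list is passed in, the results come back in the same order as the inputs. A small sketch that pairs each sentence with its prediction, using the output shown above as literal data; `pair_with_inputs` is a hypothetical helper:

```python
texts = [
    "I love Hugging Face because it's open source",
    "I hate Open AI because they are not sharing their models",
]
# Output copied from the cell above.
results = [
    {"label": "POSITIVE", "score": 0.9997205138206482},
    {"label": "NEGATIVE", "score": 0.999228835105896},
]


def pair_with_inputs(texts, results):
    """Return (text, label, score) tuples, one per input sentence."""
    if len(texts) != len(results):
        raise ValueError("texts and results must have the same length")
    return [(t, r["label"], r["score"]) for t, r in zip(texts, results)]


for text, label, score in pair_with_inputs(texts, results):
    print(f"{label:<8} {score:.4f}  {text}")
```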

### Text generation

In [8]:
generator = transformers.pipeline("text-generation")
generator("Learning using transformers will help me to ")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Learning using transformers will help me to \xa0learn to create transformers with little or no coding skills."'}]

### Zero-shot classification

In [10]:
classifier = transformers.pipeline("zero-shot-classification")
classifier(
    "I love to play league of legends",
    candidate_labels=["education", "video gaming", "business"],
)


No model was supplied, defaulted to facebook/bart-large-mnli and revision d7645e1 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


{'sequence': 'I love to play league of legends',
 'labels': ['video gaming', 'business', 'education'],
 'scores': [0.9800666570663452, 0.012204304337501526, 0.0077289557084441185]}
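
The `labels` and `scores` lists in this output are aligned by position. A short sketch of reading the best label from it, using the result above as literal data; `top_label` is a hypothetical helper:

```python
# Output copied from the cell above.
result = {
    "sequence": "I love to play league of legends",
    "labels": ["video gaming", "business", "education"],
    "scores": [0.9800666570663452, 0.012204304337501526, 0.0077289557084441185],
}


def top_label(result):
    """Return the (label, score) pair with the highest score."""
    return max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])


label, score = top_label(result)
print(f"{label} ({score:.2%})")
# In this result the scores over the candidate labels add up to roughly 1.
print(round(sum(result["scores"]), 6))
```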

### Generation using a specific model

In [11]:
generator = transformers.pipeline("text-generation", model="distilgpt2")
generator(
    "If I improved my AI skills daily instead of playing video games I will",
    max_length=30,
    num_return_sequences=2,
)


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'If I improved my AI skills daily instead of playing video games I will get to take some fun experimenting with the AI in the new RPG.\n\n'},
 {'generated_text': 'If I improved my AI skills daily instead of playing video games I will be seeing some improvement in AI skills in my games :) I am just starting to'}]
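
Each `generated_text` repeats the prompt before the new text. A minimal sketch of keeping only the continuation, using the first output above as literal data; `continuation` is a hypothetical helper:

```python
prompt = "If I improved my AI skills daily instead of playing video games I will"
# First output copied from the cell above.
generated = (
    "If I improved my AI skills daily instead of playing video games I will"
    " get to take some fun experimenting with the AI in the new RPG.\n\n"
)


def continuation(prompt, generated_text):
    """Drop the prompt from the start of the generated text and trim whitespace."""
    if not generated_text.startswith(prompt):
        raise ValueError("generated text does not start with the prompt")
    return generated_text[len(prompt):].strip()


print(continuation(prompt, generated))
```

The text-generation pipeline also accepts `return_full_text=False`, which should give the same effect without post-processing.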

### Fill mask

In [12]:
unmasker = transformers.pipeline("fill-mask")
unmasker("Obama became the president of US in <mask>", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision fb53ab8 (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the check

[{'score': 0.32234489917755127,
  'token': 2338,
  'token_str': ' 2009',
  'sequence': 'Obama became the president of US in 2009'},
 {'score': 0.15483367443084717,
  'token': 2266,
  'token_str': ' 2008',
  'sequence': 'Obama became the president of US in 2008'}]
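
The `<mask>` placeholder is specific to the model family: RoBERTa-style models such as the default one here use `<mask>`, while BERT-style models use `[MASK]`. A minimal sketch of building the prompt from a template so that the token is not hard-coded; `with_mask` and the `{mask}` placeholder are hypothetical conventions introduced for illustration:

```python
def with_mask(template, mask_token):
    """Replace the {mask} placeholder with the model's own mask token."""
    if "{mask}" not in template:
        raise ValueError("template must contain a {mask} placeholder")
    return template.replace("{mask}", mask_token)


template = "Obama became the president of US in {mask}"
print(with_mask(template, "<mask>"))   # RoBERTa-style token
print(with_mask(template, "[MASK]"))   # BERT-style token

# With a loaded pipeline the token can be read from its tokenizer:
#   unmasker(with_mask(template, unmasker.tokenizer.mask_token), top_k=2)
```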

### Named entity recognition (NER)

In [13]:
# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = transformers.pipeline("ner", aggregation_strategy="simple")
ner("My name is Baptiste, I would like to work in Amsterdam")


No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'entity_group': 'PER',
  'score': np.float32(0.9994764),
  'word': 'Baptiste',
  'start': 11,
  'end': 19},
 {'entity_group': 'LOC',
  'score': np.float32(0.9990601),
  'word': 'Amsterdam',
  'start': 45,
  'end': 54}]
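
The `start` and `end` values are character offsets into the input string, with `end` exclusive. A short sketch that recovers each entity by slicing, using the output above as literal data; the scores are written as plain floats instead of `np.float32` so that the block needs no NumPy, and `entity_spans` is a hypothetical helper:

```python
text = "My name is Baptiste, I would like to work in Amsterdam"
# Output copied from the cell above, with scores as plain floats.
entities = [
    {"entity_group": "PER", "score": 0.9994764, "word": "Baptiste", "start": 11, "end": 19},
    {"entity_group": "LOC", "score": 0.9990601, "word": "Amsterdam", "start": 45, "end": 54},
]


def entity_spans(text, entities):
    """Return (entity_group, surface_text) pairs sliced from the original text."""
    return [(e["entity_group"], text[e["start"]:e["end"]]) for e in entities]


print(entity_spans(text, entities))
```

Slicing the original text rather than reading `word` can be safer when the tokenizer changes casing or spacing.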

### Question Answering

In [14]:
question_answerer = transformers.pipeline("question-answering")
question_answerer(
    question="Where would I like to work?",
    context="My name is Baptiste, I would like to work in Amsterdam",
)


No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


{'score': 0.9355255961418152, 'start': 45, 'end': 54, 'answer': 'Amsterdam'}
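
An extractive question-answering model returns a span even when the context does not really contain the answer, so the `score` is worth checking. A minimal sketch, using the output above as literal data; `accept_answer` is a hypothetical helper and the `0.5` default threshold is an arbitrary assumption:

```python
# Output copied from the cell above.
result = {"score": 0.9355255961418152, "start": 45, "end": 54, "answer": "Amsterdam"}


def accept_answer(result, min_score=0.5):
    """Return the answer text if its score reaches min_score, else None."""
    return result["answer"] if result["score"] >= min_score else None


print(accept_answer(result))
print(accept_answer(result, min_score=0.99))
```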

### Summarization

In [15]:
summarizer = transformers.pipeline("summarization")
summarizer(
    """
    Maître Corbeau, sur un arbre perché,
    Tenait en son bec un fromage.
    Maître Renard, par l'odeur alléché,
    Lui tint à peu près ce langage :
    Et bonjour, Monsieur du Corbeau.
    Que vous êtes joli ! que vous me semblez beau !
    Sans mentir, si votre ramage
    Se rapporte à votre plumage,
    Vous êtes le Phénix des hôtes de ces bois.
    À ces mots, le Corbeau ne se sent pas de joie ;
    Et pour montrer sa belle voix,
    Il ouvre un large bec, laisse tomber sa proie.
    Le Renard s'en saisit, et dit : Mon bon Monsieur,
    Apprenez que tout flatteur
    Vit aux dépens de celui qui l'écoute.
    Cette leçon vaut bien un fromage, sans doute.
    Le Corbeau honteux et confus
    Jura, mais un peu tard, qu'on ne l'y prendrait plus.

"""
)


No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cuda:0


[{'summary_text': " The Corbeau, sur un arbre perché, tenait en son bec un fromage . The Renard s'en saisit, and dit : Mon bon Monsieur, mais un peu tard, qu'on ne l'y prendrait plus ."}]
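
The summary above mixes French and English. One likely cause, stated here as an assumption, is that the default checkpoint was trained on English news articles. A sketch of translating the text to English before summarizing; `summarize_via_english` is a hypothetical helper that takes any two pipeline-like callables, and the stub functions below only stand in for real models so that the block runs without downloads:

```python
def summarize_via_english(text, translate, summarize):
    """Translate each non-empty line to English, then summarize the joined text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    english = " ".join(translate(line)[0]["translation_text"] for line in lines)
    return summarize(english)[0]["summary_text"]


# Stand-ins that mimic the output shapes of the translation and
# summarization pipelines; they are NOT real models.
def fake_translate(sentence):
    return [{"translation_text": sentence.upper()}]


def fake_summarize(document):
    return [{"summary_text": document[:5]}]


print(summarize_via_english("  ab\n\n  cd\n", fake_translate, fake_summarize))
```

With real models, `translate` could be a French-to-English translation pipeline such as `Helsinki-NLP/opus-mt-fr-en` and `summarize` the `summarizer` defined in this cell.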

### Translation

In [16]:
translator = transformers.pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")


Device set to use cuda:0


[{'translation_text': 'This course is produced by Hugging Face.'}]