In [54]:
!pip install transformers datasets
!pip install torch



In [55]:
# The pipeline function

# pipeline() bundles preprocessing, the model, and post-processing into a single call

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I've always wanted to learn about transformers and I am finally doing it")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9993451237678528}]
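
To make the post-processing step a bit more concrete, here is a minimal sketch of how two raw model scores (logits) could be turned into a label/score dictionary like the one above. The logit values and the `softmax` and `postprocess` helpers are made up for illustration and are not the library's internal code; only the label names come from the output above.

```python
import math

# Label order is an assumption for this sketch: index 0 = NEGATIVE, index 1 = POSITIVE.
LABELS = ["NEGATIVE", "POSITIVE"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(logits):
    """Keep the most probable label, in the same shape as the pipeline output."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}

result = postprocess([-3.5, 3.8])  # hypothetical logits, not real model output
print(result)
```

The real pipeline also tokenizes the text before the model runs; this sketch only covers the last step.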

Several texts can be passed to the same pipeline at once as a list; the results come back in the same order.

In [56]:
classifier = pipeline("sentiment-analysis")
classifier(["I've always wanted to learn about transformers and I am finally doing it",
            "I hate AI, it's so difficult and trendy that meakes me nauseous"
])

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9993451237678528},
 {'label': 'NEGATIVE', 'score': 0.9981325268745422}]

Another possible task is zero-shot classification. Selecting it in the pipeline lets you supply your own candidate labels for the model to choose among.

In [57]:
classifier = pipeline("zero-shot-classification")
classifier("I've always wanted to learn about transformers and I am finally doing it",
            candidate_labels = ["I love AI", "I hate AI"]
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': "I've always wanted to learn about transformers and I am finally doing it",
 'labels': ['I love AI', 'I hate AI'],
 'scores': [0.7838669419288635, 0.21613304316997528]}
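
The result is a plain dictionary, so picking the winning label is ordinary Python. The sketch below reuses the output copied from the cell above. In this output the labels already appear ordered from highest to lowest score, but the code sorts explicitly instead of relying on that.

```python
# Output copied from the zero-shot cell above.
result = {
    "sequence": "I've always wanted to learn about transformers and I am finally doing it",
    "labels": ["I love AI", "I hate AI"],
    "scores": [0.7838669419288635, 0.21613304316997528],
}

# Pair each label with its score and rank them from most to least likely.
ranked = sorted(
    zip(result["labels"], result["scores"]),
    key=lambda pair: pair[1],
    reverse=True,
)
best_label, best_score = ranked[0]

print(f"{best_label} ({best_score:.1%})")
# In this output the scores behave like probabilities over the candidate labels.
print(f"scores sum to {sum(result['scores']):.4f}")
```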

We can move on to another task like text generation.

In [58]:
generator = pipeline("text-generation")
generator("In this couorse, we will teach you how to use transformers")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this couorse, we will teach you how to use transformers.\n\n1.1. Transformer - The second step to mastering transformers is to figure out the right way to handle one.\n\nThis step is where you will'}]

Up until now we've only used the default models, but we can use any pretrained model from the Hugging Face Hub. This way you can try, download, and upload different models on your own. For example, let's try distilgpt2 to see how well it generates text.


In [59]:
generator = pipeline("text-generation", model="distilgpt2")
generator("In this couorse, we will teach you how to",
          max_length=30,  # maximum total length in tokens, prompt included
          num_return_sequences=2)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this couorse, we will teach you how to use and use a similar style.\n\n\n\nIt is not the first time this cou'},
 {'generated_text': 'In this couorse, we will teach you how to remove obstacles in your movement.\n\nIf any of you are familiar with the process, you'}]
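
Each `generated_text` starts with the prompt itself, so if we only want the newly generated part we can strip the prompt off. This is a small sketch on the two outputs copied from the cell above; the `continuation` helper is a hypothetical name, not part of transformers.

```python
prompt = "In this couorse, we will teach you how to"

# Outputs copied from the distilgpt2 cell above.
outputs = [
    {"generated_text": "In this couorse, we will teach you how to use and use a similar style.\n\n\n\nIt is not the first time this cou"},
    {"generated_text": "In this couorse, we will teach you how to remove obstacles in your movement.\n\nIf any of you are familiar with the process, you"},
]

def continuation(prompt, generated_text):
    """Return only the text added after the prompt (unchanged if the prompt is absent)."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text

continuations = [continuation(prompt, out["generated_text"]) for out in outputs]
for i, text in enumerate(continuations, start=1):
    print(f"{i}: {text!r}")
```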

Another possible task is fill-mask, which fills in the missing words in a sentence or text.

In [60]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you about all <mask> models.", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.13647134602069855,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This course will teach you about all mathematical models.'},
 {'score': 0.0626971498131752,
  'token': 38163,
  'token_str': ' computational',
  'sequence': 'This course will teach you about all computational models.'}]
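
Each prediction is a dictionary, so we can pick the best candidate by its score. Note that `token_str` comes back with a leading space here (as far as I understand, the tokenizer treats the space as part of the token), so the sketch strips it. The data is copied from the output above.

```python
# Predictions copied from the fill-mask cell above.
predictions = [
    {"score": 0.13647134602069855,
     "token": 30412,
     "token_str": " mathematical",
     "sequence": "This course will teach you about all mathematical models."},
    {"score": 0.0626971498131752,
     "token": 38163,
     "token_str": " computational",
     "sequence": "This course will teach you about all computational models."},
]

# Select the candidate with the highest score and clean up the token text.
top = max(predictions, key=lambda p: p["score"])
word = top["token_str"].strip()

print(f"Best fill: {word!r} (score {top['score']:.3f})")
print(top["sequence"])
```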

Another task is named entity recognition (NER).

In [61]:
# ner = pipeline("ner", grouped_entities=True)
# ner("My name is Juan and I work at Circular-lab in Madrid")

Another task is question answering.

In [63]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Where do I work?",
    context="My name is Juan and I work at Circular-Lab in Madrid"
)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'score': 0.7590852379798889, 'start': 30, 'end': 42, 'answer': 'Circular-Lab'}
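
The `start` and `end` values are character offsets into the context, which means the answer is extracted from the text rather than generated. A quick check using the output copied from above:

```python
context = "My name is Juan and I work at Circular-Lab in Madrid"

# Result copied from the question-answering cell above.
result = {"score": 0.7590852379798889, "start": 30, "end": 42, "answer": "Circular-Lab"}

# Slicing the context with the offsets gives back the answer string.
span = context[result["start"]:result["end"]]
print(span)
print(span == result["answer"])
```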

Another possible task is text summarization.

In [64]:
summarizer = pipeline("summarization")

summarizer("""John Fitzgerald Kennedy (May 29, 1917 – November 22, 1963), often referred to as JFK, was an American politician who served as the 35th president of the United States from 1961 until his assassination in 1963. He was the youngest person elected president.[2] Kennedy served at the height of the Cold War, and the majority of his foreign policy concerned relations with the Soviet Union and Cuba. A Democrat, Kennedy represented Massachusetts in both houses of the U.S. Congress prior to his presidency.

Born into the prominent Kennedy family in Brookline, Massachusetts, Kennedy graduated from Harvard University in 1940, joining the U.S. Naval Reserve the following year. During World War II, he commanded PT boats in the Pacific theater. Kennedy's survival following the sinking of PT-109 and his rescue of his fellow sailors made him a war hero and earned the Navy and Marine Corps Medal, but left him with serious injuries. After a brief stint in journalism, Kennedy represented a working-class Boston district in the U.S. House of Representatives from 1947 to 1953. He was subsequently elected to the U.S. Senate, serving as the junior senator for Massachusetts from 1953 to 1960. While in the Senate, Kennedy published his book, Profiles in Courage, which won a Pulitzer Prize. Kennedy ran in the 1960 presidential election. His campaign gained momentum after the first televised presidential debates in American history, and he was elected president, narrowly defeating Republican opponent Richard Nixon, the incumbent vice president.""")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' John Fitzgerald Kennedy was the 35th president of the United States from 1961 to 1963 . He was the youngest person elected president . Kennedy served at the height of the Cold War, and the majority of his foreign policy concerned relations with the Soviet Union and Cuba . A Democrat, Kennedy represented Massachusetts in both houses of the U.S. Congress .'}]

To finish this quick introduction, the last task we'll try with the Hugging Face pipeline API is translation.

In [65]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]