<a href="https://colab.research.google.com/github/cloudhood/learning-basics/blob/main/notebooks/HF_Chapter_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [Hugging Face Course - Chapter 1](https://huggingface.co/course/chapter1/1?fw=pt)

Common NLP tasks:

*   Classifying whole sentences: sentiment, spam or not spam, grammatically correct or not, whether two sentences are logically related
*   Classifying each word in a sentence: grammatical component, named entities
*   Generating text: completing a prompt, filling in masked words
*   Extracting an answer from text: given a question and a context, extract the answer from the context
*   Generating a new sentence from input text: translation, summarization
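
As a rough orientation, the task categories above can be paired with the `pipeline` task strings used in the rest of this notebook. The dictionary name `TASK_TO_PIPELINE` and the grouping are illustrative choices made here, not something the course defines.

```python
# Hypothetical lookup table: task category -> pipeline task strings used in this notebook.
TASK_TO_PIPELINE = {
    "classifying whole sentences": ["sentiment-analysis", "zero-shot-classification"],
    "classifying words in sentences": ["ner"],
    "generating text": ["text-generation", "fill-mask"],
    "extracting from text": ["question-answering"],
    "generating sentences from input text": ["summarization", "translation"],
}

# Print one line per category so the mapping is easy to scan.
for category, task_names in TASK_TO_PIPELINE.items():
    print(f"{category}: {', '.join(task_names)}")
```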

In [3]:
!pip install datasets transformers &> /dev/null

In [4]:
from transformers import pipeline

In [5]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)


In [6]:
txt = ["I've been waiting for a HuggingFace course my whole life.", "I hate this so much!"]

In [7]:
classifier(txt)

[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]
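
The classifier returns one result per input, in input order, so the results can be zipped back onto the sentences. A minimal sketch using the values printed above; no model is run, and the names `texts`, `results`, and `paired` are illustrative.

```python
# Inputs and outputs copied from the cells above; nothing is downloaded or executed.
texts = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
results = [
    {"label": "POSITIVE", "score": 0.9598048329353333},
    {"label": "NEGATIVE", "score": 0.9994558691978455},
]

# Pair each sentence with its predicted label and a rounded score.
paired = [(text, r["label"], round(r["score"], 4)) for text, r in zip(texts, results)]
for text, label, score in paired:
    print(f"{label:<8} {score:.4f}  {text}")
```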

In [11]:
classifier2 = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


In [14]:
classifier2(
    "This is a course about the Transformers library",
    candidate_labels=["education", "politics", "business"],
)

{'labels': ['education', 'business', 'politics'],
 'scores': [0.8445987105369568, 0.11197440326213837, 0.043426960706710815],
 'sequence': 'This is a course about the Transformers library'}
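
The zero-shot result lists `labels` in descending score order, with `scores` aligned by position. With the default single-label setting, the scores are normalized across the candidate labels, so they sum to roughly 1. A small sketch over the output above; the helper names are assumptions made for illustration.

```python
# Output copied from the cell above; no model is run here.
result = {
    "labels": ["education", "business", "politics"],
    "scores": [0.8445987105369568, 0.11197440326213837, 0.043426960706710815],
    "sequence": "This is a course about the Transformers library",
}

# Align labels with their scores by position.
label_scores = dict(zip(result["labels"], result["scores"]))

# The first label is the best one because the lists are sorted by score.
top_label = result["labels"][0]

# Sum of the scores, expected to be close to 1 for single-label classification.
total = sum(result["scores"])

print(top_label, label_scores[top_label], round(total, 6))
```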

In [17]:
classifier2(
    "Oracle's Database Dominance Eroded by Rise of Cloud-First Rivals",
    candidate_labels=["education", "politics", "business"],
)

{'labels': ['business', 'politics', 'education'],
 'scores': [0.9154168963432312, 0.051402222365140915, 0.0331808365881443],
 'sequence': 'Oracle’s Database Dominance Eroded by Rise of Cloud-First Rivals'}

In [18]:
txt2 = """
The move to the cloud is challenging the systems of the past. 
Newer providers are also making it much easier to adopt their technology directly, 
alleviating the need for corporate purchasers to negotiate large contracts with 
salespeople and allowing end users to more easily pick their own tools.
Offerings from the newer software makers can also be deployed without large
teams of database administrators that are typically needed to support 
Oracle’s products, a cost-saver for organizations that would otherwise 
have to fight against other businesses for these in-demand engineers.
"""


In [20]:
classifier2(txt2, candidate_labels=["education", "politics", "business"])

{'labels': ['business', 'education', 'politics'],
 'scores': [0.8273109793663025, 0.0881243422627449, 0.0845646858215332],
 'sequence': '\nThe move to the cloud is challenging the systems of the past. \nNewer providers are also making it much easier to adopt their technology directly, \nalleviating the need for corporate purchasers to negotiate large contracts with \nsalespeople and allowing end users to more easily pick their own tools.\nOfferings from the newer software makers can also be deployed without large\nteams of database administrators that are typically needed to support \nOracle’s products, a cost-saver for organizations that would otherwise \nhave to fight against other businesses for these in-demand engineers.\n'}

In [8]:
generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)


In [9]:
generator("In this course, we will teach you how to")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to create, organize and maintain a secure web system using the OOP tools. Please read about the OOP Tools tutorials and learn more about the security and security features in this course.\n\nHow do'}]

In [10]:
generator("Sometimes I feel like")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Sometimes I feel like I got the message."\n\nBut the news that the NFL has placed an NFL player on a "disappointment reserve" would be a huge blow to the team.'}]

In [21]:
generator("The move to the cloud is challenging the systems of the past. ")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "The move to the cloud is challenging the systems of the past. \xa0As we've seen above, the cloud is much more expensive and very limited in scope of storage in today's large data centers. On the other hand cloud storage has become a"}]

In [22]:
generator2 = pipeline("text-generation", model="distilgpt2")
generator2(
    "In this course, we will teach you how to",
    max_length=30,
)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, we will teach you how to create a simple program so this is simple to understand, and will also explain what this would look like'}]

In [23]:
unmasker = pipeline("fill-mask")
unmasker("This course will teach you all about <mask> models.", top_k=2)

No model was supplied, defaulted to distilroberta-base (https://huggingface.co/distilroberta-base)


[{'score': 0.196198508143425,
  'sequence': 'This course will teach you all about mathematical models.',
  'token': 30412,
  'token_str': ' mathematical'},
 {'score': 0.040527332574129105,
  'sequence': 'This course will teach you all about computational models.',
  'token': 38163,
  'token_str': ' computational'}]
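
Note the leading space in each `token_str`: this tokenizer treats the space before a word as part of the token. The sketch below rebuilds each `sequence` from the prompt and the predicted token, using only the values printed above. The variable names are illustrative.

```python
prompt = "This course will teach you all about <mask> models."

# Predictions copied from the output above (score and token id omitted for brevity).
predictions = [
    {"token_str": " mathematical",
     "sequence": "This course will teach you all about mathematical models."},
    {"token_str": " computational",
     "sequence": "This course will teach you all about computational models."},
]

# The predicted token already carries its leading space, so replace " <mask>" (with the space).
rebuilt = [prompt.replace(" <mask>", p["token_str"]) for p in predictions]

for sentence in rebuilt:
    print(sentence)
```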

In [27]:
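# Note: `grouped_entities=True` is deprecated (see the warning in the output); `aggregation_strategy="simple"` is the replacement.
ner = pipeline("ner", grouped_entities=True)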
ner("Vlaidmir Putin OJSC")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english)
  f'`grouped_entities` is deprecated and will be removed in version v5.0.0, defaulted to `aggregation_strategy="{aggregation_strategy}"` instead.'


[{'end': 19,
  'entity_group': 'ORG',
  'score': 0.796227,
  'start': 0,
  'word': 'Vlaidmir Putin OJSC'}]

In [33]:
ner("""
PJSC Sberbank (Public Joint-Stock Company Sberbank) is involved in obtaining a benefit from or
supporting the Government of Russia. PJSC Sberbank is Russia’s largest bank by assets
controlled, and offers a range of financial services to consumers and business clients. 
It is a highly significant entity in the Russian financial services sector, a sector of strategic
significance to the Government of Russia. The Government of Russia has a controlling share in PJSC Sberbank, 
meaning that PJSC Sberbank also carries on business as a Government of Russia-affiliated entity. 
(Phone number):+8 (800) 555-55-50 
(Emailaddress):media@Sberbank.ru 
(Typeofentity):(1) Bank (2) FinancialServices Company 
(Business RegNo):1027700132195
""")

[{'end': 14,
  'entity_group': 'ORG',
  'score': 0.9975168,
  'start': 1,
  'word': 'PJSC Sberbank'},
 {'end': 28,
  'entity_group': 'ORG',
  'score': 0.96061456,
  'start': 23,
  'word': 'Joint'},
 {'end': 51,
  'entity_group': 'ORG',
  'score': 0.90444696,
  'start': 29,
  'word': 'Stock Company Sberbank'},
 {'end': 124,
  'entity_group': 'ORG',
  'score': 0.5352233,
  'start': 122,
  'word': 'of'},
 {'end': 131,
  'entity_group': 'LOC',
  'score': 0.98288316,
  'start': 125,
  'word': 'Russia'},
 {'end': 146,
  'entity_group': 'ORG',
  'score': 0.9980531,
  'start': 133,
  'word': 'PJSC Sberbank'},
 {'end': 156,
  'entity_group': 'LOC',
  'score': 0.99936575,
  'start': 150,
  'word': 'Russia'},
 {'end': 319,
  'entity_group': 'MISC',
  'score': 0.9992052,
  'start': 312,
  'word': 'Russian'},
 {'end': 402,
  'entity_group': 'ORG',
  'score': 0.45284644,
  'start': 400,
  'word': 'of'},
 {'end': 409,
  'entity_group': 'LOC',
  'score': 0.9563299,
  'start': 403,
  'word': 'Russia'},
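
Several of the entities above are clearly spurious, such as the word "of" tagged as `ORG` with scores near 0.5. One simple mitigation is to drop entities below a score threshold. The sketch below uses a few entries copied from the output; the function name `keep_confident` and the cutoff of 0.9 are assumptions chosen for illustration, not values recommended by the course.

```python
# A subset of the entities printed above; no model is run here.
entities = [
    {"entity_group": "ORG", "score": 0.9975168, "start": 1, "end": 14, "word": "PJSC Sberbank"},
    {"entity_group": "ORG", "score": 0.5352233, "start": 122, "end": 124, "word": "of"},
    {"entity_group": "LOC", "score": 0.98288316, "start": 125, "end": 131, "word": "Russia"},
    {"entity_group": "ORG", "score": 0.45284644, "start": 400, "end": 402, "word": "of"},
]


def keep_confident(items, min_score=0.9):
    """Return only the entities whose score is at least min_score (hypothetical helper)."""
    return [item for item in items if item["score"] >= min_score]


confident = keep_confident(entities)
words = [item["word"] for item in confident]
print(words)
```

A threshold does not catch every error: the fragment `'Joint'` above scores 0.96 and would survive this filter.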

In [31]:
question_answerer = pipeline("question-answering")
question_answerer(
    question="Who is Vladimir Putin?",
    context="Director of Vladimir Putin OJSC",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad (https://huggingface.co/distilbert-base-cased-distilled-squad)


{'answer': 'Director of Vladimir Putin OJSC',
 'end': 31,
 'score': 0.7999851107597351,
 'start': 0}
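
This pipeline is extractive: the answer is a span of the context, and `start` and `end` are character offsets into it. Here the model returned the entire context as the answer. A quick check using the values above, with illustrative variable names:

```python
# Context and prediction copied from the cell above; no model is run here.
context = "Director of Vladimir Putin OJSC"
prediction = {
    "answer": "Director of Vladimir Putin OJSC",
    "start": 0,
    "end": 31,
    "score": 0.7999851107597351,
}

# Slicing the context with the returned offsets recovers the answer text.
span = context[prediction["start"]:prediction["end"]]
print(span == prediction["answer"], len(context))
```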

In [29]:
summarizer = pipeline("summarization")
summarizer(
    """
    America has changed dramatically during recent years. Not only has the number of 
    graduates in traditional engineering disciplines such as mechanical, civil, 
    electrical, chemical, and aeronautical engineering declined, but in most of 
    the premier American universities engineering curricula now concentrate on 
    and encourage largely the study of engineering science. As a result, there 
    are declining offerings in engineering subjects dealing with infrastructure, 
    the environment, and related issues, and greater concentration on high 
    technology subjects, largely supporting increasingly complex scientific 
    developments. While the latter is important, it should not be at the expense 
    of more traditional engineering.

    Rapidly developing economies such as China and India, as well as other 
    industrial countries in Europe and Asia, continue to encourage and advance 
    the teaching of engineering. Both China and India, respectively, graduate 
    six and eight times as many traditional engineers as does the United States. 
    Other industrial countries at minimum maintain their output, while America 
    suffers an increasingly serious decline in the number of engineering graduates 
    and a lack of well-educated engineers.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


[{'summary_text': ' America has changed dramatically during recent years . The number of engineering graduates in the U.S. has declined in traditional engineering disciplines such as mechanical, civil,    electrical, chemical, and aeronautical engineering . Rapidly developing economies such as China and India continue to encourage and advance the teaching of engineering .'}]

In [30]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")
translator("Ce cours est produit par Hugging Face.")

[{'translation_text': 'This course is produced by Hugging Face.'}]