# Let's try Hugging Face Transformers NLP Pipelines!


In [38]:
!pip install transformers



In [39]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "NLP (Natural Language Processing) is a field within Artificial Intelligence (AI)",
    candidate_labels=["Machine learning", "Education", "Science"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'NLP (Natural Language Processing) is a field within Artificial Intelligence (AI)',
 'labels': ['Machine learning', 'Education', 'Science'],
 'scores': [0.5223644375801086, 0.263776957988739, 0.21385857462882996]}

In [40]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "Mathematics is very important for life.",
    candidate_labels=['education', 'Machine learning', 'Business', 'Money'],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'Mathematics is very important for life.',
 'labels': ['Money', 'education', 'Business', 'Machine learning'],
 'scores': [0.3753499984741211,
  0.2622103691101074,
  0.2253294289112091,
  0.13711023330688477]}

In [41]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "The openness of the economy to technological innovation can enhance productivity and global competitiveness.",
    candidate_labels=['Economic','Business','Tecnology'],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'The openness of the economy to technological innovation can enhance productivity and global competitiveness.',
 'labels': ['Economic', 'Business', 'Tecnology'],
 'scores': [0.628516674041748, 0.2487160861492157, 0.12276724725961685]}

In [42]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, you will gain experience")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In this course, you will gain experience in the design language and in making your own user interfaces based on the framework used to demonstrate a typical web app.\n\nIn this course you will learn about how to design a web app using Objective-C'}]

In [43]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "After I shower, I feel",
    max_length=50,
    num_return_sequences=2,
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'After I shower, I feel warm, I feel relaxed, I feel warm\n\nI feel a warmth feeling I feel warm\nIt feels natural, it feels natural, it feels natural, it feels natural, it feels natural\nI feel warm,'},
 {'generated_text': 'After I shower, I feel refreshed, calm and fresh for our days.\n"I\'m feeling refreshed. I used to feel tired and hungry. I always wanted some of myself." I\'ve had my fair share of this, but it\'s not'}]

In [44]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator("In the Deep Learning Fundamentals lesson, Mr. Arifian will explain about")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In the Deep Learning Fundamentals lesson, Mr. Arifian will explain about his experience of learning that there are now thousands of tools for learning.\n\n\n\nHow to learn Deep Learning\nOne of these learning techniques for learning a topic'}]

In [45]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("This class covers all aspects of <mask> modeling techniques", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.17206403613090515,
  'token': 30412,
  'token_str': ' mathematical',
  'sequence': 'This class covers all aspects of mathematical modeling techniques'},
 {'score': 0.11831101030111313,
  'token': 27930,
  'token_str': ' predictive',
  'sequence': 'This class covers all aspects of predictive modeling techniques'}]

In [46]:
from transformers import pipeline

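# aggregation_strategy="simple" merges sub-word tokens into whole entities
# (it replaces the deprecated grouped_entities=True flag)
ner = pipeline("ner", aggregation_strategy="simple")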
ner("My name is Didik, and I am studying at the University of Papua.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9988661,
  'word': 'Didik',
  'start': 11,
  'end': 16},
 {'entity_group': 'ORG',
  'score': 0.99770373,
  'word': 'University of Papua',
  'start': 43,
  'end': 62}]

In [47]:
from transformers import pipeline

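qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
# Context (Indonesian): "Photosynthesis is the process by which green plants convert sunlight
# into chemical energy in the form of glucose, using water and carbon dioxide."
context = "Fotosintesis adalah proses di mana tumbuhan hijau mengubah cahaya matahari menjadi energi kimia dalam bentuk glukosa, menggunakan air dan karbon dioksida."
# Question (Indonesian): "What is photosynthesis?"
question = "Apa itu fotosintesis?"

# This checkpoint was fine-tuned on the English SQuAD dataset, so spans extracted
# from Indonesian text may be imprecise (here "adalah proses" means "is the process").
result = qa_pipeline(question=question, context=context)
print(f"Answer: {result['answer']}")

Answer: adalah proses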


In [48]:
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
# Context (Indonesian): a short biography of Lionel Messi (birth date, club, awards, trophies).
context = "Lionel Andrés Leo Messi Cuccitini lahir 24 Juni 1987) adalah pemain sepak bola profesional Argentina yang bermain sebagai penyerang untuk klub Major League Soccer, Inter Miami CF dan merupakan kapten timnas Argentina. Dianggap luas sebagai pemain terhebat sepanjang masa,[7][8][9][10] Messi telah memenangkan delapan penghargaan Ballon d'Or dan enam Sepatu Emas Eropa. Di tim nasional negara ia juga telah memenangkan 2 Copa America dan 1 Piala Dunia FIFA. Ia juga merupakan salah satu pemain dengan trofi terbanyak dalam sejarah sepak bola, dengan 45 trofi."
# Question (Indonesian): "Messi is?" (i.e. "Who is Messi?")
question = "Messi adalah?"

result = qa_pipeline(question=question, context=context)
# Prints the extracted span; "pemain sepak bola" means "football player"
print(f"Answer: {result['answer']}")

Answer: pemain sepak bola


In [49]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("The shirt is very nice; I'm interested in this one.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998447895050049}]

In [50]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    Artificial Intelligence (AI) is a branch of computer science focused on creating systems capable of performing tasks that normally require human intelligence.
    This includes abilities like visual perception, speech recognition, decision-making, and language translation.
    AI systems achieve these tasks by processing large amounts of data and using algorithms to recognize patterns, allowing them to make predictions or solve problems.
    There are different types of AI, from rule-based systems to advanced machine learning, which is widely used in fields like healthcare, finance, robotics, and customer service.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Your max_length is set to 142, but your input_length is only 126. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=63)


[{'summary_text': ' Artificial Intelligence (AI) is a branch of computer science focused on creating systems capable of performing tasks that normally require human intelligence . This includes visual perception, speech recognition, decision-making, and language translation . AI systems achieve these tasks by processing large amounts of data and using algorithms to recognize patterns .'}]

In [51]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-id-en")

text_to_translate = "saya senang dan menikmati pembelajaran ini"
result = translator(text_to_translate)

print(result[0]['translation_text'])

I'm happy and enjoying this study.


Zero-Shot Classification

Classifies text against candidate labels without any prior training on those specific labels, which makes it useful for flexible classification. For example, a sentence about the economy can be automatically labeled as economic, technology, or business.
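Under the hood, the default `facebook/bart-large-mnli` checkpoint treats zero-shot classification as natural language inference: each candidate label is slotted into a hypothesis sentence (the pipeline's default template is, to our understanding, "This example is {}."), the model scores how strongly the input entails each hypothesis, and in single-label mode those entailment logits are normalized with a softmax across labels. That is why each `scores` list above sums to roughly 1. The sketch below reproduces only the hypothesis and normalization steps in plain Python; the logit values are hypothetical stand-ins for real model outputs.

```python
import math


def build_hypotheses(labels, template="This example is {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def rank_labels(entailment_logits):
    """Softmax entailment logits across labels and sort, highest score first.

    entailment_logits: dict mapping label -> entailment logit (hypothetical here).
    """
    top = max(entailment_logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(logit - top) for label, logit in entailment_logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)
    return {
        "labels": [label for label, _ in ranked],
        "scores": [value / total for _, value in ranked],
    }


labels = ["Economic", "Business", "Technology"]
print(build_hypotheses(labels))
# Hypothetical entailment logits, NOT produced by a real model
result = rank_labels({"Economic": 2.1, "Business": 1.2, "Technology": 0.5})
print(result)
```

In the real pipeline, `multi_label=True` switches to scoring each label independently, so the scores no longer need to sum to 1.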


Text Generation

Generates a continuation of an initial prompt, suitable for producing content automatically or completing sentences.
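Generation is autoregressive: the model predicts a distribution over the next token, one token is chosen and appended, and the loop repeats until a length limit (such as `max_length`, which counts the prompt too) or an end-of-sequence token is reached. The toy sketch below makes that loop concrete with a hypothetical word-level bigram table standing in for GPT-2. Always picking the most likely word is greedy decoding; drawing from the distribution is sampling, which is one way a pipeline can return different continuations for the same prompt, as in the two `num_return_sequences` outputs above.

```python
import random

# Hypothetical next-word probabilities standing in for a real language model.
BIGRAMS = {
    "I": {"feel": 0.7, "am": 0.3},
    "feel": {"warm": 0.5, "refreshed": 0.4, "tired": 0.1},
    "am": {"calm": 1.0},
    "warm": {"<eos>": 1.0},
    "refreshed": {"<eos>": 1.0},
    "tired": {"<eos>": 1.0},
    "calm": {"<eos>": 1.0},
}


def generate(prompt, max_length=10, sample=False, seed=None):
    """Extend the prompt one word at a time until <eos> or max_length words."""
    rng = random.Random(seed)
    words = prompt.split()
    while len(words) < max_length:
        dist = BIGRAMS.get(words[-1])
        if not dist:
            break  # no known continuation for this word
        if sample:
            # Sampling: draw the next word in proportion to its probability
            next_word = rng.choices(list(dist), weights=list(dist.values()))[0]
        else:
            # Greedy decoding: always take the most probable next word
            next_word = max(dist, key=dist.get)
        if next_word == "<eos>":
            break
        words.append(next_word)
    return " ".join(words)


print(generate("After I shower, I"))  # greedy
print(generate("After I shower, I", sample=True, seed=0))  # sampled
```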


Masked Language Modeling

Fills in a missing (masked) word, useful for analyzing context or predicting words within a sentence.
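For each `<mask>` position the model produces one score per vocabulary token; the pipeline converts those scores to probabilities with a softmax and returns the `top_k` candidates, each spliced back into the sentence. This plain-Python sketch mirrors that output shape (`score`, `token_str`, `sequence`) using a tiny hypothetical vocabulary and made-up logits in place of DistilRoBERTa's.

```python
import math


def fill_mask(template, vocab_logits, top_k=2, mask="<mask>"):
    """Return the top_k completions for a single mask, ranked by softmax probability."""
    top = max(vocab_logits.values())
    exps = {tok: math.exp(v - top) for tok, v in vocab_logits.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)[:top_k]
    return [
        {"score": e / total, "token_str": tok, "sequence": template.replace(mask, tok, 1)}
        for tok, e in ranked
    ]


# Hypothetical logits for the masked position (not real model output)
logits = {"mathematical": 3.0, "predictive": 2.6, "statistical": 2.2, "banana": -4.0}
sentence = "This class covers all aspects of <mask> modeling techniques"
for candidate in fill_mask(sentence, logits):
    print(candidate)
```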


Named Entity Recognition (NER)

Identifies important entities in text (for example, names of people or institutions), useful for text analysis that needs entity-level detail.
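A token-classification model tags individual tokens (for example `B-PER`, `I-ORG`, `O`); passing `aggregation_strategy="simple"` (the replacement for the older `grouped_entities=True` flag) merges adjacent tokens of the same type into one entity with character offsets. The sketch below shows that merging step on whole words split by a simple regex, with hypothetical tags and scores; the real pipeline works on the model's subword tokens and its exact grouping rules may differ.

```python
import re


def group_entities(text, tags, scores):
    """Merge consecutive B-/I- tagged words of the same type into entity spans.

    tags and scores are per word, in the order the regex below finds the words.
    """
    words = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]
    if not len(words) == len(tags) == len(scores):
        raise ValueError("need exactly one tag and one score per word")
    entities, current = [], None
    for (word, start, end), tag, score in zip(words, tags, scores):
        if tag == "O":
            current = None  # an outside tag closes any open entity
            continue
        prefix, etype = tag.split("-", 1)
        if prefix == "I" and current is not None and current["entity_group"] == etype:
            current["end"] = end  # continue the open entity
            current["_scores"].append(score)
        else:
            current = {"entity_group": etype, "start": start, "end": end, "_scores": [score]}
            entities.append(current)
    for ent in entities:
        ent["word"] = text[ent["start"]:ent["end"]]
        ent["score"] = sum(ent["_scores"]) / len(ent["_scores"])  # average of word scores
        del ent["_scores"]
    return entities


sentence = "My name is Didik, and I am studying at the University of Papua."
# Hypothetical per-word tags and confidence values (not real model output)
tags = ["O"] * 3 + ["B-PER"] + ["O"] * 7 + ["B-ORG", "I-ORG", "I-ORG", "O"]
scores = [0.99] * len(tags)
for entity in group_entities(sentence, tags, scores):
    print(entity)
```

The character offsets line up with the `start`/`end` values in the NER output above (11 to 16 for "Didik", 43 to 62 for "University of Papua").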


Question Answering

Uses a question-answering model to extract information from a given context. For example, asking "Apa itu fotosintesis?" ("What is photosynthesis?") about a passage on photosynthesis returns the span of that passage the model judges most relevant.
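The question-answering pipeline is extractive: the model scores every context token as a possible answer start and as a possible answer end, and the answer is the highest-scoring span with start <= end and a bounded length. That is why the answers above are phrases copied verbatim from the context. A minimal sketch of the span search follows, with hypothetical logits and whitespace tokens in place of the model's subword tokens; the `max_answer_len=15` default is assumed here to match the pipeline's own default.

```python
def best_span(tokens, start_logits, end_logits, max_answer_len=15):
    """Return the token span maximizing start_logit + end_logit.

    Maximizing the sum of logits picks the same span as maximizing the product
    of the softmaxed start and end probabilities.
    """
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit
        for j in range(i, min(i + max_answer_len, len(tokens))):
            score = s + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    i, j = best
    return " ".join(tokens[i : j + 1])


context = "Fotosintesis adalah proses di mana tumbuhan hijau mengubah cahaya matahari"
tokens = context.split()
# Hypothetical logits: strong start at "adalah", strong end at "proses"
start_logits = [0.1, 2.5, 0.3, 0.0, 0.0, 0.2, 0.1, 0.0, 0.0, 0.0]
end_logits = [0.0, 0.2, 2.0, 0.1, 0.0, 0.3, 0.4, 0.0, 0.1, 1.5]
print(best_span(tokens, start_logits, end_logits))
```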


Sentiment Analysis

Assesses the sentiment of a sentence or text, which helps analyze how users react to products or services.
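A binary sentiment classifier such as the default DistilBERT SST-2 checkpoint outputs one logit per class; the pipeline applies a softmax and reports the most probable label together with its probability as `score`. A sketch of that final step, with hypothetical logits rather than real model output:

```python
import math


def classify(logits, labels=("NEGATIVE", "POSITIVE")):
    """Softmax over class logits; return the top label and its probability."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": labels[best], "score": probs[best]}]


# Hypothetical logits for "The shirt is very nice; I'm interested in this one."
print(classify([-4.2, 4.6]))
```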


Summarization

Condenses long text into a shorter version, suitable for summarizing long articles or documents.
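Abstractive summarizers are length-controlled through `max_length` and `min_length`, both measured in tokens. The warning in the summarization cell above appeared because `max_length` was 142 while the input was only 126 tokens, and it suggested `max_length=63`, half the input length. A hypothetical helper applying the same rule before calling the pipeline might look like this; the `min_length=56` default is an assumption, and `input_length` is taken as an already-computed token count.

```python
def summary_lengths(input_length, max_length=142, min_length=56):
    """Hypothetical helper: shrink max_length to half the input when it exceeds the input.

    input_length is the token count of the text to summarize.
    """
    if max_length > input_length:
        max_length = input_length // 2
    min_length = min(min_length, max_length)  # keep min_length <= max_length
    return {"max_length": max_length, "min_length": min_length}


print(summary_lengths(126))  # the situation reported by the warning above
print(summary_lengths(500))  # long input: defaults are kept
```

The returned dictionary could then be unpacked into the call, for example `summarizer(text, **summary_lengths(n_tokens))`.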


Translation

Translates text between languages using a model trained for a specific language pair (here, Indonesian to English).
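Each Opus-MT checkpoint covers one direction of one language pair, encoded in its name as `opus-mt-{source}-{target}` (the notebook uses `Helsinki-NLP/opus-mt-id-en` for Indonesian to English). A hypothetical helper that builds such a name from two language codes is sketched below; it does not check whether a checkpoint for that pair actually exists on the Hugging Face Hub.

```python
def opus_mt_model(source, target):
    """Hypothetical helper: build a Helsinki-NLP Opus-MT model id for source -> target.

    The resulting id may not exist for every language pair.
    """
    source, target = source.strip(), target.strip()
    if not (source and target) or source == target:
        raise ValueError("source and target must be different, non-empty language codes")
    return f"Helsinki-NLP/opus-mt-{source}-{target}"


print(opus_mt_model("id", "en"))  # the Indonesian-to-English model used in the notebook
print(opus_mt_model("en", "id"))  # the reverse direction is a different checkpoint
```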