# Let's Try Hugging Face Transformers NLP Pipelines!


## Installing transformers

In [1]:
!pip install transformers



## Using Zero Shot Classification Pipeline

In [23]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

text = "I love hanging out in cafes, enjoying a cup of coffee in my free time."
candidate_labels = ["hobby", "personality", "entertainment"]
classifier(text, candidate_labels)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'I love hanging out in cafes, enjoying a cup of coffee in my free time.',
 'labels': ['hobby', 'entertainment', 'personality'],
 'scores': [0.7580190300941467, 0.1480296403169632, 0.09395135939121246]}
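
In this output, `labels` and `scores` are index-aligned and ordered from the highest score to the lowest. A minimal sketch of reading the result, using the dictionary above as a literal so it runs without downloading a model (the variable names `ranked`, `top_label`, and `total` are illustrative, not part of the library):

```python
# Output copied from the cell above; in a live session this would be
# the return value of classifier(text, candidate_labels).
result = {
    "sequence": "I love hanging out in cafes, enjoying a cup of coffee in my free time.",
    "labels": ["hobby", "entertainment", "personality"],
    "scores": [0.7580190300941467, 0.1480296403169632, 0.09395135939121246],
}

# Pair each label with its score; the two lists share the same order.
ranked = list(zip(result["labels"], result["scores"]))
top_label, top_score = ranked[0]

for label, score in ranked:
    print(f"{label:<15}{score:.3f}")

# The three scores above add up to roughly 1, so they read as shares
# across the candidate labels rather than independent probabilities.
total = sum(result["scores"])
```

Because the scores here are relative to the candidate list, changing the candidate labels would likely change every score, not only the one for the new label.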

## Using Text Generation Pipeline

In [18]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("Once upon a time in a distant land,")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Once upon a time in a distant land, we heard of the old boy, now young, who was a good horse who'd rode all the way from Cape Breton to St. John's. He kept his hands out of the hands of the"}]

In [20]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "Once upon a time in a distant land,",
    max_length=50,
    num_return_sequences=3,
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Once upon a time in a distant land, they were not so sure what would come. They didn't know what could happen in the future and would be waiting to see them.\nBut they hadn't heard anything before. The only thing they had"},
 {'generated_text': 'Once upon a time in a distant land, one man will have to go to sleep and find something to find. This will happen only with the help of a friend. Once you have decided all steps of the man, you will have to start doing'},
 {'generated_text': 'Once upon a time in a distant land, the first one of two ships to go along with Ipthac (another ship to go along with the other, to the sea where Ipthac is being piloted by a crew of a'}]
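
The truncation warning above appears when `max_length` is passed, and `max_length` counts the prompt tokens as well as the generated ones. One possible alternative is `max_new_tokens`, which only bounds the generated part and should avoid that warning. The sketch below wraps the call in a hypothetical helper, `generate_continuations`, which accepts any text-generation pipeline; the default of 40 new tokens is an arbitrary assumption.

```python
def generate_continuations(generator, prompt, n=3, new_tokens=40):
    """Return n sampled continuations of `prompt` without the prompt itself.

    `generator` is expected to behave like a transformers
    text-generation pipeline (a callable returning a list of dicts).
    """
    outputs = generator(
        prompt,
        max_new_tokens=new_tokens,   # bounds only the newly generated tokens
        num_return_sequences=n,      # several samples for the same prompt
        do_sample=True,              # sampling is needed to get distinct samples
        return_full_text=False,      # drop the prompt from each result
    )
    return [item["generated_text"] for item in outputs]


# Usage in a live session (downloads the model on first run):
#   from transformers import pipeline
#   generator = pipeline("text-generation", model="distilgpt2")
#   for story in generate_continuations(generator, "Once upon a time in a distant land,"):
#       print(story)
```

Since sampling is random, the continuations will differ from the ones shown above on every run.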

## Using Fill Mask Pipeline

In [26]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("I'll get some <mask> because im so stressing right now.", top_k=3)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.1335359811782837,
  'token': 3581,
  'token_str': ' sleep',
  'sequence': "I'll get some sleep because im so stressing right now."},
 {'score': 0.06626366078853607,
  'token': 18803,
  'token_str': ' pics',
  'sequence': "I'll get some pics because im so stressing right now."},
 {'score': 0.02399718388915062,
  'token': 1079,
  'token_str': ' rest',
  'sequence': "I'll get some rest because im so stressing right now."}]
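
Each prediction is a dictionary, so the list can be filtered like any other Python data. A small sketch using the output above as a literal; the `0.05` cutoff is an arbitrary assumption chosen for illustration, not a recommended value:

```python
# Output copied from the cell above.
predictions = [
    {"score": 0.1335359811782837, "token": 3581, "token_str": " sleep",
     "sequence": "I'll get some sleep because im so stressing right now."},
    {"score": 0.06626366078853607, "token": 18803, "token_str": " pics",
     "sequence": "I'll get some pics because im so stressing right now."},
    {"score": 0.02399718388915062, "token": 1079, "token_str": " rest",
     "sequence": "I'll get some rest because im so stressing right now."},
]

# token_str keeps the leading space from the tokenizer, so strip it
# before showing the word on its own.
words = [p["token_str"].strip() for p in predictions]

# Highest-scoring completion.
best = max(predictions, key=lambda p: p["score"])

# Keep only candidates above a hypothetical cutoff.
MIN_SCORE = 0.05
confident = [p["token_str"].strip() for p in predictions if p["score"] >= MIN_SCORE]

print(best["sequence"])
print(confident)
```

Even the best candidate scores only about 0.13 here, which suggests the model sees many plausible fillers for this sentence.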

## Using Named Entity Recognition (NER) Pipeline

In [27]:
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
ner("My name is Dimas and I student at Raja Ali Haji University in Indonesia.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9986572,
  'word': 'Dimas',
  'start': 11,
  'end': 16},
 {'entity_group': 'ORG',
  'score': 0.98387563,
  'word': 'Raja Ali Haji University',
  'start': 34,
  'end': 58},
 {'entity_group': 'LOC',
  'score': 0.9987993,
  'word': 'Indonesia',
  'start': 62,
  'end': 71}]
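
The `start` and `end` values are character offsets into the input string, so each entity can be recovered by slicing. A short sketch that checks this against the output above (the names `spans` and `by_type` are illustrative):

```python
text = "My name is Dimas and I student at Raja Ali Haji University in Indonesia."

# Output copied from the cell above.
entities = [
    {"entity_group": "PER", "score": 0.9986572, "word": "Dimas", "start": 11, "end": 16},
    {"entity_group": "ORG", "score": 0.98387563, "word": "Raja Ali Haji University", "start": 34, "end": 58},
    {"entity_group": "LOC", "score": 0.9987993, "word": "Indonesia", "start": 62, "end": 71},
]

# Slice the original text with the reported offsets (end is exclusive).
spans = [text[e["start"]:e["end"]] for e in entities]

# Group the detected words by entity type.
by_type = {}
for e in entities:
    by_type.setdefault(e["entity_group"], []).append(e["word"])

print(spans)
print(by_type)
```

Slicing the source text is usually safer than relying on `word`, since `word` is rebuilt from tokens and may not always match the original spelling or casing exactly.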

## Using Question Answering Pipeline

In [30]:
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = "Indonesia is the country in Southeast Asia, IKN is its capital city and its a new capital city of Indonesia"
question = "What is Indonesia capital city?"

result = qa_pipeline(question=question, context=context)
print(f"Answer: {result['answer']}")

Answer: IKN


## Using Sentiment Analysis Pipeline

In [33]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("I love using Hugging Face Transformers! It's so easy to use.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9974888563156128}]

## Using Summarization Pipeline

In [34]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    Artificial Intelligence (AI) refers to the simulation of human intelligence
    in machines programmed to think and learn like humans. The term is often used
    to describe machines or computers that can perform tasks typically requiring human
    intelligence, such as visual perception, speech recognition, decision-making, and language translation.
    AI can be classified into two main categories: narrow AI and general AI.
    Narrow AI, also known as weak AI, is designed and trained for specific tasks,
    such as facial recognition or internet searches. Examples include voice assistants
    like Siri and Alexa, as well as recommendation systems used by platforms like Netflix and Amazon.
    On the other hand, general AI, or strong AI, aims to perform any intellectual task
    that a human being can do. While this type of AI remains theoretical at this point,
    advancements in machine learning and deep learning are paving the way for developments
    that could eventually lead to general AI.
    The applications of AI are vast and include industries such as healthcare, finance,
    education, and transportation. For instance, in healthcare, AI systems can analyze
    medical data to assist in diagnosing diseases and predicting patient outcomes.
    In finance, AI algorithms are used for fraud detection and automated trading.
    As AI technology continues to evolve, ethical considerations regarding its impact
    on society and the workforce become increasingly important. Issues such as privacy,
    bias in AI algorithms, and the potential for job displacement are critical topics of
    discussion among policymakers, researchers, and industry leaders.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' Artificial Intelligence (AI) refers to the simulation of human intelligence in machines programmed to think and learn like humans . AI can be classified into two main categories: narrow AI and general AI . Narrow AI is designed and trained for specific tasks, such as facial recognition or internet searches . Strong AI aims to perform any intellectual task a human can do that a human being can do .'}]

## Using Translation Pipeline

In [38]:
from transformers import pipeline

translator = pipeline("translation_id_to_en", model="Helsinki-NLP/opus-mt-id-en")

text_to_translate = "Saya sangat suka kopi, saya minum kopi setiap hari"
result = translator(text_to_translate)

print(result[0]['translation_text'])

I love coffee so much, I drink coffee every day


## Analysis
- Zero-Shot Classification: lets a model classify text into categories it was never explicitly trained on. The scores reflect how relevant the text is to each candidate label.
- Text Generation: uses a model to continue a text or produce new content from a given prompt.
- Fill Mask: fills in a missing word in a sentence, which is useful for completing or repairing sentences.
- Named Entity Recognition (NER): identifies and classifies entities in text, such as the names of people, locations, and organizations.
- Question Answering: lets the user get a direct answer extracted from the supplied context.
- Sentiment Analysis: determines the feeling or opinion expressed in a text.
- Summarization: condenses a long text into a shorter version that is easier to digest.
- Translation: converts text from one language into another.

Hugging Face Transformers provides a range of pipelines that make common natural language processing (NLP) tasks easy to run. Each pipeline has its own strengths and applications, which can be used to improve efficiency and accuracy across many use cases.
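
Several cells above print a warning that relying on a default model without a name and revision is not recommended in production. A minimal sketch of pinning them explicitly, using the checkpoint names and short revision hashes exactly as reported in those warnings; the `DEFAULT_CHECKPOINTS` table and the `load_pinned` helper are hypothetical names introduced here, not part of the library:

```python
# Task -> (model name, revision), copied from the warnings printed above.
DEFAULT_CHECKPOINTS = {
    "zero-shot-classification": ("facebook/bart-large-mnli", "c626438"),
    "text-generation": ("openai-community/gpt2", "6c0e608"),
    "fill-mask": ("distilbert/distilroberta-base", "ec58a5b"),
    "ner": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
    "sentiment-analysis": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
    "summarization": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"),
}


def load_pinned(task, **kwargs):
    """Build a pipeline for `task` with an explicit model and revision."""
    # Imported lazily so the table above can be inspected without
    # transformers installed.
    from transformers import pipeline

    model, revision = DEFAULT_CHECKPOINTS[task]
    return pipeline(task, model=model, revision=revision, **kwargs)


# Usage in a live session (downloads the model on first run):
#   classifier = load_pinned("sentiment-analysis")
#   classifier("I love using Hugging Face Transformers! It's so easy to use.")
```

Pinning the revision should keep results stable even if the checkpoint on the Hub is updated later, which matters when outputs such as the scores above are compared across runs.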

