# **Hugging Face Transformers NLP Pipelines!**


> Library Installation



In [1]:
!pip install transformers



> Import Library

In [2]:
from transformers import pipeline



> a) Zero-Shot Classification



In [17]:
classifier = pipeline("zero-shot-classification")
classifier(
    "The stock market saw a significant increase today, with tech stocks leading the way.",
    candidate_labels=["finance", "economy", "Investment", "Market"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


{'sequence': 'The stock market saw a significant increase today, with tech stocks leading the way.',
 'labels': ['Market', 'Investment', 'finance', 'economy'],
 'scores': [0.6858528256416321,
  0.14780130982398987,
  0.12047457695007324,
  0.045871272683143616]}

In [4]:
classifier = pipeline("zero-shot-classification")
classifier(
    "The government announced a new policy to tackle climate change and reduce carbon emissions.",
    candidate_labels=["environment", "politics", "climate change", "carbon emissions"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


{'sequence': 'The government announced a new policy to tackle climate change and reduce carbon emissions.',
 'labels': ['carbon emissions', 'climate change', 'environment', 'politics'],
 'scores': [0.4701637029647827,
  0.37425944209098816,
  0.1389905959367752,
  0.01658623479306698]}

In [5]:
classifier = pipeline("zero-shot-classification")
classifier(
    "The latest smartphone features a high-resolution camera, faster processor, and a sleek design.",
    candidate_labels=["technology", "device", "fashion", "health"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


{'sequence': 'The latest smartphone features a high-resolution camera, faster processor, and a sleek design.',
 'labels': ['technology', 'device', 'health', 'fashion'],
 'scores': [0.6615387797355652,
  0.32739749550819397,
  0.006408060435205698,
  0.004655580502003431]}



> b) Text Generation



In [18]:
generator = pipeline("text-generation")
generator("You are so beautiful. I love you, but")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'You are so beautiful. I love you, but it feels like I\'m in the middle of a dance."\n\n"We\'re a single family, right?"\n\n"No, I don\'t know how I can even explain it to you'}]

In [20]:
generator = pipeline("text-generation", model="distilgpt2")
generator(
    "You are so beautiful. I love you, but",
    max_length=30,
    num_return_sequences=2,
)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'You are so beautiful. I love you, but it\'s not my love."'},
 {'generated_text': 'You are so beautiful. I love you, but why am I here? I need you every night! I love you! Thank you!'}]

In [8]:
generator = pipeline("text-generation", model="distilgpt2")
generator("In the future, space exploration will allow humans to")

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'In the future, space exploration will allow humans to explore interstellar space and explore the future with little to no practical risk.\n\n\n\n"Space exploration is a technological breakthrough but it will enable life in deep space to survive and prosper. The future'}]



> c) Fill Mask



In [21]:
unmasker = pipeline("fill-mask")
unmasker("The <mask> festival attracted people from all over the world.", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `P

[{'score': 0.07562767714262009,
  'token': 1013,
  'token_str': ' annual',
  'sequence': 'The annual festival attracted people from all over the world.'},
 {'score': 0.06016301363706589,
  'token': 930,
  'token_str': ' music',
  'sequence': 'The music festival attracted people from all over the world.'}]



> d) Named Entity Recognition (NER)



In [22]:
ner = pipeline("ner", aggregation_strategy="simple")  # group adjacent tokens of the same entity type
ner("My name is Alif Naywa and I am undergraduate student at Yogyakarta State University")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Hardware accelerator e.g. GPU is

[{'entity_group': 'PER',
  'score': 0.99557364,
  'word': 'Alif Naywa',
  'start': 11,
  'end': 21},
 {'entity_group': 'ORG',
  'score': 0.9509055,
  'word': 'Yogyakarta State University',
  'start': 56,
  'end': 83}]



> e) Question Answering



In [23]:
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = """
Indonesia adalah negara kepulauan terbesar di dunia dengan lebih dari 17.000 pulau.
Ibukota negara ini adalah Jakarta, dan bahasa resminya adalah Bahasa Indonesia.
Indonesia terkenal akan keanekaragaman budaya, bahasa, dan agamanya."""
question = "Berapa banyak pulau yang ada di Indonesia?"  # "How many islands are there in Indonesia?"

result = qa_pipeline(question=question, context=context)
print(f"Answer: {result['answer']}")

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Answer: 17.000 pulau


In [24]:
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = """
Candi Borobudur adalah candi yang terletak di Magelang, Jawa Tengah, Indonesia.
Candi ini dibangun pada abad ke-8 oleh Dinasti Syailendra. Borobudur merupakan salah satu situs Warisan Dunia UNESCO dan merupakan candi Buddha terbesar di dunia.
"""
question = "Candi Borobudur terletak dimana?"  # "Where is Borobudur Temple located?"

result = qa_pipeline(question=question, context=context)
print(f"Answer: {result['answer']}")

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Answer: Magelang




> f) Sentiment Analysis



In [25]:
classifier = pipeline("sentiment-analysis")
classifier("You are as beautiful as the day I lost you.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'label': 'POSITIVE', 'score': 0.9998743534088135}]



> g) Summarization



In [26]:
summarizer = pipeline("summarization")
summarizer(
    """
    Indonesia, yang terletak di Asia Tenggara, adalah negara kepulauan terbesar di dunia dengan lebih dari 17.000 pulau.
    Negara ini memiliki beragam budaya, bahasa, dan agama yang mencerminkan keragaman masyarakatnya.
    Indonesia adalah rumah bagi lebih dari 270 juta orang, menjadikannya sebagai negara berpenduduk terbesar keempat di dunia.
    Ibukota Indonesia adalah Jakarta, yang terletak di pulau Jawa. Selain itu, Indonesia dikenal dengan keindahan alamnya, termasuk hutan hujan tropis, gunung berapi, dan pantai yang menakjubkan.
    Masyarakat Indonesia terkenal dengan keramahtamahannya dan keanekaragaman kulinernya yang kaya, dengan berbagai hidangan tradisional yang beragam dari setiap daerah.

    Perekonomian Indonesia merupakan salah satu yang terbesar di Asia Tenggara dan didukung oleh berbagai sektor, termasuk pertanian, manufaktur, dan pariwisata.
    Pemerintah Indonesia berupaya untuk meningkatkan infrastruktur dan investasi asing untuk memperkuat pertumbuhan ekonomi.
    Namun, negara ini juga menghadapi berbagai tantangan, seperti masalah lingkungan, kemiskinan, dan ketidaksetaraan.
    Dengan berbagai inisiatif dan program, pemerintah dan masyarakat Indonesia berusaha untuk menciptakan masa depan yang lebih baik bagi generasi mendatang.
    Dalam beberapa tahun terakhir, sektor teknologi informasi dan komunikasi telah tumbuh pesat, dengan semakin banyaknya startup dan inovasi digital yang bermunculan di seluruh negeri.
    """
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


[{'summary_text': ' Indonesia adalah rumah bagi lebih dari 270 juta orang, menjadikannya sebagai negara berpenduduk terbesar keempat di dunia . Negara ini memiliki beragam budaya, bahasa, dan agama yang mencerminkan keragaman masyarakatnya .'}]



> h) Translation



In [27]:
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-id-en")

text_to_translate = "melihatmu senyummu yang manis, membuatku jatuh cinta"
result = translator(text_to_translate)

print(result[0]['translation_text'])

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


Seeing you smile sweetly, makes me fall in love


# Analysis

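1. Zero-Shot Classification

Zero-shot classification is a useful text-classification technique because the candidate labels are supplied at inference time, with no task-specific training. The outputs above show that the model scores every candidate label and ranks the most relevant one first, for example "Market" (about 0.69) for the stock-market sentence and "technology" (about 0.66) for the smartphone sentence.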
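Under the hood, the zero-shot pipeline treats each candidate label as a natural-language-inference (NLI) hypothesis about the input text and turns the NLI model's entailment scores into label scores. The sketch below shows only that final scoring step in plain Python, assuming the default hypothesis template `"This example is {}."` and single-label mode (a softmax across the labels' entailment logits). The logits are made up; a real run would get them from `facebook/bart-large-mnli`.

```python
import math

# Assumption: the default hypothesis template of the zero-shot pipeline.
HYPOTHESIS_TEMPLATE = "This example is {}."


def build_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def zero_shot_scores(labels, entailment_logits):
    """Single-label mode: softmax over the entailment logits of all labels,
    then sort labels from highest to lowest score."""
    m = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    ranked = sorted(
        zip(labels, (e / total for e in exps)), key=lambda pair: pair[1], reverse=True
    )
    return {"labels": [lab for lab, _ in ranked], "scores": [s for _, s in ranked]}


labels = ["technology", "device", "fashion", "health"]
print(build_hypotheses(labels))
# Made-up entailment logits standing in for the NLI model's output.
result = zero_shot_scores(labels, [2.0, 1.3, -2.9, -2.6])
print(result["labels"])  # ['technology', 'device', 'health', 'fashion']
```

Because the scores come from a softmax across the candidates, they always sum to 1, so adding or removing a label changes every other label's score.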

2. Text Generation

Text generation creates new text that continues the input prompt. The outputs above show that the model can produce expressive sentences, and with `num_return_sequences=2` the same prompt yielded two different continuations.
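Getting two different continuations for one prompt suggests that generation here samples from the model's next-token distribution instead of always taking the single most likely token. The sketch below illustrates one common sampling scheme, temperature scaling plus top-k filtering, on made-up logits in plain Python. It is not the `transformers` `generate()` implementation; the parameter names only echo the `temperature` and `top_k` generation arguments, and the defaults here are illustrative.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=50, rng=random):
    """Pick one token from a {token: logit} dict with temperature + top-k sampling."""
    # Keep only the top_k highest-scoring tokens.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Dividing by temperature: < 1 sharpens the distribution, > 1 flattens it.
    scaled = [logit / temperature for _, logit in top]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]  # unnormalized probabilities
    tokens = [tok for tok, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up logits for the token that follows "I love you, but".
toy_logits = {" it": 3.1, " I": 2.7, " why": 1.9, " the": 1.2, " banana": -4.0}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
print([sample_next_token(toy_logits, temperature=0.8, top_k=3, rng=rng) for _ in range(5)])
```

With `top_k=1` the function always returns the highest-scoring token, which is the greedy-decoding special case.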

3. Fill Mask

Fill mask is an NLP technique for predicting a missing word in a sentence. The outputs show that the model fills the `<mask>` slot with contextually plausible words ("annual" and "music"), although the confidence scores are low (about 0.076 and 0.060) because the probability is spread over many possible words.
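The low scores follow from how masked-language-model predictions are normalized: the model produces one logit per vocabulary entry at the `<mask>` position, and a softmax spreads the probability over the whole vocabulary, so even the best word can receive a small share. A minimal sketch with a made-up six-word vocabulary follows; the real pipeline works on subword token ids, and its `token_str` values such as `' annual'` keep a leading space that this sketch ignores. With so few candidates, the toy scores come out much higher than the real ones.

```python
import math


def fill_mask(template, candidate_logits, top_k=2, mask_token="<mask>"):
    """Rank candidate words for the mask slot and return pipeline-like dicts."""
    m = max(candidate_logits.values())
    exps = {word: math.exp(logit - m) for word, logit in candidate_logits.items()}
    total = sum(exps.values())  # normalize over ALL candidates, not just top_k
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return [
        {"score": e / total, "token_str": word,
         "sequence": template.replace(mask_token, word)}
        for word, e in ranked
    ]


# Made-up logits for a tiny six-word "vocabulary".
logits = {"annual": 2.0, "music": 1.8, "film": 1.5, "food": 1.2, "the": -1.0, "blue": -2.0}
for pred in fill_mask("The <mask> festival attracted people from all over the world.", logits):
    print(round(pred["score"], 3), pred["sequence"])
```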

4. Named Entity Recognition (NER)

Named Entity Recognition (NER) identifies and classifies key entities in text. The outputs show that the model correctly identified a person ("Alif Naywa", PER) and an organization ("Yogyakarta State University", ORG) with high confidence scores (about 0.996 and 0.951).
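The model itself tags individual tokens; the grouped output above comes from merging adjacent tokens that share an entity type (what `grouped_entities=True`, or `aggregation_strategy="simple"` in newer versions, asks for). The sketch below is a simplified, hypothetical version of that merging step: it works on whole words with made-up tags and scores, whereas the real model tags subword pieces, and it averages token scores for each group.

```python
def group_entities(tokens, text):
    """Merge consecutive tagged words into entity spans.

    tokens: dicts with 'entity' (e.g. 'I-PER', 'B-ORG', 'O'), 'score',
    'start' and 'end' (character offsets into text).
    A 'B-' tag, a change of entity type, or an 'O' tag ends the current span.
    """
    groups, current = [], None
    for tok in tokens:
        if tok["entity"] == "O":
            current = None
            continue
        prefix, etype = tok["entity"].split("-", 1)
        if current is None or prefix == "B" or etype != current["entity_group"]:
            current = {"entity_group": etype, "scores": [], "start": tok["start"]}
            groups.append(current)
        current["scores"].append(tok["score"])
        current["end"] = tok["end"]
    return [
        {
            "entity_group": g["entity_group"],
            "score": sum(g["scores"]) / len(g["scores"]),  # mean token score
            "word": text[g["start"]:g["end"]],
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


text = "My name is Alif Naywa and I am undergraduate student at Yogyakarta State University"
# Hypothetical word-level tags and scores (the real model tags subword pieces).
tokens = [
    {"entity": "I-PER", "score": 0.996, "start": 11, "end": 15},  # Alif
    {"entity": "I-PER", "score": 0.995, "start": 16, "end": 21},  # Naywa
    {"entity": "O", "score": 0.999, "start": 22, "end": 25},      # and
    {"entity": "I-ORG", "score": 0.96, "start": 56, "end": 66},   # Yogyakarta
    {"entity": "I-ORG", "score": 0.95, "start": 67, "end": 72},   # State
    {"entity": "I-ORG", "score": 0.94, "start": 73, "end": 83},   # University
]
for ent in group_entities(tokens, text):
    print(ent["entity_group"], ent["word"], ent["start"], ent["end"])
```

The character offsets match those in the pipeline output above (11 to 21 and 56 to 83), which is why slicing the original text recovers each entity's surface form.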

5. Question Answering

Question answering lets users get quick, accurate answers to questions about a given text. Here the extractive model returned the numeric answer "17.000 pulau" (17,000 islands) for the first question and the location "Magelang" for the second, even though distilbert-base-cased-distilled-squad was fine-tuned on the English SQuAD dataset and both contexts were in Indonesian.
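An extractive QA model such as distilbert-base-cased-distilled-squad does not write an answer; it predicts a start logit and an end logit for every context token, and the answer is the span with the highest combined score. The sketch below shows only that span-selection step with made-up logits. The real pipeline also excludes question tokens, applies a softmax, and handles long contexts in overlapping windows; the `max_answer_len` name here echoes the pipeline argument of the same name.

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end) token indices that maximize start_logit + end_logit,
    subject to start <= end and a span of at most max_answer_len tokens."""
    best, best_score = None, float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            if s + end_logits[j] > best_score:
                best, best_score = (i, j), s + end_logits[j]
    return best


# Context tokens from the first QA example, with made-up logits.
tokens = ["dengan", "lebih", "dari", "17.000", "pulau", "."]
start_logits = [0.2, 0.5, 0.1, 4.0, 0.3, -2.0]
end_logits = [-0.5, 0.0, 0.1, 2.5, 3.0, -1.0]
i, j = best_span(start_logits, end_logits)
print(" ".join(tokens[i:j + 1]))  # 17.000 pulau
```

The `start <= end` constraint matters: the highest end logit alone may sit before the best start, and such a pair is not a valid span.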

6. Sentiment Analysis

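Sentiment analysis helps in understanding the emotions and opinions expressed in text, and it is used in contexts such as marketing, social media analysis, and customer feedback. In the example above, the model labeled the sentence as POSITIVE with a score of about 0.9999, even though the sentence also speaks of loss; a binary POSITIVE/NEGATIVE label cannot express such mixed sentiment.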
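The single score in the output is a softmax probability over the classifier's two classes, so it only says how strongly the model prefers one label over the other, not how intense or unmixed the sentiment is. A minimal sketch of that final step with made-up logits, assuming the label order NEGATIVE = 0 and POSITIVE = 1 (the real mapping lives in the model config's `id2label`):

```python
import math

# Assumed label mapping for a binary SST-2 style classifier.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def classify(logits, id2label=ID2LABEL):
    """Softmax the logits and return the top label with its probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda k: probs[k])
    return {"label": id2label[best], "score": probs[best]}


print(classify([-4.3, 4.7]))  # made-up logits
```

A logit gap of 9 already pushes the score above 0.999, which is why near-1.0 scores are common for this kind of classifier.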

7. Summarization

Summarization presents information concisely and clearly. In the output above, the model condensed the two-paragraph text into two sentences, both taken almost verbatim from the input (Indonesia's population and its cultural diversity).
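The summary returned above has a leading space and a space before each period (for example `di dunia .`). If the text is shown to readers, a small post-processing step can tidy it. The sketch below is a plain regular-expression cleanup applied to the summary text from the output; it is not part of the `transformers` API.

```python
import re


def tidy_summary(text):
    """Remove spaces left before punctuation and trim the ends."""
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text.strip()


raw = (" Indonesia adalah rumah bagi lebih dari 270 juta orang, menjadikannya sebagai "
       "negara berpenduduk terbesar keempat di dunia . Negara ini memiliki beragam budaya, "
       "bahasa, dan agama yang mencerminkan keragaman masyarakatnya .")
print(tidy_summary(raw))
```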

8. Translation

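Translation converts text from one language into another. Here, the Helsinki-NLP/opus-mt-id-en model translated the Indonesian sentence into the English "Seeing you smile sweetly, makes me fall in love".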
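Translation models such as Helsinki-NLP/opus-mt-id-en accept a limited input length, so longer passages are commonly translated sentence by sentence. A minimal sketch under that assumption: a naive regex sentence splitter (it will mis-split abbreviations such as "dr.") and a wrapper that passes all sentences to the translator as one list. Passing a list of strings to the translation pipeline returns one `{"translation_text": ...}` dict per sentence; the demo below uses a stand-in function so it runs without downloading a model.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def translate_long_text(text, translator):
    """Translate sentence by sentence, then join the translated sentences.

    `translator` is any callable that takes a list of strings and returns a list
    of {"translation_text": ...} dicts, e.g. the pipeline created in In [27].
    """
    outputs = translator(split_sentences(text))
    return " ".join(out["translation_text"] for out in outputs)


def fake_translator(batch):
    """Stand-in for the real pipeline: uppercases instead of translating."""
    return [{"translation_text": s.upper()} for s in batch]


print(translate_long_text("Candi ini dibangun pada abad ke-8. Borobudur ada di Magelang.", fake_translator))
```

With the notebook's `translator` object in place of `fake_translator`, the same call would translate each sentence with the Marian model.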