# Let's try HuggingFace Transformers NLP Pipelines!


In [None]:
!pip install transformers



In [None]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "Artificial Intelligence will replace humans in the future",
    candidate_labels=["education", "need", "business", "technology"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'Artificial Intelligence will replace humans in the future',
 'labels': ['technology', 'need', 'business', 'education'],
 'scores': [0.9194765090942383,
  0.072819784283638,
  0.005830238573253155,
  0.0018734720069915056]}
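
To read this result programmatically, one option is to pair each label with its score. The sketch below copies the dictionary printed above into a variable by hand (the name `result` is an assumption, since the cell does not store the return value). It also assumes the default single-label mode, in which the scores are normalized across the candidate labels and so sum to roughly 1.

```python
# Output of the zero-shot cell above, copied by hand into a variable.
result = {
    "sequence": "Artificial Intelligence will replace humans in the future",
    "labels": ["technology", "need", "business", "education"],
    "scores": [
        0.9194765090942383,
        0.072819784283638,
        0.005830238573253155,
        0.0018734720069915056,
    ],
}

# Labels and scores are parallel lists, ordered from highest to lowest score.
ranked = list(zip(result["labels"], result["scores"]))
best_label, best_score = ranked[0]
total = sum(result["scores"])

for label, score in ranked:
    print(f"{label:<12}{score:.4f}")
print("top label:", best_label)
```

Because the scores are relative to the labels supplied, a different list of `candidate_labels` would redistribute them.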

In [None]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("Artificial Intelligence will replace humans in the future")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Artificial Intelligence will replace humans in the future, but they won't replace us.\n\nOn his YouTube channel, Google chief business officer Sundar Pichai (via Bloomberg) shared with us some more specifics regarding the future of AI -- the"}]
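
The warning above recommends naming the model and revision explicitly. A minimal sketch of that, reusing the model id and revision from the warning, could look like the following. `PINNED` and `build_generator` are hypothetical names introduced for illustration, and the import sits inside the helper so that defining it does not load anything.

```python
# Model id and revision copied from the warning printed above.
PINNED = {
    "task": "text-generation",
    "model": "openai-community/gpt2",
    "revision": "6c0e608",
}


def build_generator():
    """Create the pipeline with an explicit model and revision (hypothetical helper)."""
    # Imported here so that merely defining the helper does not load the library.
    from transformers import pipeline

    return pipeline(PINNED["task"], model=PINNED["model"], revision=PINNED["revision"])
```

Calling `build_generator()` should load the same checkpoint as the default, without depending on what the library's default happens to be in a later release.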

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "Artificial Intelligence will replace humans in the future",
    max_length=50,
    num_return_sequences=5,
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Artificial Intelligence will replace humans in the future. And because we are smart, we should be smart enough, too.\n\n\n\nHere's how people will look after smart things, and they won't look after stupid things or have a bad"},
 {'generated_text': 'Artificial Intelligence will replace humans in the future.\n\n\n\n\nThe AI will be coming in a variety of different colors and colors with different levels of accuracy. The design of artificial intelligence, the AI, or even computers, is very simple'},
 {'generated_text': 'Artificial Intelligence will replace humans in the future. With more than half the world’s population, computers will grow to be able to learn what was said before.\n\n\n\n\nThe robots (like Siri, Siri, Cortana and other'},
 {'generated_text': 'Artificial Intelligence will replace humans in the future – it could become even bigger and will be capable of becoming the driving force for everything else.\n\n\n\n\n\n\nIn a blog post titled “A Deep Dive int

In [None]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("in the future Artificial Intelligence will replace <mask> human.", top_k=3)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.24218778312206268,
  'token': 143,
  'token_str': ' any',
  'sequence': 'in the future Artificial Intelligence will replace any human.'},
 {'score': 0.15689757466316223,
  'token': 7945,
  'token_str': ' ordinary',
  'sequence': 'in the future Artificial Intelligence will replace ordinary human.'},
 {'score': 0.10553957521915436,
  'token': 5,
  'token_str': ' the',
  'sequence': 'in the future Artificial Intelligence will replace the human.'}]
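
Each `score` above is the probability the model assigns to that token at the masked position, and `sequence` is the input with the mask replaced by the token. The sketch below rebuilds the sequences from the printed predictions and adds up the three scores. The variable names are assumptions, and only the fields needed here are copied.

```python
template = "in the future Artificial Intelligence will replace <mask> human."

# The three predictions printed above, reduced to the fields used here.
predictions = [
    {"score": 0.24218778312206268, "token_str": " any"},
    {"score": 0.15689757466316223, "token_str": " ordinary"},
    {"score": 0.10553957521915436, "token_str": " the"},
]

# The tokens carry a leading space, so strip it before inserting them.
filled = [template.replace("<mask>", p["token_str"].strip()) for p in predictions]

# Share of the probability mass covered by the three candidates shown.
covered = sum(p["score"] for p in predictions)

for sentence in filled:
    print(sentence)
print(f"covered by top 3: {covered:.2f}")
```

The three candidates account for only about half of the probability, so the model considers many other fillers plausible as well.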

In [None]:
from transformers import pipeline

ner = pipeline("ner", aggregation_strategy="simple")  # replaces the deprecated grouped_entities=True
ner("I am Arya Pratama Putra and I am a Student at the Batam State Polytechnic and I live in Indonesia and now I also work at Google.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9965387,
  'word': 'Arya Pratama Putra',
  'start': 5,
  'end': 23},
 {'entity_group': 'ORG',
  'score': 0.9953444,
  'word': 'Batam State Polytechnic',
  'start': 50,
  'end': 73},
 {'entity_group': 'LOC',
  'score': 0.999736,
  'word': 'Indonesia',
  'start': 88,
  'end': 97},
 {'entity_group': 'ORG',
  'score': 0.99836177,
  'word': 'Google',
  'start': 121,
  'end': 127}]
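
The `start` and `end` values are character offsets into the input string, so each entity can be sliced back out of the text. The sketch below checks this against the output above and groups the entities by type. The variable names are assumptions, and the scores are left out for brevity.

```python
from collections import defaultdict

text = (
    "I am Arya Pratama Putra and I am a Student at the Batam State Polytechnic "
    "and I live in Indonesia and now I also work at Google."
)

# Entities printed above, without the score field.
entities = [
    {"entity_group": "PER", "word": "Arya Pratama Putra", "start": 5, "end": 23},
    {"entity_group": "ORG", "word": "Batam State Polytechnic", "start": 50, "end": 73},
    {"entity_group": "LOC", "word": "Indonesia", "start": 88, "end": 97},
    {"entity_group": "ORG", "word": "Google", "start": 121, "end": 127},
]

# Slice each entity out of the original text and collect the spans per type.
by_type = defaultdict(list)
for ent in entities:
    span = text[ent["start"]:ent["end"]]
    by_type[ent["entity_group"]].append(span)

print(dict(by_type))
```

Slicing by offsets is generally safer than relying on `word`, which is rebuilt from tokens and may not always match the original spacing or casing.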

In [None]:
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = "Indonesia is an archipelagic country located on the Southeast Asian continent and has 38 provinces, one of which is the Riau Islands province which is located near the South China Sea."
question = "Where is Indonesia located?"

result = qa_pipeline(question=question, context=context)
print(f"Answer: {result['answer']}")

Answer: Southeast Asian continent
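
The answer printed above is a literal span of the context, because this kind of model selects a start and end position in the context rather than writing new text. The sketch below locates that span with plain string search. The result dictionary also carries `start`, `end`, and `score` fields, but they are not printed in the cell, so they are not reproduced here.

```python
context = (
    "Indonesia is an archipelagic country located on the Southeast Asian continent "
    "and has 38 provinces, one of which is the Riau Islands province which is "
    "located near the South China Sea."
)
answer = "Southeast Asian continent"  # the answer printed by the cell above

# Locate the answer inside the context by plain string search.
start = context.find(answer)
end = start + len(answer)

print(context[start:end])
```

This also means the model can only repeat what the context says: the wording "Southeast Asian continent" comes from the context, not from the model.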


In [None]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("The product was delivered very quickly and the service was very good.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998351335525513}]
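
The `score` is the model's confidence in the label it predicted, so a score near 1 with the label `NEGATIVE` means a strongly negative sentence. One way to fold the two fields into a single number is sketched below. `signed_score` is a hypothetical helper, and the negative example uses a made-up value rather than a real model output.

```python
def signed_score(prediction):
    """Map a {'label', 'score'} dict to a value in [-1, 1] (hypothetical helper)."""
    sign = 1.0 if prediction["label"] == "POSITIVE" else -1.0
    return sign * prediction["score"]


# Output of the sentiment cell above.
output = [{"label": "POSITIVE", "score": 0.9998351335525513}]

# Illustrative value only, not produced by the model.
made_up_negative = {"label": "NEGATIVE", "score": 0.98}

print(signed_score(output[0]))
print(signed_score(made_up_negative))
```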

In [None]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    Melakukan revisi terhadap kurikulum merdeka dengan memikirkan aspek aspek yang disebutkan diatas harus segera dilakukan agar tidak ketinggalan zaman dan dapat bersaing dengan negara-negara lainnya. Pendidikan adalah hak setiap individu dan tanggung jawab bersama. Kita semua memiliki peran dalam mewujudkan pendidikan yang berkualitas. Pemerintah perlu mengambil langkah-langkah konkret untuk memperbaiki kualitas pendidikan, sementara masyarakat, khususnya orang tua dan guru, perlu memberikan dukungan penuh terhadap upaya tersebut. Mari bersama-sama kita berinvestasi pada pendidikan untuk menciptakan generasi penerus bangsa yang cerdas, kreatif, dan berkarakter.Pentingnya peran orang tua: Orang tua memiliki peran yang sangat penting dalam memberikan dukungan dan motivasi kepada anak dalam meraih cita-citanya.Peran teknologi: Teknologi dapat menjadi alat yang sangat berguna dalam meningkatkan kualitas pembelajaran, namun perlu dimanfaatkan secara bijak.Keterlibatan masyarakat: Masyarakat luas juga memiliki peran penting dalam mendukung upaya peningkatan kualitas pendidikan, misalnya dengan memberikan donasi atau menjadi relawan.Pentingnya pendidikan karakter: Selain penguasaan ilmu pengetahuan, pendidikan karakter juga sangat penting untuk membentuk generasi muda yang berakhlak mulia. Mari bersama - sama wujudkan Indonesia Emas 2045


"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' Pendidikan adalah hak setiap individu dan tanggung jawab bersama . Kita semua memiliki peran dalam mewujudkan pendidikan yang berkualitas . Pendidikahan    dapat bersaing dengan negara-negara lainnya .'}]

In [None]:
from transformers import pipeline

translator = pipeline("translation_id_to_en", model="Helsinki-NLP/opus-mt-id-en")

text_to_translate = "Hari ini aku ingin makan nasi goreng dan minum es teh"
result = translator(text_to_translate)

print(result[0]['translation_text'])

Today I want to eat fried rice and have iced tea


# Analysis

- Zero-shot classification (`pipeline("zero-shot-classification")`) assigns a text to the candidate labels we supply, without the model having been trained on those labels. Each label gets a score between 0 and 1, and the closer the score is to 1, the more confident the model is that the label fits. The model does this by relating the meaning of the text to the meaning of each label. In the example, "technology" scored about 0.92.
- `pipeline("text-generation")` uses an already trained language model (GPT-2 by default in this run) to continue the text we supply by predicting plausible next words. It does not train the model.
- `distilgpt2` is a model that generates a continuation of the input sentence, and each generated continuation is different. Here I also set the maximum length in tokens with `max_length` (which counts the prompt as well) and the number of continuations with `num_return_sequences`.
- `unmasker = pipeline("fill-mask")` predicts the word that fits a masked position in a sentence. In my example I asked for the 3 most likely words (`top_k=3`) to fill the gap in "in the future Artificial Intelligence will replace <mask> human."
- `ner` identifies the entities in the text I supply and groups them by type, such as person (PER), organization (ORG), and location (LOC).
- `"question-answering"` answers a question from the context we provide. The answer is taken from that context, so the more relevant the context is to the question, the more accurate the answer.
- The `"sentiment-analysis"` classifier analyzes the sentiment of the sentence I supply and labels it POSITIVE or NEGATIVE. The score is the model's confidence in the label it chose, so a score close to 1 means strong confidence in that label, whether positive or negative.
- The summarizer condenses the text we supply. In the example above, the summary consists mostly of sentences taken from the input.
- The translator translates text from one language to another, and the language pair is determined by the task and model we choose. Here I translated an Indonesian sentence into English.
