# Let's Try Hugging Face Transformers NLP Pipelines!


In [1]:
!pip install transformers



In [24]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "I love playing Dress to Impress and exploring new styles, creating unique outfits, and competing in fashion challenges",
    candidate_labels=["art", "fashion game", "lifestyle"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'I love playing Dress to Impress and exploring new styles, creating unique outfits, and competing in fashion challenges',
 'labels': ['fashion game', 'lifestyle', 'art'],
 'scores': [0.9750106334686279, 0.013017186895012856, 0.011972175911068916]}

In [31]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "With stunning CGI and lifelike animations, The Wild Robot captures the beauty of the wilderness",
    candidate_labels=['technology', 'animated', 'nature'],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'With stunning CGI and lifelike animations, The Wild Robot captures the beauty of the wilderness',
 'labels': ['animated', 'technology', 'nature'],
 'scores': [0.7051416635513306, 0.16357360780239105, 0.13128474354743958]}

In [33]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "My sister cut her hair today",
    candidate_labels=['style'],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


{'sequence': 'My sister cut her hair today',
 'labels': ['style'],
 'scores': [0.9663320779800415]}
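
A score of about 0.97 for a single candidate label may look odd, since a softmax across one label would always give 1.0. As far as I understand the pipeline, each label is turned into an NLI hypothesis, and when there is only one label (or `multi_label=True`) the score comes from a softmax over that label's contradiction and entailment logits rather than across labels. The sketch below mimics that logic in plain Python. The function names and the logit values are invented for illustration; they are not the library's code or real model outputs.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_scores(entail_logits, contra_logits, multi_label=False):
    """Turn per-label NLI logits into scores (simplified, hypothetical helper)."""
    independent = multi_label or len(entail_logits) == 1
    if independent:
        # Each label is scored on its own: entailment vs. contradiction.
        return [softmax([c, e])[1] for e, c in zip(entail_logits, contra_logits)]
    # Otherwise the labels compete: softmax over the entailment logits only.
    return softmax(entail_logits)

# Made-up logits, one entry per candidate label.
single = zero_shot_scores([2.5], [-1.0])
competing = zero_shot_scores([3.0, 0.5, 0.2], [-2.0, 1.0, 1.5])
print(single)
print(competing)
```

With several competing labels the scores sum to 1, which matches the three-label outputs in the earlier cells; with one label the score is an independent probability that can be anywhere between 0 and 1.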

In [43]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("I will give you")

No model was supplied, defaulted to openai-community/gpt2 and revision 6c0e608 (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "I will give you a list of everything that is necessary to keep a person's life safe, secure, and healthy, and that will help to maintain family relationships and to have a good life. Please keep this list up to date.\n\nWhat"}]

In [56]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator(
    "Data science is",
    max_length=15,
    num_return_sequences=3,
)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Data science is the key to our understanding of physics and the physics of life'},
 {'generated_text': 'Data science is something we can do.'},
 {'generated_text': 'Data science is a community based media and science research that encompasses a diverse set'}]

In [60]:
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
generator("Hello, I'm a student")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Hello, I'm a student at University of Ottawa. I'm a freshman at Stony Brook University, so I've been working part-time and part-time during my 12-month break, as there's no better time than today. When"}]

In [71]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("Most popular dishes in Indonesia is  <mask> rice.", top_k=2)

No model was supplied, defaulted to distilbert/distilroberta-base and revision ec58a5b (https://huggingface.co/distilbert/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at distilbert/distilroberta-base were not used when initializing RobertaForMaskedLM: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.23465314507484436,
  'token': 16708,
  'token_str': ' fried',
  'sequence': 'Most popular dishes in Indonesia is fried rice.'},
 {'score': 0.053955551236867905,
  'token': 29693,
  'token_str': ' boiled',
  'sequence': 'Most popular dishes in Indonesia is boiled rice.'}]
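
The `score` values above are probabilities over the model's vocabulary at the `<mask>` position, and `top_k=2` keeps only the two most likely tokens. A minimal sketch of that ranking step, assuming a tiny four-word vocabulary with invented logits (the helper `top_k_fill` is hypothetical, not a library function):

```python
import math

def top_k_fill(logits_by_token, k=2):
    """Return the k most probable tokens with their softmax probabilities."""
    peak = max(logits_by_token.values())
    exps = {tok: math.exp(v - peak) for tok, v in logits_by_token.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda item: item[1], reverse=True)
    return [{"token_str": tok, "score": e / total} for tok, e in ranked[:k]]

# Made-up logits for the masked position.
toy_logits = {" fried": 4.0, " boiled": 2.5, " white": 2.0, " blue": -1.0}
for candidate in top_k_fill(toy_logits, k=2):
    print(candidate)
```

Because the probabilities are normalized over the whole vocabulary, the scores of the top two candidates do not need to sum to 1, as in the real output above.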

In [74]:
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")
ner("My name is Giby and I work as a pizza delivery in PKXD.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.9954741,
  'word': 'Giby',
  'start': 11,
  'end': 15},
 {'entity_group': 'ORG',
  'score': 0.97593707,
  'word': 'PKXD',
  'start': 50,
  'end': 54}]

In [106]:
from transformers import pipeline

# aggregation_strategy="simple" replaces the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")
ner("Hello, I'm Chitra. I live in Jakarta. I'm the Founder and Creative Director of Sejauh Mata Memandang.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER',
  'score': 0.98421085,
  'word': 'Chitra',
  'start': 11,
  'end': 17},
 {'entity_group': 'LOC',
  'score': 0.9989772,
  'word': 'Jakarta',
  'start': 29,
  'end': 36},
 {'entity_group': 'ORG',
  'score': 0.9679503,
  'word': 'Sejauh Mata Memandang',
  'start': 79,
  'end': 100}]
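
The grouped output merges token-level predictions into whole entities, which is why "Sejauh Mata Memandang" comes back as one ORG span. The sketch below shows the idea with word-level tokens tagged in the B-/I- scheme. It is a simplification under my own assumptions: the helper `group_entities` is hypothetical, the per-token scores are invented, and subword pieces are ignored.

```python
def group_entities(tokens):
    """Merge consecutive token predictions that belong to the same entity."""
    groups = []
    previous_kind = None
    for tok in tokens:
        prefix, _, kind = tok["entity"].partition("-")
        if prefix == "O":
            previous_kind = None  # an outside token closes any open entity
            continue
        if prefix == "I" and kind == previous_kind:
            # Continuation of the entity opened by the previous token.
            current = groups[-1]
            current["words"].append(tok["word"])
            current["scores"].append(tok["score"])
            current["end"] = tok["end"]
        else:
            groups.append({"entity_group": kind, "words": [tok["word"]],
                           "scores": [tok["score"]],
                           "start": tok["start"], "end": tok["end"]})
        previous_kind = kind
    return [{"entity_group": g["entity_group"],
             "score": sum(g["scores"]) / len(g["scores"]),
             "word": " ".join(g["words"]),
             "start": g["start"], "end": g["end"]} for g in groups]

# Made-up token predictions; character offsets follow the sentence above.
toy_tokens = [
    {"word": "Chitra", "entity": "B-PER", "score": 0.98, "start": 11, "end": 17},
    {"word": "in", "entity": "O", "score": 0.99, "start": 26, "end": 28},
    {"word": "Jakarta", "entity": "B-LOC", "score": 0.99, "start": 29, "end": 36},
    {"word": "of", "entity": "O", "score": 0.99, "start": 76, "end": 78},
    {"word": "Sejauh", "entity": "B-ORG", "score": 0.97, "start": 79, "end": 85},
    {"word": "Mata", "entity": "I-ORG", "score": 0.96, "start": 86, "end": 90},
    {"word": "Memandang", "entity": "I-ORG", "score": 0.95, "start": 91, "end": 100},
]
for entity in group_entities(toy_tokens):
    print(entity)
```

Averaging the token scores is one reasonable way to score a group; other aggregation strategies pick the first or the maximum token score instead.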

In [77]:
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = "Dress to impress adalah  game video berdandan multipemain yang dikembangkan untuk platform game Roblox .."
question = "Apa itu dress to impress?"

result = qa_pipeline(question=question, context=context)
print(f"Jawaban: {result['answer']}")

Jawaban: adalah  game video berdandan


In [78]:
from transformers import pipeline

qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = " Saya suka bermain PKXD. PKXD adalah sebuah game sosial open-world untuk membuat avatar, mendekorasi rumah, berinteraksi dengan pemain lain, dan berkunjung ke rumah mereka."
question = "PKXD adalah?"

result = qa_pipeline(question=question, context=context)
print(f"Jawaban: {result['answer']}")

Jawaban: sebuah game sosial
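
Extractive question answering does not generate text: the model scores every token as a possible start and a possible end of the answer, and the best valid span is copied out of the context. A minimal sketch of the span search, assuming invented logits over a few words of the context (the helper `best_span` and all numbers are hypothetical; the real pipeline works on subword tokens and probabilities):

```python
def best_span(start_logits, end_logits, max_answer_len=15):
    """Find (start, end) with start <= end that maximizes the summed logits."""
    best = (0, 0)
    best_score = float("-inf")
    for i, start_score in enumerate(start_logits):
        # Only consider ends at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = start_score + end_logits[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best[0], best[1], best_score

# Made-up logits, one pair per word.
tokens = ["PKXD", "adalah", "sebuah", "game", "sosial", "open-world"]
start_logits = [0.1, 0.3, 2.4, 1.0, 0.2, 0.1]
end_logits = [0.0, 0.1, 0.3, 1.2, 2.6, 0.9]

i, j, _ = best_span(start_logits, end_logits)
print(" ".join(tokens[i:j + 1]))
```

This also explains why the answers above are short fragments of the context: the model can only return a contiguous span, never a rephrased sentence.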


In [104]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("The customer service was terrible. They didn't help me at all and I had to return the product..")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'NEGATIVE', 'score': 0.9996756315231323}]

In [82]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("This dimsum is very delicious. I want to buy more")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9997479319572449}]

In [84]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    In case you’re unfamiliar, Dress to Impress (or DTI) is a game, developed by teenaged
    Roblox users, with a pretty simple concept: players are given a theme, and have a limited
    amount of time to dress their Bratz-style avatar appropriately. Or destabilise the
    ‘Hot Mess’ category as a goblin in rags – the choice is theirs. “It’s just fun,” Bella*,
    20, tells Dazed. She first played the game late last year, and has played it on and
    off over the last few weeks. But she says that her friend, who is also 20, “must spend
    three plus hours a day on it... she loves the game”.

    Style games have existed forever, of course, from Y2K browser classics like Stardoll,
    to the paper dolls your grandma used to cut out of magazines. So why are we talking about
    DTI now? An even better question: why is the internet so obsessed with what is ostensibly
    a game for kids, with streamers, Instagram influencers, and literal celebs logging in to
    strut their stuff on the digital runway?
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'summary_text': ' Dress to Impress (or DTI) is a game developed by teenaged Roblox users . Players are given a theme, and have a limited amount of time to dress their avatar appropriately . Players have to destabilise the ‘Hot Mess’ category as a goblin in rags .'}]

In [90]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-id-en")

text_to_translate = "Loopy sangat imut dan lucu"
result = translator(text_to_translate)

print(result[0]['translation_text'])

Loopy's so cute and funny.


In [91]:
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-id-fr")

text_to_translate = "Loopy sangat imut dan lucu"
result = translator(text_to_translate)

print(result[0]['translation_text'])



Loopy est mignon et drôle.


Pipeline Analysis:

1. Topic Classification. Classifies text into a predefined set of categories (labels). The pipeline used is zero-shot-classification. Zero-shot learning (ZSL) maps the context of new categories onto what the model has already learned, so it can recognize the correct category or topic without having seen explicit labelled examples like the input text.

2. Text Generation. Continues the input sentence. The model predicts the words most likely to come next, based on the topic and context of the prompt.

3. Fill-Mask. Predicts a missing word in a sentence. Example: "Traveling to ___ has been my dream since I was little." The model would likely suggest a list of countries or places to fill the gap.

4. Named Entity Recognition (NER). Identifies and classifies the entities in the input text, such as people (PER), locations (LOC), and organizations (ORG). The output lists each entity with its label, score, and character position.

5. Question Answering. The model answers a question by extracting the answer from the text (context) that we supply along with it.

6. Sentiment Analysis. Identifies whether the input text is positive or negative. The default model used here returns only POSITIVE or NEGATIVE; a neutral class is available only with models trained to predict one.

7. Summarization. Condenses a long text into a few sentences, for example summarizing an article or a lengthy question so that the key information can be absorbed quickly.

8. Translation. I think this works much like Google Translate: we supply text in a source language and ask for it in another language such as English or French. In these examples the language pair is set by the chosen model (for instance, Helsinki-NLP/opus-mt-id-en translates Indonesian to English).

Of these pipelines, I am most interested in question answering. It will be very useful when I want to do research across existing documents and get answers from the model without having to read every document in full, which would be a lot of reading.