## Trying Hugging Face Transformers

Make sure you install the dependencies from `requirements.txt` before executing cells in this notebook.

In [1]:
from transformers import pipeline

Define the generator pipeline. This notebook uses the `text2text-generation` task with the `t5-base` model, which handles several NLP tasks through plain-text prompts.

In [2]:
generator = pipeline("text2text-generation", model="t5-base")

All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.

All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at t5-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.


For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


In [5]:
# Summarize
generator("summarize: Machine Learning in production environments is largely seen as the ultimate goal. Sometimes, deploying models can be difficult when automation is not part of the workflow. Creating a foundational process that is reliable and automated is complex and requires commitment from the team and the organizaton as a whole")

[{'generated_text': 'machine learning is a key to a successful production environment . deploying models can be'}]
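
The summary above stops mid-sentence, which is likely the effect of the pipeline's default output length limit. A minimal sketch of asking for a longer output, assuming the pipeline forwards a `max_length` keyword to the model's generation step. The helper `generate_with_limit` and the stand-in `fake_generator` are hypothetical and exist only so the snippet runs without downloading a model:

```python
def generate_with_limit(generator, prompt, max_length=60):
    """Call a pipeline-like callable with an explicit output length cap."""
    # Assumption: `generator` accepts `max_length` as a keyword argument,
    # as the text2text-generation pipeline is expected to.
    return generator(prompt, max_length=max_length)


def fake_generator(prompt, max_length=20):
    """Hypothetical stand-in for a pipeline: keeps the first `max_length` words."""
    words = prompt.split()[:max_length]
    return [{"generated_text": " ".join(words)}]


# The stand-in shows how the cap is passed through; it does not summarize anything.
short = generate_with_limit(fake_generator, "a b c d e f", max_length=3)
print(short)
```

With the pipeline from this notebook, the equivalent call would pass `generator` in place of `fake_generator`.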

In [6]:
# Sentiment
generator("sst2 sentence: Automation takes hard work but allows you to have a solid deployment")

[{'generated_text': 'positive'}]
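
Each call returns a list with one dictionary per generated sequence, keyed by `generated_text`. A small sketch of pulling out the strings, using the sentiment result above as sample data (the helper name `texts_from` is hypothetical):

```python
def texts_from(results):
    """Return the generated strings from a pipeline-style result list."""
    return [item["generated_text"] for item in results]


# Sample data copied from the sentiment cell's output above
sentiment_result = [{"generated_text": "positive"}]

labels = texts_from(sentiment_result)
print(labels[0])
```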

In [20]:
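# Questions
# Note: the output below echoes the prompt and reports `pad_token_id` 50256,
# which is GPT-2's end-of-sequence token id. It therefore looks like the result
# of a GPT-2 `text-generation` pipeline, not the T5 pipeline defined in cell [2];
# `generator` was probably rebound when this cell ran (cell numbers are out of order).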
generator("question: Is deploying models into production hard?")

Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


[{'generated_text': "question: Is deploying models into production hard? I'm not sure because the code already has one of the capabilities to do this yet but, as a developer, it can be difficult too.\n\nSkipper: The current version of Model class"}]

In [16]:
# Translation
generator("translate English to French: Automation takes hard work but allows you to have a solid deployment")

[{'generated_text': "L'automatisation exige beaucoup de travail, mais vous permet d'avoir un dé"}]
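
The summarization, sentiment, and translation cells above select a task only through the text prefix at the start of the input. A sketch that collects those three prefixes in one place; the `PREFIXES` mapping and the `build_prompt` helper are hypothetical names for illustration and are not part of `transformers`:

```python
# Task prefixes exactly as they appear in the cells above
PREFIXES = {
    "summarize": "summarize: ",
    "sentiment": "sst2 sentence: ",
    "translate_en_fr": "translate English to French: ",
}


def build_prompt(task, text):
    """Prepend the prefix for `task` to `text`."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task: {task!r}")
    return PREFIXES[task] + text.strip()


prompt = build_prompt(
    "sentiment",
    "Automation takes hard work but allows you to have a solid deployment",
)
print(prompt)
```

The resulting string can be passed to `generator(...)` in the same way as the hand-written prompts.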

You can create other generators by loading different models, for example GPT-2 with the `text-generation` task.

In [21]:
gpt2_generator = pipeline("text-generation", model="gpt2")

All model checkpoint layers were used when initializing TFGPT2LMHeadModel.

All the layers of TFGPT2LMHeadModel were initialized from the model checkpoint at gpt2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


In [22]:
gpt2_generator("some phrase here was thought to be")

Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


[{'generated_text': 'some phrase here was thought to be a joke, or a joke about a specific type of joke, but it wasn\'t. We decided to give it a shot and go with it, and it came out in a second."\n\n"That was'}]
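
The `text-generation` output above begins with the prompt itself, followed by the continuation. A minimal sketch of keeping only the newly generated part, using a shortened copy of the result above as sample data. The helper `strip_prompt` is hypothetical; recent versions of the pipeline also accept `return_full_text=False`, which is assumed here to have a similar effect:

```python
def strip_prompt(prompt, results):
    """Remove the echoed prompt from each generated text, if present."""
    continuations = []
    for item in results:
        text = item["generated_text"]
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text.lstrip())
    return continuations


prompt = "some phrase here was thought to be"
# Shortened copy of the GPT-2 output shown above
results = [
    {"generated_text": "some phrase here was thought to be a joke, or a joke about a specific type of joke"}
]

continuation = strip_prompt(prompt, results)[0]
print(continuation)
```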