# Classroom 11 - Text Generation with HuggingFace transformers

In this class, we'll look at how to use HuggingFace Transformers both for text generation and for text-to-text models such as T5 and its multilingual variant mT5.

As with many tasks, HuggingFace makes the basic aspects of text generation extremely simple to implement.

You can read about HuggingFace's text generation pipelines [here](https://huggingface.co/tasks/text-generation).

```python
# In a Jupyter cell, "!" runs a shell command
!pip install transformers torch
```

## Text completion

We first import the `pipeline` function, initialize it with a task and a model, and then we get generating!

You can find the full range of models available for text generation on the HuggingFace model hub [here](https://huggingface.co/models?pipeline_tag=text-generation).

Try experimenting with the following model architectures:

- GPT-style models
    - GPT-2, GPT-J-6B
- Meta AI's OPT
    - OPT-350m, OPT-2.7b, etc.
- BigScience BLOOM
    - BLOOM-560m, BLOOM-7b1, etc.

__Questions__
- Do some models perform better than others?
- How much does model scale affect compute (memory and generation time)?
- Is the trade-off worth it, in terms of compute time compared to quality of output?
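To put numbers on the scale-versus-compute question, you could time the same prompt across a few checkpoints. This is a minimal sketch, assuming the helper names `time_generation`, `rank_by_speed` and `compare_models` (all hypothetical, defined here) and the small Hub checkpoints `gpt2`, `facebook/opt-350m` and `bigscience/bloom-560m`; only the generation call is timed, so download and load time are excluded.

```python
import time


def time_generation(model_name, prompt, max_new_tokens=40):
    """Time one sampled generation call for a Hub checkpoint.

    Only the generation call is timed; downloading and loading the model
    happen before the clock starts.
    """
    # Imported here so rank_by_speed works without transformers installed
    from transformers import pipeline, set_seed

    generator = pipeline("text-generation", model=model_name)
    set_seed(42)  # fixed seed so repeated runs sample the same way
    start = time.perf_counter()
    output = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    elapsed = time.perf_counter() - start
    return elapsed, output[0]["generated_text"]


def rank_by_speed(timings):
    """Sort (model_name, seconds) pairs from fastest to slowest."""
    return sorted(timings, key=lambda pair: pair[1])


def compare_models(model_names, prompt="Hello, I'm a language model"):
    """Print each model's continuation and timing, then return them ranked."""
    timings = []
    for name in model_names:
        seconds, text = time_generation(name, prompt)
        print(f"{name} ({seconds:.2f}s): {text!r}")
        timings.append((name, seconds))
    return rank_by_speed(timings)
```

In a notebook cell you might call `compare_models(["gpt2", "facebook/opt-350m", "bigscience/bloom-560m"])`. A single run is noisy and depends heavily on whether you are on a CPU or GPU, so treat the numbers as rough.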

```python
from transformers import pipeline

# Build a text-generation pipeline around the small (124M-parameter) GPT-2 checkpoint
generator = pipeline("text-generation", model="gpt2")
```

We give the model a text prompt, set `max_length` (the maximum total length in tokens, counting the prompt), and choose how many sequences to generate with `num_return_sequences`:

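```python
outputs = generator(
    "Hello, I'm a language model",
    max_length=50,           # total tokens, prompt included
    num_return_sequences=5,  # how many continuations to return
    do_sample=True,          # sample so the 5 continuations differ
)
```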
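Because `max_length` counts the prompt as well as the continuation, a longer prompt leaves fewer tokens for new text; the pipeline call also accepts `max_new_tokens`, which caps only the generated part. A minimal sketch of that arithmetic, assuming the hypothetical helpers `count_prompt_tokens` and `new_token_budget`:

```python
def count_prompt_tokens(prompt, model_name="gpt2"):
    """Number of tokens the model's own tokenizer produces for the prompt."""
    # Imported here so new_token_budget works without transformers installed
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return len(tokenizer(prompt)["input_ids"])


def new_token_budget(prompt_tokens, max_length):
    """Tokens left for the continuation when max_length includes the prompt."""
    return max(0, max_length - prompt_tokens)
```

For example, `new_token_budget(count_prompt_tokens("Hello, I'm a language model"), 50)` tells you how many tokens GPT-2 can actually add under `max_length=50`.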

The pipeline returns a list of dictionaries, one per generated sequence, each holding its text under the `generated_text` key:

```python
outputs
```

And we can index specific examples like this:

```python
outputs[0].get("generated_text")
```
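By default each `generated_text` starts with the prompt itself. The pipeline can drop it for you via its `return_full_text=False` argument; alternatively, a small helper can collect all the texts and strip the prompt afterwards. A minimal sketch, assuming the hypothetical helper name `generated_texts`:

```python
def generated_texts(outputs, prompt=None):
    """Collect the text from every dict in a text-generation pipeline result.

    If a prompt is given, it is removed from the start of each text that
    begins with it, leaving only the model's continuation.
    """
    texts = [item["generated_text"] for item in outputs]
    if prompt is not None:
        texts = [t[len(prompt):] if t.startswith(prompt) else t for t in texts]
    return texts
```

With the outputs above, `generated_texts(outputs, "Hello, I'm a language model")` gives a plain list of the five continuations.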


## Text summarization

Summarization is the prototypical text-to-text task: the model takes an input sequence and is trained to produce an accurate summary of its contents.

Again, HuggingFace makes it easy to experiment with existing text summarization models. Check out their overview [here](https://huggingface.co/tasks/summarization).

Once more, check out the model zoo for existing models that have been explicitly fine-tuned for text summarization [here](https://huggingface.co/models?filter=summarization).
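T5 treats every task as text-to-text and selects the task with a plain-text prefix such as `summarize: `. The sketch below does by hand roughly what the summarization pipeline does for `t5-base`, using `AutoTokenizer` and `AutoModelForSeq2SeqLM`. The helper names, the 512-token truncation and the generation settings (`num_beams=4`, `max_new_tokens=60`) are illustrative choices rather than the pipeline's exact defaults, so the output may differ from the pipeline's.

```python
def add_task_prefix(text, task="summarize"):
    """Turn raw text into a T5-style task input such as 'summarize: ...'."""
    return f"{task}: {text.strip()}"


def summarize_with_t5(text, model_name="t5-base", max_new_tokens=60):
    """Encode the prefixed text, then let the decoder generate a summary."""
    # Imported here so add_task_prefix works without transformers installed
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # Encoder input: prefixed text, truncated to 512 tokens
    inputs = tokenizer(add_task_prefix(text), return_tensors="pt",
                       truncation=True, max_length=512)
    # Beam search over the decoder's output tokens
    summary_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

Swapping the prefix (for example `add_task_prefix(text, "translate English to German")`) is how the same T5 checkpoint is pointed at a different task.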



__Questions__
- What do you think of the summaries?
- How do models fine-tuned for summarization compare to general-purpose models like T5?
- What do you think of multilingual models like mT5?
- If you are a Danish speaker, try the danT5 model. How well does it perform?

```python
from transformers import pipeline

# Load t5-base, a general-purpose text-to-text model, as a summarizer
summarizer = pipeline("summarization", model="t5-base")
```

We then define a text that we want to summarize. I've chosen the first part of a recent news article from The Guardian; feel free to change this to whatever you want!

```python
text = """ More than 200 Indonesian fruit pickers have sought diplomatic help since July after facing difficulties working in Britain this season, the nation’s embassy has revealed.

The Guardian has spoken to a pair of workers sent to a farm in Scotland that supplies berries to M&S, Waitrose, Tesco and Lidl. They claim pickers were sent back to the caravan if they could not work fast enough and left with large debts to repay.

The embassy says the true number of people experiencing problems is likely to be far higher, as many were seeking help on behalf of several workers at the same farms – and others would not have the confidence to approach the embassy.
Agung was expecting six months of well-paid farming work in Britain.

It says the most common reported problem is a lack of work at farms, especially for those who arrived very late in the season. Some did not start until the harvest was all but over, giving them little opportunity to repay debts incurred when they signed up.

"""
```

As before, we generate a summary using the `summarizer` defined above. It returns a list of dictionaries, each holding its text under the `summary_text` key:

```python
summary = summarizer(text)
```
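The summarization pipeline also accepts `min_length` and `max_length` (measured in tokens) and passes them on to generation, which lets you ask for shorter or longer summaries. A minimal sketch, assuming the hypothetical helpers `summarize_between` and `compression_ratio`; the word-based ratio is only a rough way to compare summary lengths.

```python
def summarize_between(summarizer, text, min_length=30, max_length=80):
    """Call a summarization pipeline with explicit length bounds in tokens."""
    result = summarizer(text, min_length=min_length, max_length=max_length)
    return result[0]["summary_text"]


def compression_ratio(source, summary):
    """Words in the summary divided by words in the source text."""
    source_words = len(source.split())
    return len(summary.split()) / source_words if source_words else 0.0
```

For instance, compare `compression_ratio(text, summarize_between(summarizer, text, 10, 40))` with the same call using wider bounds to see how much the length settings change the result.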

And we can get just the summary text:

```python
summary[0].get("summary_text")
```
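Our article excerpt is short, but `t5-base` was pretrained on inputs of 512 tokens, so a much longer article may be truncated or summarized poorly. One common workaround is to summarize the text in chunks and join the partial summaries. A minimal sketch, assuming the hypothetical helpers `chunk_paragraphs` and `summarize_long`, and using a word count (`max_words=300`) as a rough stand-in for the token limit; a list input to the pipeline returns one dictionary per chunk.

```python
def chunk_paragraphs(text, max_words=300):
    """Group non-empty lines into chunks of at most max_words words each.

    A single line longer than max_words becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in text.split("\n")):
        if not para:
            continue  # skip blank lines between paragraphs
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n".join(current))
    return chunks


def summarize_long(summarizer, text, max_words=300):
    """Summarize each chunk separately, then join the partial summaries."""
    chunks = chunk_paragraphs(text, max_words)
    results = summarizer(chunks)  # list input: one dict per chunk
    return " ".join(r["summary_text"] for r in results)
```

Joined chunk summaries can repeat themselves or lose links between sections, so this is a pragmatic shortcut rather than a substitute for a model built for long inputs.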

__Note__

Here we are only using pretrained models to generate or summarize text. We haven't looked at, for example, how we might train or fine-tune models on specific tasks; that would take more time than we have available in this class.

If you want to dig into that in more detail, HuggingFace offers many high-quality walkthroughs via [their public GitHub repo](https://github.com/huggingface/transformers).

In particular, check out the directory called [Notebooks](https://github.com/huggingface/transformers/tree/main/notebooks) and also the one called [Examples](https://github.com/huggingface/transformers/tree/main/examples). The former are more pedagogical and explain things step by step; the latter are more advanced examples of how to fine-tune models effectively using the kinds of Python scripting skills you've developed in this course.

Happy coding!