# Kiri Core Example: Text Generation

Generate text based on some provided input.

By default, this behaves like a standard GPT-2 model: it continues writing from whatever context you give it.

Other generative models, such as T5, can be used as well. If you've trained a model, you can pass in the required tokenizer/model checkpoints and use `generate` for a variety of tasks.

## What's the deal with all the parameters?

Text generation has a *lot* of parameters, and you may need to tweak them to get the best results for your use case. Below are the ones we expose, and how each one changes generation.

- `min_length`: Forces the model to continue writing until at least the supplied `min_length` is reached.
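
A common way to enforce this, and the one used by Hugging Face Transformers' `MinLengthLogitsProcessor`, is to block the end-of-sequence (EOS) token until the minimum length is reached. Here's a minimal pure-Python sketch of that idea. The `suppress_eos` helper and the toy logits are hypothetical, and we're assuming Kiri's generator behaves the same way:

```python
import math

def suppress_eos(logits, eos_id, cur_len, min_length):
    """Return a copy of `logits` in which the end-of-sequence token is
    blocked while the sequence is still shorter than `min_length`."""
    out = list(logits)
    if cur_len < min_length:
        out[eos_id] = -math.inf  # EOS now has zero probability after softmax
    return out

logits = [1.0, 0.5, 3.0]  # pretend token 2 is the EOS token
print(suppress_eos(logits, eos_id=2, cur_len=3, min_length=10))   # [1.0, 0.5, -inf]
print(suppress_eos(logits, eos_id=2, cur_len=12, min_length=10))  # [1.0, 0.5, 3.0]
```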
---
- `temperature`: Rescales the model's output logits before the softmax, which reshapes the probability distribution over the next token. Raising it above 1.0 flattens the distribution, leading to more 'out there' token choices that the model would ordinarily be less confident about. Lowering it below 1.0 makes the distribution sharper, leading to 'safer' choices.
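
To see the effect, here's a small sketch of a temperature-scaled softmax, where the logits are divided by `temperature` before normalising. The function name and example logits are made up for illustration:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities after dividing them by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    # Lower temperatures push more probability onto the top token;
    # higher temperatures spread it across the less likely ones.
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```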
---
- `top_k`: Method of sampling in which the *K* most likely next words are identified, and the probability is redistributed among those *K*.
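
Here's a minimal sketch of top-k filtering applied to an already-normalised distribution. The helper and toy probabilities are hypothetical, and real implementations usually mask logits rather than probabilities:

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens and renormalise their probabilities."""
    # Indices of the k highest-probability tokens.
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    total = sum(probs[i] for i in keep)
    # Every other token gets probability 0; the survivors share the full mass.
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

probs = [0.5, 0.2, 0.15, 0.1, 0.05]
print(top_k_filter(probs, 2))  # only tokens 0 and 1 keep a non-zero probability
```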
---
- `top_p`: Method of sampling in which a probability threshold *p* is chosen. The smallest possible set of words whose combined probability exceeds *p* is selected, and the probability is redistributed among that set.
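
Here's a minimal sketch of top-p (nucleus) filtering, using a hypothetical helper and toy probabilities. It walks the tokens from most to least likely and stops as soon as the running total exceeds *p*:

```python
def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    exceeds p (the 'nucleus'), then renormalise."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative > p:  # stop as soon as the threshold is crossed
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

probs = [0.5, 0.2, 0.15, 0.1, 0.05]
print(top_p_filter(probs, 0.8))  # 0.5 + 0.2 + 0.15 > 0.8, so three tokens survive
```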
---
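- `do_sample`: Whether to sample the next token at random from the probability distribution. If `False`, the generator deterministically picks the most likely continuation instead (greedy decoding, or beam search when `num_beams` is greater than 1), and `temperature`, `top_k` and `top_p` have no effect.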
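
The difference boils down to how each next token is picked. In this sketch, which uses a hypothetical helper and a toy distribution, greedy decoding takes the most likely token every time, while sampling draws from the distribution:

```python
import random

def pick_next_token(probs, do_sample, rng=random):
    """Choose the next token id from a probability distribution."""
    if do_sample:
        # Sampling: draw a token at random, weighted by its probability.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # No sampling: always take the single most likely token (greedy decoding).
    return max(range(len(probs)), key=lambda i: probs[i])

probs = [0.1, 0.6, 0.3]
print(pick_next_token(probs, do_sample=False))  # always 1
rng = random.Random(0)  # seeded so the sampled sequence is reproducible
print([pick_next_token(probs, True, rng) for _ in range(5)])  # varies with the seed
```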
---
- `repetition_penalty`: Penalizes words that are already present in the input context or in the generated sequence, making the model less likely to repeat itself. 1.0 means no penalty; higher values discourage repetition more strongly.
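
Hugging Face Transformers implements this as the penalty from the CTRL paper. For every token already in the sequence, a positive logit is divided by the penalty and a negative one is multiplied by it. Here's a minimal sketch, assuming Kiri forwards this parameter unchanged. The helper name is hypothetical:

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Make tokens that already appeared less likely (CTRL-style penalty).
    A penalty above 1.0 discourages repetition; 1.0 leaves logits unchanged."""
    out = list(logits)
    for tok in set(seen_token_ids):
        # With penalty > 1, dividing a positive logit or multiplying a
        # negative one can only lower it, never raise it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, seen_token_ids=[0, 1], penalty=2.0))
# [1.0, -2.0, 0.5]
```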
---
- `length_penalty`: Exponential penalty applied to the length of a generated sequence. It is only used with beam search (`num_beams` greater than 1). Defaults to 1.0. Set it lower than 1.0 to get shorter sequences, or higher than 1.0 to get longer ones.
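
In Hugging Face Transformers, each finished beam's score is its total log-probability divided by `length ** length_penalty`. Log-probabilities are negative, so a larger exponent shrinks long sequences' scores toward zero, which favours them. Here's a minimal sketch with hypothetical numbers, assuming Kiri passes the value straight through:

```python
def beam_score(sum_logprobs, length, length_penalty=1.0):
    """Length-normalised score of a finished beam (higher is better)."""
    return sum_logprobs / (length ** length_penalty)

# Hypothetical finished beams: (total log-probability, length in tokens).
short_beam = (-4.0, 4)
long_beam = (-7.0, 10)

for lp in (0.0, 0.5, 1.0, 2.0):
    winner = "short" if beam_score(*short_beam, lp) > beam_score(*long_beam, lp) else "long"
    print(f"length_penalty={lp}: {winner} beam wins")
# Lower values favour the short beam; higher values favour the long one.
```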
---
- `num_beams`: Number of beams to use in beam search. 

**Beam search** keeps `num_beams` candidate word sequences alive at each step, and returns the one with the highest overall probability. In practice, this helps the generator find probable word sequences that greedy decoding would miss because they start with a lower-probability word choice.

Setting `num_beams` to 1 means no beam search will be used.
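
Here's a toy illustration. A hypothetical hand-written table of next-word probabilities stands in for the model. A real implementation works on token log-probabilities in batches, and also handles end-of-sequence tokens and length penalties. With one beam (greedy decoding), the likely first word "the" locks in a weaker continuation. Two beams find the better overall sequence:

```python
import math

# Hypothetical toy "model": probability of each next word given the previous one.
NEXT = {
    "<s>": {"the": 0.5, "a": 0.4, "my": 0.1},
    "the": {"dog": 0.4, "cat": 0.3, "car": 0.3},
    "a": {"cat": 0.9, "dog": 0.1},
    "my": {"dog": 1.0},
}

def beam_search(num_beams, steps):
    """Return the best sequence (and its probability) found by keeping the
    `num_beams` highest-scoring partial sequences at every step."""
    beams = [(["<s>"], 0.0)]  # (words so far, total log-probability)
    for _ in range(steps):
        candidates = []
        for words, score in beams:
            # Extend each beam by every possible next word.
            for word, p in NEXT.get(words[-1], {}).items():
                candidates.append((words + [word], score + math.log(p)))
        if not candidates:  # nothing left to extend
            break
        # Prune: keep only the `num_beams` best candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    best_words, best_score = beams[0]
    return " ".join(best_words[1:]), math.exp(best_score)

print(beam_search(num_beams=1, steps=2))  # greedy: 'the dog', probability ~0.2
print(beam_search(num_beams=2, steps=2))  # 'a cat', probability ~0.36
```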

---
- `num_generations`: Number of times the generator runs on the given input. You get back a list of results from generation.
---


```python
# If you've got one, change it here.
api_key = None
```

```python
from kiri import Kiri

# No model specification needed for plain GPT-2.
if api_key:
    kiri = Kiri(api_key=api_key)
else:
    kiri = Kiri(local=True)
```

```python
# Basic functionality: the model picks up where your text leaves off.
kiri.generate("Geralt knew the signs: the monster was a")
```


```
'Geralt knew the signs: the monster was a vampire that day; after the siege her companions'
```

### Supplying your own checkpoints

As mentioned, the default generator is GPT-2.

Let's try supplying another model -- one of Kiri's pretrained T5 models. I'll be using the same model that our Sentiment Detection and Summarization modules use.

```python
summary_emote_model = "kiri-ai/t5-base-qa-summary-emotion"

# Initialise Kiri with our model checkpoint
kiri_t5 = Kiri(local=True,
               generation_model=summary_emote_model)

# Our sentiment function automatically adds the 'emotion:' prefix.
# As we're accessing the generator directly, we need to add it ourselves.
input_text = """emotion: This food was just not good.
                Sorry, but you need to do better. 
                Really gross and undercooked."""

# Turn off sampling so the output is deterministic rather than randomly sampled.
kiri_t5.generate(input_text, do_sample=False)
```

```
'remorse, disappointment, sadness'
```