# IMDB movie review text generation

Once you have fine-tuned your model, you can test it interactively with this notebook.

In [7]:
from transformers import pipeline

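# Path to a fine-tuned checkpoint; change this to point at your own training output
path_to_model = "/scratch/project_462000450/data/mvsjober/gpt-imdb-model/checkpoint-65000/"
# Load the checkpoint into a text-generation pipeline
generator = pipeline("text-generation", model=path_to_model)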

In [8]:
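def print_output(output):
    """Print each generated text as a bullet, one per returned sequence."""
    for item in output:
        text = item['generated_text']
        # IMDB reviews contain HTML line breaks; show them as real newlines
        text = text.replace("<br />", "\n")
        print('-', text)
        print()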

In [9]:
output = generator("This movie was")
print_output(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


- This movie was not quite to great though with all the action and the plot and the special effects you will find in the video store. The movie wasn't much good though there was some very good acting. Also some pretty good CGI effects. This film
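
The `Setting pad_token_id to eos_token_id` message above is informational: GPT-2 style tokenizers define no padding token, so generation falls back to the end-of-sequence token. Below is a minimal sketch of one way to set it explicitly. It assumes the pipeline forwards extra keyword arguments to the model's `generate()` call and exposes its tokenizer as `generator.tokenizer`. The helper name `generate_quietly` is hypothetical and not part of `transformers`.

```python
def generate_quietly(generator, prompt, **generate_kwargs):
    """Call a text-generation pipeline with pad_token_id set explicitly.

    Assumption: `generator` behaves like a transformers text-generation
    pipeline, i.e. it is callable and has a `.tokenizer.eos_token_id`.
    """
    # Only fill in pad_token_id if the caller did not choose one already.
    generate_kwargs.setdefault("pad_token_id", generator.tokenizer.eos_token_id)
    return generator(prompt, **generate_kwargs)

# Hypothetical usage in this notebook:
#   print_output(generate_quietly(generator, "This movie was"))
```

The generated text should be unaffected by this; whether the message disappears may depend on the installed `transformers` version.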



## Experiment with the generation strategy

You can experiment with how the text is generated. The available text generation strategies are described here: https://huggingface.co/docs/transformers/generation_strategies

The `generator()` call accepts several parameters that can be tweaked:

> `max_new_tokens`: the maximum number of tokens to generate. In other words, the size of the output sequence, not including the tokens in the prompt. As an alternative to using the output’s length as a stopping criterion, you can choose to stop generation whenever the full generation exceeds some amount of time. To learn more, check `StoppingCriteria`.
> 
> `num_beams`: by specifying a number of beams higher than 1, you are effectively switching from greedy search to beam search. This strategy evaluates several hypotheses at each time step and eventually chooses the hypothesis that has the overall highest probability for the entire sequence. This has the advantage of identifying high-probability sequences that start with lower-probability initial tokens and would’ve been ignored by the greedy search.
> 
> `do_sample`: if set to `True`, this parameter enables decoding strategies such as multinomial sampling, beam-search multinomial sampling, Top-K sampling and Top-p sampling. All these strategies select the next token from the probability distribution over the entire vocabulary with various strategy-specific adjustments.
> 
> `num_return_sequences`: the number of sequence candidates to return for each input. This option is only available for the decoding strategies that support multiple sequence candidates, e.g. variations of beam search and sampling. Decoding strategies like greedy search and contrastive search return a single output sequence.
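
To make the difference between greedy search and beam search concrete, here is a small self-contained sketch. It does not call `transformers` at all: `TOY_MODEL` is a made-up table of next-token probabilities, and `beam_search` is a simplified illustration (no end-of-sequence handling, no length penalty), not the library's implementation. With one beam it behaves like greedy search. With two beams it finds a sequence whose first token is less likely but whose overall probability is higher.

```python
import math

# Toy "language model": next-token probabilities given the tokens so far.
# The numbers are invented for illustration and come from no real model.
TOY_MODEL = {
    (): {"good": 0.6, "bad": 0.4},
    ("good",): {"movie": 0.4, "film": 0.3, "plot": 0.3},
    ("bad",): {"movie": 0.9, "film": 0.1},
}

def beam_search(model, num_beams, steps):
    """Return the best (tokens, log_probability) pair after `steps` tokens."""
    beams = [((), 0.0)]  # start from the empty sequence with log-prob 0
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            # Extend every surviving hypothesis by every possible next token.
            for token, prob in model[tokens].items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        # Keep only the `num_beams` most probable partial sequences.
        candidates.sort(key=lambda candidate: candidate[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]

greedy_tokens, greedy_score = beam_search(TOY_MODEL, num_beams=1, steps=2)
beam_tokens, beam_score = beam_search(TOY_MODEL, num_beams=2, steps=2)

print(greedy_tokens, round(math.exp(greedy_score), 2))  # ('good', 'movie') 0.24
print(beam_tokens, round(math.exp(beam_score), 2))      # ('bad', 'movie') 0.36
```

Greedy search commits to "good" because it is the most likely first token, and so never sees that "bad movie" is the more probable two-token sequence.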

In [10]:
output = generator("This movie was awful because", num_return_sequences=4, max_new_tokens=100, do_sample=True)
print_output(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


- This movie was awful because it was poorly directed, terrible dialog in the title, and a lot of unnecessary and unnecessary background footage.

Why do they keep making this movie? Why they should hire one actor at every movie they work on and hire someone? What is it about those two characters you could not care less about? 

OK seriously, so they hire one of the good actors, the other one is a cop and the others act like they think they can be some kind

- This movie was awful because she was a little too far the part was the little girl in the background! and then the only time she ever did was when she looked at her mother and said "Don't see that girl" or whatever she said. she just kept repeating this and even less and just threw in an episode where the little girl was talking to the guy like "I'm a big sister and want my boyfriend to get some coffee before I see her or something like that." I would have given the episode 8/

- This movie was awful because of the bad dialog. Al
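
The outputs above differ from one another because `do_sample=True` draws each next token at random from the model's probability distribution. The following sketch illustrates the idea behind Top-K and Top-p filtering on a made-up distribution. The function names and the probabilities are hypothetical, and the real library operates on logits tensors rather than Python dictionaries, so treat this as an illustration of the concept only.

```python
import random

def _normalise(pairs):
    """Rescale (token, prob) pairs so the probabilities sum to 1."""
    total = sum(prob for _, prob in pairs)
    return [(token, prob / total) for token, prob in pairs]

def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Restrict a next-token distribution with Top-K and/or Top-p."""
    ranked = _normalise(sorted(probs.items(), key=lambda item: item[1], reverse=True))
    if top_k is not None:
        # Top-K: keep only the k most probable tokens.
        ranked = _normalise(ranked[:top_k])
    if top_p is not None:
        # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for token, prob in ranked:
            kept.append((token, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = _normalise(kept)
    return dict(ranked)

def sample_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Invented distribution for the token following "This movie was awful because"
next_token_probs = {"it": 0.5, "of": 0.25, "the": 0.15, "she": 0.1}

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
filtered = filter_top_k_top_p(next_token_probs, top_k=3)
print(filtered)
print([sample_token(filtered, rng) for _ in range(5)])
```

Lower values of `top_k` or `top_p` remove unlikely tokens from consideration, which tends to make the samples less varied.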

## Compare with the original model without fine-tuning

We can also load the original `distilgpt2` model to see how it behaves on the same prompt without fine-tuning.

In [11]:
generator_orig = pipeline("text-generation", model='distilgpt2')

In [12]:
output = generator_orig("This movie was awful because", num_return_sequences=4, max_new_tokens=100, do_sample=True)
print_output(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


- This movie was awful because there was no sound.


What was it like to have this movie in one hand?
It looked like this was really bad.
The soundtrack to the movie had a ton of strange, weird sounds, and there was some real weird noises or things. It seemed like the sound was actually all that different during that movie.
So I was not impressed by the sound itself, so I just went through the first day and didn't see any such sound.
The game was just

- This movie was awful because I really don't have an idea what I could do. I had thought it'd work out and that's when I read this book. It has been a very nice little book, and definitely helps me to learn how it works so I just need to do the script again. I've got a lot of ideas in each part of the story, but I still haven't worked out it yet.


This review's a must for me so keep checking out. If you're reading it

- This movie was awful because it turned into an all-out affair. It went from a horrible one to an absolute disgrace. A
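
To read the two models' outputs side by side, the same prompt and settings can be passed to both pipelines in one go. The helpers below are a hypothetical convenience sketch, not part of `transformers`. They assume each generator is callable like the pipelines above and returns a list of dictionaries with a `generated_text` key.

```python
def compare_generators(generators, prompt, **generate_kwargs):
    """Run one prompt through several pipelines and collect the texts.

    `generators` maps a label (e.g. "fine-tuned") to a pipeline-like callable.
    """
    results = {}
    for label, generator in generators.items():
        output = generator(prompt, **generate_kwargs)
        # Same clean-up as print_output: show HTML line breaks as newlines.
        results[label] = [
            item["generated_text"].replace("<br />", "\n") for item in output
        ]
    return results

def print_comparison(results):
    """Print the collected texts grouped under each model label."""
    for label, texts in results.items():
        print(f"=== {label} ===")
        for text in texts:
            print("-", text)
            print()

# Hypothetical usage with the two pipelines loaded in this notebook:
#   results = compare_generators(
#       {"fine-tuned": generator, "distilgpt2": generator_orig},
#       "This movie was awful because",
#       num_return_sequences=2, max_new_tokens=50, do_sample=True,
#   )
#   print_comparison(results)
```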