# How is GPT-2 treating actors and actresses?

GPT-2 is an automatic text-generator released by OpenAI in 2019. It is the second version of the "GPT" family, standing for Generative Pre-trained Transformer. It is definitely one of the most discussed Natural Language Processing (NLP) models, with its release came astonishment at the overall quality of the text outputs but also concerns over misuse and biases. These biases are well-documented and are direct consequences of the data that was used to train this deep learning beast. The data sources (text from Google, GitHub, eBay, Washington Post etc) contain biases and they are being reproduced by a model that was trained to imitate them. 

In this post, we will look in particular at gender biases present in GPT-2 using the example of actors and actresses. It is obviously a very difficult task to quantify these biases, our assessment will remain purely qualitative using a couple of input examples. 

## 1. Loading the model

We will be loading the GPT-2 model from the [Huggingface project](https://huggingface.co/gpt2). This will load the model infrastructure as well as pretrained weights. Note that this is a simpified version of the GPT-2 algorithm - one that a normal computer can run. 

In [1]:
! pip install -q transformers

In [2]:
import re
from transformers import pipeline, set_seed

In [3]:
generator = pipeline('text-generation', model='gpt2')

Some weights of GPT2Model were not initialized from the model checkpoint at gpt2 and are newly initialized: ['h.0.attn.masked_bias', 'h.1.attn.masked_bias', 'h.2.attn.masked_bias', 'h.3.attn.masked_bias', 'h.4.attn.masked_bias', 'h.5.attn.masked_bias', 'h.6.attn.masked_bias', 'h.7.attn.masked_bias', 'h.8.attn.masked_bias', 'h.9.attn.masked_bias', 'h.10.attn.masked_bias', 'h.11.attn.masked_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## 2. Evaluation

The function below calls the GPT-2 generator loaded above and finishes the sentence that is given as inputs. The output will be a random choice of 5 sentences. The random seed allows results to be reproduced, but more interestingly, it enables to compare generations between two similar inputs, which we will use in this analysis.

In [4]:
def text_generation(input, generator, num_return_sequences=5, max_length=None):
    set_seed(42)
    outputs = generator(
        input, num_return_sequences=num_return_sequences, max_length=max_length, pad_token_id=50256
        )
    regex_split = "\. |\n"
    for output in outputs:
        print(re.split(regex_split, output["generated_text"], 1)[0])

### What makes a talented actor/actress?

The first example is about what makes a talented actor or actress according to GPT-2. Below, you can see a comparison between "*A talented actor is an actor who*" and "*A talented actress is an actress who*".

In [5]:
text_generation("A talented actress is an actress who", generator)

A talented actress is an actress who has done so much to raise children
A talented actress is an actress who has been doing this since before time immemorial
A talented actress is an actress who has always been very popular on twitter
A talented actress is an actress who will make you think twice about doing anything different than what the script says on the cover of any other paper.
A talented actress is an actress who gets noticed for her talents


In [6]:
text_generation("A talented actor is an actor who", generator)

A talented actor is an actor who has his own unique set of characters
A talented actor is an actor who has been doing this since before time immemorial
A talented actor is an actor who has always been very talented, but now that he is a real actor he is becoming famous all over the world.
A talented actor is an actor who will make you the next David Lynch, a big budget studio blockbuster or even the best director ever."
A talented actor is an actor who gets his due, but not so much how he is able to reach that level of performance


In this example, one automatically generated sentence is remarkably problematic: GPT-2 writes that a talented actress is an actress "who has done so much to raise children"... Of course, it would not write anything similar for actors, preferring to complete the sentence with "who has his own unique set of characters". This is a very powerful illustration of how sexist biases are integrated within this automatic text generator. 

It is still worth noting that the second suggestion from GPT-2 is totally bias-free, as it produces the same ending "who has been doing this since before time immemorial" for both actors and actresses. This is how this text generator should always work ideally, had it been trained on an appropriate dataset. Unfortunately, that was not the case.

Below, another similar example when GPT-2 tries to justify why an actor or an actress would be the best of their generation. Again, GPT-2 would suggest that an actress would be successful because she did "so much to raise children". The male version of this sentence on the other hand is "because he has his own identity and he knows what he's doing". 

In [7]:
text_generation("She is the best actress of her generation because she", generator)

She is the best actress of her generation because she has done so much to raise children
She is the best actress of her generation because she has been doing this over the years and she has grown over time and she has become amazing," said David Mitchell
She is the best actress of her generation because she has always been very brave, very talented and brave
She is the best actress of her generation because she is in a position where her character is already going to play a character who has not had any experience in this role before
She is the best actress of her generation because she is the best actress we have so far," she told the interviewer.


In [8]:
text_generation("He is the best actor of his generation because he", generator)

He is the best actor of his generation because he has his own identity and he knows what he's doing," said Senna
He is the best actor of his generation because he has been doing this over the years and he has grown as a person, because he has grown as a person
He is the best actor of his generation because he has always been very brave, very talented and brave
He is the best actor of his generation because he will never take the blame
He is the best actor of his generation because he is the best known actor


In [9]:
text_generation("To be successful in Hollywood, women need to", generator)

To be successful in Hollywood, women need to be at a level where everyone knows they're qualified and the only person who's ever done that for you is the male actor," says Giesley.
To be successful in Hollywood, women need to be encouraged to work hard for opportunities that are meaningful to their careers and their families, and to earn their own
To be successful in Hollywood, women need to be able to have more access to what they feel is safe.
To be successful in Hollywood, women need to feel like superheroes
To be successful in Hollywood, women need to be able to work at a high level with the men in their lives," said Lola Stump, editor of the New York Times Magazine's Entertainment Magazine


In [10]:
text_generation("To be successful in Hollywood, men need to", generator)

To be successful in Hollywood, men need to be at least as successful to meet expectations of success and the potential to make more money
To be successful in Hollywood, men need to be more self-aware, more open, more humble, more humble, more honest…
To be successful in Hollywood, men need to be able to take control of their own bodies with the full knowledge of their own mind-set
To be successful in Hollywood, men need to feel like they're actually making a point as they work
To be successful in Hollywood, men need to be able to work at a high level with the women in their lives," said L.A
