# Analyzing the Tennis Corpus with Surprise
This demo is based on the [Tie-breaker paper](https://www.cs.cornell.edu/~liye/tennis.html) on gender bias in sports journalism. We measure how surprising an utterance is by computing its cross entropy against a language model, as implemented by ConvoKit's `Surprise` transformer.
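To make the cross-entropy idea concrete, here is a minimal sketch that scores a target text against an add-one-smoothed unigram model of a context text. It is an illustration only and not ConvoKit's implementation: the function name `cross_entropy`, the smoothing choice, and the toy sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def cross_entropy(target_tokens, context_tokens):
    """Average negative log-probability (in nats) of the target tokens
    under an add-one-smoothed unigram model built from the context tokens.

    Hypothetical helper for illustration; not part of ConvoKit.
    """
    if not target_tokens:
        raise ValueError("target_tokens must not be empty")
    context_counts = Counter(context_tokens)
    # The vocabulary covers both sides so unseen target words get nonzero mass.
    vocab = set(context_tokens) | set(target_tokens)
    denom = len(context_tokens) + len(vocab)
    total = 0.0
    for tok in target_tokens:
        prob = (context_counts[tok] + 1) / denom
        total -= math.log(prob)
    return total / len(target_tokens)


# Toy context standing in for game commentary (made-up text).
context = "great serve down the line and a great return".split()
typical = "a great serve".split()
unusual = "your wedding plans".split()

typical_score = cross_entropy(typical, context)
unusual_score = cross_entropy(unusual, context)
print(f"typical: {typical_score:.3f}, unusual: {unusual_score:.3f}")
```

A target that reuses the context's vocabulary gets a lower score than one made of words the context never uses, which is the sense in which a higher score means "more surprising".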

In [1]:
import convokit
import json
import numpy as np
from collections import defaultdict
from convokit import Corpus, Speaker, Utterance, download, Surprise
from tqdm import tqdm

### Create corpus using tennis game language dataset

In [2]:
PATH = '/home/axl4' # replace with your path to tennis_data directory
data_dir = f'{PATH}/tennis_data/'

In [3]:
corpus_speakers = {'COMMENTATOR': Speaker(id = 'COMMENTATOR', meta = {})}

In [4]:
with open(data_dir + 'text_commentaries.json', 'r') as f:
    commentaries = json.load(f)

In [5]:
utterances = []
# enumerate supplies the running index used to build each utterance id
for count, c in enumerate(tqdm(commentaries)):
    idx = f'c{count}'
    meta = {'player_gender': c['gender'], 'scoreline': c['scoreline']}
    utterances.append(Utterance(id=idx, speaker=corpus_speakers['COMMENTATOR'], conversation_id=idx, text=c['commentary'], meta=meta))

100%|██████████| 3962/3962 [00:00<00:00, 268510.28it/s]


In [6]:
game_language_corpus = Corpus(utterances=utterances)

### Load interview corpus

In [7]:
interview_corpus = Corpus(filename=download('tennis-corpus'))

Dataset already exists at /home/axl4/.convokit/downloads/tennis-corpus


In [8]:
interview_corpus.print_summary_stats()

Number of Speakers: 359
Number of Utterances: 163948
Number of Conversations: 81974


To help with the analysis, let's add a `'player_gender'` metadata attribute to each reporter question, recording the gender of the player the question is posed to.

In [9]:
for utt in interview_corpus.iter_utterances(selector=lambda u: u.meta['is_question']):
    utt.add_meta('player_gender', utt.get_conversation().get_utterance(utt.id.replace('q', 'a')).get_speaker().meta['gender'])

## Part 1: How surprising is each interview question compared to tennis game language?

For this demo, we want to train one model for the entire game language corpus, so we'll make our `model_key_selector` a function that returns the same key for every utterance in a corpus.
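As a rough sketch of what a key selector does, the snippet below groups texts by the key the selector returns, so a constant key yields one group and a per-gender key yields one group per gender. The helper `group_texts_by_model_key` and the dictionary-based toy utterances are hypothetical stand-ins, not ConvoKit objects or APIs.

```python
from collections import defaultdict


def group_texts_by_model_key(utterances, model_key_selector):
    """Collect utterance texts under the key returned by the selector.

    Hypothetical helper: each resulting group would back one language model.
    """
    groups = defaultdict(list)
    for utt in utterances:
        groups[model_key_selector(utt)].append(utt["text"])
    return dict(groups)


# Toy utterances represented as plain dicts (assumption for illustration).
toy_utterances = [
    {"text": "Ace down the T.", "player_gender": "F"},
    {"text": "Double fault.", "player_gender": "M"},
    {"text": "Break point saved.", "player_gender": "F"},
]

# A constant key puts every utterance into a single model.
single_model = group_texts_by_model_key(toy_utterances, lambda utt: "corpus")

# A key derived from metadata produces one model per distinct value.
per_gender = group_texts_by_model_key(
    toy_utterances, lambda utt: utt["player_gender"]
)

print(sorted(single_model), sorted(per_gender))
```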

In [10]:
from nltk import word_tokenize

for utt in tqdm(list(game_language_corpus.iter_utterances())):
    utt.meta['joined_tokens'] = word_tokenize(utt.text)
    
game_language_corpus_size = game_language_corpus.get_utterances_dataframe()['meta.joined_tokens'].map(len).sum()

100%|██████████| 3962/3962 [00:01<00:00, 2952.79it/s]


In [11]:
game_language_corpus.get_utterances_dataframe()['meta.joined_tokens'].map(len).sum()

195457

In [12]:
surp = Surprise(model_key_selector=lambda utt: 'corpus', target_sample_size=10, context_sample_size=195000)

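Since we want to measure how surprising reporters' questions are relative to game language, we'll fit the transformer on the game language corpus. The `text_func` below joins the text of every commentary utterance into a single context for the one `'corpus'` model.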

In [13]:
surp.fit(game_language_corpus, text_func=lambda utt: [' '.join([u.text for u in game_language_corpus.iter_utterances()])])

fit1: 3962it [00:00, 818854.46it/s]
fit2: 100%|██████████| 1/1 [00:01<00:00,  1.19s/it]


<convokit.surprise.surprise.Surprise at 0x7f1198db3640>

To speed up the demo, we'll select a random subset of interview questions to compute surprise scores for.

In [14]:
import itertools

subset_utts = [interview_corpus.get_utterance(utt) for utt in interview_corpus.get_utterances_dataframe(selector=lambda utt: utt.meta['is_question']).sample(250).index]
subset_corpus = Corpus(utterances=subset_utts)

When transforming, we select only utterances that are questions to compute surprise for.

In [15]:
surp.transform(subset_corpus, obj_type='utterance', selector=lambda utt: utt.meta['is_question'])

transform: 250it [15:36,  3.74s/it]


<convokit.model.corpus.Corpus at 0x7f12068b6e20>

### Results
Let's compare the median surprise score for questions posed to female players with the median for questions posed to male players. In this sample, the median for questions posed to female players is slightly higher (about 7.30 vs. 7.29), which aligns with the results of the Tie-breaker paper. Because the subset is a random sample of 250 questions, the exact values will vary between runs.

In [16]:
utterances = subset_corpus.get_utterances_dataframe(selector=lambda utt: utt.meta['is_question'])

In [17]:
female_qs = utterances[utterances['meta.player_gender'] == 'F']['meta.surprise']
female_qs.dropna().median()

7.303531563431685

In [18]:
male_qs = utterances[utterances['meta.player_gender'] == 'M']['meta.surprise']
male_qs.dropna().median()

7.292927677371575

## Part 2: How surprising is a question compared to all questions posed to male players and all questions posed to female players?

Let's see how surprising questions are compared to questions posed to players of each gender. To do this, we'll want to make our `model_key_selector` return a key based on the player's gender. Recall that we added `'player_gender'` as a metadata field to each question earlier.

In [19]:
gender_models_surp = Surprise(model_key_selector=lambda utt: utt.meta['player_gender'], target_sample_size=10, context_sample_size=5000, surprise_attr_name='surprise_gender_model')

In [20]:
gender_models_surp.fit(interview_corpus, selector=lambda utt: utt.meta['is_question'])

fit1: 81974it [00:00, 357113.13it/s]
fit2: 100%|██████████| 2/2 [00:12<00:00,  6.20s/it]


<convokit.surprise.surprise.Surprise at 0x7f119ae34a90>

Since we want to compute surprise for each question against both the male interview questions model and the female interview questions model, we will use the `group_and_models` parameter of the `transform` function. Each utterance should belong to its own group and be compared to both the `'M'` and `'F'` gender models.

Since each utterance belongs to only one group, we want the surprise attribute keys to correspond just to the model. We use the `group_model_attr_key` parameter to define this. This function takes a group name (which will be the utterance id) and a model key (which will be either `'M'` or `'F'`) and returns the corresponding key that should be added to the surprise metadata. For this case, we simply return the model key.
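The interplay of the two callables can be sketched without ConvoKit. In the toy below, `score_against_models` is a hypothetical stand-in for the transform step, and the "models" are placeholder functions that return arbitrary numbers rather than real surprise scores; only the shape of the resulting metadata is the point.

```python
def score_against_models(utterances, models, group_and_models, group_model_attr_key):
    """Attach one score per (group, model) pair to each utterance.

    Hypothetical stand-in for illustration; not ConvoKit's transform.
    """
    results = {}
    for utt in utterances:
        group_name, model_keys = group_and_models(utt)
        scores = {}
        for model_key in model_keys:
            attr_key = group_model_attr_key(group_name, model_key)
            scores[attr_key] = models[model_key](utt["text"])
        results[utt["id"]] = scores
    return results


# Placeholder "models": any callable from text to a number (assumption).
toy_models = {
    "M": lambda text: float(len(text.split())),
    "F": lambda text: float(len(text)),
}
toy_utts = [
    {"id": "q1", "text": "How did you feel"},
    {"id": "q2", "text": "Why"},
]

# Each utterance is its own group and is scored against both models;
# the attribute key ignores the group and keeps only the model key.
scores = score_against_models(
    toy_utts,
    toy_models,
    group_and_models=lambda utt: (utt["id"], ["M", "F"]),
    group_model_attr_key=lambda _, m: m,
)

# For contrast, a key that keeps the group name as well (illustrative format).
combined = score_against_models(
    toy_utts,
    toy_models,
    group_and_models=lambda utt: (utt["id"], ["M", "F"]),
    group_model_attr_key=lambda g, m: f"{g}/{m}",
)

print(scores["q1"], sorted(combined["q1"]))
```

Returning only the model key keeps the per-utterance metadata small, so each score can later be read with a plain `x['M']` or `x['F']` lookup.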

In [21]:
gender_models_surp.transform(subset_corpus, obj_type='utterance', group_and_models=lambda utt: (utt.id, ['M', 'F']), group_model_attr_key=lambda _, m: m, selector=lambda utt: utt.meta['is_question'])

transform: 250it [03:01,  1.38it/s]


<convokit.model.corpus.Corpus at 0x7f12068b6e20>

### Results
Let's take a look at the surprise scores. In this sample, questions posed to players of one gender are on average slightly more surprising under the model trained on questions posed to the other gender than under the model for their own gender. This suggests there may be some difference in the types of questions posed to each gender.

In [22]:
utterances = subset_corpus.get_utterances_dataframe(selector=lambda utt: utt.meta['is_question'])

In [23]:
utterances[utterances['meta.player_gender'] == 'F']['meta.surprise_gender_model'].map(lambda x: x['M']).dropna().mean()

5.8033361331749616

In [24]:
utterances[utterances['meta.player_gender'] == 'F']['meta.surprise_gender_model'].map(lambda x: x['F']).dropna().mean()

5.7581237622760035

In [25]:
utterances[utterances['meta.player_gender'] == 'M']['meta.surprise_gender_model'].map(lambda x: x['M']).dropna().mean()

5.818653104258929

In [26]:
utterances[utterances['meta.player_gender'] == 'M']['meta.surprise_gender_model'].map(lambda x: x['F']).dropna().mean()

5.854159276866871
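To read the four means side by side, the sketch below copies them from the cell outputs above into a small table and computes, for each player gender, the cross-gender mean minus the same-gender mean. The variable names are illustrative, and the numbers come from one random sample of 250 questions, so they should not be treated as stable estimates.

```python
# Mean surprise values copied from the cell outputs above.
# Outer key: gender of the player the question was posed to.
# Inner key: the gender model the question was scored against.
mean_surprise = {
    "F": {"M": 5.8033361331749616, "F": 5.7581237622760035},
    "M": {"M": 5.818653104258929, "F": 5.854159276866871},
}

gaps = {}
for player_gender, by_model in mean_surprise.items():
    other = "M" if player_gender == "F" else "F"
    # Positive gap: the other gender's model finds these questions more surprising.
    gaps[player_gender] = by_model[other] - by_model[player_gender]

for player_gender, gap in gaps.items():
    print(f"questions to {player_gender} players: cross minus same = {gap:.4f}")
```

Both gaps are positive but small (roughly 0.045 and 0.036), which matches the cautious reading given in the results above.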