# Analyzing the Tennis Corpus with Surprise
This demo is based on the [Tie-breaker paper](https://www.cs.cornell.edu/~liye/tennis.html) on gender bias in sports journalism. We compare utterances to a language model using cross entropy, as implemented by the Surprise transformer.

In [1]:
import convokit
import json
import numpy as np
from collections import defaultdict
from convokit import Corpus, Speaker, Utterance, download, Surprise
from tqdm import tqdm

### Create corpus using tennis game commentary dataset
This dataset consists of a gender-balanced set of play-by-play commentaries from tennis matches.

In [2]:
PATH = '/home/axl4' # replace with your path to tennis_data directory
data_dir = f'{PATH}/tennis_data/'

In [3]:
corpus_speakers = {'COMMENTATOR': Speaker(id = 'COMMENTATOR', meta = {})}

In [4]:
with open(data_dir + 'text_commentaries.json', 'r') as f:
    commentaries = json.load(f)

In [5]:
utterances = []
for count, c in enumerate(tqdm(commentaries)):
    # each commentary becomes its own single-utterance conversation
    idx = 'c{}'.format(count)
    meta = {'player_gender': c['gender'], 'scoreline': c['scoreline']}
    utterances.append(Utterance(id=idx, speaker=corpus_speakers['COMMENTATOR'], conversation_id=idx, text=c['commentary'], meta=meta))

100%|██████████| 3962/3962 [00:00<00:00, 267184.91it/s]


In [6]:
game_commentary_corpus = Corpus(utterances=utterances)

### Load interview corpus
This dataset contains transcripts from post-match press conferences.

In [7]:
interview_corpus = Corpus(filename=download('tennis-corpus'))

Dataset already exists at /home/axl4/.convokit/downloads/tennis-corpus


In [8]:
interview_corpus.print_summary_stats()

Number of Speakers: 359
Number of Utterances: 163948
Number of Conversations: 81974


To help with the analysis, let's add a `'player_gender'` metadata attribute to each reporter question, recording the gender of the player the question is posed to.

In [9]:
for utt in interview_corpus.iter_utterances(selector=lambda u: u.meta['is_question']):
    utt.add_meta('player_gender', utt.get_conversation().get_utterance(utt.id.replace('q', 'a')).get_speaker().meta['gender'])

## Part 1: How surprising is each interview question based on typical language used to describe tennis?

For this demo, we want to train one model for the entire game language corpus, so we'll make our `model_key_selector` a function that returns the same key for every utterance in a corpus. We will use a custom tokenizer to convert to lowercase and remove punctuation. We will set the `context_sample_size` parameter to `None`, so that the entire game commentary corpus is used as the context.

In [10]:
from nltk import word_tokenize

def tokenizer(text):
    return list(filter(lambda w: w.isalnum(), word_tokenize(text.lower())))

surp = Surprise(model_key_selector=lambda utt: 'corpus', tokenizer=tokenizer, target_sample_size=10, context_sample_size=None, n_samples=3)
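
To make the role of `model_key_selector` more concrete, here is a minimal sketch in plain Python. It is not ConvoKit code: `group_by_model_key` is a hypothetical helper and the records are made-up stand-ins for utterances. It only illustrates the idea that every item mapping to the same key contributes to the same model, so a constant key yields a single model.

```python
from collections import defaultdict

def group_by_model_key(records, model_key_selector):
    """Hypothetical helper: bucket texts by the key the selector returns."""
    groups = defaultdict(list)
    for record in records:
        groups[model_key_selector(record)].append(record['text'])
    return dict(groups)

# made-up stand-ins for utterances (not taken from either corpus)
records = [
    {'text': 'How did the serve feel today?', 'player_gender': 'M'},
    {'text': 'What changed in the second set?', 'player_gender': 'F'},
    {'text': 'Was the wind a factor?', 'player_gender': 'F'},
]

# a constant key puts everything into one bucket, i.e. one model
single_model = group_by_model_key(records, lambda r: 'corpus')

# keying on metadata gives one bucket (one model) per distinct value
per_gender = group_by_model_key(records, lambda r: r['player_gender'])

print(list(single_model))
print({key: len(texts) for key, texts in per_gender.items()})
```

The same idea is what lets Part 2 train separate models for questions posed to male and female players by keying on `'player_gender'`.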

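Since we want to measure how surprising reporters' questions are relative to typical tennis language, we'll fit the transformer on the game commentary corpus. The `text_func` below joins the text of every commentary utterance into a single context.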

In [11]:
surp.fit(game_commentary_corpus, text_func=lambda utt: [' '.join([u.text for u in game_commentary_corpus.iter_utterances()])])

fit1: 3962it [00:00, 842304.85it/s]
fit2: 100%|██████████| 1/1 [00:01<00:00,  1.22s/it]


<convokit.surprise.surprise.Surprise at 0x7fdcdc3aeb20>

To speed up the demo, we'll select a random subset of interview questions to compute surprise scores for. To run the demo on the entire interview corpus, set `SAMPLE` to `False`.

In [12]:
SAMPLE = True
SAMPLE_SIZE = 10000  # edit this to change the number of interview questions to calculate surprise for

if SAMPLE:
    # draw a random subset of question ids and build a smaller corpus from them
    question_ids = interview_corpus.get_utterances_dataframe(selector=lambda utt: utt.meta['is_question']).sample(SAMPLE_SIZE).index
    subset_corpus = Corpus(utterances=[interview_corpus.get_utterance(utt_id) for utt_id in question_ids])
else:
    subset_corpus = interview_corpus

As before, we select only utterances that are questions when computing surprise.

In [13]:
surp.transform(subset_corpus, obj_type='utterance', selector=lambda utt: utt.meta['is_question'])

transform: 10000it [31:05,  5.36it/s]


<convokit.model.corpus.Corpus at 0x7fdd91004ee0>

### Results
Let's take a look at the typical (median) surprise score for questions posed to female players compared to those posed to male players. Based on results from the Tie-breaker paper, we should expect to see higher surprise scores for questions posed to female players. Higher surprise would indicate that questions posed to female players tend to differ more from typical tennis language. This may mean that female players are being asked questions that are less relevant to tennis.

In [14]:
utterances = subset_corpus.get_utterances_dataframe(selector=lambda utt: utt.meta['is_question'])

In [15]:
import pandas as pd

female_qs = pd.to_numeric(utterances[utterances['meta.player_gender'] == 'F']['meta.surprise']).dropna()
female_qs.median()

7.1372781396723255

In [16]:
male_qs = pd.to_numeric(utterances[utterances['meta.player_gender'] == 'M']['meta.surprise']).dropna()
male_qs.median()

7.147981123495766

When running this demo multiple times, we see that the median surprise for questions posed to female players is sometimes higher than for male players and sometimes lower; in the run shown above it is slightly lower (7.137 vs. 7.148). This may be due to the random sampling used by the Surprise transformer when selecting targets and contexts. Another possible explanation for the difference from the Tie-breaker paper's results is that the paper used a bigram language model with modified Kneser-Ney smoothing, whereas our transformer currently only supports unigram language models with add-one (Laplace) smoothing. These differences may explain why we do not get the same statistically significant results as the paper.
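
To build some intuition for the kind of score discussed above, here is a minimal sketch of cross entropy under a unigram model with add-one smoothing, averaged over random samples of the target. This is an illustration under our own assumptions, not ConvoKit's implementation: the function names, the toy token lists, and the exact sampling scheme are all hypothetical.

```python
import math
import random
from collections import Counter

def cross_entropy(target, context):
    """Average negative log-probability of the target tokens under an
    add-one smoothed unigram model estimated from the context tokens."""
    counts = Counter(context)
    vocab_size = len(counts)   # number of distinct context tokens
    total = len(context)
    log_probs = [math.log((counts[w] + 1) / (total + vocab_size)) for w in target]
    return -sum(log_probs) / len(target)

def sampled_surprise(target, context, sample_size, n_samples, rng):
    """Average cross entropy over random fixed-size samples of the target.
    (Assumed sampling scheme, for illustration only.)"""
    scores = []
    for _ in range(n_samples):
        sample = rng.sample(target, min(sample_size, len(target)))
        scores.append(cross_entropy(sample, context))
    return sum(scores) / len(scores)

# toy token lists, made up for this sketch
context = "serve ace break point set serve game set match".split()
on_topic = "break point serve set".split()
off_topic = "wedding dress party invitation".split()

print(cross_entropy(on_topic, context))
print(cross_entropy(off_topic, context))
print(sampled_surprise(on_topic, context, sample_size=2, n_samples=3, rng=random.Random(0)))
```

In this toy setup, tokens that never occur in the context all receive the same small smoothed probability, so the off-topic target scores higher than the on-topic one. Because only a few tokens are sampled per target, scores for the same question can differ from run to run unless the random seed is fixed.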

Looking at the most and least surprising questions posed to each gender, we can see that the surprise scores assigned seem to make sense. The least surprising questions seem to relate well to the game of tennis while the most surprising focus on other things such as fashion choices or social lives.

In [17]:
sorted_female_qs = female_qs.sort_values().keys()
sorted_male_qs = male_qs.sort_values().keys()

In [18]:
for utt in sorted_female_qs[:5]:
    print(interview_corpus.get_utterance(utt).text)

And when was that in the match?  The first set?  Second set?
When she broke you in the eighth game of the third set, she did a backhand off the net and it kind of clipped the net and you kind of netted the next one. Was that just a tough break?
You started 3Love down in the first set. You came back and won it 64. What was the turnaround for you in the opening set and on through the match?
Would you give her a good chance against Stosur in the next round?
Do you enjoy the balance of the life as a tour player and then back home in and the ability to serve your country in the military?


In [19]:
for utt in sorted_male_qs[:5]:
    print(interview_corpus.get_utterance(utt).text)

And the second serve on the set point in the fourth set, just another day at the office?
Was it a big advantage to serve first in the third set?
But at the start of the third set again you had a little bit of a...
Speaking of the mental game, much is made of being the hunter or the hunted. For so long you were the hunted. This is the first week in a long time being the hunter. Is there a change at all in you?
How big of a deal was it get that break in the first game of the second set?


In [20]:
for utt in sorted_female_qs[-1:-6:-1]:
    print(interview_corpus.get_utterance(utt).text)

No yoga, you prefer to dance? Some players do yoga.
What aspects of the match do you think were decisive, technically speaking?
Did you hear the birds?  They were really crying.  They were trapped and --
Did Sasha get an invitation to Kris Humphries' wedding this weekend?
Are you primarily based in Southern California or South Florida now?


In [21]:
for utt in sorted_male_qs[-1:-6:-1]:
    print(interview_corpus.get_utterance(utt).text)

Are you planning to play tactically against James or Mathieu tomorrow?
Did you consider yourself a streaky player even in college?
You said you watched Scream last night to relax. Do you normally watch horror films to relax?
How do you view your secondround matchup with Bernard Tomic?
Just talk us through the messages on your kit bag.


## Part 2: How surprising is a question compared to all questions posed to male players and all questions posed to female players?

Let's see how surprising questions are compared to questions posed to players of each gender. To do this, we'll want to make our `model_key_selector` return a key based on the player's gender. Recall that we added `'player_gender'` as a metadata field to each question earlier.

In [22]:
gender_models_surp = Surprise(model_key_selector=lambda utt: utt.meta['player_gender'], target_sample_size=10, context_sample_size=5000, surprise_attr_name='surprise_gender_model')

In [23]:
gender_models_surp.fit(interview_corpus, selector=lambda utt: utt.meta['is_question'])

fit1: 81974it [00:00, 302952.81it/s]
fit2: 100%|██████████| 2/2 [00:12<00:00,  6.31s/it]


<convokit.surprise.surprise.Surprise at 0x7fdcf63e9d90>

Since for each question, we want to compute surprise based on both the male interview questions model and the female interview questions model, we will use the `group_and_models` parameter for the `transform` function. Each utterance should belong to its own group and be compared to both the `'M'` and `'F'` gender models.

Since each utterance belongs to only one group, we want the surprise attribute keys to just correspond to the model. We use the `group_model_attr_key` parameter to define this. This parameter is a function that takes a group name (which will be the utterance id) and a model key (which will be either `'M'` or `'F'`) and returns the corresponding key that should be added to the surprise metadata. For this case, we simply return the model key.
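
The sketch below mimics, in plain Python, how these two functions shape the resulting metadata. It is not ConvoKit code: `assign_scores` is a hypothetical helper, the items are made up, and the scores are fixed placeholder values rather than real surprise scores.

```python
def assign_scores(items, group_and_models, group_model_attr_key, score):
    """Hypothetical helper: build one {attr_key: score} dict per item."""
    results = {}
    for item in items:
        group, models = group_and_models(item)
        results[item['id']] = {
            group_model_attr_key(group, model): score(item, model)
            for model in models
        }
    return results

# made-up items and placeholder scores, for illustration only
items = [{'id': 'q1', 'text': 'Was the wind a factor?'}]
placeholder_scores = {'M': 1.0, 'F': 2.0}

def score(item, model):
    return placeholder_scores[model]

# same shape as the call in this demo: the key is just the model name
by_model = assign_scores(
    items,
    group_and_models=lambda item: (item['id'], ['M', 'F']),
    group_model_attr_key=lambda _, m: m,
    score=score,
)

# a key that also includes the group name (hypothetical format)
by_group_and_model = assign_scores(
    items,
    group_and_models=lambda item: (item['id'], ['M', 'F']),
    group_model_attr_key=lambda g, m: '{}_{}'.format(g, m),
    score=score,
)

print(by_model['q1'])
print(by_group_and_model['q1'])
```

Returning just the model key is what allows the results cells later in this demo to look up each score with `x['M']` or `x['F']`.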

In [24]:
gender_models_surp.transform(subset_corpus, obj_type='utterance', group_and_models=lambda utt: (utt.id, ['M', 'F']), group_model_attr_key=lambda _, m: m, selector=lambda utt: utt.meta['is_question'])

transform: 10000it [2:02:06,  1.36it/s]


<convokit.model.corpus.Corpus at 0x7fdd91004ee0>

### Results
Let's take a look at the surprise scores. We see that questions posed to players of one gender are, on average, more surprising under the model built from questions posed to the other gender than under the model built from questions posed to their own gender. From this we can surmise that there may be some difference in the types of questions posed to each gender.

In [25]:
utterances = subset_corpus.get_utterances_dataframe(selector=lambda utt: utt.meta['is_question'])

In [26]:
utterances[utterances['meta.player_gender'] == 'F']['meta.surprise_gender_model'].map(lambda x: x['M']).dropna().mean()

5.78670861966856

In [27]:
utterances[utterances['meta.player_gender'] == 'F']['meta.surprise_gender_model'].map(lambda x: x['F']).dropna().mean()

5.7477053372750335

In [28]:
utterances[utterances['meta.player_gender'] == 'M']['meta.surprise_gender_model'].map(lambda x: x['M']).dropna().mean()

5.784562889828235

In [29]:
utterances[utterances['meta.player_gender'] == 'M']['meta.surprise_gender_model'].map(lambda x: x['F']).dropna().mean()

5.81045743833415
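
To see the four means side by side, here is a small sketch that copies the values printed by the cells above into a dictionary and computes, for each player gender, the gap between the other-gender model and the same-gender model. The dictionary layout and variable names are our own; the numbers come from this particular run and will change with a different random sample.

```python
# (player gender of the question, gender model) -> mean surprise from the cells above
mean_surprise = {
    ('F', 'M'): 5.78670861966856,
    ('F', 'F'): 5.7477053372750335,
    ('M', 'M'): 5.784562889828235,
    ('M', 'F'): 5.81045743833415,
}

gaps = {}
for gender in ('F', 'M'):
    other = 'M' if gender == 'F' else 'F'
    # positive gap: questions are more surprising under the other gender's model
    gaps[gender] = mean_surprise[(gender, other)] - mean_surprise[(gender, gender)]
    print('questions to {} players: other-gender minus same-gender = {:.4f}'.format(gender, gaps[gender]))
```

Both gaps are positive in this run (roughly 0.039 for questions posed to female players and 0.026 for questions posed to male players), which is the pattern described above. The gaps are small, so they should be read with the run-to-run variability of the sampling in mind.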