# calculating difference in perplexity
* The code below loads the results from the txt file into pandas.
* NB: every sentence gets one perplexity score for the version with the male pronoun and one for the version with the female pronoun.
* Each score is also returned in a transformed version, computed with the `math.exp()` function.
* `math.exp(x)` returns the mathematical constant e raised to the power of x,
* where x is the original value
* and e is approximately equal to 2.718.
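
As a quick sanity check of the transformation described above, the sketch below applies `math.exp()` to the first male score in the output table further down and compares the result with the transformed value reported in the same row. The variable names are illustrative and not part of the original script.

```python
import math

# Raw score for the first sentence (male pronoun), copied from the output table.
loss_male = 2.408111333847046

# Transformed value reported in the same row of the output table.
reported_exp = 11.112952654759477

# math.exp(x) computes e ** x, with e approximately 2.718.
transformed = math.exp(loss_male)

print(round(transformed, 4))
print(math.isclose(transformed, reported_exp, rel_tol=1e-6))
```

If the two values agree, the `_exp` columns are simply e raised to the raw scores.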


**What are we measuring?**
"We are not interested in the
model’s ability to generate a particular pronoun,
the more interesting observation is whether the perplexities
for sentences containing masculine possessives
are lower than for predicting feminine possessives
when forcing the model to predict these
in place of a reflexive."

In [1]:
import sys, os
import pandas as pd


# define path 
path = os.getcwd() + "/outputs/lm/out_da.txt"

# load txt file into pandas dataframe
df = pd.read_csv(path, sep='\t', header=None, names=['all'])

df.head()


Unnamed: 0,all
0,teknikeren mistede sin tegnebog ved huset. mal...
1,teknikeren mister sin tegnebog ved huset. male...
2,teknikeren vaskede sin pensel i badekarret. ma...
3,teknikeren vasker sin pensel i badekarret. mal...
4,teknikeren efterlod sin kuglepen på kontoret. ...


In [2]:
# extract sentences from the 'all' column (everything before the first period)
df['sentence'] = df['all'].str.split('.').str[0]

# split each raw line on spaces once; the scores are the trailing tokens
tokens = df['all'].str.split(' ')

# extract perplexity loss scores from the 'all' column
df['perplexity_male'] = tokens.str[-8]
df['perplexity_male_exp'] = tokens.str[-7]

df['perplexity_female'] = tokens.str[-5]
df['perplexity_female_exp'] = tokens.str[-4]

df['perplexity_refl'] = tokens.str[-2]
df['perplexity_refl_exp'] = tokens.str[-1]


# drop the 'all' column
#df = df.drop(columns=['all'])


df.head()

Unnamed: 0,all,sentence,perplexity_male,perplexity_male_exp,perplexity_female,perplexity_female_exp,perplexity_refl,perplexity_refl_exp
0,teknikeren mistede sin tegnebog ved huset. mal...,teknikeren mistede sin tegnebog ved huset,2.408111333847046,11.112952654759477,2.6990482807159424,14.865577133915654,2.0327019691467285,7.6346872040928675
1,teknikeren mister sin tegnebog ved huset. male...,teknikeren mister sin tegnebog ved huset,2.251804828643799,9.50487503576735,2.5716712474823,13.087678924173932,2.0275373458862305,7.595358567187247
2,teknikeren vaskede sin pensel i badekarret. ma...,teknikeren vaskede sin pensel i badekarret,2.492088079452514,12.086487335632894,2.600323438644409,13.468093432494372,1.9188932180404663,6.81341333169151
3,teknikeren vasker sin pensel i badekarret. mal...,teknikeren vasker sin pensel i badekarret,2.424525022506714,11.29686238939157,2.5879499912261963,13.30247344474871,1.9054945707321167,6.722731671904486
4,teknikeren efterlod sin kuglepen på kontoret. ...,teknikeren efterlod sin kuglepen på kontoret,2.2574832439422607,9.559001193542889,2.3604862689971924,10.596102756134004,1.8524682521820068,6.375536549829676


In [3]:
# convert the score columns to floats (unparsable values become NaN)
cols = df.drop(['all', 'sentence'], axis=1).columns
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
df.dtypes

all                       object
sentence                  object
perplexity_male          float64
perplexity_male_exp      float64
perplexity_female        float64
perplexity_female_exp    float64
perplexity_refl          float64
perplexity_refl_exp      float64
dtype: object

In [4]:
df.head(3)

Unnamed: 0,all,sentence,perplexity_male,perplexity_male_exp,perplexity_female,perplexity_female_exp,perplexity_refl,perplexity_refl_exp
0,teknikeren mistede sin tegnebog ved huset. mal...,teknikeren mistede sin tegnebog ved huset,2.408111,11.112953,2.699048,14.865577,2.032702,7.634687
1,teknikeren mister sin tegnebog ved huset. male...,teknikeren mister sin tegnebog ved huset,2.251805,9.504875,2.571671,13.087679,2.027537,7.595359
2,teknikeren vaskede sin pensel i badekarret. ma...,teknikeren vaskede sin pensel i badekarret,2.492088,12.086487,2.600323,13.468093,1.918893,6.813413


### trying to simply subtract female perplexity from male perplexity

In [5]:
# difference between male and female
df['dif'] = df['perplexity_male'] - df['perplexity_female']

# doing the same with the math.exp()-transformed scores
df['dif_exp'] = df['perplexity_male_exp'] - df['perplexity_female_exp']



In [9]:
# print mean differences
print(f"Mean difference: {df['dif'].mean()}")
print(f"Mean difference (transformed): {df['dif_exp'].mean()}")


Mean difference: -0.013190987470902894
Mean difference (transformed): -0.1155025183732454


### trying instead to look at differences between the antireflexive male/female pronouns and the original reflexive pronoun

In [7]:
df['dif_male'] = df['perplexity_refl'] - df['perplexity_male']
df['dif_female'] = df['perplexity_refl'] - df['perplexity_female']

df['dif_difference'] = df['dif_male'] - df['dif_female']
df['dif_difference'].mean()

# turns out this is just the male - female difference with the sign flipped:
# the reflexive term cancels, (refl - male) - (refl - female) = female - male


0.013190987470902894
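
The result mirrors the earlier mean difference because the reflexive score appears in both `dif_male` and `dif_female` and drops out when one is subtracted from the other. A small check on the first row of the output table (values copied from the output; the variable names are illustrative):

```python
import math

# First row of the output table.
male, female, refl = 2.408111333847046, 2.6990482807159424, 2.0327019691467285

dif = male - female          # as in df['dif']
dif_male = refl - male       # as in df['dif_male']
dif_female = refl - female   # as in df['dif_female']
dif_difference = dif_male - dif_female

# (refl - male) - (refl - female) = female - male = -dif
print(math.isclose(dif_difference, -dif))
```

So comparing each pronoun against the reflexive baseline and then differencing adds no information beyond the direct male/female comparison.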

In [8]:
import math
math.exp(6.128866)

458.91545536943477