# Analyzing Home-Country Preference at the 2018 Olympics

This notebook examines the overall home-country preference of judges at the 2018 Olympics. First, we calculate that home-country preference added about 3.9 points per performance, a difference that is statistically significant. Then we identify the judge-performance combinations with the highest home-country preference.

In [1]:
import pandas as pd
from scipy import stats

## Load Judge and Score Data

In [2]:
all_judges = pd.read_csv("../data/judges.csv")
all_judges.head()

Unnamed: 0,clean_judge_name,country,judge_name,role,program,competition,clean_role,judge_country
0,Vladimir CUCHRAN,ISU,CUCHRAN Vladimir,Judge No.1,Ice Dance - Free Dance,Olympic Winter Games 2018,J1,SVK
1,Leanna CARON,ISU,CARON Leanna,Judge No.2,Ice Dance - Free Dance,Olympic Winter Games 2018,J2,CAN
2,Malgorzata SOBKOW,ISU,SOBKOW Malgorzata,Judge No.3,Ice Dance - Free Dance,Olympic Winter Games 2018,J3,POL
3,Kaoru TAKINO,ISU,TAKINO Kaoru,Judge No.4,Ice Dance - Free Dance,Olympic Winter Games 2018,J4,JPN
4,Sharon ROGERS,ISU,ROGERS Sharon,Judge No.5,Ice Dance - Free Dance,Olympic Winter Games 2018,J5,USA


### Load scoring data

See this repository's `README.md` file for more details about the source and structure of the scoring data.

In [3]:
performances = pd.read_csv("../data/performances.csv")
print("{:,} performances".format(len(performances)))

aspects = pd.read_csv("../data/judged-aspects.csv")
print("{:,} aspects".format(len(aspects)))

scores = pd.read_csv("../data/judge-scores.csv")
print("{:,} scores".format(len(scores)))

250 performances
3,405 aspects
30,546 scores


In [4]:
performances.head()

Unnamed: 0,performance_id,competition,program,name,nation,rank,starting_number,total_segment_score,total_element_score,total_component_score,total_deductions
0,a3f8fac157,Olympic Winter Games 2018,Ice Dance - Free Dance,LAURIAULT Marie-Jade / le GAC Romain,FRA,17,1,89.62,47.04,42.58,0.0
1,d727237592,Olympic Winter Games 2018,Ice Dance - Free Dance,MYSLIVECKOVA Lucie / CSOLLEY Lukas,SVK,20,2,82.82,41.65,41.17,0.0
2,93fe6322fa,Olympic Winter Games 2018,Ice Dance - Free Dance,LORENZ Kavita / POLIZOAKIS Joti,GER,16,3,90.5,46.78,43.72,0.0
3,cb67dacba3,Olympic Winter Games 2018,Ice Dance - Free Dance,MIN Yura / GAMELIN Alexander,KOR,19,4,86.52,44.61,41.91,0.0
4,b79025399c,Olympic Winter Games 2018,Ice Dance - Free Dance,AGAFONOVA Alisa / UCAR Alper,TUR,18,5,87.76,44.01,43.75,0.0


In [5]:
aspects.head()

Unnamed: 0,aspect_id,performance_id,section,aspect_num,aspect_desc,info_flag,credit_flag,base_value,factor,goe,ref,scores_of_panel
0,004e382688,648ff2cbff,elements,2.0,3Tw2,,,5.8,,0.8,,6.6
1,005bdf4588,5458eddc1d,elements,3.0,4STh,,,8.2,,-2.71,,5.49
2,0070f9cc40,c39eade62e,components,,Performance,,,,0.8,,,6.29
3,0071f2e3ae,cb67dacba3,components,,Interpretation of the Music/Timing,,,,1.2,,,7.25
4,007ae3fc4b,9e771ce55d,elements,3.0,3Tw4,,,6.6,,1.9,,8.5


In [6]:
scores.head()

Unnamed: 0,aspect_id,judge,score
0,004e382688,J1,1.0
1,004e382688,J2,1.0
2,004e382688,J3,1.0
3,004e382688,J4,2.0
4,004e382688,J5,1.0


In [7]:
judge_goe = pd.read_csv("../data/judge-goe.csv")
judge_goe.head()

Unnamed: 0,aspect_id,judge,judge_goe
0,004e382688,J1,0.7
1,004e382688,J2,0.7
2,004e382688,J3,0.7
3,004e382688,J4,1.4
4,004e382688,J5,0.7


In [8]:
scores_with_context = scores.pipe(
    pd.merge,
    aspects,
    on = "aspect_id",
    how = "left"
).pipe(
    pd.merge,
    performances,
    on = "performance_id",
    how = "left"
).pipe(
    pd.merge,
    judge_goe,
    on = [ "aspect_id", "judge" ],
    how = "left"
).assign(
    total_deductions = lambda x: x["total_deductions"].abs(),
    # Program names are mixed-case (e.g. "Short Program"), so compare in upper case
    program_type = lambda x: x["program"]\
        .apply(lambda p: "short" if "SHORT" in p.upper() else "free")
)

In [9]:
assert len(scores) == len(scores_with_context)
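
The row-count assertion above guards against a left merge silently duplicating rows. As a complementary check, `pd.merge` accepts a `validate` argument that raises when the right-hand key is not unique. The following is a minimal sketch on toy frames; `toy_scores`, `toy_aspects`, and their values are made up for illustration and are not taken from the repository's data.

```python
import pandas as pd
from pandas.errors import MergeError

# Toy stand-ins for the scores and aspects tables (hypothetical values).
toy_scores = pd.DataFrame({
    "aspect_id": ["a1", "a1", "a2"],
    "judge": ["J1", "J2", "J1"],
    "score": [1.0, 2.0, 1.0],
})
toy_aspects = pd.DataFrame({
    "aspect_id": ["a1", "a2"],
    "section": ["elements", "components"],
})

# many_to_one: each aspect_id may appear at most once on the right side.
merged = pd.merge(
    toy_scores, toy_aspects,
    on="aspect_id", how="left", validate="many_to_one",
)
rows_preserved = len(merged) == len(toy_scores)

# A duplicated key on the right side makes the same merge raise.
dup_aspects = pd.concat([toy_aspects, toy_aspects.iloc[[0]]], ignore_index=True)
try:
    pd.merge(toy_scores, dup_aspects, on="aspect_id", how="left",
             validate="many_to_one")
    duplicate_detected = False
except MergeError:
    duplicate_detected = True
```

With `validate` set, a duplicated key fails at the merge itself rather than at a later length check.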

In [10]:
scores_with_context.tail().T

Unnamed: 0,30541,30542,30543,30544,30545
aspect_id,fff0940a94,fff0940a94,fff0940a94,fff0940a94,fff0940a94
judge,J5,J6,J7,J8,J9
score,3,2,2,2,2
performance_id,7bfaa8fc93,7bfaa8fc93,7bfaa8fc93,7bfaa8fc93,7bfaa8fc93
section,elements,elements,elements,elements,elements
aspect_num,8,8,8,8,8
aspect_desc,CCoSp4,CCoSp4,CCoSp4,CCoSp4,CCoSp4
info_flag,,,,,
credit_flag,,,,,
base_value,3.5,3.5,3.5,3.5,3.5


## Calculate Total Points for Each Judge and Difference from the Mean

### Calculate the total number of points awarded for each aspect

The total score given by a judge is calculated differently for elements vs. components. Technical elements are scored by adding the base value of the element to the translated Grade of Execution. Artistic components are scored by multiplying the score the judge gave by a pre-determined factor. The function below does this math for both sections of each program.

In [11]:
def total_points(row):
    if row["section"] == "elements":
        return round(row["base_value"] + row["judge_goe"], 2)
    
    elif row["section"] == "components":
        return round(row["factor"] * row["score"], 2)
    
    else:
        print("Unknown section: {}".format(row["section"]))
        return None

In [12]:
scores_with_context["total_points"] = scores_with_context.apply(total_points, axis=1)

In [13]:
assert scores_with_context["total_points"].isnull().sum() == 0
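
The same arithmetic can also be written without a row-wise `apply`. The sketch below is an alternative formulation under stated assumptions, not the notebook's own method. The first two toy rows reuse the base value (5.8) and judge GOE values (0.7 and 1.4) shown earlier for aspect `004e382688`; the component row pairs the factor 0.8 with a hypothetical score of 7.75.

```python
import pandas as pd

nan = float("nan")

toy = pd.DataFrame({
    "section": ["elements", "elements", "components"],
    "base_value": [5.8, 5.8, nan],
    "judge_goe": [0.7, 1.4, nan],
    "factor": [nan, nan, 0.8],
    "score": [1.0, 2.0, 7.75],  # 7.75 is a made-up component score
})

is_element = toy["section"] == "elements"
is_component = toy["section"] == "components"

# Elements: base value + translated GOE. Components: factor * score.
element_points = toy["base_value"] + toy["judge_goe"]
component_points = toy["factor"] * toy["score"]

# Keep element points where the row is an element, otherwise use
# component points; rows from any other section become NaN.
toy["total_points"] = (
    element_points.where(is_element, component_points)
    .where(is_element | is_component)
    .round(2)
)
```

Unlike the row-wise function, this version does not print a message for unknown sections; they surface as missing values, which a null check like the assertion above would catch.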

### Calculate the total number of points awarded for each performance by each judge


After calculating the total points awarded for each aspect, it is possible to calculate the total score that a skater would have received from any individual judge. Points can be deducted from the final score for falls or other problems. These deductions are issued by the technical panel and are not the purview of any individual judge; still, we subtract them from the final score to get an accurate representation of how a judge scored the overall skate.

In [14]:
perf_judge_grps = scores_with_context.groupby([
    "performance_id",
    "judge"
])

In [15]:
len(perf_judge_grps)

2250

In [16]:
points_by_judge = pd.DataFrame({
    "points": perf_judge_grps["total_points"].sum(),
    "deductions": perf_judge_grps["total_deductions"].first(),
    "name": perf_judge_grps["name"].first(),
    "nation": perf_judge_grps["nation"].first(),
    "program": perf_judge_grps["program"].first(),
    "program_type": perf_judge_grps["program_type"].first(),
    "competition": perf_judge_grps["competition"].first()
}).reset_index()
points_by_judge["final_score"] = points_by_judge["points"] - points_by_judge["deductions"]

points_by_judge.head()

Unnamed: 0,performance_id,judge,competition,deductions,name,nation,points,program,program_type,final_score
0,00c3e17bd3,J1,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,134.33,Men Single Skating - Free Skating,free,128.33
1,00c3e17bd3,J2,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,130.63,Men Single Skating - Free Skating,free,124.63
2,00c3e17bd3,J3,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,135.83,Men Single Skating - Free Skating,free,129.83
3,00c3e17bd3,J4,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,134.13,Men Single Skating - Free Skating,free,128.13
4,00c3e17bd3,J5,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,134.63,Men Single Skating - Free Skating,free,128.63
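
The per-judge totals above could also be built with pandas named aggregation, which keeps each output column next to the function that produces it. This is a sketch on a made-up toy frame (one performance, two judges, two aspects each), offered as an alternative spelling rather than the notebook's approach.

```python
import pandas as pd

# Hypothetical aspect-level rows: one performance, two judges.
toy = pd.DataFrame({
    "performance_id": ["p1", "p1", "p1", "p1"],
    "judge": ["J1", "J1", "J2", "J2"],
    "total_points": [6.5, 6.2, 7.2, 6.0],
    "total_deductions": [1.0, 1.0, 1.0, 1.0],
    "nation": ["GEO"] * 4,
})

per_judge = (
    toy.groupby(["performance_id", "judge"])
    .agg(
        points=("total_points", "sum"),            # sum over all aspects
        deductions=("total_deductions", "first"),  # constant per performance
        nation=("nation", "first"),
    )
    .reset_index()
)

# Deductions are subtracted once per judge, as in the cell above.
per_judge["final_score"] = per_judge["points"] - per_judge["deductions"]
```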


In [17]:
print("Deductions occur in about {:.0f}% of scores:"\
    .format((points_by_judge["deductions"] > 0).mean() * 100))

points_by_judge["deductions"].astype(int).value_counts()

Deductions occur in about 26% of scores:


0    1656
1     450
2     117
3       9
6       9
4       9
Name: deductions, dtype: int64

### Calculate the total number of points awarded for each performance

In [18]:
perf_grps = points_by_judge.groupby(["performance_id"])

In [19]:
len(perf_grps)

250

In [20]:
perfs = pd.DataFrame({
    "total_points": perf_grps["final_score"].sum(),
    "total_judges": perf_grps.size()
}).reset_index()

perfs.head()

Unnamed: 0,performance_id,total_judges,total_points
0,00c3e17bd3,9,1150.57
1,011d526566,9,1488.81
2,01a3f2a443,9,990.0
3,01ef6a14a8,9,546.27
4,02c6b6bb2f,9,1290.5


In [21]:
points_with_comparison = pd.merge(
    points_by_judge,
    perfs,
    how = "left",
    on = "performance_id"
)

points_with_comparison.head()

Unnamed: 0,performance_id,judge,competition,deductions,name,nation,points,program,program_type,final_score,total_judges,total_points
0,00c3e17bd3,J1,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,134.33,Men Single Skating - Free Skating,free,128.33,9,1150.57
1,00c3e17bd3,J2,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,130.63,Men Single Skating - Free Skating,free,124.63,9,1150.57
2,00c3e17bd3,J3,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,135.83,Men Single Skating - Free Skating,free,129.83,9,1150.57
3,00c3e17bd3,J4,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,134.13,Men Single Skating - Free Skating,free,128.13,9,1150.57
4,00c3e17bd3,J5,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,134.63,Men Single Skating - Free Skating,free,128.63,9,1150.57


### Calculate the average points for each performance, excluding the given judge

`points_vs_avg` is the number of points by which a judge's score for a performance falls above or below the average score given by the remaining judges for that same performance. It is the comparison point used in all of the analyses that follow.

In [22]:
points_with_comparison["avg_without_judge"] = (
    points_with_comparison["total_points"] - points_with_comparison["final_score"]
) / (points_with_comparison["total_judges"] - 1)

In [23]:
points_with_comparison["points_vs_avg"] = points_with_comparison["final_score"] - \
    points_with_comparison["avg_without_judge"]
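
As a worked check of the leave-one-out calculation, the sketch below applies the same formula to two rows shown earlier for performance `00c3e17bd3` (final scores of 128.33 and 124.63, with all nine judges summing to 1150.57). The helper name `leave_one_out_diff` is introduced here purely for illustration.

```python
def leave_one_out_diff(final_score, total_points, total_judges):
    """Return a judge's score minus the mean score of the other judges."""
    avg_without_judge = (total_points - final_score) / (total_judges - 1)
    return final_score - avg_without_judge

# J1: (1150.57 - 128.33) / 8 = 127.78, so the difference is about 0.55.
diff_j1 = leave_one_out_diff(128.33, 1150.57, 9)

# J2: (1150.57 - 124.63) / 8 = 128.2425, so the difference is about -3.6125.
diff_j2 = leave_one_out_diff(124.63, 1150.57, 9)
```

Excluding the judge's own score from the average keeps an unusually high or low score from pulling the comparison point toward itself.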

## Merge Judge Data with Score Data

In [24]:
judge_points = pd.merge(
    points_with_comparison,
    all_judges[[
        "program", "competition", "clean_judge_name", 
        "judge_country", "clean_role"
    ]],
    left_on=[ "program", "competition", "judge" ],
    right_on=[ "program", "competition", "clean_role" ],
    how="left"
).dropna(subset=["judge_country"])

judge_points.head()

Unnamed: 0,performance_id,judge,competition,deductions,name,nation,points,program,program_type,final_score,total_judges,total_points,avg_without_judge,points_vs_avg,clean_judge_name,judge_country,clean_role
0,00c3e17bd3,J1,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,134.33,Men Single Skating - Free Skating,free,128.33,9,1150.57,127.78,0.55,Yuriy GUSKOV,KAZ,J1
1,00c3e17bd3,J2,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,130.63,Men Single Skating - Free Skating,free,124.63,9,1150.57,128.2425,-3.6125,Lorrie PARKER,USA,J2
2,00c3e17bd3,J3,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,135.83,Men Single Skating - Free Skating,free,129.83,9,1150.57,127.5925,2.2375,Saodat NUMANOVA,UZB,J3
3,00c3e17bd3,J4,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,134.13,Men Single Skating - Free Skating,free,128.13,9,1150.57,127.805,0.325,Sakae YAMAMOTO,JPN,J4
4,00c3e17bd3,J5,Olympic Winter Games 2018,6.0,KVITELASHVILI Morisi,GEO,134.63,Men Single Skating - Free Skating,free,128.63,9,1150.57,127.7425,0.8875,Albert ZAYDMAN,ISR,J5


In [25]:
judge_points["skater_judge_same_country"] = (judge_points["nation"] == judge_points["judge_country"])

In [26]:
judge_points["program_type"] = judge_points["program"]\
        .apply(lambda x: "short" if "SHORT" in x.upper() else "free")

In [27]:
len(judge_points)

2250

## Analyze Overall Home-Country Preference

**Account for judge "generosity" by program type**

Home-country preference among a group of judges (for example, all judges overall, or all judges representing a given country) could appear simply because the most generous judges are over-represented among home-country judgments. Additionally, the range of scores is larger for "free" programs than for "short" programs. So, below, we adjust each judge's "points versus the average" to account for their overall tendency to score higher or lower than the other judges, separately for free and short programs.

In [28]:
adj_df = judge_points\
            .groupby(["clean_judge_name", "program_type"])["points_vs_avg"]\
            .mean()\
            .to_frame()\
            .reset_index()

In [29]:
adj_df.head()

Unnamed: 0,clean_judge_name,program_type,points_vs_avg
0,Agita ABELE,free,0.676293
1,Agita ABELE,short,-0.287188
2,Albert ZAYDMAN,free,-1.519853
3,Albert ZAYDMAN,short,1.084801
4,Anastassiya MAKAROVA,free,3.16


In [30]:
judge_points_adj = pd.merge(
    judge_points,
    adj_df,
    on=["clean_judge_name", "program_type"],
    suffixes=["_overall", "_mean"]
)

In [31]:
judge_points_adj["adj_points_vs_avg"] = judge_points_adj["points_vs_avg_overall"] - \
    judge_points_adj["points_vs_avg_mean"]
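
The adjustment above amounts to centering each judge's `points_vs_avg` on that judge's own mean within a program type. A compact way to sketch the same idea is `groupby(...).transform("mean")`, shown here on a toy frame with made-up judges and values; it is an equivalent formulation under those assumptions, not the notebook's code.

```python
import pandas as pd

# Hypothetical judges and differences, for illustration only.
toy = pd.DataFrame({
    "clean_judge_name": ["Judge A", "Judge A", "Judge A", "Judge B", "Judge B"],
    "program_type": ["free", "free", "short", "free", "free"],
    "points_vs_avg": [3.0, 1.0, 0.5, -2.0, 0.0],
})

# Mean difference per judge and program type, aligned to the original rows.
judge_mean = (
    toy.groupby(["clean_judge_name", "program_type"])["points_vs_avg"]
    .transform("mean")
)

# Subtracting it removes each judge's overall generosity or strictness.
toy["adj_points_vs_avg"] = toy["points_vs_avg"] - judge_mean
```

After this step the adjusted values average to zero within each judge and program type, so any remaining same-country gap is measured against the judge's own baseline.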

In [32]:
print("""
In the dataset, there are {:,} performance-judge combinations 
in which the judge and skater(s) represent the *same* country.

There are {:,} performance-judge combinations in which the 
judge and skater(s) represent *different* countries.
""".format(
    judge_points["skater_judge_same_country"].sum(),
    (~judge_points["skater_judge_same_country"]).sum()
))


In the dataset, there are 138 performance-judge combinations 
in which the judge and skater(s) represent the *same* country.

There are 2,112 performance-judge combinations in which the 
judge and skater(s) represent *different* countries.



To examine whether an overall home-country preference exists in figure skating, we compare the adjusted scores that judges give to skaters from their own country with those they give to skaters from other countries. Then we use Welch's t-test, which does not assume equal variances in the two groups, to determine whether this difference is statistically significant. (It is.)

In [33]:
overall_point_diffs = judge_points_adj\
    .groupby("skater_judge_same_country")["adj_points_vs_avg"]\
    .mean()

print((
    "- Same-country point difference: {:.3f}\n"
    "- Other-country point difference: {:.3f}\n"
    "- Overall same-country preference: {:.1f}"
).format(
    overall_point_diffs[True],
    overall_point_diffs[False],
    overall_point_diffs[True] - overall_point_diffs[False]
))


- Same-country point difference: 3.649
- Other-country point difference: -0.238
- Overall same-country preference: 3.9


In [34]:
stats.ttest_ind(
    judge_points_adj[
        judge_points_adj["skater_judge_same_country"]
    ]["adj_points_vs_avg"],
    
    judge_points_adj[
        ~judge_points_adj["skater_judge_same_country"]
    ]["adj_points_vs_avg"],
    
    equal_var = False
)

Ttest_indResult(statistic=13.524990819755743, pvalue=7.095638950808801e-28)
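
Passing `equal_var = False` makes `ttest_ind` perform Welch's t-test. To make the statistic concrete, the sketch below computes Welch's t and the Welch-Satterthwaite degrees of freedom by hand, using only the standard library. The two samples are small made-up lists, not the notebook's data, and `welch_t` is a hypothetical helper name.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Sample variances (n - 1 denominator), each divided by its sample size.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = mean_diff / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1)
    )
    return t_stat, dof

# Made-up adjusted differences for illustration.
same_country = [4.0, 2.0, 6.0, 4.0]
other_country = [0.0, -1.0, 1.0, 0.0, 0.0]

t_stat, dof = welch_t(same_country, other_country)
```

Because each group contributes its own variance term, the test remains reasonable when the groups differ greatly in size, as they do here with 138 same-country and 2,112 other-country judgments.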

## Examine `points_vs_avg` for Individual Performances

**Performances with the highest `points_vs_avg` ...**

... among all non-team performances:

In [35]:
judge_points[
    ~judge_points["program"].str.contains("Team")
]\
.sort_values("points_vs_avg", ascending=False)[[
    "name", "nation", "clean_judge_name", 
    "judge_country", "points_vs_avg", "program"
]].head(10)

Unnamed: 0,name,nation,clean_judge_name,judge_country,points_vs_avg,program
987,JIN Boyang,CHN,Weiguang CHEN,CHN,24.95,Men Single Skating - Free Skating
1573,HOCKE Annika / BLOMMAERT Ruben,GER,Deborah NOYES,AUS,12.75,Pair Skating - Free Skating
100,RIPPON Adam,USA,Lorrie PARKER,USA,11.5875,Men Single Skating - Free Skating
1558,CHEN Nathan,USA,Lorrie PARKER,USA,11.5,Men Single Skating - Free Skating
1011,MURAMOTO Kana / REED Chris,JPN,Kaoru TAKINO,JPN,10.775,Ice Dance - Free Dance
180,SAMOHIN Daniel,ISR,Yuriy GUSKOV,KAZ,10.7625,Men Single Skating - Free Skating
134,JIN Boyang,CHN,Weiguang CHEN,CHN,10.65625,Men Single Skating - Short Program
2115,VASILJEVS Deniss,LAT,Yuriy GUSKOV,KAZ,10.4375,Men Single Skating - Free Skating
810,KOLYADA Mikhail,OAR,Yuriy GUSKOV,KAZ,10.0,Men Single Skating - Free Skating
1452,KALISZEK Natalia / SPODYRIEV Maksym,POL,Kaoru TAKINO,JPN,9.9,Ice Dance - Free Dance


... among same-country judgments for short programs:

In [36]:
judge_points[
    (judge_points["program_type"] == "short") &
    (judge_points["skater_judge_same_country"])
]\
.sort_values("points_vs_avg", ascending=False)[[
    "name", "nation", "clean_judge_name", 
    "judge_country", "points_vs_avg", "program"
]].head(5)

Unnamed: 0,name,nation,clean_judge_name,judge_country,points_vs_avg,program
134,JIN Boyang,CHN,Weiguang CHEN,CHN,10.65625,Men Single Skating - Short Program
2198,NAZAROVA Alexandra / NIKITIN Maxim,UKR,Anastassiya MAKAROVA,UKR,6.275,Ice Dance - Short Dance
1660,MOORE-TOWERS Kirsten / MARINARO\nMichael,CAN,Jeff LUKASIK,CAN,5.9375,Pair Skating - Short Program
1827,ALEXANDROVSKAYA Ekaterina / WINDSOR\nHarley,AUS,Deborah NOYES,AUS,5.2625,Pair Skating - Short Program
1321,TANKOVA Adel / ZILBERBERG Ronald,ISR,Albert ZAYDMAN,ISR,5.0625,Team Event - Ice Dance Short Dance


... and among same-country judgments for free programs:

In [37]:
judge_points[
    (judge_points["program_type"] == "free") &
    (judge_points["skater_judge_same_country"])
]\
.sort_values("points_vs_avg", ascending=False)[[
    "name", "nation", "clean_judge_name", 
    "judge_country", "points_vs_avg", "program"
]].head(5)

Unnamed: 0,name,nation,clean_judge_name,judge_country,points_vs_avg,program
987,JIN Boyang,CHN,Weiguang CHEN,CHN,24.95,Men Single Skating - Free Skating
100,RIPPON Adam,USA,Lorrie PARKER,USA,11.5875,Men Single Skating - Free Skating
1558,CHEN Nathan,USA,Lorrie PARKER,USA,11.5,Men Single Skating - Free Skating
606,RIPPON Adam,USA,Lorrie PARKER,USA,10.9125,Team Event - Men Single Skating Free Skating
1011,MURAMOTO Kana / REED Chris,JPN,Kaoru TAKINO,JPN,10.775,Ice Dance - Free Dance


---
