# Leaderboard Ranking

This notebook puts our best scores in context by comparing them with the competition's leaderboard rankings.
Because the Kaggle competition ended before we started working on it, we cannot submit our results to the leaderboard directly.
Instead, we downloaded the raw JSON data from the leaderboard page and parsed it to extract the relevant information.

In [68]:
import pandas as pd
import json

In [69]:
OUR_PUBLIC_SCORE = 0.340593
OUR_PRIVATE_SCORE = 0.400336

In [73]:
# Columns that are not needed for the ranking analysis
UNUSED_COLUMNS = ["teamId", "submissionId", "medal", "inTheMoney"]

with open("kaggle-leaderboard.json", "r") as f:
  leaderboard = json.load(f)

def to_frame(entries):
  # Cast displayScore to float so that scores can be compared numerically
  return pd.DataFrame(entries).astype({"displayScore": float}).drop(columns=UNUSED_COLUMNS)

public_leaderboard = to_frame(leaderboard["publicLeaderboard"])
private_leaderboard = to_frame(leaderboard["privateLeaderboard"])

# The public leaderboard size is used as the participant count for both rankings
total_participants = len(public_leaderboard)

In [74]:
# Rank = 1 + number of entries with a strictly lower score (lower is treated as better here)
our_public_rank = (public_leaderboard["displayScore"] < OUR_PUBLIC_SCORE).sum() + 1
our_private_rank = (private_leaderboard["displayScore"] < OUR_PRIVATE_SCORE).sum() + 1

print(f"Our public score rank: {our_public_rank} out of {total_participants}")
print(f"Our private score rank: {our_private_rank} out of {total_participants}")

Our public score rank: 1044 out of 2767
Our private score rank: 979 out of 2767
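
The rank computation above counts the entries with a strictly lower score and adds one, so tied entries all share the best rank of their group. The following is a minimal sketch on made-up data, assuming (as the code above does) that lower scores are better; `toy_scores` and `rank_of` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical toy leaderboard (not real competition data); lower is assumed better.
toy_scores = pd.Series([0.10, 0.20, 0.20, 0.20, 0.35])

def rank_of(score, scores):
    """Return 1 + the number of entries with a strictly lower score."""
    return int((scores < score).sum()) + 1

# A score that ties with three entries shares their rank.
rank_tied = rank_of(0.20, toy_scores)
# A score between 0.20 and 0.35 is placed behind all four better entries.
rank_between = rank_of(0.30, toy_scores)

print(rank_tied, rank_between)
```

Under this convention a large group of identical scores just ahead of ours pushes our rank down by the size of that group, which is why the tie groups are worth inspecting.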


In [81]:
def get_score_groups(leaderboard_df):
  # Group by score and count the participants sharing each score
  # (groupby sorts the scores in ascending order by default)
  score_groups = leaderboard_df.groupby("displayScore").size().reset_index(name="Count")

  # Rank range covered by each score group, derived from the cumulative counts
  score_groups["Rank_Start"] = score_groups["Count"].cumsum() - score_groups["Count"] + 1
  score_groups["Rank_End"] = score_groups["Count"].cumsum()
  # Return the largest tie groups first
  return score_groups.sort_values(by="Count", ascending=False).reset_index(drop=True)

public_score_groups = get_score_groups(public_leaderboard)
private_score_groups = get_score_groups(private_leaderboard)

print(public_score_groups.head())
print()
print(private_score_groups.head())

   displayScore  Count  Rank_Start  Rank_End
0      0.299859    212         556       767
1      0.299361    138         397       534
2      0.438976    124        1662      1785
3      0.399485     77        1411      1487
4      0.307711     57         836       892

   displayScore  Count  Rank_Start  Rank_End
0      0.381614    179         597       775
1      0.513482    126        1653      1778
2      0.380968    108         460       567
3      0.473958     76        1388      1463
4      0.382100     68         789       856


In [88]:
public_score_groups = public_score_groups.sort_values(by="Rank_Start", ascending=True).reset_index(drop=True)
private_score_groups = private_score_groups.sort_values(by="Rank_Start", ascending=True).reset_index(drop=True)

our_public_rank_deduplicated = (public_score_groups["displayScore"] < OUR_PUBLIC_SCORE).sum() + 1
our_private_rank_deduplicated = (private_score_groups["displayScore"] < OUR_PRIVATE_SCORE).sum() + 1

print(f"Our public score rank (deduplicated): {our_public_rank_deduplicated} out of {len(public_score_groups)} (improvement of {our_public_rank - our_public_rank_deduplicated})")
print(f"Our private score rank (deduplicated): {our_private_rank_deduplicated} out of {len(private_score_groups)} (improvement of {our_private_rank - our_private_rank_deduplicated})")

Our public score rank (deduplicated): 497 out of 1544 (improvement of 547)
Our private score rank (deduplicated): 501 out of 1608 (improvement of 478)
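
The deduplicated rank counts each distinct score once, which corresponds to what pandas calls a "dense" ranking, while the earlier rank corresponds to the "min" method. The sketch below contrasts the two on made-up data; the `toy` frame and `our_score` are hypothetical, and lower scores are again assumed to be better.

```python
import pandas as pd

# Hypothetical toy leaderboard (not real competition data).
toy = pd.DataFrame({"displayScore": [0.10, 0.20, 0.20, 0.20, 0.35]})

# "min": tied entries share the lowest rank of their group, and later ranks skip ahead.
toy["rank_min"] = toy["displayScore"].rank(method="min").astype(int)
# "dense": tied entries share a rank and the next distinct score gets the next integer.
toy["rank_dense"] = toy["displayScore"].rank(method="dense").astype(int)

# Rank of a hypothetical external score under both conventions.
our_score = 0.30
rank_with_ties = int((toy["displayScore"] < our_score).sum()) + 1
unique_scores = toy["displayScore"].drop_duplicates()
rank_dedup = int((unique_scores < our_score).sum()) + 1

print(toy)
print(rank_with_ties, rank_dedup)
```

The gap between the two ranks equals the number of duplicate entries ahead of the score, which is what the "improvement" figures above report.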
