The Nobel Prize, established by Alfred Nobel’s 1895 will, has been awarded since 1901 in physics, chemistry, medicine, literature, and peace. A related prize in economic sciences was added in 1968 and first awarded in 1969. Winners receive not only prestige and prize money but also a gold medal bearing Nobel’s likeness.

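This project explores a dataset of Nobel laureates from 1901 to 2023, sourced from the official Nobel Prize API. Using Python and pandas, I analyze gender distribution, country of birth, and repeat winners. The code below answers five questions: which gender and birth country are most common among laureates, which decade had the highest share of US-born winners, which decade and category had the highest share of female winners, who was the first woman to win, and who has won more than once.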

In [None]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Load the dataset
nobel = pd.read_csv("data/nobel.csv")

# Most common gender and birth country
top_gender = nobel['sex'].value_counts().index[0]
top_country = nobel['birth_country'].value_counts().index[0]

# Decade with highest ratio of US-born winners
nobel['us_born_winners'] = nobel['birth_country'] == 'United States of America'
nobel['decade'] = (np.floor(nobel['year'] / 10) * 10).astype(int)
us_winners_by_decade = nobel.groupby('decade', as_index=False)['us_born_winners'].mean()
# idxmax returns the label of the first row holding the maximum share
max_decade_usa = us_winners_by_decade.loc[
    us_winners_by_decade['us_born_winners'].idxmax(), 'decade'
]

# Decade and category with highest proportion of female winners
nobel['female_winner'] = nobel['sex'] == 'Female'
female_ratio = nobel.groupby(['decade', 'category'], as_index=False)['female_winner'].mean()
# Select the single (decade, category) row with the highest female share
max_female_row = female_ratio.loc[female_ratio['female_winner'].idxmax()]
max_female_dict = {max_female_row['decade']: max_female_row['category']}

# First woman to win a Nobel Prize
female_winners = nobel[nobel['female_winner']]
earliest_female = female_winners.sort_values('year').iloc[0]
first_woman_name = earliest_female['full_name']
first_woman_category = earliest_female['category']

# Repeat winners: names that appear two or more times in the dataset
counts = nobel['full_name'].value_counts()
repeat_list = list(counts[counts >= 2].index)
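
To sanity-check the grouping logic without the real file, here is a minimal sketch on a tiny made-up table. The rows, the `toy` DataFrame, and the `Person A` to `Person F` names are hypothetical and are not real laureate records; only the column names are taken from the code above. It repeats the same three steps (decade bucketing, share per group, repeat detection) so the results can be checked by hand.

```python
import pandas as pd

# Hypothetical toy data (NOT real laureates); columns mirror those used above
toy = pd.DataFrame({
    "year": [1901, 1905, 1908, 1911, 1913, 1921, 1925],
    "category": ["Physics", "Peace", "Peace", "Chemistry",
                 "Physics", "Peace", "Physics"],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Male", "Male"],
    "birth_country": ["Germany", "Austria", "Switzerland", "Poland",
                      "United States of America",
                      "United States of America",
                      "United States of America"],
    "full_name": ["Person A", "Person B", "Person C", "Person D",
                  "Person E", "Person C", "Person F"],
})

# Bucket each year into its decade (1913 -> 1910)
toy["decade"] = (toy["year"] // 10) * 10

# The mean of a boolean column is the share of True values in each group
toy["us_born"] = toy["birth_country"] == "United States of America"
us_share = toy.groupby("decade")["us_born"].mean()
best_decade = int(us_share.idxmax())

# Same idea with two grouping keys; idxmax returns a (decade, category) tuple
toy["female"] = toy["sex"] == "Female"
female_share = toy.groupby(["decade", "category"])["female"].mean()
top_pair = female_share.idxmax()

# Names that appear at least twice
name_counts = toy["full_name"].value_counts()
repeat_names = list(name_counts[name_counts >= 2].index)

print(best_decade, top_pair, repeat_names)
```

In this toy table the 1920s have the highest US-born share (both rows), the 1910s Chemistry group has the highest female share (its only row), and `Person C` is the one repeated name. Grouping on a boolean column and taking the mean is what turns counts into proportions, which is why no explicit division appears.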