The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the `nobel.csv` file in the `data` folder.

In this project, you'll explore and answer several questions about this prizewinning data. We encourage you to then explore any further questions that interest you!

In [111]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Start coding here!

df = pd.read_csv('data/nobel.csv')
df.head()

,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France


In [112]:
# print(df['sex'].value_counts())
top_gender = df['sex'].mode().iloc[0]
print(top_gender)
top_country = df['birth_country'].mode().iloc[0]
print(top_country)



Male
United States of America
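
The cell above takes the first entry of `mode()`, while a later cell uses `value_counts().index[0]` for the same question. The sketch below, on a made-up toy column (not rows from `nobel.csv`), suggests the two agree when there is a single most frequent value, and shows that `mode()` can return several values when there is a tie.

```python
import pandas as pd

# Made-up toy column for illustration only; not data from nobel.csv
toy = pd.Series(["Male", "Female", "Male", None, "Female", "Male"], name="sex")

# mode() returns a Series holding every most-frequent value; missing values are ignored
modes = toy.mode()
top_by_mode = modes.iloc[0]

# value_counts() sorts by frequency (descending) and also skips missing values by default
counts = toy.value_counts()
top_by_counts = counts.index[0]

# With a tie, mode() returns all tied values, so .iloc[0] silently picks just one of them
tied_modes = pd.Series(["b", "a", "a", "b"]).mode().tolist()

print(top_by_mode, top_by_counts, int(counts.iloc[0]))
print(tied_modes)
```

If ties matter for the question being asked, checking `len(modes)` before taking `.iloc[0]` is one way to notice them.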


In [113]:
df.columns

Index(['year', 'category', 'prize', 'motivation', 'prize_share', 'laureate_id',
       'laureate_type', 'full_name', 'birth_date', 'birth_city',
       'birth_country', 'sex', 'organization_name', 'organization_city',
       'organization_country', 'death_date', 'death_city', 'death_country'],
      dtype='object')

In [115]:
df['decade'] = (df['year'] // 10) * 10
# Copy after dropna so the new column is added to an independent frame (avoids SettingWithCopyWarning)
df_clean = df.dropna(subset=['birth_country', 'decade']).copy()
# Flag US-born laureates; the ratio is US-born winners / winners with a known birth country in that decade
us_countries = ['United States of America', 'USA']
df_clean['US_born'] = df_clean['birth_country'].isin(us_countries)

decade_stats = df_clean.groupby('decade').agg(
    total_winners=('birth_country', 'count'),
    us_born_winners=('US_born', 'sum')
).reset_index()
print(decade_stats)
# Calculate ratio
decade_stats['us_ratio'] = decade_stats['us_born_winners'] / decade_stats['total_winners']

# Find the decade with highest ratio
max_ratio_row = decade_stats.loc[decade_stats['us_ratio'].idxmax()]
max_decade_usa = int(max_ratio_row['decade'])
print(max_decade_usa)



    decade  total_winners  us_born_winners
0     1900             56                1
1     1910             38                3
2     1920             54                4
3     1930             55               14
4     1940             40               13
5     1950             71               21
6     1960             75               21
7     1970            103               33
8     1980             94               31
9     1990            101               42
10    2000            119               52
11    2010            117               39
12    2020             46               18
2000
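
The cell above drops rows with a missing `birth_country` before computing the ratio, whereas the later `nobel` cell keeps every row and takes the mean of a boolean flag. The sketch below uses made-up toy rows (not from `nobel.csv`) to show how the two denominators can give different shares; the names `share_all` and `share_known` are illustrative only.

```python
import pandas as pd

# Made-up toy rows for illustration; None marks a missing birth country
toy = pd.DataFrame({
    "year": [1901, 1905, 1909, 1912, 1915, 1918],
    "birth_country": [
        "France", "United States of America", None,
        "United States of America", "United States of America", "Germany",
    ],
})
toy["decade"] = (toy["year"] // 10) * 10
toy["us_born"] = toy["birth_country"] == "United States of America"

# Denominator: every row in the decade (a missing birth country counts as not US-born)
share_all = toy.groupby("decade")["us_born"].mean()

# Denominator: only rows with a known birth country
known = toy.dropna(subset=["birth_country"])
share_known = known.groupby("decade")["us_born"].mean()

# idxmax() returns the index label (the decade) of the largest share
best_decade = int(share_known.idxmax())

print(share_all.to_dict())
print(share_known.to_dict())
print(best_decade)
```

Both approaches are defensible; which one is appropriate depends on whether laureates without a recorded birth country (for example, organizations) should count toward the total.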


In [116]:
df.head()

,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country,decade
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany,1900
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France,1900
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany,1900
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland,1900
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France,1900


In [117]:
# df_clean = df.dropna(subset=['sex', 'category'])
# df_clean['is_female'] = df_clean['sex'] == 'Female'
# # print(df_clean['is_female'])
# grouped = df_clean.groupby(['decade','category']).agg(
#     total = ('is_female', 'count'),
#     female_count = ('is_female', 'sum')
# ).reset_index()
# grouped['female_proportion'] = grouped['female_count'] / grouped['total']


# print(grouped)

# # Find the combination with maximum female proportion
# max_idx = grouped['female_proportion'].idxmax()
# max_row = grouped.loc[max_idx]
# max_female_dict = {int(max_row['decade']): max_row['category']}
# print(max_female_dict)


# Read in the Nobel Prize data
nobel = pd.read_csv('data/nobel.csv')

# Store and display the most commonly awarded gender and birth country in requested variables
top_gender = nobel['sex'].value_counts().index[0]
top_country = nobel['birth_country'].value_counts().index[0]

print("\n The gender with the most Nobel laureates is :", top_gender)
print(" The most common birth country of Nobel laureates is :", top_country)

# Calculate the proportion of USA born winners per decade
nobel['usa_born_winner'] = nobel['birth_country'] == 'United States of America'
nobel['decade'] = (np.floor(nobel['year'] / 10) * 10).astype(int)
prop_usa_winners = nobel.groupby('decade', as_index=False)['usa_born_winner'].mean()

# Identify the decade with the highest proportion of US-born winners
max_decade_usa = prop_usa_winners[prop_usa_winners['usa_born_winner'] == prop_usa_winners['usa_born_winner'].max()]['decade'].values[0]

# # Optional: Plotting USA born winners
# ax1 = sns.relplot(x='decade', y='usa_born_winner', data=prop_usa_winners, kind="line")

# Calculating the proportion of female laureates per decade and category
nobel['female_winner'] = nobel['sex'] == 'Female'
prop_female_winners = nobel.groupby(['decade', 'category'], as_index=False)['female_winner'].mean()

# Find the decade and category with the highest proportion of female laureates
max_female_decade_category = prop_female_winners[prop_female_winners['female_winner'] == prop_female_winners['female_winner'].max()][['decade', 'category']]

# Create a dictionary with the decade and category pair
max_female_dict = {max_female_decade_category['decade'].values[0]: max_female_decade_category['category'].values[0]}
print(max_female_dict)



 The gender with the most Nobel laureates is : Male
 The most common birth country of Nobel laureates is : United States of America
{2020: 'Literature'}
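
A proportion computed per decade and category can be driven by very small groups, so it may help to look at the group size next to the proportion. The following is a minimal sketch on made-up toy rows (not from `nobel.csv`); the column names `proportion` and `n` and the variable `max_female_dict_toy` are illustrative choices, not part of the project.

```python
import pandas as pd

# Made-up toy rows for illustration only
toy = pd.DataFrame({
    "decade":   [2000, 2000, 2000, 2000, 2020, 2020, 2020],
    "category": ["Physics", "Physics", "Peace", "Peace",
                 "Literature", "Literature", "Physics"],
    "sex":      ["Male", "Male", "Female", "Male", "Female", "Female", "Male"],
})
toy["female_winner"] = toy["sex"] == "Female"

# Named aggregation: the mean of a boolean column is the proportion, size is the group count
summary = toy.groupby(["decade", "category"])["female_winner"].agg(
    proportion="mean", n="size"
)

# With a two-level index, idxmax() returns a (decade, category) tuple
best_decade, best_category = summary["proportion"].idxmax()
max_female_dict_toy = {int(best_decade): best_category}

print(summary)
print(max_female_dict_toy)
```

Here the winning group contains only two laureates, which is the kind of detail the `n` column makes visible.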


In [118]:
df.head()


Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country,decade
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany,1900
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France,1900
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany,1900
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland,1900
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France,1900


In [119]:
# first_woman_name?
female_laureates = df[df['sex'] == 'Female']
female_laureates = female_laureates.sort_values('year')
# print(female_laureates.head())
first_woman_information = female_laureates.iloc[0]
# print(first_woman_information)
first_woman_name = first_woman_information.loc['full_name']
print(first_woman_name)
first_woman_category = first_woman_information.loc['category']
print(first_woman_category)

Marie Curie, née Sklodowska
Physics
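
Sorting the whole frame works, but the earliest row can also be picked directly with `idxmin()`. This is a sketch on made-up toy rows (placeholder names, not from `nobel.csv`), assuming, as in the cell above, that the earliest year identifies a single row.

```python
import pandas as pd

# Made-up toy rows for illustration only; the names are placeholders
toy = pd.DataFrame({
    "year": [1901, 1903, 1905, 1911],
    "category": ["Physics", "Physics", "Peace", "Chemistry"],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate B"],
    "sex": ["Male", "Female", "Female", "Female"],
})

women = toy[toy["sex"] == "Female"]

# idxmin() gives the index label of the smallest year; .loc then fetches that row
first_row = women.loc[women["year"].idxmin()]
first_name_toy = first_row["full_name"]
first_category_toy = first_row["category"]

print(first_name_toy, first_category_toy)
```

If several women shared the earliest year, `idxmin()` would return only the first such row, so a filter on `women["year"] == women["year"].min()` would be the safer choice in that case.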


In [121]:
prize_counts = df['full_name'].value_counts()
# print(prize_counts)

repeat_winners = prize_counts[prize_counts > 1].index.tolist()
# print(repeat_winners)

repeat_list = repeat_winners
# print(repeat_list)

for i, name in enumerate(repeat_list, 1):
    count = prize_counts[name]
    winner_data = df[df['full_name'] == name]
    
    # Get categories and years
    categories = winner_data['category'].unique()
    years = sorted(winner_data['year'].unique())
    
    print(f"\n{i}. {name}")
    print(f"   Prizes: {count}")
    print(f"   Categories: {', '.join(categories)}")
    print(f"   Years: {', '.join(map(str, years))}")
    
    # Label the entry from laureate_type; compare case-insensitively because the
    # column stores capitalized values such as 'Individual'
    if 'laureate_type' in df.columns:
        is_org = winner_data['laureate_type'].str.lower().eq('organization').any()
        print("   Type: Organization" if is_org else "   Type: Individual")


1. Comité international de la Croix Rouge (International Committee of the Red Cross)
   Prizes: 3
   Categories: Peace
   Years: 1917, 1944, 1963
   Type: Organization

2. Linus Carl Pauling
   Prizes: 2
   Categories: Chemistry, Peace
   Years: 1954, 1962
   Type: Individual

3. John Bardeen
   Prizes: 2
   Categories: Physics
   Years: 1956, 1972
   Type: Individual

4. Frederick Sanger
   Prizes: 2
   Categories: Chemistry
   Years: 1958, 1980
   Type: Individual

5. Marie Curie, née Sklodowska
   Prizes: 2
   Categories: Physics, Chemistry
   Years: 1903, 1911
   Type: Individual

6. Office of the United Nations High Commissioner for Refugees (UNHCR)
   Prizes: 2
   Categories: Peace
   Years: 1954, 1981
   Type: Organization
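
Counting on `full_name` relies on a laureate's name being spelled identically on every row. As a hedged alternative, the sketch below groups on `laureate_id` instead, assuming (this is not verified against `nobel.csv`) that the ID stays the same across a laureate's prizes. The rows are made up and the names are placeholders.

```python
import pandas as pd

# Made-up toy rows for illustration only; laureate 1 appears under two spellings
toy = pd.DataFrame({
    "laureate_id": [1, 2, 1, 3, 4, 4],
    "full_name": ["Laureate A", "Laureate B", "Laureate A (alt. spelling)",
                  "Laureate C", "Org D", "Org D"],
    "laureate_type": ["Individual", "Individual", "Individual",
                      "Individual", "Organization", "Organization"],
    "year": [1903, 1905, 1911, 1920, 1954, 1981],
})

# Counting by name misses laureate 1 because the two spellings differ
by_name = toy["full_name"].value_counts()
repeats_by_name = sorted(by_name[by_name >= 2].index)

# Counting by ID groups both rows of laureate 1 together
per_id = toy.groupby("laureate_id").agg(
    prizes=("year", "size"),
    name=("full_name", "first"),
    laureate_type=("laureate_type", "first"),
)
repeats_by_id = per_id[per_id["prizes"] >= 2]

print(repeats_by_name)
print(repeats_by_id)
```

Whether this matters for the real data depends on how consistently names are recorded there; comparing the two lists is a quick way to check.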


In [None]:
# Calculate the proportion of USA born winners per decade
nobel["usa_born_winner"] = nobel["birth_country"] == "United States of America"
nobel["decade"] = (np.floor(nobel["year"] / 10) * 10).astype(int)
prop_usa_winners = nobel.groupby("decade", as_index=False)["usa_born_winner"].mean()

# Identify the decade with the highest proportion of US-born winners
max_decade_usa = prop_usa_winners[
    prop_usa_winners["usa_born_winner"] == prop_usa_winners["usa_born_winner"].max()
]["decade"].values[0]

# Optional: Plotting USA born winners
ax1 = sns.relplot(x="decade", y="usa_born_winner", data=prop_usa_winners, kind="line")

# Calculating the proportion of female laureates per decade and category
nobel["female_winner"] = nobel["sex"] == "Female"
prop_female_winners = nobel.groupby(["decade", "category"], as_index=False)[
    "female_winner"
].mean()

# Find the decade and category with the highest proportion of female laureates
max_female_decade_category = prop_female_winners[
    prop_female_winners["female_winner"] == prop_female_winners["female_winner"].max()
][["decade", "category"]]

# Create a dictionary with the decade and category pair
max_female_dict = {
    max_female_decade_category["decade"]
    .values[0]: max_female_decade_category["category"]
    .values[0]
}

# Optional: Plotting female winners with % winners on the y-axis
ax2 = sns.relplot(
    x="decade", y="female_winner", hue="category", data=prop_female_winners, kind="line"
)

# Finding the first woman to win a Nobel Prize
nobel_women = nobel[nobel["female_winner"]]
min_row = nobel_women[nobel_women["year"] == nobel_women["year"].min()]
first_woman_name = min_row["full_name"].values[0]
first_woman_category = min_row["category"].values[0]
print(
    f"\n The first woman to win a Nobel Prize was {first_woman_name}, in the category of {first_woman_category}."
)

# Selecting the laureates that have received 2 or more prizes
counts = nobel["full_name"].value_counts()
repeats = counts[counts >= 2].index
repeat_list = list(repeats)

print("\n The repeat winners are :", repeat_list)