The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the `nobel.csv` file in the `data` folder.

In this project, you'll get a chance to explore and answer several questions related to this prizewinning data. We also encourage you to explore any further questions that interest you!

In [54]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Reading in the Nobel Prize data
nobel = pd.read_csv('data/nobel.csv')

# Taking a look at the first several winners
nobel.head(n=6)

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France
5,1901,Physics,The Nobel Prize in Physics 1901,"""in recognition of the extraordinary services ...",1/1,1,Individual,Wilhelm Conrad Röntgen,1845-03-27,Lennep (Remscheid),Prussia (Germany),Male,Munich University,Munich,Germany,1923-02-10,Munich,Germany


In [55]:
# Display the number of (possibly shared) Nobel Prizes handed
# out between 1901 and 2023
display(len(nobel))

# Display the number of prizes won by male and female recipients.
display(nobel['sex'].value_counts())

# Display the number of prizes won by the top 10 nationalities.
nobel['birth_country'].value_counts().head(10)

1000

Male      905
Female     65
Name: sex, dtype: int64

United States of America    291
United Kingdom               91
Germany                      67
France                       58
Sweden                       30
Japan                        28
Canada                       21
Switzerland                  19
Netherlands                  19
Italy                        18
Name: birth_country, dtype: int64
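
The counts above add up to 970 of the 1,000 rows, so 30 rows have no recorded sex (presumably organizations). If you would rather see shares than raw counts, `value_counts` accepts `normalize=True`. The sketch below uses a small invented DataFrame, not the real data, purely to show the behavior:

```python
import pandas as pd

# Toy stand-in for the `sex` column; these values are invented for illustration
toy = pd.DataFrame({"sex": ["Male", "Male", "Male", "Female", None]})

# normalize=True returns fractions instead of raw counts.
# Missing values are excluded by default, so the shares sum to 1
# over the rows that actually have a value.
shares = toy["sex"].value_counts(normalize=True)
print(shares)
```

Pass `dropna=False` as well if you want the missing rows to appear as their own share.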

In [56]:
# Finding the most commonly awarded gender.
top_gender = nobel['sex'].mode().iloc[0]

# Finding the most commonly awarded birth country.
top_country = nobel['birth_country'].mode().iloc[0]

print('The most commonly awarded gender: '+ top_gender, '\nThe most commonly awarded birth country: ' + top_country)

The most commonly awarded gender: Male 
The most commonly awarded birth country: United States of America


In [57]:
# Convert the 'year' column to datetime format
nobel['year'] = pd.to_datetime(nobel['year'], format='%Y')

# Derive the decade of each award (e.g. 1995 -> 1990)
decade = nobel['year'].dt.year // 10 * 10

# Filter the DataFrame to include only US-born Nobel Prize winners
us_born = nobel[nobel['birth_country'] == 'United States of America']

# Count the US-born winners in each decade (the grouping key aligns on the row index)
us_born_winners_by_decade = us_born.groupby(decade).size()

# Count the total number of winners in each decade
total_by_decade = nobel.groupby(decade).size()

# Calculate the ratio of US-born winners to total winners in each decade
ratio_by_decade = us_born_winners_by_decade / total_by_decade

# Find the decade with the highest ratio
max_decade_usa = ratio_by_decade.idxmax()

print("The decade with the highest ratio of US-born Nobel Prize winners to total winners:", max_decade_usa)


The decade with the highest ratio of US-born Nobel Prize winners to total winners: 2000
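
The same ratio can be computed in one pass: the mean of a boolean column is the fraction of `True` values, so there is no need to divide two separate counts. This is a minimal sketch on invented toy data. Only the column names `year` and `birth_country` mirror `nobel.csv`, while `decade` and `us_born` are helper columns introduced here for illustration:

```python
import pandas as pd

# Toy data, invented for illustration (not real laureates)
toy = pd.DataFrame({
    "year": [1901, 1905, 1912, 1918, 1919, 1923],
    "birth_country": [
        "Germany", "United States of America", "United States of America",
        "France", "United States of America", None,
    ],
})

# Integer decade of each award, e.g. 1918 -> 1910
toy["decade"] = toy["year"] // 10 * 10

# Boolean flag; a missing birth country compares as False
toy["us_born"] = toy["birth_country"] == "United States of America"

# The mean of a boolean column is the share of True values per group
ratio = toy.groupby("decade")["us_born"].mean()
best_decade = ratio.idxmax()

print(ratio)
print(best_decade)
```

One difference from dividing two counts: a decade with no US-born winners shows up as `0.0` here rather than `NaN`.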


In [58]:
# Convert the 'year' column to datetime format (no effect if it was already converted)
nobel['year'] = pd.to_datetime(nobel['year'], format='%Y')

# Filter by female winners
female_winners = nobel[nobel['sex'] == 'Female']

# Group the filtered DataFrame by decade and Nobel Prize category and count the number of female laureates in each combination
female_by_decade_category = female_winners.groupby([female_winners['year'].dt.year // 10 * 10, 'category']).size()

# Group the original DataFrame by decade and Nobel Prize category and count the total number of laureates in each combination
total_by_decade_category = nobel.groupby([nobel['year'].dt.year // 10 * 10, 'category']).size()

# Calculate the proportion of female laureates in each combination
proportion_by_decade_category = female_by_decade_category / total_by_decade_category

# Find the combination with the highest proportion
max_female_combination = proportion_by_decade_category.idxmax()

# Store the result as a dictionary
max_female_dict = {max_female_combination[0]: max_female_combination[1]}

print("Decade and Nobel Prize category combination with the highest proportion of female laureates:", max_female_dict)

Decade and Nobel Prize category combination with the highest proportion of female laureates: {2020: 'Literature'}
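
The boolean-mean approach also works with two grouping keys, in which case `idxmax` returns a `(decade, category)` tuple. The sketch below runs on invented toy rows, and `decade` and `is_female` are hypothetical helper columns. Note that very small groups can easily reach a proportion of 1.0, so it is worth checking the group sizes before reading much into the top combination:

```python
import pandas as pd

# Toy data, invented for illustration (not real laureates)
toy = pd.DataFrame({
    "year": [2001, 2004, 2009, 2011, 2015, 2018],
    "category": ["Peace", "Peace", "Physics", "Peace", "Physics", "Physics"],
    "sex": ["Female", "Male", "Male", "Female", "Male", "Female"],
})

toy["decade"] = toy["year"] // 10 * 10
toy["is_female"] = toy["sex"] == "Female"

# Share of female laureates for every (decade, category) pair
share = toy.groupby(["decade", "category"])["is_female"].mean()

# With a two-level index, idxmax returns a tuple of index labels
top_decade, top_category = share.idxmax()
max_female_example = {top_decade: top_category}

print(share)
print(max_female_example)
```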


In [59]:
# Filter the DataFrame to include only female Nobel Prize winners
female_winners = nobel[nobel['sex'] == 'Female']

# Sort the DataFrame by year to find the earliest female Nobel Prize winner
earliest_female_winner = female_winners.sort_values(by='year').iloc[0]

# Extract the name and category of the earliest female winner
first_woman_name = earliest_female_winner['full_name']
first_woman_category = earliest_female_winner['category']

# Print the results
print("The first woman to receive a Nobel Prize was", first_woman_name, "in the category of", first_woman_category)

The first woman to receive a Nobel Prize was Marie Curie, née Sklodowska in the category of Physics
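
Sorting the whole DataFrame just to read its first row works, but `idxmin` finds the earliest row directly and always returns the first occurrence when years tie. A minimal sketch on invented toy rows (the names are placeholders, not real laureates):

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "year": [1901, 1903, 1905, 1909],
    "full_name": ["Al Example", "Bea Sample", "Cat Placeholder", "Di Mock"],
    "category": ["Physics", "Physics", "Peace", "Literature"],
    "sex": ["Male", "Female", "Female", "Female"],
})

women = toy[toy["sex"] == "Female"]

# idxmin returns the index label of the smallest year; .loc fetches that row
first = women.loc[women["year"].idxmin()]
first_name, first_category = first["full_name"], first["category"]

print(first_name, first_category)
```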


In [60]:
# Count how many times each full name appears
name_counts = nobel['full_name'].value_counts()

# Keep only the names that appear more than once
repeat_names = name_counts[name_counts > 1]

# Store the names in a list
repeat_list = repeat_names.index.tolist()

# Print the list of repeat winners
print("Individuals or organizations who have won more than one Nobel Prize:")
for name in repeat_list:
    print(name)

Individuals or organizations who have won more than one Nobel Prize:
Comité international de la Croix Rouge (International Committee of the Red Cross)
Linus Carl Pauling
John Bardeen
Frederick Sanger
Marie Curie, née Sklodowska
Office of the United Nations High Commissioner for Refugees (UNHCR)
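
Counting by `full_name` assumes that every laureate's name is spelled identically on each row and that no two laureates share a name. The dataset also has a `laureate_id` column, so one possible safeguard is to count by ID and look the names up afterwards. The sketch below uses invented toy rows to show where the two approaches can disagree:

```python
import pandas as pd

# Toy data, invented for illustration: ID 2 is spelled two ways,
# and IDs 3 and 4 are different laureates who share a name
toy = pd.DataFrame({
    "laureate_id": [1, 1, 2, 2, 3, 4],
    "full_name": [
        "Ada Example", "Ada Example",
        "Bo Sample", "Bo A. Sample",
        "Cy Placeholder", "Cy Placeholder",
    ],
})

# Count prizes per laureate ID and keep the IDs that appear more than once
prize_counts = toy.groupby("laureate_id").size()
repeat_ids = prize_counts[prize_counts > 1].index

# Look up one name per repeat ID (the first spelling encountered)
repeat_winners = (
    toy[toy["laureate_id"].isin(repeat_ids)]
    .drop_duplicates("laureate_id")["full_name"]
    .tolist()
)

print(repeat_winners)
```

On this toy data a name-based count would report "Ada Example" and "Cy Placeholder" instead, missing the laureate with two spellings and wrongly merging the two namesakes.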
