The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the `nobel.csv` file in the `data` folder.

In this project, you'll get a chance to explore and answer several questions about this prizewinning data. We also encourage you to explore any further questions that interest you!

In [1]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Start coding here!

In [3]:
df = pd.read_csv('data/nobel.csv')
df.sample(5)

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
939,2019,Physics,The Nobel Prize in Physics 2019,"""for theoretical discoveries in physical cosmo...",1/2,973,Individual,James Peebles,1935-04-25,Winnipeg,Canada,Male,Princeton University,"Princeton, NJ",United States of America,,,
314,1958,Physics,The Nobel Prize in Physics 1958,"""for the discovery and the interpretation of t...",1/3,721,Individual,Il´ja Mikhailovich Frank,1908-10-23,Leningrad (Saint Petersburg),Russia,Male,University of Moscow,Moscow,Union of Soviet Socialist Republics,1990-06-22,Moscow,Union of Soviet Socialist Republics (Russia)
688,1998,Chemistry,The Nobel Prize in Chemistry 1998,"""for his development of computational methods ...",1/2,291,Individual,John A. Pople,1925-10-31,Burnham-on-Sea,United Kingdom,Male,Northwestern University,"Evanston, IL",United States of America,2004-03-15,"Chicago, IL",United States of America
91,1918,Chemistry,The Nobel Prize in Chemistry 1918,"""for the synthesis of ammonia from its elements""",1/1,177,Individual,Fritz Haber,1868-12-09,Breslau (Wroclaw),Prussia (Poland),Male,Kaiser-Wilhelm-Institut (now Fritz-Haber-Insti...,Berlin-Dahlem,Germany,1934-01-29,Basel,Switzerland
270,1952,Physics,The Nobel Prize in Physics 1952,"""for their development of new methods for nucl...",1/2,58,Individual,Felix Bloch,1905-10-23,Zurich,Switzerland,Male,Stanford University,"Stanford, CA",United States of America,1983-09-10,Zurich,Switzerland


Display all column names by iterating through the columns of the data frame:

In [4]:
# Print one column name per line
for col in df.columns:
    print(col)

year
category
prize
motivation
prize_share
laureate_id
laureate_type
full_name
birth_date
birth_city
birth_country
sex
organization_name
organization_city
organization_country
death_date
death_city
death_country


### What are the most commonly awarded gender and birth country?

In [10]:
# Most commonly awarded gender
top_gender = df["sex"].value_counts().idxmax()
print(top_gender)

Male


In [11]:
# Most commonly awarded birth country
top_country = df["birth_country"].value_counts().idxmax()
print(top_country)

United States of America
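
`idxmax()` reports only the winning label, not how dominant it is. As a minimal sketch, assuming a small made-up `sex` column (the values below are hypothetical, not taken from `nobel.csv`), `value_counts(normalize=True)` shows the share behind each label:

```python
import pandas as pd

# Hypothetical toy values standing in for df["sex"]; None mimics organizations
toy_sex = pd.Series(["Male", "Male", "Female", "Male", None], name="sex")

# value_counts() drops missing values by default
counts = toy_sex.value_counts()
shares = toy_sex.value_counts(normalize=True)

top_label = counts.idxmax()
print(top_label)
print(shares)
```

Because missing values are excluded, the shares are fractions of the rows that have a recorded value, not of all rows.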


### Decade with the highest ratio of US-born Nobel Prize winners to total winners in all categories

In [12]:
# Convert year to decade
df["decade"] = (df["year"] // 10) * 10

# Count total winners per decade
total_winners = df.groupby("decade")["full_name"].count()

# Count US-born winners per decade
us_winners = df[df["birth_country"] == "United States of America"].groupby("decade")["full_name"].count()

# Calculate ratio
ratio = (us_winners / total_winners).fillna(0)

# Find the decade with the highest ratio
max_decade_usa = int(ratio.idxmax())

print("Decade with highest ratio of US-born Nobel Prize winners:", max_decade_usa)

Decade with highest ratio of US-born Nobel Prize winners: 2000
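
The same ratio can also be computed in one pass, since the mean of a boolean column is the share of `True` values in each group. This is a sketch on hypothetical toy rows (the `toy` frame and the `us_born` column are illustrative names, not part of the dataset), offered as an alternative to dividing two separate counts:

```python
import pandas as pd

# Hypothetical toy rows standing in for nobel.csv (not real laureate records)
toy = pd.DataFrame({
    "year": [1901, 1905, 1912, 1915, 1918, 1923],
    "birth_country": [
        "Germany", "United States of America", "France",
        "United States of America", "United States of America", "Sweden",
    ],
})

toy["decade"] = (toy["year"] // 10) * 10

# Flag US-born rows; a missing birth_country compares as False
toy["us_born"] = toy["birth_country"] == "United States of America"

# Mean of a boolean column = proportion of True values per decade
us_ratio = toy.groupby("decade")["us_born"].mean()
best_decade = int(us_ratio.idxmax())

print(us_ratio)
print(best_decade)
```

This avoids the `fillna(0)` step, because decades with no US-born winners get a mean of 0 rather than a missing value.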


### Decade and Nobel Prize category combination with the highest proportion of female laureates

In [13]:
# Extract only female laureates
df_female = df[df["sex"] == "Female"]
df_female.sample(5)

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country,decade
849,2011,Peace,The Nobel Peace Prize 2011,"""for their non-violent struggle for the safety...",1/3,871,Individual,Tawakkol Karman,1979-02-07,Ta'izz,Yemen,Female,,,,,,,2010
160,1931,Peace,The Nobel Peace Prize 1931,,1/2,496,Individual,Jane Addams,1860-09-06,"Cedarville, IL",United States of America,Female,,,,1935-05-21,"Chicago, IL",United States of America,1930
586,1988,Medicine,The Nobel Prize in Physiology or Medicine 1988,"""for their discoveries of important principles...",1/3,438,Individual,Gertrude B. Elion,1918-01-23,"New York, NY",United States of America,Female,Wellcome Research Laboratories,"Research Triangle Park, NC",United States of America,1999-02-21,"Chapel Hill, NC",United States of America,1980
998,2023,Peace,The Nobel Peace Prize 2023,"""for her fight against the oppression of women...",1/1,1033,Individual,Narges Mohammadi,1972-04-21,Zanjan,Iran,Female,,,,,,,2020
141,1928,Literature,The Nobel Prize in Literature 1928,"""principally for her powerful descriptions of ...",1/1,601,Individual,Sigrid Undset,1882-05-20,Kalundborg,Denmark,Female,,,,1949-06-10,Lillehammer,Norway,1920


In [23]:
# Count total and female laureates per decade and category
counts = df.groupby(["decade", "category"]).size().reset_index(name="total_count")
female_counts = df_female.groupby(["decade", "category"]).size().reset_index(name="female_count")

In [24]:
# merge the total and female counts:
merged = counts.merge(female_counts, on=["decade", "category"], how="left").fillna(0)
merged.head()

Unnamed: 0,decade,category,total_count,female_count
0,1900,Chemistry,9,0.0
1,1900,Literature,10,1.0
2,1900,Medicine,11,0.0
3,1900,Peace,14,1.0
4,1900,Physics,13,1.0


In [25]:
# calculate the female proportion
merged["female_ratio"] = merged["female_count"] / merged["total_count"]
merged.sample(5)

Unnamed: 0,decade,category,total_count,female_count,female_ratio
28,1950,Peace,8,0.0,0.0
53,1990,Physics,22,0.0,0.0
34,1960,Peace,9,0.0,0.0
58,2000,Peace,14,2.0,0.142857
3,1900,Peace,14,1.0,0.071429


In [27]:
# Find the row with the maximum female ratio
max_row = merged.loc[merged["female_ratio"].idxmax()]
max_female_dict = {int(max_row["decade"]): max_row["category"]}
print(max_female_dict)

{2020: 'Literature'}
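
The count, merge, and divide steps can be condensed by grouping on both keys and averaging a boolean flag. The sketch below uses hypothetical toy rows, and `is_female` is an illustrative column name; rows with a missing `sex` (such as organizations) still count toward the group total, as they do in the merge-based version:

```python
import pandas as pd

# Hypothetical toy rows (not real laureate records)
toy = pd.DataFrame({
    "decade": [1900, 1900, 1900, 1910, 1910, 1910, 1910],
    "category": ["Physics", "Physics", "Peace", "Peace", "Peace", "Physics", "Physics"],
    "sex": ["Male", "Female", "Male", "Female", "Female", "Male", None],
})

# A missing sex compares as False, so it counts in the denominator only
toy["is_female"] = toy["sex"] == "Female"

# Proportion of female laureates per (decade, category) pair
female_share = toy.groupby(["decade", "category"])["is_female"].mean()

# idxmax on a MultiIndex Series returns the (decade, category) tuple
best_decade, best_category = female_share.idxmax()
result = {int(best_decade): best_category}
print(result)
```

If several pairs tie for the maximum, `idxmax()` returns only the first one, so it can be worth inspecting the sorted shares as well.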


### Who was the first woman to receive a Nobel Prize, and what category?

In [29]:
# first year a woman received a Nobel Prize:
first_year = df_female["year"].min()
print(first_year)

1903


In [33]:
# Filter the dataset for this year and extract the name and category:
first_woman = df_female[df_female["year"] == first_year].iloc[0]
first_woman_name = first_woman["full_name"]
first_woman_category = first_woman["category"]
print(first_woman_name)
print(first_woman_category)

Marie Curie, née Sklodowska
Physics
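
The two steps above (find the minimum year, then filter on it) can be combined with `idxmin()`, which returns the index label of the earliest row directly. A sketch on hypothetical toy rows, with placeholder names rather than real laureates:

```python
import pandas as pd

# Hypothetical toy rows (placeholder names, not real laureates)
toy = pd.DataFrame({
    "year": [1905, 1903, 1903, 1911],
    "sex": ["Female", "Male", "Female", "Female"],
    "full_name": ["Person A", "Person B", "Person C", "Person D"],
    "category": ["Peace", "Physics", "Physics", "Chemistry"],
})

women = toy[toy["sex"] == "Female"]

# idxmin() gives the index label of the first row with the smallest year
first = women.loc[women["year"].idxmin()]
print(first["full_name"], first["category"])
```

Like `.iloc[0]` on the filtered frame, this picks the first matching row when several women share the earliest year.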


### Which individuals or organizations have won more than one Nobel Prize throughout the years?

In [34]:
# Count how many prizes each name has received
repeat_winners = df["full_name"].value_counts()
print(repeat_winners)

Comité international de la Croix Rouge (International Committee of the Red Cross)    3
Linus Carl Pauling                                                                   2
John Bardeen                                                                         2
Frederick Sanger                                                                     2
Marie Curie, née Sklodowska                                                          2
                                                                                    ..
Karl Ziegler                                                                         1
Giulio Natta                                                                         1
Giorgos Seferis                                                                      1
Sir John Carew Eccles                                                                1
Claudia Goldin                                                                       1
Name: full_name, Length: 993, dtype: int64


In [35]:
# Filter for those who won more than once (count is greater than 1)
repeat_list = repeat_winners[repeat_winners > 1].index.tolist()
print(repeat_list)

['Comité international de la Croix Rouge (International Committee of the Red Cross)', 'Linus Carl Pauling', 'John Bardeen', 'Frederick Sanger', 'Marie Curie, née Sklodowska', 'Office of the United Nations High Commissioner for Refugees (UNHCR)']
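
Counting by `full_name` relies on each laureate's name being spelled identically on every row. A possible alternative, assuming `laureate_id` identifies the same laureate across prizes (an assumption not verified here), is to count by ID and look the names up afterwards. The toy rows below are hypothetical:

```python
import pandas as pd

# Hypothetical toy rows; ID 4 appears under two spellings of the same name
toy = pd.DataFrame({
    "laureate_id": [1, 2, 1, 3, 4, 4],
    "full_name": [
        "A. Example", "B. Sample", "A. Example",
        "C. Placeholder", "Org X", "Organisation X",
    ],
})

# Number of prize rows per laureate ID
prize_counts = toy.groupby("laureate_id").size()
repeat_ids = prize_counts[prize_counts > 1].index

# Keep the first recorded name for each repeat ID
repeat_names = (
    toy[toy["laureate_id"].isin(repeat_ids)]
    .drop_duplicates("laureate_id")["full_name"]
    .tolist()
)
print(repeat_names)
```

On the toy data, counting by name alone would miss ID 4, because its two rows carry different spellings.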
