The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the `nobel.csv` file in the `data` folder.

In this project, you'll get a chance to explore this prizewinning data and answer several questions about it. We then encourage you to explore further questions that interest you!

In [29]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np


In [30]:
# Reading the data and checking the first rows.

nobel_prize_data = pd.read_csv("data/nobel.csv")
nobel_prize_data.head()

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France


In [31]:
# Find the most commonly awarded gender (first line) and birth country (second line).
# value_counts() returns the counts sorted from most to least frequent,
# so the first label of its index is the most common value.

top_gender = nobel_prize_data["sex"].value_counts().index[0]
top_country = nobel_prize_data["birth_country"].value_counts().index[0]
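
As a quick sanity check on the idea in the cell above, the sketch below compares three ways of getting the most frequent value of a column. The `sex` Series here is a small made-up stand-in, not the real `nobel.csv` column, and the variable names are only illustrative.

```python
import pandas as pd

# Toy stand-in for the "sex" column (illustrative values, not real data).
sex = pd.Series(["Male", "Female", "Male", None, "Male", "Female"])

counts = sex.value_counts()        # sorted from most to least frequent; missing values are dropped
top_by_index = counts.index[0]     # first label of the sorted result
top_by_idxmax = counts.idxmax()    # label with the largest count
top_by_mode = sex.mode()[0]        # most frequent value, computed directly

print(top_by_index, top_by_idxmax, top_by_mode)
```

All three agree on this toy column. They can differ when two values are tied for first place, so it is worth checking the counts themselves if a tie is possible.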

In [32]:
# Use pd.cut to create a new column that stores the decade of each award.
# The function needs the bin edges and one label per bin (here integers).
# Bins are right-inclusive by default, so (1900, 1909] is labeled 1900, (1909, 1919] is labeled 1910, and so on.
# head() is used to check the new column.
labels = [1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020]
bins = [1900, 1909, 1919, 1929, 1939, 1949, 1959, 1969, 1979, 1989, 1999, 2009, 2019, 2023]
nobel_prize_data["decade"] = pd.cut(nobel_prize_data["year"], labels=labels, bins=bins)
nobel_prize_data.head()

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country,decade
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany,1900
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France,1900
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany,1900
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland,1900
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France,1900
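
The binning above can be cross-checked against plain integer arithmetic. The sketch below rebuilds the same bins and labels on a handful of made-up boundary years and confirms that `pd.cut` and floor division give the same decade. The `years` Series and the variable names are assumptions for illustration only.

```python
import pandas as pd

# A few boundary years chosen to exercise the bin edges (illustrative only).
years = pd.Series([1901, 1909, 1910, 1919, 1920, 2019, 2020, 2023])

# Arithmetic route: integer-divide by 10, then multiply back.
decade_arith = (years // 10) * 10

# pd.cut route, mirroring the bins and labels used in this notebook.
labels = list(range(1900, 2030, 10))                    # 1900, 1910, ..., 2020
bins = [1900] + [d + 9 for d in labels[:-1]] + [2023]   # 1900, 1909, 1919, ..., 2019, 2023
decade_cut = pd.cut(years, bins=bins, labels=labels)

# pd.cut returns a categorical column; cast to int before comparing.
print((decade_cut.astype(int) == decade_arith).all())
```

One difference to keep in mind: `pd.cut` produces a categorical column, while the arithmetic route keeps plain integers, which can matter for later grouping and plotting.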


In [33]:
# Counting all Nobel Prize winners, grouped by decade
nobel_prize_data_winners_per_decade = nobel_prize_data.groupby("decade").agg({"laureate_id": "count"})
# Creating a DataFrame of only US-born Nobel Prize winners (filtered on birth_country)
USA_nobel_winners = nobel_prize_data.loc[nobel_prize_data["birth_country"] == "United States of America", :]
# Counting US-born Nobel Prize winners per decade
nobel_prize_USA_winners_per_decade = USA_nobel_winners.groupby("decade").agg({"laureate_id": "count"})
# Calculating the ratio of US-born winners to all winners in each decade
ratio_of_USA_nobel_winners = nobel_prize_USA_winners_per_decade["laureate_id"] / nobel_prize_data_winners_per_decade["laureate_id"]
# Finding the decade with the maximum ratio
max_decade_usa = ratio_of_USA_nobel_winners.sort_values(ascending=False).index.tolist()[0]
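
The same ratio can also be sketched in a more compact form: the mean of a boolean column within each group is the share of `True` values. The example below uses a tiny made-up table rather than the real data, and the `usa_born` column name is an assumption introduced here.

```python
import pandas as pd

# Tiny made-up table with the two columns the calculation needs.
toy = pd.DataFrame({
    "decade": [1900, 1900, 1910, 1910, 1910, 1920],
    "birth_country": [
        "France", "United States of America", "Germany",
        "United States of America", "United States of America", "Sweden",
    ],
})

# Flag US-born winners; the mean of a boolean column is the share of True values.
toy["usa_born"] = toy["birth_country"] == "United States of America"
usa_share = toy.groupby("decade")["usa_born"].mean()

# idxmax returns the index label (the decade) with the largest share.
best_decade = usa_share.idxmax()
print(usa_share)
print(best_decade)
```

On this toy table the shares are 0.5, 2/3 and 0.0, so the 1910s come out on top. Rows with a missing `birth_country` count as not US-born here, which matches dividing by the count of all winners.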

In [34]:
# Counting all nobel prize winners and grouping by decade and by category
nobel_prize_data_winners_per_decade_and_category=nobel_prize_data.groupby(["decade","category"]).agg({"laureate_id": "count"})
# Creating a DataFrame of only female nobel winners
female_nobel_winners=nobel_prize_data.loc[nobel_prize_data["sex"]=="Female",:]
# Counting female Nobel Prize winners per decade and category
female_nobel_prize_winners_per_category_and_decade=female_nobel_winners.groupby(["decade","category"]).agg({"laureate_id": "count"})
# Calculating the proportion
female_proportion=female_nobel_prize_winners_per_category_and_decade/nobel_prize_data_winners_per_decade_and_category
# Obtaining the decade and category with the highest proportion of female Nobel Prize winners. The last line saves the result in a dictionary
max_female_tuple = female_proportion.sort_values(by="laureate_id", ascending=False).index[0]
max_female_dict = {max_female_tuple[0]: max_female_tuple[1]}
max_female_dict

{2020: 'Literature'}
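
For comparison, here is a minimal sketch of the same proportion computed with a boolean column and a two-level `groupby`. The table is made up for illustration, and `is_female` is a helper column name introduced here, not part of the dataset.

```python
import pandas as pd

# Made-up rows covering two decades and three categories.
toy = pd.DataFrame({
    "decade":   [2000, 2000, 2000, 2010, 2010, 2010, 2010],
    "category": ["Peace", "Peace", "Physics", "Peace", "Literature", "Literature", "Physics"],
    "sex":      ["Male", "Female", "Male", "Female", "Female", "Male", "Male"],
})

# Share of female winners in each (decade, category) pair.
toy["is_female"] = toy["sex"] == "Female"
share = toy.groupby(["decade", "category"])["is_female"].mean()

# With a two-level index, idxmax returns a (decade, category) tuple.
best_decade, best_category = share.idxmax()
max_female = {int(best_decade): best_category}
print(max_female)
```

Because the share is computed per group, pairs with no female winners show up as 0.0 rather than as missing values, which makes the full table easier to inspect.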

In [35]:
# The previous cell built a DataFrame containing only the female winners. Because the data is sorted by ascending year, its first row is the first woman to receive a Nobel Prize
first_woman_name=female_nobel_winners["full_name"].iloc[0]
first_woman_category=female_nobel_winners["category"].iloc[0]
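
Taking the first row relies on the data already being sorted by year. If that is not guaranteed, one option is to look up the row with the smallest year explicitly, as in the sketch below. The table and the placeholder names ("Laureate A" and so on) are invented for illustration.

```python
import pandas as pd

# Made-up rows, deliberately out of chronological order.
toy = pd.DataFrame({
    "year": [1911, 1903, 1905, 1903],
    "category": ["Chemistry", "Physics", "Peace", "Physics"],
    "full_name": ["Laureate C", "Laureate A", "Laureate D", "Laureate B"],
    "sex": ["Female", "Female", "Female", "Male"],
})

women = toy[toy["sex"] == "Female"]

# idxmin gives the index label of the earliest year; .loc then fetches that row.
first = women.loc[women["year"].idxmin()]
first_name = first["full_name"]
first_category = first["category"]
print(first_name, first_category)
```

If several women share the earliest year, `idxmin` returns the first of them in row order.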

In [36]:
# List the organizations and individuals that appear most often among the winners. value_counts orders the names from most to least frequent, and head(6) keeps the six most frequent ones, a cutoff chosen for this dataset.
repeat_list=nobel_prize_data["full_name"].value_counts(ascending=False).head(6).index.to_list()
repeat_list

['Comité international de la Croix Rouge (International Committee of the Red Cross)',
 'Linus Carl Pauling',
 'John Bardeen',
 'Frederick Sanger',
 'Marie Curie, née Sklodowska',
 'Office of the United Nations High Commissioner for Refugees (UNHCR)']
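
A fixed `head(6)` depends on knowing in advance how many repeat winners there are. A possible alternative, sketched below on made-up names, is to filter the counts by a threshold so the list adapts to the data.

```python
import pandas as pd

# Made-up names; "Org X" appears three times and "Person A" twice.
names = pd.Series(["Org X", "Person A", "Org X", "Person B", "Person A", "Org X", "Person C"])

counts = names.value_counts()

# Keep only the names that appear at least twice.
repeat_list = counts[counts >= 2].index.tolist()
print(repeat_list)
```

Either approach matches on the exact text of the name, so a laureate recorded under two different spellings would be counted as two different winners.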