The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the `nobel.csv` file in the `data` folder.

In this project, you'll explore and answer several questions about this prizewinning data. We then encourage you to explore further questions that interest you!

Analyze Nobel Prize winner data and identify patterns by answering the following questions:

1. What is the most commonly awarded gender and birth country? Store your answers as string variables `top_gender` and `top_country`.
2. Which decade had the highest ratio of US-born Nobel Prize winners to total winners in all categories? Store this as an integer called `max_decade_usa`.
3. Which decade and Nobel Prize category combination had the highest proportion of female laureates? Store this as a dictionary called `max_female_dict` where the decade is the key and the category is the value. There should be only one key:value pair.
4. Who was the first woman to receive a Nobel Prize, and in what category? Save your string answers as `first_woman_name` and `first_woman_category`.
5. Which individuals or organizations have won more than one Nobel Prize throughout the years? Store the full names in a list named `repeat_list`.


```python
# Load the required libraries and the dataset
import pandas as pd
import seaborn as sns
import numpy as np

nobel = pd.read_csv("data/nobel.csv")
nobel.head(10)
```

|   | year | category | prize | motivation | prize_share | laureate_id | laureate_type | full_name | birth_date | birth_city | birth_country | sex | organization_name | organization_city | organization_country | death_date | death_city | death_country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1901 | Chemistry | The Nobel Prize in Chemistry 1901 | "in recognition of the extraordinary services ... | 1/1 | 160 | Individual | Jacobus Henricus van 't Hoff | 1852-08-30 | Rotterdam | Netherlands | Male | Berlin University | Berlin | Germany | 1911-03-01 | Berlin | Germany |
| 1 | 1901 | Literature | The Nobel Prize in Literature 1901 | "in special recognition of his poetic composit... | 1/1 | 569 | Individual | Sully Prudhomme | 1839-03-16 | Paris | France | Male |  |  |  | 1907-09-07 | Châtenay | France |
| 2 | 1901 | Medicine | The Nobel Prize in Physiology or Medicine 1901 | "for his work on serum therapy, especially its... | 1/1 | 293 | Individual | Emil Adolf von Behring | 1854-03-15 | Hansdorf (Lawice) | Prussia (Poland) | Male | Marburg University | Marburg | Germany | 1917-03-31 | Marburg | Germany |
| 3 | 1901 | Peace | The Nobel Peace Prize 1901 |  | 1/2 | 462 | Individual | Jean Henry Dunant | 1828-05-08 | Geneva | Switzerland | Male |  |  |  | 1910-10-30 | Heiden | Switzerland |
| 4 | 1901 | Peace | The Nobel Peace Prize 1901 |  | 1/2 | 463 | Individual | Frédéric Passy | 1822-05-20 | Paris | France | Male |  |  |  | 1912-06-12 | Paris | France |
| 5 | 1901 | Physics | The Nobel Prize in Physics 1901 | "in recognition of the extraordinary services ... | 1/1 | 1 | Individual | Wilhelm Conrad Röntgen | 1845-03-27 | Lennep (Remscheid) | Prussia (Germany) | Male | Munich University | Munich | Germany | 1923-02-10 | Munich | Germany |
| 6 | 1902 | Chemistry | The Nobel Prize in Chemistry 1902 | "in recognition of the extraordinary services ... | 1/1 | 161 | Individual | Hermann Emil Fischer | 1852-10-09 | Euskirchen | Prussia (Germany) | Male | Berlin University | Berlin | Germany | 1919-07-15 | Berlin | Germany |
| 7 | 1902 | Literature | The Nobel Prize in Literature 1902 | "the greatest living master of the art of hist... | 1/1 | 571 | Individual | Christian Matthias Theodor Mommsen | 1817-11-30 | Garding | Schleswig (Germany) | Male |  |  |  | 1903-11-01 | Charlottenburg | Germany |
| 8 | 1902 | Medicine | The Nobel Prize in Physiology or Medicine 1902 | "for his work on malaria, by which he has show... | 1/1 | 294 | Individual | Ronald Ross | 1857-05-13 | Almora | India | Male | University College | Liverpool | United Kingdom | 1932-09-16 | Putney Heath | United Kingdom |
| 9 | 1902 | Peace | The Nobel Peace Prize 1902 |  | 1/2 | 464 | Individual | Élie Ducommun | 1833-02-19 | Geneva | Switzerland | Male |  |  |  | 1906-12-07 | Bern | Switzerland |


What is the most commonly awarded gender and birth country?

Store your answers as string variables top_gender and top_country.

Requirements:
- most common gender and birth country

```python
# value_counts() tallies each value (ignoring missing entries); idxmax() returns the most frequent one
gender_count = nobel["sex"].value_counts()
top_gender = gender_count.idxmax()
print(f"Most common gender: {top_gender}")

country_count = nobel["birth_country"].value_counts()
top_country = country_count.idxmax()
print(f"Most common country: {top_country}")
```

```
Most common gender: Male
Most common country: United States of America
```
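`value_counts()` drops missing values by default, so rows without a `sex` or `birth_country` (in this dataset these appear to be organization laureates) are silently ignored when picking the top value. A minimal sketch on a small hypothetical DataFrame (`toy`, not the real `nobel.csv`) shows the difference `dropna=False` makes:

```python
import pandas as pd

# Hypothetical toy data standing in for nobel.csv; None marks a missing value,
# as for an organization laureate with no recorded sex or birth country.
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Female", None, None, None],
    "birth_country": ["France", "France", "Poland", None, None, None],
})

# Default behaviour: missing values are excluded, so the most common observed value wins.
top_sex = toy["sex"].value_counts().idxmax()

# dropna=False keeps missing values as their own bucket, which can then come out on top.
counts_with_missing = toy["sex"].value_counts(dropna=False)
top_sex_with_missing = counts_with_missing.idxmax()

print(top_sex)  # Male
```

For this project the default is what the question asks for: the most common *recorded* gender and birth country.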


Which decade had the highest ratio of US-born Nobel Prize winners to total winners in all categories?

Store this as an integer called max_decade_usa.

Requirements:
- group by decade and compute the ratio of US-born winners to all winners
- save the result as an int

```python
# Floor each year to the start of its decade, e.g. 1987 -> 1980
nobel["decade"] = (nobel["year"] // 10) * 10
usa_nobel = nobel[nobel["birth_country"] == "United States of America"]  # US-born winners only

# .size() counts rows, so winners with a missing birth_country (e.g. organizations)
# still count toward the total instead of being dropped, as .count() on that column would do
usa_winners_decade = usa_nobel.groupby("decade").size()
all_winners = nobel.groupby("decade").size()

# Divide the aligned Series; decades with no US-born winners become NaN, so fill with 0
decade_ratio = (usa_winners_decade / all_winners).fillna(0)

max_decade_usa = int(decade_ratio.idxmax())  # cast the numpy integer to a plain int
print(max_decade_usa)
```

```
2000
```
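Two details carry this step: integer floor division maps each year to the start of its decade, and the mean of a boolean column is exactly the share of `True` rows in each group, so every laureate (including rows with no birth country) counts toward the denominator. A minimal sketch on hypothetical toy rows, offered as an equivalent alternative to dividing two `groupby(...).size()` results:

```python
import pandas as pd

# Hypothetical toy data; None stands in for a laureate with no birth country.
toy = pd.DataFrame({
    "year": [1901, 1909, 1910, 1915, 1919, 2001, 2005],
    "birth_country": ["France", "United States of America", None,
                      "United States of America", "Germany",
                      "United States of America", "United States of America"],
})

# Floor division: 1901 -> 1900, 1915 -> 1910, 2005 -> 2000.
toy["decade"] = (toy["year"] // 10) * 10

# Comparing a missing value with a string yields False, so it counts as non-US.
toy["usa_born"] = toy["birth_country"] == "United States of America"

# Mean of a boolean column per group = US-born winners / all winners in that decade.
ratio = toy.groupby("decade")["usa_born"].mean()
max_decade = int(ratio.idxmax())
print(max_decade)  # 2000
```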


Which decade and Nobel Prize category combination had the highest proportion of female laureates?

Store this as a dictionary called max_female_dict where the decade is the key and the category is the value. There should be only one key:value pair.

```python
# Female laureates per (decade, category)
female_nobel = nobel[nobel["sex"] == "Female"]
female_decade_category_grouped = female_nobel.groupby(["decade", "category"]).size().reset_index(name="count")
# All laureates per (decade, category)
decade_category_grouped = nobel.groupby(["decade", "category"]).size().reset_index(name="count")

# Merge both DataFrames to align the counts by decade and category
merged_df = pd.merge(female_decade_category_grouped,
                     decade_category_grouped,
                     on=["decade", "category"],
                     suffixes=("_female", "_total"))

merged_df["ratio"] = merged_df["count_female"] / merged_df["count_total"]

# idxmax() returns an index label, so look the row up with .loc
max_row = merged_df.loc[merged_df["ratio"].idxmax()]

# Access fields by label; positional max_row[0] relies on a deprecated fallback
max_female_dict = {int(max_row["decade"]): max_row["category"]}
print(max_female_dict)
```

```
{2020: 'Literature'}
```
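The merge above can also be expressed as a single `groupby`: flag female laureates with a boolean column, average it per (decade, category) pair, and let `idxmax` return the winning pair as a tuple. A minimal sketch on hypothetical toy rows; like the `.size()`-based denominator above, it counts laureates with a missing `sex` as non-female:

```python
import pandas as pd

# Hypothetical toy rows, not taken from nobel.csv.
toy = pd.DataFrame({
    "decade":   [2010, 2010, 2010, 2020, 2020, 2020],
    "category": ["Physics", "Physics", "Peace", "Literature", "Literature", "Physics"],
    "sex":      ["Male", "Female", None, "Female", "Female", "Male"],
})

toy["female"] = toy["sex"] == "Female"

# Share of female laureates for every (decade, category) pair.
female_share = toy.groupby(["decade", "category"])["female"].mean()

# On a MultiIndex Series, idxmax returns the (decade, category) tuple of the maximum.
best_decade, best_category = female_share.idxmax()
result = {int(best_decade): best_category}
print(result)  # {2020: 'Literature'}
```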


Who was the first woman to receive a Nobel Prize, and in what category?

Save your string answers as first_woman_name and first_woman_category.

```python
# Filter to female laureates and sort by year; the results must be assigned,
# because sort_values() and reset_index() return new DataFrames
female_nobel = nobel[nobel["sex"] == "Female"]
female_nobel = female_nobel.sort_values("year", kind="stable").reset_index(drop=True)

# The first row is now the earliest female laureate
first_woman_name = female_nobel["full_name"].iloc[0]
first_woman_category = female_nobel["category"].iloc[0]

# Output result
print(f"name: {first_woman_name} | category: {first_woman_category}")
```

```
name: Marie Curie, née Sklodowska | category: Physics
```
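Most pandas methods, including `sort_values` and `reset_index`, return a new object rather than modifying the original, so their result has to be assigned. A minimal sketch on hypothetical toy rows (the names are placeholders) showing the pitfall and a sort-free alternative with `idxmin`:

```python
import pandas as pd

# Hypothetical toy rows deliberately stored out of chronological order.
toy = pd.DataFrame({
    "year": [1911, 1903, 1905],
    "full_name": ["Person B", "Person A", "Person C"],
    "category": ["Chemistry", "Physics", "Peace"],
})

toy.sort_values("year")  # result discarded: toy keeps its original order
unsorted_first = toy["full_name"].iloc[0]  # "Person B", not the earliest laureate

# Assign the sorted frame before taking the first row.
sorted_toy = toy.sort_values("year", kind="stable")
first_name = sorted_toy["full_name"].iloc[0]

# Alternative: idxmin returns the index label of the earliest year directly.
earliest = toy.loc[toy["year"].idxmin()]
print(first_name, earliest["category"])  # Person A Physics
```

On the real data the unassigned version happens to give the right answer only because `nobel.csv` is already ordered by year.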


Which individuals or organizations have won more than one Nobel Prize throughout the years?

Store the full names in a list named repeat_list.

```python
# Count awards per full name and keep those awarded more than once
repeat_winners = nobel.groupby("full_name").size().reset_index(name="count")
repeat_winners = repeat_winners.sort_values("count", ascending=False)
repeat_winners = repeat_winners[repeat_winners["count"] > 1]

# Convert the name column straight to a list instead of appending in a loop
repeat_list = repeat_winners["full_name"].tolist()

print(repeat_list)
```

```
['Comité international de la Croix Rouge (International Committee of the Red Cross)', 'Office of the United Nations High Commissioner for Refugees (UNHCR)', 'Frederick Sanger', 'Linus Carl Pauling', 'John Bardeen', 'Marie Curie, née Sklodowska']
```
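Grouping on `full_name` treats two rows as the same laureate only when the names match exactly. The dataset also carries a `laureate_id` column, which may be a sturdier key if a name were ever recorded differently across awards; that is an assumption worth checking against the data. A minimal sketch on hypothetical toy rows, using `value_counts` as a shorter equivalent of the `groupby(...).size()` pattern and showing the `laureate_id` variant alongside it:

```python
import pandas as pd

# Hypothetical toy rows; the IDs and names are illustrative only.
toy = pd.DataFrame({
    "laureate_id": [1, 2, 1, 3, 3, 3, 4],
    "full_name": ["Person A", "Person B", "Person A",
                  "Org C", "Org C", "Org C", "Person D"],
})

# Count awards per name (sorted by count, descending) and keep names seen more than once.
name_counts = toy["full_name"].value_counts()
repeat_by_name = name_counts[name_counts > 1].index.tolist()

# Same idea keyed on laureate_id, then mapped back to one name per ID.
id_counts = toy["laureate_id"].value_counts()
repeat_ids = id_counts[id_counts > 1].index
repeat_by_id = (
    toy.loc[toy["laureate_id"].isin(repeat_ids)]
    .drop_duplicates("laureate_id")["full_name"]
    .tolist()
)
print(repeat_by_name)  # ['Org C', 'Person A']
```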
