# The History of Nobel Prize Winners

The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023.

In this project, you'll explore and answer several questions about this prize-winner data. We also encourage you to go on and explore further questions that interest you!

## I. Import libraries

In [92]:
import pandas as pd
import seaborn as sns
import numpy as np

## II. Load and preview the dataset

In [93]:
# Load the dataset
nobels = pd.read_csv('data/nobel.csv')

# Preview
nobels.head()

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France


## III. Solution 

### Q1: What is the most commonly awarded gender and birth country?

- Calculate the most commonly awarded gender
- Calculate the most commonly awarded birth country

In [94]:
# Most commonly awarded gender
top_gender = nobels['sex'].mode()[0]
print(f'The most commonly awarded gender: {top_gender}')

# Most commonly awarded birth country
top_country = nobels['birth_country'].mode()[0]
print(f'The most commonly awarded birth country: {top_country}')

The most commonly awarded gender: Male
The most commonly awarded birth country: United States of America
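
`mode()[0]` reports only a single value, so it hides both the underlying counts and any ties. The sketch below uses a small made-up frame (`toy` is hypothetical and is not the Nobel data) to show how `value_counts()` exposes the counts behind the mode.

```python
import pandas as pd

# Hypothetical toy data standing in for nobels; all values are made up
toy = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Male', None],
    'birth_country': ['France', 'Germany', 'France', 'Germany', 'Sweden'],
})

# value_counts() sorts by frequency and skips missing values by default
sex_counts = toy['sex'].value_counts()
country_counts = toy['birth_country'].value_counts()

# mode() returns every tied value in sorted order; [0] keeps only the first
country_modes = toy['birth_country'].mode().tolist()

print(sex_counts)
print(country_modes)
```

Here two countries tie for first place, which `mode()[0]` alone would not reveal. Running `nobels['birth_country'].value_counts().head()` on the real data is a quick way to check how far ahead the top country is.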


### Q2: Which decade had the highest ratio of US-born Nobel Prize winners to total winners in all categories?

- Extract the decade from the year
- Filter US-born Nobel Prize winners

In [95]:
# Extract the decade
nobels['decade'] = (nobels['year'] // 10) * 10

# Filter for US-born
us_born = nobels[nobels['birth_country'] == 'United States of America']

- Calculate the number of US-born winners per decade

In [96]:
# Number of US-born winners
us_born_per_decade = us_born.groupby('decade').size()

# Preview
us_born_per_decade

decade
1900     1
1910     3
1920     4
1930    14
1940    13
1950    21
1960    21
1970    33
1980    31
1990    42
2000    52
2010    38
2020    18
dtype: int64

- Calculate the total number of winners per decade

In [97]:
# Total number of winners
total_winners_per_decade = nobels.groupby('decade').size()

# Preview
total_winners_per_decade

decade
1900     57
1910     40
1920     54
1930     56
1940     43
1950     72
1960     79
1970    104
1980     97
1990    104
2000    123
2010    121
2020     50
dtype: int64

- Calculate the ratio of US-born to the total winners per decade

In [98]:
# Ratio of US-born winners to total winners
ratio_per_decade = us_born_per_decade / total_winners_per_decade

# Preview
ratio_per_decade

decade
1900    0.017544
1910    0.075000
1920    0.074074
1930    0.250000
1940    0.302326
1950    0.291667
1960    0.265823
1970    0.317308
1980    0.319588
1990    0.403846
2000    0.422764
2010    0.314050
2020    0.360000
dtype: float64
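
The filter, count, and divide steps above can also be collapsed into a single grouped mean of a boolean column. This is an alternative sketch, not the approach the notebook uses. The `toy` frame and the `us_born` column name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for nobels; all values are made up
toy = pd.DataFrame({
    'year': [1901, 1905, 1909, 1912, 1915, 1918, 1919],
    'birth_country': ['France', 'United States of America', None,
                      'United States of America', 'United States of America',
                      'Germany', 'Sweden'],
})
toy['decade'] = (toy['year'] // 10) * 10

# True where the laureate is US-born; a missing birth country compares as False
toy['us_born'] = toy['birth_country'] == 'United States of America'

# The mean of a boolean column is the share of True values in each group
toy_ratio = toy.groupby('decade')['us_born'].mean()
print(toy_ratio)
```

Rows with a missing birth country still count toward the denominator, which matches `groupby('decade').size()`. A decade with no US-born winners would come out as 0.0 rather than NaN.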

- Find the decade with the highest ratio

In [99]:
# The highest ratio
max_decade_usa = ratio_per_decade.idxmax()

# Preview
print(f'Highest ratio of US-born to total winners: {max_decade_usa}')

Highest ratio of US-born to total winners: 2000


### Q3: Which decade and Nobel Prize category combination had the highest proportion of female laureates?

- Filter for female laureates

In [100]:
# Filter for female
f_laureates = nobels[nobels['sex'] == 'Female']

- Count the female laureates per decade and category

In [101]:
# Count the female per decade and category
f_count = f_laureates.groupby(['decade', 'category']).size()

# Preview
f_count

decade  category  
1900    Literature    1
        Peace         1
        Physics       1
1910    Chemistry     1
1920    Literature    2
1930    Chemistry     1
        Literature    1
        Peace         1
1940    Literature    1
        Medicine      1
        Peace         1
1960    Chemistry     1
        Literature    1
        Physics       1
1970    Medicine      1
        Peace         3
1980    Medicine      3
        Peace         1
1990    Literature    3
        Medicine      1
        Peace         3
2000    Chemistry     1
        Economics     1
        Literature    3
        Medicine      4
        Peace         2
2010    Chemistry     1
        Economics     1
        Literature    3
        Medicine      2
        Peace         5
        Physics       1
2020    Chemistry     3
        Economics     1
        Literature    2
        Medicine      1
        Peace         2
        Physics       2
dtype: int64

- Count the total laureates per decade and category

In [102]:
# Count the total laureates per decade and category
total_count = nobels.groupby(['decade', 'category']).size()

# Preview
total_count

decade  category  
1900    Chemistry      9
        Literature    10
        Medicine      11
        Peace         14
        Physics       13
                      ..
2020    Economics      9
        Literature     4
        Medicine       8
        Peace          7
        Physics       12
Length: 72, dtype: int64

- Calculate the proportion of female laureates

In [103]:
# Calculate the proportion of female
f_proportion = f_count / total_count

# Preview
f_proportion

decade  category  
1900    Chemistry          NaN
        Literature    0.100000
        Medicine           NaN
        Peace         0.071429
        Physics       0.076923
                        ...   
2020    Economics     0.111111
        Literature    0.500000
        Medicine      0.125000
        Peace         0.285714
        Physics       0.166667
Length: 72, dtype: float64
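
The `NaN` entries above come from index alignment: any decade and category pair that is missing from the female counts has nothing to divide. `idxmax()` skips `NaN` by default, so the next step still works, but an explicit zero can be easier to read. The sketch below uses made-up counts (`toy_female` and `toy_total` are hypothetical) to show one way to fill those gaps.

```python
import pandas as pd

# Hypothetical counts indexed by (decade, category); all values are made up
toy_total = pd.Series(
    [9, 13, 8],
    index=pd.MultiIndex.from_tuples(
        [(1900, 'Chemistry'), (1900, 'Physics'), (1910, 'Chemistry')],
        names=['decade', 'category'],
    ),
)
toy_female = pd.Series(
    [1, 1],
    index=pd.MultiIndex.from_tuples(
        [(1900, 'Physics'), (1910, 'Chemistry')],
        names=['decade', 'category'],
    ),
)

# Plain division aligns on the index; groups absent from toy_female become NaN
with_nan = toy_female / toy_total

# Reindexing first turns "no female laureates" into an explicit 0
filled = toy_female.reindex(toy_total.index, fill_value=0) / toy_total

print(with_nan)
print(filled)
```

Both versions give the same `idxmax()` result. The filled version just makes it clear that a proportion of 0 means "no female laureates" and not "no data".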

- Find the decade and category with the highest proportion of female laureates
- Store the result in a dictionary as a single `{decade: category}` pair

In [104]:
# Find the decade and category with highest proportion
max_f_proportion = f_proportion.idxmax()

# Store the results
max_female_dict = {max_f_proportion[0]: max_f_proportion[1]}

# Preview
max_female_dict

{2020: 'Literature'}

### Q4: Who was the first woman to receive a Nobel Prize, and in what category?

- Find the first woman to receive a Nobel Prize
- Extract the name and the category
- Preview results

In [105]:
# First woman to receive a Nobel Prize
first_woman = f_laureates.sort_values(by='year').iloc[0]

# Extract the name and category
first_woman_name = first_woman['full_name']
first_woman_category = first_woman['category']

# Preview the results
print(f'First woman to win a Nobel Prize: {first_woman_name}\nCategory: {first_woman_category}')

First woman to win a Nobel Prize: Marie Curie, née Sklodowska
Category: Physics
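
Sorting the whole frame just to read its first row works, but `idxmin()` can pick out the earliest row directly. The sketch below shows this alternative on made-up data (`toy` and the laureate names are hypothetical).

```python
import pandas as pd

# Hypothetical toy data standing in for nobels; all values are made up
toy = pd.DataFrame({
    'year': [1901, 1903, 1903, 1911],
    'category': ['Physics', 'Physics', 'Physics', 'Chemistry'],
    'sex': ['Male', 'Male', 'Female', 'Female'],
    'full_name': ['Laureate A', 'Laureate B', 'Laureate C', 'Laureate C'],
})
toy_female = toy[toy['sex'] == 'Female']

# idxmin() gives the index label of the earliest year; .loc fetches that row
first_row = toy.loc[toy_female['year'].idxmin()]

print(first_row['full_name'], first_row['category'])
```

If several women shared the earliest year, `idxmin()` would return the first of them in row order. The default `sort_values()` does not promise to keep tied rows in their original order.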


### Q5: Which individuals or organizations have won more than one Nobel Prize throughout the years?

- Find individuals or organizations with more than one Nobel Prize
- Store the full names in a list
- Preview the list

In [106]:
# Find individuals or organizations with more than one Nobel Prize
repeat_winners = nobels['full_name'].value_counts()
repeat_winners = repeat_winners[repeat_winners > 1]

# Store the full names
repeat_list = repeat_winners.index.tolist()

# Preview
repeat_list

['Comité international de la Croix Rouge (International Committee of the Red Cross)',
 'Linus Carl Pauling',
 'John Bardeen',
 'Frederick Sanger',
 'Marie Curie, née Sklodowska',
 'Office of the United Nations High Commissioner for Refugees (UNHCR)']
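
Counting by `full_name` treats two different laureates who share a name as one repeat winner. The sketch below counts by `laureate_id` instead. It assumes that this column identifies the same laureate across prizes, which the preview above does not confirm. The `toy` frame and its names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: ids 3 and 4 are different people who share a name
toy = pd.DataFrame({
    'laureate_id': [1, 2, 1, 3, 4],
    'full_name': ['Laureate A', 'Laureate B', 'Laureate A',
                  'J. Smith', 'J. Smith'],
})

# Count prizes per identifier instead of per name
prizes_per_id = toy.groupby('laureate_id').size()
repeat_ids = prizes_per_id[prizes_per_id > 1].index

# Look up the names that belong to the repeated identifiers
toy_repeat_names = (
    toy.loc[toy['laureate_id'].isin(repeat_ids), 'full_name']
    .drop_duplicates()
    .tolist()
)

# For comparison: counting by name also flags the two distinct "J. Smith" rows
name_counts = toy['full_name'].value_counts()
by_name = sorted(name_counts[name_counts > 1].index)

print(toy_repeat_names)
print(by_name)
```

On the real data, comparing the two results is a quick sanity check: if they match, the name-based list above is not affected by shared names.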