The Nobel Prize has been among the most prestigious international awards since 1901. Each year, prizes are awarded in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, each recipient also receives a gold medal bearing an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the `nobel.csv` file in the `data` folder.

In this project, you'll get a chance to explore and answer several questions related to this prizewinning data. We then encourage you to explore further questions that interest you!

In [62]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Start coding here!
df_nobel = pd.read_csv(r'data/nobel.csv')
df_nobel.head(2)

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France


## Top Gender & Top Country

In [13]:
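# value_counts() sorts by frequency (descending), so the first index label is the most common value.
# Reading the label from the index avoids relying on the column name produced by reset_index(),
# which differs between pandas versions.
top_gender = df_nobel['sex'].value_counts().index[0]
top_country = df_nobel['birth_country'].value_counts().index[0]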
print(f'top_gender==>{top_gender}\ntop_country==>{top_country}')

top_gender==>Male
top_country==>United States of America


In [14]:
# Add Decade to DataFrame
df_nobel.insert(0, 'decade', df_nobel['year']//10*10)
df_nobel.head()

Unnamed: 0,decade,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1900,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1900,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1900,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1900,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1900,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France
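
The `decade` column relies on integer floor division: `year // 10 * 10` drops the last digit of the year. A minimal sketch on a few hand-picked years, where the helper name `to_decade` is hypothetical and not part of the notebook:

```python
def to_decade(year: int) -> int:
    """Map a year to the first year of its decade via floor division."""
    return year // 10 * 10

# Hand-picked sample years, including decade boundaries
sample_years = [1901, 1909, 1910, 1999, 2023]
decades = [to_decade(y) for y in sample_years]
print(decades)
```

The same expression works element-wise on a pandas Series, which is how the cell above applies it to `df_nobel['year']`.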


## Decade with the highest proportion of US-born laureates

In [15]:
def df_category_ratio(cat_column, cat_value, groupby, df):
    '''
    Return the group (e.g. the decade) in which cat_value makes up
    the highest percentage of rows in cat_column.

    cat_column: column holding the category values
    cat_value:  value whose share is measured
    groupby:    column to group by (e.g. 'decade')
    df:         DataFrame to analyze
    '''
    # Rows matching cat_value, counted per group
    value_counts = df[df[cat_column] == cat_value].groupby(groupby).size()
    # All rows, counted per group
    cat_counts = df.groupby(groupby)[cat_column].size()
    # Percentage share per group; idxmax() returns the group label with the largest share
    return (value_counts / cat_counts * 100).idxmax()

In [16]:
max_decade_usa = df_category_ratio('birth_country', 'United States of America', 'decade', df_nobel)
max_decade_usa

2000
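
An equivalent way to get a per-group share is to take the mean of a boolean column, since `True` counts as 1 and `False` as 0. The sketch below uses made-up toy data (not rows from `nobel.csv`), and the names `toy`, `is_usa`, and `best_decade` are illustrative assumptions:

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "decade": [1900, 1900, 1910, 1910, 1910, 1920],
    "birth_country": ["France", "USA", "USA", "USA", "Germany", "Germany"],
})

# The mean of a boolean column per group is the share of True values in that group
toy["is_usa"] = toy["birth_country"] == "USA"
share_by_decade = toy.groupby("decade")["is_usa"].mean()

# idxmax() returns the index label (here the decade) with the largest share
best_decade = share_by_decade.idxmax()
print(share_by_decade)
print(best_decade)
```

Unlike dividing two count series, this approach gives 0 rather than a missing value for groups with no matching rows.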

## Decade & Category with the highest proportion of female laureates

In [33]:
df_nobel['female_winner'] = df_nobel['sex'] == 'Female'
df_female = df_nobel.groupby(['decade', 'category'], as_index=False)['female_winner'].mean()
# Row with the highest share of female laureates (first match if there is a tie)
max_female_row = df_female.loc[df_female['female_winner'].idxmax()]
max_female_dict = {int(max_female_row['decade']): max_female_row['category']}
max_female_dict

{2020: 'Literature'}
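
When grouping by two keys without `as_index=False`, the result is a Series with a two-level index, and `idxmax()` returns both keys as a tuple. A minimal sketch on invented toy data (the frame `toy` and the variable names are assumptions for illustration):

```python
import pandas as pd

# Toy data, invented for illustration only
toy = pd.DataFrame({
    "decade":   [2000, 2000, 2000, 2010, 2010, 2010],
    "category": ["Peace", "Peace", "Physics", "Peace", "Physics", "Physics"],
    "sex":      ["Female", "Male", "Male", "Female", "Male", "Female"],
})

# Share of female laureates per (decade, category) pair
toy["female_winner"] = toy["sex"] == "Female"
share = toy.groupby(["decade", "category"])["female_winner"].mean()

# On a two-level index, idxmax() returns a (decade, category) tuple
best_decade, best_category = share.idxmax()
result = {int(best_decade): best_category}
print(result)
```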

## First Woman to win a Nobel Prize

In [38]:
# Sort by year (not decade) so the earliest award comes first
df_first_woman = df_nobel[df_nobel['female_winner']].sort_values(by='year').head(1)
print(f"First woman to receive a Nobel Prize was {df_first_woman['full_name'].values[0]} in {df_first_woman['year'].values[0]} for category {df_first_woman['category'].values[0]}")

First woman to receive a Nobel Prize was Marie Curie, née Sklodowska in 1903 for category Physics
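
Sorting the whole frame is not strictly necessary to find the earliest row. One alternative is `idxmin()` on the `year` column, sketched here with placeholder laureate names on invented toy data:

```python
import pandas as pd

# Toy data with placeholder names, invented for illustration only
toy = pd.DataFrame({
    "year": [1905, 1903, 1903, 1911],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate D"],
    "sex": ["Female", "Male", "Female", "Female"],
})

# Keep only female laureates, then pick the row with the smallest year
women = toy[toy["sex"] == "Female"]
first = women.loc[women["year"].idxmin()]
print(first["full_name"], first["year"])
```

If several women share the earliest year, `idxmin()` returns only the first of them, so a tie would need separate handling.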


## Individuals & Organizations that have won a Nobel Prize more than once

In [58]:
# Count rows per laureate name (full_name) and per affiliated institution (organization_name)
df_individuals = df_nobel.groupby('full_name', as_index=False).size()
df_organizations = df_nobel.groupby('organization_name', as_index=False).size()
# Keep names that appear more than once, most frequent first
individuals = list(df_individuals[df_individuals['size']>1].sort_values(ascending=False, by='size')['full_name'])
organizations = list(df_organizations[df_organizations['size']>1].sort_values(ascending=False, by='size')['organization_name'])

In [61]:
repeat_list = individuals + organizations
repeat_list

['Comité international de la Croix Rouge (International Committee of the Red Cross)',
 'Frederick Sanger',
 'John Bardeen',
 'Linus Carl Pauling',
 'Marie Curie, née Sklodowska',
 'Office of the United Nations High Commissioner for Refugees (UNHCR)',
 'University of California',
 'Harvard University',
 'Massachusetts Institute of Technology (MIT)',
 'Stanford University',
 'University of Chicago',
 'University of Cambridge',
 'Princeton University',
 'California Institute of Technology (Caltech)',
 'Columbia University',
 'Rockefeller University',
 'MRC Laboratory of Molecular Biology',
 'University of Oxford',
 'Cornell University',
 'Yale University',
 'Harvard Medical School',
 'University College',
 'Institut Pasteur',
 'University of Heidelberg',
 'Sorbonne University',
 'London University',
 'Berlin University',
 'Rockefeller Institute for Medical Research',
 'Goettingen University',
 'Uppsala University',
 'Bell Laboratories',
 'University of Texas Southwestern Medical Center at