The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the `nobel.csv` file in the `data` folder.

In this project, you'll get a chance to explore this prizewinning data and answer several questions about it. We then encourage you to explore further questions that interest you!

In [2]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Load the dataset
nobel = pd.read_csv('data/nobel.csv')

# Most commonly awarded birth country and gender
# (value_counts sorts in descending order, so the first index entry is the most frequent)
countries = nobel['birth_country'].value_counts()
top_country = countries.index[0]
print('Most commonly awarded country is ', top_country)
genders = nobel['sex'].value_counts()
top_gender = genders.index[0]
print('Most commonly awarded gender is ', top_gender)

# Add a decade column to nobel: floor(year / 10) * 10 maps e.g. 1995 to 1990
nobel['decade'] = (np.floor(nobel['year'] / 10) * 10).astype(int)

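# Find the decade with the highest proportion of US-born winners
usa_winners_per_decade = nobel[nobel['birth_country'] == 'United States of America'].groupby('decade').size()
winners_per_decade = nobel.groupby('decade').size()
usa_decade = usa_winners_per_decade / winners_per_decade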
print(usa_decade)
max_decade_usa = usa_decade.idxmax()
print('The decade has the highest proportion of US-born winners  is ',max_decade_usa, ' with ',round(usa_decade.max()*100,2), '%')

# Find the decade and category pair with the highest proportion of female laureates
sex_prop = nobel[nobel['sex'] == 'Female'].groupby(['decade', 'category']).size() / nobel.groupby(['decade', 'category']).size()

# idxmax returns the (decade, category) index tuple of the highest proportion
max_decade, max_category = sex_prop.idxmax()
max_female_dict = {max_decade: max_category}

print('Decade and category pair had the highest proportion of female laureates: ', max_female_dict)

# Find the first woman to receive a Nobel Prize, and in what category
female_winners = nobel[nobel['sex'] == 'Female']
first_female_winner = female_winners[female_winners['year'] == female_winners['year'].min()]
first_woman_name = first_female_winner['full_name'].values[0]

first_woman_category = first_female_winner['category'].values[0]
print('The first women to receive a Nobel prize is ', first_woman_name,' in ', first_woman_category)

# Individuals or organizations that have won more than one Nobel Prize
counts = nobel['full_name'].value_counts()
repeat_list = counts[counts >= 2].index.to_list()
print(repeat_list)

Most commonly awarded country is  United States of America
Most commonly awarded gender is  Male
decade
1900    0.017544
1910    0.075000
1920    0.074074
1930    0.250000
1940    0.302326
1950    0.291667
1960    0.265823
1970    0.317308
1980    0.319588
1990    0.403846
2000    0.422764
2010    0.314050
2020    0.360000
dtype: float64
The decade has the highest proportion of US-born winners  is  2000  with  42.28 %
Decade and category pair had the highest proportion of female laureates:  {2020: 'Literature'}
The first women to receive a Nobel prize is  Marie Curie, née Sklodowska  in  Physics
['Comité international de la Croix Rouge (International Committee of the Red Cross)', 'Frederick Sanger', 'John Bardeen', 'Office of the United Nations High Commissioner for Refugees (UNHCR)', 'Linus Carl Pauling', 'Marie Curie, née Sklodowska']
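
The proportions above are computed by dividing one `groupby(...).size()` result by another, which produces `NaN` for any group with no matching rows. A possible alternative, sketched here on a small made-up DataFrame (the rows and the `toy`, `is_female`, and `female_share` names are illustrative assumptions, not taken from `nobel.csv`), is to take the mean of a boolean column within each group, which gives `0.0` for such groups instead.

```python
import pandas as pd

# Hypothetical toy rows that only mimic the columns used above; not real laureates.
toy = pd.DataFrame({
    'year': [1901, 1903, 1903, 1911, 1915, 2020, 2021],
    'category': ['Physics', 'Physics', 'Physics', 'Chemistry',
                 'Chemistry', 'Literature', 'Peace'],
    'sex': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male'],
})

# Same decade logic as above, written with integer floor division.
toy['decade'] = (toy['year'] // 10) * 10

# The mean of a boolean column is the share of True values in each group.
toy['is_female'] = toy['sex'] == 'Female'
female_share = toy.groupby(['decade', 'category'])['is_female'].mean()

# idxmax on a MultiIndex Series returns the (decade, category) tuple.
top_decade, top_category = female_share.idxmax()
print(female_share)
print(top_decade, top_category)
```

With this approach, a decade and category pair with no female laureates stays in the result with a share of 0.0, which may be easier to plot or inspect than a missing value.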
