The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is provided in the `nobel.csv` file in the `data` folder.

In this project, you'll explore this prizewinning data and answer several questions about it. We also encourage you to go on and explore any further questions that interest you!

In [62]:
# Loading in required libraries
import pandas as pd
import numpy as np

# Read the data
nobel_winners = pd.read_csv("data/nobel.csv")

# Most common gender and birth country
top_gender = nobel_winners['sex'].value_counts().idxmax()
top_country = nobel_winners['birth_country'].value_counts().idxmax()

# Create the decade column (e.g. 1987 -> 1980)
nobel_winners['decade'] = (np.floor(nobel_winners['year'] / 10) * 10).astype(int)

# Create a boolean column for winners born in the USA
nobel_winners['usa_winner'] = nobel_winners['birth_country'] == "United States of America"
# Group by decade and compute the proportion (the mean of a boolean column is the share of True values)
prop_usa_winners = nobel_winners.groupby('decade', as_index=False)['usa_winner'].mean()
# Decade with the highest proportion of US-born winners among all winners, across all categories
max_decade_usa = prop_usa_winners.loc[prop_usa_winners['usa_winner'].idxmax(), 'decade']

# Create a boolean column for female winners
nobel_winners['female_winner'] = nobel_winners['sex'] == "Female"
# Group by decade and category, and compute the proportion (the mean of a boolean column is the share of True values)
prop_female_winners = nobel_winners.groupby(['decade', 'category'], as_index=False)['female_winner'].mean()
# Store the decade and category with the highest proportion of female winners in a dictionary
max_female_row = prop_female_winners.loc[prop_female_winners['female_winner'].idxmax()]
max_female_dict = {max_female_row['decade']: max_female_row['category']}

# First woman to receive a Nobel Prize, and her category
first_woman_row = nobel_winners.loc[nobel_winners.loc[nobel_winners['female_winner'], 'year'].idxmin()]
first_woman_name = first_woman_row['full_name']
first_woman_category = first_woman_row['category']

# Individuals or organizations that won more than one Nobel Prize
count_name = nobel_winners['full_name'].dropna().value_counts()
repeat_list = count_name[count_name >= 2].index.tolist()
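
The proportions above rely on the fact that the mean of a boolean column equals the share of `True` values in each group. As a quick sanity check, here is a minimal sketch of that pattern on a small, made-up DataFrame. The rows in `toy` are hypothetical and are not taken from `nobel.csv`; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical toy data for illustration only (not real Nobel records)
toy = pd.DataFrame({
    "year": [1901, 1905, 1911, 1913, 1918, 1919],
    "birth_country": [
        "Germany",
        "United States of America",
        "France",
        "United States of America",
        "United States of America",
        "United States of America",
    ],
})

# Integer floor division maps each year to the start of its decade
toy["decade"] = (toy["year"] // 10) * 10

# True/False flag for US-born rows
toy["usa_winner"] = toy["birth_country"] == "United States of America"

# Mean of a boolean column per group = proportion of True values in that group
prop = toy.groupby("decade", as_index=False)["usa_winner"].mean()

# Decade whose proportion is largest (idxmax returns the first maximum on ties)
best_decade = prop.loc[prop["usa_winner"].idxmax(), "decade"]

print(prop)
print(best_decade)
```

In this toy data, one of the two rows in the 1900s and three of the four rows in the 1910s are US-born, so the proportions are 0.5 and 0.75 and the selected decade is 1910. Note that `idxmax` returns only the first maximum, so a tie between decades would go unreported.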