# Visualizing the History of Nobel Prize Winners

The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, recipients also receive a gold medal bearing an image of Alfred Nobel (1833–1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the `nobel.csv` file in the `data` folder.

In this project, you'll explore and answer several questions about this prize-winning data. We then encourage you to explore any further questions that interest you!

```python
# Load the required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Load the CSV data
nobel_ds = pd.read_csv("./data/nobel.csv")
```

## 1) What are the most commonly awarded gender and birth country?

```python
# Most commonly awarded gender and birth country
# value_counts() sorts counts in descending order, so index[0] is the most frequent value
top_gender = nobel_ds["sex"].value_counts().index[0]
top_country = nobel_ds["birth_country"].value_counts().index[0]
print(f"The most common gender is {top_gender} and the most common country is {top_country}")
```

```
The most common gender is Male and the most common country is United States of America
```
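Because `value_counts()` sorts by frequency in descending order, `.index[0]` is the most frequent label, and `idxmax()` on the counts returns the same label. The sketch below checks this on a small hypothetical DataFrame (the values are invented for illustration, not taken from `nobel.csv`) and also shows that `value_counts()` drops missing values by default.

```python
import pandas as pd

# Hypothetical toy data standing in for nobel.csv; values are invented for illustration
toy = pd.DataFrame({
    "sex": ["Male", "Male", "Female", None, "Male"],
    "birth_country": ["Germany", "United States of America", "France",
                      None, "United States of America"],
})

# value_counts() sorts by frequency (descending) and skips missing values by default
sex_counts = toy["sex"].value_counts()

# .index[0] and .idxmax() both return the label with the highest count
top_gender_index = sex_counts.index[0]
top_gender_idxmax = sex_counts.idxmax()
top_country = toy["birth_country"].value_counts().idxmax()

print(sex_counts.to_dict())
print(top_gender_index, top_gender_idxmax, top_country)
```

If two labels tie for the top count, both approaches return a single label and hide the other, so a quick look at `value_counts().head()` is a cheap sanity check.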


## 2) What decade had the highest proportion of US-born winners?

```python
nobel_ds_v2 = nobel_ds.copy()  # work on a copy so the original data stays untouched
# Boolean column: True when the laureate was born in the USA
nobel_ds_v2["usa_born_winner"] = nobel_ds_v2["birth_country"] == "United States of America"
# Divide the year by 10, round down with np.floor(), then multiply by 10 to get the decade (e.g. 1958 -> 1950)
nobel_ds_v2["decade"] = (np.floor(nobel_ds_v2["year"] / 10) * 10).astype(int)
# Group by "decade" (as_index=False returns a DataFrame) and take the mean of "usa_born_winner",
# which is the proportion of US-born winners in each decade
prop_usa_winners = nobel_ds_v2.groupby("decade", as_index=False)["usa_born_winner"].mean()

# Keep the row with the maximum value in "usa_born_winner"
max_decade_usa = prop_usa_winners[prop_usa_winners["usa_born_winner"] == prop_usa_winners["usa_born_winner"].max()]
max_decade_usa = max_decade_usa["decade"].values[0]  # extract the decade value
print(f"The decade with the highest proportion of US-born winners is {max_decade_usa}")
```

```
The decade with the highest proportion of US-born winners is 2000
```
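The decade can also be computed with integer floor division (`year // 10 * 10`), and `idxmax()` returns the decade with the largest proportion directly. The following is a minimal sketch on a hypothetical toy DataFrame (invented years and countries, not from `nobel.csv`); it checks that the floor-division result matches the `np.floor` approach.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; years and countries are invented for illustration
toy = pd.DataFrame({
    "year": [1901, 1909, 1955, 1958, 2001, 2004, 2009],
    "birth_country": ["France", "United States of America",
                      "United States of America", "Germany",
                      "United States of America", "United States of America",
                      "Japan"],
})

# Floor division by 10, then multiplying by 10, maps e.g. 1958 -> 1950
toy["decade"] = toy["year"] // 10 * 10

# True if this matches the np.floor()-based decade used in the notebook
matches_floor = bool(((np.floor(toy["year"] / 10) * 10).astype(int) == toy["decade"]).all())

# The mean of a boolean flag per group is the proportion of US-born winners
toy["usa_born_winner"] = toy["birth_country"] == "United States of America"
prop = toy.groupby("decade")["usa_born_winner"].mean()

# idxmax() returns the index label (here, the decade) of the largest proportion
best_decade = prop.idxmax()
print(prop.to_dict(), best_decade, matches_floor)
```

Floor division keeps the column as integers throughout, so the final `.astype(int)` cast is no longer needed.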


## 3) What decade and category pair had the highest proportion of female laureates?

Because the question is about **decades**, we have to use the `nobel_ds_v2` DataFrame, since it already contains the `decade` column created above.

```python
# Select only the columns we need
nobel_ds_v3 = nobel_ds_v2[["decade", "category", "sex"]].copy()

# Flag female laureates (True/False)
nobel_ds_v3["female_winner"] = nobel_ds_v3["sex"] == "Female"

# The mean of a boolean column is the share of True values, i.e. the proportion of female laureates
prop_female = nobel_ds_v3.groupby(["decade", "category"], as_index=False)["female_winner"].mean()

# Keep the row(s) with the highest proportion
max_female_rows = prop_female[prop_female["female_winner"] == prop_female["female_winner"].max()]

# Convert the result to a list of dictionaries
max_female_dict = max_female_rows.to_dict("records")

print(f"The decade and category pair with the highest proportion of female laureates is: {max_female_dict[0]['decade']} and {max_female_dict[0]['category']}")
```
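Counting female laureates and computing their proportion can give different answers: a large category can have more female winners in absolute terms while a smaller one has a higher share. The sketch below uses a hypothetical toy DataFrame (invented values, not from `nobel.csv`) in which the two measures pick different decade/category pairs.

```python
import pandas as pd

# Hypothetical toy data (invented): a larger group and a smaller group
toy = pd.DataFrame({
    "decade":   [2010] * 6 + [2020] * 2,
    "category": ["Medicine"] * 6 + ["Literature"] * 2,
    "sex":      ["Female", "Female", "Male", "Male", "Male", "Male",
                 "Female", "Male"],
})

toy["female_winner"] = toy["sex"] == "Female"
grouped = toy.groupby(["decade", "category"])["female_winner"]

# Raw number of female laureates per (decade, category) pair
female_counts = grouped.sum()
# Share of female laureates per pair, which is what the question asks for
female_share = grouped.mean()

print(female_counts.idxmax())  # pair with the most female laureates
print(female_share.idxmax())   # pair with the highest proportion
```

Here the larger group wins on count (2 of 6) but the smaller group wins on proportion (1 of 2), which is why the question calls for `mean()` rather than a count.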


## 4) Who was the first woman to receive a Nobel Prize, and in what category?

```python
# Select only the columns we need
nobel_ds_v4 = nobel_ds_v2[["sex", "category", "year", "full_name"]].copy()

# Keep only female laureates
nobel_ds_v4 = nobel_ds_v4[nobel_ds_v4["sex"] == "Female"]

# Keep only the earliest award year
nobel_ds_v4 = nobel_ds_v4[nobel_ds_v4["year"] == nobel_ds_v4["year"].min()]
first_woman_name = nobel_ds_v4["full_name"].values[0]
first_woman_category = nobel_ds_v4["category"].values[0]
print(f"The first woman to receive a Nobel Prize was {first_woman_name} in the category {first_woman_category}")
```
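The same lookup can be written more compactly with `idxmin()`, which returns the index label of the earliest year, or with `sort_values()`. This is a sketch on a hypothetical toy DataFrame (invented names and years, not from `nobel.csv`).

```python
import pandas as pd

# Hypothetical toy data (invented names) standing in for nobel.csv
toy = pd.DataFrame({
    "full_name": ["Person A", "Person B", "Person C", "Person D"],
    "sex": ["Male", "Female", "Female", "Male"],
    "year": [1901, 1911, 1905, 1903],
    "category": ["Physics", "Chemistry", "Peace", "Physics"],
})

women = toy[toy["sex"] == "Female"]

# idxmin() gives the index label of the earliest year; .loc fetches that whole row
first_woman = women.loc[women["year"].idxmin()]

# Equivalent: sort by year and take the first row
first_woman_sorted = women.sort_values("year").iloc[0]

print(first_woman["full_name"], first_woman["category"])
```

If several women share the earliest year, `idxmin()` returns only the first such row, while the boolean filter in the notebook keeps all of them; inspecting that filtered frame is the safer way to spot ties.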

## 5) Which individuals or organizations have won multiple Nobel Prizes throughout the years?

```python
# Select only the columns we need
nobel_ds_v5 = nobel_ds_v2[["full_name", "laureate_type"]].copy()

# Count the prizes per name and keep the names that appear at least twice
name_counts = nobel_ds_v5["full_name"].value_counts()
repeats = name_counts[name_counts >= 2].index
repeat_list = list(repeats)
print(f"The repeat winners are: {repeat_list}")
```

```
The repeat winners are: ['Comité international de la Croix Rouge (International Committee of the Red Cross)', 'Linus Carl Pauling', 'John Bardeen', 'Frederick Sanger', 'Marie Curie, née Sklodowska', 'Office of the United Nations High Commissioner for Refugees (UNHCR)']
```
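The `laureate_type` column selected above can be put to use by grouping on it together with `full_name`, which separates repeat-winning individuals from organizations. The sketch below uses a hypothetical toy DataFrame (invented names, not from `nobel.csv`). Like the notebook, it only finds repeats whose names are spelled identically in every row.

```python
import pandas as pd

# Hypothetical toy data (invented names); one person and one organization repeat
toy = pd.DataFrame({
    "full_name": ["Person A", "Person B", "Person A", "Org X", "Org X", "Org X"],
    "laureate_type": ["Individual", "Individual", "Individual",
                      "Organization", "Organization", "Organization"],
})

# Number of prizes per name, then keep names with at least two prizes
prize_counts = toy.groupby("full_name").size()
repeat_winners = prize_counts[prize_counts >= 2]

# Adding laureate_type to the grouping labels each repeat winner by type
by_type = toy.groupby(["full_name", "laureate_type"]).size()
repeats_by_type = by_type[by_type >= 2].reset_index(name="prizes")

print(repeat_winners.to_dict())
print(repeats_by_type)
```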
