## Project #3 

## Visualizing the History of Nobel Prize Winners

The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is available in the `nobel.csv` file in the `data` folder.

In this project, you'll get a chance to explore and answer several questions about this prize-winning data. We then encourage you to explore further questions that interest you!

## Project Instructions

Analyze Nobel Prize winner data and identify patterns by answering the following questions:

- What is the most commonly awarded gender and birth country?

    Store your answers as string variables `top_gender` and `top_country`.
<br>

- Which decade had the highest ratio of US-born Nobel Prize winners to total winners in all categories?

    Store this as an integer called `max_decade_usa`.
<br>

- Which decade and Nobel Prize category combination had the highest proportion of female laureates?

    Store this as a dictionary called `max_female_dict` where the decade is the key and the category is the value. There should only be one key:value pair.
<br>
- Who was the first woman to receive a Nobel Prize, and in what category?

    Save your string answers as `first_woman_name` and `first_woman_category`.
<br>
- Which individuals or organizations have won more than one Nobel Prize throughout the years?

    Store the full names in a list named `repeat_list`.

In [31]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

In [2]:
nobel_winners = pd.read_csv("data/nobel.csv")

In [3]:
nobel_winners.head()

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France


In [4]:
len(nobel_winners)

1000

In [5]:
top_gender = nobel_winners['sex'].value_counts().index[0]
top_gender

'Male'

In [6]:
top_country = nobel_winners['birth_country'].value_counts().index[0]
top_country

'United States of America'
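
The two cells above rely on `value_counts()` returning counts sorted from most to least frequent, so the first index label is the most common value. A minimal sketch on invented labels (the `labels` Series is illustrative, not data from `nobel.csv`) also shows that missing values, which rows without a recorded `sex` may carry, are skipped by default:

```python
import pandas as pd

# Toy labels, invented for illustration; None stands in for a missing value
labels = pd.Series(["Male", "Female", "Male", None, "Male"])

# value_counts sorts by frequency (descending) and skips missing values by default
counts = labels.value_counts()

# The first index label is therefore the most frequent value
top_label = counts.index[0]
print(top_label)  # Male
```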

In [7]:
year_cat = nobel_winners.groupby(['year', 'category']).value_counts()
year_cat

year  category   prize                                                  motivation                                                                                                                                                                                                                                          prize_share  laureate_id  laureate_type  full_name                     birth_date  birth_city          birth_country             sex   organization_name         organization_city  organization_country      death_date  death_city    death_country           
1901  Chemistry  The Nobel Prize in Chemistry 1901                      "in recognition of the extraordinary services he has rendered by the discovery of the laws of chemical dynamics and osmotic pressure in solutions"                                                                                                  1/1          160          Individual     Jacobus Henricus van 't Hoff  1852-08-30  Rotterdam           Netherla

In [8]:
# Floor division by 10, then multiplying by 10, gives the starting year of each decade
nobel_winners['decade'] = (nobel_winners['year'] // 10) * 10
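
Integer floor division maps each year to the start of its decade. A minimal sketch on a few hand-picked years (the `years` Series is illustrative, not rows from `nobel.csv`):

```python
import pandas as pd

# Illustrative years only; not rows from nobel.csv
years = pd.Series([1901, 1909, 1910, 1999, 2023])

# Floor-divide by 10 to drop the last digit, then multiply back by 10
decades = (years // 10) * 10

print(decades.tolist())  # [1900, 1900, 1910, 1990, 2020]
```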

In [9]:
nobel_winners['USA'] = nobel_winners['birth_country'] == top_country

In [10]:
nobel_winners.head()

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country,decade,USA
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany,1900,False
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France,1900,False
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany,1900,False
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland,1900,False
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France,1900,False


In [11]:
usa_decades = nobel_winners.groupby('decade', as_index=False).agg({'USA': 'mean'})

In [12]:
usa_decades

Unnamed: 0,decade,USA
0,1900,0.017544
1,1910,0.075
2,1920,0.074074
3,1930,0.25
4,1940,0.302326
5,1950,0.291667
6,1960,0.265823
7,1970,0.317308
8,1980,0.319588
9,1990,0.403846


In [13]:
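# Important: a boolean filter selects the row(s) where 'USA' equals the column maximum.
# int() converts the NumPy integer to a plain Python int, as the instructions ask for.

max_decade_usa = int(usa_decades[usa_decades['USA'] == usa_decades['USA'].max()]['decade'].values[0])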

In [14]:
max_decade_usa

2000
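
As an alternative to the boolean filter, `idxmax()` can locate the row with the largest value directly. The sketch below uses invented proportions in a hypothetical `toy` frame, not the notebook's results:

```python
import pandas as pd

# Toy proportions, invented for illustration; not the notebook's results
toy = pd.DataFrame({
    "decade": [1980, 1990, 2000],
    "USA": [0.25, 0.5, 0.75],
})

# idxmax returns the index label of the largest value in the column
best_row = toy["USA"].idxmax()

# .loc with a row label and a column name returns a scalar;
# int() converts the NumPy integer to a plain Python int
best_decade = int(toy.loc[best_row, "decade"])

print(best_decade)  # 2000
```

If several rows tie for the maximum, `idxmax()` returns the first one, whereas the boolean filter keeps all of them.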

In [15]:
nobel_winners_fem = nobel_winners[nobel_winners['sex'] == 'Female']

In [16]:
nobel_winners['female'] = nobel_winners['sex'] == 'Female'

In [17]:
fem_per_cat = nobel_winners.groupby(['decade', 'category'], as_index=False)['female'].mean()

In [18]:
fem_per_cat

Unnamed: 0,decade,category,female
0,1900,Chemistry,0.000000
1,1900,Literature,0.100000
2,1900,Medicine,0.000000
3,1900,Peace,0.071429
4,1900,Physics,0.076923
...,...,...,...
67,2020,Economics,0.111111
68,2020,Literature,0.500000
69,2020,Medicine,0.125000
70,2020,Peace,0.285714


In [19]:
max_decade_fem_cat = fem_per_cat[fem_per_cat['female'] == fem_per_cat['female'].max()]
max_decade_fem_cat

Unnamed: 0,decade,category,female
68,2020,Literature,0.5


In [20]:
# int() turns the NumPy integer key into a plain Python int
max_female_dict = {int(max_decade_fem_cat['decade'].values[0]): max_decade_fem_cat['category'].values[0]}
max_female_dict

{2020: 'Literature'}
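
The steps above (a boolean flag, a grouped mean, then the row with the maximum) can be sketched end to end on a small invented frame. The `toy` rows and the `toy_dict` name are hypothetical and chosen only so the arithmetic is easy to check by hand:

```python
import pandas as pd

# Toy rows, invented for illustration; not taken from nobel.csv
toy = pd.DataFrame({
    "decade":   [2010, 2010, 2010, 2020, 2020, 2020],
    "category": ["Peace", "Peace", "Physics", "Peace", "Physics", "Physics"],
    "female":   [True, False, False, True, False, False],
})

# The mean of a boolean column is the share of True values in each group
share = toy.groupby(["decade", "category"], as_index=False)["female"].mean()

# Pick the single row with the largest share
top = share.loc[share["female"].idxmax()]

# Build a one-item dict: decade -> category
toy_dict = {int(top["decade"]): top["category"]}
print(toy_dict)  # {2020: 'Peace'}
```

Here the 2020 Peace group has one laureate, flagged `True`, so its share is 1.0 and it wins over the 0.5 share of 2010 Peace.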

In [21]:
year_woman_first = nobel_winners_fem[nobel_winners_fem['year'] == nobel_winners_fem['year'].min()]
year_woman_first.reset_index()

Unnamed: 0,index,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,...,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country,decade,USA
0,19,1903,Physics,The Nobel Prize in Physics 1903,"""in recognition of the extraordinary services ...",1/4,6,Individual,"Marie Curie, née Sklodowska",1867-11-07,...,Russian Empire (Poland),Female,,,,1934-07-04,Sallanches,France,1900,False


In [22]:
first_woman_name = year_woman_first['full_name'].values[0]
first_woman_name

'Marie Curie, née Sklodowska'

In [23]:
first_woman_category = year_woman_first['category'].values[0]
first_woman_category

'Physics'
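
Finding the earliest female laureate amounts to filtering on `sex` and then taking the row with the smallest `year`. A minimal sketch with placeholder names (the `toy` frame is invented, and `idxmin()` is one possible approach rather than the one used above):

```python
import pandas as pd

# Toy rows with placeholder names; not taken from nobel.csv
toy = pd.DataFrame({
    "year": [1901, 1905, 1903, 1911],
    "category": ["Physics", "Peace", "Chemistry", "Chemistry"],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate D"],
    "sex": ["Male", "Female", "Female", "Female"],
})

# Keep only the female laureates
women = toy[toy["sex"] == "Female"]

# idxmin gives the index label of the earliest year; .loc fetches that row
first = women.loc[women["year"].idxmin()]
name, category = first["full_name"], first["category"]

print(name, category)  # Laureate C Chemistry
```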

In [24]:
nobel_winners_mult = nobel_winners.groupby('laureate_id', as_index=False)['laureate_id'].value_counts()
nobel_winners_mult

Unnamed: 0,laureate_id,count
0,1,1
1,2,1
2,3,1
3,4,1
4,5,1
...,...,...
987,1030,1
988,1031,1
989,1032,1
990,1033,1


In [25]:
nobel_laureate_repeat = nobel_winners_mult[nobel_winners_mult['count'] > 1]
nobel_laureate_repeat

Unnamed: 0,laureate_id,count
5,6,2
64,66,2
212,217,2
217,222,2
475,482,3
507,515,2
720,743,2


In [26]:

nobel_winners_repeat = nobel_winners[nobel_winners['laureate_id'].isin(nobel_laureate_repeat['laureate_id'])]['full_name']

In [27]:
nobel_winners_repeat

19                           Marie Curie, née Sklodowska
62                           Marie Curie, née Sklodowska
89     Comité international de la Croix Rouge (Intern...
215    Comité international de la Croix Rouge (Intern...
278                                   Linus Carl Pauling
283    Office of the United Nations High Commissioner...
298                                         John Bardeen
306                                     Frederick Sanger
340                                   Linus Carl Pauling
348    Comité international de la Croix Rouge (Intern...
424                                         John Bardeen
505                                     Frederick Sanger
523    Office of the United Nations High Commissioner...
721                                   K. Barry Sharpless
975                                      Barry Sharpless
Name: full_name, dtype: object

In [28]:
# Unify the two spellings of the same laureate so the name is counted once
nobel_repeat = nobel_winners_repeat.replace('Barry Sharpless', 'K. Barry Sharpless')

In [29]:
repeat_list = list(set(nobel_repeat))

In [30]:
repeat_list

['Marie Curie, née Sklodowska',
 'John Bardeen',
 'K. Barry Sharpless',
 'Frederick Sanger',
 'Office of the United Nations High Commissioner for Refugees (UNHCR)',
 'Linus Carl Pauling',
 'Comité international de la Croix Rouge (International Committee of the Red Cross)']

In [32]:
# Note: the DataCamp project solution lists 6 names for repeat_list, but there are actually 7.
# One laureate, K. Barry Sharpless, has his name spelled differently for each of his 2 wins.
# Tallying by laureate_id instead gives the correct answer of 7.
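
One way to avoid hard-coding a spelling fix is to de-duplicate on `laureate_id` rather than on the name. The sketch below uses invented rows with placeholder names (the `toy` frame and `toy_repeat_list` are hypothetical); ID 2 appears under two spellings, mirroring the situation described in the note:

```python
import pandas as pd

# Toy rows with placeholder names; laureate_id 2 appears under two spellings
toy = pd.DataFrame({
    "laureate_id": [1, 2, 3, 2, 1, 4],
    "full_name": ["Person A", "B. Person", "Org C", "Person B.", "Person A", "Person D"],
})

# Count prizes per ID; size() counts the rows in each group
counts = toy.groupby("laureate_id").size()
repeat_ids = counts[counts > 1].index

# Keep one name per repeated ID (the first spelling encountered)
toy_repeat_list = (
    toy[toy["laureate_id"].isin(repeat_ids)]
    .drop_duplicates("laureate_id")["full_name"]
    .tolist()
)
print(toy_repeat_list)  # ['Person A', 'B. Person']
```

Because the ID is the key, each repeat winner contributes exactly one name regardless of how the name is spelled in later rows.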