The Nobel Prize has been among the most prestigious international awards since 1901. Each year, awards are bestowed in chemistry, literature, physics, physiology or medicine, economics, and peace. In addition to the honor, prestige, and substantial prize money, the recipient also gets a gold medal with an image of Alfred Nobel (1833 - 1896), who established the prize.

![](Nobel_Prize.png)

The Nobel Foundation has made available a dataset of all prize winners, from the first awards in 1901 through 2023. The dataset used in this project comes from the Nobel Prize API and is stored in the `nobel.csv` file in the `data` folder.

In this project, you'll get a chance to explore and answer several questions about this prizewinning data. We then encourage you to explore further questions that interest you!

In [1]:
# Loading in required libraries
import pandas as pd
import seaborn as sns
import numpy as np

# Start coding here!

In [2]:
import os

# Get the current working directory
current_folder = os.getcwd()
current_folder

'G:\\My Drive\\Ingegneria\\Data Science GD\\Models\\Business cases\\Nobel Prize winners'

In [3]:
# Get all files and directories in the current directory
content = os.listdir(current_folder)
content

['data', 'Nobel_Prize.png', 'notebook.ipynb', '.ipynb_checkpoints']

In [4]:
# Build the path with os.path.join so separators match the operating system
nobel_df = pd.read_csv(os.path.join(current_folder, 'data', 'nobel.csv'))
nobel_df

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,2023,Chemistry,The Nobel Prize in Chemistry 2023,"""for the discovery and synthesis of quantum dots""",1/3,1030,Individual,Louis Brus,1943-00-00,"Cleveland, OH",United States of America,Male,Columbia University,"New York, NY",United States of America,,,
996,2023,Chemistry,The Nobel Prize in Chemistry 2023,"""for the discovery and synthesis of quantum dots""",1/3,1031,Individual,Aleksey Yekimov,1945-00-00,,USSR (now Russia),Male,Nanocrystals Technology Inc.,"New York, NY",United States of America,,,
997,2023,Literature,The Nobel Prize in Literature 2023,"""for his innovative plays and prose which give...",1/1,1032,Individual,Jon Fosse,1959-09-29,Haugesund,Norway,Male,,,,,,
998,2023,Peace,The Nobel Peace Prize 2023,"""for her fight against the oppression of women...",1/1,1033,Individual,Narges Mohammadi,1972-04-21,Zanjan,Iran,Female,,,,,,


In [5]:
top_gender = nobel_df.sex.value_counts().idxmax()
top_gender

'Male'

In [6]:
top_country = nobel_df.birth_country.value_counts().idxmax()
top_country

'United States of America'

In [7]:
# max_decade = int(str(nobel_df.year.max())[:-1]+'9')
# min_decade = int(str(nobel_df.year.min())[:-1]+'0')
# min_decade, max_decade

In [8]:
# Floor each year to the start of its decade (e.g. 1901 -> 1900)
nobel_df['decade'] = nobel_df.year // 10 * 10
nobel_df

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country,decade
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany,1900
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France,1900
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany,1900
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland,1900
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France,1900
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,2023,Chemistry,The Nobel Prize in Chemistry 2023,"""for the discovery and synthesis of quantum dots""",1/3,1030,Individual,Louis Brus,1943-00-00,"Cleveland, OH",United States of America,Male,Columbia University,"New York, NY",United States of America,,,,2020
996,2023,Chemistry,The Nobel Prize in Chemistry 2023,"""for the discovery and synthesis of quantum dots""",1/3,1031,Individual,Aleksey Yekimov,1945-00-00,,USSR (now Russia),Male,Nanocrystals Technology Inc.,"New York, NY",United States of America,,,,2020
997,2023,Literature,The Nobel Prize in Literature 2023,"""for his innovative plays and prose which give...",1/1,1032,Individual,Jon Fosse,1959-09-29,Haugesund,Norway,Male,,,,,,,2020
998,2023,Peace,The Nobel Peace Prize 2023,"""for her fight against the oppression of women...",1/1,1033,Individual,Narges Mohammadi,1972-04-21,Zanjan,Iran,Female,,,,,,,2020


In [9]:
winnersXdecade = nobel_df.groupby('decade')['laureate_id'].count()
winnersXdecade

decade
1900     57
1910     40
1920     54
1930     56
1940     43
1950     72
1960     79
1970    104
1980     97
1990    104
2000    123
2010    121
2020     50
Name: laureate_id, dtype: int64

In [10]:
USwinnersXdecade = nobel_df[nobel_df.birth_country == 'United States of America'].groupby('decade')['laureate_id'].count()
USwinnersXdecade

decade
1900     1
1910     3
1920     4
1930    14
1940    13
1950    21
1960    21
1970    33
1980    31
1990    42
2000    52
2010    38
2020    18
Name: laureate_id, dtype: int64

In [11]:
# Share of US-born winners among all winners, per decade
us_share = USwinnersXdecade / winnersXdecade
print(us_share.max())

max_decade_usa = us_share.idxmax()
max_decade_usa

0.42276422764227645


2000
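
The ratio of the two grouped counts above can also be obtained in one step, by averaging a boolean flag within each decade. The following is a minimal sketch of that idea, not the notebook's own method: the `toy` frame holds a few made-up rows (not taken from `nobel.csv`), and the `usa_born` column name is an assumption introduced for illustration.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT real rows from nobel.csv.
toy = pd.DataFrame({
    "year": [1901, 1905, 1912, 1915, 1918],
    "birth_country": [
        "France",
        "United States of America",
        "Germany",
        "United States of America",
        "United States of America",
    ],
})

# Same decade construction as in the notebook.
toy["decade"] = toy["year"] // 10 * 10

# Hypothetical helper column: True where the laureate was born in the USA.
toy["usa_born"] = toy["birth_country"] == "United States of America"

# The mean of a boolean column is the fraction of True values per group.
usa_share = toy.groupby("decade")["usa_born"].mean()

best_decade = usa_share.idxmax()
best_share = usa_share.max()
print(best_decade, best_share)
```

Rows with a missing `birth_country` compare as `False` here, so they count as non-US winners, which matches how the two-count division treats them.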

In [12]:
female_count = nobel_df[nobel_df.sex == 'Female'].groupby(['category','decade'])['laureate_id'].count()
female_count

category    decade
Chemistry   1910      1
            1930      1
            1960      1
            2000      1
            2010      1
            2020      3
Economics   2000      1
            2010      1
            2020      1
Literature  1900      1
            1920      2
            1930      1
            1940      1
            1960      1
            1990      3
            2000      3
            2010      3
            2020      2
Medicine    1940      1
            1970      1
            1980      3
            1990      1
            2000      4
            2010      2
            2020      1
Peace       1900      1
            1930      1
            1940      1
            1970      3
            1980      1
            1990      3
            2000      2
            2010      5
            2020      2
Physics     1900      1
            1960      1
            2010      1
            2020      2
Name: laureate_id, dtype: int64

In [13]:
all_count = nobel_df.groupby(['category','decade'])['laureate_id'].count()
all_count

category   decade
Chemistry  1900       9
           1910       8
           1920      10
           1930      13
           1940       9
                     ..
Physics    1980      22
           1990      22
           2000      28
           2010      26
           2020      12
Name: laureate_id, Length: 72, dtype: int64

In [14]:
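# Share of female laureates per category and decade; combinations with no
# female laureates are absent from female_count, so they come out as NaN
fem_prop_count = female_count / all_count
fem_prop_count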

category   decade
Chemistry  1900           NaN
           1910      0.125000
           1920           NaN
           1930      0.076923
           1940           NaN
                       ...   
Physics    1980           NaN
           1990           NaN
           2000           NaN
           2010      0.038462
           2020      0.166667
Name: laureate_id, Length: 72, dtype: float64

In [15]:
max_female_tuple = fem_prop_count.idxmax()
max_female_tuple

('Literature', 2020)

In [16]:
max_female_dict = {max_female_tuple[1] : max_female_tuple[0]}
max_female_dict

{2020: 'Literature'}
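
Dividing the two grouped counts leaves `NaN` wherever a category and decade had no female laureates. `idxmax` skips `NaN` by default, so the result is unaffected, but an explicit `0.0` is sometimes easier to work with. A possible sketch, again on made-up rows rather than the real dataset (the `female_winner` column is an assumed helper name):

```python
import pandas as pd

# Toy rows for illustration only; these are NOT real rows from nobel.csv.
toy = pd.DataFrame({
    "year": [1901, 1903, 1905, 1911, 1912, 1913],
    "category": ["Physics", "Physics", "Physics",
                 "Chemistry", "Chemistry", "Physics"],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Male"],
})
toy["decade"] = toy["year"] // 10 * 10

# Hypothetical helper column: True for female laureates.
toy["female_winner"] = toy["sex"] == "Female"

# Averaging the flag gives the female share, with 0.0 instead of NaN
# for groups that contain no female laureates.
female_share = toy.groupby(["category", "decade"])["female_winner"].mean()

# idxmax on a MultiIndex returns a (category, decade) tuple.
top_category, top_decade = female_share.idxmax()
max_female = {int(top_decade): top_category}
print(max_female)
```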

In [17]:
# Female laureates only, ordered by award year (earliest first)
nobel_df_w = nobel_df[nobel_df.sex == 'Female'][['year', 'category', 'full_name']].sort_values('year')
nobel_df_w

Unnamed: 0,year,category,full_name
19,1903,Physics,"Marie Curie, née Sklodowska"
29,1905,Peace,"Baroness Bertha Sophie Felicita von Suttner, n..."
51,1909,Literature,Selma Ottilia Lovisa Lagerlöf
62,1911,Chemistry,"Marie Curie, née Sklodowska"
128,1926,Literature,Grazia Deledda
...,...,...,...
982,2022,Literature,Annie Ernaux
993,2023,Physics,Anne L’Huillier
998,2023,Peace,Narges Mohammadi
989,2023,Medicine,Katalin Karikó


In [18]:
first_woman_name = nobel_df_w.full_name.iloc[0]
first_woman_name

'Marie Curie, née Sklodowska'

In [19]:
first_woman_category = nobel_df_w.category.iloc[0]
first_woman_category

'Physics'
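
Sorting the whole frame and reading its first row works; another option is to look up the row with the smallest year directly. The sketch below shows that alternative on made-up rows (the laureate names are placeholders, not real data). If several women shared the earliest year, `idxmin` would return the first such row in the frame's order.

```python
import pandas as pd

# Toy rows for illustration only; names and values are made up.
toy = pd.DataFrame({
    "year": [1901, 1903, 1905, 1911],
    "category": ["Peace", "Physics", "Peace", "Chemistry"],
    "sex": ["Male", "Female", "Female", "Female"],
    "full_name": ["Laureate A", "Laureate B", "Laureate C", "Laureate B"],
})

women = toy[toy["sex"] == "Female"]

# idxmin gives the index label of the earliest year; .loc fetches that row.
first_row = women.loc[women["year"].idxmin()]
first_name = first_row["full_name"]
first_category = first_row["category"]
print(first_name, first_category)
```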

In [20]:
# Count prizes per name once, then keep names that appear more than once
name_counts = nobel_df.full_name.value_counts()
count_nobel_df = name_counts[name_counts > 1]
count_nobel_df

Comité international de la Croix Rouge (International Committee of the Red Cross)    3
Linus Carl Pauling                                                                   2
John Bardeen                                                                         2
Frederick Sanger                                                                     2
Marie Curie, née Sklodowska                                                          2
Office of the United Nations High Commissioner for Refugees (UNHCR)                  2
Name: full_name, dtype: int64

In [21]:
repeat_list = list(count_nobel_df.index)
repeat_list

['Comité international de la Croix Rouge (International Committee of the Red Cross)',
 'Linus Carl Pauling',
 'John Bardeen',
 'Frederick Sanger',
 'Marie Curie, née Sklodowska',
 'Office of the United Nations High Commissioner for Refugees (UNHCR)']
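
Counting by `full_name` relies on each laureate's name being spelled identically across rows. If `laureate_id` identifies a laureate uniquely (an assumption about the dataset that is not verified here), counting by id is a possible alternative. A minimal sketch on made-up rows:

```python
import pandas as pd

# Toy rows for illustration only; ids and names are made up.
toy = pd.DataFrame({
    "laureate_id": [1, 2, 1, 3, 2, 2],
    "full_name": ["Laureate A", "Organization B", "Laureate A",
                  "Laureate C", "Organization B", "Organization B"],
})

# Count prizes per id and keep the ids that occur more than once.
win_counts = toy["laureate_id"].value_counts()
repeat_ids = win_counts[win_counts > 1].index

# Map the repeated ids back to names (sorted for a stable order).
repeat_names = sorted(
    toy.loc[toy["laureate_id"].isin(repeat_ids), "full_name"].unique()
)
print(repeat_names)
```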