# Nobel Prize Analysis

In [2]:
import pandas as pd
import numpy as np
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt

### Data Exploration & Cleaning

In [4]:
data = pd.read_csv('nobel_prize_data.csv')

In [5]:
data.sample(5)

,year,category,prize,motivation,prize_share,laureate_type,full_name,birth_date,birth_city,birth_country,birth_country_current,sex,organization_name,organization_city,organization_country,ISO
486,1978,Medicine,The Nobel Prize in Physiology or Medicine 1978,"""for the discovery of restriction enzymes and ...",1/3,Individual,Daniel Nathans,1928-10-30,"Wilmington, DE",United States of America,United States of America,Male,Johns Hopkins University,"Baltimore, MD",United States of America,USA
146,1929,Literature,The Nobel Prize in Literature 1929,"""principally for his great novel, <I>Buddenbro...",1/1,Individual,Thomas Mann,1875-06-06,Lübeck,Germany,Germany,Male,,,,DEU
861,2012,Physics,The Nobel Prize in Physics 2012,"""for ground-breaking experimental methods that...",1/2,Individual,David J. Wineland,1944-02-24,"Milwaukee, WI",United States of America,United States of America,Male,National Institute of Standards and Technology,"Boulder, CO",United States of America,USA
416,1972,Chemistry,The Nobel Prize in Chemistry 1972,"""for his work on ribonuclease, especially conc...",1/2,Individual,Christian B. Anfinsen,1916-03-26,"Monessen, PA",United States of America,United States of America,Male,National Institutes of Health,"Bethesda, MD",United States of America,USA
82,1915,Chemistry,The Nobel Prize in Chemistry 1915,"""for his researches on plant pigments, especia...",1/1,Individual,Richard Martin Willstätter,1872-08-13,Karlsruhe,Germany,Germany,Male,Munich University,Munich,Germany,DEU


In [6]:
data.shape

(962, 16)

In [7]:
data[data.isna().any(axis=1)]

,year,category,prize,motivation,prize_share,laureate_type,full_name,birth_date,birth_city,birth_country,birth_country_current,sex,organization_name,organization_city,organization_country,ISO
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,Individual,Sully Prudhomme,1839-03-16,Paris,France,France,Male,,,,FRA
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,Individual,Frédéric Passy,1822-05-20,Paris,France,France,Male,,,,FRA
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Switzerland,Male,,,,CHE
7,1902,Literature,The Nobel Prize in Literature 1902,"""the greatest living master of the art of hist...",1/1,Individual,Christian Matthias Theodor Mommsen,1817-11-30,Garding,Schleswig (Germany),Germany,Male,,,,DEU
9,1902,Peace,The Nobel Peace Prize 1902,,1/2,Individual,Charles Albert Gobat,1843-05-21,Tramelan,Switzerland,Switzerland,Male,,,,CHE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
942,2019,Literature,The Nobel Prize in Literature 2019,“for an influential work that with linguistic ...,1/1,Individual,Peter Handke,1942-12-06,Griffen,Austria,Austria,Male,,,,AUT
946,2019,Peace,The Nobel Peace Prize 2019,“for his efforts to achieve peace and internat...,1/1,Individual,Abiy Ahmed Ali,1976-08-15,Beshasha,Ethiopia,Ethiopia,Male,,,,ETH
954,2020,Literature,The Nobel Prize in Literature 2020,“for her unmistakable poetic voice that with a...,1/1,Individual,Louise Glück,1943-04-22,"New York, NY",United States of America,United States of America,Female,,,,USA
957,2020,Medicine,The Nobel Prize in Physiology or Medicine 2020,“for the discovery of Hepatitis C virus”,1/3,Individual,Michael Houghton,1949-07-02,,United Kingdom,United Kingdom,Male,University of Alberta,Edmonton,Canada,GBR


In [8]:
data.duplicated(['full_name'], keep=False).value_counts()

False    949
True      13
Name: count, dtype: int64

In [9]:
data.loc[data.duplicated(['full_name'], keep=False)].sort_values('full_name')

,year,category,prize,motivation,prize_share,laureate_type,full_name,birth_date,birth_city,birth_country,birth_country_current,sex,organization_name,organization_city,organization_country,ISO
89,1917,Peace,The Nobel Peace Prize 1917,,1/1,Organization,Comité international de la Croix Rouge (Intern...,,,,,,,,,
215,1944,Peace,The Nobel Peace Prize 1944,,1/1,Organization,Comité international de la Croix Rouge (Intern...,,,,,,,,,
348,1963,Peace,The Nobel Peace Prize 1963,,1/2,Organization,Comité international de la Croix Rouge (Intern...,,,,,,,,,
306,1958,Chemistry,The Nobel Prize in Chemistry 1958,"""for his work on the structure of proteins, es...",1/1,Individual,Frederick Sanger,1918-08-13,Rendcombe,United Kingdom,United Kingdom,Male,University of Cambridge,Cambridge,United Kingdom,GBR
505,1980,Chemistry,The Nobel Prize in Chemistry 1980,"""for their contributions concerning the determ...",1/4,Individual,Frederick Sanger,1918-08-13,Rendcombe,United Kingdom,United Kingdom,Male,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom,GBR
297,1956,Physics,The Nobel Prize in Physics 1956,"""for their researches on semiconductors and th...",1/3,Individual,John Bardeen,1908-05-23,"Madison, WI",United States of America,United States of America,Male,University of Illinois,"Urbana, IL",United States of America,USA
424,1972,Physics,The Nobel Prize in Physics 1972,"""for their jointly developed theory of superco...",1/3,Individual,John Bardeen,1908-05-23,"Madison, WI",United States of America,United States of America,Male,University of Illinois,"Urbana, IL",United States of America,USA
278,1954,Chemistry,The Nobel Prize in Chemistry 1954,"""for his research into the nature of the chemi...",1/1,Individual,Linus Carl Pauling,1901-02-28,"Portland, OR",United States of America,United States of America,Male,California Institute of Technology (Caltech),"Pasadena, CA",United States of America,USA
340,1962,Peace,The Nobel Peace Prize 1962,,1/1,Individual,Linus Carl Pauling,1901-02-28,"Portland, OR",United States of America,United States of America,Male,California Institute of Technology (Caltech),"Pasadena, CA",United States of America,USA
18,1903,Physics,The Nobel Prize in Physics 1903,"""in recognition of the extraordinary services ...",1/4,Individual,"Marie Curie, née Sklodowska",1867-11-07,Warsaw,Russian Empire (Poland),Poland,Female,,,,POL


In [10]:
data.columns

Index(['year', 'category', 'prize', 'motivation', 'prize_share',
       'laureate_type', 'full_name', 'birth_date', 'birth_city',
       'birth_country', 'birth_country_current', 'sex', 'organization_name',
       'organization_city', 'organization_country', 'ISO'],
      dtype='object')

In [11]:
data.sample()

,year,category,prize,motivation,prize_share,laureate_type,full_name,birth_date,birth_city,birth_country,birth_country_current,sex,organization_name,organization_city,organization_country,ISO
142,1928,Medicine,The Nobel Prize in Physiology or Medicine 1928,"""for his work on typhus""",1/1,Individual,Charles Jules Henri Nicolle,1866-09-21,Rouen,France,France,Male,Institut Pasteur,Tunis,Tunisia,FRA


In [12]:
data.dtypes

year                      int64
category                 object
prize                    object
motivation               object
prize_share              object
laureate_type            object
full_name                object
birth_date               object
birth_city               object
birth_country            object
birth_country_current    object
sex                      object
organization_name        object
organization_city        object
organization_country     object
ISO                      object
dtype: object

In [13]:
data.year.min()

1901

In [14]:
data.year.max()

2020

In [16]:
data.birth_date = pd.to_datetime(data.birth_date, format="%Y-%m-%d")

In [22]:
data.dtypes

year                              int64
category                         object
prize                            object
motivation                       object
prize_share                      object
laureate_type                    object
full_name                        object
birth_date               datetime64[ns]
birth_city                       object
birth_country                    object
birth_country_current            object
sex                              object
organization_name                object
organization_city                object
organization_country             object
ISO                              object
dtype: object

In [84]:
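# Split each share such as "1/4" into numerator and denominator columns
parts = data.prize_share.astype(str).str.split('/', expand=True).astype(int)
# Divide element-wise to get one decimal share per laureate (e.g. 0.25)
data["share_pct"] = (parts[0] / parts[1]).round(2)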

In [90]:
data.sample()

,year,category,prize,motivation,prize_share,laureate_type,full_name,birth_date,birth_city,birth_country,birth_country_current,sex,organization_name,organization_city,organization_country,ISO,share_pct
492,1978,Physics,The Nobel Prize in Physics 1978,"""for his basic inventions and discoveries in t...",1/2,Individual,Pyotr Leonidovich Kapitsa,1894-07-09,Kronshtadt,Russian Empire (Russia),Russia,Male,Academy of Sciences,Moscow,Russia,RUS,0.5
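Because each category is awarded at most once a year, the shares of all laureates for one `(year, category)` prize should add up to 1. A minimal sanity check is sketched below on a small hypothetical sample rather than the real CSV; that every group sums to exactly 1 is an assumption to verify against the full data. It parses `prize_share` with `fractions.Fraction` instead of using the rounded `share_pct`, since 1/3 rounds to 0.33 and three such shares would sum to 0.99.

```python
from fractions import Fraction

import pandas as pd

# Hypothetical sample mirroring the year/category/prize_share columns
sample = pd.DataFrame({
    "year": [1901, 1901, 1978, 1978, 1978, 1980, 1980, 1980],
    "category": ["Peace", "Peace", "Medicine", "Medicine", "Medicine",
                 "Chemistry", "Chemistry", "Chemistry"],
    "prize_share": ["1/2", "1/2", "1/3", "1/3", "1/3", "1/2", "1/4", "1/4"],
})

# Fraction("1/3") parses the string exactly, so three thirds sum to exactly 1
sample["share_exact"] = sample["prize_share"].map(Fraction)

# One prize per (year, category): add up the shares of its laureates
totals = sample.groupby(["year", "category"])["share_exact"].apply(
    lambda shares: sum(shares, Fraction(0))
)

# Prizes whose shares do not add up to 1 (expected to be empty)
bad = totals[totals != 1]
print(bad.empty)  # True
```

On the real data, replacing `sample` with `data` would list any prize whose recorded shares look inconsistent.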


- There are no true duplicate records. The 13 rows flagged when checking `full_name` belong to laureates (individuals or organisations) who won prizes in more than one year, so each of their awards appears as a separate row.
- Missing (NaN) values in personal fields such as `birth_date`, `birth_city` and `sex` mostly occur because the prize went to an organisation, which has no birth details. An empty `organization_name` indicates a laureate who was not affiliated with a university or research institute, e.g. many Literature and Peace Prize winners. Several Peace Prize entries also have no `motivation` recorded.
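Both observations can be checked in code rather than by eye. The sketch below uses a small hypothetical sample (invented names, not rows from the CSV): it confirms that rows sharing a `full_name` differ in `year`, so they are repeat winners rather than duplicated records, and counts missing `birth_date` values per `laureate_type`.

```python
import pandas as pd

# Hypothetical rows mimicking four columns of nobel_prize_data.csv
sample = pd.DataFrame({
    "year": [1903, 1911, 1917, 1944, 1929],
    "full_name": ["Person A", "Person A", "Org B", "Org B", "Person C"],
    "laureate_type": ["Individual", "Individual", "Organization",
                      "Organization", "Individual"],
    "birth_date": ["1867-11-07", "1867-11-07", None, None, "1875-06-06"],
})
# Missing dates become NaT, as in the notebook's own conversion
sample["birth_date"] = pd.to_datetime(sample["birth_date"], format="%Y-%m-%d")

# keep=False flags every row whose full_name appears more than once
repeats = sample[sample.duplicated(["full_name"], keep=False)]

# A true duplicate would repeat the year as well; repeat winners do not
true_dups = int(repeats.duplicated(["full_name", "year"]).sum())
print(true_dups)  # 0

# Count missing birth dates for each laureate type
missing_by_type = sample["birth_date"].isna().groupby(sample["laureate_type"]).sum()
print(missing_by_type)
```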

## Exploratory questions
1. How many of each prize were awarded?
2. What is the most common prize? What is the least common prize?
3. What is the split of all prizes awarded to females vs. males?
4. What were the names of the first 3 female Nobel laureates?
5. What did the first 3 female laureates win prizes for?
6. What were the names of the first 3 male Nobel laureates?
7. What did the first 3 male laureates win prizes for?
8. How many people won prizes more than once?
9. How many organisation
10. In how many categories are prizes awarded?
11. Which category has the most number of prizes awarded?
12. Which category has the fewest number of prizes awarded?
13. What is the distribution of prizes awarded by country?
14. What is the split of prizes between organisations and individuals?
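Most of these questions reduce to counting or filtering a single column. A minimal sketch, run on a hypothetical mini-sample that only borrows the real column names (the names and rows are invented): `value_counts` covers the category, sex and laureate-type splits, and sorting by `year` before filtering gives the first female laureates. One assumption worth checking on the real data is what "number of prizes" means: each row is one laureate, so a prize shared three ways appears three times, while counting distinct `prize` values counts it once.

```python
import pandas as pd

# Hypothetical mini-sample with the same column names as the real data
sample = pd.DataFrame({
    "year": [1901, 1901, 1903, 1905, 1909, 1911, 1917],
    "category": ["Peace", "Peace", "Physics", "Peace", "Literature",
                 "Chemistry", "Peace"],
    "prize": ["Peace 1901", "Peace 1901", "Physics 1903", "Peace 1905",
              "Literature 1909", "Chemistry 1911", "Peace 1917"],
    "full_name": ["Person A", "Person B", "Person C", "Person D",
                  "Person E", "Person C", "Org F"],
    "sex": ["Male", "Male", "Female", "Female", "Female", "Female", None],
    "laureate_type": ["Individual"] * 6 + ["Organization"],
})

# Q1: rows per category (a prize shared by three laureates counts three times)
rows_per_category = sample["category"].value_counts()

# Distinct prizes per category, so a shared prize counts only once
prizes_per_category = sample.groupby("category")["prize"].nunique()

# Q2, Q10-12: number of categories and the most/least awarded one
# (idxmax/idxmin return the first label when several categories tie)
n_categories = sample["category"].nunique()
most_common = prizes_per_category.idxmax()
least_common = prizes_per_category.idxmin()

# Q3: split by sex; value_counts drops organisations' missing sex by default
sex_split = sample["sex"].value_counts()

# Q4: first three distinct female laureates in award order
first_women = (
    sample[sample["sex"] == "Female"]
    .sort_values("year")
    .drop_duplicates("full_name")
    .head(3)["full_name"]
    .tolist()
)

# Q8: individuals who appear in more than one row
individuals = sample[sample["laureate_type"] == "Individual"]
repeat_winners = individuals.loc[
    individuals.duplicated("full_name"), "full_name"
].nunique()

# Q14: organisations vs individuals
type_split = sample["laureate_type"].value_counts()
```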