## 1. The most Nobel of Prizes
<p><img style="float: right;margin:5px 20px 5px 1px; max-width:250px" src="https://assets.datacamp.com/production/project_441/img/Nobel_Prize.png"></p>
<p>The Nobel Prize is perhaps the world's best-known scientific award. Aside from the honor, prestige, and substantial prize money, the recipient also gets a gold medal showing Alfred Nobel (1833 - 1896), who established the prize. Every year it's given to scientists and scholars in the categories chemistry, literature, physics, physiology or medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the Prize was very Eurocentric and male-focused, but nowadays it's not biased in any way whatsoever. Surely. Right?</p>
<p>Well, we're going to find out! The Nobel Foundation has made a dataset available of all prize winners from the start of the prize, in 1901, to 2016. Let's load it in and take a look.</p>

In [1]:
# import libraries
import pandas as pd
import numpy as np

In [2]:
# Read the nobel1, nobel2, category & nobel_demo files with the matching pandas reader (read_csv / read_excel)

nobel1 = pd.read_csv("nobel1.csv")
nobel_category = pd.read_csv("nobel_category_master.csv")
nobel2 = pd.read_excel("nobel2.xlsx")
nobel_demo = pd.read_excel("nobel_demo.xlsx")

In [3]:
# Use head & tail to inspect the basic features of the data
nobel1.head(5)

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country
0,19011,1901,1,The Nobel Prize in Chemistry 1901,44197,160,Individual,Berlin University,Berlin,Germany
1,19013,1901,3,The Nobel Prize in Literature 1901,44197,569,Individual,,,
2,19014,1901,4,The Nobel Prize in Physiology or Medicine 1901,44197,293,Individual,Marburg University,Marburg,Germany
3,19015,1901,5,The Nobel Peace Prize 1901,44228,462,Individual,,,
4,19015,1901,5,The Nobel Peace Prize 1901,44228,463,Individual,,,


In [4]:
nobel2.head(5)

Unnamed: 0,Primary_key,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country
0,20001,2000,1,The Nobel Prize in Chemistry 2000,44256,729,Individual,University of California,"Santa Barbara, CA",United States of America
1,20001,2000,1,The Nobel Prize in Chemistry 2000,44256,730,Individual,University of Pennsylvania,"Philadelphia, PA",United States of America
2,20001,2000,1,The Nobel Prize in Chemistry 2000,44256,731,Individual,University of Tsukuba,Tokyo,Japan
3,20002,2000,2,The Sveriges Riksbank Prize in Economic Scienc...,44228,732,Individual,University of Chicago,"Chicago, IL",United States of America
4,20002,2000,2,The Sveriges Riksbank Prize in Economic Scienc...,44228,733,Individual,University of California,"Berkeley, CA",United States of America


In [5]:
nobel1.tail(5)

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country
701,19993,1999,3,The Nobel Prize in Literature 1999,44197,676,Individual,,,
702,19994,1999,4,The Nobel Prize in Physiology or Medicine 1999,44197,461,Individual,Rockefeller University,"New York, NY",United States of America
703,19995,1999,5,The Nobel Peace Prize 1999,44197,568,Organization,,,
704,19996,1999,6,The Nobel Prize in Physics 1999,44228,158,Individual,Utrecht University,Utrecht,Netherlands
705,19996,1999,6,The Nobel Prize in Physics 1999,44228,159,Individual,,Bilthoven,Netherlands


In [6]:
nobel2.tail(5)

Unnamed: 0,Primary_key,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country
200,20164,2016,4,The Nobel Prize in Physiology or Medicine 2016,44197,927,Individual,Tokyo Institute of Technology,Tokyo,Japan
201,20165,2016,5,The Nobel Peace Prize 2016,44197,934,Individual,,,
202,20166,2016,6,The Nobel Prize in Physics 2016,44228,928,Individual,University of Washington,"Seattle, WA",United States of America
203,20166,2016,6,The Nobel Prize in Physics 2016,44287,929,Individual,Princeton University,"Princeton, NJ",United States of America
204,20166,2016,6,The Nobel Prize in Physics 2016,44287,930,Individual,Brown University,"Providence, RI",United States of America


In [7]:
nobel_demo.head(5)

Unnamed: 0,laureate_id,full_name,birth_date,birth_city,birth_country,sex,death_date,death_city,death_country
0,1,Wilhelm Conrad Röntgen,1845-03-27,Lennep (Remscheid),Prussia (Germany),Male,1923-02-10,Munich,Germany
1,2,Hendrik Antoon Lorentz,1853-07-18,Arnhem,Netherlands,Male,1928-02-04,,Netherlands
2,3,Pieter Zeeman,1865-05-25,Zonnemaire,Netherlands,Male,1943-10-09,Amsterdam,Netherlands
3,4,Antoine Henri Becquerel,1852-12-15,Paris,France,Male,1908-08-25,,France
4,5,Pierre Curie,1859-05-15,Paris,France,Male,1906-04-19,Paris,France


In [8]:
nobel_demo.tail(5)

Unnamed: 0,laureate_id,full_name,birth_date,birth_city,birth_country,sex,death_date,death_city,death_country
899,933,Bernard L. Feringa,1951-05-18 00:00:00,Barger-Compascuum,Netherlands,Male,NaT,,
900,934,Juan Manuel Santos,1951-08-10 00:00:00,Bogotá,Colombia,Male,NaT,,
901,935,Oliver Hart,1948-10-09 00:00:00,London,United Kingdom,Male,NaT,,
902,936,Bengt Holmström,1949-04-18 00:00:00,Helsinki,Finland,Male,NaT,,
903,937,Bob Dylan,1941-05-24 00:00:00,"Duluth, MN",United States of America,Male,NaT,,


In [9]:
nobel_category

Unnamed: 0,category,Category_master
0,Chemistry,1
1,Economics,2
2,Literature,3
3,Medicine,4
4,Peace,5
5,Physics,6


## 2. Create a unified dataset
<p>We can see that nobel1 and nobel2 are the same dataset split by year (1901 to 1999 and 2000 to 2016), so we should append one to the other. The nobel_demo data holds demographic details for each laureate, so we should merge it in.</p>

In [10]:
# Append nobel1 & nobel2 - rename Primary_key to ID so the columns line up
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
nobel = pd.concat([nobel1, nobel2.rename(columns={'Primary_key': 'ID'})])
nobel.head(5)

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country
0,19011,1901,1,The Nobel Prize in Chemistry 1901,44197,160,Individual,Berlin University,Berlin,Germany
1,19013,1901,3,The Nobel Prize in Literature 1901,44197,569,Individual,,,
2,19014,1901,4,The Nobel Prize in Physiology or Medicine 1901,44197,293,Individual,Marburg University,Marburg,Germany
3,19015,1901,5,The Nobel Peace Prize 1901,44228,462,Individual,,,
4,19015,1901,5,The Nobel Peace Prize 1901,44228,463,Individual,,,


In [11]:
# Merge nobel dataset & nobel_demo dataset using appropriate primary key
merged_table = pd.merge(nobel , nobel_demo, how='outer', on= 'laureate_id')
merged_table.head(5)

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country,full_name,birth_date,birth_city,birth_country,sex,death_date,death_city,death_country
0,19011,1901,1,The Nobel Prize in Chemistry 1901,44197,160,Individual,Berlin University,Berlin,Germany,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,1911-03-01,Berlin,Germany
1,19013,1901,3,The Nobel Prize in Literature 1901,44197,569,Individual,,,,Sully Prudhomme,1839-03-16,Paris,France,Male,1907-09-07,Châtenay,France
2,19014,1901,4,The Nobel Prize in Physiology or Medicine 1901,44197,293,Individual,Marburg University,Marburg,Germany,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,1917-03-31,Marburg,Germany
3,19015,1901,5,The Nobel Peace Prize 1901,44228,462,Individual,,,,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,1910-10-30,Heiden,Switzerland
4,19015,1901,5,The Nobel Peace Prize 1901,44228,463,Individual,,,,Frédéric Passy,1822-05-20,Paris,France,Male,1912-06-12,Paris,France


In [12]:
merged_table.shape

(911, 18)
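
An outer merge keeps unmatched rows from both sides, so it can be worth confirming that every prize row actually found its laureate. The following is a minimal sketch on made-up rows (the toy frames `prizes` and `demo` and their values are illustrative assumptions, not taken from the dataset), using the `indicator` and `validate` options of `pd.merge`:

```python
import pandas as pd

# Toy stand-ins for the prize and demographics tables (values are made up).
prizes = pd.DataFrame({"ID": [19011, 19013, 19015],
                       "laureate_id": [160, 569, 999]})
demo = pd.DataFrame({"laureate_id": [160, 569, 462],
                     "full_name": ["A", "B", "C"]})

# indicator=True adds a _merge column saying where each row came from;
# validate="many_to_one" raises if laureate_id is duplicated in demo.
checked = pd.merge(prizes, demo, how="outer", on="laureate_id",
                   indicator=True, validate="many_to_one")

# Rows found on only one side carry NaN in the other side's columns.
unmatched = checked[checked["_merge"] != "both"]
print(checked["_merge"].value_counts())
print(unmatched)
```

If `unmatched` is empty on the real tables, the outer merge behaves like an inner merge and no rows were silently padded with missing values.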

Good practice: delete the interim datasets that are no longer needed to free up memory.

In [13]:
# delete the tables no longer required

del nobel1, nobel2, nobel_demo

## 3. So, who gets the Nobel Prize?
<p>Just looking at the first couple of prize winners, or Nobel laureates as they are also called, we already see a celebrity: Wilhelm Conrad Röntgen, the guy who discovered X-rays. In fact, all of the winners in 1901 were men who came from Europe. But that was back in 1901. Looking at all winners in the dataset, from 1901 to 2016, which sex and which country are most commonly represented?</p>
<p>(For <em>country</em>, we will use the <code>birth_country</code> of the winner, as the <code>organization_country</code> is <code>NaN</code> for all shared Nobel Prizes.)</p>

In [14]:
# Display the number of prizes won by male and female recipients.
merged_table['sex'].value_counts() #to just get the count

Male      836
Female     49
Name: sex, dtype: int64

In [15]:
# Display the % of prizes won by male and female recipients.
merged_table['sex'].value_counts(normalize = True)*100 

Male      94.463277
Female     5.536723
Name: sex, dtype: float64

In [16]:
# Display the number of prizes won by the top 10 birth countries
merged_table['birth_country'].value_counts().head(10)

United States of America    259
United Kingdom               85
Germany                      61
France                       51
Sweden                       29
Japan                        24
Canada                       18
Netherlands                  18
Italy                        17
Russia                       17
Name: birth_country, dtype: int64

In [17]:
# Display the proportion of prizes won by the top 10 birth countries (multiply by 100 for %)
merged_table['birth_country'].value_counts(normalize = True).head(10)

United States of America    0.292655
United Kingdom              0.096045
Germany                     0.068927
France                      0.057627
Sweden                      0.032768
Japan                       0.027119
Canada                      0.020339
Netherlands                 0.020339
Italy                       0.019209
Russia                      0.019209
Name: birth_country, dtype: float64
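
Counts and percentages are easier to compare side by side. A small sketch, assuming a toy `sex` column with made-up values in place of `merged_table['sex']`, that puts both into one table:

```python
import pandas as pd

# Toy column standing in for merged_table['sex'] (values are made up).
sex = pd.Series(["Male", "Male", "Male", "Female", None])

# value_counts drops missing values by default, so rows with no
# recorded sex do not enter either column.
summary = pd.DataFrame({
    "count": sex.value_counts(),
    "percent": (sex.value_counts(normalize=True) * 100).round(1),
})
print(summary)
```

The same pattern should work for `birth_country` by swapping the column and adding `.head(10)`.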

## 4. USA dominance
<p>Not so surprising perhaps: the most common Nobel laureate between 1901 and 2016 was a man born in the United States of America. But in 1901 all the winners were European. When did the USA start to dominate the Nobel Prize charts?</p>

In [18]:
# Calculating the proportion of USA born winners per decade
merged_table['decade'] = np.floor(merged_table['year']/10).astype(int)
merged_table['USA_born'] = merged_table['birth_country'] == 'United States of America'

In [19]:
# Display the proportions of USA born winners per decade
merged_table.pivot_table(index='decade', values='USA_born') * 100  # pivot_table defaults to aggfunc='mean' when none is given

Unnamed: 0_level_0,USA_born
decade,Unnamed: 1_level_1
190,1.754386
191,7.5
192,7.407407
193,25.0
194,30.232558
195,29.166667
196,26.582278
197,31.730769
198,31.958763
199,40.384615
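
The decade labels above read 190, 191, and so on because the year is divided by 10 and floored. If labels such as 1900 and 1910 are preferred, one option is integer division followed by multiplying back. A sketch on made-up rows (the frame `toy` is an assumption for illustration):

```python
import pandas as pd

# Toy rows (made up) standing in for merged_table.
toy = pd.DataFrame({
    "year": [1901, 1909, 1935, 1938, 1999],
    "birth_country": ["Germany", "France",
                      "United States of America", "Italy",
                      "United States of America"],
})

# Integer-divide by 10, then multiply back: 1935 -> 193 -> 1930.
toy["decade"] = toy["year"] // 10 * 10
toy["USA_born"] = toy["birth_country"] == "United States of America"

# The mean of a boolean column is the share of True values.
usa_share = toy.groupby("decade")["USA_born"].mean() * 100
print(usa_share)
```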


## 5. What is the gender of a typical Nobel Prize winner?
<p>So the USA became the dominating winner of the Nobel Prize in the 1930s and has kept the leading position ever since. But one group that was in the lead from the start, and never seems to let go, are <em>men</em>. Maybe it shouldn't come as a shock that there is some imbalance between how many male and female prize winners there are, but how significant is this imbalance? And is it better or worse within specific prize categories like physics, medicine, literature, etc.?</p>

In [20]:
# Counting female laureates per decade and category code (aggfunc='sum' counts the True values)
merged_table['females'] = merged_table['sex'] == 'Female'
merged_table.pivot_table(index='decade',
                         columns='category_master',
                         values='females',
                         aggfunc='sum',
                         margins=True,
                         fill_value=0)

category_master,1,2,3,4,5,6,All
decade,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
190,0,0,1,0,1,1,3
191,1,0,0,0,0,0,1
192,0,0,2,0,0,0,2
193,1,0,1,0,1,0,3
194,0,0,1,1,1,0,3
195,0,0,0,0,0,0,0
196,1,0,1,0,0,1,3
197,0,0,0,1,3,0,4
198,0,0,0,3,1,0,4
199,0,0,3,1,3,0,7


Because the category is stored as a numeric code, the picture is incomplete: we cannot tell which categories these women won in. The category master table maps each code to its name, so we can use it to rebuild the table above with readable labels.

In [21]:
# Merge in the category master, matching category_master to Category_master
# (nobel_category was already loaded earlier; the line below re-reads it if needed)
#nobel_category = pd.read_csv("nobel_category_master.csv")
merged_table = pd.merge(merged_table, nobel_category, left_on="category_master", right_on="Category_master")
merged_table.head()

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country,...,birth_country,sex,death_date,death_city,death_country,decade,USA_born,females,category,Category_master
0,19011,1901,1,The Nobel Prize in Chemistry 1901,44197,160,Individual,Berlin University,Berlin,Germany,...,Netherlands,Male,1911-03-01,Berlin,Germany,190,False,False,Chemistry,1
1,19021,1902,1,The Nobel Prize in Chemistry 1902,44197,161,Individual,Berlin University,Berlin,Germany,...,Prussia (Germany),Male,1919-07-15,Berlin,Germany,190,False,False,Chemistry,1
2,19031,1903,1,The Nobel Prize in Chemistry 1903,44197,162,Individual,Stockholm University,Stockholm,Sweden,...,Sweden,Male,1927-10-02,Stockholm,Sweden,190,False,False,Chemistry,1
3,19111,1911,1,The Nobel Prize in Chemistry 1911,44197,6,Individual,Sorbonne University,Paris,France,...,Russian Empire (Poland),Female,1934-07-04,Sallanches,France,191,False,True,Chemistry,1
4,19041,1904,1,The Nobel Prize in Chemistry 1904,44197,163,Individual,University College,London,United Kingdom,...,Scotland,Male,1916-07-23,High Wycombe,United Kingdom,190,False,False,Chemistry,1


In [22]:
# Counting female laureates per decade by category name, for easier interpretation
merged_table['females'] = merged_table['sex'] == 'Female'
merged_table.pivot_table(index='decade',
                         columns='category',
                         values='females',
                         aggfunc='sum',
                         margins=True,
                         fill_value=0)

category,Chemistry,Economics,Literature,Medicine,Peace,Physics,All
decade,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
190,0,0,1,0,1,1,3
191,1,0,0,0,0,0,1
192,0,0,2,0,0,0,2
193,1,0,1,0,1,0,3
194,0,0,1,1,1,0,3
195,0,0,0,0,0,0,0
196,1,0,1,0,0,1,3
197,0,0,0,1,3,0,4
198,0,0,0,3,1,0,4
199,0,0,3,1,3,0,7
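
The pivot above counts female laureates. To answer the question about proportions, the same pivot can aggregate with the mean instead of the sum. A minimal sketch on made-up rows (the frame `toy` is an assumption for illustration):

```python
import pandas as pd

# Toy rows (made up) standing in for merged_table.
toy = pd.DataFrame({
    "decade": [190, 190, 190, 191, 191],
    "category": ["Physics", "Physics", "Peace", "Physics", "Peace"],
    "sex": ["Female", "Male", "Female", "Male", "Male"],
})
toy["females"] = toy["sex"] == "Female"

# aggfunc="mean" on a boolean column gives the fraction of True values,
# i.e. the share of female laureates in each decade/category cell.
female_share = toy.pivot_table(index="decade", columns="category",
                               values="females", aggfunc="mean")
print(female_share)
```

Note that rows with no recorded sex (organizations, for example) evaluate to `False` in the `females` column, so they would count toward the denominator unless filtered out first.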


## 6. The first woman to win the Nobel Prize
<p>The table above is a bit dense, but it does show some interesting trends and patterns. Overall the imbalance is pretty large with physics, economics, and chemistry having the largest imbalance. Medicine has a somewhat positive trend, and since the 1990s the literature prize is also now more balanced. The big outlier is the peace prize during the 2010s, but keep in mind that this just covers the years 2010 to 2016.</p>
<p>Given this imbalance, who was the first woman to receive a Nobel Prize? And in what category?</p>

In [23]:
# Picking out the first woman to win a Nobel Prize
merged_table.sort_values('year', inplace = True)

In [24]:
#first method
merged_table[merged_table['sex'] == 'Female'].head(1)

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country,...,birth_country,sex,death_date,death_city,death_country,decade,USA_born,females,category,Category_master
634,19036,1903,6,The Nobel Prize in Physics 1903,44287,6,Individual,,,,...,Russian Empire (Poland),Female,1934-07-04,Sallanches,France,190,False,True,Physics,6


In [25]:
#second method
merged_table[merged_table['sex']== 'Female'].nsmallest(1,'year')

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country,...,birth_country,sex,death_date,death_city,death_country,decade,USA_born,females,category,Category_master
634,19036,1903,6,The Nobel Prize in Physics 1903,44287,6,Individual,,,,...,Russian Empire (Poland),Female,1934-07-04,Sallanches,France,190,False,True,Physics,6


In [26]:
#third method
merged_table.loc[merged_table['sex'] == 'Female'].iloc[0]


ID                                                19036
year                                               1903
category_master                                       6
prize                   The Nobel Prize in Physics 1903
prize_share                                      44,287
laureate_id                                           6
laureate_type                                Individual
organization_name                                   NaN
organization_city                                   NaN
organization_country                                NaN
full_name                   Marie Curie, née Sklodowska
birth_date                                   1867-11-07
birth_city                                       Warsaw
birth_country                   Russian Empire (Poland)
sex                                              Female
death_date                          1934-07-04 00:00:00
death_city                                   Sallanches
death_country                                   

## 7. Repeat laureates
<p>For most scientists/writers/activists a Nobel Prize would be the crowning achievement of a long career. But for some people, one is just not enough, and a few have gotten it more than once. Who are these lucky few? (Having won no Nobel Prize myself, I'll assume it's just about luck.)</p>

In [27]:
# Selecting the laureates that have received 2 or more prizes.
t = merged_table.groupby('laureate_id', as_index = False)['ID'].count()

In [28]:
t = t.rename(columns = {'ID':'total_nobels'})

In [29]:
merged_table = pd.merge(merged_table, t , on = 'laureate_id')
merged_table.head(2)

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country,...,sex,death_date,death_city,death_country,decade,USA_born,females,category,Category_master,total_nobels
0,19011,1901,1,The Nobel Prize in Chemistry 1901,44197,160,Individual,Berlin University,Berlin,Germany,...,Male,1911-03-01,Berlin,Germany,190,False,False,Chemistry,1,1
1,19013,1901,3,The Nobel Prize in Literature 1901,44197,569,Individual,,,,...,Male,1907-09-07,Châtenay,France,190,False,False,Literature,3,1


In [30]:
merged_table[merged_table['total_nobels'] > 1]

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country,...,sex,death_date,death_city,death_country,decade,USA_born,females,category,Category_master,total_nobels
16,19036,1903,6,The Nobel Prize in Physics 1903,44287,6,Individual,,,,...,Female,1934-07-04,Sallanches,France,190,False,True,Physics,6,2
17,19111,1911,1,The Nobel Prize in Chemistry 1911,44197,6,Individual,Sorbonne University,Paris,France,...,Female,1934-07-04,Sallanches,France,191,False,True,Chemistry,1,2
89,19175,1917,5,The Nobel Peace Prize 1917,44197,482,Organization,,,,...,,NaT,,,191,False,False,Peace,5,3
90,19445,1944,5,The Nobel Peace Prize 1944,44197,482,Organization,,,,...,,NaT,,,194,False,False,Peace,5,3
91,19635,1963,5,The Nobel Peace Prize 1963,44228,482,Organization,,,,...,,NaT,,,196,False,False,Peace,5,3
281,19541,1954,1,The Nobel Prize in Chemistry 1954,44197,217,Individual,California Institute of Technology (Caltech),"Pasadena, CA",United States of America,...,Male,1994-08-19,"Big Sur, CA",United States of America,195,True,False,Chemistry,1,2
282,19625,1962,5,The Nobel Peace Prize 1962,44197,217,Individual,California Institute of Technology (Caltech),"Pasadena, CA",United States of America,...,Male,1994-08-19,"Big Sur, CA",United States of America,196,True,False,Peace,5,2
283,19545,1954,5,The Nobel Peace Prize 1954,44197,515,Organization,,,,...,,NaT,,,195,False,False,Peace,5,2
284,19815,1981,5,The Nobel Peace Prize 1981,44197,515,Organization,,,,...,,NaT,,,198,False,False,Peace,5,2
302,19566,1956,6,The Nobel Prize in Physics 1956,44256,66,Individual,University of Illinois,"Urbana, IL",United States of America,...,Male,1991-01-30,"Boston, MA",United States of America,195,True,False,Physics,6,2
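
The count, rename, and merge steps above can also be collapsed into one line with `groupby(...).transform`, which returns one value per original row. A sketch on made-up rows (the frame `toy` is an assumption for illustration):

```python
import pandas as pd

# Toy rows (made up) standing in for merged_table.
toy = pd.DataFrame({
    "ID": [19036, 19111, 19011, 19175, 19445],
    "laureate_id": [6, 6, 160, 482, 482],
})

# transform("count") keeps the original row order and length, so the
# per-laureate total can be assigned directly without a merge.
toy["total_nobels"] = toy.groupby("laureate_id")["ID"].transform("count")

repeat = toy[toy["total_nobels"] > 1]
print(repeat)
```

Because no merge is involved, the row index is left untouched, which may matter if later cells rely on it.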


## 8. How old are you when you get the prize?
<p>The list of repeat winners contains some illustrious names! We again meet Marie Curie, who got the prize in physics for her research on radioactivity and in chemistry for isolating radium and polonium. John Bardeen got it twice in physics for transistors and superconductivity, Frederick Sanger got it twice in chemistry, and Linus Carl Pauling got it first in chemistry and later in peace for his work in promoting nuclear disarmament. We also learn that organizations get the prize too: the Red Cross has gotten it three times and the UNHCR twice.</p>
<p>But how old are you generally when you get the prize?</p>

In [31]:
# Check the data type of all columns 
merged_table.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 911 entries, 0 to 910
Data columns (total 24 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   ID                    911 non-null    int64         
 1   year                  911 non-null    int64         
 2   category_master       911 non-null    int64         
 3   prize                 911 non-null    object        
 4   prize_share           911 non-null    object        
 5   laureate_id           911 non-null    int64         
 6   laureate_type         911 non-null    object        
 7   organization_name     665 non-null    object        
 8   organization_city     667 non-null    object        
 9   organization_country  667 non-null    object        
 10  full_name             911 non-null    object        
 11  birth_date            883 non-null    object        
 12  birth_city            883 non-null    object        
 13  birth_country       

In [32]:
# Converting birth_date from String to datetime
merged_table['birth_date'] = pd.to_datetime(merged_table['birth_date'])
merged_table['birth_date']

0     1852-08-30
1     1839-03-16
2     1828-05-08
3     1822-05-20
4     1845-03-27
         ...    
906   1934-09-21
907   1951-09-14
908   1943-06-22
909   1951-05-18
910   1949-04-18
Name: birth_date, Length: 911, dtype: datetime64[ns]

In [33]:
# Calculating the age of Nobel Prize winners
merged_table['age'] = merged_table['year'] - merged_table['birth_date'].dt.year # .dt.year is float here because missing birth dates (NaT) become NaN
merged_table

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country,...,death_date,death_city,death_country,decade,USA_born,females,category,Category_master,total_nobels,age
0,19011,1901,1,The Nobel Prize in Chemistry 1901,44197,160,Individual,Berlin University,Berlin,Germany,...,1911-03-01,Berlin,Germany,190,False,False,Chemistry,1,1,49.0
1,19013,1901,3,The Nobel Prize in Literature 1901,44197,569,Individual,,,,...,1907-09-07,Châtenay,France,190,False,False,Literature,3,1,62.0
2,19015,1901,5,The Nobel Peace Prize 1901,44228,462,Individual,,,,...,1910-10-30,Heiden,Switzerland,190,False,False,Peace,5,1,73.0
3,19015,1901,5,The Nobel Peace Prize 1901,44228,463,Individual,,,,...,1912-06-12,Paris,France,190,False,False,Peace,5,1,79.0
4,19016,1901,6,The Nobel Prize in Physics 1901,44197,1,Individual,Munich University,Munich,Germany,...,1923-02-10,Munich,Germany,190,False,False,Physics,6,1,56.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
906,20166,2016,6,The Nobel Prize in Physics 2016,44228,928,Individual,University of Washington,"Seattle, WA",United States of America,...,NaT,,,201,False,False,Physics,6,1,82.0
907,20166,2016,6,The Nobel Prize in Physics 2016,44287,929,Individual,Princeton University,"Princeton, NJ",United States of America,...,NaT,,,201,False,False,Physics,6,1,65.0
908,20166,2016,6,The Nobel Prize in Physics 2016,44287,930,Individual,Brown University,"Providence, RI",United States of America,...,NaT,,,201,False,False,Physics,6,1,73.0
909,20161,2016,1,The Nobel Prize in Chemistry 2016,44256,933,Individual,University of Groningen,Groningen,Netherlands,...,NaT,,,201,False,False,Chemistry,1,1,65.0


In [34]:
merged_table['age'].mean()

59.453001132502834

In [35]:
merged_table['age'].mode()

0    61.0
dtype: float64

In [36]:
merged_table['age'].median()

60.0
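
Subtracting years can overstate the age by one when the laureate's birthday falls after the award date. The sketch below assumes the prize is handed over on 10 December of the award year (an assumption here, since the dataset only records the year) and uses made-up rows; `age_exact` is a hypothetical column name:

```python
import pandas as pd

# Toy rows (made up) standing in for merged_table.
toy = pd.DataFrame({
    "year": [1903, 1950, 2000],
    "birth_date": pd.to_datetime(["1867-11-07", "1900-12-25", None]),
})

bd = toy["birth_date"]

# Assumption: the award date is 10 December. The birthday is still
# pending only for laureates born on 11-31 December.
birthday_pending = (bd.dt.month == 12) & (bd.dt.day > 10)

# Missing birth dates stay NaN because NaN propagates through subtraction.
toy["age_exact"] = toy["year"] - bd.dt.year - birthday_pending.astype(int)
print(toy)
```

For summary statistics such as the mean and median above, the difference is likely small, but it can matter when ranking the youngest or oldest laureates.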

## 9. How old were they when they got their first & second prizes?
<p>Clearly a few individuals have been so good that they received the Nobel Prize twice. Can we see their ages at the first and second prize side by side? Let us also see at what age they passed away (where applicable).</p>

In [37]:
# Converting death_date to datetime, stored in a new death_time column
merged_table['death_time'] = pd.to_datetime(merged_table['death_date'])
merged_table.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 911 entries, 0 to 910
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   ID                    911 non-null    int64         
 1   year                  911 non-null    int64         
 2   category_master       911 non-null    int64         
 3   prize                 911 non-null    object        
 4   prize_share           911 non-null    object        
 5   laureate_id           911 non-null    int64         
 6   laureate_type         911 non-null    object        
 7   organization_name     665 non-null    object        
 8   organization_city     667 non-null    object        
 9   organization_country  667 non-null    object        
 10  full_name             911 non-null    object        
 11  birth_date            883 non-null    datetime64[ns]
 12  birth_city            883 non-null    object        
 13  birth_country       

In [38]:
# Calculating the age at death of Nobel Prize winners, using the converted death_time column
merged_table['death_age'] = merged_table['death_time'].dt.year - merged_table['birth_date'].dt.year # float result: missing dates (NaT) become NaN
merged_table.head()

Unnamed: 0,ID,year,category_master,prize,prize_share,laureate_id,laureate_type,organization_name,organization_city,organization_country,...,death_country,decade,USA_born,females,category,Category_master,total_nobels,age,death_time,death_age
0,19011,1901,1,The Nobel Prize in Chemistry 1901,44197,160,Individual,Berlin University,Berlin,Germany,...,Germany,190,False,False,Chemistry,1,1,49.0,1911-03-01,59.0
1,19013,1901,3,The Nobel Prize in Literature 1901,44197,569,Individual,,,,...,France,190,False,False,Literature,3,1,62.0,1907-09-07,68.0
2,19015,1901,5,The Nobel Peace Prize 1901,44228,462,Individual,,,,...,Switzerland,190,False,False,Peace,5,1,73.0,1910-10-30,82.0
3,19015,1901,5,The Nobel Peace Prize 1901,44228,463,Individual,,,,...,France,190,False,False,Peace,5,1,79.0,1912-06-12,90.0
4,19016,1901,6,The Nobel Prize in Physics 1901,44197,1,Individual,Munich University,Munich,Germany,...,Germany,190,False,False,Physics,6,1,56.0,1923-02-10,78.0


In [43]:
# Age at the first (min) and latest (max) prize for individuals who won more than once
pd.pivot_table(merged_table[(merged_table['total_nobels']>1) & (merged_table['laureate_type'] == 'Individual')],
               index = 'full_name',
               values = 'age',
               aggfunc =['min', 'max'],
               fill_value = 0,
               margins = True)               

Unnamed: 0_level_0,min,max
Unnamed: 0_level_1,age,age
full_name,Unnamed: 1_level_2,Unnamed: 2_level_2
Frederick Sanger,40,62
John Bardeen,48,64
Linus Carl Pauling,53,61
"Marie Curie, née Sklodowska",36,44
All,36,64
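
The pivot above shows the two prize ages but not the age at death. One way to put all three side by side is a grouped named aggregation. A sketch on made-up rows (the names and ages in `toy` are placeholders, not values from the dataset):

```python
import pandas as pd

# Toy rows (made up) standing in for the repeat-laureate subset.
toy = pd.DataFrame({
    "full_name": ["Laureate A", "Laureate A", "Laureate B", "Laureate B"],
    "age": [36.0, 44.0, 53.0, 61.0],
    "death_age": [67.0, 67.0, 93.0, 93.0],
})

# Named aggregation: each keyword becomes an output column.
summary = toy.groupby("full_name").agg(
    first_prize_age=("age", "min"),
    second_prize_age=("age", "max"),
    death_age=("death_age", "first"),
)
print(summary)
```

Using `max` for the second prize is only right when a laureate has exactly two prizes, which holds for the individuals listed above.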


## 10. Oldest and youngest winners
<p>More plots with lots of exciting stuff going on! We see that winners of the chemistry, medicine, and physics prizes have gotten older over time. The trend is strongest for physics: the average age used to be below 50, and now it's almost 70. Literature and economics are more stable. We also see that economics is a newer category. But peace shows an opposite trend where winners are getting younger!</p>
<p>In the peace category we also see a winner around 2010 who seems exceptionally young. This raises the question: who are the oldest and youngest people ever to have won a Nobel Prize?</p>

In [None]:
# The oldest winner of a Nobel Prize as of 2016
# (print both results: a notebook cell only displays its last expression)
print(merged_table.sort_values('age', ascending = False)[['full_name', 'age', 'year']].head(1))

# The youngest winner of a Nobel Prize as of 2016
print(merged_table.sort_values('age')[['full_name', 'age', 'year']].head(1))

## 11. Winners from my country - India
<p>We do not have citizenship in our data, so we work with what is available to identify Nobel Prize winners from India. A proxy measure could be birth_country and death_country. Data limitations like this are common, and we can see how they affect the analysis.</p>


In [55]:
# Identify the rows where birth country is India
merged_table.loc[merged_table['birth_country'] == 'India', ['full_name', 'category','age','birth_country','death_country']]

Unnamed: 0,full_name,category,age,birth_country,death_country
9,Ronald Ross,Medicine,45.0,India,United Kingdom
75,Rabindranath Tagore,Literature,52.0,India,India
153,Sir Chandrasekhara Venkata Raman,Physics,42.0,India,India
387,Har Gobind Khorana,Medicine,46.0,India,United States of America
690,Amartya Sen,Economics,65.0,India,
827,Venkatraman Ramakrishnan,Chemistry,,India,
878,Kailash Satyarthi,Peace,60.0,India,


In [69]:
# Identify the rows where birth country or death country is India
merged_table.loc[(merged_table['birth_country'] == 'India') | (merged_table['death_country'] == 'India'),
                 ['full_name', 'category', 'age', 'birth_country', 'death_country']]

Unnamed: 0,full_name,category,age,birth_country,death_country
9,Ronald Ross,Medicine,45.0,India,United Kingdom
75,Rabindranath Tagore,Literature,52.0,India,India
153,Sir Chandrasekhara Venkata Raman,Physics,42.0,India,India
387,Har Gobind Khorana,Medicine,46.0,India,United States of America
500,Mother Teresa,Peace,69.0,Ottoman Empire (Republic of Macedonia),India
690,Amartya Sen,Economics,65.0,India,
827,Venkatraman Ramakrishnan,Chemistry,,India,
878,Kailash Satyarthi,Peace,60.0,India,


In [61]:
# Inspect the rows of the year 1983: Subramanyan Chandrasekhar is missing above because his birth_country is 'India (Pakistan)'
merged_table.loc[merged_table['year'] == 1983, ['full_name', 'category','age','birth_country','death_country']]

Unnamed: 0,full_name,category,age,birth_country,death_country
536,William Alfred Fowler,Physics,72.0,United States of America,United States of America
537,Gerard Debreu,Economics,62.0,France,France
538,William Golding,Literature,72.0,United Kingdom,United Kingdom
539,Henry Taube,Chemistry,68.0,Canada,United States of America
540,Subramanyan Chandrasekhar,Physics,73.0,India (Pakistan),United States of America
541,Lech Walesa,Peace,40.0,Poland,
542,Barbara McClintock,Medicine,81.0,United States of America,United States of America


In [82]:
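# Winners from my country - India (final answer)
# str.contains also catches compound labels such as 'India (Pakistan)'; na=False treats missing countries as non-matches
india_mask = (merged_table['birth_country'].str.contains('India', na=False)
              | merged_table['death_country'].str.contains('India', na=False))
Indian_nobel_winners = merged_table.loc[india_mask, ['full_name','category','age','birth_country','death_country']]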
Indian_nobel_winners

Unnamed: 0,full_name,category,age,birth_country,death_country
9,Ronald Ross,Medicine,45.0,India,United Kingdom
38,Rudyard Kipling,Literature,42.0,British India (India),United Kingdom
75,Rabindranath Tagore,Literature,52.0,India,India
153,Sir Chandrasekhara Venkata Raman,Physics,42.0,India,India
387,Har Gobind Khorana,Medicine,46.0,India,United States of America
500,Mother Teresa,Peace,69.0,Ottoman Empire (Republic of Macedonia),India
506,Abdus Salam,Physics,53.0,India (Pakistan),United Kingdom
540,Subramanyan Chandrasekhar,Physics,73.0,India (Pakistan),United States of America
690,Amartya Sen,Economics,65.0,India,
784,Muhammad Yunus,Peace,66.0,British India (Bangladesh),
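
The difference between exact and substring matching is what drives the longer list above. A small sketch on made-up rows (the frame `toy` uses labels in the dataset's style but is otherwise an assumption) shows the effect, including how missing values are handled:

```python
import pandas as pd

# Toy rows (made up labels in the dataset's style).
toy = pd.DataFrame({
    "full_name": ["A", "B", "C", "D"],
    "birth_country": ["India", "India (Pakistan)", "France", None],
    "death_country": ["India", "United Kingdom", "India", None],
})

# Exact equality misses compound labels such as "India (Pakistan)".
exact = toy[toy["birth_country"] == "India"]

# str.contains matches substrings; na=False treats missing values as
# non-matches instead of propagating NaN into the boolean mask.
born = toy["birth_country"].str.contains("India", na=False, regex=False)
died = toy["death_country"].str.contains("India", na=False, regex=False)
broad = toy[born | died]
print(broad)
```

Substring matching widens the net, so the result still needs a manual check: a label that merely contains "India" says nothing about citizenship.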
