## Visualizing the History of Nobel Prize Winners

### Project Description

<b>Project Goal:</b>
- Analyze Nobel Prize winner data

<b>Business Problem</b>
- What is the most commonly awarded gender and birth country?
- Which decade had the highest ratio of US-born Nobel Prize winners to total winners across all categories?
- Which decade and Nobel Prize category combination had the highest proportion of female laureates?
- Who was the first woman to receive a Nobel Prize, and in what category?
- Which individuals or organizations have won more than one Nobel Prize throughout the years?

<b> Data Source </b>
- Nobel Prize Developer Zone: API Version 1 (http://api.nobelprize.org/v1/prize.json) and API Version 2.1 (https://api.nobelprize.org/2.1/laureates)

In [2]:
import requests
import numpy as np
import pandas as pd 

#### API Version 1

In [7]:
url='http://api.nobelprize.org/v1/prize.json'

In [8]:
response_api_v1=requests.get(url).json()

In [9]:
response_api_v1['prizes'][0]

{'year': '2024',
 'category': 'chemistry',
 'laureates': [{'id': '1039',
   'firstname': 'David',
   'surname': 'Baker',
   'motivation': '"for computational protein design"',
   'share': '2'},
  {'id': '1040',
   'firstname': 'Demis',
   'surname': 'Hassabis',
   'motivation': '"for protein structure prediction"',
   'share': '4'},
  {'id': '1041',
   'firstname': 'John',
   'surname': 'Jumper',
   'motivation': '"for protein structure prediction"',
   'share': '4'}]}

In [10]:
response_api_v1['prizes'][3]

{'year': '2024',
 'category': 'peace',
 'laureates': [{'id': '1043',
   'motivation': '"for its efforts to achieve a world free of nuclear weapons and for demonstrating through witness testimony that nuclear weapons must never be used again"',
   'share': '1',
   'firstname': 'Nihon Hidankyo'}]}

In [11]:
ids=[]
firstname=[]
surname=[]
year=[]
category=[]
motivation=[]
share=[]
for i in response_api_v1['prizes'][0]['laureates']:
    # Here, i means each individual dictionary.
    # Appending the details of laureates dictionary to a list
    # Appending the id to the list 
    ids.append(i['id'])
    # Appending the firstname to the list
    firstname.append(i['firstname'])
    # Appending the surname to the list 
    surname.append(i['surname'])
    # Appending the motivation to the list 
    motivation.append(i['motivation'])
    year.append(response_api_v1['prizes'][0]['year'])
    category.append(response_api_v1['prizes'][0]['category'])

In [12]:
print(ids)
print(firstname)
print(surname)
print(year)
print(category)
print(motivation)

['1039', '1040', '1041']
['David', 'Demis', 'John']
['Baker', 'Hassabis', 'Jumper']
['2024', '2024', '2024']
['chemistry', 'chemistry', 'chemistry']
['"for computational protein design"', '"for protein structure prediction"', '"for protein structure prediction"']


In [13]:
# Number of entries in the 'prizes' list
var=len(response_api_v1['prizes'])
var

676

### Checking the consistency of keys across the laureate dictionaries

In [15]:
'surname' in response_api_v1['prizes'][0]['laureates'][0]

True

In [16]:
response_api_v1['prizes'][0]['laureates'][0]['surname']

'Baker'

#### Checking whether 'surname' is present for every laureate

In [18]:
k=0
for count in range(var):
    if 'laureates' in response_api_v1['prizes'][count]:
        for index,j in enumerate(response_api_v1['prizes'][count]['laureates']):
            if 'surname' in response_api_v1['prizes'][count]['laureates'][index]:
                pass
            else:
                # Printing out the keys where surname is not present
                print('Not Consistent Key',count,response_api_v1['prizes'][count]['laureates'][index].keys())
                k+=1
                break
    else: 
        print(count)
        break
print(k)

Not Consistent Key 3 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 15 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 27 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 45 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 57 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 69 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 75 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 105 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 111 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 117 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 141 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 153 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Consistent Key 165 dict_keys(['id', 'motivation', 'share', 'firstname'])
Not Con

#### Checking whether 'firstname' is present for every laureate

In [20]:
k=True
for count in range(var):
    if 'laureates' in response_api_v1['prizes'][count]:
        for index,j in enumerate(response_api_v1['prizes'][count]['laureates']):
            if 'firstname' in response_api_v1['prizes'][count]['laureates'][index]:
                pass
            else:
                # Printing out the keys where firstname is not present
                k=False
                print('Not Consistent Key',count,response_api_v1['prizes'][count]['laureates'][index].keys())
                break
if k==False:
    print('firstname is not present in all of the dictionary')
else:
    print('firstname is there in all of the dictionary')

firstname is there in all of the dictionary


#### Checking whether 'motivation' is present for every laureate

In [22]:
k=True
for count in range(var):
    if 'laureates' in response_api_v1['prizes'][count]:
        for index,j in enumerate(response_api_v1['prizes'][count]['laureates']):
            if 'motivation' in response_api_v1['prizes'][count]['laureates'][index]:
                pass
            else:
                # Printing out the keys where motivation is not present
                k=False
                print('Not Consistent Key',count,response_api_v1['prizes'][count]['laureates'][index].keys())
                break
if k==False:
    print('motivation is not present in all of the dictionary')
else:
    print('motivation is there in all of the dictionary')

motivation is there in all of the dictionary


#### Checking whether 'share' is present for every laureate

In [24]:
k=True
for count in range(var):
    if 'laureates' in response_api_v1['prizes'][count]:
        for index,j in enumerate(response_api_v1['prizes'][count]['laureates']):
            if 'share' in response_api_v1['prizes'][count]['laureates'][index]:
                pass
            else:
                # Printing out the keys where share is not present
                k=False
                print('Not Consistent Key',count,response_api_v1['prizes'][count]['laureates'][index].keys())
                break
if k==False:
    print('share is not present in all of the dictionary')
else:
    print('share is there in all of the dictionary')

share is there in all of the dictionary


#### Checking whether 'id' is present for every laureate

In [26]:
k=True
for count in range(var):
    if 'laureates' in response_api_v1['prizes'][count]:
        for index,j in enumerate(response_api_v1['prizes'][count]['laureates']):
            if 'id' in response_api_v1['prizes'][count]['laureates'][index]:
                pass
            else:
                # Printing out the keys where id is not present
                k=False
                print('Not Consistent Key',count,response_api_v1['prizes'][count]['laureates'][index].keys())
                break
if k==False:
    print('id is not present in all of the dictionary')
else:
    print('id is there in all of the dictionary')

id is there in all of the dictionary
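
The five checks above repeat the same loop with a different key each time. A minimal sketch of a reusable helper follows. `find_missing_key` and `sample_prizes` are hypothetical names introduced only for illustration, and the sample data is a shortened stand-in shaped like the `prizes` entries shown earlier.

```python
def find_missing_key(prizes, key):
    """Return (prize_index, laureate_id) pairs for laureates that lack `key`."""
    missing = []
    for prize_index, prize in enumerate(prizes):
        # Entries for years without an award have no 'laureates' list.
        for laureate in prize.get('laureates', []):
            if key not in laureate:
                missing.append((prize_index, laureate.get('id')))
    return missing


# Hypothetical sample shaped like response_api_v1['prizes'].
sample_prizes = [
    {'year': '2024', 'category': 'chemistry',
     'laureates': [{'id': '1039', 'firstname': 'David', 'surname': 'Baker',
                    'motivation': 'sample motivation', 'share': '2'}]},
    {'year': '2024', 'category': 'peace',
     'laureates': [{'id': '1043', 'firstname': 'Nihon Hidankyo',
                    'motivation': 'sample motivation', 'share': '1'}]},
    {'year': '1914', 'category': 'peace',
     'overallMotivation': 'sample text'},
]

for key in ('id', 'firstname', 'surname', 'motivation', 'share'):
    print(key, find_missing_key(sample_prizes, key))
```

An empty list means the key is present for every laureate. Unlike the loops above, this version does not stop at the first laureate with a missing key within a prize.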


### Building the lists again, storing None where the surname is not present

In [28]:
ids=[]
firstname=[]
surname=[]
year=[]
category=[]
motivation=[]
share=[]

In [29]:
for count in range(var):
    if 'laureates' in response_api_v1['prizes'][count]:
        for index,j in enumerate(response_api_v1['prizes'][count]['laureates']):
            if 'surname' not in response_api_v1['prizes'][count]['laureates'][index]:
                surname.append(None)
                ids.append(j['id'])
                # Appending the firstname to the list
                firstname.append(j['firstname'])
                # Appending the motivation to the list 
                motivation.append(j['motivation'])
                year.append(response_api_v1['prizes'][count]['year'])
                category.append(response_api_v1['prizes'][count]['category'])
                share.append(j['share'])
            else:
                ids.append(j['id'])
                # Appending the firstname to the list
                firstname.append(j['firstname'])
                # Appending the motivation to the list 
                motivation.append(j['motivation'])
                # Appending the surname to the list 
                surname.append(j['surname'])
                year.append(response_api_v1['prizes'][count]['year'])
                category.append(response_api_v1['prizes'][count]['category'])
                share.append(j['share'])
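
Both branches of the loop above append the same values and differ only in how the surname is handled. A sketch of an equivalent flattening step that uses `dict.get`, which returns `None` when a key is absent, is shown here. `flatten_prizes` and the sample data are hypothetical and used only for illustration.

```python
def flatten_prizes(prizes):
    """Turn the nested prize entries into one flat row (dict) per laureate."""
    rows = []
    for prize in prizes:
        # Skip entries without a 'laureates' list (no award that year).
        for laureate in prize.get('laureates', []):
            rows.append({
                'id': laureate['id'],
                'year': prize['year'],
                'category': prize['category'],
                'firstname': laureate['firstname'],
                'surname': laureate.get('surname'),  # None for organizations
                'motivation': laureate['motivation'],
                'share': laureate['share'],
            })
    return rows


# Hypothetical sample shaped like response_api_v1['prizes'].
sample_prizes = [
    {'year': '2024', 'category': 'chemistry',
     'laureates': [{'id': '1039', 'firstname': 'David', 'surname': 'Baker',
                    'motivation': 'sample motivation', 'share': '2'}]},
    {'year': '2024', 'category': 'peace',
     'laureates': [{'id': '1043', 'firstname': 'Nihon Hidankyo',
                    'motivation': 'sample motivation', 'share': '1'}]},
    {'year': '1914', 'category': 'peace', 'overallMotivation': 'sample text'},
]

rows = flatten_prizes(sample_prizes)
print(len(rows))
print(rows[1]['surname'])
```

A list of row dictionaries like this can be passed directly to `pd.DataFrame(rows)`, which avoids keeping seven parallel lists in sync.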

In [30]:
len(surname)

1012

In [31]:
len(ids)

1012

In [32]:
len(firstname)

1012

In [33]:
len(motivation)

1012

In [34]:
len(category)

1012

In [35]:
len(share)

1012

In [36]:
len(year)

1012

In [37]:
count=0
for i in surname:
    if i is None:
        count+=1
print(count)

33


In [38]:
for i in surname:
    print(type(i),i)

<class 'str'> Baker
<class 'str'> Hassabis
<class 'str'> Jumper
<class 'str'> Acemoglu
<class 'str'> Johnson
<class 'str'> Robinson
<class 'str'> Han
<class 'NoneType'> None
<class 'str'> Hopfield
<class 'str'> Hinton
<class 'str'> Ambros
<class 'str'> Ruvkun
<class 'str'> Bawendi
<class 'str'> Brus
<class 'str'> Yekimov
<class 'str'> Goldin
<class 'str'> Fosse
<class 'str'> Mohammadi
<class 'str'> Agostini
<class 'str'> Krausz
<class 'str'> L’Huillier
<class 'str'> Karikó
<class 'str'> Weissman
<class 'str'> Bertozzi
<class 'str'> Meldal
<class 'str'> Sharpless
<class 'str'> Bernanke
<class 'str'> Diamond
<class 'str'> Dybvig
<class 'str'> Ernaux
<class 'str'> Bialiatski 
<class 'NoneType'> None
<class 'NoneType'> None
<class 'str'> Aspect
<class 'str'> Clauser
<class 'str'> Zeilinger
<class 'str'> Pääbo
<class 'str'> List
<class 'str'> MacMillan
<class 'str'> Card
<class 'str'> Angrist
<class 'str'> Imbens
<class 'str'> Gurnah
<class 'str'> Ressa
<class 'str'> Muratov
<class 'str'> M

#### Now, creating a dictionary using this list 

In [67]:
data={'id':ids,'year':year,'category':category,'firstname':firstname,'surname':surname,'motivation':motivation,'share':share}

#### Importing this data into pandas dataframe 

In [70]:
data=pd.DataFrame(data)
data

Unnamed: 0,id,year,category,firstname,surname,motivation,share
0,1039,2024,chemistry,David,Baker,"""for computational protein design""",2
1,1040,2024,chemistry,Demis,Hassabis,"""for protein structure prediction""",4
2,1041,2024,chemistry,John,Jumper,"""for protein structure prediction""",4
3,1044,2024,economics,Daron,Acemoglu,"""for studies of how institutions are formed an...",3
4,1045,2024,economics,Simon,Johnson,"""for studies of how institutions are formed an...",3
...,...,...,...,...,...,...,...
1007,569,1901,literature,Sully,Prudhomme,"""in special recognition of his poetic composit...",1
1008,462,1901,peace,Henry,Dunant,"""for his humanitarian efforts to help wounded ...",2
1009,463,1901,peace,Frédéric,Passy,"""for his lifelong work for international peace...",2
1010,1,1901,physics,Wilhelm Conrad,Röntgen,"""in recognition of the extraordinary services ...",1


In [72]:
len(data['id'])

1012

In [91]:
data['year'].max()

'2024'

In [93]:
data['year'].min()

'1901'

In [95]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1012 entries, 0 to 1011
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          1012 non-null   object
 1   year        1012 non-null   object
 2   category    1012 non-null   object
 3   firstname   1012 non-null   object
 4   surname     979 non-null    object
 5   motivation  1012 non-null   object
 6   share       1012 non-null   object
dtypes: object(7)
memory usage: 55.5+ KB


In [111]:
data[data['surname'].isnull()]

Unnamed: 0,id,year,category,firstname,surname,motivation,share
7,1043,2024,peace,Nihon Hidankyo,,"""for its efforts to achieve a world free of nu...",1
31,1019,2022,peace,Memorial,,"""The Peace Prize laureates represent civil soc...",3
32,1020,2022,peace,Center for Civil Liberties,,"""The Peace Prize laureates represent civil soc...",3
55,994,2020,peace,World Food Programme,,"""for its efforts to combat hunger, for its con...",1
94,948,2017,peace,International Campaign to Abolish Nuclear Weapons,,"""for its work to draw attention to the catastr...",1
117,925,2015,peace,National Dialogue Quartet,,"""for its decisive contribution to the building...",1
143,893,2013,peace,Organisation for the Prohibition of Chemical W...,,"""for its extensive efforts to eliminate chemic...",1
154,881,2012,peace,European Union,,"""for over six decades contributed to the advan...",1
213,818,2007,peace,Intergovernmental Panel on Climate Change,,"""for their efforts to build up and disseminate...",2
224,810,2006,peace,Grameen Bank,,"""for their efforts to create economic and soci...",2


#### Creating a column named laureate_type

In [133]:
data['laureate_type']=np.where(data['surname'].isnull(),'Organization','Individual')

In [135]:
data

Unnamed: 0,id,year,category,firstname,surname,motivation,share,laureate_type
0,1039,2024,chemistry,David,Baker,"""for computational protein design""",2,Individual
1,1040,2024,chemistry,Demis,Hassabis,"""for protein structure prediction""",4,Individual
2,1041,2024,chemistry,John,Jumper,"""for protein structure prediction""",4,Individual
3,1044,2024,economics,Daron,Acemoglu,"""for studies of how institutions are formed an...",3,Individual
4,1045,2024,economics,Simon,Johnson,"""for studies of how institutions are formed an...",3,Individual
...,...,...,...,...,...,...,...,...
1007,569,1901,literature,Sully,Prudhomme,"""in special recognition of his poetic composit...",1,Individual
1008,462,1901,peace,Henry,Dunant,"""for his humanitarian efforts to help wounded ...",2,Individual
1009,463,1901,peace,Frédéric,Passy,"""for his lifelong work for international peace...",2,Individual
1010,1,1901,physics,Wilhelm Conrad,Röntgen,"""in recognition of the extraordinary services ...",1,Individual


In [213]:
str.rstrip(' Hello ')

' Hello'

In [231]:
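def clean_motivation(string):
    # Remove the literal double quotes that wrap each motivation
    string=string.replace('"','')
    # Drop leading whitespace
    string=string.lstrip()
    # Upper-case the first character (capitalize also lower-cases the rest)
    string=string.capitalize()
    return string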

In [233]:
data['motivation']=data['motivation'].apply(clean_motivation)
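
`str.capitalize` upper-cases the first character and lower-cases every other character, so proper nouns inside a motivation lose their capitals. A sketch of a variant that changes only the first character is shown here. `clean_motivation_keep_case` is a hypothetical name, and the sample string is made up for illustration.

```python
def clean_motivation_keep_case(text):
    """Strip quotes and surrounding whitespace, upper-case only the first character."""
    text = text.replace('"', '').strip()
    # Slicing with [:1] is safe even when the string is empty.
    return text[:1].upper() + text[1:]


sample = '"for the discovery of X-rays in Germany"'
kept = clean_motivation_keep_case(sample)
lowered = sample.replace('"', '').capitalize()
print(kept)
print(lowered)
```

Whether the original casing matters depends on how the motivation text is used later, so this is an option rather than a required change.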

In [239]:
data

Unnamed: 0,id,year,category,firstname,surname,motivation,share,laureate_type
0,1039,2024,chemistry,David,Baker,For computational protein design,2,Individual
1,1040,2024,chemistry,Demis,Hassabis,For protein structure prediction,4,Individual
2,1041,2024,chemistry,John,Jumper,For protein structure prediction,4,Individual
3,1044,2024,economics,Daron,Acemoglu,For studies of how institutions are formed and...,3,Individual
4,1045,2024,economics,Simon,Johnson,For studies of how institutions are formed and...,3,Individual
...,...,...,...,...,...,...,...,...
1007,569,1901,literature,Sully,Prudhomme,In special recognition of his poetic compositi...,1,Individual
1008,462,1901,peace,Henry,Dunant,For his humanitarian efforts to help wounded s...,2,Individual
1009,463,1901,peace,Frédéric,Passy,For his lifelong work for international peace ...,2,Individual
1010,1,1901,physics,Wilhelm Conrad,Röntgen,In recognition of the extraordinary services h...,1,Individual


In [370]:
data['id'].astype(int)

TypeError: list indices must be integers or slices, not str
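
The `TypeError` above is what Python raises when a list is indexed with a string, which suggests that `data` held a plain list rather than the DataFrame when this cell ran (the JSON export cell further down rebinds `data` to a list). Assuming the DataFrame is kept under its own name, a sketch of the conversion could look like this. `prizes_df` and `typed_df` are hypothetical names, and the three rows are copied from the table above.

```python
import pandas as pd

prizes_df = pd.DataFrame({
    'id': ['1039', '462', '1'],
    'year': ['2024', '1901', '1901'],
    'category': ['chemistry', 'peace', 'physics'],
    'share': ['2', '2', '1'],
})

# astype returns a new DataFrame; assign the result to keep the converted columns.
typed_df = prizes_df.astype({'id': int, 'year': int, 'share': int})

print(typed_df.dtypes)
print(typed_df['year'].min(), typed_df['year'].max())
```

With string years, `min` and `max` compare text. That gives the expected answer here only because every year has four digits, so converting to integers is the safer basis for decade calculations.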

In [241]:
# Creating list to store the items in each dictionary 
ids=[]
firstname=[]
surname=[]
year=[]
category=[]
motivation=[]
share=[]

In [243]:
var=len(response_api_v1['prizes'])
var

676

In [24]:
for dict_track in range(var):
    for i in response_api_v1['prizes'][dict_track]['laureates']:
        ids.append(i['id'])
        firstname.append(i['firstname'])
        surname.append(i['surname'])
        motivation.append(i['motivation'])
        year.append(response_api_v1['prizes'][dict_track]['year'])
        category.append(response_api_v1['prizes'][dict_track]['category'])

KeyError: 'surname'

In [189]:
key3='laureates'
list_of_incon_keys=[]
for i in range(var):
    if key3 in response_api_v1['prizes'][i].keys():
        pass
    else:
        list_of_incon_keys.append(i)

In [193]:
# Number of prize entries that have no 'laureates' key
len(list_of_incon_keys)

49
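
To see where the entries without a `laureates` key fall, they can be tallied by category. A sketch with `collections.Counter` follows. `count_unawarded` is a hypothetical helper, and the sample entries are shortened stand-ins shaped like the entries in this notebook.

```python
from collections import Counter


def count_unawarded(prizes):
    """Count prize entries without a 'laureates' key, grouped by category."""
    return Counter(p['category'] for p in prizes if 'laureates' not in p)


# Hypothetical sample: two entries without laureates, one with.
sample_prizes = [
    {'year': '1914', 'category': 'peace', 'overallMotivation': 'sample text'},
    {'year': '1972', 'category': 'peace', 'overallMotivation': 'sample text'},
    {'year': '2024', 'category': 'chemistry',
     'laureates': [{'id': '1039', 'firstname': 'David', 'surname': 'Baker'}]},
]

unawarded = count_unawarded(sample_prizes)
print(unawarded)
```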

In [203]:
# Keys of the entries for years in which the Nobel Prize was not awarded
for i in list_of_incon_keys:
    print(response_api_v1['prizes'][i].keys())

dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivation'])
dict_keys(['year', 'category', 'overallMotivat

In [223]:
print(np.nan)

nan


In [245]:
for dict_track in range(var):
    for i in response_api_v1['prizes'][dict_track]:
        # i is a key (a string) here, because iterating over a dict yields its keys
        if 'laureates' in i.keys():
            pass
        """
        else:
            
            ids.append(np.nan)
            firstname.append(np.nan)
            surname.append(np.nan)
            motivation.append(np.nan)
            year.append(np.nan)
            category.append(np.nan)
            print(i)
            break
        """
    
    #for i in response_api_v1['prizes'][dict_track]['laureates']:
        """
        ids.append(i['id'])
        firstname.append(i['firstname'])
        surname.append(i['surname'])
        motivation.append(i['motivation'])
        year.append(response_api_v1['prizes'][dict_track]['year'])
        category.append(response_api_v1['prizes'][dict_track]['category'])"""
        

AttributeError: 'str' object has no attribute 'keys'
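
The `AttributeError` above arises because iterating over a dictionary yields its keys, which are strings here, so `.keys()` ends up being called on a `str`. A short sketch of the difference, using a hypothetical, shortened prize entry:

```python
prize = {'year': '2024', 'category': 'peace',
         'laureates': [{'id': '1043', 'firstname': 'Nihon Hidankyo', 'share': '1'}]}

# Iterating over a dict yields its keys, not nested dictionaries.
iterated_types = [type(item).__name__ for item in prize]
print(iterated_types)

# Membership is tested on the dictionary itself, with no inner loop needed.
has_laureates = 'laureates' in prize
print(has_laureates)
```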

In [253]:
surname

['Baker',
 'Hassabis',
 'Jumper',
 'Acemoglu',
 'Johnson',
 'Robinson',
 'Han',
 'Baker',
 'Hassabis',
 'Jumper',
 'Acemoglu',
 'Johnson',
 'Robinson',
 'Han',
 nan,
 nan,
 'Baker',
 'Hassabis',
 'Jumper',
 'Acemoglu',
 'Johnson',
 'Robinson',
 'Han']

In [255]:
ids=[]
firstname=[]
surname=[]
year=[]
category=[]
motivation=[]
share=[]

In [257]:
for dict_track in range(var):
    if 'laureates' in response_api_v1['prizes'][dict_track]:
        for i in response_api_v1['prizes'][dict_track]['laureates']:
            ids.append(i['id'])
            firstname.append(i['firstname'])
            surname.append(i['surname'])
            motivation.append(i['motivation'])
            year.append(response_api_v1['prizes'][dict_track]['year'])
            category.append(response_api_v1['prizes'][dict_track]['category'])
    else:
        print(dict_track)
        break

KeyError: 'surname'

In [263]:
# Checking whether surname is present consistently across the dictionaries
for dict_track in range(var):
    if 'laureates' in response_api_v1['prizes'][dict_track]:
        if 'surname' in response_api_v1['prizes'][dict_track]['laureates'].keys():
            pass
        else:
            print(dict_track)
            break

AttributeError: 'list' object has no attribute 'keys'

In [243]:
'laureates' in response_api_v1['prizes'][0].keys()

True

In [197]:
response_api_v1['prizes'][608]

{'year': '1914',
 'category': 'peace',
 'overallMotivation': '"No Nobel Prize was awarded this year. The prize money was allocated to the Special Fund of this prize section."'}

In [187]:
response_api_v1['prizes'][343]

{'year': '1967',
 'category': 'peace',
 'overallMotivation': '"No Nobel Prize was awarded this year. 1/3 of the prize money was allocated to the main fund and 2/3 was allocated to the special fund of this prize section."'}

In [165]:
response_api_v1['prizes'][315]

{'year': '1972',
 'category': 'peace',
 'overallMotivation': '"No Nobel Prize was awarded this year. The prize money for 1972 was allocated to the Main Fund."'}

In [149]:
response_api_v1['prizes'][0].keys()

dict_keys(['year', 'category', 'laureates'])

In [42]:
# Checking the consistency of ids against API Version 2
# Checking an arbitrary id, e.g. 745

In [269]:
check_id='745'

In [271]:
for i in range(var):
    if 'laureates' in response_api_v1['prizes'][i]:
        for j in response_api_v1['prizes'][i]['laureates']:
            if j['id']==check_id:
                print(j['id'],j['firstname']+' '+j['surname'])

745 A. Michael Spence
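
The lookup above concatenates `firstname` and `surname`, which would raise a `KeyError` for an organization id because those entries have no `surname`. A sketch of an id-to-name index that tolerates a missing surname follows. `build_name_index` is a hypothetical helper, and the sample entries are shortened versions of output shown earlier.

```python
def build_name_index(prizes):
    """Map each laureate id to a display name, with or without a surname."""
    index = {}
    for prize in prizes:
        for laureate in prize.get('laureates', []):
            parts = [laureate.get('firstname'), laureate.get('surname')]
            # Join only the parts that are present and non-empty.
            index[laureate['id']] = ' '.join(p for p in parts if p)
    return index


sample_prizes = [
    {'year': '2024', 'category': 'chemistry',
     'laureates': [{'id': '1039', 'firstname': 'David', 'surname': 'Baker'}]},
    {'year': '2024', 'category': 'peace',
     'laureates': [{'id': '1043', 'firstname': 'Nihon Hidankyo'}]},
]

name_index = build_name_index(sample_prizes)
print(name_index.get('1043'))
print(name_index.get('0000'))
```

Building the index once also avoids rescanning every prize for each id that needs checking.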


In [352]:
# API Version 2 
url="https://api.nobelprize.org/2.1/laureates"

In [354]:
# Fetching the response from the api version 2 
response_laureates=requests.get(url).json()

In [358]:
len(response_laureates['laureates'])

25
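
Only 25 laureates come back here, far fewer than the 1012 laureate rows built from Version 1, so this response looks like a single page of results. A sketch of paging through an endpoint follows, assuming it accepts `offset` and `limit` query parameters and returns the records under a `laureates` key. Those parameter names, `fetch_all`, and the `fetch_page` callable are assumptions to check against the API documentation. The fake fetcher keeps the example runnable offline.

```python
def fetch_all(fetch_page, page_size=25):
    """Collect records page by page until a short or empty page is returned.

    fetch_page(offset, limit) must return a dict with a 'laureates' list.
    """
    records = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size).get('laureates', [])
        records.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return records


# Offline stand-in for the real request (hypothetical data: 60 fake ids).
fake_store = [{'id': str(n)} for n in range(60)]


def fake_fetch_page(offset, limit):
    return {'laureates': fake_store[offset:offset + limit]}


all_records = fetch_all(fake_fetch_page)
print(len(all_records))
```

With `requests`, `fetch_page` could be a small function wrapping `requests.get(url, params={'offset': offset, 'limit': limit}).json()`, provided the endpoint supports those parameters.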

In [362]:
response_laureates['laureates'][-1]

{'id': '403',
 'knownName': {'en': 'Albert Claude', 'se': 'Albert Claude'},
 'givenName': {'en': 'Albert', 'se': 'Albert'},
 'familyName': {'en': 'Claude', 'se': 'Claude'},
 'fullName': {'en': 'Albert Claude', 'se': 'Albert Claude'},
 'fileName': 'claude',
 'gender': 'male',
 'birth': {'date': '1898-08-24',
  'place': {'city': {'en': 'Longlier', 'no': 'Longlier', 'se': 'Longlier'},
   'country': {'en': 'Belgium', 'no': 'Belgia', 'se': 'Belgien'},
   'cityNow': {'en': 'Longlier', 'no': 'Longlier', 'se': 'Longlier'},
   'countryNow': {'en': 'Belgium',
    'no': 'Belgia',
    'se': 'Belgien',
    'sameAs': ['https://www.wikidata.org/wiki/Q31'],
    'latitude': '50.641111',
    'longitude': '4.668056'},
   'continent': {'en': 'Europe', 'no': 'Europa', 'se': 'Europa'},
   'locationString': {'en': 'Longlier, Belgium',
    'no': 'Longlier, Belgia',
    'se': 'Longlier, Belgien'}}},
 'death': {'date': '1983-05-22',
  'place': {'city': {'en': 'Brussels', 'no': 'Brussel', 'se': 'Bryssel'},
   'c

In [267]:
# From this response we want to extract the following fields:
# title,gender,birthdate,birthcity,birthcountry,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country,dateAwarded
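
A sketch of pulling a few of these fields out of one laureate record follows. It uses only keys visible in the output above (`knownName`, `gender`, `birth`, `death`). `dig` and `flatten_laureate` are hypothetical helper names, and the sample record is an abbreviated copy of the Albert Claude entry.

```python
def dig(record, *keys):
    """Follow nested keys, returning None as soon as one is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def flatten_laureate(record):
    """Extract a flat dictionary of selected fields from one nested record."""
    return {
        'id': record.get('id'),
        'name': dig(record, 'knownName', 'en'),
        'gender': record.get('gender'),
        'birth_date': dig(record, 'birth', 'date'),
        'birth_city': dig(record, 'birth', 'place', 'city', 'en'),
        'birth_country': dig(record, 'birth', 'place', 'country', 'en'),
        'death_date': dig(record, 'death', 'date'),
        'death_city': dig(record, 'death', 'place', 'city', 'en'),
    }


sample_record = {
    'id': '403',
    'knownName': {'en': 'Albert Claude'},
    'gender': 'male',
    'birth': {'date': '1898-08-24',
              'place': {'city': {'en': 'Longlier'},
                        'country': {'en': 'Belgium'}}},
    'death': {'date': '1983-05-22',
              'place': {'city': {'en': 'Brussels'}}},
}

flat = flatten_laureate(sample_record)
print(flat)
```

Returning `None` for a missing branch matters because some records carry only part of the structure, for example a `birth` entry with a date but no place, or no `death` entry at all.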

In [328]:
response_laureates

{'laureates': [{'id': '745',
   'knownName': {'en': 'A. Michael Spence', 'se': 'A. Michael Spence'},
   'givenName': {'en': 'A. Michael', 'se': 'A. Michael'},
   'familyName': {'en': 'Spence', 'se': 'Spence'},
   'fullName': {'en': 'A. Michael Spence', 'se': 'A. Michael Spence'},
   'fileName': 'spence',
   'gender': 'male',
   'birth': {'date': '1943-00-00',
    'place': {'city': {'en': 'Montclair, NJ',
      'no': 'Montclair, NJ',
      'se': 'Montclair, NJ'},
     'country': {'en': 'USA', 'no': 'USA', 'se': 'USA'},
     'cityNow': {'en': 'Montclair, NJ',
      'no': 'Montclair, NJ',
      'se': 'Montclair, NJ',
      'sameAs': ['https://www.wikidata.org/wiki/Q678437',
       'https://www.wikipedia.org/wiki/Montclair,_New_Jersey'],
      'latitude': '40.825930',
      'longitude': '-74.209030'},
     'countryNow': {'en': 'USA',
      'no': 'USA',
      'se': 'USA',
      'sameAs': ['https://www.wikidata.org/wiki/Q30'],
      'latitude': '39.828175',
      'longitude': '-98.579500'},


#### Exporting the data to a JSON file

In [331]:
import json

filename='response_api_version_2.json'
# Note: this rebinds `data`, which previously held the prize DataFrame, to a list
data=response_laureates['laureates']
with open(filename,'w') as json_file:
    json.dump(data,json_file)
print(f'Data saved to {filename}')

Data saved to response_api_version_2.json


#### Loading the JSON data into pandas DataFrame

In [4]:
data_read=pd.read_json(filename)

NameError: name 'filename' is not defined
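
The `NameError` above means `filename` was not defined in the session that ran this cell, which can happen after a kernel restart. A sketch of a self-contained round trip that defines the path next to the read is shown here. It writes to a temporary directory, and the two-record list is hypothetical stand-in data.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for response_laureates['laureates'].
records = [{'id': '745', 'gender': 'male'}, {'id': '843', 'gender': 'female'}]

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / 'response_api_version_2.json'
    # Write the records, then read them back from the same path.
    with open(json_path, 'w', encoding='utf-8') as json_file:
        json.dump(records, json_file)
    with open(json_path, encoding='utf-8') as json_file:
        loaded = json.load(json_file)

print(loaded == records)
```

Keeping the path definition in the same cell as the read makes the cell safe to rerun on its own.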

In [336]:
data_read.tail()

Unnamed: 0,id,knownName,givenName,familyName,fullName,fileName,gender,birth,wikipedia,wikidata,sameAs,links,nobelPrizes,death
20,376,"{'en': 'Alan Hodgkin', 'se': 'Alan Hodgkin'}","{'en': 'Alan', 'se': 'Alan'}","{'en': 'Hodgkin', 'se': 'Hodgkin'}","{'en': 'Alan Lloyd Hodgkin', 'se': 'Alan Lloyd...",hodgkin,male,"{'date': '1914-02-05', 'place': {'city': {'en'...","{'slug': 'Alan_Lloyd_Hodgkin', 'english': 'htt...","{'id': 'Q193650', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q193650, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '1963', 'category': {'en': 'Phy...","{'date': '1998-12-20', 'place': {'city': {'en'..."
21,730,"{'en': 'Alan MacDiarmid', 'se': 'Alan MacDiarm...","{'en': 'Alan', 'se': 'Alan'}","{'en': 'MacDiarmid', 'se': 'MacDiarmid'}","{'en': 'Alan G. MacDiarmid', 'se': 'Alan G. Ma...",macdiarmid,male,"{'date': '1927-04-14', 'place': {'city': {'en'...","{'slug': 'Alan_MacDiarmid', 'english': 'https:...","{'id': 'Q110942', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q110942, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2000', 'category': {'en': 'Che...","{'date': '2007-02-07', 'place': {'city': {'en'..."
22,11,"{'en': 'Albert A. Michelson', 'se': 'Albert A....","{'en': 'Albert A.', 'se': 'Albert A.'}","{'en': 'Michelson', 'se': 'Michelson'}","{'en': 'Albert Abraham Michelson', 'se': 'Albe...",michelson,male,"{'date': '1852-12-19', 'place': {'city': {'en'...","{'slug': 'Albert_Abraham_Michelson', 'english'...","{'id': 'Q127234', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q127234, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '1907', 'category': {'en': 'Phy...","{'date': '1931-05-09', 'place': {'city': {'en'..."
23,628,"{'en': 'Albert Camus', 'se': 'Albert Camus'}","{'en': 'Albert', 'se': 'Albert'}","{'en': 'Camus', 'se': 'Camus'}","{'en': 'Albert Camus', 'se': 'Albert Camus'}",camus,male,"{'date': '1913-11-07', 'place': {'city': {'en'...","{'slug': 'Albert_Camus', 'english': 'https://e...","{'id': 'Q34670', 'url': 'https://www.wikidata....","[https://www.wikidata.org/wiki/Q34670, https:/...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '1957', 'category': {'en': 'Lit...","{'date': '1960-01-04', 'place': {'city': {'en'..."
24,403,"{'en': 'Albert Claude', 'se': 'Albert Claude'}","{'en': 'Albert', 'se': 'Albert'}","{'en': 'Claude', 'se': 'Claude'}","{'en': 'Albert Claude', 'se': 'Albert Claude'}",claude,male,"{'date': '1898-08-24', 'place': {'city': {'en'...","{'slug': 'Albert_Claude', 'english': 'https://...","{'id': 'Q233943', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q233943, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '1974', 'category': {'en': 'Phy...","{'date': '1983-05-22', 'place': {'city': {'en'..."


In [348]:
data

[{'id': '745',
  'knownName': {'en': 'A. Michael Spence', 'se': 'A. Michael Spence'},
  'givenName': {'en': 'A. Michael', 'se': 'A. Michael'},
  'familyName': {'en': 'Spence', 'se': 'Spence'},
  'fullName': {'en': 'A. Michael Spence', 'se': 'A. Michael Spence'},
  'fileName': 'spence',
  'gender': 'male',
  'birth': {'date': '1943-00-00',
   'place': {'city': {'en': 'Montclair, NJ',
     'no': 'Montclair, NJ',
     'se': 'Montclair, NJ'},
    'country': {'en': 'USA', 'no': 'USA', 'se': 'USA'},
    'cityNow': {'en': 'Montclair, NJ',
     'no': 'Montclair, NJ',
     'se': 'Montclair, NJ',
     'sameAs': ['https://www.wikidata.org/wiki/Q678437',
      'https://www.wikipedia.org/wiki/Montclair,_New_Jersey'],
     'latitude': '40.825930',
     'longitude': '-74.209030'},
    'countryNow': {'en': 'USA',
     'no': 'USA',
     'se': 'USA',
     'sameAs': ['https://www.wikidata.org/wiki/Q30'],
     'latitude': '39.828175',
     'longitude': '-98.579500'},
    'continent': {'en': 'North America

In [346]:
response_laureates['laureates']

[{'id': '745',
  'knownName': {'en': 'A. Michael Spence', 'se': 'A. Michael Spence'},
  'givenName': {'en': 'A. Michael', 'se': 'A. Michael'},
  'familyName': {'en': 'Spence', 'se': 'Spence'},
  'fullName': {'en': 'A. Michael Spence', 'se': 'A. Michael Spence'},
  'fileName': 'spence',
  'gender': 'male',
  'birth': {'date': '1943-00-00',
   'place': {'city': {'en': 'Montclair, NJ',
     'no': 'Montclair, NJ',
     'se': 'Montclair, NJ'},
    'country': {'en': 'USA', 'no': 'USA', 'se': 'USA'},
    'cityNow': {'en': 'Montclair, NJ',
     'no': 'Montclair, NJ',
     'se': 'Montclair, NJ',
     'sameAs': ['https://www.wikidata.org/wiki/Q678437',
      'https://www.wikipedia.org/wiki/Montclair,_New_Jersey'],
     'latitude': '40.825930',
     'longitude': '-74.209030'},
    'countryNow': {'en': 'USA',
     'no': 'USA',
     'se': 'USA',
     'sameAs': ['https://www.wikidata.org/wiki/Q30'],
     'latitude': '39.828175',
     'longitude': '-98.579500'},
    'continent': {'en': 'North America

In [350]:
data_read

Unnamed: 0,id,knownName,givenName,familyName,fullName,fileName,gender,birth,wikipedia,wikidata,sameAs,links,nobelPrizes,death
0,745,"{'en': 'A. Michael Spence', 'se': 'A. Michael ...","{'en': 'A. Michael', 'se': 'A. Michael'}","{'en': 'Spence', 'se': 'Spence'}","{'en': 'A. Michael Spence', 'se': 'A. Michael ...",spence,male,"{'date': '1943-00-00', 'place': {'city': {'en'...","{'slug': 'Michael_Spence', 'english': 'https:/...","{'id': 'Q157245', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q157245, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2001', 'category': {'en': 'Eco...",
1,102,"{'en': 'Aage N. Bohr', 'se': 'Aage N. Bohr'}","{'en': 'Aage N.', 'se': 'Aage N.'}","{'en': 'Bohr', 'se': 'Bohr'}","{'en': 'Aage Niels Bohr', 'se': 'Aage Niels Bo...",bohr,male,"{'date': '1922-06-19', 'place': {'city': {'en'...","{'slug': 'Aage_Bohr', 'english': 'https://en.w...","{'id': 'Q103854', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q103854, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '1975', 'category': {'en': 'Phy...","{'date': '2009-09-08', 'place': {'city': {'en'..."
2,779,"{'en': 'Aaron Ciechanover', 'se': 'Aaron Ciech...","{'en': 'Aaron', 'se': 'Aaron'}","{'en': 'Ciechanover', 'se': 'Ciechanover'}","{'en': 'Aaron Ciechanover', 'se': 'Aaron Ciech...",ciechanover,male,"{'date': '1947-10-01', 'place': {'city': {'en'...","{'slug': 'Aaron_Ciechanover', 'english': 'http...","{'id': 'Q233205', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q233205, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2004', 'category': {'en': 'Che...",
3,259,"{'en': 'Aaron Klug', 'se': 'Aaron Klug'}","{'en': 'Aaron', 'se': 'Aaron'}","{'en': 'Klug', 'se': 'Klug'}","{'en': 'Aaron Klug', 'se': 'Aaron Klug'}",klug,male,"{'date': '1926-08-11', 'place': {'city': {'en'...","{'slug': 'Aaron_Klug', 'english': 'https://en....","{'id': 'Q190626', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q190626, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '1982', 'category': {'en': 'Che...","{'date': '2018-11-20', 'place': {'locationStri..."
4,1004,"{'en': 'Abdulrazak Gurnah', 'se': 'Abdulrazak ...","{'en': 'Abdulrazak', 'se': 'Abdulrazak'}","{'en': 'Gurnah', 'se': 'Gurnah'}","{'en': 'Abdulrazak Gurnah', 'se': 'Abdulrazak ...",gurnah,male,{'date': '1948-00-00'},"{'slug': 'Abdulrazak_Gurnah', 'english': 'http...","{'id': 'Q317877', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q317877, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2021', 'category': {'en': 'Lit...",
5,114,"{'en': 'Abdus Salam', 'se': 'Abdus Salam'}","{'en': 'Abdus', 'se': 'Abdus'}","{'en': 'Salam', 'se': 'Salam'}","{'en': 'Abdus Salam', 'se': 'Abdus Salam'}",salam,male,"{'date': '1926-01-29', 'place': {'city': {'en'...","{'slug': 'Abdus_Salam', 'english': 'https://en...","{'id': 'Q28189', 'url': 'https://www.wikidata....","[https://www.wikidata.org/wiki/Q28189, https:/...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '1979', 'category': {'en': 'Phy...","{'date': '1996-11-21', 'place': {'city': {'en'..."
6,982,"{'en': 'Abhijit Banerjee', 'se': 'Abhijit Bane...","{'en': 'Abhijit', 'se': 'Abhijit'}","{'en': 'Banerjee', 'se': 'Banerjee'}","{'en': 'Abhijit Banerjee', 'se': 'Abhijit Bane...",banerjee,male,"{'date': '1961-02-21', 'place': {'city': {'en'...","{'slug': 'Abhijit_Banerjee', 'english': 'https...","{'id': 'Q320578', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q320578, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2019', 'category': {'en': 'Eco...",
7,981,"{'en': 'Abiy Ahmed Ali', 'se': 'Abiy Ahmed Ali'}","{'en': 'Abiy', 'se': 'Abiy'}","{'en': 'Ahmed Ali', 'se': 'Ahmed Ali'}","{'en': 'Abiy Ahmed Ali', 'se': 'Abiy Ahmed Ali'}",abiy,male,"{'date': '1976-08-15', 'place': {'city': {'en'...","{'slug': 'Abiy_Ahmed', 'english': 'https://en....","{'id': 'Q50365049', 'url': 'https://www.wikida...","[https://www.wikidata.org/wiki/Q50365049, http...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2019', 'category': {'en': 'Pea...",
8,843,"{'en': 'Ada E. Yonath', 'se': 'Ada E. Yonath'}","{'en': 'Ada E.', 'se': 'Ada E.'}","{'en': 'Yonath', 'se': 'Yonath'}","{'en': 'Ada E. Yonath', 'se': 'Ada E. Yonath'}",yonath,female,"{'date': '1939-06-22', 'place': {'city': {'en'...","{'slug': 'Ada_Yonath', 'english': 'https://en....","{'id': 'Q7426', 'url': 'https://www.wikidata.o...","[https://www.wikidata.org/wiki/Q7426, https://...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2009', 'category': {'en': 'Che...",
9,866,"{'en': 'Adam G. Riess', 'se': 'Adam G. Riess'}","{'en': 'Adam G.', 'se': 'Adam G.'}","{'en': 'Riess', 'se': 'Riess'}","{'en': 'Adam G. Riess', 'se': 'Adam G. Riess'}",riess,male,"{'date': '1969-12-16', 'place': {'city': {'en'...","{'slug': 'Adam_Riess', 'english': 'https://en....","{'id': 'Q106454', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q106454, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2011', 'category': {'en': 'Phy...",


In [4]:
import pandas as pd 
import numpy as np

#### Loading the Completed Data into a pandas DataFrame

In [7]:
all_laureates_data=pd.read_json('all_laureates.json')

In [9]:
all_laureates_data.head()

Unnamed: 0,id,knownName,givenName,familyName,fullName,fileName,gender,birth,wikipedia,wikidata,...,death,orgName,acronym,founded,nativeName,penName,penNameOf,foundedCountry,foundedCountryNow,foundedContinent
0,745,"{'en': 'A. Michael Spence', 'se': 'A. Michael ...","{'en': 'A. Michael', 'se': 'A. Michael'}","{'en': 'Spence', 'se': 'Spence'}","{'en': 'A. Michael Spence', 'se': 'A. Michael ...",spence,male,"{'date': '1943-00-00', 'place': {'city': {'en'...","{'slug': 'Michael_Spence', 'english': 'https:/...","{'id': 'Q157245', 'url': 'https://www.wikidata...",...,,,,,,,,,,
1,102,"{'en': 'Aage N. Bohr', 'se': 'Aage N. Bohr'}","{'en': 'Aage N.', 'se': 'Aage N.'}","{'en': 'Bohr', 'se': 'Bohr'}","{'en': 'Aage Niels Bohr', 'se': 'Aage Niels Bo...",bohr,male,"{'date': '1922-06-19', 'place': {'city': {'en'...","{'slug': 'Aage_Bohr', 'english': 'https://en.w...","{'id': 'Q103854', 'url': 'https://www.wikidata...",...,"{'date': '2009-09-08', 'place': {'city': {'en'...",,,,,,,,,
2,779,"{'en': 'Aaron Ciechanover', 'se': 'Aaron Ciech...","{'en': 'Aaron', 'se': 'Aaron'}","{'en': 'Ciechanover', 'se': 'Ciechanover'}","{'en': 'Aaron Ciechanover', 'se': 'Aaron Ciech...",ciechanover,male,"{'date': '1947-10-01', 'place': {'city': {'en'...","{'slug': 'Aaron_Ciechanover', 'english': 'http...","{'id': 'Q233205', 'url': 'https://www.wikidata...",...,,,,,,,,,,
3,259,"{'en': 'Aaron Klug', 'se': 'Aaron Klug'}","{'en': 'Aaron', 'se': 'Aaron'}","{'en': 'Klug', 'se': 'Klug'}","{'en': 'Aaron Klug', 'se': 'Aaron Klug'}",klug,male,"{'date': '1926-08-11', 'place': {'city': {'en'...","{'slug': 'Aaron_Klug', 'english': 'https://en....","{'id': 'Q190626', 'url': 'https://www.wikidata...",...,"{'date': '2018-11-20', 'place': {'locationStri...",,,,,,,,,
4,1004,"{'en': 'Abdulrazak Gurnah', 'se': 'Abdulrazak ...","{'en': 'Abdulrazak', 'se': 'Abdulrazak'}","{'en': 'Gurnah', 'se': 'Gurnah'}","{'en': 'Abdulrazak Gurnah', 'se': 'Abdulrazak ...",gurnah,male,{'date': '1948-00-00'},"{'slug': 'Abdulrazak_Gurnah', 'english': 'http...","{'id': 'Q317877', 'url': 'https://www.wikidata...",...,,,,,,,,,,


In [11]:
# Number of laureates (one row per individual or organization)
print(len(all_laureates_data))

1004


In [13]:
all_laureates_data.columns

Index(['id', 'knownName', 'givenName', 'familyName', 'fullName', 'fileName',
       'gender', 'birth', 'wikipedia', 'wikidata', 'sameAs', 'links',
       'nobelPrizes', 'death', 'orgName', 'acronym', 'founded', 'nativeName',
       'penName', 'penNameOf', 'foundedCountry', 'foundedCountryNow',
       'foundedContinent'],
      dtype='object')

In [15]:
all_laureates_data['orgName'].apply(pd.Series)['en'].unique()

array([nan, 'American Friends Service Committee', 'Amnesty International',
       'Center for Civil Liberties', 'Doctors Without Borders',
       'European Union', 'Friends Service Council', 'Grameen Bank',
       'Institute of International Law',
       'Intergovernmental Panel on Climate Change',
       'International Atomic Energy Agency',
       'International Campaign to Abolish Nuclear Weapons',
       'International Campaign to Ban Landmines',
       'International Committee of the Red Cross',
       'International Labour Organization',
       'International Physicians for the Prevention of Nuclear War',
       'League of Red Cross Societies', 'Memorial',
       'Nansen International Office for Refugees',
       'National Dialogue Quartet', 'Nihon Hidankyo',
       'Office of the United Nations High Commissioner for Refugees',
       'Organisation for the Prohibition of Chemical Weapons',
       'Permanent International Peace Bureau',
       'Pugwash Conferences on Science and W

In [17]:
# Creating a column named laureate_type to show whether the laureate is an individual or an organization.
all_laureates_data['laureate_type']=np.where(all_laureates_data['orgName'].isnull(),'Individual','Organization')
all_laureates_data.head()

Unnamed: 0,id,knownName,givenName,familyName,fullName,fileName,gender,birth,wikipedia,wikidata,...,orgName,acronym,founded,nativeName,penName,penNameOf,foundedCountry,foundedCountryNow,foundedContinent,laureate_type
0,745,"{'en': 'A. Michael Spence', 'se': 'A. Michael ...","{'en': 'A. Michael', 'se': 'A. Michael'}","{'en': 'Spence', 'se': 'Spence'}","{'en': 'A. Michael Spence', 'se': 'A. Michael ...",spence,male,"{'date': '1943-00-00', 'place': {'city': {'en'...","{'slug': 'Michael_Spence', 'english': 'https:/...","{'id': 'Q157245', 'url': 'https://www.wikidata...",...,,,,,,,,,,Individual
1,102,"{'en': 'Aage N. Bohr', 'se': 'Aage N. Bohr'}","{'en': 'Aage N.', 'se': 'Aage N.'}","{'en': 'Bohr', 'se': 'Bohr'}","{'en': 'Aage Niels Bohr', 'se': 'Aage Niels Bo...",bohr,male,"{'date': '1922-06-19', 'place': {'city': {'en'...","{'slug': 'Aage_Bohr', 'english': 'https://en.w...","{'id': 'Q103854', 'url': 'https://www.wikidata...",...,,,,,,,,,,Individual
2,779,"{'en': 'Aaron Ciechanover', 'se': 'Aaron Ciech...","{'en': 'Aaron', 'se': 'Aaron'}","{'en': 'Ciechanover', 'se': 'Ciechanover'}","{'en': 'Aaron Ciechanover', 'se': 'Aaron Ciech...",ciechanover,male,"{'date': '1947-10-01', 'place': {'city': {'en'...","{'slug': 'Aaron_Ciechanover', 'english': 'http...","{'id': 'Q233205', 'url': 'https://www.wikidata...",...,,,,,,,,,,Individual
3,259,"{'en': 'Aaron Klug', 'se': 'Aaron Klug'}","{'en': 'Aaron', 'se': 'Aaron'}","{'en': 'Klug', 'se': 'Klug'}","{'en': 'Aaron Klug', 'se': 'Aaron Klug'}",klug,male,"{'date': '1926-08-11', 'place': {'city': {'en'...","{'slug': 'Aaron_Klug', 'english': 'https://en....","{'id': 'Q190626', 'url': 'https://www.wikidata...",...,,,,,,,,,,Individual
4,1004,"{'en': 'Abdulrazak Gurnah', 'se': 'Abdulrazak ...","{'en': 'Abdulrazak', 'se': 'Abdulrazak'}","{'en': 'Gurnah', 'se': 'Gurnah'}","{'en': 'Abdulrazak Gurnah', 'se': 'Abdulrazak ...",gurnah,male,{'date': '1948-00-00'},"{'slug': 'Abdulrazak_Gurnah', 'english': 'http...","{'id': 'Q317877', 'url': 'https://www.wikidata...",...,,,,,,,,,,Individual


In [19]:
# Expanding the birth column into one column per key (date, place)
birth_data=all_laureates_data['birth'].apply(pd.Series)
# Expanding the nested birth place column
new_birth_data_place=birth_data['place'].apply(pd.Series)
# Getting the English city name as birth_city
birth_city_data=new_birth_data_place['city'].apply(pd.Series)[['en']].rename(columns={'en':'birth_city'})
# Getting the English country name as birth_country
birth_country_data=new_birth_data_place['country'].apply(pd.Series)[['en']].rename(columns={'en':'birth_country'})

In [21]:
birth_city_data.head()

Unnamed: 0,birth_city
0,"Montclair, NJ"
1,Copenhagen
2,Haifa
3,Zelvas
4,


In [23]:
birth_country_data.head()

Unnamed: 0,birth_country
0,USA
1,Denmark
2,British Protectorate of Palestine
3,Lithuania
4,
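
The chained `apply(pd.Series)` calls above can also be written with `pd.json_normalize`, which flattens nested dictionaries in one pass. This is a minimal sketch on a hypothetical three-row sample shaped like the `birth` column; the names `sample`, `records`, `wanted` and `birth_flat` are illustrative and not part of the notebook.

```python
import pandas as pd

# Hypothetical sample shaped like the 'birth' column shown above
sample = pd.DataFrame({
    "birth": [
        {"date": "1922-06-19",
         "place": {"city": {"en": "Copenhagen"}, "country": {"en": "Denmark"}}},
        {"date": "1948-00-00"},  # no birth place recorded
        None,                    # e.g. an organization without a birth record
    ]
})

# json_normalize expects dicts, so missing entries become empty dicts
records = [b if isinstance(b, dict) else {} for b in sample["birth"]]
flat = pd.json_normalize(records)

# Nested keys are joined with '.', e.g. place -> city -> en
wanted = {
    "date": "birth_date",
    "place.city.en": "birth_city",
    "place.country.en": "birth_country",
}
# reindex keeps the code safe if one of the nested keys never occurs
birth_flat = flat.reindex(columns=list(wanted)).rename(columns=wanted)
birth_flat.index = sample.index
print(birth_flat)
```

On the full dataset this avoids building several intermediate DataFrames, at the cost of having to know the dotted key names in advance.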


In [25]:
birth_date_data=birth_data[['date']]

In [27]:
# Adding the three columns to the main dataframe
# Adding birth_country_data, birth_city_data, birth_date_data
all_laureates_data=pd.concat([all_laureates_data,birth_city_data,birth_country_data,birth_date_data],axis=1)
all_laureates_data.head()

Unnamed: 0,id,knownName,givenName,familyName,fullName,fileName,gender,birth,wikipedia,wikidata,...,nativeName,penName,penNameOf,foundedCountry,foundedCountryNow,foundedContinent,laureate_type,birth_city,birth_country,date
0,745,"{'en': 'A. Michael Spence', 'se': 'A. Michael ...","{'en': 'A. Michael', 'se': 'A. Michael'}","{'en': 'Spence', 'se': 'Spence'}","{'en': 'A. Michael Spence', 'se': 'A. Michael ...",spence,male,"{'date': '1943-00-00', 'place': {'city': {'en'...","{'slug': 'Michael_Spence', 'english': 'https:/...","{'id': 'Q157245', 'url': 'https://www.wikidata...",...,,,,,,,Individual,"Montclair, NJ",USA,1943-00-00
1,102,"{'en': 'Aage N. Bohr', 'se': 'Aage N. Bohr'}","{'en': 'Aage N.', 'se': 'Aage N.'}","{'en': 'Bohr', 'se': 'Bohr'}","{'en': 'Aage Niels Bohr', 'se': 'Aage Niels Bo...",bohr,male,"{'date': '1922-06-19', 'place': {'city': {'en'...","{'slug': 'Aage_Bohr', 'english': 'https://en.w...","{'id': 'Q103854', 'url': 'https://www.wikidata...",...,,,,,,,Individual,Copenhagen,Denmark,1922-06-19
2,779,"{'en': 'Aaron Ciechanover', 'se': 'Aaron Ciech...","{'en': 'Aaron', 'se': 'Aaron'}","{'en': 'Ciechanover', 'se': 'Ciechanover'}","{'en': 'Aaron Ciechanover', 'se': 'Aaron Ciech...",ciechanover,male,"{'date': '1947-10-01', 'place': {'city': {'en'...","{'slug': 'Aaron_Ciechanover', 'english': 'http...","{'id': 'Q233205', 'url': 'https://www.wikidata...",...,,,,,,,Individual,Haifa,British Protectorate of Palestine,1947-10-01
3,259,"{'en': 'Aaron Klug', 'se': 'Aaron Klug'}","{'en': 'Aaron', 'se': 'Aaron'}","{'en': 'Klug', 'se': 'Klug'}","{'en': 'Aaron Klug', 'se': 'Aaron Klug'}",klug,male,"{'date': '1926-08-11', 'place': {'city': {'en'...","{'slug': 'Aaron_Klug', 'english': 'https://en....","{'id': 'Q190626', 'url': 'https://www.wikidata...",...,,,,,,,Individual,Zelvas,Lithuania,1926-08-11
4,1004,"{'en': 'Abdulrazak Gurnah', 'se': 'Abdulrazak ...","{'en': 'Abdulrazak', 'se': 'Abdulrazak'}","{'en': 'Gurnah', 'se': 'Gurnah'}","{'en': 'Abdulrazak Gurnah', 'se': 'Abdulrazak ...",gurnah,male,{'date': '1948-00-00'},"{'slug': 'Abdulrazak_Gurnah', 'english': 'http...","{'id': 'Q317877', 'url': 'https://www.wikidata...",...,,,,,,,Individual,,,1948-00-00
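
Some values in the `date` column, such as `1943-00-00`, carry only a year. A sketch of how such partial dates might be handled, assuming the year is always the first four characters; `dates`, `parsed` and `birth_year` are hypothetical names.

```python
import pandas as pd

# Hypothetical sample mixing full dates, year-only dates and a missing value
dates = pd.Series(["1943-00-00", "1922-06-19", None, "1948-00-00"], dtype=object)

# Month 00 / day 00 is not a valid calendar date, so strict parsing
# with errors="coerce" turns those entries into NaT
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")

# Slicing the first four characters keeps the year for every non-missing row
birth_year = pd.to_numeric(dates.str[:4], errors="coerce").astype("Int64")

print(parsed)
print(birth_year)
```

Keeping a separate integer year column means year-only records are not lost when the analysis only needs the year or decade.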


In [29]:
all_laureates_data.columns

Index(['id', 'knownName', 'givenName', 'familyName', 'fullName', 'fileName',
       'gender', 'birth', 'wikipedia', 'wikidata', 'sameAs', 'links',
       'nobelPrizes', 'death', 'orgName', 'acronym', 'founded', 'nativeName',
       'penName', 'penNameOf', 'foundedCountry', 'foundedCountryNow',
       'foundedContinent', 'laureate_type', 'birth_city', 'birth_country',
       'date'],
      dtype='object')

In [31]:
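# The birth column could be dropped here now that its fields are extracted.
# The drop is left commented out; it would also need to be assigned back (or use inplace=True) to take effect.
#all_laureates_data.drop('birth',axis=1)

In [33]:
# The birth column is still present because the drop above was not applied.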
all_laureates_data.columns

Index(['id', 'knownName', 'givenName', 'familyName', 'fullName', 'fileName',
       'gender', 'birth', 'wikipedia', 'wikidata', 'sameAs', 'links',
       'nobelPrizes', 'death', 'orgName', 'acronym', 'founded', 'nativeName',
       'penName', 'penNameOf', 'foundedCountry', 'foundedCountryNow',
       'foundedContinent', 'laureate_type', 'birth_city', 'birth_country',
       'date'],
      dtype='object')
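
The column listing above still contains `birth`. As a small illustration on a hypothetical one-row frame, `DataFrame.drop` returns a new object and leaves the original untouched unless the result is assigned back or `inplace=True` is passed.

```python
import pandas as pd

# Hypothetical one-row frame standing in for all_laureates_data
df = pd.DataFrame({"birth": [{"date": "1922-06-19"}], "birth_city": ["Copenhagen"]})

# drop() returns a new DataFrame; the result is discarded here
df.drop("birth", axis=1)
still_there = "birth" in df.columns  # the original frame is unchanged

# Assigning the result back actually removes the column
df = df.drop(columns="birth")
print(still_there, list(df.columns))
```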

In [35]:
# Extracting the English full name from the fullName column
full_name_data=all_laureates_data['fullName'].apply(pd.Series)[['en']]
# Renaming the column
full_name_data.rename(columns={'en':'Fullname'},inplace=True)
full_name_data

Unnamed: 0,Fullname
0,A. Michael Spence
1,Aage Niels Bohr
2,Aaron Ciechanover
3,Aaron Klug
4,Abdulrazak Gurnah
...,...
999,Yoichiro Nambu
1000,Yoshinori Ohsumi
1001,Yuan T. Lee
1002,Yves Chauvin


In [37]:
# Adding this data to the dataset
all_laureates_data=pd.concat([all_laureates_data,full_name_data],axis=1)
all_laureates_data.columns

Index(['id', 'knownName', 'givenName', 'familyName', 'fullName', 'fileName',
       'gender', 'birth', 'wikipedia', 'wikidata', 'sameAs', 'links',
       'nobelPrizes', 'death', 'orgName', 'acronym', 'founded', 'nativeName',
       'penName', 'penNameOf', 'foundedCountry', 'foundedCountryNow',
       'foundedContinent', 'laureate_type', 'birth_city', 'birth_country',
       'date', 'Fullname'],
      dtype='object')

In [39]:
# Deleting the knownName,familyName,givenName,fullName,fileName from the dataset 
all_laureates_data.drop(['knownName','familyName','givenName','fullName','fileName'],axis=1,inplace=True)
all_laureates_data.head()

Unnamed: 0,id,gender,birth,wikipedia,wikidata,sameAs,links,nobelPrizes,death,orgName,...,penName,penNameOf,foundedCountry,foundedCountryNow,foundedContinent,laureate_type,birth_city,birth_country,date,Fullname
0,745,male,"{'date': '1943-00-00', 'place': {'city': {'en'...","{'slug': 'Michael_Spence', 'english': 'https:/...","{'id': 'Q157245', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q157245, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2001', 'category': {'en': 'Eco...",,,...,,,,,,Individual,"Montclair, NJ",USA,1943-00-00,A. Michael Spence
1,102,male,"{'date': '1922-06-19', 'place': {'city': {'en'...","{'slug': 'Aage_Bohr', 'english': 'https://en.w...","{'id': 'Q103854', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q103854, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '1975', 'category': {'en': 'Phy...","{'date': '2009-09-08', 'place': {'city': {'en'...",,...,,,,,,Individual,Copenhagen,Denmark,1922-06-19,Aage Niels Bohr
2,779,male,"{'date': '1947-10-01', 'place': {'city': {'en'...","{'slug': 'Aaron_Ciechanover', 'english': 'http...","{'id': 'Q233205', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q233205, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2004', 'category': {'en': 'Che...",,,...,,,,,,Individual,Haifa,British Protectorate of Palestine,1947-10-01,Aaron Ciechanover
3,259,male,"{'date': '1926-08-11', 'place': {'city': {'en'...","{'slug': 'Aaron_Klug', 'english': 'https://en....","{'id': 'Q190626', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q190626, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '1982', 'category': {'en': 'Che...","{'date': '2018-11-20', 'place': {'locationStri...",,...,,,,,,Individual,Zelvas,Lithuania,1926-08-11,Aaron Klug
4,1004,male,{'date': '1948-00-00'},"{'slug': 'Abdulrazak_Gurnah', 'english': 'http...","{'id': 'Q317877', 'url': 'https://www.wikidata...","[https://www.wikidata.org/wiki/Q317877, https:...","[{'rel': 'laureate', 'href': 'https://api.nobe...","[{'awardYear': '2021', 'category': {'en': 'Lit...",,,...,,,,,,Individual,,,1948-00-00,Abdulrazak Gurnah


#### Expanding the nobelPrizes Column

In [42]:
# Now expanding the nobelPrizes column (a list of prizes per laureate)
# Column 0 of the result holds each laureate's first listed prize
nobel_Prizes_Data=all_laureates_data['nobelPrizes'].apply(pd.Series)
nobel_Prizes_Data[0].apply(pd.Series).head()

Unnamed: 0,awardYear,category,categoryFullName,sortOrder,portion,dateAwarded,prizeStatus,motivation,prizeAmount,prizeAmountAdjusted,affiliations,links,residences,topMotivation
0,2001,"{'en': 'Economic Sciences', 'no': 'Økonomi', '...",{'en': 'The Sveriges Riksbank Prize in Economi...,2,1/3,2001-10-10,received,{'en': 'for their analyses of markets with asy...,10000000,15547541,"[{'name': {'en': 'Stanford University', 'no': ...","[{'rel': 'nobelPrize', 'href': 'https://api.no...",,
1,1975,"{'en': 'Physics', 'no': 'Fysikk', 'se': 'Fysik'}","{'en': 'The Nobel Prize in Physics', 'no': 'No...",1,1/3,1975-10-17,received,{'en': 'for the discovery of the connection be...,630000,4304697,"[{'name': {'en': 'Niels Bohr Institute', 'no':...","[{'rel': 'nobelPrize', 'href': 'https://api.no...",,
2,2004,"{'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}","{'en': 'The Nobel Prize in Chemistry', 'no': '...",1,1/3,2004-10-06,received,{'en': 'for the discovery of ubiquitin-mediate...,10000000,14874529,[{'name': {'en': 'Technion - Israel Institute ...,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",,
3,1982,"{'en': 'Chemistry', 'no': 'Kjemi', 'se': 'Kemi'}","{'en': 'The Nobel Prize in Chemistry', 'no': '...",1,1,1982-10-18,received,{'en': 'for his development of crystallographi...,1150000,3923237,[{'name': {'en': 'MRC Laboratory of Molecular ...,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",,
4,2021,"{'en': 'Literature', 'no': 'Litteratur', 'se':...","{'en': 'The Nobel Prize in Literature', 'no': ...",1,1,2021-10-07,received,{'en': 'for his uncompromising and compassiona...,10000000,12096939,,"[{'rel': 'nobelPrize', 'href': 'https://api.no...",,
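
Selecting column `0` keeps only the first entry of each `nobelPrizes` list, so any further prizes of a laureate are not carried forward. If all prizes are needed, for example to find laureates with more than one award, one possible approach is `DataFrame.explode`. This is a sketch on hypothetical rows; the `id` values and the frame names are illustrative only.

```python
import pandas as pd

# Hypothetical rows: one laureate with two prizes, one with a single prize
laureates = pd.DataFrame({
    "id": [6, 102],  # illustrative ids
    "nobelPrizes": [
        [{"awardYear": "1903", "category": {"en": "Physics"}},
         {"awardYear": "1911", "category": {"en": "Chemistry"}}],
        [{"awardYear": "1975", "category": {"en": "Physics"}}],
    ],
})

# One row per (laureate, prize) instead of one row per laureate
exploded = laureates.explode("nobelPrizes", ignore_index=True)

# Flatten each prize dict; nested keys become dotted column names
prizes = pd.json_normalize(exploded["nobelPrizes"].tolist())
prizes_long = pd.concat(
    [exploded[["id"]], prizes[["awardYear", "category.en"]]], axis=1
)

# Number of prizes per laureate
prize_counts = prizes_long.groupby("id").size()
print(prizes_long)
print(prize_counts)
```

The long format changes the row count, so it would need to be merged back on `id` rather than concatenated side by side with the laureate table.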


In [44]:
# Fields to extract: awardYear, category, portion, dateAwarded, motivation,
# affiliations (organization_name, organization_city, organization_country), links (prize title)


In [46]:
# Fetching the award year 
award_year_data=nobel_Prizes_Data[0].apply(pd.Series)[['awardYear']]
# Fetching the portion as prize_share
prize_share_data=nobel_Prizes_Data[0].apply(pd.Series)[['portion']]
# Fetching the dateAwarded column
date_Awarded_data=nobel_Prizes_Data[0].apply(pd.Series)[['dateAwarded']]
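
The `portion` values are strings such as `1/3` or `1`. If numeric shares are needed later, one option is the standard-library `fractions.Fraction`, which parses both forms. A minimal sketch, assuming every non-missing value is either an integer or a `numerator/denominator` string; `portion`, `share` and `share_float` are hypothetical names.

```python
from fractions import Fraction

import pandas as pd

# Hypothetical sample of the 'portion' strings seen in the output above
portion = pd.Series(["1/3", "1", "1/2", "1/4"], dtype=object)

# Fraction accepts both "1" and "1/3"; missing values are skipped
share = portion.map(Fraction, na_action="ignore")

# Plain floats are easier to aggregate or plot
share_float = share.map(float, na_action="ignore")
print(share_float)
```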

In [48]:
# Now, fetching the motivation column 
motivation_data=nobel_Prizes_Data[0].apply(pd.Series)['motivation'].apply(pd.Series)[['en']]
motivation_data.rename(columns={'en':'motivation'},inplace=True)
motivation_data.head()

Unnamed: 0,motivation
0,for their analyses of markets with asymmetric ...
1,for the discovery of the connection between co...
2,for the discovery of ubiquitin-mediated protei...
3,for his development of crystallographic electr...
4,for his uncompromising and compassionate penet...


#### Do not run this cell

In [66]:
# Now fetching the category column
# Do not run this cell: .head() is applied before expanding, so only the first five rows are kept
category_data=nobel_Prizes_Data[0].apply(pd.Series).head()[['category']]['category'].apply(pd.Series)[['en']]
category_data.rename(columns={'en':'category'},inplace=True)
category_data['category'].head()

0    Economic Sciences
1              Physics
2            Chemistry
3            Chemistry
4           Literature
Name: category, dtype: object

In [None]:
# Not all categories are present above because .head() limited the data to five rows
# Rechecking on the full column

In [50]:
category_data_new=nobel_Prizes_Data[0].apply(pd.Series)['category'].apply(pd.Series)[['en']]
category_data_new['en'].unique()

array(['Economic Sciences', 'Physics', 'Chemistry', 'Literature', 'Peace',
       'Physiology or Medicine'], dtype=object)

In [52]:
category_data_new.rename(columns={'en':'category'},inplace=True)

In [54]:
category_data_new.tail()

Unnamed: 0,category
999,Physics
1000,Physiology or Medicine
1001,Chemistry
1002,Chemistry
1003,Physics


In [56]:
# Expanding the links column (a list of link dictionaries per prize)
# Taking its third entry (index 2), which carries the prize title
# Extracting the title field from that entry
prize_title_data=nobel_Prizes_Data[0].apply(pd.Series)['links'].apply(pd.Series)[[2]][2].apply(pd.Series)[['title']]
prize_title_data.rename(columns={'title':'Prize'},inplace=True)
prize_title_data.head()

Unnamed: 0,Prize
0,The Sveriges Riksbank Prize in Economic Scienc...
1,The Nobel Prize in Physics 1975
2,The Nobel Prize in Chemistry 2004
3,The Nobel Prize in Chemistry 1982
4,The Nobel Prize in Literature 2021
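
The extraction above relies on the title sitting at position 2 of every `links` list. A more defensive variant could search each list for the first entry that has a `title` key. This is only a sketch: the helper `first_title`, the sample links and their `rel`/`href` values are hypothetical, and whether the first titled link matches the third entry in the real data is an assumption that would need checking.

```python
import pandas as pd


def first_title(links):
    """Return the first 'title' found in a list of link dicts, else None."""
    if not isinstance(links, list):
        return None
    for link in links:
        if isinstance(link, dict) and "title" in link:
            return link["title"]
    return None


# Hypothetical sample: one list of links and one missing value
links_col = pd.Series([
    [{"rel": "nobelPrize", "href": "https://example.org/a"},
     {"rel": "external", "href": "https://example.org/b",
      "title": "The Nobel Prize in Physics 1975"}],
    None,
], dtype=object)

prize_title = links_col.map(first_title)
print(prize_title)
```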


In [58]:
# Now expanding the affiliations column (first listed affiliation only)
organization_data=nobel_Prizes_Data[0].apply(pd.Series)[['affiliations']]['affiliations'].apply(pd.Series)[0].apply(pd.Series)
organization_data.head()

Unnamed: 0,name,nameNow,city,country,cityNow,countryNow,continent,locationString,0,nativeName
0,"{'en': 'Stanford University', 'no': 'Stanford ...",{'en': 'Stanford University'},"{'en': 'Stanford, CA', 'no': 'Stanford, CA', '...","{'en': 'USA', 'no': 'USA', 'se': 'USA'}","{'en': 'Stanford, CA', 'no': 'Stanford, CA', '...","{'en': 'USA', 'no': 'USA', 'se': 'USA', 'sameA...",{'en': 'North America'},"{'en': 'Stanford, CA, USA', 'no': 'Stanford, C...",,
1,"{'en': 'Niels Bohr Institute', 'no': 'Niels Bo...",{'en': 'Niels Bohr Institute'},"{'en': 'Copenhagen', 'no': 'København', 'se': ...","{'en': 'Denmark', 'no': 'Danmark', 'se': 'Danm...","{'en': 'Copenhagen', 'no': 'København', 'se': ...","{'en': 'Denmark', 'no': 'Danmark', 'se': 'Danm...",{'en': 'Europe'},"{'en': 'Copenhagen, Denmark', 'no': 'København...",,
2,{'en': 'Technion - Israel Institute of Technol...,{'en': 'Technion - Israel Institute of Technol...,"{'en': 'Haifa', 'no': 'Haifa', 'se': 'Haifa'}","{'en': 'Israel', 'no': 'Israel', 'se': 'Israel'}","{'en': 'Haifa', 'no': 'Haifa', 'se': 'Haifa', ...","{'en': 'Israel', 'no': 'Israel', 'se': 'Israel...",{'en': 'Asia'},"{'en': 'Haifa, Israel', 'no': 'Haifa, Israel',...",,
3,"{'en': 'MRC Laboratory of Molecular Biology', ...",{'en': 'MRC Laboratory of Molecular Biology'},"{'en': 'Cambridge', 'no': 'Cambridge', 'se': '...","{'en': 'United Kingdom', 'no': 'Storbritannia'...","{'en': 'Cambridge', 'no': 'Cambridge', 'se': '...","{'en': 'United Kingdom', 'no': 'Storbritannia'...",{'en': 'Europe'},"{'en': 'Cambridge, United Kingdom', 'no': 'Cam...",,
4,,,,,,,,,,


In [60]:
organization_name=organization_data['nameNow'].apply(pd.Series)[['en']].rename(columns={'en':'organization_name'})
organization_name.head()

Unnamed: 0,organization_name
0,Stanford University
1,Niels Bohr Institute
2,Technion - Israel Institute of Technology
3,MRC Laboratory of Molecular Biology
4,
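
The name, city and country extractions all follow the same pattern: take the first affiliation, pick a field, then pick its English value. A sketch of a small helper that does this in one step and tolerates missing or malformed entries; `first_affiliation_field` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def first_affiliation_field(affiliations, field, lang="en"):
    """English value of `field` in the first affiliation, or NaN if unavailable."""
    if not isinstance(affiliations, list) or not affiliations:
        return np.nan
    first = affiliations[0]
    if not isinstance(first, dict):
        return np.nan
    value = first.get(field)
    return value.get(lang, np.nan) if isinstance(value, dict) else np.nan


# Hypothetical sample: one laureate with an affiliation, one without
affiliations = pd.Series([
    [{"nameNow": {"en": "Niels Bohr Institute"},
      "cityNow": {"en": "Copenhagen"},
      "countryNow": {"en": "Denmark"}}],
    np.nan,
], dtype=object)

organization = pd.DataFrame({
    "organization_name": affiliations.map(lambda a: first_affiliation_field(a, "nameNow")),
    "organization_city": affiliations.map(lambda a: first_affiliation_field(a, "cityNow")),
    "organization_country": affiliations.map(lambda a: first_affiliation_field(a, "countryNow")),
})
print(organization)
```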


In [62]:
# Extracting the cityNow column
organization_city=organization_data['cityNow'].apply(pd.Series)[['en']]
organization_city=organization_city.rename(columns={'en':'organization_city'})
organization_city.head()

Unnamed: 0,organization_city
0,"Stanford, CA"
1,Copenhagen
2,Haifa
3,Cambridge
4,


In [64]:
organization_city['organization_city'].unique()

array(['Stanford, CA', 'Copenhagen', 'Haifa', 'Cambridge', nan, 'Trieste',
       'Cambridge, MA', 'Rehovot', 'Baltimore, MD', 'Berlin-Dahlem',
       'Munich', 'Göttingen', 'Pasadena, CA', 'Sapporo', 'Tokyo', 'Paris',
       'Santa Barbara, CA', 'Philadelphia, PA', 'Chicago, IL', 'Louvain',
       'Berlin', 'Orsay', 'Szeged', 'Heidelberg', 'Moscow',
       'New York, NY', 'Argonne, IL', 'Long Island, New York, NY',
       'Dallas, TX', 'Zurich', 'Medford, MA', 'Uppsala', 'Manchester',
       'Los Angeles, CA', 'London', 'New Orleans, LA', 'Princeton, NJ',
       'Lund', 'Urbana, IL', 'Vienna', 'La Jolla, CA', 'Holmdel, NJ',
       'Kingston', 'Helsinki', 'Gothenburg', 'Chapel Hill, NC',
       'Cold Spring Harbor, NY', 'Nedlands', 'Boston, MA',
       'Washington, D.C.', 'Stockholm', 'Mülheim an der Ruhr',
       'Groningen', 'Buenos Aires', 'Hamilton, Ontario', 'Weston Creek',
       'Pavia', 'St. Louis, MO', 'Boulder, CO', 'Geneva', 'Bristol',
       'Sèvres', 'Edinburgh', 'Wilmingt

In [66]:
# Extracting the countryNow column 
organization_country=organization_data['countryNow'].apply(pd.Series)[['en']]
organization_country=organization_country.rename(columns={'en':'organization_country'})
organization_country.head()

Unnamed: 0,organization_country
0,USA
1,Denmark
2,Israel
3,United Kingdom
4,


In [68]:
organization_country['organization_country'].unique()

array(['USA', 'Denmark', 'Israel', 'United Kingdom', nan, 'Italy',
       'Germany', 'Japan', 'France', 'Belgium', 'Hungary', 'Russia',
       'Switzerland', 'Sweden', 'Austria', 'Canada', 'Finland',
       'Australia', 'the Netherlands', 'Argentina', 'Tunisia', 'Norway',
       'Portugal', 'Ireland', 'Czech Republic', 'Spain', 'India', 'China'],
      dtype=object)

#### Adding All New DataFrames to the Original DataFrame

In [71]:
all_laureates_data=pd.concat([all_laureates_data,award_year_data,prize_share_data,date_Awarded_data,motivation_data,category_data_new,prize_title_data,organization_name,organization_city,organization_country],axis=1)

In [73]:
all_laureates_data.columns

Index(['id', 'gender', 'birth', 'wikipedia', 'wikidata', 'sameAs', 'links',
       'nobelPrizes', 'death', 'orgName', 'acronym', 'founded', 'nativeName',
       'penName', 'penNameOf', 'foundedCountry', 'foundedCountryNow',
       'foundedContinent', 'laureate_type', 'birth_city', 'birth_country',
       'date', 'Fullname', 'awardYear', 'portion', 'dateAwarded', 'motivation',
       'category', 'Prize', 'organization_name', 'organization_city',
       'organization_country'],
      dtype='object')
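
`pd.concat(..., axis=1)` aligns rows on the index, so the result is only correct when every piece shares the index of the main frame. A short sketch of two sanity checks that could be run before and after such a concat; the three small frames reuse values from the output above, and the variable names are hypothetical.

```python
import pandas as pd

# Small frames reusing values shown earlier in the notebook output
base = pd.DataFrame({"id": [745, 102, 779]})
award_year = pd.DataFrame({"awardYear": ["2001", "1975", "2004"]})
category = pd.DataFrame({"category": ["Economic Sciences", "Physics", "Chemistry"]})

parts = [base, award_year, category]

# Check 1: every piece has exactly the same index as the base frame
same_index = all(part.index.equals(base.index) for part in parts)

combined = pd.concat(parts, axis=1)

# Check 2: no column name appears twice after concatenation
duplicate_columns = combined.columns[combined.columns.duplicated()].tolist()

print(same_index, combined.shape, duplicate_columns)
```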

In [75]:
# Removing the unnecessary columns
cols_to_drop=['wikipedia','links','wikidata','sameAs','nobelPrizes','orgName','acronym','founded','nativeName','penName','penNameOf','foundedCountry','foundedCountryNow','foundedContinent']
all_laureates_data.drop(cols_to_drop,axis=1,inplace=True)

In [77]:
# Printing out the dataframe 
all_laureates_data.head()

Unnamed: 0,id,gender,birth,death,laureate_type,birth_city,birth_country,date,Fullname,awardYear,portion,dateAwarded,motivation,category,Prize,organization_name,organization_city,organization_country
0,745,male,"{'date': '1943-00-00', 'place': {'city': {'en'...",,Individual,"Montclair, NJ",USA,1943-00-00,A. Michael Spence,2001,1/3,2001-10-10,for their analyses of markets with asymmetric ...,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,Stanford University,"Stanford, CA",USA
1,102,male,"{'date': '1922-06-19', 'place': {'city': {'en'...","{'date': '2009-09-08', 'place': {'city': {'en'...",Individual,Copenhagen,Denmark,1922-06-19,Aage Niels Bohr,1975,1/3,1975-10-17,for the discovery of the connection between co...,Physics,The Nobel Prize in Physics 1975,Niels Bohr Institute,Copenhagen,Denmark
2,779,male,"{'date': '1947-10-01', 'place': {'city': {'en'...",,Individual,Haifa,British Protectorate of Palestine,1947-10-01,Aaron Ciechanover,2004,1/3,2004-10-06,for the discovery of ubiquitin-mediated protei...,Chemistry,The Nobel Prize in Chemistry 2004,Technion - Israel Institute of Technology,Haifa,Israel
3,259,male,"{'date': '1926-08-11', 'place': {'city': {'en'...","{'date': '2018-11-20', 'place': {'locationStri...",Individual,Zelvas,Lithuania,1926-08-11,Aaron Klug,1982,1,1982-10-18,for his development of crystallographic electr...,Chemistry,The Nobel Prize in Chemistry 1982,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom
4,1004,male,{'date': '1948-00-00'},,Individual,,,1948-00-00,Abdulrazak Gurnah,2021,1,2021-10-07,for his uncompromising and compassionate penet...,Literature,The Nobel Prize in Literature 2021,,,


In [79]:
# Expanding the death column
death_data=all_laureates_data['death'].apply(pd.Series)
death_date_data=death_data[['date']]
death_date_data=death_date_data.rename(columns={'date':'death_date'})
death_date_data.head()

Unnamed: 0,death_date
0,
1,2009-09-08
2,
3,2018-11-20
4,


In [81]:
death_data_place=death_data['place'].apply(pd.Series)
death_data_place.head()

Unnamed: 0,0,city,country,cityNow,countryNow,continent,locationString
0,,,,,,,
1,,"{'en': 'Copenhagen', 'no': 'København', 'se': ...","{'en': 'Denmark', 'no': 'Danmark', 'se': 'Danm...","{'en': 'Copenhagen', 'no': 'København', 'se': ...","{'en': 'Denmark', 'no': 'Danmark', 'se': 'Danm...","{'en': 'Europe', 'no': 'Europa', 'se': 'Europa'}","{'en': 'Copenhagen, Denmark', 'no': 'København..."
2,,,,,,,
3,,,,,,,"{'en': '', 'no': '', 'se': ''}"
4,,,,,,,


In [83]:
death_city=death_data_place['city'].apply(pd.Series)[['en']].rename(columns={'en':'death_city'})
death_city.head()

Unnamed: 0,death_city
0,
1,Copenhagen
2,
3,
4,


In [85]:
death_country=death_data_place['country'].apply(pd.Series)[['en']].rename(columns={'en':'death_country'})
death_country.head()

Unnamed: 0,death_country
0,
1,Denmark
2,
3,
4,


In [87]:
# Adding this death data to the original dataframe 
# Adding the death_city, death_country
all_laureates_data=pd.concat([all_laureates_data,death_city,death_country],axis=1)

In [89]:
# Adding death_date_data
all_laureates_data=pd.concat([all_laureates_data,death_date_data],axis=1)

In [91]:
all_laureates_data.head()

Unnamed: 0,id,gender,birth,death,laureate_type,birth_city,birth_country,date,Fullname,awardYear,...,dateAwarded,motivation,category,Prize,organization_name,organization_city,organization_country,death_city,death_country,death_date
0,745,male,"{'date': '1943-00-00', 'place': {'city': {'en'...",,Individual,"Montclair, NJ",USA,1943-00-00,A. Michael Spence,2001,...,2001-10-10,for their analyses of markets with asymmetric ...,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,Stanford University,"Stanford, CA",USA,,,
1,102,male,"{'date': '1922-06-19', 'place': {'city': {'en'...","{'date': '2009-09-08', 'place': {'city': {'en'...",Individual,Copenhagen,Denmark,1922-06-19,Aage Niels Bohr,1975,...,1975-10-17,for the discovery of the connection between co...,Physics,The Nobel Prize in Physics 1975,Niels Bohr Institute,Copenhagen,Denmark,Copenhagen,Denmark,2009-09-08
2,779,male,"{'date': '1947-10-01', 'place': {'city': {'en'...",,Individual,Haifa,British Protectorate of Palestine,1947-10-01,Aaron Ciechanover,2004,...,2004-10-06,for the discovery of ubiquitin-mediated protei...,Chemistry,The Nobel Prize in Chemistry 2004,Technion - Israel Institute of Technology,Haifa,Israel,,,
3,259,male,"{'date': '1926-08-11', 'place': {'city': {'en'...","{'date': '2018-11-20', 'place': {'locationStri...",Individual,Zelvas,Lithuania,1926-08-11,Aaron Klug,1982,...,1982-10-18,for his development of crystallographic electr...,Chemistry,The Nobel Prize in Chemistry 1982,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom,,,2018-11-20
4,1004,male,{'date': '1948-00-00'},,Individual,,,1948-00-00,Abdulrazak Gurnah,2021,...,2021-10-07,for his uncompromising and compassionate penet...,Literature,The Nobel Prize in Literature 2021,,,,,,


In [93]:
all_laureates_data.columns

Index(['id', 'gender', 'birth', 'death', 'laureate_type', 'birth_city',
       'birth_country', 'date', 'Fullname', 'awardYear', 'portion',
       'dateAwarded', 'motivation', 'category', 'Prize', 'organization_name',
       'organization_city', 'organization_country', 'death_city',
       'death_country', 'death_date'],
      dtype='object')

In [95]:
all_laureates_data.drop('death',axis=1,inplace=True)

In [97]:
all_laureates_data.head()

Unnamed: 0,id,gender,birth,laureate_type,birth_city,birth_country,date,Fullname,awardYear,portion,dateAwarded,motivation,category,Prize,organization_name,organization_city,organization_country,death_city,death_country,death_date
0,745,male,"{'date': '1943-00-00', 'place': {'city': {'en'...",Individual,"Montclair, NJ",USA,1943-00-00,A. Michael Spence,2001,1/3,2001-10-10,for their analyses of markets with asymmetric ...,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,Stanford University,"Stanford, CA",USA,,,
1,102,male,"{'date': '1922-06-19', 'place': {'city': {'en'...",Individual,Copenhagen,Denmark,1922-06-19,Aage Niels Bohr,1975,1/3,1975-10-17,for the discovery of the connection between co...,Physics,The Nobel Prize in Physics 1975,Niels Bohr Institute,Copenhagen,Denmark,Copenhagen,Denmark,2009-09-08
2,779,male,"{'date': '1947-10-01', 'place': {'city': {'en'...",Individual,Haifa,British Protectorate of Palestine,1947-10-01,Aaron Ciechanover,2004,1/3,2004-10-06,for the discovery of ubiquitin-mediated protei...,Chemistry,The Nobel Prize in Chemistry 2004,Technion - Israel Institute of Technology,Haifa,Israel,,,
3,259,male,"{'date': '1926-08-11', 'place': {'city': {'en'...",Individual,Zelvas,Lithuania,1926-08-11,Aaron Klug,1982,1,1982-10-18,for his development of crystallographic electr...,Chemistry,The Nobel Prize in Chemistry 1982,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom,,,2018-11-20
4,1004,male,{'date': '1948-00-00'},Individual,,,1948-00-00,Abdulrazak Gurnah,2021,1,2021-10-07,for his uncompromising and compassionate penet...,Literature,The Nobel Prize in Literature 2021,,,,,,


In [99]:
# Renaming the columns 
all_laureates_data=all_laureates_data.rename(columns={'id':'laureate_id','Fullname':'fullname','Prize':'prize'})

In [101]:
all_laureates_data.head()

Unnamed: 0,laureate_id,gender,birth,laureate_type,birth_city,birth_country,date,fullname,awardYear,portion,dateAwarded,motivation,category,prize,organization_name,organization_city,organization_country,death_city,death_country,death_date
0,745,male,"{'date': '1943-00-00', 'place': {'city': {'en'...",Individual,"Montclair, NJ",USA,1943-00-00,A. Michael Spence,2001,1/3,2001-10-10,for their analyses of markets with asymmetric ...,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,Stanford University,"Stanford, CA",USA,,,
1,102,male,"{'date': '1922-06-19', 'place': {'city': {'en'...",Individual,Copenhagen,Denmark,1922-06-19,Aage Niels Bohr,1975,1/3,1975-10-17,for the discovery of the connection between co...,Physics,The Nobel Prize in Physics 1975,Niels Bohr Institute,Copenhagen,Denmark,Copenhagen,Denmark,2009-09-08
2,779,male,"{'date': '1947-10-01', 'place': {'city': {'en'...",Individual,Haifa,British Protectorate of Palestine,1947-10-01,Aaron Ciechanover,2004,1/3,2004-10-06,for the discovery of ubiquitin-mediated protei...,Chemistry,The Nobel Prize in Chemistry 2004,Technion - Israel Institute of Technology,Haifa,Israel,,,
3,259,male,"{'date': '1926-08-11', 'place': {'city': {'en'...",Individual,Zelvas,Lithuania,1926-08-11,Aaron Klug,1982,1,1982-10-18,for his development of crystallographic electr...,Chemistry,The Nobel Prize in Chemistry 1982,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom,,,2018-11-20
4,1004,male,{'date': '1948-00-00'},Individual,,,1948-00-00,Abdulrazak Gurnah,2021,1,2021-10-07,for his uncompromising and compassionate penet...,Literature,The Nobel Prize in Literature 2021,,,,,,


In [103]:
all_laureates_data.columns

Index(['laureate_id', 'gender', 'birth', 'laureate_type', 'birth_city',
       'birth_country', 'date', 'fullname', 'awardYear', 'portion',
       'dateAwarded', 'motivation', 'category', 'prize', 'organization_name',
       'organization_city', 'organization_country', 'death_city',
       'death_country', 'death_date'],
      dtype='object')

In [105]:
# Reordering the columns 
new_order=['laureate_id','awardYear','category', 'prize','motivation','portion','laureate_type', 'fullname','gender', 'birth_city', 'birth_country',
       'date',  'dateAwarded', 'organization_name', 'organization_city','organization_country', 'death_city', 'death_country', 'death_date']

In [107]:
# DataFrame with reordered columns
all_laureates_data=all_laureates_data.reindex(columns=new_order)
all_laureates_data.head()

Unnamed: 0,laureate_id,awardYear,category,prize,motivation,portion,laureate_type,fullname,gender,birth_city,birth_country,date,dateAwarded,organization_name,organization_city,organization_country,death_city,death_country,death_date
0,745,2001,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,for their analyses of markets with asymmetric ...,1/3,Individual,A. Michael Spence,male,"Montclair, NJ",USA,1943-00-00,2001-10-10,Stanford University,"Stanford, CA",USA,,,
1,102,1975,Physics,The Nobel Prize in Physics 1975,for the discovery of the connection between co...,1/3,Individual,Aage Niels Bohr,male,Copenhagen,Denmark,1922-06-19,1975-10-17,Niels Bohr Institute,Copenhagen,Denmark,Copenhagen,Denmark,2009-09-08
2,779,2004,Chemistry,The Nobel Prize in Chemistry 2004,for the discovery of ubiquitin-mediated protei...,1/3,Individual,Aaron Ciechanover,male,Haifa,British Protectorate of Palestine,1947-10-01,2004-10-06,Technion - Israel Institute of Technology,Haifa,Israel,,,
3,259,1982,Chemistry,The Nobel Prize in Chemistry 1982,for his development of crystallographic electr...,1,Individual,Aaron Klug,male,Zelvas,Lithuania,1926-08-11,1982-10-18,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom,,,2018-11-20
4,1004,2021,Literature,The Nobel Prize in Literature 2021,for his uncompromising and compassionate penet...,1,Individual,Abdulrazak Gurnah,male,,,1948-00-00,2021-10-07,,,,,,
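
One side effect of `reindex(columns=...)` is worth noting: it keeps only the listed columns, which is why `birth` no longer appears in the output above. It also creates an all-NaN column for any name that does not exist instead of raising an error. The sketch below shows both behaviours on a toy frame; the frame and the misspelled name `idd` are made up for illustration.

```python
import pandas as pd

# Toy frame (hypothetical values, not the Nobel dataset)
toy = pd.DataFrame({"id": [1], "birth": ["x"], "category": ["Physics"]})

# reindex keeps only the listed columns, so 'birth' is dropped
reordered = toy.reindex(columns=["category", "id"])

# A misspelled name does not raise; it yields a column full of NaN
with_typo = toy.reindex(columns=["category", "idd"])

# Plain column selection is stricter and raises KeyError instead
try:
    toy[["category", "idd"]]
    raised = False
except KeyError:
    raised = True
```

If silent NaN columns are a concern, selecting with `df[new_order]` is a stricter alternative for reordering.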


In [109]:
all_laureates_data['laureate_id'].max()

1046

In [111]:
all_laureates_data['laureate_id'].min()

1

In [113]:
# Checking whether any laureate_id appears more than once
all_laureates_data[all_laureates_data['laureate_id'].duplicated()]
# The result is empty, so no laureate_id is duplicated

Unnamed: 0,laureate_id,awardYear,category,prize,motivation,portion,laureate_type,fullname,gender,birth_city,birth_country,date,dateAwarded,organization_name,organization_city,organization_country,death_city,death_country,death_date
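
As a small illustration of how this check behaves, the sketch below runs `duplicated()` on a toy frame. By default only the second and later occurrences are flagged; `keep=False` flags every row that shares an id. The toy values are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the Nobel dataset)
toy = pd.DataFrame({
    "laureate_id": [1, 2, 2, 3],
    "fullname": ["A", "B", "B", "C"],
})

# Default: only the second and later occurrences are marked
later_repeats = toy[toy["laureate_id"].duplicated()]

# keep=False: every row that shares its id with another row is marked
all_repeats = toy[toy["laureate_id"].duplicated(keep=False)]

# A single boolean is often enough for a sanity check
has_duplicates = bool(toy["laureate_id"].duplicated().any())
```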


In [115]:
all_laureates_data

Unnamed: 0,laureate_id,awardYear,category,prize,motivation,portion,laureate_type,fullname,gender,birth_city,birth_country,date,dateAwarded,organization_name,organization_city,organization_country,death_city,death_country,death_date
0,745,2001,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,for their analyses of markets with asymmetric ...,1/3,Individual,A. Michael Spence,male,"Montclair, NJ",USA,1943-00-00,2001-10-10,Stanford University,"Stanford, CA",USA,,,
1,102,1975,Physics,The Nobel Prize in Physics 1975,for the discovery of the connection between co...,1/3,Individual,Aage Niels Bohr,male,Copenhagen,Denmark,1922-06-19,1975-10-17,Niels Bohr Institute,Copenhagen,Denmark,Copenhagen,Denmark,2009-09-08
2,779,2004,Chemistry,The Nobel Prize in Chemistry 2004,for the discovery of ubiquitin-mediated protei...,1/3,Individual,Aaron Ciechanover,male,Haifa,British Protectorate of Palestine,1947-10-01,2004-10-06,Technion - Israel Institute of Technology,Haifa,Israel,,,
3,259,1982,Chemistry,The Nobel Prize in Chemistry 1982,for his development of crystallographic electr...,1,Individual,Aaron Klug,male,Zelvas,Lithuania,1926-08-11,1982-10-18,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom,,,2018-11-20
4,1004,2021,Literature,The Nobel Prize in Literature 2021,for his uncompromising and compassionate penet...,1,Individual,Abdulrazak Gurnah,male,,,1948-00-00,2021-10-07,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999,826,2008,Physics,The Nobel Prize in Physics 2008,for the discovery of the mechanism of spontane...,1/2,Individual,Yoichiro Nambu,male,Tokyo,Japan,1921-01-18,2008-10-07,"Enrico Fermi Institute, University of Chicago","Chicago, IL",USA,Osaka,Japan,2015-07-05
1000,927,2016,Physiology or Medicine,The Nobel Prize in Physiology or Medicine 2016,for his discoveries of mechanisms for autophagy,1,Individual,Yoshinori Ohsumi,male,Fukuoka,Japan,1945-02-09,2016-10-03,Tokyo Institute of Technology,Tokyo,Japan,,,
1001,265,1986,Chemistry,The Nobel Prize in Chemistry 1986,for their contributions concerning the dynamic...,1/3,Individual,Yuan T. Lee,male,Hsinchu,Taiwan,1936-11-19,1986-10-15,University of California,"Berkeley, CA",USA,,,
1002,794,2005,Chemistry,The Nobel Prize in Chemistry 2005,for the development of the metathesis method i...,1/3,Individual,Yves Chauvin,male,Menin,Belgium,1930-10-10,2005-10-05,Institut Français du Pétrole,Rueil-Malmaison,France,Tours,France,2015-01-27


In [117]:
all_laureates_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1004 entries, 0 to 1003
Data columns (total 19 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   laureate_id           1004 non-null   int64 
 1   awardYear             1004 non-null   object
 2   category              1004 non-null   object
 3   prize                 1004 non-null   object
 4   motivation            1004 non-null   object
 5   portion               1004 non-null   object
 6   laureate_type         1004 non-null   object
 7   fullname              976 non-null    object
 8   gender                976 non-null    object
 9   birth_city            972 non-null    object
 10  birth_country         974 non-null    object
 11  date                  976 non-null    object
 12  dateAwarded           1004 non-null   object
 13  organization_name     741 non-null    object
 14  organization_city     736 non-null    object
 15  organization_country  738 non-null    

In [119]:
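# We need to change the dtype of awardYear and laureate_id to int.
# Note: astype() returns a new Series, so the result must be assigned back for the change to persist.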

In [121]:
all_laureates_data['awardYear'].astype('int')
all_laureates_data['laureate_id'].astype('int')

0        745
1        102
2        779
3        259
4       1004
        ... 
999      826
1000     927
1001     265
1002     794
1003     726
Name: laureate_id, Length: 1004, dtype: int32
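
The cell above displays the converted values but does not store them, which is consistent with `awardYear` still being listed as `object` in the later `info()` output. A minimal sketch of making the conversion persist, using a toy frame with made-up rows:

```python
import pandas as pd

# Toy frame (hypothetical rows) where awardYear is stored as text
toy = pd.DataFrame({"awardYear": ["2001", "1975"], "laureate_id": [745, 102]})

# astype returns a new Series; the original column is untouched
converted = toy["awardYear"].astype(int)
dtype_before = toy["awardYear"].dtype  # still a text column

# Assigning the result back makes the change persist
toy["awardYear"] = toy["awardYear"].astype(int)
dtype_after = toy["awardYear"].dtype
```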

In [123]:
all_laureates_data['category'].unique()

array(['Economic Sciences', 'Physics', 'Chemistry', 'Literature', 'Peace',
       'Physiology or Medicine'], dtype=object)

In [125]:
all_laureates_data.sort_values(by='awardYear')

Unnamed: 0,laureate_id,awardYear,category,prize,motivation,portion,laureate_type,fullname,gender,birth_city,birth_country,date,dateAwarded,organization_name,organization_city,organization_country,death_city,death_country,death_date
961,1,1901,Physics,The Nobel Prize in Physics 1901,in recognition of the extraordinary services h...,1,Individual,Wilhelm Conrad Röntgen,male,Lennep,Prussia,1845-03-27,1901-11-12,Munich University,Munich,Germany,Munich,Germany,1923-02-10
447,160,1901,Chemistry,The Nobel Prize in Chemistry 1901,in recognition of the extraordinary services h...,1,Individual,Jacobus Henricus van 't Hoff,male,Rotterdam,the Netherlands,1852-08-30,1901-11-12,Berlin University,Berlin,Germany,Berlin,Germany,1911-03-01
235,293,1901,Physiology or Medicine,The Nobel Prize in Physiology or Medicine 1901,"for his work on serum therapy, especially its ...",1,Individual,Emil Adolf von Behring,male,Hansdorf,Prussia,1854-03-15,1901-10-30,Marburg University,Marburg,Germany,Marburg,Germany,1917-03-31
893,569,1901,Literature,The Nobel Prize in Literature 1901,in special recognition of his poetic compositi...,1,Individual,Sully Prudhomme,male,Paris,France,1839-03-16,1901-11-14,,,,Châtenay,France,1907-09-07
286,463,1901,Peace,The Nobel Peace Prize 1901,for his lifelong work for international peace ...,1/2,Individual,Frédéric Passy,male,Paris,France,1822-05-20,1901-12-10,,,,Paris,France,1912-06-12
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
306,1036,2024,Physiology or Medicine,The Nobel Prize in Physiology or Medicine 2024,for the discovery of microRNA and its role in ...,1/2,Individual,Gary Ruvkun,male,"Berkeley, CA",USA,1952-00-00,2024-10-07,Massachusetts General Hospital,"Boston, MA",USA,,,
943,1035,2024,Physiology or Medicine,The Nobel Prize in Physiology or Medicine 2024,for the discovery of microRNA and its role in ...,1/2,Individual,Victor Ambros,male,"Hanover, NH",USA,1953-12-01,2024-10-07,UMass Chan Medical School,"Worcester, MA",USA,,,
186,1040,2024,Chemistry,The Nobel Prize in Chemistry 2024,for protein structure prediction,1/4,Individual,Demis Hassabis,male,London,United Kingdom,1976-07-27,2024-10-09,Google DeepMind,London,United Kingdom,,,
451,1046,2024,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,for studies of how institutions are formed and...,1/3,Individual,James A. Robinson,male,,,1960-00-00,2024-10-14,University of Chicago,"Chicago, IL",USA,,,


In [127]:
all_laureates_data['category'].unique()

array(['Economic Sciences', 'Physics', 'Chemistry', 'Literature', 'Peace',
       'Physiology or Medicine'], dtype=object)

In [129]:
all_laureates_data.head()

Unnamed: 0,laureate_id,awardYear,category,prize,motivation,portion,laureate_type,fullname,gender,birth_city,birth_country,date,dateAwarded,organization_name,organization_city,organization_country,death_city,death_country,death_date
0,745,2001,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,for their analyses of markets with asymmetric ...,1/3,Individual,A. Michael Spence,male,"Montclair, NJ",USA,1943-00-00,2001-10-10,Stanford University,"Stanford, CA",USA,,,
1,102,1975,Physics,The Nobel Prize in Physics 1975,for the discovery of the connection between co...,1/3,Individual,Aage Niels Bohr,male,Copenhagen,Denmark,1922-06-19,1975-10-17,Niels Bohr Institute,Copenhagen,Denmark,Copenhagen,Denmark,2009-09-08
2,779,2004,Chemistry,The Nobel Prize in Chemistry 2004,for the discovery of ubiquitin-mediated protei...,1/3,Individual,Aaron Ciechanover,male,Haifa,British Protectorate of Palestine,1947-10-01,2004-10-06,Technion - Israel Institute of Technology,Haifa,Israel,,,
3,259,1982,Chemistry,The Nobel Prize in Chemistry 1982,for his development of crystallographic electr...,1,Individual,Aaron Klug,male,Zelvas,Lithuania,1926-08-11,1982-10-18,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom,,,2018-11-20
4,1004,2021,Literature,The Nobel Prize in Literature 2021,for his uncompromising and compassionate penet...,1,Individual,Abdulrazak Gurnah,male,,,1948-00-00,2021-10-07,,,,,,


# Stage - Data Cleaning 

### Checking Individual Columns 


#### Gender Column

In [145]:
all_laureates_data['gender'].unique()

array(['male', 'female', nan], dtype=object)

In [147]:
all_laureates_data[all_laureates_data['gender'].isnull()].head()

Unnamed: 0,laureate_id,awardYear,category,prize,motivation,portion,laureate_type,fullname,gender,birth_city,birth_country,date,dateAwarded,organization_name,organization_city,organization_country,death_city,death_country,death_date
51,509,1947,Peace,The Nobel Peace Prize 1947,for their pioneering work in the international...,1/2,Organization,,,,,,1947-10-30,,,,,,
52,537,1977,Peace,The Nobel Peace Prize 1977,for worldwide respect for human rights,1,Organization,,,,,,1977-10-10,,,,,,
134,1020,2022,Peace,The Nobel Peace Prize 2022,The Peace Prize laureates represent civil soci...,1/3,Organization,,,,,,2022-10-07,,,,,,
195,568,1999,Peace,The Nobel Peace Prize 1999,in recognition of the organisation's pioneerin...,1,Organization,,,,,,1999-10-15,,,,,,
261,881,2012,Peace,The Nobel Peace Prize 2012,for over six decades contributed to the advanc...,1,Organization,,,,,,2012-10-12,,,,,,


#### Counting Null Values in Every Column

In [150]:
all_laureates_data.isnull().sum()

laureate_id               0
awardYear                 0
category                  0
prize                     0
motivation                0
portion                   0
laureate_type             0
fullname                 28
gender                   28
birth_city               32
birth_country            30
date                     28
dateAwarded               0
organization_name       263
organization_city       268
organization_country    266
death_city              345
death_country           339
death_date              324
dtype: int64
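
The rows with a missing `gender` shown earlier are all organizations, which suggests the 28 nulls in `fullname`, `gender` and `date` come from organization laureates. One way to check this kind of pattern is to count nulls per `laureate_type`. The sketch below does so on a toy frame with hypothetical values; it is not run against the real data.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) mimicking the relevant columns
toy = pd.DataFrame({
    "laureate_type": ["Individual", "Organization", "Individual", "Organization"],
    "gender": ["male", np.nan, "female", np.nan],
    "birth_country": ["USA", np.nan, np.nan, np.nan],
})

# Count missing values in each column, split by laureate type
null_by_type = (
    toy[["gender", "birth_country"]]
    .isnull()
    .groupby(toy["laureate_type"])
    .sum()
)
```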

In [162]:
# Renaming the date column to birth_date
all_laureates_data.rename(columns={'date':'birth_date'},inplace=True)

In [164]:
all_laureates_data.head()

Unnamed: 0,laureate_id,awardYear,category,prize,motivation,portion,laureate_type,fullname,gender,birth_city,birth_country,birth_date,dateAwarded,organization_name,organization_city,organization_country,death_city,death_country,death_date
0,745,2001,Economic Sciences,The Sveriges Riksbank Prize in Economic Scienc...,for their analyses of markets with asymmetric ...,1/3,Individual,A. Michael Spence,male,"Montclair, NJ",USA,1943-00-00,2001-10-10,Stanford University,"Stanford, CA",USA,,,
1,102,1975,Physics,The Nobel Prize in Physics 1975,for the discovery of the connection between co...,1/3,Individual,Aage Niels Bohr,male,Copenhagen,Denmark,1922-06-19,1975-10-17,Niels Bohr Institute,Copenhagen,Denmark,Copenhagen,Denmark,2009-09-08
2,779,2004,Chemistry,The Nobel Prize in Chemistry 2004,for the discovery of ubiquitin-mediated protei...,1/3,Individual,Aaron Ciechanover,male,Haifa,British Protectorate of Palestine,1947-10-01,2004-10-06,Technion - Israel Institute of Technology,Haifa,Israel,,,
3,259,1982,Chemistry,The Nobel Prize in Chemistry 1982,for his development of crystallographic electr...,1,Individual,Aaron Klug,male,Zelvas,Lithuania,1926-08-11,1982-10-18,MRC Laboratory of Molecular Biology,Cambridge,United Kingdom,,,2018-11-20
4,1004,2021,Literature,The Nobel Prize in Literature 2021,for his uncompromising and compassionate penet...,1,Individual,Abdulrazak Gurnah,male,,,1948-00-00,2021-10-07,,,,,,


In [166]:
all_laureates_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1004 entries, 0 to 1003
Data columns (total 19 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   laureate_id           1004 non-null   int64 
 1   awardYear             1004 non-null   object
 2   category              1004 non-null   object
 3   prize                 1004 non-null   object
 4   motivation            1004 non-null   object
 5   portion               1004 non-null   object
 6   laureate_type         1004 non-null   object
 7   fullname              976 non-null    object
 8   gender                976 non-null    object
 9   birth_city            972 non-null    object
 10  birth_country         974 non-null    object
 11  birth_date            976 non-null    object
 12  dateAwarded           1004 non-null   object
 13  organization_name     741 non-null    object
 14  organization_city     736 non-null    object
 15  organization_country  738 non-null    

In [168]:
# Changing the columns to appropriate types (astype returns a new Series; assign it back to keep the change)
all_laureates_data['awardYear'].astype(int)

0       2001
1       1975
2       2004
3       1982
4       2021
        ... 
999     2008
1000    2016
1001    1986
1002    2005
1003    2000
Name: awardYear, Length: 1004, dtype: int32

In [176]:
import datetime

In [184]:
date=datetime.datetime(1943,1,1)

In [188]:
datetime.datetime?

Init signature: datetime.datetime(self, /, *args, **kwargs)
Docstring:     
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints.
File:           c:\users\dassa\anaconda3\lib\datetime.py
Type:           type
Subclasses:     ABCTimestamp, _NaT

In [None]:
# To convert a string into a datetime object, we can use strptime()

In [186]:
date

datetime.datetime(1943, 1, 1, 0, 0)

In [190]:
date_data={'date':['1943-01-12','1999-09-25']}

In [192]:
date_df=pd.DataFrame(date_data)

In [218]:
def split_date_str(date):
    date=date.split("-")
    date_num=[int(i) for i in date]
    return date_num

In [220]:
split_date_str('1943-01-12')

[1943, 1, 12]

In [236]:
datetime.datetime.strptime('1943-01-12','%Y-%m-%d')

datetime.datetime(1943, 1, 12, 0, 0)

In [240]:
# Helper that parses a 'YYYY-MM-DD' string into a datetime object
def convert_datestr_datetime(date):
    return datetime.datetime.strptime(date,'%Y-%m-%d')

In [242]:
date_df['date'].apply(convert_datestr_datetime)

0   1943-01-12
1   1999-09-25
Name: date, dtype: datetime64[ns]
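
The helper above works for complete dates, but the laureate data also contains partial dates such as `1943-00-00` and missing values. `strptime` raises an error on both. A hedged sketch of two ways to handle this with `pd.to_datetime` follows; the sample series is made up, and treating `00` as "first month/day" is an assumption, not something the data source specifies.

```python
import pandas as pd

# Sample values (hypothetical series): one partial date, one full date, one missing
birth = pd.Series(["1943-00-00", "1922-06-19", None])

# Option 1: treat anything that is not a valid calendar date as missing (NaT)
strict = pd.to_datetime(birth, format="%Y-%m-%d", errors="coerce")

# Option 2 (assumption): read an unknown month or day ("00") as "01",
# so the year is kept at the cost of a made-up month and day
filled = birth.str.replace("-00", "-01", regex=False)
lenient = pd.to_datetime(filled, format="%Y-%m-%d", errors="coerce")
```

Option 1 is safer when exact dates matter. Option 2 may be enough when only the birth year or decade is needed.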