# Analysis of Nobel Prize Laureates (1901-2023)

## Introduction

The Nobel Prize, awarded since 1901, is one of the most prestigious international accolades. Each year, prizes are bestowed in six categories: Chemistry, Literature, Physics, Physiology or Medicine, Peace, and Economics (first awarded in 1969).

In this project, we will explore and analyze data on Nobel Prize laureates from 1901 to 2023, provided by the Nobel Foundation through their API.

Our analysis aims to answer several key questions, including:

1. What is the most commonly awarded gender and birth country among laureates?
2. Which decade had the highest ratio of US-born Nobel Prize winners to total winners across all categories?
3. Which decade and Nobel Prize category combination had the highest proportion of female laureates?
4. Who was the first woman to receive a Nobel Prize, and in what category?
5. Which individuals or organizations have won more than one Nobel Prize throughout the years?

Through this analysis, we will uncover fascinating trends and valuable insights into the history of Nobel Prizes.

Let's begin our exploration!

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [2]:
df = pd.read_csv("Dataset/nobel.csv")

In [3]:
df.head()

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France


In [4]:
df.columns

Index(['year', 'category', 'prize', 'motivation', 'prize_share', 'laureate_id',
       'laureate_type', 'full_name', 'birth_date', 'birth_city',
       'birth_country', 'sex', 'organization_name', 'organization_city',
       'organization_country', 'death_date', 'death_city', 'death_country'],
      dtype='object')

In [5]:
df.shape

(1000, 18)

### 1. What is the most commonly awarded gender and birth country among laureates?

In [6]:
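# Note: agg('sum') concatenates the country strings of laureates who share a prize,
# so the counts below are per prize, not per laureate.
top_nobel_country_sum = df["birth_country"].groupby(df['prize']).agg('sum').value_counts()

In [7]:
top_nobel_country_sum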

birth_country
United States of America                            64
United Kingdom                                      27
France                                              27
United States of AmericaUnited States of America    26
0                                                   21
                                                    ..
Guadeloupe Island                                    1
Bosnia (Bosnia and Herzegovina)                      1
Ottoman Empire (Turkey)                              1
Austria-Hungary (Ukraine)Germany                     1
CanadaUnited States of AmericaNetherlands            1
Name: count, Length: 281, dtype: int64

In [8]:
# Same per-prize concatenation: 'MaleMale' means two male laureates shared one prize.
top_nobel_gender = df["sex"].groupby(df['prize']).agg('sum').value_counts()

In [9]:
top_nobel_gender

sex
Male                  319
MaleMale              125
MaleMaleMale           97
Female                 28
0                      20
MaleFemaleMale          8
MaleMaleFemale          7
FemaleMale              6
MaleFemale              6
FemaleFemale            2
FemaleFemaleFemale      1
FemaleMaleMale          1
FemaleFemaleMale        1
Name: count, dtype: int64

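Because the values are summed per prize, shared prizes appear as concatenated strings (for example `MaleMale` or `United States of AmericaUnited States of America`), and prizes whose laureates have no recorded value appear as `0`. Even so, the pattern is clear: the United States of America is by far the most common birth country, with the United Kingdom and France next among single-country prizes at 27 each. As for gender, men have received the large majority of Nobel Prizes.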
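
Since the data holds one row per laureate, a per-laureate tally needs no grouping by prize. The sketch below illustrates this on a small made-up sample that only borrows the notebook's column names; the rows in `sample` are hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical mini-sample with the same column names as the notebook's df.
sample = pd.DataFrame({
    "prize": ["P1", "P1", "P2", "P3", "P3"],
    "birth_country": ["United States of America", "United States of America",
                      "France", "Germany", None],
    "sex": ["Male", "Male", "Female", "Male", None],
})

# One row per laureate, so value_counts() tallies laureates directly.
# Missing values (e.g. organizations) are dropped by default.
country_counts = sample["birth_country"].value_counts()
sex_counts = sample["sex"].value_counts()

print(country_counts.head(3))
print(sex_counts)
```

Applied to the real `df`, the same two `value_counts()` calls would give clean counts without concatenated labels.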

### 2. Which decade had the highest ratio of US-born Nobel Prize winners to total winners across all categories?

In [None]:
# .copy() avoids SettingWithCopyWarning when new columns are added below
us_birth_nobel = df[df["birth_country"] == "United States of America"].copy()


In [None]:
us_birth_nobel.shape

In [None]:
us_birth_nobel["birth_date"].dtype

In [None]:
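def corriger_date(valeur):
    """Convert a raw birth_date value to a Timestamp, or NaT if it cannot be parsed."""
    # Missing values stay missing
    if pd.isna(valeur):
        return pd.NaT
    # A bare numeric year (e.g. 1943 or 1943.0) becomes 1 January of that year
    if isinstance(valeur, (int, float)):
        return pd.to_datetime(str(int(valeur)), format='%Y')
    if isinstance(valeur, str):
        # Strings containing '-00-00' have no valid month or day; keep only the year
        if '-00-00' in valeur:
            return pd.to_datetime(valeur[:4], format='%Y')
        else:
            # Anything else is parsed normally; unparseable strings become NaT
            return pd.to_datetime(valeur, errors='coerce')
    return pd.NaT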

In [None]:
us_birth_nobel['birth_date'] = us_birth_nobel['birth_date'].apply(corriger_date)

In [None]:
us_birth_nobel['birth_date'] = pd.to_datetime(us_birth_nobel['birth_date'])

In [None]:
us_birth_nobel['birth_year'] = us_birth_nobel['birth_date'].dt.year
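
As an alternative to the row-by-row conversion, the birth year can be read from the first four characters of each date string. This is a minimal sketch, assuming every non-missing `birth_date` is a string that starts with a four-digit year; the `raw` Series is made-up sample data.

```python
import pandas as pd

# Made-up values: a full date, a year-only placeholder, a missing value.
raw = pd.Series(["1852-08-30", "1943-00-00", None, "1959-09-22"])

# The first four characters hold the year; anything unparseable becomes <NA>.
birth_year = pd.to_numeric(raw.str[:4], errors="coerce").astype("Int64")

print(birth_year.tolist())
```

The nullable `Int64` dtype keeps the years as integers even when some values are missing.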

In [None]:
us_birth_nobel[us_birth_nobel["birth_year"].isna()]

In [None]:
us_birth_nobel['birth_year'].min()

In [None]:
us_birth_nobel['birth_year'].max()

In [None]:
us_birth_nobel["birth_year"].value_counts().sum()

Next, we bin the birth years into decades so that the results can be analyzed per decade.

In [None]:
bins = [1840, 1850, 1860, 1870, 1880, 1890, 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970]
labels = ['1840-1850','1850-1860', '1860-1870', '1870-1880', '1880-1890', '1890-1900',
          '1900-1910', '1910-1920', '1920-1930', '1930-1940', '1940-1950', '1950-1960', '1960-1970']

In [None]:
us_birth_nobel["decade"] = pd.cut(us_birth_nobel["birth_year"], bins=bins, labels=labels, right=False)
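
The same half-open decade bins (`right=False`, so 1850 falls in `1850-1860`) can also be derived arithmetically, which avoids maintaining the `bins` and `labels` lists by hand. A small sketch on made-up years, not the dataset itself:

```python
import pandas as pd

# Made-up birth years chosen to sit on both sides of decade boundaries.
years = pd.Series([1849, 1850, 1959, 1960])

# Integer division drops the last digit, giving the first year of the decade.
decade_start = (years // 10) * 10

# Build labels in the same 'start-end' style used by the notebook.
decade_label = decade_start.astype(str) + "-" + (decade_start + 10).astype(str)

print(decade_label.tolist())
```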

In [None]:
us_birth_nobel["decade"]

In [None]:
seeNan_value = us_birth_nobel[us_birth_nobel["decade"].isna()]

In [None]:
seeNan_value.shape

In [None]:
seeNan_value

One row remains without a decade because its birth date is missing from the dataset. The laureate was born on 22 September 1959, so we assign the 1950-1960 decade manually.

In [None]:
us_birth_nobel.loc[850, 'decade'] = '1950-1960'

In [None]:
seeNan_value = us_birth_nobel[us_birth_nobel["decade"].isna()]

In [None]:
seeNan_value

In [None]:
top_decade = us_birth_nobel['decade'].value_counts()

In [None]:
top_decade

In [None]:
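# Use a separate name so the original df is not overwritten (it is needed again below)
us_decade_counts = top_decade.reset_index()
us_decade_counts.columns = ['decade', 'counts']

In [None]:
us_decade_counts

In [None]:
plt.style.use('grayscale')
sns.barplot(data=us_decade_counts, x="decade", y="counts")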
plt.xticks(rotation = 45)
plt.xlabel("")
plt.title('Decadal Comparison of US-Born Nobel Prize Winners')



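The largest group of US-born laureates, 60 in total, was born in the 1940-1950 decade. Note that this is a count by decade of birth; it does not yet give the ratio of US-born winners to all winners per award decade that the question asks for.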
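
To answer the question as stated, one option is to group by the decade of the award `year` and take the mean of a boolean "US-born" flag, which equals the ratio of US-born winners to all winners. The sketch below uses a made-up `sample`; the `award_decade` and `us_born` column names are introduced here for illustration only.

```python
import pandas as pd

# Hypothetical rows; only the column names match the notebook's df.
sample = pd.DataFrame({
    "year": [1901, 1905, 1932, 1935, 1938, 1994],
    "birth_country": ["Germany", "United States of America",
                      "United States of America", "United States of America",
                      "France", "United States of America"],
})

# Decade in which the prize was awarded (not the birth decade).
sample["award_decade"] = (sample["year"] // 10) * 10

# True for US-born laureates; rows with a missing country compare as False
# and therefore still count toward the total.
sample["us_born"] = sample["birth_country"] == "United States of America"

# The mean of a boolean column is the share of True values per decade.
us_ratio = sample.groupby("award_decade")["us_born"].mean()
best_decade = us_ratio.idxmax()

print(us_ratio)
print(best_decade)
```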

### 3. Which decade and Nobel Prize category combination had the highest proportion of female laureates?

First, we isolate the female laureates in a new DataFrame so that they can be analyzed separately.

In [None]:
df["sex"].unique()

In [None]:
df[pd.isna(df['sex'])]

We observe that the NaN values for the 'sex' column correspond to organizations or associations, indicating that these entries are not errors in the data.
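
This observation can be checked programmatically by looking at `laureate_type` for the rows where `sex` is missing. A minimal sketch on made-up rows, assuming organizations are labeled `"Organization"` in `laureate_type` (the notebook output only shows the `"Individual"` label):

```python
import pandas as pd

# Hypothetical rows; "Organization" as a label is an assumption.
sample = pd.DataFrame({
    "sex": ["Male", None, "Female", None],
    "laureate_type": ["Individual", "Organization", "Individual", "Organization"],
})

# Which laureate types occur among the rows with no recorded sex?
missing_sex_types = sample.loc[sample["sex"].isna(), "laureate_type"].value_counts()

print(missing_sex_types)
```

If only the organization label appears in the result, the missing values are expected rather than data errors.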

In [None]:
# .copy() avoids SettingWithCopyWarning when new columns are added below
female_laureates = df[df["sex"] == "Female"].copy()

In [None]:
female_laureates.shape

The dataset contains 65 rows for female laureates. Because Marie Curie appears twice (see Question 5), these rows correspond to 64 distinct women.

In [None]:
female_laureates["birth_date"].dtype

The `birth_date` column must be converted to a date type. To do so, we reuse the `corriger_date` function defined for Question 2.

After the conversion, we will conduct a similar analysis to determine the minimum and maximum values, and subsequently create decade columns for further categorization.

In [None]:
female_laureates['birth_date'] = female_laureates['birth_date'].apply(corriger_date)

In [None]:
female_laureates['birth_date'] = pd.to_datetime(female_laureates['birth_date'])

In [None]:
female_laureates['birth_year'] = female_laureates['birth_date'].dt.year

In [None]:
female_laureates[female_laureates["birth_year"].isna()]

In [None]:
female_laureates['birth_year'].min()

In [None]:
female_laureates['birth_year'].max()

In [None]:
bins = [1840, 1850, 1860, 1870, 1880, 1890, 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010]
labels = ['1840-1850','1850-1860', '1860-1870', '1870-1880', '1880-1890', '1890-1900',
          '1900-1910', '1910-1920', '1920-1930', '1930-1940', '1940-1950', '1950-1960', '1960-1970', '1970-1980', '1980-1990', '1990-2000', '2000-2010']

In [None]:
female_laureates["decade"] = pd.cut(female_laureates["birth_year"], bins=bins, labels=labels, right=False)

In [None]:
top_female_decade = female_laureates["decade"].value_counts()

In [None]:
top_female_decade

Even without a plot, the counts show that women born in the 1940-1950 decade form the largest group, with 14 female laureates.

For better visibility, we nonetheless plot these counts.

In [None]:
top_female_decade = top_female_decade.reset_index()
top_female_decade.columns = ['decade', 'counts']  

In [None]:
plt.style.use('grayscale')
sns.barplot(data=top_female_decade, x="decade", y="counts")
plt.xticks(rotation = 45)
plt.xlabel("")
plt.title('Decadal Comparison of Female Nobel Prize Winners')



The question asks for the combination of decade and Nobel Prize category, not the decade alone, so we next count female laureates per category and decade.

In [None]:
top_female_combinaison = female_laureates.groupby('category')['decade'].value_counts()

In [None]:
top_female_combinaison_df = top_female_combinaison.reset_index(name='count')

In [None]:
top_values = top_female_combinaison_df.loc[top_female_combinaison_df.groupby('category')['count'].idxmax()]

In [None]:
top_values

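Counting female laureates by category and birth decade, the combination of the Peace category and the 1940-1950 decade comes out on top. These are counts of women only; a true proportion would also require the number of all laureates in each decade and category.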
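
One way to obtain an actual proportion is to work on all laureates, flag the female rows, and average that flag per award decade and category. The sketch below runs on made-up data; `is_female` and the `decade` derived from the award `year` are illustrative names and choices, not the notebook's own.

```python
import pandas as pd

# Hypothetical rows; only the column names match the notebook's df.
sample = pd.DataFrame({
    "year": [1903, 1903, 1909, 1901, 1902, 2009, 2009, 2009],
    "category": ["Physics", "Physics", "Literature", "Literature",
                 "Literature", "Chemistry", "Chemistry", "Chemistry"],
    "sex": ["Female", "Male", "Female", "Male",
            "Male", "Female", "Female", "Male"],
})

# Decade of the award year, and a boolean flag for female laureates.
sample["decade"] = (sample["year"] // 10) * 10
sample["is_female"] = sample["sex"] == "Female"

# Mean of the flag = share of female laureates in each (decade, category) cell.
female_share = sample.groupby(["decade", "category"])["is_female"].mean()
top_combo = female_share.idxmax()

print(female_share)
print(top_combo)
```

Cells with very few laureates can reach a high share with a single woman, so it may be worth reporting the cell size alongside the proportion.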

### 4. Who was the first woman to receive a Nobel Prize, and in what category?

In [None]:
female_laureates.columns

In [None]:
female_laureates['year'].min()

In [None]:
female_laureates.loc[female_laureates['year'] == 1903, "category"]

In [None]:
female_laureates[female_laureates['year'] == 1903]

**Marie Curie** made history in **1903** as the first woman to be awarded a Nobel Prize, receiving the prestigious honor in the field of **Physics**.
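
The lookup can also be done in one step, without hard-coding the year, by selecting the row with the smallest `year` among the female laureates. A minimal sketch with placeholder names (`Laureate A`, `Laureate B`, `Laureate C` are hypothetical):

```python
import pandas as pd

# Hypothetical rows; only the column names match the notebook's df.
sample = pd.DataFrame({
    "year": [1901, 1903, 1905],
    "category": ["Physics", "Physics", "Peace"],
    "sex": ["Male", "Female", "Female"],
    "full_name": ["Laureate A", "Laureate B", "Laureate C"],
})

women = sample[sample["sex"] == "Female"]

# idxmin() returns the index label of the earliest year
# (the first such row if several women share that year).
first_woman = women.loc[women["year"].idxmin(), ["year", "category", "full_name"]]

print(first_woman)
```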

### 5. Which individuals or organizations have won more than one Nobel Prize throughout the years?

In [14]:
counts = df['full_name'].value_counts()

In [15]:
counts

full_name
Comité international de la Croix Rouge (International Committee of the Red Cross)    3
Linus Carl Pauling                                                                   2
John Bardeen                                                                         2
Frederick Sanger                                                                     2
Marie Curie, née Sklodowska                                                          2
                                                                                    ..
Karl Ziegler                                                                         1
Giulio Natta                                                                         1
Giorgos Seferis                                                                      1
Sir John Carew Eccles                                                                1
Claudia Goldin                                                                       1
Name: count, Length: 993, dtype: int64

In [16]:
nobel_more_1 = counts[counts > 1].index.tolist()


In [17]:
nobel_more_1

['Comité international de la Croix Rouge (International Committee of the Red Cross)',
 'Linus Carl Pauling',
 'John Bardeen',
 'Frederick Sanger',
 'Marie Curie, née Sklodowska',
 'Office of the United Nations High Commissioner for Refugees (UNHCR)']

Only four individuals (Linus Pauling, John Bardeen, Frederick Sanger, and Marie Curie) and two organizations (the International Committee of the Red Cross and UNHCR) appear more than once, meaning each has won multiple Nobel Prizes.
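
Counting on `full_name` relies on each laureate's name being spelled identically in every row. A possibly more robust variant groups on `laureate_id` instead. The sketch below assumes that `laureate_id` is stable for a given laureate across prizes, which the notebook does not verify; the rows are made up.

```python
import pandas as pd

# Hypothetical rows: laureate 1 appears under two slightly different spellings.
sample = pd.DataFrame({
    "laureate_id": [1, 1, 2, 3, 3],
    "full_name": ["Laureate A", "Laureate A.", "Laureate B", "Org C", "Org C"],
})

# Number of prize rows per laureate id.
prize_counts = sample.groupby("laureate_id").size()
repeat_ids = prize_counts[prize_counts > 1].index

# One representative name per repeat winner.
repeat_winners = (
    sample[sample["laureate_id"].isin(repeat_ids)]
    .drop_duplicates("laureate_id")[["laureate_id", "full_name"]]
)

print(repeat_winners)
```

In this sample, counting by name alone would find only `Org C`, while counting by id also finds laureate 1.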