# Analysis of Nobel Prize Laureates (1901-2023)

## Introduction

The Nobel Prize, awarded since 1901, is one of the most prestigious international accolades. Each year, prizes are awarded in six categories: Chemistry, Literature, Physics, Physiology or Medicine, Economics (first awarded in 1969), and Peace.

In this project, we will explore and analyze data on Nobel Prize laureates from 1901 to 2023, provided by the Nobel Foundation through their API.

Our analysis aims to answer several key questions, including:

1. What are the most common gender and birth country among laureates?
2. Which decade had the highest ratio of US-born Nobel Prize winners to total winners across all categories?
3. Which decade and Nobel Prize category combination had the highest proportion of female laureates?
4. Who was the first woman to receive a Nobel Prize, and in what category?
5. Which individuals or organizations have won more than one Nobel Prize throughout the years?

Through this analysis, we will uncover fascinating trends and valuable insights into the history of Nobel Prizes.

Let's begin our exploration!

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [3]:
df = pd.read_csv("Dataset/nobel.csv")

In [4]:
df.head()

Unnamed: 0,year,category,prize,motivation,prize_share,laureate_id,laureate_type,full_name,birth_date,birth_city,birth_country,sex,organization_name,organization_city,organization_country,death_date,death_city,death_country
0,1901,Chemistry,The Nobel Prize in Chemistry 1901,"""in recognition of the extraordinary services ...",1/1,160,Individual,Jacobus Henricus van 't Hoff,1852-08-30,Rotterdam,Netherlands,Male,Berlin University,Berlin,Germany,1911-03-01,Berlin,Germany
1,1901,Literature,The Nobel Prize in Literature 1901,"""in special recognition of his poetic composit...",1/1,569,Individual,Sully Prudhomme,1839-03-16,Paris,France,Male,,,,1907-09-07,Châtenay,France
2,1901,Medicine,The Nobel Prize in Physiology or Medicine 1901,"""for his work on serum therapy, especially its...",1/1,293,Individual,Emil Adolf von Behring,1854-03-15,Hansdorf (Lawice),Prussia (Poland),Male,Marburg University,Marburg,Germany,1917-03-31,Marburg,Germany
3,1901,Peace,The Nobel Peace Prize 1901,,1/2,462,Individual,Jean Henry Dunant,1828-05-08,Geneva,Switzerland,Male,,,,1910-10-30,Heiden,Switzerland
4,1901,Peace,The Nobel Peace Prize 1901,,1/2,463,Individual,Frédéric Passy,1822-05-20,Paris,France,Male,,,,1912-06-12,Paris,France


In [5]:
df.columns

Index(['year', 'category', 'prize', 'motivation', 'prize_share', 'laureate_id',
       'laureate_type', 'full_name', 'birth_date', 'birth_city',
       'birth_country', 'sex', 'organization_name', 'organization_city',
       'organization_country', 'death_date', 'death_city', 'death_country'],
      dtype='object')

In [6]:
df.shape

(1000, 18)

### 1. What are the most common gender and birth country among laureates?

In [7]:
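# Note: 'sum' concatenates the strings within each prize, so this counts per-prize combinations, not laureates.
top_nobel_coutry_sum = df["birth_country"].groupby(df['prize']).agg('sum').value_counts()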

In [8]:
top_nobel_coutry_sum

birth_country
United States of America                            64
United Kingdom                                      27
France                                              27
United States of AmericaUnited States of America    26
0                                                   21
                                                    ..
Guadeloupe Island                                    1
Bosnia (Bosnia and Herzegovina)                      1
Ottoman Empire (Turkey)                              1
Austria-Hungary (Ukraine)Germany                     1
CanadaUnited States of AmericaNetherlands            1
Name: count, Length: 281, dtype: int64

In [9]:
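# Note: as with birth_country, 'sum' concatenates the 'sex' strings of all laureates sharing a prize.
top_nobel_gender = df["sex"].groupby(df['prize']).agg('sum').value_counts()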

In [10]:
top_nobel_gender

sex
Male                  319
MaleMale              125
MaleMaleMale           97
Female                 28
0                      20
MaleFemaleMale          8
MaleMaleFemale          7
FemaleMale              6
MaleFemale              6
FemaleFemale            2
FemaleFemaleFemale      1
FemaleMaleMale          1
FemaleFemaleMale        1
Name: count, dtype: int64

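These counts are per prize, not per laureate: a prize shared by several laureates appears as a concatenated entry such as `United States of AmericaUnited States of America` or `MaleMale`. Read that way, the most frequent birth-country entries in the output above are the United States of America (64), followed by the United Kingdom and France (27 each). As for gender, men have received by far the most Nobel Prizes: among single-laureate prizes, 319 went to men and 28 to women.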
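To count individual laureates rather than prizes, one option is to call `value_counts()` directly on each column. The sketch below assumes that approach and uses a small made-up frame (`toy`) in place of the notebook's `df`, so the numbers it produces are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; these rows are not real data.
toy = pd.DataFrame({
    "prize": ["P1", "P1", "P2", "P3"],
    "birth_country": ["France", "Germany", "France", None],
    "sex": ["Male", "Female", "Male", None],
})

# value_counts() tallies one row per laureate and skips missing values by default.
country_counts = toy["birth_country"].value_counts()
sex_counts = toy["sex"].value_counts()

# idxmax() returns the label with the highest count.
top_country = country_counts.idxmax()
top_sex = sex_counts.idxmax()
print(top_country, top_sex)
```

Applied to the real data, the same two calls on `df["birth_country"]` and `df["sex"]` would give per-laureate totals without the concatenated entries.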

### 2. Which decade had the highest ratio of US-born Nobel Prize winners to total winners across all categories?

In [17]:
birth_nobel_prize = df["birth_country"].unique()

In [18]:
birth_nobel_prize 

array(['Netherlands', 'France', 'Prussia (Poland)', 'Switzerland',
       'Prussia (Germany)', 'Schleswig (Germany)', 'India', 'Sweden',
       'Norway', 'Faroe Islands (Denmark)', 'United Kingdom',
       'Russian Empire (Poland)', 'Scotland', 'Spain', 'Russia', nan,
       'Poland', 'Germany', 'Austrian Empire (Czech Republic)',
       'Hungary (Slovakia)', 'Tuscany (Italy)', 'Italy',
       'United States of America', 'Bavaria (Germany)',
       'British India (India)', 'Austrian Empire (Italy)', 'New Zealand',
       'East Friesland (Germany)', 'Russian Empire (Ukraine)', 'Denmark',
       'Luxembourg', 'Russian Empire (Latvia)', 'Belgium',
       'Hesse-Kassel (Germany)', 'Germany (Russia)',
       'Mecklenburg (Germany)', 'Austria', 'Prussia (Russia)',
       'Australia', 'Austria-Hungary (Slovenia)', 'Ireland', 'Canada',
       'Java, Dutch East Indies (Indonesia)', 'Austrian Empire (Austria)',
       'Germany (Poland)', 'W&uuml;rttemberg (Germany)', 'Argentina',
       'Austria

In [19]:
us_birth_nobel = df[df["birth_country"] == "United States of America"]


In [21]:
us_birth_nobel.shape

(291, 18)

In [30]:
us_birth_nobel["birth_date"].dtype

dtype('O')

In [39]:
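def corriger_date(valeur):
    """Convert a raw birth_date value to a Timestamp, or NaT if it cannot be parsed."""
    # Missing values stay missing.
    if pd.isna(valeur):
        return pd.NaT
    # A bare numeric year (e.g. 1943 or 1943.0) becomes January 1 of that year.
    if isinstance(valeur, (int, float)):
        return pd.to_datetime(str(int(valeur)), format='%Y')
    if isinstance(valeur, str):
        # Dates with an unknown month and day (e.g. '1943-00-00') keep only the year.
        if '-00-00' in valeur:
            return pd.to_datetime(valeur[:4], format='%Y')
        # Anything else is parsed normally; unparseable strings become NaT.
        return pd.to_datetime(valeur, errors='coerce')
    return pd.NaT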
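The same cleanup can also be done on the whole column at once instead of row by row. The following is a minimal sketch, assuming every non-missing value is a string in `YYYY-MM-DD` form; the helper name `clean_birth_dates` and the sample values are hypothetical and do not come from the dataset.

```python
import pandas as pd

def clean_birth_dates(dates: pd.Series) -> pd.Series:
    """Vectorized variant: map an unknown month/day ('-00-00') to January 1, then parse."""
    # Missing values pass through .str.replace unchanged.
    fixed = dates.str.replace("-00-00", "-01-01", regex=False)
    # errors='coerce' turns anything that does not match the format into NaT.
    return pd.to_datetime(fixed, format="%Y-%m-%d", errors="coerce")

# Illustrative input only.
sample = pd.Series(["1852-08-30", "1943-00-00", None, "not a date"])
result = clean_birth_dates(sample)
print(result)
```

Unlike `corriger_date`, this version does not handle bare numeric years, so it is only a drop-in replacement if the column contains strings and missing values.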

In [40]:
us_birth_nobel['birth_date'] = us_birth_nobel['birth_date'].apply(corriger_date)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  us_birth_nobel['birth_date'] = us_birth_nobel['birth_date'].apply(corriger_date)


In [41]:
us_birth_nobel['birth_date'] = pd.to_datetime(us_birth_nobel['birth_date'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  us_birth_nobel['birth_date'] = pd.to_datetime(us_birth_nobel['birth_date'])


In [42]:
us_birth_nobel['birth_year'] = us_birth_nobel['birth_date'].dt.year

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  us_birth_nobel['birth_year'] = us_birth_nobel['birth_date'].dt.year
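The warnings above appear because `us_birth_nobel` was created by filtering `df`, so pandas cannot tell whether later assignments are meant to modify the original frame. A common way to avoid this is to take an explicit copy when filtering. The sketch below shows the pattern on a small made-up frame (`toy` and `us_only` are hypothetical names, and the rows are not real data).

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`.
toy = pd.DataFrame({
    "birth_country": ["United States of America", "France", "United States of America"],
    "birth_date": ["1858-10-27", "1839-03-16", "1943-00-00"],
})

# .copy() makes the filtered frame independent of `toy`,
# so adding or overwriting columns on it is unambiguous.
us_only = toy[toy["birth_country"] == "United States of America"].copy()
us_only["birth_year"] = us_only["birth_date"].str[:4].astype(int)
print(us_only)
```

In the notebook, the equivalent change would be appending `.copy()` to the filter that defines `us_birth_nobel`.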


In [43]:
us_birth_nobel['birth_year']

35     1858.0
72     1845.0
79     1868.0
95     1856.0
117    1868.0
        ...  
987    1953.0
988    1955.0
990    1959.0
995       NaN
999    1946.0
Name: birth_year, Length: 291, dtype: float64

In [None]:
# TODO: drop rows with NaN, build a DataFrame of decades, and plot it to see which decades score highest
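One possible way to finish this step is sketched below. It assumes the decade in question 2 refers to the award `year` rather than the birth year, and it treats laureates with a missing birth country as not US-born; both are assumptions, not choices confirmed by the notebook. The frame `toy` and the columns `decade` and `us_born` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's `df`; these rows are not real data.
toy = pd.DataFrame({
    "year": [1901, 1905, 1912, 1915, 1918, 1923],
    "birth_country": [
        "France",
        "United States of America",
        "United States of America",
        "Germany",
        "United States of America",
        None,
    ],
})

# Floor each award year to the start of its decade (1912 -> 1910).
toy["decade"] = (toy["year"] // 10) * 10

# Boolean flag; a missing birth country compares as False.
toy["us_born"] = toy["birth_country"] == "United States of America"

# The mean of a boolean column is the share of True values per decade.
us_ratio_by_decade = toy.groupby("decade")["us_born"].mean()
best_decade = us_ratio_by_decade.idxmax()
print(us_ratio_by_decade)
print(best_decade)
```

Calling `us_ratio_by_decade.plot(kind="bar")` on the resulting Series would give the plot mentioned in the to-do note.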