# Nobel Prize Data Analysis Project

This Jupyter notebook explores a small simulated dataset of Nobel Prize laureates and looks for trends using the Python data science libraries `pandas`, `numpy`, `matplotlib`, and `seaborn`.

---

**Goals**:
- Explore the data
- Ask and answer meaningful questions
- Visualize patterns across time, gender, geography, and age
- Summarize the findings and what they suggest



In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Simulated Nobel dataset
nobel_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "firstname": ["Marie", "Pierre", "Albert", "Richard", "Malala", "Tu"],
    "surname": ["Curie", "Curie", "Einstein", "Feynman", "Yousafzai", "Youyou"],
    "born": ["1867-11-07", "1859-05-15", "1879-03-14", "1918-05-11", "1997-07-12", "1930-12-30"],
    "died": ["1934-07-04", "1906-04-19", "1955-04-18", "1988-02-15", "0000-00-00", "0000-00-00"],
    "gender": ["female", "male", "male", "male", "female", "female"],
    "year": [1903, 1903, 1921, 1965, 2014, 2015],
    "category": ["physics", "physics", "physics", "physics", "peace", "medicine"],
    "birthplace": ["Warsaw, Poland", "Paris, France", "Ulm, Germany", "Queens, New York, USA", "Mingora, Pakistan", "Ningbo, China"],
    "country": ["Poland", "France", "Germany", "USA", "Pakistan", "China"],
    "motivation": [
        "in recognition of the extraordinary services to Physics",
        "jointly with Marie Curie",
        "for his services to Theoretical Physics",
        "for their fundamental work in quantum electrodynamics",
        "for her struggle against the suppression of children",
        "for her discoveries concerning a novel therapy"
    ]
})

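# Preprocess and enrich dataset
nobel_df['born'] = pd.to_datetime(nobel_df['born'], errors='coerce')
# "0000-00-00" is a placeholder for a missing death date; errors='coerce' turns it into NaT
nobel_df['died'] = pd.to_datetime(nobel_df['died'], errors='coerce')
nobel_df['year'] = pd.to_numeric(nobel_df['year'], errors='coerce')
# Approximate age: year difference only, ignoring whether the birthday has passed
nobel_df['age_at_award'] = nobel_df['year'] - nobel_df['born'].dt.year

# Show basic info (info() prints directly and returns None, so it is not wrapped in print())
nobel_df.info()
print(nobel_df.head())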
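
The year-difference age computed in the cell above can be off by one when the laureate's birthday falls after the award date. A minimal sketch of a stricter calculation follows. It assumes 10 December of the award year as the award date (the dataset has no award-date column), and `sample`, `is_living`, and `birthday_passed` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hypothetical three-row sample mirroring the notebook's columns.
sample = pd.DataFrame({
    "surname": ["Curie", "Feynman", "Youyou"],
    "born": ["1867-11-07", "1918-05-11", "1930-12-30"],
    "died": ["1934-07-04", "1988-02-15", "0000-00-00"],
    "year": [1903, 1965, 2015],
})

sample["born"] = pd.to_datetime(sample["born"], errors="coerce")
# "0000-00-00" is not a valid date, so errors="coerce" turns it into NaT.
sample["died"] = pd.to_datetime(sample["died"], errors="coerce")
# Caveat: a missing death date is treated as "living" here; it could also mean "unknown".
sample["is_living"] = sample["died"].isna()

# Assumption: treat 10 December of the award year as the award date.
award_month, award_day = 12, 10
birthday_passed = (
    (sample["born"].dt.month < award_month)
    | ((sample["born"].dt.month == award_month) & (sample["born"].dt.day <= award_day))
)

# Subtract one year when the birthday had not yet occurred on the award date.
sample["age_at_award"] = (
    sample["year"] - sample["born"].dt.year - (~birthday_passed).astype(int)
)
print(sample[["surname", "is_living", "age_at_award"]])
```

In this sketch only the laureate born on 30 December is affected: the year difference gives 85, while the stricter rule gives 84.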

In [None]:
# Unique categories and gender counts
print("Unique categories:", nobel_df['category'].unique())
print("Gender counts:")
print(nobel_df['gender'].value_counts())

# Awards by decade (sorted chronologically rather than by count)
nobel_df['decade'] = (nobel_df['year'] // 10) * 10
print("Awards by decade:")
print(nobel_df['decade'].value_counts().sort_index())

# Most common surnames
print("Top surnames:")
print(nobel_df['surname'].value_counts())

# Awards by country
print("Country counts:")
print(nobel_df['country'].value_counts())

# Detect laureates with more than one award, keyed by ID
# (every ID is unique in this sample, so the result is empty)
laureate_counts = nobel_df.groupby("id").size()
multiple_awards = laureate_counts[laureate_counts > 1]
print("Multiple award winners:")
print(multiple_awards)
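
Because every `id` in the sample is unique, the check above prints an empty result. The sketch below shows how the same `groupby` pattern behaves once an `id` repeats. It uses a hypothetical four-row frame named `awards`, in which Marie Curie's 1911 Chemistry prize is added as a second row with `id` 1; the row is not part of the notebook's dataset.

```python
import pandas as pd

# Hypothetical frame: id 1 appears twice (Physics 1903 and Chemistry 1911).
awards = pd.DataFrame({
    "id": [1, 2, 3, 1],
    "surname": ["Curie", "Curie", "Einstein", "Curie"],
    "year": [1903, 1903, 1921, 1911],
    "category": ["physics", "physics", "physics", "chemistry"],
})

# Count rows per laureate id and keep the ids that occur more than once.
award_counts = awards.groupby("id").size()
repeat_ids = award_counts[award_counts > 1].index

# Pull the full rows for those laureates, ordered by id and award year.
repeat_winners = awards[awards["id"].isin(repeat_ids)].sort_values(["id", "year"])
print(repeat_winners)
```

Grouping by `id` rather than `surname` matters here: a surname count would report three "Curie" rows and merge two different people.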

In [None]:
sns.set_theme(style="whitegrid")  # set_theme is the current name; sns.set is an alias

# Gender Distribution
plt.figure(figsize=(6,4))
sns.countplot(x='gender', data=nobel_df)
plt.title("Gender Distribution")
plt.show()

# Award Categories
plt.figure(figsize=(6,4))
sns.countplot(y='category', data=nobel_df, order=nobel_df['category'].value_counts().index)
plt.title("Awards by Category")
plt.show()

# Awards Over Time
plt.figure(figsize=(6,4))
sns.histplot(nobel_df['year'], bins=10)
plt.title("Nobel Awards Over Time")
plt.show()

# Age at Time of Award
plt.figure(figsize=(6,4))
sns.histplot(nobel_df['age_at_award'].dropna(), bins=10)
plt.title("Age at Award")
plt.show()

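# Country Frequency (the country column holds each laureate's birth country)
plt.figure(figsize=(6,4))
top_countries = nobel_df['country'].value_counts().nlargest(5)
sns.barplot(x=top_countries.values, y=top_countries.index)
plt.title("Top 5 Countries by Awards")
plt.xlabel("Number of awards")
plt.ylabel("Birth country")
plt.show()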
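
The plots above each look at one variable at a time. To combine time and gender, one option is to build a decade-by-gender count table first and plot it afterwards. This is a sketch only, using a hypothetical two-column copy of the sample named `sample`; `counts` is likewise an illustrative name.

```python
import pandas as pd

# Hypothetical copy of the two columns needed from the notebook's sample.
sample = pd.DataFrame({
    "year": [1903, 1903, 1921, 1965, 2014, 2015],
    "gender": ["female", "male", "male", "male", "female", "female"],
})

# Same decade bucketing as in the exploration cell.
sample["decade"] = (sample["year"] // 10) * 10

# Rows are decades, columns are genders, cells are award counts.
counts = pd.crosstab(sample["decade"], sample["gender"])
print(counts)
```

With matplotlib available, `counts.plot(kind="bar", stacked=True)` would draw this table as a stacked bar chart, one bar per decade.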

### Final Observations

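- The sample is evenly split between female and male laureates (three each). The full historical record is dominated by male recipients, but this six-row sample cannot show that.
- Physics is the most common category in the sample (four of six awards); Peace and Medicine appear once each.
- Ages at award range from 17 to 85 by year difference, with a median of 43, so the sample does not support a claim about a typical age.
- Each of the six birth countries (Poland, France, Germany, USA, Pakistan, China) appears exactly once, so the sample shows no concentration by country.
- Testing any of these patterns properly requires the full dataset, which also offers more room for storytelling.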
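
Observations like these are easier to trust when they are backed by numbers computed from the data. The sketch below derives a few summary figures from a hypothetical copy of the sample; `born_year` stands in for `born.dt.year`, and all variable names are illustrative.

```python
import pandas as pd

# Hypothetical copy of the relevant columns from the six-row sample.
sample = pd.DataFrame({
    "gender": ["female", "male", "male", "male", "female", "female"],
    "category": ["physics", "physics", "physics", "physics", "peace", "medicine"],
    "year": [1903, 1903, 1921, 1965, 2014, 2015],
    "born_year": [1867, 1859, 1879, 1918, 1997, 1930],
})

# Share of awards by gender and by category (fractions summing to 1).
gender_share = sample["gender"].value_counts(normalize=True)
category_share = sample["category"].value_counts(normalize=True)

# Approximate age at award (year difference) and two summaries of it.
age = sample["year"] - sample["born_year"]
median_age = age.median()
share_47_plus = (age >= 47).mean()

print(gender_share)
print(category_share)
print("Median age:", median_age, "| share aged 47+:", round(share_47_plus, 2))
```

With only six rows these figures describe the sample, not Nobel laureates in general; the same calls would apply unchanged to a full dataset.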
