# Nobel Prize Analysis
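This project explores a Nobel laureates dataset to examine historical trends, awards by category, gender distribution, geographical patterns, and laureates' ages at the time of the award. The notebook runs on a six-row mock sample, so the charts illustrate the method rather than the full historical record.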

### Dataset Overview
The dataset includes details such as:
- Name and Gender
- Country and Birthplace
- Category of Award
- Year of Award
- Motivation


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load mock dataset (simulate with a small sample for this notebook)
nobel_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "firstname": ["Marie", "Pierre", "Albert", "Richard", "Malala", "Tu"],
    "surname": ["Curie", "Curie", "Einstein", "Feynman", "Yousafzai", "Youyou"],
    "born": ["1867-11-07", "1859-05-15", "1879-03-14", "1918-05-11", "1997-07-12", "1930-12-30"],
    "died": ["1934-07-04", "1906-04-19", "1955-04-18", "1988-02-15", "0000-00-00", "0000-00-00"],
    "gender": ["female", "male", "male", "male", "female", "female"],
    "year": [1903, 1903, 1921, 1965, 2014, 2015],
    "category": ["physics", "physics", "physics", "physics", "peace", "medicine"],
    "birthplace": ["Warsaw, Poland", "Paris, France", "Ulm, Germany", "Queens, New York, USA", "Mingora, Pakistan", "Ningbo, China"],
    "country": ["Poland", "France", "Germany", "USA", "Pakistan", "China"],
    "motivation": [
        "in recognition of the extraordinary services to Physics",
        "jointly with Marie Curie",
        "for his services to Theoretical Physics",
        "for their fundamental work in quantum electrodynamics",
        "for her struggle against the suppression of children",
        "for her discoveries concerning a novel therapy"
    ]
})

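# Preprocessing
# Parse birth dates; values that cannot be parsed become NaT
nobel_df['born'] = pd.to_datetime(nobel_df['born'], errors='coerce')
# Make sure the award year is numeric
nobel_df['year'] = pd.to_numeric(nobel_df['year'], errors='coerce')
# Approximate age at award: award year minus birth year (can overstate by one year)
nobel_df['age_at_award'] = nobel_df['year'] - nobel_df['born'].dt.year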
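
Subtracting the birth year from the award year can overstate a laureate's age by one year when the birthday falls after the award date. The sketch below shows one way to tighten the estimate, assuming the award date is 10 December of the award year (the date of the annual ceremony). The frame `laureates` and the column `age_exact` are hypothetical names used only for illustration.

```python
import pandas as pd

# Two rows from the mock sample, rebuilt here so the example is self-contained
laureates = pd.DataFrame({
    "surname": ["Curie", "Youyou"],
    "born": pd.to_datetime(["1867-11-07", "1930-12-30"]),
    "year": [1903, 2015],
})

# Assumption: treat 10 December of the award year as the award date
birth_month = laureates["born"].dt.month
birth_day = laureates["born"].dt.day
had_birthday = (birth_month < 12) | ((birth_month == 12) & (birth_day <= 10))

# Subtract one year when the birthday falls after the assumed award date
laureates["age_exact"] = (
    laureates["year"]
    - laureates["born"].dt.year
    - (~had_birthday).astype(int)
)
print(laureates[["surname", "age_exact"]])
```

For Tu Youyou, born on 30 December, this gives 84 rather than the 85 produced by the year difference.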

### Key Questions
1. What is the distribution of Nobel Prizes by gender?
2. Which categories have the highest number of awards?
3. How are Nobel Prizes distributed over time?
4. What is the age distribution of laureates at the time of the award?
5. Which countries have the most laureates?
6. Are there notable patterns across gender, age, and field?


In [None]:
# Set up visual theme
sns.set_theme(style="whitegrid")

# 1. Gender distribution
sns.countplot(x='gender', data=nobel_df)
plt.title("Gender Distribution of Laureates")
plt.show()

# 2. Awards by category
sns.countplot(y='category', data=nobel_df, order=nobel_df['category'].value_counts().index)
plt.title("Awards by Category")
plt.show()

# 3. Yearly distribution
sns.histplot(data=nobel_df, x='year', bins=10, kde=False)
plt.title("Distribution of Nobel Awards Over Time")
plt.show()

# 4. Age distribution
sns.histplot(nobel_df['age_at_award'].dropna(), bins=10)
plt.title("Age at Time of Award")
plt.show()

# 5. Country representation
top_countries = nobel_df['country'].value_counts().nlargest(5)
sns.barplot(x=top_countries.values, y=top_countries.index)
plt.title("Top 5 Represented Countries")
plt.show()
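
The plots above cover questions 1 to 5 but not question 6, patterns across gender, age, and field. One possible approach is a cross-tabulation of gender against category plus a pivot of mean age. This is a sketch on the same mock values, with the frame name `sample` assumed for illustration and ages taken from the year-difference calculation.

```python
import pandas as pd

# Gender, category, and age_at_award values from the six-row mock sample
sample = pd.DataFrame({
    "gender": ["female", "male", "male", "male", "female", "female"],
    "category": ["physics", "physics", "physics", "physics", "peace", "medicine"],
    "age_at_award": [36, 44, 42, 47, 17, 85],
})

# Number of awards for each gender/category combination
counts = pd.crosstab(sample["gender"], sample["category"])

# Mean age at award for each gender/category combination (NaN where no awards)
mean_age = sample.pivot_table(
    index="gender", columns="category", values="age_at_award", aggfunc="mean"
)

print(counts)
print(mean_age.round(1))
```

With only six rows these tables mostly show the gaps in the sample, but the same two calls would apply unchanged to a full dataset.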

### Insights and Conclusions

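- **Gender Gap**: The six-row sample is evenly split (three women, three men), so it cannot show the strong male majority among Nobel laureates overall; the full dataset is needed for this question.
- **Popular Categories**: Physics leads in this sample with four of the six awards; peace and medicine have one each.
- **Temporal Spread**: The sample's awards run from 1903 to 2015, a span of 112 years.
- **Age Trend**: Three of the six laureates were in their 40s at the time of the award (42, 44, and 47); the sample ranges from 17 (Malala Yousafzai) to 85 (Tu Youyou, by the year-difference calculation used here).
- **Country Dominance**: Each of the six countries appears exactly once in the sample, so no country dominates; ranking countries requires the full dataset.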
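
Statements like these are easy to check against the data directly. The sketch below rebuilds the relevant columns from the mock sample and computes a few summary figures; the frame name `sample` is an assumption for illustration, and ages use the year-difference calculation.

```python
import pandas as pd

# Columns from the six-row mock sample needed for the summary
sample = pd.DataFrame({
    "gender": ["female", "male", "male", "male", "female", "female"],
    "year": [1903, 1903, 1921, 1965, 2014, 2015],
    "age_at_award": [36, 44, 42, 47, 17, 85],
    "country": ["Poland", "France", "Germany", "USA", "Pakistan", "China"],
})

gender_counts = sample["gender"].value_counts()
year_span = sample["year"].max() - sample["year"].min()
age_summary = sample["age_at_award"].agg(["min", "median", "max"])
country_counts = sample["country"].value_counts()

print(gender_counts)
print("Year span:", year_span)
print(age_summary)
print("Most laureates from a single country:", country_counts.max())
```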
