# Age Prediction Using XGBoost

## Description
This notebook implements **XGBoost** to predict **age at death** from a structured dataset. XGBoost is a **gradient boosting algorithm** known for its efficiency and strong performance on structured (tabular) data. It is widely used in machine learning competitions and real-world applications because it scales well, is robust, and exposes feature importance scores.

### **Why This Notebook?**
I have structured this repository to explore multiple modeling approaches **separately**, ensuring that each model is implemented in a dedicated notebook. This notebook is specifically designed to:
- **Showcase XGBoost’s capabilities** in handling structured tabular data.
- **Compare its performance against other models** (which are implemented in separate notebooks).
- **Understand the strengths and limitations** of XGBoost in a regression setting.

### **What’s Covered in This Notebook?**
✅ **Data Preprocessing**: Handling missing values, categorical encoding, and feature scaling.  
✅ **XGBoost Implementation**: Training an optimized **gradient boosting model** on the dataset.  
✅ **Hyperparameter Tuning**: Using techniques such as grid search or random search to optimize performance.  
✅ **Training & Evaluation**: Assessing model performance with metrics such as RMSE and MAE, plus feature importance analysis.  

Each model is trained and evaluated separately to highlight its strengths and weaknesses. Stay tuned for additional notebooks showcasing **novel techniques and alternative modeling approaches**.
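
As a quick reference for the evaluation metrics listed above, here is a minimal sketch of RMSE (the square root of the mean squared error) and MAE (the mean of the absolute errors). It uses plain Python so it runs without extra dependencies. The helper names `rmse` and `mae` and the sample values are illustrative assumptions, not code or data from this notebook; in practice a library implementation would normally be used.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: squares each error, so large misses weigh more."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, in the target's units."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(absolute_errors)


# Hypothetical ages at death and model predictions (illustration only).
actual = [67, 49, 56, 35]
predicted = [70, 45, 56, 40]

print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
```

Both metrics are expressed in years here. RMSE is never smaller than MAE on the same predictions, and a large gap between the two suggests that a few predictions are far off.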


# Import Libraries, Dependencies and Dataset


In [1]:
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import kagglehub
import os



In [2]:
# Dataset download
path = kagglehub.dataset_download("imoore/age-dataset")

print("Path to dataset files:", path)
# Reuse the directory returned by kagglehub rather than a hard-coded, machine-specific cache path
dataset_path = path
print("Files in dataset directory:", os.listdir(dataset_path))

file_path = os.path.join(dataset_path, "AgeDataset-V1.csv")
df = pd.read_csv(file_path)

print(df.head())
print(df.info())
print(df.describe())

Path to dataset files: /home/codespace/.cache/kagglehub/datasets/imoore/age-dataset/versions/1
Files in dataset directory: ['AgeDataset-V1.csv']
     Id                     Name  \
0   Q23        George Washington   
1   Q42            Douglas Adams   
2   Q91          Abraham Lincoln   
3  Q254  Wolfgang Amadeus Mozart   
4  Q255     Ludwig van Beethoven   

                                 Short description Gender  \
0   1st president of the United States (1732–1799)   Male   
1                      English writer and humorist   Male   
2  16th president of the United States (1809-1865)   Male   
3        Austrian composer of the Classical period   Male   
4           German classical and romantic composer   Male   

                                             Country  Occupation  Birth year  \
0  United States of America; Kingdom of Great Bri...  Politician        1732   
1                                     United Kingdom      Artist        1952   
2                           Uni