# Death Prediction Project: Exploring Novel Models with Benchmarks

## Description
This project explores modeling techniques for predicting an individual's **year of death** from a dataset of demographic and temporal features.

### Models Implemented:
- **XGBoost**: Used as the benchmark model; about 78% of its predictions fall within ±1 year of the true year of death.
- **DF2M (Deep Functional Factor Models)**: Currently under implementation, leveraging sparse factorization, Gaussian Processes, and deep kernels for explainable and robust predictions.
- **MOIRAI**: Planned implementation as another novel model for comparison, leveraging advanced transformer architectures for universal time-series forecasting.
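The ±1 year tolerance metric quoted for the benchmark can be read as the fraction of predictions whose absolute error is at most one year. The sketch below shows one way to compute it; the function name `accuracy_within_tolerance` and the toy values are assumptions for illustration, not the project's actual evaluation code or results.

```python
import numpy as np


def accuracy_within_tolerance(y_true, y_pred, tolerance=1.0):
    """Fraction of predictions whose absolute error is at most `tolerance` years."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(np.abs(y_true - y_pred) <= tolerance))


# Toy values for illustration only (not project results).
true_years = [1799, 2001, 1865, 1791]
pred_years = [1800, 1995, 1865, 1793]
score = accuracy_within_tolerance(true_years, pred_years)
print(score)
```

Treating the task as regression and scoring it with a tolerance band keeps the metric interpretable in years while still allowing a single headline "accuracy" number.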

### Objectives:
1. **Benchmarking**: Evaluate baseline performance using XGBoost.
2. **Novel Approaches**: Implement and test research-based models such as DF2M and MOIRAI.
3. **Learning**: Showcase the ability to read, implement, and adapt cutting-edge research into real-world predictive tasks.

### Progress:
- **Data Preprocessing**: Completed, including missing-value handling, feature encoding, and scaling.
- **XGBoost**: Successfully implemented as the benchmark model.
- **DF2M**: Currently under active development.
- **MOIRAI**: Implementation planned as the next step.
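The preprocessing steps listed above (missing-value handling, feature encoding, scaling) could be combined in a single scikit-learn transformer. This is a minimal sketch under assumptions: the toy frame only borrows the `Gender`, `Occupation`, and `Birth year` column names from the dataset, and the imputation strategies are illustrative choices rather than the ones the project necessarily uses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows with missing entries (illustrative, not the real dataset).
toy = pd.DataFrame({
    "Gender": ["Male", "Female", np.nan, "Male"],
    "Occupation": ["Politician", "Artist", "Artist", np.nan],
    "Birth year": [1732.0, 1952.0, np.nan, 1756.0],
})

numeric_cols = ["Birth year"]
categorical_cols = ["Gender", "Occupation"]

# Numeric features: fill gaps with the median, then standardize.
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical features: fill gaps with the most frequent value, then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(toy)
print(X.shape)
```

Fitting the transformer on the training split only and reusing it on the test split avoids leaking test statistics into the imputation and scaling steps.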

### Next Steps:
1. Finalize the implementation of DF2M and evaluate its performance against XGBoost.
2. Integrate MOIRAI into the project to compare its performance on the same dataset.
3. Analyze results and document key insights from the model evaluations.
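For the comparison steps above, one option is to score every model's predictions on the same held-out targets with the same metrics. The sketch below assumes each model produces an array of predicted years; `summarize_predictions`, the model names, and the numbers are hypothetical placeholders, not results from XGBoost, DF2M, or MOIRAI.

```python
import numpy as np


def summarize_predictions(y_true, predictions, tolerance=1.0):
    """Return MAE, RMSE, and within-tolerance rate for each named prediction array."""
    y_true = np.asarray(y_true, dtype=float)
    summary = {}
    for name, y_pred in predictions.items():
        err = np.asarray(y_pred, dtype=float) - y_true
        summary[name] = {
            "mae": float(np.mean(np.abs(err))),
            "rmse": float(np.sqrt(np.mean(err ** 2))),
            "within_tol": float(np.mean(np.abs(err) <= tolerance)),
        }
    return summary


# Placeholder targets and predictions (illustrative only).
y_test = [1799, 2001, 1865, 1791]
toy_predictions = {
    "model_a": [1800, 2001, 1863, 1791],
    "model_b": [1795, 2005, 1865, 1795],
}

results = summarize_predictions(y_test, toy_predictions)
for name, metrics in results.items():
    print(name, metrics)
```

Reporting MAE and RMSE alongside the tolerance rate helps separate a model that is often slightly off from one that is usually exact but occasionally far off.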


# Import Libraries, Dependencies and Dataset


In [5]:
!pip install pytorch-tabnet
!pip install pyro-ppl
!pip install gpytorch
!pip install kagglehub


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3 -m pip install --upgrade pip[0m
Collecting kagglehub
  Downloading kagglehub-0.3.6-py3-none-any.whl.metadata (30 kB)
Downloading kagglehub-0.3.6-py3-none-any.whl (51 kB)
Installing collected packages: kagglehub
Successfully installed kagglehub-0.

In [6]:
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import kagglehub
import os

# pytorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset, TensorDataset

# tabnet
from pytorch_tabnet.tab_model import TabNetRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# bayesian modeling with pyro
import pyro
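from pyro.infer import SVI, Trace_ELBO, config_enumerate
from pyro.infer.autoguide import AutoMultivariateNormal
from pyro.distributions import Normal, Bernoulli, constraints
# Aliased because gpytorch's MultivariateNormal is imported below under the same name
from pyro.distributions import MultivariateNormal as PyroMultivariateNormal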
import pyro.poutine as poutine
from pyro.optim import Adam


# Gaussian processes with gpytorch
import gpytorch
from gpytorch.models import ExactGP
from gpytorch.kernels import RBFKernel, ScaleKernel, PeriodicKernel
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.distributions import MultivariateNormal
from gpytorch.means import ConstantMean


# preprocessing
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.utils import resample
from sklearn.impute import SimpleImputer
from tqdm import tqdm
import time

In [13]:
# Dataset download
path = kagglehub.dataset_download("imoore/age-dataset")

print("Path to dataset files:", path)
dataset_path = path  # use the returned path instead of a machine-specific hard-coded one
print("Files in dataset directory:", os.listdir(dataset_path))

file_path = os.path.join(dataset_path, "AgeDataset-V1.csv")
df = pd.read_csv(file_path)

Path to dataset files: /home/codespace/.cache/kagglehub/datasets/imoore/age-dataset/versions/1
Files in dataset directory: ['AgeDataset-V1.csv']
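The printed path sits under a user-specific cache directory and a version folder, so it will differ between machines and dataset versions. Below is a minimal sketch, assuming the CSV keeps the file name shown above, of locating it under whatever directory `kagglehub.dataset_download` returns. `find_csv` is a hypothetical helper, and the temporary directory only stands in for the real download.

```python
from pathlib import Path
import tempfile


def find_csv(root, filename="AgeDataset-V1.csv"):
    """Return the first file named `filename` found anywhere under `root`."""
    root = Path(root)
    matches = sorted(root.rglob(filename))
    if not matches:
        raise FileNotFoundError(f"{filename} not found under {root}")
    return matches[0]


# Stand-in for the download directory: a temporary folder with a nested CSV.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "versions" / "1"
    nested.mkdir(parents=True)
    (nested / "AgeDataset-V1.csv").write_text("Id,Name\nQ23,George Washington\n")
    found = find_csv(tmp)
    found_name = found.name
    print(found_name)
```

In the notebook this would be used as `df = pd.read_csv(find_csv(path))`, with `path` being the value returned by the download call.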


In [14]:
print(df.head())
df.info()  # prints its summary directly and returns None, so no print() wrapper is needed
print(df.describe())

     Id                     Name  \
0   Q23        George Washington   
1   Q42            Douglas Adams   
2   Q91          Abraham Lincoln   
3  Q254  Wolfgang Amadeus Mozart   
4  Q255     Ludwig van Beethoven   

                                 Short description Gender  \
0   1st president of the United States (1732–1799)   Male   
1                      English writer and humorist   Male   
2  16th president of the United States (1809-1865)   Male   
3        Austrian composer of the Classical period   Male   
4           German classical and romantic composer   Male   

                                             Country  Occupation  Birth year  \
0  United States of America; Kingdom of Great Bri...  Politician        1732   
1                                     United Kingdom      Artist        1952   
2                           United States of America  Politician        1809   
3    Archduchy of Austria; Archbishopric of Salzburg      Artist        1756   
4               