# Tabular Quick Start - Titanic Classification

## 🎯 Objective
A minimal baseline demonstrating the simplest AutoGluon workflow: **load → fit → leaderboard**

**Task**: Binary Classification  
**Dataset**: Titanic (Kaggle)  
**Target**: `Survived`  
**Metric**: Accuracy (AutoGluon's default for binary classification; the code below does not set `eval_metric`)  

## 📺 Video Tutorial

[![AutoGluon Part 2: Tabular Demos](https://img.youtube.com/vi/WXv557L0ny4/0.jpg)](https://youtu.be/WXv557L0ny4)

Click the image above to watch the complete Part 2 tutorial on YouTube!

## 📋 What This Notebook Does
1. Load Titanic dataset
2. Train AutoGluon with ONE line of code
3. View leaderboard and feature importance
4. Make predictions

## 📦 Install Dependencies

In [None]:
!pip install -q torch torchvision torchaudio
!pip install -q autogluon
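
After installing, it can be useful to confirm which versions ended up in the environment. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, not part of AutoGluon.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages installed by the cell above.
for name in ("autogluon", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```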

## 📚 Import Libraries

In [8]:
from autogluon.tabular import TabularDataset, TabularPredictor

## 📥 Load Dataset

AutoGluon hosts sample datasets, including Titanic, as CSV files that `TabularDataset` can load directly from a URL:

In [9]:
# Load the Titanic train/test splits from AutoGluon's hosted CSV files
train = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/titanic/train.csv')
test = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/titanic/test.csv')

print("✅ Data loaded!")
print(f"   Train: {train.shape}")
print(f"   Test:  {test.shape}")
print("\n📊 First few rows:")
display(train.head())

Loaded data from: https://autogluon.s3.amazonaws.com/datasets/titanic/train.csv | Columns = 12 / 12 | Rows = 891 -> 891
Loaded data from: https://autogluon.s3.amazonaws.com/datasets/titanic/test.csv | Columns = 11 / 11 | Rows = 418 -> 418


✅ Data loaded!
   Train: (891, 12)
   Test:  (418, 11)

📊 First few rows:


| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | | S |
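
The preview already shows empty `Cabin` cells, so a quick missing-value count is a reasonable sanity check before training. The sketch below runs on the five preview rows only, using the standard library; `count_missing` is a hypothetical helper. On the full data, `train.isna().sum()` should give the same kind of summary, assuming `train` behaves like a pandas DataFrame.

```python
import csv
import io

# The five preview rows shown above, without the index column.
SAMPLE = '''PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
'''


def count_missing(csv_text):
    """Count empty cells per column in a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value == "":
                counts[name] += 1
    return counts


missing = count_missing(SAMPLE)
# Show only the columns that have at least one empty cell.
print({name: n for name, n in missing.items() if n > 0})  # {'Cabin': 3}
```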


## 🎯 Set Target Label

In [10]:
LABEL = "Survived"
print(f"🎯 Target: {LABEL}")

🎯 Target: Survived


## 🚀 Train Model (ONE Line!)

This is the simplest possible AutoGluon workflow:

In [11]:
# Train with the medium_quality preset (AutoGluon's default) under a time limit
predictor = TabularPredictor(label=LABEL).fit(
    train,
    presets="medium_quality",
    time_limit=300  # 5 minutes for quick demo
)

No path specified. Models will be saved in: "AutogluonModels/ag-20251026_224248"
Verbosity: 2 (Standard Logging)
AutoGluon Version:  1.4.0
Python Version:     3.9.6
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 25.0.0: Wed Sep 17 21:42:08 PDT 2025; root:xnu-12377.1.9~141/RELEASE_ARM64_T8132
CPU Count:          10
Memory Avail:       3.80 GB / 16.00 GB (23.8%)
Disk Space Avail:   94.16 GB / 228.27 GB (41.2%)
Presets specified: ['medium_quality']
Using hyperparameters preset: hyperparameters='default'
Beginning AutoGluon training ... Time limit = 300s
AutoGluon will save models to "/Users/banbalagan/Projects/autogluon-assignment/part2-demos/AutogluonModels/ag-20251026_224248"
Train Data Rows:    891
Train Data Columns: 11
Label Column:       Survived
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [np.int64(0), np.int64(1)]
	If 'binary' is not the correct probl
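
The log above says the problem type was inferred as `'binary'` because the label has two unique values. The sketch below illustrates that idea with a deliberately simplified heuristic; `infer_problem_type` and its `regression_threshold` parameter are hypothetical, and AutoGluon's real inference rules are more involved. If the inferred type is wrong, it can be set explicitly with the `problem_type` argument of `TabularPredictor`.

```python
def infer_problem_type(labels, regression_threshold=20):
    """Guess the task from label values (simplified, hypothetical heuristic)."""
    unique = set(labels)
    if len(unique) == 2:
        # Exactly two distinct label values: treat as binary classification.
        return "binary"
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in unique
    )
    if all_numeric and len(unique) > regression_threshold:
        # Many distinct numeric values: treat as regression.
        return "regression"
    return "multiclass"


print(infer_problem_type([0, 1, 1, 0, 1]))           # binary
print(infer_problem_type(["S", "C", "Q", "S"]))      # multiclass
print(infer_problem_type([x * 0.5 for x in range(100)]))  # regression
```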

## 📊 Leaderboard

In [12]:
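# NOTE: scoring on `train` means score_test mostly reflects rows the models were
# fit on, so it is optimistic; score_val is the internal held-out validation score.
leaderboard = predictor.leaderboard(train, silent=True)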
print("🏆 Model Leaderboard:")
display(leaderboard)

leaderboard.to_csv('leaderboard.csv', index=False)
print("\n💾 Saved: leaderboard.csv")

🏆 Model Leaderboard:


| | model | score_test | score_val | eval_metric | pred_time_test | pred_time_val | fit_time | pred_time_test_marginal | pred_time_val_marginal | fit_time_marginal | stack_level | can_infer | fit_order |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RandomForestEntr | 0.962963 | 0.815642 | accuracy | 0.032627 | 0.026798 | 0.215507 | 0.032627 | 0.026798 | 0.215507 | 1 | True | 4 |
| 1 | RandomForestGini | 0.962963 | 0.815642 | accuracy | 0.045693 | 0.030488 | 0.249094 | 0.045693 | 0.030488 | 0.249094 | 1 | True | 3 |
| 2 | ExtraTreesGini | 0.961841 | 0.810056 | accuracy | 0.049123 | 0.026853 | 0.200536 | 0.049123 | 0.026853 | 0.200536 | 1 | True | 6 |
| 3 | ExtraTreesEntr | 0.960718 | 0.804469 | accuracy | 0.045828 | 0.025885 | 0.205944 | 0.045828 | 0.025885 | 0.205944 | 1 | True | 7 |
| 4 | NeuralNetTorch | 0.933782 | 0.837989 | accuracy | 0.0109 | 0.007868 | 3.190696 | 0.0109 | 0.007868 | 3.190696 | 1 | True | 10 |
| 5 | WeightedEnsemble_L2 | 0.933782 | 0.843575 | accuracy | 0.022956 | 0.013015 | 3.659836 | 0.001302 | 0.000267 | 0.061544 | 2 | True | 12 |
| 6 | LightGBMLarge | 0.928171 | 0.815642 | accuracy | 0.003448 | 0.00185 | 2.493369 | 0.003448 | 0.00185 | 2.493369 | 1 | True | 11 |
| 7 | LightGBM | 0.905724 | 0.821229 | accuracy | 0.003143 | 0.001649 | 0.694414 | 0.003143 | 0.001649 | 0.694414 | 1 | True | 2 |
| 8 | NeuralNetFastAI | 0.892256 | 0.826816 | accuracy | 0.010754 | 0.00488 | 0.407596 | 0.010754 | 0.00488 | 0.407596 | 1 | True | 8 |
| 9 | XGBoost | 0.881033 | 0.815642 | accuracy | 0.006891 | 0.00302 | 0.309392 | 0.006891 | 0.00302 | 0.309392 | 1 | True | 9 |



💾 Saved: leaderboard.csv
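
The `eval_metric` column above reports accuracy, which scores hard labels. ROC-AUC instead scores how well predicted probabilities rank positives above negatives, so the two can disagree. The sketch below computes both on made-up values (not Titanic results) in plain Python; the helper names are hypothetical. To have AutoGluon optimize ROC-AUC, `eval_metric="roc_auc"` can be passed to `TabularPredictor`.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of rows whose thresholded probability matches the true label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
y_prob = [0.2, 0.3, 0.4, 0.9]

print(accuracy(y_true, y_prob))  # 0.75: the 0.4 positive falls below the 0.5 cutoff
print(roc_auc(y_true, y_prob))   # 1.0: every positive outranks every negative
```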


## 🔍 Feature Importance

In [13]:
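# NOTE: importances computed on the training data can be biased by overfitting;
# held-out labeled data gives more reliable estimates when it is available.
feature_importance = predictor.feature_importance(train)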
print("🔍 Feature Importance:")
display(feature_importance)

feature_importance.to_csv('feature_importance.csv')
print("\n💾 Saved: feature_importance.csv")

Computing feature importance via permutation shuffling for 11 features using 891 rows with 5 shuffle sets...
	2.58s	= Expected runtime (0.52s per shuffle set)
	1.18s	= Actual runtime (Completed 5 of 5 shuffle sets)


🔍 Feature Importance:


| feature | importance | stddev | p_value | n | p99_high | p99_low |
|---|---|---|---|---|---|---|
| Sex | 0.15174 | 0.00937 | 1.735942e-06 | 5 | 0.171033 | 0.132447 |
| Ticket | 0.108866 | 0.005082 | 5.67991e-07 | 5 | 0.119329 | 0.098403 |
| Name | 0.102581 | 0.010366 | 1.234255e-05 | 5 | 0.123924 | 0.081238 |
| Pclass | 0.057464 | 0.003756 | 2.178048e-06 | 5 | 0.065197 | 0.04973 |
| SibSp | 0.040404 | 0.006964 | 0.0001018318 | 5 | 0.054743 | 0.026065 |
| Age | 0.036588 | 0.006668 | 0.0001267265 | 5 | 0.050318 | 0.022858 |
| Parch | 0.03367 | 0.003272 | 1.057024e-05 | 5 | 0.040407 | 0.026933 |
| Cabin | 0.032772 | 0.001844 | 1.198246e-06 | 5 | 0.036569 | 0.028975 |
| Embarked | 0.024916 | 0.005044 | 0.0001910308 | 5 | 0.035302 | 0.01453 |
| PassengerId | 0.018406 | 0.004318 | 0.0003381574 | 5 | 0.027297 | 0.009516 |



💾 Saved: feature_importance.csv
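
The log above mentions "permutation shuffling" with 5 shuffle sets. The idea is to shuffle one feature's values across rows and measure how much the score drops. The sketch below shows that idea on toy data with a toy model; it is not AutoGluon's implementation, and all names in it are hypothetical.

```python
import random


def score(model, rows, labels):
    """Accuracy of a callable model on a list of row dicts."""
    return sum(int(model(r) == y) for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, n_shuffles=5, seed=0):
    """Mean drop in accuracy when one feature's column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = score(model, rows, labels)
    drops = []
    for _ in range(n_shuffles):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only the chosen feature permuted.
        shuffled = [{**r, feature: v} for r, v in zip(rows, column)]
        drops.append(baseline - score(model, shuffled, labels))
    return sum(drops) / n_shuffles


# Toy data (not Titanic rows): the label depends only on "sex".
rows = [{"sex": "female" if i % 2 == 0 else "male", "noise": i % 3} for i in range(40)]
labels = [1 if r["sex"] == "female" else 0 for r in rows]


def toy_model(row):
    """Stand-in for a trained model; it ignores the "noise" feature."""
    return 1 if row["sex"] == "female" else 0


print("sex:  ", permutation_importance(toy_model, rows, labels, "sex"))
print("noise:", permutation_importance(toy_model, rows, labels, "noise"))
```

A feature the model never uses gets an importance of zero, while shuffling a feature the model relies on lowers the score.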


## 🔮 Predictions

In [14]:
predictions = predictor.predict(test)
print("🔮 Sample predictions:")
print(predictions.head(10))

🔮 Sample predictions:
0    0
1    1
2    0
3    0
4    1
5    0
6    0
7    0
8    1
9    0
Name: Survived, dtype: int64
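
`predict` returns hard 0/1 labels. When probabilities are needed, `predictor.predict_proba(test)` returns per-class probabilities instead. The sketch below shows how probabilities map to labels at a chosen cutoff; the values are made up and `to_labels` is a hypothetical helper.

```python
def to_labels(positive_probs, threshold=0.5):
    """Convert positive-class probabilities to 0/1 labels at a cutoff."""
    return [1 if p >= threshold else 0 for p in positive_probs]


# Hypothetical survival probabilities, not model output.
probs = [0.12, 0.91, 0.48, 0.50, 0.77]

print(to_labels(probs))                  # [0, 1, 0, 1, 1]
print(to_labels(probs, threshold=0.8))   # [0, 1, 0, 0, 0]
```

Raising the threshold trades recall for precision on the positive class.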


## 💾 Save Model

In [15]:
import shutil
shutil.make_archive('autogluon_model', 'zip', predictor.path)
print("✅ Model saved: autogluon_model.zip")

✅ Model saved: autogluon_model.zip
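
To reuse the archive elsewhere, it has to be unpacked back into a directory first. The sketch below checks the zip round trip on a placeholder directory using only the standard library; the file names are stand-ins, not AutoGluon's actual layout, and `archive_round_trip` is a hypothetical helper. After unpacking a real model, `TabularPredictor.load(path)` can be pointed at the restored directory.

```python
import shutil
import tempfile
from pathlib import Path


def archive_round_trip(source_dir, work_dir):
    """Zip source_dir, unpack it under work_dir, and list the restored files."""
    base_name = str(Path(work_dir) / "model_copy")
    archive = shutil.make_archive(base_name, "zip", str(source_dir))
    restored = Path(work_dir) / "restored"
    shutil.unpack_archive(archive, str(restored))
    return sorted(
        p.relative_to(restored).as_posix() for p in restored.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder directory standing in for predictor.path.
    model_dir = Path(tmp) / "fake_model"
    (model_dir / "models").mkdir(parents=True)
    (model_dir / "predictor.pkl").write_bytes(b"placeholder")
    (model_dir / "models" / "trainer.pkl").write_bytes(b"placeholder")

    restored_files = archive_round_trip(model_dir, tmp)
    print(restored_files)  # ['models/trainer.pkl', 'predictor.pkl']
```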


## 🎓 Summary

This notebook showed the **minimal AutoGluon workflow**:

```python
predictor = TabularPredictor(label=LABEL).fit(train)
```

That's it! AutoGluon handles:
- ✅ Data preprocessing
- ✅ Feature engineering
- ✅ Model selection
- ✅ Per-model hyperparameter defaults (explicit tuning is optional)
- ✅ Ensemble creation

**Key Findings (This Run):**
- Most important features by permutation importance: Sex, Ticket, Name, Pclass
- `WeightedEnsemble_L2` had the best validation accuracy (0.844)
- Validation accuracy across models ranged from about 80% to 84%
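
The `WeightedEnsemble_L2` entry in the leaderboard combines the predictions of the base models. The sketch below illustrates the general idea as a weighted average of predicted probabilities. The weights and probabilities are made up, and this is a simplification rather than AutoGluon's actual ensemble-fitting procedure.

```python
def weighted_average(prob_lists, weights):
    """Blend per-model probability lists using normalized weights."""
    total = sum(weights)
    n_rows = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_rows)
    ]


# Hypothetical positive-class probabilities from two base models.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.2]

blended = weighted_average([model_a, model_b], weights=[0.75, 0.25])
print(blended)
```

Each blended value lies between the two models' predictions and sits closer to the model with the larger weight.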