# Data Exploration

This notebook loads and previews the three datasets used in the deep learning comparison project:

- **MNIST** (handwritten digits)
- **CIFAR-10** (10-class color images)
- **Titanic** (tabular classification)

Run the Setup cell first, then use the sections below to load and preview each dataset. You can comment/uncomment sections as needed.

## Setup

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import tensorflow as tf
from tensorflow.keras.datasets import mnist, cifar10


## MNIST: Load & Preview

In [None]:
# Load MNIST (grayscale 28x28 images)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print('MNIST shapes:', X_train.shape, X_test.shape)

# Show a few samples
plt.figure()
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(X_train[i], cmap='gray')
    plt.title(int(y_train[i]))
    plt.axis('off')
plt.tight_layout()
plt.show()
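
Beyond looking at a few images, it can help to check the pixel range and the label balance before modeling. The cell below is a minimal sketch and not part of the original notebook: the helper name `summarize_images` is hypothetical, and a small random array stands in for `X_train` / `y_train` so the cell runs without downloading MNIST.

```python
import numpy as np

def summarize_images(X, y, num_classes=10):
    """Return pixel range, per-class counts, and a [0, 1]-scaled float32 copy."""
    X = np.asarray(X)
    y = np.asarray(y).ravel()  # also flattens (N, 1) label arrays
    counts = np.bincount(y, minlength=num_classes)  # samples per class id
    X_scaled = X.astype("float32") / 255.0  # assumes uint8 pixels in 0..255
    return {
        "min": int(X.min()),
        "max": int(X.max()),
        "counts": counts,
        "X_scaled": X_scaled,
    }

# Stand-in data with an MNIST-like shape and dtype (assumption for illustration;
# in the notebook you would pass X_train and y_train instead).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
y_demo = rng.integers(0, 10, size=100)

summary = summarize_images(X_demo, y_demo)
print("pixel range:", summary["min"], summary["max"])
print("class counts:", summary["counts"])
print("scaled dtype:", summary["X_scaled"].dtype)
```

Calling `summarize_images(X_train, y_train)` on the real arrays should work the same way, since MNIST labels are integer class ids from 0 to 9.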


## CIFAR-10: Load & Preview

In [None]:
# Load CIFAR-10 (RGB 32x32 images)
(X_train_c, y_train_c), (X_test_c, y_test_c) = cifar10.load_data()
print('CIFAR-10 shapes:', X_train_c.shape, X_test_c.shape)

# Show a few samples
plt.figure()
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(X_train_c[i])
    plt.title(int(y_train_c[i, 0]))  # CIFAR-10 labels have shape (N, 1)
    plt.axis('off')
plt.tight_layout()
plt.show()
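
The preview titles are integer class ids, which are hard to read at a glance. As a small sketch (not part of the original notebook), the ids can be mapped to the standard CIFAR-10 class names. The names `CIFAR10_CLASSES` and `label_names` are hypothetical, and the label array is a hand-written stand-in with the same `(N, 1)` shape as `y_train_c`.

```python
import numpy as np

# Standard CIFAR-10 label order: list index = integer class id
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_names(y):
    """Map an array of integer class ids (shape (N,) or (N, 1)) to class names."""
    y = np.asarray(y).ravel()
    return [CIFAR10_CLASSES[int(k)] for k in y]

# Stand-in labels shaped like y_train_c (assumption for illustration)
y_demo = np.array([[6], [9], [9], [4], [1]], dtype=np.uint8)
names = label_names(y_demo)
print(names)
```

In the preview loop, `plt.title(CIFAR10_CLASSES[int(y_train_c[i, 0])])` would then show names instead of numbers.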


## Titanic: Load from CSV

Place `train.csv` in `datasets/titanic/` (the cell below looks for it at `../datasets/titanic/train.csv`, relative to this notebook), or update the path below.

In [None]:
csv_path = '../datasets/titanic/train.csv'
if os.path.exists(csv_path):
    df = pd.read_csv(csv_path)
    print('Titanic shape:', df.shape)
    display(df.head())
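    # Basic preprocessing example: keep four features and drop rows with missing values
    cols = ['Pclass','Sex','Age','Fare']
    df = df.dropna(subset=cols).copy()  # copy so the assignment below edits an independent frame
    df['Sex'] = df['Sex'].map({'male':0, 'female':1})  # encode Sex as 0/1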
    X = df[cols].values
    y = df['Survived'].values
    print('Feature matrix shape:', X.shape)
else:
    print('Titanic CSV not found at', csv_path)
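
The Setup cell imports `train_test_split` and `StandardScaler`, but the notebook does not use them yet. One plausible next step, sketched here under assumptions, is to split the feature matrix and standardize it. The tiny DataFrame below is a hand-made stand-in with the same columns (not real Titanic rows), and the variable names `X_tr`, `X_val`, and `scaler` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hand-made stand-in rows (assumption: illustrative values, not the real dataset)
demo = pd.DataFrame({
    "Pclass":   [3, 1, 3, 1, 3, 2, 3, 1, 2, 3],
    "Sex":      [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
    "Age":      [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 27.0, 40.0, 14.0, 20.0],
    "Fare":     [7.25, 71.3, 7.9, 53.1, 8.05, 26.0, 11.1, 30.5, 30.0, 8.0],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
})

X_demo = demo[["Pclass", "Sex", "Age", "Fare"]].to_numpy(dtype="float32")
y_demo = demo["Survived"].to_numpy()

# Hold out 20% of rows; stratify keeps the class ratio similar in both parts
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fit the scaler on training rows only, then reuse its statistics for validation
scaler = StandardScaler()
X_tr_s = scaler.fit_transform(X_tr)
X_val_s = scaler.transform(X_val)

print("train/val shapes:", X_tr_s.shape, X_val_s.shape)
print("train column means:", X_tr_s.mean(axis=0).round(3))
```

Fitting the scaler on the training split only keeps validation statistics out of the preprocessing step. With the real data, `X` and `y` from the cell above would take the place of `X_demo` and `y_demo`.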
