## **Digit Recognizer - CNN + RFC Baseline**


### *Goal*:
#### Classify handwritten digits (0-9) from 28x28 grayscale images using a CNN, with a Random Forest as a baseline for comparison.

### *Approach*:
#### 1. Load data
#### 2. Preprocess (normalize, reshape)
#### 3. Train a CNN for high accuracy
#### 4. Train a Random Forest for baseline comparison
#### 5. Generate predictions and save two submission files


In [21]:
# Import Libraries
import numpy as np
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from tensorflow.keras.utils import to_categorical

In [22]:
# Load data
train_df = pd.read_csv('/Users/Aayush/Desktop/kaggle_codes/datasets/digit_recognizer/train.csv')
test_df = pd.read_csv('/Users/Aayush/Desktop/kaggle_codes/datasets/digit_recognizer/test.csv')

In [23]:
# Preprocessing for CNN
# Split features and labels
X_train = train_df.drop('label', axis=1).values
y_train = train_df['label'].values

In [24]:
# Normalize pixel values to [0, 1]
X_train = X_train / 255.0
test_df = test_df / 255.0
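
As a quick check of the scaling step, the sketch below applies the same division to a small synthetic array. The array `pixels` is made up for illustration and is not taken from the dataset. The cast to `float32` is an optional extra, shown here as an assumption; the notebook itself keeps the default `float64` result.

```python
import numpy as np

# Hypothetical stand-in for a few raw pixel values (0-255), not real dataset rows.
pixels = np.array([[0, 128, 255], [64, 32, 16]], dtype=np.uint8)

# Dividing by 255.0 promotes the integers to float64 and maps [0, 255] onto [0, 1].
scaled = pixels / 255.0

# Optional: float32 halves the memory footprint and matches Keras' default float type.
scaled32 = scaled.astype(np.float32)

print(scaled.min(), scaled.max())
print(scaled.dtype, scaled32.dtype)
```

Scaling the training and test pixels by the same constant keeps both sets on the same range, which is what the two lines above do.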

In [25]:
# Reshape for CNN (28x28x1)
X_train = X_train.reshape(-1, 28, 28, 1)
test_df = test_df.values.reshape(-1, 28, 28, 1)
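
To make the reshape concrete, here is a minimal sketch on a synthetic flat image whose values are simply 0..783. It assumes the 784 pixel columns are stored row by row, which is how NumPy's default (row-major) reshape interprets them. The names `flat`, `images`, and `k` are illustrative only.

```python
import numpy as np

# Hypothetical flat image: 784 values numbered 0..783 so positions are easy to trace.
flat = np.arange(784).reshape(1, 784)

# Same reshape as in the cell above: (n_samples, height, width, channels).
images = flat.reshape(-1, 28, 28, 1)

# With row-major order, flat index k lands at row k // 28, column k % 28.
k = 100
row, col = divmod(k, 28)

print(images.shape)            # (1, 28, 28, 1)
print(images[0, row, col, 0])  # 100
```

The trailing dimension of size 1 is the single grayscale channel that `Conv2D` expects.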

In [26]:
# One-hot encode labels for CNN
y_train = to_categorical(y_train, num_classes=10)
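
For intuition, one-hot encoding can be reproduced in plain NumPy. The sketch below is not what `to_categorical` does internally; it only produces a matrix with the same values, using a made-up `labels` array.

```python
import numpy as np

# Hypothetical integer labels, one per image.
labels = np.array([3, 0, 9])

# Row i of the 10x10 identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per sample.
one_hot = np.eye(10)[labels]

print(one_hot.shape)           # (3, 10)
print(one_hot.argmax(axis=1))  # [3 0 9]
```

Taking `argmax` along axis 1 recovers the original labels, which is the same operation used later to turn softmax outputs into predicted digits.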

In [27]:
# Build CNN
model = Sequential([
    Input(shape=(28, 28, 1)),
    Conv2D(32, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Conv2D(64, kernel_size=(3,3), activation='relu'),
    MaxPooling2D(pool_size=(2,2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train
model.fit(X_train, y_train, epochs=10, batch_size=64, validation_split=0.1)

# Predict
pred = model.predict(test_df)
pred_labels = np.argmax(pred, axis=1)

# Create submission
submission = pd.DataFrame({'ImageId': range(1, len(pred_labels)+1), 'Label': pred_labels})
submission.to_csv('submission.csv', index=False)
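
The layer sizes in the model above can be checked by hand. The sketch below walks through the arithmetic in plain Python, assuming the Keras defaults that the cell relies on (`padding='valid'` and stride 1 for `Conv2D`, and a pooling stride equal to `pool_size`). The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel):
    """Spatial output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool):
    """Spatial output size of non-overlapping max pooling."""
    return size // pool


size = 28
size = conv_out(size, 3)  # 26 after Conv2D(32, 3x3)
size = pool_out(size, 2)  # 13 after MaxPooling2D(2x2)
size = conv_out(size, 3)  # 11 after Conv2D(64, 3x3)
size = pool_out(size, 2)  # 5 after MaxPooling2D(2x2)
flat = size * size * 64   # 1600 features after Flatten

# Trainable parameters per layer: weights plus one bias per output unit or filter.
params = {
    "conv1": 3 * 3 * 1 * 32 + 32,
    "conv2": 3 * 3 * 32 * 64 + 64,
    "dense1": flat * 128 + 128,
    "dense2": 128 * 10 + 10,
}
total = sum(params.values())

print(flat, total)  # 1600 225034
```

Most of the parameters sit in the first `Dense` layer, which is one reason the `Dropout(0.5)` is placed directly after it.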

Epoch 1/10
233/591 ━━━━━━━━━━━━━━━━━━━━ 12s 36ms/step - accuracy: 0.6734 - loss: 1.0125

KeyboardInterrupt: 
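
Step 4 of the approach, the Random Forest baseline, is not shown in the cells above. The following is a minimal sketch of how it might look, assuming scikit-learn is installed. It runs on small synthetic arrays (`X_demo`, `y_demo`) rather than the real data, and the hyperparameters are illustrative, not the author's. Because the CNN cells reshape the images to 4-D and one-hot encode the labels, the sketch first undoes both steps, since a Random Forest expects flat feature rows and integer class labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins shaped like the CNN inputs: (n, 28, 28, 1) images, one-hot labels.
rng = np.random.default_rng(0)
X_demo = rng.random((200, 28, 28, 1))
y_demo = np.eye(10)[rng.integers(0, 10, size=200)]

# Undo the CNN-specific preprocessing: flatten each image and recover integer labels.
X_flat = X_demo.reshape(len(X_demo), -1)  # (200, 784)
y_int = np.argmax(y_demo, axis=1)         # (200,)

# Illustrative settings; random_state only makes the run repeatable.
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_flat, y_int)

# Predict digits for a few rows, in the same form as pred_labels for the CNN.
rf_labels = rfc.predict(X_flat[:5])
print(rf_labels.shape)
```

With the real data, `rf_labels` could be written to a second submission file in the same `ImageId`/`Label` layout used for the CNN predictions.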