# Dogs vs. Cats Classification Challenge  

## 🐶 vs. 🐱 - Can Your Model Tell the Difference?  

In this challenge, you'll build a deep learning model to classify images as either **dogs (1)** or **cats (0)**. While this task is trivial for humans (and even for dogs and cats!), teaching a machine to do it requires a well-trained convolutional neural network (CNN).  

### 📌 Your Tasks:  
1. **Train a CNN** on the provided dataset to classify dog and cat images.  
2. **Deploy the model** on Heroku, allowing users to upload an image for prediction.  

---

## 📂 Dataset Structure  
The dataset is organized into training and test sets, with separate folders for dogs and cats:  

```
./dataset/  
│  
├── training_set/  
│   ├── dog/  
│   │   ├── dog1.jpg  
│   │   ├── dog2.jpg  
│   │   └── ...  
│   └── cat/  
│       ├── cat1.jpg  
│       ├── cat2.jpg  
│       └── ...  
│  
└── test_set/  
    ├── dog/  
    │   ├── dog1.jpg  
    │   ├── dog2.jpg  
    │   └── ...  
    └── cat/  
        ├── cat1.jpg  
        ├── cat2.jpg  
        └── ...  
```

---

## 🔄 Loading Images with `ImageDataGenerator`  

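Keras provides `ImageDataGenerator` for image loading, augmentation, and batch processing. The TensorFlow docs mark it as deprecated and recommend `image_dataset_from_directory` plus preprocessing layers for new code, but it still works and is common in tutorials.  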

### 🔹 **Key Features:**  
- **Data Augmentation**: Apply transformations (rotation, zoom, flip) to increase dataset diversity.  
- **Batch Processing**: Load images in batches (saves memory).  
- **Automatic Labeling**: Uses folder names as labels, indexed in alphabetical order (`cat` → `0`, `dog` → `1`).  
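
To make the labeling rule concrete, here is a minimal sketch of the convention, assuming class folders are indexed in alphabetical order (which is how `cat` ends up as `0` and `dog` as `1`). It uses only the standard library and is not Keras' actual implementation; `infer_class_indices` is a hypothetical helper introduced for illustration.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Map each class subfolder to an integer index, in alphabetical order.

    Hypothetical helper: it mimics the labeling convention only.
    """
    class_names = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a throwaway folder layout shaped like ./dataset/training_set/
with tempfile.TemporaryDirectory() as root:
    for class_name in ("dog", "cat"):
        os.makedirs(os.path.join(root, class_name))
    class_indices = infer_class_indices(root)

print(class_indices)  # {'cat': 0, 'dog': 1}
```

With a real generator, the same mapping can be read from `train_generator.class_indices`.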

### 🔹 **Basic Usage:**  
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Initialize with augmentation (optional)
train_datagen = ImageDataGenerator(
    rescale=1./255,        # Scale pixel values to [0, 1]
    shear_range=0.2,       # Random shearing
    zoom_range=0.2,        # Random zoom
    horizontal_flip=True,  # Random horizontal flip
    validation_split=0.2   # Reserve 20% for validation (it is augmented too)
)

# Load training data
train_generator = train_datagen.flow_from_directory(
    directory='./dataset/training_set/',
    target_size=(150, 150),  # Resize images
    batch_size=32,
    class_mode='binary',     # Binary classification (cat/dog)
    subset='training'        # Use this for training
)

# Load validation data (same directory, different subset)
val_generator = train_datagen.flow_from_directory(
    directory='./dataset/training_set/',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary',
    subset='validation'      # Use this for validation
)
```

📖 **Docs:** [ImageDataGenerator](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator)  

---

## 🔄 Alternative: `image_dataset_from_directory` (TF 2.6+)  

A newer loader that returns a `tf.data.Dataset`. It does not augment or rescale images itself.  

### 🔹 **Key Differences:**  
✅ **Faster** (uses `tf.data.Dataset`).  
✅ **No built-in augmentation or rescaling** (add `layers.RandomFlip`, `layers.RandomRotation`, and `layers.Rescaling(1./255)` instead).  
✅ **Better for large datasets** (lower memory usage).  

### 🔹 **Example:**  
```python
from tensorflow.keras.utils import image_dataset_from_directory

train_ds = image_dataset_from_directory(
    directory='./dataset/training_set/',
    labels='inferred',       # Auto-label from subdirs
    label_mode='binary',     # Binary classification
    image_size=(150, 150),
    batch_size=32,
    validation_split=0.2,
    subset='training',
    seed=123
)

val_ds = image_dataset_from_directory(
    directory='./dataset/training_set/',
    label_mode='binary',
    image_size=(150, 150),
    batch_size=32,
    validation_split=0.2,
    subset='validation',
    seed=123
)
```

📖 **Docs:** [image_dataset_from_directory](https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory)  
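
Since this loader leaves augmentation and rescaling to you, the sketch below shows one way to do both with Keras preprocessing layers. It assumes TensorFlow 2.6+; the parameter values and the random stand-in batch are illustrative choices, not part of the challenge.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Augmentation as Keras layers (parameter values are illustrative)
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),  # rotate by up to 10% of a full turn, either way
    layers.RandomZoom(0.2),
])
rescale = layers.Rescaling(1.0 / 255)  # scale pixel values to [0, 1]

# Stand-in batch: 4 random RGB images, 150x150, values in [0, 255]
images = np.random.randint(0, 256, size=(4, 150, 150, 3)).astype("float32")

# training=True switches the random transformations on
augmented = rescale(data_augmentation(images, training=True))
print(augmented.shape)
```

These layers can go at the start of the model, or be applied to the dataset with `train_ds.map(...)`. Only the training data should be augmented; validation and test images need rescaling alone.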

---

## 💡 **Key Tips for Students**  

### 🔸 **1. `ImageDataGenerator` vs. `image_dataset_from_directory`**  
| Feature | `ImageDataGenerator` | `image_dataset_from_directory` |
|---------|----------------------|--------------------------------|
| **Augmentation** | ✅ Built-in | ❌ (Use TF layers) |
| **Speed** | Slower | Faster (tf.data) |
| **Memory** | Higher | Lower |
| **Best for** | Small datasets + augmentation | Large datasets |

### 🔸 **2. When to Use `validation_split`?**  
- If you don’t have a separate validation set, use `validation_split=0.2` (20% validation).  
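- With `image_dataset_from_directory`, pass the **same `seed`** to the training and validation calls so the two subsets don't overlap. `flow_from_directory` splits by file order and needs no seed.  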
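
Why a shared seed matters can be seen in a simplified sketch, assuming the loader shuffles the file list and then cuts it at the split point. This is plain Python illustrating the idea, not the library's code; `split_files` and the file names are hypothetical.

```python
import random


def split_files(files, validation_split, subset, seed):
    """Shuffle the files with a seed, then return the requested subset."""
    shuffled = sorted(files)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_split)
    if subset == "validation":
        return shuffled[len(shuffled) - n_val:]
    return shuffled[:len(shuffled) - n_val]


files = [f"img{i}.jpg" for i in range(10)]

# Same seed: both calls see the same shuffle, so the subsets partition the data
train = split_files(files, 0.2, "training", seed=123)
val = split_files(files, 0.2, "validation", seed=123)
print(len(train), len(val), set(train) & set(val))  # 8 2 set()

# Different seeds: the shuffles disagree, so images can leak between subsets
val_other = split_files(files, 0.2, "validation", seed=7)
print(set(train) & set(val_other))
```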

### 🔸 **3. Preprocessing Steps**  
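✔ **Rescale images** (`rescale=1./255` scales pixel values to [0, 1]).  
✔ **Shuffle training data** (`shuffle=True` is the default in `flow_from_directory`; use `shuffle=False` for the test set so predictions stay in file order).  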
✔ **Use correct `class_mode`** (`'binary'` for 2 classes, `'categorical'` for >2).  
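
As a small illustration of two of these steps, the NumPy sketch below shows what rescaling does to raw pixel values and how labels are shaped under each `class_mode`. The tiny arrays are made-up stand-ins, not real images.

```python
import numpy as np

# Stand-in batch: two 2x2 RGB "images" with raw 8-bit pixel values
batch = np.array([[[[0, 128, 255]] * 2] * 2] * 2, dtype="float32")

# Dividing by 255 has the same effect as rescale=1./255, up to float rounding
rescaled = batch / 255.0
print(rescaled.min(), rescaled.max())  # 0.0 1.0

class_ids = np.array([0, 1, 1, 0])  # cat = 0, dog = 1

# class_mode='binary': one number per image, shape (4,)
binary_labels = class_ids.astype("float32")

# class_mode='categorical': one-hot rows, shape (4, 2)
categorical_labels = np.eye(2, dtype="float32")[class_ids]

print(binary_labels.shape, categorical_labels.shape)  # (4,) (4, 2)
```

Binary labels pair with a single sigmoid output and `binary_crossentropy`; one-hot labels pair with a softmax output and `categorical_crossentropy`.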

---

## 🚀 **Next Steps**  
1. **Train a CNN** (e.g., `Sequential` model with `Conv2D`, `MaxPooling2D`).  
2. **Evaluate on test set** (using `model.evaluate(test_generator)`, where `test_generator` loads `./dataset/test_set/` with rescaling only and `shuffle=False`).  
3. **Deploy on Heroku** (e.g., a Flask or Django app that loads the saved model; TensorFlow Serving is another option).  
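
As a starting point for steps 1 and 2, here is a minimal sketch of a small CNN and a test-set loader. The layer sizes, optimizer, and the helper names `build_cnn` and `make_test_generator` are illustrative assumptions, not a required architecture.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def build_cnn(input_shape=(150, 150, 3)):
    """Small CNN for binary classification (layer sizes are illustrative)."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # probability of "dog" (label 1)
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


def make_test_generator(directory="./dataset/test_set/"):
    """Test-set loader: rescaling only, no augmentation, no shuffling."""
    test_datagen = ImageDataGenerator(rescale=1.0 / 255)
    return test_datagen.flow_from_directory(
        directory=directory,
        target_size=(150, 150),
        batch_size=32,
        class_mode="binary",
        shuffle=False,
    )


model = build_cnn()

# Smoke test on random data; real training uses the generators defined earlier
dummy_batch = np.random.rand(2, 150, 150, 3).astype("float32")
predictions = model.predict(dummy_batch, verbose=0)
print(predictions.shape)  # (2, 1)
```

Training would then look like `model.fit(train_generator, validation_data=val_generator, epochs=10)` (the epoch count is arbitrary), followed by `model.evaluate(make_test_generator())`.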

Good luck! 🐕🐈 Let’s see if your model can outsmart a cat! 😼