# Data Augmentation

Data augmentation is the process of **artificially increasing the size and diversity of a dataset** by creating modified versions of existing data.

Why do we need it?
- Reduces overfitting
- Improves model generalization
- Useful when collecting new data is expensive

We will explore:
- Image augmentation
- Text augmentation (basic)

## 1. Image Data Augmentation
Using the **Keras ImageDataGenerator**, we can apply random transformations such as:
- Rotation
- Flipping
- Zooming
- Shifting
- Shearing

In [None]:
import matplotlib.pyplot as plt
import numpy as np
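from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from tensorflow.keras.utils import get_file

# Load sample image. load_img expects a local file path, not a URL,
# so download the file first (get_file caches it locally).
img_path = get_file('Black_square.jpg', 'https://upload.wikimedia.org/wikipedia/commons/9/99/Black_square.jpg')
img = load_img(img_path, target_size=(150, 150))
x = img_to_array(img)          # shape: (150, 150, 3)
x = np.expand_dims(x, axis=0)  # add a batch dimension: (1, 150, 150, 3)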

# Define augmentation generator
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

# Generate augmented images.
# datagen.flow() yields batches indefinitely, so stop after four samples.
plt.figure(figsize=(8, 8))
for i, batch in enumerate(datagen.flow(x, batch_size=1)):
    if i == 4:
        break
    plt.subplot(2, 2, i + 1)
    plt.imshow(batch[0].astype('uint8'))
    plt.axis('off')
plt.show()
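
To make two of the transformations above more concrete, here is a minimal NumPy sketch of a horizontal flip and a horizontal shift on a tiny toy array. The helper names `horizontal_flip` and `shift_right` are hypothetical and written only for illustration; this is not how `ImageDataGenerator` is implemented internally, just a simplified picture of what flipping and shifting with edge filling do to pixel positions.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right."""
    return image[:, ::-1, :]

def shift_right(image, pixels):
    """Shift an (H, W, C) image right by `pixels` columns.

    The gap on the left is filled by repeating the original left edge,
    which is roughly the idea behind fill_mode='nearest'.
    """
    if pixels <= 0:
        return image.copy()
    shifted = np.empty_like(image)
    shifted[:, pixels:, :] = image[:, :-pixels, :]  # move content to the right
    shifted[:, :pixels, :] = image[:, :1, :]        # repeat the left edge column
    return shifted

# Tiny 2x3 single-channel "image" so the effect is easy to read
toy = np.arange(6).reshape(2, 3, 1)
flipped = horizontal_flip(toy)
shifted = shift_right(toy, 1)

print("Original:\n", toy[:, :, 0])
print("Flipped:\n", flipped[:, :, 0])
print("Shifted right by 1:\n", shifted[:, :, 0])
```

In the generator, the amount of each transformation is drawn at random per sample (within the ranges passed to the constructor), whereas this sketch applies a fixed flip and a fixed shift.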

## 2. Text Data Augmentation (Basic)
Text augmentation is trickier because even a small edit can change a sentence's meaning, but common techniques include:
- Synonym replacement
- Random insertion/deletion
- Back translation
- Noise injection

In [None]:
import random

sentence = "Data augmentation helps improve model performance."
words = sentence.split()

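# Simple synonym replacement (mock example with a hand-written dictionary).
# Note: split() keeps punctuation attached, so "performance." would not
# match a "performance" key without stripping the period first.
synonyms = {"improve": "enhance", "model": "algorithm"}
augmented = [synonyms.get(w, w) for w in words]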

print("Original:", sentence)
print("Augmented:", " ".join(augmented))
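
Random deletion from the list above can be sketched in a few lines using only the standard library. The functions `random_deletion` and `random_swap` below are hypothetical helpers written for illustration, and the word swap is used here as one simple stand-in for noise injection (an assumption, since the technique is not defined more precisely above). A seeded `random.Random` instance keeps the results reproducible.

```python
import random

def random_deletion(words, p, rng):
    """Drop each word with probability p, but always keep at least one word."""
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]

def random_swap(words, rng):
    """Swap the words at two randomly chosen, distinct positions."""
    if len(words) < 2:
        return list(words)
    swapped = list(words)
    i, j = rng.sample(range(len(swapped)), 2)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped

rng = random.Random(0)  # fixed seed so repeated runs give the same output
words = "Data augmentation helps improve model performance.".split()

deleted = random_deletion(words, p=0.3, rng=rng)
swapped = random_swap(words, rng=rng)

print("Original:", " ".join(words))
print("Deletion:", " ".join(deleted))
print("Swap:    ", " ".join(swapped))
```

Both operations can damage meaning or grammar, so in practice the deletion probability is usually kept small and the augmented sentences are spot-checked by hand.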

## ✅ Summary
- **Image augmentation** (rotation, flip, zoom, shift, shear) is widely used in computer vision.
- **Text augmentation** can involve synonym replacement, random insertion or deletion, noise injection, or back translation.
- Data augmentation makes models more robust and helps reduce overfitting.