Augmentation refers to the process of generating new training samples by applying various transformations to the existing data. This technique is particularly useful in deep learning and machine learning, as it helps increase the diversity of the training data without actually collecting new data. The goal of augmentation is to improve the model's generalization ability by introducing variations that the model may encounter in real-world scenarios.

In [5]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# import wget
import os
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import load_img, img_to_array, ImageDataGenerator
from sklearn.metrics import classification_report, log_loss, accuracy_score
from sklearn.model_selection import train_test_split

In [11]:
directory = "R:/AI Ml/.ipynb_checkpoints/FruitsData/fruits/train/train"
Name = []
for file in os.listdir(directory):
    Name+=[file]

# print(Name) -->  in Name all folders are stored it's represent as Fruits
fruit_map = dict(zip(Name,[i for i in range(len(Name))]))
# This is dictionary in which each filder ( fruit ) is listed as index of folder
print(fruit_map)

r_fruit_map = dict(zip([i for i in range(len(Name))],Name))
print(r_fruit_map)
# os.listdir(directory)

['Apple Braeburn', 'Apple Granny Smith', 'Apricot', 'Avocado', 'Banana', 'Blueberry', 'Cactus fruit', 'Cantaloupe', 'Cherry', 'Clementine', 'Corn', 'Cucumber Ripe', 'Grape Blue', 'Kiwi', 'Lemon', 'Limes', 'Mango', 'Onion White', 'Orange', 'Papaya', 'Passion Fruit', 'Peach', 'Pear', 'Pepper Green', 'Pepper Red', 'Pineapple', 'Plum', 'Pomegranate', 'Potato Red', 'Raspberry', 'Strawberry', 'Tomato', 'Watermelon']
{'Apple Braeburn': 0, 'Apple Granny Smith': 1, 'Apricot': 2, 'Avocado': 3, 'Banana': 4, 'Blueberry': 5, 'Cactus fruit': 6, 'Cantaloupe': 7, 'Cherry': 8, 'Clementine': 9, 'Corn': 10, 'Cucumber Ripe': 11, 'Grape Blue': 12, 'Kiwi': 13, 'Lemon': 14, 'Limes': 15, 'Mango': 16, 'Onion White': 17, 'Orange': 18, 'Papaya': 19, 'Passion Fruit': 20, 'Peach': 21, 'Pear': 22, 'Pepper Green': 23, 'Pepper Red': 24, 'Pineapple': 25, 'Plum': 26, 'Pomegranate': 27, 'Potato Red': 28, 'Raspberry': 29, 'Strawberry': 30, 'Tomato': 31, 'Watermelon': 32}
{0: 'Apple Braeburn', 1: 'Apple Granny Smith', 2: 

Split the images into train, validation, test sets
* Perform data augmentation by using ImageDataGenerator so that we can acquire more relevant data from the existing images by making minor alterations to the dataset.

* rescale=1.0/255.0: This scales the pixel values of the images from the range [0, 255] to the range [0, 1]. It is a common preprocessing step in deep learning to normalize the input data.

* vertical_flip=True: This allows vertical flipping of the images as part of the data augmentation process.

* horizontal_flip=True: This allows horizontal flipping of the images as part of the data augmentation process.

* rotation_range=40: This randomly rotates the images within the specified range of degrees (0 to 40 degrees) as part of the data augmentation process.

* width_shift_range=0.2: This randomly shifts the images horizontally by a fraction of the total width (20% in this case) as part of the data augmentation process.

* height_shift_range=0.2: This randomly shifts the images vertically by a fraction of the total height (20% in this case) as part of the data augmentation process.

* zoom_range=0.1: This randomly zooms in and out on images by up to 10% as part of the data augmentation process.
* validation_split=0.2: This specifies that 20% of the data will be used for validation.

* target_size=(100, 100): This resizes all images to 100x100 pixels. This is the size that the neural network will process.

* batch_size=32 for training and batch_size=16 for validation and testing: This is the number of images to be yielded from the generator per batch.

* class_mode='categorical': This means the labels are one-hot encoded. It is used for multi-class classification.

* subset='training' or subset='validation': Specifies whether the generator is used for the training or validation subset. This requires the ImageDataGenerator to have a validation_split parameter set.

* shuffle=True: This means the data will be shuffled at each epoch, which is useful for training to prevent the model from seeing the same order of data at each epoch. For the test set, shuffle=False is often used to maintain the order for evaluation.

In [13]:
# This generator will read images from the specified directory, apply augmentations, and 
# yield batches of 32 images with their labels for training.
img_data_gen = ImageDataGenerator(rescale=1./255,
vertical_flip = True,
horizontal_flip = True,
rotation_range = 40,
width_shift_range = 0.2,
height_shift_range = 0.2,
zoom_range = 0.1,
validation_split = 0.2
)

test_data_gen = ImageDataGenerator(rescale=1./255)

In [16]:
train_generator = test_data_gen.flow_from_directory(directory,
shuffle = True,
batch_size = 32,
subset = 'training',
target_size=(100,100))

valid_generator = img_data_gen.flow_from_directory(directory,
shuffle=True,
batch_size=16,
subset='validation',
target_size=(100,100))

Found 16623 images belonging to 33 classes.
Found 3314 images belonging to 33 classes.


In [None]:
# masking, igon vector, filters, transformation vectors these are used to divide data into layers