# Objective

This report presents an overview of the development of a deep learning model tailored for the task of fruit classification. We begin with the acquisition of diverse and extensive datasets from the renowned platform, Kaggle.com. These datasets serve as the foundational blocks for training our model.

Leveraging the powerful and widely-used TensorFlow library, we design and refine our deep learning model. TensorFlow's advanced capabilities enable us to construct a model that is accurate in its predictive abilities and also efficient in processing.

A key highlight of this project is the integration of our model into a user-friendly web application, developed using Streamlit. This application is the interface between the model and its users: a user uploads an image of a fruit onto the web page, and the model analyzes the image and returns its prediction. The code is available in our GitHub repository; unfortunately, we did not have the opportunity to deploy it in a container.

The report delves into each of these aspects in detail, outlining the model's architecture, the data preparation process, training methodologies, and the deployment strategy for the web application. Our goal is to provide a transparent and thorough understanding of the steps involved in bringing this fruit classification model from concept to reality.

Our code can be found at:
[Fruit Classification GitHub](https://github.com/aparru33/DeepLearning/tree/main/Fruit_classification)

# Dataset creation

## Dataset Acquisition Strategy

There are multiple strategies for acquiring a dataset. Given the time and resources required to create a dataset from scratch, we opted to use existing datasets. To this end, we selected four datasets from Kaggle, each chosen for its diversity, its relevance, and its credibility, as each one is sourced:

1. [Fruits 262 Dataset](https://www.kaggle.com/datasets/aelchimminut/fruits262)
2. [Fruit and Vegetable Image Recognition](https://www.kaggle.com/datasets/kritikseth/fruit-and-vegetable-image-recognition)
3. [Fruit Recognition Dataset](https://www.kaggle.com/datasets/sshikamaru/fruit-recognition?select=train)
4. [Fruits Dataset Images](https://www.kaggle.com/datasets/shreyapmaher/fruits-dataset-images)

By integrating multiple datasets, we enhance the adaptability of our model and test its robustness against a variety of images, while also expanding the range of fruit categories available for classification.

## Image Selection Strategy

Regarding image selection, we faced two options: using images with a single fruit per image, or images with multiple fruits of the same category. The former simplifies the task, reducing the risk of model confusion due to overlapping fruits or complex backgrounds, and calls for a less complex training process and architecture. The latter, although more complex and requiring a more elaborate architecture and longer training, more accurately mirrors real-world scenarios where multiple fruits may be present in a single image.

To match the input size used by ResNet, we selected images larger than 224x224; this resolution should retain sufficient detail without requiring excessive computation time. In the same spirit, we restricted the dataset to no more than 31 categories. Although we use techniques to preserve aspect ratios when resizing images, we excluded images whose width-to-height or height-to-width ratio exceeds 2. This helps avoid excessive distortion during the resizing process.
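The size and aspect-ratio rules above can be expressed as a small filter. This is a minimal sketch rather than the project's actual code: the function name `keep_image` and its parameters are hypothetical, and we assume that "larger than 224x224" means each side is at least 224 pixels.

```python
def keep_image(width: int, height: int,
               min_side: int = 224, max_ratio: float = 2.0) -> bool:
    """Return True if an image passes the size and aspect-ratio rules."""
    # Assumption: both sides must be at least `min_side` pixels.
    if min(width, height) < min_side:
        return False
    # Reject images whose long side is more than `max_ratio` times the short side.
    return max(width, height) / min(width, height) <= max_ratio


if __name__ == "__main__":
    for size in [(640, 480), (200, 300), (1000, 400)]:
        print(size, keep_image(*size))
```

An image of 448x224 has a ratio of exactly 2 and is kept, because only ratios that exceed 2 are excluded.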

We adopted a hybrid approach, combining both single and multiple fruit images, to leverage the strengths of each method. This approach aligns with our diverse dataset selection, further enhancing the robustness of the model.

## Data Augmentation and Dataset Composition

Our final dataset includes a variety of images differing in the number of fruits, their arrangements, backgrounds, and lighting conditions.

In our training set, we included 800 images for each fruit category. For the test set, we applied the Pareto principle (an 80/20 split), resulting in 200 images per category. Data augmentation, involving image rotation, flips, and slight shifts in width and height (without altering the fruit's proportions), was employed where necessary to reach the desired number of images for each category. We kept only categories with enough images that no image is augmented more than twice. This approach ensures balanced representation across all fruit categories.
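The "no more than two augmentations per image" rule can be checked with a short planning function. This is a sketch under stated assumptions: the name `augmentation_plan` is hypothetical, and the target of 1,000 images per category (800 + 200) assumes that augmentation happens before the train/test split, which the report does not state.

```python
from typing import List, Optional


def augmentation_plan(n_originals: int, target: int,
                      max_per_image: int = 2) -> Optional[List[int]]:
    """Number of augmented copies for each original image.

    Returns None when the target cannot be reached without augmenting
    some image more than `max_per_image` times.
    """
    if n_originals <= 0:
        raise ValueError("need at least one original image")
    missing = max(target - n_originals, 0)
    if missing > n_originals * max_per_image:
        return None  # the category would be dropped
    # Spread the missing images as evenly as possible over the originals.
    base, extra = divmod(missing, n_originals)
    return [base + 1 if i < extra else base for i in range(n_originals)]


if __name__ == "__main__":
    plan = augmentation_plan(400, 1000)
    print(sum(plan), max(plan))           # total copies, largest per-image count
    print(augmentation_plan(300, 1000))   # too few originals
```

Under these assumptions, a category needs at least 334 original images to reach 1,000 without breaking the cap.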

The complete dataset, including details of its composition and augmentation, can be accessed at the following links:

- [Train set](https://kaggle.com/datasets/936f6e568e36965f48e61129b297ef3f1065d1b031ae8728c8236e0fa08bc862)
- [Test set](https://kaggle.com/datasets/e94ae09478bc72132eeb1549170d531ad0bc0bb37528531abd08554e1247d872)


# CNN architectures

To design, train, and test our models, we use Python with the tensorflow-cpu library, running on Ubuntu 20.04.5 LTS (x86_64) with 30 CPUs (Intel(R) Xeon(R) Platinum 8358 @ 2.60GHz). The libraries used are listed in the requirement.txt file in our GitHub repository.

## First simple model


We first developed an initial, naive deep learning model for fruit image classification. This model serves as a precursor, and as practice for ourselves, before moving on to more sophisticated architectures such as ResNet and EfficientNet. It provides a baseline for comparison and insight into the challenges of the task.

**Model Architecture:**
The model architecture is a straightforward CNN, focusing on fundamental techniques without the complexities of advanced architectures. It consists of several convolutional layers (Conv2D), each followed by a ReLU activation function. We use ReLU because it is simple and inexpensive to compute. Convolutional layers use filters of a specified size (e.g., 3x3 or 5x5) with a stride typically set to 1. Padding is employed in these layers to preserve the spatial dimensions of the output feature maps.

After each convolutional layer, batch normalization is applied to stabilize learning and improve convergence rates.

Following convolutional blocks, max pooling layers (MaxPooling2D) are used to reduce the spatial dimensions of the feature maps, effectively compressing the learned features. Pooling layers typically have a pool size of 2x2 and a stride of 2.

The flattened output from the convolutional base feeds into a dense layer network, culminating in a softmax activation function for classification. The dense layers serve as fully connected layers to interpret the features extracted by the convolutional base and the softmax is used to get the probability for each category of our dataset.
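As a rough illustration of the architecture described above, the following sketch stacks three convolutional blocks before the dense classifier. The number of blocks, the filter counts (32, 64, 128), the dense width (128), and the Conv -> BatchNorm -> ReLU ordering are assumptions, since the report does not specify them. The helper `feature_map_side` (a hypothetical name) shows how pooling shrinks the feature maps that reach the flatten step.

```python
def feature_map_side(input_side: int, num_pools: int) -> int:
    """Side length after `num_pools` 2x2, stride-2 poolings.

    Convolutions with 'same' padding and stride 1 keep the size unchanged,
    so only the pooling layers reduce it.
    """
    side = input_side
    for _ in range(num_pools):
        side //= 2  # MaxPooling2D with default 'valid' padding rounds down
    return side


def build_baseline_cnn(num_classes: int = 31, input_side: int = 264):
    """Build a small Keras CNN in the spirit of the baseline model."""
    # Imported here so the shape helper above works without TensorFlow.
    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential(
        [tf.keras.Input(shape=(input_side, input_side, 3))]
    )
    for filters in (32, 64, 128):  # assumed filter counts
        model.add(layers.Conv2D(filters, 3, strides=1, padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.Activation("relu"))
        model.add(layers.MaxPooling2D(pool_size=2, strides=2))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))  # assumed width
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model


if __name__ == "__main__":
    side = feature_map_side(264, 3)
    print(side, side * side * 128)  # spatial side and flattened vector length
```

With a 264x264 input and three pooling layers, the feature maps shrink to 33x33, so the flattened vector in this sketch has 33 * 33 * 128 = 139,392 values. Most of the parameters of such a model therefore sit in the first dense layer.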

**Training Configuration:**

The training uses a batch size of 256 over a small, fixed number of epochs (10), given the model's simplicity. The input is the dataset created earlier, with an input shape of 264x264 and 3 channels, one for each color.

A standard optimizer like SGD or Adam is used, with a fixed learning rate without sophisticated scheduling or adaptive rate mechanisms.

As our problem is a classification problem with more than two classes, we use the categorical cross-entropy loss function, which is suited to multi-class classification tasks.
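To make the loss concrete, here is a small standard-library sketch of softmax followed by categorical cross-entropy. It is an illustration of the formula, not the TensorFlow implementation, and the function names are our own.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot: Sequence[float],
                              probs: Sequence[float],
                              eps: float = 1e-7) -> float:
    """Loss for one sample: -sum(t_i * log(p_i))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


if __name__ == "__main__":
    num_classes = 31
    probs = softmax([0.0] * num_classes)           # an untrained, uniform guess
    target = [1.0] + [0.0] * (num_classes - 1)     # true class is index 0
    print(categorical_cross_entropy(target, probs))
```

For 31 classes, a model that predicts a uniform distribution has a loss of ln(31), about 3.43. A training loss that stays near this value indicates the model has not learned anything yet.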

**Model Evaluation and Performance:**

The model is evaluated on a validation dataset, with performance metrics including accuracy, precision, recall, and F1-score being recorded.

Given its fundamental nature, the model's performance was expected to be lower than that of more complex architectures, and it is: on the test set built earlier, it reaches a weighted precision of 0.39 and a weighted recall of 0.37.
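For reference, weighted precision and recall average the per-class scores, weighting each class by its number of true samples. The sketch below computes them with the standard library only; it is meant to mirror the `average="weighted"` option of scikit-learn's metrics, and the function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def weighted_precision_recall(y_true: Sequence[Hashable],
                              y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Support-weighted average of per-class precision and recall."""
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    precision = recall = 0.0
    for label, n in support.items():
        weight = n / total
        if predicted[label]:       # precision counts as 0 if never predicted
            precision += weight * correct[label] / predicted[label]
        recall += weight * correct[label] / n
    return precision, recall


if __name__ == "__main__":
    y_true = ["apple", "apple", "banana", "banana"]
    y_pred = ["apple", "banana", "banana", "banana"]
    print(weighted_precision_recall(y_true, y_pred))
```

Two properties are worth keeping in mind when reading the scores: weighted recall is always equal to overall accuracy, and on a balanced test set (200 images per category) the weighted averages coincide with the plain per-class averages.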

**Conclusion and Future Work:**

This baseline model establishes an initial understanding of the classification task's complexity and the performance achievable with basic CNN architectures.

Future developments will focus on incorporating advanced features such as deeper layer stacks, residual connections, and scalable architectures. Experimentation with different optimization techniques, learning rate schedulers, and extensive data augmentation strategies will also be explored to enhance model performance.

This first model gives us a comprehensive understanding of the image classification problem and a basis for optimizing our solution.

## ResNet

This part details the development and performance of a Residual Network (ResNet) based deep learning model, designed for the classification of fruit images. The model architecture is a variant of ResNet, a popular convolutional neural network known for its efficacy in handling deep learning tasks, particularly in the field of image recognition.

**Model Architecture:**
The model employs the ResNet architecture with a depth determined by the formula `n * 6 + 2`, where `n` is a configurable parameter. This depth calculation ensures the model has a sufficient number of layers to capture complex features in the image data while maintaining computational efficiency. We use a simple configuration with n = 2, which gives a depth of 14 layers.

Key components of the model include:
- **Convolutional Layers (Conv2D)**: Utilized for feature extraction from images.
- **Batch Normalization**: Aids in stabilizing and speeding up the training process.
- **Activation Functions (ReLU)**: Used for introducing non-linearity into the model, allowing it to learn more complex patterns.
- **Average Pooling**: Reduces the spatial dimensions of the output from previous layers, summarizing the features.
- **Flattening**: Converts the 2D feature maps into a 1D feature vector, necessary for classification.
- **Dense Layer with Softmax Activation**: The final layer used for classifying the input image into one of the 31 fruit categories.
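The components listed above combine into residual blocks. The sketch below shows the depth formula and one possible block in Keras. It assumes the common layout of three stages of `n` blocks with two convolutions each, plus the first convolution and the final dense layer, which is consistent with `n * 6 + 2` but is not confirmed by the report; the function names and the 1x1 projection shortcut are likewise assumptions.

```python
def resnet_depth(n: int) -> int:
    """Depth for 3 stages x n blocks x 2 convs, plus first conv and dense layer."""
    return 6 * n + 2


def residual_block(x, filters: int, stride: int = 1):
    """Two 3x3 convolutions with a skip connection added before the last ReLU."""
    # Imported here so resnet_depth() can be used without TensorFlow.
    from tensorflow.keras import layers

    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Project the shortcut with a 1x1 convolution when the shapes differ.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)


if __name__ == "__main__":
    print(resnet_depth(2))  # configuration used in this report
```

The skip connection lets each block learn a correction to its input rather than a full transformation, which is what makes deeper stacks trainable.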

**Training Configuration:**
As with the simple model, we use a batch size of 256, but we train for many more epochs: up to 400, which is still not very high for such an architecture. We also add an early stopping criterion that monitors validation accuracy and halts training when performance stops improving, thereby preventing overfitting.
The Adam optimizer is used with a simple but dynamic learning rate, adjusted during training through callbacks, to facilitate convergence to the minimum loss.
Again, we use the categorical cross-entropy loss.
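The early stopping rule can be illustrated without any framework. The sketch below mimics a patience-based criterion on validation accuracy; the patience value and the function name are assumptions, as the report does not give the settings used. In Keras, this behavior is provided by the `EarlyStopping` callback, and learning-rate adjustment by callbacks such as `ReduceLROnPlateau` or `LearningRateScheduler`.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_accuracies: Sequence[float],
                         patience: int = 10,
                         min_delta: float = 0.0) -> Tuple[int, int]:
    """Return (epoch where training stops, epoch of the best score), 0-indexed."""
    best = float("-inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, acc in enumerate(val_accuracies):
        if acc > best + min_delta:
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training runs through all epochs.
    return len(val_accuracies) - 1, best_epoch


if __name__ == "__main__":
    history = [0.30, 0.45, 0.50, 0.49, 0.50, 0.48]
    print(early_stopping_epoch(history, patience=3))
```

With a patience of 3, training on this made-up history stops at epoch 5, three epochs after the best score at epoch 2. Restoring the weights from the best epoch is a common companion to this rule.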

**Data Preprocessing:**
- The model expects input images of size 224x224 pixels, which is standard for a ResNet model.
- The dataset is divided into training and validation sets, with data augmentation applied to the training set to enhance model robustness.
- Pixel values are normalized for effective training.
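The resizing and normalization steps can be sketched as follows. The report does not say which resizing technique or normalization range was used, so this is only one plausible option: scale the image to fit inside 224x224 without distortion, pad the rest (the approach taken by `tf.image.resize_with_pad`), and map pixel values to the range [0, 1]. All names below are hypothetical.

```python
from typing import Tuple


def letterbox_geometry(width: int, height: int, target: int = 224
                       ) -> Tuple[int, int, Tuple[int, int], Tuple[int, int]]:
    """Scaled size and (left, right), (top, bottom) padding to reach target x target."""
    scale = target / max(width, height)       # fit the long side exactly
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_x = target - new_w
    pad_y = target - new_h
    return (new_w, new_h,
            (pad_x // 2, pad_x - pad_x // 2),
            (pad_y // 2, pad_y - pad_y // 2))


def normalize_pixel(value: int) -> float:
    """Map an 8-bit pixel value to [0, 1] (assumed normalization)."""
    return value / 255.0


if __name__ == "__main__":
    print(letterbox_geometry(448, 224))
    print(normalize_pixel(255))
```

A 448x224 image, the most elongated shape the dataset allows, becomes 224x112 with 56 pixels of padding above and below, so at most half of the input area is padding.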

**Model Evaluation and Performance:**
During training, we pass the validation set to the `fit` method through the `validation_data` parameter, so the model is evaluated on a separate validation dataset at the end of each epoch to assess its generalization capabilities. The evaluation metrics include accuracy, among others, providing insight into the model’s performance. The model's architecture and training process are designed to maximize accuracy while minimizing the potential for overfitting.

**Conclusion:**
The ResNet model developed for fruit classification demonstrates a more sophisticated approach to handling a multi-class image classification task. Its architecture and training regimen are tailored to capture the intricate patterns in fruit images, thereby enabling accurate classification across multiple fruit categories. The use of advanced techniques like batch normalization, adaptive learning rates, and early stopping further enhances its performance and efficiency. As a result, the scores are much better than those of the simple model, with a weighted average precision of 0.63 and a weighted recall of 0.62. These scores would likely be higher had we used a deeper, more elaborate ResNet model.

**Future Work:**
Future iterations of the model could explore deeper architectures, alternative optimization algorithms, or more advanced data augmentation techniques to further improve classification accuracy and robustness. Additionally, testing the model on a larger and more diverse dataset could provide further insights into its scalability and effectiveness across different fruit varieties and image conditions. Another alternative is to use a pre-trained model from the TensorFlow library. We did that with the EfficientNet architecture.