# Objective

This report presents an overview of the development of a deep learning model for fruit classification. We begin by acquiring diverse and extensive image datasets from Kaggle (kaggle.com). These datasets form the basis for training our model.

We design and refine the model with TensorFlow, a widely used deep learning library. Our goal is a model that is both accurate in its predictions and efficient enough to serve them interactively.

A key part of this project is the integration of the model into a user-friendly web application built with Streamlit. The application serves as the interface between the model and its users: a user uploads an image of a fruit to the web page, the model analyzes the image, and the application displays the predicted fruit category.

The report delves into each of these aspects in detail, outlining the model's architecture, the data preparation process, training methodologies, and the deployment strategy for the web application. Our goal is to provide a transparent and thorough understanding of the steps involved in bringing this fruit classification model from concept to reality.

# Dataset creation

## Dataset Acquisition Strategy

There are multiple strategies for acquiring a dataset. Given the time and resources required to build one from scratch, we chose to use existing datasets. To this end, we selected four datasets from Kaggle, each chosen for its diversity, relevance, and credibility, since each one is properly sourced:

1. [Fruits 262 Dataset](https://www.kaggle.com/datasets/aelchimminut/fruits262)
2. [Fruit and Vegetable Image Recognition](https://www.kaggle.com/datasets/kritikseth/fruit-and-vegetable-image-recognition)
3. [Fruit Recognition Dataset](https://www.kaggle.com/datasets/sshikamaru/fruit-recognition?select=train)
4. [Fruits Dataset Images](https://www.kaggle.com/datasets/shreyapmaher/fruits-dataset-images)

By combining multiple datasets, we expose the model to a wider variety of images, which improves its adaptability and lets us test its robustness, while also expanding the range of fruit categories available for classification.
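Combining the four sources means mapping each dataset's class names onto one shared label set, since different datasets may name the same fruit differently (for example "Apples" in one and "apple" in another). The sketch below is a minimal illustration, not the exact pipeline we used. It assumes each dataset has been downloaded into a folder with one sub-folder per class. The `ALIASES` table and the `normalize_label` and `merge_datasets` helpers are hypothetical names.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical alias table: maps class-folder names used by different datasets
# onto one canonical label. It would be extended after inspecting the real folders.
ALIASES = {
    "apples": "apple",
    "bananas": "banana",
    "mangoes": "mango",
}

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def normalize_label(folder_name: str) -> str:
    """Lower-case a class-folder name, unify separators, then apply ALIASES."""
    name = folder_name.strip().lower().replace("_", " ").replace("-", " ")
    name = " ".join(name.split())  # collapse repeated whitespace
    return ALIASES.get(name, name)


def merge_datasets(roots):
    """Group image paths from several class-per-folder datasets by canonical label."""
    merged = defaultdict(list)
    for root in roots:
        for class_dir in sorted(Path(root).iterdir()):
            if not class_dir.is_dir():
                continue  # skip stray files such as README files at the dataset root
            label = normalize_label(class_dir.name)
            for path in sorted(class_dir.iterdir()):
                if path.suffix.lower() in IMAGE_SUFFIXES:
                    merged[label].append(path)
    return dict(merged)
```

For example, `merge_datasets(["data/fruits262", "data/fruit_veg"])` (hypothetical paths) returns a dictionary from canonical label to image paths. That dictionary can then be filtered and balanced per category.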

## Image Selection Strategy

Regarding image selection, we faced two options: using images with a single fruit per image, or images with multiple fruits of the same category. The former simplifies the task. It reduces the risk of the model being confused by overlapping fruits or complex backgrounds, and it requires a simpler architecture and training process. The latter is more complex and requires a more elaborate architecture and longer training, but it more accurately mirrors real-world scenarios where several fruits may appear in a single image.

Because we use ResNet, whose standard input size is 224x224 (a resolution that keeps sufficient detail without a large computational cost), we selected images larger than 224x224 pixels, so that resizing only ever downscales them. For the same reason, we restricted the dataset to fewer than 50 categories. Although we use techniques that preserve aspect ratio when resizing images, we excluded images whose width-to-height or height-to-width ratio exceeds 2. This helps avoid excessive distortion during resizing.

We adopted a hybrid approach, combining both single and multiple fruit images, to leverage the strengths of each method. This approach aligns with our diverse dataset selection, further enhancing the robustness of the model.
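The size and aspect-ratio constraints above can be expressed as a simple filter. This is a sketch under stated assumptions. "Larger than 224x224" is read as both sides being at least 224 pixels, and `keep_image` and `letterbox_size` are hypothetical helpers. Image dimensions can be read with Pillow, whose `Image.open(path).size` returns `(width, height)`. `letterbox_size` illustrates one aspect-preserving resize: scale the longer side to 224, then pad. This is similar in spirit to TensorFlow's `tf.image.resize_with_pad`; we do not claim it is the exact technique used in this project.

```python
def keep_image(width: int, height: int, min_side: int = 224, max_ratio: float = 2.0) -> bool:
    """Return True if an image passes the size and aspect-ratio filters.

    Assumption: "larger than 224x224" is read as both sides >= min_side.
    """
    if width < min_side or height < min_side:
        return False
    # Long side over short side covers both width/height and height/width ratios.
    ratio = max(width, height) / min(width, height)
    return ratio <= max_ratio


def letterbox_size(width: int, height: int, target: int = 224):
    """Size after an aspect-preserving resize into a target x target square.

    Returns (new_width, new_height, horizontal_padding, vertical_padding),
    where the padding is the total number of filler pixels on that axis.
    """
    scale = target / max(width, height)  # the longer side maps exactly onto target
    new_w, new_h = round(width * scale), round(height * scale)
    return new_w, new_h, target - new_w, target - new_h
```

With this kind of letterboxing, an image at the ratio limit (448x224) becomes 224x112 plus 112 rows of padding. Half of the model input is then already padding, and any larger ratio would leave even less of the input for the fruit.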

## Data Augmentation and Dataset Composition

Our final dataset includes a variety of images differing in the number of fruits, their arrangements, backgrounds, and lighting conditions.

In our training set, we included 800 images per fruit category. We expect users of the app to upload images containing a single fruit more often, so two thirds of these training images show a single fruit and one third show several fruits of the same category. For the test set, we followed the Pareto principle (an 80/20 train/test split), resulting in 200 images per category. Data augmentation, involving image rotations, flips, and slight shifts in width and height (without altering the fruit's proportions), was applied where necessary to reach the target number of images in each category. This ensures balanced representation across all fruit categories.
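The per-category arithmetic and the augmentation step can be sketched as follows. This is an illustration, not our exact pipeline. It uses NumPy rather than TensorFlow and restricts rotations to quarter turns so that no interpolation is needed. It also fills the border exposed by a shift with edge pixels and assumes images have already been resized to a square shape. `training_targets`, `augmentations_needed`, `augment`, and `fill_to_target` are hypothetical names, and the 10% maximum shift is an assumed value. In TensorFlow, the Keras preprocessing layers `tf.keras.layers.RandomFlip`, `RandomRotation`, and `RandomTranslation` cover the same transformations and also support arbitrary rotation angles.

```python
import numpy as np

TRAIN_PER_CLASS = 800
TEST_PER_CLASS = 200  # 80/20 split: 200 / (800 + 200) = 0.2
SINGLE_SHARE = 2 / 3  # share of single-fruit images in the training set


def training_targets(n_train=TRAIN_PER_CLASS, single_share=SINGLE_SHARE):
    """Split the per-class training budget into single- and multi-fruit counts."""
    n_single = round(n_train * single_share)
    return n_single, n_train - n_single


def augmentations_needed(available, target):
    """Number of augmented copies required to reach the target count."""
    return max(0, target - available)


def augment(image, rng, max_shift=0.1):
    """Randomly flip, rotate by a multiple of 90 degrees, and shift a square HxWxC image.

    Quarter turns and edge-padded shifts keep the array shape unchanged and
    never stretch the content, so the fruit's proportions are preserved.
    """
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :]  # vertical flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    h, w = out.shape[:2]
    # Random shift of up to max_shift of the height / width, in pixels.
    dy = int(rng.integers(-int(max_shift * h), int(max_shift * h) + 1))
    dx = int(rng.integers(-int(max_shift * w), int(max_shift * w) + 1))
    pad = ((abs(dy), abs(dy)), (abs(dx), abs(dx))) + ((0, 0),) * (out.ndim - 2)
    padded = np.pad(out, pad, mode="edge")  # fill exposed borders with edge pixels
    top, left = abs(dy) - dy, abs(dx) - dx  # crop offset that moves content by (dy, dx)
    return padded[top:top + h, left:left + w]


def fill_to_target(images, target, rng):
    """Return `target` images: originals first, then augmented copies of random originals."""
    extra = [augment(images[int(rng.integers(len(images)))], rng)
             for _ in range(augmentations_needed(len(images), target))]
    return list(images[:target]) + extra
```

With the numbers above, `training_targets()` gives 533 single-fruit and 267 multi-fruit training images per category. Under this scheme, each of those two subsets would be passed to `fill_to_target` separately so that the 2/3 to 1/3 ratio is preserved after augmentation.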

The complete dataset, including details of its composition and augmentation, can be accessed at the following link:
[](#)


# CNN architectures