Flipkart GRID 6.0 Robotics Track: Smart Vision

This repository contains two Python scripts, backend.py and gui.py, that work together to detect, classify, and extract details from product images using YOLO, OCR, and a freshness detection AI model. The GUI allows users to select images and view extracted product details.

Overview

`backend.py`

This script handles the core processing, including:

YOLO Detection: Uses a custom-trained YOLOv8 model to detect objects in the provided images and crops out the detected regions.
OCR Processing: Uses pytesseract to extract text from the images. The raw OCR text is then parsed by Google's Gemini model through the Gemini API.
Freshness Detection: A neural network model estimates a freshness score for certain detected classes (such as fruits and vegetables).
Data Augmentation: Images are preprocessed using transformations, including resizing, color jittering, and noise addition.
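The detection, cropping, and freshness-gating steps above could be sketched roughly as follows. This is a minimal illustration, not the code in backend.py: the `Detection` class, the helper names, the `PRODUCE_CLASSES` set, and the 0.25 confidence threshold are all hypothetical. It assumes the Ultralytics results API (`boxes.xyxy`, `boxes.conf`, `boxes.cls`, `names`) and uses Pillow for cropping.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2) in pixels


# Hypothetical: the classes that would be routed to the freshness model.
PRODUCE_CLASSES = {"apple", "banana", "tomato"}


def clamp_box(box, width, height):
    """Round an (x1, y1, x2, y2) box to ints and clip it to the image bounds."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    x1, x2 = max(0, min(x1, width)), max(0, min(x2, width))
    y1, y2 = max(0, min(y1, height)), max(0, min(y2, height))
    return x1, y1, x2, y2


def pick_best(detections, min_conf=0.25):
    """Return the highest-confidence detection at or above min_conf, or None."""
    candidates = [d for d in detections if d.confidence >= min_conf]
    return max(candidates, key=lambda d: d.confidence, default=None)


def needs_freshness(label, produce_classes=PRODUCE_CLASSES):
    """Only produce-like classes are sent to the freshness model."""
    return label.lower() in produce_classes


def detect_and_crop(model_path, image_path):
    """Run YOLOv8 on one image and return (best Detection, cropped PIL image)."""
    from PIL import Image
    from ultralytics import YOLO

    model = YOLO(model_path)  # a real app would load this once and reuse it
    result = model(image_path)[0]  # one image in, one Results object out
    detections = [
        Detection(result.names[int(c)], float(p), tuple(b))
        for b, p, c in zip(
            result.boxes.xyxy.tolist(),
            result.boxes.conf.tolist(),
            result.boxes.cls.tolist(),
        )
    ]
    best = pick_best(detections)
    if best is None:
        return None, None
    image = Image.open(image_path).convert("RGB")
    crop = image.crop(clamp_box(best.box, image.width, image.height))
    return best, crop
```

For example, `detect_and_crop("GRID6_detection_new.pt", "front.jpg")` would return the top detection and its crop, assuming that weights file is the detector; `needs_freshness(best.label)` then decides whether the crop also goes to the freshness model.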

`gui.py`

This script provides a graphical user interface (GUI) for the project:

Allows users to select four images of a product (front, back, and two sides).
Saves the selected images and passes them to backend.py for processing.
Displays the detected class and extracted details in the GUI.
Saves the results in product_details.txt.
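A stripped-down version of this GUI flow might look like the sketch below. It is illustrative only: the view names follow the Usage section, but `missing_views`, `format_details`, `save_details`, `choose_images`, and the `key: value` layout of product_details.txt are assumptions, not the format gui.py actually writes.

```python
from pathlib import Path

# The four views the GUI asks for (names follow the Usage section).
VIEWS = ("front", "back", "side1", "side2")


def missing_views(selection):
    """Return the views that have no image selected yet, in display order."""
    return [view for view in VIEWS if not selection.get(view)]


def format_details(product_class, details):
    """Render the detected class and extracted fields as plain text."""
    lines = [f"Detected class: {product_class}"]
    lines += [f"{key}: {value}" for key, value in details.items()]
    return "\n".join(lines) + "\n"


def save_details(text, path="product_details.txt"):
    """Write the formatted results to disk, overwriting any previous run."""
    Path(path).write_text(text, encoding="utf-8")


def choose_images():
    """Open one tkinter file dialog per view and return {view: path}."""
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the empty root window; only the dialogs are shown
    selection = {}
    for view in VIEWS:
        # askopenfilename returns an empty value if the user cancels
        selection[view] = filedialog.askopenfilename(
            title=f"Select the {view} image",
            filetypes=[("Images", "*.jpg *.jpeg *.png"), ("All files", "*.*")],
        )
    root.destroy()
    return selection
```

Checking `missing_views(choose_images())` before calling the backend lets the app refuse to start processing until all four images have been chosen.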

Installation

Clone the repository:

```
git clone https://github.com/yourusername/product-detail-extraction.git
cd product-detail-extraction
```

Create a virtual environment (recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install the required dependencies:
```
pip install -r requirements.txt
```
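Make sure the Tesseract OCR engine itself is installed: pytesseract is only a Python wrapper and needs the tesseract executable on your PATH (or its location set via pytesseract.pytesseract.tesseract_cmd). Refer to the Tesseract installation guide if needed.
Create a .env file with your Gemini API key: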
```
GEMINI_API_KEY=your_api_key_here
```
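One way the OCR-to-Gemini step could be wired together is sketched below. It assumes the google-generativeai calls `configure`, `GenerativeModel`, and `generate_content`, plus python-dotenv's `load_dotenv` to pick up `GEMINI_API_KEY` from the .env file; the model name `gemini-1.5-flash`, the prompt wording, and the `FIELDS` names are hypothetical rather than what backend.py uses. Because the model may wrap its JSON answer in a Markdown code fence, the parser strips one if present before decoding.

```python
import json
import os
import re

FENCE = "`" * 3  # three backticks, built this way to avoid clashing with Markdown

# Hypothetical fields; adjust to whatever details the project extracts.
FIELDS = ("brand", "product_name", "mrp", "expiry_date")


def build_prompt(ocr_text):
    """Ask the model to turn noisy OCR text into a flat JSON object."""
    keys = ", ".join(FIELDS)
    return (
        "The following text was read by OCR from product packaging and may "
        f"contain errors. Return only a JSON object with the keys {keys}; "
        "use null for anything not present.\n\n" + ocr_text
    )


def parse_model_json(reply):
    """Decode a JSON value from the reply, tolerating a surrounding code fence."""
    text = reply.strip()
    match = re.search(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)


def ocr_image(image_path):
    """Run Tesseract on one image through pytesseract."""
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path))


def extract_details(ocr_text, model_name="gemini-1.5-flash"):
    """Send OCR text to Gemini and return the parsed fields."""
    import google.generativeai as genai
    from dotenv import load_dotenv

    load_dotenv()  # copies GEMINI_API_KEY from .env into os.environ
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    reply = model.generate_content(build_prompt(ocr_text))
    return parse_model_json(reply.text)
```

Keeping `build_prompt` and `parse_model_json` free of network calls makes the parsing logic easy to test without an API key.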

Usage

Run the GUI:
```
python gui.py
```
Use the GUI to select four images (front, back, side1, side2) of a product.
The backend will process the images, detect the product class, and extract details. Results will be displayed in the GUI and saved to product_details.txt.

Modules Used

Python Libraries

torch, torchvision: For loading and using the freshness detection model.
ultralytics: To load the YOLOv8 model and perform object detection.
pytesseract: For extracting text from the detected regions.
numpy: For numerical operations.
Pillow (PIL): For image manipulation, including loading and saving images.
google-generativeai: To interface with the Gemini API for parsing OCR results.
python-dotenv: For loading environment variables like the Gemini API key.
tkinter: For creating the graphical user interface.
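Because these dependencies come from different places (pip packages, the standard-library tkinter module, and the external Tesseract binary), a quick pre-flight check can turn a confusing crash on first run into a clear message. The script below is a hypothetical helper, not part of the repository; it maps each package in the list above to the name it is imported as and uses `importlib.util.find_spec` and `shutil.which` to report anything missing.

```python
import importlib.util
import shutil

# pip package name -> import name, following the list above
REQUIRED_MODULES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "ultralytics": "ultralytics",
    "pytesseract": "pytesseract",
    "numpy": "numpy",
    "Pillow": "PIL",
    "google-generativeai": "google.generativeai",
    "python-dotenv": "dotenv",
    "tkinter": "tkinter",
}


def missing_modules(modules=REQUIRED_MODULES):
    """Return the package names whose import name cannot be found."""
    missing = []
    for package, module in modules.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            # For a dotted name, find_spec imports the parent package first
            # and raises if that parent (e.g. "google") is not installed.
            found = False
        if not found:
            missing.append(package)
    return missing


def tesseract_available():
    """pytesseract runs the tesseract executable, looked up on PATH by default."""
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing packages:", ", ".join(missing) if missing else "none")
    print("Tesseract binary found:", tesseract_available())
```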

Model Training

Both the YOLO detection model and the freshness detection model are custom-trained. For details on how they were trained and on the datasets used, refer to the links below:

YOLO Detection

Colab Notebook: here
Dataset: custom dataset
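The notebook is not reproduced here, but a custom YOLOv8 detector is commonly trained with Ultralytics by passing a dataset YAML file to `YOLO(...).train()`. The sketch below is an assumption about that general workflow, not a copy of the linked notebook: the directory layout, class names, starting checkpoint `yolov8n.pt`, and hyperparameters are hypothetical.

```python
def make_data_yaml(root, class_names, train="images/train", val="images/val"):
    """Build an Ultralytics-style dataset YAML (path/train/val/names) as text."""
    lines = [f"path: {root}", f"train: {train}", f"val: {val}", "names:"]
    # Class IDs are the list positions, matching the first column of YOLO label files.
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


def train_detector(yaml_path, epochs=50, imgsz=640):
    """Fine-tune a pretrained YOLOv8 checkpoint on the custom dataset."""
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # Ultralytics downloads these weights if missing
    model.train(data=str(yaml_path), epochs=epochs, imgsz=imgsz)
    return model
```

For example, save `make_data_yaml("datasets/grid6", ["apple", "biscuits"])` to a file such as grid6_data.yaml and call `train_detector("grid6_data.yaml")`; by default Ultralytics writes the best checkpoint to runs/detect/train/weights/best.pt, with a numbered suffix (train2, train3, ...) on later runs.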

Freshness Detection

Colab Notebook: here
Dataset: kaggle dataset

Repository Files

input
ocr
.gitignore
GRID6_detection_new.pt
README.md
backend.py
freshness_detection.pt
gui.py
requirements.txt

GitHub repository: Roteshkumar/Smart_Vision
