ShubhamMehla3/Graph_Extractor

ML Challenge Winner 🎉🎉

Overview

This repository contains the code and documentation for the Optical Character Recognition (OCR) solution that won the AI Challenge hosted by FutureSmart AI. The challenge focused on developing innovative solutions for extracting data from complex charts and graphs in images or PDFs.

Shubham Mehla's solution combines machine learning and computer vision techniques and achieved 94.5% accuracy in extracting data from charts and graphs. The project shows how OCR-based methods can simplify and automate graph analysis.

Highlights

  • Challenge Organizer: FutureSmart AI
  • Objective: Extract data from graphs and charts in images or PDFs using machine learning and rule-based methods.
  • Winning Approach: Developed a hybrid solution combining the LayoutLMv2 machine learning model with contour detection and rule-based methods for accurate data extraction from graphs.
  • Accuracy: Achieved a 94.5% accuracy rate in data extraction.
  • Link to Blog Post: Read detailed insights about the solution

Approach

The solution combined the following techniques:

  1. Machine Learning (LayoutLMv2): Fine-tuned the LayoutLMv2 model for text classification in graph images, identifying key elements like titles, axes, and legends.
  2. Contour Detection: Applied computer vision techniques to detect bars and other graph components for data extraction.
  3. Rule-based Methods: Developed a rule-based approach to categorize graph data based on layout and structure.
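
The bar-detection and rule-based steps above can be illustrated with a minimal sketch. It is not the repository's code. It assumes a bar chart that has already been binarized into a small 0/1 grid, and it uses a plain flood fill as a dependency-free stand-in for a contour-detection routine such as OpenCV's `cv2.findContours`. The helper names (`find_bar_boxes`, `pixel_to_value`, `bar_values`) and the two axis ticks used for calibration are hypothetical. In a real pipeline the tick values would presumably come from the text elements recognized on the axis.

```python
from collections import deque


def find_bar_boxes(mask):
    """Return (left, top, right, bottom) boxes of connected foreground regions.

    A flood fill stands in for contour detection here (an assumption made
    to keep the sketch free of third-party dependencies).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over 4-connected foreground pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            left, top, right, bottom = c, r, c, r
            while queue:
                y, x = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return sorted(boxes)  # order bars left to right


def pixel_to_value(y_pixel, tick_a, tick_b):
    """Map a pixel row to a data value by linear interpolation
    between two (pixel_row, value) axis ticks."""
    (ya, va), (yb, vb) = tick_a, tick_b
    return va + (y_pixel - ya) * (vb - va) / (yb - ya)


def bar_values(mask, tick_a, tick_b):
    """Rule: a bar's value is the axis value at the bar's top edge."""
    return [pixel_to_value(top, tick_a, tick_b)
            for (_, top, _, _) in find_bar_boxes(mask)]


# Toy 6x7 binarized chart with two bars (hypothetical data).
MASK = [[int(ch) for ch in row] for row in (
    "0000000",
    "0000000",
    "0110000",
    "0110000",
    "0110110",
    "0110110",
)]

# Assumed calibration: pixel row 6 (baseline) is 0, pixel row 0 is 60.
print(bar_values(MASK, (6, 0.0), (0, 60.0)))  # [40.0, 20.0]
```

With real images, the grid would be replaced by a thresholded image array and the flood fill by a library contour routine. The pixel-to-value rule stays the same as long as the axis is linear.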

Repository Contents

  • Code: The code for fine-tuning the LayoutLMv2 model and performing contour detection.
  • Dataset: A labeled dataset of graph images used for training the model (included in the repository).
  • Blog Post: A step-by-step breakdown of the solution, approaches used, and challenges overcome during the development.

For more information or questions, feel free to reach out to Shubham Mehla via LinkedIn.

About

ML Challenge Winner organised by FutureSmart AI
