ML Challenge Winner 🎉🎉

Overview

This repository contains the code and documentation for the Optical Character Recognition (OCR) solution that won the AI Challenge hosted by FutureSmart AI. The challenge asked participants to extract data from complex charts and graphs in images or PDFs.

Shubham Mehla's solution combines machine learning and computer vision techniques and achieved 94.5% accuracy in extracting data from charts and graphs. It shows how OCR methods can simplify and automate graph analysis.

Highlights

Challenge Organizer: FutureSmart AI
Objective: Extract data from graphs and charts in images or PDFs using machine learning and rule-based methods.
Winning Approach: Developed a hybrid solution combining the LayoutLMv2 machine learning model with contour detection and rule-based methods for accurate data extraction from graphs.
Accuracy: Achieved a 94.5% accuracy rate in data extraction.
Link to Blog Post: Read detailed insights about the solution

Approach

The solution combined the following techniques:

Machine Learning (LayoutLMv2): Fine-tuned the LayoutLMv2 model to classify the text in graph images, identifying key elements such as titles, axes, and legends.
Contour Detection: Applied computer vision techniques to detect bars and other graph components for data extraction.
Rule-based Methods: Developed a rule-based approach to categorize graph data based on layout and structure.
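How detected bars and recognized text might be combined into data can be sketched in plain Python. This is not the author's code. It swaps OpenCV-style contour detection for a simple connected-component flood fill on an already binarised mask, and it assumes the LayoutLMv2 stage has already produced x-axis labels with their horizontal positions plus two y-axis tick readings. `find_bars`, `pixel_to_value`, `extract_series`, and the toy chart are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixel coordinates (inclusive bounds)."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def x_center(self) -> float:
        return (self.x0 + self.x1) / 2


def find_bars(mask, min_area=4):
    """Bounding boxes of 4-connected foreground regions, sorted left to right.

    Stand-in for contour detection: each connected blob of 'on' pixels is
    treated as one bar; blobs smaller than min_area pixels are dropped as noise.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects every pixel of this region.
            queue = deque([(r, c)])
            seen[r][c] = True
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(ys) >= min_area:
                boxes.append(Box(min(xs), min(ys), max(xs), max(ys)))
    return sorted(boxes, key=lambda b: b.x0)


def pixel_to_value(row, tick_a, tick_b):
    """Linearly map a pixel row to a data value using two (row, value) y-axis ticks."""
    (row_a, val_a), (row_b, val_b) = tick_a, tick_b
    return val_a + (row - row_a) * (val_b - val_a) / (row_b - row_a)


def extract_series(mask, x_labels, tick_a, tick_b):
    """Pair each bar with the nearest x-axis label and convert its top edge to a value."""
    series = []
    for bar in find_bars(mask):
        # Rule: a bar belongs to the label whose centre is horizontally closest.
        label = min(x_labels, key=lambda item: abs(item[1] - bar.x_center))[0]
        # The top edge of the bar's highest pixel row sits at edge coordinate y0.
        series.append((label, pixel_to_value(bar.y0, tick_a, tick_b)))
    return series


# Toy 6x9 "chart": '#' marks bar pixels after binarisation.
chart = [
    "......##.",
    "......##.",
    "##....##.",
    "##....##.",
    "##.##.##.",
    "##.##.##.",
]
mask = [[ch == "#" for ch in row] for row in chart]

# Hypothetical inputs the text-recognition step would normally supply:
# x-axis labels with their horizontal centres, and two y-axis ticks given as
# (pixel edge row, value). Row 6 is the baseline (value 0), row 0 is value 30.
x_labels = [("2019", 0.5), ("2020", 3.5), ("2021", 6.5)]
print(extract_series(mask, x_labels, (6, 0.0), (0, 30.0)))
# → [('2019', 20.0), ('2020', 10.0), ('2021', 30.0)]
```

In a real pipeline the mask would come from thresholding the chart image, and OpenCV's `cv2.findContours` followed by `cv2.boundingRect` is a common replacement for the hand-written flood fill. The nearest-centre pairing is only one possible layout rule; stacked or grouped bars would need more.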

Repository Contents

Code: The code for fine-tuning the LayoutLMv2 model and performing contour detection.
Dataset: A labeled dataset of graph images used for training the model (included in the repository).
Blog Post: A step-by-step breakdown of the solution, the approaches used, and the challenges overcome during development.

For more information or questions, feel free to reach out to Shubham Mehla via LinkedIn.

Files: images, README.md, dataset.zip, inference.ipynb, run_seq_labeling.py

Repository: ShubhamMehla3/Graph_Extractor
