This repository contains the code and documentation for the Optical Character Recognition (OCR) solution that won the AI Challenge hosted by FutureSmart AI. The challenge focused on developing innovative solutions for extracting data from complex charts and graphs in images or PDFs.
Shubham Mehla's solution combines machine learning and computer vision techniques and achieved 94.5% accuracy in extracting data from charts and graphs. The project shows how OCR-based methods can simplify and automate graph analysis.
- Challenge Organizer: FutureSmart AI
- Objective: Extract data from graphs and charts in images or PDFs using machine learning and rule-based methods.
- Winning Approach: Developed a hybrid solution combining the LayoutLMv2 machine learning model with contour detection and rule-based methods for accurate data extraction from graphs.
- Accuracy: Achieved a 94.5% accuracy rate in data extraction.
- Link to Blog Post: Read detailed insights about the solution
The solution combined the following techniques:
- Machine Learning (LayoutLMv2): Fine-tuned the LayoutLMv2 model for text classification in graph images, identifying key elements like titles, axes, and legends.
- Contour Detection: Applied computer vision techniques to detect bars and other graph components for data extraction.
- Rule-based Methods: Developed a rule-based approach to categorize graph data based on layout and structure.
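To illustrate how the contour detection and rule-based steps might fit together, here is a minimal sketch, not the repository's actual code. It assumes that bar bounding boxes have already been found (for example with OpenCV's `cv2.findContours` and `cv2.boundingRect`) and that two y-axis tick labels have been read by OCR along with their pixel rows. The names `Bar`, `Label`, `pixel_to_value`, and `read_bars` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """Bounding box of a detected bar (pixels, origin at top-left)."""
    x: int
    y: int  # row of the top edge of the bar
    w: int
    h: int


@dataclass
class Label:
    """An x-axis label with the horizontal centre of its text box."""
    text: str
    cx: float


def pixel_to_value(y_px, tick_a, tick_b):
    """Map a pixel row to a data value by linear interpolation.

    tick_a and tick_b are (pixel_row, value) pairs for two y-axis ticks.
    Assumes a linear (not logarithmic) axis.
    """
    (ya, va), (yb, vb) = tick_a, tick_b
    return va + (y_px - ya) * (vb - va) / (yb - ya)


def read_bars(bars, labels, tick_a, tick_b):
    """Rule: pair each bar with the x-axis label closest to its centre,
    then convert the top edge of the bar to a data value."""
    result = {}
    for bar in bars:
        centre = bar.x + bar.w / 2
        nearest = min(labels, key=lambda lab: abs(lab.cx - centre))
        result[nearest.text] = round(pixel_to_value(bar.y, tick_a, tick_b), 2)
    return result


# Hypothetical inputs: the tick "0" sits at row 400 and the tick "30" at row 100.
bars = [Bar(x=50, y=200, w=40, h=200), Bar(x=150, y=300, w=40, h=100)]
labels = [Label("A", cx=70), Label("B", cx=170)]
values = read_bars(bars, labels, tick_a=(400, 0), tick_b=(100, 30))
print(values)  # {'A': 20.0, 'B': 10.0}
```

In a full pipeline, the tick and label text would come from the OCR and text classification stage, so the rules only need to handle geometry.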
This repository includes:
- Code: Code for fine-tuning the LayoutLMv2 model and performing contour detection.
- Dataset: A labeled dataset of graph images used for training the model (included in the repository).
- Blog Post: A step-by-step breakdown of the solution, the approaches used, and the challenges overcome during development.
For more information or questions, feel free to reach out to Shubham Mehla via LinkedIn.