OCR-book-dataset

Simple project made using OpenCV and Tesseract with the C++ / Qt framework to read screenshots of book pages and extract their text content to be fed into a dataset. The aim is to automate the process of taking notes manually and to provide an interface where the user can select book extracts to save.

The project uses the Python FastAPI web framework to provide an API, along with a simple frontend client built with React and served by the Nginx web server. A Qt GUI processes images and converts them to text in order to generate new entries in the dataset.

Demo

1. Default Client

2. OCR GUI while converting the image selection to text

3. OCR GUI saving new book extract to database

4. Refreshing the client to see the change take effect

Dependencies

API requirements

Docker 26.0.1+
Docker Compose 2.26.1+

GUI requirements

C++ compiler (e.g. GCC 11.3.0+)
CMake 3.16+
Qt 6.3+
Tesseract 4.0+
OpenCV 4.0+
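
The minimum versions listed above can be checked numerically rather than by eye. The following is a minimal sketch, assuming the version strings have already been collected (for example from the output of `cmake --version` or `tesseract --version`). The helper names `parse_version` and `meets_minimum` are hypothetical and are not part of this project.

```python
import re

# Minimum versions copied from the requirement lists above.
MINIMUMS = {
    "Docker": "26.0.1",
    "Docker Compose": "2.26.1",
    "CMake": "3.16",
    "Qt": "6.3",
    "Tesseract": "4.0",
    "OpenCV": "4.0",
}


def parse_version(text: str) -> tuple:
    """Extract the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(found: str, minimum: str) -> bool:
    """Return True if found is at least minimum, comparing components as numbers."""
    a, b = parse_version(found), parse_version(minimum)
    # Pad with zeros so that "3.16" and "3.16.0" compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

Comparing components as integers avoids the pitfall of plain string comparison, where "3.9" would wrongly sort after "3.16".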

Installation

This repository is split into two main directories. The api folder contains all of our frontend and backend services, while the ocr folder gathers all files related to our GUI:

.
├── api
│   ├── backend
│   ├── db
│   └── frontend
└── ocr
    └── gui

Build and run the API

First we need to build our backend and frontend services. To do so, a docker-compose.yml file is provided inside the api directory. To build the images and run the necessary containers:

cd api && docker compose up --build

Once our containers are running, we can access both our frontend and backend services.

Frontend (Default view)

Running at : http://localhost:3000

Backend (Interactive API docs)

Running at : http://localhost:8989/docs

FastAPI uses Swagger UI to generate interactive documentation for visualizing and interacting with the API and its underlying dataset.
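
Besides the interactive page, FastAPI by default also serves the underlying OpenAPI schema as JSON at `/openapi.json`. The following is a minimal sketch for listing the exposed endpoints from a script, assuming this project keeps that default path and the backend listens on port 8989 as stated above. The helper names `fetch_schema` and `list_operations` are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: the backend keeps FastAPI's default schema path.
SCHEMA_URL = "http://localhost:8989/openapi.json"

# Keys of an OpenAPI path item that denote HTTP operations.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def fetch_schema(url: str = SCHEMA_URL, timeout: float = 5.0) -> dict:
    """Download and decode the OpenAPI schema (the containers must be running)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def list_operations(schema: dict) -> list:
    """Return sorted 'METHOD /path' strings for every operation in the schema."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for method in item:
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() in HTTP_METHODS:
                operations.append(f"{method.upper()} {path}")
    return sorted(operations)
```

With the containers running, `print("\n".join(list_operations(fetch_schema())))` prints one line per endpoint exposed by the backend.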

Note: By default, the project ships with a MySQL database schema and a minimal dataset, running in a MySQL Docker container.

Build the OCR (GUI)

Note: the GUI part of the project was developed on an x86_64 CPU architecture running the Ubuntu 22.04 operating system. The following steps describe the process of building the project on that specific architecture and setup only.

OpenCV and Tesseract libraries installation

As our GUI was built using OpenCV and Tesseract, we first need to install these dependencies by following these instructions:

Qt Creator installation

The simplest way to build the project as configured is to use Qt Creator to generate the final executable. Installing Qt Creator automatically installs all the dependencies it needs.

Technology Stack

Backend

FastAPI : Python Web framework to build APIs
SQLAlchemy : SQL toolkit and ORM (for database interactions)
Pydantic : Data validation and settings management
Uvicorn : ASGI web server

Frontend

React : Frontend JavaScript framework
React-Bootstrap : Bootstrap frontend components (wip)
Nginx : HTTP web server

GUI

Qt : Cross-Platform application development framework for desktop, embedded and mobile
QML : Multi-Paradigm Language for creating highly dynamic applications in Qt
QtQuick : Standard library for writing QML applications
OpenCV : Open Source Computer Vision Library
Tesseract : Open Source OCR Engine

Deployment

Docker : open platform to build, ship, and run distributed applications
Docker Compose : Define and run multi-container applications with Docker

Releases

Work In Progress...

License

aperlini/OCR-book-dataset