- OCR (Optical Character Recognition)
- Keyword Search
- Intelligent Search (Contextual Search)
Follow this guide to install the app in your own environment.
- Download the Sentence Transformers model artefacts from here.
- Download the Layout Parser model from here.
- Create a new virtual environment:
$ conda create --name <env-name> python=3.7.5
$ # specify a convenient name for <env-name> for the new environment
- Activate the created environment:
$ conda activate <env-name>
$ # replace <env-name> with the name you specified
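Because the environment is created with Python 3.7.5, it can help to confirm from inside the activated environment that the expected interpreter is in use. The following is a minimal sketch; the helper name `matches_pinned_version` is hypothetical and not part of the repository, and by default it compares only the major and minor version.

```python
import sys

# Version pinned by the `conda create` command in the step above.
PINNED_VERSION = (3, 7, 5)


def matches_pinned_version(version_info, pinned=PINNED_VERSION, parts=2):
    """Return True if the first `parts` version components match the pin.

    Hypothetical helper (not part of the repository). With parts=2 only
    major.minor are compared, so any 3.7.x interpreter passes.
    """
    return tuple(version_info[:parts]) == tuple(pinned[:parts])


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if matches_pinned_version(sys.version_info):
        print(f"Python {current}: matches the pinned 3.7 series")
    else:
        print(f"Python {current}: differs from the pinned 3.7.5")
```

Pass `parts=3` to require an exact match with 3.7.5 instead of any 3.7.x release.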
Follow the steps below to install the app.
- Clone the repository:
$ git clone https://github.com/engenuityai/lucid-engen-server.git
Alternatively, download the zip file from the repository and extract it.
Then navigate to the project root folder:
$ cd lucid-engen-server
- Copy the downloaded Sentence Transformer model artefacts into the model folder inside the intelligent_search_module directory.
- Copy the downloaded Layout Parser model into the ocr directory.
- Install the requirements:
$ pip install -r requirements.txt
- Run the app on localhost:
$ flask run
$ # You can specify a different host and port to serve the app
- The app will be running at: http://127.0.0.1:5000
Endpoint for the Keyword Search: http://127.0.0.1:5000/keyword-search
Endpoint for the Intelligent Search: http://127.0.0.1:5000/intelligent-search
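When the app is served on a host or port other than the default, both the run command and the endpoint URLs change together. The sketch below shows one way to derive them. The helper names `flask_run_command` and `endpoint_urls` are hypothetical; `--host` and `--port` are standard options of the `flask run` CLI. The request format of the endpoints is not described here, so the sketch stops at building the URLs.

```python
from urllib.parse import urljoin

# Paths taken from the endpoints listed above.
ENDPOINT_PATHS = {
    "keyword": "keyword-search",
    "intelligent": "intelligent-search",
}


def flask_run_command(host="127.0.0.1", port=5000):
    """Build the `flask run` command line for a given host and port."""
    return ["flask", "run", "--host", host, "--port", str(port)]


def endpoint_urls(host="127.0.0.1", port=5000):
    """Return the search endpoint URLs for a given host and port."""
    base = f"http://{host}:{port}/"
    return {name: urljoin(base, path) for name, path in ENDPOINT_PATHS.items()}


if __name__ == "__main__":
    # Example: serve on all interfaces, port 8080 (illustrative values).
    print("$ " + " ".join(flask_run_command("0.0.0.0", 8080)))
    for name, url in endpoint_urls("0.0.0.0", 8080).items():
        print(f"{name}: {url}")
```

With the default arguments, `endpoint_urls()` reproduces the two URLs listed above.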
The project code base is structured as follows:
< PROJECT ROOT >
|
|-- assets/ # Folder to store input images
|
|-- intelligent_search_module/ # Module for Intelligent Search
| |-- model/ # Folder containing all the Model Artefacts for Sentence Transformer
| |-- __init__.py # Module Initialization
| |-- bert_process.py # Sentence Transformer Operations for Intelligent Search
|
|
|-- keyword_search_module/ # Module for Keyword Search
| |-- __init__.py # Module Initialization
| |-- ocr_word_search_complete.py # Keyword Search Operations
|
|
|-- ocr/ # Module for OCR Operations
| |-- __init__.py # Module initialization
| |-- img_gen.py # Image generations
| |-- Layout_det.py # Layout Parsing
| |-- pyteseract_para_ocr_bb.py # Paragraph Level OCR Processors
| |-- Pytesseract_table_ocr_bb.py # Table Level Operations
| |-- run_ocr.py # Run Tasks Operations
| |-- test_07_14.h5 # Layout Parser Model
|
|
|-- utils/ # Support Functions
| |-- __init__.py # Module initialization
| |-- cleaning.py # Text data preprocessing/postprocessing
| |-- generate.py # Result generation operations
|
|
|-- intelligent_search_output/ # Output Result Directory for Intelligent Search
| |-- csv # Contains resulting csv files
| |-- pics # Contains generated images
| |-- result_pdfs # Contains final pdf outputs
|
|
|-- keyword_search_output/ # Output Result Directory for Keyword Search
|
|
|-- requirements.txt # Requirements, all the required dependencies
|-- .flaskenv # Flask environment configurations
|-- Dockerfile # Dockerfile Script for the App
|
|-- app.py # Setup App Configuration
|-- main.py # Main App Starter - WSGI gateway
|
|-- ************************************************************************
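The tree above shows where the two downloaded models are expected: the Sentence Transformer artefacts under intelligent_search_module/model/ and the Layout Parser model as an .h5 file under ocr/. A pre-flight check along these lines could catch a misplaced model before the app starts. This is a minimal sketch under those assumptions; `missing_model_files` is a hypothetical helper, not part of the repository, and because the artefact file names are not listed, it only checks that the model folder is non-empty.

```python
from pathlib import Path


def missing_model_files(project_root):
    """Return a list of problems; an empty list means both models were found.

    Hypothetical helper (not part of the repository).
    """
    root = Path(project_root)
    problems = []

    # Sentence Transformer artefacts: file names are not listed in the tree,
    # so only check that the folder exists and contains something.
    model_dir = root / "intelligent_search_module" / "model"
    if not model_dir.is_dir() or not any(model_dir.iterdir()):
        problems.append(f"No Sentence Transformer artefacts found in {model_dir}")

    # Layout Parser model: the tree lists it as an .h5 file inside ocr/.
    ocr_dir = root / "ocr"
    if not ocr_dir.is_dir() or not any(ocr_dir.glob("*.h5")):
        problems.append(f"No Layout Parser .h5 model found in {ocr_dir}")

    return problems


if __name__ == "__main__":
    # Run from the project root folder.
    issues = missing_model_files(".")
    for issue in issues:
        print(issue)
    if not issues:
        print("Model files are in place.")
```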