nvOCDR

nvOCDR is a C++ library for optical character detection and recognition, optimized for NVIDIA GPUs and the NVIDIA software stack. The library consumes OCDNet and OCRNet models trained with the TAO Toolkit, so it can serve any OCR application. Whether you are building a surveillance system, a traffic-monitoring application, or any other video analytics solution, nvOCDR delivers accurate and reliable results and can be easily integrated into any application that needs OCR.

Table of Contents:

  • Installation
  • Building
  • Usage
  • API Reference
  • License

Installation

Prerequisites

  • CUDA 11.4 or above
  • TensorRT 8.5 or above (TensorRT 8.6 or above is required for ViT-based models.)
  • OpenCV 4.0 or above
  • JetPack 5.1 or above on Jetson devices
  • Pretrained OCDNet and OCRNet model

Set up the development environment:

We suggest starting from the TensorRT container (a quick environment check is sketched after this list):

  • On X86 platform:
    docker run --gpus=all -v <work_path>:<work_path> --rm -it --privileged --net=host nvcr.io/nvidia/tensorrt:23.11-py3 bash
    # install opencv
    apt update && apt install -y libopencv-dev
  • On Jetson platform:
    docker run --gpus=all -v <work_path>:<work_path> --rm -it --privileged --net=host nvcr.io/nvidia/l4t-tensorrt:r8.5.2.2-devel bash
    # install opencv
    apt update && apt install -y libopencv-dev

Prepare the OCDNet and OCRNet models:

You can download the pretrained OCDNet and OCRNet models with the following instructions, or train your own models (refer to the TAO Toolkit documentation for how to train your own OCDNet and OCRNet). When you download the pretrained OCRNet model from NGC, it comes with a vocabulary list named character_list.txt.

  • Download the ONNX models of OCDNet and OCRNet (a quick check of the downloaded files is sketched after these commands):
mkdir onnx_models
cd onnx_models

# Download OCDnet onnx
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocdnet/deployable_v1.0/files?redirect=true&path=dcn_resnet18.onnx' -O dcn_resnet18.onnx

mv dcn_resnet18.onnx ocdnet.onnx

# Download OCRnet onnx
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocrnet/deployable_v1.0/files?redirect=true&path=ocrnet_resnet50.onnx' -O ocrnet_resnet50.onnx

mv ocrnet_resnet50.onnx ocrnet.onnx

# Download OCRnet character_list
wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocrnet/deployable_v1.0/files?redirect=true&path=character_list' -O character_list

mv character_list character_list.txt

# # Download command for ViT-based models:
# # Download OCDNet-ViT onnx
# wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocdnet/deployable_v2.0/files?redirect=true&path=ocdnet_fan_tiny_2x_icdar.onnx' -O ocdnet_fan_tiny_2x_icdar.onnx

# # Download OCRNet-ViT onnx
# wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocrnet/deployable_v2.0/files?redirect=true&path=ocrnet-vit.onnx' -O ocrnet-vit.onnx

# # Download OCRnet character_list
# wget --content-disposition 'https://api.ngc.nvidia.com/v2/models/org/nvidia/team/tao/ocrnet/deployable_v2.0/files?redirect=true&path=character_list' -O character_list
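A quick way to confirm the downloads are in place before building engines (a sketch; file sizes differ between model versions):

# The ONNX models and the vocabulary file should all be present
ls -lh ocdnet.onnx ocrnet.onnx character_list.txt

# The vocabulary should be a plain-text list of characters
head -n 5 character_list.txt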

Compile the TensorRT OSS plugin library (Optional):

Notes: If you're using TensorRT 8.6 or above, you can skip this step.

OCDNet requires the modulatedDeformConvPlugin to run with TensorRT.

  • Get TensorRT OSS repository
git clone -b release/8.6 https://github.com/NVIDIA/TensorRT.git
cd TensorRT
git submodule update --init --recursive
  • Compile TensorRT libnvinfer_plugin.so:
mkdir build && cd build
# On X86 platform
cmake .. 
# On Jetson platform
# cmake .. -DTRT_LIB_DIR=/usr/lib/aarch64-linux-gnu/
make nvinfer_plugin -j4

Notes: You can use the helper script to compile TensorRT OSS.

  • Copy the libnvinfer_plugin.so to the system library path (a loader check is sketched after these commands)
cp libnvinfer_plugin.so.8.6.0 /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.5.1
# On Jetson platform:
# cp libnvinfer_plugin.so.8.6.0 /usr/lib/aarch64-linux-gnu/libnvinfer_plugin.so.8.5.2
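You can confirm that the loader now resolves the replaced plugin library (a sketch; the path differs between x86 and Jetson):

# Refresh the shared library cache and check which libnvinfer_plugin is found
ldconfig
ldconfig -p | grep libnvinfer_plugin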

Generate TensorRT engines:

Finally, generate the TensorRT engines from the trained OCDNet and OCRNet models:

# Generate the OCDNet engine with a dynamic batch size and a max batch size of 4:
/usr/src/tensorrt/bin/trtexec --onnx=./ocdnet.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:4x3x736x1280 --fp16 --saveEngine=./ocdnet.fp16.engine

# Generate the OCRNet engine with a dynamic batch size and a max batch size of 32:
/usr/src/tensorrt/bin/trtexec --onnx=./ocrnet.onnx --minShapes=input:1x1x32x100 --optShapes=input:32x1x32x100 --maxShapes=input:32x1x32x100 --fp16 --saveEngine=./ocrnet.fp16.engine

# #Generate engines for ViT-based models
# /usr/src/tensorrt/bin/trtexec --onnx=./ocdnet_fan_tiny_2x_icdar.onnx --minShapes=input:1x3x736x1280 --optShapes=input:1x3x736x1280 --maxShapes=input:1x3x736x1280 --fp16 --saveEngine=./ocdnet.fp16.engine

# /usr/src/tensorrt/bin/trtexec --onnx=./ocrnet-vit.onnx --minShapes=input:1x1x64x200 --optShapes=input:32x1x64x200 --maxShapes=input:32x1x64x200 --fp16 --saveEngine=./ocrnet.fp16.engine
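A generated engine can be sanity-checked by loading it back with trtexec before wiring it into the library (a sketch, using the OCDNet shape from above):

# Load the engine and run a timed dummy inference pass
/usr/src/tensorrt/bin/trtexec --loadEngine=./ocdnet.fp16.engine --shapes=input:1x3x736x1280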

Building

  • Clone the repository:

    git clone https://github.com/NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution.git
  • Compile the libnvocdr.so (a symbol check is sketched after these steps):

    cd NVIDIA-Optical-Character-Detection-and-Recognition-Solution
    make
    export LD_LIBRARY_PATH=$(pwd)
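To verify the build produced a loadable library exposing the nvOCDR entry points, you can inspect its dynamic symbols (a sketch):

# The init/inference/deinit functions used in the example below should appear here
nm -D libnvocdr.so | grep -i nvocdr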

Usage

Use it in your application

To use nvOCDR in your C++ project, include the nvocdr.h header file and link against the nvOCDR library. Here's an example:

//test.cpp
#include <cstdio>
#include <cstring>
#include <opencv2/opencv.hpp>
#include <cuda.h>
#include <cuda_runtime.h>
#include "nvocdr.h"

int main()
{

    // Init the nvOCDR lib
    // Please pay attention to the following parameters. You may need to change them according to different models.
    nvOCDRParam param;
    param.input_data_format = NHWC;
    param.ocdnet_trt_engine_path = (char *)"./ocdnet.fp16.engine";
    param.ocdnet_infer_input_shape[0] = 3;
    param.ocdnet_infer_input_shape[1] = 736;
    param.ocdnet_infer_input_shape[2] = 1280;
    param.ocdnet_binarize_threshold = 0.1;
    param.ocdnet_polygon_threshold = 0.3;
    param.ocdnet_max_candidate = 200;
    param.ocdnet_unclip_ratio = 1.5;
    param.ocrnet_trt_engine_path = (char *)"./ocrnet.fp16.engine";
    param.ocrnet_dict_file = (char *)"./character_list.txt";
    param.ocrnet_infer_input_shape[0] = 1;
    param.ocrnet_infer_input_shape[1] = 32;
    param.ocrnet_infer_input_shape[2] = 100;
    // uncomment if you're using attention-based models:
    // param.ocrnet_decode = Attention;
    nvOCDRp nvocdr_ptr = nvOCDR_init(param);

    // Load the input
    const char* img_path = "./test.jpg";
    cv::Mat img = cv::imread(img_path);
    nvOCDRInput input;
    input.device_type = GPU;
    input.shape[0] = 1;
    input.shape[1] = img.size().height;
    input.shape[2] = img.size().width;
    input.shape[3] = 3;
    size_t item_size = input.shape[1] * input.shape[2] * input.shape[3] * sizeof(uchar);
    cudaMalloc(&input.mem_ptr, item_size);
    cudaMemcpy(input.mem_ptr, reinterpret_cast<void*>(img.data), item_size, cudaMemcpyHostToDevice);
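    // Note: production code should check the cudaError_t returned by
    // cudaMalloc and cudaMemcpy before running inference.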

    // Do inference
    nvOCDROutputMeta output;
    nvOCDR_inference(input, &output, nvocdr_ptr);

    // Print the output
    int offset = 0;
    for(int i = 0; i < output.batch_size; i++)
    {
        for(int j = 0; j < output.text_cnt[i]; j++)
        {
            printf("%d : %s, %ld\n", i, output.text_ptr[offset].ch, strlen(output.text_ptr[offset].ch));
            offset += 1;
        }
    }

    // Destroy the resources
    free(output.text_ptr);
    cudaFree(input.mem_ptr);
    nvOCDR_deinit(nvocdr_ptr);

    return 0;
}

You can compile the code with the command:

g++ ./test.cpp -I./include -L./ -I/usr/include/opencv4/ -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart -lopencv_core -lopencv_imgcodecs -lnvocdr -o test
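Then run the example with the engines, character_list.txt, and a test image in the working directory (a sketch; the loader must be able to find libnvocdr.so and the CUDA runtime at run time):

# Make libnvocdr.so and libcudart.so discoverable, then run
export LD_LIBRARY_PATH=$(pwd):/usr/local/cuda/lib64:$LD_LIBRARY_PATH
./test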

Use nvOCDR in DeepStream SDK

For more information on how to use nvOCDR in DeepStream, see the documentation.

Use nvOCDR in Triton

For more information on how to use nvOCDR in Triton, see the documentation.

Use OCRNet with attention module

The ViT-based OCRNet models released on NGC (deployable 2.0 and deployable 2.1) come with an attention module, which requires the attention decoding method. You can enable attention decoding as follows:

  • In C++ application:

    nvOCDRParam param;
    param.ocrnet_decode = Attention;
  • In DeepStream:

    customlib-props="ocrnet-decode:Attention"
  • In Triton (in models/nvOCDR/spec.json):

    "ocrnet_decode": "Attention"

API Reference

For more information about the nvOCDR API, see the API reference.

License

By cloning or downloading nvOCDR, you agree to the terms of the nvOCDR EULA.
