feat: Support Document AI in WasmEdge #2356
Comments
Document AI Tasks

General Pre-requisites and Common Pre-processing Functions

Document AI Tasks and Model Selection

Document AI multimodal models pre-train text, layout, and image in a multi-modal framework using large-scale unlabeled scanned/digital-born documents. These models are then used in visually rich downstream document-understanding tasks by fine-tuning them on the task-specific labeled benchmark dataset. The following table outlines the tasks, datasets, and corresponding models to be supported in this project.
Discussion Topics
Week 1-2 Progress Update

To achieve the first main goal of integrating Document AI, I compiled a Rust wrapper for Tesseract to WebAssembly using WasmEdge plugins to perform OCR on images.
Thus, the Tesseract OCR can now be compiled to Wasm from the Rust code; the rough test code for this is uploaded at https://github.com/sarrah-basta/wasmedge_ai_testing/blob/main/rusty-tesseract-wasm/README.md#build-instructions-to-build-the-wrapper.

Week 2-3 Plan
Thank you!
Week 3 Progress Update

To create the next main pre-processing block, I worked on building a tokenizer with the Rust Tokenizers core and compiled it to WebAssembly to tokenize the text produced by the OCR.
I was able to create this Rust code, compiled with WasmEdge, to solve the first two parts, i.e. create the correct tokenizer and obtain the encodings. Note: I tested this using the words obtained via OCR from the Hugging Face ImageProcessor; this will later be replaced by the Wasm implementation of Tesseract created earlier.

Week 4 Plan
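The Week 3 tokenization step can be illustrated with a toy sketch. This is not the project's actual code (which uses the Rust `tokenizers` crate with a pretrained vocabulary); the vocabulary and special-token ids below are made up, but the shape of the output, input ids plus an attention mask padded to a fixed length, is what the encodings look like:

```rust
// Toy illustration of tokenizer encodings (input ids + attention mask).
// The vocabulary and the [UNK]/[PAD] ids are invented for this sketch;
// the real implementation uses the Rust `tokenizers` crate.
use std::collections::HashMap;

fn encode(words: &[&str], vocab: &HashMap<&str, u32>, max_len: usize) -> (Vec<u32>, Vec<u32>) {
    const UNK: u32 = 100; // hypothetical [UNK] id
    const PAD: u32 = 0;   // hypothetical [PAD] id
    // Map each word to its vocabulary id, falling back to [UNK].
    let mut ids: Vec<u32> = words.iter().map(|w| *vocab.get(w).unwrap_or(&UNK)).collect();
    ids.truncate(max_len);
    // Real tokens get attention 1; padding gets attention 0.
    let mut mask = vec![1u32; ids.len()];
    while ids.len() < max_len {
        ids.push(PAD);
        mask.push(0);
    }
    (ids, mask)
}

fn main() {
    let vocab = HashMap::from([("hello", 7592u32), ("world", 2088u32)]);
    let (ids, mask) = encode(&["hello", "world", "xyzzy"], &vocab, 5);
    assert_eq!(ids, [7592, 2088, 100, 0, 0]);
    assert_eq!(mask, [1, 1, 1, 0, 0]);
    println!("ids={:?} mask={:?}", ids, mask);
}
```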
Thank you so much for the update! I just want to clarify that you have created no additional host functions / plugins, and that you got the entire OCR program working inside WasmEdge (Rust compiled into Wasm). Is that correct? Thanks!
Yes @juntao, that's correct. While I originally thought this would need to leverage the C API of Tesseract, since Tesseract has command-line functionality that can be used by simply installing the pre-built binaries, I decided to leverage that instead.
Hence, yes, the entire program now works inside WasmEdge. I did, however, have to use a plugin, wasmedge_process_interface, to access the command-line functionality of the native operating system (which WasmEdge is running on) while the user's Wasm is being executed on WasmEdge. Hope this clears up the need and functioning, thank you!

P.S. Pytesseract, the Python wrapper for Tesseract used in most of the AI applications I am referring to, also uses an identical approach.
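As a rough sketch of this CLI approach (the exact argument list is an assumption, not the wrapper's actual code): the Wasm module builds a tesseract command line and hands it to the host via a process-style `Command`. The `wasmedge_process_interface` crate mirrors `std::process::Command`, so the same code shape also runs natively:

```rust
// Sketch of shelling out to the tesseract CLI. Inside WasmEdge the
// `wasmedge_process_interface` crate provides a `Command` type that
// mirrors `std::process::Command`; this sketch uses the std type so it
// also runs natively. The argument layout is an assumption.
use std::process::Command;

/// Build `tesseract <image> stdout -l <lang>`: "stdout" makes tesseract
/// print the recognized text instead of writing an output file, and
/// `-l` selects the trained language data.
fn tesseract_args(image: &str, lang: &str) -> Vec<String> {
    vec![image.to_string(), "stdout".to_string(), "-l".to_string(), lang.to_string()]
}

#[allow(dead_code)]
fn run_ocr(image: &str) -> std::io::Result<String> {
    let out = Command::new("tesseract").args(tesseract_args(image, "eng")).output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

fn main() {
    let args = tesseract_args("invoice.png", "eng");
    assert_eq!(args, ["invoice.png", "stdout", "-l", "eng"]);
    println!("tesseract {}", args.join(" "));
}
```

Note that `run_ocr` requires the tesseract binary on the host, which is exactly the host dependency the process plugin introduces.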
Week 4 & 5 Progress Update

To create the modular end-to-end pipeline of LayoutLMv2ForTokenClassification (currently using the temporary CLI-based method for Tesseract OCR), I created the following preprocessing functions to get the inputs required by the model in the correct tensor formats.
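One of these preprocessing steps, as in the Hugging Face LayoutLMv2 processor, is scaling every OCR bounding box into the fixed 0..1000 coordinate space the model expects, independent of the source image size. A minimal sketch (function name and tuple layout are illustrative):

```rust
// LayoutLM-family models expect word bounding boxes normalized to a
// 0..1000 grid regardless of image dimensions. Integer scaling matches
// what the Hugging Face processors do.
fn normalize_box(b: (u32, u32, u32, u32), width: u32, height: u32) -> (u32, u32, u32, u32) {
    let scale = |v: u32, dim: u32| (1000 * v) / dim;
    // (left, top, right, bottom): x coords scale by width, y by height.
    (scale(b.0, width), scale(b.1, height), scale(b.2, width), scale(b.3, height))
}

fn main() {
    // On a 400x200 image, the box (40, 50, 200, 100) maps to (100, 250, 500, 500).
    assert_eq!(normalize_box((40, 50, 200, 100), 400, 200), (100, 250, 500, 500));
    // The full image maps to the full grid.
    assert_eq!(normalize_box((0, 0, 400, 200), 400, 200), (0, 0, 1000, 1000));
}
```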
Next, I obtained the required model with the fine-tuned weights and traced it in Python to convert it to TorchScript in the function. The code for these preprocessing functions is at https://github.com/sarrah-basta/wasmedge_ai_testing/tree/main/layoutlmv2_model.

Errors currently facing

[2023-04-16 20:59:08.292] [error] [WASI-NN] Only F32 inputs and outputs are supported for now.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: NnErrno { code: 1, name: "INVALID_ARGUMENT", message: "" }', src/main.rs:107:9
stack backtrace:
[2023-04-16 20:59:08.293] [error] execution failed: unreachable, Code: 0x89
[2023-04-16 20:59:08.293] [error] In instruction: unreachable (0x00) , Bytecode offset: 0x001a8568
[2023-04-16 20:59:08.293] [error] When executing function name: "_start"

This error is caused by this check in the plugins/wasi-nn source code; however, all the tensors I create for the inputs are of the correct types.

Possible solutions

I am currently a little stuck and would need some guidance on how to approach this further. Is there a reason for supporting only F32 tensors in the WASI-NN plugin for the PyTorch backend, and if so, is there any way to change the expectations of the TorchScript or PyTorch model? Hopefully @juntao can give some insight.

Week 6 Plan
Hence, I have been using the C API to get identical results, and while I wait for guidance on the above issue, I will go ahead with creating a host function with the Rust plugin SDK to call functions, after registering the Tesseract C API as a WasmEdge plugin (similar to https://github.com/WasmEdge/WasmEdge/blob/master/examples/plugin/get-string/getstring.cpp).
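One possible workaround for the F32-only restriction above (untested, and only viable if the traced TorchScript model is changed to cast its inputs back to int64 internally) is to ship the integer tensors to WASI-NN encoded as F32. The byte-level conversion on the Wasm side would look roughly like:

```rust
// Hypothetical workaround sketch: the WASI-NN PyTorch backend only
// accepts F32 tensors, so integer inputs such as token ids could be
// cast to f32 before serialization, provided the traced model casts
// them back to int64. This is lossless only while the ids fit in
// f32's 24-bit mantissa (< 2^24), which holds for typical vocab sizes.
fn ids_to_f32_bytes(ids: &[i64]) -> Vec<u8> {
    ids.iter()
        .flat_map(|&id| (id as f32).to_le_bytes()) // 4 little-endian bytes per value
        .collect()
}

fn main() {
    let bytes = ids_to_f32_bytes(&[101, 7592, 102]);
    assert_eq!(bytes.len(), 3 * 4);
    // Round-trip the first value to check the encoding is exact here.
    let first = f32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    assert_eq!(first as i64, 101);
    println!("{} bytes", bytes.len());
}
```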
@q82419 According to the investigation by @sarrah-basta, the
Week 6 & 7 Progress Update

To create the OCR solution using the Tesseract API, avoiding the CLI dependency of the command-line plugin that breaks the Wasm sandbox in very unpredictable ways, I created:
The basic flow of the created code is as follows. The Rust library contains
This length is then passed to another plugin function. This string is then parsed, and each detection made is fed into a
A vector of such structs is returned by the public; I will be using it in the layoutlmv2 model created earlier. The WasmEdge plugin Wasi-OCR contains the two plugin functions described above and the necessary functions to register it as a module, in the following file structure
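That parsing step can be sketched as follows. The tab-separated "word left top width height" layout and the struct fields are assumptions for illustration, not the plugin's actual output format:

```rust
// Hypothetical sketch of parsing the detection string returned by the
// OCR plugin into per-word structs. The tab-separated field layout is
// an assumption made for this illustration.
#[derive(Debug, PartialEq)]
struct Detection {
    word: String,
    left: u32,
    top: u32,
    width: u32,
    height: u32,
}

fn parse_detections(s: &str) -> Vec<Detection> {
    s.lines()
        .filter_map(|line| {
            let f: Vec<&str> = line.split('\t').collect();
            if f.len() != 5 {
                return None; // skip malformed lines
            }
            Some(Detection {
                word: f[0].to_string(),
                left: f[1].parse().ok()?,
                top: f[2].parse().ok()?,
                width: f[3].parse().ok()?,
                height: f[4].parse().ok()?,
            })
        })
        .collect()
}

fn main() {
    let dets = parse_detections("Hello\t10\t20\t80\t30\nWorld\t95\t20\t90\t30");
    assert_eq!(dets.len(), 2);
    assert_eq!(dets[0].word, "Hello");
    assert_eq!(dets[1].left, 95);
    println!("{:?}", dets);
}
```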
The Tesseract API is created when the environment is created and destroyed at the end of the call of the

Dependencies and Install Instructions for the plugin

The Tesseract API has two dependencies, which can be installed as follows:

sudo apt install tesseract-ocr
sudo apt-get install libleptonica-dev

More detailed instructions can be found at https://tesseract-ocr.github.io/tessdoc/Installation.html, but only the above two libraries are necessary. They are then linked via the CMakeLists.

Building WasmEdge with the plugin
Week 8 Plan

Since the concern about the CLI dependency is now solved, this week I can focus on
These two models and the preprocessing functions already created will be used for 4 different Document AI tasks outlined in the first comment in this issue. Once the inferencing is (hopefully) successfully done, the last 4 weeks should be spent creating the post-processing functions and packaging the code written.
Week 8 Progress Update
Week 9 Plan
These two models and the preprocessing functions already created will be used for 4 different Document AI tasks outlined in the first comment in this issue. Once the inferencing is (hopefully) successfully done, the last 4 weeks should be spent creating the post-processing functions and packaging the code written.
Week 9 Progress Update
Week 10 Plan
Motivation
The Hugging Face Hub is a platform hosting a collection of pre-trained models, datasets, and demos of machine learning projects. A blog post by Hugging Face gives a concise overview of the state-of-the-art models available for Document AI, covering tasks from Optical Character Recognition (OCR) and Document Image Classification to Document Layout Analysis, Document Parsing, and Document Visual Question Answering.
WasmEdge would like to enable easy integration of these Document AI tasks in WasmEdge applications by creating the necessary pre- and post-processing functions in Rust and using the fine-tuned models available on the Hugging Face Model Hub.
Details
Document AI tasks use multimodal models, i.e. models that unify document text (from OCR), layout (from tokens), and visual information (spatial information from the image) in a single end-to-end framework that can learn cross-modal interactions. Each Document AI task has a description page that describes its expected output and the datasets for the task. The corresponding models fine-tuned on these datasets are available in PyTorch format, which is supported by the WASI-NN plugin.
This project aims to
Milestones