
System Architecture Document

Duy Nguyen edited this page Jun 13, 2023 · 5 revisions

Introduction

Our web service gives customers an easy way to set up a streaming pipeline for processing various types of documents, including text and images. This document outlines the key components of the system and the technologies behind them.

Architecture Overview

Our system architecture consists of the following components:

1. API:

  • Language: Go
  • Framework: Gorilla Mux
  • Purpose: Exposes RESTful endpoints to interact with the system.
  • Responsibilities: Handles incoming requests, performs validation, and interacts with other components.

2. Dashboard:

  • Language: JavaScript
  • Framework: Vue.js
  • Purpose: Provides a user-friendly interface for managing the streaming pipeline and monitoring document processing.
  • Responsibilities: Displays relevant information, allows users to configure pipeline components, and visualizes processing statistics.

3. Consumer:

  • Language: Python
  • Libraries: Confluent Kafka, Kafka-Kraft
  • Purpose: Consumes data from Kafka topics and processes documents in real time.
  • Responsibilities: Receives messages from Kafka, performs document processing tasks (e.g., image pre-processing, OCR, field detection, text processing, lookup), and sends the processed data to the appropriate destination.
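
The receive, process, and deliver cycle described for the consumer can be sketched as a small poll loop. This is a minimal sketch and not the project's actual code: it assumes messages carry JSON-encoded documents, and the names `run_consumer`, `process`, `deliver`, and `max_messages` are hypothetical. The loop only relies on `poll()`, `close()`, `error()`, and `value()`, which the `confluent_kafka` consumer and message objects provide, so it can be exercised without a broker.

```python
import json
from typing import Any, Callable, Dict, Optional

Doc = Dict[str, Any]


def run_consumer(
    consumer: Any,
    process: Callable[[Doc], Doc],
    deliver: Callable[[Doc], None],
    max_messages: Optional[int] = None,
    timeout: float = 1.0,
) -> int:
    """Poll messages, process each document, and pass the result to `deliver`.

    `consumer` is any object with poll(timeout) and close(), for example a
    subscribed confluent_kafka.Consumer. Returns the number of documents
    handled. With max_messages=None the loop runs until interrupted.
    """
    handled = 0
    try:
        while max_messages is None or handled < max_messages:
            msg = consumer.poll(timeout)
            if msg is None:
                continue  # nothing arrived within the timeout
            if msg.error():
                continue  # skip broker/partition errors (a real service would log them)
            doc = json.loads(msg.value())  # assumption: JSON-encoded payloads
            deliver(process(doc))
            handled += 1
    finally:
        consumer.close()  # leave the consumer group cleanly
    return handled


# Wiring against a real cluster might look like this (hypothetical values):
#
#   from confluent_kafka import Consumer
#   consumer = Consumer({
#       "bootstrap.servers": "localhost:9092",
#       "group.id": "field-detection",
#       "auto.offset.reset": "earliest",
#   })
#   consumer.subscribe(["documents"])
#   run_consumer(consumer, process=my_process, deliver=my_deliver)
```

Passing `process` and `deliver` as plain callables keeps the Kafka handling separate from the document logic, so each part can be tested on its own.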

4. Data Storage:

  • Redis: Used for caching frequently accessed data, improving overall system performance.
  • MongoDB: Serves as the primary database for storing processed documents, metadata, and related information.

High-Level Architecture Diagram

                                    +-------------------+
                                    |      Internet     |
                                    +-------------------+
                                             |
                +----------------------------|-------------------------------------------+   
                |                            |                            K8S CLUSTER    |     
                |    +---------------------- |------------------------------+            |
                |    |                       |                     Node-1   |            |
                |    |            +----------|-----------+                  |            | 
                |    |            | Load Balancer (Nginx)|                  |            |
                |    |            +----------|-----------+                  |            |
                |    |                       |                              |            |
                |    |            +----------*-----------+                  |            |
                |    |            |                      |                  |            |
                |    |    +-------|-------+      +-------|-------+          |            |
                |    |    |   Vue.js      | <--> |      Go API   |---------------+       |  
                |    |    +---------------+      +-------|-------+          |    |       |
                |    +-----------------------------------|------------------+    |       |
                |                                        |                       |       |
                |    +-----------------------------------v------------------+    |       |
                |    |                                             Node-2   |    |       |
                |    |    +---|---+   +---|---+   +---------------+         |    |       |
                |    |    | Kafka |   | S3    |   |  MongoDB      |         |    |       |
                |    |    |       |   |       |   |---------------|         |    |       |
                |    |    |       |   |       |   |   Redis       |         |    |       |
                |    |    +-------+   +-------+   +---------------+         |    |       |
                |    |                                                      |    |       |
                |    +------------------------------------^-----------------+    |       |
                |                                         |                      |       |                             
                |    +------------------------------------|-----------------+    |       |
                |    |                                            Node-3    |<---+       |
                |    |    +-----------------+    +-----------------+        |            |
                |    |    | Consumer        |    | Consumer        |        |            |
                |    |    | Field Detection |    | Invoice         |        |            |
                |    |    +-----------------+    +-----------------+        |            |
                |    |    | CF1 | CF2 | CF3 |    | CI1 | CI2 | CI3 |        |            |
                |    |    +-----------------+    +-----------------+        |            |
                |    +------------------------------------------------------+            |
                |                                                                        |
                +------------------------------------------------------------------------+

Detailed Component Descriptions

1. API

The API component is written in Go and uses Gorilla Mux to route incoming requests. It provides a RESTful interface through which clients configure the streaming pipeline, monitor document processing, and retrieve processed data.

2. Dashboard

The dashboard component, developed with Vue.js, offers an intuitive and user-friendly interface for managing the streaming pipeline. It allows users to configure the components of the pipeline, view processing statistics, and monitor the system's health and performance. The dashboard communicates with the API to retrieve relevant data and update the configuration.

3. Consumer

The consumer component, implemented in Python, integrates with Confluent Kafka and Kafka-Kraft. It consumes messages from Kafka topics and processes documents in real time by applying a series of processing steps, such as image pre-processing, OCR, field detection, text processing, and lookup. The processed data is then sent to the appropriate destination, such as a database or an external system.
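
One way to picture the chain of processing steps is as a list of functions applied to a document in order. The sketch below is illustrative only: `Document`, `build_pipeline`, and the stage functions are hypothetical names, and the OCR and field-detection stages are placeholders (decoding bytes as text and splitting `key: value` lines) rather than real implementations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    doc_id: str
    payload: bytes
    text: str = ""
    fields: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)  # names of stages already applied


Stage = Callable[[Document], Document]


def preprocess_image(doc: Document) -> Document:
    # Placeholder: a real stage might deskew or denoise the image here.
    doc.trace.append("preprocess")
    return doc


def run_ocr(doc: Document) -> Document:
    # Placeholder for an OCR engine: treat the payload as UTF-8 text.
    doc.text = doc.payload.decode("utf-8", errors="replace")
    doc.trace.append("ocr")
    return doc


def detect_fields(doc: Document) -> Document:
    # Placeholder field detection: pick up "key: value" lines.
    for line in doc.text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            doc.fields[key.strip().lower()] = value.strip()
    doc.trace.append("field_detection")
    return doc


def build_pipeline(stages: List[Stage]) -> Stage:
    """Compose stages into one callable that applies them in order."""
    def run(doc: Document) -> Document:
        for stage in stages:
            doc = stage(doc)
        return doc
    return run


pipeline = build_pipeline([preprocess_image, run_ocr, detect_fields])
result = pipeline(Document("doc-1", b"Invoice No: 42\nTotal: 19.90"))
print(result.fields)  # {'invoice no': '42', 'total': '19.90'}
```

Because every stage shares the same signature, a pipeline can be reconfigured by changing the list passed to `build_pipeline`, which fits the idea of customers choosing which components to run.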

4. Data Storage

  • Redis: Redis serves as the system's cache. It stores frequently accessed data, which reduces response time for repeated requests.
  • MongoDB: MongoDB acts as the primary database for storing processed documents, metadata, and related information. It provides a flexible and scalable solution for managing document data in the system.
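
A common way to combine a cache with a primary database is the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. Whether this system uses exactly that pattern is an assumption. The sketch below uses in-memory stand-ins instead of real Redis and MongoDB clients, and `TTLCache`, `DocumentRepository`, and the `doc:` key prefix are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a Redis-style cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)


class DocumentRepository:
    """Cache-aside reads: try the cache, then the database, then fill the cache."""

    def __init__(
        self,
        cache: TTLCache,
        load_from_db: Callable[[str], Optional[dict]],
        ttl_seconds: float = 60.0,
    ) -> None:
        self._cache = cache
        self._load = load_from_db  # stand-in for a MongoDB lookup by id
        self._ttl = ttl_seconds
        self.db_reads = 0  # counter kept only to make the effect visible

    def get(self, doc_id: str) -> Optional[dict]:
        key = f"doc:{doc_id}"
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        self.db_reads += 1
        doc = self._load(doc_id)
        if doc is not None:
            self._cache.set(key, doc, self._ttl)
        return doc


fake_db = {"42": {"id": "42", "type": "invoice"}}
repo = DocumentRepository(TTLCache(), fake_db.get)
repo.get("42")
repo.get("42")
print(repo.db_reads)  # 1
```

The second read is served from the cache, so the database is queried only once. The expiry time bounds how long a stale copy can be served after the stored document changes.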

Conclusion

The system architecture presented here lets customers set up and configure a streaming pipeline for processing various types of documents. The API, dashboard, consumer, and data storage components work together to carry a document from submission through processing to storage. The system is built on Go, Gorilla Mux, Vue.js, Python, Confluent Kafka, Kafka-Kraft, Redis, and MongoDB.
