From c7356b82d658b6c85bd4dd0b82eb97c0262f78e2 Mon Sep 17 00:00:00 2001 From: dockerzhang Date: Tue, 21 Jun 2022 17:18:06 +0800 Subject: [PATCH 1/4] [INLONG-437][Doc] Update the Agent Documents for release 1.2.0 --- docs/design_and_concept/basic_concept.md | 26 +++---- .../how_to_write_plugin_agent.md | 55 +++++--------- docs/modules/agent/metrics.md | 66 +++++++++++++++++ docs/modules/agent/overview.md | 72 +------------------ .../design_and_concept/basic_concept.md | 26 +++---- .../how_to_write_plugin_agent.md | 63 ++++++---------- .../current/modules/agent/metrics.md | 65 +++++++++++++++++ .../current/modules/agent/overview.md | 68 +----------------- 8 files changed, 203 insertions(+), 238 deletions(-) create mode 100644 docs/modules/agent/metrics.md create mode 100644 i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/metrics.md diff --git a/docs/design_and_concept/basic_concept.md b/docs/design_and_concept/basic_concept.md index 3329aa97920..823e96dee4d 100644 --- a/docs/design_and_concept/basic_concept.md +++ b/docs/design_and_concept/basic_concept.md @@ -3,16 +3,16 @@ title: Basic Concept sidebar_position: 1 --- -| Name | Description | Other | -|--------------------------|-----------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------| -| Standard Architecture | Contains all InLong components such as InLong Agent/Manager/MQ/Sort/Dashboard | Suitable for massive data and large-scale production environments | -| Lightweight Architecture | Contains only one component of InLong Sort, which can be used with Manager/Dashboard | The lightweight architecture is simple and flexible, suitable for small-scale data | -| Group | Data Streams Group, it contains multiple data streams, and one Group represents one data ingestion. | Group has attributes such as ID and Name. 
| -| Stream | Data Stream, a stream has a specific flow direction. | Stream has attributes such as ID, Name, and data fields. | -| Node | Data Node, including `Extract Node` and `Load Node`, stands for the data source and sink types separately. | | -| InLongMsg | InLong data format, if you consume message directly from the message queue, you need to perform `InLongMsg` parsing first. | | -| Agent | Represents various collection capabilities. | It contains File Agent, SQL Agent, Binlog Agent, etc. | -| DataProxy | Forward received data to different message queues. | Supports data transmission blocking, placing retransmission. | -| Sort | Data stream sorting | Sort-flink based on Flink, sort-standalone for local sorting. | -| TubeMQ | InLong's self-developed message queuing service | It can also be called Tube, with low-cost, high-performance features. | -| Pulsar | [Apache Pulsar](https://pulsar.apache.org/), a high-performance, high-consistency message queue service | | \ No newline at end of file +| Name | Description | Other | +|--------------------------|----------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------| +| Standard Architecture | Contains all InLong components such as InLong Agent/Manager/MQ/Sort/Dashboard | Suitable for massive data and large-scale production environments | +| Lightweight Architecture | Contains only one component of InLong Sort, which can be used with Manager/Dashboard | The lightweight architecture is simple and flexible, suitable for small-scale data | +| Group | Data Streams Group, it contains multiple data streams, and one Group represents one data ingestion. | Group has attributes such as ID and Name. | +| Stream | Data Stream, a stream has a specific flow direction. | Stream has attributes such as ID, Name, and data fields. 
| +| Node | Data Node, including `Extract Node` and `Load Node`, stands for the data source and sink types separately. | | +| InLongMsg | InLong data format, if you consume message directly from the message queue, you need to perform `InLongMsg` parsing first. | | +| Agent | The standard architecture uses Agent for data collection, and Agent represents different types of collection capabilities. | It contains File Agent, SQL Agent, Binlog Agent, etc. | +| DataProxy | Forward received data to different message queues. | Supports data transmission blocking, placing retransmission. | +| Sort | Data stream sorting. | Sort-flink based on Flink, sort-standalone for local sorting. | +| TubeMQ | InLong's self-developed message queuing service | It can also be called Tube, with low-cost, high-performance features. | +| Pulsar | [Apache Pulsar](https://pulsar.apache.org/), a high-performance, high-consistency message queue service | | \ No newline at end of file diff --git a/docs/design_and_concept/how_to_write_plugin_agent.md b/docs/design_and_concept/how_to_write_plugin_agent.md index 26ad07da707..c16eb420a33 100644 --- a/docs/design_and_concept/how_to_write_plugin_agent.md +++ b/docs/design_and_concept/how_to_write_plugin_agent.md @@ -3,13 +3,13 @@ title: Agent Plugin sidebar_position: 2 --- -# Overview +## Overview -This article is aimed at InLong-Agent plug-in developers, trying to explain the process of developing an Agent plug-in as comprehensively as possible, and strive to eliminate the confusion of developers and make plug-in development easier. +In Standard Architecture, we can collect various types of data through InLong Agent. InLong Agent supports extending new collection types in the form of plug-ins. This article will guide developers on customizing the new Agent collection plug-ins. -## Before Development +## Concepts and Models -InLong Agent itself, as a data collection framework, is constructed with a Job + Task architecture. 
And abstract data source reading and writing into Reader/Sink plug-ins, which are incorporated into the entire framework. +InLong Agent is a data collection framework, adopted `Job` + `Task` architectural model. And abstract data source reading and writing into Reader/Sink plugins. Developers need to be clear about the concepts of Job and Task: @@ -18,33 +18,29 @@ Developers need to be clear about the concepts of Job and Task: A Task contains the following components: -- Reader: Reader is a data collection module, which is responsible for collecting data from the data source and sending the data to the channel. -- Sink: Sink is a data writing module, responsible for continuously fetching data from the channel and writing the data to the destination. -- Channel: Channel is used to connect reader and sink, as a data transmission channel for both, and plays a role in monitoring data writing and reading +- Reader: a data collection module, which is responsible for collecting data from the data source and sending the data to the Channel. +- Sink: a data writing module, responsible for continuously fetching data from the Channel and writing the data to the destination. +- Channel: connect Reader and Sink, as a data transmission channel for both, and plays a role in monitoring data writing and reading. -As a developer, you only need to develop specific Source, Reader and Sink. If the data needs to be persisted to the local disk, use the persistent Channel, otherwise use the memory Channel +When extending an Agent plugin, you need to develop specific Source, Reader and Sink. If the data needs to be persisted to the local disk, use the persistent Channel, otherwise use the memory Channel ## Demonstration -The Job\Task\Reader\Sink\Channel concept introduced above can be represented by the following figure: +The Job/Task/Reader/Sink/Channel concept introduced above can be represented by the following figure: ![](img/Agent_Flow.png) -1. 
The user submits a Job (via the manager or via curl), and the Job defines the Source, Channel, and Sink that need to be used (defined by the fully qualified name of the class) -2. The framework starts the Job and creates the Source through the reflection mechanism -3. The framework starts the Source and calls the Split interface of the Source to generate one or more Tasks -4. When a Task is generated, a Reader (a type of Source will generate a corresponding reader), a User-configured Channel and a User-configured Sink are generated at the same time -5. Task starts to execute, Reader starts to read data to Channel, Sink fetches data from Channel and sends it -6. All the information needed for Job and Task execution is encapsulated in the JobProfile - -## Reference Demo - -Please understand the above process by reading the Job class, Task class, TextFileSource class, TextFileReader class, and ProxySink class in the Agent source +- The user submits a Job (via the manager), and the Job defines the Source, Channel, and Sink that need to be used (defined by the fully qualified name of the class) +- The framework starts the Job and creates the Source through the reflection mechanism +- The framework starts the Source and calls the Split interface of the Source to generate one or more Tasks +- When a Task is generated, a Reader (a type of Source will generate a corresponding reader), a User-configured Channel and a User-configured Sink are generated at the same time +- Task starts to execute, Reader starts to read data to Channel, Sink fetches data from Channel and sends it +- All the information needed for Job and Task execution is encapsulated in the JobProfile ## Development Process -1. First develop Source, implement split logic, and return ReaderList -2. The developed Reader implements the logic of reading data and writing to Channel -3. 
The sink under development implements the logic of fetching data from the channel and writing it to the specified sink +- First develop Source, implement split logic, and return ReaderList +- The developed Reader implements the logic of reading data and writing to Channel +- The sink under development implements the logic of fetching data from the channel and writing it to the specified sink ## Programming must know interface @@ -189,17 +185,4 @@ public interface Message { } ``` -Developers can expand customized Message according to this interface. For example, ProxyMessage contains InLongGroupId, InLongStreamId and other attributes - - -## Last but not Least - -All new plugins must have a document in the `InLong` official wiki. The document needs to include but not limited to the following: - -1. **Quick introduction**: Introduce the usage scenarios and features of the plug-in. -2. **Implementation principle**: Introduce the underlying principle of plug-in implementation, such as `sqlReader` to read data in the database by executing Sql query -3. **Configuration Instructions** - - Give the json configuration file of synchronization tasks in typical scenarios. - - Introduce the meaning of each parameter, whether it is required, default value, value range and other constraints. -4. **Restrictions**: Are there other restrictions on use. -5. **FAQ**: Frequently asked questions by users. +Developers can expand customized Message according to this interface. For example, ProxyMessage contains InLongGroupId, InLongStreamId and other attributes \ No newline at end of file diff --git a/docs/modules/agent/metrics.md b/docs/modules/agent/metrics.md new file mode 100644 index 00000000000..b25451d9e68 --- /dev/null +++ b/docs/modules/agent/metrics.md @@ -0,0 +1,66 @@ +--- +title: Monitor Metrics +sidebar_position: 3 +--- + +## JMX Configuration +Agent provides the ability of monitoring indicators in JMX and Prometheus mode, and JMX mode is used by default. 
The monitoring indicators have been registered to MBeanServer +Users can add similar JMX (port and authentication are adjusted according to the situation) to the startup parameters of the Agent to realize the collection of monitoring indicators from the remote end. + +```Shell +-Dcom.sun.management.jmxremote +-Djava.rmi.server.hostname=127.0.0.1 +-Dcom.sun.management.jmxremote.port=9999 +-Dcom.sun.management.jmxremote.authenticate=false +-Dcom.sun.management.jmxremote.ssl=false +``` + +## Prometheus Configuration +You can declare whether to enable Prometheus and HTTPServer port in `agent.properties`. + +```properties +# the default is false +agent.prometheus.enable=true +# the default is 8080 +agent.prometheus.exporter.port=8080 +``` + +## Appendix: Metrics Items + +### AgentTaskMetric +| property | description | +| ---- | ---- | +| runningTasks | tasks currently being executed | +| retryingTasks | Tasks that are currently being retried | +| fatalTasks | The total number of currently failed tasks | + + +### JobMetrics +| property | description | +| ---- | ---- | +| runningJobs | the total number of currently running jobs | +| fatalJobs | the total number of currently failed jobs | + +### PluginMetric +| property | description | +| ---- | ---- | +| readNum | the number of reads | +| sendNum | the number of sent items | +| sendFailedNum | the number of failed sending | +| readFailedNum | the number of failed reads | +| readSuccessNum | the number of successful reads | +| sendSuccessNum | the number of successfully sent | + +### SourceMetric + +| property | type | description | +|----------------------------|---------|--------------------------------------------------------------------| +| agent_source_count_success | Counter | the success message count in agent source since agent started | +| agent_source_count_fail | Counter | the sink success message count in agent source since agent started | + +### SinkMetric + +| property | type | description | 
+|--------------------------|---------|--------------------------------------------------------------------| +| agent_sink_count_success | Counter | the sink success message count in agent source since agent started | +| agent_sink_count_fail | Counter | the sink failed message count in agent source since agent started | \ No newline at end of file diff --git a/docs/modules/agent/overview.md b/docs/modules/agent/overview.md index 7a090d0bfc8..2eccf4b734b 100644 --- a/docs/modules/agent/overview.md +++ b/docs/modules/agent/overview.md @@ -3,7 +3,7 @@ title: Overview sidebar_position: 1 --- -InLong-Agent is a collection tool that supports multiple types of data sources, and is committed to achieving stable and efficient data collection functions between multiple heterogeneous data sources including File, SQL, Binlog, Metrics, etc. +InLong Agent is a collection tool that supports multiple types of data sources, and is committed to achieving stable and efficient data collection functions between multiple heterogeneous data sources including File, SQL, Binlog, Metrics, etc. ## Design Concept In order to solve the problem of data source diversity, InLong-agent abstracts multiple data sources into a unified source concept, and abstracts sinks to write data. When you need to access a new data source, you only need to configure the format and reading parameters of the data source to achieve efficient reading. @@ -25,7 +25,7 @@ User-configured path monitoring, able to monitor the created file information Directory regular filtering, support YYYYMMDD+regular expression path configuration Breakpoint retransmission, when InLong-Agent restarts, it can automatically re-read from the last read position to ensure no reread or missed reading. 
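The breakpoint-retransmission behaviour described above can be sketched as follows. This is a minimal, hypothetical illustration — not the actual InLong Agent implementation — assuming the agent checkpoints the last byte offset of each file and resumes reading from that offset after a restart:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of breakpoint retransmission: checkpoint the byte offset
// after every read so a restarted process resumes exactly where it stopped,
// with no re-read and no missed data. All names here are illustrative.
public class FileResumeSketch {

    // Read everything after the checkpointed offset and advance the checkpoint.
    static String readFrom(Path file, long[] offsetHolder) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offsetHolder[0]);
            byte[] buf = new byte[(int) (raf.length() - offsetHolder[0])];
            raf.readFully(buf);
            offsetHolder[0] = raf.length(); // a real agent would persist this checkpoint
            return new String(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("agent-demo", ".log");
        long[] offset = {0L};                    // stands in for the persisted checkpoint

        Files.writeString(log, "event-1\n");
        System.out.print(readFrom(log, offset)); // first run picks up event-1

        Files.writeString(log, "event-1\nevent-2\n");
        System.out.print(readFrom(log, offset)); // after a "restart": only event-2
    }
}
```

Because the checkpoint survives the restart, the second read returns only the newly appended data — the property the File collection type relies on.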
-### Sql +### SQL This type of data refers to the way it is executed through SQL SQL regular decomposition, converted into multiple SQL statements Execute SQL separately, pull the data set, the pull process needs to pay attention to the impact on mysql itself @@ -34,70 +34,4 @@ The execution cycle, which is generally executed regularly ### Binlog This type of collection reads binlog and restores data by configuring mysql slave Need to pay attention to multi-threaded parsing when binlog is read, and multi-threaded parsing data needs to be labeled in order -The code is based on the old version of dbsync, the main modification is to change the sending of tdbus-sender to push to agent-channel for integration - -## Monitor Metrics configuration - -Agent provides the ability of monitoring indicators in JMX and Prometheus mode, and JMX mode is used by default. The monitoring indicators have been registered to MBeanServer -Users can add similar JMX (port and authentication are adjusted according to the situation) to the startup parameters of the Agent to realize the collection of monitoring indicators from the remote end. 
- -```Shell --Dcom.sun.management.jmxremote --Djava.rmi.server.hostname=127.0.0.1 --Dcom.sun.management.jmxremote.port=9999 --Dcom.sun.management.jmxremote.authenticate=false --Dcom.sun.management.jmxremote.ssl=false -``` - -The agent indicators are divided into the following items, and the indicators are as follows: - -### AgentTaskMetric -| property | description | -| ---- | ---- | -| runningTasks | tasks currently being executed | -| retryingTasks | Tasks that are currently being retried | -| fatalTasks | The total number of currently failed tasks | - - -### JobMetrics -| property | description | -| ---- | ---- | -| runningJobs | the total number of currently running jobs | -| fatalJobs | the total number of currently failed jobs | - -### PluginMetric -| property | description | -| ---- | ---- | -| readNum | the number of reads | -| sendNum | the number of sent items | -| sendFailedNum | the number of failed sending | -| readFailedNum | the number of failed reads | -| readSuccessNum | the number of successful reads | -| sendSuccessNum | the number of successfully sent | - -### SourceMetric - -| property | type | description | -|----------------------------|---------|--------------------------------------------------------------------| -| agent_source_count_success | Counter | the success message count in agent source since agent started | -| agent_source_count_fail | Counter | the sink success message count in agent source since agent started | - -### SinkMetric - -| property | type | description | -|--------------------------|---------|--------------------------------------------------------------------| -| agent_sink_count_success | Counter | the sink success message count in agent source since agent started | -| agent_sink_count_fail | Counter | the sink failed message count in agent source since agent started | - -> In addition, Agent also has built-in Prometheus `simpleclient-hotspot`, which is used to collect JVM-related metrics. 
- -### Configure Prometheus - -You can declare whether to enable Prometheus and HTTPServer port in `agent.properties`. - -```properties -# the default is false -agent.prometheus.enable=true -# the default is 8080 -agent.prometheus.exporter.port=8080 -``` +The code is based on the old version of dbsync, the main modification is to change the sending of tdbus-sender to push to agent-channel for integration \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/basic_concept.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/basic_concept.md index b3fd1266bbf..84bced18198 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/basic_concept.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/basic_concept.md @@ -3,16 +3,16 @@ title: 基本概念 sidebar_position: 1 --- -| Name | Description | Other | -|---------------------------|---------------------------------------------------------------|--------------------------------------------| -| Standard Architecture | 标准架构,包含 InLong Agent/Manager/MQ/Sort/Dashboard 等所有 InLong 组件 | 适合海量数据、大规模生产环境 | -| Lightweight Architecture | 轻量化架构,只包含 InLong Sort 一个组件,可以搭配 Manager/Dashboard 一起使用 | 轻量化架构简单、灵活,适合小规模数据 | -| Group | 数据流组,包含多个数据流,一个Group 代表一个数据接入 | Group 有ID、Name 等属性 | -| Stream | 数据流,一个数据流有具体的流向 | Stream 有ID、Name、数据字段等属性 | -| Node | 数据节点,包括`Extract Node` 和 `Load Node`,分别代表数据源类型和数据流向目标类型 | | -| InLongMsg | InLong 数据格式,如果从消息队列中直接消费,需要先进行`InLongMsg` 解析 | | -| Agent | 代表各种采集能力 | 包含文件Agent、SQL Agent、Binlog Agent 等 | -| DataProxy | 将接收到的数据转发到不同的消息队列 | 支持数据发送阻塞和落盘重发 | -| Sort | 数据流分拣 | 主要有基于Flink的sort-flink,sort-standalone 本地分拣 | -| TubeMQ | InLong自带的消息队列服务 | 也可以叫Tube,拥有低成本、高性能特性 | -| Pulsar | 即[Apache Pulsar](https://pulsar.apache.org/), 高性能、高一致性消息队列服务 | | \ No newline at end of file +| Name | Description | Other | 
+|---------------------------|--------------------------------------------------------------|--------------------------------------------| +| Standard Architecture | 标准架构,包含 InLong Agent/Manager/MQ/Sort/Dashboard 等所有 InLong 组件 | 适合海量数据、大规模生产环境 | +| Lightweight Architecture | 轻量化架构,只包含 InLong Sort 一个组件,可以搭配 Manager/Dashboard 一起使用 | 轻量化架构简单、灵活,适合小规模数据 | +| Group | 数据流组,包含多个数据流,一个Group 代表一个数据接入 | Group 有ID、Name 等属性 | +| Stream | 数据流,一个数据流有具体的流向 | Stream 有ID、Name、数据字段等属性 | +| Node | 数据节点,包括`Extract Node` 和 `Load Node`,分别代表数据源类型和数据流向目标类型 | | +| InLongMsg | InLong 数据格式,如果从消息队列中直接消费,需要先进行`InLongMsg` 解析 | | +| Agent | 标准架构使用 Agent 进行数据采集,Agent 代表不同类型的采集能力 | 包含文件 Agent、SQL Agent、Binlog Agent 等 | +| DataProxy | 将接收到的数据转发到不同的消息队列 | 支持数据发送阻塞和落盘重发 | +| Sort | 数据流分拣 | 主要有基于Flink的sort-flink,sort-standalone 本地分拣 | +| TubeMQ | InLong自带的消息队列服务 | 也可以叫Tube,拥有低成本、高性能特性 | +| Pulsar | 即[Apache Pulsar](https://pulsar.apache.org/), 高性能、高一致性消息队列服务 | | \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_write_plugin_agent.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_write_plugin_agent.md index 22f8e5d3992..a01e6416f04 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_write_plugin_agent.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_write_plugin_agent.md @@ -3,49 +3,42 @@ title: Agent 插件 sidebar_position: 2 --- -# 总览 +## 总览 -本文面向 InLong-Agent 插件开发人员,尝试尽可能全面地阐述开发一个 Agent 插件所经过的历程,力求消除开发者的困惑,让插件开发变得简单。 +在 Standard Architecture 中,我们可以通过 InLong Agent 来采集各种类型的数据源。InLong Agent 支持以插件的方式扩展新的采集类型,本文将指导开发者如何自定义新的 Agent 采集插件。 -## 开发之前 +## 概念和模型 -InLong Agent 本身作为数据采集框架,采用 Job + Task 架构构建。并将数据源读取和写入抽象成为 Reader/Sink 插件,纳入到整个框架中。 - -开发人员需要明确 Job 以及 Task 的概念: +InLong Agent 是一个数据采集框架,采用 `Job` + `Task` 架构模型,将数据源读取和写入抽象成为 Reader/Sink 插件。 - `Job`: `Job`是 Agent 用以描述从一个源头到一个目的端的同步作业,是 Agent 数据同步的最小业务单元。比如:读取一个文件目录下的所有文件 - `Task`: 
`Task`是把`Job`拆分得到的最小执行单元。比如:文件夹下有多个文件需要被读取,那么一个 job 会被拆分成为多个 task ,每个 task 读取对应的文件 -一个Task包含以下各个组件: +一个 Task 包含以下组件: -- Reader:Reader 为数据采集模块,负责采集数据源的数据,将数据发送给 channel。 -- Sink: Sink 为数据写入模块,负责不断向 channel 取数据,并将数据写入到目的端。 -- Channel:Channel 用于连接 reader 和 sink,作为两者的数据传输通道,并起到了数据的写入读取监控作用 +- Reader:数据采集模块,负责采集数据源的数据,将数据发送给 Channel。 +- Sink: 数据写入模块,负责不断向 Channel 取数据,并将数据写入到目的端。 +- Channel:连接 Reader 和 Sink,作为两者的数据传输通道,并起到了数据的写入读取监控作用。 -作为开发人员,实际上只需要开发特定的 Source、Reader 以及 Sink 即可,数据如果需要持久化到本地磁盘,使用持久化 Channel ,如果否则使用内存 Channel +当扩展一个 Agent 插件时,需要开发特定的 Source、Reader 以及 Sink,数据如果需要持久化到本地磁盘,使用持久化 Channel ,如果否则使用内存 Channel ## 流程图示 -上述介绍的 Job \ Task \ Reader \ Sink \ Channel 概念可以用下图表示: +上述介绍的 Job/Task/Reader/Sink/Channel 概念可以用下图表示: ![](img/Agent_Flow.png) -1. 用户提交 Job(通过 manager 或者通过 curl 方式提交),Job 中定义了需要使用的 Source, Channel, Sink(通过类的全限定名定义) -2. 框架启动 Job,通过反射机制创建出 Source -3. 框架启动 Source,并调用 Source 的 Split 接口,生成一个或者多个 Task -4. 生成一个 Task 时,同时生成 Reader(一种类型的 Source 会生成对应的 reader),用户配置的 Channel 以及用户配置的 Sink -5. Task 开始执行,Reader 开始读取数据到 Channel,Sink 从 Channel 中取数进行发送 -6. 
Job 和 Task 执行时所需要的所有信息都封装在 JobProfile 中 - - -## 参考 Demo - -请开发人员通过阅读 Agent 源码中的 Job 类、Task 类、TextFileSource 类、TextFileReader 类、以及 ProxySink 类来弄懂上述流程 +- 用户提交 Job(通过 manager),Job 中定义了需要使用的 Source, Channel, Sink(通过类的全限定名定义) +- 框架启动 Job,通过反射机制创建出 Source +- 框架启动 Source,并调用 Source 的 Split 接口,生成一个或者多个 Task +- 生成一个 Task 时,同时生成 Reader(一种类型的 Source 会生成对应的 reader),用户配置的 Channel 以及用户配置的 Sink +- Task 开始执行,Reader 开始读取数据到 Channel,Sink 从 Channel 中取数进行发送 +- Job 和 Task 执行时所需要的所有信息都封装在 JobProfile 中 ## 开发流程 -1、首先开发 Source , 实现 Split 逻辑,返回 Reader 列表 -2、开发对应的 Reader ,实现读取数据并写入到 Channel 的逻辑 -3、开发对应的 Sink , 实现从 Channel 中取数并写入到指定 Sink 中的逻辑 +- 首先开发 Source , 实现 Split 逻辑,返回 Reader 列表 +- 开发对应的 Reader ,实现读取数据并写入到 Channel 的逻辑 +- 开发对应的 Sink , 实现从 Channel 中取数并写入到指定 Sink 中的逻辑 ## 编程必知接口 @@ -80,7 +73,7 @@ private class ReaderImpl implements Reader { } ``` -`Reader`接口功能如下: +`Reader` 接口功能如下: - `read`: 被单个 Task 调用,调用后返回读取的一条消息,Agent 内部的消息使用 Message 封装 - `isFinished`: 判断是否读取完成,举例:如果是 SQL 任务,则判断是否读取完了 ResultSet 中的所有内容,如果是文件任务,则判断超过用户设置的等待时间后是否还有数据写入 - `getReadSource`: 获取采集源,举例:如果是文件任务,则返回当前读取的文件名 @@ -112,7 +105,7 @@ public interface Sink extends Stage { ``` -`Sink`接口功能如下: +`Sink` 接口功能如下: - `write`: 被单个 Task 调用,从 Task 中的 Channel 读取一条消息,并写入到特定的存储介质中,以 PulsarSink 为例,则需要通过 PulsarSender 发送到 Pulsar - `setSourceName`: 设置数据源名称,如果是文件,则是文件名 - `initMessageFilter`: 初始化 MessageFilter , 用户可以在Job配置文件中通过设置 agent.message.filter.classname 来创建一个消息过滤器来过滤每一条消息,详情可以参考 MessageFilter 接口 @@ -191,16 +184,4 @@ public interface Message { } ``` -开发人员可以根据该接口拓展定制化的 Message ,比如 ProxyMessage 中,就包含了 InLongGroupId, InLongStreamId 等属性 - -## Last but not Least - -新增插件都必须在`InLong`官方wiki中有一篇文档,文档需要包括但不限于以下内容: - -1. **快速介绍**:介绍插件的使用场景,特点等。 -2. **实现原理**:介绍插件实现的底层原理,比如`sqlReader`通过执行Sql查询来读取数据库中的数据 -3. **配置说明** - - 给出典型场景下的同步任务的json配置文件。 - - 介绍每个参数的含义、是否必选、默认值、取值范围和其他约束。 -4. **约束限制**:是否存在其他的使用限制条件。 -5. 
**FAQ**:用户经常会遇到的问题。 +开发人员可以根据该接口拓展定制化的 Message ,比如 ProxyMessage 中,就包含了 InLongGroupId, InLongStreamId 等属性 \ No newline at end of file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/metrics.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/metrics.md new file mode 100644 index 00000000000..89d4a525bf0 --- /dev/null +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/metrics.md @@ -0,0 +1,65 @@ +--- +title: 监控指标 +sidebar_position: 3 +--- + +## JMX 配置 +Agent 提供了 JMX 和 Prometheus 方式的监控指标能力,默认使用 JMX 方式。JMX 方式的监控指标已经注册到 MBeanServer +用户可以在Agent的启动参数中增加如下类似JMX定义(端口和鉴权根据情况进行调整),实现监控指标从远端采集。 + +```shell + -Dcom.sun.management.jmxremote + -Djava.rmi.server.hostname=127.0.0.1 + -Dcom.sun.management.jmxremote.port=9999 + -Dcom.sun.management.jmxremote.authenticate=false + -Dcom.sun.management.jmxremote.ssl=false +``` + +## Prometheus 配置 +用户可以在`agent.properties`中声明是否启用Prometheus以及HTTPServer端口号 + +```properties +# 默认不启用Prometheus +agent.prometheus.enable=true +# 默认端口为8080 +agent.prometheus.exporter.port=8080 +``` + +## 附录:指标项 + +### AgentTaskMetric +| 属性名称 | 说明 | +| ---- | ---- | +| runningTasks | 当前正在执行的任务 | +| retryingTasks | 当前正在重试的任务 | +| fatalTasks | 当前失败的任务总数 | + +### JobMetrics +| 属性名称 | 说明 | +| ---- | ---- | +| runningJobs | 当前正在运行的job总数 | +| fatalJobs | 当前失败的job总数 | + +### PluginMetric +| 属性名称 | 说明 | +| ---- | ---- | +| readNum | 读取的条数 | +| sendNum | 发送的条数 | +| sendFailedNum | 发送失败条数 | +| readFailedNum | 读取失败条数 | +| readSuccessNum | 读取成功条数 | +| sendSuccessNum | 发送成功条数 | + +### SourceMetric + +| 属性名称 | 类型 | 说明 | +|----------------------------|---------|-------------------| +| agent_source_count_success | Counter | source 读取成功次数 | +| agent_source_count_fail | Counter | source 读取失败次数 | + +### SinkMetric + +| 属性名称 | 类型 | 说明 | +|--------------------------|---------|-----------------| +| agent_sink_count_success | Counter | sink 写入成功次数 | +| agent_sink_count_fail | Counter | sink 写入失败次数 | \ No newline at end of 
file diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md index ef161fe994f..d02e5f4504d 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/overview.md @@ -3,7 +3,7 @@ title: 总览 sidebar_position: 1 --- -InLong-Agent是一个支持多种数据源类型的收集工具,致力于实现包括File、Sql、Binlog、Metrics等多种异构数据源之间稳定高效的数据采集功能。 +InLong Agent 是一个支持多种数据源类型的收集工具,致力于实现包括 File、Sql、Binlog、Metrics 等多种异构数据源之间稳定高效的数据采集功能。 ## 设计理念 为了解决数据源多样性问题,InLong-agent 将多种数据源抽象成统一的source概念,并抽象出sink来对数据进行写入。当需要接入一个新的数据源的时候,只需要配置好数据源的格式与读取参数便能跟做到高效读取。 @@ -34,68 +34,4 @@ SQL正则分解,转化成多条SQL语句 ### Binlog 这类采集通过配置mysql slave的方式,读取binlog,并还原数据 需要注意binlog读取的时候多线程解析,多线程解析的数据需要打上顺序标签 -代码基于老版本的dbsync,主要的修改是将tdbus-sender的发送改为推送到agent-channel的方式做融合 - -## 监控指标配置 -Agent提供了JMX和Prometheus方式的监控指标能力,默认使用JMX方式。JMX方式的监控指标已经注册到MBeanServer -用户可以在Agent的启动参数中增加如下类似JMX定义(端口和鉴权根据情况进行调整),实现监控指标从远端采集。 - -```shell - -Dcom.sun.management.jmxremote - -Djava.rmi.server.hostname=127.0.0.1 - -Dcom.sun.management.jmxremote.port=9999 - -Dcom.sun.management.jmxremote.authenticate=false - -Dcom.sun.management.jmxremote.ssl=false -``` - -Agent指标分为以下几项, 各项的属性分别为: - -### AgentTaskMetric -| 属性名称 | 说明 | -| ---- | ---- | -| runningTasks | 当前正在执行的任务 | -| retryingTasks | 当前正在重试的任务 | -| fatalTasks | 当前失败的任务总数 | - -### JobMetrics -| 属性名称 | 说明 | -| ---- | ---- | -| runningJobs | 当前正在运行的job总数 | -| fatalJobs | 当前失败的job总数 | - -### PluginMetric -| 属性名称 | 说明 | -| ---- | ---- | -| readNum | 读取的条数 | -| sendNum | 发送的条数 | -| sendFailedNum | 发送失败条数 | -| readFailedNum | 读取失败条数 | -| readSuccessNum | 读取成功条数 | -| sendSuccessNum | 发送成功条数 | - -### SourceMetric - -| 属性名称 | 类型 | 说明 | -|----------------------------|---------|-------------------| -| agent_source_count_success | Counter | source 读取成功次数 | -| agent_source_count_fail | Counter | source 读取失败次数 | - -### SinkMetric - -| 
属性名称 | 类型 | 说明 | -|--------------------------|---------|-----------------| -| agent_sink_count_success | Counter | sink 写入成功次数 | -| agent_sink_count_fail | Counter | sink 写入失败次数 | - -> 另外,Agent还内置了Prometheus的`simpleclient-hotspot`,用于采集JVM相关的指标信息 - -### Configure Prometheus - -用户可以在`agent.properties`中声明是否启用Prometheus以及HTTPServer端口号 - -```properties -# 默认不启用Prometheus -agent.prometheus.enable=true -# 默认端口为8080 -agent.prometheus.exporter.port=8080 -``` +代码基于老版本的dbsync,主要的修改是将tdbus-sender的发送改为推送到agent-channel的方式做融合 \ No newline at end of file From 9aa601d4c6a389dee5ae1912b8ba7efc2f56ecb0 Mon Sep 17 00:00:00 2001 From: dockerzhang Date: Tue, 21 Jun 2022 17:21:07 +0800 Subject: [PATCH 2/4] [INLONG-437][Doc] Update the Agent Documents for release 1.2.0 --- docs/design_and_concept/how_to_write_plugin_agent.md | 4 ++-- .../current/design_and_concept/how_to_write_plugin_agent.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/design_and_concept/how_to_write_plugin_agent.md b/docs/design_and_concept/how_to_write_plugin_agent.md index c16eb420a33..7e360c57175 100644 --- a/docs/design_and_concept/how_to_write_plugin_agent.md +++ b/docs/design_and_concept/how_to_write_plugin_agent.md @@ -42,9 +42,9 @@ The Job/Task/Reader/Sink/Channel concept introduced above can be represented by - The developed Reader implements the logic of reading data and writing to Channel - The sink under development implements the logic of fetching data from the channel and writing it to the specified sink -## Programming must know interface +## Interface -Some of the plug-ins that will be developed below, the classes and interfaces that need to be known are as follows: +The following will introduce the classes and interfaces you need to know to develop an Agent plug-in. 
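Before walking through the individual interfaces, here is a minimal, self-contained sketch of a Reader-style plug-in. The `Reader` interface shown is a simplified stand-in for the real one in the InLong Agent source (which works with `Message` and `JobProfile`); only the read/finish/source methods are modelled, and all other names are illustrative:

```java
// Hypothetical single-message Reader sketch (illustrative only): it serves one
// in-memory line and then reports itself finished.
public class DemoReader {

    // Simplified stand-in for the Agent's Reader interface.
    interface Reader {
        String read();          // return one collected message
        boolean isFinished();   // true when the source is drained
        String getReadSource(); // e.g. the file name being read
    }

    static class SingleLineReader implements Reader {
        private final String source;
        private String pending;

        SingleLineReader(String source, String data) {
            this.source = source;
            this.pending = data;
        }

        @Override
        public String read() {
            String msg = pending;
            pending = null; // drained after the first read
            return msg;
        }

        @Override
        public boolean isFinished() {
            return pending == null;
        }

        @Override
        public String getReadSource() {
            return source;
        }
    }

    public static void main(String[] args) {
        Reader reader = new SingleLineReader("demo.txt", "hello inlong");
        while (!reader.isFinished()) {
            System.out.println(reader.getReadSource() + ": " + reader.read());
        }
    }
}
```

A real plug-in would additionally pull its configuration from the `JobProfile` and wrap each payload in a `Message` before handing it to the Channel.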
### Reader ```java diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_write_plugin_agent.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_write_plugin_agent.md index a01e6416f04..e89a69cfba3 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_write_plugin_agent.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_write_plugin_agent.md @@ -40,9 +40,9 @@ InLong Agent 是一个数据采集框架,采用 `Job` + `Task` 架构模型, - 开发对应的 Reader ,实现读取数据并写入到 Channel 的逻辑 - 开发对应的 Sink , 实现从 Channel 中取数并写入到指定 Sink 中的逻辑 -## 编程必知接口 +## 接口 -下面将介绍开发一款插件需要知道的类与接口,如下: +下面将介绍开发一个 Agent 插件需要知道的类与接口。 ### Reader ```java From 6c9a1fc68c82a88ce699912f25225938e712f631 Mon Sep 17 00:00:00 2001 From: dockerzhang Date: Tue, 21 Jun 2022 17:24:34 +0800 Subject: [PATCH 3/4] [INLONG-437][Doc] Update the Agent Documents for release 1.2.0 --- docs/modules/agent/metrics.md | 2 +- .../current/modules/agent/metrics.md | 12 ++++++------ 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/modules/agent/metrics.md b/docs/modules/agent/metrics.md index b25451d9e68..73c1f473e23 100644 --- a/docs/modules/agent/metrics.md +++ b/docs/modules/agent/metrics.md @@ -7,7 +7,7 @@ sidebar_position: 3 Agent provides the ability of monitoring indicators in JMX and Prometheus mode, and JMX mode is used by default. The monitoring indicators have been registered to MBeanServer Users can add similar JMX (port and authentication are adjusted according to the situation) to the startup parameters of the Agent to realize the collection of monitoring indicators from the remote end. 
-```Shell
+```shell
 -Dcom.sun.management.jmxremote
 -Djava.rmi.server.hostname=127.0.0.1
 -Dcom.sun.management.jmxremote.port=9999
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/metrics.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/metrics.md
index 89d4a525bf0..f4b1d218600 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/metrics.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/agent/metrics.md
@@ -5,14 +5,14 @@ sidebar_position: 3
 ## JMX 配置
 Agent 提供了 JMX 和 Prometheus 方式的监控指标能力,默认使用 JMX 方式。JMX 方式的监控指标已经注册到 MBeanServer
-用户可以在Agent的启动参数中增加如下类似JMX定义(端口和鉴权根据情况进行调整),实现监控指标从远端采集。
+用户可以在 Agent 的启动参数中增加如下类似 JMX 定义(端口和鉴权根据情况进行调整),实现监控指标从远端采集。
 
 ```shell
- -Dcom.sun.management.jmxremote
- -Djava.rmi.server.hostname=127.0.0.1
- -Dcom.sun.management.jmxremote.port=9999
- -Dcom.sun.management.jmxremote.authenticate=false
- -Dcom.sun.management.jmxremote.ssl=false
+-Dcom.sun.management.jmxremote
+-Djava.rmi.server.hostname=127.0.0.1
+-Dcom.sun.management.jmxremote.port=9999
+-Dcom.sun.management.jmxremote.authenticate=false
+-Dcom.sun.management.jmxremote.ssl=false
 ```
 
 ## Prometheus 配置

From e17d602882b20ce103ec974d64660d7bdc37de8f Mon Sep 17 00:00:00 2001
From: dockerzhang
Date: Tue, 21 Jun 2022 17:43:30 +0800
Subject: [PATCH 4/4] [INLONG-437][Doc] Update the Agent Documents for release 1.2.0

---
 .../design_and_concept/basic_concept.md       | 26 +++++++++----------
 ...md => how_to_extend_data_node_for_sort.md} |  0
 2 files changed, 13 insertions(+), 13 deletions(-)
 rename i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/{how_to_extend_data_node_for_sort.md.md => how_to_extend_data_node_for_sort.md} (100%)

diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/basic_concept.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/basic_concept.md
index 84bced18198..5eac2662213 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/basic_concept.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/basic_concept.md
@@ -3,16 +3,16 @@ title: 基本概念
 sidebar_position: 1
 ---
 
-| Name | Description | Other |
-|---------------------------|--------------------------------------------------------------|--------------------------------------------|
-| Standard Architecture | 标准架构,包含 InLong Agent/Manager/MQ/Sort/Dashboard 等所有 InLong 组件 | 适合海量数据、大规模生产环境 |
-| Lightweight Architecture | 轻量化架构,只包含 InLong Sort 一个组件,可以搭配 Manager/Dashboard 一起使用 | 轻量化架构简单、灵活,适合小规模数据 |
-| Group | 数据流组,包含多个数据流,一个Group 代表一个数据接入 | Group 有ID、Name 等属性 |
-| Stream | 数据流,一个数据流有具体的流向 | Stream 有ID、Name、数据字段等属性 |
-| Node | 数据节点,包括`Extract Node` 和 `Load Node`,分别代表数据源类型和数据流向目标类型 | |
-| InLongMsg | InLong 数据格式,如果从消息队列中直接消费,需要先进行`InLongMsg` 解析 | |
-| Agent | 标准架构使用 Agent 进行数据采集,Agent 代表不同类型的采集能力 | 包含文件 Agent、SQL Agent、Binlog Agent 等 |
-| DataProxy | 将接收到的数据转发到不同的消息队列 | 支持数据发送阻塞和落盘重发 |
-| Sort | 数据流分拣 | 主要有基于Flink的sort-flink,sort-standalone 本地分拣 |
-| TubeMQ | InLong自带的消息队列服务 | 也可以叫Tube,拥有低成本、高性能特性 |
-| Pulsar | 即[Apache Pulsar](https://pulsar.apache.org/), 高性能、高一致性消息队列服务 | |
\ No newline at end of file
+| Name | Description | Other |
+|---------------------------|--------------------------------------------------------------|-----------------------------------------------|
+| Standard Architecture | 标准架构,包含 InLong Agent/Manager/MQ/Sort/Dashboard 等所有 InLong 组件 | 适合海量数据、大规模生产环境 |
+| Lightweight Architecture | 轻量化架构,只包含 InLong Sort 一个组件,可以搭配 Manager/Dashboard 一起使用 | 轻量化架构简单、灵活,适合小规模数据 |
+| Group | 数据流组,包含多个数据流,一个Group 代表一个数据接入 | Group 有ID、Name 等属性 |
+| Stream | 数据流,一个数据流有具体的流向 | Stream 有ID、Name、数据字段等属性 |
+| Node | 数据节点,包括`Extract Node` 和 `Load Node`,分别代表数据源类型和数据流向目标类型 | |
+| InLongMsg | InLong 数据格式,如果从消息队列中直接消费,需要先进行`InLongMsg` 解析 | |
+| Agent | 标准架构使用 Agent 进行数据采集,Agent 代表不同类型的采集能力 | 包含文件 Agent、SQL Agent、Binlog Agent 等 |
+| DataProxy | 将接收到的数据转发到不同的消息队列 | 支持数据发送阻塞和落盘重发 |
+| Sort | 数据流分拣 | 主要有基于 Flink 的 sort-flink,sort-standalone 本地分拣 |
+| TubeMQ | InLong 自带的消息队列服务 | 也可以叫 Tube,拥有低成本、高性能特性 |
+| Pulsar | 即[Apache Pulsar](https://pulsar.apache.org/), 高性能、高一致性消息队列服务 | |
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_extend_data_node_for_sort.md.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_extend_data_node_for_sort.md
similarity index 100%
rename from i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_extend_data_node_for_sort.md.md
rename to i18n/zh-CN/docusaurus-plugin-content-docs/current/design_and_concept/how_to_extend_data_node_for_sort.md
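
Editor's note (not part of the patch series): the JMX flags updated in patch 3 expose the Agent's indicator MBeans for remote collection. A minimal sketch of reading MBeans through the standard `javax.management` API is shown below; it runs against the local JVM's platform MBeanServer for illustration, since a live Agent JMX endpoint (e.g. port 9999) would otherwise be assumed. The class name `JmxMetricsPeek` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class JmxMetricsPeek {
    // Count MBeans matching the given ObjectName pattern on the local
    // platform MBeanServer (the server the Agent registers its metrics to).
    static int countMBeans(String pattern) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            return server.queryNames(new ObjectName(pattern), null).size();
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad ObjectName pattern: " + pattern, e);
        }
    }

    public static void main(String[] args) {
        // The java.lang domain (Memory, Threading, Runtime, ...) is always present.
        System.out.println("java.lang MBeans: " + countMBeans("java.lang:*"));
    }
}
```

A remote collector would use `javax.management.remote.JMXConnectorFactory` with a `service:jmx:rmi:///jndi/rmi://host:port/jmxrmi` URL instead of the platform MBeanServer, matching the port configured in the `-Dcom.sun.management.jmxremote.port` flag above.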