[INLONG-396][Release] Add blog for the 1.2.0 release #441

Merged: 6 commits, Jun 22, 2022
@@ -1,6 +1,5 @@
---
title: Release InLong 0.11.0
sidebar_position: 3
---

Apache InLong (incubating) was renamed from Apache TubeMQ (incubating) starting with version 0.9.0. With the name change, InLong has also been upgraded from a single message queue to a one-stop integration framework for massive data. InLong supports data collection, aggregation, caching, and sorting; with a simple configuration, users can import data from a data source into a real-time computing engine or land it in offline storage.
@@ -1,6 +1,5 @@
---
title: Release InLong 0.12.0
sidebar_position: 2
---

InLong, the sacred beast of Chinese mythology that draws rivers into the sea, is a metaphor for the InLong system's ability to provide data access.
@@ -15,7 +14,6 @@ The 0.12.0-incubating just released mainly includes the following:

This version closed more than 120 issues, including four major features and 35 improvements.


### Apache InLong(incubating) Introduction
[Apache InLong](https://inlong.apache.org) is a one-stop integration framework for massive data donated by Tencent to the Apache community. It provides automatic, safe, reliable, and high-performance data transmission capabilities to facilitate the construction of streaming-based data analysis, modeling, and applications.
The Apache InLong project was originally called TubeMQ, focusing on high-performance, low-cost message queuing services. To further unlock the ecosystem capabilities around TubeMQ, we upgraded the project to InLong, focusing on building a one-stop integration framework for massive data.
@@ -66,5 +64,3 @@ In subsequent versions, we will further enhance the capabilities of InLong to co
- Support the data access link for ClickHouse
- Support DB data collection
- The second phase of the full-link audit function


@@ -1,6 +1,5 @@
---
title: Release InLong 1.1.0
sidebar_position: 1
---

Apache InLong is a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities. InLong supports both batch and stream data processing at the same time, which offers great power to build data analysis, modeling and other real-time applications based on streaming data.
@@ -1,6 +1,5 @@
---
title: Analysis of InLong Sort ETL Solution Based on Apache Flink SQL
sidebar_position: 4
title: Analysis of InLong Sort ETL Solution
---

# Analysis of InLong Sort ETL Solution Based on Apache Flink SQL
@@ -9,7 +8,7 @@ sidebar_position: 4

With the increasing number of Apache InLong (incubating) users and developers, the demand for richer usage scenarios and low-cost operation keeps growing. Among these requests, adding Transform (T) to the whole InLong link has received the most feedback. After research and design by community developers @yunqingmoswu, @EMsnap, @gong, and @thexiay, the InLong Sort ETL solution based on Flink SQL has been completed. This article introduces the implementation details of that solution.

First of all, based on Apache Flink SQL, there are mainly the following considerations:
The choice of Apache Flink SQL was based mainly on the following considerations:

- Flink SQL offers high scalability and flexibility thanks to its powerful expressiveness. It can support most demand scenarios in the community, and when its built-in functions are not sufficient, we can extend them with various UDFs.
- Compared with implementations on Flink's low-level APIs, the development cost of Flink SQL is lower: the conversion logic to Flink SQL only needs to be implemented once, after which we can focus on building out Flink SQL capabilities such as connector extensions and UDFs.
@@ -56,7 +55,7 @@ The core concept refers to the explanation of terms in the outline design

| Name | Meaning |
| :-------------------------: | :----------------------------------------------------------: |
| InLong Dashborad | Inlong front end management interface |
| InLong Dashboard | Inlong front end management interface |
| InLong Manager Client | Wrap the interface in the manager for external user programs to call without going through the front-end inlong dashboard |
| InLong Manager Openapi | Inlong manager and external system call interface |
| InLong Manager metaData | Inlong manager metadata management, including metadata information of group and stream dimensions |
@@ -65,7 +64,7 @@ The core concept refers to the explanation of terms in the outline design
| InLong Stream | Data flow: a data flow has a specific flow direction |
| Stream Source | There are corresponding acquisition end and sink end in the stream. This design only involves the stream source |
| Stream Info | Abstract of data flow in sort, including various sources, transformations, destinations, etc. of the data flow |
| Group Info | Encapsulation of data flow in sort. A groupinfo can contain multiple stream infos |
| Group Info | Encapsulation of data flow in sort. A group info can contain multiple stream infos |
| Node | Abstraction of data source, data transformation and data destination in data synchronization |
| Extract Node | Source side abstraction of data synchronization |
| Load Node | Destination abstraction of data synchronization |
@@ -89,19 +88,19 @@ The core concept refers to the explanation of terms in the outline design

This design mainly involves the following entities:

GroupStreamGroupInfoStreamInfoNodeNodeRelationFieldRelationFunctionFilterFunctionSubstringFunctionFunctionParamFieldInfoMetaFieldInfoMySQLExtractNode、KafkaLoadNode and etc.
Group, Stream, GroupInfo, StreamInfo, Node, NodeRelation, FieldRelation, Function, FilterFunction, SubstringFunction, FunctionParam, FieldInfo, MetaFieldInfo, MySQLExtractNode, KafkaLoadNode, etc.

For ease of understanding, this section models and analyzes the relationships between entities. The entity correspondences of the domain model are as follows:

- One group corresponds to one groupinfo
- One group corresponds to one group info
- A group contains one or more streams
- One stream corresponds to one streaminfo
- A groupinfo contains one or more streaminfo
- A streaminfo contains multiple nodes
- A streaminfo contains one or more NodeRelations
- A noderelation contains one or more fieldrelations
- A NodeRelation contains 0 or more filterfunctions
- A fieldrelation contains one function or one fieldinfo as the source field and one fieldinfo as the target field
- One stream corresponds to one StreamInfo
- A GroupInfo contains one or more StreamInfo
- A StreamInfo contains multiple nodes
- A StreamInfo contains one or more NodeRelations
- A NodeRelation contains one or more FieldRelations
- A NodeRelation contains 0 or more FilterFunctions
- A FieldRelation contains one function or one FieldInfo as the source field and one FieldInfo as the target field
- A Function contains one or more FunctionParams
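
The containment rules above can be sketched as plain Java value objects. This is a hypothetical simplification for illustration; the real classes live in InLong Sort's data model module and carry many more fields.

```java
import java.util.List;

// Hypothetical, simplified mirror of the Sort domain model:
// a GroupInfo holds StreamInfos; a StreamInfo holds Nodes and NodeRelations.
class FieldInfo {
    String name;
    FieldInfo(String name) { this.name = name; }
}

class FieldRelation {
    FieldInfo source; // in the real model this may also be a Function
    FieldInfo target;
    FieldRelation(FieldInfo source, FieldInfo target) {
        this.source = source;
        this.target = target;
    }
}

class NodeRelation {
    List<FieldRelation> fieldRelations; // one or more
    NodeRelation(List<FieldRelation> fieldRelations) { this.fieldRelations = fieldRelations; }
}

class Node {
    String id;
    Node(String id) { this.id = id; }
}

class StreamInfo {
    List<Node> nodes;             // multiple nodes
    List<NodeRelation> relations; // one or more NodeRelations
    StreamInfo(List<Node> nodes, List<NodeRelation> relations) {
        this.nodes = nodes;
        this.relations = relations;
    }
}

class GroupInfo {
    List<StreamInfo> streams; // one or more StreamInfos
    GroupInfo(List<StreamInfo> streams) { this.streams = streams; }
}
```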

The above relationships can be represented in a UML object relationship diagram as:
@@ -134,7 +133,7 @@ The above relationship can be represented by UML object relationship diagram as:

## 3.3 Module Design

This design only adds Flink connector and flinksql generator to the original system, and modifies the data model module.
This design only adds Flink connector and Flink SQL generator to the original system, and modifies the data model module.

### 3.3.1 Module Structure

@@ -146,9 +145,9 @@ Description of important module division:

| Name | Description |
| :---------------: | :----------------------------------------------------------: |
| FlinkSQLParser | Used to generate flinksql core classes, including references to groupinfo |
| GroupInfo | The internal abstraction of sort for inlong group is used to encapsulate the synchronization related information of the entire inlong group, including the reference to list\<streaminfo\> |
| StreamInfo | The internal abstraction of sort to inlong stream is used to encapsulate inlong stream synchronization related information, including references to list\<node\>, list\<noderelation\> |
| FlinkSQLParser | Used to generate Flink SQL core classes, including references to GroupInfo |
| GroupInfo | The internal abstraction of sort for inlong group is used to encapsulate the synchronization related information of the entire inlong group, including the reference to list\<StreamInfo\> |
| StreamInfo | The internal abstraction of sort to inlong stream is used to encapsulate inlong stream synchronization related information, including references to list\<node\>, list\<NodeRelation\> |
| Node | The top-level interface of the synchronization node. Its subclass implementation is mainly used to encapsulate the data of the synchronization data source and the transformation node |
| ExtractNode | Data extract node abstraction, inherited from node |
| LoadNode | Data load node abstraction, inherited from node |
@@ -160,8 +159,8 @@ Description of important module division:
| SubstringFunction | Used for string interception function abstraction, inherited from function |
| FunctionParam | Abstraction for function parameters |
| ConstantParam | Encapsulation of function constant parameters, inherited from FunctionParam |
| FieldInfo | The encapsulation of node fields can also be used as function input parameters, inherited from functionparam |
| MetaFieldInfo | The encapsulation of built-in fields is currently mainly used in the metadata field scenario of canal JSON, which is inherited from fieldinfo |
| FieldInfo | The encapsulation of node fields can also be used as function input parameters, inherited from FunctionParam |
| MetaFieldInfo | The encapsulation of built-in fields is currently mainly used in the metadata field scenario of canal JSON, which is inherited from FieldInfo |

# 4. Detailed System Design

@@ -201,11 +200,8 @@ with
'password' = 'password',
'database-name' = 'inlong',
'table-name' = 'tableName')

```
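
The translation from extract-node properties to the DDL above can be pictured as straightforward string assembly. The helper below is a hypothetical illustration only; the actual FlinkSQLParser in Sort is considerably more involved.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: build a CREATE TABLE ... with (...) statement
// from a node's fields and connector options, the way a SQL generator might.
class DdlSketch {
    static String createTable(String table, Map<String, String> fields, Map<String, String> options) {
        String cols = fields.entrySet().stream()
                .map(e -> "`" + e.getKey() + "` " + e.getValue())
                .collect(Collectors.joining(", "));
        String opts = options.entrySet().stream()
                .map(e -> "'" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n  "));
        return "CREATE TABLE `" + table + "` (" + cols + ")\nwith (\n  " + opts + ")";
    }
}
```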



### 4.1.2 TransformNode Described in SQL

The node configuration is:
@@ -221,7 +217,6 @@ The node configuration is:
new FieldInfo("age", new IntFormatInfo()),
MoreThanOrEqualOperator.getInstance(), new ConstantParam(18))
);

```

The generated SQL is:
@@ -230,11 +225,8 @@ The generated SQL is:

```sql
SELECT `name` AS `name`,`age` AS `age` FROM `mysql_1` WHERE `age` < 25 AND `age` >= 18

```



### 4.1.3 LoadNode Described in SQL

The node configuration is:
@@ -267,7 +259,6 @@ The node configuration is:
new CanalJsonFormat(), null,
null, "id");
}

```

The generated SQL is:
@@ -287,11 +278,9 @@ with (
'canal-json-inlong.timestamp-format.standard' = 'SQL',
'canal-json-inlong.map-null-key.literal' = 'null'
)

```



## 4.2 Field T Described in SQL

### 4.2.1 Filter operator
@@ -304,12 +293,11 @@ The generated SQL is:

```sql
INSERT INTO `kafka_3` SELECT `name` AS `name`,`age` AS `age` FROM `mysql_1` WHERE `age` < 25 AND `age` >= 18

```
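
The way filter parameters become the WHERE clause above can be sketched as follows. The helper names are hypothetical; the real operator classes in Sort render themselves differently.

```java
// Hypothetical rendering of filter conditions into a SQL WHERE clause,
// mirroring how `age` < 25 AND `age` >= 18 is produced from filter params.
class FilterSketch {
    // Render a single condition such as `age` < 25.
    static String condition(String field, String op, Object value) {
        return "`" + field + "` " + op + " " + value;
    }

    // Join conditions with AND, as the filter operator does by default.
    static String whereClause(String... conditions) {
        return "WHERE " + String.join(" AND ", conditions);
    }
}
```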

### 4.2.2 Watermark

The complete configuration of groupinfo is as follows:
The complete configuration of GroupInfo is as follows:

**nodeconfig3**

102 changes: 102 additions & 0 deletions blog/2022-06-22-release-1.2.0.md
@@ -0,0 +1,102 @@
---
title: Release InLong 1.2.0
---

Apache InLong is a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities.
InLong supports both batch and stream data processing at the same time, which offers great power to build data analysis, modeling and other real-time applications based on streaming data.

## 1.2.0 Features Overview
**The just-released 1.2.0-incubating version closes more than 410 issues and contains more than 30 features and 190 optimizations.**
The main changes include the following:

### Enhance management and control capabilities
- Dashboard and Manager add cluster management capabilities
- Dashboard optimizes the flow creation process
- Manager supports plug-in extension of MQ

### Extended collection node
- Support for collecting data in Pulsar
- Support data collection in MongoDB-CDC
- Support data collection in MySQL-CDC
- Support data collection in Oracle-CDC
- Support data collection in PostgreSQL-CDC
- Support data collection in SQLServer-CDC

### Extended write node
- Support for writing data to Kafka
- Support for writing data to HBase
- Support for writing data to PostgreSQL
- Support for writing data to Oracle
- Support for writing data to MySQL
- Support for writing data to TDSQL-PostgreSQL
- Support for writing data to Greenplum
- Support for writing data to SQLServer

### Support data conversion
- Support String Split
- Support String Regular Replace
- Support String Regular Replace First Matched Value
- Support Data Filter
- Support Data Distinct
- Support Regular Join
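
As a rough picture of what the string transforms in the list do, here is plain Java for illustration. InLong expresses these as Flink SQL functions; the helper names below are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the 1.2.0 string transforms, expressed with
// java.util.regex semantics for illustration only.
class TransformSketch {
    // String Split: break a field into parts on a separator.
    static List<String> split(String value, String separatorRegex) {
        return Arrays.asList(value.split(separatorRegex));
    }

    // String Regular Replace: replace every regex match.
    static String regexReplace(String value, String regex, String replacement) {
        return value.replaceAll(regex, replacement);
    }

    // String Regular Replace First Matched Value: replace only the first match.
    static String regexReplaceFirst(String value, String regex, String replacement) {
        return value.replaceFirst(regex, replacement);
    }
}
```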

### Enhanced system monitoring function
- Support the reporting and management of data link heartbeat

### Other optimizations
- Optimize the HTTP request method in Manager Client
- Supports the delivery of DataProxy multi-cluster configurations
- GitHub Action check, pipeline optimization

## 1.2.0 Features Details

### Support multi-cluster management
Manager adds cluster management function, supports multi-cluster configuration, and solves the limitation that only one set of clusters can be defined through configuration files.
Users can create different types of clusters on Dashboard as needed.
The multi-cluster feature was mainly designed and implemented by @healchow, @luchunliang, and @leezng; thanks to these three contributors.

### Enrich the way of collecting file data and MySQL Binlog
Version 1.2.0 supports collecting complete file data, and also supports collecting data from a specified Binlog position in MySQL. This part of the work was done by @Greedyu.

### Support whole database migration
Sort supports migration of data across the entire database, contributed by @EMsnap.

### Supports writing data in Canal format
Support for writing data in Canal format to Kafka, contributed by @thexiay.

### Supports running SQL scripts
Sort supports running SQL scripts, see [INLONG-4405](https://github.com/apache/inlong/issues/4405), thanks to @gong for contributing this feature.

### Support the reporting and management of data link heartbeat
This version supports heartbeat reporting and management for data groups, data streams, and the underlying components, which is a prerequisite for the state management of each link in the system.
This feature was primarily designed and contributed by @baomingyu, @healchow and @kipshi.

### Manager supports the creation of resources in multiple flow directions
In version 1.2.0, Manager added support for creating the following storage resources:

- Create Topic for Kafka (contributed by @woofyzhao)
- Create databases and tables for Iceberg (contributed by @woofyzhao)
- Create namespaces and tables for HBase (contributed by @woofyzhao)
- Create databases and tables for ClickHouse (contributed by @lucaspeng12138)
- Create indices for Elasticsearch (contributed by @lucaspeng12138)
- Create databases and tables for PostgreSQL (contributed by @baomingyu)

### Sort supports lightweight architecture
Version 1.2.0 of Sort includes a lot of refactoring and improvement.
By introducing Flink CDC, it supports a variety of Extract and Load nodes, and also supports data transformation (i.e., Transform).
This feature contains many sub-features. The main developers are:
@baomingyu, @EMsnap, @GanfengTan, @gong, @lucaspeng12138, @LvJiancheng, @kipshi, @thexiay, @woofyzhao, @yunqingmoswu, thank you all for your contributions.

For more information, please refer to: [Analysis of InLong Sort ETL Solution](2022-06-16-inlong-sort-etl_en.md)

### Other features and bug fixes
For related content, please refer to the [Release Notes](https://github.com/apache/inlong/blob/master/CHANGES.md), which details the features, enhancements and bug fixes of this release.

## Apache InLong follow-up planning

In subsequent versions, we will expand more data sources and storages to cover more usage scenarios, and gradually improve the usability and robustness of the system, including:

- Heartbeat report of each component
- Status management of data flow
- Enhance system auditing and monitoring capabilities
- Expand more types of collection nodes and storage nodes
@@ -1,6 +1,5 @@
---
title: Release 0.11.0
sidebar_position: 3
---

Apache InLong (incubating) was renamed from Apache TubeMQ (incubating) starting with version 0.9.0. With the name change, InLong was also upgraded from a single message queue to a one-stop integration framework for massive data, supporting collection, aggregation, caching, and sorting; with a simple configuration, users can import data from a data source into a real-time computing engine or land it in offline storage.
@@ -1,6 +1,5 @@
---
title: Release 0.12.0
sidebar_position: 2
---

InLong (Yinglong): a divine beast in Chinese mythology that draws rivers into the sea, used as a metaphor for the InLong system's data access capabilities.
@@ -1,6 +1,5 @@
---
title: Release 1.1.0
sidebar_position: 1
---

Apache InLong (Yinglong) is a one-stop integration framework for massive data that provides automatic, secure, reliable, and high-performance data transmission capabilities. It supports both batch and stream processing, making it easy to build streaming-based data analysis, modeling, and applications. InLong supports collection, aggregation, caching, and sorting for big data; with a simple configuration, users can import data from a data source into a real-time computing engine or land it in offline storage.
@@ -1,6 +1,5 @@
---
title: Analysis of InLong Sort ETL Solution Based on Apache Flink SQL
sidebar_position: 4
title: Analysis of InLong Sort ETL Solution
---

# Analysis of InLong Sort ETL Solution Based on Apache Flink SQL