[Doc][Improve]Support Chinese for /seatunnel-engine/rest-api.md and local-mode.md and cluster-mode.md and checkpoint-storage.md #6445

Merged 3 commits on Mar 11, 2024
187 changes: 187 additions & 0 deletions docs/zh/seatunnel-engine/checkpoint-storage.md
@@ -0,0 +1,187 @@
---
sidebar_position: 7
---

# Checkpoint Storage

## Introduction

Checkpointing is a fault-tolerance recovery mechanism. It ensures that a running program can recover on its own even when it unexpectedly encounters an exception.

### Checkpoint Storage

Checkpoint storage is the storage mechanism used to persist checkpoint data.

SeaTunnel Engine supports the following checkpoint storage types:

- HDFS (OSS, S3, HDFS, LocalFile)
- LocalFile (native) (deprecated: use HDFS (LocalFile) instead)

We use the microkernel design pattern to separate the checkpoint storage module from the engine, which allows users to implement their own checkpoint storage modules.

`checkpoint-storage-api` is the checkpoint storage module API; it defines the interface of the checkpoint storage module.

If you want to implement your own checkpoint storage module, implement `CheckpointStorage` and provide the corresponding `CheckpointStorageFactory` implementation.

### Checkpoint Storage Configuration

The configuration of the `seatunnel-server` module is in the `seatunnel.yaml` file.

```yaml
seatunnel:
  engine:
    checkpoint:
      storage:
        type: hdfs # plugin name of the checkpoint storage; hdfs (S3, local, hdfs) is supported; the default, localfile, is deprecated
        # plugin configuration
        plugin-config:
          namespace: # parent path of the checkpoint storage, default is /seatunnel/checkpoint/
          K1: V1 # other plugin configuration
          K2: V2 # other plugin configuration
```

Note: the namespace must end with "/".

#### OSS

Alibaba Cloud OSS is based on hdfs-file, so you can refer to the [Hadoop OSS documentation](https://hadoop.apache.org/docs/stable/hadoop-aliyun/tools/hadoop-aliyun/index.html) to configure OSS.

In addition to interacting with OSS buckets, the OSS client needs the credentials required to interact with the buckets.
The client supports multiple authentication mechanisms and can be configured as to which mechanisms to use and their order of use. Custom implementations of org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider may also be used.
If you use AliyunCredentialsProvider (obtainable from the Alibaba Cloud Access Key Management), the credentials consist of an access key and a secret key.
You can configure it like this:

```yaml
seatunnel:
  engine:
    checkpoint:
      interval: 6000
      timeout: 7000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: oss
          oss.bucket: your-bucket
          fs.oss.accessKeyId: your-access-key
          fs.oss.accessKeySecret: your-secret-key
          fs.oss.endpoint: endpoint address
          fs.oss.credentials.provider: org.apache.hadoop.fs.aliyun.oss.AliyunCredentialsProvider
```

For additional reading on the Hadoop Credential Provider API, see: [Credential Provider API](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).

For Alibaba Cloud OSS credential provider implementations, see: [Auth Credential Providers](https://github.com/aliyun/aliyun-oss-java-sdk/tree/master/src/main/java/com/aliyun/oss/common/auth).

#### S3

S3 is based on hdfs-file, so you can refer to the [Hadoop S3 documentation](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html) to configure S3.

In addition to interacting with public S3 buckets, the S3A client needs the credentials required to interact with the buckets.
The client supports multiple authentication mechanisms and can be configured as to which mechanisms to use and their order of use. Custom implementations of com.amazonaws.auth.AWSCredentialsProvider may also be used.
If you use SimpleAWSCredentialsProvider (obtainable from the Amazon Security Token Service), the credentials consist of an access key and a secret key.
You can configure it like this:

```yaml
seatunnel:
  engine:
    checkpoint:
      interval: 6000
      timeout: 7000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: s3
          s3.bucket: your-bucket
          fs.s3a.access.key: your-access-key
          fs.s3a.secret.key: your-secret-key
          fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
```

If you use `InstanceProfileCredentialsProvider`, which supports instance profile credentials when running in an EC2 VM, you can check [iam-roles-for-amazon-ec2](https://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html).
You can configure it like this:

```yaml
seatunnel:
  engine:
    checkpoint:
      interval: 6000
      timeout: 7000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: s3
          s3.bucket: your-bucket
          fs.s3a.endpoint: your-endpoint
          fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.InstanceProfileCredentialsProvider
```

For additional reading on the Hadoop Credential Provider API, see: [Credential Provider API](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).

#### HDFS

If you use HDFS, you can configure it like this:

```yaml
seatunnel:
  engine:
    checkpoint:
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: hdfs
          fs.defaultFS: hdfs://localhost:9000
          # if you use kerberos, you can configure it like this:
          kerberosPrincipal: your-kerberos-principal
          kerberosKeytabFilePath: your-kerberos-keytab
```

If HDFS is in HA mode, you can configure it like this:

```yaml
seatunnel:
  engine:
    checkpoint:
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: hdfs
          fs.defaultFS: hdfs://usdp-bing
          seatunnel.hadoop.dfs.nameservices: usdp-bing
          seatunnel.hadoop.dfs.ha.namenodes.usdp-bing: nn1,nn2
          seatunnel.hadoop.dfs.namenode.rpc-address.usdp-bing.nn1: usdp-bing-nn1:8020
          seatunnel.hadoop.dfs.namenode.rpc-address.usdp-bing.nn2: usdp-bing-nn2:8020
          seatunnel.hadoop.dfs.client.failover.proxy.provider.usdp-bing: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
```

If HDFS has other configuration in `hdfs-site.xml` or `core-site.xml`, just set the HDFS configuration with the `seatunnel.hadoop.` prefix.
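
For example, a minimal sketch of forwarding one extra Hadoop client setting this way (here `dfs.replication`, a standard HDFS property chosen purely for illustration):

```yaml
seatunnel:
  engine:
    checkpoint:
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: hdfs
          fs.defaultFS: hdfs://localhost:9000
          # any hdfs-site.xml / core-site.xml entry can be passed through with the seatunnel.hadoop. prefix
          seatunnel.hadoop.dfs.replication: 3
```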

#### LocalFile

```yaml
seatunnel:
  engine:
    checkpoint:
      interval: 6000
      timeout: 7000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          storage.type: hdfs
          fs.defaultFS: file:/// # please make sure the directory has write permission
```

21 changes: 21 additions & 0 deletions docs/zh/seatunnel-engine/cluster-mode.md
@@ -0,0 +1,21 @@
---
sidebar_position: 3
---

# Run Jobs in Cluster Mode

This is the most recommended way to use SeaTunnel Engine in a production environment. This mode supports the full functionality of SeaTunnel Engine, and cluster mode offers better performance and stability.

In cluster mode, you first need to deploy a SeaTunnel Engine cluster, and then the client submits jobs to the SeaTunnel Engine cluster to run.

## Deploy a SeaTunnel Engine Cluster

To deploy a SeaTunnel Engine cluster, see [SeaTunnel Engine Cluster Deployment](../../en/seatunnel-engine/deployment.md).
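
Once the cluster package is installed on each node, starting a node is typically a single command; the sketch below assumes the `bin/seatunnel-cluster.sh` launcher from the standard distribution, with `-d` running the engine server as a daemon:

```shell
# run on every node that should join the SeaTunnel Engine cluster
$SEATUNNEL_HOME/bin/seatunnel-cluster.sh -d
```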

## Submit a Job

```shell
$SEATUNNEL_HOME/bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template
```

25 changes: 25 additions & 0 deletions docs/zh/seatunnel-engine/local-mode.md
@@ -0,0 +1,25 @@
---
sidebar_position: 2
---

# Run Jobs in Local Mode

Only for testing purposes.

The most recommended way to use SeaTunnel Engine in a production environment is [cluster mode](cluster-mode.md).

## Deploy SeaTunnel Engine in Local Mode

For deploying SeaTunnel Engine in local mode, see the [local deployment guide](../../en/start-v2/locally/deployment.md).

## Modify the SeaTunnel Engine Configuration

Set `auto-increment` in `$SEATUNNEL_HOME/config/hazelcast.yaml` to `true`.
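
A minimal sketch of the relevant fragment (the surrounding keys follow Hazelcast's standard `network` section; the rest of the file stays unchanged):

```yaml
hazelcast:
  network:
    port:
      # let the local node pick the next free port instead of failing
      # when the default port is already taken
      auto-increment: true
      port: 5801
```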

## Submit a Job

```shell
$SEATUNNEL_HOME/bin/seatunnel.sh --config $SEATUNNEL_HOME/config/v2.batch.config.template -e local
```