[Docs] Update the k8s cluster mode document #6500

Open
wants to merge 3 commits into
base: dev
356 changes: 48 additions & 308 deletions docs/en/start-v2/kubernetes/kubernetes.mdx
@@ -93,34 +93,6 @@ Load image to minikube via:
minikube image load seatunnel:2.3.5
```

</TabItem>

<TabItem value="Zeta (cluster-mode)">

```Dockerfile
FROM openjdk:8

ENV SEATUNNEL_VERSION="2.3.5"
ENV SEATUNNEL_HOME="/opt/seatunnel"

RUN wget https://dlcdn.apache.org/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-${SEATUNNEL_VERSION}-bin.tar.gz
RUN mv apache-seatunnel-${SEATUNNEL_VERSION} ${SEATUNNEL_HOME}
RUN mkdir -p $SEATUNNEL_HOME/logs
RUN cd ${SEATUNNEL_HOME} && sh bin/install-plugin.sh ${SEATUNNEL_VERSION}
```

Then run the following commands to build the image:
```bash
docker build -t seatunnel:2.3.5 -f Dockerfile .
```
The image `seatunnel:2.3.5` must be present on the host (minikube) for the deployment to succeed.

Load image to minikube via:
```bash
minikube image load seatunnel:2.3.5
```

</TabItem>
</Tabs>

@@ -171,10 +143,6 @@ flink-kubernetes-operator-5f466b8549-mgchb 1/1 Running 3 (23h
<TabItem value="Zeta (local-mode)">
none
</TabItem>

<TabItem value="Zeta (cluster-mode)">
none
</TabItem>
</Tabs>

## Run SeaTunnel Application
@@ -362,261 +330,6 @@ kubectl apply -f seatunnel.yaml
```

</TabItem>


<TabItem value="Zeta (cluster-mode)">

In this guide we are going to use [seatunnel.streaming.conf](https://github.com/apache/seatunnel/blob/2.3.5-release/config/v2.streaming.conf.template):

```conf
env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 2000
}

source {
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
  }
}
```

Create a configmap named `seatunnel-config` in Kubernetes from `seatunnel.streaming.conf` so that the configuration can be mounted into the pods:
```bash
kubectl create cm seatunnel-config \
--from-file=seatunnel.streaming.conf=seatunnel.streaming.conf
```

Next, load the configuration files used by the SeaTunnel cluster into configmaps.

First, create the following YAML files locally:

- Create `hazelcast-client.yaml`:

```yaml

hazelcast-client:
  cluster-name: seatunnel
  properties:
    hazelcast.logging.type: log4j2
  network:
    cluster-members:
      - localhost:5801

```
- Create `hazelcast.yaml`:

```yaml

hazelcast:
  cluster-name: seatunnel
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      tcp-ip:
        enabled: true
        member-list:
          - localhost
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50

```
- Create `seatunnel.yaml`:

```yaml
seatunnel:
  engine:
    history-job-expire-minutes: 1440
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 10000
      timeout: 60000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: file:///tmp/ # Ensure that the directory has write permission
```

Create configmaps for the configuration files using the following commands:

```bash
kubectl create configmap hazelcast-client --from-file=hazelcast-client.yaml
kubectl create configmap hazelcast --from-file=hazelcast.yaml
kubectl create configmap seatunnelmap --from-file=seatunnel.yaml

```

Deploy Reloader to enable hot redeployment.
Here we use Reloader to restart the pods automatically whenever a configuration file or other resource changes. Alternatively, you can put the final values into the configuration files up front and skip Reloader.

- [Reloader](https://github.com/stakater/Reloader/)

```bash
wget https://raw.githubusercontent.com/stakater/Reloader/master/deployments/kubernetes/reloader.yaml
kubectl apply -f reloader.yaml

```

- Create `seatunnel-cluster.yml`:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: seatunnel
spec:
  selector:
    app: seatunnel
  ports:
    - port: 5801
      name: seatunnel
  clusterIP: None
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: seatunnel
  annotations:
    configmap.reloader.stakater.com/reload: "hazelcast,hazelcast-client,seatunnelmap"
spec:
  serviceName: "seatunnel"
  replicas: 3 # modify replicas according to your case
  selector:
    matchLabels:
      app: seatunnel
  template:
    metadata:
      labels:
        app: seatunnel
    spec:
      containers:
        - name: seatunnel
          image: seatunnel:2.3.5
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 5801
              name: client
          command: ["/bin/sh","-c","/opt/seatunnel/bin/seatunnel-cluster.sh -DJvmOption=-Xms2G -Xmx2G"]
          resources:
            limits:
              cpu: "1"
              memory: 4G
            requests:
              cpu: "1"
              memory: 2G
          volumeMounts:
            - mountPath: "/opt/seatunnel/config/hazelcast.yaml"
              name: hazelcast
              subPath: hazelcast.yaml
            - mountPath: "/opt/seatunnel/config/hazelcast-client.yaml"
              name: hazelcast-client
              subPath: hazelcast-client.yaml
            - mountPath: "/opt/seatunnel/config/seatunnel.yaml"
              name: seatunnelmap
              subPath: seatunnel.yaml
            - mountPath: /data/seatunnel.streaming.conf
              name: seatunnel-config
              subPath: seatunnel.streaming.conf
      volumes:
        - name: hazelcast
          configMap:
            name: hazelcast
        - name: hazelcast-client
          configMap:
            name: hazelcast-client
        - name: seatunnelmap
          configMap:
            name: seatunnelmap
        - name: seatunnel-config
          configMap:
            name: seatunnel-config
            items:
              - key: seatunnel.streaming.conf
                path: seatunnel.streaming.conf
```

- Start the cluster:
```bash
kubectl apply -f seatunnel-cluster.yml
```
Then edit the SeaTunnel configuration used by the pods with the following command:

```bash
kubectl edit cm hazelcast
```
Change the `member-list` option to your cluster addresses.

This setup uses the headless service for access. Pods reach each other at addresses of the form `[pod-name].[service-name].[namespace].svc.cluster.local`.

For example:
```yaml
- seatunnel-0.seatunnel.default.svc.cluster.local
- seatunnel-1.seatunnel.default.svc.cluster.local
- seatunnel-2.seatunnel.default.svc.cluster.local
```
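
The address format above can also be generated rather than typed by hand. The following is a minimal sketch, not part of the original guide: `member_list` is a hypothetical helper, and the StatefulSet name, service name, namespace and replica count passed to it are assumptions taken from the example manifest.

```shell
#!/bin/sh
# Hypothetical helper: print one YAML list entry per StatefulSet pod in the
# [pod-name].[service-name].[namespace].svc.cluster.local format.
# Arguments: StatefulSet name, headless service name, namespace,
# replica count, and an optional port.
member_list() {
  sts="$1"; svc="$2"; ns="$3"; replicas="$4"; port="${5:-}"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    host="${sts}-${i}.${svc}.${ns}.svc.cluster.local"
    if [ -n "$port" ]; then
      echo "- ${host}:${port}"   # form used in hazelcast-client.yaml
    else
      echo "- ${host}"           # form used in hazelcast.yaml
    fi
    i=$((i + 1))
  done
}

# Entries for the 3-replica example in the default namespace
member_list seatunnel seatunnel default 3
```

Passing `5801` as a fifth argument yields the `host:port` form expected by the `cluster-members` option of `hazelcast-client.yaml`.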
```bash
kubectl edit cm hazelcast-client
```
Change the `cluster-members` option to your cluster addresses.

For example:
```yaml
- seatunnel-0.seatunnel.default.svc.cluster.local:5801
- seatunnel-1.seatunnel.default.svc.cluster.local:5801
- seatunnel-2.seatunnel.default.svc.cluster.local:5801
```
Afterwards, the pods restart automatically and pick up the updated SeaTunnel configuration.

```bash
kubectl edit cm hazelcast-client
```
Once all pods have finished updating, use the following command to check that the configuration inside a pod has been updated:

```bash
kubectl exec -it seatunnel-0 -- cat /opt/seatunnel/config/hazelcast-client.yaml
```
After that, jobs can be submitted from any pod:

```bash
kubectl exec -it seatunnel-0 -- /opt/seatunnel/bin/seatunnel.sh --config /data/seatunnel.streaming.conf
```
</TabItem>

</Tabs>

**See The Output**
@@ -729,39 +442,66 @@ To stop your job and delete your FlinkDeployment you can simply:
kubectl delete -f seatunnel.yaml
```
</TabItem>
</Tabs>

<TabItem value="Zeta (cluster-mode)">

You may follow the logs of your job, after a successful startup (which can take on the order of a minute in a fresh environment, seconds afterwards) you can:
## Zeta Cluster Mode
Zeta cluster mode runs multiple server (worker) nodes. All nodes belong to the same Hazelcast cluster, so they can connect to and exchange messages with each other.

```bash
kubectl exec -it seatunnel-1 -- tail -f /opt/seatunnel/logs/seatunnel-engine-server.log | grep ConsoleSinkWriter
Reference: https://docs.hazelcast.com/hazelcast/5.0/deploy/configuring-kubernetes

To create a Zeta cluster on Kubernetes, note two things:
1. Create a headless service.
2. Update `hazelcast.yaml` to use Kubernetes discovery.

Change the Hazelcast configuration files as follows.

`hazelcast.yaml`
```yaml
hazelcast:
  network:
    join:
      multicast:
        enabled: false
      kubernetes:
        enabled: true
        service-dns: <your headless service name>
        # service-port must be set (the Hazelcast default port is 5701) and must match your setup
        service-port: 5801
    port:
      auto-increment: false
      port: 5801
... <other configurations>
```

looks like the below (your content may be different since we use `FakeSource` to automatically generate random stream data):
`hazelcast-client.yaml`
```yaml
hazelcast-client:
  network:
    cluster-members:
      - <your headless service name>
... <other configurations>
```
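
The headless service from step 1 is not shown in this section. As a rough sketch, it could be written to a file as follows; the service name `seatunnel`, the `app: seatunnel` selector, the port 5801 and the file name `seatunnel-headless-service.yaml` are assumptions borrowed from the StatefulSet example and should be adapted to your own workload.

```shell
#!/bin/sh
# Write a headless Service manifest (clusterIP: None) to a local file.
# All names below are illustrative and must match your pod labels and
# the port configured in hazelcast.yaml.
cat > seatunnel-headless-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: seatunnel
spec:
  clusterIP: None        # headless: DNS resolves directly to the pod IPs
  selector:
    app: seatunnel       # must match the labels on the SeaTunnel pods
  ports:
    - name: hazelcast
      port: 5801
EOF

# Apply it to the cluster with:
#   kubectl apply -f seatunnel-headless-service.yaml
```

The name given under `metadata.name` is then the value that replaces the headless service placeholder in `hazelcast.yaml` and `hazelcast-client.yaml`.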

The pod start command is `sh bin/seatunnel-cluster.sh`.
You can then create the Deployment or StatefulSet; all pods will join the same Hazelcast cluster.
To verify, check `logs/seatunnel-engine-server.log` (or edit `config/log4j2.properties` so that log output is printed to the console).
A log entry like the following means the pods have connected to each other successfully:
```shell
...
2023-10-10 08:05:07,283 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=7: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : IibHk, 820962465
2023-10-10 08:05:07,283 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=8: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : lmKdb, 1072498088
2023-10-10 08:05:07,283 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=9: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : iqGva, 918730371
2023-10-10 08:05:07,284 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=10: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : JMHmq, 1130771733
2023-10-10 08:05:07,284 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=11: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : rxoHF, 189596686
2023-10-10 08:05:07,284 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=12: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : OSblw, 559472064
2023-10-10 08:05:07,284 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=13: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : yTZjG, 1842482272
2023-10-10 08:05:07,284 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=14: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : RRiMg, 1713777214
2023-10-10 08:05:07,284 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=15: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : lRcsd, 1626041649
2023-10-10 08:05:07,284 INFO org.apache.seatunnel.connectors.seatunnel.console.sink.ConsoleSinkWriter - subtaskIndex=1 rowIndex=16: SeaTunnelRow#tableId= SeaTunnelRow#kind=INSERT : QrNNW, 41355294
Members {size:<Your total pod number>, ver:<Your total pod number>} [
Member [Your Pod IP 1]:5801 - 2e2bd0dc-940f-4d97-85a5-8c2f909b1fe4
Member [Your Pod IP 2]:5801 - 4df94399-56e7-4715-8417-62b619405cfe this
...
]
```
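
To check the member count without reading the log by eye, the `Members {size:...}` line can be extracted with standard text tools. This is a minimal sketch under the assumption that the log line has the format shown above; `cluster_size` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a server log on stdin and print the size
# reported by the most recent "Members {size:N, ...}" line.
cluster_size() {
  grep -o 'Members {size:[0-9]*' | tail -n 1 | sed 's/.*size://'
}

# Example with a sample line in the format shown above
printf 'Members {size:3, ver:3} [\n' | cluster_size
```

Against a running pod, the same function could be fed from the log file, for example `kubectl exec <your pod name> -- cat /opt/seatunnel/logs/seatunnel-engine-server.log | cluster_size`, and the result compared with your replica count.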

After these steps, you have successfully set up a SeaTunnel cluster. You can then submit jobs via the [REST API](https://seatunnel.apache.org/docs/2.3.4/seatunnel-engine/rest-api) or a shell command.
An example of submitting a job with a shell command:
```
kubectl exec -it <your pod name> -- bash

To stop your job and delete your FlinkDeployment you can simply:
cd <SEATUNNEL_HOME>

```bash
kubectl delete -f seatunnel-cluster.yaml
sh bin/seatunnel.sh -c <the config file path>
```
</TabItem>
</Tabs>


Happy SeaTunneling!