Merge branch 'master' into zhaoyuguang_12
zhaoyuguang committed Jul 19, 2020
2 parents 71e3930 + d326326 commit 5ce95d4
Showing 58 changed files with 1,246 additions and 434 deletions.
15 changes: 5 additions & 10 deletions docs/content/faq/_index.cn.md
@@ -114,21 +114,16 @@ ElasticJob does not bundle Spring dependencies; please add the version you need yourself

Answer:

A task starts only when a dedicated Mesos agent can provide the required resources; otherwise it waits until enough resources are available.

**Added JOB APP API**

* Publish the job APP after the job is packaged and deployed.

* The job APP configuration parameters cpuCount and memoryMB represent the CPU and memory needed when the application starts.

**Adjusted JOB API**

* When adding a job, the packaged and deployed job APP must be published first.

* The job configuration parameters cpuCount and memoryMB represent the CPU and memory needed when the job runs.

## 15. For machines with multiple network interfaces, how does a task obtain its IP at startup, and can the network interface be configured?

Answer:
By default, ElasticJob takes the first available non-loopback IPv4 address in the network interface list. On machines with multiple interfaces, the address of a specific interface can be selected by setting the system property elasticjob.preferred.network.interface.
48 changes: 47 additions & 1 deletion docs/content/features/elastic.en.md
@@ -5,4 +5,50 @@ weight = 2
chapter = true
+++

TODO
Elastic scheduling is the most important feature of ElasticJob. As a job processing system, it enables horizontal scaling of jobs through sharding, which is also the origin of the project name "ElasticJob".

## Sharding

Sharding is the concept ElasticJob uses to split a job, enabling it to be executed in a distributed environment where every single server executes only the slices assigned to it.
ElasticJob is aware of the number of servers in a nearly real-time manner; as servers are added or removed, it re-assigns the job slices across them, so that efficiency grows with the resources.

To execute a job on distributed servers, the job is divided into multiple individual job items, and each server executes one or several of them.

For example, if a job is divided into 4 slices and there are two servers to execute it, each server is assigned 2 slices and undertakes 50% of the workload, as follows.

![Sharding Job](https://shardingsphere.apache.org/elasticjob/current/img/elastic/sharding.png)

### Sharding Item

ElasticJob does not provide data processing abilities directly; instead, it assigns sharding items to the job servers, and developers process the sharding items with their own business logic.
A sharding item is a number in the range [0, total number of slices - 1].

### Customized sharding options

Customized sharding options build a relationship with the sharding items, converting the sharding items' numbers into more readable business codes.

For example, suppose databases are split horizontally by region: database A stores data from Beijing, database B stores data from Shanghai, and database C stores data from Guangzhou.
If we configure only the sharding items' numbers, the developers need to know that 0 represents Beijing, 1 represents Shanghai and 2 represents Guangzhou.
Customized sharding options make the code more readable: with the customized options `0=Beijing,1=Shanghai,2=Guangzhou`, we can simply use `Beijing`, `Shanghai` and `Guangzhou` in the code.
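As a quick illustration, a customized option string like the one above can be parsed into a lookup table. This is a minimal, self-contained sketch; the class and method names are hypothetical and this is not ElasticJob's own parser:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class ShardingItemParameters {
    // Hypothetical helper: turns a customized sharding option string such as
    // "0=Beijing,1=Shanghai,2=Guangzhou" into a lookup from sharding item
    // number to business code.
    public static Map<Integer, String> parse(String parameters) {
        Map<Integer, String> result = new LinkedHashMap<>();
        for (String pair : parameters.split(",")) {
            // Split each "number=code" pair on the first equals sign only.
            String[] kv = pair.split("=", 2);
            result.put(Integer.parseInt(kv[0].trim()), kv[1].trim());
        }
        return result;
    }
}
```

With such a mapping in place, a job can look up `parse(options).get(shardingItem)` and branch on `Beijing`, `Shanghai` or `Guangzhou` instead of bare numbers.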

## Maximize the usage of resources

ElasticJob provides a flexible way to maximize the throughput of jobs.
When a new job server joins, ElasticJob learns of it from the registry and re-shards in the next scheduling cycle; the new server undertakes some of the job slices, as follows.

![scale out](https://shardingsphere.apache.org/elasticjob/current/img/elastic/sacle-out.png)

Configuring more sharding items than servers, ideally a multiple of the number of servers, makes it easier for the job to leverage the resources reasonably and to assign the sharding items dynamically.

For example, with 10 sharding items and 3 servers, the assignment is: server A = 0,1,2; server B = 3,4,5; server C = 6,7,8,9.
If server C goes down, then server A = 0,1,2,3,4 and server B = 5,6,7,8,9, maximizing the throughput without losing any sharding item.
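The assignment arithmetic in this example can be sketched as follows. `ReSharding.split` is a hypothetical helper that reproduces the contiguous split shown above (later servers take one extra item when the count does not divide evenly); it is not ElasticJob's actual sharding strategy implementation:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class ReSharding {
    // Hypothetical helper: deal sharding items [0, totalItems) out to servers
    // contiguously, matching the example A=[0,1,2], B=[3,4,5], C=[6,7,8,9].
    public static Map<String, List<Integer>> split(List<String> servers, int totalItems) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int base = totalItems / servers.size();
        int remainder = totalItems % servers.size();
        int next = 0;
        for (int i = 0; i < servers.size(); i++) {
            // The last `remainder` servers each take one extra item.
            int take = base + (i >= servers.size() - remainder ? 1 : 0);
            List<Integer> items = new ArrayList<>();
            for (int j = 0; j < take; j++) {
                items.add(next++);
            }
            result.put(servers.get(i), items);
        }
        return result;
    }
}
```

Calling `split(List.of("A", "B", "C"), 10)` and then `split(List.of("A", "B"), 10)` reproduces the before/after assignments in the example: no sharding item is lost when server C disappears.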

## High Availability

When a server goes down while executing a sharding item, the registry detects it and the sharding item is transferred to another live server, thus achieving high availability.
The unfinished job of the crashed server is transferred to another server and continues to execute, as follows.

![HA](https://shardingsphere.apache.org/elasticjob/current/img/elastic/ha.png)

Setting the total number of sharding items to 1 while using more than one server to execute the job makes the job run in `1` master, `n` slaves mode.
Once the server executing the job goes down, an idle server takes over and runs the job in the next scheduling cycle; better still, if the failover option is enabled, an idle server can take over the failed job immediately.
37 changes: 36 additions & 1 deletion docs/content/features/resource.en.md
@@ -5,4 +5,39 @@ weight = 3
chapter = true
+++

TODO
The resource allocation function is unique to ElasticJob-Cloud.

## Execution mode

ElasticJob-Cloud has two execution modes: transient execution and daemon execution.

### Transient execution

Resources are released immediately after each job execution finishes, so the same pool of resources can serve jobs at staggered times.
Resource allocation and container startup both take a certain amount of time, and resources may be insufficient when a job is due, so job execution may be delayed.
Transient execution is suitable for jobs with long intervals and high resource consumption that have no strict requirements on execution time.

### Daemon execution

A daemon job always occupies its allocated resources, whether it is running or waiting to run, which saves the cost of repeated container startup and resource allocation; it is suitable for jobs with short intervals and stable resource requirements.

## Scheduler

ElasticJob-Cloud is developed on top of the Mesos Framework and is responsible for resource scheduling and application distribution. It must be started independently as a service.

## Job Application

A job application is the application after the job is packaged and deployed; it describes basic information such as the CPU, memory, startup script and application download path needed to launch the job.
Each job application can contain one or more jobs.

## Job

A job is the specific task that is actually run; it shares the same job ecosystem as ElasticJob-Lite.
The job application must be registered before the job is registered.

## Resource

Resources are the CPU and memory required to start or run a job.
Configured at the job application level, they indicate the resources the whole application needs to start;
configured at the job level, they indicate the resources each job run needs.
The resources required to start a job are the sum of the resources required by the specified job application and those required by the job itself.
130 changes: 129 additions & 1 deletion docs/content/user-manual/elasticjob-lite/configuration/_index.cn.md
@@ -5,4 +5,132 @@ weight = 2
chapter = true
+++

TODO
Configuration lets you quickly and clearly understand the features provided by ElasticJob-Lite.

This chapter is the configuration reference manual of ElasticJob-Lite; consult it like a dictionary when needed.

ElasticJob-Lite provides 3 kinds of configuration methods for different use cases.

## Registry Center Configuration

### Configurable Properties

| Property                      | Type   | Default | Description                                  |
| ----------------------------- |:------ |:------- |:-------------------------------------------- |
| serverLists                   | String |         | List of ZooKeeper servers to connect to      |
| namespace                     | String |         | ZooKeeper namespace                          |
| baseSleepTimeMilliseconds     | int    | 1000    | Initial milliseconds of the retry interval   |
| maxSleepTimeMilliseconds      | int    | 3000    | Maximum milliseconds of the retry interval   |
| maxRetries                    | int    | 3       | Maximum number of retries                    |
| sessionTimeoutMilliseconds    | int    | 60000   | Session timeout in milliseconds              |
| connectionTimeoutMilliseconds | int    | 15000   | Connection timeout in milliseconds           |
| digest                        | String | no auth | Permission token for connecting to ZooKeeper |

### Core Properties

**serverLists:**

Includes IP addresses and ports; multiple addresses are separated by commas, for example: host1:2181,host2:2181


## Job Configuration

### Configurable Properties

| Property                      | Type       | Default        | Description                                                                          |
| ----------------------------- |:---------- |:-------------- |:------------------------------------------------------------------------------------ |
| jobName                       | String     |                | Job name                                                                             |
| shardingTotalCount            | int        |                | Total count of sharding items                                                        |
| cron                          | String     |                | CRON expression that controls the job trigger time                                   |
| shardingItemParameters        | String     |                | Customized sharding item parameters                                                  |
| jobParameter                  | String     |                | Custom job parameter                                                                 |
| monitorExecution              | boolean    | true           | Whether to monitor the job runtime status                                            |
| failover                      | boolean    | false          | Whether to enable failover                                                           |
| misfire                       | boolean    | true           | Whether to re-execute misfired jobs                                                  |
| maxTimeDiffSeconds            | int        | -1 (no check)  | Maximum tolerated seconds of clock difference between local host and registry center |
| reconcileIntervalMinutes      | int        | 10             | Interval in minutes at which the service repairs inconsistent job server status      |
| jobShardingStrategyType       | String     | AVG_ALLOCATION | Job sharding strategy type                                                           |
| jobExecutorServiceHandlerType | String     | CPU            | Job thread pool handler type                                                         |
| jobErrorHandlerType           | String     |                | Job error handler type                                                               |
| description                   | String     |                | Job description                                                                      |
| props                         | Properties |                | Job properties                                                                       |
| disabled                      | boolean    | false          | Whether the job is disabled at startup                                               |
| overwrite                     | boolean    | false          | Whether local configuration overwrites the registry center configuration             |

### Core Properties

**shardingItemParameters:**

The sharding item number and its parameter are separated by an equals sign, and multiple key-value pairs are separated by commas.
Sharding item numbers start from 0 and must not be greater than or equal to the total sharding count.
For example: 0=a,1=b,2=c

**jobParameter:**

With this parameter you can pass arguments to the business method invoked by the job schedule, in order to implement jobs with parameters,
for example: the amount of data to fetch per run, the primary key from which a job instance reads the database, and so on.

**monitorExecution:**

When each job execution and the interval between executions are both very short, it is recommended not to monitor the job runtime status, to improve efficiency.
Since the status is transient, monitoring it is unnecessary; users should add their own data backlog monitoring instead. In this case duplicate data selection cannot be prevented, so the job should be made idempotent.
When each job execution and the interval between executions are both long, monitoring the job runtime status is recommended, and it guarantees that data will not be selected repeatedly.

**maxTimeDiffSeconds:**

If the clock difference exceeds the configured seconds, an exception is thrown when the job starts.

**reconcileIntervalMinutes:**

In distributed scenarios, network and clock issues may cause the data in ZooKeeper to diverge from the jobs actually running, and such inconsistency cannot be completely avoided by forward validation alone.
A separate thread is therefore started to periodically verify the consistency between the registry center data and the real job status, maintaining the eventual consistency of ElasticJob.

Any value less than 1 disables the reconciliation.

**jobShardingStrategyType:**

For details, see the [built-in sharding strategies](/cn/user-manual/elasticjob-lite/configuration/built-in-strategy/sharding)

**jobExecutorServiceHandlerType:**

For details, see the [built-in thread pool strategies](/cn/user-manual/elasticjob-lite/configuration/built-in-strategy/thread-pool)

**jobErrorHandlerType:**

For details, see the [built-in error handler strategies](/cn/user-manual/elasticjob-lite/configuration/built-in-strategy/error-handler)

**props:**

For details, see the [job properties](/cn/user-manual/elasticjob-lite/configuration/props)

**disabled:**

Useful when deploying jobs: keep them disabled during deployment, then enable them together once deployment finishes.

**overwrite:**

If overwriting is allowed, the local configuration takes precedence every time the job starts.

## Job Listener Configuration

### Common Listener

Configurable properties: none

### Distributed Listener

Configurable properties

| Property                       | Type | Default        | Description                                                                 |
| ------------------------------ |:---- |:-------------- |:--------------------------------------------------------------------------- |
| started-timeout-milliseconds   | long | Long.MAX_VALUE | Timeout in milliseconds of the method executed before the last job starts   |
| completed-timeout-milliseconds | long | Long.MAX_VALUE | Timeout in milliseconds of the method executed after the last job completes |

## Event Tracing Configuration

### Configurable Properties

| Property | Type    | Default | Description                          |
| -------- |:------- |:------- |:------------------------------------ |
| type     | String  |         | Event tracing storage adapter type   |
| storage  | Generic |         | Event tracing storage adapter object |
@@ -5,4 +5,8 @@ weight = 2
chapter = true
+++

TODO
Through configuration, developers can quickly and clearly understand the functions provided by ElasticJob-Lite.

This chapter is the configuration reference manual of ElasticJob-Lite; consult it like a dictionary when needed.

ElasticJob-Lite provides 3 kinds of configuration methods for different situations.
@@ -0,0 +1,15 @@
+++
title = "Built-in Strategy"
weight = 4
chapter = true
+++

## Introduction

ElasticJob allows developers to extend strategies via SPI;
at the same time, ElasticJob also provides a number of built-in strategies for developers to use.

## Usage

Built-in strategies are configured by their type.
This chapter lists all of ElasticJob's built-in strategies grouped by function, for developers' reference.
@@ -0,0 +1,15 @@
+++
title = "Built-in Strategy"
weight = 4
chapter = true
+++

## Introduction

ElasticJob allows developers to extend strategies via SPI;
at the same time, ElasticJob also provides a number of built-in strategies for developers' convenience.

## Usage

The built-in strategies are configured by type.
This chapter lists all the built-in strategies of ElasticJob, grouped by function, for developers' reference.
@@ -0,0 +1,22 @@
+++
title = "Error Handler Strategy"
weight = 3
+++

## Log Strategy

Type: LOG

Log the job error and do not interrupt the job execution.

## Throw Strategy

Type: THROW

Throw a system exception and interrupt the job execution.

## Ignore Strategy

Type: IGNORE

Ignore the system exception and do not interrupt the job execution.
@@ -0,0 +1,22 @@
+++
title = "Error Handler Strategy"
weight = 3
+++

## Log Strategy

Type: LOG

Log error and do not interrupt job.

## Throw Strategy

Type: THROW

Throw system exception and interrupt job.

## Ignore Strategy

Type: IGNORE

Ignore exception and do not interrupt job.
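The behavior of the three strategies can be sketched with a tiny self-contained demo. The class below is purely illustrative; it is not the ElasticJob error handler SPI, just a model of "log and continue", "throw and interrupt", and "ignore and continue":

```java
public final class ErrorHandlerDemo {
    // Illustrative strategy types mirroring LOG, THROW and IGNORE above.
    public enum Strategy { LOG, THROW, IGNORE }

    public static String handle(Strategy strategy, RuntimeException cause) {
        switch (strategy) {
            case LOG:
                // Record the error; the job keeps running.
                return "logged: " + cause.getMessage();
            case THROW:
                // Rethrow; the job execution is interrupted.
                throw cause;
            case IGNORE:
            default:
                // Swallow the error; the job keeps running, nothing recorded.
                return "ignored";
        }
    }
}
```

In real configuration only the `jobErrorHandlerType` value (`LOG`, `THROW` or `IGNORE`) is set; the demo just makes the control-flow difference between the three explicit.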
@@ -0,0 +1,37 @@
+++
title = "Job Sharding Strategy"
weight = 1
+++

## Average Allocation Strategy

Type: AVG_ALLOCATION

Shards evenly by sharding item.

If the sharding items cannot be divided evenly by the number of job servers, the surplus items are assigned to the servers in sequence.

For example:
1. With 3 job servers and 9 sharding items, the result is: 1=[0,1,2], 2=[3,4,5], 3=[6,7,8]
2. With 3 job servers and 8 sharding items, the result is: 1=[0,1,6], 2=[2,3,7], 3=[4,5]
3. With 3 job servers and 10 sharding items, the result is: 1=[0,1,2,9], 2=[3,4,5], 3=[6,7,8]
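The average allocation rule can be sketched as a small self-contained helper. The class is hypothetical, not ElasticJob's actual strategy implementation, but it reproduces the three documented examples (servers numbered 1..n, surplus items appended to the first servers in sequence):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class AverageAllocation {
    // Hypothetical sketch of AVG_ALLOCATION: give every server an equal base
    // share of items in order, then append the surplus items one by one to the
    // first servers.
    public static Map<Integer, List<Integer>> shard(int serverCount, int shardingTotalCount) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        int itemsPerServer = shardingTotalCount / serverCount;
        int next = 0;
        for (int server = 1; server <= serverCount; server++) {
            List<Integer> items = new ArrayList<>();
            for (int i = 0; i < itemsPerServer; i++) {
                items.add(next++);
            }
            result.put(server, items);
        }
        // Surplus items (shardingTotalCount % serverCount of them) go to
        // servers 1, 2, ... in sequence.
        int surplus = shardingTotalCount % serverCount;
        for (int server = 1; server <= surplus; server++) {
            result.get(server).add(next++);
        }
        return result;
    }
}
```

`shard(3, 9)`, `shard(3, 8)` and `shard(3, 10)` yield exactly the three example results above.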

## Odevity Strategy

Type: ODEVITY

Shards in ascending or descending order of job server IP, depending on whether the hash of the job name is even or odd.

If the job name's hash is even, shards are assigned over the IP addresses in ascending order;
if the hash is odd, they are assigned in descending order.
This helps spread the load more evenly across servers when multiple jobs run together.

For example:
1. With 3 job servers, 2 sharding items and an even job name hash, the result is: 1 = [0], 2 = [1], 3 = []
2. With 3 job servers, 2 sharding items and an odd job name hash, the result is: 3 = [0], 2 = [1], 1 = []
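The parity rule can be sketched as follows. `serverOrder` is a hypothetical helper illustrating the ascending/descending choice, not the actual ODEVITY strategy class; it only decides the server order that the average split would then be applied to:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class OdevitySketch {
    // Hypothetical sketch of the ODEVITY idea: an even job-name hash keeps the
    // servers sorted by IP ascending, an odd hash reverses them to descending.
    public static List<String> serverOrder(String jobName, List<String> serversByIpAsc) {
        List<String> order = new ArrayList<>(serversByIpAsc);
        if (jobName.hashCode() % 2 != 0) {
            Collections.reverse(order);
        }
        return order;
    }
}
```

Because different job names flip the order independently, two jobs sharing the same servers tend to place their busiest sharding items on different machines.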

## Round Robin Strategy

Type: ROUND_ROBIN

Shards by round robin according to the job name.
