Add sharding scaling english documents
KomachiSion committed Mar 25, 2020
1 parent 0975300 commit 4fb8f19
Showing 7 changed files with 142 additions and 8 deletions.
8 changes: 4 additions & 4 deletions sharding-scaling/README.md
@@ -8,7 +8,7 @@ The following figure may clearly express this component's role:

Supplementary notes about the figure:

1. Only whole tables in the sharding configuration can be migrated; migrating specified tables is not supported.
1. Only tables in the shardingRule configuration can be migrated; migrating specified tables is not supported.

2. The migration process is split into two steps: inventory data migration and incremental data synchronization.

@@ -26,10 +26,10 @@ Sharding-Proxy: 3.x ~ 5.x

## How to Run

Refer to the [Quick Start](./src/resources/document/manual/quick-start.cn.md)
Refer to the [Quick Start](./src/resources/document/manual/quick-start.en.md)

## For more documents

[Overview](./src/resources/document/features/_index.cn.md)
[Overview](./src/resources/document/features/_index.en.md)

[Principle](./src/resources/document/features/principle.cn.md)
[Principle](./src/resources/document/features/principle.en.md)
8 changes: 4 additions & 4 deletions sharding-scaling/README_ZH.md
@@ -8,7 +8,7 @@

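Supplementary notes about the figure:

1. Only migrating the whole database is supported; migrating specified tables is not supported yet.
1. Only tables configured in shardingRule can be migrated; migrating specified tables is not supported yet.
2. The migration process has two steps: historical data migration and real-time data migration.
- During historical data migration, Sharding-Scaling uses `select *` statements to fetch data and `insert` statements to migrate the data to the target database;
- During real-time data migration, Sharding-Scaling uses the binlog to migrate data, and it marks the binlog position before the migration starts.
@@ -23,11 +23,11 @@ Sharding-Proxy: 3.x ~ 5.x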

## How to Run

Refer to the [Quick Start](./src/resources/Quick%20Start_zh.md)
Refer to the [Quick Start](./src/resources/document/manual/quick-start.cn.md)

## More Documents

[Admin Guide](./src/resources/Admin%20Guide_zh.md)
[Overview](./src/resources/document/features/_index.cn.md)

[ShardingScaling Architecture](./src/resources/Architecture_zh.md)
[Principle](./src/resources/document/features/principle.cn.md)

2 changes: 2 additions & 0 deletions sharding-scaling/src/resources/document/features/_index.cn.md
@@ -19,6 +19,8 @@ chapter = true

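ShardingScaling is a general-purpose solution offered to users for migrating data into ShardingSphere and for elastic scaling.

Available to users since **4.1.0**.

![Structure Overview](../img/scaling-overview.cn.png)

## Challenges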
41 changes: 41 additions & 0 deletions sharding-scaling/src/resources/document/features/_index.en.md
@@ -0,0 +1,41 @@
+++
pre = "<b>3.7. </b>"
title = "Scaling"
weight = 7
chapter = true
+++

## Background

The storage and computing capacity of a stand-alone database is limited. To go beyond these limits, ShardingSphere provides sharding, which distributes data across different databases.

For applications that already run on a stand-alone database, the problem is how to migrate data to sharded data nodes safely and simply.

For applications that already use ShardingSphere, rapid data growth may cause a single data node, or even all data nodes, to reach a bottleneck.
How to expand the data nodes of a ShardingSphere cluster then becomes a problem as well.

## Introduction

ShardingScaling is a general solution, available since **4.1.0**, for migrating data to ShardingSphere or scaling data in ShardingSphere.

![Scaling Overview](../img/scaling-overview.en.png)

## Challenges

ShardingSphere gives users great freedom in sharding strategies and algorithms, but this freedom makes scaling difficult.
The first challenge is therefore to find an approach that supports the various sharding strategies and algorithms and still scales data nodes efficiently.

Moreover, the scaling process should not affect running applications.
The second challenge is to shrink the window of data unavailability during scaling as much as possible, ideally so that applications are completely unaware of it.

Finally, scaling should not affect the existing data. How to ensure the availability and correctness of data is the third challenge of scaling.

## Goal

The main design goal of ShardingScaling is to provide a general ShardingSphere scaling solution that supports the various sharding strategies and reduces the impact of scaling as much as possible.

## Status

ShardingScaling is currently in alpha development.

![Roadmap](../img/roadmap.en.png)
22 changes: 22 additions & 0 deletions sharding-scaling/src/resources/document/features/concept.en.md
@@ -0,0 +1,22 @@
+++
pre = "<b>3.7.1. </b>"
toc = true
title = "Core Concept"
weight = 1
+++

## Scaling Job

A scaling job refers to one complete process of scaling data from the old sharding rules to the new sharding rules.

## Data Node

Same as the [Data Node](/en/features/sharding/concept/sql/) in sharding/SQL.

## Inventory Data

Inventory data refers to all existing data stored in the data nodes before the scaling job started.

## Incremental Data

Incremental data refers to the new data generated by applications during the scaling job.
@@ -0,0 +1,10 @@
+++
pre = "<b>3.7.2. </b>"
toc = true
title = "Core Features"
weight = 2
+++

1. Migrate data from a single data source to ShardingSphere when adopting ShardingSphere for the first time.
2. Expand or shrink the data nodes of ShardingSphere.
3. Change the sharding strategy of ShardingSphere.
59 changes: 59 additions & 0 deletions sharding-scaling/src/resources/document/features/principle.en.md
@@ -0,0 +1,59 @@
+++
pre = "<b>3.7.3. </b>"
toc = true
title = "Principle"
weight = 3
+++

## Principle

Considering these challenges, the ShardingScaling solution is to use two database clusters temporarily and to switch over after scaling is complete.

![Scaling Principle Overview](../img/scaling-principle-overview.en.png)


Advantages:

1. No effect on the original data during scaling.
2. No risk from a scaling failure.
3. Not limited by sharding strategies.

Disadvantages:

1. Redundant servers during scaling.
2. All data needs to be moved.

ShardingScaling analyzes the sharding rules and extracts information such as data sources and data nodes.
According to the sharding rules, ShardingScaling creates a scaling job with 4 main phases.

1. Preparing Phase.
2. Inventory Phase.
3. Incremental Phase.
4. Switching Phase.

![Workflow](../img/workflow.en.png)

### Preparing Phase

ShardingScaling checks data source connectivity and permissions, counts the amount of inventory data, records the log position, and splits the work into tasks based on the amount of inventory data and the parallelism set by the user.
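
The task-splitting step can be sketched as follows. This is a minimal illustration, not ShardingScaling's actual implementation: it assumes each table has a numeric primary key whose minimum and maximum are known, and both the function name `split_inventory_tasks` and the even-range strategy are hypothetical.

```python
from typing import List, Tuple


def split_inventory_tasks(min_id: int, max_id: int, parallelism: int) -> List[Tuple[int, int]]:
    """Split the inclusive key range [min_id, max_id] into at most
    `parallelism` contiguous, non-overlapping ranges of similar size.
    (Hypothetical helper; assumes a numeric primary key.)"""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if max_id < min_id:
        return []  # empty table: nothing to migrate
    total = max_id - min_id + 1
    task_count = min(parallelism, total)
    base, extra = divmod(total, task_count)
    tasks = []
    start = min_id
    for i in range(task_count):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first tasks
        tasks.append((start, start + size - 1))
        start += size
    return tasks


tasks = split_inventory_tasks(1, 10, 3)
print(tasks)
```

Splitting by key range keeps the tasks independent of each other, so each one can be run by a separate worker in the inventory phase.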

### Inventory Phase

This phase executes the inventory data migration tasks that were split up in the preparing phase.
ShardingScaling uses JDBC to query inventory data directly from the data nodes and writes it to the new cluster using the new rules.
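
A single inventory task might look like the sketch below. It uses Python's built-in `sqlite3` module with in-memory databases in place of JDBC connections to real data nodes. The table `t_order`, the modulo-based `route` rule, and the function names are assumptions made for illustration only.

```python
import sqlite3


def route(order_id: int, node_count: int) -> int:
    """Hypothetical new sharding rule: pick the target node by modulo."""
    return order_id % node_count


def migrate_inventory(source, targets, id_range):
    """Copy rows whose order_id lies in the inclusive id_range from the
    source node to the target nodes selected by the new rule."""
    low, high = id_range
    rows = source.execute(
        "SELECT order_id, status FROM t_order WHERE order_id BETWEEN ? AND ?",
        (low, high),
    )
    copied = 0
    for order_id, status in rows:
        target = targets[route(order_id, len(targets))]
        target.execute(
            "INSERT INTO t_order (order_id, status) VALUES (?, ?)",
            (order_id, status),
        )
        copied += 1
    for target in targets:
        target.commit()
    return copied


def new_node():
    """Create an in-memory database standing in for one data node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_order (order_id INTEGER PRIMARY KEY, status TEXT)")
    return conn


source = new_node()
source.executemany("INSERT INTO t_order VALUES (?, ?)", [(i, "NEW") for i in range(1, 7)])
source.commit()
targets = [new_node(), new_node()]
copied = migrate_inventory(source, targets, (1, 6))
```

Because the rows are read from the old node and routed by the new rule at write time, the sketch does not depend on how the old cluster was sharded.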

### Incremental Phase

The data in the data nodes is still changing during the inventory phase, so ShardingScaling needs to synchronize this incremental data to the new data nodes.
Different databases have different implementations, but synchronization is generally implemented with change data capture based on replication protocols or WAL logs.

- MySQL: subscribe to and parse the binlog.
- PostgreSQL: official logical replication with [test_decoding](https://www.postgresql.org/docs/9.4/test-decoding.html).

ShardingSphere also writes this captured incremental data to the new cluster using the new rules.
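
Replaying captured changes can be sketched as follows. The sketch assumes that a capture component (for example a binlog parser or a logical decoding client) has already turned each change into a simple dictionary. That event format, the in-memory dictionaries standing in for target data nodes, and the modulo `route` rule are all hypothetical.

```python
from typing import Dict, List

Node = Dict[int, dict]  # primary key -> row, standing in for one target data node


def route(key: int, node_count: int) -> int:
    """Hypothetical new sharding rule: modulo on the primary key."""
    return key % node_count


def apply_events(events: List[dict], nodes: List[Node]) -> None:
    """Replay captured changes, in log order, against the new cluster."""
    for event in events:
        node = nodes[route(event["key"], len(nodes))]
        if event["type"] in ("insert", "update"):
            # Upsert, so replaying a change that the inventory copy
            # already picked up does not fail.
            node[event["key"]] = event["row"]
        elif event["type"] == "delete":
            node.pop(event["key"], None)
        else:
            raise ValueError(f"unknown event type: {event['type']}")


nodes: List[Node] = [{}, {}]
events = [
    {"type": "insert", "key": 7, "row": {"status": "NEW"}},
    {"type": "update", "key": 7, "row": {"status": "PAID"}},
    {"type": "insert", "key": 8, "row": {"status": "NEW"}},
    {"type": "delete", "key": 8},
]
apply_events(events, nodes)
```

Treating inserts and updates as upserts is a design choice of this sketch: the log position is recorded before the inventory copy starts, so some changes may be seen by both the inventory copy and the replay.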

### Switching Phase

In this phase there may be a temporary read-only period, which keeps the data in the old data nodes static so that the incremental phase can complete fully.
The read-only period ranges from seconds to minutes, depending on the amount of data and on data checking.
After that, ShardingSphere can switch the configuration through the register-center and config-center, so that applications use the new sharding rule and the new data nodes.
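
The ordering of the switching steps can be illustrated with a toy model. The class and attribute names below are hypothetical, log positions are reduced to plain integers, and the real configuration switch through the register-center and config-center is represented only by a string field.

```python
class ScalingSwitch:
    """Toy model of the switching phase; all names here are hypothetical."""

    def __init__(self, source_position: int, applied_position: int):
        self.source_position = source_position    # last log position written on the old cluster
        self.applied_position = applied_position  # last log position replayed on the new cluster
        self.read_only = False                    # whether the old cluster rejects writes
        self.active_rule = "old"

    def write(self) -> None:
        """An application write against the old cluster advances its log."""
        if self.read_only:
            raise RuntimeError("old cluster is read-only during switching")
        self.source_position += 1

    def replay_one(self) -> None:
        """Replay one pending change on the new cluster."""
        if self.applied_position < self.source_position:
            self.applied_position += 1

    def switch(self) -> None:
        self.read_only = True  # 1. freeze the old data nodes
        while self.applied_position < self.source_position:
            self.replay_one()  # 2. drain the remaining incremental data
        self.active_rule = "new"  # 3. point applications at the new rule


job = ScalingSwitch(source_position=105, applied_position=100)
job.write()   # one more write arrives before the switch starts
job.switch()
```

In this model the length of the read-only period is the time needed to drain the backlog, which is why a smaller replay lag before switching means a shorter period.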
