
executor: load data statement, separate data preparing routine and commit routine #11533

Merged
merged 9 commits, Sep 16, 2019

Conversation


@cfzjywxk cfzjywxk commented Jul 30, 2019

What problem does this PR solve?

Experimental diff based on #11517.

What is changed and how it works?

load data statement

  • separate the data preparing routine from the commit routine (this may allow concurrent commits across multiple txns in the future)
  • tests on different workloads
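The prepare/commit split described above can be sketched as a small producer/consumer pipeline. This is a minimal illustration, not the PR's actual implementation; the `commitTask` type, channel size, and batch size here are hypothetical stand-ins for the real ones in `executor/load_data.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// commitTask is a hypothetical stand-in for the PR's batch unit:
// a slice of prepared rows handed from the preparing routine to
// the commit routine over a channel.
type commitTask struct {
	rows []int
}

func main() {
	tasks := make(chan commitTask, 4) // bounded task queue
	var wg sync.WaitGroup

	// commit routine: drains batches while the preparer keeps producing
	committed := 0
	wg.Add(1)
	go func() {
		defer wg.Done()
		for t := range tasks {
			committed += len(t.rows) // stand-in for committing one txn
		}
	}()

	// data preparing routine: parse/encode rows, enqueue one task per batch
	const batchSize = 3
	batch := make([]int, 0, batchSize)
	for row := 0; row < 10; row++ {
		batch = append(batch, row)
		if len(batch) == batchSize {
			tasks <- commitTask{rows: batch}
			// allocate a fresh buffer: the committer still owns the old one,
			// which is why concurrent mode cannot reuse row buffers
			batch = make([]int, 0, batchSize)
		}
	}
	if len(batch) > 0 {
		tasks <- commitTask{rows: batch}
	}
	close(tasks)
	wg.Wait()
	fmt.Println("committed rows:", committed)
}
```

The fresh-buffer allocation per batch is also why the conclusions below note higher memory use in concurrent mode.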

bench workload1

sysbench schema
1m rows = 185MB
10m rows = 1.9GB

batch size = 20k

| data | mode | rt (sec) | batch size (rows) | prepare (us) | commit (us) | avg prepare (us) | avg commit (us) |
|---|---|---|---|---|---|---|---|
| 1m | concurrent | 27.56 | 20k | 5578125 | 27436263 | 111562 | 548725 |
| 1m | concurrent | 28.93 | 20k | 5646682 | 28809710 | 112933 | 576194 |
| 1m | concurrent | 28.59 | 20k | 6077118 | 28429610 | 121542 | 568592 |
| 1m | sequential | 33.56 | 20k | 5295292 | 28251475 | 105905 | 565029 |
| 1m | sequential | 31.64 | 20k | 5221046 | 26403697 | 104420 | 528073 |
| 1m | sequential | 32.74 | 20k | 5463142 | 27259847 | 109262 | 545196 |

batch size = 200k

| data | mode | rt (sec) | batch size (rows) | prepare (us) | commit (us) | avg prepare (us) | avg commit (us) |
|---|---|---|---|---|---|---|---|
| 1m | concurrent | 29.41 | 200k | 5740404 | 28639015 | 1148080 | 5727803 |
| 1m | sequential | 33.87 | 200k | 5580477 | 28238641 | 1116095 | 5647728 |

batch size = 4k

| data | mode | rt (sec) | batch size (rows) | prepare (us) | commit (us) | avg prepare (us) | avg commit (us) |
|---|---|---|---|---|---|---|---|
| 1m | concurrent | 35.79 | 4k | 5653526 | 35739674 | 22614 | 142958 |
| 1m | sequential | 37.52 | 4k | 5349413 | 32139162 | 21397 | 128556 |

data size = 10m

| data | mode | rt (sec) | batch size (rows) | prepare (us) | commit (us) | avg prepare (us) | avg commit (us) |
|---|---|---|---|---|---|---|---|
| 10m | concurrent | 299 | 20k | 62993307 | 299338296 | 125986 | 598676 |
| 10m | concurrent | 308 | 20k | 60778661 | 308309945 | 121557 | 616619 |
| 10m | concurrent | 310 | 20k | 60284459 | 309862211 | 120568 | 619724 |
| 10m | sequential | 371 | 20k | 64699475 | 306366299 | 129398 | 612732 |
| 10m | sequential | 348 | 20k | 62781449 | 285804457 | 125562 | 571608 |
| 10m | sequential | 346 | 20k | 61517956 | 285356229 | 123035 | 570712 |

bench workload2

In this workload the task generator routine and the commit routine cooperate well: producing and committing progress concurrently. The master branch does all of this sequentially.

```sql
CREATE TABLE `model` (
  `pk` int(11) DEFAULT NULL,
  `col1` int(11) DEFAULT NULL,
  `col2` int(11) DEFAULT NULL,
  ...
  `col99` int(11) DEFAULT NULL,
  KEY `pk` (`pk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
```

column content = 99999999

100columns 1m rows = 852MB
500columns 1m rows = 4.2GB

| data | mode | rt (sec) | batch size (rows) | prepare (us) | commit (us) | avg prepare (us) | avg commit (us) |
|---|---|---|---|---|---|---|---|
| 1m | concurrent | 80.44 | 20k | 55995202 | 79229461 | 1119904 | 1584589 |
| 1m | concurrent | 78.07 | 20k | 55592061 | 76927103 | 1111841 | 1538542 |
| 1m | concurrent | 83.41 | 40k | 55236614 | 81160118 | 2209464 | 3246404 |
| 1m | sequential | 121.60 | 20k | 50197243 | 71241453 | 1003944 | 1424829 |

500 columns

| data | mode | rt (sec) | batch size (rows) | prepare (us) | commit (us) | avg prepare (us) | avg commit (us) |
|---|---|---|---|---|---|---|---|
| 1m | concurrent | 457.73 | 20k | 268403660 | 452392863 | 5368073 | 9047857 |
| 1m | concurrent | 392.51 | 20k | 264366292 | 387069215 | 5287325 | 7741384 |
| 1m | sequential | 589.52 | 20k | 236941284 | 351913873 | 4738825 | 7038277 |
| 1m | sequential | 585.56 | 20k | 237462383 | 347632453 | 4749247 | 6952649 |

bench workload3

wide table with 100 columns (all BLOB or TEXT fields), big column values

work queue size = 1000 (max number of waiting jobs)
(txn limit 100MB)

schema like

```sql
CREATE TABLE `tin_n_10wb` (
  `pk` int(11) DEFAULT NULL,
  `col1` blob DEFAULT NULL,
  `col2` text DEFAULT NULL,
  ...
  `col95` blob DEFAULT NULL,
  `col96` text DEFAULT NULL,
  ...
```

column content like
len("G2jOVOMOg3Cnub2jCuXtn0AO1V5hONfLftt0cBHX74itPKMm2uVj1Ax1GfykLLR6PxGclMhPRQxoCe8ZvYI096goa6CsxKHiBvpl7SOp0ax5vl1QNYUZq2x2") = 120

100k rows = 1.2GB

| data | mode | rt (sec) | batch size (rows) | prepare (us) | commit (us) | avg prepare (us) | avg commit (us) |
|---|---|---|---|---|---|---|---|
| 100k | concurrent | 24.70 | 5k (raftstore.raft-max-inflight-msgs) | 22384533 | 23709265 | 1119226 | 1185463 |
| 100k | sequential | 43.15 | 5k (reconfiguring raftstore.raft-max-inflight-msgs) | 20503449 | 22651435 | 1025172 | 1132571 |
| 100k | concurrent | 21.87 | 1k | 20900265 | 21574010 | 209002 | 215740 |
| 100k | sequential | 51.21 | 1k | 19181370 | 31973015 | 191813 | 319730 |
| 100k | sequential | 108.33 | 1k | 19770424 | 88490433 | 197704 | 884904 |
| 100k | concurrent | 23.52 | 200 | 20573994 | 23393318 | 41147 | 46786 |
| 100k | sequential | 40.94 | 200 | 19083036 | 21767787 | 38166 | 43535 |

some conclusions up to now

  • avg prepare and commit times per batch are higher in concurrent mode than with sequential processing.
  • memory cost: separate processing consumes more memory, since in sequential processing some row buffers can be reused.
  • batch size impacts performance; the default 20k batch size works well in most situations.
  • in workload3, batch size = 5k and batch size = 1k perform very differently; with batch size = 5k the logs contain many errors.

From the Grafana monitoring data we can see that prewrite on KV is quite slow; the root cause is that the prewrite request itself is slow (32 sec with the 5k batch size).

tikv/tikv#5279 explains why prewrite is slow on TiKV when executing workload3 (big row size). This can be solved by increasing the TiKV raftstore.raft-max-inflight-msgs configuration (from the default to 4096), which allows more unacked Raft messages in the queue; after changing this config, the 5k batch size performs normally (see the workload3 table above).
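For reference, `raftstore.raft-max-inflight-msgs` lives in the `[raftstore]` section of the TiKV config file. The fragment below is a sketch of the change discussed above; the value 4096 is taken from this thread, not a general recommendation.

```toml
# tikv.toml (fragment) -- sketch, value from the discussion above
[raftstore]
# Allow more in-flight (un-acked) Raft messages per peer so large
# prewrite batches are not throttled.
raft-max-inflight-msgs = 4096
```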

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change

Side effects

  • Increased code complexity

Related changes

@cfzjywxk (Contributor Author)

/run-all-tests

executor/load_data.go Outdated

ngaut commented Jul 31, 2019

Could you do a test for many columns(such as 100 columns) ?

@cfzjywxk (Contributor Author)

> Could you do a test for many columns (such as 100 columns)?

OK, will construct such test scenarios and profile detailed CPU/memory usage.


cfzjywxk commented Aug 1, 2019

@jackysp @ngaut PTAL

@ngaut ngaut removed their request for review August 1, 2019 05:38
@cfzjywxk cfzjywxk requested a review from zz-jason August 1, 2019 08:15

cfzjywxk commented Aug 1, 2019

/run-all-tests

executor/load_data.go Outdated
```go
// EnqOneTask feed one batch commit task to commit routine
func (e *LoadDataInfo) EnqOneTask() {
	if e.curBatchCnt > 0 {
		logutil.BgLogger().Info("tried to enqueue one task current len", zap.Uint64("len", e.curBatchCnt))
```
Member:
We could make the committed batch size configurable. It is better to set it to debug level.

executor/load_data.go Outdated
executor/load_data.go Outdated
```go
// InitQueues initialize task queue and error report queue
func (e *LoadDataInfo) InitQueues() {
	e.commitTaskQueue = make(chan commitTask, 1000)
	e.commitErrQueue = make(chan error, 5)
```
Contributor:

`make(chan error)`
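The suggestion above is to make the error channel unbuffered. A minimal, self-contained sketch of the behavioral difference (the channel names here are illustrative, not from the PR):

```go
package main

import "fmt"

func main() {
	// buffered: the sender can report an error and move on
	// even if nobody is receiving yet
	buffered := make(chan error, 5)
	buffered <- fmt.Errorf("worker failed")
	fmt.Println("buffered send returned immediately")

	// unbuffered: a send blocks until a receiver is ready,
	// so error delivery is synchronized with its handling
	unbuffered := make(chan error)
	go func() { unbuffered <- fmt.Errorf("worker failed") }()
	fmt.Println("received:", <-unbuffered)
}
```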

executor/load_data.go Outdated
server/conn.go Outdated
```go
	}
}()
// start commit worker routine
go loadDataInfo.CommitRoutine(ctx)
```
Contributor:

Why not make the load process into a separate goroutine and keep the commit process in this goroutine?

Contributor Author:

changed PTAL

server/conn.go Outdated
@zz-jason (Member) left a comment:

please fix ci.


codecov bot commented Aug 5, 2019

Codecov Report

Merging #11533 into master will not change coverage.
The diff coverage is n/a.

```
@@             Coverage Diff             @@
##             master     #11533   +/-   ##
===========================================
  Coverage   81.2905%   81.2905%
===========================================
  Files           454        454
  Lines         98720      98720
===========================================
  Hits          80250      80250
  Misses        12725      12725
  Partials       5745       5745
```

@cfzjywxk cfzjywxk requested a review from jackysp August 5, 2019 14:06

cfzjywxk commented Aug 5, 2019

/run-all-tests

```go
@@ -286,6 +286,9 @@ const (

	// TiDBEnableNoopFuncs set true will enable using fake funcs(like get_lock release_lock)
	TiDBEnableNoopFuncs = "tidb_enable_noop_functions"

	// TiDBLoadDataSeqProcess set true will make load data process and commit sequentially in one routine
	TiDBLoadDataSeqProcess = "tidb_load_data_seq_process"
```
Member:

Please add some tests for it.

```go
defer func() {
	r := recover()
	if r != nil {
		buf := make([]byte, 4096)
```
Member:

Zap log library has a Stack function to get stack tracing.
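The quoted code copies the stack into a fixed 4096-byte buffer by hand. A stdlib-only sketch of the same recover-and-log pattern is below; in TiDB's code the reviewer's suggestion would attach the trace via zap's `zap.Stack` field instead of `runtime/debug.Stack()`, but the structure is the same.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// doWork panics; the deferred recover logs the panic together with a
// complete stack trace instead of a manually sized buffer.
// (With zap, zap.Stack("stack") attaches the equivalent trace as a field.)
func doWork() (recovered bool) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Printf("panic recovered: %v\nstack:\n%s\n", r, debug.Stack())
			recovered = true
		}
	}()
	panic("boom")
}

func main() {
	fmt.Println("recovered:", doWork())
}
```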

Contributor Author:

roger, changed

```go
logutil.Logger(ctx).Error("commit error refresh", zap.Error(err))
	return err
}
// this only set in sequential mode, e.rows buffer will be reused in sequential mode
```
Member:

What is sequential mode?

```go
	return err
	}
}
logutil.Logger(ctx).Debug("one task enqueued ", zap.Int("current queue len", len(e.commitTaskQueue)))
```
Member:

Remove debug message.

```go
	break
}
tasks++
logutil.Logger(ctx).Debug("commit one task finished",
```
Member:

Ditto.

```go
	zap.Int("pending tasks count", len(e.commitTaskQueue)))
} else {
	end = true
	logutil.Logger(ctx).Info("commit work all finished",
```
Member:

Ditto.

server/conn.go Outdated
```go
}
if err = loadDataInfo.Ctx.StmtCommit(); err != nil {
	return nil, err
if enqTask {
```
Member:

We do not need to use enqTask argument.

Contributor Author:

param removed

server/conn.go Outdated
```go
defer func() {
	r := recover()
	if r != nil {
		buf := make([]byte, 4096)
```
Member:

Use zap library instead.


ngaut commented Aug 21, 2019

Please do not force push your commit. Just commit is fine.

server/conn.go Outdated
executor/load_data.go

```go
// these fields are used for pipeline data prepare and commit
commitTaskQueue chan CommitTask
QuitCommit      chan struct{}
```
Member:

I think we can do it by only using one signal channel.
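The single-signal-channel idea above is usually done by closing one `chan struct{}`: a single `close` broadcasts to every receiver, so a separate quit channel per direction is unnecessary. A minimal sketch under those assumptions (names hypothetical, not the PR's fields):

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	quit := make(chan struct{}) // the single signal channel
	tasks := make(chan int, 8)
	var wg sync.WaitGroup

	processed := 0
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-quit:
				// one close(quit) wakes every worker blocked here
				return
			case t, ok := <-tasks:
				if !ok {
					return // normal shutdown: task channel closed and drained
				}
				processed += t
			}
		}
	}()

	for i := 1; i <= 3; i++ {
		tasks <- i
	}
	close(tasks) // normal path; on error the producer would close(quit) once instead
	wg.Wait()
	fmt.Println("processed sum:", processed)
}
```

Closing a channel is a one-shot broadcast, which is exactly what a quit signal needs; sending individual values would require one send per listener.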

executor/load_data.go
server/conn.go

cfzjywxk commented Sep 5, 2019

ping @lysu @jackysp PTAL thx


cfzjywxk commented Sep 5, 2019

/run-unit-test

@jackysp jackysp added the status/can-merge Indicates a PR has been approved by a committer. label Sep 16, 2019

sre-bot commented Sep 16, 2019

/run-all-tests


sre-bot commented Sep 16, 2019

@cfzjywxk merge failed.


jackysp commented Sep 16, 2019

/rebuild

Labels
sig/execution (SIG execution), status/can-merge (approved by a committer), status/LGT1 (has LGTM 1), type/performance

9 participants