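I ran a simple test that exports a single table containing 47 records in total.
The only setting I configured was the following, expecting the job to run 3 tasks:
"setting": {
    "speed": {
        "channel": 3
    }
},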
The actual result, however, was:
2020-11-26 10:51:18.160 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [16] tasks.
2020-11-26 10:51:18.160 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] splits to [16] tasks.
I expected 3 tasks, so why did the job end up with 16? I dug further into the code:
// adviceNumber is the channel count, assumed to be 3
// tableNumber is assumed to be 1
// the resulting eachTableShouldSplittedNumber is 3
private static int calculateEachTableShouldSplittedNumber(int adviceNumber, int tableNumber) {
    double tempNum = 1.0 * adviceNumber / tableNumber;
    return (int) Math.ceil(tempNum);
}
Why does the final number of tasks differ from the configured channel count?
// The final number of slices is not necessarily equal to eachTableShouldSplittedNumber
boolean needSplitTable = eachTableShouldSplittedNumber > 1
        && StringUtils.isNotBlank(splitPk);
if (needSplitTable) {
    if (tables.size() == 1) {
        // Previously: for a single table, the primary-key split count was num = num * 2 + 1
        // The "splitPk is null" case usually holds far less data than the real data,
        // so it is not recommended to factor it into the ratio with the channel count
        // eachTableShouldSplittedNumber = eachTableShouldSplittedNumber * 2 + 1; // the +1 should not be added, it causes a long tail (long tail: skew)
        // Consider another ratio? (splitPk is null, ignore this long tail)
        eachTableShouldSplittedNumber = eachTableShouldSplittedNumber * 5;
    }
    // Try to split each table into eachTableShouldSplittedNumber slices
    for (String table : tables) {
        tempSlice = sliceConfig.clone();
        tempSlice.set(Key.TABLE, table);
        List<Configuration> splittedSlices = SingleTableSplitUtil
                .splitSingleTable(tempSlice, eachTableShouldSplittedNumber);
        splittedConfigs.addAll(splittedSlices);
    }
}
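The arithmetic in the two snippets above can be traced with a small standalone sketch. It is not DataX code: `SplitCountSketch` and `expectedTaskCount` are hypothetical names. It also rests on two assumptions the quoted snippets do not show directly: that a splitPk was configured for the job (otherwise `needSplitTable` would be false), and that the single-table splitter appends one extra slice for rows where splitPk IS NULL, as the "splitPk is null" comments suggest. Under those assumptions, channel = 3 on one table gives 3 * 5 = 15 range slices plus 1 null slice, which matches the 16 tasks in the log.

```java
class SplitCountSketch {

    // Same arithmetic as calculateEachTableShouldSplittedNumber quoted in the issue.
    static int eachTableShouldSplittedNumber(int adviceNumber, int tableNumber) {
        return (int) Math.ceil(1.0 * adviceNumber / tableNumber);
    }

    // Hypothetical helper: estimates how many reader tasks the split step produces.
    static int expectedTaskCount(int channel, int tableNumber, boolean hasSplitPk) {
        int perTable = eachTableShouldSplittedNumber(channel, tableNumber);
        boolean needSplitTable = perTable > 1 && hasSplitPk;
        if (!needSplitTable) {
            // ASSUMPTION: without splitting, each table becomes exactly one task.
            return tableNumber;
        }
        if (tableNumber == 1) {
            // Single-table case from the quoted code: multiply by 5.
            perTable = perTable * 5;
        }
        // ASSUMPTION: each table gets one additional slice for "splitPk IS NULL".
        return tableNumber * (perTable + 1);
    }

    public static void main(String[] args) {
        System.out.println(expectedTaskCount(3, 1, true));   // prints 16
        System.out.println(expectedTaskCount(3, 1, false));  // prints 1
    }
}
```

This is only an estimate. The real splitter works on the actual primary-key range, so the number of slices it returns may differ for other tables or key types.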
There are three ways to configure higher channel concurrency within a job: