Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature][connector][fake] Support mutil splits for fake source connector #2974

Merged
merged 21 commits into from
Oct 14, 2022

Conversation

liugddx
Copy link
Member

@liugddx liugddx commented Oct 2, 2022

close #2961

Purpose of this pull request

feature Support more than splits and parallelism for fake connector.

Check list

@liugddx liugddx marked this pull request as draft October 2, 2022 23:04
@liugddx liugddx marked this pull request as ready for review October 2, 2022 23:41

public FakeSourceReader(SingleSplitReaderContext context, FakeDataGenerator randomData) {
public FakeSourceReader(SourceReader.Context context, FakeDataGenerator randomData) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public FakeSourceReader(SourceReader.Context context, FakeDataGenerator randomData) {
public FakeSourceReader(SourceReader.Context context, FakeDataGenerator fakeDataGenerator) {

Copy link
Member

@TyrantLucifer TyrantLucifer Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename randomData to fakeDataGenerator?

private final Map<Integer, Set<FakeSourceSplit>> pendingSplits;
private final Integer totalRowNum;

public FakeSourceSplitEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext, Integer totalRowNum) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public FakeSourceSplitEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext, Integer totalRowNum) {
public FakeSourceSplitEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext, int totalRowNum) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the review comment in FakeSourceSplitEnumerator, this parameter maybe need to be removed.

private static final Logger LOG = LoggerFactory.getLogger(FakeSourceSplitEnumerator.class);
private final SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext;
private final Map<Integer, Set<FakeSourceSplit>> pendingSplits;
private final Integer totalRowNum;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final Integer totalRowNum;
private final int totalRowNum;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the review comment in FakeSourceSplitEnumerator, this parameter maybe need to be removed.

@Data
@AllArgsConstructor
public class FakeSourceSplit implements SourceSplit {
private Integer rowNum;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private Integer rowNum;
private int rowNum;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the review comment in FakeSourceSplitEnumerator, this parameter maybe need to be removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed?

LOG.info("Starting to calculate splits.");
int numReaders = enumeratorContext.currentParallelism();

if (null != totalRowNum) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using the parameter row.size to control the row num of each reader not the total number? The disadvantage of slicing now is that the user needs to calculate how many data will be generated by each parallel task during configuration, our original intention is to make user configuration simpler, the user only needs to configure how many data will be generated by each slice, so that the slice only needs one parameter id, then only need to use the id to take the balance of the parallelism when the slice is assigned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fake should be simple,so a simple sharding approach is needed here, and the configuration file does not need to be changed.

@@ -57,9 +58,9 @@ private SeaTunnelRow randomRow() {
return new SeaTunnelRow(randomRow.toArray());
}

public List<SeaTunnelRow> generateFakedRows() {
public List<SeaTunnelRow> generateFakedRows(int rowNum) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the review comment in FakeSourceSplitEnumerator, this method maybe need to revert because every split has the same row num.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert?

@liugddx liugddx closed this Oct 4, 2022
@liugddx liugddx deleted the improve-supportparallelism branch October 4, 2022 08:07
@liugddx liugddx restored the improve-supportparallelism branch October 4, 2022 08:10
@liugddx liugddx reopened this Oct 4, 2022
@liugddx
Copy link
Member Author

liugddx commented Oct 5, 2022

The reason why CI error is #2984 , please review. @TyrantLucifer @EricJoy2048

@Data
@AllArgsConstructor
public class FakeSourceSplit implements SourceSplit {
private Integer rowNum;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed?

@@ -57,9 +58,9 @@ private SeaTunnelRow randomRow() {
return new SeaTunnelRow(randomRow.toArray());
}

public List<SeaTunnelRow> generateFakedRows() {
public List<SeaTunnelRow> generateFakedRows(int rowNum) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert?

@@ -51,7 +51,17 @@ public SeaTunnelRowType getProducedType() {
}

@Override
public AbstractSingleSplitReader<SeaTunnelRow> createReader(SingleSplitReaderContext readerContext) throws Exception {
public SourceSplitEnumerator<FakeSourceSplit, FakeSourceState> createEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext) throws Exception {
return new FakeSourceSplitEnumerator(enumeratorContext, fakeConfig.getRowNum());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return new FakeSourceSplitEnumerator(enumeratorContext, fakeConfig.getRowNum());
return new FakeSourceSplitEnumerator(enumeratorContext, fakeConfig);


@Override
public SourceSplitEnumerator<FakeSourceSplit, FakeSourceState> restoreEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext, FakeSourceState checkpointState) throws Exception {
return new FakeSourceSplitEnumerator(enumeratorContext, fakeConfig.getRowNum());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return new FakeSourceSplitEnumerator(enumeratorContext, fakeConfig.getRowNum());
return new FakeSourceSplitEnumerator(enumeratorContext, fakeConfig);

FakeSourceSplit split = splits.poll();
if (null != split) {
// Generate a random number of rows to emit.
List<SeaTunnelRow> seaTunnelRows = fakeDataGenerator.generateFakedRows(split.getRowNum());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
List<SeaTunnelRow> seaTunnelRows = fakeDataGenerator.generateFakedRows(split.getRowNum());
List<SeaTunnelRow> seaTunnelRows = fakeDataGenerator.generateFakedRows();

LOG.info("Starting to calculate splits.");
int numReaders = enumeratorContext.currentParallelism();
for (int i = 0; i < numReaders; i++) {
allSplit.add(new FakeSourceSplit(rowNum, i));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
allSplit.add(new FakeSourceSplit(rowNum, i));
allSplit.add(new FakeSourceSplit(i));

@@ -46,7 +47,7 @@ public void testComplexSchemaParse(String conf) throws FileNotFoundException, UR
SeaTunnelRowType seaTunnelRowType = seaTunnelSchema.getSeaTunnelRowType();
FakeConfig fakeConfig = FakeConfig.buildWithConfig(testConfig);
FakeDataGenerator fakeDataGenerator = new FakeDataGenerator(seaTunnelSchema, fakeConfig);
List<SeaTunnelRow> seaTunnelRows = fakeDataGenerator.generateFakedRows();
List<SeaTunnelRow> seaTunnelRows = fakeDataGenerator.generateFakedRows(fakeConfig.getRowNum());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
List<SeaTunnelRow> seaTunnelRows = fakeDataGenerator.generateFakedRows(fakeConfig.getRowNum());
List<SeaTunnelRow> seaTunnelRows = fakeDataGenerator.generateFakedRows(fakeConfig);

@liugddx
Copy link
Member Author

liugddx commented Oct 5, 2022

please review again @TyrantLucifer


public FakeSourceReader(SingleSplitReaderContext context, FakeDataGenerator randomData) {
public FakeSourceReader(SourceReader.Context context, FakeDataGenerator randomData) {
Copy link
Member

@TyrantLucifer TyrantLucifer Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename randomData to fakeDataGenerator?


@Override
public String splitId() {
return splitId.toString();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return splitId.toString();
return String.valueOf(splitId);

@TyrantLucifer TyrantLucifer changed the title feature Support more than splits and parallelism for fake connector [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector Oct 5, 2022
@Override
public void handleNoMoreSplits() {
noMoreSplit = true;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove useless blank line.

TyrantLucifer
TyrantLucifer previously approved these changes Oct 6, 2022
Copy link
Member

@TyrantLucifer TyrantLucifer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, cc @EricJoy2048

@EricJoy2048
Copy link
Member

Please fix the CI error

@liugddx
Copy link
Member Author

liugddx commented Oct 6, 2022

Please fix the CI error

Error: JobMasterTest.testHandleCheckpointTimeout:128 » ConditionTimeout Assertion con...

Maybe time setting is too short?

@EricJoy2048
Copy link
Member

EricJoy2048 commented Oct 7, 2022

Please fix the CI error

CI error will be fixed by #3009. I will retry the CI after #3009 merged.

@EricJoy2048
Copy link
Member

Please fix the CI error

CI error will be fixed by #3009. I will retry the CI after #3009 merged.

You need merge the dev branch to your branch first and then push it .

@liugddx liugddx requested review from EricJoy2048 and TyrantLucifer and removed request for EricJoy2048 and TyrantLucifer October 8, 2022 04:59
@liugddx
Copy link
Member Author

liugddx commented Oct 8, 2022

rerun CI Thanks. @EricJoy2048

@EricJoy2048
Copy link
Member

Please wait #3019 merged. The CI error because #3019.

@EricJoy2048
Copy link
Member

Some bug we must fix to resolve the CI problems. Please wait for us.

@liugddx
Copy link
Member Author

liugddx commented Oct 10, 2022

rerun CI please. @EricJoy2048

EricJoy2048
EricJoy2048 previously approved these changes Oct 12, 2022
Copy link
Member

@EricJoy2048 EricJoy2048 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liugddx
Copy link
Member Author

liugddx commented Oct 12, 2022

@ashulin help to review .

@ashulin ashulin changed the title [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector [feature][connector][fake] Support mutil splits for fake source connector Oct 12, 2022
ashulin
ashulin previously approved these changes Oct 13, 2022

private final SourceReader.Context context;
private final Deque<FakeSourceSplit> splits = new LinkedList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private final Deque<FakeSourceSplit> splits = new LinkedList<>();
private final Queue<FakeSourceSplit> splits = new LinkedList<>();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused deque api?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused deque api?

cc @ashulin

liugddx and others added 3 commits October 13, 2022 12:24
…e/seatunnel/connectors/seatunnel/fake/source/FakeSourceReader.java

Co-authored-by: hailin0 <hailin088@gmail.com>
… improve-supportparallelism

# Conflicts:
#	seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSource.java
#	seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceReader.java
#	seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java
Copy link
Member

@EricJoy2048 EricJoy2048 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@hailin0
Copy link
Member

hailin0 commented Oct 13, 2022

+1

@Hisoka-X Hisoka-X merged commit c28c44b into apache:dev Oct 14, 2022
@liugddx liugddx deleted the improve-supportparallelism branch October 14, 2022 02:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Improve][Connector-V2][Fake] Support more than splits and parallelism for fake connector .
6 participants