Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tests] Add end-to-end tests for cdc connector uber jars #594

Merged
merged 4 commits into from
Nov 14, 2021

Conversation

wuchong
Copy link
Member

@wuchong wuchong commented Nov 10, 2021

We introduced flink-sql-connectors-xxx for all mysql-cdc, postgres-cdc and mongodb-cdc. This module is used to shade kafka and build an uber jar which is mainly used for SQL clusters. However, we don't have any tests for the uber jars, e.g. does the shading work? do we exclude any needed classes? This can be covered by e2e tests.

@wuchong wuchong linked an issue Nov 10, 2021 that may be closed by this pull request
@wuchong
Copy link
Member Author

wuchong commented Nov 10, 2021

This pull request introduced a new module flink-cdc-e2e-tests for end-to-end testing purpose. It startup a flink cluster using docker containers, a mysql docker container, and submit SQL job using the compiled mysql-cdc uber jars. This could verify either the uber jar contains all the needed dependencies or class conflicts.

Currently, this pull request only add e2e tests for mysql-cdc. I will add tests for other cdc connectors later.

@leonardBang leonardBang removed their request for review November 10, 2021 10:36
@wuchong wuchong force-pushed the e2e-tests branch 2 times, most recently from 0ff7ace to 466a658 Compare November 11, 2021 13:55
We have to fallback to v1.15.3 testcontainers, because Azure Pipeline always complaining the following error when starting the container:

```
04:18:43,894 [main] INFO  🐳 [jark/oracle-xe-11g-r2-cdc:0.1]                           [] - Waiting for database connection to become available at jdbc:oracle:thin:system/oracle@localhost:49165:xe using query 'SELECT 1 FROM DUAL'
04:18:43,949 [docker-java-stream--1472620850] INFO  com.ververica.cdc.connectors.e2e.OracleE2eITCase             [] - STDOUT: Starting Oracle Net Listener.
04:18:44,102 [docker-java-stream--1472620850] INFO  com.ververica.cdc.connectors.e2e.OracleE2eITCase             [] - STDOUT: Starting Oracle Database 11g Express Edition instance.
04:18:58,842 [docker-java-stream--1472620850] INFO  com.ververica.cdc.connectors.e2e.OracleE2eITCase             [] - STDOUT:
04:22:44,218 [main] ERROR 🐳 [jark/oracle-xe-11g-r2-cdc:0.1]                           [] - Could not start container
java.lang.IllegalStateException: Container is started, but cannot be accessed by (JDBC URL: jdbc:oracle:thin:system/oracle@localhost:49165:xe), please check container logs
```

According to this issue[1], this might be a problem of Oracle testcontainers v1.16 working with "wnameless/oracle-xe-11g-r2".
A better solution might be upgrade our Oracle base image to "gvenzl/oracle-xe" which is the default image of Oracle testcontainers v1.16.

[1]: testcontainers/testcontainers-java#4297
Copy link
Contributor

@leonardBang leonardBang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wuchong for the great work, the PR looks good totally, I only left some minor comments

Comment on lines 88 to 90
if (!result) {
checkResult(expectedResult, table, fields);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A check with timeout exception is better than a final check

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. A check here can print the mimatched results which is much more meaningful than timeout exception.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you, I noticed the timeout check logic may problem, if the first attempt we didn't pull enough data(i.e the upstream program is still writing), the check will fail due to we call checkResult at first time in and the timeout is useless?

import java.util.List;

/** End-to-end tests for mysql-cdc connector uber jar. */
public class MySqlE2eITCase extends FlinkContainerTestEnvironment {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think end to end test is different with integration test, how about rename it to MySqlEndToEndTest ?
and we use tests package prefix in Flink, here we use e2e, do we need to keep the same style?

Copy link
Member Author

@wuchong wuchong Nov 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only unit-test and integration-test phase in maven-surefire-plugin, no end-to-end notion in it. We matches *Test.java suffix as unit test, and *ITCase.java as integration-test. So we have to name it ITCase.

You can also see that e2e tests in flink also name like SQLClientSchemaRegistryITCase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me, how about the package prefix? Do you think we need to keep align with Flink project

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I will use tests package

" 'password' = '" + MYSQL_TEST_PASSWORD + "',",
" 'database-name' = '" + mysqlInventoryDatabase.getDatabaseName() + "',",
" 'table-name' = 'products_source',",
" 'server-id' = '5800-5900',",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the server-id range is to large here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any problem to use a large range?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this large range looks like we need 100 server ids in e2e test, but actually 4 server ids is enough

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But it should still works even we set a large range, right? I think many users will set a large range, because they may change parallelism frequently.

@wuchong
Copy link
Member Author

wuchong commented Nov 14, 2021

Thanks @leonardBang , I have addressed the comments.

Copy link
Contributor

@leonardBang leonardBang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wuchong for the update, LGTM

@wuchong wuchong merged commit 61ee775 into apache:master Nov 14, 2021
@wuchong wuchong deleted the e2e-tests branch November 14, 2021 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[test] Add end-to-end tests for all uber jars
2 participants