Performance Harness: Support for pseudo-parallel streams #26219
3f07f68 to b42380c
@@ -4,3 +4,8 @@ Performance harness for destination connectors.

This component is used by the `/connector-performance` GitHub action and is used in order to test throughput of
destination connectors on a number of datasets.

Associated files are:
<li>Main.java - the main entrypoint for the harness
+1
@@ -26,16 +28,23 @@ public class Main {
  private static final String CREDENTIALS_PATH = "secrets/%s_%s_credentials.json";

  public static void main(final String[] args) {
    // If updating args for Github Actions, also update the run-performance-test.yml file
+1
...mance/destination-harness/src/main/resources/catalogs/destination-snowflake/10m_catalog.json
...n-harness/src/main/java/io/airbyte/integrations/destination_performance/PerformanceTest.java
Ryan, how do we know the random logic works? Is it worth adding a quick test?
I did a manual test here and confirmed it creates the correct number of streams. I also got the list of tables by looking at the Snowflake connection. Did you want a unit test added as well? EDIT: a quick sniff test is to watch which streams Snowflake flushes when it flushes, and whether the count gets up to 11 (the number of streams allocated for this run).
Yes. Probably a good idea to avoid regressions.
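For illustration, the kind of regression check discussed above could look roughly like the sketch below. It is not the test that was added in this PR: the class name, method name, seed, and draw count are all hypothetical, and it only assumes that the harness picks a stream index with `java.util.Random`. The value 11 is the stream count mentioned in the thread.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-alone check, not the unit test from this PR.
public class RandomStreamCheck {

  // Draws `draws` stream indexes in [0, numStreams) and returns the distinct ones seen.
  static Set<Integer> observedStreams(final int numStreams, final int draws, final long seed) {
    final Random random = new Random(seed); // fixed seed keeps the check reproducible
    final Set<Integer> seen = new HashSet<>();
    for (int i = 0; i < draws; i++) {
      seen.add(random.nextInt(numStreams));
    }
    return seen;
  }

  public static void main(final String[] args) {
    final int numStreams = 11; // stream count mentioned in the review thread
    final Set<Integer> seen = observedStreams(numStreams, 10_000, 42L);
    if (seen.size() != numStreams) {
      throw new AssertionError("expected " + numStreams + " streams, saw " + seen.size());
    }
    System.out.println("every stream index was selected at least once");
  }
}
```

A check of this shape mirrors the manual sniff test: it confirms that every allocated stream receives at least one record and that no index falls outside the allocated range.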
.../destination-harness/src/main/java/io/airbyte/integrations/destination_performance/Main.java
...rmance/destination-harness/src/main/resources/catalogs/destination-snowflake/1m_catalog.json
#### Note: The following `dataset=` values are supported: `1m`<sub>(default)</sub>, `10m`, `20m`, `bottleneck_stream1`
> :runner: ${{github.event.inputs.connector}} https://github.com/${{github.repository}}/actions/runs/${{github.run_id}}
#### Note: The following `dataset=` values are supported: `1m`<sub>(default)</sub>, `10m`, `20m`, `bottleneck_stream1`.
For destination performance only: you can also use `stream-numbers=N` to simulate N number of parallel streams.
@ryankfu Is this ok?
/connector-performance connector=connectors/destination-snowflake Note: The following
/connector-performance connector=connectors/destination-snowflake ref=ryan/parallel-stream-performance Note: The following
…ame configured catalog
Moved text up to connect with other argument discussion
d477b11 to 5c2cd98
/approve-and-merge reason="not in critical path and only affect performance harness, also automake is borked"
/connector-performance connector=connectors/destination-snowflake ref=ryan/parallel-stream-performance Note: The following
* Adds support for pseudo-parallel datasets
* Ran ./gradlew :spotlessJavaApply
* Automated Change
* Fixes issue with parallel datasets credentials
* Fixes filter for parallel credentials
* Adds a new configurable property to build a pseudo-parallel catalog
* Fixes Github Actions variable to be processed properly with the K8s harness yaml
* Adds unit test for random streams and generating streams within the same configured catalog
* Ran ./gradlew :spotlessJavaApply
* Added additional description for GitHub Actions
* Update connector-performance-command.yml
  Moved text up to connect with other argument discussion
* Fixes spotBugs issue
* Automated Commit - Formatting Changes
* Update GitHub Action description

---------

Co-authored-by: ryankfu <ryankfu@users.noreply.github.com>
Co-authored-by: Rodi Reich Zilberman <867491+rodireich@users.noreply.github.com>
What
Introduced pseudo-parallel streams to mock the behavior of CDC (Change Data Capture)
How
Reuses the same dataset and the same catalog, with the stream duplicated within the catalog, and uses a random function to select which stream each record's metadata is assigned to when generating the AirbyteRecordMessage.
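As a rough sketch of the approach described above (not the code in this PR), the idea can be expressed in plain Java without the Airbyte protocol classes. The class name, method names, and the `_<index>` naming scheme are assumptions made for illustration; the real harness builds a configured catalog and `AirbyteRecordMessage` objects rather than bare strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of pseudo-parallel streams; names are not from the harness.
public class PseudoParallelStreams {

  // Duplicates one stream name into numStreams names.
  // The "_<index>" suffix is an assumed naming scheme, not necessarily the harness's.
  static List<String> duplicateStreamNames(final String baseName, final int numStreams) {
    if (numStreams < 1) {
      throw new IllegalArgumentException("numStreams must be at least 1");
    }
    final List<String> names = new ArrayList<>();
    names.add(baseName); // the original stream stays in the catalog
    for (int i = 1; i < numStreams; i++) {
      names.add(baseName + "_" + i);
    }
    return names;
  }

  // Picks the stream name that a single record's metadata will point to.
  static String pickStream(final List<String> streamNames, final Random random) {
    return streamNames.get(random.nextInt(streamNames.size()));
  }

  public static void main(final String[] args) {
    final List<String> streams = duplicateStreamNames("users", 3);
    final Random random = new Random();
    // Every record comes from the same dataset; only its target stream varies.
    for (int record = 0; record < 5; record++) {
      System.out.println("record " + record + " -> " + pickStream(streams, random));
    }
  }
}
```

Because every record is drawn from the same dataset and only the stream name varies, the destination sees interleaved writes across several streams without needing additional source data.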
Recommended reading order
- PerformanceTest.java - Added random stream assignment for each record
- Main.java - Added ability to duplicate the stream catalog to mock having multiple streams
- run-harness-process.yaml - Adds configurability for new parameter to have multiple streams
- connector-performance-command.yaml - Adds new parameter for Github Actions
🚨 User Impact 🚨
No breaking changes