
Support parallel reads from a single source. #7749

Open
cgardens opened this issue Nov 8, 2021 · 8 comments
Labels
area/connectors (Connector related issues) · area/platform (issues related to the platform) · frozen (Not being actively worked on) · team/platform-move · type/enhancement (New feature or request)

Comments

@cgardens
Contributor

cgardens commented Nov 8, 2021

Tell us about the problem you're trying to solve

Not to be confused with #4081: here we are trying to make it possible to replicate multiple streams from a single source at the same time. I.e. if a source has two tables, both of those tables could be replicated in parallel.

@cgardens cgardens added type/enhancement New feature or request area/platform issues related to the platform labels Nov 8, 2021
@sherifnada sherifnada added the area/connectors Connector related issues label Nov 15, 2021
@hpias

hpias commented May 2, 2022

One should only ever split a source's load across multiple connections to give different streams different sync schedules, never to achieve parallelism.
This should probably be a connection-based, or even schedule-based, setting. Workers, sources, and destinations shouldn't care about it; it is strictly an orchestration/scheduler implementation responsibility. E.g. split the configured streams into a configured number of queues (connection parallelism / max parallel streams) and assign each queue to a worker. Or, even better, queue only a single stream per parallel worker at a time to account for differences in stream load and achieve round-robin scheduling, provided the cost of instantiating a single-stream worker is not too high.
The scheduler could be further improved by either using sync stats or extending the source spec to include estimated row counts or similar metrics as a stream weighting factor.
Normalization should optionally be postponed for successfully completed EL streams if it ties up workers (not sure how it is currently implemented). The idea is to be able to scale the k8s pool back in once EL has completed, since dbt is a destination-side workload.
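
A minimal sketch of the queueing idea above (hypothetical names, not Airbyte code), assuming the orchestrator knows an estimated row count per stream: streams are greedily packed into a configured number of queues by weight, and one worker drains each queue.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def partition_streams(streams, max_parallel_streams):
    """Greedily assign each stream to the currently lightest queue,
    using estimated row counts as the weighting factor."""
    heap = [(0, i) for i in range(max_parallel_streams)]  # (total_rows, queue_index)
    queues = [[] for _ in range(max_parallel_streams)]
    for stream in sorted(streams, key=lambda s: s["estimated_rows"], reverse=True):
        load, idx = heapq.heappop(heap)
        queues[idx].append(stream["name"])
        heapq.heappush(heap, (load + stream["estimated_rows"], idx))
    return queues

def sync_stream(stream_name):
    # Placeholder for handing a single-stream EL job to a worker.
    print(f"syncing {stream_name}")

def drain(queue):
    # Each worker syncs the streams in its queue sequentially.
    for name in queue:
        sync_stream(name)

if __name__ == "__main__":
    configured_streams = [
        {"name": "orders", "estimated_rows": 5_000_000},
        {"name": "users", "estimated_rows": 200_000},
        {"name": "events", "estimated_rows": 12_000_000},
        {"name": "invoices", "estimated_rows": 800_000},
    ]
    queues = partition_streams(configured_streams, max_parallel_streams=2)
    with ThreadPoolExecutor(max_workers=len(queues)) as pool:
        for q in queues:
            pool.submit(drain, q)
```

The round-robin variant described above would instead have each worker pull one stream at a time from a single shared queue, which tolerates bad weight estimates at the cost of more worker start-ups.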

@toandm
Contributor

toandm commented Sep 5, 2022

This will be a great addition to Airbyte. I currently sync hundreds of tables from a MySQL source, and it takes ages to load them all in.

@blake-enyart

@cgardens are there any updates on progress on this issue? Is this coming in tandem with the per-stream sync update or afterwards?

@luancaarvalho

Any news?

@Jordonkopp

Any updates on this feature request?

@ggam

ggam commented Apr 23, 2023

This is much needed as the workaround is to maintain multiple connections, splitting streams between them, which increases maintenance effort.

@samsipe

samsipe commented Nov 9, 2023

+1 on this. Would be a huge help!

@mcivorsteiner

+1 on this, this is a big issue for us currently.

@bleonard bleonard added the frozen Not being actively worked on label Mar 22, 2024