[SUPPORT] Best way to ingest a large number of tables #3344

@ziudu

Description

Hi,

We're new to Hudi and would like to know the best way to ingest a large number of tables. For example, our production environment has about 70 MySQL databases with more than 1000 tables in total. We'd prefer to ingest them all in continuous mode, in the spirit of a data lake.

  • Option 1, DeltaStreamer: each table requires its own DeltaStreamer job, which consumes too many resources. (Is it possible to submit multiple DeltaStreamers into one Spark context?)
  • Option 2, MultiTableDeltaStreamer: it doesn't support MOR tables yet, which is our preferred format.
  • Option 3, write our own ingestion logic with the java-client, but that would take some time.
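For context on option 1's resource cost, a single-table HoodieDeltaStreamer job in continuous mode with a MERGE_ON_READ target looks roughly like the sketch below. This is a sketch only: the jar path, base path, table name, Kafka properties file, and ordering field are placeholders, and the source class is just one example of a supported source.

```shell
# One spark-submit per table — this per-application overhead is
# exactly the resource concern raised in option 1.
# All paths and names below are placeholders.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type MERGE_ON_READ \
  --continuous \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path hdfs:///data/lake/db1/table1 \
  --target-table table1 \
  --props kafka-source.properties
```

Multiplied across more than 1000 tables, each such invocation starts its own long-running Spark application, which is what makes option 1 expensive.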

Do you have any suggestions?
Thanks,

Metadata

Labels: area:ingest (Ingestion into Hudi), priority:high (Significant impact; potential bugs)
