Closed
Labels
area:ingest (Ingestion into Hudi), priority:high (Significant impact; potential bugs)
Description
Hi,
We're new to Hudi and would like to know the best way to ingest a large number of tables. In our production environment, we have about 70 MySQL databases with more than 1000 tables in total. We'd prefer to ingest all of them in continuous mode, in the spirit of a data lake.
- Option 1, DeltaStreamer: each table requires its own DeltaStreamer job, so it consumes too many resources. (Is it possible to submit multiple DeltaStreamer jobs into one Spark context?)
- Option 2, MultiTableDeltaStreamer: it doesn't support MOR yet, which is our preferred table type.
- Option 3, write our own ingestion logic with the java-client, but that would take some time.
Do you have any suggestions?
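For context, option 1 with a per-table DeltaStreamer in continuous mode would be configured roughly as sketched below. This is a hedged illustration, not a recommendation: the database, table name, record key, and credentials are all placeholder assumptions, and the `JdbcSource` class is only usable if your Hudi version ships it in the hudi-utilities bundle (check the docs for your release).

```properties
# dbserver1_orders.properties -- hypothetical per-table config for HoodieDeltaStreamer.
# Record key and partition path are assumptions; substitute your table's actual columns.
hoodie.datasource.write.recordkey.field=id
hoodie.datasource.write.partitionpath.field=created_date

# JDBC source settings (keys as read by org.apache.hudi.utilities.sources.JdbcSource;
# verify the exact names against your Hudi version's documentation).
hoodie.deltastreamer.jdbc.url=jdbc:mysql://mysql-host:3306/db1
hoodie.deltastreamer.jdbc.driver.class=com.mysql.cj.jdbc.Driver
hoodie.deltastreamer.jdbc.user=hudi
hoodie.deltastreamer.jdbc.password=***
hoodie.deltastreamer.jdbc.table.name=orders
hoodie.deltastreamer.jdbc.table.incr.column.name=updated_at
hoodie.deltastreamer.jdbc.incr.pull=true
```

Each such file would then back one `spark-submit --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer ... --table-type MERGE_ON_READ --continuous --props dbserver1_orders.properties` invocation, which is exactly the one-long-running-job-per-table resource cost described in option 1.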
Thanks,