[SUPPORT] Best way to ingest a large number of tables #3344

@ziudu

Description

Hi,

We're new to Hudi and would like to know the best way to ingest a large number of tables. For example, our production environment has about 70 MySQL databases with more than 1000 tables in total. We'd prefer to ingest them all in continuous mode, in the spirit of a data lake.

  • Option 1, DeltaStreamer: each table requires its own DeltaStreamer job, which consumes too many resources. (Is it possible to submit multiple DeltaStreamers into one Spark context?)
  • Option 2, MultiTableDeltaStreamer: it doesn't support MOR tables yet, which is our preferred format.
  • Option 3, write our own ingestion logic with the java-client, but that would take some time.
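For context on option 1's resource cost, a single-table HoodieDeltaStreamer job in continuous mode with a MERGE_ON_READ target looks roughly like the sketch below. This is a sketch only: the jar path, base path, table name, Kafka properties file, and ordering field are placeholders, and the source class is just one example of a supported source.

```shell
# One spark-submit per table — this per-application overhead is
# exactly the resource concern raised in option 1.
# All paths and names below are placeholders.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --table-type MERGE_ON_READ \
  --continuous \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path hdfs:///data/lake/db1/table1 \
  --target-table table1 \
  --props kafka-source.properties
```

Multiplied across more than 1000 tables, each such invocation starts its own long-running Spark application, which is what makes option 1 expensive.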

Do you have any suggestions?
Thanks,

Metadata

Labels: area:ingest (Ingestion into Hudi), priority:high (Significant impact; potential bugs)
