[SUPPORT] How to run the cleaner table service on the DFS source of DeltaStreamer? #7249

@satiricr2d2

Describe the problem you faced

I am posting this question here because I could not find an answer in the documentation.

Can I run the cleaning table service on the DFS Parquet source of DeltaStreamer, which is doing streaming ingestion of Parquet files on DFS into a Hudi table?

Also, is there a way to get the list of files on DFS that DeltaStreamer has successfully read?

To Reproduce

Expected behavior

Environment Description

  • Hudi version : 0.12.0

  • Spark version : 3.3

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) : Local

  • Running on Docker? (yes/no) : No

Additional context

Stacktrace

Add the stacktrace of the error.
