[SUPPORT] Log files are not compacted #2771

Closed
stackfun opened this issue Apr 5, 2021 · 3 comments

stackfun commented Apr 5, 2021

Describe the problem you faced

Sometimes, log files containing only upserts are not compacted in a MOR (Merge-On-Read) table.
The first image shows the compacted Parquet files; note that they were created on March 30th.
[image: compacted Parquet files]

The second image shows the log files, which were created after March 30th but were never compacted. In our use case, we have many small, random upserts.
[image: uncompacted log files]

Here's our Hudi configuration:

options = {
    "hoodie.table.name": table_name,
    "hoodie.datasource.write.hive_style_partitioning": True,
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.partitionpath.field": "field_1,field2",
    "hoodie.datasource.write.recordkey.field": "sha256",
    "hoodie.datasource.write.table.name": table_name,
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.compaction.async.enable": True,
    "hoodie.index.type": "SIMPLE",
    "hoodie.compact.inline": True,
    "hoodie.clean.async": True,
    "hoodie.clean.automatic": True,
    "hoodie.simple.index.input.storage.level": "DISK_ONLY",
    "hoodie.write.status.storage.level": "DISK_ONLY",
    "hoodie.cleaner.commits.retained": 2,
    "hoodie.compact.inline.max.delta.commits": "16",
    "hoodie.logfile.data.block.max.size": 1024 * 1024 * 8,  # Workaround for https://github.com/apache/hudi/issues/2692
    "hoodie.logfile.max.size": 1024 * 1024 * 8,  # Workaround for https://github.com/apache/hudi/issues/2692
    "hoodie.memory.merge.fraction": "0.75",  # default is 0.6; allocate more memory for merging
}

To Reproduce

I am currently trying to reproduce this with a small example, but have not succeeded yet.

Expected behavior

Compaction should run on this file group.

Environment Description

stackfun changed the title from "[SUPPORT]" to "[SUPPORT] Log files are not compacted" on Apr 6, 2021

nsivabalan (Contributor) commented

I see you have enabled async compaction. If you enable inline compaction, does it work?

n3nash (Contributor) commented Apr 7, 2021

@stackfun I see that you have some conflicting configs:

    "hoodie.datasource.compaction.async.enable": True,
    "hoodie.index.type": "SIMPLE",
    "hoodie.compact.inline": True,

You have enabled both inline and async compaction at the same time. Please enable only one of them, depending on whether you want inline or async compaction.
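To catch this kind of conflict before a write, a small pre-flight check over the options dict may help. The sketch below is hypothetical: `check_compaction_mode` and `with_compaction_mode` are illustrative helpers, not part of Hudi, and the check only inspects keys that are explicitly set; it does not model Hudi's own defaults.

```python
INLINE_KEY = "hoodie.compact.inline"
ASYNC_KEY = "hoodie.datasource.compaction.async.enable"


def _is_enabled(value) -> bool:
    # Options may be Python bools (True) or strings ("true"); normalise both.
    return str(value).strip().lower() == "true"


def check_compaction_mode(options: dict) -> str:
    """Return "inline", "async" or "none"; raise if both modes are enabled.

    Only keys explicitly present in `options` are inspected; Hudi's own
    defaults are not modelled here.
    """
    inline = _is_enabled(options.get(INLINE_KEY, False))
    async_enabled = _is_enabled(options.get(ASYNC_KEY, False))
    if inline and async_enabled:
        raise ValueError(
            f"Both {INLINE_KEY} and {ASYNC_KEY} are enabled; enable only one."
        )
    if inline:
        return "inline"
    if async_enabled:
        return "async"
    return "none"


def with_compaction_mode(options: dict, mode: str) -> dict:
    """Return a copy of `options` with exactly one compaction mode switched on."""
    if mode not in ("inline", "async"):
        raise ValueError(f"mode must be 'inline' or 'async', got {mode!r}")
    updated = dict(options)  # leave the caller's dict untouched
    # Set both keys explicitly so neither mode is left to a default value.
    updated[INLINE_KEY] = "true" if mode == "inline" else "false"
    updated[ASYNC_KEY] = "true" if mode == "async" else "false"
    return updated
```

Run against the configuration in the original report, `check_compaction_mode` raises `ValueError`; `with_compaction_mode(options, "inline")` returns a copy that passes the check. Setting both keys explicitly, rather than deleting one, avoids relying on whatever default the other key has.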

I also see that you have set the number of delta commits before compaction kicks in to 16:

"hoodie.compact.inline.max.delta.commits": "16"

This means compaction will not kick in until you have made 16 delta commits. Moreover, even after that, the default compaction strategy is IO-bounded (https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java#L97), which caps each compaction run at 500 GB of IO (https://github.com/apache/hudi/blob/master/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java#L94).

Are you by any chance writing more than 500 GB during your 16 delta commits? If so, some log files may not get compacted.
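A simplified model of why some file groups can be left behind: assuming the default strategy orders pending file groups by total log-file size (largest first) and accepts them until the IO budget is spent, file groups that received only a few small upserts land at the end of the list and are dropped once the budget runs out. This is an illustrative sketch of that selection loop, not Hudi's implementation; `FileGroupOp`, `select_for_compaction` and the IO figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileGroupOp:
    """One candidate compaction operation (hypothetical model of a file group)."""
    file_id: str
    log_size_mb: int  # total size of the file group's pending log files
    io_mb: int        # estimated read + write IO to compact it


def select_for_compaction(ops: List[FileGroupOp],
                          target_io_mb: int = 500 * 1024) -> List[FileGroupOp]:
    """Simplified bounded-IO selection: largest logs first, stop once the budget is spent."""
    ordered = sorted(ops, key=lambda op: op.log_size_mb, reverse=True)
    selected = []
    remaining = target_io_mb
    for op in ordered:
        remaining -= op.io_mb
        selected.append(op)  # the operation that exhausts the budget is still included
        if remaining <= 0:
            break  # everything after this point waits for a later compaction run
    return selected


ops = [FileGroupOp("a", 400, 600), FileGroupOp("b", 300, 500), FileGroupOp("c", 10, 50)]
print([op.file_id for op in select_for_compaction(ops, target_io_mb=1000)])  # ['a', 'b']
print([op.file_id for op in select_for_compaction(ops, target_io_mb=2000)])  # ['a', 'b', 'c']
```

With a 1000 MB budget the small file group `c` is skipped; if every compaction run sees more data than the budget, it may keep being skipped. Raising the budget lets it through, which is consistent with the fix reported below.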

n3nash self-assigned this on Apr 7, 2021

stackfun (Author) commented Apr 7, 2021

Setting the "hoodie.compaction.target.io" config worked like a charm. Thanks a lot!
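The thread does not say which value was used. As a sketch, assuming inline compaction is kept and the per-run IO budget is raised well above the volume written between compactions, an abridged version of the options could look like this. The `my_table` name and the 2 TB figure are hypothetical; `hoodie.compaction.target.io` is expressed in MB, and its default of 500 * 1024 MB corresponds to the 500 GB limit mentioned in the thread.

```python
table_name = "my_table"  # hypothetical placeholder

options = {
    "hoodie.table.name": table_name,
    "hoodie.datasource.write.table.name": table_name,
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    # Pick exactly one compaction mode (inline here).
    "hoodie.compact.inline": "true",
    "hoodie.datasource.compaction.async.enable": "false",
    "hoodie.compact.inline.max.delta.commits": "16",
    # Per-run compaction IO budget in MB. 2 TB is an illustrative value only;
    # size it to the amount of data written between compactions.
    "hoodie.compaction.target.io": str(2 * 1024 * 1024),
}
```

These would be passed to the writer as before, e.g. `df.write.format("hudi").options(**options).mode("append").save(base_path)`, where `df` and `base_path` stand in for your own DataFrame and table path.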
