[SUPPORT] #1852
Comments
This looks surprising to me. File listing for finding the latest file versions for index lookup and writing happens in the driver (concurrently, within the embedded timeline server). If the executors have trouble connecting to the driver, then the executors would do the listing themselves. Do you see any non-fatal exceptions when writing? Can you also paste the timeline (a listing of the .hoodie folder)? |
I don't see any exceptions in the driver logs or executor logs. I see these two warnings in the driver logs.
These are the contents of the timeline. The timeline only has files from the current day, but I see log files in the data folder from over a week ago. Do you have any idea what might be causing so many log files? |
MacBook-Pro:hudi balaji.varadarajan$ grep -c '.clean.requested' ~/Downloads/dot_hoodie_folder.txt
Sorry, I did not realize that. Let me check and get back |
We have a JIRA to improve/avoid listing: https://issues.apache.org/jira/browse/HUDI-1015. I have added this case to the JIRA. |
Ended up creating a new JIRA, https://issues.apache.org/jira/browse/HUDI-1119, as this has a different cause. |
I updated to master @ 743ef32 and then applied the patch you linked above. The first batch that ran had several "RunCompactionActionExecutor" entries. I'm still consistently seeing long batches. The contents of the timeline folder are now as follows. I think the root of my issue is that I have tons of log files which don't seem to get compacted. |
Only 1 compaction.inflight now |
@ssomuah : Regarding the patch, it is meant to ensure all pending compactions are completed. Regarding the slowness, we are working on general and S3-specific performance improvements on the write side, which should be part of the next release: 0.6.0 |
@bvaradar I think the issue I'm facing is due to configuration, but I can't pinpoint what it is. I'm ending up with an extremely large number of files for a single-partition merge-on-read table. I have tens of thousands of log files which I would have thought would get compacted into parquet at some point. What volume of updates is working well for merge-on-read tables today? |
@ssomuah : In addition, note that inline compaction runs serially with ingestion. We have a working PR which lets compaction run concurrently with ingestion: #1752 |
What do you mean by "runs serially with ingestion"? My understanding was that inline compaction happened in the same flow as writing, so an inline compaction would simply slow down ingestion. Does INLINE_COMPACT_NUM_DELTA_COMMITS_PROP refer to the number of commits retained in general, or the number of commits for a record? I see in the timeline that I have several clean.requested and clean.inflight; how can I get these to actually complete? What determines how many log files are created in each batch for a MOR table? EDIT: |
What do you mean by "runs serially with ingestion"? My understanding was that inline compaction happened in the same flow as writing, so an inline compaction would simply slow down ingestion. ===> Yes, that is what I meant. Inline compaction runs after ingestion, not in parallel with it. You can use #1752 to have it run concurrently.
Does INLINE_COMPACT_NUM_DELTA_COMMITS_PROP refer to the number of commits retained in general, or the number of commits for a record? ==> INLINE_COMPACT_NUM_DELTA_COMMITS_PROP refers to the number of ingestion rounds (deltacommits) between 2 compaction runs.
I see in the timeline I have several clean.requested and clean.inflight, how can I get these to actually complete? ==> If it is in the inflight state alone, there could be errors when Hudi is trying to clean up. Please look for exceptions in the driver logs. The cleaner should run automatically by default, and any pending clean operations will automatically get picked up in the next ingestion, so it must have been failing for some reason. You can turn on logs to see what is happening.
Is it possible to force a compaction of the existing log files? ===> Yes, by configuring INLINE_COMPACT_NUM_DELTA_COMMITS_PROP. You can set it to 1 to have aggressive compaction. |
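For concreteness, below is a minimal sketch (not the poster's actual job) of the aggressive inline-compaction setting described above, as a Spark Scala batch write to a MOR table. The table name, base path, sample DataFrame, and the `key`/`event_ts` field names are placeholders; the config keys are standard Hudi write configs, with `hoodie.compact.inline.max.delta.commits` corresponding to INLINE_COMPACT_NUM_DELTA_COMMITS_PROP.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object InlineCompactionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hudi-inline-compaction-sketch")
      .getOrCreate()
    import spark.implicits._

    // Placeholder micro-batch; in a real streaming job this is the batch DataFrame.
    val df = Seq((1, "a", 1000L), (2, "b", 2000L)).toDF("key", "value", "event_ts")

    df.write
      .format("hudi")
      .option("hoodie.table.name", "my_table")                        // placeholder
      .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
      .option("hoodie.datasource.write.recordkey.field", "key")       // placeholder
      .option("hoodie.datasource.write.precombine.field", "event_ts") // placeholder
      // Run compaction inline, after every delta commit (most aggressive setting).
      .option("hoodie.compact.inline", "true")
      .option("hoodie.compact.inline.max.delta.commits", "1")
      .mode(SaveMode.Append)
      .save("/path/to/table") // placeholder base path
  }
}
```

Setting the delta-commit threshold to 1 trades ingestion latency for keeping the number of active log files in check, which is the trade-off discussed in this thread.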
@ssomuah : Looking at the commit metadata, this is a case where your updates are spread across a large number of files. For example, in the latest commit, 334 files see updates whereas only one file is newly created due to inserts. It looks like this is the nature of your workload. If your record key has some sort of ordering, then you can initially bootstrap using "bulk-insert", which would sort and write the data in record-key order. This can potentially help reduce the number of files getting updated if each batch of writes has similar ordering. You can also try recreating the dataset with a larger parquet file size, a higher small-file limit, and async compactions (run more frequently, to keep the number of active log files in check). However, you are basically trading fewer files being appended to for more data being appended to each file. This is a general upsert problem arising from the nature of your workload. |
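A similarly hedged sketch of the "recreate with bulk-insert, larger base files, higher small-file limit" suggestion follows. `spark`, `df`, and the field names are the same placeholders as in the previous sketch, and the 256 MB / 128 MB sizes are arbitrary values to tune, not recommendations from this thread.

```scala
import org.apache.spark.sql.SaveMode

// `df` and the placeholder field names are the same as in the previous sketch.
df.write
  .format("hudi")
  .option("hoodie.table.name", "my_table")
  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
  .option("hoodie.datasource.write.recordkey.field", "key")
  .option("hoodie.datasource.write.precombine.field", "event_ts")
  // One-time bootstrap that sorts and writes the data in record-key order.
  .option("hoodie.datasource.write.operation", "bulk_insert")
  // Larger base files and a higher small-file limit so updates land in fewer, bigger files.
  .option("hoodie.parquet.max.file.size", (256L * 1024 * 1024).toString)
  .option("hoodie.parquet.small.file.limit", (128L * 1024 * 1024).toString)
  .mode(SaveMode.Overwrite) // recreating the dataset
  .save("/path/to/table")
```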
Closing this ticket as it was answered. |
Describe the problem you faced
Write performance degrades over time
To Reproduce
Steps to reproduce the behavior:
1. Create an unpartitioned MOR table
2. Use it for a few days
Expected behavior
Write performance should not degrade over time
Environment Description
Hudi version : Master @ 3b9a305 https://github.com/apache/hudi/tree/3b9a30528bd6a6369181702303f3384162b04a7f
Spark version : 2.4.4
Hive version : N/A
Hadoop version : 2.7.3
Storage (HDFS/S3/GCS..) : ABFSS
Running on Docker? (yes/no) : no
Additional context
The MOR table has a single partition.
It's a spark streaming application with 5 minute batches.
Initially it runs and completes batches within the batch duration, but over time the time to complete each batch increases.
From the Spark UI we can see that most of the time is spent actually writing the files.
Looking at thread dumps of the executors, they are almost always spending their time listing files.
I think the reason for this is we have an extremely high number of files in the single partition folder.
An ls on the folder is showing about 45,000 files.
The other odd thing is that when we look at the write tasks in the Spark UI, there are several tasks that seem to have tiny numbers of records in them.
We can see compaction taking place, so it's not clear why we still have so many files.
The table config is
We're using our own payload class that decides what to keep based on a timestamp in the message rather than the latest write (a rough sketch of that approach is shown below).
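The actual payload class isn't included in the issue, so purely as an illustration (the class name and the `event_ts` field are hypothetical, and the signatures reflect Hudi's public payload API as I understand it around this release), a timestamp-based payload could extend OverwriteWithLatestAvroPayload and keep whichever record has the newer event timestamp:

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericRecord, IndexedRecord}
import org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
import org.apache.hudi.common.util.{Option => HOption}

// Illustrative sketch only: keeps the record with the larger "event_ts" value
// (field name is hypothetical) instead of always taking the latest write.
class TimestampBasedPayload(record: GenericRecord, orderingVal: Comparable[_])
    extends OverwriteWithLatestAvroPayload(record, orderingVal) {

  // Secondary constructor matching the base class, used by some Hudi code paths.
  def this(record: HOption[GenericRecord]) =
    this(if (record.isPresent) record.get() else null, java.lang.Long.valueOf(0L))

  override def combineAndGetUpdateValue(currentValue: IndexedRecord,
                                        schema: Schema): HOption[IndexedRecord] = {
    val incomingOpt = getInsertValue(schema)
    if (!incomingOpt.isPresent) {
      // Treat a missing incoming value as a delete, as the default payload does.
      return HOption.empty()
    }
    val incoming = incomingOpt.get().asInstanceOf[GenericRecord]
    val current  = currentValue.asInstanceOf[GenericRecord]
    val incomingTs = incoming.get("event_ts").asInstanceOf[Long]
    val currentTs  = current.get("event_ts").asInstanceOf[Long]
    // Keep the record with the newer event timestamp, not simply the latest write.
    if (incomingTs >= currentTs) HOption.of(incoming: IndexedRecord)
    else HOption.of(current: IndexedRecord)
  }
}
```

Such a class would be wired in via hoodie.datasource.write.payload.class; it does not change anything about file sizing or compaction, which is where the discussion above places the actual slowdown.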
Stacktrace
Stack trace of the list operation where we are spending a lot of time:
sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:352)
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:259)
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:167)
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:124)
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.services.AbfsClient.listPath(AbfsClient.java:180)
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listFiles(AzureBlobFileSystemStore.java:549)
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:628)
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.listStatus(AzureBlobFileSystemStore.java:532)
shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.listStatus(AzureBlobFileSystem.java:344)
org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557)
org.apache.hudi.common.fs.HoodieWrapperFileSystem.listStatus(HoodieWrapperFileSystem.java:487)
org.apache.hudi.common.fs.FSUtils.getAllLogFiles(FSUtils.java:409)
org.apache.hudi.common.fs.FSUtils.getLatestLogVersion(FSUtils.java:420)
org.apache.hudi.common.fs.FSUtils.computeNextLogVersion(FSUtils.java:434)
org.apache.hudi.common.model.HoodieLogFile.rollOver(HoodieLogFile.java:115)
org.apache.hudi.common.table.log.HoodieLogFormatWriter.&lt;init&gt;(HoodieLogFormatWriter.java:101)
org.apache.hudi.common.table.log.HoodieLogFormat$WriterBuilder.build(HoodieLogFormat.java:249)
org.apache.hudi.io.HoodieAppendHandle.createLogWriter(HoodieAppendHandle.java:291)
org.apache.hudi.io.HoodieAppendHandle.init(HoodieAppendHandle.java:141)
org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:197)
org.apache.hudi.table.action.deltacommit.DeltaCommitActionExecutor.handleUpdate(DeltaCommitActionExecutor.java:77)
org.apache.hudi.table.action.commit.BaseCommitActionExecutor.handleUpsertPartition(BaseCommitActionExecutor.java:246)
org.apache.hudi.table.action.commit.BaseCommitActionExecutor.lambda$execute$caffe4c4$1(BaseCommitActionExecutor.java:102)
org.apache.hudi.table.action.commit.BaseCommitActionExecutor$$Lambda$192/1449069739.call(Unknown Source)
org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:105)