
HIVE-24444: compactor.Cleaner should not set state 'mark cleaned' if there are obsolete files in the FS #1716

Closed
wants to merge 8 commits

Conversation

@klcopp (Contributor) commented on Nov 30, 2020

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

See HIVE-24444

How was this patch tested?

Unit test

@klcopp (Contributor Author) commented on Nov 30, 2020

@pvargacl would you mind taking a look too?

conf.set(ValidTxnList.VALID_TXNS_KEY, validTxnList.writeToString());
ValidReaderWriteIdList validWriteIdList = new ValidReaderWriteIdList();
Path locPath = new Path(location);
AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
Contributor:

You could pass an empty dirSnapshots map to the first getAcidState call in removeFiles; that call will populate the snapshot, and you can reuse it here so it won't make a second listing on the FS.
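
For illustration, a minimal sketch of this suggestion (not the exact patch), using the getHdfsDirSnapshots and getAcidState signatures visible in the diff context below:

Path locPath = new Path(location);
FileSystem fs = locPath.getFileSystem(conf);
// Build the directory snapshot once here in removeFiles()...
Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = AcidUtils.getHdfsDirSnapshots(fs, locPath);
// ...pass it to the first getAcidState call...
AcidUtils.Directory dir = AcidUtils.getAcidState(fs, locPath, conf, writeIdList, Ref.from(false), false, dirSnapshots);
// ...and reuse the populated dirSnapshots map for the later obsolete-directory check,
// so the FileSystem is only listed once.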

Contributor Author:

Good idea, thanks!

Path locPath = new Path(location);
AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, validWriteIdList,
Ref.from(false), false, dirSnapshots);
return dir.getObsolete().size();
Member:

Should we consider aborts as well?

Contributor:

You should not. There can be aborts with a higher txnId than the compaction; this check would see those, but the cleaner will never see them, so it would never finish its job.

Contributor Author:

Aborted directories are deleted by removeFiles() too, and there can be obsolete directories with a higher txnId than this compaction too. The point of this change is to call markCleaned() when the table is completely cleaned, not necessarily when this compaction's cleaning is done. I figured that would be better than calling markCleaned() when some obsolete or aborted file gets deleted. (see HIVE-24444 description)
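
A hedged sketch of what that amounts to in the Cleaner, using the method names shown in the diff context in this review (the txnHandler.markCleaned call is an assumption about the surrounding code, not part of this diff):

// Delete this compaction's obsolete/aborted directories as before...
removeFiles(location, writeIdList, ci);
// ...but only record the "mark cleaned" state once nothing obsolete is left
// for the table/partition as a whole.
if (getNumEventuallyObsoleteDirs(location, dirSnapshots) == 0) {
  txnHandler.markCleaned(ci);   // assumed call site; markCleaned() is referenced above
}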

Contributor:

I think checking the aborted directories is very problematic upstream: if transactions come in continuously and some of them abort, the Cleaner won't be able to mark any compaction cleaned. Example:
- txns 1-10 come in, some of them writing; txnId=8 aborts.
- A compaction kicks in, compacts the deltas, and gets a next_txn_id of, say, txnId=13.
- More transactions come in; txnId=18 aborts.
- The Cleaner starts once everything below 13 is done and can clean everything. But since txnId=14 is still open it will not see txnId=18, so when it checks whether it can mark itself cleaned, it won't, because the check sees txnId=18 as aborted.
- A second compaction kicks in, with a next_txn_id of, say, txnId=23.
- More transactions come in; txnId=28 aborts.
- Again the Cleaner cannot mark compaction 1 or compaction 2 cleaned, since neither of them sees 28 and so cannot clean it. This piles up, and the Cleaner will only mark them cleaned once there is finally an iteration in which there were no new aborted transactions.
If you check only the obsolete files, you only miss marking a compaction cleaned when you should have in the case of two overlapping compactions; but then the second compaction will get marked as cleaned, and on the next Cleaner run compaction 1 will be marked as cleaned too. This can also cause problems if the delayed cleaner is enabled, but probably fewer.

Contributor Author:

Okay, I see what you mean. I removed the aborted files from the total.

In general, do you think checking for obsolete files is better than checking whether we removed any files?
Case 1: assuming HIVE-23107 etc. are present in the version.
Case 2: assuming HIVE-23107 etc. are not present in the version.

Contributor:

Upstream it would cause problems: for the delayed cleaner it is necessary that every compaction entry cleans only its own obsoletes. By default every compaction is delayed by 15 minutes, so it is much more likely that, when a cleaning job finally runs, there are already other jobs waiting in the queue. Each of them has its highestWriteId saved and will clean only up to that. You should not merge those.
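
To make the constraint concrete, a sketch (not code from this PR) of how a queued entry's own high-water mark bounds what it may clean, assuming the CompactionInfo.highestWriteId field and the ValidReaderWriteIdList constructor from the Hive code base:

// Each "ready for cleaning" entry cleans only up to its own saved highestWriteId,
// so the obsolete sets of separately queued (delayed) entries must not be merged.
ValidReaderWriteIdList writeIdList = new ValidReaderWriteIdList(
    ci.getFullTableName(), new long[0], new BitSet(), ci.highestWriteId);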

Member (@deniskuzZ, Dec 3, 2020):

You can merge entries that are past the retention period in this case, but that's another story, not relevant to this pull request.

Member (@deniskuzZ, Dec 3, 2020):

I haven't checked, but if a folder is considered obsolete when there is a base / minor-compacted delta shadowing it, that's OK.
But I don't see the benefit of this patch for upstream (which contains HIVE-23107); am I missing something? Also, it's going to violate the idea of the delayed cleaner, which is coupled with a specific compaction request.

Contributor Author:

No, this isn't necessary upstream; this change is for versions without HIVE-23107 etc. But I don't want to hurt upstream functionality with it.

Member:

If it's only for versions without HIVE-23107, should it be behind a feature flag or a schema version check?

*/
private boolean removeFiles(String location, ValidWriteIdList writeIdList, CompactionInfo ci)
throws IOException, NoSuchObjectException, MetaException {
Path locPath = new Path(location);
FileSystem fs = locPath.getFileSystem(conf);
Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots = AcidUtils.getHdfsDirSnapshots(fs, locPath);
AcidUtils.Directory dir = AcidUtils.getAcidState(locPath.getFileSystem(conf), locPath, conf, writeIdList, Ref.from(
Member:

This could be replaced with the fs variable above.

Member (@deniskuzZ, Nov 30, 2020):

What would happen if there is a long-running read-only txn plus multiple compaction attempts? Are we going to have multiple records in the queue for the same table/partition pending cleanup?

Contributor Author (@klcopp, Nov 30, 2020):

No, not with HIVE-24291.
Without HIVE-24291 (which might not be usable if, for example, HMS schema changes are out of the question) we could still have a pileup of entries for the same table/partition in "ready for cleaning" state in the queue.
Without this change (HIVE-24444) some of them might not be deleted.
The goal of this change is that, when the table does get cleaned, all of the records will be deleted.

*/
private int getNumEventuallyObsoleteDirs(String location, Map<Path, AcidUtils.HdfsDirSnapshot> dirSnapshots)
throws IOException {
ValidTxnList validTxnList = new ValidReadTxnList();
Member (@deniskuzZ, Nov 30, 2020):

Will we consider all writes as valid here? Shouldn't we limit at least by compactor txnId?

Contributor Author:

The compactor txnId is only stored since HIVE-24291. This patch is for versions without that change, or without any of the recent changes to the HMS schema.

Member:

Could we limit by CQ_NEXT_TXN_ID?
Do we need this change in upstream master if HIVE-24291 is already present?

Contributor Author:

No, we don't need this change if HIVE-24291 is present. But if HIVE-24291 is present it shouldn't hurt.
CQ_NEXT_TXN_ID has only been stored since HIVE-23107, which also includes HMS schema changes.

Contributor Author (@klcopp, Dec 1, 2020):

But if HIVE-24291 is present it shouldn't hurt.

This isn't necessarily true, since, as @pvargacl noted, there could be aborts since the compaction happened.
The question here is whether this change or HIVE-24314 is worse for users if HIVE-23107 (remove MIN_HISTORY_LEVEL) etc. are not present in their version.

@klcopp (Contributor Author) commented on Dec 4, 2020

I will close this because HIVE-24403 is making HIVE-23107 etc. backwards compatible, so this change will not be needed.

@klcopp closed this on Dec 4, 2020