Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receiver optimisation: possible to further short-circuit early recursion stop check? #3571

Open
mharvey-jt opened this issue Apr 9, 2024 · 5 comments

Comments

@mharvey-jt
Copy link
Contributor

Consider the directory structure:

TOP
└── L1
    ├── 1
    ├── 2
    └── 3

with a nested catalog at each directory; and a transaction that adds TOP/L1/4 with a lease path of TOP/L1/4.
DiffRec will make the early recursion stop check

catalog::DirectoryEntryBase::Differences diff =
old_entry.CompareTo(new_entry);
if ((diff == catalog::DirectoryEntryBase::Difference::kIdentical) &&
old_entry.IsNestedCatalogMountpoint()) {
// Early recursion stop if nested catalogs are identical
shash::Any id_nested_from, id_nested_to;
id_nested_from = old_catalog_mgr_->GetNestedCatalogHash(old_path);
id_nested_to = new_catalog_mgr_->GetNestedCatalogHash(new_path);
assert(!id_nested_from.IsNull() && !id_nested_to.IsNull());
if (id_nested_from == id_nested_to) continue;
}

for all directories 1,2,3. This is a relatively slow operation, and can be a bottleneck when L1 has very many nested catalogs.

Would it be possible to further short-circuit this check based on whether the path being considered is in the lease path (IsReportablePath()==true?)

@jblomer
Copy link
Member

jblomer commented Apr 11, 2024

I think that's a good idea! If I'm not mistaken, TOP/L1/{1,2,3} are even ignored paths, not only unreportable paths. For ignored paths, we know that the recursion will stop and that we never report them. Looks like it is safe to immediately move on when old_path or new_path should be ignored (while still moving the iterators i_from, i_to accordingly).

@mharvey-jt
Copy link
Contributor Author

Does #3577 do the job?

@mharvey-jt
Copy link
Contributor Author

perhaps the check can even move outside of that clause, to line 185?

@jblomer
Copy link
Member

jblomer commented Apr 11, 2024

I think it does the job. Yes, it should be fine to move it up. And it is sufficient to check for old_path or new_path because we already know that they are identical.

We can also, I think, make a similar optimization for the ReportAddition and ReportRemoval branches. We then may want to remove the IsIgnoredPath() check at the beginning of the method.

@koallen
Copy link

koallen commented Apr 12, 2024

I think it does the job. Yes, it should be fine to move it up. And it is sufficient to check for old_path or new_path because we already know that they are identical.

We can also, I think, make a similar optimization for the ReportAddition and ReportRemoval branches. We then may want to remove the IsIgnoredPath() check at the beginning of the method.

I agree, the changes would essentially replace the check at the start of the function as an even early recursion stop. Provided it's implemented correctly (i.e. as you mentioned do it for ReportAdditional and ReportRemoval as well)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants