Wip nitzan pglog ec getattr error #47332

Merged
merged 2 commits into from Jan 24, 2023

NitzanMordhai
Contributor

@NitzanMordhai NitzanMordhai commented Jul 28, 2022

For 'copy object' in an EC pool, when the target copy does not exist,
getattr_maybe_cache returns ENODATA. That causes the refcount to fall back to the wildcard tag (kept for backward compatibility),
which might lead pglog to hold refcounts for each copy and to grow quickly.

Fixes: https://tracker.ceph.com/issues/56707
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
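
The distinction described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `CachedObject`, `getattr_before`, and `getattr_after` are hypothetical names invented for illustration and are not the Ceph implementation of getattr_maybe_cache; the only behavior taken from the description is that a missing target used to yield ENODATA and should instead be reported as "no such object".

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical stand-in for an object on an EC pool: whether the object
// exists, plus its cached xattrs. This is not a Ceph type.
struct CachedObject {
  bool exists = false;
  std::map<std::string, std::string> attr_cache;
};

// Behavior as described before the fix: any cache miss is reported as
// -ENODATA, even when the object itself does not exist.
int getattr_before(const CachedObject& obj, const std::string& key,
                   std::string* val) {
  auto it = obj.attr_cache.find(key);
  if (it == obj.attr_cache.end()) return -ENODATA;
  if (val) *val = it->second;
  return 0;
}

// Assumed shape of the fix: "object missing" (-ENOENT) is reported
// separately from "object present, attribute missing" (-ENODATA).
int getattr_after(const CachedObject& obj, const std::string& key,
                  std::string* val) {
  if (!obj.exists) return -ENOENT;
  auto it = obj.attr_cache.find(key);
  if (it == obj.attr_cache.end()) return -ENODATA;
  if (val) *val = it->second;
  return 0;
}
```

With this model, a lookup on a nonexistent target returns -ENODATA from `getattr_before` and -ENOENT from `getattr_after`, so a caller can tell the two cases apart.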

@NitzanMordhai NitzanMordhai requested a review from a team as a code owner July 28, 2022 12:31
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-pglog-ec-getattr-error branch 2 times, most recently from 1309a0d to dcd40c5 Compare July 28, 2022 12:37
@NitzanMordhai
Contributor Author

jenkins test make check

@ronen-fr
Contributor

I'd suggest rephrasing the PR/commit message. Maybe:
For 'copy object' in EC pool, when the target copy does not exist,
getattr_maybe_cache returns ENODATA. That will cause ref_count to wildcard tag (??),
which might lead pglog to have refcounts for each copy, and to grow quickly.

(I hope I understood the meaning correctly.)

@NitzanMordhai
Contributor Author

> I'd suggest rephrasing the PR/commit message. Maybe:
> For 'copy object' in EC pool, when the target copy does not exist,
> getattr_maybe_cache returns ENODATA. That will cause ref_count to wildcard tag (??),
> which might lead pglog to have refcounts for each copy, and to grow quickly.

Yes, that sounds better. I will update it.

@athanatos
Contributor

I also don't understand the "ref_count to wildcard tag" portion of the commit message.

@athanatos
Contributor

Fix LGTM though.

@NitzanMordhai
Contributor Author

jenkins test make check

Create a set of unit tests for erasure-coded pools

Fixes: https://tracker.ceph.com/issues/56707
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
…st object

When copying an object in an erasure-coded pool and the target copy does not exist,
getattr_maybe_cache returns ENODATA. That causes the refcount to fall back to the wildcard tag,
which can make pglog grow quickly with refcounts for each copy.

Fixes: https://tracker.ceph.com/issues/56707
Signed-off-by: Nitzan Mordechai <nmordec@redhat.com>
@NitzanMordhai NitzanMordhai force-pushed the wip-nitzan-pglog-ec-getattr-error branch from dcd40c5 to f060683 Compare August 10, 2022 14:19
@NitzanMordhai
Contributor Author

jenkins test make check arm64

@NitzanMordhai
Contributor Author

jenkins test windows

@athanatos
Contributor

Is there a better explanation somewhere of how rgw uses this xattr? Why exactly does returning the wrong error code here cause another xattr to increase in size without bound?

@NitzanMordhai
Contributor Author

> Is there a better explanation somewhere of how rgw uses this xattr? Why exactly does returning the wrong error code here cause another xattr to increase in size without bound?

read_refcount checks the return code from cls_cxx_getxattr. When the copy does not exist, we get -ENODATA (before the change) and read_refcount returns 0; cls_rc_refcount_read/get/put then check that return code and reset the refcount.
After the fix, we get a return code of -ENOENT and do not call set_refcount (just like replicated pools).
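
The control flow described in this reply can be sketched as a standalone model. Everything here is an assumption made for illustration: `RefcountModel`, `kWildcardTag`, `read_refcount_model`, and `refcount_get_model` are hypothetical names, the empty-string wildcard value is a guess, and the integer argument simply stands in for whatever cls_cxx_getxattr returned. It is not the cls_refcount source.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical in-memory model of the refcount state (the real class
// keeps this in an xattr on the object).
struct RefcountModel {
  std::map<std::string, bool> refs;
};

// Stand-in for the wildcard tag; the actual value is an assumption.
const std::string kWildcardTag = "";

// Model of the read step: `getxattr_ret` plays the role of the return
// code from cls_cxx_getxattr.
int read_refcount_model(int getxattr_ret, bool implicit_ref,
                        RefcountModel* out) {
  out->refs.clear();
  if (getxattr_ret == -ENODATA) {
    // Attribute absent: start from an empty refcount, optionally seeded
    // with the wildcard tag, and report success.
    if (implicit_ref) out->refs[kWildcardTag] = true;
    return 0;
  }
  if (getxattr_ret < 0) return getxattr_ret;  // e.g. -ENOENT propagates
  return 0;  // decoding of an existing attribute is omitted in this model
}

// Model of a "get"-style operation. `*wrote` reports whether the
// operation would go on to store a refcount (the set_refcount step).
int refcount_get_model(int getxattr_ret, const std::string& tag,
                       bool implicit_ref, bool* wrote) {
  *wrote = false;
  RefcountModel r;
  int ret = read_refcount_model(getxattr_ret, implicit_ref, &r);
  if (ret < 0) return ret;  // error: nothing is written
  r.refs[tag] = true;
  *wrote = true;  // a fresh refcount would be written here
  return 0;
}
```

In this model, passing -ENODATA leads to a write of a freshly reset refcount, while passing -ENOENT returns the error before any write, which matches the before and after behavior described above.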

@athanatos athanatos self-requested a review September 14, 2022 16:55
5 participants