Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src/mon/ConnectionTracker.cc: Fix dump function #57146

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kamoltat
Copy link
Member

@kamoltat kamoltat commented Apr 30, 2024

Problem:

Currently, the ConnectionTracker::dump()
will dump a duplicate key which is not
ideal when you want to write a test that
converts the dump into a JSON since
each key must be unique. Basically without resolving this,
you cannot do something like:
json.load(connection_score) because the dump is not a valid JSON format

e.g.,

ceph daemon mon.a connection scores dump

peer is a duplicate key.

"peer_scores": {
    "peer": {
        "peer_rank": 0,
        "peer_score": 0.99966364610282044,
        "peer_alive": true
    },
    "peer": {
        "peer_rank": 2,
        "peer_score": 0.9941396935402752,
        "peer_alive": false
    }
}

report is a duplicate key

      "report": {
            "rank": 0,
            "epoch": 6,
            "version": 3940,
            "peer_scores": {...}
        },
        "report": {
            "rank": 1,
            "epoch": 6,
            "version": 345,
            "peer_scores": {...}
        },
        "report": {
            "rank": 2,
            "epoch": 6,
            "version": 84,
            "peer_scores": {...}
        }

Solution:
Use open_array_section and convert
peer_scores and reports into an
array instead.

peer is no longer a duplicate.

"peer_scores": [
   {
        "peer_rank": 0,
        "peer_score": 0.99966364610282044,
        "peer_alive": true
    },
    {
        "peer_rank": 2,
        "peer_score": 0.9941396935402752,
        "peer_alive": false
    }
]

report is no longer a duplicate

"reports": [
       {
            "rank": 0,
            "epoch": 6,
            "version": 3940,
            "peer_scores": {...}
        },
       {
            "rank": 1,
            "epoch": 6,
            "version": 345,
            "peer_scores": {...}
        },
        {
            "rank": 2,
            "epoch": 6,
            "version": 84,
            "peer_scores": {...}
        }
]

Test:
Added some tests to verify that our connection score is clean and to also verify that we can actually load the dump into a JSON format.

Fixes: https://tracker.ceph.com/issues/65695

Signed-off-by: Kamoltat ksirivad@redhat.com

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e

Problem:

Currently, the ConnectionTracker::dump()
will dump a duplicate key which is not
ideal when you want to write a test that
converts the dump into a JSON since
JSON objects are key-value pairs where
each key must be unique.

Solution:
Use open_array_section and convert
`peer_scores` and `reports` into an
array instead.

Fixes: https://tracker.ceph.com/issues/65695

Signed-off-by: Kamoltat <ksirivad@redhat.com>
@kamoltat kamoltat requested a review from a team as a code owner April 30, 2024 04:41
@kamoltat kamoltat self-assigned this Apr 30, 2024
@kamoltat
Copy link
Member Author

kamoltat commented Apr 30, 2024

teuthology run on the newly added integration test:
/ksirivad-2024-04-30_06:05:19-rados:singleton-wip-ksirivad-fix-connection-score-json-distro-default-smithi/

1/1 Passed

@kamoltat
Copy link
Member Author

jenkins test api

@kamoltat
Copy link
Member Author

@kamoltat kamoltat force-pushed the wip-ksirivad-fix-connection-score-json branch 2 times, most recently from d0f55ef to 143fe58 Compare May 2, 2024 20:18
Copy link
Contributor

@ronen-fr ronen-fr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from a questionable case (see comments)

src/mon/ConnectionTracker.cc Show resolved Hide resolved
qa/tasks/ceph_test_case.py Outdated Show resolved Hide resolved
return
else:
if elapsed >= timeout:
if check_fn and check_fn() and retry_count < 5:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check_fn is allowed to be null, right? I am not sure that's handled correctly

Copy link
Member Author

@kamoltat kamoltat May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, it's allowed to be null, basically, the check_fn part allows the function to keep looping if we determine that the actual thing that we are checking is making progress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was trying to ask:
suppose there's no check_fn. I wanted to make sure you really meant to ignore the ' retry_count'
(I guess that makes sense, anyway).

qa/tasks/mon_connection_score.py Outdated Show resolved Hide resolved
Basically when we deploy a 3 MONS

Check if the connection scores are clean
with a 60 seconds grace period

Fixes: https://tracker.ceph.com/issues/65695

Signed-off-by: Kamoltat <ksirivad@redhat.com>
@kamoltat kamoltat force-pushed the wip-ksirivad-fix-connection-score-json branch from 143fe58 to 842f8b3 Compare May 9, 2024 21:12
@kamoltat
Copy link
Member Author

kamoltat commented May 9, 2024

@ronen-fr all of your comments should be addressed in the new commit, let me know what you think! Thank you so much for your review!

Copy link
Contributor

@ronen-fr ronen-fr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(note my followup question)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants