-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
metrics: add metrics for FS hang IOs #308
Conversation
da6d57c
to
e707b14
Compare
need rebase 🤣 |
pkg/metrics/serve.go
Outdated
"github.com/containerd/nydus-snapshotter/pkg/manager" | ||
"github.com/containerd/nydus-snapshotter/pkg/metrics/collector" | ||
"github.com/containerd/nydus-snapshotter/pkg/metrics/exporter" | ||
) | ||
|
||
// Default period to determine a hang IO. | ||
const defaultHangIOPeriod = 10 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think interval
is better than period
to express
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have replaced period
with interval
.
pkg/metrics/serve.go
Outdated
} | ||
|
||
func (s *Server) StartCollectMetrics(ctx context.Context) error { | ||
// TODO(renzhen): make collect interval time configurable | ||
timer := time.NewTicker(time.Duration(1) * time.Minute) | ||
// The timer period is same as the period for determining hang IOs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar mistakes, it should be is the same
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. 😊
f4a5d65
to
0cf593f
Compare
@changweige Is there any way to rerun the test by myself? 😌 |
2b90df1
to
bd4701c
Compare
No. It's Github limitation. The action must be triggered to rerun by maintainers. Or you can force-push the PR to trigger it to run again |
Ok, got it. |
hungIOMap := make(map[string]uint64) | ||
recordedHungIOMap := make(map[uint64]uint64) | ||
nowTime := time.Now() | ||
for _, daemonInflightIOMetrics := range i.MetricsVec { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please briefly introduce the algorithm here that catches the hung IO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better add a comment line to introduce and elaborate the algorithm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/metrics/collector/fs.go
Outdated
} | ||
i.RecordedMetrics = recordedHungIOMap | ||
for opcode, value := range hungIOMap { | ||
data.TotalHungIO.WithLabelValues(opcode).Add(float64(value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metric should be Gauge. Hung io can disappear when the backend remote request is responded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I have replaced it with Gauge
.
pkg/metrics/collector/fs.go
Outdated
12: "OP_RENAME", | ||
13: "OP_LINK", | ||
14: "OP_OPEN", | ||
15: "OP_READ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to copy all FUSE op code definitions here? Only counting on READ should meet requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. If only READ
operation is available. The Label
is unnecessary.
bd4701c
to
3664518
Compare
Add the metric TotalHangIO to record the total hang IO counts of FS. The inflight IOs data are from nydus daemon API /api/v1/metrics/inflight. Signed-off-by: Bin Tang <tangbin.bin@bytedance.com>
3664518
to
e270c1e
Compare
Add the metric TotalHangIO to record the total hang IO counts of FS. The inflight IOs data are from nydus daemon API
/api/v1/metrics/inflight
.Output:
Signed-off-by: Bin Tang tangbin.bin@bytedance.com