Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pace metering package #8

Closed
wants to merge 4 commits into from
Closed

Add pace metering package #8

wants to merge 4 commits into from

Conversation

xlab
Copy link

@xlab xlab commented Jun 2, 2022

This change is Reviewable

@xlab xlab requested a review from wojtek-coreum June 2, 2022 12:14
Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @wojtek-coreum and @xlab)

a discussion (no related file):
We need all this code (371 lines + unit tests) to just print the number of transactions per second executed concurrently?


Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @wojtek-coreum)

a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

We need all this code (371 lines + unit tests) to just print the number of transactions per second executed concurrently?

  1. the code was smaller before you asked to make use of atomic
  2. the code wasn't there, instead, it was a small dependency included with 1 extra line.

So yea, now we need all that to print the number of transactions per second executed concurrently.

Also, what's the matter of saving disc space by shredding extra code lines?


Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @wojtek-coreum and @xlab)

a discussion (no related file):

Previously, xlab (Max Kupriianov) wrote…
  1. the code was smaller before you asked to make use of atomic
  2. the code wasn't there, instead, it was a small dependency included with 1 extra line.

So yea, now we need all that to print the number of transactions per second executed concurrently.

Also, what's the matter of saving disc space by shredding extra code lines?

It's not about disk space obviously, it's about understanding this and number of bugs which is proportional to the number of lines.
I'm very sure logic we need might be implemented in less than 50 lines, just in a function inside the zstress package in the main repo.


Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @wojtek-coreum and @xlab)

a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

It's not about disk space obviously, it's about understanding this and number of bugs which is proportional to the number of lines.
I'm very sure logic we need might be implemented in less than 50 lines, just in a function inside the zstress package in the main repo.

Before even starting reviewing this I will wait for other guys opinion


Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think reviewing without context is really hard. and I think we should make it a rule that enough context must be provided in the description of the PR about what it tries to achieve for example it adds a feature then functionality does the PR provides.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @wojtek-coreum and @xlab)

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many functions and variables that we are not actively using in coreum. please remove those. for example 2 different types of reporters (log and zap), pause function. allowing float and int (we only need int). Also I prefer using atmic package instead of sync package.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @wojtek-coreum and @xlab)

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will block this for now.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @wojtek-coreum and @xlab)

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miladz68 If you want to start a discussion in the PR, please use "Start a new discussion" button below. Here, in the main thread reviewable publishes info about commits and reviews so your comments might be lost in the history

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @wojtek-coreum and @xlab)

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on @wojtek-coreum and @xlab)

a discussion (no related file):
@xlab what are the things we are going to measure in corezstress? I know about transactions per second being broadcasted. Anything else?


Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 3 unresolved discussions (waiting on @wojtek-coreum and @xlab)

a discussion (no related file):

Description of this package

Adding the description there in PR, so the git history is not covered with junk from README editing. If this PR is merged, a standalone README could be added for the future reference.

Main idea

Pace is a Go package that helps to answer one simple question: how fast it goes? It's a threadsafe counter that measures ticks in the specified timeframe. It also has a simple and intuitive interface:

func New(label string, interval time.Duration, repFn ReporterFunc) Pace

type Pace interface {
    // Step increments the counter of pace.
    Step(n int)
    // Pause stops reporting until resumed, all steps continue to be counted.
    Pause()
}

Initially the package was a standalone external dependency, however for the project needs I was asked to include it into the source, so I've copied some (well-tested) code and also made modifications to ensure that the package doesn't provide extra functionality that we don't use. Package provides ZapReporter for the only supported logger that we use.

Usage examples

The package being used in a currently pending PR related to the corezstress utility. The utility itself is a CLI app, where without any logging the user doesn't get any idea what the app does at the moment. With a few additional lines of code that setups the pace instance, and ticking it once an event happens, logs will contain info about certain actions pacing (see output examples).

signedTxPace := pace.New("signed tx", 10*time.Second, pace.ZapReporter(log))

			defer func() {
				signedTxPace.Pause()
			}()


					case result := <-results:
						signedTxPace.Step(1)

I think the package in this form served me well over years, to add some verbosity into random places in many tools and also as a debugging mechanism to detect stale flows (yes, the tool also reports whether the pacing becomes stale).

Important difference from the other reporting implementations like in PR #44 is that with a standalone package user is abstracted away from implementing the counter every time with risk of adding concurrency-related and other possible bugs. Also, this drop-in makes code much easier to read.

Output examples

INFO 329923 signed tx in 10s (pace: 32992.300/s)
INFO 333805 signed tx in 10s (pace: 33380.500/s)
INFO 333717 signed tx in 10s (pace: 33371.700/s)
INFO 333716 signed tx in 10s (pace: 33371.600/s)
INFO 334649 signed tx in 10s (pace: 33464.900/s)
INFO 334445 signed tx in 10s (pace: 33444.500/s)
INFO 336089 signed tx in 10s (pace: 33608.900/s)
INFO 333331 signed tx in 10s (pace: 33333.100/s)
INFO 329395 signed tx in 10s (pace: 32939.500/s)
INFO 3200 sent tx in 10s (pace: 320.000/s)
INFO 3600 sent tx in 10s (pace: 360.000/s)
INFO 3200 sent tx in 10s (pace: 320.000/s)
INFO 3428 sent tx in 10s (pace: 342.800/s)
INFO 2527 sent tx in 10s (pace: 252.700/s)
INFO 2145 sent tx in 10s (pace: 214.500/s)
INFO 2844 sent tx in 10s (pace: 284.400/s)
INFO 2432 sent tx in 10s (pace: 243.200/s)

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 6 files at r2.
Reviewable status: 3 of 6 files reviewed, 3 unresolved discussions (waiting on @wojtek-coreum and @xlab)

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 3 unresolved discussions (waiting on @miladz68, @wojtek-coreum, and @xlab)

a discussion (no related file):

Previously, xlab (Max Kupriianov) wrote…

Description of this package

Adding the description there in PR, so the git history is not covered with junk from README editing. If this PR is merged, a standalone README could be added for the future reference.

Main idea

Pace is a Go package that helps to answer one simple question: how fast it goes? It's a threadsafe counter that measures ticks in the specified timeframe. It also has a simple and intuitive interface:

func New(label string, interval time.Duration, repFn ReporterFunc) Pace

type Pace interface {
    // Step increments the counter of pace.
    Step(n int)
    // Pause stops reporting until resumed, all steps continue to be counted.
    Pause()
}

Initially the package was a standalone external dependency, however for the project needs I was asked to include it into the source, so I've copied some (well-tested) code and also made modifications to ensure that the package doesn't provide extra functionality that we don't use. Package provides ZapReporter for the only supported logger that we use.

Usage examples

The package being used in a currently pending PR related to the corezstress utility. The utility itself is a CLI app, where without any logging the user doesn't get any idea what the app does at the moment. With a few additional lines of code that setups the pace instance, and ticking it once an event happens, logs will contain info about certain actions pacing (see output examples).

signedTxPace := pace.New("signed tx", 10*time.Second, pace.ZapReporter(log))

			defer func() {
				signedTxPace.Pause()
			}()


					case result := <-results:
						signedTxPace.Step(1)

I think the package in this form served me well over years, to add some verbosity into random places in many tools and also as a debugging mechanism to detect stale flows (yes, the tool also reports whether the pacing becomes stale).

Important difference from the other reporting implementations like in PR #44 is that with a standalone package user is abstracted away from implementing the counter every time with risk of adding concurrency-related and other possible bugs. Also, this drop-in makes code much easier to read.

Output examples

INFO 329923 signed tx in 10s (pace: 32992.300/s)
INFO 333805 signed tx in 10s (pace: 33380.500/s)
INFO 333717 signed tx in 10s (pace: 33371.700/s)
INFO 333716 signed tx in 10s (pace: 33371.600/s)
INFO 334649 signed tx in 10s (pace: 33464.900/s)
INFO 334445 signed tx in 10s (pace: 33444.500/s)
INFO 336089 signed tx in 10s (pace: 33608.900/s)
INFO 333331 signed tx in 10s (pace: 33333.100/s)
INFO 329395 signed tx in 10s (pace: 32939.500/s)
INFO 3200 sent tx in 10s (pace: 320.000/s)
INFO 3600 sent tx in 10s (pace: 360.000/s)
INFO 3200 sent tx in 10s (pace: 320.000/s)
INFO 3428 sent tx in 10s (pace: 342.800/s)
INFO 2527 sent tx in 10s (pace: 252.700/s)
INFO 2145 sent tx in 10s (pace: 214.500/s)
INFO 2844 sent tx in 10s (pace: 284.400/s)
INFO 2432 sent tx in 10s (pace: 243.200/s)

@xlab Thanks for the description. it is nice, and is more complete than what I expected. a shorter description is still acceptable but what you have done is even nicer.


Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 6 files at r2, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @xlab)


pkg/pace/pace.go line 15 at r2 (raw file):

	Step(n int)
	// Pause stops reporting until resumed, all steps continue to be counted.
	Pause()

The original PR where we have the use case (https://reviewable.io/reviews/CoreumFoundation/coreum/44), we only need step and stop function. pause only makes sense when there is resume, but we don't have that use case.


pkg/pace/pace.go line 41 at r2 (raw file):

}

func (p *paceImpl) Pause() {

I think this should be renamed to stop.


pkg/pace/pace.go line 42 at r2 (raw file):

func (p *paceImpl) Pause() {
	p.timer.Stop()

we should stop reporting loop, and make sure it is stopped. then we will not need mutex, will make the struct and code simpler.

oliverasistemas
oliverasistemas previously approved these changes Jun 6, 2022
Copy link
Contributor

@oliverasistemas oliverasistemas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @xlab)

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @xlab)

a discussion (no related file):
@xlab Please don't add tags to the repository, pointing to commits which are not in master. Potential users should not be pointed to the code which didn't pass review.


Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @xlab)


pkg/pace/pace.go line 79 at r2 (raw file):

	}

	go p.reportingLoop()

All the goroutines in the project are started and managed by parallel package (https://github.com/CoreumFoundation/coreum-tools/tree/master/pkg/parallel).
Please read README there and have a look at documentation: https://pkg.go.dev/github.com/ridge/parallel
Take a look at example how it is used by benchmark tool: https://github.com/CoreumFoundation/coreum/blob/master/coreznet/pkg/zstress/stress.go#L49
If you have any questions please ping me.

parallel is used to run goroutines because it handles errors, panics, synchronization and logging, implemented once for entire project.
So instead of using go keyword you should create a function accepting argument of type parallel.SpawnFn and this function should be used to spawn the goroutine.
Also you should accept ctx context.Context as another argument and exit reporting loop once this context is cancelled, so function Stop() and Pause() are not needed.
Then you should get logger by calling log := logger.Get(ctx) because ctx contains correct logger preconfigured by parallel (it contains correct logger name).

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @wojtek-coreum, and @xlab)

a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

@xlab Please don't add tags to the repository, pointing to commits which are not in master. Potential users should not be pointed to the code which didn't pass review.

The tags are marked as rc0, ... ,rcN basically release candidates. Which they are.
I don't know what users have in mind when trying to use "release candidate" code, but, which tag naming patterns you suggest?

Totally ignoring tags mechanics in git is bad, since go.mod allows only commit/tag references. And commits are very easy to confuse.



pkg/pace/pace.go line 15 at r2 (raw file):

Previously, miladz68 (milad) wrote…

The original PR where we have the use case (https://reviewable.io/reviews/CoreumFoundation/coreum/44), we only need step and stop function. pause only makes sense when there is resume, but we don't have that use case.

Yep, I agree should be "Stop" then.


pkg/pace/pace.go line 42 at r2 (raw file):

Previously, miladz68 (milad) wrote…

we should stop reporting loop, and make sure it is stopped. then we will not need mutex, will make the struct and code simpler.

I'll see how this can be optimized, if the meter should be considered dead after stopping.


pkg/pace/pace.go line 79 at r2 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

All the goroutines in the project are started and managed by parallel package (https://github.com/CoreumFoundation/coreum-tools/tree/master/pkg/parallel).
Please read README there and have a look at documentation: https://pkg.go.dev/github.com/ridge/parallel
Take a look at example how it is used by benchmark tool: https://github.com/CoreumFoundation/coreum/blob/master/coreznet/pkg/zstress/stress.go#L49
If you have any questions please ping me.

parallel is used to run goroutines because it handles errors, panics, synchronization and logging, implemented once for entire project.
So instead of using go keyword you should create a function accepting argument of type parallel.SpawnFn and this function should be used to spawn the goroutine.
Also you should accept ctx context.Context as another argument and exit reporting loop once this context is cancelled, so function Stop() and Pause() are not needed.
Then you should get logger by calling log := logger.Get(ctx) because ctx contains correct logger preconfigured by parallel (it contains correct logger name).

Let's rename Go to "parallel" too ;)

Go language has enough primitives to manage concurrent code, let's not invent Erlang for the trivial stuff.

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 79 at r2 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

All the goroutines in the project are started and managed by parallel package (https://github.com/CoreumFoundation/coreum-tools/tree/master/pkg/parallel).
Please read README there and have a look at documentation: https://pkg.go.dev/github.com/ridge/parallel
Take a look at example how it is used by benchmark tool: https://github.com/CoreumFoundation/coreum/blob/master/coreznet/pkg/zstress/stress.go#L49
If you have any questions please ping me.

parallel is used to run goroutines because it handles errors, panics, synchronization and logging, implemented once for entire project.
So instead of using go keyword you should create a function accepting argument of type parallel.SpawnFn and this function should be used to spawn the goroutine.
Also you should accept ctx context.Context as another argument and exit reporting loop once this context is cancelled, so function Stop() and Pause() are not needed.
Then you should get logger by calling log := logger.Get(ctx) because ctx contains correct logger preconfigured by parallel (it contains correct logger name).

I tend to disagree with using parallel package or at least making it a rule to use that package. it seems that the package adds more than it actually solves.
@ysv @oliverasistemas would you please give your opinions as well ?

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 79 at r2 (raw file):

Previously, miladz68 (milad) wrote…

I tend to disagree with using parallel package or at least making it a rule to use that package. it seems that the package adds more than it actually solves.
@ysv @oliverasistemas would you please give your opinions as well ?

And I think if the goroutine is controlled by paceImpl then mutex would not be needed. This is the point I was trying to make in a previous comment.

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 79 at r2 (raw file):

Previously, miladz68 (milad) wrote…

And I think if the goroutine is controlled by paceImpl then mutex would not be needed. This is the point I was trying to make in a previous comment.

@ysv @miladz68 @oliverasistemas

Guys it's really another deep subject. I tried to dig deeply into whole IoC thing while trying to make a new app. It's a rabbit hole of metaprogramming. So, I just used my oldschool patterns instead, to deliver it quickly and don't have too much meta for a subcommand with few flags.

Let's make another discussion somewhere.

Also, if we enforce this as a rule, I'm fine, but the following must be done before we do that:

  1. Full guide on how to create new apps without a lot of boilerplate
  2. Strip down unused IoC code, it's really confusing to read
  3. Make znet init patterns to be reusable, right now it's all hardcoded for a single use case (of znet app).

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 42 at r2 (raw file):

Previously, xlab (Max Kupriianov) wrote…

I'll see how this can be optimized, if the meter should be considered dead after stopping.

I left a comment below, sorry if my comments are all over the place, but to put it in other words by having more control over the goroutine (preferably with context.Context if it makes sense) then we can do away with mutex.

Copy link

@ysv ysv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, and @xlab)


pkg/pace/pace.go line 41 at r2 (raw file):

Previously, miladz68 (milad) wrote…

I think this should be renamed to stop.

100% agree because pause is temporary thing so it should logically have sort of continue method


pkg/pace/pace.go line 79 at r2 (raw file):

Previously, xlab (Max Kupriianov) wrote…

@ysv @miladz68 @oliverasistemas

Guys it's really another deep subject. I tried to dig deeply into whole IoC thing while trying to make a new app. It's a rabbit hole of metaprogramming. So, I just used my oldschool patterns instead, to deliver it quickly and don't have too much meta for a subcommand with few flags.

Let's make another discussion somewhere.

Also, if we enforce this as a rule, I'm fine, but the following must be done before we do that:

  1. Full guide on how to create new apps without a lot of boilerplate
  2. Strip down unused IoC code, it's really confusing to read
  3. Make znet init patterns to be reusable, right now it's all hardcoded for a single use case (of znet app).
  1. 100% I would not force to use parallel everywhere it is
  2. as for passing context I think it is right thing to be addressed but Stop method is required also because their purpose is different
  3. as for fetching logger from ctx I always try to avoid these type of things it unless it is mandatory. As for me storing of variables in ctx is bad practice similar to global variables but what is more ctx vars are not typed.
  4. As for IoC I agree with Maxim and I also found it challenging to go through all of code and even after spending bunch of hours on it I still don't feel comfortable with codebase

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, and @xlab)

a discussion (no related file):

Previously, xlab (Max Kupriianov) wrote…

The tags are marked as rc0, ... ,rcN basically release candidates. Which they are.
I don't know what users have in mind when trying to use "release candidate" code, but, which tag naming patterns you suggest?

Totally ignoring tags mechanics in git is bad, since go.mod allows only commit/tag references. And commits are very easy to confuse.

Until new code is in master just use replace directive in go.mod of the other repo to point it to your local version


Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, and @xlab)

a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

Until new code is in master just use replace directive in go.mod of the other repo to point it to your local version

"My local version" doesn't work because
a) everything shall be in git, not on the disk
b) the implementation and usage are in different repos

That means, is you're trying to check my implementation and pull the branch, it won't compile since replace points to local version on the disk.


Copy link

@ysv ysv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, and @xlab)


pkg/pace/pace.go line 86 at r2 (raw file):

func (p *paceImpl) reportingLoop() {
	for range p.timer.C {
		func() {

is there any specific reason to put it into a separate func ?

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 6 files reviewed, 8 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 41 at r2 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

100% agree because pause is temporary thing so it should logically have sort of continue method

Now it's stop with full shutdown.


pkg/pace/pace.go line 42 at r2 (raw file):

Previously, miladz68 (milad) wrote…

I left a comment below, sorry if my comments are all over the place, but to put it in other words by having more control over the goroutine (preferably with context.Context if it makes sense) then we can do away with mutex.

No problem, reasonable. I added it & tests.


pkg/pace/pace.go line 79 at r2 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…
  1. 100% I would not force to use parallel everywhere it is
  2. as for passing context I think it is right thing to be addressed but Stop method is required also because their purpose is different
  3. as for fetching logger from ctx I always try to avoid these type of things it unless it is mandatory. As for me storing of variables in ctx is bad practice similar to global variables but what is more ctx vars are not typed.
  4. As for IoC I agree with Maxim and I also found it challenging to go through all of code and even after spending bunch of hours on it I still don't feel comfortable with codebase

Done.


pkg/pace/pace.go line 86 at r2 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

I see
In such cases I prefer to move this to a separate func then
But current implementation is also ok for me

Done.

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

a discussion (no related file):
@miladz68 @ysv
During our discussions on goroutines you agreed that you prefer to manage goroutines manually and you said that it creates a trouble for you if goroutine is hidden somewhere under the hood. This library does exactly that. It hides the goroutine in the constructor of the object.

I completely understand why it is done this way. Because as we discussed managing goroutine is a repeatable task so write this code once. And I know that most of the people do it this way but this fact doesn't mean that this is the right thing.

Basically there are two activities happening here:

  • collecting and reporting metric
  • managing goroutine

In my opinion those two things should be done separately. This library should expose two functions:

  • one for counting: Step
  • second one for logging: reportingLoop(ctx context.Context) which should be made public and renamed to sth like Run

Then I should be able to use the library like this:

go func() {
    p.Run(ctx) // Run reports status periodically and exits once `ctx` is canceled
}()

for {
    // ...
    p.Step(1)
}


pkg/pace/pace.go line 86 at r2 (raw file):

Previously, xlab (Max Kupriianov) wrote…

Done.

Did you forget to push the commit? Because it's still an inlined function


pkg/pace/pace.go line 23 at r3 (raw file):

type paceImpl struct {
	mux *sync.RWMutex

What fields here are protected by the mutex?
For better readability, could you please divide fieds into two groups:

  • those not protected by the mutex
  • those protected by the mutex

like this:

value    int64
label    string
interval time.Duration

mux *sync.RWMutex
repFn    ReporterFunc
cancelFn func()
timer    *time.Timer

?
In the example above I divided the fields randomly. It makes my life easier when it comes to understanding the code.


pkg/pace/pace.go line 23 at r3 (raw file):

type paceImpl struct {
	mux *sync.RWMutex

paceImpl is returned using pointer so you don't need a pointer to the mutex


pkg/pace/pace.go line 30 at r3 (raw file):

	lastTick time.Time
	repFn    ReporterFunc
	cancelFn func()

use context.CancelFn as a type for cancelFn to clearly indicate its origin and purpose


pkg/pace/pace.go line 31 at r3 (raw file):

	repFn    ReporterFunc
	cancelFn func()
	timer    *time.Timer

@ysv @miladz68
Previously I had a discussion with @xlab regarding using time.Timer over simple <-time.After. His argument is that there was a guy who created 16 millions timers per second using time.After, each leaving for 5 minutes. Obviously that guy killed the garbage collector. @xlab will probably paste a link in the reply.

The thing is that here we don't have this problem. Wouldn't it be better then to just use simple <-time.After which is just a one-liner rather than time.Timer which must be stored in the struct, started and reseted? Why to care about this additional stuff if you have a fire-and-forget solution?


pkg/pace/pace.go line 48 at r3 (raw file):

func (p *paceImpl) report() {
	timeframe := time.Since(p.lastTick)
	if abs(timeframe-p.interval) < 10*time.Millisecond {

Could you please explain the idea behind this if? I also don't understand the purpose of lastTick.


pkg/pace/pace.go line 72 at r3 (raw file):

	}

	paceCtx, cancelFn := context.WithCancel(context.Background())

To be compatible with the rest of the software please use ctx received as an argument to this function instead of context.Background()


pkg/pace/pace.go line 75 at r3 (raw file):

	p.cancelFn = cancelFn

	go p.reportingLoop(paceCtx)

I don't like the idea that object constructor starts the goroutine. Constructor is there to prepare the object. You don't expect that by calling constructor you start some serious job in the background. I consider it to be a design issue. If you want to do it this way there should be a Start() method starting the goroutine.


pkg/pace/zap_reporter.go line 11 at r3 (raw file):

// ZapReporter reports using the provided zap logger and stops reporting when flow of events is stoped.
func ZapReporter(log *zap.Logger) ReporterFunc {

What if there are many metrics being reported by the same setup? It would be nice to report them using single log-item instead of flooding the terminal.
Also if there are many metrics reported together it is quite important to group them because if many goroutines log their stuff, then lines from all of them are mixed and it might be hard to find matching values in the stream.


pkg/pace/zap_reporter.go line 19 at r3 (raw file):

		case value == 0 && previous == 0:
			return // don't report anything
		case value == 0 && previous != 0:

Please explain what is done inside this case


pkg/pace/zap_reporter.go line 47 at r3 (raw file):

		switch timeframe {
		case time.Second:
			log.Sugar().Infof("%s: %s/s in %v", label, intFmt(value), timeframe)

Is there a reason you use %s, intFmt and floatFmt instead of appropriate %-something syntax directly?


pkg/pace/zap_reporter.go line 47 at r3 (raw file):

		switch timeframe {
		case time.Second:
			log.Sugar().Infof("%s: %s/s in %v", label, intFmt(value), timeframe)

Could you please use structured logging using zap.Fields? Logs are there not only to present themselves in the terminal but they are also collected by different tools like elasticsearch, grafana or processed by other software. By using just a simple string you literally kill the possibility of manipulating them (filtering, searching, grouping).

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 6 files at r2.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)


pkg/pace/pace.go line 31 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

@ysv @miladz68
Previously I had a discussion with @xlab regarding using time.Timer over simple <-time.After. His argument is that there was a guy who created 16 millions timers per second using time.After, each leaving for 5 minutes. Obviously that guy killed the garbage collector. @xlab will probably paste a link in the reply.

The thing is that here we don't have this problem. Wouldn't it be better then to just use simple <-time.After which is just a one-liner rather than time.Timer which must be stored in the struct, started and reseted? Why to care about this additional stuff if you have a fire-and-forget solution?

While referring to "some guy on the internetz" may sound sketchy, the original Go code comment discourages anyone to use time.After, it's a cool thing that worked for early projects, but then people started to put it into production and memory begin to be eaten. I didn't want to use time.After because it introduces a bad example into the code, some other team member may copy it and put into a hot path.

I also tried to simplify time management code there by replacing time.Timer with time.Ticker in a loop, see this commit ffd5a40 but Ticker has floating intervals and under heavy GC load the time interval starts to drift, 10s becomes 10.3943409s and so on.

Honestly, I spent on optimising shit out of this package for so much time, I rarely spend so much time on really advanced code.

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)


pkg/pace/pace.go line 31 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

While referring to "some guy on the internetz" may sound sketchy, the original Go code comment discourages anyone to use time.After, it's a cool thing that worked for early projects, but then people started to put it into production and memory begin to be eaten. I didn't want to use time.After because it introduces a bad example into the code, some other team member may copy it and put into a hot path.

I also tried to simplify time management code there by replacing time.Timer with time.Ticker in a loop, see this commit ffd5a40 but Ticker has floating intervals and under heavy GC load the time interval starts to drift, 10s becomes 10.3943409s and so on.

Honestly, I spent on optimising shit out of this package for so much time, I rarely spend so much time on really advanced code.

Screenshot 2022-06-07 at 20.42.56.png
Screenshot 2022-06-07 at 20.43.33.png

That being said, it has to be Timer.

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)


pkg/pace/pace.go line 31 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

Screenshot 2022-06-07 at 20.42.56.png
Screenshot 2022-06-07 at 20.43.33.png

That being said, it has to be Timer.

"If efficiency is a concern" is the keyword. Nobody will fire logging function more often than a second.

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 18 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

a discussion (no related file):
@miladz68 @ysv @xlab

The more I think about this library the more I'm sure that we should come back to the design board and think about it once again, specifying requirements first. In my opinion it's a reasonable assumption that this package might be used to print more than one metric in the same logical scope. Because we don't have this situation now (and in other unanswered comment I asked about the list of metrics @xlab wants to report), I'm not insisting on supporting the full scenario now, but this code should be ready to add such support by extending the library, not reimplementing it from scratch.

In current state I see three problems related to reporting many metrics:

  • each metric starts separate goroutine, its own timer etc. It takes additional resources. I don't care much but if @xlab wants to really discuss performance implications of using time.After over time.Timer (https://reviewable.io/reviews/CoreumFoundation/coreum-tools/8#-N3zBkiG7tiBTBfQfmav) then this point should be fixed first
  • the more important side effect of reporting each metric in its own goroutine is that metrics which are reported together in the source code, won't be printed together in the console. There are (might be - it's reasonable assumption) other goroutines running in parallel, they also log the stuff. Then, logs produced by metrics will be mixed with other logs, which means finding related metrics to compare them will be difficult
  • the other side effect of running a goroutine per metric is that chances are great (I guess 100%) that goroutines of the reporters might be scheduled in different order, so the lines in console will appear in a random order each time, forcing me to hunt for the right line I'm interested in at the moment.

So please, let's design this library in the way that it makes our life easier, not harder.


Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)


pkg/pace/pace_test.go line 21 at r3 (raw file):

	})

	go func() {

if tested code panics, go test will just terminate immediately without running other tests


pkg/pace/pace_test.go line 27 at r3 (raw file):

	}()

	time.Sleep(50 * time.Millisecond)

Every time you use time.Sleep in this unit test you have no guarantees that the tested code does what you expect. It might happen (and to answer your next question: yes, it really happens) that machine running this test (e.g. in CI) is overloaded causing random failures of testing framework. On the other hand if you set sleep to be high enough to handle 95% of situations and apply this approach in other unit tests then the testing will take much longer.

This is bad approach. If you really want to test this library it should be done without goroutines and library should be written in the way that does not require goroutines to be tested.

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)


pkg/pace/pace.go line 23 at r3 (raw file):

type paceImpl struct {
	mux *sync.RWMutex

with pause and resume function gone. I don't think we need the mutex anymore.


pkg/pace/pace.go line 31 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

"If efficiency is a concern" is the keyword. Nobody will fire logging function more often than a second.

To me this discussion seems unproductive. Because I think this is not something to bring on so much back and forth.

Copy link

@ysv ysv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

@miladz68 @ysv
During our discussions on goroutines you agreed that you prefer to manage goroutines manually and you said that it creates a trouble for you if goroutine is hidden somewhere under the hood. This library does exactly that. It hides the goroutine in the constructor of the object.

I completely understand why it is done this way. Because as we discussed managing goroutine is a repeatable task so write this code once. And I know that most of the people do it this way but this fact doesn't mean that this is the right thing.

Basically there are two activities happening here:

  • collecting and reporting metric
  • managing goroutine

In my opinion those two things should be done separately. This library should expose two functions:

  • one for counting: Step
  • second one for logging: reportingLoop(ctx context.Context) which should be made public and renamed to sth like Run

Then I should be able to use the library like this:

go func() {
    p.Run(ctx) // Run reports status periodically and exits once `ctx` is canceled
}()

for {
    // ...
    p.Step(1)
}

Yes that is what I don't like in this solution


a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

@miladz68 @ysv @xlab

The more I think about this library the more I'm sure that we should come back to the design board and think about it once again, specifying requirements first. In my opinion it's a reasonable assumption that this package might be used to print more than one metric in the same logical scope. Because we don't have this situation now (and in other unanswered comment I asked about the list of metrics @xlab wants to report), I'm not insisting on supporting the full scenario now, but this code should be ready to add such support by extending the library, not reimplementing it from scratch.

In current state I see three problems related to reporting many metrics:

  • each metric starts separate goroutine, its own timer etc. It takes additional resources. I don't care much but if @xlab wants to really discuss performance implications of using time.After over time.Timer (https://reviewable.io/reviews/CoreumFoundation/coreum-tools/8#-N3zBkiG7tiBTBfQfmav) then this point should be fixed first
  • the more important side effect of reporting each metric in its own goroutine is that metrics which are reported together in the source code, won't be printed together in the console. There are (might be - it's reasonable assumption) other goroutines running in parallel, they also log the stuff. Then, logs produced by metrics will be mixed with other logs, which means finding related metrics to compare them will be difficult
  • the other side effect of running a goroutine per metric is that chances are great (I guess 100%) that goroutines of the reporters might be scheduled in different order, so the lines in console will appear in a random order each time, forcing me to hunt for the right line I'm interested in at the moment.

So please, let's design this library in the way that it makes our life easier, not harder.

Running goroutine per metric doesn't make sense for me also in the way I see it. We should be able to run multiple reporters with multiple metrics reported by each one



pkg/pace/pace.go line 72 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

To be compatible with the rest of the software please use ctx received as an argument to this function instead of context.Background()

agree


pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Every time you use time.Sleep in this unit test you have no guarantees that the tested code does what you expect. It might happen (and to answer your next question: yes, it really happens) that machine running this test (e.g. in CI) is overloaded causing random failures of testing framework. On the other hand if you set sleep to be high enough to handle 95% of situations and apply this approach in other unit tests then the testing will take much longer.

This is bad approach. If you really want to test this library it should be done without goroutines and library should be written in the way that does not require goroutines to be tested.

I also don't like to use sleep in tests but if there is no other way to test it it is ok
I believe in this case it should be possible to test without sleep but I would not insist on it much


pkg/pace/zap_reporter.go line 11 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

What if there are many metrics being reported by the same setup? It would be nice to report them using single log-item instead of flooding the terminal.
Also if there are many metrics reported together it is quite important to group them because if many goroutines log their stuff, then lines from all of them are mixed and it might be hard to find matching values in the stream.

The thing here is that we wanted to log a single metric for new
And I don't see any issue on adding multi-metric logging in this part of code
However in other parts it is more complicated


pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Is there a reason you use %s, intFmt and floatFmt instead of appropriate %-something syntax directly?

It is done to avoid code duplication and looks reasonable for me


pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Could you please use structured logging using zap.Fields? Logs are there not only to present themselves in the terminal but they are also collected by different tools like elasticsearch, grafana or processed by other software. By using just a simple string you literally kill the possibility of manipulating them (filtering, searching, grouping).

agree

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

a discussion (no related file):

Previously, ysv (Yaroslav Savchuk) wrote…

Running goroutine per metric doesn't make sense for me also in the way I see it. We should be able to run multiple reporters with multiple metrics reported by each one

To address issues outlined in this comment:

  1. multiple goroutines (like 3-4 of extra go routines) will not make any impact on the performance, in a language where goroutines are spawned in millions.

2 -3) log lines are never in perfect order, because you have other logging from other goroutines and any Logger will use buffered logging, this is especially true for cloud environments when logs are collected from multiple sources and ordering is never taken seriously.



pkg/pace/pace.go line 86 at r2 (raw file):
I tried to resolve the conversation and Reviewable automatically adds "Done."

current implementation is also ok for me

This is a green light for me.


pkg/pace/pace.go line 75 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

I don't like the idea that object constructor starts the goroutine. Constructor is there to prepare the object. You don't expect that by calling constructor you start some serious job in the background. I consider it to be a design issue. If you want to do it this way there should be a Start() method starting the goroutine.

This is a common Go pattern, sometimes it makes sense to make an explicit Start, but for this package I wanted to make interface as clean as possible.

Also, it was done this way to match semantics of time.NewTimer or time.NewTicker, basically you initialize it and it start ticking.


pkg/pace/pace_test.go line 21 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

if tested code panics, go test will just terminate immediately without running other tests

It's handled by the go testing workflow


pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

I also don't like to use sleep in tests but if there is no other way to test it it is ok
I believe in this case it should be possible to test without sleep but I would not insist on it much

The code under test has asynchronous nature, I could go miles by scaffolding syncing primitives around, but I decided that using time constrains close to real world that are simplier to use.

This approach should be considered case-by-case of course.


pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

It is done to avoid code duplication and looks reasonable for me

Originally it was floatFmt in all places, but intFmt improves the output clarity. Right now it still makes sense to abstract away the specifics of the number type.


pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

agree

I did this on purpose because structured fields in Zap cause the log line to span across multiple lines. With 3 variables there and 2 additional line from the Zap itself, it will be 6 lines of text printed every tick.

However, if you insist I can convert this into structured version easily.

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 6 files reviewed, 22 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 23 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

What fields here are protected by the mutex?
For better readability, could you please divide fieds into two groups:

  • those not protected by the mutex
  • those protected by the mutex

like this:

value    int64
label    string
interval time.Duration

mux *sync.RWMutex
repFn    ReporterFunc
cancelFn func()
timer    *time.Timer

?
In the example above I divided the fields randomly. It makes my life easier when it comes to understanding the code.

I've changed the ordering of the fields, also renamed mux to lastTickMux to signify what it protects now.


pkg/pace/pace.go line 23 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

paceImpl is returned using pointer so you don't need a pointer to the mutex

It doesn't hurt to have pointer here still.


pkg/pace/pace.go line 30 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

use context.CancelFn as a type for cancelFn to clearly indicate its origin and purpose

Done.


pkg/pace/pace.go line 31 at r3 (raw file):

Previously, miladz68 (milad) wrote…

To me this discussion seems unproductive. Because I think this is not something to bring on so much back and forth.

I agree, let's move on.


pkg/pace/pace.go line 72 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

agree

Added ctx param to the New() method, now the context has a root.


pkg/pace/pace.go line 63 at r4 (raw file):

// All ticks (or steps) are aggregated in timeframes specified using interval.
// If the reporting function was not provided, ZapReporter will be used as default.
func New(ctx context.Context, label string, interval time.Duration, reportFn ...ReporterFunc) Pace {

Made the reportFn optional, can be omitted if you assume that a default-company-wide-standartized-approved reported func shall be used by default.

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/zap_reporter.go line 11 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

The thing here is that we wanted to log a single metric for new
And I don't see any issue on adding multi-metric logging in this part of code
However in other parts it is more complicated

As has been discussed on the call, the package was never meant as a professional metric collecting package. You don't expect log.Println to be able to print 100 log lines at the same time, or time.NewTimer to run many timers at once.

This package implements a debugging helper, also some CLI guidance to show some activity in the console and server logs. It's oriented towards INFO/DEBUG logging capabilities, rather than metrics collection agents like Prometheus exporters, etc.


pkg/pace/zap_reporter.go line 19 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Please explain what is done inside this case

The flow/pace is considered to be stalled and that fact is reported. It helps to avoid situations when you still have 0.00003/s but not stale. This case handles absolute zeroes in a time frame.

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 48 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Could you please explain the idea behind this if? I also don't understand the purpose of lastTick.

lastTick stores starting point for the current timeframe. Basically the timestamp of the last counter reset.

As for this particular if, it rounds up timeframe to p.interval, if the difference between them is less than 10ms. This allows to get you round numbers when GC pressure is high or your external triggering function was not accurate 100%, but close enough to p.interval.

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

a discussion (no related file):

Previously, xlab (Max Kupriianov) wrote…

To address issues outlined in this comment:

  1. multiple goroutines (like 3-4 of extra go routines) will not make any impact on the performance, in a language where goroutines are spawned in millions.

2 -3) log lines are never in perfect order, because you have other logging from other goroutines and any Logger will use buffered logging, this is especially true for cloud environments when logs are collected from multiple sources and ordering is never taken seriously.

I gave more details about points 2 and 3 in other comment in this PR: https://reviewable.io/reviews/CoreumFoundation/coreum-tools/8#-N3zKHU48bJ5cQkdWDWY

Let's continue the discussion there.



pkg/pace/pace.go line 86 at r2 (raw file):

Previously, xlab (Max Kupriianov) wrote…

I tried to resolve the conversation and Reviewable automatically adds "Done."

current implementation is also ok for me

This is a green light for me.

For me, using inline functions is not an issue but if someone else in the team finds it unclear then I'm for applying this change. @ysv - you made the original comment so it's up to you


pkg/pace/pace.go line 23 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

It doesn't hurt to have pointer here still.

but it just doesn't make sense

@miladz68


pkg/pace/pace.go line 48 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

lastTick stores starting point for the current timeframe. Basically the timestamp of the last counter reset.

As for this particular if, it rounds up timeframe to p.interval, if the difference between them is less than 10ms. This allows to get you round numbers when GC pressure is high or your external triggering function was not accurate 100%, but close enough to p.interval.

If you want to be as precise as possible in calculations and taking real time between reports instead of just p.interval then why do you ignore 10-millisecond shift? Why not just take the real timeframe as it is returned by timeframe := time.Since(p.lastTick) without rounding?


pkg/pace/pace.go line 75 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

This is a common Go pattern, sometimes it makes sense to make an explicit Start, but for this package I wanted to make interface as clean as possible.

Also, it was done this way to match semantics of time.NewTimer or time.NewTicker, basically you initialize it and it start ticking.

@miladz68


pkg/pace/pace.go line 27 at r4 (raw file):

type paceImpl struct {
	lastTickMux *sync.RWMutex
	lastTick    time.Time

lastTick is referenced from two methods only reportingLopp (running in goroutine) and report. report is called only inside reportingLoop. It means all the read/write operations on lastTick are made by the same goroutine, so you don't need a mutex to protect it.

Moreover lastTick could be just a local variable of reportingLoop declared before for and all the updates could happen inside reportingLoop, then lastTick could be passed as an argument to report method.


pkg/pace/pace.go line 63 at r4 (raw file):

Previously, xlab (Max Kupriianov) wrote…

Made the reportFn optional, can be omitted if you assume that a default-company-wide-standartized-approved reported func shall be used by default.

  • using ... is an antipattern of making an argument optional in go. It suggests to the caller that it is valid to pass many of them
  • I'm for removing it completely

@miladz68


pkg/pace/pace_test.go line 21 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

It's handled by the go testing workflow

It's not:

func TestFirst(t *testing.T) {
	done := make(chan struct{})
	go func() {
		panic("panicked")
		close(done)
	}()
	<-done
}

func TestSecond(t *testing.T) {
	assert.Fail(t, "second test failed")
}

result of running go test .:

[wojciech@coreum pkg]$ go test .
panic: panicked

goroutine 20 [running]:
github.com/CoreumFoundation/coreum-tools/pkg.TestFirst.func1()
	/home/wojciech/coreum-tools/pkg/hello_test.go:12 +0x27
created by github.com/CoreumFoundation/coreum-tools/pkg.TestFirst
	/home/wojciech/coreum-tools/pkg/hello_test.go:11 +0x6a
FAIL	github.com/CoreumFoundation/coreum-tools/pkg	0.013s
FAIL

TestSecond is never executed and it's failure is not reported


pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

The code under test has asynchronous nature, I could go miles by scaffolding syncing primitives around, but I decided that using time constrains close to real world that are simplier to use.

This approach should be considered case-by-case of course.

IMO If you want to test this code, do it the right way. If the code must be changed to be tested properly - change it. In my opinion the logic in the library could be so simple that it doesn't require testing at all.

More context:
The unit tests are run by CI on every pull request made by all the team members. I really want to avoid situations where someone receives failures unrelated to committed code. In such scenario they must go to CI and inspect results just to find out that it's unrelated failure caused by overloaded CI server, and then wait until CI runs again. Doing such things to our teammates is really a good practice.

@ysv @miladz68


pkg/pace/zap_reporter.go line 11 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

As has been discussed on the call, the package was never meant as a professional metric collecting package. You don't expect log.Println to be able to print 100 log lines at the same time, or time.NewTimer to run many timers at once.

This package implements a debugging helper, also some CLI guidance to show some activity in the console and server logs. It's oriented towards INFO/DEBUG logging capabilities, rather than metrics collection agents like Prometheus exporters, etc.

My point is not to use separate log.* for each metric. My point is to use a single call to log.* to print all of them at once so there is a guarantee that all the metrics created by the caller are printed together so the end-user is not forced to hunt them in the terminal.

I've already said elsewhere that lack of logging many metrics is the sign that important requirement was missed during the design. I have two use-cases so far where I want two metrics to be printed together.

This is an example of code to print all the metrics in single log item:

		fields := make([]zap.Field, 0, len(r.reporters))
		for _, c := range r.reporters {
			fields = append(fields, c(scale))
		}
		log.Info(r.title, fields...)

@miladz68


pkg/pace/zap_reporter.go line 19 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

The flow/pace is considered to be stalled and that fact is reported. It helps to avoid situations when you still have 0.00003/s but not stale. This case handles absolute zeroes in a time frame.

I still don't understand, please ping me on slack when you have time


pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

Originally it was floatFmt in all places, but intFmt improves the output clarity. Right now it still makes sense to abstract away the specifics of the number type.

@ysv I don't understand

My point is that this:

log.Sugar().Infof("%s: %s/s in %v", label, intFmt(value), timeframe)

could be replaced by this:

log.Sugar().Infof("%s: %d/s in %v", label, value, timeframe)

and intFmt could be removed. If this is done the way I propose, where is the code duplication?


pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

I did this on purpose because structured fields in Zap cause the log line to span across multiple lines. With 3 variables there and 2 additional line from the Zap itself, it will be 6 lines of text printed every tick.

However, if you insist I can convert this into structured version easily.

How logs are formatted depends on how logger is configured. core, coreznet and corezstress have CLI flag --log-format accepting one of 3 values:

  • json - formats each log item in a single-line json - this is good if app is a service running in production where logs are collected and sent to elastic, grafana etc.
  • console - standard format delivered by zap which is intended to format logs in human-readable form, one line per log item - in practice the output is shitty if fields contain special characters because they are escaped like in json
  • yaml - custom format written to me which formats logs as a multi-line yaml - logs might take a lot of space but it's easier if you want to really debu something, and it's possible to load them into editor and use folding

To summarize:
Formatting the logs the way you like is not a responsibility of pace but the encoder which is used by zap when they are really printed.
If you prefer to have a one line per log item, use --log-format value of your choice.
But pace itself should produce well-structured logs to enable further processing capabilities offered by external tools.

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 23 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

but it just doesn't make sense

@miladz68

I don't think it is necessary but I like to see us stick to a convention and avoid arbitrary styles. I think it makes sense to avoid pointers as much as possible and introduce them as less often as possible. I think if one of them gets the job done, the other one should be avoided.

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 63 at r4 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…
  • using ... is an antipattern of making an argument optional in go. It suggests to the caller that it is valid to pass many of them
  • I'm for removing it completely

@miladz68

I agree with wojtek here.

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 75 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

@miladz68

In our discussion I argued that this goroutine is playing the role of PrinterController and is not part of the counter package. for me a separation concerns would be nice

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

Copy link

@miladz68 miladz68 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

IMO If you want to test this code, do it the right way. If the code must be changed to be tested properly - change it. In my opinion the logic in the library could be so simple that it doesn't require testing at all.

More context:
The unit tests are run by CI on every pull request made by all the team members. I really want to avoid situations where someone receives failures unrelated to committed code. In such scenario they must go to CI and inspect results just to find out that it's unrelated failure caused by overloaded CI server, and then wait until CI runs again. Doing such things to our teammates is really a good practice.

@ysv @miladz68

testing async code is hard. it is super hard in fact. but writing correct code is hard as well. I agree with yaroslav here I would like to see it done correctly but I would not put too much emphasis on it.

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, miladz68 (milad) wrote…

testing async code is hard. it is super hard in fact. but writing correct code is hard as well. I agree with yaroslav here I would like to see it done correctly but I would not put too much emphasis on it.

Personally if I see a test like this (using unpredictable Sleep) I prefer to not have a test at all, especially in this case where tested logic could be simple and once accepted not modified for long time

Copy link
Author

@xlab xlab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 23 at r3 (raw file):

I think it makes sense to avoid pointers as much as possible

any argumentation?


pkg/pace/pace.go line 23 at r3 (raw file):

Previously, miladz68 (milad) wrote…

with pause and resume function gone. I don't think we need the mutex anymore.

Mutex locks lastTick which still participates in the control loop. Which can become even more important to have, if the package implements an external-controllable ticking like you suggested on the call.


pkg/pace/pace_test.go line 21 at r3 (raw file):

FAIL

this is the output of the testing workflow, it marks test as failing upon panic.


pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Personally if I see a test like this (using unpredictable Sleep) I prefer to not have a test at all, especially in this case where tested logic could be simple and once accepted not modified for long time

Oh sorry for adding too much tests

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace_test.go line 21 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

FAIL

this is the output of the testing workflow, it marks test as failing upon panic.

The correct output of testing workflow is the list of all failing tests.

@miladz68


pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

Oh sorry for adding too much tests

@xlab I'm not going to discuss with you this way

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r4.
Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)


pkg/pace/pace.go line 63 at r4 (raw file):

// All ticks (or steps) are aggregated in timeframes specified using interval.
// If the reporting function was not provided, ZapReporter will be used as default.
func New(ctx context.Context, label string, interval time.Duration, reportFn ...ReporterFunc) Pace {

Stop reporting when ctx received here is canceled, then you could remove Stop method and cancelFn completely


pkg/pace/pace.go line 74 at r4 (raw file):

	}

	logger, _ := zap.NewProduction()

Please use zap logger provided as an argument to pace because it might be configured differently than you expect here


pkg/pace/pace.go line 99 at r4 (raw file):

			}

			p.defaultLogger.With(zap.Error(

Here and above: you may do it without creating new logger you are not using here:
p.defaultLogger.Warn("...", zap.Error(...))


pkg/pace/pace.go line 99 at r4 (raw file):

			}

			p.defaultLogger.With(zap.Error(

please add zap.Any("panic", v) too


pkg/pace/pace_test.go line 18 at r4 (raw file):

		if stopped {
			t.Fatalf("not expected to fire after stop")
			t.FailNow()
  1. testReportingFn is run in goroutine by pace
  2. t.FailNow()uses runtime.Goexit to break the test

That's why documentation of t.FailNow says:

// FailNow must be called from the goroutine running the
// test or benchmark function, not from other goroutines
// created during the test. Calling FailNow does not stop
// those other goroutines.

Copy link
Collaborator

@wojtek-coreum wojtek-coreum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r4.
Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

@xlab xlab closed this Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants