Search code, repositories, users, issues, pull requests...

func (p *paceImpl) Pause() {
	p.timer.Stop()

we should stop reporting loop, and make sure it is stopped. then we will not need mutex, will make the struct and code simpler.

oliverasistemas

Reviewed all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @xlab)

wojtek-coreum

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @xlab)

a discussion (no related file):
@xlab Please don't add tags to the repository, pointing to commits which are not in master. Potential users should not be pointed to the code which didn't pass review.

wojtek-coreum

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @xlab)

	}

	go p.reportingLoop()

All the goroutines in the project are started and managed by parallel package (https://github.com/CoreumFoundation/coreum-tools/tree/master/pkg/parallel).
Please read README there and have a look at documentation: https://pkg.go.dev/github.com/ridge/parallel
Take a look at example how it is used by benchmark tool: https://github.com/CoreumFoundation/coreum/blob/master/coreznet/pkg/zstress/stress.go#L49
If you have any questions please ping me.

parallel is used to run goroutines because it handles errors, panics, synchronization and logging, implemented once for entire project.
So instead of using go keyword you should create a function accepting argument of type parallel.SpawnFn and this function should be used to spawn the goroutine.
Also you should accept ctx context.Context as another argument and exit reporting loop once this context is cancelled, so function Stop() and Pause() are not needed.
Then you should get logger by calling log := logger.Get(ctx) because ctx contains correct logger preconfigured by parallel (it contains correct logger name).

xlab

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @wojtek-coreum, and @xlab)

a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

@xlab Please don't add tags to the repository, pointing to commits which are not in master. Potential users should not be pointed to the code which didn't pass review.

The tags are marked as rc0, ... ,rcN basically release candidates. Which they are.
I don't know what users have in mind when trying to use "release candidate" code, but, which tag naming patterns you suggest?

Totally ignoring tags mechanics in git is bad, since go.mod allows only commit/tag references. And commits are very easy to confuse.

pkg/pace/pace.go line 15 at r2 (raw file):

Previously, miladz68 (milad) wrote…

The original PR where we have the use case (https://reviewable.io/reviews/CoreumFoundation/coreum/44), we only need step and stop function. pause only makes sense when there is resume, but we don't have that use case.

Yep, I agree should be "Stop" then.

Previously, miladz68 (milad) wrote…

we should stop reporting loop, and make sure it is stopped. then we will not need mutex, will make the struct and code simpler.

I'll see how this can be optimized, if the meter should be considered dead after stopping.

Previously, wojtek-coreum (Wojtek) wrote…

All the goroutines in the project are started and managed by parallel package (https://github.com/CoreumFoundation/coreum-tools/tree/master/pkg/parallel).
Please read README there and have a look at documentation: https://pkg.go.dev/github.com/ridge/parallel
Take a look at example how it is used by benchmark tool: https://github.com/CoreumFoundation/coreum/blob/master/coreznet/pkg/zstress/stress.go#L49
If you have any questions please ping me.

parallel is used to run goroutines because it handles errors, panics, synchronization and logging, implemented once for entire project.
So instead of using go keyword you should create a function accepting argument of type parallel.SpawnFn and this function should be used to spawn the goroutine.
Also you should accept ctx context.Context as another argument and exit reporting loop once this context is cancelled, so function Stop() and Pause() are not needed.
Then you should get logger by calling log := logger.Get(ctx) because ctx contains correct logger preconfigured by parallel (it contains correct logger name).

Let's rename Go to "parallel" too ;)

Go language has enough primitives to manage concurrent code, let's not invent Erlang for the trivial stuff.

miladz68

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

Previously, wojtek-coreum (Wojtek) wrote…

All the goroutines in the project are started and managed by parallel package (https://github.com/CoreumFoundation/coreum-tools/tree/master/pkg/parallel).
Please read README there and have a look at documentation: https://pkg.go.dev/github.com/ridge/parallel
Take a look at example how it is used by benchmark tool: https://github.com/CoreumFoundation/coreum/blob/master/coreznet/pkg/zstress/stress.go#L49
If you have any questions please ping me.

parallel is used to run goroutines because it handles errors, panics, synchronization and logging, implemented once for entire project.
So instead of using go keyword you should create a function accepting argument of type parallel.SpawnFn and this function should be used to spawn the goroutine.
Also you should accept ctx context.Context as another argument and exit reporting loop once this context is cancelled, so function Stop() and Pause() are not needed.
Then you should get logger by calling log := logger.Get(ctx) because ctx contains correct logger preconfigured by parallel (it contains correct logger name).

I tend to disagree with using parallel package or at least making it a rule to use that package. it seems that the package adds more than it actually solves.
@ysv @oliverasistemas would you please give your opinions as well ?

miladz68

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

Previously, miladz68 (milad) wrote…

I tend to disagree with using parallel package or at least making it a rule to use that package. it seems that the package adds more than it actually solves.
@ysv @oliverasistemas would you please give your opinions as well ?

And I think if the goroutine is controlled by paceImpl then mutex would not be needed. This is the point I was trying to make in a previous comment.

xlab

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

@ysv @miladz68 @oliverasistemas

Previously, miladz68 (milad) wrote…

And I think if the goroutine is controlled by paceImpl then mutex would not be needed. This is the point I was trying to make in a previous comment.

Guys it's really another deep subject. I tried to dig deeply into whole IoC thing while trying to make a new app. It's a rabbit hole of metaprogramming. So, I just used my oldschool patterns instead, to deliver it quickly and don't have too much meta for a subcommand with few flags.

Let's make another discussion somewhere.

Also, if we enforce this as a rule, I'm fine, but the following must be done before we do that:

Full guide on how to create new apps without a lot of boilerplate
Strip down unused IoC code, it's really confusing to read
Make znet init patterns to be reusable, right now it's all hardcoded for a single use case (of znet app).

miladz68

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

Previously, xlab (Max Kupriianov) wrote…

I'll see how this can be optimized, if the meter should be considered dead after stopping.

I left a comment below, sorry if my comments are all over the place, but to put it in other words by having more control over the goroutine (preferably with context.Context if it makes sense) then we can do away with mutex.

ysv

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, and @xlab)

pkg/pace/pace.go line 41 at r2 (raw file):

Previously, miladz68 (milad) wrote…

I think this should be renamed to stop.

100% agree because pause is temporary thing so it should logically have sort of continue method

Previously, xlab (Max Kupriianov) wrote…

@ysv @miladz68 @oliverasistemas

Guys it's really another deep subject. I tried to dig deeply into whole IoC thing while trying to make a new app. It's a rabbit hole of metaprogramming. So, I just used my oldschool patterns instead, to deliver it quickly and don't have too much meta for a subcommand with few flags.

Let's make another discussion somewhere.

Also, if we enforce this as a rule, I'm fine, but the following must be done before we do that:

Full guide on how to create new apps without a lot of boilerplate

Strip down unused IoC code, it's really confusing to read

Make znet init patterns to be reusable, right now it's all hardcoded for a single use case (of znet app).

100% I would not force to use parallel everywhere it is
as for passing context I think it is right thing to be addressed but Stop method is required also because their purpose is different
as for fetching logger from ctx I always try to avoid these type of things it unless it is mandatory. As for me storing of variables in ctx is bad practice similar to global variables but what is more ctx vars are not typed.
As for IoC I agree with Maxim and I also found it challenging to go through all of code and even after spending bunch of hours on it I still don't feel comfortable with codebase

wojtek-coreum

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, and @xlab)

a discussion (no related file):

Previously, xlab (Max Kupriianov) wrote…

The tags are marked as rc0, ... ,rcN basically release candidates. Which they are.
I don't know what users have in mind when trying to use "release candidate" code, but, which tag naming patterns you suggest?

Totally ignoring tags mechanics in git is bad, since go.mod allows only commit/tag references. And commits are very easy to confuse.

Until new code is in master just use replace directive in go.mod of the other repo to point it to your local version

xlab

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, and @xlab)

a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

Until new code is in master just use replace directive in go.mod of the other repo to point it to your local version

"My local version" doesn't work because
a) everything shall be in git, not on the disk
b) the implementation and usage are in different repos

That means, is you're trying to check my implementation and pull the branch, it won't compile since replace points to local version on the disk.

ysv

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, and @xlab)

func (p *paceImpl) reportingLoop() {
	for range p.timer.C {
		func() {

is there any specific reason to put it into a separate func ?

xlab

Reviewable status: 4 of 6 files reviewed, 8 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

pkg/pace/pace.go line 41 at r2 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

100% agree because pause is temporary thing so it should logically have sort of continue method

Now it's stop with full shutdown.

Previously, miladz68 (milad) wrote…

I left a comment below, sorry if my comments are all over the place, but to put it in other words by having more control over the goroutine (preferably with context.Context if it makes sense) then we can do away with mutex.

No problem, reasonable. I added it & tests.

Previously, ysv (Yaroslav Savchuk) wrote…

100% I would not force to use parallel everywhere it is

as for passing context I think it is right thing to be addressed but Stop method is required also because their purpose is different

as for fetching logger from ctx I always try to avoid these type of things it unless it is mandatory. As for me storing of variables in ctx is bad practice similar to global variables but what is more ctx vars are not typed.

As for IoC I agree with Maxim and I also found it challenging to go through all of code and even after spending bunch of hours on it I still don't feel comfortable with codebase

Done.

Previously, ysv (Yaroslav Savchuk) wrote…

I see
In such cases I prefer to move this to a separate func then
But current implementation is also ok for me

Done.

xlab

Reviewed 1 of 6 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

wojtek-coreum

Reviewed 1 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

a discussion (no related file):
@miladz68 @ysv
During our discussions on goroutines you agreed that you prefer to manage goroutines manually and you said that it creates a trouble for you if goroutine is hidden somewhere under the hood. This library does exactly that. It hides the goroutine in the constructor of the object.

I completely understand why it is done this way. Because as we discussed managing goroutine is a repeatable task so write this code once. And I know that most of the people do it this way but this fact doesn't mean that this is the right thing.

Basically there are two activities happening here:

collecting and reporting metric
managing goroutine

In my opinion those two things should be done separately. This library should expose two functions:

one for counting: Step
second one for logging: reportingLoop(ctx context.Context) which should be made public and renamed to sth like Run

Then I should be able to use the library like this:

go func() {
    p.Run(ctx) // Run reports status periodically and exits once `ctx` is canceled
}()

for {
    // ...
    p.Step(1)
}

Previously, xlab (Max Kupriianov) wrote…

Done.

Did you forget to push the commit? Because it's still an inlined function

type paceImpl struct {
	mux *sync.RWMutex

What fields here are protected by the mutex?
For better readability, could you please divide fieds into two groups:

those not protected by the mutex
those protected by the mutex

like this:

value    int64
label    string
interval time.Duration

mux *sync.RWMutex
repFn    ReporterFunc
cancelFn func()
timer    *time.Timer

?
In the example above I divided the fields randomly. It makes my life easier when it comes to understanding the code.

type paceImpl struct {
	mux *sync.RWMutex

paceImpl is returned using pointer so you don't need a pointer to the mutex

pkg/pace/pace.go line 30 at r3 (raw file):

	lastTick time.Time
	repFn    ReporterFunc
	cancelFn func()

use context.CancelFn as a type for cancelFn to clearly indicate its origin and purpose

	repFn    ReporterFunc
	cancelFn func()
	timer    *time.Timer

@ysv @miladz68
Previously I had a discussion with @xlab regarding using time.Timer over simple <-time.After. His argument is that there was a guy who created 16 millions timers per second using time.After, each leaving for 5 minutes. Obviously that guy killed the garbage collector. @xlab will probably paste a link in the reply.

The thing is that here we don't have this problem. Wouldn't it be better then to just use simple <-time.After which is just a one-liner rather than time.Timer which must be stored in the struct, started and reseted? Why to care about this additional stuff if you have a fire-and-forget solution?

pkg/pace/pace.go line 48 at r3 (raw file):

func (p *paceImpl) report() {
	timeframe := time.Since(p.lastTick)
	if abs(timeframe-p.interval) < 10*time.Millisecond {

Could you please explain the idea behind this if? I also don't understand the purpose of lastTick.

pkg/pace/pace.go line 72 at r3 (raw file):

	}

	paceCtx, cancelFn := context.WithCancel(context.Background())

To be compatible with the rest of the software please use ctx received as an argument to this function instead of context.Background()

	p.cancelFn = cancelFn

	go p.reportingLoop(paceCtx)

I don't like the idea that object constructor starts the goroutine. Constructor is there to prepare the object. You don't expect that by calling constructor you start some serious job in the background. I consider it to be a design issue. If you want to do it this way there should be a Start() method starting the goroutine.

pkg/pace/zap_reporter.go line 11 at r3 (raw file):

// ZapReporter reports using the provided zap logger and stops reporting when flow of events is stoped.
func ZapReporter(log *zap.Logger) ReporterFunc {

What if there are many metrics being reported by the same setup? It would be nice to report them using single log-item instead of flooding the terminal.
Also if there are many metrics reported together it is quite important to group them because if many goroutines log their stuff, then lines from all of them are mixed and it might be hard to find matching values in the stream.

pkg/pace/zap_reporter.go line 19 at r3 (raw file):

		case value == 0 && previous == 0:
			return // don't report anything
		case value == 0 && previous != 0:

Please explain what is done inside this case

pkg/pace/zap_reporter.go line 47 at r3 (raw file):

		switch timeframe {
		case time.Second:
			log.Sugar().Infof("%s: %s/s in %v", label, intFmt(value), timeframe)

Is there a reason you use %s, intFmt and floatFmt instead of appropriate %-something syntax directly?

pkg/pace/zap_reporter.go line 47 at r3 (raw file):

		switch timeframe {
		case time.Second:
			log.Sugar().Infof("%s: %s/s in %v", label, intFmt(value), timeframe)

Could you please use structured logging using zap.Fields? Logs are there not only to present themselves in the terminal but they are also collected by different tools like elasticsearch, grafana or processed by other software. By using just a simple string you literally kill the possibility of manipulating them (filtering, searching, grouping).

wojtek-coreum

Reviewed 1 of 6 files at r2.
Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

xlab

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

Previously, wojtek-coreum (Wojtek) wrote…

@ysv @miladz68
Previously I had a discussion with @xlab regarding using time.Timer over simple <-time.After. His argument is that there was a guy who created 16 millions timers per second using time.After, each leaving for 5 minutes. Obviously that guy killed the garbage collector. @xlab will probably paste a link in the reply.

The thing is that here we don't have this problem. Wouldn't it be better then to just use simple <-time.After which is just a one-liner rather than time.Timer which must be stored in the struct, started and reseted? Why to care about this additional stuff if you have a fire-and-forget solution?

While referring to "some guy on the internetz" may sound sketchy, the original Go code comment discourages anyone to use time.After, it's a cool thing that worked for early projects, but then people started to put it into production and memory begin to be eaten. I didn't want to use time.After because it introduces a bad example into the code, some other team member may copy it and put into a hot path.

I also tried to simplify time management code there by replacing time.Timer with time.Ticker in a loop, see this commit ffd5a40 but Ticker has floating intervals and under heavy GC load the time interval starts to drift, 10s becomes 10.3943409s and so on.

Honestly, I spent on optimising shit out of this package for so much time, I rarely spend so much time on really advanced code.

xlab

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

Previously, xlab (Max Kupriianov) wrote…

While referring to "some guy on the internetz" may sound sketchy, the original Go code comment discourages anyone to use time.After, it's a cool thing that worked for early projects, but then people started to put it into production and memory begin to be eaten. I didn't want to use time.After because it introduces a bad example into the code, some other team member may copy it and put into a hot path.

I also tried to simplify time management code there by replacing time.Timer with time.Ticker in a loop, see this commit ffd5a40 but Ticker has floating intervals and under heavy GC load the time interval starts to drift, 10s becomes 10.3943409s and so on.

Honestly, I spent on optimising shit out of this package for so much time, I rarely spend so much time on really advanced code.

That being said, it has to be Timer.

wojtek-coreum

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

Previously, xlab (Max Kupriianov) wrote…

That being said, it has to be Timer.

"If efficiency is a concern" is the keyword. Nobody will fire logging function more often than a second.

wojtek-coreum

Reviewable status: all files reviewed, 18 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

a discussion (no related file):
@miladz68 @ysv @xlab

The more I think about this library the more I'm sure that we should come back to the design board and think about it once again, specifying requirements first. In my opinion it's a reasonable assumption that this package might be used to print more than one metric in the same logical scope. Because we don't have this situation now (and in other unanswered comment I asked about the list of metrics @xlab wants to report), I'm not insisting on supporting the full scenario now, but this code should be ready to add such support by extending the library, not reimplementing it from scratch.

In current state I see three problems related to reporting many metrics:

each metric starts separate goroutine, its own timer etc. It takes additional resources. I don't care much but if @xlab wants to really discuss performance implications of using time.After over time.Timer (https://reviewable.io/reviews/CoreumFoundation/coreum-tools/8#-N3zBkiG7tiBTBfQfmav) then this point should be fixed first
the more important side effect of reporting each metric in its own goroutine is that metrics which are reported together in the source code, won't be printed together in the console. There are (might be - it's reasonable assumption) other goroutines running in parallel, they also log the stuff. Then, logs produced by metrics will be mixed with other logs, which means finding related metrics to compare them will be difficult
the other side effect of running a goroutine per metric is that chances are great (I guess 100%) that goroutines of the reporters might be scheduled in different order, so the lines in console will appear in a random order each time, forcing me to hunt for the right line I'm interested in at the moment.

So please, let's design this library in the way that it makes our life easier, not harder.

wojtek-coreum

Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

pkg/pace/pace_test.go line 21 at r3 (raw file):

	})

	go func() {

if tested code panics, go test will just terminate immediately without running other tests

pkg/pace/pace_test.go line 27 at r3 (raw file):

	}()

	time.Sleep(50 * time.Millisecond)

Every time you use time.Sleep in this unit test you have no guarantees that the tested code does what you expect. It might happen (and to answer your next question: yes, it really happens) that machine running this test (e.g. in CI) is overloaded causing random failures of testing framework. On the other hand if you set sleep to be high enough to handle 95% of situations and apply this approach in other unit tests then the testing will take much longer.

This is bad approach. If you really want to test this library it should be done without goroutines and library should be written in the way that does not require goroutines to be tested.

wojtek-coreum

Reviewed 1 of 2 files at r3.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

miladz68

Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

type paceImpl struct {
	mux *sync.RWMutex

with pause and resume function gone. I don't think we need the mutex anymore.

Previously, wojtek-coreum (Wojtek) wrote…

"If efficiency is a concern" is the keyword. Nobody will fire logging function more often than a second.

To me this discussion seems unproductive. Because I think this is not something to bring on so much back and forth.

ysv

Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

@miladz68 @ysv
During our discussions on goroutines you agreed that you prefer to manage goroutines manually and you said that it creates a trouble for you if goroutine is hidden somewhere under the hood. This library does exactly that. It hides the goroutine in the constructor of the object.

I completely understand why it is done this way. Because as we discussed managing goroutine is a repeatable task so write this code once. And I know that most of the people do it this way but this fact doesn't mean that this is the right thing.

Basically there are two activities happening here:

collecting and reporting metric

managing goroutine

In my opinion those two things should be done separately. This library should expose two functions:

one for counting: Step

second one for logging: reportingLoop(ctx context.Context) which should be made public and renamed to sth like Run

Then I should be able to use the library like this:
go func() {
    p.Run(ctx) // Run reports status periodically and exits once `ctx` is canceled
}()

for {
    // ...
    p.Step(1)
}

Yes that is what I don't like in this solution

a discussion (no related file):

Previously, wojtek-coreum (Wojtek) wrote…

@miladz68 @ysv @xlab

The more I think about this library the more I'm sure that we should come back to the design board and think about it once again, specifying requirements first. In my opinion it's a reasonable assumption that this package might be used to print more than one metric in the same logical scope. Because we don't have this situation now (and in other unanswered comment I asked about the list of metrics @xlab wants to report), I'm not insisting on supporting the full scenario now, but this code should be ready to add such support by extending the library, not reimplementing it from scratch.

In current state I see three problems related to reporting many metrics:

each metric starts separate goroutine, its own timer etc. It takes additional resources. I don't care much but if @xlab wants to really discuss performance implications of using time.After over time.Timer (https://reviewable.io/reviews/CoreumFoundation/coreum-tools/8#-N3zBkiG7tiBTBfQfmav) then this point should be fixed first

the more important side effect of reporting each metric in its own goroutine is that metrics which are reported together in the source code, won't be printed together in the console. There are (might be - it's reasonable assumption) other goroutines running in parallel, they also log the stuff. Then, logs produced by metrics will be mixed with other logs, which means finding related metrics to compare them will be difficult

the other side effect of running a goroutine per metric is that chances are great (I guess 100%) that goroutines of the reporters might be scheduled in different order, so the lines in console will appear in a random order each time, forcing me to hunt for the right line I'm interested in at the moment.

So please, let's design this library in the way that it makes our life easier, not harder.

Running goroutine per metric doesn't make sense for me also in the way I see it. We should be able to run multiple reporters with multiple metrics reported by each one

pkg/pace/pace.go line 72 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

To be compatible with the rest of the software please use ctx received as an argument to this function instead of context.Background()

agree

pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Every time you use time.Sleep in this unit test you have no guarantees that the tested code does what you expect. It might happen (and to answer your next question: yes, it really happens) that machine running this test (e.g. in CI) is overloaded causing random failures of testing framework. On the other hand if you set sleep to be high enough to handle 95% of situations and apply this approach in other unit tests then the testing will take much longer.

This is bad approach. If you really want to test this library it should be done without goroutines and library should be written in the way that does not require goroutines to be tested.

I also don't like to use sleep in tests but if there is no other way to test it it is ok
I believe in this case it should be possible to test without sleep but I would not insist on it much

pkg/pace/zap_reporter.go line 11 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

What if there are many metrics being reported by the same setup? It would be nice to report them using single log-item instead of flooding the terminal.
Also if there are many metrics reported together it is quite important to group them because if many goroutines log their stuff, then lines from all of them are mixed and it might be hard to find matching values in the stream.

The thing here is that we wanted to log a single metric for new
And I don't see any issue on adding multi-metric logging in this part of code
However in other parts it is more complicated

pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Is there a reason you use %s, intFmt and floatFmt instead of appropriate %-something syntax directly?

It is done to avoid code duplication and looks reasonable for me

pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Could you please use structured logging using zap.Fields? Logs are there not only to present themselves in the terminal but they are also collected by different tools like elasticsearch, grafana or processed by other software. By using just a simple string you literally kill the possibility of manipulating them (filtering, searching, grouping).

agree

xlab

Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

a discussion (no related file):

Previously, ysv (Yaroslav Savchuk) wrote…

Running goroutine per metric doesn't make sense for me also in the way I see it. We should be able to run multiple reporters with multiple metrics reported by each one

To address issues outlined in this comment:

multiple goroutines (like 3-4 of extra go routines) will not make any impact on the performance, in a language where goroutines are spawned in millions.

2 -3) log lines are never in perfect order, because you have other logging from other goroutines and any Logger will use buffered logging, this is especially true for cloud environments when logs are collected from multiple sources and ordering is never taken seriously.

pkg/pace/pace.go line 86 at r2 (raw file):
I tried to resolve the conversation and Reviewable automatically adds "Done."

current implementation is also ok for me

This is a green light for me.

Previously, wojtek-coreum (Wojtek) wrote…

I don't like the idea that object constructor starts the goroutine. Constructor is there to prepare the object. You don't expect that by calling constructor you start some serious job in the background. I consider it to be a design issue. If you want to do it this way there should be a Start() method starting the goroutine.

This is a common Go pattern, sometimes it makes sense to make an explicit Start, but for this package I wanted to make interface as clean as possible.

Also, it was done this way to match semantics of time.NewTimer or time.NewTicker, basically you initialize it and it start ticking.

pkg/pace/pace_test.go line 21 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

if tested code panics, go test will just terminate immediately without running other tests

It's handled by the go testing workflow

pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

I also don't like to use sleep in tests but if there is no other way to test it it is ok
I believe in this case it should be possible to test without sleep but I would not insist on it much

The code under test has asynchronous nature, I could go miles by scaffolding syncing primitives around, but I decided that using time constrains close to real world that are simplier to use.

This approach should be considered case-by-case of course.

pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

It is done to avoid code duplication and looks reasonable for me

Originally it was floatFmt in all places, but intFmt improves the output clarity. Right now it still makes sense to abstract away the specifics of the number type.

pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

agree

I did this on purpose because structured fields in Zap cause the log line to span across multiple lines. With 3 variables there and 2 additional line from the Zap itself, it will be 6 lines of text printed every tick.

However, if you insist I can convert this into structured version easily.

… signature.

xlab

Reviewable status: 4 of 6 files reviewed, 22 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

Previously, wojtek-coreum (Wojtek) wrote…

What fields here are protected by the mutex?
For better readability, could you please divide fieds into two groups:

those not protected by the mutex

those protected by the mutex

like this:
value    int64
label    string
interval time.Duration

mux *sync.RWMutex
repFn    ReporterFunc
cancelFn func()
timer    *time.Timer
?
In the example above I divided the fields randomly. It makes my life easier when it comes to understanding the code.

I've changed the ordering of the fields, also renamed mux to lastTickMux to signify what it protects now.

Previously, wojtek-coreum (Wojtek) wrote…

paceImpl is returned using pointer so you don't need a pointer to the mutex

It doesn't hurt to have pointer here still.

pkg/pace/pace.go line 30 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

use context.CancelFn as a type for cancelFn to clearly indicate its origin and purpose

Done.

Previously, miladz68 (milad) wrote…

To me this discussion seems unproductive. Because I think this is not something to bring on so much back and forth.

I agree, let's move on.

pkg/pace/pace.go line 72 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

agree

Added ctx param to the New() method, now the context has a root.

// All ticks (or steps) are aggregated in timeframes specified using interval.
// If the reporting function was not provided, ZapReporter will be used as default.
func New(ctx context.Context, label string, interval time.Duration, reportFn ...ReporterFunc) Pace {

Made the reportFn optional, can be omitted if you assume that a default-company-wide-standartized-approved reported func shall be used by default.

xlab

Reviewed 2 of 2 files at r4, all commit messages.
Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

pkg/pace/zap_reporter.go line 11 at r3 (raw file):

Previously, ysv (Yaroslav Savchuk) wrote…

The thing here is that we wanted to log a single metric for new
And I don't see any issue on adding multi-metric logging in this part of code
However in other parts it is more complicated

As has been discussed on the call, the package was never meant as a professional metric collecting package. You don't expect log.Println to be able to print 100 log lines at the same time, or time.NewTimer to run many timers at once.

This package implements a debugging helper, also some CLI guidance to show some activity in the console and server logs. It's oriented towards INFO/DEBUG logging capabilities, rather than metrics collection agents like Prometheus exporters, etc.

pkg/pace/zap_reporter.go line 19 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Please explain what is done inside this case

The flow/pace is considered to be stalled and that fact is reported. It helps to avoid situations when you still have 0.00003/s but not stale. This case handles absolute zeroes in a time frame.

xlab

Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @miladz68, @oliverasistemas, @wojtek-coreum, @xlab, and @ysv)

pkg/pace/pace.go line 48 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Could you please explain the idea behind this if? I also don't understand the purpose of lastTick.

lastTick stores starting point for the current timeframe. Basically the timestamp of the last counter reset.

As for this particular if, it rounds up timeframe to p.interval, if the difference between them is less than 10ms. This allows to get you round numbers when GC pressure is high or your external triggering function was not accurate 100%, but close enough to p.interval.

wojtek-coreum

Reviewed all commit messages.
Reviewable status: all files reviewed, 19 unresolved discussions (waiting on @miladz68, @xlab, and @ysv)

a discussion (no related file):

Previously, xlab (Max Kupriianov) wrote…

To address issues outlined in this comment:

multiple goroutines (like 3-4 of extra go routines) will not make any impact on the performance, in a language where goroutines are spawned in millions.

2 -3) log lines are never in perfect order, because you have other logging from other goroutines and any Logger will use buffered logging, this is especially true for cloud environments when logs are collected from multiple sources and ordering is never taken seriously.

I gave more details about points 2 and 3 in other comment in this PR: https://reviewable.io/reviews/CoreumFoundation/coreum-tools/8#-N3zKHU48bJ5cQkdWDWY

Let's continue the discussion there.

Previously, xlab (Max Kupriianov) wrote…

I tried to resolve the conversation and Reviewable automatically adds "Done."

current implementation is also ok for me

This is a green light for me.

For me, using inline functions is not an issue but if someone else in the team finds it unclear then I'm for applying this change. @ysv - you made the original comment so it's up to you

Previously, xlab (Max Kupriianov) wrote…

It doesn't hurt to have pointer here still.

but it just doesn't make sense

pkg/pace/pace.go line 48 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

lastTick stores starting point for the current timeframe. Basically the timestamp of the last counter reset.

As for this particular if, it rounds up timeframe to p.interval, if the difference between them is less than 10ms. This allows to get you round numbers when GC pressure is high or your external triggering function was not accurate 100%, but close enough to p.interval.

If you want to be as precise as possible in calculations and taking real time between reports instead of just p.interval then why do you ignore 10-millisecond shift? Why not just take the real timeframe as it is returned by timeframe := time.Since(p.lastTick) without rounding?

Previously, xlab (Max Kupriianov) wrote…

This is a common Go pattern, sometimes it makes sense to make an explicit Start, but for this package I wanted to make interface as clean as possible.

Also, it was done this way to match semantics of time.NewTimer or time.NewTicker, basically you initialize it and it start ticking.

pkg/pace/pace.go line 27 at r4 (raw file):

type paceImpl struct {
	lastTickMux *sync.RWMutex
	lastTick    time.Time

lastTick is referenced from two methods only reportingLopp (running in goroutine) and report. report is called only inside reportingLoop. It means all the read/write operations on lastTick are made by the same goroutine, so you don't need a mutex to protect it.

Moreover lastTick could be just a local variable of reportingLoop declared before for and all the updates could happen inside reportingLoop, then lastTick could be passed as an argument to report method.

Previously, xlab (Max Kupriianov) wrote…

Made the reportFn optional, can be omitted if you assume that a default-company-wide-standartized-approved reported func shall be used by default.

using ... is an antipattern of making an argument optional in go. It suggests to the caller that it is valid to pass many of them
I'm for removing it completely

pkg/pace/pace_test.go line 21 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

It's handled by the go testing workflow

It's not:

func TestFirst(t *testing.T) {
	done := make(chan struct{})
	go func() {
		panic("panicked")
		close(done)
	}()
	<-done
}

func TestSecond(t *testing.T) {
	assert.Fail(t, "second test failed")
}

result of running go test .:

[wojciech@coreum pkg]$ go test .
panic: panicked

goroutine 20 [running]:
github.com/CoreumFoundation/coreum-tools/pkg.TestFirst.func1()
	/home/wojciech/coreum-tools/pkg/hello_test.go:12 +0x27
created by github.com/CoreumFoundation/coreum-tools/pkg.TestFirst
	/home/wojciech/coreum-tools/pkg/hello_test.go:11 +0x6a
FAIL	github.com/CoreumFoundation/coreum-tools/pkg	0.013s
FAIL

TestSecond is never executed and it's failure is not reported

pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

The code under test has asynchronous nature, I could go miles by scaffolding syncing primitives around, but I decided that using time constrains close to real world that are simplier to use.

This approach should be considered case-by-case of course.

IMO If you want to test this code, do it the right way. If the code must be changed to be tested properly - change it. In my opinion the logic in the library could be so simple that it doesn't require testing at all.

More context:
The unit tests are run by CI on every pull request made by all the team members. I really want to avoid situations where someone receives failures unrelated to committed code. In such scenario they must go to CI and inspect results just to find out that it's unrelated failure caused by overloaded CI server, and then wait until CI runs again. Doing such things to our teammates is really a good practice.

@ysv @miladz68

pkg/pace/zap_reporter.go line 11 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

As has been discussed on the call, the package was never meant as a professional metric collecting package. You don't expect log.Println to be able to print 100 log lines at the same time, or time.NewTimer to run many timers at once.

This package implements a debugging helper, also some CLI guidance to show some activity in the console and server logs. It's oriented towards INFO/DEBUG logging capabilities, rather than metrics collection agents like Prometheus exporters, etc.

My point is not to use separate log.* for each metric. My point is to use a single call to log.* to print all of them at once so there is a guarantee that all the metrics created by the caller are printed together so the end-user is not forced to hunt them in the terminal.

I've already said elsewhere that lack of logging many metrics is the sign that important requirement was missed during the design. I have two use-cases so far where I want two metrics to be printed together.

This is an example of code to print all the metrics in single log item:

		fields := make([]zap.Field, 0, len(r.reporters))
		for _, c := range r.reporters {
			fields = append(fields, c(scale))
		}
		log.Info(r.title, fields...)

pkg/pace/zap_reporter.go line 19 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

The flow/pace is considered to be stalled and that fact is reported. It helps to avoid situations when you still have 0.00003/s but not stale. This case handles absolute zeroes in a time frame.

I still don't understand, please ping me on slack when you have time

pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

Originally it was floatFmt in all places, but intFmt improves the output clarity. Right now it still makes sense to abstract away the specifics of the number type.

@ysv I don't understand

My point is that this:

log.Sugar().Infof("%s: %s/s in %v", label, intFmt(value), timeframe)

could be replaced by this:

log.Sugar().Infof("%s: %d/s in %v", label, value, timeframe)

and intFmt could be removed. If this is done the way I propose, where is the code duplication?

pkg/pace/zap_reporter.go line 47 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

I did this on purpose because structured fields in Zap cause the log line to span across multiple lines. With 3 variables there and 2 additional line from the Zap itself, it will be 6 lines of text printed every tick.

However, if you insist I can convert this into structured version easily.

How logs are formatted depends on how logger is configured. core, coreznet and corezstress have CLI flag --log-format accepting one of 3 values:

json - formats each log item in a single-line json - this is good if app is a service running in production where logs are collected and sent to elastic, grafana etc.
console - standard format delivered by zap which is intended to format logs in human-readable form, one line per log item - in practice the output is shitty if fields contain special characters because they are escaped like in json
yaml - custom format written to me which formats logs as a multi-line yaml - logs might take a lot of space but it's easier if you want to really debu something, and it's possible to load them into editor and use folding

To summarize:
Formatting the logs the way you like is not a responsibility of pace but the encoder which is used by zap when they are really printed.
If you prefer to have a one line per log item, use --log-format value of your choice.
But pace itself should produce well-structured logs to enable further processing capabilities offered by external tools.

miladz68

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

Previously, wojtek-coreum (Wojtek) wrote…

but it just doesn't make sense

@miladz68

I don't think it is necessary but I like to see us stick to a convention and avoid arbitrary styles. I think it makes sense to avoid pointers as much as possible and introduce them as less often as possible. I think if one of them gets the job done, the other one should be avoided.

miladz68

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

Previously, wojtek-coreum (Wojtek) wrote…

using ... is an antipattern of making an argument optional in go. It suggests to the caller that it is valid to pass many of them

I'm for removing it completely

@miladz68

I agree with wojtek here.

miladz68

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

Previously, wojtek-coreum (Wojtek) wrote…

@miladz68

In our discussion I argued that this goroutine is playing the role of PrinterController and is not part of the counter package. for me a separation concerns would be nice

wojtek-coreum

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

miladz68

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

IMO If you want to test this code, do it the right way. If the code must be changed to be tested properly - change it. In my opinion the logic in the library could be so simple that it doesn't require testing at all.

More context:
The unit tests are run by CI on every pull request made by all the team members. I really want to avoid situations where someone receives failures unrelated to committed code. In such scenario they must go to CI and inspect results just to find out that it's unrelated failure caused by overloaded CI server, and then wait until CI runs again. Doing such things to our teammates is really a good practice.

@ysv @miladz68

testing async code is hard. it is super hard in fact. but writing correct code is hard as well. I agree with yaroslav here I would like to see it done correctly but I would not put too much emphasis on it.

wojtek-coreum

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, miladz68 (milad) wrote…

testing async code is hard. it is super hard in fact. but writing correct code is hard as well. I agree with yaroslav here I would like to see it done correctly but I would not put too much emphasis on it.

Personally if I see a test like this (using unpredictable Sleep) I prefer to not have a test at all, especially in this case where tested logic could be simple and once accepted not modified for long time

xlab

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

I think it makes sense to avoid pointers as much as possible

any argumentation?

Previously, miladz68 (milad) wrote…

with pause and resume function gone. I don't think we need the mutex anymore.

Mutex locks lastTick which still participates in the control loop. Which can become even more important to have, if the package implements an external-controllable ticking like you suggested on the call.

pkg/pace/pace_test.go line 21 at r3 (raw file):

FAIL

this is the output of the testing workflow, it marks test as failing upon panic.

pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, wojtek-coreum (Wojtek) wrote…

Personally if I see a test like this (using unpredictable Sleep) I prefer to not have a test at all, especially in this case where tested logic could be simple and once accepted not modified for long time

Oh sorry for adding too much tests

wojtek-coreum

Reviewable status: all files reviewed, 17 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)

pkg/pace/pace_test.go line 21 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

FAIL

this is the output of the testing workflow, it marks test as failing upon panic.

The correct output of testing workflow is the list of all failing tests.

pkg/pace/pace_test.go line 27 at r3 (raw file):

Previously, xlab (Max Kupriianov) wrote…

Oh sorry for adding too much tests

@xlab I'm not going to discuss with you this way

wojtek-coreum

Reviewed 1 of 2 files at r4.
Reviewable status: all files reviewed, 22 unresolved discussions (waiting on @miladz68, @wojtek-coreum, @xlab, and @ysv)