Reopen #120 #166

Open
ymz-ncnk opened this issue Jun 21, 2024 · 20 comments

@ymz-ncnk
Contributor

Hi, guys! Do you remember #120? I implemented something similar, but in a separate project, and we could merge both projects into one. As one option, we could create an organization on GitHub where each serializer would have its own module, or we could merge them right here. What do you think?

@matheusd
Contributor

The current report card now has fields detailing (for each serializer):

  • Use of unsafe strings on unmarshalling
  • Buffer reuse on marshalling
  • time.Time type support
  • API kind (reflection, codegen, etc.)

Looking at your project, the only relevant thing that would be missing from goserbench now (IMO) is "Raw Encoding Type" (binary or text), which could be added directly here.
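To make that concrete, here's a hypothetical sketch of how these report-card fields could be modeled (illustrative names only, not goserbench's actual types):

// Hypothetical sketch; illustrative names, not goserbench's actual types.
package goserbench

// SerializerFeatures describes one benchmarked serializer in the report card.
type SerializerFeatures struct {
	UnsafeStringUnmarshal bool   // uses unsafe strings when unmarshalling
	BufferReuseMarshal    bool   // reuses its output buffer when marshalling
	TimeTypeSupport       bool   // supports time.Time natively
	APIKind               string // "reflection", "codegen", etc.
	RawEncodingType       string // "binary" or "text", the field suggested above
}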

@ymz-ncnk
Contributor Author

In fact, it's not about specific features; those can be added quite easily. In the proposed project:

  • All serializers use the same data.
  • Everything specific to a serializer, such as its name in the results, its list of supported features, or the benchmark data itself, lives in its own package. Thus, if a serializer's author wants to make changes, only that package is touched.
  • There is also a separate benchser package that makes it easy to integrate the benchmarks into another project.

And as for me, it would be great to have separate modules rather than separate packages. In that case, for example, only a serializer's author could make changes to their module.
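To make the layout concrete, here is a hypothetical sketch of what each self-contained serializer package might expose (illustrative names; the real benchser API may differ):

// Hypothetical sketch; illustrative names, not the actual benchser API.
package benchser

// Serializer is what each per-serializer package (or module) would implement,
// keeping its result name, feature list, and codec logic in one place.
type Serializer interface {
	Name() string                       // name shown in the results
	Features() []string                 // e.g. "unsafe", "reuse"
	Marshal(v any) ([]byte, error)      // encode the shared benchmark data
	Unmarshal(data []byte, v any) error // decode into the shared benchmark data
}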

@ymz-ncnk
Contributor Author

In fact, it's not about specific features; those can be added quite easily.

And I think the list of features should be expanded. There are many differences between serializers, as mentioned in #120: some have string length restrictions, some support versioning, some encode field names or types, etc.

@alecthomas
Owner

alecthomas commented Jun 22, 2024

I do think it would be preferable if all serialisers used the same data, and having each package be largely self-contained in terms of configuration etc. would be great.

I don't personally see great value in having an exportable benchmarking package. The benefit of having a single package, where the benchmarks are all run together, is that they're easily comparable. When they're run in a separate project, on a different machine, the results aren't directly comparable. And what's the advantage over just using that package's own serialisation benchmarks, which presumably they all already have, and which presumably are much more comprehensive?

I also don't think this project should be split into separate modules owned by serialiser owners. That will just introduce maintenance overhead, and remove impartiality. The whole point of this project is that it's not controlled or driven by people who have a vested interest in the outcome. There have been a few cases where people want to promote their own packages, try to slip some questionable code through, and it gets caught in code review.

That said, I'm very open to it becoming community owned/driven, and moving to its own org as part of that process would be reasonable.

@ymz-ncnk
Contributor Author

ymz-ncnk commented Jun 22, 2024

In addition, the proposed project now has more accurate results, unless I made a mistake somewhere. Let me explain why. For example, Bebop under the hood uses Go's unsafe package for all types except string; MUS can also work in a similar mode:

  • mus+notunsafe+reuse 79.82
  • bebop200sc+notunsafe+reuse 86.1

At the moment, we are not considering such situations. And don't get me wrong, I like Bebop! I use it here only as an example.
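As a side note on the +reuse suffix: a minimal sketch, with a simplified one-field record (my own illustration, not either library's real code), of what buffer reuse on marshalling means:

// Minimal sketch of "+reuse": the serializer keeps one output buffer and
// truncates it on every call instead of allocating a fresh slice.
package main

import (
	"encoding/binary"
	"fmt"
)

// record stands in for the benchmark struct; real serializers encode more fields.
type record struct {
	BirthDay int64
}

type reusingSerializer struct {
	buf []byte
}

func (s *reusingSerializer) Marshal(v *record) []byte {
	s.buf = s.buf[:0] // keep capacity, drop contents
	s.buf = binary.LittleEndian.AppendUint64(s.buf, uint64(v.BirthDay))
	return s.buf
}

func main() {
	s := &reusingSerializer{}
	fmt.Println(s.Marshal(&record{BirthDay: 42}))
}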

@ymz-ncnk
Contributor Author

ymz-ncnk commented Jun 22, 2024

The whole point of this project is that it's not controlled or driven by people who have a vested interest in the outcome. There have been a few cases where people want to promote their own packages, try to slip some questionable code through, and it gets caught in code review.

A few thoughts on how this might work. Let's imagine the benchmarks exist and I want to add my own serializer to them. I write a module (a project on GitHub) and ask for it to be added. The benchmark owners can then review my code, and if something is wrong, they can open an issue. Then I fix it and ask again. The same can apply to updates.

@alecthomas
Owner

Sure but why is this better?

@matheusd
Contributor

In addition, the proposed project now has more accurate results

How are your results "more accurate" than the ones here? Yes, the numbers may be different (and any two runs of the benchmark set will produce different numbers as a result) but why are they more accurate?

@ymz-ncnk
Contributor Author

ymz-ncnk commented Jun 22, 2024

First of all, it's not a good thing when different serializers use different input. Secondly (I'll talk only about Bebop and MUS as an example), right now on my own laptop I got the following results (marshal+unmarshal):

  • bebop 257
  • mus 318

And we might think: "Well, MUS is slower than Bebop by 61." But in fact, these results are for:

  • bebop200sc+notunsafe 105.1
  • mus+raw+varint 143.1

where Bebop uses unsafe code under the hood for all types except string, and MUS uses raw and varint encodings. If MUS also uses unsafe code for all types except string, we get:

  • bebop200sc+notunsafe 105.1
  • mus+notunsafe 104.7

Do you see the difference? And this is correct, because in this case they both use almost the same serialization code. @matheusd
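To show what that safe/unsafe distinction means in practice, here is a minimal sketch (my own illustration, not code from either library), using an int64 read as the example:

package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// safeInt64 decodes with bounds checks and no aliasing assumptions.
func safeInt64(buf []byte) int64 {
	return int64(binary.LittleEndian.Uint64(buf))
}

// unsafeInt64 reinterprets the first 8 bytes of buf in place. This skips the
// safety checks, but assumes a little-endian machine and a buffer that is at
// least 8 bytes long.
func unsafeInt64(buf []byte) int64 {
	return *(*int64)(unsafe.Pointer(&buf[0]))
}

func main() {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, 42)
	fmt.Println(safeInt64(buf), unsafeInt64(buf)) // 42 42
}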

@ymz-ncnk
Contributor Author

Sure but why is this better?

Actually, I'm not insisting on an organization; it's just an option. We could try it and see whether it works. But if you are not sure and still have questions, let's just close this issue.

@deneonet
Collaborator

@ymz-ncnk, the configuration/features for each serializer is a great idea. It allows comparing serializers based on the features needed. What about adding the features option to this repo and then updating the report, so that users can filter the benchmarks of all serializers by the selected features? Then we could also benchmark, for example, mus like this:

  • mus
  • mus/unsafe
  • mus/reuse
  • mus/unsafe_reuse

instead of just two benchmarks: by default, only mus would be shown in the report, but when selecting "unsafe", both mus and mus/unsafe would be shown, etc.
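A hypothetical sketch of that filter rule (illustrative types, not the actual report code): a row is visible only when every feature it uses is among those selected.

package main

import "fmt"

// result is one report row; features holds flags like "unsafe" or "reuse".
type result struct {
	name     string
	features []string
}

// visible reports whether a row should be shown: every feature the
// benchmark uses must have been selected by the user.
func visible(r result, selected map[string]bool) bool {
	for _, f := range r.features {
		if !selected[f] {
			return false
		}
	}
	return true
}

func main() {
	rows := []result{
		{name: "mus"},
		{name: "mus/unsafe", features: []string{"unsafe"}},
	}
	selected := map[string]bool{"unsafe": true} // the "unsafe" filter is on
	for _, r := range rows {
		if visible(r, selected) {
			fmt.Println(r.name) // prints both mus and mus/unsafe
		}
	}
}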

@ymz-ncnk
Contributor Author

A UI filter is a great idea too! Just look at this: there are a lot of cases, and this is genuinely hard-to-consume information, which challenged me a lot at first. But with two predefined filters, "Fastest Safe" and "Fastest Unsafe", it has become much clearer. @deneonet

@ymz-ncnk
Contributor Author

ymz-ncnk commented Jun 22, 2024

What about adding the features option to this repo and then updating the report

Again, as an option, we could take the proposed project as a basis and add a UI to it (because of the limitations of the current project). The big disadvantage of this is that we would be using a new codebase that would have to be reviewed.

@ymz-ncnk
Contributor Author

ymz-ncnk commented Jun 22, 2024

and add a UI to it

Add a UI, additional metrics, other serializers and more features.

@alecthomas
Owner

I'll reiterate what I said before; I think these are good ideas:

I do think it would be preferable if all serialisers used the same data, and having each package be largely self-contained in terms of configuration etc. would be great.

I'm +1 on them.

The rest of the proposal, I'm -1 on.

Regarding merging the projects, I think it would be simpler to iterate and improve on what @deneonet has already done in this project, as he suggested above.

@ymz-ncnk
Contributor Author

Ok, I was just suggesting.

@alecthomas
Owner

All good. Suggestions are always welcome and worth discussing, thanks!

@ymz-ncnk
Contributor Author

How are your results "more accurate" than the ones here? Yes, the numbers may be different (and any two runs of the benchmark set will produce different numbers as a result) but why are they more accurate?

I found one more thing. Let's compare two code snippets:

// MUS: sizes and encodes the shared goserbench.SmallStruct fields directly.
func (s MUSSerializer) Marshal(o interface{}) ([]byte, error) {
	v := o.(*goserbench.SmallStruct)
	n := ord.SizeString(v.Name)
	n += raw.SizeInt64(v.BirthDay.UnixNano())
// ...

// Bebop: must first copy every field into its own generated struct (s.a).
type Bebop200ScSerializer struct {
	a   BebopBuf200sc
	buf []byte
}

func (s *Bebop200ScSerializer) Marshal(o interface{}) (buf []byte, err error) {
	v := o.(*goserbench.SmallStruct)
	a := &s.a
	a.Name = v.Name
	a.BirthDay = v.BirthDay
// ...

If a serializer does not use the standard data type directly (like Bebop), it must copy to/from its own type on each Marshal/Unmarshal, while the others skip this step entirely.

@matheusd
Contributor

If a serializer does not use the standard data type directly (like Bebop), it must copy to/from its own type on each Marshal/Unmarshal, while the others skip this step entirely.

Have you measured the overhead involved in just the copying section?
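One way to answer that empirically would be a micro-benchmark over just the copy, as in this hypothetical sketch (the struct types are stand-ins, not goserbench's):

// copy_test.go: hypothetical micro-benchmark isolating the field-copy step
// that adapter-style serializers perform before encoding.
package main

import (
	"testing"
	"time"
)

type smallStruct struct { // stand-in for goserbench.SmallStruct
	Name     string
	BirthDay time.Time
}

type generatedStruct struct { // stand-in for the serializer's own type
	Name     string
	BirthDay time.Time
}

var sink generatedStruct // package-level sink keeps the copy from being optimized away

func BenchmarkFieldCopy(b *testing.B) {
	v := &smallStruct{Name: "test", BirthDay: time.Now()}
	for i := 0; i < b.N; i++ {
		sink.Name = v.Name
		sink.BirthDay = v.BirthDay
	}
}

Running it with go test -bench=FieldCopy would report the per-call cost of the copy alone.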

@ymz-ncnk
Contributor Author

No, but I don't think it's a good question.
