
Setup CI for this project #66

Closed
marghidanu opened this issue Apr 5, 2021 · 27 comments

@marghidanu
Contributor

Hello,

It would be nice to set up some CI to run these implementations in a controlled environment periodically. We're getting benchmarks from different machines, and it is hard to keep track of all the numbers for all the different implementations. We need one source of truth.

@micwoj92

micwoj92 commented Apr 5, 2021

There already is #49

@marghidanu
Contributor Author

Cool, thanks for pointing it out!

marghidanu reopened this Apr 6, 2021
@marghidanu
Contributor Author

I've looked at #49 and somehow it seems far from what we're trying to achieve. We need to standardize a few things, like:

  • Duration of execution (some people run it for 5 seconds, others for 10, others for 60)
  • Number of iterations (I think one is not enough, and we should compute some stats)
  • Maybe a build matrix (I'm willing to contribute a Raspberry Pi runner)

Let me know what you guys think.

@devblok
Contributor

devblok commented Apr 6, 2021

I'm not really convinced that running benchmarks in a CI build is a stable enough environment for controlled results. Keep in mind that those machines could be running any number of tasks at any one time, and CPU time stealing may be an issue. #49 also mentions the issue with changing hardware.

Nothing wrong with checking the validity, but to get a source of truth, you'd need end user machines running the code and submitting results. Not a perfectly controlled environment, but better than CI machines for sure.

As for duration of execution, somewhat longer runs should reduce frequency scaling differences, but too long, and some chips may lose their turbo boost, screwing with the results. Maybe 30 seconds should be the standard?

@marghidanu
Contributor Author

The CI can trigger the jobs on whatever runners we want; for now, the GitHub runners should be fine. If we gather stats after execution, we can actually determine the variation in performance and see if it's stable enough.

@micwoj92

micwoj92 commented Apr 6, 2021

I'm not really convinced that running benchmarks in a CI build is a stable enough environment for controlled results.

Of course there is always a margin of error, due to, for example, the images being updated (you can track the updates here).

Keep in mind that those machines could be running any number of tasks at any one time, and CPU time stealing may be an issue. #49 also mentions the issue with changing hardware.

As far as I know, all runners have the same specs and each is a separate virtual machine.

but to get a source of truth, you'd need end user machines running the code and submitting results.

Keep in mind that a benchmark is only a benchmark, and the results will be different on every machine. There is a very low chance of 100% reproducible results. I disagree with the statement "Not a perfectly controlled environment, but better than CI machines for sure." Actually, the GitHub runners will make it better: not only do they have the same specs, as I mentioned before, but there is also much less going on in these runners than on, say, someone's machine with a browser, email client, text editor, and maybe more open while they test.

@rhbvkleef
Contributor

I'm not really convinced that running benchmarks in a CI build is a stable enough environment for controlled results.

I agree, it isn't. For the actual drag race, we'll need to run everything on a single machine, one at a time. It is, however, very useful to provide ballpark numbers.

@rhbvkleef
Contributor

I've looked at #49 and somehow it seems far from what we're trying to achieve. We need to standardize a few things, like:

* Duration of execution (some people run it for 5 seconds, others for 10, others for 60)
* Number of iterations (I think one is not enough, and we should compute some stats)
* Maybe a build matrix (I'm willing to contribute a Raspberry Pi runner)

Let me know what you guys think.

I agree with the standardization points. It would make it a lot easier to create tables.

About the runner, can you configure it to only run one job at a time? If that's the case, we might be able to use CI for drag racing after all.

@marghidanu
Contributor Author

I've looked at #49 and somehow it seems far from what we're trying to achieve. We need to standardize a few things, like:

* Duration of execution (some people run it for 5 seconds, others for 10, others for 60)
* Number of iterations (I think one is not enough, and we should compute some stats)
* Maybe a build matrix (I'm willing to contribute a Raspberry Pi runner)

Let me know what you guys think.

I agree with the standardization points. It would make it a lot easier to create tables.

About the runner, can you configure it to only run one job at a time? If that's the case, we might be able to use CI for drag racing after all.

This is what I'm doing right now, trying to figure out the GitHub actions runner. :)

@marghidanu
Contributor Author

marghidanu commented Apr 6, 2021

I think we can take some steps toward standardizing the runs even before we set up the runners.

  • We need to agree on the duration for running the test: 5 seconds is fine with me
  • How many iterations? In the Crystal example (see the README file), I run the test for 20 iterations to make sure the numbers are in the same range.
  • Standard output from the implementations. This is just to make parsing a little easier; based on this we can automatically extract some values and determine whether the runners are stable (see the sketch below). This should also allow us to put together a report at the end of the CI pipeline. Maybe a nice graph or something.
  • Provisioning dependencies and installing tools might be a bit of a headache, but I think every job should be able to take care of this.

Once we've solved all the problems above, we're ready to add additional operating systems, platforms, etc. using a build matrix.
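
As a rough sketch of that stability check, assuming we collect the per-iteration pass counts from each runner (the function name and the sample numbers below are made up purely for illustration):

import statistics

# Hypothetical helper: given pass counts from repeated iterations on one
# runner, report the spread so we can judge how stable that machine is.
def summarize(passes):
    mean = statistics.mean(passes)
    stdev = statistics.stdev(passes) if len(passes) > 1 else 0.0
    spread = 100 * stdev / mean if mean else 0.0
    print(f"iterations:  {len(passes)}")
    print(f"mean passes: {mean:.1f}")
    print(f"std dev:     {stdev:.1f} ({spread:.1f}% of mean)")

# Made-up counts standing in for a few iterations of one implementation.
summarize([9321, 9305, 9340, 9298, 9315])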

@rhbvkleef
Contributor

This is what I'm doing right now, trying to figure out the GitHub actions runner. :)

Cool! I'll be happy to find out about the results :)

We need to agree on the duration for running the test: 5 seconds is fine with me

Agreed.

How many iterations? In the Crystal example (see the README file), I run the test for 20 iterations to make sure the numbers are in the same range.

Iterations? 20 seems a bit high, doesn't it? We'd be waiting for one and a half minutes per language per CI run. That's not great. I think we should only run it once on CI, by default. For the drag race, we could look at increasing that number.

Standard output from the implementations. This is just to make parsing a little easier; based on this we can automatically extract some values and determine whether the runners are stable. This should also allow us to put together a report at the end of the CI pipeline. Maybe a nice graph or something.

Yes :) I propose this:

<number of runs>
<total time>
<validation: {1,true,True}=valid, {0,false,False}=invalid>

so, simply one line per value, showing number of runs, total time, and the validation result.
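
And, as a minimal sketch of how the CI side could pick that up, assuming each run's stdout is captured to a file (the file name "output.txt" and the passes-per-second calculation are just illustrative assumptions):

# Parse the proposed three-line output: <number of runs>, <total time>, <validation>.
def parse_result(path: str) -> dict:
    with open(path) as f:
        runs_line, time_line, valid_line = [line.strip() for line in f][:3]
    return {
        "passes": int(runs_line),
        "duration": float(time_line),
        "passes_per_second": int(runs_line) / float(time_line),
        "valid": valid_line in {"1", "true", "True"},
    }

# Hypothetical capture of one implementation's stdout.
print(parse_result("output.txt"))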

Provisioning dependencies and installing tools might be a bit of a headache but I think every job should be able to take care of this.

In #80, I proposed adding a Dockerfile for each language, and for each type of run. That would do it, right?

@marghidanu
Contributor Author

I already added a Dockerfile in my Crystal implementation from the beginning. That should take care of provisioning!

@rhbvkleef
Contributor

I already added a Dockerfile in my Crystal implementation from the beginning. That should take care of provisioning!

Great!

@marghidanu
Contributor Author

Ok, I think I have a pretty good idea of what this pipeline should look like.

@rhbvkleef
Contributor

Perhaps changing the output format to this would make more sense:

::set-output name=passes::<number of runs>
::set-output name=totalDuration::<total time>
::set-output name=valid::<validity: {1,true,True,0,false,False}>

Just shamelessly stealing from #66
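
As a minimal sketch of the producing side, assuming each implementation's wrapper script already has the results in hand (the values below are placeholders; at the time, GitHub Actions picked up ::set-output commands printed to stdout):

# Print workflow commands so later steps can read the results as step outputs.
passes = 9321           # placeholder
total_duration = 5.002  # placeholder, seconds
valid = True            # placeholder

print(f"::set-output name=passes::{passes}")
print(f"::set-output name=totalDuration::{total_duration}")
print(f"::set-output name=valid::{str(valid).lower()}")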

@marghidanu
Contributor Author

I've added #84 as an example of what we're trying to achieve. We should also consider nightly builds.

@davepl
Contributor

davepl commented Apr 7, 2021 via email

@micwoj92

micwoj92 commented Apr 7, 2021

In any event, I was wondering what the best way to discuss things with everyone following the project in github is? Or would it be typical to open an issue for every checkin? Not sure where the equivalent of a ‘chat’ or ‘message board’ is!

Issues are okay for that, but GitHub has a "Discussions" feature in beta. They look much more like message boards; you can enable them in the repo settings, in the Features section.
[screenshot: repository Settings → Features section]

And here is an example Discussions page:
https://github.com/github/feedback/discussions

@rhbvkleef
Contributor

In any event, I was wondering what the best way to discuss things with everyone following the project in github is? Or would it be typical to open an issue for every checkin?

I think it's fine for you to submit things without announcement to the main branch, as you indicated you wanted that to primarily contain the code used in your videos. Typically, additions to the main branch are done through Pull/Merge Requests (PRs/MRs, they're just different terminology used for essentially the same thing). I suspect (and as I asked before), we'll create a separate branch that will fulfill this purpose for the community contributions (right?).

@davepl
Contributor

davepl commented Apr 7, 2021 via email

@marghidanu
Contributor Author

I feel that this thread was hijacked :)

@rhbvkleef
Contributor

Sorry @marghidanu :/ After this, this topic should be done with. Then we'll get back to CI stuff.

@davepl, I've created a drag-race branch for all the community work. I'll start reorganizing that for the community contributions, and moving all the current pull-requests over to that branch. Simply keep working on the main branch.

@mike-barber
Contributor

I've opened a PR to add Docker to the Rust implementation: #109

It does look like we've got a general problem uploading artifacts from the run, though: https://github.com/davepl/Primes/runs/2305966576

Error: Artifact name is not valid: PrimeCrystal/stl-faithful/marghidanu. Contains character: "/". Invalid artifact name characters include: ",:,<,>,|,*,?,\,/.

I'm guessing the quick fix for this would be something like replacing / with _ for the artifact name, perhaps by using https://github.com/marketplace/actions/replace-string.
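
As a minimal sketch of that sanitization, whichever way it ends up being done (the character set comes from the error message above; stripping all of those characters rather than just "/" is an assumption):

import re

# Replace every character GitHub rejects in artifact names with "_".
def safe_artifact_name(name: str) -> str:
    return re.sub(r'["\\/:<>|*?]', "_", name)

print(safe_artifact_name("PrimeCrystal/stl-faithful/marghidanu"))
# -> PrimeCrystal_stl-faithful_marghidanu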

We're going to run into issues with very long build times eventually too. It might be possible to limit the PR to build only the projects that have changed, but I suspect this is going to require a bit more custom scripting.
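
A rough sketch of the kind of custom scripting that might involve, assuming we diff against a hypothetical base ref and that each solution lives in its own subdirectory:

import subprocess

# List the solution directories touched by the PR so the workflow could build
# only those. "origin/drag-race" is an assumed base ref for illustration.
diff = subprocess.run(
    ["git", "diff", "--name-only", "origin/drag-race...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

# Keep the first two path components as the project identifier (layout assumption).
changed = sorted({"/".join(p.split("/")[:2]) for p in diff if "/" in p})
print("\n".join(changed))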

@marghidanu
Contributor Author

It looks like the projects are now classified in some way (using subdirectories); this was not the case in the initial design for the CI. I just need to remove the slashes.

@rhbvkleef
Contributor

Yeah, sorry 'bout that 😬

@marghidanu
Contributor Author

CI is now fixed. Let's aim to standardize output and add more solutions.

@mike-barber
Contributor

Nice one @marghidanu!
