Explore other storage methods for our test results #3728

Open
jedel1043 opened this issue Mar 9, 2024 · 12 comments
Labels
discussion (Issues needing more discussion), E-Medium (Medium difficulty problem), Internal (Category for changelog), technical debt

Comments

@jedel1043 (Member) commented Mar 9, 2024

Right now, our test results are stored in a single big JSON file. This is far from ideal for several reasons, such as its overall size and the slow serialization and deserialization speed.

I currently see two other representations that we could use, each with its pros and cons.

Binary serialized file (any format)

  • ✔️ Small size.
  • ✔️ Fast to serialize and deserialize.
  • ✔️ Naturally represents the tree structure of test suites.
  • ❌ Harder to query for specific test types (per feature/version/suite).
  • ❌ Most formats cannot be lazily deserialized.
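
For illustration only, a minimal sketch of what this could look like, assuming serde plus bincode 1.x; `SuiteResult` and its fields are placeholders, not our actual test runner types:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical tree of suite results; names are placeholders.
#[derive(Serialize, Deserialize)]
struct SuiteResult {
    name: String,
    passed: u32,
    ignored: u32,
    failed: u32,
    // Nested suites naturally encode the Test262 directory tree.
    suites: Vec<SuiteResult>,
}

// bincode 1.x: compact and fast, but the whole file must be decoded at once.
fn save(results: &SuiteResult) -> bincode::Result<Vec<u8>> {
    bincode::serialize(results)
}

fn load(bytes: &[u8]) -> bincode::Result<SuiteResult> {
    bincode::deserialize(bytes)
}
```

The trade-off is exactly the last point above: a format like this has no lazy loading, so the whole tree has to be decoded before any query.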

Binary database file (SQLite)

  • ✔️ Small size.
  • ✔️ Decent serialization speed and virtually nonexistent deserialization cost (SQLite lazily loads entries).
  • ✔️ Nice for data queries using SQL syntax.
  • ❌ Complicates our serialization logic considerably.
  • ❌ Requires a bit of database design to represent the tree structure of suites as a table.
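
On the last point, a hedged sketch of one possible table design (using rusqlite here purely for illustration): the suite tree becomes an adjacency list, with each suite row pointing at its parent. The schema and column names are made up, not a concrete proposal.

```rust
use rusqlite::Connection;

// Illustrative schema only: each suite row points at its parent suite, so the
// Test262 directory tree becomes an adjacency list that SQL can query.
fn create_schema(conn: &Connection) -> rusqlite::Result<()> {
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS suite (
             id        INTEGER PRIMARY KEY,
             parent_id INTEGER REFERENCES suite(id),
             name      TEXT NOT NULL
         );
         CREATE TABLE IF NOT EXISTS test_result (
             suite_id  INTEGER NOT NULL REFERENCES suite(id),
             commit_id TEXT NOT NULL,
             name      TEXT NOT NULL,
             outcome   TEXT NOT NULL CHECK (outcome IN ('passed', 'ignored', 'failed'))
         );",
    )
}
```

A recursive query (WITH RECURSIVE) could then roll results up per suite, feature, or version.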
@jedel1043 added the technical debt, discussion, E-Medium, and Internal labels on Mar 9, 2024
@Razican (Member) commented Mar 9, 2024

We could also have a small backend connected to a DB, exposing a simple JSON API, since I'm not sure how easy it is to handle SQLite from Docusaurus, for example.

Serializing into the DB should be straightforward with Diesel or something like that. We would just need to derive some structures and associate commit IDs (or tags) with results.
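
For illustration, a rough sketch of what that could look like with Diesel 2.x; the table, columns, and struct below are hypothetical, not a schema proposal:

```rust
// Rough sketch only; assumes diesel 2.x with the "sqlite" feature enabled.
use diesel::prelude::*;
use diesel::sqlite::SqliteConnection;

// Hypothetical table: one row per suite per commit/tag.
diesel::table! {
    test_runs (id) {
        id -> Integer,
        commit_id -> Text,
        suite -> Text,
        passed -> Integer,
        ignored -> Integer,
        failed -> Integer,
    }
}

// Derived structure we would serialize into the DB.
#[derive(Insertable)]
#[diesel(table_name = test_runs)]
struct NewTestRun<'a> {
    commit_id: &'a str,
    suite: &'a str,
    passed: i32,
    ignored: i32,
    failed: i32,
}

fn insert_run(conn: &mut SqliteConnection, run: &NewTestRun<'_>) -> QueryResult<usize> {
    diesel::insert_into(test_runs::table).values(run).execute(conn)
}
```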

@nekevss (Member) commented Mar 9, 2024

Potential theoretical option:

Store the results in a Test262 Results repository

Maybe someone else knows if this is a no-go from the jump. I was doing some research, and we might be able to run a GitHub Action in another repository off a trigger from the main repo. We'd have to test the idea, but if we could send results to another repository and have the GitHub Action commit the file there, we'd be a bit less constrained on size.

Some benefits:

  • Provides easier visibility and access to the results.
  • Representation/formatting is less constrained, as it's removed from the main repository.
  • Allows us to set up our results as a REST API on GitHub Pages (for example: boajs.dev/test262-results/).

Negatives:

  • Might complicate CI considerably
  • Would require testing to determine viability

@HalidOdat (Member) commented

While I like the idea of easily querying the data, my main concern with SQLite is that we would need a backend to handle the data, instead of just GitHub Pages.

I like @nekevss's idea: instead of pushing to the gh-pages branch on every push to main, we push to a cross-repository branch.

The two actions that we use to get the current gh-pages branch and push to it:

  • The actions/checkout action has a repository option.
  • The github-push-action action used to push to gh-pages has a repository option too, for cross-repository pushing.

So this looks very doable, and I don't think it would complicate the CI that much.

@jedel1043 (Member, Author) commented Mar 9, 2024

I'm not sure how easy it is to handle SQLite from Docusaurus, for example.

I think it's pretty straightforward with the sql.js library...

...however, I also really like this idea:

We could also have a small backend connected to a DB, exposing a simple JSON API,

But in that case, I'd just move the whole webpage onto the server, to make it easier to integrate and keep in sync.

@jedel1043 (Member, Author) commented

I see the concerns about having to use a backend for this, but I just wanted to mention that the storage method for the data is completely orthogonal to the data representation itself; we could move our current JSON data into a separate repo, and we could also keep using this repo to store the data but use binary formats, for example.

@jedel1043 (Member, Author) commented Mar 9, 2024

The two actions that we use to get the current gh-pages branch and push to it:

  • The actions/checkout action has a repository option.
  • The github-push-action action used to push to gh-pages has a repository option too, for cross-repository pushing.

About this, I always thought that we don't need to have test data for all commits made to the repo. Right now, if there are a lot of commits in succession, we're running the whole test suite for each and every one of them. This is not ideal because those commits could just be deps bumps, for example. In this case, I'd suggest just running the test suite once a day from the webpage repo itself.

@jasonwilliams (Member) commented Mar 10, 2024

My 2 cents

I like @Razican's idea of a backend as a long-term solution; anything file-based is going to slow down over time with the JSON parsing. I don't mind looking into having a database on our own server somewhere (which we could use funds to do). It is more moving parts and things to maintain, so I understand the concern with it.

In the meantime, I do like @nekevss's idea (especially in the short term) of just moving things to another repo and throwing the data in there. I'm happy to help set this up; it seems like Halid has already pointed us in the direction of the variables to change.

About this, I always thought that we don't need to have test data for all commits made to the repo. Right now, if there are a lot of commits in succession, we're running the whole test suite for each and every one of them. This is not ideal because those commits could just be deps bumps, for example. In this case, I'd suggest just running the test suite once a day from the webpage repo itself.

Yes, I agree. We have more than enough traffic here now that running on every commit is excessive; a nightly runner is more than enough in my opinion. I would even go weekly, but that granularity might be a bit too coarse.

@jasonwilliams (Member) commented Mar 11, 2024

Update

I've created https://github.com/boa-dev/website-data where we can put benchmarks and test262 results. This is set to run nightly.
This leaves the question of where to put dev documentation. I don't know whether dev docs have proven useful enough to keep.

@jedel1043 suggested improving the (contributing) docs to show developers how to generate their own API docs, rather than hosting a new set for each commit. The other option is to have "nightly API docs", but I don't know how often those would actually get used.

If we do plan to release more often, then maybe we could do away with dev docs (as most of our users would be using the stable release anyway).

@jedel1043 (Member, Author) commented Apr 7, 2024

I've thought about this recently, and I think the biggest low-hanging fruit right now would be to avoid generating a test result file for each tag, and instead embed the tag results in the tests themselves. What do you think?
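
For the sake of discussion, a sketch of one possible shape for that, assuming serde; the field names and layout are hypothetical, not the current format:

```rust
use std::collections::BTreeMap;

use serde::{Deserialize, Serialize};

// Hypothetical consolidated entry: instead of one result file per tag,
// each test carries a map from tag (or commit) to its outcome.
#[derive(Serialize, Deserialize)]
struct TestEntry {
    name: String,
    // Keyed by tag or commit identifier, e.g. "v0.17" or "main".
    results: BTreeMap<String, Outcome>,
}

#[derive(Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
enum Outcome {
    Passed,
    Ignored,
    Failed,
}
```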

@nekevss (Member) commented Apr 8, 2024

If I'm thinking about that right, it would involve consolidating the test files into one general results file, correct?

Is there a way that we can do that without increasing the size of the results file we have to fetch via the site? My only concern would be causing the conformance page to lag from fetching too large a file. Outside of that concern, I'd be open to any changes.

@jedel1043 (Member, Author) commented Apr 8, 2024

If I'm thinking about that right, it would involve consolidating the test files into one general results file, correct?

Is there a way that we can do that without increasing the size of the results file we have to fetch via the site? My only concern would be causing the conformance page to lag from fetching too large a file. Outside of that concern, I'd be open to any changes.

Maybe we could measure the increase in load times first? It could be that the new data doesn't increase our file size too much.

@jedel1043 (Member, Author) commented

@CanadaHonk, we were discussing our data representation, and since you are also dealing with the data representation of test runs, we wanted to know how exactly https://test262.fyi stores its test runs for each engine.
