
[WIP] Dont include working specs as errors #1

Merged · 3 commits · Apr 26, 2022
Conversation

@vsoch (Member) commented Apr 26, 2022

And I still need to do the reclustering. This is a WIP.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
… as working or not working for the web app to load

@harshithamenon (Collaborator) left a comment


We can go ahead with this workflow, but here are some thoughts; maybe we should rethink whether we want to cluster any error message, or only error messages occurring in failed builds:

  • Often a build can fail due to system-related issues (locks, network, filesystem errors) and later succeed. How should we handle that? Should we re-cluster every time this happens?
  • How do we want to handle the case where a user is interested in identifying fixes for error messages that didn't necessarily cause the build to fail, and decides to submit the error message to the monitor? I think that is technically still a valid use case.

@vsoch (Member, Author) commented Apr 26, 2022

Often a build can fail due to system-related issues (locks, network, filesystem errors) and later succeed. How should we handle that? Should we re-cluster every time this happens?

A spec that is failed (by spack) is going to be sent to spack monitor. The model needs to be streaming (online) ML so it's updated immediately.

How do we want to handle the case where a user is interested in identifying fixes for error messages that didn't necessarily cause the build to fail, and decides to submit the error message to the monitor? I think that is technically still a valid use case.

If the build doesn't fail, I don't think the user will look into it?
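The streaming (online) ML idea described above could be sketched roughly as follows. This is a toy illustration only, not spack monitor's actual model: the `embed()` featurization, the distance threshold, and all class/function names here are hypothetical stand-ins. The point is simply that each newly reported failure updates the clustering immediately, with no full refit.

```python
import hashlib
import math

def embed(message, dim=16):
    # Hash tokens into a fixed-size, L2-normalized vector (a stand-in
    # for whatever featurization the real pipeline uses).
    vec = [0.0] * dim
    for tok in message.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class OnlineClusters:
    """Tiny streaming clusterer: each incoming failed-spec error message
    either joins the nearest centroid (updated incrementally) or starts
    a new cluster, so the model reflects the new error right away."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.centroids = []  # one vector per cluster
        self.counts = []     # points absorbed per cluster

    def _dist(self, a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def add(self, message):
        vec = embed(message)
        if self.centroids:
            i, d = min(
                ((i, self._dist(vec, c)) for i, c in enumerate(self.centroids)),
                key=lambda pair: pair[1],
            )
            if d < self.threshold:
                n = self.counts[i] + 1
                # running-mean centroid update
                self.centroids[i] = [c + (v - c) / n
                                     for c, v in zip(self.centroids[i], vec)]
                self.counts[i] = n
                return i
        # no centroid close enough: this error starts a new cluster
        self.centroids.append(vec)
        self.counts.append(1)
        return len(self.centroids) - 1
```

A production version would use a proper online learner (e.g. scikit-learn's `MiniBatchKMeans.partial_fit`), but the incremental-update shape is the same.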

@vsoch vsoch merged commit 07edd3c into main Apr 26, 2022
@vsoch vsoch deleted the dont-include-working-specs branch April 26, 2022 22:36
@harshithamenon (Collaborator) commented Apr 27, 2022

Often a build can fail due to system-related issues (locks, network, filesystem errors) and later succeed. How should we handle that? Should we re-cluster every time this happens?

A spec that is failed (by spack) is going to be sent to spack monitor. The model needs to be streaming (online) ML so it's updated immediately.

The situation I am describing is where it succeeds on the next attempt (or sometime in the future). Initially the spec was in errors/, but now that it succeeded, do we go find the corresponding error message in the db, remove it, and recluster?

@vsoch (Member, Author) commented Apr 27, 2022

Ohh I see - so it's some kind of ephemeral error (and not related to the spec itself, so the hash would be the same)?

I think that could be a case in and of itself: if we truly find that a previously broken spec is now magically working, we'd report back to the user that there is an exact match for a working spec (and suggest trying again, and opening an issue if it's still broken). You are right that it's not relevant for the demo here, where things are hard-coded as working/error, but it could come up in some real-world case. I do hope we don't have too many of those, because it would suggest there are things we aren't accounting for. Either we could look at those cases and try to add the missing variables to spack, or add some extra metadata to send to spack monitor to figure it out.
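The "exact match for a working spec" response described above could look something like this sketch. All names here (`SpecRecord`, `suggest_action`) are invented for illustration; the real spack monitor schema is not shown.

```python
from dataclasses import dataclass

@dataclass
class SpecRecord:
    spec_hash: str
    status: str  # "error" or "success"

def suggest_action(spec_hash, records):
    # If this exact hash has both failed and succeeded, the failure was
    # likely ephemeral (lock/network/filesystem): suggest retrying first,
    # and opening an issue only if it is still broken.
    outcomes = {r.status for r in records if r.spec_hash == spec_hash}
    if {"success", "error"} <= outcomes:
        return "retry: an identical spec has built successfully before"
    if outcomes == {"error"}:
        return "match known failure: show clustered fixes"
    if outcomes == {"success"}:
        return "known working spec"
    return "unknown spec"
```

Because an ephemeral failure leaves the spec hash unchanged, the lookup by hash alone is enough to distinguish "this spec is genuinely broken" from "this spec sometimes works".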

@vsoch (Member, Author) commented Apr 27, 2022

And if we are maintaining a clustering, I don't think it would be feasible to manually remove/add points based on a change of state; we'd simply recluster at some frequency, and the temporal aspect of the model (e.g., the build working from then into the future, for the most part) would eventually become dominant.
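The "recluster at some frequency and let time dominate" idea above amounts to down-weighting old observations rather than deleting them. A minimal sketch, assuming an exponential decay with a made-up 30-day half-life (nothing here is spack monitor's actual scheme):

```python
def temporal_weight(age_days, half_life_days=30.0):
    # Exponential decay: an observation loses half its weight every
    # half_life_days, so stale failures fade without being removed.
    return 0.5 ** (age_days / half_life_days)

def weighted_failure_rate(observations, half_life_days=30.0):
    # observations: iterable of (age_in_days, failed: bool).
    # Recent outcomes (e.g. the build now working) dominate the rate.
    num = den = 0.0
    for age, failed in observations:
        w = temporal_weight(age, half_life_days)
        num += w if failed else 0.0
        den += w
    return num / den if den else 0.0
```

With two month-old failures and two fresh successes, the weighted failure rate falls well below 0.5, so a periodic recluster naturally reflects the spec now working, with no manual point removal.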
