New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tools: Apply stats patch on trace_replay.ml #4
Conversation
| let exec_commit rs ((time, message, c), hash) = | ||
| let level = rs.current_row.level in | ||
| Stat_recorder.set_stat_specs (specs_of_row rs.current_row); | ||
| let time = Time.Protocol.of_seconds time in | ||
| let c = on_lhs_context rs c in | ||
| let* hash' = Context.commit ~time ?message c in | ||
| on_rhs_hash rs hash hash'; | ||
| Stdlib.Hashtbl.add rs.hash_per_level level hash'; | ||
| Stdlib.Hashtbl.add rs.hash_per_level rs.current_block_idx hash'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(For context, this line change is a hack to allow the selection of the GC commit at the previous split, as it turns out that the cycles can get misaligned otherwise... I'm only mentioning it in case it interfere with @adatario trace update, just so you know there's a trap there :P )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may have fallen into the trap. With this patch applied, I see the following error when 8192 commits have been processed (which is the value of --gc-when):
Replaying trace 8192/145000 commits 09:23 (ETA: --:--) [---------------] 5%
replay: internal error, uncaught exception:
File "src/trace/stats_trace_recorder.ml", line 927, characters 14-20: Assertion failed
Raised at Lwt.Miscellaneous.poll in file "src/core/lwt.ml", line 3077, characters 20-29
Called from Lwt_main.run.run_loop in file "src/unix/lwt_main.ml", line 31, characters 10-20
Called from Lwt_main.run in file "src/unix/lwt_main.ml", line 118, characters 8-13
Re-raised at Lwt_main.run in file "src/unix/lwt_main.ml", line 124, characters 4-13
Called from Cmdliner_term.app.(fun) in file "cmdliner_term.ml", line 24, characters 19-24
Called from Cmdliner_eval.run_parser in file "cmdliner_eval.ml", line 34, characters 37-44
The assertion here fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I've not seen this assert before hmhm any chance it could be related to the skip-gc perhaps? (since the assert seems unhappy that there was no GC?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha ok yeah you're right, it's also possible that it doesn't find a target commit the first time the GC could run due to a +-1 offset? (I've no idea why it still tries to reach for the latest GC stats though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the skip-gc part of the original patch and pushed it as a separate branch/commit. @adatario can look at this, when overhauling the GC triggering code. I'll report if I see any more failures on the replay benchmark run while dumping these stats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ui, it's all a bit of a mine-field :)
I also think that this is an interference with skip-gc and the new recording of gc events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! So excited to see the new graphs :)
Co-authored-by: Puneeth Chaganti <punchagan@muse-amuse.in>
78600d8
to
20b004c
Compare
|
Thanks @punchagan! Looks good to me. I think we just need to be careful with the fact that gc/split can now be caused by trace and also by our previous hacks. But with splitting out of the One thought though: Would it make sense to integrate these additional out-of-band stats into the regular stats? That is, add entries to |
@art-w might be better equipped to comment on this. :) |
|
Yeah hm I've no idea, but I agree it sounds cleaner if possible :D This was a quick hack because I didn't want to mess with the existing stats (.. they might even be logging similar information already?) |
|
I ran into a little issue with this patch: All process write to the same file ( I suggest adding a command line option that enables writing these traces to a specified location. E.g: In the scripts we can then use @punchagan what do you think? |
|
Sounds like a good idea. 👍 |
|
Ran into another issue, when a replay is killed the Which seems to be caused by a dangling The suggestion with command-line option to specify path should also solve this. |
No description provided.