Fix BinLogReader for Linux binlogs and HtmlGenerator duplicate serverPath#257
Merged
Conversation
When multiple stage 1 bundles share the same ExtractPath (e.g. dotnet-dotnet and dotnet-dotnet-windows both extracting to the same directory), they produce identical /serverPath arguments. Dictionary.Add crashes on the duplicate key. Use indexer assignment instead so the last value wins silently. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BinLogReader.Replay() doesn't reliably fire events for newer binlog formats produced by MSBuild 18.x / .NET 10 source-build. The tree-based Serialization.Read() API works correctly for all binlog formats. This change: - Switches ExtractInvocations to always use ExtractInvocationsFromBuild (tree-based) instead of the event-based Replay() path - Removes dead event-based code (TryGetInvocationFromRecord, GetCommandLineFromEventArgs, event handler subscriptions) - Adds Linux compiler path trimming (bincore/csc without .exe extension) - Adds dotnet (no .exe) exec path detection for Linux Tested against actual Linux source-build binlogs: arcade (25 invocations), runtime (895 invocations) — both returned 0 with the old Replay() approach. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Author
|
I've validated these locally and in the pipeline. I need to get these merged and consumed in VMR to get a good set of data out of VMR. Right now it's busted and likely has been busted on linux for a while, we just didn't happen to have any pipelines building on linux for their source index data. |
radical
approved these changes
May 14, 2026
Member
Author
joperezr
added a commit
to joperezr/source-indexer
that referenced
this pull request
May 15, 2026
The fork has diverged ~21 commits since the last upstream sync (PR dotnet#184, 2025-05-12). Blindly re-running update-source-browser.ps1 risks silently dropping local features (dotnet#183 signing key, dotnet#192 dedup, dotnet#193 source-generated files, dotnet#255 net10 retarget, dotnet#257 Linux binlog fix, plus Dependabot bumps). - 02: prepend warning block to the 'Updating the vendored SourceBrowser' section listing the divergent PRs and recommending cherry-picks over a full re-sync. - 00: cross-reference the warning from the overview bullet. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fix BinLogToSln/HtmlGenerator tool issues discovered while adding VMR source-index support.
1. BinLogReader: Use tree-based reader instead of event-based Replay()
BinLogReader.Replay()silently aborts when reading binlogs that lack aCurrentUICulturemessage — a known issue (MSBuildStructuredLog#936). This is common in binlogs produced on Linux. WhenReplay()hits aPropertyReassignmentrecord, it tries to format a resource string that was never initialized, throws anArgumentNullException, catches it internally viaOnException, and stops replaying — with zero diagnostics to the caller.Impact for VMR source-build: BinLogToSln extracted zero compiler invocations from Linux source-build binlogs (arcade, runtime, roslyn, etc.), producing empty SLN files. The same binlogs processed with the tree-based
Serialization.Read()API work perfectly — arcade yields 25 invocations, runtime yields 895.Impact for existing per-repo pipelines: Most repos (runtime, roslyn, aspnetcore) run their source-index job on Windows agents or cross-compile with
-os linuxon Windows, so their binlogs likely include the culture message and are unaffected. However, any repo that runs source-index natively on Linux could be silently producing empty results. Worth auditing.The fix switches
ExtractInvocationsto always use the tree-basedSerialization.Read()path (which was already implemented for.buildlogfiles). This approach:Strings.Initialize()workaround needed)ProjectPropertiesviaGetEvaluation(build).GetProperties()(verified: ~1000 properties per invocation including TargetFramework)Also adds Linux compiler path trimming — on Linux, the Csc task command line starts with
/path/to/Roslyn/bincore/csc(no.exeextension), which the old trimming logic didn't handle.A workaround exists (
Strings.Initialize()beforeReplay(), as used in component-detection), but the tree-based approach is strictly better for our use case.2. HtmlGenerator: Allow duplicate serverPath mappings
When multiple SLN files from the same VMR build map to the same source directory (e.g.,
dotnet-dotnetanddotnet-dotnet-windowsboth mapping tosrc/), HtmlGenerator crashed withDictionary.Addon duplicate keys. Changed to indexer assignment to allow last-write-wins.Testing