
Fix BinLogReader for Linux binlogs and HtmlGenerator duplicate serverPath #257

Merged: ericstj merged 2 commits into dotnet:main from ericstj:fix-binlog-reader on May 14, 2026

ericstj commented May 14, 2026

Summary

Fix BinLogToSln/HtmlGenerator tool issues discovered while adding VMR source-index support.

1. BinLogReader: Use tree-based reader instead of event-based Replay()

BinLogReader.Replay() silently aborts when reading binlogs that lack a CurrentUICulture message, a known issue (MSBuildStructuredLog#936). Binlogs produced on Linux commonly lack this message. When Replay() hits a PropertyReassignment record, it tries to format a resource string that was never initialized, throws an ArgumentNullException, catches it internally via OnException, and stops replaying, with zero diagnostics to the caller.

Impact for VMR source-build: BinLogToSln extracted zero compiler invocations from Linux source-build binlogs (arcade, runtime, roslyn, etc.), producing empty SLN files. The same binlogs processed with the tree-based Serialization.Read() API work perfectly — arcade yields 25 invocations, runtime yields 895.

Impact for existing per-repo pipelines: Most repos (runtime, roslyn, aspnetcore) run their source-index job on Windows agents or cross-compile with -os linux on Windows, so their binlogs likely include the culture message and are unaffected. However, any repo that runs source-index natively on Linux could be silently producing empty results. Worth auditing.

The fix switches ExtractInvocations to always use the tree-based Serialization.Read() path (which was already implemented for .buildlog files). This approach:

  • Works reliably for all binlog formats (no Strings.Initialize() workaround needed)
  • Is simpler — no event wiring or state tracking dictionaries
  • Throws on parse errors instead of silently aborting
  • Fully populates ProjectProperties via GetEvaluation(build).GetProperties() (verified: ~1000 properties per invocation including TargetFramework)

Also adds Linux compiler path trimming — on Linux, the Csc task command line starts with /path/to/Roslyn/bincore/csc (no .exe extension), which the old trimming logic didn't handle.
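
The trimming described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `CompilerCommandLine` and the method `TrimCompilerPath` are hypothetical, and the sketch assumes the compiler path is either quoted or contains no spaces.

```csharp
using System;

public static class CompilerCommandLine
{
    // Hypothetical helper (not the PR's implementation): removes a leading
    // compiler executable path, "csc.exe" on Windows or "csc" on Linux,
    // from a command line and returns the remaining arguments.
    public static string TrimCompilerPath(string commandLine)
    {
        string text = commandLine.TrimStart();
        if (text.Length == 0) return text;

        // The first token is either quoted or runs up to the first space.
        int end;
        string first;
        if (text[0] == '"')
        {
            int close = text.IndexOf('"', 1);
            if (close < 0) return commandLine; // unbalanced quote: leave untouched
            first = text.Substring(1, close - 1);
            end = close + 1;
        }
        else
        {
            int space = text.IndexOf(' ');
            end = space < 0 ? text.Length : space;
            first = text.Substring(0, end);
        }

        // Compare only the file name, accepting both path separators.
        int slash = first.LastIndexOfAny(new[] { '/', '\\' });
        string fileName = first.Substring(slash + 1);
        bool isCompiler =
            fileName.Equals("csc.exe", StringComparison.OrdinalIgnoreCase) ||
            fileName.Equals("csc", StringComparison.Ordinal);

        // Anything that does not start with a compiler path is returned unchanged.
        return isCompiler ? text.Substring(end).TrimStart() : commandLine;
    }

    public static void Main()
    {
        Console.WriteLine(TrimCompilerPath("/path/to/Roslyn/bincore/csc /noconfig Program.cs"));
        Console.WriteLine(TrimCompilerPath("\"C:\\Program Files\\Roslyn\\csc.exe\" /noconfig Program.cs"));
    }
}
```

Both calls in `Main` print `/noconfig Program.cs`. Matching on the file name rather than a fixed suffix such as `.exe` is what lets one check cover both the Windows and the Linux form of the path.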

A workaround exists (Strings.Initialize() before Replay(), as used in component-detection), but the tree-based approach is strictly better for our use case.

2. HtmlGenerator: Allow duplicate serverPath mappings

When multiple SLN files from the same VMR build map to the same source directory (e.g., dotnet-dotnet and dotnet-dotnet-windows both mapping to src/), HtmlGenerator crashed with Dictionary.Add on duplicate keys. Changed to indexer assignment to allow last-write-wins.
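
The difference between the two calls can be shown in isolation. The sketch below is not HtmlGenerator's code: the `ServerPathDemo` class, the `BuildMap` helper, and the example URLs are hypothetical, and the key and value shape of the real serverPath mapping is assumed. It relies only on documented `Dictionary<TKey,TValue>` behavior: `Add` throws `ArgumentException` on an existing key, while the indexer setter overwrites the value.

```csharp
using System;
using System.Collections.Generic;

public static class ServerPathDemo
{
    // Hypothetical helper: builds a path -> server path map, letting a later
    // entry for the same path overwrite an earlier one (last write wins).
    public static Dictionary<string, string> BuildMap(
        IEnumerable<(string Path, string ServerPath)> mappings)
    {
        var map = new Dictionary<string, string>();
        foreach (var (path, serverPath) in mappings)
        {
            // map.Add(path, serverPath) would throw ArgumentException
            // the second time the same path appears.
            map[path] = serverPath;
        }
        return map;
    }

    public static void Main()
    {
        var mappings = new[]
        {
            ("src/", "https://example.invalid/dotnet-dotnet"),
            ("src/", "https://example.invalid/dotnet-dotnet-windows"),
        };

        var map = BuildMap(mappings);
        Console.WriteLine(map.Count);   // prints 1
        Console.WriteLine(map["src/"]); // prints https://example.invalid/dotnet-dotnet-windows
    }
}
```

Last-write-wins is safe only when the duplicate entries are equivalent. If two SLN files could map the same directory to different server paths, the earlier mapping is dropped without any warning.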

Testing

  • Verified against actual Linux source-build binlogs downloaded from VMR build 2973064
  • All 10 existing BinLogToSln tests pass
  • Arcade: 0 → 25 invocations, Runtime: 0 → 895 invocations
  • Windows binlogs (winforms): 29 invocations (unchanged, still works)
  • Property extraction verified: all invocations have full properties including TargetFramework

ericstj and others added 2 commits May 14, 2026 09:36
When multiple stage 1 bundles share the same ExtractPath (e.g. dotnet-dotnet
and dotnet-dotnet-windows both extracting to the same directory), they produce
identical /serverPath arguments. Dictionary.Add crashes on the duplicate key.

Use indexer assignment instead so the last value wins silently.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

BinLogReader.Replay() doesn't reliably fire events for newer binlog
formats produced by MSBuild 18.x / .NET 10 source-build. The tree-based
Serialization.Read() API works correctly for all binlog formats.

This change:
- Switches ExtractInvocations to always use ExtractInvocationsFromBuild
  (tree-based) instead of the event-based Replay() path
- Removes dead event-based code (TryGetInvocationFromRecord,
  GetCommandLineFromEventArgs, event handler subscriptions)
- Adds Linux compiler path trimming (bincore/csc without .exe extension)
- Adds dotnet (no .exe) exec path detection for Linux

Tested against actual Linux source-build binlogs: arcade (25 invocations),
runtime (895 invocations) — both returned 0 with the old Replay() approach.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ericstj requested review from joperezr and radical May 14, 2026 17:07

ericstj commented May 14, 2026

I've validated these locally and in the pipeline. I need to get these merged and consumed in VMR to get a good set of data out of VMR. Right now it's busted and has likely been busted on Linux for a while; we just didn't happen to have any pipelines building on Linux for their source index data.


ericstj merged commit 30f49e1 into dotnet:main on May 14, 2026
2 checks passed
joperezr added a commit to joperezr/source-indexer that referenced this pull request May 15, 2026
The fork has diverged ~21 commits since the last upstream sync (PR dotnet#184,
2025-05-12). Blindly re-running update-source-browser.ps1 risks silently
dropping local features (dotnet#183 signing key, dotnet#192 dedup, dotnet#193 source-generated
files, dotnet#255 net10 retarget, dotnet#257 Linux binlog fix, plus Dependabot bumps).

- 02: prepend warning block to the 'Updating the vendored SourceBrowser'
  section listing the divergent PRs and recommending cherry-picks over a
  full re-sync.
- 00: cross-reference the warning from the overview bullet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>