Skip to content

Guard server-packet structs against silent size drift#66

Open
Mosch0512 wants to merge 10 commits into
mainfrom
wire-struct-guards
Open

Guard server-packet structs against silent size drift#66
Mosch0512 wants to merge 10 commits into
mainfrom
wire-struct-guards

Conversation

@Mosch0512
Copy link
Copy Markdown
Owner

@Mosch0512 Mosch0512 commented May 22, 2026

Summary

Defends against the bug class behind PR sven-n#402 (silent client freeze on the character-select / loading screen) — the freeze itself was patched by PR sven-n#404; this PR adds the safeguards that should prevent a regression and make the next one diagnosable.

Scope grew during review based on what was uncovered. Final shape:

1. Generated wire-size guards from OpenMU XML (codegen Level 1)

  • third_party/OpenMU git submodule pinned to 1a56f6dd5.
  • tools/gen_wire_sizes.py — Python 3 stdlib, parses ServerToClientPackets.xml, joins against a baked-in mapping (XML name → C++ struct name), emits one static_assert(sizeof(X) <= N) per mapped packet.
  • add_custom_command in src/CMakeLists.txt runs the generator only when the XML or script changes; output lands in ${CMAKE_BINARY_DIR}/generated/Network/Server/wire_sizes.generated.h, which WSclient.h includes at end-of-file.
  • Three packets seeded: CharacterCreationSuccessful → PRECEIVE_CREATE_CHARACTER, CharacterInformationExtended → PRECEIVE_JOIN_MAP_SERVER_EXTENDED, RespawnAfterDeathExtended → PRECEIVE_REVIVAL_EXTENDED.
  • <= rather than == because some client structs intentionally don't decode all trailing fields (e.g. PreviewData on character-creation). The actual safety property is "client struct fits in wire packet so safe_cast succeeds."

Adding coverage for another packet is a one-line append to PACKET_MAPPING.

2. Visible safe_cast failures (runtime diagnostics)

  • Single safe_cast<T>(span, packet_type = nullptr) template that logs a MCD_ERROR on every size mismatch — not just the call sites updated to pass a name. Fallback name is typeid(T).name() (mangled, but readable enough; RTTI is on).
  • LogSafeCastSizeMismatch defined in WSclient.cpp so the header stays free of g_ConsoleDebug includes. Uses bounded 4-arg vswprintf and a %.64hs field-width cap on the type name to avoid any buffer-overflow surface.
  • ReceiveJoinMapServer dispatch site (WSclient.cpp:13207) also emits an MCD_ERROR describing the user-visible symptom ("loading screen will appear frozen") so the cause is obvious in the log.

3. MCD_ERROR actually fires in Release builds

Without this, the runtime diagnostics from #2 would compile but never reach the user, because the entire CmuConsoleDebug::Write body was gated by #ifdef CONSOLE_DEBUG (not defined in any production build).

  • CmuConsoleDebug::GetInstance() now always returns a valid static instance. Previously returned nullptr unless CSK_LH_DEBUG_CONSOLE was defined, making every g_ConsoleDebug->Write(...) call site a null-pointer member-function call. It "worked" only because the body was empty.
  • Write() now routes MCD_ERROR to g_ErrorReport.Write (file-based, always compiled, opens MuError.log on startup) before the existing CONSOLE_DEBUG-gated console path. MCD_SEND / MCD_RECEIVE / MCD_NORMAL stay debug-only so production logs aren't flooded.

Net effect after merge: a future regression of the PR sven-n#402 shape produces a one-line entry like

[MCD_ERROR] safe_cast<PRECEIVE_JOIN_MAP_SERVER_EXTENDED>: received 92 bytes, expected at least 96 -- packet dropped
[MCD_ERROR] [ReceiveJoinMapServer] dropped -- protocol state stays REQUEST_JOIN_MAP_SERVER, main render will not be enabled (loading screen will appear frozen).

in MuError.log instead of a silent hang. Or, more likely, the build never compiles in the first place because the generated static_assert trips.

4. CI: recurse submodules

actions/checkout@v4 in the three MinGW build workflows now uses submodules: recursive. Required so CI can see the OpenMU XML the codegen step depends on.

Test plan

  • Builds clean: MinGW-w64 i686, Release.
  • Generator output verified: wire_sizes.generated.h contains the expected three asserts; sizeof checks compile.
  • Optional: temporarily change one assert by 1 to verify the build breaks where expected.

Future consideration (not in this PR)

The submodule currently vendors all of OpenMU — hundreds of MB of C# server, ASP.NET tooling, tests, etc. — just to access one XML file. Long-term, it would be cleaner to factor the packet schema into a dedicated repo (e.g. MUnique/mu-packet-schema) containing only the XML and the XSD, then have both OpenMU and MuMain pull it in as a tiny submodule. That makes the schema the formal contract between the two projects, lets other client implementations consume it without dragging in OpenMU's server tree, and removes the "MuMain's wire definitions are coupled to a specific OpenMU server commit" framing.

Out of scope here — requires coordination with @sven-n and the OpenMU side — but worth flagging now since this PR is the first time MuMain takes a hard dependency on OpenMU. Mentioned briefly in the design-doc issue sven-n#407.

Scope notes

  • Original scope was just "draft (1) and (3) for PRECEIVE_JOIN_MAP_SERVER_EXTENDED and the surrounding receive structs" — grew based on review feedback (broader safe_cast logging) and a real discovery (MCD_ERROR being a no-op in Release).
  • Pre-existing 3-arg vswprintf in the legacy CONSOLE_DEBUG-gated console path is not changed. Same bug pattern but out of scope; worth a separate sweep PR if desired.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements defensive measures to prevent silent client freezes caused by server-packet size drifts. By adding build-time static assertions and enhanced runtime diagnostic logging, the changes ensure that any future structural mismatches are either caught during compilation or explicitly reported in the debug console, significantly improving maintainability and troubleshooting.

Highlights

  • Build-time safety: Added static_assert checks for three critical server-receive structs to ensure wire sizes remain consistent and prevent silent regressions caused by incorrect packing.
  • Enhanced diagnostics: Introduced a new safe_cast overload that logs detailed error information, including packet type and size mismatches, to the debug console when a cast fails.
  • Improved error reporting: Updated the join-map server packet processing to emit specific console errors, making it easier to identify when a protocol state mismatch occurs.
New Features

🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances network packet validation by introducing a named overload for safe_cast and a corresponding logging utility, LogSafeCastSizeMismatch, to provide detailed error reports when packet sizes do not match expectations. Additionally, it adds static_assert checks for several network structures to ensure their memory layout remains consistent with the expected wire format. A review comment suggests using the %Iu format specifier instead of %zu for std::size_t to ensure compatibility with older Windows CRT versions.

Comment thread src/source/Network/Server/WSclient.cpp Outdated
Mosch0512 added a commit that referenced this pull request May 22, 2026
Older Windows CRTs (msvcrt.dll pre-VS2015) routed via vswprintf do not
recognise the C99 %zu length modifier. Cast the size_t arguments to
unsigned and use %u; packet sizes always fit in 32 bits on this i686
target. Per Gemini review on PR #66.
Mosch0512 added a commit that referenced this pull request May 22, 2026
Two coupled fixes that make MCD_ERROR diagnostics actually visible in
Release builds:

1. CmuConsoleDebug::GetInstance() now always returns a valid static
   instance. The previous behaviour (return nullptr unless
   CSK_LH_DEBUG_CONSOLE was defined) made every g_ConsoleDebug->Write
   call site a null-pointer member-function call. It "worked" only
   because the Write body was empty without CONSOLE_DEBUG; any code
   added there would have crashed in Release.

2. Write() now routes MCD_ERROR to g_ErrorReport.Write (file-based,
   always compiled) before the existing CONSOLE_DEBUG-gated path.
   MCD_SEND / MCD_RECEIVE / MCD_NORMAL stay debug-only so production
   logs don't get flooded.

Net effect: any MCD_ERROR call site -- including the safe_cast size
mismatch diagnostics added in PR #66 -- now lands in MuError.log even
in Release builds.
Mosch0512 added a commit that referenced this pull request May 22, 2026
Replace the three hand-typed static_asserts with a generated header
sourced from OpenMU's authoritative packet schema. Same protection
against the PR sven-n#402 class of bug, but maintained automatically.

Pieces:
- third_party/OpenMU: git submodule pinned to 1a56f6dd5.
- tools/gen_wire_sizes.py: parses ServerToClientPackets.xml, joins
  against a hand-maintained packet-name mapping, emits
  wire_sizes.generated.h with one static_assert(sizeof(X) <= N) per
  mapped packet. The mapping is currently the three structs PR #66
  covered by hand -- adding more is a one-line append.
- src/CMakeLists.txt: add_custom_command runs the generator when the
  XML changes; output goes to ${CMAKE_BINARY_DIR}/generated/, which
  is added to the include path. Main depends on GenWireSizes.
- src/source/Network/Server/WSclient.h: drop the three hand-written
  asserts; #include the generated header at end-of-file.

We assert `<=` rather than `==` because some client structs
intentionally don't decode all trailing fields (e.g. PRECEIVE_CREATE_
CHARACTER doesn't decode PreviewData, so its sizeof is 19 vs the wire
length 42). The actual safety property is "client struct must fit in
wire packet so safe_cast succeeds," which `<=` expresses correctly.

To add coverage for another packet: append a tuple to PACKET_MAPPING
in tools/gen_wire_sizes.py and rebuild.
@Mosch0512
Copy link
Copy Markdown
Owner Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces automated wire-size guards for network packets by integrating OpenMU's packet definitions as a submodule and utilizing a Python script to generate static assertions. It also enhances the logging system to ensure that critical errors are always recorded in the error log and provides detailed feedback when packet size mismatches occur during safe casting. A security issue was identified in the logging implementation where the use of a fixed-size buffer with vswprintf could lead to a buffer overflow.

Comment on lines +240 to +245
wchar_t szErrorBuffer[256] = L"";
va_list pArgsForFile;
va_start(pArgsForFile, pStr);
vswprintf(szErrorBuffer, pStr, pArgsForFile);
va_end(pArgsForFile);
g_ErrorReport.Write(L"[MCD_ERROR] %ls\r\n", szErrorBuffer);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The implementation of MCD_ERROR logging uses a non-standard 3-argument version of vswprintf into a fixed-size buffer szErrorBuffer[256]. This poses a high risk of buffer overflow if the formatted error message (which may include long packet type names or received sizes) exceeds 255 characters. Since the project aims for cross-platform compatibility (as seen in CMakeLists.txt), it is recommended to use the standard 4-argument vswprintf or _vsnwprintf to enforce the buffer limit and prevent memory corruption.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 58a7d78. Switched the new MCD_ERROR path to the C99 4-arg vswprintf(buf, n, fmt, args) (same convention CErrorReport::Write already uses), and added a %.64hs field-width cap on the type-name format in LogSafeCastSizeMismatch so even a pathological typeid(T).name() expansion can't bloat the format output. Two layers: bounded buffer + bounded field width.

Left the pre-existing 3-arg vswprintf in the legacy CONSOLE_DEBUG console path untouched -- out of scope for this PR; worth a separate sweep PR if you want.

Mosch0512 added a commit that referenced this pull request May 23, 2026
Per Gemini review on PR #66 (high-priority security finding).

The 3-arg vswprintf(buf, fmt, args) is a Microsoft extension with no
buffer-size parameter -- a sufficiently long format expansion can
overflow the fixed 256-wchar buffer. Switch the new MCD_ERROR path in
CmuConsoleDebug::Write to the C99 4-arg form vswprintf(buf, n, fmt,
args), matching the convention already used by CErrorReport::Write.

Also cap the type-name field in LogSafeCastSizeMismatch with %.64hs so
even a pathological typeid(T).name() expansion cannot bloat the format
output. Defense in depth at the format-string level on top of the
buffer-size fix.

The pre-existing 3-arg vswprintf in the legacy CONSOLE_DEBUG-gated
console output is left alone (out of scope for this PR).
Mosch0512 and others added 10 commits May 23, 2026 23:31
Adds two defenses for the bug class behind PR sven-n#402 / fixed by sven-n#404:

1. static_assert(sizeof == N) on PRECEIVE_CREATE_CHARACTER (19),
   PRECEIVE_JOIN_MAP_SERVER_EXTENDED (92), and PRECEIVE_REVIVAL_EXTENDED
   (36). If the #pragma pack(push, 1) ever gets dropped or a field is
   added/reordered, the build now breaks at the offending struct with a
   pointer to check the pragma. Wire sizes match OpenMU's XML schema
   (CharacterInformationExtended, RespawnAfterDeathExtended).

2. Named overload safe_cast<T>(span, "TYPE_NAME") that logs an MCD_ERROR
   with packet type + received vs expected size on failure. Used at the
   join-map receive site. The dispatcher also logs the user-visible
   symptom ("loading screen will appear frozen") so a future drift turns
   a silent freeze into a one-line console diagnostic.
Older Windows CRTs (msvcrt.dll pre-VS2015) routed via vswprintf do not
recognise the C99 %zu length modifier. Cast the size_t arguments to
unsigned and use %u; packet sizes always fit in 32 bits on this i686
target. Per Gemini review on PR #66.
Merge the two safe_cast overloads into a single template with an
optional packet_type. Now every safe_cast<T>(span) call site -- not
just the ones updated to pass a name -- emits an MCD_ERROR on size
mismatch. When no name is supplied, typeid(T).name() (mangled but
readable) is used as a fallback.

Keeps g_ConsoleDebug rather than stderr (stderr is invisible in the
GUI client). Addresses follow-up review feedback.
Two coupled fixes that make MCD_ERROR diagnostics actually visible in
Release builds:

1. CmuConsoleDebug::GetInstance() now always returns a valid static
   instance. The previous behaviour (return nullptr unless
   CSK_LH_DEBUG_CONSOLE was defined) made every g_ConsoleDebug->Write
   call site a null-pointer member-function call. It "worked" only
   because the Write body was empty without CONSOLE_DEBUG; any code
   added there would have crashed in Release.

2. Write() now routes MCD_ERROR to g_ErrorReport.Write (file-based,
   always compiled) before the existing CONSOLE_DEBUG-gated path.
   MCD_SEND / MCD_RECEIVE / MCD_NORMAL stay debug-only so production
   logs don't get flooded.

Net effect: any MCD_ERROR call site -- including the safe_cast size
mismatch diagnostics added in PR #66 -- now lands in MuError.log even
in Release builds.
Replace the three hand-typed static_asserts with a generated header
sourced from OpenMU's authoritative packet schema. Same protection
against the PR sven-n#402 class of bug, but maintained automatically.

Pieces:
- third_party/OpenMU: git submodule pinned to 1a56f6dd5.
- tools/gen_wire_sizes.py: parses ServerToClientPackets.xml, joins
  against a hand-maintained packet-name mapping, emits
  wire_sizes.generated.h with one static_assert(sizeof(X) <= N) per
  mapped packet. The mapping is currently the three structs PR #66
  covered by hand -- adding more is a one-line append.
- src/CMakeLists.txt: add_custom_command runs the generator when the
  XML changes; output goes to ${CMAKE_BINARY_DIR}/generated/, which
  is added to the include path. Main depends on GenWireSizes.
- src/source/Network/Server/WSclient.h: drop the three hand-written
  asserts; #include the generated header at end-of-file.

We assert `<=` rather than `==` because some client structs
intentionally don't decode all trailing fields (e.g. PRECEIVE_CREATE_
CHARACTER doesn't decode PreviewData, so its sizeof is 19 vs the wire
length 42). The actual safety property is "client struct must fit in
wire packet so safe_cast succeeds," which `<=` expresses correctly.

To add coverage for another packet: append a tuple to PACKET_MAPPING
in tools/gen_wire_sizes.py and rebuild.
The codegen step in src/CMakeLists.txt depends on the OpenMU packet XML,
which lives in the third_party/OpenMU submodule added by the previous
commit. CI was running actions/checkout@v4 without `submodules: recursive`
so the submodule directory was empty and the build broke at the codegen
step (cannot find ServerToClientPackets.xml).
The previous codegen commit recorded the submodule pointer at master
HEAD at the time of `git submodule add` (d6689f04), not the commit I
actually checked out and built against locally (1a56f6dd5). Local
builds worked because the working tree was at the right SHA; CI failed
because it clones the recorded pointer, which pointed at a tree where
ServerToClientPackets.xml is at a different path/state.

Pinning to 1a56f6dd5 -- the OpenMU master HEAD on 2026-05-21 -- which
matches the wire sizes the static_asserts encode (CharacterInformation
Extended = 92, RespawnAfterDeathExtended = 36, CharacterCreation
Successful = 42).
Per Gemini review on PR #66 (high-priority security finding).

The 3-arg vswprintf(buf, fmt, args) is a Microsoft extension with no
buffer-size parameter -- a sufficiently long format expansion can
overflow the fixed 256-wchar buffer. Switch the new MCD_ERROR path in
CmuConsoleDebug::Write to the C99 4-arg form vswprintf(buf, n, fmt,
args), matching the convention already used by CErrorReport::Write.

Also cap the type-name field in LogSafeCastSizeMismatch with %.64hs so
even a pathological typeid(T).name() expansion cannot bloat the format
output. Defense in depth at the format-string level on top of the
buffer-size fix.

The pre-existing 3-arg vswprintf in the legacy CONSOLE_DEBUG-gated
console output is left alone (out of scope for this PR).
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
The wire-size codegen needs OpenMU's authoritative
ServerToClientPackets.xml. The previous setup vendored the entire
OpenMU repository as a git submodule (third_party/OpenMU) -- hundreds
of MB of C# server, tooling, and tests just to read one XML file.

The .NET ClientLibrary already pulls MUnique.OpenMU.Network.Packets
via NuGet (see MUnique.Client.Library.csproj). That package ships all
four packet XMLs (ClientToServer, ServerToClient, ChatServer,
ConnectServer) under contentFiles/. Sourcing from there means a
single versioned dependency (0.9.9) instead of vendoring the entire
OpenMU repo, and removes the "MuMain's wire definitions are coupled
to a specific OpenMU server commit" framing in favour of a normal
package upgrade flow.

Changes:

- src/CMakeLists.txt: hoist DOTNET_EXECUTABLE detection,
  mu_native_path(), and MU_NUGET_CACHE_DIR above the wire-size block
  so codegen can resolve the package path. The wire-size block then
  builds OPENMU_PACKETS_XML against the cache and, if missing,
  triggers a one-shot `dotnet restore` of MUnique.Client.Library.csproj
  at configure time. dotnet restore is fast on cache-hit so
  re-configuring stays cheap. WSLENV=NUGET_PACKAGES/w is set so the
  cache override propagates across the WSL->Windows interop boundary
  when DOTNET_EXECUTABLE is dotnet.exe.

- src/CMakeLists.txt: the Native-AOT cross-OS guard now clears a
  local _dotnet_for_aot variable instead of DOTNET_EXECUTABLE, so a
  Linux dotnet targeting Windows can still restore the NuGet package
  for codegen even though it cannot publish the AOT DLL.

- tools/gen_wire_sizes.py: take --xml directly (and an optional
  --source-label for the generated header's provenance comment)
  instead of --openmu-root + an internal layout assumption. Cleaner
  decoupling from any specific source layout.

- .gitmodules / third_party/OpenMU: removed. The other submodules
  (imgui, SDL, SDL_mixer) are unaffected -- they auto-init via
  mu_ensure_submodule() and don't need `submodules: recursive`.

- .github/workflows/mingw-build*.yml: drop `submodules: recursive`
  (only sven-n#408 added it, only for the OpenMU submodule). Add
  actions/setup-dotnet@v4 step so the wire-size codegen's
  restore-on-demand has a dotnet to invoke.

Manually verified locally: configure restores the package to
.nuget/munique.openmu.network.packets/0.9.9/, wire_sizes.generated.h
emits the same three static_asserts (lengths 42, 96, 36), and the
mingw-i686 build links cleanly with the editor-enabled target.
@Mosch0512 Mosch0512 force-pushed the wire-struct-guards branch from ff30020 to 6adc729 Compare May 23, 2026 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant