Locale-dependent serialization of integer types produces invalid XML #2

@cgri

Description

Affected version: XSD 4.2.0 (C++/Tree mapping)
Affected files: xsd/cxx/tree/serialization/int.hxx
(and likely the analogous headers for long, short,
byte, unsigned-int, unsigned-long, integer, etc.)

Summary

The serialization operators for built-in integer types format the value
through a std::basic_ostringstream without calling
imbue(std::locale::classic()). The stream therefore uses the global C++
locale, and when that locale groups digits (e.g. de_DE, where 42000 is
formatted as "42.000"), the generated XML contains group-separator
characters in numeric fields. The document is still well-formed, but
"42.000" is not a valid xs:int, xs:long, xs:integer, etc. value: the
XML Schema specification defines their lexical form as an optional sign
followed by decimal digits, with no grouping separators, so schema
validation of the output fails.
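The underlying behaviour can be reproduced without XSD. The sketch below is a minimal standalone illustration, assuming a hand-written numpunct facet (dot_grouping, a hypothetical name) as a stand-in for de_DE so that no named locale has to be installed. The helper names are also hypothetical; they only contrast a fresh stream with one imbued with the classic locale.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that groups digits like de_DE: '.' every three digits.
// Built in code so the sketch does not depend on installed named locales.
struct dot_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// A locale identical to "C" except for the grouping facet above.
// The locale takes ownership of the facet.
inline std::locale grouping_locale() {
    return std::locale(std::locale::classic(), new dot_grouping);
}

// Formats through a fresh stream, which copies the global C++ locale
// when it is constructed (the pattern described above).
inline std::string format_with_global_locale(long long value) {
    std::ostringstream os;
    os << value;
    return os.str();
}

// Same, but pinned to the classic locale, so the output is never grouped.
inline std::string format_with_classic_locale(long long value) {
    std::ostringstream os;
    os.imbue(std::locale::classic());
    os << value;
    return os.str();
}

// Usage:
//   std::locale old = std::locale::global(grouping_locale());
//   format_with_global_locale(42000);   // "42.000"
//   format_with_classic_locale(42000);  // "42000"
//   std::locale::global(old);           // restore the previous global locale
```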

Reproduction

Minimal schema:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="value" type="xs:int"/>
</xs:schema>
```

Minimal program:

```cpp
#include <iostream>
#include <locale>
#include <sstream>
#include "schema.hxx"

int main() {
    std::locale::global(std::locale("de_DE.UTF-8"));

    value_t v(42000);
    xml_schema::namespace_infomap map;

    std::ostringstream os;
    os.imbue(std::locale::classic()); // does not help
    value_(os, v, map);

    std::cout << os.str() << std::endl;
}
```

Expected content of the <value> element:
42000

Actual content:
42.000

The imbue() call on the user-supplied stream has no effect because the
serialization functions internally construct their own ostringstream,
which copies the global C++ locale (std::locale::global) at construction
time.
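The mechanism can be sketched in isolation. Below, serialize_value is a hypothetical stand-in for a library serializer that, like the integer operator<< overloads described above, formats through its own internal stream; it is not XSD's actual code. The dot_grouping facet is again a hypothetical substitute for de_DE.

```cpp
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical serializer: formats the value through its own stream and
// only then writes the resulting text to the caller's stream.
inline void serialize_value(std::ostream& out, int value) {
    std::ostringstream text;  // copies the *global* C++ locale, not out's
    text << value;
    // Strings are written as-is, so out's locale never sees the number.
    out << "<value>" << text.str() << "</value>";
}

// Hypothetical facet mimicking de_DE digit grouping ("42.000").
struct dot_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Usage:
//   std::locale old = std::locale::global(
//       std::locale(std::locale::classic(), new dot_grouping));
//   std::ostringstream out;
//   out.imbue(std::locale::classic());  // caller's imbue: no effect inside
//   serialize_value(out, 42000);        // out.str() == "<value>42.000</value>"
//   std::locale::global(old);
```

Only the locale of the stream that actually performs the integer formatting matters, which is why the fix has to live inside the library.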

Root cause

In xsd/cxx/tree/serialization/int.hxx, the insertion operators
construct a basic_ostringstream without imbuing the classic locale
before writing the value. Compare with xsd/cxx/tree/serialization/element.hxx,
where the insert() helper does call imbue(std::locale::classic())
correctly. The fix in element.hxx appears to have been intentional;
the integer-type headers seem to have been overlooked.

Suggested fix

Add os.imbue(std::locale::classic()) immediately after constructing
the basic_ostringstream in each of the affected operator<< overloads,
analogous to the existing fix in element.hxx. The same fix should be
applied consistently across all numeric-type serialization headers
(int.hxx, long.hxx, short.hxx, byte.hxx, the unsigned variants, and
integer.hxx / non-negative-integer.hxx etc.).
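One way to keep the fix consistent across every width is to route all integer formatting through a single helper. The template below is a hypothetical sketch (to_xsd_integer_text is not an XSD name): the essential change is still just the imbue() call, and the template only shows that one function can cover all integer types and both char and wchar_t output, mirroring basic_ostringstream<C>.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical helper, not XSD's actual code: formats any integer type in
// the XML Schema lexical form (optional sign, plain decimal digits) using
// the classic locale, independent of the global C++ locale.
// C is the output character type (char or wchar_t).
template <typename C, typename T>
std::basic_string<C> to_xsd_integer_text(T value) {
    static_assert(std::is_integral<T>::value, "integer types only");
    std::basic_ostringstream<C> os;
    os.imbue(std::locale::classic());  // the one-line fix proposed above
    // Unary plus promotes char-sized types (e.g. signed char) to int so
    // they are written as numbers rather than as characters.
    os << +value;
    return os.str();
}
```

For example, to_xsd_integer_text<char>(42000) yields "42000" even while a grouping locale is installed globally. Centralizing the rule in one place would make it harder for a single header to be missed again.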

Workaround context

This is particularly painful for users who ship XSD-generated code as
part of a library or DLL, where modifying the global C++ locale is not
acceptable (it would affect the host application's behaviour). Per-thread
locale tricks (uselocale on POSIX, _configthreadlocale on Windows) change
the C locale, whereas std::basic_ostringstream formats with the C++
std::locale it copies from the global locale at construction, so they
cannot be relied on to fix the output. A library-side fix is therefore
the only robust solution.
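The split between the C locale and the C++ locale can be illustrated with a short sketch. It uses the standard std::setlocale as a portable stand-in for uselocale and _configthreadlocale (all three act on the C locale); the helper names and the dot_grouping facet are hypothetical.

```cpp
#include <clocale>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet mimicking de_DE digit grouping ("42.000").
struct dot_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Formats through a freshly constructed stream, as the integer
// serializers do; the stream's locale is copied from the global C++ locale.
inline std::string format_fresh(int value) {
    std::ostringstream os;
    os << value;
    return os.str();
}

// Resets only the C locale, then formats through a fresh stream.
// The global C++ locale is untouched, so any grouping it defines persists.
inline std::string format_after_resetting_c_locale(int value) {
    std::setlocale(LC_ALL, "C");
    return format_fresh(value);
}

// Usage:
//   std::locale old = std::locale::global(
//       std::locale(std::locale::classic(), new dot_grouping));
//   format_after_resetting_c_locale(42000);  // still "42.000"
//   std::locale::global(old);
```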

Environment

  • XSD: 4.2.0
  • Compiler: msvc 2022
  • OS: windows 11

Happy to provide a patch if helpful.
