🐛 Fix / UTF-8 characters in SAMLResponse rejected by xmerl_scan by docJerem · Pull Request #22 · docJerem/ex_saml

docJerem · 2026-05-06T07:59:36Z

Summary

Fixes a production crash on POST /api/sp/consume/... where xmerl_scan rejected any non-ASCII character (e.g. é in an AttributeValue) with {:wfc_Legal_Character, {:bad_character, 233}}.
Root cause: to_charlist/1 (and String.to_charlist/1) was applied to the raw UTF-8 XML binary before :xmerl_scan.string/2. to_charlist decodes UTF-8 into Unicode codepoints, but xmerl_scan expects a list of raw UTF-8 bytes and performs its own decoding. It therefore read codepoint 233 (0xE9) as a UTF-8 byte: 0xE9 is the lead byte of a 3-byte sequence, the next element was not a continuation byte (0x80 to 0xBF), so the decoder bailed out with bad_character.
Switches the four xmerl call sites to :binary.bin_to_list/1: decode_response/2 (DEFLATE + plain-base64 branches in Core.Binding), the encrypted-assertion scan in Core.SP, and Metadata.parse/1. Only the Core.Binding paths were observed in prod; the other two were latent and would have failed on any IdP returning accents in encrypted assertions or in metadata.
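A quick way to see the mismatch (a sketch; the sample string is made up): the same binary yields different integer lists depending on the conversion, and only the byte list is what xmerl_scan's own UTF-8 decoder expects.

```elixir
# "é" is U+00E9, encoded in UTF-8 as the two bytes 0xC3 0xA9.
xml = "<AttributeValue>Jérémy</AttributeValue>"

# to_charlist/1 decodes UTF-8, so each "é" becomes the single integer 233 (0xE9).
codepoints = to_charlist(xml)

# :binary.bin_to_list/1 returns one integer per byte, so each "é" stays 195, 169.
bytes = :binary.bin_to_list(xml)

IO.inspect(Enum.count(codepoints, &(&1 == 233)), label: "233 in to_charlist")
IO.inspect(Enum.count(bytes, &(&1 == 233)), label: "233 in bin_to_list")
IO.inspect({length(codepoints), length(bytes), byte_size(xml)}, label: "lengths")
```

The byte list is one element longer per 2-byte character, which is exactly the information xmerl_scan needs to reassemble the codepoints itself.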

Test plan

mix test — 199 tests, 0 failures (51 new regression tests).
Reproduced original prod crash locally with a SAMLResponse containing é — fails on main, passes on this branch.
Stash-and-rerun verified the new tests fail without the fix (real regression coverage, not just decoration).
Coverage spans 2-byte UTF-8 (é/è/ñ/ß/ü/ø/ł/č), 3-byte (€/Ω/Cyrillic/CJK) and 4-byte (emoji 🎉) sequences, in element text and attribute values, on both DEFLATE and non-DEFLATE decode paths, plus <OrganizationName> in metadata.
Smoke check on a staging IdP returning a French given_name once deployed.
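The regression matrix above can be sketched as a standalone script, outside the real test suite. Utf8RegressionSketch, its helpers, the toy <Person givenName="..."> element and the sample strings are all hypothetical, not ex_saml code; record access follows the usual Record.extract pattern for xmerl. It checks that each 2/3/4-byte sample survives a round trip through element text and an attribute value when xmerl receives bytes, and that the pre-fix to_charlist path exits instead.

```elixir
defmodule Utf8RegressionSketch do
  # Hypothetical standalone sketch of the regression matrix; not ex_saml code.
  require Record

  # Record definitions from xmerl, used to read element content and attributes.
  Record.defrecord(:xmlElement, Record.extract(:xmlElement, from_lib: "xmerl/include/xmerl.hrl"))
  Record.defrecord(:xmlText, Record.extract(:xmlText, from_lib: "xmerl/include/xmerl.hrl"))
  Record.defrecord(:xmlAttribute, Record.extract(:xmlAttribute, from_lib: "xmerl/include/xmerl.hrl"))

  # Illustrative samples, one per UTF-8 sequence length.
  @samples [
    two_byte: "Jérémy Muñoz Straße",
    three_byte: "Prix 20€ Ωmega Дима 日本",
    four_byte: "Party 🎉!"
  ]

  def samples, do: @samples

  # Post-fix conversion: raw bytes, decoded by xmerl itself.
  def parse_bytes(xml) do
    {doc, _rest} = :xmerl_scan.string(:binary.bin_to_list(xml), quiet: true)
    doc
  end

  # Pre-fix conversion: decoded codepoints; xmerl exits on non-ASCII input.
  def parse_codepoints(xml) do
    {doc, _rest} = :xmerl_scan.string(to_charlist(xml), quiet: true)
    {:ok, doc}
  catch
    :exit, reason -> {:error, reason}
  end

  # Put the value in element text and in an attribute, then read both back.
  def round_trip(value) do
    doc = parse_bytes(~s(<Person givenName="#{value}">#{value}</Person>))

    text =
      doc
      |> xmlElement(:content)
      |> Enum.map_join(fn node -> List.to_string(xmlText(node, :value)) end)

    [attr] = xmlElement(doc, :attributes)
    {text, List.to_string(xmlAttribute(attr, :value))}
  end
end

for {label, value} <- Utf8RegressionSketch.samples() do
  result = Utf8RegressionSketch.round_trip(value)
  old_path = Utf8RegressionSketch.parse_codepoints("<Person>#{value}</Person>")
  IO.inspect({label, result == {value, value}, match?({:error, _}, old_path)})
end
```

Keeping the old conversion available as a separate helper is what makes the "fails without the fix" check cheap: the same sample feeds both paths.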

decode_response/2 (and the encrypted-assertion + metadata parse paths) piped the XML binary through to_charlist/1 before xmerl_scan.string/2. to_charlist decodes UTF-8 into Unicode codepoints, but xmerl_scan expects raw UTF-8 bytes and does its own decoding — feeding it codepoints made it reject any non-ASCII char with {:wfc_Legal_Character, {:bad_character, _}}, crashing /api/sp/consume on assertions containing accents. Switches to :binary.bin_to_list/1 at the four call sites and adds regression coverage across 2/3/4-byte UTF-8 sequences in element text, attribute values, DEFLATE and non-DEFLATE response paths, plus metadata OrganizationName.
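For reference, a minimal sketch of the two decode paths with the fixed conversion. DecodeSketch.decode_xml/2 and its :deflate / :plain argument are hypothetical stand-ins for the Core.Binding branches, whose real signatures are not shown here. It assumes the DEFLATE branch carries raw DEFLATE data (as in the SAML HTTP-Redirect binding), hence :zlib.unzip/1 rather than :zlib.uncompress/1.

```elixir
defmodule DecodeSketch do
  # Hypothetical stand-in for the DEFLATE and plain-base64 branches of
  # decode_response/2; names and arguments are illustrative only.

  # DEFLATE branch: base64, then raw DEFLATE (no zlib header), then xmerl.
  def decode_xml(encoded, :deflate) do
    encoded
    |> Base.decode64!(ignore: :whitespace)
    |> :zlib.unzip()
    |> scan()
  end

  # Plain branch: base64, then xmerl.
  def decode_xml(encoded, :plain) do
    encoded
    |> Base.decode64!(ignore: :whitespace)
    |> scan()
  end

  # The fix: give xmerl_scan the raw UTF-8 bytes and let it decode them,
  # instead of the codepoints produced by to_charlist/1.
  defp scan(xml) when is_binary(xml) do
    {doc, _rest} = :xmerl_scan.string(:binary.bin_to_list(xml), quiet: true)
    doc
  end
end

xml = "<AttributeValue>Jérémy 🎉</AttributeValue>"

# Both encodings of the same document parse to an :xmlElement record.
from_plain = DecodeSketch.decode_xml(Base.encode64(xml), :plain)
from_deflate = DecodeSketch.decode_xml(xml |> :zlib.zip() |> Base.encode64(), :deflate)
IO.inspect({elem(from_plain, 1), elem(from_deflate, 1)}, label: "root elements")
```

Funnelling both branches through one private scan/1 keeps the byte-list conversion in a single place, so a future call site cannot reintroduce to_charlist by accident.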

docJerem mentioned this pull request on May 6, 2026 in ✨ Feature / Mix security.check_release task #15 (open).

docJerem merged commit 64e6e30 into main May 6, 2026
3 checks passed

docJerem merged 1 commit into main from fix/utf-8-characters.