[FIX] Replace .expect() with .unwrap_or() in Ucs2String case conversion — fixes panic on CEA-708 surrogate input by NexionisJake · Pull Request #2239 · CCExtractor/ccextractor

NexionisJake · 2026-03-29T04:37:36Z

In raising this pull request, I confirm the following :

Reason for this PR:

This PR adds new functionality.
This PR fixes a bug that I have personally experienced or that a real user has reported and for which a sample exists.
This PR is porting code from C to Rust.

Sanity check:

I have read and understood the contributors guide.
I have checked that another pull request for this purpose does not exist.
If the PR adds new functionality, I've added it to the changelog. If it's just a bug fix, I have NOT added it to the
changelog.
I am NOT adding new C code unless it's to fix an existing, reproducible bug.

Repro instructions:

Process any CEA-708 stream whose subtitle text contains UCS-2 surrogate code units (0xD800–0xDFFF) with a case-conversion
path enabled (e.g. --sentencecap). CCExtractor panics immediately:

thread 'main' panicked at 'Invalid u32 character', src/rust/lib_ccxr/src/util/encoding.rs:245

A minimal Rust reproducer:
use lib_ccxr::util::encoding::Ucs2String;

fn main() {
    let s = Ucs2String::from_vec(vec![0xD800]); // lone high surrogate
    let _ = s.to_lowercase(); // panics before this fix
}

Root Cause

Ucs2String::to_lowercase() and to_uppercase() in src/rust/lib_ccxr/src/util/encoding.rs called:

char::from_u32(c as u32).expect("Invalid u32 character")

UCS-2 surrogate code units (0xD800–0xDFFF) are valid u16 values but are not valid Unicode scalar values. char::from_u32()
returns None for them, and .expect() panics unconditionally. Any real-world CEA-708 broadcast stream carrying surrogate
pairs crashed CCExtractor with no recovery path.
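
The behaviour described above can be checked with the standard library alone. The following is a minimal sketch, not code from the PR; the helper names `is_surrogate` and `code_unit_to_char` are hypothetical and exist only for illustration.

```rust
/// Returns true when `unit` lies in the surrogate range 0xD800..=0xDFFF.
/// (Hypothetical helper, not part of lib_ccxr.)
fn is_surrogate(unit: u16) -> bool {
    (0xD800..=0xDFFF).contains(&unit)
}

/// Converts one UCS-2 code unit to a `char`.
/// `char::from_u32` yields `None` exactly when the unit is a surrogate,
/// because every other 16-bit value is a valid Unicode scalar value.
/// (Hypothetical helper, not part of lib_ccxr.)
fn code_unit_to_char(unit: u16) -> Option<char> {
    char::from_u32(u32::from(unit))
}
```

Calling `.expect(...)` on the result of `code_unit_to_char(0xD800)` would panic, which is the failure mode reported here; any other `u16` value converts successfully.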

Fix

Replaced both .expect("Invalid u32 character") calls with .unwrap_or(UNAVAILABLE_CHAR.into()), consistent with how
ucs2_to_char() already handles this in the same file (line 1027):

Before:
cc_to_lowercase(char::from_u32(c as u32).expect("Invalid u32 character")) as u16
cc_to_uppercase(char::from_u32(c as u32).expect("Invalid u32 character")) as u16

After:
cc_to_lowercase(char::from_u32(c as u32).unwrap_or(UNAVAILABLE_CHAR.into())) as u16
cc_to_uppercase(char::from_u32(c as u32).unwrap_or(UNAVAILABLE_CHAR.into())) as u16

UNAVAILABLE_CHAR is b'?', which is already the established fallback for unrepresentable code points throughout this file.
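
To illustrate the fallback pattern in isolation, here is a minimal sketch under stated assumptions. `UNAVAILABLE_CHAR` is redeclared locally with the value given above, `lowercase_unit` and `lowercase_units` are hypothetical names, and `char::to_ascii_lowercase` stands in for `cc_to_lowercase`, whose exact behaviour is not shown in this PR.

```rust
/// Local stand-in for the constant described above (b'?').
const UNAVAILABLE_CHAR: u8 = b'?';

/// Lowercases one UCS-2 code unit, substituting '?' when the unit is not a
/// valid Unicode scalar value (i.e. a surrogate).
/// ASSUMPTION: `to_ascii_lowercase` replaces the project's `cc_to_lowercase`.
fn lowercase_unit(unit: u16) -> u16 {
    let c = char::from_u32(u32::from(unit)).unwrap_or(UNAVAILABLE_CHAR.into());
    c.to_ascii_lowercase() as u16
}

/// Applies `lowercase_unit` to every code unit in a slice.
fn lowercase_units(units: &[u16]) -> Vec<u16> {
    units.iter().copied().map(lowercase_unit).collect()
}
```

In this sketch a lone surrogate becomes 0x003F ('?') instead of causing a panic. Note that each half of a surrogate pair is replaced independently, so a pair comes out as two '?' units rather than being preserved.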

Testing

cargo build clean, zero new warnings
cargo clippy clean
cargo test encoding passes
Verified zero remaining .expect() calls on char::from_u32 in the codebase

Fixes #2232

Ucs2String::to_lowercase() and to_uppercase() called char::from_u32(c as u32).expect("Invalid u32 character") for every code unit. UCS-2 surrogate values (0xD800–0xDFFF) are valid u16 but are not valid Unicode scalar values: char::from_u32() returns None for them and .expect() panics unconditionally. Any real-world CEA-708 broadcast stream carrying surrogate pairs triggered this crash during case conversion with no recovery path.

Replace .expect("Invalid u32 character") with .unwrap_or(UNAVAILABLE_CHAR.into()) at both call sites, substituting '?' for unrepresentable code points, consistent with how ucs2_to_char() already handles this in the same file.

Fixes CCExtractor#2232

ccextractor-bot · 2026-03-29T05:09:59Z

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit d56a6be...:

Report Name	Tests Passed
Broken	9/13
CEA-708	1/14
DVB	3/7
DVD	3/3
DVR-MS	2/2
General	20/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	72/86
Teletext	20/21
WTV	13/13
XDS	28/34

Your PR breaks these cases:

ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
ccextractor --autoprogram --out=srt --latin1 b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
ccextractor --out=spupng c83f765c66...
ccextractor --program-number 1 c83f765c66...
ccextractor --datastreamtype 2 c83f765c66...
ccextractor --no-autotimeref c83f765c66...
ccextractor --utf8 c83f765c66...
ccextractor --no-fontcolor c83f765c66...
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --xds --ucla c813e713a0...
ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 83b03036a2...
ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 88cd42b89a...
ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

ccextractor --out=srt --latin1 --autoprogram 73d9313d64..., Last passed: Test 8738
ccextractor --out=ttxt --latin1 001dd8cdf7..., Last passed: Test 8738
ccextractor --out=srt --latin1 4d4e938ef6..., Last passed: Test 8738
ccextractor --service 1 --out=txt --no-bom --no-rollup ea83ff7bcb..., Last passed: Test 8738
ccextractor --service 1 --out=txt f17524b53f..., Last passed: Test 8738
ccextractor --service 1 --out=txt 80848c45f8..., Last passed: Test 8738
ccextractor --service 1 --out=txt --no-bom --no-rollup b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1[EUC-KR] --out=txt --no-rollup b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1 --out=srt da904de35d..., Last passed: Test 8738
ccextractor --service 1 --out=sami da904de35d..., Last passed: Test 8738
ccextractor --service 1 --out=ttxt da904de35d..., Last passed: Test 8926
ccextractor --service 1[EUC-KR] b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1[EUC-KR] --no-rollup b5d6aad89f..., Last passed: Test 8738
ccextractor --service all da904de35d..., Last passed: Test 8738
ccextractor --service all[EUC-KR] b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1,2[UTF-8],3[EUC-KR],54 --out=txt da904de35d..., Last passed: Test 8738
ccextractor --autoprogram --out=srt --latin1 d41b53b504..., Last passed: Test 8738
ccextractor --stdout --quiet --no-fontcolor 79a51f3500..., Last passed: Test 8738
ccextractor --stdout --quiet --no-fontcolor 767b546f96..., Last passed: Test 8738
ccextractor --service 1 c83f765c66..., Last passed: Test 8738
ccextractor --myth c83f765c66..., Last passed: Test 8738
ccextractor --in=raw fb79021542..., Last passed: Test 8738
ccextractor --mp4vidtrack 5df914ce77..., Last passed: Test 8738
ccextractor --xmltv=3 --out=null 96efd279cf..., Last passed: Test 8738
ccextractor --datapid 2310 --autoprogram --out=srt --latin1 e639e54550..., Last passed: Test 8738

Congratulations: Merging this PR would fix the following tests:

ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

ccextractor-bot · 2026-03-29T05:38:26Z

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit d56a6be...:

Report Name	Tests Passed
Broken	9/13
CEA-708	1/14
DVB	4/7
DVD	3/3
DVR-MS	2/2
General	22/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	81/86
Teletext	20/21
WTV	13/13
XDS	31/34

Your PR breaks these cases:

ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
ccextractor --autoprogram --out=srt --latin1 b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

ccextractor --out=srt --latin1 --autoprogram 73d9313d64..., Last passed: Test 8611
ccextractor --out=ttxt --latin1 001dd8cdf7..., Last passed: Test 8611
ccextractor --out=srt --latin1 4d4e938ef6..., Last passed: Test 8611
ccextractor --service 1 --out=txt --no-bom --no-rollup ea83ff7bcb..., Last passed: Test 8611
ccextractor --service 1 --out=txt f17524b53f..., Last passed: Test 8611
ccextractor --service 1 --out=txt 80848c45f8..., Last passed: Test 8611
ccextractor --service 1 --out=txt --no-bom --no-rollup b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1[EUC-KR] --out=txt --no-rollup b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1 --out=srt da904de35d..., Last passed: Test 8611
ccextractor --service 1 --out=sami da904de35d..., Last passed: Test 8611
ccextractor --service 1 --out=ttxt da904de35d..., Last passed: Test 8943
ccextractor --service 1[EUC-KR] b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1[EUC-KR] --no-rollup b5d6aad89f..., Last passed: Test 8611
ccextractor --service all da904de35d..., Last passed: Test 8611
ccextractor --service all[EUC-KR] b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1,2[UTF-8],3[EUC-KR],54 --out=txt da904de35d..., Last passed: Test 8611
ccextractor --autoprogram --out=srt --latin1 d41b53b504..., Last passed: Test 8611
ccextractor --stdout --quiet --no-fontcolor 79a51f3500..., Last passed: Test 8611
ccextractor --stdout --quiet --no-fontcolor 767b546f96..., Last passed: Test 8611
ccextractor --service 1 c83f765c66..., Last passed: Test 8611
ccextractor --myth c83f765c66..., Last passed: Test 8611
ccextractor --in=raw fb79021542..., Last passed: Test 8611
ccextractor --mp4vidtrack 5df914ce77..., Last passed: Test 8611
ccextractor --xmltv=3 --out=null 96efd279cf..., Last passed: Test 8611
ccextractor --datapid 2310 --autoprogram --out=srt --latin1 e639e54550..., Last passed: Test 8611

Congratulations: Merging this PR would fix the following tests:

ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
ccextractor --out=spupng c83f765c66..., Last passed: Never
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

NexionisJake wants to merge 1 commit into CCExtractor:master from NexionisJake:fix/ucs2-surrogate-panic-case-conversion
