[FIX] DVB subtitle OCR and SPUPNG output for options test on Linux #2060

aniketchawardol · 2026-01-24T13:23:28Z

In raising this pull request, I confirm the following (please check boxes):

I have read and understood the contributors guide.
I have checked that another pull request for this purpose does not exist.
I have considered, and confirmed that this submission will be valuable to others.
I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
I give this submission freely, and claim no ownership to its content.
I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

I have never used CCExtractor.
I have used CCExtractor just a couple of times.
I absolutely love CCExtractor, but have not contributed previously.
I am an active contributor to CCExtractor.

When processing DVB subtitles from the options test file, three things were wrong:

OCR text extraction failed completely - the code tried to use an undefined variable called "region" which didn't exist in that function scope. This caused crashes or garbage data.
SPUPNG XML output was missing the opening and closing wrapper tags. The file would start with individual subtitle entries but had no and tags around them. This happened because the header writing function was immediately writing the footer and freeing all data structures before any subtitle content could be written.
DVB subtitle regions were being processed repeatedly. After extracting text from a region, the code never marked it as processed by clearing its dirty flag, so the same regions kept getting processed over and over causing duplicate subtitle entries.

What was different in results:
Expected output had OCR text like "explosion" and "Safety remains our number one" inside the subtitle entries. Actual output had no OCR text at all because the extraction crashed.
Expected SPUPNG file structure was:
Actual output was just empty or missing the wrapper tags entirely.

How the fix works:
For OCR: Instead of using the undefined "region" variable, the code now loops through the display list to find the first valid region and uses that for the background color reference needed by the OCR function. This gives OCR the correct data to extract text from bitmap subtitles.
For SPUPNG structure: Removed the code block that was writing the footer immediately after the header. Now the header function only writes the header tags, subtitle content gets written in the middle, and the footer gets written later during normal cleanup. This produces the correct XML structure.
For duplicate processing: Added a loop at the end of subtitle processing that goes through all regions and sets their dirty flag to 0, marking them as already processed so they won't be processed again.

Added safety for non-OCR builds: When CCExtractor is compiled without OCR support, the code now explicitly sets OCR text pointers to NULL to prevent crashes from trying to use uninitialized memory.
All changes are minimal fixes that restore correct behavior without changing the overall architecture.
test link: https://sampleplatform.ccextractor.org/test/7992#

Signed-off-by: Aniket <aniketchawardol@gmail.com>

Copilot

Pull request overview

This PR fixes three issues with DVB subtitle processing and SPUPNG output format that were causing test failures on Linux:

Changes:

Fixed DVB subtitle OCR extraction that was using an undefined region variable, causing crashes or incorrect output
Fixed SPUPNG XML output structure by removing code that was prematurely writing the footer immediately after the header
Fixed duplicate DVB subtitle processing by clearing region dirty flags after processing

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
src/lib_ccx/dvb_subtitle_decoder.c	Moved OCR variables to function scope, added logic to find first region for bgcolor, and added loop to clear dirty flags after processing
src/lib_ccx/ccx_encoders_spupng.c	Removed code that was incorrectly checking for CCX_OF_RAW format and prematurely writing footer/freeing data
docs/CHANGES.TXT	Added three changelog entries describing the fixes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-24T13:28:34Z

src/lib_ccx/dvb_subtitle_decoder.c

+	for (display = ctx->display_list; display; display = display->next)
+	{
+		region = get_region(ctx, display->region_id);
+		if (region)


The loop clears dirty flags for all regions unconditionally. For minor efficiency improvement, consider only clearing the dirty flag if it's actually set. While setting dirty=0 on already non-dirty regions is harmless, checking first would avoid unnecessary memory writes.

Suggested change

if (region)

if (region && region->dirty)

Copilot · 2026-01-24T13:28:34Z

src/lib_ccx/dvb_subtitle_decoder.c

+	// Find first region for bgcolor
+	for (display = ctx->display_list; display && !first_region; display = display->next)
+	{
+		first_region = get_region(ctx, display->region_id);


The loop finds the first region regardless of whether it's dirty. For consistency with the rest of the function which only processes dirty regions (see lines 1554-1559 and 1583-1591), consider finding the first dirty region instead. This can be done by adding a check for region->dirty in the loop condition.

Suggested change

// Find first region for bgcolor

for (display = ctx->display_list; display && !first_region; display = display->next)

{

first_region = get_region(ctx, display->region_id);

// Find first dirty region for bgcolor

for (display = ctx->display_list; display && !first_region; display = display->next)

{

region = get_region(ctx, display->region_id);

if (region && region->dirty)

first_region = region;

aniketchawardol · 2026-01-24T21:16:44Z

I am aware that tester is hosted on gcp and github actions utilisation also needs resources. Also rolling subtitles are complex. The last resort in my opinion is to just not test generated captions to the exact milliseconds. I have watched the sample clip 30 times atleast and both versions are correct.

aniketchawardol added 2 commits January 24, 2026 18:40

Fix DVB subtitle OCR extraction and SPUPNG output

3b7e1b3

Signed-off-by: Aniket <aniketchawardol@gmail.com>

update changelog

d5689de

Signed-off-by: Aniket <aniketchawardol@gmail.com>

Copilot AI review requested due to automatic review settings January 24, 2026 13:23

Copilot started reviewing on behalf of aniketchawardol January 24, 2026 13:23 View session

Copilot AI reviewed Jan 24, 2026

View reviewed changes

aniketchawardol closed this Jan 24, 2026

aniketchawardol deleted the fix/dvb-subtitle-ocr-spupng-v2 branch January 24, 2026 21:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FIX] DVB subtitle OCR and SPUPNG output for options test on Linux #2060

[FIX] DVB subtitle OCR and SPUPNG output for options test on Linux #2060

Uh oh!

aniketchawardol commented Jan 24, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 24, 2026

Uh oh!

Copilot AI Jan 24, 2026

Uh oh!

aniketchawardol commented Jan 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[FIX] DVB subtitle OCR and SPUPNG output for options test on Linux #2060

[FIX] DVB subtitle OCR and SPUPNG output for options test on Linux #2060

Uh oh!

Conversation

aniketchawardol commented Jan 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

Uh oh!

aniketchawardol commented Jan 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aniketchawardol commented Jan 24, 2026 •

edited

Loading