Skip to content

Conversation

@aniketchawardol
Copy link

@aniketchawardol aniketchawardol commented Jan 24, 2026

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

When processing DVB subtitles from the options test file, three things were wrong:

OCR text extraction failed completely - the code tried to use an undefined variable called "region" which didn't exist in that function scope. This caused crashes or garbage data.
SPUPNG XML output was missing the opening and closing wrapper tags. The file would start with individual subtitle entries but had no and tags around them. This happened because the header writing function was immediately writing the footer and freeing all data structures before any subtitle content could be written.
DVB subtitle regions were being processed repeatedly. After extracting text from a region, the code never marked it as processed by clearing its dirty flag, so the same regions kept getting processed over and over causing duplicate subtitle entries.

What was different in results:
Expected output had OCR text like "explosion" and "Safety remains our number one" inside the subtitle entries. Actual output had no OCR text at all because the extraction crashed.
Expected SPUPNG file structure was:
Actual output was just empty or missing the wrapper tags entirely.

How the fix works:
For OCR: Instead of using the undefined "region" variable, the code now loops through the display list to find the first valid region and uses that for the background color reference needed by the OCR function. This gives OCR the correct data to extract text from bitmap subtitles.
For SPUPNG structure: Removed the code block that was writing the footer immediately after the header. Now the header function only writes the header tags, subtitle content gets written in the middle, and the footer gets written later during normal cleanup. This produces the correct XML structure.
For duplicate processing: Added a loop at the end of subtitle processing that goes through all regions and sets their dirty flag to 0, marking them as already processed so they won't be processed again.

Added safety for non-OCR builds: When CCExtractor is compiled without OCR support, the code now explicitly sets OCR text pointers to NULL to prevent crashes from trying to use uninitialized memory.
All changes are minimal fixes that restore correct behavior without changing the overall architecture.
test link: https://sampleplatform.ccextractor.org/test/7992#

Signed-off-by: Aniket <aniketchawardol@gmail.com>
Signed-off-by: Aniket <aniketchawardol@gmail.com>
Copilot AI review requested due to automatic review settings January 24, 2026 13:23
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes three issues with DVB subtitle processing and SPUPNG output format that were causing test failures on Linux:

Changes:

  • Fixed DVB subtitle OCR extraction that was using an undefined region variable, causing crashes or incorrect output
  • Fixed SPUPNG XML output structure by removing code that was prematurely writing the footer immediately after the header
  • Fixed duplicate DVB subtitle processing by clearing region dirty flags after processing

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/lib_ccx/dvb_subtitle_decoder.c Moved OCR variables to function scope, added logic to find first region for bgcolor, and added loop to clear dirty flags after processing
src/lib_ccx/ccx_encoders_spupng.c Removed code that was incorrectly checking for CCX_OF_RAW format and prematurely writing footer/freeing data
docs/CHANGES.TXT Added three changelog entries describing the fixes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

for (display = ctx->display_list; display; display = display->next)
{
region = get_region(ctx, display->region_id);
if (region)
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop clears dirty flags for all regions unconditionally. For minor efficiency improvement, consider only clearing the dirty flag if it's actually set. While setting dirty=0 on already non-dirty regions is harmless, checking first would avoid unnecessary memory writes.

Suggested change
if (region)
if (region && region->dirty)

Copilot uses AI. Check for mistakes.
Comment on lines +1718 to +1721
// Find first region for bgcolor
for (display = ctx->display_list; display && !first_region; display = display->next)
{
first_region = get_region(ctx, display->region_id);
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop finds the first region regardless of whether it's dirty. For consistency with the rest of the function which only processes dirty regions (see lines 1554-1559 and 1583-1591), consider finding the first dirty region instead. This can be done by adding a check for region->dirty in the loop condition.

Suggested change
// Find first region for bgcolor
for (display = ctx->display_list; display && !first_region; display = display->next)
{
first_region = get_region(ctx, display->region_id);
// Find first dirty region for bgcolor
for (display = ctx->display_list; display && !first_region; display = display->next)
{
region = get_region(ctx, display->region_id);
if (region && region->dirty)
first_region = region;

Copilot uses AI. Check for mistakes.
@aniketchawardol
Copy link
Author

I am aware that tester is hosted on gcp and github actions utilisation also needs resources. Also rolling subtitles are complex. The last resort in my opinion is to just not test generated captions to the exact milliseconds. I have watched the sample clip 30 times atleast and both versions are correct.

@aniketchawardol aniketchawardol deleted the fix/dvb-subtitle-ocr-spupng-v2 branch January 24, 2026 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant