
@samiuc
Contributor

@samiuc samiuc commented Aug 9, 2025

Visualization Fixes

  • Updated side-by-side canvas to compare ground truth (GT) vs. predictions, with per-word overlays.
  • For mismatched true positives, the overlay shows the label as: pred_text
  • Included a stitched legend and summary statistics (TP/FP/FN counts, Precision, Recall, and F1 score).
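
As a rough illustration of the summary statistics in the legend, the sketch below derives Precision, Recall, and F1 from TP/FP/FN counts. The function name `detection_summary` is hypothetical and is not taken from this PR; returning 0.0 for an empty denominator is also an assumption.

```python
def detection_summary(tp: int, fp: int, fn: int) -> dict:
    """Compute Precision, Recall, and F1 from raw detection counts."""
    # Guard against empty denominators (assumed convention: report 0.0).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(detection_summary(tp=8, fp=2, fn=2))
```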

Recognition Metrics

  • Added recognition metrics alongside the existing detection metrics:
    • Word accuracy (case-sensitive and case-insensitive).
    • Character accuracy (case-sensitive and case-insensitive), using union-based edit distance.
  • Edit distance is calculated using standard Levenshtein:
    • String trimming or character normalization is applied.
    • Case-insensitive metrics are computed using uppercase versions of the strings.
  • Aggregation is union-based by default (configurable).
  • Recognition mismatches:
    • Explicitly flagged as is_true_positive=False in metadata.
    • Shown as incorrect in the visualization.
  • Ignore zone:
    • Added an optional IoU-based (HWR) filter; it is currently unused but can be enabled.
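
To make the per-pair comparison concrete, here is a minimal sketch of standard Levenshtein distance and of the case-sensitive versus case-insensitive comparison via uppercasing, as described above. The names `levenshtein` and `recognition_scores` and the returned dictionary keys are hypothetical; the PR's actual code may be organized differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def recognition_scores(gt: str, pred: str) -> dict:
    """Compare one GT/prediction pair in both case modes."""
    gt_u, pred_u = gt.upper(), pred.upper()  # case-insensitive view
    return {
        "word_match_cs": gt == pred,
        "word_match_ci": gt_u == pred_u,
        "edit_distance_cs": levenshtein(gt, pred),
        "edit_distance_ci": levenshtein(gt_u, pred_u),
    }


if __name__ == "__main__":
    print(recognition_scores("Hello", "hello"))
```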

Word Merging and Weighting Fixes

  • Added word_weight to correctly assign credit for merged GT words in recognition metrics.
  • Merged words now retain both text and orig values:
    • GT orig uses the ground truth text.

Ignore-Zone Filtering

  • Default filter: Removes words with less than 10% axis overlap relative to word size.
  • HWR filter: Removes GT or predicted zones with:
    • IoU ≥ 0.3, or
    • ≥ 95% overlap on both axes (near-contained).
  • Filter type is selectable via ignore_zone_filter_type ("default" or "hwr").
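
The HWR rule above can be sketched as follows. This is only an illustration under stated assumptions: the `Box` type, the function names, and the choice to measure axis overlap relative to the word's own width and height are hypothetical and not confirmed by this PR. Only the thresholds (IoU ≥ 0.3, ≥ 95% on both axes) come from the description.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def overlap_1d(a_min: float, a_max: float, b_min: float, b_max: float) -> float:
    """Length of the intersection of two 1-D intervals (0 if disjoint)."""
    return max(0.0, min(a_max, b_max) - max(a_min, b_min))


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = (overlap_1d(a.left, a.right, b.left, b.right)
             * overlap_1d(a.top, a.bottom, b.top, b.bottom))
    area_a = (a.right - a.left) * (a.bottom - a.top)
    area_b = (b.right - b.left) * (b.bottom - b.top)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def axis_overlap(word: Box, zone: Box) -> tuple:
    """Fraction of the word's width and height covered by the zone (assumed definition)."""
    width = word.right - word.left
    height = word.bottom - word.top
    x = overlap_1d(word.left, word.right, zone.left, zone.right) / width if width > 0 else 0.0
    y = overlap_1d(word.top, word.bottom, zone.top, zone.bottom) / height if height > 0 else 0.0
    return x, y


def hwr_ignores(word: Box, zone: Box,
                iou_threshold: float = 0.3, containment: float = 0.95) -> bool:
    """True if the word should be dropped: high IoU, or near-contained on both axes."""
    x, y = axis_overlap(word, zone)
    return iou(word, zone) >= iou_threshold or (x >= containment and y >= containment)
```

The second condition matters for small words inside a large ignore zone: their IoU is tiny because the zone's area dominates the union, yet they are fully covered.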

Aggregation and Reporting

  • Detection metrics (Precision, Recall, F1) are calculated at the dataset level.
  • Recognition aggregation mode is configurable:
    • Default is "union", with "mean" also available.
  • Per-document evaluation metadata includes:
    • TP pairs with their edit distances.
    • FPs and FNs.
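
The difference between the two aggregation modes can be illustrated with a small sketch. It assumes that "union" pools edit distances and character counts across all documents before dividing, while "mean" averages per-document accuracies; that reading, the function name `aggregate_char_accuracy`, and the `(edit_distance, char_count)` input shape are assumptions, not taken from the PR's code.

```python
def aggregate_char_accuracy(docs, mode: str = "union") -> float:
    """Aggregate character accuracy over documents.

    docs: iterable of (total_edit_distance, total_char_count) per document.
    """
    docs = list(docs)
    if mode == "union":
        # Pool everything first, so long documents weigh more.
        dist = sum(d for d, _ in docs)
        chars = sum(n for _, n in docs)
        return 1.0 - dist / chars if chars else 0.0
    if mode == "mean":
        # Score each document, then average; every document weighs the same.
        per_doc = [1.0 - d / n for d, n in docs if n]
        return sum(per_doc) / len(per_doc) if per_doc else 0.0
    raise ValueError(f"unknown aggregation mode: {mode!r}")


if __name__ == "__main__":
    docs = [(0, 10), (50, 100)]  # a short perfect doc and a long noisy one
    print(aggregate_char_accuracy(docs, "union"))
    print(aggregate_char_accuracy(docs, "mean"))
```

With a short, perfectly recognized document and a long, noisy one, the pooled score is pulled toward the long document, whereas the mean treats both equally.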

Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com>
@github-actions
Contributor

github-actions bot commented Aug 9, 2025

DCO Check Failed

Hi @samiuc, your pull request has failed the Developer Certificate of Origin (DCO) check.

This repository supports remediation commits, so you can fix this without rewriting history — but you must follow the required message format.


🛠 Quick Fix: Add a remediation commit

Run this command:

git commit --allow-empty -s -m "DCO Remediation Commit for samiullahchattha <Sami.Ullah1@ibm.com>

I, samiullahchattha <Sami.Ullah1@ibm.com>, hereby add my Signed-off-by to this commit: 794d99fc12d4fe0dcc6debc77ee05fcbd07730e4
I, samiullahchattha <Sami.Ullah1@ibm.com>, hereby add my Signed-off-by to this commit: 20b89544994c0c12c34721b32eb38e230b690f36"
git push

🔧 Advanced: Sign off each commit directly

For the latest commit:

git commit --amend --signoff
git push --force-with-lease

For multiple commits:

git rebase --signoff origin/main
git push --force-with-lease

More info: DCO check report

@mergify

mergify bot commented Aug 9, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
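
For anyone who wants to check a title locally before pushing, the rule's pattern can be tried with Python's `re` module. This is a sketch only: the helper name `is_conventional_title` is hypothetical, and it assumes the `~=` operator behaves like a regular-expression search anchored by the pattern's own `^`.

```python
import re

# Pattern copied verbatim from the merge protection rule above.
TITLE_RE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix."""
    return TITLE_RE.search(title) is not None


if __name__ == "__main__":
    print(is_conventional_title("fix: ocr visualization and add ocr recognition metrics"))
    print(is_conventional_title("Update README"))
```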

Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com>
@samiuc samiuc requested a review from divekarsc August 9, 2025 00:16
@samiuc samiuc requested a review from cau-git August 18, 2025 20:06
samiullahchattha and others added 5 commits August 20, 2025 12:44
Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com>
Signed-off-by: samiullahchattha <Sami.Ullah1@ibm.com>
Signed-off-by: samiuc <sami.ullah.chat@gmail.com>
@samiuc samiuc requested review from cau-git and removed request for cau-git September 11, 2025 20:00
@samiuc samiuc changed the title from "fix: ocr visualization" to "fix: ocr visualization and add ocr recognition metrics" Sep 11, 2025
Contributor

@divekarsc divekarsc left a comment


LGTM!

Contributor

@PeterStaar-IBM PeterStaar-IBM left a comment


lgtm!

@samiuc samiuc merged commit d63a439 into main Sep 16, 2025
10 checks passed
@samiuc samiuc deleted the sami/fix-visualization-bug branch September 16, 2025 05:03
5 participants