# Index Card OCR Workflow
Tested in a Python 3.9.18 venv.

## Labeling tool for the base annotation
- https://pypi.org/project/labelImg/
- works very well; output in YOLO, XML, or CreateML (JSON)
- the workflow below parses the CreateML export
- All bounding boxes were drawn generously so that they include the printed field name, and that field name was assigned as the label. This allows the label to be removed from the OCR result later.
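
To illustrate what parsing the CreateML export involves, here is a minimal sketch. It assumes the structure that the `ImageAnnotationProcessor` class further down also relies on: a list with one entry per image, whose `annotations` carry a `label` and pixel `coordinates` given as box centre plus width and height. The sample JSON, the image size, and the function name `createml_to_boxes` are made up for illustration.

```python
import json

# Hypothetical CreateML export for a single 1000 x 500 px image (values are made up).
CREATEML_EXAMPLE = """
[{"image": "card.jpg",
  "annotations": [
    {"label": "Titel",
     "coordinates": {"x": 300, "y": 100, "width": 400, "height": 60}}
  ]}]
"""


def createml_to_boxes(createml_json, image_width, image_height):
    """Return (label, x_min, y_min, x_max, y_max) tuples, normalized to 0..1."""
    data = json.loads(createml_json)
    boxes = []
    for annotation in data[0]["annotations"]:
        c = annotation["coordinates"]
        # x/y are the box centre in pixels, so half the size is subtracted/added.
        x_min = (c["x"] - c["width"] / 2) / image_width
        x_max = (c["x"] + c["width"] / 2) / image_width
        y_min = (c["y"] - c["height"] / 2) / image_height
        y_max = (c["y"] + c["height"] / 2) / image_height
        boxes.append((annotation["label"], x_min, y_min, x_max, y_max))
    return boxes


boxes = createml_to_boxes(CREATEML_EXAMPLE, 1000, 500)
print(boxes)
```

Normalizing to the 0..1 range makes the label boxes directly comparable with the OCR boxes, which are also reported in normalized coordinates.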

In [None]:
!pip install labelImg

In [5]:
!labelImg

## Basic OCR workflow

### Installing the Apple Vision OCR wrapper for Python, the OpenCV Python wrapper, and fuzzy string comparison

In [None]:
!pip install torch numpy pandas pillow scikit-learn plotly pyobjc-framework-Vision
!pip install apple_ocr
!pip install opencv-python
!pip install rapidfuzz

### How it works
1. Load an image and align it to a template image from the same collection, the one that was also used for the bounding-box labeling.
2. Run OCR.
3. Build a list of all bounding boxes from the OCR with their values (contents).
4. Build a list of all bounding boxes from the labeling with their labels.
5. Compare every label box with every OCR box in turn. If at least 90% of an OCR box lies inside a label box (the threshold is adjustable), its value is recorded for that label box.
6. Remove all label terms from the result. To do this, each OCR result is split into single words, fuzzy-matched against the label, and then recombined. Quotation marks are currently replaced as well (with `?`), since they can cause problems in the JSON.
7. Build a JSON object with the file name as ID and the matched OCR boxes assigned to their labels.
8. Output.
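
Steps 5 and 7 can be sketched in isolation as follows. This is a simplified illustration, not the script itself: boxes are plain `(x_min, y_min, x_max, y_max)` tuples in normalized coordinates, and the function names, the sample boxes, and the file name are assumptions made for the example.

```python
import json


def containment_ratio(label_box, ocr_box):
    """Share of ocr_box's area that lies inside label_box (0.0 to 1.0)."""
    inter_w = max(0.0, min(label_box[2], ocr_box[2]) - max(label_box[0], ocr_box[0]))
    inter_h = max(0.0, min(label_box[3], ocr_box[3]) - max(label_box[1], ocr_box[1]))
    ocr_area = (ocr_box[2] - ocr_box[0]) * (ocr_box[3] - ocr_box[1])
    if ocr_area == 0:
        return 0.0
    return (inter_w * inter_h) / ocr_area


def assign_values(file_name, label_boxes, ocr_boxes, threshold=0.9):
    """Collect, per label, the text of every OCR box that is mostly inside it."""
    result = {"ID": file_name}
    for label, label_box in label_boxes.items():
        result[label] = [
            text for text, ocr_box in ocr_boxes
            if containment_ratio(label_box, ocr_box) >= threshold
        ]
    return result


# Made-up boxes: one label box and two OCR boxes.
label_boxes = {"Titel": (0.1, 0.2, 0.6, 0.4)}
ocr_boxes = [
    ("Stille kleine Straße", (0.15, 0.25, 0.5, 0.35)),  # entirely inside the label box
    ("Foxtrot", (0.55, 0.25, 0.75, 0.35)),              # only about a quarter inside
]

record = assign_values("card_0001.jpg", label_boxes, ocr_boxes)
print(json.dumps(record, ensure_ascii=False))
```

Dividing by the OCR box's own area (rather than by the union, as intersection-over-union would) means a small OCR box inside a generously drawn label box still scores 1.0, which fits the way the label boxes were drawn.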

#### Paths
In total, three paths are needed at the end of the script:
- `folder` – the path to the folder containing the images to be scanned
- `label_template_path` – the template with the label boxes (a JSON file)
- `align_template_path` – the image that all images are aligned to before OCR, ideally the image used for the label template.
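
Because a wrong path only surfaces once the loop is already running, it can help to check the three paths up front. The following is an optional sketch under that assumption; `list_images` and `check_paths` are hypothetical helpers that are not part of the script, and the demo at the end uses a temporary folder with empty placeholder files.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_images(folder):
    """Return the image files in folder, sorted, matching suffixes case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def check_paths(folder, label_template_path, align_template_path):
    """Raise FileNotFoundError early if one of the three paths does not exist."""
    if not Path(folder).is_dir():
        raise FileNotFoundError(f"Image folder not found: {folder}")
    for path in (label_template_path, align_template_path):
        if not Path(path).is_file():
            raise FileNotFoundError(f"File not found: {path}")


# Demo with a temporary folder holding empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.png", "a.JPG", "notes.txt"):
        (Path(tmp) / name).touch()
    found_names = [p.name for p in list_images(tmp)]
    print(found_names)
```

In the notebook, `check_paths(folder, label_template_path, align_template_path)` would be called once before the loop over the images.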

#### Preview
The script contains an `image.show()` call that opens a preview of the bounding boxes: the label boxes are drawn in red, the OCR results in blue. This makes it possible to check whether the desired elements lie inside the label boxes. The preview slows down the conversion and can be commented out.

### OCR demo
The output has one row per recognized text block, with one corner coordinate (`x`, `y`), a box length (`Length`), and a centroid (`Centroid x`, `Centroid y`). The opposite corner can be derived from these values.

In [5]:
from apple_ocr.ocr import OCR
from PIL import Image
import pandas as pd

image_path = "00001a_Spez_Komp_1_Spez.20.512.jpg"
image = Image.open(image_path)
ocr_instance = OCR(image=image)
dataframe = ocr_instance.recognize()
df = pd.DataFrame(dataframe)
print(df.head(10))

                                  Content    Length   Density         x  \
0            Komponist: Scholz, Siegfried  0.414634  0.024337  0.108749   
1                                  Titel:  0.052326  0.001589  0.107558   
2                   Signatur: Spez•20•512  0.300872  0.012213  0.636628   
3                    Stille kleine Straße  0.348837  0.014263  0.228198   
4                                 Foxtrot  0.125000  0.005058  0.229167   
5                 Orch • Walter Kubiczeck  0.380814  0.014693  0.508721   
6                   Ges • : Giso Weisbach  0.331395  0.012095  0.507267   
7                Textdichter: Helga Heine  0.332849  0.014851  0.106105   
8                                 Verlag:  0.075581  0.002448  0.107558   
9  Material: 1 Part •u•Stim• Bemerkungen:  0.514535  0.023956  0.109012   

          y  Centroid x  Centroid y  
0  0.763285    0.316066    0.792632  
1  0.714575    0.133721    0.729757  
2  0.783294    0.787064    0.803591  
3  0.601051    0.40261
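
How the second corner follows from these columns can be sketched as below, mirroring the conversion in `process_boxes` in the next cell. The sketch assumes that `y` is measured from the bottom edge of the image (hence the flip with `1 - y`) and that `Length` is the box width; row 0 of the output above is consistent with the latter, since `x + Length / 2` equals `Centroid x`. The function name `ocr_row_to_box` is made up for illustration.

```python
def ocr_row_to_box(x, y, length, centroid_y):
    """Convert one OCR row to (x_min, y_min, x_max, y_max) with a top-left origin."""
    # The centroid sits at half the box height above the given corner.
    height = 2 * abs(centroid_y - y)
    x_min = x
    x_max = x + length
    # Flip the y axis so that 0 is the top of the image.
    y_max = 1 - y
    y_min = y_max - height
    return x_min, y_min, x_max, y_max


# Row 0 of the demo output ("Komponist: Scholz, Siegfried").
box = ocr_row_to_box(x=0.108749, y=0.763285, length=0.414634, centroid_y=0.792632)
print([round(v, 6) for v in box])
# roughly [0.108749, 0.178021, 0.523383, 0.236715]
```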

In [None]:
from apple_ocr.ocr import OCR
from PIL import Image, ImageDraw
import pandas as pd
import json
import os
from rapidfuzz import fuzz
import cv2
import numpy as np

class LoadOCR:
    def __init__(self, image_path):
        self.image_path = image_path
        self.image = cv2.imread(self.image_path)
        if self.image is None:
            raise ValueError(f"Could not load image from {image_path}")

    def cv2_align(self, template_path):
        template = cv2.imread(template_path)
        if template is None:
            raise ValueError(f"Could not load template from {template_path}")
        
        scan = self.image
        
        template_gray = cv2.cvtColor(template, cv2.COLOR_BGR2GRAY)
        scan_gray = cv2.cvtColor(scan, cv2.COLOR_BGR2GRAY)
        
        def detect_text_features(img):
            sift = cv2.SIFT_create()
            keypoints, descriptors = sift.detectAndCompute(img, None)
            return keypoints, descriptors

        template_kp, template_desc = detect_text_features(template_gray)
        scan_kp, scan_desc = detect_text_features(scan_gray)
        
        if template_desc is None or scan_desc is None:
            raise ValueError("No features detected in one or both images")
        
        bf = cv2.BFMatcher()
        matches = bf.knnMatch(template_desc, scan_desc, k=2)
        
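        # Lowe's ratio test: keep a match only if it is clearly better than the runner-up.
        good_matches = []
        for pair in matches:
            # knnMatch can return fewer than two candidates for a descriptor.
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
                good_matches.append(pair[0])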
        
        if len(good_matches) < 4:
            raise ValueError("Not enough good matches found for alignment")
        
        template_pts = np.float32([template_kp[m.queryIdx].pt for m in good_matches])
        scan_pts = np.float32([scan_kp[m.trainIdx].pt for m in good_matches])
        
        H, mask = cv2.findHomography(scan_pts, template_pts, cv2.RANSAC, 5.0)
        
        if H is None:
            raise ValueError("Could not compute homography")

        aligned_scan = cv2.warpPerspective(scan, H, (template.shape[1], template.shape[0]))
        
        return aligned_scan

    def OCR_process(self, template_path):
        aligned_image = self.cv2_align(template_path)
        aligned_rgb = cv2.cvtColor(aligned_image, cv2.COLOR_BGR2RGB)
        aligned_pil = Image.fromarray(aligned_rgb)

        ocr_instance = OCR(image=aligned_pil)
        dataframe = ocr_instance.recognize()
        self.df = pd.DataFrame(dataframe)
        self.image = aligned_pil
        return self.df, self.image
    
class BoundingBox:
    """Axis-aligned box in normalized (0..1) coordinates, origin at the top left."""

    def __init__(self, x_min, y_min, x_max, y_max, value):
        self.x_min = x_min
        self.y_min = y_min
        self.x_max = x_max
        self.y_max = y_max
        self.value = value


class BoundingBoxCollection:
    def __init__(self):
        self.bboxes = []

    def add_bbox(self, x_min, y_min, x_max, y_max, value):
        self.bboxes.append(BoundingBox(x_min, y_min, x_max, y_max, value))

    def process_boxes(self, df):
        """Convert OCR rows (corner, length, centroid) into corner-to-corner boxes."""
        for _, row in df.iterrows():
            # The box height is twice the distance between the corner and the centroid.
            height = abs(row['Centroid y'] - row['y']) * 2
            x_min = row['x']
            x_max = x_min + row['Length']
            # Flip y: the OCR output counts y from the bottom, the label boxes from the top.
            y_max = 1 - row['y']
            y_min = y_max - height
            self.add_bbox(x_min, y_min, x_max, y_max, row['Content'])

    def draw_boxes(self, image, draw):
        """Draw all boxes and their text in blue onto the image."""
        image_width, image_height = image.size
        for bbox in self.bboxes:
            left = int(bbox.x_min * image_width)
            bottom = int(bbox.y_max * image_height)
            box_width = int((bbox.x_max - bbox.x_min) * image_width)
            box_height = int((bbox.y_max - bbox.y_min) * image_height)

            draw.rectangle([left, bottom - box_height, left + box_width, bottom],
                           outline="blue", width=2)
            draw.text((left, bottom - 10), bbox.value, fill="blue")

class ImageAnnotationProcessor:
    def __init__(self, json_path):
        self.json_path = json_path
        self.data = None
        self.image = None
        self.draw = None
        self.image_width = None 
        self.image_height = None
        self.labelbboxes = None

    def process_annotations(self, labelbboxes):
        self.labelbboxes = labelbboxes
        
        with open(self.json_path) as f:
            self.data = json.load(f)

        image_path = "./images/" + self.data[0]["image"]
        self.image = Image.open(image_path)
        self.image_width, self.image_height = self.image.size
        
        for annotation in self.data[0]["annotations"]:
            label = annotation["label"]
            coordinates = annotation["coordinates"]
            
            x_center = coordinates["x"] / self.image_width  
            y_center = coordinates["y"] / self.image_height
            width = coordinates["width"] / self.image_width
            height = coordinates["height"] / self.image_height

            x1 = x_center - width / 2
            y1 = y_center - height / 2
            x2 = x_center + width / 2
            y2 = y_center + height / 2
            
            self.labelbboxes.add_bbox(x1, y1, x2, y2, label)

    def draw_annotations(self, image):
        self.draw = ImageDraw.Draw(image)
        image_width, image_height = image.size
        
        for bbox in self.labelbboxes.bboxes:
            x1_scaled = bbox.x_min * image_width
            y1_scaled = bbox.y_min * image_height
            x2_scaled = bbox.x_max * image_width
            y2_scaled = bbox.y_max * image_height
            
            self.draw.rectangle(
                [x1_scaled, y1_scaled, x2_scaled, y2_scaled], 
                outline="red", 
                width=3
            )
            self.draw.text(
                (x1_scaled, y1_scaled - 10), 
                bbox.value, 
                fill="red"
            )
        
        # Return the image that was drawn on, not the label template image.
        return image
    
class BBoxCompare:
    def __init__(self, x_min, y_min, x_max, y_max, value):
        self.x_min = x_min
        self.y_min = y_min
        self.x_max = x_max
        self.y_max = y_max
        self.value = value

    def area(self):
        """Calculate the area of the bounding box."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    def intersection(self, other):
        """Return the share of `other`'s area that lies inside this box (0.0 to 1.0)."""
        x_min_inter = max(self.x_min, other.x_min)
        y_min_inter = max(self.y_min, other.y_min)
        x_max_inter = min(self.x_max, other.x_max)
        y_max_inter = min(self.y_max, other.y_max)

        # Width and height are clamped to 0 when the boxes do not overlap.
        inter_width = max(0, x_max_inter - x_min_inter)
        inter_height = max(0, y_max_inter - y_min_inter)
        intersection_area = inter_width * inter_height

        other_area = other.area()
        if other_area == 0:
            return 0.0
        return intersection_area / other_area
    
    def OCRStrClean(self, label):
        """Remove words that fuzzy-match `label` from this box's OCR text.

        Returns None if the text is essentially just the label or nothing is left.
        """
        if fuzz.ratio(label, self.value) >= 95:
            return None
        kept = [word for word in self.value.split(" ")
                if fuzz.partial_ratio(label, word) < 85]
        cleaned = " ".join(kept)
        if cleaned == "":
            return None
        # Quotation marks are replaced because they can cause problems in the output.
        return cleaned.replace("\"", "?").replace("'", "?")

#folder = "/Users/admin/Downloads/DE-MUS-905113_Inventarkarten_Muehlhausen_2Lfg_Schub02_master/test"
#label_template_path = './images/card1.json'
#align_template_path = "/Volumes/QSTICK/Sicherung/Uni/SODa/Ideen/OCR/Images/DE-MUS-905113_Inventarkarten_Muehlhausen_2Lfg_Schub02_master/_test/template.jpg"

folder = "images"
label_template_path = 'label_Eisenach.json'
align_template_path = '00001a_Spez_Komp_1_Spez.20.512.jpg'

for file_name in sorted(os.listdir(folder)):
    if file_name.lower().endswith(('.png', '.jpg', '.jpeg')):
        ocr_loader = LoadOCR(os.path.join(folder, file_name))
        df, image = ocr_loader.OCR_process(align_template_path)
        #print(df)

        ocrbboxes = BoundingBoxCollection()
        ocrbboxes.process_boxes(df)

        labelbboxes = BoundingBoxCollection()
        processor = ImageAnnotationProcessor(label_template_path)
        processor.process_annotations(labelbboxes)

        # Preview: OCR boxes in blue, label boxes in red (comment out to speed things up).
        draw = ImageDraw.Draw(image)
        ocrbboxes.draw_boxes(image, draw)
        #image.save(file_name + '_ocrresults.jpg')
        annotated_image = processor.draw_annotations(image)
        #annotated_image.save('annotated.jpg')
        image.show()

        results = {"ID": file_name}

        for label_bbox in labelbboxes.bboxes:
            bbox1 = BBoxCompare(label_bbox.x_min, label_bbox.y_min,
                                label_bbox.x_max, label_bbox.y_max, label_bbox.value)
            results[bbox1.value] = []

            # make sure that at least for "Signatur" ocrbbox can be off quite a bit
            if bbox1.value == "Signatur":
                containment_ratio_val = 0.5
            else:
                containment_ratio_val = 0.9

            for ocr_bbox in ocrbboxes.bboxes:
                bbox2 = BBoxCompare(ocr_bbox.x_min, ocr_bbox.y_min,
                                    ocr_bbox.x_max, ocr_bbox.y_max, ocr_bbox.value)

                containment_ratio = bbox1.intersection(bbox2)
                if containment_ratio >= containment_ratio_val:
                    # Strip the label text from the OCR string of the matched box.
                    cleaned_ocrstr = bbox2.OCRStrClean(bbox1.value)
                    if cleaned_ocrstr is not None:
                        results[bbox1.value].append(cleaned_ocrstr)

        # One JSON object per card, followed by a comma as separator.
        print(json.dumps(results, ensure_ascii=False))
        print(',')