# Preperation
- Powerpoint
- This Notebook
- Clear all output & reset
- Reset area threshold for squares with 80
- Remove 2022 from console.log
- Execute LevenshteinDistance
- Execute and clear dependences
- Start PPT
- Start Rise & F11 (Fullscreen)
- Switch to PPT

In [1]:
// https://gist.github.com/Davidblkx/e12ab0bb2aff7fd8072632b396538560

public static class LevenshteinDistance
{
    /// <summary>
    /// Calculate the difference between 2 strings using the Levenshtein distance algorithm
    /// </summary>
    /// <param name="source1">First string</param>
    /// <param name="source2">Second string</param>
    /// <returns></returns>
    public static int Calculate(string source1, string source2) //O(n*m)
    {
        var source1Length = source1.Length;
        var source2Length = source2.Length;

        var matrix = new int[source1Length + 1, source2Length + 1];

        // First calculation, if one entry is empty return full length
        if (source1Length == 0)
            return source2Length;

        if (source2Length == 0)
            return source1Length;

        // Initialization of matrix with row size source1Length and columns size source2Length
        for (var i = 0; i <= source1Length; matrix[i, 0] = i++){}
        for (var j = 0; j <= source2Length; matrix[0, j] = j++){}

        // Calculate rows and collumns distances
        for (var i = 1; i <= source1Length; i++)
        {
            for (var j = 1; j <= source2Length; j++)
            {
                var cost = (source2[j - 1] == source1[i - 1]) ? 0 : 1;

                matrix[i, j] = Math.Min(
                    Math.Min(matrix[i - 1, j] + 1, matrix[i, j - 1] + 1),
                    matrix[i - 1, j - 1] + cost);
            }
        }
        // return result
        return matrix[source1Length, source2Length];
    }
}

# Let's build a form scanner with notebooks and OpenCV

We'll build a form scanner with OpenCV using classic computer vision algorithms as well as a bit ML/Deep Learning
*Of course*, everything will be in C#. We'll also take a quick look into C# notebooks


## Topics
- OpenCV & C# Notebooks
- ML/Deep Learning vs. Klassische Algorithmen
- Form scanner with working ***Live-Code 🚀***
  - Scan document (Perspective Transform)
  - Segmentation of document
  - Text recognition

<img src="https://upload.wikimedia.org/wikipedia/commons/5/53/OpenCV_Logo_with_text.png" width="100"
     style="display: block; margin-left: auto; margin-right: auto" />

# OpenCV

- OpenSource
- De-facto "standard" library for Computer Vision
- C++ library with Wrappers for multiple languages

- Appache-2.0 License

<img src="https://socialify.git.ci/shimat/opencvsharp/image?description=1&forks=1&language=1&owner=1&pattern=Plus&stargazers=1&theme=Light" style="height: 280px;" />

https://github.com/shimat/opencvsharp - Apache-2.0 License

# C# Notebook

<img src="https://user-images.githubusercontent.com/2546640/94438730-833fed80-016d-11eb-94e6-da7b51abf58a.gif" />

# C# Notebooks
- Kernel also runs in Jupyter
- Jupyter Extensions for Presentation

In [2]:
Console.WriteLine("Hello .NET Day 2022");

Hello .NET Day 2022


## Dependencies
- Nuget and Usings just work as you would expect

In [31]:
#r "nuget: OpenCvSharp4.Windows"
#r "nuget: SharpCompress"
#r "nuget: Microsoft.DotNet.Interactive.ExtensionLab,*-*"

# Let's test it

In [4]:
using OpenCvSharp;
Mat src = new Mat("20220831_124447.jpg");
double aspectRatio = src.Width / (double)src.Height;
Cv2.Resize(src, src, new Size(2048 * aspectRatio, 2048), interpolation: InterpolationFlags.Area) 

In [5]:
// src // Crashes the kernel in jupyter (incorrect native memory access)
src.Size()

Width,Height
1536,2048


# Formatter

- Notebooks have "formatters"
- Last Statement without `;` or `.Display()`
- Output: HTML helper [`PocketView`](https://github.com/dotnet/interactive/blob/main/docs/pocketview.md) - simple syntax using `dynamic`

In [6]:
#pragma warning disable CS1701
using System;
using System.Threading.Tasks;
using Microsoft.DotNet.Interactive;
using Microsoft.DotNet.Interactive.Commands;
using Microsoft.DotNet.Interactive.Formatting;
using OpenCvSharp;

using static Microsoft.DotNet.Interactive.Formatting.PocketViewTags;

private static PocketView CreateImgTag(byte[] data, string id, int height, int width)
{
    var imageSource = $"data:image/png;base64, {Convert.ToBase64String(data)}";
    PocketView imgTag = img[id: id, src: imageSource, height: height, width: width]();
    return imgTag;
}

Formatter.Register<Mat>((openCvImage, writer) =>
{
    if(openCvImage.Height > 640 || openCvImage.Width > 640)
    {
        double aspectRatioWoverH = openCvImage.Width / (double)openCvImage.Height;

        double newWidth, newHeight;
        if(openCvImage.Height > openCvImage.Width) {
            newHeight = 640;
            newWidth = 640 * aspectRatioWoverH;
        }
        else {
            newHeight = 640 / aspectRatioWoverH;
            newWidth = 640;
        }

        Mat resized = new Mat();
        Cv2.Resize(openCvImage, resized, new Size(newWidth, newHeight));
        openCvImage = resized;
    }
    var id = Guid.NewGuid().ToString("N");
    var data = openCvImage.ImEncode(".png");
    var imgTag = CreateImgTag(data, id, openCvImage.Height, openCvImage.Width);
    writer.Write(imgTag);
}, HtmlFormatter.MimeType);

In [7]:
     src.Display();

# The Goal

```json
{
    "First name": "Alex", "Company": "Noser",
    "Interests": {
        ".NET": true, "Computer Vision": true, "Machine Learning": true,
        "Performance": true, "Testing": false
    },
    "Lunch Preferences": {
        "Vegetarian": true, "Meat": false, "Vegan": false
    }
}
```

# Possible Solutions

- End-To-End ML Model
  - *In theory*, Deep Learning promises to allow End-To-End solutions - not viable in this showcase though
- Combination of classical CV-Algos & Deep Learning
- Classical CV-Algos

# ML / Deep Learning

- State-of-the-art Computer Vision
- Idea
  - Raw-Data plus Result => "Meta algo trains an algo"
  - New samples can be trained later
- Usually CNNs, Convolutional Neuronal Networks:

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Typical_cnn.png/800px-Typical_cnn.png" />



# ML / Deep Learning

- Often requires lots and lots of data
  - Data Augmentation
  - Transfer Learning
  - GANs / Generate artifical training data
  - ...
- Training often resource intensive

# Classic CV-Algorithms
- Edge / Contourdetection
- *Primitives* such as Erode / Dilate
- Thresholding e.g. [Otsu](https://cw.fel.cvut.cz/b201/_media/courses/a6m33bio/otsu.pdf)
- Surf, Sift, HoG ...

# Classic CV-Algorithms

- Manually select and tune algorithms and their parameters
  - Only few samples necessary (Validation/Verification is still necessary!)
- Usually much less computational resource required

# Classic Algorithms vs. ML / Deep Learning
## Question to ask:
- Available Data: How much? In what quality? Effort to label?
- Environment: Controlled? "In-the-wild"?
- Explainability: Do decisions need to be justified?

- Lifecycle of Product?
  - New Data? Drift of data?
- Target Platform
  - Cloud vs. Edge?
  - Training vs. Inference?
...

# Classic algorithm vs ML / Deep Learning

  - **One** Picture

- **In-the-wild** but only one picture ;)

- C# Notebooks

- **45 minutes**

*=> Classical Algorithms / Combination

# Tasks

1. Dewarp picture
1. Text recognition
1. Checkboxes & Content
1. Association between labels & data

# 1. Dewarp picture

- "Classical Computer Vision Task"
- Idea: Find largest quadrangle

In [8]:
Mat grey = src.CvtColor(ColorConversionCodes.BGR2GRAY);
grey.Display();

In [9]:
Cv2.Blur(grey, grey, new Size(12,12));
grey.Display()

In [10]:
Mat thresholded = grey.Threshold(127,255, ThresholdTypes.Otsu);

Mat kernel = Cv2.GetStructuringElement(MorphShapes.Rect, new Size(40,40));

Cv2.Erode(thresholded, thresholded, kernel);
Cv2.Dilate(thresholded, thresholded, kernel);

thresholded.Display()

In [11]:
Point[][] contours = Cv2.FindContoursAsArray(thresholded, RetrievalModes.List, ContourApproximationModes.ApproxSimple);

Mat tmp = src.Clone();
Cv2.DrawContours(tmp, contours, -1, new Scalar(0,0,255), 3);
tmp.Display();

In [12]:
Point[]  documentRect = null;
int maxLength = 0;
for(int i = 0; i < contours.Length; i++)
{
    Point[] contour = contours[i];
    double length = Cv2.ArcLength(contour, true);
    Point[] approximatedPolygon = Cv2.ApproxPolyDP(contour, 0.15 * length, true);

    if(approximatedPolygon.Length == 4)
    {
        double actualLength = Cv2.ArcLength(approximatedPolygon, true);

        if(actualLength > maxLength)
        {
             documentRect = approximatedPolygon;
        }
    }
}

documentRect = documentRect.OrderBy(p => p.X).ToArray();
documentRect = documentRect.Take(2).OrderBy(p => p.Y).Concat(documentRect.Skip(2).OrderBy(p => p.Y)).ToArray();

Point2f topLeft = documentRect[0];
Point2f bottomLeft = documentRect[1];
Point2f topRight = documentRect[2];
Point2f bottomRight = documentRect[3];

(topLeft, bottomLeft, topRight, bottomRight).Display();
int targetWidth = (int)Math.Max(topRight.X - topLeft.X, bottomRight.X - bottomLeft.X);
int targetHeight = (int)Math.Max(bottomLeft.Y - topLeft.Y, bottomRight.Y - topRight.Y);

Point2f[] targetPoints = new[] {new Point2f(0,0), new Point2f(0, targetHeight), new Point2f(targetWidth, 0), new Point2f(targetWidth, targetHeight)};

Item1,Item2,Item3,Item4
"{ (x:303 y:287): X: 303, Y: 287 }","{ (x:55 y:1672): X: 55, Y: 1672 }","{ (x:1336 y:515): X: 1336, Y: 515 }","{ (x:992 y:1873): X: 992, Y: 1873 }"


In [13]:
Point2f[] targetPoints = new[] {new Point2f(0,0), new Point2f(0, targetHeight), new Point2f(targetWidth, 0), new Point2f(targetWidth, targetHeight)};

Mat transformation = Cv2.GetPerspectiveTransform(new[]{topLeft, bottomLeft, topRight, bottomRight}, targetPoints);

Mat corrected = src.Clone();
Cv2.WarpPerspective(src, corrected, transformation, new Size(targetWidth, targetHeight));

corrected.Display();

# 2. Text Recognition / OCR

- Lots of training material (and easily generated) -> Deep-Learning
- Open-Source OCR: *Tesseract* by Google
- Preprocessing: Thresholding (Tesseract perfoms Otsu internally)

In [14]:
Mat cropped = corrected.SubMat(0, 650, 0, corrected.Width);
Mat greyscale = cropped.CvtColor(ColorConversionCodes.BGR2GRAY);
Mat threshold = greyscale.Threshold(127,255, ThresholdTypes.Otsu);
threshold.Display();

In [15]:
List<(Rect rect, string text)> detectedText = new();

using(var tesseract = OpenCvSharp.Text.OCRTesseract.Create(Environment.CurrentDirectory, "eng"))
{
    tesseract.Run(threshold, out var outputText, out var componentRects, out var componentTexts, out var componentConfidences);
    
    Mat copy = cropped.Clone();
    foreach(var (rect, index) in componentRects.Select((r,i) => (r,i)))
    {
        string text = componentTexts[index];

        if(!string.IsNullOrWhiteSpace(text))
        {
            Cv2.Rectangle(copy, rect, color: new Scalar(0,0,255));
            Cv2.PutText(copy, text, new Point(rect.Left + 100, rect.Top - 10), HersheyFonts.HersheyPlain, fontScale: 2, color: new Scalar(255,0,0));
            detectedText.Add((rect, text));
        }
    }
 
    display(copy);
}

# 3. Recognize checkboxes

- Same idea as used for dewarping
- Find *Quadratic* contours i.e. squares

In [16]:
Point[][] contours = threshold.FindContoursAsArray(RetrievalModes.Tree, ContourApproximationModes.ApproxSimple);

List<Rect> checkBoxes = new();

for(int i = 0; i < contours.Length; i++)
{
    double length = Cv2.ArcLength(contours[i], true);
    Point[] approxPoly = Cv2.ApproxPolyDP(contours[i], 0.14 * length, true);
    Rect boundingRect = Cv2.BoundingRect(approxPoly);
    double area = boundingRect.Width * boundingRect.Height;

    if(approxPoly.Length == 4 && Math.Abs(1 - boundingRect.Width / (double)boundingRect.Height) < 0.1 && area > 120)
    {
        checkBoxes.Add(boundingRect);
    }
}

Mat copy = cropped.Clone();

foreach(Rect checkBox in checkBoxes)
{
    copy.Rectangle(checkBox, new Scalar(0,0,255), 2);
}

copy.Display();

In [17]:
using OpenCvSharp.Dnn;
CvDnn.NMSBoxes(checkBoxes, checkBoxes.Select(_ => 1f), 0.8f, 0.3f, out int[] indices);

copy = cropped.Clone();
checkBoxes = checkBoxes.OrderBy(c => c.Top).ToList();
foreach(int i in indices)
{
    copy.Rectangle(checkBoxes[i], new Scalar(0,0,255), 1);
}

copy.Display();

In [18]:
List<(Rect location, bool isChecked, Mat roi)> checkBoxesWithValue = new();

foreach(int i in indices)
{
    Rect rect = checkBoxes[i];
    rect.Inflate(-(int)(0.2 * rect.Width), -(int)(0.2 * rect.Height));
    Mat roi = threshold.SubMat(rect);
    bool isChecked = roi.CountNonZero() / (double)roi.Total() > 0.9 ? false : true;
    checkBoxesWithValue.Add((rect, isChecked, roi));
}

checkBoxesWithValue.Select(l => (l.location.Left, l.location.Top, l.isChecked, l.roi)).Display();

index,Item1,Item2,Item3,Item4
0,404,285,True,
1,406,286,True,
2,404,308,True,
3,404,331,True,
4,404,354,True,
5,404,377,False,
6,404,405,True,
7,406,429,False,


# 4. Association of data

Idea
- Label & value are on the same row => find table rows
- Fuzzy-search each label (Levenshtein-Distance)
- Search on the same line for the closest value (Text / Checkbox)

# 4.1 Line detection

In [19]:
Mat lines = threshold.Clone();
Cv2.BitwiseNot(lines, lines);
lines.Display()

In [20]:
Mat horizontalStructure = Cv2.GetStructuringElement(MorphShapes.Rect, new Size(lines.Width / 20, 1));

Cv2.Erode(lines, lines, horizontalStructure);
lines.Display();

In [21]:
Mat largeHorizontalStructure = Cv2.GetStructuringElement(MorphShapes.Rect, new Size(lines.Width, 1));

Cv2.Dilate(lines, lines, largeHorizontalStructure);
lines.Display();

In [22]:
LineSegmentPoint[] detectedLines = Cv2.HoughLinesP(lines, rho: 1, theta: Math.PI / 180 * 30, threshold: 10, minLineLength: lines.Width / 2, maxLineGap: 0);
Mat copy = cropped.Clone();

foreach(LineSegmentPoint line in detectedLines)
    copy.Line(line.P1, line.P2, new Scalar(0,0,255));

display(copy);

In [23]:
int[] lineSeperators = detectedLines.GroupBy(l => l.P1.Y / 20 * 20).Select(g => (int)g.Select(l => l.P1.Y).Average()).OrderBy(y => y).ToArray();
lineSeperators.Display();

int getLine(int y, int[] lineIndices)
{
    (bool match, int line) = lineIndices.Select((y_line, i) => (match: y_line > y, i)).FirstOrDefault(l => l.match);
    return match ? line : lineIndices.Length;
}

getLine(310, lineSeperators).Display();

index,value
0,223
1,248
2,275
3,395
4,469


# 4.2 Association of values

In [24]:
static Point Center(this Rect rect) => (rect.TopLeft + rect.BottomRight) * 0.5;

var textsWithLine = detectedText.Select(t => (line: getLine(t.rect.Center().Y, lineSeperators), t.rect, t.text)).ToList();
var textsByLine = textsWithLine.ToLookup(t => t.line, t => (t.rect, t.text));

var checkBoxesWithLine = checkBoxesWithValue.Select(t => (line: getLine(t.location.Top, lineSeperators), t.location, t.isChecked)).ToList();
var checkBoxesByLine = checkBoxesWithLine.ToLookup(t => t.line, t => (t.location, t.isChecked));


In [25]:
LevenshteinDistance.Calculate(".Net", "BNet").Display();
LevenshteinDistance.Calculate("Computer Vision", "DComputerVision").Display();

In [26]:
(int line, Rect rect, string text) getLine(string label) {
    return textsWithLine.MinBy(t => LevenshteinDistance.Calculate(label, t.text));
}

string getTextValue(string label)
{
    (int line, Rect rect, string actualLabel) = getLine(label);    
    return textsByLine[line].Where(x => x.text != actualLabel).MinBy(x => x.rect.Center().DistanceTo(rect.Center())).text;
}

bool getIsChecked(string label)
{
    (int line, Rect rect, string actualLabel) = getLine(label);
    return checkBoxesByLine[line].MinBy(x => x.location.TopLeft.DistanceTo(rect.TopLeft)).isChecked;
}

In [27]:
getTextValue("First name")

Alex

In [28]:
getIsChecked(".NET")

In [29]:
public record Interests(bool Net, bool ComputerVision, bool MachineLearning, bool Performance, bool Testing){}
public record LunchPreferences(bool Vegetarian, bool Meat, bool Vegan){}
public record ScanResult(string FirstName, string Company, Interests Interests, LunchPreferences Lunch){}

In [30]:
using System.Text.Json;

var result = new ScanResult(getTextValue("First name"), getTextValue("Company"),
               new Interests(getIsChecked(".NET"), getIsChecked("Computer Vision"), getIsChecked("Machine Learning"), getIsChecked("Performance"), getIsChecked("Testing")), 
               new LunchPreferences(getIsChecked("Vegetarian"), getIsChecked("Meat"), getIsChecked("Vegan")));

JsonSerializer.Serialize(result, new JsonSerializerOptions() { WriteIndented = true } ).Display();

{
  "FirstName": "Alex",
  "Company": "Moser",
  "Interests": {
    "Net": true,
    "ComputerVision": true,
    "MachineLearning": true,
    "Performance": true,
    "Testing": false
  },
  "Lunch": {
    "Vegetarian": true,
    "Meat": false,
    "Vegan": false
  }
}

# Conclusions 🚀

- Classical algorithms vs. Deep Learning = Manual fine tuning vs. lots of data
- Built an end-to-end scanner in about *one day*
- (C#) Notebooks should be in the toolkit of every dev - for explorative coding or presentations

- OpenCV(Sharp) & C# = ❤
- Combination of classic CV & Deep Learning = ❤
- Jupyter Notebooks & RISE/Reveal.js for a live code demo = ❤

### Questions?