This guide will show you how to run the Threshold Optimizer. You will first need to have a project that can run the Extraction Benchmark successfully.
See Anonymizing Zones from Documents.md for an example of how to configure the Extraction Benchmark.
- Add the following script to the project-level script. The script below is for KTA. If you are using KTT, KTM or RPA, use the event Document_AfterProcess instead of Document_AfterExtract: Document_AfterProcess runs after the locators have run and after the fields have been formatted and validated. This matters because we are looking at the valid value of each field. KTA is different because validation happens outside of KT.
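```vb
'#Language "WWB-COM"
Option Explicit

' Class script: Document
Private Sub Document_AfterExtract(ByVal pXDoc As CASCADELib.CscXDocument)
   Dim F As Long, Field As CscXDocField, TruthDoc As New CscXDocument, Truth As CscXDocField
   ' Load the saved copy of the document from disk to read its truth values.
   TruthDoc.Load(pXDoc.FileName)
   ' Append mode: every extraction run adds rows to the same results file.
   Open "c:\temp\parascript_alpha.txt" For Append As #1
   For F=0 To pXDoc.Fields.Count-1
      Set Field=pXDoc.Fields(F)
      ' Only compare fields that were found on a page and also exist in the truth document.
      If Field.PageIndex>-1 And TruthDoc.Fields.Exists(Field.Name) Then
         Set Truth=TruthDoc.Fields.ItemByName(Field.Name)
         ' Skip fields that have no truth value.
         If Truth.Text<>"" Then
            ' Tab-separated row: file name, empty column, field name, truth text, extracted text, ...
            Print #1, FileName_WithoutPath(pXDoc.FileName) & vbTab & vbTab & Field.Name & vbTab & Truth.Text & vbTab & Field.Text;
            Print #1, vbTab;
            ' ... confidence as a percentage, and the edit distance between extracted and truth text.
            Print #1, Format(Field.Confidence,"0.00%") & vbTab & Format(String_LevenshteinDistance(Field.Text,Truth.Text,IgnoreCase:=True))
         End If
      End If
   Next
   Close #1
End Sub
```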
- Add these helper functions to the script as well: String_LevenshteinDistance, Min, Max and FileName_WithoutPath.
- Replace the filename in the script with a name that makes sense for your configuration. I have parascript_alpha.txt because I am testing Parascript's Alphabetic OCR profile.
- Select all of your documents (CTRL-A) in the Test Window.
- Run "Extract Documents" (F6) to test all of your documents.
- Open the text file in Visual Studio Code. While the documents are being extracted, you can watch the results file update live.
- When Extraction has finished, copy the data from the output file (CTRL-A, CTRL-C).
- Duplicate the Worksheet in the Excel Document. Rename it. Paste data into A7 (CTRL-V).
- Add your data to the Summary page by just changing the reference of the cell in column A to point to the first cell in your new Worksheet.
- Go to Visual Studio Code, delete the contents of the file (CTRL-A, Delete) and then save it (CTRL-S), so that the next run starts with an empty results file.
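
The helper functions named in the steps above are not reproduced in this guide. If you do not already have them in your project, the following is a minimal sketch of what they might look like. The signatures are assumptions inferred from how the script calls them, and the bodies are illustrative rather than the original implementations. In particular, it is assumed here that FileName_WithoutPath keeps the file extension and that IgnoreCase defaults to False.

```vb
' Hypothetical helper sketches; the original functions may differ.

' Returns the smaller of two numbers.
Public Function Min(ByVal a As Long, ByVal b As Long) As Long
   If a < b Then Min = a Else Min = b
End Function

' Returns the larger of two numbers.
Public Function Max(ByVal a As Long, ByVal b As Long) As Long
   If a > b Then Max = a Else Max = b
End Function

' Returns everything after the last backslash, e.g. "c:\temp\a.xdc" gives "a.xdc".
' If there is no backslash, InStrRev returns 0 and the whole string is returned.
Public Function FileName_WithoutPath(ByVal fileName As String) As String
   FileName_WithoutPath = Mid(fileName, InStrRev(fileName, "\") + 1)
End Function

' Classic dynamic-programming Levenshtein distance: the minimum number of
' single-character insertions, deletions and substitutions that turn a into b.
Public Function String_LevenshteinDistance(ByVal a As String, ByVal b As String, Optional ByVal IgnoreCase As Boolean = False) As Long
   Dim i As Long, j As Long, cost As Long
   Dim d() As Long
   If IgnoreCase Then
      a = LCase(a)
      b = LCase(b)
   End If
   ' d(i, j) holds the distance between the first i characters of a and the first j of b.
   ReDim d(Len(a), Len(b))
   For i = 0 To Len(a)
      d(i, 0) = i
   Next
   For j = 0 To Len(b)
      d(0, j) = j
   Next
   For i = 1 To Len(a)
      For j = 1 To Len(b)
         If Mid(a, i, 1) = Mid(b, j, 1) Then cost = 0 Else cost = 1
         ' Cheapest of: deletion, insertion, substitution (or match).
         d(i, j) = Min(Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost)
      Next
   Next
   String_LevenshteinDistance = d(Len(a), Len(b))
End Function
```

With this sketch, String_LevenshteinDistance("kitten", "sitting") returns 3. A distance of 0 in the last column of the results file means the extracted text matched the truth value, ignoring case.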