
SyncfusionExamples/ocr-examples-csharp


Optical Character Recognition (OCR) Made Easy with Syncfusion OCR processor library

Optical Character Recognition (OCR) technology plays a vital role in transforming printed or handwritten text into editable and searchable content. The Syncfusion OCR processor library supports performing OCR on scanned PDF documents and images with the help of Google's Tesseract Optical Character Recognition engine. This repository contains examples that demonstrate OCR on scanned PDF documents and images using the following options:

  • OCR for an entire scanned paper document
  • OCR for a region of the scanned PDF document
  • OCR an image and convert it to a PDF document
  • OCR on rotated page of PDF document
  • Get OCRed text and bounds from scanned PDF document
  • Perform OCR with Unicode characters

| Sample name | Description |
| --- | --- |
| OCR on entire PDF | Convert an entire scanned PDF document into a searchable PDF document. |
| OCR for region of PDF | Convert a region of a scanned PDF document into searchable text. |
| OCR on image and convert to PDF | Convert a scanned image into a searchable and selectable PDF document. |
| OCR on rotated PDF | Convert a rotated scanned PDF document into a searchable PDF document. |
| Get OCRed text and bounds from scanned PDF | Retrieve OCRed text and its bounds from a scanned PDF document. |
| OCR on unicode characters | Perform OCR on a scanned document that contains Unicode characters. |

OCR for an entire scanned PDF document

With the library, you can convert an entire scanned PDF document into a searchable PDF, so the recognized text can be searched, selected, and copied.

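```csharp
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
//Open the input PDF; the stream is disposed when the block ends.
using (FileStream inputPDFstream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Load an existing PDF document.
    PdfLoadedDocument document = new PdfLoadedDocument(inputPDFstream);
    //Set OCR language.
    processor.Settings.Language = "lat";
    //Perform OCR with the input document and the language data in the Tessdata folder.
    processor.PerformOCR(document, "Tessdata/");
    //Create file stream.
    using (FileStream outputFileStream = new FileStream("Output.pdf", FileMode.Create, FileAccess.ReadWrite))
    {
        //Save the PDF document to file stream.
        document.Save(outputFileStream);
    }
    //Close the document and release its resources.
    document.Close(true);
}
```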

By executing this code example, you will get a PDF document like the one in the following screenshot.
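If you have several scanned files, the same calls can be repeated in a loop. The following is a minimal sketch, not one of the repository samples: it assumes a hypothetical `Scans` input folder and `Searchable` output folder in the working directory, a `Tessdata` folder as in the example above, and English language data. It uses only the OCR calls shown above plus `System.IO`.

```csharp
using System;
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

//Hypothetical folder names used for illustration only.
const string inputFolder = "Scans";
const string outputFolder = "Searchable";
Directory.CreateDirectory(outputFolder);

foreach (string inputPath in Directory.GetFiles(inputFolder, "*.pdf"))
{
    //Keep the original file name for the searchable copy.
    string outputPath = Path.Combine(outputFolder, Path.GetFileName(inputPath));

    //A fresh processor per file keeps each run independent.
    using (OCRProcessor processor = new OCRProcessor())
    using (FileStream inputStream = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
    {
        PdfLoadedDocument document = new PdfLoadedDocument(inputStream);
        processor.Settings.Language = Languages.English;
        processor.PerformOCR(document, "Tessdata/");

        //Open the output only after OCR succeeds, so failures leave no empty file.
        using (FileStream outputStream = new FileStream(outputPath, FileMode.Create, FileAccess.ReadWrite))
        {
            document.Save(outputStream);
        }
        document.Close(true);
    }
    Console.WriteLine($"Saved {outputPath}");
}
```

Creating one processor per file is the simplest choice; reusing a single processor across files may also work, but that is not shown in the samples.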

OCR for a region of the scanned PDF document

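The library can also limit OCR to one or more regions of a scanned PDF document. Each PageRegion pairs a page index with the rectangles to process on that page, and the list of regions is assigned to processor.Settings.Regions.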

```csharp
using System.Collections.Generic;
using System.IO;
using Syncfusion.Drawing; //RectangleF in the .NET Core packages; use System.Drawing on .NET Framework.
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
//Open the input PDF; the stream is disposed when the block ends.
using (FileStream inputPDFStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Load a PDF document.
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument(inputPDFStream);
    //Set OCR language to process.
    processor.Settings.Language = "lat";
    //Define the area of the page to process.
    RectangleF rectangle = new RectangleF(0, 100, 950, 150);
    //Assign the rectangle to the first page (page index 0).
    List<PageRegion> pageRegions = new List<PageRegion>();
    PageRegion region = new PageRegion();
    region.PageIndex = 0;
    region.PageRegions = new RectangleF[] { rectangle };
    pageRegions.Add(region);
    processor.Settings.Regions = pageRegions;
    //Process OCR by providing the PDF document.
    processor.PerformOCR(loadedDocument, "Tessdata/");
    //Create file stream.
    using (FileStream outputFileStream = new FileStream("Output.pdf", FileMode.Create, FileAccess.ReadWrite))
    {
        //Save the PDF document to file stream.
        loadedDocument.Save(outputFileStream);
    }
    //Close the document and release its resources.
    loadedDocument.Close(true);
}
```

By executing this code example, you will get a PDF document like the one in the following screenshot.
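The region example above processes a single rectangle on one page. To cover multiple regions, add several rectangles to one PageRegion and one PageRegion per page. The following is a minimal sketch under these assumptions: `Input.pdf` has at least two pages, the rectangle values are placeholders to replace with coordinates that match your scan, and `RectangleF` comes from `Syncfusion.Drawing` (the .NET Core packages; `System.Drawing` on .NET Framework).

```csharp
using System.Collections.Generic;
using System.IO;
using Syncfusion.Drawing; //Assumption: .NET Core package; use System.Drawing on .NET Framework.
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

using (OCRProcessor processor = new OCRProcessor())
using (FileStream inputStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument(inputStream);
    processor.Settings.Language = Languages.English;

    //Two placeholder areas on the first page.
    PageRegion firstPage = new PageRegion();
    firstPage.PageIndex = 0;
    firstPage.PageRegions = new RectangleF[]
    {
        new RectangleF(0, 0, 600, 100),
        new RectangleF(0, 300, 600, 200)
    };

    //One placeholder area on the second page.
    PageRegion secondPage = new PageRegion();
    secondPage.PageIndex = 1;
    secondPage.PageRegions = new RectangleF[] { new RectangleF(50, 50, 500, 150) };

    //Only the listed regions are processed.
    processor.Settings.Regions = new List<PageRegion> { firstPage, secondPage };
    processor.PerformOCR(loadedDocument, "Tessdata/");

    using (FileStream outputStream = new FileStream("Output.pdf", FileMode.Create, FileAccess.ReadWrite))
    {
        loadedDocument.Save(outputStream);
    }
    loadedDocument.Close(true);
}
```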

OCR on image and convert it to a PDF document

With the aid of our library, any scanned image can be transformed into a searchable and selectable PDF document with ease.

```csharp
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf;

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
//Get stream from an image file; the stream is disposed when the block ends.
using (FileStream imageStream = new FileStream(@"Input.jpg", FileMode.Open, FileAccess.Read))
{
    //Set OCR language to process.
    processor.Settings.Language = Languages.English;
    //Process OCR by providing the image stream; a new PDF document is returned.
    PdfDocument document = processor.PerformOCR(imageStream);
    //Create file stream.
    using (FileStream outputFileStream = new FileStream(@"Output.pdf", FileMode.Create, FileAccess.ReadWrite))
    {
        //Save the PDF document to file stream.
        document.Save(outputFileStream);
    }
    //Close the document and release its resources.
    document.Close(true);
}
```

By executing this code example, you will get a PDF document like the one in the following screenshot.

OCR on rotated page of PDF document

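The following example performs OCR on a PDF document with rotated pages. Setting the page segmentation mode to PageSegMode.AutoOsd enables automatic orientation and script detection, and the extracted text is written to a text file.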

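```csharp
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
//Open the input PDF; the stream is disposed when the block ends.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Load an existing PDF document.
    PdfLoadedDocument document = new PdfLoadedDocument(stream);
    //Set OCR language.
    processor.Settings.Language = "lat";
    //Use automatic page segmentation with orientation and script detection (OSD) so rotated pages are recognized.
    //Tesseract needs osd.traineddata in the tessdata folder for this mode.
    processor.Settings.PageSegment = PageSegMode.AutoOsd;
    //Perform OCR with input document and tessdata (Language packs).
    string extractedText = processor.PerformOCR(document, "Tessdata/");
    //Write the extracted text to a file.
    File.WriteAllText("OCR.txt", extractedText);
    //Close the document and release its resources.
    document.Close(true);
}
```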

By executing this code example, you will get a text document like the one in the following screenshot.

Get OCRed text and bounds from scanned PDF document

By utilizing our library, you can easily obtain OCRed text and its corresponding bounds from a scanned PDF document.

```csharp
using System;
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
//Open the input PDF; the stream is disposed when the block ends.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Load an existing PDF document.
    PdfLoadedDocument document = new PdfLoadedDocument(stream);
    //Set OCR language.
    processor.Settings.Language = "lat";
    //Perform OCR with input document and tessdata (Language packs); the layout result is returned through the out parameter.
    processor.PerformOCR(document, @"Tessdata/", out OCRLayoutResult layoutResult);
    //Get OCRed line collection from first page.
    OCRLineCollection lines = layoutResult.Pages[0].Lines;
    //Get each OCRed line and its bounds.
    foreach (Line line in lines)
    {
        string text = line.Text;
        var bounds = line.Rectangle;
        Console.WriteLine($"{bounds}: {text}");
    }
    //Close the document.
    document.Close(true);
}
```
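The example above reads only the first page. The following minimal sketch extends it to every page and writes each line's page number, bounds, and text to a tab-separated file. It assumes that `layoutResult.Pages` exposes a `Count` property alongside the indexer used above, and `Bounds.txt` is a hypothetical output file name.

```csharp
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

using (OCRProcessor processor = new OCRProcessor())
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
using (StreamWriter writer = new StreamWriter("Bounds.txt"))
{
    PdfLoadedDocument document = new PdfLoadedDocument(stream);
    processor.Settings.Language = Languages.English;
    processor.PerformOCR(document, @"Tessdata/", out OCRLayoutResult layoutResult);

    //Assumption: the Pages collection has a Count property.
    for (int pageIndex = 0; pageIndex < layoutResult.Pages.Count; pageIndex++)
    {
        foreach (Line line in layoutResult.Pages[pageIndex].Lines)
        {
            //Tab-separated: 1-based page number, line bounds, line text.
            writer.WriteLine($"{pageIndex + 1}\t{line.Rectangle}\t{line.Text}");
        }
    }
    document.Close(true);
}
```

Writing to a file instead of the console keeps the output easy to inspect or load into a spreadsheet for multi-page documents.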

Perform OCR with Unicode characters

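The following example performs OCR on a scanned PDF document that contains Unicode characters. A Unicode TrueType font is assigned to the processor so that the recognized characters are preserved in the output PDF document.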

```csharp
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Graphics;
using Syncfusion.Pdf.Parsing;

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
//Get streams for the scanned PDF document and a Unicode TrueType font; both stay open until the document is saved.
using (FileStream stream = new FileStream(Path.GetFullPath(@"UnicodePDF.pdf"), FileMode.Open, FileAccess.Read))
using (FileStream fontStream = new FileStream(Path.GetFullPath(@"ARIALUNI.ttf"), FileMode.Open, FileAccess.Read))
{
    //Load the PDF document.
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument(stream);
    //Set a Unicode font (size 8) to preserve the Unicode characters in the PDF document.
    processor.UnicodeFont = new PdfTrueTypeFont(fontStream, 8);
    //Set OCR language to process.
    processor.Settings.Language = Languages.English;
    //Process OCR by providing the PDF document; the recognized text is also returned.
    string ocrText = processor.PerformOCR(loadedDocument);
    //Create file stream.
    using (FileStream outputFileStream = new FileStream(Path.GetFullPath(@"Output.pdf"), FileMode.Create, FileAccess.ReadWrite))
    {
        //Save the PDF document to file stream.
        loadedDocument.Save(outputFileStream);
    }
    //Close the document and release its resources.
    loadedDocument.Close(true);
}
```

By executing this code example, you will get a PDF document like the one in the following screenshot.

How to run the examples

  • Download this project to a location on your disk.
  • Open the solution file using Visual Studio.
  • Rebuild the solution to restore the required NuGet packages.
  • Run the application.

Resources

Support and feedback

License

This is a commercial product and requires a paid license for possession or use. Syncfusion’s licensed software, including this component, is subject to the terms and conditions of Syncfusion's EULA. You can purchase a license here or start a free 30-day trial here.

About Syncfusion

Founded in 2001 and headquartered in Research Triangle Park, N.C., Syncfusion has more than 26,000 customers and more than 1 million users, including large financial institutions, Fortune 500 companies, and global IT consultancies.

Today, we provide 1600+ components and frameworks for web (Blazor, ASP.NET Core, ASP.NET MVC, ASP.NET WebForms, JavaScript, Angular, React, Vue, and Flutter), mobile (Xamarin, Flutter, UWP, and JavaScript), and desktop development (WinForms, WPF, WinUI (Preview), Flutter, and UWP). We provide ready-to-deploy enterprise software for dashboards, reports, data integration, and big data processing. Many customers have saved millions in licensing fees by deploying our software.

About

Examples of using the Syncfusion OCR processor library to perform OCR on both scanned PDF documents and images.
