
Extracting text from a specific area #337

Answered by InusualZ
niko86 asked this question in Q&A

Hi,

Here is one way you can do it:

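using System;
using System.IO;
using UglyToad.PdfPig;
using UglyToad.PdfPig.Core;                                 // PdfRectangle
using UglyToad.PdfPig.DocumentLayoutAnalysis.PageSegmenter; // DocstrumBoundingBoxes
using UglyToad.PdfPig.DocumentLayoutAnalysis.WordExtractor; // NearestNeighbourWordExtractor
using UglyToad.PdfPig.Geometry;                             // IntersectsWith extension

// Area of interest in PDF user-space units, origin at the bottom-left of the page:
// x from 200 to 300, y from 300 to 600.
var extractionPosition = new PdfRectangle(200, 300, 300, 600);

using var fileStream = File.OpenRead("test.pdf");
using var document = PdfDocument.Open(fileStream);

// Page numbers are 1-based.
for (var pageIndex = 1; pageIndex <= document.NumberOfPages; ++pageIndex)
{
    var page = document.GetPage(pageIndex);

    // Group letters into words, then group words into text blocks (Docstrum segmentation).
    var words = page.GetWords(NearestNeighbourWordExtractor.Instance);
    var blocks = DocstrumBoundingBoxes.Instance.GetBlocks(words);

    foreach (var block in blocks)
    {
        // Skip blocks that do not overlap the area at all.
        if (!extractionPosition.IntersectsWith(block.BoundingBox))
        {
            continue;
        }

        // Do something with the text block, e.g. print its text.
        Console.WriteLine(block.Text);
    }
}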

This is not the only way to do it, and you c…
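The filter in the loop uses `IntersectsWith`, so a block is kept as soon as any part of it overlaps the area; a block that straddles the edge of the rectangle comes back whole, including text that lies outside it. If you only want blocks that sit entirely inside the area, a containment test is the stricter option. The sketch below contrasts the two checks in plain C# so it runs without PdfPig: the rectangles are hypothetical `(L, B, R, T)` tuples (left, bottom, right, top) standing in for a bounding box, and in the loop above you would read those values from `block.BoundingBox.Left`, `.Bottom`, `.Right` and `.Top`. It assumes unrotated, axis-aligned boxes and treats rectangles that merely touch as not intersecting; PdfPig's own edge handling may differ.

```csharp
using System;

// Rectangles as (L, B, R, T) in PDF user space:
// origin at the bottom-left of the page, y grows upwards.

// True when the two rectangles share some area (partial overlap counts).
static bool Intersects((double L, double B, double R, double T) a,
                       (double L, double B, double R, double T) b) =>
    a.L < b.R && b.L < a.R && a.B < b.T && b.B < a.T;

// True only when `inner` lies entirely within `outer` (shared edges allowed).
static bool Contains((double L, double B, double R, double T) outer,
                     (double L, double B, double R, double T) inner) =>
    inner.L >= outer.L && inner.R <= outer.R &&
    inner.B >= outer.B && inner.T <= outer.T;

// Same area as the answer: x from 200 to 300, y from 300 to 600.
var area = (L: 200.0, B: 300.0, R: 300.0, T: 600.0);

// Hypothetical block bounding boxes.
var insideBlock = (L: 210.0, B: 320.0, R: 290.0, T: 340.0);
var straddlingBlock = (L: 250.0, B: 580.0, R: 400.0, T: 620.0); // crosses the right and top edges

Console.WriteLine($"inside: intersects={Intersects(area, insideBlock)}, contained={Contains(area, insideBlock)}");
// inside: intersects=True, contained=True
Console.WriteLine($"straddling: intersects={Intersects(area, straddlingBlock)}, contained={Contains(area, straddlingBlock)}");
// straddling: intersects=True, contained=False
```

Both blocks pass an intersection filter, but only the first passes a containment filter. Which one you want depends on whether partially overlapping text should be included; for a finer cut, the same containment check can be applied to individual words instead of whole blocks.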

Answer selected by niko86