How do I extract images from a PDF file? #26

muntasirhossain1 · 2024-04-03T10:46:26Z

I want to extract all the images.

dmester · 2024-04-06T16:28:39Z

There is no direct API for extracting images, but you can get hold of them by implementing a custom ImageResolver. It's a bit of a hack, but here is a working example:

using System.IO;
using System.Threading;
using PdfToSvg;

// Writes each image to disk as a side effect of SVG conversion.
private class ImageExtractor : ImageResolver
{
    private readonly string outputDirectory;
    private int count;

    public ImageExtractor(string outputDirectory)
    {
        this.outputDirectory = outputDirectory;
    }

    // Invoked during conversion for the images on a page.
    public override string ResolveImageUrl(Image image, CancellationToken cancellationToken)
    {
        var content = image.GetContent(cancellationToken);
        var extension = image.ContentType == "image/jpeg" ? ".jpeg" : ".png";
        var outputFileName = "image" + ++count + extension;
        var outputPath = Path.Combine(outputDirectory, outputFileName);

        File.WriteAllBytes(outputPath, content);

        // The returned string is used as the image URL in the SVG.
        return outputFileName;
    }
}

public static void Main()
{
    var inputFile = "<enter path to PDF here>";
    var outputDir = "<enter path to output directory here>";

    using (var doc = PdfDocument.Open(inputFile))
    {
        var options = new SvgConversionOptions
        {
            ImageResolver = new ImageExtractor(outputDir),
        };

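        // The SVG output is discarded; each page is converted only to
        // trigger the ImageExtractor callback for its images.
        foreach (var page in doc.Pages)
        {
            page.ToSvgString(options);
        }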
    }
}

I'll see if I can add a dedicated API for accessing images in a future version.

dmester · 2024-04-20T13:27:18Z

There is now a dedicated API for accessing images from a PDF:

using (var document = PdfDocument.Open("input.pdf"))
{
    var imageNo = 1;

    foreach (var image in document.Images)
    {
        var content = image.GetContent();
        var fileName = $"image{imageNo++}{image.Extension}";
        File.WriteAllBytes(fileName, content);
    }
}

This was added in version 1.3.0.

dmester added a commit that referenced this issue Apr 20, 2024

API for extracting images (#26)

ad1b1db

dmester closed this as completed Apr 20, 2024
