Extract image content from historical book scans

YaleDHLab/shears

Shears

Extract pictures from historical book scans.

Installation

pip install shears

Basic Usage

Suppose you want to extract the image content within the following page scan:

Sample book page scan

Assuming you have saved the page scan as input.jpg in your current working directory, you can extract the image content as follows:

import shears

# extract the image content
result = shears.clip('input.jpg')

# show the extracted image
shears.plot_image(result)

# save the extracted image
shears.save_image(result, 'result.jpg')

This returns and saves the following image:

Sample cropped illustration

That's all it takes! The example below shows how to process a more complex page scan.

Processing Book Scans

Suppose you want to extract the illustration content from the page scan below:

Sample book page scan

To extract illustrations from pages like this, pass filter arguments to shears.clip:

import shears

# use the filter parameters to pull out the illustration on a page
# ('input.jpg' is the path to the page scan)
result = shears.clip('input.jpg',
                     filter_min_size=900,
                     filter_threshold=0.8,
                     filter_connectivity=1)

# show the extracted illustration
shears.plot_image(result, 'Extracted Image')

This returns the following image:

Sample cropped illustration
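This README does not spell out what each filter parameter does. As a rough mental model only, the sketch below shows one common way parameters with these names could behave: binarize the page at a threshold, discard connected regions smaller than a minimum pixel count (using a connectivity setting to decide which neighbouring pixels touch), and crop to whatever survives. The function filter_and_crop is hypothetical and written here for illustration; it is not shears' actual implementation. Two further assumptions are labeled in the code: darker pixels are treated as foreground, and connectivity follows the scikit-image convention (1 = edge neighbours only, 2 = edge and corner neighbours).

```python
from collections import deque


def filter_and_crop(image, threshold=0.8, min_size=900, connectivity=1):
    """Hypothetical sketch, NOT shears' implementation.

    image: 2D list of grayscale values in [0, 1] (0 = black ink, 1 = white paper).
    Returns the sub-image bounding all surviving foreground pixels, or [] if none survive.
    """
    if not image or not image[0]:
        return []
    h, w = len(image), len(image[0])
    # Assumption: pixels darker than `threshold` count as foreground (ink).
    fg = [[image[r][c] < threshold for c in range(w)] for r in range(h)]
    # Assumption (scikit-image convention): 1 -> 4-neighbours, 2 -> 8-neighbours.
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 2:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    seen = [[False] * w for _ in range(h)]
    keep = []
    for r in range(h):
        for c in range(w):
            if not fg[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and fg[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Drop components smaller than min_size pixels (specks, small glyphs).
            if len(component) >= min_size:
                keep.extend(component)
    if not keep:
        return []
    rows = [y for y, _ in keep]
    cols = [x for _, x in keep]
    top, bottom, left, right = min(rows), max(rows), min(cols), max(cols)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Usage: a white 6x8 "page" with a 3x3 dark block and a one-pixel speck.
page = [[1.0] * 8 for _ in range(6)]
for y in range(1, 4):
    for x in range(1, 4):
        page[y][x] = 0.1
page[5][7] = 0.0  # speck, smaller than min_size, so it is discarded
crop = filter_and_crop(page, threshold=0.8, min_size=4, connectivity=1)
print(len(crop), len(crop[0]))  # prints: 3 3
```

In this sketch, raising min_size discards larger regions (such as blocks of text), while switching connectivity from 1 to 2 lets diagonally touching pixels merge into a single region.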

For additional examples, please see the sample notebooks in this repository.

Testing

To run the test suite:

pytest
