Shears

Extract pictures from historical book scans.

Installation

pip install shears

Basic Usage

Suppose you want to extract the image content within the following page scan:

Assuming you have saved the page scan to your current working directory, you can extract the image content with the following:

import shears

# extract the image content
result = shears.clip('input.jpg')

# show the extracted image
shears.plot_image(result)

# save the extracted image
shears.save_image(result, 'result.jpg')

This returns and saves the following image:

That's all it takes! The examples below show how to process more complex input images.
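The same two calls extend to a folder of scans. Below is a minimal sketch, not part of shears itself: `clip_directory` is a hypothetical helper, and the `scans` and `clipped` folder names are assumptions. The clipping and saving functions are passed in as arguments, so the helper only relies on the `shears.clip(path)` and `shears.save_image(result, path)` call shapes shown above.

```python
from pathlib import Path


def clip_directory(input_dir, output_dir, clip, save, pattern="*.jpg", **clip_kwargs):
    """Run `clip` on every file in `input_dir` that matches `pattern` and
    write each result to `output_dir` under the same file name.

    `clip` and `save` are supplied by the caller (for example shears.clip
    and shears.save_image); this helper is hypothetical, not a shears API.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for scan in sorted(input_dir.glob(pattern)):
        # extract the image content, forwarding any extra keyword arguments
        result = clip(str(scan), **clip_kwargs)
        # save under the original file name in the output folder
        target = output_dir / scan.name
        save(result, str(target))
        written.append(target)
    return written


# Example call (assumes shears is installed and ./scans holds .jpg files):
#   import shears
#   clip_directory("scans", "clipped", shears.clip, shears.save_image)
```

Passing the functions in keeps the loop independent of how shears is imported and makes it easy to try with a stub before running it on a full set of scans.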

Processing Book Scans

Suppose you want to extract the illustration content from the page scan below:

To extract illustrations from pages like this, pass filter arguments to shears.clip:

import shears

# use the filter parameters to pull out the illustration on a page
# ('input.jpg' is the path to the saved page scan)
result = shears.clip('input.jpg',
                     filter_min_size=900,
                     filter_threshold=0.8,
                     filter_connectivity=1)

# show the extracted illustration
shears.plot_image(result, 'Extracted Image')

This returns the following image:

For additional examples, please see the sample notebooks in this repository.
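Filter values that work for one scan may not suit another, so it can help to try a few combinations and compare the results. The sketch below is one way to do that. `sweep_clip` is a hypothetical helper, not part of shears, and the candidate values in the example call are illustrative assumptions rather than recommended settings. It only assumes that `shears.clip` accepts the keyword arguments shown above.

```python
from itertools import product


def sweep_clip(clip, path, **param_values):
    """Call `clip(path, ...)` once per combination of the given values.

    Each keyword maps a parameter name to a list of values to try.
    Returns a dict keyed by a tuple of (name, value) pairs, so every
    result can be traced back to the settings that produced it.
    """
    names = sorted(param_values)
    results = {}
    for combo in product(*(param_values[name] for name in names)):
        kwargs = dict(zip(names, combo))
        results[tuple(kwargs.items())] = clip(path, **kwargs)
    return results


# Example call (assumes shears is installed; the values are illustrative):
#   import shears
#   results = sweep_clip(shears.clip, 'input.jpg',
#                        filter_min_size=[500, 900],
#                        filter_threshold=[0.7, 0.8])
#   for settings, image in results.items():
#       shears.plot_image(image, str(settings))
```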

Testing

To run the test suite, run:

pytest

Source repository: YaleDHLab/shears
