API

WolfgangFellger edited this page Oct 21, 2015 · 41 revisions

API Reference

This is the API reference of DocumentVision. It lists all functions that can be called, documents the parameters and provides small code snippets to facilitate learning.

Note: Since the API is far from final, this page is a work in progress.

Box

Boxes are convenience objects:

{
	"x": Number,
	"y": Number,
	"width": Number,
	"height": Number
}

The following rules apply when numbers are rounded to pixel coordinates: Math.floor(x), Math.floor(y), Math.ceil(width) and Math.ceil(height).
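The rounding rule can be sketched in plain JavaScript. toPixelBox is a hypothetical helper for illustration only and is not part of the DocumentVision API:

```javascript
// Hypothetical helper (not part of dv): applies the rounding rule
// stated above to a Box with fractional values.
function toPixelBox(box) {
	return {
		x: Math.floor(box.x),
		y: Math.floor(box.y),
		width: Math.ceil(box.width),
		height: Math.ceil(box.height)
	};
}

var pixelBox = toPixelBox({ x: 10.6, y: 4.2, width: 19.1, height: 8 });
console.log(pixelBox); // { x: 10, y: 4, width: 20, height: 8 }
```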

Image

An Image object stores either an RGB or grayscale bitmap with 8 bits per channel, or a monochrome bitmap with 1 bit per pixel. Loading 32 bit RGBA is supported for convenience, but the alpha channel will be discarded. Some operations are only available for certain bit depths, and no automatic conversion is done. Use toGray, threshold, and otsuAdaptiveThreshold to convert explicitly.

It's mostly backed by Leptonica. Operations return their result in a copy of the image unless noted in the description. Operations that modify the image in place return the original object instead, so all operations are chainable.
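Because every operation returns an image, conversions can be chained. The following is a minimal sketch using only toGray() and threshold() as documented on this page; binarize is a hypothetical helper name, and the image argument is assumed to be a color dv.Image:

```javascript
// Hypothetical helper: converts a color image to monochrome.
// `image` is assumed to be a dv.Image with 32 bit depth.
function binarize(image, value) {
	// toGray() returns a grayscale copy, threshold() a monochrome copy,
	// so the two calls can be chained without intermediate variables.
	return image.toGray().threshold(value);
}

// Possible usage, assuming `pngBuffer` holds a PNG encoded image:
// var mono = binarize(new dv.Image('png', pngBuffer), 128);
```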

Image#constructor(otherImage)

Creates a copy of otherImage.

Image#constructor(image1, image2, image3)

Creates a 32 bit image from three 8 bit images, where each image represents one channel of RGB or HSV.

Image#constructor(width, height, depth)

Creates an empty image with the specified dimensions (note: this constructor is experimental and likely to change).

Image#constructor('png', buffer)

Creates an image from a Buffer object that contains the PNG encoded image.

Image#constructor('jpg', buffer)

Creates an image from a Buffer object that contains the JPG encoded image.

Image#constructor('rgba', buffer, width, height)

Creates an image from an RGBA buffer (32 bit per pixel) with the specified width and height.

Image#constructor('rgb', buffer, width, height)

Creates an image from an RGB buffer (24 bit per pixel) with the specified width and height.

Image#constructor('gray', buffer, width, height)

Creates an image from a grayscale buffer (8 bit per pixel) with the specified width and height.

Image#width

Returns the width of the image in pixels.

Image#height

Returns the height of the image in pixels.

Image#depth

Returns the depth of the image in bits per pixel, i.e. one of 32 (color), 8 (grayscale) or 1 (monochrome).

Image#invert()

Returns the (boolean) inverse of this image.

Image#or(otherImage)

Returns the (boolean) union of two images with equal depth, aligning them to the upper left corner.

Image#and(otherImage)

Returns the (boolean) intersection of two images with equal depth, aligning them to the upper left corner.

Image#xor(otherImage)

Returns the (boolean) exclusive disjunction of two images with equal depth, aligning them to the upper left corner.

Image#add(otherImage)

If the images are monochrome, dispatches to Leptonica's pixOr. Otherwise, returns the channelwise addition of otherImage (b) to this image (a), clipped at 255.

Image#subtract(otherImage)

If the images are monochrome, dispatches to Leptonica's pixSubtract and is equivalent to a.and(b.invert()), where a is this image and b is otherImage. For grayscale images, returns the pixelwise subtraction of b from a, clipped at zero. For color, the entire RGB value is subtracted instead of doing channelwise subtraction (ask Leptonica why).

Example:

redness = colorImage.toGray(1, 0, 0).subtract(colorImage.toGray(0, 0.5, 0.5))
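The clipping behaviour of add and subtract on grayscale data can be illustrated in plain JavaScript. This is a sketch of the arithmetic only, with pixels modeled as arrays of values from 0 to 255; addClipped and subtractClipped are hypothetical names, not library functions. Clipping at zero is what makes the redness example work: pixels that are not predominantly red come out as 0.

```javascript
// Illustration only: grayscale pixels modeled as arrays of 0..255 values.
function addClipped(a, b) {
	return a.map(function (value, i) {
		return Math.min(255, value + b[i]); // clip at 255
	});
}

function subtractClipped(a, b) {
	return a.map(function (value, i) {
		return Math.max(0, value - b[i]); // clip at zero
	});
}

console.log(addClipped([200, 10], [100, 20]));      // [ 255, 30 ]
console.log(subtractClipped([200, 10], [100, 20])); // [ 100, 0 ]
```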

Image#convolve(halfWidth, halfHeight)

Applies a convolution kernel with the specified dimensions. Image convolution is an operation where each destination pixel is computed based on a weighted sum of a set of nearby source pixels.

Image#unsharp(halfWidth, fraction)

Unsharp Masking creates an unsharp mask using halfWidth. The fraction determines how much of the edge is added back into the image. The resulting image appears clearer, but it is generally less accurate.

The use of Image#unsharpMasking() is deprecated.

Image#rotate(angle)

Rotates the image around its center by the specified angle in degrees.

Image#scale(scale)

Scales an image proportionally by scale (1.0 = 100%).

Image#scale(scaleX, scaleY)

Scales an image by scaleX and scaleY (1.0 = 100%).

Image#crop(box)

Crops an image from this image by the specified rectangle and returns the resulting image.

Image#inRange(lower1, lower2, lower3, upper1, upper2, upper3)

Creates a mask by testing if pixels (RGB, HSV, ...) are between lower and upper. Formally speaking:

  lower1 ≤ pixel1 ≤ upper1
∧ lower2 ≤ pixel2 ≤ upper2
∧ lower3 ≤ pixel3 ≤ upper3
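The per-pixel test can be written out in plain JavaScript. This is a sketch of the predicate only, with a pixel modeled as an array of three channel values; pixelInRange is a hypothetical name:

```javascript
// Illustration only: true if every channel lies within its
// inclusive [lower, upper] bounds.
function pixelInRange(pixel, lower, upper) {
	return pixel.every(function (value, i) {
		return lower[i] <= value && value <= upper[i];
	});
}

console.log(pixelInRange([10, 200, 50], [0, 128, 0], [20, 255, 100])); // true
console.log(pixelInRange([30, 200, 50], [0, 128, 0], [20, 255, 100])); // false
```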

Image#histogram(mask)

Only available for grayscale images. Returns the histogram in an array of length 256, where each entry represents the fraction (0.0 to 1.0) of that color in the image.

The mask parameter is optional and must be a monochrome image of the same width and height; only pixels where mask is 0 will be counted.
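The shape of the result can be sketched in plain JavaScript. This illustration works on an array of gray values rather than a dv.Image. It assumes that fractions are taken relative to the number of counted pixels, which this page does not state. fractionHistogram is a hypothetical name:

```javascript
// Illustration only: `pixels` is an array of 0..255 gray values and
// `mask` an optional array of 0/1 values of the same length.
// Assumption: fractions are relative to the number of counted pixels.
function fractionHistogram(pixels, mask) {
	var counts = new Array(256).fill(0);
	var total = 0;
	for (var i = 0; i < pixels.length; i++) {
		if (mask && mask[i] !== 0) {
			continue; // only pixels where the mask is 0 are counted
		}
		counts[pixels[i]]++;
		total++;
	}
	return counts.map(function (count) {
		return total === 0 ? 0 : count / total;
	});
}

var hist = fractionHistogram([0, 0, 255, 128]);
console.log(hist[0], hist[128], hist[255]); // 0.5 0.25 0.25
```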

Image#projection(mode)

Computes the horizontal or vertical projection of a 1bpp or 8bpp image.

Image#setMasked(mask, value)

Sets the specified value to each pixel set in the mask.

Image#applyCurve(mapping, mask)

Note: this function actually changes the image!

Available for grayscale and color images. Channelwise maps each pixel of image using mapping, which must be an array of length 256 with integer values between 0 and 255.

The mask parameter is optional and must be a monochrome image of the same width and height; only pixels where mask is 0 will be modified.
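A mapping is an ordinary array, so it can be generated. As a sketch, the following builds a gamma curve; gammaCurve is a hypothetical helper and the choice of a gamma curve is only an example:

```javascript
// Hypothetical helper: builds a 256-entry lookup table for a gamma
// curve, with integer values between 0 and 255 as applyCurve requires.
function gammaCurve(gamma) {
	var mapping = [];
	for (var i = 0; i < 256; i++) {
		mapping.push(Math.round(255 * Math.pow(i / 255, gamma)));
	}
	return mapping;
}

var brighten = gammaCurve(0.5); // gamma < 1 lifts dark values

// A grayscale or color dv.Image could then be changed in place with:
// image.applyCurve(brighten);
```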

Image#rankFilter(width, height, rank)

Applies a rank (0.0 ... 1.0) filter of the specified width and height (think of it as radius) to this image and returns the result. If you set rank to 0.5 you'll get a Median Filter. Note that this type of filter works best with odd sizes like 3 or 5.
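What a rank filter computes for a single output pixel can be sketched as follows. This is an illustration of the idea, not Leptonica's implementation; the exact rounding of the rank index is an assumption, and rankValue is a hypothetical name:

```javascript
// Illustration only: picks the value at the given rank
// (0.0 = minimum, 0.5 = median, 1.0 = maximum) from the gray values
// of one neighborhood, e.g. the 9 pixels of a 3x3 window.
function rankValue(neighborhood, rank) {
	var sorted = neighborhood.slice().sort(function (a, b) {
		return a - b;
	});
	var index = Math.round(rank * (sorted.length - 1));
	return sorted[index];
}

// A 3x3 window with one bright outlier (200):
var windowValues = [12, 200, 10, 11, 13, 9, 14, 10, 12];
console.log(rankValue(windowValues, 0.5)); // 12 (the outlier is ignored)
```

With an odd width and height the window is centered on the pixel being computed, which is why odd sizes tend to work best.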

Image#octreeColorQuant(colors)

Color image quantization using an octree based algorithm. colors must be between 2 and 256. Note that support for the resulting palette image is highly experimental at this point; only toGray() and toBuffer('png') are guaranteed to work.

Image#medianCutQuant(colors)

Color image quantization using median cut algorithm. colors must be between 2 and 256. Note that support for the resulting palette image is highly experimental at this point; only toGray() and toBuffer('png') are guaranteed to work.

Image#threshold(value = 128)

Converts a grayscale image to monochrome using a global threshold. value must be between 0 and 255.

Image#toGray()

Converts an image to grayscale using default settings. Can be used to convert monochrome images back to grayscale.

Image#toGray(redWeight, greenWeight, blueWeight)

Converts an RGB image to grayscale using the specified weights for each channel.

Image#toGray(selector)

Converts an RGB image to grayscale by selecting either the 'min' or 'max' channel. This can act as a simple color filter: 'max' maps colored pixels towards white, while 'min' maps colored pixels towards black.
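The per-pixel effect of the weighted and the selector variants can be illustrated in plain JavaScript. Both functions are hypothetical sketches of the arithmetic, not library code; the rounding in the weighted variant is an assumption:

```javascript
// Illustration only: weighted sum of the three channels.
function grayFromWeights(r, g, b, redWeight, greenWeight, blueWeight) {
	return Math.round(r * redWeight + g * greenWeight + b * blueWeight);
}

// Illustration only: picks the smallest or largest channel.
function grayFromSelector(r, g, b, selector) {
	return selector === 'max' ? Math.max(r, g, b) : Math.min(r, g, b);
}

// A saturated red pixel moves towards white with 'max'
// and towards black with 'min':
console.log(grayFromSelector(220, 30, 30, 'max')); // 220
console.log(grayFromSelector(220, 30, 30, 'min')); // 30

// A neutral gray pixel is the same either way:
console.log(grayFromSelector(120, 120, 120, 'max')); // 120
```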

Image#toColor()

Converts a grayscale image to a color image.

Image#toHSV()

Converts from RGB to HSV color space. HSV has the following ranges:

  • Hue: [0 .. 239]
  • Saturation: [0 .. 255]
  • Value: [0 .. 255]

Image#toRGB()

Converts from HSV to RGB color space.

Image#erode(width, height)

Applies an Erode Filter and returns the result.

Image#dilate(width, height)

Applies a Dilate Filter and returns the result.

Image#open(width, height)

Applies an Open Filter and returns the result.

Image#close(width, height)

Applies a Close Filter and returns the result.

Image#thin(type, connectivity, maxIterations)

Applies morphological thinning of type (fg or bg) with the specified connectivity (4 or 8) and maxIterations (0 to iterate until complete).

Image#maxDynamicRange(scale)

Scales an 8bpp image for maximum dynamic range. scale must be either log or linear.

Image#otsuAdaptiveThreshold(tileWidth, tileHeight, smoothWidth, smoothHeight, scoreFactor)

Applies Otsu's Method for computing the threshold of a grayscale image. It computes a threshold for each tile of the specified size and performs the threshold operation, resulting in a binary image for each tile. These are stitched into the final result.

The smooth size controls the convolution kernel applied to the threshold array (use 0 for no smoothing). The score factor controls the fraction of the maximum Otsu score (typically 0.1; use 0.0 for standard Otsu). The result is returned as an object containing two images:

{
	"thresholdValues": Image,
	"image": Image
}
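For orientation, the following is a sketch of standard (global) Otsu thresholding on an array of gray values. It picks the threshold that maximizes the between-class variance. It does not model the tiling, the smoothing or the score factor described above, and it is not the Leptonica implementation; otsuThreshold is a hypothetical name:

```javascript
// Illustration only: plain global Otsu on an array of 0..255 values.
// Returns the largest gray value that belongs to the darker class.
function otsuThreshold(pixels) {
	var counts = new Array(256).fill(0);
	pixels.forEach(function (p) { counts[p]++; });

	var total = pixels.length;
	var sumAll = 0;
	for (var i = 0; i < 256; i++) {
		sumAll += i * counts[i];
	}

	var weightLow = 0, sumLow = 0, bestScore = -1, best = 0;
	for (var t = 0; t < 256; t++) {
		weightLow += counts[t];
		if (weightLow === 0) continue;      // darker class still empty
		var weightHigh = total - weightLow;
		if (weightHigh === 0) break;        // brighter class is empty
		sumLow += t * counts[t];
		var meanLow = sumLow / weightLow;
		var meanHigh = (sumAll - sumLow) / weightHigh;
		// Between-class variance (up to a constant factor).
		var score = weightLow * weightHigh * Math.pow(meanLow - meanHigh, 2);
		if (score > bestScore) {
			bestScore = score;
			best = t;
		}
	}
	return best;
}

// Dark text values around 10, bright paper values around 200:
var t = otsuThreshold([10, 10, 12, 11, 200, 210, 205, 202]);
console.log(t >= 12 && t < 200); // true
```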

Image#lineSegments(accuracy, maxLineSegments, useWeightedMeanShift)

Detects Line Segments with the specified accuracy (3 is a good start). The number of found line segments can be limited using maxLineSegments (0 is unlimited). The result is returned as an array:

[
	{
		"p1": {x: Number, y: Number},
		"p2": {x: Number, y: Number},
		"error": Number
	}
]

Image#findSkew()

Only available for monochrome images. Tries to find the skew of this image. The resulting angle is in degrees. The confidence is between 0.0 and 1.0. The result is returned as an object:

{
	"angle": Number,
	"confidence": Number
}

Image#connectedComponents(connectivity)

Only available for monochrome images. Tries to extract connected components (think of flood fill). The connectivity can be specified as 4 or 8 directions. The result is returned as an array of objects:

[ {
	"x": Number,
	"y": Number,
	"width": Number,
	"height": Number
} ]
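Since the result is a plain array of boxes, it can be post-processed with ordinary array functions. As a sketch, the following drops very small components such as noise specks; dropSmallComponents is a hypothetical helper and the sample boxes are made up:

```javascript
// Hypothetical helper: keeps only components whose bounding box
// covers at least minArea pixels.
function dropSmallComponents(boxes, minArea) {
	return boxes.filter(function (box) {
		return box.width * box.height >= minArea;
	});
}

// Made-up sample in the format returned by connectedComponents():
var components = [
	{ x: 0, y: 0, width: 2, height: 2 },
	{ x: 10, y: 5, width: 40, height: 12 }
];
console.log(dropSmallComponents(components, 16).length); // 1
```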

Image#distanceFunction(connectivity)

The Distance Function works on 1bpp images. It labels each pixel with the largest distance between this and any other pixel in its connected component. The connectivity is either 4 or 8.

Image#clearBox(box)

Note: this function actually changes the image!

Fills a specified rectangle with white.

Image#fillBox(box, value)

Note: this function actually changes the image!

Draws a filled rectangle to this image with the specified value. Works for 8bpp and 1bpp images.

Image#fillBox(box, r, g, b [, fraction])

Note: this function actually changes the image!

Draws a filled rectangle to this image in the specified color with an optional blending parameter (0.0: transparent; 1.0: no transparency).

Image#drawBox(box, borderWidth, operation)

Note: this function actually changes the image!

Draws a rectangle to this image with the specified border. The possible pixel manipulating operations are set, clear and flip.

Image#drawBox(box, borderWidth, red, green, blue, [frac])

Note: this function actually changes the image!

Draws a rectangle to this image with the specified border in the specified color with an optional blending parameter (0.0: transparent; 1.0: no transparency).

Image#drawLine(p1, p2, width, operation)

Note: this function actually changes the image!

Draws a line between p1 and p2 to this image with the specified line width. The possible pixel manipulating operations are set, clear and flip.

Image#drawLine(p1, p2, width, red, green, blue, [frac])

Note: this function actually changes the image!

Draws a line between p1 and p2 to this image with the specified line width in the specified color with an optional blending parameter (0.0: transparent; 1.0: no transparency).

Image#drawImage(image, box)

Note: this function actually changes the image!

Draws an image to this image with the specified destination box.

Image#toBuffer(format = 'raw')

Converts the Image in the specified format to a buffer.

Specifying raw returns the raw image data as buffer. For color images, the result contains three bytes per pixel in the order R, G, B; for grayscale and monochrome images, it contains one byte per pixel.

Specifying png returns a PNG encoded image as buffer.

Specifying jpg returns a JPG encoded image as buffer.
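Reading a pixel from a raw color buffer follows from the byte order given above for the raw format. This is a sketch assuming rows are stored one after another without padding, which this page does not state explicitly; rgbAt is a hypothetical helper:

```javascript
// Hypothetical helper: reads one pixel from a raw color buffer.
// Assumption: three bytes per pixel (R, G, B) and no padding between rows.
function rgbAt(buffer, width, x, y) {
	var offset = (y * width + x) * 3;
	return {
		r: buffer[offset],
		g: buffer[offset + 1],
		b: buffer[offset + 2]
	};
}

// A made-up 2x1 image: one red pixel followed by one blue pixel.
var raw = Buffer.from([255, 0, 0, 0, 0, 255]);
console.log(rgbAt(raw, 2, 1, 0)); // { r: 0, g: 0, b: 255 }
```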

Tesseract

A Tesseract object represents an optical character recognition engine that reads text from an image using Tesseract. Tesseract supports many languages and fonts (see Tesseract/Downloads). New language files have to be installed in node-dv/tessdata.

Tesseract#constructor()

Creates a Tesseract engine with language set to English.

Tesseract#constructor(lang)

Creates a Tesseract engine with the specified language.

Tesseract#constructor(lang, image)

Creates a Tesseract engine with the specified language and image.

Tesseract#image

Accessor for the input image.

Tesseract#rectangle

Accessor for the rectangle that specifies a "visible" area on the image.

Tesseract#pageSegMode

Accessor for the page segmentation mode. Valid values are: osd_only, auto_osd, auto_only, auto, single_column, single_block_vert_text, single_block, single_line, single_word, circle_word, single_char, sparse_text, sparse_text_osd.

Tesseract#<variable>

Accessor for internal variables. Some of them are documented - when in doubt, use grep.

Hint: if all you want is numbers set tessedit_char_whitelist = "0123456789".

To get a list of variables you can use this snippet:

var tesseract = new dv.Tesseract();
for (var key in tesseract) {
	if (typeof tesseract[key] !== 'function') {
		console.log(key + " = " + tesseract[key]);
	}
}

Tesseract#clear()

Clears the tesseract image and its last results.

Tesseract#clearAdaptiveClassifier()

Clears all adaptive classifiers (use this when results vary during scanning).

Tesseract#thresholdImage()

Returns the binarized image Tesseract uses for its recognition.

Tesseract#findRegions(recognize)

Returns an array of objects that describe page layout regions with the following format:

[ {
	"box": {
		"x": Number,
		"y": Number,
		"width": Number,
		"height": Number
	},
	"text": String,
	"confidence": Number
} ]

You can omit text information by setting recognize = false, which is considerably faster.

Tesseract#findParagraphs(recognize)

Returns an array of objects that describe paragraphs with the following format:

[ {
	"box": {
		"x": Number,
		"y": Number,
		"width": Number,
		"height": Number
	},
	"text": String,
	"confidence": Number
} ]

You can omit text information by setting recognize = false, which is considerably faster.

Tesseract#findTextLines(recognize)

Returns an array of objects that describe text lines with the following format:

[ {
	"box": {
		"x": Number,
		"y": Number,
		"width": Number,
		"height": Number
	}
} ]

You can omit text information by setting recognize = false, which is considerably faster.

Tesseract#findWords(recognize)

Returns an array of objects that describe words with the following format:

[ {
	"box": {
		"x": Number,
		"y": Number,
		"width": Number,
		"height": Number
	},
	"text": String,
	"confidence": Number
} ]

You can omit text information by setting recognize = false, which is considerably faster.
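The word list can be filtered by confidence before it is used. The following is a sketch on made-up sample data; confidentText is a hypothetical helper. This page does not state the scale of confidence, so the cutoff is left to the caller:

```javascript
// Hypothetical helper: joins the text of all words whose confidence
// is at least minConfidence.
function confidentText(words, minConfidence) {
	return words
		.filter(function (word) { return word.confidence >= minConfidence; })
		.map(function (word) { return word.text; })
		.join(' ');
}

// Made-up sample in the format returned by findWords():
var words = [
	{ box: { x: 0, y: 0, width: 50, height: 12 }, text: 'Invoice', confidence: 91 },
	{ box: { x: 55, y: 0, width: 8, height: 12 }, text: '~', confidence: 12 },
	{ box: { x: 70, y: 0, width: 40, height: 12 }, text: '2015', confidence: 88 }
];
console.log(confidentText(words, 50)); // Invoice 2015
```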

Tesseract#findSymbols(recognize)

Returns an array of objects that describe symbols with the following format:

[ {
	"box": {
		"x": Number,
		"y": Number,
		"width": Number,
		"height": Number
	},
	"text": String,
	"confidence": Number,
	"choices": [ {
		"text": String,
		"confidence": Number 
	} ]
} ]

You can omit text information by setting recognize = false, which is considerably faster.

Tesseract#findText(format, [withConfidence])

Returns text in the specified format. Valid formats are: plain, unlv. Setting withConfidence will return an object instead of a string:

[ {
	"text": String,
	"confidence": Number,
} ]

Tesseract#findText(format, pageNumber[, withConfidence])

Returns text in the specified format. Valid formats are: plain, unlv. Setting withConfidence will return an object instead of a string:

[ {
	"text": String,
	"confidence": Number,
} ]

ZXing

A ZXing object represents a barcode reader. By default it attempts to decode all barcode formats that ZXing supports.

ZXing#constructor()

Default constructor.

ZXing#constructor(image)

Initializes a barcode reader with the specified image as input.

ZXing#image

Accessor for the input image this barcode reader operates on.

ZXing#formats

List of barcodes the reader tries to find. It's specified as an object, and missing properties count as false:

{
	QR_CODE: true,
	DATA_MATRIX: true,
	PDF_417: true,
	UPC_E: true,
	UPC_A: true,
	EAN_8: true,
	EAN_13: true,
	CODE_128: true,
	CODE_39: true,
	ITF: true,
	AZTEC: true 
}
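Because missing properties count as false, a reader can be limited to a few formats by listing only those. The following sketch builds such an object; onlyFormats is a hypothetical helper, and assigning the result to ZXing#formats is assumed to work since formats is described as an object:

```javascript
// Hypothetical helper: builds a formats object that enables only the
// given barcode types. Types that are not listed stay undefined and
// therefore count as false.
function onlyFormats(names) {
	var formats = {};
	names.forEach(function (name) {
		formats[name] = true;
	});
	return formats;
}

var formats = onlyFormats(['QR_CODE', 'EAN_13']);
console.log(formats); // { QR_CODE: true, EAN_13: true }

// Assumed usage with a reader instance:
// zxing.formats = formats;
```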

ZXing#tryHarder

If try harder is enabled, the barcode reader spends more time trying to find a barcode (optimize for accuracy, not speed).

ZXing#findCode()

Returns the first barcode found as an object with the following format:

{
	type: String,
	data: String,
	buffer: Buffer,
	points: [ {
		x: Number,
		y: Number
	} ]
}

type denotes the barcode's type. Possible values of type are: None, QR_CODE, DATA_MATRIX, PDF_417, UPC_E, UPC_A, EAN_8, EAN_13, CODE_128, CODE_39, ITF, AZTEC. data denotes the stringified data read from the barcode. buffer denotes the decoded binary data of the barcode before conversion into another character encoding. points denotes the points in pixels which were used by the barcode reader to detect the barcode.
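The points can be turned into a Box, for example to crop or highlight the area around the detected barcode. The following is a sketch on made-up points; boundingBox is a hypothetical helper. The points only mark what the reader used for detection, so the box does not necessarily cover the whole barcode:

```javascript
// Hypothetical helper: smallest Box that contains all given points.
function boundingBox(points) {
	var xs = points.map(function (p) { return p.x; });
	var ys = points.map(function (p) { return p.y; });
	var left = Math.min.apply(null, xs);
	var top = Math.min.apply(null, ys);
	return {
		x: left,
		y: top,
		width: Math.max.apply(null, xs) - left,
		height: Math.max.apply(null, ys) - top
	};
}

// Made-up points in the format returned by findCode():
var box = boundingBox([
	{ x: 10, y: 20 },
	{ x: 110, y: 20 },
	{ x: 10, y: 120 }
]);
console.log(box); // { x: 10, y: 20, width: 100, height: 100 }
```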