Finding the Sharpest Image in a Sequence of Captured Images

Share image data between vDSP and vImage to compute the sharpest image from a bracketed photo sequence.

Overview

This sample code project captures a sequence of photographs and uses a combination of routines from vImage and vDSP to order the images by their relative sharpness. This technique is useful in applications such as an image scanner, where your user requires the least blurry captured image. After applying the routines, the app displays the images in a list, with the sharpest image at the top:

A screenshot of the sample app showing four rows. Each row contains, on the left, the original image, and, on the right, the convolved image. The images are ordered by decreasing sharpness.

This project uses SwiftUI to build the user interface; AVFoundation to capture a sequence of images; and a method known as the variance of Laplacian to determine the sharpness of each image.
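For context, you can express the complete measure compactly. The following is a minimal sketch of the variance-of-Laplacian score for an image that's already in single-precision, row-major grayscale form; it isn't the sample's exact code, and the laplacianSharpness name is illustrative only:

import Accelerate

// Score a grayscale image: convolve with a 3 x 3 Laplacian, then return
// the variance of the response. Higher scores indicate sharper images.
func laplacianSharpness(of pixels: [Float], width: Int, height: Int) -> Float {
    let laplacian: [Float] = [-1, -1, -1,
                              -1,  8, -1,
                              -1, -1, -1]
    
    var response = [Float](repeating: 0, count: pixels.count)
    vDSP.convolve(pixels,
                  rowCount: height,
                  columnCount: width,
                  with3x3Kernel: laplacian,
                  result: &response)
    
    var mean = Float.nan
    var stdDev = Float.nan
    vDSP_normalize(response, 1,
                   nil, 1,
                   &mean, &stdDev,
                   vDSP_Length(response.count))
    
    // The variance is the square of the standard deviation.
    return stdDev * stdDev
}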

This sample walks you through the steps to find the sharpest image in a sequence of captured images:

  1. Configure the capture session.
  2. Define the photo settings for the sequence of captured images.
  3. Acquire the images.
  4. Initialize the grayscale vImage buffer.
  5. Create floating point pixels to use in vDSP.
  6. Convolve the image using a 3 x 3 single-pass edge detection Laplacian kernel (the result of this convolution pass is also shown in the app's user interface).
  7. Calculate the variance, or how spread out the pixel values are, in the convolved image.
  8. Create a vImage buffer from the vDSP convolution result.
  9. Create a display image with correct orientation.

Configure the Capture Session

The 3 x 3 Laplacian kernel used in this sample produces a noisy result when applied to a full-resolution image. To reduce this noise, work with a downscaled image by setting the capture session's preset to a size that's smaller than the camera's native resolution:

captureSession.sessionPreset = .hd1280x720

To learn more about configuring a capture session, see Setting Up a Capture Session.
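The preset applies to a session that's configured with a camera input and a photo output. The following is a minimal sketch of that setup, assuming the app already has camera authorization; the camera choice and the omitted error handling are illustrative only:

import AVFoundation

let captureSession = AVCaptureSession()
let photoOutput = AVCapturePhotoOutput()

captureSession.sessionPreset = .hd1280x720

// Wire the default back camera to the session, and add the photo output.
if let camera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                        for: .video,
                                        position: .back),
   let input = try? AVCaptureDeviceInput(device: camera),
   captureSession.canAddInput(input),
   captureSession.canAddOutput(photoOutput) {
    captureSession.addInput(input)
    captureSession.addOutput(photoOutput)
    captureSession.startRunning()
}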

Define the Photo Settings

The sample defines the AVCapturePhotoBracketSettings object, which specifies the capture features and settings, in BlurDetector.takePhoto().

The sharpness detection algorithm in this sample works on a grayscale image. Use one of the camera's biplanar YpCbCr pixel formats, either kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange or kCVPixelFormatType_420YpCbCr8BiPlanarFullRange. These formats store the image's luminance in one plane and its chrominance in a separate plane.

The following code checks that the current device supports one or both of these formats:

let pixelFormat: FourCharCode = {
    if photoOutput.availablePhotoPixelFormatTypes
        .contains(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange) {
        return kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
    } else if photoOutput.availablePhotoPixelFormatTypes
        .contains(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange) {
        return kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
    } else {
        fatalError("No available YpCbCr formats.")
    }
}()

Create an array of AVCaptureAutoExposureBracketedStillImageSettings instances, setting the exposure target bias of each to AVCaptureDevice.currentExposureTargetBias. The maximum number of items in the array is defined by the maxBracketedCapturePhotoCount property of the AVCapturePhotoOutput object:

let exposureSettings = (0 ..< photoOutput.maxBracketedCapturePhotoCount).map { _ in
    AVCaptureAutoExposureBracketedStillImageSettings.autoExposureSettings(
        exposureTargetBias: AVCaptureDevice.currentExposureTargetBias)
}

Use the array of exposure settings and the first available YpCbCr format type to define the bracketed settings:

let photoSettings = AVCapturePhotoBracketSettings(
    rawPixelFormatType: 0,
    processedFormat: [kCVPixelBufferPixelFormatTypeKey as String: pixelFormat],
    bracketedSettings: exposureSettings)

Use the AVCapturePhotoBracketSettings instance to capture the sequence of images:

photoOutput.capturePhoto(with: photoSettings,
                         delegate: self)

Acquire the Captured Image

For each captured image in the bracket, AVFoundation calls the delegate's photoOutput(_:didFinishProcessingPhoto:error:) method.

Use the pixelBuffer property of the AVCapturePhoto instance supplied by AVFoundation to acquire the uncompressed CVPixelBuffer that contains the captured photograph. While your code is accessing the pixel data of the pixel buffer, use CVPixelBufferLockBaseAddress to lock the base address:

guard let pixelBuffer = photo.pixelBuffer else {
    fatalError("Error acquiring pixel buffer.")
}

CVPixelBufferLockBaseAddress(pixelBuffer,
                             CVPixelBufferLockFlags.readOnly)

The pixel buffer vended by AVFoundation contains two planes; the plane at index zero contains the luminance data. So that the sharpness detection code can run on a background thread after the pixel buffer is unlocked, use copyMemory to create a copy of the luminance data:

let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)

let lumaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
let lumaRowBytes = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)

// Copy whole rows, including any padding, so that `lumaRowBytes` remains
// a valid stride for the copied data.
let lumaByteCount = lumaRowBytes * height

let lumaCopy = UnsafeMutableRawPointer.allocate(byteCount: lumaByteCount,
                                                alignment: MemoryLayout<Pixel_8>.alignment)
lumaCopy.copyMemory(from: lumaBaseAddress!,
                    byteCount: lumaByteCount)

You can now unlock the pixel buffer's base address and pass the copied luminance data to the processing function in a background thread:

CVPixelBufferUnlockBaseAddress(pixelBuffer,
                               CVPixelBufferLockFlags.readOnly)

DispatchQueue.global(qos: .utility).async {
    self.processImage(data: lumaCopy,
                      rowBytes: lumaRowBytes,
                      width: width,
                      height: height,
                      sequenceCount: photo.sequenceCount,
                      expectedCount: photo.resolvedSettings.expectedPhotoCount,
                      orientation: photo.metadata[ String(kCGImagePropertyOrientation) ] as? UInt32)
    
    lumaCopy.deallocate()
}

Initialize the Grayscale vImage Source Buffer

Create a vImage buffer from the luminance data passed to the processImage function:

var sourceBuffer = vImage_Buffer(data: data,
                                 height: vImagePixelCount(height),
                                 width: vImagePixelCount(width),
                                 rowBytes: rowBytes)

On return, sourceBuffer contains a grayscale representation of the captured image.

Create Floating Point Pixels to Use in vDSP

vImage buffers store their image data in row-major format. However, when passing data between vImage and vDSP, be aware that in some cases vImage adds extra padding bytes at the end of each row. For example, the following code declares an 8-bit-per-pixel buffer that's 10 pixels wide:

let buffer = try? vImage_Buffer(width: 10,
                                height: 5,
                                bitsPerPixel: 8)

Although the code defines a buffer with 10 pixels, and therefore 10 bytes, per row, vImageBuffer_Init may initialize the buffer with 16 bytes per row to maximize performance:

Diagram showing the visible pixels and the padding of a vImage buffer.
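You can observe this padding directly. The following sketch initializes the 10 x 5 buffer described above and prints its geometry; the exact rowBytes value depends on the platform, so treat 16 as an example rather than a guarantee:

import Accelerate

if var buffer = try? vImage_Buffer(width: 10, height: 5, bitsPerPixel: 8) {
    defer { buffer.free() }
    // Typically prints "width: 10 rowBytes: 16", that is, six padding bytes per row.
    print("width:", buffer.width, "rowBytes:", buffer.rowBytes)
}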

In some cases, this disparity between the row bytes used to hold image data and the buffer's actual row bytes doesn't affect your app's results. For this sample, compare the source buffer's rowBytes property against its width multiplied by the stride of a Pixel_8. If the two values are equal, there's no row-byte padding, and you can pass a pointer to the vImage buffer's data directly to vDSP's integerToFloatingPoint method:

var floatPixels: [Float]
let count = width * height

if sourceBuffer.rowBytes == width * MemoryLayout<Pixel_8>.stride {
    let start = sourceBuffer.data.assumingMemoryBound(to: Pixel_8.self)
    floatPixels = vDSP.integerToFloatingPoint(
        UnsafeMutableBufferPointer(start: start,
                                   count: count),
        floatingPointType: Float.self)

However, when there is row-byte padding, create an intermediate vImage buffer with explicit row bytes and use vImage's vImageConvert_Planar8toPlanarF function to populate floatPixels:

} else {
    floatPixels = [Float](unsafeUninitializedCapacity: count) {
        buffer, initializedCount in
        
        var floatBuffer = vImage_Buffer(data: buffer.baseAddress,
                                        height: sourceBuffer.height,
                                        width: sourceBuffer.width,
                                        rowBytes: width * MemoryLayout<Float>.size)
        
        // The maxFloat and minFloat arguments, 255 and 0, map the Pixel_8
        // range to the same 0 ... 255 range in single precision.
        vImageConvert_Planar8toPlanarF(&sourceBuffer,
                                       &floatBuffer,
                                       255, 0,
                                       vImage_Flags(kvImageNoFlags))
        
        initializedCount = count
    }
}

Perform the Convolution

The Laplacian kernel finds edges in the single-precision pixel values. Define the kernel as an array:

let laplacian: [Float] = [-1, -1, -1,
                          -1,  8, -1,
                          -1, -1, -1]

Use the vDSP convolve function to perform the convolution in place on the floatPixels array:

vDSP.convolve(floatPixels,
              rowCount: height,
              columnCount: width,
              with3x3Kernel: laplacian,
              result: &floatPixels)

After the convolution, edges in the image have high values. The following image shows the result after convolution using the Laplacian kernel:

Photograph after Laplacian convolution. Edges appear as bright areas against a dark background.
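To build intuition for why this kernel measures sharpness, consider a tiny synthetic image. The following sketch, which uses hypothetical values and isn't part of the sample, convolves a hard vertical step edge; flat regions produce zero response, and the step produces large positive and negative values:

import Accelerate

let width = 8
let height = 8

// A step edge: the left half of the image is 0, the right half is 255.
var pixels = [Float](repeating: 0, count: width * height)
for row in 0 ..< height {
    for column in (width / 2) ..< width {
        pixels[row * width + column] = 255
    }
}

let laplacian: [Float] = [-1, -1, -1,
                          -1,  8, -1,
                          -1, -1, -1]

var result = [Float](repeating: 0, count: width * height)
vDSP.convolve(pixels,
              rowCount: height,
              columnCount: width,
              with3x3Kernel: laplacian,
              result: &result)

// Away from the image border, pixels in flat regions are 0, and the pixels
// on either side of the step are -765 and +765: a sharp edge yields high variance.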

Calculate the Standard Deviation

Use the vDSP_normalize function to calculate the standard deviation of the pixel values after the edge detection:

var mean = Float.nan
var stdDev = Float.nan

// Passing `nil` for the output vector computes the mean and the standard
// deviation without writing a normalized copy of the input.
vDSP_normalize(floatPixels, 1,
               nil, 1,
               &mean, &stdDev,
               vDSP_Length(count))

On return, stdDev contains the standard deviation of the convolved pixel values, and the sample uses this value as its measure of relative sharpness. Because the variance is the square of the standard deviation, ordering images by standard deviation is equivalent to ordering them by variance of Laplacian: an image whose edge response varies more contains more detail, so higher values indicate sharper images.
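After every image in the bracket has a score, ordering the sequence is a simple sort. The following sketch assumes a hypothetical BlurDetectionResult type that pairs each captured image with its standard deviation; the sample's actual model types may differ:

import CoreGraphics

// One captured image and its sharpness score. This type is hypothetical.
struct BlurDetectionResult {
    let index: Int
    let image: CGImage?
    let score: Float // The standard deviation of the Laplacian response.
}

// Sort so that the sharpest image, the one with the highest score, is first.
func rankedBySharpness(_ results: [BlurDetectionResult]) -> [BlurDetectionResult] {
    return results.sorted { $0.score > $1.score }
}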

Create a vImage Buffer from the vDSP Convolution Result

To display the result of the convolution, create a vImage buffer from the pixel data in floatPixels. The following code clips the result to 0 ... 255; the clipping ensures there's no overflow when floatingPointToInteger(_:integerType:rounding:) converts the single-precision values to unsigned 8-bit integers:

let clippedPixels = vDSP.clip(floatPixels, to: 0 ... 255)
var pixel8Pixels = vDSP.floatingPointToInteger(clippedPixels,
                                               integerType: UInt8.self,
                                               rounding: .towardNearestInteger)
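The clipping step matters because the Laplacian response isn't confined to the Pixel_8 range. The following sketch, using hypothetical values, shows the effect of vDSP.clip on out-of-range responses:

import Accelerate

let response: [Float] = [-300, -1, 0, 127.5, 255, 512]
let clipped = vDSP.clip(response, to: 0 ... 255)
// clipped is [0, 0, 0, 127.5, 255, 255].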

Pass pixel8Pixels to makeImage(fromPixels:width:height:gamma:orientation:). This function creates a new vImage buffer, using preferredAlignmentAndRowBytes(width:height:bitsPerPixel:) to compute the ideal row bytes for the image width:

static func makeImage(fromPixels pixels: inout [Pixel_8],
                      width: Int,
                      height: Int,
                      gamma: Float,
                      orientation: CGImagePropertyOrientation) -> CGImage? {
    
    let alignmentAndRowBytes = try? vImage_Buffer.preferredAlignmentAndRowBytes(
        width: width,
        height: height,
        bitsPerPixel: 8)
    
    let image: CGImage? = pixels.withUnsafeMutableBufferPointer {
        var buffer = vImage_Buffer(data: $0.baseAddress!,
                                   height: vImagePixelCount(height),
                                   width: vImagePixelCount(width),
                                   rowBytes: alignmentAndRowBytes?.rowBytes ?? width)
        
        // With a boundary of 0, the exponential curve, (1 x pixel + 0)^gamma,
        // applies to every pixel.
        vImagePiecewiseGamma_Planar8(&buffer,
                                     &buffer,
                                     [1, 0, 0],
                                     gamma,
                                     [1, 0],
                                     0,
                                     vImage_Flags(kvImageNoFlags))
        
        return BlurDetector.makeImage(fromPlanarBuffer: buffer,
                                      orientation: orientation)
    }
    
    return image
}

makeImage(fromPixels:width:height:gamma:orientation:) applies a gamma function to the Laplacian result to improve its visibility in the user interface. To learn more about using gamma functions in vImage, see Adjusting the Brightness and Contrast of an Image.
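For example, a call site might boost the visibility of faint edges by passing a gamma value less than 1. The following call is a sketch; the 0.5 value is an assumption rather than the sample's actual gamma, and it assumes pixel8Pixels, width, height, and the photo's orientation are in scope:

// Render the convolution result with a response-boosting gamma.
let convolvedImage = BlurDetector.makeImage(fromPixels: &pixel8Pixels,
                                            width: width,
                                            height: height,
                                            gamma: 0.5,
                                            orientation: orientation)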

Create a Display Image with Correct Orientation

Use the vImage 90º rotation functions in conjunction with the captured photo's CGImagePropertyOrientation to create a vImage buffer suitable for display in the app. The static BlurDetector.makeImage(fromPlanarBuffer:orientation:) function accepts a planar buffer (either the grayscale representation of the captured image or the result of the convolution) and the orientation, and returns a CGImage instance:

static func makeImage(fromPlanarBuffer sourceBuffer: vImage_Buffer,
                      orientation: CGImagePropertyOrientation) -> CGImage? {
    
    guard let monoFormat = vImage_CGImageFormat(bitsPerComponent: 8,
                                                bitsPerPixel: 8,
                                                colorSpace: CGColorSpaceCreateDeviceGray(),
                                                bitmapInfo: []) else {
        return nil
    }

For landscape images, that is, images with an orientation of .left or .right, the function creates a destination buffer with a width equal to the height, and a height equal to the width, of the supplied buffer. For portrait images, that is, images with an orientation of .up or .down, the function creates a destination buffer with the same dimensions as the supplied buffer:

var outputBuffer: vImage_Buffer
var outputRotation: Int

do {
    if orientation == .right || orientation == .left {
        outputBuffer = try vImage_Buffer(width: Int(sourceBuffer.height),
                                         height: Int(sourceBuffer.width),
                                         bitsPerPixel: 8)
        
        outputRotation = orientation == .right ?
            kRotate90DegreesClockwise : kRotate90DegreesCounterClockwise
    } else if orientation == .up || orientation == .down {
        outputBuffer = try vImage_Buffer(width: Int(sourceBuffer.width),
                                         height: Int(sourceBuffer.height),
                                         bitsPerPixel: 8)
        outputRotation = orientation == .down ?
            kRotate180DegreesClockwise : kRotate0DegreesClockwise
    } else {
        return nil
    }
} catch {
    return nil
}

The function populates the destination buffer using vImageRotate90_Planar8, and returns nil if the rotation fails:

var error = kvImageNoError

withUnsafePointer(to: sourceBuffer) { src in
    error = vImageRotate90_Planar8(src,
                                   &outputBuffer,
                                   UInt8(outputRotation),
                                   0,
                                   vImage_Flags(kvImageNoFlags))
}

guard error == kvImageNoError else {
    return nil
}

Finally, the function creates a CGImage from the destination buffer and frees the buffer's memory:

defer {
    outputBuffer.free()
}

return try? outputBuffer.createCGImage(format: monoFormat)
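A call site can use the same function to create the display image for the grayscale capture. The following call is a sketch that assumes sourceBuffer and the photo's orientation are in scope:

// Create a correctly oriented grayscale image for display.
let grayscaleImage = BlurDetector.makeImage(fromPlanarBuffer: sourceBuffer,
                                            orientation: orientation)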