# OCR on packaging using Google Vision API - can we extract ingredients or nutritional information?

Initial exploratory work using the Google Vision API - text detection and document detection. 
- Attempted to use the text and document detection functionality out of the box but it proved to be imperfect. 
- Experimented with whether paragraph or block detection works better using the document detection API.
- Tests on whole package images and on photos of just the ingredients for the Open Food Facts db. 
- Simple flow (WIP) to extract the list of ingredients. 
- Various issues encountered and possible solutions documented in the notes file. 


In [16]:
import io
import os
import cv2 as cv
import numpy as np

# Imports the Google Cloud client library
from google.cloud import vision
from google.cloud.vision import types

from enum import Enum
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt 
%matplotlib inline

# Instantiates a client
client = vision.ImageAnnotatorClient()

Use some test images: those in the 'test_images/whole' folder are photos of a whole side of the packaging, and those in 'test_images/partial' show just the part we're interested in, i.e. just the ingredients or just the nutrition information.

In [2]:
tests_whole = {}
for img in os.listdir('test_images/whole'):
    tests_whole[img.split('.')[0]] = os.path.join('test_images/whole', img)

tests_partial = {}
for img in os.listdir('test_images/partial'):
    tests_partial[img.split('.')[0]] = os.path.join('test_images/partial', img)

## Attempt 1: Google text detection API 
- Using this example: https://cloud.google.com/vision/docs/detecting-text#vision-text-detection-python
- Not very useful in this case as it simply outputs all the text. With multiple columns, text boxes etc. it doesn't group the text correctly but reads it horizontally across them. Note that the same problem happens with the document detection API, too (see below).
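One possible way around the horizontal reading order is to re-sort the detected words ourselves: start a new column wherever there is a wide horizontal gap, then read each column top to bottom. The sketch below is only an illustration of that idea, not something the API provides. It assumes the words are already available as hypothetical `(text, x, y, w, h)` tuples, and the `min_gap` threshold is an untuned guess.

```python
def group_into_columns(words, min_gap=40):
    """Split (text, x, y, w, h) word boxes into columns at wide horizontal gaps."""
    if not words:
        return []
    # Sort by left edge so the gap to the words seen so far can be measured.
    ordered = sorted(words, key=lambda wd: wd[1])
    columns = [[ordered[0]]]
    right_edge = ordered[0][1] + ordered[0][3]
    for wd in ordered[1:]:
        _, x, _, w, _ = wd
        if x - right_edge > min_gap:
            # Wide empty strip to the left of this word: start a new column.
            columns.append([])
        columns[-1].append(wd)
        right_edge = max(right_edge, x + w)
    # Within a column, read top to bottom, then left to right.
    return [sorted(col, key=lambda wd: (wd[2], wd[1])) for col in columns]


def columns_to_text(columns):
    """Join the words of each column into one string."""
    return [' '.join(wd[0] for wd in col) for col in columns]


words = [('Store', 0, 0, 50, 10), ('in', 55, 0, 20, 10), ('a', 0, 20, 10, 10),
         ('Energy', 300, 0, 60, 10), ('kcal', 300, 20, 40, 10)]
print(columns_to_text(group_into_columns(words)))
# ['Store in a', 'Energy kcal']
```

Sorting on the exact `y` value is only adequate for this toy input; on real photos the words of one line rarely share the same `y`, so they would first need to be bucketed into lines.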

In [3]:
def detect_text(path):
    """Detects text in the file."""
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.types.Image(content=content)

    response = client.text_detection(image=image)
    texts = response.text_annotations
    print('Texts:')

    for text in texts:
        print('\n"{}"'.format(text.description))

        vertices = (['({},{})'.format(vertex.x, vertex.y)
                    for vertex in text.bounding_poly.vertices])

        print('bounds: {}'.format(','.join(vertices)))

In [10]:
# test with examples to get a feel for the output
detect_text(tests_whole['test_2'])

Texts:

"200
f an honds, Brazil nuts, cashews and wairu's
EGAN
or best before end: see ba
Store in a cool dry place. Once
tain freshness
Great for all ofu
|
Source of Protein: Protek
the growth and mai
muscle mass.
as part of a
for best betoilypla
Enjowced diet and a
Great to know
utrition
values per 100g per 30g
Iladul RI
serving Der 300
Not Yet Recycled
2665 799
We're sure you't love this
product. li you don't,simpl
return for afull refund
Or, call our careline 080522
Your statutory rights are not affected
Produce of more than one county
Packed in the UK for Sainsbus
Supermarkets Ltd, London ECIYCH
Packaged in a protectiveamaphee
8100
193
644
ukcal
170
Try me!
70
24
mono-tantes 240g 72g
olyunsatuas 219g 6.6g
20g
1% 260g
196
ich su
16 90g
bre
1.99
6.3q
01g
Reference Intakes of an aeage adult (8400kJ72000kcal)
ingredients. Almonds (25 %), Brazil Nuts (25%),
ew Nut (2590), Walnut Halves (25%).
otein
13%
50g
Want to find out more?
sainsburus.co.uk
0518
1060768
llergy
advice For allergens

## Attempt 2: Google document detection API 
- Using (and modifying) this example: https://cloud.google.com/vision/docs/fulltext-annotations
- This doesn't work very well when the layout has multiple columns (see the notes under Attempt 1).

### helper functions for visualising bounding boxes

In [11]:
class FeatureType(Enum):
    PAGE = 1
    BLOCK = 2
    PARA = 3
    WORD = 4
    SYMBOL = 5

In [12]:
def draw_boxes(image, bounds, color):
    """Draw a border around the image using the hints in the vector list."""
    draw = ImageDraw.Draw(image)

    for bound in bounds:
        draw.polygon([
            bound.vertices[0].x, bound.vertices[0].y,
            bound.vertices[1].x, bound.vertices[1].y,
            bound.vertices[2].x, bound.vertices[2].y,
            bound.vertices[3].x, bound.vertices[3].y], None, color)
    return image

In [13]:
def render_doc_text(filein, fileout, block=True, para=False, word=False):
    image = Image.open(filein)
    if block: 
        bounds = get_document_bounds(filein, FeatureType.BLOCK)
        draw_boxes(image, bounds, 'blue')
    if para: 
        bounds = get_document_bounds(filein, FeatureType.PARA)
        draw_boxes(image, bounds, 'red')
    if word: 
        bounds = get_document_bounds(filein, FeatureType.WORD)
        draw_boxes(image, bounds, 'yellow')

    if fileout != 0:
        image.save(fileout)
    else:
        image.show()

### detect document i.e. all the parts, and draw them 

In [17]:
def get_document_bounds(image_file, feature):
    """Returns document bounds given an image."""
    client = vision.ImageAnnotatorClient()

    bounds = []

    with io.open(image_file, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)

    response = client.document_text_detection(image=image)
    document = response.full_text_annotation

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    for symbol in word.symbols:
                        if (feature == FeatureType.SYMBOL):
                            bounds.append(symbol.bounding_box)

                    if (feature == FeatureType.WORD):
                        bounds.append(word.bounding_box)

                if (feature == FeatureType.PARA):
                    bounds.append(paragraph.bounding_box)

            if (feature == FeatureType.BLOCK):
                bounds.append(block.bounding_box)

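        if (feature == FeatureType.PAGE):
            # NOTE: the page object has no bounding box of its own, so this
            # appends the box of the last block iterated on the page.
            bounds.append(block.bounding_box)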

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds

In [20]:
# run some tests to get a feel of the bounding boxes; note that these pop up in a new window 
render_doc_text(tests_whole['test_2'], 0)

## Attempt 3: Experiment with paragraph and block detection 
- Using the out-of-the-box paragraph and block detection on the whole side of the packaging to determine how well each works and which one works better; see examples and discussion.
- Made a start at combining the bboxes of individual words to find better clusters of text than the out-of-the-box functionality provides.
- This is WIP

In [21]:
def detect_paragraph(path):
    """
    Returns the text of each paragraph, using the paragraph
    boundaries exactly as returned by the API
    """
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    response = client.document_text_detection(image=image)
    paragraph_texts = []
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                #print('paragraph bounding box: ', paragraph.bounding_box)
                #print('Paragraph confidence: {}'.format(paragraph.confidence))
                paragraph_words = []
                for word in paragraph.words:
                    word_text = ''.join([symbol.text for symbol in word.symbols])
                    paragraph_words.append(word_text)
                paragraph_text = ' '.join(paragraph_words)
                paragraph_texts.append(paragraph_text)
    return paragraph_texts

In [22]:
def detect_block(path):
    """
    Returns the text of each block, using the block
    boundaries exactly as returned by the API
    """
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    response = client.document_text_detection(image=image)
    block_texts = []
    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            block_words = []
            for paragraph in block.paragraphs:
                #print('paragraph bounding box: ', paragraph.bounding_box)
                #print('Paragraph confidence: {}'.format(paragraph.confidence))
                paragraph_words = []
                for word in paragraph.words:
                    word_text = ''.join([symbol.text for symbol in word.symbols])
                    paragraph_words.append(word_text)
                paragraph_text = ' '.join(paragraph_words)
                block_words.append(paragraph_text)
            block_text = ' '.join(block_words)
            block_texts.append(block_text)
    return block_texts

In [35]:
# play with some examples to see if the block or the paragraph functionality of the API is generally better
paragraph_texts = detect_paragraph(tests_whole['test_40'])
block_texts = detect_block(tests_whole['test_40'])

In [36]:
paragraph_texts

['HELLMANNS LIGHT REDUCED CALORIE MAYONNAISE TRY OUR RANGE OF SAUCES FOR MORE GREAT FLAVOUR :',
 "HELMANN ' S",
 'SMOKEY BBQ',
 'SAUCE Ingredients : water , rapeseed oil ( 25 % ) , spirit vine',
 'anne EGG yolk ( 1 . 5 % ) , cream powder ( MILK ) , citrus fibre , thickeners ( a',
 'CHUNKY KETCHUP',
 'BURGER SWEETENED WITH HONEY',
 'SAUCE seed oil ( 25 % ) , spirit vinegar , modified corn starch , sugar , salt , free',
 '( MILK ) , citrus fibre , thickeners ( guar gum , xanthan gum ) , emon juice concentrate , antioxidant ( calcium disodium EDTA ) , natural MUSTARD favouring , paprika extract , sunflower oil . A good source of Omega 3',
 "stainably sourced oils . For more info , Unilever UK , Hellmann ' s . Committed to sustainably sourced oils For more",
 'Freepost ADM 3940 London visit www . hellmanns . co . uk or www . hellmanns . ie .',
 "SW1A 1YR . 60 % less calories than Hellmann ' s Real Mayonnaise",
 'Unilever Ireland , 20 Riverwalk , . . . . . . NUTRITION INFORMATION . . . . . 

In [37]:
block_texts

['HELLMANNS LIGHT REDUCED CALORIE MAYONNAISE TRY OUR RANGE OF SAUCES FOR MORE GREAT FLAVOUR :',
 "HELMANN ' S",
 'SMOKEY BBQ',
 'SAUCE Ingredients : water , rapeseed oil ( 25 % ) , spirit vine',
 'anne EGG yolk ( 1 . 5 % ) , cream powder ( MILK ) , citrus fibre , thickeners ( a',
 "CHUNKY KETCHUP BURGER SWEETENED WITH HONEY SAUCE seed oil ( 25 % ) , spirit vinegar , modified corn starch , sugar , salt , free ( MILK ) , citrus fibre , thickeners ( guar gum , xanthan gum ) , emon juice concentrate , antioxidant ( calcium disodium EDTA ) , natural MUSTARD favouring , paprika extract , sunflower oil . A good source of Omega 3 stainably sourced oils . For more info , Unilever UK , Hellmann ' s . Committed to sustainably sourced oils For more Freepost ADM 3940 London visit www . hellmanns . co . uk or www . hellmanns . ie . SW1A 1YR . 60 % less calories than Hellmann ' s Real Mayonnaise Unilever Ireland , 20 Riverwalk , . . . . . . NUTRITION INFORMATION . . . . . . . . . . . . . . . . livnic

Given these results, neither the paragraph nor the block option gives very good results as it is. We therefore start exploring the option of manually combining text based on the bounding boxes of the individual words. Some first attempts at this route are below.

In [38]:
def get_word_bounds(image_file, feature=FeatureType.WORD):
    """
    Returns the bounding boxes of words together with the words 
    """
    client = vision.ImageAnnotatorClient()

    bounds = []

    with io.open(image_file, 'rb') as image_file:
        content = image_file.read()

    image = types.Image(content=content)

    response = client.document_text_detection(image=image)
    document = response.full_text_annotation
    
    words = []

    # Collect specified feature bounds by enumerating all document features
    for page in document.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    word_text = ''.join([symbol.text for symbol in word.symbols])
                    words.append(word_text)
                    for symbol in word.symbols:
                        if (feature == FeatureType.SYMBOL):
                            bounds.append(symbol.bounding_box)

                    if (feature == FeatureType.WORD):
                        bounds.append(word.bounding_box)

    # The list `bounds` contains the coordinates of the bounding boxes.
    return bounds, words

### bbox merging 
WIP: currently this can merge any given set of boxes, which works quite well. Note, however, that the format of the bboxes returned by the API differs from what the handy CV functions below expect, so make sure to convert. To use this in practice we still need to define criteria for which boxes should be merged.
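One hypothetical merging criterion is proximity: treat two boxes as belonging together when the gap between them is small, and let that relation chain across neighbours. The sketch below works on plain `[x, y, w, h]` boxes. The names `box_gap` and `cluster_boxes` and the `max_gap` threshold are illustrative assumptions, and nothing here has been tuned on the test images.

```python
def box_gap(a, b):
    """Gap in pixels between two [x, y, w, h] boxes; 0 if they touch or overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = max(bx - (ax + aw), ax - (bx + bw), 0)  # horizontal separation
    dy = max(by - (ay + ah), ay - (by + bh), 0)  # vertical separation
    return max(dx, dy)


def cluster_boxes(boxes, max_gap=10):
    """Group box indices that are (transitively) within max_gap of each other."""
    unvisited = set(range(len(boxes)))
    clusters = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, cluster = [start], [start]
        while stack:
            i = stack.pop()
            # Collect the neighbours first, then remove them from the set.
            near = [j for j in unvisited if box_gap(boxes[i], boxes[j]) <= max_gap]
            for j in near:
                unvisited.remove(j)
                stack.append(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return clusters


boxes = [[0, 0, 50, 10], [55, 0, 40, 10], [0, 14, 30, 10], [300, 0, 60, 10]]
print(cluster_boxes(boxes))
# [[0, 1, 2], [3]]
```

The boxes of each resulting cluster could then be handed to the merging function below to obtain one combined box per cluster.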

In [39]:
def get_xywh(image_file, example_number, feature=FeatureType.WORD): 
    bounds, _ = get_word_bounds(image_file, feature)
    x = bounds[example_number].vertices[0].x 
    y = bounds[example_number].vertices[0].y
    w = bounds[example_number].vertices[1].x - bounds[example_number].vertices[0].x
    h = bounds[example_number].vertices[2].y - bounds[example_number].vertices[1].y
    return [x, y, w, h]
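`get_xywh` relies on the first vertex being the top-left corner, which holds for upright text but not necessarily for rotated text, because the API orders vertices relative to the text orientation. A more defensive conversion, sketched here, takes the minimum and maximum over all four vertices. `polygon_to_xywh` is a hypothetical helper and it assumes the vertices have already been unpacked into plain `(x, y)` pairs.

```python
def polygon_to_xywh(vertices):
    """Axis-aligned [x, y, w, h] box enclosing a list of (x, y) vertices."""
    xs = [vx for vx, _ in vertices]
    ys = [vy for _, vy in vertices]
    x, y = min(xs), min(ys)
    return [x, y, max(xs) - x, max(ys) - y]


# The same rectangle, with the vertex list starting at two different corners.
print(polygon_to_xywh([(10, 20), (60, 20), (60, 35), (10, 35)]))
# [10, 20, 50, 15]
print(polygon_to_xywh([(60, 35), (10, 35), (10, 20), (60, 20)]))
# [10, 20, 50, 15]
```

With the API objects used in this notebook, the input could be built as `[(v.x, v.y) for v in bound.vertices]`.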

In [40]:
def get_conbined_box(contourRects):
    """
    Takes an array of [x, y, w, h] rectangles and returns 
    the coordinates of the 4 outer corners of the 
    combined box; note that (x, y) is the top-left 
    corner of each rectangle in image coordinates
    """
    arr = []
    for x,y,w,h in contourRects:
        arr.append((x,y))
        arr.append((x+w,y+h))

    # minAreaRect needs 32-bit points; the default integer dtype may be 64-bit
    box = cv.minAreaRect(np.asarray(arr, dtype=np.float32))
    pts = cv.boxPoints(box) # 4 outer corners
    return pts

### test - try on just the first 3 words 
Result: running the cells below suggests this works: the three words end up enclosed in a single combined box.

In [44]:
filein = tests_whole['test_1']
bounds, words = get_word_bounds(filein, FeatureType.WORD)

In [45]:
contourRects= np.array([get_xywh(filein, 0), get_xywh(filein, 1), get_xywh(filein, 2)])
pts = get_conbined_box(contourRects)

In [46]:
image = Image.open(filein)
draw = ImageDraw.Draw(image)
draw.polygon(pts, None, 'blue')
image.show()

## Tests 
Going over the test images to get a feel for what needs to be done to improve the text and bounding-box detection. Block detection seems to work slightly better than paragraph detection, so we use it as the starting point.

### observations on tests - whole
- Numbers are not recognised reliably, which is a problem for the nutrition info: sometimes a few numbers are separated from the rest of the block or paragraph, and sometimes characters such as the letter 'g' or '.' are not detected correctly.
- Lots of issues with block detection even for the ingredients, e.g. in test_2 the ingredients are split even though they are on consecutive lines.
- test_4 fails badly: it doesn't detect the nutrition box and mixes up the paragraphs. Text in columns and boxes is not detected properly, except, I think, when these are on a colourful background.
- test_3 and test_7 work quite well, when the box layout is clean and there is no distortion in the photo.
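Some of the misread units could perhaps be repaired after the fact. For instance, the text detection output above contains `6.3q`, which is presumably `6.3g`. The sketch below applies one such heuristic: a lone `q` directly after a number is rewritten to `g`. This is an assumption about the data rather than a verified rule, and `fix_gram_suffix` is a hypothetical helper; other confusions (such as `g` read as `9`) are ambiguous and deliberately left alone.

```python
import re

# A number (optionally with a decimal part) immediately followed by a lone 'q'.
_Q_AS_G = re.compile(r'\b(\d+(?:\.\d+)?)q\b')


def fix_gram_suffix(text):
    """Rewrite '<number>q' to '<number>g'; everything else is left untouched."""
    return _Q_AS_G.sub(r'\1g', text)


print(fix_gram_suffix('Fibre 6.3q Salt 01g'))
# Fibre 6.3g Salt 01g
```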

### observations on tests - partial 
- Using photos from www.world.openfoodfacts.org 
- Small sample of ingredients-only photos to see how well this does on clean pictures.
- Observations: curved surfaces are very tricky, as the text gets split into multiple odd boxes. There should be algorithms for flattening out the images first; it is also worth playing with contrast or colours to see if this improves box detection.

In [47]:
def test_photos(photo_num, whole=True):
    # whole is True for the whole images folder and False for the partial ones
    if whole: 
        img_name = 'test_images/whole/test_' + str(photo_num) + '.jpg'
        render_doc_text(img_name, 'test_images/whole/annotated_test_' + str(photo_num) + '.jpg', block=True)
    else: 
        img_name = 'test_images/partial/test_' + str(photo_num) + '.jpg'
        render_doc_text(img_name, 'test_images/partial/annotated_test_' + str(photo_num) + '.jpg', block=True)
    paragraph_texts = detect_paragraph(img_name)
    block_texts = detect_block(img_name)
    return block_texts

In [52]:
test_photos(5)

['200mle',
 'Beiersdorf Beiersdorf AG , D - 20245 Hamburg Art . 89050 www . NIVEA . com Ingredients : Aqua , Glycerin , Paraffinum Liquidum . Myristyl Alcohol , Butylene Glycol , Alcohol Denat . , Stearic Acid , Myristyl Myristate , Cera Microcristallina , Glyceryl Stearate , Hydrogenated Coco - Glycerides , Simmondsia Chinensis Seed Oil , Tocopheryl Acetate , Lanolin Alcohol ( Eucerito ) . Polyglyceryl - 2 Caprate , Dimethicone , Sodium Carbomer , Phenoxyethanol , Linalool , Citronellol , Alpha - Isomethyllonone , Butylphenyl Methylpropional , Limonene , Benzyl Alcohol , Benzyl Salicylate , Parfum Beiersdorf UK Ltd . , Birmingham 637 7YS . RSA : Beiersdort , 21 Lighthouse Road , Umhlanga , 4319 , RSA Consumer Careline : 0860 102091 . Beiersdorf Australia Ltd . , 4 Khartoum Road , North Ryde , NSW , 2113 . NZ : Freephone : 0800 696 483 . Made in Spain',
 '12M',
 'Germany Beiersdorf AG , = reg . tm . of',
 '89050 . 450 . AD . 05',
 '5 " 025970 " 022574 " 81224574']

## Ingredients detection 
- Based on observations above, try to detect the ingredients using keyword search and taking out the block that contains that word.
- There will often be other data in the same block and we need to think about how to filter that out. 
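One simple way to filter out trailing data from the same block might be to cut the text at the first phrase that typically starts a different section. The sketch below does this with a small list of stop markers. The marker list is a hypothetical starting point based on phrases seen in the test outputs, not a validated vocabulary, and `trim_at_stop_marker` is an illustrative name.

```python
import re

# Hypothetical phrases that tend to start a non-ingredients section.
STOP_MARKERS = ('allergy advice', 'nutrition', 'store in',
                'best before', 'careline', 'made in')


def trim_at_stop_marker(text, markers=STOP_MARKERS):
    """Cut text at the first stop marker (case-insensitive), if there is one."""
    pattern = re.compile('|'.join(re.escape(m) for m in markers), re.IGNORECASE)
    match = pattern.search(text)
    cut = match.start() if match else len(text)
    # Drop separators left dangling at the cut point.
    return text[:cut].rstrip(' .,;:')


print(trim_at_stop_marker('Almonds (25%), Walnut Halves (25%). Allergy advice: see bold'))
# Almonds (25%), Walnut Halves (25%)
```

This only helps when the unrelated text comes after the ingredients and contains one of the markers; company addresses and similar text would need further rules.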

In [53]:
# simplest possible option where we just look for the word 'ingredients'
# and retrieve everything after it 

def find_text_after_word(text, word='ingredients'):
    '''
    looks for the word 'ingredients' in the 
    lowercased text and returns all the text 
    after this word
    '''
    if word in text.lower():
        loc_word = text.lower().find(word)
        # adding the +1 as there is often : or space after 
        # the word 'ingredients'
        text_after_word = text[loc_word+len(word)+1:]
        # strip surrounding whitespace, a leading ':' and a trailing '.'
        # (there is often a sentence end), then any whitespace left over
        return text_after_word.strip().lstrip(':').rstrip('.').strip()
    else:
        return ''

In [54]:
# tests 
print(find_text_after_word('This contains ingredients including the following:'))
print(find_text_after_word('The Ingredients: salt and sugar'))
print(find_text_after_word('INGREDIENTS : Aqua , Sodium Lauroyl Glyci'))

including the following:
salt and sugar
Aqua , Sodium Lauroyl Glyci


In [55]:
# attempt to extract a list of the ingredients; simplest possible is separating by comma
def split_ingredients(text):
    '''
    returns a list of space-stripped text separated 
    by commas from the original text
    '''
    text_components = text.split(',')
    return [component.strip() for component in text_components]

In [56]:
split_ingredients('suppose it looks like this: water, sugar , salt , additives ( such as E872 which is not a real thing) ')

['suppose it looks like this: water',
 'sugar',
 'salt',
 'additives ( such as E872 which is not a real thing)']
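A plain comma split also breaks inside parenthesised sub-lists, e.g. `fruits (Strawberry, Cherry)` would become two items. A possible refinement, sketched below, is to split only on commas that sit outside any parentheses. `split_top_level` is a hypothetical alternative to `split_ingredients`, and it assumes round brackets are the only grouping characters that matter.

```python
def split_top_level(text, sep=','):
    """Split text on sep, ignoring separators inside round brackets."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            # Tolerate stray closing brackets from OCR noise.
            depth = max(depth - 1, 0)
        if ch == sep and depth == 0:
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current).strip())
    return parts


print(split_top_level('Sugar, Freeze dried fruits (Strawberry, Cherry), Malted barley flour'))
# ['Sugar', 'Freeze dried fruits (Strawberry, Cherry)', 'Malted barley flour']
```

If OCR drops a closing bracket, everything after the opening bracket stays in one item, which is at least easy to spot when inspecting results.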

### very simple ingredients extraction flow: 
1. Get the text of each block in a list.
2. For each element in the list, check for 'ingredients'.
3. From the first block that contains 'ingredients', extract the text after that word.
4. Pass it to the splitting function, which splits it into individual ingredients.

In [57]:
def extract_ingredients_list(img):
    block_texts = detect_block(img)
    for block in block_texts:
        ingredients = find_text_after_word(block)
        if ingredients != '':
            # return the ingredients of the first block that mentions them
            return split_ingredients(ingredients)
    # no block mentions 'ingredients'
    return None

In [61]:
# test on the partial images i.e. the ingredients list from OFF 
ingredients_partial = {}
for img in os.listdir(os.path.join('test_images', 'partial')):
    if img.startswith('test_'):
        img_path = os.path.join('test_images', 'partial', img)
        ingredients_partial[img] = extract_ingredients_list(img_path)

In [62]:
# inspect the results 
ingredients_partial['test_20.jpg']

['INGREDIENTS : Rice ( 44 % )',
 'Wholewheat ( 35 % )',
 'Sugar',
 'Barley ( 4 . 5 % )',
 'Freeze dried fruits ( 4 . 5 % ) ( Strawberry',
 'Cherry )',
 'Malted barley flour ( 3 . 5 % )',
 'Barley malt flavouring',
 '']

In [64]:
# test on the full images i.e. own package photos 
ingredients_whole = {}
for img in os.listdir(os.path.join('test_images', 'whole')):
    if img.startswith('test_'):
        img_path = os.path.join('test_images', 'whole', img)
        ingredients_whole[img] = extract_ingredients_list(img_path)

In [65]:
ingredients_whole['test_48.jpg']

['. Stop use and ask a dentist if oral irritation occurs . Keep out of the reach of children . Made in Italy . LISTERINE® is a registered trade mark . Lot number : see bottom of the bottle JOHNSON & JOHNSON GmbH D - 41470 Neuss . DE JOHNSON & JOHNSON LIMITED Maidenhead',
 'UK',
 'SL6 3UG Careline : 0808 238 9999 JOHNSON & JOHNSON ( IRELAND ) LIMITED Airton Road',
 'Tallaght',
 'Dublin 24',
 'Ireland . Careline : 1800 22 0044 PR - 017429 ] - INGREDIENTS : Aqua',
 'Alcohol',
 'Sorbitol',
 'Poloxamer 407',
 'Benzoic Acid',
 'nic Chloride',
 'Eucalyptol Aroma',
 'Sodium Saccharin',
 'Methyl Salicylate',
 'Thymol',
 'Menthol',
 'Sodium Fluoride',
 'Sodium Benzoate',
 'Sucralose',
 'Propylene Glycol',
 'CI 16035',
 'CI 42090']