# Demo Notebook for Using the Google Vision API

In this notebook we explore the Google Cloud Vision API and try out its various methods.

## Installing the client library

If the Google Cloud Vision client library is not installed yet, install it first.

In a pip-based Python environment, use:

```shell
pip install --upgrade google-cloud-vision
```

In a conda environment, use:

```shell
conda install -c conda-forge google-cloud-vision
```

## Import the standard libraries

In [1]:
import io
import os

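## Import Vision API related libraries

Note: the `types` submodule imported below is available only in releases of `google-cloud-vision` before 2.0. Later releases removed it and expose the message classes directly, for example `vision.Image`.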

In [2]:
from google.cloud import vision
from google.cloud.vision import types

# Set up the service account credentials to use the API

In [3]:
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'F:\EPFL\1st Year\Sem 1\DH_405_Foundation_of_Digital_Humanities\Project\My Project-1c871315ad69.json'
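
A hard-coded key path only works on one machine and fails late if the file is missing. The sketch below is one possible guard, not part of the client library: `set_credentials` is a hypothetical helper that checks the file exists before setting `GOOGLE_APPLICATION_CREDENTIALS`.

```python
import os
from pathlib import Path


def set_credentials(key_path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at key_path after checking it exists.

    Hypothetical helper for this notebook; raises FileNotFoundError early
    instead of letting the first API call fail.
    """
    path = Path(key_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Service account key not found: {path}")
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(path)
    return path
```

It would be called once, before the client is created, with the path to the downloaded JSON key.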

# Create a client to access the API

In [4]:
client = vision.ImageAnnotatorClient()

In the next cell we call `dir(client)` to list the attributes and methods available on the Image Annotator client.
The ones of interest to us are `annotate_image`, `document_text_detection`, `face_detection`, `image_properties`, `label_detection`, `landmark_detection`, `logo_detection`, `object_localization`, `product_search`, `safe_search_detection`, `text_detection`, `transport` and `web_detection`.

In [5]:
dir(client)

['SERVICE_ADDRESS',
 '_INTERFACE_NAME',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_client_info',
 '_get_all_features',
 '_inner_api_calls',
 '_method_configs',
 'annotate_image',
 'async_batch_annotate_files',
 'async_batch_annotate_images',
 'batch_annotate_files',
 'batch_annotate_images',
 'crop_hints',
 'document_text_detection',
 'enums',
 'face_detection',
 'from_service_account_file',
 'from_service_account_json',
 'image_properties',
 'label_detection',
 'landmark_detection',
 'logo_detection',
 'object_localization',
 'product_search',
 'safe_search_detection',
 'text_detection',
 'transport',
 'web_detection']
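
The `dir` output above mixes dunder attributes with the API methods. A small sketch of one way to narrow it down, using a hypothetical helper `public_callables` (not part of the client library) that keeps only public, callable attributes. It is shown on a stand-in class so that it runs without credentials; calling it on `client` should work the same way.

```python
def public_callables(obj):
    """Return the names of obj's public attributes that can be called."""
    return [
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    ]


class FakeClient:
    """Stand-in with the same kinds of attributes as the real client."""

    SERVICE_ADDRESS = "example"  # plain attribute, filtered out

    def _get_all_features(self):  # private, filtered out
        pass

    def text_detection(self):
        pass

    def label_detection(self):
        pass


print(public_callables(FakeClient()))
```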

# Vision API Methods

Let's take one of the Terzani pictures as a test image.
Image URL: http://dl.cini.it/files/original/cb46e00d902f9936224ca6ef81834a35.jpg

### Displaying the image

In [6]:
from IPython.display import Image
img_url = "http://dl.cini.it/files/original/cb46e00d902f9936224ca6ef81834a35.jpg"
Image(url=img_url)

## Test with an image from a URL

In [7]:
img_url = "http://dl.cini.it:8080/digilib/Scaler/IIIF/cb46e00d902f9936224ca6ef81834a35/full/max/0/default.jpg"
image = vision.types.Image()
image.source.image_uri = img_url

### Text Detection

In [8]:
response = client.text_detection(image=image)
response

text_annotations {
  locale: "en"
  description: "CRUSH ALL DESTRUCTIVE ELEMENTS.\nshdevyy Fhogfnfenrs GhH91-114\n"
  bounding_poly {
    vertices {
      x: 378
      y: 617
    }
    vertices {
      x: 1640
      y: 617
    }
    vertices {
      x: 1640
      y: 693
    }
    vertices {
      x: 378
      y: 693
    }
  }
}
text_annotations {
  description: "CRUSH"
  bounding_poly {
    vertices {
      x: 1168
      y: 620
    }
    vertices {
      x: 1244
      y: 620
    }
    vertices {
      x: 1244
      y: 639
    }
    vertices {
      x: 1168
      y: 639
    }
  }
}
text_annotations {
  description: "ALL"
  bounding_poly {
    vertices {
      x: 1264
      y: 620
    }
    vertices {
      x: 1306
      y: 620
    }
    vertices {
      x: 1306
      y: 638
    }
    vertices {
      x: 1264
      y: 638
    }
  }
}
text_annotations {
  description: "DESTRUCTIVE"
  bounding_poly {
    vertices {
      x: 1327
      y: 618
    }
    vertices {
      x: 1487
      y: 618


In [17]:
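# In the output above only the first annotation (the full text) carries a locale.
for annotation in response.text_annotations:
    if annotation.locale == "en" and annotation.description:
        print(annotation.description)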

CRUSH ALL DESTRUCTIVE ELEMENTS.
shdevyy Fhogfnfenrs GhH91-114
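
In the text detection output above, the first annotation holds the whole detected text and the following ones hold single words. Assuming that layout holds in general, the sketch below separates the two. `split_text_annotations` is a hypothetical helper, and the sample uses stand-in objects rather than a real API response.

```python
from types import SimpleNamespace


def split_text_annotations(text_annotations):
    """Return (full_text, words) from a list of text annotations.

    Assumes the first annotation is the full text and the rest are words.
    """
    annotations = list(text_annotations)
    if not annotations:
        return "", []
    full_text = annotations[0].description
    words = [a.description for a in annotations[1:]]
    return full_text, words


# Stand-ins mimicking the fields printed above (not real API objects).
sample = [
    SimpleNamespace(locale="en", description="CRUSH ALL DESTRUCTIVE ELEMENTS.\n"),
    SimpleNamespace(locale="", description="CRUSH"),
    SimpleNamespace(locale="", description="ALL"),
]
full_text, words = split_text_annotations(sample)
print(words)
```

With a real response, the call would be `split_text_annotations(response.text_annotations)`.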



In [18]:
with io.open("./sample_data/T10_100.jpg", 'rb') as image_file:
    content = image_file.read()

In [20]:
type(content)

bytes
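
The bytes read above can be sent to the API instead of a URL. A minimal sketch, assuming the pre-2.0 client library used in this notebook, where `vision.types.Image` accepts raw bytes through its `content` field. The helper names `read_image_bytes` and `image_from_file` are ours.

```python
def read_image_bytes(path):
    """Read an image file and return its raw bytes."""
    with open(path, "rb") as image_file:
        return image_file.read()


def image_from_file(path):
    """Build a Vision API image from a local file.

    Assumes google-cloud-vision < 2.0, as used elsewhere in this notebook.
    """
    # Imported here so that read_image_bytes works without the library.
    from google.cloud import vision

    return vision.types.Image(content=read_image_bytes(path))
```

The result of `image_from_file("./sample_data/T10_100.jpg")` can then be passed as `image=` to any of the detection methods shown in this notebook.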

### Handwriting Detection

In [9]:
response = client.document_text_detection(image=image)
response

text_annotations {
  locale: "en"
  description: "\341\200\241\341\200\226\341\200\273\341\200\200\341\200\272\341\200\236\341\200\231\341\200\254\341\200\270\341\200\200\341\200\255\341\200\257\341\200\201\341\200\273\341\200\261\341\200\231\341\200\276\341\200\257\341\200\224\341\200\272\341\200\270\341\200\200\341\200\274\nCRUSH ALL DESTRUCTIVE ELEMENTS.\n"
  bounding_poly {
    vertices {
      x: 1167
      y: 540
    }
    vertices {
      x: 1640
      y: 540
    }
    vertices {
      x: 1640
      y: 645
    }
    vertices {
      x: 1167
      y: 645
    }
  }
}
text_annotations {
  description: "\341\200\241\341\200\226\341\200\273\341\200\200\341\200\272\341\200\236\341\200\231\341\200\254\341\200\270"
  bounding_poly {
    vertices {
      x: 1167
      y: 543
    }
    vertices {
      x: 1383
      y: 541
    }
    vertices {
      x: 1383
      y: 608
    }
    vertices {
      x: 1167
      y: 610
    }
  }
}
text_annotations {
  description: "\341\200\200\341\200\255\
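
The backslash sequences in the output above look like the way the protobuf text format prints non-ASCII UTF-8 bytes as octal escapes. If so, the `description` attribute itself should already hold the Unicode string, and only the printed form is escaped. Assuming that reading is right, the sketch below decodes a copied escape sequence back into text. The helper name `decode_octal_escapes` is ours.

```python
def decode_octal_escapes(escaped):
    """Turn printed escapes such as \\341\\200\\241 back into Unicode text."""
    # Step 1: interpret the backslash escapes, giving one character per byte.
    raw = escaped.encode("ascii").decode("unicode_escape")
    # Step 2: reinterpret those characters as UTF-8 encoded bytes.
    return raw.encode("latin-1").decode("utf-8")


# Second annotation of the output above, copied as a raw string.
word = (
    r"\341\200\241\341\200\226\341\200\273\341\200\200\341\200\272"
    r"\341\200\236\341\200\231\341\200\254\341\200\270"
)
decoded = decode_octal_escapes(word)
print(decoded)
print([hex(ord(ch)) for ch in decoded])
```

For this word, every decoded code point falls in the Unicode Myanmar block (U+1000 to U+109F), which suggests that `locale: "en"` describes only part of the detected text.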

### Face Detection

In [10]:
response = client.face_detection(image=image)
response

face_annotations {
  bounding_poly {
    vertices {
      x: 1366
      y: 844
    }
    vertices {
      x: 1419
      y: 844
    }
    vertices {
      x: 1419
      y: 907
    }
    vertices {
      x: 1366
      y: 907
    }
  }
  fd_bounding_poly {
    vertices {
      x: 1369
      y: 856
    }
    vertices {
      x: 1414
      y: 856
    }
    vertices {
      x: 1414
      y: 904
    }
    vertices {
      x: 1369
      y: 904
    }
  }
  landmarks {
    type: LEFT_EYE
    position {
      x: 1383.7673
      y: 876.0457
      z: 0.00083225965
    }
  }
  landmarks {
    type: RIGHT_EYE
    position {
      x: 1400.4681
      y: 875.8691
      z: -1.1441102
    }
  }
  landmarks {
    type: LEFT_OF_LEFT_EYEBROW
    position {
      x: 1378.968
      y: 872.35834
      z: 1.4936231
    }
  }
  landmarks {
    type: RIGHT_OF_LEFT_EYEBROW
    position {
      x: 1388.3668
      y: 872.18774
      z: -3.4466605
    }
  }
  landmarks {
    type: LEFT_OF_RIGHT_EYEBROW
    position {
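
Each `bounding_poly` above is a list of four corner vertices. For cropping or storing a face region, a (left, top, right, bottom) box is often easier to handle. A minimal sketch with a hypothetical helper `bounding_box`, tried on stand-in objects that copy the first `bounding_poly` printed above:

```python
from types import SimpleNamespace


def bounding_box(vertices):
    """Return (left, top, right, bottom) for vertices that have x and y."""
    xs = [v.x for v in vertices]
    ys = [v.y for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


# Stand-ins for the vertices of the first face's bounding_poly.
face_vertices = [
    SimpleNamespace(x=1366, y=844),
    SimpleNamespace(x=1419, y=844),
    SimpleNamespace(x=1419, y=907),
    SimpleNamespace(x=1366, y=907),
]
left, top, right, bottom = bounding_box(face_vertices)
print("width:", right - left, "height:", bottom - top)
```

With a real response, the input would be `face.bounding_poly.vertices` for each `face` in `response.face_annotations`.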


### Image Properties

In [11]:
response = client.image_properties(image=image)
response

image_properties_annotation {
  dominant_colors {
    colors {
      color {
        red: 90.0
        green: 84.0
        blue: 57.0
      }
      score: 0.1658262
      pixel_fraction: 0.0960472
    }
    colors {
      color {
        red: 157.0
        green: 150.0
        blue: 126.0
      }
      score: 0.15258466
      pixel_fraction: 0.1879646
    }
    colors {
      color {
        red: 193.0
        green: 191.0
        blue: 179.0
      }
      score: 0.071288146
      pixel_fraction: 0.20306785
    }
    colors {
      color {
        red: 166.0
        green: 108.0
        blue: 63.0
      }
      score: 0.050707348
      pixel_fraction: 0.0068436577
    }
    colors {
      color {
        red: 126.0
        green: 120.0
        blue: 93.0
      }
      score: 0.14381455
      pixel_fraction: 0.1059587
    }
    colors {
      color {
        red: 61.0
        green: 56.0
        blue: 31.0
      }
      score: 0.1095238
      pixel_fraction: 0.061297935
    }
    colors
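
The dominant colours come back as float RGB components with a score. A sketch of how they could be turned into hex codes ranked by score, using a hypothetical helper `dominant_hex_colors`. The sample values are the first three colours from the output above, wrapped in stand-in objects.

```python
from types import SimpleNamespace


def dominant_hex_colors(colors, top_n=3):
    """Return the top_n colours by score as (hex_code, score) pairs."""
    ranked = sorted(colors, key=lambda c: c.score, reverse=True)
    result = []
    for entry in ranked[:top_n]:
        rgb = entry.color
        hex_code = "#{:02x}{:02x}{:02x}".format(
            int(rgb.red), int(rgb.green), int(rgb.blue)
        )
        result.append((hex_code, entry.score))
    return result


# Stand-ins for entries of dominant_colors.colors (not real API objects).
sample_colors = [
    SimpleNamespace(color=SimpleNamespace(red=157.0, green=150.0, blue=126.0), score=0.15258466),
    SimpleNamespace(color=SimpleNamespace(red=90.0, green=84.0, blue=57.0), score=0.1658262),
    SimpleNamespace(color=SimpleNamespace(red=193.0, green=191.0, blue=179.0), score=0.071288146),
]
print(dominant_hex_colors(sample_colors, top_n=2))
```

With a real response, the input would be `response.image_properties_annotation.dominant_colors.colors`.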

### Label Detection

In [12]:
response = client.label_detection(image=image)
response

label_annotations {
  mid: "/m/079bkr"
  description: "Mode of transport"
  score: 0.96869874
  topicality: 0.96869874
}
label_annotations {
  mid: "/m/018p4k"
  description: "Cart"
  score: 0.90895
  topicality: 0.90895
}
label_annotations {
  mid: "/m/07yv9"
  description: "Vehicle"
  score: 0.90143174
  topicality: 0.90143174
}
label_annotations {
  mid: "/m/01hffk"
  description: "Carriage"
  score: 0.89694226
  topicality: 0.89694226
}
label_annotations {
  mid: "/m/07bsy"
  description: "Transport"
  score: 0.86291295
  topicality: 0.86291295
}
label_annotations {
  mid: "/m/07_gml"
  description: "Working animal"
  score: 0.736109
  topicality: 0.736109
}
label_annotations {
  mid: "/m/02j091"
  description: "Horse and buggy"
  score: 0.71871597
  topicality: 0.71871597
}
label_annotations {
  mid: "/m/0464z4"
  description: "Pack animal"
  score: 0.70788527
  topicality: 0.70788527
}
label_annotations {
  mid: "/m/0242gd"
  description: "Wagon"
  score: 0.6617713
  topicality: 
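
Not every label is equally reliable, so a score threshold is one simple way to pick tags. A minimal sketch with a hypothetical helper `confident_labels`. The default threshold of 0.8 is an arbitrary assumption, and the descriptions are lower-cased, which is convenient if the search uses exact matches.

```python
from types import SimpleNamespace


def confident_labels(label_annotations, min_score=0.8):
    """Return lower-cased descriptions of labels scoring at least min_score."""
    return [
        label.description.lower()
        for label in label_annotations
        if label.score >= min_score
    ]


# Stand-ins for three of the labels printed above (not real API objects).
sample_labels = [
    SimpleNamespace(description="Mode of transport", score=0.96869874),
    SimpleNamespace(description="Cart", score=0.90895),
    SimpleNamespace(description="Working animal", score=0.736109),
]
print(confident_labels(sample_labels))
```

With a real response, the input would be `response.label_annotations`.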

### Landmark Detection

In [13]:
response = client.landmark_detection(image=image)
response



### Logo Detection

In [14]:
response = client.logo_detection(image=image)
response

logo_annotations {
  mid: "/m/03mb5kj"
  description: "Browne Jacobson"
  score: 0.6108932
  bounding_poly {
    vertices {
      x: 1155
      y: 535
    }
    vertices {
      x: 1642
      y: 535
    }
    vertices {
      x: 1642
      y: 621
    }
    vertices {
      x: 1155
      y: 621
    }
  }
}

### Object Detection

In [15]:
response = client.object_localization(image=image)
response

localized_object_annotations {
  mid: "/m/083wq"
  name: "Wheel"
  score: 0.92690665
  bounding_poly {
    normalized_vertices {
      x: 0.51806146
      y: 0.5668511
    }
    normalized_vertices {
      x: 0.61581224
      y: 0.5668511
    }
    normalized_vertices {
      x: 0.61581224
      y: 0.71255636
    }
    normalized_vertices {
      x: 0.51806146
      y: 0.71255636
    }
  }
}
localized_object_annotations {
  mid: "/m/083wq"
  name: "Wheel"
  score: 0.91704434
  bounding_poly {
    normalized_vertices {
      x: 0.4728733
      y: 0.55629605
    }
    normalized_vertices {
      x: 0.56723005
      y: 0.55629605
    }
    normalized_vertices {
      x: 0.56723005
      y: 0.68358594
    }
    normalized_vertices {
      x: 0.4728733
      y: 0.68358594
    }
  }
}
localized_object_annotations {
  mid: "/m/01g317"
  name: "Person"
  score: 0.91213554
  bounding_poly {
    normalized_vertices {
      x: 0.6835687
      y: 0.5708439
    }
    normalized_vertices {
      x: 

### Web Detection

In [16]:
response = client.web_detection(image=image)
response

web_detection {
  web_entities {
    entity_id: "/m/03k3r"
    score: 0.8618525
    description: "Horse"
  }
  web_entities {
    entity_id: "/m/02j091"
    score: 0.61085624
    description: "Horse and buggy"
  }
  web_entities {
    entity_id: "/m/02z6033"
    score: 0.5662589
    description: "Horse harness"
  }
  web_entities {
    entity_id: "/m/07bsy"
    score: 0.53916574
    description: "Transport"
  }
  web_entities {
    entity_id: "/m/0464z4"
    score: 0.5269144
    description: "Pack animal"
  }
  web_entities {
    entity_id: "/m/01c8br"
    score: 0.41458538
    description: "Street"
  }
  web_entities {
    entity_id: "/m/07j7r"
    score: 0.33523893
    description: "Tree"
  }
  web_entities {
    entity_id: "/m/06mq7"
    description: "Science"
  }
  web_entities {
    entity_id: "/m/01540"
    description: "Biology"
  }
  visually_similar_images {
    url: "https://images1.apartments.com/i2/W_AoNVPe9iQvXkYTvLLB9zAC6hzLM4P6OOAoq2yk0h0/117/carriage-homes-at-villebois-

1. How should we deal with homonyms and synonyms in the search? (For this project we can restrict ourselves to exact matches.)
2. In text detection, the API also returns bounding boxes for individual letters. Should we consider them? (For this project we consider only words and ignore letters and symbols. We cannot filter them out explicitly, so we use only the text annotations returned by the API and ignore the full text annotations.)
3. The API detects only a limited number of languages, and only when the text is very clear. Otherwise it returns ASCII characters. How should we deal with both cases when creating the database and when searching? (For this project we take only the English responses from the API, since the other tags, such as labels and objects, are in English only anyway.)
4. Some vertices are normalised, while others are not. How should we deal with this?
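
On the last question, one option is to convert normalised vertices to pixel coordinates so that all boxes share the same units. Normalised coordinates range from 0 to 1 relative to the image, so they scale by the image width and height. A minimal sketch with a hypothetical helper `to_pixel_vertices`. The image size used here is made up for illustration; the real one would have to be read from the image itself.

```python
from types import SimpleNamespace


def to_pixel_vertices(normalized_vertices, width, height):
    """Scale vertices given in the 0..1 range to integer pixel coordinates."""
    return [
        (round(v.x * width), round(v.y * height))
        for v in normalized_vertices
    ]


# First "Wheel" box from the object localization output, as stand-in objects.
wheel = [
    SimpleNamespace(x=0.51806146, y=0.5668511),
    SimpleNamespace(x=0.61581224, y=0.5668511),
    SimpleNamespace(x=0.61581224, y=0.71255636),
    SimpleNamespace(x=0.51806146, y=0.71255636),
]

# Hypothetical image size, not taken from the notebook.
WIDTH, HEIGHT = 2000, 1000
print(to_pixel_vertices(wheel, WIDTH, HEIGHT))
```

With a real response, the input would be `obj.bounding_poly.normalized_vertices` for each `obj` in `response.localized_object_annotations`.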