# Cloud APIs for Computer Vision: Up and Running in 15 Minutes

This code is part of [Chapter 8 - Cloud APIs for Computer Vision: Up and Running in 15 Minutes](https://learning.oreilly.com/library/view/practical-deep-learning/9781492034858/ch08.html).

## Test OCR from Cloud Providers

This code sample shows how a single image can be uploaded to various cloud providers using the scripts in the [`optical-character-recognition`](https://github.com/PracticalDL/Practical-Deep-Learning-Book/blob/master/code/chapter-8/experiment-scripts/optical-character-recognition) directory.

In [None]:
import skimage.io as io
import matplotlib.pyplot as plt
import os

Add the absolute path to the `data` directory (example: `data-may-2020`).

In [None]:
data_path = "<PATH_TO_DATA_DIR>"
legible_images_path = data_path + "/legible-images/"
os.path.exists(legible_images_path)

In [None]:
def filename_from_image_id(image_id):
    # COCO file names zero-pad the image ID to 12 digits.
    return "COCO_train2014_" + str(image_id).zfill(12) + ".jpg"

To see how the various cloud providers fare, let's view one example image and compare their results.

In [None]:
image_id = 229378

image = io.imread(legible_images_path + filename_from_image_id(image_id))
plt.figure()
plt.imshow(image)

The text in the image is pretty small, and even with normal eyesight it's difficult to decipher all the words. Let's enlarge the image to see all the text. We will use `mpld3` to enable zooming into the image.

In [None]:
!pip install --user mpld3

In [None]:
import mpld3

mpld3.enable_notebook()

Notice that as you hover over the plot, a toolbar appears in the lower left. This has tools to enable panning and zooming, and a button to reset the view once you've explored the plot.

Click the magnifying glass button to zoom. Then click the spot where you want to zoom and drag the mouse to a new position. The X-axis zooms in proportion to rightward movement and zooms out in proportion to leftward movement.

In [None]:
plt.figure()
plt.imshow(image)

Now, let's upload this image to the cloud providers and see how well they do. We will be using the Google, Microsoft, and Amazon cloud providers. 

Remember to do the following:

- Register with each cloud provider, generate an API key, and place it in that provider's script in the [`experiment-scripts`](https://github.com/PracticalDL/Practical-Deep-Learning-Book/tree/master/code/chapter-8/experiment-scripts) directory.
- Replace `<PATH_TO_LEGIBLE_DATA>` with the path to the `legible-data` directory.
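As an alternative to pasting keys directly into the scripts, the keys could be read from environment variables so they never end up in version control. The following is a minimal sketch of that idea, not how the book's scripts work. The variable names (`GOOGLE_API_KEY`, `MICROSOFT_API_KEY`, `AMAZON_API_KEY`) and the `load_api_keys` helper are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical environment variable names; use whatever names suit your setup.
API_KEY_ENV_VARS = {
    "google": "GOOGLE_API_KEY",
    "microsoft": "MICROSOFT_API_KEY",
    "amazon": "AMAZON_API_KEY",
}


def load_api_keys(env=os.environ):
    """Return (keys, missing): keys found in `env` and providers without one."""
    keys, missing = {}, []
    for provider, var_name in API_KEY_ENV_VARS.items():
        value = env.get(var_name, "").strip()
        if value:
            keys[provider] = value
        else:
            missing.append(provider)
    return keys, missing
```

Calling `load_api_keys()` before running the experiments gives an early warning about any provider whose key has not been configured.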

### Google

In [None]:
google_output_path = data_path + "/google-ocr-jsondump.json"
!python ../experiment-scripts/optical-character-recognition/google.py -i $legible_images_path/COCO_train2014_000000229378.jpg -o $google_output_path
!python -m json.tool $google_output_path
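The raw JSON dump is verbose, so it can help to pull out only the recognized strings. The sketch below walks any nested JSON structure and collects every string stored under a given key. It does not assume a particular layout for the dump. The key to search for is an assumption that depends on the provider and on how the script writes its output: `"description"` is a plausible choice for Google Vision responses, but check the printed JSON before relying on it. The helper names `collect_values` and `texts_from_dump` are hypothetical.

```python
import json


def collect_values(node, key):
    """Recursively gather every string stored under `key` in nested JSON data."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                found.append(v)
            else:
                # Keep descending into nested dicts and lists.
                found.extend(collect_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


def texts_from_dump(path, key):
    """Load a JSON file and return all strings found under `key`."""
    with open(path, encoding="utf-8") as f:
        return collect_values(json.load(f), key)
```

For example, `texts_from_dump(google_output_path, "description")` would return the detected strings if the dump follows the assumed layout.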

### Microsoft

Note: Unlike the other providers, Microsoft requires that the images be hosted at a publicly accessible web URL. One option is to use a cloud storage provider such as Dropbox or OneDrive, or to host the images on AWS S3 or Azure Storage. We chose a different approach: running a web server on our own computer and pointing the script to our public IP address. If you go that route, you can either install a LAMPP stack on your machine or simply start a temporary web server from the directory that contains the data, as follows:

`$ python3 -m http.server 80`

Also, update the IP address URL in `microsoft-phase-1.py`.
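If you would rather start the temporary web server from within the notebook, without blocking the kernel, the standard library's `http.server` module can run on a background thread. This is a minimal sketch and assumes Python 3.7 or later. The `serve_directory` helper and its default port of 8000 are illustrative choices, not part of the book's scripts. The machine still needs to be reachable from the internet on the chosen port.

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000, host="0.0.0.0"):
    """Serve `directory` over HTTP on a background thread and return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer((host, port), handler)
    # A daemon thread will not keep the Python process alive on exit.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A typical use would be `server = serve_directory(legible_images_path)`, followed by `server.shutdown()` and `server.server_close()` once the Microsoft scripts have finished.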

In [None]:
microsoft_recognition_ids_path = data_path + "/msft-recognition-ids.txt"
!python ../experiment-scripts/optical-character-recognition/microsoft-phase-1.py -i COCO_train2014_000000229378.jpg > $microsoft_recognition_ids_path

In [None]:
microsoft_output_path = data_path + "/msft-ocr-jsondump.json"
!python ../experiment-scripts/optical-character-recognition/microsoft-phase-2.py -i $microsoft_recognition_ids_path -o $microsoft_output_path

In [None]:
!python -m json.tool $microsoft_output_path

### Amazon

In [None]:
amazon_output_path = data_path + "/amazon-ocr-jsondump.json"
!python ../experiment-scripts/optical-character-recognition/amazon.py -i $legible_images_path/COCO_train2014_000000229378.jpg -o $amazon_output_path
!python -m json.tool $amazon_output_path

Interesting! Different cloud providers pick up different words in the image. But what about running against all the images? After all, that is how we will generate a useful benchmark. Let's look at that next.
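To make the differences between providers concrete, the recognized words can be compared as sets. The sketch below assumes you have already extracted a list of words per provider from each JSON dump. The `compare_providers` helper is hypothetical, and matching is a simple case-insensitive comparison, which is only one of several reasonable ways to score OCR output.

```python
def normalize(words):
    """Lowercase and strip each word, dropping empty strings."""
    return {w.strip().lower() for w in words if w.strip()}


def compare_providers(results):
    """Return (common, unique) for a {provider: [words]} mapping.

    common: words every provider found.
    unique: for each provider, the words no other provider found.
    """
    sets = {name: normalize(words) for name, words in results.items()}
    common = set.intersection(*sets.values()) if sets else set()
    unique = {}
    for name, words in sets.items():
        others = set().union(*(s for other, s in sets.items() if other != name))
        unique[name] = words - others
    return common, unique
```

Passing a dictionary such as `{"google": [...], "microsoft": [...], "amazon": [...]}` yields the words all three agree on, plus the words only one provider detected.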