# Computer Vision API

This script explores the Cognitive Services Computer Vision API, called through its REST interface.

Here, we'll write an R function to retrieve a random image from Wikimedia Commons, and another function to generate a caption for the image using the Vision API. You can see the end result in this [blog post](http://blog.revolutionanalytics.com/2018/03/computer-vision-api.html).

Most concepts are explained as we go, but for more information, take a look here:

* Computer Vision Overview: https://cda.ms/kZ
* Computer Vision Documentation: https://cda.ms/m0

## Using this Notebook

The scripts are provided as Jupyter Notebooks within the [Azure Notebooks](https://notebooks.azure.com/?WT.mc_id=ODSC-workshop-davidsmi) service. You don't need a Microsoft Account to view the scripts, but you will need to set one up and generate keys in Azure to run the examples. All of the examples use free Azure services.

If you're new to Jupyter Notebooks, here's a quick intro:

1. Click within a cell, and then press `Ctrl+Enter` to run (or render) the current cell.
2. You'll see a number to the left of the cell when the computations are complete, like this: `In [1]:`. (The number represents the order of computations.) If there's output, it will print below the cell. You may have to scroll up to see it all.
3. Run each cell in order, from top to bottom.
4. To download/upload files, return to the [library view](https://notebooks.azure.com/davidsmi/libraries/qcon?WT.mc_id=ODSC-workshop-davidsmi) and use the functions in the toolbar.

For more information about Notebooks, check out the [Jupyter Notebook documentation](http://jupyter.readthedocs.io/en/latest/index.html).

If you're new to R, you might want to start with this [Introduction to
R](https://notebooks.azure.com/davidsmi/libraries/intro-r?WT.mc_id=ODSC-workshop-davidsmi) notebook
to get a sense of the language.

## First, clone these workshop materials

1. Visit https://notebooks.azure.com/davidsmi/libraries/aiforgood

    * Sign in with your Microsoft account if needed

1. Click "Clone" in the toolbar to create a copy of the workshop materials in your own Azure Notebooks library.


## Connecting to Azure services

You will need:

1. A [Microsoft account](https://account.microsoft.com/account). You can use an existing Outlook 365
or Xbox Live account, or create a new one.

1. A Microsoft Azure subscription. If you don't already have one, you can sign up at
[https://cda.ms/kT](https://cda.ms/kT) and get \$200 in credits to use with paid services. You'll need to provide
a credit or debit card, but everything we'll be doing is free to use. If you're a student, you can
instead register at [https://cda.ms/kY](https://cda.ms/kY) without a credit card for a \$100 credit.

You'll also need a few other things specific to this workshop. Follow the instructions below to 
set up everything you need.

## Log in to the Azure Portal

1. Visit https://portal.azure.com 
2. Sign in with your Microsoft Account. If you don't have a Microsoft account, use the 
   links above to create one for free.

## Create an Azure Resource Group

In Azure, a Resource Group is a collection of services you have created. It groups related services
together and makes it easy to delete them all at once later. We'll create one for this lab.

1. Visit https://portal.azure.com (and sign in if needed)
2. Click "Resource Groups" in the left column
3. Click "+ Add"
    * Resource Group Name: aiforgood
    * Subscription: _there should be just one option_
    * Resource Group Location: South Central US
4. Click "Create"
   
A notification will appear in the top right. Click the "Pin to Dashboard" button to pin this resource group to your home page in the Azure portal, as you'll be referring to it frequently.

## Create authorization keys for Computer Vision

1. Visit https://portal.azure.com (and sign in if needed)
2. Click "+ Create a Resource" (top-left corner)
3. Click "AI + Machine Learning"
4. Click "Computer Vision"
    * Name: aiforgood-vision
    * Subscription: _there should be just one option_
    * Location: South Central US
    * Pricing Tier: F0 (free, 20 calls per minute)
    * Resource Group: Use existing "aiforgood" group
5. Click "Create"

After a few moments you will get a message that your keys have been generated, after which you can move to the next section.



## Modify the keys.txt file

Edit the `keys.txt` file to provide the necessary keys. In Azure Notebooks, select the file and press `i` to edit it directly. (Alternatively, download `keys.txt` by highlighting it in the Library view and pressing `d` or clicking the download icon in the toolbar, edit it in a text editor, then upload the modified file to replace the existing one.)

For the first line
of the file, `region`, change the value to `southcentralus`. 

For the second line of the file, `vision`, visit your `aiforgood` resource
group in the [Azure Portal](https://portal.azure.com?WT.mc_id=ODSC-workshop-davidsmi) and then:

1. Click on the Computer Vision API resource, `aiforgood-vision`.
2. In the menu, click "Keys".
3. Click the "copy to clipboard" icon next to KEY 1. (You can ignore KEY 2.)
4. Paste the key into the `vision` entry in `keys.txt`.

You can ignore the remaining lines of `keys.txt` for now.

Your final `keys.txt` file will look like this, but with different (working) keys:

```
       key
region southcentralus
vision 7f1f01ac24064abd80970f41a90237e7
custom 1632b49e2930430694a9bbd3ab0c0cc2
cvpred 37eb1f0e5fd34253939350197ae3d933
```

Now you can run the R code below.

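```r
## Load some packages required by the code below.
## These packages come pre-installed in the Azure Notebooks service (tools ships with R itself),
## but if you try this code elsewhere you may need to install httr first with install.packages("httr")
library(tools)  # for file_ext()
library(httr)   # for HTTP requests: POST(), content(), add_headers(), content_type()
```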

```r
## Retrieve the API key and region from the keys.txt file.
## See above for how to obtain the necessary keys and modify the file accordingly.
keys <- read.table("keys.txt", header = TRUE, stringsAsFactors = FALSE)
vision_api_key <- keys["vision", 1]
azure_region <- keys["region", 1]
vision_api_endpoint <- paste0("https://", azure_region, ".api.cognitive.microsoft.com/vision/v1.0")
cat("The region is:", azure_region, "\n")

## If you see ERROR-EDIT-KEYS.txt-FILE here, you need to edit keys.txt as described above.
```

```
The region is: southcentralus
```
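`read.table(..., header = TRUE)` treats the first column as row names here because the header line (`key`) has one fewer field than the rows below it; that is what makes `keys["vision", 1]` work. A minimal sketch that parses an inline copy of the file and adds a hypothetical `get_key()` helper, assuming an unedited entry still contains the text `ERROR` (as the `ERROR-EDIT-KEYS.txt-FILE` placeholder mentioned above does). It uses the name `example_keys` so it does not overwrite the real `keys` object:

```r
# Inline copy of a keys.txt file; the vision value is a placeholder, not a real key
keys_text <- c("       key",
               "region southcentralus",
               "vision ERROR-EDIT-KEYS.txt-FILE")

# The header has one field and each row has two, so read.table uses the
# first column (region, vision) as row names and names the value column "key"
example_keys <- read.table(text = keys_text, header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: fetch one entry and fail early if it is missing or
# still holds a placeholder (assumed here to contain the text "ERROR")
get_key <- function(keys, name) {
  value <- keys[name, "key"]
  if (is.na(value) || grepl("ERROR", value, fixed = TRUE)) {
    stop("Entry '", name, "' in keys.txt has not been set; edit keys.txt first")
  }
  value
}

get_key(example_keys, "region")    # returns "southcentralus"
# get_key(example_keys, "vision")  # stops: the placeholder has not been replaced
```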


```r
## Here are some URLs of example images you can try out later.
## Feel free to find other images you want to use.
## I visited https://en.wikipedia.org/wiki/Special:Random to go to a random Wikipedia page
## and copied image URLs from there. The Large size stays within the API limits.

example_images <- c(
  ## animals
  'https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Pair_of_Merops_apiaster_feeding.jpg/1200px-Pair_of_Merops_apiaster_feeding.jpg',
  'https://upload.wikimedia.org/wikipedia/commons/4/4f/Queenie.JPG',
  'https://upload.wikimedia.org/wikipedia/commons/3/34/Ectopsocus_briggsi.jpg',
  ## buildings, workplaces
  'https://upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Prze%C5%82%C4%99cz_Okraj-przejscie_graniczne.jpg/1200px-Prze%C5%82%C4%99cz_Okraj-przejscie_graniczne.jpg',
  'https://upload.wikimedia.org/wikipedia/commons/thumb/6/61/Wasseiges_JPG04.jpg/1200px-Wasseiges_JPG04.jpg',
  'https://upload.wikimedia.org/wikipedia/commons/5/58/St_george_edgbaston.jpg',
  'https://upload.wikimedia.org/wikipedia/commons/0/02/Atlanta_College_of_Art_Print_Making_Studio.jpg',
  ## non-photos
  'https://upload.wikimedia.org/wikipedia/commons/1/15/M15_%28Ukraine%29.png',
  ## people, faces
  'https://upload.wikimedia.org/wikipedia/en/1/1b/I_Remember_You_%28John_Hicks_album%29.jpg',
  'https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/FIS_Ski_Jumping_World_Cup_2014_-_Engelberg_-_20141221_-_Shohei_Tochimoto.jpg/1200px-FIS_Ski_Jumping_World_Cup_2014_-_Engelberg_-_20141221_-_Shohei_Tochimoto.jpg',
  'https://upload.wikimedia.org/wikipedia/en/d/d7/Grover_Washabaugh.jpg',
  ## things that make the API throw errors
  'https://upload.wikimedia.org/wikipedia/commons/3/3a/FIS_Ski_Jumping_World_Cup_2014_-_Engelberg_-_20141221_-_Shohei_Tochimoto.jpg', # too large
  'error' # malformed URL
)
```


```r
## In this section, we'll call the Computer Vision API manually.
## Later, we'll write a function to automate the process.

#image_url = "https://azurecomcdn.azureedge.net/mediahandler/acomblog/media/Default/blog/ef4d0bc7-2c45-4d17-afb1-9cad8f293657.jpg"
image_url = example_images[3]
## Feel free to try a different example, or provide a URL of your own choice.

# Choose the image information to return.
# Options: Categories, Tags, Description, Faces, ImageType, Color, Adult
visualFeatures = "Description,Tags,Categories,Faces"

# Ask the Computer Vision API to detect names of celebrities or famous landmarks
details = "Landmarks,Celebrities"

reqURL = paste0(vision_api_endpoint,
                "/analyze?visualFeatures=",
                visualFeatures,
                "&details=",
                details)

APIresponse = POST(url = reqURL,
                   content_type('application/json'),
                   add_headers(.headers = c('Ocp-Apim-Subscription-Key' = vision_api_key)),
                   body = list(url = image_url),
                   encode = "json")

df = content(APIresponse)

## Display the caption and its confidence score
cat(image_url, "\n")
df$description$captions[[1]]$text
df$description$captions[[1]]$confidence
```

```
https://upload.wikimedia.org/wikipedia/commons/3/34/Ectopsocus_briggsi.jpg
```
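The request URL is a plain string: the regional endpoint, the `/analyze` path, and comma-separated `visualFeatures` and `details` query parameters. A small sketch of the same `paste0()` construction as a reusable function; `analyze_url()` is a hypothetical name, and its `v1.0` default simply matches the endpoint used in this notebook:

```r
# Hypothetical helper mirroring the paste0() call above: builds the /analyze
# request URL for a region from vectors of visual features and details
analyze_url <- function(region,
                        features = "Description",
                        details = character(0),
                        version = "v1.0") {
  endpoint <- paste0("https://", region,
                     ".api.cognitive.microsoft.com/vision/", version)
  query <- paste0("visualFeatures=", paste(features, collapse = ","))
  # details is optional, so only append it when something was requested
  if (length(details) > 0) {
    query <- paste0(query, "&details=", paste(details, collapse = ","))
  }
  paste0(endpoint, "/analyze?", query)
}

analyze_url("southcentralus",
            features = c("Description", "Tags", "Categories", "Faces"),
            details = c("Landmarks", "Celebrities"))
```

Passing vectors rather than one pre-joined string makes it easy to add or drop a feature without editing the comma list by hand.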


Explore that `df` object to see what other information is returned by the API (try: `print(df)`). We'll just be looking at the
generated image caption for now.

```r
print(df)
```

```
$categories
$categories[[1]]
$categories[[1]]$name
[1] "abstract_"

$categories[[1]]$score
[1] 0.00390625


$categories[[2]]
$categories[[2]]$name
[1] "others_"

$categories[[2]]$score
[1] 0.00390625



$tags
$tags[[1]]
$tags[[1]]$name
[1] "insect"

$tags[[1]]$confidence
[1] 0.9918659


$tags[[2]]
$tags[[2]]$name
[1] "animal"

$tags[[2]]$confidence
[1] 0.9892629


$tags[[3]]
$tags[[3]]$name
[1] "half"

$tags[[3]]$confidence
[1] 0.4304085


$tags[[4]]
$tags[[4]]$name
[1] "fly"

$tags[[4]]$confidence
[1] 0.4304085


$tags[[5]]
$tags[[5]]$name
[1] "wasp"

$tags[[5]]$confidence
[1] 0.1212679


$tags[[6]]
$tags[[6]]$name
[1] "lepidoptera"

$tags[[6]]$confidence
[1] 0.04301453



$description
$description$tags
$description$tags[[1]]
[1] "insect"

$description$tags[[2]]
[1] "animal"

$description$tags[[3]]
[1] "piece"

$description$tags[[4]]
[1] "table"

$description$tags[[5]]
[1] "food"

$description$tags[[6]]
[1] "sitting"

$description$tags[[7]]
[1] "board"

$description$tags[[8]]
[1] "hal
```
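The `tags` element printed above is a list of name/confidence pairs, which is awkward to scan. A minimal sketch that flattens it into a data frame so tags can be sorted or filtered; `tags_to_df()` is a hypothetical helper, and the example input is the first three tags from the output above, typed in by hand:

```r
# Hypothetical helper: flatten a list of tags (each a list with $name and
# $confidence, as in df$tags above) into a data frame
tags_to_df <- function(tags) {
  data.frame(
    name = vapply(tags, function(tag) tag$name, character(1)),
    confidence = vapply(tags, function(tag) tag$confidence, numeric(1)),
    stringsAsFactors = FALSE
  )
}

# First three tags from the printed response above
example_tags <- list(
  list(name = "insect", confidence = 0.9918659),
  list(name = "animal", confidence = 0.9892629),
  list(name = "half",   confidence = 0.4304085)
)

tag_table <- tags_to_df(example_tags)
tag_table[tag_table$confidence > 0.5, ]   # keep only the confident tags
```

With a live response, the same call would be `tags_to_df(df$tags)`.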

Let's define a function in R to apply the Computer Vision API to an image in a URL, and print out the image caption returned by the API.

```r
image_caption <- function(URL) {
  reqURL = paste0(vision_api_endpoint,
                  "/analyze?visualFeatures=Description",
                  "&details=Celebrities,Landmarks")

  APIresponse = POST(url = reqURL,
                     content_type('application/json'),
                     add_headers(.headers = c('Ocp-Apim-Subscription-Key' = vision_api_key)),
                     body = list(url = URL),
                     encode = "json")

  df = content(APIresponse)
  cat(URL, "\n")

  ## When we get Wikimedia Commons images later, we'll grab their description too, and display it if present
  if (!is.null(attr(URL, "desc")))
    cat("Wikimedia Commons description:\n", attr(URL, "desc"), "\n")

  cat("Vision API description:\n", df$description$captions[[1]]$text, "\n")
  cat(paste0("Confidence: ", df$description$captions[[1]]$confidence, "\n"))
  invisible(df)
}
```

Let's try it out:

```r
image_caption("http://media.timeout.com/images/100004257/630/472/image.jpg")
```

```
http://media.timeout.com/images/100004257/630/472/image.jpg
Vision API description:
 a large white building with Sacré-Cœur, Paris in the background
Confidence: 0.851741857095374
```
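`image_caption()` assumes the response always contains at least one caption. When a request fails (for example, a malformed URL or an image that is too large), `df$description` is `NULL` and the caption lines print nothing. A defensive sketch of the extraction step; `extract_caption()` is hypothetical, and it assumes an error body carries a `message` field either at the top level or under `error`, since the exact shape can vary between API versions:

```r
# Hypothetical helper: pull the first caption out of a parsed response,
# returning NA (plus the API's message, if any) when there is no caption
extract_caption <- function(df) {
  captions <- df$description$captions
  if (length(captions) == 0) {
    # Assumed error shapes: list(message = ...) or list(error = list(message = ...))
    msg <- if (!is.null(df$message)) df$message else df$error$message
    if (is.null(msg)) msg <- "no caption returned"
    return(list(text = NA_character_, confidence = NA_real_, note = msg))
  }
  list(text = captions[[1]]$text,
       confidence = captions[[1]]$confidence,
       note = "")
}

# Mock responses (not real API output) to show both paths
ok_response <- list(description = list(captions = list(
  list(text = "a small bird perched on a tree branch", confidence = 0.97))))
failed_response <- list(code = "MockError", message = "mock error message")

extract_caption(ok_response)$text
extract_caption(failed_response)$note
```

Inside `image_caption()`, the two `cat()` calls could then print `extract_caption(df)$text` and show the `note` when the text is `NA`.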


Let's try some more images. We can write a function that returns the URL of a random image from Wikimedia Commons, which gives us an effectively unlimited supply of images to work with. The function also checks that the image meets the Computer Vision API restrictions
(minimum dimensions of 50x50 pixels, maximum file size of 4 MB, and one of the supported formats: JPEG, PNG, GIF, or BMP).
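These limits can be checked locally before spending an API call. A minimal sketch using the same thresholds as `random_image()` (50 pixels, 4,000,000 bytes, and the jpg/jpeg/png/gif/bmp extensions); `check_image()` is hypothetical and, unlike the `stop()` calls in `random_image()`, returns every problem it finds instead of stopping at the first:

```r
# Hypothetical validator: returns a character vector of problems
# (empty if the image looks acceptable to the Computer Vision API)
check_image <- function(width, height, size_bytes, ext) {
  problems <- character(0)
  if (width < 50 || height < 50)
    problems <- c(problems, "smaller than 50x50 pixels")
  if (size_bytes > 4000000)
    problems <- c(problems, "larger than 4 MB")
  if (!(tolower(ext) %in% c("jpg", "jpeg", "png", "gif", "bmp")))
    problems <- c(problems, paste("unsupported format:", ext))
  problems
}

check_image(1200, 800, 350000, "JPG")   # character(0): no problems
check_image(40, 800, 5e6, "svg")        # three problems
```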

```r
random_image <- function() {
  ## Return the URL of a random image in Wikimedia Commons
  random_query <- paste0("https://commons.wikimedia.org/w/api.php?",
                         "action=query",
                         "&generator=random", # get a random page
                         "&grnlimit=1",       # return 1 page
                         "&grnnamespace=6",   # namespace 6: File pages
                         "&prop=imageinfo",
                         "&iiprop=url|size|extmetadata",
                         "&iiurlheight=1080", # also returns a scaled thumbnail (thumburl); url below is the original file
                         "&format=json&formatversion=2")
  random_response <- GET(random_query)  # a read-only query, so GET is the conventional verb
  output <- content(random_response)
  info <- output$query$pages[[1]]$imageinfo[[1]]
  url <- info$url

  ## Check the image metadata, and throw an error if it won't work with the
  ## Computer Vision API
  ext <- tolower(file_ext(url))
  w <- info$width
  h <- info$height
  size <- info$size
  desc <- info$extmetadata$ImageDescription$value
  if (w < 50 || h < 50) stop("Image too small")
  if (size > 4000000) stop("Image too large")
  if (!(ext %in% c("jpg", "jpeg", "png", "gif", "bmp"))) stop(paste("invalid image type:", ext))

  ## In addition to the URL, return the dimensions and Wikimedia description as attributes
  attr(url, "dims") <- c(w = w, h = h)
  attr(url, "desc") <- desc
  url
}
```
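`random_image()` writes its query string by hand, including the literal `|` separators in `iiprop`. If you want to vary the parameters, a small sketch that builds the same kind of query from a named vector and percent-encodes each value with base R's `URLencode()`; `build_query()` is hypothetical, and the parameter set below omits `iiurlheight` for brevity:

```r
# Hypothetical helper: join named parameters into a query string,
# percent-encoding each value (for example, "|" becomes "%7C")
build_query <- function(base_url, params) {
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

wikimedia_params <- c(action = "query",
                      generator = "random",
                      grnlimit = "1",
                      grnnamespace = "6",
                      prop = "imageinfo",
                      iiprop = "url|size|extmetadata",
                      format = "json",
                      formatversion = "2")

build_query("https://commons.wikimedia.org/w/api.php", wikimedia_params)
```

The resulting string could be passed to httr's `GET()` in place of `random_query`.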

```r
u <- random_image()
image_caption(u)

# You might see an "Image too large" or other error; if that happens, just run this cell again to try a different image.
# In some instances you may get no output from the Vision API. This is likely caused by an image of the wrong format or size.
```

```
https://upload.wikimedia.org/wikipedia/commons/1/10/7Z1E8641.jpg
Wikimedia Commons description:
 black-throated green warbler
Vision API description:
 a small bird perched on a tree branch
Confidence: 0.969508118246417
```
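The comments in the cell above suggest rerunning it whenever `random_image()` throws an error. A minimal sketch that automates this with base R's `tryCatch()`. `with_retries()` is a hypothetical helper, and `max_tries` caps the number of attempts so a run of unsuitable images cannot loop forever:

```r
# Hypothetical helper: call f() until it succeeds or max_tries is reached
with_retries <- function(f, max_tries = 5) {
  for (i in seq_len(max_tries)) {
    # Capture any error as a condition object instead of aborting
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " failed: ", conditionMessage(result))
  }
  stop("All ", max_tries, " attempts failed")
}

# Usage with the functions defined above (requires network access):
# u <- with_retries(random_image)
# image_caption(u)
```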
