# Cloud APIs for Computer Vision: Up and Running in 15 Minutes

This code is part of [Chapter 8, Cloud APIs for Computer Vision: Up and Running in 15 Minutes](https://learning.oreilly.com/library/view/practical-deep-learning/9781492034858/ch08.html).

## Get MSCOCO validation image ids with legible text

We will build a set of MSCOCO images that are in the validation split and contain at least one instance of legible text. To do this, first download `cocotext.v2.json` from https://bgshih.github.io/cocotext/ and update the file paths in the cells below.
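
Before loading the annotations, it can help to confirm that the downloaded file is where the notebook expects it. The following is a minimal sketch, not part of the COCO-Text API: `check_annotation_file` is a hypothetical helper, and it assumes the top level of the JSON file is an object.

```python
import json
import os


def check_annotation_file(path):
    """Return the sorted top-level keys of a JSON annotation file.

    Hypothetical helper: raises FileNotFoundError with a hint if the
    file is missing, and assumes the top level of the file is a JSON object.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; download cocotext.v2.json and update the path"
        )
    with open(path) as f:
        data = json.load(f)
    return sorted(data.keys())
```

Calling `check_annotation_file` with the same path that is later passed to `coco_text.COCO_Text` surfaces a wrong path early, with a clearer message.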

In [16]:
# The coco_text module is part of the COCO-Text API; download it from http://vision.cornell.edu/se3/coco-text/
import coco_text
import numpy as np
import skimage.io as io
import os

In [9]:
# Load the COCO-Text JSON annotation file
ct = coco_text.COCO_Text('/cocotext.v2.json')

loading annotations into memory...
0:00:01.413909
creating index...
index created!


In [10]:
#Find the total number of images in validation set
print(len(ct.val))

10000


In [11]:
dataDir = 'train2014'
dataType = 'val2017'

In [12]:
#Get all images containing at least one instance of legible text
imgIds = ct.getImgIds(imgIds=ct.val, catIds=[('legibility', 'legible')])

In [13]:
#Find total number of validation images which have legible text
print(len(imgIds))

3261
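
To make the filter concrete, here is a plain-Python sketch of the same idea on toy data: keep only the image ids that have at least one annotation marked legible. This is an illustration, not the `coco_text` implementation; the function name and the flat list-of-dicts annotation layout are assumptions made for the example.

```python
def image_ids_with_legible_text(candidate_ids, annotations):
    """Keep the candidate image ids that have at least one 'legible' annotation.

    `annotations` is a hypothetical list of dicts with 'image_id' and
    'legibility' keys; it stands in for the real annotation index.
    """
    legible_images = {
        ann["image_id"] for ann in annotations if ann["legibility"] == "legible"
    }
    # Preserve the order of the candidates while filtering
    return [img_id for img_id in candidate_ids if img_id in legible_images]


toy_annotations = [
    {"image_id": 1, "legibility": "legible"},
    {"image_id": 1, "legibility": "illegible"},
    {"image_id": 2, "legibility": "illegible"},
    {"image_id": 3, "legibility": "legible"},
]
print(image_ids_with_legible_text([1, 2, 3, 4], toy_annotations))  # [1, 3]
```

Image 1 is kept even though it also has an illegible annotation, because a single legible instance is enough.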


In [27]:
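#Make sure all the imgIds exist in the data that we downloaded
#Note: COCO 2014 filenames zero-pad the id to 12 digits, so the fixed
#"000000" prefix only produces a valid name for six-digit ids
def filename_from_imgid(imgid):
    return "COCO_train2014_000000" + str(imgid) + ".jpg"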

#Edit with the path to train2014 MSCOCO dataset
path = "/train2014/"

final_imgIds = []

for each in imgIds:
    filename = filename_from_imgid(each)
    if os.path.exists(path + filename):
        final_imgIds.append(each)

print(len(final_imgIds))

2752
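
The count above is lower than the 3261 ids found earlier. One possible contributor (an assumption, not verified against the dataset here) is that the hard-coded `"000000"` prefix only yields a valid filename for six-digit ids, since COCO 2014 filenames zero-pad the id to 12 digits. A minimal sketch of a padded variant follows; `filename_from_imgid_padded` is a hypothetical name, not part of the original notebook.

```python
def filename_from_imgid_padded(imgid, prefix="COCO_train2014_"):
    """Build a COCO 2014 style filename with the id zero-padded to 12 digits."""
    return f"{prefix}{int(imgid):012d}.jpg"


print(filename_from_imgid_padded(9))       # COCO_train2014_000000000009.jpg
print(filename_from_imgid_padded(581929))  # COCO_train2014_000000581929.jpg
```

For six-digit ids this produces the same name as `filename_from_imgid`, so swapping it in can only increase the number of matches.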


In [None]:
#Make a folder where all the temporary data files can be stored
#(-p avoids an error if the folder already exists)
!mkdir -p data
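
If the notebook is run somewhere shell commands are unavailable, the folder can also be created from Python. This is a small sketch using the standard library; `ensure_dir` is a hypothetical helper name.

```python
import os


def ensure_dir(path):
    """Create `path` if needed; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)
```

Calling `ensure_dir("data")` more than once is safe because `exist_ok=True` suppresses the error for an existing folder.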

In [None]:
#Save the ids of the validation images that were found on disk
#(the relative path matches the data folder created above)
with open('data/val_imgIds_final.csv', 'w') as f:
    f.write("\n".join(str(imgId) for imgId in final_imgIds))
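
A quick way to check the saved file is to read it back and compare it with the list in memory. The sketch below demonstrates the round trip on a temporary file with made-up ids; `save_ids` and `load_ids` are hypothetical helpers that mirror the one-id-per-line format used above.

```python
import os
import tempfile


def save_ids(ids, csv_path):
    """Write one id per line, with no trailing newline."""
    with open(csv_path, "w") as f:
        f.write("\n".join(str(i) for i in ids))


def load_ids(csv_path):
    """Read ids back as integers, skipping blank lines."""
    with open(csv_path) as f:
        return [int(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "val_imgIds_demo.csv")
    save_ids([9, 25, 581929], demo_path)
    print(load_ids(demo_path))  # [9, 25, 581929]
```

The same `load_ids` call can be pointed at `val_imgIds_final.csv` to confirm that the number of ids on disk matches the count printed earlier.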