# Microsoft Azure Computer Vision for Screenshot Transcription OCR

## Etienne P Jacquot - ASC SYSADMIN - epj@asc.upenn.edu

> This notebook was originally for Instagram posts!

### [Quickstart: Computer Vision client library for Python](https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/quickstarts-sdk/python-sdk)
__________________

To [install Azure CLI](https://pypi.org/project/azure-cli/) for Python on MacOS:
- `pip install azure-cli` 
    
To [install Azure SDKs](https://docs.microsoft.com/en-us/azure/cognitive-services/Custom-Vision-Service/python-tutorial):
- `pip install azure-cognitiveservices-vision-computervision`
- `pip install azure-cognitiveservices-vision-customvision` 


### On Azure I created an `ASC-ComputerVision` endpoint on the Free tier (20 calls per minute, 5k calls per month)

Save your credentials to `configs/config.ini` and do not share!
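
As a rough sketch of the layout this notebook expects, the file needs an `ASC-COMPUTERVISION` section with `endpoint` and `key1` entries (the names read further down). The values below are placeholders, not real credentials, and the endpoint URL shape is an assumption to check against your own Azure portal page:

```python
import configparser

# Placeholder values only; copy the real endpoint and key from the Azure portal.
SAMPLE_CONFIG = """
[ASC-COMPUTERVISION]
endpoint = https://<your-resource-name>.cognitiveservices.azure.com/
key1 = <your-primary-key>
"""

sample = configparser.ConfigParser()
sample.read_string(SAMPLE_CONFIG)

# Same access pattern the notebook uses: section name, then lower-case option names
sample_cred = dict(sample['ASC-COMPUTERVISION'].items())
print(sorted(sample_cred))
```

Keeping `configs/config.ini` out of version control (for example via `.gitignore`) is one way to honor the "do not share" rule.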

_________


In [66]:
import pandas as pd
import os

import configparser
import sys
import time

import requests

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials

## Set your directory with local images

- For example, in aspillari's home directory on Jhub:

In [67]:
img_dir = '../../../aspillari@asc.upenn.edu/Github/ig_pennstories_ocr/img/'

## Set your Azure access key & endpoint

- this is your `./configs/config.ini`, specifically the **ASC-COMPUTERVISION** profile

In [68]:
# add azure Computer Vision key and endpoint to config.ini
azure_cred = {}
config = configparser.ConfigParser()

config.read('./configs/config.ini') # <--- add your Azure Computer Vision key and endpoint to this file!
for item,value in config['ASC-COMPUTERVISION'].items():
    azure_cred[item]=value
    
# Azure Variables Here!
_url = azure_cred['endpoint'] # Here, paste your full endpoint from the Azure portal
_key = azure_cred['key1']  # Here, paste your primary key
_maxNumRetries = 10

## Set your Azure ComputerVision API Client

In [69]:
computervision_client = ComputerVisionClient(_url, CognitiveServicesCredentials(_key))

## Some Example Python code for running OCR with Azure

For a helpful list of additional examples, please visit here: https://github.com/Azure-Samples/cognitive-services-quickstart-code/blob/master/python/ComputerVision/ComputerVisionQuickstart.py


### (EXAMPLE) *Computer Vision Quick Description for local image*

This is a quick way to get a confidence score on whether there is text in the image

> Notice we turn the JPG image into bytes and pass as stream with `describe_image_in_stream`

In [71]:
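'''
Describe an image - local
This example describes the contents of a local image with the confidence score.
(The printed labels still say "remote"; the image is read from disk.)
'''
print("===== Describe an image - remote =====")
# Call API

# NOTE: screenshot_filenames is built by the os.listdir cell further down; run that cell first
local_image_printed_text_path = os.path.join(img_dir, screenshot_filenames[0])
local_image_printed_text = open(local_image_printed_text_path, "rb")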

description_results = computervision_client.describe_image_in_stream(local_image_printed_text)

# Get the captions (descriptions) from the response, with confidence level
print("Description of remote image: ")
if (len(description_results.captions) == 0):
    print("No description detected.")
else:
    for caption in description_results.captions:
        print("'{}' with confidence {:.2f}%".format(caption.text, caption.confidence * 100))

===== Describe an image - remote =====
Description of remote image: 
'text' with confidence 99.65%


### (EXAMPLE) *Computer Vision Text Description for local image*:

This extracts the text line by line, with a bounding box for each line:

> Notice we turn the JPG image into bytes and pass as stream with `recognize_printed_text_in_stream`

In [82]:
'''
Recognize Printed Text with OCR - local
This example will extract, using OCR, printed text in an image, then print results line by line.
'''
print("===== Detect Printed Text with OCR - local =====")

#remote_printed_text_image_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/printed_text.jpg"
#remote_printed_text_image_url=remote_image_url

# Specify your local image here:
# For example: index 22 selects the 23rd file in screenshot_filenames
local_image_printed_text_path = os.path.join(img_dir, screenshot_filenames[22])
local_image_printed_text = open(local_image_printed_text_path, "rb")

ocr_result_local = computervision_client.recognize_printed_text_in_stream(local_image_printed_text)
for region in ocr_result_local.regions:
    for line in region.lines:
        print("Bounding box: {}".format(line.bounding_box))
        s = ""
        for word in line.words:
            s += word.text + " "
        print(s)
print()
'''
END - Recognize Printed Text with OCR - local
'''

===== Detect Printed Text with OCR - local =====
Bounding box: 69,55,634,75
HANGING CITY 
Bounding box: 69,168,471,75
EYE SHRINE 
Bounding box: 68,271,692,49
PLANET: BRITTLE HOLLOW 
Bounding box: 68,343,678,52
SCENE: BH EYE SHRINE I 
Bounding box: 53,593,198,44
RIEBECK 
Bounding box: 365,579,1144,40
I'm pretty sure we'll need the others for this next part. We'll need, you know... 
Bounding box: 363,628,139,29
everyone. 
Bounding box: 345,733,616,39
No rush! Take your time. It mightnotevenexisthere„. 
Bounding box: 343,958,230,39
Should I begin? 
Bounding box: 449,1053,55,26
Yes. 
Bounding box: 452,1102,110,36
Not yet. 
Bounding box: 343,1282,447,40
Ok, I'll wait until you're ready. 
Bounding box: 449,1503,55,26
Yes. 
Bounding box: 341,1707,412,40
You g-got it. I'll do my best! 
Bounding box: 345,1969,589,40
I learned a lot, by the end of everything. 
Bounding box: 342,2125,1141,39
The past is past, now, but that's... you know, that's okay! It's never really gone 
Bounding box: 343,2162

'\nEND - Recognize Printed Text with OCR - local\n'

___________________

## Great, now run Microsoft Azure OCR on the rest of your screenshots to get transcripts

In [73]:
screenshot_filenames = [item for item in os.listdir(img_dir) if item.endswith('.jpg')]

_______

## Looking at our Example OCR (local image text description content) in a DataFrame

- one row represents one text region that OCR found in an image, so an image with a single region yields a single row

In [74]:
ocr_df = pd.DataFrame.from_dict(ocr_result_local.as_dict())

In [75]:
ocr_df

| | language | text_angle | orientation | regions | model_version |
|---|---|---|---|---|---|
| 0 | en | 0.0 | Up | {'bounding_box': '50,55,1569,2326', 'lines': [... | 2021-04-01 |


In [76]:
ocr_df.regions[0].keys()

dict_keys(['bounding_box', 'lines'])
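
To see why the DataFrame gets one row per region, here is a small sketch using a hand-written dictionary in place of `ocr_result_local.as_dict()`. The sample values are made up; only the key names follow the output shown above. pandas broadcasts the scalar fields across the list under `regions`:

```python
import pandas as pd

# Hypothetical stand-in for ocr_result_local.as_dict():
# one image whose text was split into two regions.
sample_result = {
    'language': 'en',
    'text_angle': 0.0,
    'orientation': 'Up',
    'regions': [
        {'bounding_box': '0,0,100,20',
         'lines': [{'bounding_box': '0,0,100,20',
                    'words': [{'bounding_box': '0,0,40,20', 'text': 'HELLO'}]}]},
        {'bounding_box': '0,30,100,20',
         'lines': [{'bounding_box': '0,30,100,20',
                    'words': [{'bounding_box': '0,30,40,20', 'text': 'WORLD'}]}]},
    ],
}

sample_df = pd.DataFrame.from_dict(sample_result)

# Two regions in the list -> two rows, each repeating the scalar fields
print(len(sample_df))
print(sample_df.regions[1].keys())
```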

### We can see all the words extracted:

Ideally you just take this list of strings (words) and write it to a txt or json file

In [77]:
ocr_text = []

for line in ocr_df.regions[0]['lines']:
    for word in line['words']:
        #print(word['text'])
        ocr_text.append(word['text'])
        
print(ocr_text)

['ESKER', '-', 'LUNAR', 'OUTPOST', 'PLANET:', 'ATTLEROCK', 'SCENE:', 'AR', 'ESKAR', 'I', 'FELDSPAR', 'Whoa!', "Where'd", 'you', 'come', 'from?', 'No', "one's", 'come', 'here', 'in...', 'well,', 'ever,', 'actually.', 'That', 'makes', 'you', 'the', 'second', 'Hearthian', 'to', 'ever', 'reach', 'Dark', 'Bramble', '—', 'after', 'me,', 'of', 'course.', 'Well', 'done!', '...Say,', "it's", 'you!', 'They', 'made', 'you', 'an', 'astronaut?', 'And', 'you', "haven't", 'blown', 'yourself', 'up', 'yet,', 'good', 'for', 'you!', 'Feldspar!', "You're", 'alive!', 'We', 'all', 'thought', 'you', 'were', 'dead', 'for', 'sure.', 'Have', 'you', 'been', 'here', 'in', 'Dark', 'Bramble', 'all', 'this', 'time?', '...You', 'never', 'were', 'the', 'brightest', 'hatchling,', 'were', 'you.', 'Yeah,', "that's", 'right,', "I'm", 'alive.', 'Been', 'camping', 'out', 'here', 'since', 'my', 'ship,', 'uh,', "y'know.", 'Crashed.', 'Violently.', 'Wait,', 'what?', 'You', 'crashed?', 'But', "you're", 'the', 'greatest', 'pilot
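
Writing a word list like the one above to disk could look like the following sketch. The file names and the temporary output folder are assumptions for illustration, and `sample_words` stands in for `ocr_text`:

```python
import json
import os
import tempfile

# Hypothetical stand-in for the ocr_text list built above
sample_words = ['ESKER', '-', 'LUNAR', 'OUTPOST']

out_dir = tempfile.mkdtemp()  # assumption: swap in your own output folder
txt_path = os.path.join(out_dir, 'ocr_words.txt')
json_path = os.path.join(out_dir, 'ocr_words.json')

# Plain text: a single space-separated transcript line
with open(txt_path, 'w', encoding='utf-8') as fh:
    fh.write(' '.join(sample_words) + '\n')

# JSON: keeps the word boundaries as a list
with open(json_path, 'w', encoding='utf-8') as fh:
    json.dump(sample_words, fh, ensure_ascii=False)
```

The JSON form is easier to reload as a list later; the text form is easier to read by eye.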

_________

# RUNNING FOR ALL IMAGES W/ RATE LIMITING

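Wait **AT LEAST** 60 seconds after your last API call before starting, so the per-minute quota has reset. The loop below handles rate limiting in a simple way: after every 20 requests it sleeps for 63 seconds. I am guessing the `computervision_client` has a built-in option for this (something like `rate_limiting_wait=yes`), but I have not confirmed that one exists.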
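
An alternative to counting requests by hand is a small sliding-window limiter. This is only a sketch: `MinuteWindowLimiter` is a hypothetical helper written for this notebook, not part of the Azure SDK, and the defaults mirror the Free tier limit of 20 calls per minute mentioned at the top.

```python
import time


class MinuteWindowLimiter:
    """Block while `max_calls` calls were already made in the last `period` seconds."""

    def __init__(self, max_calls=20, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = []       # timestamps of calls still inside the window

    def wait(self):
        while True:
            now = self.clock()
            # Forget calls that have left the sliding window
            self.calls = [t for t in self.calls if now - t < self.period]
            if len(self.calls) < self.max_calls:
                break
            # Sleep until the oldest remembered call falls out of the window
            self.sleep(self.period - (now - self.calls[0]))
        self.calls.append(now)


limiter = MinuteWindowLimiter(max_calls=20, period=60.0)
limiter.wait()  # returns immediately: no calls recorded yet
```

In the OCR loop, `limiter.wait()` would go immediately before each `recognize_printed_text_in_stream` call. Passing a slightly longer `period` (such as 63) leaves the same safety margin as the fixed sleep.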

## Updated loop to run OCR on every image and print all lines:

In [83]:
dfs = []

i = 0  # requests sent since the last pause

for screenshot in screenshot_filenames:
    
    print('getting ocr for local image -->',screenshot)
    
    # RATE LIMITING, CHECK IN BEGINNING OF THE LOOP
    # After 20 requests, pause, then carry on with the current image (it is not skipped)
    if i == 20:
        print('i --> {}'.format(i))
        print('waiting for 63 seconds...')
        i=0
        time.sleep(63)
        
    # TRY TO READ LOCAL IMAGE FILE
    try:
        local_image_printed_text_path = os.path.join(img_dir, screenshot)
        local_image_printed_text = open(local_image_printed_text_path, "rb")
    except OSError as err:
        print('oops! failed to open image...')
        sys.stderr.write('{}\n'.format(err))
        break
        
    # TRY TO GET AZURE OCR TEXT EXTRACTION
    try:
        ocr_result_local = computervision_client.recognize_printed_text_in_stream(local_image_printed_text)
        for region in ocr_result_local.regions:
            for line in region.lines:
                print("Bounding box: {}".format(line.bounding_box))
                s = ""
                for word in line.words:
                    s += word.text + " "
                print(s)
        print('-'*60)
        
        ocr_df = pd.DataFrame.from_dict(ocr_result_local.as_dict())
        #ocr_df['img_filename'] = url
        # splitext drops only the extension; str.strip('.jpg') strips characters, not a suffix
        ocr_df['postId'] = os.path.splitext(screenshot)[0]
        dfs.append(ocr_df)
        i = i + 1
    except Exception as err:
        print('oops! failed to get azure ocr...')
        sys.stderr.write('{}\n'.format(err))
        break
    finally:
        # always release the file handle, even when the request fails
        local_image_printed_text.close()

getting ocr for local image --> OW_Script_v3_pt2_Page_01.jpg
Bounding box: 51,55,1039,75
ESKER - LUNAR OUTPOST 
Bounding box: 50,212,542,49
PLANET: ATTLEROCK 
Bounding box: 50,284,567,51
SCENE: AR ESKAR I 
Bounding box: 50,485,387,45
FELDSPAR Whoa! 
Bounding box: 341,580,1258,39
Where'd you come from? No one's come here in... well, ever, actually. That makes you 
Bounding box: 341,618,1233,35
the second Hearthian to ever reach Dark Bramble — after me, of course. Well done! 
Bounding box: 341,711,1241,40
...Say, it's you! They made you an astronaut? And you haven't blown yourself up yet, 
Bounding box: 341,749,193,39
good for you! 
Bounding box: 450,843,296,37
Feldspar! You're alive! 
Bounding box: 448,893,520,35
We all thought you were dead for sure. 
Bounding box: 450,941,681,36
Have you been here in Dark Bramble all this time? 
Bounding box: 341,1079,755,39
...You never were the brightest hatchling, were you. 
Bounding box: 340,1204,1148,39
Yeah, that's right, I'm alive. Been camping

________________________

## Look at combined ocr_dfs for local screenshots

- Text extracted for 24 images:

In [47]:
pd.concat(dfs)

| | language | text_angle | orientation | regions | model_version | postId |
|---|---|---|---|---|---|---|
| 0 | en | 0.0 | Up | {'bounding_box': '50,55,1569,2326', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_01 |
| 0 | en | 0.0 | Up | {'bounding_box': '43,55,1545,1261', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_03 |
| 0 | en | 0.0 | Up | {'bounding_box': '32,55,1574,2336', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_04 |
| 0 | en | 0.0 | Up | {'bounding_box': '58,55,741,280', 'lines': [{'... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 1 | en | 0.0 | Up | {'bounding_box': '35,486,156,44', 'lines': [{'... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 2 | en | 0.0 | Up | {'bounding_box': '341,503,912,216', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 3 | en | 0.0 | Up | {'bounding_box': '1262,503,329,28', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 4 | en | 0.0 | Up | {'bounding_box': '340,793,1254,1479', 'lines':... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 0 | en | 0.0 | Up | {'bounding_box': '73,55,1545,1803', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_06 |
| 0 | en | 0.0 | Up | {'bounding_box': '68,63,1545,2014', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_07 |


In [48]:
out_df = pd.concat(dfs)
out_df.reset_index(drop=True,inplace=True)
out_df.to_json('screenshot_ocr.json')

## *UPDATE -->> I exported the results to json, so not going to run again...*
- The above cell for OCR w/ rate limiting takes around 33 minutes to complete
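
Since the results are already on disk, a later session can reload them instead of calling the API again. The sketch below round-trips a made-up two-row frame shaped like `out_df` through a temporary file; with the real export you would only need the `pd.read_json` line, pointed at `screenshot_ocr.json`.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame shaped like out_df (one row per region)
sample_out = pd.DataFrame({
    'language': ['en', 'en'],
    'regions': [
        {'bounding_box': '0,0,10,10',
         'lines': [{'words': [{'text': 'HELLO'}]}]},
        {'bounding_box': '0,20,10,10',
         'lines': [{'words': [{'text': 'WORLD'}]}]},
    ],
    'postId': ['page_a', 'page_b'],
})

json_path = os.path.join(tempfile.mkdtemp(), 'screenshot_ocr.json')
sample_out.to_json(json_path)

# Later session: reload the export; the nested region dicts come back as dicts
reloaded = pd.read_json(json_path)
first_word = reloaded.regions.iloc[0]['lines'][0]['words'][0]['text']
print(first_word)
```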

In [49]:
out_df

| | language | text_angle | orientation | regions | model_version | postId |
|---|---|---|---|---|---|---|
| 0 | en | 0.0 | Up | {'bounding_box': '50,55,1569,2326', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_01 |
| 1 | en | 0.0 | Up | {'bounding_box': '43,55,1545,1261', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_03 |
| 2 | en | 0.0 | Up | {'bounding_box': '32,55,1574,2336', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_04 |
| 3 | en | 0.0 | Up | {'bounding_box': '58,55,741,280', 'lines': [{'... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 4 | en | 0.0 | Up | {'bounding_box': '35,486,156,44', 'lines': [{'... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 5 | en | 0.0 | Up | {'bounding_box': '341,503,912,216', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 6 | en | 0.0 | Up | {'bounding_box': '1262,503,329,28', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 7 | en | 0.0 | Up | {'bounding_box': '340,793,1254,1479', 'lines':... | 2021-04-01 | OW_Script_v3_pt2_Page_05 |
| 8 | en | 0.0 | Up | {'bounding_box': '73,55,1545,1803', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_06 |
| 9 | en | 0.0 | Up | {'bounding_box': '68,63,1545,2014', 'lines': [... | 2021-04-01 | OW_Script_v3_pt2_Page_07 |


__________

## Flatten the nested OCR words into a `txt` column

In [50]:
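def ocr_expand(x):
    '''Return the words of one row's OCR region as a flat list of strings.'''
    txt = []
    
    # x.regions is a single region dict: {'bounding_box': ..., 'lines': [...]}
    for line in x.regions['lines']:
        for word in line['words']:
            txt.append(word['text'])
            
    return txt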

In [51]:
out_df['txt'] = out_df.apply(lambda x: ocr_expand(x), axis=1)

In [52]:
final_df = out_df

In [53]:
final_df.txt

0     [ESKER, -, LUNAR, OUTPOST, PLANET:, ATTLEROCK,...
1     [ESKER, -, LUNAR, OUTPOST, PLANET:, ATTLEROCK,...
2     [ESKER, -, LUNAR, OUTPOST, PLANET:, ATTLEROCK,...
3     [EYE, LOCATOR, PLANET:, ATTLEROCK, SCENE:, AR,...
4                                               [CHERT]
5     [Goodness,, it's, you!, Hello!, I, take, it, y...
6                             [then?, Welcome, to, the]
7     [Hornfels, asked, me, to, update, our, star, c...
8     [EYE, LOCATOR, SCROLL, PLANET:, ATTLEROCK, SCE...
9     [RIEBECK, -, ATTLEROCK, NOTES, PLANET:, ATTLER...
10    [CHERT, -, ATTLEROCK, NOTES, PLANET:, ATTLEROC...
11    [ESKER, SIGNALSCOPE, LOG, PLANET:, ATTLEROCK, ...
12                                              [CHERT]
13    [Hm?, Oh,, it's, you!, I, take, it, your, firs...
14    [BLACK, HOLE, FORGE, PLANET:, BRITTLE, HOLLOW,...
15                                              [CHERT]
16    [I, found, Nomai, writing, about, the, Sun, St...
17    [BLACK, HOLE, FORGE, SCROLL, PLANET:, BRIT

### Export the dataframe w/ extracted text to JSON

In [58]:
final_df.to_json('screenshot_ocr_w_text.json')

### To view full text, we join the ocr text for further analysis... 

In [56]:
' '.join(out_df.txt.iloc[0])

"ESKER - LUNAR OUTPOST PLANET: ATTLEROCK SCENE: AR ESKAR I FELDSPAR Whoa! Where'd you come from? No one's come here in... well, ever, actually. That makes you the second Hearthian to ever reach Dark Bramble — after me, of course. Well done! ...Say, it's you! They made you an astronaut? And you haven't blown yourself up yet, good for you! Feldspar! You're alive! We all thought you were dead for sure. Have you been here in Dark Bramble all this time? ...You never were the brightest hatchling, were you. Yeah, that's right, I'm alive. Been camping out here since my ship, uh, y'know. Crashed. Violently. Wait, what? You crashed? But you're the greatest pilot in Hearthian history! Oh, this is a good story. I'd just finished exploring the core of Giant's Dee and needed a new challenge, and none of us had ever been inside Dark Bramble, so I tkink, hey, let's give that a try. I've been cruising around for a while, dodging the odd massive, interdimensional vine bristling with thorns, when I run i

### We can loop through each row and print out the text for reference

In [64]:
for row in out_df.txt:

    print(' '.join(row))
    print('-'*60)

ESKER - LUNAR OUTPOST PLANET: ATTLEROCK SCENE: AR ESKAR I FELDSPAR Whoa! Where'd you come from? No one's come here in... well, ever, actually. That makes you the second Hearthian to ever reach Dark Bramble — after me, of course. Well done! ...Say, it's you! They made you an astronaut? And you haven't blown yourself up yet, good for you! Feldspar! You're alive! We all thought you were dead for sure. Have you been here in Dark Bramble all this time? ...You never were the brightest hatchling, were you. Yeah, that's right, I'm alive. Been camping out here since my ship, uh, y'know. Crashed. Violently. Wait, what? You crashed? But you're the greatest pilot in Hearthian history! Oh, this is a good story. I'd just finished exploring the core of Giant's Dee and needed a new challenge, and none of us had ever been inside Dark Bramble, so I tkink, hey, let's give that a try. I've been cruising around for a while, dodging the odd massive, interdimensional vine bristling with thorns, when I run in
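
Because an image can span several rows (one per region), printing row by row splits some pages into fragments. One possible way to get a single transcript per image is to group on `postId`. The frame below is made up for illustration and only mimics the `postId` and `txt` columns:

```python
import pandas as pd

# Hypothetical frame shaped like out_df after the `txt` column is added:
# 'page_b' was split into two regions, so it occupies two rows.
sample_txt = pd.DataFrame({
    'postId': ['page_a', 'page_b', 'page_b'],
    'txt': [['HELLO', 'THERE'], ['CHERT'], ['Welcome', 'to', 'the']],
})

# Join the words of each row, then the rows of each image, keeping row order
transcripts = (
    sample_txt
    .assign(text=sample_txt.txt.apply(' '.join))
    .groupby('postId', sort=False)['text']
    .agg(' '.join)
)
print(transcripts.to_dict())
```

Regions are concatenated in the order they appear in the frame, which may not match reading order on pages with several columns or side notes.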

_________

## At this point you should have the text you needed extracted from your screenshots... all for free with Azure!