## Azure Image Analysis

In [None]:
!pip install azure-cognitiveservices-vision-computervision
!pip install azure-storage-blob

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
# Setup
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

from array import array
import os
from PIL import Image
import sys
import time

import pandas as pd
import numpy as np

In [None]:
import re

**The following image variables will be analyzed during the image analysis:**

1. Presence of text
2. People-centered vs. text-centered

For 1. Presence of text, there are two approaches to extract this variable:

(1) Use OCR (Optical Character Recognition) to extract the text content of each image. 

(2) Use image tags to get the confidence score (likelihood, or probability) of the "text" tag. If the score is greater than a certain threshold (e.g., > 0.05, since we only care about the presence of text), consider the image as containing text.
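The threshold rule in approach (2) could look roughly like this. A minimal sketch, assuming the tags are already available as a dictionary of tag name to confidence score; the function name `has_text_above_threshold` and the sample dictionaries are hypothetical, and the 0.05 default only mirrors the example value above.

```python
def has_text_above_threshold(image_tags, threshold=0.05):
    """Return True if the 'text' tag is present with confidence above threshold."""
    # A missing 'text' tag is treated as a confidence of 0.0
    return image_tags.get("text", 0.0) > threshold


# Hypothetical tag dictionaries, for illustration only
poster_tags = {"text": 0.99, "poster": 0.91}
photo_tags = {"person": 0.97, "outdoor": 0.80}
print(has_text_above_threshold(poster_tags))
print(has_text_above_threshold(photo_tags))
```

Compared with only checking whether the "text" tag exists, the threshold makes the cut-off explicit and easy to tune later.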

For 2. People-centered vs. text-centered, there are three approaches to extract this variable:

(1) Use image tags to get the confidence scores (likelihood, or probability) of the text and people tags. When both text and people tags are detected, label the image as text-centered or people-centered according to whichever tag has the higher confidence score.

(2) Use image description (captioning). A sentence is generated for the image; check whether the sentence indicates a text-centered or people-centered image.

(3) Use image category. Unlike tags, categories are organized in a parent/child hierarchy, and there are fewer of them (86, as opposed to thousands of tags). Decide whether the image is text-centered or people-centered by checking whether the category name starts with "people_" or "text_".
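Approach (1) can be sketched as a comparison of confidence scores. This is only a sketch under assumptions: `PEOPLE_TAGS` is a hypothetical list of people-related tag names (the real list still has to be worked out from the data), `classify_by_tags` is a made-up helper name, and ties are arbitrarily resolved in favour of text.

```python
# Assumed set of people-related tag names, for illustration only
PEOPLE_TAGS = {"person", "people", "human face", "woman", "man"}


def classify_by_tags(image_tags, people_tags=PEOPLE_TAGS):
    """Label an image 'text', 'people', or None from a {tag name: confidence} dictionary."""
    text_score = image_tags.get("text", 0.0)
    # Highest confidence among all people-related tags (0.0 if none are present)
    people_score = max(
        (score for name, score in image_tags.items() if name in people_tags),
        default=0.0,
    )
    if text_score == 0.0 and people_score == 0.0:
        return None  # neither kind of tag was detected
    # Ties go to 'text' (an arbitrary choice for this sketch)
    return "text" if text_score >= people_score else "people"


# Hypothetical tag dictionaries
print(classify_by_tags({"text": 0.99, "person": 0.60}))
print(classify_by_tags({"person": 0.95, "text": 0.40}))
print(classify_by_tags({"sky": 0.90}))
```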

**Note:** 

Azure accepts local image files or image URLs (e.g., https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/cognitive-services/Computer-vision/Images/readsample.jpg) for the analysis.

Image URLs from Google Drive are not valid (the request fails with Bad Request). An alternative is to upload the images to an Azure Storage Account.

For now, the image URLs generated by the Azure Storage Account are used for demonstration purposes.

TBD: Whether to analyze through local paths or URLs.

The image file size should be less than 4 MB, so some images may need to be compressed.

Update: Instagram seems to compress images before saving them to its database, so each image should not exceed the 4 MB file size limit.
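To confirm that no local image exceeds the limit before calling the API, the files can be checked with the standard library. A minimal sketch, assuming "4 MB" means 4 * 1024 * 1024 bytes; the helper names and the `images` folder in the usage note are hypothetical.

```python
import os

# The 4 MB limit noted above, interpreted here as 4 * 1024 * 1024 bytes (an assumption)
MAX_IMAGE_BYTES = 4 * 1024 * 1024


def is_within_size_limit(image_path, max_bytes=MAX_IMAGE_BYTES):
    """Return True if the file is strictly smaller than max_bytes."""
    return os.path.getsize(image_path) < max_bytes


def find_oversized_images(folder, max_bytes=MAX_IMAGE_BYTES):
    """Walk folder recursively and return the sorted paths of files at or over the limit."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(folder):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if not is_within_size_limit(path, max_bytes):
                oversized.append(path)
    return sorted(oversized)
```

Calling `find_oversized_images("images")` on the unzipped folder would return an empty list if every image is already small enough.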

## Extracting Variables

In [None]:
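# Keys and endpoint from Microsoft Azure
# Note: the key originally pasted here is no longer valid.
# Read the key from the environment instead of hardcoding it
# (AZURE_CV_KEY is an assumed variable name, set it before running)
subscription_key = os.environ["AZURE_CV_KEY"]
endpoint = "https://aijins-computer-vision.cognitiveservices.azure.com/"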

In [None]:
# Authentication
computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))

In [None]:
# Save images to blob storage
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
# Read the connection string from the environment instead of hardcoding the account key
# (AZURE_STORAGE_CONNECTION_STRING is an assumed variable name, set it before running)
connect_str = os.environ['AZURE_STORAGE_CONNECTION_STRING']
container_name = 'newimage'
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_client = blob_service_client.get_container_client(container_name)

In [None]:
image_urls = []
blobs = container_client.list_blobs()
for blob in blobs:
    image_url = f"https://aijinsimagedata.blob.core.windows.net/{container_name}/{blob.name}"
    image_urls.append(image_url)

url = "https://insimagestorage.blob.core.windows.net/image/profile_post_img_all/freeofviolence/269812685_772090997519875_1762577499023572044_n.jpg"
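Blob names that contain spaces or other special characters need to be percent-encoded before they are used in a URL. The sketch below shows one way to build the URLs safely; `build_blob_url` is a hypothetical helper and the sample blob name is made up.

```python
from urllib.parse import quote


def build_blob_url(account_name, container_name, blob_name):
    """Build a blob URL, percent-encoding characters that are unsafe in URLs."""
    # Normalize Windows-style separators so virtual folders use "/"
    blob_name = blob_name.replace("\\", "/")
    # quote() leaves "/" untouched by default, so folder separators are preserved
    return f"https://{account_name}.blob.core.windows.net/{container_name}/{quote(blob_name)}"


# Made-up blob name with spaces, for illustration only
print(build_blob_url("aijinsimagedata", "newimage", "my folder/photo 1.jpg"))
```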

In [None]:
len(image_urls)

14618

For local image:

In [None]:
# #images_folder = os.path.join (os.path.dirname(os.path.abspath('C:\Users\shiho\Aij\SA_Instagram')), "Images")

# images_folder = "C:/Users/shiho/Aij/SA_Instagram/Images/"
# local_image_path = os.path.join (images_folder, "199142856-986c1f8c-10f9-4973-91a1-573d927ec7bf.jpg")
# local_image = open(local_image_path, "rb")

For Azure Storage Account:

In [None]:
image_folder_azure = "https://insimagestorage.blob.core.windows.net/image/"

In [None]:
# image of people with text
image_name = "199142809-bb24a9ff-ebb7-476b-90ee-dc7f4dd0e172.jpg"
# image without text
#image_name = "289842202_737546634131529_4184733584509506915_n.jpg"

#image_name = "292969378_1193562068106318_3930416080560697894_n.webp"


In [None]:
read_image_url = image_folder_azure + image_name
read_image_url

'https://insimagestorage.blob.core.windows.net/image/199142809-bb24a9ff-ebb7-476b-90ee-dc7f4dd0e172.jpg'

To download and unzip the image files:

In [None]:
import urllib.request
import zipfile
import os

url = 'https://aijinsimagedata.blob.core.windows.net/newimage/profile_post_img_all.zip'
filename = 'profile_post_img_all.zip'
urllib.request.urlretrieve(url, filename)
# Unzip the file
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall('images')

In [None]:
image_url = []
for dirpath, dirnames, filenames in os.walk('images/profile_post_img_all'):
    for filename in filenames:
        # Build image URL
        image_path = os.path.join(dirpath, filename)

        # Upload the file and record the URL it will have in the container
        with open(image_path, 'rb') as data:
            container_client.upload_blob(name=image_path, data=data)
            image_url.append("https://aijinsimagedata.blob.core.windows.net/newimage/" + image_path)



In [None]:
len(image_url)

53071

### Presence of Text

**(1) Use OCR**

Demonstration:

In [None]:
# Call API with URL and raw response (allows you to get the operation location)
read_response = computervision_client.read(image_url[0],  raw=True)

In [None]:
# Get the operation location (URL with an ID at the end) from the response
read_operation_location = read_response.headers["Operation-Location"]

# Grab the ID from the URL
operation_id = read_operation_location.split("/")[-1]

# Call the "GET" API and wait for it to retrieve the results 
while True:
    read_result = computervision_client.get_read_result(operation_id)
    if read_result.status not in ['notStarted', 'running']:
        break
    time.sleep(1)
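The polling loop above waits forever if the operation never leaves the `running` state. A possible variant with a timeout is sketched below; `poll_until_done` is a hypothetical helper, and the demo uses a fake sequence of statuses rather than a real Azure call.

```python
import time


def poll_until_done(fetch, is_pending, interval=1.0, timeout=60.0):
    """Call fetch() until is_pending(result) is False, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if not is_pending(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation still pending after {timeout} seconds")
        time.sleep(interval)


# Demo with a fake status sequence standing in for the API responses
fake_statuses = iter(["notStarted", "running", "succeeded"])
final_status = poll_until_done(
    fetch=lambda: next(fake_statuses),
    is_pending=lambda status: status in ("notStarted", "running"),
    interval=0,
)
print(final_status)
```

In this notebook, `fetch` would presumably be `lambda: computervision_client.get_read_result(operation_id)` and `is_pending` would check `result.status` against the same two values as the loop above.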

In [None]:
# Collect the detected text and bounding boxes, line by line
if read_result.status == OperationStatusCodes.succeeded:
    # Initialize the lists once, before the loop, so lines from every page are kept
    line_text = []
    line_bouding_box = []
    for text_result in read_result.analyze_result.read_results:
        for line in text_result.lines:
            line_text.append(line.text)
            line_bouding_box.append(line.bounding_box)

In [None]:
line_text

['G13',
 'Concertation',
 'nationale',
 'féministe',
 'MINISTÈRE DES DROITS',
 "DES FEMMES ET DE L'ÉGALITÉ",
 'Pour plus',
 'de stabilité',
 'Avoir un ministère des Droits des',
 "femmes et de l'Égalité permettrait",
 "d'assurer une plus grande",
 "stabilité dans l'administration des",
 'fonds et dans la mise en œuvre',
 'des politiques.']

In [None]:
line_bouding_box

[[819.0, 64.0, 921.0, 62.0, 924.0, 111.0, 819.0, 115.0],
 [928.0, 63.0, 1020.0, 63.0, 1020.0, 79.0, 928.0, 79.0],
 [925.0, 77.0, 991.0, 79.0, 990.0, 96.0, 925.0, 95.0],
 [926.0, 94.0, 990.0, 97.0, 989.0, 112.0, 926.0, 109.0],
 [826.0, 133.0, 971.0, 133.0, 971.0, 145.0, 826.0, 145.0],
 [826.0, 149.0, 1010.0, 148.0, 1011.0, 160.0, 826.0, 162.0],
 [569.0, 520.0, 907.0, 520.0, 907.0, 586.0, 569.0, 586.0],
 [571.0, 591.0, 972.0, 591.0, 972.0, 654.0, 571.0, 654.0],
 [451.0, 738.0, 999.0, 738.0, 999.0, 772.0, 451.0, 772.0],
 [451.0, 777.0, 1013.0, 778.0, 1013.0, 819.0, 451.0, 817.0],
 [451.0, 821.0, 883.0, 824.0, 883.0, 861.0, 451.0, 858.0],
 [449.0, 863.0, 1004.0, 863.0, 1004.0, 902.0, 449.0, 901.0],
 [449.0, 906.0, 971.0, 908.0, 971.0, 944.0, 449.0, 942.0],
 [450.0, 949.0, 688.0, 950.0, 688.0, 989.0, 450.0, 987.0]]
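Each bounding box above is a flat list of eight numbers, i.e. the (x, y) coordinates of the four corners of a text line. If the size of the text region matters later (e.g., how much of the image is covered by text), the area of each box can be computed with the shoelace formula. A minimal sketch with hypothetical helper names:

```python
def bounding_box_to_points(bounding_box):
    """Split a flat [x1, y1, ..., x4, y4] list into four (x, y) corner tuples."""
    return list(zip(bounding_box[0::2], bounding_box[1::2]))


def polygon_area(points):
    """Area of a simple polygon given its corners in order (shoelace formula)."""
    total = 0.0
    # Pair each corner with the next one, wrapping around to the first
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Fifth bounding box from the output above (an axis-aligned 145 x 12 rectangle)
box = [826.0, 133.0, 971.0, 133.0, 971.0, 145.0, 826.0, 145.0]
print(polygon_area(bounding_box_to_points(box)))  # 1740.0
```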

Write Function:

In [None]:
def azure_ocr(read_image_url):
    '''Use Azure Computer Vision OCR to extract text from an image
    Return two variables, both in list format
    line_text is the actual text extracted
    line_bouding_box is the region of each text line in the image
    Both lists are empty if no text is found or the operation does not succeed
    '''
    # Call API with URL and raw response (allows you to get the operation location)
    read_response = computervision_client.read(read_image_url, raw=True)
    
    # Get the operation location (URL with an ID at the end) from the response
    read_operation_location = read_response.headers["Operation-Location"]
    
    # Grab the ID from the URL
    operation_id = read_operation_location.split("/")[-1]
    
    # Call the "GET" API and wait for it to retrieve the results
    while True:
        read_result = computervision_client.get_read_result(operation_id)
        if read_result.status not in ['notStarted', 'running']:
            break
        time.sleep(1)

    # Initialize the results up front so they are defined even if the read fails,
    # and so lines from every page are kept
    line_text = []
    line_bouding_box = []
    # Collect the detected text, line by line
    if read_result.status == OperationStatusCodes.succeeded:
        for text_result in read_result.analyze_result.read_results:
            for line in text_result.lines:
                line_text.append(line.text)
                line_bouding_box.append(line.bounding_box)
    return line_text, line_bouding_box

In [None]:
line_text, line_bouding_box = azure_ocr(image_url[0])
line_text

['G13',
 'Concertation',
 'nationale',
 'féministe',
 'MINISTÈRE DES DROITS',
 "DES FEMMES ET DE L'ÉGALITÉ",
 'Pour plus',
 'de stabilité',
 'Avoir un ministère des Droits des',
 "femmes et de l'Égalité permettrait",
 "d'assurer une plus grande",
 "stabilité dans l'administration des",
 'fonds et dans la mise en œuvre',
 'des politiques.']

In [None]:
len([])

0

In [None]:
def text_presence_by_ocr(line_text):
    '''Identify whether the image contains text using the OCR approach
    Text is present if line_text is not empty
    Return boolean True or False
    '''
    return len(line_text) != 0

In [None]:
has_text_ocr = text_presence_by_ocr(line_text = line_text)
has_text_ocr

True

**(2) Use Image Tag**

Demonstration:

In [None]:
tags_result = computervision_client.tag_image(image_url[0])

In [None]:
tags_result.tags

[<azure.cognitiveservices.vision.computervision.models._models_py3.ImageTag at 0x7f3879420160>,
 <azure.cognitiveservices.vision.computervision.models._models_py3.ImageTag at 0x7f38793bf5e0>,
 <azure.cognitiveservices.vision.computervision.models._models_py3.ImageTag at 0x7f38793bfd90>,
 <azure.cognitiveservices.vision.computervision.models._models_py3.ImageTag at 0x7f38793bfeb0>,
 <azure.cognitiveservices.vision.computervision.models._models_py3.ImageTag at 0x7f3877847c70>]

In [None]:
if (len(tags_result.tags) == 0):
    print("No tags detected.")
else:
    tag_name = []
    tag_confidence = []
    for tag in tags_result.tags:
        tag_name.append(tag.name)
        tag_confidence.append(tag.confidence)
        #print("'{}' with confidence {:.2f}%".format(tag.name, tag.confidence * 100))

In [None]:
tag_name

['text', 'poster', 'graphic design', 'megaphone', 'design']

In [None]:
tag_confidence

[0.9998866319656372,
 0.9135843515396118,
 0.8816847801208496,
 0.8532842397689819,
 0.5847171545028687]

In [None]:
# Combine the two lists into a dictionary
res = dict(zip(tag_name, tag_confidence))
res

{'text': 0.9998866319656372,
 'poster': 0.9135843515396118,
 'graphic design': 0.8816847801208496,
 'megaphone': 0.8532842397689819,
 'design': 0.5847171545028687}

In [None]:
# Detect if text is in the image
if "text" in res:
    print("there is text")

there is text


Write Function:

In [None]:
def azure_image_tag(read_image_url):
    '''Use Azure Computer Vision Image Tag to assign tags to an image
    Return a variable in dictionary format
    The key of the dictionary is the name of the image tag
    The value of the dictionary is the confidence score of the corresponding image tag
    '''
    tags_result = computervision_client.tag_image(read_image_url)
    
    # Map each tag name to its confidence; the dictionary is empty if there are no tags
    image_tags = {tag.name: tag.confidence for tag in tags_result.tags}
    return image_tags

In [None]:
image_tags = azure_image_tag(image_url[0])
image_tags

{'text': 0.9998866319656372,
 'poster': 0.9135843515396118,
 'graphic design': 0.8816847801208496,
 'megaphone': 0.8532842397689819,
 'design': 0.5847171545028687}

In [None]:
def text_presence_by_tag(image_tags):
    '''Identify whether the image contains text using the image tag approach
    Return boolean True or False
    '''
    return "text" in image_tags

In [None]:
has_text_tag = text_presence_by_tag(image_tags)
has_text_tag

True

### People-centered vs. Text-centered

**(1) Use Image Tag**

May need to run more people-related images to find which tags are human-related.

Or use network analysis to find the relations between tags.

**For now, write a function that outputs a list of unique tags, to help identify people-centered tags**

In [None]:
def find_unique_tag(image_tags):
    '''Return the tag names of an image as a list (dictionary keys are already unique)'''
    unique_tags = list(image_tags)
    return unique_tags

In [None]:
unique_tags = find_unique_tag(image_tags=image_tags)
unique_tags

['text', 'poster', 'graphic design', 'megaphone', 'design']

In [None]:
[k for k in res if re.match('human', k)]

[]

In [None]:
"human face" in res

False

In [None]:
str(ocr_result)

NameError: ignored

In [None]:
ocr_result.regions

[<azure.cognitiveservices.vision.computervision.models._models_py3.OcrRegion at 0x1deea83d910>,
 <azure.cognitiveservices.vision.computervision.models._models_py3.OcrRegion at 0x1deea83de50>,
 <azure.cognitiveservices.vision.computervision.models._models_py3.OcrRegion at 0x1deea2103a0>,
 <azure.cognitiveservices.vision.computervision.models._models_py3.OcrRegion at 0x1deea210460>]

**(2) Use Image Description**

Demonstration:

In [None]:
# Call API
description_result = computervision_client.describe_image(read_image_url,language= 'en' , max_candidates=3)

In [None]:
str(description_result)

"{'additional_properties': {}, 'tags': ['text', 'woman', 'person', 'screenshot', 'businesscard'], 'captions': [<azure.cognitiveservices.vision.computervision.models._models_py3.ImageCaption object at 0x000001DEE9DCCEB0>], 'request_id': 'bd02b7e9-c744-4c71-9b75-9d862da1ed0f', 'metadata': <azure.cognitiveservices.vision.computervision.models._models_py3.ImageMetadata object at 0x000001DEEA823370>, 'model_version': '2021-05-01'}"

In [None]:
# Collect each caption and its confidence (both lists stay empty if there are no captions)
description_text = []
description_confidence = []
for caption in description_result.captions:
    description_text.append(caption.text)
    description_confidence.append(caption.confidence)

In [None]:
description_text

['a woman holding a box']

In [None]:
description_confidence

[0.5649126768112183]

In [None]:
def azure_image_description(read_image_url):
    '''Use Azure Computer Vision Image Description to describe an image
    Return two variables in list format
    The description_text is the text description of the image
    The description_confidence is the confidence score of the description
    '''
    description_result = computervision_client.describe_image(read_image_url, language='en', max_candidates=3)
    # Both lists stay empty if no caption is returned
    description_text = []
    description_confidence = []
    for caption in description_result.captions:
        description_text.append(caption.text)
        description_confidence.append(caption.confidence)
    return description_text, description_confidence

In [None]:
description_text, description_confidence = azure_image_description(read_image_url=read_image_url)

In [None]:
description_text

['a woman holding a box']
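One way to turn a caption into a people/text label is a simple keyword check. This is only a rough sketch: the keyword lists are assumptions made for illustration (they are not part of the Azure API), and `classify_by_caption` is a hypothetical helper.

```python
import re

# Assumed keyword lists, for illustration only
PEOPLE_WORDS = {"person", "people", "woman", "women", "man", "men", "girl", "boy", "child", "children"}
TEXT_WORDS = {"text", "sign", "poster", "letter", "screenshot"}


def classify_by_caption(caption):
    """Return 'people', 'text', 'both', or None based on keywords in the caption."""
    # Match whole words so that e.g. 'woman' does not also count as 'man'
    words = set(re.findall(r"[a-z]+", caption.lower()))
    has_people = bool(words & PEOPLE_WORDS)
    has_text = bool(words & TEXT_WORDS)
    if has_people and has_text:
        return "both"
    if has_people:
        return "people"
    if has_text:
        return "text"
    return None


print(classify_by_caption("a woman holding a box"))             # people
print(classify_by_caption("a group of people holding a sign"))  # both
```

Captions that mention both people and text (such as the second one) still need a tie-breaking rule, which is left open here.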

**(3) Use Image Category**

Demonstration:

In [None]:
analyze_result = computervision_client.analyze_image(read_image_url, visual_features= ['Categories'], language= 'en')

In [None]:
str(analyze_result)

"{'additional_properties': {}, 'categories': [<azure.cognitiveservices.vision.computervision.models._models_py3.Category object at 0x000001DEEF2EA7F0>], 'adult': None, 'color': None, 'image_type': None, 'tags': None, 'description': None, 'faces': None, 'objects': None, 'brands': None, 'request_id': '6eb93eed-cc8e-47d9-9474-a57557dce63a', 'metadata': <azure.cognitiveservices.vision.computervision.models._models_py3.ImageMetadata object at 0x000001DEEF44EB50>, 'model_version': '2021-05-01'}"

In [None]:
# Initialize the lists before the loop so that every category is kept
category_name = []
category_score = []
for category in analyze_result.categories:
    category_name.append(category.name)
    category_score.append(category.score)
category_name

['text_mag']

Write Function:

In [None]:
def azure_image_category(read_image_url):
    '''Use Azure Computer Vision Analyze Image to categorize an image
    Return two variables in list format
    The category_name is the category of the image
    The category_score is the confidence score of the corresponding category
    '''
    analyze_result = computervision_client.analyze_image(read_image_url, visual_features=['Categories'], language='en')
    # Both lists stay empty if no category is returned
    category_name = []
    category_score = []
    for category in analyze_result.categories:
        category_name.append(category.name)
        category_score.append(category.score)
    return category_name, category_score

In [None]:
category_name, category_score = azure_image_category(read_image_url=read_image_url)

In [None]:
category_name

['text_mag']

In [None]:
category_score

[0.890625]
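The category lists can be mapped to a people/text label by checking the "people_" and "text_" prefixes. A minimal sketch; `classify_by_category` is a hypothetical helper, and picking the highest-scoring matching category is an assumption about how ties between several categories should be handled.

```python
def classify_by_category(category_names, category_scores):
    """Return 'people', 'text', or None using the highest-scoring matching category."""
    best_label = None
    best_score = -1.0
    for name, score in zip(category_names, category_scores):
        for label in ("people", "text"):
            # A category counts only if its name starts with 'people_' or 'text_'
            if name.startswith(label + "_") and score > best_score:
                best_label = label
                best_score = score
    return best_label


# Category output from the demonstration above
print(classify_by_category(["text_mag"], [0.890625]))  # text
```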

## Demo for creating dataset for image analysis

The desired output is a dataframe with the image URL, the image name, and the variables of interest.

A list of images in Azure Storage Account:

In [None]:
# TODO: This list of all files under Azure could be obtained through Azure Data Factory
image_folder_item = ['199142760-01f4f00a-a46f-4ea8-a7aa-02eb76a35a33.jpg',
                    '199142769-b6cdb0aa-f267-4570-ad0d-0663703c7974.jpg',
                    '199142809-bb24a9ff-ebb7-476b-90ee-dc7f4dd0e172.jpg',
                    '199142848-e0cd2641-0b01-40e9-be3c-bcb73ba1c635.jpg',
                    '199142856-986c1f8c-10f9-4973-91a1-573d927ec7bf.jpg',
                    '289842202_737546634131529_4184733584509506915_n.jpg',
                    '292969378_1193562068106318_3930416080560697894_n.webp',
                    '58409537_459823631424204_7100044955669757952_n.jpg']

In [None]:
azure_storage_path = "https://insimagestorage.blob.core.windows.net/image/"
azure_storage_path

'https://insimagestorage.blob.core.windows.net/image/'

In [None]:
image_url[0].split("/")[-1]

'306693307_613394816895124_8968443589920319007_n.jpg'

In [None]:
def create_image_analysis_df(image_url):
    '''Create a dataframe that contains the image name, URL and the variables of interest
    image_url: A list of image URLs to analyze
    Return a pandas dataframe
    '''
    columns = ['image_name', 'image_storage_URL', 'ocr_text',
               'ocr_text_bounding_box', 'has_text_ocr', 'image_tags',
               'has_text_tag', 'unique_tags', 'description_text',
               'description_confidence', 'category_name', 'category_score']

    # Collect one dictionary per image and build the dataframe once at the end;
    # DataFrame.append was removed in pandas 2.0 and is slow when called in a loop
    rows = []
    for read_image_url in image_url:
        # Call all previous functions to extract the variables of interest through Azure
        name = read_image_url.split("/")[-1]
        # OCR part
        line_text, line_bouding_box = azure_ocr(read_image_url = read_image_url)
        has_text_ocr = text_presence_by_ocr(line_text = line_text)
        
        # Image tag part
        image_tags = azure_image_tag(read_image_url = read_image_url)
        has_text_tag = text_presence_by_tag(image_tags)
        unique_tags = find_unique_tag(image_tags=image_tags)
        
        # Image Description part
        description_text, description_confidence = azure_image_description(read_image_url=read_image_url)
        
        # Image Category part
        category_name, category_score = azure_image_category(read_image_url=read_image_url)
        
        # Put all variables in dictionary format as one row of the dataframe
        rows.append({'image_name': name, 'image_storage_URL': read_image_url, 'ocr_text': line_text,
                     'ocr_text_bounding_box': line_bouding_box, 'has_text_ocr': has_text_ocr, 'image_tags': image_tags,
                     'has_text_tag': has_text_tag, 'unique_tags': unique_tags, 'description_text': description_text,
                     'description_confidence': description_confidence, 'category_name': category_name,
                     'category_score': category_score})
        
    return pd.DataFrame(rows, columns=columns)

In [None]:
%%time
image_analysis_df = create_image_analysis_df(image_url)

ComputerVisionOcrErrorException: ignored
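The run above stopped at the first image that raised an exception, so no dataframe was produced. One option is to wrap each API call so that a failing image is retried a few times and then skipped. The sketch below is generic and does not depend on Azure; `call_with_retries` is a hypothetical helper, and catching every `Exception` is a simplification that should probably be narrowed in real use.

```python
import time


def call_with_retries(func, *args, retries=3, delay=1.0, default=None, **kwargs):
    """Call func(*args, **kwargs); retry on any exception, return default if all attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # broad on purpose in this sketch
            print(f"attempt {attempt}/{retries} failed: {exc!r}")
            if attempt < retries:
                time.sleep(delay)
    return default


# Demo with a stand-in function that fails twice before succeeding
attempts = {"count": 0}

def flaky_call(value):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return value

print(call_with_retries(flaky_call, "ok", delay=0))
```

Inside the loop over images, a call such as `call_with_retries(azure_ocr, read_image_url, default=([], []))` would keep the run going and leave empty lists for images that cannot be processed.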

In [None]:
image_analysis_df

NameError: ignored

In [None]:
image_analysis_df.to_csv('sample_image_analysis_dataset.csv')

**Draft Code (for local files)**

In [None]:
# Call API
description_result = computervision_client.describe_image_in_stream(local_image,language= 'en' ,max_candidates=3)

# Get the captions (descriptions) from the response, with confidence level
print("Description of local image: ")
if (len(description_result.captions) == 0):
    print("No description detected.")
else:
    for caption in description_result.captions:
        print("'{}' with confidence {:.2f}%".format(caption.text, caption.confidence * 100))
print()

Description of local image: 
'a group of people holding a sign' with confidence 66.56%



In [None]:
for caption in description_result.captions:
    caption_text = caption.text
    caption_confidence = caption.confidence
caption_text  

'a group of people holding a sign'

In [None]:
print(description_result.metadata)

{'additional_properties': {}, 'width': 736, 'height': 945, 'format': 'Jpeg'}


In [None]:
def imageDiscriptiontoDict(description_result):
    d = {}
    d['additional_properties'] = description_result.additional_properties
    d['tags'] = description_result.tags
    dcaptions = []
    for i in description_result.captions:
        dcaptions.append({'additional_properties': i.additional_properties, 'text': i.text, 'confidence': i.confidence})
    d['captions'] = dcaptions
    d['request_id'] = description_result.request_id
    d['metadata'] = {'additional_properties': description_result.metadata.additional_properties, 'width': description_result.metadata.width, 'height': description_result.metadata.height, 'format': description_result.metadata.format}
    d['model_version'] = description_result.model_version
    
    return d

In [None]:
imageDiscriptiontoDict(description_result)

{'additional_properties': {},
 'tags': ['text', 'person', 'standing', 'people', 'group'],
 'captions': [{'additional_properties': {},
   'text': 'a group of people holding a sign',
   'confidence': 0.665584146976471}],
 'request_id': 'c35173f1-ce86-4252-916d-d798a24c8f5d',
 'metadata': {'additional_properties': {},
  'width': 736,
  'height': 945,
  'format': 'Jpeg'},
 'model_version': '2021-05-01'}

In [None]:
str(description_result)

"{'additional_properties': {}, 'tags': ['text', 'person', 'standing', 'people', 'group'], 'captions': [<azure.cognitiveservices.vision.computervision.models._models_py3.ImageCaption object at 0x000001DEEA3B4730>], 'request_id': 'c35173f1-ce86-4252-916d-d798a24c8f5d', 'metadata': <azure.cognitiveservices.vision.computervision.models._models_py3.ImageMetadata object at 0x000001DEEA4BA5B0>, 'model_version': '2021-05-01'}"

In [None]:
print(description_result.captions[0])

{'additional_properties': {}, 'text': 'a woman holding a box', 'confidence': 0.5649126768112183}


In [None]:
print(description_result.as_dict)

<bound method Model.as_dict of <azure.cognitiveservices.vision.computervision.models._models_py3.ImageDescription object at 0x000001DEEA5F0970>>


In [None]:
description_result.captions

[<azure.cognitiveservices.vision.computervision.models._models_py3.ImageCaption at 0x1deea5f0e80>]

In [None]:
for caption in description_result.captions:
    caption_text = caption.text
    caption_confidence = caption.confidence
caption_confidence

0.5649126768112183

In [None]:
caption_text

'a woman holding a box'

In [None]:
image_analysis_df.to_csv()