# Azure Computer Vision and LangChain

**Azure Computer Vision** is a cloud-based service that provides advanced algorithms for processing images and returning information based on the visual features you’re interested in. You can use it to extract text, analyze faces, moderate content, generate captions, and more. You can also run it on the edge, in containers, for scenarios that require data security and low latency.

**LangChain** is an open-source framework that simplifies the creation of applications using large language models (LLMs). It provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications. You can use it to connect a language model to other sources of data, and allow it to interact with its environment.

## Before running this Notebook

Make sure you have created an Azure Computer Vision resource and added the following settings to the GitHub Codespace secrets in your repo:

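- AZURE_COMPUTER_VISION_ENDPOINT
- AZURE_COMPUTER_VISION_KEY
- AZURE_OPENAI_ENDPOINT
- AZURE_OPENAI_API_KEY
- AZURE_OPENAI_MODEL_CHAT_VERSION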

First import all needed libraries:

In [None]:
import azure.ai.vision as visionsdk
import json
import openai
import os
import requests

#from dotenv import load_dotenv
from io import BytesIO
from langchain.prompts import PromptTemplate
from langchain import LLMChain
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import AzureOpenAI
from PIL import Image

Load all settings from GitHub Codespace secrets

In [None]:
AZURE_COMPUTER_VISION_ENDPOINT = os.getenv("AZURE_COMPUTER_VISION_ENDPOINT")
AZURE_COMPUTER_VISION_KEY = os.getenv("AZURE_COMPUTER_VISION_KEY")

OPENAI_API_BASE = os.getenv("AZURE_OPENAI_ENDPOINT")
OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
OPENAI_API_VERSION = os.getenv("AZURE_OPENAI_MODEL_CHAT_VERSION")
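
Because `os.getenv` returns `None` for a secret that is not set, a missing value tends to surface only later, as an authentication or connection error. A minimal sketch of an early check, assuming the five secret names read above; the helper `missing_settings` is hypothetical and not part of any SDK:

```python
import os

# Names of the secrets read in the cell above.
REQUIRED_SETTINGS = [
    "AZURE_COMPUTER_VISION_ENDPOINT",
    "AZURE_COMPUTER_VISION_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_MODEL_CHAT_VERSION",
]


def missing_settings(names, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_settings(REQUIRED_SETTINGS)
if missing:
    print("Missing Codespace secrets:", ", ".join(missing))
```

Passing a plain dictionary as `environ` makes the helper easy to try out without touching the real environment.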

Use Azure Computer Vision to analyze images


In [None]:
def analyze_image(image_url):
    """
    Azure computer vision analysis
    """
    service_options = visionsdk.VisionServiceOptions(
        AZURE_COMPUTER_VISION_ENDPOINT, AZURE_COMPUTER_VISION_KEY
    )
    vision_source = visionsdk.VisionSource(url=image_url)
    analysis_options = visionsdk.ImageAnalysisOptions()

    analysis_options.features = (
        visionsdk.ImageAnalysisFeature.CROP_SUGGESTIONS
        | visionsdk.ImageAnalysisFeature.CAPTION
        | visionsdk.ImageAnalysisFeature.DENSE_CAPTIONS
        | visionsdk.ImageAnalysisFeature.OBJECTS
        | visionsdk.ImageAnalysisFeature.PEOPLE
        | visionsdk.ImageAnalysisFeature.TEXT
        | visionsdk.ImageAnalysisFeature.TAGS
    )

    analysis_options.language = "en"
    analysis_options.model_version = "latest"
    analysis_options.gender_neutral_caption = True

    image_analyzer = visionsdk.ImageAnalyzer(
        service_options, vision_source, analysis_options
    )
    result = image_analyzer.analyze()

    if result.reason == visionsdk.ImageAnalysisResultReason.ANALYZED:
        print(" Image url:", image_url)
        print(" Image height: {}".format(result.image_height))
        print(" Image width: {}".format(result.image_width))
        print(" Model version: {}".format(result.model_version))

        if result.caption is not None:
            print()
            print(" Caption:")
            print(
                "   '{}', Confidence {:.4f}".format(
                    result.caption.content, result.caption.confidence
                )
            )

        if result.dense_captions is not None:
            print()
            print(" Dense Captions:")
            for caption in result.dense_captions:
                print(
                    "   '{}', {}, Confidence: {:.4f}".format(
                        caption.content, caption.bounding_box, caption.confidence
                    )
                )

        if result.objects is not None:
            print()
            print(" Objects:")
            # Avoid shadowing the built-in name `object`
            for detected_object in result.objects:
                print(
                    "   '{}', {}, Confidence: {:.4f}".format(
                        detected_object.name,
                        detected_object.bounding_box,
                        detected_object.confidence,
                    )
                )

        if result.tags is not None:
            print()
            print(" Tags:")
            for tag in result.tags:
                print("   '{}', Confidence {:.4f}".format(tag.name, tag.confidence))

        if result.people is not None:
            print()
            print(" People:")
            for person in result.people:
                print(
                    "   {}, Confidence {:.4f}".format(
                        person.bounding_box, person.confidence
                    )
                )

        if result.crop_suggestions is not None:
            print()
            print(" Crop Suggestions:")
            for crop_suggestion in result.crop_suggestions:
                print(
                    "   Aspect ratio {}: Crop suggestion {}".format(
                        crop_suggestion.aspect_ratio, crop_suggestion.bounding_box
                    )
                )

        if result.text is not None:
            print()
            print(" Text:")
            for line in result.text.lines:
                points_string = (
                    "{"
                    + ", ".join([str(int(point)) for point in line.bounding_polygon])
                    + "}"
                )
                print(
                    "   Line: '{}', Bounding polygon {}".format(
                        line.content, points_string
                    )
                )
                for word in line.words:
                    points_string = (
                        "{"
                        + ", ".join(
                            [str(int(point)) for point in word.bounding_polygon]
                        )
                        + "}"
                    )
                    print(
                        "     Word: '{}', Bounding polygon {}, Confidence {:.4f}".format(
                            word.content, points_string, word.confidence
                        )
                    )

        result_details = visionsdk.ImageAnalysisResultDetails.from_result(result)
        # Return the raw JSON response so the caller can parse it
        return result_details.json_result

    else:
        error_details = visionsdk.ImageAnalysisErrorDetails.from_result(result)
        print(" Analysis failed.")
        print("   Error reason: {}".format(error_details.reason))
        print("   Error code: {}".format(error_details.error_code))
        print("   Error message: {}".format(error_details.message))
        print(" Did you set the computer vision endpoint and key?")
        # result_details does not exist on failure, so return None instead
        return None

Let's test it with the following image:

In [None]:
image_url = "https://www.awanireview.com/wp-content/uploads/2021/06/Play-Xbox-X-Series-games-on-your-Xbox-One-with-2048x1224.jpg"
response = requests.get(image_url)
image_data = BytesIO(response.content)
img = Image.open(image_data)
img

Analyze the shown picture with Azure Computer Vision

In [None]:

json_result = analyze_image(image_url)

Let's keep only the dense captions from the result:

In [None]:
result_dict = json.loads(json_result)
dense_captions = result_dict["denseCaptionsResult"]["values"]
text = "\n".join(caption["text"] for caption in dense_captions)
print(text)
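
The lookup above assumes that the analysis succeeded and that dense captions were returned. A minimal sketch of a more defensive version follows. The helper name `extract_dense_captions` and the sample payload are purely illustrative, and the `confidence` field and threshold are assumptions about the response shape beyond the `denseCaptionsResult`, `values`, and `text` keys used above:

```python
import json


def extract_dense_captions(json_result, min_confidence=0.0):
    """Return caption texts from an analysis JSON string (hypothetical helper)."""
    if not json_result:
        # Nothing to parse, for example when the analysis failed.
        return []
    result_dict = json.loads(json_result)
    values = result_dict.get("denseCaptionsResult", {}).get("values", [])
    return [
        caption["text"]
        for caption in values
        # Assumption: each caption carries a "confidence" score; treat it as 1.0 if absent.
        if caption.get("confidence", 1.0) >= min_confidence
    ]


# Made-up sample payload, only to illustrate the expected shape.
sample = json.dumps(
    {
        "denseCaptionsResult": {
            "values": [
                {"text": "a game console on a table", "confidence": 0.82},
                {"text": "a blurry object", "confidence": 0.31},
            ]
        }
    }
)
print("\n".join(extract_dense_captions(sample, min_confidence=0.5)))
```

Filtering by confidence is optional; with the default threshold of `0.0` the helper returns every caption, like the cell above.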

Now we use an OpenAI model, with LangChain orchestrating the call, to generate a Twitter (X) post based on the detected captions.

**Replace** the deployment and model names below if needed.


In [None]:
OPENAI_DEPLOYMENT_NAME = "gpt-35-turbo-unai"
OPENAI_MODEL_NAME = "gpt-35-turbo"

Create an instance of the `AzureOpenAI` class and assign it to the variable `llm`. The instance is initialized with the deployment name, model name, API base URL, API key, and API version.

In [None]:
llm = AzureOpenAI(
    deployment_name=OPENAI_DEPLOYMENT_NAME,
    model_name=OPENAI_MODEL_NAME,
    openai_api_base=OPENAI_API_BASE,
    openai_api_key=OPENAI_API_KEY,
    openai_api_version=OPENAI_API_VERSION,
)
llm

In [None]:
def text_generation_from_image_AI_insights():
    """
    Text generation from image AI insights
    """
    template = """Generate a Twitter post based on the list of sentences provided below.
    The Twitter post must be {length} characters long. Use the objects described in the list for the Twitter post.
    The Twitter post must have a strong marketing message, it must have some emoticons and hashtags.

    Sentences:
    {sentences}"""

    # Both placeholders must be declared so the captions are not substituted for {length}
    prompt_template = PromptTemplate(
        input_variables=["length", "sentences"], template=template
    )
    chain = LLMChain(llm=llm, prompt=prompt_template)
    # Give the Azure Computer Vision captions as input to the LLM model
    generated_txt = chain.run(length=700, sentences=text)

    print("Marketing post provided by Azure AI:")
    # ANSI escape codes: colored bold text, then reset the terminal style
    print("\033[1;31;34m", generated_txt, "\033[0m")


text_generation_from_image_AI_insights()
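
To see exactly what text the model receives, it can help to render the prompt before calling the LLM. The following sketch uses plain `str.format` instead of LangChain and assumes a shortened template with a `{sentences}` placeholder in addition to `{length}`; the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical:

```python
# Shortened, illustrative version of the prompt used in this notebook.
PROMPT_TEMPLATE = (
    "Generate a Twitter post based on the list of sentences provided below.\n"
    "The Twitter post must be {length} characters long.\n\n"
    "Sentences:\n{sentences}"
)


def build_prompt(captions, length=700):
    """Render the prompt text from a list of caption strings (hypothetical helper)."""
    sentences = "\n".join(f"- {caption}" for caption in captions)
    return PROMPT_TEMPLATE.format(length=length, sentences=sentences)


print(build_prompt(["a game console on a table", "a white controller"]))
```

Printing the rendered prompt is a quick way to confirm that the captions end up under `Sentences:` and that the requested length is the intended number.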