# Code analysis with LangChain + Azure OpenAI + Azure Cognitive Search (vector store)

The following demo shows how to analyze your existing code using Azure OpenAI and Azure Cognitive Search with the help of LangChain.

**LangChain** is an open-source framework that simplifies the creation of applications using large language models (LLMs). It provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications. You can use it to connect a language model to other sources of data, and allow it to interact with its environment.


In [1]:
import os
import json
import sys

#from dotenv import load_dotenv
from langchain.chat_models import AzureChatOpenAI
from langchain.chains import RetrievalQA
from langchain.retrievers import AzureCognitiveSearchRetriever
from langchain.prompts import PromptTemplate
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import AzureSearch

In [2]:
sys.version

'3.11.5 (main, Sep 26 2023, 13:12:46) [GCC 10.2.1 20210110]'

## Documents

In [3]:
!ls notebooks/*.*

'notebooks/01 Image Analysis.ipynb'
'notebooks/02 Captioning and dense captioning.ipynb'
'notebooks/03 Background removal.ipynb'


We will load the three example notebooks listed above and use them as the corpus for our customized code analysis.

In [4]:
root_dir = "notebooks"

# Walk the folder tree and load every file as text
docs = []
for dirpath, dirnames, filenames in os.walk(root_dir):
    for file in filenames:
        print(file)
        try:
            loader = TextLoader(os.path.join(dirpath, file), encoding="utf-8")
            docs.extend(loader.load_and_split())
        except Exception as e:
            # Report files that could not be loaded instead of failing silently
            print(f"Skipping {file}: {e}")

02 Captioning and dense captioning.ipynb
01 Image Analysis.ipynb
03 Background removal.ipynb


In [5]:
# Split the documents into chunks of text
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(docs)
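
To make the effect of `chunk_size` concrete, here is a simplified, pure-Python sketch of separator-based chunking. It is not LangChain's implementation: `split_on_separator` is a hypothetical helper that splits on blank lines and greedily merges pieces until the next one would push the chunk past `chunk_size` characters. It ignores overlap, which matches `chunk_overlap=0` in the cell above.

```python
def split_on_separator(text, chunk_size=1000, separator="\n\n"):
    """Greedy, simplified chunking: merge separator-delimited pieces
    until adding the next piece would exceed chunk_size characters."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece still fits, or a single piece is already
            # too long: keep it whole rather than cutting mid-piece.
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "alpha\n\nbeta\n\ngamma delta"
print(split_on_separator(sample, chunk_size=12))
```

In this sketch a piece longer than `chunk_size` is kept as one oversized chunk, so the limit is a target rather than a hard cap.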

We are going to load the settings from GitHub Codespaces secrets instead of an `azure.env` file.

In [6]:
#load_dotenv("azure.env")


**Make sure these settings exist in your GitHub repository's Codespaces secrets!**


In my case both the model and deployment are named "text-embedding-ada-002"
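
A quick way to confirm that the settings are available is to check the environment before building any client. The sketch below is only a suggestion: `missing_settings` is a hypothetical helper, and the list contains only the variable names that this notebook reads explicitly with `os.getenv`. The LangChain classes may read further variables on their own (for example an API key or API version), so treat the list as a starting point.

```python
import os

# Variable names taken from the os.getenv calls in this notebook;
# the list is probably incomplete for your setup.
REQUIRED_SETTINGS = [
    "OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME",
    "OPENAI_ADA_EMBEDDING_MODEL_NAME",
    "OPENAI_API_BASE",
    "AZURE_COGNITIVE_SEARCH_ENDPOINT",
    "AZURE_COGNITIVE_SEARCH_API_KEY",
]


def missing_settings(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_settings(REQUIRED_SETTINGS)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```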


In [7]:
# Initialize our embedding model
embeddings = OpenAIEmbeddings(
    deployment=os.getenv("OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME"),
    model=os.getenv("OPENAI_ADA_EMBEDDING_MODEL_NAME"),
    openai_api_base=os.getenv("OPENAI_API_BASE"),
    openai_api_type="azure",
    chunk_size=1,
)

index_name = "index-pythonnotebooks"

# Set up Azure Cognitive Search as our vector store
acs = AzureSearch(
    azure_search_endpoint=os.getenv("AZURE_COGNITIVE_SEARCH_ENDPOINT"),
    azure_search_key=os.getenv("AZURE_COGNITIVE_SEARCH_API_KEY"),
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

# Add documents to Azure Search
acs.add_documents(documents=texts)

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: Requests to the Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms. Operation under Azure OpenAI API version 2023-05-15 have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 1 second. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit..


['YWE4NTRjM2UtZjdiMC00NTRjLWJjZWEtMjBmYThiZWQ2Nzkz',
 'ZDBkYTk3ODEtMWM0NC00ODVmLWIyZmEtZmQ4ZGUwYzAzMTdi',
 'ZjMxMmJjZTItZTNlNC00MWQ0LWE1MjYtNzJmMjY2MTU1Yjgw',
 'ODU1Y2U4YjUtY2Y5MS00NGNkLWFhNzMtZjRhYmU4NDVkZTkx',
 'ZTQ5MzY3YmEtMWJhMy00YmVmLWI4NTgtN2E3MmE1MjQzOGQ5',
 'NDZmYmI2YjItM2UyMi00MDE5LWEyMDctMmRjM2YyNzhkNjMy',
 'Njg2NGUwNTAtMWU5My00MjA3LWI3NDAtY2MzY2RkMzMyZjc3',
 'ZjhjNDM4MzYtYWEwYi00OWUzLWI3ZDYtNjJiMzliNzVmMTJh',
 'MzgwODlhOTQtMTJhNC00ZGM2LWEzMTgtNTQwMjNmMThlYzcw',
 'YmE2NzA3NzUtMjNkMS00YjhlLTkyMGYtNWE3ZWIzYTM4Mjg1',
 'ZmI1OTI4MWQtZmI3OC00YTJlLWIwODAtOWJmZTUwMDA3ZWEx',
 'MTgwMWY5NzMtZTA0Ni00NWIxLWEwZmMtNzcwNWViNDgwOGRh',
 'Y2FiYWIwOTEtOTA1YS00ZTJiLWI0YjEtZDE4MTM1ZjQ4NmRk',
 'YTYwODVmZTQtM2Q2NS00NzcyLTljOGEtYjYyZjk3YjQxMjI0',
 'ZmQxMGEyNTktNzc2NC00YjRhLTgxZDQtOTFiNTEyMDYzNWY4',
 'ZDdjNDRiODgtYzNkOS00MTBiLTliMTMtYTc2MjRiYTI4Zjgy',
 'MDQ3NTVkYzktODZmNC00NGQ1LTgwZWMtMTg4N2U1NTI5OWMx',
 'YmJiYTg4MDMtMGNjMi00Y2I3LWE3MDgtZDQzOWRjZDI4YjBk',
 'MTNkYjNhYzQtYzJkNS00ZmU2LThiMjQtYzE2MjljZGUy

In [8]:
# Define Azure Cognitive Search as our retriever
retriever = AzureCognitiveSearchRetriever(
    content_key="content", top_k=10, index_name=index_name
)

In [9]:
# Use the GPT-3.5 Turbo 16k deployment as our LLM
llm = AzureChatOpenAI(deployment_name="gpt-35-turbo-16k", temperature=0.7)

In [10]:
retriever

AzureCognitiveSearchRetriever(tags=None, metadata=None, service_name='azurecogsearcheastussr', index_name='index-pythonnotebooks', api_key='<redacted>', api_version='2020-06-30', aiosession=None, content_key='content', top_k=10)

In [11]:
llm

AzureChatOpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, model_name='gpt-3.5-turbo', temperature=0.7, model_kwargs={}, openai_api_key='<redacted>', openai_api_base='https://azure-openai-serge.openai.azure.com', openai_organization='', openai_proxy='', request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=None, tiktoken_model_name=None, deployment_name='gpt-35-turbo-16k', model_version='', openai_api_type='azure', openai_api_version='2023-05-15')

## Testing

In [12]:
# Define a template message
template = """Use the following pieces of context to answer the question at the end. 
You are a python expert and you should demonstrate some python knowledge.
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""

QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

# Set the Retrieval QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    return_source_documents=True,
)
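
Conceptually, the chain retrieves the most relevant chunks, joins them into one block of text, and substitutes that block and the question into the template before calling the LLM. The plain-Python sketch below imitates that substitution step only. It is not LangChain's code: `build_prompt` is a hypothetical helper, the template is shortened, the two retrieved strings are made up, and joining with a blank line is an assumption about the default "stuff" behavior.

```python
template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def build_prompt(template, documents, question, separator="\n\n"):
    """Join the retrieved texts and substitute them into the template."""
    context = separator.join(documents)
    return template.format(context=context, question=question)


# Made-up stand-ins for the chunks a retriever would return.
retrieved = ["Chunk about captions.", "Chunk about background removal."]
print(build_prompt(template, retrieved, "How to get image captions?"))
```

Because every retrieved chunk lands in a single prompt, `top_k` and the chunk size together determine how much of the model's context window is used.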

In [13]:
questions = ["Could you explain the notebook 01 Image Analysis.ipynb"]

chat_history = []

for question in questions:
    result = qa_chain({"query": question, "chat_history": chat_history})
    # chat_history.append((question, result))
    print(f"Question: {question} \n")
    print(f"Answer: {result['result']} \n")
    print(
        f"Source: {json.loads(result['source_documents'][0].metadata['metadata'])['source']} \n"
    )

Question: Could you explain the notebook 01 Image Analysis.ipynb 

Answer: The notebook "01 Image Analysis.ipynb" demonstrates the use of Azure Computer Vision 4 to perform image analysis tasks such as captioning, object detection, and reading text from images. It utilizes Python libraries like PIL and requests to interact with the Azure Computer Vision API. The notebook also provides links to relevant documentation and updates on Azure Computer Vision. Thanks for asking! 

Source: notebooks/01 Image Analysis.ipynb 



In [18]:
questions = ["How to get image captions? Show me a python code"]

chat_history = []

for question in questions:
    result = qa_chain({"query": question, "chat_history": chat_history})
    # chat_history.append((question, result))
    print(f"Question: {question} \n")
    print(f"Answer: {result['result']} \n")
    print(
        f"Source: {json.loads(result['source_documents'][0].metadata['metadata'])['source']} \n"
    )

Question: How to get image captions? Show me a python code 

Answer: import requests

def get_image_captions(image_url):
    url = endpoint + "/computervision/v3.0/describe"

    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }

    data = {"url": image_url}

    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()

    captions = response.json()["description"]["captions"]

    return [caption["text"] for caption in captions]

# Example usage
image_url = "https://example.com/image.jpg"
captions = get_image_captions(image_url)
print(captions)

Thanks for asking! 

Source: notebooks/02 Captioning and dense captioning.ipynb 



In [14]:
questions = ["Explain the notebook 03 Background removal.ipynb"]

chat_history = []

for question in questions:
    result = qa_chain({"query": question, "chat_history": chat_history})
    # chat_history.append((question, result))
    print(f"Question: {question} \n")
    print(f"Answer: {result['result']} \n")
    print(
        f"Source: {json.loads(result['source_documents'][0].metadata['metadata'])['source']} \n"
    )

Question: Explain the notebook 03 Background removal.ipynb 

Answer: The notebook demonstrates how to use Azure Computer Vision 4 to remove the background from images. It provides functions to remove the background and get the alpha matte of the foreground object. The notebook also includes a batch processing example to remove the background from multiple images. Thanks for asking! 

Source: notebooks/03 Background removal.ipynb 



In [15]:
questions = ["How to remove background from an image using Azure AI?"]

chat_history = []

for question in questions:
    result = qa_chain({"query": question, "chat_history": chat_history})
    # chat_history.append((question, result))
    print(f"Question: {question} \n")
    print(f"Answer: {result['result']} \n")
    print(
        f"Source: {json.loads(result['source_documents'][0].metadata['metadata'])['source']} \n"
    )

Question: How to remove background from an image using Azure AI? 

Answer: To remove the background from an image using Azure AI, you can use the Azure Computer Vision service with the background removal feature. This feature can create an alpha matte that separates the foreground object from the background in an image. You can use the API endpoint and the appropriate headers to make a request to the service and get the edited image or the alpha matte.

Thanks for asking! 

Source: notebooks/03 Background removal.ipynb 



In [16]:
questions = ["How to get image captions?"]

chat_history = []

for question in questions:
    result = qa_chain({"query": question, "chat_history": chat_history})
    # chat_history.append((question, result))
    print(f"Question: {question} \n")
    print(f"Answer: {result['result']} \n")
    print(
        f"Source: {json.loads(result['source_documents'][0].metadata['metadata'])['source']} \n"
    )

Question: How to get image captions? 

Answer: To get image captions using Azure Computer Vision 4.0, you can use the "analyze" API with the "caption" feature. This will generate a caption that describes the content of the image. Make sure to include the necessary headers and endpoint URL when making the API call. Thanks for asking! 

Source: notebooks/02 Captioning and dense captioning.ipynb 

