# 03 - LangChain with Azure Cognitive Search

In this lab, we will take a deeper dive into the Azure Cognitive Search (ACS) vector store and the different ways to interact with it.

## Create Azure Cognitive Search Vector Store in Azure

First, we need to create an Azure Cognitive Search (ACS) vector store in Azure. We'll use the Azure CLI to do this.

**Note:** Replace **INITIALS** with your own initials so that the service name is unique.

In [None]:
RG="azure-cognitive-search-rg"
LOC="westeurope"
NAME="acs-vectorstore-<INITIALS>"
!az group create --name $RG --location $LOC
!az search service create -g $RG -n $NAME -l $LOC --sku Basic --partition-count 1 --replica-count 1

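Next, update the following values in the **.env** file with your Azure Cognitive Search **service name**, **endpoint**, **index name**, and **admin key**. The service name, endpoint, and admin key can be found in the Azure Portal or with the Azure CLI (for example, `az search admin-key show`). The index name is a name of your choosing for the index that will hold the movie embeddings.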

AZURE_COGNITIVE_SEARCH_SERVICE_NAME = "<YOUR AZURE COGNITIVE SEARCH SERVICE NAME - e.g. cognitive-search-service>"
AZURE_COGNITIVE_SEARCH_ENDPOINT_NAME = "<YOUR AZURE COGNITIVE SEARCH ENDPOINT NAME - e.g. https://cognitive-search-service.search.windows.net>"
AZURE_COGNITIVE_SEARCH_INDEX_NAME = "<YOUR AZURE COGNITIVE SEARCH INDEX NAME - e.g. cognitive-search-index>"
AZURE_COGNITIVE_SEARCH_API_KEY = "<YOUR AZURE COGNITIVE SEARCH ADMIN API KEY - e.g. cognitive-search-admin-api-key>"
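
The endpoint value follows directly from the service name, as the examples above suggest. A minimal sketch, assuming the service lives in the default public Azure cloud domain (`search.windows.net`); the helper name `build_search_endpoint` is hypothetical and not part of any SDK:

```python
def build_search_endpoint(service_name: str) -> str:
    """Return the HTTPS endpoint for a search service name.

    Assumption: the service uses the public-cloud domain shown in the
    .env example above (search.windows.net).
    """
    name = service_name.strip().lower()
    if not name:
        raise ValueError("service name must not be empty")
    return f"https://{name}.search.windows.net"


# Example with the placeholder service name used in the .env template.
endpoint = build_search_endpoint("cognitive-search-service")
print(endpoint)
```

If your service runs in a sovereign cloud, the domain suffix will differ, so copy the endpoint from the Azure Portal instead.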

## Set Up Azure OpenAI and LangChain

We'll start as usual by loading our Azure OpenAI API key and endpoint details and the names of the model deployments we want to use. This time we also load the Azure Cognitive Search settings from the previous step.

**NOTE**: As with previous labs, we'll use the values from the `.env` file in the root of this repository.

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables
if load_dotenv():
    # os.getenv returns None when the variable is missing, so format it instead of concatenating
    print(f"Found OpenAI Base Endpoint: {os.getenv('OPENAI_API_BASE')}")
else:
    print("No file .env found")

openai_api_type = os.getenv("OPENAI_API_TYPE")
openai_api_key = os.getenv("OPENAI_API_KEY")
openai_api_base = os.getenv("OPENAI_API_BASE")
openai_api_version = os.getenv("OPENAI_API_VERSION")
deployment_name = os.getenv("AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME")
embedding_name = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME")
acs_service_name = os.getenv("AZURE_COGNITIVE_SEARCH_SERVICE_NAME")
acs_endpoint_name = os.getenv("AZURE_COGNITIVE_SEARCH_ENDPOINT_NAME")
acs_index_name = os.getenv("AZURE_COGNITIVE_SEARCH_INDEX_NAME")
acs_api_key = os.getenv("AZURE_COGNITIVE_SEARCH_API_KEY")
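
Because `os.getenv` silently returns `None` for anything missing from the `.env` file, it can help to check the settings before going further. The following is a sketch only; `find_missing_settings` and `REQUIRED_SETTINGS` are hypothetical names introduced for illustration:

```python
import os

# The variable names read in the cell above.
REQUIRED_SETTINGS = [
    "OPENAI_API_TYPE",
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME",
    "AZURE_COGNITIVE_SEARCH_SERVICE_NAME",
    "AZURE_COGNITIVE_SEARCH_ENDPOINT_NAME",
    "AZURE_COGNITIVE_SEARCH_INDEX_NAME",
    "AZURE_COGNITIVE_SEARCH_API_KEY",
]


def find_missing_settings(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = find_missing_settings(REQUIRED_SETTINGS)
if missing:
    print("Missing settings: " + ", ".join(missing))
else:
    print("All required settings are present")
```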

Now we will load the data from the `movies.csv` file using a LangChain document loader.

In [None]:
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='./movies.csv', source_column='original_title', encoding='utf-8', csv_args={'delimiter':',', 'fieldnames': ['id', 'original_language', 'original_title', 'popularity', 'release_date', 'vote_average', 'vote_count', 'genre', 'overview', 'revenue', 'runtime', 'tagline']})
data = loader.load()
data = data[1:21] # skip the header row (index 0) and keep 20 movies; widen the slice to load more
print('Loaded %s movies' % len(data))
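
The slice above starts at index 1 because explicit `fieldnames` are passed to the loader. A small standard-library sketch of why that matters, assuming the loader reads rows with `csv.DictReader`; the two-column sample below is made up for illustration and is not taken from `movies.csv`:

```python
import csv
import io

# Made-up sample with a header line, mimicking the shape of a CSV export.
sample = "id,original_title\n1,Example Movie A\n2,Example Movie B\n"
fieldnames = ["id", "original_title"]

# With explicit fieldnames, DictReader treats the header line as a data row.
rows = list(csv.DictReader(io.StringIO(sample), fieldnames=fieldnames))
print(rows[0])

# Dropping the first row leaves only the real records.
movies = rows[1:]
print(len(movies))
```

Without the `fieldnames` argument, the reader would consume the first line as the header and no slicing offset would be needed.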

Next, we will create Azure OpenAI embedding and completion instances so that we can build vector representations of the movies in the loaded CSV file and then ask questions about them.

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import AzureOpenAI

# Create an Embeddings Instance of Azure OpenAI
embeddings = OpenAIEmbeddings(
    deployment=embedding_name,
    chunk_size=1
) 

# Create a Completion Instance of Azure OpenAI
llm = AzureOpenAI(
    openai_api_type = openai_api_type,
    openai_api_version = openai_api_version,
    openai_api_base = openai_api_base,
    openai_api_key = openai_api_key,
    deployment_name = deployment_name,
    model_name="gpt-35-turbo"
)
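
To make "vector representation" more concrete: each movie becomes a list of floats, and two texts are considered similar when their vectors point in a similar direction. The sketch below assumes cosine similarity as the comparison metric (the metric actually used depends on how the index is configured) and uses tiny made-up vectors rather than real embeddings:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embeddings have many more dimensions.
query_vec = [1.0, 0.0, 1.0]
close_vec = [0.9, 0.1, 1.1]
far_vec = [0.0, 1.0, 0.0]

print(cosine_similarity(query_vec, close_vec))
print(cosine_similarity(query_vec, far_vec))
```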

## Load Movies into Azure Cognitive Search (ACS)

Next, we'll configure LangChain to use ACS as the vector store, embed the loaded documents, and store the embeddings in the vector store. Depending on the number of movies loaded and on rate limiting, this might take a while.

The LangChain documentation describes how LangChain and Azure Cognitive Search work together: https://python.langchain.com/docs/integrations/vectorstores/azuresearch

In [None]:
from langchain.vectorstores.azuresearch import AzureSearch

vector_store = AzureSearch(
    azure_search_endpoint=acs_endpoint_name,
    azure_search_key=acs_api_key,
    index_name=acs_index_name,
    embedding_function=embeddings.embed_query,
)

vector_store.add_documents(documents=data)
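
If rate limiting becomes a problem with a larger dataset, one option is to upload the documents in small batches with a pause in between. This is a sketch under assumptions: `add_in_batches` is a hypothetical helper, the batch size and pause are arbitrary, and `FakeStore` is a stand-in so the example runs without Azure. It only relies on the store exposing `add_documents(documents=...)`, as used above.

```python
import time


def add_in_batches(store, documents, batch_size=5, pause_seconds=1.0):
    """Upload documents in batches, pausing between batches."""
    total = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        store.add_documents(documents=batch)
        total += len(batch)
        # Sleep only when another batch follows.
        if start + batch_size < len(documents):
            time.sleep(pause_seconds)
    return total


class FakeStore:
    """Stand-in for the vector store; records the size of each batch."""

    def __init__(self):
        self.batch_sizes = []

    def add_documents(self, documents):
        self.batch_sizes.append(len(documents))


fake = FakeStore()
uploaded = add_in_batches(fake, list(range(12)), batch_size=5, pause_seconds=0)
print(uploaded, fake.batch_sizes)

# With the real objects from this lab, the call would look like:
# add_in_batches(vector_store, data)
```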

## Vector Store Searching using Azure Cognitive Search (ACS)

Now that we have the movies loaded into ACS, let's do some searches using the ACS API via the SDK.

In [None]:
# Perform a similarity search
docs = vector_store.similarity_search(
    query="What are the best 80s movies I should watch?",
    k=3,
    search_type="similarity"
)
print(docs[0].page_content)

In [None]:
# Perform a hybrid search
docs = vector_store.similarity_search(
    query="What are the best 80s movies I should watch?",
    k=3,
    search_type="hybrid"
)
print(docs[0].page_content)
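
A hybrid search runs a keyword query and a vector query and merges the two ranked lists. Azure's documentation describes this merge as Reciprocal Rank Fusion (RRF). The sketch below illustrates the idea only and is not the service's implementation; the constant `k=60` is a commonly used value and an assumption here, and the document ids are made up:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each hit contributes 1 / (k + rank) to its document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Made-up result lists from a keyword query and a vector query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that appear in both lists, such as `doc_a` and `doc_c` here, rise above documents found by only one of the two queries.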

## Vector Store Searching using LangChain Retriever

In this part, we will use LangChain to search Azure Cognitive Search, retrieve the results, and then pass them to the LLM in Azure OpenAI. This differs from the previous section, where we queried the Azure Cognitive Search vector store directly.

In [None]:
# Create a retriever for the ACS index; it is combined with Azure OpenAI in the next cells
from langchain.retrievers import AzureCognitiveSearchRetriever

retriever = AzureCognitiveSearchRetriever(
    service_name=acs_service_name,
    api_key=acs_api_key,
    index_name=acs_index_name,
    content_key="content",
    top_k=2
)

retriever.get_relevant_documents("What is the best movie of all time?")

### Do NOT include source documents.

In [None]:
# Now use the retriever in combination with the LLM in Azure OpenAI.
from langchain.chains import RetrievalQA

# Do NOT include the source documents in the response.
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    input_key="question",
    return_source_documents=False
)
query = "What is the best movie of all time?"
response = chain({"question": query})
print(response['result'])
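
The `chain_type="stuff"` setting means that all retrieved documents are placed ("stuffed") into a single prompt together with the question. A rough sketch of that idea, assuming a simplified prompt layout; the wording of the template and the helper name `build_stuff_prompt` are illustrative and not LangChain's actual internals:

```python
def build_stuff_prompt(question, documents):
    """Concatenate all document texts into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Made-up document texts standing in for retrieved movie records.
retrieved = ["overview: first example text", "overview: second example text"]
prompt = build_stuff_prompt("What is the best movie of all time?", retrieved)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved documents fit in the model's context window, which is one reason the retriever above is limited with `top_k`.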

### Include source documents.

In [None]:
# Now use the retriever in combination with the LLM in Azure OpenAI.
from langchain.chains import RetrievalQA

# Include the source documents in the response.
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    input_key="question",
    return_source_documents=True
)
query = "What is the best movie of all time?"
response = chain({"question": query})
print(response['result'])
print(response['source_documents'])
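
Printing the raw `source_documents` list can be hard to read. The sketch below shows one way to condense it, assuming each source document exposes a `page_content` attribute as LangChain documents do; `summarize_sources` is a hypothetical helper and the response is a made-up stand-in so the example runs on its own:

```python
from types import SimpleNamespace


def summarize_sources(response, max_chars=80):
    """Return one short numbered line per source document in a chain response."""
    lines = []
    for i, doc in enumerate(response.get("source_documents", []), start=1):
        # Collapse newlines and repeated spaces, then truncate.
        text = " ".join(doc.page_content.split())
        lines.append(f"{i}. {text[:max_chars]}")
    return lines


# Stand-in for a chain response; the contents are invented for illustration.
fake_response = {
    "result": "example answer",
    "source_documents": [
        SimpleNamespace(page_content="original_title: Example A\noverview: first example"),
        SimpleNamespace(page_content="original_title: Example B\noverview: second example"),
    ],
}

for line in summarize_sources(fake_response):
    print(line)
```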

## Next Section

📣 [Deploy AI](../../04-deploy-ai/README.md)