# 03 - LangChain with Azure Cognitive Search

In this lab, we'll take a deeper dive into the Azure Cognitive Search (ACS) vector store and the different ways to interact with it.

As usual, we'll start by defining our Azure OpenAI service API key and endpoint details and specifying the model deployments we want to use. We'll then initiate a connection to the Azure OpenAI service.

**NOTE**: As with previous labs, we'll use the values from the `.env` file in the root of this repository.

In [None]:
# First, install the Azure Cognitive Search Python SDK
!pip install --index-url=https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/ azure-search-documents==11.4.0a20230509004

In [None]:
import os, json
from dotenv import load_dotenv

# Load environment variables from the .env file
if load_dotenv():
    # The f-string avoids a TypeError if OPENAI_API_BASE is not set
    print(f"Found OpenAI API Base Endpoint: {os.getenv('OPENAI_API_BASE')}")
else:
    print("No .env file found")

openai_api_type = os.getenv("OPENAI_API_TYPE")
openai_api_key = os.getenv("OPENAI_API_KEY")
openai_api_base = os.getenv("OPENAI_API_BASE")
openai_api_version = os.getenv("OPENAI_API_VERSION")
deployment_name = os.getenv("AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME")
embedding_name = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME")
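
Since every later cell depends on these values, it can help to check them up front. The following is a minimal sketch, not part of the original lab: the helper name `missing_env_vars` is hypothetical, and the list of variable names is simply collected from the cells in this notebook.

```python
import os

# Variable names collected from the cells in this lab (adjust if your .env differs).
REQUIRED_VARS = [
    "OPENAI_API_TYPE",
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME",
    "AZURE_COGNITIVE_SEARCH_ENDPOINT",
    "AZURE_COGNITIVE_SEARCH_ADMIN_KEY",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given mapping (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required environment variables are set")
```

Running this right after `load_dotenv()` surfaces a misconfigured `.env` file before any call to Azure is made.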

First, we will load the data from the `movies.csv` file using a LangChain document loader.

In [None]:
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path='./movies.csv', source_column='original_title', encoding='utf-8', csv_args={'delimiter':',', 'fieldnames': ['id', 'original_language', 'original_title', 'popularity', 'release_date', 'vote_average', 'vote_count', 'genre', 'overview', 'revenue', 'runtime', 'tagline']})
data = loader.load()
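# Explicit fieldnames mean the CSV header row is loaded as document 0, so start at index 1.
# The upper bound keeps at most 199 movies; raise or remove it to index more of the dataset.
data = data[1:200]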
print('Loaded %s movies' % len(data))
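
To see why the slice above starts at index 1, here is a small sketch using Python's standard `csv` module. It assumes that `CSVLoader` forwards `csv_args` to `csv.DictReader`; the two-row sample and its titles are made up for illustration and are not taken from `movies.csv`.

```python
import csv
import io

# Hypothetical sample standing in for movies.csv (header row plus two movies).
sample = "id,original_title\n1,Blade Runner\n2,Alien\n"
fieldnames = ["id", "original_title"]

# When fieldnames are passed explicitly, DictReader treats the first line as data.
rows = list(csv.DictReader(io.StringIO(sample), fieldnames=fieldnames))

print(rows[0])   # the header line, parsed as if it were a movie
movies = rows[1:]  # skipping index 0 leaves only the real records
print(len(movies))
```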

Next, we will create Azure OpenAI embedding and completion instances so that we can build a vector representation of the movies in the loaded CSV file and then ask questions about them.

In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import AzureOpenAI

# Create an Embeddings Instance of Azure OpenAI
embeddings = OpenAIEmbeddings(
    deployment=embedding_name,
    chunk_size=1
) 

# Create a Completion Instance of Azure OpenAI
llm = AzureOpenAI(
    openai_api_type = openai_api_type,
    openai_api_version = openai_api_version,
    openai_api_base = openai_api_base,
    openai_api_key = openai_api_key,
    deployment_name = deployment_name,
    model_name="gpt-35-turbo"
)
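
Before moving on, it may help to see what a vector representation buys us. The sketch below compares toy three-dimensional vectors with cosine similarity, a common measure for embedding vectors. The vectors and labels are invented for illustration; real embeddings from `text-embedding-ada-002` are much longer and come from the service.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up "embeddings"; in the lab these would come from the embeddings instance.
query_vector = [0.9, 0.1, 0.0]
documents = {
    "sci-fi movie": [0.8, 0.2, 0.1],
    "romantic comedy": [0.1, 0.9, 0.2],
}

# Rank the documents by similarity to the query, most similar first.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)
print(ranked)
```

A vector store performs this kind of comparison at scale, returning the stored documents whose vectors are closest to the query vector.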

## Create Azure Cognitive Search Vector Store in Azure

Next, we'll configure LangChain to use ACS as the vector store, embed the loaded documents and store the embeddings in the vector store. Depending on the number of movies loaded and on rate limiting, this might take a while.

In [None]:
import openai
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.azuresearch import AzureSearch

model: str = "text-embedding-ada-002"
index_name: str = "langchain-vector-demo"

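# Point the embeddings client at the Azure deployment named in .env (embedding_name)
embeddings: OpenAIEmbeddings = OpenAIEmbeddings(deployment=embedding_name, model=model, chunk_size=1)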
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=os.environ["AZURE_COGNITIVE_SEARCH_ENDPOINT"],
    azure_search_key=os.environ["AZURE_COGNITIVE_SEARCH_ADMIN_KEY"],
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

vector_store.add_documents(documents=data)
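
If rate limiting interrupts the upload, one option is to send the documents in smaller batches and retry with exponential backoff. The sketch below is an assumption-laden illustration, not part of the lab: `add_in_batches` is a hypothetical helper, the batch size and delays are arbitrary, and it catches `Exception` broadly for brevity where real code would catch only the rate-limit error.

```python
import time


def add_in_batches(add_fn, documents, batch_size=16, max_retries=5,
                   base_delay=1.0, sleep=time.sleep):
    """Call add_fn on successive batches, retrying a failed batch with exponential backoff."""
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                add_fn(batch)
                break  # batch succeeded, move on to the next one
            except Exception:
                if attempt == max_retries - 1:
                    raise  # give up after the last attempt
                sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...


# Possible usage in this lab (commented out because it calls Azure):
# add_in_batches(lambda batch: vector_store.add_documents(documents=batch), data)
```

Note that retrying a batch that was partially uploaded could store some documents twice, so this is only a starting point.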

In [None]:
# Perform a pure vector similarity search
docs = vector_store.similarity_search(
    query="What are the best 80s movies I should watch?",
    k=3,
    search_type="similarity",
)
print(docs[0].page_content)

# Perform a hybrid search (keyword and vector results combined)
docs = vector_store.similarity_search(
    query="What are the best 80s movies I should watch?",
    k=3,
    search_type="hybrid",
)
print(docs[0].page_content)
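
A hybrid search runs a keyword query and a vector query and merges the two result lists. Azure's documentation describes this merge as Reciprocal Rank Fusion (RRF). The sketch below is a toy version of that idea, not the service's implementation: the constant `k=60` is a commonly cited default that we assume here, and the movie titles are hypothetical results.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: each item scores 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists for the same query
keyword_hits = ["Top Gun", "Blade Runner", "Aliens"]
vector_hits = ["Blade Runner", "The Terminator", "Top Gun"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Items that rank well in both lists rise to the top, which is why hybrid search can outperform either method alone when a query mixes exact terms with looser intent.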

In [None]:
from langchain.retrievers import AzureCognitiveSearchRetriever

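# The retriever reads its configuration from the three environment variables below.
# Uncomment and set them here if they are not already defined in your .env file.
# os.environ["AZURE_COGNITIVE_SEARCH_SERVICE_NAME"] = "<YOUR_ACS_SERVICE_NAME>"
# os.environ["AZURE_COGNITIVE_SEARCH_INDEX_NAME"] = "<YOUR_ACS_INDEX_NAME>"
# os.environ["AZURE_COGNITIVE_SEARCH_API_KEY"] = "<YOUR_API_KEY>"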

retriever = AzureCognitiveSearchRetriever(content_key="content")

retriever.get_relevant_documents("what is the best movie of all time")

## Next Section

📣 [Deploy AI](../../04-deploy-ai/README.md)