# 03 - Langchain with Qdrant

In this lab, we will take a deeper dive into the Qdrant vector store and the different ways to interact with it.

We'll start as usual by defining our Azure OpenAI service API key and endpoint details and specifying the model deployment we want to use. Then we'll initiate a connection to the Azure OpenAI service.

**NOTE**: As with previous labs, we'll use the values from the `.env` file in the root of this repository.

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables
if load_dotenv():
    # An f-string avoids a TypeError if OPENAI_API_BASE is not set (os.getenv returns None)
    print(f"Found OpenAI Base Endpoint: {os.getenv('OPENAI_API_BASE')}")
else:
    print("No file .env found")

openai_api_type = os.getenv("OPENAI_API_TYPE")
openai_api_key = os.getenv("OPENAI_API_KEY")
openai_api_base = os.getenv("OPENAI_API_BASE")
openai_api_version = os.getenv("OPENAI_API_VERSION")
deployment_name = os.getenv("AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME")
embedding_name = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME")
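
Because `os.getenv` returns `None` for anything that is not set, a missing entry in `.env` tends to surface only later as a confusing authentication or connection error. A minimal sketch of an early check follows. The helper name `missing_env_vars` is hypothetical (it is not part of `python-dotenv` or Langchain), and the example runs against a made-up dictionary so it works without a real `.env` file.

```python
import os

# The settings this lab reads from the .env file
REQUIRED_VARS = [
    "OPENAI_API_TYPE",
    "OPENAI_API_KEY",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_COMPLETION_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME",
]

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Made-up environment for illustration; in the notebook, call missing_env_vars(REQUIRED_VARS)
example_env = {"OPENAI_API_TYPE": "azure", "OPENAI_API_KEY": ""}
missing = missing_env_vars(REQUIRED_VARS, example_env)
print("Missing settings:", ", ".join(missing))
```

In the notebook, an empty result from `missing_env_vars(REQUIRED_VARS)` would indicate that every setting used in this lab has a value.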

First, we will load the data from the `movies.csv` file using a Langchain document loader.

In [None]:
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    file_path='./movies.csv',
    source_column='original_title',
    encoding='utf-8',
    csv_args={
        'delimiter': ',',
        'fieldnames': ['id', 'original_language', 'original_title', 'popularity', 'release_date', 'vote_average',
                       'vote_count', 'genre', 'overview', 'revenue', 'runtime', 'tagline'],
    },
)
data = loader.load()
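# Skip index 0 (the header row, which is read as data because fieldnames are given explicitly)
# and keep 10 movies; widen the slice if you want a larger dataset
data = data[1:11]
print(f'Loaded {len(data)} movies')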
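
To make the shape of the loaded data more concrete, here is a rough standard-library sketch of what a CSV document loader produces: one document per row, with the row rendered as `column: value` lines and the `source_column` value stored in the metadata. It approximates the idea and is not the `CSVLoader` implementation; `rows_to_documents` and the two-row sample are made up for illustration.

```python
import csv
import io

def rows_to_documents(csv_text, fieldnames, source_column):
    """Turn each CSV row into a dict with 'page_content' and 'metadata' keys."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames)
    documents = []
    for row_number, row in enumerate(reader):
        # One "column: value" line per field, joined into a single text blob
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        metadata = {"source": row[source_column], "row": row_number}
        documents.append({"page_content": page_content, "metadata": metadata})
    return documents

# Made-up sample with a header line, like movies.csv
sample = "id,original_title\n1,Alien\n2,Apollo 13\n"
docs = rows_to_documents(sample, ["id", "original_title"], "original_title")

# Because fieldnames were supplied, the header line is parsed as an ordinary row,
# so the first document is the header itself and gets sliced away
movies = docs[1:]
print([doc["metadata"]["source"] for doc in movies])
```

This is also why the searches later in the lab print `metadata['source']`: with `source_column='original_title'`, the source of each document is the movie title.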

Next, we will create Azure OpenAI embedding and completion instances so that we can build vector representations of the movies in the loaded CSV file and then ask questions about them.

In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import AzureOpenAI

# Create an Embeddings Instance of Azure OpenAI
embeddings = OpenAIEmbeddings(
    deployment=embedding_name,
    chunk_size=1
) 

# Create a Completion Instance of Azure OpenAI
llm = AzureOpenAI(
    openai_api_type = openai_api_type,
    openai_api_version = openai_api_version,
    openai_api_base = openai_api_base,
    openai_api_key = openai_api_key,
    deployment_name = deployment_name,
    model_name="gpt-35-turbo"
)

## Start Qdrant Server Locally

_If you are running the lab in Codespaces, Qdrant is already running. Otherwise, we need to start Qdrant, and the easiest way is to use Docker._

In [None]:
# Start Qdrant Server
!docker run -d --name qdrant -p 6333:6333 -p 6334:6334 -v "$(pwd)/qdrantstorage:/qdrant/storage" qdrant/qdrant

# If you want to stop and clean up the Qdrant server, uncomment and run the following commands:
# !docker stop qdrant
# !docker rm qdrant
# !rm -rf labs/03-orchestration/03-Qdrant/qdrantstorage

## Load Movies into Qdrant

Now that the Qdrant server is running and persisting data locally (see the `qdrantstorage` directory), let's load the movies into the vector store. We'll configure Langchain to use Qdrant as the vector store, embed the loaded documents and store the embeddings in it. Depending on the number of movies loaded and on rate limiting, this might take a while.

In [None]:
from langchain.vectorstores import Qdrant

url = "http://localhost:6333"
qdrant = Qdrant.from_documents(
    data,
    embeddings,
    url=url,
    prefer_grpc=False,
    collection_name="my_movies",
)
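
If rate limiting becomes a problem with a larger slice of the dataset, one option is to send the documents in small batches with a pause in between. The sketch below shows only the batching logic with a stand-in callback. `load_in_batches` and its parameters are hypothetical names; in the notebook, the callback would be whatever call adds documents to the store (for example the vector store's `add_documents` method, assuming the collection already exists).

```python
import time

def load_in_batches(documents, add_batch, batch_size=5, pause_seconds=0.0):
    """Pass `documents` to `add_batch` in fixed-size slices, pausing between slices."""
    sent = 0
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        add_batch(batch)
        sent += len(batch)
        # Pause only if another batch is still to come
        if pause_seconds and start + batch_size < len(documents):
            time.sleep(pause_seconds)
    return sent

# Stand-in for the real call: collect the batches in a list instead of sending them
received = []
total = load_in_batches(list(range(12)), received.append, batch_size=5)
print(total, [len(batch) for batch in received])
```

The batch size and pause length that work in practice depend on the quota of your embedding deployment, so treat both defaults as placeholders.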

## Vector Store Searching using Qdrant

Now we are going to test the vector store by searching it in different ways.

The first way is to search for similarity.

In [None]:
vectorstore = qdrant

query = "What is the best 80s movie I should watch?"
found_docs = vectorstore.similarity_search(query)

print(found_docs[0].metadata['source'])
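
Conceptually, a similarity search embeds the query and returns the stored documents whose vectors are closest to it. The following standard-library sketch illustrates the idea with made-up 3-dimensional vectors, assuming cosine similarity as the measure (other distance functions are possible). It is an illustration only, not how Qdrant implements search, and all names in it are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def toy_similarity_search(query_vector, stored_vectors, k=2):
    """Return the k titles whose vectors are most similar to the query vector."""
    ranked = sorted(
        stored_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:k]]

# Made-up "embeddings"; real embedding vectors have many more dimensions
toy_vectors = {
    "Space Movie": [0.9, 0.1, 0.0],
    "Romance Movie": [0.0, 0.2, 0.9],
    "Space Comedy": [0.7, 0.6, 0.1],
}
toy_query = [1.0, 0.0, 0.0]
top_titles = toy_similarity_search(toy_query, toy_vectors)
print(top_titles)
```

A real vector store uses an approximate index rather than scanning and sorting every vector, but the ranking principle is the same.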

Another way is to search for similar movies but with more diverse results. Note the **mmr** (maximal marginal relevance) `search_type`.

In [None]:
retriever = vectorstore.as_retriever(search_type="mmr")

query = "Which movies are about space travel?"
print(retriever.get_relevant_documents(query)[0].metadata['source'])
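
Maximal marginal relevance picks results one at a time, rewarding similarity to the query and penalising similarity to results that were already picked. The sketch below is a small standard-library illustration of that selection rule on made-up vectors. The function `mmr_select` and the toy data are hypothetical, and the weighting parameter is named `lambda_mult` here on the assumption that 0.5 gives an even balance between relevance and diversity.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query_vector, doc_vectors, k=2, lambda_mult=0.5):
    """Pick k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vectors)))
    while candidates and len(selected) < k:
        def mmr_score(index):
            relevance = cosine_similarity(query_vector, doc_vectors[index])
            # Highest similarity to anything already selected (0.0 if nothing is)
            redundancy = max(
                (cosine_similarity(doc_vectors[index], doc_vectors[chosen]) for chosen in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

toy_docs = [
    [1.0, 0.0, 0.0],  # index 0
    [0.9, 0.1, 0.0],  # index 1: nearly the same direction as index 0
    [0.0, 1.0, 0.0],  # index 2: a different direction
]
toy_query = [1.0, 0.5, 0.0]

by_similarity = sorted(
    range(len(toy_docs)),
    key=lambda i: cosine_similarity(toy_query, toy_docs[i]),
    reverse=True,
)[:2]
by_mmr = mmr_select(toy_query, toy_docs, k=2)
print("similarity:", by_similarity)
print("mmr:", by_mmr)
```

Plain similarity returns the two near-duplicate vectors, while the MMR selection keeps the best match and then prefers the vector that adds something different.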

## Vector Store Searching using Langchain Retriever

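In this part we will use Langchain's higher-level index and chain helpers to retrieve results, instead of calling the vector store's search methods ourselves as in the previous section. Note that `VectorstoreIndexCreator` builds a new index from the loader and embeds the whole file again. As written, the code does not set `vectorstore_cls`, so the index is created in the creator's default vector store class rather than in the `my_movies` Qdrant collection.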

In [None]:
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains import RetrievalQA

index_creator = VectorstoreIndexCreator(embedding=embeddings)
docsearch = index_creator.from_loaders([loader])

Now we will use a Langchain QA chain to ask questions about the movies.

In [None]:
llm = AzureOpenAI(
    openai_api_type = openai_api_type,
    openai_api_version = openai_api_version,
    openai_api_base = openai_api_base,
    openai_api_key = openai_api_key,
    deployment_name = deployment_name,
    model_name="gpt-35-turbo"
)
chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=docsearch.vectorstore.as_retriever(),
    input_key="question",
    return_source_documents=True,
)
query = "Do you have a column called popularity?"
response = chain({"question": query})
print(response['result'])
print(response['source_documents'])

query = "What is the movie with the highest popularity?"
response = chain({"question": query})
print(response['result'])
print(response['source_documents'])
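
The `"stuff"` chain type places all retrieved documents into a single prompt together with the question. The sketch below illustrates that idea with plain strings. The prompt wording, the function `build_stuff_prompt` and the two sample documents are made up for illustration and are not Langchain's actual template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved documents into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Made-up retrieved documents in the "column: value" style of the CSV loader
retrieved = [
    "original_title: Movie A\npopularity: 10.5",
    "original_title: Movie B\npopularity: 42.0",
]
prompt = build_stuff_prompt("What is the movie with the highest popularity?", retrieved)
print(prompt)
```

Because only the retrieved documents reach the model, a question that needs the whole dataset, such as the highest popularity overall, may be answered from just those few rows. Comparing `response['result']` with `response['source_documents']` shows what the answer was based on.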

## Load Movies into Qdrant from File

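Instead of embedding the documents again, we can connect to the existing `my_movies` collection, which the Qdrant server persists on disk, and ask a similar question again.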

In [None]:
from langchain.vectorstores import Qdrant
from qdrant_client import QdrantClient

from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(deployment=embedding_name, chunk_size=1) 

client = QdrantClient(url="http://localhost:6333", prefer_grpc=False)
qdrantStore = Qdrant(client=client, collection_name="my_movies", embeddings=embeddings)

query = "What are the three best movies about space travel?"
found_docs = qdrantStore.similarity_search(query)

print(found_docs[0].metadata['source'])

Next, let's create a retriever to query against the vector store.

In [None]:
retriever = qdrantStore.as_retriever(search_type="mmr")

query = "What are the three best movies about space travel?"
print(retriever.get_relevant_documents(query)[0].metadata['source'])

## Next Section

📣 [ACS](../04-ACS/acs.ipynb)