# Example: Chat bot over governed data using GDrive, Qdrant and Codified

### Prerequisites
- A [Qdrant Cloud](https://qdrant.tech/) cluster
- An [OpenAI Platform](https://platform.openai.com/docs/overview) API key

### Setup
To make things easy, we provide a sample Google Drive folder containing lease agreements in docx and pdf formats, along with Google Workspace users you can use to test permission enforcement. We also provide a Codified workspace that is already set up to connect to the lease agreements folder. [This](https://drive.google.com/drive/u/1/folders/1kfwunHsiJ_qb560HuGMrBf825nRMrHlS) is what the sample folder looks like. The following user permissions are set on the folder and used in this notebook:
- camila.c@g3a.io has access to the entire folder.
- eva.a@g3a.io has access to nothing.
- ethan.e@g3a.io has access only to "cascade_realy_lease_agreement.docx".

If you want to try this sample against your own Google Drive, you will need your own Codified workspace, which you can get [here](https://p.codified.app/).

Make a copy of `sample.env` and name it `.env`. This file contains the configuration for the sample Codified instance and Google Drive, along with settings you will need to provide:
- An OpenAI API key for creating embeddings.
- A Qdrant Cloud cluster URL and API key for storing your vectorized data.
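Once the `.env` file has been loaded, a quick check like the sketch below can catch a missing setting before any network call is made. The helper `missing_settings` is hypothetical, and `OPENAI_API_KEY` is an assumption (the notebook never reads it directly, but the OpenAI client libraries conventionally look for that name); the other names are the ones the cells in this notebook read from `os.environ`.

```python
import os

# Variable names the notebook cells read from the environment.
# OPENAI_API_KEY is an assumption: it is the conventional name used by
# the OpenAI client libraries, not something this notebook reads itself.
REQUIRED_SETTINGS = [
    "OPENAI_API_KEY",
    "QDRANT_URL",
    "QDRANT_API_KEY",
    "QDRANT_COLLECTION_NAME",
    "DRIVE_OR_FOLDER_ID",
    "CODIFIED_API_KEY",
    "CODIFIED_URL",
]


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the names in `required` that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Report anything that still needs to be filled in.
missing = missing_settings(os.environ)
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```

Run it after `load_dotenv()` so that values from `.env` are visible in `os.environ`.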

In [None]:
# Use this if running from Colab or another hosted notebook.
# Fetch the raw files; the github.com/.../blob/... URLs return an HTML page instead.

!wget https://raw.githubusercontent.com/codified-io/examples/main/requirements.txt
!wget https://raw.githubusercontent.com/codified-io/examples/simple-rag/gdrive-creds.json
%pip install -r requirements.txt

In [None]:
import os

from dotenv import load_dotenv
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.core.chat_engine import ContextChatEngine
from llama_index.readers.file import DocxReader, PDFReader
from llama_index.readers.google import GoogleDriveReader
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import CollectionInfo, Distance, VectorParams

from codified.client import CodifiedClient, access_context
from codified.llama_index.retrievers import CodifiedRetriever

load_dotenv()

### Find or create a Qdrant collection

In [None]:
def create_qdrant_collection(qdrant_client: QdrantClient) -> CollectionInfo:
    """Return the collection named by QDRANT_COLLECTION_NAME, creating it if needed."""
    collection_name = os.environ["QDRANT_COLLECTION_NAME"]
    # Embed a throwaway string to learn the embedding model's vector size.
    dimensions = len(Settings.embed_model.get_text_embedding("get vector dimensions"))

    if not qdrant_client.collection_exists(collection_name):
        qdrant_client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(
                size=dimensions,
                distance=Distance.COSINE
            )
        )

    return qdrant_client.get_collection(
        collection_name=collection_name
    )

qdrant_client = QdrantClient(
    url=os.environ["QDRANT_URL"],
    api_key=os.environ["QDRANT_API_KEY"]
)

collection = create_qdrant_collection(qdrant_client)

vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name=os.environ["QDRANT_COLLECTION_NAME"],
)
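The collection is created with `Distance.COSINE` and a vector size taken from the embedding model. As a rough illustration of what that metric measures, the sketch below computes cosine similarity in plain Python. It is not Qdrant's implementation, and the function name `cosine_similarity` is chosen here only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score close to 1 regardless of length;
# perpendicular vectors score 0.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

The length check mirrors why the collection's `size` has to match the embedding model: vectors of different dimensions cannot be compared.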


### Load documents then create and store embeddings.

In [None]:
# Route index storage to the Qdrant collection created above.
storage_ctx = StorageContext.from_defaults(vector_store=vector_store)

reader = GoogleDriveReader(
    folder_id=os.environ["DRIVE_OR_FOLDER_ID"],
    service_account_key_path="gdrive-creds.json",
    # Keys are file suffixes, including the leading dot.
    file_extractor={
        ".docx": DocxReader(),
        ".pdf": PDFReader()
    }
)

index = VectorStoreIndex.from_documents(reader.load_data(), storage_context=storage_ctx)
print(index)

### Simple llama-based chat

In [None]:
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
ce = ContextChatEngine.from_defaults(retriever=index.as_retriever())
response = ce.chat("What property is Emily Carter leasing or sub-leasing?")
print(response)

### Governed llama-based chat

We wrap the index's retriever in a `CodifiedRetriever` so that Codified enforces the permissions as they exist in GDrive.

In [None]:
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
codified_client = CodifiedClient(os.environ["CODIFIED_API_KEY"], os.environ["CODIFIED_URL"])
codified_retriever = CodifiedRetriever(codified_client, index.as_retriever())
ce = ContextChatEngine.from_defaults(retriever=codified_retriever)
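To make the wrapping pattern concrete, here is a toy sketch of a retriever that delegates to a base retriever and then drops results the user may not read. It is only an illustration of the idea and not how `CodifiedRetriever` is implemented: `Hit`, `FilteringRetriever`, `PERMISSIONS`, `can_read`, and `base_retrieve` are all hypothetical, the permission table simply mirrors the three sample users listed in the setup, and `other_lease.pdf` is a made-up file name.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A retrieved chunk and the file it came from (hypothetical structure)."""
    file_name: str
    text: str


class FilteringRetriever:
    """Wraps a base retriever and drops hits the user may not read."""

    def __init__(self, base_retrieve, can_read):
        self._base_retrieve = base_retrieve  # callable: query -> list of Hit
        self._can_read = can_read            # callable: (user, file_name) -> bool

    def retrieve(self, query, user_email):
        hits = self._base_retrieve(query)
        return [hit for hit in hits if self._can_read(user_email, hit.file_name)]


ALL_FILES = "*"
# Toy permission table mirroring the sample users described in the setup.
PERMISSIONS = {
    "camila.c@g3a.io": {ALL_FILES},
    "eva.a@g3a.io": set(),
    "ethan.e@g3a.io": {"cascade_realy_lease_agreement.docx"},
}


def can_read(user_email, file_name):
    """Unknown users get an empty allow-list, so they can read nothing."""
    allowed = PERMISSIONS.get(user_email, set())
    return ALL_FILES in allowed or file_name in allowed


def base_retrieve(query):
    """Stand-in for a vector search: returns the same two hits for any query."""
    return [
        Hit("cascade_realy_lease_agreement.docx", "placeholder chunk A"),
        Hit("other_lease.pdf", "placeholder chunk B"),
    ]


retriever = FilteringRetriever(base_retrieve, can_read)
for user in PERMISSIONS:
    names = [hit.file_name for hit in retriever.retrieve("any question", user)]
    print(user, names)
```

Filtering at retrieval time means the language model never sees text from files the user cannot open, so it cannot leak that text into an answer.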

We then use an `access_context` to declare who is prompting the chat bot; each user can only get information from files they have access to in GDrive.

In [None]:
# eva.a has access to nothing in the folder.
with access_context(user_email="eva.a@g3a.io"):
    response = ce.chat("What property is Emily Carter leasing or sub-leasing?")
    print(response)

# camila.c has access to the entire folder.
with access_context(user_email="camila.c@g3a.io"):
    response = ce.chat("What property is Emily Carter leasing or sub-leasing?")
    print(response)

# ethan.e has access to a single lease agreement.
with access_context(user_email="ethan.e@g3a.io"):
    response = ce.chat("What property is Emily Carter leasing or sub-leasing?")
    print(response)

# Same user, different question: only files ethan.e can read are eligible as context.
with access_context(user_email="ethan.e@g3a.io"):
    response = ce.chat("Who is the landlord of the property at 123 Adventure Street?")
    print(response)
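A `with` block like the ones above can carry the caller's identity to code deep inside the retrieval path without threading it through every function call. One common way to build such a context manager in Python is `contextvars`, sketched below. This is an assumption about the general pattern, not a description of how Codified's `access_context` works; `toy_access_context` and `current_user` are hypothetical names.

```python
import contextvars
from contextlib import contextmanager

# Holds the email of whoever is currently prompting; None outside any block.
_current_user = contextvars.ContextVar("current_user", default=None)


@contextmanager
def toy_access_context(user_email):
    """Set the current user for the duration of a `with` block."""
    token = _current_user.set(user_email)
    try:
        yield
    finally:
        # Restore the previous value, even if the block raised an error.
        _current_user.reset(token)


def current_user():
    """Read the user set by the innermost enclosing toy_access_context."""
    return _current_user.get()


with toy_access_context("eva.a@g3a.io"):
    print("inside:", current_user())
print("outside:", current_user())
```

Because the previous value is restored on exit, blocks can be nested and a failed request cannot leave one user's identity attached to the next request.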