
# APARAVI RAG as a Service 
Aparavi is a leader in the unstructured data space, offering a service for locating, quantizing, and retrieving large unstructured data files.

This retriever brings Aparavi's RAG intelligence as a service into your LangChain pipeline, so you can build a smarter AI system on top of your local, private, or company data.

This notebook shows how to use a retriever that relies on Aparavi's platform and RAG under the hood to retrieve the documents in your file system that are semantically most relevant to your query.


Please run `!pip install -e .` in the project directory to install the necessary local packages.

In [1]:
# import the retriever from what will be our pip install 
from Aparavi_PublicLangchainRetriever.Aparavi_PublicLangchainRetriever import SemanticSearchRetriever


# import the json library for formatting the response
import json


### Define the configuration and login parameters

These parameters are user specific. Assuming the Aparavi platform is running on your machine, the configuration values can be taken from the URL of the local installation or of the client server.

In [18]:
# server & port number
server_url = "localhost"
port = 80

# user data 
user_id = "root"
password = "root"

# Aparavi Platform (Enter the name of your aggregator-collector here)
store_path = '/SM-WS-57002 Aggregator-Collector/'
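
Hard-coded credentials in a notebook are easy to leak. As a minimal sketch, the same parameters could instead be read from environment variables, falling back to the defaults shown above. The variable names (`APARAVI_SERVER_URL`, `APARAVI_PORT`, `APARAVI_USER`, `APARAVI_PASSWORD`) and the `load_aparavi_config` helper are hypothetical and are not part of Aparavi's package.

```python
import os


def load_aparavi_config(env=None):
    """Read connection settings from environment variables.

    The variable names are hypothetical; the defaults mirror the
    values used in this notebook.
    """
    env = os.environ if env is None else env
    return {
        "server_url": env.get("APARAVI_SERVER_URL", "localhost"),
        "port": int(env.get("APARAVI_PORT", "80")),
        "user_id": env.get("APARAVI_USER", "root"),
        "password": env.get("APARAVI_PASSWORD", "root"),
    }


config = load_aparavi_config()
server_url, port = config["server_url"], config["port"]
user_id, password = config["user_id"], config["password"]
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.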


The next step is to initialize the retriever, which logs in to the platform and obtains the authentication token.

In [19]:
retriever = SemanticSearchRetriever(user_id=user_id, password=password, server_url=server_url, port=port)

The setup is now complete, and you can start querying the unstructured data that Aparavi's application has previously loaded and vectorized. Define a user query (ideally one that targets some of your uploaded data) and decode the response as shown below. Each result is listed with a score indicating its semantic relevance to the query. In addition, each result contains:

  -  chunk: A numeric identifier for the data chunk.
  -  content: Text content that was vectorized, provided as a payload with the associated vector.
  -  objectId: A unique identifier associated with the document/object of origin.
  -  parent: File path in the file system.
  -  permissionId: Permissions associated with the user.
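
To make the field list above concrete, here is a minimal sketch of condensing a response into one line per result. It assumes the response is a dictionary with a `data` list, as in the output printed further down; the `score` key name, the `summarize_hits` helper, and the sample values are hypothetical and only illustrate the shape.

```python
def summarize_hits(response, preview_chars=80):
    """Return one summary line per hit in a search response."""
    lines = []
    for hit in response.get("data", []):
        # Collapse newlines and repeated spaces so the preview fits on one line.
        content = " ".join(hit.get("content", "").split())
        preview = content[:preview_chars]
        lines.append(
            f"chunk={hit.get('chunk')} score={hit.get('score')} "
            f"parent={hit.get('parent')} | {preview}"
        )
    return lines


# Hypothetical response used only for illustration; not real Aparavi output.
sample_response = {
    "status": "OK",
    "data": [
        {
            "chunk": 1,
            "score": 0.9,  # key name is an assumption
            "content": "Example policy text.\nSecond line.",
            "objectId": "example-object-id",
            "parent": "/example/path/",
            "permissionId": 0,
        }
    ],
}

for line in summarize_hits(sample_response):
    print(line)
```

Using `.get` throughout keeps the helper from failing if a field is missing from a particular result.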


In [20]:
query = 'Please tell me about our last Human Resources policy update.'

Execute the semantic search through the Aparavi search endpoint.

In [21]:
response = retriever.search(query=query,store_path=store_path, limit=5)

In [22]:
print(json.dumps(response, indent=4))

{
    "status": "OK",
    "data": [
        {
            "chunk": 490,
            "content": "\u25cf It has raised the self-awareness of people managers of how they personally\n\nimpact upon others \u2013 positively and negatively.\n\u25cf It is supporting a climate of continuous improvement.\n\u25cf It is starting to improve the climate/morale, as measured through our employee\n\nopinion survey.\n\u25cf Focused agenda for development. Forced line managers to discuss development\n\nissues.\n\u25cf Perception of feedback as more valid and objective, leading to acceptance of\n\nresults and actions required.\n\nBut there may be problems. These include:\n\n\u25cf people not giving frank or honest feedback;\n\u25cf people being put under stress in receiving or giving feedback;\n\u25cf lack of action following feedback;\n\u25cf over-reliance on technology;\n\u25cf too much bureaucracy.\n\nThese can all be minimized if not avoided completely by careful design, communica-\ntion, training and

In [None]:
!pip install langchain openai

In [None]:
import os
from langchain.chat_models import ChatOpenAI

from langchain.schema import (
    SystemMessage,
    HumanMessage,
)

os.environ["OPENAI_API_KEY"] = "sk-***********************"

# initialize the OpenAI chat model that will answer the augmented prompt
chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)

# augment the user prompt with the knowledge retrieved from the vector DB
def augment_prompt(query: str, context):

    # feed into an augmented prompt
    augmented_prompt = f""" 

    Using the context below, please answer this query.
        
    Contexts:
    {context}

    Query: {query}"""
    return augmented_prompt

messages = [
    SystemMessage(content="You are a helpful assistant. Focus on the context provided."),
    HumanMessage(content=augment_prompt(query, response))
]


# let ChatGPT turn the retrieved information into a readable answer
# (stored under a new name so the search response is not overwritten)
answer = chat(messages)
print("Response from ChatGPT augmented with RAG:\n----------------------\n", answer.content)
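
The prompt above interpolates the entire search response, including status and metadata fields, into the context. As a minimal sketch, assuming the search response has the `data` and `content` structure shown in the printed output, the context could be limited to the retrieved text and capped at a character budget. The `build_context` helper and its `max_chars` default are hypothetical.

```python
def build_context(response, max_chars=4000):
    """Join the `content` field of each hit, stopping at a character budget."""
    parts = []
    used = 0
    for hit in response.get("data", []):
        text = hit.get("content", "").strip()
        if not text:
            continue  # skip hits without usable text
        if used + len(text) > max_chars:
            break  # keep whole chunks only; drop the remainder
        parts.append(text)
        used += len(text)
    # A visible separator helps the model tell the chunks apart.
    return "\n\n---\n\n".join(parts)


# Hypothetical response used only for illustration.
example = {"data": [{"content": "First chunk."}, {"content": "Second chunk."}]}
print(build_context(example))
```

The result can be passed as the `context` argument of `augment_prompt` in place of the raw search response. Stopping at whole chunks is one simple choice; truncating by tokens would track the model's limits more closely.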