In [0]:
review_df = spark.table("external.default.trustpilot_reviews_rag")

In [0]:
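# Keep only reviews linked to a subscriber, capped at 500 rows per sentiment.
# limit() without an ordering returns an arbitrary subset, so the sample can differ between runs.
positive_df = review_df.filter(review_df.sentiment == "POSITIVE").filter(review_df.SUBSCRIBER_SR_KEY.isNotNull()).limit(500)
display(positive_df)

In [0]:
negative_df = review_df.filter(review_df.sentiment == "NEGATIVE").filter(review_df.SUBSCRIBER_SR_KEY.isNotNull()).limit(500)
display(negative_df)

In [0]:
# Neutral reviews are not capped.
neutral_df = review_df.filter(review_df.sentiment == "NEUTRAL").filter(review_df.SUBSCRIBER_SR_KEY.isNotNull())
display(neutral_df)

In [0]:
# Combine the three subsets; unionByName matches columns by name rather than position.
final_review_df = positive_df.unionByName(neutral_df).unionByName(negative_df)
display(final_review_df)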

In [0]:
# Caution: this would overwrite the same table that review_df was read from.
# final_review_df.write.mode("overwrite").saveAsTable("external.default.trustpilot_reviews_rag")
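
The sampling cells above cap positive and negative reviews at 500 rows each and leave neutral reviews uncapped. As a plain-Python illustration of that logic (not the Spark code itself), the sketch below applies per-label caps to a list of dicts. The helper name `cap_per_label` and the sample rows are hypothetical.

```python
from collections import defaultdict

def cap_per_label(rows, caps, label_key="sentiment"):
    """Keep at most caps[label] rows per label; labels missing from caps are uncapped."""
    kept = []
    seen = defaultdict(int)
    for row in rows:
        label = row[label_key]
        limit = caps.get(label)  # None means no cap for this label
        if limit is None or seen[label] < limit:
            kept.append(row)
            seen[label] += 1
    return kept

# Hypothetical sample rows; the input order decides which rows survive the cap
rows = [
    {"id": 1, "sentiment": "POSITIVE"},
    {"id": 2, "sentiment": "POSITIVE"},
    {"id": 3, "sentiment": "NEUTRAL"},
    {"id": 4, "sentiment": "NEGATIVE"},
    {"id": 5, "sentiment": "POSITIVE"},
]
sample = cap_per_label(rows, caps={"POSITIVE": 2, "NEGATIVE": 2})
print([r["id"] for r in sample])
```

Unlike `limit()` on an unordered DataFrame, this toy version always keeps the first rows it sees, which is why the result is reproducible here.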

# RAG

In [0]:
%pip install "mlflow[databricks]==2.10.1" langchain==0.1.5 databricks-vectorsearch==0.22 databricks-sdk==0.18.0
dbutils.library.restartPython()

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.

In [0]:
import os

# URL of the current Databricks workspace
host = "https://" + spark.conf.get("spark.databricks.workspaceUrl")
# Placeholder: supply a real personal access token and keep it out of source control
os.environ['DATABRICKS_TOKEN'] = 'ENTER TOKEN'

index_name = "external.default.trustpilot_vector_index"
VECTOR_SEARCH_ENDPOINT_NAME = "trustpilot-rag_endpoint"
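
Because the cell above ships with a placeholder token, it can help to fail early when the placeholder was never replaced. This is a minimal sketch, assuming the token is provided through the `DATABRICKS_TOKEN` environment variable; `require_token` and `PLACEHOLDER` are hypothetical names, not part of any Databricks library.

```python
import os

PLACEHOLDER = "ENTER TOKEN"  # the placeholder string used in the cell above

def require_token(env=None, name="DATABRICKS_TOKEN"):
    """Return the token from env, failing early if it is missing or still the placeholder."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token or token == PLACEHOLDER:
        raise ValueError(f"{name} is not set to a real token")
    return token
```

Calling `require_token()` before building the vector search client would surface a configuration mistake as a clear error instead of a failed authentication later on.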

In [0]:
from databricks.vector_search.client import VectorSearchClient
from langchain_community.vectorstores import DatabricksVectorSearch
from langchain_community.embeddings import DatabricksEmbeddings

embedding_model = DatabricksEmbeddings(endpoint="databricks-gte-large-en")

def get_retriever():
    os.environ["DATABRICKS_HOST"] = host
    # Get the vector search index
    vsc = VectorSearchClient(workspace_url=host, personal_access_token=os.environ["DATABRICKS_TOKEN"])
    vs_index = vsc.get_index(
        endpoint_name=VECTOR_SEARCH_ENDPOINT_NAME,
        index_name=index_name
    )

    # Create the retriever
    vectorstore = DatabricksVectorSearch(
        vs_index, text_column="TRANSLATION", embedding=embedding_model
    )
    return vectorstore.as_retriever()
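
Conceptually, the retriever embeds the query and returns the stored texts whose embeddings are closest to it. The toy sketch below illustrates that idea with cosine similarity over hand-written vectors. It is not how Databricks Vector Search is implemented; the vectors, texts, and the `cosine` and `top_k` helpers are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, docs, k=2):
    """docs is a list of (text, vector) pairs; return the k texts most similar to query_vec."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hypothetical three-dimensional "embeddings"
docs = [
    ("SIM arrived quickly", [0.9, 0.1, 0.0]),
    ("Internet slow abroad", [0.1, 0.9, 0.1]),
    ("Support never answered", [0.0, 0.2, 0.9]),
]
print(top_k([0.8, 0.2, 0.0], docs, k=1))
```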



In [0]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatDatabricks

chat_model = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct", max_tokens = 200)

TEMPLATE = """You are a customer service agent who has read all the customer reviews posted on Trustpilot about a telecom company. You are answering questions about the reviews and the sentiment toward the company. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know; don't try to make up an answer. Provide all answers in English only, and base them on the reviews themselves.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=chat_model,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)



[NOTICE] Using a Personal Authentication Token (PAT). Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True to VectorSearchClient().
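
With `chain_type="stuff"`, the retrieved documents are concatenated and placed into the `{context}` slot of the prompt before a single LLM call. The sketch below imitates that assembly in plain Python, assuming the documents are joined with a blank line between them. `build_stuff_prompt`, `SHORT_TEMPLATE`, and the sample texts are hypothetical and only mirror the shape of the template used in this notebook.

```python
SHORT_TEMPLATE = """Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""

def build_stuff_prompt(template, docs, question, separator="\n\n"):
    """Join all retrieved texts into one context string and fill the template."""
    context = separator.join(docs)
    return template.format(context=context, question=question)

# Hypothetical retrieved review texts
docs = ["Ordering the SIM was quick and easy.", "My SIM card never arrived."]
prompt_text = build_stuff_prompt(SHORT_TEMPLATE, docs, "How is the SIM order experience?")
print(prompt_text)
```

Since every retrieved document ends up in one prompt, the number and length of retrieved reviews has to fit within the model's context window.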




In [0]:
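question = {"query": "How is the Lebara SIM order experience?"}
# invoke() replaces the deprecated run(); RetrievalQA returns its answer under "result"
answer = chain.invoke(question)["result"]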
print(answer)

According to the reviews, the Lebara SIM order experience is mixed. One reviewer found it "quick and easy" and received a clear confirmation email right away. However, another reviewer had a poor experience, ordering a subscription in early December but not receiving the SIM card by January 19th, and had difficulty contacting the company.


# AUTO REPLY

In [0]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.chat_models import ChatDatabricks

chat_model = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct", max_tokens = 200)

TEMPLATE = """You are a customer service agent who has read all the customer reviews posted on Trustpilot about a telecom company. You are replying to a new review that a customer has posted. Understand the question and use the knowledge from the existing reviews. If the question is not related to one of these topics, kindly decline to answer. If you don't know the answer, just say that you don't know; don't try to make up an answer. Provide all answers in English only, and base them on the reviews themselves.
Use the following pieces of context to answer the question at the end:
{context}
Question: {question}
Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "question"])

chain = RetrievalQA.from_chain_type(
    llm=chat_model,
    chain_type="stuff",
    retriever=get_retriever(),
    chain_type_kwargs={"prompt": prompt}
)



[NOTICE] Using a Personal Authentication Token (PAT). Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True to VectorSearchClient().




In [0]:
question = {"query": "I am facing problems with the internet connection. What can I do?"}
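# invoke() replaces the deprecated run(); RetrievalQA returns its answer under "result"
answer = chain.invoke(question)["result"]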
print(answer)

I understand your frustration with the internet connection issues. Based on previous reviews, it seems that some customers have experienced similar problems, especially when abroad. One customer mentioned that the internet didn't work in Belgium and Germany, but it started working after some time. Another customer mentioned that the internet connection is bad when outside.

However, I don't have a specific solution to offer as the reviews don't provide a clear resolution to this issue. I would recommend contacting our technical support team directly to report the issue and they will be able to assist you further. They may be able to provide a more detailed solution or send a technician to resolve the issue.
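
To draft replies for several new reviews, the same chain call can be wrapped in a loop. The sketch below takes the answering function as a parameter so that it runs without a Databricks connection. `draft_replies` and `stub_answer` are hypothetical; in the notebook, the callable would be a thin wrapper around the `chain` defined above.

```python
def draft_replies(reviews, answer_fn):
    """Return (review, reply) pairs, skipping blank reviews."""
    replies = []
    for review in reviews:
        text = review.strip()
        if not text:
            continue  # nothing to answer
        replies.append((text, answer_fn(text)))
    return replies

def stub_answer(text):
    # Stand-in for the real chain call, used only for this illustration
    return f"Thank you for your feedback on: {text}"

pairs = draft_replies(["Internet is slow abroad.", "   "], stub_answer)
print(pairs)
```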
