# Implementing a Simple LangChain Retrieval Chain

Installing the necessary packages and importing dependencies

In [7]:
NEW_ENV = True  # set to False if the packages are already installed

if NEW_ENV:
    %pip install -q langchain langchain_community langchain_openai langchain_chroma sentence_transformers

import os
import getpass
from operator import itemgetter

import pandas as pd

from langchain_openai import ChatOpenAI
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_chroma import Chroma

# prompt for the key only if it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY")

Setting up our OpenAI connection

In [8]:
llm = ChatOpenAI(model="gpt-3.5-turbo")

Importing the Yelp review data

In [9]:
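df = pd.read_csv("/kaggle/input/yelp-reviews/yelp_reviews.csv", encoding="latin-1")
# drop rows missing either field; concatenating a missing value yields NaN instead of text
df = df.dropna(subset=["Reviewer Name", "Review"])
df["reviewer_and_review"] = df["Reviewer Name"] + " | " + df["Review"]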

Creating the vector database and embedding the documents

In [10]:
documents = [Document(page_content=review) for review in df["reviewer_and_review"]]
embedding_model = "all-MiniLM-L6-v2"
db = Chroma.from_documents(documents, SentenceTransformerEmbeddings(model_name=embedding_model))


Creating the retriever

In [11]:
retriever = db.as_retriever(
    search_type="similarity_score_threshold",  # filter by relevance score instead of always returning a fixed top-k
    search_kwargs={
        "score_threshold": 0.1,  # keep only documents whose relevance score is at least 0.1
        "k": 5,  # return at most 5 documents, even if more of them clear the threshold
    },
)

question = "What did customers have to say about the location?"
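To make the interaction between `score_threshold` and `k` concrete, here is a minimal pure-Python sketch of threshold-plus-top-k filtering. It is an illustration under simplifying assumptions, not Chroma's implementation: the hypothetical `retrieve` helper scores toy 2-D vectors with cosine similarity, whereas LangChain's Chroma wrapper converts the distances Chroma returns into relevance scores before applying the threshold.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, corpus, score_threshold=0.1, k=5):
    """Hypothetical helper: return up to k (text, score) pairs whose score is >= score_threshold."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    top_k = scored[:k]  # cap the result size ...
    return [(text, score) for text, score in top_k if score >= score_threshold]  # ... then drop weak matches

# Made-up 2-D "embeddings" (real all-MiniLM-L6-v2 vectors have 384 dimensions)
corpus = [
    ("Great location downtown", [1.0, 0.0]),
    ("Parking near the store is easy", [0.8, 0.6]),
    ("The staff were rude", [0.0, 1.0]),
    ("Loved the waffle cones", [-1.0, 0.0]),
]
query = [1.0, 0.0]
for text, score in retrieve(query, corpus, score_threshold=0.1, k=5):
    print(f"{score:.2f}  {text}")
```

Only the first two entries clear the 0.1 threshold here. Raising `score_threshold` to 0.9 would leave just the first one, and so would lowering `k` to 1 while keeping the 0.1 threshold.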

Writing the system and user prompts and combining them into a chat prompt template, applying some prompt engineering along the way

In [12]:
system = \
"""
You are a helpful AI bot. You answer a user's question about ice cream store reviews.

You have retrieved the following reviews from a data source:

{reviews}

Use these reviews to answer the user's question. Analyze the topic and provide a summary followed by five
quotations from the reviews most relevant to the topic. Include the reviewer's name for each of the five reviews so that
the specific review is easy to locate later.

First, include a summary that analyzes the user's question. Label this section "Summary".

Then separate the summary from the quotations with the label "Supporting Reviews".

For each quotation, state the quotation number, the reviewer, a summary of the reviewer's sentiments, and the quotation itself.
The quotation should be the part of the review that best supports the summary. Use the following example for formatting:

Review 1
Reviewer: Sherry S.
Summary: Sherry S. expressed frustration with the service at the park road store, stating that the staff ignored customers and did not provide assistance in taking orders.
Quotation: "just stood there for 15 minutes and the 4 people working there didn't make eye contact or help to take order just were carrying on their conversation."

"""

human = \
"""
{question}
"""

prompt = ChatPromptTemplate.from_messages([
    ("system", system),
    ("human", human),
])
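`ChatPromptTemplate` fills the `{reviews}` and `{question}` placeholders when the chain runs. The pure-Python sketch below mimics that substitution with `str.format` on shortened, hypothetical templates and made-up reviews, producing the (role, content) pairs that are eventually sent to the model. It only approximates LangChain's behavior; in the notebook itself, `prompt.format_messages(reviews=..., question=...)` renders the real messages.

```python
# Shortened, hypothetical stand-ins for the notebook's `system` and `human` templates
SYSTEM_TEMPLATE = (
    "You are a helpful AI bot. You answer a user's question about ice cream store reviews.\n"
    "You have retrieved the following reviews from a data source:\n\n{reviews}"
)
HUMAN_TEMPLATE = "{question}"

def render_messages(reviews, question):
    """Fill each template's placeholders and return (role, content) pairs."""
    return [
        ("system", SYSTEM_TEMPLATE.format(reviews=reviews)),
        ("human", HUMAN_TEMPLATE.format(question=question)),
    ]

# Made-up reviews in the same "name | review" format the notebook builds
reviews = "Alex P. | Easy to find, right by the park.\nJamie R. | Hard to park nearby.\n"
messages = render_messages(reviews, "What did customers have to say about the location?")
for role, content in messages:
    print(f"[{role}]\n{content}\n")
```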

#### A tangent on different types of prompt engineering

Prompt engineering is a crucial technique for guiding the responses of large language models (LLMs). By carefully crafting prompts, you can influence the quality and relevance of the model's output. Here are some common types of prompt engineering techniques along with detailed explanations and examples for each:

**1. Assigning Roles**

Assigning roles involves giving the model a specific role to play, which helps guide its responses. This technique can be particularly useful when you want the model to adopt a certain perspective or expertise.

**Example:**
- **Prompt:** "You are an expert data scientist. Explain the concept of overfitting in machine learning."
- **Expected Response:** "Overfitting occurs when a machine learning model captures noise in the training data instead of the underlying pattern. This results in high accuracy on training data but poor generalization to new data."
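With chat models, the role usually goes in the system message, as the notebook's own `system` prompt does ("You are a helpful AI bot..."). This hypothetical `role_prompt` helper is a minimal sketch that builds the message list as (role, content) tuples, the same shape `ChatPromptTemplate.from_messages` accepts.

```python
def role_prompt(role, task):
    """Hypothetical helper: put the persona in a system message and the task in a user message."""
    return [
        ("system", f"You are {role}."),
        ("human", task),
    ]

messages = role_prompt(
    "an expert data scientist",
    "Explain the concept of overfitting in machine learning.",
)
print(messages)
```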

**2. One-Shot Prompting**

One-shot prompting provides the model with a single example to guide its response. This technique is useful when you want the model to perform a task based on a single instance.

**Example:**
- **Prompt:** "Translate the following English sentence to French. Example: 'The dog is in the garden.' -> 'Le chien est dans le jardin.' Now translate: 'The cat is on the roof.'"
- **Expected Response:** "'The cat is on the roof.' -> 'Le chat est sur le toit.'"
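A one-shot prompt like this one can be assembled from its parts. The hypothetical `one_shot_prompt` helper below is a sketch that places the single worked example before the real query and ends with `->` so the model's natural continuation is the translation.

```python
def one_shot_prompt(instruction, example_input, example_output, query):
    """Hypothetical helper: one worked example, then the real input left open for the model."""
    return (
        f"{instruction}\n\n"
        f"Example: '{example_input}' -> '{example_output}'\n"
        f"Now: '{query}' ->"
    )

prompt_text = one_shot_prompt(
    "Translate the following English sentence to French.",
    "The dog is in the garden.",
    "Le chien est dans le jardin.",
    "The cat is on the roof.",
)
print(prompt_text)
```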

**3. Few-Shot Prompting**

Few-shot prompting gives the model a few examples to learn from before generating a response. This technique helps the model understand the pattern or task more effectively than one-shot prompting.

**Example:**
- **Prompt:**

  Translate the following English sentences to French:

  'The dog is in the garden.' -> 'Le chien est dans le jardin.'
  'The bird is in the sky.' -> 'L'oiseau est dans le ciel.'
  'The cat is on the roof.' ->

- **Expected Response:** "'Le chat est sur le toit.'"
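The few-shot pattern generalizes the one-shot case to a list of examples. The hypothetical `few_shot_prompt` helper below is a plain-string sketch that reproduces the prompt shown above; LangChain also provides `FewShotChatMessagePromptTemplate` in `langchain_core.prompts` for building few-shot chat prompts.

```python
def few_shot_prompt(instruction, examples, query):
    """Hypothetical helper: list several input -> output pairs, then leave the last output blank."""
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"'{source}' -> '{target}'")
    lines.append(f"'{query}' ->")  # the model is expected to complete this line
    return "\n".join(lines)

examples = [
    ("The dog is in the garden.", "Le chien est dans le jardin."),
    ("The bird is in the sky.", "L'oiseau est dans le ciel."),
]
text = few_shot_prompt("Translate the following English sentences to French:", examples, "The cat is on the roof.")
print(text)
```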

**4. Chain of Thought (CoT) Prompting**

Chain of Thought (CoT) prompting involves guiding the model to show its reasoning process step-by-step. This technique is particularly useful for tasks that require logical reasoning or multi-step problem-solving.

**Example:**
- **Prompt:** "Solve the following math problem step-by-step: What is 15% of 200?"
- **Expected Response:**
  Step 1: Convert the percentage to a decimal: 15% = 0.15
  Step 2: Multiply the decimal by the number: 0.15 * 200 = 30
  Answer: 30
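Asking for a fixed final line makes a chain-of-thought response easy to post-process. In this sketch, the hypothetical `cot_prompt` helper requests numbered steps plus an `Answer:` line, and `extract_answer` pulls out the final result so the reasoning steps can be shown or discarded separately.

```python
import re

def cot_prompt(problem):
    """Hypothetical helper: ask for numbered steps and a final 'Answer:' line."""
    return (
        f"Solve the following problem step-by-step: {problem}\n"
        "Number each step, then give the result on a final line starting with 'Answer:'."
    )

def extract_answer(response):
    """Return the text after the last 'Answer:' label, or None if there is none."""
    matches = re.findall(r"Answer:\s*(.+)", response)
    return matches[-1].strip() if matches else None

# The expected response from the example above, one step per line
response = (
    "Step 1: Convert the percentage to a decimal: 15% = 0.15\n"
    "Step 2: Multiply the decimal by the number: 0.15 * 200 = 30\n"
    "Answer: 30"
)
print(cot_prompt("What is 15% of 200?"))
print(extract_answer(response))  # prints 30
```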

**5. Providing Context**

Providing context involves giving the model additional background information to help it generate a more accurate response. This technique ensures that the model has all the necessary information to understand the query fully.

**Example:**
- **Prompt:** "In the context of machine learning, explain the term 'regularization.'"
- **Expected Response:** "Regularization is a technique used to prevent overfitting by adding a penalty to the model's complexity. This can be done through methods like L1 or L2 regularization, which add a penalty based on the absolute or squared values of the model parameters, respectively."
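Providing context is exactly what the retrieval chain in this notebook automates: the retrieved reviews are inserted into the `{reviews}` slot of the system prompt. The hypothetical `contextual_prompt` helper below is a minimal manual version, with made-up background text, that puts the context first and the question last.

```python
def contextual_prompt(context, question):
    """Hypothetical helper: supply background first, then ask the question against it."""
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Made-up background information for illustration
context = "Field: machine learning. Audience: students who already know what overfitting is."
p = contextual_prompt(context, "Explain the term 'regularization.'")
print(p)
```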

**6. Multi-Turn Dialogue**

Multi-turn dialogue involves creating a conversation where the model needs to remember and build upon previous interactions. This technique is useful for simulating more natural and coherent conversations.

**Example:**
- **Prompt:**
  User: What is the capital of France?
  Model: The capital of France is Paris.
  User: What is the population of Paris?

- **Expected Response:** "The population of Paris is approximately 2.1 million people."
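Chat models such as the one behind `ChatOpenAI` do not remember earlier calls; the "memory" in a multi-turn dialogue comes from resending the previous turns with every request. The hypothetical `Conversation` class below is a minimal sketch of that bookkeeping, using the `"system"`, `"human"`, and `"ai"` role names that `ChatPromptTemplate.from_messages` accepts.

```python
class Conversation:
    """Hypothetical helper that stores every turn so each new request carries the full history."""

    def __init__(self, system_prompt=None):
        self.messages = []
        if system_prompt:
            self.messages.append(("system", system_prompt))

    def add_user(self, text):
        self.messages.append(("human", text))

    def add_model(self, text):
        self.messages.append(("ai", text))

    def render(self):
        """Flatten the history into a plain transcript, one turn per line."""
        return "\n".join(f"{role}: {text}" for role, text in self.messages)

chat = Conversation()
chat.add_user("What is the capital of France?")
chat.add_model("The capital of France is Paris.")
chat.add_user("What is the population of Paris?")
# The whole history goes out with the next request, so the follow-up question has its context
print(chat.render())
```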

**7. Task-Specific Prompting**

Task-specific prompting involves tailoring the prompt to a specific task, such as summarization, text classification, or any other specialized task. This technique helps the model focus on the particular requirements of the task.

**Example:**
- **Prompt:** "Summarize the following article in one sentence: [Insert article text here]"
- **Expected Response:** "[A concise summary of the article]"
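One way to keep task-specific prompts consistent is to store one template per task. The `TASK_TEMPLATES` dictionary and `task_prompt` helper below are a hypothetical sketch; each template spells out exactly what output its task requires.

```python
# Hypothetical task templates; each one states the exact output the task needs
TASK_TEMPLATES = {
    "summarize": "Summarize the following article in one sentence:\n\n{text}",
    "classify": (
        "Classify the sentiment of the following review as positive, negative, or mixed. "
        "Reply with one word.\n\n{text}"
    ),
}

def task_prompt(task, text):
    """Look up the template for a task and fill in the input text."""
    if task not in TASK_TEMPLATES:
        raise ValueError(f"Unknown task: {task!r}")
    return TASK_TEMPLATES[task].format(text=text)

print(task_prompt("classify", "Great ice cream, but the staff ignored us."))
```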

Creating the chain
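The chain in the next cell threads a single dictionary through three `RunnablePassthrough.assign` steps, each adding one key. The pure-Python sketch below imitates that data flow with hypothetical `assign` and `pipe` helpers, a fake retriever, and a fake LLM, so it runs without Chroma or an API key. It is a simplified model of the behavior, not LangChain's implementation; for instance, the `prompt` value here is a plain string rather than a `ChatPromptValue`.

```python
def assign(**steps):
    """Hypothetical stand-in for RunnablePassthrough.assign: keep the input dict, add one key per step."""
    def run(inputs):
        outputs = dict(inputs)  # copy so the caller's dict is left untouched
        for key, fn in steps.items():
            outputs[key] = fn(inputs)  # each new value is computed from the incoming dict
        return outputs
    return run

def pipe(*stages):
    """Hypothetical stand-in for the `|` operator: feed each stage's output into the next."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run

def fake_retriever(question):
    """Pretend vector search that always returns one made-up review."""
    return ["Alex P. | Easy to find, right by the park."]

def fake_llm(prompt_text):
    """Pretend model that reports how many reviews it was given."""
    return f"Summary: based on {prompt_text.count('|')} review(s)..."

chain = pipe(
    assign(reviews=lambda d: "\n".join(fake_retriever(d["question"]))),
    assign(prompt=lambda d: f"Reviews:\n{d['reviews']}\nQuestion: {d['question']}"),
    assign(llm_output=lambda d: fake_llm(d["prompt"])),
)
result = chain({"question": "What did customers have to say about the location?"})
print(sorted(result))  # ['llm_output', 'prompt', 'question', 'reviews']
print(result["llm_output"])
```

Because every step only adds keys, the original `question` is still available at the end, which is why the real chain can read `output["llm_output"]` while keeping the retrieved reviews for inspection.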

In [13]:
# function to join the retrieved documents into one string, one review per line
def docs_to_string(docs_list):
    return "".join(doc.page_content + "\n" for doc in docs_list)

# creating the chain
# itemgetter("question") pulls the question string out of the input dict so it can be passed to the retriever
# RunnableLambda wraps a plain Python function so it can be used as a step in a chain
# RunnablePassthrough.assign keeps the existing keys of the input dict (like "question") and adds the step's output under a new key
chain = (
    RunnablePassthrough.assign(reviews=itemgetter("question") | retriever | RunnableLambda(docs_to_string))
    | RunnablePassthrough.assign(prompt=prompt)
    | RunnablePassthrough.assign(llm_output=itemgetter("prompt") | llm)
)

output = chain.invoke({"question": question})
print(output["llm_output"].content)

Summary:
Customers had mixed experiences at the ice cream store location, with complaints ranging from poor service and rude behavior to unexpected closures without adequate communication or compensation.

Supporting Reviews:

Review 1
Reviewer: Sherry S.
Summary: Sherry S. expressed frustration with the service at the park road store, stating that the staff ignored customers and did not provide assistance in taking orders.
Quotation: "just stood there for 15 minutes and the 4 people working there didn't make eye contact or help to take order just were carrying on their conversation."

Review 2
Reviewer: Josue L.
Summary: Josue L. mentioned a decline in their favoritism towards the ice cream spot due to rude behavior towards his wife, emphasizing the importance of quality and customer service.
Quotation: "Hey Tutu, you don't get to be rude to my wife and continue to get our business."

Review 3
Reviewer: Sarah M.
Summary: Sarah M. expressed disappointment in finding the ice cream shop 