# LangChain introduction

In [1]:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, NonNegativeInt
from typing import List
from random import sample 

First, let's create a loader and load reviews from tv-reviews.csv into memory

In [2]:
# TODO: load reviews from tv-reviews.csv
from langchain.document_loaders.csv_loader import CSVLoader
data = CSVLoader("./data/tv-reviews.csv").load()

Then, let's initialize our LLM

In [3]:
model_name = "gpt-3.5-turbo"
temperature = 0.0
llm = OpenAI(model_name=model_name, temperature=temperature, max_tokens=500)



Now, let's set up our parser and a prompt template

In [4]:
class ReviewSentiment(BaseModel):
    positives: List[NonNegativeInt] = Field(
        description="index of a positive TV review, starting from 0"
    )
    negatives: List[NonNegativeInt] = Field(
        description="index of a negative TV review, starting from 0"
    )


parser = PydanticOutputParser(pydantic_object=ReviewSentiment)
print(parser.get_format_instructions())

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"positives": {"title": "Positives", "description": "index of a positive TV review, starting from 0", "type": "array", "items": {"type": "integer", "minimum": 0}}, "negatives": {"title": "Negatives", "description": "index of a negative TV review, starting from 0", "type": "array", "items": {"type": "integer", "minimum": 0}}}, "required": ["positives", "negatives"]}
```


In [5]:
# TODO: setup a template with partial and input variables
prompt = PromptTemplate(
    template="{question}\n{format_instructions}\nContext: {context}",
    input_variables=["question", "context"],
    partial_variables={"format_instructions": parser.get_format_instructions()}
)
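
To make the role of `partial_variables` concrete, here is a minimal sketch in plain Python with no LangChain dependency. `MiniTemplate` is a hypothetical stand-in, not LangChain's `PromptTemplate`; it only illustrates the idea that partial variables are bound once at construction time, while input variables are supplied on each `format` call.

```python
class MiniTemplate:
    """Hypothetical stand-in for a prompt template (not LangChain's class)."""

    def __init__(self, template, input_variables, partial_variables=None):
        self.template = template
        self.input_variables = list(input_variables)
        # Values bound once, up front, and reused on every format() call.
        self.partial_variables = dict(partial_variables or {})

    def format(self, **kwargs):
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing input variables: {sorted(missing)}")
        # Pre-bound values and per-call values are merged before filling.
        return self.template.format(**self.partial_variables, **kwargs)


mini_prompt = MiniTemplate(
    template="{question}\n{format_instructions}\nContext: {context}",
    input_variables=["question", "context"],
    partial_variables={"format_instructions": "Answer in JSON."},
)
mini_query = mini_prompt.format(
    question="Classify the reviews.", context="Review 0: great TV"
)
print(mini_query)
```

The format instructions never change between calls, which is why they are a natural fit for a partial variable, while the question and context vary per query.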

Pick 3 sample reviews to classify, because LLMs can only work with a limited context window. In later exercises, we'll see other ways to deal with that limit.

In [6]:
sample(data, k=3)

[Document(page_content='TV Name: Imagix Pro\nReview Title: Easy Setup and Navigation\nReview Rating: 9\nReview Text: Setting up the Imagix Pro was a breeze. The instructions were clear and the TV guided me through the process smoothly. The interface is intuitive and easy to navigate. I love how seamless it is to switch between different apps and inputs. This TV has made my life so much simpler!', metadata={'source': './data/tv-reviews.csv', 'row': 4}),
 Document(page_content="TV Name: VisionMax Ultra\nReview Title: Disappointing Sound\nReview Rating: 5\nReview Text: While the picture quality of the VisionMax Ultra is exceptional, the sound quality falls short. The built-in speakers lack depth and the audio feels hollow. I had to connect external speakers to enjoy a fulfilling audio experience. It's a letdown considering the overall performance of the TV.", metadata={'source': './data/tv-reviews.csv', 'row': 11}),
 Document(page_content="TV Name: VisionMax Ultra\nReview Title: Immersive

In [7]:
# TODO: pick 3 random reviews and save them into reviews_to_classify variable
reviews_to_classify = sample(data, k=3)

## generate textual prompt from the prompt template
question = """
    Review TVs provided in the context. 
    Only use the reviews provided in this context, do not make up new reviews or use any existing information you know about these TVs. 
    If there are no positive or negative reviews, output an empty JSON array. 
"""
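# build the context from the sampled reviews before formatting the prompt
context = "\n".join(review.page_content for review in reviews_to_classify)
query = prompt.format(context=context, question=question)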

In [8]:
question = """
    Review TVs provided in the context. 
    Only use the reviews provided in this context, do not make up new reviews or use any existing information you know about these TVs. 
    If there are no positive or negative reviews, output an empty JSON array. 
"""
context = "\n".join(review.page_content for review in reviews_to_classify)

query = prompt.format(context=context, question=question)
print(query)


    Review TVs provided in the context. 
    Only use the reviews provided in this context, do not make up new reviews or use any existing information you know about these TVs. 
    If there are no positive or negative reviews, output an empty JSON array. 

The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"positives": {"title": "Positives", "description": "index of a positive TV review, starting from 0", "type": "array", "items": {"type": "integer", "minimum": 0}}, "negatives": {"title": "Negatives", "description": "index of a negative TV review, starting from 0", "type": "a

Finally, let's send our query to the LLM and use the parser we set up to parse the output into a Python object

In [9]:
output = llm(query)
print(output)




{
    "positives": [1, 2],
    "negatives": [0]
}


In [10]:
result = parser.parse(output)
result

ReviewSentiment(positives=[1, 2], negatives=[0])
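
As a rough illustration of what parsing the model output involves, the sketch below uses only the standard library. It is not how `PydanticOutputParser` is implemented; `SentimentResult` and `parse_sentiment` are hypothetical names that mirror the `ReviewSentiment` schema and check the same constraints (lists of non-negative integers).

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SentimentResult:
    """Hypothetical plain-Python mirror of the ReviewSentiment schema."""

    positives: List[int]
    negatives: List[int]


def parse_sentiment(text: str) -> SentimentResult:
    # Keep only the outermost {...} span, in case the model adds extra text.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    payload = json.loads(text[start : end + 1])
    for key in ("positives", "negatives"):
        values = payload.get(key)
        if not isinstance(values, list):
            raise ValueError(f"'{key}' must be a list")
        # Reject booleans, non-integers, and negative indices.
        if any(isinstance(v, bool) or not isinstance(v, int) or v < 0 for v in values):
            raise ValueError(f"'{key}' must contain non-negative integers")
    return SentimentResult(payload["positives"], payload["negatives"])


raw_output = '{\n    "positives": [1, 2],\n    "negatives": [0]\n}'
parsed = parse_sentiment(raw_output)
print(parsed)
```

Validating the indices before using them matters here because they are later used to index into `reviews_to_classify`.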

In [11]:
# print the reviews that the LLM classified as positive
print("Positives:\n" + "\n".join([reviews_to_classify[i].page_content for i in result.positives]))

Positives:
TV Name: Imagix Pro
Review Title: Outstanding Value for Money
Review Rating: 9
Review Text: The Imagix Pro is a fantastic value for money. Considering its high-quality performance, impressive features, and sleek design, it offers more bang for the buck compared to other TVs in the market. I am extremely satisfied with my purchase.
TV Name: Imagix Pro
Review Title: Impressive Features
Review Rating: 8
Review Text: The Imagix Pro is packed with impressive features that enhance my viewing experience. The smart functionality allows me to easily stream my favorite shows and movies. The remote control is user-friendly and has convenient shortcuts. The slim design is sleek and fits perfectly in my living room. The only downside is that the sound could be better, but overall, I'm satisfied.


In [12]:
print(
    "Negatives:\n"
    + "\n".join([reviews_to_classify[i].page_content for i in result.negatives])
)

Negatives:
TV Name: VisionMax Ultra
Review Title: Insufficient HDMI Ports
Review Rating: 6
Review Text: One downside of the VisionMax Ultra is the limited number of HDMI ports. With the increasing number of HDMI devices, it's frustrating to constantly switch cables. I wish there were more ports to accommodate all my devices without the need for an HDMI switcher.


# Add semantic search using RAG

In [6]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain import LLMChain
from langchain.chains.question_answering import load_qa_chain

Use a text splitter to split the documents into chunks

In [19]:
model_name = "gpt-3.5-turbo"
temperature = 0.0
llm = OpenAI(model_name=model_name, temperature=temperature, max_tokens=2000)



In [20]:
data = CSVLoader("./data/tv-reviews.csv").load()
text_splitter = CharacterTextSplitter(
    chunk_size=1000, chunk_overlap=0
)

documents = text_splitter.split_documents(data)

In [21]:
len(documents)

20
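
The splitter returned 20 chunks, which would be consistent with every review being shorter than `chunk_size` and therefore left whole. The sketch below shows the general idea of separator-based chunking in plain Python. It is a simplified approximation, not `CharacterTextSplitter` itself: `split_text` is a hypothetical helper, and the `"\n\n"` separator is an assumption.

```python
def split_text(text, chunk_size=1000, separator="\n\n"):
    """Greedy, simplified splitter: merge separator-delimited pieces up to chunk_size."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either it still fits, or a single oversized piece is kept whole.
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


short_review = "TV Name: Example TV\nReview Text: Short review."
print(split_text(short_review))               # one chunk, text is under the limit
print(split_text("aaaa\n\nbbbb\n\ncc", chunk_size=9))
```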

Initialize your embeddings model

In [22]:
underlying_embeddings = OpenAIEmbeddings()

Populate your vector database with the chunks

In [23]:
# reuse the embeddings model initialized above
db = Chroma.from_documents(documents, underlying_embeddings)

In [30]:
query = """
    Based on the reviews in the context, tell me what people liked about the picture quality.
    Make sure you do not paraphrase the reviews, and only use the information provided in the reviews.
    """
# find top 5 semantically similar documents to the query
docs = db.similarity_search(query, 5)

In [31]:
print(len(docs))

5


In [32]:
print(docs[0].page_content)

TV Name: Imagix Pro
Review Title: Amazing Picture Quality
Review Rating: 9
Review Text: I recently purchased the Imagix Pro and I am blown away by its picture quality. The colors are vibrant and the images are crystal clear. It feels like I'm watching movies in a theater! The sound is also impressive, creating a truly immersive experience. Highly recommended!
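
To show what "semantically similar" means mechanically, here is a minimal sketch of top-k retrieval over toy vectors. It assumes cosine similarity as the measure; the vector store may use a different distance metric, and real embedding vectors have far more dimensions. `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 3-dimensional "embeddings" standing in for real embedding vectors.
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.05, 0.0]
nearest = top_k(toy_query, toy_docs, k=2)
print(nearest)
```

The two vectors pointing in nearly the same direction as the query are returned first, regardless of their length.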


Query your LLM with the query and the top 5 documents

In [33]:
prompt = PromptTemplate(
    template="{query}\nContext: {context}", input_variables=["query", "context"]
)

chain = load_qa_chain(llm, prompt=prompt, chain_type="stuff")
print(chain.run(input_documents=docs, query=query))

People liked the vibrant colors, crystal clear images, and unmatched clarity of the picture quality on the Imagix Pro TV. They mentioned that it felt like watching movies in a theater and that every detail was sharp and lifelike, enhancing their overall viewing experience.


Use a RAG chain, which retrieves the relevant documents and queries the LLM in one step

In [35]:
rag = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=db.as_retriever(),
)
print(rag.run(query))

People liked the vibrant colors, crystal clear images, and unmatched clarity of the picture quality on the Imagix Pro TV.
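
Conceptually, the chain above combines two steps: retrieve the most relevant documents, then "stuff" them into a single prompt for the model. The sketch below imitates that flow with stand-ins only. The word-overlap retriever and `echo_llm` are hypothetical substitutes for the vector store and the model, chosen so the example runs offline.

```python
def retrieve(query, documents, k=2):
    """Toy retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def stuff_prompt(query, docs):
    # "Stuffing" concatenates all retrieved documents into one context.
    context = "\n\n".join(docs)
    return f"{query}\nContext: {context}"


def answer(query, documents, llm, k=2):
    return llm(stuff_prompt(query, retrieve(query, documents, k)))


def echo_llm(prompt):
    # Hypothetical stand-in for a model call; it only reports the prompt size.
    return f"received {len(prompt.splitlines())} prompt lines"


toy_reviews = [
    "picture quality is stunning",
    "sound is hollow",
    "picture is sharp and colors pop",
]
rag_prompt = stuff_prompt("picture quality", retrieve("picture quality", toy_reviews))
print(rag_prompt)
print(answer("picture quality", toy_reviews, echo_llm))
```

Stuffing is the simplest strategy and works only while the retrieved documents fit in the context window together.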


# Memory

In [1]:
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import AIMessage, HumanMessage, SystemMessage
from langchain.memory import ConversationSummaryMemory, ConversationBufferMemory, CombinedMemory, ChatMessageHistory
from langchain.chains import ConversationChain
from typing import Any, Dict, Optional, Tuple

import requests

In [2]:
# Code to get the movie plot from Wikipedia
def get_movie_plot(movie_name):
    headers = {"User-Agent": "MoviePlotFetcher/1.0"}

    base_url = "https://en.wikipedia.org/w/api.php"

    def is_movie_page(title):
        params = {
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "categories|revisions",
            "rvprop": "content",
            "cllimit": "max",
        }

        response = requests.get(base_url, headers=headers, params=params)
        data = response.json()

        try:
            page = list(data["query"]["pages"].values())[0]

            # Check categories for Movie indication
            categories = [cat["title"] for cat in page.get("categories", [])]
            for category in categories:
                if "films" in category.lower():
                    return True

            # Check for infobox movie in the page content
            content = page["revisions"][0]["*"]
            if "{{Infobox film" in content:
                return True

        except Exception as e:
            pass

        return False

    def extract_plot_from_text(full_text):
        try:
            # Find the start of the Plot section
            plot_start = full_text.index("== Plot ==") + len("== Plot ==")

            # Find the start of the next section
            next_section_start = full_text.find("==", plot_start)

            # If no next section is found, use the end of the text
            if next_section_start == -1:
                next_section_start = len(full_text)

            # Extract the plot text and strip leading/trailing whitespace
            plot_text = full_text[plot_start:next_section_start].strip()

            # Return the extracted plot
            return plot_text

        except ValueError:
            # Return a message if the Plot section isn't found
            return "Plot section not found in the text."

    def extract_first_paragraph(full_text):
        # Find the first double newline
        end_of_first_paragraph = full_text.find("\n\n")

        # If found, slice the string to get the first paragraph
        if end_of_first_paragraph != -1:
            return full_text[:end_of_first_paragraph].strip()

        # If not found, return the whole text as it might be just one paragraph
        return full_text.strip()

    search_params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": movie_name,
        "utf8": 1,
        "srlimit": 5,  # Top 5 search results
    }

    response = requests.get(base_url, headers=headers, params=search_params)
    data = response.json()

    # Go through top search results to find a movie page
    for search_result in data["query"]["search"]:
        title = search_result["title"]
        if is_movie_page(title):
            # Fetch plot for the movie page
            plot_params = {
                "action": "query",
                "format": "json",
                "titles": title,
                "prop": "extracts",
                "explaintext": True,
            }

            plot_response = requests.get(base_url, headers=headers, params=plot_params)
            plot_data = plot_response.json()

            try:
                page = list(plot_data["query"]["pages"].values())[0]
                full_text = page.get("extract", "No text...")
                return f"""Overview:\n{extract_first_paragraph(full_text)}\nPlot:\n{extract_plot_from_text(full_text)}""".strip()
            except Exception:
                return "Error fetching plot."

    return "Movie not found."
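
The plot extraction relies on plain string slicing between section markers, which can be tried offline. The sketch below re-implements that step as a standalone function on a made-up text; `extract_section` and the sample text are hypothetical and are not fetched from Wikipedia.

```python
def extract_section(full_text, heading="== Plot =="):
    """Return the text between `heading` and the next '==' marker, or None."""
    start = full_text.find(heading)
    if start == -1:
        return None
    start += len(heading)
    end = full_text.find("==", start)
    if end == -1:
        # No following section: take everything to the end of the text.
        end = len(full_text)
    return full_text[start:end].strip()


sample_extract = (
    "Example Film is a hypothetical movie.\n\n"
    "== Plot ==\nA hero leaves home and returns changed.\n\n"
    "== Cast ==\nNobody in particular."
)
plot = extract_section(sample_extract)
print(plot)
```

Note that searching for `"=="` also stops at subsection markers such as `=== Act one ===`, so a plot that is divided into subsections would be cut at the first one.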

In [3]:
model_name = "gpt-3.5-turbo"
temperature = 0.0
llm = OpenAI(model_name=model_name, temperature=temperature, max_tokens=2000)



Now, let's set up some personal Q/A about your movie preferences. Feel free to pick whichever questions you think will allow
the LLM to predict the movies you'll like

In [4]:
# update these questions as you think will be the most helpful for your AI recommender
personal_questions = [
    "Which movie genre you like the most?",
    "What is your favorite color?",
    "What is your favorite movie?",
    "Pick one - dogs, cats or hamsters?",
    "What is your favorite food?",
    "What is your favorite drink?",
]

# personal_answers = [ ]

# for question in personal_questions:
#    answer = input(question)
#    personal_answers.append(answer)

# list of your personal answers to the questions above
personal_answers = [
    "Fantasy",
    "blue",
    "The lord of the rings",
    "dogs",
    "pasta",
    "coffe",
]

In [5]:
# list of recent movies that you'd like AI to consider when recommending a movie
movies = ["Barbie", "Oppenheimer", "The Notebook", "Dumb Money"]

Now, let's set up a chat history between you and the AI, where we provide your answers to the questions the AI "asked"

In [9]:
history = ChatMessageHistory()
history.add_user_message(
    f"""You are AI that will recommend user a movie based on their answers 
    to personal questions. 
    Ask user {len(personal_questions)} questions"""
)

# add questions and answers to the history
for question, answer in zip(personal_questions, personal_answers):
    history.add_ai_message(question)
    history.add_user_message(answer)

history.add_ai_message(
    """Now tell me a plot summary of a movie you're considering watching, 
    and specify how you want me to respond to you with the movie rating"""
)

In [10]:
print(history)

Human: You are AI that will recommend user a movie based on their answers 
    to personal questions. 
    Ask user 6 questions
AI: Which movie genre you like the most?
Human: Fantasy
AI: What is your favorite color?
Human: blue
AI: What is your favorite movie?
Human: The lord of the rings
AI: Pick one - dogs, cats or hamsters?
Human: dogs
AI: What is your favorite food?
Human: pasta
AI: What is your favorite drink?
Human: coffe
AI: Now tell me a plot summary of a movie you're considering watching, 
    and specify how you want me to respond to you with the movie rating


Now we want to load movie plots from Wikipedia, pass them to the LLM, and see how it would rate each movie for us based on our personal Q/A.
Holding all movie plots and their recommendations within the conversation would eventually put us over the maximum token limit, so let's create a ConversationSummaryMemory
that holds a summary of our conversation and the AI's recommendations

In [11]:

max_rating = 100
summary_memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="recommendation_summary",
    input_key="input",
    buffer=f"The human answered {len(personal_questions)} personal questions. Use them to rate, from 1 to {max_rating}, how much they like a movie they describe to you.",
    return_messages=True,
)
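
The idea behind a summary memory can be sketched without an LLM: after each exchange, the stored summary is replaced by a condensed version of the old summary plus the new lines. In the sketch below, `RollingSummary` is a hypothetical class and `truncating_summarizer` is a toy stand-in for the model-generated summary; neither reflects LangChain's internals.

```python
class RollingSummary:
    """Hypothetical sketch of a summary memory; `summarize` stands in for an LLM call."""

    def __init__(self, summarize, initial=""):
        self.summarize = summarize
        self.buffer = initial

    def save_context(self, human_input, ai_output):
        new_lines = f"Human: {human_input}\nAI: {ai_output}"
        # Only the condensed summary is kept, never the full transcript.
        self.buffer = self.summarize(self.buffer, new_lines)


def truncating_summarizer(previous, new_lines, limit=80):
    # Toy summarizer: keep the previous summary plus the AI's line, capped in length.
    ai_line = new_lines.splitlines()[-1]
    return (previous + " " + ai_line).strip()[-limit:]


summary = RollingSummary(truncating_summarizer, initial="The human answered 6 questions.")
summary.save_context("<long plot of movie A>", "Rated movie A 85.")
summary.save_context("<long plot of movie B>", "Rated movie B 40.")
print(summary.buffer)
```

Because the buffer is rewritten rather than appended to, its size stays bounded no matter how many movies are rated.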

Create a memory that holds a summary of the recommendations.
It keeps track of the ongoing conversation state.

We'll drop all human messages other than the initial ones we initialize the history with, because the plots are too long.

We do this by overriding the save_context function so that it ignores the inputs (the human messages).

In [12]:
# you could choose to store some of the q/a in memory as well, in addition to original questions
# it'll keep track of llm responses
class MementoBufferMemory(ConversationBufferMemory):
    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        input_str, output_str = self._get_input_output(inputs, outputs)
        self.chat_memory.add_ai_message(output_str)


conversational_memory = MementoBufferMemory(
    chat_memory=history, memory_key="questions_and_answers", input_key="input"
)
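
The effect of overriding `save_context` can be seen in isolation with a small stand-in. `OutputOnlyBuffer` below is a hypothetical plain-Python class, not a LangChain memory; it keeps the seed messages, records each AI reply, and silently drops the human input.

```python
class OutputOnlyBuffer:
    """Hypothetical sketch: a chat buffer that records AI replies but drops human inputs."""

    def __init__(self, seed_messages=None):
        # Seed messages (for example the initial Q/A) are kept as-is.
        self.messages = list(seed_messages or [])

    def save_context(self, human_input, ai_output):
        # The human input (a long plot) is deliberately not stored.
        self.messages.append(("AI", ai_output))

    def load(self):
        return "\n".join(f"{role}: {text}" for role, text in self.messages)


buffer_memory = OutputOnlyBuffer([("AI", "Favorite genre?"), ("Human", "Fantasy")])
buffer_memory.save_context("<very long plot>", "RATING FOR MOVIE X is 70")
transcript = buffer_memory.load()
print(transcript)
```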



We'll set up a combined memory that forwards each message in our ongoing conversation to both the conversational memory and the summary memory.

In [13]:
# Combined
memory = CombinedMemory(memories=[conversational_memory, summary_memory])

Now, let's create a PromptTemplate that holds a continuously updating summary of our conversation, our personal Q/A, and a placeholder for the movie plot for the AI to rate.
Think about how you can pass your questions and answers into the template; there are many different ways to do it

In [14]:
RECOMMENDER_TEMPLATE = """
The following is a friendly conversation between a human and an AI Movie Recommender. 
The AI is follows human instructions and provides movie ratings for a human 
based on the movie plot. 

Summary of Recommendations:
{recommendation_summary}
Personal Questions and Answers:
{questions_and_answers}
Human: {input}
AI:
"""
PROMPT = PromptTemplate(
    input_variables=["recommendation_summary", "input", "questions_and_answers"],
    template=RECOMMENDER_TEMPLATE
)
# create a recommendation conversation chain that will let us ask AI for recommendations on all movies
recommender = ConversationChain(llm=llm, verbose=True, memory=memory, prompt=PROMPT)

Let's go through our list of movies, fetch their plots, and run our recommendation chain on one movie at a time so we don't overload the context window

In [17]:
max_rating = 100
for movie in movies:
    print("Movie: " + movie)
    movie_plot = get_movie_plot(movie)

    plot_rating_instructions = f"""
         =====================================
        === START MOVIE PLOT SUMMARY FOR {movie} ===
        {movie_plot}
        === END MOVIE PLOT SUMMARY ===
        =====================================
        
        RATING INSTRUCTIONS THAT MUST BE STRICTLY FOLLOWED:
        AI will provide a highly personalized rating based only on the movie plot summary human provided 
        and human answers to questions included with the context. 
        AI should be very sensible to human personal preferences captured in the answers to personal questions, 
        and should not be influenced by anything else.
        AI will also build a persona for human based on human answers to questions, and use this persona to rate the movie.
        OUTPUT FORMAT:
        First, include that persona you came up with in the explanation for the rating. Describe the persona in a few sentences.
        Explain how human preferences captured in the answers to personal questions influenced creation of this persona.
        In addition, consider other ratings for this human that you might have as they might give you more information about human's preferences.
        Your goal is to provide a rating that is as close as possible to the rating human would give to this movie.
        Remember that human has very limited time and wants to see something they will like, so your rating should be as accurate as possible.
        Rating will range from 1 to {max_rating}, with {max_rating} meaning human will love it, and 1 meaning human will hate it. 
        You will include a logical explanation for your rating based on human persona you've build and human responses to questions.
        YOUR REVIEW MUST END WITH TEXT: "RATING FOR MOVIE {movie} is " FOLLOWED BY THE RATING.
        FOLLOW THE INSTRUCTIONS STRICTLY, OTHERWISE HUMAN WILL NOT BE ABLE TO UNDERSTAND YOUR REVIEW.
    """
    # TODO: run the recommendation chain to get a rating for the movie that will be summarized in the conversation summary
    prediction = recommender.predict(input=plot_rating_instructions)
    print(prediction)

Movie: Barbie

> Entering new ConversationChain chain...
Prompt after formatting:

The following is a friendly conversation between a human and an AI Movie Recommender. 
The AI is follows human instructions and provides movie ratings for a human 
based on the movie plot. 

Summary of Recommendations:
[SystemMessage(content='The human answered 6 personal questions and provided a detailed plot summary of the movie "Barbie." Based on the human\'s preferences for fantasy movies, the color blue, dogs, pasta, and coffee, as well as their favorite movie being "The Lord of the Rings," the AI created a persona for the human. The persona enjoys imaginative and adventurous stories, vibrant colors, dogs, comforting meals, and coffee. Considering this persona, the AI rated the movie "Barbie" a 85 out of 100, as it aligns well with the human\'s preferences for fantasy, adventure, and thought-provoking themes. Based on the plot summary of "Oppenheimer," an epic biographical thriller film, the
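
Because the instructions ask every review to end with a fixed phrase, the rating can also be pulled out programmatically. The sketch below assumes the model followed that format; `extract_rating` is a hypothetical helper that is not part of the notebook, and the movie names and ratings are made up for illustration.

```python
import re
from typing import Optional, Tuple

RATING_PATTERN = re.compile(r"RATING FOR MOVIE (.+?) is (\d+)")


def extract_rating(review: str) -> Optional[Tuple[str, int]]:
    """Return (movie, rating) from the last matching phrase, or None."""
    matches = RATING_PATTERN.findall(review)
    if not matches:
        return None
    movie, rating = matches[-1]
    return movie, int(rating)


example_reviews = [
    "The persona enjoys fantasy. RATING FOR MOVIE Movie A is 85",
    "Less aligned with the persona. RATING FOR MOVIE Movie B is 60",
    "The model ignored the requested format.",
]
# Skip reviews where no rating could be found.
ratings = dict(filter(None, (extract_rating(r) for r in example_reviews)))
best_movie = max(ratings, key=ratings.get)
print(ratings, best_movie)
```

Comparing the parsed numbers in code gives a way to cross-check the model's final pick against its own ratings.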

We've initialized our chain to run in verbose mode, so we see the full text that gets sent to the LLM.
Note how the summary keeps updating after each movie is rated.
Finally, once the AI has rated all the movies, let's ask for the final recommendation

In [18]:
final_recommendation = """Now that AI has rated all the movies, AI will recommend human the one that human will like the most. 
                            AI will respond with movie recommendation, and short explanation for why human will like it over all other movies. 
                            AI will not include any ratings in your explanation, only the reasons why human will like it the most.
                            However, the movie you will pick must be one of the movies you rated the highest.
                            For example, if you rated one movie 65, and the other 60, you will recommend the movie with rating 65 because rating 65 
                            is greater than rating 60."""

# run recommendation once more to get the final movie recommendation
prediction = recommender.predict(input=final_recommendation)
print(prediction)

> Entering new ConversationChain chain...
Prompt after formatting:

The following is a friendly conversation between a human and an AI Movie Recommender. 
The AI is follows human instructions and provides movie ratings for a human 
based on the movie plot. 

Summary of Recommendations:
[SystemMessage(content='The human answered 6 personal questions and provided a detailed plot summary of the movie "Barbie." Based on the human\'s preferences for fantasy movies, the color blue, dogs, pasta, and coffee, as well as their favorite movie being "The Lord of the Rings," the AI created a persona for the human. The persona enjoys imaginative and adventurous stories, vibrant colors, dogs, comforting meals, and coffee. Considering this persona, the AI rated the movie "Barbie" a 85 out of 100, as it aligns well with the human\'s preferences for fantasy, adventure, and thought-provoking themes. Based on the plot summary of "Oppenheimer," an epic biographical thriller film, the