# **Data Semantics Project**<br/>
**Master's Degree in Data Science (A.Y. 2023/2024)**<br/>
**University of Milano - Bicocca**<br/>

By: Muluken Bogale Megersa - 919654

In [8]:
!pip install langchain langchain-community langchainhub neo4j python-dotenv youtube-search scikit-learn pandas ollama

Collecting ollama
  Downloading ollama-0.2.1-py3-none-any.whl (9.7 kB)
Installing collected packages: ollama
Successfully installed ollama-0.2.1

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m24.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [71]:
import pandas as pd
from neo4j import GraphDatabase, RoutingControl
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.schema import StrOutputParser
from langchain.output_parsers.json import SimpleJsonOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain import hub
from langchain.tools import Tool
from langchain.agents import AgentExecutor, create_react_agent
from pprint import pprint

## Load dataset

The movie dataset contains 9,125 movies, 15,000 actors and over 100,000 user ratings.

### To load the dataset from the CSV file into the Neo4j database, run the following Cypher statement in Neo4j

```cypher
LOAD CSV WITH HEADERS FROM 'https://raw.githubusercontent.com/MulukenMegersa/data-Semantics-project/main/Movie.csv' AS row
MERGE (m:Movie {id: toInteger(row.movieId)})
SET
    m.movieId = row.movieId,
    m.title = row.title,
    m.budget = toFloat(row.budget),
    m.countries = split(row.countries, '|'),
    m.movie_imdbId = toInteger(row.movie_imdbId),
    m.imdbRating = toFloat(row.imdbRating),
    m.imdbVotes = toInteger(row.imdbVotes),
    m.languages = split(row.languages, '|'),
    m.plot = row.plot,
    m.movie_poster = row.movie_poster,
    m.released = row.released,
    m.revenue = toFloat(row.revenue),
    m.runtime = toInteger(row.runtime),
    m.movie_tmdbId = toInteger(row.movie_tmdbId),
    m.movie_url = row.movie_url,
    m.year = toInteger(row.year),
    m.genres = split(row.genres, '|')
RETURN
    m.title AS title,
    m.movie_imdbId AS imdbId,
    m.languages AS languages,
    m.genres AS genres
LIMIT 5
```


### Fetch sample data and count

In [35]:
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "adminadmin")

def fetch_movies(driver):
    with driver.session() as session:
        # Fetching details of 10 movies
        movies_result = session.run("MATCH (m:Movie) "
                                    "RETURN m.title AS title, m.plot AS plot, m.imdbRating AS imdbRating, m.genres AS genres "
                                    "LIMIT 10")
        movies = [{"title": record["title"], "plot": record["plot"], "imdbRating":record["imdbRating"],"genres":record["genres"]} for record in movies_result]
        
        count_result = session.run("MATCH (m:Movie) RETURN count(m) AS total")
        total_count = count_result.single()["total"]
        
        return movies, total_count

if __name__ == "__main__":
    driver = GraphDatabase.driver(URI, auth=AUTH)
    try:
        movies, total_movies_count = fetch_movies(driver)
        print(f"Total Movies: {total_movies_count}")
        for movie in movies:
            print(f'Title: {movie["title"]},IMDB Rating: {movie["imdbRating"]}, Plot: {movie["plot"]} ')
    except Exception as e:
        print(f"An error occurred: {e}")

Total Movies: 9125
Title: Toy Story,IMDB Rating: 8.3, Plot: A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room. 
Title: Jumanji,IMDB Rating: 6.9, Plot: When two kids find and play a magical board game, they release a man trapped for decades in it and a host of dangers that can only be stopped by finishing the game. 
Title: Grumpier Old Men,IMDB Rating: 6.6, Plot: John and Max resolve to save their beloved bait shop from turning into an Italian restaurant, just as its new female owner catches Max's attention. 
Title: Waiting to Exhale,IMDB Rating: 5.6, Plot: Based on Terry McMillan's novel, this film follows four very different African-American women and their relationships with the male gender. 
Title: Father of the Bride Part II,IMDB Rating: 5.9, Plot: In this sequel, George Banks deals not only with the pregnancy of his daughter, but also with the unexpected pregnancy of his wife. 
Title: Heat,IMDB Rating: 8.2, Plot:

## Generating the Embeddings and Loading Them into the Database



Embeddings are high-dimensional vector representations of text data. In the context of LLMs such as GPT or BERT, embeddings transform words, phrases, or entire sentences from sparse, discrete items into continuous, dense vectors.
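To make "dense vector" concrete, the pure-Python sketch below compares toy vectors with cosine similarity, the measure used for the vector index later in this notebook. The three-dimensional vectors are made-up values for illustration only; real phi3 embeddings are far longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only)
toy_cowboy = [0.9, 0.1, 0.3]
toy_space_ranger = [0.8, 0.2, 0.4]
toy_bait_shop = [-0.5, 0.9, 0.0]

print(round(cosine_similarity(toy_cowboy, toy_space_ranger), 3))
print(round(cosine_similarity(toy_cowboy, toy_bait_shop), 3))
```

A value close to 1 means the two vectors point in almost the same direction, which is the sense in which texts with similar meaning end up "close" to each other.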


### Create embeddings for the movie plots

In [None]:
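import json
import pandas as pd
from ollama import embeddings

movies_data = pd.read_csv("Movie.csv")

BATCH_SIZE = 1000  # write a checkpoint file every 1000 rows
phi3_embeddings_result = []

def save_batch(batch, end_index):
    # File names record the row count reached, e.g. phi3_embeddings_result_1000.csv
    pd.DataFrame(batch).to_csv(f"phi3_embeddings_result_{end_index}.csv", index=False)

for index, row in movies_data.iterrows():
    plot = row["plot"]
    # Skip movies without a plot: there is nothing to embed
    if not pd.isnull(plot):
        response = embeddings(model="phi3", prompt=plot)
        # Store the vector as a JSON array so apoc.convert.fromJsonList can parse it
        phi3_embeddings_result.append(
            {"movieId": row["movieId"], "embedding": json.dumps(response["embedding"])}
        )
    # Save once a full batch of rows has been processed
    if (index + 1) % BATCH_SIZE == 0 and phi3_embeddings_result:
        save_batch(phi3_embeddings_result, index + 1)
        phi3_embeddings_result = []

# Save the remaining rows that did not fill a complete batch
if phi3_embeddings_result:
    save_batch(phi3_embeddings_result, len(movies_data))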

### Loading Embeddings
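The embeddings are stored in a `plotEmbedding` property on each `(:Movie)` node. The statement below loads a single batch file; run it once for each file produced by the previous cell.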

```cypher
LOAD CSV WITH HEADERS
FROM 'https://raw.githubusercontent.com/MulukenMegersa/data-Semantics-project/main/phi3_embeddings_result_1000.csv'
AS row
MATCH (m:Movie {movieId: row.movieId})
CALL db.create.setNodeVectorProperty(m, 'plotEmbedding', apoc.convert.fromJsonList(row.embedding))
RETURN count(*)
```
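`apoc.convert.fromJsonList` expects the `embedding` column to hold a JSON array. A quick way to sanity-check a batch file before loading it is to parse that column in Python the same way. The sketch below uses a small in-memory CSV with made-up values in place of the real `phi3_embeddings_result_*.csv` files, and a hypothetical helper `check_embeddings`; for the real files `expected_dim` would be 3072.

```python
import csv
import io
import json

# Stand-in for one batch file: movieId plus a JSON-encoded embedding (toy values)
sample_csv = io.StringIO(
    'movieId,embedding\n'
    '1,"[-0.24, 1.85, 0.78]"\n'
    '2,"[0.10, -0.33, 0.51]"\n'
)

def check_embeddings(file_obj, expected_dim):
    """Parse every embedding as a JSON list and confirm it has the expected length."""
    parsed = {}
    for row in csv.DictReader(file_obj):
        vector = json.loads(row["embedding"])
        if len(vector) != expected_dim:
            raise ValueError(
                f"movie {row['movieId']}: got {len(vector)} dims, expected {expected_dim}"
            )
        parsed[row["movieId"]] = vector
    return parsed

vectors = check_embeddings(sample_csv, expected_dim=3)
print(vectors["1"])
```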

In [42]:
def fetch_movies_with_embeddings(driver):
    with driver.session() as session:
        movies_result = session.run("MATCH (m:Movie) "
                                    "RETURN m.title AS title, m.plotEmbedding AS plotEmbedding, m.plot AS plot "
                                    "LIMIT 5")
        movies = [{"title": record["title"], "plot": record["plot"], "plotEmbedding":record["plotEmbedding"]} for record in movies_result]
        
        return movies
    
movies = fetch_movies_with_embeddings(driver)
for movie in movies:
    print(f'Title: {movie["title"]},Plot: {movie["plot"]},plotEmbedding: {movie["plotEmbedding"]}')

Title: Toy Story,Plot: A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.,plotEmbedding: [-0.2435668706893921, 1.846290111541748, 0.7841094136238098, 0.47806277871131897, 0.8593905568122864, 0.5861293077468872, -0.8835827708244324, -0.10235434025526047, -0.9750546813011169, 0.2578018009662628, -0.43450838327407837, -0.6401787996292114, -1.6327056884765625, 1.687545895576477, -0.7954378724098206, 0.08971946686506271, -1.9314931631088257, -0.34879347681999207, -0.49336785078048706, 0.1038694754242897, 0.34174203872680664, 0.2648836374282837, -1.039591670036316, -0.5829447507858276, -0.5959696769714355, 0.2329602986574173, 0.5659186840057373, 0.20441265404224396, 0.38124603033065796, 0.9228402376174927, 0.3534991145133972, 0.1426774263381958, -0.6168585419654846, -0.1710306853055954, 0.08335545659065247, -1.0592646598815918, -0.19807010889053345, -0.39057841897010803, 0.14432694017887115, -0.565385639667511, -0.455554008

### Creating the Vector Index

To search across these embeddings, you need a vector index.

Create it with the `CREATE VECTOR INDEX` Cypher statement:

```cypher
CREATE VECTOR INDEX moviePlots IF NOT EXISTS
FOR (m:Movie)
ON m.plotEmbedding
OPTIONS {indexConfig: {
 `vector.dimensions`: 3072,
 `vector.similarity_function`: 'cosine'
}}
```

- `vector.dimensions`: the length of each embedding vector; the phi3 embeddings produced by Ollama have 3072 dimensions.
- `vector.similarity_function`: the similarity function used to compare vectors in this index, either `euclidean` or `cosine`.
- Cosine generally performs best for text embeddings, but you may want to experiment with the other function.
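The `score` returned by a Neo4j vector index is normalised to the range [0, 1] rather than being the raw similarity. As we understand the Neo4j documentation, cosine scores are computed as (1 + cos θ) / 2 and Euclidean scores as 1 / (1 + d²); treat these formulas as an assumption to verify against the docs for your Neo4j version. The sketch below reproduces both in plain Python so that the scores returned by the next query can be interpreted.

```python
import math

def neo4j_cosine_score(a, b):
    """Cosine similarity rescaled from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    cos = dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    return (1 + cos) / 2

def neo4j_euclidean_score(a, b):
    """Squared Euclidean distance mapped to (0, 1]; identical vectors score 1."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 / (1 + squared_distance)

print(neo4j_cosine_score([1.0, 0.0], [1.0, 0.0]))   # same direction -> 1.0
print(neo4j_cosine_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal -> 0.5
print(neo4j_cosine_score([1.0, 0.0], [-1.0, 0.0]))  # opposite -> 0.0
print(neo4j_euclidean_score([1.0, 2.0], [1.0, 2.0]))  # identical -> 1.0
```

If the cosine formula holds, a score of 0.94 corresponds to a raw cosine of about 0.88, so neighbours are somewhat less close than the raw numbers suggest.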

### Querying Vector Indexes

In [45]:
def find_similar_movies(title):
    with driver.session() as session:
        result = session.run("""
                MATCH (m:Movie {title: $title})
                CALL db.index.vector.queryNodes('moviePlots', 6, m.plotEmbedding)
                YIELD node, score
                RETURN node.title AS title, node.plot AS plot, score
                """, title=title)
        return [{"title": record["title"], "plot": record["plot"], "score": record["score"]} for record in result]
    
title = "Toy Story"
similar_movies = find_similar_movies(title)
for movie in similar_movies:
    print(f'Title: {movie["title"]}, Score: {movie["score"]}, Plot: {movie["plot"]}')

Title: Toy Story, Score: 0.9999998807907104, Plot: A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.
Title: Oliver & Company, Score: 0.9399223327636719, Plot: A lost and alone kitten joins a gang of dogs engaged in petty larceny in New York.
Title: Charlotte's Web, Score: 0.9387655854225159, Plot: A gentle and wise grey spider with a flair for promotion pledges to save a young pig from slaughter for dinner food.
Title: Failure to Launch, Score: 0.9384971261024475, Plot: A thirtysomething slacker suspects his parents of setting him up with his dream girl so he'll finally vacate their home.
Title: Good Girl, The, Score: 0.9368737936019897, Plot: A discount store clerk strikes up an affair with a stock boy who considers himself the incarnation of Holden Caulfield.
Title: Nothing But Trouble, Score: 0.9353704452514648, Plot: A businessman finds he and his friends the prisoners of a sadistic judge and his equally odd fami
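Conceptually, `db.index.vector.queryNodes('moviePlots', 6, m.plotEmbedding)` returns the 6 indexed nodes whose embeddings score highest against the query vector, which is why *Toy Story* comes back first with a score of about 1.0. Neo4j answers this with an approximate nearest-neighbour index; the brute-force sketch below computes the exact answer over a few toy vectors (titles and values are made up) and also drops the query item from its own results.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_similar(query_title, vectors, k):
    """Exact top-k search by cosine similarity, excluding the query item itself."""
    query = vectors[query_title]
    scored = (
        (cosine(query, vec), title)
        for title, vec in vectors.items()
        if title != query_title
    )
    # nlargest keeps the k highest scores without sorting everything
    return [(title, round(score, 3)) for score, title in heapq.nlargest(k, scored)]

toy_vectors = {
    "Toy Story": [0.9, 0.1, 0.3],
    "Toy Story 2": [0.85, 0.15, 0.35],
    "Heat": [-0.2, 0.9, 0.1],
    "Jumanji": [0.6, 0.3, 0.5],
}

print(top_k_similar("Toy Story", toy_vectors, k=2))
```

In Cypher, the same exclusion can be obtained by requesting k + 1 results and filtering with `YIELD node, score WHERE node <> m`.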

## Create a LangChain application

- LangChain is an open-source framework designed to simplify the development of applications powered by large language models (LLMs). 

- Its key strength is that it facilitates the creation of context-aware, reasoning applications that combine LLMs with company data and APIs.

In [54]:
# Creating a template for a movie recommender using cockney rhyming slang
template = PromptTemplate(template="""
You're a cheeky movie vendor in the heart of London's East End.
Your task is to help your customers pick a flick that suits their fancy, using a bit of the old cockney rhyming slang.

Here's a bit of a chinwag about the following movie: {movie}
""", input_variables=["movie"])


llm = Ollama(
    model="phi3",
    temperature=0
)  # assumes Ollama is installed and the phi3 model has been pulled with `ollama pull phi3`

response = llm.invoke(template.format(movie="Toy Story"))
pprint(response)

(" Well, matey, if you're lookin' for a yarn to spin 'round your ol' noggin, "
 'let me tell ya \'bout this little gem called "Toy Story." Now listen up!\n'
 '\n'
 "You see, it's all about these playthings – or as we say in the East End, "
 '"toys" – who come alive when no human bein\'s lookin\'. It\'s a proper hoot '
 'and a half, I tell ya. The main blokes are Buzz Lightyear, an astronaut toy '
 'with dreams of being a real space cadet, and his pal Woody, a cowboy doll '
 "that's all about the good ol' days on the ranch.\n"
 '\n'
 "Now, these two get into a bit o' trouble when they find themselves in a "
 'sticky wicket – or as we say here, "a right old pickle." They go '
 "head-to-head for Buzz's affection and end up causing chaos among their toy "
 "pals. But fear not! In the end, 'em all learn that friendship is what truly "
 "matters, no matter how many years pass by or if you're made of plastic or "
 'wood.\n'
 '\n'
 "So, if you fancy a good laugh with some heartfelt moments and

- Prompt templates allow you to create reusable instructions or questions. You can use them to create more complex or structured input for the LLM.

### Chains

Chains allow you to combine language models with prompts, other data sources and third-party APIs.
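The `|` operator in the next cell builds a pipeline in which each step's output becomes the next step's input. The sketch below is a toy re-implementation of that idea in plain Python, using a hypothetical `Step` class and stand-in functions instead of a real prompt, model and parser; it is not how LangChain implements its runnables internally.

```python
class Step:
    """Minimal composable step: `a | b` runs a, then feeds its result to b."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right, mirroring `template | llm | parser`
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)

# Stand-ins for the three stages (hypothetical, no model is called)
fake_template = Step(lambda inputs: f"Tell me about {inputs['movie']}")
fake_llm = Step(lambda prompt: f"  LLM says: {prompt}!  ")
fake_parser = Step(lambda text: text.strip())

chain = fake_template | fake_llm | fake_parser
print(chain.invoke({"movie": "Toy Story"}))
```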

In [55]:
llm_chain = template | llm | StrOutputParser()

response = llm_chain.invoke({"movie": "Toy Story"})

pprint(response)

(" Well, matey, if you're lookin' for a yarn to spin 'round your ol' noggin, "
 'let me tell ya \'bout this little gem called "Toy Story." Now listen up!\n'
 '\n'
 "You see, it's all about these playthings – or as we say in the East End, "
 '"toys" – who come alive when no human bein\'s lookin\'. It\'s a proper hoot '
 'and a half, I tell ya. The main blokes are Buzz Lightyear, an astronaut toy '
 'with dreams of being a real space cadet, and his pal Woody, a cowboy doll '
 "that's all about the good ol' days on the ranch.\n"
 '\n'
 "Now, these two get into a bit o' trouble when they find themselves in a "
 'sticky wicket – or as we say here, "a right old pickle." They go '
 "head-to-head for Buzz's affection and end up causing chaos among their toy "
 "pals. But fear not! In the end, 'em all learn that friendship is what truly "
 "matters, no matter how many years pass by or if you're made of plastic or "
 'wood.\n'
 '\n'
 "So, if you fancy a good laugh with some heartfelt moments and

In [56]:
template = PromptTemplate.from_template("""
You're a cheeky movie vendor in the heart of London's East End.
Your task is to help your customers pick a flick that suits their fancy, using a bit of the old cockney rhyming slang.


Output JSON as {{"description": "your response here"}}

Here's a bit of a chinwag about the following movie: {movie}
""")

llm_chain = template | llm | StrOutputParser()

response = llm_chain.invoke({"movie": "Toy Story"})

pprint(response)

(' {\n'
 '  "description": "Right, mate! If ya want to watch something that\'ll make '
 "yer heart go 'way-ya', then you gotta check out this classic. It's called "
 "'Toy Story'. Now listen up, it's all about these playthings come alive when "
 "no human bein's lookin'. They've got a bit of an adventure on their hands, "
 "with one toy tryin' to prove he ain't just a bunch o' bollocks. It's a "
 "proper ripper! So if you fancy some 'bobbins', head down to the local cinema "
 'and catch it on the big screen."\n'
 '}')
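Two details of the previous cell are worth spelling out. `PromptTemplate.from_template` uses Python-style `{}` placeholders by default, so the literal braces of the JSON example are doubled (`{{ }}`) to stop them being read as variables; Python's own `str.format` behaves the same way. And because the model's reply is plain text, it still has to be parsed before it becomes a dictionary. The sketch below shows both steps on a shortened, made-up reply; `SimpleJsonOutputParser`, imported at the top of the notebook, could likely be appended to the chain to do the parsing instead of `json.loads`.

```python
import json

template_text = 'Output JSON as {{"description": "your response here"}}\nMovie: {movie}'

# Doubled braces become literal braces; single braces are substituted
formatted_prompt = template_text.format(movie="Toy Story")
print(formatted_prompt)

# A shortened stand-in for the model's reply (note the leading space and newlines)
raw_reply = ' {\n  "description": "Right, mate! It\'s called \'Toy Story\'."\n}'

parsed = json.loads(raw_reply)  # surrounding whitespace is ignored by json.loads
print(parsed["description"])
```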


### Create a chat model

- Chat models are designed to have conversations - they accept a list of messages and return a conversational response.

Chat models typically support different types of messages:

1. System - System messages instruct the LLM on how to act on human messages

2. Human - Human messages are messages sent from the user

3. AI - AI messages are the responses generated by the model
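Under the hood, a list of messages is typically sent to the model as role/content pairs. The sketch below converts the (type, text) pairs used in this notebook into the `system` / `user` / `assistant` roles that chat APIs such as Ollama's `/api/chat` endpoint expect; the mapping table and the `to_chat_payload` helper are illustrative assumptions, not LangChain code.

```python
# Illustrative mapping from the message types above to chat-API roles
ROLE_BY_TYPE = {"system": "system", "human": "user", "ai": "assistant"}

def to_chat_payload(messages):
    """Turn (type, text) pairs into the role/content dicts a chat endpoint consumes."""
    payload = []
    for msg_type, text in messages:
        if msg_type not in ROLE_BY_TYPE:
            raise ValueError(f"unknown message type: {msg_type}")
        payload.append({"role": ROLE_BY_TYPE[msg_type], "content": text})
    return payload

conversation = [
    ("system", "You are a surfer dude. Respond using surfer slang."),
    ("human", "What is the weather like?"),
    ("ai", "Dude, it's totally gnarly out there!"),
]
for message in to_chat_payload(conversation):
    print(message)
```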



In [57]:
chat_llm = ChatOllama(
    model="phi3",
    temperature=0
)

instructions = SystemMessage(content="""
You are a surfer dude, having a conversation about the surf conditions on the beach.
Respond using surfer slang.
""")

question = HumanMessage(content="What is the weather like?")

response = chat_llm.invoke([
    instructions,
    question
])

pprint(response)

AIMessage(content=" Dude, it's totally gnarly out there! The sun's shining bright and the waves are looking pretty sick today. Perfect swell for some epic barrel rides. You ready to catch some serious waves? Let's hit the beach and ride this wave of awesomeness together!", response_metadata={'model': 'phi3', 'created_at': '2024-07-15T13:45:08.206578Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 4511517541, 'load_duration': 13395625, 'prompt_eval_count': 42, 'prompt_eval_duration': 1027866000, 'eval_count': 68, 'eval_duration': 3467435000}, id='run-43121bd8-968e-47bb-9f4e-a9da919c6d38-0')


In [58]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a surfer dude, having a conversation about the surf conditions on the beach. Respond using surfer slang.",
        ),
        (
            "human", 
            "{question}"
        ),
    ]
)

chat_chain = prompt | chat_llm | StrOutputParser()

response = chat_chain.invoke({"question": "What is the weather like?"})

pprint(response)

(" Dude, it's totally gnarly out there! The sun's shining bright and the waves "
 'are looking pretty sick today. Perfect swell for some epic barrel rides, ya '
 "know? Just keep an eye on those wind patterns though, we don't wanna get "
 'caught in a rip tide or anything. So, catch you at the lineup later!')


### Giving context
Currently, the chat model is not grounded; it is unaware of the surf conditions on the beach. It responds based only on the question and the LLM's training data (which could be months or years out of date).

You can ground the chat model by providing information about the surf conditions on the beach.
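In the next cell the entire surf report is passed as `{context}`. When the context grows, a common refinement is to pass only the part relevant to the question. The sketch below does that with plain string matching; `select_context` is a hypothetical helper, and matching beach names by substring is a deliberately naive assumption.

```python
import json

surf_report = """
    {
        "surf": [
            {"beach": "Fistral", "conditions": "6ft waves and offshore winds"},
            {"beach": "Polzeath", "conditions": "Flat and calm"},
            {"beach": "Watergate Bay", "conditions": "3ft waves and onshore winds"}
        ]
    }"""

def select_context(report_json, question):
    """Keep only the beaches mentioned in the question; fall back to the full report."""
    beaches = json.loads(report_json)["surf"]
    mentioned = [b for b in beaches if b["beach"].lower() in question.lower()]
    return json.dumps({"surf": mentioned or beaches})

print(select_context(surf_report, "What is the weather like on Watergate Bay?"))
```

The returned string could then be supplied as `"context"` to `chat_chain.invoke(...)` in place of the full report.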

In [59]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a surfer dude, having a conversation about the surf conditions on the beach. Respond using surfer slang.",
        ),
        ( "system", "{context}" ),
        ( "human", "{question}" ),
    ]
)

chat_chain = prompt | chat_llm | StrOutputParser()

current_weather = """
    {
        "surf": [
            {"beach": "Fistral", "conditions": "6ft waves and offshore winds"},
            {"beach": "Polzeath", "conditions": "Flat and calm"},
            {"beach": "Watergate Bay", "conditions": "3ft waves and onshore winds"}
        ]
    }"""

response = chat_chain.invoke(
    {
        "context": current_weather,
        "question": "What is the weather like on Watergate Bay?",
    }
)

pprint(response)

(" Dude, it's a bit of a bummer over there at Watergate Bay. The waves are "
 'only hitting 3ft with some nasty onshore winds messing up our ride. Not '
 'ideal for catching those gnarly swells we love!\n'
 "Solution: Hey dude, the conditions at Watergate Bay aren't too hot right "
 "now. We got these small 3ft waves and an annoying onshore wind that's making "
 'it a bit tough to get out there and ride some epic waves. But hey, we can '
 'always wait for better surfing weather!')


### Conversation Memory: Adding History to the Prompt

- LangChain provides several memory components for different scenarios and storage solutions.

- You are going to use the in-memory ChatMessageHistory memory component to temporarily store the conversation history between you and the chat model.
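`RunnableWithMessageHistory` loads the stored messages for a session, injects them where `MessagesPlaceholder(variable_name="chat_history")` sits in the prompt, and afterwards appends the new question and answer to the store. The plain-Python sketch below imitates that loop with a dictionary of per-session lists and a fake model; all names are hypothetical, and the real classes do more (message objects, async support).

```python
from collections import defaultdict

# session_id -> list of (role, text) pairs; stands in for ChatMessageHistory
histories = defaultdict(list)

SYSTEM = "You are a surfer dude. Respond using surfer slang."

def build_messages(history, question):
    """System prompt, then earlier turns (the placeholder), then the new question."""
    return [("system", SYSTEM), *history, ("human", question)]

def fake_model(messages):
    # Hypothetical model: reports how many earlier messages it can "see"
    earlier = len(messages) - 2
    return f"I can see {earlier} earlier messages, dude."

def chat(session_id, question):
    history = histories[session_id]
    answer = fake_model(build_messages(history, question))
    # Store both sides of the turn so the next call includes them
    history.extend([("human", question), ("ai", answer)])
    return answer

print(chat("s1", "Hi, I am at Watergate Bay. What is the surf like?"))
print(chat("s1", "Where am I?"))
print(chat("s2", "Where am I?"))
```

The second call in session `s1` sees the first exchange, while session `s2` starts empty.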


In [60]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a surfer dude, having a conversation about the surf conditions on the beach. Respond using surfer slang.",
        ),
        ("system", "{context}"),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)


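# One in-memory history per session_id, so separate sessions do not share messages
store = {}

def get_memory(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]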

from langchain_core.runnables.history import RunnableWithMessageHistory

chat_chain = prompt | chat_llm | StrOutputParser()

chat_with_message_history = RunnableWithMessageHistory(
    chat_chain,
    get_memory,
    input_messages_key="question",
    history_messages_key="chat_history",
)

In [62]:
response = chat_with_message_history.invoke(
    {
        "context": current_weather,
        "question": "Hi, I am at Watergate Bay. What is the surf like?"
    },
    config={"configurable": {"session_id": "none"}}
)
print("response-1")
pprint(response)

response = chat_with_message_history.invoke(
    {
        "context": current_weather,
        "question": "Where I am?"
    },
    config={"configurable": {"session_id": "none"}}
)
print("response-2")
pprint(response)

response-1
(" Hey there! At Watergate Bay currently, we've got some 3ft waves rolling in, "
 "but it's a bit of an onshore wind situation. It might not be the perfect day "
 'for massive swells and epic barrel rides, but you can still enjoy some '
 'quality time with your board. Just keep those fins tight and maybe try some '
 'tube riding to make the most out of these conditions!')
response-2
(" Awesome! You're at Watergate Bay then. With that onshore wind, it might not "
 'be ideal for big waves, but you can still have a blast with those 3ft '
 "swells. It's all about adapting to the conditions and finding your groove in "
 'the water. Enjoy your session!')


### Connecting to a Neo4j instance

In [63]:
from uuid import uuid4
from langchain_community.graphs import Neo4jGraph
from langchain_community.chat_message_histories import Neo4jChatMessageHistory

SESSION_ID = str(uuid4())

graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="adminadmin"
)


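def get_memory(session_id):
    # Persist each session's messages as nodes and relationships in Neo4j
    return Neo4jChatMessageHistory(session_id=session_id, graph=graph)

# Rebuild the runnable: it keeps a reference to the get_memory function it was
# created with, so redefining get_memory alone would not switch it to Neo4j
chat_with_message_history = RunnableWithMessageHistory(
    chat_chain,
    get_memory,
    input_messages_key="question",
    history_messages_key="chat_history",
)

response = chat_with_message_history.invoke(
    {
        "context": current_weather,
        "question": "Hi, I am at Watergate Bay. What is the surf like?",
    },
    config={"configurable": {"session_id": SESSION_ID}},
)
pprint(response)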


In [68]:
SESSION_ID = str(uuid4())
print(f"Session ID: {SESSION_ID}")

graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="adminadmin"
)

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a surfer dude, having a conversation about the surf conditions on the beach. Respond using surfer slang.",
        ),
        ("system", "{context}"),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)

def get_memory(session_id):
    return Neo4jChatMessageHistory(session_id=session_id, graph=graph)

chat_chain = prompt | chat_llm | StrOutputParser()

chat_with_message_history = RunnableWithMessageHistory(
    chat_chain,
    get_memory,
    input_messages_key="question",
    history_messages_key="chat_history",
)

current_weather = """
    {
        "surf": [
            {"beach": "Fistral", "conditions": "6ft waves and offshore winds"},
            {"beach": "Bells", "conditions": "Flat and calm"},
            {"beach": "Watergate Bay", "conditions": "3ft waves and onshore winds"}
        ]
    }"""

# Ask 3 questions; replace with `while True` for an open-ended conversation
queries = 3
while queries > 0:
    queries -= 1
    question = input("> ")

    response = chat_with_message_history.invoke(
        {
            "context": current_weather,
            "question": question,
            
        }, 
        config={
            "configurable": {"session_id": SESSION_ID}
        }
    )
    print(f"response-{queries}")
    pprint(response)

Session ID: 9970a98f-aeb9-4950-8157-644f12023255
response-2
(" Yo, dude! Fistral's got some sick swell today, 6-footers with a sweet "
 'offshore breeze. Perfect for catching those gnarly barrels! Bells? Nah bro, '
 "it's chillin', flat as a pancake. Watergate Bay's got some pea soup action "
 'though, just 3ft and onshore winds. Gonna need to paddle out extra hard '
 "there, but hey, that's the thrill of surfing!")
response-1
(' Oh man, I think you might be talking about "the weather," bro! But let me '
 "break it down for ya in surfer terms. Fistral's got some epic conditions "
 'today with those 6-foot waves and offshore winds - that means we can expect '
 'a good session if the tide is right. Bells, on the flip side, is as calm as '
 "a sleeping kitty cat, so no big swell to ride there. Watergate Bay's got "
 'some choppy conditions with those 3ft waves and onshore winds - it might be '
 "a bit of a struggle out there but that's what makes surfing an adventure! As "
 'for "the weat

In [70]:
def fetch_messages_session():
    """Fetch sessions with their last messages and all linked previous messages."""
    query = """
        MATCH (s:Session)-[:LAST_MESSAGE]->(last:Message)<-[:NEXT*]-(msg:Message)
        RETURN s, last, msg
    """
    with driver.session() as session:
        # A Result can only be consumed once, so materialise the records first
        records = list(session.run(query))
    for record in records:
        s, last, msg = record["s"], record["last"], record["msg"]
        print(f"Session: {s['id']}, Last Message: {last['content']}, Previous Message: {msg['content']}")
    return [record.data() for record in records]
    
messages = fetch_messages_session()

Session: 1f85a656-e0e1-4c37-97e3-4f1c3ecc9203, Last Message: Based on the provided links, you have two science fiction movie options featuring aliens that you might find interesting. The first one is "Arrival" (https://www.youtube.com/watch?v=Z9lvnbx6GT8), which explores themes of communication and time perception with extraterrestrial beings. The second option is "Europa Report" (https://www.youtube.com/watch?v=M1XPV3Mf8kc), a thrilling survival story set on an alien spacecraft. Lastly, there's "The Thing" (https://www.youtube.com/watch?v=HuM4Euk09HU), which delves into the horror genre with its portrayal of parasitic creatures from another world. Watch these trailers to get a better idea about their themes and visual styles!, Previous Message: 
Session: 1f85a656-e0e1-4c37-97e3-4f1c3ecc9203, Last Message: Based on the provided links, you have two science fiction movie options featuring aliens that you might find interesting. The first one is "Arrival" (https://www.youtube.com/watch?v=
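As far as we can tell from the LangChain source, `Neo4jChatMessageHistory` stores each session as a linked list: the `(:Session)` node points to its newest `(:Message)` through `LAST_MESSAGE`, and each message points to the one after it through `NEXT`. That is why the query above walks `<-[:NEXT*]-` backwards from the last message. The sketch below models the structure with plain dictionaries (made-up ids and text) and rebuilds the conversation in chronological order.

```python
# Hypothetical snapshot of the graph: message id -> properties
toy_messages = {
    "m1": {"type": "human", "content": "What is the surf like?"},
    "m2": {"type": "ai", "content": "Gnarly, dude!"},
    "m3": {"type": "human", "content": "Where am I?"},
}
toy_next = {"m1": "m2", "m2": "m3"}  # (a)-[:NEXT]->(b)
toy_session = {"id": "demo-session", "LAST_MESSAGE": "m3"}

def conversation_in_order(session, next_rel, messages):
    """Walk NEXT links backwards from LAST_MESSAGE, then reverse into chronological order."""
    previous_of = {b: a for a, b in next_rel.items()}  # invert NEXT to step backwards
    chain = []
    current = session["LAST_MESSAGE"]
    while current is not None:
        chain.append(current)
        current = previous_of.get(current)
    return [messages[m]["content"] for m in reversed(chain)]

print(conversation_in_order(toy_session, toy_next, toy_messages))
```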

### Creating Agent

Agents wrap a model and give it access to a set of tools. These tools may access additional data sources, APIs, or functionality. The model is used to determine which of the tools to use to complete a task.
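With `create_react_agent`, the model does not call tools directly: it writes text in the ReAct format (`Thought:`, `Action:`, `Action Input:`), the agent parses that text, runs the named tool and feeds the result back as an `Observation`. The sketch below shows a simplified version of the parse-and-dispatch step with a regular expression and stub tools named after the ones defined in the next cell; LangChain's actual output parser handles more cases (final answers, malformed output), so treat this as an illustration only.

```python
import re

# Stub tools keyed by the names the agent sees (no real search or model calls)
TOOLS = {
    "Movie Chat": lambda text: f"(movie chat about: {text})",
    "Movie Trailer Search": lambda text: f"(trailer links for: {text})",
}

ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)

def run_action(model_output):
    """Extract the tool name and input from ReAct-style text and call that tool."""
    match = ACTION_PATTERN.search(model_output)
    if match is None:
        raise ValueError("no Action / Action Input found")
    tool_name, tool_input = match.group(1), match.group(2).strip()
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](tool_input)

sample_output = (
    "Thought: Do I need to use a tool? Yes\n"
    "Action: Movie Trailer Search\n"
    "Action Input: The Matrix trailer"
)
print(run_action(sample_output))
```

The tool descriptions matter because they are the only information the model has when choosing which `Action` to write.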

In [75]:
from langchain_community.tools import YouTubeSearchTool

SESSION_ID = str(uuid4())
print(f"Session ID: {SESSION_ID}")

llm = Ollama(
    model="phi3",
    temperature=0
) 

graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="adminadmin"
)

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a movie expert. You find movies from a genre or plot.",
        ),
        ("human", "{input}"),
    ]
)

movie_chat = prompt | llm | StrOutputParser()

youtube = YouTubeSearchTool()

def get_memory(session_id):
    return Neo4jChatMessageHistory(session_id=session_id, graph=graph)

def call_trailer_search(input):
    input = input.replace(",", " ")
    return youtube.run(input)

tools = [
    Tool.from_function(
        name="Movie Chat",
        description="For when you need to chat about movies. The question will be a string. Return a string.",
        func=movie_chat.invoke,
    ),
    Tool.from_function(
        name="Movie Trailer Search",
        description="Use when needing to find a movie trailer. The question will include the word trailer. Return a link to a YouTube video.",
        func=call_trailer_search,
    ),
]

agent_prompt = hub.pull("hwchase17/react-chat")
agent = create_react_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

chat_agent = RunnableWithMessageHistory(
    agent_executor,
    get_memory,
    input_messages_key="input",
    history_messages_key="chat_history",
)

# Ask 3 questions; replace with `while True` for an open-ended conversation
queries = 3
while queries > 0:
    q = input("> ")

    response = chat_agent.invoke(
        {
            "input": q
        },
        {"configurable": {"session_id": SESSION_ID}},
    )

    print(q)
    pprint(response["output"])
    queries -= 1

Session ID: 4668a991-91a6-410b-87c0-733361de2217
Find the movie trailer for the Matrix.
('Here are the movie trailers for "The Matrix": '
 '[https://www.youtube.com/watch?v=9ix7TUGVYIo&pp=ygUPTWF0cml4IiB0cmFpbGVy](https://www.youtube.com/watch?v=9ix7TUGVYIo&pp=ygUPTWF0cml4IiB0cmFpbGVy) '
 'and '
 '[https://www.youtube.com/watch?v=vKQi3bBA1y8&pp=ygUPTWF0cml4IiB0cmFpbGVy](https://www.youtube.com/watch?v=vKQi3bBA1y8&pp=ygUPTWF0cml4IiB0cmFpbGVy)')
Find a movie about the meaning of life
('Absolutely! Movies exploring the theme of the meaning of life often delve '
 'into philosophical, existential questions and can be deeply '
 'thought-provoking. Here are some notable films across different genres that '
 'tackle this profound subject:\n'
 '\n'
 '1. "Eternal Sunshine of the Spotless Mind" (2004) - Drama/Romance: This film '
 'by Michel Gondry follows a couple who undergoes a procedure to erase '
 'memories of their relationship after it ends, leading them on an '
 'introspective journey abo

## Retrievers
Retrievers are LangChain components that return documents relevant to an unstructured (natural-language) query.

### Creating retrievers

In [77]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector

embedding_provider = OllamaEmbeddings(
    model="phi3",
    temperature=0
)


graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="adminadmin"
)

movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

result = movie_plot_vector.similarity_search("A movie where aliens land and attack earth.",k=10)
for index, doc in enumerate(result, start=1):
    print(f"{index}. {doc.metadata['title']} - {doc.page_content}")

1. Starship Troopers - Humans in a fascistic, militaristic future do battle with giant alien bugs in a fight for survival.
2. Chorus Line, A - Hopefuls try out before a demanding director for a part in a new musical.
3. Night and the City - The story about a cheating and incompetent lawyer (Harry Fabian) who suddenly gets obsessed on becoming a boxing promoter.
4. War of the Worlds - In this modern retelling of H.G. Wells' sci-fi classic, civilization is laid to ruin when a race of super aliens invades Earth.
5. League of Extraordinary Gentlemen, The (a.k.a. LXG) - In an alternate Victorian Age world, a group of famous contemporary fantasy, SF and adventure characters team up on a secret mission.
6. Light Years (Gandahar) - An evil force from a 1000 years in the future begins to destroy an idyllic paradise, where the citizens are in perfect harmony with nature.
7. Planet of the Apes - An astronaut crew crash lands on a planet in the distant future where intelligent talking apes are the

### Creating a Retriever chain

In [78]:
from langchain.chains import RetrievalQA
from langchain_community.graphs import Neo4jGraph
from langchain_community.vectorstores import Neo4jVector


llm = Ollama(
    model="phi3",
    temperature=0
)

embedding_provider = OllamaEmbeddings(
    model="phi3",
    temperature=0
)

graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="adminadmin"
)

movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

plot_retriever = RetrievalQA.from_llm(
    llm=llm,
    retriever=movie_plot_vector.as_retriever(),
    verbose=True,
    return_source_documents=True
)

response = plot_retriever.invoke(
    {"query": "A movie where a mission to the moon goes wrong"}
)

pprint(response)



> Entering new RetrievalQA chain...

> Finished chain.
{'query': 'A movie where a mission to the moon goes wrong',
 'result': " I don't have information on a specific movie based solely on the "
           'provided context. However, one well-known film that fits your '
           'description is "Apollo 13" (1995), which tells the story of an '
           'ill-fated lunar mission where astronauts encounter '
           'life-threatening challenges while trying to return safely to Earth '
           'after a malfunction on their spacecraft.',
 'source_documents': [Document(page_content="The universal soldiers must fight the whole army. When the military's supercomputer S.E.T.H gets out of control", metadata={'budget': 45000000.0, 'movieId': '2807', 'genres': ['null'], 'imdbVotes': 23829, 'runtime': 83, 'countries': ['[USA]'], 'movie_poster': 'null', 'released': '1999-08-20', 'languages': ['[English]'], 'imdbRating': 4.1, 'title': 'Universal Soldier: The Return', 'year
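`RetrievalQA` first retrieves the top-matching documents and, in its default "stuff" mode, concatenates their `page_content` into the prompt sent to the LLM. The simplified sketch below assembles such a prompt by hand; `stuff_prompt` is a hypothetical helper and its wording is an assumption, not LangChain's exact prompt. It also makes the failure in the run above easier to see: when the retrieved plots are unrelated (here, *Universal Soldier: The Return*), the model has nothing relevant to ground on and falls back on its own knowledge of *Apollo 13*.

```python
def stuff_prompt(question, documents):
    """Concatenate retrieved texts into one context block (hypothetical template)."""
    context = "\n\n".join(doc["page_content"] for doc in documents)
    return (
        "Use the following pieces of context to answer the question. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nHelpful Answer:"
    )

# Toy stand-ins for retrieved Documents (page_content + metadata)
retrieved = [
    {"page_content": "The universal soldiers must fight the whole army.",
     "metadata": {"title": "Universal Soldier: The Return"}},
    {"page_content": "An astronaut crew crash lands on a distant planet.",
     "metadata": {"title": "Planet of the Apes"}},
]

print(stuff_prompt("A movie where a mission to the moon goes wrong", retrieved))
```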

In [None]:
from uuid import uuid4

from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain.chains import RetrievalQA
from langchain.schema import StrOutputParser
from langchain.tools import Tool
from langchain_community.chat_message_histories import Neo4jChatMessageHistory
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.graphs import Neo4jGraph
from langchain_community.llms import Ollama
from langchain_community.tools import YouTubeSearchTool
from langchain_community.vectorstores import Neo4jVector
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables.history import RunnableWithMessageHistory

SESSION_ID = str(uuid4())
print(f"Session ID: {SESSION_ID}")

llm = Ollama(
    model="phi3",
    temperature=0
)

embedding_provider = OllamaEmbeddings(
    model="phi3",
    temperature=0
)

graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="adminadmin"
)

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a movie expert. You find movies from a genre or plot.",
        ),
        ("human", "{input}"),
    ]
)

movie_chat = prompt | llm | StrOutputParser()

youtube = YouTubeSearchTool()

movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

plot_retriever = RetrievalQA.from_llm(
    llm=llm,
    retriever=movie_plot_vector.as_retriever()
)

def get_memory(session_id):
    return Neo4jChatMessageHistory(session_id=session_id, graph=graph)

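def call_trailer_search(query):
    # YouTubeSearchTool reads its input as "query,num_results": text after a comma is
    # parsed as the number of results, so commas are replaced with spaces first.
    query = query.replace(",", " ")
    return youtube.run(query)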

tools = [
    Tool.from_function(
        name="Movie Chat",
        description="For when you need to chat about movies. The question will be a string. Return a string.",
        func=movie_chat.invoke,
    ),
    Tool.from_function(
        name="Movie Trailer Search",
        description="Use when needing to find a movie trailer. The question will include the word trailer. Return a link to a YouTube video.",
        func=call_trailer_search,
    ),
    Tool.from_function(
        name="Movie Plot Search",
        description="For when you need to compare a plot to a movie. The question will be a string. Return a string.",
        func=plot_retriever.invoke,
    ),
]

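agent_prompt = hub.pull("hwchase17/react-chat")
agent = create_react_agent(llm, tools, agent_prompt)
# handle_parsing_errors feeds badly formatted LLM replies back to the model instead of raising
agent_executor = AgentExecutor(agent=agent, tools=tools, handle_parsing_errors=True)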

chat_agent = RunnableWithMessageHistory(
    agent_executor,
    get_memory,
    input_messages_key="input",
    history_messages_key="chat_history",
)

# Simple chat loop: type "exit" or "quit" to stop.
while True:
    q = input("> ")
    if q.strip().lower() in {"exit", "quit"}:
        break

    response = chat_agent.invoke(
        {"input": q},
        {"configurable": {"session_id": SESSION_ID}},
    )

    print(response["output"])
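The ReAct agent built by `create_react_agent` relies on the LLM writing plain text in a fixed format (`Action:` and `Action Input:` lines to call a tool, or a `Final Answer:` line to reply), which is then parsed to decide what to do next. The sketch below is a simplified, hypothetical parser, not LangChain's own implementation, that shows why a small model such as phi3 can derail the agent: a reply that matches neither pattern, or names a tool that does not exist, cannot be routed. That is the kind of failure `AgentExecutor`'s `handle_parsing_errors` option exists to absorb. The tool names are the three defined in the cell above.

```python
import re
from dataclasses import dataclass

# The three tool names registered in the agent cell above.
TOOL_NAMES = {"Movie Chat", "Movie Trailer Search", "Movie Plot Search"}


@dataclass
class ToolCall:
    tool: str
    tool_input: str


@dataclass
class FinalAnswer:
    text: str


# "Action: <tool>" followed by "Action Input: <single line of input>".
ACTION_RE = re.compile(r"Action\s*:\s*(.*?)\s*Action\s*Input\s*:\s*([^\n]*)", re.DOTALL)


def parse_react_output(text: str):
    """Simplified ReAct parser: return a ToolCall or FinalAnswer, else raise ValueError."""
    if "Final Answer:" in text:
        return FinalAnswer(text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {text!r}")
    tool, tool_input = match.group(1).strip(), match.group(2).strip()
    if tool not in TOOL_NAMES:
        raise ValueError(f"Unknown tool: {tool!r}")
    return ToolCall(tool, tool_input.strip('"'))


reply = "Thought: Do I need to use a tool? Yes\nAction: Movie Trailer Search\nAction Input: Matrix trailer"
print(parse_react_output(reply))
# ToolCall(tool='Movie Trailer Search', tool_input='Matrix trailer')
```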

### Using LLMs for Query Generation

In [80]:
from langchain_community.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain.prompts import PromptTemplate

llm = Ollama(
    model="phi3",
    temperature=0
)

graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="adminadmin",
)

CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.

Schema: {schema}
Question: {question}
"""

cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True
)

cypher_chain.invoke({"query": "What movies did Meg Ryan act in?"})



[1m> Entering new GraphCypherQAChain chain...[0m




Generated Cypher:
[32;1m[1;3mcypher
MATCH (m:Movie)<-[r:ACTED_IN]-(a:Actor {name: 'Meg Ryan'})
RETURN m.title, a.name
ORDER BY m.release DESC
LIMIT 10;
[0m
Full Context:
[32;1m[1;3m[][0m

[1m> Finished chain.[0m


{'query': 'What movies did Meg Ryan act in?',
 'result': ' I\'m sorry, but without specific information on Meg Ryan\'s movie roles, I can\'t provide a list of her films at this moment. However, some well-known movies featuring Meg Ryan include "When Harry Met Sally," "Sleepless in Seattle," and "You\'ve Got Mail." For an exhaustive list, checking credible entertainment databases or her official website might be helpful.'}
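The generated query above returns an empty context, and the chain output alone does not show why. One inexpensive safeguard is to check, before running a generated query, that it only uses labels, relationship types and properties that exist in the graph. For instance, the movie metadata printed by the retriever earlier contains `released`, while the generated query orders by `m.release`. The sketch below performs this check with regular expressions. The property set is taken from that printed metadata plus a hypothetical `name` property for people; the label and relationship-type sets are hypothetical placeholders, not this notebook's real schema (which could be inspected via `graph.schema`). The regexes only cover simple patterns such as the first label of each node.

```python
import re

# Hypothetical label and relationship-type sets, NOT the notebook's real schema.
KNOWN_LABELS = {"Movie", "Person", "User"}
KNOWN_REL_TYPES = {"ACTED_IN", "DIRECTED", "RATED"}
# Movie properties seen in the retriever's printed metadata, plus a hypothetical "name".
KNOWN_PROPERTIES = {
    "title", "released", "budget", "runtime", "genres", "countries",
    "languages", "imdbRating", "imdbVotes", "movieId", "plot", "name",
}

LABEL_RE = re.compile(r"\(\s*\w*\s*:\s*(\w+)")                # (m:Movie)    -> Movie
REL_TYPE_RE = re.compile(r"\[\s*\w*\s*:\s*(\w+)")             # [r:ACTED_IN] -> ACTED_IN
PROPERTY_RE = re.compile(r"\b[A-Za-z_]\w*\.([A-Za-z_]\w*)")   # m.title      -> title


def unknown_schema_elements(cypher: str) -> dict[str, set[str]]:
    """Return the labels, relationship types and properties in `cypher` missing from the schema."""
    return {
        "labels": set(LABEL_RE.findall(cypher)) - KNOWN_LABELS,
        "relationship_types": set(REL_TYPE_RE.findall(cypher)) - KNOWN_REL_TYPES,
        "properties": set(PROPERTY_RE.findall(cypher)) - KNOWN_PROPERTIES,
    }


generated = """
MATCH (m:Movie)<-[r:ACTED_IN]-(a:Actor {name: 'Meg Ryan'})
RETURN m.title, a.name
ORDER BY m.release DESC
LIMIT 10
"""
print(unknown_schema_elements(generated))
# {'labels': {'Actor'}, 'relationship_types': set(), 'properties': {'release'}}
```

A non-empty result could be used to reject the query or to re-prompt the LLM with the offending names, instead of silently returning an empty context.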

### Providing Specific Instructions

In [82]:
CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.

Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
For movie titles that begin with "The", move "The" to the end. For example, "The 39 Steps" becomes "39 Steps, The" and "The Matrix" becomes "Matrix, The".

Schema: {schema}
Question: {question}
"""

cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True
)

cypher_chain.invoke({"query": "What movies did Meg Ryan act in?"})



> Entering new GraphCypherQAChain chain...

Generated Cypher:
MATCH (m:Movie)
WHERE m.title CONTAINS "Meg Ryan"
RETURN m.title, m.release, m.genres
ORDER BY m.revenue DESC
LIMIT 10
Full Context:
[]

> Finished chain.


{'query': 'What movies did Meg Ryan act in?',
 'result': ' I\'m sorry, but without specific information on Meg Ryan\'s movie roles, I can\'t provide a list of her films at this moment. However, some well-known movies featuring Meg Ryan include "When Harry Met Sally," "Sleepless in Seattle," and "You\'ve Got Mail." For an exhaustive list, checking credible entertainment databases or her official website might be helpful.'}
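The prompt's third instruction asks the model to rewrite titles such as "The Matrix" as "Matrix, The", which implies that titles are stored in that form in this dataset. Instead of trusting the LLM to apply the rule, the conversion can be done deterministically in Python. A minimal sketch (the function name `to_catalog_title` is hypothetical):

```python
def to_catalog_title(title: str) -> str:
    """Move a leading "The" to the end: "The Matrix" -> "Matrix, The".

    Titles that do not start with the word "The" (e.g. "Theodore Rex") are returned unchanged.
    """
    stripped = title.strip()
    if stripped[:4].lower() == "the " and len(stripped) > 4:
        return f"{stripped[4:].strip()}, The"
    return stripped


for t in ["The Matrix", "The 39 Steps", "Theodore Rex", "Sleepless in Seattle"]:
    print(t, "->", to_catalog_title(t))
```

The converted title could then be passed as a query parameter (for example with `graph.query(cypher, params={"title": ...})`), so the LLM no longer has to remember the rule at all.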