In [45]:
from google.colab import drive

drive.mount('/content/gdrive')

%cd gdrive/My Drive

Mounted at /content/gdrive
/content/gdrive/My Drive


Requirements:

1. **LangChain**: A framework for developing applications powered by language models.
2. **LangChain-Community**: A package of community-maintained third-party integrations for LangChain, such as vector stores, graph stores, and document loaders.
3. **LangChain-OpenAI**: A LangChain extension providing integration with OpenAI's language models.
4. **LangChain-Experimental**: A repository for experimental features and prototypes within the LangChain ecosystem.
5. **Neo4j**: A graph database management system designed for handling and querying connected data.
6. **tiktoken**: A library for tokenizing text, commonly used with language models to preprocess input and output.
7. **yfiles_jupyter_graphs**: A library for creating and visualizing graphs and network diagrams in Jupyter notebooks.
8. **Streamlit**: An open-source framework for building interactive, web-based applications using Python.
9. **Localtunnel**: A tool that allows you to expose your local web server to the internet via a secure tunnel.
10. **CTransformers**: Python bindings for Transformer models implemented in C/C++ with the GGML library.

In [46]:
%pip install --upgrade --quiet  langchain langchain-community langchain-openai langchain-experimental neo4j tiktoken yfiles_jupyter_graphs streamlit

In [47]:
pip install ctransformers



In [48]:
pip install neo4j



In [49]:
!npm install localtunnel

up to date, audited 23 packages in 3s

3 packages are looking for funding
  run `npm fund` for details

2 high severity vulnerabilities

To address all issues (including breaking changes), run:
  npm audit fix --force

Run `npm audit` for details.

The next cell imports components from several libraries. From LangChain Core, `RunnableBranch`, `RunnableLambda`, `RunnableParallel`, and `RunnablePassthrough` are building blocks for composing workflows: conditional branching, wrapping a plain Python function as a runnable, running several runnables on the same input in parallel, and passing the input through unchanged. `ChatPromptTemplate` and `PromptTemplate` build structured prompts for language models.

`BaseModel` and `Field` from Pydantic define validated data schemas, while the typing utilities `Tuple`, `List`, and `Optional` are used for type hints. The message classes `AIMessage` and `HumanMessage` represent individual turns of a conversation between the model and the user. `StrOutputParser` extracts the text of a model response as a plain string.

The `os` module gives access to environment variables, and `Neo4jGraph` wraps a connection to a Neo4j database. `TokenTextSplitter` splits text into chunks whose size is measured in tokens. `ChatOpenAI` talks to OpenAI's chat models, and `LLMGraphTransformer` uses a language model to turn text documents into graph documents (nodes and relationships). `GraphDatabase` is the entry point of the official Neo4j Python driver, and `GraphWidget` renders interactive graphs in Jupyter notebooks.

Further, `Neo4jVector` stores and retrieves embeddings in Neo4j, while `OpenAIEmbeddings` generates embeddings with OpenAI models. The `remove_lucene_chars` function strips Lucene special characters from text so that it can be used safely in a full-text index query. Lastly, `ConfigurableField` marks a field of a runnable as configurable at run time.

In [50]:
# %%writefile main.py

from langchain_core.runnables import (
    RunnableBranch,
    RunnableLambda,
    RunnableParallel,
    RunnablePassthrough,
)
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts.prompt import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import Tuple, List, Optional
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.output_parsers import StrOutputParser
import os
from langchain_community.graphs import Neo4jGraph
from langchain.text_splitter import TokenTextSplitter
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from neo4j import GraphDatabase
from yfiles_jupyter_graphs import GraphWidget
from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.neo4j_vector import remove_lucene_chars
from langchain_core.runnables import ConfigurableField


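try:
    # Colab only: enable third-party widgets such as GraphWidget.
    from google.colab import output
    output.enable_custom_widget_manager()
except ImportError:
    # Not running in Colab; no widget manager setup is needed.
    pass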





These lines of code set environment variables for accessing the OpenAI API and a Neo4j database:

- `os.environ["OPENAI_API_KEY"] = "sk-...."` sets the API key for accessing OpenAI's services.
- `os.environ["NEO4J_URI"] = "bolt+s://d......databases.neo4j.io:7687"` specifies the URI for connecting to the Neo4j database.
- `os.environ["NEO4J_USERNAME"] = "usr"` sets the username for the Neo4j database connection.
- `os.environ["NEO4J_PASSWORD"] = "pwd"` sets the password for the Neo4j database connection.

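Keeping credentials in environment variables lets the LangChain clients pick them up without passing them to every call. Do not write real values into the notebook itself: anyone who can read the file can use them. Prompt for them at run time or load them from a secret store instead.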
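
A minimal sketch of a fail-fast check for the four variables listed above. The names `REQUIRED_VARS` and `missing_settings` are hypothetical helpers introduced here for illustration; they are not part of LangChain or Neo4j.

```python
import os
from typing import List, Mapping, Sequence

# Assumption: these are the four variables this notebook relies on.
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(env: Mapping[str, str],
                     required: Sequence[str] = REQUIRED_VARS) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


# Check the real environment before creating any clients.
missing = missing_settings(os.environ)
if missing:
    print("Set these variables before continuing:", ", ".join(missing))
```

Running a check like this before `Neo4jGraph()` or `OpenAIEmbeddings()` turns a confusing connection error into a clear message about which setting is absent.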

In [51]:
from getpass import getpass

# Prompt for secrets at run time instead of hardcoding them in the notebook.
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
os.environ["NEO4J_URI"] = "neo4j+s://06464093.databases.neo4j.io"  # Neo4j URI
os.environ["NEO4J_USERNAME"] = "neo4j"  # Neo4j username
os.environ["NEO4J_PASSWORD"] = getpass("Neo4j password: ")


The next cell creates a `Neo4jGraph` instance and stores it in the variable `graph`. Called with no arguments, the constructor reads the connection settings from the `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` environment variables. The instance can run Cypher queries against the database and read its schema from within LangChain.

In [52]:
graph = Neo4jGraph()

In [53]:
# ✅ Install required packages
!pip install tmdbv3api wikipedia

from tmdbv3api import TMDb, Movie
import wikipedia
from langchain_core.documents import Document
import time

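# Setup TMDB (the key is prompted for and kept in the environment, not hardcoded)
from getpass import getpass
tmdb = TMDb()
os.environ["TMDB_API_KEY"] = getpass("TMDB API key: ")
tmdb.api_key = os.environ["TMDB_API_KEY"]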
tmdb.language = 'en'
tmdb.debug = True

movie_api = Movie()

# Storage for LangChain documents
documents = []

# Fetch 100 popular movies from TMDB (20 per page)
for page in range(1, 6):  # 5 pages x 20 = 100 movies
    top_movies = movie_api.popular(page=page)

    for movie in top_movies:
        try:
            title = movie.title
            overview = movie.overview
            release_date = movie.release_date
            rating = movie.vote_average

            # Try to fetch Wikipedia summary
            try:
                wiki_summary = wikipedia.summary(title)
            except Exception:  # e.g. PageError or DisambiguationError
                wiki_summary = "Wikipedia summary not found."

            # Combine into one rich document
            content = (
                f"Title: {title}\n"
                f"Release Date: {release_date}\n"
                f"Overview: {overview}\n"
                f"Rating: {rating}\n"
                f"Wikipedia Summary: {wiki_summary}"
            )

            doc = Document(page_content=content, metadata={"source": title})
            documents.append(doc)

            print(f"✅ Added: {title}")
            time.sleep(1)  # be nice to the Wikipedia API

        except Exception as e:
            print(f"❌ Failed to process movie: {movie.title} – {str(e)}")

print(f"\n🎬 Total documents prepared: {len(documents)}")


✅ Added: A Working Man
✅ Added: A Minecraft Movie
✅ Added: In the Lost Lands
✅ Added: Captain America: Brave New World
✅ Added: The Siege






✅ Added: G20
✅ Added: Novocaine
✅ Added: Sinners
✅ Added: Gunslingers
✅ Added: The Woman in the Yard
✅ Added: Conclave
✅ Added: Moana 2
✅ Added: Bullet Train Explosion
✅ Added: Mufasa: The Lion King
✅ Added: Sonic the Hedgehog 3
✅ Added: The Passion of the Christ
✅ Added: Home Sweet Home: Rebirth
✅ Added: A Knight's War
✅ Added: Cleaner
✅ Added: The Codes of War
✅ Added: Laila
✅ Added: Mickey 17
✅ Added: Carjackers
✅ Added: Ask Me What You Want
✅ Added: The Hard Hit
✅ Added: iHostage
✅ Added: Flight Risk
✅ Added: Sugar Baby
✅ Added: Deva
✅ Added: The Gorge
✅ Added: Locked
✅ Added: Cosmic Chaos
✅ Added: Batman Ninja vs. Yakuza League
✅ Added: Avengers: Infinity War
✅ Added: Counterattack
✅ Added: Easter Bloody Easter
✅ Added: Peter Pan's Neverland Nightmare
✅ Added: Ash
✅ Added: The Quiet Ones
✅ Added: Snow White
✅ Added: Here After
✅ Added: Turno nocturno
✅ Added: The Amateur
✅ Added: Gladiator II
✅ Added: My Fault
✅ Added: Pulp Fiction
✅ Added: Fight or Flight
✅ Added: Superboys of Ma
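
The fallback logic in the loop above can be isolated into a small function, which makes it testable without calling TMDB or Wikipedia. This is a sketch under assumptions: `build_movie_text`, `FALLBACK_SUMMARY`, and the injected `fetch_summary` callable are hypothetical names, and the sample movie is a placeholder rather than real data.

```python
from typing import Callable

FALLBACK_SUMMARY = "Wikipedia summary not found."


def build_movie_text(title: str, release_date: str, overview: str,
                     rating: float, fetch_summary: Callable[[str], str]) -> str:
    """Combine movie fields into one text block, tolerating a failed lookup."""
    try:
        summary = fetch_summary(title)
    except Exception:
        # Any lookup failure falls back to the same marker text the notebook uses.
        summary = FALLBACK_SUMMARY
    return (
        f"Title: {title}\n"
        f"Release Date: {release_date}\n"
        f"Overview: {overview}\n"
        f"Rating: {rating}\n"
        f"Wikipedia Summary: {summary}"
    )


def failing_lookup(title: str) -> str:
    """Stand-in for a summary lookup that cannot find the page."""
    raise LookupError(title)


text = build_movie_text("Example Movie", "2025-01-01",
                        "A placeholder overview.", 7.5, failing_lookup)
print(text)
```

In the notebook, `wikipedia.summary` would be passed as `fetch_summary`.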

In [54]:
!pip install tiktoken






- **TokenTextSplitter**: This class is used to split text into chunks, typically to prepare it for processing by language models or other natural language processing tasks.
- **text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)**: Initializes an instance of `TokenTextSplitter` with a chunk size of 512 tokens and an overlap of 24 tokens between consecutive chunks.
- **split_docs = text_splitter.split_documents(documents)**: Splits the `documents` collected in the previous step into chunks of the specified size and overlap, storing the resulting chunks in the `split_docs` variable.

This approach is useful for breaking down large text documents or datasets into manageable pieces that can be processed more efficiently by downstream tasks or models.
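
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of a sliding window over a token sequence. It is an illustration only: integers stand in for token IDs (the real splitter tokenizes with tiktoken), and `split_tokens` is a hypothetical helper, not the library's implementation.

```python
from typing import List, Sequence


def split_tokens(tokens: Sequence[int], chunk_size: int = 512,
                 chunk_overlap: int = 24) -> List[List[int]]:
    """Split a token sequence into windows that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(list(tokens[start:end]))
        if end == len(tokens):
            break
        # Step forward by less than a full window so neighbours overlap.
        start += chunk_size - chunk_overlap
    return chunks


tokens = list(range(1000))  # stand-in for 1000 token IDs
chunks = split_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])
```

Each new window starts `chunk_size - chunk_overlap` tokens after the previous one, so the last 24 tokens of one chunk reappear at the start of the next. This keeps a sentence that straddles a boundary retrievable from either chunk.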

In [55]:
from langchain.text_splitter import TokenTextSplitter

text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)
split_docs = text_splitter.split_documents(documents)

print(f"📄 Total chunks created: {len(split_docs)}")


📄 Total chunks created: 113


In [57]:
!pip install -q langchain-openai


In [58]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Neo4jVector

embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment

vectorstore = Neo4jVector.from_documents(
    documents=split_docs,
    embedding=embeddings,
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    index_name="movie_rag_index"
)
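
What a vector store does at query time can be sketched in a few lines of pure Python: embed the question, score every stored vector by cosine similarity, and return the best matches. The three-dimensional vectors, the labels, and the helper names `cosine` and `top_k` below are made up for illustration; real embeddings have many more dimensions, and Neo4j uses a vector index rather than the brute-force scan shown here.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          indexed: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    """Return the texts of the k stored vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "index": (chunk text, embedding) pairs with hypothetical values.
toy_index = [
    ("heist thriller", [0.9, 0.1, 0.0]),
    ("animated musical", [0.0, 0.2, 0.9]),
    ("crime drama", [0.8, 0.3, 0.1]),
]
hits = top_k([1.0, 0.2, 0.0], toy_index, k=2)
print(hits)
```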




- **ChatOpenAI**: This class is used to interact with OpenAI's chat models. Here it is initialized with `temperature=0` and `model_name="gpt-3.5-turbo-0125"`.
- **Neo4jVector**: This class connects to the vector index (`movie_rag_index`) that the previous cell populated in Neo4j.
- **RetrievalQA**: This chain retrieves the chunks most similar to a question and passes them to the language model to generate an answer.


The code first connects to the vector index and wraps it as a retriever with `as_retriever()`. It then initializes the language model (`llm`) and combines the two in a `RetrievalQA` chain (`rag_chain`). Setting `return_source_documents=True` makes the chain return the retrieved chunks alongside the answer.

This approach grounds the model's answers in the movie data stored in Neo4j rather than in the model's training data alone.

In [59]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

# Load vector store (optional if already initialized earlier)
vectorstore = Neo4jVector(
    embedding=embeddings,
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    index_name="movie_rag_index"
)

# Create retriever
retriever = vectorstore.as_retriever()

# Load OpenAI LLM
llm = ChatOpenAI(
    temperature=0,
    model_name="gpt-3.5-turbo-0125",  # or gpt-4-0125-preview
    # api_key is omitted so the client reads OPENAI_API_KEY from the environment
)

# Build RAG Chain (retrieval + generation)
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)


In [60]:
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate


In [61]:
custom_prompt = PromptTemplate(
    input_variables=["chat_history", "question", "context"],
    template="""
You are a helpful movie assistant. Use the context below to answer the user's question.
Keep it short, relevant, and include movie titles if appropriate.

Context: {context}
Chat History: {chat_history}
Question: {question}

Helpful Answer:
""")
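
The template above has three placeholders. A rough sketch of how they might be filled for a single turn follows, using plain `str.format` as a stand-in for `PromptTemplate`. The `Human:`/`Assistant:` history layout, the helper names, and the sample strings are assumptions for illustration, not what LangChain necessarily produces.

```python
from typing import List, Tuple

# Shortened copy of the template's variable section.
TEMPLATE = (
    "Context: {context}\n"
    "Chat History: {chat_history}\n"
    "Question: {question}\n"
)


def render_history(turns: List[Tuple[str, str]]) -> str:
    """Flatten (question, answer) pairs into a readable transcript."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in turns)


def render_prompt(context_chunks: List[str], turns: List[Tuple[str, str]],
                  question: str) -> str:
    """Fill the three template variables with retrieved text, history, and question."""
    return TEMPLATE.format(
        context="\n\n".join(context_chunks),
        chat_history=render_history(turns),
        question=question,
    )


prompt_text = render_prompt(
    ["Title: Example Movie\nOverview: A placeholder overview."],
    [("Hi", "Hello! Ask me about movies.")],
    "What is Example Movie about?",
)
print(prompt_text)
```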


In [62]:
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)


In [63]:
conversational_rag_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    # combine_docs_chain_kwargs={"prompt": custom_prompt}  # optional if using custom
)


In [64]:
pip install tmdbv3api




In [65]:
pip show tmdbv3api


Name: tmdbv3api
Version: 1.9.0
Summary: A lightweight Python library for The Movie Database (TMDb) API.
Home-page: https://github.com/AnthonyBloomer/tmdbv3api
Author: Anthony Bloomer
Author-email: ant0@protonmail.ch
License: MIT
Location: /usr/local/lib/python3.11/dist-packages
Requires: requests
Required-by: 


In [68]:
%%writefile main.py
import os
import time
from typing import List
from langchain_core.documents import Document
from langchain.text_splitter import TokenTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Neo4jVector
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from tmdbv3api import TMDb, Movie
import wikipedia
import warnings
warnings.filterwarnings("ignore")

# ✅ Load environment variables
# Credentials must already be set in the environment; never hardcode them in this file.
for _name in ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"):
    if not os.environ.get(_name):
        raise RuntimeError(f"Missing environment variable: {_name}")

# ✅ TMDB Setup
tmdb = TMDb()
tmdb.api_key = os.environ["TMDB_API_KEY"]  # set TMDB_API_KEY before launching; do not hardcode the key
tmdb.language = 'en'
tmdb.debug = True

movie_api = Movie()
documents = []

# ✅ Fetch 100 popular movies (TMDB + Wikipedia)
for page in range(1, 6):
    top_movies = movie_api.popular(page=page)
    for movie in top_movies:
        try:
            title = movie.title
            overview = movie.overview
            release_date = movie.release_date
            rating = movie.vote_average

            try:
                wiki_summary = wikipedia.summary(title)
            except Exception:  # e.g. PageError or DisambiguationError
                wiki_summary = "Wikipedia summary not found."

            content = (
                f"Title: {title}\n"
                f"Release Date: {release_date}\n"
                f"Overview: {overview}\n"
                f"Rating: {rating}\n"
                f"Wikipedia Summary: {wiki_summary}"
            )

            doc = Document(page_content=content, metadata={"source": title})
            documents.append(doc)

            print(f"✅ Added: {title}")
            time.sleep(1)

        except Exception as e:
            print(f"❌ Failed to process movie: {movie.title} – {str(e)}")

print(f"\n🎬 Total documents prepared: {len(documents)}")

# ✅ Chunk the documents
text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)
split_docs = text_splitter.split_documents(documents)
print(f"📄 Total chunks created: {len(split_docs)}")

# ✅ Initialize Embeddings
embeddings = OpenAIEmbeddings()

# ✅ Store Chunks in Neo4j
vectorstore = Neo4jVector.from_documents(
    documents=split_docs,
    embedding=embeddings,
    url=os.environ["NEO4J_URI"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    index_name="movie_rag_index"
)
print("✅ Embeddings successfully stored in Neo4j!")

# ✅ Conversational Chain Setup
retriever = vectorstore.as_retriever()
llm = ChatOpenAI(model_name="gpt-3.5-turbo-0125", temperature=0)

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer"  # the chain also returns source documents, so memory must be told which output to store
)

conversational_rag_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    return_source_documents=True,
    output_key="answer"  # ✅ Match with memory
)


def answerquery(prompt: str) -> str:
    """Run one conversational turn through the RAG chain and return the answer text."""
    result = conversational_rag_chain.invoke({"question": prompt})
    return result["answer"]



Overwriting main.py


In [69]:
%%writefile app.py
import streamlit as st
from main import answerquery

st.set_page_config(page_title="MovieMate AI", page_icon="🎬")
st.title("🎬 MovieMate AI: Your Movie Recommendation Assistant")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Use chat-style input
if prompt := st.chat_input("Ask me anything about movies..."):
    # Display user message
    st.chat_message("user").markdown(prompt)
    st.session_state.messages.append({"role": "user", "content": prompt})

    # Get response from RAG chain
    response = answerquery(prompt)

    # Display assistant response
    with st.chat_message("assistant"):
        st.markdown(response)
    st.session_state.messages.append({"role": "assistant", "content": response})


Overwriting app.py


In [31]:
!streamlit run app.py


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.28.0.12:8501
  External URL: http://35.245.0.181:8501

  Stopping...
^C
