# Chapter 8 RAG and Graph Databases
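In this chapter, we use a knowledge graph (KG) as the (graph) database behind RAG. This requires the `Neo4j` graph database, so first register at https://console.neo4j.io/ to create an instance.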

In [1]:
import os
from rich import print as pprint
from langchain.tools import StructuredTool
from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain.pydantic_v1 import BaseModel, Field
from langchain_community.vectorstores import Neo4jVector
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import CSVLoader
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

In [2]:
# Fill in your own credentials; never publish real keys or passwords in a notebook.
os.environ['OPENAI_API_KEY'] = "<your-openai-api-key>"
os.environ["GOOGLE_API_KEY"] = "<your-google-api-key>"
os.environ['NEO4J_URI'] = "neo4j+s://e53bd7c6.databases.neo4j.io"  # replace with the URI of your own instance
os.environ['NEO4J_PASSWORD'] = "<your-neo4j-password>"

In [3]:
chat_model = ChatOpenAI(model='gpt-3.5-turbo', api_key=os.environ['OPENAI_API_KEY'], cache=False)
embeddings = OpenAIEmbeddings(model='text-embedding-3-small', api_key=os.environ['OPENAI_API_KEY'])

In [4]:
graph = Neo4jGraph(url=os.environ['NEO4J_URI'],
                   username='neo4j',
                   password=os.environ['NEO4J_PASSWORD'])

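## 8-1 Graph Database

### 1. Importing data
The following query is written in Cypher, Neo4j's query language. It reads a CSV file and, for each row, creates a Movie node with its properties. It then creates Person and Genre nodes and links them to the movie with directed relationships: a director points to a movie through DIRECTED, an actor points to a movie through ACTED_IN, and the movie points to each of its genres through IN_GENRE.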

In [5]:
movies_query = """
LOAD CSV WITH HEADERS FROM
'https://FlagTech.github.io/F4763/movie_data.csv'
AS row
MERGE (m:Movie {id:row.MovieID})
SET m.released = date(row.Release_Date),
    m.title = row.Title,
    m.imdbRating = toFloat(row.Vote_Average)
FOREACH (director in split(row.Director, ', ') |
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.Cast, ', ') |
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.Genres, ', ') |
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))
"""

Run the Cypher query to build the KG.

In [6]:
graph.query(movies_query)

[]
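
To make the `MERGE` and `FOREACH` logic more concrete, the sketch below mimics it in plain Python on two made-up rows. The sample data and the helper `build_graph` are assumptions for illustration only; this is not how Neo4j executes the query. Sets play the role of `MERGE`, so a person or genre that appears in several rows is stored once.

```python
# Plain-Python illustration of the import logic; no Neo4j connection is needed.
# The rows below are invented for this sketch and are not taken from movie_data.csv.
sample_rows = [
    {"MovieID": "1", "Title": "Movie A", "Director": "Director X",
     "Cast": "Actor P, Actor Q", "Genres": "Drama, Crime"},
    {"MovieID": "2", "Title": "Movie B", "Director": "Director X",
     "Cast": "Actor Q", "Genres": "Drama"},
]

def build_graph(rows):
    """Return (nodes, edges) as sets, so repeated values are stored once (like MERGE)."""
    nodes, edges = set(), set()
    for row in rows:
        movie = ("Movie", row["MovieID"])
        nodes.add(movie)
        # Directors and actors both become Person nodes pointing at the movie
        for column, rel in [("Director", "DIRECTED"), ("Cast", "ACTED_IN")]:
            for name in row[column].split(", "):
                person = ("Person", name.strip())
                nodes.add(person)
                edges.add((person, rel, movie))
        # The movie points at each of its genres
        for name in row["Genres"].split(", "):
            genre = ("Genre", name.strip())
            nodes.add(genre)
            edges.add((movie, "IN_GENRE", genre))
    return nodes, edges

nodes, edges = build_graph(sample_rows)
print(len(nodes), "nodes,", len(edges), "edges")
```

Although "Director X", "Actor Q", and "Drama" each appear in both rows, each of them ends up as a single node with one edge per movie.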

### 2. Graph structure
Call `refresh_schema` to refresh the KG schema, then inspect the properties of its nodes and relationships.

In [7]:
graph.refresh_schema()  # refresh the cached schema
print(graph.schema)

Node properties:
Movie {id: STRING, released: DATE, title: STRING, imdbRating: FLOAT}
Person {name: STRING}
Genre {name: STRING}
Chunk {id: STRING, embedding: LIST, text: STRING, source: STRING, row: INTEGER}
Relationship properties:

The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)


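### 3. Building the chain
Build a chain with `GraphCypherQAChain`. The model generates the Cypher query itself, the query is run against the graph, and the model then summarizes the results. Set `verbose=True` to observe the intermediate steps.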

In [8]:
cypher_chain = GraphCypherQAChain.from_llm(graph=graph,
                                           llm=chat_model,
                                           top_k=4,
                                           verbose=True)

In [9]:
# Query (in Chinese): "Which movies has 陳以文 acted in?"
response = cypher_chain.invoke({"query": "陳以文演過的電影有?"})
print(response['result'])



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "陳以文"})-[:ACTED_IN]->(m:Movie)
RETURN m.title
Full Context:
[{'m.title': '陽光普照'}, {'m.title': '瀑布'}, {'m.title': '周處除三害'}]

> Finished chain.
I don't know the answer.


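In addition, the `exclude_types` parameter removes nodes with the given labels from the schema shown to the model. The example below excludes the Genre nodes; as the output shows, the IN_GENRE relationship that touches them disappears as well.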

In [10]:
exclude_types_chain = GraphCypherQAChain.from_llm(
    graph=graph,
    llm=chat_model,
    exclude_types=["Genre"],
    verbose=True)
print(exclude_types_chain.graph_schema)

Node properties are the following:
Movie {id: STRING, released: DATE, title: STRING, imdbRating: FLOAT},Person {name: STRING},Chunk {id: STRING, embedding: LIST, text: STRING, source: STRING, row: INTEGER}
Relationship properties are the following:

The relationships are the following:
(:Person)-[:DIRECTED]->(:Movie),(:Person)-[:ACTED_IN]->(:Movie)
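
The effect of `exclude_types` can be pictured as a filter over the schema. The sketch below is a simplified illustration under stated assumptions: the dictionary layout and the helper `filter_schema` are hypothetical and are not LangChain's internal representation.

```python
# Toy schema mirroring the labels printed above; the data layout is an assumption
# made for this sketch, not LangChain's internal structure.
node_props = {
    "Movie": ["id", "released", "title", "imdbRating"],
    "Person": ["name"],
    "Genre": ["name"],
}
relationships = [
    ("Movie", "IN_GENRE", "Genre"),
    ("Person", "DIRECTED", "Movie"),
    ("Person", "ACTED_IN", "Movie"),
]

def filter_schema(node_props, relationships, exclude_types):
    """Drop excluded labels and every relationship that touches one of them."""
    excluded = set(exclude_types)
    kept_nodes = {label: props for label, props in node_props.items()
                  if label not in excluded}
    kept_rels = [(start, rel, end) for start, rel, end in relationships
                 if start not in excluded and end not in excluded and rel not in excluded]
    return kept_nodes, kept_rels

kept_nodes, kept_rels = filter_schema(node_props, relationships, ["Genre"])
print(kept_nodes)
print(kept_rels)
```

With `["Genre"]` excluded, only the Movie and Person labels and the DIRECTED and ACTED_IN relationships remain, which matches the schema printed by the chain.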


## 8-2 Vector Database
A graph database can also be combined with a vector database. First, load the movie review data from a CSV file.

In [11]:
docs = CSVLoader('./Ch8/movie.csv', encoding="utf-8").load()

### 1. Vectorizing the graph database
A `Neo4jVector` object creates a vector store inside Neo4j, which can then be searched with measures such as cosine similarity.

In [12]:
db = Neo4jVector.from_documents(docs,
                                embedding=embeddings,
                                url=os.environ['NEO4J_URI'],
                                username='neo4j',
                                password=os.environ['NEO4J_PASSWORD'])

In [13]:
query = "芭比好看嗎?"  # "Is Barbie worth watching?"
docs_with_score = db.similarity_search_with_score(query, k=2)  # the two most relevant documents
for doc, score in docs_with_score:
    print("-" * 60)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 60)

------------------------------------------------------------
Score:  0.6832531094551086
MovieID: 47
Title: Barbie芭比
Review: FULL SPOILER-FREE REVIEW @ https://www.firstshowing.net/2023/review-greta-gerwigs-barbie-is-both-hilarious-thought-provoking/

"Barbie is hilariously meta, containing spectacularly funny musical numbers, and an efficient tonal balance between over-the-top comedy and rich, thought-provoking social commentary. Inevitable awards are on the way for the brightly colored production design, costumes, and makeup.

Greta Gerwig and Noah Baumbach's narrative unapologetically tackles quite serious topics, from sociopolitical matters like patriarchy and sexual harassment to questions about existential crises, personal identity, self-love, and, of course, the roles of women and men in today's society.

Margot Robbie was destined to play Barbie just as Ryan Gosling was born with Kenergy in his veins. Absolutely fantastic, as are the rest of the Barbies and Kens.

A must-see in
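
For intuition about what a similarity search does, here is a minimal sketch of top-k retrieval by cosine similarity in plain Python. The three-dimensional vectors and the names `cosine_similarity` and `top_k` are made up for illustration; real embeddings have far more dimensions, and the score reported by the vector store may be scaled differently from the raw cosine value.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the k (name, score) pairs with the highest similarity to the query."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Invented toy "embeddings" for three documents
doc_vecs = {
    "review_a": [1.0, 0.0, 0.0],
    "review_b": [0.6, 0.8, 0.0],
    "review_c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.0, 0.0], doc_vecs, k=2)
for name, score in results:
    print(name, round(score, 3))
```

The document whose vector points in the same direction as the query ranks first, and the orthogonal one is dropped by the `k=2` cutoff.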

Create the retriever.

In [14]:
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})

Write the prompt and connect the pieces into a chain.

In [15]:
str_parser = StrOutputParser()
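# Template (in Chinese): "Answer the question based on the following content plus your own judgment"
template = ("""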
    請根據以下內容加上自身判斷回答問題:\n
    {context}\n
    問題: {question}
    """)
prompt = ChatPromptTemplate.from_template(template)

In [16]:
vector_chain = {"context": retriever, "question": RunnablePassthrough()} | prompt | chat_model | str_parser

In [17]:
print(vector_chain.invoke(query))

根據提供的兩則評論來看，對於電影"芭比"的評價是相當正面的。第一則評論稱讚了電影中的喜劇元素、社會評論以及演員表現，並強調了製作設計和服裝的優秀之處。第二則評論也提到了電影帶來的趣味性和思考性，並稱讚了男性角色在故事中逐漸獲得尊重的情節。因此從這兩則評論來看，"芭比"是一部相當不錯的電影，應該是值得一看的作品。
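
The data flow of such a retrieval chain can be sketched with ordinary functions. Everything below is a stand-in written for illustration: `fake_retriever` and `fake_model` return canned values and do not call Neo4j or OpenAI, and the sketch only shows the order of the steps (retrieve, fill the prompt, call the model, parse the output), not how LangChain implements the `|` operator.

```python
def fake_retriever(question):
    # Stand-in for the retriever: always returns the same canned snippets
    return ["Review 1: funny and thought-provoking.",
            "Review 2: great production design."]

def format_prompt(inputs):
    # Fill the retrieved context and the question into one prompt string
    context = "\n".join(inputs["context"])
    return f"Answer using the content below.\n{context}\nQuestion: {inputs['question']}"

def fake_model(prompt_text):
    # Stand-in for the chat model: wraps a canned reply in a message-like dict
    return {"content": f"(reply based on a prompt of {len(prompt_text)} characters)"}

def parse_output(message):
    # Stand-in for the string output parser: keep only the text
    return message["content"]

def run_chain(question):
    # The question goes to the retriever and is also passed through unchanged
    inputs = {"context": fake_retriever(question), "question": question}
    return parse_output(fake_model(format_prompt(inputs)))

prompt_text = format_prompt({"context": fake_retriever("q"), "question": "Is Barbie worth watching?"})
answer = run_chain("Is Barbie worth watching?")
print(answer)
```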


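### 2. Combining the two databases
We now have two databases: a graph of movies, directors, and actors, and a vector store of movie reviews. We wrap each one as a tool and let the LLM choose which tool to use for a given question.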

In [18]:
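# The descriptions stay in Chinese because the model reads them when choosing a tool.
class ReviewsInput(BaseModel):
    input: str = Field(description="為使用者提出的問題")  # "the question asked by the user"

# Tool backed by the review vector store
reviews = StructuredTool.from_function(func=vector_chain.invoke,
                                       name="Reviews",
                                       description="這是一個關於電影的觀後感受或想法的向量資料庫, 當問題是需要參考評論時很有用。",
                                       args_schema=ReviewsInput)

class GraphInput(BaseModel):
    input: str = Field(description="為使用者提出的完整問題, 請保持中文語言")  # "the full question, kept in Chinese"

# Tool backed by the graph database; named graph_tool so that it does not
# overwrite the Neo4jGraph object stored in `graph`
graph_tool = StructuredTool.from_function(func=cypher_chain.invoke,
                                          name="Graph",
                                          description="這一個電影關係的圖形資料庫, 包含演員、導演和電影風格",
                                          args_schema=GraphInput)

tools = [reviews, graph_tool]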

Create the agent and wrap it in an `AgentExecutor` object.

In [19]:
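# System prompt (in Chinese): "You are a movie data assistant. Use the context to answer, and do not use tools blindly."
agent_prompt = ChatPromptTemplate.from_messages([('system','你是一位電影資料助理, 請判斷上下文來回答問題, 不要盲目使用工具'),
                                                 ('human','{input}'),
                                                 MessagesPlaceholder(variable_name="agent_scratchpad")])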

In [20]:
agent = create_openai_tools_agent(chat_model, tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
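
Conceptually, the agent sees each tool's name and description, picks one, and calls it with the question. The sketch below imitates that dispatch step with a keyword lookup. This is only a stand-in: a real agent lets the LLM decide from the descriptions, and `ToolSpec`, `pick_tool`, and the keyword table are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ToolSpec:
    name: str
    description: str
    func: Callable[[str], str]

def pick_tool(question: str, tools: List[ToolSpec], keywords: Dict[str, List[str]]) -> ToolSpec:
    """Keyword stand-in for the LLM's choice; a real agent decides from the descriptions."""
    for tool in tools:
        if any(word in question for word in keywords[tool.name]):
            return tool
    return tools[0]  # fall back to the first tool when nothing matches

toy_tools = [
    ToolSpec("Reviews", "Opinions and impressions about movies",
             lambda q: f"reviews answer for: {q}"),
    ToolSpec("Graph", "Actors, directors, and genres of movies",
             lambda q: f"graph answer for: {q}"),
]
toy_keywords = {"Reviews": ["worth watching", "good"], "Graph": ["directed", "acted"]}

question = "Which movies has Ang Lee directed?"
chosen = pick_tool(question, toy_tools, toy_keywords)
print(chosen.name, "->", chosen.func(question))
```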

In [21]:
while True:
    msg = input("我說：")
    if not msg.strip():
        break
    for chunk in agent_executor.stream({"input": msg}):
        if 'output' in chunk:
            print(f"AI 回覆：{chunk['output']}", end="", flush=True)
    print('\n')

我說： 芭比好看嗎


> Entering new AgentExecutor chain...

Invoking: `Reviews` with `{'input': '芭比好看嗎'}`


根據提供的兩則評論來看，對於電影"Barbie芭比"的評價是非常正面的。第一則評論提到了該電影有著幽默搞笑的元素和富有想像力的情節，並且在第二部分留下了一個引人思考的訊息。第二則評論則著重於該電影中豐富的音樂元素、社會評論和演員表現等方面。因此，從這兩則評論來看，"Barbie芭比" 可能是一部值得一看的電影。
根據提供的兩則評論來看，"Barbie芭比"這部電影獲得了很正面的評價，具有幽默搞笑的元素、豐富的想像力和音樂元素。整體來說，這部電影可能是一部值得一看的作品。

> Finished chain.
AI 回覆：根據提供的兩則評論來看，"Barbie芭比"這部電影獲得了很正面的評價，具有幽默搞笑的元素、豐富的想像力和音樂元素。整體來說，這部電影可能是一部值得一看的作品。



我說： 李安導演過哪些電影


> Entering new AgentExecutor chain...

Invoking: `Graph` with `{'input': '李安導演過哪些電影'}`


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "李安"})-[:DIRECTED]->(m:Movie)
RETURN m.title
Full Context:
[{'m.title': '卧虎藏龍'}]

> Finished chain.
{'query': '李安導演過哪些電影', 'result': '卧虎藏龍'}
李安導演過的電影有《卧虎藏龍》。

> Finished chain.
AI 回覆：李安導演過的電影有《卧虎藏龍》。



我說： 
