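## 0. Connecting to GraphDB
Movie recommendation dataset: https://github.com/neo4j-graph-examples/recommendations
* Vector-based search requires data in text form.
* The pattern used here: search by vector embedding first, then inspect the surrounding context in the graph.
  * First, find movies through the vector embedding of their plot.
  * Then, run additional graph traversal logic starting from the retrieved movie nodes.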
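
The two-step flow above can be sketched without a database. This is a minimal illustration only: the 3-dimensional vectors, the `PLOT_VECTORS` and `ACTORS` dictionaries, and the function names are hypothetical stand-ins for the real plot embeddings and `ACTED_IN` relationships.

```python
import math

# Toy "plot embeddings" (hypothetical values, 3 dimensions for readability).
PLOT_VECTORS = {
    "Jumanji": [0.9, 0.1, 0.0],
    "Toy Story": [0.1, 0.9, 0.0],
    "Heat": [0.0, 0.1, 0.9],
}

# Toy graph: movie -> actors (stands in for ACTED_IN relationships).
ACTORS = {
    "Jumanji": ["Robin Williams", "Kirsten Dunst"],
    "Toy Story": ["Tom Hanks", "Tim Allen"],
    "Heat": ["Al Pacino", "Robert De Niro"],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_then_graph(query_vector, top_k=1):
    # Step 1: rank movies by similarity between the query and each plot vector.
    ranked = sorted(
        PLOT_VECTORS,
        key=lambda title: cosine(query_vector, PLOT_VECTORS[title]),
        reverse=True,
    )
    # Step 2: starting from each retrieved movie, follow the graph to its actors.
    return [{"movie_title": t, "actors": ACTORS[t]} for t in ranked[:top_k]]


result = vector_then_graph([1.0, 0.0, 0.0])
print(result)
```

In the notebook itself, step 1 is handled by the vector index and step 2 by a Cypher retrieval query.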

In [1]:
from neo4j import GraphDatabase, basic_auth


driver = GraphDatabase.driver(
  "neo4j://54.209.48.102:7687",
  auth=basic_auth("neo4j", "baby-grain-challenge"))

In [2]:
# Load the API key from the local .env file
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
from neo4j_graphrag.llm.openai_llm import OpenAILLM

llm = OpenAILLM(model_name="gpt-4o")

## 1. Implementing Vector + Graph RAG

### 1) Adding embeddings and a vector index

In [4]:
# The movie recommendation DB has no vector embeddings, so we add them here
from neo4j_graphrag.embeddings.sentence_transformers import SentenceTransformerEmbeddings

embedder = SentenceTransformerEmbeddings(
    model = "all-MiniLM-L6-v2"
)



In [None]:
with driver.session() as session:
    # Fetch every movie that has a plot, together with its element ID
    result = session.run("MATCH (m:Movie) WHERE m.plot IS NOT NULL RETURN elementId(m) AS id, m.plot AS plot")
    records = result.data()

    for record in records:
        node_id = record["id"]  # unique ID of the node
        text = record["plot"]  # movie plot text
        vector = embedder.embed_query(text)  # convert the text into a vector (e.g. 384 dimensions)

        # Find the node by its ID and store the vector as its `embedding` property.
        # $id and $embedding are bound as query parameters, which guards against query injection.
        session.run("""
        MATCH (m) WHERE elementId(m) = $id
        SET m.embedding = $embedding
        """, {"id": node_id, "embedding": vector})

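
The loop above sends one write query per movie. One possible alternative, sketched here under assumptions, is to send the rows in batches with `UNWIND`. The names `BATCH_QUERY`, `chunked`, `write_embeddings` and the batch size of 500 are hypothetical and not part of the original notebook; `session` is expected to be any object with a `run(query, parameters)` method, such as a Neo4j driver session.

```python
from itertools import islice

# Hypothetical batched variant of the per-node write used in the notebook.
BATCH_QUERY = """
UNWIND $rows AS row
MATCH (m) WHERE elementId(m) = row.id
SET m.embedding = row.embedding
"""


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_embeddings(session, rows, batch_size=500):
    """Write rows of {"id": ..., "embedding": [...]} in batches; return the row count."""
    written = 0
    for batch in chunked(rows, batch_size):
        session.run(BATCH_QUERY, {"rows": batch})
        written += len(batch)
    return written
```

Each row would be built as `{"id": record["id"], "embedding": embedder.embed_query(record["plot"])}` before calling `write_embeddings(session, rows)`.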


In [5]:
from neo4j import GraphDatabase
from neo4j_graphrag.indexes import create_vector_index

INDEX_NAME = "plotindex"
DIMENSION = 384

create_vector_index(
    driver, 
    INDEX_NAME, 
    label="Movie", 
    embedding_property="embedding", 
    dimensions=DIMENSION,
    similarity_fn="cosine"
)
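
`DIMENSION` has to match the length of the vectors the embedder produces (384 for `all-MiniLM-L6-v2`). A small guard like the one below can catch a mismatch before the index is created. `check_dimension` is a hypothetical helper, and the zero vector is only a stand-in for a real `embedder.embed_query(...)` result.

```python
def check_dimension(vector, expected):
    """Return the vector length, or raise ValueError if it differs from the index dimension."""
    actual = len(vector)
    if actual != expected:
        raise ValueError(
            f"embedding has {actual} dimensions, but the index expects {expected}"
        )
    return actual


DIMENSION = 384
sample_vector = [0.0] * 384  # stand-in for embedder.embed_query("some plot text")
print(check_dimension(sample_vector, DIMENSION))
```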

### 1-1) Verifying the embeddings and the vector index

In [6]:
# Check the embedding property:
# count the nodes that now have an embedding
with driver.session() as session:
    result = session.run("""
    MATCH (m:Movie) 
    WHERE m.embedding IS NOT NULL 
    RETURN count(m) AS count
    """)
    print(result.single()["count"])


9083


In [7]:
# Inspect sample embedding vectors (first 5 elements only)
with driver.session() as session:
    result = session.run("""
    MATCH (m:Movie) 
    WHERE m.embedding IS NOT NULL 
    RETURN m.title AS title, m.embedding[0..5] AS embedding_sample
    LIMIT 3
    """)
    for record in result:
        print(f"{record['title']}: {record['embedding_sample']}")


Toy Story: [-0.028351454064249992, -0.0184655599296093, 0.04556655138731003, -0.006739762146025896, 0.001581665244884789]
Jumanji: [0.042965974658727646, 0.01807079277932644, -0.02787213958799839, -0.023518972098827362, -0.019368266686797142]
Grumpier Old Men: [-0.05748794600367546, 0.0023444269318133593, 0.016525570303201675, -0.032274261116981506, -0.07190457731485367]


In [8]:
# List the indexes to confirm the vector index was created
with driver.session() as session:
    result = session.run("SHOW INDEXES")
    for record in result:
        print(record)


<Record id=60 name='__org_neo4j_schema_index_label_scan_store_converted_to_token_index' state='ONLINE' populationPercent=100.0 type='LOOKUP' entityType='NODE' labelsOrTypes=None properties=None indexProvider='token-lookup-1.0' owningConstraint=None lastRead=neo4j.time.DateTime(2026, 1, 14, 13, 18, 1, 788000000, tzinfo=<UTC>) readCount=6>
<Record id=64 name='constraint_3b27b0' state='ONLINE' populationPercent=100.0 type='RANGE' entityType='NODE' labelsOrTypes=['User'] properties=['userId'] indexProvider='range-1.0' owningConstraint='constraint_3b27b0' lastRead=None readCount=0>
<Record id=59 name='constraint_3d5fcb7f' state='ONLINE' populationPercent=100.0 type='RANGE' entityType='NODE' labelsOrTypes=['Movie'] properties=['movieId'] indexProvider='range-1.0' owningConstraint='constraint_3d5fcb7f' lastRead=None readCount=0>
<Record id=63 name='constraint_4499eae9' state='ONLINE' populationPercent=100.0 type='RANGE' entityType='NODE' labelsOrTypes=['Person'] properties=['tmdbId'] indexPro

In [9]:
# Check the status of the plotindex index
with driver.session() as session:
    result = session.run("""
    SHOW INDEXES 
    YIELD name, type, state, populationPercent
    WHERE name = 'plotindex'
    """)
    for record in result:
        print(f"Name: {record['name']}")
        print(f"Type: {record['type']}")
        print(f"State: {record['state']}")
        print(f"Population percent: {record['populationPercent']}%")


Name: plotindex
Type: VECTOR
State: ONLINE
Population percent: 100.0%


In [10]:
# Test that vector search actually works
test_query = "A hero saves the world"
test_vector = embedder.embed_query(test_query)

with driver.session() as session:
    result = session.run("""
    CALL db.index.vector.queryNodes('plotindex', 3, $embedding)
    YIELD node, score
    RETURN node.title AS title, score
    """, {"embedding": test_vector})
    
    for record in result:
        print(f"{record['title']}: {record['score']:.4f}")


Hancock: 0.7716
Mystery Men: 0.7658
Noah: 0.7502
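
On reading the scores: as far as the Neo4j vector index documentation describes it, the score returned for a cosine index is the raw cosine similarity rescaled into the range 0 to 1. The sketch below assumes that rescaling is `(1 + cosine) / 2`; the function names are hypothetical and this is not the database's own code.

```python
import math


def cosine_similarity(a, b):
    """Raw cosine similarity in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def normalized_cosine_score(a, b):
    """Assumed rescaling of cosine similarity into [0, 1]."""
    return (1.0 + cosine_similarity(a, b)) / 2.0


print(normalized_cosine_score([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Under that assumption, the score of 0.7716 for Hancock would correspond to a raw cosine similarity of about 2 × 0.7716 − 1 = 0.5432.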


In [11]:
# Indirectly check what has been changed in this DB
with driver.session() as session:
    # Count all movies and the movies that have an embedding
    result = session.run("""
    MATCH (m:Movie) 
    RETURN 
        count(m) AS total_movies,
        count(m.embedding) AS movies_with_embedding
    """)
    record = result.single()
    print(f"Total movies: {record['total_movies']}")
    print(f"Movies with an embedding: {record['movies_with_embedding']}")


Total movies: 9125
Movies with an embedding: 9083
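
The gap between the two counts is presumably the set of movies skipped by the `WHERE m.plot IS NOT NULL` filter in the embedding step. The arithmetic, using only the numbers printed above:

```python
total_movies = 9125
movies_with_embedding = 9083

missing = total_movies - movies_with_embedding
coverage = movies_with_embedding / total_movies

print(f"missing: {missing}, coverage: {coverage:.2%}")
# missing: 42, coverage: 99.54%
```

Those 42 movies cannot be returned by the vector index, since they have no `embedding` property to index.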


## 2. Using VectorCypherRetriever
>Arguments of `VectorCypherRetriever`
>- driver (neo4j.Driver) - the Neo4j Python driver
>- index_name (str) - name of the vector index
>- retrieval_query (str) - Cypher query for the additional graph traversal
>- embedder (Optional[Embedder]) - the embedding model
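
Conceptually, `retrieval_query` runs after the vector search, with the matched `node` and its `score` already in scope. The sketch below shows that idea by concatenating the two parts as strings. It is an illustration only: `compose_query` and the parameter names `$index_name`, `$top_k`, `$query_vector` are hypothetical, and the query the library actually builds may differ.

```python
def compose_query(retrieval_query):
    """Illustrative only: prepend a vector search so `node` and `score` are in scope."""
    vector_part = (
        "CALL db.index.vector.queryNodes($index_name, $top_k, $query_vector)\n"
        "YIELD node, score\n"
    )
    return vector_part + retrieval_query.strip()


example_retrieval_query = """
MATCH (actor:Actor)-[:ACTED_IN]->(node)
RETURN node.title AS movie_title, collect(actor.name) AS actors
"""

print(compose_query(example_retrieval_query))
```

This is why the retrieval queries in this section refer to a variable named `node` without declaring it themselves.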

In [13]:
# Retrieve the actors (Actor) who appeared in each retrieved movie (node)
from neo4j_graphrag.retrievers import VectorCypherRetriever

INDEX_NAME = "plotindex"

retrieval_query = """
MATCH
(actor:Actor)-[:ACTED_IN]->(node)
RETURN
node.title AS movie_title,
node.plot AS movie_plot, 
collect(actor.name) AS actors;
"""
retriever = VectorCypherRetriever(
    driver,
    index_name=INDEX_NAME,
    retrieval_query=retrieval_query,
    embedder= embedder
)

In [20]:
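# Korean query: "Who were the actors in the movie where they play a board game in the jungle?"
# Note: all-MiniLM-L6-v2 is trained mainly on English text, which likely explains the unrelated results below.
retriever.search(query_text="정글에서 보드게임하는 영화에 나왔던 배우들이 누가있지?", top_k=5)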

RetrieverResult(items=[RetrieverResultItem(content="<Record movie_title='Yeh Jawaani Hai Deewani' movie_plot='Yeh Jawaani Hai Deewani is the story of the relationship between two characters, Bunny (Ranbir Kapoor) & Naina (Deepika Padukone), at two separate but defining times in their lives... first...' actors=['Ranbir Kapoor', 'Deepika Padukone', 'Aditya Roy Kapoor', 'Kalki Koechlin']>", metadata=None), RetrieverResultItem(content='<Record movie_title="Snake in the Eagle\'s Shadow (Se ying diu sau)" movie_plot="Jackie Chan is a boy who is used as a janitor at his kung-fu school. Jackie Chan can\'t fight and is always getting bullied by the teachers and pupils. One day an old man helps Jackie train ..." actors=[\'Hwang Jang-Lee\', \'Yuen Siu-Tien\', \'Dean Shek\', \'Jackie Chan\']>', metadata=None), RetrieverResultItem(content="<Record movie_title='Story of Xinghua, The (Xinghua san yue tian)' movie_plot='Gesitueerd in een verlaten dorp nabij de ruines van de Grote Muur in Noord-China, 

In [21]:
from neo4j_graphrag.retrievers import VectorCypherRetriever

INDEX_NAME = "plotindex"

retrieval_query = """
MATCH
(actor:Actor)-[:ACTED_IN]->(node)-[:IN_GENRE]->(genre:Genre)
RETURN
node.title AS movie_title,
node.plot AS movie_plot,
collect(DISTINCT actor.name) AS actors,
collect(DISTINCT genre.name) AS genres;
"""

retriever = VectorCypherRetriever(
    driver,
    index_name = INDEX_NAME,
    retrieval_query=retrieval_query, 
    embedder=embedder
)
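
The pattern in this query matches one row per (actor, genre) combination for each movie, which is why `collect(DISTINCT ...)` is used. The effect can be reproduced in plain Python; the lists below are toy data taken from the Jumanji record, and the variable names are hypothetical.

```python
from itertools import product

actors = ["Robin Williams", "Kirsten Dunst"]
genres = ["Adventure", "Children", "Fantasy"]

# One row per (actor, genre) pair, as the two-hop MATCH pattern would produce.
rows = [{"actor": a, "genre": g} for a, g in product(actors, genres)]

# Without DISTINCT, every actor is collected once per genre.
collected = [row["actor"] for row in rows]

# With DISTINCT, duplicates are dropped (dict.fromkeys keeps first-seen order).
collected_distinct = list(dict.fromkeys(collected))

print(len(rows), len(collected), collected_distinct)
```

Note also that a plain `MATCH` drops movies that have no actor or no genre; `OPTIONAL MATCH` would be one way to keep them, if that matters for the use case.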

In [23]:
retriever.search(query_text = "What is the movie about the magic board game?", top_k=5)

RetrieverResult(items=[RetrieverResultItem(content="<Record movie_title='Jumanji' movie_plot='When two kids find and play a magical board game, they release a man trapped for decades in it and a host of dangers that can only be stopped by finishing the game.' actors=['Robin Williams', 'Bradley Pierce', 'Kirsten Dunst', 'Jonathan Hyde'] genres=['Adventure', 'Children', 'Fantasy']>", metadata=None), RetrieverResultItem(content="<Record movie_title='Warcraft' movie_plot='An epic fantasy/adventure based on the popular video game series.' actors=['Travis Fimmel', 'Paula Patton', 'Ben Foster', 'Clancy Brown'] genres=['Fantasy', 'Action', 'Adventure']>", metadata=None), RetrieverResultItem(content="<Record movie_title='Last Action Hero' movie_plot='With the help of a magic ticket, a young film fan is transported into the fictional world of his favorite action film character.' actors=['F. Murray Abraham', 'Art Carney', 'Arnold Schwarzenegger', 'Charles Dance'] genres=['Fantasy', 'Adventure', '

## 3. Using the GraphRAG pipeline module

In [None]:
from neo4j_graphrag.llm.openai_llm import OpenAILLM
from neo4j_graphrag.generation import RagTemplate, GraphRAG

# Use a custom prompt template
prompt_template = RagTemplate(
    template ="""
    You are a helpful movie assistant. Based on the provided plot-based question and retrieved movie information, identify the most relevant movie and explain it to the user in a clear and concise manner.

    Use the context to include:
    - the movie title
    - a brief plot summary (if available)
    - genre(s)
    - main actor(s)

    If multiple movies match, mention the most likely one and briefly list alternatives.

    Answer the question below naturally and informatively in Korean.

    Question: {query_text}

    Context: {context}

    Answer:
    """,
    expected_inputs = ["context", "query_text"]
)

llm = OpenAILLM(model_name="gpt-4o")
graph_rag = GraphRAG(retriever, llm, prompt_template=prompt_template)
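
To make the role of the two placeholders concrete, here is a rough sketch of how a template with `{query_text}` and `{context}` gets filled before it is sent to the LLM. It uses plain `str.format` on a shortened template; `TEMPLATE` and `build_prompt` are hypothetical, and `RagTemplate` handles this step internally in its own way.

```python
# Shortened, hypothetical template with the same two placeholders as above.
TEMPLATE = """Question: {query_text}

Context: {context}

Answer:"""


def build_prompt(query_text, records):
    """Join the retrieved records into one context string and fill the template."""
    context = "\n".join(records)
    return TEMPLATE.format(query_text=query_text, context=context)


prompt = build_prompt(
    "What is the movie about the magic board game?",
    ["<Record movie_title='Jumanji' ...>", "<Record movie_title='Warcraft' ...>"],
)
print(prompt)
```

The answer can only be as good as the `context` string: if retrieval returns unrelated movies, the LLM has nothing relevant to ground its answer in.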

In [29]:
# Korean query: "What was that movie where they play a board game in the jungle?"
query = "정글에서 보드게임을 하는 영화가 뭐더라??"
response = graph_rag.search(query_text = query, return_context=True)

In [30]:
print('## Generated answer ## \n' + response.answer)
print("\n## Retrieved results ##")
for item in response.retriever_result.items:
    print(item.content)

## Generated answer ## 
당신이 말씀하신 정글에서 보드게임을 하는 영화는 "쥬만지: 새로운 세계"일 가능성이 높습니다. 이 영화는 주인공들이 오래된 비디오 게임을 시작하면서 게임 속 정글 안으로 빨려 들어가는 이야기로, 그들은 게임 속에서 살아남아야 합니다. 이 영화는 모험, 판타지, 코미디 장르에 속하며, 주요 배우로는 드웨인 존슨, 케빈 하트, 잭 블랙, 카렌 길런이 있습니다.

제공된 컨텍스트에서 해당 영화는 언급되지 않았지만, 질문에 가장 적합한 영화로 보입니다. 

## Retrieved results ##
<Record movie_title='Yeh Jawaani Hai Deewani' movie_plot='Yeh Jawaani Hai Deewani is the story of the relationship between two characters, Bunny (Ranbir Kapoor) & Naina (Deepika Padukone), at two separate but defining times in their lives... first...' actors=['Ranbir Kapoor', 'Deepika Padukone', 'Aditya Roy Kapoor', 'Kalki Koechlin'] genres=['Comedy', 'Drama', 'Musical', 'Romance']>
<Record movie_title='Tidal Wave' movie_plot='Professor Kim, a marine geologist, recognizes the impending danger of a mega tsunami headed straight for Haeundae, a popular vacation spot on the south coast of Korea. He desperately ...' actors=['Min-ki Lee', 'Kyung-gu Sol', 'Nicole Dionne', 'Ji-won Ha'] genres=['Drama']>
<Record movie_ti

In [37]:
# Korean query: "Tell me about movies that deal with adult-rated (19+) content."
query = "19금 장르를 다루는 영화들을 알려줘."
response = graph_rag.search(query_text = query, return_context=True)

In [38]:
print('## Generated answer ## \n' + response.answer)
print("\n## Retrieved results ##")
for item in response.retriever_result.items:
    print(item.content)

## Generated answer ## 
질문에서 언급된 '19금 장르'와 관련하여 제공된 영화 정보 내에서는 특별히 성인용 콘텐츠를 다루는 영화가 없습니다. 여기 제공된 영화들은 주로 드라마나 코미디 장르로 분류되며, 내용 면에서도 성인용으로 엄격하게 제한된 주제를 다루고 있지는 않습니다.

만약 성인용 혹은 보다 성숙한 주제를 다루는 영화를 찾고 계신다면 다른 영화 목록이나 추천을 참고하시기 바랍니다. 제공된 영화들 중 일부를 소개하자면:

- **Men, Women & Children**: 인터넷의 발달로 인해 사람들의 관계와 자아상, 사랑이 변화하는 과정을 그린 드라마 장르의 영화입니다. 주요 배우로는 애덤 샌들러, 로즈마리 드윗, 제니퍼 가너 등이 있습니다.

- **Saturday Night Fever**: 디스코 무대의 왕으로 성공을 꿈꾸는 브루클린 청소년의 이야기를 담고 있는 로맨스와 드라마 장르의 영화로, 존 트라볼타가 주연을 맡았습니다.

이외에도 다른 장르나 주제를 찾고 계신다면, 구체적인 내용이나 배우에 따라 다른 영화를 추천받으시는 것을 권장합니다.

## Retrieved results ##
<Record movie_title='Men, Women & Children' movie_plot='A group of high school teenagers and their parents attempt to navigate the many ways the Internet has changed their relationships, their communication, their self-image, and their love lives.' actors=['Adam Sandler', 'Rosemarie DeWitt', 'Jennifer Garner', 'Judy Greer'] genres=['Comedy', 'Drama']>
<Record movie_title='Saturday Night Fever' movie_plot='A Brooklyn teenager feels his only c