<a href="https://colab.research.google.com/github/AmiriShavaki/KG-based-RAG-for-Multi-hop-QA/blob/main/src/Main_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Proposed Method

## Load the KG from Graph DB

### Install langchain neo4j

In [None]:
!pip install langchain neo4j openai wikipedia tiktoken langchain_openai langchain-community

Collecting langchain
  Downloading langchain-0.3.1-py3-none-any.whl.metadata (7.1 kB)
Collecting neo4j
  Downloading neo4j-5.25.0-py3-none-any.whl.metadata (5.7 kB)
Collecting openai
  Downloading openai-1.50.1-py3-none-any.whl.metadata (24 kB)
Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Collecting tiktoken
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting langchain_openai
  Downloading langchain_openai-0.2.1-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.1-py3-none-any.whl.metadata (2.8 kB)
Collecting langchain-core<0.4.0,>=0.3.6 (from langchain)
  Downloading langchain_core-0.3.6-py3-none-any.whl.metadata (6.3 kB)
Collecting langchain-text-splitters<0.4.0,>=0.3.0 (from langchain)
  Downloading langchain_text_splitters-0.3.0-py3-none-any.whl.metadata (2.3 kB)
Collecting langsmith<0.2.0,>=0

### Connect to DB

In [None]:
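# Neo4jGraph is provided by langchain-community (installed above);
# the older langchain.graphs import path is deprecated.
from langchain_community.graphs import Neo4jGraph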

url = input("Enter your Neo4j URL: ")

username = input("Enter your Neo4j username: ")
password = input("Enter your Neo4j password: ")

graph = Neo4jGraph(
    url=url,
    username=username,
    password=password
)

### Fetch all relations in the graph

In [None]:
data = graph.query("Match (n)-[r]->(m) Return n,r,properties(r),m")

In [None]:
len(data)

611

In [None]:
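relations = []
for triplet in data:
    # Nodes carry their label under 'id'; fall back to 'name' when 'id' is missing.
    first_entity = triplet['n']['id'] if 'id' in triplet['n'] else triplet['n']['name']
    # The relationship is returned as a sequence whose element 1 is the relation type.
    relation = triplet['r'][1]
    # Property values are not guaranteed to be strings, so cast before joining.
    relation_properties = ' '.join(map(str, triplet['properties(r)'].values()))
    second_entity = triplet['m']['id'] if 'id' in triplet['m'] else triplet['m']['name']
    relations.append((first_entity, relation, second_entity, relation_properties))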

In [None]:
len(relations)

611
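
To make the shape of each result row concrete, here is a minimal sketch that applies the same conversion to a single hand-written row. The helper name `row_to_relation`, the `sample_row` dictionary and the `season` property key are hypothetical, chosen only for illustration. Unlike the loop above, the helper returns `None` for an entity with neither `id` nor `name` instead of raising.

```python
def row_to_relation(row):
    """Convert one query result row into (head, relation, tail, properties)."""
    def entity_label(node):
        # Prefer 'id' and fall back to 'name', in the same order as the loop above.
        return node.get("id", node.get("name"))

    # Assumption: the relationship is a (start, type, end) sequence.
    relation_type = row["r"][1]
    properties = " ".join(str(value) for value in row["properties(r)"].values())
    return (entity_label(row["n"]), relation_type, entity_label(row["m"]), properties)


# Hypothetical row; the property key "season" is made up for this example.
sample_row = {
    "n": {"id": "Karim Benzema"},
    "r": ({"id": "Karim Benzema"}, "TOP_SCORER_IN", {"name": "Real Madrid"}),
    "properties(r)": {"season": "2020 La Liga season"},
    "m": {"name": "Real Madrid"},
}

print(row_to_relation(sample_row))
```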

## Generating embeddings from relations

The embedding model used in this section is `text-embedding-ada-002`.

### OpenAI client

In [None]:
from openai import OpenAI
client = OpenAI(
    api_key=input("Enter your OpenAI API Key: "),
)

### Generate embeddings

In [None]:
relations[0]

('Karim Benzema', 'TOP_SCORER_IN', 'Real Madrid', '2020 La Liga season')

In [None]:
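def get_embedding(text, model="text-embedding-ada-002"):
    # Newlines are replaced with spaces before the text is sent for embedding.
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

# Embed each relation as one space-joined string: "head RELATION tail properties".
rel_embeddings = []
for rel in relations:
    embedding = get_embedding(" ".join(rel))
    rel_embeddings.append((rel, embedding))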
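
The loop above issues one API request per relation. Since the embeddings endpoint also accepts a list of inputs, the requests could be batched. The following is a sketch under that assumption, not what the notebook actually ran: `chunked`, `embed_in_batches`, `openai_embed_batch` and the batch size of 100 are hypothetical names and values introduced here.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_in_batches(texts, embed_batch, batch_size=100):
    """Embed `texts` by calling `embed_batch` once per chunk, preserving order."""
    vectors = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def openai_embed_batch(batch, model="text-embedding-ada-002"):
    # Assumes the OpenAI `client` created earlier in the notebook.
    response = client.embeddings.create(input=batch, model=model)
    # Sort by the returned index so vectors line up with the input order.
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


# Possible usage with the notebook's data (not executed here):
# texts = [" ".join(rel).replace("\n", " ") for rel in relations]
# rel_embeddings = list(zip(relations, embed_in_batches(texts, openai_embed_batch)))
```

Passing the embedding function as an argument keeps the batching logic testable without network access.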

## Index embeddings into vector database

### Install faiss-cpu

In [None]:
! pip install faiss-cpu

### Add index on embedding vectors

In [None]:
import faiss
import numpy as np

# FAISS works on float32 vectors, so set the dtype explicitly.
embeddings_np = np.array([embedding for _, embedding in rel_embeddings], dtype="float32")

dimension = embeddings_np.shape[1]
index = faiss.IndexFlatL2(dimension)

index.add(embeddings_np)

embedding_to_rel_map = {i: triplet for i, (triplet, _) in enumerate(rel_embeddings)}
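
`IndexFlatL2` performs an exact nearest-neighbour search using squared Euclidean distance. The pure-Python sketch below illustrates that computation on toy data; `l2_search`, `toy_vectors` and `toy_map` are hypothetical names, and this is not the FAISS implementation. With the index built above, the corresponding call would be `index.search(query_vectors, k)`, where `query_vectors` is a 2-D float32 array; it returns distances and row positions that can be looked up in `embedding_to_rel_map`.

```python
def l2_search(vectors, query, k):
    """Return (squared_distance, position) pairs for the k vectors closest to `query`."""
    scored = []
    for position, vector in enumerate(vectors):
        squared_distance = sum((a - b) ** 2 for a, b in zip(vector, query))
        scored.append((squared_distance, position))
    # Smallest distance first, as a flat L2 index would rank its results.
    return sorted(scored)[:k]


# Toy stand-ins for the embedding matrix and the position-to-relation map.
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
toy_map = {0: "relation A", 1: "relation B", 2: "relation C"}

hits = l2_search(toy_vectors, query=[0.9, 0.1], k=2)
print([toy_map[position] for _, position in hits])
```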

## Retrieval

### Questions

In [None]:
questions = [
    "What was the primary match t-shirt color of the winner of the Ballon d'Or in the season of 1998-1999?",
    "What is the home stadium of the team that won the UEFA Champions League in 2012?",
    "What was the jersey number of the player who was the top scorer in the 2002 FIFA World Cup?",
    "What was the main architectural style of the palace where the Treaty of Versailles was signed?",
    "Who was the captain of the team that won the ICC Cricket World Cup in 2011 and what was his primary role in the team?",
    "What was the primary aircraft model used by the airline that was the largest operator in Europe in 2010?",
    "What was the primary diet of the largest land animal found in Africa?",
    "What was the primary programming language used by the company that developed the Windows operating system in the 1980s?",
    "What was the main theme of the novel written by the author who won the Nobel Prize in Literature in 1954?",
    "What was the primary material used in the construction of the structure that is the tallest building in the world?",
    "What was the primary economic activity of the civilization that built the pyramids of Giza?",
    "What was the primary industry of the company whose CEO was named Time's Person of the Year in 1999?",
    "What was the main ingredient in the first synthetic plastic created?",
    "What was the main objective of the space mission that successfully landed humans on the Moon in 1969?",
    "What is the primary language spoken in the country that hosted the 2008 Summer Olympics?",
    "What was the primary match t-shirt color of the team that won the FIFA World Cup in 2010?",
    "Who was the main character in the novel that won the Pulitzer Prize for Fiction in 1961?",
    "What was the primary economic activity in the city that hosted the Summer Olympics in 2000?",
    "What was the main theme of the movie that won the Academy Award for Best Picture in 1994?",
    "What was the national team that the player who won the Golden Boot in the 2014 FIFA World Cup played for?",
    "What is the home stadium of the team that won the NBA Championship in 2016?",
    "What was the jersey number of the player who was named MVP of the 2006 NBA Finals?",
    "What was the main architectural style of the building where the Declaration of Independence was signed?",
    "Who was the captain of the team that won the FIFA Women's World Cup in 2019 and what was her primary position?",
    "What was the primary diet of the largest marine mammal found in the Indian Ocean and South Pacific Ocean?",
    "What was the main theme of the play written by the playwright who won the Pulitzer Prize for Drama in 1949?",
    "What was the primary material used in the construction of the bridge that connects San Francisco to Marin County?",
    "What was the main mission of the Mars rover mission launched by NASA in 2011?",
    "What is the primary language spoken in the country that won the Eurovision Song Contest in 2017?",
    "What was the home stadium of the team that won the La Liga title in 2014?",
    "What was the jersey number of the player who scored the winning goal in the 2010 FIFA World Cup Final?",
    "What was the architectural style of the cathedral where the coronation of the French kings took place?",
    "Who was the captain of the team that won the Stanley Cup in 2013?",
    "What is the primary diet of the largest predator in the Arctic Ocean?",
    "What was the primary material used in the construction of the bridge that connects Brooklyn to Manhattan?",
    "What was the primary industry of the company whose CEO was named Time's Person of the Year in 1997?",
    "What is the primary language spoken in the country that won the Rugby World Cup in 2003?",
    "What was the primary economic activity in the city that hosted the Winter Olympics in 2002?",
    "What was the primary material used in the construction of the monument located in Washington, D.C. dedicated to the third president of the United States?",
    "What is the primary language spoken in the country that hosted the FIFA World Cup in 2006?",
    "What was the primary match t-shirt color of the team that won the Champions League in 2019?",
    "What are the primary colors of the team that won the Super Bowl in 2020?",
    "What are the primary colors of the team that won the NBA Finals in 2018?",
    "What is the primary color of the team that won the FIFA World Cup in 1998?",
    "What are the primary colors of the team that won the UEFA Europa League in 2016?",
    "What was the primary language spoken in the country that hosted the Summer Olympics in 2004?",
    "What was the primary jersey color of the team that won the NBA Championship in 2014?",
    "What is the home stadium of the team that won the English Premier League in 2013?",
    "What was the jersey number of the player who won the MVP award in the 2017 NFL season?",
    "What was the primary architectural style of the building where the United Nations Charter was signed?",
    "Who was the captain of the team that won the FIFA World Cup in 2014 and what was his primary position?",
    "What was the primary diet of the largest bird found in North America?",
    "What was the primary economic activity of the civilization that built Machu Picchu?",
    "What was the primary jersey color of the team that won the Super Bowl in 2015?",
    "What is the home stadium of the team that won the MLB World Series in 2016?",
    "What was the main theme of the film that won the Academy Award for Best Picture in 2003?",
    "What was the primary diet of the largest fish found in the ocean?",
    "What is the primary language spoken in the country that hosted the Winter Olympics in 2010?",
    "What was the primary match t-shirt color of the team that won the Copa America in 2019?",
    "What was the primary industry of the company that introduced the iPhone?",
    "What was the primary diet of the largest terrestrial carnivore in North America?",
    "What was the main architectural style of the building where the first United Nations General Assembly was held?",
    "What was the primary match t-shirt color of the team that won the NBA Finals in 2015?",
    "What was the primary mission of the space probe that first landed on a comet?",
    "What was the primary language spoken in the country that hosted the Winter Olympics in 2014?",
    "What was the main theme of the television series that won the Primetime Emmy Award for Outstanding Drama Series in 2018?",
    "What was the primary diet of the largest reptile found in the Nile River?",
    "What was the primary material used in the construction of the ship that completed the first successful circumnavigation of the Earth?",
    "What was the primary color of the uniform worn by the team that won the World Series in 2016?",
    "What was the main architectural style of the palace where the Congress of Vienna was held?",
    "What was the primary diet of the largest bird found in Australia?",
    "What was the primary language spoken in the country that won the Eurovision Song Contest in 2018?",
    "What was the primary match t-shirt color of the team that won the English Premier League in the 2017-2018 season?",
    "What was the primary material used in the construction of the ship that carried the Pilgrims to North America in 1620?",
    "What was the primary ingredient in the food that sustained sailors on long voyages during the Age of Exploration?",
    "What was the primary language spoken in the country that won the FIFA World Cup in 2014?",
    "What was the primary match t-shirt color of the team that won the Copa del Rey in 2021?",
    "What was the primary material used in the construction of the first artificial satellite launched into space?",
    "What was the primary match t-shirt color of the team that won the Serie A in the 2019-2020 season?",
    "What was the primary match t-shirt color of the team that won the FA Cup in 2019?",
    "What was the primary architectural style of the building where the Magna Carta was signed?",
    "What was the primary match t-shirt color of the team that won the Copa Libertadores in 2018?",
    "What was the primary language spoken in the country that hosted the Summer Olympics in 2012?",
    "What was the primary match t-shirt color of the team that won the Bundesliga in 2020?",
    "What was the primary language spoken in the country that won the Rugby World Cup in 1995?",
    "What was the primary architectural style of the church where the Battle of Hastings was commemorated?",
    "What was the primary match t-shirt color of the team that won the Copa del Rey in 2018?",
    "What was the primary diet of the largest marine predator found in the Southern Ocean?",
    "What was the primary match t-shirt color of the team that won the Premier League in 2015?",
    "What was the primary material used in the construction of the spacecraft that first landed humans on the Moon?",
    "What was the primary language spoken in the country that won the FIFA Women's World Cup in 2015?",
    "What was the primary match t-shirt color of the team that won the Serie A in 2021?",
    "What was the primary language spoken in the country that hosted the FIFA World Cup in 1994?",
    "What was the primary match t-shirt color of the team that won the UEFA Europa League in 2020?",
    "What is the home stadium of the team that won the French Ligue 1 in 2019?",
    "What was the primary jersey number of the player who won the Ballon d'Or in 2018?",
    "What was the primary match t-shirt color of the team that won the Copa America in 2021?",
    "What was the primary material used in the construction of the tower that is the tallest in Tokyo?",
    "What is the home stadium of the team that won the Italian Serie A in 2019?",
    "What was the primary diet of the largest predator in the Amazon River?",
    "What was the primary match t-shirt color of the team that won the FA Cup in 2021?",
    "What was the primary economic activity in the city that hosted the Winter Olympics in 2018?",
    "What was the primary match t-shirt color of the team that won the Copa Libertadores in 2021?",
    "What was the primary language spoken in the country that won the Eurovision Song Contest in 2019?",
    "What was the primary architectural style of the palace where the Potsdam Conference was held?",
    "What was the primary material used in the construction of the spaceship that carried the first human into space?",
    "What was the primary match t-shirt color of the team that won the FIFA Club World Cup in 2019?",
    "Who was the captain of the team that won the UEFA Champions League in 2005, and what was his primary position?",
    "What was the primary language spoken in the country that won the FIFA Women's World Cup in 1999, and who was the top scorer of that tournament?",
    "What was the main ingredient used in the creation of the first practical incandescent light bulb, and who was the inventor?",
    "Who was the main character in the novel that won the Booker Prize in 1981?",
    "What was the home stadium of the team that won the English Premier League in 2002?",
    "What was the primary color of the jersey worn by the team that won the FA Cup in 2004?",
    "What was the primary architectural style of the building where the first United States Congress met?",
    "What was the jersey number of the player who won the NBA MVP award in 2011?",
    "Who was the surgeon that performed the first implantation of the first artificial heart?",
    "Who was the captain of the team that won the Rugby World Cup in 2007?",
    "What is the primary language spoken in the country that won the Rugby World Cup in 2007?",
    "What was the primary color of the match t-shirt of the team that won the UEFA Europa League in 2013?",
    "Who was the main character in the novel that won the Pulitzer Prize for Fiction in 1980?",
    "What was the primary language spoken in the country that hosted the Winter Olympics in 1998?",
    "What was the primary color of the team that won the Super Bowl in 2018?",
    "Who was the captain of the team that won the Super Bowl in 2018?",
    "Who was the manager of the team that won the FIFA World Cup in 1994?",
    "What was the primary color of the team that won the FIFA World Cup in 1994?",
    "What was the primary match t-shirt color of the team that won the Copa Libertadores in 2015?",
    "What was the primary language spoken in the country that hosted the Eurovision Song Contest in 2010?",
    "Who was the captain of the team that won the Stanley Cup in 2008?",
    "What was primary position of the captain of the team that won the Stanley Cup in 2008?",
    "What was the primary diet of the largest carnivorous dinosaur discovered?",
    "Where was the largest carnivorous dinosaur found?",
    "What was the primary color of the team that won the FA Cup in 2006?",
    "Who was the architect of the building where the first modern Olympic Games were held?",
    "What was the primary architectural style of the building where the first modern Olympic Games were held?",
    "Who was the captain of the team that won the Copa America in 2011?",
    "What was primary position of the captain of the team that won the Copa America in 2011?",
    "What was the primary color of the jersey worn by the team that won the FIFA World Cup in 2006?",
    "Who was captain of the team that won the FIFA World Cup in 2006?",
    "In which continent were the largest herbivorous dinosaur fossils discovered?",
    "What was the primary material used in the construction of the Statue of Liberty, and who was the sculptor?",
    "Who was the manager of the team that won the UEFA Champions League in 2013?",
    "What was the primary color of the team that won the UEFA Champions League in 2013?",
    "What was the primary language spoken in the country that hosted the World Expo in 1967?",
    "What was the primary architectural style of the building where the Nuremberg Trials took place?",
    "In which country is the building where the Nuremberg Trials took place located?",
    "Who was the main character in the novel that won the Pulitzer Prize for Fiction in 1975?",
    "What was the main theme of the novel that won the Pulitzer Prize for Fiction in 1975?",
    "What was the primary material used in the construction of the Hoover Dam, and who was the president of the United States at the time?",
    "What was the primary color of the team that won the UEFA Europa League in 2017?",
    "Who was captain of the team that won the UEFA Europa League in 2017?",
    "What was the primary language spoken in the country that won the FIFA Women's World Cup in 2011?",
    "What was the primary material used in the construction of the Sydney Opera House, and who was the architect?",
    "What was the primary color of the match t-shirt of the team that won the English Premier League in 2016?",
    "Who was the manager of the team that won the FIFA World Cup in 1986?",
    "What was the primary color of the team that won the FIFA World Cup in 1986?",
    "What was the primary language spoken in the country that won the Rugby World Cup in 1999?",
    "Who was captain of the team that won the Rugby World Cup in 1999?",
    "What was the primary architectural style of the palace where the Potsdam Conference took place?",
    "In which country was the palace where the Potsdam Conference took place, located?",
    "What was the primary material used in the construction of the Leaning Tower of Pisa, and who was the architect?",
    "What was the primary diet of the largest predator in the Atlantic Ocean?",
    "What is scientific name of the largest predator in the Atlantic Ocean?",
    "Who was the captain of the team that won the NBA Finals in 2020?",
    "What was the primary color of the team that won the NBA Finals in 2020?",
    "What was the primary material used in the construction of the Eiffel Tower, and who was the engineer responsible for its design?",
    "What was the primary color of the team that won the Copa del Rey in 2020?",
    "What was the primary diet of the largest carnivore found in Africa?",
    "What was the primary color of the team that won the NBA Finals in 2013?",
    "What was the jersey number of the player who won the Golden Boot in the 2018 FIFA World Cup?",
    "Which club was the player who won the Golden Boot in the 2018 FIFA World Cup playing for that same year?",
    "Who was the captain of the team that won the 2016 Summer Olympics Men's Football Tournament?",
    "What was primary position of the captain of the team that won the 2016 Summer Olympics Men's Football Tournament?",
    "What was the primary material used in the construction of the Colosseum in Rome, and who was the emperor who commissioned its construction?",
    "Who was captain of the team that won the 1995 Rugby World Cup?",
    "What was the main architectural style of the building where the Treaty of Paris 1783 was signed, and what was the country that ceded the territory in this treaty?",
    "What was the primary diet of the largest land mammal during the Pleistocene epoch?",
    "Where were the fossils of the largest land mammal during the Pleistocene epoch primarily discovered?",
    "What was the primary color of the uniform worn by the team that won the 2017 Copa Libertadores?",
    "Who was the coach of the team that won the 2017 Copa Libertadores?",
    "What was the main objective of the space mission that first landed humans on Mars, and which agency led this mission?",
    "What was the primary match t-shirt color of the team that won the Copa América in 2011?",
    "Who was the top scorer for the team that won the Copa América in 2011?",
    "What was the primary language spoken in the country that won the FIFA World Cup in 2002?",
    "Who was the head coach of the team that won the FIFA World Cup in 2002?",
    "What was the primary diet of the largest flying bird in history?",
    "Where have largest flying bird in history fossils been found?",
    "What was the primary material used in the construction of the Parthenon, and who was the architect responsible for its design?",
    "What was the primary architectural style of the building where the first United Nations General Assembly was held, and in which city is it located?",
    "What was the primary material used in the construction of the Berlin Wall, and which country was primarily responsible for its construction?",
    "What was the primary match t-shirt color of the team that won the FIFA Club World Cup in 2014, and who was their top scorer?",
    "What was the primary language spoken in the country that won the Eurovision Song Contest in 2010, and who was the winning artist?",
    "What was the primary color of the uniform worn by the team that won the Stanley Cup in 2019?",
    "Who was captain of the team that won the Stanley Cup in 2019?",
    "What was the primary diet of the largest land carnivore during the Cretaceous period?",
    "Where were fossils of the largest land carnivore during the Cretaceous period discovered?",
    "What was the primary language spoken in the country that won the Eurovision Song Contest in 2009, and who was the winner?",
    "Who was the captain of the team that won the FIFA World Cup in 2002?",
    "What was primary position of the captain of the team that won the FIFA World Cup in 2002?",
    "What was the primary color of the jersey worn by the team that won the World Series in 2017?",
    "During which era did the largest herbivorous dinosaur in North America live?",
    "Who was the captain of the team that won the La Liga in 2012?",
    "What was the primary color of the team that won the La Liga in 2012?",
    "What was the primary language spoken in the country that won the FIFA World Cup in 2018?",
    "Who was the director of the movie that won the Academy Award for Best Picture in 1994?",
    "What was the primary theme of the movie that won the Academy Award for Best Picture in 1994?",
    "Where was the building located in which the Declaration of Independence was signed?",
    "What was the primary diet of the largest carnivorous dinosaur in South America?",
    "What was the primary subject of the painting that sold for the highest price in 2020?",
    "Who was the captain of the team that won the UEFA Champions League in 2020?",
    "What was primary jersey color of the team that won the UEFA Champions League in 2020?",
    "Who was the top scorer of the team that won the Copa America in 2019?",
    "What was primary jersey color of the team that won the Copa America in 2019?",
    "Who was the manager of the team that won the FA Cup in 2018?",
    "What was the primary language spoken in the country that won the FIFA World Cup in 2010?",
    "Who was top scorer of the team that won the FIFA World Cup in 2010?",
    "Who was top scorer of the team that won the NBA Finals in 2015?",
    "Who was the captain of the team that won the Copa Libertadores in 2019?",
    "What was primary jersey color of the team that won the Copa Libertadores in 2019?",
    "What was the primary language spoken in the country that hosted the World Cup in 2014?",
    "Who was the top scorer of the team that won the La Liga in 2019?",
    "What was primary jersey color of the team that won the La Liga in 2019?",
    "Who was the manager of the team that won the Serie A in 2019?",
    "Who was the captain of the team that won the Champions League in 2021?",
    "What was the primary jersey color of the team that won the Champions League in 2021?",
    "Who was the captain of the team that won the Premier League in 2017?",
    "What was the primary jersey color of the team that won the Premier League in 2017?",
    "Who was the top scorer for the team that won the UEFA Europa League in 2019?",
    "What was the primary jersey color of the team that won the UEFA Europa League in 2019?",
    "Who was the captain of the team that won the Copa del Rey in 2019?",
    "What was the primary jersey color of the team that won the Copa del Rey in 2019?",
    "Who was the top scorer for the team that won the Bundesliga in 2020?",
    "What was the primary language spoken in the country that hosted the Summer Olympics in 1980?",
    "What was the primary economic activity of the civilization that built the city of Petra?",
    "Who was the captain of the team that won the Copa Libertadores in 2020?",
    "What was primary jersey color of the team that won the Copa Libertadores in 2020?",
    "Who was the top scorer for the team that won the La Liga in 2018?",
    "What was primary jersey color of the team that won the La Liga in 2018?",
    "Who was the top scorer for the team that won the La Liga in 2018?",
    "Who was the captain of the team that won the Serie A in 2020?",
    "What was the primary language spoken in the country that hosted the Winter Olympics in 2006?",
    "Who was the top scorer for the team that won the UEFA Champions League in 2018?",
    "What was primary jersey color of the team that won the UEFA Champions League in 2018?",
    "What was the primary economic activity of the civilization that built the city of Angkor?",
    "Who was the captain of the team that won the FIFA World Cup in 2018?",
    "What was primary jersey color of the team that won the FIFA World Cup in 2018?",
    "What was the primary language spoken in the country that hosted the Summer Olympics in 1972?",
    "Who was the top scorer for the team that won the La Liga in 2020?",
    "What was primary jersey color of the team that won the La Liga in 2020?",
    "Who was the captain of the team that won the FIFA Women's World Cup in 2015?",
    "What was primary jersey color of the team that won the FIFA Women's World Cup in 2015?",
    "What was the primary language spoken in the country that hosted the Winter Olympics in 1994?",
    "Who was the top scorer for the team that won the UEFA Europa League in 2020?",
    "What was the primary language spoken in the country that hosted the Summer Olympics in 1996?",
    "Who was the captain of the team that won the Premier League in 2018?",
    "Who was the top scorer for the team that won the Serie A in 2021?",
    "Who was the captain of the team that won the Copa del Rey in 2020?",
    "What was the primary language spoken in the country that hosted the Summer Olympics in 1984?",
    "What was the primary language spoken in the country that hosted the Winter Olympics in 1988?",
    "What was the primary match t-shirt color of the team that won the MLS Cup in 2020?",
    "What was the primary language spoken in the country that hosted the World Expo in 2010?",
    "What was the primary architectural style of the building where the Treaty of Ghent was signed?",
    "What was the primary match t-shirt color of the team that won the Copa del Rey in 2022?",
    "What was the primary diet of the largest marsupial found in Australia?",
    "What was the primary language spoken in the country that won the Eurovision Song Contest in 2016?",
    "What was the primary match t-shirt color of the team that won the French Cup in 2021?",
    "What was the primary architectural style of the building where the Yalta Conference was held?",
    "What was the primary mission of the space probe that first visited Pluto?",
    "What was the primary language spoken in the country that hosted the FIFA Confederations Cup in 2013?",
    "What was the primary match t-shirt color of the team that won the English Premier League in the 2020-2021 season?",
    "What was the primary diet of the largest predator found in the Amazon rainforest?",
    "What was the primary material used in the construction of the first successful human-powered flight vehicle?",
    "What was the primary language spoken in the country that hosted the Commonwealth Games in 2018?",
    "What was the primary match t-shirt color of the team that won the Scottish Premiership in 2020?",
    "What was the primary architectural style of the building where the Congress of the Confederation was held?",
    "What was the primary language spoken in the country that hosted the FIFA U-20 World Cup in 2017?",
    "What was the primary diet of the largest carnivorous mammal found in South America?",
    "What was the primary material used in the construction of the spacecraft that first orbited Mercury?",
    "What was the primary architectural style of the building where the Kellogg-Briand Pact was signed?",
    "What was the primary material used in the construction of the first space station?",
    "What was the primary match t-shirt color of the team that won the DFB-Pokal in 2021?",
    "What was the primary architectural style of the building where the United States Constitution was drafted?",
    "What was the primary match t-shirt color of the team that won the French Ligue 1 in 2020?",
    "What was the primary diet of the largest herbivore found in South America?",
    "What was the primary match t-shirt color of the team that won the Copa Sudamericana in 2020?",
    "What was the primary language spoken in the country that hosted the 1986 FIFA World Cup?",
    "What was the primary architectural style of the building where the United States Bill of Rights was signed?",
    "What was the primary material used in the construction of the first human-made object to reach the Moon?",
    "What was the primary match t-shirt color of the team that won the FA Women's Super League in 2020?",
    "What was the primary diet of the largest marine predator found in the Pacific Ocean?",
    "What was the primary material used in the construction of the first successful passenger airplane?",
    "What was the primary match t-shirt color of the team that won the Eredivisie in 2021?",
    "What was the primary architectural style of the building where the Declaration of the Rights of Man and of the Citizen was drafted?",
    "What was the primary material used in the construction of the first skyscraper?",
    "What was the primary language spoken in the country that won the UEFA Euro Championship in 2016?",
    "What was the primary architectural style of the building where the Peace of Westphalia was signed?",
    "What was the primary language spoken in the country that hosted the FIFA Women's World Cup in 2019?"
]

### Generate answer prompt

In [None]:
from openai import OpenAI  # needed for the client created inside generate_answer

def generate_answer(question, retrieved_relations):
    # Flatten each relation tuple into one space-separated line.
    formatted_relations = list(map(lambda x: " ".join(x), retrieved_relations))
    prompt = "RETRIEVED RELATIONS:\n" + "\n".join(formatted_relations)
    prompt += "\n\n" + "QUESTION:\n" + question
    prompt += "\n\n" + "INSTRUCTIONS:\n" + """Answer the user's QUESTION using the RETRIEVED RELATIONS above.
Keep your answer grounded in the facts of the RETRIEVED RELATIONS. It is guaranteed that the retrieved relations are related to the question."""

    print()
    print()
    print("****************** generated prompt ******************")
    print(prompt)
    print("******************************************************")
    print()
    print()

    client = OpenAI(
        api_key=input("Enter your OpenAI API Key: "),
    )
    # Apostrophes are stripped and the reply is split on commas; the caller re-joins the pieces.
    return client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="gpt-3.5-turbo",
    ).choices[0].message.content.replace("'", '').split(',')
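
The prompt layout can be checked without calling the API. The sketch below is a minimal illustration, assuming relations are 4-tuples of strings as printed later in this notebook; `build_answer_prompt` and the example question are hypothetical and not part of the pipeline. Unlike `generate_answer`, it strips the trailing space left by an empty properties field.

```python
def build_answer_prompt(question, retrieved_relations):
    """Format relation tuples and a question into one prompt string."""
    # Each relation is assumed to be (entity1, relation, entity2, properties).
    # strip() drops the trailing space left when properties is empty.
    formatted = [" ".join(rel).strip() for rel in retrieved_relations]
    return (
        "RETRIEVED RELATIONS:\n" + "\n".join(formatted)
        + "\n\nQUESTION:\n" + question
    )


example_relations = [
    ("Chelsea", "WON", "Champions League", "2021"),
    ("Chelsea", "SHIRT_COLOUR", "Blue", ""),
]
example_prompt = build_answer_prompt(
    "What was the shirt colour of the team that won the Champions League in 2021?",
    example_relations,
)
print(example_prompt)
```

Printing the prompt this way makes it easy to see exactly which facts the model is given for each hop of the question.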

### Use LLM to extract the relations to search



In [None]:
from openai import OpenAI

client = OpenAI(
    api_key=input("Enter your OpenAI API Key: "),
)

def generate_search_relations(question):
    prompt = f"""Given the following user query: '{question}', suggest which relation I should search for in my knowledge graph. The relations are stored in the format \"Entity1 RelationName Entity2 propertiesOfRelation\", where:

Entity1 is the subject of the relation,
RelationName describes the connection between two entities,
Entity2 is the object of the relation,
propertiesOfRelation includes additional relevant details.
If the query provides clear entities or relation parts, suggest an appropriate relation format to search for, filling in placeholders (e.g. '?') for any missing elements. If only partial information is given, suggest the most likely relation string based on the available parts.

Example format: \"Karim Benzema  TOP_SCORER_IN  Real Madrid  2020 La Liga season\"

Give me each candidate relation on a separate line (at least one, at most five).
Do not include any explanations or apologies in your responses. Do not include any text except the relations.

Some ENTITY or RELATION parts may not be known from the question, so please put ? in their place. Remember that the proper format of a triplet is a relation in the format above.

For example, if I give you this question: "Who was the top scorer of Real Madrid in the 2020 La Liga season?" then you can give me this relation to retrieve: \"?  TOP_SCORER_IN  Real Madrid  2020 La Liga season\"
"""

    client = OpenAI(
        api_key=input("Enter your OpenAI API Key: "),
    )
    result = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="gpt-4o",
    ).choices[0].message.content
    # Strip quotes and parentheses so each line is a plain search string.
    result = result.replace('"', '').replace("'", '').replace('(', '').replace(')', '')
    return result.split('\n')
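
The post-processing of the model reply can be tried offline. The following is a rough sketch, assuming the reply is a newline-separated list of relation strings; `clean_suggestions` and `sample_reply` are hypothetical names, and collapsing repeated whitespace is an addition that the notebook itself does not perform.

```python
def clean_suggestions(raw_response):
    """Turn a raw multi-line LLM reply into a list of search strings."""
    # Remove quotes and parentheses, mirroring the chained replace() calls.
    for ch in "\"'()":
        raw_response = raw_response.replace(ch, "")
    cleaned = []
    for line in raw_response.split("\n"):
        # Drop '?' placeholders and collapse the whitespace they leave behind
        # (the whitespace collapsing is an assumption, not notebook behaviour).
        line = " ".join(line.replace("?", "").split())
        if line:
            cleaned.append(line)
    return cleaned


# Illustrative reply in the same shape as the suggestions printed in this notebook.
sample_reply = (
    '"?  WON  UEFA Champions League 2012"\n\n'
    '"?  HOME_STADIUM_OF  ?  2012 UEFA Champions League winner"'
)
print(clean_suggestions(sample_reply))
```

Removing the placeholders before embedding keeps the search string focused on the known entities and relation names.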

In [None]:
suggested_entity_relations = []

for query in questions:
    print(f"Query: {query}")
    suggestions = generate_search_relations(query)
    suggestions = [s for s in suggestions if s]
    print("Suggested entity-relations:")
    for i, sug in enumerate(suggestions):
        print(sug)
        suggestions[i] = sug.replace("?", "")
    suggested_entity_relations.append(suggestions)
    print("\n\n")

Query: What was the primary match t-shirt color of the winner of the Ballon d'Or in the season of 1998-1999?
Suggested entity-relations:
?  WINNER_OF  Ballon dOr 1998-1999
?  PRIMARY_MATCH_TSHIRT_COLOR  ?  1998-1999 season
?  WINNER_OF  ?  propertiesOfRelation {primaryMatchTshirtColor: ?}
?  BALLON_DOR_WINNER  ?  1998-1999
?  TSHIRT_COLOR  ?  Ballon dOr winner 1998-1999



Query: What is the home stadium of the team that won the UEFA Champions League in 2012?
Suggested entity-relations:
?  WON  UEFA Champions League 2012
?  IS_HOME_STADIUM_OF  ?  name=?
?  HOME_STADIUM_OF  ?  2012 UEFA Champions League winner
Chelsea WON  UEFA Champions League 2012
Chelsea IS_HOME_STADIUM_OF  ?



Query: What was the jersey number of the player who was the top scorer in the 2002 FIFA World Cup?
Suggested entity-relations:
?  TOP_SCORER_IN  2002 FIFA World Cup  ?
?  JERSEY_NUMBER_OF  ?  ?
?  JERSEY_NUMBER_OF  ?  top scorer 2002 FIFA World Cup
Top Scorer 2002 FIFA World Cup  JERSEY_NUMBER_OF  ?
?  TOP_SC

In [None]:
answers = []

for i, question in enumerate(questions):
    print(f"Query {i+1} out of {len(questions)}:")
    print(f"    - Query: {question}")
    print("    - Retrieved pairs:")
    # Embed every candidate relation string as a (1, d) row vector for the index search.
    query_embeddings = [np.array(get_embedding(search_candidate)).reshape(1, -1) for search_candidate in suggested_entity_relations[i]]
    retrieved_pairs = []
    for query_embedding in query_embeddings:
        # Take the 5 nearest stored relations for this candidate; skip -1 entries (no neighbour found).
        D, I = index.search(query_embedding, k=5)
        for idx in I[0]:
            if idx != -1:
                print(f"      - {relations[idx]}")
                retrieved_pairs.append(relations[idx])
    # generate_answer returns comma-split pieces; join them back into one string.
    answers.append("".join(generate_answer(question, retrieved_pairs)))
    print(f"    - Answer: {answers[-1]}")

Streaming output truncated to the last 5000 lines.
      - ('Paris Saint-Germain', 'SHIRT_COLOUR', 'Blue', '')
      - ('Real Madrid', 'SHIRT_COLOUR', 'White', '')
      - ('Chelsea', 'SHIRT_COLOUR', 'Blue', '')
      - ('Sevilla', 'WON', 'Uefa Europa League', '2020')
      - ('Chelsea', 'WON', 'Champions League', '2021')
      - ('Bayern Munich', 'WON', 'Bundesliga', '2020')
      - ('Real Madrid', 'WON', 'Uefa Champions League', '2018')
      - ('Real Madrid', 'WON', 'La Liga', '2020')
      - ('Bayern Munich', 'SHIRT_COLOUR', 'Red', '')
      - ('Chelsea', 'WON', 'Champions League', '2021')
      - ('Paris Saint-Germain', 'SHIRT_COLOUR', 'Blue', '')
      - ('Manchester United', 'SHIRT_COLOUR', 'Red', '')
      - ('Chelsea', 'SHIRT_COLOUR', 'Blue', '')
      - ('Bayern Munich', 'SHIRT_COLOUR', 'Red', '')
      - ('Bayern Munich', 'WON', 'Bundesliga', '2020')
      - ('Manuel Neuer', 'HAS_CAPTAINED', 'Bayern Munich', '2020')
      - ('Bayern Munich', 'WON', 'Uefa Champi
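
The retrieval loop appends the top matches of every candidate string, so the same relation can be passed to the prompt more than once. A minimal sketch of order-preserving de-duplication, assuming relations are hashable tuples as printed above; `dedupe_relations` is a hypothetical helper and not part of the notebook.

```python
def dedupe_relations(relations):
    """Return relations with duplicates removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(relations))


retrieved = [
    ("Chelsea", "WON", "Champions League", "2021"),
    ("Bayern Munich", "SHIRT_COLOUR", "Red", ""),
    ("Chelsea", "WON", "Champions League", "2021"),
]
unique_relations = dedupe_relations(retrieved)
print(unique_relations)
```

Applied to `retrieved_pairs` before calling `generate_answer`, this would shorten the prompt without changing which facts it contains.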

In [None]:
answers

['The primary match t-shirt color of the winner of the Ballon dOr in the season of 1998-1999 was blue. This is because Rivaldo won the Ballon dOr in 1999 and the color associated with his club FC Barcelona is blue.',
 'The home stadium of the team that won the UEFA Champions League in 2012 is Stamford Bridge. The team is Chelsea Football Club.',
 'The retrieved relations do not provide information about the top scorer in the 2002 FIFA World Cup. Therefore the jersey number of the player who was the top scorer in the 2002 FIFA World Cup cannot be determined from the given data.',
 'The main architectural style of the palace where the Treaty of Versailles was signed is Baroque. This is based on the retrieved relation that the Palace of Versailles which includes the Hall of Mirrors where the treaty was signed is of the Baroque architectural style.',
 'The captain of the team that won the ICC Cricket World Cup in 2011 was Mahendra Singh Dhoni. His primary role in the team was as the captai

# Baseline RAG

### Install packages

In [None]:
!pip install datasets
!pip install ragas

Collecting datasets
  Downloading datasets-3.0.0-py3-none-any.whl.metadata (19 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading datasets-3.0.0-py3-none-any.whl (474 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl (39.9 MB)

Collecting ragas
  Downloading ragas-0.1.18-py3-none-any.whl.metadata (5.4 kB)
Collecting pysbd>=0.3.4 (from ragas)
  Downloading pysbd-0.3.4-py3-none-any.whl.metadata (6.1 kB)
Collecting appdirs (from ragas)
  Downloading appdirs-1.4.4-py2.py3-none-any.whl.metadata (9.0 kB)
Downloading ragas-0.1.18-py3-none-any.whl (185 kB)
Downloading pysbd-0.3.4-py3-none-any.whl (71 kB)
Downloading appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Installing collected packages: appdirs, pysbd, ragas
Successfully installed appdirs-1.4.4 pysbd-0.3.4 ragas-0.1.18


### Baseline RAG

In [None]:
import pandas as pd

file_path = 'shuffled_dataset.csv'
data = pd.read_csv(file_path)

translated_texts = data['text']

output_file_path = 'rag_repo.txt'

with open(output_file_path, 'w') as file:
    for text in translated_texts:
        file.write(text + '\n\n')

In [None]:
from langchain_community.document_loaders import TextLoader  # current import path; langchain.document_loaders is deprecated

loader = TextLoader('./rag_repo.txt')
documents = loader.load()

In [None]:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
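
Because `rag_repo.txt` separates texts with blank lines, a splitter that breaks on `"\n\n"` tends to keep short texts whole and only starts a new chunk when the size limit would be exceeded. The sketch below is a simplified illustration of that greedy packing idea, not LangChain's implementation: `pack_paragraphs` is a hypothetical function, and chunk overlap is deliberately omitted.

```python
def pack_paragraphs(text, chunk_size=500, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # The piece fits, or it is oversized on its own and is kept whole.
            current = candidate
        else:
            # Adding the piece would exceed the limit: close the current chunk.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample_text = "aaaa\n\nbbbb\n\ncccc"
sample_chunks = pack_paragraphs(sample_text, chunk_size=10)
print(sample_chunks)
```

With a 10-character limit the first two pieces fit in one chunk and the third starts a new one, which is the behaviour to expect at the 500-character scale used above.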



In [None]:
!pip install langchain openai weaviate-client

Collecting weaviate-client
  Downloading weaviate_client-4.8.1-py3-none-any.whl.metadata (3.6 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting validators==0.34.0 (from weaviate-client)
  Downloading validators-0.34.0-py3-none-any.whl.metadata (3.8 kB)
Collecting authlib<1.3.2,>=1.2.1 (from weaviate-client)
  Downloading Authlib-1.3.1-py2.py3-none-any.whl.metadata (3.8 kB)
Collecting grpcio-tools<2.0.0,>=1.57.0 (from weaviate-client)
  Downloading grpcio_tools-1.66.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.3 kB)
Collecting grpcio-health-checking<2.0.0,>=1.57.0 (from weaviate-client)
  Downloading grpcio_health_checking-1.66.1-py3-none-any.whl.metadata (1.1 kB)
Collecting protobuf<6.0dev,>=5.26.1 (from grpcio-health-checking<2.0.0,>=1.57.0->weaviate-client)
  Downloading protobuf-5.28.1-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Collecting grpcio<2.0.0,>=1.57.0 (from weaviate

In [None]:
from langchain_openai import OpenAIEmbeddings  # current import paths; the langchain.* ones are deprecated
from langchain_community.vectorstores import Weaviate
import weaviate
from weaviate.embedded import EmbeddedOptions
import os

os.environ["OPENAI_API_KEY"] = input("Enter your OpenAI API Key: ")

client = weaviate.Client(
  embedded_options = EmbeddedOptions()
)

vectorstore = Weaviate.from_documents(
    client = client,
    documents = chunks,
    embedding = OpenAIEmbeddings(),
    by_text = False
)

class LoggingRetriever:
    def __init__(self, retriever):
        self.retriever = retriever

    def get_relevant_documents(self, query):
        docs = self.retriever.get_relevant_documents(query)
        print(f"Retrieved documents for query '{query}':")
        for doc in docs:
            print(doc.page_content)
        return docs

logging_retriever = LoggingRetriever(vectorstore.as_retriever())

Python client v3 `weaviate.Client(...)` connections and methods are deprecated and will
            be removed by 2024-11-30.

            Upgrade your code to use Python client v4 `weaviate.WeaviateClient` connections and methods.
                - For Python Client v4 usage, see: https://weaviate.io/developers/weaviate/client-libraries/python
                - For code migration, see: https://weaviate.io/developers/weaviate/client-libraries/python/v3_v4_migration

            If you have to use v3 code, install the v3 client and pin the v3 dependency in your requirements file: `weaviate-client>=3.26.7;<4.0.0`
  client = weaviate.Client(
INFO:weaviate-client:Binary /root/.cache/weaviate-embedded did not exist. Downloading binary from https://github.com/weaviate/weaviate/releases/download/v1.26.1/weaviate-v1.26.1-Linux-amd64.tar.gz
INFO:weaviate-client:Started /root/.cache/weaviate-embedded: process ID 84516


In [None]:
from langchain.prompts import ChatPromptTemplate

template = """You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
Limit your knowledge to the provided data. If you can't answer the question based on the provided data, just say that you don't know.
Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)

In [None]:
from langchain_openai import ChatOpenAI  # current import paths; the langchain.* ones are deprecated
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

def retriever_fn(query):
    docs = logging_retriever.get_relevant_documents(query)
    return {'context': "\n".join([doc.page_content for doc in docs]), 'question': query}

rag_chain = (
    retriever_fn
    | prompt
    | llm
    | StrOutputParser()
)

langchain_answers = []

for query in questions:
    langchain_answers.append(rag_chain.invoke(query))



Streaming output truncated to the last 5000 lines.

The primary match t-shirt color of Borussia Dortmund is yellow.
The primary color of the United States' jersey is white.

Japan won the FIFA Women's World Cup in 2011.

The primary material used in the construction of the Gossamer Albatross was carbon fiber.

Portugal won the Eurovision Song Contest in 2017

France won the FIFA World Cup in 2018.

Cafu primarily played as a right-back.

Manchester City won the English Premier League in the 2017-2018 season.

The primary material used in the construction of Luna 2 was aluminum alloy.
For over 100 years the colours blue and garnet have come together on the shirts of FC BARCELONA players and this is why the Club is also known as the equip blaugrana (blue and garnet team). The colours have always been used on the team kit especially the shirt. The shorts were white for the first 10 years of the Club's history then black and since the 1920s blue. 

Defensa y Justicia won the 

In [None]:
langchain_answers

["The primary match t-shirt color of the winner of the Ballon d'Or in the season of 1998-1999 was not provided in the given context.",
 "The home stadium of the team that won the UEFA Champions League in 2012 is not provided in the context, so I don't know.",
 'Ronaldo wore the jersey number 9 when he was the top scorer in the 2002 FIFA World Cup.\n\n',
 'The main architectural style of the Palace of Versailles, where the Treaty of Versailles was signed, is Baroque.',
 'The captain of the team that won the ICC Cricket World Cup in 2011 was Mahendra Singh Dhoni. His primary role in the team was as a wicketkeeper-batsman and captain. India emerged as the champions of the tournament, with Sri Lanka finishing as the runners-up.',
 'The primary aircraft model used by the airline that was the largest operator in Europe in 2010 was the Boeing 737. Ryanair, the biggest airline in Europe, has adjusted their Boeing 737 planes to have 189 seats, maximizing capacity.',
 'The primary diet of the la