# Assignment 3: Grounding LLMs with a Knowledge Graph

- In this Jupyter Notebook we will use an architectural pattern in which a knowledge graph is used together with an LLM to provide grounded answers to users.

- We will use the Neo4j movie knowledge graph we saw 2 days ago.

- To create an instance of this graph, go to the [Neo4j sandbox](https://sandbox.neo4j.com/), log in, and click on "New Project". From here, select the Movies graph and click "Create".

---

> Evangelia P. Panourgia, Master Student in Data Science, AUEB <br />
> Department of Informatics, Athens University of Economics and Business <br />
> eva.panourgia@aueb.gr <br/><br/>

## Setup


In [1]:
# install the necessary libraries
!pip install langchain
!pip install -U langchain-openai langchain-community
!pip install openai
!pip install neo4j
!pip install pandas

In [2]:
# The OpenAI key will be valid for the duration of this hands-on. We are using ChatGPT 3.5
# The key is read from the environment instead of being hard-coded in the notebook.
import os

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
OPENAI_ENDPOINT = 'https://api.openai.com/v1/embeddings'


In [3]:
# import libraries for Neo4j & LLM
from langchain_openai import ChatOpenAI
from langchain.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain.prompts import PromptTemplate

# import data preprocessing libraries
import pandas as pd
import json
import os



In [4]:
# Connect to GPT. Temperature is set to 0 as we want consistency in the GPT responses
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY, temperature=0
)

In [5]:
# Connect to the movie graph by specifying the Bolt URL, the username and the password. These are available under the "Connection Details" tab in the instance we created.

movie_graph = Neo4jGraph(
    url="bolt://3.232.107.187:7687",
    username="neo4j",
    password="loaf-dirt-tears",
)

# Load the Data 

In [6]:
df_cypher = pd.read_csv('cypher_dataset.csv', sep=';')
df_cypher

Unnamed: 0,main_category,sub_category,nlp_question,cypher_code,actual_returned_output
0,Simple Questions,Attribute Queries,What is the tagline of the movie The Matrix?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",2.json
1,Simple Questions,Attribute Queries,When was the movie The Matrix released?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",3.json
2,Simple Questions,Attribute Queries,What is the birth year of Keanu Reeves?,"MATCH (p:Person {name: ""Keanu Reeves""}) RETURN...",4.json
3,Simple Questions,Relationship Queries,Who acted in the movie The Matrix?,MATCH (a:Person)-[:ACTED_IN]->(m:Movie {title:...,5.json
4,Simple Questions,Relationship Queries,Who produced the movie The Matrix?,MATCH (p:Person)-[:PRODUCED]->(m:Movie {title:...,6.json
5,Simple Questions,Relationship Queries,Who wrote the movie A Few Good Men?,"MATCH (w:Person)-[:WROTE]->(m:Movie {title: ""A...",7.json
6,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",8.json
7,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (a:Person {name: ""Keanu Reeves""})-[:ACTE...",9.json
8,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (p:Person {name: ""Joel Silver""})-[:PRODU...",10.json
9,Advanced Queries,Two Relationships,What is the name of the movie that Lilly Wacho...,"MATCH (d:Person {name: ""Lilly Wachowski""})-[:D...",11.json


## Pattern 1: The LLM answers questions by transforming them into Cypher queries and executing them against a knowledge graph

In [7]:
# We define a prompt template to generate a cypher query from an input question, given a knowledge graph schema.

CYPHER_GENERATION_TEMPLATE = """
Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}

Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.

The question is:
{question}"""


cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

In [8]:
# We create a Cypher QA chain and pass as parameters the llm, the graph, and the prompt template
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=movie_graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests = True
)

In [9]:
# Now, we can use the chain to transform natural language questions into Cypher queries and get an answer by executing the queries
def answer_question_using_cypher(question):
    try:
        answer = cypher_chain.run(question)
        print("Answer: ", answer)
    except Exception as e:
        print("Problem answering the question: ", e)


In [10]:
index_json = 2
for nlp_question in df_cypher["nlp_question"]:
    print(f"{index_json} ===================={nlp_question}============================================")
    answer_question_using_cypher(nlp_question)
    index_json +=1



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "The Matrix"})
RETURN m.tagline
Full Context:
[{'m.tagline': 'Welcome to the Real World'}]

> Finished chain.
Answer:  Welcome to the Real World.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "The Matrix"})
RETURN m.released
Full Context:
[{'m.released': 1999}]

> Finished chain.
Answer:  The movie The Matrix was released in 1999.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Keanu Reeves"})
RETURN p.born as birth_year;
Full Context:
[{'birth_year': 1964}]

> Finished chain.
Answer:  Keanu Reeves was born in 1964.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title = 'The Matrix'
RETURN p.name
Full Context:
[{'p.name': 'Emil 

- The generated text shown in the terminal is truncated ("Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings..."), so I saved it permanently in the folder `task_1`, in a file named `report_llm.txt`.
- To evaluate it, I split my screen: on the left I have the current JSON file (see folder `json`), which contains the "actual" output of the Cypher query (I created and tested these during the dataset creation), and on the right I have the generated LLM report.
- I compare them and fill the list below with the following values:
    - 1: if the prediction is correct
    - 0: if the prediction is wrong.
- I do this in order to extend the data frame with an additional column named `gpt_prediction`, which is useful for the evaluation process.
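
The manual comparison described above could be partly automated. The sketch below is only an illustration, not the procedure used in this notebook. It assumes that each JSON file holds a list of row dictionaries and that the final answer of the chain is available as a string. The helper name `all_values_mentioned` is hypothetical, and substring matching is a crude heuristic, so a human check is still needed.

```python
def all_values_mentioned(expected_rows, answer_text):
    """Return 1 if every expected value appears in the answer text, else 0.

    expected_rows: list of dicts (assumed layout of one file in the `json` folder).
    answer_text: the final natural-language answer printed by the chain.
    """
    answer = answer_text.lower()
    for row in expected_rows:
        for value in row.values():
            # Case-insensitive substring check; a heuristic, not exact matching.
            if str(value).lower() not in answer:
                return 0
    return 1


# Expected row and answers taken from the chain output shown earlier.
expected = [{"m.tagline": "Welcome to the Real World"}]
label_correct = all_values_mentioned(expected, "Welcome to the Real World.")
label_wrong = all_values_mentioned(expected, "I don't know the answer.")
```

A label produced this way would only be a first pass: an answer can mention all values and still be wrong in other respects.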

In [11]:
# Labels assigned manually (1 = correct, 0 = wrong).
List_gpt_prediction = [
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
0, # Answer: "I don't know the answer.", although the query finds it
0, # Answer: "I don't know the answer.", although the query finds it
0, # Answer: "I don't know the answer.", although the query finds it
1,
0, # returns fewer rows in the final answer
0, # returns fewer rows in the final answer
1,
0, # Answer: "I don't know the answer.", although the query finds it
0, # Answer: "I don't know the answer."
0, # Answer: "I don't know the answer."
0, # different results returned
0, # syntax error
0, # Answer: "I don't know the answer."
0, # syntax error
0, # syntax error
0, # syntax error
]


# Error analysis

In [12]:
df_cypher["gpt_prediction"] = List_gpt_prediction
df_cypher

Unnamed: 0,main_category,sub_category,nlp_question,cypher_code,actual_returned_output,gpt_prediction
0,Simple Questions,Attribute Queries,What is the tagline of the movie The Matrix?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",2.json,1
1,Simple Questions,Attribute Queries,When was the movie The Matrix released?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",3.json,1
2,Simple Questions,Attribute Queries,What is the birth year of Keanu Reeves?,"MATCH (p:Person {name: ""Keanu Reeves""}) RETURN...",4.json,1
3,Simple Questions,Relationship Queries,Who acted in the movie The Matrix?,MATCH (a:Person)-[:ACTED_IN]->(m:Movie {title:...,5.json,1
4,Simple Questions,Relationship Queries,Who produced the movie The Matrix?,MATCH (p:Person)-[:PRODUCED]->(m:Movie {title:...,6.json,1
5,Simple Questions,Relationship Queries,Who wrote the movie A Few Good Men?,"MATCH (w:Person)-[:WROTE]->(m:Movie {title: ""A...",7.json,1
6,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",8.json,1
7,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (a:Person {name: ""Keanu Reeves""})-[:ACTE...",9.json,1
8,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (p:Person {name: ""Joel Silver""})-[:PRODU...",10.json,1
9,Advanced Queries,Two Relationships,What is the name of the movie that Lilly Wacho...,"MATCH (d:Person {name: ""Lilly Wachowski""})-[:D...",11.json,1


In [13]:
df_analysis_error_predictions = df_cypher[df_cypher["gpt_prediction"] == 0]
df_analysis_error_predictions

Unnamed: 0,main_category,sub_category,nlp_question,cypher_code,actual_returned_output,gpt_prediction
11,Advanced Queries,Two Relationships,What movies did Rob Reiner direct that were al...,"MATCH (w:Person {name: ""Rob Reiner""})-[:DIRECT...",13.json,0
12,Advanced Queries,Two Relationships with Attribute Filter Queries,What movies did Rob Reiner direct and Aaron So...,"MATCH (w:Person {name: ""Rob Reiner""})-[:DIRECT...",14.json,0
13,Advanced Queries,Two Relationships with Attribute Filter Queries,Which movies did Jim Cash write and Val Kilmer...,"MATCH (w:Person {name: ""Jim Cash""})-[:WROTE]->...",15.json,0
15,Advanced Queries,Exclusions of Entities,What movies did Lana Wachowski direct that do ...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",17.json,0
16,Advanced Queries,Exclusions of Entities,What movies did Lana Wachowski direct that do ...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",18.json,0
18,Advanced Queries,Relationship with Aggregation Queries,Which producers have produced at least four mo...,MATCH (p:Person)-[:PRODUCED]->(m:Movie) WHERE ...,20.json,0
19,Advanced Queries,Relationship with Aggregation Queries,How many movies did each director direct betwe...,MATCH (d:Person)-[:DIRECTED]->(m:Movie) WHERE ...,21.json,0
20,Advanced Queries,Relationship with Aggregation Queries,What is the total sum of ratings for movies di...,MATCH (d:Person)-[:DIRECTED]->(m:Movie) WHERE ...,22.json,0
21,Advanced Queries,Multi-Hop and Nested Relationship Queries,Which movies have different people as the writ...,MATCH (w:Person)-[:WROTE]->(m:Movie)<-[:DIRECT...,23.json,0
22,Advanced Queries,Multi-Hop and Nested Relationship Queries,Which movies have different people as the revi...,MATCH (r:Person)-[:REVIEWED]->(m:Movie)<-[:PRO...,24.json,0


- Make the column `actual_returned_output` the index, in order to know which JSON file with the actual output to look at in the folder `json`.
- It contains the actual output (we ran the queries beforehand to know the correct answer for each NLP question).

In [14]:
df_analysis_error_predictions.set_index('actual_returned_output', inplace=True)

In [15]:
df_analysis_error_predictions

actual_returned_output,main_category,sub_category,nlp_question,cypher_code,gpt_prediction
13.json,Advanced Queries,Two Relationships,What movies did Rob Reiner direct that were al...,"MATCH (w:Person {name: ""Rob Reiner""})-[:DIRECT...",0
14.json,Advanced Queries,Two Relationships with Attribute Filter Queries,What movies did Rob Reiner direct and Aaron So...,"MATCH (w:Person {name: ""Rob Reiner""})-[:DIRECT...",0
15.json,Advanced Queries,Two Relationships with Attribute Filter Queries,Which movies did Jim Cash write and Val Kilmer...,"MATCH (w:Person {name: ""Jim Cash""})-[:WROTE]->...",0
17.json,Advanced Queries,Exclusions of Entities,What movies did Lana Wachowski direct that do ...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",0
18.json,Advanced Queries,Exclusions of Entities,What movies did Lana Wachowski direct that do ...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",0
20.json,Advanced Queries,Relationship with Aggregation Queries,Which producers have produced at least four mo...,MATCH (p:Person)-[:PRODUCED]->(m:Movie) WHERE ...,0
21.json,Advanced Queries,Relationship with Aggregation Queries,How many movies did each director direct betwe...,MATCH (d:Person)-[:DIRECTED]->(m:Movie) WHERE ...,0
22.json,Advanced Queries,Relationship with Aggregation Queries,What is the total sum of ratings for movies di...,MATCH (d:Person)-[:DIRECTED]->(m:Movie) WHERE ...,0
23.json,Advanced Queries,Multi-Hop and Nested Relationship Queries,Which movies have different people as the writ...,MATCH (w:Person)-[:WROTE]->(m:Movie)<-[:DIRECT...,0
24.json,Advanced Queries,Multi-Hop and Nested Relationship Queries,Which movies have different people as the revi...,MATCH (r:Person)-[:REVIEWED]->(m:Movie)<-[:PRO...,0


- For each error, identified by its JSON file number: 13, 14, 15, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28
- By analyzing both the actual output (folder `json`) and the LLM report (folder `task_1`)
- We have the following observations:
    - For `13`: The Cypher is correct and returns the data, yet the LLM answers "I don't know the answer". So, the error is at the level of the final output.
    - For `14`: The Cypher is correct and returns the data, yet the LLM answers "I don't know the answer". So, the error is at the level of the final output.
    - For `15`: The Cypher is correct and returns the data, yet the LLM answers "I don't know the answer". So, the error is at the level of the final output.
    - For `17`: Again the Cypher is correct, but the final output contains fewer values. It states only that Lana Wachowski directed "Speed Racer" and "Cloud Atlas" that do not feature Keanu Reeves as an actor, whereas we have the following movies:
      ```
      [{"m.title": "Speed Racer"},{"m.title": "Cloud Atlas"},{"m.title": "The Matrix Revolutions"},{"m.title": "The Matrix Reloaded"},{"m.title": "The Matrix"}]
      ```
    - For `18`: Again the Cypher is correct, but the final output contains fewer values.
    - For `20`: The Cypher is correct and returns the data, yet the LLM answers "I don't know the answer". So, the error is at the level of the final output.
    - For `21` {create figure}: The LLM runs a different Cypher query from the correct one, confusing the meaning of the attributes `born` and `released`. Furthermore, even if the query had been correct, the answer would still be wrong: it says "I don't know" instead of handling the null values.
    - For `22`: The Cypher is correct and returns the data, yet the LLM answers "I don't know the answer". So, the error is at the level of the final output.
    - For `23` {create figure}: The LLM runs a different Cypher query from the correct one (it omits the WHERE clause).
    - For `24`: Cypher syntax error.
    - For `25` {create figure}: The LLM runs a different Cypher query from the correct one (it omits the WHERE clause).
    - For `26`, `27`, `28`: Cypher syntax error.

------

* **Error at the level of the answer, returning "I don't know the answer"** (see Figure #structure_matrix_query), although the Cypher query is correct. This category accounts for **35.71%** of the detected errors (5 of 14) and includes the following index numbers: 13, 14, 15, 20, 22.

* **Error at the level of the answer, returning fewer data**, although the Cypher query is correct and returns **all** the needed data. This category accounts for **14.29%** of the detected errors (2 of 14) and includes the following index numbers: 17, 18.

* **Error at the level of the generated Cypher** (see Figure #structure_matrix_query), where the Cypher query is incorrect (e.g., using the incorrect attribute "born" instead of "released"). This category accounts for **21.43%** of the detected errors (3 of 14) and includes the following index numbers: 21, 23, 25.

* **Cypher syntax error**. This category accounts for **28.57%** of the detected errors (4 of 14) and includes the following index numbers: 24, 26, 27, 28.
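
The shares of the four error categories can be recomputed from the index numbers listed above. This is a minimal sketch: the dictionary keys are shortened labels chosen here for illustration, while the numbers are copied from the list.

```python
# JSON file numbers per error category, copied from the list above.
error_categories = {
    "answer: 'I don't know' despite correct Cypher": [13, 14, 15, 20, 22],
    "answer: fewer rows than the query returned": [17, 18],
    "generated Cypher is incorrect": [21, 23, 25],
    "Cypher syntax error": [24, 26, 27, 28],
}

total_errors = sum(len(ids) for ids in error_categories.values())

# Share of each category among all detected errors, in percent.
shares = {
    name: round(100 * len(ids) / total_errors, 2)
    for name, ids in error_categories.items()
}

for name, share in shares.items():
    print(f"{share:5.2f}%  {name}")
```

Since every error belongs to exactly one category, the counts add up to the 14 detected errors.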


# Evaluation Metrics / Performance Estimation

## Accuracy Formula

The formula for calculating accuracy is as follows:

- Accuracy = (Number of Correct Predictions) / (Total Number of Predictions)


In [16]:
# Initialize a dictionary storing the accuracy of each approach, useful for our final report.
dict_accuracy_per_approach = {}

In [17]:
number_of_incorrect_prediction = df_analysis_error_predictions.shape[0]
total_number_of_prediction = df_cypher.shape[0]
number_of_correct_predictions = total_number_of_prediction - number_of_incorrect_prediction

accuracy_main_category_task_1 = number_of_correct_predictions / total_number_of_prediction
print(f"The accuracy based on our main categories is : {round(accuracy_main_category_task_1,2)}")

The accuracy based on our main categories is : 0.48


In [18]:
# store the accuracy to our dictionary 
dict_accuracy_per_approach.update({'initial':round(accuracy_main_category_task_1,2)})
dict_accuracy_per_approach

{'initial': 0.48}

Notes:

- An accuracy of 0.48 means that 48% of the predictions made by the LLM (`ChatOpenAI`) are correct.
- In other words, out of all the predictions, only 48% match the actual results, while the remaining 52% are incorrect.

- Although 0.48 is numerically close to the 0.5 that random guessing achieves in a binary classification problem, that baseline does not apply here: the model has to generate a full Cypher query and an answer, so a random guess would almost never be correct. The score still suggests that the model needs further improvement or tuning to achieve better accuracy.

# Additional Evaluation / Analysis for the sub-categories 

- Now, we will use the data frame containing the "error" rows in order to extract specific insights for the sub-categories.
- We want to find the vulnerabilities of our model in order to warn the user about the "dangerous" cases.

In [19]:
number_of_incorrect_prediction

14

In [20]:
df_analysis_error_predictions[["main_category","sub_category"]].value_counts()

main_category     sub_category                                   
Advanced Queries  Multi-Hop and Nested Relationship Queries          3
                  Relationship with Aggregation Queries              3
                  Sub-Query-Cardinality Queries                      3
                  Exclusions of Entities                             2
                  Two Relationships with Attribute Filter Queries    2
                  Two Relationships                                  1
Name: count, dtype: int64

From the "table" derived from the `value_counts()` function, we can conclude the following:

- Let's examine the coverage of errors for the "dominant" categories:
- Our model is "perfect" for the main category named "Simple Questions"
- But it makes mistakes for the "Advanced Queries"
- For `Multi-Hop and Nested Relationship Queries` the percentage of errors is `21.43%` of the total number of errors
- For `Relationship with Aggregation Queries` the percentage of errors is `21.43%` of the total number of errors
- For `Sub-Query-Cardinality Queries` the percentage of errors is `21.43%` of the total number of errors

In [21]:
(100*3)/14

21.428571428571427
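
The same calculation can be done for every sub-category at once. The sketch below uses plain Python and the counts copied from the `value_counts()` output above; the variable names are illustrative.

```python
# Error counts per sub-category, copied from the value_counts() output above.
errors_per_sub_category = {
    "Multi-Hop and Nested Relationship Queries": 3,
    "Relationship with Aggregation Queries": 3,
    "Sub-Query-Cardinality Queries": 3,
    "Exclusions of Entities": 2,
    "Two Relationships with Attribute Filter Queries": 2,
    "Two Relationships": 1,
}

total_errors = sum(errors_per_sub_category.values())

# Percentage of all errors that falls into each sub-category.
error_share = {
    sub_category: round(100 * count / total_errors, 2)
    for sub_category, count in errors_per_sub_category.items()
}
```

With the data frame at hand, `value_counts(normalize=True)` returns the same proportions as fractions.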

# Task 2

# Approach 1 for "Potential Improvement" - Zero Shot Learning

In [22]:
# We define a prompt template to generate a cypher query from an input question, given a knowledge graph schema.

CYPHER_GENERATION_TEMPLATE = """
Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}

Guidelines for Generating Accurate Cypher Queries and Responses:
- Provide Complete Output: Ensure your response includes the full output of the Cypher query, with no placeholder responses or partial answers.
- Retrieve All Relevant Data: Structure the Cypher query to return all required data points. Ensure the query is comprehensive and does not omit any needed information.
- Use Correct Properties and Filters: Reference properties and relationships exactly as defined in the schema, with careful attention to property names and required filtering conditions.
- Follow Cypher Syntax Strictly: Adhere closely to Cypher syntax to ensure that queries run without syntax errors.

Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.

The question is:
{question}"""


cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)

In [23]:
# We create a Cypher QA chain and pass as parameters the llm, the graph, and the prompt template
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=movie_graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests = True
)

In [24]:
# Now, we can use the chain to transform natural language questions into Cypher queries and get an answer by executing the queries
def answer_question_using_cypher(question):
    try:
        answer = cypher_chain.run(question)
        print("Answer: ", answer)
    except Exception as e:
        print("Problem answering the question: ", e)

In [25]:
index_json = 2
for nlp_question in df_cypher["nlp_question"]:
    print(f"{index_json} ===================={nlp_question}============================================")
    answer_question_using_cypher(nlp_question)
    index_json +=1



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "The Matrix"})
RETURN m.tagline;
Full Context:
[{'m.tagline': 'Welcome to the Real World'}]

> Finished chain.
Answer:  Welcome to the Real World.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "The Matrix"})
RETURN m.released;
Full Context:
[{'m.released': 1999}]

> Finished chain.
Answer:  The Matrix was released in 1999.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Keanu Reeves"})
RETURN p.born as BirthYear
Full Context:
[{'BirthYear': 1964}]

> Finished chain.
Answer:  Keanu Reeves was born in 1964.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
WHERE m.title = 'The Matrix'
RETURN p.name, r.roles
Full Context:
[{'p.name': 'Emil E

- We manually investigated the generated report stored in the file `task 2/zero_shot_learning_promt_report.txt`
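
A report file like the one mentioned above could be produced by capturing what the loop prints, instead of copying the truncated cell output. The sketch below is one possible way, under stated assumptions: `run_and_capture` and `fake_answer` are hypothetical names, `fake_answer` stands in for `answer_question_using_cypher` so that the example runs without Neo4j or OpenAI, and the chain's verbose trace is assumed to go through Python's standard output.

```python
import contextlib
import io


def run_and_capture(questions, answer_fn, first_index=2):
    """Run answer_fn on each question and return everything that was printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        for index, question in enumerate(questions, start=first_index):
            print(f"{index} ==== {question} ====")
            answer_fn(question)
    return buffer.getvalue()


def fake_answer(question):
    # Stand-in for answer_question_using_cypher (assumption for this sketch).
    print("Answer: ", "placeholder for: " + question)


report = run_and_capture(["What is the tagline of the movie The Matrix?"], fake_answer)

# In the notebook, the text could then be written to a report file, e.g.:
# with open("report.txt", "w", encoding="utf-8") as f:
#     f.write(report)
```

The captured text may still contain terminal color codes from the verbose trace, so it can need cleaning before it is read side by side with the JSON files.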

In [26]:
list_zero =[
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
0, # Answer: "I don't know the answer.", but the Cypher is correct
1,
0, # same as 13: Answer: "I don't know the answer.", but the Cypher is correct
0, # same: Answer: "I don't know the answer.", but the Cypher is correct
0, # fewer rows in the final answer
0, # fewer rows in the final answer
1,
1,
0, # wrong Cypher (born)
0, # same
0, # Cypher omits WHERE
0, # Cypher WHERE clause
0, # Cypher WHERE clause
0, # syntax error
0, # syntax error
0 # syntax error
]

- We will extend our data frame `df_cypher` with the column `zero_eval`

In [27]:
df_cypher['zero_eval'] = list_zero

- We will compare the columns `gpt_prediction` and `zero_eval`

In [28]:
df_cypher[['gpt_prediction','zero_eval']]

Unnamed: 0,gpt_prediction,zero_eval
0,1,1
1,1,1
2,1,1
3,1,1
4,1,1
5,1,1
6,1,1
7,1,1
8,1,1
9,1,1


- Filter the data frame to see the differences between the two columns

In [29]:
df_cypher[df_cypher['gpt_prediction'] != df_cypher['zero_eval']]

Unnamed: 0,main_category,sub_category,nlp_question,cypher_code,actual_returned_output,gpt_prediction,zero_eval
12,Advanced Queries,Two Relationships with Attribute Filter Queries,What movies did Rob Reiner direct and Aaron So...,"MATCH (w:Person {name: ""Rob Reiner""})-[:DIRECT...",14.json,0,1
14,Advanced Queries,Two Relationships with Attribute Filter Queries,What movies did Lilly Wachowski direct and Emi...,"MATCH (d:Person {name: ""Lilly Wachowski""})-[:D...",16.json,1,0
18,Advanced Queries,Relationship with Aggregation Queries,Which producers have produced at least four mo...,MATCH (p:Person)-[:PRODUCED]->(m:Movie) WHERE ...,20.json,0,1


In [30]:
df_cypher

Unnamed: 0,main_category,sub_category,nlp_question,cypher_code,actual_returned_output,gpt_prediction,zero_eval
0,Simple Questions,Attribute Queries,What is the tagline of the movie The Matrix?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",2.json,1,1
1,Simple Questions,Attribute Queries,When was the movie The Matrix released?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",3.json,1,1
2,Simple Questions,Attribute Queries,What is the birth year of Keanu Reeves?,"MATCH (p:Person {name: ""Keanu Reeves""}) RETURN...",4.json,1,1
3,Simple Questions,Relationship Queries,Who acted in the movie The Matrix?,MATCH (a:Person)-[:ACTED_IN]->(m:Movie {title:...,5.json,1,1
4,Simple Questions,Relationship Queries,Who produced the movie The Matrix?,MATCH (p:Person)-[:PRODUCED]->(m:Movie {title:...,6.json,1,1
5,Simple Questions,Relationship Queries,Who wrote the movie A Few Good Men?,"MATCH (w:Person)-[:WROTE]->(m:Movie {title: ""A...",7.json,1,1
6,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",8.json,1,1
7,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (a:Person {name: ""Keanu Reeves""})-[:ACTE...",9.json,1,1
8,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (p:Person {name: ""Joel Silver""})-[:PRODU...",10.json,1,1
9,Advanced Queries,Two Relationships,What is the name of the movie that Lilly Wacho...,"MATCH (d:Person {name: ""Lilly Wachowski""})-[:D...",11.json,1,1


- We can observe that with our new prompt the LLM improved on two questions that it had previously answered incorrectly (rows 12 and 18).
- But it made a mistake on a new one (row 14), which belongs to the advanced queries category, as expected.
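
The improved and regressed questions can also be listed programmatically from the two label lists. The sketch below is a small illustration with a hypothetical helper `compare_labels`. It uses only the labels of rows 11 to 18, copied from the two manual lists in this notebook.

```python
def compare_labels(before, after):
    """Return (fixed, regressed) positions where the 0/1 labels differ."""
    if len(before) != len(after):
        raise ValueError("both label lists must have the same length")
    pairs = list(enumerate(zip(before, after)))
    fixed = [i for i, (b, a) in pairs if b == 0 and a == 1]
    regressed = [i for i, (b, a) in pairs if b == 1 and a == 0]
    return fixed, regressed


# Labels of rows 11..18 only (first prompt vs. zero-shot prompt).
first_row = 11
baseline_labels = [0, 0, 0, 1, 0, 0, 1, 0]
zero_shot_labels = [0, 1, 0, 0, 0, 0, 1, 1]

fixed, regressed = compare_labels(baseline_labels, zero_shot_labels)

# Positions are relative to row 11, so shift them back to data frame rows.
fixed_rows = [first_row + i for i in fixed]
regressed_rows = [first_row + i for i in regressed]
```

Separating the two directions shows that an unchanged or slightly higher accuracy can hide regressions on individual questions.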

In [31]:
len(df_cypher[df_cypher.zero_eval==0])

13

In [32]:
# Keep only the rows that ChatGPT got wrong according to the column zero_eval
number_of_incorrect_prediction_approach_1 = len(df_cypher[df_cypher.zero_eval == 0])
total_number_of_prediction = df_cypher.shape[0]
number_of_correct_predictions = total_number_of_prediction - number_of_incorrect_prediction_approach_1
print(f"The accuracy based on our zero shot learning approach is : {round(number_of_correct_predictions / total_number_of_prediction,2)}")

The accuracy based on our zero shot learning approach is : 0.52


In [33]:
dict_accuracy_per_approach.update({'zero shot learning':round(number_of_correct_predictions / total_number_of_prediction,2)})
dict_accuracy_per_approach

{'initial': 0.48, 'zero shot learning': 0.52}

# Approach 2 for "Potential Improvement" - Few Shot Learning

- Now, we will try to improve our prompt, and as a result the answers of our LLM, by providing examples based on our error analysis.
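
A few-shot prompt of this kind can also be assembled from a list of question/query pairs. The sketch below is an illustration, not the template used in this notebook: the helper names and the shortened wording are assumptions. It shows one detail that matters in practice: the template is filled with `str.format`-style placeholders (as `PromptTemplate` does by default), so literal braces inside the example Cypher queries have to be doubled.

```python
def escape_braces(text):
    """Double literal braces so that str.format keeps them as plain text."""
    return text.replace("{", "{{").replace("}", "}}")


def build_few_shot_template(examples):
    """Build a template with {schema} and {question} placeholders.

    examples: list of (question, cypher) pairs used as demonstrations.
    """
    parts = [
        "Task: Generate a Cypher statement to query a graph database.",
        "Schema:",
        "{schema}",
        "",
        "Examples:",
    ]
    for number, (question, cypher) in enumerate(examples, start=1):
        parts.append(f"{number}. Question: {escape_braces(question)}")
        parts.append(f"   Cypher: {escape_braces(cypher)}")
    parts += ["", "The question is:", "{question}"]
    return "\n".join(parts)


examples = [
    (
        "What is the tagline of the movie The Matrix?",
        'MATCH (m:Movie {title: "The Matrix"}) RETURN m.tagline',
    ),
]
template = build_few_shot_template(examples)

# Filling the placeholders turns the doubled braces back into single ones.
prompt = template.format(schema="(schema text)", question="Who acted in The Matrix?")
```

Keeping the examples in a list makes it easier to add one demonstration per error category found in the analysis.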

In [34]:
# We define a prompt template to generate a cypher query from an input question, given a knowledge graph schema.

CYPHER_GENERATION_TEMPLATE = """
Task: Generate a Cypher statement to query a graph database.

Instructions:
- Use only the relationship types and properties specified in the schema below.
- Avoid using any relationship types or properties not included in the schema.

**Schema:**
{schema}

**Examples Cypher Queries**

1. **Correct Use of the Attributes "born" and "released" Based on Node Type**  
   
   - **Question:** How many movies has each director directed after 2009?
   - **Cypher Query:**
     ```
     MATCH (d:Person)-[:DIRECTED]->(m:Movie)
     WHERE m.released > 2009
     RETURN d.name AS directorName, COUNT(m) AS directedMovies
     ```

    - **Question:** Who are the individuals born between 1900 and 2020 that directed movies, and how many movies has each person directed?
    - **Cypher Query:**
     ```
    MATCH (d:Person)-[:DIRECTED]->(m:Movie)
    WHERE d.born >= 1900 AND d.born <= 2020
    RETURN d.name, COUNT(m) as numMoviesDirected
     ```

2. **Ensuring Different Individuals for Specific Roles for **three** relationships**  
   - **Question:** Which movies released after 2010 have a writer, director, and actor who were each born before 1980, and are all different individuals?
   - **Cypher Query:**
     ```
     MATCH (w:Person)-[:WROTE]->(m:Movie)<-[:DIRECTED]-(d:Person)
     MATCH (m)<-[:ACTED_IN]-(a:Person)
     WHERE w <> d AND d <> a AND w <> a
       AND m.released > 2010
       AND w.born < 1980
       AND d.born < 1980
       AND a.born < 1980
     RETURN DISTINCT m.title AS Movie, m.released AS ReleaseYear,
                     w.name AS Writer, w.born AS WriterBorn,
                     d.name AS Director, d.born AS DirectorBorn,
                     a.name AS Actor, a.born AS ActorBorn
     ORDER BY m.released DESC
     ```

3. **Usage of subquery for Queries related to Cardinality**    
   - **Question:** For each distinct relationship type in the graph, which node has the lowest number of connections of that relationship type?
   - **Cypher Query:**
     ```
    MATCH ()-[r]->() 
    WITH DISTINCT type(r) AS relationshipType 
    CALL {{ 
        WITH relationshipType 
        MATCH (n)-[r]->()  
        WHERE type(r) = relationshipType 
        RETURN n AS Node, COUNT(r) AS connectionCount 
        ORDER BY connectionCount ASC 
        LIMIT 1
    }} 
    RETURN relationshipType AS RelationshipType, 
        Node.name AS NodeName, 
        connectionCount AS NumberOfConnections 
    ORDER BY relationshipType;
     ```

**Guidelines for Generating Accurate Cypher Queries:**
- **Complete Output**: Ensure your response includes the full output of the Cypher query (e.g. if the cypher query returns 5 rows, don't answer "I don't know the answer").
- **Retrieve All Relevant Data**: Include all required data points in the query (e.g. if the cypher query returns 5 rows then in the final output display all of them).
- **Use Correct Properties and Filters**: Reference properties and relationships exactly as defined in the schema (e.g. understand the difference between the attributes "born" and "released").
- **Follow Cypher Syntax Strictly**: Adhere closely to Cypher syntax to avoid errors (e.g. don't use unsupported functions like `size()` with pattern expressions).

**Important:**
- Do not include explanations or apologies.
- Respond only to requests to construct a Cypher statement.
- Your response should include only the generated Cypher statement.

The question is:
{question}
"""

cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)
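The doubled braces around the `CALL {{ ... }}` subquery in the template matter: as far as we know, `PromptTemplate` uses f-string style formatting by default, so single braces are read as input variables and literal braces must be escaped. The sketch below illustrates the same rule with plain `str.format` as a stand-in (it does not call LangChain); the shortened template text and the sample schema string are illustrative assumptions.

```python
# Plain-Python stand-in for the prompt template: str.format follows the same
# brace rules ({name} is a placeholder, {{ and }} are literal braces).
template = (
    "Schema:\n{schema}\n\n"
    "CALL {{ WITH relationshipType MATCH (n)-[r]->() RETURN n LIMIT 1 }}\n\n"
    "The question is:\n{question}"
)

# Hypothetical values, used only to show the substitution.
rendered = template.format(
    schema="(:Person)-[:ACTED_IN]->(:Movie)",
    question="Who acted in The Matrix?",
)
print(rendered)
```

In the rendered text the subquery appears with single braces, which is the form Cypher expects.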

In [35]:
CYPHER_GENERATION_TEMPLATE

'\nTask: Generate a Cypher statement to query a graph database.\n\nInstructions:\n- Use only the relationship types and properties specified in the schema below.\n- Avoid using any relationship types or properties not included in the schema.\n\n**Schema:**\n{schema}\n\n**Examples Cypher Queries**\n\n1. **Correct Use of the Attributes "born" and "released" Based on Node Type**  \n   \n   - **Question:** How many movies has each director directed after 2009?\n   - **Cypher Query:**\n     ```\n     MATCH (d:Person)-[:DIRECTED]->(m:Movie)\n     WHERE m.released > 2009\n     RETURN d.name AS directorName, COUNT(m) AS directedMovies\n     ```\n\n    - **Question:** Who are the individuals born between 1900 and 2020 that directed movies, and how many movies has each person directed?\n    - **Cypher Query:**\n     ```\n    MATCH (d:Person)-[:DIRECTED]->(m:Movie)\n    WHERE d.born >= 1900 AND d.born <= 2020\n    RETURN d.name, COUNT(m) as numMoviesDirected\n     ```\n\n2. **Ensuring Different I

In [36]:
# We create a Cypher QA chain and pass as parameters the llm, the graph, and the prompt template
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=movie_graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True,
)

In [37]:
# Now, we can use the chain to transform natural language questions into Cypher queries and get an answer by executing the queries
def answer_question_using_cypher(question):
    try:
        answer = cypher_chain.run(question)
        print("Answer: ", answer)
    except Exception as e:
        print("Problem answering the question: ", e)

In [38]:
index_json = 2
for nlp_question in df_cypher["nlp_question"]:
    print(f"{index_json} ===================={nlp_question}============================================")
    answer_question_using_cypher(nlp_question)
    index_json += 1



> Entering new GraphCypherQAChain chain...


Generated Cypher:
MATCH (m:Movie {title: "The Matrix"})
RETURN m.tagline as Tagline
Full Context:
[{'Tagline': 'Welcome to the Real World'}]

> Finished chain.
Answer:  Welcome to the Real World.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "The Matrix"})
RETURN m.released as ReleaseYear
Full Context:
[{'ReleaseYear': 1999}]

> Finished chain.
Answer:  The movie The Matrix was released in 1999.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Keanu Reeves"})
RETURN p.born AS BirthYear
Full Context:
[{'BirthYear': 1964}]

> Finished chain.
Answer:  Keanu Reeves was born in 1964.


> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.title = "The Matrix"
RETURN p.name as Actor
Full Context:


- Each answer is manually labeled 0 if the final output is wrong compared with the correct one, and 1 otherwise.

In [39]:
few_list = [1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
0,  # answered "I don't know", but the Cypher is correct
0,  # syntax error
0,  # wrong query
0,  # answered "I don't know"
1,
1,
0,  # answered "I don't know", but the Cypher is correct
0,  # correct query, but fewer rows in the final answer
0,  # correct Cypher, wrong final answer (improved)
1,
0,  # SyntaxError
0,  # improved
0,  # correct Cypher, but answered "I don't know" (improved)
]

In [40]:
df_cypher['few_eval'] = few_list

In [41]:
df_cypher

Unnamed: 0,main_category,sub_category,nlp_question,cypher_code,actual_returned_output,gpt_prediction,zero_eval,few_eval
0,Simple Questions,Attribute Queries,What is the tagline of the movie The Matrix?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",2.json,1,1,1
1,Simple Questions,Attribute Queries,When was the movie The Matrix released?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",3.json,1,1,1
2,Simple Questions,Attribute Queries,What is the birth year of Keanu Reeves?,"MATCH (p:Person {name: ""Keanu Reeves""}) RETURN...",4.json,1,1,1
3,Simple Questions,Relationship Queries,Who acted in the movie The Matrix?,MATCH (a:Person)-[:ACTED_IN]->(m:Movie {title:...,5.json,1,1,1
4,Simple Questions,Relationship Queries,Who produced the movie The Matrix?,MATCH (p:Person)-[:PRODUCED]->(m:Movie {title:...,6.json,1,1,1
5,Simple Questions,Relationship Queries,Who wrote the movie A Few Good Men?,"MATCH (w:Person)-[:WROTE]->(m:Movie {title: ""A...",7.json,1,1,1
6,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",8.json,1,1,1
7,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (a:Person {name: ""Keanu Reeves""})-[:ACTE...",9.json,1,1,1
8,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (p:Person {name: ""Joel Silver""})-[:PRODU...",10.json,1,1,1
9,Advanced Queries,Two Relationships,What is the name of the movie that Lilly Wacho...,"MATCH (d:Person {name: ""Lilly Wachowski""})-[:D...",11.json,1,1,1


In [42]:
# Count the rows labeled 0 (incorrect) for the few-shot prompt and derive the number of correct ones
total_number_of_prediction = df_cypher.shape[0]
number_of_incorrect_prediction_approach_2 = len(df_cypher[df_cypher.few_eval == 0])
number_of_correct_predictions = total_number_of_prediction - number_of_incorrect_prediction_approach_2
number_of_correct_predictions

17

In [43]:
dict_accuracy_per_approach.update({'few shot learning':round(number_of_correct_predictions / total_number_of_prediction,2)})
dict_accuracy_per_approach

{'initial': 0.48, 'zero shot learning': 0.52, 'few shot learning': 0.63}

In [44]:
dict_accuracy_per_approach  # accuracy for all cases: initial, approach 1 (zero-shot), approach 2 (few-shot)

{'initial': 0.48, 'zero shot learning': 0.52, 'few shot learning': 0.63}
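Each accuracy above is simply the share of 1-labels in the corresponding evaluation list, rounded to two digits. A minimal sketch of that calculation follows; the helper name `accuracy` is hypothetical, and the sample list only mirrors the few-shot counts reported above (17 correct out of 27), not the order of the real labels.

```python
def accuracy(labels):
    """Share of entries labeled 1 (correct), rounded to two digits."""
    if not labels:
        raise ValueError("labels must not be empty")
    return round(sum(labels) / len(labels), 2)


# Same counts as the few-shot evaluation: 17 correct answers, 10 incorrect ones.
few_shot_like = [1] * 17 + [0] * 10
few_shot_accuracy = accuracy(few_shot_like)
print(few_shot_accuracy)
```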

- Additionally, for the "simple", "zero-shot learning" and "few-shot learning" prompts, we calculate an accuracy that looks only at the generated Cypher query and ignores errors in the final output.
- To calculate this quickly, we reuse the binary evaluation lists and change a 0 into a 1 wherever the attached comment shows that the Cypher itself was correct.
    - e.g. comments that trigger the conversion: the answer was "I don't know" although the Cypher was correct, or the final answer returned fewer rows than the query did.
- For this case we added the needed txt files in the folders task1, task 2:
    - `binary_eval_focused_on_cypher.txt`, 
    - `zero_list_focused_on_cypher.txt`, 
    - `few_shot_learning_focused_on_cypher.txt`
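The conversion described above was done by hand. As a rough sketch of the same rule, assuming each label were stored together with a short reason code (the reason codes and the helper name `focus_on_cypher` are hypothetical and do not appear in the notebook), it could be automated like this:

```python
# Hypothetical reason codes for failures that happen after a correct Cypher query.
CYPHER_OK_REASONS = {"dont_know_but_cypher_ok", "fewer_rows_in_answer"}


def focus_on_cypher(labelled):
    """Return labels where output-only failures count as correct (1)."""
    return [
        1 if label == 1 or reason in CYPHER_OK_REASONS else 0
        for label, reason in labelled
    ]


# Illustrative sample, not the real evaluation data.
sample = [
    (1, None),
    (0, "dont_know_but_cypher_ok"),
    (0, "syntax_error"),
    (0, "fewer_rows_in_answer"),
]
focused = focus_on_cypher(sample)
print(focused)
```

Keeping the reason next to the label would make the Cypher-focused lists reproducible instead of maintaining separate hand-edited copies.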

In [45]:
# Simple prompt: evaluation based on errors in the Cypher query, not at the output level
simple_focused_on_cypher_errors = [
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1, 
1,  # answered "I don't know" although the query found the result -> converted to 1
1,  # answered "I don't know" although the query found the result -> converted to 1
1,  # answered "I don't know" although the query found the result -> converted to 1
1,
1,  # returns fewer rows in the final answer -> converted to 1
1,  # returns fewer rows in the final answer -> converted to 1
1,
1,  # answered "I don't know" although the query found the result -> converted to 1
0,  # answered "I don't know"
1,  # answered "I don't know" -> converted to 1
0,  # different return values
0,  # syntax error
0,  # answered "I don't know"
0,  # syntax error
0,  # syntax error
0,  # syntax error
]


In [46]:
# zero shot 
list_zero_focused_on_cypher =[
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,  # answered "I don't know", but the Cypher is correct -> converted to 1
1,
1,  # same as 13: answered "I don't know", but the Cypher is correct -> converted to 1
1,  # same: answered "I don't know", but the Cypher is correct -> converted to 1
1,  # fewer rows in the final answer -> converted to 1
1,  # fewer rows in the final answer -> converted to 1
1,
1,
0,  # wrong Cypher: born
1,  # same -> converted to 1
0,  # wrong Cypher: WHERE clause omitted
0,  # wrong Cypher: WHERE clause
0,  # wrong Cypher: WHERE clause
0,  # syntax error
0,  # syntax error
0,  # syntax error
]

In [47]:
# few shot 
few_list_focused = [
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,  # answered "I don't know", but the Cypher is correct -> converted to 1
0,  # syntax error
0,  # wrong query
0,  # answered "I don't know"
1,
1,
1,  # answered "I don't know", but the Cypher is correct -> converted to 1
1,  # correct query, but fewer rows in the final answer -> converted to 1
1,  # correct Cypher, wrong final answer (improved) -> converted to 1
1,
0,  # SyntaxError
0,  # wrong query
1,  # correct Cypher, but answered "I don't know" (improved) -> converted to 1
]

In [48]:
# {'initial': 0.48, 'zero shot learning': 0.52, 'few shot learning': 0.63}

In [49]:
# Count the correct (label 1) entries per approach; the accuracy is computed in the next cell
simple_focused_correct = simple_focused_on_cypher_errors.count(1)  # number of correct labels
zero_focused_correct = list_zero_focused_on_cypher.count(1)  # number of correct labels
few_focused_correct = few_list_focused.count(1)  # number of correct labels

In [50]:
# Accuracy = number of entries equal to 1 / total number of rows, rounded to two digits.
# Each count is divided by the length of its own list (all three lists have the same length).
print('Simple Case: Accuracy focused on cypher query errors:', round(simple_focused_correct / len(simple_focused_on_cypher_errors), 2))
print('Zero-Shot Learning Case: Accuracy focused on cypher query errors:', round(zero_focused_correct / len(list_zero_focused_on_cypher), 2))
print('Few-Shot learning: Accuracy focused on cypher query errors:', round(few_focused_correct / len(few_list_focused), 2))

Simple Case: Accuracy focused on cypher query errors: 0.74
Zero-Shot Learning Case: Accuracy focused on cypher query errors: 0.74
Few-Shot learning: Accuracy focused on cypher query errors: 0.81


In [51]:
# Extend the DataFrame to see in which categories our best prompt still makes errors (for future work)
df_cypher

Unnamed: 0,main_category,sub_category,nlp_question,cypher_code,actual_returned_output,gpt_prediction,zero_eval,few_eval
0,Simple Questions,Attribute Queries,What is the tagline of the movie The Matrix?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",2.json,1,1,1
1,Simple Questions,Attribute Queries,When was the movie The Matrix released?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",3.json,1,1,1
2,Simple Questions,Attribute Queries,What is the birth year of Keanu Reeves?,"MATCH (p:Person {name: ""Keanu Reeves""}) RETURN...",4.json,1,1,1
3,Simple Questions,Relationship Queries,Who acted in the movie The Matrix?,MATCH (a:Person)-[:ACTED_IN]->(m:Movie {title:...,5.json,1,1,1
4,Simple Questions,Relationship Queries,Who produced the movie The Matrix?,MATCH (p:Person)-[:PRODUCED]->(m:Movie {title:...,6.json,1,1,1
5,Simple Questions,Relationship Queries,Who wrote the movie A Few Good Men?,"MATCH (w:Person)-[:WROTE]->(m:Movie {title: ""A...",7.json,1,1,1
6,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",8.json,1,1,1
7,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (a:Person {name: ""Keanu Reeves""})-[:ACTE...",9.json,1,1,1
8,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (p:Person {name: ""Joel Silver""})-[:PRODU...",10.json,1,1,1
9,Advanced Queries,Two Relationships,What is the name of the movie that Lilly Wacho...,"MATCH (d:Person {name: ""Lilly Wachowski""})-[:D...",11.json,1,1,1


In [52]:
# Add the list as a new column to the DataFrame
df_cypher['gpt_prediction_focused_cypher'] = simple_focused_on_cypher_errors
df_cypher['zero_eval_focused_cypher'] = list_zero_focused_on_cypher
df_cypher['few_eval_focused_cypher'] = few_list_focused

In [53]:
df_cypher  # preview the extended DataFrame

Unnamed: 0,main_category,sub_category,nlp_question,cypher_code,actual_returned_output,gpt_prediction,zero_eval,few_eval,gpt_prediction_focused_cypher,zero_eval_focused_cypher,few_eval_focused_cypher
0,Simple Questions,Attribute Queries,What is the tagline of the movie The Matrix?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",2.json,1,1,1,1,1,1
1,Simple Questions,Attribute Queries,When was the movie The Matrix released?,"MATCH (m:Movie {title: ""The Matrix""}) RETURN m...",3.json,1,1,1,1,1,1
2,Simple Questions,Attribute Queries,What is the birth year of Keanu Reeves?,"MATCH (p:Person {name: ""Keanu Reeves""}) RETURN...",4.json,1,1,1,1,1,1
3,Simple Questions,Relationship Queries,Who acted in the movie The Matrix?,MATCH (a:Person)-[:ACTED_IN]->(m:Movie {title:...,5.json,1,1,1,1,1,1
4,Simple Questions,Relationship Queries,Who produced the movie The Matrix?,MATCH (p:Person)-[:PRODUCED]->(m:Movie {title:...,6.json,1,1,1,1,1,1
5,Simple Questions,Relationship Queries,Who wrote the movie A Few Good Men?,"MATCH (w:Person)-[:WROTE]->(m:Movie {title: ""A...",7.json,1,1,1,1,1,1
6,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (d:Person {name: ""Lana Wachowski""})-[:DI...",8.json,1,1,1,1,1,1
7,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (a:Person {name: ""Keanu Reeves""})-[:ACTE...",9.json,1,1,1,1,1,1
8,Simple Questions,Relationship Filter Queries,What are the titles and release years of movie...,"MATCH (p:Person {name: ""Joel Silver""})-[:PRODU...",10.json,1,1,1,1,1,1
9,Advanced Queries,Two Relationships,What is the name of the movie that Lilly Wacho...,"MATCH (d:Person {name: ""Lilly Wachowski""})-[:D...",11.json,1,1,1,1,1,1


In [54]:
# Look at the rows where our best approach (few-shot) still generates wrong Cypher
df_cypher[['main_category','sub_category','few_eval_focused_cypher']][df_cypher.few_eval_focused_cypher==0]

Unnamed: 0,main_category,sub_category,few_eval_focused_cypher
15,Advanced Queries,Exclusions of Entities,0
16,Advanced Queries,Exclusions of Entities,0
17,Advanced Queries,Exclusions of Entities,0
24,Advanced Queries,Sub-Query-Cardinality Queries,0
25,Advanced Queries,Sub-Query-Cardinality Queries,0
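To summarize where the few-shot prompt still fails, the failing rows can be counted per sub-category. The sketch below uses only the standard library and copies the five rows from the filtered output above; the variable names are illustrative.

```python
from collections import Counter

# Rows copied from the filtered output above: (row index, sub_category).
failing_rows = [
    (15, "Exclusions of Entities"),
    (16, "Exclusions of Entities"),
    (17, "Exclusions of Entities"),
    (24, "Sub-Query-Cardinality Queries"),
    (25, "Sub-Query-Cardinality Queries"),
]

# Count how many Cypher-level errors remain in each sub-category.
errors_per_category = Counter(sub_category for _, sub_category in failing_rows)
for sub_category, n_errors in errors_per_category.most_common():
    print(f"{sub_category}: {n_errors}")
```

With the DataFrame at hand, the same count could presumably be obtained with `value_counts()` on the `sub_category` column of the filtered frame.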
