
# Generating a Neo4j Knowledge Graph using the extracted triples.

This notebook builds a knowledge graph from the triples extracted in the previous parts of the project. The triples were sourced from the Wikipedia pages and web pages of the major stakeholders and committee members of the Food Wise and Food Vision projects.

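**Installing required libraries**
- openpyxl : The engine `pandas.read_excel` uses to read the .xlsx file that stores the extracted triples (xlrd 2.0 and later only reads legacy .xls files).
- neo4j : The official Neo4j Python driver, used to connect to the Neo4j Sandbox instance (`neo4j-driver` is the older, deprecated name of the same package).

In [17]:
!pip install openpyxl
!pip install neo4j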

In [18]:
import pandas as pd
import re

**Reading the Excel file that contains the triples and storing it in a dataframe, `triples`.**

In [19]:
triples = pd.read_excel('triples_data.xlsx')
triples.head()

| Unnamed: 0 | chunk_id | head | type | tail |
|---|---|---|---|---|
| 0 | Chunk_1 | IFA2024 | country | Singapore |
| 1 | Chunk_1 | IFA2024 | location | Singapore |
| 2 | Chunk_1 | IFA2024 in Singapore | country | Singapore |
| 3 | Chunk_1 | Jam Jar | subclass of | jam jar |
| 4 | Chunk_1 | oak tree | has parts of the class | leaves |
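The `Unnamed: 0` column appears to be a row index that was written into the spreadsheet when it was saved. A minimal sketch of tidying the dataframe before loading, assuming the column layout shown above: drop leftover `Unnamed` columns, drop rows with a missing head, type or tail (a missing `type` would otherwise make `re.sub` raise a `TypeError`), strip stray whitespace, and drop exact duplicate triples. `clean_triples` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def clean_triples(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a tidied copy of the triples dataframe."""
    cols = ["head", "type", "tail"]
    # Drop leftover index columns such as 'Unnamed: 0'.
    df = df.drop(columns=[c for c in df.columns if str(c).startswith("Unnamed")])
    # Drop rows where any part of the triple is missing.
    df = df.dropna(subset=cols).copy()
    for col in cols:
        # Coerce to string and strip surrounding whitespace.
        df[col] = df[col].astype(str).str.strip()
    # Drop rows that became empty after stripping, then duplicate triples.
    df = df[(df[cols] != "").all(axis=1)]
    return df.drop_duplicates(subset=cols).reset_index(drop=True)


# Small in-memory frame shaped like triples_data.xlsx.
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "chunk_id": ["Chunk_1"] * 4,
    "head": ["IFA2024", "IFA2024 ", None, "IFA2024"],
    "type": ["country", "country", "location", " "],
    "tail": ["Singapore", "Singapore", "Singapore", "Singapore"],
})
cleaned = clean_triples(sample)
print(cleaned)
```

In the notebook this would be applied as `triples = clean_triples(triples)`. Passing `index_col=0` to `pd.read_excel` is another way to keep the saved index out of the data columns.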


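**Cleaning the triple relations: replace each run of non-word characters (anything other than letters, digits and underscores) with a single underscore, so that the relation text can be used as a Neo4j relationship type.**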

In [20]:
def sanitize_relation_type(relation):
    return re.sub(r'\W+', '_', relation)
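`\W` matches any character that is not a letter, digit or underscore, and `+` collapses each run of such characters into one underscore. The sketch below applies the function to relation strings from the table above. It also adds a hypothetical helper, `to_cypher_rel_type`, that upper-cases the result and wraps it in backticks: an unquoted Cypher relationship type cannot start with a digit, and quoting sidesteps that case. Backticks inside the name are not a concern, because `\W` already replaces them.

```python
import re


def sanitize_relation_type(relation):
    # Replace each run of non-word characters with a single underscore.
    return re.sub(r'\W+', '_', relation)


def to_cypher_rel_type(relation):
    """Hypothetical helper: turn raw relation text into a backtick-quoted Cypher type."""
    cleaned = sanitize_relation_type(str(relation)).upper()
    if not cleaned.strip('_'):
        # Nothing but underscores left, so there is no meaningful type name.
        raise ValueError(f"relation {relation!r} has no usable characters")
    # Backticks keep the identifier valid even if it starts with a digit.
    return f"`{cleaned}`"


print(sanitize_relation_type("has parts of the class"))  # has_parts_of_the_class
print(sanitize_relation_type("subclass of"))             # subclass_of
print(to_cypher_rel_type("2nd member-of"))               # `2ND_MEMBER_OF`
```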

**Creating the knowledge graph with the Neo4j driver, using the head, cleaned relation text, and tail of each triple.**

In [21]:
from neo4j import GraphDatabase, basic_auth

driver = GraphDatabase.driver(
  "bolt://3.83.136.242:7687",
  auth=basic_auth("neo4j", "firings-lock-conflicts"))

def create_knowledge_graph(triples):
    with driver.session() as session:
        for _, triple in triples.iterrows():
            head = triple['head']
            # Clean the relation text; str() guards against non-string cells.
            relation = sanitize_relation_type(str(triple['type']))
            tail = triple['tail']

            # Relationship types cannot be passed as query parameters, so the
            # cleaned type is interpolated into the query. Backticks keep it a
            # valid identifier even when it starts with a digit.
            cypher_query = f"""
            MERGE (h:Entity {{name: $head}})
            MERGE (t:Entity {{name: $tail}})
            MERGE (h)-[:`{relation.upper()}`]->(t)
            """

            # Node names are sent as parameters rather than pasted into the query.
            session.run(cypher_query, head=head, tail=tail)


In [22]:
create_knowledge_graph(triples)
driver.close()  # later cells open their own connection
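The loop in `create_knowledge_graph` sends one auto-commit query per triple, which is simple but slow for larger files. A possible alternative, sketched under the assumption of Neo4j 5 Cypher syntax and the same `Entity` label and `name` property: create a uniqueness constraint so that `MERGE` can use an index, then group the triples by relation type (types cannot be query parameters) and send each group as a single `UNWIND` query. `build_batches`, `load_batches` and the constraint name `entity_name` are hypothetical; only `build_batches` runs in the example, since the loader needs a live database.

```python
import re
from collections import defaultdict


def sanitize_relation_type(relation):
    # Replace each run of non-word characters with a single underscore.
    return re.sub(r'\W+', '_', relation)


# Hypothetical constraint name; Neo4j 5 syntax.
CONSTRAINT = (
    "CREATE CONSTRAINT entity_name IF NOT EXISTS "
    "FOR (e:Entity) REQUIRE e.name IS UNIQUE"
)


def build_batches(rows):
    """Hypothetical helper: one UNWIND query per relation type."""
    groups = defaultdict(list)
    for head, relation, tail in rows:
        rel_type = sanitize_relation_type(str(relation)).upper()
        groups[rel_type].append({"head": head, "tail": tail})
    batches = []
    for rel_type, pairs in groups.items():
        # The type is interpolated (backtick-quoted); names stay parameters.
        query = (
            "UNWIND $pairs AS p "
            "MERGE (h:Entity {name: p.head}) "
            "MERGE (t:Entity {name: p.tail}) "
            f"MERGE (h)-[:`{rel_type}`]->(t)"
        )
        batches.append((query, pairs))
    return batches


def _write_batch(tx, query, pairs):
    tx.run(query, pairs=pairs).consume()


def load_batches(driver, batches):
    """Hypothetical loader; assumes an open neo4j 5.x Driver is passed in."""
    with driver.session() as session:
        session.run(CONSTRAINT).consume()
        for query, pairs in batches:
            session.execute_write(_write_batch, query, pairs)


rows = [
    ("IFA2024", "country", "Singapore"),
    ("IFA2024", "location", "Singapore"),
    ("IFA2024 in Singapore", "country", "Singapore"),
]
batches = build_batches(rows)
for query, pairs in batches:
    print(len(pairs), query)
```

With the notebook's dataframe, the rows could come from `triples[['head', 'type', 'tail']].itertuples(index=False, name=None)`. Grouping by type keeps the number of round trips equal to the number of distinct relations rather than the number of triples.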

**Querying the knowledge graph to check the total number of nodes.**

In [23]:
from neo4j import GraphDatabase, basic_auth

driver = GraphDatabase.driver(
  "bolt://3.83.136.242:7687",
  auth=basic_auth("neo4j", "firings-lock-conflicts"))

# COUNT(n) already returns a single row, so no LIMIT is needed.
cypher_query = '''
MATCH (n)
RETURN COUNT(n) AS count
'''

with driver.session(database="neo4j") as session:
  # execute_read replaces read_transaction, which is deprecated in driver 5.x.
  results = session.execute_read(
    lambda tx: tx.run(cypher_query).data())
  for record in results:
    print(record['count'])

driver.close()


606
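As a sanity check, the node count can be compared with the number of distinct names across the `head` and `tail` columns, since each name becomes one `Entity` node. This is a sketch, not a guarantee: the two numbers agree only if the database held no other nodes before loading and the names were loaded unchanged. `MERGE` matches on the exact string, so `Jam Jar` and `jam jar` count as two entities. `expected_entity_count` is a hypothetical helper.

```python
import pandas as pd


def expected_entity_count(df: pd.DataFrame) -> int:
    """Hypothetical helper: number of distinct names across head and tail."""
    # MERGE matches on the exact string, so case variants stay separate.
    names = pd.concat([df["head"], df["tail"]]).dropna().astype(str)
    return names.nunique()


# Rows copied from the preview of triples_data.xlsx above.
sample = pd.DataFrame({
    "head": ["IFA2024", "IFA2024", "IFA2024 in Singapore", "Jam Jar", "oak tree"],
    "type": ["country", "location", "country", "subclass of", "has parts of the class"],
    "tail": ["Singapore", "Singapore", "Singapore", "jam jar", "leaves"],
})
print(expected_entity_count(sample))  # 7
```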


**Relations where 'Enda Kenny' is the head in the triple.**

In [24]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_relations_where_head_is_enda(driver):
    query = """
    MATCH (h:Entity {name: 'Enda Kenny'})-[r]->(t:Entity)
    RETURN h.name AS head, TYPE(r) AS relation, t.name AS tail
    """

    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            relations.append({
                'head': record['head'],
                'relation': record['relation'],
                'tail': record['tail']
            })

    return relations

# Execute the query and get the relations
relations_where_head_is_enda = get_relations_where_head_is_enda(driver)

# Print the relations
print("Relations where 'Enda Kenny' is the head:")
for relation in relations_where_head_is_enda:
    print(relation)

# Close the driver connection
driver.close()

Relations where 'Enda Kenny' is the head:
{'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2011 general election'}
{'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2002 general election'}
{'head': 'Enda Kenny', 'relation': 'SPOUSE', 'tail': 'Fionnuala OKelly'}
{'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2011 election campaign'}
{'head': 'Enda Kenny', 'relation': 'NOTABLE_WORK', 'tail': 'Second national address'}
{'head': 'Enda Kenny', 'relation': 'POSITION_HELD', 'tail': 'European Commission president'}
{'head': 'Enda Kenny', 'relation': 'POSITION_HELD', 'tail': 'European Council president'}
{'head': 'Enda Kenny', 'relation': 'POSITION_HELD', 'tail': 'Prime Minister'}
{'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2007 election'}
{'head': 'Enda Kenny', 'relation': 'EMPLOYER', 'tail': 'TG4'}
{'head': 'Enda Kenny', 'relation': 'EMPLOYER', 'tail': 'RTÉ'}
{'head': 'Enda Kenny', 'relation': 'EMPLOYER', 't
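The query above hard-codes 'Enda Kenny' in both the Cypher string and the function name. A minimal generalisation, assuming the same `Entity`/`name` schema: pass the entity as a `$name` parameter and group the returned rows by relation type, which makes listings like the one above easier to scan. `get_outgoing_relations`, `group_by_relation` and `OUTGOING_QUERY` are hypothetical names; the example groups rows copied from the output above, so it needs no database.

```python
from collections import defaultdict

OUTGOING_QUERY = """
MATCH (h:Entity {name: $name})-[r]->(t:Entity)
RETURN h.name AS head, type(r) AS relation, t.name AS tail
"""


def get_outgoing_relations(driver, name):
    """Hypothetical helper; assumes an open neo4j 5.x Driver is passed in."""
    with driver.session() as session:
        # The entity name is sent as a query parameter.
        return session.execute_read(
            lambda tx: tx.run(OUTGOING_QUERY, name=name).data())


def group_by_relation(rows):
    """Hypothetical helper: map each relation type to the tails it points to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["relation"]].append(row["tail"])
    return dict(grouped)


rows = [
    {"head": "Enda Kenny", "relation": "CANDIDACY_IN_ELECTION", "tail": "2011 general election"},
    {"head": "Enda Kenny", "relation": "SPOUSE", "tail": "Fionnuala OKelly"},
    {"head": "Enda Kenny", "relation": "CANDIDACY_IN_ELECTION", "tail": "2002 general election"},
]
for relation, tails in group_by_relation(rows).items():
    print(relation, "->", ", ".join(tails))
```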

**Committee members and other relations linked to Food Wise: every triple where 'Food Wise' appears in the head or tail, together with every relation whose type contains '_COMMITTEE_MEMBER'.**

In [32]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_all_food_wise_relations(driver):
    query = """
    MATCH (n)-[r]->(m)
    WHERE (n.name CONTAINS 'Food Wise' OR m.name CONTAINS 'Food Wise' OR TYPE(r) CONTAINS '_COMMITTEE_MEMBER')
    RETURN n AS node1, r AS relation, m AS node2
    """

    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            node1 = record['node1']
            relation = record['relation']
            node2 = record['node2']
            relations.append({
                'node1': node1,
                'relation': relation,
                'node2': node2
            })

    return relations

# Execute the query and get the relations
food_wise_relations = get_all_food_wise_relations(driver)

# Print the relations
print("Relations containing 'Food Wise':")
for relation in food_wise_relations:
    print(relation)

# Close the driver connection
driver.close()


Relations containing 'Food Wise':
{'node1': <Node element_id='4:75ed9e99-b1d5-420f-af2b-6768ccc4d558:154' labels=frozenset({'Entity'}) properties={'name': 'Tom Arnold'}>, 'relation': <Relationship element_id='5:75ed9e99-b1d5-420f-af2b-6768ccc4d558:629' nodes=(<Node element_id='4:75ed9e99-b1d5-420f-af2b-6768ccc4d558:154' labels=frozenset({'Entity'}) properties={'name': 'Tom Arnold'}>, <Node element_id='4:75ed9e99-b1d5-420f-af2b-6768ccc4d558:577' labels=frozenset({'Entity'}) properties={'name': 'Food Wise 2025'}>) type='_COMMITTEE_MEMBER' properties={}>, 'node2': <Node element_id='4:75ed9e99-b1d5-420f-af2b-6768ccc4d558:577' labels=frozenset({'Entity'}) properties={'name': 'Food Wise 2025'}>}
{'node1': <Node element_id='4:75ed9e99-b1d5-420f-af2b-6768ccc4d558:578' labels=frozenset({'Entity'}) properties={'name': 'Sharon Buckley'}>, 'relation': <Relationship element_id='5:75ed9e99-b1d5-420f-af2b-6768ccc4d558:630' nodes=(<Node element_id='4:75ed9e99-b1d5-420f-af2b-6768ccc4d558:578' labels=fr
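Returning whole `Node` and `Relationship` objects makes the printout above hard to read. A possible variant, assuming the same schema, returns only the names and the relation type, passes the search term as a parameter, and formats each row as `head -[TYPE]-> tail`. Like the original query, it also matches every `_COMMITTEE_MEMBER` relation whether or not Food Wise is involved. `TERM_QUERY`, `get_term_relations` and `format_triple` are hypothetical names.

```python
TERM_QUERY = """
MATCH (n:Entity)-[r]->(m:Entity)
WHERE n.name CONTAINS $term OR m.name CONTAINS $term
   OR type(r) CONTAINS '_COMMITTEE_MEMBER'
RETURN n.name AS head, type(r) AS relation, m.name AS tail
ORDER BY relation, head
"""


def get_term_relations(driver, term="Food Wise"):
    """Hypothetical helper; assumes an open neo4j 5.x Driver, returns plain dicts."""
    with driver.session() as session:
        return session.execute_read(
            lambda tx: tx.run(TERM_QUERY, term=term).data())


def format_triple(row):
    """Hypothetical helper: render one row as 'head -[RELATION]-> tail'."""
    return f"{row['head']} -[{row['relation']}]-> {row['tail']}"


print(format_triple({"head": "Tom Arnold",
                     "relation": "_COMMITTEE_MEMBER",
                     "tail": "Food Wise 2025"}))
# Tom Arnold -[_COMMITTEE_MEMBER]-> Food Wise 2025
```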