<a href="https://colab.research.google.com/github/AyushiKashyapp/foodwise_knowledgeDB/blob/main/Neo4jGraphCreation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generating a Neo4j Knowledge Graph using the extracted triples.

This notebook builds a knowledge graph from the triples extracted in the previous parts of the project. The triples were sourced from the Wikipedia pages and web pages of the major stakeholders and committee members of the Food Wise and Food Vision projects.

**Installing required libraries**
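- openpyxl : Engine that pandas uses to read the .xlsx file storing the extracted triples (xlrd 2.0 and later reads only legacy .xls files).
- neo4j : Official Neo4j Python driver (formerly published as neo4j-driver), used to connect to the Neo4j Sandbox instance.

In [17]:
!pip install openpyxl
!pip install neo4j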



In [18]:
import pandas as pd
import re

**Reading the Excel file that contains the triples and storing them in a dataframe 'triples'.**

In [19]:
triples = pd.read_excel('triples_data.xlsx')
triples.head()

Unnamed: 0,chunk_id,head,type,tail
0,Chunk_1,IFA2024,country,Singapore
1,Chunk_1,IFA2024,location,Singapore
2,Chunk_1,IFA2024 in Singapore,country,Singapore
3,Chunk_1,Jam Jar,subclass of,jam jar
4,Chunk_1,oak tree,has parts of the class,leaves


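**Cleaning the triple relations: replace each run of non-word characters (anything other than letters, digits, and underscores) with a single underscore, so the relation text can be used as a Neo4j relationship type.**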

In [20]:
def sanitize_relation_type(relation):
    return re.sub(r'\W+', '_', relation)
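
To make the effect of the cleaning step concrete, here is a small check on relation strings taken from the `triples.head()` preview. The function is repeated so the snippet runs on its own. The last input, with a leading space, is a hypothetical value added only to show how stray whitespace turns into a leading underscore.

```python
import re

def sanitize_relation_type(relation):
    # Same definition as in the cell above, repeated so this snippet is self-contained.
    return re.sub(r'\W+', '_', relation)

# The first three strings come from the dataframe preview;
# the last one is a hypothetical input with a leading space.
for raw in ['country', 'subclass of', 'has parts of the class', ' committee member']:
    print(repr(raw), '->', sanitize_relation_type(raw).upper())
```

Because `\W+` matches a whole run of characters, several consecutive spaces or punctuation marks collapse into one underscore rather than one underscore each.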

**Creating the knowledge database with the Neo4j driver, using the head, the cleaned relation text, and the tail of each triple.**

In [21]:
from neo4j import GraphDatabase, basic_auth

driver = GraphDatabase.driver(
  "bolt://3.83.136.242:7687",
  auth=basic_auth("neo4j", "firings-lock-conflicts"))

def create_knowledge_graph(triples):
    """Merge every (head, relation, tail) row of the dataframe into Neo4j."""
    with driver.session() as session:
        for _, triple in triples.iterrows():
            head = triple['head']
            # Clean the relation text so it can be used as a relationship type.
            relation = sanitize_relation_type(triple['type'])
            tail = triple['tail']

            # Node names are passed as query parameters; the relationship type
            # is interpolated into the query text after cleaning.
            cypher_query = f"""
            MERGE (h:Entity {{name: $head}})
            MERGE (t:Entity {{name: $tail}})
            MERGE (h)-[:{relation.upper()}]->(t)
            """

            session.run(cypher_query, head=head, tail=tail)


In [22]:
create_knowledge_graph(triples)
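
The loop above sends one query per triple. As a possible alternative, the rows could be grouped by relation type and sent as one `UNWIND` query per type. The sketch below only builds the query text and parameters, so it runs without a database; `build_batched_merges` is a hypothetical helper name, and it assumes the relation types have already been cleaned and uppercased.

```python
from collections import defaultdict

def build_batched_merges(rows):
    """Group (head, relation_type, tail) rows by relation type.

    Hypothetical helper: returns a list of (cypher_query, parameters) pairs,
    one pair per relation type. relation_type is assumed to be sanitized already.
    """
    grouped = defaultdict(list)
    for head, relation_type, tail in rows:
        grouped[relation_type].append({'head': head, 'tail': tail})

    statements = []
    for relation_type, pairs in grouped.items():
        query = (
            "UNWIND $pairs AS pair\n"
            "MERGE (h:Entity {name: pair.head})\n"
            "MERGE (t:Entity {name: pair.tail})\n"
            f"MERGE (h)-[:{relation_type}]->(t)"
        )
        statements.append((query, {'pairs': pairs}))
    return statements

# Rows taken from the dataframe preview, with relation types already cleaned.
rows = [
    ('IFA2024', 'COUNTRY', 'Singapore'),
    ('IFA2024', 'LOCATION', 'Singapore'),
    ('IFA2024 in Singapore', 'COUNTRY', 'Singapore'),
]
for query, params in build_batched_merges(rows):
    print(query)
    print(params)
```

Each pair could then be passed to `session.run(query, params)`. Whether this is noticeably faster than the per-row loop depends on the size of the data and the network latency to the sandbox, so it is worth measuring before switching.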

**Querying Knowledge DB to check the total number of nodes.**

In [23]:
from neo4j import GraphDatabase, basic_auth

driver = GraphDatabase.driver(
  "bolt://3.83.136.242:7687",
  auth=basic_auth("neo4j", "firings-lock-conflicts"))

# COUNT returns a single row, so no LIMIT clause is needed.
cypher_query = '''
MATCH (n)
RETURN COUNT(n) AS count
'''

with driver.session(database="neo4j") as session:
  # execute_read replaces the deprecated read_transaction.
  results = session.execute_read(
    lambda tx: tx.run(cypher_query).data())
  for record in results:
    print(record['count'])

driver.close()

606
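
One way to sanity-check this number: since each `MERGE` on `name` creates at most one `Entity` node per distinct name, the node count should equal the number of distinct strings across the head and tail columns. That holds only under the assumptions that the database was empty before loading and that no names are missing. A minimal sketch, using the five preview rows as sample data (`count_distinct_entities` is a hypothetical helper name):

```python
def count_distinct_entities(pairs):
    """Count distinct names appearing as a head or a tail."""
    names = set()
    for head, tail in pairs:
        names.add(head)
        names.add(tail)
    return len(names)

# (head, tail) pairs from the five rows shown in the dataframe preview.
preview = [
    ('IFA2024', 'Singapore'),
    ('IFA2024', 'Singapore'),
    ('IFA2024 in Singapore', 'Singapore'),
    ('Jam Jar', 'jam jar'),
    ('oak tree', 'leaves'),
]
print(count_distinct_entities(preview))  # 7
```

For the full dataset the same function could be called as `count_distinct_entities(zip(triples['head'], triples['tail']))`. Note that 'Jam Jar' and 'jam jar' count as two entities, because the match on `name` is case-sensitive.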


**Relations where 'Enda Kenny' is the head in the triple.**

In [24]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_relations_where_head_is_enda(driver):
    query = """
    MATCH (h:Entity {name: 'Enda Kenny'})-[r]->(t:Entity)
    RETURN h.name AS head, TYPE(r) AS relation, t.name AS tail
    """

    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            relations.append({
                'head': record['head'],
                'relation': record['relation'],
                'tail': record['tail']
            })

    return relations

# Execute the query and get the relations
relations_where_head_is_enda = get_relations_where_head_is_enda(driver)

# Print the relations
print("Relations where 'Enda Kenny' is the head:")
for relation in relations_where_head_is_enda:
    print(relation)

# Close the driver connection
driver.close()

Relations where 'Enda Kenny' is the head:
{'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2011 general election'}
{'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2002 general election'}
{'head': 'Enda Kenny', 'relation': 'SPOUSE', 'tail': 'Fionnuala OKelly'}
{'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2011 election campaign'}
{'head': 'Enda Kenny', 'relation': 'NOTABLE_WORK', 'tail': 'Second national address'}
{'head': 'Enda Kenny', 'relation': 'POSITION_HELD', 'tail': 'European Commission president'}
{'head': 'Enda Kenny', 'relation': 'POSITION_HELD', 'tail': 'European Council president'}
{'head': 'Enda Kenny', 'relation': 'POSITION_HELD', 'tail': 'Prime Minister'}
{'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2007 election'}
{'head': 'Enda Kenny', 'relation': 'EMPLOYER', 'tail': 'TG4'}
{'head': 'Enda Kenny', 'relation': 'EMPLOYER', 'tail': 'RTÉ'}
{'head': 'Enda Kenny', 'relation': 'EMPLOYER', 't
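
The flat list printed above can be easier to read when grouped by relation type. The following sketch does that grouping in plain Python on a few rows copied from the output; `group_tails_by_relation` is a hypothetical helper name, not part of the notebook.

```python
from collections import defaultdict

def group_tails_by_relation(relations):
    """Map each relation type to the list of tails it points to."""
    grouped = defaultdict(list)
    for rel in relations:
        grouped[rel['relation']].append(rel['tail'])
    return dict(grouped)

# A few rows copied from the output above.
sample = [
    {'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2011 general election'},
    {'head': 'Enda Kenny', 'relation': 'CANDIDACY_IN_ELECTION', 'tail': '2002 general election'},
    {'head': 'Enda Kenny', 'relation': 'SPOUSE', 'tail': 'Fionnuala OKelly'},
    {'head': 'Enda Kenny', 'relation': 'POSITION_HELD', 'tail': 'Prime Minister'},
]
for relation, tails in group_tails_by_relation(sample).items():
    print(relation, '->', tails)
```

With the live query result, the same call would be `group_tails_by_relation(relations_where_head_is_enda)`.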

**All relations involving Food Wise: triples where 'Food Wise' appears in the head or tail, or where the relation type contains '_COMMITTEE_MEMBER'.**

In [61]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_all_food_wise_relations(driver):
    query = """
    MATCH (n)-[r]->(m)
    WHERE (n.name CONTAINS 'Food Wise' OR m.name CONTAINS 'Food Wise' OR TYPE(r) CONTAINS '_COMMITTEE_MEMBER')
    RETURN n AS node1, r AS relation, m AS node2
    """

    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            node1 = record['node1']
            relation = record['relation']
            node2 = record['node2']
            relations.append({
                'node1': node1.get('name', ''),
                'relation': relation.type,
                'node2': node2.get('name', '')
            })

    return relations

# Execute the query and get the relations
food_wise_relations = get_all_food_wise_relations(driver)

# Print the relations
print("Committee Members of 'Food Wise':")
for relation in food_wise_relations:
    print(relation)

# Close the driver connection
driver.close()


Committee Members of 'Food Wise':
{'node1': 'Tom Arnold', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Sharon Buckley', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Laura Burke', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Ailish Byrne', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Kieran Calnan', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Philip Carroll', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Frank Convery', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Thomas Duffy', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Brendan Dunford', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Julie Ennis', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Paul Finnerty', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}
{'node1': 'Joe
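
The query above returns every relation that touches a Food Wise node, not only memberships. If just the member names are wanted, the result list could be filtered afterwards. A minimal sketch, assuming the dictionary layout printed above; `committee_members` is a hypothetical helper, and the last sample row is an invented non-membership relation included only to show the filter at work.

```python
def committee_members(relations, project='Food Wise'):
    """Return node1 names linked to the project by a _COMMITTEE_MEMBER relation."""
    return [
        rel['node1']
        for rel in relations
        if rel['relation'] == '_COMMITTEE_MEMBER' and project in rel['node2']
    ]

sample = [
    # Rows copied from the output above.
    {'node1': 'Tom Arnold', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'},
    {'node1': 'Sharon Buckley', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'},
    {'node1': 'Laura Burke', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'},
    # Hypothetical row, not from the graph, to show a relation that is filtered out.
    {'node1': 'Food Wise 2025', 'relation': 'INSTANCE_OF', 'node2': 'strategy'},
]
print(committee_members(sample))
```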

**Relations where the head contains 'European Union'.**

In [62]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_all_food_wise_relations(driver):
    query = """
    MATCH (n)-[r]->(m)
    WHERE n.name CONTAINS 'European Union'
    RETURN r AS relation, m as node2
    """


    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            #node1 = record['node1']
            relation = record['relation']
            node2 = record['node2']
            relations.append({
                #'node1': node1,
                'relation': relation.type,
                'node2': node2.get('name', '')
            })

    return relations

# Execute the query and get the relations
food_wise_relations = get_all_food_wise_relations(driver)

# Print the relations
print("Head containing 'European Union':")
for relation in food_wise_relations:
    print(relation)

# Close the driver connection
driver.close()


Head containing 'European Union':
{'relation': 'SIGNIFICANT_EVENT', 'node2': 'Brexit referendum'}
{'relation': 'PARTICIPANT_IN', 'node2': 'Brexit'}
{'relation': 'SIGNIFICANT_EVENT', 'node2': 'Brexit'}
{'relation': 'MEMBER_OF', 'node2': 'international'}
{'relation': 'INSTANCE_OF', 'node2': 'international'}
{'relation': 'INCEPTION', 'node2': '2013'}
{'relation': 'COUNTRY', 'node2': 'Ireland'}
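
The search cells in this notebook share one pattern and differ mainly in the literal search term. One possible way to avoid repeating the query text is to build it once and pass the term as a query parameter. The sketch below only constructs the Cypher string, so it runs without a database; `build_contains_query` and its flags are hypothetical names.

```python
def build_contains_query(match_head=True, match_tail=True):
    """Build a Cypher query that filters on a $term parameter.

    Hypothetical helper: match_head / match_tail choose which side of the
    triple is searched for the term.
    """
    conditions = []
    if match_head:
        conditions.append("n.name CONTAINS $term")
    if match_tail:
        conditions.append("m.name CONTAINS $term")
    if not conditions:
        raise ValueError("at least one of match_head or match_tail must be True")
    where_clause = " OR ".join(conditions)
    return (
        "MATCH (n)-[r]->(m)\n"
        f"WHERE {where_clause}\n"
        "RETURN n.name AS node1, TYPE(r) AS relation, m.name AS node2"
    )

# Head-only search, as in the 'European Union' cell.
print(build_contains_query(match_tail=False))
```

The resulting string could be run as `session.run(build_contains_query(match_tail=False), term='European Union')`. Passing the term as a parameter also avoids quoting problems when a name contains an apostrophe.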


**Relations mentioning 'Agriculture'.**

In [63]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_all_food_wise_relations(driver):
    query = """
    MATCH (n)-[r]->(m)
    WHERE n.name CONTAINS 'Agriculture' OR TYPE(r) CONTAINS 'AGRICULTURE' OR m.name CONTAINS 'Agriculture'
    RETURN n.name AS node1, r AS relation, m AS node2
    """


    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            node1 = record['node1']
            relation = record['relation']
            node2 = record['node2']
            relations.append({
                'node1': node1,
                'relation': relation.type,
                'node2': node2.get('name', '')
            })

    return relations

# Execute the query and get the relations
food_wise_relations = get_all_food_wise_relations(driver)

# Print the relations
print("Head, Relation or Tail containing 'Agriculture':")
for relation in food_wise_relations:
    print(relation)

# Close the driver connection
driver.close()


Head, Relation or Tail containing 'Agriculture':
{'node1': 'Department of Agriculture Food and the Marine', 'relation': 'PARENT_ORGANIZATION', 'node2': 'Government of Ireland'}
{'node1': 'Government of Ireland', 'relation': 'SUBSIDIARY', 'node2': 'Department of Agriculture Food and the Marine'}
{'node1': 'Minister for Agriculture Food and the Marine', 'relation': 'PART_OF', 'node2': 'Government of Ireland'}
{'node1': 'Agriculture House Kildare Street', 'relation': 'LOCATED_IN_THE_ADMINISTRATIVE_TERRITORIAL_ENTITY', 'node2': 'Dublin'}
{'node1': 'Department of Agriculture Food and the Marine', 'relation': 'HEADQUARTERS_LOCATION', 'node2': 'headquarters'}
{'node1': 'Department of Agriculture', 'relation': 'INCEPTION', 'node2': '1919'}
{'node1': 'Robert Barton', 'relation': 'POSITION_HELD', 'node2': 'Minister for Agriculture'}
{'node1': 'Horace Plunkett', 'relation': 'POSITION_HELD', 'node2': 'Minister for Agriculture'}
{'node1': 'Department of Agriculture', 'relation': 'OFFICE_HELD_BY_HEA

**Relations where the head or tail contains 'Angela'.**

In [64]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_all_food_wise_relations(driver):
    query = """
    MATCH (n)-[r]->(m)
    WHERE n.name CONTAINS 'Angela' OR m.name CONTAINS 'Angela'
    RETURN n as node1, r as relation, m as node2
    """


    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            node1 = record['node1']
            relation = record['relation']
            node2 = record['node2']
            relations.append({
                'node1': node1.get('name', ''),
                'relation': relation.type,
                'node2': node2.get('name', '')
            })

    return relations

# Execute the query and get the relations
food_wise_relations = get_all_food_wise_relations(driver)

# Print the relations
print("Head OR Tail containing 'Angela':")
for relation in food_wise_relations:
    print(relation)

# Close the driver connection
driver.close()


Head OR Tail containing 'Angela':
{'node1': 'German chancellor', 'relation': 'OFFICEHOLDER', 'node2': 'Angela Merkel'}
{'node1': 'Angela Merkel', 'relation': 'POSITION_HELD', 'node2': 'German chancellor'}
{'node1': 'chancellor', 'relation': 'OFFICEHOLDER', 'node2': 'Angela Merkel'}
{'node1': 'Angela Merkel', 'relation': 'POSITION_HELD', 'node2': 'chancellor'}
{'node1': 'German Chancellor', 'relation': 'OFFICEHOLDER', 'node2': 'Angela Merkel'}
{'node1': 'Angela Merkel', 'relation': 'POSITION_HELD', 'node2': 'German Chancellor'}
{'node1': 'Angela Merkel', 'relation': 'MEMBER_OF_POLITICAL_PARTY', 'node2': 'Merkels CDU'}
{'node1': 'Merkels CDU', 'relation': 'CHAIRPERSON', 'node2': 'Angela Merkel'}
{'node1': 'Angela Merkel', 'relation': 'MEMBER_OF_POLITICAL_PARTY', 'node2': 'Merkels CDU party'}
{'node1': 'Merkels CDU party', 'relation': 'CHAIRPERSON', 'node2': 'Angela Merkel'}


**Relations where the head or tail contains 'Tom Arnold'.**

In [65]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_all_food_wise_relations(driver):
    query = """
    MATCH (n)-[r]->(m)
    WHERE n.name CONTAINS 'Tom Arnold' OR m.name CONTAINS 'Tom Arnold'
    RETURN n as node1, r as relation, m as node2
    """


    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            node1 = record['node1']
            relation = record['relation']
            node2 = record['node2']
            relations.append({
                'node1': node1.get('name', ''),
                'relation': relation.type,
                'node2': node2.get('name', '')
            })

    return relations

# Execute the query and get the relations
food_wise_relations = get_all_food_wise_relations(driver)

# Print the relations
print("Head or Tail containing 'Tom Arnold':")
for relation in food_wise_relations:
    print(relation)

# Close the driver connection
driver.close()

Head or Tail containing 'Tom Arnold':
{'node1': 'Tom Arnold', 'relation': 'FIELD_OF_WORK', 'node2': 'agricultural economist'}
{'node1': 'Tom Arnold', 'relation': 'OCCUPATION', 'node2': 'agricultural economist'}
{'node1': 'Tom Arnold', 'relation': 'EDUCATED_AT', 'node2': 'University College Dublin'}
{'node1': 'Tom Arnold', 'relation': 'EDUCATED_AT', 'node2': 'Université catholique de Louvain'}
{'node1': 'Tom Arnold', 'relation': 'EDUCATED_AT', 'node2': 'Trinity College Dublin'}
{'node1': 'Tom Arnold', 'relation': 'POSITION_HELD', 'node2': 'European Commissioner for Social Affairs'}
{'node1': 'Tom Arnold', 'relation': '_COMMITTEE_MEMBER', 'node2': 'Food Wise 2025'}


**Relations where the head or tail contains 'Glenisk'.**

In [69]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_all_food_wise_relations(driver):
    query = """
    MATCH (n)-[r]->(m)
    WHERE n.name CONTAINS 'Glenisk' OR m.name CONTAINS 'Glenisk'
    RETURN n as node1, r as relation, m as node2
    """


    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            node1 = record['node1']
            relation = record['relation']
            node2 = record['node2']
            relations.append({
                'node1': node1.get('name', ''),
                'relation': relation.type,
                'node2': node2.get('name', '')
            })

    return relations

# Execute the query and get the relations
food_wise_relations = get_all_food_wise_relations(driver)

# Print the relations
print("Head or Tail containing 'Glenisk':")
for relation in food_wise_relations:
    print(relation)

# Close the driver connection
driver.close()

Head or Tail containing 'Glenisk':
{'node1': 'Statistics StatisticsGlenisk', 'relation': 'INSTANCE_OF', 'node2': 'As Irelands bestl'}
{'node1': 'As Irelands bestl', 'relation': 'PUBLISHER', 'node2': 'Statistics StatisticsGlenisk'}
{'node1': 'Glenisk', 'relation': 'LOCATED_IN_THE_ADMINISTRATIVE_TERRITORIAL_ENTITY', 'node2': 'Killeigh Co Offaly'}
{'node1': 'Glenisk', 'relation': 'LOCATED_IN_THE_ADMINISTRATIVE_TERRITORIAL_ENTITY', 'node2': 'Killeigh'}
{'node1': 'Glenisk', 'relation': 'PRODUCT_OR_MATERIAL_PRODUCED', 'node2': 'yogurt'}
{'node1': 'Glenisk', 'relation': 'HEADQUARTERS_LOCATION', 'node2': 'Killeigh'}


**Relations where the head or tail contains 'dairy'.**

In [70]:
from neo4j import GraphDatabase, basic_auth

# Establish connection
driver = GraphDatabase.driver(
    "bolt://3.83.136.242:7687",
    auth=basic_auth("neo4j", "firings-lock-conflicts")
)

def get_all_food_wise_relations(driver):
    query = """
    MATCH (n)-[r]->(m)
    WHERE n.name CONTAINS 'dairy' OR m.name CONTAINS 'dairy'
    RETURN n as node1, r as relation, m as node2
    """


    relations = []
    with driver.session() as session:
        result = session.run(query)
        for record in result:
            node1 = record['node1']
            relation = record['relation']
            node2 = record['node2']
            relations.append({
                'node1': node1.get('name', ''),
                'relation': relation.type,
                'node2': node2.get('name', '')
            })

    return relations

# Execute the query and get the relations
food_wise_relations = get_all_food_wise_relations(driver)

# Print the relations
print("Head or Tail containing 'dairy':")
for relation in food_wise_relations:
    print(relation)

# Close the driver connection
driver.close()

Head or Tail containing 'dairy':
{'node1': 'dairy farms', 'relation': 'PART_OF', 'node2': 'agricultural sector'}
{'node1': 'dairy farms', 'relation': 'SUBCLASS_OF', 'node2': 'agricultural sector'}
{'node1': 'milk', 'relation': 'SUBCLASS_OF', 'node2': 'dairy'}
{'node1': 'yogurt', 'relation': 'SUBCLASS_OF', 'node2': 'dairy'}
{'node1': 'milk', 'relation': 'SUBCLASS_OF', 'node2': 'dairy product'}
{'node1': 'organic milk', 'relation': 'SUBCLASS_OF', 'node2': 'dairy products'}
{'node1': 'organic milk', 'relation': 'SUBCLASS_OF', 'node2': 'dairy product'}
