# DSAA 5002 - Data Mining and Knowledge Discovery in Data Science
---

# Task 2 (50 marks): Application of Knowledge Graph

**Background:**
**Besides the listed companies it mentions explicitly, each news article may also implicitly affect other companies, either positively or negatively.**
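One simple way to make this idea computable is to push a sentiment score from the explicitly mentioned company to its neighbours in the graph, using the relationship types imported below. The sketch is hypothetical: the per-relation weights (partners pass news on with the same sign, rivals with the opposite sign), the one-hop limit and the toy company IDs are illustrative assumptions, not part of the task, and the graph is a plain list of triples rather than the Neo4j database.

```python
from collections import defaultdict

# Hypothetical impact multipliers per relation type (illustrative assumptions only):
# partners pass news on with the same sign, rivals with the opposite sign.
RELATION_WEIGHTS = {
    "cooperate": 0.5,
    "supply": 0.5,
    "invest": 0.5,
    "same_industry": 0.3,
    "compete": -0.5,
    "dispute": -0.3,
}


def implicit_impact(edges, mentioned, sentiment):
    """Score the one-hop neighbours of `mentioned` for a single article.

    edges: iterable of (start_id, rel_type, end_id) triples.
    mentioned: ID of the company the article names explicitly.
    sentiment: +1.0 for positive news, -1.0 for negative news.
    Only edges leaving `mentioned` are followed.
    """
    scores = defaultdict(float)
    for start, rel_type, end in edges:
        if start == mentioned and end != mentioned:
            scores[end] += sentiment * RELATION_WEIGHTS.get(rel_type, 0.0)
    return dict(scores)


# Toy graph with hypothetical IDs; 'compete' is stored in both directions.
edges = [
    ("A", "compete", "B"), ("B", "compete", "A"),
    ("A", "supply", "C"),
    ("D", "invest", "A"),
]
print(implicit_impact(edges, "A", 1.0))  # {'B': -0.5, 'C': 0.5}
```

Following only outgoing edges works for the symmetric types because they are stored in both directions; for directed types such as `supply`, a fuller version would likely also look at incoming edges.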

# Q3 Constructing a Knowledge Graph
---

## 1. Neo4j Setup

```python
import csv
import os

from py2neo import Graph, Node, Relationship
from tqdm import tqdm

# Neo4j database connection information
uri = "http://localhost:7474"  # Neo4j HTTP endpoint
username = "neo4j"  # Neo4j database username
# Read the password from an environment variable instead of hard-coding it;
# set NEO4J_PASSWORD before starting the notebook
password = os.environ["NEO4J_PASSWORD"]

# Connect to the Neo4j database
graph = Graph(uri, auth=(username, password))
```

## 2. From Table to Graph

```python
# Load node information from a CSV file into the Neo4j database
def load_nodes(file_path):
    with open(file_path, 'r', encoding='utf-8', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        rows = list(reader)  # Materialize rows so tqdm knows the total
        for row in tqdm(rows, desc="Loading nodes"):  # Show a progress bar
            node_properties = {
                'ID': row[':ID'],
                'company_name': row['company_name'],
                'code': row['code']
            }
            node = Node('company', **node_properties)
            graph.create(node)

# Relationship types stored in both directions vs. only from start to end
SYMMETRIC_TYPES = {'compete', 'cooperate', 'dispute', 'same_industry'}
DIRECTED_TYPES = {'invest', 'supply'}

# Load relationship information from a CSV file into the Neo4j database
def load_relationships(file_path, rel_type):
    if rel_type not in SYMMETRIC_TYPES | DIRECTED_TYPES:
        raise ValueError(f"Unknown relationship type: {rel_type}")
    with open(file_path, 'r', encoding='utf-8', newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        rows = list(reader)
        skipped = 0
        for row in tqdm(rows, desc=f"Loading {rel_type} relationships"):
            # Look up both endpoints by the ID property set in load_nodes
            start_node = graph.nodes.match('company', ID=row[':START_ID']).first()
            end_node = graph.nodes.match('company', ID=row[':END_ID']).first()
            if start_node is None or end_node is None:
                skipped += 1  # An endpoint is missing from the node file
                continue

            # Keep the 'time' column as a relationship property when the file has one
            rel_properties = {'time': row['time']} if 'time' in row else {}

            rel = Relationship(start_node, rel_type, end_node, **rel_properties)
            graph.create(rel)
            # Symmetric relationships are also stored in the reverse direction
            if rel_type in SYMMETRIC_TYPES:
                rel_reverse = Relationship(end_node, rel_type, start_node, **rel_properties)
                graph.create(rel_reverse)
        if skipped:
            print(f"Skipped {skipped} {rel_type} rows with unknown endpoints")
```
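`load_relationships` issues two node lookups and one or two `CREATE` round trips per CSV row, which is why the progress bars further down report roughly 70 to 100 rows per second. A common alternative is to send rows in batches through a single Cypher `UNWIND` statement. The sketch below only builds the batches and the query text; the batch size of 1000, the helper names `chunked` and `build_rel_query`, and the row keys `start`, `end` and `time` are illustrative assumptions, not part of the original notebook.

```python
from itertools import islice

# Mirrors the type lists in load_relationships (symmetric types get a reverse edge)
SYMMETRIC = {"compete", "cooperate", "dispute", "same_industry"}
ALLOWED = SYMMETRIC | {"invest", "supply"}


def chunked(rows, size=1000):
    """Yield consecutive lists of at most `size` rows (hypothetical helper)."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def build_rel_query(rel_type):
    """Return an UNWIND query that creates the relationships for one batch.

    Cypher cannot parameterize a relationship type, so the type is written
    into the query text after checking it against a whitelist.
    Each row is expected to be a dict with 'start', 'end' and 'time' keys.
    """
    if rel_type not in ALLOWED:
        raise ValueError(f"unknown relationship type: {rel_type}")
    query = (
        "UNWIND $rows AS row "
        "MATCH (a:company {ID: row.start}), (b:company {ID: row.end}) "
        f"CREATE (a)-[:{rel_type} {{time: row.time}}]->(b)"
    )
    if rel_type in SYMMETRIC:
        # Store the reverse edge too, as the original loader does
        query += f" CREATE (b)-[:{rel_type} {{time: row.time}}]->(a)"
    return query


print([len(batch) for batch in chunked(range(2500))])  # [1000, 1000, 500]
print(build_rel_query("supply"))
```

With py2neo, each batch could then be sent as `graph.run(build_rel_query(rel_type), rows=batch)`, with every row built from the CSV as `{'start': row[':START_ID'], 'end': row[':END_ID'], 'time': row.get('time')}`. Rows whose endpoints do not match produce no relationship, and a `null` time is simply not stored. An index on `:company(ID)` keeps the `MATCH` lookups fast.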


### 2.1 Load the Node Information

```python
# Load the node information
load_nodes('KnowledgeGraph/hidy.nodes.company.csv')
print("Nodes import completed")
```

```text
Loading nodes: 100%|██████████████████████████████████████████████████████████████| 3974/3974 [00:15<00:00, 259.00it/s]
Nodes import completed
```
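`load_relationships` resolves each endpoint with `graph.nodes.match(...).first()`, which quietly uses whichever node the query returns first if an `ID` occurs more than once. A quick check on the node file before importing can rule this out. The helper below is an illustrative sketch (the name `find_duplicate_ids` is hypothetical) that reads the same `:ID` column as `load_nodes`; it is shown on a small in-memory CSV with made-up rows rather than the real `hidy.nodes.company.csv`.

```python
import csv
import io
from collections import Counter


def find_duplicate_ids(csvfile, id_column=":ID"):
    """Return (sorted IDs that occur more than once, number of blank IDs)."""
    reader = csv.DictReader(csvfile)
    counts = Counter(row[id_column].strip() for row in reader)
    blank = counts.pop("", 0)  # Rows without an ID cannot be matched at all
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    return duplicates, blank


# Hypothetical sample with the same header layout that load_nodes expects
sample = io.StringIO(
    ":ID,company_name,code\n"
    "1,Alpha,000001\n"
    "2,Beta,000002\n"
    "2,Beta Dup,000003\n"
    ",Nameless,000004\n"
)
print(find_duplicate_ids(sample))  # (['2'], 1)
```

On the real file the call would be `find_duplicate_ids(open('KnowledgeGraph/hidy.nodes.company.csv', encoding='utf-8', newline=''))`, ideally inside a `with` block; an empty list and a blank count of 0 mean every company can be looked up unambiguously.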





### 2.2 Load the Relationship Information

```python
# Load the relationship information (one CSV file per relationship type)
relationship_files = {
    'compete': 'KnowledgeGraph/hidy.relationships.compete.csv',
    'cooperate': 'KnowledgeGraph/hidy.relationships.cooperate.csv',
    'dispute': 'KnowledgeGraph/hidy.relationships.dispute.csv',
    'invest': 'KnowledgeGraph/hidy.relationships.invest.csv',
    'same_industry': 'KnowledgeGraph/hidy.relationships.same_industry.csv',
    'supply': 'KnowledgeGraph/hidy.relationships.supply.csv'
}

for rel_type, file_path in relationship_files.items():
    load_relationships(file_path, rel_type)

print("Relationships import completed")
```

```text
Loading compete relationships: 100%|███████████████████████████████████████████████████| 25/25 [00:00<00:00, 80.96it/s]
Loading cooperate relationships: 100%|█████████████████████████████████████████████| 3603/3603 [00:45<00:00, 79.36it/s]
Loading dispute relationships: 100%|█████████████████████████████████████████████████| 439/439 [00:05<00:00, 82.44it/s]
Loading invest relationships: 100%|██████████████████████████████████████████████████| 559/559 [00:05<00:00, 98.45it/s]
Loading same_industry relationships: 100%|█████████████████████████████████████████| 5596/5596 [01:18<00:00, 71.33it/s]
Loading supply relationships: 100%|███████████████████████████████████████████████| 1444/1444 [00:14<00:00, 101.56it/s]
Relationships import completed
```
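Because `load_relationships` writes two directed edges per row for the four symmetric types and one edge per row for `invest` and `supply`, the row counts in the progress bars above determine how many relationships the database should now hold. This assumes every row's endpoints matched a node, the database was empty beforehand, and each cell ran only once. The check below copies those row counts and gives 21,329 relationships in total, which can be compared with the result of `MATCH ()-[r]->() RETURN type(r), count(r)` in Neo4j.

```python
# Row counts copied from the progress bars above
row_counts = {
    "compete": 25,
    "cooperate": 3603,
    "dispute": 439,
    "invest": 559,
    "same_industry": 5596,
    "supply": 1444,
}
SYMMETRIC = {"compete", "cooperate", "dispute", "same_industry"}


def expected_relationships(counts):
    """Edges created per type: 2 per row if mirrored, otherwise 1."""
    return {t: n * (2 if t in SYMMETRIC else 1) for t, n in counts.items()}


expected = expected_relationships(row_counts)
print(expected["same_industry"])  # 11192
print(sum(expected.values()))  # 21329
```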



