# Ingesting Data into Neo4j

 ### **Nodes** 

 *   **Project:** This would be the central node, with each project representing a unique entity. 
 *   **Project Manager:** Each project manager would be a distinct node. 
 *   **Department:** Each department involved in the projects would be a node. 
 *   **Region:** Each geographical region would be represented as a node. 
 *   **Project Type:** Different types of projects would be individual nodes. 
 *   **Phase:** Each project phase would be a separate node. 
 *   **Status:** The different project statuses would be distinct nodes. 

 ### **Relationships** 

 *   A **Project** node would have a `MANAGED_BY` relationship with a **Project Manager** node. 
 *   A **Project** node would be `ASSIGNED_TO` a **Department** node. 
 *   A **Project** node would be `LOCATED_IN` a **Region** node. 
 *   A **Project** node would have a `HAS_TYPE` relationship with a **Project Type** node. 
 *   A **Project** node would have an `IN_PHASE` relationship with a **Phase** node. 
 *   A **Project** node would have a `HAS_STATUS` relationship with a **Status** node. 
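
Every relationship above starts at a `Project` node, so the model can be summarised as a handful of Cypher path patterns. The following is a minimal sketch; the `RELATIONSHIPS` list and the `pattern` helper are illustrative names rather than part of the notebook, and the labels are written without spaces (`ProjectManager`, `ProjectType`) to match the ingestion query later on.

```python
# Relationship type -> target node label, taken from the list above.
RELATIONSHIPS = [
    ("MANAGED_BY", "ProjectManager"),
    ("ASSIGNED_TO", "Department"),
    ("LOCATED_IN", "Region"),
    ("HAS_TYPE", "ProjectType"),
    ("IN_PHASE", "Phase"),
    ("HAS_STATUS", "Status"),
]


def pattern(rel_type: str, target_label: str) -> str:
    """Render one Project-centred relationship as a Cypher path pattern."""
    return f"(:Project)-[:{rel_type}]->(:{target_label})"


for rel_type, target_label in RELATIONSHIPS:
    print(pattern(rel_type, target_label))
```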

 ### **Properties** 

 The remaining fields in the dataset can be stored as properties within the corresponding nodes. 

 *   **Project Node Properties:** 
     *   `name` (e.g., "Rhinestone") 
     *   `description` 
     *   `cost` 
     *   `benefit` 
     *   `complexity` 
     *   `completionPercentage` 
     *   `year` 
     *   `month` 
     *   `startDate` 
     *   `endDate` 
 *   **Project Manager Node Properties:** 
     *   `name` (e.g., "Yael Wilcox") 
 *   **Department Node Properties:** 
     *   `name` (e.g., "Admin & BI") 
 *   **Region Node Properties:** 
     *   `name` (e.g., "North") 
 *   **Project Type Node Properties:** 
     *   `name` (e.g., "INCOME GENERATION") 
 *   **Phase Node Properties:** 
     *   `name` (e.g., "Phase 4 - Implement") 
 *   **Status Node Properties:** 
     *   `name` (e.g., "In - Progress") 
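
The property list above implies a mapping from dataset columns to typed `Project` properties. The sketch below shows one way to express that mapping in plain Python. The `PROJECT_PROPERTIES` table and `project_properties` helper are hypothetical names, and the chosen types (float for cost and benefit, integer for percentage, year and month, date for the two date fields) are an assumption based on the conversions in the ingestion query.

```python
from datetime import date

# Dataset column -> (Project property name, converter).
# The types are an assumption, mirroring toFloat/toInteger/date in the Cypher query.
PROJECT_PROPERTIES = {
    "ProjectName": ("name", str),
    "ProjectDescription": ("description", str),
    "ProjectCost": ("cost", float),
    "ProjectBenefit": ("benefit", float),
    "Complexity": ("complexity", str),
    "CompletionPercentage": ("completionPercentage", int),
    "Year": ("year", int),
    "Month": ("month", int),
    "StartDate": ("startDate", date.fromisoformat),
    "EndDate": ("endDate", date.fromisoformat),
}


def project_properties(row: dict) -> dict:
    """Convert one raw dataset row into typed Project node properties."""
    return {
        prop: convert(row[column])
        for column, (prop, convert) in PROJECT_PROPERTIES.items()
    }


# First row of the dataset, written as raw strings the way a CSV reader would see it.
example_row = {
    "ProjectName": "Rhinestone",
    "ProjectDescription": "Associations Now Is A Casual Game To Teach You...",
    "ProjectCost": "3648615.0",
    "ProjectBenefit": "8443980.0",
    "Complexity": "High",
    "CompletionPercentage": "77",
    "Year": "2021",
    "Month": "2",
    "StartDate": "2021-02-01",
    "EndDate": "2021-06-01",
}

print(project_properties(example_row))
```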



### Bring up Neo4j by running `docker compose up -d`

In [5]:
from dotenv import load_dotenv

load_dotenv(override=True)

True
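
`load_dotenv` copies entries from a `.env` file into the process environment, so connection settings can be read from there instead of being hard-coded. The sketch below assumes the variable names `NEO4J_URI`, `NEO4J_USER` and `NEO4J_PASSWORD`; the notebook does not show the contents of its `.env` file, so treat these names as placeholders. The fallback values are the ones used in the connection cell later in this notebook.

```python
import os
from typing import Mapping


def neo4j_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read connection settings from the environment, with local-dev fallbacks.

    The variable names are assumptions; rename them to match your .env file.
    """
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7688"),
        "user": env.get("NEO4J_USER", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", "password"),
    }


settings = neo4j_settings()
print(settings["uri"])
```

Passing the mapping in as an argument keeps the helper easy to check without touching the real environment.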

### Load the cleaned data

In [34]:
import pandas as pd

df = pd.read_csv("../data/cleaned_data.csv")

df.head()


| Unnamed: 0.1 | Unnamed: 0 | ProjectName | ProjectDescription | ProjectType | ProjectManager | Region | Department | ProjectCost | ProjectBenefit | Complexity | Status | CompletionPercentage | Phase | Year | Month | StartDate | EndDate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Rhinestone | Associations Now Is A Casual Game To Teach You... | INCOME GENERATION | Yael Wilcox | North | Admin & BI | 3648615.0 | 8443980.0 | High | In - Progress | 77 | Phase 4 - Implement | 2021 | 2 | 2021-02-01 | 2021-06-01 |
| 1 | 1 | A Triumph Of Softwares | Is A Fully Managed Content Marketing Software ... | INCOME GENERATION | Brenda Chandler | West | eCommerce | 4018835.0 | 9012225.0 | High | Cancelled | 80 | Phase 2 - Develop | 2021 | 3 | 2021-03-01 | 2021-06-01 |
| 2 | 2 | The Blue Bird | Most Content Marketers Know The Golden Rule: Y... | INCOME GENERATION | Nyasia Hunter | North | Warehouse | 4285483.0 | 9078339.0 | High | Completed | 100 | Phase 4 - Implement | 2021 | 3 | 2021-03-01 | 2021-06-01 |
| 3 | 3 | Remembering Our Ancestors | Utilize And Utilizes (Verb Form) The Open, Inc... | PROCESS IMPROVEMENT | Brenda Chandler | East | Sales and Marketing | 5285864.0 | 8719006.0 | High | Cancelled | 75 | Phase 5 - Measure | 2021 | 3 | 2021-03-01 | 2021-06-01 |
| 4 | 4 | Skyhawks | Is A Solution For Founders Who Want To Win At ... | WORKING CAPITAL IMPROVEMENT | Jaylyn Mckenzie | East | eCommerce | 5785601.0 | 8630148.0 | High | Completed | 100 | Phase 1 - Explore | 2021 | 3 | 2021-03-01 | 2021-06-01 |
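
The `Unnamed: 0.1` and `Unnamed: 0` columns in the preview look like index columns left over from earlier `to_csv` calls. That is an assumption, since the notebook does not say where they come from. If they carry no meaning, they can be dropped before ingestion. A minimal sketch on a small stand-in frame (`sample` is illustrative, not the real dataset):

```python
import pandas as pd

# Stand-in frame with the same leftover columns as the preview above.
sample = pd.DataFrame(
    {
        "Unnamed: 0.1": [0, 1],
        "Unnamed: 0": [0, 1],
        "ProjectName": ["Rhinestone", "A Triumph Of Softwares"],
    }
)

# Collect every column whose header pandas generated for an unnamed CSV column.
unnamed = [col for col in sample.columns if col.startswith("Unnamed:")]
cleaned = sample.drop(columns=unnamed)

print(list(cleaned.columns))
```

The same two lines should apply to `df` unchanged, provided the real column headers start with `Unnamed:` as they do in the preview.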


### Available databases in Neo4j

In [12]:
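# Assumes `driver` already exists; it is created with GraphDatabase.driver(...)
# in the connection cell further down, which was executed before this one.
with driver.session(database="system") as session:
    result = session.run("SHOW DATABASES")
    for record in result:
        print(record["name"])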


neo4j
system


### Try to create a new database; if that fails, fall back to the default `neo4j` database

In [None]:
from neo4j import GraphDatabase

NEO4J_URI = "bolt://localhost:7688"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "password"

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))

db_name = "my_new_database"

# Try creating a new database (Enterprise only)
try:
    with driver.session(database="system") as session:
        session.run(f"CREATE DATABASE {db_name}")
        print(f"Database '{db_name}' creation command sent.")
    active_db = db_name
except Exception as e:
    print(f"Error creating database '{db_name}': {e}")
    print("Falling back to default 'neo4j' database.")
    active_db = "neo4j"




Error creating database 'my_new_database': {code: Neo.ClientError.Statement.UnsupportedAdministrationCommand} {message: Unsupported administration command: CREATE DATABASE my_new_database}
Falling back to default 'neo4j' database.
Test node created in 'neo4j' database.
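
The ingestion step in the next section MERGEs every node on its `name` property, so once the target database is settled it may be worth adding a uniqueness constraint on `name` for each label. The sketch below only builds the statements. The constraint names and the `constraint_statement` helper are hypothetical, and the `FOR ... REQUIRE` syntax assumes a reasonably recent Neo4j release (4.4 or 5.x); older versions use a different form.

```python
# Node labels used by the data model in this notebook.
LABELS = [
    "Project",
    "ProjectManager",
    "Department",
    "Region",
    "ProjectType",
    "Phase",
    "Status",
]


def constraint_statement(label: str) -> str:
    """Build a uniqueness constraint on `name` for one label (assumed 4.4+/5.x syntax)."""
    constraint_name = f"{label.lower()}_name_unique"  # hypothetical naming scheme
    return (
        f"CREATE CONSTRAINT {constraint_name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.name IS UNIQUE"
    )


statements = [constraint_statement(label) for label in LABELS]
for statement in statements:
    print(statement)

# Applying them would look roughly like this (not executed here):
# with driver.session(database=active_db) as session:
#     for statement in statements:
#         session.run(statement)
```

A uniqueness constraint also creates a backing index, which tends to speed up `MERGE` lookups on `name`.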


### Create the nodes and relationships

In [33]:
import pandas as pd
# Assume 'driver' is already configured and 'df' is your DataFrame with the new column names.

def insert_data(tx, row_data):
    # The Cypher query itself doesn't need to change, as it uses abstract parameters.
    tx.run('''
        MERGE (project:Project {name: $ProjectName})
        ON CREATE SET
           project.description = $ProjectDescription,
           project.cost = toFloat($ProjectCost),
           project.benefit = toFloat($ProjectBenefit),
           project.complexity = $Complexity,
           project.completionPercentage = toInteger($CompletionPercentage),
           project.year = toInteger($Year),
           project.month = toInteger($Month),
           project.startDate = date($StartDate),
           project.endDate = date($EndDate)

        MERGE (manager:ProjectManager {name: $ProjectManager})
        MERGE (dept:Department {name: $Department})
        MERGE (region:Region {name: $Region})
        MERGE (type:ProjectType {name: $ProjectType})
        MERGE (phase:Phase {name: $Phase})
        MERGE (status:Status {name: $Status})

        MERGE (project)-[:MANAGED_BY]->(manager)
        MERGE (project)-[:ASSIGNED_TO]->(dept)
        MERGE (project)-[:LOCATED_IN]->(region)
        MERGE (project)-[:HAS_TYPE]->(type)
        MERGE (project)-[:IN_PHASE]->(phase)
        MERGE (project)-[:HAS_STATUS]->(status)
    ''',
    row_data
    )

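# --- Run one write transaction per row ---
# Target the database selected earlier (`active_db`) instead of relying on the server default.
with driver.session(database=active_db) as session:
    for _, row in df.iterrows():
        session.execute_write(insert_data, row.to_dict())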

print("Data insertion complete.")

Data insertion complete.
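
The loop above opens one transaction per row, which is fine for a small file but can become slow as the dataset grows. A common alternative is to send rows in batches and let Cypher iterate with `UNWIND`. The sketch below is one possible shape, not the notebook's method: `chunked`, `insert_batch`, `BATCH_QUERY` and the batch size of 500 are illustrative, and the query is shortened to two node types to keep the example small.

```python
from typing import Iterator

# Shortened query: extend it with the remaining MERGE clauses as needed.
BATCH_QUERY = """
UNWIND $rows AS row
MERGE (project:Project {name: row.ProjectName})
MERGE (manager:ProjectManager {name: row.ProjectManager})
MERGE (project)-[:MANAGED_BY]->(manager)
"""


def chunked(rows: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def insert_batch(tx, rows: list) -> None:
    """Write one batch of row dictionaries in a single transaction."""
    tx.run(BATCH_QUERY, rows=rows)


# Usage against the notebook's objects (not executed here):
# records = df.to_dict("records")
# with driver.session(database=active_db) as session:
#     for batch in chunked(records, 500):
#         session.execute_write(insert_batch, batch)

# Quick check of the chunking helper on dummy data.
print([len(batch) for batch in chunked(list(range(7)), 3)])  # [3, 3, 1]
```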
