# Introduction

Relational databases often struggle with highly connected data: answering questions about relationships requires many joins, which makes queries slow and complex. Neo4j, a graph database first released in 2007, addresses this problem by storing data as nodes and relationships instead of tables. This approach makes it faster and easier to explore connections between entities.

Neo4j is widely used in areas such as social networks, recommendation systems, and knowledge graphs, where relationships matter most. In this tutorial, we explore its capabilities by analyzing a startup ecosystem, using Cypher queries, PageRank, and Louvain community detection to uncover key insights.

# Comparison with Relational Databases 

* Advantages of Neo4j over relational databases

* Drawbacks of Neo4j compared to relational databases

* Example Cypher query to illustrate a key difference with SQL

* Key takeaways
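
As a hedged illustration of the difference with SQL, the sketch below asks one question ("which startups does a given investor fund?") in both styles. The SQL version runs against an in-memory SQLite database whose table and column names are assumptions made for this example; the Cypher version is only kept as a string and uses the labels from the dataset built later in this tutorial.

```python
import sqlite3

# Relational model (hypothetical schema): two entity tables plus a join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE investor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE startup  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invests_in (investor_id INTEGER, startup_id INTEGER);
    INSERT INTO investor VALUES (1, 'Sequoia Capital');
    INSERT INTO startup  VALUES (1, 'Stripe'), (2, 'Klarna');
    INSERT INTO invests_in VALUES (1, 1), (1, 2);
""")

# SQL: each hop along a relationship costs one JOIN.
sql_query = """
    SELECT s.name
    FROM investor AS i
    JOIN invests_in AS ii ON ii.investor_id = i.id
    JOIN startup AS s ON s.id = ii.startup_id
    WHERE i.name = ?
    ORDER BY s.name
"""
funded = [row[0] for row in conn.execute(sql_query, ("Sequoia Capital",))]
print(funded)  # prints ['Klarna', 'Stripe']

# Cypher: the same question is written as a path pattern, with no join table.
cypher_query = """
    MATCH (i:Investor {name: $name})-[:INVESTS_IN]->(s:Startup)
    RETURN s.name ORDER BY s.name
"""
```

Each additional hop (for example, co-investors of an investor) adds two more joins in SQL, whereas the Cypher pattern simply grows by one more relationship segment.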

# Installation & configuration

## Installing Neo4j
If you don't have Docker installed, you can install it from [here](https://www.docker.com/).

First, in the terminal, pull the Neo4j image from Docker:

`docker pull neo4j`

Next, start a Neo4j instance using the `docker-compose.yml` file:

`docker compose up -d`

You can now open the Neo4j Browser at [http://localhost:7474](http://localhost:7474). We will use it to visualize the graph.

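This tutorial logs in with the username `neo4j` and the password `password`. These must match the credentials configured in `docker-compose.yml` (for the official image, typically through the `NEO4J_AUTH` environment variable).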



In [139]:
# Install the Neo4j Python driver and the yFiles graph widget
!pip install neo4j
!pip install yfiles_jupyter_graphs_for_neo4j




[notice] A new release of pip is available: 24.0 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip

In [140]:
# Loading the libraries
from neo4j import GraphDatabase
from yfiles_jupyter_graphs_for_neo4j import Neo4jGraphWidget

# Connecting to the Neo4j database
driver = GraphDatabase.driver(uri="bolt://localhost:7687", auth=("neo4j", "password"))
session = driver.session()
g = Neo4jGraphWidget(driver)

# Dataset

First, we clear the existing database:

In [141]:
session.run("""
    MATCH (n)
    DETACH DELETE n
""")

<neo4j._sync.work.result.Result at 0x26e0f644e50>

Next, we create two datasets, one for startups and one for investors, and load them as nodes with Cypher `CREATE` statements. Note: we used ChatGPT to help us generate realistic data.

In [142]:
# Load the dataset into the Neo4j database

# Create the Startup nodes
session.run("""
    CREATE 
    // AI Startups (Community 1)
    (s1:Startup {name: 'OpenAI', communityId: 1, country: 'USA', technology: 'AI'}),
    (s2:Startup {name: 'DeepMind', communityId: 1, country: 'UK', technology: 'AI'}),
    (s3:Startup {name: 'Anthropic', communityId: 1, country: 'USA', technology: 'AI'}),
    (s4:Startup {name: 'Cohere', communityId: 1, country: 'Canada', technology: 'AI'}),
    (s5:Startup {name: 'Adept AI', communityId: 1, country: 'USA', technology: 'AI'}),
    (s6:Startup {name: 'Stability AI', communityId: 1, country: 'UK', technology: 'AI'}),

    // Aerospace Startups (Community 2)
    (s7:Startup {name: 'SpaceX', communityId: 2, country: 'USA', technology: 'Aerospace'}),
    (s8:Startup {name: 'Blue Origin', communityId: 2, country: 'USA', technology: 'Aerospace'}),
    (s9:Startup {name: 'Rocket Lab', communityId: 2, country: 'New Zealand', technology: 'Aerospace'}),
    (s10:Startup {name: 'Relativity Space', communityId: 2, country: 'USA', technology: 'Aerospace'}),
    (s11:Startup {name: 'Virgin Galactic', communityId: 2, country: 'USA', technology: 'Aerospace'}),
    (s12:Startup {name: 'Firefly Aerospace', communityId: 2, country: 'USA', technology: 'Aerospace'}),

    // FinTech Startups (Community 3)
    (s13:Startup {name: 'Stripe', communityId: 3, country: 'USA', technology: 'FinTech'}),
    (s14:Startup {name: 'Revolut', communityId: 3, country: 'UK', technology: 'FinTech'}),
    (s15:Startup {name: 'Klarna', communityId: 3, country: 'Sweden', technology: 'FinTech'}),
    (s16:Startup {name: 'Brex', communityId: 3, country: 'USA', technology: 'FinTech'}),
    (s17:Startup {name: 'Chime', communityId: 3, country: 'USA', technology: 'FinTech'}),
    (s18:Startup {name: 'Plaid', communityId: 3, country: 'USA', technology: 'FinTech'}),

    // Electric Vehicle Startups (Community 4)
    (s19:Startup {name: 'Tesla', communityId: 4, country: 'USA', technology: 'EV'}),
    (s20:Startup {name: 'Rivian', communityId: 4, country: 'USA', technology: 'EV'}),
    (s21:Startup {name: 'Lucid Motors', communityId: 4, country: 'USA', technology: 'EV'}),
    (s22:Startup {name: 'Nio', communityId: 4, country: 'China', technology: 'EV'}),
    (s23:Startup {name: 'Xpeng', communityId: 4, country: 'China', technology: 'EV'}),
    (s24:Startup {name: 'Fisker', communityId: 4, country: 'USA', technology: 'EV'}),

    // Blockchain Startups (Community 5)
    (s25:Startup {name: 'Binance', communityId: 5, country: 'Malta', technology: 'Blockchain'}),
    (s26:Startup {name: 'Coinbase', communityId: 5, country: 'USA', technology: 'Blockchain'}),
    (s27:Startup {name: 'Chainalysis', communityId: 5, country: 'USA', technology: 'Blockchain'}),
    (s28:Startup {name: 'Ledger', communityId: 5, country: 'France', technology: 'Blockchain'}),
    (s29:Startup {name: 'Kraken', communityId: 5, country: 'USA', technology: 'Blockchain'}),
    (s30:Startup {name: 'Uniswap', communityId: 5, country: 'Global', technology: 'Blockchain'})
""")



# Create the Investor nodes
session.run("""
    CREATE
    (i1:Investor {name: 'Elon Musk', sector: 'AI, Aerospace, EV'}),
    (i2:Investor {name: 'Andreessen Horowitz', sector: 'FinTech, Blockchain'}),
    (i3:Investor {name: 'Sequoia Capital', sector: 'FinTech'}),
    (i4:Investor {name: 'Tim Draper', sector: 'EV, Blockchain'}),
    (i5:Investor {name: 'Binance Labs', sector: 'Blockchain'}),
    (i6:Investor {name: 'Y Combinator', sector: 'AI, FinTech'}),
    (i7:Investor {name: 'SoftBank', sector: 'AI, EV'}),
    (i8:Investor {name: 'Peter Thiel', sector: 'Aerospace, AI'}),
    (i9:Investor {name: 'Tiger Global', sector: 'FinTech'}),
    (i10:Investor {name: 'Cathie Wood', sector: 'EV, Blockchain'}),
    (i11:Investor {name: 'Lightspeed Ventures', sector: 'AI, FinTech'}),
    (i12:Investor {name: 'General Catalyst', sector: 'FinTech'}),
    (i13:Investor {name: 'Khosla Ventures', sector: 'AI, Aerospace'}),
    (i14:Investor {name: 'Founders Fund', sector: 'Aerospace, Blockchain'}),
    (i15:Investor {name: 'Coinbase Ventures', sector: 'Blockchain'}),
    (i16:Investor {name: 'Google Ventures', sector: 'AI, FinTech'}),
    (i17:Investor {name: 'Accel Partners', sector: 'FinTech'}),
    (i18:Investor {name: 'Bessemer Venture Partners', sector: 'FinTech'}),
    (i19:Investor {name: 'Benchmark', sector: 'EV, FinTech'}),
    (i20:Investor {name: 'Union Square Ventures', sector: 'Blockchain'})
""")





<neo4j._sync.work.result.Result at 0x26e1285f450>
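
The same nodes can also be loaded from plain Python data instead of one long literal query. The following is a minimal sketch of that idea, assuming the rows are kept as a list of dictionaries and passed as a query parameter to `UNWIND`. `load_startups` and `RecordingSession` are hypothetical helpers introduced here; the latter only stands in for a real session so that the example runs without a database.

```python
# Excerpt of the startup data above, as a list of dictionaries.
startups = [
    {"name": "OpenAI", "communityId": 1, "country": "USA", "technology": "AI"},
    {"name": "SpaceX", "communityId": 2, "country": "USA", "technology": "Aerospace"},
]

# One parameterized query creates one node per row and copies its properties.
LOAD_STARTUPS = """
    UNWIND $rows AS row
    CREATE (s:Startup)
    SET s = row
"""


def load_startups(session, rows):
    """Send all rows in a single query (hypothetical helper)."""
    return session.run(LOAD_STARTUPS, rows=rows)


class RecordingSession:
    """Stand-in for a real session: it records calls instead of running them."""

    def __init__(self):
        self.calls = []

    def run(self, query, **parameters):
        self.calls.append((query, parameters))


fake = RecordingSession()
load_startups(fake, startups)
print(len(fake.calls), "query sent with", len(fake.calls[0][1]["rows"]), "rows")
```

With the real `session` created earlier, the call would be `load_startups(session, startups)`. Passing values as parameters rather than formatting them into the query text also avoids quoting problems in names.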

Next, we create the relationships between investors and startups.

In [143]:
# AI Sector Investments
session.run("""
    MATCH (i1:Investor {name: 'Elon Musk'}), (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'Anthropic'}),
          (s3:Startup {name: 'Adept AI'}), (s4:Startup {name: 'DeepMind'})
    CREATE (i1)-[:INVESTS_IN]->(s1),
           (i1)-[:INVESTS_IN]->(s2),
           (i1)-[:INVESTS_IN]->(s3),
           (i1)-[:INVESTS_IN]->(s4)
""")

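# 'Hugging Face' is not in the startup dataset above, so MERGE creates the node (name only).
# A plain MATCH on a missing node returns no rows, and no relationship would be created.
session.run("""
    MATCH (i2:Investor {name: 'Andreessen Horowitz'}), (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'Cohere'}),
          (s4:Startup {name: 'Stability AI'})
    MERGE (s3:Startup {name: 'Hugging Face'})
    CREATE (i2)-[:INVESTS_IN]->(s1),
           (i2)-[:INVESTS_IN]->(s2),
           (i2)-[:INVESTS_IN]->(s3),
           (i2)-[:INVESTS_IN]->(s4)
""")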

# Aerospace Sector Investments
session.run("""
    MATCH (i7:Investor {name: 'SoftBank'}), (s1:Startup {name: 'SpaceX'}), (s2:Startup {name: 'Blue Origin'}),
          (s3:Startup {name: 'Rocket Lab'}), (s4:Startup {name: 'Relativity Space'})
    CREATE (i7)-[:INVESTS_IN]->(s1),
           (i7)-[:INVESTS_IN]->(s2),
           (i7)-[:INVESTS_IN]->(s3),
           (i7)-[:INVESTS_IN]->(s4)
""")

session.run("""
    MATCH (i8:Investor {name: 'Peter Thiel'}), (s1:Startup {name: 'SpaceX'}), (s2:Startup {name: 'Rocket Lab'})
    CREATE (i8)-[:INVESTS_IN]->(s1),
           (i8)-[:INVESTS_IN]->(s2)
""")

# Cross-sector investments
# MERGE keeps SoftBank -> SpaceX from being created twice (it also appears in a query above).
session.run("""
    MATCH (i7:Investor {name: 'SoftBank'}), (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'SpaceX'}),
          (s3:Startup {name: 'Tesla'}), (s4:Startup {name: 'Revolut'})
    MERGE (i7)-[:INVESTS_IN]->(s1)
    MERGE (i7)-[:INVESTS_IN]->(s2)
    MERGE (i7)-[:INVESTS_IN]->(s3)
    MERGE (i7)-[:INVESTS_IN]->(s4)
""")

# MERGE keeps Andreessen Horowitz -> OpenAI from being created twice (it also appears in an earlier query).
session.run("""
    MATCH (i2:Investor {name: 'Andreessen Horowitz'}), (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'Stripe'}),
          (s3:Startup {name: 'Coinbase'}), (s4:Startup {name: 'Tesla'})
    MERGE (i2)-[:INVESTS_IN]->(s1)
    MERGE (i2)-[:INVESTS_IN]->(s2)
    MERGE (i2)-[:INVESTS_IN]->(s3)
    MERGE (i2)-[:INVESTS_IN]->(s4)
""")

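# 'Hugging Face' is not in the startup dataset above, so MERGE creates the node (name only).
# A plain MATCH on a missing node returns no rows, and no relationship would be created.
session.run("""
    MATCH (i9:Investor {name: 'Tiger Global'}), (s1:Startup {name: 'Stripe'}), (s2:Startup {name: 'Binance'}),
          (s3:Startup {name: 'Tesla'})
    MERGE (s4:Startup {name: 'Hugging Face'})
    CREATE (i9)-[:INVESTS_IN]->(s1),
           (i9)-[:INVESTS_IN]->(s2),
           (i9)-[:INVESTS_IN]->(s3),
           (i9)-[:INVESTS_IN]->(s4)
""")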


# FinTech Sector Investments
session.run("""
    MATCH (i3:Investor {name: 'Sequoia Capital'}), (s1:Startup {name: 'Stripe'}), (s2:Startup {name: 'Revolut'}),
          (s3:Startup {name: 'Klarna'}), (s4:Startup {name: 'Brex'})
    CREATE (i3)-[:INVESTS_IN]->(s1),
           (i3)-[:INVESTS_IN]->(s2),
           (i3)-[:INVESTS_IN]->(s3),
           (i3)-[:INVESTS_IN]->(s4)
""")

# MERGE keeps Tiger Global -> Stripe from being created twice (it also appears in a query above).
session.run("""
    MATCH (i9:Investor {name: 'Tiger Global'}), (s1:Startup {name: 'Stripe'}), (s2:Startup {name: 'Klarna'}),
          (s3:Startup {name: 'Brex'})
    MERGE (i9)-[:INVESTS_IN]->(s1)
    MERGE (i9)-[:INVESTS_IN]->(s2)
    MERGE (i9)-[:INVESTS_IN]->(s3)
""")

# Electric Vehicle Sector Investments
session.run("""
    MATCH (i10:Investor {name: 'Cathie Wood'}), (s1:Startup {name: 'Tesla'}), (s2:Startup {name: 'Nio'}),
          (s3:Startup {name: 'Rivian'})
    CREATE (i10)-[:INVESTS_IN]->(s1),
           (i10)-[:INVESTS_IN]->(s2),
           (i10)-[:INVESTS_IN]->(s3)
""")

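# 'Mark Cuban' is not in the investor dataset above, so MERGE creates the node (name only).
# A plain MATCH on a missing node returns no rows, and no relationship would be created.
session.run("""
    MATCH (s1:Startup {name: 'Tesla'}), (s2:Startup {name: 'Lucid Motors'})
    MERGE (i11:Investor {name: 'Mark Cuban'})
    CREATE (i11)-[:INVESTS_IN]->(s1),
           (i11)-[:INVESTS_IN]->(s2)
""")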

# Blockchain Sector Investments
session.run("""
    MATCH (i5:Investor {name: 'Binance Labs'}), (s1:Startup {name: 'Binance'}), (s2:Startup {name: 'Ledger'})
    CREATE (i5)-[:INVESTS_IN]->(s1),
           (i5)-[:INVESTS_IN]->(s2)
""")

session.run("""
    MATCH (i12:Investor {name: 'Accel Partners'}), (s1:Startup {name: 'Chainalysis'}), (s2:Startup {name: 'Coinbase'})
    CREATE (i12)-[:INVESTS_IN]->(s1),
           (i12)-[:INVESTS_IN]->(s2)
""")



<neo4j._sync.work.result.Result at 0x26e122f2c90>
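
A `MATCH` that names a node missing from the graph returns no rows, so a `CREATE` that follows it adds nothing and raises no error. A small check like the one below can catch such names before the queries run. This is only a sketch: `find_missing` is a hypothetical helper, and the sets hold excerpts of the node names created earlier rather than the full dataset.

```python
def find_missing(pairs, investors, startups):
    """Return the names used in `pairs` that are absent from the node lists."""
    missing_investors = sorted({i for i, _ in pairs if i not in investors})
    missing_startups = sorted({s for _, s in pairs if s not in startups})
    return missing_investors, missing_startups


# Excerpts of the node names created earlier in this tutorial.
investors = {"Tiger Global", "Cathie Wood", "Lightspeed Ventures"}
startups = {"Stripe", "Binance", "Tesla", "Lucid Motors"}

# (investor, startup) pairs taken from the relationship queries above.
pairs = [
    ("Tiger Global", "Stripe"),
    ("Tiger Global", "Hugging Face"),
    ("Mark Cuban", "Tesla"),
    ("Mark Cuban", "Lucid Motors"),
]

print(find_missing(pairs, investors, startups))
# prints (['Mark Cuban'], ['Hugging Face'])
```

Any name reported here either needs to be added to the dataset or removed from the relationship queries.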

Finally, we create the relationships between startups (collaboration, competition, and partnership).

In [144]:
session.run("""
    MATCH (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'Tesla'})
    CREATE (s1)-[:COLLABORATES_WITH]->(s2)
""")

session.run("""
    MATCH (s1:Startup {name: 'Revolut'}), (s2:Startup {name: 'Stripe'})
    CREATE (s1)-[:COLLABORATES_WITH]->(s2)
""")

session.run("""
    MATCH (s1:Startup {name: 'Binance'}), (s2:Startup {name: 'Coinbase'})
    CREATE (s1)-[:COMPETES_WITH]->(s2)
""")

session.run("""
    MATCH (s1:Startup {name: 'Tesla'}), (s2:Startup {name: 'Lucid Motors'})
    CREATE (s1)-[:COMPETES_WITH]->(s2)
""")

session.run("""
    MATCH (s1:Startup {name: 'SpaceX'}), (s2:Startup {name: 'Blue Origin'})
    CREATE (s1)-[:COMPETES_WITH]->(s2)
""")

# 'Mistral AI' is not in the startup dataset above, so MERGE creates the node (name only).
session.run("""
    MATCH (s1:Startup {name: 'DeepMind'})
    MERGE (s2:Startup {name: 'Mistral AI'})
    CREATE (s1)-[:PARTNERS_WITH]->(s2)
""")


<neo4j._sync.work.result.Result at 0x26e12aaddd0>

In [145]:
# Define the Cypher query to visualize Startups and Investors

g.show_cypher("MATCH (s)-[r]->(t) RETURN s, r, t")


GraphWidget(layout=Layout(height='790px', width='100%'))

# PageRank algorithm

In [146]:
# Implementation of the PageRank algorithm to find the most important nodes in the graph
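
To show what PageRank computes, here is a minimal pure-Python sketch based on power iteration. It is not the implementation this tutorial runs against Neo4j (there, the Graph Data Science plugin provides procedures such as `gds.pageRank.stream`, assuming the plugin is installed). The damping factor of 0.85, the iteration count, and the small edge list (an excerpt of the relationships created above) are assumptions made for the example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        # Every node shares its rank equally among the nodes it points to.
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Excerpt of the graph: INVESTS_IN edges plus one COLLABORATES_WITH edge.
edges = [
    ("Sequoia Capital", "Stripe"), ("Sequoia Capital", "Revolut"),
    ("Tiger Global", "Stripe"), ("Tiger Global", "Klarna"),
    ("Revolut", "Stripe"),
]
scores = pagerank(edges)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

In this excerpt Stripe ranks first because it receives rank from three different nodes, while the investors, which have no incoming edges, rank lowest.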

# Louvain algorithm

In [147]:
# Implementation of the Louvain algorithm to find communities
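
Louvain moves nodes between communities as long as a quality score called modularity increases, then merges each community into a single node and repeats. The sketch below does not implement Louvain itself; it only computes the modularity of a given partition, which is the quantity Louvain tries to maximize. The toy graph (two triangles joined by one edge) and the function name are assumptions made for the example. In Neo4j, the Graph Data Science plugin exposes the full algorithm through procedures such as `gds.louvain.stream`, assuming the plugin is installed.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the same community
    degree_sum = defaultdict(int)  # total degree of each community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities of (internal edge share - expected share).
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in split}

q_split = modularity(edges, split)
q_single = modularity(edges, single)
print(f"two communities: {q_split:.3f}, one community: {q_single:.3f}")
# prints two communities: 0.357, one community: 0.000
```

The partition that separates the two triangles scores higher than putting every node in one community, which is the kind of improvement Louvain searches for at each step.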

# Cross-Analyzing PageRank & Communities 

In [148]:
# Code for cross-analyzing PageRank and communities
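
One simple way to combine the two results is to look for the highest-ranked node inside each community. The sketch below assumes the PageRank scores and community assignments are already available as dictionaries; `top_by_community` is a hypothetical helper, and the scores are placeholder values invented for illustration, not computed results.

```python
from collections import defaultdict


def top_by_community(scores, community, k=1):
    """Group nodes by community and keep the k highest-scoring ones in each."""
    grouped = defaultdict(list)
    for node, score in scores.items():
        grouped[community[node]].append((score, node))
    return {c: [node for _, node in sorted(members, reverse=True)[:k]]
            for c, members in grouped.items()}


# Placeholder scores, invented for illustration only (not computed results).
scores = {"OpenAI": 0.30, "Anthropic": 0.10, "Stripe": 0.25, "Klarna": 0.15}
# communityId values from the startup dataset in this tutorial.
community = {"OpenAI": 1, "Anthropic": 1, "Stripe": 3, "Klarna": 3}

print(top_by_community(scores, community))
# prints {1: ['OpenAI'], 3: ['Stripe']}
```

The same grouping works with community identifiers produced by Louvain in place of the `communityId` property.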

# Real-World Use Cases

Describe and/or cite real-world examples of how the database technology is used in different industries and applications.

# Conclusion

Conclude the tutorial with a summary of the main points and the benefits (and drawbacks) of using Neo4j for graph databases.

# References

1. [NODES 2024 – Advanced Graph Visualizations in Jupyter Notebooks](https://neo4j.com/videos/nodes-2024-advanced-graph-visualizations-in-jupyter-notebooks/)