# Introduction

Nowadays, traditional databases often struggle with highly connected data, leading to slow and complex queries. Neo4j, a graph database released in 2007, solves this problem by storing data as nodes and relationships instead of tables. This approach makes it faster and easier to explore connections between entities.

Neo4j is widely used in areas like social networks, recommendation systems, and knowledge graphs—where relationships matter most. In this tutorial, we will explore its capabilities by analyzing a startup ecosystem, using Cypher queries, PageRank, and Louvain community detection to uncover key insights.

# Comparison with Relational Databases 

To compare Neo4j and Cypher with traditional relational databases, we'll explore the advantages and drawbacks of using a graph-based approach over a tabular one.

### Advantages of Neo4j and cypher over relational SQL
One of the main advantages of Neo4j and its query language, Cypher, is that data is stored in a graph structure composed o nodes and relationships rather than in tables. This allows for efficient traversal of relationships between entities. In Neo4j, navigating from one node to another via a relationship can be done in constant time, regardless of the graph's size.

In contrast, relational databases require joins between tables to follow relationships, which can become computationally expensive, especially as the number of joins increases or when dealing with deep or complex relationships.

Therefore, this is particularly true for relationship-heavy queries, where SQL databases often need to create recursive views or perform iterative joins to explore multi-level relationships. In contrast, Neo4j can efficiently leverage its graph-native structure to maintain linear performance relative to the number of hops.

Another key advantage of Neo4j and Cypher is the simplicity and readability of queries when working with graph data. 
For example, if we want to find the names of people who are friends with someone named Alice, we can write the query in Cypher as follows:

```sql
MATCH (p:Person)-[:FRIEND_OF]->(:Person {name: 'Alice'})
RETURN p.name
```

In relational SQL, achieving the same result would require:

```sql
SELECT p1.name
FROM Person p1
JOIN Friendship f ON p1.id = f.person_id
JOIN Person p2 ON f.friend_id = p2.id
WHERE p2.name = 'Alice';
```

As shown above, relationship queries are significantly more intuitive in Cypher. 
The graph pattern-matching syntax closely reflects the actual structure of the data, making queries easier to write, read, and reason about.

The third advantage is Neo4j’s schema-free data model. In Neo4j, we are not required to define all possible relationships or properties for each node type in advance. This allows the data model to handle irregular or evolving data naturally, offering greater flexibility as your application and data grow over time.

In contrast, SQL databases rely on rigid schemas where tables, columns, and relationships must be defined upfront. Adapting to changes in data structure often requires schema migrations, making it less adaptable to dynamic or semi-structured data.

Finally, as you will see throughout this tutorial, Neo4j is both optimized for and well-suited to implementing graph algorithms. By using in-memory graph projections, we can efficiently run algorithms to uncover patterns such as communities, influence, shortest paths, and more. This makes Neo4j not just a data store, but a powerful analytical tool.

### Drawbacks of Neo4j and cypher over relational SQL
Interestingly, some of Neo4j’s strengths can also become limitations depending on the use case. If your data is highly structured and the focus is on transactions rather than relationships, Neo4j may be less efficient than relational databases. Traditional SQL databases typically perform better for workflows centered around aggregations, batch updates, or structured reporting.

Another drawback is that Neo4j is not ideal for real-time analysis on rapidly changing data. As you will see in this tutorial, running graph algorithms often requires creating in-memory projections, which work best with relatively stable snapshots of the data rather than constantly updating streams.

Finally, graph databases tend to have a higher memory overhead compared to relational SQL databases. This is due to how they store nodes and relationships in memory to enable fast traversal, which can increase resource usage—especially for large-scale datasets.

### Conclusion

As with any decision involving database management systems, it's important to carefully analyze your **use cases** before choosing the right tool. Neo4j excels in certain scenarios, but may not be the best fit for others.

A simple roadmap of requirements that might indicate Neo4j is a good choice includes:

- **Unstructured or evolving data**: When the data model is flexible and may change over time.
- **Relationship-focused queries**: When your workload involves exploring or analyzing relationships and traversals (e.g., social networks, recommendations, graphs of dependencies).
- **Stable datasets or snapshot-based analysis**: When the data is relatively stable, or when it's acceptable to analyze it using periodic snapshots rather than in real-time.


source:

https://www.quora.com/What-are-the-pros-and-cons-of-using-graph-databases-compared-to-traditional-relational-databases-in-modern-web-development

https://aws.amazon.com/fr/compare/the-difference-between-graph-and-relational-database/

https://www.thatdot.com/blog/understanding-the-scale-limitations-of-graph-databases/#:~:text=Graph%20databases%20are%20great%20at,on%20streaming%20data%20are%20desired.

https://www.nebula-graph.io/posts/why-use-graph-databases#:~:text=Graph%20databases%20provide%20a%20flexible,based%20on%20the%20collected%20insights.

https://neo4j.com/docs/getting-started/?utm_source=GSearch&utm_medium=PaidSearch&utm_campaign=Evergreen&utm_content=EMEA-Search-SEMCE-DSA-None-SEM-SEM-NonABM&utm_term=&utm_adgroup=DSA&gad_source=1&gclid=Cj0KCQjwtJ6_BhDWARIsAGanmKfTPoCcMxVBQzAo82Ng60-loTIjCV3yfWp9R_PvEh0qp6mz84Ks6yQaAiQlEALw_wcB

# Installation & configuration

## Installing Neo4J
If you don't have Docker installed, you can install it from [here](https://www.docker.com/). 

First, in the terminal, pull the Neo4J image from Docker:

`docker pull neo4j`

Now, create a Neo4J instance (thanks to the docker-compose.yml file).

`docker compose up -d`

You can now access the Neo4J browser by going to [http://localhost:7474](http://localhost:7474). We will use this to visualize the graph.

The default username is `neo4j` and the default password is `password`.



In [1]:
# Installation of Neo4j 
!pip install neo4j
!pip install yfiles_jupyter_graphs_for_neo4j




[notice] A new release of pip is available: 24.0 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip





[notice] A new release of pip is available: 24.0 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


In [28]:
# Loading the libraries
from neo4j import GraphDatabase
from yfiles_jupyter_graphs_for_neo4j import Neo4jGraphWidget
import json

# Connecting to the Neo4j database
driver = GraphDatabase.driver(uri="bolt://localhost:7687", auth=("neo4j", "password"))
session = driver.session()
g = Neo4jGraphWidget(driver)

# Dataset

First, we clear the existing database

In [29]:
session.run("""
    MATCH (n)
    DETACH DELETE n
""")

<neo4j._sync.work.result.Result at 0x124f39343d0>

Secondly, we create two JSON datasets: one for startups and the other for investors. PS: We used chatgpt to help us generating realistic data.

In [30]:
with open('startups.json', 'r') as file:
    data = json.load(file)
    
    # Create startups with their technology
    for tech_name, tech_data in data['technologies'].items():
        for startup in tech_data['startups']:
            session.run("""
                CREATE (s:Startup {
                    name: $name,
                    country: $country,
                    technology: $technology
                })
            """, {
                'name': startup['name'],
                'country': startup['country'],
                'technology': tech_name
            })
    
    # Create investors with their sectors
    for investor in data['investors']:
        session.run("""
            CREATE (i:Investor {
                name: $name,
                sector: $sector
            })
        """, {
            'name': investor['name'],
            'sector': ', '.join(investor['sectors'])
        })


Creation of the relationships between investors and startups

In [31]:
# AI Sector Investments
session.run("""
    MATCH (i1:Investor {name: 'Elon Musk'}), (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'Anthropic'}),
          (s3:Startup {name: 'Adept AI'}), (s4:Startup {name: 'DeepMind'})
    CREATE (i1)-[:INVESTS_IN]->(s1),
           (i1)-[:INVESTS_IN]->(s2),
           (i1)-[:INVESTS_IN]->(s3),
           (i1)-[:INVESTS_IN]->(s4)
""")

session.run("""
    MATCH (i2:Investor {name: 'Andreessen Horowitz'}), (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'Cohere'}),
          (s3:Startup {name: 'Hugging Face'}), (s4:Startup {name: 'Stability AI'})
    CREATE (i2)-[:INVESTS_IN]->(s1),
           (i2)-[:INVESTS_IN]->(s2),
           (i2)-[:INVESTS_IN]->(s3),
           (i2)-[:INVESTS_IN]->(s4)
""")

# Aerospace Sector Investments
session.run("""
    MATCH (i7:Investor {name: 'SoftBank'}), (s1:Startup {name: 'SpaceX'}), (s2:Startup {name: 'Blue Origin'}),
          (s3:Startup {name: 'Rocket Lab'}), (s4:Startup {name: 'Relativity Space'})
    CREATE (i7)-[:INVESTS_IN]->(s1),
           (i7)-[:INVESTS_IN]->(s2),
           (i7)-[:INVESTS_IN]->(s3),
           (i7)-[:INVESTS_IN]->(s4)
""")

session.run("""
    MATCH (i8:Investor {name: 'Peter Thiel'}), (s1:Startup {name: 'SpaceX'}), (s2:Startup {name: 'Rocket Lab'})
    CREATE (i8)-[:INVESTS_IN]->(s1),
           (i8)-[:INVESTS_IN]->(s2)
""")

session.run("""
    MATCH (i7:Investor {name: 'SoftBank'}), (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'SpaceX'}),
          (s3:Startup {name: 'Tesla'}), (s4:Startup {name: 'Revolut'})
    CREATE (i7)-[:INVESTS_IN]->(s1),
           (i7)-[:INVESTS_IN]->(s2),
           (i7)-[:INVESTS_IN]->(s3),
           (i7)-[:INVESTS_IN]->(s4)
""")

session.run("""
    MATCH (i2:Investor {name: 'Andreessen Horowitz'}), (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'Stripe'}),
          (s3:Startup {name: 'Coinbase'}), (s4:Startup {name: 'Tesla'})
    CREATE (i2)-[:INVESTS_IN]->(s1),
           (i2)-[:INVESTS_IN]->(s2),
           (i2)-[:INVESTS_IN]->(s3),
           (i2)-[:INVESTS_IN]->(s4)
""")

session.run("""
    MATCH (i9:Investor {name: 'Tiger Global'}), (s1:Startup {name: 'Stripe'}), (s2:Startup {name: 'Binance'}),
          (s3:Startup {name: 'Tesla'}), (s4:Startup {name: 'Hugging Face'})
    CREATE (i9)-[:INVESTS_IN]->(s1),
           (i9)-[:INVESTS_IN]->(s2),
           (i9)-[:INVESTS_IN]->(s3),
           (i9)-[:INVESTS_IN]->(s4)
""")


# FinTech Sector Investments
session.run("""
    MATCH (i3:Investor {name: 'Sequoia Capital'}), (s1:Startup {name: 'Stripe'}), (s2:Startup {name: 'Revolut'}),
          (s3:Startup {name: 'Klarna'}), (s4:Startup {name: 'Brex'})
    CREATE (i3)-[:INVESTS_IN]->(s1),
           (i3)-[:INVESTS_IN]->(s2),
           (i3)-[:INVESTS_IN]->(s3),
           (i3)-[:INVESTS_IN]->(s4)
""")

session.run("""
    MATCH (i9:Investor {name: 'Tiger Global'}), (s1:Startup {name: 'Stripe'}), (s2:Startup {name: 'Klarna'}),
          (s3:Startup {name: 'Brex'})
    CREATE (i9)-[:INVESTS_IN]->(s1),
           (i9)-[:INVESTS_IN]->(s2),
           (i9)-[:INVESTS_IN]->(s3)
""")

# Electric Vehicle Sector Investments
session.run("""
    MATCH (i10:Investor {name: 'Cathie Wood'}), (s1:Startup {name: 'Tesla'}), (s2:Startup {name: 'Nio'}),
          (s3:Startup {name: 'Rivian'})
    CREATE (i10)-[:INVESTS_IN]->(s1),
           (i10)-[:INVESTS_IN]->(s2),
           (i10)-[:INVESTS_IN]->(s3)
""")

session.run("""
    MATCH (i11:Investor {name: 'Mark Cuban'}), (s1:Startup {name: 'Tesla'}), (s2:Startup {name: 'Lucid Motors'})
    CREATE (i11)-[:INVESTS_IN]->(s1),
           (i11)-[:INVESTS_IN]->(s2)
""")

# Blockchain Sector Investments
session.run("""
    MATCH (i5:Investor {name: 'Binance Labs'}), (s1:Startup {name: 'Binance'}), (s2:Startup {name: 'Ledger'})
    CREATE (i5)-[:INVESTS_IN]->(s1),
           (i5)-[:INVESTS_IN]->(s2)
""")

session.run("""
    MATCH (i12:Investor {name: 'Accel Partners'}), (s1:Startup {name: 'Chainalysis'}), (s2:Startup {name: 'Coinbase'})
    CREATE (i12)-[:INVESTS_IN]->(s1),
           (i12)-[:INVESTS_IN]->(s2)
""")



<neo4j._sync.work.result.Result at 0x124f38a11d0>

Relationships startup-startup

In [32]:
session.run("""
    MATCH (s1:Startup {name: 'OpenAI'}), (s2:Startup {name: 'Tesla'})
    CREATE (s1)-[:COLLABORATES_WITH]->(s2)
""")

session.run("""
    MATCH (s1:Startup {name: 'Revolut'}), (s2:Startup {name: 'Stripe'})
    CREATE (s1)-[:COLLABORATES_WITH]->(s2)
""")

session.run("""
    MATCH (s1:Startup {name: 'Binance'}), (s2:Startup {name: 'Coinbase'})
    CREATE (s1)-[:COMPETES_WITH]->(s2)
""")

session.run("""
    MATCH (s1:Startup {name: 'Tesla'}), (s2:Startup {name: 'Lucid Motors'})
    CREATE (s1)-[:COMPETES_WITH]->(s2)
""")

session.run("""
    MATCH (s1:Startup {name: 'SpaceX'}), (s2:Startup {name: 'Blue Origin'})
    CREATE (s1)-[:COMPETES_WITH]->(s2)
""")

session.run("""
    MATCH (s1:Startup {name: 'DeepMind'}), (s2:Startup {name: 'Mistral AI'})
    CREATE (s1)-[:PARTNERS_WITH]->(s2)
""")


<neo4j._sync.work.result.Result at 0x124f3738350>

In [34]:
# Define the Cypher query to visualize Startups and Investors

g.show_cypher("MATCH (s)-[r]->(t) RETURN s, r, t")


AttributeError: 'NoneType' object has no attribute 'lower'

# PageRank algorithm

The PageRank algorithm ranks the nodes of a graph based on their influence. It’s a recursive algorithm where a node’s score depends on the scores of the nodes linking to it, as well as how many other nodes those linking nodes connect to. This algorithm is implemented in the "graph-data-science plugin", and we’ll break down its core functionalities.

The first step is to create an in-memory projection of the graph. The primary goal of doing this projection is to streamline the graph and isolate it from live data. This permits faster execution times as we work on a simplified version of the graph. To create such a projection, we first need to retrieve the labels of our nodes as well as the possible relationships between them.

In [8]:
labelsResponse = session.run("""CALL db.labels()""")
labels = labelsResponse.data()
print(labels)

relationshipResponse = session.run("CALL db.relationshipTypes()")
relationships = relationshipResponse.data()
print(relationships)


[{'label': 'Startup'}, {'label': 'Investor'}]
[{'relationshipType': 'INVESTS_IN'}, {'relationshipType': 'COLLABORATES_WITH'}, {'relationshipType': 'COMPETES_WITH'}]


Then, by using the labels and relationships type, we can create the projection.

In [9]:
projection_query = """
CALL gds.graph.project(
  'generalProjection', 
  ['Startup', 'Investor'], 
  {
    INVESTS_IN: {},
    COLLABORATES_WITH: {},
    COMPETES_WITH: {}
  }
)
"""
session.run(projection_query)

<neo4j._sync.work.result.Result at 0x21ad453d4d0>

Now we can run the PageRank algorithm using the stream mode. In stream mode, the algorithm computes a score for every node, which allows us to post-process the results without affecting the underlying data. Additionally, we limit the query to return only the top 10 nodes.

In [10]:
pagerankGeneralQuery = """
CALL gds.pageRank.stream('generalProjection')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
"""
pagerankGeneralRes = session.run(pagerankGeneralQuery)
for record in pagerankGeneralRes:
    print(f"Name: {record['name']}, Score: {record['score']}")

Name: Lucid Motors, Score: 0.52021484375
Name: Tesla, Score: 0.43554687500000006
Name: Coinbase, Score: 0.42731250000000004
Name: Stripe, Score: 0.424390625
Name: Blue Origin, Score: 0.37471875000000004
Name: SpaceX, Score: 0.24562500000000004
Name: OpenAI, Score: 0.22968750000000004
Name: Rocket Lab, Score: 0.22968750000000004
Name: Klarna, Score: 0.22437500000000005
Name: Brex, Score: 0.22437500000000005


Note that we can filter the type of relationship on the projection. For exemple, if we want to find the most influent node with regard to only the INVESTS_IN relationships, we can do the following projection.

In [11]:
filteredProjectionQuery = """
CALL gds.graph.project(
  'filteredProjection',
  ['Startup', 'Investor'],
  {
    INVESTS_IN: {}
  }
)"""

pagerankFilteredQuery = """
CALL gds.pageRank.stream('filteredProjection')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
"""

session.run(filteredProjectionQuery)

pagerankFilteredRes = session.run(pagerankFilteredQuery)
for record in pagerankFilteredRes:
    print(f"Name: {record['name']}, Score: {record['score']}")


Name: Stripe, Score: 0.25625000000000003
Name: SpaceX, Score: 0.24562500000000004
Name: Coinbase, Score: 0.24562500000000004
Name: Tesla, Score: 0.24031250000000004
Name: Rocket Lab, Score: 0.22968750000000004
Name: OpenAI, Score: 0.22968750000000004
Name: Brex, Score: 0.22437500000000005
Name: Klarna, Score: 0.22437500000000005
Name: Binance, Score: 0.21375000000000002
Name: Chainalysis, Score: 0.21375000000000002


In a similar fashion, we can filter nodes based on their labels. For example, if we want to focus solely on nodes with the label Startup, we can create a projection like this:

In [12]:
filteredProjectionQuery2 = """
CALL gds.graph.project(
  'filteredProjection2', 
  ['Startup'], 
  {
    INVESTS_IN: {},
    COLLABORATES_WITH: {},
    COMPETES_WITH: {}
  }
)
"""

pagerankFilteredQuery2 = """
CALL gds.pageRank.stream('filteredProjection2')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
"""

session.run(filteredProjectionQuery2)

pagerankFilteredRes2 = session.run(pagerankFilteredQuery2)
for record in pagerankFilteredRes2:
    print(f"Name: {record['name']}, Score: {record['score']}")


Name: Lucid Motors, Score: 0.385875
Name: Stripe, Score: 0.2775
Name: Blue Origin, Score: 0.2775
Name: Coinbase, Score: 0.2775
Name: Tesla, Score: 0.2775
Name: SpaceX, Score: 0.15000000000000002
Name: Stability AI, Score: 0.15000000000000002
Name: Anthropic, Score: 0.15000000000000002
Name: Relativity Space, Score: 0.15000000000000002
Name: Adept AI, Score: 0.15000000000000002


There are numerous options available to fine-tune and extend the PageRank algorithm. You can read the Neo4j documentation for more details (https://neo4j.com/docs/graph-data-science/current/algorithms/page-rank/). However, to keep this tutorial straightforward, we'll focus solely on estimating the memory cost of running the algorithm on our projections. Note that we must specify the execution mode; here, we are using stream mode.

In [13]:
estimateQuery = """
CALL gds.louvain.stream.estimate('generalProjection', {})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
"""

estimateRes = session.run(estimateQuery)
for record in estimateRes.data():
    for key, value in record.items():
        print(f"{key}: {value}")

nodeCount: 50
relationshipCount: 37
bytesMin: 8497
bytesMax: 568584
requiredMemory: [8497 Bytes ... 555 KiB]


# Louvain algorithm

The Louvain algorithm is used for community detection in graphs by maximizing a metric called modularity. It works iteratively to group nodes into communities such that nodes within the same community are densely connected, while connections between different communities are sparser. This algorithm is also implemented in the "graph-data-science plugin," and we’ll break down its core functionalities.

The Louvain algorithm also operates on an in-memory graph projection. In this tutorial, we'll use the same projection that we created for the PageRank algorithm. Like PageRank, Louvain offers various execution modes, and we'll use stream mode to stay consistent with our previous approach.

We create a query that orders the communities by the number of nodes they contain. The query also list the nodes in each community and limit the output to only the top 5 communities.

In [14]:
louvainGeneralQuery = """
CALL gds.louvain.stream('generalProjection')
YIELD nodeId, communityId
WITH communityId, 
     collect(gds.util.asNode(nodeId).name) AS nodes, 
     count(*) AS communitySize
ORDER BY communitySize DESC
LIMIT 5
RETURN communityId AS community, communitySize, nodes
"""

louvainGeneralRes = session.run(louvainGeneralQuery)
for record in louvainGeneralRes:
    print(f"community: {record['community']}, communitySize: {record['communitySize']}, nodes: {record['nodes']}")


community: 25, communitySize: 13, nodes: ['OpenAI', 'Tesla', 'Rivian', 'Lucid Motors', 'Nio', 'Binance', 'Coinbase', 'Chainalysis', 'Ledger', 'Andreessen Horowitz', 'Binance Labs', 'Cathie Wood', 'Accel Partners']
community: 15, communitySize: 6, nodes: ['Stripe', 'Revolut', 'Klarna', 'Brex', 'Sequoia Capital', 'Tiger Global']
community: 9, communitySize: 6, nodes: ['SpaceX', 'Blue Origin', 'Rocket Lab', 'Relativity Space', 'SoftBank', 'Peter Thiel']
community: 4, communitySize: 4, nodes: ['DeepMind', 'Anthropic', 'Adept AI', 'Elon Musk']
community: 3, communitySize: 1, nodes: ['Cohere']


In [15]:
louvainFilteredQuery = """
CALL gds.louvain.stream('filteredProjection')
YIELD nodeId, communityId
WITH communityId, 
     collect(gds.util.asNode(nodeId).name) AS nodes, 
     count(*) AS communitySize
ORDER BY communitySize DESC
LIMIT 5
RETURN communityId AS community, communitySize, nodes
"""

louvainFilteredRes = session.run(louvainFilteredQuery)
for record in louvainFilteredRes:
    print(f"community: {record['community']}, communitySize: {record['communitySize']}, nodes: {record['nodes']}")

community: 21, communitySize: 9, nodes: ['OpenAI', 'Tesla', 'Rivian', 'Nio', 'Coinbase', 'Chainalysis', 'Andreessen Horowitz', 'Cathie Wood', 'Accel Partners']
community: 15, communitySize: 6, nodes: ['Stripe', 'Revolut', 'Klarna', 'Brex', 'Sequoia Capital', 'Tiger Global']
community: 9, communitySize: 6, nodes: ['SpaceX', 'Blue Origin', 'Rocket Lab', 'Relativity Space', 'SoftBank', 'Peter Thiel']
community: 4, communitySize: 4, nodes: ['DeepMind', 'Anthropic', 'Adept AI', 'Elon Musk']
community: 27, communitySize: 3, nodes: ['Binance', 'Ledger', 'Binance Labs']


In [16]:
louvainFilteredQuery2 = """
CALL gds.louvain.stream('filteredProjection2')
YIELD nodeId, communityId
WITH communityId, 
     collect(gds.util.asNode(nodeId).name) AS nodes, 
     count(*) AS communitySize
ORDER BY communitySize DESC
LIMIT 5
RETURN communityId AS community, communitySize, nodes
"""

louvainFilteredRes2 = session.run(louvainFilteredQuery2)
for record in louvainFilteredRes2:
    print(f"community: {record['community']}, communitySize: {record['communitySize']}, nodes: {record['nodes']}")

community: 20, communitySize: 3, nodes: ['OpenAI', 'Tesla', 'Lucid Motors']
community: 25, communitySize: 2, nodes: ['Binance', 'Coinbase']
community: 12, communitySize: 2, nodes: ['Stripe', 'Revolut']
community: 7, communitySize: 2, nodes: ['SpaceX', 'Blue Origin']
community: 2, communitySize: 1, nodes: ['Anthropic']


There are also numerous options available for configuring the Louvain algorithm. You can find more details in the Neo4j Graph Data Science documentation at https://neo4j.com/docs/graph-data-science/current/algorithms/louvain/. The memory cost estimation for running this algorithm is similiar as how we porceeded for pageRank.

In [17]:
estimateQuery = """
CALL gds.louvain.stream.estimate('generalProjection', {})
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
"""

estimateRes = session.run(estimateQuery)
for record in estimateRes.data():
    for key, value in record.items():
        print(f"{key}: {value}")

nodeCount: 50
relationshipCount: 37
bytesMin: 8497
bytesMax: 568584
requiredMemory: [8497 Bytes ... 555 KiB]


# Cross-Analyzing PageRank & Communities 

In [18]:
# Code for the cross-analyzing PageRank & Communities 

#  Real-World Use Cases 

Describe and/or cite real-world examples of how the database technology is used in different industries and applications.

# Conclusion

Conclude the tutorial with a summary of the main points and the benefits (and drawbacks) of using Neo4j for graph databases.

# References

1. [NODES 2024 – Advanced Graph Visualizations in Jupyter Notebooks](https://neo4j.com/videos/nodes-2024-advanced-graph-visualizations-in-jupyter-notebooks/)