# Leiden Community Generation
This workbook focuses on detecting communities within our Neo4j dataset and saving this community information to the database nodes.

At the moment it only targets nodes with the `:v3` label, which corresponds to the dataset informally known as Atomic Indexing.


## Project the Graph into Memory
GDS algorithms operate on an in-memory projection of the graph rather than on the stored database, so the graph must be projected before the algorithm can run against it.

In this case, I've arbitrarily chosen the name "comm1" for the projection, but you can choose any name that doesn't already exist.

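The projection also stores each relationship's `weight` property. Leiden only uses it if the algorithm call sets `relationshipWeightProperty: 'weight'`. The calls in this workbook omit that setting, so they treat the graph as unweighted.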

The GDS Leiden implementation only works on undirected graphs, so the projection declares all relationship types (`'*'`) as undirected.

```sql
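// Project :v3 nodes joined by CONTAINS relationships into an in-memory graph named 'comm1'
MATCH (source:v3)-[r:CONTAINS]->(target:v3)
RETURN gds.graph.project(
	'comm1',
	source,
	target,
	{
		// No node properties are needed for Leiden
		sourceNodeProperties: source { },
		targetNodeProperties: target { },
		// Keep each relationship's weight in the projection
		relationshipProperties: r { .weight }
	},
	// Treat every relationship type as undirected, as Leiden requires
	{ undirectedRelationshipTypes: ['*'] }
)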
```
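Before running the algorithm, it can help to confirm that the projection exists and has the expected size. A minimal sketch using the GDS catalog procedure `gds.graph.list`, which returns no rows if `comm1` has not been projected. If you need to re-project under the same name, drop the old projection first with `gds.graph.drop('comm1')`, which also frees the memory it occupies.

```sql
// List the 'comm1' projection with its node and relationship counts
CALL gds.graph.list('comm1')
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
```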

## Memory Estimation
This estimates how much memory the Leiden write procedure needs. Because `comm1` is already projected, the estimate covers the algorithm run rather than loading the graph.

Compare the result against the memory limit configured for Neo4j in your local container or in OpenShift.

The estimate also reports how many nodes and relationships the procedure will cover.

```sql
CALL gds.leiden.write.estimate('comm1', {writeProperty: 'communityId', randomSeed: 19})
YIELD nodeCount, relationshipCount, requiredMemory
```
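The estimate above uses a plain `writeProperty` configuration, while the Write Results step also sets `includeIntermediateCommunities: true`, which keeps one community ID per level for every node and may therefore need more memory. A sketch that mirrors the write configuration and also reports the estimated range, including the percentage of the configured maximum heap it would occupy:

```sql
// Estimate memory for the same configuration used in the write step
CALL gds.leiden.write.estimate('comm1', {
  writeProperty: 'intermediateCommunities',
  includeIntermediateCommunities: true,
  randomSeed: 19,
  concurrency: 1
})
YIELD requiredMemory, bytesMin, bytesMax, heapPercentageMin, heapPercentageMax
```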

## Stats
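Stats mode runs the algorithm without writing anything and returns summary statistics, including the number of communities it found. Because `randomSeed` is fixed and `concurrency` is 1, the result should match the write step below.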

```sql
CALL gds.leiden.stats('comm1', { 
  includeIntermediateCommunities: true,
  concurrency: 1, 
  randomSeed: 19 
})
YIELD communityCount
```
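To see whether the projected `weight` values change the outcome, here is a sketch of the same stats call with `relationshipWeightProperty: 'weight'` added. It also yields how many levels ran, the final modularity, and whether the run converged, which makes the weighted and unweighted variants easy to compare before committing to a write.

```sql
// Same stats call, but using the projected weight property
CALL gds.leiden.stats('comm1', {
  relationshipWeightProperty: 'weight',
  includeIntermediateCommunities: true,
  concurrency: 1,
  randomSeed: 19
})
YIELD communityCount, ranLevels, modularity, didConverge
```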


## Stream Results
You can display the results of a Leiden algorithm without actually writing to the database. This can be useful to examine the output without committing to any write operations.

This example includes intermediate communities. The `intermediateCommunityIds` array is ordered by level: index 0 holds the community ID assigned at the first level, index 1 the ID from the second level, and so on.

Each level merges communities from the previous level into larger ones (our run produced 7 levels). For any given node, the ID at index 0 therefore refers to its smallest, most fine-grained community.

```sql
CALL gds.leiden.stream('comm1', { 
  randomSeed: 19,
  includeIntermediateCommunities: true,
  concurrency: 1
})
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).text AS text, communityId, intermediateCommunityIds
ORDER BY communityId ASC
```
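To check the level structure without writing anything, the following sketch unwinds each node's intermediate community IDs and summarizes every level: the number of distinct communities and the size of the largest one. If levels merge communities as described, the community count should shrink and the largest community should grow as the level increases.

```sql
CALL gds.leiden.stream('comm1', {
  randomSeed: 19,
  includeIntermediateCommunities: true,
  concurrency: 1
})
YIELD intermediateCommunityIds
// One row per (node, level) pair
UNWIND range(0, size(intermediateCommunityIds) - 1) AS level
// Count the members of each community at each level
WITH level, intermediateCommunityIds[level] AS community, count(*) AS members
// Summarize each level
RETURN level, count(community) AS communities, max(members) AS largestCommunity
ORDER BY level
```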

## Write Results
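Write mode stores the results on the nodes themselves. With `includeIntermediateCommunities: true`, the `writeProperty` (here `intermediateCommunities`) receives the array of community IDs from every level rather than only the final ID.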

```sql
CALL gds.leiden.write('comm1', {
  writeProperty: 'intermediateCommunities',
  randomSeed: 19,
  includeIntermediateCommunities: true,
  concurrency: 1
})
YIELD communityCount, modularity, modularities
```
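After the write completes, a quick check confirms that the property landed on the nodes. This sketch counts the `:v3` nodes that received `intermediateCommunities` and the range of array lengths. Nodes with no `CONTAINS` relationship to another `:v3` node were never part of the projection, so they will not have the property.

```sql
// Count nodes that received the community array and how many levels it holds
MATCH (n:v3)
WHERE n.intermediateCommunities IS NOT NULL
RETURN count(n) AS nodesWritten,
       min(size(n.intermediateCommunities)) AS minLevels,
       max(size(n.intermediateCommunities)) AS maxLevels
```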

In the context of the Leiden algorithm (and other community detection algorithms), modularity measures the quality of the community structure detected in the graph. It is a scalar between -1/2 and 1 that quantifies how well the graph is divided into communities. `modularity` is the score of the final partition, and `modularities` lists the score at each level. Here's what the scores represent:
- A higher modularity score indicates that the communities are well-defined, meaning there are more edges within communities and fewer edges between communities than expected by chance.
- Typical modularity values range from 0 to 1:
  - Close to 1: Strong community structure (dense connections within communities, sparse connections between them).
  - Close to 0: Weak or no community structure.
  - Negative: The community structure is worse than random (rare in practice).
- Modularity typically increases with each level as the algorithm optimizes the community assignments, but it may plateau or fluctuate slightly near the end.
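For reference, the standard definition of modularity (Newman and Girvan) for a partition of an undirected graph is shown below. Here $A_{ij}$ is the weight of the edge between nodes $i$ and $j$ (1 or 0 in an unweighted graph), $k_i$ is the (weighted) degree of node $i$, $m$ is the total edge weight, and $\delta(c_i, c_j)$ is 1 when $i$ and $j$ share a community and 0 otherwise:

$$
Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)
$$

The term $k_i k_j / 2m$ is the expected weight between $i$ and $j$ if edges were placed at random while preserving degrees, which is what "than expected by chance" refers to. Grouping the sum by community $c$, with $L_c$ the total edge weight inside $c$ and $d_c$ the total degree of its nodes:

$$
Q = \sum_{c}\left[\frac{L_c}{m} - \left(\frac{d_c}{2m}\right)^2\right]
$$

For example, two triangles joined by a single edge have $m = 7$. Splitting them into the two triangles gives $L_c = 3$ and $d_c = 7$ for each, so $Q = 2\left(\frac{3}{7} - \frac{1}{4}\right) = \frac{5}{14} \approx 0.36$.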

## Querying a Community
To view all the nodes in a community, run a query like the one below, replacing the community ID with the one you wish to retrieve.

```sql
MATCH (n:v3)
WHERE 178493 IN n.intermediateCommunities
RETURN n
```
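Because the property holds one ID per level, `178493 IN n.intermediateCommunities` matches that ID at any level, and IDs are not necessarily unique across levels (worth verifying on your data). To restrict the match to one level, compare a single index instead, for example `n.intermediateCommunities[0] = 178493` for the finest level or `n.intermediateCommunities[-1] = 178493` for the final one. The sketch below helps pick an ID by listing the ten largest communities at a chosen level; `level` is a value you set by hand.

```sql
// Choose a level: 0 = finest, -1 = final (Cypher lists accept negative indices)
WITH 0 AS level
MATCH (n:v3)
WHERE n.intermediateCommunities IS NOT NULL
// Group nodes by their community ID at that level
WITH n.intermediateCommunities[level] AS community, count(*) AS members
RETURN community, members
ORDER BY members DESC
LIMIT 10
```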