# Introducing py2neo

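py2neo is the most popular of the Python drivers used to interact with Neo4j. For simplicity, this example assumes that you've got authentication turned off. If you leave authentication on, the `authenticate` call in the first cell below supplies the credentials instead.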

You can turn authentication off by uncommenting this line in your neo4j.conf file:

`dbms.security.auth_enabled=false`

Now we'll import py2neo and write a simple query to find all the groups that have 'Python' in the name:

In [5]:
from py2neo import authenticate, Graph
# Only needed when authentication is enabled; substitute your own credentials.
authenticate("localhost:7474", "neo4j", "<your-password>")


In [6]:
graph = Graph()

In [8]:
query = """
MATCH (group:Group)-[:HAS_TOPIC]->(topic)
WHERE group.name CONTAINS "Python" 
RETURN group.name, COLLECT(topic.name) AS topics
"""

result = graph.run(query)

for row in result:
    print(row) 

('group.name': 'Python for Quant Finance', 'topics': ['Cloud Computing', 'New Technology', 'Python', 'Open Source', 'Machine Learning', 'Trading', 'Finance', 'Big Data', 'Computer programming', 'Predictive Analytics', 'Data Visualization', 'Data Analytics', 'Data Mining'])
('group.name': 'Python and Django Coding Session', 'topics': ['HTML', 'Computer programming', 'Front-end Development', 'Django', 'Web Technology', 'Programming Languages', 'Web Development', 'Python', 'MySQL', 'Software Development', 'CSS', 'Web Design', 'Open Source'])
('group.name': 'London Python Project Nights', 'topics': ['Open source python', 'Python Web Development', 'Projects', 'Computer programming', 'New Technology', 'Technology', 'Python', 'Software Development', 'Open Source'])


You should see a few groups and a list of the topics that they have.
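To make the aggregation in that query more concrete, here is a minimal pure-Python sketch of what `WHERE ... CONTAINS` and `COLLECT` do to the matched rows. The `rows` list is hypothetical sample data (the group "Neo4j London" is invented for illustration), and the sketch only mimics the query's behaviour; it is not how Neo4j executes it.

```python
from collections import defaultdict

# Hypothetical (group name, topic name) pairs, standing in for the rows
# that the MATCH clause would produce before filtering and aggregation.
rows = [
    ("Python for Quant Finance", "Python"),
    ("Python for Quant Finance", "Finance"),
    ("London Python Project Nights", "Python"),
    ("Neo4j London", "Graph Databases"),
]

topics_by_group = defaultdict(list)
for group_name, topic_name in rows:
    # WHERE group.name CONTAINS "Python" (a case-sensitive substring match)
    if "Python" in group_name:
        # COLLECT(topic.name), grouped implicitly by group.name
        topics_by_group[group_name].append(topic_name)

for group_name, topics in topics_by_group.items():
    print(group_name, topics)
```

Because `group.name` is the only non-aggregated column in the `RETURN` clause, Cypher uses it as the grouping key, which is the role the dictionary key plays here.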

# Calculating topic similarity

Now that we've got the hang of executing Neo4j queries from Python, let's calculate topic similarity based on common groups so that we can use it in our queries.

We'll first import the igraph library:

In [9]:
from igraph import Graph as IGraph

Next we'll write a query which finds all pairs of topics and then works out the number of common groups. We'll use that as our 'weight' in the similarity calculation. The `ID(topic) < ID(other)` predicate makes sure that each pair is only counted in one direction.

In [11]:
query = """
MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)
WHERE ID(topic) < ID(other)
RETURN topic.name, other.name, COUNT(*) AS weight
ORDER BY weight DESC
LIMIT 20
"""

graph.run(query)

<py2neo.database.Cursor at 0x7f44ca275f98>
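The weight calculation can also be sketched in plain Python, which may help when checking the query's results by hand. The `group_topics` mapping below is made-up sample data, and sorting topic names is only a stand-in for the `ID(topic) < ID(other)` ordering used in the Cypher query.

```python
from collections import Counter
from itertools import combinations

# Hypothetical group -> topics mapping (invented sample data).
group_topics = {
    "group_a": ["Python", "Open Source", "Big Data"],
    "group_b": ["Python", "Open Source"],
    "group_c": ["Big Data", "Python"],
}

weights = Counter()
for topics in group_topics.values():
    # sorted() + combinations() yields each unordered pair exactly once,
    # playing the role of WHERE ID(topic) < ID(other).
    for topic, other in combinations(sorted(set(topics)), 2):
        # COUNT(*) AS weight: one increment per group sharing both topics
        weights[(topic, other)] += 1

# Equivalent of ORDER BY weight DESC
for (topic, other), weight in weights.most_common():
    print(topic, other, weight)
```

Each group that has both topics contributes one to the pair's weight, so topics that frequently appear together end up with the heaviest edges.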

Now let's run the query again and wrap the output in igraph:

In [14]:
query = """
MATCH (topic:Topic)<-[:HAS_TOPIC]-()-[:HAS_TOPIC]->(other:Topic)
WHERE ID(topic) < ID(other)
RETURN topic.name, other.name, COUNT(*) AS weight
"""

ig = IGraph.TupleList(graph.run(query), weights=True)
ig

<igraph.Graph at 0x7f44ca2ecc78>

We're now ready to run a community detection algorithm over the graph to see what clusters/communities we have:

In [15]:
# Walktrap returns a dendrogram; as_clustering() cuts it into flat clusters
dendrogram = ig.community_walktrap(weights="weight")
clusters = dendrogram.as_clustering()
len(clusters)

49

Let's have a quick look at what we've got:

In [16]:
# Build one dict per topic vertex
nodes = [{"id": name, "label": name} for name in ig.vs["name"]]

# Attach the cluster index that walktrap assigned to each vertex
for node in nodes:
    idx = ig.vs.find(name=node["id"]).index
    node["group"] = clusters.membership[idx]

nodes[:10]

[{'group': 0, 'id': 'Puppet', 'label': 'Puppet'},
 {'group': 0,
  'id': 'Red Hat Enterprise Linux (RHEL)',
  'label': 'Red Hat Enterprise Linux (RHEL)'},
 {'group': 1, 'id': 'Open Source', 'label': 'Open Source'},
 {'group': 2, 'id': 'Apache Spark', 'label': 'Apache Spark'},
 {'group': 0,
  'id': 'Technology Professionals',
  'label': 'Technology Professionals'},
 {'group': 0,
  'id': 'High Availability and Disaster Recovery',
  'label': 'High Availability and Disaster Recovery'},
 {'group': 1, 'id': 'Social Issues', 'label': 'Social Issues'},
 {'group': 1,
  'id': 'Economic and Social Justice',
  'label': 'Economic and Social Justice'},
 {'group': 2, 'id': 'Data Analytics', 'label': 'Data Analytics'},
 {'group': 2, 'id': 'Data Management', 'label': 'Data Management'}]
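As an aside, the same list can be built without a per-node lookup, assuming the vertex names and the membership list are both ordered by vertex index (which is how igraph normally exposes them). In this sketch `names` and `membership` are hypothetical stand-ins for `ig.vs["name"]` and `clusters.membership`, filled with a few values from the output above.

```python
# Hypothetical stand-ins: `names` plays the role of ig.vs["name"] and
# `membership` the role of clusters.membership, both in vertex-index order.
names = ["Puppet", "Open Source", "Apache Spark", "Social Issues"]
membership = [0, 1, 2, 1]

# Pair each vertex name with its cluster index in a single pass
nodes = [
    {"id": name, "label": name, "group": group}
    for name, group in zip(names, membership)
]

for node in nodes:
    print(node)
```

On a large graph this avoids calling `ig.vs.find` once per node, although for a few hundred topics the difference is unlikely to matter.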

Finally, we'll write a Cypher query which takes the results of our community detection algorithm and writes them back into Neo4j:

In [12]:
query = """
UNWIND {params} AS p 
MATCH (t:Topic {name: p.id}) 
MERGE (cluster:Cluster {name: p.group})
MERGE (t)-[:IN_CLUSTER]->(cluster)
"""

graph.run(query, params=nodes)
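To illustrate why `MERGE` is used here rather than `CREATE`, the following pure-Python sketch mimics the query's get-or-create behaviour. The `params` rows are hypothetical and shaped like the `nodes` list, the dictionary and set are simple stand-ins for the database, and the sketch assumes every topic in `params` already exists, so the `MATCH` step never filters a row out.

```python
# Hypothetical rows in the same shape as the `nodes` list passed as params.
params = [
    {"id": "Puppet", "label": "Puppet", "group": 0},
    {"id": "Open Source", "label": "Open Source", "group": 1},
    {"id": "Social Issues", "label": "Social Issues", "group": 1},
]

clusters = {}       # cluster name -> cluster record (stand-in for :Cluster nodes)
in_cluster = set()  # (topic name, cluster name) pairs (stand-in for :IN_CLUSTER)

for p in params:  # UNWIND {params} AS p
    # MERGE (cluster:Cluster {name: p.group}): reuse if present, else create
    cluster = clusters.setdefault(p["group"], {"name": p["group"]})
    # MERGE (t)-[:IN_CLUSTER]->(cluster): the set keeps each relationship unique
    in_cluster.add((p["id"], cluster["name"]))

print(len(clusters), "clusters,", len(in_cluster), "relationships")
```

Running the loop a second time leaves both collections unchanged, which mirrors how re-running the Cypher query should not create duplicate clusters or relationships.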

