https://neo4j.com/graphacademy/online-training/data-science/part-3/

https://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm

In [1]:
import pandas as pd
from py2neo import Graph
graph = Graph("bolt://localhost:7687", user='neo4j', password='newPassword')
# graph = Graph()

In [2]:
import matplotlib 
import matplotlib.pyplot as plt

## Page ranking

In the second part of this quest for a recommendation engine, we use the PageRank algorithm to recommend clients (customers) to a merchant, or merchants to a client.

PageRank is an algorithm that measures the so-called transitive influence or connectivity of nodes. It can be computed by either iteratively distributing one node’s rank (originally based on degree) over its neighbors or by randomly traversing the graph and counting the frequency of hitting each node during these walks.
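As a rough illustration of the "iteratively distributing rank" variant described above, here is a minimal pure-Python power-iteration sketch. It is not the GDS implementation. The damping factor of 0.85, the fixed 50 iterations, spreading a dangling node's rank evenly over all nodes, and the toy node names are all assumptions chosen for the example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # every node receives the "random jump" share (1 - d) / N
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                # distribute this node's rank evenly over its out-neighbours
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # dangling node (no out-links): spread its rank over all nodes
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# toy directed graph: A -> B -> C -> A, plus D -> C
ranks = pagerank([("A", "B"), ("B", "C"), ("C", "A"), ("D", "C")])
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

In the toy graph, C collects rank from both B and D, so it should end up with the highest score, while D (no incoming links) keeps only the random-jump share.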

Check that our database is running:

In [None]:
graph.run("CALL db.schema.visualization()").data()

Define a company, then link merchants of that company that share clients with a MERCHANT_LINK relationship:

In [None]:
companyname='DISCHEM'

In [None]:
create_merchantlink_query="""
MATCH (merchant0:Merchant {companyname: $companyname})<-[:TRANSACTED_AT]-(client:Client)-[:TRANSACTED_AT]->(merchant1:Merchant {companyname: $companyname})
WHERE merchant0.franchisename<>merchant1.franchisename 
MERGE (merchant0)-[link:MERCHANT_LINK]-(merchant1)
ON CREATE SET link.count = 1
ON MATCH SET link.count = link.count + 1
RETURN merchant0.franchisename, merchant1.franchisename;
"""
graph.run(create_merchantlink_query, {"companyname": companyname}).to_data_frame()
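The idea behind the MERCHANT_LINK relationships can be sketched in plain Python, assuming transactions are available as hypothetical (client, franchisename, companyname) tuples: for every client, each unordered pair of distinct franchises of the chosen company that the client visited gets one vote. The function name and store names are illustrative only.

```python
from collections import Counter
from itertools import combinations


def merchant_links(transactions, company):
    """Count clients shared by each unordered pair of distinct franchises of `company`."""
    franchises_by_client = {}
    for client, franchise, companyname in transactions:
        if companyname == company:
            franchises_by_client.setdefault(client, set()).add(franchise)

    links = Counter()
    for franchises in franchises_by_client.values():
        # sorting makes (a, b) and (b, a) the same key, like an undirected link
        for pair in combinations(sorted(franchises), 2):
            links[pair] += 1
    return links


# hypothetical (client, franchisename, companyname) transactions
transactions = [
    ("c1", "DISCHEM STORE 1", "DISCHEM"), ("c1", "DISCHEM STORE 2", "DISCHEM"),
    ("c2", "DISCHEM STORE 1", "DISCHEM"), ("c2", "DISCHEM STORE 2", "DISCHEM"),
    ("c2", "DISCHEM STORE 2", "DISCHEM"),
    ("c3", "DISCHEM STORE 1", "DISCHEM"), ("c3", "OTHER STORE", "OTHERCO"),
]
links = merchant_links(transactions, "DISCHEM")
```

This sketch counts each shared client once per pair. The Cypher query increments link.count once per matched row, and since each pair is matched in both orders (and once per combination of repeat transactions), its counts are likely to be higher; treat the two as related rather than identical measures.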

#### Find popular merchants:

In [None]:
popular_merchants_query = """
MATCH (merchant:Merchant)
RETURN merchant.franchisename, size((merchant)<-[:TRANSACTED_AT]-()) AS transactions
ORDER BY transactions DESC
LIMIT 10
"""

graph.run(popular_merchants_query).to_data_frame()
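For comparison, the ranking that popular_merchants_query produces can be sketched with collections.Counter, assuming (as the Cypher query implies) that every TRANSACTED_AT relationship is one transaction, and that transactions are hypothetical (client, franchisename) pairs:

```python
from collections import Counter


def popular_merchants(transactions, limit=10):
    """Return up to `limit` (franchisename, transactions) pairs, busiest first."""
    counts = Counter(name for _client, name in transactions)
    return counts.most_common(limit)


# hypothetical (client, franchisename) transactions
transactions = [
    ("c1", "STORE A"), ("c2", "STORE A"), ("c3", "STORE A"),
    ("c1", "STORE B"), ("c3", "STORE B"),
    ("c4", "STORE C"),
]
top_two = popular_merchants(transactions, limit=2)
```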

#### Pick one merchant: franchisename 'DIS-CHEM DAINFERN'

In [None]:
franchisename='DIS-CHEM DAINFERN'

In [None]:
query = """
MATCH (m:Merchant {franchisename:$franchisename})
RETURN m.companyname
"""
graph.run(query, {"franchisename": franchisename}).to_data_frame()

In [None]:
# query = """
#     CALL db.index.fulltext.createNodeIndex('merchants', ['Merchant'], ['franchisename'])
# """
# graph.run(query).data()

#### Retrieve FULLTEXT indices:

In [None]:
query = """
CALL db.indexes()
YIELD name, uniqueness, type
WHERE type = "FULLTEXT"
RETURN *
"""
graph.run(query).to_data_frame()

#### Conduct a full-text search on franchisename using the 'merchants' index:

In [None]:
query = """
CALL db.index.fulltext.queryNodes("merchants", "columbine")
YIELD node, score
RETURN node.franchisename, score, [(client)-[:TRANSACTED_AT]-(node) | client.dedupestatic] AS clients
LIMIT 10
"""
graph.run(query).to_data_frame()

In [None]:
my_node = graph.run("""MATCH (merchant:Merchant {companyname:'DISCHEM'})
RETURN merchant """).data()

In [None]:
my_node

In [None]:
my_node = graph.run("""MATCH (merchant:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client) 
RETURN merchant.franchisename, c.dedupestatic """).data()

In [None]:
len(my_node)

Retrieve this merchant's clients and the total number of transactions each client made:

In [None]:
client_transactions_query = """
MATCH (:Merchant {franchisename: $franchisename})<-[:TRANSACTED_AT]-(client)
WITH DISTINCT client
RETURN client.dedupestatic AS client, size((client)-[:TRANSACTED_AT]->()) AS total_transactions
ORDER BY total_transactions DESC
LIMIT 20
"""
graph.run(client_transactions_query, {"franchisename": franchisename}).to_data_frame()

Retrieve this merchant's clients and how many transactions each made at other merchants (excluding this franchise):

In [None]:
client_other_transactions_query = """
MATCH (merchant1:Merchant {franchisename: $franchisename})<-[:TRANSACTED_AT]-(client)-[t:TRANSACTED_AT]->(merchant2)
WHERE merchant1 <> merchant2
// count each transaction at another merchant once, even if the client visited merchant1 several times
RETURN client.dedupestatic AS client, count(DISTINCT t) AS other_transactions
ORDER BY other_transactions DESC
LIMIT 20
"""
graph.run(client_other_transactions_query, {"franchisename": franchisename}).to_data_frame()

In [None]:
collaborations_query = """
MATCH (merchant:Merchant {franchisename: $franchisename})<-[:TRANSACTED_AT]-(client)-[t:TRANSACTED_AT]->(comerchant:Merchant)
// exclude the chosen merchant itself and count each co-transaction once
WHERE comerchant <> merchant
RETURN comerchant.franchisename AS franchisename, count(DISTINCT t) AS cotransactions
ORDER BY cotransactions DESC
LIMIT 10
"""

graph.run(collaborations_query, {"franchisename": franchisename}).to_data_frame()
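The co-merchant idea can be sketched in Python on hypothetical (client, franchisename) pairs. This version skips the chosen franchise itself and counts each transaction at another merchant once, however many times the client visited the chosen franchise; both are assumptions about what "co-transactions" should mean here.

```python
from collections import Counter


def co_merchants(transactions, franchise, limit=10):
    """Rank other franchises by the transactions that `franchise`'s clients made there."""
    # clients who transacted at the chosen franchise at least once
    clients = {client for client, name in transactions if name == franchise}
    # each transaction elsewhere counts once, however often the client visited `franchise`
    counts = Counter(
        name for client, name in transactions
        if client in clients and name != franchise
    )
    return counts.most_common(limit)


# hypothetical (client, franchisename) transactions
transactions = [
    ("c1", "STORE X"), ("c1", "STORE X"), ("c1", "STORE Y"),
    ("c2", "STORE X"), ("c2", "STORE Y"), ("c2", "STORE Z"),
    ("c3", "STORE Z"),
]
top = co_merchants(transactions, "STORE X")
```

Client c3 never visited STORE X, so its STORE Z transaction is ignored.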

In [None]:
my_node = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)-
[:TRANSACTED_AT]-(other:Merchant)
WHERE other.franchisename<>dischem.franchisename 
RETURN dischem.franchisename, c.dedupestatic, other.franchisename, other.companyname""").data()

In [None]:
my_node

In [None]:
my_node = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)-
[:TRANSACTED_AT]-(other:Merchant)
WHERE other.franchisename<>dischem.franchisename 
RETURN c.dedupestatic AS client, count(DISTINCT other) AS other_merchants""").data()
my_node

In [None]:
my_node = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)-
[othertransaction:TRANSACTED_AT]-(other:Merchant)
WHERE other.franchisename<>dischem.franchisename 
RETURN other.franchisename AS other_franchisename, count(DISTINCT othertransaction) AS number_transactions""").data()
my_node

In [None]:
my_node = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)-
[othertransaction:TRANSACTED_AT]-(other:Merchant)
WHERE other.franchisename<>dischem.franchisename 
WITH other.franchisename AS other_franchisename, count(DISTINCT othertransaction) AS number_transactions
RETURN other_franchisename, number_transactions
ORDER BY number_transactions DESC""").data()
my_node

### Common Neighbours

Common neighbors captures the idea that two strangers who have a friend in common are more likely to be introduced than those who don’t have any friends in common.

In a retail or banking graph database, this notion can be extended to merchants: merchants who share clients may do so for a number of reasons. Their product offerings may be complementary, or they may attract clients with the same value preferences.
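To make the definition concrete: the common-neighbours score of two nodes is the size of the intersection of their neighbour sets, so for two clients it is the number of merchants they have both transacted at. The minimal Python sketch below uses a hypothetical client-merchant edge list; build_adjacency and common_neighbors are illustrative helpers, not part of GDS.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) edges, e.g. (client, merchant)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def common_neighbors(adjacency, a, b):
    """Number of distinct nodes adjacent to both a and b."""
    return len(adjacency.get(a, set()) & adjacency.get(b, set()))


# hypothetical client -> merchant transactions (repeat visits collapse into one neighbour)
edges = [
    ("client_1", "merchant_A"), ("client_1", "merchant_B"), ("client_1", "merchant_C"),
    ("client_2", "merchant_B"), ("client_2", "merchant_C"), ("client_2", "merchant_C"),
    ("client_2", "merchant_D"),
]
adjacency = build_adjacency(edges)
score = common_neighbors(adjacency, "client_1", "client_2")
```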

Take one client, previously identified as a DIS-CHEM DAINFERN shopper ('2.11279273006e+11'), and count the neighbours it shares with each other DIS-CHEM DAINFERN client using commonNeighbors:

In [None]:
commonNeighbors_df = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c0:Client)
WITH DISTINCT c0
MATCH (c1:Client {dedupestatic:'2.11279273006e+11'})
WHERE c0.dedupestatic <> c1.dedupestatic
RETURN c0.dedupestatic AS client1, c1.dedupestatic AS client2, gds.alpha.linkprediction.commonNeighbors(c0, c1) AS commonNeighbors
ORDER BY commonNeighbors DESC""").to_data_frame()
commonNeighbors_df

Below, all clients taken from the DainfernSquare complex (the DIS-CHEM DAINFERN clients) are compared pairwise via common neighbours. This takes longer to run and still needs to be sped up; there are approximately 63000 combinations.
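One possible way to avoid scoring every pair, offered as a sketch rather than the approach used in this notebook, is to invert the client-merchant relation: walk each merchant's client set and count every pair of clients found together. Only pairs sharing at least one merchant are ever touched, and each unordered pair is scored once. The client_merchants mapping and the min_common threshold are hypothetical; note that a merchant visited by almost every client still generates nearly all pairs.

```python
from collections import Counter, defaultdict
from itertools import combinations


def all_pairs_common_neighbors(client_merchants, min_common=1):
    """Common-neighbour counts for every client pair sharing >= min_common merchants.

    client_merchants maps each client to the set of merchants it transacted at.
    """
    # invert the relation: merchant -> clients who transacted there
    clients_by_merchant = defaultdict(set)
    for client, merchants in client_merchants.items():
        for merchant in merchants:
            clients_by_merchant[merchant].add(client)

    # every merchant adds one shared neighbour to each pair of its clients
    counts = Counter()
    for clients in clients_by_merchant.values():
        for pair in combinations(sorted(clients), 2):
            counts[pair] += 1

    return {pair: n for pair, n in counts.items() if n >= min_common}


client_merchants = {
    "client_1": {"merchant_A", "merchant_B", "merchant_C"},
    "client_2": {"merchant_B", "merchant_C", "merchant_D"},
    "client_3": {"merchant_D"},
}
pairs = all_pairs_common_neighbors(client_merchants)
strong_pairs = all_pairs_common_neighbors(client_merchants, min_common=2)
```

The min_common argument plays the same role as a threshold such as commonNeighbors > 5 in Cypher, but it is applied after counting rather than per pair.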

In [None]:
commonNeighbors_df = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)
WITH collect(DISTINCT c) AS clients
UNWIND clients AS c0
UNWIND clients AS c1
// keep each unordered pair once and skip self-pairs
WITH c0, c1 WHERE id(c0) < id(c1)
RETURN c0.dedupestatic AS client1, c1.dedupestatic AS client2, gds.alpha.linkprediction.commonNeighbors(c0, c1) AS commonNeighbors
ORDER BY commonNeighbors DESC""").to_data_frame()
commonNeighbors_df

In [None]:
my_node = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)
WITH collect(DISTINCT c) AS clients
UNWIND clients AS c0
UNWIND clients AS c1
// c0 is only in scope again after the UNWIND; keep each unordered pair once
WITH c0, c1
WHERE id(c0) < id(c1) AND gds.alpha.linkprediction.commonNeighbors(c0, c1) > 5
RETURN c0.dedupestatic AS client1, c1.dedupestatic AS client2""").data()
my_node

In [None]:
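# Project Merchant nodes and the (undirected) MERCHANT_LINK relationships into an in-memory graph for PageRank
gds_graph_create = """CALL gds.graph.create(
    'myGraph',
    'Merchant',
    {MERCHANT_LINK: {orientation: 'UNDIRECTED'}}
)"""
graph.run(gds_graph_create).data()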