https://neo4j.com/developer/guide-build-a-recommendation-engine/

https://www.r-bloggers.com/from-random-walks-to-personalized-pagerank/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964346/

In [1]:
import pandas as pd
from py2neo import Graph
graph = Graph("bolt://localhost:7687", user='neo4j', password='newPassword')
# graph = Graph()

In [2]:
import matplotlib 
import matplotlib.pyplot as plt

## merchantRank  (pageRank)

PageRank is an algorithm that measures the connectivity of nodes (their transitive influence) so that they can be compared and ranked. It can be computed either by

1. iteratively distributing each node’s rank (originally based on degree) over its neighbors, or
2. randomly traversing the graph and counting how often each node is hit during these walks.
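
To make the first (iterative) formulation concrete, here is a minimal pure-Python sketch on a toy graph. The function name `pagerank`, the toy edge list, and the unnormalised update `(1 - d) + d * sum(rank / out_degree)` are assumptions chosen for illustration; this is not the GDS implementation, although the floor of 0.15 in the scores later in this notebook is consistent with an update of this form at a damping factor of 0.85.

```python
def pagerank(edges, damping=0.85, max_iterations=50):
    """Iteratively distribute each node's rank over its out-neighbours.

    edges: iterable of (source, target) pairs of a directed graph.
    Returns a dict mapping node -> score (unnormalised, assumed form).
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 for node in nodes}
    for _ in range(max_iterations):
        incoming = {node: 0.0 for node in nodes}
        for source, targets in out_links.items():
            if targets:
                # A node splits its current rank evenly over its out-links.
                share = rank[source] / len(targets)
                for target in targets:
                    incoming[target] += share
        # Nodes without out-links simply stop passing rank on in this sketch.
        rank = {node: (1 - damping) + damping * incoming[node] for node in nodes}
    return rank


# Hypothetical toy graph: B and C point to A, and A points to D.
toy_edges = [("B", "A"), ("C", "A"), ("A", "D")]
scores = pagerank(toy_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes with no incoming links (B and C here) settle at `1 - damping`, while D outranks A because it inherits all of A's rank on top of its own base score.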

The PageRank code below is run over the whole graph to find the most influential Merchant in terms of transactions. For background on the algorithm, see:

https://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm

Check that our database is running:

In [3]:
graph.run("CALL db.schema.visualization()").data()

[{'nodes': [(_-2:Merchant {constraints: ['CONSTRAINT ON ( merchant:Merchant ) ASSERT (merchant.franchisename) IS UNIQUE'], indexes: [], name: 'Merchant'}),
   (_-1:Client {constraints: ['CONSTRAINT ON ( client:Client ) ASSERT (client.dedupestatic) IS UNIQUE'], indexes: [], name: 'Client'}),
   (_-3:Segment {constraints: ['CONSTRAINT ON ( segment:Segment ) ASSERT (segment.seg_l3_num) IS UNIQUE'], indexes: [], name: 'Segment'})],
  'relationships': [(Client)-[:TRANSACTED_AT {}]->(Merchant),
   (Merchant)-[:MERCHANT_VALUE_LINK {}]->(Merchant),
   (Merchant)-[:MERCHANT_LINK {}]->(Merchant),
   (Merchant)-[:MERCHANT_FEET_LINK {}]->(Merchant)]}]

### MERCHANT_FEET_LINK

The feet link is an edge between Merchants.  
1. It is created when one or more clients shopped at both merchants at any time during the same month.
2. The feet link is DIRECTED by the transaction counts at the two merchants, counting only clients who transacted at both.
3. For example, if client c transacted 'a' times at A and 'b' times at B with a>b, and c is the ONLY client who transacted at both A and B, then the MERCHANT_FEET_LINK edge points from B to A.
4. The COUNT property of the MERCHANT_FEET_LINK is equal to the number of clients who completed the edge.
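
The direction rule can be sketched as a small Python function. The name `feet_link_direction` is hypothetical, and the tie-breaking (equal counts point from the first merchant to the second) is an assumption taken from the `<=` comparison in the check query used in this notebook; how counts are aggregated over several shared clients is not shown here.

```python
def feet_link_direction(id0, id1, transactioncount0, transactioncount1):
    """Return (start, end) for a MERCHANT_FEET_LINK between two merchants.

    The edge points toward the merchant with the higher transaction count.
    Ties point from id0 to id1 (assumed, mirroring the `<=` in the check query).
    """
    if transactioncount0 <= transactioncount1:
        return (id0, id1)
    return (id1, id0)


# Merchant IDs and counts taken from rows of the query output in this notebook.
points_right = feet_link_direction(4655367, 4655368, 27, 31)
points_left = feet_link_direction(4655387, 4655450, 18, 12)
tie = feet_link_direction(4655368, 4655417, 2, 2)
```

With these inputs the first and third edges start at the first merchant, and the second edge starts at merchant 4655450, which has the lower count.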

In [4]:
# Sanity-check the direction of every MERCHANT_FEET_LINK.
# The WHERE clause pins m0/m1 to the IDs stored on the relationship, so each
# edge is returned once. The "mustpoint*" columns give the direction implied by
# the transaction counts; the "points*" columns give the direction actually stored.
MERCHANT_FEET_LINK_query = """
MATCH (m0:Merchant)-[rel:MERCHANT_FEET_LINK]-(m1:Merchant)
WHERE ID(m0) = rel.ID0 AND ID(m1) = rel.ID1
WITH m0, m1, rel,
     rel.transactioncount0 <= rel.transactioncount1 AS mustpointright,
     rel.transactioncount0 > rel.transactioncount1 AS mustpointleft,
     NOT (startNode(rel) = m0) AS pointsleft,
     (startNode(rel) = m0) AS pointsright
RETURN ID(m0) AS IDm0, ID(m1) AS IDm1,
       m0.franchisename AS franchisename0, m1.franchisename AS franchisename1,
       rel.transactioncount0 AS transactioncount0,
       rel.transactioncount1 AS transactioncount1,
       mustpointright, pointsright, mustpointleft, pointsleft;
"""
df = graph.run(MERCHANT_FEET_LINK_query).to_data_frame()

In [5]:
df

Unnamed: 0,IDm0,IDm1,franchisename0,franchisename1,transactioncount0,transactioncount1,mustpointright,pointsright,mustpointleft,pointsleft
0,4655367,4655368,Clicks Canal Walk,DIS-CHEM CANAL WALK,27,31,True,True,False,False
1,4655368,4655417,DIS-CHEM CANAL WALK,DIS-CHEM DAINFERN,2,2,True,True,False,False
2,4655387,4655450,DIS-CHEM KILLARNEY PHAMAC,DIS-CHEM SANDTON CITY PHA,18,12,False,False,True,True
3,4655368,4655450,DIS-CHEM CANAL WALK,DIS-CHEM SANDTON CITY PHA,1,1,True,True,False,False
4,4655387,4655484,DIS-CHEM KILLARNEY PHAMAC,Clicks Rand Steam,8,7,False,False,True,True
...,...,...,...,...,...,...,...,...,...,...
258055,4966020,5226623,DIS-CHEM CENTU,CLICKS VILLAGE,1,1,True,True,False,False
258056,5193978,5226623,CLICKS CLEARWA,CLICKS VILLAGE,1,1,True,True,False,False
258057,5186253,5229416,Dischem Sunnin,CLICKS WATERFA,2,1,False,False,True,True
258058,4971648,5229846,DIS-CHEM THREE,DIS-CHEM THR,1,1,True,True,False,False
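
The table is only useful as a check if the expected and stored directions are actually compared. A minimal sketch of that comparison follows; the helper name `misdirected_links` is hypothetical, and the sample rows are the first three rows of the output above, copied by hand.

```python
def misdirected_links(rows):
    """Return the rows whose stored direction disagrees with the counts."""
    return [
        row for row in rows
        if row["mustpointright"] != row["pointsright"]
        or row["mustpointleft"] != row["pointsleft"]
    ]


# First three rows of the output above (only the columns the check needs).
sample_rows = [
    {"IDm0": 4655367, "IDm1": 4655368, "mustpointright": True,
     "pointsright": True, "mustpointleft": False, "pointsleft": False},
    {"IDm0": 4655368, "IDm1": 4655417, "mustpointright": True,
     "pointsright": True, "mustpointleft": False, "pointsleft": False},
    {"IDm0": 4655387, "IDm1": 4655450, "mustpointright": False,
     "pointsright": False, "mustpointleft": True, "pointsleft": True},
]
bad = misdirected_links(sample_rows)
```

On the full result, the same helper could be applied to `df.to_dict("records")`; an empty list would mean every edge points the way its transaction counts require.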


### Create a graph built on the Dischem Clicks Merchant nodes linked only by MERCHANT_FEET_LINK:

List all existing graphs on this database:

In [8]:
graph_list_query="""CALL gds.graph.list()"""
graph.run(graph_list_query).to_data_frame()

Unnamed: 0,graphName,memoryUsage,sizeInBytes,nodeProjection,relationshipProjection,nodeQuery,relationshipQuery,nodeCount,relationshipCount,degreeDistribution,creationTime,modificationTime,schema
0,DischemClicksFeetGraph,18 MiB,18983448,"{'Merchant': {'properties': {}, 'label': 'Merc...",{'MERCHANT_FEET_LINK': {'orientation': 'NATURA...,,,92388,258060,"{'p99': 77, 'min': 0, 'max': 680, 'mean': 2.79...",2020-06-09T15:36:59.597548000+02:00,2020-06-09T15:37:01.524147999,{'relationships': {'MERCHANT_FEET_LINK': {'cou...


If the graph doesn't exist, create it as follows:

In [7]:
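# Project all Merchant nodes and their MERCHANT_FEET_LINK relationships
# (including the 'count' property) into the in-memory GDS graph catalog.
create_graph_query = """CALL gds.graph.create(
    'DischemClicksFeetGraph',
    'Merchant',
    'MERCHANT_FEET_LINK',
    {
        relationshipProperties: 'count'
    }
);
"""
graph.run(create_graph_query).to_data_frame()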

Unnamed: 0,nodeProjection,relationshipProjection,graphName,nodeCount,relationshipCount,createMillis
0,"{'Merchant': {'properties': {}, 'label': 'Merc...",{'MERCHANT_FEET_LINK': {'orientation': 'NATURA...,DischemClicksFeetGraph,92388,258060,752
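
The "list, then create if missing" steps can be folded into one helper. This is a sketch under assumptions: the function name `ensure_feet_graph` is hypothetical, `graph` is assumed to be a py2neo `Graph` (or any object with compatible `evaluate` and `run` methods), and the installed GDS version is assumed to provide the `gds.graph.exists` procedure alongside the `gds.graph.create` call used above.

```python
def ensure_feet_graph(graph, name="DischemClicksFeetGraph"):
    """Create the in-memory projection only if the catalog lacks it.

    Returns True if the projection was created, False if it already existed.
    """
    exists = graph.evaluate(
        "CALL gds.graph.exists($name) YIELD exists RETURN exists",
        name=name,
    )
    if exists:
        return False
    # Same projection as the cell above, with the graph name as a parameter.
    graph.run(
        """CALL gds.graph.create(
            $name,
            'Merchant',
            'MERCHANT_FEET_LINK',
            { relationshipProperties: 'count' }
        )""",
        name=name,
    )
    return True
```

Calling `ensure_feet_graph(graph)` before running an algorithm avoids the error that `gds.graph.create` raises when a graph with the same name is already in the catalog.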


In [11]:
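# Stream PageRank scores for every Merchant in the projected graph.
# No relationshipWeightProperty is set, so the projected 'count' property is
# not used and the ranking is unweighted.
MERCHANT_RANK_query = """CALL gds.pageRank.stream('DischemClicksFeetGraph', { maxIterations: 50, dampingFactor: 0.85 })
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).franchisename AS franchisename, gds.util.asNode(nodeId).companyname AS companyname, score
ORDER BY score DESC, franchisename ASC;"""
df = graph.run(MERCHANT_RANK_query).to_data_frame()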

ServiceUnavailable: Failed to read from defunct connection ('localhost', 7687) (Address(host='127.0.0.1', port=7687))

In [10]:
df

Unnamed: 0,franchisename,companyname,score
0,POS Clicks INS FUNDS,CLICKS,21.133721
1,Clicks The Mar,CLICKS,7.508745
2,DIS-CHEM GLEN,DISCHEM,6.979960
3,DIS-CHEM SOUTH,DISCHEM,6.702191
4,DIS-CHEM WONDE,DISCHEM,5.898821
...,...,...,...
92383,www.game.com.tw,GAME,0.150000
92384,www.sparkhaus-shop.com,SPAR,0.150000
92385,xmglobal limit,XMGLOBAL,0.150000
92386,xmglobal limited,XMGLOBAL,0.150000
