https://neo4j.com/developer/guide-build-a-recommendation-engine/

https://www.r-bloggers.com/from-random-walks-to-personalized-pagerank/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964346/

In [1]:
import pandas as pd
from py2neo import Graph
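# Connect to the local Neo4j instance over Bolt
graph = Graph("bolt://localhost:7687", user='neo4j', password='newPassword')
# graph = Graph()  # alternative: py2neo's default connection settings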

In [2]:
import matplotlib 
import matplotlib.pyplot as plt

## merchantRank (PageRank)

PageRank is an algorithm that measures the connectivity of nodes (their transitive influence) so that they can be compared and ranked. It can be computed either by

1. iteratively distributing each node's rank (initially based on its degree) over its neighbors, or
2. randomly traversing the graph and counting how often each node is hit during these walks.
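The two ways of computing PageRank listed above can be sketched on a toy graph in plain Python. This is a minimal illustration, not the GDS implementation: it uses the normalised form in which the scores sum to 1, starts from a uniform rank rather than a degree-based one, lets a node without outgoing edges jump to a random node, and the edge list and function names are hypothetical. With enough steps, the random-walk visit frequencies approach the iterative scores.

```python
import random
from collections import Counter


def build_adjacency(edges):
    """Return the sorted node list and each node's out-neighbours."""
    nodes = sorted({node for edge in edges for node in edge})
    out = {node: [] for node in nodes}
    for src, dst in edges:
        out[src].append(dst)
    return nodes, out


def pagerank_iterative(edges, damping=0.85, iterations=50):
    """Method 1: repeatedly distribute every node's rank over its out-neighbours."""
    nodes, out = build_adjacency(edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {v: (1 - damping) / n for v in nodes}  # teleport share
        for u in nodes:
            targets = out[u] if out[u] else nodes  # dangling node: spread over all nodes
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank


def pagerank_random_walk(edges, damping=0.85, steps=200_000, seed=0):
    """Method 2: walk the graph at random and count how often each node is visited."""
    nodes, out = build_adjacency(edges)
    rng = random.Random(seed)
    visits = Counter()
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        if out[current] and rng.random() < damping:
            current = rng.choice(out[current])  # follow a random outgoing edge
        else:
            current = rng.choice(nodes)  # teleport (always, from a dangling node)
    return {v: visits[v] / steps for v in nodes}


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "C")]
print(pagerank_iterative(edges))
print(pagerank_random_walk(edges))
```

In this toy graph D has no incoming edge, so both methods give it only the teleport share, (1 - 0.85) / 4.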

The PageRank code below runs over all Merchant nodes linked by MERCHANT_FEET_LINK to find the most influential merchants in terms of transactions:

https://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm

Check that the database is running by retrieving its schema:

In [3]:
graph.run("CALL db.schema.visualization()").data()

[{'nodes': [(_-8:Merchant {constraints: ['CONSTRAINT ON ( merchant:Merchant ) ASSERT (merchant.franchisename) IS UNIQUE'], indexes: [], name: 'Merchant'}),
   (_-7:Client {constraints: ['CONSTRAINT ON ( client:Client ) ASSERT (client.dedupestatic) IS UNIQUE'], indexes: [], name: 'Client'}),
   (_-9:Segment {constraints: ['CONSTRAINT ON ( segment:Segment ) ASSERT (segment.seg_l3_num) IS UNIQUE'], indexes: [], name: 'Segment'})],
  'relationships': [(Client)-[:TRANSACTED_AT {}]->(Merchant),
   (Merchant)-[:MERCHANT_VALUE_LINK {}]->(Merchant),
   (Merchant)-[:MERCHANT_LINK {}]->(Merchant),
   (Merchant)-[:MERCHANT_FEET_LINK {}]->(Merchant)]}]

### MERCHANT_FEET_LINK

The feet link is an edge between Merchants.  
1. It is created when one or more clients shopped at both merchants at any time during the same month.
2. The feet link is DIRECTED by comparing the transaction counts at the two merchants made by the clients who transacted at both: it points towards the merchant with the higher count.   
3. If client c transacted 'a' times at A and 'b' times at B, a>b, AND c is the ONLY client who transacted at both A and B, then the MERCHANT_FEET_LINK edge points from B to A. 
4. The count property of the MERCHANT_FEET_LINK is the number of clients who transacted at both merchants.
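The rules above can be sketched as follows. The notebook does not show how the MERCHANT_FEET_LINK edges were actually built, so this is a hypothetical reconstruction: it assumes the input holds one month of (client, merchant) transactions, sums the shared clients' transaction counts to decide the direction (rule 2), and sends ties from the first to the second merchant in sorted order, which is an arbitrary choice. The function name and the tx_source/tx_target keys are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def feet_links(transactions):
    """Build hypothetical feet-link edges from one month of (client, merchant) transactions.

    Returns {(source, target): {"count": ..., "tx_source": ..., "tx_target": ...}} where
    count is the number of clients who transacted at both merchants and the tx_* values
    are those clients' summed transaction counts at each end.
    """
    # per_client[client][merchant] = number of transactions by that client at that merchant
    per_client = defaultdict(lambda: defaultdict(int))
    for client, merchant in transactions:
        per_client[client][merchant] += 1

    # shared[(a, b)] = [clients at both, their transactions at a, their transactions at b]
    shared = defaultdict(lambda: [0, 0, 0])
    for visits in per_client.values():
        for a, b in combinations(sorted(visits), 2):
            stats = shared[(a, b)]
            stats[0] += 1
            stats[1] += visits[a]
            stats[2] += visits[b]

    links = {}
    for (a, b), (clients, tx_a, tx_b) in shared.items():
        # The edge points towards the merchant with more transactions (rule 2);
        # a tie points from a to b, an arbitrary choice for this sketch.
        if tx_a > tx_b:
            source, target, tx_source, tx_target = b, a, tx_b, tx_a
        else:
            source, target, tx_source, tx_target = a, b, tx_a, tx_b
        links[(source, target)] = {"count": clients, "tx_source": tx_source, "tx_target": tx_target}
    return links


# Rule 3: client c1 transacts three times at A and once at B, so the edge points from B to A
print(feet_links([("c1", "A"), ("c1", "A"), ("c1", "A"), ("c1", "B")]))
# {('B', 'A'): {'count': 1, 'tx_source': 1, 'tx_target': 3}}
```

A client who shopped at only one merchant contributes no pairs, so they never appear in any count.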

In [6]:
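# For each feet link, compare the stored direction (pointsright/pointsleft) with the direction
# the transaction counts require (mustpointright/mustpointleft). The WHERE clause keeps one row
# per link by matching m0/m1 to the node IDs stored on the relationship (ID0/ID1).
MERCHANT_FEET_LINK_query="""MATCH (m0:Merchant)-[rel:MERCHANT_FEET_LINK]-(m1:Merchant)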
WHERE ID(m0)=rel.ID0 AND ID(m1)=rel.ID1
WITH m0,m1,rel, rel.transactioncount0<=rel.transactioncount1 AS mustpointright, 
rel.transactioncount0>rel.transactioncount1 AS mustpointleft,
NOT (startNode(rel) = m0) as pointsleft,
(startNode(rel) = m0) as pointsright
RETURN ID(m0) as IDm0, ID(m1) as IDm1, rel.count as count, m0.franchisename as franchisename0,
m1.franchisename as franchisename1, rel.transactioncount0 as transactioncount0,
rel.transactioncount1 as transactioncount1,
mustpointright, pointsright, mustpointleft, pointsleft
ORDER BY count DESC;
"""
df=graph.run(MERCHANT_FEET_LINK_query).to_data_frame()

In [8]:
df.head(20)

| | IDm0 | IDm1 | count | franchisename0 | franchisename1 | transactioncount0 | transactioncount1 | mustpointright | pointsright | mustpointleft | pointsleft |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4965772 | 4966140 | 552 | DIS-CHEM MALL | Clicks Mall of | 753 | 677 | False | False | True | True |
| 1 | 4673941 | 4966853 | 392 | DIS-CHEM PAARL | Clicks Paarl M | 668 | 512 | False | False | True | True |
| 2 | 4965150 | 4965570 | 382 | Clicks Claremo | DIS-CHEM CLARE | 523 | 558 | True | True | False | False |
| 3 | 4964871 | 4971515 | 368 | DIS-CHEM BLUE | Clicks Blue R | 547 | 498 | False | False | True | True |
| 4 | 4967300 | 4970802 | 348 | DIS-CHEM CANAL | Clicks Canal W | 477 | 454 | False | False | True | True |
| 5 | 4965796 | 4973598 | 339 | DIS-CHEM KENIL | Clicks Kenilwo | 479 | 483 | True | True | False | False |
| 6 | 4964861 | 4973989 | 334 | DIS-CHEM BALLI | Clicks Ballito | 744 | 459 | False | False | True | True |
| 7 | 4968932 | 4969424 | 330 | DIS-CHEM CAPE | Clicks Cape Ga | 468 | 426 | False | False | True | True |
| 8 | 4965402 | 4966069 | 325 | DIS-CHEM SOMER | Clicks Somerse | 496 | 445 | False | False | True | True |
| 9 | 4975172 | 4979607 | 318 | Clicks South C | DIS-CHEM SOUTH | 410 | 507 | True | True | False | False |
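The feet-link query returns the expected and actual directions side by side but does not compare them. A small pandas helper (hypothetical, not part of the original notebook) can list the links where they disagree; applied to the DataFrame returned by that query, an empty result would mean every stored edge follows rule 2. The example uses only the ID and direction columns of the first three rows shown above.

```python
import pandas as pd


def direction_mismatches(df: pd.DataFrame) -> pd.DataFrame:
    """Return the feet links whose stored direction disagrees with the transaction-count rule."""
    wrong_right = df["mustpointright"] != df["pointsright"]
    wrong_left = df["mustpointleft"] != df["pointsleft"]
    return df[wrong_right | wrong_left]


# First three rows of the output above (ID and direction columns only)
sample = pd.DataFrame({
    "IDm0": [4965772, 4673941, 4965150],
    "IDm1": [4966140, 4966853, 4965570],
    "mustpointright": [False, False, True],
    "pointsright": [False, False, True],
    "mustpointleft": [True, True, False],
    "pointsleft": [True, True, False],
})
print(len(direction_mismatches(sample)))  # 0
```

Because mustpointright/mustpointleft and pointsright/pointsleft are complements, checking one pair would suffice; checking both keeps the helper robust to nulls in either column.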


### Create a graph built on the Dischem Clicks Merchant nodes linked only by MERCHANT_FEET_LINK:

List all graphs that already exist in the GDS graph catalog of this database:

In [9]:
graph_list_query="""CALL gds.graph.list()"""
graph.run(graph_list_query).to_data_frame()

If the graph does not exist yet, create it as follows:

In [10]:
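# Project all Merchant nodes and their MERCHANT_FEET_LINK relationships (with the count property)
# into an in-memory graph. gds.graph.create is GDS 1.x syntax; GDS 2.x renamed it gds.graph.project.
create_graph_query="""CALL gds.graph.create(
    'DischemClicksFeetGraph',
    'Merchant',
    'MERCHANT_FEET_LINK',
    {
        relationshipProperties: 'count'
    }
);
"""
graph.run(create_graph_query).to_data_frame()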

| | nodeProjection | relationshipProjection | graphName | nodeCount | relationshipCount | createMillis |
|---|---|---|---|---|---|---|
| 0 | {'Merchant': {'properties': {}, 'label': 'Merc... | {'MERCHANT_FEET_LINK': {'orientation': 'NATURA... | DischemClicksFeetGraph | 91942 | 141704 | 818 |
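Because gds.graph.create fails when a graph with the same name is already in the catalog, the list-then-create steps above can be folded into one helper. This is a sketch assuming GDS 1.x, where the gds.graph.exists and gds.graph.create procedures are available (GDS 2.x renamed the latter to gds.graph.project), and a py2neo Graph connection; the helper name ensure_projection is hypothetical.

```python
def ensure_projection(graph, name="DischemClicksFeetGraph"):
    """Create the in-memory projection only if the GDS catalog does not already hold it.

    `graph` is assumed to be a py2neo Graph (anything with .evaluate() and .run() works).
    Uses GDS 1.x procedure names. Returns True if the projection was created,
    False if it already existed.
    """
    exists = graph.evaluate(
        "CALL gds.graph.exists($name) YIELD exists RETURN exists", name=name
    )
    if exists:
        return False
    graph.run(
        "CALL gds.graph.create($name, 'Merchant', 'MERCHANT_FEET_LINK', "
        "{relationshipProperties: 'count'})",
        name=name,
    )
    return True
```

In the notebook this would be called as ensure_projection(graph), making the cell safe to re-run.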


In [11]:
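# Stream a PageRank score for every node in the projection. No relationshipWeightProperty
# is passed, so the projected 'count' property is not used: every feet link weighs the same.
MERCHANT_RANK_query="""CALL gds.pageRank.stream('DischemClicksFeetGraph', { maxIterations: 50, dampingFactor: 0.85 })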
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).franchisename AS franchisename, gds.util.asNode(nodeId).companyname AS companyname,score
ORDER BY score DESC, franchisename ASC;"""
df=graph.run(MERCHANT_RANK_query).to_data_frame()

In [13]:
df.head(20)

| | franchisename | companyname | score |
|---|---|---|---|
| 0 | POS Clicks INS FUNDS | CLICKS | 27.976779 |
| 1 | DISCHEM GLENFAIR | DISCHEM | 11.292279 |
| 2 | DISCHEM MONTANA | DISCHEM | 9.600776 |
| 3 | DISCHEM LANGENHOVEN PARK | DISCHEM | 8.885685 |
| 4 | DISCHEM PRELLER SQAURE | DISCHEM | 8.61462 |
| 5 | Clicks The Mar | CLICKS | 8.306158 |
| 6 | CLICKS PENFORD SHOPPING C | CLICKS | 8.300163 |
| 7 | DISCHEM JEAN AVENUE | DISCHEM | 8.251791 |
| 8 | DISCHEM FLEURDAL | DISCHEM | 8.243455 |
| 9 | DIS-CHEM GLEN | DISCHEM | 7.828613 |


In [16]:
df[df.companyname.isin(['DISCHEM','CLICKS'])].tail(100)

| | franchisename | companyname | score |
|---|---|---|---|
| 28046 | Dischem Albermarle Gar | DISCHEM | 0.15 |
| 28047 | Dischem Ballito Lifest | DISCHEM | 0.15 |
| 28048 | Dischem Birch Acres | DISCHEM | 0.15 |
| 28049 | Dischem Braa | DISCHEM | 0.15 |
| 28050 | Dischem Braamfontein RH | DISCHEM | 0.15 |
| ... | ... | ... | ... |
| 91847 | payD Zapper*Dis-Chem L | DISCHEM | 0.15 |
| 91848 | payD Zapper*Dis-Chem M | DISCHEM | 0.15 |
| 91849 | payD Zapper*Dis-Chem O | DISCHEM | 0.15 |
| 91850 | payD Zapper*Dis-Chem P | DISCHEM | 0.15 |
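These scores are not probabilities. GDS describes its PageRank with the non-normalised formula PR(v) = (1 - d) + d * sum of PR(u) / outdeg(u) over the nodes u linking to v, so a merchant with no incoming MERCHANT_FEET_LINK stays at exactly 1 - 0.85 = 0.15, which is consistent with the tail of the table, while well-connected merchants can score far above 1. The sketch below illustrates that formula on a toy graph; it is an illustration under that assumption, not GDS's own code, and the function name is hypothetical. The optional weights mimic what passing relationshipWeightProperty: 'count' to gds.pageRank would do (each node splits its score in proportion to edge weight), which the PageRank query above does not do.

```python
from collections import defaultdict


def gds_style_pagerank(edges, damping=0.85, iterations=50, weights=None):
    """Non-normalised PageRank: PR(v) = (1 - d) + d * sum over u->v of PR(u) * w(u, v) / W(u).

    edges: list of (source, target) pairs.
    weights: optional {(source, target): weight}; a missing weight counts as 1, so without
    weights W(u) is simply the out-degree of u.
    Nodes without outgoing edges do not pass their score on.
    """
    nodes = sorted({node for edge in edges for node in edge})
    weight = {edge: (weights or {}).get(edge, 1.0) for edge in edges}
    out_weight = defaultdict(float)  # W(u): total weight of u's outgoing edges
    for (src, _dst), w in weight.items():
        out_weight[src] += w
    score = {v: 1 - damping for v in nodes}
    for _ in range(iterations):
        incoming = defaultdict(float)
        for (src, dst), w in weight.items():
            incoming[dst] += score[src] * w / out_weight[src]
        score = {v: (1 - damping) + damping * incoming[v] for v in nodes}
    return score


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "C")]
scores = gds_style_pagerank(edges)
print(scores)  # D has no incoming edge, so its score stays at 1 - 0.85
```

Without dangling nodes these scores are the normalised PageRank multiplied by the number of nodes, which is why they sum to roughly the node count rather than to 1.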
