https://neo4j.com/graphacademy/online-training/data-science/part-3/

https://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm

In [1]:
import pandas as pd
from py2neo import Graph
graph = Graph("bolt://localhost:7687", user='neo4j', password='newPassword')
# graph = Graph()

In [2]:
import matplotlib 
import matplotlib.pyplot as plt

## Part 2, Recommendations

### Page Ranking

In the second part of this quest for a recommendation engine, we use the PageRank algorithm to make client/customer recommendations to a merchant, or merchant recommendations to a client.

PageRank is an algorithm that measures the so-called transitive influence or connectivity of nodes. It can be computed by either iteratively distributing one node’s rank (originally based on degree) over its neighbors or by randomly traversing the graph and counting the frequency of hitting each node during these walks.
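To make the iterative description concrete, here is a minimal power-iteration sketch in plain Python. It assumes the graph is given as a hypothetical `{node: [out-neighbours]}` dictionary, uses the conventional damping factor of 0.85, and spreads the rank of nodes without outgoing edges evenly over all nodes. That dangling-node rule is one common convention and not necessarily the one Neo4j GDS applies.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank for a directed graph {node: [out-neighbours]}."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        dangling = 0.0
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Distribute this node's rank evenly over its out-neighbours
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Nodes without outgoing edges: collect their rank
                dangling += damping * rank[node]
        for node in nodes:
            new_rank[node] += dangling / n  # spread dangling rank uniformly
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank

ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))  # c
```

Node `c` receives rank from both `a` and `b`, so it ends up with the highest score, while `b`, which nothing points to, keeps only the base share.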

Check that our database is running:

In [3]:
graph.run("CALL db.schema.visualization()").data()

[{'nodes': [(_-14:Merchant {constraints: ['CONSTRAINT ON ( merchant:Merchant ) ASSERT (merchant.franchisename) IS UNIQUE'], indexes: ['franchisename'], name: 'Merchant'}),
   (_-13:Client {constraints: ['CONSTRAINT ON ( client:Client ) ASSERT (client.dedupestatic) IS UNIQUE'], indexes: [], name: 'Client'})],
  'relationships': [(Client)-[:TRANSACTED_AT {}]->(Merchant)]}]


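The following query is run over the whole graph to find the most popular merchants in terms of transactions. Note that it simply counts each merchant's incoming TRANSACTED_AT relationships (its degree) rather than running PageRank: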

#### Find popular merchants:

In [4]:
popular_merchants_query = """
MATCH (merchant:Merchant)
RETURN merchant.franchisename, size((merchant)<-[:TRANSACTED_AT]-()) AS transactions
ORDER BY transactions DESC
LIMIT 10
"""

graph.run(popular_merchants_query).to_data_frame()

| | merchant.franchisename | transactions |
|---|---|---|
| 0 | BOXER SUPERSTO | 93970 |
| 1 | NETFLIX.COM | 84771 |
| 2 | CheckersHyper | 70692 |
| 3 | APPLE.COM/BILL | 64212 |
| 4 | payD Vodacom E | 63589 |
| 5 | Truworths | 61202 |
| 6 | CHICKEN LICKEN | 56052 |
| 7 | UBER SA HELP.U | 44448 |
| 8 | MTN Eazi Rec | 41583 |
| 9 | PEP CELL | 38983 |
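The query above is a degree count. The same logic in plain Python is a `Counter`, assuming a hypothetical in-memory list of `(client_id, franchisename)` pairs with one entry per `TRANSACTED_AT` relationship (for example, exported from the graph); `popular_merchants` is a hypothetical helper name.

```python
from collections import Counter

def popular_merchants(transactions, limit=10):
    """Count TRANSACTED_AT relationships per merchant and return the top `limit`."""
    counts = Counter(merchant for _client, merchant in transactions)
    return counts.most_common(limit)

# Hypothetical sample data: one (client_id, franchisename) pair per relationship.
transactions = [("c1", "A"), ("c2", "A"), ("c1", "B"),
                ("c3", "A"), ("c2", "B"), ("c1", "C")]
print(popular_merchants(transactions, limit=2))  # [('A', 3), ('B', 2)]
```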


#### Pick one merchant: franchisename 'DIS-CHEM DAINFERN'

In [5]:
franchisename='DIS-CHEM DAINFERN'

In [6]:
query = """
MATCH (m:Merchant {franchisename:$franchisename})
RETURN m.companyname
"""
graph.run(query, {"franchisename": franchisename}).to_data_frame()

| | m.companyname |
|---|---|
| 0 | DISCHEM |


In [7]:
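# Run once to create the full-text index 'merchants' used below
# (db.index.fulltext.createNodeIndex is the Neo4j 4.x procedure).
# query = """
#     CALL db.index.fulltext.createNodeIndex('merchants', ['Merchant'], ['franchisename'])
# """
# graph.run(query).data()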

#### Retrieve FULLTEXT indices:

In [8]:
query = """
CALL db.indexes()
YIELD name, uniqueness, type
WHERE type = "FULLTEXT"
RETURN *
"""
graph.run(query).to_data_frame()

| | name | type | uniqueness |
|---|---|---|---|
| 0 | merchants | FULLTEXT | NONUNIQUE |


#### Conduct a full text search on franchisename over the entire graph:

In [10]:
query = """
CALL db.index.fulltext.queryNodes("merchants", "columbine")
YIELD node, score
RETURN node.franchisename, score, [(client)-[:TRANSACTED_AT]-(node) | client.dedupestatic] AS clients
LIMIT 10
"""
graph.run(query).to_data_frame()

| | node.franchisename | score | clients |
|---|---|---|---|
| 0 | WOOLWORTHS COLUMBINE | 4.751985 | [2.113942945e+11, 2.114112577e+11, 1.912449891... |
| 1 | CHECKERS COLUMBINE | 4.751985 | [1.91210293005e+11, 1.10028582802e+11, 1.10103... |
| 2 | Clicks Columbine | 4.751985 | [2.11242239909e+11, 2.11482389736e+11, 1.91871... |
| 3 | MCDONALDS COLUMBINE | 4.751985 | [1.10021078408e+11, 1.91136412536e+11, 2.11587... |
| 4 | Checkers Columbine | 4.751985 | [2.113942945e+11, 1.10262820501e+11, 1.1022603... |
| 5 | ENGEN COLUMBINE | 4.751985 | [1.91336439034e+11, 2.11084497304e+11, 2.11321... |
| 6 | MTN COLUMBINE | 4.751985 | [1.91136694905e+11, 1.91220942836e+11, 2.11255... |
| 7 | COLUMBINE SERVICE | 4.751985 | [1.91054314407e+11, 1.91442782905e+11, 1.91227... |
| 8 | COLUMBINE SERV | 4.751985 | [2.11649785003e+11, 2.11686938808e+11, 1.91178... |
| 9 | Columbine Co - - | 4.751985 | [1.91522046201e+11, 1.10105532002e+11, 1.10001... |


In [11]:
my_node = graph.run("""MATCH (merchant:Merchant {companyname:'DISCHEM'})
RETURN merchant """).data()

In [12]:
my_node

[{'merchant': (_59:Merchant {companyindex: '49', companyname: 'DISCHEM', franchisename: 'DIS-CHEM CANAL WALK'})},
 {'merchant': (_78:Merchant {companyindex: '49', companyname: 'DISCHEM', franchisename: 'DIS-CHEM KILLARNEY PHAMAC'})},
 {'merchant': (_108:Merchant {companyindex: '49', companyname: 'DISCHEM', franchisename: 'DIS-CHEM DAINFERN'})},
 {'merchant': (_141:Merchant {companyindex: '49', companyname: 'DISCHEM', franchisename: 'DIS-CHEM SANDTON CITY PHA'})},
 {'merchant': (_442:Merchant {companyindex: '49', companyname: 'DISCHEM', franchisename: 'DIS-CHEM ATHOL OAKLANDS I'})},
 {'merchant': (_497:Merchant {companyindex: '49', companyname: 'DISCHEM', franchisename: 'DIS-CHEM ATHOL OAKLAND'})},
 {'merchant': (_551:Merchant {companyindex: '49', companyname: 'DISCHEM', franchisename: 'DIS-CHEM GLEN LUCIA PHN L'})},
 {'merchant': (_630:Merchant {companyindex: '49', companyname: 'DISCHEM', franchisename: 'Dischem Ballito Lifesty'})},
 {'merchant': (_681:Merchant {companyindex: '49', com

In [13]:
my_node = graph.run("""MATCH (merchant:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client) 
RETURN merchant.franchisename, c.dedupestatic """).data()

In [14]:
len(my_node)

252

Retrieve the customers this merchant had and how many transactions each of them had in total (including those at this merchant):

In [15]:
client_transactions_query = """
MATCH (:Merchant {franchisename: $franchisename})<-[:TRANSACTED_AT]-(client)
RETURN client.dedupestatic AS client, size((client)-[:TRANSACTED_AT]->()) AS other_transactions
ORDER BY other_transactions DESC
LIMIT 20
"""
graph.run(client_transactions_query, {"franchisename": franchisename}).to_data_frame()

| | client | other_transactions |
|---|---|---|
| 0 | 191030354806.0 | 91 |
| 1 | 110084536500.0 | 79 |
| 2 | 110037086901.0 | 79 |
| 3 | 110021154503.0 | 70 |
| 4 | 191091606207.0 | 68 |
| 5 | 191843174300.0 | 68 |
| 6 | 191061835402.0 | 66 |
| 7 | 191831161106.0 | 66 |
| 8 | 110021255709.0 | 64 |
| 9 | 146353058376.0 | 61 |


Retrieve the customers this merchant had and how many transactions each had at other merchants, i.e. EXCLUDING the chosen franchisename:

In [16]:
other_transactions_query = """
MATCH (merchant1:Merchant {franchisename: $franchisename})<-[:TRANSACTED_AT]-(client)-[:TRANSACTED_AT]->(merchant2)
WHERE merchant1 <> merchant2
RETURN client.dedupestatic AS client, count(*) AS other_transactions
ORDER BY other_transactions DESC
LIMIT 20
"""
graph.run(other_transactions_query, {"franchisename": franchisename}).to_data_frame()

| | client | other_transactions |
|---|---|---|
| 0 | 191030354806.0 | 90 |
| 1 | 110037086901.0 | 78 |
| 2 | 110084536500.0 | 78 |
| 3 | 110021154503.0 | 69 |
| 4 | 191091606207.0 | 67 |
| 5 | 191843174300.0 | 67 |
| 6 | 191831161106.0 | 65 |
| 7 | 191061835402.0 | 65 |
| 8 | 110021255709.0 | 63 |
| 9 | 146353058376.0 | 60 |
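The two queries above differ only in whether the transactions at the chosen merchant are counted. A plain-Python sketch of both numbers, using a hypothetical list of `(client_id, franchisename)` pairs and a hypothetical helper `client_transaction_counts`. The Cypher version of the second count enumerates path combinations, so the two agree when each client has a single relationship to the chosen merchant, which the output above (every count exactly one lower) is consistent with.

```python
from collections import Counter

def client_transaction_counts(transactions, merchant):
    """Return {client: (total, other)} for every client of `merchant`.

    total: all TRANSACTED_AT relationships of the client.
    other: relationships to merchants other than `merchant`.
    """
    total = Counter(client for client, _ in transactions)
    at_merchant = Counter(client for client, m in transactions if m == merchant)
    return {
        client: (total[client], total[client] - at_merchant[client])
        for client in at_merchant
    }

# Hypothetical sample data: one (client_id, franchisename) pair per relationship.
transactions = [("c1", "D"), ("c1", "A"), ("c1", "B"), ("c2", "D"), ("c2", "A")]
print(client_transaction_counts(transactions, "D"))  # {'c1': (3, 2), 'c2': (2, 1)}
```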


In [17]:
co_merchants_query = """
MATCH (:Merchant {franchisename: $franchisename})<-[:TRANSACTED_AT]-(client)-[:TRANSACTED_AT]->(comerchant)
RETURN comerchant.franchisename AS franchisename, count(*) AS cotransactions
ORDER BY cotransactions DESC
LIMIT 10
"""

graph.run(co_merchants_query, {"franchisename": franchisename}).to_data_frame()

| | franchisename | cotransactions |
|---|---|---|
| 0 | PNP CRP DEINFERN SQUAR | 116 |
| 1 | WOOLWORTHS MAROUN SQ | 111 |
| 2 | APPLE.COM/BILL | 72 |
| 3 | Spar Broadacres Spar | 72 |
| 4 | Clicks Dairnfern | 62 |
| 5 | NETFLIX.COM | 50 |
| 6 | WOOLWORTHS- BROADACR | 42 |
| 7 | UBER SA HELP.UBER.CO | 39 |
| 8 | WOOLWORTHS MAROUN SQUARE | 34 |
| 9 | BUILDERS WH FOURWAYS | 34 |
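The co-transaction count can be sketched the same way: collect the chosen merchant's clients, then count their relationships to every other merchant. `co_merchants` and the `(client_id, franchisename)` input are hypothetical. Unlike the query above, the sketch explicitly excludes the chosen merchant and counts each relationship once, which matches `count(*)` only when every client has a single relationship to the chosen merchant.

```python
from collections import Counter

def co_merchants(transactions, merchant, limit=10):
    """Count relationships from `merchant`'s clients to every other merchant."""
    clients = {client for client, m in transactions if m == merchant}
    counts = Counter(
        m for client, m in transactions if client in clients and m != merchant
    )
    return counts.most_common(limit)

transactions = [("c1", "D"), ("c1", "A"), ("c1", "A"), ("c2", "D"),
                ("c2", "A"), ("c2", "B"), ("c3", "B"), ("c3", "C")]
print(co_merchants(transactions, "D"))  # [('A', 3), ('B', 1)]
```

Client `c3` never transacted at `D`, so its purchases at `B` and `C` are ignored.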


In [18]:
my_node = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)-
[:TRANSACTED_AT]-(other:Merchant)
WHERE other.franchisename<>dischem.franchisename 
RETURN dischem.franchisename, c.dedupestatic, other.franchisename, other.companyname""").data()

In [19]:
my_node

[{'dischem.franchisename': 'DIS-CHEM DAINFERN',
  'c.dedupestatic': '1.10231270801e+11',
  'other.franchisename': 'Spar Broad A',
  'other.companyname': 'SPAR'},
 {'dischem.franchisename': 'DIS-CHEM DAINFERN',
  'c.dedupestatic': '1.10231270801e+11',
  'other.franchisename': 'FOURWAYS GAR',
  'other.companyname': 'Unknown'},
 {'dischem.franchisename': 'DIS-CHEM DAINFERN',
  'c.dedupestatic': '1.10231270801e+11',
  'other.franchisename': 'WOOLWORTHS- BR',
  'other.companyname': 'WOOLWORTHS'},
 {'dischem.franchisename': 'DIS-CHEM DAINFERN',
  'c.dedupestatic': '1.10231270801e+11',
  'other.franchisename': 'Clicks Fourway',
  'other.companyname': 'CLICKS'},
 {'dischem.franchisename': 'DIS-CHEM DAINFERN',
  'c.dedupestatic': '1.10231270801e+11',
  'other.franchisename': 'BUILDERS EXP C',
  'other.companyname': 'BWH'},
 {'dischem.franchisename': 'DIS-CHEM DAINFERN',
  'c.dedupestatic': '1.10231270801e+11',
  'other.franchisename': 'Spar Broadacre',
  'other.companyname': 'SPAR'},
 {'dischem

In [20]:
my_node = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)-
[:TRANSACTED_AT]-(other:Merchant)
WHERE other.franchisename<>dischem.franchisename 
RETURN DISTINCT(c.dedupestatic), count(DISTINCT other)""").data()
my_node

[{'(c.dedupestatic)': '1.10231270801e+11', 'count(DISTINCT other)': 29},
 {'(c.dedupestatic)': '1.20000229728e+11', 'count(DISTINCT other)': 13},
 {'(c.dedupestatic)': '1.91188171709e+11', 'count(DISTINCT other)': 40},
 {'(c.dedupestatic)': '1.91152212801e+11', 'count(DISTINCT other)': 36},
 {'(c.dedupestatic)': '1.28010539734e+11', 'count(DISTINCT other)': 7},
 {'(c.dedupestatic)': '1.10190737808e+11', 'count(DISTINCT other)': 8},
 {'(c.dedupestatic)': '1.20000107549e+11', 'count(DISTINCT other)': 41},
 {'(c.dedupestatic)': '1.91045038036e+11', 'count(DISTINCT other)': 9},
 {'(c.dedupestatic)': '1.10026067218e+11', 'count(DISTINCT other)': 22},
 {'(c.dedupestatic)': '1.91085617034e+11', 'count(DISTINCT other)': 16},
 {'(c.dedupestatic)': '1.20000085189e+11', 'count(DISTINCT other)': 16},
 {'(c.dedupestatic)': '1.10063411407e+11', 'count(DISTINCT other)': 17},
 {'(c.dedupestatic)': '1.91070107508e+11', 'count(DISTINCT other)': 7},
 {'(c.dedupestatic)': '1.10020319002e+11', 'count(DISTI

In [21]:
my_node = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)-
[othertransaction:TRANSACTED_AT]-(other:Merchant)
WHERE other.franchisename<>dischem.franchisename 
RETURN DISTINCT(other.franchisename) AS other_franchisename, count(othertransaction) AS number_transactions""").data()
my_node

[{'other_franchisename': 'Spar Broad A', 'number_transactions': 2},
 {'other_franchisename': 'FOURWAYS GAR', 'number_transactions': 9},
 {'other_franchisename': 'WOOLWORTHS- BR', 'number_transactions': 13},
 {'other_franchisename': 'Clicks Fourway', 'number_transactions': 4},
 {'other_franchisename': 'BUILDERS EXP C', 'number_transactions': 5},
 {'other_franchisename': 'Spar Broadacre', 'number_transactions': 19},
 {'other_franchisename': 'FOURNOS FOURWA', 'number_transactions': 1},
 {'other_franchisename': 'HERBERT EVAN', 'number_transactions': 4},
 {'other_franchisename': 'PNA DAINFERN', 'number_transactions': 34},
 {'other_franchisename': 'Dainfern squar', 'number_transactions': 5},
 {'other_franchisename': 'THE PANTRY C', 'number_transactions': 1},
 {'other_franchisename': 'Microsoft*Offi', 'number_transactions': 2},
 {'other_franchisename': 'PALM GARDENS V', 'number_transactions': 2},
 {'other_franchisename': 'WOOLWORTHS CED', 'number_transactions': 3},
 {'other_franchisename': 'H

In [22]:
my_node = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c:Client)-
[othertransaction:TRANSACTED_AT]-(other:Merchant)
WHERE other.franchisename<>dischem.franchisename 
WITH other.franchisename AS other_franchisename, count(othertransaction) AS number_transactions 
RETURN DISTINCT(other_franchisename), number_transactions""").data()
my_node

[{'other_franchisename': 'Spar Broad A', 'number_transactions': 2},
 {'other_franchisename': 'FOURWAYS GAR', 'number_transactions': 9},
 {'other_franchisename': 'WOOLWORTHS- BR', 'number_transactions': 13},
 {'other_franchisename': 'Clicks Fourway', 'number_transactions': 4},
 {'other_franchisename': 'BUILDERS EXP C', 'number_transactions': 5},
 {'other_franchisename': 'Spar Broadacre', 'number_transactions': 19},
 {'other_franchisename': 'FOURNOS FOURWA', 'number_transactions': 1},
 {'other_franchisename': 'HERBERT EVAN', 'number_transactions': 4},
 {'other_franchisename': 'PNA DAINFERN', 'number_transactions': 34},
 {'other_franchisename': 'Dainfern squar', 'number_transactions': 5},
 {'other_franchisename': 'THE PANTRY C', 'number_transactions': 1},
 {'other_franchisename': 'Microsoft*Offi', 'number_transactions': 2},
 {'other_franchisename': 'PALM GARDENS V', 'number_transactions': 2},
 {'other_franchisename': 'WOOLWORTHS CED', 'number_transactions': 3},
 {'other_franchisename': 'H

### Common Neighbours

Common neighbors captures the idea that two strangers who have a friend in common are more likely to be introduced than those who don’t have any friends in common.

In a retail banking graph, this notion may be extended: merchants that share clients may do so for a number of reasons. Their product offerings may be supplementary, or they may attract clients with the same value preferences. By the same reasoning, clients who share many merchants are likely to have similar shopping habits.
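Formally, the common-neighbours score of two nodes a and b is |N(a) ∩ N(b)|, the number of distinct nodes adjacent to both. A plain-Python sketch over the bipartite client-merchant graph, with hypothetical helpers `build_adjacency` and `common_neighbors` and a hypothetical `(client_id, franchisename)` input list. Because adjacency is stored in both directions, the same function scores client pairs (shared merchants) and merchant pairs (shared clients).

```python
from collections import defaultdict

def build_adjacency(transactions):
    """Undirected adjacency sets for the bipartite client-merchant graph."""
    adjacency = defaultdict(set)
    for client, merchant in transactions:
        adjacency[client].add(merchant)
        adjacency[merchant].add(client)
    return adjacency

def common_neighbors(adjacency, a, b):
    """|N(a) ∩ N(b)|: number of distinct nodes adjacent to both a and b."""
    return len(adjacency.get(a, set()) & adjacency.get(b, set()))

transactions = [("c1", "A"), ("c1", "B"), ("c1", "C"),
                ("c2", "B"), ("c2", "C"), ("c2", "D"), ("c3", "D")]
adjacency = build_adjacency(transactions)
print(common_neighbors(adjacency, "c1", "c2"))  # 2 (shared merchants B and C)
print(common_neighbors(adjacency, "B", "C"))    # 2 (shared clients c1 and c2)
```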

Take one client, previously identified as a DIS-CHEM DAINFERN shopper (dedupestatic '2.11279273006e+11'), and measure the nodes it shares (its common neighbours) with every other DIS-CHEM DAINFERN client:

In [30]:
commonNeighbors_df = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c0:Client)
MATCH (c1:Client {dedupestatic:'2.11279273006e+11'})  
WHERE c0.dedupestatic <> c1.dedupestatic  
RETURN c0.dedupestatic as client1,c1.dedupestatic as client2, gds.alpha.linkprediction.commonNeighbors(c0, c1) as commonNeighbors
ORDER BY commonNeighbors DESC""").to_data_frame()
commonNeighbors_df

| | client1 | client2 | commonNeighbors |
|---|---|---|---|
| 0 | 2.110886841e+11 | 2.11279273006e+11 | 11.0 |
| 1 | 1.91188171709e+11 | 2.11279273006e+11 | 10.0 |
| 2 | 1.91831161106e+11 | 2.11279273006e+11 | 9.0 |
| 3 | 1.10231270801e+11 | 2.11279273006e+11 | 8.0 |
| 4 | 1.10183522702e+11 | 2.11279273006e+11 | 8.0 |
| ... | ... | ... | ... |
| 246 | 1.91087949007e+11 | 2.11279273006e+11 | 1.0 |
| 247 | 1.10090793803e+11 | 2.11279273006e+11 | 1.0 |
| 248 | 1.10029036305e+11 | 2.11279273006e+11 | 1.0 |
| 249 | 1.91100151227e+11 | 2.11279273006e+11 | 1.0 |


Below, all DIS-CHEM DAINFERN clients (in the Dainfern Square complex) are compared with each other via common neighbours. This takes considerably longer to run and still needs to be sped up: there are approximately 63,000 combinations, and each pair appears twice, once in each order.

In [33]:
commonNeighbors_df = graph.run("""MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c0:Client) 
MATCH (dischem:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c1:Client) 
WHERE c0.dedupestatic <> c1.dedupestatic
RETURN c0.dedupestatic as client1,c1.dedupestatic as client2, gds.alpha.linkprediction.commonNeighbors(c0, c1) as commonNeighbors
ORDER BY commonNeighbors DESC """).to_data_frame()
commonNeighbors_df

| | client1 | client2 | commonNeighbors |
|---|---|---|---|
| 0 | 1.91061835402e+11 | 1.91084371003e+11 | 16.0 |
| 1 | 1.91084371003e+11 | 1.91061835402e+11 | 16.0 |
| 2 | 1.918431743e+11 | 1.91030354806e+11 | 14.0 |
| 3 | 1.918431743e+11 | 1.100845365e+11 | 14.0 |
| 4 | 1.91061835402e+11 | 1.10265834708e+11 | 14.0 |
| ... | ... | ... | ... |
| 63247 | 1.30000322651e+11 | 6.00000357947e+11 | 1.0 |
| 63248 | 1.10206451601e+11 | 6.00000357947e+11 | 1.0 |
| 63249 | 1.10152180508e+11 | 6.00000357947e+11 | 1.0 |
| 63250 | 1.91032174406e+11 | 6.00000357947e+11 | 1.0 |
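The cost grows quadratically. With n clients, comparing every client with every other in both orders gives n(n - 1) rows; if the 252 relationships counted earlier correspond to roughly 252 distinct clients, that is about 63,000 rows, in line with the output above. Since common neighbours is symmetric, each unordered pair only needs scoring once, which halves the work; in Cypher, a condition such as `id(c0) < id(c1)` in place of `c0.dedupestatic <> c1.dedupestatic` would do this. A small sketch of the arithmetic, with a hypothetical `pair_counts` helper:

```python
from itertools import combinations, permutations

def pair_counts(n):
    """Return (ordered, unordered) counts of pairs of distinct clients."""
    ordered = n * (n - 1)         # (a, b) and (b, a) both evaluated
    return ordered, ordered // 2  # each pair evaluated once

clients = ["c1", "c2", "c3", "c4"]
print(len(list(permutations(clients, 2))))   # 12 ordered pairs
print(len(list(combinations(clients, 2))))   # 6 unordered pairs
print(pair_counts(252))                      # (63252, 31626)
```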


In [None]:
# Clients who have NOT transacted at DIS-CHEM DAINFERN but share more than 5
# merchants with one of its clients: candidate clients to recommend to it.
my_node = graph.run("""MATCH (:Merchant {franchisename:'DIS-CHEM DAINFERN'})-[:TRANSACTED_AT]-(c0:Client)
WITH collect(DISTINCT c0) AS clients
UNWIND clients AS c0
MATCH (c0)-[:TRANSACTED_AT]-(:Merchant)-[:TRANSACTED_AT]-(c1:Client)
WHERE NOT c1 IN clients
WITH DISTINCT c0, c1
WHERE gds.alpha.linkprediction.commonNeighbors(c0, c1) > 5
RETURN c0.dedupestatic, c1.dedupestatic""").data()
my_node
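The same idea can be sketched in plain Python to produce client recommendations for a merchant: score every client who has not transacted at the merchant by the largest number of merchants it shares with any existing client, and keep those above a threshold. `recommend_clients`, the `(client_id, franchisename)` input format, and taking the maximum over existing clients (rather than, say, a sum) are assumptions for illustration; the default threshold of 5 mirrors the query above.

```python
from collections import defaultdict

def recommend_clients(transactions, merchant, min_common=5):
    """Rank non-clients of `merchant` by their best common-neighbour score
    against its existing clients, keeping scores above `min_common`."""
    merchants_of = defaultdict(set)  # client -> set of merchants
    for client, m in transactions:
        merchants_of[client].add(m)
    existing = {c for c, ms in merchants_of.items() if merchant in ms}
    scores = {}
    for candidate, ms in merchants_of.items():
        if candidate in existing:
            continue
        # Largest number of merchants shared with any existing client
        best = max((len(ms & merchants_of[c]) for c in existing), default=0)
        if best > min_common:
            scores[candidate] = best
    # Highest score first; ties broken by client id for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

transactions = [("c1", "D"), ("c1", "A"), ("c1", "B"), ("c1", "C"),
                ("c2", "A"), ("c2", "B"), ("c2", "C"), ("c3", "A")]
print(recommend_clients(transactions, "D", min_common=1))  # [('c2', 3)]
```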

In [25]:
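# Project Client and Merchant nodes with their TRANSACTED_AT relationships
# into an in-memory GDS graph named 'myGraph' (GDS 1.x gds.graph.create).
gds_graph_create = """CALL gds.graph.create(
    'myGraph',
    ['Client', 'Merchant'],
    'TRANSACTED_AT'
)"""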