https://neo4j.com/graphacademy/online-training/data-science/part-2/

/media/lnr-ai/applications/./neo4j-desktop-offline-1.2.7-x86_64.AppImage 

In [None]:
import pandas as pd
from py2neo import Graph
graph = Graph("bolt://localhost:7687", user='neo4j', password='newPassword')
# graph = Graph()

In [None]:
import matplotlib 
import matplotlib.pyplot as plt

### Part 1, EDA

https://colab.research.google.com/github/neo4j-contrib/training-v2/blob/master/Courses/DataScience/notebooks/02_EDA.ipynb#scrollTo=0r69d4ek5huR

#### Is the Neo4j db up and running?

In [None]:
graph.run("CALL db.schema.visualization()").data()
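If the database is not running, the call above raises a connection error instead of returning the schema. Here is a minimal sketch of a guard around it, assuming only an object with a `run(query)` method, such as the `graph` created earlier. `is_db_up` and `FakeGraph` are hypothetical names used purely for illustration; `FakeGraph` stands in for a server so the sketch runs on its own.

```python
def is_db_up(graph, probe="RETURN 1 AS ok"):
    """Return True if a trivial query runs, False if it raises.

    `graph` is assumed to expose run(query), like the py2neo Graph used above.
    """
    try:
        graph.run(probe)
        return True
    except Exception as exc:  # broad on purpose: driver error types vary by version
        print(f"Database not reachable: {exc}")
        return False


class FakeGraph:
    """Hypothetical stand-in so the sketch runs without a Neo4j server."""

    def __init__(self, up):
        self.up = up

    def run(self, query):
        if not self.up:
            raise ConnectionError("cannot connect to bolt://localhost:7687")
        return [{"ok": 1}]


print(is_db_up(FakeGraph(up=True)))
print(is_db_up(FakeGraph(up=False)))
```

With the real connection, the call would simply be `is_db_up(graph)`.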

#### In the Neo4j Browser, run the following query:
MATCH (c:Client)-[transacted_at:TRANSACTED_AT]->(merchant:Merchant) RETURN c,transacted_at,merchant LIMIT 50

#### Let's drill down into the Nedbank Behaviour db. How many nodes do we have for each label?

In [None]:
# https://neo4j.com/graphacademy/online-training/data-science/part-2/
result = {"label": [], "count": []}
for label in graph.run("CALL db.labels()").to_series():
    query = f"MATCH (:`{label}`) RETURN count(*) as count"
    count = graph.run(query).to_data_frame().iloc[0]['count']
    result["label"].append(label)
    result["count"].append(count)
nodes_df = pd.DataFrame(data=result)
nodes_df.sort_values("count")

#### Visualize counts:

In [None]:
nodes_df.plot(kind='bar', x='label', y='count', legend=None, title="Node Cardinalities")
plt.yscale("log")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

#### Here are the types of relationships and their counts in the db:

In [None]:
result = {"relType": [], "count": []}
for relationship_type in graph.run("CALL db.relationshipTypes()").to_series():
    query = f"MATCH ()-[:`{relationship_type}`]->() RETURN count(*) as count"
    count = graph.run(query).to_data_frame().iloc[0]['count']
    result["relType"].append(relationship_type)
    result["count"].append(count)
rels_df = pd.DataFrame(data=result)
rels_df.sort_values("count")
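Both counting loops build the query with an f-string because Cypher does not accept a label or relationship type as a query parameter. The backticks protect names that contain spaces or other special characters, but a name that itself contains a backtick would still break the query. Here is a small sketch of a safer builder, assuming the Cypher convention of doubling a backtick inside a quoted identifier. The helper names are hypothetical and not part of py2neo.

```python
def quote_identifier(name):
    """Wrap a label or relationship type in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


def node_count_query(label):
    """Build the per-label count query used in the node loop."""
    return f"MATCH (:{quote_identifier(label)}) RETURN count(*) AS count"


def rel_count_query(rel_type):
    """Build the per-type count query used in the relationship loop."""
    return f"MATCH ()-[:{quote_identifier(rel_type)}]->() RETURN count(*) AS count"


print(node_count_query("Client"))
print(rel_count_query("TRANSACTED_AT"))
```

Inside the loops, `query = node_count_query(label)` would replace the inline f-string.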

#### Visualize relationship cardinalities:

In [None]:
rels_df.plot(kind='bar', x='relType', y='count', legend=None, title="Relationship Cardinalities")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

#### Explore client nodes:

Let's look at the relationship counts between Client and Merchant nodes:

In [None]:
exploratory_client_query = """
MATCH (client:Client)-[transacted_at:TRANSACTED_AT]->(merchant:Merchant)
WITH client.dedupestatic as client, count(transacted_at) as number_merchant_relationships
RETURN client, number_merchant_relationships
ORDER BY number_merchant_relationships
"""
df=graph.run(exploratory_client_query).to_data_frame()
df.tail()

In [None]:
int(float(list(df['client'])[-1]))
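A caution about the conversion above: a Python float carries only about 15 to 17 significant digits, so passing a 20-digit identifier through `float` can silently change its last digits. Whether this matters depends on how `dedupestatic` is stored, which this notebook does not show. The sketch below uses a made-up identifier to compare the conversions. `int` is exact for plain digit strings, and `Decimal` is exact for strings in scientific notation.

```python
from decimal import Decimal

raw_id = "91060000171000594433"  # made-up 20-digit identifier, for illustration only

via_float = int(float(raw_id))   # rounded to the nearest representable float
via_int = int(raw_id)            # exact for a plain digit string
via_decimal = int(Decimal("9.1060000171000594433E19"))  # exact for scientific notation

print(via_float == via_int)      # False: the float round trip changed the value
print(via_int == via_decimal)    # True: both conversions are exact
```

If the identifier is only used as a query parameter, passing the value exactly as it came back from the database avoids the problem altogether.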

Choose one client: the one with the most relationships, which is the last row of the sorted result (Dedupegroup = 91060000171000594432).

In [None]:
# df is sorted ascending by relationship count, so the last row is the busiest client
dedupestatic = df['client'].iloc[-1]
a_client_query = """
MATCH (client:Client {dedupestatic:$dedupestatic})-[:TRANSACTED_AT]->(merchant:Merchant)
WITH client.dedupestatic as client, merchant.franchisename as franchisename, merchant.companyname as companyname
RETURN client, franchisename,companyname
"""
graph.run(a_client_query, {"dedupestatic": dedupestatic}).to_data_frame()

In [None]:
a_client_query

#### Now let's explore the Merchant data. 

This looks at the COMPANY level first and counts the TRANSACTED_AT relationships per company, excluding merchants whose company is 'Unknown':

In [None]:
exploratory_company_query = """
MATCH (client:Client)-[transacted_at:TRANSACTED_AT]->(merchant:Merchant)
WHERE merchant.companyname<>'Unknown'
WITH merchant.companyname as company, count(transacted_at) as number_company_relationships
RETURN company, number_company_relationships
ORDER BY number_company_relationships DESC
"""
df=graph.run(exploratory_company_query).to_data_frame()
df.head()
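To make the aggregation concrete, here is a plain-Python sketch of what the `WHERE ... WITH ... count(...) ... ORDER BY ... DESC` pipeline computes. The rows are made up and stand in for individual TRANSACTED_AT relationships; none of the names come from the real database.

```python
from collections import Counter

# Made-up rows, one per (client)-[:TRANSACTED_AT]->(merchant) relationship.
rows = [
    {"client": "c1", "companyname": "COMPANY_A"},
    {"client": "c2", "companyname": "COMPANY_A"},
    {"client": "c1", "companyname": "COMPANY_B"},
    {"client": "c3", "companyname": "Unknown"},
    {"client": "c3", "companyname": "COMPANY_A"},
]

# WHERE merchant.companyname <> 'Unknown', then count relationships per company.
counts = Counter(r["companyname"] for r in rows if r["companyname"] != "Unknown")

# ORDER BY number_company_relationships DESC
ranked = counts.most_common()
print(ranked)
```

Each relationship is counted once, so a client with several relationships to merchants of the same company contributes several times to that company's total.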

This looks at the MERCHANT (franchise) level and counts the TRANSACTED_AT relationships per franchise:

In [None]:
exploratory_franchise_query = """
MATCH ()-[transacted_at:TRANSACTED_AT]->(merchant:Merchant)
WHERE merchant.companyname<>'Unknown'
WITH merchant.franchisename as merchant, count(transacted_at) as number_merchant_relationships
RETURN merchant, number_merchant_relationships
ORDER BY number_merchant_relationships DESC
"""
df=graph.run(exploratory_franchise_query).to_data_frame()
df.head()

Now let's explore the transaction data in more detail by zooming in on one merchant: Dis-Chem, and specifically the Dainfern Square store (franchisename 'DIS-CHEM DAINFERN'). For each Nedbank client who transacted at this merchant (merchant1), the following query returns every other merchant (merchant2) that the same client also transacted at. It also returns the number of TRANSACTED_AT relationships attached to each of the two merchants (merchant1_transactions and merchant2_transactions). These equal the number of unique clients only if each client has a single relationship to a given merchant:

In [None]:
exploratory_query = """
MATCH (merchant1:Merchant {franchisename:'DIS-CHEM DAINFERN'})<-[:TRANSACTED_AT]-(client:Client)-[:TRANSACTED_AT]->(merchant2:Merchant)
WHERE merchant1<>merchant2
RETURN merchant1.franchisename AS merchant1, client.dedupestatic AS dedupestatic,  merchant2.franchisename AS merchant2, 
       size((merchant1)-[:TRANSACTED_AT]-()) AS merchant1_transactions, 
       size((merchant2)-[:TRANSACTED_AT]-()) AS merchant2_transactions
ORDER BY rand()
"""

graph.run(exploratory_query).to_data_frame()
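The two-hop pattern in this query (merchant1, back to a client, on to merchant2) can be sketched in plain Python. The example below uses made-up (client, merchant) pairs and assumes one relationship per pair. `co_visited` is a hypothetical helper, not something the database or py2neo provides.

```python
from collections import defaultdict

# Made-up (client, merchant) pairs, one per TRANSACTED_AT relationship.
visits = [
    ("c1", "M1"), ("c1", "M2"),
    ("c2", "M1"), ("c2", "M3"),
    ("c3", "M2"),
]


def co_visited(visits, target):
    """For each client of `target`, list the other merchants that client visited.

    Returns rows shaped like the query result:
    (merchant1, client, merchant2, merchant1_transactions, merchant2_transactions)
    """
    merchants_by_client = defaultdict(set)
    degree = defaultdict(int)  # relationships attached to each merchant
    for client, merchant in visits:
        merchants_by_client[client].add(merchant)
        degree[merchant] += 1

    rows = []
    for client, merchants in merchants_by_client.items():
        if target not in merchants:
            continue  # this client never transacted at the target merchant
        for other in sorted(merchants - {target}):  # WHERE merchant1 <> merchant2
            rows.append((target, client, other, degree[target], degree[other]))
    return rows


for row in co_visited(visits, "M1"):
    print(row)
```

Client `c3` never visited `M1`, so it produces no rows, but its visit still counts towards the relationship total for `M2`.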

In [None]:
exploratory_query = """
MATCH (merchant1:Merchant {franchisename:'DIS-CHEM DAINFERN'})<-[:TRANSACTED_AT]-()-[:TRANSACTED_AT]->(merchant2:Merchant)
WHERE merchant1<>merchant2
RETURN merchant1.franchisename AS merchant1, merchant2.franchisename AS merchant2, 
       size((merchant1)-[:TRANSACTED_AT]-()) AS merchant1_transactions, 
       size((merchant2)-[:TRANSACTED_AT]-()) AS merchant2_transactions
ORDER BY rand()
"""
graph.run(exploratory_query).to_data_frame()

In [None]:
query = """
MATCH (m:Merchant {companyname:'DISCHEM'})-[transaction:TRANSACTED_AT]-(client:Client)
RETURN m.franchisename AS Merchant, count(transaction) AS transactions
"""
transactions_df = graph.run(query).to_data_frame()
transactions_df.describe([.25, .5, .75, .9, .99])
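The list passed to `describe` asks for the 25th, 50th, 75th, 90th and 99th percentiles of the per-franchise transaction counts. As a rough illustration of what those numbers mean, here is a stdlib-only sketch of a percentile with linear interpolation between neighbouring values, which is, to my understanding, the pandas default. The counts are made up and `percentile` is a hypothetical helper.

```python
def percentile(values, q):
    """Percentile with linear interpolation between neighbours; q is in [0, 1]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = q * (len(ordered) - 1)        # fractional index into the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


counts = [1, 2, 3, 4, 100]  # made-up transaction counts with one very busy store
for q in (.25, .5, .75, .9, .99):
    print(q, percentile(counts, q))
```

In a skewed distribution like this one, the upper percentiles sit far above the median, which is why the 90th and 99th are worth requesting alongside the quartiles.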

In [None]:
transactions_df