## Accessing AWS Neo4J

## Easy Questions 
1) Which attributes have null values, and how many null values does each have?  
2) How many items from the metadata dataset do not fall under the 5-core constraint?  (run against prod)  
3) What is the average price and sales rank (how well the product sells within its category) of those products?  

In [None]:
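##1) Which attributes have null values, and how many null values does each have?
#count the nodes/relationships where a property IS NULL; a returned count of 0 means the property never contains NULL
#the trailing comment on each line is the count observed (blank = not recorded)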
query1_1="MATCH (p:Person) WHERE p.id IS NULL RETURN count(p) as null_cnt" #0
query1_2="MATCH (p:Person) WHERE p.name IS NULL RETURN count(p) as null_cnt" #59975
query1_3="MATCH (p:Product) WHERE p.id IS NULL RETURN count(p) as null_cnt" #0
query1_4="MATCH (p:Product) WHERE p.name IS NULL RETURN count(p) as null_cnt" #289886
query1_5="MATCH (p:Product) WHERE p.brand IS NULL RETURN count(p) as null_cnt" #1129355
query1_6="MATCH (p:Product) WHERE p.price IS NULL RETURN count(p) as null_cnt" #274408
query1_7="MATCH (p:Product) WHERE p.categories IS NULL RETURN count(p) as null_cnt" #
query1_8="MATCH (p:Product) WHERE p.rank IS NULL RETURN count(p) as null_cnt" #
query1_9="MATCH (p:Product) WHERE p.rankCat IS NULL RETURN count(p) as null_cnt" #
query1_10="MATCH (p:Product) WHERE p.imUrl IS NULL RETURN count(p) as null_cnt" #
query1_11="MATCH ()-[r:Reviewed]->() WHERE r.ts IS NULL RETURN count(r) as null_cnt" #0
query1_12="MATCH ()-[r:Reviewed]->() WHERE r.score IS NULL RETURN count(r) as null_cnt" #
query1_13="MATCH ()-[r:Reviewed]->() WHERE r.helpful1 IS NULL RETURN count(r) as null_cnt" #
query1_14="MATCH ()-[r:Reviewed]->() WHERE r.helpful0 IS NULL RETURN count(r) as null_cnt" #
query1_15="MATCH ()-[r:Reviewed]->() WHERE r.summary IS NULL RETURN count(r) as null_cnt" #
query1_16="MATCH ()-[r:Reviewed]->() WHERE r.reviewText IS NULL RETURN count(r) as null_cnt" #
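The sixteen strings above all follow two templates (one for node properties, one for relationship properties), so they could also be generated. A minimal sketch, assuming the labels and property names listed in the cell above; `NODE_PROPS`, `REL_PROPS`, and `null_count_queries` are hypothetical names introduced here for illustration:

```python
# Hypothetical helper: builds the same null-count Cypher strings as query1_1..query1_16.
NODE_PROPS = {
    "Person": ["id", "name"],
    "Product": ["id", "name", "brand", "price", "categories", "rank", "rankCat", "imUrl"],
}
REL_PROPS = {
    "Reviewed": ["ts", "score", "helpful1", "helpful0", "summary", "reviewText"],
}


def null_count_queries(node_props, rel_props):
    """Return {(label_or_type, property): cypher_string} for every listed property."""
    queries = {}
    for label, props in node_props.items():
        for prop in props:
            queries[(label, prop)] = (
                f"MATCH (p:{label}) WHERE p.{prop} IS NULL RETURN count(p) as null_cnt"
            )
    for rel_type, props in rel_props.items():
        for prop in props:
            queries[(rel_type, prop)] = (
                f"MATCH ()-[r:{rel_type}]->() WHERE r.{prop} IS NULL RETURN count(r) as null_cnt"
            )
    return queries


queries = null_count_queries(NODE_PROPS, REL_PROPS)
```

Each generated string returns a single count, so it can be passed to `graph.evaluate` in the same way as the hand-written queries.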

In [None]:
##2) How many products from the metadata dataset are not 5-core products?
#find products that have no reviews
#(this matches the non-5-core count only if the graph holds just the 5-core reviews)
query2="MATCH (n:Product) WHERE NOT (n)<-[:Reviewed]-(:Person) RETURN count(n) AS unreviewed_cnt"

In [None]:
##3) What is the average price of those products, overall and by category?
#prices are converted with toFloat so fractional values are not lost
#overall average price
query3_1="MATCH (n:Product) RETURN avg(toFloat(n.price)) AS avg_price"
#average price by category
query3_2="MATCH (n:Product) RETURN avg(toFloat(n.price)) AS avg_price,n.categories"
#max price by category
query3_3="MATCH (n:Product) RETURN max(toFloat(n.price)) AS max_price,n.categories ORDER BY max_price DESC LIMIT 5"
#overall average rating
query3_4="MATCH (n:Product)-[r]-() RETURN avg(tointeger(r.score))"
#average rating by category
query3_5="MATCH (n:Product)-[r]-() RETURN count(n),avg(tointeger(r.score)) AS avg_s,n.categories ORDER BY avg_s DESC"
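#the 10 reviews with the highest helpful1 value
#(max() grouped by r only returns each review's own value, so sort on the converted value instead)
query3_6="""MATCH (n:Product)<-[r:Reviewed]-(:Person) 
            WHERE r.helpful1 IS NOT NULL
            RETURN toInteger(r.helpful1) AS helpful,r,n.name,n.categories 
            ORDER BY helpful DESC LIMIT 10"""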

## Medium Questions
4) Can two products with different IDs share the same name?  
5) Can a user review the same product twice?  

In [None]:
##4) Can two products with different IDs share the same name?  
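#get every name that appears on more than one product node, with its count
#(products without a name are skipped so NULL is not reported as a shared name)
query4="MATCH (n:Product) WHERE n.name IS NOT NULL WITH n.name as name, count(*) as cnt WHERE cnt>1 RETURN name, cnt"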

In [None]:
##5) Can a user review the same product twice?
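#find pairs of nodes connected by more than one relationship
#(size() is used for the list; length() on a list is rejected by newer Neo4j versions)
query5="MATCH (a)-[r]->(b) WITH a, b, TAIL (COLLECT (r)) as rr WHERE size(rr) > 0 RETURN a, b"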

## Hard Questions
6) Create a product recommendation algorithm.  


In [None]:
#Find books whose name matches a pattern (here, names starting with "The Warriors")
query6_1="""MATCH (a:Product)
WHERE a.categories="[['Books']]" and a.name=~"The Warriors.*"
RETURN a limit 5"""

#calculate cost of path between two nodes
query6_2="""MATCH p=(a:Product {id: '0007278446'})<-[:Reviewed*2]->(b:Product {id: '0060080817'})
RETURN REDUCE (total = 0, r in relationships(p) | total + tointeger(r.score))"""

#get all 2-hop paths from the node with id = 0007278446; run in the Neo4j browser to show the graph
query6_3="""MATCH p=(a:Product {id: '0007278446'})<-[:Reviewed*2]->(b:Product)
RETURN p"""

#Return the 5 highest-cost paths where the second edge weight >= the first edge weight (recommended products)
#(a list comprehension replaces extract(), which was removed in Neo4j 4.0)
query6_4="""MATCH p=(a:Product {id: '0007278446'})<-[:Reviewed*2]->(b:Product)
WITH  [s in relationships(p) | tointeger(s.score)] as p_collect,a,b
WITH  p_collect[0] as s1,p_collect[1] as s2,a,b
WHERE s2>=s1 and b.name is not null and b.name <>""
RETURN a.id,b.id,b.name, s1,s2,s1+s2 as cost
ORDER BY cost DESC LIMIT 5"""
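To make the scoring rule behind the last query easier to follow, here is a minimal in-memory sketch of the same idea in plain Python: walk from the source product to each person who reviewed it (score `s1`), then to every other product that person reviewed (score `s2`), keep pairs where `s2 >= s1` and the product has a non-empty name, and rank by `s1 + s2`. The function `recommend` and the toy `reviews` and `names` data are hypothetical and only illustrate the logic; they are not taken from the dataset.

```python
def recommend(reviews, names, source_id, k=5):
    """Rank products reachable in two Reviewed hops from source_id.

    reviews: list of (person_id, product_id, score) tuples
    names:   dict mapping product_id -> product name (may be missing or empty)
    Returns up to k tuples (product_id, name, s1, s2, cost), highest cost first.
    """
    # Group every review by the person who wrote it.
    by_person = {}
    for person, product, score in reviews:
        by_person.setdefault(person, []).append((product, score))

    candidates = []
    for items in by_person.values():
        for prod_a, s1 in items:
            if prod_a != source_id:
                continue  # first hop must start at the source product
            for prod_b, s2 in items:
                if prod_b == source_id:
                    continue  # second hop must reach a different product
                # Same filter as the query: s2 >= s1 and a non-empty name.
                if s2 >= s1 and names.get(prod_b):
                    candidates.append((prod_b, names[prod_b], s1, s2, s1 + s2))

    candidates.sort(key=lambda row: row[4], reverse=True)
    return candidates[:k]


# Toy data, invented for illustration only.
reviews = [
    ("u1", "A", 4), ("u1", "B", 5), ("u1", "C", 3),
    ("u2", "A", 5), ("u2", "B", 5), ("u2", "D", 5),
    ("u3", "A", 2), ("u3", "C", 4),
]
names = {"A": "Book A", "B": "Book B", "C": "Book C", "D": ""}

top = recommend(reviews, names, "A")
```

As in the Cypher version, a product can appear more than once because each two-hop path is scored separately; aggregating per product would be a possible refinement.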

In [None]:
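#replace "query" with the query you want to run from above, e.g. graph.evaluate(query6_1)
#note: assuming graph is a py2neo Graph, evaluate() returns only the first value of the first record
graph.evaluate(query)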
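Several of the queries above return many rows or columns, which a single-value call does not show in full. A minimal sketch of a helper that returns every record, assuming `graph` is a py2neo `Graph` (the notebook does not show how `graph` is created), whose `run()` returns a cursor with a `data()` method. The name `run_rows` is hypothetical, and the `_FakeGraph` and `_FakeCursor` stand-ins exist only so the sketch runs without a database:

```python
def run_rows(graph, query, **params):
    """Run a Cypher query and return all records as a list of dicts.

    Keyword arguments are forwarded as query parameters, so values such as
    a product id can be passed as $product_id instead of being hard-coded.
    """
    return graph.run(query, **params).data()


# Stand-ins that mimic only the two calls used above, for a dry run.
class _FakeCursor:
    def __init__(self, rows):
        self._rows = rows

    def data(self):
        return list(self._rows)


class _FakeGraph:
    def __init__(self, rows):
        self._rows = rows
        self.calls = []

    def run(self, query, **params):
        self.calls.append((query, params))
        return _FakeCursor(self._rows)


fake = _FakeGraph([{"name": "Book B", "cnt": 2}])
rows = run_rows(
    fake,
    "MATCH (a:Product {id: $product_id}) RETURN a.name AS name",
    product_id="0007278446",
)
```

With a real connection, the same call would be `run_rows(graph, query4)` or similar, and the result could be loaded into a DataFrame for inspection.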