## Easy questions
1) Which attributes have null values, and how many null values does each have?
2) How many items from the metadata dataset do not fall under the 5-core constraint?
3) What is the average price and sales rank (how well the product sells within its category) of those products?
4) Which product category contains the most of those non-5-core products?

## Medium questions
1) What is the average rating score across all products, and the average within each category?
2) Can a user review the same product twice?
3) Can two products with different IDs share the same name?

## Hard questions
4) Does the length of the written review correspond to the rating of the product?
5) Do individuals tend to write reviews within one product category, or do they review a variety of products? (Avg. number of categories reviewed per person.)
6) Do they give high ratings within one product category and low ratings in a different category? (Avg. rating per category per person.)
7) Are there clusters of reviewers who review many of the same products? (isomorphism)
8) What is the average distance spanning any two reviewers or any two products?

In [1]:
#set up the neo4j instance connection
from py2neo import Graph, authenticate
#enter host:port of the neo4j server here
url="34.236.229.56:7474"
#enter password here
pw="AgentSmith"

#register the credentials for this host (authenticate is available in py2neo v3), then open the graph
authenticate(url,"neo4j", pw)
graph = Graph("http://"+url+"/db/data")


In [None]:
###1) Which attributes have null values, and how many null values does each have?
#number of Person nodes whose name is null (repeat per property as needed)
query1="MATCH (n:Person) WHERE n.name IS NULL RETURN count(n) as name_null_cnt"

#distinct property-key combinations found on Person nodes
#some Person nodes carry only the id attribute, so every property other than id is null for them
#the same goes for Product nodes
query2="MATCH (n:Person) RETURN DISTINCT keys(n)"
query3="MATCH (n:Product) RETURN DISTINCT keys(n)"
#list label and property-key combinations, ordered from most to fewest properties
query4="MATCH (n) RETURN labels(n), keys(n), size(keys(n)), count(*) ORDER BY size(keys(n)) DESC"
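
The key-combination queries above can be turned into per-property null counts, because Neo4j does not store null properties: a node that lacks a key has a null value for it. The following is a minimal sketch under that reading. The query text, the `keys`/`cnt` aliases, and the helper name `null_counts` are illustrative and not part of the original notebook.

```python
from collections import Counter

# Hypothetical query: one row per distinct property-key combination,
# together with the number of Person nodes that have exactly those keys.
key_combo_query = "MATCH (n:Person) RETURN keys(n) AS keys, count(*) AS cnt"

def null_counts(rows):
    """Given rows like {'keys': [...], 'cnt': int}, return {property: nodes missing it}."""
    total = sum(row["cnt"] for row in rows)
    present = Counter()
    for row in rows:
        for key in row["keys"]:
            present[key] += row["cnt"]
    # A property is null on every node that does not list it among its keys.
    return {key: total - count for key, count in present.items()}

# Usage against a live server (assumes `graph` from the setup cell):
# null_counts(graph.run(key_combo_query).data())
```

Swapping `Person` for `Product` in the query gives the same breakdown for products.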

In [None]:
###2) How many items from the metadata dataset do not fall under the 5-core constraint?
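#count products with no reviews (result: 0)
query5="MATCH (n:Product) WHERE NOT (n)<-[:Reviewed]-(:Person) RETURN count(n)"

################
#count persons who reviewed no products
#an earlier OPTIONAL MATCH version ("... OPTIONAL MATCH (a)-[r:Reviewed]->() RETURN a.id, r")
#raised a SERVICE UNAVAILABLE error, possibly because it returns a row for every review;
#it also did not filter out persons who do have reviews, so use a pattern predicate instead
query5b="MATCH (a:Person) WHERE NOT (a)-[:Reviewed]->() RETURN count(a)"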

In [None]:
###3) What is the average price of those products?
#overall average price (toFloat keeps the decimal part of the price)
query6="MATCH (n:Product) RETURN avg(toFloat(n.price))"
#average price by category
query7="MATCH (n:Product) RETURN avg(toFloat(n.price)),n.categories"

In [None]:
###4) Which product category contains the most of those non-5-core products?
#get number of products per category, largest category first
query8="MATCH (n:Product) RETURN count(n) AS cnt,n.categories ORDER BY cnt DESC"

In [None]:
###1) Average rating score of all products and also the average by each category.
#overall average rating (restricted to Reviewed relationships)
query9="MATCH (n:Product)<-[r:Reviewed]-(:Person) RETURN avg(toFloat(r.score))"
#average rating by category
query10="MATCH (n:Product)<-[r:Reviewed]-(:Person) RETURN count(n),avg(toFloat(r.score)),n.categories"

In [None]:
###2) Can a user review the same product twice?
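
Medium questions 2 and 3 both reduce to a group-and-count check, so one possible sketch covers both. It assumes the `Person`, `Product`, and `Reviewed` names used in the cells above. The `title` property for the product name is an assumption and should be replaced with whatever the schema actually uses; the helper `any_matches` is likewise illustrative. If the import created relationships with `MERGE`, repeated reviews may already have been collapsed, in which case the first query cannot detect them.

```python
# Count (person, product) pairs connected by more than one Reviewed relationship.
duplicate_review_query = (
    "MATCH (p:Person)-[r:Reviewed]->(n:Product) "
    "WITH p, n, count(r) AS reviews "
    "WHERE reviews > 1 "
    "RETURN count(*) AS duplicate_pairs"
)

# Count product names that are shared by more than one product ID.
# ASSUMPTION: the product name lives in a `title` property; adjust to the real schema.
shared_name_query = (
    "MATCH (n:Product) WHERE n.title IS NOT NULL "
    "WITH n.title AS title, count(DISTINCT n.id) AS ids "
    "WHERE ids > 1 "
    "RETURN count(*) AS shared_names"
)

def any_matches(graph, query):
    """Return True if the single count returned by `query` is greater than zero."""
    # graph.evaluate returns the first value of the first record, or None if there is none.
    return (graph.evaluate(query) or 0) > 0

# Usage (assumes `graph` from the setup cell):
# any_matches(graph, duplicate_review_query)
# any_matches(graph, shared_name_query)
```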


In [None]:
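#evaluate returns only the first value of the first record, so it suits single-count queries such as query1
graph.evaluate(query1)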
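
Several of the queries above return more than one row or column, which `evaluate` would truncate to a single value. As a rough sketch, a small helper can run a batch of named queries and keep every row. The name `run_all` and the dictionary keys in the usage line are hypothetical; `Graph.run(...).data()` is the py2neo call that returns all records as a list of dicts.

```python
def run_all(graph, queries):
    """Run each named Cypher query and collect every result row as a list of dicts."""
    results = {}
    for name, cypher in queries.items():
        # Graph.run returns a cursor; .data() materialises all of its records.
        results[name] = graph.run(cypher).data()
    return results

# Usage (assumes `graph` and the query strings from the cells above):
# results = run_all(graph, {"person_keys": query2, "product_keys": query3})
```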