## Create Nodes & Edges in Neo4j Graph

### Products Node

```cypher
LOAD CSV WITH HEADERS FROM 'file:///products.csv' AS row
CREATE (p:Products {productID: toInteger(row.productID), productName: row.productName, category: row.category, price: toFloat(row.price)})
```

![Alt text](../graph_output/notebook_image/image-1.png)

The Cypher query above loads products.csv into the Neo4j database and creates one Products node per row, with all four columns (productID, productName, category, and price) stored as node properties. Because LOAD CSV reads every field as a string, productID is converted with `toInteger()` and price with `toFloat()`.
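LOAD CSV hands every field to the query as a string, which is why the numeric columns are wrapped in conversion functions. The Python sketch below mirrors that per-row conversion on a small made-up sample. The sample values and the helper names `to_int_or_none`, `to_float_or_none`, and `load_products` are hypothetical; the sketch only illustrates the typing step, not how Neo4j itself parses the file.

```python
import csv
import io

# Hypothetical sample in the products.csv layout; column names come from the query,
# the values are made up for illustration
PRODUCTS_CSV = """productID,productName,category,price
1,Laptop,Electronics,999.99
2,Backpack,Accessories,49.50
"""


def to_int_or_none(value):
    """Simplified stand-in for Cypher toInteger(): None if the string is not an integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def to_float_or_none(value):
    """Simplified stand-in for Cypher toFloat(): None if the string is not a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_products(text):
    """Build one property dict per CSV row, converting the numeric columns."""
    nodes = []
    # Like LOAD CSV WITH HEADERS, DictReader yields every field as a string
    for row in csv.DictReader(io.StringIO(text)):
        nodes.append({
            "productID": to_int_or_none(row["productID"]),
            "productName": row["productName"],
            "category": row["category"],
            "price": to_float_or_none(row["price"]),
        })
    return nodes


products = load_products(PRODUCTS_CSV)
print(products)
```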

### Users Node

```cypher
LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
CREATE (u:Users {userID: toInteger(row.userID), username: row.username, registrationDate: date(row.registrationDate), location: row.location})
```

![Alt text](../graph_output/notebook_image/image.png)

The Cypher query above loads users.csv into the Neo4j database and creates one Users node per row, with all four columns (userID, username, registrationDate, and location) stored as node properties. userID is converted to an integer, and registrationDate is converted to a Neo4j date value with `date()`.
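Cypher's `date()` turns an ISO 8601 string such as '2023-01-15' into a date value, so registrationDate can later be compared chronologically, for example with `WHERE u.registrationDate >= date('2023-01-01')`. The sketch below shows the same idea in Python, assuming users.csv stores dates as YYYY-MM-DD; the sample users and locations are hypothetical.

```python
from datetime import date

# Hypothetical users.csv rows, already split into strings as LOAD CSV would deliver them
rows = [
    {"userID": "1", "username": "alice", "registrationDate": "2023-01-15", "location": "CityA"},
    {"userID": "2", "username": "bob", "registrationDate": "2022-11-03", "location": "CityB"},
]

# date.fromisoformat plays the role of Cypher date() on a 'YYYY-MM-DD' string
users = [
    {
        "userID": int(r["userID"]),
        "username": r["username"],
        "registrationDate": date.fromisoformat(r["registrationDate"]),
        "location": r["location"],
    }
    for r in rows
]

# Typed dates compare chronologically, so range filters behave as expected
registered_2023 = [u["username"] for u in users if u["registrationDate"] >= date(2023, 1, 1)]
print(registered_2023)  # ['alice']
```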

### Transactions Edges - Relationship between Products node and Users node

```cypher
LOAD CSV WITH HEADERS FROM 'file:///transactions.csv' AS row
MATCH (u:Users {userID: toInteger(row.userID)}), (p:Products {productID: toInteger(row.productID)})
CREATE (u)-[:PURCHASED {transactionID: toInteger(row.transactionID), transactionDate: date(row.transactionDate)}]->(p)
```


![Alt text](../graph_output/notebook_image/image-2.png)

The Cypher query above loads transactions.csv, which holds the relationships between the Users and Products nodes. Each row contains transactionID, userID, productID, and transactionDate. For every row, the query matches the user and the product with those IDs and creates a PURCHASED relationship from the user to the product, with transactionID and transactionDate attached as relationship properties.
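The relationship load behaves like a join: a row produces a PURCHASED edge only when both MATCH lookups succeed, and a row whose userID or productID has no matching node is dropped without an error. The Python sketch below imitates that behavior with dictionaries keyed by ID; all sample IDs and names and the `build_edges` helper are hypothetical. Because the query uses CREATE, running it a second time would create duplicate edges.

```python
from datetime import date

# Hypothetical nodes already loaded, keyed by their integer IDs
users = {1: "alice", 2: "bob"}
products = {10: "Laptop", 11: "Backpack"}

# Hypothetical transactions.csv rows, as strings (the way LOAD CSV delivers them)
transactions = [
    {"transactionID": "100", "userID": "1", "productID": "10", "transactionDate": "2023-02-01"},
    {"transactionID": "101", "userID": "2", "productID": "11", "transactionDate": "2023-02-02"},
    # userID 3 has no Users node, so this row yields no edge
    {"transactionID": "102", "userID": "3", "productID": "10", "transactionDate": "2023-02-03"},
]


def build_edges(rows):
    """Mimic MATCH ... CREATE: emit an edge only if both endpoint nodes exist."""
    edges = []
    for row in rows:
        user_id, product_id = int(row["userID"]), int(row["productID"])
        if user_id not in users or product_id not in products:
            continue  # MATCH found nothing, so the row is skipped silently
        properties = {
            "transactionID": int(row["transactionID"]),
            "transactionDate": date.fromisoformat(row["transactionDate"]),
        }
        edges.append((user_id, "PURCHASED", product_id, properties))
    return edges


edges = build_edges(transactions)
print(len(edges))  # 2
```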

Zoomed-out view of all graph data:

![Alt text](../graph_output/notebook_image/image-6.png)

The image above shows a zoomed-out view of the entire graph loaded into the Neo4j database.

## Basic Queries

### Task 1: Using the Cypher query language, find the top 5 users who purchased the most products.

```cypher
MATCH (u:Users)-[:PURCHASED]->(p:Products)
WITH u, COUNT(p) AS distinct_product
RETURN u, distinct_product
ORDER BY distinct_product DESC
LIMIT 5
```

![Alt text](../graph_output/notebook_image/image-7.png)

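The query above first matches every user who has purchased a product, then counts the matched products per user, sorts the users by that count in descending order, and returns the top five. Despite the alias `distinct_product`, `COUNT(p)` counts PURCHASED relationships, so a product bought twice by the same user is counted twice; `COUNT(DISTINCT p)` would count unique products instead. The result shows that four of the top five users purchased 9 products each. Because of these ties, the order among the tied users (and which of them fall inside the LIMIT) is not guaranteed; adding a secondary sort key such as `u.userID` makes the output deterministic.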
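The difference between counting PURCHASED relationships and counting distinct products is easiest to see on a toy example. In the Python sketch below, each tuple stands for one PURCHASED edge; the usernames and products are hypothetical, and `Counter.most_common(5)` plays the role of `ORDER BY ... DESC LIMIT 5`.

```python
from collections import Counter

# Hypothetical purchases: one (username, productName) tuple per PURCHASED edge
purchases = [
    ("alice", "Laptop"), ("alice", "Laptop"), ("alice", "Backpack"),
    ("bob", "Smart Phone"), ("bob", "Running Shoes"),
    ("carol", "Coffee Maker"),
]

# Like COUNT(p): one count per edge, so alice's repeat Laptop purchase counts twice
edge_counts = Counter(user for user, _ in purchases)

# Like COUNT(DISTINCT p): each product counted once per user
distinct_counts = Counter(user for user, _ in set(purchases))

print(edge_counts.most_common(5))  # [('alice', 3), ('bob', 2), ('carol', 1)]
print(distinct_counts.most_common(5))  # alice and bob tie at 2 here
```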

### Identify products that are often bought together

```cypher
MATCH (p1:Products)<-[:PURCHASED]-(u:Users)-[:PURCHASED]->(p2:Products)
WHERE p1 <> p2 AND p1.productID < p2.productID
WITH p1, p2, COUNT(u) AS co_purchased
ORDER BY co_purchased DESC
RETURN p1.productName AS product_1, p2.productName AS product_2, co_purchased
```

![Alt text](image-4.png)

The query above matches pairs of products bought by the same user. The condition `p1.productID < p2.productID` keeps each unordered pair only once (without it, both A-B and B-A would appear), and `COUNT(u)` counts how often each pair was bought by a common user. "Together" here means bought by the same user at any time, not necessarily in the same transaction. The result shows that Smart Phone and Running Shoes are the pair most often bought together, with a count of 38, followed by Smart Phone and Backpack (35) and Coffee Maker and Laptop (34).
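The pair-counting logic can be sketched in plain Python on a hypothetical purchase history. `itertools.combinations` over a sorted list yields each unordered pair once, doing the job of the productID comparison (products are sorted by name here rather than by ID), and a repeated purchase would be counted again, much as `COUNT(u)` counts every matched path. The history and the `co_purchase_counts` helper are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: username -> list of purchased products (one entry per edge)
history = {
    "alice": ["Smart Phone", "Running Shoes", "Backpack"],
    "bob": ["Smart Phone", "Running Shoes"],
    "carol": ["Coffee Maker", "Laptop"],
}


def co_purchase_counts(history):
    """Count, for each unordered product pair, how often one user bought both."""
    pairs = Counter()
    for products in history.values():
        # Sorting fixes the order inside each pair, so A-B and B-A are not both counted
        for a, b in combinations(sorted(products), 2):
            if a != b:  # like p1 <> p2: ignore a product paired with itself
                pairs[(a, b)] += 1
    return pairs


print(co_purchase_counts(history).most_common(1))  # [(('Running Shoes', 'Smart Phone'), 2)]
```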

### Discover users who have similar purchasing habits (bought the same set of products).

```cypher
MATCH (u1:Users)-[:PURCHASED]->(p:Products)<-[:PURCHASED]-(u2:Users)
WHERE u1 <> u2 AND u1.userID < u2.userID
WITH u1, u2, COUNT(p) AS shared_products
ORDER BY shared_products DESC
RETURN u1.username AS user_1, u2.username AS user_2, shared_products
```

![Alt text](../graph_output/notebook_image/image-5.png)

The query above matches every pair of different users who purchased the same product and counts their shared purchases; the condition `u1.userID < u2.userID` lists each pair only once. Rather than requiring identical product sets, it ranks user pairs by how much their purchases overlap. The result shows that users "daviskristen" and "joelperez" have the most similar purchasing behavior, with 16 shared purchases.
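The same overlap can be computed with set intersection in Python, as in the sketch below over hypothetical users. Because it uses sets, it counts distinct shared products; this equals the Cypher result only when no user bought the same product twice, and `COUNT(DISTINCT p)` would be the closer Cypher equivalent. The `shared_products` helper is an illustrative assumption.

```python
from itertools import combinations

# Hypothetical purchase history: username -> set of purchased products
history = {
    "alice": {"Laptop", "Backpack", "Smart Phone"},
    "bob": {"Laptop", "Backpack", "Coffee Maker"},
    "carol": {"Running Shoes"},
}


def shared_products(history):
    """Return (user_1, user_2, shared_count) rows, most similar pair first."""
    result = []
    # Combinations of sorted usernames visit each pair once, like u1.userID < u2.userID
    for u1, u2 in combinations(sorted(history), 2):
        shared = history[u1] & history[u2]
        if shared:  # the Cypher pattern only matches pairs with a common product
            result.append((u1, u2, len(shared)))
    # ORDER BY shared_products DESC
    return sorted(result, key=lambda row: row[2], reverse=True)


print(shared_products(history))  # [('alice', 'bob', 2)]
```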

## Advanced Analysis

### Create a recommendation system to suggest products to users based on their purchase history

```cypher
// Anchor: every product each user has already purchased
MATCH (u1:Users)-[:PURCHASED]->(p1:Products)
// Other users who bought the same anchor product, and the other products they bought
MATCH (p1)<-[:PURCHASED]-(u2:Users)-[:PURCHASED]->(p2:Products)
// Skip the user themself and anything the user already owns
WHERE u1 <> u2 AND p1 <> p2 AND NOT (u1)-[:PURCHASED]->(p2)
RETURN DISTINCT u1.username AS username, p1.productName AS purchased_product, p2.productName AS product_recommendation
```

![Alt text](../graph_output/notebook_image/image-8.png)

The Cypher query above builds a simple product recommendation from each user's purchase history. The first MATCH clause lists every user together with the products they have purchased. The second MATCH clause uses each of those products as an anchor and finds the other products bought by other users who also bought the anchor. Each such product is recommended to the user because it is often purchased together with something the user already owns. The WHERE clause excludes the user themself and any product the user has already bought, and RETURN DISTINCT removes duplicate rows when several users support the same recommendation.
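A plain-Python sketch of the same co-purchase recommendation idea is shown below, assuming that a recommendation should come from other users only and should never be a product the user already owns. The purchase history and the `recommend` helper are hypothetical; a set of triples stands in for RETURN DISTINCT.

```python
# Hypothetical purchase history: username -> set of purchased products
history = {
    "alice": {"Laptop"},
    "bob": {"Laptop", "Coffee Maker"},
    "carol": {"Laptop", "Backpack"},
}


def recommend(history):
    """Return sorted (user, anchor_product, recommended_product) triples."""
    triples = set()  # a set removes duplicates, like RETURN DISTINCT
    for u1, owned in history.items():
        for anchor in owned:
            for u2, others in history.items():
                if u2 == u1 or anchor not in others:
                    continue  # only other users who also bought the anchor product
                for candidate in others - owned:  # never suggest something u1 owns
                    triples.add((u1, anchor, candidate))
    return sorted(triples)


for row in recommend(history):
    print(row)
```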

### Identify potential "influencer" users in the network whose purchase might influence others

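```cypher
// Consider every other user: a u1.userID < u2.userID filter would only count
// peers with a higher ID and bias the ranking toward low-ID users
MATCH (u1:Users)-[:PURCHASED]->(p:Products)<-[:PURCHASED]-(u2:Users)
WHERE u1 <> u2
// DISTINCT stops each peer and each product from being counted once per matched path
WITH u1, COUNT(DISTINCT u2) AS num_interaction_user, COUNT(DISTINCT p) AS number_of_product
ORDER BY num_interaction_user DESC, number_of_product DESC
RETURN u1.username AS influential_user, num_interaction_user, number_of_product
```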

![Alt text](../graph_output/notebook_image/image-9.png)

Potential influencers can be identified by two measures: the number of other users who bought at least one of the same products, and the number of those shared products. A user who overlaps with many other users has purchasing behavior that is shared across a large part of the network, and a user whose purchases are also bought by many others is a plausible candidate for influencing what those users buy.
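The two influencer measures can be sketched in Python as follows, over a hypothetical purchase history. A per-user count needs to consider every other user, not only those with a higher ID, and to count each peer and each shared product once; the `influence_scores` helper and the sample data are illustrative assumptions.

```python
# Hypothetical purchase history: username -> set of purchased products
history = {
    "alice": {"Laptop", "Backpack"},
    "bob": {"Laptop"},
    "carol": {"Backpack", "Coffee Maker"},
    "dave": {"Running Shoes"},
}


def influence_scores(history):
    """Per user: (username, distinct peers sharing a product, distinct shared products)."""
    scores = []
    for u1, owned in history.items():
        peers, shared = set(), set()
        for u2, others in history.items():
            if u2 == u1:
                continue
            common = owned & others
            if common:
                peers.add(u2)
                shared |= common
        if peers:  # users with no overlap never match the pattern, so they are omitted
            scores.append((u1, len(peers), len(shared)))
    # ORDER BY num_interaction_user DESC, number_of_product DESC
    return sorted(scores, key=lambda s: (-s[1], -s[2]))


print(influence_scores(history))
```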