# Assignment 2

This assignment will be concerned with demonstrating the following course contents:
* Relational databases (SQL) and MongoDB (NoSQL)
* Graph Databases
* Streaming algorithms



## SQL and NoSQL

SQL databases have for decades been the norm for persisting data to non-volatile storage. Although SQL is a great way to represent certain data sets, queries can become expensive when they have to compute many large Cartesian products between tables. To mitigate some of these performance problems, there has been increasing interest in NoSQL databases (i.e. databases that are not built around normalized relations). MongoDB is a popular NoSQL database that persists data without a fixed schema in a JSON-like format (called BSON). In contrast to SQL, the data in a document database such as MongoDB should generally be denormalized (that is, related data should be stored within a nested structure). This makes MongoDB especially interesting for storing large documents or highly detailed log files at high throughput.
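
To make the difference between the two layouts concrete, here is a minimal sketch in plain Python. The rows and the `denormalize` helper are made up for illustration and are not taken from the Northwind data; the point is only to show how rows linked by a key can be folded into one nested document.

```python
import json

# Normalized (relational) form: one row per table, linked by OrderID.
# All values below are hypothetical.
orders = [{"OrderID": 1, "CustomerID": "ALFKI"}]
order_details = [
    {"OrderID": 1, "ProductID": 28, "Quantity": 15},
    {"OrderID": 1, "ProductID": 39, "Quantity": 21},
]

def denormalize(order, details):
    """Embed the matching detail rows inside the order document."""
    doc = dict(order)
    doc["Details"] = [
        {k: v for k, v in d.items() if k != "OrderID"}
        for d in details
        if d["OrderID"] == order["OrderID"]
    ]
    return doc

order_doc = denormalize(orders[0], order_details)
print(json.dumps(order_doc, indent=2))
```

With the details embedded, reading one order no longer requires a join, at the cost of duplicating data that several documents share.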

In this exercise we're going to use the Northwind database. To give the reader an idea of what this database looks like from a relational perspective, an ER diagram is displayed below.

<img src="http://i3.codeplex.com/download?ProjectName=northwinddatabase&DownloadId=269240" width="800" height="800" />

Unfortunately, the equivalent MongoDB database that we are going to use has not been denormalized into one collection. Because of this, the way MongoDB is used in this exercise is somewhat of _a special case_.

### Exercise 1

_Task: Establish connection to MongoDB and SQLite_

In the first exercise, we are going to make a basic connection to both SQLite (a light-weight SQL engine that persists data into files) and to MongoDB.

#### Connection to SQLite

The first thing we need to do is import `sqlite3`, the Python module we're going to use.

In [121]:
import sqlite3 as lite

We can create a small function that returns a connection to the northwind.db (a file in the same directory).

In [122]:
def get_sqlite_connection():
    conn = lite.connect('northwind.db')
    conn.text_factory = str
    return conn

From here on, getting a connection to the SQLite DB is easy...

In [123]:
conn = get_sqlite_connection()

Given the `conn` object we can create a cursor, which is used to query the database.

In [124]:
cur = conn.cursor()

Using the cursor `cur` we can retrieve all the products by making the SQL query `SELECT * FROM PRODUCTS`.

In [125]:
cur.execute('SELECT * FROM PRODUCTS')

<sqlite3.Cursor at 0x106c00570>

The query has been executed and the result set can be obtained by calling fetchall. Here, we are just going to see the first element.

In [126]:
cur.fetchall()[:1] # Take the first element

[(1, 'Chai', 1, 1, '10 boxes x 20 bags', 18, 39, 0, 10, '0')]
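
When only the first row is needed, `fetchone()` avoids materializing the whole result set. The sketch below assumes a made-up, reduced `products` table in an in-memory database as a stand-in for `northwind.db`.

```python
import sqlite3

# In-memory stand-in for northwind.db; the table is a hypothetical,
# two-column reduction of the real products table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (productid INTEGER PRIMARY KEY, productname TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "Chai"), (2, "Chang"), (3, "Aniseed Syrup")],
)

cur = conn.cursor()
cur.execute("SELECT * FROM products ORDER BY productid")
first_row = cur.fetchone()  # reads a single row instead of the full result set
print(first_row)
conn.close()
```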

We can query the database once again, this time, however, we are going to sort the result set using the `customerid`, which is a column in the table `CUSTOMERS`.

In [127]:
cur.execute('SELECT * FROM customers ORDER BY customerid ASC')
cur.fetchall()[:1]

[('ALFKI',
  'Alfreds Futterkiste',
  'Maria Anders',
  'Sales Representative',
  'Obere Str. 57',
  'Berlin',
  None,
  '12209',
  'Germany',
  '030-0074321',
  '030-0076545')]

#### Connection to MongoDB

Connecting to MongoDB is slightly more difficult than connecting to a SQLite DB, since MongoDB is a client-server database engine and has to be configured.

First, we need to install MongoDB. After MongoDB is installed, we need to run a shell script that imports the northwind database into MongoDB. After this, we can start the MongoDB service on localhost and the default port using `mongod` in bash.

<img src="img/mongod-service-start.png" width="800" height="800" />

After we've done this, we can `pip install pymongo`, and we are finally ready to start interacting with Northwind MongoDB through Python.

In [99]:
from pymongo import MongoClient
import pymongo
import pandas as pd

We can create a function that connects to MongoDB and returns a handle to the "Northwind" database.

In [100]:
def get_mongo_connection():
    client=MongoClient('localhost',27017)
    return client['Northwind']   # Get the database

At this point, we can easily obtain a MongoDB database instance.

In [101]:
c = get_mongo_connection()

We have established a connection and can use it to retrieve all the products and list the first entry.

In [102]:
products = c.products.find()
list(products)[0]

{u'CategoryID': 1,
 u'Discontinued': 0,
 u'ProductID': 1,
 u'ProductName': u'Chai',
 u'QuantityPerUnit': u'10 boxes x 20 bags',
 u'ReorderLevel': 10,
 u'SupplierID': 1,
 u'UnitPrice': 18.0,
 u'UnitsInStock': 39,
 u'UnitsOnOrder': 0,
 u'_id': ObjectId('5609b12899ded9537b06c7e5')}

Although this is just a Python dict, it maps directly onto a BSON document, which is the format MongoDB uses to persist data.

If we want to make another query, this time with a sorted result, we can apply a sort on a specific BSON key. From here on, we're going to use pandas DataFrames, since they provide an easy way of pretty-printing tabular data.

In [103]:
c = get_mongo_connection()
customers = c.customers.find().sort('CustomerID', pymongo.ASCENDING)
top10 = [customer['CustomerID'] for customer in customers][:10]
pd.DataFrame(top10, columns=['CustomerID'])

Unnamed: 0,CustomerID
0,ALFKI
1,ANATR
2,ANTON
3,AROUT
4,BERGS
5,BLAUS
6,BLONP
7,BOLID
8,BONAP
9,BOTTM


The list above shows the top 10 customer ids, sorted in ascending order, from the customer collection.

### Exercise 2

In this exercise we're going to retrieve all orders made by the customer with customerid `'ALFKI'` from both databases.

In [104]:
cur.execute("SELECT * FROM customers AS c "
            "INNER JOIN orders AS o ON c.customerid=o.customerid "
            "WHERE o.customerid='ALFKI'")

<sqlite3.Cursor at 0x106c60650>

Similarly to the last exercise, we can fetch the data from the SQLite DB into a pandas DataFrame.

In [105]:
pd.DataFrame(cur.fetchall())

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,15,16,17,18,19,20,21,22,23,24
0,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,1997-09-22 00:00:00.000,1997-09-02 00:00:00.000,1,29.46,Alfreds Futterkiste,Obere Str. 57,Berlin,,12209,Germany
1,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,1997-10-31 00:00:00.000,1997-10-13 00:00:00.000,2,61.02,Alfred's Futterkiste,Obere Str. 57,Berlin,,12209,Germany
2,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,1997-11-24 00:00:00.000,1997-10-21 00:00:00.000,1,23.94,Alfred's Futterkiste,Obere Str. 57,Berlin,,12209,Germany
3,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,1998-02-12 00:00:00.000,1998-01-21 00:00:00.000,3,69.53,Alfred's Futterkiste,Obere Str. 57,Berlin,,12209,Germany
4,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,1998-04-27 00:00:00.000,1998-03-24 00:00:00.000,1,40.42,Alfred's Futterkiste,Obere Str. 57,Berlin,,12209,Germany
5,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,1998-05-07 00:00:00.000,1998-04-13 00:00:00.000,1,1.21,Alfred's Futterkiste,Obere Str. 57,Berlin,,12209,Germany


Doing an inner join is straightforward with SQL, since it is one of the core concepts of relational databases.
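
The same kind of join can be written with a `?` placeholder so that the customer id is passed as a parameter rather than embedded in the SQL string. The sketch below is an illustration on a small in-memory database; the two-column tables and the `orders_for` helper are assumptions, not part of the Northwind setup.

```python
import sqlite3

# Hypothetical, reduced versions of the customers and orders tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerid TEXT PRIMARY KEY, companyname TEXT);
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid TEXT);
    INSERT INTO customers VALUES ('ALFKI', 'Alfreds Futterkiste'), ('ANATR', 'Ana Trujillo');
    INSERT INTO orders VALUES (10643, 'ALFKI'), (10702, 'ALFKI'), (10308, 'ANATR');
""")

def orders_for(conn, customer_id):
    """Return (customerid, orderid) pairs for one customer via an inner join."""
    cur = conn.execute(
        "SELECT c.customerid, o.orderid FROM customers AS c "
        "INNER JOIN orders AS o ON c.customerid = o.customerid "
        "WHERE c.customerid = ? ORDER BY o.orderid",
        (customer_id,),  # bound by sqlite3, not interpolated into the string
    )
    return cur.fetchall()

alfki_orders = orders_for(conn, "ALFKI")
print(alfki_orders)
```

Passing the value as a parameter keeps the query text constant and avoids quoting problems when the id comes from user input.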

Getting the same data from the Northwind MongoDB is slightly less clean, since we have to do it in a nested for-loop. In a real-world scenario, we should not be forced to merge collections for such a trivial query.

In [106]:
for customer in c.customers.find({'CustomerID':'ALFKI'}):
    # Join on the customerid
    for order in c.orders.find({'CustomerID': customer['CustomerID']}):
        # Join on the orderid
        for details in c['order-details'].find({'OrderID': order['OrderID']}):
            joined_result = order.copy()
            joined_result.update(details)

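Note that `joined_result` is overwritten on every iteration, so after the loop it holds only the last joined order and order-detail document. Finally, we can load the dictionary values from `joined_result` into a DataFrame and display the result.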

In [107]:
# Pretty print using pandas
pd.DataFrame([joined_result.values()], columns=joined_result.keys())

Unnamed: 0,OrderID,ShipVia,ShippedDate,ShipName,EmployeeID,ShipPostalCode,ShipCity,ShipRegion,OrderDate,CustomerID,Discount,Quantity,Freight,RequiredDate,_id,UnitPrice,ShipAddress,ProductID,ShipCountry
0,11011,1,1998-04-13 00:00:00.000,Alfred's Futterkiste,3,12209,Berlin,,1998-04-09 00:00:00.000,ALFKI,0,20,1.21,1998-05-07 00:00:00.000,5609b12899ded9537b06c3f0,21.5,Obere Str. 57,71,Germany
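
As an alternative to joining in Python, the join can be delegated to the server with an aggregation pipeline. The sketch below assumes a MongoDB server that supports the `$lookup` stage (3.2 or later) and a PyMongo version in which `aggregate` returns a cursor; the `customer_with_orders` helper is hypothetical and is not called here, since it needs a running server.

```python
# Aggregation pipeline: select one customer, then attach the matching
# documents from the orders collection as a nested "orders" array.
# Assumption: the server supports $lookup (MongoDB 3.2 or later).
pipeline = [
    {"$match": {"CustomerID": "ALFKI"}},
    {"$lookup": {
        "from": "orders",
        "localField": "CustomerID",
        "foreignField": "CustomerID",
        "as": "orders",
    }},
]

def customer_with_orders(db):
    """Run the pipeline on a pymongo database handle.

    `db` is expected to be a database object such as the one
    returned by get_mongo_connection().
    """
    return list(db.customers.aggregate(pipeline))
```

Each returned customer document would then carry its orders in the `orders` field, which resembles the denormalized layout MongoDB is designed for.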


### Exercise 3
In this exercise we're going to retrieve *all orders (with products) made by ALFKI that contain at least 2 product types*.
To do this, we extend the inner join between the tables `CUSTOMERS` and `ORDERS` on the foreign key `customerid` with two more tables.

The `SELECT` statement below gets _all_ (\*) fields from the joined result of `CUSTOMERS`, `ORDERS`, `ORDER DETAILS` and `PRODUCTS`. The joins are `INNER JOIN`s, meaning that only rows whose shared key matches in both tables are returned.
A `WHERE` clause is provided, instructing the SQL engine to only return tuples where `customerid` is equal to `'ALFKI'`. Finally, the rows are grouped by `orderid`, and the `HAVING` clause keeps only the orders with at least 2 distinct `productid` values. Because the rows are grouped by `orderid`, SQLite returns one row per qualifying order.

In [108]:
cur.execute('SELECT * FROM customers AS c'
                    ' INNER JOIN orders AS o ON c.customerid=o.customerid '
                    ' INNER JOIN "Order Details" AS od ON od.orderid=o.orderid '
                    ' INNER JOIN products AS p ON p.productid=od.productid '
                    " WHERE c.customerid='ALFKI'"
                    ' GROUP BY o.orderid HAVING count(distinct od.productid) >= 2')

<sqlite3.Cursor at 0x106c60650>

Yet again, we can fetch the result from the SQLite DB and into a pandas DataFrame.

In [109]:
r = cur.fetchall()
pd.DataFrame(r)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,30,31,32,33,34,35,36,37,38,39
0,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,46,Spegesild,21,8,4 - 450 g glasses,12.0,95,0,0,0
1,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,76,Lakkalikööri,23,1,500 ml,18.0,57,0,20,0
2,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,77,Original Frankfurter grüne Soße,12,2,12 boxes,13.0,32,0,15,0
3,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,28,Rössle Sauerkraut,12,7,25 - 825 g cans,45.6,26,0,0,1
4,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,...,71,Flotemysost,15,4,10 - 500 g pkgs.,21.5,26,0,0,0
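
The grouping and filtering step can be checked in isolation on a small table. The sketch below uses a made-up `order_details` table in an in-memory database; only the `GROUP BY ... HAVING count(DISTINCT ...)` pattern is taken from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (orderid INTEGER, productid INTEGER);
    INSERT INTO order_details VALUES
        (1, 28), (1, 39), (1, 46),   -- order 1: three distinct products
        (2, 63),                     -- order 2: a single product
        (3, 3), (3, 76);             -- order 3: two distinct products
""")

# Keep only the orders that contain at least two distinct products.
rows = conn.execute(
    "SELECT orderid, count(DISTINCT productid) AS n_products "
    "FROM order_details GROUP BY orderid "
    "HAVING count(DISTINCT productid) >= 2 ORDER BY orderid"
).fetchall()
print(rows)
```

Order 2 is dropped by the `HAVING` clause because it contains a single product.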


Performing the same query on MongoDB is slightly more complex. Just as before, we need a nested iterator that joins the collections on their shared key. This time, however, we also need to write a reduce function in JavaScript that counts the occurrences of each unique `OrderID` in the `order-details` collection.

In [24]:
from bson.code import Code
reducer = Code(
        """
            function(obj, prev){
              prev.count++;
            }
        """)

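The reducer above receives the current document `obj` and an accumulator `prev` that is carried over from the previous step of the fold. Once the reducer has run over every document in a group, `prev.count` holds the number of `order-details` documents for that `OrderID`. This matches `count(DISTINCT od.productid)` in SQL as long as each product appears at most once per order. If desired, a reducer of this kind could in principle be parallelized across multiple nodes of a cluster.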
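
To see what the group operation does with such a reducer, the fold can be mimicked in plain Python. This is only an analogy under stated assumptions: the documents are made up, and `fold` is a hypothetical helper that plays the role of `initial={"count": 0}` together with `prev.count++`.

```python
from functools import reduce

# Made-up order-detail documents; only the grouping key matters here.
order_details = [
    {"OrderID": 10643, "ProductID": 28},
    {"OrderID": 10643, "ProductID": 39},
    {"OrderID": 10692, "ProductID": 63},
]

def fold(groups, doc):
    """Fetch (or initialise) the accumulator of the document's group and bump it."""
    prev = groups.setdefault(doc["OrderID"], {"count": 0})
    prev["count"] += 1
    return groups

counts = reduce(fold, order_details, {})
print(counts)
```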

We can take the nested for-loop from the previous MongoDB query and modify it to our needs. This time we need to apply the reducer in a group-by aggregation and filter out any groups with a count below 2.

In [26]:
c = get_mongo_connection()
orders = []

for customer in c.customers.find({'CustomerID':'ALFKI'}):
    # Join customer and orders on customerid
    for order in c.orders.find({'CustomerID': customer['CustomerID']}):
        # Apply the reducer in a group aggregation
        results = c['order-details'].group(key={"OrderID": 1}, condition={}, initial={"count": 0}, reduce=reducer)
        for details in results:
            if details['OrderID'] == order['OrderID'] and details['count'] >= 2:
                orders.append(order)
                joined_result = order.copy()
                joined_result.update(details)

Finally, we can pretty-print the result using a pandas DataFrame.

In [27]:
pd.DataFrame(orders)

Unnamed: 0,CustomerID,EmployeeID,Freight,OrderDate,OrderID,RequiredDate,ShipAddress,ShipCity,ShipCountry,ShipName,ShipPostalCode,ShipRegion,ShipVia,ShippedDate,_id
0,ALFKI,6,29.46,1997-08-25 00:00:00.000,10643,1997-09-22 00:00:00.000,Obere Str. 57,Berlin,Germany,Alfreds Futterkiste,12209,,1,1997-09-02 00:00:00.000,5609b12899ded9537b06c632
1,ALFKI,4,23.94,1997-10-13 00:00:00.000,10702,1997-11-24 00:00:00.000,Obere Str. 57,Berlin,Germany,Alfred's Futterkiste,12209,,1,1997-10-21 00:00:00.000,5609b12899ded9537b06c66d
2,ALFKI,1,69.53,1998-01-15 00:00:00.000,10835,1998-02-12 00:00:00.000,Obere Str. 57,Berlin,Germany,Alfred's Futterkiste,12209,,3,1998-01-21 00:00:00.000,5609b12899ded9537b06c6f2
3,ALFKI,1,40.42,1998-03-16 00:00:00.000,10952,1998-04-27 00:00:00.000,Obere Str. 57,Berlin,Germany,Alfred's Futterkiste,12209,,1,1998-03-24 00:00:00.000,5609b12899ded9537b06c767
4,ALFKI,3,1.21,1998-04-09 00:00:00.000,11011,1998-05-07 00:00:00.000,Obere Str. 57,Berlin,Germany,Alfred's Futterkiste,12209,,1,1998-04-13 00:00:00.000,5609b12899ded9537b06c7a2


If we cross-check this result set with the previous result set, produced by SQLite, we can see that the tuples and objects refer to the same data points (except for the order).

### Exercise 4

In this exercise we're going to investigate who have ordered 'Uncle Bob’s Organic Dried Pears' (ProductID 7) and how many they've ordered.

The following `SELECT` statement retrieves the tuple fields `customerid` and `product_count` (an alias for the `count()` aggregate). In order to count the product for each customerid we, once again, need to make an `INNER JOIN` between the tables on their shared keys. Finally, we group by customerid to aggregate the result and then order it by `product_count`.

In [40]:
cur.execute('SELECT c.customerid, count(od.productid) as product_count FROM customers AS c'
                    ' INNER JOIN orders AS o ON c.customerid=o.customerid '
                    ' INNER JOIN "Order Details" AS od ON od.orderid=o.orderid '
                    ' INNER JOIN products AS p ON p.productid=od.productid '
                    ' WHERE p.productid=7 '
                    ' GROUP BY o.customerid '
                    ' ORDER BY product_count DESC')

<sqlite3.Cursor at 0x10785b5e0>

The result is retrieved by calling `fetchall()` and loaded into a DataFrame.

In [41]:
pd.DataFrame(cur.fetchall(), columns=['CustomerID','Product count'])

Unnamed: 0,CustomerID,Product count
0,RATTC,3
1,BONAP,2
2,BSBEV,2
3,EASTC,2
4,OTTIK,2
5,QUICK,2
6,REGGC,2
7,VICTE,2
8,BOTTM,1
9,ERNSH,1


The result shows that the customer with customerid `RATTC` has ordered _Uncle Bob’s Organic Dried Pears_ most often, namely 3 times.

To do the same thing in MongoDB, we once again need to make a nested for-loop. This time we're not going to write a reducer; instead we count the occurrences using a `defaultdict`. The reason is that `pymongo` does not offer a way of grouping across several collections at once. In a real-world MongoDB schema this would usually not be an issue, since the data would not be spread across so many collections.

In [64]:
from collections import defaultdict
product_count = defaultdict(lambda: 0)

for customer in c.customers.find():
    # For each customer, find the orders
    for order in c.orders.find({'CustomerID':customer['CustomerID']}):
        # For each order, find the order details
        for details in c['order-details'].find({'OrderID':order['OrderID'], 'ProductID': 7}):
                # Count each occurrence by the key CustomerID
                product_count[customer['CustomerID']] += 1

In [65]:
pd.DataFrame(product_count.items(), columns=['CustomerID', 'Product count']).sort('Product count', ascending=False)

Unnamed: 0,CustomerID,Product count
10,RATTC,3
4,BSBEV,2
16,EASTC,2
6,BONAP,2
15,VICTE,2
8,REGGC,2
9,OTTIK,2
13,QUICK,2
12,LILAS,1
18,OCEAN,1


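Looking at the results, both databases agree on the customers with two or more orders of the product. The rows with a count of 1 differ between the two listings only because customers with equal counts are returned in a different order.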
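
One way to cross-check two result sets without depending on row order is to compare them as mappings. The sketch below uses a few `(CustomerID, count)` pairs copied from the two outputs above; the variable names are made up for illustration.

```python
# A few (CustomerID, count) pairs as they appear in the two outputs above.
sql_rows = [("RATTC", 3), ("BONAP", 2), ("BSBEV", 2), ("EASTC", 2)]
mongo_rows = [("RATTC", 3), ("BSBEV", 2), ("EASTC", 2), ("BONAP", 2)]

# Dicts compare by content, lists compare by content and position.
same_content = dict(sql_rows) == dict(mongo_rows)
same_order = sql_rows == mongo_rows
print(same_content, same_order)
```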

### Exercise 5

In this exercise we are interested in the products bought by customers who also bought 'Uncle Bob’s Organic Dried Pears' (ProductID 7), and in how many different products there are.
This exercise is slightly more complex than the previous ones, since the outer query, which joins `orders`, `order details` and `products`, has to be filtered by a subquery over `orders` and `order details`.

In [130]:
stmt = """
SELECT count(p.productid), p.productname FROM orders
INNER JOIN [Order Details] AS od ON od.orderid=orders.orderid
INNER JOIN products AS p ON p.productid=od.productid
WHERE CustomerID IN
            (SELECT CustomerID FROM orders
             INNER JOIN [Order Details] ON [Order Details].orderid=orders.orderid
             WHERE [Order Details].productid=7) AND od.productid!=7
GROUP BY od.productid
"""

The above `SELECT` statement makes an `inner join` between the tables `orders`, `order details` and `products`. The `WHERE` clause then matches the intersection between the `CustomerID` from the parent query and the `CustomerID` from the subquery, where the `ProductID` equals `7`. Finally, the tuples are grouped by productid and the result is counted.

In the code below, the statement is executed and the result is displayed in a DataFrame.

In [142]:
cur.execute(stmt)
df_sql = pd.DataFrame(cur.fetchall(), columns=['count','ProductName_sqlite']).sort('ProductName_sqlite')
df_sql

Unnamed: 0,count,ProductName_sqlite
15,16,Alice Mutton
2,7,Aniseed Syrup
38,13,Boston Crab Meat
58,19,Camembert Pierrot
16,9,Carnarvon Tigers
0,8,Chai
1,18,Chang
37,10,Chartreuse verte
3,6,Chef Anton's Cajun Seasoning
4,6,Chef Anton's Gumbo Mix
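
The subquery pattern can be verified on a tiny data set where the expected answer is easy to work out by hand. The sketch below assumes two made-up tables in an in-memory database: customer `A` bought product 7, customer `B` did not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (orderid INTEGER PRIMARY KEY, customerid TEXT);
    CREATE TABLE order_details (orderid INTEGER, productid INTEGER);
    INSERT INTO orders VALUES (1, 'A'), (2, 'A'), (3, 'B');
    INSERT INTO order_details VALUES (1, 7), (1, 10), (2, 11), (3, 10), (3, 12);
""")

# Products (other than 7) ordered by customers who also ordered product 7.
rows = conn.execute("""
    SELECT od.productid, count(*) AS n
    FROM orders AS o
    INNER JOIN order_details AS od ON od.orderid = o.orderid
    WHERE o.customerid IN (
              SELECT o2.customerid FROM orders AS o2
              INNER JOIN order_details AS od2 ON od2.orderid = o2.orderid
              WHERE od2.productid = 7)
      AND od.productid != 7
    GROUP BY od.productid
    ORDER BY od.productid
""").fetchall()
print(rows)
```

Only the order lines of customer `A` contribute, so product 12 (bought by `B` alone) does not appear.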


We can do the same operation on MongoDB. Since we need to join 3 collections, we need several nested iterations. This process is significantly more time-consuming than it would be if the data were stored in a single collection.

The first thing we need to do, is to import a `defaultdict`, which we're going to use for counting the products. In order to not make the code too complex, we've not used a reducer-function to do the counting.

In [132]:
from collections import defaultdict

Next, we need to find the customers that have ordered `product_id` 7. The function below takes a product_id as an argument and returns a set of customer_ids.

In [133]:
def get_customers_with_productid(product_id):
    """
        Takes product_id as an argument and returns a set of customer_ids
        Args:
            product_id (int)
        Returns:
            customers - (set) customer_ids
    """
    customers = set()
    # Find all the order details with the corresponding product_id
    for details in c['order-details'].find({'ProductID': product_id}):
        # For each detail, find the order.
        for order in c.orders.find({'OrderID':details['OrderID']}).distinct('CustomerID'):
            # Take the customer id and add it to a set
            customers.add(order)
    
    return customers

We can call the function and pass `product_id` 7 as an argument, to get the customers that bought this product.

In [134]:
customers = get_customers_with_productid(7)

We can construct another function to retrieve the products that were bought by the customers who also bought `productid` 7.

In [136]:
def get_products_bought_by_customers(list_of_customers):
    """
        Takes a list/set of customer_ids and returns the products that the customer also bought,
        together with a count in a dictionary.
        Args:
            list-of-customers -- (iterable) list/set of customer_ids
        Returns
            products -- (dict) dictionary of products together with their count
    """
    products = defaultdict(lambda: 0)
    
    # For each of the customer_ids in the given list..
    for customer_id in list_of_customers:
        # Find the orders for that customer_id
        for order in c.orders.find({'CustomerID':customer_id}):
            # For each order, find the order detail
            for detail in c['order-details'].find({'OrderID':order['OrderID']}):
                # Given the order detail, find the product name
                for product_name in c.products.find({'ProductID': detail['ProductID']}):
                    # Increment default dictionary for each product_name
                    products[product_name['ProductName']] += 1
    return products

Finally, we can use the function above to get a dictionary of products together with their count.

In [137]:
products = get_products_bought_by_customers(customers)

In this case we are not interested in the product "Uncle Bob's Organic Dried Pears", so we can just pop it off the dictionary.

In [138]:
products.pop("Uncle Bob's Organic Dried Pears")

29

Looking at the result below, it is apparent that the result obtained is similar to the previous result obtained for SQLite.

In [139]:
df_mongo = pd.DataFrame(products.items(), columns=['ProductName_mongo', 'count']).sort('ProductName_mongo')
df_mongo

Unnamed: 0,ProductName_mongo,count
38,Alice Mutton,16
74,Aniseed Syrup,7
42,Boston Crab Meat,13
31,Camembert Pierrot,19
59,Carnarvon Tigers,9
2,Chai,8
53,Chang,18
9,Chartreuse verte,10
0,Chef Anton's Cajun Seasoning,6
19,Chef Anton's Gumbo Mix,6


### Exercise 6

Of those products ordered by customers who have also ordered “Uncle Bob’s Organic Dried Pears”, which one has been ordered the most (by the same set of customers)?

To achieve this result, we can just sort the DataFrame we instantiated earlier in descending order by the count.

In [153]:
df_sql.sort('count', ascending=False)

Unnamed: 0,count,ProductName_sqlite
39,22,Jack's New England Clam Chowder
60,22,Tarte au sucre
57,19,Raclette Courdavault
58,19,Camembert Pierrot
1,18,Chang
54,18,Gnocchi di nonna Alice
72,17,Rhönbräu Klosterbier
29,17,Gorgonzola Telino
49,16,Manjimup Dried Apples
69,16,Mozzarella di Giovanni


Since we also have a similar DataFrame from MongoDB called `df_mongo` we can again call `sort()` on the DataFrame to get the count in descending order.

In [155]:
df_mongo.sort('count', ascending=False)

Unnamed: 0,ProductName_mongo,count
33,Jack's New England Clam Chowder,22
18,Tarte au sucre,22
48,Raclette Courdavault,19
31,Camembert Pierrot,19
53,Chang,18
69,Gnocchi di nonna Alice,18
8,Rhönbräu Klosterbier,17
72,Gorgonzola Telino,17
49,Manjimup Dried Apples,16
34,Mozzarella di Giovanni,16
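
Since two products share the highest count, picking "the" most ordered product needs to account for ties. A minimal sketch, using a few `(product, count)` pairs copied from the output above and otherwise hypothetical names:

```python
# A few (product, count) pairs taken from the output above.
counts = {
    "Jack's New England Clam Chowder": 22,
    "Tarte au sucre": 22,
    "Raclette Courdavault": 19,
    "Chang": 18,
}

# Collect every product that reaches the maximum instead of just one.
top = max(counts.values())
most_ordered = sorted(name for name, n in counts.items() if n == top)
print(top, most_ordered)
```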


### Exercise 7

The customer with customerID ALFKI has bought a series of products. Determine which other customers have bought most of the same products (product types – 10 apples is no better than 1 apple).
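
Because only product types matter, one possible approach is to treat each customer's purchases as a set and rank the other customers by the size of the overlap with ALFKI's set. The sketch below is not a solution against the Northwind data: the purchases, the `CUST_*` ids and the `rank_by_shared_products` helper are all made up to illustrate the idea.

```python
# Made-up purchases: customer id -> set of product ids (quantities are ignored).
purchases = {
    "ALFKI": {3, 28, 39, 46},
    "CUST_A": {28, 39, 46, 71},
    "CUST_B": {3, 7},
    "CUST_C": {11, 12},
}

def rank_by_shared_products(purchases, target):
    """Return (customer, shared product count) pairs, largest overlap first."""
    wanted = purchases[target]
    overlaps = [
        (customer, len(wanted & products))
        for customer, products in purchases.items()
        if customer != target
    ]
    # Sort by overlap (descending), then by id to keep ties deterministic.
    return sorted(overlaps, key=lambda pair: (-pair[1], pair[0]))

ranking = rank_by_shared_products(purchases, "ALFKI")
print(ranking)
```

Using sets means that buying 10 units of a product counts exactly as much as buying 1 unit.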

In [None]:
# TODO

### Week 6

#### Exercise 1

In [None]:
# TODO

#### Exercise 2

We can get all the orders placed by the customer `ALFKI` by making a simple Cypher query containing a `WHERE` clause on `customerID`.

MATCH (c:Customer)-->(o:Order)-->(p:Product) WHERE c.customerID = 'ALFKI' RETURN c,o,p

The `WHERE` clause in Cypher Query Language is similar to SQL when performing equality comparisons.

<img src="img/exercises_week6/exercise6-2.png" width="400" height="400" />

#### Exercise 3

In [160]:
# TODO

The customer with customerID ALFKI has made a number of orders containing some products. Return orders made by ALFKI that contain at least 2 products. Also return the products.

```
MATCH (c:Customer)-->(o:Order)-->(p:Product) 
WITH c as c, count(p) as p_count 
WHERE c.customerID = 'ALFKI' AND p_count > 1  
RETURN c,p_count
```

<img src="img/exercises_week6/exercise6-3.png" width="200" height="200" />

#### Exercise 4

Determine how many and who has ordered “Uncle Bob’s Organic Dried Pears” (productID 7).

We can start by drawing a graph of those people that have bought “Uncle Bob’s Organic Dried Pears” (productID 7).

```
MATCH (c:Customer)-->(o:Order)-->(p:Product) 
WHERE p.productID = '7' 
RETURN p,o,c, count(o)
```

The query above matches every path from a `Customer` through an `Order` to the `Product` with `productID` 7. Because `count(o)` is an aggregate, the remaining return items `p`, `o` and `c` act as grouping keys.

<img src="img/exercises_week6/exercise6-4.png" width="600" height="600" />

This graph, however, does not depict the product count for each customer. To do this, we can represent the data in a tabular format instead. Since we don't need to draw the graph, we can leave the intermediary `Order` and `Product` nodes out of the `RETURN` clause:
```
MATCH (c:Customer)-->(o:Order)-->(p:Product) 
WHERE p.productID = '7' RETURN c, count(o)
```

| c.customerID | count(o) | 
|--------------|----------| 
| VAFFE        | 1        | 
| BOTTM        | 1        | 
| BONAP        | 2        | 
| SPLIR        | 1        | 
| GOURL        | 1        | 
| OCEAN        | 1        | 
| ERNSH        | 1        | 
| LILAS        | 1        | 
| SAVEA        | 1        | 
| QUICK        | 2        | 
| FOLKO        | 1        | 
| OTTIK        | 2        | 
| RATTC        | 3        | 
| LACOR        | 1        | 
| BSBEV        | 2        | 
| EASTC        | 2        | 
| SANTG        | 1        | 
| VICTE        | 2        | 
| FOLIG        | 1        | 
| REGGC        | 2        | 


#### Exercise 5

How many different and which products have been ordered by customers who have also ordered “Uncle Bob’s Organic Dried Pears”?

In [None]:
# TODO

#### Exercise 6

Of those products ordered by customers who have also ordered “Uncle Bob’s Organic Dried Pears”, which one has been ordered the most (by the same set of customers)?

In [None]:
# TODO

#### Exercise 7

The customer with customerID ALFKI has bought a series of products. Determine which other customers have bought most of the same products.

In [None]:
# TODO