## Visualizing Instacart Data

## QUERIES

#### Query 1


With this query, we wanted to extract the most "loyal" customers (i.e., the users who have placed the most Instacart orders).
Every row shown below has a frequency of 100, so many customers share the top order count.

In [16]:
%%bigquery
SELECT user_id, COUNT(order_id) AS frequency
FROM instacart_modeled.Orders
GROUP BY user_id
ORDER BY frequency DESC
LIMIT 1375

Unnamed: 0,user_id,frequency
0,17997,100
1,6398,100
2,41509,100
3,9437,100
4,53488,100
...,...,...
1370,66600,100
1371,58853,100
1372,81342,100
1373,199280,100


#### Query 2

Following the same logic as the previous query, this query counts how many customers share each order frequency: 1,374 customers have ordered a total of 100 times, 47 customers a total of 99 times, etc.

In [17]:
%%bigquery
SELECT frequency AS frequency_ordered, COUNT(frequency) AS total_customers
FROM (select user_id, COUNT(order_id) AS frequency
FROM instacart_modeled.Orders
GROUP BY user_id)
GROUP BY frequency_ordered
ORDER BY frequency_ordered DESC

Unnamed: 0,frequency_ordered,total_customers
0,100,1374
1,99,47
2,98,50
3,97,54
4,96,67
...,...,...
92,8,11700
93,7,13850
94,6,16165
95,5,19590
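
As a minimal sketch of the two-level aggregation above, the following uses Python's built-in `sqlite3` module with a handful of made-up rows in place of `instacart_modeled.Orders`. The row values are hypothetical, and SQLite stands in for BigQuery only to keep the example self-contained.

```python
import sqlite3

# Toy stand-in for instacart_modeled.Orders (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (order_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20), (6, 30), (7, 30), (8, 40)],
)

# Inner query: number of orders per user.
# Outer query: number of users sharing each order count.
distribution = conn.execute(
    """
    SELECT frequency AS frequency_ordered, COUNT(*) AS total_customers
    FROM (SELECT user_id, COUNT(order_id) AS frequency
          FROM Orders
          GROUP BY user_id)
    GROUP BY frequency
    ORDER BY frequency_ordered DESC
    """
).fetchall()

print(distribution)  # one user with 3 orders, two with 2, one with 1
```

Each output row reads as "this many customers placed exactly this many orders", which is how the table above should be read as well.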


We see the max order frequency is 100.

In [18]:
%%bigquery
SELECT MAX(frequency_ordered) AS max_order_freq
FROM
    (SELECT frequency AS frequency_ordered, COUNT(frequency) AS total_customers
    FROM 
        (SELECT user_id, COUNT(order_id) AS frequency
        FROM instacart_modeled.Orders
        GROUP BY user_id)
    GROUP BY frequency_ordered
    ORDER BY frequency_ordered DESC)

Unnamed: 0,max_order_freq
0,100
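
The maximum can also be taken directly over the per-user counts, without the intermediate grouping by frequency. The sketch below checks on hypothetical toy rows (SQLite via Python's `sqlite3`, not BigQuery) that both forms agree.

```python
import sqlite3

# Toy stand-in for instacart_modeled.Orders (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (order_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (4, 20), (5, 20), (6, 30)],
)

# Three-level form, mirroring the shape of the cell above.
nested_max = conn.execute(
    """
    SELECT MAX(frequency_ordered)
    FROM (SELECT frequency AS frequency_ordered, COUNT(*) AS total_customers
          FROM (SELECT user_id, COUNT(order_id) AS frequency
                FROM Orders
                GROUP BY user_id)
          GROUP BY frequency)
    """
).fetchone()[0]

# Two-level form: MAX over the per-user counts directly.
direct_max = conn.execute(
    """
    SELECT MAX(frequency)
    FROM (SELECT user_id, COUNT(order_id) AS frequency
          FROM Orders
          GROUP BY user_id)
    """
).fetchone()[0]

print(nested_max, direct_max)
```

Grouping by frequency only collapses duplicate values, so it cannot change the maximum.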


#### Query 3

Analyze the orders of the most "loyal" customers (those with 100 orders) and get the top 40 most popular items.

In [1]:
%%bigquery
SELECT p.product_name AS Product, count(1) AS Frequency
FROM
  (SELECT os.order_id
  FROM
    (SELECT user_id, COUNT(order_id) as frequency
    FROM instacart_modeled.Orders
    GROUP BY user_id
    HAVING(frequency=100)) us
  INNER JOIN instacart_modeled.Orders os
  ON us.user_id=os.user_id) o
INNER JOIN instacart_modeled.Order_Products op
ON o.order_id=op.order_id
INNER JOIN instacart_modeled.Products p
ON op.product_id=p.product_id
GROUP BY p.product_name
ORDER BY frequency desc
LIMIT 40

Unnamed: 0,Product,Frequency
0,Bag of Organic Bananas,17805
1,Banana,16574
2,Organic Strawberries,13247
3,Organic Hass Avocado,10715
4,Organic Baby Spinach,9654
5,Organic Raspberries,7524
6,Organic Whole Milk,6996
7,Limes,5633
8,Organic Yellow Onion,5360
9,Organic Avocado,4828


Check the query above by counting the occurrences of 'Banana' alone; the result should match the Banana row in the previous output (16,574).

In [2]:
%%bigquery
SELECT COUNT(1) AS frequency
FROM
(SELECT os.order_id
FROM
  (SELECT user_id, COUNT(order_id) AS frequency
  FROM instacart_modeled.Orders
  GROUP BY user_id
  HAVING(frequency=100)) us
INNER JOIN instacart_modeled.Orders os
ON us.user_id=os.user_id) o
INNER JOIN instacart_modeled.Order_Products op
ON o.order_id=op.order_id
INNER JOIN instacart_modeled.Products p
ON op.product_id=p.product_id
WHERE p.product_name='Banana'
ORDER BY frequency desc

Unnamed: 0,frequency
0,16574


#### Query 4

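This query would be useful to stores because it details which days of the week involve the widest range of products purchased. Because it uses `count(distinct op.product_id)`, `Total_Purchases` is the number of distinct products ordered on each day rather than the total number of items sold. Here, we see that the widest variety of products is purchased on Sundays.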

In [5]:
%%bigquery
SELECT os.order_dow, count(distinct op.product_id) as Total_Purchases
  FROM
    (SELECT order_id
    FROM instacart_modeled.Orders
    GROUP BY order_id) us
  RIGHT OUTER JOIN instacart_modeled.Orders_Beam_DF os
  ON us.order_id=os.order_id
RIGHT OUTER JOIN instacart_modeled.Order_Products op
ON us.order_id=op.order_id
RIGHT OUTER JOIN instacart_modeled.Products p
ON op.product_id=p.product_id
WHERE os.order_dow IS NOT NULL
GROUP BY os.order_dow
ORDER BY Total_Purchases desc

Unnamed: 0,order_dow,Total_Purchases
0,sunday,46429
1,monday,46081
2,saturday,45931
3,friday,45723
4,tuesday,45375
5,wednesday,45258
6,thursday,45159
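
To make the difference between `COUNT(DISTINCT product_id)` and a plain row count concrete, here is a small sketch on hypothetical toy rows, run with Python's `sqlite3` rather than BigQuery. The table names mirror the notebook's, the data is invented, and `order_dow` is placed directly on the toy `Orders` table as a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (order_id INTEGER, order_dow TEXT);
    CREATE TABLE Order_Products (order_id INTEGER, product_id INTEGER);
    """
)
# Hypothetical data: two Sunday orders that repeat products, one Monday order.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "sunday"), (2, "sunday"), (3, "monday")],
)
conn.executemany(
    "INSERT INTO Order_Products VALUES (?, ?)",
    [(1, 100), (1, 200), (2, 100), (2, 200), (2, 300),
     (3, 100), (3, 200), (3, 300), (3, 400)],
)

rows = conn.execute(
    """
    SELECT o.order_dow,
           COUNT(DISTINCT op.product_id) AS distinct_products,
           COUNT(*) AS items_purchased
    FROM Orders o
    INNER JOIN Order_Products op ON o.order_id = op.order_id
    GROUP BY o.order_dow
    ORDER BY o.order_dow
    """
).fetchall()

for dow, distinct_products, items_purchased in rows:
    print(dow, distinct_products, items_purchased)
```

In this toy data Sunday has more items purchased but fewer distinct products than Monday, so the two measures can rank days differently.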


#### Query 5

This query would be useful to the Instacart company because it details which days of the week involve the most orders. Here, we see that the most orders are placed on Sundays, followed closely by Mondays, which suggests that Sunday may be the best day of the week to offer deals to customers.

In [9]:
%%bigquery
SELECT os.order_dow, count(distinct op.order_id) as Total_Orders
  FROM
    (SELECT order_id
    FROM instacart_modeled.Orders
    GROUP BY order_id) us
  RIGHT OUTER JOIN instacart_modeled.Orders_Beam_DF os
  ON us.order_id=os.order_id
RIGHT OUTER JOIN instacart_modeled.Order_Products op
ON us.order_id=op.order_id
RIGHT OUTER JOIN instacart_modeled.Products p
ON op.product_id=p.product_id
WHERE os.order_dow IS NOT NULL
GROUP BY os.order_dow
ORDER BY Total_Orders desc

Unnamed: 0,order_dow,Total_Orders
0,sunday,585237
1,monday,576377
2,tuesday,458074
3,friday,443388
4,saturday,437749
5,wednesday,428087
6,thursday,417171


#### Query 6

This query lists the most popular items that are sold on all 7 days of the week.

In [15]:
%%bigquery
SELECT p.product_name, opb.frequency, count(distinct os.order_dow) as No_days_sold
  FROM
    (SELECT order_id
    FROM instacart_modeled.Orders
    GROUP BY order_id) us
  RIGHT OUTER JOIN instacart_modeled.Orders_Beam_DF os
  ON us.order_id=os.order_id
RIGHT OUTER JOIN instacart_modeled.Order_Products op
ON us.order_id=op.order_id
RIGHT OUTER JOIN instacart_modeled.Products p
ON op.product_id=p.product_id
RIGHT OUTER JOIN instacart_modeled.Order_Products_Beam_DF opb
ON opb.product_id  = p.product_id
GROUP BY p.product_name, opb.frequency
HAVING (No_days_sold = 7)
ORDER BY opb.frequency desc
limit 12

Unnamed: 0,product_name,frequency,No_days_sold
0,Banana,491291,7
1,Bag of Organic Bananas,394930,7
2,Organic Strawberries,275577,7
3,Organic Baby Spinach,251705,7
4,Organic Hass Avocado,220877,7
5,Organic Avocado,184224,7
6,Large Lemon,160792,7
7,Strawberries,149445,7
8,Limes,146660,7
9,Organic Whole Milk,142813,7
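
The "sold on every day" filter comes down to `HAVING COUNT(DISTINCT order_dow) = 7`. Below is a minimal sketch on hypothetical toy rows using Python's `sqlite3`. Unlike the cell above, which reads a precomputed `frequency` from `Order_Products_Beam_DF`, this sketch computes the frequency with `COUNT(*)`; that substitution is an assumption made to keep the example small.

```python
import sqlite3

days = ["sunday", "monday", "tuesday", "wednesday",
        "thursday", "friday", "saturday"]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (order_id INTEGER, order_dow TEXT);
    CREATE TABLE Order_Products (order_id INTEGER, product_id INTEGER);
    CREATE TABLE Products (product_id INTEGER, product_name TEXT);
    """
)

# Hypothetical data: one order per day of the week.
orders = [(i + 1, day) for i, day in enumerate(days)]
# Product 1 appears in every order; product 2 only in the first three.
order_products = [(oid, 1) for oid, _ in orders]
order_products += [(oid, 2) for oid, _ in orders[:3]]

conn.executemany("INSERT INTO Orders VALUES (?, ?)", orders)
conn.executemany("INSERT INTO Order_Products VALUES (?, ?)", order_products)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)", [(1, "Banana"), (2, "Limes")]
)

every_day = conn.execute(
    """
    SELECT p.product_name,
           COUNT(*) AS frequency,
           COUNT(DISTINCT o.order_dow) AS no_days_sold
    FROM Order_Products op
    INNER JOIN Orders o ON op.order_id = o.order_id
    INNER JOIN Products p ON op.product_id = p.product_id
    GROUP BY p.product_name
    HAVING COUNT(DISTINCT o.order_dow) = 7
    ORDER BY frequency DESC
    """
).fetchall()

print(every_day)  # only the product ordered on all seven days remains
```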


### VIEWS

#### Query 3 VIEW
Analyze the orders of the most "loyal" customers and get the top 10 most popular items

In [3]:
%%bigquery
CREATE OR REPLACE VIEW instacart_modeled.v_Loyal_Customers_Top_10_Products AS
    (SELECT p.product_name AS Product, count(1) AS Frequency
    FROM
      (SELECT os.order_id
      FROM
        (SELECT user_id, COUNT(order_id) as frequency
        FROM `responsive-cab-267123.instacart_modeled.Orders`
        GROUP BY user_id
        HAVING(frequency=100)) us
      INNER JOIN `responsive-cab-267123.instacart_modeled.Orders` os
      ON us.user_id=os.user_id) o
    INNER JOIN `responsive-cab-267123.instacart_modeled.Order_Products` op
    ON o.order_id=op.order_id
    INNER JOIN `responsive-cab-267123.instacart_modeled.Products` p
    ON op.product_id=p.product_id
    GROUP BY p.product_name
    ORDER BY frequency DESC
    LIMIT 10)

Check that the view was created correctly.

In [4]:
%%bigquery
SELECT * 
FROM instacart_modeled.v_Loyal_Customers_Top_10_Products
ORDER BY frequency DESC
LIMIT 5

Unnamed: 0,Product,Frequency
0,Bag of Organic Bananas,17805
1,Banana,16574
2,Organic Strawberries,13247
3,Organic Hass Avocado,10715
4,Organic Baby Spinach,9654
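
One way to check a view like this is to compare its rows with the rows of the query it wraps. The sketch below does that on hypothetical toy data with Python's `sqlite3`. SQLite has no `CREATE OR REPLACE VIEW`, so `DROP VIEW IF EXISTS` followed by `CREATE VIEW` is used instead, and the "loyal" threshold is lowered to 3 orders purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (order_id INTEGER, user_id INTEGER);
    CREATE TABLE Order_Products (order_id INTEGER, product_id INTEGER);
    CREATE TABLE Products (product_id INTEGER, product_name TEXT);
    """
)
# Hypothetical data: user 1 has three orders, user 2 has one.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)", [(1, 1), (2, 1), (3, 1), (4, 2)]
)
conn.executemany(
    "INSERT INTO Order_Products VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (4, 3), (4, 1)],
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [(1, "Banana"), (2, "Limes"), (3, "Large Lemon")],
)

# Top products among users with exactly 3 orders (toy "loyal" threshold).
TOP_PRODUCTS_SQL = """
SELECT p.product_name AS Product, COUNT(*) AS Frequency
FROM (SELECT user_id
      FROM Orders
      GROUP BY user_id
      HAVING COUNT(order_id) = 3) us
INNER JOIN Orders os ON us.user_id = os.user_id
INNER JOIN Order_Products op ON os.order_id = op.order_id
INNER JOIN Products p ON op.product_id = p.product_id
GROUP BY p.product_name
ORDER BY Frequency DESC
LIMIT 10
"""

conn.execute("DROP VIEW IF EXISTS v_Loyal_Customers_Top_Products")
conn.execute("CREATE VIEW v_Loyal_Customers_Top_Products AS " + TOP_PRODUCTS_SQL)

direct_rows = conn.execute(TOP_PRODUCTS_SQL).fetchall()
view_rows = conn.execute(
    "SELECT * FROM v_Loyal_Customers_Top_Products ORDER BY Frequency DESC"
).fetchall()

print(view_rows == direct_rows)
```

The outer `ORDER BY` when reading from the view matters: a view's internal ordering is not something to rely on when selecting from it.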


#### Query 5 VIEW
Count the total number of orders placed on each day of the week.

In [25]:
%%bigquery
CREATE OR REPLACE VIEW instacart_modeled.v_Weekday_Orders AS
   ( SELECT os.order_dow, count(distinct op.order_id) as Total_Orders
  FROM
    (SELECT order_id
    FROM `responsive-cab-267123.instacart_modeled.Orders`
    GROUP BY order_id) us
  RIGHT OUTER JOIN `responsive-cab-267123.instacart_modeled.Orders_Beam_DF` os
  ON us.order_id=os.order_id
RIGHT OUTER JOIN `responsive-cab-267123.instacart_modeled.Order_Products` op
ON us.order_id=op.order_id
RIGHT OUTER JOIN `responsive-cab-267123.instacart_modeled.Products` p
ON op.product_id=p.product_id
WHERE os.order_dow IS NOT NULL
GROUP BY os.order_dow
ORDER BY Total_Orders desc)

Check that the view was created correctly.

In [None]:
%%bigquery
SELECT * 
FROM instacart_modeled.v_Weekday_Orders
ORDER BY Total_Orders DESC
LIMIT 5


# MILESTONE 8

Write 4 queries with at least 1 subquery per query.
* At least 1 query must use an aggregate function.
* At least 2 queries must use a JOIN clause.
* At least 2 queries must use a WHERE clause.
* At least 2 queries must use an ORDER BY clause.

#### QUERY 1

Find the product IDs that are less popular than average (frequency below the mean frequency across all items).

In [4]:
%%bigquery
SELECT DISTINCT bp.product_id
FROM instacart_modeled.Order_Products_Beam_DF bp
WHERE bp.frequency <
    (SELECT AVG(frequency)
    FROM instacart_modeled.Order_Products_Beam_DF)

Unnamed: 0,product_id
0,3641
1,26536
2,9905
3,33629
4,31746
...,...
42769,3761
42770,29124
42771,18285
42772,23916
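
The pattern here is a scalar subquery that computes the mean once and is then compared against every row. Below is a minimal sketch on hypothetical toy rows with Python's `sqlite3`; the table name `Product_Frequency` is a made-up stand-in for `Order_Products_Beam_DF`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for Order_Products_Beam_DF: one frequency per product.
conn.execute(
    "CREATE TABLE Product_Frequency (product_id INTEGER, frequency INTEGER)"
)
conn.executemany(
    "INSERT INTO Product_Frequency VALUES (?, ?)",
    [(1, 100), (2, 10), (3, 4), (4, 6)],
)

mean_frequency = conn.execute(
    "SELECT AVG(frequency) FROM Product_Frequency"
).fetchone()[0]

below_average = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT product_id
        FROM Product_Frequency
        WHERE frequency < (SELECT AVG(frequency) FROM Product_Frequency)
        ORDER BY product_id
        """
    )
]

print(mean_frequency, below_average)
```

In the toy data a single very popular product pulls the mean up to 30, so three of the four products fall below it. A median-based cutoff may be worth considering if the real frequencies are similarly skewed.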


#### QUERY 2

Find the top 40 most popular items on the most active day (the day with the most orders), which is Sunday.

In [54]:
%%bigquery
SELECT p.product_name, pf.frequency
FROM instacart_modeled.Products p
INNER JOIN 
(SELECT op.product_id, COUNT(*) as frequency
FROM instacart_modeled.Order_Products op
INNER JOIN instacart_modeled.Orders_Beam_DF o
ON op.order_id=o.order_id
WHERE o.order_dow = 
    (SELECT n.order_dow
    FROM
        (SELECT o.order_dow, COUNT(*) as num_orders
        FROM instacart_modeled.Orders_Beam_DF o 
        GROUP BY o.order_dow) n
    WHERE n.num_orders = 
        (SELECT MAX(n.num_orders)
         FROM 
             (SELECT o.order_dow, COUNT(*) as num_orders
              FROM instacart_modeled.Orders_Beam_DF o
              GROUP BY o.order_dow) n))
GROUP BY op.product_id
ORDER BY frequency DESC
LIMIT 40) pf
ON p.product_id=pf.product_id
ORDER BY pf.frequency DESC

Unnamed: 0,product_name,frequency
0,Banana,101474
1,Bag of Organic Bananas,75052
2,Organic Baby Spinach,57556
3,Organic Strawberries,56635
4,Organic Hass Avocado,45841
5,Organic Avocado,41877
6,Large Lemon,35826
7,Limes,31878
8,Strawberries,29046
9,Organic Raspberries,27756
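
The busiest day can also be selected with `ORDER BY COUNT(*) DESC LIMIT 1` instead of comparing against a nested `MAX`. The sketch below shows that variant on hypothetical toy rows with Python's `sqlite3`; it is an alternative formulation, not the query used above, and `order_dow` is placed on a toy `Orders` table for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (order_id INTEGER, order_dow TEXT);
    CREATE TABLE Order_Products (order_id INTEGER, product_id INTEGER);
    CREATE TABLE Products (product_id INTEGER, product_name TEXT);
    """
)
# Hypothetical data: three Sunday orders, two Monday, one Tuesday.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "sunday"), (2, "sunday"), (3, "sunday"),
     (4, "monday"), (5, "monday"), (6, "tuesday")],
)
conn.executemany(
    "INSERT INTO Order_Products VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2),
     (4, 3), (5, 3), (6, 3), (4, 1)],
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [(1, "Banana"), (2, "Limes"), (3, "Organic Whole Milk")],
)

# Scalar subquery: the day of the week with the most orders.
BUSIEST_DAY_SQL = """
SELECT order_dow
FROM Orders
GROUP BY order_dow
ORDER BY COUNT(*) DESC
LIMIT 1
"""
busiest_day = conn.execute(BUSIEST_DAY_SQL).fetchone()[0]

top_products = conn.execute(
    f"""
    SELECT p.product_name, COUNT(*) AS frequency
    FROM Order_Products op
    INNER JOIN Orders o ON op.order_id = o.order_id
    INNER JOIN Products p ON op.product_id = p.product_id
    WHERE o.order_dow = ({BUSIEST_DAY_SQL})
    GROUP BY p.product_name
    ORDER BY frequency DESC
    LIMIT 40
    """
).fetchall()

print(busiest_day, top_products)
```

The two formulations differ on ties: with `LIMIT 1` one of the tied days is picked arbitrarily, while the `MAX` comparison would return every tied day, which is a problem when exactly one value is expected on the right-hand side of `=`.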


#### QUERY 3

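This query returns the users whose total number of purchased items, summed across all of their orders, falls just under the average number of items in a single order. Note that `freq` counts a user's Order_Products rows across every order, so it measures total items purchased rather than that user's typical order size. Further investigation could later be done to see whether these customers should be sent promotions or other incentives so that they use the service more and reach the "average" order size category.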

In [2]:
%%bigquery
select user_id, count(*) as freq
FROM instacart_modeled.Orders o
INNER JOIN instacart_modeled.Order_Products op
ON op.order_id=o.order_id
WHERE o.order_id = op.order_id
group by o.user_id
having freq < 
  (select avg(order_size) 
  from (SELECT COUNT(product_id) as order_size
      FROM instacart_modeled.Orders o
      INNER JOIN instacart_modeled.Order_Products op
      ON op.order_id=o.order_id
      WHERE o.order_id = op.order_id
      group by op.order_id))
order by freq desc
limit 20



Unnamed: 0,user_id,freq
0,44100,10
1,4854,10
2,148508,10
3,56742,10
4,183648,10
5,48387,10
6,25577,10
7,61811,10
8,153820,10
9,73655,10
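
Comparing a user's total item count with the average order size is not the same as comparing that user's own average order size with it. The sketch below contrasts the two on hypothetical toy rows using Python's `sqlite3`; the second query is an illustrative alternative, not part of the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (order_id INTEGER, user_id INTEGER);
    CREATE TABLE Order_Products (order_id INTEGER, product_id INTEGER);
    """
)
# Hypothetical data: user 1 has two small orders, users 2 and 3 one each.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)", [(1, 1), (2, 1), (3, 2), (4, 3)]
)
order_sizes = {1: 2, 2: 2, 3: 3, 4: 9}  # order_id -> number of items
conn.executemany(
    "INSERT INTO Order_Products VALUES (?, ?)",
    [(oid, pid) for oid, n in order_sizes.items() for pid in range(n)],
)

# Average number of items per order across all orders (4.0 here).
AVG_ORDER_SIZE_SQL = """
(SELECT AVG(order_size)
 FROM (SELECT COUNT(product_id) AS order_size
       FROM Order_Products
       GROUP BY order_id))
"""

# Same shape as the cell above: total items per user vs. average order size.
total_based = conn.execute(
    f"""
    SELECT o.user_id, COUNT(*) AS freq
    FROM Orders o
    INNER JOIN Order_Products op ON op.order_id = o.order_id
    GROUP BY o.user_id
    HAVING COUNT(*) < {AVG_ORDER_SIZE_SQL}
    ORDER BY o.user_id
    """
).fetchall()

# Alternative: each user's own average order size vs. the overall average.
average_based = conn.execute(
    f"""
    SELECT user_id, AVG(order_size) AS avg_order_size
    FROM (SELECT o.user_id, o.order_id, COUNT(*) AS order_size
          FROM Orders o
          INNER JOIN Order_Products op ON op.order_id = o.order_id
          GROUP BY o.user_id, o.order_id)
    GROUP BY user_id
    HAVING AVG(order_size) < {AVG_ORDER_SIZE_SQL}
    ORDER BY user_id
    """
).fetchall()

print(total_based)
print(average_based)
```

In the toy data, user 1 places two orders of 2 items each: the total-based filter skips that user (4 items in total is not below 4.0), while the average-based filter includes them.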


#### QUERY 4

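This query was intended to find how many units of the most popular item (bananas) are in the average purchase. As written, however, the subquery `select max(frequency) from (...)` exposes no `product_name` column, so the `product_name` selected from it appears to resolve to the outer `p.product_name`, and the filter then matches every row. The result of about 10.1 is therefore best read as the average number of items per order, not as a banana count.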

In [12]:
%%bigquery
select avg(num_of_bananas) as average_count_of_most_popular_item from (select op.order_id, count(*) as num_of_bananas
from instacart_modeled.Order_Products op
inner join instacart_modeled.Products p
on op.product_id = p.product_id
where p.product_name = (select product_name from (select max(frequency) from (SELECT product_name, COUNT(p.product_id) as frequency
      FROM instacart_modeled.Orders o
      INNER JOIN instacart_modeled.Order_Products op
      ON op.order_id=o.order_id
      inner join instacart_modeled.Products p
      on op.product_id = p.product_id
      group by p.product_name)))
 group by order_id)
 

Unnamed: 0,average_count_of_most_popular_item
0,10.107073
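
One thing worth checking in a nested scalar subquery of this shape: if the inner derived table does not expose `product_name`, the name can resolve to the outer query's column (a correlated reference), and the filter then matches every row. The sketch below, on hypothetical toy rows with Python's `sqlite3`, contrasts that shape with a subquery that picks the top product by name using `ORDER BY ... LIMIT 1`. It assumes each product appears at most once per order in `Order_Products`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Order_Products (order_id INTEGER, product_id INTEGER);
    CREATE TABLE Products (product_id INTEGER, product_name TEXT);
    """
)
# Hypothetical data: orders of size 3, 2 and 1; Banana is in all of them.
conn.executemany(
    "INSERT INTO Order_Products VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1)],
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [(1, "Banana"), (2, "Limes"), (3, "Organic Strawberries")],
)

PER_PRODUCT_SQL = """
SELECT p2.product_name, COUNT(*) AS frequency
FROM Order_Products op2
INNER JOIN Products p2 ON op2.product_id = p2.product_id
GROUP BY p2.product_name
"""

# Shape of the cell above: product_name is not a column of the MAX subquery,
# so it refers to the outer p.product_name and every row passes the filter.
as_written_avg = conn.execute(
    f"""
    SELECT AVG(n)
    FROM (SELECT op.order_id, COUNT(*) AS n
          FROM Order_Products op
          INNER JOIN Products p ON op.product_id = p.product_id
          WHERE p.product_name =
              (SELECT product_name
               FROM (SELECT MAX(frequency) FROM ({PER_PRODUCT_SQL})))
          GROUP BY op.order_id)
    """
).fetchone()[0]

# Variant that selects the name of the most frequent product explicitly.
top_product_avg = conn.execute(
    f"""
    SELECT AVG(n)
    FROM (SELECT op.order_id, COUNT(*) AS n
          FROM Order_Products op
          INNER JOIN Products p ON op.product_id = p.product_id
          WHERE p.product_name =
              (SELECT product_name
               FROM ({PER_PRODUCT_SQL})
               ORDER BY frequency DESC
               LIMIT 1)
          GROUP BY op.order_id)
    """
).fetchone()[0]

print(as_written_avg, top_product_avg)
```

On the toy data the first value equals the average order size, while the second is the average number of Banana rows per order that contains Banana.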


## View for Query 2

Create a view to show the 10 most popular items in Sunday orders (Sunday is the most popular day for Instacart users).

In [14]:
%%bigquery
CREATE OR REPLACE VIEW instacart_modeled.v_Sunday_Most_Popular_Products AS
   (SELECT p.product_name, pf.frequency
FROM `responsive-cab-267123.instacart_modeled.Products` p
INNER JOIN 
(SELECT op.product_id, COUNT(*) as frequency
FROM `responsive-cab-267123.instacart_modeled.Order_Products` op
INNER JOIN `responsive-cab-267123.instacart_modeled.Orders_Beam_DF` o
ON op.order_id=o.order_id
WHERE o.order_dow = 
    (SELECT n.order_dow
    FROM
        (SELECT o.order_dow, COUNT(*) as num_orders
        FROM `responsive-cab-267123.instacart_modeled.Orders_Beam_DF` o 
        GROUP BY o.order_dow) n
    WHERE n.num_orders = 
        (SELECT MAX(n.num_orders)
         FROM 
             (SELECT o.order_dow, COUNT(*) as num_orders
              FROM `responsive-cab-267123.instacart_modeled.Orders_Beam_DF` o
              GROUP BY o.order_dow) n))
GROUP BY op.product_id
ORDER BY frequency DESC
LIMIT 10) pf
ON p.product_id=pf.product_id
ORDER BY pf.frequency DESC)

In [16]:
%%bigquery
SELECT * 
FROM instacart_modeled.v_Sunday_Most_Popular_Products
LIMIT 5

Unnamed: 0,product_name,frequency
0,Banana,101474
1,Bag of Organic Bananas,75052
2,Organic Baby Spinach,57556
3,Organic Strawberries,56635
4,Organic Hass Avocado,45841
