## Instacart Orders and USDA ERS Food Prices Cross Analysis

In [1]:
dataset_id = "reporting"

In [2]:
# Note: this will fail if a dataset with this name already exists.
!bq --location=US mk --dataset {dataset_id}


#### Query 1

By predicting the 2017 food prices with linear regression, we can approximate how much Instacart users spent on each order and then calculate the average order total by day of the week. Here we do this for the years 2004, 2010, and 2017, which lets us see how prices changed over time.
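The notebook does not show how the 2017 prices were extrapolated, so the following is only a minimal sketch of the general idea: fit an ordinary least-squares line to yearly prices and evaluate it at 2017. The price series and the `fit_line` helper are hypothetical and chosen for illustration; they are not taken from the USDA ERS tables.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical yearly average prices for one food group (not real USDA data).
years = [2004, 2005, 2006, 2007, 2008, 2009, 2010]
prices = [2.00, 2.10, 2.20, 2.30, 2.40, 2.50, 2.60]

# Fit on the observed years, then extrapolate to 2017.
slope, intercept = fit_line(years, prices)
predicted_2017 = slope * 2017 + intercept
print(round(predicted_2017, 2))
```

A straight-line extrapolation seven years past the last observation is a rough estimate, so the 2017 totals are best read as approximations.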

In [1]:
%%bigquery
(SELECT o.order_dow as day, p.year as year, AVG(p.Total) as average_price
FROM
(SELECT op.order_id, ap.year, SUM(ap.avg_price) as Total
FROM `responsive-cab-267123.instacart_modeled.Order_Products` op
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Map_Beam_DF` m ON op.product_id = m.product_id
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Market_Beam_DF` ap ON m.food_id = ap.food_id
GROUP BY op.order_id, ap.year) p
INNER JOIN `responsive-cab-267123.instacart_modeled.Orders` o ON p.order_id = o.order_id
WHERE p.year IN (2004,2010,2017)
GROUP BY o.order_dow, p.year
ORDER BY o.order_dow ASC)

,day,year,average_price
0,0,2017,6.646936
1,0,2004,4.711761
2,0,2010,5.446213
3,1,2004,4.289487
4,1,2017,6.072891
5,1,2010,4.970282
6,2,2004,4.015646
7,2,2010,4.648009
8,2,2017,5.673816
9,3,2004,3.9155
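
To make the two aggregation steps in the query explicit, here is a small pure-Python sketch on hypothetical rows: the inner step sums item prices per order and year, and the outer step averages those order totals per day of week and year. The toy data and variable names are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical joined rows: (order_id, year, avg_price of one product in the order).
order_items = [
    (1, 2017, 2.50), (1, 2017, 1.50),
    (2, 2017, 3.00),
    (3, 2017, 4.00), (3, 2017, 2.00),
]
# Hypothetical order metadata: order_id -> order_dow.
order_dow = {1: 0, 2: 0, 3: 1}

# Inner query: SUM(avg_price) grouped by (order_id, year).
order_totals = defaultdict(float)
for order_id, year, price in order_items:
    order_totals[(order_id, year)] += price

# Outer query: AVG(Total) grouped by (order_dow, year).
totals_by_day = defaultdict(list)
for (order_id, year), total in order_totals.items():
    totals_by_day[(order_dow[order_id], year)].append(total)

average_price = {
    key: sum(totals) / len(totals) for key, totals in sorted(totals_by_day.items())
}
print(average_price)
```

Averaging order totals, rather than averaging item prices directly, keeps large orders from dominating the result.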


Creating view for query

In [2]:
%%bigquery
CREATE OR REPLACE VIEW reporting.v_Year_Day_Order_Totals AS
(SELECT o.order_dow as day, p.year as year, AVG(p.Total) as average_price
FROM
(SELECT op.order_id, ap.year, SUM(ap.avg_price) as Total
FROM `responsive-cab-267123.instacart_modeled.Order_Products` op
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Map_Beam_DF` m ON op.product_id = m.product_id
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Market_Beam_DF` ap ON m.food_id = ap.food_id
GROUP BY op.order_id, ap.year) p
INNER JOIN `responsive-cab-267123.instacart_modeled.Orders` o ON p.order_id = o.order_id
WHERE p.year IN (2004,2010,2017)
GROUP BY o.order_dow, p.year
ORDER BY o.order_dow ASC)
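
Because the view body repeats the exploratory query verbatim, one option is to keep the SELECT text in a single Python string and generate the view statement from it. `make_view_sql` below is a hypothetical helper, not part of the notebook or of any BigQuery library; it only builds a string and does not execute anything.

```python
def make_view_sql(dataset_id, view_name, select_sql):
    """Wrap a SELECT statement in a CREATE OR REPLACE VIEW statement."""
    body = select_sql.strip().rstrip(";")
    return f"CREATE OR REPLACE VIEW {dataset_id}.{view_name} AS\n({body})"


# Shortened stand-in for the full query above (assumption for illustration).
select_sql = "SELECT 1 AS day, 2017 AS year, 6.0 AS average_price"
view_sql = make_view_sql("reporting", "v_Year_Day_Order_Totals", select_sql)
print(view_sql)
```

The generated text could then be pasted into a `%%bigquery` cell or passed to whichever client is used to run statements.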

#### Query 2

We can break the previous query down further by computing the average order total for each hour of each day, using the 2017 prices.

In [3]:
%%bigquery
(SELECT o.order_dow as day, o.order_hour_of_day as hour, AVG(p.Total) as average_price
FROM
(SELECT op.order_id, SUM(ap.avg_price) as Total
FROM `responsive-cab-267123.instacart_modeled.Order_Products` op
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Map_Beam_DF` m ON op.product_id = m.product_id
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Market_Beam_DF` ap ON m.food_id = ap.food_id
WHERE ap.year = 2017
GROUP BY op.order_id, ap.year) p
INNER JOIN `responsive-cab-267123.instacart_modeled.Orders` o ON p.order_id = o.order_id
GROUP BY o.order_dow, o.order_hour_of_day
ORDER BY o.order_dow ASC)

,day,hour,average_price
0,0,13,6.678643
1,0,9,6.900512
2,0,17,6.413692
3,0,10,6.826878
4,0,16,6.555570
...,...,...,...
163,6,5,5.833285
164,6,6,6.275434
165,6,3,5.737900
166,6,2,5.829064
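
As a sketch of how the day and hour breakdown could be explored further, the snippet below regroups result rows into a day-to-hour mapping and picks the hour with the highest average for each day. It uses only the rows visible in the truncated output above, so the peaks it reports apply to those rows and not to the full result; the variable names are assumptions.

```python
# Rows copied from the truncated output above: (day, hour, average_price).
rows = [
    (0, 13, 6.678643),
    (0, 9, 6.900512),
    (0, 17, 6.413692),
    (0, 10, 6.826878),
    (0, 16, 6.555570),
    (6, 5, 5.833285),
    (6, 6, 6.275434),
    (6, 3, 5.737900),
    (6, 2, 5.829064),
]

# Build a nested mapping: day -> {hour -> average_price}.
by_day = {}
for day, hour, price in rows:
    by_day.setdefault(day, {})[hour] = price

# For each day, find the hour with the highest average among the rows shown.
peak_hours = {
    day: max(hours, key=hours.get) for day, hours in sorted(by_day.items())
}
for day, hour in peak_hours.items():
    print(day, hour, by_day[day][hour])
```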


Creating view for query

In [5]:
%%bigquery
CREATE OR REPLACE VIEW reporting.v_Hour_Day_Order_Totals AS
(SELECT o.order_dow as day, o.order_hour_of_day as hour, AVG(p.Total) as average_price
FROM
(SELECT op.order_id, SUM(ap.avg_price) as Total
FROM `responsive-cab-267123.instacart_modeled.Order_Products` op
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Map_Beam_DF` m ON op.product_id = m.product_id
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Market_Beam_DF` ap ON m.food_id = ap.food_id
WHERE ap.year = 2017
GROUP BY op.order_id, ap.year) p
INNER JOIN `responsive-cab-267123.instacart_modeled.Orders` o ON p.order_id = o.order_id
GROUP BY o.order_dow, o.order_hour_of_day
ORDER BY o.order_dow ASC)

#### Query 3

How much do users spend on average, depending on how many days have passed since their previous order? This query uses the 2017 prices.

In [6]:
%%bigquery
(SELECT o.days_since_prior_order as days_since_prior_order, AVG(p.Total) as average_price
FROM
(SELECT op.order_id, SUM(ap.avg_price) as Total
FROM `responsive-cab-267123.instacart_modeled.Order_Products` op
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Map_Beam_DF` m ON op.product_id = m.product_id
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Market_Beam_DF` ap ON m.food_id = ap.food_id
WHERE ap.year = 2017
GROUP BY op.order_id, ap.year) p
INNER JOIN `responsive-cab-267123.instacart_modeled.Orders` o ON p.order_id = o.order_id
GROUP BY o.days_since_prior_order
ORDER BY days_since_prior_order ASC)

,days_since_prior_order,average_price
0,,6.002144
1,0.0,4.14498
2,1.0,3.907488
3,2.0,4.567422
4,3.0,5.240322
5,4.0,5.728883
6,5.0,6.092313
7,6.0,6.519899
8,7.0,6.810005
9,8.0,6.654097
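
The output contains one row with a missing `days_since_prior_order`, which presumably corresponds to orders with no prior order (an assumption; the notebook does not say). The sketch below separates that row from the rest and finds the lowest and highest averages among the ten rows displayed above. It covers only those displayed rows, not the full result.

```python
# Rows copied from the first ten lines of output above.
# None stands for the missing days_since_prior_order value.
rows = [
    (None, 6.002144),
    (0.0, 4.14498),
    (1.0, 3.907488),
    (2.0, 4.567422),
    (3.0, 5.240322),
    (4.0, 5.728883),
    (5.0, 6.092313),
    (6.0, 6.519899),
    (7.0, 6.810005),
    (8.0, 6.654097),
]

# Separate the row with no recorded gap from the rows that have one.
no_gap = [price for gap, price in rows if gap is None]
with_gap = [(gap, price) for gap, price in rows if gap is not None]

# Lowest and highest average order totals among the displayed gaps.
lowest = min(with_gap, key=lambda row: row[1])
highest = max(with_gap, key=lambda row: row[1])
print(lowest, highest)
```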


Creating view for query

In [7]:
%%bigquery
CREATE OR REPLACE VIEW reporting.v_Prior_Order_Totals AS
(SELECT o.days_since_prior_order as days_since_prior_order, AVG(p.Total) as average_price
FROM
(SELECT op.order_id, SUM(ap.avg_price) as Total
FROM `responsive-cab-267123.instacart_modeled.Order_Products` op
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Map_Beam_DF` m ON op.product_id = m.product_id
INNER JOIN `responsive-cab-267123.USDA_ERS_modeled.Food_Market_Beam_DF` ap ON m.food_id = ap.food_id
WHERE ap.year = 2017
GROUP BY op.order_id, ap.year) p
INNER JOIN `responsive-cab-267123.instacart_modeled.Orders` o ON p.order_id = o.order_id
GROUP BY o.days_since_prior_order
ORDER BY days_since_prior_order ASC)