# Case Study #6: Clique Bait
The case study questions presented here are created by [**Data With Danny**](https://linktr.ee/datawithdanny). They are part of the [**8 Week SQL Challenge**](https://8weeksqlchallenge.com/).

My SQL queries are written in the `PostgreSQL 15` dialect and run from a `Jupyter Notebook`, which lets us view each query's results instantly and document the queries alongside them.

For more details about the **Case Study #6**, click [**here**](https://8weeksqlchallenge.com/case-study-6/).

## Table of Contents

### [1. Importing Libraries](#Import)

### [2. Tables of the Database](#Tables)

### [3. Case Study Questions](#CaseStudyQuestions)

- [A. Enterprise Relationship Diagram](#A)
- [B. Digital Analysis](#B)
- [C. Product Funnel Analysis](#C)
- [D. Campaigns Analysis](#D)

___
<a id = 'Import'></a>
## 1. Importing Libraries

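```python
import psycopg2 as pg2
import pandas as pd
import os
import warnings

# Suppress warnings (e.g. pandas' UserWarning when read_sql() receives a non-SQLAlchemy connection)
warnings.filterwarnings("ignore")
```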

### Connecting to the PostgreSQL database from Jupyter Notebook

```python
# Get my PostgreSQL password from an environment variable
mypassword = os.getenv("POSTGRESQL_PASSWORD")

# Connect to the clique_bait database
conn = pg2.connect(user = 'postgres', password = mypassword, database = 'clique_bait')
cursor = conn.cursor()
```

___
<a id = 'Tables'></a>
## 2. Tables of the Database

First, let's verify that the `clique_bait` schema contains the 5 datasets.

```python
cursor.execute("""
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema = 'clique_bait'
""")

table_names = []
print('--- Tables within the "clique_bait" schema --- ')
for table in cursor:
    print(table[1])
    table_names.append(table[1])
```

```
--- Tables within the "clique_bait" schema --- 
event_identifier
campaign_identifier
page_hierarchy
users
events
```


#### Here are the 5 datasets of the "clique_bait" database. For more details about each dataset, please click [here](https://8weeksqlchallenge.com/case-study-6/).

```python
for table in table_names:
    print("\nTable: ", table)
    display(pd.read_sql("SELECT * FROM clique_bait." + table, conn))
```


Table:  event_identifier


|   | event_type | event_name |
|---|---|---|
| 0 | 1 | Page View |
| 1 | 2 | Add to Cart |
| 2 | 3 | Purchase |
| 3 | 4 | Ad Impression |
| 4 | 5 | Ad Click |



Table:  campaign_identifier


|   | campaign_id | products | campaign_name | start_date | end_date |
|---|---|---|---|---|---|
| 0 | 1 | 1-3 | BOGOF - Fishing For Compliments | 2020-01-01 | 2020-01-14 |
| 1 | 2 | 4-5 | 25% Off - Living The Lux Life | 2020-01-15 | 2020-01-28 |
| 2 | 3 | 6-8 | Half Off - Treat Your Shellf(ish) | 2020-02-01 | 2020-03-31 |



Table:  page_hierarchy


|   | page_id | page_name | product_category | product_id |
|---|---|---|---|---|
| 0 | 1 | Home Page |  |  |
| 1 | 2 | All Products |  |  |
| 2 | 3 | Salmon | Fish | 1.0 |
| 3 | 4 | Kingfish | Fish | 2.0 |
| 4 | 5 | Tuna | Fish | 3.0 |
| 5 | 6 | Russian Caviar | Luxury | 4.0 |
| 6 | 7 | Black Truffle | Luxury | 5.0 |
| 7 | 8 | Abalone | Shellfish | 6.0 |
| 8 | 9 | Lobster | Shellfish | 7.0 |
| 9 | 10 | Crab | Shellfish | 8.0 |



Table:  users


|   | user_id | cookie_id | start_date |
|---|---|---|---|
| 0 | 1 | c4ca42 | 2020-02-04 |
| 1 | 2 | c81e72 | 2020-01-18 |
| 2 | 3 | eccbc8 | 2020-02-21 |
| 3 | 4 | a87ff6 | 2020-02-22 |
| 4 | 5 | e4da3b | 2020-02-01 |
| ... | ... | ... | ... |
| 1777 | 25 | 46dd2f | 2020-03-29 |
| 1778 | 94 | 59511b | 2020-03-22 |
| 1779 | 49 | d345a8 | 2020-02-23 |
| 1780 | 211 | a26e03 | 2020-02-20 |



Table:  events


|   | visit_id | cookie_id | page_id | event_type | sequence_number | event_time |
|---|---|---|---|---|---|---|
| 0 | ccf365 | c4ca42 | 1 | 1 | 1 | 2020-02-04 19:16:09.182546 |
| 1 | ccf365 | c4ca42 | 2 | 1 | 2 | 2020-02-04 19:16:17.358191 |
| 2 | ccf365 | c4ca42 | 6 | 1 | 3 | 2020-02-04 19:16:58.454669 |
| 3 | ccf365 | c4ca42 | 9 | 1 | 4 | 2020-02-04 19:16:58.609142 |
| 4 | ccf365 | c4ca42 | 9 | 2 | 5 | 2020-02-04 19:17:51.729420 |
| ... | ... | ... | ... | ... | ... | ... |
| 32729 | 355a6a | 87a4ba | 10 | 1 | 15 | 2020-03-18 22:44:16.541396 |
| 32730 | 355a6a | 87a4ba | 11 | 1 | 16 | 2020-03-18 22:44:18.900830 |
| 32731 | 355a6a | 87a4ba | 11 | 2 | 17 | 2020-03-18 22:45:12.670472 |
| 32732 | 355a6a | 87a4ba | 12 | 1 | 18 | 2020-03-18 22:45:54.081818 |


___
<a id = 'CaseStudyQuestions'></a>
## 3. Case Study Questions

<a id = 'A'></a>
## A. Enterprise Relationship Diagram
Use the following DDL schema details to create an ERD for all the Clique Bait datasets.</br>
[Click here](https://dbdiagram.io/) to access the DB Diagram tool to create the ERD.

**ANSWER**</br>
Here is the DBML code I used to create the **ERD** for all the Clique Bait datasets with the [DB Diagram tool](https://dbdiagram.io/).

```
TABLE event_identifier {
  "event_type" INTEGER
  "event_name" VARCHAR(13)
}

TABLE campaign_identifier {
  "campaign_id" INTEGER
  "products" VARCHAR(3)
  "campaign_name" VARCHAR(33)
  "start_date" TIMESTAMP
  "end_date" TIMESTAMP
}

TABLE page_hierarchy {
  "page_id" INTEGER
  "page_name" VARCHAR(14)
  "product_category" VARCHAR(9)
  "product_id" INTEGER
}

TABLE users {
  "user_id" INTEGER
  "cookie_id" VARCHAR(6)
  "start_date" TIMESTAMP
}

TABLE events {
  "visit_id" VARCHAR(6)
  "cookie_id" VARCHAR(6)
  "page_id" INTEGER
  "event_type" INTEGER
  "sequence_number" INTEGER
  "event_time" TIMESTAMP
}

// Establish connections or references between datasets
Ref: "events"."event_type" > "event_identifier"."event_type"
Ref: "events"."page_id" > "page_hierarchy"."page_id"
Ref: "events"."cookie_id" > "users"."cookie_id"
```

**Result**
![CliqueBait_ERD](images/CliqueBait_ERD.png)

___
<a id = 'B'></a>
## B. Digital Analysis

#### 1. How many users are there?

```python
pd.read_sql("""
SELECT COUNT(DISTINCT user_id) AS nb_users
FROM clique_bait.users;
""", conn)
```

|   | nb_users |
|---|---|
| 0 | 500 |


___
#### 2. How many cookies does each user have on average?

```python
pd.read_sql("""
SELECT ROUND(AVG(nb_cookie_ids))::INTEGER AS avg_cookies_per_user
FROM
(
    -- Count the cookies of each user (GROUP BY already returns one row per user)
    SELECT user_id, COUNT(cookie_id) AS nb_cookie_ids
    FROM clique_bait.users
    GROUP BY user_id
) nb_cookies_per_user;
""", conn)
```

|   | avg_cookies_per_user |
|---|---|
| 0 | 4 |


___
#### 3. What is the unique number of visits by all users per month?

```python
pd.read_sql("""
-- Group visits by the month of the event itself, not by the user's start_date
SELECT 
    DATE_PART('month', event_time)::INTEGER AS month, 
    TO_CHAR(event_time, 'FMMonth') AS month_name,
    COUNT(DISTINCT visit_id) AS nb_visits
FROM clique_bait.events
GROUP BY month, month_name
ORDER BY month
""", conn)
```

|   | month | month_name | nb_visits |
|---|---|---|---|
| 0 | 1 | January | 876 |
| 1 | 2 | February | 1488 |
| 2 | 3 | March | 916 |
| 3 | 4 | April | 248 |
| 4 | 5 | May | 36 |


___
#### 4. What is the number of events for each event type?

```python
pd.read_sql("""
SELECT 
    ei.event_name, 
    COUNT(*) AS nb_events
FROM clique_bait.events e
JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
GROUP BY ei.event_name
ORDER BY nb_events DESC
""", conn)
```

|   | event_name | nb_events |
|---|---|---|
| 0 | Page View | 20928 |
| 1 | Add to Cart | 8451 |
| 2 | Purchase | 1777 |
| 3 | Ad Impression | 876 |
| 4 | Ad Click | 702 |


___
#### 5. What is the percentage of visits which have a purchase event?

```python
pd.read_sql("""
SELECT 
    COUNT(*) AS nb_purchase_event,
    (SELECT COUNT(DISTINCT visit_id) FROM clique_bait.events) AS total_nb_visits,
    CONCAT(ROUND(COUNT(*)/(SELECT COUNT(DISTINCT visit_id) FROM clique_bait.events)::NUMERIC * 100, 1), ' %') AS purchase_percent
FROM clique_bait.events e
JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
WHERE ei.event_name = 'Purchase'
""", conn)
```

|   | nb_purchase_event | total_nb_visits | purchase_percent |
|---|---|---|---|
| 0 | 1777 | 3564 | 49.9 % |
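As a quick check on the rounding, the percentage can be recomputed from the two counts in the output above. This minimal sketch only reuses those numbers; like `COUNT(*)` in the query, it assumes each purchasing visit records exactly one Purchase event.

```python
# Counts copied from the query output above
nb_purchase_events = 1777  # assumption: one Purchase event per purchasing visit
total_nb_visits = 3564

purchase_percent = round(nb_purchase_events / total_nb_visits * 100, 1)
print(f"{purchase_percent} %")  # 49.9 %
```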


___
#### 6. What is the percentage of visits which view the checkout page but do not have a purchase event?

```python
pd.read_sql("""
SELECT 
    (nb_checkouts - nb_purchases) AS nb_checkouts_without_purchase,
    (SELECT COUNT(DISTINCT visit_id) FROM clique_bait.events) AS total_visits,
    ROUND((nb_checkouts - nb_purchases)/(SELECT COUNT(DISTINCT visit_id) FROM clique_bait.events)::NUMERIC * 100,2) AS percent
FROM
(
    SELECT 
        SUM(CASE WHEN ph.page_name = 'Checkout' AND ei.event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_checkouts,
        SUM(CASE WHEN ei.event_name = 'Purchase' THEN 1 ELSE 0 END) AS nb_purchases
    FROM clique_bait.events e 
    JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
    JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
) c
""", conn)
```

|   | nb_checkouts_without_purchase | total_visits | percent |
|---|---|---|---|
| 0 | 326 | 3564 | 9.15 |
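The question can be read with two denominators: all visits (as in the query above) or only the visits that reached the checkout page. This sketch recomputes both from the counts reported in this section (2103 Checkout page views comes from question 7). It assumes each visit views the Checkout page at most once and that every purchase follows a checkout view, which is what the subtraction in the query implies.

```python
# Figures copied from the outputs in this section
checkout_views = 2103  # Checkout page views (question 7)
purchases = 1777       # Purchase events
total_visits = 3564

# Assumes every purchasing visit also viewed the Checkout page
checkouts_without_purchase = checkout_views - purchases  # 326

# Share of ALL visits (the denominator used in the query above)
pct_of_all_visits = round(checkouts_without_purchase / total_visits * 100, 2)

# Alternative reading: share of the visits that reached the Checkout page
pct_of_checkout_visits = round(checkouts_without_purchase / checkout_views * 100, 2)

print(pct_of_all_visits, pct_of_checkout_visits)  # 9.15 15.5
```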


___
#### 7. What are the top 3 pages by number of views?

```python
pd.read_sql("""
SELECT 
    ph.page_name, 
    COUNT(*) AS nb_views
FROM clique_bait.events e
JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
WHERE ei.event_name = 'Page View'
GROUP BY ph.page_name
ORDER BY nb_views DESC
LIMIT 3
""", conn)
```

|   | page_name | nb_views |
|---|---|---|
| 0 | All Products | 3174 |
| 1 | Checkout | 2103 |
| 2 | Home Page | 1782 |


___
#### 8. What is the number of views and cart adds for each product category?

```python
pd.read_sql("""
SELECT 
    ph.product_category, 
    SUM(CASE WHEN event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_views,
    SUM(CASE WHEN event_name = 'Add to Cart' THEN 1 ELSE 0 END) AS nb_cart_adds
FROM clique_bait.events e
JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
WHERE ph.product_category IS NOT NULL
GROUP BY ph.product_category
ORDER BY nb_views DESC
""", conn)
```

|   | product_category | nb_views | nb_cart_adds |
|---|---|---|---|
| 0 | Shellfish | 6204 | 3792 |
| 1 | Fish | 4633 | 2789 |
| 2 | Luxury | 3032 | 1870 |


___
#### 9. What are the top 3 products by purchases?

```python
pd.read_sql("""
WITH visitID_with_purchases_cte AS
(
    -- Retrieve the visit IDs that include a purchase
    SELECT e.visit_id
    FROM clique_bait.events e
    JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
    WHERE ei.event_name = 'Purchase'
)
SELECT 
    ph.page_name AS product, 
    COUNT(*) AS nb_purchases
FROM visitID_with_purchases_cte cte
JOIN clique_bait.events e ON cte.visit_id = e.visit_id
JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
WHERE ph.product_category IS NOT NULL 
AND ei.event_name = 'Add to Cart'
GROUP BY ph.page_name 
ORDER BY nb_purchases DESC
LIMIT 3
""", conn)
```

|   | product | nb_purchases |
|---|---|---|
| 0 | Lobster | 754 |
| 1 | Oyster | 726 |
| 2 | Crab | 719 |


Alternatively, we can use the window function `LAST_VALUE() OVER (PARTITION BY ... ORDER BY ...)` to extract the last event recorded during each visit on the Clique Bait website. This relies on the assumption that, whenever a visit contains a **Purchase** event, it is the visit's final action. Once `add_last_event_cte` has added the last event to every row, we can keep only the products that were actually purchased with the filter:</br> `WHERE event_name = 'Add to Cart' AND last_event = 'Purchase'`.</br>
The explicit frame `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` is required: the default frame stops at the current row, so `LAST_VALUE()` would return the current row's event instead of the visit's last one. Since this query has no `LIMIT`, it lists all 9 products.

```python
pd.read_sql("""
WITH add_last_event_cte AS
(
    SELECT         
        e.visit_id,
        e.sequence_number,
        ph.page_name, 
        ph.product_category,
        ei.event_name,
        LAST_VALUE(ei.event_name) OVER (PARTITION BY e.visit_id ORDER BY e.sequence_number ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_event
    FROM clique_bait.events e 
    JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
    JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
)
SELECT 
    page_name, 
    COUNT(event_name) AS nb_purchases
FROM add_last_event_cte
WHERE product_category IS NOT NULL AND event_name = 'Add to Cart' AND last_event = 'Purchase'
GROUP BY page_name
ORDER BY nb_purchases DESC
""", conn)
```

|   | page_name | nb_purchases |
|---|---|---|
| 0 | Lobster | 754 |
| 1 | Oyster | 726 |
| 2 | Crab | 719 |
| 3 | Salmon | 711 |
| 4 | Black Truffle | 707 |
| 5 | Kingfish | 707 |
| 6 | Abalone | 699 |
| 7 | Russian Caviar | 697 |
| 8 | Tuna | 697 |
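To make the window-function logic concrete, here is a plain-Python sketch of what `LAST_VALUE()` with the full frame computes: for each visit, the event with the highest `sequence_number`. The events, page names and the `PRODUCTS` set are hypothetical (not the Clique Bait data), and rows are deliberately shuffled to show why ordering by `sequence_number` matters.

```python
from collections import Counter

# Hypothetical events (NOT the Clique Bait data):
# (visit_id, sequence_number, page_name, event_name), deliberately out of order
events = [
    ("v1", 3, "Lobster", "Add to Cart"),
    ("v1", 1, "Home Page", "Page View"),
    ("v1", 5, "Checkout", "Purchase"),
    ("v1", 2, "Lobster", "Page View"),
    ("v1", 4, "Crab", "Add to Cart"),
    ("v2", 2, "Salmon", "Add to Cart"),
    ("v2", 1, "Salmon", "Page View"),
    ("v2", 3, "Checkout", "Page View"),
]
PRODUCTS = {"Lobster", "Crab", "Salmon"}  # stands in for product_category IS NOT NULL

# LAST_VALUE(event_name) OVER (PARTITION BY visit_id ORDER BY sequence_number
#   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING):
# after sorting, the last row seen for each visit overwrites the earlier ones
last_event = {}
for visit_id, _seq, _page, event_name in sorted(events, key=lambda row: (row[0], row[1])):
    last_event[visit_id] = event_name

# WHERE product_category IS NOT NULL AND event_name = 'Add to Cart' AND last_event = 'Purchase'
nb_purchases = Counter(
    page
    for visit_id, _seq, page, event_name in events
    if page in PRODUCTS and event_name == "Add to Cart" and last_event[visit_id] == "Purchase"
)
print(last_event)  # {'v1': 'Purchase', 'v2': 'Page View'}
print(nb_purchases)
```

Visit `v2` adds Salmon to the cart but ends on a page view, so its cart add is not counted as a purchase, exactly as the `last_event = 'Purchase'` filter does.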


___
<a id = 'C'></a>
## C. Product Funnel Analysis
Using a single SQL query - create a new output table which has the following details:

- How many times was each product viewed?
- How many times was each product added to cart?
- How many times was each product added to a cart but not purchased (abandoned)?
- How many times was each product purchased?

Additionally, create another table which further aggregates the data for the above points but this time **for each product category** instead of individual products.
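The four funnel counts can be derived from each visit's ordered events. The plain-Python sketch below illustrates the same rule the SQL solution relies on: a cart add counts as purchased when the visit's last event is a Purchase, and as abandoned otherwise. The visits, page names and the `PRODUCTS` set are hypothetical.

```python
from collections import defaultdict

# Hypothetical visits (NOT the Clique Bait data): visit_id -> (page_name, event_name)
# pairs, already ordered by sequence_number
visits = {
    "v1": [("Lobster", "Page View"), ("Lobster", "Add to Cart"), ("Crab", "Page View"),
           ("Crab", "Add to Cart"), ("Checkout", "Page View"), ("Checkout", "Purchase")],
    "v2": [("Lobster", "Page View"), ("Lobster", "Add to Cart"), ("Checkout", "Page View")],
    "v3": [("Crab", "Page View")],
}
PRODUCTS = {"Lobster", "Crab"}  # stands in for product_category IS NOT NULL


def empty_row():
    return {"nb_views": 0, "nb_cart_adds": 0, "nb_abandoned": 0, "nb_purchases": 0}


funnel = defaultdict(empty_row)
for events in visits.values():
    purchased = events[-1][1] == "Purchase"  # the visit's last event decides the outcome
    for page, event in events:
        if page not in PRODUCTS:
            continue
        if event == "Page View":
            funnel[page]["nb_views"] += 1
        elif event == "Add to Cart":
            funnel[page]["nb_cart_adds"] += 1
            # every cart add ends up either purchased or abandoned
            funnel[page]["nb_purchases" if purchased else "nb_abandoned"] += 1

for product, row in funnel.items():
    print(product, row)
```

A useful consequence of this rule is that `nb_cart_adds = nb_abandoned + nb_purchases` for every product, which the tables below can be checked against.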

### Table 1: Aggregate the data by products

```python
# Create table
cursor.execute("DROP TABLE IF EXISTS clique_bait.products;")
cursor.execute("""
CREATE TABLE clique_bait.products
(
    "product" VARCHAR(255),
    "nb_views" INTEGER,
    "nb_cart_adds" INTEGER,
    "nb_abandoned" INTEGER,
    "nb_purchases" INTEGER
);
""")


# Populate the table
cursor.execute("""
INSERT INTO clique_bait.products
WITH add_last_event_cte AS
(
    SELECT 
        e.visit_id,
        e.sequence_number,
        ph.page_name, 
        ph.product_category,
        ei.event_name,
        LAST_VALUE(ei.event_name) OVER (PARTITION BY e.visit_id ORDER BY e.sequence_number ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_event 
    FROM clique_bait.events e 
    JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
    JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
)
SELECT 
    page_name AS product, 
    SUM(CASE WHEN event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_views,
    SUM(CASE WHEN event_name = 'Add to Cart' THEN 1 ELSE 0 END) AS nb_cart_adds,
    SUM(CASE WHEN event_name = 'Add to Cart' AND last_event != 'Purchase' THEN 1 ELSE 0 END) AS nb_abandoned,
    SUM(CASE WHEN event_name = 'Add to Cart' AND last_event = 'Purchase' THEN 1 ELSE 0 END) AS nb_purchases
FROM add_last_event_cte
WHERE product_category IS NOT NULL 
GROUP BY page_name
""")


# Saving
conn.commit()
```

**Result**

```python
pd.read_sql("""SELECT * FROM clique_bait.products""", conn)
```

|   | product | nb_views | nb_cart_adds | nb_abandoned | nb_purchases |
|---|---|---|---|---|---|
| 0 | Abalone | 1525 | 932 | 233 | 699 |
| 1 | Oyster | 1568 | 943 | 217 | 726 |
| 2 | Salmon | 1559 | 938 | 227 | 711 |
| 3 | Crab | 1564 | 949 | 230 | 719 |
| 4 | Tuna | 1515 | 931 | 234 | 697 |
| 5 | Lobster | 1547 | 968 | 214 | 754 |
| 6 | Kingfish | 1559 | 920 | 213 | 707 |
| 7 | Russian Caviar | 1563 | 946 | 249 | 697 |
| 8 | Black Truffle | 1469 | 924 | 217 | 707 |


### Table 2: Aggregate the data by product categories

```python
# Create the table
cursor.execute("DROP TABLE IF EXISTS clique_bait.product_category;")
cursor.execute("""
CREATE TABLE clique_bait.product_category
(
    "product_category" VARCHAR(255),
    "nb_views" INTEGER,
    "nb_cart_adds" INTEGER,
    "nb_abandoned" INTEGER,
    "nb_purchases" INTEGER
);
""")


# Populate the table
cursor.execute("""
INSERT INTO clique_bait.product_category
WITH add_last_event_cte AS
(
    SELECT 
        e.visit_id,
        e.sequence_number, 
        ph.product_category,
        ei.event_name,
        LAST_VALUE(ei.event_name) OVER (PARTITION BY e.visit_id ORDER BY e.sequence_number ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_event 
    FROM clique_bait.events e 
    JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
    JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
)
SELECT 
    product_category, 
    SUM(CASE WHEN event_name = 'Page View' THEN 1 ELSE 0 END) AS nb_views,
    SUM(CASE WHEN event_name = 'Add to Cart' THEN 1 ELSE 0 END) AS nb_cart_adds,
    SUM(CASE WHEN event_name = 'Add to Cart' AND last_event != 'Purchase' THEN 1 ELSE 0 END) AS nb_abandoned,
    SUM(CASE WHEN event_name = 'Add to Cart' AND last_event = 'Purchase' THEN 1 ELSE 0 END) AS nb_purchases
FROM add_last_event_cte
WHERE product_category IS NOT NULL 
GROUP BY product_category
""")

# Saving
conn.commit()
```

**Result**

```python
pd.read_sql("SELECT * FROM clique_bait.product_category", conn)
```

|   | product_category | nb_views | nb_cart_adds | nb_abandoned | nb_purchases |
|---|---|---|---|---|---|
| 0 | Luxury | 3032 | 1870 | 466 | 1404 |
| 1 | Shellfish | 6204 | 3792 | 894 | 2898 |
| 2 | Fish | 4633 | 2789 | 674 | 2115 |
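Because every metric is a plain count, Table 2 should equal Table 1 summed by category. This sketch rolls up the product rows shown above. The category of Oyster does not appear in the `page_hierarchy` excerpt earlier, so it is assumed to be Shellfish here; that is the assignment under which the Shellfish totals match.

```python
# Rows of Table 1 (clique_bait.products) as shown above:
# product -> (nb_views, nb_cart_adds, nb_abandoned, nb_purchases)
products = {
    "Abalone": (1525, 932, 233, 699),
    "Oyster": (1568, 943, 217, 726),
    "Salmon": (1559, 938, 227, 711),
    "Crab": (1564, 949, 230, 719),
    "Tuna": (1515, 931, 234, 697),
    "Lobster": (1547, 968, 214, 754),
    "Kingfish": (1559, 920, 213, 707),
    "Russian Caviar": (1563, 946, 249, 697),
    "Black Truffle": (1469, 924, 217, 707),
}

# Category mapping from page_hierarchy; Oyster's category is an assumption (Shellfish)
category_of = {
    "Salmon": "Fish", "Kingfish": "Fish", "Tuna": "Fish",
    "Russian Caviar": "Luxury", "Black Truffle": "Luxury",
    "Abalone": "Shellfish", "Lobster": "Shellfish", "Crab": "Shellfish", "Oyster": "Shellfish",
}

# Sum the four metrics column by column within each category
categories = {}
for product, metrics in products.items():
    category = category_of[product]
    current = categories.get(category, (0, 0, 0, 0))
    categories[category] = tuple(a + b for a, b in zip(current, metrics))

print(categories)
```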


___
### Use your 2 new output tables - answer the following questions:

#### 1. Which product had the most views, cart adds and purchases?

```python
pd.read_sql("""
SELECT 
    product, 
    nb_views, 
    nb_cart_adds, 
    nb_purchases
FROM clique_bait.products
ORDER BY nb_views DESC, nb_cart_adds DESC, nb_purchases DESC
LIMIT 1;
""", conn)
```

|   | product | nb_views | nb_cart_adds | nb_purchases |
|---|---|---|---|---|
| 0 | Oyster | 1568 | 943 | 726 |


The output above shows that **Oyster** is the most viewed product. Because the query sorts by `nb_views` first, `nb_cart_adds` and `nb_purchases` only act as tie-breakers, so this result says nothing about which product leads those two metrics.</br>
Breaking the data down by metric instead reveals that:

- **Oyster** is the most viewed product.
- **Lobster** is the product most often added to cart and the most purchased product.

```python
pd.read_sql("""
WITH max_cte AS 
(
    -- OVER () compares each row with the maximum over the whole table
    SELECT
     *,
     CASE WHEN MAX(nb_views) OVER () = nb_views THEN product ELSE '' END AS most_viewed,
     CASE WHEN MAX(nb_cart_adds) OVER () = nb_cart_adds THEN product ELSE '' END AS most_cart_added,
     CASE WHEN MAX(nb_purchases) OVER () = nb_purchases THEN product ELSE '' END AS most_purchased
    FROM clique_bait.products
)
SELECT most_viewed, most_cart_added, most_purchased
FROM max_cte
WHERE most_viewed != '' OR most_cart_added != '' OR most_purchased != ''
""", conn)
```

|   | most_viewed | most_cart_added | most_purchased |
|---|---|---|---|
| 0 | Oyster |  |  |
| 1 |  | Lobster | Lobster |
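The same breakdown can be reproduced outside SQL by ranking each metric on its own rather than through one multi-column `ORDER BY`. This sketch copies the relevant columns of Table 1 above and takes an independent `max()` per metric.

```python
# (nb_views, nb_cart_adds, nb_purchases) per product, copied from Table 1 above
products = {
    "Abalone": (1525, 932, 699),
    "Oyster": (1568, 943, 726),
    "Salmon": (1559, 938, 711),
    "Crab": (1564, 949, 719),
    "Tuna": (1515, 931, 697),
    "Lobster": (1547, 968, 754),
    "Kingfish": (1559, 920, 707),
    "Russian Caviar": (1563, 946, 697),
    "Black Truffle": (1469, 924, 707),
}

# Rank each metric independently; max() is evaluated immediately for each column index i
leaders = {
    metric: max(products, key=lambda product: products[product][i])
    for i, metric in enumerate(("nb_views", "nb_cart_adds", "nb_purchases"))
}
print(leaders)  # {'nb_views': 'Oyster', 'nb_cart_adds': 'Lobster', 'nb_purchases': 'Lobster'}
```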


___
#### 2. Which product was most likely to be abandoned?

```python
pd.read_sql("""
SELECT product, nb_abandoned
FROM clique_bait.products
ORDER BY nb_abandoned DESC
LIMIT 1;
""", conn)
```

|   | product | nb_abandoned |
|---|---|---|
| 0 | Russian Caviar | 249 |
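"Most likely" can also be read as a rate rather than a raw count: abandoned cart adds divided by all cart adds of the product. This sketch computes that rate from the Table 1 figures above to check whether the answer changes.

```python
# (nb_cart_adds, nb_abandoned) per product, copied from Table 1 above
cart = {
    "Abalone": (932, 233),
    "Oyster": (943, 217),
    "Salmon": (938, 227),
    "Crab": (949, 230),
    "Tuna": (931, 234),
    "Lobster": (968, 214),
    "Kingfish": (920, 213),
    "Russian Caviar": (946, 249),
    "Black Truffle": (924, 217),
}

# Abandonment rate = abandoned cart adds / all cart adds of the product
abandon_rate = {product: abandoned / adds for product, (adds, abandoned) in cart.items()}
most_abandoned = max(abandon_rate, key=abandon_rate.get)
print(most_abandoned, round(abandon_rate[most_abandoned] * 100, 2))  # Russian Caviar 26.32
```

With these figures, Russian Caviar leads under both readings, so the answer does not depend on the interpretation.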


___
#### 3. Which product had the highest view to purchase percentage?

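```python
pd.read_sql("""
SELECT 
    product, 
    nb_views, 
    nb_purchases, 
    CONCAT(ROUND(nb_purchases/nb_views::NUMERIC * 100,2), ' %') AS percent
FROM clique_bait.products
-- Sort on the numeric ratio: percent is TEXT, so sorting on it would be alphabetical
ORDER BY nb_purchases/nb_views::NUMERIC DESC
LIMIT 1;
""", conn)
```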

|   | product | nb_views | nb_purchases | percent |
|---|---|---|---|---|
| 0 | Lobster | 1547 | 754 | 48.74 % |
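Sorting on the formatted `percent` column would compare strings, not numbers. The sketch below shows the difference with Python's string ordering on hypothetical labels. Digit ordering in typical PostgreSQL text collations behaves the same way for these values, but that is an assumption about the collation in use.

```python
# Hypothetical percentages formatted like CONCAT(..., ' %') output
labels = ["48.74 %", "9.50 %", "100.00 %"]

# Text ordering compares character by character: '9' > '4' > '1'
as_text = sorted(labels, reverse=True)

# Numeric ordering strips the suffix before comparing
as_number = sorted(labels, key=lambda s: float(s.rstrip(" %")), reverse=True)

print(as_text)    # ['9.50 %', '48.74 %', '100.00 %']
print(as_number)  # ['100.00 %', '48.74 %', '9.50 %']
```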


___
#### 4. What is the average conversion rate from view to cart add?

```python
pd.read_sql("""
SELECT 
    CONCAT(ROUND(AVG(nb_cart_adds/nb_views::NUMERIC * 100),2), ' %') AS conversion_rate
FROM clique_bait.products
""", conn)
```

|   | conversion_rate |
|---|---|
| 0 | 60.95 % |


___
#### 5. What is the average conversion rate from cart add to purchase?

```python
pd.read_sql("""
SELECT 
    CONCAT(ROUND(AVG(nb_purchases/nb_cart_adds::NUMERIC * 100),2), ' %') AS conversion_rate
FROM clique_bait.products
""", conn)
```

|   | conversion_rate |
|---|---|
| 0 | 75.93 % |
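Both queries take `AVG()` of per-product ratios, i.e. an unweighted mean in which every product counts equally. A pooled rate (total conversions over total opportunities) weights products by traffic instead. This sketch computes both from the Table 1 figures above; here the two approaches land very close to each other because product volumes are similar.

```python
# (nb_views, nb_cart_adds, nb_purchases) per product, copied from Table 1 above
products = {
    "Abalone": (1525, 932, 699),
    "Oyster": (1568, 943, 726),
    "Salmon": (1559, 938, 711),
    "Crab": (1564, 949, 719),
    "Tuna": (1515, 931, 697),
    "Lobster": (1547, 968, 754),
    "Kingfish": (1559, 920, 707),
    "Russian Caviar": (1563, 946, 697),
    "Black Truffle": (1469, 924, 707),
}

# Unweighted mean of per-product ratios, as AVG() does in the queries above
view_to_cart = sum(c / v for v, c, _ in products.values()) / len(products)
cart_to_purchase = sum(p / c for _, c, p in products.values()) / len(products)

# Pooled alternative: totals across all products
total_views = sum(v for v, _, _ in products.values())
total_carts = sum(c for _, c, _ in products.values())
total_purchases = sum(p for _, _, p in products.values())
pooled_view_to_cart = total_carts / total_views
pooled_cart_to_purchase = total_purchases / total_carts

print(round(view_to_cart * 100, 2), round(cart_to_purchase * 100, 2))                # 60.95 75.93
print(round(pooled_view_to_cart * 100, 2), round(pooled_cart_to_purchase * 100, 2))  # 60.93 75.93
```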


___
<a id = 'D'></a>
## D. Campaigns Analysis
Generate a table that has 1 single row for every unique `visit_id` record and has the following columns:

- `user_id`
- `visit_id`
- `visit_start_time`: the earliest event_time for each visit
- `page_views`: count of page views for each visit
- `cart_adds`: count of product cart add events for each visit
- `purchase`: 1/0 flag if a purchase event exists for each visit
- `campaign_name`: map the visit to a campaign if the visit_start_time falls between the start_date and end_date
- `impression`: count of ad impressions for each visit
- `click`: count of ad clicks for each visit
- `cart_products`**(Optional column)**: a comma separated text value with products added to the cart sorted by the order they were added to the cart (hint: use the `sequence_number`)

Use the subsequent dataset to generate at least 5 insights for the Clique Bait team - bonus: prepare a single A4 infographic that the team can use for their management reporting sessions, be sure to emphasise the most important points from your findings.

Some ideas you might want to investigate further include:

- Identifying users who have received impressions during each campaign period and comparing each metric with other users who did not have an impression event
- Does clicking on an impression lead to higher purchase rates?
- What is the uplift in purchase rate when comparing users who click on a campaign impression versus users who do not receive an impression? What if we compare them with users who only see an impression but do not click?
- What metrics can you use to quantify the success or failure of each campaign compared to each other?

```python
# Create table
cursor.execute("DROP TABLE IF EXISTS clique_bait.campaign_analysis;")
cursor.execute("""
CREATE TABLE clique_bait.campaign_analysis
(
    "user_id" INTEGER,
    "visit_id" VARCHAR(6),
    "visit_start_time" TIMESTAMP,
    "page_views" INTEGER,
    "cart_adds" INTEGER,
    "purchases" INTEGER,
    "impressions" INTEGER,
    "clicks" INTEGER,
    "campaign_name" VARCHAR(33),
    "cart_products" VARCHAR(255)
);
""")


# Populate the table
cursor.execute("""
INSERT INTO clique_bait.campaign_analysis
WITH cart_products_cte AS
(
    -- Comma-separated products added to the cart in each visit, in sequence order
    SELECT    
        u.user_id, 
        e.visit_id,
        STRING_AGG(ph.page_name, ', ' ORDER BY sequence_number) AS cart_products
    FROM clique_bait.users u
    JOIN clique_bait.events e ON u.cookie_id = e.cookie_id
    JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
    JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
    WHERE ei.event_name = 'Add to Cart'
    GROUP BY u.user_id, e.visit_id
)

SELECT 
    u.user_id, 
    e.visit_id, 
    MIN(e.event_time) AS visit_start_time,
    SUM(CASE WHEN ei.event_name = 'Page View' THEN 1 ELSE 0 END) AS page_views,
    SUM(CASE WHEN ei.event_name = 'Add to Cart' THEN 1 ELSE 0 END) AS cart_adds,
    SUM(CASE WHEN ei.event_name = 'Purchase' THEN 1 ELSE 0 END) AS purchases,
    SUM(CASE WHEN ei.event_name = 'Ad Impression' THEN 1 ELSE 0 END) AS impressions,
    SUM(CASE WHEN ei.event_name = 'Ad Click' THEN 1 ELSE 0 END) AS clicks,
    CASE WHEN MIN(e.event_time) BETWEEN ci.start_date AND ci.end_date THEN ci.campaign_name ELSE '' END AS campaign_name,
    CASE WHEN cp.cart_products IS NULL THEN '' ELSE cp.cart_products END AS cart_products
FROM clique_bait.users u
JOIN clique_bait.events e ON u.cookie_id = e.cookie_id
JOIN clique_bait.page_hierarchy ph ON e.page_id = ph.page_id
JOIN clique_bait.event_identifier ei ON e.event_type = ei.event_type
-- INNER JOIN: events outside every campaign period are excluded from this table
JOIN clique_bait.campaign_identifier ci ON e.event_time BETWEEN ci.start_date AND ci.end_date
LEFT JOIN cart_products_cte cp ON cp.user_id = u.user_id AND cp.visit_id = e.visit_id
GROUP BY u.user_id, e.visit_id, ci.start_date, ci.end_date, ci.campaign_name, cp.cart_products
ORDER BY u.user_id, e.visit_id;
""")


# Saving the updates
conn.commit()
```

**Result**</br>
Here is the table filtered to the first 5 users (`user_id` 1 to 5); only the first rows of the output are shown.

```python
pd.read_sql("""
SELECT * 
FROM clique_bait.campaign_analysis
WHERE user_id < 6
""", conn)
```

|   | user_id | visit_id | visit_start_time | page_views | cart_adds | purchases | impressions | clicks | campaign_name | cart_products |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 02a5d5 | 2020-02-26 16:57:26.260871 | 4 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |  |
| 1 | 1 | 0826dc | 2020-02-26 05:58:37.918618 | 1 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |  |
| 2 | 1 | 0fc437 | 2020-02-04 17:49:49.602976 | 10 | 6 | 1 | 1 | 1 | Half Off - Treat Your Shellf(ish) | Tuna, Russian Caviar, Black Truffle, Abalone, ... |
| 3 | 1 | 30b94d | 2020-03-15 13:12:54.023936 | 9 | 7 | 1 | 1 | 1 | Half Off - Treat Your Shellf(ish) | Salmon, Kingfish, Tuna, Russian Caviar, Abalon... |
| 4 | 1 | 41355d | 2020-03-25 00:11:17.860655 | 6 | 1 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Lobster |
| 5 | 1 | ccf365 | 2020-02-04 19:16:09.182546 | 7 | 3 | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Lobster, Crab, Oyster |
| 6 | 1 | eaffde | 2020-03-25 20:06:32.342989 | 10 | 8 | 1 | 1 | 1 | Half Off - Treat Your Shellf(ish) | Salmon, Tuna, Russian Caviar, Black Truffle, A... |
| 7 | 1 | f7c798 | 2020-03-15 02:23:26.312543 | 9 | 3 | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Russian Caviar, Crab, Oyster |
| 8 | 2 | 0635fb | 2020-02-16 06:42:42.735730 | 9 | 4 | 1 | 0 | 0 | Half Off - Treat Your Shellf(ish) | Salmon, Kingfish, Abalone, Crab |
| 9 | 2 | 1f1198 | 2020-02-01 21:51:55.078775 | 1 | 0 | 0 | 0 | 0 | Half Off - Treat Your Shellf(ish) |  |
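The task asks to map each visit to a campaign when its `visit_start_time` falls between `start_date` and `end_date`. Here is a minimal Python sketch of that lookup using the three periods from `campaign_identifier`. Two choices in it are assumptions, not part of the SQL above: it compares calendar days, so a visit late on an `end_date` still matches (a `TIMESTAMP` `BETWEEN` would stop at midnight), and it returns `None` for visits outside every period instead of dropping them.

```python
from datetime import date, datetime
from typing import Optional

# Campaign periods copied from clique_bait.campaign_identifier above
CAMPAIGNS = [
    ("BOGOF - Fishing For Compliments", date(2020, 1, 1), date(2020, 1, 14)),
    ("25% Off - Living The Lux Life", date(2020, 1, 15), date(2020, 1, 28)),
    ("Half Off - Treat Your Shellf(ish)", date(2020, 2, 1), date(2020, 3, 31)),
]


def campaign_for(visit_start_time: datetime) -> Optional[str]:
    """Return the campaign whose date range contains the visit's start day, else None."""
    day = visit_start_time.date()  # assumption: end_date is inclusive for the whole day
    for name, start, end in CAMPAIGNS:
        if start <= day <= end:
            return name
    return None  # the visit is kept, just without a campaign


print(campaign_for(datetime(2020, 2, 4, 17, 49, 49)))  # Half Off - Treat Your Shellf(ish)
print(campaign_for(datetime(2020, 1, 30, 12, 0)))      # None (gap between campaigns)
```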


### Further investigation
### Does clicking on an impression lead to higher purchase rates?


Among visits that received an impression, 71.89% clicked on the advertisement and made a purchase, while 13.12% made a purchase without clicking. Both figures are divided by the total number of impression visits, so they are shares of that population rather than purchase rates within the clicked and non-clicked groups. Since 702 of the 876 ad impressions led to a click (see question B.4), most impression visits fall into the clicked group, which by itself inflates its share.

```python
pd.read_sql("""
SELECT 
    ROUND(SUM(CASE WHEN clicks = 1 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 2) AS clicked_and_purchased_pct,
    ROUND(SUM(CASE WHEN clicks = 0 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 2) AS not_clicked_and_purchased_pct
FROM clique_bait.campaign_analysis
WHERE impressions = 1 
""", conn)
```

|   | clicked_and_purchased_pct | not_clicked_and_purchased_pct |
|---|---|---|
| 0 | 71.89 | 13.12 |
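The difference between a share of all impression visits and a purchase rate within each group is easiest to see on small numbers. The sketch below uses made-up visits (not the Clique Bait data) and computes both.

```python
# Hypothetical visits that all received an ad impression (NOT the Clique Bait data):
# each tuple is (clicked, purchased) as 1/0 flags
impression_visits = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 1)] * 3 + [(0, 0)] * 7

clicked = [v for v in impression_visits if v[0] == 1]
not_clicked = [v for v in impression_visits if v[0] == 0]


def nb_purchases(visits):
    """Number of visits in the list that ended with a purchase."""
    return sum(purchased for _, purchased in visits)


# Shares of ALL impression visits: same denominator as COUNT(visit_id) in the query
share_clicked = nb_purchases(clicked) / len(impression_visits)          # 8 / 20
share_not_clicked = nb_purchases(not_clicked) / len(impression_visits)  # 3 / 20

# Purchase rates within each group: each group divided by its own size
rate_clicked = nb_purchases(clicked) / len(clicked)              # 8 / 10
rate_not_clicked = nb_purchases(not_clicked) / len(not_clicked)  # 3 / 10

print(share_clicked, share_not_clicked)  # 0.4 0.15
print(rate_clicked, rate_not_clicked)    # 0.8 0.3
```

In SQL, the within-group rate would divide each `SUM` by a count of the visits in that group, for example `NULLIF(SUM(CASE WHEN clicks = 1 THEN 1 ELSE 0 END), 0)`, instead of by `COUNT(visit_id)`.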


___
### What is the uplift in purchase rate when comparing users who click on a campaign impression versus users who do not receive an impression? What if we compare them with users who only see an impression but do not click?


Among all visits in the `campaign_analysis` table (which only keeps events that fall within a campaign period):
- 28.6% of visits received no impression and ended in a purchase.
- 17.6% of visits clicked on a campaign impression and ended in a purchase.
- 3.2% of visits received an impression without clicking and ended in a purchase.

Together these shares add up to 49.4% of visits. Because each one is divided by the total number of visits, they mostly reflect the size of each group and are not purchase rates within each group, so they cannot measure an uplift on their own; that requires dividing each group's purchases by the group's own number of visits.

```python
pd.read_sql("""
SELECT 
    ROUND(SUM(CASE WHEN impressions = 1 AND clicks = 1 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 1) AS clicked_and_purchased_pct,
    ROUND(SUM(CASE WHEN impressions = 1 AND clicks = 0 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 1) AS not_clicked_and_purchased_pct,
    ROUND(SUM(CASE WHEN impressions = 0 AND clicks = 0 THEN purchases ELSE 0 END)/COUNT(visit_id)::NUMERIC * 100, 1) AS no_impression_and_purchased_pct
FROM clique_bait.campaign_analysis
""", conn)
```

|   | clicked_and_purchased_pct | not_clicked_and_purchased_pct | no_impression_and_purchased_pct |
|---|---|---|---|
| 0 | 17.6 | 3.2 | 28.6 |
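Uplift compares purchase rates computed within each group. The sketch below uses hypothetical group sizes (not the Clique Bait figures) to show how shares of all visits and within-group rates can rank the groups in opposite orders, and how the relative uplift is then derived from the rates.

```python
# Hypothetical counts per group (NOT the Clique Bait data)
groups = {
    "clicked": {"visits": 200, "purchases": 160},
    "impression_no_click": {"visits": 50, "purchases": 20},
    "no_impression": {"visits": 750, "purchases": 300},
}
total_visits = sum(g["visits"] for g in groups.values())

# Share of ALL visits: the denominator used in the query above
shares = {name: g["purchases"] / total_visits for name, g in groups.items()}

# Purchase rate within each group
rates = {name: g["purchases"] / g["visits"] for name, g in groups.items()}


def uplift(treated: str, baseline: str) -> float:
    """Relative change in purchase rate of `treated` over `baseline` (1.0 means +100%)."""
    return rates[treated] / rates[baseline] - 1


print(shares)  # clicked 0.16 is below no_impression 0.3 ...
print(rates)   # ... but its rate 0.8 is twice no_impression's 0.4
print(f"{uplift('clicked', 'no_impression'):.0%}")  # 100%
```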


```python
conn.close()
```