## Case Study #6 - Clique Bait

### Problem Statement

Clique Bait is not like your regular online seafood store: the founder and CEO, Danny, was also part of a digital data analytics team and wanted to expand his knowledge into the seafood industry!

In this case study, you are required to support Danny’s vision by analysing his dataset and coming up with creative solutions to calculate funnel fallout rates for the Clique Bait online store.

### Data

For this case study there is a total of 5 datasets which you will need to combine to solve all of the questions.

#### Users

Customers who visit the Clique Bait website are tagged via their `cookie_id`.

![image](week6a.png)

#### Events

Customer visits are logged in this `events` table at a `cookie_id` level and the `event_type` and `page_id` values can be used to join onto relevant satellite tables to obtain further information about each event.

The `sequence_number` is used to order the events within each visit.
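
As a minimal sketch of how these joins and the `sequence_number` ordering fit together, the snippet below rebuilds a tiny version of the tables in an in-memory SQLite database. SQLite is only a stand-in for the PostgreSQL connection used later in this notebook, and every row, including the page names `Page A` and `Page B`, is invented for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL tables; all rows are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (visit_id TEXT, cookie_id TEXT, page_id INTEGER,
                     event_type INTEGER, sequence_number INTEGER);
CREATE TABLE event_identifier (event_type INTEGER, event_name TEXT);
CREATE TABLE page_hierarchy (page_id INTEGER, page_name TEXT);
INSERT INTO events VALUES
  ('v1', 'c1', 2, 1, 2),
  ('v1', 'c1', 1, 1, 1),
  ('v1', 'c1', 2, 2, 3);
INSERT INTO event_identifier VALUES (1, 'Page View'), (2, 'Add to Cart');
INSERT INTO page_hierarchy VALUES (1, 'Page A'), (2, 'Page B');
""")

# Join each event to its satellite tables, then order the visit by sequence_number.
rows = con.execute("""
SELECT e.sequence_number, ei.event_name, ph.page_name
FROM events AS e
JOIN event_identifier AS ei ON ei.event_type = e.event_type
JOIN page_hierarchy AS ph ON ph.page_id = e.page_id
WHERE e.visit_id = 'v1'
ORDER BY e.sequence_number
""").fetchall()

for row in rows:
    print(row)
```

The rows were inserted out of order on purpose, so the `ORDER BY` is what restores the click path within the visit.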

![image](week6b.png)

#### Event Identifier

The `event_identifier` table shows the types of events which are captured by Clique Bait’s digital data systems.

![image](week6c.png)

#### Campaign Identifier

This table shows information for the 3 campaigns that Clique Bait has run on their website so far in 2020.

![image](week6d.png)

#### Page Hierarchy

This table lists all of the pages on the Clique Bait website which are tagged and have data passing through from user interaction events.

![image](week6e.png)

Import modules

In [8]:
# SQL Engine imports
from dotenv import load_dotenv
import os
import psycopg2
from sqlalchemy import create_engine
from sqlalchemy.sql import text
import warnings
warnings.filterwarnings("ignore")

# Python data analysis imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
pd.set_option('display.max_columns', None)

Initialize SQL

In [9]:
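# override=True lets values from .env win over variables already set in the shell;
# on most Unix systems USER is predefined as the OS login name
load_dotenv(override=True)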
user = os.environ.get("USER")
pw = os.environ.get("PASS")
db = os.environ.get("DB")
host = os.environ.get("HOST")
api = os.environ.get("API")
port = 5432
schema = 'clique_bait'

In [10]:
uri = f"postgresql+psycopg2://{user}:{pw}@{host}:{port}/{db}"
alchemyEngine = create_engine(uri)
conn = alchemyEngine.connect()

Verify tables

In [11]:
rs = conn.execute(text(f"SELECT table_name FROM information_schema.tables WHERE table_schema='{schema}'"))
tables = [table[0] for table in rs.fetchall()]
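# Join outside the f-string: a backslash or reused quote inside {...} is a SyntaxError before Python 3.12
table_list = '\n- '.join(tables)
print(f'The tables in the database are: \n- {table_list}')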

The tables in the database are: 
- event_identifier
- campaign_identifier
- page_hierarchy
- users
- events


Fetch table information

In [12]:
for table in tables:
    print("=================================")
    print(f'Table [{table}]')
    df = pd.read_sql_query(f'SELECT * FROM {schema}.{table} LIMIT 5', conn)
    print(f'Dimensions: {df.shape[0]} rows x {df.shape[1]} columns\n')
    print(df.head())
    info_df = pd.DataFrame.from_dict({'Datatypes':df.dtypes, 'NULL count':df.isna().sum()})
    print()
    print(info_df)
    print()

Table [event_identifier]
Dimensions: 5 rows x 2 columns

   event_type     event_name
0           1      Page View
1           2    Add to Cart
2           3       Purchase
3           4  Ad Impression
4           5       Ad Click

           Datatypes  NULL count
event_type     int64           0
event_name    object           0

Table [campaign_identifier]
Dimensions: 3 rows x 5 columns

   campaign_id products                      campaign_name start_date  \
0            1      1-3    BOGOF - Fishing For Compliments 2020-01-01   
1            2      4-5      25% Off - Living The Lux Life 2020-01-15   
2            3      6-8  Half Off - Treat Your Shellf(ish) 2020-02-01   

    end_date  
0 2020-01-14  
1 2020-01-28  
2 2020-03-31  

                    Datatypes  NULL count
campaign_id             int64           0
products               object           0
campaign_name          object           0
start_date     datetime64[ns]           0
end_date       datetime64[ns]           0


In [13]:
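def query(stmt: str) -> pd.DataFrame:
    """Execute a SQL statement and return the results as a pandas DataFrame.

    Parameters
    ----------
    stmt: str
        The SQL statement to be executed

    Returns
    -------
    pd.DataFrame
        One row per record returned by the statement
    """
    # `conn` is the module-level SQLAlchemy connection opened above; reading a
    # global needs no `global` declaration
    return pd.read_sql_query(stmt, conn)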

## Case Study Questions

The following case study questions include some general data exploration of the Clique Bait datasets before diving right into the core business questions and finishing with a challenging final request!

**A. Entity Relationship Diagram**

Use the following DDL schema details to create an ERD for all the Clique Bait datasets.

```sql
CREATE TABLE clique_bait.event_identifier (
  "event_type" INTEGER,
  "event_name" VARCHAR(13)
);

CREATE TABLE clique_bait.campaign_identifier (
  "campaign_id" INTEGER,
  "products" VARCHAR(3),
  "campaign_name" VARCHAR(33),
  "start_date" TIMESTAMP,
  "end_date" TIMESTAMP
);

CREATE TABLE clique_bait.page_hierarchy (
  "page_id" INTEGER,
  "page_name" VARCHAR(14),
  "product_category" VARCHAR(9),
  "product_id" INTEGER
);

CREATE TABLE clique_bait.users (
  "user_id" INTEGER,
  "cookie_id" VARCHAR(6),
  "start_date" TIMESTAMP
);

CREATE TABLE clique_bait.events (
  "visit_id" VARCHAR(6),
  "cookie_id" VARCHAR(6),
  "page_id" INTEGER,
  "event_type" INTEGER,
  "sequence_number" INTEGER,
  "event_time" TIMESTAMP
);
```

**B. Digital Analysis**

Q2: How many users are there?

Q3: How many cookies does each user have on average?

Q4: What is the unique number of visits by all users per month?

Q5: What is the number of events for each event type?

Q6: What is the percentage of visits which have a purchase event?

Q7: What is the percentage of visits which view the checkout page but do not have a purchase event?

Q8: What are the top 3 pages by number of views?

Q9: What is the number of views and cart adds for each product category?

Q10: What are the top 3 products by purchases?
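
Several of these questions reduce to counting distinct visits that satisfy a condition. As a hedged illustration for Q6, the sketch below uses an in-memory SQLite table with invented rows (not the real Clique Bait data) and relies on `event_type = 3` meaning a purchase, as listed in `event_identifier`. The `COUNT(DISTINCT CASE ...)` pattern is standard SQL and should carry over to PostgreSQL, though the query has only been checked against this toy table.

```python
import sqlite3

# Invented events: four visits, two of which contain a purchase (event_type = 3).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (visit_id TEXT, event_type INTEGER);
INSERT INTO events VALUES
  ('v1', 1), ('v1', 2), ('v1', 3),
  ('v2', 1),
  ('v3', 1), ('v3', 2),
  ('v4', 1), ('v4', 3);
""")

# CASE yields NULL for non-purchase rows, and COUNT(DISTINCT ...) skips NULLs,
# so the numerator counts only visits with at least one purchase event.
purchase_pct = con.execute("""
SELECT ROUND(
         100.0 * COUNT(DISTINCT CASE WHEN event_type = 3 THEN visit_id END)
               / COUNT(DISTINCT visit_id),
         2)
FROM events
""").fetchone()[0]

print(purchase_pct)
```

Multiplying by `100.0` rather than `100` keeps the division in floating point; with integer operands the result would be truncated.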

**C. Product Funnel Analysis**

Q11: Using a single SQL query - create a new output table which has the following details:

- How many times was each product viewed?
- How many times was each product added to cart?
- How many times was each product added to a cart but not purchased (abandoned)?
- How many times was each product purchased?

Q12: Continuing from Q11, create another table which further aggregates the data for the above points but this time for each product category instead of individual products.

Q13: Based on the new output tables earlier, which product had the most views, cart adds and purchases?

Q14: Based on the new output tables earlier, which product was most likely to be abandoned?

Q15: Based on the new output tables earlier, which product had the highest view to purchase percentage?

Q16: Based on the new output tables earlier, what is the average conversion rate from view to cart add?

Q17: Based on the output tables earlier, what is the average conversion rate from cart add to purchase?
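
One possible shape for the Q11 query is sketched below with conditional aggregation. It assumes that a purchase event is logged once per visit on a non-product page rather than once per product, so a product counts as purchased when it was added to the cart in a visit that also contains a purchase event, and as abandoned otherwise. That assumption, the SQLite stand-in database and all rows (including `Product A`, `Product B` and the `Confirmation` page) are illustrative and should be checked against the real tables.

```python
import sqlite3

# Invented data: v1 buys Product A and only views Product B,
# v2 adds both products but never purchases, v3 buys Product B.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events (visit_id TEXT, page_id INTEGER, event_type INTEGER);
CREATE TABLE page_hierarchy (page_id INTEGER, page_name TEXT, product_id INTEGER);
INSERT INTO page_hierarchy VALUES
  (1, 'Product A', 1), (2, 'Product B', 2), (3, 'Confirmation', NULL);
INSERT INTO events VALUES
  ('v1', 1, 1), ('v1', 1, 2), ('v1', 2, 1), ('v1', 3, 3),
  ('v2', 1, 1), ('v2', 1, 2), ('v2', 2, 1), ('v2', 2, 2),
  ('v3', 2, 1), ('v3', 2, 2), ('v3', 3, 3);
""")

funnel = con.execute("""
WITH purchase_visits AS (
  SELECT DISTINCT visit_id FROM events WHERE event_type = 3
)
SELECT
  ph.page_name AS product,
  SUM(CASE WHEN e.event_type = 1 THEN 1 ELSE 0 END) AS views,
  SUM(CASE WHEN e.event_type = 2 THEN 1 ELSE 0 END) AS cart_adds,
  SUM(CASE WHEN e.event_type = 2 AND pv.visit_id IS NULL
           THEN 1 ELSE 0 END) AS abandoned,
  SUM(CASE WHEN e.event_type = 2 AND pv.visit_id IS NOT NULL
           THEN 1 ELSE 0 END) AS purchases
FROM events AS e
JOIN page_hierarchy AS ph ON ph.page_id = e.page_id
LEFT JOIN purchase_visits AS pv ON pv.visit_id = e.visit_id
WHERE ph.product_id IS NOT NULL   -- keep product pages only
GROUP BY ph.page_name
ORDER BY ph.page_name
""").fetchall()

for row in funnel:
    print(row)
```

Grouping by `product_category` instead of `page_name` would give the Q12 variant, and the ratios asked for in Q13 to Q17 can be derived from the four counts.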

**D. Campaigns Analysis**

Q18: Generate a table that has 1 single row for every unique `visit_id` record and has the following columns:

- `user_id`
- `visit_id`
- `visit_start_time`: the earliest `event_time` for each visit
- `page_views`: count of page views for each visit
- `cart_adds`: count of product cart add events for each visit
- `purchase`: 1/0 flag if a purchase event exists for each visit
- `campaign_name`: map the visit to a campaign if the `visit_start_time` falls between the `start_date` and `end_date`
- `impression`: count of ad impressions for each visit
- `click`: count of ad clicks for each visit
- (Optional column) `cart_products`: a comma separated text value with products added to the cart sorted by the order they were added to the cart (hint: use the `sequence_number`)
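
The required columns can be sketched as one aggregation per visit followed by a date-range join to the campaigns. The example below is a rough outline under stated assumptions: it runs on an in-memory SQLite database with invented rows, stores timestamps as ISO-8601 text so that `BETWEEN` compares them correctly, and uses a made-up campaign called `Campaign X`. The optional `cart_products` column is left out; in PostgreSQL it could be built with `STRING_AGG(... ORDER BY sequence_number)`.

```python
import sqlite3

# Invented rows; event_type codes follow event_identifier
# (1 view, 2 cart add, 3 purchase, 4 ad impression, 5 ad click).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (user_id INTEGER, cookie_id TEXT);
CREATE TABLE events (visit_id TEXT, cookie_id TEXT,
                     event_type INTEGER, event_time TEXT);
CREATE TABLE campaign_identifier (campaign_name TEXT,
                                  start_date TEXT, end_date TEXT);
INSERT INTO users VALUES (1, 'c1'), (2, 'c2');
INSERT INTO campaign_identifier VALUES
  ('Campaign X', '2020-01-01 00:00:00', '2020-01-14 00:00:00');
INSERT INTO events VALUES
  ('v1', 'c1', 1, '2020-01-02 10:00:00'),
  ('v1', 'c1', 4, '2020-01-02 10:00:30'),
  ('v1', 'c1', 5, '2020-01-02 10:00:40'),
  ('v1', 'c1', 2, '2020-01-02 10:01:00'),
  ('v1', 'c1', 3, '2020-01-02 10:02:00'),
  ('v2', 'c2', 1, '2020-02-20 09:00:00'),
  ('v2', 'c2', 1, '2020-02-20 09:01:00');
""")

visits = con.execute("""
WITH visit_summary AS (
  SELECT
    u.user_id,
    e.visit_id,
    MIN(e.event_time) AS visit_start_time,
    SUM(CASE WHEN e.event_type = 1 THEN 1 ELSE 0 END) AS page_views,
    SUM(CASE WHEN e.event_type = 2 THEN 1 ELSE 0 END) AS cart_adds,
    MAX(CASE WHEN e.event_type = 3 THEN 1 ELSE 0 END) AS purchase,
    SUM(CASE WHEN e.event_type = 4 THEN 1 ELSE 0 END) AS impression,
    SUM(CASE WHEN e.event_type = 5 THEN 1 ELSE 0 END) AS click
  FROM events AS e
  JOIN users AS u ON u.cookie_id = e.cookie_id
  GROUP BY u.user_id, e.visit_id
)
SELECT vs.user_id, vs.visit_id, vs.visit_start_time, vs.page_views,
       vs.cart_adds, vs.purchase, c.campaign_name, vs.impression, vs.click
FROM visit_summary AS vs
LEFT JOIN campaign_identifier AS c
  ON vs.visit_start_time BETWEEN c.start_date AND c.end_date
ORDER BY vs.visit_id
""").fetchall()

for row in visits:
    print(row)
```

The `LEFT JOIN` keeps visits that fall outside every campaign window, leaving their `campaign_name` empty, and `MAX` over the 1/0 expression turns any number of purchase events into a single flag.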

Q19: Use the subsequent dataset to generate at least 5 insights for the Clique Bait team to use in their management reporting sessions. Be sure to emphasise the most important points from your findings.

Some ideas you might want to investigate further include:

- Identifying users who have received impressions during each campaign period and comparing each metric with other users who did not have an impression event
- Does clicking on an impression lead to higher purchase rates?
- What is the uplift in purchase rate when comparing users who click on a campaign impression versus users who do not receive an impression? What if we compare them with users who just receive an impression but do not click?
- What metrics can you use to quantify the success or failure of each campaign compared to each other?
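
To make the uplift comparison concrete, here is a small sketch in plain Python that splits visits into three segments and compares their purchase rates. The tuples are invented and only mimic the `impression`, `click` and `purchase` columns of the Q18 output. The segment names, and the choice to measure uplift as a simple difference in purchase rate, are assumptions rather than a prescribed method.

```python
from collections import defaultdict

# Invented visit-level rows shaped like (impression, click, purchase).
visits = [
    (1, 1, 1), (1, 1, 1),             # saw an ad and clicked it
    (1, 0, 1), (1, 0, 0),             # saw an ad but did not click
    (0, 0, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0),  # never saw an ad
]


def segment(impression: int, click: int) -> str:
    """Assign a visit to one of three hypothetical segments."""
    if click > 0:
        return "clicked"
    if impression > 0:
        return "impression_only"
    return "no_impression"


# segment -> [number of purchases, number of visits]
totals = defaultdict(lambda: [0, 0])
for impression, click, purchase in visits:
    seg = segment(impression, click)
    totals[seg][0] += purchase
    totals[seg][1] += 1

purchase_rate = {seg: bought / n for seg, (bought, n) in totals.items()}

# Uplift measured as the absolute difference in purchase rate between segments.
uplift = purchase_rate["clicked"] - purchase_rate["no_impression"]
print(purchase_rate, uplift)
```

A difference like this describes correlation only: visitors who click an ad may already be more inclined to buy, so it should not be reported as the causal effect of the campaign.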

### Conclusion

This case study is based on my many years working with digital datasets in consumer banking and retail supermarkets. All of the datasets are designed based on real datasets I’ve come across in challenging problem-solving scenarios, and the questions reflect similar problems which I worked on.

Campaign analysis is almost everywhere in the data world, especially in marketing, digital, UX and retail industries - and being able to analyse views, clicks and other digital behaviour is a critical skill to have in your toolbelt as a data professional!