# Cool TShirts

In this project we look at how visitors arrive at the Cool TShirts (CTS) website and which avenues CTS should focus on to increase traffic.

## Database Schema

The table `page_visits` in the database `cool-tshirts.db` has the following schema:

| Column | Description |
| :---: | :---: |
| user_id | A unique identifier for each visitor to a page |
| timestamp | The time at which the visitor came to the page |
| page_name | The name of the page that was visited (e.g., `4 - purchase`) |
| utm_source | Identifies which site sent the traffic (e.g., `google`, `email`, or `facebook`) |
| utm_campaign | Identifies the specific ad or email blast (e.g., `weekly-newsletter` or `retargetting-ad`) |
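To check the column list against the database itself, SQLite's `PRAGMA table_info` returns one row per column. The sketch below runs it on an in-memory stand-in table so it works without `cool-tshirts.db`; the column types are assumptions, since the schema above only gives names and descriptions. Connecting to `cool-tshirts.db` instead would report the real types.

```python
import sqlite3

# In-memory stand-in for page_visits; the column TYPES are assumed, not taken from the real file.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE page_visits (
        user_id      INTEGER,
        timestamp    TEXT,
        page_name    TEXT,
        utm_source   TEXT,
        utm_campaign TEXT
    )
""")

# Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk); keep just the names.
columns = [row[1] for row in con.execute("PRAGMA table_info(page_visits)")]
print(columns)
```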

First, let's set up our connection to the database.

```python
import sqlite3

# connection to db
con = sqlite3.connect('cool-tshirts.db')
# cursor to perform commands
cur = con.cursor()
```
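The queries in this notebook return bare tuples such as `(8,)`. If named columns are more convenient, the standard-library `sqlite3.Row` row factory allows access by column alias as well as by position. This is a minimal sketch on an in-memory database with a few hypothetical rows (the page name `1 - landing_page` and the timestamps are made up for illustration).

```python
import sqlite3

# Hypothetical stand-in for cool-tshirts.db.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE page_visits (user_id INTEGER, timestamp TEXT, page_name TEXT, "
            "utm_source TEXT, utm_campaign TEXT)")
con.executemany(
    "INSERT INTO page_visits VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2018-01-01 10:00:00", "1 - landing_page", "google", "paid-search"),
        (1, "2018-01-02 10:00:00", "4 - purchase", "email", "weekly-newsletter"),
        (2, "2018-01-01 11:00:00", "1 - landing_page", "medium", "interview-with-cool-tshirts-founder"),
    ],
)

# Cursors created after this point inherit the row factory from the connection.
con.row_factory = sqlite3.Row
cur = con.cursor()
cur.execute("SELECT COUNT(DISTINCT utm_campaign) AS num_campaigns FROM page_visits")
row = cur.fetchone()

# The same value is reachable by alias or by index.
print(row["num_campaigns"], row[0], row.keys())
```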

## Data Investigation

How many campaigns and sources does CTS use, and what are the sources for each campaign?

```python
# number of distinct campaigns
cur.execute('SELECT COUNT(DISTINCT utm_campaign) AS num_campaigns FROM page_visits;')
print(cur.fetchone())

# number of distinct sources
cur.execute('SELECT COUNT(DISTINCT utm_source) AS num_sources FROM page_visits;')
print(cur.fetchone())

# distinct campaign/source pairs
cur.execute('SELECT DISTINCT utm_campaign, utm_source FROM page_visits;')
print(cur.fetchall())
```

```
(8,)
(6,)
[('getting-to-know-cool-tshirts', 'nytimes'), ('weekly-newsletter', 'email'), ('ten-crazy-cool-tshirts-facts', 'buzzfeed'), ('retargetting-campaign', 'email'), ('retargetting-ad', 'facebook'), ('interview-with-cool-tshirts-founder', 'medium'), ('paid-search', 'google'), ('cool-tshirts-search', 'google')]
```


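CTS uses 8 campaigns across 6 sources. Since there are also exactly 8 distinct campaign/source pairs, each campaign comes from a single source, while `email` and `google` each run two campaigns: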

| utm_campaign | utm_source |
| :---: | :---: |
| getting-to-know-cool-tshirts | nytimes |
| weekly-newsletter | email |
| ten-crazy-cool-tshirts-facts | buzzfeed |
| retargetting-campaign | email |
| retargetting-ad | facebook |
| interview-with-cool-tshirts-founder | medium |
| paid-search | google |
| cool-tshirts-search | google |
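The one-source-per-campaign observation can also be checked directly with `GROUP BY ... HAVING`. The sketch below uses a few hypothetical rows in an in-memory database: the first query lists campaigns seen with more than one source (an empty result confirms the mapping), and the second lists sources shared by several campaigns. Based on the pairs printed above, the second query on `cool-tshirts.db` would be expected to return `email` and `google`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE page_visits (user_id INTEGER, timestamp TEXT, page_name TEXT, "
            "utm_source TEXT, utm_campaign TEXT)")
# Hypothetical rows: two campaigns share the google source, but no campaign has two sources.
con.executemany(
    "INSERT INTO page_visits VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2018-01-01 10:00:00", "4 - purchase", "google", "paid-search"),
        (2, "2018-01-01 11:00:00", "4 - purchase", "google", "cool-tshirts-search"),
        (3, "2018-01-01 12:00:00", "4 - purchase", "email", "weekly-newsletter"),
    ],
)

# Campaigns that appear with more than one source (empty means one source per campaign).
multi_source = con.execute("""
    SELECT utm_campaign, COUNT(DISTINCT utm_source) AS n_sources
    FROM page_visits
    GROUP BY utm_campaign
    HAVING COUNT(DISTINCT utm_source) > 1
""").fetchall()

# Sources that feed more than one campaign.
shared_sources = con.execute("""
    SELECT utm_source, COUNT(DISTINCT utm_campaign) AS n_campaigns
    FROM page_visits
    GROUP BY utm_source
    HAVING COUNT(DISTINCT utm_campaign) > 1
""").fetchall()

print(multi_source)    # []
print(shared_sources)  # [('google', 2)]
```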

## What is the user journey?


### First Touches
Let's start looking into the user journey by seeing how many first touches each campaign is responsible for.

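The query is a little complicated, so here is the breakdown:
- `first_touch`: each user's earliest visit timestamp (`MIN(timestamp)` per `user_id`)
- `ft_attr`: those first touches joined back to `page_visits` to attach the `utm_source` and `utm_campaign` of that visit

The result is then grouped by `utm_source` and `utm_campaign` and counted.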

```python
cur.execute('''
WITH first_touch AS (
  -- earliest visit per user
  SELECT user_id,
         MIN(timestamp) AS first_touch_at
    FROM page_visits
   GROUP BY user_id),
ft_attr AS (
  -- attach source and campaign of that earliest visit
  SELECT ft.user_id,
         ft.first_touch_at,
         pv.utm_source,
         pv.utm_campaign
    FROM first_touch ft
    JOIN page_visits pv
      ON ft.user_id = pv.user_id
     AND ft.first_touch_at = pv.timestamp
)
SELECT ft_attr.utm_source,
       ft_attr.utm_campaign,
       COUNT(*)
  FROM ft_attr
 GROUP BY 1, 2
 ORDER BY 3 DESC;
''')
print(cur.fetchall())
```

```
[('medium', 'interview-with-cool-tshirts-founder', 623), ('nytimes', 'getting-to-know-cool-tshirts', 615), ('buzzfeed', 'ten-crazy-cool-tshirts-facts', 577), ('google', 'cool-tshirts-search', 171)]
```



| utm_source | utm_campaign | COUNT(*) |
| :---: | :---: | :---: |
| medium | interview-with-cool-tshirts-founder | 623 |
| nytimes | getting-to-know-cool-tshirts | 615 |
| buzzfeed | ten-crazy-cool-tshirts-facts | 577 |
| google | cool-tshirts-search | 171 |
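The two CTEs can be mirrored in plain Python, which is a handy way to sanity-check the SQL on a small sample. This is a sketch on hypothetical visits; it assumes timestamps are ISO-8601-style strings (`YYYY-MM-DD HH:MM:SS`), so that string comparison matches chronological order, as SQLite's `MIN` on text does. Like the SQL join, it counts every visit that sits at a user's earliest timestamp, so a tie would be counted twice.

```python
from collections import Counter

# Hypothetical visits: (user_id, timestamp, utm_source, utm_campaign).
visits = [
    (1, "2018-01-01 09:00:00", "medium", "interview-with-cool-tshirts-founder"),
    (1, "2018-01-03 09:00:00", "email", "weekly-newsletter"),
    (2, "2018-01-02 12:00:00", "nytimes", "getting-to-know-cool-tshirts"),
    (2, "2018-01-02 08:00:00", "medium", "interview-with-cool-tshirts-founder"),
    (3, "2018-01-05 17:30:00", "google", "cool-tshirts-search"),
]

# first_touch: earliest timestamp per user (string comparison = time order for this format).
first_touch_at = {}
for user_id, ts, _, _ in visits:
    if user_id not in first_touch_at or ts < first_touch_at[user_id]:
        first_touch_at[user_id] = ts

# ft_attr + GROUP BY: count (source, campaign) of every visit at a user's first timestamp.
counts = Counter(
    (source, campaign)
    for user_id, ts, source, campaign in visits
    if ts == first_touch_at[user_id]
)
print(counts.most_common())
```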

### Last Touches
Now let's look at the last touches for each campaign and source.

The query mirrors the first-touch one:
- `last_touch`: each user's latest visit timestamp (`MAX(timestamp)` per `user_id`)
- `lt_attr`: those last touches joined back to `page_visits` to attach the `utm_source`, `utm_campaign`, and `page_name` of that visit

The result is then grouped by `utm_source` and `utm_campaign` and counted.

```python
cur.execute('''
WITH last_touch AS (
  -- latest visit per user
  SELECT user_id,
         MAX(timestamp) AS last_touch_at
    FROM page_visits
   GROUP BY user_id),
lt_attr AS (
  -- attach source, campaign, and page of that latest visit
  SELECT lt.user_id,
         lt.last_touch_at,
         pv.utm_source,
         pv.utm_campaign,
         pv.page_name
    FROM last_touch lt
    JOIN page_visits pv
      ON lt.user_id = pv.user_id
     AND lt.last_touch_at = pv.timestamp
)
SELECT lt_attr.utm_source,
       lt_attr.utm_campaign,
       COUNT(*)
  FROM lt_attr
 GROUP BY 1, 2
 ORDER BY 3 DESC;
''')
print(cur.fetchall())
```

```
[('facebook', 'retargetting-ad', 452), ('email', 'weekly-newsletter', 451), ('email', 'retargetting-campaign', 248), ('nytimes', 'getting-to-know-cool-tshirts', 233), ('buzzfeed', 'ten-crazy-cool-tshirts-facts', 192), ('medium', 'interview-with-cool-tshirts-founder', 185), ('google', 'paid-search', 181), ('google', 'cool-tshirts-search', 62)]
```


| utm_source | utm_campaign | COUNT(*) |
| :---: | :---: | :---: |
| facebook | retargetting-ad | 452 |
| email | weekly-newsletter | 451 |
| email | retargetting-campaign | 248 |
| nytimes | getting-to-know-cool-tshirts | 233 |
| buzzfeed | ten-crazy-cool-tshirts-facts | 192 |
| medium | interview-with-cool-tshirts-founder | 185 |
| google | paid-search | 181 |
| google | cool-tshirts-search | 62 |
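The printed first-touch counts sum to 1,986, while the printed last-touch counts sum to 2,004. Each user has exactly one earliest and one latest timestamp, so both totals would equal the number of users if the joins returned one row per user; the mismatch means that, in at least one query, the join on `timestamp` matches more than one visit for some users (two visits by the same user at the same timestamp). A window function is one way to keep exactly one row per user. The sketch below, on hypothetical rows (page names other than `4 - purchase` are made up), compares the join-based count with a `ROW_NUMBER()` variant. It assumes SQLite 3.25 or later, which added window functions, and which tied row wins is arbitrary unless a tie-breaker is added to the `ORDER BY`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE page_visits (user_id INTEGER, timestamp TEXT, page_name TEXT, "
            "utm_source TEXT, utm_campaign TEXT)")
# Hypothetical rows: user 1 has two visits at the same latest timestamp.
con.executemany(
    "INSERT INTO page_visits VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2018-01-01 10:00:00", "1 - landing_page", "nytimes", "getting-to-know-cool-tshirts"),
        (1, "2018-01-03 10:00:00", "2 - shopping_cart", "email", "weekly-newsletter"),
        (1, "2018-01-03 10:00:00", "3 - checkout", "email", "weekly-newsletter"),
        (2, "2018-01-02 11:00:00", "1 - landing_page", "facebook", "retargetting-ad"),
    ],
)

# Join-based last touch, as in the notebook: the tie yields two rows for user 1.
join_total = con.execute("""
    WITH last_touch AS (
        SELECT user_id, MAX(timestamp) AS last_touch_at
        FROM page_visits
        GROUP BY user_id)
    SELECT COUNT(*)
    FROM last_touch lt
    JOIN page_visits pv
      ON lt.user_id = pv.user_id AND lt.last_touch_at = pv.timestamp
""").fetchone()[0]

# ROW_NUMBER() variant: rank each user's visits newest first and keep rank 1 only.
ranked = con.execute("""
    WITH ranked AS (
        SELECT user_id, utm_source, utm_campaign,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY timestamp DESC) AS rn
        FROM page_visits)
    SELECT utm_source, utm_campaign, COUNT(*)
    FROM ranked
    WHERE rn = 1
    GROUP BY 1, 2
    ORDER BY 3 DESC
""").fetchall()

print(join_total)                    # 3 rows for 2 users
print(sum(n for _, _, n in ranked))  # 2, one per user
```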

## How many visitors from each campaign make a purchase?

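To see which campaign is responsible for each purchase at CTS, we use last-touch attribution restricted to the `4 - purchase` page: for each user who purchased, we take their latest visit to that page and credit that visit's `utm_source` and `utm_campaign`.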

```python
cur.execute('''
WITH last_touch AS (
  -- latest purchase-page visit per user
  SELECT user_id,
         MAX(timestamp) AS last_touch_at
    FROM page_visits
   WHERE page_name = '4 - purchase'
   GROUP BY user_id),
lt_attr AS (
  -- attach source, campaign, and page of that visit
  SELECT lt.user_id,
         lt.last_touch_at,
         pv.utm_source,
         pv.utm_campaign,
         pv.page_name
    FROM last_touch lt
    JOIN page_visits pv
      ON lt.user_id = pv.user_id
     AND lt.last_touch_at = pv.timestamp
)
SELECT lt_attr.utm_source,
       lt_attr.utm_campaign,
       COUNT(*)
  FROM lt_attr
 GROUP BY 1, 2
 ORDER BY 3 DESC;
''')
print(cur.fetchall())
```

```
[('facebook', 'retargetting-ad', 122), ('email', 'weekly-newsletter', 119), ('email', 'retargetting-campaign', 57), ('google', 'paid-search', 55), ('buzzfeed', 'ten-crazy-cool-tshirts-facts', 10), ('nytimes', 'getting-to-know-cool-tshirts', 10), ('medium', 'interview-with-cool-tshirts-founder', 8), ('google', 'cool-tshirts-search', 3)]
```


| utm_source | utm_campaign | COUNT(*) |
| :---: | :---: | :---: |
| facebook | retargetting-ad | 122 |
| email | weekly-newsletter | 119 |
| email | retargetting-campaign | 57 |
| google | paid-search | 55 |
| buzzfeed | ten-crazy-cool-tshirts-facts | 10 |
| nytimes | getting-to-know-cool-tshirts | 10 |
| medium | interview-with-cool-tshirts-founder | 8 |
| google | cool-tshirts-search | 3 |
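To see how concentrated purchases are, each campaign's share can be computed from the counts printed by the purchase query. This is a small sketch using those printed values; note that `page_visits` records page views, not order amounts, so these are shares of purchases rather than of revenue.

```python
# Purchase counts copied from the printed output of the purchase query.
purchases = {
    ("facebook", "retargetting-ad"): 122,
    ("email", "weekly-newsletter"): 119,
    ("email", "retargetting-campaign"): 57,
    ("google", "paid-search"): 55,
    ("buzzfeed", "ten-crazy-cool-tshirts-facts"): 10,
    ("nytimes", "getting-to-know-cool-tshirts"): 10,
    ("medium", "interview-with-cool-tshirts-founder"): 8,
    ("google", "cool-tshirts-search"): 3,
}

total = sum(purchases.values())

# Share of attributed purchases per (source, campaign), largest first.
shares = sorted(((n / total, pair) for pair, n in purchases.items()), reverse=True)
for share, (source, campaign) in shares:
    print(f"{source:10} {campaign:40} {share:6.1%}")

# Combined share of the four largest campaigns.
top_four = sum(share for share, _ in shares[:4])
print(f"top four: {top_four:.1%}")  # top four: 91.9%
```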

## Conclusion

When it comes time to reinvest in advertising, CTS should focus on the four avenues below. The weekly newsletter and the Facebook retargeting ad are nearly tied for first, followed by the email retargeting campaign and Google paid search:
- The weekly newsletter
- Facebook retargeting ads
- The email retargeting campaign
- Google paid search ads

These avenues led to the most purchases on the site, together accounting for more than 90% of the purchases attributed by last touch.