# 3. Product Metric Deep Dive

**Context:**

- Over the past several days, our startup experienced two significant events that impacted user metrics. 
- First, marketing launched a major advertising campaign that brought a substantial number of new users to the application. 
- Second, we observed a sharp, unexpected drop in active audience on a specific day, requiring immediate investigation.

**Objectives:**

To conduct a deep dive analysis addressing three critical business questions:
1. Analyze retention patterns of users acquired through the advertising campaign to understand their long-term engagement and value
2. Investigate the sudden audience drop by identifying users who experienced issues with the news feed and determining common characteristics among them
3. Build weekly audience segmentation to track user lifecycle and identify trends in new, retained, and churned users
   
**Data Sources:**

- `feed_actions` - News feed activity
- `message_actions` - Messaging activity 

**Key Achievements:**

- **Retention Analysis Completed:** 
  - Conducted comprehensive cohort analysis of advertising campaign users versus organic users
- **User Segmentation Identified:** 
  - Segmented and analyzed users who experienced news feed access issues
- **Root Cause Investigation:** 
  - Identified common characteristics and potential technical factors behind the audience drop
- **User Lifecycle Tracking Implemented:** 
  - Built ClickHouse queries and a Superset visualization for monitoring New, Retained, and Churned user cohorts
- **Data Products Delivered:** 
  - Created Superset dashboards for ongoing monitoring of retention patterns and user engagement issues
- **Goal Achieved:** 
  - Provided data-driven insights that enabled marketing to optimize campaign strategies and engineering to address critical technical issues
  
**Key Findings:**

- Advertising Campaign Analysis
  - Campaign successfully attracted a large volume of new users, significantly exceeding previous acquisition rates
  - Initial engagement metrics showed no significant difference from organic users
  - Critical Issue: Day-1 retention rate for campaign users was substantially lower than historical Friday cohorts

- Audience Drop Investigation
  - Significant drop in feed engagement occurred on August 24, 2025
  - Affected Areas: Moscow, St. Petersburg, Yekaterinburg, Novosibirsk showed concentrated impact
  - Isolated Issue: Messenger functionality remained unaffected in these cities
  
**Key Recommendations:**  

- Implement more targeted advertising based on analysis of existing active user profiles rather than broad audience targeting
- Investigate technical infrastructure in affected regions and implement redundancy measures to prevent future service disruptions

**Business Impact:**

- Provided actionable insights to optimize user acquisition strategies and improve product experience based on actual user behavior patterns.

## 3.1 Data Description

The product database in ClickHouse stores the data in the following tables.

Table `feed_actions`

Field | Description
-|-
user_id | User ID
post_id | Post ID
action | Action: view or like
time | Timestamp
gender | User's gender (1 = Male)
age | User's age
os | User's OS
source | Traffic source
country | User's country
city | User's city
exp_group | A/B test group

Table `message_actions`

Field | Description
-|-
user_id | Sender's ID
receiver_id | Receiver's ID
time | Send timestamp
gender | Sender's gender (1 = Male)
age | Sender's age
os | Sender's OS
source | Sender's traffic source
country | Sender's country
city | Sender's city
exp_group | Sender's A/B test group

## 3.2 Marketing Campaign Analysis

### 3.2.1 SQL Queries

The SQL queries below run against the product database and supply the data for the charts.

#### 3.2.1.1 Hypothesis Validation

**Any Service New Users By Source**

In [None]:
SELECT 
  user_id 
  , toDate(min(time)) as first_date
  , any(source) as source
  , 'New in Feed' as service 
FROM 
  feed_actions
GROUP BY 
  user_id 
UNION ALL
SELECT 
  user_id 
  , toDate(min(time)) as first_date
  , any(source) as source
  , 'New in Messenger' as service
FROM 
  message_actions
GROUP BY 
  user_id
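
The first-activity logic in the query can be cross-checked on a handful of rows. The sketch below is a minimal Python illustration on made-up events, not part of the original pipeline; the rows and the helper names `first_activity` and `new_users_by_source` are hypothetical. One assumed difference: the sketch takes the source of the earliest event, whereas `any(source)` in ClickHouse may return the source of any of the user's rows.

```python
from collections import Counter
from datetime import date

# Hypothetical toy rows: (user_id, event_date, source). Not real data.
feed_events = [
    (1, date(2025, 8, 14), "organic"),
    (1, date(2025, 8, 15), "organic"),
    (2, date(2025, 8, 15), "ads"),
    (3, date(2025, 8, 15), "ads"),
]


def first_activity(events):
    """Return {user_id: (first_date, source)}, like min(time) with GROUP BY user_id."""
    first = {}
    for user_id, day, source in events:
        if user_id not in first or day < first[user_id][0]:
            first[user_id] = (day, source)
    return first


def new_users_by_source(events):
    """Count new users per (first_date, source) pair."""
    return Counter(first_activity(events).values())


new_users = new_users_by_source(feed_events)
print(new_users[(date(2025, 8, 15), "ads")])  # 2 new ads users on the campaign day
```

User 1 is counted once, on 2025-08-14, even though they were also active on the following day.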

**Feed New Users By Source**

In [None]:
SELECT 
  user_id 
  , toDate(min(time)) as first_date
  , any(source) as source
  , 'New in Feed' as service 
FROM 
  feed_actions
GROUP BY 
  user_id

**Messenger New Users By Source**

In [None]:
SELECT 
  user_id 
  , toDate(min(time)) as first_date
  , any(source) as source
  , 'New in Messenger' as service
FROM 
  message_actions
GROUP BY 
  user_id

#### 3.2.1.2 Profile of Advertising Campaign Users

In [None]:
SELECT 
  user_id 
  , post_id
  , action
  , os as os
  , age as age
  , multiIf(gender = 1, 'female', gender = 0, 'male', 'unknown') as gender  
  , multiIf(f.gender = 0, 'female', f.gender = 1, 'male', 'unknown') as dummy_gender  
  , multiIf(
      age < 25, ' <25'
      , age between 25 and 40, '25-40' 
      , age between 41 and 55, '41-55'
      , age >= 55, '55+'
      , 'unknown'
  ) as age_group  
  , country as country
  , city as city
FROM 
  feed_actions f
WHERE 
  toDate(time) = '2025-08-15'
  and source = 'ads'
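
Because `multiIf` returns the value of the first condition that holds, the overlapping boundaries in the age buckets resolve in a specific way. The following Python sketch mirrors the branch order of the query to show where boundary ages land; the function name `age_group` is hypothetical and used only for illustration.

```python
def age_group(age):
    """Mirror the multiIf branches of the query: the first matching condition wins."""
    if age < 25:
        return " <25"  # leading space kept exactly as in the query label
    if 25 <= age <= 40:
        return "25-40"
    if 41 <= age <= 55:
        return "41-55"
    if age >= 55:
        return "55+"
    return "unknown"


for age in (24, 25, 40, 41, 55, 56):
    print(age, repr(age_group(age)))
```

Age 55 satisfies both the `41-55` and the `55+` conditions, and it is assigned to `41-55` because that branch is checked first, so the `55+` bucket effectively starts at 56.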

#### 3.2.1.3 Metrics on the First Day

**Metrics Compare With Previous Days**

In [None]:
WITH feed as (
  SELECT 
    user_id
    , toDate(time) as date
    , post_id
    , action
  FROM 
    feed_actions
  WHERE 
    date <= '2025-08-15'
    and source = 'ads'
    and toDayOfWeek(date) = 5    
)
, first_activity AS (
    SELECT 
        user_id, 
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE source = 'ads'
      AND toDate(time) <= '2025-08-15'
    GROUP BY user_id 
),
daily_metrics AS (
    SELECT 
        fa.first_date as date,
        fa.first_date = '2025-08-15' as is_target_day,
        uniq(f.user_id) as users,
        uniq(f.post_id) as posts,
        countIf(f.action = 'view') as views,
        countIf(f.action = 'like') as likes,
        likes / nullIf(views, 0) as ctr,
        posts / users as posts_per_user,
        views / users as views_per_user,
        likes / users as likes_per_user
    FROM
      feed f
    INNER JOIN first_activity fa ON f.user_id = fa.user_id 
                                AND f.date = fa.first_date
    GROUP BY fa.first_date
)
SELECT 
    if(is_target_day, 'Target Day', 'Previous Fridays') as period,
    count() as days_count,
    median(users) as users,
    median(posts) as posts,
    median(views) as views,
    median(likes) as likes,
    median(ctr) as ctr,
    median(posts_per_user) as posts_per_user,
    median(views_per_user) as views_per_user,
    median(likes_per_user) as likes_per_user
FROM daily_metrics
GROUP BY is_target_day
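
The final aggregation step compares the target day with the median of the previous Friday cohorts. A minimal Python sketch of that step is given below, assuming the per-day metrics have already been computed; the numbers, the constant `TARGET_DAY`, and the function `summarize` are made up for illustration. Note that ClickHouse's `median` is an alias of the approximate `quantile` function, so on large inputs it may differ slightly from the exact median used here.

```python
from statistics import median

TARGET_DAY = "2025-08-15"

# Hypothetical per-cohort-day metrics: (first_date, users, views, likes).
daily_metrics = [
    ("2025-07-25", 100, 2000, 400),
    ("2025-08-01", 120, 2400, 480),
    ("2025-08-08", 110, 2310, 462),
    ("2025-08-15", 900, 18000, 3600),
]


def summarize(rows):
    """Group days into target vs. previous Fridays and take the median of each metric."""
    groups = {}
    for first_date, users, views, likes in rows:
        period = "Target Day" if first_date == TARGET_DAY else "Previous Fridays"
        groups.setdefault(period, []).append(
            {
                "users": users,
                "ctr": likes / views if views else None,  # same role as nullIf(views, 0)
                "views_per_user": views / users,
            }
        )
    return {
        period: {
            "days_count": len(items),
            "users": median(i["users"] for i in items),
            "ctr": median(i["ctr"] for i in items if i["ctr"] is not None),
            "views_per_user": median(i["views_per_user"] for i in items),
        }
        for period, items in groups.items()
    }


summary = summarize(daily_metrics)
print(summary["Previous Fridays"])
print(summary["Target Day"])
```

Using the median rather than the mean keeps a single unusual Friday from dominating the baseline.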

**Metrics Distribution**

In [None]:
WITH feed as (
  SELECT 
    user_id
    , toDate(time) as date
    , post_id
    , action
  FROM 
    feed_actions
  WHERE 
    date <= '2025-08-15'
    and source = 'ads'
    and toDayOfWeek(date) = 5    
)
, first_activity AS (
    SELECT 
        user_id, 
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE source = 'ads'
      AND toDate(time) <= '2025-08-15'
    GROUP BY user_id 
)
SELECT 
    user_id,
    toString(fa.first_date) as first_date,
    uniq(f.post_id) as posts,
    countIf(f.action = 'view') as views,
    countIf(f.action = 'like') as likes,
    likes / nullIf(views, 0) as ctr
FROM
  feed f
  INNER JOIN first_activity fa ON f.user_id = fa.user_id 
                            AND f.date = fa.first_date
GROUP BY fa.first_date, f.user_id

#### 3.2.1.4 Retention Analysis

**Retention Heatmap**

In [None]:
WITH first_activity AS (
    SELECT 
        user_id, 
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE 
      source = 'ads'
      and toDate(time) <= '2025-08-15'
    GROUP BY user_id 
    HAVING toDayOfWeek(first_date) = 5 
)
, cohorts as (
  SELECT 
    toDate(time) - first_date as lifetime
    , toString(first_date) as cohort
    , uniq(user_id) as users
    , countIf(action = 'view') as views
    , countIf(action = 'like') as likes
    , likes / nullIf(countIf(action = 'view'), 0) as ctr
  FROM 
    feed_actions f
    JOIN first_activity as fa using(user_id)
  GROUP BY 
    cohort
    , lifetime
)
SELECT 
  c.cohort
  , c.lifetime
  , c.users
  , c.ctr
  , c.views / nullIf(c0.users, 0) as views_per_cohort_user
  , c.likes / nullIf(c0.users, 0) as likes_per_cohort_user
  , c.views / nullIf(c.users, 0) as views_per_user
  , c.likes / nullIf(c.users, 0) as likes_per_user  
  , c.users / nullIf(c0.users, 0) as retention
FROM 
  cohorts c
  JOIN cohorts c0 ON c.cohort = c0.cohort and c0.lifetime = 0
WHERE 
  c.lifetime <= toDate(now()) - toDate('2025-08-15')
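
The retention value in this query is the number of cohort users active on a given lifetime day divided by the cohort size on day 0. The sketch below reproduces that calculation in plain Python on made-up activity rows; it is an illustration under simplifying assumptions (no `source = 'ads'` or Friday filter), and the name `retention_table` is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (user_id, activity_date) pairs. Not real data.
activity = [
    (1, date(2025, 8, 15)), (1, date(2025, 8, 16)),
    (2, date(2025, 8, 15)),
    (3, date(2025, 8, 15)), (3, date(2025, 8, 17)),
    (4, date(2025, 8, 8)), (4, date(2025, 8, 9)),
]


def retention_table(events):
    """Return {(cohort_date, lifetime_days): share of the cohort active on that day}."""
    # Cohort = date of the user's first activity.
    first_date = {}
    for user_id, day in events:
        first_date[user_id] = min(day, first_date.get(user_id, day))

    # Distinct users per (cohort, lifetime), like uniq(user_id) in the query.
    active = defaultdict(set)
    for user_id, day in events:
        cohort = first_date[user_id]
        active[(cohort, (day - cohort).days)].add(user_id)

    # Cohort size is the number of users at lifetime 0 (the self-join on c0).
    cohort_size = {
        cohort: len(users)
        for (cohort, lifetime), users in active.items()
        if lifetime == 0
    }
    return {
        (cohort, lifetime): len(users) / cohort_size[cohort]
        for (cohort, lifetime), users in active.items()
    }


table = retention_table(activity)
for key in sorted(table):
    print(key[0], key[1], round(table[key], 2))
```

Every cohort has retention 1.0 at lifetime 0 by construction, which is a quick sanity check for the heatmap.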

**Retention Compare with Previous**

In [None]:
WITH first_activity AS (
    SELECT 
        user_id, 
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE 
      source = 'ads'
      and toDate(time) <= '2025-08-15'
    GROUP BY user_id 
    HAVING toDayOfWeek(first_date) = 5 
)
, cohorts as (
  SELECT 
    toDate(time) - first_date as lifetime
    , toString(first_date) as cohort
    , uniq(user_id) as users
    , countIf(action = 'view') as views
    , countIf(action = 'like') as likes
    , likes / nullIf(countIf(action = 'view'), 0) as ctr
  FROM 
    feed_actions f
    JOIN first_activity as fa using(user_id)
  GROUP BY 
    cohort
    , lifetime
)
, cohort_metrics as (
  SELECT 
    c.cohort
    , c.lifetime
    , c.cohort = '2025-08-15' as is_target_cohort
    , c.users
    , c.ctr
    , c.views / nullIf(c0.users, 0) as views_per_cohort_user
    , c.likes / nullIf(c0.users, 0) as likes_per_cohort_user
    , c.views / nullIf(c.users, 0) as views_per_user
    , c.likes / nullIf(c.users, 0) as likes_per_user  
    , c.users / nullIf(c0.users, 0) as retention
  FROM 
    cohorts c
    JOIN cohorts c0 ON c.cohort = c0.cohort and c0.lifetime = 0
  WHERE 
    c.lifetime <= toDate(now()) - toDate('2025-08-15')    
)
SELECT 
  if(is_target_cohort, 'Target', 'Prev Fridays') as cohort_type
  , toDate('2000-01-01') + interval lifetime year as lifetime
  , count() as cohorts_count
  , median(users) as median_users
  , median(ctr) as median_ctr
  , median(views_per_user) as median_views_per_user
  , median(likes_per_user) as median_likes_per_user
  , median(views_per_cohort_user) as median_views_per_cohort_user
  , median(likes_per_cohort_user) as median_likes_per_cohort_user  
  , median(retention) as median_retention
FROM 
  cohort_metrics
GROUP BY 
  cohort_type
  , lifetime

**Rolling Retention Compare with Previous**

In [None]:
WITH first_activity AS (
    SELECT 
        user_id, 
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE 
      source = 'ads'
      and toDate(time) <= '2025-08-15'
    GROUP BY user_id 
    HAVING toDayOfWeek(first_date) = 5 
)
, cohorts as (
  SELECT 
    f.user_id
    , fa.first_date as cohort
    , uniq(user_id) over (partition by cohort) as cohort_size
    , max(toDate(time) - fa.first_date) over (partition by user_id) as max_lifetime
  FROM 
    feed_actions f
    JOIN first_activity fa USING(user_id)
  WHERE 
    toDate(time) - fa.first_date <= 14
)
, cohort_metrics as (
  SELECT 
    c.cohort
    , al.lifetime
    , c.cohort = '2025-08-15' as is_target_cohort
    , any(c.cohort_size) as cohort_size
    , uniqIf(c.user_id, al.lifetime <= c.max_lifetime) as rolling_users
    , rolling_users / cohort_size as rolling_retention_14_days
  FROM
    cohorts c
    CROSS JOIN 
      (SELECT arrayJoin(range(15)) as lifetime) al
  GROUP BY 
    c.cohort
    , al.lifetime
  ORDER BY 
    c.cohort
    , al.lifetime
)
SELECT 
  if(is_target_cohort, 'Target', 'Prev Fridays') as cohort_type
  , toDate('2000-01-01') + interval lifetime year as lifetime
  , count() as cohorts_count
  , median(rolling_retention_14_days) as median_rolling_retention_14_days
FROM 
  cohort_metrics
GROUP BY 
  cohort_type
  , lifetime
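
Rolling retention differs from classic retention: a user counts as retained on day L if they were active on day L or on any later day within the window. The Python sketch below illustrates the calculation for one cohort, assuming each user's last active day offset (the `max_lifetime` of the query) is already known; the data and the function name `rolling_retention` are hypothetical.

```python
def rolling_retention(max_lifetimes, horizon):
    """Share of cohort users whose last active day is on or after each lifetime day.

    max_lifetimes: {user_id: offset in days of the user's last activity in the window}
    horizon: last lifetime day to report (the query uses 14)
    """
    cohort_size = len(max_lifetimes)
    return [
        sum(1 for last_day in max_lifetimes.values() if last_day >= lifetime) / cohort_size
        for lifetime in range(horizon + 1)
    ]


# Hypothetical cohort of four users.
cohort = {1: 0, 2: 3, 3: 1, 4: 3}
print(rolling_retention(cohort, horizon=3))  # [1.0, 0.75, 0.5, 0.5]
```

Unlike classic retention, the resulting curve can never increase from one day to the next, which makes it easier to compare cohorts visually.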

**Retention Compare with Previous by Gender**

In [None]:
WITH first_activity AS (
    SELECT 
        user_id, 
        any(gender) as gender,
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE 
      source = 'ads'
      and toDate(time) <= '2025-08-15'
    GROUP BY user_id 
    HAVING toDayOfWeek(first_date) = 5 
)
, cohorts as (
  SELECT 
    toDate(time) - first_date as lifetime
    , toString(first_date) as cohort
    , multiIf(gender = 1, 'female', gender = 0, 'male', 'unknown') as gender  
    , uniq(user_id) as users
  FROM 
    feed_actions f
    JOIN first_activity as fa using(user_id)
  GROUP BY 
    cohort
    , lifetime
    , gender
)
, cohort_metrics as (
  SELECT 
    c.cohort
    , c.lifetime
    , c.gender
    , c.cohort = '2025-08-15' as is_target_cohort
    , c.users
    , c.users / nullIf(c0.users, 0) as retention
  FROM 
    cohorts c
    JOIN cohorts c0 ON c.cohort = c0.cohort 
    and c.gender = c0.gender
    and c0.lifetime = 0
  WHERE 
    c.lifetime <= toDate(now()) - toDate('2025-08-15')  
)
SELECT 
  if(is_target_cohort, 'Target', 'Prev Fridays') as cohort_type
  , toDate('2000-01-01') + interval lifetime year as lifetime
  , gender
  , count() as cohorts_count
  , median(retention) as median_retention
FROM 
  cohort_metrics
GROUP BY 
  cohort_type
  , lifetime
  , gender

**Retention Compare with Previous by OS**

In [None]:
WITH first_activity AS (
    SELECT 
        user_id, 
        any(os) as os,
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE 
      source = 'ads'
      and toDate(time) <= '2025-08-15'
    GROUP BY user_id 
    HAVING toDayOfWeek(first_date) = 5 
)
, cohorts as (
  SELECT 
    toDate(time) - first_date as lifetime
    , toString(first_date) as cohort
    , os
    , uniq(user_id) as users
  FROM 
    feed_actions f
    JOIN first_activity as fa using(user_id)
  GROUP BY 
    cohort
    , lifetime
    , os
)
, cohort_metrics as (
  SELECT 
    c.cohort
    , c.lifetime
    , c.os
    , c.cohort = '2025-08-15' as is_target_cohort
    , c.users
    , c.users / nullIf(c0.users, 0) as retention
  FROM 
    cohorts c
    JOIN cohorts c0 ON c.cohort = c0.cohort 
      and c.os = c0.os
      and c0.lifetime = 0
  WHERE 
    c.lifetime <= toDate(now()) - toDate('2025-08-15')    
)
SELECT 
  if(is_target_cohort, 'Target', 'Prev Fridays') as cohort_type
  , toDate('2000-01-01') + interval lifetime year as lifetime
  , os
  , count() as cohorts_count
  , median(retention) as median_retention
FROM 
  cohort_metrics
GROUP BY 
  cohort_type
  , lifetime
  , os

**Retention Compare with Previous by Top Countries**

In [None]:
WITH top_countries as(
  SELECT 
    country 
  FROM 
    feed_actions
  GROUP BY 
    country
  ORDER BY 
    uniq(user_id) DESC
  LIMIT 3
)
, first_activity AS (
    SELECT 
        user_id, 
        any(country) as country,
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE 
      source = 'ads'
      and toDate(time) <= '2025-08-15'
    GROUP BY user_id 
    HAVING 
      toDayOfWeek(first_date) = 5 
      and country in (select * from top_countries)
)
, cohorts as (
  SELECT 
    toDate(time) - first_date as lifetime
    , toString(first_date) as cohort
    , country
    , uniq(user_id) as users
  FROM 
    feed_actions f
    JOIN first_activity as fa using(user_id)
  GROUP BY 
    cohort
    , lifetime
    , country
)
, cohort_metrics as (
  SELECT 
    c.cohort
    , c.lifetime
    , c.country
    , c.cohort = '2025-08-15' as is_target_cohort
    , c.users
    , c.users / nullIf(c0.users, 0) as retention
  FROM 
    cohorts c
    JOIN cohorts c0 ON c.cohort = c0.cohort 
      and c.country = c0.country
      and c0.lifetime = 0
  WHERE 
    c.lifetime <= toDate(now()) - toDate('2025-08-15')    
)
SELECT 
  if(is_target_cohort, 'Target', 'Prev Fridays') as cohort_type
  , toDate('2000-01-01') + interval lifetime year as lifetime
  , country
  , count() as cohorts_count
  , median(retention) as median_retention
FROM 
  cohort_metrics
GROUP BY 
  cohort_type
  , lifetime
  , country

**Retention Compare with Previous by Top Cities**

In [None]:
WITH top_cities as(
  SELECT 
    city 
  FROM 
    feed_actions
  GROUP BY 
    city
  ORDER BY 
    uniq(user_id) DESC
  LIMIT 3
)
, first_activity AS (
    SELECT 
        user_id, 
        any(city) as city,
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE 
      source = 'ads'
      and toDate(time) <= '2025-08-15'
    GROUP BY user_id 
    HAVING 
      toDayOfWeek(first_date) = 5 
      and city in (select * from top_cities)
)
, cohorts as (
  SELECT 
    toDate(time) - first_date as lifetime
    , toString(first_date) as cohort
    , city
    , uniq(user_id) as users
  FROM 
    feed_actions f
    JOIN first_activity as fa using(user_id)
  GROUP BY 
    cohort
    , lifetime
    , city
)
, cohort_metrics as (
  SELECT 
    c.cohort
    , c.lifetime
    , c.city
    , c.cohort = '2025-08-15' as is_target_cohort
    , c.users
    , c.users / nullIf(c0.users, 0) as retention
  FROM 
    cohorts c
    JOIN cohorts c0 ON c.cohort = c0.cohort 
      and c.city = c0.city
      and c0.lifetime = 0
  WHERE 
    c.lifetime <= toDate(now()) - toDate('2025-08-15')    
)
SELECT 
  if(is_target_cohort, 'Target', 'Prev Fridays') as cohort_type
  , toDate('2000-01-01') + interval lifetime year as lifetime
  , city
  , count() as cohorts_count
  , median(retention) as median_retention
FROM 
  cohort_metrics
GROUP BY 
  cohort_type
  , lifetime
  , city

#### 3.2.1.5 Cohort Analysis of Other Metrics

**Metrics Compare with Previous**

In [None]:
WITH first_activity AS (
    SELECT 
        user_id, 
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE 
      source = 'ads'
      and toDate(time) <= '2025-08-15'
    GROUP BY user_id 
    HAVING toDayOfWeek(first_date) = 5 
)
, cohorts as (
  SELECT 
    toDate(time) - first_date as lifetime
    , toString(first_date) as cohort
    , uniq(user_id) as users
    , countIf(action = 'view') as views
    , countIf(action = 'like') as likes
    , likes / nullIf(countIf(action = 'view'), 0) as ctr
  FROM 
    feed_actions f
    JOIN first_activity as fa using(user_id)
  GROUP BY 
    cohort
    , lifetime
)
, cohort_metrics as (
  SELECT 
    c.cohort
    , c.lifetime
    , c.cohort = '2025-08-15' as is_target_cohort
    , c.users
    , c.ctr
    , c.views / nullIf(c0.users, 0) as views_per_cohort_user
    , c.likes / nullIf(c0.users, 0) as likes_per_cohort_user
    , c.views / nullIf(c.users, 0) as views_per_user
    , c.likes / nullIf(c.users, 0) as likes_per_user  
    , c.users / nullIf(c0.users, 0) as retention
  FROM 
    cohorts c
    JOIN cohorts c0 ON c.cohort = c0.cohort and c0.lifetime = 0
  WHERE 
    c.lifetime <= toDate(now()) - toDate('2025-08-15')    
)
SELECT 
  if(is_target_cohort, 'Target', 'Prev Fridays') as cohort_type
  , toDate('2000-01-01') + interval lifetime year as lifetime
  , count() as cohorts_count
  , median(users) as median_users
  , median(ctr) as median_ctr
  , median(views_per_user) as median_views_per_user
  , median(likes_per_user) as median_likes_per_user
  , median(views_per_cohort_user) as median_views_per_cohort_user
  , median(likes_per_cohort_user) as median_likes_per_cohort_user  
  , median(retention) as median_retention
FROM 
  cohort_metrics
GROUP BY 
  cohort_type
  , lifetime

**Metrics Heatmap**

In [None]:
WITH first_activity AS (
    SELECT 
        user_id, 
        min(toDate(time)) as first_date
    FROM feed_actions
    WHERE 
      source = 'ads'
      and toDate(time) <= '2025-08-15'
    GROUP BY user_id 
    HAVING toDayOfWeek(first_date) = 5 
)
, cohorts as (
  SELECT 
    toDate(time) - first_date as lifetime
    , toString(first_date) as cohort
    , uniq(user_id) as users
    , countIf(action = 'view') as views
    , countIf(action = 'like') as likes
    , likes / nullIf(countIf(action = 'view'), 0) as ctr
  FROM 
    feed_actions f
    JOIN first_activity as fa using(user_id)
  GROUP BY 
    cohort
    , lifetime
)
SELECT 
  c.cohort
  , c.lifetime
  , c.users
  , c.ctr
  , c.views / nullIf(c0.users, 0) as views_per_cohort_user
  , c.likes / nullIf(c0.users, 0) as likes_per_cohort_user
  , c.views / nullIf(c.users, 0) as views_per_user
  , c.likes / nullIf(c.users, 0) as likes_per_user  
  , c.users / nullIf(c0.users, 0) as retention
FROM 
  cohorts c
  JOIN cohorts c0 ON c.cohort = c0.cohort and c0.lifetime = 0
WHERE 
  c.lifetime <= toDate(now()) - toDate('2025-08-15')

### 3.2.2 Dashboard Implementation

Based on the written queries, a dashboard report was created in Superset.

Screenshots of the dashboard are provided below.

<img src="marketing_company_part_1.png" alt="">
<img src="marketing_company_part_2.png" alt="">
<img src="marketing_company_part_3.png" alt="">
<img src="marketing_company_part_4.png" alt="">
<img src="marketing_company_part_5.png" alt="">
<img src="marketing_company_part_6.png" alt="">
<img src="marketing_company_part_7.png" alt="">
<img src="marketing_company_part_8.png" alt="">
<img src="marketing_company_part_9.png" alt="">
<img src="marketing_company_part_10.png" alt="">

## 3.3 User Drop-off Analysis

### 3.3.1 SQL Queries

The SQL queries below run against the product database and supply the data for the charts.

**DAU**

In [None]:
SELECT 
  user_id 
  , toDate(time) as date
  , 'Feed' as service 
FROM 
  feed_actions
GROUP BY 
  user_id
  , date
UNION ALL
SELECT 
  user_id 
  , toDate(time) as date
  , 'Messenger' as service
FROM 
  message_actions
GROUP BY 
  user_id
  , date

**Users by Hour**

In [None]:
SELECT 
  multiIf(
      f.user_id != 0 and m.user_id != 0, 'both'
      , f.user_id != 0 and m.user_id = 0, 'feed'
      , f.user_id = 0 and m.user_id != 0, 'messenger'
      , 'unknown'
  ) as segment
  , if(f.user_id != 0, f.date, m.date) as date 
  , if(f.user_id != 0, f.user_id, m.user_id) as user_id 
  , if(f.user_id != 0, f.gender, m.gender) as gender_num 
  , if(f.user_id != 0, f.age, m.age) as age 
  , if(f.user_id != 0, f.source, m.source) as source 
  , if(f.user_id != 0, f.os, m.os) as os 
  , if(f.user_id != 0, f.country, m.country) as country 
  , if(f.user_id != 0, f.city, m.city) as city 
  , multiIf(gender_num = 1, 'female', gender_num = 0, 'male', 'unknown') as gender 
  , multiIf(age < 25, ' <25' , age between 25 and 40, '25-40' , age between 41 and 55, '41-55' , age >= 55, '55+' , 'unknown') as age_group 
  , f.posts
  , f.views
  , f.likes
  , f.ctr
  , m.messages
  , m.receivers
  FROM (
    SELECT
      user_id
      , toStartOfHour(time) as date
      , uniq(post_id) as posts
      , countIf(action = 'view') as views
      , countIf(action = 'like') as likes
      , likes / nullIf(views, 0) as ctr
      , any(source) as source
      , any(gender) as gender
      , any(age) as age
      , any(os) as os
      , any(country) as country
      , any(city) as city
    FROM 
      feed_actions 
    WHERE 
      toDate(time) between '2025-08-21' and '2025-08-27'
    GROUP BY 
      user_id
      , date
    ) f
    FULL JOIN (
    SELECT
      user_id
      , toStartOfHour(time) as date
      , count(receiver_id) as messages
      , uniq(receiver_id) as receivers
      , any(source) as source
      , any(gender) as gender
      , any(age) as age
      , any(os) as os
      , any(country) as country
      , any(city) as city
    FROM 
      message_actions  
    WHERE 
      toDate(time) between '2025-08-21' and '2025-08-27'
    GROUP BY 
      user_id
      , date      
    ) m ON f.user_id = m.user_id and f.date = m.date
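
The segment column relies on how a FULL JOIN fills the non-matching side: with the default `join_use_nulls = 0`, ClickHouse fills missing numeric columns with 0, which is why the query tests `user_id != 0`. The Python sketch below emulates the same segmentation with dictionaries keyed by `(user_id, period)`; the data and the function name `segment_rows` are made up for illustration.

```python
def segment_rows(feed, messenger):
    """Emulate the FULL JOIN: one row per (user_id, period) present on either side."""
    rows = []
    for key in sorted(feed.keys() | messenger.keys()):
        in_feed, in_messenger = key in feed, key in messenger
        if in_feed and in_messenger:
            segment = "both"
        elif in_feed:
            segment = "feed"
        else:
            segment = "messenger"
        rows.append(
            {
                "user_id": key[0],
                "period": key[1],
                "segment": segment,
                # Missing side defaults to 0, as with join_use_nulls = 0.
                "views": feed.get(key, {}).get("views", 0),
                "messages": messenger.get(key, {}).get("messages", 0),
            }
        )
    return rows


# Hypothetical hourly aggregates.
feed = {(1, "2025-08-24 10:00"): {"views": 12}, (2, "2025-08-24 10:00"): {"views": 3}}
messenger = {(2, "2025-08-24 10:00"): {"messages": 4}, (3, "2025-08-24 10:00"): {"messages": 1}}

for row in segment_rows(feed, messenger):
    print(row["user_id"], row["segment"], row["views"], row["messages"])
```

The approach assumes that no real user has `user_id = 0`; otherwise such a user would be indistinguishable from a missing join side.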

**Users by Dimensions**

In [None]:
SELECT 
  multiIf(
      f.user_id != 0 and m.user_id != 0, 'both'
      , f.user_id != 0 and m.user_id = 0, 'feed'
      , f.user_id = 0 and m.user_id != 0, 'messenger'
      , 'unknown'
  ) as segment
  , if(f.user_id != 0, f.date, m.date) as date 
  , if(f.user_id != 0, f.user_id, m.user_id) as user_id 
  , if(f.user_id != 0, f.gender, m.gender) as gender_num 
  , if(f.user_id != 0, f.age, m.age) as age 
  , if(f.user_id != 0, f.source, m.source) as source 
  , if(f.user_id != 0, f.os, m.os) as os 
  , if(f.user_id != 0, f.country, m.country) as country 
  , if(f.user_id != 0, f.city, m.city) as city 
  , multiIf(gender_num = 1, 'female', gender_num = 0, 'male', 'unknown') as gender 
  , multiIf(age < 25, ' <25' , age between 25 and 40, '25-40' , age between 41 and 55, '41-55' , age >= 55, '55+' , 'unknown') as age_group 
  , f.posts
  , f.views
  , f.likes
  , f.ctr
  , m.messages
  , m.receivers
  FROM (
    SELECT
      user_id
      , toDate(time) as date
      , uniq(post_id) as posts
      , countIf(action = 'view') as views
      , countIf(action = 'like') as likes
      , likes / nullIf(views, 0) as ctr
      , any(source) as source
      , any(gender) as gender
      , any(age) as age
      , any(os) as os
      , any(country) as country
      , any(city) as city
    FROM 
      feed_actions 
    WHERE date >= toDate('2025-08-01')
      AND date < toDate('2025-09-01')
    GROUP BY 
      user_id
      , date
    ) f
    FULL JOIN (
    SELECT
      user_id
      , toDate(time) as date
      , count(receiver_id) as messages
      , uniq(receiver_id) as receivers
      , any(source) as source
      , any(gender) as gender
      , any(age) as age
      , any(os) as os
      , any(country) as country
      , any(city) as city
    FROM 
      message_actions  
    WHERE date >= toDate('2025-08-01')
      AND date < toDate('2025-09-01')
    GROUP BY 
      user_id
      , date      
    ) m ON f.user_id = m.user_id and f.date = m.date

### 3.3.2 Dashboard Implementation

Based on the written queries, a dashboard report was created in Superset.

Screenshots of the dashboard are provided below.

<img src="user_dropoff_part_1.png" alt="">
<img src="user_dropoff_part_2.png" alt="">
<img src="user_dropoff_part_3.png" alt="">
<img src="user_dropoff_part_4.png" alt="">
<img src="user_dropoff_part_5.png" alt="">

## 3.4 Users Decomposition

### 3.4.1 SQL Queries

The SQL queries below run against the product database and supply the data for the charts.

**Feed Users Decomposition**

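Simple method: a user counts as "new" in any week in which they are active but were not active in the previous week, so returning users are counted as new again; "gone" users are those who were active in the previous week but not in the current one.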

In [None]:
WITH user_stats as (
  SELECT 
    user_id
    , groupUniqArray(toMonday(time)) as user_weeks
    , arrayJoin(user_weeks) as week_start
  FROM 
    feed_actions  
  GROUP BY 
    user_id
)
, retained_and_new_users as (
  SELECT 
    week_start as current_week
    , uniqExactIf(user_id, has(user_weeks, addWeeks(week_start, -1))) as retained_users
    , uniqExact(user_id) - retained_users as new_users
  FROM 
    user_stats
  GROUP BY 
    week_start
)
, gone_users as (
  SELECT 
    addWeeks(week_start, 1) as current_week
    , -uniqExactIf(user_id, NOT has(user_weeks, current_week)) as gone_users
  FROM 
    user_stats   
  GROUP BY 
    week_start    
)
SELECT 
  rn.current_week
  , rn.new_users
  , rn.retained_users
  , g.gone_users
FROM 
  retained_and_new_users rn
  LEFT JOIN gone_users g ON rn.current_week = g.current_week
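
The week-over-week logic of this query can be checked by hand on a few users. The sketch below is a minimal Python version working on sets of active weeks per user; the data and the function name `decompose_weekly` are hypothetical, and gone users are reported as negative numbers to match the query.

```python
from datetime import date, timedelta


def decompose_weekly(user_weeks):
    """Return {week_start: (new, retained, gone)} by comparing each week with the previous one.

    user_weeks: {user_id: set of week-start dates in which the user was active}
    """
    weeks = sorted(set().union(*user_weeks.values()))
    result = {}
    for week in weeks:
        previous = week - timedelta(weeks=1)
        active = {u for u, ws in user_weeks.items() if week in ws}
        active_before = {u for u, ws in user_weeks.items() if previous in ws}
        retained = active & active_before   # active in both weeks
        new = active - active_before        # active now, not in the previous week
        gone = active_before - active       # active in the previous week, not now
        result[week] = (len(new), len(retained), -len(gone))
    return result


W1, W2, W3 = date(2025, 8, 4), date(2025, 8, 11), date(2025, 8, 18)
# Hypothetical users: user 4 skips W2 and comes back in W3.
users = {1: {W1, W2, W3}, 2: {W1}, 3: {W2, W3}, 4: {W1, W3}}

for week, counts in decompose_weekly(users).items():
    print(week, counts)
```

In this toy data user 4 returns in the third week and is counted as new there, which is the main limitation of comparing only adjacent weeks.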

Advanced method: a user counts as "new" only in the first week they appear in the data; "gone" is the number of all previously seen users who are inactive in the current week.

In [None]:
WITH feed_weekly as (
  SELECT 
    user_id
    , toMonday(time) as week_start
  FROM 
    feed_actions
  GROUP BY 
    user_id
    , week_start
)
, user_first_week AS (
    SELECT 
      user_id
      , min(week_start) as first_week
    FROM 
      feed_weekly
    GROUP BY 
      user_id
)
, week_stats as (
  SELECT  
    fw.week_start
    , uniq(fw.user_id) as active_users
    , uniqIf(fw.user_id, fw.week_start = ufw.first_week) as new_users
  FROM 
    feed_weekly fw
    JOIN user_first_week ufw USING(user_id)
  GROUP BY 
    fw.week_start
)
, week_stats_cum as (
  SELECT 
    week_start
    , new_users
    , sum(new_users) OVER (ORDER BY week_start ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as new_users_cum_before
    , active_users - new_users as retained_users
  FROM 
    week_stats
)
SELECT 
  week_start
  , new_users
  , retained_users
  , -1 * (new_users_cum_before - retained_users) as gone_users
FROM 
  week_stats_cum  
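
In this variant the gone count is derived from a running total: all users first seen before the current week, minus those of them who are active in it. The Python sketch below follows the same arithmetic on made-up data; the function name `decompose_cumulative` and the sample users are hypothetical.

```python
from datetime import date


def decompose_cumulative(user_weeks):
    """Return {week_start: (new, retained, gone)} with 'new' only in a user's first week.

    user_weeks: {user_id: set of week-start dates in which the user was active}
    """
    weeks = sorted(set().union(*user_weeks.values()))
    first_week = {u: min(ws) for u, ws in user_weeks.items()}
    result = {}
    seen_before = 0  # running total of new users in earlier weeks
    for week in weeks:
        active = sum(1 for ws in user_weeks.values() if week in ws)
        new = sum(1 for fw in first_week.values() if fw == week)
        retained = active - new
        gone = -(seen_before - retained)  # previously seen users who are inactive now
        result[week] = (new, retained, gone)
        seen_before += new
    return result


W1, W2, W3 = date(2025, 8, 4), date(2025, 8, 11), date(2025, 8, 18)
# Hypothetical users: user 2 never returns, user 4 skips W2 and comes back in W3.
users = {1: {W1, W2, W3}, 2: {W1}, 3: {W2, W3}, 4: {W1, W3}}

for week, counts in decompose_cumulative(users).items():
    print(week, counts)
```

Here user 2 still contributes to the gone count in the third week, so the column behaves like a stock of inactive users rather than a weekly outflow, and a returning user such as user 4 is counted as retained rather than new.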

**Messenger Users Decomposition**

In [None]:
WITH mess_weekly as (
  SELECT 
    user_id
    , toMonday(time) as week_start
  FROM 
    message_actions 
  GROUP BY 
    user_id
    , week_start
)
, user_first_week AS (
    SELECT 
      user_id
      , min(week_start) as first_week
    FROM 
      mess_weekly
    GROUP BY 
      user_id
)
, week_stats as (
  SELECT  
    fw.week_start
    , uniq(fw.user_id) as active_users
    , uniqIf(fw.user_id, fw.week_start = ufw.first_week) as new_users
  FROM 
    mess_weekly fw
    JOIN user_first_week ufw USING(user_id)
  GROUP BY 
    fw.week_start
)
, week_stats_cum as (
  SELECT 
    week_start
    , new_users
    , sum(new_users) OVER (ORDER BY week_start ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as new_users_cum_before
    , active_users - new_users as retained_users
  FROM 
    week_stats
)
SELECT 
  week_start
  , new_users
  , retained_users
  , -1 * (new_users_cum_before - retained_users) as gone_users
FROM 
  week_stats_cum

**Both Services Users Decomposition**

In [None]:
WITH feed_weekly as (
  SELECT 
    user_id
    , toMonday(time) as week_start
  FROM 
    feed_actions
  GROUP BY 
    user_id
    , week_start
)
, mess_weekly as (
  SELECT 
    user_id
    , toMonday(time) as week_start
  FROM 
    message_actions 
  GROUP BY 
    user_id
    , week_start
)
, both_weekly as (
  SELECT 
    f.user_id
    , f.week_start
  FROM 
    feed_weekly f
    JOIN mess_weekly m ON f.user_id = m.user_id and f.week_start = m.week_start
)
, user_first_week AS (
    SELECT 
      user_id
      , min(week_start) as first_week
    FROM 
      both_weekly
    GROUP BY 
      user_id
)
, week_stats as (
  SELECT  
    bw.week_start
    , uniq(bw.user_id) as active_users
    , uniqIf(bw.user_id, bw.week_start = ufw.first_week) as new_users
  FROM 
    both_weekly bw
    JOIN user_first_week ufw USING(user_id)
  GROUP BY 
    bw.week_start
)
, week_stats_cum as (
  SELECT 
    week_start
    , new_users
    , sum(new_users) OVER (ORDER BY week_start ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as new_users_cum_before
    , active_users - new_users as retained_users
  FROM 
    week_stats
)
SELECT 
  week_start
  , new_users
  , retained_users
  , -1 * (new_users_cum_before - retained_users) as gone_users
FROM 
  week_stats_cum  

**Only Feed Users Decomposition**

In [None]:
WITH feed_weekly as (
  SELECT 
    user_id
    , toMonday(time) as week_start
  FROM 
    feed_actions
  GROUP BY 
    user_id
    , week_start
)
, mess_weekly as (
  SELECT 
    user_id
    , toMonday(time) as week_start
  FROM 
    message_actions 
  GROUP BY 
    user_id
    , week_start
)
, only_feed_weekly as (
  SELECT 
    f.user_id
    , f.week_start
  FROM 
    feed_weekly f
    LEFT ANTI JOIN mess_weekly m ON f.user_id = m.user_id and f.week_start = m.week_start
)
, user_first_week AS (
    SELECT 
      user_id
      , min(week_start) as first_week
    FROM 
      only_feed_weekly
    GROUP BY 
      user_id
)
, week_stats as (
  SELECT  
    fw.week_start
    , uniq(fw.user_id) as active_users
    , uniqIf(fw.user_id, fw.week_start = ufw.first_week) as new_users
  FROM 
    only_feed_weekly fw
    JOIN user_first_week ufw USING(user_id)
  GROUP BY 
    fw.week_start
)
, week_stats_cum as (
  SELECT 
    week_start
    , new_users
    , sum(new_users) OVER (ORDER BY week_start ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as new_users_cum_before
    , active_users - new_users as retained_users
  FROM 
    week_stats
)
SELECT 
  week_start
  , new_users
  , retained_users
  , -1 * (new_users_cum_before - retained_users) as gone_users
FROM 
  week_stats_cum  

**Only Messenger Users Decomposition**

In [None]:
WITH feed_weekly as (
  SELECT 
    user_id
    , toMonday(time) as week_start
  FROM 
    feed_actions
  GROUP BY 
    user_id
    , week_start
)
, mess_weekly as (
  SELECT 
    user_id
    , toMonday(time) as week_start
  FROM 
    message_actions 
  GROUP BY 
    user_id
    , week_start
)
, only_mess_weekly as (
  SELECT 
    m.user_id
    , m.week_start
  FROM 
    mess_weekly m
    LEFT ANTI JOIN feed_weekly f ON f.user_id = m.user_id and f.week_start = m.week_start
)
, user_first_week AS (
    SELECT 
      user_id
      , min(week_start) as first_week
    FROM 
      only_mess_weekly
    GROUP BY 
      user_id
)
, week_stats as (
  SELECT  
    omw.week_start
    , uniq(omw.user_id) as active_users
    , uniqIf(omw.user_id, omw.week_start = ufw.first_week) as new_users
  FROM 
    only_mess_weekly omw
    JOIN user_first_week ufw USING(user_id)
  GROUP BY 
    omw.week_start
)
, week_stats_cum as (
  SELECT 
    week_start
    , new_users
    , sum(new_users) OVER (ORDER BY week_start ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) as new_users_cum_before
    , active_users - new_users as retained_users
  FROM 
    week_stats
)
SELECT 
  week_start
  , new_users
  , retained_users
  , -1 * (new_users_cum_before - retained_users) as gone_users
FROM 
  week_stats_cum  
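
The three segment queries differ only in how the weekly feed and messenger activity are combined: an inner join for users of both services and anti joins for single-service users. The same combinations can be expressed as set operations, as in this small Python sketch on made-up `(user_id, week_start)` pairs.

```python
# Hypothetical weekly activity as (user_id, week_start) pairs.
feed_weekly = {(1, "2025-08-04"), (2, "2025-08-04"), (1, "2025-08-11")}
mess_weekly = {(2, "2025-08-04"), (3, "2025-08-04"), (1, "2025-08-11")}

both_weekly = feed_weekly & mess_weekly       # inner join on user and week
only_feed_weekly = feed_weekly - mess_weekly  # feed rows without a messenger match
only_mess_weekly = mess_weekly - feed_weekly  # messenger rows without a feed match

print(sorted(both_weekly))
print(sorted(only_feed_weekly))
print(sorted(only_mess_weekly))
```

Membership is decided per week, so user 1 is a feed-only user in the first week and a user of both services in the second. As a consequence, the first week used for the "new" flag is the first week within the segment, not the user's first week in the product.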

### 3.4.2 Dashboard Implementation

Based on the written queries, a dashboard report was created in Superset.

Screenshots of the dashboard are provided below.

<img src="user_decomposition_part_1.png" alt="">
<img src="user_decomposition_part_2.png" alt="">

## 3.5 General Conclusion

**Implementation Summary:**

- **Retention Analysis Completed:** Conducted comprehensive cohort analysis of advertising campaign users versus organic users
- **User Segmentation Identified:** Segmented and analyzed users who experienced news feed access issues
- **Root Cause Investigation:** Identified common characteristics and potential technical factors behind the audience drop
- **User Lifecycle Tracking Implemented:** Built ClickHouse queries and a Superset visualization for monitoring New, Retained, and Churned user cohorts
- **Data Products Delivered:** Created Superset dashboards for ongoing monitoring of retention patterns and user engagement issues
- **Goal Achieved:** Provided data-driven insights that enabled marketing to optimize campaign strategies and engineering to address critical technical issues
  
**Main Conclusion:**

- Advertising Campaign Analysis
  - Campaign successfully attracted a large volume of new users, significantly exceeding previous acquisition rates
  - Initial engagement metrics showed no significant difference from organic users
  - Critical Issue: Day-1 retention rate for campaign users was substantially lower than historical Friday cohorts

- Audience Drop Investigation
  - Significant drop in feed engagement occurred on August 24, 2025
  - Affected Areas: Moscow, St. Petersburg, Yekaterinburg, Novosibirsk showed concentrated impact
  - Isolated Issue: Messenger functionality remained unaffected in these cities
  
**Key Recommendations:**  

- Implement more targeted advertising based on analysis of existing active user profiles rather than broad audience targeting
- Investigate technical infrastructure in affected regions and implement redundancy measures to prevent future service disruptions