# Case Study: Growth

## Introduction

Company A is a popular photo editing product with millions of active users. The Growth Unit is responsible for expanding the user base, increasing user engagement, and retaining users. 
The unit wants to understand **user behavior and engagement** with the product. A dataset containing user activity data is provided below. Your task is to analyze this data and provide **actionable insights** and recommendations to **improve user growth and engagement**. 
In this task, you will work with datasets that contain event logs and app open data for a specified period.

#### Part 1
1- Write a SQL query to derive the first session funnel per platform from the provided dataset using all available events. The first funnel step is the user’s first app open.
2- Calculate the WAU Growth Accounting (New, Retained, Resurrected, Churn users) from November 13th to November 19th
3- Write a SQL query to calculate the user’s daily retention rate for a 7-day period. Calculate both Sticky and Cohorted Retention

#### Part 2 
1- Please perform an analysis using the dataset provided. The goal is to find insights and suggest action(s) to stakeholders through your findings. Use Python, R, or a similar programming language to generate an analysis that could be shared/reviewed by other analysts.

Expected output: A memo in the form of a Google Doc that describes the findings and outlines action items. Include additional attachments (spreadsheets, Jupyter notebooks, etc.) only if necessary.

## Data Understanding and Cleaning

#### App Open 
______________________________________________________

| Column        | Data Type         | Description                               | Unique Key|
| ----------- | ----------- | ----------- | ----------- |
| timestamp     |TIMESTAMP          |Timestamp when the app was opened          |No|
| device_skey   |STRING             |Unique identifier for the device           |No|
| session_skey  |STRING             |Unique identifier for the session          |No|
| user_skey     |STRING             |Unique identifier for the users            |No|
| is_first_app_open |BOOLEAN        |Indicator of whether it's the first time app open for the user |No|
| platform      |STRING             |Platform on which the app was opened (e.g., android, apple) |No|
| country_code  |STRING             |Country from which the app was opened (e.g., us, de) |No|

In [1]:
apps_open = _deepnote_execute_sql('SELECT \n    timestamp::TIMESTAMP_S AS date_time\n    , device_skey::VARCHAR AS device_skey\n    , session_skey::VARCHAR AS session_skey\n    , user_skey::VARCHAR AS user_skey\n    , is_first_app_open::BOOLEAN AS is_first_app_open\n    , platform::VARCHAR AS platform\n    , country_code::VARCHAR AS country_code\nFROM \'ps_challanges/growth_case_study/Datasets/App Opens.csv\'', 'SQL_DEEPNOTE_DATAFRAME_SQL', audit_sql_comment='', sql_cache_mode='cache_disabled')
apps_open

Unnamed: 0,date_time,device_skey,session_skey,user_skey,is_first_app_open,platform,country_code
0,2023-11-08 02:07:19,7891415701568881512,2011583431778737851,-8717167265529084706,False,android,br
1,2023-11-08 06:26:35,2841158506866498595,-710585864896703953,3858142552250413010,False,android,vn
2,2023-11-08 11:00:09,8319180276841915965,8851130170214692120,-6901249382849255824,False,apple,vn
3,2023-11-08 17:03:55,-7044717482868540199,2562809385083234043,42,False,apple,vn
4,2023-11-08 10:09:35,8330275025360404105,6269721242185127990,-8100364697395241128,False,android,de
...,...,...,...,...,...,...,...
196085,2023-11-15 11:47:15,-1985881793666309748,1078423291596019483,3098934470375878848,False,android,br
196086,2023-11-15 15:24:53,2134527128238454256,-3747564580958942091,-8099055897267782491,False,apple,br
196087,2023-11-15 15:31:40,-2987017123302243105,-6832478710689874856,-8250782251817560378,False,android,vn
196088,2023-11-15 20:41:23,1438462127647457196,-5384106018072454101,-46678441358607097,False,android,br


#### Events
______________________________________________________


| Column        | Data Type         | Description                               | Unique Key|
| ----------- | ----------- | ----------- | ----------- |
| event_name    |STRING             |Type of event that occurred                |No|
| timestamp     |TIMESTAMP          |Timestamp when the event occurred          |No|
| event_skey    |STRING             |Unique identifier for the event            |Yes|
| device_skey   |STRING             |Unique identifier for the device           |No|
| user_skey     |STRING             |Unique identifier for the user             |No|
| session_skey  |STRING             |Unique identifier for the session          |No|
| platform      |STRING             |Platform on which the event occurred (e.g., android, apple) |No|
| source        |STRING             |Source screen from which the event was triggered |No|
| country_code  |STRING             |Country from which the app was opened (e.g., us, de) |No|

In [2]:
events = _deepnote_execute_sql('SELECT \n    event_name::STRING AS event_name\n    , epoch_ms(timestamp)::TIMESTAMP_S AS date_time\n    , event_skey::STRING AS event_skey\n    , device_skey::STRING AS device_skey\n    , user_skey::STRING AS user_skey\n    , session_skey::STRING AS session_skey\n    , platform::STRING AS platform\n    , source::STRING AS source\n    , country_code::STRING AS country_code\nFROM \'ps_challanges/growth_case_study/Datasets/Events.csv\'', 'SQL_DEEPNOTE_DATAFRAME_SQL', audit_sql_comment='', sql_cache_mode='cache_disabled')
events

Unnamed: 0,event_name,date_time,event_skey,device_skey,user_skey,session_skey,platform,source,country_code
0,registration_open,2023-11-11 03:32:05,-927509160255567445,-724186991312688217,,5342905048311674142,apple,app_start,vn
1,registration_open,2023-11-11 03:16:02,6726005894254876970,167579040429311423,,1033335939576521366,android,app_start,vn
2,registration_open,2023-11-10 15:18:40,-2921948449119298022,-753449636207296314,,7698330632377898188,apple,app_start,vn
3,registration_open,2023-11-10 15:46:26,920368455108193822,6151555484025284746,,1455590834684669000,android,user_profile,br
4,registration_open,2023-11-10 15:46:35,4301569710999595535,6151555484025284746,,1455590834684669000,android,user_profile,br
...,...,...,...,...,...,...,...,...,...
762098,object_export,2023-11-06 03:19:28,8762455078106490544,-218442561903040844,-7394739150698595004,4931081474184387585,apple,editor_screen,vn
762099,object_export,2023-11-06 03:19:33,1236197224848669172,-8953068508614011099,-5616148740068329164,1939940093157332532,apple,editor_add_objects,vn
762100,object_export,2023-11-06 00:40:15,-1128268909705829967,7622198276030932501,-4476277103857062859,-914698409805777184,apple,share_screen,br
762101,object_export,2023-11-06 03:46:08,-5041445763328466000,-2972284481220770541,5642301744160861251,-2767914012090684930,android,editor_add_objects,eg


#### Visitors
______________________________________________________

| Column        | Data Type         | Description                               | Unique Key|
| ----------- | ----------- | ----------- | ----------- |
| Date          |STRING             |Date of visit                              |Yes|
| Store Visitors. Source: search |BIGINT |Number of store visitors from the source search |No|
| Store Visitors. Source: explore |BIGINT |Number of store visitors from the source explore |No|
| Store Visitors. Source: referrals |BIGINT |Number of store visitors from the source referrals |No|

In [3]:
visitors = _deepnote_execute_sql('SELECT \n    Date::TIMESTAMP_S AS date_time\n    , "Store Visitors. Source: search"::BIGINT AS store_visitors_source_search\n    , "Store Visitors. Source: explore"::BIGINT AS store_visitors_source_explore\n    , "Store Visitors. Source: referrals"::BIGINT AS store_visitors_source_referrals\nFROM \'ps_challanges/growth_case_study/Datasets/store_visitors.csv\'', 'SQL_DEEPNOTE_DATAFRAME_SQL', audit_sql_comment='', sql_cache_mode='cache_disabled')
visitors

Unnamed: 0,date_time,store_visitors_source_search,store_visitors_source_explore,store_visitors_source_referrals
0,2023-11-01,6,6,5
1,2023-11-02,7,8,7
2,2023-11-03,3,3,2
3,2023-11-04,286,3,3
4,2023-11-05,7,9,6
5,2023-11-06,2,4,2
6,2023-11-07,4,7,4
7,2023-11-08,1,3,2
8,2023-11-09,6,12,6
9,2023-11-10,8,15,7


## Part 1 - SQL

> 1- Write an SQL query to derive the first session funnel per platform from the provided dataset using all available events. The first funnel step is the user’s first app open.

In [4]:
ranked = _deepnote_execute_sql('WITH base AS (\n    SELECT \n      events.date_time\n      , events.user_skey\n      , events.platform\n      , events.session_skey\n      , events.event_name\n    FROM events \n    INNER JOIN apps_open USING(session_skey)\n)\n, ranked as (\n    SELECT     \n    session_skey\n    , user_skey\n    , platform\n    , event_name\n    , RANK() OVER (PARTITION BY session_skey ORDER BY date_time) AS funnel_step\n    FROM base\n)\n, ordered AS (SELECT \n     funnel_step\n    , event_name\n    , COUNT(DISTINCT CASE WHEN platform = \'apple\' THEN user_skey END) AS apple\n    , COUNT(DISTINCT CASE WHEN platform = \'android\' THEN user_skey END) AS android\nFROM ranked\nGROUP BY funnel_step, event_name\n)\n\n-- SELECT * FROM ordered\n\nSELECT DISTINCT\n    funnel_step\n    , FIRST(CONCAT(event_name, \': \', apple, \' users\')) OVER (PARTITION BY funnel_step ORDER BY apple desc) AS apple_event \n    , FIRST(CONCAT(event_name, \': \', android, \' users\')) OVER (PARTITION BY funnel_step ORDER BY android desc) AS android_event\nFROM ordered\nWHERE funnel_step <= 5\nORDER BY funnel_step ASC', 'SQL_DEEPNOTE_DATAFRAME_SQL', audit_sql_comment='', sql_cache_mode='cache_disabled')
ranked

Unnamed: 0,funnel_step,apple_event,android_event
0,1,create_flow_open: 203 users,create_flow_open: 226 users
1,2,subscription_offer_open: 95 users,editor_open: 127 users
2,3,editor_open: 113 users,object_export: 89 users
3,4,object_export: 81 users,object_export: 70 users
4,5,object_export: 42 users,object_export: 45 users
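
As a cross-check on the ranked query above, one alternative reading of the task counts, per platform, how many first-open users reached each event at least once during their first session, with the first app open itself as step 1. The following is a minimal pandas sketch under that assumption, not the method used in the query. The function name `first_session_funnel` and the toy rows are hypothetical; only the column names come from the App Open and Events tables. The user and platform are taken from the app open rows because `user_skey` can be empty in the events data.

```python
import pandas as pd


def first_session_funnel(opens: pd.DataFrame, events: pd.DataFrame) -> pd.DataFrame:
    """Distinct users per platform at the first app open and at each event
    fired in that same session (hypothetical helper, illustrative only)."""
    # Sessions that contain the user's first app open.
    first = opens.loc[
        opens["is_first_app_open"], ["session_skey", "user_skey", "platform"]
    ].drop_duplicates()

    # Step 1: users with a first app open, per platform.
    step_one = first.groupby("platform")["user_skey"].nunique().rename("first_app_open")

    # Later steps: events fired inside those first sessions.
    joined = events[["session_skey", "event_name"]].merge(first, on="session_skey")
    reached = (
        joined.groupby(["event_name", "platform"])["user_skey"]
        .nunique()
        .unstack("platform")
    )

    # Order events by how many users reached them, largest first.
    order = reached.sum(axis=1).sort_values(ascending=False).index
    funnel = pd.concat([step_one.to_frame().T, reached.loc[order]])
    return funnel.fillna(0).astype(int)


# Toy data (made up for illustration).
toy_opens = pd.DataFrame({
    "session_skey": ["s1", "s2", "s3"],
    "user_skey": ["u1", "u2", "u1"],
    "platform": ["apple", "android", "apple"],
    "is_first_app_open": [True, True, False],
})
toy_events = pd.DataFrame({
    "session_skey": ["s1", "s1", "s2", "s3"],
    "event_name": ["create_flow_open", "editor_open", "create_flow_open", "object_export"],
})
funnel = first_session_funnel(toy_opens, toy_events)
print(funnel)
```

Because every later step is restricted to sessions flagged with `is_first_app_open`, no step can exceed the first row, which makes drop-off between steps easier to read.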


> 2. Calculate the WAU Growth Accounting (New, Retained, Resurrected, Churn users) from November 13th to November 19th

1. New Users: Users who start using the application during the current week.
2. Retained Users: Users who used the application in both the current week and the previous week.
3. Resurrected Users: Users who used the app in the current week and at some point before the previous week, but not in the previous week.
4. Churned Users: Users who used the app in the previous week, but not the current week.
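
The four definitions above map directly onto set operations over the users active in each week. The sketch below illustrates this in pandas. It assumes that "new" means no activity observed in any earlier week of the data, rather than relying on `is_first_app_open`. The function name `growth_accounting` and the toy rows are hypothetical.

```python
import pandas as pd


def growth_accounting(opens: pd.DataFrame, current_week: str, previous_week: str) -> dict:
    """Count new, retained, resurrected and churned users.

    Weeks are identified by their Monday, e.g. "2023-11-13".
    Assumption: a user is "new" if no earlier activity exists in `opens`.
    """
    current_week = pd.Timestamp(current_week)
    previous_week = pd.Timestamp(previous_week)

    # Monday of the week each app open falls in.
    day = opens["date_time"].dt.normalize()
    week = day - pd.to_timedelta(day.dt.weekday, unit="D")

    current = set(opens.loc[week == current_week, "user_skey"])
    previous = set(opens.loc[week == previous_week, "user_skey"])
    earlier = set(opens.loc[week < previous_week, "user_skey"])

    return {
        "new": len(current - (previous | earlier)),
        "retained": len(current & previous),
        "resurrected": len((current - previous) & earlier),
        "churned": len(previous - current),
    }


# Toy data (made up for illustration).
toy_opens = pd.DataFrame({
    "user_skey": ["a", "a", "b", "c", "d", "d"],
    "date_time": pd.to_datetime([
        "2023-11-07", "2023-11-14",  # a: active in both weeks
        "2023-11-08",                # b: previous week only
        "2023-11-15",                # c: current week only, no history
        "2023-10-31", "2023-11-16",  # d: current week plus older history
    ]),
})
result = growth_accounting(toy_opens, "2023-11-13", "2023-11-06")
print(result)
```

Under these definitions, new + retained + resurrected always equals the current week's active users, which is a useful sanity check. Resurrected users can only be observed when the data reaches back further than the previous week.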

In [5]:
base_users = _deepnote_execute_sql('WITH  base AS (\nSELECT \n    date_trunc(\'week\', date_time) AS week\n    , user_skey\n    , is_first_app_open\nFROM apps_open\n) \n\n, normalized AS (\n    SELECT \n        user_skey\n        , CASE WHEN week = \'2023-11-06\' AND is_first_app_open = TRUE THEN \'new_week_1\'\n            WHEN week = \'2023-11-06\' AND is_first_app_open = FALSE THEN \'existing_week_1\'\n            WHEN week = \'2023-11-13\' AND is_first_app_open = TRUE THEN \'new_week_2\'\n            WHEN week = \'2023-11-13\' AND is_first_app_open = FALSE THEN \'existing_week_2\' END AS category \n    FROM base\n)\n\n, pivoted AS (PIVOT normalized ON category USING COUNT(DISTINCT user_skey) GROUP BY user_skey)\n\n, final AS (\nSELECT DISTINCT\n *\n    , CASE WHEN new_week_1 = 1 THEN \'new\'\n        WHEN existing_week_1 = 1 THEN \'existing\' END AS week_1\n    , CASE WHEN new_week_2 = 1  AND new_week_1 = 0 THEN \'new\'\n        WHEN existing_week_2 = 1 THEN \'existing\' END AS week_2\nFROM pivoted \n)\n\n    SELECT \n      COUNT(DISTINCT CASE WHEN week_1 = \'new\' THEN user_skey END) AS new_week_1\n      , COUNT( DISTINCT CASE WHEN week_2 = \'new\' THEN user_skey END) AS new_week_2\n\n      , COUNT( DISTINCT CASE WHEN week_1 = \'existing\' THEN user_skey END) AS existing_week_1\n      , COUNT( DISTINCT CASE WHEN week_2 = \'existing\' THEN user_skey END) AS existing_week_2\n\n      , COUNT( DISTINCT CASE WHEN week_1 IS NOT NULL THEN user_skey END) AS total_week_1\n      , COUNT( DISTINCT CASE WHEN week_2 IS NOT NULL THEN user_skey END) AS total_week_2\n\n      , ((total_week_2/total_week_1 -1) * 100)::DECIMAL(10,2) AS growth\n      , COUNT( DISTINCT CASE WHEN week_1 IS NULL AND week_2 = \'existing\' THEN user_skey END) AS resurrected\n      , COUNT( DISTINCT CASE WHEN week_1 IS NOT NULL AND week_2 = \'existing\' THEN user_skey END) AS retained\n      , COUNT( DISTINCT CASE WHEN week_1 IS NOT NULL AND week_2 IS NULL THEN user_skey END) AS churned\n    FROM 
final', 'SQL_DEEPNOTE_DATAFRAME_SQL', audit_sql_comment='', sql_cache_mode='cache_disabled')
base_users

Unnamed: 0,new_week_1,new_week_2,existing_week_1,existing_week_2,total_week_1,total_week_2,growth,resurrected,retained,churned
0,2,1,68070,69412,68072,69413,1.97,67441,1971,66101


> 3. Write a SQL query to calculate the user’s daily retention rate for 7 days. Calculate both Sticky and Cohorted Retention

Retention rate is a measure of how many users return to the app after their first visit. 
- Cohorted Retention: This looks at users who started using the application on a specific date, then checks to see if they're still using the app after a certain number of days.
- Sticky Retention: This measures how frequently users engage with the app within a longer period. It is calculated by dividing the Daily Active Users (DAU) by the Weekly or Monthly Active Users (WAU or MAU).
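
Both definitions can be sketched in pandas as follows. This is an illustration under stated assumptions, not the queries used in this notebook. A cohort is defined by each user's first observed day in the data, and stickiness is the average DAU over days with activity divided by WAU. The function names `sticky_retention` and `cohort_retention` and the toy rows are hypothetical.

```python
import pandas as pd


def sticky_retention(activity: pd.DataFrame) -> pd.Series:
    """Average DAU divided by WAU, per calendar week starting on Monday."""
    day = activity["date_time"].dt.normalize()
    week = day - pd.to_timedelta(day.dt.weekday, unit="D")
    frame = pd.DataFrame({"day": day, "week": week, "user": activity["user_skey"]})

    dau = frame.groupby(["week", "day"])["user"].nunique()
    avg_dau = dau.groupby(level="week").mean()  # days without activity are not counted
    wau = frame.groupby("week")["user"].nunique()
    return avg_dau / wau


def cohort_retention(activity: pd.DataFrame, max_day: int = 7) -> pd.DataFrame:
    """Share of each start-date cohort that is active N days after its start.

    Assumption: a user's cohort is their first observed day in `activity`.
    """
    frame = pd.DataFrame({
        "user": activity["user_skey"],
        "day": activity["date_time"].dt.normalize(),
    }).drop_duplicates()
    frame["start"] = frame.groupby("user")["day"].transform("min")
    frame["offset"] = (frame["day"] - frame["start"]).dt.days
    frame = frame[frame["offset"] <= max_day]

    counts = frame.pivot_table(
        index="start", columns="offset", values="user", aggfunc="nunique"
    )
    # Column 0 is the cohort size, so every rate is at most 1.0.
    return counts.div(counts[0], axis=0)


# Toy data (made up for illustration): two users start on Monday, one returns on Tuesday.
toy_activity = pd.DataFrame({
    "user_skey": ["a", "a", "b"],
    "date_time": pd.to_datetime(["2023-11-13 09:00", "2023-11-14 10:00", "2023-11-13 12:00"]),
})
sticky = sticky_retention(toy_activity)
retention = cohort_retention(toy_activity)
print(sticky)
print(retention)
```

Dividing by the cohort's day-0 size keeps every cohorted rate at or below 100%, because only users who belong to the cohort are counted on later days.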

In [7]:
cohort = _deepnote_execute_sql('WITH cohorts AS (\n    SELECT\n        apps_open.user_skey,\n        DATE_TRUNC(\'day\', MIN(apps_open.date_time)) AS start_date\n    FROM apps_open\n    WHERE date_time > \'2023-11-07\'\n    GROUP BY 1\n)\n\n, cohort AS (\n    SELECT\n        start_date\n        , \'day\' AS beginning\n        , date_diff(\'day\', start_date, DATE_TRUNC(\'day\', date_time)) AS cohort_date\n        , COUNT(DISTINCT user_skey) AS total_users\n    FROM apps_open\n    JOIN cohorts USING (user_skey)\n    GROUP BY 1,2,3\n)\n\n,pivoted AS (\nPIVOT (SELECT * FROM cohort WHERE cohort_date <= 7) \n    ON beginning,cohort_date USING SUM(total_users)\n)\n\nSELECT \n    start_date\n        , day_1\n        , (day_2/day_1 * 100)::DECIMAL(10,2) || \'%\' AS retention_rate_day_2\n        , (day_3/day_1 * 100)::DECIMAL(10,2) || \'%\' AS retention_rate_day_3\n        , (day_4/day_1 * 100)::DECIMAL(10,2) || \'%\' AS retention_rate_day_4\n        , (day_5/day_1 * 100)::DECIMAL(10,2) || \'%\' AS retention_rate_day_5\n        , (day_6/day_1 * 100)::DECIMAL(10,2) || \'%\' AS retention_rate_day_6\n        , (day_7/day_1 * 100)::DECIMAL(10,2) || \'%\' AS retention_rate_day_7\nFROM pivoted\n', 'SQL_DEEPNOTE_DATAFRAME_SQL', audit_sql_comment='', sql_cache_mode='cache_disabled')
cohort

Unnamed: 0,start_date,day_1,retention_rate_day_2,retention_rate_day_3,retention_rate_day_4,retention_rate_day_5,retention_rate_day_6,retention_rate_day_7
0,2023-11-08,70.0,74.29%,47.14%,68.57%,48.57%,67.14%,71.43%
1,2023-11-07,76.0,80.26%,80.26%,59.21%,60.53%,64.47%,68.42%
2,2023-11-12,64.0,82.81%,46.88%,73.44%,62.50%,79.69%,68.75%
3,2023-11-10,53.0,94.34%,86.79%,86.79%,83.02%,101.89%,71.70%
4,2023-11-11,62.0,87.10%,62.90%,80.65%,75.81%,66.13%,91.94%
5,2023-11-09,56.0,64.29%,96.43%,64.29%,85.71%,60.71%,91.07%
6,2023-11-17,52.0,75.00%,,,,,
7,2023-11-14,52.0,73.08%,101.92%,88.46%,67.31%,,
8,2023-11-19,,,,,,,
9,2023-11-16,44.0,90.91%,95.45%,,,,


In [8]:
stickiness = _deepnote_execute_sql('WITH daily_users AS (\n    SELECT \n        DATE_TRUNC(\'day\', date_time) AS day,\n        COUNT(DISTINCT user_skey) AS dau\n    FROM events\n    GROUP BY 1\n),\n\nweekly_users AS (\n    SELECT\n        DATE_TRUNC(\'week\', date_time) AS week,\n        COUNT(DISTINCT user_skey) AS wau\n    FROM events\n    GROUP BY 1\n)\n\nSELECT\n    week,\n    (AVG(dau) / wau *100)::DECIMAL(10,5) || \' %\'AS sticky_retention\nFROM daily_users\nJOIN weekly_users ON DATE_TRUNC(\'week\', day) = week\nGROUP BY week, wau', 'SQL_DEEPNOTE_DATAFRAME_SQL', audit_sql_comment='', sql_cache_mode='cache_disabled')
stickiness

Unnamed: 0,week,sticky_retention
0,2023-11-06,14.51583 %
1,2023-11-13,14.51684 %


## Part 2 - User Activity Analysis

> Please perform an analysis using the dataset provided. The goal is to find insights and suggest action(s) to stakeholders through your findings. Use Python, R, or a similar programming language to generate an analysis that could be shared/reviewed by other analysts.

1. Examine the datasets.
2. Analyze the App Open dataset.
    - Analyze the distribution of users according to the platforms used.
    - Analyze the variety of countries represented.
    - Evaluate the frequency distribution over time.
3. Analyze the Events dataset.
    - Identify the number of unique events and their occurrence.
    - Analyze the variety of platforms and countries represented for these events.
    - Evaluate the frequency distribution of these events over time.
4. Analyze the Visitors dataset.
    - Evaluate the variety and trends of visitors' sources.
5. Identify key insights and provide recommendations.

In [106]:
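import pandas as pd
import plotly.express as px

# Assumption: apps_open_df is a copy of the apps_open DataFrame returned by the App Open query in In [1]
apps_open_df = apps_open.copy()
# Keep only the calendar date so that users can be counted per day
apps_open_df['date_time'] = pd.to_datetime(apps_open_df['date_time']).dt.date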

unique_users_per_day = apps_open_df.groupby(['date_time', 'platform'])['user_skey'].nunique().reset_index()
unique_users_per_day.rename(columns={'user_skey':'num_users'}, inplace=True)

fig = px.line(unique_users_per_day, x='date_time', y='num_users', color='platform', title='Number of unique users per day per platform')

fig.show()

In [132]:
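# Assumption: platform_user_counts is the per-day, per-platform user count built in the previous cell
platform_user_counts = unique_users_per_day.copy()
platform_user_counts['total'] = platform_user_counts.groupby('date_time')['num_users'].transform('sum')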
platform_user_counts['proportion'] = platform_user_counts['num_users'] / platform_user_counts['total'] * 100

fig = px.area(platform_user_counts, x="date_time", y="proportion", color="platform", title='Proportion of Daily Active Users by Platform', labels={'date_time':'Date', 'proportion':'Percentage of Users'}, 
        category_orders={"platform": ["android", "apple"]})

fig.update_layout(yaxis=dict(title='Percentage of Users', tickformat='.2f'), xaxis=dict(title='Date'),
                  autosize=False, width=900, height=500)

fig.show()

The proportion of daily active users is notably consistent across both platforms, and user engagement is trending upward. A projection could be derived from this dataset, but two weeks of data may yield unreliable and biased results, because such a short window may not capture the full range of user behaviors and fluctuations.

In [124]:
session_lengths = events.groupby(['session_skey','user_skey', 'platform'])['date_time'].agg(['min', 'max'])
session_lengths['length_minutes'] = (session_lengths['max'] - session_lengths['min']).dt.total_seconds()/60
session_lengths.reset_index(inplace=True)
print(session_lengths.groupby('platform')['length_minutes'].agg(['mean', 'median']))

              mean    median
platform                    
android   3.376485  1.050000
apple     3.309095  1.033333


#### In-app time spent
Time spent in-app per session is similar on both platforms. Android has a slightly higher mean (3.38 vs. 3.31 minutes) and median (1.05 vs. 1.03 minutes).
