Marketing Campaign Success [Advanced]

You have a table of in-app purchases by user. Users who make their first in-app purchase are placed in a marketing campaign where they see calls to action for more in-app purchases. Find the number of users who made additional in-app purchases due to the success of the marketing campaign.


The marketing campaign doesn't start until one day after the initial in-app purchase, so users who made one or more purchases only on the first day do not count. Users who, over time, purchase only the products they bought on the first day do not count either.

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

In [4]:
marketing_campaign = pd.read_csv("../CSV/marketing_campaign.csv", usecols=[0,1,2,3,4])
marketing_campaign.head()

Unnamed: 0,user_id,created_at,product_id,quantity,price
0,10,2019-01-01,101,3,55
1,10,2019-01-02,119,5,29
2,10,2019-03-31,111,2,149
3,11,2019-01-02,105,3,234
4,11,2019-03-31,120,3,99


# users who meet campaign criteria

In [7]:
users_impacted_by_campaign = marketing_campaign.groupby('user_id')[['created_at', 'product_id']].nunique().add_prefix(
    'cnt_').reset_index()
users_impacted_by_campaign.head()

Unnamed: 0,user_id,cnt_created_at,cnt_product_id
0,10,3,3
1,11,2,2
2,12,2,2
3,13,2,2
4,14,2,3


In [9]:
users_impacted_by_campaign = users_impacted_by_campaign[
    (users_impacted_by_campaign["cnt_created_at"] > 1) & (users_impacted_by_campaign["cnt_product_id"] > 1)]

users_impacted_by_campaign.head()

Unnamed: 0,user_id,cnt_created_at,cnt_product_id
0,10,3,3
1,11,2,2
2,12,2,2
3,13,2,2
4,14,2,3


# products not meeting the campaign criteria

In [10]:
marketing_campaign['user_product'] = marketing_campaign['user_id'].map(str) + "_" + marketing_campaign[
    'product_id'].map(str)

marketing_campaign.head()

Unnamed: 0,user_id,created_at,product_id,quantity,price,user_product
0,10,2019-01-01,101,3,55,10_101
1,10,2019-01-02,119,5,29,10_119
2,10,2019-03-31,111,2,149,10_111
3,11,2019-01-02,105,3,234,11_105
4,11,2019-03-31,120,3,99,11_120


In [11]:
marketing_campaign['created_at'] = pd.to_datetime(marketing_campaign['created_at'])
marketing_campaign.head()

Unnamed: 0,user_id,created_at,product_id,quantity,price,user_product
0,10,2019-01-01,101,3,55,10_101
1,10,2019-01-02,119,5,29,10_119
2,10,2019-03-31,111,2,149,10_111
3,11,2019-01-02,105,3,234,11_105
4,11,2019-03-31,120,3,99,11_120


In [14]:
marketing_campaign['first_order'] = marketing_campaign.groupby('user_id')['created_at'].transform('min')
marketing_campaign

Unnamed: 0,user_id,created_at,product_id,quantity,price,user_product,first_order
0,10,2019-01-01,101,3,55,10_101,2019-01-01
1,10,2019-01-02,119,5,29,10_119,2019-01-01
2,10,2019-03-31,111,2,149,10_111,2019-01-01
3,11,2019-01-02,105,3,234,11_105,2019-01-02
4,11,2019-03-31,120,3,99,11_120,2019-01-02
...,...,...,...,...,...,...,...
97,63,2019-03-27,120,5,99,63_120,2019-03-27
98,64,2019-03-27,105,3,234,64_105,2019-03-27
99,65,2019-03-27,103,4,79,65_103,2019-03-27
100,66,2019-03-31,107,2,27,66_107,2019-03-31


In [17]:
first_order = marketing_campaign[marketing_campaign['created_at'] == marketing_campaign['first_order']]
first_order

Unnamed: 0,user_id,created_at,product_id,quantity,price,user_product,first_order
0,10,2019-01-01,101,3,55,10_101,2019-01-01
3,11,2019-01-02,105,3,234,11_105,2019-01-02
5,12,2019-01-02,112,2,200,12_112,2019-01-02
7,13,2019-01-05,113,1,67,13_113,2019-01-05
9,14,2019-01-06,109,5,199,14_109,2019-01-06
...,...,...,...,...,...,...,...
97,63,2019-03-27,120,5,99,63_120,2019-03-27
98,64,2019-03-27,105,3,234,64_105,2019-03-27
99,65,2019-03-27,103,4,79,65_103,2019-03-27
100,66,2019-03-31,107,2,27,66_107,2019-03-31


In [20]:
result = marketing_campaign[marketing_campaign['user_id'].isin(users_impacted_by_campaign['user_id'])]
result

Unnamed: 0,user_id,created_at,product_id,quantity,price,user_product,first_order
0,10,2019-01-01,101,3,55,10_101,2019-01-01
1,10,2019-01-02,119,5,29,10_119,2019-01-01
2,10,2019-03-31,111,2,149,10_111,2019-01-01
3,11,2019-01-02,105,3,234,11_105,2019-01-02
4,11,2019-03-31,120,3,99,11_120,2019-01-02
5,12,2019-01-02,112,2,200,12_112,2019-01-02
6,12,2019-03-31,110,2,299,12_110,2019-01-02
7,13,2019-01-05,113,1,67,13_113,2019-01-05
8,13,2019-03-31,118,3,35,13_118,2019-01-05
9,14,2019-01-06,109,5,199,14_109,2019-01-06


In [22]:
# Build the mask from result itself so it shares result's index
res = result[~result['user_product'].isin(first_order['user_product'])]
res


Unnamed: 0,user_id,created_at,product_id,quantity,price,user_product,first_order
1,10,2019-01-02,119,5,29,10_119,2019-01-01
2,10,2019-03-31,111,2,149,10_111,2019-01-01
4,11,2019-03-31,120,3,99,11_120,2019-01-02
6,12,2019-03-31,110,2,299,12_110,2019-01-02
8,13,2019-03-31,118,3,35,13_118,2019-01-05
11,14,2019-03-31,112,3,200,14_112,2019-01-06
13,15,2019-01-09,110,4,299,15_110,2019-01-08
14,15,2019-03-31,116,2,499,15_116,2019-01-08
16,16,2019-03-31,107,4,27,16_107,2019-01-10
18,17,2019-03-31,104,1,154,17_104,2019-01-11
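
When a boolean mask is built from one DataFrame and used to index another, pandas realigns the mask by index label and, in the versions I am aware of, may emit a UserWarning about reindexing. The sketch below uses hypothetical toy frames (`full`, `subset`) to show that a mask built from the frame being filtered selects the same rows without relying on that realignment.

```python
import pandas as pd

# Hypothetical toy data; "flag" stands in for "is a first-day user_product".
full = pd.DataFrame({"user_id": [1, 1, 2, 2], "flag": [True, False, True, False]})
subset = full[full["user_id"] == 2]  # keeps index labels 2 and 3

mask_from_full = ~full["flag"]      # 4 entries, indexed like `full`
mask_from_subset = ~subset["flag"]  # 2 entries, indexed like `subset`

realigned = subset[mask_from_full]  # pandas realigns the mask to subset's index
direct = subset[mask_from_subset]   # mask already matches subset's index

print(realigned.equals(direct))  # True
```

Both selections agree here, so the choice is about clarity and avoiding the warning rather than about the result.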


In [24]:
print(len(res))
print(res['user_id'].nunique())

29
23


Solution Walkthrough
This solution is aimed at finding the number of users who made additional in-app purchases due to the success of a marketing campaign. The marketing campaign starts one day after the initial in-app purchase, so users who made purchases only on the first day or only purchased the products they bought on the first day are excluded.

Understanding The Data
The data consists of a table that records in-app purchases made by users. It includes the columns user_id, created_at (the date of the purchase), product_id, quantity, and price. The solution also derives an additional column called user_product, which combines user_id and product_id into a single key.

The Problem Statement
We need to identify the number of users who made additional in-app purchases due to the success of the marketing campaign. We need to exclude users who made purchases only on the first day and users who only purchased the products they bought on the first day.
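
To make the rule concrete, here is a minimal sketch that applies it to one user at a time with plain Python sets. The toy data and the helper name `bought_new_product_later` are hypothetical and are not part of the original solution.

```python
import pandas as pd

# Hypothetical toy data, not taken from marketing_campaign.csv.
toy = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3, 3],
        "created_at": pd.to_datetime(
            [
                "2019-01-01", "2019-01-01",  # user 1: two purchases, first day only
                "2019-01-01", "2019-01-05",  # user 2: same product again later
                "2019-01-01", "2019-01-05",  # user 3: a new product later
            ]
        ),
        "product_id": [101, 102, 101, 101, 101, 103],
    }
)


def bought_new_product_later(purchases: pd.DataFrame) -> bool:
    """True if the user bought, after day one, a product not bought on day one."""
    first_day = purchases["created_at"].min()
    on_first_day = purchases["created_at"] == first_day
    first_day_products = set(purchases.loc[on_first_day, "product_id"])
    later_products = set(purchases.loc[~on_first_day, "product_id"])
    return bool(later_products - first_day_products)


converted = [
    int(user_id)
    for user_id, group in toy.groupby("user_id")
    if bought_new_product_later(group)
]
print(converted)  # [3]
```

Only user 3 counts: user 1 never returns after the first day, and user 2 returns but buys only a first-day product.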

Breaking Down The Code
Import the necessary libraries:

pandas as pd: used for data manipulation and analysis.
numpy as np: used for mathematical operations on arrays and matrices (imported but not used in this solution).
datetime from datetime: used for working with dates and times (imported but not used in this solution).

Group the data by user_id and count the unique values in the created_at and product_id columns. Prefix the column names with "cnt_" and reset the index:

users_impacted_by_campaign = marketing_campaign.groupby('user_id')[['created_at', 'product_id']].nunique().add_prefix('cnt_').reset_index()
Filter the users_impacted_by_campaign DataFrame to keep only users with more than one distinct purchase date (cnt_created_at > 1) and more than one distinct product (cnt_product_id > 1):

users_impacted_by_campaign = users_impacted_by_campaign[(users_impacted_by_campaign["cnt_created_at"] > 1) & (users_impacted_by_campaign["cnt_product_id"] > 1)]
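
The grouping and filtering steps can be sketched on a small, hypothetical table. The sketch also shows why this filter alone is not enough: user 4 below passes it while buying nothing new after the first day, which is what the later first-day exclusion handles.

```python
import pandas as pd

# Hypothetical toy data, not taken from marketing_campaign.csv.
toy = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 2, 3, 3, 4, 4, 4],
        "created_at": [
            "2019-01-01", "2019-01-01",                # user 1: one date, two products
            "2019-01-01", "2019-01-05",                # user 2: two dates, one product
            "2019-01-01", "2019-01-05",                # user 3: two dates, two products
            "2019-01-01", "2019-01-01", "2019-01-05",  # user 4: repeats a day-one product
        ],
        "product_id": [101, 102, 101, 101, 101, 103, 101, 102, 101],
    }
)

# Count distinct purchase dates and distinct products per user.
counts = (
    toy.groupby("user_id")[["created_at", "product_id"]]
    .nunique()
    .add_prefix("cnt_")
    .reset_index()
)

# Keep users with more than one date and more than one product.
candidates = counts[(counts["cnt_created_at"] > 1) & (counts["cnt_product_id"] > 1)]
print(candidates["user_id"].tolist())  # [3, 4]
```
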
Concatenate the user_id and product_id columns, separated by an underscore, into a new key column called user_product:

marketing_campaign['user_product'] = marketing_campaign['user_id'].map(str) + "_" + marketing_campaign['product_id'].map(str)
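
The separator in the key matters. As a small illustration with hypothetical IDs, two different user/product pairs collapse to the same string when the parts are concatenated without one:

```python
import pandas as pd

# Hypothetical IDs chosen so that the digits collide without a separator.
pairs = pd.DataFrame({"user_id": [1, 12], "product_id": [23, 3]})

no_sep = pairs["user_id"].astype(str) + pairs["product_id"].astype(str)
with_sep = pairs["user_id"].astype(str) + "_" + pairs["product_id"].astype(str)

print(no_sep.tolist())    # ['123', '123']
print(with_sep.tolist())  # ['1_23', '12_3']
```
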
Convert the created_at column to datetime format:

marketing_campaign['created_at'] = pd.to_datetime(marketing_campaign['created_at'])
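
As a brief sketch of what the conversion does, the strings become real datetime values, so comparisons and date arithmetic operate on dates rather than on text. The three dates below are the ones shown for user 10 in the notebook output.

```python
import pandas as pd

dates = pd.Series(["2019-01-01", "2019-01-02", "2019-03-31"])
parsed = pd.to_datetime(dates)

# Days elapsed since the earliest date in the series.
days_since_first = (parsed - parsed.min()).dt.days
print(days_since_first.tolist())  # [0, 1, 89]
```
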
Find the earliest purchase date for each user with groupby and transform, then keep only the rows that fall on that date:

marketing_campaign['first_order'] = marketing_campaign.groupby('user_id')['created_at'].transform('min')
first_order = marketing_campaign[marketing_campaign['created_at'] == marketing_campaign['first_order']]
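
transform is used here rather than a plain aggregation because it returns one value per original row, which lets the result be stored as a column and compared row by row. A minimal sketch on a hypothetical three-row table:

```python
import pandas as pd

# Hypothetical toy data, not taken from marketing_campaign.csv.
toy = pd.DataFrame(
    {
        "user_id": [10, 10, 11],
        "created_at": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-02"]),
    }
)

per_user = toy.groupby("user_id")["created_at"].min()             # one value per user
per_row = toy.groupby("user_id")["created_at"].transform("min")   # one value per row
print(len(per_user), len(per_row))  # 2 3

toy["first_order"] = per_row
first_order = toy[toy["created_at"] == toy["first_order"]]
print(first_order.index.tolist())  # [0, 2]
```
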
Count the distinct users who are in users_impacted_by_campaign and have at least one purchase whose user_product key does not appear among the first-day purchases in first_order, using isin and the ~ operator:

result = marketing_campaign[(marketing_campaign['user_id'].isin(users_impacted_by_campaign['user_id'])) & (~marketing_campaign['user_product'].isin(first_order['user_product']))]['user_id'].nunique()
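
The negated isin acts as an exclusion filter on the key. In the sketch below, which uses hypothetical keys and a hypothetical `is_first_day` flag in place of the date comparison, a later repeat purchase of a first-day product is dropped as well, because its key matches a first-day key.

```python
import pandas as pd

# Hypothetical toy data; is_first_day stands in for created_at == first_order.
purchases = pd.DataFrame(
    {
        "user_product": ["1_101", "1_102", "1_101", "2_101", "2_103"],
        "is_first_day": [True, True, False, True, False],
    }
)

first_day_keys = purchases.loc[purchases["is_first_day"], "user_product"]

# Keep only rows whose key never appears among the first-day keys.
mask = ~purchases["user_product"].isin(first_day_keys)
print(purchases.loc[mask, "user_product"].tolist())  # ['2_103']
```
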
Bringing It All Together
The code first groups the data by user_id and counts the distinct purchase dates and distinct products for each user. It then keeps only users with more than one distinct purchase date and more than one distinct product.

Next, it concatenates the user_id and product_id columns into a new key column called user_product. The created_at column is converted to datetime format, and the earliest purchase date for each user is found with groupby and transform.

Finally, it keeps the purchases made by those users whose user_product key does not appear among the first-day purchases in first_order, counts the distinct users among them, and stores the count in the variable result.
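
The steps above can be collected into one function. This is a sketch rather than the notebook's exact code: the function name `count_campaign_users` is hypothetical, and the sample frame holds only the five rows visible in the notebook's first output (users 10 and 11), not the full CSV.

```python
import pandas as pd


def count_campaign_users(purchases: pd.DataFrame) -> int:
    """Count users who later bought a product they did not buy on their first day."""
    df = purchases.copy()
    df["created_at"] = pd.to_datetime(df["created_at"])

    # Users with more than one purchase date and more than one product.
    counts = df.groupby("user_id")[["created_at", "product_id"]].nunique()
    eligible = counts[(counts["created_at"] > 1) & (counts["product_id"] > 1)].index

    # Key each purchase by user and product, and find first-day keys.
    df["user_product"] = df["user_id"].astype(str) + "_" + df["product_id"].astype(str)
    df["first_order"] = df.groupby("user_id")["created_at"].transform("min")
    first_day_keys = df.loc[df["created_at"] == df["first_order"], "user_product"]

    # Eligible users' purchases whose key is not a first-day key.
    kept = df[df["user_id"].isin(eligible) & ~df["user_product"].isin(first_day_keys)]
    return int(kept["user_id"].nunique())


# The five rows shown by marketing_campaign.head() in the notebook.
sample = pd.DataFrame(
    {
        "user_id": [10, 10, 10, 11, 11],
        "created_at": ["2019-01-01", "2019-01-02", "2019-03-31", "2019-01-02", "2019-03-31"],
        "product_id": [101, 119, 111, 105, 120],
    }
)
print(count_campaign_users(sample))  # 2
```

Working on a copy keeps the caller's DataFrame unchanged, which is a design choice of this sketch; the notebook adds the helper columns to marketing_campaign in place.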

Conclusion
The code identifies the users who made additional in-app purchases attributable to the marketing campaign, excluding users who purchased only on the first day and users who only ever bought their first-day products. On this dataset, the final cell reports 23 such users across 29 qualifying purchase rows.