PROJECT DESCRIPTION:
===================

Our favorite online shoe store, ShoeFly.com, is performing an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each platform and on each day of the week. Help them analyze the data using aggregate measures.


PROJECT TASKS:
==============

1. Examine the first few rows of ad_clicks.

2. Your manager wants to know which ad platform is getting you the most views.

How many views (i.e., rows of the table) came from each utm_source?

3. If the column ad_click_timestamp is not null, then someone actually clicked on the ad that was displayed.

Create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.

4. We want to know the percent of people who clicked on ads from each utm_source.

Start by grouping by utm_source and is_click and counting the number of user_id values in each of those groups. Save your answer to the variable clicks_by_source.

5. Now let’s pivot the data so that the columns are is_click (either True or False), the index is utm_source, and the values are user_id.

Save your results to the variable clicks_pivot.

6. Create a new column in clicks_pivot called percent_clicked which is equal to the percent of users who clicked on the ad from each utm_source.

Was there a difference in click rates for each source?

Analyzing an A/B Test

7. The column experimental_group tells us whether the user was shown Ad A or Ad B.

Were approximately the same number of people shown both ads?

8. Using the column is_click that we defined earlier, check to see if a greater percentage of users clicked on Ad A or Ad B.

9. The Product Manager for the A/B test thinks that the clicks might have changed by day of the week.

Start by creating two DataFrames: a_clicks and b_clicks, which contain only the results for A group and B group, respectively.

10. For each group (a_clicks and b_clicks), calculate the percent of users who clicked on the ad by day.

11. Compare the results for A and B. What happened over the course of the week?

Do you recommend that your company use Ad A or Ad B?


In [45]:

import pandas as pd

ad_clicks = pd.read_csv('ad_clicks.csv')

# Task - 1

print("Dataframe structure of Ad_clicks:\n")
print(ad_clicks.head(5))


Dataframe structure of Ad_clicks:

                                user_id utm_source           day  \
0  008b7c6c-7272-471e-b90e-930d548bd8d7     google  6 - Saturday   
1  009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook    7 - Sunday   
2  00f5d532-ed58-4570-b6d2-768df5f41aed    twitter   2 - Tuesday   
3  011adc64-0f44-4fd9-a0bb-f1506d2ad439     google   2 - Tuesday   
4  012137e6-7ae7-4649-af68-205b4702169c   facebook    7 - Sunday   

  ad_click_timestamp experimental_group  
0               7:18                  A  
1                NaN                  B  
2                NaN                  A  
3                NaN                  B  
4                NaN                  B  


In [59]:
print("\nCount of ad views from each UTM source:\n")

# Task - 2: Count the views (rows of the table) that came from each utm_source
source_count = ad_clicks.groupby('utm_source').user_id.count().reset_index()

print(source_count)




Count of ad views from each UTM source:

  utm_source  user_id
0      email      255
1   facebook      504
2     google      680
3    twitter      215


In [61]:
print("\nDid the user click the ad?\n")

# Task - 3: Flag whether the user clicked the ad that was displayed
ad_clicks['is_click'] = ad_clicks['ad_click_timestamp'].notna()  # True where a click timestamp exists

print(ad_clicks)




Did the user click the ad?

                                   user_id utm_source            day  \
0     008b7c6c-7272-471e-b90e-930d548bd8d7     google   6 - Saturday   
1     009abb94-5e14-4b6c-bb1c-4f4df7aa7557   facebook     7 - Sunday   
2     00f5d532-ed58-4570-b6d2-768df5f41aed    twitter    2 - Tuesday   
3     011adc64-0f44-4fd9-a0bb-f1506d2ad439     google    2 - Tuesday   
4     012137e6-7ae7-4649-af68-205b4702169c   facebook     7 - Sunday   
...                                    ...        ...            ...   
1649  fe8b5236-78f6-4192-9da6-a76bba67cfe6    twitter     7 - Sunday   
1650  fed3db6d-8c92-40e3-a4fb-1fb9d7337eb1   facebook     5 - Friday   
1651  ff3a22ff-521c-478c-87ca-7dc7b8f34372    twitter  3 - Wednesday   
1652  ff3af0d6-b092-4c4d-9f2e-2bdd8f7c0732     google     1 - Monday   
1653  ffdfe7ec-0c74-4623-8d90-d95d80f1ba34   facebook   6 - Saturday   

     ad_click_timestamp experimental_group  is_click  
0                  7:18                  A      True 
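
As a minimal sketch of how the is_click flag behaves, the snippet below applies `notna()` to a tiny hand-made frame. The frame `sample` and its user IDs are hypothetical and are not rows from ad_clicks.csv; only the column names follow the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample: one click timestamp and two missing values
sample = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "ad_click_timestamp": ["7:18", np.nan, None],
})

# notna() is True where a value is present and False for NaN or None
sample["is_click"] = sample["ad_click_timestamp"].notna()

print(sample)
```

Both `NaN` and `None` count as missing here, so only the first row is flagged as a click.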

In [63]:
# Task - 4: Number of users who did and did not click on ads from each utm_source

clicks_by_source = ad_clicks.groupby(['utm_source','is_click']).user_id.count().reset_index() 

print(clicks_by_source)



  utm_source  is_click  user_id
0      email     False      175
1      email      True       80
2   facebook     False      324
3   facebook      True      180
4     google     False      441
5     google      True      239
6    twitter     False      149
7    twitter      True       66


In [65]:
# Task - 5: Pivoting the table

clicks_pivot = clicks_by_source.pivot(
  columns = 'is_click',
  index = 'utm_source',
  values = 'user_id'
).reset_index()

print(clicks_pivot)



is_click utm_source  False  True
0             email    175    80
1          facebook    324   180
2            google    441   239
3           twitter    149    66


In [67]:
#  Task - 6: Percent of users who clicked on the ad from each utm_source

clicks_pivot['percent_clicked'] =  clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])

print(clicks_pivot)


is_click utm_source  False  True  percent_clicked
0             email    175    80         0.313725
1          facebook    324   180         0.357143
2            google    441   239         0.351471
3           twitter    149    66         0.306977
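
The same click rate can also be obtained without pivoting, because the mean of a boolean column is the share of True values. The sketch below assumes nothing about the CSV file: it rebuilds one row per view from the counts printed in the clicks_by_source output, and the names `counts`, `demo`, and `click_rate` are hypothetical.

```python
import pandas as pd

# (not clicked, clicked) counts copied from the clicks_by_source output
counts = {
    "email": (175, 80),
    "facebook": (324, 180),
    "google": (441, 239),
    "twitter": (149, 66),
}

# Rebuild one row per view so the frame resembles ad_clicks
rows = []
for source, (not_clicked, clicked) in counts.items():
    rows += [(source, False)] * not_clicked + [(source, True)] * clicked
demo = pd.DataFrame(rows, columns=["utm_source", "is_click"])

# The mean of a boolean column is the fraction of True values
click_rate = demo.groupby("utm_source")["is_click"].mean()

print(click_rate)
```

Note that percent_clicked in this notebook is a fraction between 0 and 1; multiply by 100 if a percentage is wanted.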


In [69]:
# Task - 7: Number of users shown each ad (A or B)

Ad_group = ad_clicks.groupby('experimental_group').user_id.count().reset_index()

print(Ad_group)


# Both ad groups (A and B) were shown to exactly the same number of users: 827 each.

  experimental_group  user_id
0                  A      827
1                  B      827


In [73]:
# Task - 8: To see if a greater percentage of users clicked on Ad A or Ad B.

Ad_group_percent = ad_clicks.groupby(['experimental_group','is_click']).user_id.count().reset_index().pivot(
index = 'experimental_group',
columns = 'is_click',
values = 'user_id'
).reset_index()

print(Ad_group_percent)


# The counts show that a greater share of users clicked Ad A than Ad B:
# 310 of 827 (about 37.5%) for A versus 255 of 827 (about 30.8%) for B.


is_click experimental_group  False  True
0                         A    517   310
1                         B    572   255
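
Task 8 asks for a percentage, while the table above shows only counts. A minimal sketch of the missing step follows, using the counts from that table. The frame `group_counts` and its string column names `not_clicked` and `clicked` are assumptions made for illustration; the notebook's own pivot uses the boolean column labels False and True.

```python
import pandas as pd

# Counts copied from the Ad_group_percent output above
group_counts = pd.DataFrame({
    "experimental_group": ["A", "B"],
    "not_clicked": [517, 572],
    "clicked": [310, 255],
})

# Fraction of users in each group who clicked; parentheses keep the sum in the denominator
group_counts["percent_clicked"] = group_counts["clicked"] / (
    group_counts["clicked"] + group_counts["not_clicked"]
)

print(group_counts)
```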


In [75]:
# Task - 9: Split the data by experimental group and count clicks for each day of the week


a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']   # select only the rows for group 'A'

b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']  # select only the rows for group 'B'

#For A

print("\nNo of Ad Clicks per day for group A:\n")

a_clicks_pivot = a_clicks.groupby(['is_click', 'day']).count().reset_index().pivot(
  index = 'day',
  columns = 'is_click',
  values = 'user_id'
).reset_index()

print(a_clicks_pivot)


#For B

print("\nNo of Ad Clicks per day for group B:\n")

b_clicks_pivot = b_clicks.groupby(['is_click', 'day']).count().reset_index().pivot(
  index = 'day',
  columns = 'is_click',
  values = 'user_id'
).reset_index()

print(b_clicks_pivot)



No of Ad Clicks per day for group A:

is_click            day  False  True
0            1 - Monday     70    43
1           2 - Tuesday     76    43
2         3 - Wednesday     86    38
3          4 - Thursday     69    47
4            5 - Friday     77    51
5          6 - Saturday     73    45
6            7 - Sunday     66    43

No of Ad Clicks per day for group B:

is_click            day  False  True
0            1 - Monday     81    32
1           2 - Tuesday     74    45
2         3 - Wednesday     89    35
3          4 - Thursday     87    29
4            5 - Friday     90    38
5          6 - Saturday     76    42
6            7 - Sunday     75    34


In [77]:

# Task - 10:Percent of users who clicked on the ad by day


#For A

print("\nPercent of Ad Clicks per day for group A:\n")

# Parentheses are required: without them, True / True is evaluated first and gives 1 + False
a_clicks_pivot['percent_clicked'] = a_clicks_pivot[True] / (a_clicks_pivot[True] + a_clicks_pivot[False])

print(a_clicks_pivot)


#For B

print("\nPercent of Ad Clicks per day for group B:\n")

b_clicks_pivot['percent_clicked'] = b_clicks_pivot[True] / (b_clicks_pivot[True] + b_clicks_pivot[False])

print(b_clicks_pivot)



Percent of Ad Clicks per day for group A:

is_click            day  False  True  percent_clicked
0            1 - Monday     70    43         0.380531
1           2 - Tuesday     76    43         0.361345
2         3 - Wednesday     86    38         0.306452
3          4 - Thursday     69    47         0.405172
4            5 - Friday     77    51         0.398438
5          6 - Saturday     73    45         0.381356
6            7 - Sunday     66    43         0.394495

Percent of Ad Clicks per day for group B:

is_click            day  False  True  percent_clicked
0            1 - Monday     81    32         0.283186
1           2 - Tuesday     74    45         0.378151
2         3 - Wednesday     89    35         0.282258
3          4 - Thursday     87    29         0.250000
4            5 - Friday     90    38         0.296875
5          6 - Saturday     76    42         0.355932
6            7 - Sunday     75    34         0.311927


# Task - 11: Do you recommend that your company use Ad A or Ad B?

From the daily counts, we can conclude that Ad A performs better than Ad B. Both ads were shown to the same number of users on each day, and Ad A drew more clicks on every day except Tuesday (43 versus 45). Overall, 310 of 827 users clicked Ad A versus 255 of 827 for Ad B, so Ad A is the better choice.
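
To compare the two groups day by day, the daily click rates can be placed side by side. The sketch below is one way to do that, assuming the per-day counts printed for Task 9. The helper `daily_rate` and the names `a_daily`, `b_daily`, and `comparison` are hypothetical and are not part of the notebook.

```python
import pandas as pd

days = ["1 - Monday", "2 - Tuesday", "3 - Wednesday", "4 - Thursday",
        "5 - Friday", "6 - Saturday", "7 - Sunday"]

def daily_rate(clicked, not_clicked):
    """Build a per-day frame with the fraction of users who clicked."""
    df = pd.DataFrame({"day": days, "clicked": clicked, "not_clicked": not_clicked})
    df["percent_clicked"] = df["clicked"] / (df["clicked"] + df["not_clicked"])
    return df[["day", "percent_clicked"]]

# Counts copied from the Task 9 output tables (True and False columns)
a_daily = daily_rate([43, 43, 38, 47, 51, 45, 43], [70, 76, 86, 69, 77, 73, 66])
b_daily = daily_rate([32, 45, 35, 29, 38, 42, 34], [81, 74, 89, 87, 90, 76, 75])

# Join on day so each row holds both rates, then take the difference
comparison = a_daily.merge(b_daily, on="day", suffixes=("_a", "_b"))
comparison["a_minus_b"] = comparison["percent_clicked_a"] - comparison["percent_clicked_b"]

print(comparison.round(3))
```

A positive a_minus_b means Ad A had the higher click rate on that day. This compares raw rates only and says nothing about statistical significance.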