# EXPLORATORY DATA ANALYSIS IN PYTHON
## A/B Testing for ShoeFly.com
Our favorite online shoe store, ShoeFly.com, is performing an A/B test. They have two different versions of an ad, which they have placed in emails as well as in banner ads on Facebook, Twitter, and Google. They want to know how the two ads are performing on each platform on each day of the week. Help them analyze the data using aggregate measures.

## Tasks


### Analyzing Ad Sources
##### 1. Examine the first few rows of ad_clicks.


`Hint` <br>
`Try pasting the following code:`

```python
print(ad_clicks.head())
```
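Because `ad_clicks.csv` is not reproduced here, the sketch below builds a small hypothetical stand-in. The column names are taken from the task descriptions, but every value is invented for illustration, so the printed output will not match the real data.

```python
import pandas as pd

# Hypothetical stand-in for ad_clicks.csv: column names come from the tasks,
# all values are invented for illustration.
ad_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'],
    'utm_source': ['google', 'email', 'facebook', 'twitter', 'google', 'email'],
    'day': ['Monday', 'Monday', 'Tuesday', 'Tuesday', 'Wednesday', 'Wednesday'],
    'ad_click_timestamp': ['7:18', None, None, '12:05', None, '18:31'],
    'experimental_group': ['A', 'B', 'A', 'B', 'A', 'B'],
})

print(ad_clicks.head())   # first five rows
print(ad_clicks.shape)    # (rows, columns)
print(ad_clicks.dtypes)   # column types; missing timestamps are stored as None/NaN
```

Checking `shape` and `dtypes` alongside `head()` shows at a glance how many views there are and which columns can hold missing values.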

##### 2. Your manager wants to know which ad platform is getting you the most views.

How many views (i.e., rows of the table) came from each utm_source?


`Hint` <br>
`Try using the following code:`

```python
ad_clicks.groupby('utm_source')\
    .user_id.count()\
    .reset_index()
```
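A minimal sketch on hypothetical toy data, comparing the hint's `groupby` count with `value_counts()`, which also sorts the sources from most to least viewed. The two agree as long as `user_id` has no missing values, since `count()` skips nulls while `value_counts()` counts rows.

```python
import pandas as pd

# Hypothetical toy data; only utm_source and user_id matter here.
ad_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'],
    'utm_source': ['google', 'email', 'google', 'twitter', 'google', 'email'],
})

# Hint's approach: count user_id values within each utm_source group
# (groups come out in alphabetical order).
views_by_source = (
    ad_clicks.groupby('utm_source')
    .user_id.count()
    .reset_index()
)

# Alternative: value_counts() sorts descending, so the top platform is first.
views_sorted = ad_clicks['utm_source'].value_counts()

print(views_by_source)
print(views_sorted)
```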

##### 3. If the column ad_click_timestamp is not null, then someone actually clicked on the ad that was displayed.

Create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.


`Hint` <br>
`Try using the following code:`

```python
ad_clicks['is_click'] = ~ad_clicks\
    .ad_click_timestamp.isnull()
```

`~ is the element-wise NOT operator, and isnull() returns True wherever ad_click_timestamp is null, so ~isnull() is True exactly for the rows where someone clicked.`
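A small sketch on hypothetical data showing that `notnull()` produces the same column as `~isnull()` without the explicit inversion. Both treat `None` and `NaN` as missing.

```python
import pandas as pd

# Hypothetical toy data with both kinds of missing value.
ad_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4'],
    'ad_click_timestamp': ['7:18', None, float('nan'), '12:05'],
})

# Hint's version: invert isnull() with ~.
ad_clicks['is_click'] = ~ad_clicks.ad_click_timestamp.isnull()

# notnull() expresses the same test directly.
same = ad_clicks.ad_click_timestamp.notnull()

print(ad_clicks)
print((ad_clicks['is_click'] == same).all())
```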

##### 4. We want to know the percent of people who clicked on ads from each utm_source.

Start by grouping by utm_source and is_click and counting the number of user_id values in each of those groups. Save your answer to the variable clicks_by_source.


`Hint` <br>
`Try using the following code:`

```python
clicks_by_source = ad_clicks\
    .groupby(['utm_source',
              'is_click'])\
    .user_id.count()\
    .reset_index()
```
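On hypothetical toy data, the sketch below compares the hint's `.user_id.count()` with `.size()`. `count()` counts non-null `user_id` values, while `size()` counts rows, so the two match whenever `user_id` is never missing. Passing `name='user_id'` to `reset_index` keeps the column name the later pivot expects.

```python
import pandas as pd

# Hypothetical toy data.
ad_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5'],
    'utm_source': ['google', 'google', 'email', 'email', 'email'],
    'is_click': [True, False, True, False, False],
})

# Hint's version: non-null user_id values per (utm_source, is_click) group.
clicks_by_source = (
    ad_clicks.groupby(['utm_source', 'is_click'])
    .user_id.count()
    .reset_index()
)

# Row counts per group; identical here because user_id has no nulls.
by_size = (
    ad_clicks.groupby(['utm_source', 'is_click'])
    .size()
    .reset_index(name='user_id')
)

print(clicks_by_source)
```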

##### 5. Now let’s pivot the data so that the columns are is_click (either True or False), the index is utm_source, and the values are user_id.

Save your results to the variable clicks_pivot.


`Hint` <br>
`Try using the following code:`

```python
clicks_pivot = clicks_by_source\
    .pivot(index='utm_source',
           columns='is_click',
           values='user_id')\
    .reset_index()
```
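To make the reshaping concrete, here is a sketch that pivots a small hypothetical version of the task 4 output. Each `(utm_source, is_click)` row becomes one cell, and the `True`/`False` values of `is_click` become column labels, which is why the next step can refer to `clicks_pivot[True]` and `clicks_pivot[False]`.

```python
import pandas as pd

# Hypothetical task 4 output: one row per (utm_source, is_click) pair.
clicks_by_source = pd.DataFrame({
    'utm_source': ['email', 'email', 'google', 'google'],
    'is_click': [False, True, False, True],
    'user_id': [2, 1, 1, 1],
})

# Long -> wide: one row per source, one column per is_click value.
clicks_pivot = (
    clicks_by_source.pivot(index='utm_source',
                           columns='is_click',
                           values='user_id')
    .reset_index()
)

print(clicks_pivot)
print(list(clicks_pivot.columns))  # utm_source, then the boolean labels
```

If some source had no rows for one of the `is_click` values, `pivot` would leave `NaN` in that cell.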

##### 6. Create a new column in clicks_pivot called percent_clicked, equal to the percentage of users who clicked on the ad from each utm_source.

Did the click rate differ between sources?


`Hint` <br>
`Try the following code:`

```python
clicks_pivot['percent_clicked'] = \
    clicks_pivot[True] / \
    (clicks_pivot[True] +
     clicks_pivot[False])
```

`clicks_pivot[True] is the number of people who clicked (because is_click was True for those users)`

`clicks_pivot[False] is the number of people who did not click (because is_click was False for those users)`

`So, the share of people who clicked is (Total Who Clicked) / (Total Who Clicked + Total Who Did Not Click). This is a fraction between 0 and 1; multiply by 100 for a percentage.`
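A shortcut worth knowing, sketched on hypothetical data: the mean of a boolean column is the fraction of `True` values, so grouping by `utm_source` and averaging `is_click` computes clicked / (clicked + not clicked) in one step, without the pivot.

```python
import pandas as pd

# Hypothetical toy data.
ad_clicks = pd.DataFrame({
    'utm_source': ['email', 'email', 'email', 'google', 'google'],
    'is_click': [True, False, False, True, False],
})

# Mean of booleans = fraction of True values per group.
click_rate = ad_clicks.groupby('utm_source').is_click.mean()

# Scale to a percentage for display.
print((click_rate * 100).round(1))
```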

### Analyzing an A/B Test
##### 7. The column experimental_group tells us whether the user was shown Ad A or Ad B.

Was each ad shown to approximately the same number of people?


`Hint` <br>
`We can group by experimental_group and count the number of users.`
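A minimal sketch of the hint on hypothetical toy data. Dividing the counts by their total gives each group's share, which makes "approximately the same" easier to judge than raw counts.

```python
import pandas as pd

# Hypothetical toy data.
ad_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5'],
    'experimental_group': ['A', 'B', 'A', 'B', 'A'],
})

# Number of users shown each ad.
group_sizes = ad_clicks.groupby('experimental_group').user_id.count()

print(group_sizes)
print(group_sizes / group_sizes.sum())  # share of users per ad
```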

##### 8. Using the column is_click that we defined earlier, check to see if a greater percentage of users clicked on Ad A or Ad B.


`Hint` <br>
`Group by both experimental_group and is_click and count the number of user_id values.`

`You might want to use a pivot table like we did for the utm_source exercises.`
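Following the hint, this sketch on hypothetical data chains the group-by, the pivot, and the same click-fraction formula used in task 6, with `experimental_group` in place of `utm_source`.

```python
import pandas as pd

# Hypothetical toy data.
ad_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'],
    'experimental_group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'is_click': [True, True, False, True, False, False],
})

# Count users per (group, is_click), then pivot is_click into columns.
ab_pivot = (
    ad_clicks.groupby(['experimental_group', 'is_click'])
    .user_id.count()
    .reset_index()
    .pivot(index='experimental_group', columns='is_click', values='user_id')
    .reset_index()
)

# clicked / (clicked + not clicked), as in task 6.
ab_pivot['percent_clicked'] = ab_pivot[True] / (ab_pivot[True] + ab_pivot[False])

print(ab_pivot)
```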

##### 9. The Product Manager for the A/B test thinks that the clicks might have changed by day of the week.

Start by creating two DataFrames, a_clicks and b_clicks, which contain only the results for group A and group B, respectively.


`Hint` <br>
`To create a_clicks:`

```python
a_clicks = ad_clicks[
    ad_clicks.experimental_group
    == 'A']
```
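The same boolean-mask pattern produces `b_clicks`. The sketch below, on hypothetical data, builds both subsets and adds a quick sanity check that every row lands in exactly one of them, which holds only under the assumption that `'A'` and `'B'` are the only values in `experimental_group`.

```python
import pandas as pd

# Hypothetical toy data.
ad_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4'],
    'experimental_group': ['A', 'B', 'B', 'A'],
})

# Boolean masks select each group's rows.
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']

# True if the two subsets together cover every row (assumes only A and B exist).
print(len(a_clicks) + len(b_clicks) == len(ad_clicks))
```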

##### 10. For each group (a_clicks and b_clicks), calculate the percent of users who clicked on the ad by day.


`Hint` <br>
`First, group by is_click and day. Next, pivot the data so that the columns are based on is_click. Finally, calculate the percent of people who clicked on the ad.`
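The three steps in the hint, sketched for group A on hypothetical data; group B works the same way with `b_clicks`.

```python
import pandas as pd

# Hypothetical rows for group A only.
a_clicks = pd.DataFrame({
    'user_id': ['u1', 'u2', 'u3', 'u4', 'u5', 'u6'],
    'day': ['Monday', 'Monday', 'Monday', 'Tuesday', 'Tuesday', 'Tuesday'],
    'is_click': [True, False, False, True, True, False],
})

# Steps 1 and 2: count users per (is_click, day), then pivot is_click into columns.
a_clicks_pivot = (
    a_clicks.groupby(['is_click', 'day'])
    .user_id.count()
    .reset_index()
    .pivot(index='day', columns='is_click', values='user_id')
    .reset_index()
)

# Step 3: fraction of users who clicked on each day.
a_clicks_pivot['percent_clicked'] = (
    a_clicks_pivot[True] / (a_clicks_pivot[True] + a_clicks_pivot[False])
)

print(a_clicks_pivot)
```

If a day has no clicks (or no non-clicks) in a group, `pivot` leaves `NaN` in that cell and the division propagates it; calling `.fillna(0)` on the pivoted table before dividing avoids that.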

##### 11. Compare the results for A and B. What happened over the course of the week?

Do you recommend that your company use Ad A or Ad B?

Answer: Ad A.
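One way to compare the groups day by day is to put their click rates side by side. The sketch below uses hypothetical data and the boolean-mean shortcut instead of two separate pivots. `unstack()` moves the groups into columns, and the difference column (a name chosen here for illustration) is positive on days when Ad A out-clicked Ad B.

```python
import pandas as pd

# Hypothetical toy data.
ad_clicks = pd.DataFrame({
    'day': ['Monday'] * 4 + ['Tuesday'] * 4,
    'experimental_group': ['A', 'A', 'B', 'B'] * 2,
    'is_click': [True, False, False, False, True, True, True, False],
})

# Click rate per (day, group); unstack() puts A and B in separate columns.
daily_rates = (
    ad_clicks.groupby(['day', 'experimental_group'])
    .is_click.mean()
    .unstack()
)

# Positive values mean Ad A had the higher click rate that day.
daily_rates['A_minus_B'] = daily_rates['A'] - daily_rates['B']

print(daily_rates)
```

`groupby` sorts days alphabetically, so day names without a numeric prefix will not appear in week order.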

```python
import codecademylib3  # Codecademy platform helper; remove when running locally
import pandas as pd

ad_clicks = pd.read_csv('ad_clicks.csv')

# Task 1: preview the first five rows
print(ad_clicks.head())

# Task 2: views (rows) per utm_source
most_views_by_platform = ad_clicks\
    .groupby('utm_source')\
    .user_id.count()\
    .reset_index()

print(most_views_by_platform)

# Task 3: isnull() is True where no click was recorded; ~ flips it,
# so is_click is True exactly when ad_click_timestamp is present
ad_clicks['is_click'] = ~ad_clicks\
    .ad_click_timestamp.isnull()
# print(ad_clicks.head())

# Task 4: users per (utm_source, is_click) pair
clicks_by_source = ad_clicks\
    .groupby(['utm_source',
              'is_click'])\
    .user_id.count()\
    .reset_index()

print(clicks_by_source)

# Task 5: one row per source, one column per is_click value
clicks_pivot = clicks_by_source\
    .pivot(index='utm_source',
           columns='is_click',
           values='user_id')\
    .reset_index()

print(clicks_pivot)

# Task 6: fraction of users who clicked, per source
clicks_pivot['percent_clicked'] = \
    clicks_pivot[True] / \
    (clicks_pivot[True] + clicks_pivot[False])

print(clicks_pivot)

# Task 7: users shown each ad
count_of_each_ad_shown = ad_clicks\
    .groupby('experimental_group')\
    .user_id.count()\
    .reset_index()

print(count_of_each_ad_shown)

# Task 8: clicks vs. non-clicks per ad, plus the click fraction
ab_pivot = ad_clicks\
    .groupby(['experimental_group', 'is_click'])\
    .user_id.count()\
    .reset_index()\
    .pivot(index='experimental_group',
           columns='is_click',
           values='user_id')\
    .reset_index()

ab_pivot['percent_clicked'] = \
    ab_pivot[True] / (ab_pivot[True] + ab_pivot[False])

print(ab_pivot)

# Task 9: split the data by experimental group
a_clicks = ad_clicks[ad_clicks.experimental_group == 'A']
b_clicks = ad_clicks[ad_clicks.experimental_group == 'B']

# Task 10: click fraction by day for group A
a_clicks_pivot = a_clicks\
    .groupby(['is_click', 'day'])\
    .user_id.count()\
    .reset_index()\
    .pivot(index='day',
           columns='is_click',
           values='user_id')\
    .reset_index()

a_clicks_pivot['percent_clicked'] = \
    a_clicks_pivot[True] / (a_clicks_pivot[True] + a_clicks_pivot[False])

print(a_clicks_pivot)

# Task 10 (continued): click fraction by day for group B
b_clicks_pivot = b_clicks\
    .groupby(['is_click', 'day'])\
    .user_id.count()\
    .reset_index()\
    .pivot(index='day',
           columns='is_click',
           values='user_id')\
    .reset_index()

b_clicks_pivot['percent_clicked'] = \
    b_clicks_pivot[True] / (b_clicks_pivot[True] + b_clicks_pivot[False])

print(b_clicks_pivot)
```