# Ad-click Analysis

## Goal

The advertising data table tracks ad clicks across **30 different colors**. Our aim is to discover an ad color that generates significantly more clicks than <span style="color:blue; font-weight:bold">blue</span>.  
We will do so by following these steps:

---
## Plan

1. Load and clean our advertising data using Pandas.
2. Measure the centrality and dispersion of the sampled data.
3. Run a permutation test between blue and the other recorded colors.
4. Check the computed p-values for statistical significance using a properly determined significance level.
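Step 3 of the plan can be sketched ahead of time. The block below is a minimal illustration of a two-sided permutation test on the difference in means, not necessarily the exact procedure applied later in this analysis. The function name `permutation_test`, its parameters, and the two example samples are hypothetical and are not taken from the ad table.

```python
import numpy as np

def permutation_test(sample_a, sample_b, num_permutations=10000, seed=0):
    """Estimate a two-sided p-value for the difference in sample means."""
    rng = np.random.default_rng(seed)
    sample_a = np.asarray(sample_a, dtype=float)
    sample_b = np.asarray(sample_b, dtype=float)
    # Observed absolute difference between the two sample means.
    observed = abs(sample_a.mean() - sample_b.mean())
    pooled = np.concatenate([sample_a, sample_b])
    extreme_count = 0
    for _ in range(num_permutations):
        # Shuffle the pooled values and split them back into two groups.
        shuffled = rng.permutation(pooled)
        new_a = shuffled[:sample_a.size]
        new_b = shuffled[sample_a.size:]
        if abs(new_a.mean() - new_b.mean()) >= observed:
            extreme_count += 1
    # Fraction of shuffles at least as extreme as the observed difference.
    return extreme_count / num_permutations

# Hypothetical daily click counts for two colors (illustration only).
example_a = [28, 30, 25, 27, 31]
example_b = [40, 42, 39, 41, 43]
p_value = permutation_test(example_a, example_b)
print(f"Estimated p-value: {p_value}")
```

A small p-value suggests that the gap between the two means is unlikely to arise from random relabeling alone. Whether it counts as significant depends on the significance level chosen in step 4.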

## Packages

In [7]:
import pandas as pd
import numpy as np

## Initial Data Check

- **Column 1: Color**  
  Each row in the column corresponds to one of 30 possible text colors.

- **Column 2: Click Count: Day 1**  
  The column tallies the number of times each colored ad was clicked on Day 1 of the experiment.

- **Column 3: View Count: Day 1**  
  The column tallies the number of times each ad was viewed on Day 1 of the experiment.  
  According to the experiment's design, all daily views are expected to equal **100**.

- **Remaining Columns**  
  The next **38 columns** contain:
  - Click counts per day, and  
  - View counts per day  
  for the remaining **19 days** of the experiment.
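The layout described above implies 1 + 2 × 20 = 41 columns. As a small sketch, the expected column names can be generated from that description. The names `num_days` and `expected_cols` are introduced here only for illustration.

```python
# Build the column names implied by the description:
# 'Color', then one click column and one view column per day.
num_days = 20
expected_cols = ['Color']
for day in range(1, num_days + 1):
    expected_cols.append(f'Click Count: Day {day}')
    expected_cols.append(f'View Count: Day {day}')

print(f"Expected number of columns: {len(expected_cols)}")
```

Once the table is loaded, comparing `list(df.columns)` with `expected_cols` would confirm that the file matches this description.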


In [3]:
df = pd.read_csv("colored_ad_click_table.csv")
num_rows, num_cols = df.shape
print(f"Table contains {num_rows} rows and {num_cols} cols")
print(df.columns)

Table contains 30 rows and 41 cols
Index(['Color', 'Click Count: Day 1', 'View Count: Day 1',
       'Click Count: Day 2', 'View Count: Day 2', 'Click Count: Day 3',
       'View Count: Day 3', 'Click Count: Day 4', 'View Count: Day 4',
       'Click Count: Day 5', 'View Count: Day 5', 'Click Count: Day 6',
       'View Count: Day 6', 'Click Count: Day 7', 'View Count: Day 7',
       'Click Count: Day 8', 'View Count: Day 8', 'Click Count: Day 9',
       'View Count: Day 9', 'Click Count: Day 10', 'View Count: Day 10',
       'Click Count: Day 11', 'View Count: Day 11', 'Click Count: Day 12',
       'View Count: Day 12', 'Click Count: Day 13', 'View Count: Day 13',
       'Click Count: Day 14', 'View Count: Day 14', 'Click Count: Day 15',
       'View Count: Day 15', 'Click Count: Day 16', 'View Count: Day 16',
       'Click Count: Day 17', 'View Count: Day 17', 'Click Count: Day 18',
       'View Count: Day 18', 'Click Count: Day 19', 'View Count: Day 19',
       'Click Count: Day 20', 'View Count: Day 20'],
      dtype='object')

In [4]:
print(df.Color.values)


['Pink' 'Gray' 'Sapphire' 'Purple' 'Coral' 'Olive' 'Navy' 'Maroon' 'Teal'
 'Cyan' 'Orange' 'Black' 'Tan' 'Red' 'Blue' 'Brown' 'Turquoise' 'Indigo'
 'Gold' 'Jade' 'Ultramarine' 'Yellow' 'Virdian' 'Violet' 'Green'
 'Aquamarine' 'Magenta' 'Silver' 'Bronze' 'Lime']


In [5]:
selected_cols = ['Color','Click Count: Day 1','View Count: Day 1']
print(df[selected_cols].describe())

       Click Count: Day 1  View Count: Day 1
count           30.000000               30.0
mean            23.533333              100.0
std              7.454382                0.0
min             12.000000              100.0
25%             19.250000              100.0
50%             24.000000              100.0
75%             26.750000              100.0
max             49.000000              100.0


In [10]:
view_cols = [column for column in df.columns if 'View' in column]
assert np.all(df[view_cols].values == 100)

Each color receives 100 views per day. Therefore, all 20 View Count columns are redundant.  
Let's remove them.

In [11]:
df.drop(columns=view_cols, inplace=True)
print(df.columns)

Index(['Color', 'Click Count: Day 1', 'Click Count: Day 2',
       'Click Count: Day 3', 'Click Count: Day 4', 'Click Count: Day 5',
       'Click Count: Day 6', 'Click Count: Day 7', 'Click Count: Day 8',
       'Click Count: Day 9', 'Click Count: Day 10', 'Click Count: Day 11',
       'Click Count: Day 12', 'Click Count: Day 13', 'Click Count: Day 14',
       'Click Count: Day 15', 'Click Count: Day 16', 'Click Count: Day 17',
       'Click Count: Day 18', 'Click Count: Day 19', 'Click Count: Day 20'],
      dtype='object')


Our 20 Click Count columns correspond to the number of clicks per 100 daily views, so we can treat these columns as percentages.
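The reasoning is simple arithmetic: with exactly 100 views per day, clicks / views × 100 equals the click count itself. The sketch below shows this on a small hypothetical table. The `toy` values, the color label `Other`, and the variable names are made up for illustration and are not part of the real data.

```python
import pandas as pd

# Hypothetical click counts for two colors over three days.
toy = pd.DataFrame(
    {
        'Click Count: Day 1': [25, 40],
        'Click Count: Day 2': [30, 35],
        'Click Count: Day 3': [20, 45],
    },
    index=['Blue', 'Other'],
)

views_per_day = 100  # every ad is viewed 100 times per day

# Click rate as a fraction of views, averaged across days for each color.
click_rate = toy / views_per_day
mean_rate = click_rate.mean(axis=1)
print(mean_rate)
```

Because the denominator is always 100, the mean click count of a color can be read directly as its mean click percentage.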

In [19]:
df.set_index('Color', inplace=True)
print(df.T.Blue.describe())

count    20.000000
mean     28.350000
std       5.499043
min      18.000000
25%      25.750000
50%      27.500000
75%      30.250000
max      42.000000
Name: Blue, dtype: float64


The daily click percentages for blue range from 18% to 42%. The mean percent of clicks is 28.35%.  
  
How does this compare with the other 29 colors? Let's find out.