## Analysis of the response to a new banner
- Users who answered 'Yes'
- Users who answered 'No'
- Users who did not answer

The description of the experiment leaves several points unclear:
- It does not say where the banner is placed (web page, Instagram, Facebook, ...).
- The banner may be a single interactive one with two actions, 'Yes' or 'No'.
- Alternatively, there may be two banners: a dummy one (an image?) and an interactive one (an animation?). (*)

(*) Judging by the column description, the second option (two banners) is the more likely one.

In [1]:
# Import the basic libraries
import pandas as pd
import numpy as np
# Import the Graph libraries
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
%matplotlib inline
# Import the Stat libraries
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import chi2_contingency

In [2]:
Ad_df = pd.read_csv('Data/AdSmartABdata.csv')

## Columns Description

* __auction_id:__ the unique id of the online user who has been presented the BIO. In standard terminology, this is called an impression id. The user may see the BIO questionnaire but choose not to respond. In that case, both the yes and no columns are zero.
* __experiment:__ which group the user belongs to - control or exposed.
    * __control:__ users who have been shown a dummy ad
    * __exposed:__ users who have been shown a creative, an online interactive ad, with the SmartAd brand.
* __date:__ the date in YYYY-MM-DD format
* __hour:__ the hour of the day in HH format.
* __device_make:__ the name of the type of device the user has e.g. Samsung
* __platform_os:__ the id of the OS the user has.
* __browser:__ the name of the browser the user uses to see the BIO questionnaire.
* __yes:__ 1 if the user chooses the “Yes” radio button for the BIO questionnaire.
* __no:__ 1 if the user chooses the “No” radio button for the BIO questionnaire.
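
Since 'yes' and 'no' are two indicator columns, the three groups named at the top of the notebook (Yes, No, no answer) can be collapsed into a single label. The following is a minimal sketch on toy rows invented for illustration; the frame `toy_df` and the column name `response` are assumptions, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy rows invented for illustration; the real data comes from AdSmartABdata.csv.
toy_df = pd.DataFrame({
    "experiment": ["exposed", "control", "exposed", "control"],
    "yes": [1, 0, 0, 0],
    "no": [0, 1, 0, 0],
})

# Sanity check: a row should never have both indicators set.
both_set = (toy_df["yes"] == 1) & (toy_df["no"] == 1)
assert not both_set.any()

# Collapse the two indicators into one label (hypothetical column "response").
toy_df["response"] = np.select(
    [toy_df["yes"] == 1, toy_df["no"] == 1],
    ["yes", "no"],
    default="no answer",
)

response_counts = toy_df["response"].value_counts()
print(response_counts)
```

The same two steps could be applied to `Ad_df` once it is loaded; the sanity check would then confirm that no row has both columns set before the label is trusted.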

In [3]:
Ad_df

Unnamed: 0,auction_id,experiment,date,hour,device_make,platform_os,browser,yes,no
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,0,1
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0
...,...,...,...,...,...,...,...,...,...
8072,ffea24ec-cec1-43fb-b1d1-8f93828c2be2,exposed,2020-07-05,7,Generic Smartphone,6,Chrome Mobile,0,0
8073,ffea3210-2c3e-426f-a77d-0aa72e73b20f,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0
8074,ffeaa0f1-1d72-4ba9-afb4-314b3b00a7c7,control,2020-07-04,9,Generic Smartphone,6,Chrome Mobile,0,0
8075,ffeeed62-3f7c-4a6e-8ba7-95d303d40969,exposed,2020-07-05,15,Samsung SM-A515F,6,Samsung Internet,0,0


In [4]:
Ad_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8077 entries, 0 to 8076
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   auction_id   8077 non-null   object
 1   experiment   8077 non-null   object
 2   date         8077 non-null   object
 3   hour         8077 non-null   int64 
 4   device_make  8077 non-null   object
 5   platform_os  8077 non-null   int64 
 6   browser      8077 non-null   object
 7   yes          8077 non-null   int64 
 8   no           8077 non-null   int64 
dtypes: int64(4), object(5)
memory usage: 568.0+ KB
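
The `info()` output shows that 'date' is stored as object and 'hour' as int64. Below is a minimal sketch of how the two could be combined into one timestamp, using three date/hour pairs taken from the rows displayed above. The names `sample` and `timestamp` are hypothetical, and the result is timezone-naive because the dataset does not say which clock the hour refers to.

```python
import pandas as pd

# Date/hour pairs copied from the first three rows of the displayed data frame.
sample = pd.DataFrame({
    "date": ["2020-07-10", "2020-07-07", "2020-07-05"],
    "hour": [8, 10, 2],
})

# Parse the date strings and add the hour as a time offset.
sample["timestamp"] = (
    pd.to_datetime(sample["date"], format="%Y-%m-%d")
    + pd.to_timedelta(sample["hour"], unit="h")
)

print(sample.dtypes)
print(sample)
```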


In [5]:
Ad_df.isnull().sum()

auction_id     0
experiment     0
date           0
hour           0
device_make    0
platform_os    0
browser        0
yes            0
no             0
dtype: int64

In [6]:
Ad_df['yes'].value_counts()

yes
0    7505
1     572
Name: count, dtype: int64

In [7]:
Ad_df['no'].value_counts()

no
0    7406
1     671
Name: count, dtype: int64

In [8]:
Ad_df.duplicated().sum()

np.int64(0)

In [28]:
counts = Ad_df['device_make'].value_counts()
counts

device_make
Generic Smartphone     4743
iPhone                  433
Samsung SM-G960F        203
Samsung SM-G973F        154
Samsung SM-G950F        148
                       ... 
D5803                     1
Samsung SM-G6100          1
HTC M10h                  1
Samsung SM-G925I          1
XiaoMi Redmi Note 5       1
Name: count, Length: 269, dtype: int64

In [10]:
Ad_df['platform_os'].value_counts()

platform_os
6    7648
5     428
7       1
Name: count, dtype: int64

In [11]:
Ad_df['browser'].value_counts()

browser
Chrome Mobile                 4554
Chrome Mobile WebView         1489
Samsung Internet               824
Facebook                       764
Mobile Safari                  337
Chrome Mobile iOS               51
Mobile Safari UI/WKWebView      44
Chrome                           3
Pinterest                        3
Opera Mobile                     3
Opera Mini                       1
Edge Mobile                      1
Android                          1
Firefox Mobile                   1
Puffin                           1
Name: count, dtype: int64

### Data exploration
- There are no NaN values in the data.
- 'platform_os' carries almost no information: 7648 of the 8077 rows share the value 6.
- The browser names indicate that the banner was shown to users on their smartphones.
- The low number of iPhones (433 of 8077 rows) suggests that the banner was shown worldwide, since in Europe and the USA the share of that brand is higher.
- The definition of 'hour' is incomplete: it does not say whether it is the server's time or the terminal's local time.

The data frame can be used directly for the study.
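
To put a number on how little 'platform_os' varies, the counts printed earlier can be turned into shares. This is a small sketch that re-enters the three counts by hand from the `value_counts()` output; the names `os_counts` and `os_share` are illustrative only.

```python
import pandas as pd

# Counts copied from the platform_os value_counts() output (6, 5 and 7).
os_counts = pd.Series({6: 7648, 5: 428, 7: 1}, name="count")

# Share of rows per platform_os value.
os_share = os_counts / os_counts.sum()

print(os_share.round(4))
```

On the loaded data, `Ad_df['platform_os'].value_counts(normalize=True)` would give the same shares directly.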

In [12]:
Ad_df['auction_id'].value_counts()

auction_id
0008ef63-77a7-448b-bd1e-075f42c55e39    1
aa14b324-5c46-4b3a-8e75-18d78968495b    1
aa84454c-a749-4c98-bf9f-1f99c04416af    1
aa6ecb40-6a48-4c06-a611-4c9aa9023ea8    1
aa6c6cda-e498-4e8f-b886-1d969bd376ea    1
                                       ..
56c87344-e876-41a3-9011-feb8f7e58cd5    1
56bf959a-642f-4814-bf08-55d634554d5a    1
56bd072c-a748-4355-b2d1-258d82d401b0    1
56bb25c7-f778-4690-90be-034b1982fe03    1
fffbb9ff-568a-41a5-a0c3-6866592f80d8    1
Name: count, Length: 8077, dtype: int64

In [14]:
# every auction_id appears only once above; confirm that the ids are unique
Ad_df['auction_id'].nunique() == len(Ad_df)

True

In [15]:
# observation of platform_os
Ad_df['platform_os'].value_counts()

platform_os
6    7648
5     428
7       1
Name: count, dtype: int64

Only 3 operating systems. 

In [22]:
len(Ad_df[Ad_df['platform_os']== 5])

428

In [23]:
Ad_df['device_make'].unique()

array(['Generic Smartphone', 'E5823', 'Samsung SM-A705FN',
       'Samsung SM-G960F', 'Samsung SM-G973F', 'iPhone',
       'Samsung SM-G935F', 'HTC One', 'LG-$2', 'Samsung SM-A202F',
       'XT1032', 'COL-L29', 'Samsung SM-N960U1', 'Samsung SM-A715F',
       'Samsung SM-G930F', 'I3312', 'Samsung SM-G950F', 'FIG-LX1',
       'Samsung SM-G920F', 'MRD-LX1', 'Samsung SM-N950F', 'Moto $2',
       'Samsung SM-G970F', 'Samsung GT-I9505', 'Samsung SM-G981B',
       'Pixel 3a', 'Samsung SM-J600FN', 'Samsung SM-A105FN',
       'OnePlus ONEPLUS A3003', 'POT-LX1', 'Samsung SM-G975F',
       'Samsung SM-J330FN', 'Samsung SM-G770F', 'H3311', 'MAR-LX1A',
       'HTC One $2', 'Samsung SM-G965F', 'ELE-L09', 'Samsung SM-J415FN',
       'Samsung SM-G900F', 'Lenovo A1010a20', 'CLT-L09', 'HTC Desire $2',
       'Samsung SM-G980F', 'Samsung SM-G955F', 'Samsung SM-N960F',
       'Nexus 5', 'Samsung SM-J260F', 'HTC U11', 'Samsung SM-A405FN',
       'Samsung SM-A600FN', 'ANE-LX1', 'VOG-L09', 'Samsung SM-G986B'

I will check only the models that appear in at least 40 rows.

In [33]:
reduced_df = Ad_df[Ad_df['device_make'].map(counts) > 39]

In [34]:
reduced_df['device_make'].value_counts()

device_make
Generic Smartphone     4743
iPhone                  433
Samsung SM-G960F        203
Samsung SM-G973F        154
Samsung SM-G950F        148
Samsung SM-G930F        100
Samsung SM-G975F         97
Samsung SM-A202F         88
Samsung SM-A405FN        87
Samsung SM-J330FN        69
Samsung SM-A105FN        66
Samsung SM-G965F         66
Nokia$2$3                64
Samsung SM-G935F         63
Nokia undefined$2$3      60
Samsung SM-G970F         58
Samsung SM-A705FN        56
Samsung SM-A505FN        53
Samsung SM-A520F         51
Samsung SM-G920F         48
LG-$2                    43
POT-LX1                  40
Name: count, dtype: int64
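
The filter works by mapping each row's model to its frequency and keeping the rows whose model is frequent enough. A minimal sketch on invented toy data shows the mechanism and an equivalent `groupby(...).transform("size")` form; the frame `toy_df` and the threshold `min_rows` are hypothetical (the notebook keeps models with at least 40 rows).

```python
import pandas as pd

# Toy data invented for illustration: model "A" has 3 rows, "B" has 2, "C" has 1.
toy_df = pd.DataFrame({"device_make": ["A", "A", "A", "B", "B", "C"]})
min_rows = 2  # hypothetical threshold for the toy example

# Variant 1: look up each row's model frequency in the value_counts() Series.
toy_counts = toy_df["device_make"].value_counts()
mask_map = toy_df["device_make"].map(toy_counts) >= min_rows

# Variant 2: compute the group size per row without an intermediate Series.
group_size = toy_df.groupby("device_make")["device_make"].transform("size")
mask_transform = group_size >= min_rows

toy_reduced = toy_df[mask_map]
print(toy_reduced["device_make"].value_counts())
```

Both masks select the same rows; the `map` variant reuses a `value_counts()` result that is already available, which is the approach taken in the notebook.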

In [37]:
len(Ad_df[Ad_df['device_make']== 'iPhone'])

433

In [38]:
len(Ad_df[Ad_df['platform_os']== 5])== len(Ad_df[Ad_df['device_make']== 'iPhone'])

False

So the number of iPhones (433) does not match the number of rows with 'platform_os' 5 (428), which is an inconsistency in the data.

## Study 

The different parts of the study will be included in other notebooks.
- 'auction_id' provides no information beyond unique identification.
- 'platform_os' introduces errors in the data (it is not fully consistent with 'device_make').

In [39]:
# Check the OS reported for the iPhones.
iPhone_df = Ad_df[Ad_df['device_make']== 'iPhone']

In [41]:
# show the iPhones that report a different OS
iPhone_df[iPhone_df['platform_os']!= 5]

Unnamed: 0,auction_id,experiment,date,hour,device_make,platform_os,browser,yes,no
794,19b0fc3d-a74b-48be-be3a-dd9aa39f5aab,control,2020-07-03,19,iPhone,6,Mobile Safari,0,1
3407,6dec4a8a-85eb-4252-b86a-31d9dd99dce0,exposed,2020-07-03,6,iPhone,6,Mobile Safari UI/WKWebView,0,0
3907,7dc79e4c-97d7-411a-bac9-aff78ba6af15,control,2020-07-06,22,iPhone,6,Mobile Safari,0,0
5400,aac6fd88-bc1b-40f2-b483-cc1d1479f6c1,control,2020-07-09,6,iPhone,6,Mobile Safari,0,0
7628,f17a54b0-8a9a-4a02-b816-085301c8ad98,exposed,2020-07-07,16,iPhone,6,Mobile Safari UI/WKWebView,0,0


In [42]:
len(Ad_df[Ad_df['platform_os']== 5])- len(Ad_df[Ad_df['device_make']== 'iPhone'])

-5

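The five iPhones listed above all report 'platform_os' 6 instead of 5, which accounts exactly for the difference of 5 rows. Assuming that 6 denotes Android, a possible explanation is that Android can be installed on an iPhone, which would resolve the inconsistency.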
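
The comparison between iPhone rows and 'platform_os' values can also be made in one step with a cross-tabulation. The sketch below uses toy rows invented for illustration (they are not taken from the dataset); the label column `device_group` is a hypothetical name.

```python
import pandas as pd

# Toy rows invented for illustration: three iPhones, one of them with OS id 6.
toy_df = pd.DataFrame({
    "device_make": ["iPhone", "iPhone", "iPhone",
                    "Generic Smartphone", "Samsung SM-G960F"],
    "platform_os": [5, 5, 6, 6, 6],
})

# Label each row as iPhone or other (hypothetical helper column).
device_group = (
    toy_df["device_make"].eq("iPhone")
    .map({True: "iPhone", False: "other"})
    .rename("device_group")
)

# Rows: device group; columns: platform_os; cells: number of rows.
os_table = pd.crosstab(device_group, toy_df["platform_os"])
print(os_table)
```

Any non-zero cell outside the expected pairing (here the iPhone row under OS id 6) points to the rows that need a closer look.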