In [1]:
import pandas as pd
df = pd.read_csv('Leads.csv')

In [2]:
df.head(2)

Unnamed: 0,Prospect ID,Lead Number,Lead Origin,Lead Source,Do Not Email,Do Not Call,Converted,TotalVisits,Total Time Spent on Website,Page Views Per Visit,...,Get updates on DM Content,Lead Profile,City,Asymmetrique Activity Index,Asymmetrique Profile Index,Asymmetrique Activity Score,Asymmetrique Profile Score,I agree to pay the amount through cheque,A free copy of Mastering The Interview,Last Notable Activity
0,7927b2df-8bba-4d29-b9a2-b6e0beafe620,660737,API,Olark Chat,No,No,0,0.0,0,0.0,...,No,Select,Select,02.Medium,02.Medium,15.0,15.0,No,No,Modified
1,2a272436-5132-4136-86fa-dcc88c88f482,660728,API,Organic Search,No,No,0,5.0,674,2.5,...,No,Select,Select,02.Medium,02.Medium,15.0,15.0,No,No,Email Opened


In [8]:
df[df['Lead Source']=='Organic Search']['Converted'].sum()

436

In [9]:
df[df['Lead Source']=='Organic Search']['TotalVisits'].sum()

6832.0

In [12]:
round(436/6832,2)

0.06

```
Historically, the conversion rate for Organic Search is about 6% (436 conversions over 6,832 total visits). In a recent random sample of 120 leads from Organic Search, the performance marketing team found that 12 had converted.

Do we have reason to believe that the conversion rate has increased?

```

```
H0: Conversion Rate = 6%
Ha: Conversion Rate > 6%

X = number of conversions in the sample; under H0, X ~ Binomial(n=120, p=0.06)
pval = Pr(X>=12) = 1 - Pr(X<=11)
```

In [6]:
from scipy.stats import binom

In [13]:
1 - binom.cdf(11,120,0.06)

0.05703482551930916
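
The tail probability above can be cross-checked with the standard library alone. The sketch below sums the binomial pmf directly; the helper name `binom_upper_tail` and the 5% significance level `ALPHA` are assumptions for illustration, since the notebook does not state a threshold.

```python
import math

def binom_upper_tail(k, n, p):
    """Return Pr(X >= k) for X ~ Binomial(n, p) by summing the pmf directly."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

ALPHA = 0.05  # assumed significance level; not stated in the notebook

# 12 conversions in a sample of 120, tested against the historical 6% rate
p_value = binom_upper_tail(12, 120, 0.06)
reject_h0 = p_value < ALPHA

print(f"p-value = {p_value:.4f}, reject H0 at alpha={ALPHA}: {reject_h0}")
```

Under the assumed 5% level, a p-value of about 0.057 falls just short of significance, so H0 would not be rejected.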

### Historical conversion rate for Lead Origin = API

In [15]:
# Flag leads whose origin is API (1) versus any other origin (0)
df['Lead API'] = (df['Lead Origin'] == 'API').astype(int)

In [16]:
df[df['Lead API']==1]['Converted'].sum()

1115

In [18]:
df[df['Lead API']==1]['TotalVisits'].sum()

7912.0

In [19]:
round(1115/7912,2)

0.14

```
Now imagine that out of 200 visits with Lead Origin = API there are 36 conversions. Does that mean the conversion rate for API has increased?
```

```
H0: Rate = 14%
Ha: Rate > 14%

X = number of conversions in 200 visits; under H0, X ~ Binomial(n=200, p=0.14)
pval = Pr(X>=36) = 1 - Pr(X<=35)
```


In [20]:
1 - binom.cdf(35,200,0.14)

0.06676103717002435
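
A related question is how many conversions out of 200 would be needed before the result counts as significant. The sketch below searches for that critical count using only the standard library; `binom_upper_tail`, `critical_count`, and the 5% level `ALPHA` are hypothetical names and an assumed threshold, not part of the original notebook.

```python
import math

def binom_upper_tail(k, n, p):
    """Return Pr(X >= k) for X ~ Binomial(n, p) by summing the pmf directly."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def critical_count(n, p, alpha):
    """Return the smallest k with Pr(X >= k) < alpha under H0, or None if none exists."""
    for k in range(n + 1):
        if binom_upper_tail(k, n, p) < alpha:
            return k
    return None

ALPHA = 0.05            # assumed significance level; not stated in the notebook
N_VISITS, P0 = 200, 0.14  # sample size and historical rate from the example above

k_crit = critical_count(N_VISITS, P0, ALPHA)
print(f"critical count = {k_crit}, tail probability = {binom_upper_tail(k_crit, N_VISITS, P0):.4f}")
```

Because the observed 36 conversions give a p-value of about 0.067, the critical count under this assumed threshold is necessarily higher than 36.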

### Number of pages visited

In [24]:
df['Page Views Per Visit'].mean().round(2)

2.36

```
A sample of 40 leads has an average of 2.42 page views per visit, with a standard deviation of 0.4.

Has the number of pages visited increased?

```

```
H0: There is no change in the number of pages visited
Ha: The number of pages visited has increased

H0: Mean = 2.36
Ha: Mean > 2.36

SE (standard deviation of the sampling distribution) = sample_std/sqrt(sample_size)
Under H0, the sample mean is approximately Normal(2.36, SE)
P-value: Pr(sample mean >= 2.42) = 1 - Pr(sample mean <= 2.42)
```

In [1]:
import math
from scipy.stats import norm

In [2]:
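pop_mean = 2.36      # historical mean page views per visit (value under H0)
sample_mean = 2.42   # mean observed in the sample
sample_std = 0.4     # sample standard deviation
sample_size = 40
SE = sample_std/math.sqrt(sample_size)  # standard error of the sample mean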

In [3]:
1-norm(pop_mean,SE).cdf(sample_mean)

0.1713908555739555
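
The same p-value can be reached by standardizing first, which makes the size of the effect easier to read. This is a minimal sketch using the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`; the inputs are the values from the cells above, and the variable names `se` and `z` are chosen here for illustration.

```python
import math
from statistics import NormalDist

pop_mean = 2.36      # mean under H0
sample_mean = 2.42   # observed sample mean
sample_std = 0.4     # sample standard deviation
sample_size = 40

# Standard error of the sample mean
se = sample_std / math.sqrt(sample_size)

# z-statistic: how many standard errors the sample mean lies above the H0 mean
z = (sample_mean - pop_mean) / se

# One-sided (upper-tail) p-value from the standard normal distribution
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
# prints: z = 0.949, p-value = 0.1714
```

The sample mean sits less than one standard error above 2.36, which is why the p-value is large.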

```
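Conclusion
The p-value (about 0.17) is well above the conventional 0.05 significance level, so we cannot reject H0: the sample gives no evidence that the number of pages visited has increased.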
```