## HYPOTHESIS TESTING WITH SCIPY










**Familiar is a promising startup in the new market of blood transfusion. However, it has fallen on tough times lately, so this project uses hypothesis testing to find meaningful insights about its products and inform its marketing strategy.**

In [8]:
from scipy.stats import ttest_1samp
from scipy.stats import ttest_ind
from scipy.stats import chi2_contingency

**Two transfusion packages are available to customers: the standard Vein Pack and the premium Artery Pack. First, it would be a marketing goldmine if we could show that the Vein Pack extends customers' lifespans.**

In [10]:
vein_pack_lifespans = [76.937674313716172, 75.993359130146814, 74.798150123540481, 
                       74.502021471585508, 77.48888897587436, 72.142565731540429,
                       75.993031671911822, 76.341550480952279, 77.484755629998816,
                       76.532101480086695, 76.255089552764176, 77.58398316566651, 
                       77.047370349622938, 72.874751745947108, 77.435045470028442, 
                       77.492341410789194, 78.326720468799522, 73.343702468870674, 
                       79.969157652363464, 74.838005833003251]

Let's find out whether the average lifespan of a Vein Pack subscriber is significantly different from the average life expectancy of 71 years.


## 1-Sample T-Test


If the test’s p-value is less than 0.05, print “The Vein Pack Is Proven To Make You Live Longer!”. Otherwise, print “The Vein Pack Is Probably Good For You Somehow!”.

In [11]:
# Two-sided one-sample t-test against the population mean of 71 years
tstat, pval = ttest_1samp(vein_pack_lifespans, 71)
vein_pack_test = pval

if vein_pack_test < 0.05:
    print('The Vein Pack Is Proven To Make You Live Longer!')
else:
    print('The Vein Pack Is Probably Good For You Somehow!')

The Vein Pack Is Proven To Make You Live Longer!
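
To see what `ttest_1samp` computes, the sketch below rebuilds the t-statistic and the two-sided p-value by hand. The helper name `one_sample_t` is hypothetical and introduced only for illustration; it assumes the sample standard deviation uses n - 1 in the denominator, which is what `statistics.stdev` does.

```python
import math
from statistics import mean, stdev

from scipy import stats


def one_sample_t(sample, popmean):
    """Return (t_statistic, two_sided_p_value) for a one-sample t-test.

    Hypothetical helper for illustration; not part of SciPy.
    """
    n = len(sample)
    # Standard error of the mean, using the n - 1 sample standard deviation
    standard_error = stdev(sample) / math.sqrt(n)
    t_statistic = (mean(sample) - popmean) / standard_error
    # Two-sided p-value from the t distribution with n - 1 degrees of freedom
    p_value = 2 * stats.t.sf(abs(t_statistic), df=n - 1)
    return t_statistic, p_value
```

Calling `one_sample_t(vein_pack_lifespans, 71)` should agree with the `ttest_1samp` result above up to floating-point rounding.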


## Upselling Familiar: Pumping Life Into The Company


**In order to differentiate Familiar’s different product lines, let's compare this lifespan data between different packages. The next step up from the Vein Pack is the Artery Pack.**

In [1]:
artery_pack_lifespans = [76.335370084268348, 76.923082315590619, 75.952441644877794, 
                         74.544983480720305, 76.404504275447195, 73.079248886365761, 
                         77.023544610529925, 74.117420420068797, 77.38650656208344, 
                         73.044765837189928, 74.963118508661665, 73.319543019334859, 
                         75.857401376968625, 76.152653513512547, 73.355102863226705, 
                         73.902212564587884, 73.771211950924751, 68.314898302855781, 
                         74.639757177753282, 78.385477308439789]



**Now we want to show that Artery Pack subscribers experience a significant improvement even beyond the benefits a Vein Pack subscriber gets.**



## 2-Sample T-Test


**If the p-value from our experiment is less than 0.05, the results are significant and we should print out “the Artery Package guarantees even stronger results!”. Otherwise we should print out “the Artery Package is also a great product!”**



In [12]:
# Two-sample t-test comparing the mean lifespans of the two packages
package_comparison_results = ttest_ind(artery_pack_lifespans, vein_pack_lifespans)

if package_comparison_results.pvalue < 0.05:
    print('the Artery Package guarantees even stronger results!')
else:
    print('the Artery Package is also a great product!')

the Artery Package is also a great product!
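
A p-value alone does not say which direction the difference goes, and by default `ttest_ind` assumes both groups share the same variance. The sketch below reports the difference in sample means alongside the default test and Welch's variant (`equal_var=False`). The helper name `compare_packages` and its dictionary keys are hypothetical, chosen only for this example.

```python
from statistics import mean

from scipy.stats import ttest_ind


def compare_packages(sample_a, sample_b, alpha=0.05):
    """Summarize a two-sample comparison (hypothetical helper)."""
    # Default Student's t-test: assumes equal population variances
    student = ttest_ind(sample_a, sample_b)
    # Welch's t-test: drops the equal-variance assumption
    welch = ttest_ind(sample_a, sample_b, equal_var=False)
    return {
        # Positive when sample_a has the larger mean, negative otherwise
        "mean_difference": mean(sample_a) - mean(sample_b),
        "student_pvalue": float(student.pvalue),
        "welch_pvalue": float(welch.pvalue),
        "significant": bool(student.pvalue < alpha),
    }
```

For example, `compare_packages(artery_pack_lifespans, vein_pack_lifespans)` returns a negative `mean_difference` if the Artery Pack sample mean is the lower of the two, which is worth checking before writing any "stronger results" copy.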


**Well, the test shows that the Artery Pack doesn't guarantee stronger results. Hence, Familiar can't claim a significant difference in lifespan between the two packages.**

**If users' lifespans aren't significantly increased by signing up for the Artery Package, maybe the company can make some other claim about the package's benefits. To that end, the company sent out a survey collecting customers' iron levels and grouped the responses into “low”, “normal”, and “high”.**

**They received 200 responses from Vein Package subscribers: 70% had low iron levels, 20% had normal, and 10% had high.
They were only able to get 145 responses from Artery Package subscribers, but only 20% of them had low iron levels; 60% had normal and 20% had high.**
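
The survey percentages above can be turned into the counts a contingency table needs with a little arithmetic. This is a minimal sketch; the helper name `counts_from_percentages` and the variable names are hypothetical, and it assumes the reported percentages are exact whole numbers.

```python
def counts_from_percentages(total, percentages):
    """Convert whole-number percentages of `total` responses into counts."""
    return [round(total * pct / 100) for pct in percentages]


# Percentages in the order low, normal, high, as reported in the survey
vein_counts = counts_from_percentages(200, [70, 20, 10])
artery_counts = counts_from_percentages(145, [20, 60, 20])

# One row per iron level, one column per package (Vein, Artery)
iron_table = [list(row) for row in zip(vein_counts, artery_counts)]
print(iron_table)  # [[140, 29], [40, 87], [20, 29]]
```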

In [16]:
# Rows: low, normal, high iron; columns: Vein Package, Artery Package
iron_contingency_table = [[140, 29],
                          [40, 87],
                          [20, 29]]

Now we need to tell whether the iron-level distribution reported by Artery Package users, which seems to include far fewer low readings, differs significantly from the one reported by Vein Package users.

## Chi-Square Test


**Here’s the big moment: if iron_pvalue is less than 0.05, print out “The Artery Package Is Proven To Make You Healthier!”. Otherwise, we’ll have to use our other marketing copy: “While We Can’t Say The Artery Package Will Help You, I Bet It’s Nice!”**

In [19]:
# Chi-square test of independence between package and iron level
chi2, iron_pvalue, dof, expected = chi2_contingency(iron_contingency_table)

if iron_pvalue < 0.05:
    print('The Artery Package Is Proven To Make You Healthier!')
else:
    print("While We Can't Say The Artery Package Will Help You, I Bet It's Nice!")

The Artery Package Is Proven To Make You Healthier!
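
To make the chi-square result less of a black box, the sketch below recomputes the expected counts, the test statistic, and the p-value from the row and column totals. The helper names `expected_counts` and `manual_chi_square` are hypothetical. It applies no continuity correction, which matches `chi2_contingency` here only because that function's default correction is applied solely to tables with one degree of freedom.

```python
from scipy import stats


def expected_counts(table):
    """Expected cell counts if row and column categories were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def manual_chi_square(table):
    """Return (statistic, p_value, degrees_of_freedom) without correction."""
    expected = expected_counts(table)
    # Sum of (observed - expected)^2 / expected over every cell
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    # Upper-tail probability of the chi-square distribution
    p_value = stats.chi2.sf(statistic, dof)
    return statistic, p_value, dof
```

Calling `manual_chi_square(iron_contingency_table)` should reproduce the `chi2`, `iron_pvalue`, and `dof` values from the cell above, and `expected_counts(iron_contingency_table)` should match `expected`.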




**Fantastic! We can show that even though the Artery Pack won't increase users' lifespans more than the Vein Pack, it is associated with healthier iron levels. Now, with demonstrated benefits for both product lines, we can ramp up Familiar's marketing and sales.**