In [0]:
from scipy.stats import mannwhitneyu, wilcoxon, kruskal, friedmanchisquare
import pandas as pd
import numpy as np
from pingouin import friedman

### Problem
Determine whether there is a
difference in the average income of families who view PBS television and families who do
not. Suppose a sample of 14 families that have identified themselves
as PBS television viewers and a sample of 13 families that have identified themselves as
non-PBS television viewers are selected randomly.

### Solution
The population distribution is not known and two independent samples are being compared, so the Mann-Whitney U test is used.

### Hypothesis Formulation
Null: The incomes of PBS and non-PBS viewers are identical.<br>
Alternate: The incomes of PBS and non-PBS viewers are not identical.

In [0]:
pbs = [24500,39400,36800,43000,57960,32000,61000,34000,43500,55000,39000,62500,61400,53000]
non_pbs =[41000,32500,33000,21000,40500,32400,16000,21500,39500,27600,43500,51900,27800]

In [0]:
stat, p_val = mannwhitneyu(pbs,non_pbs,alternative='two-sided')
stat, p_val

(140.5, 0.017399993354093264)

In [0]:
p_val < 0.05

True

**Since p ≈ 0.017 is below 0.05, the null hypothesis is rejected: there is a difference between the income of a PBS viewer
and that of a non-PBS viewer.**
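To make the statistic of 140.5 reported above less of a black box, here is a minimal sketch that recomputes U by direct pair counting, using only the standard library. The helper name `u_statistic` is illustrative and not part of SciPy. It assumes the usual definition: U for a sample is the number of (x, y) pairs in which x exceeds y, with each tie counted as half.

```python
pbs = [24500, 39400, 36800, 43000, 57960, 32000, 61000,
       34000, 43500, 55000, 39000, 62500, 61400, 53000]
non_pbs = [41000, 32500, 33000, 21000, 40500, 32400, 16000,
           21500, 39500, 27600, 43500, 51900, 27800]


def u_statistic(x, y):
    """Count the pairs in which a value from x exceeds a value from y.

    A tied pair contributes 0.5. (Hypothetical helper for illustration.)
    """
    return sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in x for b in y)


u_pbs = u_statistic(pbs, non_pbs)       # U for the PBS sample
u_non_pbs = u_statistic(non_pbs, pbs)   # U for the non-PBS sample

# The two U values always add up to the total number of pairs (14 * 13 = 182).
print(u_pbs, u_non_pbs, len(pbs) * len(non_pbs))
```

A PBS family out-earns a non-PBS family in roughly 140 of the 182 possible pairings, which is the imbalance the test flags as significant.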

### Problem
Suppose a company implemented a quality-control program
and has been operating under it for 2 years. The company’s president wants to
determine whether worker productivity significantly increased since installation of
the program. Company records contain the figures for items produced per worker
during a sample of production runs 2 years ago. Productivity figures on the same
workers are gathered now and compared to the previous figures. The following data
represent items produced per hour.

### Solution
The population distribution is not known and two related samples are being compared, so the Wilcoxon matched-pairs signed rank test is used.

### Hypothesis Formulation
Null: There is no difference in productivity.<br>
Alternate: There is a difference in productivity.

In [0]:
# Before and after data for the same 20 workers, listed in the same order
before = [5,4,9,6,3,8,7,10,3,7,2,5,4,5,8,7,9,5,4,3]
after = [11,9,9,8,5,7,9,9,7,9,6,10,9,7,9,6,10,8,5,6]

In [0]:
stats, pval = wilcoxon(before,after)
stats, pval

(10.5, 0.0006224110091149373)

In [0]:
pval < 0.05

True

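**The null hypothesis is rejected (p ≈ 0.0006). Since 16 of the 20 workers produced more items per hour afterwards,
the productivity is significantly greater after the implementation
of quality control at this company.**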
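The statistic of 10.5 can be reproduced by hand, which also shows the direction of the change. This is a minimal standard-library sketch, not SciPy's implementation. It assumes the default behaviour of `wilcoxon`, where zero differences are dropped and the two-sided statistic is the smaller of the two signed rank sums. The helper `average_ranks` is a hypothetical name introduced for illustration.

```python
before = [5, 4, 9, 6, 3, 8, 7, 10, 3, 7, 2, 5, 4, 5, 8, 7, 9, 5, 4, 3]
after = [11, 9, 9, 8, 5, 7, 9, 9, 7, 9, 6, 10, 9, 7, 9, 6, 10, 8, 5, 6]


def average_ranks(values):
    """Rank values from 1; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Paired differences (before - after), with zero differences discarded.
diffs = [b - a for b, a in zip(before, after) if b != a]
ranks = average_ranks([abs(d) for d in diffs])

w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)   # before > after
w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)  # after > before
statistic = min(w_plus, w_minus)

improved = sum(a > b for b, a in zip(before, after))
print(statistic, w_plus, w_minus, improved)
```

Almost all of the rank weight sits on the side where `after` exceeds `before`, which is why the two-sided result is read as an increase.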

### Problem
Suppose a researcher wants to determine whether the number of physicians in an
office produces significant differences in the number of office patients seen by each physician
per day. She takes a random sample of physicians from practices in which (1) there are
only two partners, (2) there are three or more partners, or (3) the office is a health maintenance
organization (HMO).

### Solution
The population distribution is not known and three independent samples are being compared, so the Kruskal-Wallis test is used.

### Hypothesis Formulation
Null: There is no difference in the number of patients seen by each physician.<br>
Alternate: There is at least one difference in the number of patients seen by each physician.

In [0]:
two_partner = [13,15,20,18,23]
three_plus = [24,16,19,22,25,14,17]
hmo = [26,22,31,27,28,33]

In [0]:
stats,pval = kruskal(two_partner,three_plus,hmo)
stats,pval

(9.570988292011025, 0.008349996610851422)

In [0]:
pval < 0.05

True

**Since p ≈ 0.008 is below 0.05, the null hypothesis is rejected: the number of patients seen in the office by a physician is not the same
in these three sizes of offices.**
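As a cross-check on the `kruskal` output, the H statistic can be rebuilt from pooled ranks. The sketch below uses only the standard library and assumes the textbook formula with the usual tie correction. The names `average_ranks`, `groups`, and `h_corrected` are illustrative. With three groups there are 2 degrees of freedom, and for that case the chi-square upper tail has the closed form exp(-H/2), so no statistics library is needed for the p-value.

```python
import math
from collections import Counter

groups = {
    "two_partner": [13, 15, 20, 18, 23],
    "three_plus": [24, 16, 19, 22, 25, 14, 17],
    "hmo": [26, 22, 31, 27, 28, 33],
}


def average_ranks(values):
    """Rank values from 1; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Rank all observations together, then sum the ranks within each group.
pooled = [v for g in groups.values() for v in g]
ranks = average_ranks(pooled)
n_total = len(pooled)

rank_sums = {}
start = 0
for name, g in groups.items():
    rank_sums[name] = sum(ranks[start:start + len(g)])
    start += len(g)

h = (12 / (n_total * (n_total + 1))
     * sum(rank_sums[name] ** 2 / len(g) for name, g in groups.items())
     - 3 * (n_total + 1))

# Tie correction: each run of t tied values contributes t^3 - t.
tie_sizes = Counter(pooled).values()
correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
h_corrected = h / correction

# Chi-square upper tail for df = 2 only; other df need a different formula.
p_value = math.exp(-h_corrected / 2)
print(rank_sums, h_corrected, p_value)
```

The HMO group carries by far the largest rank sum relative to its size, which is what drives the rejection.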

### Problem
As an example, suppose a manufacturing company assembles microcircuits that contain
a plastic housing. Managers are concerned about an unacceptably high number of the
products that sustained housing damage during shipment. The housing component is
made by four different suppliers. Managers have decided to conduct a study of the plastic
housing by randomly selecting five housings made by each of the four suppliers. To determine
whether a supplier is consistent during the production week, one housing is selected
for each day of the week. That is, for each supplier, a housing made on Monday is selected,
one made on Tuesday is selected, and so on. The quality control team wants to determine whether there is any significant
difference in the tensile strength of the plastic housing by supplier. The data are given here
(in pounds per inch).

### Solution
The population distribution is not known and treatment levels must be compared in the presence of a blocking variable. The treatment is the supplier and the block is the day of the week, so the Friedman test is used.

### Hypothesis Formulation
Null: The supplier populations are equal.<br>
Alternate: At least one supplier population yields larger values than at least one other.

In [0]:
data = [[62,63,57,61],
       [63,61,59,65],
       [61,62,56,63],
       [62,60,57,64],
       [64,63,58,66]]

df = pd.DataFrame(data,columns=['Supplier_1','Supplier_2','Supplier_3','Supplier_4'],index=['Mon','Tue','Wed','Thur','Fri'])
df

,Supplier_1,Supplier_2,Supplier_3,Supplier_4
Mon,62,63,57,61
Tue,63,61,59,65
Wed,61,62,56,63
Thur,62,60,57,64
Fri,64,63,58,66


In [0]:
df.T

,Mon,Tue,Wed,Thur,Fri
Supplier_1,62,63,61,62,64
Supplier_2,63,61,62,60,63
Supplier_3,57,59,56,57,58
Supplier_4,61,65,63,64,66


In [0]:
matrix = df.T.values
matrix

array([[62, 63, 61, 62, 64],
       [63, 61, 62, 60, 63],
       [57, 59, 56, 57, 58],
       [61, 65, 63, 64, 66]])

In [0]:
# Each row of matrix holds one supplier's five daily measurements
stat, pval = friedmanchisquare(*matrix)
stat, pval

(10.679999999999993, 0.0135882729582177)

In [0]:
pval < 0.05

True

In [0]:
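# Reshape to long format for pingouin: one row per (supplier, day) measurement.
# matrix has one row per supplier, so flatten() lists each supplier's five days in order.
supplier = ['Supplier_1'] * 5 + ['Supplier_2'] * 5 + ['Supplier_3'] * 5 + ['Supplier_4'] * 5
Dof = ['Mon','Tue','Wed','Thur','Fri'] * 4  # day of the week (the blocking variable)
data = matrix.flatten()
new_df = pd.DataFrame({'Suppliers':supplier,'DOF':Dof,'Tensile_strength':data})
new_df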

,Suppliers,DOF,Tensile_strength
0,Supplier_1,Mon,62
1,Supplier_1,Tue,63
2,Supplier_1,Wed,61
3,Supplier_1,Thur,62
4,Supplier_1,Fri,64
5,Supplier_2,Mon,63
6,Supplier_2,Tue,61
7,Supplier_2,Wed,62
8,Supplier_2,Thur,60
9,Supplier_2,Fri,63


In [0]:
friedman(data=new_df, dv='Tensile_strength', within='Suppliers',subject='DOF')

,Source,ddof1,Q,p-unc
Friedman,Suppliers,3,10.68,0.013588


**The decision is to reject the null hypothesis. Statistically, there is a significant difference in the tensile strength of housings
made by different suppliers.**
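The Friedman statistic of 10.68 can be reproduced by ranking the four suppliers within each day and comparing the rank sums. The following is a minimal standard-library sketch under the textbook formula without a tie correction, which is adequate here because no day contains tied measurements. The names `strength`, `average_ranks`, and `rank_sums` are illustrative.

```python
suppliers = ["Supplier_1", "Supplier_2", "Supplier_3", "Supplier_4"]
strength = {  # one block (day) per entry, values in supplier order
    "Mon": [62, 63, 57, 61],
    "Tue": [63, 61, 59, 65],
    "Wed": [61, 62, 56, 63],
    "Thur": [62, 60, 57, 64],
    "Fri": [64, 63, 58, 66],
}


def average_ranks(values):
    """Rank values from 1; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


n = len(strength)    # number of blocks (days)
k = len(suppliers)   # number of treatments (suppliers)

# Rank the suppliers within each day, then total the ranks per supplier.
rank_sums = [0.0] * k
for day_values in strength.values():
    for j, r in enumerate(average_ranks(day_values)):
        rank_sums[j] += r

# Friedman statistic without tie correction (assumes no within-day ties).
q = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
print(dict(zip(suppliers, rank_sums)), q)
```

Supplier_3 is ranked lowest on every day and Supplier_4 highest on four of the five, so the rank sums are far from the equal split expected under the null hypothesis.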