In [0]:
from scipy.stats import chisquare,chi2_contingency
import pandas as pd

### Problem
Dairies would like to know whether the sales of milk are distributed uniformly over a
year so they can plan for milk production and storage. A uniform distribution means
that the frequencies are the same in all categories. In this situation, the producers are
attempting to determine whether the amounts of milk sold are the same for each
month of the year. They ascertain the number of gallons of milk sold by sampling one large supermarket each month during a year, obtaining the following data.

In [0]:
data = {'Jan':1610,'Feb':1585,'Mar':1649,'Apr':1590,'May':1540,'Jun':1397,
        'Jul':1410,'Aug':1350,'Sep':1495,'Oct':1564,'Nov':1602,'Dec':1655}

### Hypothesis Formulation

Null: The monthly figures for milk sales are uniformly distributed.<br>
Alternate: The monthly figures for milk sales are not uniformly distributed.<br>

Statistical test used: chi-square goodness of fit.

In [0]:
chi,pval = chisquare(list(data.values()))
chi,pval

(74.37583346885673, 1.78545252783034e-11)

In [0]:
alpha = 0.1  # significance level used for the decision
if pval < alpha:
  print('Reject Null')
else:
  print('Fail to Reject Null')

Reject Null
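
As a cross-check on the result above, the goodness-of-fit statistic can be recomputed by hand. This is a minimal sketch that assumes the null of equal expected sales in every month (the default that `chisquare` uses when no expected frequencies are passed). The names `observed`, `expected`, `statistic`, `dof`, and `p_value` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.stats import chi2

# Monthly gallons sold, copied from the data cell above (Jan through Dec)
observed = np.array([1610, 1585, 1649, 1590, 1540, 1397,
                     1410, 1350, 1495, 1564, 1602, 1655])

# Under the null hypothesis every month has the same expected count
expected = np.full(observed.shape, observed.sum() / observed.size)

# Pearson chi-square statistic: sum of (O - E)^2 / E over all categories
statistic = ((observed - expected) ** 2 / expected).sum()

# k categories give k - 1 degrees of freedom
dof = observed.size - 1

# Upper-tail probability of the chi-square distribution
p_value = chi2.sf(statistic, dof)

print(statistic, dof, p_value)
```

The statistic and p-value should agree with the values returned by `chisquare` above, up to floating-point rounding.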


### Problem
Suppose a business researcher wants to determine whether type of gasoline preferred
is independent of a person’s income. She takes a random survey of gasoline purchasers, asking
them one question about gasoline preference and a second question about income. The
respondent is to check whether he or she prefers (1) regular gasoline, (2) premium gasoline,
or (3) extra premium gasoline. The respondent also is to check his or her income
bracket as being (1) less than 30,000, (2) 30,000 to 49,999, (3) 50,000 to 99,999, or
(4) more than 100,000.

### Hypothesis Formulation
Null: Type of gasoline preferred is independent of income.<br>
Alternate: Type of gasoline preferred is not independent of income.

Because both variables are categorical, the chi-square test of independence is used.

In [0]:
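# One row per (income bracket, gasoline type) cell; '50_90k' labels the 50,000 to 99,999 bracket
income = ['Less_30k'] * 3 + ['30_50k'] * 3 + ['50_90k'] * 3 + ['100k_Above'] * 3
gas = ['Regular', 'Premium', 'Extra_Premium'] * 4
counts = [85, 16, 6, 102, 27, 13, 36, 22, 15, 15, 23, 25]
df = pd.DataFrame({'Income': income, 'Gas': gas, 'Count': counts})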
df

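|    | Income     | Gas           | Count |
|----|------------|---------------|-------|
| 0  | Less_30k   | Regular       | 85    |
| 1  | Less_30k   | Premium       | 16    |
| 2  | Less_30k   | Extra_Premium | 6     |
| 3  | 30_50k     | Regular       | 102   |
| 4  | 30_50k     | Premium       | 27    |
| 5  | 30_50k     | Extra_Premium | 13    |
| 6  | 50_90k     | Regular       | 36    |
| 7  | 50_90k     | Premium       | 22    |
| 8  | 50_90k     | Extra_Premium | 15    |
| 9  | 100k_Above | Regular       | 15    |
| 10 | 100k_Above | Premium       | 23    |
| 11 | 100k_Above | Extra_Premium | 25    |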


In [0]:
# Pivot the long-form counts into an Income x Gas table of observed frequencies
contingency_table = pd.crosstab(df['Income'], df['Gas'], values=df['Count'], aggfunc='sum')
contingency_table

| Income     | Extra_Premium | Premium | Regular |
|------------|---------------|---------|---------|
| 100k_Above | 25            | 23      | 15      |
| 30_50k     | 13            | 27      | 102     |
| 50_90k     | 15            | 22      | 36      |
| Less_30k   | 6             | 16      | 85      |


In [0]:
chi, pval, dof, expected = chi2_contingency(contingency_table)

In [0]:
expected

array([[ 9.65454545, 14.4       , 38.94545455],
       [21.76103896, 32.45714286, 87.78181818],
       [11.18701299, 16.68571429, 45.12727273],
       [16.3974026 , 24.45714286, 66.14545455]])
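
The expected counts above follow from the independence assumption: each cell is (row total × column total) / grand total. The sketch below reproduces them from the observed table and recomputes the statistic with (rows − 1) × (columns − 1) degrees of freedom. The variable names are illustrative, and the observed counts are typed in by hand in the same row and column order as the contingency table.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts; rows: 100k_Above, 30_50k, 50_90k, Less_30k
# columns: Extra_Premium, Premium, Regular
observed = np.array([[25, 23, 15],
                     [13, 27, 102],
                     [15, 22, 36],
                     [6, 16, 85]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
n = observed.sum()

# Expected count per cell under independence
expected = np.outer(row_totals, col_totals) / n

# Pearson chi-square statistic and its degrees of freedom
statistic = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-square distribution
p_value = chi2.sf(statistic, dof)

print(expected.round(4))
print(statistic, dof, p_value)
```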

In [0]:
pval

2.899818004765592e-13

In [0]:
alpha = 0.1  # significance level used for the decision
if pval < alpha:
  print('Reject Null')
else:
  print('Fail to Reject Null')

Reject Null


**The business researcher’s decision is to reject the
null hypothesis; that is, type of gasoline preferred is not independent of income. Having established that conclusion, the business researcher can then examine
the outcome to determine which people, by income bracket, tend to purchase which type
of gasoline and use this information in market decisions.**
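
One possible way to carry out that follow-up examination is to look at Pearson residuals, (observed − expected) / √expected, for each cell. This is a sketch of one common approach, not a method prescribed by the problem statement. The `observed` frame is rebuilt by hand here and the names are illustrative.

```python
import numpy as np
import pandas as pd

# Observed counts, in the same layout as the contingency table above
observed = pd.DataFrame(
    [[25, 23, 15], [13, 27, 102], [15, 22, 36], [6, 16, 85]],
    index=['100k_Above', '30_50k', '50_90k', 'Less_30k'],
    columns=['Extra_Premium', 'Premium', 'Regular'],
)

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1).to_numpy()
col_totals = observed.sum(axis=0).to_numpy()
expected = np.outer(row_totals, col_totals) / observed.to_numpy().sum()

# Pearson residuals: how far each cell sits from its expected count
residuals = (observed - expected) / np.sqrt(expected)

print(residuals.round(2))
```

A positive residual marks a cell with more purchasers than independence would predict, and a negative residual marks one with fewer. Cells with large absolute residuals contribute most to the chi-square statistic.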