In [1]:
%autosave 0

Autosave disabled


The chi^2 test of independence checks whether two categorical variables are associated, that is, whether membership in one group is related to membership in another.

In [None]:
import numpy as np
import pandas as pd

from pydataset import data
from scipy import stats

Let's read in the mpg dataset from pydataset!

In [None]:
mpg = data('mpg')
mpg.head()

Let's do some feature engineering.

Our goal is to compare high/low mpg (split at the median of the combined city/highway mpg) with automatic/manual transmission.

In [None]:
mpg['mean_mpg'] = (mpg.cty + mpg.hwy) / 2
mpg.head()

In [None]:
mpg['mpg_cat'] = pd.qcut(mpg.mean_mpg, 2, labels=['low_mpg', 'high_mpg'])  # instead of continuous values, each car is now labeled low or high mpg
mpg.head()

In [None]:
mpg.mpg_cat.value_counts()  # pd.qcut made 2 bins, so every car is now categorized as low or high mpg
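Note that `pd.qcut` with 2 bins splits at the median rather than the mean. The sketch below uses made-up numbers (the `values` series is an illustrative assumption, not the mpg data) to show how the two splits can differ when one value is much larger than the rest.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, not taken from the mpg dataset
values = pd.Series([10, 12, 14, 16, 18, 40])

# qcut with 2 bins cuts at the median (15.0 here), so each bin gets 3 values
by_quantile = pd.qcut(values, 2, labels=['low', 'high'])

# An explicit split at the mean (about 18.3) puts only the 40 in 'high'
by_mean = pd.Series(np.where(values > values.mean(), 'high', 'low'))

print(by_quantile.value_counts())
print(by_mean.value_counts())
```

A median split keeps the two groups roughly balanced, which helps keep the expected counts in each crosstab cell reasonably large.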

In [None]:
mpg['trans_bin'] = np.where(mpg.trans.str.startswith('a'), 'auto', 'manual')
mpg.head()

Now that we have our categorical features, we can prepare to run a chi^2 contingency test!

First, we need to define our null and alternative hypotheses.

- **Null hypothesis (H0): transmission type is NOT associated with mpg category**
- **Alternative hypothesis (Ha): transmission type IS associated with mpg category**

We need to generate a crosstab of our two categorical features.

Once we have the crosstab, we can run the test!
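A crosstab simply counts how many rows fall into each combination of the two categories. As a minimal sketch, assuming a made-up six-row DataFrame (the name `toy` and its values are illustrative, not part of the mpg data):

```python
import pandas as pd

# Hypothetical toy data: six cars with two categorical features
toy = pd.DataFrame({
    'mpg_cat': ['low', 'low', 'low', 'high', 'high', 'high'],
    'trans_bin': ['auto', 'auto', 'manual', 'auto', 'manual', 'manual'],
})

# Each cell counts the rows that share one (mpg_cat, trans_bin) combination
toy_ct = pd.crosstab(toy.mpg_cat, toy.trans_bin)
print(toy_ct)
```

The rows and columns of the result are the category labels, and the cell values are the observed counts that the chi^2 test compares against the expected counts.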

In [None]:
# a crosstab (contingency table of observed counts) is the kind of object the chi-squared test expects
ct = pd.crosstab(mpg.mpg_cat, mpg.trans_bin)

In [None]:
chi, p, degf, exp = stats.chi2_contingency(ct)

In [None]:
degf

In [None]:
exp

In [None]:
ct

The test will return four values:
- The test statistic (chi^2)
- The p-value
- The degrees of freedom, (rows - 1) * (columns - 1) of the crosstab, which is 1 for our 2x2 table
- The table of expected values, if the two features were independent of each other
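To see where the expected values and the degrees of freedom come from, here is a rough sketch that computes them by hand with NumPy. The `observed` counts are made up for illustration and are not the mpg crosstab.

```python
import numpy as np

# Hypothetical observed counts for a 2x2 table (not the mpg crosstab)
observed = np.array([[30, 10],
                     [20, 40]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
n = observed.sum()

# Expected count under independence: row total * column total / grand total
expected = row_totals * col_totals / n

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Pearson chi^2 statistic, without any continuity correction
chi2_stat = ((observed - expected) ** 2 / expected).sum()

print(expected)
print(dof, chi2_stat)
```

By default, `stats.chi2_contingency` applies Yates' continuity correction when there is one degree of freedom, so its statistic for a 2x2 table will be somewhat smaller than this uncorrected value. Passing `correction=False` should reproduce the hand-computed number.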


Let's evaluate our result at a 95% confidence level, i.e. a significance level (alpha) of 0.05!

In [None]:
a = 0.05

In [None]:
if p < a:
    print('We reject the null hypothesis. There appears to be a relationship.')
else:
    print('We fail to reject the null hypothesis.')

# Functions that return multiple values

In [None]:
def the_returner(x):
    return x * 2, x ** 2, x ** 3, x * x

In [None]:
the_returner(5)

In [None]:
times_two, squared, cubes, times_itself = the_returner(5)

In [None]:
squared
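A function that "returns many values" really returns one tuple, and the assignment unpacks it. The sketch below repeats `the_returner` so that it runs on its own, and shows one possible way to keep only some of the values (the `result` name is introduced here for illustration).

```python
def the_returner(x):
    return x * 2, x ** 2, x ** 3, x * x

result = the_returner(5)

# The four values come back packed in a single tuple
print(type(result), len(result))

# Unpack only what is needed; `*_` collects the remaining values
times_two, squared, *_ = result
print(times_two, squared)
```

The same unpacking pattern is what lets the result of `stats.chi2_contingency(ct)` be assigned to four names in one line.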