
In [None]:
import matplotlib.pyplot as plt

import numpy as np

from statsmodels.stats.contingency_tables import mcnemar

import scipy.stats as stats

# Jupyter magics: auto-reload edited modules and render plots inline
%load_ext autoreload
%autoreload 2
%matplotlib inline



In [None]:
# define contingency table

'''
Layout of the 2x2 table (rows = After, columns = Before):

                 Before
               yes    no
             -------------
After  yes  |  a  |  b  |
       no   |  c  |  d  |
             -------------
'''



'''
# Example A: Does an intervention (or exposure) change the risk of an outcome that can occur more than once?
#  100 people are in this study, and all 100 received the intervention (or exposure).
#  Each person is their own control: the outcome is assessed twice, once before and once after the intervention,
#  so the same person can differ between Before and After.
#  30 people had the outcome before the intervention and also after it (a)
#  40 people did not have the outcome before the intervention but had it afterwards (b)
#  12 people had the outcome before the intervention but not afterwards (c)
#  18 people had the outcome neither before nor after the intervention (d)

data = [[30, 40],
        [12, 18]]


Many researchers instead put After on top (as the columns).
As you can try for yourself, this makes no difference to the results:
the table is simply transposed (flipped along the diagonal), which swaps
the two discordant cells, and the test depends only on (b - c)^2 and b + c.
The cell letters below are positional.

                  After
                yes    no
              -------------
Before  yes  |  a  |  b  |
        no   |  c  |  d  |
              -------------

data = [[30, 12],
        [40, 18]]
'''

# Example B - Does smoking relate to cancer?  (Propensity Matched Pairs)

data = [[1000, 40],
        [200, 60]]

# 2600 people total are in this study: 1300 cancer patients and 1300 matched controls
# Pairs are matched on risk criteria other than smoking, so we can ask whether smoking is associated with cancer
# Each cell counts pairs; a pair is one person with cancer (case) and one matched person without cancer (control)
# Rows: cancer patient (non-smoker, smoker); columns: control (non-smoker, smoker)
# Description of each cell:
# 1000 pairs in which neither the cancer patient nor the matched control has a smoking history
# 40 pairs in which the control was a smoker and the cancer patient was not
# 200 pairs in which the control was a non-smoker and the cancer patient was a smoker
# 60 pairs in which both the control and the cancer patient were smokers
# Expected result with continuity correction: McNemar's chi-squared = 105.34, df = 1, p-value < 2.2e-16
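
The Example A table can be assembled from per-person paired observations rather than typed in by hand. The sketch below is only an illustration: the `groups` list, the `paired_table` helper, and the `mcnemar_chi2` helper are hypothetical names introduced here, and the per-person records are reconstructed from the four counts quoted in Example A. It also checks the claim that transposing the table leaves the (uncorrected) statistic unchanged.

```python
import numpy as np

# Hypothetical per-person records that reproduce the Example A counts.
# Each tuple is (outcome_before, outcome_after, number_of_people).
groups = [(1, 1, 30), (0, 1, 40), (1, 0, 12), (0, 0, 18)]
before = np.concatenate([np.full(n, b) for b, a, n in groups])
after = np.concatenate([np.full(n, a) for b, a, n in groups])


def paired_table(before, after):
    """Build the 2x2 table with rows = After (yes, no), columns = Before (yes, no)."""
    before = np.asarray(before, dtype=bool)
    after = np.asarray(after, dtype=bool)
    return np.array([
        [np.sum(after & before), np.sum(after & ~before)],
        [np.sum(~after & before), np.sum(~after & ~before)],
    ])


def mcnemar_chi2(table):
    """Uncorrected McNemar statistic, using only the two discordant cells."""
    b, c = table[0][1], table[1][0]
    return (b - c) ** 2 / (b + c)


table_a = paired_table(before, after)
print(table_a)
# Transposing swaps b and c, so the statistic is the same either way.
print(mcnemar_chi2(table_a), mcnemar_chi2(table_a.T))
```

Only the off-diagonal (discordant) cells enter the statistic, which is why the orientation of the table does not matter.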

In [None]:
'''
mcnemar(table, exact=True, correction=True)

where:

table: A square (here 2x2) contingency table
exact: If True, the exact binomial distribution is used. If False, the chi-squared distribution is used.
       Rule of thumb: use exact=True when the discordant counts are small (b + c < 25),
       so that an exact binomial test replaces the chi-squared approximation.
correction: If True, a continuity correction is applied to the chi-squared statistic (only relevant when exact=False).
            As a rule of thumb, this correction is typically applied when any of the cell counts in the table is less than 5.
'''

#McNemar's Test with continuity correction

result = mcnemar(table=data, exact=False, correction=True)  
print('statistic', result.statistic, 'pvalue', result.pvalue)

#McNemar's Test with no continuity correction 

result = mcnemar(table=data, exact=False, correction=False)
print('statistic', result.statistic, 'pvalue', result.pvalue)

statistic 105.3375 pvalue 1.0300818425382127e-24
statistic 106.66666666666667 pvalue 5.26711870469161e-25
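
The cell above only exercises `exact=False`. As a rough sketch of what the exact option amounts to, the two-sided binomial test on the discordant counts can be written with SciPy alone. The helper name `mcnemar_exact_pvalue` is hypothetical, and the assumption that this mirrors what `mcnemar(..., exact=True)` reports has not been checked here against statsmodels. The counts are the Example A discordant cells (b = 40, c = 12).

```python
from scipy import stats


def mcnemar_exact_pvalue(b, c):
    """Two-sided exact binomial p-value on the discordant counts b and c."""
    n = b + c
    # Under the null hypothesis each discordant pair is equally likely
    # to fall in cell b or cell c, so min(b, c) ~ Binomial(n, 0.5).
    p = 2 * stats.binom.cdf(min(b, c), n, 0.5)
    # Doubling the tail can exceed 1 when b and c are close, so cap it.
    return min(p, 1.0)


p_exact = mcnemar_exact_pvalue(40, 12)
print(p_exact)
```

Because the test uses only b and c, the concordant cells (a and d) have no influence on the p-value.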


In [None]:
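# Victor's code: McNemar's statistic computed by hand, without continuity correction

table = np.array(data)
b, c = table[0, 1], table[1, 0]  # the two discordant cells

x2_statistic_asymp = (b - c) ** 2 / (b + c)

p = stats.chi2.sf(x2_statistic_asymp, 1)  # upper tail of chi-squared with 1 degree of freedom

print(p)  # this p-value is consistent with the exact=False, correction=False version above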

5.26711870469161e-25
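
The same hand calculation can be extended to the continuity-corrected statistic, (|b - c| - 1)^2 / (b + c), which should reproduce the `correction=True` result printed earlier for Example B (105.3375). This is a minimal sketch; the variable names `x2_corrected` and `p_corrected` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

# Example B table from this notebook
data = [[1000, 40],
        [200, 60]]
table = np.array(data)
b, c = table[0, 1], table[1, 0]  # discordant cells

# Continuity-corrected McNemar statistic: (|b - c| - 1)^2 / (b + c)
x2_corrected = (abs(b - c) - 1) ** 2 / (b + c)

# p-value from the upper tail of chi-squared with 1 degree of freedom
p_corrected = stats.chi2.sf(x2_corrected, 1)

print(x2_corrected, p_corrected)
```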


In [None]:
# https://www.statology.org/mcnemars-test-python/
# https://en.wikipedia.org/wiki/McNemar%27s_test
# https://aaronschlegel.me/mcnemars-test-paired-data-python.html
# https://stats.stackexchange.com/questions/147559/fisher-exact-test-on-paired-data