[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PennNGG/Quantitative-Neuroscience/blob/master/Hypothesis%20Testing/Python/Proportions.ipynb)

# Definitions

Statistical tests on proportions (e.g., is there more of A or B in a sample that consists only of As and Bs?) are commonplace. Below are several kinds of tests of proportions, including more thorough descriptions of when and how they can be used.


# Getting started with code

Matlab code is found here: *** LINK ***

Python code is included below. First run the code cell just below to make sure all of the required Python modules are loaded, then you can run the other cell(s).

In [5]:
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
from IPython.display import display, clear_output

# Pearson’s Chi-square test


It is often a good idea, once you obtain some data, to infer whether the underlying population conforms to a theoretical distribution. In neuroscience, we can use this test in many different ways. One interesting use is to determine whether the proportion of "significant" neurons (what this means is discussed in an example below) reflects a reliable effect or is just what you would expect by chance. Here is a [borderline neuroscience example](https://www.tandfonline.com/doi/abs/10.1080/14992027.2016.1227481?journalCode=iija20) of the use of such a test.

**An important note: when you calculate chi-square, make sure you use the number of observations and not the percentage/proportion!**
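To see why this matters, here is a minimal sketch using hypothetical numbers (a 60:40 split of As and Bs tested against an expected 50:50 split; these values are not from the lesson). It uses `scipy.stats.chisquare` to show that the same proportions give very different chi-square values depending on the sample size, and that feeding in proportions directly discards the sample size altogether.

```python
import numpy as np
import scipy.stats as st

# Hypothetical proportions (assumed for illustration only)
observed_props = np.array([0.6, 0.4])   # observed split of As and Bs
expected_props = np.array([0.5, 0.5])   # split expected under H0

def chi_square_from_counts(n):
    """Goodness-of-fit test using counts for a sample of size n."""
    observed = observed_props * n
    expected = expected_props * n
    return st.chisquare(observed, f_exp=expected)

# Correct: use counts, so that the sample size is taken into account
chi2_n100, p_n100 = chi_square_from_counts(100)
chi2_n1000, p_n1000 = chi_square_from_counts(1000)

# Incorrect: use proportions, which behaves like a sample of size 1
chi2_props, p_props = st.chisquare(observed_props, f_exp=expected_props)

print(f'counts, n=100:  chi-square={chi2_n100:.2f}, p={p_n100:.4f}')
print(f'counts, n=1000: chi-square={chi2_n1000:.2f}, p={p_n1000:.4g}')
print(f'proportions:    chi-square={chi2_props:.2f}, p={p_props:.4f}')
```

The statistic scales linearly with the sample size, so the same 60:40 split is weak evidence in a small sample and strong evidence in a large one.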

## Example 1: Face selectivity

Imagine that you are recording in the region of the inferotemporal cortex that responds selectively to faces. You are testing whether these neurons are sensitive to handsome or non-handsome faces. For handsome faces, you present a stimulus that we will call "Yale" and record neuronal activity from multiple presentations. For non-handsome faces, you present a different stimulus that we will arbitrarily call "Josh" and record neuronal activity from multiple presentations. From these recordings, you calculate an index from each neuron's average firing rate that measures the neuron's selectivity for handsomeness: a value of 0 means that it responds equally well to Yale and Josh; a value of +1 means it responds only to handsome Yale; and a value of -1 means it responds only to non-handsome Josh. You calculate this index for all of your neurons (let's say 100), find that the mean is 0.8 (assume you have enough neurons so that the distribution is ~ normal), and then conduct a t-test to see whether the mean of this distribution is different from 0. If it is, you can conclude that, on average, inferotemporal neurons respond preferentially to handsome Yale.

But what if you wanted to say something about each neuron? That is, what if you wanted to identify those neurons that seem particularly sensitive to handsome Yale? Suppose you could conduct an analysis to determine whether the index of each neuron was significant (p<0.05 that the index was different from 0). Imagine that you did this and found that 90 of these neurons had significant index values (i.e., the null hypothesis that the index is 0 is rejected) and 10 did not. Well, we know that, with an $\alpha$ of 0.05, we should have 5% false positives, or 5 neurons that we think are significant but are actually not. Do 10 neurons exceed this false-positive rate or not?

A chi-square test allows you to ask this question. It is a measure of how far a sample distribution deviates from a theoretical distribution. $H_0$: the sample data came from a population having a 95:5 ratio of significant:non-significant neurons.

The code below shows how the probability *p* changes as a function of the number of significant neurons and as a function of the proportion of neurons. This test is often referred to as a measure of goodness of fit; though, depending on your data, it can also be a poorness of fit.

In [1]:
# TO DO
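Until the cell above is filled in, here is one possible sketch of the analysis described in the text, not the author's implementation. It assumes the 100 neurons implied by the 90:10 split and the 95:5 null ratio stated above, and uses `scipy.stats.chisquare` first on the observed counts and then on a sweep of hypothetical numbers of significant neurons (the sweep range is chosen only for illustration).

```python
import numpy as np
import scipy.stats as st

n_neurons = 100                  # assumed total, implied by 90 + 10
expected = np.array([95, 5])     # counts under H0 (95:5 ratio)

# Observed case from the text: 90 significant, 10 not significant
observed = np.array([90, 10])
chi2_obs, p_obs = st.chisquare(observed, f_exp=expected)
print(f'observed 90:10 -> chi-square={chi2_obs:.3f}, p={p_obs:.4f}')

# Sweep the number of significant neurons to see how p changes
n_significant = np.arange(80, 100)
p_values = np.array([
    st.chisquare([k, n_neurons - k], f_exp=expected).pvalue
    for k in n_significant
])
for k, p_k in zip(n_significant, p_values):
    print(f'{k:3d} significant of {n_neurons}: p={p_k:.4f}')
```

The p-value is largest when the observed counts match the expected 95:5 split exactly and falls off as the counts move away from it in either direction.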

## Example 2: Mendelian genetics

The chi-square test can be used in other situations. For example, let's imagine some form of Mendelian genetics experiment where you quantify the proportion of yellow smooth peas, yellow wrinkled peas, green smooth peas, and green wrinkled peas to be 152:39:53:6. Is this proportion different from a theoretical one of 9:3:3:1? Unlike the neuronal example, in which df=2-1=1, here the df=4-1=3. What is the chi-squared value for this example? What is the *p* value? Would you reject the null hypothesis?

### Answers

Chi-squared = 8.972; *p* = 0.03; yes (if you set your type I error rate to 0.05 in advance).
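These answers can be checked with a short sketch along the following lines. It converts the 9:3:3:1 ratio into expected counts for the 250 peas observed and passes both to `scipy.stats.chisquare`; the variable names are only illustrative.

```python
import numpy as np
import scipy.stats as st

# Observed counts: yellow smooth, yellow wrinkled, green smooth, green wrinkled
observed = np.array([152, 39, 53, 6])

# Theoretical 9:3:3:1 ratio converted to expected counts for this sample size
ratio = np.array([9, 3, 3, 1])
expected = ratio / ratio.sum() * observed.sum()

chi2, p = st.chisquare(observed, f_exp=expected)
df = len(observed) - 1
print(f'chi-square={chi2:.3f}, df={df}, p={p:.3f}')
```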

When your proportion data are multi-dimensional (i.e., each observation is classified along more than one factor), the chi-squared test is formulated in terms of a contingency table. Fisher's exact test is a better way to analyze a contingency table, especially when the frequencies are small, like in this [example study that looks at mutation rates in familial Alzheimer disease](https://pubmed.ncbi.nlm.nih.gov/12192622/) (see Table 1).

## Example 3: Puppies and kittens

Imagine the following data that looks at the fur color of puppies and kittens:

&nbsp; | black	| brown	| blond	| red
-- | -- | -- | -- | -- 
Puppies | 32 | 43 | 16 | 9
Kittens | 55 | 65 | 64 | 16

We want to test the following $H_0$: fur color is independent of pet in the population sampled.

Chi-squared statistic: $\chi^2=\sum_i\sum_j\frac{(f_{ij}-f^{*}_{ij})^2}{f^{*}_{ij}}$, where $f_{ij}$ is the observed frequency (count) in row $i$ and column $j$, and $f^{*}_{ij}$ is the expected frequency based on the null hypothesis. The chi-squared value is 8.987. I will let you figure out how to do this but will give you a hint. If fur color is in fact independent of pet, then 100 (puppies)/300 (total pets) of black-furred pets would be expected to be puppies and 200/300 would be expected to be kittens. That is, the expected number of black-furred puppies would be $\frac{100}{300}\times 87$ and the expected number of black-furred kittens would be $\frac{200}{300}\times 87$. $df=(r-1)(c-1)$, where $r$ is the number of rows in the table, and $c$ is the number of columns.
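One way to work through the hint is sketched below. It assumes the kitten counts are 55, 65, 64, and 16, so that the kitten row sums to the 200 stated in the hint and the statistic matches the quoted 8.987. Expected counts are built from the row and column totals, and the result is cross-checked against `scipy.stats.chi2_contingency`.

```python
import numpy as np
import scipy.stats as st

# Rows: puppies, kittens; columns: black, brown, blond, red.
# Kitten counts are an assumption chosen so that the row total is 200.
observed = np.array([[32, 43, 16, 9],
                     [55, 65, 64, 16]])

# Expected count in each cell = (row total * column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()
expected = row_totals * col_totals / n

# Chi-squared statistic, degrees of freedom, and upper-tail p-value
chi2 = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p = st.chi2.sf(chi2, df)
print(f'chi-square={chi2:.3f}, df={df}, p={p:.3f}')

# Cross-check with SciPy's built-in test of independence
chi2_sp, p_sp, df_sp, expected_sp = st.chi2_contingency(observed, correction=False)
print(f'scipy: chi-square={chi2_sp:.3f}, df={df_sp}, p={p_sp:.3f}')
```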

# Fisher’s exact test


The Fisher's exact test gives you a closed-form way to calculate *p* from contingency tables for one- or two-tailed tests.

Consider the following contingency table:

&nbsp; | Outcome A | Outcome B
-- | -- | -- 
Group 1 | $n_{A1}$ |	$n_{B1}$
Group 2	| $n_{A2}$	| $n_{B2}$

We want to test for $H_0$: the proportion of outcome A from Group 1 = the proportion of outcome A from Group 2.

Fisher's test is a closed-form way to get *p*, but it is a bit computationally intensive:

$p=\frac{R_1!R_2!C_1!C_2!}{n!\,n_{A1}!n_{A2}!n_{B1}!n_{B2}!}$, where $X!=1\times 2\times 3 \times\dots\times X$, $R_1$ and $R_2$ are the row totals, $C_1$ and $C_2$ are the column totals, $n$ is the grand total, and $n_{**}$ are the entries in the table.

**But** we aren't done yet! The *p* we just computed is the probability of the actual, observed values. For a one-tailed test, we need to consider all of the other, more extreme possibilities explicitly. Doing so is outside the scope of this lesson; see [here](https://mathworld.wolfram.com/FishersExactTest.html) for more details.
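As a rough illustration, the sketch below evaluates the formula for a small hypothetical table (the counts 3, 1, 1, 3 are made up for this example) and confirms that it equals the hypergeometric probability of the observed table. It then calls `scipy.stats.fisher_exact`, which sums over the other possible tables for you, to get one- and two-tailed *p* values.

```python
from math import factorial
import numpy as np
import scipy.stats as st

# Hypothetical 2x2 table (values assumed for illustration only)
n_A1, n_B1 = 3, 1   # Group 1: outcome A, outcome B
n_A2, n_B2 = 1, 3   # Group 2: outcome A, outcome B

# Row totals, column totals, and grand total
R1, R2 = n_A1 + n_B1, n_A2 + n_B2
C1, C2 = n_A1 + n_A2, n_B1 + n_B2
n = R1 + R2

# Probability of exactly the observed table, from the factorial formula
numerator = factorial(R1) * factorial(R2) * factorial(C1) * factorial(C2)
denominator = (factorial(n) * factorial(n_A1) * factorial(n_A2)
               * factorial(n_B1) * factorial(n_B2))
p_observed = numerator / denominator

# The same probability from the hypergeometric distribution:
# n_A1 outcome-A items when drawing R1 items from n total, C1 of which are A
p_hypergeom = st.hypergeom.pmf(n_A1, n, C1, R1)

# Full test: SciPy accumulates the probabilities of the relevant tables
table = [[n_A1, n_B1], [n_A2, n_B2]]
_, p_one_tailed = st.fisher_exact(table, alternative='greater')
_, p_two_tailed = st.fisher_exact(table, alternative='two-sided')

print(f'P(observed table)={p_observed:.4f} (hypergeom: {p_hypergeom:.4f})')
print(f'one-tailed p={p_one_tailed:.4f}, two-tailed p={p_two_tailed:.4f}')
```

For larger counts the factorials grow quickly, which is one reason to prefer the library routines over the raw formula.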

# Two-sample Z test


The Z-test is used to compare two proportions under two different experimental conditions, with a simple $H_0: p_1=p_2$. One can imagine that the data come from a 2x2 contingency table, but all you are interested in is the overall proportions.

The test statistic is:

$z=\frac{\hat{p_1}-\hat{p_2}}{\sqrt{pq(\frac{1}{n_1}+\frac{1}{n_2})}}$, where $\hat{p_1}$ and $\hat{p_2}$ are the sample (measured) proportions, $n_1$ and $n_2$ are the sample sizes, and $p$ and $q=1-p$ are the overall (pooled) proportions of outcomes A and B across both samples.

## Example

&nbsp; | Condition 1 | Condition 2
-- | -- | --
Outcome A | 18 | 10
Outcome B | 6 | 15

Compute the p-value for $H_0: p_1=p_2$.

In [12]:
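# Counts from the table above
A1 = 18  # outcome A, condition 1
A2 = 10  # outcome A, condition 2
B1 = 6   # outcome B, condition 1
B2 = 15  # outcome B, condition 2

# Sample sizes and sample proportions of outcome A in each condition
n1 = A1 + B1
n2 = A2 + B2
p1_hat = A1/n1
p2_hat = A2/n2

# Pooled proportions of outcomes A and B across both conditions
p_pool = (A1+A2)/(n1+n2)
q_pool = (B1+B2)/(n1+n2)

# Test statistic and two-tailed p-value from the standard normal distribution
z = (p1_hat-p2_hat)/np.sqrt(p_pool*q_pool*(1/n1+1/n2))
p = 2*st.norm.cdf(-np.abs(z))
print(f'p={p:.4f}')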

p=0.0133
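As a sanity check on the result above, the squared $z$ statistic from a pooled two-proportion test equals Pearson's chi-squared statistic on the same 2x2 table when no continuity correction is applied. The sketch below recomputes $z$ from the table and compares it with `scipy.stats.chi2_contingency`; the variable names are only illustrative.

```python
import numpy as np
import scipy.stats as st

# Rows: outcome A, outcome B; columns: condition 1, condition 2
table = np.array([[18, 10],
                  [6, 15]])

# Two-sample Z test with pooled proportions
n1, n2 = table.sum(axis=0)
p1_hat, p2_hat = table[0] / table.sum(axis=0)
p_pool = table[0].sum() / table.sum()
q_pool = 1 - p_pool
z = (p1_hat - p2_hat) / np.sqrt(p_pool * q_pool * (1/n1 + 1/n2))
p_z = 2 * st.norm.sf(np.abs(z))

# Pearson chi-squared test on the same table, without continuity correction
chi2, p_chi2, dof, _ = st.chi2_contingency(table, correction=False)

print(f'z={z:.4f}, z^2={z**2:.4f}, p={p_z:.4f}')
print(f'chi-square={chi2:.4f}, df={dof}, p={p_chi2:.4f}')
```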


# McNemar’s test (paired data)

This test is analogous to a paired t-test in that the data is paired. It is essentially a modification of our initial chi-squared test.

The test is applied to a 2x2 contingency table, which counts the outcomes of the two tests from *N* subjects:

&nbsp; | Test 2 positive | Test 2 negative | Total
-- | -- | -- | --
Test 1 positive | $n_{++}$ |	$n_{+-}$ | $n_{++} + n_{+-}$
Test 1 negative	| $n_{-+}$	| $n_{--}$ | $n_{-+} + n_{--}$
Total | $n_{++} + n_{-+}$ | $n_{+-} + n_{--}$ | $N$

The null hypothesis is that the row and column marginal distributions are the same (i.e., the table has "marginal homogeneity"), which would imply that the tests do not differ in their efficacy. Specifically, $H_0$ is that $p(n_{++}) + p(n_{+-}) = p(n_{++}) + p(n_{-+})$ and $p(n_{-+}) + p(n_{--}) = p(n_{+-}) + p(n_{--})$, both of which are true if $p(n_{+-}) = p(n_{-+})$. In other words, only the discordant pairs matter.

The test statistic is:

$\chi^2=\frac{(|n_{+-}-n_{-+}|-1)^2}{n_{+-}+n_{-+}}$

Here is an interesting example that uses this test to [assess the contribution of STN stimulation to ameliorating hallucinations in patients with Parkinson's disease](https://www.karger.com/Article/Abstract/195719).

## Example

Imagine that we are assessing the degree to which two different probes can elicit a perception of an itch when they are applied to your left arm (probe 1) and then your right arm (probe 2). You get the following data when you survey subjects and ask them whether they perceive "itch":

&nbsp; | Probe 1 causes itch | Probe 1 doesn't cause itch
-- | -- | --
Probe 2 causes itch | 	11 |	6
Probe 2 doesn't cause itch	| 10	| 24

Here, our $H_0$ is that the proportion of subjects experiencing itch is the same for both probes.



In [7]:
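# Discordant pairs from the table above
n10 = 6   # probe 2 causes itch, probe 1 doesn't
n01 = 10  # probe 1 causes itch, probe 2 doesn't
df = 1

# McNemar's test statistic (with continuity correction)
chi_square = ((np.abs(n10-n01)-1)**2)/(n10+n01)
print(chi_square)

# Upper-tail probability of the chi-square distribution
p = 1 - st.chi2.cdf(chi_square, df)
print(f'H0: proportion of persons experiencing itch is the same with both\
 probes: p={p:.2f}')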


0.5625
H0: proportion of persons experiencing itch is the same with both probes: p=0.45


# Additional Resources


# Credits

Copyright 2021 by Joshua I. Gold, University of Pennsylvania