Phi Coefficient Test

The Phi coefficient (ϕ) is a statistical measure that quantifies the association between two binary variables. It is particularly useful for analyzing 2×2 contingency tables and is mathematically equivalent to Pearson’s correlation coefficient applied to binary data.

For example, given a 2×2 contingency table:

|             | Y = 1 | Y = 0 |
|-------------|-------|-------|
| X = 1       |   a   |   b   |
| X = 0       |   c   |   d   |


The Phi coefficient is calculated using the equation below.
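Written out with the cell counts a, b, c, and d from the table above, the standard textbook form of the coefficient is given here. The equation image is assumed to show the same expression.

```latex
\phi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}
```

The numerator compares the agreeing diagonal (a, d) with the disagreeing diagonal (b, c). The denominator is the square root of the product of the two row totals and the two column totals, so ϕ is undefined if any row or column total is zero.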

![Figure: Phi Equation](Harroun_Psi-equation.png)

ϕ = +1 implies a perfect positive association

ϕ = 0 implies no association

ϕ = -1 implies a perfect negative association

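This metric is symmetric: swapping X and Y leaves ϕ unchanged. It is easiest to interpret when the marginal distributions of the two binary variables are similar, because ϕ can reach +1 only when X and Y have identical marginal proportions.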
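The equivalence with Pearson's correlation can be checked numerically. The sketch below is an illustration using only the standard library: it computes ϕ from the four cell counts, expands the same table into paired 0/1 observations, and computes Pearson's r on them. The function names `phi_from_counts` and `pearson_r` are hypothetical helpers, and the counts are borrowed from the example cell at the end of this section.

```python
import math


def phi_from_counts(a, b, c, d):
    """Phi coefficient from the four cell counts of a 2x2 table."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative counts: a = (X=1, Y=1), b = (X=1, Y=0), c = (X=0, Y=1), d = (X=0, Y=0)
a, b, c, d = 25, 5, 3, 17

# Expand the table into one (x, y) pair per observation
xs = [1] * (a + b) + [0] * (c + d)
ys = [1] * a + [0] * b + [1] * c + [0] * d

phi = phi_from_counts(a, b, c, d)
r = pearson_r(xs, ys)
print(f"phi from counts: {phi:.6f}")
print(f"Pearson r on 0/1 data: {r:.6f}")
```

The two printed values should agree up to floating-point rounding.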

![Figure 1: Phi Example Figure](Harroun_Psi-Value-Figure.png)

Figure 1. A visual representation of the 2×2 contingency table.
Source: Putra et al. (2019) 

The Phi coefficient applies whenever the outcomes of two binary variables can be organized into a 2×2 contingency table. This makes it well suited to evaluating binary prediction systems, for example by comparing model forecasts with actual observations.

A compelling example is provided by Putra et al. (2019) in their study "An Evaluation Graph of Hourly Rainfall Estimation in Malang". The researchers used a 2×2 contingency table (shown in Figure 1) to assess the agreement between satellite-based rainfall estimations and ground-truth observations. Each cell in the table represented counts of outcomes:

a = correct rain prediction (true positive),

b = predicted rain but no observed rain (false alarm),

c = observed rain but no prediction (miss),

d = correct no-rain prediction (true negative).

This layout allowed them to compute false alarm rates and miss rates, but these same values also form the basis for computing the Phi coefficient. By applying the Phi coefficient, one can quantify the overall strength of agreement between predicted and observed outcomes, capturing both types of classification error in a single, interpretable metric.
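As a rough illustration of how the same four counts feed both kinds of summary, the sketch below computes two error rates and ϕ from one table. The counts are made up for illustration and are not taken from Putra et al. (2019). The definitions used here are an assumption: false alarm ratio b / (a + b) and miss rate c / (a + c). The paper may define its rates differently, and `verification_scores` is a hypothetical helper name.

```python
import math


def verification_scores(a, b, c, d):
    """Error rates and phi from a 2x2 forecast verification table.

    a = hits, b = false alarms, c = misses, d = correct negatives.
    Assumed definitions (may differ from the cited paper):
      false alarm ratio = b / (a + b)  (share of rain forecasts that were wrong)
      miss rate         = c / (a + c)  (share of observed rain that was not forecast)
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("scores are undefined when a row or column total is zero")
    return {
        "false_alarm_ratio": b / (a + b),
        "miss_rate": c / (a + c),
        "phi": (a * d - b * c) / denom,
    }


# Hypothetical counts, for illustration only
scores = verification_scores(a=40, b=10, c=5, d=45)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Each rate looks at a single kind of error, while ϕ uses all four cells at once, which is why it can serve as a one-number summary of agreement.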

Thus, the Phi coefficient is especially valuable in fields such as meteorology, medicine, and machine learning, wherever binary outcomes are compared and a concise statistic is needed to summarize the degree of association.

References:

Putra, R. M., Kurniawan, A., Rangga, I. A., Ryan, M., Endarwin, & Luthfi, A. (2019). An Evaluation Graph of Hourly Rainfall Estimation in Malang. IOP Conference Series: Earth and Environmental Science, 303(1), 012031. https://doi.org/10.1088/1755-1315/303/1/012031

Wilks, D. S. (2006). Statistical Methods in the Atmospheric Sciences (2nd ed.). Elsevier Academic Press.

OpenAI. (2024). ChatGPT (April 2024 version) [Large language model]. https://chat.openai.com

In [1]:

from BIOM480Tests import phicoeff

# Simulated 2x2 binary classification outcome table
# (rows: predicted positive / negative; columns: observed positive / negative)
table = [[25, 5],   # [true positives, false positives]
         [3, 17]]   # [false negatives, true negatives]

phi, p = phicoeff(table)
print(f"Phi Coefficient: {phi:.3f}")
print(f"P-value: {p:.4f}")

Phi Coefficient: 0.674
P-value: 0.0000
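The `phicoeff` helper comes from the `BIOM480Tests` module, whose implementation is not shown here. As a cross-check, the sketch below recomputes ϕ for the same table using only the standard library. It assumes the p-value comes from a Pearson chi-square test with one degree of freedom and no continuity correction, using χ² = n·ϕ². `phicoeff` may use a different test, so only ϕ itself is expected to match exactly. The name `phi_with_p_value` is hypothetical.

```python
import math


def phi_with_p_value(table):
    """Phi coefficient and chi-square p-value for a 2x2 table [[a, b], [c, d]].

    Assumption: the p-value is the upper tail of a chi-square distribution
    with 1 degree of freedom, with chi2 = n * phi**2 (no continuity correction).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    phi = (a * d - b * c) / denom
    chi2 = n * phi ** 2
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return phi, p_value


table = [[25, 5],
         [3, 17]]

phi, p = phi_with_p_value(table)
print(f"Phi Coefficient: {phi:.3f}")
print(f"P-value: {p:.2e}")
```

The p-value is printed in scientific notation because a very small value rounds to 0.0000 at four decimal places, which can be misread as exactly zero.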
