**Hypothesis Testing**

The dataset under examination consists of code review comments. Each comment carries two binary labels: whether it is "social" or "anti-social," and whether it is "toxic" or "non-toxic" according to the source paper. In the code below, these labels are read from the `Productive` and `NonToxic` columns, respectively.

The primary research question is whether the distribution of the social categorization differs significantly between toxic and non-toxic comments. In statistical terms, we test the null hypothesis that there is no association between the two categorical variables, "social" and "non-toxic." The alternative hypothesis is that the distribution of social categorizations does differ between non-toxic and toxic comments.

In [47]:
import pandas as pd

df = pd.read_csv('mergedCounterProductiveToxic.csv')
df.head()

Unnamed: 0,description,Personal attacks,Threats or intimidation,Mockery,Lack of specificity,Discouragement without guide,Disregard for other time or boundaries,Unconscious bias,Dismissive attitude,Excessive control,Productive,Toxic,CounterProductive,NonToxic
0,/It may happen that a service die/A service ma...,0,0,0,1,0,0,0,0,0,0,0,1,1
1,"@zhiyan, thanks for helping explanation. Overa...",0,0,0,0,0,0,0,0,0,1,0,0,1
2,all the code you have inline below should be r...,0,0,0,0,0,0,0,0,0,1,0,0,1
3,All you do in the interrupt handler is call wa...,1,0,1,1,0,1,0,0,0,0,0,1,1
4,Are you sure this leads to a color that makes ...,0,0,0,0,0,0,0,0,0,1,0,0,1




1.   Define Hypotheses:

  Null Hypothesis (H0): There is no significant difference in the distribution of "social" and "anti-social" comments between the non-toxic and toxic groups.
  Alternative Hypothesis (H1): There is a significant difference in the distribution of "social" and "anti-social" comments between the non-toxic and toxic groups.

2. Select a Significance Level (α):

  We use α = 0.05. This is the probability of rejecting the null hypothesis when it is actually true (the Type I error rate).

3. Choose a Statistical Test:

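  The Chi-square test of independence determines whether there is a significant association between two categorical variables. It is a non-parametric test: it does not assume that the data follow a particular distribution (such as the normal distribution), although it does assume independent observations and expected cell counts that are not too small (a common rule of thumb is at least 5 per cell).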

  *   Contingency Table:

    The data is organized into a contingency table, which is a two-dimensional table that displays the frequency (count) of each combination of the two categorical variables. In the context of our hypothesis, the table looks like this:
    
    



In [49]:
data = df
pd.crosstab(data['Productive'], data['NonToxic'])

Productive \ NonToxic,0,1
0,131,49
1,68,62
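
As a rough illustration of what the null hypothesis implies for this table, the sketch below computes the counts that would be expected if `Productive` and `NonToxic` were independent. It is not part of the original analysis; the observed counts are copied by hand from the crosstab above, and the variable names are chosen for illustration only.

```python
# Observed counts copied from the crosstab above
# (rows: Productive = 0, 1; columns: NonToxic = 0, 1).
observed = [[131, 49],
            [68, 62]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under independence: expected count = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])

# Common rule of thumb for the Chi-square test: every expected count >= 5.
print(all(e >= 5 for row in expected for e in row))
```

The larger the gap between these expected counts and the observed ones, the stronger the evidence against independence.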


In [52]:
from scipy.stats import chi2_contingency

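def chiSquareHypothesis(data):
    """Return the p-value of a Chi-square test of independence between Productive and NonToxic."""
    # Create a contingency table
    contingency_table = pd.crosstab(data['Productive'], data['NonToxic'])
    # Perform Chi-square test (for a 2x2 table, SciPy applies Yates' continuity correction by default)
    chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)
    return p_value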

In [54]:
chiSquarePValue = chiSquareHypothesis(df)
'{:.5f}'.format(chiSquarePValue)

'0.00033'
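
To make the reported p-value less of a black box, here is a minimal hand computation of the same test using only the standard library. It assumes the 2x2 counts shown in the crosstab above and applies Yates' continuity correction, which is what `chi2_contingency` does by default for a table with one degree of freedom. Treat it as a cross-check sketch rather than a replacement for the SciPy call.

```python
import math

# Observed counts (rows: Productive = 0, 1; columns: NonToxic = 0, 1).
observed = [[131, 49],
            [68, 62]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2_stat = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        expected = r * c / grand_total
        # Yates' correction: shrink |observed - expected| by 0.5 (never below 0).
        diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
        chi2_stat += diff ** 2 / expected

# With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2.0))
print(f"chi2={chi2_stat:.4f}, p={p_value:.5f}")
```

The closed-form tail probability used here holds only for one degree of freedom, so this shortcut does not carry over to larger tables.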

In [55]:
# Set significance level
alpha = 0.05

if chiSquarePValue < alpha:
    print(
        "Reject the null hypothesis. There is a significant difference in the distribution of 'social' and 'anti-social' comments between non-toxic and toxic groups.")
else:
    print(
        "Fail to reject the null hypothesis. There is no significant difference in the distribution of 'social' and 'anti-social' comments between non-toxic and toxic groups.")


Reject the null hypothesis. There is a significant difference in the distribution of 'social' and 'anti-social' comments between non-toxic and toxic groups.


In [59]:
import math
from dataclasses import dataclass

# --- Normal CDF using erf (no SciPy needed) ---
def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

@dataclass
class PropZTestResult:
    n: int
    k: int
    p_hat: float
    p0: float
    z: float
    p_value: float
    alternative: str
    continuity_correction: bool

def one_sample_proportion_ztest(
    k: int,
    n: int,
    p0: float,
    alternative: str = "less",  # "less", "greater", "two-sided"
    continuity_correction: bool = False,
) -> PropZTestResult:
    """
    One-sample z-test for a proportion.
    - k successes out of n
    - test against p0
    - alternative: "less", "greater", or "two-sided"
    - continuity correction: adjusts p_hat by +/- 0.5/n in the direction of H1
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not (0 <= k <= n):
        raise ValueError("k must satisfy 0 <= k <= n")
    if not (0 < p0 < 1):
        raise ValueError("p0 must be in (0, 1)")
    if alternative not in {"less", "greater", "two-sided"}:
        raise ValueError('alternative must be one of: "less", "greater", "two-sided"')

    p_hat = k / n
    se = math.sqrt(p0 * (1 - p0) / n)

    # Continuity correction for proportions (common approximation):
    # Adjust p_hat by +/- 0.5/n toward the null boundary in the direction of the alternative.
    adj = 0.0
    if continuity_correction:
        if alternative == "less":
            adj = +0.5 / n   # makes z less negative (more conservative)
        elif alternative == "greater":
            adj = -0.5 / n   # makes z less positive (more conservative)
        else:
            # For two-sided, you can adjust toward p0 depending on which side p_hat is on.
            adj = -0.5 / n if p_hat > p0 else +0.5 / n

    z = (p_hat + adj - p0) / se

    if alternative == "less":
        p_value = norm_cdf(z)
    elif alternative == "greater":
        p_value = 1.0 - norm_cdf(z)
    else:  # two-sided
        p_value = 2.0 * min(norm_cdf(z), 1.0 - norm_cdf(z))

    return PropZTestResult(
        n=n,
        k=k,
        p_hat=p_hat,
        p0=p0,
        z=z,
        p_value=p_value,
        alternative=alternative,
        continuity_correction=continuity_correction,
    )


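# Counts taken from the contingency table above:
# n = comments with Productive == 0, k = those among them with NonToxic == 0 (toxic).
n = 180
k = 131
p0 = 0.80  # hypothesized proportion under H0

# One-sided (less): H1: p < 0.80  (the direction of interest here)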
res_one = one_sample_proportion_ztest(k, n, p0, alternative="less", continuity_correction=False)
res_one_cc = one_sample_proportion_ztest(k, n, p0, alternative="less", continuity_correction=True)

# Two-sided: H1: p != 0.80
res_two = one_sample_proportion_ztest(k, n, p0, alternative="two-sided", continuity_correction=False)
res_two_cc = one_sample_proportion_ztest(k, n, p0, alternative="two-sided", continuity_correction=True)

print("=== One-sided (less), no CC ===")
print(f"p_hat={res_one.p_hat:.6f}, z={res_one.z:.4f}, p={res_one.p_value:.6g}")

print("=== One-sided (less), with CC ===")
print(f"p_hat={res_one_cc.p_hat:.6f}, z={res_one_cc.z:.4f}, p={res_one_cc.p_value:.6g}")

print("=== Two-sided, no CC ===")
print(f"p_hat={res_two.p_hat:.6f}, z={res_two.z:.4f}, p={res_two.p_value:.6g}")

print("=== Two-sided, with CC ===")
print(f"p_hat={res_two_cc.p_hat:.6f}, z={res_two_cc.z:.4f}, p={res_two_cc.p_value:.6g}")

# ---- Sensitivity sweep: at what p0 does p-value cross 0.05? ----
def find_threshold_crossing(k: int, n: int, alpha: float = 0.05,
                            alternative: str = "less",
                            continuity_correction: bool = False,
                            p0_min: float = 0.70, p0_max: float = 0.90,
                            step: float = 0.0005):
    """
    Sweeps p0 upward from p0_min and returns the smallest grid value of p0
    at which p-value < alpha, or None if no grid value is significant.
    For "less", the p-value decreases as p0 increases, so every larger p0
    in the sweep is significant as well.
    """
    num_steps = int(round((p0_max - p0_min) / step))
    for i in range(num_steps + 1):
        p0 = p0_min + i * step  # index-based grid avoids accumulated float error
        r = one_sample_proportion_ztest(k, n, p0, alternative=alternative,
                                        continuity_correction=continuity_correction)
        if r.p_value < alpha:
            return p0

    return None

alpha = 0.05
cross_no_cc = find_threshold_crossing(k, n, alpha=alpha, alternative="less",
                                      continuity_correction=False, p0_min=0.70, p0_max=0.85)
cross_cc = find_threshold_crossing(k, n, alpha=alpha, alternative="less",
                                   continuity_correction=True, p0_min=0.70, p0_max=0.85)

print("\n=== Approx p0 where one-sided p < 0.05 starts (sweep) ===")
print(f"No CC: ~ p0 >= {cross_no_cc:.4f}" if cross_no_cc is not None else "No CC: never significant in sweep range")
print(f"With CC: ~ p0 >= {cross_cc:.4f}" if cross_cc is not None else "With CC: never significant in sweep range")


=== One-sided (less), no CC ===
p_hat=0.727778, z=-2.4224, p=0.00770904
=== One-sided (less), with CC ===
p_hat=0.727778, z=-2.3292, p=0.00992324
=== Two-sided, no CC ===
p_hat=0.727778, z=-2.4224, p=0.0154181
=== Two-sided, with CC ===
p_hat=0.727778, z=-2.3292, p=0.0198465

=== Approx p0 where one-sided p < 0.05 starts (sweep) ===
No CC: ~ p0 >= 0.7790
With CC: ~ p0 >= 0.7815
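
The crossing point for the one-sided test without continuity correction can also be found in closed form rather than by a grid search. Setting (p_hat - p0) / sqrt(p0 (1 - p0) / n) equal to the negative critical value and solving the resulting quadratic for p0 gives the upper Wilson score bound. The sketch below is an illustrative cross-check under that assumption; the function name `significance_threshold_p0` is hypothetical and does not appear in the original notebook.

```python
import math
from statistics import NormalDist

def significance_threshold_p0(k: int, n: int, alpha: float = 0.05) -> float:
    """p0 at which the one-sided ("less") z-test without continuity correction
    has p-value exactly alpha; any larger p0 gives p-value < alpha."""
    p_hat = k / n
    z_crit = NormalDist().inv_cdf(1.0 - alpha)  # about 1.645 for alpha = 0.05
    z2 = z_crit ** 2
    # Upper root of (p_hat - p0)^2 = z_crit^2 * p0 * (1 - p0) / n.
    center = p_hat + z2 / (2.0 * n)
    half_width = z_crit * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n))
    return (center + half_width) / (1.0 + z2 / n)

p0_star = significance_threshold_p0(k=131, n=180)
print(f"{p0_star:.4f}")
```

For k = 131 and n = 180 this evaluates to roughly 0.7787.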
