Objective:
- read a statistical result with confidence
- run 2–3 basic tests
- explain what the p-value means and what it does NOT mean
- justify why you use one test and not another

In [2]:
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(0)

n = 40

df = pd.DataFrame({
    "group": ["control"] * n + ["treated"] * n,
    "value": np.concatenate([
        np.random.normal(loc=10, scale=2, size=n),   # control
        np.random.normal(loc=11, scale=2, size=n),   # treated (slight effect: true mean is 1 unit higher)
    ])
})

df.head()


     group      value
0  control  13.528105
1  control  10.800314
2  control  11.957476
3  control  14.481786
4  control  13.735116


In [3]:
control = df[df["group"] == "control"]["value"]
treated = df[df["group"] == "treated"]["value"]

control.describe(), treated.describe()


(count    40.000000
 mean     10.625085
 std       2.155786
 min       4.894020
 25%       9.541111
 50%      10.646742
 75%      12.069297
 max      14.539509
 Name: value, dtype: float64,
 count    40.000000
 mean     10.252746
 std       1.640946
 min       7.547435
 25%       9.203151
 50%      10.160145
 75%      11.282187
 max      14.901551
 Name: value, dtype: float64)

Null and alternative hypotheses (applied)
H0 (null hypothesis)
The mean of value is the same in the control and treated groups.

H1 (alternative hypothesis)
The mean of value differs between the two groups (two-sided: in either direction).
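H1 as written is two-sided: it does not say which group should have the larger mean. If the hypothesis had been directional from the start (for example, "treated has a higher mean"), `scipy.stats.ttest_ind` accepts `alternative="greater"` (SciPy 1.6 or later). The sketch below, assuming the same seed and parameters as the data-generation cell, shows that the t statistic does not change; only the tail used for the p-value does.

```python
import numpy as np
from scipy import stats

# Regenerate the data as in the notebook (same seed, same order: control then treated)
np.random.seed(0)
n = 40
control = np.random.normal(loc=10, scale=2, size=n)
treated = np.random.normal(loc=11, scale=2, size=n)

# Two-sided H1: the means differ, in either direction
t_two, p_two = stats.ttest_ind(treated, control, equal_var=False)

# One-sided H1: mean(treated) > mean(control)
t_one, p_greater = stats.ttest_ind(treated, control, equal_var=False,
                                   alternative="greater")

# Same t statistic; only the tail changes.
# If t > 0, p_greater = p_two / 2; if t < 0, p_greater = 1 - p_two / 2.
expected = p_two / 2 if t_two > 0 else 1 - p_two / 2
print(f"t = {t_two:.3f}, two-sided p = {p_two:.3f}, one-sided p = {p_greater:.3f}")
```

Because the observed t here is negative, the one-sided p-value for "treated > control" is large. The direction has to be fixed before looking at the data; choosing it afterwards to halve the p-value invalidates the test.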

In [None]:
# Welch t-test: compares means without assuming equal variances.
# In practice you rarely know whether the variances are equal, so Welch is the safer default.
t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)

t_stat, p_value

(np.float64(-0.8691947230640317), np.float64(0.387595960253478))

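t_stat: the difference in means (treated minus control) expressed in units of its standard error; the sign gives the direction.
p_value: the probability, assuming H0 is true, of obtaining a t statistic at least as extreme as the one observed. It is NOT the probability that H0 is true, and it does not measure how large the effect is.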
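To make "difference in units of standard error" concrete, the sketch below recomputes the Welch statistic by hand and checks it against SciPy, regenerating the data with the same seed as the earlier cell. Plugging in the group summaries from `describe()`: SE = sqrt(2.156²/40 + 1.641²/40) ≈ 0.428, so t ≈ -0.372 / 0.428 ≈ -0.87, which matches `t_stat`.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
n = 40
control = np.random.normal(loc=10, scale=2, size=n)
treated = np.random.normal(loc=11, scale=2, size=n)

# Sample variances with ddof=1 (the default used by pandas .std()/.var())
v_t = treated.var(ddof=1)
v_c = control.var(ddof=1)
n_t, n_c = len(treated), len(control)

# Standard error of the difference in means, without assuming equal variances
se = np.sqrt(v_t / n_t + v_c / n_c)

# t = observed difference divided by its standard error
t_manual = (treated.mean() - control.mean()) / se

# Welch–Satterthwaite approximation for the degrees of freedom
df_welch = (v_t / n_t + v_c / n_c) ** 2 / (
    (v_t / n_t) ** 2 / (n_t - 1) + (v_c / n_c) ** 2 / (n_c - 1)
)

# Two-sided p-value: probability under H0 of a |t| at least this large
p_manual = 2 * stats.t.sf(abs(t_manual), df_welch)

t_scipy, p_scipy = stats.ttest_ind(treated, control, equal_var=False)
print(f"se = {se:.4f}, t = {t_manual:.4f} (scipy {t_scipy:.4f}), "
      f"df = {df_welch:.1f}, p = {p_manual:.4f} (scipy {p_scipy:.4f})")
```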

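If p < 0.05 (the conventional significance level α): evidence against H0 at that level, so we reject H0.
If p >= 0.05: not enough evidence to reject H0 (this does not prove that H0 is true).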
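One way to see what the p-value does and does not mean is to simulate many experiments in which H0 is true by construction. Under H0, p-values are approximately uniformly distributed between 0 and 1, so about 5% of them fall below 0.05 purely by chance: that 5% is the false-positive rate α, not the probability that H0 is true. The number of simulations and the seed below are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility
n_sim, n = 2000, 40
alpha = 0.05

p_values = np.empty(n_sim)
for i in range(n_sim):
    # Both groups come from the SAME distribution: H0 is true
    a = rng.normal(loc=10, scale=2, size=n)
    b = rng.normal(loc=10, scale=2, size=n)
    p_values[i] = stats.ttest_ind(a, b, equal_var=False).pvalue

# Share of "significant" results even though there is no real effect
false_positive_rate = np.mean(p_values < alpha)
print(f"Share of p < {alpha}: {false_positive_rate:.3f}")  # expected to be close to 0.05
```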

In [None]:
# Effect size: raw difference in means (treated - control), in the units of value
mean_control = control.mean()
mean_treated = treated.mean()
diff = mean_treated - mean_control

mean_control, mean_treated, diff

(np.float64(10.625084946432349),
 np.float64(10.252745917023612),
 np.float64(-0.372339029408737))
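The raw difference is in the units of `value`. A common standardized effect size is Cohen's d (difference in means divided by the pooled standard deviation; pooling is one of several conventions), and a confidence interval for the difference shows which effect sizes are compatible with the data. The sketch below regenerates the data with the same seed and uses the Welch standard error and degrees of freedom for the interval.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
n = 40
control = np.random.normal(loc=10, scale=2, size=n)
treated = np.random.normal(loc=11, scale=2, size=n)

diff = treated.mean() - control.mean()
n_t, n_c = len(treated), len(control)

# Cohen's d: difference in means divided by the pooled standard deviation
pooled_sd = np.sqrt(((n_t - 1) * treated.var(ddof=1) + (n_c - 1) * control.var(ddof=1))
                    / (n_t + n_c - 2))
cohens_d = diff / pooled_sd

# 95% confidence interval for the difference, using Welch's SE and df
v_t, v_c = treated.var(ddof=1) / n_t, control.var(ddof=1) / n_c
se = np.sqrt(v_t + v_c)
df_welch = (v_t + v_c) ** 2 / (v_t ** 2 / (n_t - 1) + v_c ** 2 / (n_c - 1))
t_crit = stats.t.ppf(0.975, df_welch)
ci = (diff - t_crit * se, diff + t_crit * se)

print(f"diff = {diff:.3f}, d = {cohens_d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Here d is about -0.19, a small effect, and the 95% interval includes 0. That is consistent with p > 0.05 from the Welch test, since the interval and the test use the same standard error and degrees of freedom.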

In [6]:
# What if the data are not normal? → Mann–Whitney U test (rank-based)
u_stat, p_u = stats.mannwhitneyu(treated, control, alternative="two-sided")

u_stat, p_u

(np.float64(683.0), np.float64(0.26227859474350024))

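Roughly symmetric distribution without extreme outliers → t-test
Strongly skewed distribution or many outliers → Mann–Whitney (note that it tests whether values in one group tend to be larger than in the other, not whether the means are equal)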
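Mann–Whitney tolerates outliers and skew because it only uses the ranks of the pooled observations: U for the first sample equals its rank sum minus n1(n1 + 1)/2. The sketch below, with data regenerated as above, computes U from ranks, checks it against SciPy (which, since SciPy 1.7, reports U for the first sample), and applies a strictly increasing transform (`np.exp`, chosen only for illustration) to show that U is unchanged while the t-test result changes. The helper `u_from_ranks` is hypothetical, written for this example.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
n = 40
control = np.random.normal(loc=10, scale=2, size=n)
treated = np.random.normal(loc=11, scale=2, size=n)

def u_from_ranks(x, y):
    """Hypothetical helper: U statistic for x, computed only from pooled ranks."""
    ranks = stats.rankdata(np.concatenate([x, y]))
    rank_sum_x = ranks[: len(x)].sum()
    return rank_sum_x - len(x) * (len(x) + 1) / 2

u_manual = u_from_ranks(treated, control)
res_raw = stats.mannwhitneyu(treated, control, alternative="two-sided")

# A strictly increasing transform keeps the order of all values,
# so the ranks, U and its p-value stay the same...
res_exp = stats.mannwhitneyu(np.exp(treated), np.exp(control), alternative="two-sided")

# ...while the t-test, which uses the raw values, gives a different p-value
p_t_raw = stats.ttest_ind(treated, control, equal_var=False).pvalue
p_t_exp = stats.ttest_ind(np.exp(treated), np.exp(control), equal_var=False).pvalue

print(f"U (ranks) = {u_manual}, U (scipy) = {res_raw.statistic}, U (exp) = {res_exp.statistic}")
print(f"t-test p: raw = {p_t_raw:.3f}, exp-transformed = {p_t_exp:.3f}")
```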

In [7]:
pd.DataFrame({
    "test": ["Welch t-test", "Mann–Whitney"],
    "p_value": [p_value, p_u]
})

           test   p_value
0  Welch t-test  0.387596
1  Mann–Whitney  0.262279


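We compared value between the control and treated groups using a Welch t-test.
The difference between groups was not statistically significant (p = 0.388); a Mann–Whitney test led to the same conclusion (p = 0.262).
The treated group showed a slightly lower mean value than the control group (10.25 vs. 10.63; difference = -0.37), a difference compatible with random variation.
This dataset therefore provides no evidence of an association between group and value. A non-significant result does not show that the effect is zero, and even a significant association would not, by itself, imply causality.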
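The Welch test above gave p = 0.388 even though the data were simulated with a true difference of +1 in the treated mean. A p-value above 0.05 does not show that there is no effect; it may simply reflect limited statistical power. The sketch below estimates power by simulation for this design (n = 40 per group, standard deviation 2, true difference 1, i.e. half a standard deviation); the number of simulations and the seed are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
n_sim, n, alpha = 2000, 40, 0.05

significant = 0
for _ in range(n_sim):
    # Same design as the notebook: a REAL effect of +1 on the treated mean
    control = rng.normal(loc=10, scale=2, size=n)
    treated = rng.normal(loc=11, scale=2, size=n)
    if stats.ttest_ind(treated, control, equal_var=False).pvalue < alpha:
        significant += 1

# Power: share of simulated experiments in which the real effect is detected
power = significant / n_sim
print(f"Estimated power: {power:.2f}")
```

For this design theory puts power at roughly 60%, so about 4 in 10 samples like this one would give p >= 0.05 despite a real effect. Reporting the effect size with its confidence interval, alongside p, makes this uncertainty visible.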