
# Class 1: Introduction to Statistical Inference (Applied to Steel Frame)

**Topics**: Point estimation, confidence intervals, hypothesis tests (Student's t), Type I and Type II errors, power, sample size.  
**Context**: profile and panel data (thickness, yield strength, zinc coating, U-value).

This notebook **generates the synthetic dataset internally** to avoid download problems.  
No external file is needed.


In [3]:

# === Generate synthetic Steel Frame dataset ===
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed for reproducibility
n = 240                          # number of panels

# Supplier label for each panel: 55% A, 45% B
supplier = rng.choice(['A','B'], size=n, p=[0.55, 0.45])

# Thickness (mm): each supplier has its own mean and standard deviation
mu_A, mu_B = 0.90, 0.88
sd_A, sd_B = 0.06, 0.07

thickness = np.where(supplier=='A',
                     rng.normal(mu_A, sd_A, size=n),
                     rng.normal(mu_B, sd_B, size=n))
thickness = np.clip(thickness, 0.70, 1.10)

# Yield strength (MPa): increases linearly with thickness, plus noise
yield_strength = 230 + 80*(thickness-0.85) + rng.normal(0, 12, size=n)

# Zinc coating (g/m2): depends on the supplier
coating = np.where(supplier=='A',
                   rng.normal(220, 18, size=n),
                   rng.normal(210, 20, size=n))
coating = np.clip(coating, 160, 280)

# U-value (W/m2K): decreases slightly with thickness, plus a supplier-specific penalty
base_U = 0.35 - 0.06*(thickness-0.85) + rng.normal(0, 0.015, size=n)
bridge_penalty = np.where(supplier=='A', 0.012, 0.015)
U_value = np.clip(base_U + bridge_penalty, 0.18, 0.55)

# Requirement flag: U-value at or below 0.40
meets_U = (U_value <= 0.40)

# Batch numbers 1001 to 1011 (upper bound is exclusive) and panel identifiers
batch = rng.integers(1001, 1012, size=n)
panel_id = [f"P{b}-{i:03d}" for b, i in zip(batch, rng.integers(1, 160, size=n))]

df = pd.DataFrame({
    "panel_id": panel_id,
    "supplier": supplier,
    "batch": batch,
    "thickness_mm": np.round(thickness, 3),
    "yield_strength_MPa": np.round(yield_strength, 1),
    "zinc_coating_g_m2": np.round(coating, 0),
    "U_W_m2K": np.round(U_value, 3),
    "meets_U_requirement": meets_U
})

df.head()


| | panel_id | supplier | batch | thickness_mm | yield_strength_MPa | zinc_coating_g_m2 | U_W_m2K | meets_U_requirement |
|---|---|---|---|---|---|---|---|---|
| 0 | P1003-023 | B | 1003 | 0.923 | 233.3 | 249.0 | 0.364 | True |
| 1 | P1009-016 | A | 1009 | 0.861 | 216.3 | 245.0 | 0.36 | True |
| 2 | P1003-094 | B | 1003 | 0.954 | 219.6 | 185.0 | 0.352 | True |
| 3 | P1004-001 | B | 1004 | 0.8 | 234.2 | 191.0 | 0.324 | True |
| 4 | P1003-103 | A | 1003 | 0.836 | 224.7 | 220.0 | 0.344 | True |


## 1. Notation and key formulas


- Sample mean: \(\overline{x}=\frac{1}{n}\sum_{i=1}^n x_i\).
- Sample variance: \(s^2=\frac{1}{n-1}\sum_{i=1}^n (x_i-\overline{x})^2\).
- CI for the mean: \(\overline{x} \pm t_{\alpha/2,\ n-1}\ \frac{s}{\sqrt{n}}\).
- One-sample t: \(t=\frac{\overline{x}-\mu_0}{s/\sqrt{n}}\).
- Two-sample t (unequal variances): \(t=\frac{\overline{x}_1-\overline{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\).
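
The formulas above can be checked numerically. The sketch below applies each one by hand to two small arrays of made-up thickness readings (`sample_a`, `sample_b`, and `mu0_demo` are hypothetical, not taken from the dataset) and compares the results with the NumPy and SciPy implementations.

```python
import numpy as np
from scipy import stats

# Hypothetical thickness readings in mm (illustrative values, not from the dataset)
sample_a = np.array([0.91, 0.87, 0.93, 0.85, 0.90, 0.88, 0.94, 0.86])
sample_b = np.array([0.84, 0.89, 0.86, 0.82, 0.88, 0.85, 0.87])
n_a, n_b = sample_a.size, sample_b.size

# Sample mean and sample variance (n - 1 in the denominator)
mean_a = sample_a.sum() / n_a
var_a = ((sample_a - mean_a) ** 2).sum() / (n_a - 1)
s_a = np.sqrt(var_a)

# 95% confidence interval for the mean
alpha_demo = 0.05
t_crit_demo = stats.t.ppf(1 - alpha_demo / 2, df=n_a - 1)
half_width = t_crit_demo * s_a / np.sqrt(n_a)
ci_a = (mean_a - half_width, mean_a + half_width)

# One-sample t statistic against a hypothetical reference value
mu0_demo = 0.85
t_one = (mean_a - mu0_demo) / (s_a / np.sqrt(n_a))

# Two-sample t statistic with unequal variances (Welch)
mean_b = sample_b.mean()
var_b = sample_b.var(ddof=1)
t_welch = (mean_a - mean_b) / np.sqrt(var_a / n_a + var_b / n_b)

# Cross-check the hand-computed values against library implementations
print(np.isclose(var_a, sample_a.var(ddof=1)))
print(np.isclose(t_one, stats.ttest_1samp(sample_a, popmean=mu0_demo).statistic))
print(np.isclose(t_welch, stats.ttest_ind(sample_a, sample_b, equal_var=False).statistic))
```

Passing `equal_var=False` to `ttest_ind` selects the unequal-variance version of the test, which is the one the last formula describes.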


## 2. Point estimation: Thickness (mm)

In [4]:

x = df["thickness_mm"].dropna()
n = x.size
x_mean = x.mean()
x_std = x.std(ddof=1)
n, x_mean, x_std


(240, np.float64(0.8863083333333334), np.float64(0.07090754140550173))
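
A point estimate is usually reported together with its standard error, \(s/\sqrt{n}\), the quantity that appears in the interval and test formulas of Section 1. A minimal sketch, plugging in the summary values printed above (rounded here, so the result is approximate; the names `n_obs`, `s_obs`, and `se_mean` are introduced only for this illustration):

```python
import numpy as np

# Summary statistics copied from the output above (rounded)
n_obs = 240
s_obs = 0.0709   # sample standard deviation, mm

# Standard error of the mean: s / sqrt(n)
se_mean = s_obs / np.sqrt(n_obs)
print(f"standard error of the mean: {se_mean:.5f} mm")
```

When the full series is available, `scipy.stats.sem(x)` should give the same quantity, since its default is `ddof=1`.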

In [5]:

# Requires matplotlib; install it (e.g. pip install matplotlib) if the import fails
import matplotlib.pyplot as plt

plt.hist(x, bins=20)
plt.title("Histogram of thickness (mm)")
plt.xlabel("Thickness (mm)")
plt.ylabel("Frequency")
plt.show()


ModuleNotFoundError: No module named 'matplotlib'

## 3. Confidence interval for the mean thickness (95%)

In [None]:

from scipy import stats
alpha = 0.05                                # 1 - confidence level
t_crit = stats.t.ppf(1 - alpha/2, df=n-1)   # two-sided critical value, n - 1 degrees of freedom
moe = t_crit * x_std/np.sqrt(n)             # margin of error
(x_mean - moe, x_mean + moe)
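
As a cross-check, the same interval can be obtained from `scipy.stats.t.interval`. The sketch below is self-contained: it uses the summary values reported in Section 2 (rounded) instead of the `x` series, so its numbers are approximate, and the names ending in `_obs` and `_chk` are introduced only for this illustration.

```python
import numpy as np
from scipy import stats

# Summary statistics from Section 2 (rounded)
n_obs, mean_obs, s_obs = 240, 0.8863, 0.0709
se_mean = s_obs / np.sqrt(n_obs)

# Manual interval: mean +/- t_crit * standard error
t_crit_chk = stats.t.ppf(0.975, df=n_obs - 1)
ci_manual = (mean_obs - t_crit_chk * se_mean, mean_obs + t_crit_chk * se_mean)

# Same interval from SciPy; the confidence level is passed positionally
ci_scipy = stats.t.interval(0.95, n_obs - 1, loc=mean_obs, scale=se_mean)

print(ci_manual)
print(ci_scipy)
```

The confidence level is passed positionally because the keyword for that argument has changed name between SciPy releases.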


## 4. One-sample t-test: Nominal thickness

In [None]:

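# H0: mean thickness = 0.85 mm vs. H1: mean thickness > 0.85 mm
# (uses `stats` imported in the previous cell)
mu0 = 0.85
t_stat, p_one_sided = stats.ttest_1samp(x, popmean=mu0, alternative="greater")
t_stat, p_one_sided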
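
The statistic and the one-sided p-value returned by `ttest_1samp` follow from the one-sample formula in Section 1. A minimal sketch, using the rounded summary values from Section 2 instead of the full series (so the numbers are approximate; the names ending in `_obs`, `_chk`, and `_manual` are introduced only for this illustration):

```python
import numpy as np
from scipy import stats

# Summary statistics from Section 2 (rounded)
n_obs, mean_obs, s_obs = 240, 0.8863, 0.0709
mu0_chk = 0.85   # same reference value as in the cell above

# t = (sample mean - mu0) / (s / sqrt(n))
t_manual = (mean_obs - mu0_chk) / (s_obs / np.sqrt(n_obs))

# One-sided p-value: upper-tail probability of a t distribution with n - 1 degrees of freedom
p_manual = stats.t.sf(t_manual, df=n_obs - 1)

print(t_manual, p_manual)
```

With `alternative="greater"` the p-value is the upper-tail probability, so a large positive statistic yields a small p-value.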
