# Asset Selection and Weighting (ElasticNet + Markowitz)

## 1. Universe and initial filters
We start from a broad universe $U$ of roughly 100 assets ($|U| \approx 100$), made up of core ETFs (broad market, sector, commodities, bonds) and *large/mega cap* stocks.  
We apply a minimal set of filters:

1. **Liquidity**:  
   $$
   \text{ADV}_i = \operatorname{median}_{s \in [t-60,t]} \big( P_{i,s}\cdot Vol_{i,s} \big) \geq \theta
   $$  
   with $\theta = 5{,}000{,}000\ \text{USD}$.
2. **Complete data**: at least $252$ daily observations with no missing values.
3. **Price**: we discard assets whose effective price is too low or that show illiquidity gaps.
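
The liquidity filter can be sketched with NumPy as follows. This is a minimal illustration under assumed inputs: the `prices` and `volumes` arrays and the helper name `adv_filter` are hypothetical and do not appear in the notebook.

```python
import numpy as np

def adv_filter(prices, volumes, theta=5_000_000, window=60):
    """Boolean mask of assets whose median dollar volume over the last
    `window` rows is at least `theta`. Rows are days, columns are assets."""
    dollar_volume = prices[-window:] * volumes[-window:]
    adv = np.median(dollar_volume, axis=0)
    return adv >= theta

# Hypothetical toy data: 60 days, 2 assets with constant price and volume.
prices = np.column_stack([np.full(60, 100.0), np.full(60, 10.0)])
volumes = np.column_stack([np.full(60, 100_000.0), np.full(60, 100_000.0)])
mask = adv_filter(prices, volumes)
print(mask)  # asset 0 trades 10M USD per day, asset 1 only 1M USD per day
```

The median is used instead of the mean so that a handful of unusually heavy trading days does not let an otherwise illiquid asset through.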


---

## 2. Feature engineering
For each asset $i$ we build a vector of quantitative features:

- **Cumulative momentum** (sum of log returns over the last $h$ days):  
  $$
  r_{i,h} = \sum_{t=T-h+1}^{T} \ln\left(1+R_{i,t}\right), \quad h \in \{60,126,252\}
  $$
- **Volatility**:  
  $$
  \sigma_{i,h} = \sqrt{\frac{1}{h}\sum_{t=T-h+1}^{T}(R_{i,t}-\bar R_{i})^2}
  $$
  where $\bar R_{i}$ is the mean return over the same window.
- **Downside volatility** (negative returns only).  
- **Max. drawdown**:  
  $$
  \text{MDD}_i = \min_{t \in [0,T]} \left( \frac{W_{i,t}}{\max_{s \leq t} W_{i,s}} - 1 \right), \quad W_{i,t}=\prod_{u=1}^{t}(1+R_{i,u})
  $$
- **Market sensitivity** (beta) and correlation with SPY:
  $$
  \beta_i = \frac{\operatorname{Cov}(R_i, R_{SPY})}{\operatorname{Var}(R_{SPY})}, 
  \quad \rho_i = \operatorname{Corr}(R_i, R_{SPY})
  $$
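
The feature definitions above can be checked on a tiny example. The sketch below is illustrative only: the function names and the toy return series are assumptions, and it works on simple returns $R_{i,t}$ as in the formulas.

```python
import numpy as np

def momentum(simple_returns, h):
    """Cumulative log return over the last h observations."""
    return float(np.sum(np.log1p(simple_returns[-h:])))

def max_drawdown(simple_returns):
    """Most negative peak-to-trough decline of the compounded wealth path,
    starting from an initial wealth of 1."""
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + simple_returns)])
    running_peak = np.maximum.accumulate(wealth)
    return float(np.min(wealth / running_peak - 1.0))

def beta_and_corr(asset, market):
    """Market beta (Cov/Var) and Pearson correlation against a benchmark."""
    cov = np.cov(asset, market, ddof=0)
    beta = cov[0, 1] / cov[1, 1]
    corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return float(beta), float(corr)

# Hypothetical toy series: +10%, -50%, +20%.
r = np.array([0.10, -0.50, 0.20])
mom = momentum(r, 3)               # ln(1.1 * 0.5 * 1.2)
mdd = max_drawdown(r)              # wealth falls from 1.1 to 0.55
b, c = beta_and_corr(2.0 * r, r)   # an asset that moves twice as much as the market
```

An asset whose returns are exactly twice the benchmark's has a beta of 2 but a correlation of 1, which is why the two features carry different information.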

---

## 3. Sharpe ratio as the target
The Sharpe ratio of each individual asset is computed as:
$$
S_i = \frac{\mu_i - r_f}{\sigma_i}
$$
where $\mu_i = E[R_i] \cdot 252$, $\sigma_i = \operatorname{Std}(R_i)\cdot\sqrt{252}$ and $r_f$ is the annualized risk-free rate.  

It measures each asset's excess return per unit of risk.
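
As a quick numerical illustration of the annualization, the sketch below computes the ratio from daily returns. The helper name and the sample returns are hypothetical.

```python
import numpy as np

def annualized_sharpe(daily_returns, rf_annual=0.0, periods=252):
    """Annualize mean and standard deviation, then take the excess-return ratio."""
    mu = np.mean(daily_returns) * periods
    sigma = np.std(daily_returns, ddof=1) * np.sqrt(periods)
    return float((mu - rf_annual) / sigma)

# Hypothetical daily returns, for illustration only.
daily = np.array([0.01, -0.005, 0.002, 0.007, -0.003])
s0 = annualized_sharpe(daily)                  # risk-free rate of zero
s1 = annualized_sharpe(daily, rf_annual=0.04)  # a positive rate lowers the ratio
```

The mean scales with the number of periods while the standard deviation scales with its square root, so the annualized ratio is $\sqrt{252}$ times the daily one when $r_f = 0$.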

---

## 4. ElasticNet
We fit a penalized regression:
$$
\hat y_i = \mathbf{x}_i^\top \hat\beta,\quad
\hat\beta = \arg\min_{\beta} \Bigg\{ \frac{1}{n}\sum_{i=1}^n (y_i - \mathbf{x}_i^\top \beta)^2 
+ \lambda \Big[ \alpha \|\beta\|_1 + (1-\alpha)\|\beta\|_2^2 \Big] \Bigg\}
$$
- $y_i = S_i$ (Sharpe ratio of asset $i$).  
- $\mathbf{x}_i$ = feature vector (momentum, vol, MDD, liquidity, beta, correlation).  
- $\lambda$ controls the strength of the penalty; $\alpha \in [0,1]$ blends **Lasso** ($\ell_1$) and **Ridge** ($\ell_2$).

Here $\beta$ is the vector of regression coefficients, not to be confused with the market beta $\beta_i$ of Section 2. The penalized regression identifies which factors best explain the Sharpe ratio, while the penalty limits overfitting and stabilizes the estimates when features are collinear.
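
To make the role of $\lambda$ and $\alpha$ concrete, the sketch below evaluates the objective exactly as written above for a given coefficient vector. The function name and the toy data are hypothetical. Note that scikit-learn's `ElasticNetCV`, used later in the notebook, scales the same terms slightly differently (a factor $1/(2n)$ on the squared error and $1/2$ on the $\ell_2$ term), with `alpha` playing the role of $\lambda$ and `l1_ratio` the role of $\alpha$.

```python
import numpy as np

def elastic_net_objective(X, y, beta, lam, alpha):
    """Mean squared error plus lam * [alpha * L1 + (1 - alpha) * squared L2]."""
    residuals = y - X @ beta
    mse = np.mean(residuals ** 2)
    l1 = np.sum(np.abs(beta))
    l2_sq = np.sum(beta ** 2)
    return float(mse + lam * (alpha * l1 + (1.0 - alpha) * l2_sq))

# Hypothetical toy problem in which beta fits y exactly, so only the penalty remains.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
beta = np.array([1.0, 2.0])
lasso_like = elastic_net_objective(X, y, beta, lam=0.1, alpha=1.0)  # 0.1 * (1 + 2)
ridge_like = elastic_net_objective(X, y, beta, lam=0.1, alpha=0.0)  # 0.1 * (1 + 4)
```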

The fitted coefficients $\hat\beta_j$ are read as **relative importance**:  
- $\hat\beta_j > 0$: the feature favors a high Sharpe.  
- $\hat\beta_j < 0$: the feature lowers the expected Sharpe.

This yields an **ex-ante score**:
$$
\text{Score}_i = \sum_{j} Z_{i,j} \cdot \hat\beta_j
$$
where $Z_{i,j}$ are the standardized features.
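
The score is a plain dot product between z-scored features and coefficients. A minimal sketch, assuming a hypothetical feature matrix and made-up coefficients (they are not fitted values from the notebook):

```python
import numpy as np

def ex_ante_score(features, coef):
    """Standardize each feature column (z-score) and weight it by its coefficient."""
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    return z @ coef

# Hypothetical inputs: 3 assets (rows), 2 features (momentum, volatility).
features = np.array([[0.10, 0.20],
                     [0.05, 0.30],
                     [0.00, 0.10]])
coef = np.array([0.5, -0.2])    # assumed coefficients, for illustration only
scores = ex_ante_score(features, coef)
ranking = np.argsort(-scores)   # indices of the assets, best score first
```

Because every standardized column has zero mean, the scores are centered on zero and only their ordering matters for selection.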

---

## 5. Cluster-based diversification
We build a correlation matrix $\rho_{ij}$ and a distance:
$$
d_{ij} = \sqrt{2(1-\rho_{ij})}
$$
Assets are then grouped into clusters of highly similar assets, either with hierarchical clustering on $d_{ij}$ or with **KMeans** on a low-dimensional embedding of the correlation matrix (KMeans needs coordinates rather than pairwise distances).  
The final selection satisfies, for every cluster $C_k$, the constraint:
$$
|C_k \cap P| \leq \text{cap\_cluster}
$$
where $P$ is the chosen portfolio. This avoids redundancy and spreads the selection across clusters.
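
The distance and the cluster cap can be sketched as below. The helper names, the ranking and the cluster labels are hypothetical; the greedy pass is a simplified stand-in for the selection loop in the notebook.

```python
import numpy as np

def correlation_distance(corr):
    """Map correlations in [-1, 1] to distances in [0, 2]."""
    return np.sqrt(2.0 * (1.0 - corr))

def select_with_cluster_cap(ranked_assets, cluster_of, cap_cluster, max_n):
    """Walk the ranking from best to worst, skipping assets whose cluster is full."""
    selected, used = [], {}
    for asset in ranked_assets:
        k = cluster_of[asset]
        if used.get(k, 0) < cap_cluster:
            selected.append(asset)
            used[k] = used.get(k, 0) + 1
        if len(selected) >= max_n:
            break
    return selected

# Perfect, zero and perfectly negative correlation.
d = correlation_distance(np.array([1.0, 0.0, -1.0]))

# Hypothetical ranking (best first) and cluster labels.
ranked = ["A", "B", "C", "D", "E"]
clusters = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 2}
picked = select_with_cluster_cap(ranked, clusters, cap_cluster=2, max_n=3)
```

Here asset `C` is skipped even though it outranks `D`, because its cluster already holds two names.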

---

## 6. Markowitz optimization (long-only)
With the selected assets, weights are optimized under the classical framework:

- **Minimum variance**:  
  $$
  \min_w \quad w^\top \Sigma w
  \quad \text{s.t.}\quad \mathbf{1}^\top w = 1,\; w \geq 0
  $$

- **Tangency point (max. Sharpe)**:  
  $$
  \max_w \quad \frac{w^\top \mu - r_f}{\sqrt{w^\top \Sigma w}}
  \quad \text{s.t.}\quad \mathbf{1}^\top w = 1,\; w \geq 0
  $$

where $\mu$ is the vector of expected returns and $\Sigma$ is the Ledoit–Wolf covariance (a shrinkage estimator).
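
For intuition, both problems have a closed form once the sign constraint $w \geq 0$ is dropped. The sketch below uses those closed forms, which is a simplification of the constrained problems stated above: they coincide with the long-only solution only when the unnormalized vector is already non-negative; otherwise a constrained solver is needed, as in the notebook code. The function names and inputs are hypothetical.

```python
import numpy as np

def min_variance_closed_form(cov):
    """w = inv(cov) 1 / (1' inv(cov) 1): budget constraint only, no sign constraint."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

def tangency_closed_form(mu, cov, rf):
    """w proportional to inv(cov) (mu - rf), rescaled to sum to one."""
    raw = np.linalg.solve(cov, mu - rf)
    return raw / raw.sum()

# Hypothetical two-asset inputs with uncorrelated returns.
cov = np.diag([0.04, 0.01])
mu = np.array([0.10, 0.05])
w_mv = min_variance_closed_form(cov)           # inverse-variance weights
w_tg = tangency_closed_form(mu, cov, rf=0.0)   # tilts toward the higher-return asset
```

With uncorrelated assets the minimum-variance weights are proportional to $1/\sigma_i^2$, so the low-volatility asset receives 80% here, while the tangency portfolio shifts part of that weight back to the asset with the higher expected return.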

The **efficient frontier** is traced by varying the target return $\tau$:
$$
\min_w \quad w^\top \Sigma w
\quad \text{s.t.}\quad \mathbf{1}^\top w = 1,\; \mu^\top w = \tau,\; w \geq 0
$$
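
With only two assets the frontier needs no optimizer, because the budget and target-return constraints determine the weights uniquely. The sketch below illustrates this special case; the function name and inputs are hypothetical, and it assumes the two expected returns differ.

```python
import numpy as np

def two_asset_frontier(mu, cov, n_points=5):
    """For each target return, solve mu @ w = tau with w[0] + w[1] = 1."""
    targets = np.linspace(mu.min(), mu.max(), n_points)
    points = []
    for tau in targets:
        w0 = (tau - mu[1]) / (mu[0] - mu[1])
        w = np.array([w0, 1.0 - w0])
        points.append((float(tau), float(np.sqrt(w @ cov @ w)), w))
    return points

# Hypothetical inputs: two uncorrelated assets.
mu = np.array([0.10, 0.05])
cov = np.diag([0.04, 0.01])
frontier = two_asset_frontier(mu, cov)
for tau, vol, w in frontier:
    print(f"target={tau:.4f}  vol={vol:.4f}  w={np.round(w, 3)}")
```

Keeping the targets between the lowest and highest asset return is what keeps the weights in $[0, 1]$, so every point respects the long-only constraint.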

---

## 7. Strategic rationale
- **Sector**: diversified exposure (US equity, international, technology, energy, defensives, dividends).  
- **Liquidity**: every selected asset has a high ADV.  
- **Risk-return**: ElasticNet favors momentum + liquidity and penalizes drawdown/volatility, aiming at a more robust portfolio.  
- **Diversification**: the cluster cap avoids concentrating in a single sector.  
- **Efficient optimization**: final weights come from Markowitz with long-only constraints and upper bounds, which keeps the allocation operationally feasible.


In [5]:
!pip install ipywidgets




In [None]:
import numpy as np
import pandas as pd
import yfinance as yf
from sklearn.covariance import LedoitWolf
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.linear_model import ElasticNetCV
from scipy.optimize import minimize
import matplotlib.pyplot as plt
from ipywidgets import (
    Text, FloatText, IntSlider, Button, VBox, HBox, Output, HTML, Dropdown, BoundedFloatText, Checkbox
)
from IPython.display import display

UNIVERSE = [

    "SPY","VOO","QQQ","VTI","IWM","DIA","VTV","VOE","VUG","XLK","SOXX","SMH",
    "XLF","XLE","XLV","XLY","XLP","XLU","XLB","XLI","XLC",
    "EFA","EEM","EWJ","EWZ","EWY","EWW","INDA","ASHR",
    "TLT","IEF","LQD","HYG","GOVT","SHY",
    "GLD","SLV","USO","UNG","DBA","DBC",
    "SCHD","VYM","DVY","JEPI","JEPQ",
    "AAPL","MSFT","GOOGL","META","AMZN","NVDA","AVGO","TSLA",
    "JNJ","PG","KO","PEP","MCD","WMT","MA","V","HD","LOW",
    "JPM","BAC","GS","MS","BRK-B","UNH","PFE","ABBV","XOM","CVX","COP",
    "NKE","COST","ADBE","ORCL","CRM","INTC","AMD","AMAT","ASML","TXN",
    "CAT","DE","GE","HON","BA","UPS","FDX","RTX","LMT","PLD",
    "CSCO","QCOM","MU","PANW","CRWD","NOW","SNOW",
    "MRK","TMO","ABT","MDT","LLY","REGN",
    "BKNG","ABNB","MAR","CCL","RCL","NFLX","DIS",
]
START_DEFAULT = "2024-01-01"
END_DEFAULT   = "2025-09-01"
RF_ANNUAL_DEFAULT = 0.00
MIN_ADV_USD_DEFAULT = 5_000_000
MAX_N_DEFAULT = 7
MIN_N_DEFAULT = 4
N_CLUSTERS_DEFAULT = 8
CAP_PER_CLUSTER_DEFAULT = 2
RANDOM_STATE = 42

start_sel   = Text(value=START_DEFAULT, description="Start:")
end_sel     = Text(value=END_DEFAULT, description="End:")
rf_sel      = FloatText(value=RF_ANNUAL_DEFAULT, description="Annual RF:")
min_adv_sel = FloatText(value=MIN_ADV_USD_DEFAULT, description="MIN ADV$:")
max_n_sel   = IntSlider(value=MAX_N_DEFAULT, min=3, max=10, step=1, description="Max N:")
min_n_sel   = IntSlider(value=MIN_N_DEFAULT, min=2, max=7, step=1, description="Min N:")
k_sel       = IntSlider(value=N_CLUSTERS_DEFAULT, min=3, max=15, step=1, description="Clusters:")
cap_sel     = IntSlider(value=CAP_PER_CLUSTER_DEFAULT, min=1, max=4, step=1, description="Cap/cluster:")
reselect_btn= Button(description="Reselect (ElasticNet)", button_style="primary")
out_head    = HTML("<b>Selected:</b> -")
out_coef    = Output()
out_select  = Output()

tickers_txt = Text(value="", description="Tickers:", layout=dict(width="720px"))
start_mkv   = Text(value=START_DEFAULT, description="Start:")
end_mkv     = Text(value=END_DEFAULT, description="End:")
rf_mkv      = FloatText(value=RF_ANNUAL_DEFAULT, description="Annual RF:")
grid_slider = IntSlider(value=40, min=10, max=200, step=5, description="# points:")
# Option strings are matched verbatim in compute_frontier.
obj_dd      = Dropdown(options=["Frontera (grid)", "Mín-Var", "Tangency"], value="Frontera (grid)", description="Objective:")
ubound      = BoundedFloatText(value=1.0, min=0.05, max=1.0, step=0.05, description="Max w_i:")
auto_compute= Checkbox(value=True, description="Auto-compute on reselect")
compute_btn = Button(description="Compute frontier", button_style="primary")
sel_idx_slider = IntSlider(value=0, min=0, max=0, step=1, description="Point:")
out_plot   = Output()
out_weights= Output()
out_info   = Output()

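def ann_to_daily(x, periods=252):
    """Convert an annual compounded rate to its per-period (daily) equivalent."""
    return (1 + x)**(1/periods) - 1

def max_drawdown_from_returns(r):
    """Maximum drawdown (a non-positive number) of a series of simple returns."""
    eq = (1+r).cumprod()
    peak = eq.cummax()
    dd = eq/peak - 1.0
    return float(dd.min())

def winsorize(s, p=0.05):
    """Clip a series to its [p, 1-p] quantile range to limit the effect of outliers."""
    lo, hi = s.quantile(p), s.quantile(1-p)
    return s.clip(lo, hi)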

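def download_panel(tickers, start, end):
    """Download adjusted closes and volumes; return aligned (px, vol) DataFrames."""
    df = yf.download(tickers, start=start, end=end, auto_adjust=True, progress=False)
    if df.empty:
        raise RuntimeError("No data downloaded.")
    # Drop tickers for which no price came back at all.
    px = df["Close"].dropna(how="all", axis=1)
    # Missing volume is treated as zero traded volume.
    vol = df["Volume"].reindex(px.index).fillna(0)
    cols = sorted(set(px.columns) & set(vol.columns))
    # Keep only the dates on which every remaining ticker has a price.
    px = px[cols].dropna()
    vol = vol[cols].loc[px.index]
    return px, vol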

def min_variance_weights(cov, ub=1.0):
    """Long-only minimum-variance weights (each capped at ub); equal weights if the solver fails."""
    n = cov.shape[0]; w0 = np.ones(n)/n
    cons = ({"type":"eq","fun": lambda w: np.sum(w)-1.0},)
    bnds = tuple((0.0, float(ub)) for _ in range(n))
    obj = lambda w: float(w @ cov @ w)
    res = minimize(obj, w0, method="SLSQP", bounds=bnds, constraints=cons, options={"maxiter":600})
    if not (res.success and np.all(np.isfinite(res.x))):
        w = w0
    else:
        w = res.x
    w = np.maximum(w, 0); s = w.sum()
    return w/s if s>0 else w0

def min_var_given_target(mu, cov, target, ub=1.0):
    """Long-only minimum-variance weights for a target return; None if the target is infeasible."""
    n = len(mu); w0 = np.ones(n)/n
    cons = (
        {"type":"eq","fun": lambda w: np.sum(w) - 1.0},
        {"type":"eq","fun": lambda w: float(mu @ w) - target}
    )
    bnds = tuple((0.0, float(ub)) for _ in range(n))
    obj = lambda w: float(w @ cov @ w)
    res = minimize(obj, w0, method="SLSQP", bounds=bnds, constraints=cons, options={"maxiter":1000})
    if res.success and np.all(np.isfinite(res.x)):
        w = np.maximum(res.x, 0); s = w.sum()
        if s>0: return w/s
    return None

def tangency_weights(mu_d, cov_d, rf_d, ub=1.0):
    """Long-only maximum-Sharpe weights on daily inputs; equal weights if the solver fails."""
    n = len(mu_d); w0 = np.ones(n)/n
    def neg_sharpe(w):
        mu_p = float(w @ mu_d)
        vol_p = np.sqrt(max(1e-12, w @ cov_d @ w))
        return - (mu_p - rf_d) / (vol_p + 1e-12)
    cons = ({"type":"eq","fun": lambda w: np.sum(w) - 1.0},)
    bnds = tuple((0.0, float(ub)) for _ in range(n))
    res = minimize(neg_sharpe, w0, method="SLSQP", bounds=bnds, constraints=cons, options={"maxiter":1000})
    if res.success and np.all(np.isfinite(res.x)):
        w = np.maximum(res.x, 0); s = w.sum()
        return w/s if s>0 else np.ones(n)/n
    return np.ones(n)/n

def reseleccionar():
    out_coef.clear_output(); out_select.clear_output()
    START = start_sel.value.strip()
    END   = end_sel.value.strip()
    RF_A  = float(rf_sel.value)
    MIN_ADV = float(min_adv_sel.value)
    MAX_N = int(max_n_sel.value)
    MIN_N = int(min_n_sel.value)
    N_CL  = int(k_sel.value)
    CAPC  = int(cap_sel.value)

    px, vol = download_panel(UNIVERSE, START, END)
    px = px.dropna(axis=1, how="any")
    vol = vol[px.columns]

    adv_usd = (px * vol).rolling(60).median().iloc[-1].replace([np.inf,-np.inf], np.nan).dropna()
    liq_keep = adv_usd[adv_usd >= MIN_ADV].index.tolist()
    px = px[liq_keep]; vol = vol[liq_keep]
    ret = np.log(px/px.shift(1)).dropna()
    tickers_ok = list(px.columns)
    if len(tickers_ok) < MIN_N:
        out_select.append_stdout("Too few assets after the liquidity/data filter.\n")
        return []

    r60  = ret.tail(60).sum()
    r126 = ret.tail(126).sum()
    r252 = ret.tail(252).sum() if len(ret) >= 252 else ret.sum()
    vol126 = ret.tail(126).std()
    vol252 = ret.tail(252).std()  # last 252 days, matching the feature name
    neg = ret.copy(); neg[neg>0]=0
    downvol = neg.std()

    spy = "SPY" if "SPY" in ret.columns else tickers_ok[0]
    Xbeta = np.vstack([ret[spy].values, np.ones(len(ret))]).T
    betas, cors = {}, {}
    for t in tickers_ok:
        y = ret[t].values
        b = np.linalg.lstsq(Xbeta, y, rcond=None)[0][0]
        betas[t] = float(b)
        cors[t] = float(np.corrcoef(ret[spy], ret[t])[0,1])

    mdd = {}
    w = min(len(ret), 252)
    for t in tickers_ok:
        # ret holds log returns; convert to simple returns before compounding.
        mdd[t] = max_drawdown_from_returns(np.expm1(ret[t].tail(w)))

    liq = adv_usd[tickers_ok]
    feats = pd.DataFrame({
        "momentum_60":  r60,
        "momentum_126": r126,
        "momentum_252": r252,
        "vol_126":      vol126,
        "vol_252":      vol252,
        "down_vol":     downvol,
        "beta_SPY":     pd.Series(betas),
        "corr_SPY":     pd.Series(cors),
        "mdd":          pd.Series(mdd),
        "ADV_usd":      liq
    }).dropna()

    for c in feats.columns:
        feats[c] = winsorize(feats[c])

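    # Target: annualized Sharpe per asset. It is measured on the same sample
    # window as the features, so the ElasticNet fit is in-sample.
    mu_a = ret[feats.index].mean()*252
    vol_a = ret[feats.index].std()*np.sqrt(252)
    sharpe = (mu_a - RF_A) / (vol_a.replace(0, np.nan))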

    X = StandardScaler().fit_transform(feats.fillna(feats.mean()))
    y = sharpe.loc[feats.index].fillna(sharpe.median()).values
    enet = ElasticNetCV(l1_ratio=[.2,.5,.8], cv=5, random_state=RANDOM_STATE).fit(X, y)
    coef = pd.Series(enet.coef_, index=feats.columns).sort_values(ascending=False)

    with out_coef:
        print("Importances (ElasticNet, Sharpe as target):")
        display(coef.to_frame("coef").T)

    coef_nz = coef[coef.abs()>0]
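    if coef_nz.empty:
        # Fallback heuristic when ElasticNet shrinks every coefficient to zero.
        # mdd is negative, so rank its magnitude to penalize deep drawdowns.
        score = ( +0.5*feats["momentum_126"].rank(pct=True)
                  +0.3*feats["momentum_252"].rank(pct=True)
                  -0.3*feats["vol_126"].rank(pct=True)
                  -0.2*feats["mdd"].abs().rank(pct=True)
                  -0.2*feats["corr_SPY"].rank(pct=True)
                  +0.2*feats["ADV_usd"].rank(pct=True) )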
    else:
        Z = feats.apply(lambda s: (s - s.mean())/s.std(ddof=0))
        score = Z[coef_nz.index].dot(coef_nz)

    score = score.sort_values(ascending=False)

    # Cluster assets on a low-rank SVD embedding of their correlation matrix.
    C = ret[score.index].corr().fillna(0).values
    U, svals, Vt = np.linalg.svd(C - C.mean())
    k = min(5, U.shape[1])
    X2 = U[:, :k] * svals[:k]
    # KMeans cannot fit more clusters than there are assets.
    km = KMeans(n_clusters=min(N_CL, len(score)), n_init=20, random_state=RANDOM_STATE)
    labels = km.fit_predict(X2)
    lab_ser = pd.Series(labels, index=score.index, name="cluster")

    selected = []
    for t in score.index:
        c = int(lab_ser[t])
        cnt_c = sum(1 for tt in selected if lab_ser[tt]==c)
        if cnt_c < CAPC:
            selected.append(t)
        if len(selected) >= MAX_N:
            break
    if len(selected) < MIN_N:
        selected = list(score.index[:MIN_N])

    with out_select:
        out_select.clear_output()
        print("Selected:", ", ".join(selected))

    out_head.value = f"<b>Selected:</b> {', '.join(selected)}"
    tickers_txt.value = ",".join(selected)
    start_mkv.value = START
    end_mkv.value   = END
    rf_mkv.value    = RF_A
    if auto_compute.value:
        compute_frontier(None)
    return selected

_frontier = []; _curr_tickers = []

def compute_frontier(_):
    out_plot.clear_output(); out_weights.clear_output(); out_info.clear_output()
    global _frontier, _curr_tickers
    _frontier = []; _curr_tickers = []

    tickers = [t.strip().upper() for t in tickers_txt.value.split(",") if t.strip()]
    tickers = list(dict.fromkeys(tickers))
    if len(tickers)<2:
        with out_info: print("Choose at least 2 tickers."); return

    START = start_mkv.value.strip()
    END   = end_mkv.value.strip()
    RF_A  = float(rf_mkv.value); rf_d = ann_to_daily(RF_A)
    UB    = float(ubound.value)

    try:
        px = (yf.download(tickers, start=START, end=END, auto_adjust=True, progress=False)["Close"]
                .dropna(axis=1, how="any"))
    except Exception as e:
        with out_info: print("Error downloading data:", e); return

    if px.shape[1]<2:
        with out_info: print("Too few assets with complete data."); return

    ret = np.log(px/px.shift(1)).dropna()
    mu_d = ret.mean().values
    lw = LedoitWolf().fit(ret.values)
    cov_d = lw.covariance_
    _curr_tickers = list(px.columns)

    pts = []
    if obj_dd.value == "Frontera (grid)":
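        # Start the grid at the minimum-variance return: lower targets fall on the
        # inefficient branch, which would make the volatility-sorted curve zigzag.
        mu_min = float(mu_d @ min_variance_weights(cov_d, ub=UB))
        mu_max = float(np.max(mu_d))
        targets = np.linspace(mu_min+1e-6, mu_max-1e-6, int(grid_slider.value))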
        for tau in targets:
            w = min_var_given_target(mu_d, cov_d, tau, ub=UB)
            if w is None: 
                continue
            mu_ann = float((mu_d @ w) * 252)
            vol_ann = float(np.sqrt(w @ cov_d @ w) * np.sqrt(252))
            pts.append({"ret_ann": mu_ann, "vol_ann": vol_ann, "w": w})
        w_mv = min_variance_weights(cov_d, ub=UB)
        pts.append({"ret_ann": float((mu_d @ w_mv)*252),
                    "vol_ann": float(np.sqrt(w_mv @ cov_d @ w_mv)*np.sqrt(252)),
                    "w": w_mv, "tag":"MinVar"})
        w_tg = tangency_weights(mu_d, cov_d, rf_d, ub=UB)
        pts.append({"ret_ann": float((mu_d @ w_tg)*252),
                    "vol_ann": float(np.sqrt(w_tg @ cov_d @ w_tg)*np.sqrt(252)),
                    "w": w_tg, "tag":"Tangency"})
    elif obj_dd.value == "Mín-Var":
        w_mv = min_variance_weights(cov_d, ub=UB)
        pts = [{"ret_ann": float((mu_d @ w_mv)*252),
                "vol_ann": float(np.sqrt(w_mv @ cov_d @ w_mv)*np.sqrt(252)),
                "w": w_mv, "tag":"MinVar"}]
    else:  
        w_tg = tangency_weights(mu_d, cov_d, rf_d, ub=UB)
        pts = [{"ret_ann": float((mu_d @ w_tg)*252),
                "vol_ann": float(np.sqrt(w_tg @ cov_d @ w_tg)*np.sqrt(252)),
                "w": w_tg, "tag":"Tangency"}]

    if len(pts)==0:
        with out_info: print("Could not build the frontier."); return

    _frontier = sorted(pts, key=lambda d: d["vol_ann"])
    sel_idx_slider.max = max(0, len(_frontier)-1)
    sel_idx_slider.value = int(np.argmin([p["vol_ann"] for p in _frontier]))

    with out_plot:
        fig, ax = plt.subplots(figsize=(7,5))
        xs = [p["vol_ann"] for p in _frontier]
        ys = [p["ret_ann"] for p in _frontier]
        ax.plot(xs, ys, marker='o', linestyle='-')
        # Highlight the minimum-variance and tangency portfolios when present.
        for tag in ("MinVar", "Tangency"):
            tagged = [p for p in _frontier if p.get("tag") == tag]
            if tagged:
                ax.scatter([tagged[0]["vol_ann"]], [tagged[0]["ret_ann"]], s=80)
                ax.text(tagged[0]["vol_ann"], tagged[0]["ret_ann"], f"  {tag}", va='bottom')
        ax.set_xlabel("Annual volatility"); ax.set_ylabel("Annual return")
        ax.set_title("Efficient frontier (long-only)")
        plt.show()

    update_selection(None)

def update_selection(change):
    out_weights.clear_output(); out_info.clear_output()
    if not _frontier: return
    i = int(sel_idx_slider.value)
    i = max(0, min(i, len(_frontier)-1))
    w = _frontier[i]["w"]; vol = _frontier[i]["vol_ann"]; ret_ = _frontier[i]["ret_ann"]
    with out_weights:
        dfw = pd.DataFrame({"Ticker": _curr_tickers, "Weight": np.round(w, 4)})
        display(dfw)
        fig, ax = plt.subplots(figsize=(7,3))
        ax.bar(np.arange(len(w)), w)
        ax.set_xticks(np.arange(len(w))); ax.set_xticklabels(_curr_tickers, rotation=45, ha='right')
        ax.set_ylabel("Weight"); ax.set_title("Weights of the selected point")
        plt.tight_layout(); plt.show()
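    with out_info:
        # Sharpe uses the excess return over the annual risk-free rate set in the UI.
        sharpe = (ret_ - float(rf_mkv.value))/max(vol, 1e-12)
        print(f"Index: {i} | Annual ret: {ret_:.4f} | Annual vol: {vol:.4f} | Sharpe≈ {sharpe:.4f}")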

reselect_btn.on_click(lambda _: reseleccionar())
compute_btn.on_click(compute_frontier)
sel_idx_slider.observe(update_selection, names="value")

ui_select = VBox([
    HTML("<h3>Selection (Sharpe→ElasticNet + cluster diversification)</h3>"),
    HBox([start_sel, end_sel, rf_sel, min_adv_sel]),
    HBox([max_n_sel, min_n_sel, k_sel, cap_sel]),
    HBox([reselect_btn, auto_compute]),
    out_head,
    out_coef,
    out_select
])

ui_markowitz = VBox([
    HTML("<h3>Allocation (Markowitz, Ledoit–Wolf, Long-Only)</h3>"),
    HBox([tickers_txt]),
    HBox([start_mkv, end_mkv, rf_mkv, grid_slider]),
    HBox([obj_dd, ubound, compute_btn, sel_idx_slider]),
    out_plot,
    out_weights,
    out_info
])

display(VBox([ui_select, ui_markowitz]))

_ = reseleccionar()
