<div style="display: table; width: 100%;">
  <div style="display: table-cell; text-align: center; vertical-align: middle; width: 70%;">
    <h2>Maestría en Ciencia de Datos y Máquinas de Aprendizaje</h2>
    <h1>Inteligencia Artificial: Data Mining I - Tarea 2</h1>
  </div>
  <div style="display: table-cell; text-align: center; vertical-align: middle; width: 30%;">
    <img src="https://github.com/UIDE-Tareas/4-Inteligencia-Artificial-Data-Mining-I-Tarea3/blob/main/Assets/UideLogo.png?raw=true" alt="Texto alternativo" style="width:50%;">
  </div>
</div>
<hr />

#  Clustering
##  Breast Cancer Wisconsin Diagnostic Dataset

### 🟦 **Autores - Estudiantes - Grupo 7:**

&nbsp;&nbsp;&nbsp;&nbsp;💻 Luis Miguel Ramírez      
&nbsp;&nbsp;&nbsp;&nbsp;💻 Aviles Paute José     
&nbsp;&nbsp;&nbsp;&nbsp;💻 Espinoza Bone José    

### 🟦 Código fuente
[https://github.com/UIDE-Tareas/4-Inteligencia-Artificial-Data-Mining-I-Tarea3](https://github.com/UIDE-Tareas/4-Inteligencia-Artificial-Data-Mining-I-Tarea3)


**Fecha:** 11 de octubre de 2025

### 🟦 Introducción 
Breast Cancer Wisconsin Diagnostic Dataset es un conjunto de datos ampliamente utilizado en proyectos
de ciencia de datos y aprendizaje automático, especialmente para prácticas de clasificación y agrupamiento.

Fue recopilado por el Dr. William H. Wolberg en la Universidad de Wisconsin, y contiene información sobre
características de células tumorales obtenidas a partir de imágenes digitales de biopsias de tejido mamario.

El objetivo de estos datos es analizar las características físicas de los núcleos celulares para ayudar
a distinguir entre tumores malignos y benignos.

### 🟦 Objetivo General    
Aplicar técnicas de regresión, clasificación y redes neuronales para explorar y analizar 
patrones en los datos del cáncer de mama, con el fin de: 
1. Identificar relaciones entre variables mediante regresión lineal y logística. 
2. Evaluar el desempeño de un modelo de red neuronal para clasificación. 
3. Comparar resultados y probabilidades predichas por cada modelo. 

### 🟦 Objetivos Específicos      
- Realizar análisis exploratorio del dataset. 
- Implementar regresión lineal multivariable para predecir características continuas. 
- Implementar regresión logística para clasificar tumores malignos y benignos. 
- Diseñar y entrenar un MLP en PyTorch para clasificación binaria. 
- Visualizar resultados y comparar desempeño entre métodos. 



## 0️⃣ Preparar entorno (Funciones utilitarias, instalar libs)

In [None]:
import sys
import subprocess
import os
from pathlib import Path
from enum import Enum
import zipfile
from typing import Optional, Iterable
from dataclasses import dataclass
from typing import cast
from typing import Tuple
from types import SimpleNamespace

# Libs a instalar
LIBS = [
    "matplotlib",
    "numpy",
    "pandas",
    "seaborn",
    "scikit-learn",
    "requests",
    "wcwidth",
]

class ConsoleColor(Enum):
    RED = "\033[91m"
    GREEN = "\033[92m"
    YELLOW = "\033[93m"
    BLUE = "\033[94m"
    MAGENTA = "\033[95m"
    CYAN = "\033[96m"
    WHITE = "\033[97m"
    RESET = "\033[0m"


def PrintColor(message: str, color: ConsoleColor) -> str:
    RESET = ConsoleColor.RESET.value
    return f"{color.value}{message}{RESET}"


def ShowMessage(
    message: str, title: str, icon: str, color: ConsoleColor, end: str = "\n"
):
    colored_title = PrintColor(icon + f"  " + title.upper() + ":", color)
    print(f"{colored_title} {message}", end=end)


def ShowInfoMessage(
    message: str, title: str = "Info", icon: str = "ℹ️", end: str = "\n"
):
    ShowMessage(message, title, icon, ConsoleColor.CYAN, end)


def ShowSuccessMessage(
    message: str, title: str = "Success", icon: str = "✅", end: str = "\n"
):
    ShowMessage(message, title, icon, ConsoleColor.GREEN, end)


def ShowErrorMessage(
    message: str, title: str = "Error", icon: str = "❌", end: str = "\n"
):
    ShowMessage(message, title, icon, ConsoleColor.RED, end)


def ShowWarningMessage(
    message: str, title: str = "Warning", icon: str = "⚠️", end: str = "\n"
):
    ShowMessage(message, title, icon, ConsoleColor.YELLOW, end)


# Funcion para ejecutar comandos
def RunCommand(
    commandList: list[str], printCommand: bool = True, printError: bool = True
) -> subprocess.CompletedProcess[str]:
    print("⏳", " ".join(commandList))

    if printCommand:
        proc = subprocess.Popen(
            commandList,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            bufsize=1,
            universal_newlines=True,
        )

        out_lines: list[str] = []
        assert proc.stdout is not None
        for line in proc.stdout:
            print(line, end="")
            out_lines.append(line)

        proc.wait()
        err_text = ""
        if proc.stderr is not None:
            err_text = proc.stderr.read() or ""

        if proc.returncode != 0 and printError and err_text:
            ShowErrorMessage(err_text, "", end="")
            # print(err_text, end="")

        return subprocess.CompletedProcess(
            args=commandList,
            returncode=proc.returncode,
            stdout="".join(out_lines),
            stderr=err_text,
        )

    else:
        result = subprocess.run(
            commandList, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        if result.returncode != 0 and printError and result.stderr:
            ShowErrorMessage(result.stderr, "", end="")
            # print(result.stderr, end="")
        return result


# Función para instalar las dependencias
def InstallDeps(libs: Optional[list[str]] = None):
    print("ℹ️ Installing deps.")
    printCommand = False
    printError = True
    RunCommand(
        [sys.executable, "-m", "pip", "install", "--upgrade", "pip"],
        printCommand=printCommand,
        printError=printError,
    )
    if libs is None or libs.count == 0:
        print("No hay elementos a instalar.")
    else:
        RunCommand(
            [sys.executable, "-m", "pip", "install", *libs],
            printCommand=printCommand,
            printError=printError,
        )
        print("Deps installed.")
    print()


# Función para mostrar info el ambiente de ejecución
def ShowEnvironmentInfo():
    print("ℹ️  Environment Info:")
    print("Python Version:", sys.version)
    print("Platform:", sys.platform)
    print("Executable Path:", sys.executable)
    print("Current Working Directory:", os.getcwd())
    print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV"))
    print("sys.prefix:", sys.prefix)
    print("sys.base_prefix:", sys.base_prefix)
    print()


InstallDeps(LIBS)
ShowEnvironmentInfo()
import requests


@dataclass(frozen=True)
class BoxStyle:
    TL: str
    TR: str
    BL: str
    BR: str
    H: str
    V: str

class TitleBoxLineStyle(Enum):
    SIMPLE = BoxStyle("┌", "┐", "└", "┘", "─", "│")
    DOUBLE = BoxStyle("╔", "╗", "╚", "╝", "═", "║")
    ROUNDED = BoxStyle("╭", "╮", "╰", "╯", "─", "│")
    HEAVY = BoxStyle("┏", "┓", "┗", "┛", "━", "┃")
    ASCII = BoxStyle("+", "+", "+", "+", "-", "|")
    DOUBLE_BOLD = BoxStyle("╔", "╗", "╚", "╝", "╬", "║")
    BLOCK = BoxStyle("█", "█", "█", "█", "█", "█")
    HEAVY_CROSS = BoxStyle("╒", "╕", "╘", "╛", "╪", "┃")
    METAL = BoxStyle("╞", "╡", "╘", "╛", "═", "║")


# Función para mostrar un título con recuadro
def ShowTitleBox(
    text: str,
    max_len: int = 100,
    boxLineStyle: TitleBoxLineStyle = TitleBoxLineStyle.SIMPLE,
    color: ConsoleColor = ConsoleColor.CYAN,
):
    try:

        def vislen(s: str) -> int:
            from wcwidth import wcswidth as _w

            n = _w(s)
            return n if n >= 0 else len(s)

    except Exception:

        def vislen(s: str) -> int:
            return len(s)

    pad = 1
    tlen = vislen(text)
    inner = max(max_len, tlen)
    left = (inner - tlen) // 2
    right = inner - tlen - left

    top = f"{boxLineStyle.value.TL}{boxLineStyle.value.H * (inner + 2 * pad)}{boxLineStyle.value.TR}"
    mid = f"{boxLineStyle.value.V}{' ' * pad}{' ' * left}{text}{' ' * right}{' ' * pad}{boxLineStyle.value.V}"
    bot = f"{boxLineStyle.value.BL}{boxLineStyle.value.H * (inner + 2 * pad)}{boxLineStyle.value.BR}"
    print(PrintColor("\n".join([top, mid, bot]), color))


# Función para descargar un archivo
def DownloadFile(uri: str, filename: str, overwrite: bool = False, timeout: int = 20):
    dest = Path(filename).resolve()
    if dest.exists() and dest.is_file() and dest.stat().st_size > 0 and not overwrite:
        print(
            f'✅ Ya existe: "{dest}". No se descarga (use overwrite=True para forzar).'
        )
        return
    if dest.parent and not dest.parent.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
    print(f'ℹ️ Descargando "{uri}" → "{dest}"')
    try:
        with requests.get(uri, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()
            tmp = dest.with_suffix(dest.suffix + ".part")
            with open(tmp, "wb") as f:
                for chunk in resp.iter_content(chunk_size=1024 * 64):
                    if chunk:  # filtra keep-alive chunks
                        f.write(chunk)
            tmp.replace(dest)
        print(f'✅ Archivo "{dest}" descargado exitosamente.')
    except requests.exceptions.RequestException as e:
        print(f"❌ Error al descargar: {e}")


# Función para descomprimir un archivo zip
def UnzipFile(filename: str, outputDir: str):
    print(f'ℹ️ Descomprimiendo "{filename}" en "{outputDir}"')
    try:
        with zipfile.ZipFile(filename, "r") as zip_ref:
            zip_ref.extractall(outputDir)
        print(f"Descomprimido en: {os.path.abspath(outputDir)}")
    except Exception as e:
        print(f"Error: {e}")



In [25]:
# Importar librerías
import pandas as pd
import pandas
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.utils import Bunch
from sklearn.datasets import load_breast_cancer

from matplotlib.figure import Figure
from matplotlib.axes import Axes

warnings.filterwarnings("ignore")

# Configurar opciones de Pandas
pd.set_option("display.float_format", "{:.2f}".format)
pandas.set_option("display.max_rows", None)
pandas.set_option("display.max_columns", None) 

# Función para mostrar la información del DataFrame.
def ShowDfInfo(df: pandas.DataFrame, title):
    display(f"ℹ️ INFO {title} ℹ️")
    df.info()
    display()


# Función para mostrar las n primeras filas del DataFrame.
def ShowDfHead(df: pandas.DataFrame, title: str, headQty=10):
    display(f"ℹ️ {title}: Primeros {headQty} elementos.")
    display(df.head(headQty))
    display()


# Función para mostrar las n últimas filas del DataFrame.
def ShowDfTail(df: pandas.DataFrame, title: str, tailQty=10):
    display(f"ℹ️ {title}: Últimos {tailQty} elementos.")
    display(df.tail(tailQty))
    display()


# Mostrar el tamaño del DataFrame
def ShowDfShape(df: pandas.DataFrame, title: str):
    display(f"ℹ️ {title} - Tamaño de los datos")
    display(f"{df.shape[0]} filas x {df.shape[1]} columnas")
    display()


# Función para mostrar la estadística descriptiva de todas las columnas del DataFrame, por tipo de dato.
def ShowDfStats(df: pandas.DataFrame, title: str = ""):
    display(f"ℹ️ Estadística descriptiva - {title}")
    numeric_cols = df.select_dtypes(include="number")
    if not numeric_cols.empty:
        display("    🔢 Columnas numéricas".upper())
        numeric_desc = (
            numeric_cols.describe().round(2).T
        )  # Transpuesta para añadir columna
        numeric_desc["var"] = numeric_cols.var(numeric_only=True).round(2)
        display(numeric_desc.T)
    non_numeric_cols = df.select_dtypes(
        include=["boolean", "string", "category", "object"]
    )
    if not non_numeric_cols.empty:
        display("    🔡 Columnas no numéricas".upper())
        non_numeric_desc = non_numeric_cols.describe()
        display(non_numeric_desc)
    datetime_cols = df.select_dtypes(include=["datetime", "datetimetz"])
    if not datetime_cols.empty:
        display("    📅 Columnas fechas".upper())
        datetime_desc = datetime_cols.describe()
        display(datetime_desc)


# Función para mostrar los valores nulos o NaN de cada columna en un DataFrame
def ShowDfNanValues(df: pandas.DataFrame, title: str):
    display(f"ℹ️ Contador de valores Nulos - {title}")
    nulls_count = df.isnull().sum()
    nulls_df = nulls_count.reset_index()
    nulls_df.columns = ["Columna", "Cantidad_Nulos"]
    display(nulls_df)
    display()


# Tipos de correlación
class CorrelationType(Enum):
    ALL = "all"
    STRONG = "strong"
    WEAK = "weak"


# Muestra las correlaciones completas, débiles y fuertes.
def ShowDfCorrelation(
    df: pandas.DataFrame,
    title: str,
    fig: Figure,
    ax: Axes,
    level: CorrelationType = CorrelationType.ALL,
    umbral: float = 0.6,  # |r| >= umbral => fuerte; |r| <= umbral => débil
    showTable: bool = False,
    annotate: bool = True,
):
    display(f"ℹ️ {(title).upper()} - Matriz de Correlación, Type: {level.name}")
    corr = df.select_dtypes(include=["number"]).corr().copy()
    if level == CorrelationType.STRONG:
        corr = corr.where(np.abs(corr) >= umbral)
    elif level == CorrelationType.WEAK:
        corr = corr.where(np.abs(corr) <= umbral)
        np.fill_diagonal(corr.values, 1)
    elif level != CorrelationType.ALL:
        raise ValueError(f"Invalid level: {level}")
    cax = ax.matshow(corr, vmin=-1, vmax=1)

    cols = corr.columns
    ax.set_xticks(range(len(cols)))
    ax.set_yticks(range(len(cols)))
    ax.set_xticklabels(cols, rotation=90, ha="left")
    ax.set_yticklabels(cols)

    fig.colorbar(cax)

    if annotate:
        for (i, j), value in np.ndenumerate(corr.values):
            if not np.isnan(value):
                ax.text(j, i, f"{value:+.2f}", ha="center", va="center")

    if level == CorrelationType.ALL:
        titulo = "Matriz de correlación completa"
    else:
        titulo = f"Matriz de correlación ({level.name}, umbral={umbral})"

    total_elementos = corr.size
    total_nodiagonal = corr.size - corr.shape[0]
    total_nan = corr.isna().sum().sum()
    total_validos = total_elementos - total_nan - corr.shape[0]
    titulo = f"{titulo}, Total Matriz: {total_nodiagonal}, Total válidos: {total_validos}({((total_validos*100)/total_nodiagonal):.2f}%)"

    ax.set_title(titulo, pad=20)
    ax.grid(False)
    plt.tight_layout()
    plt.show()
    if showTable:
        display(corr)
    return corr


from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from pandas import DataFrame
from pandas import Series

# Para almacenar los datos del dataset
@dataclass
class Dataset:
    X: DataFrame
    y: DataFrame

# Carga el dataset
def LoadDataset() -> Dataset:
    bc = cast(Bunch, load_breast_cancer(as_frame=True))
    df: DataFrame = bc.frame.copy()
    TARGET_NAME = "target"
    X = df.drop(columns=[TARGET_NAME])
    y = df[[TARGET_NAME]]
    y.columns = ["Diagnosis"]
    return Dataset(X, y)

# Para almacenar los datos de split del dataset.
@dataclass
class DatasetSplit:
    Train: Dataset
    Test: Dataset

# Realiza el split del Dataset, en Train y test utilizando el ratio.
def SplitDataset(
    data: Dataset, trainRatio: float = 0.8, randomState: int = 42
) -> DatasetSplit:
    y_strat = data.y.iloc[:, 0]
    XTrain, XTest, yTrain, yTest = train_test_split(
        data.X,
        data.y,
        train_size=trainRatio,
        random_state=randomState,
        stratify=y_strat,
    )
    return DatasetSplit(
        Train=Dataset(X=XTrain.reset_index(drop=True), y=yTrain.reset_index(drop=True)),
        Test=Dataset(X=XTest.reset_index(drop=True), y=yTest.reset_index(drop=True)),
    )

# Para almacenar los datos del dataset aplicado el escalador.
class ScaledDatasetSplit(DatasetSplit):
    pass

# Escala el split usando StandardScaler y retorna el split escalado.
def ScaleDatasetSplit(
    split: DatasetSplit, withMean: bool = True, withStd: bool = True
) -> ScaledDatasetSplit:
    scaler = StandardScaler(with_mean=withMean, with_std=withStd)
    XTrain = scaler.fit_transform(split.Train.X)
    XTest = scaler.transform(split.Test.X)
    XTrainScaled = split.Train.X.copy()
    XTestScaled = split.Test.X.copy()
    XTrainScaled.loc[:, :] = XTrain
    XTestScaled.loc[:, :] = XTest
    TrainScaled = Dataset(X=XTrainScaled, y=split.Train.y.copy())
    TestScaled = Dataset(X=XTestScaled, y=split.Test.y.copy())
    return ScaledDatasetSplit(Train=TrainScaled, Test=TestScaled)

# Muestra el head de cada componente del split.
def ShowDatasetSplitHead(split: DatasetSplit, title: str, headQty: int = 5):
    ShowDfHead(split.Train.X, f"{title} - X Train", headQty)
    ShowDfHead(split.Train.y, f"{title} - y Train", headQty)
    ShowDfHead(split.Test.X, f"{title} - X Test", headQty)
    ShowDfHead(split.Test.y, f"{title} - y Test", headQty)

# Almacena los datos del split aplicado PCA.
class PcaDatasetSplit(DatasetSplit):
    pass

# Aplica PCA al split escalado y retorna el split con PCA aplicado.
def ApplyPCA(
    scaledSplit: ScaledDatasetSplit,
    explainedVarianceRatioSum: float = 0.95,
    randomState: int = 42,
) -> PcaDatasetSplit:
    def GetPCNames(n: int) -> list[str]:
        pcs: list[str] = []
        for i in range(1, n + 1):
            pcs.append(f"PC{i}")
        return pcs

    pca = PCA(n_components=explainedVarianceRatioSum, random_state=randomState)
    XTrainPCA = pca.fit_transform(scaledSplit.Train.X)
    XTestPCA = pca.transform(scaledSplit.Test.X)
    XTrainPcaDf = pandas.DataFrame(XTrainPCA, columns=GetPCNames(XTrainPCA.shape[1]), index=scaledSplit.Train.X.index)
    XTestPcaDf = pandas.DataFrame(XTestPCA, columns=GetPCNames(XTestPCA.shape[1]), index=scaledSplit.Test.X.index)
    return PcaDatasetSplit(Dataset(X= XTrainPcaDf, y = scaledSplit.Train.y),
                           Dataset(X= XTestPcaDf, y = scaledSplit.Test.y))

# Muestra la información del Dataset
def ShowDatasetInfo(data: Dataset, title):
    tAux = title
    title = f"{tAux} - Caracteristicas - X"
    ShowDfInfo(data.X, title)
    ShowDfShape(data.X, title)
    ShowDfStats(data.X, title)
    ShowDfNanValues(data.X, title)
    ShowDfHead(data.X, title)
    ShowDfTail(data.X, title)
    title = f"{tAux} - Características - y"
    ShowDfInfo(data.y, title)
    ShowDfShape(data.y, title)
    ShowDfStats(data.y, title)
    ShowDfNanValues(data.y, title)
    ShowDfHead(data.y, title)
    ShowDfTail(data.y, title)






## 1️⃣ Exploración inicial del dataset

In [26]:

ShowTitleBox("EXPLORACIÓN INICIAL", color=ConsoleColor.MAGENTA, boxLineStyle= TitleBoxLineStyle.DOUBLE)
# ---

ShowTitleBox("CARGANDO EL DATASET", color=ConsoleColor.CYAN, boxLineStyle= TitleBoxLineStyle.SIMPLE)
data: Dataset = LoadDataset()
title = "Data original"
ShowDatasetInfo(data, title)

ShowTitleBox("HACIENDO SPLIT AL DATASET", color=ConsoleColor.CYAN, boxLineStyle= TitleBoxLineStyle.SIMPLE)
split: DatasetSplit = SplitDataset(data)
title = "Split del Dataset"
ShowDatasetSplitHead(split, title)

ShowTitleBox("HACIENDO ESCALADO AL SPLIT", color=ConsoleColor.CYAN, boxLineStyle= TitleBoxLineStyle.SIMPLE)
scaled = ScaleDatasetSplit(split)
title = "Split escalado"
ShowDatasetSplitHead(split, title)


ShowTitleBox("APLICANDO PCA AL SPLIT ESCALADO", color=ConsoleColor.CYAN, boxLineStyle= TitleBoxLineStyle.SIMPLE)
pcaSplit = ApplyPCA(scaled)
title = "Split con PCA"
ShowDatasetSplitHead(pcaSplit, title)

[95m╔══════════════════════════════════════════════════════════════════════════════════════════════════════╗
║                                         EXPLORACIÓN INICIAL                                          ║
╚══════════════════════════════════════════════════════════════════════════════════════════════════════╝[0m
[96m┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                         CARGANDO EL DATASET                                          │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘[0m


'ℹ️ INFO Data original - Caracteristicas - X ℹ️'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 30 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   mean radius              569 non-null    float64
 1   mean texture             569 non-null    float64
 2   mean perimeter           569 non-null    float64
 3   mean area                569 non-null    float64
 4   mean smoothness          569 non-null    float64
 5   mean compactness         569 non-null    float64
 6   mean concavity           569 non-null    float64
 7   mean concave points      569 non-null    float64
 8   mean symmetry            569 non-null    float64
 9   mean fractal dimension   569 non-null    float64
 10  radius error             569 non-null    float64
 11  texture error            569 non-null    float64
 12  perimeter error          569 non-null    float64
 13  area error               569 non-null    float64
 14  smoothness error         5

'ℹ️ Data original - Caracteristicas - X - Tamaño de los datos'

'569 filas x 30 columnas'

'ℹ️ Estadística descriptiva - Data original - Caracteristicas - X'

'    🔢 COLUMNAS NUMÉRICAS'

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
count,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0,569.0
mean,14.13,19.29,91.97,654.89,0.1,0.1,0.09,0.05,0.18,0.06,0.41,1.22,2.87,40.34,0.01,0.03,0.03,0.01,0.02,0.0,16.27,25.68,107.26,880.58,0.13,0.25,0.27,0.11,0.29,0.08
std,3.52,4.3,24.3,351.91,0.01,0.05,0.08,0.04,0.03,0.01,0.28,0.55,2.02,45.49,0.0,0.02,0.03,0.01,0.01,0.0,4.83,6.15,33.6,569.36,0.02,0.16,0.21,0.07,0.06,0.02
min,6.98,9.71,43.79,143.5,0.05,0.02,0.0,0.0,0.11,0.05,0.11,0.36,0.76,6.8,0.0,0.0,0.0,0.0,0.01,0.0,7.93,12.02,50.41,185.2,0.07,0.03,0.0,0.0,0.16,0.06
25%,11.7,16.17,75.17,420.3,0.09,0.06,0.03,0.02,0.16,0.06,0.23,0.83,1.61,17.85,0.01,0.01,0.02,0.01,0.02,0.0,13.01,21.08,84.11,515.3,0.12,0.15,0.11,0.06,0.25,0.07
50%,13.37,18.84,86.24,551.1,0.1,0.09,0.06,0.03,0.18,0.06,0.32,1.11,2.29,24.53,0.01,0.02,0.03,0.01,0.02,0.0,14.97,25.41,97.66,686.5,0.13,0.21,0.23,0.1,0.28,0.08
75%,15.78,21.8,104.1,782.7,0.11,0.13,0.13,0.07,0.2,0.07,0.48,1.47,3.36,45.19,0.01,0.03,0.04,0.01,0.02,0.0,18.79,29.72,125.4,1084.0,0.15,0.34,0.38,0.16,0.32,0.09
max,28.11,39.28,188.5,2501.0,0.16,0.35,0.43,0.2,0.3,0.1,2.87,4.88,21.98,542.2,0.03,0.14,0.4,0.05,0.08,0.03,36.04,49.54,251.2,4254.0,0.22,1.06,1.25,0.29,0.66,0.21
var,12.42,18.5,590.44,123843.55,0.0,0.0,0.01,0.0,0.0,0.0,0.08,0.3,4.09,2069.43,0.0,0.0,0.0,0.0,0.0,0.0,23.36,37.78,1129.13,324167.39,0.0,0.02,0.04,0.0,0.0,0.0


'ℹ️ Contador de valores Nulos - Data original - Caracteristicas - X'

Unnamed: 0,Columna,Cantidad_Nulos
0,mean radius,0
1,mean texture,0
2,mean perimeter,0
3,mean area,0
4,mean smoothness,0
5,mean compactness,0
6,mean concavity,0
7,mean concave points,0
8,mean symmetry,0
9,mean fractal dimension,0


'ℹ️ Data original - Caracteristicas - X: Primeros 10 elementos.'

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.12,0.28,0.3,0.15,0.24,0.08,1.09,0.91,8.59,153.4,0.01,0.05,0.05,0.02,0.03,0.01,25.38,17.33,184.6,2019.0,0.16,0.67,0.71,0.27,0.46,0.12
1,20.57,17.77,132.9,1326.0,0.08,0.08,0.09,0.07,0.18,0.06,0.54,0.73,3.4,74.08,0.01,0.01,0.02,0.01,0.01,0.0,24.99,23.41,158.8,1956.0,0.12,0.19,0.24,0.19,0.28,0.09
2,19.69,21.25,130.0,1203.0,0.11,0.16,0.2,0.13,0.21,0.06,0.75,0.79,4.58,94.03,0.01,0.04,0.04,0.02,0.02,0.0,23.57,25.53,152.5,1709.0,0.14,0.42,0.45,0.24,0.36,0.09
3,11.42,20.38,77.58,386.1,0.14,0.28,0.24,0.11,0.26,0.1,0.5,1.16,3.44,27.23,0.01,0.07,0.06,0.02,0.06,0.01,14.91,26.5,98.87,567.7,0.21,0.87,0.69,0.26,0.66,0.17
4,20.29,14.34,135.1,1297.0,0.1,0.13,0.2,0.1,0.18,0.06,0.76,0.78,5.44,94.44,0.01,0.02,0.06,0.02,0.02,0.01,22.54,16.67,152.2,1575.0,0.14,0.2,0.4,0.16,0.24,0.08
5,12.45,15.7,82.57,477.1,0.13,0.17,0.16,0.08,0.21,0.08,0.33,0.89,2.22,27.19,0.01,0.03,0.04,0.01,0.02,0.01,15.47,23.75,103.4,741.6,0.18,0.52,0.54,0.17,0.4,0.12
6,18.25,19.98,119.6,1040.0,0.09,0.11,0.11,0.07,0.18,0.06,0.45,0.77,3.18,53.91,0.0,0.01,0.02,0.01,0.01,0.0,22.88,27.66,153.2,1606.0,0.14,0.26,0.38,0.19,0.31,0.08
7,13.71,20.83,90.2,577.9,0.12,0.16,0.09,0.06,0.22,0.07,0.58,1.38,3.86,50.96,0.01,0.03,0.02,0.01,0.01,0.01,17.06,28.14,110.6,897.0,0.17,0.37,0.27,0.16,0.32,0.12
8,13.0,21.82,87.5,519.8,0.13,0.19,0.19,0.09,0.23,0.07,0.31,1.0,2.41,24.32,0.01,0.04,0.04,0.01,0.02,0.0,15.49,30.73,106.2,739.3,0.17,0.54,0.54,0.21,0.44,0.11
9,12.46,24.04,83.97,475.9,0.12,0.24,0.23,0.09,0.2,0.08,0.3,1.6,2.04,23.94,0.01,0.07,0.08,0.01,0.02,0.01,15.09,40.68,97.65,711.4,0.19,1.06,1.1,0.22,0.44,0.21


'ℹ️ Data original - Caracteristicas - X: Últimos 10 elementos.'

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
559,11.51,23.93,74.52,403.5,0.09,0.1,0.11,0.04,0.14,0.07,0.24,2.9,1.94,16.97,0.01,0.03,0.06,0.01,0.01,0.0,12.48,37.16,82.28,474.2,0.13,0.25,0.36,0.1,0.21,0.09
560,14.05,27.15,91.38,600.4,0.1,0.11,0.04,0.04,0.15,0.06,0.36,1.49,2.89,29.84,0.01,0.03,0.02,0.02,0.02,0.01,15.3,33.17,100.2,706.7,0.12,0.23,0.13,0.1,0.23,0.08
561,11.2,29.37,70.67,386.0,0.07,0.04,0.0,0.0,0.11,0.06,0.31,3.9,2.04,22.81,0.01,0.01,0.0,0.0,0.02,0.0,11.92,38.3,75.19,439.6,0.09,0.05,0.0,0.0,0.16,0.06
562,15.22,30.62,103.4,716.9,0.1,0.21,0.26,0.09,0.21,0.07,0.26,1.21,2.36,22.65,0.0,0.05,0.07,0.02,0.02,0.01,17.52,42.79,128.7,915.0,0.14,0.79,1.17,0.24,0.41,0.14
563,20.92,25.09,143.0,1347.0,0.11,0.22,0.32,0.15,0.21,0.07,0.96,1.03,8.76,118.8,0.01,0.04,0.08,0.03,0.02,0.01,24.29,29.41,179.1,1819.0,0.14,0.42,0.66,0.25,0.29,0.1
564,21.56,22.39,142.0,1479.0,0.11,0.12,0.24,0.14,0.17,0.06,1.18,1.26,7.67,158.7,0.01,0.03,0.05,0.02,0.01,0.0,25.45,26.4,166.1,2027.0,0.14,0.21,0.41,0.22,0.21,0.07
565,20.13,28.25,131.2,1261.0,0.1,0.1,0.14,0.1,0.18,0.06,0.77,2.46,5.2,99.04,0.01,0.02,0.04,0.02,0.02,0.0,23.69,38.25,155.0,1731.0,0.12,0.19,0.32,0.16,0.26,0.07
566,16.6,28.08,108.3,858.1,0.08,0.1,0.09,0.05,0.16,0.06,0.46,1.07,3.42,48.55,0.01,0.04,0.05,0.02,0.01,0.0,18.98,34.12,126.7,1124.0,0.11,0.31,0.34,0.14,0.22,0.08
567,20.6,29.33,140.1,1265.0,0.12,0.28,0.35,0.15,0.24,0.07,0.73,1.59,5.77,86.22,0.01,0.06,0.07,0.02,0.02,0.01,25.74,39.42,184.6,1821.0,0.17,0.87,0.94,0.27,0.41,0.12
568,7.76,24.54,47.92,181.0,0.05,0.04,0.0,0.0,0.16,0.06,0.39,1.43,2.55,19.15,0.01,0.0,0.0,0.0,0.03,0.0,9.46,30.37,59.16,268.6,0.09,0.06,0.0,0.0,0.29,0.07


'ℹ️ INFO Data original - Características - y ℹ️'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 1 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Diagnosis  569 non-null    int64
dtypes: int64(1)
memory usage: 4.6 KB


'ℹ️ Data original - Características - y - Tamaño de los datos'

'569 filas x 1 columnas'

'ℹ️ Estadística descriptiva - Data original - Características - y'

'    🔢 COLUMNAS NUMÉRICAS'

Unnamed: 0,Diagnosis
count,569.0
mean,0.63
std,0.48
min,0.0
25%,0.0
50%,1.0
75%,1.0
max,1.0
var,0.23


'ℹ️ Contador de valores Nulos - Data original - Características - y'

Unnamed: 0,Columna,Cantidad_Nulos
0,Diagnosis,0


'ℹ️ Data original - Características - y: Primeros 10 elementos.'

Unnamed: 0,Diagnosis
0,0
1,0
2,0
3,0
4,0
5,0
6,0
7,0
8,0
9,0


'ℹ️ Data original - Características - y: Últimos 10 elementos.'

Unnamed: 0,Diagnosis
559,1
560,1
561,1
562,0
563,0
564,0
565,0
566,0
567,0
568,1


[96m┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                      HACIENDO SPLIT AL DATASET                                       │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘[0m


'ℹ️ Split del Dataset - X Train: Primeros 5 elementos.'

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,10.32,16.35,65.31,324.9,0.09,0.05,0.01,0.01,0.19,0.06,0.21,0.97,1.36,12.97,0.01,0.01,0.01,0.01,0.02,0.0,11.25,21.77,71.12,384.9,0.13,0.09,0.04,0.02,0.27,0.07
1,20.18,19.54,133.8,1250.0,0.11,0.15,0.21,0.13,0.17,0.06,0.43,1.0,3.01,52.49,0.01,0.03,0.06,0.02,0.02,0.0,22.03,25.07,146.0,1479.0,0.17,0.29,0.53,0.22,0.3,0.08
2,10.66,15.15,67.49,349.6,0.09,0.04,0.0,0.0,0.19,0.06,0.33,1.93,2.15,21.98,0.01,0.01,0.0,0.0,0.03,0.0,11.54,19.2,73.2,408.3,0.11,0.07,0.0,0.0,0.27,0.06
3,13.56,13.9,88.59,561.3,0.11,0.12,0.08,0.04,0.2,0.06,0.26,0.5,2.01,21.03,0.01,0.02,0.03,0.01,0.02,0.0,14.98,17.13,101.1,686.6,0.14,0.27,0.26,0.09,0.31,0.08
4,11.37,18.89,72.17,396.0,0.09,0.05,0.02,0.02,0.2,0.06,0.27,1.97,1.95,17.49,0.01,0.01,0.01,0.01,0.03,0.0,12.36,26.14,79.29,459.3,0.11,0.1,0.08,0.06,0.33,0.07


'ℹ️ Split del Dataset - y Train: Primeros 5 elementos.'

Unnamed: 0,Diagnosis
0,1
1,0
2,1
3,1
4,1


'ℹ️ Split del Dataset - X Test: Primeros 5 elementos.'

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,19.55,28.77,133.6,1207.0,0.09,0.21,0.18,0.11,0.19,0.06,0.84,1.2,7.16,106.4,0.01,0.05,0.04,0.02,0.02,0.01,25.05,36.27,178.6,1926.0,0.13,0.53,0.43,0.19,0.28,0.1
1,11.13,16.62,70.47,381.1,0.08,0.04,0.01,0.01,0.15,0.06,0.14,0.97,0.97,9.7,0.01,0.01,0.01,0.01,0.02,0.0,11.68,20.29,74.35,421.1,0.1,0.06,0.05,0.04,0.24,0.07
2,13.82,24.49,92.33,595.9,0.12,0.17,0.14,0.07,0.23,0.07,0.48,1.53,2.97,39.05,0.01,0.04,0.03,0.02,0.02,0.01,16.01,32.94,106.0,788.0,0.18,0.4,0.34,0.15,0.37,0.12
3,16.5,18.29,106.6,838.1,0.1,0.08,0.06,0.05,0.15,0.06,0.34,1.44,2.34,33.58,0.01,0.02,0.02,0.01,0.02,0.0,18.13,25.45,117.2,1009.0,0.13,0.17,0.17,0.09,0.24,0.06
4,21.56,22.39,142.0,1479.0,0.11,0.12,0.24,0.14,0.17,0.06,1.18,1.26,7.67,158.7,0.01,0.03,0.05,0.02,0.01,0.0,25.45,26.4,166.1,2027.0,0.14,0.21,0.41,0.22,0.21,0.07


'ℹ️ Split del Dataset - y Test: Primeros 5 elementos.'

Unnamed: 0,Diagnosis
0,0
1,1
2,0
3,1
4,0


[96m┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                      HACIENDO ESCALADO AL SPLIT                                      │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘[0m


'ℹ️ Split escalado - X Train: Primeros 5 elementos.'

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,10.32,16.35,65.31,324.9,0.09,0.05,0.01,0.01,0.19,0.06,0.21,0.97,1.36,12.97,0.01,0.01,0.01,0.01,0.02,0.0,11.25,21.77,71.12,384.9,0.13,0.09,0.04,0.02,0.27,0.07
1,20.18,19.54,133.8,1250.0,0.11,0.15,0.21,0.13,0.17,0.06,0.43,1.0,3.01,52.49,0.01,0.03,0.06,0.02,0.02,0.0,22.03,25.07,146.0,1479.0,0.17,0.29,0.53,0.22,0.3,0.08
2,10.66,15.15,67.49,349.6,0.09,0.04,0.0,0.0,0.19,0.06,0.33,1.93,2.15,21.98,0.01,0.01,0.0,0.0,0.03,0.0,11.54,19.2,73.2,408.3,0.11,0.07,0.0,0.0,0.27,0.06
3,13.56,13.9,88.59,561.3,0.11,0.12,0.08,0.04,0.2,0.06,0.26,0.5,2.01,21.03,0.01,0.02,0.03,0.01,0.02,0.0,14.98,17.13,101.1,686.6,0.14,0.27,0.26,0.09,0.31,0.08
4,11.37,18.89,72.17,396.0,0.09,0.05,0.02,0.02,0.2,0.06,0.27,1.97,1.95,17.49,0.01,0.01,0.01,0.01,0.03,0.0,12.36,26.14,79.29,459.3,0.11,0.1,0.08,0.06,0.33,0.07


'ℹ️ Split escalado - y Train: Primeros 5 elementos.'

Unnamed: 0,Diagnosis
0,1
1,0
2,1
3,1
4,1


'ℹ️ Split escalado - X Test: Primeros 5 elementos.'

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,radius error,texture error,perimeter error,area error,smoothness error,compactness error,concavity error,concave points error,symmetry error,fractal dimension error,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,19.55,28.77,133.6,1207.0,0.09,0.21,0.18,0.11,0.19,0.06,0.84,1.2,7.16,106.4,0.01,0.05,0.04,0.02,0.02,0.01,25.05,36.27,178.6,1926.0,0.13,0.53,0.43,0.19,0.28,0.1
1,11.13,16.62,70.47,381.1,0.08,0.04,0.01,0.01,0.15,0.06,0.14,0.97,0.97,9.7,0.01,0.01,0.01,0.01,0.02,0.0,11.68,20.29,74.35,421.1,0.1,0.06,0.05,0.04,0.24,0.07
2,13.82,24.49,92.33,595.9,0.12,0.17,0.14,0.07,0.23,0.07,0.48,1.53,2.97,39.05,0.01,0.04,0.03,0.02,0.02,0.01,16.01,32.94,106.0,788.0,0.18,0.4,0.34,0.15,0.37,0.12
3,16.5,18.29,106.6,838.1,0.1,0.08,0.06,0.05,0.15,0.06,0.34,1.44,2.34,33.58,0.01,0.02,0.02,0.01,0.02,0.0,18.13,25.45,117.2,1009.0,0.13,0.17,0.17,0.09,0.24,0.06
4,21.56,22.39,142.0,1479.0,0.11,0.12,0.24,0.14,0.17,0.06,1.18,1.26,7.67,158.7,0.01,0.03,0.05,0.02,0.01,0.0,25.45,26.4,166.1,2027.0,0.14,0.21,0.41,0.22,0.21,0.07


'ℹ️ Split escalado - y Test: Primeros 5 elementos.'

Unnamed: 0,Diagnosis
0,0
1,1
2,0
3,1
4,0


[96m┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                   APLICANDO PCA AL SPLIT ESCALADO                                    │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘[0m


'ℹ️ Split con PCA - X Train: Primeros 5 elementos.'

Unnamed: 0,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10
0,-4.17,0.26,-0.35,-0.65,-0.64,-0.11,0.19,-0.26,-0.48,0.7
1,4.6,-0.97,-0.56,-0.88,0.03,-1.12,-2.12,0.32,0.79,-0.28
2,-4.56,-0.17,1.51,-0.71,-1.87,1.24,0.23,-0.12,0.74,-0.31
3,-0.88,0.44,-1.45,-1.96,-0.01,0.26,-0.09,-0.18,0.09,0.25
4,-2.95,0.37,1.18,0.57,-1.34,1.91,-0.51,-0.39,0.24,-0.05


'ℹ️ Split con PCA - y Train: Primeros 5 elementos.'

Unnamed: 0,Diagnosis
0,1
1,0
2,1
3,1
4,1


'ℹ️ Split con PCA - X Test: Primeros 5 elementos.'

Unnamed: 0,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10
0,6.42,-2.04,0.63,1.72,0.41,-0.1,1.46,0.52,0.23,0.37
1,-4.6,-0.69,0.16,-0.48,0.58,0.33,0.2,-0.01,0.16,-0.07
2,2.82,3.22,-0.75,1.24,-1.67,-0.64,0.22,0.11,-0.05,0.92
3,-0.73,-2.13,0.29,-0.11,-0.02,-0.88,-0.77,-0.06,0.79,-0.34
4,6.52,-3.78,2.51,-1.17,0.09,-2.39,-0.46,-0.42,-1.0,0.34


'ℹ️ Split con PCA - y Test: Primeros 5 elementos.'

Unnamed: 0,Diagnosis
0,0
1,1
2,0
3,1
4,0
