### Model Configuration Summary

The following table summarizes the preprocessing strategies applied to each model. All models include dataset balancing, while the use of encoding, scaling, and PCA varies according to the theoretical and practical requirements of each approach.

| Model               | Transformations                  | Scaling (Standard / MinMax)  | PCA | Balancing | 
|---------------------|----------------------------------|------------------------------|-----|-----------|
| Neural Network      | Yes                              | Yes                          | Yes | Yes       |
| Neural Network      | Yes                              | Yes                          | No  | Yes       |
| KNN                 | Yes                              | Yes                          | Yes | Yes       |
| KNN                 | Yes                              | Yes                          | No  | Yes       |
| Clustering          | Yes                              | Yes                          | Yes | Yes       |
| Clustering          | Yes                              | Yes                          | No  | Yes       |
| Bayesian Network    | Yes                              | No                           | No  | Yes       |


## Índice
- [1. Carga del Data Set](#sec1)
- [2. Balanceo del Dataset](#sec2)
  - [2.1 Verificación](#sec21)
  - [2.2 Corrección con SMOTE](#sec22)
  - [2.3 Análisis del Data Set](#sec23)
- [3. Funciones de Carga y Generación de Variables](#sec3)
- [4. Modelado y Optimización](#sec4)
  - [4.1 RN Base](#sec41)
  - [4.2 RN optimizada parámetros de compilación](#sec42)
  - [4.3 RN optimizada parámetros de densidad de capas](#sec43)
  - [4.4 RN optimizada parámetros funciones de capas](#sec44)
  - [4.5 RN optimizada Dropout](#sec45)
- [5. Comparación de Modelos](#sec5)
- [6. Predicción con Nuevos Casos](#sec6)
- [7. Conclusiones](#sec7)


In [1]:
# Bibliotecas a usar
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import joblib
import tensorflow as tf
import mlflow
import plotly.express as px
import plotly.graph_objects as go

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from imblearn.over_sampling import SMOTE

2025-12-27 21:50:56.926555: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-12-27 21:50:56.926882: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-12-27 21:50:56.977161: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-12-27 21:50:58.529410: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To tur

## <span id="sec1"></span> **1. Datasets Upload**

In [2]:
df_geometric = pd.read_csv("../_attachments/datasets/data-set-geometric-labels.csv", sep=',')
print("Dataset geometric loaded successfully")
df_geometric.head()

Dataset geometric loaded successfully


Unnamed: 0,AGE,AQ10_TOTAL_SCORE,FINE_DETAILS_AQ_1,GLOBAL_FOCUS_AQ_2,FOLLOW_CONVERSATIONS_AQ_3,TASK_SWITCHING_AQ_4,UNDERSTAND_INTENTIONS_AQ_5,NOTICE_INTEREST_AQ_6,SOCIAL_IMAGINATION_AQ_7,INTENSE_INTERESTS_AQ_8,...,ETHNICITY_Hispanic,ETHNICITY_Latino,ETHNICITY_Middle Eastern,ETHNICITY_Others,ETHNICITY_Pasifika,ETHNICITY_South Asian,ETHNICITY_Turkish,ETHNICITY_White-European,ETHNICITY_others,ASD_TARGET
0,26.0,6.0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0
1,24.0,5.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0
2,27.0,8.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
3,35.0,6.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0
4,40.0,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0


In [3]:
df_geometric = pd.read_csv("../_attachments/datasets/data-set-bayesian-labels.csv", sep=',')
print("Dataset geometric loaded successfully")
df_geometric.head()

Dataset geometric loaded successfully


Unnamed: 0,FINE_DETAILS_AQ_1,GLOBAL_FOCUS_AQ_2,FOLLOW_CONVERSATIONS_AQ_3,TASK_SWITCHING_AQ_4,UNDERSTAND_INTENTIONS_AQ_5,NOTICE_INTEREST_AQ_6,SOCIAL_IMAGINATION_AQ_7,INTENSE_INTERESTS_AQ_8,READ_FACIAL_EXPRESSIONS_AQ_9,SOCIAL_FRIENDSHIP_AQ_10,JAUNDICE,AUTISM_FAMILY_HISTORY,USED_APP_BEFORE,GENDER,ETHNICITY,AGE,AQ10_TOTAL_SCORE,ASD_TARGET
0,1.0,1.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0,1.0,2.0,0
1,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,4.0,1.0,2.0,0
2,1.0,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,4.0,2.0,3.0,1
3,1.0,1.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,0.0,10.0,3.0,2.0,0
4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,0


## <span id="sec2"></span> **2. PCA**

### <span id="sec21"></span> **2.1 Split and Scaling**


In [18]:
X_geometric = df_geometric.drop(columns=["ASD_TARGET"])
y_geometric = df_geometric["ASD_TARGET"]

X_geometric_train, X_geometric_test, y_geometric_train, y_geometric_test = train_test_split(
    X_geometric,
    y_geometric,
    test_size=0.2,
    random_state=42,
    stratify=y_geometric
)

scaler_geometric = StandardScaler()

X_geometric_train_scaled = scaler_geometric.fit_transform(
    X_geometric_train
)

X_geometric_test_scaled = scaler_geometric.transform(
    X_geometric_test
)

### <span id="sec22"></span> **2.2 Explore PCA to Geometric dataset**
An exploratory PCA was performed on the geometric dataset to analyze the distribution of explained variance across principal components and assess feature redundancy.

In [5]:
pca_2 = PCA(n_components=2, random_state=42)
X_pca_2 = pca_2.fit_transform(X_geometric_train_scaled)

df_pca_2 = pd.DataFrame(
    X_pca_2,
    columns=["PC1", "PC2"]
)

df_pca_2["ASD_TARGET"] = y_geometric_train.values

fig_2d = px.scatter(
    df_pca_2,
    x="PC1",
    y="PC2",
    color="ASD_TARGET",
    #title="PCA 2D – Geometric Dataset",
    opacity=0.7,
    height=1144
)

fig_2d.show()


In [6]:
pca_3 = PCA(n_components=3, random_state=42)
X_pca_3 = pca_3.fit_transform(X_geometric_train_scaled)

df_pca_3 = pd.DataFrame(
    X_pca_3,
    columns=["PC1", "PC2", "PC3"]
)

df_pca_3["ASD_TARGET"] = y_geometric_train.values

fig_3d = px.scatter_3d(
    df_pca_3,
    x="PC1",
    y="PC2",
    z="PC3",
    color="ASD_TARGET",
    #title="PCA 3D – Geometric Dataset",
    opacity=0.7,
    height=1144
)

fig_3d.show()


### <span id="sec23"></span> **2.3 PCA full variables**

In [None]:
n_samples_geometric = X_geometric_train_scaled.shape[0]
n_features_geometric = X_geometric_train_scaled.shape[1]

max_components_geometric = min(
    n_samples_geometric,
    n_features_geometric
)

print("Number of training samples:", n_samples_geometric)
print("Number of features:", n_features_geometric)
print("Max possible PCA components:", max_components_geometric)

############################################################################################
############################################################################################

pca_geometric_full = PCA(
    n_components=max_components_geometric,
    random_state=42
)

X_geometric_train_pca_full = pca_geometric_full.fit_transform(
    X_geometric_train_scaled
)

print("\nPCA full applied")
print("PCA full train shape:")

Number of training samples: 563
Number of features: 17
Max possible PCA components: 17

PCA full applied
PCA full train shape:


In [17]:
explained_variance_geometric = pca_geometric_full.explained_variance_ratio_
cumulative_variance_geometric = np.cumsum(explained_variance_geometric)

print("\nExplained variance per component:")
print(np.round(explained_variance_geometric[:n_features_geometric], 5))

print("\nCumulative explained variance:")
print(np.round(cumulative_variance_geometric[:n_features_geometric], 5))

print("\nPCA Explained Variance all components\n")

for i in range(n_features_geometric):
    print(
        f"PC{i+1:02d} | "
        f"Explained variance: {explained_variance_geometric[i]:.5f} | "
        f"Cumulative variance: {cumulative_variance_geometric[i]:.5f}"
    )




Explained variance per component:
[0.23839 0.08867 0.07128 0.06348 0.06164 0.05917 0.05793 0.0517  0.04695
 0.0446  0.04155 0.03928 0.03683 0.03423 0.03167 0.0265  0.00612]

Cumulative explained variance:
[0.23839 0.32706 0.39834 0.46181 0.52345 0.58263 0.64056 0.69226 0.73921
 0.78381 0.82536 0.86464 0.90147 0.9357  0.96737 0.99388 1.     ]

PCA Explained Variance all components

PC01 | Explained variance: 0.23839 | Cumulative variance: 0.23839
PC02 | Explained variance: 0.08867 | Cumulative variance: 0.32706
PC03 | Explained variance: 0.07128 | Cumulative variance: 0.39834
PC04 | Explained variance: 0.06348 | Cumulative variance: 0.46181
PC05 | Explained variance: 0.06164 | Cumulative variance: 0.52345
PC06 | Explained variance: 0.05917 | Cumulative variance: 0.58263
PC07 | Explained variance: 0.05793 | Cumulative variance: 0.64056
PC08 | Explained variance: 0.05170 | Cumulative variance: 0.69226
PC09 | Explained variance: 0.04695 | Cumulative variance: 0.73921
PC10 | Explained vari

In [9]:
components = np.arange(1, len(explained_variance_geometric) + 1)

fig = go.Figure()

fig.add_trace(
    go.Scatter(
        x=components,
        y=explained_variance_geometric,
        mode="lines+markers",
        name="Explained Variance",
        hovertemplate=(
            "Component: %{x}<br>"
            "Explained variance: %{y:.4f}<extra></extra>"
        )
    )
)

fig.add_trace(
    go.Scatter(
        x=components,
        y=cumulative_variance_geometric,
        mode="lines+markers",
        name="Cumulative Explained Variance",
        hovertemplate=(
            "Component: %{x}<br>"
            "Cumulative variance: %{y:.4f}<extra></extra>"
        )
    )
)

fig.add_hline(
    y=0.80,
    line_dash="dash",
    annotation_text="80% variance"
)

fig.update_layout(
    #title="Explained and Cumulative Explained Variance",
    xaxis_title="Principal Component",
    yaxis_title="Variance Ratio",
    height=900,
)

fig.show()

### <span id="sec24"></span> **2.4 Select PCA to Geometric dataset**
Based on the PCA results, the dimensionality of the geometric dataset was reduced by retaining the first 11 principal components, as they collectively explain approximately 82.5% of the total variance. This selection exceeds the predefined 80% variance threshold, ensuring that most of the original information is preserved while reducing the dimensionality of the feature space. The remaining components contribute marginally to the explained variance and were therefore excluded to avoid unnecessary complexity without significant informational gain.