# 🧠 Deep Learning: Deep Neural Architectures
## Hierarchical Representations and Neural Networks

<a href="https://colab.research.google.com/github/Jomucon21muri/Aprendizaje_automatico/blob/main/01_Sistemas_aprendizaje_automatico/04_Deep_learning/deep_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

## 📋 Notebook Summary

This notebook explores **Deep Learning**, the paradigm that has transformed AI through neural networks with many layers that learn hierarchical representations.

### 🎯 Learning Objectives

1. **Theoretical Foundations**:
   - Learning hierarchical representations
   - Backpropagation and optimization in deep networks
   - Vanishing/exploding gradients and their remedies

2. **Convolutional Neural Networks (CNNs)**:
   - The convolution operation, pooling, stride and padding
   - Architectures: LeNet, AlexNet, VGG, ResNet
   - Image classification (MNIST, CIFAR-10)
   - Transfer learning

3. **Recurrent Neural Networks (RNNs/LSTMs)**:
   - Processing temporal sequences
   - LSTM and GRU for long-term memory
   - Applications in NLP and time series

4. **Advanced Techniques**:
   - Dropout, Batch Normalization, Data Augmentation
   - Autoencoders and dimensionality reduction
   - Fine-tuning of pre-trained models
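A minimal, loop-based NumPy sketch of convolution, stride, padding and max pooling for a single-channel input. The helpers `conv2d`, `max_pool2d` and `conv_output_size` are hypothetical names for illustration, not Keras APIs; as in most deep-learning libraries, `conv2d` computes a cross-correlation (the kernel is not flipped). Along each axis the output size is $\lfloor (n + 2p - k)/s \rfloor + 1$ for input size $n$, kernel size $k$, padding $p$ and stride $s$.

```python
import numpy as np

def conv_output_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2D convolution (cross-correlation: the kernel is not flipped)."""
    if padding > 0:
        image = np.pad(image, padding)  # zero padding on all four sides
    k = kernel.shape[0]
    out_h = conv_output_size(image.shape[0], k, stride)
    out_w = conv_output_size(image.shape[1], k, stride)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)  # weighted sum over the receptive field
    return out

def max_pool2d(x, size=2, stride=2):
    """Max pooling: keep the largest value of each size x size window."""
    out_h = conv_output_size(x.shape[0], size, stride)
    out_w = conv_output_size(x.shape[1], size, stride)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # toy 6x6 "image" (a linear ramp)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)          # 3x3 vertical-edge detector

valid = conv2d(image, kernel)              # no padding: 6x6 -> 4x4
same = conv2d(image, kernel, padding=1)    # padding 1:  6x6 -> 6x6
strided = conv2d(image, kernel, stride=2)  # stride 2:   6x6 -> 2x2
pooled = max_pool2d(same)                  # 2x2 pool:   6x6 -> 3x3
print(valid.shape, same.shape, strided.shape, pooled.shape)
# Every entry of `valid` is -6: the ramp grows by 1 per column, so each kernel row adds x - (x + 2)
print(valid)

# Spatial size through the MNIST CNN built later (3x3 'same' convs with stride 1 act like padding=1)
n = 28
n = conv_output_size(n, 3, padding=1)  # Conv2D(32)   -> 28
n = conv_output_size(n, 2, stride=2)   # MaxPooling2D -> 14
n = conv_output_size(n, 3, padding=1)  # Conv2D(64)   -> 14
n = conv_output_size(n, 2, stride=2)   # MaxPooling2D -> 7
n = conv_output_size(n, 3, padding=1)  # Conv2D(64)   -> 7
flatten_dim = n * n * 64               # length of the Flatten() output
print(n, flatten_dim)
```

The last lines assume that Keras `padding='same'` with a 3x3 kernel and stride 1 is equivalent to one pixel of zero padding per side; under that assumption the `Flatten()` layer of the MNIST CNN outputs a vector of length 7 x 7 x 64 = 3136.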

### 📊 Contents

- Theory with mathematical foundations
- Implementations with TensorFlow/Keras
- Visualizations of architectures and feature maps
- Complete practical examples
- Comparison of architectures

---

In [None]:
# Environment setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Deep learning with TensorFlow/Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow.keras.datasets import mnist, cifar10, fashion_mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.applications import VGG16, ResNet50, MobileNetV2
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import warnings

warnings.filterwarnings('ignore')
sns.set_style('darkgrid')
plt.rcParams['figure.dpi'] = 100

print("✅ Environment configured successfully")
print(f"TensorFlow: {tf.__version__}")
print(f"Keras: {keras.__version__}")
gpus = tf.config.list_physical_devices('GPU')
print(f"GPU available: {'Yes - ' + str(len(gpus)) + ' GPU(s)' if gpus else 'No (using CPU)'}")

## 1. Fundamentals: A Simple Neural Network (MNIST)

### 🎯 Building a Dense Neural Network

We start with a simple fully connected network that classifies handwritten digits (MNIST).

**Architecture**:
- Input: 784 pixels (a 28x28 image flattened into a vector)
- Hidden layer 1: 128 neurons + ReLU + Dropout(0.2)
- Hidden layer 2: 64 neurons + ReLU + Dropout(0.2)
- Output: 10 neurons + Softmax (digits 0-9)

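**Key Concepts**:
- **ReLU**: $f(x) = \max(0, x)$ - mitigates vanishing gradients, since its derivative is 1 for $x > 0$
- **Softmax**: $\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$ - turns the $K$ output scores (logits) into class probabilities
- **Dropout**: regularization that randomly zeroes a fraction of the neurons at each training step (it is disabled at inference)
- **Cross-entropy loss**: $\mathcal{L} = -\sum_{i=1}^{K} y_i \log(\hat{y}_i)$, where $y$ is the one-hot label and $\hat{y} = \sigma(z)$ the predicted probabilities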
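These four concepts fit in a few lines of NumPy. The sketch below is for intuition only and is not how Keras implements them; it assumes one-hot labels and uses *inverted* dropout, which, like the Keras `Dropout` layer, rescales the kept units by $1/(1-\text{rate})$ during training. It also illustrates two useful facts: the gradient of softmax followed by cross-entropy with respect to the logits is simply $\hat{y} - y$, and chaining sigmoid derivatives (each at most 0.25) shrinks gradients geometrically, which is the vanishing-gradient problem that ReLU mitigates.

```python
import numpy as np

def relu(x):
    # max(0, x) element-wise; derivative is 1 for x > 0 and 0 for x < 0
    return np.maximum(0.0, x)

def softmax(z):
    # Subtracting the max logit leaves the result unchanged but prevents overflow in exp
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true is one-hot; clipping avoids log(0)
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=-1)

def inverted_dropout(x, rate, rng):
    # Training-time dropout: zero each unit with probability `rate` and scale the
    # survivors by 1 / (1 - rate) so the expected activation is unchanged.
    # At inference time the layer is simply the identity.
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

z = np.array([2.0, 1.0, 0.1])  # logits for 3 classes
y = np.array([1.0, 0.0, 0.0])  # one-hot label: the true class is 0
p = softmax(z)
loss = cross_entropy(y, p)
grad_logits = p - y            # gradient of cross_entropy(y, softmax(z)) w.r.t. z
print("probabilities:", p.round(3), "loss:", round(float(loss), 3))
print("gradient w.r.t. logits:", grad_logits.round(3))

rng = np.random.default_rng(0)
dropped = inverted_dropout(np.ones(100_000), rate=0.2, rng=rng)
print("fraction zeroed:", (dropped == 0).mean(), "mean:", dropped.mean())

# Vanishing gradients: sigmoid'(x) <= 0.25, so a chain of 10 sigmoid layers scales the
# gradient by at most 0.25**10 (about 9.5e-7), whereas ReLU passes a factor of 1 for x > 0
print(0.25 ** 10)
```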

In [None]:
# Complete example: dense network and CNN on MNIST, then a CNN on CIFAR-10

print("🔬 PART 1: DENSE NEURAL NETWORK ON MNIST\n")

# Load MNIST
(X_train_mnist, y_train_mnist), (X_test_mnist, y_test_mnist) = mnist.load_data()

# Preprocess: flatten 28x28 images into 784-vectors and scale pixels to [0, 1]
X_train_mnist_flat = X_train_mnist.reshape(-1, 784).astype('float32') / 255.0
X_test_mnist_flat = X_test_mnist.reshape(-1, 784).astype('float32') / 255.0
y_train_mnist_cat = to_categorical(y_train_mnist, 10)
y_test_mnist_cat = to_categorical(y_test_mnist, 10)

print(f"MNIST: {X_train_mnist.shape[0]} train, {X_test_mnist.shape[0]} test")

# Build the dense model
model_dense = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])

model_dense.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

print("\nTraining dense network...")
history_dense = model_dense.fit(
    X_train_mnist_flat, y_train_mnist_cat,
    epochs=10, batch_size=128, verbose=0,
    validation_split=0.1
)

test_loss_dense, test_acc_dense = model_dense.evaluate(X_test_mnist_flat, y_test_mnist_cat, verbose=0)
print(f"✅ Dense network - Test accuracy: {test_acc_dense:.4f}")

print("\n" + "="*70)
print("🔬 PART 2: CNN ON MNIST")
print("="*70)

# Preprocess for the CNN (keep the 2D structure and add a channel axis)
X_train_mnist_cnn = X_train_mnist.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test_mnist_cnn = X_test_mnist.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Build a CNN (inspired by LeNet)
model_cnn = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    
    # Conv block 1
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    
    # Conv block 2
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    
    # Conv block 3
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    
    # Classifier
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model_cnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

print("\nTraining CNN...")
history_cnn = model_cnn.fit(
    X_train_mnist_cnn, y_train_mnist_cat,
    epochs=10, batch_size=128, verbose=0,
    validation_split=0.1
)

test_loss_cnn, test_acc_cnn = model_cnn.evaluate(X_test_mnist_cnn, y_test_mnist_cat, verbose=0)
print(f"✅ CNN - Test accuracy: {test_acc_cnn:.4f}")

print("\n" + "="*70)
print("🔬 PART 3: CNN ON CIFAR-10 (color images)")
print("="*70)

# Load CIFAR-10
(X_train_cifar, y_train_cifar), (X_test_cifar, y_test_cifar) = cifar10.load_data()

# Use a subset (10,000 training and 2,000 test images) to keep training fast
X_train_cifar = X_train_cifar[:10000].astype('float32') / 255.0
y_train_cifar = to_categorical(y_train_cifar[:10000], 10)
X_test_cifar = X_test_cifar[:2000].astype('float32') / 255.0
y_test_cifar = to_categorical(y_test_cifar[:2000], 10)

print(f"CIFAR-10: {X_train_cifar.shape[0]} train, {X_test_cifar.shape[0]} test")

# CNN for CIFAR-10
model_cifar = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.2),
    
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.3),
    
    layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    layers.BatchNormalization(),
    layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.4),
    
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model_cifar.compile(
    optimizer=optimizers.Adam(learning_rate=0.001),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("\nTraining CNN on CIFAR-10...")
history_cifar = model_cifar.fit(
    X_train_cifar, y_train_cifar,
    epochs=20, batch_size=64, verbose=0,
    validation_split=0.1
)

test_loss_cifar, test_acc_cifar = model_cifar.evaluate(X_test_cifar, y_test_cifar, verbose=0)
print(f"✅ CIFAR-10 CNN - Test accuracy: {test_acc_cifar:.4f}")

# Combined visualization
fig = plt.figure(figsize=(20, 12))

# Subplot 1: architecture comparison on MNIST
ax1 = plt.subplot(2, 4, 1)
comparisons = ['Dense', 'CNN']
accuracies = [test_acc_dense, test_acc_cnn]
colors = ['#FF6B6B', '#4ECDC4']
bars = ax1.bar(comparisons, accuracies, color=colors, alpha=0.8, edgecolor='black', linewidth=2)
ax1.set_ylabel('Test Accuracy', fontweight='bold')
ax1.set_title('MNIST: Dense vs CNN', fontsize=12, fontweight='bold')
ax1.set_ylim([0.95, 1.0])
ax1.grid(True, alpha=0.3, axis='y')
for bar, acc in zip(bars, accuracies):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
            f'{acc:.4f}', ha='center', va='bottom', fontweight='bold')

# Subplot 2: training history - dense network
ax2 = plt.subplot(2, 4, 2)
ax2.plot(history_dense.history['accuracy'], 'b-', linewidth=2, label='Train', marker='o', markersize=4)
ax2.plot(history_dense.history['val_accuracy'], 'r-', linewidth=2, label='Val', marker='s', markersize=4)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('Dense Network - Learning Curves', fontsize=12, fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Subplot 3: Training history - CNN MNIST
ax3 = plt.subplot(2, 4, 3)
ax3.plot(history_cnn.history['accuracy'], 'b-', linewidth=2, label='Train', marker='o', markersize=4)
ax3.plot(history_cnn.history['val_accuracy'], 'r-', linewidth=2, label='Val', marker='s', markersize=4)
ax3.set_xlabel('Epoch')
ax3.set_ylabel('Accuracy')
ax3.set_title('CNN MNIST - Learning Curves', fontsize=12, fontweight='bold')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Subplot 4: Training history - CNN CIFAR
ax4 = plt.subplot(2, 4, 4)
ax4.plot(history_cifar.history['accuracy'], 'b-', linewidth=2, label='Train', marker='o', markersize=3)
ax4.plot(history_cifar.history['val_accuracy'], 'r-', linewidth=2, label='Val', marker='s', markersize=3)
ax4.set_xlabel('Epoch')
ax4.set_ylabel('Accuracy')
ax4.set_title('CNN CIFAR-10 - Learning Curves', fontsize=12, fontweight='bold')
ax4.legend()
ax4.grid(True, alpha=0.3)

# Subplots 5-6: sample MNIST predictions (bottom row, left half: positions 9-12 of a 2x8 grid)
# A 2x8 grid has only 16 positions, so each half of the bottom row holds 4 images
y_pred_mnist = model_cnn.predict(X_test_mnist_cnn[:10], verbose=0)
y_pred_classes = np.argmax(y_pred_mnist, axis=1)

for i in range(4):
    ax = plt.subplot(2, 8, i + 9)
    ax.imshow(X_test_mnist[i], cmap='gray')
    pred = y_pred_classes[i]
    true = y_test_mnist[i]
    color = 'green' if pred == true else 'red'
    ax.set_title(f'P:{pred}\nT:{true}', fontsize=9, color=color, fontweight='bold')
    ax.axis('off')

# Subplots 7-8: sample CIFAR-10 predictions (bottom row, right half: positions 13-16)
y_pred_cifar_all = model_cifar.predict(X_test_cifar[:10], verbose=0)
y_pred_cifar_classes = np.argmax(y_pred_cifar_all, axis=1)
y_true_cifar_classes = np.argmax(y_test_cifar[:10], axis=1)
cifar_names = ['airplane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

for i in range(4):
    ax = plt.subplot(2, 8, i + 13)
    ax.imshow(X_test_cifar[i])
    pred = cifar_names[y_pred_cifar_classes[i]]
    true = cifar_names[y_true_cifar_classes[i]]
    color = 'green' if pred == true else 'red'
    ax.set_title(f'P:{pred[:6]}\nT:{true[:6]}', fontsize=8, color=color, fontweight='bold')
    ax.axis('off')

plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("📊 RESULTS SUMMARY")
print("="*70)
print(f"MNIST - Dense network: {test_acc_dense:.4f}")
print(f"MNIST - CNN:           {test_acc_cnn:.4f}  (difference: {(test_acc_cnn-test_acc_dense)*100:+.2f} percentage points)")
print(f"CIFAR-10 - CNN:        {test_acc_cifar:.4f}")
print("\n💡 Observations:")
print("  • The CNN typically outperforms the dense network on MNIST because it exploits spatial structure")
print("  • Batch Normalization and Dropout are used to stabilize training and reduce overfitting")
print("  • CIFAR-10 is much harder (natural color images), so its accuracy is lower")

## 2. Conclusions and Best Practices

### 📚 Key Concepts

1. **Hierarchical representations**: successive layers learn progressively more abstract features
2. **CNNs**: convolution + pooling to process images efficiently
3. **RNNs/LSTMs**: a hidden state acts as memory for processing temporal sequences
4. **Transfer learning**: reuse pre-trained models for new tasks
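To make the "memory" of an RNN concrete, here is a minimal NumPy sketch of a vanilla (Elman) RNN, $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$, the same recurrence that Keras `SimpleRNN` uses with its default tanh activation. LSTMs and GRUs add gates on top of this update to preserve information over longer spans; they are not shown here. The weights are random, so the example illustrates only the data flow, not a trained model, and `rnn_forward` is a hypothetical helper name.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Run a vanilla RNN over a sequence and return every hidden state."""
    h = np.zeros(W_h.shape[0])  # initial state h_0 = 0
    states = []
    for x_t in xs:
        # The new state mixes the current input with the previous state (the "memory")
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 3, 4, 5
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

seq = rng.normal(size=(seq_len, input_dim))
states = rnn_forward(seq, W_x, W_h, b)
print(states.shape)  # (5, 4): one hidden state per time step

# Perturbing only the FIRST input still changes the LAST hidden state:
# information from early time steps is carried forward through h
seq2 = seq.copy()
seq2[0] += 1.0
states2 = rnn_forward(seq2, W_x, W_h, b)
print(np.abs(states[-1] - states2[-1]).max())
```

In a plain RNN this influence tends to fade over long sequences, because repeated multiplication by $W_h$ and tanh derivatives below 1 shrinks it; that decay is what LSTM and GRU gates are designed to counter.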

### ✅ Best Practices

- **Normalization**: scale inputs to [0, 1] or to mean 0, std 1
- **Regularization**: Dropout, L1/L2, early stopping
- **Batch Normalization**: stabilizes training
- **Data Augmentation**: generate transformed copies of the training data to reduce overfitting
- **Learning rate**: use a decay schedule or callbacks such as `ReduceLROnPlateau`
- **Architecture**: start simple and increase complexity gradually
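Two of these practices, learning-rate decay and early stopping, reduce to simple rules. The plain-Python sketch below uses hypothetical helper names: `exponential_decay` follows the formula documented for `tf.keras.optimizers.schedules.ExponentialDecay` (with `staircase=False`), and `early_stopping_epoch` is a simplified version of the patience rule behind `keras.callbacks.EarlyStopping`, which offers more options such as `restore_best_weights`.

```python
def exponential_decay(step, initial_lr=1e-3, decay_rate=0.96, decay_steps=1000):
    # lr(step) = initial_lr * decay_rate ** (step / decay_steps)
    return initial_lr * decay_rate ** (step / decay_steps)

def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) under a patience-based stopping rule."""
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Never triggered: training runs through the last epoch
    return len(val_losses) - 1, best_epoch

print(exponential_decay(0), exponential_decay(1000), exponential_decay(5000))

val = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.50]  # toy validation losses
print(early_stopping_epoch(val, patience=3))       # (5, 2)
```

In the toy run, three epochs without improvement after epoch 2 stop training at epoch 5, so the later drop to 0.50 is never seen. A larger `patience` trades extra epochs for robustness against such noisy plateaus.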

### 🎯 Main Architectures

| Architecture | Year | Key Innovation |
|-------------|-----|------------------|
| LeNet-5 | 1998 | One of the first successful CNNs (digit recognition) |
| AlexNet | 2012 | ReLU, Dropout, GPU training |
| VGG | 2014 | Repeated blocks of 3x3 convolutions |
| ResNet | 2015 | Skip (residual) connections |
| Inception (GoogLeNet) | 2014 | Multi-scale feature extraction (parallel 1x1, 3x3, 5x5 convolutions) |
| MobileNet | 2017 | Depthwise separable convolutions |
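The MobileNet row can be quantified with a parameter count. A standard $k \times k$ convolution from $C_{in}$ to $C_{out}$ channels needs $k^2 C_{in} C_{out}$ weights, whereas a depthwise separable convolution (one $k \times k$ filter per input channel followed by a $1 \times 1$ pointwise convolution) needs $k^2 C_{in} + C_{in} C_{out}$, a ratio of $1/C_{out} + 1/k^2$. This sketch ignores biases, and the channel counts are illustrative assumptions.

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k x c_in filter per output channel (biases omitted)
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1x1 convolution that mixes channels
    return depthwise + pointwise

k, c_in, c_out = 3, 32, 64
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(sep / std, 4))    # 18432 2336 0.1267
print(round(1 / c_out + 1 / k**2, 4))   # 0.1267 (closed-form ratio)
```

With 3x3 kernels the ratio tends to 1/9 as $C_{out}$ grows, so the saving approaches roughly 9x in parameters (and, for the same spatial size, in multiply-adds).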

### 🚀 Applications

- **Computer vision**: classification, detection, segmentation
- **NLP**: translation, sentiment analysis, chatbots
- **Speech**: speech recognition and synthesis
- **Medicine**: diagnostic imaging
- **Autonomous vehicles**: real-time object detection

### 📖 Resources

- **Course**: "Deep Learning Specialization" - Andrew Ng (Coursera)
- **Book**: "Deep Learning" - Goodfellow, Bengio, Courville
- **Framework**: TensorFlow/Keras documentation
- **Papers**: https://paperswithcode.com/

### 🎯 Exercises

1. Implement a ResNet with skip connections
2. Train a CNN with data augmentation on CIFAR-10
3. Use transfer learning with a pre-trained VGG16
4. Implement an LSTM for time-series forecasting
5. Visualize the feature maps of convolutional layers

---

**🎓 Next**: [Transformers](../05_Transformers/transformers.ipynb)

In the next module we explore the Transformer architecture, which revolutionized NLP and now dominates many areas of ML.