# <div align="center"><b> TPX - COURSE XX - MIA </b></div>

<div align="right">📝 <em><small><font color='Gray'>Note:</font></small></em></div>

<div align="right"> <em><small><font color='Gray'> Jupyter notebook rendering on <a href="https://github.com/" target="_blank">GitHub</a> is only a preview.</font></small></em> </div>

<div align="right"> <em><small><font color='Gray'> For a better viewing experience, use the community-recommended viewer: <a href="https://nbviewer.org/" target="_blank">nbviewer</a></font></small></em> </div>

<div align="right"> <em><small><font color='Gray'> You can use the following link to view this notebook on that site: <a href="https://nbviewer.org/ruta/de/archivo.ipynb">File path</a></font></small></em> </div>

* * *

<style>
/* Limit the height of output cells in the HTML export */
.jp-OutputArea.jp-Cell-outputArea {
    max-height: 500px;
}
</style>

🛻 <em><font color='MediumSeaGreen'>  Installation: </font></em> 🛻

This notebook uses [Poetry](https://python-poetry.org/) for dependency management.
First, install Poetry by following the instructions in its [official documentation](https://python-poetry.org/docs/#installation).
Then run the following commands to install the required dependencies and activate the virtual environment:

- Bash:
```bash
poetry install
eval $(poetry env activate)
```

- PowerShell:
```powershell
poetry install
Invoke-Expression (poetry env activate)
```
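To confirm that the notebook kernel is actually running inside the virtual environment created by Poetry, a quick check is to compare `sys.prefix` with `sys.base_prefix`. The helper name `in_virtualenv` is hypothetical, and this check only applies to `venv`/`virtualenv`-style environments (the kind Poetry creates), not to conda environments.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv.

    Inside a virtual environment, sys.prefix points to the environment
    directory while sys.base_prefix still points to the base interpreter.
    """
    return sys.prefix != sys.base_prefix


print(f"Inside a virtual environment: {in_virtualenv()}")
print(f"Interpreter: {sys.executable}")
```

If this prints `False` inside Jupyter, the kernel was likely started from a different interpreter than the Poetry environment.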

<!-- Download additional files:
!gdown https://drive.google.com/drive/folders/1UBZ8PEbtmiWMGkULu7GAt3VhUpeTy9l7?usp=sharing --folder -->

✋ <em><font color='DodgerBlue'>Imports:</font></em> ✋

In [9]:
# Numerical computing, data handling, and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.sparse import csr_matrix
from IPython.display import display, HTML

# Deep learning
import torch

# scikit-learn: text vectorization, models, metrics, and model selection
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.naive_bayes import MultinomialNB, ComplementNB
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV

# 20newsgroups is a classic NLP dataset that ships with sklearn, already formatted
from sklearn.datasets import fetch_20newsgroups

# System information
import gc
import platform
import psutil
import GPUtil
from GPUtil import showUtilization as gpu_usage

# Clear the GPU cache
def clean_gpu_usage() -> None:
    """Print GPU usage before and after freeing garbage and PyTorch's cached CUDA memory."""
    print("Initial GPU Usage")
    gpu_usage()
    gc.collect()
    torch.cuda.empty_cache()
    print("GPU Usage after emptying the cache")
    gpu_usage()

# Show complete system information
def show_system_info():
    """Print the system's OS, CPU, memory, disk, and GPU specifications."""
    system_info = platform.uname()

    print("System Information:")
    print(f"System: {system_info.system}")
    print(f"Node Name: {system_info.node}")
    print(f"Release: {system_info.release}")
    print(f"Version: {system_info.version}")
    print(f"Machine: {system_info.machine}")
    print(f"Processor: {system_info.processor}")

    cpu_info = platform.processor()
    cpu_count = psutil.cpu_count(logical=False)
    logical_cpu_count = psutil.cpu_count(logical=True)

    print("\nCPU Information:")
    print(f"Processor: {cpu_info}")
    print(f"Physical Cores: {cpu_count}")
    print(f"Logical Cores: {logical_cpu_count}")

    memory_info = psutil.virtual_memory()

    print("\nMemory Information:")
    print(f"Total Memory: {memory_info.total} bytes")
    print(f"Available Memory: {memory_info.available} bytes")
    print(f"Used Memory: {memory_info.used} bytes")
    print(f"Memory Utilization: {memory_info.percent}%")

    disk_info = psutil.disk_usage('/')

    print("\nDisk Information:")
    print(f"Total Disk Space: {disk_info.total} bytes")
    print(f"Used Disk Space: {disk_info.used} bytes")
    print(f"Free Disk Space: {disk_info.free} bytes")
    print(f"Disk Space Utilization: {disk_info.percent}%")

    gpus = GPUtil.getGPUs()

    if not gpus:
        print("No GPU detected.")
    else:
        for i, gpu in enumerate(gpus):
            print(f"\nGPU {i + 1} Information:")
            print(f"ID: {gpu.id}")
            print(f"Name: {gpu.name}")
            print(f"Driver: {gpu.driver}")
            print(f"GPU Memory Total: {gpu.memoryTotal} MB")
            print(f"GPU Memory Free: {gpu.memoryFree} MB")
            print(f"GPU Memory Used: {gpu.memoryUsed} MB")
            print(f"GPU Load: {gpu.load * 100}%")
            print(f"GPU Temperature: {gpu.temperature}°C")
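`show_system_info` prints the raw byte counts returned by `psutil`, which are hard to read at a glance. A small helper could convert them to binary units (KiB, MiB, GiB, ...). The name `format_bytes` is hypothetical and not part of `psutil`.

```python
def format_bytes(num_bytes: float) -> str:
    """Convert a byte count to a human-readable string using binary (1024) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    # Divide by 1024 until the value fits the current unit; PiB is the last resort
    for unit in units[:-1]:
        if abs(value) < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    return f"{value:.2f} {units[-1]}"


print(format_bytes(16 * 1024 ** 3))  # prints "16.00 GiB"
```

Inside `show_system_info`, a line such as `print(f"Total Memory: {format_bytes(memory_info.total)}")` would then replace the raw byte output.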

In [3]:
!nvidia-smi

Tue Apr 22 21:44:04 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 566.36                 Driver Version: 566.36         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 4080 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
|  0%   33C    P8             11W /  320W |    1314MiB /  16376MiB |     36%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

🔧 <em><font color='tomato'>Configuration:</font></em> 🔧


In [10]:
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'  # Select the compute device

# Parameters
BATCH_SIZE = 10  # Batch size
N_EPOCHS = 10  # Number of epochs
VERBOSE = True  # Print training progress epoch by epoch

print(f'Current device: {DEVICE}')

Current device: cpu
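The `nvidia-smi` output above lists an RTX 4080, yet the device resolves to `cpu`. One possible cause is that the installed PyTorch wheel is a CPU-only build. This is an assumption that the notebook does not confirm. The hypothetical helper `describe_torch_cuda` reports the relevant attributes so the cause can be checked. `torch.version.cuda` is `None` on CPU-only builds.

```python
try:
    import torch
except ImportError:  # PyTorch is not installed in this environment
    torch = None


def describe_torch_cuda() -> dict:
    """Summarize how the installed PyTorch build relates to CUDA."""
    if torch is None:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None for CPU-only wheels; a CUDA version string for CUDA builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


for key, value in describe_torch_cuda().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` is `None`, reinstalling a CUDA-enabled PyTorch wheel would be the next thing to try.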


<!-- Colab -->
<!-- <div align="center"><img src="https://drive.google.com/uc?export=view&id=1QSNrTsz1hQbmZwpgwx0qpfpNtLW19Orm" width="600" alt="Figura 1: A data scientist is working on word generation using the Lord of the Rings lore. The image is dark and moody, with a focus on the scientist's computer screen. The screen displays a visualization the one ring, with a map of Middle Earth in the background. - Generada con DALL-E3"></div> -->

<div align="center"><img src="./ceia-materia/resources/portada.jpeg" width="600" alt="Figura 1: A data scientist playing with convolutional neural networks. - Generada con Microsoft Image Creator"></div>

<div align="center"><small><em>Figura 1: A data scientist playing with convolutional neural networks. - Generada con Microsoft Image Creator</em></small></div>

<div align="center">✨Project details:✨</div>

<p></p>

<div align="center">

| Subtitle        | Challenge 1 - NLP - FIUBA             |
| --------------- | ------------------------------------- |
| **Description** | Word vectorizer + Naive Bayes         |
| **Members**     | Bruno Masoller (brunomaso1@gmail.com) |

</div>
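The project pairs a word vectorizer with Naive Bayes, and the notebook imports `TfidfVectorizer`, `MultinomialNB`, `ComplementNB`, and `f1_score`. The sketch below shows one way these pieces might fit together. It runs on a hypothetical four-document toy corpus rather than the 20newsgroups data, uses `make_pipeline` (not imported in the original), and sets `alpha=0.1` arbitrarily.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.naive_bayes import ComplementNB, MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the 20newsgroups data
train_texts = [
    "the rocket launch reached orbit",
    "nasa plans a new space mission",
    "the team won the hockey game",
    "the goalie stopped every shot in the game",
]
train_labels = [0, 0, 1, 1]  # 0 = space, 1 = hockey
test_texts = ["a space rocket mission", "a hockey game with a great goalie"]
test_labels = [0, 1]

scores = {}
for model in (MultinomialNB(alpha=0.1), ComplementNB(alpha=0.1)):
    # Inside the pipeline, the vectorizer is fit only on the training texts
    clf = make_pipeline(TfidfVectorizer(), model)
    clf.fit(train_texts, train_labels)
    preds = clf.predict(test_texts)
    scores[type(model).__name__] = f1_score(test_labels, preds, average="macro")

for name, score in scores.items():
    print(f"{name}: macro F1 = {score:.3f}")
```

Wrapping the vectorizer and classifier in a single pipeline keeps test data out of the vocabulary fit. It also lets `GridSearchCV` (also imported) tune parameters such as `alpha` across both steps.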

## Assignment

## Solution

In [11]:
# Load the data (already split into train and test by default)
newsgroups_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'))
newsgroups_test = fetch_20newsgroups(subset='test', remove=('headers', 'footers', 'quotes'))
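Before vectorizing, it can be useful to check how balanced the classes are. `fetch_20newsgroups` returns an object with `target` (integer labels) and `target_names`. The hypothetical helper `class_distribution` turns these into per-class counts and shares. The demo below uses toy labels.

```python
import numpy as np


def class_distribution(target, target_names):
    """Return (class name, count, share) tuples sorted by count, descending."""
    counts = np.bincount(target, minlength=len(target_names))
    total = counts.sum()
    rows = [(name, int(c), c / total) for name, c in zip(target_names, counts)]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy labels for illustration only
toy_target = np.array([0, 1, 1, 2])
for name, count, share in class_distribution(toy_target, ["a", "b", "c"]):
    print(f"{name}: {count} ({share:.1%})")
```

With the notebook's data, the call would be `class_distribution(newsgroups_train.target, newsgroups_train.target_names)`.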

In [None]:
tfidfvect = TfidfVectorizer()
# Fit the vectorizer on the training texts and transform them into X_train
X_train = tfidfvect.fit_transform(newsgroups_train.data)
# Get the training labels (y_train)
y_train = newsgroups_train.target

# Also build the inverse vocabulary (column index -> word)
idx2word = {v: k for k, v in tfidfvect.vocabulary_.items()}
tfidfvect.vocabulary_
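`idx2word` maps column indices of the TF-IDF matrix back to words. The sketch below uses it, together with the already-imported `cosine_similarity`, to list a document's highest-weighted terms and to find its most similar document. The toy corpus and the helpers `top_terms` and `most_similar` are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus; in the notebook, newsgroups_train.data plays this role
docs = [
    "the rocket launch reached orbit",
    "nasa plans a new rocket launch",
    "the team won the hockey game",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)
idx2word = {v: k for k, v in vectorizer.vocabulary_.items()}


def top_terms(row, idx2word, k=3):
    """Return the k highest-weighted terms of one TF-IDF row (1 x n_terms sparse)."""
    dense = row.toarray().ravel()
    top_idx = np.argsort(dense)[::-1][:k]
    return [idx2word[i] for i in top_idx if dense[i] > 0]


def most_similar(X, doc_idx):
    """Return the index of the document most similar to X[doc_idx], excluding itself."""
    sims = cosine_similarity(X[doc_idx], X).ravel()
    sims[doc_idx] = -1.0  # exclude the document itself
    return int(np.argmax(sims))


print(top_terms(X[0], idx2word))
print(most_similar(X, 0))
```

With the notebook's variables, the equivalent calls would be `top_terms(X_train[0], idx2word)` and `most_similar(X_train, 0)`.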

### Markdown examples

#### Alert examples:
https://github.com/orgs/community/discussions/16925

These alerts only render in GitHub's markdown preview, not in nbviewer.

> [!NOTE]  
> Highlights information that users should take into account, even when skimming.

> [!TIP]
> Optional information to help a user be more successful.

> [!IMPORTANT]  
> Crucial information necessary for users to succeed.

> [!WARNING]  
> Critical content demanding immediate user attention due to potential risks.

> [!CAUTION]
> Negative potential consequences of an action.

> [!CAUTION] CUSTOM NAME
> TEST TEST TEST

#### Details examples:

<details>
  <summary>Details</summary>
  <ul>
    <li>
      <em>SSH key generation:</em> To set up SSH access, first generate a key pair (private and
      public) on the local machine. This can be done with <code>ssh-keygen</code> or, alternatively, with PuTTYgen.
      Example: <code>ssh-keygen -t ed25519 -C "login" -Z aes256-gcm@openssh.com</code>. Then upload the public
      key to the "Organization" under <samp>Organization -> SSH Keys -> Add SSH key</samp>.
      Note that this key is automatically copied to each instance when the instance is created.
    </li>
    <li>
      <em>API key generation:</em> To create an API key, go to <samp>Organization -> API Keys -> Generate API Key</samp>. When creating it, specify that it will also be used to access the bucket.
    </li>
  </ul>
</details>

#### Custom alert examples:

🔧 <em><font color='tomato'>Configuration:</font></em> 🔧


🛻 <em><font color='MediumSeaGreen'>  Installation: </font></em> 🛻


✋ <em><font color='DodgerBlue'>Imports:</font></em> ✋

> 🔮 <em><font color='violet'>Helper function:</font></em> [Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec in felis ut est molestie eleifend. Aliquam luctus lacinia diam vel cursus. Fusce ipsum mauris, dictum at dignissim eu, tristique in magna. Maecenas iaculis nisi elit, id molestie nibh egestas quis. Nulla tempus rutrum ipsum, at iaculis mauris efficitur sit amet. Etiam ut tincidunt magna.]

> ⭐ <em><strong>Conclusion:</strong></em> [Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec in felis ut est molestie eleifend. Aliquam luctus lacinia diam vel cursus. Fusce ipsum mauris, dictum at dignissim eu, tristique in magna. Maecenas iaculis nisi elit, id molestie nibh egestas quis. Nulla tempus rutrum ipsum, at iaculis mauris efficitur sit amet. Etiam ut tincidunt magna.]

> ⚠️ <em><font color='gold'>DETECTED ISSUES:</font></em> [Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec in felis ut est molestie eleifend. Aliquam luctus lacinia diam vel cursus. Fusce ipsum mauris, dictum at dignissim eu, tristique in magna. Maecenas iaculis nisi elit, id molestie nibh egestas quis. Nulla tempus rutrum ipsum, at iaculis mauris efficitur sit amet. Etiam ut tincidunt magna.]

> 📝 <em><font color='Gray'>Note:</font></em> [Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec in felis ut est molestie eleifend. Aliquam luctus lacinia diam vel cursus. Fusce ipsum mauris, dictum at dignissim eu, tristique in magna. Maecenas iaculis nisi elit, id molestie nibh egestas quis. Nulla tempus rutrum ipsum, at iaculis mauris efficitur sit amet. Etiam ut tincidunt magna.]

> 💫 <em><font color='MediumPurple'> Possible improvements: </font></em> [Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec in felis ut est molestie eleifend. Aliquam luctus lacinia diam vel cursus. Fusce ipsum mauris, dictum at dignissim eu, tristique in magna. Maecenas iaculis nisi elit, id molestie nibh egestas quis. Nulla tempus rutrum ipsum, at iaculis mauris efficitur sit amet. Etiam ut tincidunt magna.]

> 💡 <em><font color='IndianRed'>Hypothesis:</font></em> [Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec in felis ut est molestie eleifend. Aliquam luctus lacinia diam vel cursus. Fusce ipsum mauris, dictum at dignissim eu, tristique in magna. Maecenas iaculis nisi elit, id molestie nibh egestas quis. Nulla tempus rutrum ipsum, at iaculis mauris efficitur sit amet. Etiam ut tincidunt magna.]

In [None]:
# Export the requirements for local reproduction
%pip freeze > requirements.txt