# Assignment 2: Unsupervised Learning
### Machine Learning and Data Mining (AAMD) - GEI 2024-25
**Group X - Name1 Surname1 - Name2 Surname2**

In this assignment we apply six unsupervised learning techniques (PCA, t-SNE, K-means, agglomerative hierarchical clustering, an autoencoder, and self-organizing maps) to two datasets: one synthetic and one real. The goal is to compare how each technique projects or clusters the patterns, and to explore the internal structure of the data.


## Importing libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import dendrogram, linkage
from minisom import MiniSom
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam
sns.set(style="whitegrid")

## Loading the data

In [None]:
# Load the two datasets
synthetic = pd.read_csv("A2-synthetic.txt")
real = pd.read_csv("A2-real.txt", sep=';')
synthetic.head(), real.head()

## Data preprocessing

In [None]:
# Split features from labels, then standardize each dataset
X_syn = synthetic.drop(columns=['class'])
y_syn = synthetic['class']
X_real = real.drop(columns=['Class', 'Location'])
y_real = real['Class']
# Fit a separate scaler per dataset so each keeps its own mean and variance
scaler_syn = StandardScaler()
scaler_real = StandardScaler()
X_syn_scaled = scaler_syn.fit_transform(X_syn)
X_real_scaled = scaler_real.fit_transform(X_real)

## PCA (Principal Component Analysis)

In [None]:
# Project the standardized data onto its first two principal components
pca = PCA(n_components=2)
X_syn_pca = pca.fit_transform(X_syn_scaled)
plt.figure(figsize=(6,5))
sns.scatterplot(x=X_syn_pca[:,0], y=X_syn_pca[:,1], hue=y_syn, palette='Set1')
plt.title("PCA - Synthetic")
plt.show()
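
A two-component scatter plot is only as informative as the variance those two components retain. The sketch below shows one way to check this through the `explained_variance_ratio_` attribute of a fitted `PCA`. The data comes from `make_blobs` as a stand-in for `X_syn_scaled` (an assumption, so that the block runs on its own), and the `_demo` names are hypothetical.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): in the notebook, use X_syn_scaled instead
X_demo, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=42)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

pca_demo = PCA(n_components=2)
X_demo_pca = pca_demo.fit_transform(X_demo_scaled)

# Fraction of the total variance captured by each retained component
ratios = pca_demo.explained_variance_ratio_
retained = float(ratios.sum())
print(f"PC1: {ratios[0]:.3f}, PC2: {ratios[1]:.3f}, total: {retained:.3f}")
```

If the total is low, the 2-D plot may hide structure that lives in the discarded components, so separation (or lack of it) in the plot should be read with that in mind.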

## t-SNE

In [None]:
# Embed in 2-D; perplexity loosely sets the effective number of neighbors per point
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_syn_tsne = tsne.fit_transform(X_syn_scaled)
plt.figure(figsize=(6,5))
sns.scatterplot(x=X_syn_tsne[:,0], y=X_syn_tsne[:,1], hue=y_syn, palette='Set1')
plt.title("t-SNE - Synthetic")
plt.show()
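
t-SNE embeddings can change noticeably with `perplexity`, so a single value of 30 may not tell the whole story. As a rough sketch, the block below compares two perplexities using scikit-learn's `trustworthiness` score, which measures how well local neighborhoods are preserved (values closer to 1 are better). The `make_blobs` data and the `_demo` names are assumptions standing in for `X_syn_scaled`.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE, trustworthiness
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): in the notebook, use X_syn_scaled instead
X_demo, _ = make_blobs(n_samples=150, n_features=5, centers=3, random_state=42)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

scores = {}
for perplexity in (5, 30):
    tsne_demo = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    embedding = tsne_demo.fit_transform(X_demo_scaled)
    # Share of each point's 5 nearest embedded neighbors that are also close in the input space
    scores[perplexity] = trustworthiness(X_demo_scaled, embedding, n_neighbors=5)
    print(f"perplexity={perplexity}: trustworthiness={scores[perplexity]:.3f}")
```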

## K-means Clustering

In [None]:
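# Cluster in the full standardized feature space; n_init is set explicitly
# because its default value changed across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_syn_scaled)
plt.figure(figsize=(6,5))
# Show the cluster assignments on the PCA projection computed above
sns.scatterplot(x=X_syn_pca[:,0], y=X_syn_pca[:,1], hue=labels, palette='tab10')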
plt.title("K-means (k=3) - Synthetic")
plt.show()
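
The choice `k=3` above is fixed by hand. One possible way to sanity-check it, sketched below, is to compare silhouette scores over a range of `k` and, since class labels are available, to measure agreement with them through the adjusted Rand index. The `make_blobs` data stands in for `X_syn_scaled` and `y_syn` (an assumption), and the `_demo` names are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): in the notebook, use X_syn_scaled and y_syn
X_demo, y_demo = make_blobs(n_samples=300, n_features=5, centers=3, random_state=42)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

# Mean silhouette for each candidate number of clusters
silhouettes = {}
for k in range(2, 7):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo_scaled)
    silhouettes[k] = silhouette_score(X_demo_scaled, labels_k)

best_k = max(silhouettes, key=silhouettes.get)

# Agreement between the k=3 partition and the known classes
labels_3 = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_demo_scaled)
ari = adjusted_rand_score(y_demo, labels_3)
print(f"best k by silhouette: {best_k}, ARI at k=3: {ari:.3f}")
```

The adjusted Rand index ignores how cluster IDs are numbered, which is why it is used here rather than plain accuracy.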

## Agglomerative Hierarchical Clustering (AHC)

In [None]:
# Average linkage (UPGMA) on Euclidean distances, the default metric
linkage_matrix = linkage(X_syn_scaled, method='average')
plt.figure(figsize=(10, 6))
dendrogram(linkage_matrix)
plt.title("Dendrogram (UPGMA) - Synthetic")
plt.show()
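
A dendrogram shows the merge hierarchy but does not by itself assign points to clusters. A minimal sketch of turning the tree into flat labels with SciPy's `fcluster` follows; the target of 3 clusters, the `make_blobs` stand-in data, and the `_demo` names are all assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): in the notebook, use X_syn_scaled instead
X_demo, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=42)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

Z_demo = linkage(X_demo_scaled, method='average')

# Cut the tree so that at most 3 flat clusters remain
flat_labels = fcluster(Z_demo, t=3, criterion='maxclust')

cluster_ids, sizes = np.unique(flat_labels, return_counts=True)
print(dict(zip(cluster_ids.tolist(), sizes.tolist())))
```

The resulting labels can be passed as `hue` to the same PCA scatter plot used for K-means, which makes the two partitions directly comparable.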

## Autoencoder

In [None]:
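input_dim = X_syn_scaled.shape[1]
# Encoder: input -> 8 hidden units -> 2-D linear bottleneck
input_layer = Input(shape=(input_dim,))
encoded = Dense(8, activation='relu')(input_layer)
bottleneck = Dense(2, activation='linear')(encoded)
# Decoder: mirrors the encoder back to the input dimension
decoded = Dense(8, activation='relu')(bottleneck)
output_layer = Dense(input_dim, activation='linear')(decoded)
autoencoder = Model(input_layer, output_layer)
# The encoder shares its layers with the autoencoder, so training one trains both
encoder = Model(input_layer, bottleneck)
autoencoder.compile(optimizer='adam', loss='mse')
# Train the network to reconstruct its own input
autoencoder.fit(X_syn_scaled, X_syn_scaled, epochs=100, batch_size=16, verbose=0)
# 2-D codes taken from the bottleneck layer
encoded_data = encoder.predict(X_syn_scaled)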
plt.figure(figsize=(6,5))
sns.scatterplot(x=encoded_data[:,0], y=encoded_data[:,1], hue=y_syn, palette='Set1')
plt.title("Autoencoder - Synthetic")
plt.show()

## Self-Organizing Maps (SOM)

In [None]:
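# 10x10 map; random_seed makes the initialization and sampling repeatable
som = MiniSom(x=10, y=10, input_len=X_syn_scaled.shape[1], sigma=1.0, learning_rate=0.5, random_seed=42)
# Initialize the weights from randomly chosen training samples
som.random_weights_init(X_syn_scaled)
som.train_random(X_syn_scaled, 1000)
plt.figure(figsize=(7,7))
# U-matrix: normalized distance between each neuron and its neighbors
plt.pcolor(som.distance_map().T, cmap='bone_r')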
plt.title("U-Matrix - SOM Synthetic")
plt.colorbar()
plt.show()