# SIB - Portfolio of Machine Learning Algorithms

## Exercise 5: Implementing PCA

### 5.1) 
Add the "PCA" class to the "pca.py" module in the "decomposition" sub-package.

PCA is a linear algebra technique used to reduce the dimensionality of a dataset. The PCA to be implemented must use eigenvalue decomposition of the covariance matrix of the data. Consider the PCA class structure:

class PCA(Transformer):
- parameters:
  - n_components – number of components
- estimated parameters:
  - mean – mean of the samples
  - components – the principal components (eigenvectors)
  - explained_variance – amount of variance explained by each component
- methods:
  - _fit – estimates the mean, components, and explained variance
  - _transform – calculates the reduced dataset using the components
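
The structure above can be illustrated with a minimal NumPy sketch. The `Transformer` base class and the dataset object of the `si` package are not shown in this exercise, so the sketch works on plain arrays, and the function names `pca_fit` and `pca_transform` are hypothetical stand-ins for `_fit` and `_transform`. It also assumes that the explained variance is stored as a ratio of the total variance, which is how the values in 5.2 are reported.

```python
import numpy as np


def pca_fit(X, n_components):
    """Estimate the mean, components and explained variance ratio (illustrative sketch)."""
    # Center the data: PCA works on deviations from the per-feature mean
    mean = X.mean(axis=0)
    X_centered = X - mean

    # Covariance matrix of the features (columns are variables)
    cov = np.cov(X_centered, rowvar=False)

    # eigh is intended for symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # Sort from largest to smallest eigenvalue and keep the top n_components
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order].T  # one component per row
    explained_variance = eigenvalues[order] / eigenvalues.sum()  # assumption: stored as a ratio

    return mean, components, explained_variance


def pca_transform(X, mean, components):
    """Project the centered data onto the principal components."""
    return (X - mean) @ components.T


# Small synthetic example (random data, not the iris dataset)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
mean, components, explained_variance = pca_fit(X, n_components=2)
X_reduced = pca_transform(X, mean, components)
print(X_reduced.shape)  # (100, 2)
```

`np.linalg.eigh` is used here instead of `np.linalg.eig` because the covariance matrix is symmetric, so the eigenvalues come back real and sorted. Either function satisfies the eigenvalue decomposition requirement.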

### 5.2) 
Test the PCA class using the iris.csv dataset (classification). Analyze the explained variance ratio of each principal component and verify how much information is retained with 2 components.

In [2]:
import sys
sys.path.append('C:/Users/dases/Desktop/SI/repositorio/si-2/src')

import numpy as np
from si.decomposition.pca import PCA
from si.io.csv_file import read_csv

# Load the iris dataset
iris = read_csv("../datasets/iris/iris.csv", features=True, label=True)

# Initialize PCA with 2 components
pca = PCA(n_components=2)

# Fit and transform the data
transformed_iris = pca.fit_transform(iris)

# Print results
print("Original shape:", iris.X.shape)
print("Transformed shape:", transformed_iris.X.shape)

print("\nExplained variance ratio:")
for i, var in enumerate(pca.explained_variance):
    print(f"PC{i+1}: {var:.4f}")

print("\nNew features:", transformed_iris.features)

# Print the first samples of the transformed data and the overall PCA results
print("\nFirst 5 samples of transformed data:")
print(transformed_iris.X[:5])

print("Principal components:")
print(pca.components)

print("Explained Variance:")
print(pca.explained_variance)

Original shape: (150, 4)
Transformed shape: (150, 2)

Explained variance ratio:
PC1: 0.9246
PC2: 0.0530

New features: ['PC1', 'PC2']

First 5 samples of transformed data:
[[-8.19555022  4.98811642]
 [-8.22673371  5.48428058]
 [-8.40116264  5.45206934]
 [-8.2577803   5.62584805]
 [-8.23993608  4.98079917]]
Principal components:
[[ 0.36158968 -0.08226889  0.85657211  0.35884393]
 [-0.65653988 -0.72971237  0.1757674   0.07470647]]
Explained Variance:
[0.92461621 0.05301557]


The variance explained by each principal component was checked: PC1 accounts for about 92.5% of the variance and PC2 for about 5.3%, so 2 components together retain roughly 97.8% of the information in the original 4 features.
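
As a quick check of how much information the 2 components retain, the explained variance values printed in the output above can be summed cumulatively. This is a small sketch that hard-codes those printed values instead of reading them from the fitted `pca` object.

```python
import numpy as np

# Explained variance values copied from the printed output above
explained_variance = np.array([0.92461621, 0.05301557])

# Running total of the variance retained as components are added
cumulative = np.cumsum(explained_variance)
for i, total in enumerate(cumulative, start=1):
    print(f"First {i} component(s): {total:.4f}")
# First 1 component(s): 0.9246
# First 2 component(s): 0.9776
```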