9) Implement PCA to visualize the built-in wine dataset.

Principal Component Analysis (PCA) is a dimensionality reduction technique commonly used to analyze and visualize high-dimensional data. It transforms a set of correlated variables into a set of uncorrelated variables called principal components, ordered so that each one captures as much of the remaining variance as possible. Keeping only the first few components gives a lower-dimensional view that is easier to interpret while preserving most of the variance in the data.

Key Concepts in PCA:
Variance and Covariance:

Variance measures the spread of data along a particular axis.
Covariance indicates how much two variables change together. If variables are highly correlated, PCA can transform them into a set of uncorrelated components.
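
To make the covariance idea concrete, here is a minimal sketch on toy data (the array below is generated for illustration and is not the wine dataset). It assumes features are standardized with the population standard deviation, in which case the covariance matrix of the standardized data equals the correlation matrix of the original data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative only): 200 samples, 3 features.
# The second feature is built to track the first; the third is independent noise.
a = rng.normal(size=200)
X = np.column_stack([
    a,
    2.0 * a + rng.normal(scale=0.5, size=200),
    rng.normal(size=200),
])

# Standardize each feature to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance of the standardized data (ddof=0 matches the std used above).
cov = np.cov(Z, rowvar=False, ddof=0)

# Correlation matrix of the original data, for comparison.
corr = np.corrcoef(X, rowvar=False)

print(np.round(cov, 3))
```

The large off-diagonal entry between the first two features is the kind of redundancy PCA folds into a single component.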
Eigenvectors and Eigenvalues:

PCA computes eigenvectors and eigenvalues from the covariance matrix of the data.
The eigenvectors define the directions of the new feature space (the principal components), while the eigenvalues describe the amount of variance captured by each principal component.
The principal components are ordered by their eigenvalues, meaning the first principal component captures the most variance, the second captures the second most, and so on.
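
The eigenvector description above can be sketched directly with NumPy. This is an illustrative version on toy data, with a hypothetical helper name `pca_via_eigh`; scikit-learn's `PCA` arrives at the same quantities through a singular value decomposition rather than an explicit eigendecomposition.

```python
import numpy as np

def pca_via_eigh(Z):
    """Eigenvalues (descending) and matching eigenvectors (columns) of the covariance of Z."""
    cov = np.cov(Z, rowvar=False)
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(42)
# Toy data (illustrative only): mixing independent columns makes the features correlated.
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals, eigvecs = pca_via_eigh(Z)

# Share of total variance captured by each component.
ratio = eigvals / eigvals.sum()

# Project the data onto the principal directions.
scores = Z @ eigvecs

print(np.round(ratio, 3))
```

The variance of each column of `scores` equals the corresponding eigenvalue, which is why sorting by eigenvalue sorts the components by variance captured.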
Dimensionality Reduction:

By keeping only the first few principal components (those with the highest variance), PCA reduces the number of dimensions of the dataset, simplifying the data for visualization or further analysis.
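
One common way to decide how many components to keep is a cumulative-variance threshold. The sketch below uses an illustrative threshold of 0.95 (an assumption, not a value from this exercise) and compares a manual count against scikit-learn's option of passing a fraction as `n_components` with `svd_solver="full"`.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_wine().data)

threshold = 0.95  # illustrative choice, not prescribed by the exercise

# Manual route: smallest k whose cumulative explained variance reaches the threshold.
cumulative = np.cumsum(PCA().fit(X_scaled).explained_variance_ratio_)
k_manual = int(np.argmax(cumulative >= threshold) + 1)

# scikit-learn route: a float in (0, 1) asks PCA to pick the number of components itself.
pca_auto = PCA(n_components=threshold, svd_solver="full").fit(X_scaled)
X_reduced = pca_auto.transform(X_scaled)

print(k_manual, pca_auto.n_components_, X_reduced.shape)
```

For plotting, two or three components are chosen regardless of the threshold; the threshold is more relevant when the reduced data feeds a later model.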
Orthogonality of Principal Components:

Each principal component direction is orthogonal (perpendicular) to the others, and because the directions are eigenvectors of the covariance matrix, the data projected onto them are uncorrelated.
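
Both properties can be checked numerically. The following sketch fits scikit-learn's `PCA` on the standardized wine data and tests that the rows of `components_` are orthonormal and that the projected data have a diagonal covariance matrix; the tolerance is an arbitrary choice for floating-point noise.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_wine().data)
pca = PCA()
scores = pca.fit_transform(X_scaled)

# Rows of components_ are the principal directions; their Gram matrix should be the identity.
gram = pca.components_ @ pca.components_.T
directions_orthonormal = np.allclose(gram, np.eye(gram.shape[0]))

# The covariance matrix of the projected data should have (numerically) zero off-diagonals.
score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))
scores_uncorrelated = np.allclose(off_diagonal, 0.0, atol=1e-8)

print(directions_orthonormal, scores_uncorrelated)
```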
Feature Importance:

The weights (loadings) of each principal component show how strongly each original feature contributes to it, so they can be used to determine which features account for most of the variance in the data.
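
As a rough illustration of reading feature contributions, the sketch below lists the features with the largest absolute weights in the first two components of the standardized wine data. The helper `top_features` and the cutoff of three features are hypothetical choices for this example. The sign of a component is arbitrary, so only the magnitudes and relative signs are meaningful.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)
pca = PCA(n_components=2).fit(X_scaled)

def top_features(component, feature_names, k=3):
    """Return the k (name, weight) pairs with the largest absolute weight."""
    order = np.argsort(np.abs(component))[::-1][:k]
    return [(feature_names[i], float(component[i])) for i in order]

# Each row of components_ holds one weight per original feature.
for idx, component in enumerate(pca.components_, start=1):
    print(f"PC{idx}:")
    for name, weight in top_features(component, wine.feature_names):
        print(f"  {name:30s} {weight:+.3f}")
```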

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
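# Only needed on older Matplotlib versions, where this import registers the '3d' projection
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401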

# Load the wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Standardize the features (zero mean, unit variance) so large-scale features do not dominate
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA, keeping all 13 components so the full variance spectrum is available
pca = PCA()
X_pca = pca.fit_transform(X_scaled)

# Calculate the explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_

# Plot the cumulative explained variance ratio
plt.figure(figsize=(10, 6))
cumulative_variance_ratio = np.cumsum(explained_variance_ratio)
plt.plot(range(1, len(cumulative_variance_ratio) + 1), cumulative_variance_ratio, 'bo-')
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Explained Variance Ratio')
plt.title('Cumulative Explained Variance Ratio vs. Number of Components')
plt.grid(True)
plt.show()

# 2D visualization
plt.figure(figsize=(12, 8))
colors = ['r', 'g', 'b']
for color, i, target_name in zip(colors, [0, 1, 2], wine.target_names):
    plt.scatter(X_pca[y == i, 0], X_pca[y == i, 1], color=color, alpha=.8, lw=2,
                label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('PCA of Wine Dataset (2 components)')
plt.xlabel(f'First Principal Component ({explained_variance_ratio[0]:.1%} of variance)')
plt.ylabel(f'Second Principal Component ({explained_variance_ratio[1]:.1%} of variance)')
plt.show()

# 3D visualization
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')
for color, i, target_name in zip(colors, [0, 1, 2], wine.target_names):
    ax.scatter(X_pca[y == i, 0], X_pca[y == i, 1], X_pca[y == i, 2], color=color, alpha=.8,
               label=target_name)
ax.legend(loc='best', shadow=False, scatterpoints=1)
ax.set_title('PCA of Wine Dataset (3 components)')
ax.set_xlabel(f'First Principal Component ({explained_variance_ratio[0]:.1%} of variance)')
ax.set_ylabel(f'Second Principal Component ({explained_variance_ratio[1]:.1%} of variance)')
ax.set_zlabel(f'Third Principal Component ({explained_variance_ratio[2]:.1%} of variance)')
plt.show()

# Print the explained variance ratio for each component
for i, ratio in enumerate(explained_variance_ratio):
    print(f"PC{i+1} explained variance ratio: {ratio:.4f}")

# Print the cumulative explained variance ratio for 2 and 3 components
print(f"\nCumulative explained variance ratio (2 components): {cumulative_variance_ratio[1]:.4f}")
print(f"Cumulative explained variance ratio (3 components): {cumulative_variance_ratio[2]:.4f}")
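
The standardization step in the code above matters because the wine features are measured on very different numeric scales (proline, for example, takes values in the hundreds or thousands). The sketch below is a side experiment, not part of the original exercise: it fits PCA with and without scaling and compares how much variance the first component claims.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_wine().data

# PCA on the raw features: large-scale features dominate the variance.
ratio_raw = PCA().fit(X).explained_variance_ratio_

# PCA on standardized features: every feature starts with unit variance.
ratio_scaled = PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_

print(f"PC1 share, raw features:          {ratio_raw[0]:.4f}")
print(f"PC1 share, standardized features: {ratio_scaled[0]:.4f}")
```

Without scaling, the first component mostly tracks whichever feature has the largest raw variance, so the resulting plots say little about the joint structure of the data.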