## 4. Dimensionality Reduction - PCA

#### 1. Load the data

In [1]:
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data  # features: sepal and petal length and width


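#### 2. Standardize the data
Note: PCA is sensitive to feature scale. Standardizing each feature to zero mean and unit variance keeps features with larger numeric ranges from dominating the principal components, so every feature contributes on an equal footing.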


In [2]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
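
As a quick sanity check on the standardization step, the sketch below compares the per-feature spread before and after scaling. It reloads the data so that it runs on its own; the names `col_means` and `col_stds` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data
X_scaled = StandardScaler().fit_transform(X)

# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default for .std().
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

print("Raw feature std:   ", X.std(axis=0).round(3))
print("Scaled feature mean:", col_means.round(6))
print("Scaled feature std: ", col_stds.round(6))
```

After scaling, every column should have a mean of approximately 0 and a standard deviation of approximately 1, so no single feature dominates the variance that PCA tries to capture.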

#### 3. Build and fit the PCA model

In [3]:
from sklearn.decomposition import PCA
pca = PCA(n_components=2)  # reduce to 2 principal components
X_pca = pca.fit_transform(X_scaled)
print("Original shape:", X_scaled.shape)
print("Transformed shape:", X_pca.shape)

Original shape: (150, 4)
Transformed shape: (150, 2)
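
To make the reduction from 4 columns to 2 more concrete, here is a minimal sketch of what `fit_transform` amounts to with the default `whiten=False`: the data is centered with `pca.mean_` and projected onto the rows of `pca.components_`. The variable `X_manual` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Each row of components_ is one principal axis in the original 4-D feature space.
print("components_ shape:", pca.components_.shape)

# Manual projection: center the data, then project onto the principal axes.
X_manual = (X_scaled - pca.mean_) @ pca.components_.T
print("Matches fit_transform:", np.allclose(X_manual, X_pca))
```

The principal axes are orthonormal, so the projection is a rotation of the centered data followed by dropping the directions with the least variance.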


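#### 4. Inspect the variance explained by each principal component
1. `explained_variance_ratio_`: the fraction of the total variance explained by each principal component, which indicates how important each component is.
2. `singular_values_`: the singular value associated with each principal component. Each one equals the 2-norm of the data projected onto that component, so it reflects the component's scale.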

In [4]:
variance_ratio = pca.explained_variance_ratio_
print("Explained variance ratio of each principal component:\n", variance_ratio)
singular_values = pca.singular_values_
print("Singular values of each principal component:\n", singular_values)

Explained variance ratio of each principal component:
 [0.72962445 0.22850762]
Singular values of each principal component:
 [20.92306556 11.7091661 ]
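
The two attributes are closely related. The sketch below, which refits the same model so that it runs on its own, recovers the explained variance from the singular values and accumulates the ratios. The names `n_samples`, `var_from_sv`, `total_var`, and `cumulative` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X_scaled)

n_samples = X_scaled.shape[0]

# Variance along each component: squared singular value divided by (n_samples - 1).
var_from_sv = pca.singular_values_ ** 2 / (n_samples - 1)

# Total variance of the input: sum of per-feature sample variances (ddof=1).
total_var = np.var(X_scaled, axis=0, ddof=1).sum()

print("Variance from singular values:", var_from_sv)
print("Ratio recomputed by hand:     ", var_from_sv / total_var)

# Cumulative share of variance retained by the first k components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("Cumulative explained variance:", cumulative)
```

Summing the two ratios printed above (0.72962445 + 0.22850762) shows that the first two components together retain roughly 95.8% of the variance in the standardized data.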
