MANJIT SINGH T117

AIM - Principal Component Analysis (PCA)

- Perform PCA on a dataset to reduce dimensionality.
- Evaluate the explained variance and select the appropriate number of principal components.
- Visualize the data in the reduced-dimensional space.

### 1\. Reducing Features Using Principal Components

In [5]:
# Load libraries
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA
import pandas as pd

# Load the data
# We use pandas here because the data is a CSV file
df = pd.read_csv('creditcard.csv')

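# Preprocessing: label-encode every column as integer codes.
# Note: if the columns are already numeric (as in the usual creditcard.csv fraud data),
# LabelEncoder replaces each value with its rank among the column's sorted unique values.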
le = LabelEncoder()
for col in df.columns:
    df[col] = le.fit_transform(df[col])

# Define Feature Matrix (X) and Target Vector (y)
X = df.drop('Class', axis=1)
y = df['Class']

# Standardize the feature matrix
X_std = StandardScaler().fit_transform(X)

# Create a PCA that will retain 99% of the variance
pca = PCA(n_components=0.99, whiten=True)

# Conduct PCA
X_pca = pca.fit_transform(X_std)

# Show results
print('Original number of features:', X.shape[1])
print('Reduced number of features:', X_pca.shape[1])

Original number of features: 30
Reduced number of features: 28
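
The aim also calls for evaluating the explained variance before settling on a number of components. A minimal sketch of that step follows. It assumes synthetic data from `make_classification` as a stand-in for the real feature matrix, and the names `X_demo`, `pca_full`, `cum_var`, and `n_99` are illustrative, not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix (assumption, not the notebook's data)
X_demo, _ = make_classification(
    n_samples=500, n_features=30, n_informative=10, n_redundant=10, random_state=0
)
X_demo_std = StandardScaler().fit_transform(X_demo)

# Fit PCA with all components kept so the full variance spectrum is available
pca_full = PCA().fit(X_demo_std)

# Cumulative share of variance explained by the first k components
cum_var = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest k whose cumulative explained variance reaches 99%
n_99 = int(np.searchsorted(cum_var, 0.99) + 1)

print('Components needed for 99% variance:', n_99)
print('Cumulative variance at that point:', round(float(cum_var[n_99 - 1]), 4))
```

Passing a float such as `n_components=0.99` to `PCA` applies the same kind of threshold internally. Inspecting `cum_var` directly additionally shows how quickly the curve flattens, which helps when a lower threshold might be acceptable.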


### 2\. Reducing Features When Data Is Linearly Inseparable

In [3]:
# Load libraries
from sklearn.decomposition import KernelPCA

# Reuse the standardized matrix X_std created in the previous step.
# We take only the first 1,000 samples to speed up processing,
# as Kernel PCA builds an N x N kernel matrix, which can be slow on large datasets.
X_subset = X_std[:1000]

# Apply kernel PCA with a radial basis function (RBF) kernel
kpca = KernelPCA(kernel="rbf", gamma=15, n_components=1)
X_kpca = kpca.fit_transform(X_subset)

# Show results
print('Original number of features:', X_subset.shape[1])
print('Reduced number of features:', X_kpca.shape[1])

Original number of features: 30
Reduced number of features: 1
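
The comment in the cell above notes that Kernel PCA works with an N x N matrix. The sketch below makes that cost concrete and shows one possible alternative to slicing the first 1,000 rows: drawing a random subset. It assumes a dense float64 kernel matrix (8 bytes per entry) and uses a random array `X_demo` as a hypothetical stand-in for `X_std`; the helper `kernel_matrix_bytes` is illustrative, not a scikit-learn function.

```python
import numpy as np

def kernel_matrix_bytes(n_samples, bytes_per_value=8):
    """Approximate size in bytes of a dense n x n kernel matrix."""
    return n_samples * n_samples * bytes_per_value

# Memory grows quadratically with the number of samples
for n in (1_000, 10_000, 100_000):
    print(f'{n:>7} samples -> {kernel_matrix_bytes(n) / 1e9:.3f} GB')

# Hypothetical stand-in for X_std (shape chosen for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5_000, 30))

# Draw 1,000 distinct row indices at random instead of taking the first 1,000 rows
idx = rng.choice(X_demo.shape[0], size=1_000, replace=False)
X_random_subset = X_demo[idx]

print('Subset shape:', X_random_subset.shape)
```

Under these assumptions, 1,000 samples need about 8 MB for the kernel matrix, while 100,000 samples would need about 80 GB. A random subset may be more representative than the leading rows when the file is ordered, for example by time.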


### 3\. Reducing Features by Maximizing Class Separability

In [4]:
# Load libraries
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Create an LDA that will reduce the data down to 1 feature
# (LDA can keep at most min(n_classes - 1, n_features) components)
lda = LinearDiscriminantAnalysis(n_components=1)

# Fit the LDA and use it to transform the features
# LDA is supervised, so we must pass y (the class targets)
X_lda = lda.fit(X_std, y).transform(X_std)

# Print the number of features
print('Original number of features:', X_std.shape[1])
print('Reduced number of features:', X_lda.shape[1])

# View the share of between-class variance explained by each retained component
print('Explained variance ratio:', lda.explained_variance_ratio_)

Original number of features: 30
Reduced number of features: 1
Explained variance ratio: [0.99725161]
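
The aim also lists visualizing the data in the reduced-dimensional space. LDA keeps at most `n_classes - 1` components, so a two-class target such as `Class` (assuming it is binary) allows only a one-dimensional projection. The sketch below therefore uses synthetic three-class data from `make_classification` to obtain a 2-D picture. The data, the variable names, and the use of `matplotlib` are all assumptions for illustration and do not come from the notebook.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data (assumption: stands in for a real labelled dataset)
X_demo, y_demo = make_classification(
    n_samples=600, n_features=30, n_informative=10, n_redundant=0,
    n_classes=3, random_state=0
)
X_demo_std = StandardScaler().fit_transform(X_demo)

# With three classes, LDA can keep at most n_classes - 1 = 2 components
lda_demo = LinearDiscriminantAnalysis(n_components=2)
X_demo_lda = lda_demo.fit_transform(X_demo_std, y_demo)

print('Reduced shape:', X_demo_lda.shape)
print('Explained variance ratio:', lda_demo.explained_variance_ratio_)

# Scatter plot of the two discriminant axes, coloured by class
fig, ax = plt.subplots(figsize=(6, 4))
for label in np.unique(y_demo):
    mask = y_demo == label
    ax.scatter(X_demo_lda[mask, 0], X_demo_lda[mask, 1], s=10, label=f'class {label}')
ax.set_xlabel('Discriminant 1')
ax.set_ylabel('Discriminant 2')
ax.legend()
plt.show()
```

The same plotting pattern should work for the first two columns of a PCA or Kernel PCA output, provided at least two components are kept.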
