**Dimensionality Reduction Steps Using the PCA Method**

**1. Standardizing the Dataset:**

* This step involves transforming all variables in the dataset so that they have a mean of zero and a standard deviation of one.

**2. Computing the Covariance Matrix:**

* Based on the standardized data, the covariance matrix is calculated to capture the relationships (covariances) between variables.

**3. Calculating Eigenvalues and Eigenvectors of the Covariance Matrix:**

* The eigenvalues and their corresponding eigenvectors of the covariance matrix are computed to reveal the principal directions and the amount of variance explained by each.

**4. Sorting Eigenvalues and Selecting Eigenvectors:**

* The eigenvalues are sorted in descending order, and the top eigenvectors corresponding to the largest eigenvalues are selected according to the desired number of dimensions.

**5. Constructing the New Feature Matrix:**

* A new matrix is constructed using the selected eigenvectors, which represent the original data in a lower-dimensional space.

**6. Reducing Dimensionality and Forming the Transformed Dataset:**

* The original dataset is projected onto the selected eigenvectors to obtain the reduced-dimensional representation.

In [5]:
import numpy as np
import pandas as pd

In [7]:
#Creating Data Set
np.random.seed(42)

data = {
    'X1': np.random.rand(100),
    'X2': np.random.rand(100),
    'X3': np.random.rand(100),
    'X4': np.random.rand(100),
    'X5': np.random.rand(100),
    'Y': np.random.rand(100) #dependent variable
}

In [8]:
df = pd.DataFrame(data)

In [10]:
# 1. Standardizing the Dataset:

for column in df.columns[:-1]: # Excluding the target variable 'Y'
    mean_value = np.mean(df[column])
    std_dev = np.std(df[column])
    df[column] = (df[column] - mean_value)/std_dev

**Covariance Matrix**

$$
Cov = \frac{{1}}{{n}} X. X^T
$$

In [11]:
# 2. Computing the Covariance Matrix

X = df.drop('Y', axis = 1)

# COV = (X^T . X) / n
Cov = (X.T @ X) / len(X)

In [14]:
# 3. Calculating Eigenvalues and Eigenvectors of the Covariance Matrix:

eigenvalues, eigenvectors = np.linalg.eig(Cov)
eigenvalues, eigenvectors

(array([0.59174262, 0.81715522, 1.32991061, 1.09562139, 1.16557016]),
 array([[ 0.49930166,  0.15028647, -0.29914709, -0.6477717 , -0.46799003],
        [ 0.12625737, -0.63952093,  0.16492762, -0.51282615,  0.53374176],
        [ 0.40124055, -0.54211892, -0.53341401,  0.50090518, -0.09836966],
        [ 0.65144057,  0.03238792,  0.70933424,  0.25010438, -0.09417506],
        [-0.38650761, -0.52295798,  0.30923184, -0.06279296, -0.69105682]]))

In [20]:
# 4. Sorting Eigenvalues and Selecting Eigenvectors:

sorted_indexes = np.argsort(eigenvalues)[::-1]
sorted_eigenvalues = eigenvalues[sorted_indexes]
sorted_eigenvectors = eigenvectors[sorted_indexes]

In [21]:
# Selecting Principal Components Based on Desired Dimension

new_dimension = 2
principal_components = sorted_eigenvectors[:new_dimension]

In [22]:
# 5. Constructing the New Feature Matrix:

X_new = X @ principal_components.T

In [26]:
# 6. Reducing Dimensionality and Forming the Transformed Dataset:

reduced_data = pd.concat([X_new, df['Y']], axis = 1, ignore_index = True)
reduced_data

Unnamed: 0,0,1,2
0,-0.115801,2.087674,0.698162
1,1.134779,-2.186291,0.536096
2,1.434576,-0.377020,0.309528
3,-0.387193,-0.491669,0.813795
4,-0.885918,0.146332,0.684731
...,...,...,...
95,1.104447,0.500707,0.473962
96,-0.204867,-0.436725,0.667558
97,-0.219252,-0.011057,0.172320
98,-1.215498,-1.060753,0.192289


### Dimensionality Reduction with PCA in the Scikit-learn Library

In [27]:
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

In [28]:
# Set the seed for reproducibility
np.random.seed(42)

# Create the dataset
data = {
    'X1': np.random.rand(100),
    'X2': np.random.rand(100),
    'X3': np.random.rand(100),
    'X4': np.random.rand(100),
    'X5': np.random.rand(100),
    'Y': np.random.rand(100)
}

# Convert the dataset to a Pandas DataFrame
df = pd.DataFrame(data)

# Independent variable matrix from the standardized dataset (excluding 'Y')
X = df.drop('Y', axis=1)

In [29]:
# Creating the PCA model
pca = PCA(n_components=2)

In [30]:
# Fitting the model and transforming the data
X_new = pca.fit_transform(X)

In [32]:
df_dimensional_reduced = pd.concat([pd.DataFrame(X_new), df['Y']], axis=1, ignore_index=True)
df_dimensional_reduced

Unnamed: 0,0,1,2
0,-0.585185,0.076580,0.698162
1,0.407338,0.292742,0.536096
2,0.145770,0.101999,0.309528
3,0.127105,0.316258,0.813795
4,0.011463,-0.504780,0.684731
...,...,...,...
95,0.168661,-0.137770,0.473962
96,0.166553,-0.055927,0.667558
97,-0.200797,-0.508441,0.172320
98,0.459662,-0.214892,0.192289
