# Dimensionality Reduction Using PCA
PCA (Principal Component Analysis) is a technique that reduces the dimensionality of data by projecting it onto a lower-dimensional subspace spanned by the directions of greatest variance.
For example, a 3D dataset can be projected onto a 2D subspace.

#### Why do we need Dimensionality Reduction?
- Visualization: It is difficult to visualize data in higher dimensions.
- Computation: It is computationally expensive to work with high-dimensional data.
- Curse of Dimensionality: As the number of dimensions increases, the number of data points required to generalize accurately grows exponentially.

#### Steps involved in PCA:
1. Standardize the data.
2. Compute the covariance matrix.
3. Compute the eigenvectors and eigenvalues of the covariance matrix.
4. Sort the eigenvectors by decreasing eigenvalues and choose the top k eigenvectors.
5. Construct the projection matrix W from the selected k eigenvectors.
6. Transform the original dataset X via W to obtain a k-dimensional feature subspace Y.
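
The six steps above can be sketched directly in NumPy. This is a minimal illustration, not a replacement for a library implementation: the function name `pca_from_scratch` and the synthetic `X_demo` array are made up for this example, and the sketch assumes every feature has non-zero variance so that standardization does not divide by zero.

```python
import numpy as np

def pca_from_scratch(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components."""
    # 1. Standardize each feature to zero mean and unit variance
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the features, shape (n_features, n_features)
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition; eigh is meant for symmetric matrices
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort by decreasing eigenvalue and keep the indices of the top k
    order = np.argsort(eigvals)[::-1][:k]
    # 5. Projection matrix W, shape (n_features, k)
    W = eigvecs[:, order]
    # 6. Transform the data into the k-dimensional subspace
    Y = X_std @ W
    return Y, W, eigvals[order]

# Synthetic data (an assumption for illustration): 100 samples, 3 features,
# with the third feature made almost redundant with the first
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
X_demo[:, 2] = X_demo[:, 0] + 0.1 * X_demo[:, 2]

Y_demo, W_demo, top_eigvals = pca_from_scratch(X_demo, k=2)
print(Y_demo.shape)  # (100, 2)
```

The columns of `W_demo` are orthonormal, and the covariance of `Y_demo` is diagonal with the selected eigenvalues on the diagonal, which is what makes the new axes uncorrelated.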

#### PCA in Python:
```python
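from sklearn.decomposition import PCA

# X is assumed to be a numeric array of shape (n_samples, n_features)
pca = PCA(n_components=2)  # keep the two directions of greatest variance
X_pca = pca.fit_transform(X)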
```

#### Dimensionality Reduction in Python:
```python
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Load the data
data = pd.read_csv('data.csv')
X = data.iloc[:, :-1].values  # features: every column except the last (data.drop('target', axis=1) also works, depending on the dataset)
y = data.iloc[:, -1].values  # labels: the last column (can also use data['target'])

# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)  # the data reduced to 2 dimensions
```
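
Step 1 of the list above asks for standardization, but scikit-learn's `PCA` only centers the data and does not rescale it. One way to add the scaling step is to chain `StandardScaler` and `PCA` in a pipeline. The sketch below uses a synthetic array `X_demo` (an assumption for illustration, standing in for the real features) whose columns live on very different scales, and compares the explained variance with and without scaling.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X: 200 samples, 4 independent features on different scales
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])

# Without scaling, the feature with the largest variance dominates the first component
unscaled_ratio = PCA(n_components=2).fit(X_demo).explained_variance_ratio_

# With scaling, every feature contributes on an equal footing
pipeline = make_pipeline(StandardScaler(), PCA(n_components=2))
X_scaled_pca = pipeline.fit_transform(X_demo)
scaled_ratio = pipeline.named_steps["pca"].explained_variance_ratio_

print(X_scaled_pca.shape)
print("unscaled:", unscaled_ratio)
print("scaled:  ", scaled_ratio)
```

`explained_variance_ratio_` reports the fraction of total variance captured by each kept component, which is a useful check on how much information survives the reduction.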

# Example
We have a dataset, `programming language trend over time.csv`, which contains the trends of programming languages over different years. We will use PCA to reduce the dimensionality of the data so that it can be visualized in 2D.


In [2]:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the data
data = pd.read_csv('../programming language trend over time.csv')
X = data.iloc[:, 1:].values  # Features: all rows, every column from index 1 onwards
y = data.iloc[:, 0].values  # Target variable: all rows of the first column


# Apply PCA
pca = PCA(n_components=2)  # Specify the number of components
transformed_data = pca.fit_transform(X)  # Fit and transform the data





#### Original shape

In [28]:
data.shape

(262, 4)

#### Shape after dimensionality reduction

In [3]:
transformed_data.shape

(262, 2)
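
To actually visualize the reduced data in 2D, the two principal components can be drawn as a scatter plot. The sketch below is self-contained and therefore uses a synthetic array `X_demo` with 262 rows and 3 feature columns as a stand-in for the real CSV (an assumption for illustration); with the real data, `transformed_data` from the cell above would take the place of `points`. Colouring by row index is also just an illustrative choice.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Synthetic stand-in with the same number of rows as the dataset above
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(262, 3))
row_index = np.arange(X_demo.shape[0])

# Reduce to two components
pca = PCA(n_components=2)
points = pca.fit_transform(X_demo)

# Scatter plot of the two components, coloured by row order
fig, ax = plt.subplots(figsize=(6, 4))
scatter = ax.scatter(points[:, 0], points[:, 1], c=row_index, cmap="viridis", s=15)
ax.set_xlabel("Principal component 1")
ax.set_ylabel("Principal component 2")
ax.set_title("Data projected onto the first two principal components")
fig.colorbar(scatter, ax=ax, label="Row index")
# In a notebook the figure renders when the cell finishes; in a script, call plt.show()
```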