# 1. Unsupervised Learning
## 1.1. Principal Component Analysis

### Aim of PCA
The primary aim of PCA is to reduce the dimensionality of a dataset while preserving as much of its variance (information) as possible. This makes the data easier to visualize and analyze.
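
As background, one common way to make this aim precise is to state it as a variance-maximization problem. The symbols below are assumptions introduced for illustration: $\Sigma$ is the covariance matrix of the centered data, $w$ is a unit-length direction, and $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ are the eigenvalues of $\Sigma$.

$$
w_1 = \arg\max_{\|w\| = 1} \; w^\top \Sigma \, w
$$

The maximizer $w_1$ is the eigenvector of $\Sigma$ with the largest eigenvalue $\lambda_1$. The fraction of total variance retained by the first $k$ components is

$$
\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}
$$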

### Objectives of PCA
- **Dimensionality Reduction:** Reduce the number of features in a dataset while retaining significant information.
- **Data Visualization:** Enable visualization of high-dimensional data in 2D or 3D space.
- **Noise Reduction:** Discard low-variance components, which often carry mostly noise.
- **Feature Extraction:** Create new features (principal components) that summarize the original features effectively.
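
The noise-reduction objective can be illustrated with a small sketch. It assumes a hypothetical synthetic dataset (a 1-D signal embedded in 5 dimensions plus Gaussian noise); the variable names and the noise level are made up for illustration and are not part of the Iris example later in this section. The data is projected onto one principal component and mapped back with `inverse_transform`, which drops the variation in the discarded directions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: a 1-D signal along a fixed direction in 5-D space
t = rng.normal(size=(300, 1))
direction = np.array([[1.0, 2.0, -1.0, 0.5, 3.0]])
clean = t @ direction
noisy = clean + rng.normal(scale=0.3, size=clean.shape)

# Keep one component, then map back to the original 5-D space
pca = PCA(n_components=1)
reduced = pca.fit_transform(noisy)         # shape (300, 1)
denoised = pca.inverse_transform(reduced)  # shape (300, 5)

# Mean squared error against the noise-free signal, before and after
err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)
print(err_noisy, err_denoised)
```

In this setup the reconstruction should be closer to the clean signal than the noisy input, because most of the noise lives in the four discarded directions. On real data the split between signal and noise is rarely this clean.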

### Use-Cases
- **Image Compression:** Reducing the size of image files while maintaining quality.
- **Genomics:** Identifying patterns in high-dimensional genetic data.
- **Finance:** Supporting portfolio management by summarizing many correlated assets with a few underlying components.
- **Social Media:** Analyzing user behaviour by reducing the number of features in user data.

### Terminology
- **Principal Components:** New axes in the transformed space that represent the directions of maximum variance.
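- **Eigenvalues:** Indicate the amount of variance carried by each principal component.
- **Eigenvectors:** The eigenvectors of the covariance matrix give the directions of the principal components; the one with the largest eigenvalue points along the direction of greatest variance.
- **Covariance Matrix:** A square matrix whose entries measure how much each pair of features varies together.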
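
The terms above can be tied together with a minimal NumPy sketch of PCA by eigendecomposition. The toy data and all variable names are assumptions made for illustration; this is not how scikit-learn implements `PCA` internally, only one standard way to compute the same quantities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 3 features, the first two strongly correlated
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

# Center each feature at zero
X_centered = X - X.mean(axis=0)

# Covariance matrix of the features (3 x 3)
cov = np.cov(X_centered, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Reorder so the largest-variance direction comes first
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Share of total variance carried by each principal component
explained_variance_ratio = eigenvalues / eigenvalues.sum()

# Project the data onto the first two principal components
scores = X_centered @ eigenvectors[:, :2]
print(explained_variance_ratio)
```

Each column of `eigenvectors` is a principal direction, and the variance of the corresponding column of `scores` equals its eigenvalue.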

### Advantages
- **Simplifies Data:** Makes complex data easier to analyze.
- **Removes Redundancy:** Combines correlated features, reducing multicollinearity.
- **Enhances Visualization:** Helps visualize data trends and patterns in lower dimensions.

### Limitations
- **Linearity:** PCA assumes linear relationships among variables; it may not perform well with non-linear data.
- **Interpretability:** Principal components may not always have clear interpretations in terms of original features.
- **Scaling:** Sensitive to the scale of the data; features should be standardized before applying PCA.
- **Loss of Information:** Reducing dimensions can lead to loss of important information.
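
The scaling limitation is easy to see in a small experiment. The sketch below assumes two hypothetical, independent features whose only difference is their scale (the data and names are invented for illustration). Without standardization, the large-scale feature accounts for almost all of the variance; after `StandardScaler`, the two components share it roughly equally.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical data: two independent features on very different scales
small = rng.normal(loc=0.0, scale=1.0, size=500)
large = rng.normal(loc=0.0, scale=1000.0, size=500)
X = np.column_stack([small, large])

# PCA on the raw data: the first component is dominated by the large-scale feature
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_

# PCA after standardization: both features contribute on equal footing
X_scaled = StandardScaler().fit_transform(X)
scaled_ratio = PCA(n_components=2).fit(X_scaled).explained_variance_ratio_

print("raw:   ", raw_ratio)
print("scaled:", scaled_ratio)
```

The same `explained_variance_ratio_` attribute is also a quick way to gauge the loss-of-information limitation: whatever share the kept components do not cover is variance that has been discarded.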

### Practical Application of PCA on a Real-World Dataset
### Example 1
Let's apply PCA to the **Iris dataset**, which includes measurements of iris flowers from three different species. The dataset contains four features (sepal length, sepal width, petal length, petal width).

#### Step 1: Load the Dataset

In [1]:
import pandas as pd
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()  # Bunch object holding the data, targets, and feature names
data = pd.DataFrame(data=iris.data, columns=iris.feature_names)
data['species'] = iris.target
data.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


#### Step 2: Preprocessing
Before performing PCA, we should standardize the data so that every feature has zero mean and unit variance.

In [3]:
from sklearn.preprocessing import StandardScaler

# Standardize the features
features = iris.data
scaler = StandardScaler()  # create an instance of StandardScaler
standardized_data = scaler.fit_transform(features)
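
As a sanity check on this step, one can confirm that each standardized column has mean close to 0 and standard deviation close to 1. The following is a self-contained sketch; the variable names are chosen here for illustration and do not reuse the cell above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Reload the four Iris features and standardize them
features = load_iris().data
standardized = StandardScaler().fit_transform(features)

# StandardScaler uses the population standard deviation (ddof=0),
# which matches NumPy's default for std()
col_means = standardized.mean(axis=0)
col_stds = standardized.std(axis=0)

print(np.round(col_means, 6))
print(np.round(col_stds, 6))
```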

#### Step 3: Apply PCA
Now, let's perform PCA on the standardized data.

In [None]:
from sklearn.decomposition import PCA

# apply PCA
pca = PCA(n_components=2) # we want to reduce to 2 dimensions
principal_components = pca.fit_transform(standardized_data)

# create a dataframe with the principal components
pca_data = pd.DaT