<a href="https://colab.research.google.com/github/Geetanshi-jain/DSAssignmentByGeetanshijain/blob/main/day_10_DIMENSION_REDUCTION.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

```python
from google.colab import drive
drive.mount('/content/drive')
```

#10: DIMENSION REDUCTION

1. Demonstrate how you will identify multicollinearity using R/Python.
2. Demonstrate how you will apply principal components analysis (PCA) using R/Python.


### Dimension Reduction
Imagine you have a lot of information about something – like a long list describing a person (height, weight, favorite food, etc.). When there’s too much information, it gets hard to understand and see clear patterns. **Dimension reduction** is about keeping only the most important parts of that information and removing extra stuff to make things simpler.

### Multicollinearity
Sometimes several pieces of information overlap heavily. For example, "height" and "shoe size" tend to be related (taller people often have bigger feet). When the input variables are strongly correlated with one another like this, it is called **multicollinearity**. It makes a model's results less reliable because the same information is effectively repeated in different ways, so it becomes hard to tell which variable is really responsible for an effect.
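
A quick first check for overlap is a correlation matrix. The sketch below uses made-up data (the column names and numbers are assumptions for illustration, not a real dataset): shoe size is generated from height plus noise, while a third column is unrelated to both.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only: shoe_size is built from height plus noise,
# while favorite_number is unrelated to both.
rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, size=200)
people = pd.DataFrame({
    "height_cm": height_cm,
    "shoe_size": 0.25 * height_cm + rng.normal(0, 1, size=200),
    "favorite_number": rng.integers(1, 100, size=200).astype(float),
})

# Pairwise Pearson correlations; values near +1 or -1 signal overlapping columns
corr = people.corr()
print(corr.round(2))
```

A correlation matrix only compares two columns at a time, so it can miss overlap that involves three or more variables together. That is one reason VIF is usually preferred for a proper check.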

### How to Find Multicollinearity (Using Python and R)
To find out if there’s multicollinearity, we measure how well each variable can be predicted from the others. A common measure is the **VIF (Variance Inflation Factor)**. A VIF of 1 means no overlap. As a rule of thumb, a value above 5 suggests noticeable overlap, and a value above 10 suggests a serious problem.

In Python, it looks like this:

```python
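import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Assumes 'cereals' is a DataFrame that has already been loaded and
# contains the columns 'Sugars', 'Fiber', and 'Potass'.
# Rows with missing values are dropped because the VIF regression cannot use them.
X = cereals[['Sugars', 'Fiber', 'Potass']].dropna()
X = sm.add_constant(X)  # add an intercept column so the VIFs are computed correctly

# Compute the VIF of every column; the value reported for 'const' can be ignored
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)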
```

In R, it would look like this:

```R
library(car)  # provides vif()
# Fit a linear model, then compute the VIF of each predictor
model <- lm(Rating ~ Fiber + Potass + Sugars, data = cereals)
vif(model)
```

If we get high VIF values, we know we have a problem with too much overlap.
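
To see where the number comes from: the VIF of a variable is 1 / (1 - R²), where R² is how well that variable is predicted by a linear regression on all the other variables. The sketch below computes it by hand with NumPy on synthetic data. The function name `vif_by_hand` and the data are assumptions for illustration, not part of any library.

```python
import numpy as np

def vif_by_hand(data: np.ndarray) -> np.ndarray:
    """Return the VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    n_rows, n_cols = data.shape
    vifs = np.empty(n_cols)
    for j in range(n_cols):
        y = data[:, j]
        # Design matrix: an intercept column plus every column except column j
        others = np.column_stack([np.ones(n_rows), np.delete(data, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        residuals = y - others @ coef
        # R^2 = 1 - (residual variance / total variance)
        r_squared = 1.0 - residuals.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs

# Synthetic predictors: x2 is nearly a copy of x1, x3 is independent of both
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)
x3 = rng.normal(size=300)

vifs = vif_by_hand(np.column_stack([x1, x2, x3]))
print(vifs.round(1))
```

The two nearly identical columns should come out with VIFs far above 10, while the independent column stays close to 1.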

### Principal Component Analysis (PCA)
When we have too much overlapping information, we can use **PCA (Principal Components Analysis)** to reduce the dimensions. PCA replaces the original variables with a smaller set of new variables that do not overlap with each other but still capture most of the variation in the data.

#### How PCA Works
1. **Find Main Components**: PCA creates "components" – new variables, each a weighted combination of the original ones. The components are uncorrelated with each other, and they are ordered so that the first one captures the most variation in the data.
2. **Reduce Dimensions**: Instead of keeping all the original variables, we keep only the first few components, which act like summaries of the data.

In Python, it would look like this:

```python
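from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Drop the intercept column added for the VIF check and any rows with missing values,
# then standardize: PCA is sensitive to the scale of each variable
features = X.drop(columns='const', errors='ignore').dropna()
features = StandardScaler().fit_transform(features)

# Run PCA and keep only 2 main components
pca = PCA(n_components=2)
principal_components = pca.fit_transform(features)
print(pca.explained_variance_ratio_)  # share of variance captured by each component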
```

And in R:

```R
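# Select the predictors from 'cereals' and drop rows with missing values
X <- na.omit(cereals[, c("Sugars", "Fiber", "Potass")])
pca <- prcomp(X, center = TRUE, scale. = TRUE)
summary(pca)  # proportion of variance explained by each component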
```

By using PCA, we make our data simpler and easier to understand without all the overlap and confusion.
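
The two steps described under "How PCA Works" can also be sketched by hand with NumPy. This is a minimal illustration on synthetic data, not how scikit-learn implements PCA internally; the function name `pca_sketch` and the data are assumptions.

```python
import numpy as np

def pca_sketch(data: np.ndarray, n_components: int):
    """Project standardized data onto its top principal components."""
    # Standardize so every column has mean 0 and standard deviation 1
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Covariance of the standardized data (this equals the correlation matrix)
    cov = np.cov(z, rowvar=False)
    # eigh returns eigenvalues in ascending order, so sort them descending
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Step 1: the eigenvectors define the components; Step 2: keep only the first few
    scores = z @ eigenvectors[:, :n_components]
    explained_ratio = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, explained_ratio

# Synthetic data: the second column nearly duplicates the first, the third is independent
rng = np.random.default_rng(2)
a = rng.normal(size=500)
data = np.column_stack([a, a + rng.normal(scale=0.2, size=500), rng.normal(size=500)])

scores, explained_ratio = pca_sketch(data, n_components=2)
print(explained_ratio.round(2))                      # variance share of each kept component
print(np.corrcoef(scores, rowvar=False).round(2))    # off-diagonal entries are ~0
```

Because two of the three columns carry almost the same information, two components are enough to keep nearly all of the variation, and the resulting components are uncorrelated with each other.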
