# Formative Assignment: Advanced Linear Algebra (PCA)
This notebook will guide you through the implementation of Principal Component Analysis (PCA). Fill in the missing code and provide the required answers in the appropriate sections. You will work with the `fuel_econ.csv` dataset.

Make sure to display outputs for each code cell when submitting.

### Step 1: Load and Standardize the Data
Before applying PCA, we must standardize the dataset. Standardization rescales every feature to a mean of 0 and a standard deviation of 1, so that features measured on large scales do not dominate the principal components.
Fill in the code to standardize the dataset.

In [64]:
import pandas as pd 
import numpy as np

data = pd.read_csv('fuel_econ.csv')

numerical_data = data.select_dtypes(include=['float64', 'int64'])

# Calculate the mean and standard deviation of each column
mean = np.mean(numerical_data, axis=0)
std_dev = np.std(numerical_data, axis=0)

# Do not use sklearn: (data - data mean) / data standard deviation

standardized_data = (numerical_data - mean) / std_dev
print(standardized_data)

# Check that the mean of a standardized column is actually 0
test = np.mean(standardized_data['city'])
print(f'City Mean: {test:.10f}')

            id      year  cylinders     displ       pv2       pv4      city  \
0    -1.737140 -1.475835   0.283102  0.650536  1.467096 -1.217378 -0.859960   
1    -1.736684 -1.475835  -0.781816 -0.727998  1.864762 -1.217378  0.006427   
2    -1.736227 -1.475835   0.283102  0.497365  1.864762 -1.217378 -0.694416   
3    -1.735770 -1.475835   0.283102  0.497365  1.864762 -1.217378 -0.782800   
4    -1.735313 -1.475835  -0.781816 -0.421657 -0.627278  0.734890  0.471083   
...        ...       ...        ...       ...       ...       ...       ...   
3924  1.770283  1.474784  -0.781816 -0.881169 -0.627278 -1.217378  5.346290   
3925  1.777592  1.474784  -0.781816 -0.727998 -0.627278  0.960942  2.749115   
3926  1.778049  1.474784  -0.781816 -0.727998 -0.627278  0.960942  2.909231   
3927  1.778506  1.474784   0.283102  0.344195  1.997318 -1.217378 -0.417976   
3928  1.778962  1.474784   0.283102  0.344195  1.997318 -1.217378 -0.606417   

         UCity   highway  UHighway      comb       

### Step 3: Calculate the Covariance Matrix
The covariance matrix records how each pair of features varies together. PCA finds its principal components by diagonalizing this matrix.

In [65]:
# Step 3: Calculate the Covariance Matrix
cov_matrix = np.cov(standardized_data, rowvar=False)  # rowvar=False: each column is a variable
print(cov_matrix)

[[ 1.00025458  0.98591866 -0.06011148 -0.07468488 -0.00657025 -0.02195656
   0.09182316  0.09124849  0.0906161   0.09538375  0.09382686 -0.09974229
  -0.1279056  -0.12235207]
 [ 0.98591866  1.00025458 -0.05532701 -0.07044161  0.00623397 -0.03365174
   0.06806739  0.06675938  0.07330836  0.07766039  0.07201181 -0.0811853
  -0.1498676  -0.14517775]
 [-0.06011148 -0.05532701  1.00025458  0.93411019  0.24763384 -0.00426546
  -0.69327904 -0.66619842 -0.76646982 -0.77169964 -0.73821112  0.84848979
  -0.78405759 -0.78201448]
 [-0.07468488 -0.07044161  0.93411019  1.00025458  0.2594021   0.02207729
  -0.71366074 -0.6863403  -0.78418374 -0.78865771 -0.75859024  0.85559254
  -0.7936343  -0.79141752]
 [-0.00657025  0.00623397  0.24763384  0.2594021   1.00025458 -0.66581137
  -0.27817962 -0.27261515 -0.29688365 -0.29858023 -0.29095711  0.28727323
  -0.2961638  -0.29323103]
 [-0.02195656 -0.03365174 -0.00426546  0.02207729 -0.66581137  1.00025458
   0.03519659  0.03787859  0.07497068  0.07746161  0
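
The diagonal entries above are 1.00025458 rather than exactly 1 because Step 1 standardizes with `np.std`, which divides by n by default, while `np.cov` divides by n - 1. With the 3929 rows shown in the Step 1 output, 3929/3928 ≈ 1.00025458. The sketch below reproduces the effect on synthetic data and checks `np.cov` against the explicit formula. The array `X` and its size are assumptions for illustration, not part of the assignment's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))            # synthetic stand-in data (assumption)

# Standardize as in Step 1: np.std uses the population formula (ddof=0)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
n = Z.shape[0]

# Covariance of centered data, written out: Z^T Z / (n - 1)
cov_manual = Z.T @ Z / (n - 1)
cov_numpy = np.cov(Z, rowvar=False)

print(np.allclose(cov_manual, cov_numpy))   # the two computations agree
print(np.diag(cov_numpy), n / (n - 1))      # diagonal equals n / (n - 1)
```

Passing `ddof=1` to `np.std` in Step 1 would make the diagonal exactly 1. Either convention works for PCA, since the eigenvectors are unchanged by a constant scale factor.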

### Step 4: Perform Eigendecomposition
Eigendecomposition of the covariance matrix gives us its eigenvectors, which are the directions of the principal components, and its eigenvalues, which are the variance of the data along each of those directions.
Fill in the code to compute the eigenvalues and eigenvectors of the covariance matrix.

In [None]:
# Step 4: Perform Eigendecomposition
eigenvalues, eigenvectors = None, None  # Perform eigendecomposition
eigenvalues, eigenvectors
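
As a hedged illustration of this step, the sketch below decomposes a small symmetric matrix with `np.linalg.eigh`, one reasonable choice because a covariance matrix is symmetric. The 2×2 matrix `C` is a made-up stand-in, not the notebook's `cov_matrix`.

```python
import numpy as np

# Small symmetric matrix standing in for a covariance matrix (assumption)
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is meant for symmetric matrices: it returns real eigenvalues in
# ascending order and the eigenvectors as the COLUMNS of the second array
vals, vecs = np.linalg.eigh(C)
print(vals)
print(vecs)

# Each column v satisfies C v = lambda v
for i in range(len(vals)):
    assert np.allclose(C @ vecs[:, i], vals[i] * vecs[:, i])
```

For this matrix the eigenvalues are 1 and 3. Note that `np.linalg.eigh` returns them smallest first, whereas `np.linalg.eig` gives no ordering guarantee, so an explicit sort is still needed before selecting components.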

### Step 5: Sort Principal Components
Sort the eigenvectors by their corresponding eigenvalues in descending order. The larger the eigenvalue, the more of the data's variance its eigenvector captures.
Complete the code to sort the eigenvectors and print the sorted components.

In [None]:
# Step 5: Sort Principal Components
sorted_indices = None  # Sort eigenvalues in descending order
sorted_eigenvectors = None  # Sort eigenvectors accordingly
sorted_eigenvectors
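
The sorting described above can be sketched with `np.argsort`. The eigenvalues and eigenvectors here are toy values chosen for illustration (an assumption), and the variable names are hypothetical rather than the ones the assignment expects.

```python
import numpy as np

# Toy eigenpairs (assumption): eigenvalues deliberately unsorted
vals = np.array([0.5, 3.0, 1.5])
vecs = np.eye(3)                      # column i pairs with vals[i]

order = np.argsort(vals)[::-1]        # indices of eigenvalues, largest first
vals_sorted = vals[order]
vecs_sorted = vecs[:, order]          # reorder COLUMNS, not rows

# Share of the total variance captured by each component
explained = vals_sorted / vals_sorted.sum()
print(order, vals_sorted, explained)
```

The explained-variance shares (here 0.6, 0.3 and 0.1) are one common basis for deciding how many components to keep.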

### Step 6: Project Data onto Principal Components
Choose how many principal components to keep, then project the standardized data onto them.
Fill in the code to perform the projection.

In [None]:
# Step 6: Project Data onto Principal Components
num_components = None  # Decide on the number of principal components to keep
reduced_data = None  # Project data onto the principal components
reduced_data[:5]
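
A minimal end-to-end sketch of the projection, assuming synthetic data and an arbitrary choice of two components (neither comes from the assignment): the standardized matrix is multiplied by the matrix whose columns are the top eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))                  # synthetic data (assumption)
Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize as in Step 1

vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(vals)[::-1]                 # largest eigenvalue first
vals, vecs = vals[order], vecs[:, order]

k = 2                                          # illustrative choice only
W = vecs[:, :k]                                # (features, k) projection matrix
reduced = Z @ W                                # (samples, k) component scores

print(reduced.shape)
# The sample variance of each projected column equals its eigenvalue
print(np.allclose(reduced.var(axis=0, ddof=1), vals[:k]))
```

The final check is a useful sanity test: if the variances of the projected columns do not match the leading eigenvalues, the eigenvectors were probably sorted by rows instead of columns.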

### Step 7: Output the Reduced Data
Display the shape and the first few rows of the reduced data obtained by projecting the standardized dataset onto the selected principal components.

In [None]:
# Step 7: Output the Reduced Data
print(f'Reduced Data Shape: {reduced_data.shape}')  # Display reduced data shape
reduced_data[:5]  # Display the first few rows of reduced data

### Step 8: Visualize Before and After PCA
Now, let's plot the original data and the data after PCA to compare the reduction in dimensions visually.

In [None]:
# Step 8: Visualize Before and After PCA


# Plot original data (first two features for simplicity)


# Plot reduced data after PCA
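
One possible way to lay out the comparison is a pair of side-by-side scatter plots with matplotlib. The sketch below uses synthetic correlated data and keeps two components. Both are assumptions for illustration, and the plotting choices are not prescribed by the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Synthetic correlated features standing in for the real data (assumption)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(200, 1)) for _ in range(3)])
Z = (X - X.mean(axis=0)) / X.std(axis=0)

vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
order = np.argsort(vals)[::-1]
reduced = Z @ vecs[:, order[:2]]               # keep two components (illustrative)

fig, (ax_before, ax_after) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: first two standardized features
ax_before.scatter(Z[:, 0], Z[:, 1], s=10)
ax_before.set(title="Standardized data (first two features)",
              xlabel="feature 0", ylabel="feature 1")

# Right panel: scores on the first two principal components
ax_after.scatter(reduced[:, 0], reduced[:, 1], s=10)
ax_after.set(title="After PCA", xlabel="PC 1", ylabel="PC 2")

fig.tight_layout()
plt.show()
```

In the right panel most of the spread lies along the PC 1 axis, which is what sorting by eigenvalue is meant to achieve.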
