# Instructions for Assignment 2, Question 3: PCR Problem Using Kernel PCA

## Notebook Organization:
1. **Structure:** Ensure the notebook is well-structured and modular.
2. **Library Imports:** Include all necessary library imports at the beginning of the notebook.
3. **Helper Functions:** Define any helper functions or intermediate steps in earlier cells in a clear and logical sequence.
4. **Final Function:** Write the final function, `test_kernel_pcr(poly_kernel_deg, num_of_PC)`, in the last cell of the notebook.
   - This function should compute the RMSE of principal component regression (PCR) on the dataset `PCR-Data.pickle`, using kernel PCA with the given polynomial kernel degree and the given number of principal components.
   - It should return a tuple: `(poly_kernel_deg, num_of_PC, RMSE)`.
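For reference, the RMSE in item 4 is assumed to be the standard root-mean-squared error. It compares the true targets $y_i$ with the PCR predictions $\hat{y}_i$ over the $n$ evaluation samples. The list above does not say which samples are held out for evaluation.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$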

---

## **Function Implementation:**
In the last cell, include the following function:

```python
def test_kernel_pcr(poly_kernel_deg=10, num_of_PC=2):
    """
    Write your implementation inside this function.

    Goal:
    - Calculate the RMSE when using a specific number of principal components
      for the given polynomial kernel degree in PCR.

    Return:
    - A tuple (poly_kernel_deg, num_of_PC, RMSE), where RMSE is the calculated floating-point value.
    """
    # YOUR IMPLEMENTATION HERE
    return poly_kernel_deg, num_of_PC, RMSE
```
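The Goal in the docstring above can be made concrete with a minimal NumPy-only sketch of kernel PCR. It proceeds in five steps:

1. Standardize the features.
2. Build a polynomial kernel matrix and double-center it.
3. Keep the top eigenvectors as principal components.
4. Regress the target on the component scores with an intercept.
5. Report the test RMSE.

Several choices here are assumptions, not part of the assignment:

- The kernel form is $(\gamma \langle x, z\rangle + c_0)^d$ with $\gamma = 1/\text{n\_features}$ and $c_0 = 1$. These are scikit-learn's `KernelPCA` defaults for `kernel='poly'`.
- The split is a sequential 70/30 train/test split.
- The names `poly_kernel` and `kernel_pcr_rmse` are hypothetical.

```python
import numpy as np


def poly_kernel(A, B, degree, gamma, coef0=1.0):
    """Hypothetical helper: polynomial kernel (gamma * <a, b> + coef0) ** degree for all row pairs."""
    return (gamma * A @ B.T + coef0) ** degree


def kernel_pcr_rmse(X, y, degree, n_components, train_frac=0.7):
    """Hypothetical sketch of kernel PCR with a sequential train/test split; returns the test RMSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()

    # Sequential split (no shuffling): first train_frac of the rows train, the rest test
    n_train = int(np.floor(train_frac * len(X)))
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]

    # Standardize with training statistics only (population std, as StandardScaler does)
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    sd[sd == 0] = 1.0
    X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd

    # Assumed kernel parameters: gamma = 1 / n_features, coef0 = 1
    gamma = 1.0 / X.shape[1]
    K = poly_kernel(X_tr, X_tr, degree, gamma)      # train x train
    K_te = poly_kernel(X_te, X_tr, degree, gamma)   # test x train

    # Double-center the kernels, i.e. center the data in the implicit feature space
    col_mean, total = K.mean(axis=0), K.mean()
    K_c = K - col_mean[None, :] - col_mean[:, None] + total
    K_te_c = K_te - K_te.mean(axis=1)[:, None] - col_mean[None, :] + total

    # Eigendecompose; eigh returns ascending eigenvalues, so reverse to take the largest
    eigvals, eigvecs = np.linalg.eigh(K_c)
    lam = np.maximum(eigvals[::-1][:n_components], 1e-12)  # guard against ~0 eigenvalues
    V = eigvecs[:, ::-1][:, :n_components]

    # Principal-component scores: training scores V * sqrt(lam) equal K_c @ V / sqrt(lam)
    Z_tr = V * np.sqrt(lam)
    Z_te = K_te_c @ V / np.sqrt(lam)

    # Ordinary least squares with an intercept on the component scores
    A_tr = np.column_stack([np.ones(len(Z_tr)), Z_tr])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    y_pred = np.column_stack([np.ones(len(Z_te)), Z_te]) @ coef

    return float(np.sqrt(np.mean((y_te - y_pred) ** 2)))
```

Component signs and scaling may differ from scikit-learn's `KernelPCA` output. Rescaling or flipping the sign of individual score columns does not change least-squares predictions when an intercept is included, so the RMSE is unaffected. A quick sanity check: with `degree=1` and as many components as features, kernel PCR reduces to ordinary linear regression. An exactly affine target should then give an RMSE near zero.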

---

## **Testing and Expected Output:**
Use the following example to test your function:

```python
poly_kernel_deg, num_of_PC, RMSE = test_kernel_pcr(poly_kernel_deg=10, num_of_PC=2)
```

For the given test case:
- **Expected Output:**
  - `poly_kernel_deg = 10`
  - `num_of_PC = 2`
  - `RMSE = 1445`
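A small helper can check the returned value programmatically instead of by eye. This sketch, `check_kernel_pcr_output`, is hypothetical. It verifies three things:

- The result is a 3-tuple that echoes the inputs.
- The RMSE is a finite, non-negative number.
- Optionally, the RMSE matches an expected value within a relative tolerance.

The default `rel_tol` is an assumption, not part of the assignment.

```python
import math


def check_kernel_pcr_output(result, poly_kernel_deg, num_of_PC, expected_rmse=None, rel_tol=1e-2):
    """Hypothetical helper: validate a (poly_kernel_deg, num_of_PC, RMSE) tuple.

    Returns a list of problems; an empty list means every check passed.
    """
    if not isinstance(result, tuple) or len(result) != 3:
        return ["result must be a tuple of length 3"]
    deg, n_pc, rmse = result
    problems = []
    # The first two entries should simply echo the function's arguments
    if deg != poly_kernel_deg:
        problems.append(f"poly_kernel_deg is {deg!r}, expected {poly_kernel_deg!r}")
    if n_pc != num_of_PC:
        problems.append(f"num_of_PC is {n_pc!r}, expected {num_of_PC!r}")
    # RMSE must be a real number (bool is excluded even though it subclasses int)
    if isinstance(rmse, bool) or not isinstance(rmse, (int, float)):
        problems.append("RMSE must be a number")
    elif not math.isfinite(rmse) or rmse < 0:
        problems.append("RMSE must be finite and non-negative")
    elif expected_rmse is not None and not math.isclose(rmse, expected_rmse, rel_tol=rel_tol):
        problems.append(f"RMSE {rmse} differs from expected {expected_rmse}")
    return problems
```

For example, `check_kernel_pcr_output(test_kernel_pcr(10, 2), 10, 2)` returns an empty list when the format is correct.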

## **Submission Guidelines:**
1. Ensure your notebook is clean, with proper formatting, clear comments, and well-documented code.
2. Verify your function’s output matches the expected format and values for the given test case before submission.
3. Remove any extraneous code or debugging prints.
4. Upload the completed notebook file to the designated submission portal.
```python
import math
import pickle

from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
```

```python
def test_kernel_pcr(poly_kernel_deg=10, num_of_PC=2):
    """
    Kernel PCR: standardize the features, project them onto the leading
    principal components of a polynomial-kernel PCA, then fit ordinary
    least squares on those components.

    Return:
    - A tuple (poly_kernel_deg, num_of_PC, RMSE), where RMSE is measured
      on the held-out last 30% of the data.
    """
    # Load the data from the pickle file
    with open('PCR-Data.pickle', 'rb') as f:
        data = pickle.load(f)

    # Split the data sequentially (no shuffling): first 70% train, last 30% test
    X_train, X_test, y_train, y_test = train_test_split(
        data['X'], data['Y'], train_size=0.7, shuffle=False
    )

    # Standardize features using statistics from the training set only
    scaler = StandardScaler()
    x_train_scaled = scaler.fit_transform(X_train)
    x_test_scaled = scaler.transform(X_test)

    # Apply kernel PCA with a polynomial kernel, keeping num_of_PC components
    kpca = KernelPCA(kernel='poly', degree=poly_kernel_deg, n_components=num_of_PC)
    x_train_kpca = kpca.fit_transform(x_train_scaled)
    x_test_kpca = kpca.transform(x_test_scaled)

    # Regress the target on the principal-component scores
    reg = LinearRegression()
    reg.fit(x_train_kpca, y_train)
    y_pred = reg.predict(x_test_kpca)

    # Calculate the RMSE on the test set
    RMSE = math.sqrt(mean_squared_error(y_test, y_pred))

    return (poly_kernel_deg, num_of_PC, RMSE)
```

```python
poly_kernel_deg, num_of_PC, RMSE = test_kernel_pcr(poly_kernel_deg=10, num_of_PC=2)
print(poly_kernel_deg, num_of_PC, RMSE)
```

Output:

```
10 2 15.594674990622845
```
