# Instructions for Assignment 2 Question 3: PCR Problem Using Kernel PCA

## Notebook Organization:
1. **Structure:** Ensure the notebook is well-structured and modular.
2. **Library Imports:** Include all necessary library imports at the beginning of the notebook.
3. **Helper Functions:** Define any helper functions or intermediate steps in earlier cells in a clear and logical sequence.
4. **Final Function:** Write the final function, `test_kernel_pcr(poly_kernel_deg, num_of_PC)`, in the last cell of the notebook.
   - This function should calculate the RMSE for the given polynomial kernel degree and number of principal components, using the dataset provided in `PCR-Data.pickle`.
   - It should return a tuple: `(poly_kernel_deg, num_of_PC, RMSE)`.
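For reference, RMSE is the square root of the mean squared difference between the true and predicted targets. The snippet below is a minimal sketch on made-up numbers (the arrays are illustrative only and unrelated to `PCR-Data.pickle`), comparing a hand-written NumPy computation with scikit-learn's `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Illustrative values only; every prediction is off by exactly 2.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 32.0, 38.0])

# RMSE = sqrt(mean((y_true - y_pred)^2))
rmse_manual = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Same quantity via scikit-learn: take the square root of the MSE.
rmse_sklearn = float(np.sqrt(mean_squared_error(y_true, y_pred)))

print(rmse_manual, rmse_sklearn)  # 2.0 2.0
```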

---

## **Function Implementation:**
In the last cell, include the following function:

```python
def test_kernel_pcr(poly_kernel_deg=10, num_of_PC=2):
    """
    Write your implementation inside this function.

    Goal:
    - Calculate the RMSE when using a specific number of principal components
      for the given polynomial kernel degree in PCR.

    Return:
    - A tuple (poly_kernel_deg, num_of_PC, RMSE), where RMSE is the calculated floating-point value.
    """
    # YOUR IMPLEMENTATION HERE
    return poly_kernel_deg, num_of_PC, RMSE
```
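One possible shape for the body is sketched below on synthetic data rather than on `PCR-Data.pickle`. The helper name `kernel_pcr_rmse`, the random data, and the ordered 70/30 split are assumptions made for illustration, and chaining the steps with scikit-learn's `make_pipeline` is one design choice among several.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def kernel_pcr_rmse(X_train, y_train, X_test, y_test, degree, n_components):
    """Hypothetical helper: scale -> polynomial kernel PCA -> linear regression."""
    model = make_pipeline(
        StandardScaler(),
        KernelPCA(kernel="poly", degree=degree, n_components=n_components),
        LinearRegression(),
    )
    # fit() learns the scaler, kernel PCA, and regression from the training split only.
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return float(np.sqrt(mean_squared_error(y_test, y_pred)))


# Synthetic stand-in data (assumption); the real inputs come from PCR-Data.pickle.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
split = int(0.7 * len(X))  # assumed ordered 70/30 split
rmse = kernel_pcr_rmse(X[:split], y[:split], X[split:], y[split:], degree=2, n_components=3)
print(f"RMSE = {rmse:.4f}")
```

Wrapping the steps in a pipeline keeps the scaler and kernel PCA from ever seeing the test split during fitting.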

---

## **Testing and Expected Output:**
Use the following example to test your function:

```python
poly_kernel_deg, num_of_PC, RMSE = test_kernel_pcr(poly_kernel_deg=10, num_of_PC=2)
```

For the given test case:
- **Expected Output:**
  - `poly_kernel_deg = 10`
  - `num_of_PC = 2`
  - `RMSE = 1445`
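Before comparing values, it can help to confirm that the return value has the required shape. The checker below is a minimal sketch; `check_result_format` is a hypothetical helper, not part of the assignment, and the tuples passed to it are placeholders rather than reference results.

```python
import math


def check_result_format(result, expected_deg, expected_pc):
    """Hypothetical checker for the (poly_kernel_deg, num_of_PC, RMSE) tuple."""
    if not isinstance(result, tuple) or len(result) != 3:
        return False
    deg, n_pc, rmse = result
    if deg != expected_deg or n_pc != expected_pc:
        return False
    # RMSE must convert to a finite, non-negative float.
    try:
        rmse = float(rmse)
    except (TypeError, ValueError):
        return False
    return math.isfinite(rmse) and rmse >= 0.0


# Placeholder tuples only; the RMSE values are not reference results.
print(check_result_format((10, 2, 3.5), expected_deg=10, expected_pc=2))           # True
print(check_result_format((10, 2, float("nan")), expected_deg=10, expected_pc=2))  # False
```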

## **Submission Guidelines:**
1. Ensure your notebook is clean, with proper formatting, clear comments, and well-documented code.
2. Verify your function’s output matches the expected format and values for the given test case before submission.
3. Remove any extraneous code or debugging prints.
4. Upload the completed notebook file to the designated submission portal.


In [1]:
import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

In [15]:
poly_kernel_deg, num_of_PC, RMSE = test_kernel_pcr(poly_kernel_deg=10, num_of_PC=2)
print(f"poly_kernel_deg = {poly_kernel_deg}")
print(f"num_of_PC = {num_of_PC}")
print(f"RMSE = {RMSE}")


poly_kernel_deg = 10
num_of_PC = 2
RMSE = 15.594674990622877


In [13]:
with open('/content/PCR-Data.pickle', 'rb') as f:
    data = pickle.load(f)

# Read the features and targets from the pickle file
X = data['X']
Y = data['Y']

# Split the dataset into train (first 70%) and test (remaining 30%)
train_size = int(len(X) * 0.7)
X_train, X_test = X[:train_size, :], X[train_size:, :]
Y_train, Y_test = Y[:train_size, :], Y[train_size:, :]

# Scale the independent variables with a standard scaler fitted on the training split only
std_scaler = StandardScaler()
std_scaler.fit(X_train)

X_train_scaled = std_scaler.transform(X_train)
X_test_scaled = std_scaler.transform(X_test)

In [14]:
def test_kernel_pcr(poly_kernel_deg=10, num_of_PC=2):
    """
    Write your implementation inside this function.

    Goal:
    - Calculate the RMSE when using a specific number of principal components
      for the given polynomial kernel degree in PCR.

    Return:
    - A tuple (poly_kernel_deg, num_of_PC, RMSE), where RMSE is the calculated floating-point value.
    """
    # YOUR IMPLEMENTATION HERE

    # Fit kernel PCA on the scaled training data, then transform both splits
    kpca = KernelPCA(kernel='poly', degree=poly_kernel_deg, n_components=None)
    X_train_kpca = kpca.fit_transform(X_train_scaled)
    X_test_kpca = kpca.transform(X_test_scaled)

    # Keep only the first num_of_PC principal components
    X_train_transformed = X_train_kpca[:, :num_of_PC]
    X_test_transformed = X_test_kpca[:, :num_of_PC]

    # Fit linear regression on the components and predict the test set
    model = LinearRegression()
    model.fit(X_train_transformed, Y_train)
    y_pred = model.predict(X_test_transformed)

    # RMSE is the square root of the mean squared error on the test set
    rmse = np.sqrt(mean_squared_error(Y_test, y_pred))

    return poly_kernel_deg, num_of_PC, rmse