# Kernel Transformations

Applying the kernel trick to Partial Least Squares (PLS) regression is not a standard approach, and as a result there are no well-established libraries for it. We therefore implement the kernel transformations in Python with scikit-learn (`sklearn`): we apply them to the predictor variables (`X`), export the resulting data, and read them into R, where the PLS models are fitted. We begin by importing the necessary Python modules.

In [9]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel, sigmoid_kernel
import numpy as np
from scipy.spatial.distance import cdist

Next, we load the data for this section: the previously cleaned and scaled datasets, in their regular and transformed versions. These serve as the input for the kernel transformations.

In [10]:
df_reg = pd.read_csv("Regular_cleaned_scaled.csv")
df_t = pd.read_csv("Transformed_cleaned_scaled.csv")

We then split each dataset into the predictors (`X`) and the response (`y`):

In [11]:
X_reg = df_reg.drop(columns=["edad_menarquia"])
y_reg = df_reg["edad_menarquia"]
X_t = df_t.drop(columns=["edad_menarquia"])
y_t = df_t["edad_menarquia"]

### Kernel application

We now select the parameters to be used for the kernel transformations. These are determined based on the preliminary analyses conducted in R and are applied through the appropriate transformation functions. In this case, we choose to explore four different kernel functions:
- Gaussian (RBF)
- Polynomial
- Sigmoid
- Laplacian
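
For reference, the four kernels can be written out as follows. This is a notational sketch rather than part of the original analysis: the symbols $x$ and $z$ (two rows of `X`), $\gamma$ (the kernel's `gamma` argument), $d$ (`degree`) and $c$ (`coef0`) are introduced here for illustration. The first three follow the definitions used by scikit-learn's pairwise kernel functions, and the Laplacian form uses the L1 distance.

$$
\begin{aligned}
k_{\text{RBF}}(x, z) &= \exp\left(-\gamma \lVert x - z \rVert_2^2\right) \\
k_{\text{poly}}(x, z) &= \left(\gamma\, x^\top z + c\right)^{d} \\
k_{\text{sig}}(x, z) &= \tanh\left(\gamma\, x^\top z + c\right) \\
k_{\text{Lap}}(x, z) &= \exp\left(-\gamma \lVert x - z \rVert_1\right)
\end{aligned}
$$

Evaluating a kernel on every pair of rows of an $n \times p$ predictor matrix gives an $n \times n$ matrix, so each transformed dataset has one column per observation.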

In [12]:
gamma = 0.1
degree = 3
coef0 = 1  # used by the polynomial and sigmoid kernels

# Kernels
X_r_rbf = rbf_kernel(X_reg, X_reg, gamma=gamma)
X_t_rbf = rbf_kernel(X_t, X_t, gamma=gamma)

# Note: gamma is not passed here, so scikit-learn uses its default of 1 / n_features
X_r_poly = polynomial_kernel(X_reg, X_reg, degree=degree, coef0=coef0)
X_t_poly = polynomial_kernel(X_t, X_t, degree=degree, coef0=coef0)

X_r_sig = sigmoid_kernel(X_reg, X_reg, gamma=gamma, coef0=coef0)
X_t_sig = sigmoid_kernel(X_t, X_t, gamma=gamma, coef0=coef0)

# Laplacian kernel computed manually: exp(-gamma * L1 distance)
def laplacian_kernel(X, Y, gamma):
    dists = cdist(X, Y, metric='cityblock')  # pairwise L1 (Manhattan) distances
    return np.exp(-gamma * dists)

X_r_lap = laplacian_kernel(X_reg, X_reg, gamma)
X_t_lap = laplacian_kernel(X_t, X_t, gamma)
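
Before exporting, it can be worth checking a few basic properties of the kernel matrices. The following is a minimal sketch on synthetic data, not part of the original analysis: `X_demo`, `laplacian_kernel_manual` and the 20 x 5 shape are hypothetical stand-ins for the scaled predictors and the manual function above. It checks that the matrices are square and symmetric, that the manual Laplacian kernel agrees with scikit-learn's own `laplacian_kernel`, and that `polynomial_kernel` falls back to `gamma = 1 / n_features` when `gamma` is not given.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics.pairwise import (
    laplacian_kernel as sk_laplacian_kernel,
    polynomial_kernel,
    rbf_kernel,
)

# Synthetic stand-in for the scaled predictors (hypothetical: 20 rows, 5 columns).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
gamma = 0.1


def laplacian_kernel_manual(X, Y, gamma):
    """Laplacian kernel exp(-gamma * ||x - y||_1) from pairwise L1 distances."""
    return np.exp(-gamma * cdist(X, Y, metric="cityblock"))


K_rbf = rbf_kernel(X_demo, X_demo, gamma=gamma)
K_lap = laplacian_kernel_manual(X_demo, X_demo, gamma)

# A kernel matrix of a dataset against itself is square (n x n) and symmetric.
square_ok = K_rbf.shape == (20, 20) and K_lap.shape == (20, 20)
symmetric_ok = np.allclose(K_rbf, K_rbf.T) and np.allclose(K_lap, K_lap.T)

# RBF and Laplacian kernels have ones on the diagonal, since k(x, x) = exp(0).
diag_ok = np.allclose(np.diag(K_rbf), 1.0) and np.allclose(np.diag(K_lap), 1.0)

# The manual Laplacian kernel should agree with scikit-learn's implementation.
lap_ok = np.allclose(K_lap, sk_laplacian_kernel(X_demo, X_demo, gamma=gamma))

# Without an explicit gamma, polynomial_kernel uses 1 / n_features (here 1 / 5).
K_poly_default = polynomial_kernel(X_demo, X_demo, degree=3, coef0=1)
K_poly_explicit = polynomial_kernel(X_demo, X_demo, degree=3, gamma=1 / 5, coef0=1)
poly_default_ok = np.allclose(K_poly_default, K_poly_explicit)

print(square_ok, symmetric_ok, diag_ok, lap_ok, poly_default_ok)
```

If the polynomial kernel is meant to share the same `gamma` as the other kernels, it would need to be passed explicitly; whether that is intended here is a modeling choice.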

### Save Data

Finally, we convert each kernel matrix into a DataFrame, append the response variable (`y`), and save the result as a CSV file, so that the processed data can be imported into R for modeling.

In [13]:
# Convert kernel matrices to DataFrames with the response appended
def to_df_with_y(X_kernel, y):
    df = pd.DataFrame(X_kernel)
    df["y"] = y.values
    return df

# Build the DataFrames
df_r_rbf = to_df_with_y(X_r_rbf, y_reg)
df_t_rbf = to_df_with_y(X_t_rbf, y_t)

df_r_poly = to_df_with_y(X_r_poly, y_reg)
df_t_poly = to_df_with_y(X_t_poly, y_t)

df_r_sig = to_df_with_y(X_r_sig, y_reg)
df_t_sig = to_df_with_y(X_t_sig, y_t)

df_r_lap = to_df_with_y(X_r_lap, y_reg)
df_t_lap = to_df_with_y(X_t_lap, y_t)

# Save as CSV files
df_r_rbf.to_csv("Regular_rbf.csv", index=False)
df_t_rbf.to_csv("Log_rbf.csv", index=False)

df_r_poly.to_csv("Regular_poly.csv", index=False)
df_t_poly.to_csv("Log_poly.csv", index=False)

df_r_sig.to_csv("Regular_sig.csv", index=False)
df_t_sig.to_csv("Log_sig.csv", index=False)

df_r_lap.to_csv("Regular_lap.csv", index=False)
df_t_lap.to_csv("Log_lap.csv", index=False)
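
To see what the exported files look like on the other side, the sketch below runs the same helper on a tiny, made-up kernel matrix and reads the CSV back. The values in `K_demo` and `y_demo` are hypothetical, and an in-memory buffer replaces the file on disk so the example leaves nothing behind.

```python
import io

import numpy as np
import pandas as pd


def to_df_with_y(X_kernel, y):
    """Wrap a kernel matrix in a DataFrame and append the response as column 'y'."""
    df = pd.DataFrame(X_kernel)
    df["y"] = y.values
    return df


# Hypothetical 3 x 3 kernel matrix and response, for illustration only.
K_demo = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.4],
                   [0.2, 0.4, 1.0]])
y_demo = pd.Series([12.0, 13.5, 11.0], name="edad_menarquia")

df_demo = to_df_with_y(K_demo, y_demo)

# Write to an in-memory buffer instead of disk, then read it back.
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)
buffer.seek(0)
df_back = pd.read_csv(buffer)

# After the round trip the kernel columns are named "0", "1", "2" (as strings),
# followed by "y"; the numeric values are unchanged.
columns_back = list(df_back.columns)
values_match = np.allclose(df_back.to_numpy(), df_demo.to_numpy())
print(columns_back, values_match)
```

Because the kernel columns have purely numeric names, R's `read.csv` with its default `check.names = TRUE` would be expected to rename them to `X0`, `X1`, and so on, which is worth keeping in mind when selecting predictors for the PLS models.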