# Gaussian Process Classifier
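First, we tested different initial values for the `kernel` parameter, a constant factor times an RBF kernel. The highest test accuracy (`0.822`) was reached with an RBF length scale of 1 and constant factors of 0.01, 0.9 and 1.1, so we used `1 * RBF(1)`. Then we chose several values for each of the remaining parameters (`n_restarts_optimizer`, `max_iter_predict`, `random_state` and `n_jobs`) and trained the model on every combination to find the best result.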
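Because `GaussianProcessClassifier` uses `optimizer="fmin_l_bfgs_b"` by default, the kernel values passed in act as starting points: during `fit`, the constant factor and the length scale are tuned by maximizing the log-marginal likelihood, and the fitted kernel is stored in `kernel_`. The minimal sketch below illustrates this under one assumption: it uses synthetic binary data from `make_classification` in place of the notebook's CSV files. It compares the default optimizer with `optimizer=None`, which keeps the hyperparameters exactly as given.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Synthetic stand-in for the real data (assumption: 3 features, binary labels).
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

initial_kernel = 1.0 * RBF(1.0)

# Default optimizer: kernel hyperparameters are refined during fit.
tuned = GaussianProcessClassifier(kernel=initial_kernel, random_state=0).fit(X, y)

# optimizer=None: the kernel hyperparameters are kept fixed.
fixed = GaussianProcessClassifier(kernel=initial_kernel, optimizer=None).fit(X, y)

print("initial kernel:", tuned.kernel)   # the constructor argument is left unchanged
print("tuned kernel:  ", tuned.kernel_)  # hyperparameters after optimization
print("fixed kernel:  ", fixed.kernel_)  # same values as initial_kernel
```

To see which hyperparameters each tested kernel actually converged to, `clf.kernel_` can be printed instead of `kernel`. For multi-class problems, `kernel_` is a compound kernel holding one kernel per one-vs-rest classifier.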

## Results
The Gaussian Process Classifier reached a maximum accuracy of `0.822` on the 107-sample test set.
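The `0.822` above is the best test-set accuracy across all tried settings. The same 107 samples therefore served both to choose the parameters and to report the score, which can make the number optimistic. One common alternative, swapped in here rather than taken from this notebook, is to choose parameters by cross-validation on the training data with scikit-learn's `GridSearchCV` and to score the test set only once. The sketch makes several assumptions: it uses synthetic data from `make_classification`, and the grid is reduced and illustrative. `n_jobs` is left out because it only parallelizes one-vs-rest fits, and `random_state` is fixed because it only seeds the optimizer restarts.

```python
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): the notebook's CSV files are not used here.
X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Illustrative grid: a few initial kernels and solver settings.
param_grid = {
    "kernel": [0.1 * RBF(1.0), 1.0 * RBF(1.0)],
    "n_restarts_optimizer": [0, 1],
    "max_iter_predict": [100, 500],
}

# 5-fold cross-validation on the training split only.
search = GridSearchCV(
    GaussianProcessClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print(f"mean CV accuracy: {search.best_score_:.3f}")
# The held-out test split is scored once, with the refitted best model.
print(f"test accuracy: {search.score(X_test, y_test):.3f}")
```

With the notebook's data, `X_train`/`y_train` would correspond to `data`/`data_labels` and `X_test`/`y_test` to `test`/`test_labels`.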

```python
import pandas as pd

from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import accuracy_score
```

```python
# load data
data = pd.read_csv("../data/train_data.csv")
data_labels = pd.read_csv("../data/train_data_labels.csv")
test = pd.read_csv("../data/train_test.csv")
test_labels = pd.read_csv("../data/train_test_labels.csv")

print("Data shape:", data.shape)
print("Data labels shape:", data_labels.shape)
print("Test shape:", test.shape)
print("Test labels shape:", test_labels.shape)
```

```text
Data shape: (784, 3)
Data labels shape: (784, 1)
Test shape: (107, 3)
Test labels shape: (107, 1)
```
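To judge how much an accuracy such as `0.822` improves on a trivial model, it can help to compare it with the accuracy of always predicting the most frequent class. A minimal pandas sketch, using a small hypothetical label frame `example_labels` in place of `test_labels`; in the notebook, the same two steps would apply to `test_labels` while it is still a DataFrame.

```python
import pandas as pd

# Hypothetical one-column label frame standing in for train_test_labels.csv.
example_labels = pd.DataFrame({"label": [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]})

# Fraction of samples in each class.
class_share = example_labels.iloc[:, 0].value_counts(normalize=True)
print(class_share)

# Accuracy of a model that always predicts the most frequent class.
majority_baseline = class_share.max()
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")  # → 0.700
```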


```python
# flatten the one-column label DataFrames into 1D arrays of shape (n_samples,)
data_labels = data_labels.values.ravel()
test_labels = test_labels.values.ravel()
```
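The flattening step matters because each label file is read as a one-column DataFrame of shape `(n_samples, 1)`, while scikit-learn estimators expect `y` as a 1D array. Passing the column vector directly typically triggers a `DataConversionWarning`. A small illustration with a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical one-column label frame, like those read from the *_labels.csv files.
labels_df = pd.DataFrame({"label": [1, 0, 1]})
print(labels_df.shape)  # (3, 1)

# .values returns a 2D NumPy array; .ravel() flattens it to (n_samples,).
labels_1d = labels_df.values.ravel()
print(labels_1d.shape)  # (3,)
```

`labels_df.iloc[:, 0].to_numpy()` is an equivalent way to obtain the same 1D array.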

```python
# try several initial kernels: constant factor * RBF(length_scale)
# (the default optimizer refines these hyperparameters during fit)
kernel_ranges = [
    0.0001 * RBF(1), 0.01 * RBF(1), 0.9 * RBF(1), 1.1 * RBF(1),
    1.5 * RBF(1), 0.1 * RBF(0.1), 0.1 * RBF(10),
]

for kernel in kernel_ranges:
    clf = GaussianProcessClassifier(kernel=kernel)
    clf.fit(data, data_labels)
    test_predictions = clf.predict(test)
    # accuracy_score takes (y_true, y_pred)
    acc = accuracy_score(test_labels, test_predictions)
    print(f"GaussianProcessClassifier with kernel {kernel}: result: {acc:.3f}")
```

```text
GaussianProcessClassifier with kernel 0.01**2 * RBF(length_scale=1): result: 0.766
GaussianProcessClassifier with kernel 0.1**2 * RBF(length_scale=1): result: 0.822
GaussianProcessClassifier with kernel 0.949**2 * RBF(length_scale=1): result: 0.822
GaussianProcessClassifier with kernel 1.05**2 * RBF(length_scale=1): result: 0.822
GaussianProcessClassifier with kernel 1.22**2 * RBF(length_scale=1): result: 0.766
GaussianProcessClassifier with kernel 0.316**2 * RBF(length_scale=0.1): result: 0.766
GaussianProcessClassifier with kernel 0.316**2 * RBF(length_scale=10): result: 0.766
```

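The printed kernels look different from the list in the code because `ConstantKernel` displays its value as a square: `0.0001` is shown as `0.01**2`, `0.9` as `0.949**2`, and `0.1` as `0.316**2`. They are also the initial kernels, since the loop prints `kernel` rather than the fitted `clf.kernel_`. The snippet below reproduces the display for three entries of `kernel_ranges` and reads the stored value back from the first factor (`k1`) of the product kernel.

```python
from sklearn.gaussian_process.kernels import RBF

# Three entries from kernel_ranges: a constant factor times an RBF kernel.
kernels = [0.0001 * RBF(1), 0.9 * RBF(1), 0.1 * RBF(10)]

for k in kernels:
    # `c * RBF(...)` builds a Product kernel whose first factor (k1) is a
    # ConstantKernel; it prints as sqrt(c)**2 but stores c itself.
    print(f"{k}  ->  constant_value = {k.k1.constant_value}")

# 0.01**2 * RBF(length_scale=1)  ->  constant_value = 0.0001
# 0.949**2 * RBF(length_scale=1)  ->  constant_value = 0.9
# 0.316**2 * RBF(length_scale=10)  ->  constant_value = 0.1
```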

```python
# test all combinations of the remaining parameters, with the kernel fixed to 1 * RBF(1)
n_restarts_optimizer_ranges = [0, 1, 5]
# max_iter_predict caps the Newton iterations used to approximate the posterior
max_iter_predict_ranges = [10, 100, 500]
# random_state seeds the extra optimizer restarts, so it only matters when n_restarts_optimizer > 0
random_state_ranges = [None, 42, 200]
# n_jobs only parallelizes the one-vs-rest fits of a multi-class problem
n_jobs_ranges = [None, 1, 4, 8]

max_acc = 0.0

for random_state in random_state_ranges:
    for n_restarts_optimizer in n_restarts_optimizer_ranges:
        for max_iter_predict in max_iter_predict_ranges:
            for n_jobs in n_jobs_ranges:
                clf = GaussianProcessClassifier(
                    kernel=1 * RBF(1.0),
                    n_restarts_optimizer=n_restarts_optimizer,
                    max_iter_predict=max_iter_predict,
                    random_state=random_state,
                    n_jobs=n_jobs,
                )
                clf.fit(data, data_labels)
                test_predictions = clf.predict(test)
                acc = accuracy_score(test_labels, test_predictions)
                max_acc = max(max_acc, acc)
                print(
                    f"Accuracy: {acc:.3f} | "
                    f"random_state: {random_state}, "
                    f"n_restarts: {n_restarts_optimizer}, "
                    f"max_iter: {max_iter_predict}, "
                    f"n_jobs: {n_jobs}"
                )

print(f"Maximum accuracy: {max_acc:.3f}")
```

Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 10, n_jobs: None
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 10, n_jobs: 1
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 10, n_jobs: 4
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 10, n_jobs: 8
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 100, n_jobs: None
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 100, n_jobs: 1
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 100, n_jobs: 4
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 100, n_jobs: 8
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 500, n_jobs: None
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 500, n_jobs: 1
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 500, n_jobs: 4
Accuracy: 0.822 | random_state: None, n_restarts: 0, max_iter: 500, n_jobs: 8
Accuracy: 0.822 | random_state: None, n_restarts: 1, max_it