In [7]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [8]:
# Load dataset
data = pd.read_csv('../datasets/Obfuscated/Obfuscated-MalMem2022_edited.csv')

In [9]:
# Encode categorical target
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(data['Category'])

In [10]:
# Features
X_drop_columns = ['Class', 
                'Category', 
                'svcscan.interactive_process_services', 
                'handles.nport', 
                'modules.nmodules',
                'pslist.nprocs64bit', 
                'callbacks.ngeneric']
X = data.drop(columns=X_drop_columns)
y = data['Category']  # raw string labels; the split below uses y_encoded instead

In [11]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.25, random_state=42)

In [12]:
# Gaussian Process Classifier
kernel = 1.0 * RBF(length_scale=2.0)
gpc = GaussianProcessClassifier(kernel=kernel, random_state=42)

In [13]:
# Train model
gpc.fit(X_train, y_train)

# Oops: fit() runs out of memory on the full training set

MemoryError: Unable to allocate 7.19 GiB for an array with shape (965647431,) and data type float64
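
The array shape in the traceback is consistent with a condensed pairwise-distance vector, which holds n(n-1)/2 values for n training rows. The sketch below checks that arithmetic. The training-set size `n_train = 43_947` is an assumption inferred from the shape in the error message, not a value printed by the notebook, and `condensed_pairwise_bytes` is a hypothetical helper.

```python
def condensed_pairwise_bytes(n_samples: int, itemsize: int = 8) -> int:
    """Bytes needed to store the n*(n-1)/2 pairwise distances of n_samples rows."""
    n_pairs = n_samples * (n_samples - 1) // 2
    return n_pairs * itemsize


# Assumption: 43,947 training rows, inferred from the shape in the traceback.
n_train = 43_947
n_pairs = n_train * (n_train - 1) // 2
gib = condensed_pairwise_bytes(n_train) / 2**30  # float64 = 8 bytes per value

print(n_pairs)         # number of pairwise distances
print(round(gib, 2))   # size in GiB
```

Because the pair count grows quadratically, halving the number of training rows cuts this allocation to roughly a quarter.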

In [None]:
# Predict and evaluate
y_pred = gpc.predict(X_test)
print('Training accuracy:', gpc.score(X_train, y_train))
print('Test accuracy:', gpc.score(X_test, y_test))

In [None]:
print(f"Accuracy score: {accuracy_score(y_test, y_pred)}")
print(f"Precision score: {precision_score(y_test, y_pred, average='weighted', zero_division=0)}")
print(f"Recall score: {recall_score(y_test, y_pred, average='weighted', zero_division=0)}")
print(f"F-1 score: {f1_score(y_test, y_pred, average='weighted', zero_division=0)}")

**Gaussian Process**

A Gaussian Process (GP) is a probabilistic model used for regression and classification. It is non-parametric: instead of fixing a functional form with a finite set of parameters, it places a prior distribution over functions, and a kernel determines how similar the outputs should be for two given inputs.

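In the code above, the Gaussian Process is used for classification through `GaussianProcessClassifier` from the `sklearn.gaussian_process` module. The classifier is initialized with the kernel `1.0 * RBF(length_scale=2.0)`, a constant factor multiplied by a Radial Basis Function (RBF) kernel. The RBF kernel is a common choice for Gaussian Processes; its `length_scale` controls how quickly similarity decays with the distance between two points.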
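
To make the kernel concrete, here is a small sketch that evaluates the same kernel on three toy points and compares it with the RBF formula k(x, x') = exp(-||x - x'||² / (2 · length_scale²)). The points in `X_toy` are illustrative values, not rows from the dataset.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

kernel = 1.0 * RBF(length_scale=2.0)  # same kernel as in the notebook

# Three toy 1-D points (illustrative, not taken from the dataset)
X_toy = np.array([[0.0], [2.0], [4.0]])

# Calling the kernel on X_toy returns the 3x3 kernel (covariance) matrix
K = kernel(X_toy)

# Manual evaluation of the RBF formula for comparison
sq_dists = (X_toy - X_toy.T) ** 2
K_manual = 1.0 * np.exp(-sq_dists / (2 * 2.0 ** 2))

print(np.round(K, 4))
print(np.allclose(K, K_manual))
```

The diagonal entries are 1 (each point compared with itself), and the off-diagonal entries shrink as the points move further apart.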

The classifier is meant to be trained on `X_train` and `y_train` with the `fit` method and then used to predict labels for `X_test` with the `predict` method, storing the result in `y_pred`. In this run, however, `fit` raised a `MemoryError` while trying to allocate 7.19 GiB, so the prediction and evaluation cells were never executed and have no output.
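
One possible way to get the fit and predict steps to run is to train on a small stratified subset of the training rows. The following is a minimal sketch under stated assumptions, not the notebook's method: the data come from `make_classification` as a synthetic stand-in for the malware features, and the subset size of 300 rows is an arbitrary illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split

# Assumption: synthetic 3-class data standing in for the real dataset
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    n_classes=3, random_state=42,
)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# Keep only 300 training rows, preserving the class proportions
X_sub, _, y_sub, _ = train_test_split(
    X_tr, y_tr, train_size=300, random_state=42, stratify=y_tr
)

gpc_demo = GaussianProcessClassifier(
    kernel=1.0 * RBF(length_scale=2.0), random_state=42
)
gpc_demo.fit(X_sub, y_sub)             # kernel matrix is only 300 x 300
y_pred_demo = gpc_demo.predict(X_te)   # predict on the full test split

print('Test accuracy on synthetic data:', gpc_demo.score(X_te, y_te))
```

Subsampling trades accuracy for feasibility: the model sees fewer examples, so results on the real dataset would need to be checked for several subset sizes.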

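The evaluation cells compare the predicted labels with the true labels (`y_test`) using `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` from the `sklearn.metrics` module. Precision, recall, and F1 are computed with `average='weighted'`, which averages the per-class scores weighted by the number of true samples in each class, and `zero_division=0`, which reports 0 instead of a warning for a class that is never predicted. The training and test accuracy are also obtained from `gpc.score`.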
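
As a small worked example of the weighted metrics, the sketch below scores six hand-made labels. The label lists are illustrative values, not predictions from the model.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy labels for three classes (illustrative, not model output)
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [0, 0, 1, 2, 2, 2]  # one class-1 sample is mislabeled as class 2

acc = accuracy_score(y_true_toy, y_pred_toy)
prec = precision_score(y_true_toy, y_pred_toy, average='weighted', zero_division=0)
rec = recall_score(y_true_toy, y_pred_toy, average='weighted', zero_division=0)
f1 = f1_score(y_true_toy, y_pred_toy, average='weighted', zero_division=0)

print(f"Accuracy:  {acc:.4f}")
print(f"Precision: {prec:.4f}")
print(f"Recall:    {rec:.4f}")
print(f"F1:        {f1:.4f}")
```

Weighted recall always equals accuracy, because weighting each class's recall by its support counts every correctly classified sample exactly once. The two lines will therefore show the same value in the notebook as well.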

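Overall, Gaussian Processes are a flexible, non-parametric approach that suits problems where the relationship between inputs and outputs is complex and not well understood, and they return class probabilities rather than only hard labels. Their main limitation is cost: exact inference requires memory that grows quadratically and computation that grows cubically with the number of training samples, which is what makes fitting on the full training set fail here.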