# Models without pre-processing

This notebook trains five scikit-learn classifiers (RandomForest, GradientBoosting, LogisticRegression, SVC, and DecisionTree) with default hyperparameters, apart from a fixed `random_state` and `max_iter=1000` for LogisticRegression. The data is split 60/20/20 into training, validation, and test sets, and the features are standardized with `StandardScaler` fitted on the training set only; no other pre-processing is applied. A helper function fits each model and measures its accuracy on the validation and test sets. The results are stored in a dictionary and printed, so the algorithms can be compared on the same dataset.

**Results:** No model performs well. Validation accuracy ranges from 0.16 to 0.26 and test accuracy from 0.14 to 0.28.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

In [2]:
df = pd.read_csv('../data/combined_dataset_100_partitions.csv')

In [3]:
X = df.drop('target', axis=1)
y = df['target']
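
Accuracy is easier to interpret against a trivial baseline. The sketch below is not part of the original notebook: it computes the accuracy obtained by always predicting the most frequent training label. The helper `majority_baseline_accuracy` and the toy labels are hypothetical and only illustrate the idea; scikit-learn's `DummyClassifier(strategy="most_frequent")` provides the same baseline.

```python
from collections import Counter

def majority_baseline_accuracy(y_train, y_eval):
    """Accuracy obtained by always predicting the most common training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for label in y_eval if label == majority_label)
    return majority_label, hits / len(y_eval)

# Hypothetical labels for illustration only; the real ones come from df['target'].
toy_train = ["a", "b", "a", "c", "a", "b"]
toy_eval = ["a", "c", "b", "a"]

label, baseline = majority_baseline_accuracy(toy_train, toy_eval)
print(f"Always predicting {label!r} scores {baseline:.2f}")
```

A model whose accuracy does not clearly exceed this number has probably learned little from the features.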

In [4]:
# Hold out 20% of the rows as the test set
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
# Take 25% of the remaining 80% as validation, giving a 60/20/20 split overall
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.25, random_state=42)
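
The two `train_test_split` calls chain their fractions: the second `test_size` applies to what is left after the first. The short sketch below works out the resulting proportions of the full dataset. The helper `split_fractions` is a hypothetical name introduced here for illustration.

```python
def split_fractions(test_size, val_size_of_remainder):
    """Return (train, val, test) fractions of the full dataset for a two-stage split."""
    remainder = 1.0 - test_size                 # share left after removing the test set
    val = remainder * val_size_of_remainder     # validation share of the full dataset
    train = remainder - val                     # whatever is left is used for training
    return train, val, test_size

# Values used in the two cells above: test_size=0.2, then test_size=0.25
train_frac, val_frac, test_frac = split_fractions(0.2, 0.25)
print(f"train={train_frac:.2f}, val={val_frac:.2f}, test={test_frac:.2f}")
```

With the values used in this notebook the split comes out as 60% training, 20% validation, and 20% test.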

In [6]:
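# Fit the scaler on the training set only, then reuse its statistics
# for the validation and test sets to avoid leaking information from them
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)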
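
The scaler is fitted on the training split and only applied to the other two, so validation and test rows are transformed with the training mean and standard deviation. The following is a minimal plain-Python sketch of that idea for a single feature, assuming the population standard deviation (ddof=0). The function names and the toy values are hypothetical; this is an illustration, not the library's implementation.

```python
from statistics import fmean, pstdev

def fit_standardizer(train_values):
    """Learn the mean and population standard deviation from training values only."""
    mean = fmean(train_values)
    std = pstdev(train_values)
    # Guard against a constant feature, which would otherwise divide by zero
    return mean, (std if std > 0 else 1.0)

def standardize(values, mean, std):
    """Apply previously learned statistics to any split."""
    return [(v - mean) / std for v in values]

# Toy feature values for illustration only
train_feature = [2.0, 4.0, 6.0, 8.0]
val_feature = [5.0, 11.0]

mean, std = fit_standardizer(train_feature)
train_scaled = standardize(train_feature, mean, std)
val_scaled = standardize(val_feature, mean, std)  # uses the training statistics
print(train_scaled, val_scaled)
```

The scaled training values have mean 0 and unit variance by construction; the validation values generally do not, because they are shifted and scaled with statistics they did not contribute to.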

In [7]:
models = {
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
    "LogisticRegression": LogisticRegression(random_state=42, max_iter=1000),  # Raised from the default of 100 to help convergence
    "SVC": SVC(random_state=42),
    "DecisionTree": DecisionTreeClassifier(random_state=42)
}

In [8]:
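def train_and_evaluate(model, X_train, y_train, X_val, y_val, X_test, y_test):
    """Fit the model on the training set and return (validation accuracy, test accuracy)."""
    model.fit(X_train, y_train)
    # Accuracy on the validation set
    y_val_pred = model.predict(X_val)
    val_accuracy = accuracy_score(y_val, y_val_pred)
    # Accuracy on the held-out test set
    y_test_pred = model.predict(X_test)
    test_accuracy = accuracy_score(y_test, y_test_pred)
    return val_accuracy, test_accuracy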

In [9]:
results = {}
for model_name, model in models.items():
    val_acc, test_acc = train_and_evaluate(model, X_train, y_train, X_val, y_val, X_test, y_test)
    results[model_name] = {'Validation Accuracy': val_acc, 'Test Accuracy': test_acc}

In [10]:
for model_name, metrics in results.items():
    print(f'{model_name} - Validation Accuracy: {metrics["Validation Accuracy"]:.2f}, Test Accuracy: {metrics["Test Accuracy"]:.2f}')

RandomForest - Validation Accuracy: 0.26, Test Accuracy: 0.21
GradientBoosting - Validation Accuracy: 0.26, Test Accuracy: 0.23
LogisticRegression - Validation Accuracy: 0.16, Test Accuracy: 0.14
SVC - Validation Accuracy: 0.19, Test Accuracy: 0.28
DecisionTree - Validation Accuracy: 0.26, Test Accuracy: 0.21
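
To make the comparison easier to read, the printed accuracies can be ranked and the gap between test and validation accuracy computed. The sketch below copies the rounded values from the output above into a hypothetical `reported` dictionary; in the notebook itself the same could be done directly on `results`.

```python
# (validation accuracy, test accuracy) as printed above, rounded to two decimals
reported = {
    "RandomForest": (0.26, 0.21),
    "GradientBoosting": (0.26, 0.23),
    "LogisticRegression": (0.16, 0.14),
    "SVC": (0.19, 0.28),
    "DecisionTree": (0.26, 0.21),
}

# One row per model: name, validation, test, and the test-minus-validation gap
rows = [(name, val, test, round(test - val, 2)) for name, (val, test) in reported.items()]
rows.sort(key=lambda r: r[1], reverse=True)  # best validation accuracy first

for name, val, test, gap in rows:
    print(f"{name:<20} val={val:.2f} test={test:.2f} gap={gap:+.2f}")
```

SVC is the only model whose test accuracy exceeds its validation accuracy (by 0.09); the other four drop by 0.02 to 0.05. Swings of this size relative to the accuracies themselves suggest that the differences between models may be within noise.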
