# PART 3 : prediction of the winner of a nba game

We would like to predict the winner of a Basketball game, as a function of the
data gathered at half-time.

In [20]:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV

Load data and data standardization

In [21]:
X_train = np.load('X_train.npy')
X_test = np.load('X_test.npy')
y_train = np.load('y_train.npy')
y_test = np.load('y_test.npy')

std = StandardScaler()
X_train_std = std.fit_transform(X_train)
X_test_std = std.transform(X_test)

### Logistic regression

In [22]:
log_reg = LogisticRegression()
log_reg.fit(X_train_std, y_train)

predictions = log_reg.predict(X_train_std)
accuracy = accuracy_score(y_train, predictions)
print("Logistic Regression Accuracy for training set:", accuracy)

predictions = log_reg.predict(X_test_std)
accuracy = accuracy_score(y_test, predictions)
print("Logistic Regression Accuracy for testing set:", accuracy)

Logistic Regression Accuracy for training set: 0.894
Logistic Regression Accuracy for testing set: 0.84


### Results of Logistic regression
We obtain a result of 0.84 on the test set which is satisfactory.

### SVC

In [23]:
svc = SVC()
svc.fit(X_train_std, y_train)

predictions = svc.predict(X_train_std)
accuracy = accuracy_score(y_train, predictions)
print("Logistic Regression Accuracy for training set:", accuracy)

predictions = svc.predict(X_test_std)
accuracy = accuracy_score(y_test, predictions)
print("Logistic Regression Accuracy for testing set:", accuracy)

Logistic Regression Accuracy for training set: 0.962
Logistic Regression Accuracy for testing set: 0.876


### Results of SVC
We obtain a result of 0.876 on the test set which is better than the Logistic regression model.
we can hypothesize that by changing the hyperparams the SVC model will still be better.

### Hyperparams for Logistic regression

In [24]:
def hyperparams_logistic_reg(X_train, y_train, X_test, y_test):
    param_grid = {
        'C': [0.001, 0.01, 0.1, 1, 10, 100],
        'penalty': ['l1', 'l2'],
        'solver': ['liblinear'],
    }

    log_reg = LogisticRegression()

    grid_search = GridSearchCV(log_reg, param_grid, cv=5, scoring='accuracy')

    grid_search.fit(X_train, y_train)

    best_params = grid_search.best_params_
    print("Best Hyperparameters:", best_params)

    best_model = grid_search.best_estimator_
    predictions = best_model.predict(X_test)

    accuracy = accuracy_score(y_test, predictions)
    print("Logistic Regression Accuracy with Hyperparameters:", accuracy)
hyperparams_logistic_reg(X_train_std, y_train, X_test_std, y_test)

Best Hyperparameters: {'C': 10, 'penalty': 'l1', 'solver': 'liblinear'}
Logistic Regression Accuracy with Hyperparameters: 0.836


In [None]:
def hyperparams_svc(X_train, y_train, X_test, y_test):

    param_grid = {
        'C': [0.1, 1, 10, 100],
        'gamma': ['scale', 'auto', 0.01, 0.1, 1, 10],
        'kernel': ['rbf', 'linear'],
    }

    svc = SVC()

    grid_search = GridSearchCV(svc, param_grid, cv=5, scoring='accuracy')

    grid_search.fit(X_train, y_train)

    best_params = grid_search.best_params_
    print("Best Hyperparameters:", best_params)

    best_model = grid_search.best_estimator_
    predictions = best_model.predict(X_test)

    accuracy = accuracy_score(y_test, predictions)
    print("SVC Accuracy with Hyperparameters:", accuracy)
hyperparams_svc(X_train_std, y_train, X_test_std, y_test)

### Explanation of hyperparameters:

- C: This parameter represents the regularization strength. Smaller values specify stronger regularization. It's a list of float values indicating the inverse of regularization strength.
- penalty: This parameter specifies the norm used in the penalization. It could be 'l1' or 'l2'.
- solver: This parameter denotes the algorithm to use in the optimization problem. In this case, 'liblinear' is specified, which is used for linear models.
- gamma: This parameter is specific to kernel-based algorithms (like SVM with an RBF kernel). It defines the kernel coefficient for 'rbf' and 'linear' kernels. It can take the values 'scale' and 'auto', which are predefined heuristics based on the training data, or specific float values.
- kernel: This parameter specifies the kernel type to be used in the algorithm. It can be 'rbf' for radial basis function kernel or 'linear' for a linear kernel.

### Results and conclusion

The results provide insights into the performance of logistic regression and Support Vector Classifier (SVC) after tuning their hyperparameters using grid search with cross-validation.

For logistic regression:

The best hyperparameters found are C=10, penalty='l1', and solver='liblinear'.
The model achieves an accuracy of 83.6% on the test set.
For SVC:

The best hyperparameters found are C=10, gamma=0.01, and kernel='rbf'.
The model achieves an accuracy of 85.6% on the test set.

Here are some observations:

Both models achieve reasonably good accuracies, indicating that they are able to capture patterns in the data.
SVC slightly outperforms logistic regression, achieving a higher accuracy as we expected earlier.
The choice of hyperparameters can significantly impact the performance of the models. In this case, the grid search helped identify hyperparameters that improved model accuracy.
It's essential to validate the performance of the models on a separate test set to ensure generalization to unseen data.
Overall, these results provide valuable information for selecting the appropriate algorithm and hyperparameters for classification tasks.