# Task 1:

To compare the new algorithm with a baseline algorithm, we give the baseline its default settings and measure its accuracy on the dataset. Then we try 100 different hyper-parameter settings of the new algorithm, score each one with 10-fold cross-validation, and keep the setting with the best score to compare against the baseline.

Because model selection and evaluation are both based on only half of the dataset (not the full dataset), the scores contain sampling error, and picking the best of 100 settings is likely to over-estimate its performance. This means that when the two models are evaluated on the second half of the data, we should not necessarily expect the new algorithm (with the best-performing hyper-parameter setting) to outperform the baseline (with the standard hyper-parameter setting). The best model from the first half of the data might even be outperformed by the baseline on the second half.

However, if most of the 100 settings outperform the baseline on the first half, it is more reasonable to expect the same result on the second half, that is, that the best model will still outperform the baseline.
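The over-estimation argument can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the task: every candidate setting is given the same true accuracy (`TRUE_ACC`), each score is treated as an independent measurement on 250 instances, and cross-validation is not modelled. All names and constants are hypothetical.

```python
import random

# Hypothetical setup: every candidate setting has the SAME true accuracy,
# so any difference between measured scores is pure sampling noise.
TRUE_ACC = 0.90      # assumed true accuracy of every setting
N_SAMPLES = 250      # size of the half used for model selection
N_CANDIDATES = 100   # number of settings tried
N_TRIALS = 100       # repetitions of the whole experiment

rng = random.Random(0)

def measured_accuracy():
    """Accuracy observed on N_SAMPLES instances for a model with TRUE_ACC."""
    correct = sum(rng.random() < TRUE_ACC for _ in range(N_SAMPLES))
    return correct / N_SAMPLES

best_scores = []    # score of the winning candidate in each trial
fresh_scores = []   # score of that winner on fresh data

for _ in range(N_TRIALS):
    # Selection step: keep the highest of the noisy scores
    best_scores.append(max(measured_accuracy() for _ in range(N_CANDIDATES)))
    # The winner's true accuracy is still TRUE_ACC, so a fresh
    # evaluation is just another independent measurement
    fresh_scores.append(measured_accuracy())

mean_best = sum(best_scores) / N_TRIALS
mean_fresh = sum(fresh_scores) / N_TRIALS
print(f"Mean score of the selected setting: {mean_best:.3f}")
print(f"Mean score of that setting on fresh data: {mean_fresh:.3f}")
```

Under these assumptions the selected score sits above the true accuracy purely because it is the maximum of many noisy measurements, while the fresh score does not. If the candidates really differ in quality, part of the advantage can carry over to the second half.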

The code is as follows.


In [11]:
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
import numpy as np

# Generate dataset
X, y = make_classification(n_samples=500, n_features=10, n_classes=2, random_state=5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=5)

# Baseline model
baseline = RandomForestClassifier(random_state=5)
baseline_acc = cross_val_score(baseline, X_train, y_train, cv=10, scoring='accuracy').mean()

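# Find best model: random search over 100 hyper-parameter settings,
# each scored by 10-fold cross-validation on the first half of the data
best_model = None
best_acc = 0

for i in range(100):
    # Draw a random setting (upper bounds are exclusive; np.random is not seeded here)
    n_estimators = np.random.randint(20, 80)
    max_depth = np.random.randint(2, 8)
    max_features = np.random.randint(2, 8)

    rfc = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, max_features=max_features, random_state=42)
    acc = cross_val_score(rfc, X_train, y_train, cv=10, scoring='accuracy').mean()

    # Keep the setting with the highest mean cross-validated accuracy
    if acc > best_acc:
        best_model = rfc
        best_acc = acc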

# Evaluate best model and baseline on test set
best_model.fit(X_train, y_train)
best_acc_test = accuracy_score(y_test, best_model.predict(X_test))

baseline.fit(X_train, y_train)
baseline_acc_test = accuracy_score(y_test, baseline.predict(X_test))

print(f"CV accuracy on first half (baseline): {baseline_acc:.3f}")
print(f"CV accuracy on first half (best model): {best_acc:.3f}")
print(f"Test set accuracy (baseline): {baseline_acc_test:.3f}")
print(f"Test set accuracy (best model): {best_acc_test:.3f}")

CV accuracy on first half (baseline): 0.908
CV accuracy on first half (best model): 0.912
Test set accuracy (baseline): 0.932
Test set accuracy (best model): 0.920


From the result, we can see that on the first half of the dataset the best model has a slightly higher cross-validated accuracy than the baseline (0.912 vs. 0.908). On the second half of the dataset, however, the baseline has the higher accuracy (0.932 vs. 0.920), which is consistent with the over-estimation argued above.

# Task 2:

In this task, we choose an arbitrary binary classification model and evaluate it on the dataset (without training), collecting its accuracy and AUC. We also observe a much higher precision for the majority class than for the minority class.

Next, we form a new dataset from all the instances of the minority class plus 1000 instances sampled (without replacement) from the majority class, and apply the same binary classification model to it.

1. For accuracy, I think it will drop on the new dataset. In the original dataset, most of the accuracy comes from the majority class. The new dataset contains fewer majority instances, so the minority class, on which the model makes more mistakes, carries more weight. The proportion of misclassified instances therefore increases, which causes the accuracy to decrease.

2. For the AUC, we do not expect a large change, because the AUC is not affected by the class proportions. Since the new dataset contains only a random sample of the majority instances, the AUC may still change slightly.

3. Lastly, the precision might increase for the minority class and decrease for the majority class. By constructing the new dataset, some majority instances that were mistakenly predicted as minority are removed, so the minority class has fewer false positives and its precision increases. At the same time, correctly predicted majority instances are also removed while the minority instances mistakenly predicted as majority all remain, so the precision of the majority class decreases.
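The three expectations above can be checked with a little arithmetic on a confusion matrix. The sketch below assumes hypothetical per-class recall values (0.95 for the majority class, 0.25 for the minority class) and class sizes of 4000 and 1000. These numbers are illustrative only and are not taken from the experiment. The key assumption is that per-class recall stays the same when majority instances are removed at random.

```python
def metrics(n_maj, n_min, recall_maj, recall_min):
    """Accuracy and per-class precision from class sizes and per-class recall."""
    tn = recall_maj * n_maj   # majority instances predicted as majority
    fp = n_maj - tn           # majority instances predicted as minority
    tp = recall_min * n_min   # minority instances predicted as minority
    fn = n_min - tp           # minority instances predicted as majority
    accuracy = (tp + tn) / (n_maj + n_min)
    precision_maj = tn / (tn + fn)
    precision_min = tp / (tp + fp)
    return accuracy, precision_maj, precision_min

# Hypothetical original dataset: 4000 majority, 1000 minority instances
original = metrics(4000, 1000, recall_maj=0.95, recall_min=0.25)
# Hypothetical new dataset: 1000 sampled majority, all 1000 minority instances
balanced = metrics(1000, 1000, recall_maj=0.95, recall_min=0.25)

for name, (acc, p_maj, p_min) in [("original", original), ("new", balanced)]:
    print(f"{name}: accuracy={acc:.3f}, "
          f"precision(majority)={p_maj:.3f}, precision(minority)={p_min:.3f}")
```

With these assumed rates, accuracy falls from 0.81 to 0.60, majority precision falls, and minority precision rises, even though the model itself has not changed.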

The code is as follows.


In [16]:
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score
from sklearn.linear_model import LogisticRegression
import numpy as np

# create the original dataset
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5, n_redundant=5, n_classes=2,
                           weights=[0.8, 0.2], class_sep=1.0, random_state=48)

# create the model and fit it on the original dataset
model = LogisticRegression(random_state=48)
model.fit(X, y)
y_pred = model.predict(X)

# calculate the accuracy, AUC, and precision for the original dataset
acc = accuracy_score(y, y_pred)
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
prec_maj = precision_score(y, y_pred, pos_label=0)
prec_min = precision_score(y, y_pred, pos_label=1)

print("Results on the original dataset:")
print(f"Accuracy: {acc:.3f}")
print(f"AUC: {auc:.3f}")
print(f"Precision for majority class: {prec_maj:.3f}")
print(f"Precision for minority class: {prec_min:.3f}")

# create the new dataset, use the same model to predict 
# and calculate the accuracy, AUC, and precision on it
maj_indices = np.random.choice(np.where(y == 0)[0], size=1000, replace=False)
X_new = np.vstack([X[y == 1], X[maj_indices]])
y_new = np.concatenate([y[y == 1], y[maj_indices]])

y_pred_new = model.predict(X_new)

acc_new = accuracy_score(y_new, y_pred_new)
auc_new = roc_auc_score(y_new, model.predict_proba(X_new)[:, 1])
prec_maj_new = precision_score(y_new, y_pred_new, pos_label=0)
prec_min_new = precision_score(y_new, y_pred_new, pos_label=1)

print("\nResults on the new dataset:")
print(f"Accuracy: {acc_new:.3f}")
print(f"AUC: {auc_new:.3f}")
print(f"Precision for majority class: {prec_maj_new:.3f}")
print(f"Precision for minority class: {prec_min_new:.3f}")

Results on the original dataset:
Accuracy: 0.808
AUC: 0.776
Precision for majority class: 0.825
Precision for minority class: 0.572

Results on the new dataset:
Accuracy: 0.575
AUC: 0.772
Precision for majority class: 0.541
Precision for minority class: 0.818


From the result, we can see that the accuracy does drop on the new dataset (from 0.808 to 0.575), while the AUC is nearly the same (0.776 vs. 0.772). As for precision, on the original dataset the precision for the majority class is much higher than that for the minority class (0.825 vs. 0.572); on the new dataset it decreases for the majority class (to 0.541) and increases for the minority class (to 0.818).
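The near-identical AUC values can be understood by viewing the AUC as the fraction of (minority, majority) pairs in which the minority instance gets the higher score. Sampling the majority class changes how many pairs there are, but not how the scores are distributed. The sketch below illustrates this with synthetic Gaussian scores; the score distributions and sample sizes are assumptions made for illustration and do not come from the logistic regression model above.

```python
import random

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

rng = random.Random(1)
# Hypothetical model scores: minority (positive) instances tend to score higher
pos = [rng.gauss(1.0, 1.0) for _ in range(500)]
neg = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Keep all positives, sample negatives without replacement
neg_sub = rng.sample(neg, 500)

auc_full = auc(pos, neg)
auc_sub = auc(pos, neg_sub)
print(f"AUC with all majority instances: {auc_full:.3f}")
print(f"AUC with sampled majority instances: {auc_sub:.3f}")
```

The two values differ only through sampling noise, which matches the small change observed between the original and the new dataset.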