In [1]:
# packages
import pandas as pd
from mod02_build_bot_predictor import train_model

### Define a function to extract predictions from the model

In [2]:
def predict_bot(df, model=None):
    """
    Predict whether each account is a bot (1) or human (0).
    """
    if model is None:
        # Fallback: assumes train_model() can be called with no arguments.
        model = train_model()

    preds = model.predict(df)
    return pd.Series(preds, index=df.index)
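
A minimal sketch of why the predictions are wrapped in a `pd.Series` with `index=df.index`: the output stays aligned with the rows of the input, so it can be compared against the true labels later. The `AlwaysHuman` class, the `predict_with` helper, and the `followers` column are hypothetical stand-ins for illustration only; they are not part of `mod02_build_bot_predictor`.

```python
import pandas as pd


class AlwaysHuman:
    """Hypothetical stand-in model that predicts 0 (human) for every row."""

    def predict(self, df):
        return [0] * len(df)


def predict_with(df, model):
    # Same wrapping step as predict_bot, without the train_model fallback.
    return pd.Series(model.predict(df), index=df.index)


# Made-up accounts with a non-default index to show that it is preserved.
accounts = pd.DataFrame({"followers": [10, 2500, 3]}, index=["a", "b", "c"])
preds = predict_with(accounts, AlwaysHuman())
print(preds)
```

Any object with a `predict` method that returns one label per row should work the same way here.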

### Define a function to evaluate model error

In [3]:
def confusion_matrix_and_metrics(y_true, y_pred):
    """
    Computes confusion matrix and common error rates for binary classification.

    Assumes labels:
      0 = negative class
      1 = positive class

    Returns:
      dict with:
        tn, fp, fn, tp
        misclassification_rate
        false_positive_rate
        false_negative_rate
    """
    tn = fp = fn = tp = 0

    for yt, yp in zip(y_true, y_pred):
        if yt == 0 and yp == 0:
            tn += 1
        elif yt == 0 and yp == 1:
            fp += 1
        elif yt == 1 and yp == 0:
            fn += 1
        elif yt == 1 and yp == 1:
            tp += 1
        else:
            raise ValueError("Labels must be 0 or 1")

    total = tn + fp + fn + tp

    misclassification_rate = (fp + fn) / total if total > 0 else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) > 0 else 0.0
    false_negative_rate = fn / (fn + tp) if (fn + tp) > 0 else 0.0

    return {
        "tp": tp,
        "tn": tn,
        "fp": fp,
        "fn": fn,
        "misclassification_rate": misclassification_rate,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
    }
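
As a cross-check on the loop above, the four counts can also be tallied by counting `(true, predicted)` pairs. This is only a sketch: `confusion_counts` is a hypothetical helper name, and unlike the function above it does not raise an error for labels other than 0 or 1.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Tally tn, fp, fn, tp for 0/1 labels by counting (true, pred) pairs."""
    pairs = Counter(zip(y_true, y_pred))
    return {
        "tn": pairs[(0, 0)],
        "fp": pairs[(0, 1)],
        "fn": pairs[(1, 0)],
        "tp": pairs[(1, 1)],
    }


# Small made-up example: two humans (0) followed by three bots (1).
example = confusion_counts([0, 0, 1, 1, 1], [0, 1, 0, 1, 1])
print(example)  # {'tn': 1, 'fp': 1, 'fn': 1, 'tp': 2}
```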


### Load the data

In [4]:
TRAIN_PATH = "mod02_data/train.csv"
train = pd.read_csv(TRAIN_PATH)

TEST_PATH = "mod02_data/test.csv"
test = pd.read_csv(TEST_PATH)

### Format the data by independent vs. dependent variables

In [5]:
X_train = train.drop(columns=["is_bot"])
y_train = train["is_bot"]

X_test = test.drop(columns=["is_bot"])
y_test = test["is_bot"]

### Build the model on training data

In [6]:
model = train_model(X_train, y_train)

### Get the model predictions on training and test data

In [7]:
y_pred_train = predict_bot(X_train, model)
y_pred_test = predict_bot(X_test, model)

### Check results on the training set (data used to build the model)

In [8]:
confusion_matrix_and_metrics(y_train, y_pred_train)

{'tp': 75,
 'tn': 2592,
 'fp': 45,
 'fn': 288,
 'misclassification_rate': 0.111,
 'false_positive_rate': 0.017064846416382253,
 'false_negative_rate': 0.7933884297520661}

### Check results on the test set (new data not yet seen by the model)

In [9]:
confusion_matrix_and_metrics(y_test, y_pred_test)

{'tp': 28,
 'tn': 858,
 'fp': 16,
 'fn': 98,
 'misclassification_rate': 0.114,
 'false_positive_rate': 0.018306636155606407,
 'false_negative_rate': 0.7777777777777778}
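
One way to put the test misclassification rate in context is to compare it with a trivial baseline that labels every account as human. The sketch below uses only the four test counts printed above; the variable names are illustrative.

```python
# Test-set counts copied from the output above.
test_counts = {"tp": 28, "tn": 858, "fp": 16, "fn": 98}

total = sum(test_counts.values())

# Share of test accounts that are actually bots.
bot_share = (test_counts["tp"] + test_counts["fn"]) / total

# A rule that always predicts "human" misses every bot and nothing else,
# so its misclassification rate equals the share of bots.
baseline_error = bot_share

model_error = (test_counts["fp"] + test_counts["fn"]) / total

print(f"always-human baseline error: {baseline_error:.3f}")  # 0.126
print(f"model error:                 {model_error:.3f}")  # 0.114
```

Because bots make up about 12.6% of the test set, an 11.4% misclassification rate is only slightly better than never predicting a bot at all.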

# Discussion Questions

### Based on the misclassification rate of your model, discuss your confidence in the ability to predict a bot. 

With a misclassification rate of 11.4% on the test set, I would believe that this model could predict bots rather accurately. However, the model has a 78% false negative rate, which means that a large majority of the time it mistakes a bot for a human. With only a 2% false positive rate, the model is good at recognizing a human as a human. Taken together, if the model predicts a bot, I would be mostly confident that the account is a bot. However, if the model predicts a human, I am not confident that the account is actually a human.
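
The confidence statements above can be checked against the test counts by asking how often each kind of prediction is correct. This is a rough sketch using the standard definitions of precision, negative predictive value, and recall; none of these are computed by `confusion_matrix_and_metrics`, so the names below are additions for illustration.

```python
# Test-set counts copied from the confusion matrix output.
tp, tn, fp, fn = 28, 858, 16, 98

# Of the accounts predicted to be bots, the fraction that really are bots.
precision = tp / (tp + fp)

# Of the accounts predicted to be human, the fraction that really are human.
negative_predictive_value = tn / (tn + fn)

# Of the actual bots, the fraction the model catches (1 - false negative rate).
recall = tp / (tp + fn)

print(f"precision: {precision:.3f}")  # 0.636
print(f"negative predictive value: {negative_predictive_value:.3f}")  # 0.897
print(f"recall: {recall:.3f}")  # 0.222
```

On these counts, a "bot" prediction is correct about 64% of the time and a "human" prediction about 90% of the time, while the model catches only about 22% of the actual bots.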

### What are potential ramifications of false positives from the model?

The issue caused by false positives is denying access to an actual user by mistaking them for a bot. The ramifications depend on how the model is being used, but will likely involve an annoyed user and reduced access to whatever the model is meant to reserve for humans.

### What are potential ramifications of false negatives from the model?

The issue caused by false negatives is allowing access to bots by mistaking them for humans. The ramifications depend on how the model is being used, but will likely involve various types of bots invading whatever the model is meant to reserve for humans. This will likely also annoy users if they realize that a bot is interacting with something that only humans should be interacting with.