# Baseline Classifiers (Dummy Classifiers)

Baseline classifiers make predictions with simple rules, typically without using any of the input features. If our ML models can't beat the baseline, we need to rethink them.
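As a minimal sketch of that comparison, the cell below scores a baseline and a real model on the same held-out split. The synthetic data from `make_classification` and the choice of `LogisticRegression` are assumptions made purely for illustration; they are not the dataset or model used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: NOT the wine-quality dataset).
X_syn, y_syn = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0,
    n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# Hold out a test set so both models are judged on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.25, stratify=y_syn, random_state=0
)

# The baseline ignores the features; the real model uses them.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
print(f"Baseline accuracy: {baseline_acc:.3f}")
print(f"Model accuracy:    {model_acc:.3f}")
```

A model is only worth keeping if its held-out score is clearly above the baseline's score on the same split.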

In [None]:
import os, sys
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier

## Load Dataset

We will load the dataset from a file into a pandas DataFrame and inspect its structure.


In [None]:
# Dataset location
DATASET = '/dsa/data/all_datasets/wine-quality/winequality-red.csv'
assert os.path.exists(DATASET)

# Load and shuffle
dataset = pd.read_csv(DATASET, sep=';').sample(frac = 1).reset_index(drop=True)

# View some metadata of the dataset and see if that makes sense
print('dataset.shape', dataset.shape)

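# Keep a subset of the feature columns (selected by position) as inputs
X = np.array(dataset.iloc[:, :-1])[:, [1, 2, 6, 9, 10]]
# Use the `quality` column as the class label
y = np.array(dataset.quality)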

print('X', X.shape, 'y', y.shape)
print('Label distribution:', {i: np.sum(y==i) for i in np.unique(dataset.quality)})

### Describe the dataset

In [None]:
dataset.describe()

### Building a dummy classifier using sklearn DummyClassifier

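Let's create baselines with sklearn `DummyClassifier`, which supports the following strategies:

* Most Frequent: The classifier always predicts the most frequent class label in the training data.
* Prior: Like “most frequent”, the classifier always predicts the most frequent class label, but `predict_proba` returns the empirical class distribution of the training data instead of a one-hot vector.
* Stratified: It generates predictions by sampling labels at random according to the class distribution of the training data. Unlike the “most frequent” strategy, it can predict any class, each with a probability equal to its share of the training labels.
* Uniform: It generates predictions uniformly at random, so every class is equally likely regardless of how often it appears in the training data.
* Constant: The classifier always predicts a constant label supplied by the user. It is useful for metrics that evaluate a non-majority class.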
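To make the strategies concrete, here is a small sketch that fits each one on a toy label vector and prints its predictions. The toy data (`X_toy`, `y_toy`) and the constant label `2` are hypothetical values chosen for illustration; `random_state` is set only so the random strategies print the same thing on every run.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical toy data: 6 samples of class 0, 3 of class 1, 1 of class 2.
X_toy = np.zeros((10, 1))  # the feature values are ignored by DummyClassifier
y_toy = np.array([0] * 6 + [1] * 3 + [2] * 1)

strategies = {
    "most_frequent": DummyClassifier(strategy="most_frequent"),
    "prior": DummyClassifier(strategy="prior"),
    "stratified": DummyClassifier(strategy="stratified", random_state=0),
    "uniform": DummyClassifier(strategy="uniform", random_state=0),
    # The constant label must be one of the labels seen during fit.
    "constant": DummyClassifier(strategy="constant", constant=2),
}

toy_predictions = {}
for name, model in strategies.items():
    model.fit(X_toy, y_toy)
    toy_predictions[name] = model.predict(X_toy)
    print(f"{name:>13}: {toy_predictions[name]}")
```

The deterministic strategies repeat a single label for every row, while the stratified and uniform strategies produce a mix of labels.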

#### Most Frequent Strategy

In [None]:
dummy_model = DummyClassifier(strategy='most_frequent')
dummy_model.fit(X,y)
print(f"Accuracy: {dummy_model.score(X,y)}")
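Because this strategy always predicts the majority class, its accuracy on the data it was fitted on equals the share of that class. The sketch below checks this on a small hypothetical label vector (`y_demo` is made up for illustration, not taken from the wine data).

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical labels: 6 of class 5, 3 of class 6, 1 of class 7.
y_demo = np.array([5] * 6 + [6] * 3 + [7] * 1)
X_demo = np.zeros((len(y_demo), 1))  # features are ignored

# Share of the majority class, computed directly from the label counts.
labels, counts = np.unique(y_demo, return_counts=True)
majority_share = counts.max() / counts.sum()

# Accuracy of the most-frequent baseline on the same data.
most_frequent_acc = (
    DummyClassifier(strategy="most_frequent").fit(X_demo, y_demo).score(X_demo, y_demo)
)

print(f"Majority share: {majority_share}, accuracy: {most_frequent_acc}")
```

This is why the label distribution printed earlier already tells us what score to expect here.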

#### Prior Strategy

In [None]:
dummy_model = DummyClassifier(strategy='prior')
dummy_model.fit(X,y)
print(f"Accuracy: {dummy_model.score(X,y)}")
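The `prior` strategy predicts the same label as `most_frequent`, so the two accuracies match. The difference only shows up in `predict_proba`, as this sketch on hypothetical toy data suggests (`X_demo` and `y_demo` are illustrative, not the wine data).

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical toy data: class shares are 0.6, 0.3 and 0.1.
X_demo = np.zeros((10, 1))
y_demo = np.array([0] * 6 + [1] * 3 + [2] * 1)

mf = DummyClassifier(strategy="most_frequent").fit(X_demo, y_demo)
pr = DummyClassifier(strategy="prior").fit(X_demo, y_demo)

# Hard predictions are identical for the two strategies.
same_labels = np.array_equal(mf.predict(X_demo), pr.predict(X_demo))

# Probabilities differ: one-hot for most_frequent, class shares for prior.
mf_proba = mf.predict_proba(X_demo[:1])[0]
prior_proba = pr.predict_proba(X_demo[:1])[0]

print("Same labels:", same_labels)
print("most_frequent predict_proba:", mf_proba)
print("prior predict_proba:        ", prior_proba)
```

This matters for probability-based metrics such as log loss, where `prior` is usually the more meaningful baseline.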

#### Stratified Strategy

In [None]:
dummy_model = DummyClassifier(strategy='stratified')
dummy_model.fit(X,y)
print(f"Accuracy: {dummy_model.score(X,y)}")

Note: The score above changes from run to run because each prediction is sampled at random from the training class distribution.
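If a reproducible score is needed, `DummyClassifier` accepts a `random_state` argument. The following sketch uses hypothetical random labels (`y_demo`) and the arbitrary seed `42` to show that two models with the same seed make identical predictions.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical labels drawn at random from three classes.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 3, size=200)
X_demo = np.zeros((200, 1))  # features are ignored

# Two independent models that share the same seed.
model_a = DummyClassifier(strategy="stratified", random_state=42).fit(X_demo, y_demo)
model_b = DummyClassifier(strategy="stratified", random_state=42).fit(X_demo, y_demo)

pred_a = model_a.predict(X_demo)
pred_b = model_b.predict(X_demo)

print("Identical predictions:", np.array_equal(pred_a, pred_b))
print("Score A:", model_a.score(X_demo, y_demo))
print("Score B:", model_b.score(X_demo, y_demo))
```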

#### Uniform Strategy

In [None]:
dummy_model = DummyClassifier(strategy='uniform')
dummy_model.fit(X,y)
print(f"Accuracy: {dummy_model.score(X,y)}")

Note: The score above changes from run to run because each prediction is sampled uniformly at random from the class labels.
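With k classes, a uniform guess is correct with probability 1/k no matter how imbalanced the labels are, so the accuracy should hover around 1/k. The sketch below checks this on hypothetical imbalanced labels (`y_imb`, with made-up class shares of 0.6, 0.3 and 0.1); the seeds are arbitrary.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical imbalanced labels over three classes.
rng = np.random.default_rng(0)
y_imb = rng.choice([0, 1, 2], size=10_000, p=[0.6, 0.3, 0.1])
X_imb = np.zeros((y_imb.size, 1))  # features are ignored

uniform_model = DummyClassifier(strategy="uniform", random_state=0).fit(X_imb, y_imb)
uniform_acc = uniform_model.score(X_imb, y_imb)

# Expected accuracy of a uniform guess over k classes.
n_classes = len(np.unique(y_imb))
expected_acc = 1 / n_classes

print(f"Observed accuracy: {uniform_acc:.3f}")
print(f"Expected (1/k):    {expected_acc:.3f}")
```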