Name: Christian Hellum Bye

# Baseline Models

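This notebook fits four trivial baseline classifiers (scikit-learn's `DummyClassifier` with different strategies) to the pulsar stars dataset used in the final project. Their scores set a floor that any real model for this classification problem should beat.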

## Preprocessing the data

In [9]:
import numpy as np
from sklearn.model_selection import train_test_split #to split the dataset

In [10]:
data = np.loadtxt('../pulsar_stars.csv', delimiter=',', skiprows=1)

In [11]:
X = data[:, 0:8] #features
y = data[:, 8] #classes

In [12]:
# Split the dataset: 80 % goes to the training set, 20 % to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
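
The split above is unseeded, so the scores further down will vary from run to run. The following is a minimal sketch of a reproducible split that also preserves the class ratio, assuming that is wanted here. The synthetic arrays `X_demo` and `y_demo` are hypothetical stand-ins for the pulsar data, and the seed `0` is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 samples, 8 features, roughly 9 % positives.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 8))
y_demo = (rng.random(1000) < 0.09).astype(float)

# random_state fixes the shuffle; stratify keeps the class ratio
# (almost) identical in the training and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print('Positive fraction (all, train, test):',
      y_demo.mean(), y_tr.mean(), y_te.mean())
```

With a strongly imbalanced target, stratifying avoids a test set that happens to contain unusually few positive samples.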

## Fitting the models

In [29]:
from sklearn.dummy import DummyClassifier

In [31]:
strat = DummyClassifier(strategy='stratified') #generates predictions by respecting the training set’s class distribution
freq = DummyClassifier(strategy='most_frequent') #always predicts the most frequent label in the training set
uniform = DummyClassifier(strategy='uniform') #generates predictions uniformly at random
constant = DummyClassifier(strategy='constant', constant=1) #always predicts a constant label of 1

In [32]:
strat.fit(X_train, y_train)
freq.fit(X_train, y_train)
uniform.fit(X_train, y_train)
constant.fit(X_train, y_train)

DummyClassifier(constant=1, random_state=None, strategy='constant')

## Prediction and scores

In [33]:
y_strat = strat.predict(X_test)
y_freq = freq.predict(X_test)
y_uniform = uniform.predict(X_test)
y_const = constant.predict(X_test)

In [22]:
from sklearn.metrics import f1_score

In [34]:
f1_strat = f1_score(y_test, y_strat)
f1_freq = f1_score(y_test, y_freq)
f1_uniform = f1_score(y_test, y_uniform)
f1_const = f1_score(y_test, y_const)
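
A classifier that never predicts the positive class has no true positives, so its precision is undefined and `f1_score` falls back to 0, typically with a warning. The sketch below makes that fallback explicit with the `zero_division` argument, assuming scikit-learn 0.22 or newer (the version used in this notebook may be older). The arrays `y_true_demo` and `y_pred_demo` are made up for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up labels: two positives, and a prediction that is always 0,
# which is how the most_frequent baseline behaves on imbalanced data.
y_true_demo = np.array([0, 0, 0, 1, 0, 1])
y_pred_demo = np.zeros_like(y_true_demo)

# zero_division=0 returns 0.0 for the undefined score without a warning.
f1_demo = f1_score(y_true_demo, y_pred_demo, zero_division=0)
print('F1 with no predicted positives:', f1_demo)
```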

In [35]:
print('F1-scores:\n')
print('Stratified:', f1_strat)
print('Most frequent:', f1_freq)
print('Uniform:', f1_uniform)
print('Constant:', f1_const)

F1-scores:

Stratified: 0.09895052473763119
Most frequent: 0.0
Uniform: 0.16760299625468167
Constant: 0.16316059517701387


In [37]:
print('Accuracy:\n')
print('Stratified:', strat.score(X_test, y_test))
print('Most frequent:', freq.score(X_test, y_test))
print('Uniform:', uniform.score(X_test, y_test))
print('Constant:', constant.score(X_test, y_test))

Accuracy:

Stratified: 0.8391061452513966
Most frequent: 0.9111731843575419
Uniform: 0.49860335195530725
Constant: 0.0888268156424581
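
Two of these numbers follow directly from the class imbalance, which shows why accuracy alone is misleading here. The sketch below redoes the arithmetic, taking the positive-class fraction `p` of the test set from the constant-1 accuracy printed above. The variable names are illustrative and not part of the notebook.

```python
# Positive-class fraction in the test set, read off the constant-1 accuracy above.
p = 0.0888268156424581

# Always predicting 1: every positive is found (recall = 1),
# but only a fraction p of the predictions is correct (precision = p).
precision_const, recall_const = p, 1.0
f1_const_expected = (2 * precision_const * recall_const
                     / (precision_const + recall_const))

# Always predicting 0: accuracy equals the negative-class fraction,
# while F1 is 0 because there are no true positives.
acc_freq_expected = 1.0 - p

print('Constant F1:', f1_const_expected)
print('Most frequent accuracy:', acc_freq_expected)
```

Both values agree with the printed scores: the most-frequent baseline reaches about 91 % accuracy without detecting a single positive, so a useful model has to beat these baselines on F1 as well as on accuracy.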
