You want a simple baseline regression model to compare against your model.

Use scikit-learn’s DummyRegressor to create a simple model to use as a baseline:

In [12]:
# Load libraries
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import train_test_split

In [13]:
# Load data (load_boston was removed in scikit-learn 1.2, so read the
# Boston housing data from its original source instead)
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)

# Create features and target; each record in the raw file spans two lines
features = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]
# Make test and training split
features_train, features_test, target_train, target_test = train_test_split(
    features, target, random_state=0)

In [14]:
# Create a dummy regressor
dummy = DummyRegressor(strategy='mean')
# "Train" dummy regressor
dummy.fit(features_train, target_train)
# Get R-squared score
dummy.score(features_test, target_test)

-0.001119359203955339
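Why does the mean baseline score slightly below zero rather than exactly zero? A short derivation may help. Suppose a model predicts the same value c for each of the n test observations, and let ȳ be the mean of the true test targets y_i (scikit-learn's R-squared uses the mean of the true values passed to `score`). Expanding the residual sum of squares, the cross term vanishes because the deviations y_i - ȳ sum to zero:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - c)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\sum_{i=1}^{n} (y_i - c)^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n(\bar{y} - c)^2
\quad\Longrightarrow\quad
R^2 = -\frac{n(\bar{y} - c)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \le 0
```

With `strategy='mean'`, c is the training-set mean, which is close to but not exactly the test-set mean, so the score lands just below zero. Any constant further from the test-set mean scores lower.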

To compare, we train a linear regression model and evaluate its R-squared score:

In [15]:
# Load library
from sklearn.linear_model import LinearRegression
# Train simple linear regression model
ols = LinearRegression()
ols.fit(features_train, target_train)
# Get R-squared score
ols.score(features_test, target_test)

0.6354638433202118

DummyRegressor allows us to create a very simple model that we can use as a
baseline to compare against our actual model. This can often be useful to
simulate a “naive” existing prediction process in a product or system. For
example, a product might have been originally hardcoded to assume that all new
users will spend $100 in the first month, regardless of their features. If we
encode that assumption into a baseline model, we are able to concretely state the
benefits of using a machine learning approach.
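As a minimal sketch of that idea, the hardcoded "$100 in the first month" rule can be expressed as a `DummyRegressor` with `strategy="constant"`. The user data below is entirely made up for illustration (the two features and the spend formula are assumptions, not from any real product), and mean absolute error is used here so the benefit can be stated in dollars:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical data: two user features and first-month spend in dollars
rng = np.random.default_rng(0)
user_features = rng.normal(size=(1000, 2))
spend = (100 + 30 * user_features[:, 0] - 10 * user_features[:, 1]
         + rng.normal(scale=5, size=1000))

features_train, features_test, spend_train, spend_test = train_test_split(
    user_features, spend, random_state=0)

# Encode the existing "every new user spends $100" rule as a baseline
hardcoded = DummyRegressor(strategy="constant", constant=100)
hardcoded.fit(features_train, spend_train)

# A learned model that actually uses the user features
model = LinearRegression().fit(features_train, spend_train)

# Average dollar error of each approach on held-out users
baseline_mae = mean_absolute_error(spend_test, hardcoded.predict(features_test))
model_mae = mean_absolute_error(spend_test, model.predict(features_test))
print(f"Hardcoded $100 rule MAE: ${baseline_mae:.2f}")
print(f"Linear model MAE:        ${model_mae:.2f}")
```

The difference between the two printed errors is the concrete improvement the machine learning approach offers over the existing rule on this synthetic data.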
DummyRegressor uses the strategy parameter to set the method of making
predictions, such as the mean or median of the target values in the training
set. Furthermore, if we set strategy to constant and use the constant
parameter, we can set the dummy regressor to predict some constant value for
every observation:


In [16]:
# Create dummy regressor that predicts 20 for everything
clf = DummyRegressor(strategy='constant', constant=20)
clf.fit(features_train, target_train)
# Evaluate score
clf.score(features_test, target_test)

-0.06510502029325727
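Besides `'mean'` and `'constant'`, the strategy parameter also accepts `'median'` and `'quantile'` (the latter paired with the quantile parameter, a value between 0 and 1). The sketch below fits each strategy on a small made-up target array to show that every prediction is a single statistic of the training targets, no matter what features are passed in:

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Small made-up training set; DummyRegressor ignores the feature values
features_train = np.zeros((11, 1))
target_train = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, 30.0])
features_new = np.zeros((3, 1))

# Strategy name mapped to any extra parameters it needs
strategies = {
    "mean": {},
    "median": {},
    "quantile": {"quantile": 0.9},
    "constant": {"constant": 5.0},
}

predictions = {}
for strategy, params in strategies.items():
    dummy = DummyRegressor(strategy=strategy, **params)
    dummy.fit(features_train, target_train)
    # Every new observation receives the same single value
    predictions[strategy] = dummy.predict(features_new)

for strategy, values in predictions.items():
    print(strategy, values)
```

Because of the outlier (30.0), the median baseline sits below the mean baseline here, which is one reason to try more than one dummy strategy when the target is skewed.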

![](./pics/dummyregressor.jpg)