In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy

We shall work with the dataset found in the file 'murderdata.txt', which is a 20 x 5 data matrix whose columns correspond to:

- column 0: index (not for use in analysis)
- column 1: number of inhabitants
- column 2: percent with incomes below $5000
- column 3: percent unemployed
- column 4: murders per annum per 1,000,000 inhabitants

**Reference:**

Helmut Spaeth,
Mathematical Algorithms for Linear Regression,
Academic Press, 1991,
ISBN 0-12-656460-4.

D G Kleinbaum and L L Kupper,
Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, 1978, page 150.

http://people.sc.fsu.edu/~jburkardt/datasets/regression

**What to do?**

We start by loading the data; today we will study how the number of murders relates to the percentage of unemployment.

In [5]:
data = numpy.loadtxt('murderdata.txt')
N, d = data.shape

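We consider both features simultaneously, i.e., the percentage unemployed together with the percentage of inhabitants with incomes below $5000, and use the murder rate as the target.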

In [8]:
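t = data[:,4]    # target: murders per annum per 1,000,000 inhabitants
X = data[:,2:4]  # features: percent with incomes below $5000, percent unemployed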
print("Number of training instances: %i" % X.shape[0])
print("Number of features: %i" % X.shape[1])
#print(t)
#print(X)

Number of training instances: 20
Number of features: 2


In [11]:
# NOTE: This template makes use of Python classes. If 
# you are not yet familiar with this concept, you can 
# find a short introduction here: 
# http://introtopython.org/classes.html

class LinearRegression():
    """
    Linear regression implementation.
    """

    def __init__(self):
        
        pass
            
    def fit(self, X, t):
        """
        Fits the linear regression model.

        Parameters
        ----------
        X : Array of shape [n_samples, n_features]
        t : Array of shape [n_samples, 1]
        """        

        # make sure that we have Numpy arrays; also
        # reshape the target array to ensure that we have
        # an N x 1 Numpy array (ndarray), see
        # https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html
        X = numpy.array(X).reshape((len(X), -1))
        print("Reshaped array X")
        print(X)
        t = numpy.array(t).reshape((len(t), 1))
        print("Reshaped array t")
        print(t)

        # prepend a column of ones
        ones = numpy.ones((X.shape[0], 1))
        print("creating array of ones")
        print(ones)
        X = numpy.concatenate((ones, X), axis=1)
        print("concatenating original X array and array of ones")
        print(X)
        # compute weights w = (X^T X)^+ X^T t via the
        # (pseudo-)inverse of X^T X
        self.w = numpy.linalg.pinv(numpy.dot(X.T, X))
        self.w = numpy.dot(self.w, X.T)
        self.w = numpy.dot(self.w, t)

        # (2) TODO: Make use of numpy.linalg.solve instead!
        # Reason: Inverting the matrix is not very stable
        # from a numerical perspective; directly solving
        # the linear system of equations is usually better.

    def predict(self, X):
        """
        Computes predictions for a new set of points.

        Parameters
        ----------
        X : Array of shape [n_samples, n_features]

        Returns
        -------
        predictions : Array of shape [n_samples, 1]
        """

        # reshape X and prepend a column of ones, exactly as in fit
        X = numpy.array(X).reshape((len(X), -1))
        ones = numpy.ones((X.shape[0], 1))
        X = numpy.concatenate((ones, X), axis=1)

        # (1) the predictions are the linear combination X w
        predictions = numpy.dot(X, self.w)

        return predictions
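
For reference, `fit` computes the ordinary least-squares solution. Writing $\hat{X}$ for the $N \times 3$ matrix obtained by prepending a column of ones to the two features (so that the intercept $w_0$ multiplies the ones), the model and its closed-form weights are:

$$
t_n \approx w_0 + w_1 x_{n,1} + w_2 x_{n,2}, \qquad n = 1, \dots, N
$$

$$
\mathbf{w} = \arg\min_{\mathbf{w}} \lVert \hat{X}\mathbf{w} - \mathbf{t} \rVert^2
\;\Longrightarrow\;
\hat{X}^\top \hat{X}\, \mathbf{w} = \hat{X}^\top \mathbf{t}
\;\Longrightarrow\;
\mathbf{w} = (\hat{X}^\top \hat{X})^{-1} \hat{X}^\top \mathbf{t}
$$

The code uses `numpy.linalg.pinv`, which coincides with the ordinary inverse whenever $\hat{X}^\top \hat{X}$ is invertible.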

Next, let us instantiate a LinearRegression object and call the fit function.

In [12]:
model = LinearRegression()
model.fit(X, t)

Reshaped array X
[[16.5  6.2]
 [20.5  6.4]
 [26.3  9.3]
 [16.5  5.3]
 [19.2  7.3]
 [16.5  5.9]
 [20.2  6.4]
 [21.3  7.6]
 [17.2  4.9]
 [14.3  6.4]
 [18.1  6. ]
 [23.1  7.4]
 [19.1  5.8]
 [24.7  8.6]
 [18.6  6.5]
 [24.9  8.3]
 [17.9  6.7]
 [22.4  8.6]
 [20.2  8.4]
 [16.9  6.7]]
Reshaped array t
[[11.2]
 [13.4]
 [40.7]
 [ 5.3]
 [24.8]
 [12.7]
 [20.9]
 [35.7]
 [ 8.7]
 [ 9.6]
 [14.5]
 [26.9]
 [15.7]
 [36.2]
 [18.1]
 [28.9]
 [14.9]
 [25.8]
 [21.7]
 [25.7]]
creating array of ones
[[1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]
 [1.]]
concatenating original X array and array of ones
[[ 1.  16.5  6.2]
 [ 1.  20.5  6.4]
 [ 1.  26.3  9.3]
 [ 1.  16.5  5.3]
 [ 1.  19.2  7.3]
 [ 1.  16.5  5.9]
 [ 1.  20.2  6.4]
 [ 1.  21.3  7.6]
 [ 1.  17.2  4.9]
 [ 1.  14.3  6.4]
 [ 1.  18.1  6. ]
 [ 1.  23.1  7.4]
 [ 1.  19.1  5.8]
 [ 1.  24.7  8.6]
 [ 1.  18.6  6.5]
 [ 1.  24.9  8.3]
 [ 1.  17.9  6.7]
 [ 1.  22.4  8.6]
 [ 1.  20.2  8.4]
 [ 1.  16.9  6.7]]
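
One way to address TODO (2) is sketched below as a standalone function, `fit_weights_solve` (a hypothetical name, not part of the template). It prepends the column of ones exactly as `fit` does and then calls `numpy.linalg.solve` on the normal equations instead of forming the (pseudo-)inverse. The demo data are synthetic with known weights, not the murder data.

```python
import numpy

def fit_weights_solve(X, t):
    """Least-squares weights via numpy.linalg.solve on the normal equations.

    Hypothetical helper (not part of the original template): it mirrors
    LinearRegression.fit, but solves (X^T X) w = X^T t directly instead
    of forming the (pseudo-)inverse of X^T X.
    """
    X = numpy.array(X, dtype=float).reshape((len(X), -1))
    t = numpy.array(t, dtype=float).reshape((len(t), 1))
    # prepend a column of ones for the intercept w_0
    X = numpy.concatenate((numpy.ones((X.shape[0], 1)), X), axis=1)
    # solve the (d+1) x (d+1) linear system; returns an array of shape (d+1, 1)
    return numpy.linalg.solve(numpy.dot(X.T, X), numpy.dot(X.T, t))

# Usage on a small noise-free synthetic example with weights 1, 2, -3
rng = numpy.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
t_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
w_demo = fit_weights_solve(X_demo, t_demo)
print(w_demo.ravel())  # close to [1, 2, -3]
```

As long as $\hat{X}^\top \hat{X}$ is invertible, which we would expect for two non-collinear features, this should agree with the pinv-based weights up to floating-point error. `numpy.linalg.lstsq` is a further alternative that avoids forming $\hat{X}^\top \hat{X}$ at all.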

We can also print the computed model coefficients: the intercept first, followed by the weights for the income feature and the unemployment feature.

In [13]:
print("Model coefficients:")
print(model.w)

Model coefficients:
[[-34.07253343]
 [  1.2239307 ]
 [  4.39893583]]
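
To see how these coefficients produce a prediction, take the first row of the data (income feature 16.5, unemployment 6.2, observed value 11.2). Its prediction is $w_0 + w_1 \cdot 16.5 + w_2 \cdot 6.2$. The sketch below evaluates this with the printed coefficients copied by hand, so it does not need the fitted model object.

```python
import numpy

# coefficients as printed above: [w0 (intercept), w1 (income < $5000), w2 (unemployment)]
w = numpy.array([-34.07253343, 1.2239307, 4.39893583])

# first row of the data: 1 for the intercept, then the two features
x_first = numpy.array([1.0, 16.5, 6.2])

prediction = numpy.dot(x_first, w)
print(round(prediction, 2))  # 13.4, versus an observed value of 11.2
```

Both slopes are positive, so in this fit a higher percentage of low-income inhabitants and a higher unemployment rate are each associated with a higher predicted murder rate.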


In [14]:
# (c) evaluation of results
preds = model.predict(X)

plt.figure(figsize=(10,10))
plt.scatter(t, preds)
plt.xlabel("Murders per annum per 1,000,000 inhabitants")
plt.ylabel("Predictions")
plt.xlim([0,50])
plt.ylim([0,50])
plt.title("Evaluation")
plt.show()

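
The scatter plot gives a visual check: points on the diagonal are perfect predictions. A single number can complement it. The sketch below computes the root-mean-square error (RMSE), which is our choice of metric and not something the exercise prescribes. The `ravel` calls guard against a subtle broadcasting bug: `t` has shape (20,) while `predict` returns shape (20, 1), and subtracting them directly would broadcast to a 20 x 20 matrix.

```python
import numpy

def rmse(t, preds):
    """Root-mean-square error between targets t and predictions preds."""
    # flatten both inputs so that shapes (N,) and (N, 1) line up element-wise
    t = numpy.asarray(t, dtype=float).ravel()
    preds = numpy.asarray(preds, dtype=float).ravel()
    return numpy.sqrt(numpy.mean((t - preds) ** 2))

# Usage with the variables from the cells above (once predict is implemented):
# print("RMSE: %.3f" % rmse(t, preds))

# Small hand-checkable example: errors of 1 and -1 give an RMSE of 1
print(rmse([2.0, 4.0], [1.0, 5.0]))  # 1.0
```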