# Regression Assignment
This problem uses the abalone dataset available on Canvas.
The task is to predict the age of an abalone from its
physical measurements. Use the first 7 variables as predictors
and the 8th as the response. Report every result as the average
over 10 random splits, each using 80% of the data for training and 20% for testing.

#### Import dependencies

In [20]:
import numpy as np
from numpy.linalg import inv
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

#### Load data

In [21]:
abalone_dataset = np.loadtxt("data/abalone.csv", delimiter=",")

#### Utility functions

In [22]:
def get_data():
    """ Get randomly split training and test data

    :returns training_x, training_y, test_x, test_y
    :rtype tuple(ndarray, ndarray, ndarray, ndarray)
    """
    # randomly split the data into training and testing
    training, test = train_test_split(abalone_dataset, train_size=.8, test_size=.2)

    # Separate the predictor (x) and response (y) variables
    training_x, training_y = np.hsplit(training, [7])
    test_x, test_y = np.hsplit(test, [7])

    return training_x, training_y, test_x, test_y

def average_ten_runs(func):
    """This decorator augments your function to run ten times
    and return the average result(s)

    Example usage
    func must return a tuple, one entry per value to average.
    The following function returns a 1-tuple holding the average of ten random numbers:
    @average_ten_runs
    def random_number():
        return (random.random(),)

    :parameter func your function
    :returns modified version of your function that returns the mean of ten runs
    """
    def wrapper(*args):
        # Runs func 10 times, putting each returned value into a separate list
        results = zip(*[func(*args) for __ in range(10)])
        # finds the average of each list
        averaged_results = tuple(sum(value_list)/10 for value_list in results)
        return averaged_results
    return wrapper
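
To make the behavior of `average_ten_runs` concrete, here is a minimal sketch that applies it to a function with predictable output. The decorator is repeated in compact form only so the snippet runs on its own; `fake_scores` and `_run_counter` are hypothetical names introduced for illustration and are not part of the assignment.

```python
import itertools


def average_ten_runs(func):
    """Compact copy of the decorator defined above, repeated so this sketch is self-contained."""
    def wrapper(*args):
        # Transpose the ten result tuples into one sequence per returned value
        results = zip(*[func(*args) for __ in range(10)])
        return tuple(sum(values) / 10 for values in results)
    return wrapper


# Hypothetical stand-in for a model run: the "scores" change on every call
_run_counter = itertools.count(1)


@average_ten_runs
def fake_scores():
    run = next(_run_counter)  # 1, 2, ..., 10 over the ten calls
    return run, 2 * run       # must be a tuple, one entry per metric


averages = fake_scores()
print(averages)  # (5.5, 11.0)
```

Because the wrapper unpacks each result with `zip`, the decorated function has to return a tuple (or another iterable) even when it produces a single value.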


## OLS Regression
Fit OLS regression analytically by solving the normal equations, with regularization parameter λ = 0.0001.
Report the average training and test $R^2$. (2 points)
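
For reference, the regularized normal equations can be written as follows. The notation is an assumption of this note rather than taken from the lecture: $X$ is the training design matrix with a leading column of ones, $y$ is the training response vector, and $I$ is the identity matrix whose size equals the number of columns of $X$.

$$
\hat{\beta} = \left(X^\top X + \lambda I\right)^{-1} X^\top y, \qquad \lambda = 0.0001
$$

Predictions are then $\hat{y} = X\hat{\beta}$. Since $I$ is the full identity, this form also applies the small penalty to the intercept term.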

In [23]:
@average_ten_runs
def OLS_regression():
    """Solves OLS regression model analytically

    :returns training_R2, testing_R2 r-squared of the model predictions
                                     on the training and testing data
    """
    # get data
    training_x, training_y, test_x, test_y = get_data()
    # add columns of 1s
    training_x = np.hstack((np.ones([np.size(training_x, axis=0), 1]), training_x))
    test_x = np.hstack((np.ones([np.size(test_x, axis=0), 1]), test_x))

    # Solve OLS regression model analytically using the normal equation from the lecture
    X = training_x
    Y = training_y
    lmb = 0.0001
    parameters = inv(X.T.dot(X) + lmb * np.identity(np.size(X, axis=1))).dot(X.T).dot(Y)

    # predict y values
    predicted_training_y = training_x.dot(parameters)
    predicted_test_y = test_x.dot(parameters)

    # calculate r-squared
    training_R2 = r2_score(training_y, predicted_training_y)
    testing_R2 = r2_score(test_y, predicted_test_y)

    return training_R2, testing_R2

print("Average R2 on training data: {:.4f}\n"
      "Average R2 on testing data: {:.4f}".format(*OLS_regression()))

Average R2 on training data: 0.5268
Average R2 on testing data: 0.5286
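
As a cross-check on the approach, the sketch below solves the same regularized normal equations with `np.linalg.solve`, which avoids forming an explicit inverse, and computes $R^2$ by hand as $1 - SS_{res}/SS_{tot}$. It runs on synthetic data, not on the abalone file: the sample size, `true_weights`, and the noise level are assumptions chosen only for illustration.

```python
import numpy as np

# Synthetic stand-in for the abalone data (assumption: 200 rows, 7 predictors)
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 7))
true_weights = np.arange(1.0, 8.0)
response = 2.0 + features @ true_weights + rng.normal(scale=0.1, size=200)

# Design matrix with a leading column of ones for the intercept
X = np.hstack((np.ones((features.shape[0], 1)), features))
lmb = 0.0001

# Solve (X^T X + lmb * I) beta = X^T y without computing the inverse
gram = X.T @ X + lmb * np.identity(X.shape[1])
beta = np.linalg.solve(gram, X.T @ response)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
predicted = X @ beta
ss_res = np.sum((response - predicted) ** 2)
ss_tot = np.sum((response - response.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(beta.shape, round(r_squared, 4))
```

For a single response variable, this hand-computed value is the same quantity that `r2_score` reports.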


In [24]:
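@average_ten_runs
def Reg_Tree():
    """Fits a regression tree of depth 7

    :returns training_R2, testing_R2 r-squared of the model predictions
                                     on the training and testing data
    """
    training_x, training_y, test_x, test_y = get_data()
    # A tree needs no intercept column, so the predictors are used as-is
    regressor = DecisionTreeRegressor(max_depth=7, random_state=0)
    regressor.fit(training_x, training_y.ravel())
    training_R2 = r2_score(training_y, regressor.predict(training_x))
    testing_R2 = r2_score(test_y, regressor.predict(test_x))
    return training_R2, testing_R2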




