Ridge Regression Tutorial

Guled edited this page Feb 28, 2017 · 3 revisions

For this tutorial we will be using a dataset contained in a CSV file. You can obtain the CSV file I'm working with here. The file describes properties of houses that matter when deciding to purchase one, such as the size of the house, price, number of bathrooms, number of bedrooms, etc.

⚠️ All of your data (features, observations, etc.) must be floating point values. You can use the MLDataManager class's convertMyDataToFloat method to convert it.

Downloading the Data

Once you have downloaded the CSV file linked above, we can load its contents and prepare them for use with the framework.

⚠️ Note: We will be using the CSVReader framework to parse our CSV data. This framework is included with the installation of MLKit.

// Obtain data from the CSV file.
// Bundle(for:) expects a class, for example type(of: self).
let path = Bundle(for: [SELF OR OBJECT GOES HERE] ).path(forResource: "kc_house_data", ofType: "csv")
let csvUrl = URL(fileURLWithPath: path!)
let file = try! String(contentsOf: csvUrl, encoding: .utf8)
let data = CSVReader(with: file)

Extracting Features

For this example I will be using one feature: Square Feet Living (the "sqft_living" column of the CSV). To obtain this column we use the reader's columns property like so:

// Setup the features we need and convert them to floats if necessary
let training_data_string = data.columns["sqft_living"]!
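To picture what a header-keyed column lookup like the one above returns, here is a minimal sketch in plain Swift. It is not CSVReader's implementation: `columnsByHeader` is a hypothetical helper, it splits on commas only (so quoted fields containing commas are not handled), and the values in the usage lines are illustrative.

```swift
import Foundation

// Hypothetical helper for illustration; not part of CSVReader or MLKit.
// Builds a header -> column-values dictionary from CSV text.
// Simplification: splits on commas only, so quoted fields are not supported.
func columnsByHeader(_ csv: String) -> [String: [String]] {
    let lines = csv.split(whereSeparator: { $0.isNewline }).map(String.init)
    guard let headerLine = lines.first else { return [:] }
    let headers = headerLine.components(separatedBy: ",")

    var columns: [String: [String]] = [:]
    for header in headers { columns[header] = [] }

    for line in lines.dropFirst() {
        let cells = line.components(separatedBy: ",")
        // Skip rows whose cell count does not match the header row.
        guard cells.count == headers.count else { continue }
        for (header, cell) in zip(headers, cells) {
            columns[header]?.append(cell)
        }
    }
    return columns
}

// Usage with illustrative values: every column comes back as an array of strings.
let sampleColumns = columnsByHeader("price,sqft_living\n221900,1180\n538000,2570\n")
let sampleSqft = sampleColumns["sqft_living"]!
```

Note that each column is still an array of strings at this point, which is why a conversion to Float is needed before training.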

Since MLKit primarily uses Floats, we convert the training data to type Float:

// Features
let training_data = training_data_string.map { Float($0)! }
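The force unwrap in `Float($0)!` crashes at runtime if any cell is empty or not numeric. If you would rather get an error that names the offending row, a more defensive conversion could look like the sketch below. `parseFloats` and `ColumnParseError` are hypothetical names for this example and are not part of MLKit.

```swift
import Foundation

// Hypothetical error type for this sketch; not part of MLKit.
struct ColumnParseError: Error {
    let row: Int
    let value: String
}

// Converts a column of strings to Floats, throwing on the first bad cell
// instead of crashing the way a force unwrap would.
func parseFloats(_ column: [String]) throws -> [Float] {
    var result: [Float] = []
    result.reserveCapacity(column.count)
    for (row, raw) in column.enumerated() {
        // Float(String) rejects surrounding whitespace, so trim first.
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        guard let value = Float(trimmed) else {
            throw ColumnParseError(row: row, value: raw)
        }
        result.append(value)
    }
    return result
}
```

With this helper in place, the conversion above would become `let training_data = try parseFloats(training_data_string)`, wrapped in a `do`/`catch` where you want to report the bad row.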

Define Your Observation/Output

// Output
let output_as_string = data.columns["price"]!
let output_data = output_as_string.map { Float($0)! }

Training Our Model

Now that we have extracted our feature and output, it's time to train our model. To do so, we first instantiate a RidgeRegression object.

let ridgeModel = RidgeRegression()

Next, we need to set up our initial weights. The values chosen here are arbitrary; this example starts both at 0.0. One weight belongs to our feature and the other to the intercept. Here we are using the Matrix class, which comes from the Upsurge framework.

// Setup initial weights (One weight for our feature, one for the intercept/bias)
let initial_weights = Matrix<Float>(rows: 2, columns: 1, elements: [0.0, 0.0])

Training the Model

The last step is to train the model. It's just one line of code. Here you can customize your l2Penalty value, stepSize, and the maximum number of iterations. An l2Penalty of 0.0 means no regularization is applied.

// Fit the model and obtain the weights
let weights = try! ridgeModel.train([training_data], output: output_data, initialWeights: initial_weights, stepSize: Float(1e-12), l2Penalty: 0.0, maxIterations: 1000)
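To make the three tuning parameters more concrete, here is a minimal sketch of batch gradient descent for ridge regression with one feature and an intercept, written in plain Swift. It illustrates the general technique and is not MLKit's implementation: `fitRidge` is a hypothetical function, and leaving the intercept unpenalised and running a fixed number of iterations without a convergence check are assumptions made for this sketch.

```swift
// Hypothetical illustration; not MLKit's implementation.
// Batch gradient descent for ridge regression with one feature and an intercept.
// Minimises: sum((y - (w0 + w1 * x))^2) + l2Penalty * w1^2
// Assumption for this sketch: the intercept w0 is not penalised.
func fitRidge(x: [Float], y: [Float],
              initialWeights: (intercept: Float, slope: Float),
              stepSize: Float, l2Penalty: Float,
              maxIterations: Int) -> (intercept: Float, slope: Float) {
    precondition(x.count == y.count, "x and y must have the same length")
    var w0 = initialWeights.intercept
    var w1 = initialWeights.slope

    for _ in 0..<maxIterations {
        var gradIntercept: Float = 0
        var gradSlope: Float = 0
        for (xi, yi) in zip(x, y) {
            let error = (w0 + w1 * xi) - yi   // prediction minus observation
            gradIntercept += 2 * error
            gradSlope += 2 * error * xi
        }
        // The L2 penalty pulls the slope towards zero.
        gradSlope += 2 * l2Penalty * w1

        // Move each weight against its gradient, scaled by the step size.
        w0 -= stepSize * gradIntercept
        w1 -= stepSize * gradSlope
    }
    return (w0, w1)
}
```

In this sketch the gradient is a sum over every row, so with unscaled features in the thousands it becomes very large. That is a plausible reason for a step size as small as 1e-12 in the call above: a larger step could overshoot and diverge. Raising l2Penalty shrinks the feature weight towards zero.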

Your new weights are available in the weights variable above.

Evaluating the Cost of our Model

We evaluate the cost of our model with the residual sum of squares (RSS). To view the RSS of your current model (after it has been trained), simply call the RSS method.

let RSS = try! ridgeModel.RSS([training_data], observation: output_data)
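For reference, the RSS is the sum over all rows of the squared difference between the observed output and the model's prediction. A minimal sketch for a single-feature model follows; `residualSumOfSquares` is a hypothetical helper for illustration and is not the MLKit method.

```swift
// Hypothetical helper for illustration; not the MLKit RSS method.
// Residual sum of squares: sum over all rows of (observed - predicted)^2.
func residualSumOfSquares(x: [Float], y: [Float],
                          intercept: Float, slope: Float) -> Float {
    precondition(x.count == y.count, "x and y must have the same length")
    return zip(x, y).reduce(Float(0)) { total, pair in
        let predicted = intercept + slope * pair.0
        let residual = pair.1 - predicted
        return total + residual * residual
    }
}
```

A lower RSS means the predictions sit closer to the observed prices on the data you passed in. Because it is a sum rather than an average, it grows with the number of rows, so only compare RSS values computed on the same dataset.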

Making Predictions

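To create a prediction, we call the predict method, passing in our input values (features) and the weights we obtained after training our model. In the call below the input is a constant 1.0 followed by a sqft_living value of 1180; the leading 1.0 appears to be the constant term paired with the intercept weight.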

let quickPrediction = ridgeModel.predict([Float(1.0), Float(1.18000000e+03)], yourWeights: weights.elements)
print(quickPrediction)

You can now access the predicted price of the house (in our example) via the quickPrediction constant.
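Conceptually, a linear model's prediction is the dot product of the input row and the weights. The sketch below shows that arithmetic in plain Swift. `dotPrediction` is a hypothetical helper, the weights are made-up illustration values rather than trained output, and the intercept-first ordering is an assumption based on the input used in the predict call above.

```swift
// Hypothetical helper for illustration; not the MLKit predict method.
// A linear model's prediction is the dot product of the input row and the weights.
func dotPrediction(_ input: [Float], weights: [Float]) -> Float {
    precondition(input.count == weights.count, "one weight per input value")
    return zip(input, weights).reduce(Float(0)) { $0 + $1.0 * $1.1 }
}

// Input: constant 1.0 (paired with the intercept weight) followed by sqft_living.
// The weights are made-up values for illustration, not trained output.
let examplePrediction = dotPrediction([1.0, 1180.0], weights: [10_000.0, 250.0])
print(examplePrediction) // computes 10000 * 1.0 + 250 * 1180
```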