Luna Regression Tutorial

Based on https://www.tensorflow.org/tutorials/keras/basic_regression

Dataset was downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data and it has been slightly modified to be loaded out of the box.

Cloning repository.

git clone https://github.com/Luna-Tensorflow/RegressionTutorial.git
git clone -b MNIST_tutorial https://github.com/Luna-Tensorflow/Luna-Tensorflow.git
cd RegressionTutorial

Building libraries.

cd local_libs/Tensorflow/native_libs/
mkdir build
cd build
cmake ../src
make
cd ../../../..

Dataset

In this tutorial we will use slightly preprocessed dataset of cars parameters, to predict their fuel usage (MPG - miles per galon). The dataset is in a .csv format table with columns: MPG, Cylinders, Displacement, Horsepower, Weight, Acceleration, Model Year and Origin. The task is to predict value of first column, based on the rest of them. Due to unlinear influence, Origin column needs to be one hot encoded, so the last column will be replaced with three new columns: USA, Europe, Japan.

To load dataset from .csv file, we will use Dataframes, which is Luna library allowing more comfortable work with big datasets (https://github.com/luna/dataframes).

Let's start with Luna Studio!

At the beggining we need some imports.

import Std.Base
import Dataframes.Table
import Dataframes.Column
import Tensorflow.Layers.Input
import Tensorflow.Layers.Dense
import Tensorflow.Optimizers.RMSProp
import Tensorflow.Losses.MeanError
import Tensorflow.Model
import Tensorflow.Tensor
import Tensorflow.Types
import Tensorflow.Operations
import Tensorflow.GeneratedOps
import RegressionTutorial.DblColumn

The size of dataset labels is the number of different cars parameters.

def nfeatures:
    9

Function to extend given table with new column of zeros and ones, depending on values in column Origin.

def extendWith table name value:
    table' = table.eachTo name (row: (row.at "Origin" == value).switch 0.0 1.0)
    table'

Extending table with one hot encoded column Origin.

def oneHotOrigin table:
    t1 = extendWith table "USA" 1
    t2 = extendWith t1 "Europe" 2
    t3 = extendWith t2 "Japan" 3
    t3

We need a function that shuffles rows of given table, to balance dataset. Here the original table is extended with random column, sorted by this column, and then the random column is removed.

def shuffle table:
    row = table.rowCount
    rand = Tensors.random FloatType [row] 0.0 0.0
    col = columnFromList "rand" (rand.toFlatList)
    table1 = table.setAt "rand" col
    table2 = table1.sort "rand"
    table3 = table2.remove "rand"
    table3

Function that divides dataset with given ratio, into test and train parts.

def sample table fracTest:
    testCount = (fracTest * table.rowCount.toReal).floor
    test = table.take testCount
    train = table.drop testCount
    (train, test)

Function that converts the Dataframes table into tensors list. It simply interprets table as two dimensional list, maps it to Luna Real type, transpose (because we need to flip columns and rows), and finally creates tensor of given shape, from each row.

def dataframeToTensorList shape table:
    lst = table.toList . each (col: (col.toList).each (_.toReal))
    t1 = Tensors.fromList2d FloatType lst
    t2 = Tensors.transpose t1
    lst' = Tensors.to2dList t2
    samples = lst'.each(l: Tensors.fromList FloatType shape l)
    samples

To estimate correctness of models predictions we use mean error.

def error model xBatch yBatch:
    preds = model.evaluate xBatch
    predsConst = Operations.makeConst preds
    labelsConst = Operations.makeConst yBatch
    diff = Operations.abs (predsConst - labelsConst)
    error = Operations.mean diff [1]
    error.eval.atIndex 0

Preparing data consists of three parts:

Loading data and one hot encoding last column,
Dividing train and test datasets into features and labels,
Converting Dataframe tables to tensors, and batching them.

def prepareData path:
    table = Table.read path
    table1 = table.dropNa
    table2 = oneHotOrigin table1
    table3 = table2.remove "Origin"
    table4 = shuffle table3
    (trainTable, testTable) = sample table4 0.2

    trainLabels' = trainTable.at "MPG"
    testLabels' = testTable.at "MPG"
    trainFeatures' = trainTable.remove "MPG"
    testFeatures' = testTable.remove "MPG"

    trainFeatures = Tensors.batchFromList $ dataframeToTensorList [nFeatures] trainFeatures'
    testFeatures = Tensors.batchFromList $ dataframeToTensorList [nFeatures] testFeatures'
    trainLabels = Tensors.batchFromList $ dataframeToTensorList [1] trainLabels'
    testLabels = Tensors.batchFromList $ dataframeToTensorList [1] testLabels'

    (trainFeatures, testFeatures, trainLabels, testLabels)

And last but not least, helper function to prepare the optimizing function, used in a learning process.

def prepareOptimizer:
    lr = 0.001
    rho = 0.9
    momentum = 0.0
    epsilon = 0.000000001
    opt = RMSPropOptimizer.create lr rho momentum epsilon
    opt

Building model, training and testing

Let's focus on the details of Luna Tensorflow API.

Code	Node editor
`def main: (trainFeatures, testFeatures, trainLabels, testLabels) = prepareData "auto-mpg.csv"`	Loading batched datasets, divided into train and test parts.
`input = Input.create FloatType [nFeatures] d1 = Dense.createWithActivation 64 Operations.relu input d2 = Dense.createWithActivation 64 Operations.relu d1 d3 = Dense.createWithActivation 1 Operations.relu d2`	Connecting models layers in sequential order: input layer feeded with tensors of [nFeatures] shape, two fully connected layers with 64 output neurons, output fully connected layer with 1 neuron.
`opt = prepareOptimizer loss = MeanErrors.meanSquareError model = Models.make input d3 opt loss untrainedError = error model testFeatures testLabels`	Building model with its parameters: input and output layers, prepared optimizer, mean square error loss function.
`epochs = 30 (h, trained) = model.train [trainFeatures] [trainLabels] epochs (ValidationFraction 0.1) 0 trainedError = error trained testFeatures testLabels None`	Training model, and calculating its error on the test dataset, before and after a whole process.

Evaluated model lets us observe the error ratio after training process, on the node named trainedError. We can compare it with the error ratio before training, displayed on the node named untrainedError.

Appearance of all main function nodes.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
.luna-package		.luna-package
Screenshots		Screenshots
local_libs		local_libs
src		src
README.html		README.html
README.md		README.md
auto-mpg.csv		auto-mpg.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Luna Regression Tutorial

Cloning repository.

Building libraries.

Dataset

Let's start with Luna Studio!

Building model, training and testing

About

Releases

Packages

Contributors 2

Luna-Tensorflow/RegressionTutorial

Folders and files

Latest commit

History

Repository files navigation

Luna Regression Tutorial

Cloning repository.

Building libraries.

Dataset

Let's start with Luna Studio!

Building model, training and testing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Packages