Based on https://www.tensorflow.org/tutorials/keras/basic_regression
Dataset was downloaded from https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data and it has been slightly modified to be loaded out of the box.
git clone https://github.com/Luna-Tensorflow/RegressionTutorial.git
git clone -b MNIST_tutorial https://github.com/Luna-Tensorflow/Luna-Tensorflow.git
cd RegressionTutorial
cd local_libs/Tensorflow/native_libs/
mkdir build
cd build
cmake ../src
make
cd ../../../..
In this tutorial we will use slightly preprocessed dataset of cars parameters, to predict their fuel usage (MPG - miles per galon). The dataset is in a .csv
format table with columns: MPG
, Cylinders
, Displacement
, Horsepower
, Weight
, Acceleration
, Model Year
and Origin
. The task is to predict value of first column, based on the rest of them. Due to unlinear influence, Origin
column needs to be one hot encoded, so the last column will be replaced with three new columns: USA
, Europe
, Japan
.
To load dataset from .csv
file, we will use Dataframes, which is Luna library allowing more comfortable work with big datasets (https://github.com/luna/dataframes).
At the beggining we need some imports.
import Std.Base
import Dataframes.Table
import Dataframes.Column
import Tensorflow.Layers.Input
import Tensorflow.Layers.Dense
import Tensorflow.Optimizers.RMSProp
import Tensorflow.Losses.MeanError
import Tensorflow.Model
import Tensorflow.Tensor
import Tensorflow.Types
import Tensorflow.Operations
import Tensorflow.GeneratedOps
import RegressionTutorial.DblColumn
The size of dataset labels is the number of different cars parameters.
def nfeatures:
9
Function to extend given table with new column of zeros and ones, depending on values in column Origin
.
def extendWith table name value:
table' = table.eachTo name (row: (row.at "Origin" == value).switch 0.0 1.0)
table'
Extending table with one hot encoded column Origin
.
def oneHotOrigin table:
t1 = extendWith table "USA" 1
t2 = extendWith t1 "Europe" 2
t3 = extendWith t2 "Japan" 3
t3
We need a function that shuffles rows of given table, to balance dataset. Here the original table is extended with random column, sorted by this column, and then the random column is removed.
def shuffle table:
row = table.rowCount
rand = Tensors.random FloatType [row] 0.0 0.0
col = columnFromList "rand" (rand.toFlatList)
table1 = table.setAt "rand" col
table2 = table1.sort "rand"
table3 = table2.remove "rand"
table3
Function that divides dataset with given ratio, into test and train parts.
def sample table fracTest:
testCount = (fracTest * table.rowCount.toReal).floor
test = table.take testCount
train = table.drop testCount
(train, test)
Function that converts the Dataframes table into tensors list. It simply interprets table as two dimensional list, maps it to Luna Real
type, transpose (because we need to flip columns and rows), and finally creates tensor of given shape, from each row.
def dataframeToTensorList shape table:
lst = table.toList . each (col: (col.toList).each (_.toReal))
t1 = Tensors.fromList2d FloatType lst
t2 = Tensors.transpose t1
lst' = Tensors.to2dList t2
samples = lst'.each(l: Tensors.fromList FloatType shape l)
samples
To estimate correctness of models predictions we use mean error.
def error model xBatch yBatch:
preds = model.evaluate xBatch
predsConst = Operations.makeConst preds
labelsConst = Operations.makeConst yBatch
diff = Operations.abs (predsConst - labelsConst)
error = Operations.mean diff [1]
error.eval.atIndex 0
Preparing data consists of three parts:
- Loading data and one hot encoding last column,
- Dividing train and test datasets into features and labels,
- Converting Dataframe tables to tensors, and batching them.
def prepareData path:
table = Table.read path
table1 = table.dropNa
table2 = oneHotOrigin table1
table3 = table2.remove "Origin"
table4 = shuffle table3
(trainTable, testTable) = sample table4 0.2
trainLabels' = trainTable.at "MPG"
testLabels' = testTable.at "MPG"
trainFeatures' = trainTable.remove "MPG"
testFeatures' = testTable.remove "MPG"
trainFeatures = Tensors.batchFromList $ dataframeToTensorList [nFeatures] trainFeatures'
testFeatures = Tensors.batchFromList $ dataframeToTensorList [nFeatures] testFeatures'
trainLabels = Tensors.batchFromList $ dataframeToTensorList [1] trainLabels'
testLabels = Tensors.batchFromList $ dataframeToTensorList [1] testLabels'
(trainFeatures, testFeatures, trainLabels, testLabels)
And last but not least, helper function to prepare the optimizing function, used in a learning process.
def prepareOptimizer:
lr = 0.001
rho = 0.9
momentum = 0.0
epsilon = 0.000000001
opt = RMSPropOptimizer.create lr rho momentum epsilon
opt
Let's focus on the details of Luna Tensorflow API.
Evaluated model lets us observe the error ratio after training process, on the node named trainedError
. We can compare it with the error ratio before training, displayed on the node named untrainedError
.
Appearance of all main
function nodes.