All the imports that we need for this ML model. The model we will build is a linear regression. Linear regression basically draws a line through an N-dimensional graph, where N is a natural number greater than 1, that shows the trend of the data. Once you have that line, if you know all the values of a data point but one, you can find the missing value by using the slope and the bias of the line (e.g. y = mx + b).
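
To make the y = mx + b idea concrete, here is a minimal sketch in plain Python that fits a line to a few points with ordinary least squares and then uses the slope and bias to fill in a missing value. The data points and the helper names (`fit_line`, `predict_y`, `solve_x`) are made up for illustration and are not part of the notebook's model.

```python
# Toy data (made up for illustration): points that lie close to y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]


def fit_line(xs, ys):
    """Return the slope m and bias b of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    m = covariance / variance
    b = mean_y - m * mean_x
    return m, b


m, b = fit_line(xs, ys)


def predict_y(x):
    """Known x, unknown y: follow the line."""
    return m * x + b


def solve_x(y):
    """Known y, unknown x: invert the same line."""
    return (y - b) / m


print(f"slope m = {m:.2f}, bias b = {b:.2f}")
print(f"predicted y at x = 5: {predict_y(5.0):.2f}")
print(f"x that gives y = 7: {solve_x(7.0):.2f}")
```

With more than one input feature the same idea applies, only with one slope per feature instead of a single m.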

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import pandas as pd
from IPython.display import clear_output
import tensorflow as tf

First, let's import the data set. We are using the Titanic data set, where we have to determine who will most likely survive based on the information given about each passenger.

In [None]:
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv') #training
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv') #evaluation

y_train = dftrain.pop('survived') #removes the survived column from the loaded csv and stores it in the variable
y_eval = dfeval.pop('survived') #same as the above but for the second dataset

Then we have to get our data ready. As some columns hold string values, we have to encode them as numerical data in our feature columns (the data we will feed the model). To do that we use TensorFlow's categorical_column_with_vocabulary_list, which takes the name of the feature to encode and a vocabulary (the list of distinct values in that column; each distinct value gets a unique numerical ID).

Columns that are already represented by numbers are simply appended as float32 numeric columns.

In [None]:
CATEGORICAL_COLUMNS = ['sex','n_siblings_spouses','parch','class','deck','embark_town','alone']
NUMERIC_COLUMNS = ['age','fare']

feature_columns = []

for feature_name in CATEGORICAL_COLUMNS:
    vocab = dftrain[feature_name].unique()
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocab))
for feature_name in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))
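
To see what a vocabulary does, here is a small plain-Python sketch of the idea (not TensorFlow's actual implementation). The sample values are hand-written, and `value_to_id` and `one_hot` are hypothetical helper names. Each distinct string gets an integer ID, and a linear model can then treat each ID as its own on/off input, which is what a one-hot vector expresses.

```python
# Hand-written sample of a string column (the notebook reads real values from dftrain).
values = ["Southampton", "Cherbourg", "Southampton", "Queenstown"]

# Distinct values in order of first appearance, similar in spirit to .unique() in pandas.
vocab = list(dict.fromkeys(values))

# Give every vocabulary entry a unique numerical ID.
value_to_id = {value: index for index, value in enumerate(vocab)}

# Encode the column as IDs.
ids = [value_to_id[value] for value in values]


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


encoded = [one_hot(index, len(vocab)) for index in ids]

print(value_to_id)
print(ids)
print(encoded[0])
```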

We have to prepare our data for the model, which needs a tf.data.Dataset object. We define a function that takes in the training data, the expected results (labels) for that data, the number of epochs (how many times the ML model will learn from the same data set), whether or not to shuffle the data set (this helps train our model to think instead of just memorizing the order of the data), and the batch size (this helps with enormous amounts of data; here it is kind of pointless, but it's good to know).

The outer function returns a second, inner function (the estimator calls it later to get our Dataset object). In the inner function we create the tf.data.Dataset, shuffle it if requested, then split it into batches and repeat it for the number of epochs.

Important: evaluation data should only get 1 epoch (having more is pointless), and it does not need to be shuffled.

In [None]:
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():  # inner function, this will be returned
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
        if shuffle:
            ds = ds.shuffle(1000)  # randomize order of data (shuffle buffer of 1000 rows)
        ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of batch_size and repeat process for number of epochs
        return ds  # return the batched dataset
    return input_function  # return a function object for use

train_input_fn = make_input_fn(dftrain, y_train, shuffle=True)  # the estimator will call the input_function that was returned to us to get a dataset object it can feed to the model
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)
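
To get a feel for what shuffle, batch and repeat do together, here is a plain-Python approximation of the pipeline above. It is only a sketch: tf.data works on tensors and shuffles through a buffer, while `toy_input_pipeline` (a hypothetical name) just shuffles a list. The 10 rows are stand-ins for real passengers.

```python
import random


def toy_input_pipeline(rows, num_epochs=10, shuffle=True, batch_size=32, seed=0):
    """Yield batches the way shuffle -> batch -> repeat roughly would."""
    rng = random.Random(seed)
    for _ in range(num_epochs):          # repeat: one pass over the data per epoch
        epoch_rows = list(rows)
        if shuffle:
            rng.shuffle(epoch_rows)      # new random order for this epoch
        for start in range(0, len(epoch_rows), batch_size):
            yield epoch_rows[start:start + batch_size]  # the last batch may be smaller


rows = list(range(10))  # stand-in for 10 data rows
batches = list(toy_input_pipeline(rows, num_epochs=2, batch_size=4))

for batch in batches:
    print(batch)
```

Because batching happens before repeating, every epoch ends with its own smaller leftover batch instead of mixing rows from two epochs.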

Time to start making the ML model.

We first pass the feature_columns we created earlier to a LinearClassifier from the estimator module in TensorFlow. Estimators are ready-made implementations of common algorithms in TensorFlow. This basically creates our model.

In [None]:
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)

TRAINING!!

Here is where TensorFlow works its magic, and we just sit and watch like a true overlord (see the code comments to see how it works). It takes a while to train and evaluate, so just do something else until it's done.

In [None]:
linear_est.train(train_input_fn) #here we tell tensorflow to train the model using the batched data
result = linear_est.evaluate(eval_input_fn) #we evaluate the model after it has been trained and get metrics/stats

clear_output() #we clear the console to get rid of the spam
print(result) #we print the accuracy stats of the model
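
One of the stats in the printed result is the accuracy. As a rough sketch of what that number means, assuming the usual 0.5 decision threshold, accuracy is the fraction of passengers whose predicted label matches the real one. The probabilities and labels below are made up for illustration and do not come from the trained model.

```python
# Made-up predicted survival probabilities and true labels (1 = survived).
predicted_probabilities = [0.9, 0.2, 0.6, 0.4, 0.1]
true_labels = [1, 0, 0, 1, 0]

# Turn each probability into a label using a 0.5 threshold (an assumption).
predicted_labels = [1 if p >= 0.5 else 0 for p in predicted_probabilities]

# Count how many predictions match the true labels.
correct = sum(1 for pred, true in zip(predicted_labels, true_labels) if pred == true)
accuracy = correct / len(true_labels)

print(predicted_labels)
print(f"accuracy = {accuracy:.2f}")
```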

We trained & evaluated our model. Time to predict!

In [None]:
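result = list(linear_est.predict(eval_input_fn))  # predict() returns a generator, so we collect the predictions in a list
clear_output()
print(dfeval.loc[4], result[4]['probabilities'][1], y_eval.loc[4])  # passenger 4: the input data, the predicted chance of survival, and the real outcome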
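
The value in 'probabilities'[1] is the predicted chance of class 1 (survived). For a binary linear classifier, that probability usually comes from passing the linear output (weights times inputs, plus the bias) through the logistic (sigmoid) function. Here is a minimal sketch of that step; the weights, bias and passenger values are hypothetical and are not the ones the trained model learned.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and bias, chosen only for illustration.
weights = {"age": -0.02, "fare": 0.01}
bias = 0.5

# Hypothetical passenger with only the two numeric features.
passenger = {"age": 30.0, "fare": 50.0}

# Linear part: the same "slope times input plus bias" idea, one slope per feature.
logit = bias + sum(weights[name] * passenger[name] for name in weights)

# Probability of class 1 (survived) and class 0 (did not survive).
p_survive = sigmoid(logit)
probabilities = [1.0 - p_survive, p_survive]

print(f"logit = {logit:.2f}")
print(f"probabilities = [{probabilities[0]:.3f}, {probabilities[1]:.3f}]")
```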

That's it for linear regression.