# TensorFlow feature columns - A great idea

TensorFlow's [feature columns](https://www.tensorflow.org/guide/feature_columns) are a great idea. Feature columns allow users to easily transform input to TensorFlow's premade models. For example, with feature columns you can specify:

- how a feature should be normalized
- that a feature should be one-hot encoded
- that a feature should be transformed to an embedding
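
To make these three transformations concrete, here is a minimal plain-Python sketch of what each one computes for a single value. It deliberately does not use the TensorFlow API: the function names, the toy zip-code vocabulary, and the randomly initialized embedding table are all assumptions made for illustration.

```python
import random


def normalize(value, mean, std):
    """Standardize a numeric feature using precomputed statistics."""
    return (value - mean) / std


def one_hot(value, vocabulary):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    return [1 if v == value else 0 for v in vocabulary]


def embed(value, vocabulary, table):
    """Look up the dense vector stored for `value` in an embedding table."""
    return table[vocabulary.index(value)]


vocabulary = ["94103", "10001", "60601"]  # hypothetical zip codes
rng = random.Random(0)
# One row of 2 weights per vocabulary entry. In a real model these weights
# are learned during training; here they are just random placeholders.
table = [[rng.uniform(-1, 1) for _ in range(2)] for _ in vocabulary]

scaled = normalize(12.0, mean=10.0, std=4.0)
encoded = one_hot("10001", vocabulary)
vector = embed("10001", vocabulary, table)
```

The one-hot vector grows with the vocabulary, while the embedding stays at a fixed width, which is why embeddings are attractive for high-cardinality features such as IDs.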

In my opinion, feature columns provide a key feature I find missing in [Keras](https://keras.io/) when working with proprietary data from day to day. Keras has some really nice preprocessing features that are useful for hard things like image preprocessing and time series. However, I find that Keras lacks support for some of the workhorse preprocessing functionality found in sklearn (such as one-hot encoding and pipelines). Personally, I find that the ease of specifying a categorical column as an embedding opens up interesting possibilities for many business applications (think of how many times you've one-hot encoded IDs or features like zip code).

# Working with feature columns

Feature columns support both a preprocessing pipeline and a set of basic transformations that apply easily to a variety of industry problems. Unfortunately, while they are a great idea, I found that working with them is awkward if you want to do anything slightly outside the box. In my experience this is mainly because feature columns only work with [tensorflow estimators](https://www.tensorflow.org/guide/estimators).

My first attempt at working with feature columns was to try to connect them to Keras models. Why? Because for time series applications at work I would like a convenient way to feed a mixture of numeric and categorical values to an LSTM. Feature columns make the first part easy and Keras makes defining an LSTM easy. Since TensorFlow now houses a Keras API, I thought this would be straightforward. I was wrong. The key to getting this to work is to convert your Keras model to a TensorFlow estimator with [tf.keras.estimator.model_to_estimator](https://www.tensorflow.org/api_docs/python/tf/keras/estimator/model_to_estimator). However, actually connecting your feature columns to your Keras model is far from trivial, requiring more code than seems worth the trouble.

Taking the hint from my first stab at using feature columns, my second attempt was to stick with TensorFlow estimators and avoid Keras altogether. In this case I tried to re-implement a simple linear model, which I had built for a project at work some time ago, using feature columns and TensorFlow's [linear regression estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/LinearRegressor). In a short period of time I was able to get the model training. However, there were some outliers that the model predicted poorly on. No problem, I thought: I had already faced this in my initial implementation and knew that using Huber loss would likely remedy the issue. However, after spending more time than I would have liked researching how to switch from the default loss function (MSE) to Huber loss, I concluded that it isn't possible to do so without writing your own custom estimator. Writing my own estimator was a deal breaker, because for me the whole appeal of feature columns was having something that worked out of the box.
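
For readers unfamiliar with why Huber loss helps with outliers, here is a small plain-Python sketch comparing it to squared error per residual. The `delta=1.0` threshold, the `0.5` scaling on the squared term, and the example residuals are assumptions chosen for illustration, not values from the model described above.

```python
def squared_error(residual):
    """Squared error, scaled by 0.5 so it matches Huber's quadratic region."""
    return 0.5 * residual ** 2


def huber(residual, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it."""
    abs_r = abs(residual)
    if abs_r <= delta:
        return 0.5 * residual ** 2
    return delta * (abs_r - 0.5 * delta)


residuals = [0.5, -1.0, 10.0]  # the last value plays the role of an outlier
squared_terms = [squared_error(r) for r in residuals]
huber_terms = [huber(r) for r in residuals]
```

The two losses agree on the small residuals, but the outlier contributes 50.0 under squared error and only 9.5 under Huber loss, so a single bad point pulls the fit far less.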

Lastly, it's worth noting that the web is pretty silent on how to do anything with TensorFlow estimators beyond what you can find in the docs. This [Stack Overflow post](https://stackoverflow.com/questions/50766718/changing-loss-function-for-training-built-in-tensorflow-estimator) (accessed 7/17/18) is pretty indicative of the kind of help you'll find on the subject... crickets.

Given the outcome of this adventure, I was pretty disappointed that working with feature columns was so cumbersome, considering they seem to be such a great idea. However, rather than give up on the idea, I wrote a few `keras` classes that accomplish what feature columns do. Snippets from the implementation are below.

In [1]:
import keras
import numpy as np
import pandas as pd


class FeatureColumn:
    def __init__(self, name):
        self.name = name
        self.input = keras.layers.Input((1,), name=self.name)

    @property
    def output(self):
        return self.input

    def transform(self, X):
        return X.values


class NumericColumn(FeatureColumn):
    def __init__(self, *args, normalizer=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.normalizer = normalizer

    def transform(self, X):
        if self.normalizer is not None:
            return self.normalizer(X).values
        return X.values


class EmbeddingColumn(FeatureColumn):
    def __init__(self, *args, vocabulary, output_dim, **kwargs):
        super().__init__(*args, **kwargs)
        self.vocabulary = vocabulary
        self.vocab_map = {v: i for i, v in enumerate(vocabulary)}
        self.output_dim = output_dim

    @property
    def output(self):
        embedding = keras.layers.Embedding(
            input_dim=len(self.vocabulary)+1,  # +1 for OOV
            output_dim=self.output_dim,
            input_length=1)(self.input)
        return keras.layers.Flatten()(embedding)

    def transform(self, X):
        mapping = lambda x: self.vocab_map.get(x, len(self.vocabulary))
        return X.apply(mapping).values


class Features:
    def __init__(self, *features):
        self.features = features

    @property
    def inputs(self):
        return [f.input for f in self.features]

    @property
    def output(self):
        concat = keras.layers.Concatenate(axis=-1)
        return concat([f.output for f in self.features])

    def transform(self, X):
        return [f.transform(X[f.name]) for f in self.features]

Using TensorFlow backend.
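
The index mapping inside `EmbeddingColumn.transform` is what makes unseen categories safe to feed to the model. The sketch below isolates that logic in plain Python, without Keras or pandas. The three-letter vocabulary and the helper name `to_index` are hypothetical and used only for illustration.

```python
vocabulary = ["a", "b", "c"]  # hypothetical category values
vocab_map = {v: i for i, v in enumerate(vocabulary)}

# Known values map to 0..len(vocabulary)-1. Everything else shares one extra
# index, which is why the embedding above uses input_dim=len(vocabulary)+1.
oov_index = len(vocabulary)


def to_index(value):
    """Map a raw category to its embedding row, or to the OOV row."""
    return vocab_map.get(value, oov_index)


indices = [to_index(v) for v in ["b", "zzz", "a"]]
```

Here the unknown value `"zzz"` lands on index 3, the reserved out-of-vocabulary row, instead of raising a `KeyError`.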


In [2]:
X = pd.DataFrame({
    'feature1': np.random.randint(10, size=100),
    'feature2': np.random.randint(100, size=100),
    'feature3': np.random.rand(100)
})
y = np.random.rand(100)


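def normalize(column):
    # Capture the column's statistics once so the same scaling is reused later.
    mean = X[column].mean()
    std = X[column].std()
    def normalizer(X, mean=mean, std=std):
        # Standard z-score: subtract the mean, then divide by the std.
        return (X - mean) / std
    return normalizer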


features = Features(
    EmbeddingColumn('feature1', vocabulary=list(range(10)), output_dim=10),
    EmbeddingColumn('feature2', vocabulary=list(range(100)), output_dim=10),
    NumericColumn('feature3', normalizer=normalize('feature3')),
)


x = keras.layers.Dense(50, activation='relu')(features.output)
x = keras.layers.Dense(50, activation='relu')(x)
x = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.models.Model(inputs=features.inputs, outputs=x)
model.summary()
model.compile(loss='mse', optimizer='adam')
_ = model.fit(features.transform(X), y)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
feature1 (InputLayer)           (None, 1)            0                                            
__________________________________________________________________________________________________
feature2 (InputLayer)           (None, 1)            0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 1, 10)        110         feature1[0][0]                   
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, 1, 10)        1010        feature2[0][0]                   
__________________________________________________________________________________________________
flatten_1 