# Feature Columns with tf.feature_column - DEPRECATED

> Important: `tf.feature_column` has been deprecated. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.

This tutorial details feature columns. Think of feature columns as the intermediaries between raw data and Estimators. Feature columns are very rich, enabling you to transform a diverse range of raw data into formats that Estimators can use, allowing easy experimentation.

*In simple terms, feature columns are a bridge between raw data and an Estimator or model.*



![alt text](https://www.tensorflow.org/images/feature_columns/feature_cloud.jpg)
Some real-world features (such as longitude) are numerical, but many are not.

## Input to a Deep Neural Network

What kind of data can a deep neural network operate on? The answer is, of course, numbers (for example, tf.float32). After all, every neuron in a neural network performs multiplication and addition operations on weights and input data. Real-life input data, however, often contains non-numerical (categorical) data. For example, consider a product_class feature that can contain the following three non-numerical values:

* kitchenware
* electronics
* sports

ML models generally represent categorical values as simple vectors in which a 1 represents the presence of a value and a 0 represents the absence of a value. For example, when product_class is set to sports, an ML model would usually represent product_class as [0, 0, 1], meaning:

* 0: kitchenware is absent
* 0: electronics is absent
* 1: sports is present

So, although raw data can be numerical or categorical, an ML model represents all features as numbers.
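
The one-hot representation described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how TensorFlow implements it; the helper name `one_hot` and the constant `PRODUCT_CLASSES` are made up for this example.

```python
# Illustrative only: a plain-Python one-hot encoder (not TensorFlow's implementation).
# `one_hot` and `PRODUCT_CLASSES` are hypothetical names used for this sketch.
PRODUCT_CLASSES = ["kitchenware", "electronics", "sports"]


def one_hot(value, vocabulary):
    """Return a list with a 1 at the position of `value` and 0 everywhere else."""
    return [1 if value == category else 0 for category in vocabulary]


encoded = one_hot("sports", PRODUCT_CLASSES)
print(encoded)  # [0, 0, 1]
```

The position of the 1 depends only on the order of the vocabulary, so the same vocabulary order has to be used at training and inference time.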

## Feature Columns

As the following figure suggests, you specify the input to a model through the feature_columns argument of an Estimator (DNNClassifier for Iris). Feature Columns bridge input data (as returned by input_fn) with your model.

![alt text](https://www.tensorflow.org/images/feature_columns/inputs_to_model_bridge.jpg)

Feature columns bridge raw data with the data your model needs.

To create feature columns, call functions from the `tf.feature_column` module. This tutorial explains nine of the functions in that module. As the following figure shows, all nine functions return either a Categorical-Column or a Dense-Column object, except `bucketized_column`, which inherits from both classes:

![alt text](https://www.tensorflow.org/images/feature_columns/some_constructors.jpg)
Feature column methods fall into two main categories and one hybrid category.

Let's look at these functions in more detail.

## Import TensorFlow and other libraries

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import pandas as pd

import tensorflow as tf

from tensorflow import feature_column

## Create Demo data


In [2]:
data = {'marks': [55,21,63,88,74,54,95,41,84,52],
        'marks_float': [55.4,21.4,63,88,74.999,54,95,41,84,52],
        'grade': ['average','poor','average','good','good','average','good','average','good','average'],
        'point': ['c','f','c+','b+','b','c','a','d+','b+','c'],
        'pass': [0,0,1,1,1,1,0,0,0,0],
        }

df = pd.DataFrame(data)
df

Unnamed: 0,marks,marks_float,grade,point,pass
0,55,55.4,average,c,0
1,21,21.4,poor,f,0
2,63,63.0,average,c+,1
3,88,88.0,good,b+,1
4,74,74.999,good,b,1
5,54,54.0,average,c,1
6,95,95.0,good,a,0
7,41,41.0,average,d+,0
8,84,84.0,good,b+,0
9,52,52.0,average,c,0


In [3]:
df.dtypes

marks            int64
marks_float    float64
grade           object
point           object
pass             int64
dtype: object

## Demonstrate several types of feature column

### Numeric columns
The output of a feature column becomes the input to the model. A [numeric column](https://www.tensorflow.org/api_docs/python/tf/feature_column/numeric_column) is the simplest type of column. It is used to represent real-valued features. When using this column, your model receives the column value from the dataframe unchanged.

In [4]:
# deprecated approach
marks = feature_column.numeric_column("marks")
marks

Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.


NumericColumn(key='marks', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)

### Bucketized columns
Often, you don't want to feed a number directly into the model, but instead split its value into different categories based on numerical ranges. Consider raw data that represents a person's age. Instead of representing age as a numeric column, we could split the age into several buckets using a [bucketized column](https://www.tensorflow.org/api_docs/python/tf/feature_column/bucketized_column). The resulting one-hot values describe which range each row matches. Buckets include the left boundary and exclude the right boundary.

As another example, consider raw data that represents the year a house was built. Instead of representing that year as a scalar numeric column, we could split the year into four buckets.

The model will represent the buckets as follows:

| Date Range | Represented as |
|------------|----------------|
| < 1960 | [1, 0, 0, 0] |
| $\ge$ 1960 but < 1980 | [0, 1, 0, 0] |
| $\ge$ 1980 but < 2000 | [0, 0, 1, 0] |
| $\ge$ 2000 | [0, 0, 0, 1] |

Why would you want to split a number (a perfectly valid input to your model) into a categorical value? Well, notice that the categorization splits a single input number into a four-element vector. Therefore, the model can now learn four individual weights rather than just one; four weights create a richer model than one weight. More importantly, bucketizing enables the model to clearly distinguish between different year categories, since only one of the elements is set (1) and the other three elements are cleared (0). For example, when we use just a single number (a year) as input, a linear model can only learn a linear relationship. So, bucketing provides the model with additional flexibility that it can use to learn.

The following code demonstrates how to create a bucketized feature:

In [5]:
# deprecated approach
marks_buckets = feature_column.bucketized_column(marks, boundaries=[30,40,50,60,70,80,90])
marks_buckets


BucketizedColumn(source_column=NumericColumn(key='marks', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), boundaries=(30, 40, 50, 60, 70, 80, 90))
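
To make the bucketing rule concrete, here is a minimal plain-Python sketch that applies the same boundaries to the `marks` values from the demo data. It only mimics the "left boundary included, right boundary excluded" rule stated above; the helper names `bucket_index` and `bucketize` are hypothetical and this is not TensorFlow's implementation.

```python
from bisect import bisect_right

# Same boundaries and marks as in the demo data above.
BOUNDARIES = [30, 40, 50, 60, 70, 80, 90]
MARKS = [55, 21, 63, 88, 74, 54, 95, 41, 84, 52]


def bucket_index(value, boundaries):
    """Index of the bucket containing `value`.

    bisect_right counts the boundaries that are <= value, so a value equal
    to a boundary lands in the bucket that starts at that boundary.
    """
    return bisect_right(boundaries, value)


def bucketize(value, boundaries):
    """One-hot vector of length len(boundaries) + 1 for the bucket of `value`."""
    encoded = [0] * (len(boundaries) + 1)
    encoded[bucket_index(value, boundaries)] = 1
    return encoded


bucket_ids = [bucket_index(mark, BOUNDARIES) for mark in MARKS]
print(bucket_ids)                 # [3, 0, 4, 6, 5, 3, 7, 2, 6, 3]
print(bucketize(55, BOUNDARIES))  # [0, 0, 0, 1, 0, 0, 0, 0]
```

Seven boundaries produce eight buckets: one below 30, six between consecutive boundaries, and one for 90 and above.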

## Categorical Columns

## Indicator and embedding columns
Indicator columns and embedding columns never work on features directly, but instead take categorical columns as input.

### Indicator columns (i.e. One-Hot encoding)
The categorical vocabulary columns provide a way to represent strings as a one-hot vector (much like you have seen above with the bucketized columns).

The vocabulary can be passed as a list using [categorical_column_with_vocabulary_list](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list), or loaded from a file using [categorical_column_with_vocabulary_file](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_file).

In [6]:
vocabulary = ['poor', 'average', 'good'] # i.e., admitted categories
grade = feature_column.categorical_column_with_vocabulary_list('grade', vocabulary)
grade_one_hot = feature_column.indicator_column(grade)
grade_one_hot

Instructions for updating:
Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.


IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='grade', vocabulary_list=('poor', 'average', 'good'), dtype=tf.string, default_value=-1, num_oov_buckets=0))
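
The printed column shows `default_value=-1` and `num_oov_buckets=0`. The sketch below illustrates, in plain Python, what a vocabulary lookup followed by an indicator encoding amounts to. It assumes that an out-of-vocabulary value (index `-1`) ends up as an all-zero vector; that behaviour and the helper names `lookup` and `indicator` are assumptions of this sketch, not a statement about TensorFlow internals.

```python
# Illustrative only: vocabulary lookup + indicator encoding in plain Python.
# `lookup` and `indicator` are hypothetical helpers for this sketch.
VOCABULARY = ["poor", "average", "good"]
DEFAULT_VALUE = -1  # index used for values that are not in the vocabulary


def lookup(value, vocabulary, default_value=DEFAULT_VALUE):
    """Return the position of `value` in `vocabulary`, or `default_value`."""
    try:
        return vocabulary.index(value)
    except ValueError:
        return default_value


def indicator(index, depth):
    """One-hot vector of length `depth`; an index outside [0, depth) gives all zeros."""
    return [1 if position == index else 0 for position in range(depth)]


for grade_value in ["average", "good", "excellent"]:
    index = lookup(grade_value, VOCABULARY)
    print(grade_value, index, indicator(index, len(VOCABULARY)))
# average 1 [0, 1, 0]
# good 2 [0, 0, 1]
# excellent -1 [0, 0, 0]
```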

### Embedding columns

As the number of categories grows large, it becomes infeasible to train a neural network using one-hot encodings. Instead of representing the data as a one-hot vector of many dimensions, $N_c$, an embedding column represents that data as a lower-dimensional ($N_d \ll N_c$) dense vector in which each cell can contain any number, not just 0 or 1. The size of the embedding (4, in the example below) is a parameter that must be tuned.

Key point: using an embedding column is best when a categorical column has many possible values. We are using one here for demonstration purposes, so you have a complete example you can modify for a different dataset in the future.

> Note: when building an embedding, you start from $x_e = W_e \cdot x_o$, where $x_e$ and $x_o$ are the embedded and one-hot encoded representations of a given category, and train the coefficients of $W_e$, a dense float matrix of size $N_d \times N_c$. If you write $W_e = [w_1, w_2, \ldots, w_{N_c}]$ in terms of its columns, you will see that the embedding of the $i$-th category is nothing but $w_i$.
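
The claim in the note, that multiplying the embedding matrix by a one-hot vector simply selects one column, can be checked with a small plain-Python sketch. The matrix values below are arbitrary placeholders (in practice they are learned during training), and the helpers `matvec` and `column` are hypothetical names for this example.

```python
# Illustrative only: an embedding lookup written as a matrix-vector product.
N_c, N_d = 4, 2  # number of categories, embedding dimension

# W_e has N_d rows and N_c columns; the values are arbitrary placeholders.
W_e = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def column(matrix, i):
    """Return the i-th column of a matrix (list of rows)."""
    return [row[i] for row in matrix]


i = 2  # index of the category to embed
x_o = [1.0 if j == i else 0.0 for j in range(N_c)]  # one-hot vector
x_e = matvec(W_e, x_o)                               # embedded vector

print(x_e)             # [0.3, 0.7]
print(column(W_e, i))  # [0.3, 0.7]
```

This is why embedding layers are usually implemented as a table lookup rather than an actual matrix product: the result is the same, and the lookup avoids building the one-hot vector.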

### Point column as embedding_column

In [7]:
# deprecated approach
vocabulary = df['point'].unique()
print(f'{vocabulary=}')

point = feature_column.categorical_column_with_vocabulary_list('point', vocabulary) # this syntax is the same as for 1-hot encoding
point_embedding = feature_column.embedding_column(point, dimension=4)
point_embedding

vocabulary=array(['c', 'f', 'c+', 'b+', 'b', 'a', 'd+'], dtype=object)


EmbeddingColumn(categorical_column=VocabularyListCategoricalColumn(key='point', vocabulary_list=('c', 'f', 'c+', 'b+', 'b', 'a', 'd+'), dtype=tf.string, default_value=-1, num_oov_buckets=0), dimension=4, combiner='mean', initializer=<tensorflow.python.ops.init_ops.TruncatedNormal object at 0x1616631d0>, ckpt_to_load_from=None, tensor_name_in_ckpt=None, max_norm=None, trainable=True, use_safe_embedding_lookup=True)

### Hashed feature columns

Another way to represent a categorical column with a large number of values is to use a [categorical_column_with_hash_bucket](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket). This feature column calculates a hash value of the input, then selects one of the `hash_bucket_size` buckets to encode a string. When using this column, you do not need to provide the vocabulary, and you can choose to make the number of hash_buckets significantly smaller than the number of actual categories to save space.

Key point: An important downside of this technique is that there may be collisions in which different strings are mapped to the same bucket. In practice, this can work well for some datasets regardless.

In [8]:
# deprecated approach
point_hashed = feature_column.categorical_column_with_hash_bucket(
      'point', hash_bucket_size=4)
feature_column.indicator_column(point_hashed)


IndicatorColumn(categorical_column=HashedCategoricalColumn(key='point', hash_bucket_size=4, dtype=tf.string))

At this point, you might rightfully think: "This is crazy!" After all, we are forcing the different input values to a smaller set of categories. This means that two probably unrelated inputs will be mapped to the same category, and consequently mean the same thing to the neural network. The following figure illustrates this dilemma, showing that kitchenware and sports both get assigned to category (hash bucket) 12:

![alt text](https://www.tensorflow.org/images/feature_columns/hashed_column.jpg)
Representing data with hash buckets.

As with many counterintuitive phenomena in machine learning, it turns out that hashing often works well in practice. That's because hash categories provide the model with some separation. The model can use additional features to further separate kitchenware from sports.
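
The hash-then-modulo idea, and the collisions it causes, can be sketched in plain Python. The sketch uses MD5 from `hashlib` as a stand-in hash function purely because it is deterministic across runs; TensorFlow uses its own hashing, so the bucket numbers produced here will not match those of `categorical_column_with_hash_bucket`. The helper name `hash_bucket` is hypothetical.

```python
import hashlib

# Illustrative only: hash a string into one of `hash_bucket_size` buckets.
# MD5 is a stand-in; it is NOT the hash function TensorFlow uses.


def hash_bucket(value, hash_bucket_size):
    """Map a string to an integer bucket in [0, hash_bucket_size)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


# The seven distinct `point` values from the demo data, hashed into 4 buckets.
POINTS = ["c", "f", "c+", "b+", "b", "a", "d+"]
buckets = {point: hash_bucket(point, 4) for point in POINTS}
print(buckets)

# Seven categories cannot fit into four buckets without sharing,
# so at least two of them must collide.
print(len(set(buckets.values())) < len(POINTS))  # True
```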

### Crossed feature columns
Combining features into a single feature, better known as [feature crosses](https://developers.google.com/machine-learning/glossary/#feature_cross), enables a model to learn separate weights for each combination of features. Here, we will create a new feature that is the cross of the bucketized marks and the grade. Note that `crossed_column` does not build the full table of all possible combinations (which could be very large).

- Instead, it is backed by hashing (as in `categorical_column_with_hash_bucket`), so you can choose how large the table is.
- This is not the only way to combine features. Say you have two embedding matrices, $W_a$ and $W_b$, which produce the embedding vectors $x_a$ and $x_b$. You can combine them into a single feature, for example the sum $x = x_a + x_b$, and feed it into a model (e.g. a linear model). A weighted combination such as $x = \alpha x_a + \beta x_b$ is also possible.

In all these approaches, the net effect is that the model learns separate weights for each combination of features.

More concretely, suppose we want our model to calculate real estate prices in Atlanta, GA. Real-estate prices within this city vary greatly depending on location. Representing latitude and longitude as separate features isn't very useful in identifying real-estate location dependencies; however, crossing latitude and longitude into a single feature can pinpoint locations. Suppose we represent Atlanta as a grid of 100x100 rectangular sections, identifying each of the 10,000 sections by a feature cross of latitude and longitude. This feature cross enables the model to train on pricing conditions related to each individual section, which is a much stronger signal than latitude and longitude alone.

The following figure shows our plan, with the latitude & longitude values for the corners of the city in red text:

![alt text](https://www.tensorflow.org/images/feature_columns/Atlanta.jpg)
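
As a rough sketch of the 100x100 grid idea, the code below bucketizes latitude and longitude and combines the two bucket indices into a single section id. This is a full-grid cross without hashing, shown only to illustrate what "one feature per section" means. The bounding-box coordinates are hypothetical placeholders, not the values from the figure, and the helper names `bucket` and `section_id` are made up for this example.

```python
# Illustrative only: a full-grid (unhashed) cross of latitude and longitude.
# The bounding box below is a HYPOTHETICAL placeholder, not taken from the figure.
LAT_MIN, LAT_MAX = 33.6, 33.9
LON_MIN, LON_MAX = -84.6, -84.3
GRID = 100  # 100 x 100 = 10,000 sections


def bucket(value, low, high, n_buckets):
    """Map a value in [low, high] to an integer bucket in [0, n_buckets - 1]."""
    fraction = (value - low) / (high - low)
    # Clamp so that values on or outside the edges still get a valid bucket.
    return min(max(int(fraction * n_buckets), 0), n_buckets - 1)


def section_id(lat, lon):
    """Combine the latitude and longitude buckets into one id in [0, GRID * GRID)."""
    lat_bucket = bucket(lat, LAT_MIN, LAT_MAX, GRID)
    lon_bucket = bucket(lon, LON_MIN, LON_MAX, GRID)
    return lat_bucket * GRID + lon_bucket


print(section_id(LAT_MIN, LON_MIN))  # 0
print(section_id(LAT_MAX, LON_MAX))  # 9999
```

Each section id can then be treated as its own category, which is what lets a model learn a separate weight per section.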



In [9]:
# deprecated approach
crossed_feature = feature_column.crossed_column([marks_buckets, grade], hash_bucket_size=10)
feature_column.indicator_column(crossed_feature)

Instructions for updating:
Use `tf.keras.layers.experimental.preprocessing.HashedCrossing` instead for feature crossing when preprocessing data to train a Keras model.


IndicatorColumn(categorical_column=CrossedColumn(keys=(BucketizedColumn(source_column=NumericColumn(key='marks', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), boundaries=(30, 40, 50, 60, 70, 80, 90)), VocabularyListCategoricalColumn(key='grade', vocabulary_list=('poor', 'average', 'good'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), hash_bucket_size=10, hash_key=None))

You may create a feature cross from either of the following:

  * Feature names; that is, names from the dict returned from input_fn.
  * Any categorical column, except categorical_column_with_hash_bucket (since crossed_column hashes the input).

Crossing columns could, in principle, produce a full grid of all possible input combinations. However, a full grid would only be tractable for inputs with limited vocabularies. Instead of building this potentially huge table of inputs, `crossed_column` only builds the number of buckets requested by the `hash_bucket_size` argument. The feature column assigns an example to an index by running a hash function on the tuple of inputs, followed by a modulo operation with `hash_bucket_size`.
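
The "hash the tuple of inputs, then take the result modulo `hash_bucket_size`" step can be sketched in plain Python as follows. As before, MD5 is only a deterministic stand-in for TensorFlow's own hash function, so the bucket numbers will differ from what `crossed_column` produces, and the helper name `crossed_bucket` and the `"|"` separator are choices made for this sketch.

```python
import hashlib

# Illustrative only: cross several feature values by hashing them together.
# MD5 is a stand-in; it is NOT the hash function TensorFlow uses.


def crossed_bucket(values, hash_bucket_size):
    """Hash a tuple of feature values into a bucket in [0, hash_bucket_size)."""
    key = "|".join(str(value) for value in values)
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


# (marks bucket index, grade) for the first three rows of the demo data:
# marks 55 -> bucket 3, marks 21 -> bucket 0, marks 63 -> bucket 4.
rows = [(3, "average"), (0, "poor"), (4, "average")]
crossed = [crossed_bucket(row, 10) for row in rows]
print(crossed)
```

With 8 marks buckets and 3 grades, a full grid would have 24 cells; with `hash_bucket_size=10` those 24 combinations share 10 buckets, so some combinations necessarily collide.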

As discussed earlier, performing the hash and modulo function limits the number of categories, but can cause category collisions; that is, multiple (latitude, longitude) feature crosses will end up in the same hash bucket. In practice though, performing feature crosses still adds significant value to the learning capability of your models.

Somewhat counterintuitively, when creating feature crosses, you typically should still include the original (uncrossed) features in your model. The independent latitude and longitude features help the model distinguish between examples where a hash collision has occurred in the crossed feature.

In [10]:
feature_column.crossed_column

<function tensorflow.python.feature_column.feature_column_v2.crossed_column(keys, hash_bucket_size, hash_key=None)>