### Feature columns
The feature_column module of TensorFlow 2 acts as a bridge between your input data and the model. 

The input parameters to be used by the estimators for training are passed as feature columns. 

They are defined in the TensorFlow feature_column module and specify how the model interprets the data.

To create feature columns we need to call functions from **tensorflow.feature_column**.

We will look at nine functions from this module:

    * categorical_column_with_identity: Here each category is one-hot encoded, and thus has a unique identity. This can be used for integer values only.


    * categorical_column_with_vocabulary_file: This is used when the categorical input is a string and the categories are given in a file. The string is first converted to a numeric value and then one-hot encoded.


    * categorical_column_with_vocabulary_list: This is used when the categorical input is a string and the categories are explicitly defined in a list. The string is first converted to a numeric value and then one-hot encoded.


    * categorical_column_with_hash_bucket: In case the number of categories is very large, and it is not possible to one-hot encode, we use hashing.


    * crossed_column: Used when we want to combine two columns into one feature. For example, with geolocation-based data it makes sense to combine longitude and latitude values into a single feature.


    * numeric_column: Used when the feature is numeric; it can be a single value or even a matrix.


    * indicator_column: We do not use this directly. Instead, it wraps a categorical column, but only when the number of categories is small enough to be one-hot encoded.


    * embedding_column: We do not use this directly. Instead, it wraps a categorical column, but only when the number of categories is too large to be one-hot encoded.


    * bucketized_column: This is used when, instead of a specific numeric value, we split the data into different categories depending upon its value.
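
To make the encodings in the list above more concrete, here is a minimal plain-Python sketch of what identity, hashing, bucketizing, and crossing do conceptually. It does not use TensorFlow. The helper names (identity_one_hot, hash_bucket, bucketize, cross) are hypothetical, and MD5 is only a stand-in for whatever hash TensorFlow applies internally.

```python
import hashlib
from bisect import bisect_right


def identity_one_hot(value, num_buckets):
    """One-hot encode an integer category in the range [0, num_buckets)."""
    if not 0 <= value < num_buckets:
        raise ValueError(f"{value} is outside [0, {num_buckets})")
    return [1 if i == value else 0 for i in range(num_buckets)]


def hash_bucket(value, num_buckets):
    """Map a string to a bucket id (MD5 is an illustrative stand-in hash)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def bucketize(value, boundaries):
    """Return the index of the range that `value` falls into.

    With boundaries [b0, b1, ...] the ranges are
    (-inf, b0), [b0, b1), ..., [b_last, +inf).
    """
    return bisect_right(boundaries, value)


def cross(values, num_buckets):
    """Combine several feature values into one hashed categorical id."""
    return hash_bucket("_X_".join(str(v) for v in values), num_buckets)


print(identity_one_hot(2, 4))               # [0, 0, 1, 0]
print(bucketize(1500, [1000, 2000, 3000]))  # 1
print(cross([12.9, 77.6], 100) == cross([12.9, 77.6], 100))  # True
```

Hashing and crossing trade exactness for a fixed output size: two different inputs can land in the same bucket, which is the price of not enumerating every category.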



The first five functions return columns that inherit from the CategoricalColumn class, the next three inherit from the DenseColumn class, and the last one inherits from both classes. In the following example we will use the numeric_column and categorical_column_with_vocabulary_list functions.

In [1]:
import tensorflow as tf
from tensorflow import feature_column as fc

numeric_column = fc.numeric_column
categorical_column_with_vocabulary_list = fc.categorical_column_with_vocabulary_list

In [2]:
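featcols = [
    # "area" is fed to the model as a plain numeric value.
    tf.feature_column.numeric_column("area"),
    # "type" is a string feature limited to the two listed categories.
    tf.feature_column.categorical_column_with_vocabulary_list("type", ["bungalow", "apartment"])
]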
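
The string-to-integer-to-one-hot step of a vocabulary-list column can be sketched in plain Python as follows. The helpers vocabulary_lookup and one_hot are hypothetical names, not TensorFlow functions. The fallback of -1 mirrors the documented default_value of categorical_column_with_vocabulary_list; how the model then treats that -1 is an assumption here.

```python
def vocabulary_lookup(value, vocabulary, default_value=-1):
    """Return the index of `value` in `vocabulary`, or `default_value` if absent."""
    try:
        return vocabulary.index(value)
    except ValueError:
        return default_value


def one_hot(index, depth):
    """One-hot encode `index`; an out-of-range index yields all zeros."""
    return [1 if i == index else 0 for i in range(depth)]


vocabulary = ["bungalow", "apartment"]
for house_type in ["bungalow", "apartment", "house"]:
    idx = vocabulary_lookup(house_type, vocabulary)
    print(house_type, idx, one_hot(idx, len(vocabulary)))

# bungalow 0 [1, 0]
# apartment 1 [0, 1]
# house -1 [0, 0]
```

A value that is missing from the vocabulary, such as "house", ends up with no active category in this sketch, so it carries no type information.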

In [9]:
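def train_input_fn():
    # Note: "house" is not in the vocabulary list given to the "type" column,
    # so with the column's default settings it is treated as out-of-vocabulary.
    features = {"area": [1000, 2000, 4000, 1000, 2000, 4000],
                "type": ["bungalow", "bungalow", "house",
                         "apartment", "apartment", "apartment"]}
    labels = [500, 1000, 1500, 700, 1300, 1900]
    return features, labels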

In [10]:
model = tf.estimator.LinearRegressor(featcols)
model.train(train_input_fn,steps=200)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/vm/vq630f6x7dx4shwqz6hskm_m0000gn/T/tmp0xq_84to', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.


<tensorflow_estimator.python.estimator.canned.linear.LinearRegressorV2 at 0x13929f990>

In [11]:
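def predict_fn():
    # Note: neither 'house' nor 'apt' appears in the vocabulary list,
    # so both are out-of-vocabulary values for the "type" column.
    features = {"area": [1500, 1800],
                "type": ['house', 'apt']}
    return features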

prediction = model.predict(predict_fn)

In [12]:
print(next(prediction))
print(next(prediction))

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/vm/vq630f6x7dx4shwqz6hskm_m0000gn/T/tmp0xq_84to/model.ckpt-200
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
{'predictions': array([692.7829], dtype=float32)}
{'predictions': array([830.9035], dtype=float32)}
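
One way to read these two numbers is to treat the model as a straight line in area. This is a sketch under an assumption, not something the log states: since 'house' and 'apt' are not in the vocabulary, we assume they contribute nothing to the output, so each prediction is just slope * area + bias. Two points are then enough to recover both values.

```python
# Predictions printed above, paired with the "area" values from predict_fn.
area_1, pred_1 = 1500.0, 692.7829
area_2, pred_2 = 1800.0, 830.9035

# Assumption: both "type" values are out of vocabulary and add nothing,
# so each prediction reduces to slope * area + bias.
slope = (pred_2 - pred_1) / (area_2 - area_1)
bias = pred_1 - slope * area_1

print(f"slope = {slope:.4f}, bias = {bias:.2f}")
# slope = 0.4604, bias = 2.18
```

If the assumption holds, the model has learned roughly 0.46 units of price per unit of area after 200 steps, with a small bias term.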
