<a href="https://colab.research.google.com/github/duttapratikcsc/NPTEL/blob/master/Structured_Data/Feature_Columns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
!pip install scikit-learn
!pip install tensorflow==2.0.0-beta1

In [0]:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np
import pandas as pd
import tensorflow as tf

from tensorflow import feature_column, keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split


# **Use Pandas to create a dataframe**

In [3]:
URL = "https://storage.googleapis.com/applied-dl/heart.csv"
dataframe = pd.read_csv(URL)
dataframe.head()

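|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63 | 1 | 1 | 145 | 233 | 1 | 2 | 150 | 0 | 2.3 | 3 | 0 | fixed | 0 |
| 1 | 67 | 1 | 4 | 160 | 286 | 0 | 2 | 108 | 1 | 1.5 | 2 | 3 | normal | 1 |
| 2 | 67 | 1 | 4 | 120 | 229 | 0 | 2 | 129 | 1 | 2.6 | 2 | 2 | reversible | 0 |
| 3 | 37 | 1 | 3 | 130 | 250 | 0 | 0 | 187 | 0 | 3.5 | 3 | 0 | normal | 0 |
| 4 | 41 | 0 | 2 | 130 | 204 | 0 | 2 | 172 | 0 | 1.4 | 1 | 0 | normal | 0 |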


# **Split the dataframe into train, validation, and test**

In [4]:
train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')

193 train examples
49 validation examples
61 test examples
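The printed counts follow from how `train_test_split` sizes a split when `test_size` is a float: as far as I understand its behaviour, the test part gets the ceiling of `test_size * n` rows and the rest goes to training. Assuming the CSV has 303 rows (the sum of the three counts above), a quick check with a hypothetical helper `split_sizes`:

```python
import math

def split_sizes(n_samples, test_size):
    """Hypothetical helper mimicking train_test_split's sizing for a float test_size."""
    n_test = math.ceil(test_size * n_samples)  # test part is rounded up
    n_train = n_samples - n_test               # remainder goes to training
    return n_train, n_test

total = 303  # 193 + 49 + 61 from the printed output
train_n, test_n = split_sizes(total, 0.2)   # first split: train vs. test
train_n, val_n = split_sizes(train_n, 0.2)  # second split: train vs. validation
print(train_n, val_n, test_n)  # 193 49 61
```

Because neither call passes `random_state`, the rows that land in each split change from run to run, although the sizes stay the same; passing an integer `random_state` to both calls makes the split reproducible.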


# **Create an input pipeline using tf.data**

In [0]:
# A utility method to create a tf.data dataset from a Pandas DataFrame
def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  dataframe = dataframe.copy()
  labels = dataframe.pop('target')
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe),labels))
  if shuffle:
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)
  return ds 


Let's understand what this function does:


1.   Copy the input dataframe so that the changes are not persisted to the caller's dataframe.
2.   Pop the label column with the `pop` method, which returns the `target` column and removes it from the copy.
3.   Create a dataset from tensor slices. `dict(dataframe)` maps each column name to its column; the dataset slices these columns and the labels along the first axis, so each element is a (feature dictionary, label) pair for one row.
4.   If `shuffle` is `True`, shuffle the dataset using a buffer as large as the dataframe.
5.   Group consecutive elements into batches of `batch_size` and return the batched dataset.
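Steps 1 to 3 can be sketched with Pandas alone. This is a conceptual illustration, not TensorFlow: the tiny frame reuses a few columns from the `head()` output, and the list comprehension only mimics how `from_tensor_slices` pairs each row's features with its label.

```python
import pandas as pd

# A tiny frame with a few columns taken from the head() output above
df = pd.DataFrame({
    'age': [63, 67, 67],
    'thal': ['fixed', 'normal', 'reversible'],
    'target': [0, 1, 0],
})

# Steps 1-2: pop the label from a copy, so the caller's frame keeps 'target'
features = df.copy()
labels = features.pop('target')

# Step 3: dict(features) maps each column name to a whole column (a Series)
columns = dict(features)

# Conceptually, element i of the sliced dataset is ({name: column[i]}, label[i])
examples = [
    ({name: col.iloc[i] for name, col in columns.items()}, labels.iloc[i])
    for i in range(len(features))
]

print(list(df.columns))        # ['age', 'thal', 'target']
print(list(features.columns))  # ['age', 'thal']
```

Because `pop` runs on the copy, the original frame still has its `target` column, which is why `df_to_dataset` can be called on the same dataframe more than once.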

In [0]:
batch_size = 5 # A small batch size is used for demonstration purposes
train_ds = df_to_dataset(train, batch_size=batch_size)
val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)
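With `batch_size = 5`, `ds.batch` groups consecutive examples into batches of five. Since `drop_remainder` defaults to `False`, the final batch keeps whatever is left over and can be smaller. A plain-Python sketch of the resulting batch sizes, using the split counts printed earlier (`batch_sizes` is a hypothetical helper, not part of tf.data):

```python
def batch_sizes(n_examples, batch_size, drop_remainder=False):
    """Hypothetical helper: sizes of the batches Dataset.batch would emit."""
    full, rest = divmod(n_examples, batch_size)
    sizes = [batch_size] * full
    if rest and not drop_remainder:
        sizes.append(rest)  # the short final batch is kept by default
    return sizes

for name, n in [('train', 193), ('validation', 49), ('test', 61)]:
    sizes = batch_sizes(n, 5)
    print(name, len(sizes), 'batches, last of size', sizes[-1])
# train 39 batches, last of size 3
# validation 10 batches, last of size 4
# test 13 batches, last of size 1
```

A short final batch is harmless here because the feature-column layers work with any batch dimension.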

In [0]:
x, y = next(iter(train_ds))

In [30]:
x

{'age': <tf.Tensor: id=367, shape=(5,), dtype=int32, numpy=array([49, 54, 42, 54, 41], dtype=int32)>,
 'ca': <tf.Tensor: id=368, shape=(5,), dtype=int32, numpy=array([3, 1, 0, 1, 0], dtype=int32)>,
 'chol': <tf.Tensor: id=369, shape=(5,), dtype=int32, numpy=array([149, 188, 265, 201, 157], dtype=int32)>,
 'cp': <tf.Tensor: id=370, shape=(5,), dtype=int32, numpy=array([3, 4, 4, 3, 2], dtype=int32)>,
 'exang': <tf.Tensor: id=371, shape=(5,), dtype=int32, numpy=array([0, 0, 0, 0, 0], dtype=int32)>,
 'fbs': <tf.Tensor: id=372, shape=(5,), dtype=int32, numpy=array([0, 0, 0, 0, 0], dtype=int32)>,
 'oldpeak': <tf.Tensor: id=373, shape=(5,), dtype=float64, numpy=array([0.8, 1.4, 0.6, 0. , 0. ])>,
 'restecg': <tf.Tensor: id=374, shape=(5,), dtype=int32, numpy=array([2, 0, 2, 0, 0], dtype=int32)>,
 'sex': <tf.Tensor: id=375, shape=(5,), dtype=int32, numpy=array([1, 1, 0, 0, 1], dtype=int32)>,
 'slope': <tf.Tensor: id=376, shape=(5,), dtype=int32, numpy=array([1, 2, 2, 1, 1], dtype=int32)>,
 'tha

In [31]:
y

<tf.Tensor: id=380, shape=(5,), dtype=int32, numpy=array([0, 1, 0, 0, 0], dtype=int32)>

# **Understand the input pipeline**

In [27]:
for feature_batch, label_batch in train_ds.take(1):
  print('Every feature:', list(feature_batch.keys()))
  print('A batch of ages:', feature_batch['age'])
  print('A batch of targets:', label_batch)

Every feature: ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
A batch of ages: tf.Tensor([49 54 42 54 41], shape=(5,), dtype=int32)
A batch of targets: tf.Tensor([0 1 0 0 0], shape=(5,), dtype=int32)


# **Demonstrate several types of feature columns**

In [34]:
example_batch = next(iter(train_ds))[0]
example_batch

<tf.Tensor: id=465, shape=(5,), dtype=int32, numpy=array([0, 1, 0, 0, 0], dtype=int32)>

In [0]:
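def demo(feature_column):
  # Wrap the feature column(s) in a DenseFeatures layer and print the
  # dense tensor it produces for the example batch.
  feature_layer = layers.DenseFeatures(feature_column)
  print(feature_layer(example_batch).numpy())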
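For a single numeric column, the array that `demo` prints has one row per example and one float32 column. The NumPy sketch below (not TensorFlow; `dense_numeric` is a hypothetical stand-in for `DenseFeatures([numeric_column(key)])`) reproduces that shape for the batch of ages printed earlier:

```python
import numpy as np

def dense_numeric(batch, key):
    """Hypothetical stand-in for DenseFeatures([numeric_column(key)])."""
    values = np.asarray(batch[key], dtype=np.float32)  # numeric columns default to float32
    return values.reshape(-1, 1)                       # shape (batch_size, 1)

batch = {'age': [49, 54, 42, 54, 41]}  # the batch of ages printed above
out = dense_numeric(batch, 'age')
print(out.shape, out.dtype)  # (5, 1) float32
```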

### Numeric Column