
**Classification problem using the heart disease dataset.**



In [34]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
!pip3 install tensorflow==2.0.0-beta1
import tensorflow as tf
from tensorflow import keras
from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
tf.__version__




'2.0.0-beta1'

**Load the dataset and split it into train, validation, and test sets**

In [35]:
url = 'https://storage.googleapis.com/applied-dl/heart.csv'
dataframe = pd.read_csv(url)
dataframe.head()

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca        thal  target
0   63    1   1       145   233    1        2      150      0      2.3      3   0       fixed       0
1   67    1   4       160   286    0        2      108      1      1.5      2   3      normal       1
2   67    1   4       120   229    0        2      129      1      2.6      2   2  reversible       0
3   37    1   3       130   250    0        0      187      0      3.5      3   0      normal       0
4   41    0   2       130   204    0        2      172      0      1.4      1   0      normal       0


In [36]:
# Hold out 20% of the rows for testing, then 20% of the remainder for validation
train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print("Training examples", len(train))
print("Validation examples", len(val))
print("test examples", len(test))

Training examples 193
Validation examples 49
test examples 61
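
The printed sizes follow from applying a 20% split twice. The sketch below reproduces the arithmetic only; it assumes the rounding scikit-learn applies when just `test_size` is given (test count rounded up, remainder used for training), and `split_sizes` is a hypothetical helper, not part of the notebook.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the rest goes to training.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

# Row count implied by the three printed sizes.
n_total = 193 + 49 + 61

# First split: hold out the test set.
n_rest, n_test = split_sizes(n_total, 0.2)
# Second split: carve the validation set out of what remains.
n_train, n_val = split_sizes(n_rest, 0.2)

print(n_train, n_val, n_test)
```

Because the second split takes 20% of the remaining 80%, the effective proportions are roughly 64% train, 16% validation, and 20% test.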


**Creating a tf.data dataset from a pandas DataFrame**

In [0]:
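def df_to_dataset(dataframe, shuffle=True, batch_size=32):
  # Work on a copy so the caller's DataFrame keeps its "target" column
  dataframe = dataframe.copy()
  labels = dataframe.pop("target")
  # Each element is a (dict of feature name -> value, label) pair
  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
  if shuffle:
    # A buffer as large as the dataset gives a full shuffle
    ds = ds.shuffle(buffer_size=len(dataframe))
  ds = ds.batch(batch_size)
  return ds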
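
To make the shape of the resulting batches concrete, here is a plain-Python sketch of the same shuffle-then-batch idea. It is an illustration only, not how `tf.data` is implemented: `batches` and the toy values are hypothetical, and unlike `ds.shuffle` it shuffles once rather than on every pass.

```python
import random

def batches(columns, labels, batch_size=32, shuffle=True, seed=None):
    """Yield (feature_dict, label_list) batches from column-oriented data."""
    order = list(range(len(labels)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        # Every feature column is sliced with the same row indices.
        features = {name: [values[i] for i in idx]
                    for name, values in columns.items()}
        yield features, [labels[i] for i in idx]

# Toy data (hypothetical values): 7 rows with batch_size=3.
toy_columns = {
    "age": [63, 67, 67, 37, 41, 56, 62],
    "thal": ["fixed", "normal", "reversible", "normal",
             "normal", "normal", "fixed"],
}
toy_labels = [0, 1, 0, 0, 0, 0, 1]

toy_batches = list(batches(toy_columns, toy_labels, batch_size=3, shuffle=False))
sizes = [len(y) for _, y in toy_batches]
print(sizes)
```

The last batch is smaller whenever the row count is not a multiple of the batch size. The same holds for the real data: 193 training rows with a batch size of 32 give six full batches and a final batch of one row.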



In [0]:
# Try the conversion with a small batch size so the output is easy to read
batch_size = 5
ds_train = df_to_dataset(train, batch_size=batch_size)
ds_val = df_to_dataset(val, shuffle=False, batch_size=batch_size)
ds_test = df_to_dataset(test, shuffle=False, batch_size=batch_size)


In [0]:
for feature_batch , label_batch in ds_train.take(1):
  print("Features:",list(feature_batch.keys()))
  print("batch of ages",feature_batch['age'])
  print("batch of labels",label_batch)
  print("Feature batch", feature_batch)

Features: ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
batch of ages tf.Tensor([52 51 57 77 71], shape=(5,), dtype=int32)
batch of labels tf.Tensor([0 0 0 1 0], shape=(5,), dtype=int32)
Feature batch {'age': <tf.Tensor: id=69, shape=(5,), dtype=int32, numpy=array([52, 51, 57, 77, 71], dtype=int32)>, 'sex': <tf.Tensor: id=77, shape=(5,), dtype=int32, numpy=array([1, 0, 1, 1, 0], dtype=int32)>, 'cp': <tf.Tensor: id=72, shape=(5,), dtype=int32, numpy=array([1, 3, 3, 4, 3], dtype=int32)>, 'trestbps': <tf.Tensor: id=81, shape=(5,), dtype=int32, numpy=array([152, 120, 128, 125, 110], dtype=int32)>, 'chol': <tf.Tensor: id=71, shape=(5,), dtype=int32, numpy=array([298, 295, 229, 304, 265], dtype=int32)>, 'fbs': <tf.Tensor: id=74, shape=(5,), dtype=int32, numpy=array([1, 0, 0, 0, 1], dtype=int32)>, 'restecg': <tf.Tensor: id=76, shape=(5,), dtype=int32, numpy=array([0, 2, 2, 2, 2], dtype=int32)>, 'thalach': <tf.Tensor: id=80, sh

**Demonstrate several types of feature columns**

In [0]:
# Take the features of one batch to use as example input
example_batch = next(iter(ds_train))[0]

# Helper that applies a feature column to the example batch and prints the result
def demo(feature_column):
  feature_layer = layers.DenseFeatures(feature_column)
  print(feature_layer(example_batch).numpy())

**Numeric column**

In [0]:
age = feature_column.numeric_column("age")
demo(age)

[[52.]
 [51.]
 [57.]
 [77.]
 [71.]]


**Bucketized column**

Often you don't want to feed a number directly into the model, but instead split its value into categories based on numeric ranges. A bucketized column does this: each age is one-hot encoded according to the range it falls in.


In [0]:
age_bucket = feature_column.bucketized_column(age , boundaries=[18,25,30,35,40,45,55,60,65])
demo(age_bucket)


[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
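
The positions of the ones above can be checked by hand. The sketch below is a plain-Python stand-in for the bucketing rule (not the TensorFlow implementation): nine boundaries give ten buckets, and a value equal to a boundary falls into the bucket above it. `bucketize` is a hypothetical helper.

```python
from bisect import bisect_right

# Same boundaries as the bucketized column above.
boundaries = [18, 25, 30, 35, 40, 45, 55, 60, 65]

def bucketize(value, boundaries):
    """One-hot encode value over len(boundaries) + 1 buckets.

    Bucket i covers [boundaries[i-1], boundaries[i]); the first bucket is
    unbounded below and the last is unbounded above.
    """
    one_hot = [0.0] * (len(boundaries) + 1)
    one_hot[bisect_right(boundaries, value)] = 1.0
    return one_hot

# Ages from the example batch printed earlier.
ages = [52, 51, 57, 77, 71]
bucket_ids = [bucketize(a, boundaries).index(1.0) for a in ages]
print(bucket_ids)  # [6, 6, 7, 9, 9]
```

Ages 52 and 51 land in the 45 to 55 bucket (index 6), 57 in the 55 to 60 bucket (index 7), and 77 and 71 in the open-ended last bucket (index 9), matching the rows above.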


**Categorical columns**

Some columns in a dataset contain string values (here, `thal`). Strings cannot be fed directly to a model, so they are first mapped to numeric values. A categorical vocabulary column, wrapped in an indicator column, represents each string as a one-hot vector.


In [0]:
thal = feature_column.categorical_column_with_vocabulary_list(
      'thal',['fixed','normal','reversible'])
thal_one_hot = feature_column.indicator_column(thal)
demo(thal_one_hot)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
[[0. 0. 1.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]]
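
The mapping from string to one-hot vector can be sketched in plain Python. This is an illustration of the idea rather than the library code; `one_hot` is a hypothetical helper, and encoding an out-of-vocabulary string as all zeros is an assumption about the default behaviour.

```python
vocabulary = ["fixed", "normal", "reversible"]
index_of = {value: i for i, value in enumerate(vocabulary)}

def one_hot(value):
    """Return a one-hot list with a 1.0 at the vocabulary position of value."""
    vector = [0.0] * len(vocabulary)
    position = index_of.get(value)
    if position is not None:  # assumption: unknown strings stay all-zero
        vector[position] = 1.0
    return vector

# thal values that would produce the matrix printed above.
thal_batch = ["reversible", "normal", "reversible", "normal", "normal"]
encoded = [one_hot(v) for v in thal_batch]
for row in encoded:
    print(row)
```

Reading the printed matrix backwards with this rule, the example batch contains the `thal` values reversible, normal, reversible, normal, normal.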


**Embedding Column**

Suppose that instead of a few possible strings, a category has thousands of possible values. One-hot encoding then becomes impractical for training a neural network, because the input vector grows with the vocabulary. An embedding column instead represents each category as a much lower-dimensional dense vector whose cells can hold any number, not just 0 or 1.

In [0]:
thal_embedding = feature_column.embedding_column(thal,dimension=8)
demo(thal_embedding)

[[-0.43031818 -0.5140477  -0.14765057  0.5500358   0.25458595  0.4369187
   0.6556285   0.59381485]
 [-0.20032851 -0.14532904 -0.13461508  0.18059397  0.51310056 -0.19209644
   0.2993331   0.23008475]
 [-0.43031818 -0.5140477  -0.14765057  0.5500358   0.25458595  0.4369187
   0.6556285   0.59381485]
 [-0.20032851 -0.14532904 -0.13461508  0.18059397  0.51310056 -0.19209644
   0.2993331   0.23008475]
 [-0.20032851 -0.14532904 -0.13461508  0.18059397  0.51310056 -0.19209644
   0.2993331   0.23008475]]
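
An embedding column is essentially a lookup table with one row per vocabulary entry. The sketch below shows only the lookup; the table values are random stand-ins (an assumption, not the library's initializer), and `embed` is a hypothetical helper.

```python
import random

vocabulary = ["fixed", "normal", "reversible"]
dimension = 8

# One row of `dimension` numbers per vocabulary entry. In a real model these
# values are trainable weights; here they are random placeholders.
rng = random.Random(0)
table = [[rng.uniform(-1.0, 1.0) for _ in range(dimension)] for _ in vocabulary]

def embed(value):
    """Look up the dense vector for a category."""
    return table[vocabulary.index(value)]

# Same thal values as the example batch.
thal_batch = ["reversible", "normal", "reversible", "normal", "normal"]
vectors = [embed(v) for v in thal_batch]

print(len(vectors), len(vectors[0]))
print(vectors[0] == vectors[2], vectors[1] == vectors[3] == vectors[4])
```

This is why the output above has only two distinct rows: examples that share a `thal` value share the same embedding vector.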


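**Crossed feature columns**

A crossed column combines several features into a single feature, so the model can learn a separate weight for each combination of values. Each combination is hashed into one of `hash_bucket_size` buckets, so different combinations can collide.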

In [0]:
crossed_feature = feature_column.crossed_column([age_bucket,thal],hash_bucket_size=1000)
demo(feature_column.indicator_column(crossed_feature))

Instructions for updating:
The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
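
The hashing step can be sketched as follows. This is an illustration of the idea only: `crossed_bucket` is a hypothetical helper, and MD5 is used as a stand-in hash, so the indices it produces will not match those TensorFlow computes.

```python
import hashlib

def crossed_bucket(values, hash_bucket_size):
    """Map a combination of feature values to a bucket index.

    Assumption: MD5 stands in for the library's own fingerprint function.
    """
    key = "_X_".join(str(v) for v in values)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % hash_bucket_size

hash_bucket_size = 1000

# Cross an age bucket index with a thal value (example values).
bucket = crossed_bucket((6, "reversible"), hash_bucket_size)

# The indicator column turns that index into a 1000-wide one-hot row.
row = [0.0] * hash_bucket_size
row[bucket] = 1.0
print(bucket, sum(row))
```

Each row holds a single 1 among 1000 entries, which is why the abbreviated output above shows only zeros.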


**Choosing which features to use**


In [0]:
feature_columns = []

# numeric cols
for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:
  feature_columns.append(feature_column.numeric_column(header))

# bucketized cols
age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
feature_columns.append(age_buckets)

# indicator cols
thal = feature_column.categorical_column_with_vocabulary_list(
      'thal', ['fixed', 'normal', 'reversible'])
thal_one_hot = feature_column.indicator_column(thal)
feature_columns.append(thal_one_hot)

# embedding cols
thal_embedding = feature_column.embedding_column(thal, dimension=8)
feature_columns.append(thal_embedding)

# crossed cols
# crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
# crossed_feature = feature_column.indicator_column(crossed_feature)
# feature_columns.append(crossed_feature)


**Create feature layer**


In [0]:
feature_layer = tf.keras.layers.DenseFeatures(feature_columns)


In [0]:
batch_size = 32
ds_train = df_to_dataset(train ,batch_size = batch_size)
ds_val = df_to_dataset(val,shuffle =False , batch_size = batch_size)
ds_test = df_to_dataset(test , shuffle =False , batch_size = batch_size)

**Create, compile, and train the model**

In [45]:
model = keras.Sequential([
    feature_layer,
    layers.Dense(128, activation='relu'),
    layers.Dense(128, activation='relu'),
    # A single sigmoid unit outputs the probability of the positive class
    layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.fit(ds_train,
          validation_data=ds_val,
          epochs=10)

# Evaluate on the held-out test set
loss, acc = model.evaluate(ds_test)
print("Accuracy:{}".format(acc))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Accuracy:0.5901639461517334
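
For reference, the accuracy reported for a sigmoid output is the fraction of examples whose predicted probability, thresholded at 0.5, matches the label. The sketch below shows that calculation on hypothetical probabilities and labels; `binary_accuracy` here is a plain-Python stand-in, not the Keras metric itself.

```python
def binary_accuracy(probabilities, labels, threshold=0.5):
    """Fraction of examples where the thresholded probability equals the label."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical model outputs and labels, for illustration only.
probabilities = [0.9, 0.2, 0.6, 0.4]
labels = [1, 0, 0, 0]
print(binary_accuracy(probabilities, labels))  # 0.75

# The reported test accuracy is consistent with 36 correct out of 61.
print(36 / 61)
```

With only 61 test examples, each prediction moves the accuracy by about 1.6 percentage points, and because `train_test_split` is called without a fixed seed the figure will vary from run to run.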
