##### Created at : 
28/03/2020
##### Created by : 
Angga Pur, Henrico Aldy Ferdian, & Juli Andika
##### Description : 
The process covers getting the data, splitting it, feature scaling, training, evaluation, and logging.
You can choose either 1.A or 1.B:
1.A => do NOT convert the numerical features to categorical features, creating a dataset with dimensions 400 x 152
1.B => convert the numerical features to categorical features, creating a dataset with dimensions 400 x 16 (the input shape reported by the model summary)

##### Import libraries

In [32]:
import datetime
import os
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Input, Dense, Activation, Dropout
from tensorflow.keras.models import Model

##### Uses TensorFlow 2.1.0

In [33]:
print(tf.__version__)

2.1.0


##### A function to read data from csv

In [34]:
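def extract_data(csv_url, columns_name, header=0):
  # names= replaces the file's header row (header=0) with our own column names
  data = pd.read_csv(csv_url, names=columns_name, header=header)
  # Drop the first column (user_id); it is an identifier, not a feature
  return data.iloc[:, 1:]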
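
To illustrate what `extract_data` returns, here is a minimal sketch that reads from an in-memory CSV instead of a file. The header names and the two rows in `sample_csv` are assumptions made up for the example, not taken from the real dataset file.

```python
import io
import pandas as pd

def extract_data(csv_source, columns_name, header=0):
    # names= replaces the header row of the source; iloc drops the first column
    return pd.read_csv(csv_source, names=columns_name, header=header).iloc[:, 1:]

# Hypothetical sample that mimics the layout of the dataset file
sample_csv = io.StringIO(
    "User ID,Gender,Age,EstimatedSalary,Purchased\n"
    "1,Male,19,19000,0\n"
    "2,Female,26,43000,0\n"
)
columns = ['user_id', 'gender', 'age', 'estimated_salary', 'output']
sample = extract_data(sample_csv, columns)
print(list(sample.columns))
print(sample.shape)
```

The `user_id` column is gone and the remaining columns carry the names passed in `columns`.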

##### A function to create the dataset

In [35]:
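def create_dataset(data, labels):
  # Join the one-hot feature frames side by side (column-wise)
  X = pd.concat(data, axis=1)
  # Labels as a NumPy array, one column per class
  y = labels.values
  return X, y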
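
As a rough illustration of `create_dataset`, the sketch below joins two tiny one-hot frames. The frames and their column names (`age_young`, `age_old`) are hypothetical and only serve to show the resulting shapes.

```python
import pandas as pd

def create_dataset(data, labels):
    X = pd.concat(data, axis=1)   # features: frames placed side by side
    y = labels.values             # labels: plain NumPy array
    return X, y

# Hypothetical one-hot frames with two rows each
gender = pd.DataFrame({'gender_Female': [0, 1], 'gender_Male': [1, 0]})
age = pd.DataFrame({'age_young': [1, 0], 'age_old': [0, 1]})
labels = pd.DataFrame({'condition_0': [1, 0], 'condition_1': [0, 1]})

X, y = create_dataset([gender, age], labels)
print(X.shape, y.shape)
```

The number of feature columns in `X` is the sum of the columns of the frames passed in, which is what later determines the size of the input layer.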

##### Open the CSV and print its first 5 rows

In [36]:
columns = ['user_id','gender','age','estimated_salary','output']
data = extract_data('dataset/Social_Network_Ads.csv',columns)
print(data.head())

   gender  age  estimated_salary  output
0    Male   19             19000       0
1    Male   35             20000       0
2  Female   26             43000       0
3  Female   27             57000       0
4    Male   19             76000       0


##### Convert numeric data to categorical data
The bin edges come from range(a,b,c) => a is the bottom value, b is the top value + 1, c is the step
Example: range(18,61,5) => the edges are 18, 23, ..., 58, which makes 8 classes; ages above 58 fall outside the bins and become NaN

In [37]:
data["age"] = pd.cut(data["age"], range(18,61,5), include_lowest=True) # 8 classes (edges 18..58); ages above 58 become NaN
data["estimated_salary"] = pd.cut(data["estimated_salary"], range(15000,150001,22500), include_lowest=True) # 6 classes
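
The number of classes that `pd.cut` produces is the number of edges minus one, so it is worth checking the edges that a `range` actually generates. The sketch below does that for the age bins; the ages in `ages` are hypothetical values picked to hit the first bin, the last bin, and the gap above the last edge. The step of 7 at the end is only a suggested alternative, not what the notebook uses.

```python
import pandas as pd

# Hypothetical ages: first bin, middle, last bin, and two values above the last edge
ages = pd.Series([18, 19, 35, 58, 59, 60])

edges = list(range(18, 61, 5))
binned = pd.cut(ages, edges, include_lowest=True)
n_bins = len(binned.cat.categories)   # number of edges minus one
outside = binned.isna().tolist()      # True where the age fell outside every bin

# Hypothetical alternative: a step of 7 ends exactly on 60 and gives 6 bins
alt_edges = list(range(18, 61, 7))
alt_binned = pd.cut(ages, alt_edges, include_lowest=True)

print(edges, n_bins, outside)
print(alt_edges, len(alt_binned.cat.categories), alt_binned.isna().any())
```

With a step of 5 the last edge is 58, so ages 59 and 60 are not assigned to any bin. Rows like that end up with all-zero age columns after one-hot encoding.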

##### One-hot encode the features and labels

In [40]:
gender = pd.get_dummies(data.gender,prefix='gender')
age = pd.get_dummies(data.age,prefix='age')
estimated_salary = pd.get_dummies(data.estimated_salary,prefix='estimated_salary')
labels = pd.get_dummies(data.output,prefix='condition')
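
For readers unfamiliar with `pd.get_dummies`, here is a small sketch of what the one-hot encoding looks like. The three rows in `toy` are made up for the example.

```python
import pandas as pd

# Hypothetical three-row sample
toy = pd.DataFrame({'gender': ['Male', 'Female', 'Male'], 'output': [0, 1, 0]})

gender = pd.get_dummies(toy.gender, prefix='gender')
labels = pd.get_dummies(toy.output, prefix='condition')

# astype(int) makes the output the same whether pandas returns uint8 or bool columns
print(list(gender.columns))
print(labels.values.astype(int).tolist())
```

Each distinct value becomes its own column, named with the given prefix, and each row has a 1 in exactly one of those columns. Encoding the labels this way gives one column per class, which is the shape that a softmax output with categorical cross-entropy expects.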

##### Make the dataset and split it into train and test sets

In [41]:
X,y = create_dataset([gender, age, estimated_salary],labels)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=np.random) # np.random = a different split on every run; pass an int (e.g. 0) for a reproducible split
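
The effect of `random_state` can be checked with a short sketch. It uses stand-in arrays with the same row count as the dataset (400) rather than the real features, and the seed value 0 is just an example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with 400 rows; the values are simply row ids
X = np.arange(400).reshape(-1, 1)
y = np.arange(400)

# An integer seed gives the same split on every call
_, X_test_a, _, _ = train_test_split(X, y, test_size=0.20, random_state=0)
_, X_test_b, _, _ = train_test_split(X, y, test_size=0.20, random_state=0)
same_split = np.array_equal(X_test_a, X_test_b)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
print(X_train.shape, X_test.shape, same_split)
```

A fixed seed makes the reported test accuracy comparable between runs, since the same 80 rows are always held out.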

##### Prepare layers
input layer : sized to the number of features in the dataset
hidden layer (1) : 100 nodes
hidden layer (2) : 200 nodes
hidden layer (3) : 200 nodes
hidden layer (4) : 200 nodes
hidden layer (5) : 200 nodes
hidden layer (6) : 200 nodes
hidden layer (7) : 200 nodes
hidden layer (8) : 200 nodes
hidden layer (9) : 200 nodes
hidden layer (10) : 100 nodes
output layer : sized to the number of labels in the dataset

In [49]:
input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(100, activation='relu')(input_layer)
dense_layer_2 = Dense(200, activation='relu')(dense_layer_1)
dense_layer_3 = Dense(200, activation='relu')(dense_layer_2)
dense_layer_4 = Dense(200, activation='relu')(dense_layer_3)
dense_layer_5 = Dense(200, activation='relu')(dense_layer_4)
dense_layer_6 = Dense(200, activation='relu')(dense_layer_5)
dense_layer_7 = Dense(200, activation='relu')(dense_layer_6)
dense_layer_8 = Dense(200, activation='relu')(dense_layer_7)
dense_layer_9 = Dense(200, activation='relu')(dense_layer_8)
dense_layer_10 = Dense(100, activation='relu')(dense_layer_9)
output = Dense(y.shape[1], activation='softmax')(dense_layer_10)

##### Make model
We use the categorical cross-entropy loss and the Adam optimizer

In [43]:
model = Model(inputs=input_layer, outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
model.summary()  # summary() prints the table itself; wrapping it in print() also prints "None"

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         [(None, 16)]              0         
_________________________________________________________________
dense_11 (Dense)             (None, 100)               1700      
_________________________________________________________________
dense_12 (Dense)             (None, 200)               20200     
_________________________________________________________________
dense_13 (Dense)             (None, 200)               40200     
_________________________________________________________________
dense_14 (Dense)             (None, 200)               40200     
_________________________________________________________________
dense_15 (Dense)             (None, 200)               40200     
_________________________________________________________________
dense_16 (Dense)             (None, 200)               4020
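
The parameter counts in the summary can be checked by hand: a Dense layer with n inputs and m units has (n + 1) * m parameters (one weight per input-unit pair plus one bias per unit). The sketch below applies this to the layer widths described earlier, assuming 16 input features and 2 output classes. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, n_units):
    # One weight per input-unit pair, plus one bias per unit
    return (n_inputs + 1) * n_units

# Layer widths: 16 inputs, the ten hidden layers, 2 output classes (assumed)
widths = [16, 100] + [200] * 8 + [100, 2]
params = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
total = sum(params)

print(params[:3])
print(total)
```

The first three values match the `Param #` column of the summary (1700, 20200, 40200).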

##### Prepare tensorboard log

In [44]:
log_dir= os.path.join('logs','fit',datetime.datetime.now().strftime("%Y%m%d-%H%M%S"),'')
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
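
To show what the log directory looks like, here is a small sketch with a fixed timestamp in place of `datetime.datetime.now()`. The time of day is a made-up value for illustration.

```python
import datetime
import os

# Fixed, hypothetical timestamp so the result is predictable
stamp = datetime.datetime(2020, 3, 28, 14, 5, 9).strftime("%Y%m%d-%H%M%S")

# The trailing '' makes os.path.join end the path with a separator
log_dir = os.path.join('logs', 'fit', stamp, '')
print(stamp)
print(log_dir)
```

Because every run writes to its own timestamped subdirectory of `logs/fit`, pointing TensorBoard at `logs/fit` lets you compare runs side by side.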

##### Train model
Batch size 10 and 50 epochs. validation_split=0.2 holds out 20% of the training set for validation (256 training and 64 validation samples)

In [45]:
history = model.fit(X_train, y_train, batch_size=10, epochs=50, verbose=1, validation_split=0.2, callbacks=[tensorboard_callback])

Train on 256 samples, validate on 64 samples
Epoch 1/50 to Epoch 50/50 (the per-epoch progress output was not captured)
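
The sample counts reported by `fit` follow from simple arithmetic, sketched below. The split formula is an approximation of how Keras applies `validation_split`, and it assumes the dataset has 400 rows, of which 320 remain after the train/test split.

```python
import math

n_train = 320            # 80% of the 400 rows, left after train_test_split
validation_split = 0.2
batch_size = 10

# Keras holds out the last fraction of the training arrays, before shuffling
split_at = int(n_train * (1 - validation_split))
n_fit, n_val = split_at, n_train - split_at
steps_per_epoch = math.ceil(n_fit / batch_size)

print(n_fit, n_val, steps_per_epoch)
```

So the network is fitted on 64% of the full dataset, validated on 16%, and tested on the remaining 20%.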


##### Save model

In [46]:
model.save('saved_model/model.h5')

##### Show score
score[0] is the test loss (categorical cross-entropy) and score[1] is the test accuracy

In [47]:
score = model.evaluate(X_test, y_test, verbose=1)
print("Test Score:", score[0])
print("Test Accuracy:", score[1])

Test Score: 0.8240363240242005
Test Accuracy: 0.875
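
To make the two numbers concrete, the sketch below computes categorical cross-entropy and accuracy by hand with NumPy. The labels and probabilities are invented for the example and are not model output.

```python
import numpy as np

# Hypothetical one-hot labels and softmax outputs for four samples
y_true = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])

# Cross-entropy: mean negative log-probability assigned to the true class
loss = -np.mean(np.sum(y_true * np.log(y_prob), axis=1))

# Accuracy: fraction of samples whose most probable class is the true class
accuracy = np.mean(np.argmax(y_prob, axis=1) == np.argmax(y_true, axis=1))

print(round(float(loss), 4), float(accuracy))
```

The loss can be high even when accuracy is good, because a few confident wrong predictions contribute large negative log-probabilities. If the test set holds 80 rows (20% of 400), an accuracy of 0.875 corresponds to 70 correct predictions.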


##### Launch tensorboard

In [48]:
%load_ext tensorboard
%tensorboard --logdir logs/fit

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


Reusing TensorBoard on port 6006 (pid 17544), started 0:19:27 ago. (Use '!kill 17544' to kill it.)