# Practicum 4 - Classification with ANN

### Data Preprocessing

**Step 1 - Import Libraries**

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf



**Step 2 - Load Data**

In [3]:
dataset = pd.read_csv('dataset/Churn_Modelling.csv')
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values

Check Data (X)

In [4]:
print(X)

[[619 'France' 'Female' ... 1 1 101348.88]
 [608 'Spain' 'Female' ... 0 1 112542.58]
 [502 'France' 'Female' ... 1 0 113931.57]
 ...
 [709 'France' 'Female' ... 0 1 42085.58]
 [772 'Germany' 'Male' ... 1 0 92888.52]
 [792 'France' 'Female' ... 1 0 38190.78]]


**Step 3 - Encoding Categorical Data**

In [5]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

Check Data (X)

In [6]:
print(X)

[[619 'France' 0 ... 1 1 101348.88]
 [608 'Spain' 0 ... 0 1 112542.58]
 [502 'France' 0 ... 1 0 113931.57]
 ...
 [709 'France' 0 ... 0 1 42085.58]
 [772 'Germany' 1 ... 1 0 92888.52]
 [792 'France' 0 ... 1 0 38190.78]]


**Step 4 - Encoding the "Geography" Column with One Hot Encoder**

In [7]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

Explanation:
1. from sklearn.compose import ColumnTransformer - Imports the ColumnTransformer class from scikit-learn, which applies different transformations to different columns of a dataset.
2. from sklearn.preprocessing import OneHotEncoder - Imports the OneHotEncoder class from scikit-learn, a transformer that converts a categorical variable into binary one-hot columns (one column per category).
3. ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough') - Creates a ColumnTransformer instance called ct. The transformers argument lists the transformations to apply. Here it holds a single tuple: the name 'encoder', an instance of OneHotEncoder(), and the column list [1], so one-hot encoding is applied only to the column at index 1 (Geography). remainder='passthrough' keeps all other columns unchanged instead of dropping them.
4. X = np.array(ct.fit_transform(X)) - fit_transform learns the categories present in column 1 and applies the encoding in a single call. The result is converted to a NumPy array with np.array() and assigned back to X. The one-hot columns are placed first, followed by the passed-through columns, which is why the printed rows below start with three 0/1 values.
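To make the encoding itself concrete, here is a minimal NumPy sketch of what one-hot encoding computes. It is not scikit-learn's implementation; the toy `geography` array is a hypothetical stand-in for the Geography column. It relies on the fact that OneHotEncoder, by default, orders categories by their sorted values, which matches the printed output (France, Germany, Spain).

```python
import numpy as np

# Toy stand-in for the "Geography" column (hypothetical rows, real category names).
geography = np.array(["France", "Spain", "France", "Germany"])

# np.unique returns the sorted categories and, for each row, the index of its category.
categories, codes = np.unique(geography, return_inverse=True)

# Row i of the identity matrix is the one-hot vector for category i.
one_hot = np.eye(len(categories))[codes]

print(categories)  # ['France' 'Germany' 'Spain']
print(one_hot)
```

With this ordering, France maps to `[1, 0, 0]`, Germany to `[0, 1, 0]`, and Spain to `[0, 0, 1]`.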

Check Data (X)

In [8]:
print(X)

[[1.0 0.0 0.0 ... 1 1 101348.88]
 [0.0 0.0 1.0 ... 0 1 112542.58]
 [1.0 0.0 0.0 ... 1 0 113931.57]
 ...
 [1.0 0.0 0.0 ... 0 1 42085.58]
 [0.0 1.0 0.0 ... 1 0 92888.52]
 [1.0 0.0 0.0 ... 1 0 38190.78]]


**Step 5 - Split Data**

In [9]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

**Step 6 - Feature Scaling**

In [10]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
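The cell above fits the scaler on the training split and only transforms the test split. As a rough sketch of what that means numerically, the NumPy code below standardizes two hypothetical toy matrices (`X_train_toy`, `X_test_toy`; not taken from the churn data) using the training mean and standard deviation for both splits.

```python
import numpy as np

# Hypothetical toy feature matrices with 2 features; not from the churn dataset.
X_train_toy = np.array([[600.0, 40.0], [700.0, 30.0], [650.0, 50.0], [550.0, 20.0]])
X_test_toy = np.array([[620.0, 35.0]])

# Statistics come from the training split only (what sc.fit_transform learns).
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population standard deviation (ddof=0), as StandardScaler uses

# Each feature becomes z = (x - mean) / std.
X_train_scaled = (X_train_toy - mean) / std

# The test split reuses the training mean and std (what sc.transform does).
X_test_scaled = (X_test_toy - mean) / std

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
print(X_test_scaled)
```

Reusing the training statistics keeps information about the test split out of preprocessing.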

### Building the ANN Model

**Step 1 - Initialize the ANN Model**

In [11]:
ann = tf.keras.models.Sequential()

Explanation:
tf.keras.models.Sequential() creates an empty Sequential model: a linear stack of layers in which each layer feeds the next. Layers are appended one at a time with add(), which suits simple feed-forward networks like this one.

**Step 2 - Create the Input Layer and First Hidden Layer**

In [12]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

Explanation:
Adds a Dense layer with 6 units and a ReLU activation function to the sequential model (ann). Because no separate input layer is declared, this is the first hidden layer; Keras infers the input size from the data the first time the model is called. A Dense layer is fully connected: each of its neurons receives every output of the previous layer (here, every input feature).

**Step 3 - Create the Second Hidden Layer**

In [13]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

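Explanation: This line is identical to the one in the previous step. It adds a second fully connected hidden layer with 6 units and ReLU activation, which takes the output of the first hidden layer as its input.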

**Step 4 - Create the Output Layer**

In [14]:
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))

Explanation: Adds the output layer: a Dense layer with 1 unit (units=1) and a sigmoid activation function. A single neuron is enough for binary classification, because the sigmoid squashes its output into the range (0, 1), which can be read as the predicted probability of the positive class (label 1).
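The three Dense layers can be pictured as a chain of matrix multiplications. The following NumPy sketch mimics the forward pass of a 12 -> 6 -> 6 -> 1 network; it is an illustration, not Keras internals. The input width of 12 is taken from the 12-value row used in the prediction step later in this notebook, and the random weights are placeholders for values that training would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes: 12 input features -> 6 hidden -> 6 hidden -> 1 output.
sizes = [12, 6, 6, 1]

# Placeholder random weights and zero biases; a trained model holds learned values instead.
weights = [rng.normal(size=(n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    """Apply the two ReLU hidden layers, then the sigmoid output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return sigmoid(h @ weights[-1] + biases[-1])

x = rng.normal(size=(4, 12))  # 4 hypothetical, already-scaled samples
probs = forward(x)

# Trainable parameters: (12*6 + 6) + (6*6 + 6) + (6*1 + 1)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
print(probs.shape, n_params)  # (4, 1) 127
```

Each sample produces one value between 0 and 1, and under the 12-feature assumption the network has 127 trainable parameters.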

### Training the Model

**Step 1 - Compile the ANN Model (Assembling the Architecture)**

In [15]:
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

Explanation:
1. optimizer='adam' - This parameter specifies the optimizer used to update the network's weights during training. Adam is a popular gradient-based optimizer that adapts the step size for each parameter using running averages of the gradients and of their squares.
2. loss='binary_crossentropy' - This parameter specifies the loss function to be minimized during training. It is appropriate when dealing with binary classification problems, where the goal is to predict one of two classes.
3. metrics=['accuracy'] - This parameter specifies the evaluation metric to be used during training and evaluation. In this case, the accuracy metric is used, which calculates the proportion of correctly classified samples.

After compiling the model, it is ready to be trained on the training data using the fit method, where we would specify the input data (X_train) and corresponding target labels (y_train).
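To make the loss and metric named above concrete, here is a small NumPy sketch of binary cross-entropy and accuracy on four hypothetical predictions. The helper names and the clipping constant `eps` are illustrative choices, not the Keras implementation.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; eps avoids log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    return float(np.mean((p_pred > threshold).astype(int) == y_true))

# Hypothetical labels and predicted probabilities.
y_true = np.array([1, 0, 1, 0])
p_pred = np.array([0.9, 0.2, 0.4, 0.1])

loss = binary_crossentropy(y_true, p_pred)
acc = accuracy(y_true, p_pred)
print(loss, acc)
```

The third sample (label 1, predicted 0.4) is the only misclassification, so the accuracy is 0.75, and that sample also contributes the largest term to the loss.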

**Step 2 - Fitting the Model**

In [16]:
ann.fit(X_train, y_train, batch_size = 32, epochs = 100)

Epoch 1/100
Epoch 2/100
...

<keras.src.callbacks.History at 0x21dff2da6e0>

Explanation:
1. batch_size=32 - This parameter specifies the number of samples used in each batch for weight updates. The training data is divided into smaller batches, and the model is updated after processing each batch.
2. epochs=100 - This parameter specifies how many times the model iterates over the entire training dataset. One full pass over the dataset is called an epoch. With 100 epochs, the model goes through the training data 100 times, updating its weights after every batch.

During the training process, the model will learn to make predictions based on the input data and adjust its weights using the optimizer and loss function specified during compilation. The goal is to minimize the loss function and improve the model's accuracy on the training data.
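As a quick piece of arithmetic, the sketch below estimates how many weight updates this call performs. The training-set size is not printed anywhere in the notebook; it is inferred here from the 2000 test rows in the confusion matrix reported later and from test_size = 0.2, so treat it as an assumption.

```python
import math

# Test rows, taken from the confusion matrix reported later: 1521 + 74 + 197 + 208.
n_test = 1521 + 74 + 197 + 208
test_size = 0.2

# Inferred sizes (assumption: the split is exactly 80/20).
n_total = round(n_test / test_size)
n_train = n_total - n_test

batch_size = 32
epochs = 100

# One weight update per batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, steps_per_epoch, total_updates)  # 8000 250 25000
```

Under that assumption, each epoch consists of 250 batches, for 25000 weight updates over the whole run.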

### Making Predictions

**Model New Data and Make a Prediction**

In [17]:
print(ann.predict(sc.transform([[1, 0, 0, 600, 1, 40, 3, 60000, 2, 1, 1, 50000]])) > 0.5)

[[False]]


Explanation:
1. ann.predict(...) - This part of the code uses the trained model (ann) to make predictions on the transformed input example. The predict method takes the transformed input example as its argument and returns the predicted output.
2. The code transforms the input example using the scaler, passes the transformed example to the trained model for prediction, and prints whether the prediction is greater than 0.5 or not.

Is the result **False**?
Yes, the result is False.
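The 12-value row passed to sc.transform above has to follow the column order produced by preprocessing: three one-hot Geography columns first, then the remaining features. The sketch below builds such a row from named fields. `build_row`, `COUNTRIES`, and `GENDERS` are hypothetical helpers, and the feature names are assumed from the commonly distributed layout of Churn_Modelling.csv, since the notebook never prints them. The category order and the Female/Male codes are read off the printed X arrays.

```python
# One-hot order seen in the printed X (France -> 1 0 0, Germany -> 0 1 0, Spain -> 0 0 1).
COUNTRIES = ["France", "Germany", "Spain"]
# Label encoding seen in the printed X (Female -> 0, Male -> 1).
GENDERS = {"Female": 0, "Male": 1}

def build_row(customer):
    """Return the 12 model inputs in the assumed post-encoding column order."""
    one_hot = [1 if customer["Geography"] == c else 0 for c in COUNTRIES]
    return one_hot + [
        customer["CreditScore"],
        GENDERS[customer["Gender"]],
        customer["Age"],
        customer["Tenure"],
        customer["Balance"],
        customer["NumOfProducts"],
        customer["HasCrCard"],
        customer["IsActiveMember"],
        customer["EstimatedSalary"],
    ]

# Field names are assumptions; the values reproduce the row used in the cell above.
customer = {
    "Geography": "France", "CreditScore": 600, "Gender": "Male", "Age": 40,
    "Tenure": 3, "Balance": 60000, "NumOfProducts": 2, "HasCrCard": 1,
    "IsActiveMember": 1, "EstimatedSalary": 50000,
}
row = build_row(customer)
print(row)
```

The resulting list can be wrapped in another list and passed to `sc.transform` exactly as in the cell above.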

**Predicting on the Test Data**

In [18]:
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]


Explanation:
1. print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1)) - This line reshapes the predicted labels (y_pred) and the actual labels (y_test) into column vectors of shape (n, 1) and joins them along axis 1 with np.concatenate, so each printed row shows a prediction next to its ground-truth label. The boolean predictions appear as 0 and 1 because concatenating them with the integer labels produces an integer array.
2. This code can help us evaluate the performance of the model by comparing the predicted labels with the ground truth labels on the test data.
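The same reshape-and-concatenate pattern can be tried on a tiny example. The arrays below are hypothetical stand-ins for the model output and the test labels.

```python
import numpy as np

# Hypothetical predict() output of shape (3, 1) and matching integer labels.
probs = np.array([[0.1], [0.7], [0.4]])
y_pred_toy = probs > 0.5            # boolean array of shape (3, 1)
y_test_toy = np.array([0, 1, 1])    # 1-D integer labels

# Reshape both to (n, 1) columns and join them side by side along axis 1.
side_by_side = np.concatenate(
    (y_pred_toy.reshape(len(y_pred_toy), 1), y_test_toy.reshape(len(y_test_toy), 1)),
    1,
)
print(side_by_side)
```

Each row reads as (prediction, actual); the last row, `[0 1]`, is a positive case that the toy predictions miss.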

**Check Accuracy and Confusion Matrix**

In [19]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

[[1521   74]
 [ 197  208]]


0.8645

Explanation:
1. cm = confusion_matrix(y_test, y_pred) - This line computes the confusion matrix by comparing the actual labels (y_test) with the predicted labels (y_pred). The confusion matrix is a table that summarizes the performance of a classification model by counting the number of true positives, true negatives, false positives, and false negatives. The resulting confusion matrix is stored in the variable cm.
2. accuracy_score(y_test, y_pred) - This line computes the accuracy score by comparing the actual labels (y_test) with the predicted labels (y_pred). The accuracy score is the proportion of correctly classified examples out of the total number of examples. Because this call is the last expression in the cell, the notebook displays its value (0.8645) without an explicit print().
3. The confusion matrix provides a detailed breakdown of the model's predictions, while the accuracy score gives a single metric indicating the overall accuracy of the model.
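The reported accuracy can be recomputed directly from the printed matrix. The sketch below uses scikit-learn's convention that rows are actual classes and columns are predicted classes. Precision and recall for class 1 are added as an illustration of what the breakdown offers beyond accuracy; they are not computed in the notebook itself.

```python
import numpy as np

# Confusion matrix printed above: rows = actual class, columns = predicted class.
cm = np.array([[1521, 74],
               [197, 208]])

# For a 2x2 matrix, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # share of predicted 1s that are truly 1
recall = tp / (tp + fn)           # share of actual 1s that the model finds

print(accuracy)  # 0.8645
print(round(precision, 3), round(recall, 3))
```

Although accuracy is 0.8645, recall for class 1 is only about 0.51 (208 of 405), so roughly half of the positive cases are missed.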