The code below predicts customer churn with a binary-classification Artificial Neural Network (ANN) built with TensorFlow's Keras API. It is organized as follows:

**1. Importing Libraries:**

* numpy as np: Provides numerical operations.
* pandas as pd: Used for data manipulation in DataFrames.
* tensorflow as tf: Builds and trains the ANN model through its Keras API (`tf.keras`).
* scikit-learn (sklearn) utilities:
  * LabelEncoder: Encodes a binary categorical column (gender) into integer labels (0/1).
  * ColumnTransformer: Applies a transformer to selected columns (here, the remaining columns are passed through unchanged).
  * OneHotEncoder: Encodes a multi-category column (country) into one-hot vectors.
  * StandardScaler: Standardizes features by removing the mean and scaling to unit variance.
  * train_test_split: Splits data into training and testing sets.
  * confusion_matrix: Computes the confusion matrix for evaluating model performance.
  * accuracy_score: Calculates the accuracy of the model's predictions.

**2. Data Loading and Preprocessing:**

* dataset = pd.read_csv('Churn_Modelling.csv'): Reads the CSV data into a Pandas DataFrame.
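* Separates the features (X: the 10 columns from CreditScore to EstimatedSalary) from the target variable (y: the Exited column).
* LabelEncoder encodes the gender column (X[:, 2]) as 0/1.
* ColumnTransformer with OneHotEncoder encodes the country column (X[:, 1]) into one-hot vectors, which are placed at the front of X.
* StandardScaler standardizes all features (X). It is fitted on the full dataset before the train-test split, so test-set statistics leak into the scaling; fitting it on the training set only is the safer practice.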
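In the notebook, `sc.fit_transform(X)` runs before `train_test_split`, so the mean and standard deviation it learns include the test rows. A minimal sketch of the leak-free order follows; it uses a small synthetic matrix (an assumption standing in for the encoded churn features) with the same `train_test_split` and `StandardScaler` calls as the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features standing in for the encoded churn matrix
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
y = rng.integers(0, 2, size=1000)

# 1) Split first, so the test rows never influence the scaler's statistics
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2) Fit the scaler on the training rows only...
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)

# 3) ...and reuse the training mean/std to transform the test rows
X_test_scaled = sc.transform(X_test)

print(X_train_scaled.mean(axis=0).round(6))  # ~0 by construction
print(X_test_scaled.mean(axis=0).round(3))   # close to 0, but not exactly
```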

**3. Train-Test Split:**

* train_test_split splits the data into training (80%) and testing (20%) sets, with random_state=0 for reproducibility.

**4. Building the ANN Model:**

* ann = tf.keras.models.Sequential(): Creates a sequential ANN model.
* Two hidden layers with 6 neurons each and ReLU activation are added; the input size (12 features) is inferred from the data.
* An output layer with 1 neuron and sigmoid activation is added; it outputs the churn probability for binary classification (churn or no churn).
* The model is compiled using the Adam optimizer, binary cross-entropy loss function, and accuracy metric.
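The number of trainable parameters in this architecture can be worked out by hand. The sketch assumes 12 input features, the width of X_train after one-hot encoding (its shape is (8000, 12)); each Dense layer holds n_in × n_out weights plus n_out biases.

```python
# Layer widths from the notebook: 12 input features -> 6 -> 6 -> 1
layer_sizes = [12, 6, 6, 1]

def dense_params(n_in: int, n_out: int) -> int:
    """A Dense layer has one weight per input/output pair plus one bias per output unit."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [78, 42, 7] 127
```

The total of 127 should match the "Total params" line that `ann.summary()` prints once the model has been built.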

**5. Model Training:**

* ann.fit(X_train, y_train, batch_size=32, epochs=50): Trains the model on the training data for 50 epochs with a batch size of 32.
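With 8,000 training rows and batch_size=32, each epoch performs 250 weight updates, which is the 250/250 counter in the training log. A quick check in plain Python:

```python
import math

n_train = 8000     # X_train.shape[0] in the notebook
batch_size = 32
epochs = 50

# The last batch may be smaller, hence the ceiling
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 250 12500
```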

**6. Prediction and Evaluation:**

* y_pred = ann.predict(X_test): Generates churn probabilities (sigmoid outputs between 0 and 1) for the testing data.
* Applies a threshold of 0.5 to convert the probabilities into binary class labels (0 or 1).
* Prints the predicted vs. actual labels for the test data.
* cm = confusion_matrix(y_test, y_pred): Calculates the confusion matrix to evaluate model performance on the testing data.
* ac = accuracy_score(y_test, y_pred): Calculates the accuracy score of the model.
* Prints the model summary, showing the layers and number of parameters.
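Accuracy is the share of test customers on the diagonal of the confusion matrix. A small check using the matrix this notebook reports under "Confusion Matrix" (in scikit-learn's layout, rows are actual classes and columns are predicted classes):

```python
import numpy as np

# Confusion matrix reported by the notebook
cm = np.array([[1519, 76],
               [200, 205]])

# Accuracy = correctly classified samples (the diagonal) / all samples
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.862
```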

Overall, the notebook walks through the full workflow for customer churn prediction with a TensorFlow ANN: data preprocessing, train-test split, model building, training, prediction, and evaluation. The trained model reaches an accuracy of 0.862 on the 2,000 test customers.

Here are some additional points to consider:
* We can experiment with different hyperparameters (number of layers, neurons per layer, epochs, batch size) to potentially improve model performance.
* We can explore other evaluation metrics such as precision, recall, and F1-score. Churn is the minority class (405 of the 2,000 test customers), so accuracy alone can look good even when many churners are missed.
* We can use techniques such as grid search or random search to find better hyperparameters for the model.
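Precision, recall, and F1 can be derived from the confusion matrix the notebook already reports, without retraining. A minimal sketch, assuming scikit-learn's binary layout `[[TN, FP], [FN, TP]]`:

```python
import numpy as np

cm = np.array([[1519, 76],
               [200, 205]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of customers predicted to churn, the share who did
recall = tp / (tp + fn)     # of customers who churned, the share the model caught
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

With these counts, precision is about 0.73, recall about 0.51, and F1 about 0.60: the model catches only about half of the churners even though accuracy is 0.862. For comparison, always predicting "no churn" would already score 1595 / 2000 = 0.7975.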

## Import Libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [57]:
from sklearn.preprocessing import LabelEncoder
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

In [2]:
tf.__version__

'2.18.0'

## Import Dataset

In [3]:
dataset = pd.read_csv('Churn_Modelling.csv')

In [4]:
dataset.shape

(10000, 14)

In [6]:
dataset.head().T

| Column | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| RowNumber | 1 | 2 | 3 | 4 | 5 |
| CustomerId | 15634602 | 15647311 | 15619304 | 15701354 | 15737888 |
| Surname | Hargrave | Hill | Onio | Boni | Mitchell |
| CreditScore | 619 | 608 | 502 | 699 | 850 |
| Geography | France | Spain | France | France | Spain |
| Gender | Female | Female | Female | Female | Female |
| Age | 42 | 41 | 42 | 39 | 43 |
| Tenure | 2 | 1 | 8 | 1 | 2 |
| Balance | 0.0 | 83807.86 | 159660.8 | 0.0 | 125510.82 |
| NumOfProducts | 1 | 1 | 3 | 2 | 1 |


## Creating the independent and dependent variables (X, y)

In [None]:
# Features (X): ['CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'EstimatedSalary']
# Target (y): ['Exited']
# Dropped as identifiers with no predictive value: ['RowNumber', 'CustomerId', 'Surname']
# .iloc[:, 3:-1] -> all rows; columns from index 3 up to (but excluding) the last column
# .iloc[:, -1]   -> all rows; last column only
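The positional slicing can be checked on an empty DataFrame that carries only the column names. The column order is an assumption reconstructed from the comments above and the `dataset.head()` output (14 columns, with Exited last):

```python
import pandas as pd

# Assumed column order of Churn_Modelling.csv
columns = ['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography', 'Gender',
           'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard', 'IsActiveMember',
           'EstimatedSalary', 'Exited']
df = pd.DataFrame(columns=columns)  # empty frame: only the column positions matter here

features = df.iloc[:, 3:-1].columns.tolist()  # positions 3..12 (the stop index -1 is excluded)
target = df.columns[-1]                        # position 13, the last column
print(features)
print(target)
```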

In [7]:
X = dataset.iloc[:, 3:-1].values
y = dataset.iloc[:, -1].values

In [8]:
X.shape

(10000, 10)

In [9]:
y.shape

(10000,)

In [11]:
X[0:5]

array([[619, 'France', 'Female', 42, 2, 0.0, 1, 1, 1, 101348.88],
       [608, 'Spain', 'Female', 41, 1, 83807.86, 1, 0, 1, 112542.58],
       [502, 'France', 'Female', 42, 8, 159660.8, 3, 1, 0, 113931.57],
       [699, 'France', 'Female', 39, 1, 0.0, 2, 0, 0, 93826.63],
       [850, 'Spain', 'Female', 43, 2, 125510.82, 1, 1, 1, 79084.1]],
      dtype=object)

In [15]:
y[0:5]

array([1, 0, 1, 0, 0], dtype=int64)

## Label Encoding the `Gender` column (X)

In [17]:
le = LabelEncoder()
X[:, 2] = le.fit_transform(X[:, 2])

In [19]:
X[0:5]

array([[619, 'France', 0, 42, 2, 0.0, 1, 1, 1, 101348.88],
       [608, 'Spain', 0, 41, 1, 83807.86, 1, 0, 1, 112542.58],
       [502, 'France', 0, 42, 8, 159660.8, 3, 1, 0, 113931.57],
       [699, 'France', 0, 39, 1, 0.0, 2, 0, 0, 93826.63],
       [850, 'Spain', 0, 43, 2, 125510.82, 1, 1, 1, 79084.1]],
      dtype=object)
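LabelEncoder assigns integers in sorted order of the class names, which is why Female becomes 0 (and Male would become 1). A minimal sketch on hypothetical gender values:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

gender = np.array(['Female', 'Male', 'Female', 'Male', 'Male'], dtype=object)

le = LabelEncoder()
codes = le.fit_transform(gender)
print(le.classes_)                  # classes are sorted: 'Female' -> 0, 'Male' -> 1
print(codes)
print(le.inverse_transform(codes))  # maps the integers back to the original strings
```

With only two categories, a single 0/1 column carries the same information as two one-hot columns, so label encoding is a reasonable choice for Gender. Geography has three unordered categories, and one-hot encoding avoids implying an order between them.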

## One-Hot Encoding the `Geography` column (X) with ColumnTransformer

In [20]:
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

In [21]:
X[0:5]

array([[1.0, 0.0, 0.0, 619, 0, 42, 2, 0.0, 1, 1, 1, 101348.88],
       [0.0, 0.0, 1.0, 608, 0, 41, 1, 83807.86, 1, 0, 1, 112542.58],
       [1.0, 0.0, 0.0, 502, 0, 42, 8, 159660.8, 3, 1, 0, 113931.57],
       [1.0, 0.0, 0.0, 699, 0, 39, 1, 0.0, 2, 0, 0, 93826.63],
       [0.0, 0.0, 1.0, 850, 0, 43, 2, 125510.82, 1, 1, 1, 79084.1]],
      dtype=object)
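OneHotEncoder also sorts the categories, so the three new leading columns are France, Germany, and Spain, in that order. This matches the [1.0, 0.0, 0.0] rows for France and the [0.0, 0.0, 1.0] rows for Spain above. A sketch on three hypothetical rows (a Germany row is added so all three dummy columns appear):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows laid out like X at this point: [CreditScore, Geography, Gender]
X_demo = np.array([[619, 'France', 0],
                   [608, 'Spain', 0],
                   [502, 'Germany', 1]], dtype=object)

# Same transformer as the notebook: one-hot column 1, pass the others through
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [1])], remainder='passthrough')
X_encoded = np.array(ct.fit_transform(X_demo))

# Encoded columns come first, followed by the passthrough columns
print(ct.named_transformers_['encoder'].categories_)
print(X_encoded)
```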

## Feature Scaling (X)

In [22]:
sc = StandardScaler()
X = sc.fit_transform(X)

In [25]:
print(X[0:2])

[[ 0.99720391 -0.57873591 -0.57380915 -0.32622142 -1.09598752  0.29351742
  -1.04175968 -1.22584767 -0.91158349  0.64609167  0.97024255  0.02188649]
 [-1.00280393 -0.57873591  1.74273971 -0.44003595 -1.09598752  0.19816383
  -1.38753759  0.11735002 -0.91158349 -1.54776799  0.97024255  0.21653375]]
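StandardScaler applies z = (x - mean) / std to each column, using the population standard deviation (ddof=0). A short check on a hypothetical column built from the first five CreditScore values:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[619.0], [608.0], [502.0], [699.0], [850.0]])

# Manual standardization with NumPy's default population std (ddof=0)
z_manual = (x - x.mean(axis=0)) / x.std(axis=0)
z_sklearn = StandardScaler().fit_transform(x)

print(np.allclose(z_manual, z_sklearn))  # True
```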


## Splitting `X, y` into the Training set and Test set

In [26]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [28]:
print( X_train.shape )
print( X_test.shape )
print( y_train.shape )
print( y_test.shape )

(8000, 12)
(2000, 12)
(8000,)
(2000,)
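Churn is the minority class (405 of the 2,000 test labels are 1), so a stratified split is a common alternative to the plain random split used here: `stratify=y` keeps the class ratio the same in both sets. A sketch with synthetic data (the 20% positive rate is an assumption chosen to resemble this dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
y = (rng.random(10000) < 0.2).astype(int)  # hypothetical labels, about 20% positives
X = rng.normal(size=(10000, 12))           # hypothetical 12-feature matrix

# stratify=y keeps the positive-class share (almost) identical in both splits
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
print(X_tr.shape, X_te.shape)
print(y.mean(), y_tr.mean(), y_te.mean())
```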


In [30]:
print( X_train[0:2] )

[[-1.00280393 -0.57873591  1.74273971  0.17042381 -1.09598752 -0.4693113
  -0.00442596 -1.22584767  0.80773656  0.64609167 -1.03067011  1.10838187]
 [-1.00280393  1.72790383 -0.57380915 -2.31280236  0.91241915  0.29351742
  -1.38753759 -0.01289171 -0.91158349  0.64609167  0.97024255 -0.74759209]]


In [31]:
print( X_test[0:2] )

[[-1.00280393  1.72790383 -0.57380915 -0.55385049 -1.09598752 -0.37395771
   1.03290776  0.87532296 -0.91158349  0.64609167  0.97024255  1.61304597]
 [ 0.99720391 -0.57873591 -0.57380915 -1.31951189 -1.09598752  0.10281024
  -1.04175968  0.42442221 -0.91158349  0.64609167 -1.03067011  0.49753166]]


In [32]:
print( y_train[0:2] )

[0 0]


In [33]:
print( y_test[0:2] )

[0 1]


## Building the ANN Model

**_Initializing the ANN_**

In [39]:
ann = tf.keras.models.Sequential()
ann

<Sequential name=sequential_1, built=False>

**_Adding the input layer and the first hidden layer_**

In [40]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

**_Adding the second hidden layer_**

In [41]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

**_Adding the output layer_**

In [42]:
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
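Each Dense layer computes activation(inputs @ kernel + bias). A NumPy sketch of the forward pass for this 12 → 6 → 6 → 1 network follows. The weights are hypothetical random values, not the trained ones (Keras initializes Dense kernels with Glorot uniform by default and biases with zeros).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical random weights with the notebook's layer sizes
W1, b1 = rng.normal(size=(12, 6)), np.zeros(6)
W2, b2 = rng.normal(size=(6, 6)), np.zeros(6)
W3, b3 = rng.normal(size=(6, 1)), np.zeros(1)

def forward(X):
    h1 = relu(X @ W1 + b1)        # first hidden layer: Dense(6, relu)
    h2 = relu(h1 @ W2 + b2)       # second hidden layer: Dense(6, relu)
    return sigmoid(h2 @ W3 + b3)  # output layer: Dense(1, sigmoid), a churn probability

X_batch = rng.normal(size=(5, 12))  # five hypothetical standardized customers
probs = forward(X_batch)
print(probs.shape)  # (5, 1)
```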

**_Summary of ANN_**

In [44]:
ann.summary()


In [45]:
print(len(ann.layers))

3


## Training the ANN Model

**_Compiling the ANN_**

In [46]:
ann.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
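Binary cross-entropy averages -[y·log(p) + (1 - y)·log(1 - p)] over the samples, so confident wrong predictions are penalized heavily. A NumPy sketch; the clipping constant eps=1e-7 is an assumption added to avoid log(0):

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; p is clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y_true = np.array([0, 1, 0])
loss_good = binary_crossentropy(y_true, np.array([0.1, 0.9, 0.2]))  # confident and right
loss_bad = binary_crossentropy(y_true, np.array([0.9, 0.1, 0.8]))   # confident and wrong
print(loss_good, loss_bad)
```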

**_Training the ANN_**

In [48]:
ann.fit(X_train, y_train, batch_size=32, epochs=50)

Epoch 1/50
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.5466 - loss: 0.6770
Epoch 2/50
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8009 - loss: 0.4580
Epoch 3/50
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8003 - loss: 0.4391
Epoch 4/50
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8106 - loss: 0.4236
Epoch 5/50
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8210 - loss: 0.4119
Epoch 6/50
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8290 - loss: 0.4022
Epoch 7/50
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8311 - loss: 0.3861
Epoch 8/50
250/250 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8311 - loss: 0.3946
Epoch 9/50
250/250 ━━━━━━━━

<keras.src.callbacks.history.History at 0x26a8e8d2120>

## Predicting with the ANN Model

In [49]:
y_pred = ann.predict(X_test)
y_pred.shape

63/63 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step


(2000, 1)

In [51]:
y_pred[0:5]

array([[0.38986126],
       [0.3220135 ],
       [0.10545631],
       [0.05123943],
       [0.13622636]], dtype=float32)

In [52]:
y_pred = (y_pred > 0.5)
y_pred.shape

(2000, 1)
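The comparison `y_pred > 0.5` produces a boolean array, and casting it to int gives 0/1 labels. The sketch reuses the five probabilities printed above and also shows a lower, hypothetical threshold of 0.3:

```python
import numpy as np

# First five probabilities printed by ann.predict(X_test) above
probs = np.array([[0.38986126], [0.3220135], [0.10545631], [0.05123943], [0.13622636]],
                 dtype=np.float32)

labels_05 = (probs > 0.5).astype(int).ravel()  # notebook threshold, flattened like y_test
labels_03 = (probs > 0.3).astype(int).ravel()  # hypothetical lower threshold
print(labels_05)  # [0 0 0 0 0]
print(labels_03)  # [1 1 0 0 0]
```

At 0.3, the second test customer (probability 0.322, actual label 1 per `y_test[0:2]`) would be flagged as a churner, but so would the first (0.390, actual label 0): lowering the threshold trades precision for recall.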

In [53]:
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]


## Confusion Matrix

In [56]:
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[1519   76]
 [ 200  205]]
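In scikit-learn's `confusion_matrix` for 0/1 labels, rows are actual classes and columns are predicted classes, so `ravel()` returns TN, FP, FN, TP in that order. A tiny hypothetical example:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = stayed, 1 = churned
y_true = [0, 0, 1, 1, 1]
y_hat = [0, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()  # rows = actual class, columns = predicted class
print(cm)
print(tn, fp, fn, tp)  # 1 1 1 2
```

Read the same way, the notebook's matrix has 1519 true negatives, 76 false positives, 200 false negatives, and 205 true positives.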


## Accuracy Score

In [58]:
ac = accuracy_score(y_test, y_pred)
print(ac)

0.862


## ANN Summary

In [59]:
ann.summary()
