### Common Activation Functions:
1. **ReLU (Rectified Linear Unit)**
   - Output: `f(x) = max(0, x)`
   - Commonly used in hidden layers.

2. **Leaky ReLU**
   - Output: `f(x) = x` if `x > 0`, otherwise `f(x) = 0.01x`

3. **Parametric ReLU (PReLU)**
   - A variant of Leaky ReLU where the slope is learned during training.

4. **Sigmoid**
   - Output: `f(x) = 1 / (1 + e^(-x))`
   - Maps input to a range between 0 and 1.

5. **Tanh (Hyperbolic Tangent)**
   - Output: `f(x) = tanh(x)`
   - Maps input to a range between -1 and 1.

6. **Softmax**
   - Used for multi-class classification.
   - Outputs a probability distribution across classes.

7. **Swish**
   - Output: `f(x) = x * sigmoid(x)`
   - A smooth, non-monotonic function.

8. **ELU (Exponential Linear Unit)**
   - Output: `f(x) = x` if `x > 0`, otherwise `f(x) = α(e^x - 1)`

9. **SELU (Scaled Exponential Linear Unit)**
   - A self-normalizing activation function.

10. **Hard Sigmoid**
    - A computationally efficient approximation of Sigmoid.
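
The formulas above can be written out directly in NumPy. This is a minimal sketch for illustration, not the implementation Keras uses internally; the function names, the default `slope=0.01` and `alpha=1.0`, and the sample input are assumptions chosen for the example.

```python
import numpy as np

def relu(x):
    # max(0, x), applied element-wise
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    # x for positive inputs, a small slope times x otherwise
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # x * sigmoid(x)
    return x * sigmoid(x)

def elu(x, alpha=1.0):
    # np.minimum keeps exp() from overflowing on large positive inputs,
    # whose branch is discarded by np.where anyway
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def softmax(z):
    # subtracting the max does not change the result but avoids overflow
    shifted = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

x = np.array([-2.0, 0.0, 3.0])
print("relu      :", relu(x))
print("leaky_relu:", leaky_relu(x))
print("tanh      :", np.tanh(x))
print("softmax   :", softmax(x), "sum =", softmax(x).sum())
```

Tanh is already available as `np.tanh`, so it needs no helper of its own.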

### Common Optimizers:
1. **SGD (Stochastic Gradient Descent)**
   - Updates weights by stepping against the gradient of the loss, estimated on a single sample or a mini-batch.

2. **Momentum**
   - Adds a fraction of the previous update to the current weight update.

3. **Nesterov Accelerated Gradient (NAG)**
   - A variant of Momentum with an additional lookahead step.

4. **Adam (Adaptive Moment Estimation)**
   - Combines the advantages of Momentum and RMSprop.

5. **RMSprop (Root Mean Square Propagation)**
   - Scales each parameter's learning rate by a moving average of recent squared gradients.

6. **Adagrad (Adaptive Gradient Algorithm)**
   - Adapts the learning rate for each parameter individually.

7. **Adadelta**
   - A refinement of Adagrad that reduces aggressive decay in learning rates.

8. **Adamax**
   - A variant of Adam using the infinity norm.

9. **Nadam (Nesterov-accelerated Adam)**
   - Combines Adam with Nesterov momentum.

10. **FTRL (Follow The Regularized Leader)**
    - Used mainly for large-scale linear models.

11. **L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno)**
    - A quasi-Newton optimizer that approximates second-order (curvature) information; commonly used on smaller datasets.
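
To make the differences between a few of these update rules concrete, here is a rough NumPy sketch of single steps for SGD, Momentum, and Adam, run on a toy quadratic loss. The loss, the starting point, the step count, and the hyperparameter values are all assumptions picked for the illustration; library implementations differ in details.

```python
import numpy as np

def grad(w):
    # gradient of the toy loss f(w) = 0.5 * ||w||^2, whose minimum is at w = 0
    return w

def sgd_step(w, g, lr=0.1):
    # plain gradient step
    return w - lr * g

def momentum_step(w, g, v, lr=0.1, beta=0.9):
    # keep a fraction beta of the previous update and add the new gradient step
    v = beta * v - lr * g
    return w + v, v

def adam_step(w, g, m, s, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # moving averages of the gradient (m) and the squared gradient (s)
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * g ** 2
    # bias correction for the zero-initialised averages (t starts at 1)
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

w_sgd = np.array([1.0, -2.0])
w_mom, v = w_sgd.copy(), np.zeros(2)
w_adam, m, s = w_sgd.copy(), np.zeros(2), np.zeros(2)

for t in range(1, 201):
    w_sgd = sgd_step(w_sgd, grad(w_sgd))
    w_mom, v = momentum_step(w_mom, grad(w_mom), v)
    w_adam, m, s = adam_step(w_adam, grad(w_adam), m, s, t)

print("SGD     :", w_sgd)
print("Momentum:", w_mom)
print("Adam    :", w_adam)
```

All three end up close to the minimum at zero on this easy problem; the point of the sketch is the shape of each update, not a speed comparison.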

In [1]:
import pandas as pd
import os

path = r'D:\Projects\Software_Engineering\Artificial_Intelligence\Huawei_Internship_16_Sep/Datasets'

In [2]:
df = pd.read_csv(os.path.join(path, 'Churn_Modelling.csv'))
df

Unnamed: 0,RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1
1,2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0
2,3,15619304,Onio,502,France,Female,42,8,159660.80,3,1,0,113931.57,1
3,4,15701354,Boni,699,France,Female,39,1,0.00,2,0,0,93826.63,0
4,5,15737888,Mitchell,850,Spain,Female,43,2,125510.82,1,1,1,79084.10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,15606229,Obijiaku,771,France,Male,39,5,0.00,2,1,0,96270.64,0
9996,9997,15569892,Johnstone,516,France,Male,35,10,57369.61,1,1,1,101699.77,0
9997,9998,15584532,Liu,709,France,Female,36,7,0.00,1,0,1,42085.58,1
9998,9999,15682355,Sabbatini,772,Germany,Male,42,3,75075.31,2,1,0,92888.52,1


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB


In [4]:
from sklearn.preprocessing import LabelEncoder

for col in df.select_dtypes(exclude=['number']):
    df[col] = LabelEncoder().fit_transform(df[col])

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  int32  
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  int32  
 5   Gender           10000 non-null  int32  
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int32(3), int64(9)
memory usage: 976.7 KB
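
`LabelEncoder` maps each category to an integer in sorted order, which gives a nominal column such as `Geography` an implied ordering. One common alternative is one-hot encoding. The sketch below uses a small made-up frame (not the churn data) and `pd.get_dummies`; whether it helps on this dataset is not something this notebook tests.

```python
import pandas as pd

# toy data invented for the example; only the column names mirror the dataset
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Female", "Male"],
})

# one 0/1 indicator column per country instead of a single integer code
encoded = pd.get_dummies(toy, columns=["Geography"], dtype=int)
print(encoded)
```

A two-valued column such as `Gender` carries no ordering problem when label-encoded, so it is left untouched here.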


In [5]:
# collect columns that hold a single constant value (they carry no information)
non_feat_col = []

for col in df:
    if df[col].nunique() == 1:
        non_feat_col.append(col)

print(len(non_feat_col))

0


In [6]:
len(df['Exited'].unique())

2

In [7]:
removed_col = ['RowNumber', 'CustomerId', 'Surname']

df = df.drop(labels=removed_col, axis=1, inplace=False)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   CreditScore      10000 non-null  int64  
 1   Geography        10000 non-null  int32  
 2   Gender           10000 non-null  int32  
 3   Age              10000 non-null  int64  
 4   Tenure           10000 non-null  int64  
 5   Balance          10000 non-null  float64
 6   NumOfProducts    10000 non-null  int64  
 7   HasCrCard        10000 non-null  int64  
 8   IsActiveMember   10000 non-null  int64  
 9   EstimatedSalary  10000 non-null  float64
 10  Exited           10000 non-null  int64  
dtypes: float64(2), int32(2), int64(7)
memory usage: 781.4 KB


In [8]:
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

print(f'X\n{X}')
print(f'y: {y}')

X
[[6.1900000e+02 0.0000000e+00 0.0000000e+00 ... 1.0000000e+00
  1.0000000e+00 1.0134888e+05]
 [6.0800000e+02 2.0000000e+00 0.0000000e+00 ... 0.0000000e+00
  1.0000000e+00 1.1254258e+05]
 [5.0200000e+02 0.0000000e+00 0.0000000e+00 ... 1.0000000e+00
  0.0000000e+00 1.1393157e+05]
 ...
 [7.0900000e+02 0.0000000e+00 0.0000000e+00 ... 0.0000000e+00
  1.0000000e+00 4.2085580e+04]
 [7.7200000e+02 1.0000000e+00 1.0000000e+00 ... 1.0000000e+00
  0.0000000e+00 9.2888520e+04]
 [7.9200000e+02 0.0000000e+00 0.0000000e+00 ... 1.0000000e+00
  0.0000000e+00 3.8190780e+04]]
y: [1 0 1 ... 1 1 0]


In [9]:
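from sklearn.preprocessing import MinMaxScaler
# Rescale every feature to [0, 1]. Note: fitting on the full dataset before the
# train/test split lets test-set minima and maxima influence the scaling.
X = MinMaxScaler().fit_transform(X)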

In [10]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
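
A variant worth considering is to split first and fit the scaler on the training rows only, so that no test-set statistics reach the preprocessing. The sketch below shows the pattern on random stand-in data; the `_demo` arrays and variable names are assumptions for the example and are not part of this notebook's pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# random stand-in data: 100 rows, 3 features, binary target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.integers(0, 2, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# learn min/max from the training rows only, then reuse them for the test rows
scaler = MinMaxScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.min(), X_tr_scaled.max())
print(X_te_scaled.min(), X_te_scaled.max())
```

With this ordering the scaled test values can fall slightly outside [0, 1], which is expected: the test set is treated as unseen data.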

# ANN: Artificial neural network

In [11]:
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam

model = Sequential()
call_back = EarlyStopping(monitor='val_loss', patience=3)

# first hidden layer (Keras infers the input shape from the data at fit time)
model.add(Dense(128, activation='tanh'))

# hidden layers
model.add(Dense(64, activation='tanh'))
model.add(Dropout(0.25))
model.add(Dense(32, activation='tanh'))
model.add(Dropout(0.25))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(8, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(4, activation='tanh'))

# output layer
model.add(Dense(1, activation='sigmoid'))

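# Adam with a custom learning rate. Note: `opt` only takes effect if it is
# passed as model.compile(optimizer=opt); the compile call below uses the
# string 'adam', which means Keras's default learning rate (0.001).
opt = Adam(learning_rate=0.1)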

In [12]:
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy'] 
)

model.summary()
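
The `binary_crossentropy` loss and `accuracy` metric named in the compile call can be reproduced by hand on a few values. This is a NumPy sketch with made-up labels and probabilities; the clipping constant `eps` and the 0.5 threshold are assumptions mirroring common defaults, not values read from this notebook.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # clip so that log(0) can never occur
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

def accuracy(y_true, y_prob, threshold=0.5):
    # a probability above the threshold counts as class 1
    y_pred = (y_prob > threshold).astype(int)
    return float(np.mean(y_pred == y_true))

# invented example values
y_true = np.array([1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.6])

print("loss    :", binary_crossentropy(y_true, y_prob))
print("accuracy:", accuracy(y_true, y_prob))
```

The two confident, correct predictions contribute little to the loss, while the two wrong ones dominate it, even though accuracy only records that half the predictions were right.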

In [13]:
model.fit(
    X_train,
    y_train,
    epochs=500,
    validation_data=(X_test, y_test),
    batch_size=100,
    callbacks=[call_back]
)

Epoch 1/500
80/80 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6041 - loss: 0.6835 - val_accuracy: 0.7975 - val_loss: 0.5341
Epoch 2/500
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7889 - loss: 0.5486 - val_accuracy: 0.7975 - val_loss: 0.4996
Epoch 3/500
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7999 - loss: 0.5117 - val_accuracy: 0.8070 - val_loss: 0.4603
Epoch 4/500
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8150 - loss: 0.4727 - val_accuracy: 0.8220 - val_loss: 0.4446
Epoch 5/500
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8139 - loss: 0.4598 - val_accuracy: 0.8190 - val_loss: 0.4342
Epoch 6/500
80/80 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.8169 - loss: 0.4546 - val_accuracy: 0.8325 - val_loss: 0.4138
Epoch 7/500
80/80 ━━━

<keras.src.callbacks.history.History at 0x1336c85e180>
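
The stopping rule behind `EarlyStopping(monitor='val_loss', patience=3)` can be sketched in plain Python: training halts once the monitored value has failed to improve for `patience` consecutive epochs. The helper name and the loss values below are made up for the illustration (they are not taken from the log above), and the sketch ignores Keras options such as `min_delta` and `restore_best_weights`.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# hypothetical validation losses: best value at epoch 4, no improvement after
losses = [0.53, 0.50, 0.46, 0.44, 0.45, 0.44, 0.46]
print(early_stop_epoch(losses))
```

Note that matching the previous best (0.44 at epoch 6) does not count as an improvement in this sketch, so the wait counter keeps growing.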

In [14]:
X_train.shape

(8000, 10)

In [15]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv1D, MaxPool1D, Flatten
from keras.callbacks import EarlyStopping
from keras.optimizers import Adam

model = Sequential()
call_back = EarlyStopping(monitor='val_loss', patience=3)

# first convolutional layer
model.add(Conv1D(
    filters=64, # number of filters (feature detectors) learned by this layer
    kernel_size=3, # each filter spans 3 consecutive features
    strides=1, # the filter moves one position at a time
    padding='valid', # 'valid' -> no padding, output gets shorter; 'same' -> zero-pad so output length matches input (for stride 1)
    activation='relu', # zeroes out negative filter responses
    input_shape=(10, 1) # 10 features treated as a length-10 sequence with 1 channel
    )
)

# second convolutional layer
model.add(Conv1D(
    filters=64, # number of filters (feature detectors) learned by this layer
    kernel_size=3, # each filter spans 3 consecutive positions
    strides=1, # the filter moves one position at a time
    padding='valid', # 'valid' -> no padding, output gets shorter
    activation='relu' # zeroes out negative filter responses
    )
)

model.add(MaxPool1D( # keeps the maximum value in each pooling window
    pool_size=2,
    strides=1, # the pooling window moves one position at a time
    padding='valid' # 'valid' -> no padding
    )
)

model.add(Flatten()) # flatten the feature maps into a vector before the Dense layers

# fully connected (Dense) layers
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(4, activation='relu'))

# output layer
model.add(Dense(1, activation='sigmoid'))

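# Adam with a custom learning rate. Note: `opt` only takes effect if it is
# passed as model.compile(optimizer=opt); the compile call below uses the
# string 'adam', which means Keras's default learning rate (0.001).
opt = Adam(learning_rate=0.1)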



In [16]:
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy'] 
)

model.summary()
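
The layer output sizes that `model.summary()` reports for this network can be worked out by hand. With `padding='valid'`, a window of size `k` moved with stride `s` over a sequence of length `n` produces `(n - k) // s + 1` positions. The short sketch below applies that formula to the layer settings from the cell above; the helper name is an assumption for the example.

```python
def valid_output_length(n, window, stride=1):
    # number of window positions that fit without padding
    return (n - window) // stride + 1

filters = 64

length = 10                                            # input: 10 features, 1 channel
length = valid_output_length(length, window=3)         # first Conv1D
length = valid_output_length(length, window=3)         # second Conv1D
length = valid_output_length(length, window=2, stride=1)  # MaxPool1D
flattened = length * filters                           # Flatten

# trainable parameters: (kernel_size * input_channels + 1 bias) per filter
conv1_params = (3 * 1 + 1) * filters
conv2_params = (3 * filters + 1) * filters
# first Dense layer: one weight per flattened input plus one bias, per unit
dense_params = (flattened + 1) * 16

print("length after pooling:", length)
print("flattened size      :", flattened)
print("params:", conv1_params, conv2_params, dense_params)
```

The sequence shrinks from 10 to 8 to 6 through the two convolutions and to 5 after pooling, so the first Dense layer sees 5 * 64 = 320 inputs.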

In [17]:
model.fit(
    X_train,
    y_train,
    epochs=10,
    validation_data=(X_test, y_test),
    batch_size=80,
    callbacks=[call_back]
)

Epoch 1/10
100/100 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - accuracy: 0.6837 - loss: 0.6872 - val_accuracy: 0.7975 - val_loss: 0.5429
Epoch 2/10
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8073 - loss: 0.5041 - val_accuracy: 0.7975 - val_loss: 0.5063
Epoch 3/10
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7941 - loss: 0.5058 - val_accuracy: 0.7975 - val_loss: 0.4950
Epoch 4/10
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7978 - loss: 0.4863 - val_accuracy: 0.7975 - val_loss: 0.4694
Epoch 5/10
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8009 - loss: 0.4586 - val_accuracy: 0.8010 - val_loss: 0.4490
Epoch 6/10
100/100 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8043 - loss: 0.4470 - val_accuracy: 0.8295 - val_loss: 0.4115
Epoch 7/10
100/100

<keras.src.callbacks.history.History at 0x1336f5899a0>