# Neural Network

This notebook trains a neural network to predict whether a homeowner books an energy consultation (`booked_energy_consultation`). Neural networks can capture complex, non-linear relationships in the data. The model has two hidden layers with ReLU activation, one dropout layer for regularization, and a sigmoid output layer for binary classification.

In [1]:
import pandas as pd
import numpy as np
import random as python_random
import tensorflow as tf
from tensorflow.keras import layers, callbacks
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler, OneHotEncoder
from sklearn.metrics import classification_report

In [2]:
df = pd.read_excel('combined_data_binary.xlsx', index_col=0)
df.head()

| Unnamed: 0 | age | gender | household_size | occupation_status | income | house_type | house_age | house_size | location | energy_bill | ... | knowledge_energy | energy_awareness | attitude_energy_reduction | investment_willingness | belief_climate_change | financial_awareness | perceived_efficiency | environment_concern | previous_renovations | booked_energy_consultation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26 | Male | 1 | Unemployed | 20108 | Multi-family House | 2020 | 120 | Rural | 103 | ... | 4 | 4 | 2 | 1 | Yes | No | 1 | 2 | 10 | False |
| 1 | 28 | Female | 3 | Employed | 53000 | Detached | 2020 | 400 | Urban | 170 | ... | 2 | 4 | 5 | 3 | Yes | No | 2 | 5 | 1 | False |
| 2 | 52 | Male | 2 | Employed | 86352 | Detached | 1953 | 253 | Urban | 165 | ... | 2 | 2 | 3 | 4 | Yes | No | 2 | 1 | 7 | True |
| 3 | 17 | Other | 1 | Employed | 27633 | Detached | 2018 | 108 | Urban | 102 | ... | 4 | 4 | 2 | 1 | Yes | No | 4 | 2 | 7 | False |
| 4 | 20 | Male | 1 | Employed | 25011 | Detached | 2020 | 110 | Urban | 106 | ... | 4 | 4 | 2 | 1 | Yes | No | 1 | 2 | 9 | False |


In [3]:
def reset_random_seeds():
    tf.random.set_seed(42)
    np.random.seed(42)
    python_random.seed(42)
reset_random_seeds()

In [3]:
# Separate features and target variable
X = df.drop('booked_energy_consultation', axis=1)
y = df['booked_energy_consultation']

# Identify numerical and categorical columns
numerical_cols = X.select_dtypes(include=['int64', 'float64']).columns
categorical_cols = X.select_dtypes(include=['object', 'category']).columns

# One-hot encode the categorical variables
encoder = OneHotEncoder(sparse_output=False)
categorical_encoded = encoder.fit_transform(X[categorical_cols])
categorical_encoded_df = pd.DataFrame(categorical_encoded, columns=encoder.get_feature_names_out(categorical_cols))

X = pd.concat([X[numerical_cols].reset_index(drop=True), categorical_encoded_df.reset_index(drop=True)], axis=1)

### Splitting the data into training (70%) and test (30%) sets

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

### Scaling the features

In [5]:
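# Scale the features to the [0, 1] range.
# Note: the scaler is fitted on the full X, and X_scaled is never used below;
# the model is trained and evaluated on the unscaled X_train / X_test.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)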
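
The scaling cell above fits on all rows of `X`. A common alternative is to fit the scaler on the training split only and reuse its minimum and maximum for the test split, so that no test-set statistics reach training. The sketch below shows this on a small made-up array instead of the notebook's data; the array values and the `_demo` names are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy feature matrix (10 rows, 2 columns); not the notebook's data.
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.array([0, 1] * 5)

X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Fit on the training split only, then apply the same transform to both splits.
scaler_demo = MinMaxScaler()
X_train_scaled = scaler_demo.fit_transform(X_train_demo)
X_test_scaled = scaler_demo.transform(X_test_demo)

# Training columns span exactly [0, 1]; test values may fall slightly outside.
print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```

With this pattern the scaled arrays, not the raw splits, would be passed to `model.fit` and `model.evaluate`.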

## Building the Neural Network

In [7]:
model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train.shape[1],)), 
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
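
To get a feel for the size of this architecture, the number of trainable parameters can be worked out by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and dropout adds none. The sketch below is plain Python; the helpers `dense_params` and `count_params` and the input width of 40 are illustrative assumptions, since the real width is `X_train.shape[1]` after one-hot encoding.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def count_params(n_features: int, hidden=(128, 64), n_outputs: int = 1) -> int:
    """Total trainable parameters of the Dense stack; Dropout has none."""
    total = 0
    n_in = n_features
    for units in (*hidden, n_outputs):
        total += dense_params(n_in, units)
        n_in = units
    return total


# Assumed input width of 40 features, for illustration only.
print(count_params(40))  # 40*128+128 + 128*64+64 + 64*1+1 = 13569
```

Calling `model.summary()` on the actual model reports the same kind of breakdown for the true input width.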

### Compiling the model

In [8]:
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',  # Change loss if it's a multi-class classification
    metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()]
)
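
For reference, binary cross-entropy for a label `y` in {0, 1} and a predicted probability `p` is `-(y*log(p) + (1-y)*log(1-p))`, averaged over the samples. The sketch below is a plain-Python illustration on made-up labels and probabilities, not Keras's implementation; the function name `binary_cross_entropy` and the clipping constant `eps` are assumptions.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up example: confident correct predictions give a small loss,
# confident wrong predictions give a large one.
good = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
bad = binary_cross_entropy([1, 0, 1], [0.1, 0.9, 0.2])
print(round(good, 4), round(bad, 4))
```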

### Training the NN

In [9]:
# Train the model
history = model.fit(X_train, y_train, epochs=150, validation_split=0.2, verbose=1)

Epoch 1/150
132/132 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.5281 - loss: 726.2455 - precision: 0.3504 - recall: 0.5151 - val_accuracy: 0.7733 - val_loss: 17.5879 - val_precision: 0.9380 - val_recall: 0.3447
Epoch 2/150
132/132 ━━━━━━━━━━━━━━━━━━━━ 0s 420us/step - accuracy: 0.7431 - loss: 85.0393 - precision: 0.6013 - recall: 0.6922 - val_accuracy: 0.9038 - val_loss: 15.2330 - val_precision: 0.8655 - val_recall: 0.8433
Epoch 3/150
132/132 ━━━━━━━━━━━━━━━━━━━━ 0s 465us/step - accuracy: 0.8323 - loss: 41.2269 - precision: 0.7410 - recall: 0.7710 - val_accuracy: 0.9010 - val_loss: 16.6510 - val_precision: 0.8365 - val_recall: 0.8746
Epoch 4/150
132/132 ━━━━━━━━━━━━━━━━━━━━ 0s 456us/step - accuracy: 0.8496 - loss: 31.0726 - precision: 0.7712 - recall: 0.7908 - val_accuracy: 0.8886 - val_loss: 37.1965 - val_precision: 0.7799 - val_recall: 0.9288
Epoch
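
The validation loss in the log above moves around from epoch to epoch (15.2 in epoch 2, 37.2 in epoch 4), and the `callbacks` module is imported but not used. One option, sketched here as an assumption and not as something the notebook ran, is an `EarlyStopping` callback that halts training once `val_loss` stops improving and restores the best weights. The patience of 10 epochs is an arbitrary illustrative choice.

```python
from tensorflow.keras import callbacks

# Stop when val_loss has not improved for 10 consecutive epochs (assumed value)
# and roll the model back to the weights of its best epoch.
early_stop = callbacks.EarlyStopping(
    monitor="val_loss",
    patience=10,
    restore_best_weights=True,
)

# Hypothetical usage with the notebook's model and data:
# history = model.fit(
#     X_train, y_train,
#     epochs=150,
#     validation_split=0.2,
#     callbacks=[early_stop],
#     verbose=1,
# )
```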

In [10]:
loss, accuracy, precision, recall = model.evaluate(X_test, y_test)
print(f"Accuracy: {accuracy}, Precision: {precision}, Recall: {recall}")

71/71 ━━━━━━━━━━━━━━━━━━━━ 0s 304us/step - accuracy: 0.9212 - loss: 0.2163 - precision: 0.9322 - recall: 0.8373
Accuracy: 0.9173333048820496, Precision: 0.9238505959510803, Recall: 0.8286082744598389


### Get Classification Report

In [11]:
y_pred_prob = model.predict(X_test)
y_pred_classes = (y_pred_prob > 0.5).astype("int32")

71/71 ━━━━━━━━━━━━━━━━━━━━ 0s 377us/step


In [12]:
report = classification_report(y_test, y_pred_classes, target_names=['False', 'True'])  # Adjust target names based on your classes
print("Classification Report Neural Network:")
print(report)

Classification Report Neural Network:
              precision    recall  f1-score   support

       False       0.91      0.96      0.94      1474
        True       0.92      0.83      0.87       776

    accuracy                           0.92      2250
   macro avg       0.92      0.90      0.91      2250
weighted avg       0.92      0.92      0.92      2250
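
The figures in the report can be cross-checked by hand. The counts below are not printed anywhere in the notebook; they are reconstructed from the reported support (1474 and 776), precision and recall, and are the integers that reproduce those metrics. A minimal plain-Python sketch:

```python
# Confusion-matrix counts reconstructed from the reported metrics (an inference,
# not notebook output): 776 actual "True" rows, 1474 actual "False" rows.
tp, fn = 643, 133   # actual "True" rows: found / missed
tn, fp = 1421, 53   # actual "False" rows: correctly rejected / false alarms

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f} accuracy={accuracy:.4f}")
```

The rounded values agree with the "True" row of the report (0.92, 0.83, 0.87) and with the overall accuracy of 0.92.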



The neural network achieved strong overall performance, with an accuracy of 92% on the test set. For the positive class ("True"), precision is 92% and recall is 83%: most predicted bookings are correct, but about one in six homeowners who actually booked a consultation is missed. The model therefore keeps false positives low at the cost of some false negatives. Compared to the other models, the neural network offers high predictive power but at the cost of reduced interpretability.
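
Because the classifier outputs probabilities, the 0.5 cut-off used for `y_pred_classes` is a choice: lowering it typically trades precision for recall. The sketch below illustrates this on six made-up probabilities in plain Python; the data and the helper `precision_recall_at` are hypothetical and do not come from the notebook.

```python
def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall of the positive class at a given cut-off."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels and predicted probabilities, for illustration only.
y_true_demo = [1, 1, 1, 0, 0, 0]
y_prob_demo = [0.9, 0.6, 0.4, 0.45, 0.2, 0.1]

for threshold in (0.5, 0.3):
    p, r = precision_recall_at(y_true_demo, y_prob_demo, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data the lower threshold recovers the missed positive but admits one false positive, which is the same trade-off that would apply when tuning the cut-off on `y_pred_prob`.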