# Neural Networks Implementation

Since the data shows that certain features have low correlation, we can argue that using non-linear approximation functions and dense layers would allow us to explore various feature engineering possibilities. This would help us improve the accuracy of our predictions, rather than using linear models.

The neural network would work with a multi-layer perceptron network consisting of 3 layers: the input layer with 64 nodes, the hidden layer with 32 nodes, and the output layer with one node. The activation function used for the hidden layers would be ReLU, as it helps analyze non-linear feature trends. The output layer, on the other hand, would use a linear activation function for regression results.

## Library Imports

For this notebook, I would be using Keras-based Sequential Neural Network along with sklearn-based accuracy metrics to measure the accuracy of my model on the chosen metrics.

In [22]:
# Visualisation and Manipulation imports
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

#Performace Metric imports
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, median_absolute_error
from sklearn.preprocessing import StandardScaler

# Model imports
from tensorflow import keras
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

import warnings

warnings.filterwarnings('ignore')

### Model using pre-processed data

I would conduct basic pre-processing and then construct the neural network. The pre-processing step involves encoding categorical variables to train the model. This would help set a baseline to understand the improvements made using feature engineering. 

In [25]:
# Reading the data and pre-processing the categorical variables

data_raw = pd.read_csv("Preprocessed_Data.csv")
data_raw.dropna(inplace=True)
X = data_raw.drop('Target_Comment_Volume', axis=1)
y = data_raw['Target_Comment_Volume']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [26]:
# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [35]:
# Defining the NN model
model = keras.Sequential([
    Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1, activation='linear')   # single output for regression
])

# Training the model
model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae']
)

model.fit(X_train_scaled, y_train, epochs=10, batch_size=32, verbose=0)

# Evaluate the model
loss = model.evaluate(X_test_scaled, y_test)
print(f"Test Loss: {loss}")

# Make predictions
y_pred = model.predict(X_test_scaled)

# Evaluate the model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
meae = median_absolute_error(y_test, y_pred)

print("Test set performance:")
print(f" MAE : {mae:.4f}")
print(f" MSE : {mse:.4f}")
print(f" RMSE: {rmse:.4f}")
print(f" Median Absolute Error: {meae:.4f}")

[1m256/256[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 390us/step - loss: 410.7568 - mae: 4.7961
Test Loss: [413.9024658203125, 4.844026565551758]
[1m256/256[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 270us/step
Test set performance:
 MAE : 4.8440
 MSE : 413.9024
 RMSE: 20.3446
 Median Absolute Error: 1.0906


### Model using feature engineering

When we provide the neural network more data and extra features, I expect the model to perform better since the model would have more features to form the Dense hidden layers.

In [31]:
# Reading the feature-engineered data
train_data = pd.read_csv("train_df.csv")
test_data = pd.read_csv("test_df.csv")
X_train = train_data.drop('Target_Comment_Volume', axis=1)
y_train = train_data['Target_Comment_Volume']
X_test = test_data.drop('Target_Comment_Volume', axis=1)
y_test = test_data['Target_Comment_Volume']

FileNotFoundError: [Errno 2] No such file or directory: 'train_df.csv'

In [16]:
# 2. Standardize features
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.fit_transform(X_test)

# Defining the NN model
model = keras.Sequential([
    Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),
    Dense(32, activation='relu'),
    Dense(1, activation='linear')
])

# Training the model
model.compile(
    optimizer='adam',
    loss='mse',
    metrics=['mae']
)

model.fit(X_train_scaled, y_train, epochs=20, batch_size=32, verbose=0)

# Evaluate the model
loss = model.evaluate(X_test_scaled, y_test)
print(f"Test Loss: {loss}")

# Make predictions
y_pred = model.predict(X_test_scaled)

# Evaluate the model
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("Test set performance:")
print(f" MAE : {mae:.4f}")
print(f" MSE : {mse:.4f}")
print(f" RMSE: {rmse:.4f}")
print(f" R²  : {r2:.4f}")

[1m256/256[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step - loss: 376.9333 - mae: 5.0703  
Test Loss: [405.73968505859375, 5.1293559074401855]
[1m256/256[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step  
Test set performance:
 MAE : 5.1294
 MSE : 405.7396
 RMSE: 20.1430
 R²  : 0.6206


## Conclusion

As expected, our model performed the linear models, which shows that the most accurate model would use the non-linearity of the data to improve accuracy. The MSE of our model has improved significantly after feature engineering the dataset using the manipulations explained in the Feature Engineering Notebook. Since we have more features to work with, the neural network would be able to fit the dataset better. 

However, a limitation of the neural network is that it tends to overfit the data unless larger datasets are provided. This would mean that the accuracy of our model can't be improved unless we get more data or features.