# Part A
#### Grading Criteria
Use the Keras library to build a neural network with the following:
- One hidden layer of 10 nodes, and a ReLU activation function
- Use the adam optimizer and the mean squared error  as the loss function.
1. Randomly split the data into a training and test sets by holding 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.
2. Train the model on the training data using 50 epochs.
3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.
4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.
5. Report the mean and the standard deviation of the mean squared errors.

Submit your Jupyter Notebook with your code and comments.

In [4]:
# perform installs
!pip install numpy==2.0.2
!pip install pandas==2.2.2
!pip install tensorflow_cpu==2.18.0
!pip install scikit-learn



In [21]:
# imports
import pandas as pd
import numpy as np
import keras
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Input

import warnings
warnings.simplefilter('ignore', FutureWarning)

In [6]:
# Get concrete dataset
filepath='https://cocl.us/concrete_data'
concrete_data = pd.read_csv(filepath)

concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [7]:
# check how many data points we have
concrete_data.shape

(1030, 9)

In [8]:
# look at dataset info
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [9]:
# make sure we don't have null values
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

In [11]:
# Split dataset into predictors (all columns except Strength) and target (Strength column)
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [12]:
n_cols = predictors.shape[1] # number of predictors
n_cols

8

In [20]:
# we do not normalize the model here since that is done in part b

In [18]:
# function to build model
def build_model_part_a():
    # create model
    model = Sequential()
    model.add(Input(shape=(n_cols,)))
    # model has hidden layer with 10 nodes and relu act func
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error') # Use adam and mse
    return model

In [39]:
# build model
part_a_model = build_model_part_a()

In [40]:
mse_list = []

In [41]:
def test_pt_a_mse():
    # split into train and test sets and hold 30% of data for testing
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)
    # train model with 50 epochs
    part_a_model.fit(X_train, y_train, epochs=50, verbose=2)
    y_pred = part_a_model.predict(X_test)  #eval model on test data
    mse = mean_squared_error(y_test, y_pred)  # compute mse
    mse_list.append(mse)  #add to list of mses

In [42]:
# repeat 50 times
num_iterations = 50
for i in range(num_iterations):
    test_pt_a_mse()

Epoch 1/50
23/23 - 1s - 30ms/step - loss: 238385.1250
Epoch 2/50
23/23 - 0s - 4ms/step - loss: 141126.1719
Epoch 3/50
23/23 - 0s - 5ms/step - loss: 80110.3750
Epoch 4/50
23/23 - 0s - 4ms/step - loss: 43938.7344
Epoch 5/50
23/23 - 0s - 4ms/step - loss: 23384.7715
Epoch 6/50
23/23 - 0s - 4ms/step - loss: 12470.9941
Epoch 7/50
23/23 - 0s - 4ms/step - loss: 7193.8477
Epoch 8/50
23/23 - 0s - 4ms/step - loss: 4839.8760
Epoch 9/50
23/23 - 0s - 4ms/step - loss: 3885.2795
Epoch 10/50
23/23 - 0s - 4ms/step - loss: 3520.0520
Epoch 11/50
23/23 - 0s - 7ms/step - loss: 3352.7061
Epoch 12/50
23/23 - 0s - 6ms/step - loss: 3226.4573
Epoch 13/50
23/23 - 0s - 6ms/step - loss: 3088.4854
Epoch 14/50
23/23 - 0s - 8ms/step - loss: 2927.1030
Epoch 15/50
23/23 - 0s - 7ms/step - loss: 2726.9016
Epoch 16/50
23/23 - 0s - 6ms/step - loss: 2489.4431
Epoch 17/50
23/23 - 0s - 6ms/step - loss: 2250.0728
Epoch 18/50
23/23 - 0s - 5ms/step - loss: 2021.8922
Epoch 19/50
23/23 - 0s - 4ms/step - loss: 1871.6324
Epoch 20/50


In [43]:
# Report the mean and the standard deviation of the mean squared errors.
average_mse = np.mean(mse_list)
print(f"Average Mean Squared Error over {num_iterations} iterations: {average_mse}")
std_mse = np.std(mse_list)
print(f"Mean Squared Error Standard Deviation over {num_iterations} iterations: {std_mse}")

Average Mean Squared Error over 50 iterations: 66.2795128282057
Mean Squared Error Standard Deviation over 50 iterations: 60.89369286469975
