# Cement Regression Model

## Part A: Building a Baseline Model

Let's install all the dependencies we need.

In [1]:
!pip install numpy
!pip install pandas
!pip install scikit-learn
!pip install tensorflow_cpu==2.15.0



Now let's import pandas, NumPy, and Keras.

In [2]:
import pandas as pd
import numpy as np
import keras




Now let's download the data and read it into a DataFrame.

In [None]:
file = 'https://cocl.us/concrete_data'
concrete_data = pd.read_csv(file)

<class 'pandas.core.frame.DataFrame'>


Now we separate the data into predictors and the target. The Strength column is the target, and all the other columns are the predictors. We first store the DataFrame's column names in a variable so that we can easily select the predictor columns and the target column.

In [4]:
concrete_data_colms = concrete_data.columns

predictors = concrete_data[concrete_data_colms[concrete_data_colms != 'Strength']]
target = concrete_data['Strength']
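
As a cross-check on the selection above, here is a minimal sketch showing that the boolean-mask approach and `DataFrame.drop` pick the same predictor columns. The `toy` frame and its values are made up for illustration and are not the real dataset.

```python
import pandas as pd

# Tiny made-up frame standing in for concrete_data (illustrative values only).
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6],
    "Water": [162.0, 228.0, 192.0],
    "Age": [28, 270, 360],
    "Strength": [79.99, 40.27, 44.30],
})

# Boolean-mask selection, as in the cell above.
cols = toy.columns
predictors_mask = toy[cols[cols != "Strength"]]

# Equivalent selection with drop(); the original frame is left unchanged.
predictors_drop = toy.drop(columns="Strength")
target_toy = toy["Strength"]

print(predictors_drop.columns.tolist())
print(predictors_mask.equals(predictors_drop))
```

Either form works; `drop` is arguably easier to read when only one column is excluded.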

Now let's import the rest of the packages we need and build the model.

In [5]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Input

In [None]:
def concrete_regression_model():

    model = Sequential()

    model.add(Input(shape=(predictors.shape[1],)))     #expect batches of 8 dimensional vectors as input
    model.add(Dense(10, activation='relu'))     #one hidden layer, 10 nodes, ReLU activation function
    model.add(Dense(1))     #output layer

    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
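
To make the architecture concrete, the following sketch writes out the computation that a network of this shape performs (8 inputs, one hidden layer of 10 ReLU units, one linear output) in plain NumPy. The weights here are random placeholders, not the weights Keras would learn, and the helper names `relu` and `forward` are hypothetical.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values become 0."""
    return np.maximum(z, 0.0)

def forward(x, w1, b1, w2, b2):
    """Forward pass of an 8 -> 10 -> 1 dense network."""
    hidden = relu(x @ w1 + b1)   # shape (batch, 10)
    return hidden @ w2 + b2      # shape (batch, 1), no activation on the output

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 10

# Random placeholder weights (an assumption for illustration only).
w1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

batch = rng.normal(size=(4, n_features))   # 4 made-up samples
out = forward(batch, w1, b1, w2, b2)
print(out.shape)
```

The output layer has no activation because this is a regression problem: the prediction should be free to take any real value.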

Let's call the function and train the model, holding out 30% of the data for validation.

In [None]:
model = concrete_regression_model()

# Train for 50 epochs; validation_split=0.3 holds out the last 30% of the rows as validation data
model.fit(predictors, target, validation_split=0.3, epochs=50)

We're going to import the mean_squared_error function from sklearn.metrics.

In [6]:
from sklearn.metrics import mean_squared_error

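We are going to train 50 times and collect the mean squared error from each run in a list. Note that the loop reuses the same model object, so each pass continues training from the weights left by the previous pass, and the error is computed on the full dataset rather than only on the held-out 30%.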

In [None]:
mse_list = []

for i in range(50):
    model.fit(predictors, target, validation_split=0.3, epochs=50)
    predictions = model.predict(predictors)     #we get the predictions using the predict() function
    mse = mean_squared_error(target, predictions)     #compare target values to predicted values
    mse_list.append(mse)     #add the resulting mean squared error to a list
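
For comparison, here is a minimal sketch of an alternative evaluation protocol: build a fresh model for every repetition, shuffle and split the data each time, and score only on the held-out rows. This is not what the loop above does. `LeastSquaresModel` and `repeated_holdout_mse` are hypothetical names, the data is synthetic, and the stand-in model is ordinary least squares so the sketch runs with NumPy alone.

```python
import numpy as np

class LeastSquaresModel:
    """Hypothetical stand-in for a Keras model: exposes fit() and predict()."""

    def fit(self, X, y):
        Xb = np.column_stack([X, np.ones(len(X))])   # append a bias column
        self.coef_, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self

    def predict(self, X):
        Xb = np.column_stack([X, np.ones(len(X))])
        return Xb @ self.coef_

def repeated_holdout_mse(build_model, X, y, n_repeats=5, test_fraction=0.3, seed=0):
    """Return one held-out MSE per repetition, using a new model each time."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_fraction * len(X)))
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(len(X))              # new random split
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = build_model()                        # fresh, untrained model
        model.fit(X[train_idx], y[train_idx])
        residuals = y[test_idx] - model.predict(X[test_idx])
        scores.append(float(np.mean(residuals ** 2)))
    return scores

# Synthetic regression data (made up for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

scores = repeated_holdout_mse(LeastSquaresModel, X, y)
print(len(scores), np.mean(scores), np.std(scores))
```

To adapt this to the notebook, one would presumably pass `concrete_regression_model` as `build_model`, convert the DataFrames with `.to_numpy()`, and forward `epochs=50` to `fit`; those changes are left out here to keep the sketch small.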




In [None]:
print(mse_list)

print(len(mse_list))


Now let's calculate the mean and standard deviation of the mean squared errors in the list.

In [None]:
mean_of_mse_list = np.mean(mse_list)
std_of_mse_list = np.std(mse_list)

print(mean_of_mse_list)
print(std_of_mse_list)

# In my runs I got mean = 65.485 and standard deviation = 14.172
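
One detail worth knowing when reading these numbers: `np.std` computes the population standard deviation by default (`ddof=0`). The sketch below, using made-up error values, shows the difference from the sample standard deviation (`ddof=1`).

```python
import numpy as np

# Made-up values standing in for a list of mean squared errors.
errors = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(errors)          # divides by n (ddof=0, the default)
sample_std = np.std(errors, ddof=1)      # divides by n - 1

print(population_std)   # 2.0
print(sample_std)
```

With 50 runs the two values differ by only about 1%, so the choice does not change the conclusions here.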


## Part B: Normalize the Data

Let's normalize the predictors and feed them into the model.

In [7]:
predictors_normalized = (predictors - predictors.mean()) / predictors.std()
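
As a quick sanity check on this formula, the sketch below applies the same z-score normalization to a small made-up frame and confirms that each column ends up with mean 0 and standard deviation 1. Note that pandas' `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; not the real dataset.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6, 266.0],
    "Age": [28.0, 270.0, 360.0, 90.0],
})

# Same formula as above: subtract the column mean, divide by the column std.
toy_normalized = (toy - toy.mean()) / toy.std()

print(np.allclose(toy_normalized.mean(), 0.0))
print(np.allclose(toy_normalized.std(), 1.0))
```

The subtraction and division are aligned by column name, so each column is scaled with its own mean and standard deviation.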

In [None]:
# We run this 50 times once again but this time with normalized data as opposed to raw data.

mse_list_2 = []

for i in range(50):
    model.fit(predictors_normalized, target, validation_split=0.3, epochs=50)
    predictions = model.predict(predictors_normalized) 
    mse = mean_squared_error(target, predictions)     
    mse_list_2.append(mse) 

In [None]:
print(mse_list_2)
print(len(mse_list_2))     

In [None]:
mean_of_mse_list_2 = np.mean(mse_list_2)
std_of_mse_list_2 = np.std(mse_list_2)

print(mean_of_mse_list_2)
print(std_of_mse_list_2)

# In my run I got mean = 56.799 and standard deviation = 36.299

I notice that the mean of the mean squared errors is smaller than in Part A. Part A's mean was 65.485 and Part B's mean was 56.799. This indicates that the mean squared error was smaller on average after I normalized the predictors. The standard deviation, however, grew from 14.172 to 36.299, so the results varied more from run to run.

## Part C: Increase the Number of Epochs

Now let's do 100 epochs and compare results

In [None]:
mse_list_3 = []

for i in range(50):
    model.fit(predictors_normalized, target, validation_split=0.3, epochs=100)
    predictions = model.predict(predictors_normalized) 
    mse = mean_squared_error(target, predictions)     
    mse_list_3.append(mse)

In [None]:
print(mse_list_3)
print(len(mse_list_3))    

In [None]:
mean_of_mse_list_3 = np.mean(mse_list_3)
std_of_mse_list_3 = np.std(mse_list_3)

print(mean_of_mse_list_3)
print(std_of_mse_list_3)

# In my run I got mean = 67.618 and standard deviation = 27.701

I notice that the mean is actually a little higher than in Part B: I got mean = 56.799 in Part B and mean = 67.618 in Part C.

## Part D: Increase the Number of Hidden Layers

Now let's repeat Part B, but with more hidden layers instead of more epochs.

First, let's define a new model function that has 3 hidden layers instead of 1.

In [8]:
def concrete_regression_model_2():

    model = Sequential()

    model.add(Input(shape=(predictors.shape[1],)))     #expect batches of 8 dimensional vectors as input
    model.add(Dense(10, activation='relu'))    
    model.add(Dense(10, activation='relu'))    
    model.add(Dense(10, activation='relu'))     #3 hidden layers, 10 nodes each, ReLU activation function
    model.add(Dense(1))     #output layer

    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
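
To get a feel for how much larger this network is, the sketch below counts the trainable parameters of a stack of dense layers (weights plus biases). It assumes 8 input features, as the code comment states, and `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for fully connected layers.

    layer_sizes lists the width of every layer, input first, output last.
    Each layer contributes n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

baseline = dense_param_count([8, 10, 1])           # one hidden layer
deeper = dense_param_count([8, 10, 10, 10, 1])     # three hidden layers

print(baseline)   # 101
print(deeper)     # 321
```

The same totals should appear in the output of `model.summary()` for the two models.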

Let's call this function and store the result in a model_2 variable. Then we will train 50 times and collect the mean squared error from each pass of the loop in a list.

In [9]:
model_2 = concrete_regression_model_2()

mse_list_4 = []

for i in range(50):
    model_2.fit(predictors_normalized, target, validation_split=0.3, epochs=50)
    predictions = model_2.predict(predictors_normalized) 
    mse = mean_squared_error(target, predictions)     
    mse_list_4.append(mse)





In [10]:
print(mse_list_4)
print(len(mse_list_4))  

[139.5225419360898, 104.73980415701607, 65.68252463465153, 74.03222456412945, 75.52530705230005, 78.77920644557274, 78.04983005633993, 79.62743640412042, 78.83659440110668, 76.80343176457902, 74.12442060430133, 82.26031659768434, 82.63395126064633, 73.7804578830254, 76.5578205039504, 78.99802067070661, 80.13034186242962, 76.8946685900263, 79.29481074871215, 76.72002375138477, 79.27798342313969, 72.64447683642342, 74.50853340802958, 75.02339069306754, 74.54131752022599, 70.91308294523554, 70.85830689713839, 72.64306073798892, 69.67861217812194, 73.55634391672652, 74.43311171065747, 67.65626533788495, 69.91828458361731, 67.14262918203747, 64.00375166618758, 68.47547356707983, 67.54097637106449, 67.85681138958397, 67.38645253221665, 66.74709113352196, 64.29057288638833, 64.82778339319019, 75.55050885357232, 68.545175826368, 67.3618748892834, 68.68585490016706, 66.22023145379107, 67.07522120141338, 64.89588955186139, 69.05151981570015]
50


In [11]:
mean_of_mse_list_4 = np.mean(mse_list_4)
std_of_mse_list_4 = np.std(mse_list_4)

print(mean_of_mse_list_4)
print(std_of_mse_list_4)

# In my run I got mean = 56.693 and standard deviation = 29.424

74.48608645380914
11.520621306973979


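I got a slightly smaller mean in Part D (56.693) than in Part B (56.799). The standard deviation also dropped, from 36.299 in Part B to 29.424 in Part D. Note that the output printed above (mean 74.486, standard deviation 11.521) comes from a different run than the figures quoted in the comment, so results vary noticeably between runs.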

In [15]:
my_mix = {
    'Cement': 198.6,
    'Blast Furnace Slag': 132.4,
    'Fly Ash': 0,
    'Water': 192.0,
    'Superplasticizer': 0,
    'Coarse Aggregate': 978.4,
    'Fine Aggregate': 825.5,
    'Age': 360
}

# Create DataFrame and normalize it using the same parameters as training data
my_mix_df = pd.DataFrame([my_mix])
my_mix_normalized = (my_mix_df - predictors.mean()) / predictors.std()

# Get prediction using normalized data
prediction = model_2.predict(my_mix_normalized)
print(f"Predicted concrete strength: {float(prediction[0][0]):.2f} MPa")

Predicted concrete strength: 41.52 MPa
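
One thing to watch when predicting on a hand-built row: pandas aligns the normalization by column name, but the model reads its inputs by position. The sketch below, using a made-up three-column training frame, shows one way to force a new row into the training column order with `reindex` before normalizing. The names `train` and `new_mix` are hypothetical.

```python
import pandas as pd

# Made-up training frame for illustration; not the real dataset.
train = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6, 266.0],
    "Water": [162.0, 228.0, 192.0, 228.0],
    "Age": [28.0, 270.0, 360.0, 90.0],
})

# A new sample whose keys are deliberately out of order.
new_mix = {"Age": 360.0, "Cement": 198.6, "Water": 192.0}

# Reorder the columns to match the training data before normalizing.
new_df = pd.DataFrame([new_mix]).reindex(columns=train.columns)
new_normalized = (new_df - train.mean()) / train.std()

print(new_df.columns.tolist())
print(new_normalized.isna().any().any())   # a True here would flag a missing or misspelled column
```

In the cell above, the dictionary keys are written in what appears to be the dataset's column order, so the prediction is unaffected; the `reindex` step is only a safeguard.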
