Download and Clean Dataset


In [None]:
import pandas as pd
import numpy as np

The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

In [None]:
concrete_data = pd.read_csv('concrete_data.csv')
concrete_data.head()

The first sample has 540 cubic meters of cement, 0 cubic meters of blast furnace slag, 0 cubic meters of fly ash, 162 cubic meters of water, 2.5 cubic meters of superplasticizer, 1040 cubic meters of coarse aggregate, and 676 cubic meters of fine aggregate. This concrete mix, which is 28 days old, has a compressive strength of 79.99 MPa.

In [None]:
concrete_data.shape

The dataset has 1030 rows and 9 columns. Because there are so few samples, we have to be careful not to overfit the training data.

Do we have missing values?

In [None]:
concrete_data.describe()

In [None]:
concrete_data.isnull().sum()

The data looks very clean and is ready to be used to build our model.

*Split data into predictors and target*
The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [None]:
concrete_data_columns = concrete_data.columns

concrete_data_columns


In [None]:
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
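
An equivalent way to separate predictors from the target is `DataFrame.drop`, which some readers may find easier to scan than boolean indexing on the column index. The sketch below uses a made-up miniature frame; `demo`, `demo_predictors`, and `demo_target` are illustrative names and the values are not taken from the dataset.

```python
import pandas as pd

# Hypothetical miniature frame that reuses the target column name "Strength".
demo = pd.DataFrame({"Cement": [100.0, 200.0],
                     "Water": [150.0, 160.0],
                     "Strength": [30.0, 45.0]})

demo_predictors = demo.drop(columns=["Strength"])  # every column except the target
demo_target = demo["Strength"]                     # the target as a Series

print(list(demo_predictors.columns), demo_target.name)
```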

In [None]:
predictors.head()

In [None]:
target.head()

In [None]:
n_cols = predictors.shape[1] # number of predictors

Let's normalize the data, because the features have very different ranges. We subtract each column's mean and divide by its standard deviation.

In [None]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()
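
As a quick sanity check of this formula, here is a minimal sketch on a made-up two-column frame (the names `toy` and `toy_norm` and all values are illustrative, not from the dataset). After standardization each column should have a mean of about 0 and a standard deviation of about 1. Note that pandas `std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two features on very different scales.
toy = pd.DataFrame({"feature_a": [100.0, 200.0, 300.0, 400.0],
                    "feature_b": [1.0, 2.0, 4.0, 8.0]})

# Same z-score formula as in the cell above.
toy_norm = (toy - toy.mean()) / toy.std()

# Per-column mean and sample standard deviation after standardization.
col_means = toy_norm.mean().to_numpy()
col_stds = toy_norm.std().to_numpy()
print(col_means, col_stds)
```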

Import keras

In [None]:
import keras

In [None]:
from keras.models import Sequential
from keras.layers import Dense

Build a Neural Network

Let's create a neural network with one hidden layer of 10 neurons and a ReLU activation function. It uses the Adam optimizer and mean squared error as the loss function.

In [None]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
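
To get a feel for how small this network is, the sketch below counts its trainable parameters by hand. It assumes `n_cols` is 8 (the 9 columns reported earlier minus the `Strength` target) and that both layers use bias terms, which is the Keras default for `Dense`. The helper `dense_params` is a hypothetical name introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

n_cols = 8  # assumption: 9 dataset columns minus the target column

hidden = dense_params(n_cols, 10)  # hidden layer: 8 * 10 weights + 10 biases
output = dense_params(10, 1)       # output layer: 10 weights + 1 bias
total = hidden + output
print(hidden, output, total)
```

Under these assumptions, the total should match the parameter count reported by `model.summary()`.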

Randomly split the data into training and test sets, holding out 30% of the data for testing. We can use the train_test_split helper function from Scikit-learn.

In [None]:

from sklearn.model_selection import train_test_split

In [None]:

X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=20)
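
For intuition, a random hold-out split amounts to shuffling the row indices and cutting them at the chosen fraction. The following is a NumPy-only sketch of that idea, not Scikit-learn's actual implementation. The row count of 1030 comes from the shape reported earlier, and the variable names are illustrative.

```python
import numpy as np

n_rows = 1030          # number of samples reported by concrete_data.shape
test_fraction = 0.3    # share of rows held out for testing

rng = np.random.default_rng(20)       # seeded only to make the sketch repeatable
shuffled = rng.permutation(n_rows)    # random ordering of the row indices

n_test = int(np.ceil(test_fraction * n_rows))  # number of held-out rows
test_idx = shuffled[:n_test]
train_idx = shuffled[n_test:]

print(len(train_idx), len(test_idx))
```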

Let's train and test the model.

In [None]:


model = regression_model()

Let's train the model for 100 epochs.

In [None]:
epochs = 100
model.fit(X_train, y_train, epochs=epochs, verbose=1)

Evaluate the model on the test data and compute the mean squared error between the predicted and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

In [None]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val

In [None]:
from sklearn.metrics import mean_squared_error

In [None]:

# mean_squared_error returns a single number for this test set,
# so there is no spread to report for one run
mean_square_error = mean_squared_error(y_test, y_pred)
print(mean_square_error)
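
The mean squared error is the average of the squared differences between the actual and predicted values. A minimal sketch with made-up numbers (not model output) shows the computation by hand; `y_true_demo` and `y_pred_demo` are illustrative names.

```python
import numpy as np

# Hypothetical strengths in MPa, for illustration only.
y_true_demo = np.array([80.0, 62.0, 40.0, 41.0])
y_pred_demo = np.array([78.0, 60.0, 43.0, 41.0])

# Errors are 2, 2, -3, 0, so the squared errors are 4, 4, 9, 0.
squared_errors = (y_true_demo - y_pred_demo) ** 2
mse = squared_errors.mean()
print(mse)  # 4.25
```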

Repeat the split, training, and evaluation 50 times to create a list of 50 mean squared errors.

In [None]:
mean_square_error_list = []
for i in range(50):
    # draw a fresh random 70/30 split on every repetition (no fixed seed)
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3)

    # rebuild the model so each repetition starts from newly initialized weights
    model = regression_model()
    model.fit(X_train, y_train, epochs=100, verbose=0)
    loss_val = model.evaluate(X_test, y_test, verbose=0)
    print("Iteration: " + str(i) + " Loss value: " + str(loss_val))
    y_pred = model.predict(X_test)
    mean_square_error_list.append(mean_squared_error(y_test, y_pred))


In [None]:
mean_squared_errors = np.array(mean_square_error_list)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print(f"Mean {mean} and standard deviation {standard_deviation} ")
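
One detail worth knowing when reporting the spread: `np.std` computes the population standard deviation (`ddof=0`) by default, while passing `ddof=1` gives the sample standard deviation. The sketch below uses made-up error values, not results from this notebook, to show the difference.

```python
import numpy as np

# Hypothetical MSE values from four repetitions, for illustration only.
demo_errors = np.array([30.0, 34.0, 38.0, 42.0])

mean_error = np.mean(demo_errors)
population_std = np.std(demo_errors)        # divides by n
sample_std = np.std(demo_errors, ddof=1)    # divides by n - 1
print(mean_error, population_std, sample_std)
```

With only a few repetitions the two values differ noticeably; with 50 repetitions the gap is small.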

Increasing the number of epochs improves the precision of the model.