# <center>Regression with Keras</center>
<center><img src="https://3.bp.blogspot.com/-QZVBl08fmPk/XhO909Ha1dI/AAAAAAAACZI/q1a1UykGKe0KDUZ_ZITtWmM7bBJFRrvPQCLcBGAsYHQ/s1600/tensorflowkeras.jpg" /></center>

## Importing Libraries

In [1]:
import pandas as pd
from tqdm import tqdm
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from statistics import mean, stdev

## Reading Dataset

In [2]:
conc_df = pd.read_csv(r"https://cocl.us/concrete_data")
conc_df.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


## Description of Dataset

In [3]:
conc_df.describe()

|       | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


## Length of the Data

In [4]:
conc_df.shape[0]

1030

## Information of Attributes

In [5]:
conc_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Cement              1030 non-null   float64
 1   Blast Furnace Slag  1030 non-null   float64
 2   Fly Ash             1030 non-null   float64
 3   Water               1030 non-null   float64
 4   Superplasticizer    1030 non-null   float64
 5   Coarse Aggregate    1030 non-null   float64
 6   Fine Aggregate      1030 non-null   float64
 7   Age                 1030 non-null   int64  
 8   Strength            1030 non-null   float64
dtypes: float64(8), int64(1)
memory usage: 72.6 KB


As we can see, there are no null values in any of the columns, and every column is numeric. Therefore, no missing-value handling or categorical encoding is required.
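The same check can be done programmatically instead of reading the `info()` output by eye. A minimal sketch; the helper name `is_clean_numeric` is hypothetical, and the two-row DataFrame is a small sample standing in for `conc_df`:

```python
import pandas as pd

def is_clean_numeric(df: pd.DataFrame) -> bool:
    """Return True if df has no missing values and every column has a numeric dtype."""
    no_nulls = not df.isnull().values.any()
    all_numeric = df.select_dtypes(include="number").shape[1] == df.shape[1]
    return no_nulls and all_numeric

# Small sample standing in for conc_df (values taken from the head() output above)
sample = pd.DataFrame({"Cement": [540.0, 332.5], "Age": [28, 270]})
print(is_clean_numeric(sample))  # True
```

On the real data this would be called as `is_clean_numeric(conc_df)`.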

## Dividing Dataset into Features and Target

In [6]:
features = conc_df[conc_df.columns[conc_df.columns != "Strength"]]
features.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [7]:
target = conc_df["Strength"]
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

## Finding Mean Squared Error

### Splitting Dataset into Training and Testing Set

In [8]:
ft_trn, ft_tst, tar_trn, tar_tst = train_test_split(features, target, test_size = 0.3)

### Defining our model

In [9]:
def regr_mod_A():
    model = Sequential([
        Dense(10,activation="relu",input_shape=features.columns.shape),
        Dense(1)
    ])
    model.compile(optimizer="adam",loss="mean_squared_error")
    return model
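As a quick sanity check on model size: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The helper below is a hypothetical illustration (not part of Keras) that applies this formula to a stack of layer widths; `model.summary()` should report the same totals.

```python
def dense_param_count(layer_sizes):
    """Count trainable parameters in a stack of fully connected layers.

    layer_sizes[0] is the input width; each later entry is a Dense layer's unit count.
    Each Dense layer contributes n_in * n_out weights plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# Model A: 8 input features -> Dense(10) -> Dense(1)
print(dense_param_count([8, 10, 1]))          # 101
# Part D model: 8 input features -> 3 x Dense(10) -> Dense(1)
print(dense_param_count([8, 10, 10, 10, 1]))  # 321
```

So the single-hidden-layer model has 101 trainable parameters, and adding two more hidden layers of 10 units brings it to 321.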

### Fitting our model

In [10]:
model = regr_mod_A()
model.fit(ft_trn, tar_trn, epochs=50, validation_split=0.3, verbose=2)

Epoch 1/50
16/16 - 1s - loss: 56474.6836 - val_loss: 46191.2969 - 654ms/epoch - 41ms/step
Epoch 2/50
16/16 - 0s - loss: 37268.7656 - val_loss: 28703.3379 - 52ms/epoch - 3ms/step
Epoch 3/50
16/16 - 0s - loss: 21336.8906 - val_loss: 14733.6602 - 47ms/epoch - 3ms/step
Epoch 4/50
16/16 - 0s - loss: 9590.5791 - val_loss: 6251.2632 - 45ms/epoch - 3ms/step
Epoch 5/50
16/16 - 0s - loss: 4106.1875 - val_loss: 3603.7817 - 53ms/epoch - 3ms/step
Epoch 6/50
16/16 - 0s - loss: 2900.4131 - val_loss: 3311.6335 - 48ms/epoch - 3ms/step
Epoch 7/50
16/16 - 0s - loss: 2743.8425 - val_loss: 3109.3145 - 44ms/epoch - 3ms/step
Epoch 8/50
16/16 - 0s - loss: 2527.2410 - val_loss: 2911.4602 - 45ms/epoch - 3ms/step
Epoch 9/50
16/16 - 0s - loss: 2377.0562 - val_loss: 2742.7209 - 46ms/epoch - 3ms/step
Epoch 10/50
16/16 - 0s - loss: 2231.4153 - val_loss: 2575.8481 - 46ms/epoch - 3ms/step
Epoch 11/50
16/16 - 0s - loss: 2100.6453 - val_loss: 2425.6526 - 48ms/epoch - 3ms/step
Epoch 12/50
16/16 - 0s - loss: 1977.9147 - v

<keras.src.callbacks.History at 0x27d20ff0090>

### Predictions on Testing Set

In [11]:
pred_tst = model.predict(ft_tst)



### Mean Squared Error

In [12]:
print(mean_squared_error(tar_tst, pred_tst))  # signature is (y_true, y_pred)

211.413911794534
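The score above is the mean squared error, MSE = (1/n) · Σ (y_i − ŷ_i)², the same quantity the model minimizes as its loss. scikit-learn's signature is `mean_squared_error(y_true, y_pred)`; because each difference is squared, the argument order does not change the value. A minimal pure-Python sketch of the formula (the function name `mse` is illustrative):

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([3.0, 5.0], [2.0, 7.0]))  # (1 + 4) / 2 = 2.5
print(mse([2.0, 7.0], [3.0, 5.0]))  # 2.5, symmetric in its arguments
```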


## Creating a function that repeats the above steps 50 times and stores the MSEs in a list

In [13]:
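def regr_list(feat, regr_mod, epch):
    """Repeat split -> fit -> predict 50 times and return the list of test MSEs."""
    mean_list = []

    for _ in tqdm(range(50)):
        # New random 70/30 split each iteration (uses the global `target` Series)
        ft_trn, ft_tst, tar_trn, tar_tst = train_test_split(feat, target, test_size = 0.3)
        # Note: this reuses the model object passed in (no copy or re-initialization),
        # so its weights carry over from the previous iteration
        model = regr_mod
        model.fit(ft_trn, tar_trn, epochs=epch, validation_split=0.3, verbose=0)
        pred_tst = model.predict(ft_tst, verbose=0)
        mean_list.append(mean_squared_error(tar_tst, pred_tst))  # (y_true, y_pred)

    return mean_list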
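In `regr_list`, the line `model = regr_mod` binds the one model object the caller built, so all 50 iterations keep training the same network. A sketch of a variant that builds a fresh, untrained model in every run is shown below. It takes the builder function itself (e.g. `regr_mod_A`, without parentheses) and, as an assumption made to keep it dependency-light, splits with a NumPy permutation instead of `train_test_split`. The name `repeated_mse` is hypothetical.

```python
import numpy as np

def repeated_mse(build_model, X, y, epochs, n_runs=50, test_size=0.3, seed=None):
    """Train a fresh model on a new random split in each run; return the test MSEs."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype="float32")
    y = np.asarray(y, dtype="float32")
    n_test = int(round(len(X) * test_size))
    mses = []
    for _ in range(n_runs):
        # New random train/test split for this run
        idx = rng.permutation(len(X))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        # Call the builder each run so the weights start from a fresh initialization
        model = build_model()
        model.fit(X[train_idx], y[train_idx], epochs=epochs,
                  validation_split=0.3, verbose=0)
        # Keras returns predictions with shape (n, 1); flatten to (n,)
        pred = np.ravel(model.predict(X[test_idx], verbose=0))
        mses.append(float(np.mean((y[test_idx] - pred) ** 2)))
    return mses
```

Usage, assuming the notebook's `norm_feat`, `target` and `regr_mod_A` are defined: `mses = repeated_mse(regr_mod_A, norm_feat, target, epochs=50)`. Results from this variant may differ from the figures reported in this notebook.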

## Part A : Printing the Mean and Standard Deviation of the MSEs Obtained

In [14]:
mean_list_A = regr_list(features, regr_mod_A(), 50)
print("Mean               : ",mean(mean_list_A))
print("Standard Deviation : ",stdev(mean_list_A))

100%|██████████| 50/50 [01:52<00:00,  2.26s/it]

Mean               :  195.06590436161738
Standard Deviation :  567.0285797871568





## Part B : Perform Part A but with Normalized Values

In [15]:
norm_feat = (features - features.mean())/features.std()
norm_feat.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
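The normalization above computes each column's mean and standard deviation over all 1030 rows before the data is split, so the test rows also contribute to the statistics used to scale the training rows. A common alternative, sketched here as an assumption rather than what this notebook does, computes the statistics on the training split only and reuses them for the test split. The helper name `standardize_train_test` is hypothetical.

```python
import pandas as pd

def standardize_train_test(train: pd.DataFrame, test: pd.DataFrame):
    """Z-score both splits using the training split's mean and std only."""
    mu = train.mean()
    sigma = train.std()  # sample std (ddof=1), same as features.std() above
    return (train - mu) / sigma, (test - mu) / sigma

# Toy example: training mean is 2.0 and sample std is 1.0
train = pd.DataFrame({"Age": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"Age": [2.0, 4.0]})
train_z, test_z = standardize_train_test(train, test)
print(test_z["Age"].tolist())  # [0.0, 2.0]
```

In the notebook this would be applied after `train_test_split`, e.g. `ft_trn_n, ft_tst_n = standardize_train_test(ft_trn, ft_tst)`.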


In [16]:
mean_list_B = regr_list(norm_feat, regr_mod_A(), 50)
print("Mean               : ",mean(mean_list_B))
print("Standard Deviation : ",stdev(mean_list_B))

100%|██████████| 50/50 [01:52<00:00,  2.24s/it]

Mean               :  116.62958031697414
Standard Deviation :  54.677999477565805





## Part C : Perform Part B but with 100 epochs

In [17]:
mean_list_C = regr_list(norm_feat, regr_mod_A(), 100)
print("Mean               : ",mean(mean_list_C))
print("Standard Deviation : ",stdev(mean_list_C))

100%|██████████| 50/50 [03:36<00:00,  4.33s/it]

Mean               :  42.961765313659576
Standard Deviation :  56.77395100283535





## Part D : Perform Part B but with 3 hidden layers each with 10 nodes and ReLU activation

In [18]:
def regr_mod_D():
    model = Sequential([
        Dense(10,activation="relu",input_shape=features.columns.shape),
        Dense(10,activation="relu"),
        Dense(10,activation="relu"),
        Dense(1)
    ])
    model.compile(optimizer="adam",loss="mean_squared_error")
    return model

In [19]:
mean_list_D = regr_list(norm_feat, regr_mod_D(), 50)
print("Mean               : ",mean(mean_list_D))
print("Standard Deviation : ",stdev(mean_list_D))

100%|██████████| 50/50 [01:55<00:00,  2.30s/it]

Mean               :  36.30030846236525
Standard Deviation :  23.721144744868408





## Conclusion

|                    | A      | B      | C     | D     |
|--------------------|--------|--------|-------|-------|
| Mean               | 195.07 | 116.63 | 42.96 | 36.30 |
| Standard Deviation | 567.03 | 54.68  | 56.77 | 23.72 |

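- Normalizing the features (B) lowered the mean MSE from 195.07 to 116.63, about 40% lower than A, and cut the standard deviation from 567.03 to 54.68.
- Training for 100 epochs instead of 50 (C) lowered the mean MSE to 42.96, about 63% lower than B.
- Using three hidden layers instead of one (D) lowered the mean MSE to 36.30, about 69% lower than B, with the smallest standard deviation of the four (23.72).

Therefore, in these runs normalizing the data reduced the error, and increasing the number of epochs or hidden layers reduced it further. One caveat: `regr_list` reuses a single model instance across its 50 iterations, so each iteration continues training from the previous weights and may be evaluated on rows used for training in an earlier split. The figures therefore reflect this cumulative training rather than 50 independently trained models.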
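The summary table above can also be built directly from the four MSE lists rather than copied by hand. A minimal sketch; the helper name `summarize_mse` is hypothetical, and pandas' `std` uses the sample standard deviation (ddof=1), matching `statistics.stdev` used earlier:

```python
import pandas as pd

def summarize_mse(runs: dict) -> pd.DataFrame:
    """Mean and sample standard deviation (ddof=1) of the MSE list for each part."""
    return pd.DataFrame(runs).agg(["mean", "std"]).rename(
        index={"mean": "Mean", "std": "Standard Deviation"}
    )

# Toy input; in the notebook this would be the four lists from Parts A to D
print(summarize_mse({"A": [1.0, 3.0], "B": [2.0, 2.0]}))
```

With the notebook's results: `summarize_mse({"A": mean_list_A, "B": mean_list_B, "C": mean_list_C, "D": mean_list_D}).round(2)`.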