## Mechanical Properties of Low-Alloy Steels


In [50]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
%matplotlib inline

In [51]:
from sklearn.linear_model import LinearRegression,Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

In [52]:
df=pd.read_csv("MatNavi Mechanical properties of low-alloy steels.csv")
df.head()

Unnamed: 0,Alloy code,C,Si,Mn,P,S,Ni,Cr,Mo,Cu,V,Al,N,Ceq,Nb + Ta,Temperature (°C),0.2% Proof Stress (MPa),Tensile Strength (MPa),Elongation (%),Reduction in Area (%)
0,MBB,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.61,0.04,0.0,0.003,0.0066,0.0,0.0,27,342,490,30,71
1,MBB,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.61,0.04,0.0,0.003,0.0066,0.0,0.0,100,338,454,27,72
2,MBB,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.61,0.04,0.0,0.003,0.0066,0.0,0.0,200,337,465,23,69
3,MBB,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.61,0.04,0.0,0.003,0.0066,0.0,0.0,300,346,495,21,70
4,MBB,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.61,0.04,0.0,0.003,0.0066,0.0,0.0,400,316,489,26,79


In [53]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 915 entries, 0 to 914
Data columns (total 20 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Alloy code                915 non-null    object 
 1    C                        915 non-null    float64
 2    Si                       915 non-null    float64
 3    Mn                       915 non-null    float64
 4    P                        915 non-null    float64
 5    S                        915 non-null    float64
 6    Ni                       915 non-null    float64
 7    Cr                       915 non-null    float64
 8    Mo                       915 non-null    float64
 9    Cu                       915 non-null    float64
 10  V                         915 non-null    float64
 11   Al                       915 non-null    float64
 12   N                        915 non-null    float64
 13  Ceq                       915 non-null    float64
 14  Nb + Ta   

`df.info()` reports no null values in any column.

`Alloy code` is only an identifier, so the column is dropped.

In [54]:
df.drop(columns=['Alloy code'],axis=1,inplace=True)

The 0.2% proof stress (also written 0.2% OYS, Rp0.2 or Rp0,2) is the stress that produces a plastic strain of 0.2%. It is the usual engineering measure of yield stress, so the column is renamed accordingly.


In [55]:
df.rename({" 0.2% Proof Stress (MPa)":"Yield Stress"},inplace=True,axis=1)
df

Unnamed: 0,C,Si,Mn,P,S,Ni,Cr,Mo,Cu,V,Al,N,Ceq,Nb + Ta,Temperature (°C),Yield Stress,Tensile Strength (MPa),Elongation (%),Reduction in Area (%)
0,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.610,0.04,0.000,0.003,0.0066,0.0,0.0000,27,342,490,30,71
1,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.610,0.04,0.000,0.003,0.0066,0.0,0.0000,100,338,454,27,72
2,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.610,0.04,0.000,0.003,0.0066,0.0,0.0000,200,337,465,23,69
3,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.610,0.04,0.000,0.003,0.0066,0.0,0.0000,300,346,495,21,70
4,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.610,0.04,0.000,0.003,0.0066,0.0,0.0000,400,316,489,26,79
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
910,0.22,0.22,1.24,0.021,0.008,0.030,0.05,0.017,0.01,0.005,0.005,0.0116,0.0,0.0017,350,268,632,28,65
911,0.22,0.22,1.24,0.021,0.008,0.030,0.05,0.017,0.01,0.005,0.005,0.0116,0.0,0.0017,400,244,575,28,68
912,0.22,0.22,1.24,0.021,0.008,0.030,0.05,0.017,0.01,0.005,0.005,0.0116,0.0,0.0017,450,224,500,29,72
913,0.22,0.22,1.24,0.021,0.008,0.030,0.05,0.017,0.01,0.005,0.005,0.0116,0.0,0.0017,500,209,428,30,78


In [56]:
# strip the leading/trailing whitespace from every column name
df.columns=df.columns.str.strip()

### EDA

In [57]:
px.imshow(df.corr().round(2),height=1000,width=1000,text_auto=True)

1. Temperature is strongly correlated with % Elongation and % Reduction in Area.
2. Yield Stress is highly correlated with the V, Ni, Mn, Mo, Ceq, Si, Al, Cr, C and Cu contents, in decreasing order.
3. Tensile Strength is highly correlated with the V content and moderately correlated with Mo, Ni, Cr, C and Mn, in decreasing order.
4. Tensile Strength is also highly correlated with Yield Stress.
5. % Elongation and % Reduction in Area correlate most strongly with each other, followed by temperature.
6. % Elongation is moderately correlated with the P content and slightly correlated with the Al content.
7. % Reduction in Area is moderately correlated with Al, Ceq, Si and Mn.
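The rankings in the list above are read off the heatmap by eye. A minimal sketch of how such a ranking could be computed directly is shown here; the helper `rank_features_by_correlation` and the synthetic `demo` frame are hypothetical and only stand in for the real table.

```python
import numpy as np
import pandas as pd

def rank_features_by_correlation(frame, target, feature_cols):
    """Return Pearson correlations with `target`, strongest (by magnitude) first."""
    corr = frame[feature_cols].corrwith(frame[target])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Synthetic stand-in for the steel table; the numbers are illustrative only.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "V": rng.uniform(0.0, 0.3, 200),
    "Mn": rng.uniform(0.4, 1.5, 200),
    "P": rng.uniform(0.005, 0.03, 200),
})
demo["Yield Stress"] = 900 * demo["V"] + 40 * demo["Mn"] + rng.normal(0, 20, 200)

ranking = rank_features_by_correlation(demo, "Yield Stress", ["V", "Mn", "P"])
print(ranking.round(2))
```

Sorting by absolute value keeps strong negative correlations near the top, which is easy to miss when scanning a colour map.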

In [58]:
df.columns[:-4]

Index(['C', 'Si', 'Mn', 'P', 'S', 'Ni', 'Cr', 'Mo', 'Cu', 'V', 'Al', 'N',
       'Ceq', 'Nb + Ta', 'Temperature (°C)'],
      dtype='object')

In [59]:
# skewness after a log transform; log(0) = -inf, so columns containing zeros give nan
for i in df.columns[:-4]:
    print(i+" = {}".format(np.log(df[i]).skew()))

C = 0.41493170906795795
Si = 0.28163097443330803
Mn = 0.30648076747030994
P = -0.2172075701397653
S = -0.296647779505401
Ni = nan
Cr = nan
Mo = -0.7307489953712905
Cu = nan
V = nan
Al = 0.5315337198412445
N = -0.9572710852800583
Ceq = nan
Nb + Ta = nan
Temperature (°C) = -1.4080891562418343



divide by zero encountered in log



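After the log transform, Temperature, N and Mo are the most skewed features (skewness of -1.41, -0.96 and -0.73). The columns that print nan (Ni, Cr, Cu, V, Ceq and Nb + Ta) contain zeros, for which the log is undefined.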
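One way to avoid the nan values is to compare the raw skewness with the skewness after `np.log1p`, which is defined at zero. The sketch below assumes this substitution is acceptable for the analysis; `skew_report` and the small `demo` frame are hypothetical.

```python
import numpy as np
import pandas as pd

def skew_report(frame):
    """Skewness of each column, raw and after log1p (log(1 + x), defined at x = 0)."""
    rows = {}
    for col in frame.columns:
        rows[col] = {
            "raw": frame[col].skew(),
            "log1p": np.log1p(frame[col]).skew(),
        }
    return pd.DataFrame(rows).T

# Illustrative values only; the Ni column contains zeros on purpose.
demo = pd.DataFrame({
    "Ni": [0.0, 0.0, 0.03, 0.09, 0.5, 1.2],
    "Temperature (°C)": [27, 100, 200, 300, 400, 500],
})
report = skew_report(demo)
print(report)
```

For compositions well below 1 wt%, `log1p` is close to linear, so it may change the skewness very little; it is shown here only because it does not fail on zeros.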

In [60]:
df1=df.copy()

### Feature Selection for using Linear Regression

##### Feature selection using Pearson correlation

In [61]:
corr_matrix=df1.corr()
# keep only the upper triangle so that each pair appears once
# (np.bool was deprecated in NumPy 1.20; the built-in bool is equivalent)
corr_matrix=(corr_matrix.where(np.triu(np.ones(corr_matrix.shape),k=1).astype(bool)).stack())
corr1=corr_matrix[(corr_matrix.values>0.5)&(corr_matrix.values<1)].sort_values()
print(corr1)

Temperature (°C)  Reduction in Area (%)    0.565359
Ni                Cu                       0.578134
Elongation (%)    Reduction in Area (%)    0.604215
Cr                V                        0.631938
V                 Yield Stress             0.636588
Mn                Al                       0.694037
Mo                V                        0.722976
Mn                Ceq                      0.736526
Cr                Mo                       0.795223
Al                Ceq                      0.815686
dtype: float64

From the pairs above:
1. Yield stress increases with the V content.
2. Reduction in area increases with temperature and with elongation.
3. Al and Mn are highly correlated with Ceq.
4. Cr and Mo are highly correlated.
5. Mo and V are also highly correlated.
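The observations above motivate removing one feature from each highly correlated pair. A minimal sketch of automating that choice follows; the 0.7 threshold, the "drop the second member of the pair" rule and both helper names are assumptions for illustration, not the selection rule used in this notebook, where the columns are chosen by hand.

```python
import numpy as np
import pandas as pd

def correlated_pairs(frame, threshold=0.7):
    """Feature pairs whose absolute Pearson correlation exceeds `threshold`."""
    corr = frame.corr().abs()
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)  # each pair once
    pairs = corr.where(upper).stack()
    return pairs[pairs > threshold].sort_values(ascending=False)

def columns_to_drop(frame, threshold=0.7):
    """Greedy rule: for each remaining correlated pair, drop its second member."""
    drop = []
    for left, right in correlated_pairs(frame, threshold).index:
        if left not in drop and right not in drop:
            drop.append(right)
    return drop

# Synthetic features: "b" is almost a copy of "a", "c" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=300)
demo = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=300),
    "c": rng.normal(size=300),
})
print(correlated_pairs(demo))
print(columns_to_drop(demo))
```

Only the input features should be passed to such a helper; including the target columns would let it drop a feature merely because it predicts the target well.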

In [62]:
# drop Ceq (correlated with Al and Mn) and Mo (correlated with Cr and V)
df1.drop(columns=['Mo','Ceq'],inplace=True)

In [63]:
df1

Unnamed: 0,C,Si,Mn,P,S,Ni,Cr,Cu,V,Al,N,Nb + Ta,Temperature (°C),Yield Stress,Tensile Strength (MPa),Elongation (%),Reduction in Area (%)
0,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.04,0.000,0.003,0.0066,0.0000,27,342,490,30,71
1,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.04,0.000,0.003,0.0066,0.0000,100,338,454,27,72
2,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.04,0.000,0.003,0.0066,0.0000,200,337,465,23,69
3,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.04,0.000,0.003,0.0066,0.0000,300,346,495,21,70
4,0.12,0.36,0.52,0.009,0.003,0.089,0.97,0.04,0.000,0.003,0.0066,0.0000,400,316,489,26,79
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
910,0.22,0.22,1.24,0.021,0.008,0.030,0.05,0.01,0.005,0.005,0.0116,0.0017,350,268,632,28,65
911,0.22,0.22,1.24,0.021,0.008,0.030,0.05,0.01,0.005,0.005,0.0116,0.0017,400,244,575,28,68
912,0.22,0.22,1.24,0.021,0.008,0.030,0.05,0.01,0.005,0.005,0.0116,0.0017,450,224,500,29,72
913,0.22,0.22,1.24,0.021,0.008,0.030,0.05,0.01,0.005,0.005,0.0116,0.0017,500,209,428,30,78


In [64]:
X=df1.iloc[:,:-4]
Y=df1.iloc[:,-4:]

##### Applying Linear Regression

In [65]:
xtrain,xtest,ytrain,ytest=train_test_split(X,Y,test_size=0.3,random_state=2)

In [66]:
scaler=StandardScaler()

In [67]:
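# Note: the scaled arrays are only displayed here, not assigned,
# so the model below is fit on the unscaled xtrain and xtest.
scaler.fit_transform(xtrain)
scaler.transform(xtest)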

array([[ 0.42281541, -1.0462736 , -0.3914184 , ..., -0.0077286 ,
        -0.15491933, -0.33330727],
       [-0.92480342,  0.20662079,  1.6217468 , ..., -0.30621258,
        -0.15491933,  0.46839393],
       [-0.41944636, -0.59067564, -1.06247346, ...,  0.29075538,
        -0.15491933,  1.27009513],
       ...,
       [-1.43016048,  0.66221876, -0.94576823, ...,  0.1201931 ,
        -0.15491933, -0.86777474],
       [ 1.77043423, -0.13507768, -0.33306578, ...,  0.84508277,
        -0.15491933,  0.20116019],
       [-0.41944636,  2.37071111, -0.09965532, ...,  1.22884789,
        -0.15491933, -1.79240346]])
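`fit_transform` and `transform` return new arrays and leave their inputs untouched, so the scaled values have to be kept or the scaler has to be placed inside a pipeline. The sketch below shows both options on synthetic data; all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the composition/temperature features.
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 500, size=(120, 3))
y_demo = X_demo @ np.array([0.5, -0.2, 0.1]) + rng.normal(0, 1, 120)
X_tr, X_te = X_demo[:90], X_demo[90:]
y_tr, y_te = y_demo[:90], y_demo[90:]

# Option 1: keep the arrays returned by the scaler and train on them.
scaler = StandardScaler()
X_tr_sc = scaler.fit_transform(X_tr)   # statistics learned from training rows only
X_te_sc = scaler.transform(X_te)       # same statistics reused for the test rows
manual = LinearRegression().fit(X_tr_sc, y_tr)

# Option 2: let a pipeline apply the scaler before the model on every call.
piped = make_pipeline(StandardScaler(), LinearRegression()).fit(X_tr, y_tr)

same = np.allclose(manual.predict(X_te_sc), piped.predict(X_te))
print(same)
```

For ordinary least squares the predictions do not depend on feature scaling, but for penalised models such as Lasso the scaling changes which coefficients are shrunk, so the pipeline form is the safer habit.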

In [68]:
lr1=LinearRegression()

In [69]:
lr1.fit(xtrain,ytrain)

LinearRegression()

In [70]:
y_pred=lr1.predict(xtest)

In [71]:
ytest['Yield Stress'].values

array([201, 409, 162, 275, 533, 580, 229, 395, 260, 170, 207, 240, 189,
       278, 426, 393, 335, 276, 222, 358, 268, 510, 306, 177, 228, 349,
       545, 172, 275, 203, 562, 459, 462, 264, 186, 545, 660, 650, 514,
       194, 360, 270, 394, 185, 459, 229, 411, 213, 459, 290, 190, 403,
       328, 193, 513, 198, 226, 247, 262, 162, 469, 199, 403, 178, 260,
       240, 223, 136, 516, 403, 426, 262, 537, 530, 506, 213, 404, 621,
       450, 237, 194, 199, 325, 197, 218, 241, 237, 266, 408, 304, 483,
       258, 358, 300, 507, 467, 176, 440, 456, 640, 359, 183, 542, 214,
       500, 330, 690, 168, 258, 439, 274, 259, 137, 279, 491, 261, 440,
       449, 198, 305, 175, 191, 230, 378, 200, 464, 450, 240, 446, 267,
       179, 500, 315, 150, 189, 623, 473, 508, 285, 229, 335, 451, 353,
       209, 284, 260, 382, 243, 192, 499, 249, 186, 655, 459, 276, 419,
       206, 437, 378, 513, 290, 435, 275, 488, 513, 428, 566, 395, 244,
       176, 438, 511, 278, 344, 500, 198, 303, 140, 270, 496, 54

In [72]:
# r2 scores
for i,col in enumerate(Y.columns):
    print("{} r2 score = {}".format(col,r2_score(ytest[col].values,y_pred[:,i])))

Yield Stress r2 score = 0.8240527649880142
Tensile Strength (MPa) r2 score = 0.5033779489826307
Elongation (%) r2 score = 0.4579561662460716
Reduction in Area (%) r2 score = 0.37559150896819415


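Only yield stress is predicted reasonably well (R² ≈ 0.82). The other three targets score between 0.38 and 0.50, so a linear model fits them poorly.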

### Using Lasso

In [73]:
ls=Lasso(alpha=1)

In [74]:
X2=df.iloc[:,:-4]
Y2=df.iloc[:,-4:]

In [75]:
xtrain2,xtest2,ytrain2,ytest2=train_test_split(X2,Y2,test_size=0.3,shuffle=True,random_state=2)

In [76]:
scaler2=StandardScaler()

In [77]:
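# Note: as above, the scaled arrays are displayed but not assigned,
# so the Lasso model below is fit on the unscaled xtrain2 and xtest2.
scaler2.fit_transform(xtrain2)
scaler2.transform(xtest2)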

array([[ 0.42281541, -1.0462736 , -0.3914184 , ..., -0.53814099,
        -0.15491933, -0.33330727],
       [-0.92480342,  0.20662079,  1.6217468 , ...,  1.68299547,
        -0.15491933,  0.46839393],
       [-0.41944636, -0.59067564, -1.06247346, ..., -0.53814099,
        -0.15491933,  1.27009513],
       ...,
       [-1.43016048,  0.66221876, -0.94576823, ..., -0.53814099,
        -0.15491933, -0.86777474],
       [ 1.77043423, -0.13507768, -0.33306578, ..., -0.53814099,
        -0.15491933,  0.20116019],
       [-0.41944636,  2.37071111, -0.09965532, ..., -0.53814099,
        -0.15491933, -1.79240346]])

In [78]:
ls.fit(xtrain2,ytrain2)

Lasso(alpha=1)

In [79]:
y_pred2=ls.predict(xtest2)

In [80]:
# r2 scores
# compare against ytest2, the test targets that belong to this split
for i,col in enumerate(Y2.columns):
    print("{} r2 score = {}".format(col,r2_score(ytest2[col].values,y_pred2[:,i])))

Yield Stress r2 score = 0.8175614191069538
Tensile Strength (MPa) r2 score = 0.5946670437110766
Elongation (%) r2 score = 0.2243664727915864
Reduction in Area (%) r2 score = 0.1941528865077321


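Lasso improves only the tensile strength score (0.50 to 0.59). Yield stress is almost unchanged (0.82), and the elongation and reduction in area scores fall to 0.22 and 0.19. Note that this model is trained on all features in `df`, including Mo and Ceq.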
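The Lasso result depends on the penalty strength, which is fixed at `alpha=1` here. A hedged sketch of choosing `alpha` by cross-validation, with the scaler inside the pipeline, is given below; the alpha grid, the synthetic data and the step names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-target problem; three of the five features matter.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(150, 5))
coef = np.array([[3.0, 0.0], [0.0, 2.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
Y_demo = X_demo @ coef + rng.normal(0, 0.1, size=(150, 2))

pipe = Pipeline([
    ("scale", StandardScaler()),       # refit on each training fold
    ("lasso", Lasso(max_iter=10000)),
])
search = GridSearchCV(
    pipe,
    param_grid={"lasso__alpha": [0.001, 0.01, 0.1, 1.0]},  # hypothetical grid
    cv=5,
    scoring="r2",  # averaged over the targets
)
search.fit(X_demo, Y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

Because the targets in this notebook have very different ranges (hundreds of MPa versus tens of percent), a single `alpha` penalises them unequally; fitting one model per target, or scaling the targets first, are possible alternatives.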

### Using Random Forest

In [81]:
from sklearn.ensemble import RandomForestRegressor

In [82]:
sc_x = StandardScaler()
sc_x.fit(xtrain2)
x_train_sc = sc_x.transform(xtrain2)
x_test_sc = sc_x.transform(xtest2)

sc_y = StandardScaler()
sc_y.fit(ytrain2)
y_train_sc = sc_y.transform(ytrain2)
y_test_sc = sc_y.transform(ytest2)

In [83]:
# To be used later while visualizing results
actual_yield_strength = np.transpose(ytest2.values)[0]
actual_tensile_strength = np.transpose(ytest2.values)[1]
actual_pct_elongation = np.transpose(ytest2.values)[2]
actual_pct_reduction_area = np.transpose(ytest2.values)[3]

In [84]:
# criterion='mse' was deprecated in scikit-learn 1.0; 'squared_error' is the equivalent name
regressor = RandomForestRegressor(n_estimators=100, criterion='squared_error')
regressor.fit(x_train_sc, y_train_sc)

RandomForestRegressor()

In [85]:
y_rf_pred_sc = regressor.predict(x_test_sc)

In [86]:
r2_rf = r2_score(y_test_sc, y_rf_pred_sc)

In [87]:
print('R\u00b2_score = ' + str(round(r2_rf, 2)))

R²_score = 0.89


In [88]:
# Transform the scaled predictions back to the original units
y_rf_pred = sc_y.inverse_transform(y_rf_pred_sc)

# Split the predictions into one array per target
rf_predicted_yield_strength = np.transpose(y_rf_pred)[0]
rf_predicted_tensile_strength = np.transpose(y_rf_pred)[1]
rf_predicted_pct_elongation = np.transpose(y_rf_pred)[2]
rf_predicted_pct_reduction_area = np.transpose(y_rf_pred)[3]

In [89]:
r2_yield_strength_rf = r2_score(actual_yield_strength, rf_predicted_yield_strength)
r2_tensile_strength_rf = r2_score(actual_tensile_strength, rf_predicted_tensile_strength)
r2_pct_elongation_rf = r2_score(actual_pct_elongation, rf_predicted_pct_elongation)
r2_pct_reduction_area_rf = r2_score(actual_pct_reduction_area, rf_predicted_pct_reduction_area)
print('R\u00b2_score for 0.2% yield Strength = ' + str(round(r2_yield_strength_rf, 2)))
print('R\u00b2_score for Tensile strength    = ' + str(round(r2_tensile_strength_rf, 2)))
print('R\u00b2_score for % Elongation        = ' + str(round(r2_pct_elongation_rf, 2)))
print('R\u00b2_score for % Reduction in Area = ' + str(round(r2_pct_reduction_area_rf, 2)))

R²_score for 0.2% yield Strength = 0.92
R²_score for Tensile strength    = 0.95
R²_score for % Elongation        = 0.86
R²_score for % Reduction in Area = 0.83
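The four per-target scores average to (0.92 + 0.95 + 0.86 + 0.83) / 4 = 0.89, which matches the overall score computed on the scaled targets. This follows from two properties of `r2_score`: for multi-output data it returns the uniform average of the per-target scores by default, and each per-target score is unchanged by rescaling that target. The sketch below checks both on synthetic data; the arrays and scale factors are hypothetical.

```python
import numpy as np
from sklearn.metrics import r2_score

# Synthetic "true" values and noisy predictions for four targets.
rng = np.random.default_rng(3)
y_true = rng.normal(size=(50, 4))
y_pred = y_true + rng.normal(0, [0.2, 0.3, 0.4, 0.5], size=(50, 4))

per_target = r2_score(y_true, y_pred, multioutput="raw_values")
overall = r2_score(y_true, y_pred)  # default: multioutput="uniform_average"

# Apply the same affine rescaling to truth and prediction, column by column.
scale = np.array([100.0, 150.0, 10.0, 20.0])
shift = np.array([300.0, 500.0, 25.0, 70.0])
per_target_rescaled = r2_score(
    y_true * scale + shift, y_pred * scale + shift, multioutput="raw_values"
)

print(per_target.round(3), round(overall, 3))
print(np.allclose(per_target, per_target_rescaled))
```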


In [90]:
ytest2['Yield Stress Pred']=y_rf_pred.T[0]
ytest2['Tensile Strength (MPa) Pred']=y_rf_pred.T[1]
ytest2['Elongation (%) Pred']=y_rf_pred.T[2]
ytest2['Reduction in Area (%) Pred']=y_rf_pred.T[3]

In [91]:
y_rf_pred

array([[218.69, 497.34,  26.5 ,  53.87],
       [392.46, 509.25,  27.43,  79.14],
       [169.56, 278.89,  43.4 ,  88.25],
       ...,
       [299.32, 478.  ,  22.99,  70.03],
       [192.57, 431.68,  34.05,  66.96],
       [539.9 , 666.42,  17.43,  64.06]])

In [92]:
ytest2

Unnamed: 0,Yield Stress,Tensile Strength (MPa),Elongation (%),Reduction in Area (%),Yield Stress Pred,Tensile Strength (MPa) Pred,Elongation (%) Pred,Reduction in Area (%) Pred
311,201,454,27,56,218.69,497.34,26.50,53.87
779,409,517,28,79,392.46,509.25,27.43,79.14
565,162,270,47,89,169.56,278.89,43.40,88.25
308,275,490,33,64,285.01,466.14,30.59,64.33
783,533,636,24,72,500.87,621.78,22.56,69.71
...,...,...,...,...,...,...,...,...
815,465,570,21,77,489.05,619.56,22.09,72.09
284,378,614,23,51,234.39,544.35,26.10,49.58
12,296,439,25,77,299.32,478.00,22.99,70.03
304,199,432,33,61,192.57,431.68,34.05,66.96


In [93]:
px.line(ytest2.sort_index())

### Conclusion
The random forest regressor outperforms linear and Lasso regression on every target and overall. Its per-target R² ranges from 0.83 (reduction in area) to 0.95 (tensile strength), and the overall R² is 0.89, whereas the linear models reach about 0.82 only for yield stress. It is also computationally cheap to train, needs little manual tuning, and can fit complex data containing regressions within clusters, which makes it a good choice for predicting the mechanical properties of low-alloy steels.
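The comparison above rests on a single 70/30 split and a forest trained without a fixed seed, so the exact scores vary between runs. A minimal sketch of a more repeatable comparison using k-fold cross-validation with fixed seeds is shown below; the synthetic data, the fold count and the seed values are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic two-target problem with non-linear structure.
rng = np.random.default_rng(4)
X_demo = rng.uniform(-1, 1, size=(200, 4))
Y_demo = np.column_stack([
    np.sin(3 * X_demo[:, 0]) + X_demo[:, 1] ** 2,
    X_demo[:, 2] * X_demo[:, 3],
]) + rng.normal(0, 0.05, size=(200, 2))

cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# One R² value per fold and model, averaged over the targets.
scores = {
    name: cross_val_score(model, X_demo, Y_demo, cv=cv, scoring="r2")
    for name, model in models.items()
}
for name, values in scores.items():
    print(name, values.mean().round(3), values.std().round(3))
```

Reporting the mean and spread over folds gives a sense of how much of the gap between models is due to the particular split.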