<a href="https://colab.research.google.com/github/knc6/jarvis-tools-notebooks/blob/master/jarvis-tools-notebooks/Basic_ML.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Introduction

The mechanical properties of a material determine how it behaves under load. The elastic modulus governs how much the material deflects under a given load, and the strength sets the stress it can withstand before it fails. Ductility also plays a significant role in determining when a material breaks once it is loaded beyond its elastic limit. Because every mechanical system is subjected to loads during operation, it is important to understand how its constituent materials behave. Many parameters affect mechanical properties, including microstructure, heat treatment, processing method, and composition.

Composition and heat treatment are among the most important of these parameters, and in many engineering applications they are strong indicators of a material's mechanical properties. Their effects are usually examined through mechanical testing, which can be costly in both time and money. Machine learning (ML) methods have proven successful at predicting a large number of material properties, and ML studies have demonstrated the potential of this emerging discipline to accelerate the discovery and design of new and improved materials. However, there is still no standardized set of protocols for applying this approach systematically across applications, so establishing composition-processing-structure-property relationships remains an arduous task.

#### Dataset
#### References

1. https://link.springer.com/article/10.1186/2193-9772-3-8

2. https://www.kaggle.com/code/emrzcn/prediction-of-mechanical-properties-of-steels/notebook

This work uses the fatigue dataset for steel from the National Institute for Materials Science (NIMS) MatNavi, one of the largest databases in the world with details on composition, mill product (upstream) features, and subsequent processing (heat treatment) parameters. The database comprises carbon and low-alloy steels, carburizing steels, and spring steels. The target property for the predictive models is the fatigue strength data, which pertain to rotating bending fatigue tests at room temperature. The features in the dataset fall into the following categories:

*  Chemical composition - %C, %Si, %Mn, %P, %S, %Ni, %Cr, %Cu, %Mo (all in wt. %)
*  Upstream processing details - ingot size, reduction ratio, non-metallic inclusions (Agrawal et al.)
*  Heat treatment conditions - temperature, time and other process conditions for normalizing, through-hardening, carburizing-quenching and tempering processes
*  Mechanical properties - YS, UTS, %EL, %RA, hardness, Charpy impact value (J/cm²), fatigue strength [3]

The data used in this work has 437 instances/rows, **25 features/columns (composition and processing parameters)**, and **1 target property (fatigue strength).** The 437 data instances include 371 carbon and low alloy steels, 48 carburizing steels, and 18 spring steels. This data pertains to various heats of each grade of steel and different processing conditions.

In [1]:
# https://link.springer.com/article/10.1186/2193-9772-3-8#Sec29
!wget https://static-content.springer.com/esm/art%3A10.1186%2F2193-9772-3-8/MediaObjects/40192_2013_16_MOESM1_ESM.xlsx -O 40192_2013_16_MOESM1_ESM.xlsx
import pandas as pd
df = pd.read_excel('40192_2013_16_MOESM1_ESM.xlsx')
df.head()

--2024-07-08 00:14:06--  https://static-content.springer.com/esm/art%3A10.1186%2F2193-9772-3-8/MediaObjects/40192_2013_16_MOESM1_ESM.xlsx
Resolving static-content.springer.com (static-content.springer.com)... 151.101.0.95, 151.101.64.95, 151.101.128.95, ...
Connecting to static-content.springer.com (static-content.springer.com)|151.101.0.95|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 91132 (89K) [application/octet-stream]
Saving to: ‘40192_2013_16_MOESM1_ESM.xlsx’


2024-07-08 00:14:07 (6.04 MB/s) - ‘40192_2013_16_MOESM1_ESM.xlsx’ saved [91132/91132]





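|   | Sl. No. | NT | THT | THt | THQCr | CT | Ct | DT | Dt | QmT | ... | S | Ni | Cr | Cu | Mo | RedRatio | dA | dB | dC | Fatigue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 885 | 30 | 0 | 0 | 30 | 0.0 | 30.0 | 0.0 | 30 | ... | 0.022 | 0.01 | 0.02 | 0.01 | 0.0 | 825 | 0.07 | 0.02 | 0.04 | 232 |
| 1 | 2 | 885 | 30 | 0 | 0 | 30 | 0.0 | 30.0 | 0.0 | 30 | ... | 0.017 | 0.08 | 0.12 | 0.08 | 0.0 | 610 | 0.11 | 0.0 | 0.04 | 235 |
| 2 | 3 | 885 | 30 | 0 | 0 | 30 | 0.0 | 30.0 | 0.0 | 30 | ... | 0.015 | 0.02 | 0.03 | 0.01 | 0.0 | 1270 | 0.07 | 0.02 | 0.0 | 235 |
| 3 | 4 | 885 | 30 | 0 | 0 | 30 | 0.0 | 30.0 | 0.0 | 30 | ... | 0.024 | 0.01 | 0.02 | 0.01 | 0.0 | 1740 | 0.06 | 0.0 | 0.0 | 241 |
| 4 | 5 | 885 | 30 | 0 | 0 | 30 | 0.0 | 30.0 | 0.0 | 30 | ... | 0.022 | 0.01 | 0.02 | 0.02 | 0.0 | 825 | 0.04 | 0.02 | 0.0 | 225 |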


In [2]:
df.shape

(437, 27)
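
The shape agrees with the dataset description: 27 columns correspond to the serial number (`Sl. No.`), 25 features, and the `Fatigue` target. A minimal sketch of separating features from the target follows, assuming the column names shown in `df.head()`. The helper name `split_features_target` is hypothetical, and the demo frame holds only a handful of columns from the first three printed rows.

```python
import pandas as pd

def split_features_target(frame, id_col="Sl. No.", target_col="Fatigue"):
    """Return (X, y): every column except the id and target, plus the target."""
    X = frame.drop(columns=[id_col, target_col])
    y = frame[target_col]
    return X, y

# Illustration only: a few columns from the first three rows of df.head()
demo = pd.DataFrame({
    "Sl. No.": [1, 2, 3],
    "NT": [885, 885, 885],
    "Ni": [0.01, 0.08, 0.02],
    "RedRatio": [825, 610, 1270],
    "Fatigue": [232, 235, 235],
})

X_demo, y_demo = split_features_target(demo)
print(X_demo.shape, y_demo.shape)
```

On the full frame, `X, y = split_features_target(df)` should give `X.shape == (437, 25)` if the column layout matches the description above.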

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 437 entries, 0 to 436
Data columns (total 27 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sl. No.   437 non-null    int64  
 1   NT        437 non-null    int64  
 2   THT       437 non-null    int64  
 3   THt       437 non-null    int64  
 4   THQCr     437 non-null    int64  
 5   CT        437 non-null    int64  
 6   Ct        437 non-null    float64
 7   DT        437 non-null    float64
 8   Dt        437 non-null    float64
 9   QmT       437 non-null    int64  
 10  TT        437 non-null    int64  
 11  Tt        437 non-null    int64  
 12  TCr       437 non-null    float64
 13  C         437 non-null    float64
 14  Si        437 non-null    float64
 15  Mn        437 non-null    float64
 16  P         437 non-null    float64
 17  S         437 non-null    float64
 18  Ni        437 non-null    float64
 19  Cr        437 non-null    float64
 20  Cu        437 non-null    float64
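
The `df.info()` output reports 437 non-null values in each listed column, all with numeric dtypes. A minimal sketch of checking this programmatically is shown here; the helper `summarize_quality` is hypothetical, and the demo frame (including its `grade` column) is made up purely to exercise both checks.

```python
import pandas as pd

def summarize_quality(frame):
    """Count missing values and list non-numeric columns in a DataFrame."""
    n_missing = int(frame.isna().sum().sum())
    non_numeric = [
        col for col in frame.columns
        if not pd.api.types.is_numeric_dtype(frame[col])
    ]
    return {"missing": n_missing, "non_numeric": non_numeric}

# Made-up frame with one missing value and one text column (not from the dataset)
demo = pd.DataFrame({"NT": [885, 885], "Ct": [0.0, None], "grade": ["a", "b"]})
report = summarize_quality(demo)
print(report)
```

Calling `summarize_quality(df)` on the steel data would be expected to report no missing values for the columns listed in the output above.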

In [4]:
df.describe().T

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Sl. No. | 437.0 | 219.0 | 126.295289 | 1.0 | 110.0 | 219.0 | 328.0 | 437.0 |
| NT | 437.0 | 872.299771 | 26.212073 | 825.0 | 865.0 | 870.0 | 870.0 | 930.0 |
| THT | 437.0 | 737.643021 | 280.036541 | 30.0 | 845.0 | 845.0 | 855.0 | 865.0 |
| THt | 437.0 | 25.949657 | 10.263824 | 0.0 | 30.0 | 30.0 | 30.0 | 30.0 |
| THQCr | 437.0 | 10.654462 | 7.841437 | 0.0 | 8.0 | 8.0 | 8.0 | 24.0 |
| CT | 437.0 | 128.855835 | 281.743539 | 30.0 | 30.0 | 30.0 | 30.0 | 930.0 |
| Ct | 437.0 | 40.502059 | 126.924697 | 0.0 | 0.0 | 0.0 | 0.0 | 540.0 |
| DT | 437.0 | 123.699844 | 267.128933 | 30.0 | 30.0 | 30.0 | 30.0 | 903.333 |
| Dt | 437.0 | 4.843936 | 15.700076 | 0.0 | 0.0 | 0.0 | 0.0 | 70.2 |
| QmT | 437.0 | 35.491991 | 19.419277 | 30.0 | 30.0 | 30.0 | 30.0 | 140.0 |
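
In this summary, several processing columns (`CT`, `Ct`, `DT`, `Dt`, `QmT`) have the same value for the minimum and all three quartiles, so most rows share a single value in those columns. One way to quantify this is the share of rows taken by each column's most frequent value. The sketch below uses a hypothetical helper, `dominant_value_share`, and made-up toy values rather than rows from the dataset.

```python
import pandas as pd

def dominant_value_share(frame):
    """For each column, return the fraction of rows holding its most frequent value."""
    shares = {
        col: frame[col].value_counts(normalize=True).iloc[0]
        for col in frame.columns
    }
    return pd.Series(shares).sort_values(ascending=False)

# Toy values for illustration only (not taken from the dataset)
demo = pd.DataFrame({
    "CT": [30, 30, 30, 930],
    "NT": [825, 865, 870, 930],
})
shares = dominant_value_share(demo)
print(shares)
```

Columns with a share close to 1 carry little variation across samples, which is worth keeping in mind when interpreting feature importance later on.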


## Variable Description

* C % - Carbon
* Si % - Silicon
* Mn % - Manganese
* P % - Phosphorus
* S % - Sulphur
* Ni % - Nickel
* Cr % - Chromium
* Cu % - Copper
* Mo % - Molybdenum
* NT - Normalizing Temperature
* THT - Through Hardening Temperature
* THt - Through Hardening Time
* THQCr - Cooling Rate for Through Hardening
* CT - Carburization Temperature
* Ct - Carburization Time
* DT - Diffusion Temperature
* Dt - Diffusion time
* QmT - Quenching Media Temperature (for Carburization)
* TT - Tempering Temperature
* Tt - Tempering Time
* TCr - Cooling Rate for Tempering
* RedRatio - Reduction Ratio (Ingot to Bar)
* dA - Area Proportion of Inclusions Deformed by Plastic Work
* dB - Area Proportion of Inclusions Occurring in Discontinuous Array
* dC - Area Proportion of Isolated Inclusions
* Fatigue - Rotating Bending Fatigue Strength (10^7 Cycles)
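
The variables listed here can be grouped into the feature categories from the dataset description. The grouping below is one reading of that description (inclusion measures `dA`, `dB`, `dC` are placed with the upstream processing details), and the names `FEATURE_GROUPS`, `TARGET`, and `select_group` are hypothetical.

```python
# Column groups following the dataset description; column names as listed above.
FEATURE_GROUPS = {
    "composition": ["C", "Si", "Mn", "P", "S", "Ni", "Cr", "Cu", "Mo"],
    "heat_treatment": [
        "NT", "THT", "THt", "THQCr", "CT", "Ct",
        "DT", "Dt", "QmT", "TT", "Tt", "TCr",
    ],
    "upstream": ["RedRatio", "dA", "dB", "dC"],
}
TARGET = "Fatigue"

def select_group(frame, group):
    """Return the sub-frame holding the columns of one feature group."""
    return frame[FEATURE_GROUPS[group]]

# Flatten the groups to confirm the feature count stated in the description
all_features = [col for cols in FEATURE_GROUPS.values() for col in cols]
print(len(all_features))
```

With the data loaded, `select_group(df, "composition")` would return only the nine composition columns, which is convenient for comparing models trained on different feature subsets.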




### Visualization

In [8]:
!pip install -q ydata-profiling

In [9]:
# pandas_profiling has been renamed to ydata-profiling; the old package
# fails with a PydanticImportError under pydantic 2.x.
from ydata_profiling import ProfileReport
profile_report = ProfileReport(df)
profile_report