# ARIM-Academy: Use Case


## Dataset

<div style="border:1px solid #000; padding:10px;">
    
The "creep_data.xlsx" file used in this notebook is a creep dataset covering 177 Ti alloys.

[1] S. Sucheta, R. Ashish and S. A. Kumar, "Machine learning assisted interpretation of creep and fatigue life in titanium alloys", *APL Machine Learning*, **1**, 016102 (2023)

"creep_data.xlsx" contains the alloy compositions together with the creep test conditions and the heat treatment data.

<br>  
<img src="./img/m_016102_1_f1.jpeg" width="50%"><br> 

---



**Chemical composition** 
1. Titanium (wt %) Ti 
1. Aluminium (wt %) Al 
1. Vanadium (wt %) V 
1. Carbon (wt %) C 
1. Nitrogen (wt %) N 
1. Oxygen (wt %) O 
1. Hydrogen (wt %) H 
1. Iron (wt %) Fe 
1. Silicon (wt %) Si 
1. Tin (wt %) Sn 
1. Niobium (wt %) Nb 
1. Molybdenum (wt %) Mo 
1. Zirconium (wt %) Zr 
1. Boron (wt %) B 
1. Chromium (wt %) Cr 


**Experimental parameters**  
1. Rupture strain (%) ϵr 
1. Temperature of measurement (°C) Tcreep 
1. Steady-state strain rate (1/s) ϵ̇ 
1. Applied stress (MPa) σ 

**Heat treatment conditions**  
1. Aging temperature (°C) Tage 
1. Aging time (hrs) tage 
1. Solution temperature (°C) Tsol
1. Solution time (hrs) tsol

### Connecting to the course materials
Run this cell when working online in Google Colab. (<font color="red">Not needed if you are not connected to Google Colab</font>)

In [None]:
!git clone https://github.com/ARIM-Usecase/Example_2.git
%cd Example_2

### Importing libraries
The Python libraries needed to run the code are loaded with import statements.
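Before running the import cell, it can help to check that the required packages are installed in the current environment. The following is a minimal sketch using only the standard library. The list of names is an assumption based on the imports used in this notebook, and `openpyxl` is included on the assumption that pandas relies on it to read `.xlsx` files.

```python
import importlib.util

# Import names (not pip package names) assumed to be needed by this notebook.
# openpyxl is listed because pandas typically uses it to read .xlsx files.
required = ["pandas", "numpy", "sklearn", "tensorflow", "openpyxl"]

# find_spec returns None when a top-level module cannot be found
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Any name reported as missing can then be installed with the package manager of your environment before continuing.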

In [1]:
# General-purpose libraries
import pandas as pd
import numpy as np

# Machine learning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU
from tensorflow.keras.optimizers import Adam


## 1. Loading and preprocessing the dataset

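### Loading the sample file
The pandas `read_excel()` function reads an Excel file and converts it into a pandas DataFrame. Here `data.xlsx` is read with `header=1`, so the second row of the sheet supplies the column names, and the result is stored in the variable `data`. As the output below shows, the rows are indexed 0 to 166 (167 samples) and there are 16 columns.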

In [3]:
# Load the data (header=1: the second row of the sheet holds the column names)
data = pd.read_excel('data.xlsx', header=1)
data

| | Al | Ca | Cu | Fe | K | Mg | Mn | Na | Zn | grouped | tea | Concentration | time | teaConc | tea_org | tea_var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.297 | 4.356 | 0.031290 | 0.067 | 99.06 | 3.531 | 1.455 | 0.541 | 0.131 | Black Turkish 1 2 | BT | 1 | 2 | BT1 | black | turki |
| 1 | 4.267 | 4.118 | 0.031290 | 0.079 | 106.50 | 3.378 | 1.542 | 0.603 | 0.126 | Black Turkish 1 2 | BT | 1 | 2 | BT1 | black | turki |
| 2 | 4.088 | 4.763 | 0.033370 | 0.084 | 114.00 | 4.763 | 1.838 | 1.058 | 0.156 | Black Turkish 1 5 | BT | 1 | 5 | BT1 | black | turki |
| 3 | 4.338 | 4.556 | 0.033370 | 0.091 | 122.60 | 5.005 | 2.269 | 0.958 | 0.162 | Black Turkish 1 5 | BT | 1 | 5 | BT1 | black | turki |
| 4 | 4.732 | 5.138 | 0.035514 | 0.110 | 132.40 | 5.626 | 2.998 | 1.510 | 0.165 | Black Turkish 1 10 | BT | 1 | 10 | BT1 | black | turki |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 163 | 16.690 | 8.895 | 0.153000 | 0.236 | 323.40 | 20.450 | 10.420 | 6.360 | 0.335 | Green Ceylan 3 30 | GC | 3 | 30 | GC3 | green | ceylon |
| 164 | 17.620 | 8.909 | 0.177000 | 0.261 | 334.20 | 23.486 | 11.330 | 7.133 | 0.351 | Green Ceylan 3 45 | GC | 3 | 45 | GC3 | green | ceylon |
| 165 | 17.920 | 9.056 | 0.180000 | 0.266 | 332.30 | 22.840 | 11.290 | 7.609 | 0.358 | Green Ceylan 3 45 | GC | 3 | 45 | GC3 | green | ceylon |
| 166 | 17.820 | 9.128 | 0.175000 | 0.273 | 367.30 | 24.560 | 12.110 | 8.537 | 0.372 | Green Ceylan 3 60 | GC | 3 | 60 | GC3 | green | ceylon |
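The effect of `header=1` can be illustrated on a small in-memory table. The sketch below uses `pd.read_csv`, which accepts the same `header` argument as `read_excel`, so that no Excel file is needed. The title line is hypothetical, and the two data rows reuse a few values from the output above.

```python
import io
import pandas as pd

# A toy table: row 0 is a (hypothetical) title line, row 1 holds the column names.
raw = (
    "Tea mineral table,,\n"   # row 0: skipped because header=1
    "Al,Ca,tea\n"             # row 1: used as the header
    "3.297,4.356,BT\n"
    "4.267,4.118,BT\n"
)

# header=1 tells pandas to take the column names from the second row
toy = pd.read_csv(io.StringIO(raw), header=1)

print(toy.columns.tolist())
print(toy.shape)
```

With `header=0` (the default) the title line would have been taken as the column names instead.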


## 2. Machine learning models
### Explanatory and target variables
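The machine learning models are built from the loaded data matrix (`data`). The categorical column `tea` is one-hot encoded; the explanatory variables (X) are `Concentration`, `time`, and the encoded `tea` columns, and the target variables (y) are the nine element columns `Al`, `Ca`, `Cu`, `Fe`, `K`, `Mg`, `Mn`, `Na`, and `Zn`.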

In [3]:
# One-hot encoding (scikit-learn >= 1.2 uses `sparse_output`; older versions use `sparse`)
encoder = OneHotEncoder(drop='first', sparse_output=False)
tea_encoded = encoder.fit_transform(data[['tea']])

# Add the encoded tea columns to the DataFrame
encoded_columns = encoder.get_feature_names_out(['tea'])
data[encoded_columns] = tea_encoded

# Extract the explanatory variables and the target variables
X = data[['Concentration', 'time'] + list(encoded_columns)]
y = data[['Al', 'Ca', 'Cu', 'Fe', 'K', 'Mg', 'Mn', 'Na', 'Zn']]

In [4]:
X

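| | Concentration | time | tea_BT | tea_GC | tea_GT |
|---|---|---|---|---|---|
| 0 | 1 | 2 | 1.0 | 0.0 | 0.0 |
| 1 | 1 | 2 | 1.0 | 0.0 | 0.0 |
| 2 | 1 | 5 | 1.0 | 0.0 | 0.0 |
| 3 | 1 | 5 | 1.0 | 0.0 | 0.0 |
| 4 | 1 | 10 | 1.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... |
| 163 | 3 | 30 | 0.0 | 1.0 | 0.0 |
| 164 | 3 | 45 | 0.0 | 1.0 | 0.0 |
| 165 | 3 | 45 | 0.0 | 1.0 | 0.0 |
| 166 | 3 | 60 | 0.0 | 1.0 | 0.0 |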


### Standardization

In [None]:
# Standardize the data (zero mean, unit variance per column)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
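`StandardScaler` transforms each column as z = (x - mean) / std, using the population standard deviation. A minimal NumPy sketch of the same formula is shown below on two toy columns: a balanced 1/2/3 level column and an indicator that is 1 for a quarter of the rows. Assuming the real columns are balanced in the same way, these toy columns give the values seen in the standardized arrays in this notebook.

```python
import numpy as np

def standardize(x):
    """Column standardization: subtract the mean, divide by the population std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # np.std uses ddof=0 by default

conc = standardize([1, 2, 3])        # balanced three-level column
flag = standardize([1, 0, 0, 0])     # indicator set for one row in four

print(conc)
print(flag)
```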

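### Data set splitting
The data are split with `test_size=0.2`, giving 80% training data and 20% test data. The features passed to the split (`X_scaled`) were already standardized in the previous step, so `X_train` and `X_test` are on the standardized scale.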

In [5]:
# Split into training data and test data
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
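A common alternative ordering is to split first and fit the scaler on the training portion only, so that no statistics from the test data enter the preprocessing. The sketch below shows this on synthetic data; the `*_demo`, `X_tr` and `X_te` names are hypothetical and the data are not the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data used only for illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 2))
y_demo = X_demo @ np.array([1.5, -0.5]) + rng.normal(scale=0.1, size=100)

# 1) Split first: test_size=0.2 keeps 80% for training and 20% for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# 2) Fit the scaler on the training data only, then apply it to both parts
scaler_demo = StandardScaler().fit(X_tr)
X_tr_s = scaler_demo.transform(X_tr)
X_te_s = scaler_demo.transform(X_te)

print(X_tr.shape, X_te.shape)
```

With this ordering the training columns have exactly zero mean and unit variance, while the test columns are only approximately standardized.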

In [6]:
X_train

array([[ 0.        ,  1.76272636, -0.57735027, -0.57735027, -0.57735027],
       [-1.22474487,  1.01641076, -0.57735027,  1.73205081, -0.57735027],
       [-1.22474487, -0.97376416,  1.73205081, -0.57735027, -0.57735027],
       [-1.22474487,  0.27009517, -0.57735027, -0.57735027,  1.73205081],
       [-1.22474487,  0.27009517, -0.57735027,  1.73205081, -0.57735027],
       [ 0.        ,  1.76272636, -0.57735027,  1.73205081, -0.57735027],
       [-1.22474487, -0.22744856, -0.57735027, -0.57735027,  1.73205081],
       [ 1.22474487, -0.97376416, -0.57735027, -0.57735027,  1.73205081],
       [ 0.        ,  1.76272636,  1.73205081, -0.57735027, -0.57735027],
       [ 0.        , -0.72499229, -0.57735027,  1.73205081, -0.57735027],
       [-1.22474487,  0.27009517, -0.57735027,  1.73205081, -0.57735027],
       [-1.22474487,  1.76272636, -0.57735027, -0.57735027,  1.73205081],
       [ 0.        ,  0.27009517, -0.57735027, -0.57735027, -0.57735027],
       [ 1.22474487,  0.27009517,  1.7

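### Choosing a prediction model with scikit-learn (MLR)
The `LinearRegression` class provided by scikit-learn builds an ordinary least-squares linear regression model. Here a separate multiple linear regression (MLR) model is fitted for each target column.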
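For reference, `LinearRegression` also accepts a two-dimensional target, in which case it fits all target columns in one call and returns the same coefficients as fitting each column separately. The sketch below checks this on synthetic, noise-free data; all names in it are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data: 3 features, 2 targets
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
W_true = np.array([[1.0, 2.0], [0.5, -1.0], [-2.0, 0.0]])
Y_demo = X_demo @ W_true + np.array([3.0, -1.0])

# One call fits every target column at once
joint = LinearRegression().fit(X_demo, Y_demo)

# Equivalent per-target fits, one model per column
separate = [LinearRegression().fit(X_demo, Y_demo[:, j]) for j in range(2)]

print(joint.coef_.shape)  # (n_targets, n_features)
```

Fitting per column keeps the evaluation of each target explicit; the joint fit is simply more compact.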

In [7]:
# Build and evaluate the multiple linear regression (MLR) models
mlr_model = LinearRegression()
mlr_results = {}
for mineral in y.columns:
    mlr_model.fit(X_train, y_train[mineral])
    y_pred = mlr_model.predict(X_test)
    r2 = r2_score(y_test[mineral], y_pred)
    rmse = np.sqrt(mean_squared_error(y_test[mineral], y_pred))
    mae = mean_absolute_error(y_test[mineral], y_pred)
    mlr_results[mineral] = {'R2': r2, 'RMSE': rmse, 'MAE': mae, 'Coef': mlr_model.coef_, 'Intercept': mlr_model.intercept_}

# Display the MLR results
for mineral, result in mlr_results.items():
    print(f'{mineral} - Coefficients: {result["Coef"]}, Intercept: {result["Intercept"]}')
    print(f'R2: {result["R2"]}, RMSE: {result["RMSE"]}, MAE: {result["MAE"]}\n')

Al - Coefficients: [3.79023937 1.43132072 2.422908   1.4915705  3.00820455], Intercept: 8.332939758068418
R2: 0.7860598485225264, RMSE: 2.095182608221988, MAE: 1.640582868042699

Ca - Coefficients: [0.82831951 0.50689865 0.60567955 0.94928797 0.35196316], Intercept: 6.063938541567816
R2: 0.9316957963319137, RMSE: 0.3165273080738458, MAE: 0.2458075365659622

Cu - Coefficients: [0.0158984  0.00961836 0.00235734 0.02024182 0.00080977], Intercept: 0.062010195124226244
R2: 0.6022344943444131, RMSE: 0.02301884798819232, MAE: 0.017072223559960718

Fe - Coefficients: [0.031093   0.03127944 0.04095397 0.04086792 0.03604626], Intercept: 0.1409555974041486
R2: 0.774561936559376, RMSE: 0.024588217217625958, MAE: 0.020367836277740834

K - Coefficients: [ 84.6439924   38.63003728 -10.61593612 -47.01264766 -30.12526335], Intercept: 242.08273098181968
R2: 0.9066364327056076, RMSE: 28.73240614126904, MAE: 23.273625727622942

Mg - Coefficients: [ 5.38707208  4.35945176 -1.60994223 -3.04017305 -1.9164252
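The three scores printed for each element follow the standard definitions of R2, RMSE, and MAE. A short NumPy sketch with made-up numbers (not taken from the dataset) shows how each is computed from the residuals:

```python
import numpy as np

# Made-up values for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.0, 5.0, 8.0, 9.0])

residual = y_true - y_hat

mae = np.mean(np.abs(residual))            # mean absolute error
rmse = np.sqrt(np.mean(residual ** 2))     # root mean squared error
ss_res = np.sum(residual ** 2)             # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot                 # coefficient of determination

print(mae, rmse, r2)
```

RMSE and MAE carry the units of the target, so they are not comparable between elements with very different magnitudes (for example K and Cu), whereas R2 is dimensionless.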

### Choosing a prediction model with Keras (ANN)

In [8]:
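# Build and evaluate the ANN models (one network per target column)
ann_results = {}
for mineral in y.columns:
    ann_model = Sequential()
    # Declare the input shape explicitly instead of passing input_dim to Dense
    ann_model.add(tf.keras.Input(shape=(X_train.shape[1],)))
    ann_model.add(Dense(18))
    ann_model.add(LeakyReLU())
    ann_model.add(Dense(8))
    ann_model.add(LeakyReLU())
    ann_model.add(Dense(12))
    ann_model.add(LeakyReLU())
    ann_model.add(Dense(18))
    ann_model.add(LeakyReLU())
    ann_model.add(Dense(1))  # single linear output for regression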
    
    ann_model.compile(optimizer=Adam(learning_rate=0.0112), loss='mse')
    ann_model.fit(X_train, y_train[mineral], epochs=100, verbose=0)
    
    y_pred = ann_model.predict(X_test)
    r2 = r2_score(y_test[mineral], y_pred)
    rmse = np.sqrt(mean_squared_error(y_test[mineral], y_pred))
    mae = mean_absolute_error(y_test[mineral], y_pred)
    ann_results[mineral] = {'R2': r2, 'RMSE': rmse, 'MAE': mae}

# Display the ANN results
for mineral, result in ann_results.items():
    print(f'{mineral} - R2: {result["R2"]}, RMSE: {result["RMSE"]}, MAE: {result["MAE"]}\n')

Al - R2: 0.9837956307621419, RMSE: 0.5766227808680172, MAE: 0.4172466917598948

Ca - R2: 0.9703738527651524, RMSE: 0.20846111506622098, MAE: 0.15754562722374404

Cu - R2: 0.8586506739503095, RMSE: 0.013721983629211886, MAE: 0.010227663828722408

Fe - R2: 0.9223771814392925, RMSE: 0.014428066081236533, MAE: 0.012066535169476002

K - R2: 0.9490123327814944, RMSE: 21.2332195049757, MAE: 16.21750744090361

Mg - R2: 0.9808399845196338, RMSE: 1.0105831586421972, MAE: 0.6542744476879346

Mn - R2: 0.9804482216790833, RMSE: 0.38568243448746453, MAE: 0.2844314436351552

Na - R2: 0.977862202371475, RMSE: 0.22621340220102293, MAE: 0.1760067259844612

Zn - R2: 0.9532628021231325, RMSE: 0.016892535690854697, MAE: 0.01239348752533688
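To get a feel for the size of the ANN used above, the number of trainable parameters can be counted by hand. The sketch assumes 5 input features (`Concentration`, `time`, and the three encoded tea columns) and the Dense layer widths 18, 8, 12, 18, and 1 from the code; the `LeakyReLU` layers add no trainable parameters.

```python
# Each Dense layer has (inputs x units) weights plus one bias per unit
n_inputs = 5                       # assumed number of explanatory variables
layer_units = [18, 8, 12, 18, 1]   # Dense layer widths from the ANN code

params_per_layer = []
fan_in = n_inputs
for units in layer_units:
    params_per_layer.append(fan_in * units + units)
    fan_in = units                 # the next layer takes this layer's outputs

total_params = sum(params_per_layer)
print(params_per_layer, total_params)
```

One such network is trained per element, and each has far more parameters than the six (five coefficients plus an intercept) of the corresponding MLR model.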

