### Model Selection and Creation of Pickle File for Airfoil Data

In this notebook, we compare several regression algorithms on the airfoil self-noise data and pick the best one by its R² score on a held-out test set. The process involves:

1. **Evaluating Different Algorithms**: We will assess the performance of several algorithms, including:
   - Linear Regression
   - Decision Tree
   - Elastic Net
   - Ridge Regression
   - Random Forest

2. **Selecting the Best-Fit Model**: Based on the R² scores on the test set, we will identify the model that performs best.

3. **Creating a Pickle File**: The best-fit model will be saved to a pickle file so it can be reloaded later for deployment or further analysis without retraining.

This approach allows us to systematically identify and preserve the most effective model for the airfoil data.

In [1]:
import pandas as pd

df = pd.read_csv("../data/airfoil_self_noise.dat", sep = "\t", header=None)

In [2]:
df.columns = ["Frequency", "Angle of attack", "Chord length", "Free-stream velocity", "Suction side", "Pressure level"]
df

,Frequency,Angle of attack,Chord length,Free-stream velocity,Suction side,Pressure level
0,800,0.0,0.3048,71.3,0.002663,126.201
1,1000,0.0,0.3048,71.3,0.002663,125.201
2,1250,0.0,0.3048,71.3,0.002663,125.951
3,1600,0.0,0.3048,71.3,0.002663,127.591
4,2000,0.0,0.3048,71.3,0.002663,127.461
...,...,...,...,...,...,...
1498,2500,15.6,0.1016,39.6,0.052849,110.264
1499,3150,15.6,0.1016,39.6,0.052849,109.254
1500,4000,15.6,0.1016,39.6,0.052849,106.604
1501,5000,15.6,0.1016,39.6,0.052849,106.224


In [3]:
df.isnull().sum()

Frequency               0
Angle of attack         0
Chord length            0
Free-stream velocity    0
Suction side            0
Pressure level          0
dtype: int64

In [4]:
from sklearn.model_selection import train_test_split

X = df.iloc[:,:-1]
y = df.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [10]:
# Linear regression (ordinary least squares baseline)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

regressor_linear = LinearRegression()
regressor_linear.fit(X_train, y_train)
prediction_linear = regressor_linear.predict(X_test)

r2_score(y_test, prediction_linear)

0.5124474986138431
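
For reference, the value returned by `r2_score` is the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below recomputes it in plain Python on a toy example; the helper name `r2_manual` and the sample numbers are illustrative and not part of the notebook.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, chosen only for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
score = r2_manual(y_true, y_pred)
print(round(score, 3))  # 0.949
```

A score of 1.0 means perfect predictions, 0.0 is what always predicting the mean of `y_true` would give, and a model that does worse than that gets a negative score.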

In [14]:
# Decision tree (r2_score is already imported in the linear regression cell)
from sklearn.tree import DecisionTreeRegressor

regressor_dt = DecisionTreeRegressor(random_state=0)
regressor_dt.fit(X_train, y_train)
prediction_dt = regressor_dt.predict(X_test)

r2_score(y_test, prediction_dt)

0.8395146655857919

In [16]:
# Elastic net with default hyperparameters (alpha=1.0, l1_ratio=0.5)
from sklearn.linear_model import ElasticNet

regressor_elasticnet = ElasticNet()
regressor_elasticnet.fit(X_train, y_train)
prediction_elasticnet = regressor_elasticnet.predict(X_test)

r2_score(y_test, prediction_elasticnet)

0.24858070974843682

In [18]:
# Ridge regression with the default alpha=1.0
from sklearn.linear_model import Ridge

regressor_ridge = Ridge()
regressor_ridge.fit(X_train, y_train)
prediction_ridge = regressor_ridge.predict(X_test)

r2_score(y_test, prediction_ridge)

0.4769144167785677

In [19]:
# Random forest (no random_state is set, so the score varies slightly between runs)
from sklearn.ensemble import RandomForestRegressor

regressor_randomforest = RandomForestRegressor()
regressor_randomforest.fit(X_train, y_train)
prediction_randomforest = regressor_randomforest.predict(X_test)

r2_score(y_test, prediction_randomforest)

0.9257594875700251

### Summary of R² Scores

- **Linear Regression**: 0.5124474986138431
- **Decision Tree**: 0.8395146655857919
- **Elastic Net**: 0.24858070974843682
- **Ridge Regression**: 0.4769144167785677
- **Random Forest**: 0.9257594875700251

The **best model** is **Random Forest**, with an R² of about **0.926** on the test set (R² measures explained variance, not accuracy).
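
The selection step can also be done programmatically rather than by eye. A minimal sketch, assuming the scores are collected in a dictionary (the names `scores`, `ranking`, `best_name`, and `best_score` are illustrative; the values are copied from the summary above):

```python
# R² scores copied from the summary above
scores = {
    "Linear Regression": 0.5124474986138431,
    "Decision Tree": 0.8395146655857919,
    "Elastic Net": 0.24858070974843682,
    "Ridge Regression": 0.4769144167785677,
    "Random Forest": 0.9257594875700251,
}

# Sort from highest to lowest R² and take the first entry as the winner
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
best_name, best_score = ranking[0]

for name, value in ranking:
    print(f"{name:<18} {value:.4f}")
print(f"Best: {best_name} (R^2 = {best_score:.4f})")
```

Because the random forest was fitted without a fixed `random_state`, its score will shift slightly on a rerun, though the gap to the other models here is large.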

In [41]:
import pickle

# Save the random forest, the best-scoring model above.
# The with-block closes the file handle once the dump is done.
with open('../models/model.pkl', 'wb') as f:
    pickle.dump(regressor_randomforest, f)
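
The save-and-reload round trip can be wrapped in two small helpers. This is a sketch only: `save_model` and `load_model` are hypothetical names, a plain dictionary stands in for the fitted estimator, and a temporary directory replaces `../models/` so the example runs anywhere. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize any picklable object to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load an object previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for the fitted estimator (assumption: any picklable object works the same way)
stand_in = {"name": "random_forest", "params": {"n_estimators": 100}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    save_model(stand_in, path)
    restored = load_model(path)

print(restored == stand_in)  # True
```

The restored object is an equal but separate copy, which is why the reloaded model can produce predictions without being refitted.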

In [42]:
# Reload the saved model
with open('../models/model.pkl', 'rb') as f:
    pickled_model = pickle.load(f)

# Batch input (multiple rows)
pickled_model.predict(X_test)

array([123.95611, 118.59041, 119.54673, 135.6969 , 134.35168, 123.73311,
       123.96261, 133.86611, 133.78117, 128.06461, 126.87136, 112.97958,
       133.41544, 132.45736, 125.33178, 107.99919, 130.25884, 129.68331,
       128.38537, 124.72661, 125.23889, 127.46829, 111.01687, 125.73107,
       124.32265, 126.2571 , 130.17039, 131.50868, 109.35832, 130.91059,
       132.21383, 122.02997, 128.53989, 118.8662 , 119.36342, 134.72917,
       133.97547, 130.04287, 120.75151, 111.74088, 125.58791, 134.64922,
       127.01834, 121.64516, 127.03021, 134.41103, 129.7535 , 118.51942,
       121.61493, 133.73072, 132.18519, 114.47942, 129.50883, 126.92735,
       126.72378, 120.92556, 123.70853, 129.87228, 122.38325, 118.98662,
       126.01032, 134.42685, 129.78697, 133.10778, 117.35703, 125.46676,
       123.34847, 119.93123, 126.0382 , 129.42572, 129.06334, 130.06925,
       128.74862, 117.68322, 129.61864, 128.22064, 130.22275, 132.50027,
       134.40527, 135.41716, 124.75338, 120.07745, 

In [43]:
# Single input: one value per feature, in the same order as the training columns

dict_test = {
    "Frequency": 9,
    "Angle of attack": 8,
    "Chord length": 10,
    "Free-stream velocity": 1,
    "Suction side": 7
}

In [44]:
dict_test.keys()

dict_keys(['Frequency', 'Angle of attack', 'Chord length', 'Free-stream velocity', 'Suction side'])

In [45]:
dict_test.values()

dict_values([9, 8, 10, 1, 7])

In [46]:
type(dict_test.values())

dict_values

In [47]:
list(dict_test.values())

[9, 8, 10, 1, 7]

In [48]:
[list(dict_test.values())]

[[9, 8, 10, 1, 7]]

In [49]:
# predict expects a 2D input: one inner list per sample
pickled_model.predict([list(dict_test.values())])

array([117.35598])

In [50]:
# Take the first prediction and convert it to a plain Python float
pickled_model.predict([list(dict_test.values())])[0].item()

117.35597999999999
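
The single-input cells above rely on the dictionary keys being inserted in the same order as the training columns. A minimal sketch of a helper that makes the order explicit and rejects missing or unexpected keys (`FEATURE_ORDER` and `to_feature_row` are hypothetical names introduced here, not part of the notebook):

```python
# Column order used when the model was trained (all columns except the target)
FEATURE_ORDER = [
    "Frequency",
    "Angle of attack",
    "Chord length",
    "Free-stream velocity",
    "Suction side",
]


def to_feature_row(payload, feature_order=FEATURE_ORDER):
    """Return a 2D list [[...]] with values arranged in training order."""
    missing = [name for name in feature_order if name not in payload]
    extra = [name for name in payload if name not in feature_order]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    return [[float(payload[name]) for name in feature_order]]


# Same values as dict_test, deliberately given in a different key order
shuffled = {
    "Suction side": 7,
    "Frequency": 9,
    "Free-stream velocity": 1,
    "Chord length": 10,
    "Angle of attack": 8,
}
row = to_feature_row(shuffled)
print(row)  # [[9.0, 8.0, 10.0, 1.0, 7.0]]
```

The result can then be passed as `pickled_model.predict(row)`. Recent scikit-learn versions may warn about missing feature names when a model fitted on a DataFrame receives a plain list; building a one-row DataFrame with the same column names is one way to avoid that.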