# Airfoil Self Noise

Dataset: https://archive.ics.uci.edu/ml/datasets/airfoil+self-noise#

The NASA dataset comprises NACA 0012 airfoils of different sizes, tested at various wind tunnel speeds and angles of attack. The span of the airfoil and the observer position were the same in all of the experiments.

Of the models tried so far, polynomial regression has given the best fit, and I expect it to remain the best overall. Decision trees may also be worth considering.
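To make the decision tree idea concrete, here is a minimal sketch of how the two model families could be compared with 5-fold cross-validation. It runs on synthetic stand-in data (`X_demo`, `y_demo` are made up, and `max_depth=6` is an arbitrary choice), so the printed scores say nothing about the airfoil dataset itself. Swapping in the real `x` and `y` would be the actual experiment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 300 samples, 5 features, nonlinear target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(300, 5))
y_demo = (np.sin(3.0 * X_demo[:, 0]) + X_demo[:, 1] ** 2
          + rng.normal(scale=0.1, size=300))

candidates = {
    "polynomial (degree 3)": make_pipeline(PolynomialFeatures(degree=3),
                                           LinearRegression()),
    "decision tree": DecisionTreeRegressor(max_depth=6, random_state=0),
}

scores = {}
for name, model in candidates.items():
    # For regressors, cross_val_score defaults to the estimator's R^2 score.
    scores[name] = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    print(f"{name}: mean R^2 = {scores[name]:.3f}")
```

Cross-validation averages over several train/test splits, so the comparison depends less on one particular `random_state` than a single split does.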

## Import libraries

In [57]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Import data

In [84]:
dataset = pd.read_csv("/home/joe/Documents/ML-Resources/airfoil_self_noise.csv")
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

## Taking care of missing values

In [85]:
from sklearn.impute import SimpleImputer
# Replace any NaN with its column mean, across all feature columns
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
x = imputer.fit_transform(x)
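As a quick illustration of what the mean strategy does, here is a small sketch on a made-up array (`toy` is not part of the dataset). It also counts the NaNs per column first, which is one way to check whether imputation is needed at all.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix (made-up values) with one missing entry in column 1.
toy = np.array([[1.0, 2.0],
                [3.0, np.nan],
                [5.0, 6.0]])

# Count missing values per column before imputing.
missing_per_column = np.isnan(toy).sum(axis=0)
print(missing_per_column)  # [0 1]

imputer_demo = SimpleImputer(missing_values=np.nan, strategy="mean")
filled = imputer_demo.fit_transform(toy)

# The NaN becomes the mean of the observed values in its column: (2 + 6) / 2 = 4.
print(filled)
```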

## Split datasets

In [86]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size = 0.2, random_state = 1)
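A small sketch of what the split does, using a made-up 10-row array in place of the real data: `test_size = 0.2` holds out 20% of the rows, features and targets stay paired, and a fixed `random_state` makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 10 samples, 5 features; row i has target i.
x_demo = np.arange(50, dtype=float).reshape(10, 5)
y_demo = np.arange(10, dtype=float)

X_tr, X_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=1
)
print(X_tr.shape, X_te.shape)  # (8, 5) (2, 5)

# Rows stay aligned: the first feature of row i is 5 * i, and its target is i.
aligned = np.array_equal(X_tr[:, 0] / 5, y_tr)

# The same random_state reproduces the same split.
_, X_te_again, _, _ = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=1
)
repeatable = np.array_equal(X_te, X_te_again)
print(aligned, repeatable)
```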

## Training Polynomial model

In [109]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 3)
X_poly = poly_reg.fit_transform(X_train)
regressor = LinearRegression()
regressor.fit(X_poly, Y_train)
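The feature expansion and the regression can also be bundled into one scikit-learn `Pipeline`, so that `predict` applies the fitted transform automatically. This is an alternative sketch, not the notebook's own approach, and it runs on synthetic data (`X_demo`, `y_demo` are made up). It also shows how many terms a degree-3 expansion of 5 inputs produces.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (assumption): 5 features, noise-free cubic target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1.0, 1.0, size=(200, 5))
y_demo = 2.0 * X_demo[:, 0] ** 3 - X_demo[:, 1] * X_demo[:, 2] + 0.5

# One object that expands the features and then fits the linear model.
model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
model.fit(X_demo, y_demo)

# Number of polynomial terms generated from the 5 inputs (bias column included).
n_terms = model.named_steps["polynomialfeatures"].n_output_features_
print(n_terms)  # 56

# R^2 on the training data; close to 1 here because the target is an exact cubic.
train_r2 = model.score(X_demo, y_demo)
print(train_r2)
```

With the pipeline, `model.predict(X_test)` replaces the explicit `poly_reg.transform(X_test)` step.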

In [110]:
y_pred = regressor.predict(poly_reg.transform(X_test))
np.set_printoptions(precision=2)
# Left column: predicted values; right column: actual test values
print(np.concatenate((y_pred.reshape(len(y_pred),1), Y_test.reshape(len(Y_test),1)),1))

[[119.73 117.74]
 [117.15 118.12]
 [126.79 120.66]
 [129.63 122.23]
 [131.86 129.34]
 [128.8  126.59]
 [131.02 133.44]
 [128.67 131.58]
 [124.5  111.91]
 [127.72 129.97]
 [119.03 118.62]
 [126.57 126.34]
 [123.26 123.92]
 [126.99 129.  ]
 [101.89 108.69]
 [124.24 125.4 ]
 [119.19 117.78]
 [120.02 123.25]
 [130.58 132.3 ]
 [128.53 125.72]
 [128.37 135.54]
 [118.48 119.56]
 [113.9  110.45]
 [130.05 123.74]
 [126.52 127.63]
 [124.18 124.76]
 [129.26 131.72]
 [121.7  123.69]
 [123.84 129.98]
 [128.89 128.52]
 [123.83 126.54]
 [124.74 125.8 ]
 [126.38 128.25]
 [126.08 130.96]
 [128.43 126.67]
 [130.59 131.24]
 [122.43 126.54]
 [127.12 125.5 ]
 [126.68 129.09]
 [133.63 133.38]
 [120.57 124.53]
 [129.09 128.71]
 [123.06 128.81]
 [125.33 123.76]
 [124.93 130.  ]
 [123.35 121.66]
 [124.25 124.45]
 [123.99 128.2 ]
 [118.57 120.04]
 [121.04 124.3 ]
 [121.2  121.77]
 [131.67 133.04]
 [125.3  131.45]
 [131.21 119.51]
 [132.78 135.87]
 [122.96 114.04]
 [127.19 129.38]
 [124.17 121.55]
 [109.89 111.5

## Final equation

In [111]:
print(regressor.coef_)
print(regressor.intercept_)

[ 1.22e-01  4.64e-03  4.94e-01  1.79e+01 -1.36e-01  6.73e+01 -2.39e-07
  1.10e-04 -3.31e-02 -8.32e-05 -2.61e-01 -9.94e-02 -8.26e+00 -2.92e-02
 -1.98e+01 -2.63e+02  1.48e+00 -5.03e+00  8.44e-03  2.55e+01  2.46e+01
  8.70e-12  2.70e-09  7.97e-07 -2.00e-09  9.45e-06  7.93e-06 -3.02e-04
 -4.42e-06 -2.00e-03  5.91e-02 -2.51e-05  2.20e-01  1.19e-06  9.18e-04
  2.54e+00  6.24e-03  1.02e+00 -1.30e-03 -2.22e-02  1.87e+01  5.07e-02
 -2.66e+02  5.66e-04 -7.82e-01 -1.53e+02  3.61e+02 -1.28e-01 -2.72e+01
 -1.34e-02 -6.69e+01 -2.04e+00 -9.82e-05 -1.00e-01  1.42e+02 -9.44e-01]
125.3284960008236


The values below were found with multiple linear regression, and its R² score was very close to the polynomial model's R² score:

Scaled sound pressure level (decibels) = -0.00128 x Frequency (hertz) - 0.426 x Angle of attack (degrees) - 36.2 x Chord length (meters) + 0.1 x Free-stream velocity (meters per second) - 151 x Suction side displacement thickness (meters) + 133.06056810486479
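Writing such an equation out by hand is error-prone, so here is a sketch of how it could be generated from a fitted `LinearRegression` by pairing each coefficient with its feature name. The data and the names `x1`, `x2`, `x3` are made up for illustration; the known coefficients let the result be checked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data (assumption): 3 features with known coefficients and intercept.
names = ["x1", "x2", "x3"]
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
true_coef = np.array([1.5, -2.0, 0.25])
y_demo = X_demo @ true_coef + 10.0

lin = LinearRegression().fit(X_demo, y_demo)

# Pair each coefficient with its feature name, then append the intercept.
terms = [f"{c:+.3g} * {n}" for c, n in zip(lin.coef_, names)]
equation = "y = " + " ".join(terms) + f" {lin.intercept_:+.3g}"
print(equation)
```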

## Evaluate the model

In [112]:
from sklearn.metrics import r2_score
r2_score(Y_test, y_pred)

0.6315756420233873
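For reference, R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on the first five (predicted, actual) pairs from the printed comparison above and checks it against `r2_score`. With only five rounded pairs the value will differ from the full test-set score. RMSE is included as an assumption about what might be useful to report alongside it.

```python
import numpy as np
from sklearn.metrics import r2_score

# First five (predicted, actual) pairs from the printed comparison.
y_hat = np.array([119.73, 117.15, 126.79, 129.63, 131.86])
y_true = np.array([117.74, 118.12, 120.66, 122.23, 129.34])

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_true - y_hat) ** 2)          # squared prediction errors
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread of the actual values
r2_manual = 1.0 - ss_res / ss_tot

# RMSE is in the same units as the target, which makes the error easier to judge.
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))

print(r2_manual, r2_score(y_true, y_hat), rmse)
```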