# Feature Scaling

* Feature scaling is a basic preprocessing step in machine learning that brings numeric features onto a similar scale.
* Many machine learning algorithms perform better, or converge faster, when their numeric input features are on a similar scale.
* Two feature scaling methods are most commonly used:
    * **Normalization (min-max scaling):** Scales each feature so that all of its values lie between 0 and 1. It does this by subtracting the feature's minimum value and dividing by its range (the difference between the maximum and minimum values).
    * **Standardization:** Transforms each feature to have a mean of 0 and a standard deviation of 1. It does this by subtracting the feature's mean and dividing by its standard deviation. Unlike normalization, the result is not confined to a fixed range: most values fall within a few units of 0, but outliers can lie well outside -1 to 1.
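
The two formulas above can be sketched directly in NumPy. This is a minimal illustration on a hypothetical toy feature (the `values` array is made up and is not taken from the dataset); it also shows that a standardized outlier ends up outside the -1 to 1 interval.

```python
import numpy as np

# Hypothetical toy feature with one large value (an outlier).
values = np.array([1.0, 2.0, 3.0, 4.0, 50.0])

# Min-max normalization: (x - min) / (max - min); the result lies in [0, 1].
min_max = (values - values.min()) / (values.max() - values.min())

# Standardization: (x - mean) / std; the result has mean 0 and std 1,
# but no fixed lower or upper bound.
standardized = (values - values.mean()) / values.std()

print("min-max range:", min_max.min(), min_max.max())
print("standardized mean and std:", standardized.mean(), standardized.std())
print("largest standardized value:", standardized.max())
```

The standard deviation here is the population standard deviation (NumPy's default, `ddof=0`), which is also what scikit-learn's scaling utilities use.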

In [1]:
# Import the required libraries
import pandas as pd
from sklearn.preprocessing import normalize, scale

In [2]:
df = pd.read_csv("pima-indians-diabetes.csv")

In [3]:
df.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [4]:
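# Features: every column except the target
x = df.drop("Outcome", axis=1)
# Target: the Outcome column, kept as a one-column DataFrame
y = df[["Outcome"]]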

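### Normalization

Note that scikit-learn's `normalize` function does not perform the min-max scaling described above. With its default arguments it rescales each row (sample) to unit L2 norm, so the values below are not column-wise 0 to 1 scalings. Column-wise min-max scaling is done with `MinMaxScaler`, shown in the alternative method section.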

In [5]:
# Rescale each row (sample) to unit L2 norm (the defaults are norm="l2", axis=1)
normalizeddata = normalize(x)

In [6]:
pd.DataFrame(normalizeddata)

,0,1,2,3,4,5,6,7
0,0.033552,0.827625,0.402628,0.195722,0.000000,0.187893,0.003506,0.279603
1,0.008424,0.716040,0.555984,0.244296,0.000000,0.224079,0.002957,0.261144
2,0.040398,0.924097,0.323181,0.000000,0.000000,0.117658,0.003393,0.161591
3,0.006612,0.588467,0.436392,0.152076,0.621527,0.185797,0.001104,0.138852
4,0.000000,0.596386,0.174127,0.152361,0.731335,0.187622,0.009960,0.143655
...,...,...,...,...,...,...,...,...
763,0.042321,0.427443,0.321640,0.203141,0.761779,0.139236,0.000724,0.266623
764,0.013304,0.811526,0.465629,0.179600,0.000000,0.244788,0.002262,0.179600
765,0.026915,0.651352,0.387582,0.123811,0.602905,0.141037,0.001319,0.161492
766,0.006653,0.838285,0.399184,0.000000,0.000000,0.200257,0.002322,0.312694
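
To make the row-wise behaviour of `normalize` concrete, the sketch below compares it with column-wise min-max scaling. The `data` array is a hypothetical toy example, not the diabetes dataset. With the default arguments every row ends up with an L2 norm of 1, and rows that point in the same direction become identical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, normalize

# Hypothetical toy data: 3 samples (rows), 2 features (columns).
data = np.array([[3.0, 4.0],
                 [6.0, 8.0],
                 [0.0, 10.0]])

# normalize() defaults to norm="l2", axis=1:
# each ROW is divided by its own Euclidean length.
row_normalized = normalize(data)
row_norms = np.linalg.norm(row_normalized, axis=1)

# MinMaxScaler works per COLUMN: each feature is mapped onto [0, 1].
column_scaled = MinMaxScaler().fit_transform(data)

print(row_normalized)
print("row norms:", row_norms)
print(column_scaled)
```

Row-wise normalization suits cases where only the direction of each sample vector matters. To put features on a common scale, a column-wise method is the usual choice.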


### Standardization

In [7]:
# Standardize each column to mean 0 and standard deviation 1
scaleddata = scale(x)

In [8]:
pd.DataFrame(scaleddata)

,0,1,2,3,4,5,6,7
0,0.639947,0.848324,0.149641,0.907270,-0.692891,0.204013,0.468492,1.425995
1,-0.844885,-1.123396,-0.160546,0.530902,-0.692891,-0.684422,-0.365061,-0.190672
2,1.233880,1.943724,-0.263941,-1.288212,-0.692891,-1.103255,0.604397,-0.105584
3,-0.844885,-0.998208,-0.160546,0.154533,0.123302,-0.494043,-0.920763,-1.041549
4,-1.141852,0.504055,-1.504687,0.907270,0.765836,1.409746,5.484909,-0.020496
...,...,...,...,...,...,...,...,...
763,1.827813,-0.622642,0.356432,1.722735,0.870031,0.115169,-0.908682,2.532136
764,-0.547919,0.034598,0.046245,0.405445,-0.692891,0.610154,-0.398282,-0.531023
765,0.342981,0.003301,0.149641,0.154533,0.279594,-0.735190,-0.685193,-0.275760
766,-0.844885,0.159787,-0.470732,-1.288212,-0.692891,-0.240205,-0.371101,1.170732
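
A quick way to confirm what `scale` does is to check the column means and standard deviations of its output. The sketch below does this on a hypothetical toy array (not the dataset) and also shows `StandardScaler`, the estimator form of the same transformation, which stores the learned mean and standard deviation for reuse on other data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

# Hypothetical toy data: 4 samples, 2 features on very different scales.
data = np.array([[1.0, 100.0],
                 [2.0, 200.0],
                 [3.0, 300.0],
                 [4.0, 1000.0]])

# scale() standardizes each column in one call.
scaled = scale(data)
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)  # population std (ddof=0)

# StandardScaler applies the same transformation but keeps the fitted statistics.
scaler = StandardScaler()
same_result = scaler.fit_transform(data)

print("column means:", col_means)
print("column stds:", col_stds)
print("learned means:", scaler.mean_)
```

Keeping the fitted scaler matters when the same transformation has to be applied later, for example to new samples at prediction time.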


### Alternative Method

In [9]:
from sklearn.preprocessing import MinMaxScaler

# feature_range sets the target interval; (0, 1) is also the default.
scaler = MinMaxScaler(feature_range=(0, 1))

In [10]:
scaled_data = scaler.fit_transform(x)

In [11]:
pd.DataFrame(scaled_data)

,0,1,2,3,4,5,6,7
0,0.352941,0.743719,0.590164,0.353535,0.000000,0.500745,0.234415,0.483333
1,0.058824,0.427136,0.540984,0.292929,0.000000,0.396423,0.116567,0.166667
2,0.470588,0.919598,0.524590,0.000000,0.000000,0.347243,0.253629,0.183333
3,0.058824,0.447236,0.540984,0.232323,0.111111,0.418778,0.038002,0.000000
4,0.000000,0.688442,0.327869,0.353535,0.198582,0.642325,0.943638,0.200000
...,...,...,...,...,...,...,...,...
763,0.588235,0.507538,0.622951,0.484848,0.212766,0.490313,0.039710,0.700000
764,0.117647,0.613065,0.573770,0.272727,0.000000,0.548435,0.111870,0.100000
765,0.294118,0.608040,0.590164,0.232323,0.132388,0.390462,0.071307,0.150000
766,0.058824,0.633166,0.491803,0.000000,0.000000,0.448584,0.115713,0.433333
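
The scaled output above is a plain NumPy array, so the original column names are replaced by 0 to 7. One way to keep them, sketched here under the assumption that the input is a DataFrame, is to pass the original columns and index back in. The `x_toy` frame is a hypothetical stand-in that reuses the `Glucose` and `BMI` values from the first three rows shown earlier.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for x, using two columns from the first three rows.
x_toy = pd.DataFrame({
    "Glucose": [148, 85, 183],
    "BMI": [33.6, 26.6, 23.3],
})

scaler = MinMaxScaler(feature_range=(0, 1))

# fit_transform returns a NumPy array; wrap it to restore the labels.
scaled_df = pd.DataFrame(
    scaler.fit_transform(x_toy),
    columns=x_toy.columns,
    index=x_toy.index,
)

print(scaled_df)
```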


In [12]:
# Use feature_range again, this time to scale into the interval -1 to +1.
scaler = MinMaxScaler(feature_range=(-1, 1))

In [13]:
scaled_data = scaler.fit_transform(x)

In [14]:
pd.DataFrame(scaled_data)

,0,1,2,3,4,5,6,7
0,-0.294118,0.487437,0.180328,-0.292929,-1.000000,0.001490,-0.531170,-0.033333
1,-0.882353,-0.145729,0.081967,-0.414141,-1.000000,-0.207154,-0.766866,-0.666667
2,-0.058824,0.839196,0.049180,-1.000000,-1.000000,-0.305514,-0.492741,-0.633333
3,-0.882353,-0.105528,0.081967,-0.535354,-0.777778,-0.162444,-0.923997,-1.000000
4,-1.000000,0.376884,-0.344262,-0.292929,-0.602837,0.284650,0.887276,-0.600000
...,...,...,...,...,...,...,...,...
763,0.176471,0.015075,0.245902,-0.030303,-0.574468,-0.019374,-0.920581,0.400000
764,-0.764706,0.226131,0.147541,-0.454545,-1.000000,0.096870,-0.776260,-0.800000
765,-0.411765,0.216080,0.180328,-0.535354,-0.735225,-0.219076,-0.857387,-0.700000
766,-0.882353,0.266332,-0.016393,-1.000000,-1.000000,-0.102832,-0.768574,-0.133333
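
The relationship between the two `feature_range` settings can be checked by hand: a value scaled to 0 to 1 is stretched by the width of the target interval and shifted by its lower bound. The sketch below verifies this on a hypothetical toy array; the names `low`, `high`, `manual`, and `direct` are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: 4 samples, 2 features.
data = np.array([[6.0, 148.0],
                 [1.0, 85.0],
                 [8.0, 183.0],
                 [0.0, 137.0]])

# Step 1: scale each column onto [0, 1].
unit = MinMaxScaler(feature_range=(0, 1)).fit_transform(data)

# Step 2: stretch and shift onto [low, high] by hand.
low, high = -1, 1
manual = unit * (high - low) + low

# The same result straight from MinMaxScaler.
direct = MinMaxScaler(feature_range=(low, high)).fit_transform(data)

print(np.allclose(manual, direct))
```

The notebook's own outputs agree with this: the first value 0.352941 in the 0 to 1 table maps to 2 × 0.352941 − 1 ≈ -0.294118 in the -1 to 1 table.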


# Closing Remarks
This notebook demonstrated the following feature scaling methods:
* Normalization to unit norm per sample (`normalize`)
* Standardization (`scale`)
* Min-max scaling (`MinMaxScaler`)