# __Machine Learning - Scale.__
Date : 22, Feb, 2024.


__Scale Features.__
- When your data has values on very different ranges, and even different measurement units, it can be difficult to compare them.

- What is kilograms compared to meters? Or altitude compared to time?
  
    * The answer to this problem is scaling. We can scale data into new values that are easier to compare.
       
    * We use a data set called "scaledata.csv", where the Volume column contains values in liters instead of cm3 (1.0 instead of 1000).

__Problem Statement and Procedure.__
  
- It can be difficult to compare the volume 1.0 with the weight 790, but if we scale them both into comparable values, we can easily see how much one value is compared to the other.

- There are different methods for scaling data; in this tutorial we will use a method called __standardization__.

- The standardization method uses this formula:

    * __z = (x - u) / s__

    - Where __z__ is the new value, __x__ is the original value, __u__ is the mean and __s__ is the standard deviation.

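- If you take the Weight column from the data set (mean 1292.23, standard deviation 238.74), the first value is 790, and the scaled value will be:

    * (790 - 1292.23) / 238.74 ≈ -2.1

- If you take the Volume column from the data set (mean about 1.61, standard deviation about 0.38), the first value is 1.0, and the scaled value will be:

    * (1.0 - 1.61) / 0.38 ≈ -1.6 (the small difference from the -1.59 that StandardScaler reports comes from rounding the mean and standard deviation)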

- Now you can compare -2.1 with -1.59 instead of comparing 790 with 1.0.
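The formula can be sketched in NumPy as below. This is a minimal illustration, not scikit-learn's implementation; the helper name `standardize` and the four sample weights are hypothetical. For a whole column it divides by the population standard deviation (`ddof=0`), which is the convention StandardScaler uses.

```python
import numpy as np

def standardize(x, mean, std):
    """Apply z = (x - u) / s element-wise (hypothetical helper)."""
    return (np.asarray(x, dtype=float) - mean) / std

# First Weight and Volume values, using the rounded statistics from the text.
z_weight = standardize(790, 1292.23, 238.74)
z_volume = standardize(1.0, 1.61, 0.38)
print(round(float(z_weight), 2), round(float(z_volume), 2))

# Standardizing a whole (hypothetical) column with its own mean and
# population standard deviation gives a column with mean 0 and std 1.
column = np.array([790.0, 1160.0, 929.0, 865.0])
z_column = standardize(column, column.mean(), column.std(ddof=0))
print(z_column.round(3))
```

With the rounded statistics the Volume value lands near -1.61 rather than -1.59, which shows how sensitive the hand calculation is to rounding.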


__StandardScaler().__

- The Python sklearn (scikit-learn) module has a class called StandardScaler; a StandardScaler object provides methods such as fit(), transform() and fit_transform() for scaling data sets.

- Example : Scale all values in the Weight and Volume columns:

```python
import pandas
from sklearn.preprocessing import StandardScaler

scale = StandardScaler()
df = pandas.read_csv('Files/scaledata.csv')
X = df[['Weight', 'Volume']]
# fit_transform() learns each column's mean and standard deviation, then applies z = (x - u) / s
scaledX = scale.fit_transform(X)
print(scaledX)
```


```
[[-2.10389253 -1.59336644]
 [-0.55407235 -1.07190106]
 [-1.52166278 -1.59336644]
 [-1.78973979 -1.85409913]
 [-0.63784641 -0.28970299]
 [-1.52166278 -1.59336644]
 [-0.76769621 -0.55043568]
 [ 0.3046118  -0.28970299]
 [-0.7551301  -0.28970299]
 [-0.59595938 -0.0289703 ]
 [-1.30803892 -1.33263375]
 [-1.26615189 -0.81116837]
 [-0.7551301  -1.59336644]
 [-0.16871166 -0.0289703 ]
 [ 0.14125238 -0.0289703 ]
 [ 0.15800719 -0.0289703 ]
 [ 0.3046118  -0.0289703 ]
 [-0.05142797  1.53542584]
 [-0.72580918 -0.0289703 ]
 [ 0.14962979  1.01396046]
 [ 1.2219378  -0.0289703 ]
 [ 0.5685001   1.01396046]
 [ 0.3046118   1.27469315]
 [ 0.51404696 -0.0289703 ]
 [ 0.51404696  1.01396046]
 [ 0.72348212 -0.28970299]
 [ 0.8281997   1.01396046]
 [ 1.81254495  1.01396046]
 [ 0.96642691 -0.0289703 ]
 [ 1.72877089  1.01396046]
 [ 1.30990057  1.27469315]
 [ 1.90050772  1.01396046]
 [-0.23991961 -0.0289703 ]
 [ 0.40932938 -0.0289703 ]
 [ 0.47215993 -0.0289703 ]
 [ 0.4302729   2.31762392]]
```
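To see what fit_transform() does, the sketch below fits a StandardScaler on a small made-up two-column array (hypothetical values, not taken from scaledata.csv) and compares the result with the formula z = (x - u) / s. After fitting, the learned per-column means and standard deviations are stored in the `mean_` and `scale_` attributes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical Weight (kg) and Volume (liters) values.
X = np.array([[790.0, 1.0],
              [1160.0, 1.2],
              [929.0, 1.0],
              [1365.0, 1.6]])

scaler = StandardScaler()
scaled = scaler.fit_transform(X)  # fit() learns mean_ and scale_, transform() applies them

print(scaler.mean_)   # per-column mean (u)
print(scaler.scale_)  # per-column population standard deviation (s)

# The same result computed by hand with the formula.
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(scaled, manual))

# Each scaled column now has mean 0 and standard deviation 1.
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```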


__Predict CO2 Values.__

- When the model is trained on scaled data, new values must be scaled with the same fitted scaler before you predict:

- Example : Predict the CO2 emission from a 1.3 liter car that weighs 2300 kilograms:

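```python
import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler

scale = StandardScaler()
df = pandas.read_csv("Files/scaledata.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']

# Fit the scaler on the training data and train the model on the scaled values
scaledX = scale.fit_transform(X)
regr = linear_model.LinearRegression()
regr.fit(scaledX, y)

# Scale the new car with the same fitted scaler (same column names as the training data)
new_car = pandas.DataFrame([[2300, 1.3]], columns=['Weight', 'Volume'])
scaled = scale.transform(new_car)
predictedCO2 = regr.predict(scaled)
print(predictedCO2)
```

```
[107.2087328]
```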
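One way to make sure new inputs always go through the same fitted scaler is scikit-learn's `make_pipeline`, which chains StandardScaler and LinearRegression so that `predict()` scales its input automatically. This is an alternative to the manual approach above, not what the original example uses, and the training data below is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: [Weight (kg), Volume (liters)] -> CO2.
X = np.array([[790.0, 1.0], [1160.0, 1.2], [929.0, 1.0],
              [865.0, 0.9], [1140.0, 1.5], [1365.0, 1.6]])
y = np.array([99.0, 95.0, 95.0, 90.0, 105.0, 104.0])
new_car = np.array([[2300.0, 1.3]])

# Manual approach: fit the scaler on the training data, then reuse it for new input.
scale = StandardScaler()
regr = LinearRegression().fit(scale.fit_transform(X), y)
manual_pred = regr.predict(scale.transform(new_car))

# Pipeline approach: fit() and predict() apply the scaler for you.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)
pipeline_pred = model.predict(new_car)

print(manual_pred, pipeline_pred)
```

Both paths give the same prediction, because the pipeline fits its scaler on the same training data and reuses it inside `predict()`.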

- If the scaler has not been fitted, calling transform() fails. In the example below, scale.transform(X) is called where scale.fit_transform(X) is needed:

```python
import pandas
from sklearn import linear_model
from sklearn.preprocessing import StandardScaler

scale = StandardScaler()
df = pandas.read_csv("Files/scaledata.csv")
X = df[['Weight', 'Volume']]
y = df['CO2']
scaledX = scale.transform(X)  # error: the scaler has not learned a mean and standard deviation yet
regr = linear_model.LinearRegression()
regr.fit(scaledX, y)
scaled = scale.transform([[2300, 1.3]])
predictedCO2 = regr.predict(scaled)
print(predictedCO2)
```

```
NotFittedError: This StandardScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.
```
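The error above occurs because `transform()` needs the mean and standard deviation that only `fit()` (or `fit_transform()`) computes. The sketch below, on hypothetical data, catches `NotFittedError` from `sklearn.exceptions` and then fixes the problem by fitting first; `sklearn.utils.validation.check_is_fitted` performs the same check without attempting a transform.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler
from sklearn.utils.validation import check_is_fitted

X = np.array([[790.0, 1.0], [1160.0, 1.2], [929.0, 1.0]])  # hypothetical values

scale = StandardScaler()
try:
    scale.transform(X)  # fails: mean_ and scale_ have not been learned yet
    error_raised = False
except NotFittedError as err:
    error_raised = True
    print(type(err).__name__, err)

# Fix: fit the scaler (here via fit_transform) before any transform() call.
scaledX = scale.fit_transform(X)
check_is_fitted(scale)  # raises nothing now that the scaler is fitted
print(scaledX)
```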