## Scaling Numeric Features

Another good practice is to scale numeric features to a small range of values, e.g. $[0, 1]$ or $[-1, 1]$. Scaling ensures that no feature has a disproportionate impact on the model's loss merely because its values are numerically large. Gradient-based optimization algorithms also tend to converge more reliably in practice when all features are on a similar, small scale.
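
As a minimal sketch of what scaling to the $[0, 1]$ range means, min-max scaling maps each value $x$ in a column to $(x - \min) / (\max - \min)$. The array `values` below is made-up illustration data, not taken from the weather dataset.

```python
import numpy as np

# Hypothetical feature values, made up for illustration
values = np.array([10.0, 20.0, 25.0, 50.0])

# Min-max scaling: subtract the minimum, then divide by the range (max - min)
scaled = (values - values.min()) / (values.max() - values.min())
print(scaled)
```

The smallest value maps to 0, the largest to 1, and everything else falls proportionally in between (here 0.25 and 0.375).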

The numeric columns in our dataset have varying ranges.

In [1]:
import pandas as pd
import numpy as np
raw_df = pd.read_csv("weather-dataset-rattle-package/weatherAUS.csv")
numeric_cols = raw_df.select_dtypes(include=np.number).columns.tolist()
categorical_cols = raw_df.select_dtypes('object').columns.tolist()

Let's use `MinMaxScaler` from `sklearn.preprocessing` to scale the values in each numeric column to the $[0, 1]$ range.

In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
# ?MinMaxScaler

In [None]:
scaler = MinMaxScaler()
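
By default, `MinMaxScaler` targets the $[0, 1]$ range. If the $[-1, 1]$ range mentioned earlier is preferred, the `feature_range` parameter can be set instead. The sketch below uses a small made-up array (`toy_values`) and a separate scaler (`symmetric_scaler`), both hypothetical names that are not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-column data, made up for illustration
toy_values = np.array([[0.0], [5.0], [10.0]])

# feature_range sets the target interval; the default is (0, 1)
symmetric_scaler = MinMaxScaler(feature_range=(-1, 1))
symmetric_scaled = symmetric_scaler.fit_transform(toy_values)
print(symmetric_scaled.ravel())
```

Here the minimum maps to -1, the midpoint to 0, and the maximum to 1.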

First, we `fit` the scaler to the data, i.e. compute the minimum and maximum value of each numeric column.

In [None]:
scaler.fit(raw_df[numeric_cols])
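
To see what `fit` stores, here is a small sketch on a made-up data frame (`toy_df` and `toy_scaler` are hypothetical names). It also illustrates that `MinMaxScaler` disregards `NaN` values when fitting and leaves them as `NaN` when transforming, which matters for datasets with missing measurements.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data with one missing value, made up for illustration
toy_df = pd.DataFrame({
    "temp": [5.0, 15.0, np.nan, 25.0],
    "rain": [0.0, 2.0, 8.0, 4.0],
})

toy_scaler = MinMaxScaler()
toy_scaler.fit(toy_df)  # computes the per-column min and max, skipping NaN

print(toy_scaler.data_min_)  # per-column minimums seen during fit
print(toy_scaler.data_max_)  # per-column maximums seen during fit

# transform applies (x - min) / (max - min); NaN entries stay NaN
toy_scaled = toy_scaler.transform(toy_df)
print(toy_scaled)
```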

We can now inspect the minimum and maximum values in each column.

In [None]:
print('Minimum:')
list(scaler.data_min_)

In [None]:
print('Maximum:')
list(scaler.data_max_)

In [None]:
raw_df[numeric_cols].describe()

We can now scale the numeric columns using the `transform` method of the fitted scaler.

In [None]:
raw_df[numeric_cols] = scaler.transform(raw_df[numeric_cols])

After scaling, the minimum and maximum of each numeric column should be 0 and 1 respectively.

In [None]:
raw_df[numeric_cols].describe()
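
Besides reading the `describe()` output, the result can be checked programmatically. The following is a minimal sketch on made-up data (`demo_df`, `demo_scaler`, and the column names are hypothetical). It also shows `inverse_transform`, which maps scaled values back to their original units.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data, made up for illustration
demo_df = pd.DataFrame({
    "humidity": [20.0, 60.0, 100.0],
    "pressure": [990.0, 1010.0, 1030.0],
})

demo_scaler = MinMaxScaler().fit(demo_df)
demo_scaled = pd.DataFrame(demo_scaler.transform(demo_df), columns=demo_df.columns)

# After scaling, every column should span exactly 0 to 1
col_min = demo_scaled.min()
col_max = demo_scaled.max()
print(col_min, col_max, sep="\n")

# inverse_transform undoes the scaling and recovers the original values
restored = demo_scaler.inverse_transform(demo_scaled)
print(restored)
```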