# Handling Numerical Data

In [16]:
import numpy as np
from sklearn import preprocessing
import pandas as pd

# Feature Scaling / Normalization

### Reasons for Feature Scaling:

#### Improved Convergence:
- Algorithms like gradient descent converge faster when features are scaled. Unscaled features produce gradients of very different magnitudes, leading to zigzag paths towards the minimum during optimization.

#### Equal Treatment of Features:
- Algorithms such as KNN and SVMs are sensitive to feature scale: features with larger ranges disproportionately influence the results, leading to biased models.
- Rescaling ensures that each feature contributes equally to the model.

#### Enhanced Model Interpretability:
- Model coefficients and weights can be more easily interpreted when they correspond to features of similar magnitude.

#### Stabilized Training:
- Neural networks benefit from feature scaling, as it helps prevent the network from becoming unstable due to large values in the input data.

#### Compatibility with Regularization:
- Regularization techniques like Lasso / Ridge regression are affected by feature scales. Scaling ensures that regularization penalties are applied uniformly across all features.

- Common rescaling methods include min-max scaling, z-score normalization and max-abs scaling.
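
To make the "equal treatment" point concrete, here is a minimal sketch of how an unscaled feature can dominate a Euclidean distance. The data (age in years, income in dollars) and the helper `income_share` are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical observations: column 0 is age (years), column 1 is income (dollars).
X = np.array([[25.0, 50_000.0],
              [60.0, 51_000.0],
              [26.0, 58_000.0]])

def income_share(a, b):
    """Fraction of the squared Euclidean distance contributed by the income column."""
    squared_diff = (a - b) ** 2
    return float(squared_diff[1] / squared_diff.sum())

# Before scaling, income accounts for almost all of the distance between rows 0 and 1,
# even though their ages differ by 35 years.
raw_share = income_share(X[0], X[1])

# After standardizing each column, both features are on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)
scaled_share = income_share(X_scaled[0], X_scaled[1])

print("Income share of squared distance (raw):   ", raw_share)
print("Income share of squared distance (scaled):", scaled_share)
```

A distance-based model such as KNN would, on the raw data, effectively ignore age; after scaling, the large age gap is no longer hidden by the income column.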

### Min-Max

In [2]:
# we will create a feature.
feature = np.array([[-500.5],
                    [-100.0],
                    [0],
                    [100.1],
                    [900.9]])

# create scaler
minmax_scale = preprocessing.MinMaxScaler(feature_range=(0, 1)) 
# The goal of min-max scaling is to transform the feature so its values fall within a specific range, i.e. (0, 1) in this case.

# Scale the feature.
scaled_feature = minmax_scale.fit_transform(feature)
# fit alone would only compute the min and max and store them on the scaler.
# transform then uses those stored values to rescale the original data.
# fit_transform does both in one step.

scaled_feature

array([[0.        ],
       [0.28578564],
       [0.35714286],
       [0.42857143],
       [1.        ]])
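
As a sanity check, the same result can be reproduced by hand with the min-max formula x' = (x - min) / (max - min). The sketch below also separates `fit` from `transform` to show what happens to a value outside the fitted range; the value `1000.0` is an arbitrary example, not part of the original data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

feature = np.array([[-500.5], [-100.0], [0.0], [100.1], [900.9]])

# Manual min-max scaling to the range (0, 1).
col_min = feature.min(axis=0)
col_max = feature.max(axis=0)
manual = (feature - col_min) / (col_max - col_min)

# fit stores the min and max; transform applies them.
scaler = MinMaxScaler(feature_range=(0, 1)).fit(feature)
from_sklearn = scaler.transform(feature)

print("Stored min and max:", scaler.data_min_, scaler.data_max_)
print("Manual result matches scikit-learn:", np.allclose(manual, from_sklearn))

# A new value larger than the fitted max lands above 1,
# because the scaler does not clip by default.
out_of_range = scaler.transform(np.array([[1000.0]]))
print("Scaled value for 1000.0:", out_of_range)
```

In practice the scaler is usually fitted on training data only and then reused to transform validation or test data.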

### Z-Score / Standard Scaling

In [3]:
# Standardizing a Feature / Z-score.

x = np.array([[-1000.1],
              [-200.2],
              [500.5],
              [600.6],
              [9000.9]])

# Create scaler
scaler = preprocessing.StandardScaler()
# features will have a mean of 0 and a standard deviation of 1.

# Transform the feature
standardized = scaler.fit_transform(x)

# Show feature
standardized

array([[-0.76058269],
       [-0.54177196],
       [-0.35009716],
       [-0.32271504],
       [ 1.97516685]])

In [4]:
# Print mean and standard deviation
print("Mean:", round(standardized.mean()))
print("Standard deviation:", standardized.std())


Mean: 0
Standard deviation: 1.0
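
The standardization above follows z = (x - mean) / std. A minimal sketch that reproduces it by hand, assuming the population standard deviation (`ddof=0`), which is what `StandardScaler` uses:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[-1000.1], [-200.2], [500.5], [600.6], [9000.9]])

# Column-wise mean and population standard deviation (NumPy's default ddof=0).
mean = x.mean(axis=0)
std = x.std(axis=0)
manual = (x - mean) / std

scaler = StandardScaler().fit(x)
from_sklearn = scaler.transform(x)

print("Learned mean and scale:", scaler.mean_, scaler.scale_)
print("Manual result matches scikit-learn:", np.allclose(manual, from_sklearn))
```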


- As a rule of thumb, standardization (StandardScaler) is a good default for PCA, while min-max scaling is often recommended for neural networks.

### Robust Scaler

In [5]:
# Similar to the standard scaler, but it uses the median and the interquartile range, which counters the negative effects of significant outliers.

# Create scaler
robust_scaler = preprocessing.RobustScaler()

# Transform feature
robust = robust_scaler.fit_transform(x)


robust

array([[-1.87387612],
       [-0.875     ],
       [ 0.        ],
       [ 0.125     ],
       [10.61488511]])
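
The robust scaling above can be sketched by hand as (x - median) / IQR, where IQR is the difference between the 75th and 25th percentiles. This assumes the default `quantile_range=(25.0, 75.0)` of `RobustScaler`.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

x = np.array([[-1000.1], [-200.2], [500.5], [600.6], [9000.9]])

# Median and interquartile range of the column.
median = np.median(x, axis=0)
q25, q75 = np.percentile(x, [25, 75], axis=0)
iqr = q75 - q25
manual = (x - median) / iqr

scaler = RobustScaler().fit(x)
from_sklearn = scaler.transform(x)

print("Learned center and scale:", scaler.center_, scaler.scale_)
print("Manual result matches scikit-learn:", np.allclose(manual, from_sklearn))
```

Because the median and IQR barely move when the outlier 9000.9 changes, the scaled values of the other four observations stay stable, unlike with the mean and standard deviation.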

## Normalizing Observations

- We do this when we want to rescale the feature values of each observation to have unit norm (a total length of 1).

In [8]:
# Using the Euclidean (L2) norm:

from sklearn.preprocessing import Normalizer
# Create feature matrix
features = np.array([[0.5, 0.5],
                     [1.1, 3.4],
                     [1.5, 20.2],
                     [1.63, 34.4],
                     [10.9, 3.3]])
# Create normalizer
normalizer = Normalizer(norm="l2")
# Transform feature matrix
new_features = normalizer.transform(features)

new_features


array([[0.70710678, 0.70710678],
       [0.30782029, 0.95144452],
       [0.07405353, 0.99725427],
       [0.04733062, 0.99887928],
       [0.95709822, 0.28976368]])

In [9]:
# Using the Manhattan norm (L1):

features_l1_norm = Normalizer(norm="l1").transform(features)

features_l1_norm


array([[0.5       , 0.5       ],
       [0.24444444, 0.75555556],
       [0.06912442, 0.93087558],
       [0.04524008, 0.95475992],
       [0.76760563, 0.23239437]])
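
Unlike the scalers earlier, `Normalizer` works row by row: each observation is divided by its own norm. A minimal sketch that reproduces both the L2 and L1 results with `np.linalg.norm` and checks that every row ends up with a norm of 1:

```python
import numpy as np
from sklearn.preprocessing import Normalizer

features = np.array([[0.5, 0.5],
                     [1.1, 3.4],
                     [1.5, 20.2],
                     [1.63, 34.4],
                     [10.9, 3.3]])

# L2: divide each row by the square root of its sum of squares.
l2_norms = np.linalg.norm(features, ord=2, axis=1, keepdims=True)
manual_l2 = features / l2_norms

# L1: divide each row by the sum of its absolute values.
l1_norms = np.linalg.norm(features, ord=1, axis=1, keepdims=True)
manual_l1 = features / l1_norms

print("L2 matches:", np.allclose(manual_l2, Normalizer(norm="l2").transform(features)))
print("L1 matches:", np.allclose(manual_l1, Normalizer(norm="l1").transform(features)))

# After normalization every observation has unit length under its norm.
print("Row L2 norms:", np.linalg.norm(manual_l2, ord=2, axis=1))
print("Row L1 norms:", np.linalg.norm(manual_l1, ord=1, axis=1))
```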

## Generating Polynomial and Interaction Features

- Polynomial features are often created when we want to include the notion that there exists a nonlinear relationship between the features and the target.
- When we suspect a relationship is non-linear, we can encode that non-constant effect in a feature, x, by generating that feature's higher-order forms (x², x³, and so on).

- Interaction features are useful when the effect of one feature on the target depends on the value of another. We encode that relationship by including an interaction feature that is the product of the individual features.

In [10]:
from sklearn.preprocessing import PolynomialFeatures

# Create feature matrix
features = np.array([[2, 3],
                    [2, 3],
                    [2, 3]])

# Create a PolynomialFeatures object.
polynomial_interaction = PolynomialFeatures(degree=2, include_bias=False)

# create polynomial features.
polynomial_interaction.fit_transform(features)


array([[2., 3., 4., 6., 9.],
       [2., 3., 4., 6., 9.],
       [2., 3., 4., 6., 9.]])

In [11]:
# By default PolynomialFeatures includes interaction features.
# We can restrict the features created to only interaction features by:

interaction = PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)

interaction.fit_transform(features)

array([[2., 3., 6.],
       [2., 3., 6.],
       [2., 3., 6.]])

#### Considerations when using polynomial and interaction features

- Adding polynomial and interaction features increases the number of features, which can lead to overfitting if not handled carefully, especially in models with a large number of original features.

- Computational cost: the number of generated features grows rapidly with the number of original features and the polynomial degree, which increases memory use and training time.

- When using polynomial and interaction features, regularization techniques like Ridge or Lasso regression are often used to prevent overfitting by penalizing large coefficients.
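
One common way to combine these considerations is a pipeline that expands the features, rescales them and fits a regularized model. The sketch below is illustrative only: the synthetic data, the target formula and `alpha=1.0` are assumptions made for the example, not recommendations.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a quadratic term and an interaction term plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))
y = 1.5 * X[:, 0] ** 2 - 2.0 * X[:, 0] * X[:, 1] + rng.normal(scale=0.5, size=200)

# Expand to degree-2 features, standardize so the Ridge penalty treats
# all expanded features evenly, then fit the regularized regression.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
)
model.fit(X, y)

n_expanded = model.named_steps["polynomialfeatures"].n_output_features_
r2 = model.score(X, y)

print("Number of expanded features:", n_expanded)
print("Training R^2:", r2)
```

In a real project the score would be measured on held-out data, and `alpha` and `degree` would typically be tuned with cross-validation.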

## Transforming Features

In [13]:
# This approach is commonly used when we want to apply a custom transformation to one or more features:

# In Scikit-learn we can use FunctionTransformer to apply a function to a set of features.

from sklearn.preprocessing import FunctionTransformer

# Create feature matrix
features = np.array([[2, 3],
                    [2, 3],
                    [2, 3]])

# Define a simple function.
# x is the whole array (or a pandas Series), not a single int, so the addition is element-wise.
def add_ten(x):
    return x + 10

# create the transformer:
ten_transformer = FunctionTransformer(add_ten)

# Transform the feature matrix:
ten_transformer.transform(features)

array([[12, 13],
       [12, 13],
       [12, 13]])
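
`FunctionTransformer` is not limited to hand-written functions. As a hedged sketch, the example below wraps NumPy's `log1p` and supplies `expm1` as the inverse so the transformation can be undone; the choice of a log transform here is purely illustrative.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

features = np.array([[2.0, 3.0],
                     [2.0, 3.0],
                     [2.0, 3.0]])

# log1p computes log(1 + x); expm1 computes exp(x) - 1, its exact inverse.
log_transformer = FunctionTransformer(func=np.log1p, inverse_func=np.expm1)

logged = log_transformer.fit_transform(features)
restored = log_transformer.inverse_transform(logged)

print(logged)
print("Round trip recovers the original:", np.allclose(restored, features))
```

Wrapping the function in a transformer, rather than calling it directly, lets it be used as a step in a scikit-learn pipeline.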

In [17]:
# We can create the same transformation in pandas with apply:

# Create DataFrame
df = pd.DataFrame(features, columns=["feature_1", "feature_2"])
# Apply function
df.apply(add_ten)


   feature_1  feature_2
0         12         13
1         12         13
2         12         13
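
For a simple arithmetic function like this one, `apply` is not strictly necessary. The sketch below shows two alternatives under the same setup: broadcasting over the whole DataFrame, and transforming a single column with `assign`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

features = np.array([[2, 3],
                     [2, 3],
                     [2, 3]])
df = pd.DataFrame(features, columns=["feature_1", "feature_2"])

def add_ten(x):
    return x + 10

# apply calls add_ten once per column.
via_apply = df.apply(add_ten)

# Plain arithmetic broadcasts over every cell and gives the same result.
via_broadcast = df + 10

# assign returns a copy in which only feature_1 is transformed.
one_column = df.assign(feature_1=df["feature_1"] + 10)

print(via_apply.equals(via_broadcast))
print(one_column)
```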
