### LSE Data Analytics Online Career Accelerator 

# DA201: Data Analytics with Python

## Min-max feature scaling (tutorial video)

This file contains the code snippets introduced in the Min-max feature scaling video.
Play and pause the video as you follow along with the demonstration to:
- normalise data using the min-max feature scaling technique
- rescale data using the maximum absolute scaling technique.

### 1. Prepare your workstation

In [1]:
# Import the Pandas package.
import pandas as pd

# Read the CSV file from the current working directory.
fitness = pd.read_csv('daily_activity.csv')

# View the DataFrame shape.
fitness.shape

(940, 15)

### 2. View the values before normalisation

In [2]:
# Values of sedentary minutes before scaling.
fitness['SedentaryMinutes']

0       728
1       776
2      1218
3       726
4       773
       ... 
935    1174
936    1131
937    1187
938    1127
939     770
Name: SedentaryMinutes, Length: 940, dtype: int64

### 3. Normalise the data
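Min-max feature scaling maps each value $x$ in a column onto the range 0 to 1 using the column's minimum and maximum:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

The symbols $x'$, $x_{\min}$ and $x_{\max}$ are notation introduced here for illustration; the function below computes the same quantity with the .min() and .max() methods.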

In [18]:
# Apply min-max scaling in Pandas using the .min() and .max() methods.
def min_max_scaling(df):
    # Copy the input so that the original data is not modified.
    df_scaled = df.copy()
    # Apply min-max scaling.
    if isinstance(df_scaled, pd.DataFrame):
        # DataFrame input: scale each column by its own minimum and maximum.
        for column in df_scaled.columns:
            col_min = df_scaled[column].min()
            col_max = df_scaled[column].max()
            df_scaled[column] = (df_scaled[column] - col_min) / (col_max - col_min)
    else:
        # Series input: scale the values directly.
        df_scaled = (df_scaled - df_scaled.min()) / (
            df_scaled.max() - df_scaled.min())

    return df_scaled
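
One edge case the function does not handle: if every value in a column is identical, the maximum equals the minimum and the division becomes 0 / 0, which Pandas returns as NaN. The sketch below shows this on a toy Series together with one possible guard. Note that safe_min_max_scaling is a hypothetical helper, not part of the tutorial, and mapping constant data to 0.0 is an assumption rather than a standard rule.

```python
import pandas as pd

def safe_min_max_scaling(series):
    # Hypothetical helper: min-max scaling for a Series with a guard
    # for constant data.
    value_range = series.max() - series.min()
    if value_range == 0:
        # Assumption: map constant data to 0.0 instead of NaN.
        return pd.Series(0.0, index=series.index, name=series.name)
    return (series - series.min()) / value_range

# Toy data in which every value is the same.
constant = pd.Series([5, 5, 5])

# The unguarded formula divides 0 by 0 and yields NaN for every value.
unguarded = (constant - constant.min()) / (constant.max() - constant.min())
print(unguarded.isna().all())

# The guarded version returns zeros instead.
print(safe_min_max_scaling(constant).tolist())
```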

In [19]:
# Call the min_max_scaling function.
fitness['SedentaryMinutes_new'] = min_max_scaling(fitness['SedentaryMinutes'])

In [20]:
# View the new values of the sedentary minutes column.
fitness['SedentaryMinutes_new']

0      0.505556
1      0.538889
2      0.845833
3      0.504167
4      0.536806
         ...   
935    0.815278
936    0.785417
937    0.824306
938    0.782639
939    0.534722
Name: SedentaryMinutes_new, Length: 940, dtype: float64
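
As a quick sanity check, the scaled values above are consistent with a column minimum of 0 and a maximum of 1440 (the number of minutes in a day): for example, 728 / 1440 ≈ 0.505556. The sketch below reproduces the calculation on a small toy Series. The minimum and maximum are assumptions inferred from the output, not values read from daily_activity.csv.

```python
import pandas as pd

# Toy values: three taken from the output above, plus an assumed
# minimum of 0 and an assumed maximum of 1440 (minutes in a day).
sedentary = pd.Series([0, 728, 776, 1218, 1440], name='SedentaryMinutes')

# Min-max scaling: (x - min) / (max - min).
scaled = (sedentary - sedentary.min()) / (sedentary.max() - sedentary.min())

# Round to six decimal places to match the notebook display.
print(scaled.round(6).tolist())
```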

In [21]:
# View the values of lightly active minutes before scaling.
print(fitness['LightlyActiveMinutes'])

# Apply min-max scaling in Pandas using the .min() and .max() methods.
def min_max_scaling(df):
    # Copy the input so that the original data is not modified.
    df_scaled = df.copy()
    # Apply min-max scaling.
    if isinstance(df_scaled, pd.DataFrame):
        # DataFrame input: scale each column by its own minimum and maximum.
        for column in df_scaled.columns:
            col_min = df_scaled[column].min()
            col_max = df_scaled[column].max()
            df_scaled[column] = (df_scaled[column] - col_min) / (col_max - col_min)
    else:
        # Series input: scale the values directly.
        df_scaled = (df_scaled - df_scaled.min()) / (
            df_scaled.max() - df_scaled.min())

    return df_scaled

# Call the min_max_scaling function and view the scaled values.
fitness['LightlyActiveMinutes_new'] = min_max_scaling(fitness['LightlyActiveMinutes'])
fitness['LightlyActiveMinutes_new']

0      328
1      217
2      181
3      209
4      221
      ... 
935    245
936    217
937    224
938    213
939    137
Name: LightlyActiveMinutes, Length: 940, dtype: int64


0      0.633205
1      0.418919
2      0.349421
3      0.403475
4      0.426641
         ...   
935    0.472973
936    0.418919
937    0.432432
938    0.411197
939    0.264479
Name: LightlyActiveMinutes_new, Length: 940, dtype: float64
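
The demonstration only passes a single column (a Series) to min_max_scaling, so the DataFrame branch is never exercised. The sketch below runs the same per-column logic on a hypothetical toy DataFrame; the column names mirror the data set, but the values are made up for illustration. It also compares the result with a vectorised expression that relies on Pandas aligning .min() and .max() by column.

```python
import pandas as pd

def min_max_scaling(df):
    # Same logic as the tutorial function: scale a Series directly,
    # or scale each column of a DataFrame on its own.
    df_scaled = df.copy()
    if isinstance(df_scaled, pd.DataFrame):
        for column in df_scaled.columns:
            col_min = df_scaled[column].min()
            col_max = df_scaled[column].max()
            df_scaled[column] = (df_scaled[column] - col_min) / (col_max - col_min)
    else:
        df_scaled = (df_scaled - df_scaled.min()) / (df_scaled.max() - df_scaled.min())
    return df_scaled

# Hypothetical toy data; the values are not taken from daily_activity.csv.
toy = pd.DataFrame({
    'SedentaryMinutes': [600, 900, 1200],
    'LightlyActiveMinutes': [100, 150, 300],
})

# Each column is scaled by its own minimum and maximum.
scaled = min_max_scaling(toy)
print(scaled)

# Pandas aligns .min() and .max() by column, so one vectorised
# expression can replace the explicit loop.
vectorised = (toy - toy.min()) / (toy.max() - toy.min())
print(scaled.equals(vectorised))
```

Because each column uses its own minimum and maximum, every scaled column spans 0 to 1 regardless of its original units.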

In [22]:

# View the values of tracker distance before scaling.
print(fitness['TrackerDistance'])

# Apply maximum absolute scaling in Pandas using the .abs() and .max() methods.
def max_abs_scaling(df):
    # Copy the input so that the original data is not modified.
    df_scaled = df.copy()
    # Apply maximum absolute scaling.
    if isinstance(df_scaled, pd.DataFrame):
        # DataFrame input: divide each column by its largest absolute value.
        for column in df_scaled.columns:
            df_scaled[column] = df_scaled[column] / df_scaled[column].abs().max()
    else:
        # Series input: divide the values by their largest absolute value.
        df_scaled = df_scaled / df_scaled.abs().max()

    return df_scaled

# Call the max_abs_scaling function and view the scaled values.
fitness['TrackerDistance_new'] = max_abs_scaling(fitness['TrackerDistance'])
fitness['TrackerDistance_new']

0       8.500000
1       6.970000
2       6.740000
3       6.280000
4       8.160000
         ...    
935     8.110000
936    18.250000
937     8.150000
938    19.559999
939     6.120000
Name: TrackerDistance, Length: 940, dtype: float64


0      0.303247
1      0.248662
2      0.240457
3      0.224046
4      0.291117
         ...   
935    0.289333
936    0.651088
937    0.290760
938    0.697824
939    0.218337
Name: TrackerDistance_new, Length: 940, dtype: float64
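
Maximum absolute scaling divides each value by the largest absolute value in the column, so results fall between -1 and 1 and zeros stay at zero. The TrackerDistance values shown above are all positive, so the sketch below uses a hypothetical toy Series (not taken from the data set) to show how the two techniques differ once negative values are present.

```python
import pandas as pd

# Hypothetical toy values that include negatives.
change = pd.Series([-4.0, 0.0, 2.0, 8.0], name='change')

# Maximum absolute scaling: x / max(|x|). Signs and zeros are preserved.
max_abs = change / change.abs().max()

# Min-max scaling: (x - min) / (max - min). The minimum maps to 0.
min_max = (change - change.min()) / (change.max() - change.min())

# Show the original and both scaled versions side by side.
print(pd.DataFrame({'original': change, 'max_abs': max_abs, 'min_max': min_max}))
```

In this toy example, maximum absolute scaling keeps -4.0 negative (it becomes -0.5), whereas min-max scaling shifts it to 0.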