# Normalising (and standardising) a distribution

<a id='ToC'></a>

----------

# Table of contents
- [Introduction](#introduction)
- [Credits](#credits)
- [Importing libraries/packages and generating data](#generating-data)
- [Raw data](#raw-data)
- Removing ... from all numbers (inspecting a step)
  - [minimum value](#remove-miniumum-value)
  - [absolute minimum value](#remove-abs-miniumum-value)
  - [maximum value](#remove-maximum-value)
  - [min-max value](#remove-min-max-value)
  - [mean value](#remove-mean-value)
  - [median value](#remove-median-value)
  - [mad (aad) value](#remove-mad-value)
  - [trimean value](#remove-trimean-value)
  - [geomean value](#remove-geomean-value)
  - [harmonic mean value](#remove-harmonic-mean-value)
  - [Comparing: raw, minimum, absolute minimum, maximum, min-max, mean, median, trimean, geomean, harmonic mean values removed from raw](#comparing-removal-values)
- Normalise using ... method
  - [scikit-learn's normalize](#normalise-scikit-mean-method)
  - [min-max](#normalise-min-max-method)
  - [mean](#normalise-mean-method)  
  - [median](#normalise-median-method)
  - [mad (aad)](#normalise-mad-method)
  - [trimean value](#normalise-trimean-method)
  - [geomean value](#normalise-geomean-method)
  - [harmonic mean value](#normalise-harmonic-mean-method)
  - [exp](#normalise-exp-method)
  - [log base 2](#normalise-log2-method)
  - [natural log (base e)](#normalise-natural-log-method)
  - [log base 10](#normalise-log10-method)
  - [Comparing all Normalise actions together (mean, min-max, median, M.A.D (A.A.D), trimean, geomean, harmonic mean, scikitlearn normalize, exp, log base 2, log base e, log base 10)](#comparing-normalisation-methods)
  - [Comparing some Normalise actions together (scikitlearn normalize, mean, min-max, median, M.A.D (A.A.D), trimean, geomean, harmonic mean)](#comparing-normalisation-group1-methods) 
  - [Comparing some Normalise actions together (scikitlearn normalize, mean, M.A.D (A.A.D), trimean, geomean, harmonic mean)](#comparing-normalisation-group2-methods)
  - [Comparing some Normalise actions together (mean, min-max)](#comparing-normalisation-group3-methods) 
  - [Comparing some Normalise actions together (exp, log base 2, natural log, log base 10)](#comparing-normalisation-group4-methods)

<a id='introduction'></a>

----------

## Introduction

Many of us are familiar with the term "normalisation" (or "normalising"). While learning Data Science and ML topics, I have normalised my data (the distribution) a number of times, but rarely looked under the hood. Sometimes I just used the results without trying to find out whether there are different types of normalisation, what they do, what they mean, and what happens to the data (or the distribution) when we normalise it. There are also overlapping or overloaded terms that go along with normalisation (e.g. standardisation), and these add to the complexity of understanding what is happening to our data when we "normalise" or "standardise" it (here's [a good post to find out the differences and similarities](https://towardsdatascience.com/normalization-vs-standardization-cb8fe15082eb) between the two processes).

We can find all or many of these methods or processes scattered all over the internet (Medium, Kaggle, GitHub, etc.) in the form of papers, blogs, libraries or code snippets, each trying to implement or explain how to go about it and what happens to your data. Many try to explain when you should or should not use them, but they may not cover many different scenarios or other details that can help us understand the fundamentals (or understand numbers or maths better).

In this notebook, we try to cover a bit of both (and maybe other things as well), and show various examples and various ways to play with the distribution. When I say "we cover", it is my way of documenting what I know, what I'm learning and what I'm experimenting with. The rest is up for discussion, as there is a comments section for it. I'm not an expert with answers to all kinds of questions, but each question will help us learn and may also help me improve the explanations and examples. I'm always up for learning and for constructive feedback that will help me and everyone else learn as well.

The idea is to have a few notebooks, each focussed on a few combinations of methods/techniques for the process; this would make the notebooks easier to navigate, understand and learn from.

Not all of the work is mine, so I have credited the sources from which I got ideas and inspiration.

<a id='credits'></a>

----------

## Credits

<i><p style="font-size:20px; background-color: #FFF1D7; border: 2px solid black; margin: 20px; padding: 20px;">Inspirations from https://chrisalbon.com/python/data_wrangling/pandas_normalize_column/ and https://stackoverflow.com/questions/26414913/normalize-columns-of-pandas-data-frame.

<i><p style="font-size:20px; background-color: #FFF1D7; border: 2px solid black; margin: 20px; padding: 20px;">Imported from https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/notebooks/data/data-processing/Normalising-a-distribution.ipynb (main repo: https://github.com/neomatrix369/awesome-ai-ml-dl/).

<i><p style="font-size:20px; background-color: #FFF1D7; border: 2px solid black; margin: 20px; padding: 20px;">Note: a number of the ideas here are experimental

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='generating-data'></a>

----------

## Importing libraries/packages and generating data

In [None]:
import pandas as pd
import numpy as np

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
plt.rcParams["figure.figsize"] = (20,3)

# sns.set() resets the palette to its default, so apply the muted palette afterwards
sns.set(style="whitegrid", font_scale=1.75)
sns.set_palette(sns.color_palette("muted"))


In [None]:
USE_RANDOM_POINTS = False
if USE_RANDOM_POINTS:
    num_of_points=20
    scale = 0.1
    centre_of_distribution = 0.0
    data = {'score': np.random.normal(centre_of_distribution, scale, num_of_points) }
else:
    data = {'score': np.array([197.804709823081, 289.477303264081, 115.638244634081, 182.672436726281, 
                               163.140845540081, 152.161639519881, 52.9497664225806, 124.298876206981, 
                               345.322611126581, 351.037461758581, 16.7329927407806, 224.045215506981, 
                               128.358680497281, 352.815156425081, 57.9937040246806, 237.993704024681, 
                               136.756649537169, 9.38141788325076, 31.2864212190436, 128.305256283999, 
                               230.059113991138, 143.127694669196, 133.739843104433])
           }
    # len(data) would count the dictionary keys (1), so count the scores instead
    num_of_points = len(data['score'])

In [None]:
df = pd.DataFrame(data)
if USE_RANDOM_POINTS:
    df['score_20'] = df['score'] * 20
    df['score_100'] = df['score'] * 100
df

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='raw-data'></a>

----------

# Raw data

In [None]:
df['score'].plot(kind='bar')

In [None]:
df['score'].plot()

In [None]:
percentiles=[n*0.01 for n in range(10, 90, 10)] + [0.05, 0.25, 0.75, 0.95]
df['score'].describe(percentiles=percentiles)

<i><p style="font-size:20px; background-color: #FFF1D7; border: 2px solid black; margin: 20px; padding: 20px;">As the summary shows, the hard-coded raw data lies between roughly `9.4` and `352.8`; with `USE_RANDOM_POINTS = True` the generated values instead fall well within `-1` and `1`. We will see if the range has an impact after applying the respective transformations. That is, if we have values between `0` and `100` or between `-10` and `20`, will the transformations applied to them behave differently than with values between `-1` and `1`?

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-miniumum-value'></a>

----------

# Removing minimum value from all numbers (inspecting a step)

In [None]:
df['score_min_removed'] = df['score'] - df['score'].min()
df['score_min_removed']
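
To make the effect of this step concrete, here is a minimal sketch on two made-up sets of values (both arrays and the helper name `remove_min` are hypothetical, not part of the notebook's data). It suggests that subtracting the minimum always moves the smallest value to `0`, while the direction of the shift depends on the sign of that minimum.

```python
import numpy as np

# Hypothetical toy values: one set with a negative minimum, one all-positive
with_negative_min = np.array([-0.2, 0.0, 0.3])
all_positive = np.array([9.0, 50.0, 350.0])

def remove_min(values):
    # Subtract the smallest value so that the new minimum is exactly 0
    return values - values.min()

shifted_up = remove_min(with_negative_min)   # every value increases by 0.2
shifted_down = remove_min(all_positive)      # every value decreases by 9.0

print(shifted_up, shifted_down)
```

In both cases the spread (maximum minus minimum) is unchanged; only the position on the number line moves.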

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_min_removed']].plot()

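<i><p style="font-size:20px; background-color: #FFF1D7; border: 2px solid black; margin: 20px; padding: 20px;">**Note:** whether the plot of `score_min_removed` sits above or below `score` depends on the sign of the `min` value. With the hard-coded data the `min` value is positive, so `score_min_removed` sits lower than `score`. With the random data (`USE_RANDOM_POINTS = True`) the `min` value is typically a negative number, and removing a negative value from a number only increases it, i.e. `x - (-y) = x + y`, so don't be surprised if `score_min_removed` then looks higher on the number-line axis than `score`.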

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_min_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-abs-miniumum-value'></a>

----------

# Removing absolute minimum value from all numbers (inspecting a step)

In [None]:
df['score_abs_min_removed'] = df['score'] - abs(df['score'].min())
df['score_abs_min_removed']

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_abs_min_removed']].plot()

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_abs_min_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-maximum-value'></a>

----------

# Removing maximum value from all numbers (inspecting a step)

In [None]:
df['score_max_removed'] = df['score'] - df['score'].max()
df['score_max_removed']

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_max_removed']].plot()

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_max_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-min-max-value'></a>

----------

# Removing min-max from all numbers (inspecting a step)


In [None]:
df['score_min_max_removed'] = df['score'] - (df['score'].max() - df['score'].min())
df['score_min_max_removed']

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 5.0]
df[['score','score_min_max_removed']].plot()

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_min_max_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-mean-value'></a>

----------

# Removing mean from all numbers (inspecting a step)

In [None]:
df['score_mean_removed'] = df['score'] - df['score'].mean()
df['score_mean_removed']

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 5.0]
df[['score','score_mean_removed']].plot()

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 5.0]
df[['score','score_mean_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-median-value'></a>

----------

# Removing median from all numbers (inspecting a step)

In [None]:
df['score_median_removed'] = df['score'] - df['score'].median()
df['score_median_removed']

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 5.0]
df[['score','score_median_removed']].plot()

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 5.0]
df[['score','score_median_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-mad-value'></a>

----------

# Removing M.A.D value from all numbers (inspecting a step)

M.A.D (A.A.D) = Mean Absolute Deviation (Average Absolute Deviation)

        
References:
- https://en.wikipedia.org/wiki/Average_absolute_deviation
- https://www.statology.org/how-to-easily-calculate-the-mean-absolute-deviation-in-excel/
- https://www.khanacademy.org/math/cc-sixth-grade-math/cc-6th-data-statistics/cc-6-mad/v/mean-absolute-deviation
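
As a worked illustration of the definition above, the following sketch computes the mean absolute deviation by hand on four made-up numbers (the list `toy_values` is hypothetical): take the mean, measure each value's absolute distance from it, then average those distances.

```python
# Hypothetical toy values, chosen so the arithmetic is easy to follow
toy_values = [2.0, 4.0, 6.0, 8.0]

toy_mean = sum(toy_values) / len(toy_values)                # 5.0
abs_deviations = [abs(v - toy_mean) for v in toy_values]    # [3.0, 1.0, 1.0, 3.0]
toy_mad = sum(abs_deviations) / len(toy_values)             # 2.0

print(toy_mad)
```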

In [None]:
def mean_abs_dev(data):
    deviance = sum(abs(data - data.mean()))
    return deviance / len(data)

In [None]:
df['score_mad_removed'] = df['score'] - mean_abs_dev(df['score'])
df['score_mad_removed']

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_mad_removed']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-trimean-value'></a>

----------

# Removing trimean from all numbers (inspecting a step)

In [None]:
def trimean(values):
    return (np.quantile(values, 0.25) + (2 * np.quantile(values, 0.50)) + np.quantile(values, 0.75))/4

In [None]:
df['score_trimean_removed'] = df['score'] - trimean(df['score'])
df['score_trimean_removed']
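
The trimean weights the median twice and the two quartiles once each. A small sketch on made-up values (both arrays are hypothetical) hints at why it can be attractive: replacing the largest value with an outlier leaves the trimean untouched here, while the mean moves a lot.

```python
import numpy as np

# Hypothetical toy values, with and without an outlier in the last position
toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
toy_with_outlier = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

def trimean_sketch(values):
    # (Q1 + 2 * median + Q3) / 4, using NumPy's default linear interpolation
    q1, q2, q3 = np.quantile(values, [0.25, 0.50, 0.75])
    return (q1 + 2 * q2 + q3) / 4

print(trimean_sketch(toy), toy.mean())
print(trimean_sketch(toy_with_outlier), toy_with_outlier.mean())
```

This is only one small example; how robust the trimean is in practice depends on the data.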

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_trimean_removed']].plot()

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_trimean_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-geomean-value'></a>

----------

# Removing geomean from all numbers (inspecting a step)

In [None]:
def geomean(data):
    index = 0
    N = len(data)
    result = 1
    for index in range(0, N):
        result = result * data[index]
    if N != 0:
        return abs(result) ** (1 / N)
    else:
        raise ValueError('No distribution has been passed in, cannot compute Geometric Mean')

In [None]:
df['score_geomean_removed'] = df['score'] - geomean(df['score'])
df['score_geomean_removed']
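
Multiplying many values together can overflow (or underflow) for long or large-valued distributions. A common alternative, sketched below under the assumption that all values are strictly positive, is to average the logarithms and exponentiate the result. The function name `geomean_via_logs` is hypothetical, and unlike `geomean` above it rejects non-positive values instead of taking the absolute value of the product.

```python
import numpy as np

def geomean_via_logs(values):
    # Geometric mean computed as exp(mean(log(x))); assumes every value is > 0
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        raise ValueError('No distribution has been passed in, cannot compute Geometric Mean')
    if np.any(values <= 0):
        raise ValueError('This sketch only handles strictly positive values')
    return float(np.exp(np.mean(np.log(values))))

# 1 * 4 * 16 = 64, and the cube root of 64 is 4
print(geomean_via_logs([1.0, 4.0, 16.0]))
```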

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_geomean_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='remove-harmonic-mean-value'></a>

----------

# Removing harmonic mean from all numbers (inspecting a step)

In [None]:
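def harmonic_mean(data):
    # Harmonic mean = N / (sum of reciprocals); undefined if any value is zero
    N = len(data)
    if N == 0:
        raise ValueError('No distribution has been passed in, cannot compute Harmonic Mean')
    result = 0
    for index in range(0, N):
        if data[index] == 0:
            raise ValueError('Distribution contains one or more zeros, cannot compute Harmonic Mean')
        result = result + (1 / data[index])

    return N / result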

In [None]:
df['score_harmonic_mean_removed'] = df['score'] - harmonic_mean(df['score'])
df['score_harmonic_mean_removed']
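
The harmonic mean is defined as the number of values divided by the sum of their reciprocals. The sketch below checks that definition on three made-up numbers (the list `toy_values` is hypothetical) against `statistics.harmonic_mean` from the Python standard library.

```python
import statistics

# Hypothetical toy values
toy_values = [1.0, 2.0, 4.0]

# N / (1/1 + 1/2 + 1/4) = 3 / 1.75
manual_harmonic_mean = len(toy_values) / sum(1 / v for v in toy_values)
reference_harmonic_mean = statistics.harmonic_mean(toy_values)

print(manual_harmonic_mean, reference_harmonic_mean)
```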

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_harmonic_mean_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='comparing-removal-values'></a>

----------


# Comparing: raw, minimum, absolute minimum, maximum, min-max, mean, median, trimean, geomean, harmonic mean values removed from raw

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 12.0]
df[['score','score_min_removed', 'score_abs_min_removed','score_max_removed', 'score_min_max_removed', 'score_mean_removed', 
    'score_median_removed', 'score_trimean_removed', 'score_geomean_removed', 'score_harmonic_mean_removed']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 16.0]
df[['score','score_min_removed', 'score_abs_min_removed','score_max_removed', 'score_min_max_removed', 
    'score_mean_removed', 'score_median_removed', 'score_trimean_removed', 'score_geomean_removed', 
    'score_harmonic_mean_removed']].plot(kind='bar')

<i><p style="font-size:20px; background-color: #FFF1D7; border: 2px solid black; margin: 20px; padding: 20px;">Interesting to see that the plots of `score_mean_removed`, `score_median_removed`, `score_trimean_removed`, `score_geomean_removed`, and `score_harmonic_mean_removed` are quite close to each other.

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_mean_removed', 'score_median_removed', 'score_trimean_removed', 
    'score_geomean_removed', 'score_harmonic_mean_removed']].plot()

In [None]:
plt.rcParams['figure.figsize'] = [20.0, 8.0]
df[['score','score_mean_removed', 'score_median_removed', 'score_trimean_removed', 
    'score_geomean_removed', 'score_harmonic_mean_removed']].plot(kind='bar')

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-scikit-mean-method'></a>

----------

# Normalise using scikit-learn's normalize function

In [None]:
from sklearn.preprocessing import normalize

In [None]:
values = normalize(np.array(df['score']).reshape(1,-1))
df['score_sklearn_normalize'] = values[0]
df['score_sklearn_normalize']
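
By default `normalize` uses the L2 norm and works row by row, so after `reshape(1,-1)` the whole column is treated as a single row and every value is divided by the Euclidean length of the column. The NumPy-only sketch below reproduces that arithmetic on a made-up two-value vector (the array `toy` is hypothetical); it illustrates the idea and does not call scikit-learn itself.

```python
import numpy as np

# Hypothetical toy vector: a 3-4-5 triangle keeps the numbers easy to check
toy = np.array([3.0, 4.0])

l2_norm = np.sqrt(np.sum(toy ** 2))   # Euclidean length, 5.0
unit_vector = toy / l2_norm           # [0.6, 0.8]

print(unit_vector, np.linalg.norm(unit_vector))
```

Note that this rescales the values without centring them: nothing is subtracted, so the mean is not moved to zero.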

In [None]:
df[['score','score_sklearn_normalize']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-min-max-method'></a>

----------

# Normalise by min-max method

In [None]:
def normalise_min_max(data):
    # Shift by the minimum, then divide by the range, so results lie in [0, 1]
    return (data - data.min()) / (data.max() - data.min())

In [None]:
df['score_normalise_min_max'] = normalise_min_max(df['score'])
df['score_normalise_min_max']
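
The usual min-max formula is `(x - min) / (max - min)`, which maps the smallest value to `0` and the largest to `1`. A minimal sketch on three made-up numbers (the list `toy_values` is hypothetical):

```python
# Hypothetical toy values
toy_values = [10.0, 15.0, 20.0]

lowest, highest = min(toy_values), max(toy_values)
scaled = [(v - lowest) / (highest - lowest) for v in toy_values]

print(scaled)  # [0.0, 0.5, 1.0]
```

The formula divides by the range, so it is undefined when all values are identical.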

In [None]:
df[['score', 'score_normalise_min_max']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-mean-method'></a>

----------

# Normalise by mean method

This method is also known by the term **standardisation**.

Resources:
- [What are the most common data normalization methods used in machine learning?](https://www.quora.com/What-are-the-most-common-data-normalization-methods-used-in-machine-learning?share=1)
- [Normalisation versus Standardisation](https://towardsdatascience.com/normalization-vs-standardization-cb8fe15082eb)

In [None]:
def normalise_mean(data):
    return (data - data.mean()) / data.std()

In [None]:
df['score_normalise_mean'] = normalise_mean(df['score'])
df['score_normalise_mean']
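
After standardising, the values should have a mean of (approximately) `0` and a standard deviation of `1`. The sketch below checks this on made-up numbers (the array `toy` is hypothetical). One detail worth knowing: pandas' `Series.std()` uses the sample standard deviation (`ddof=1`) by default, whereas NumPy's `std()` defaults to `ddof=0`, so `ddof=1` is passed explicitly here to mirror the pandas behaviour.

```python
import numpy as np

# Hypothetical toy values
toy = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_std = toy.std(ddof=1)               # same convention as pandas' default
z_scores = (toy - toy.mean()) / sample_std

print(z_scores.mean(), z_scores.std(ddof=1))
```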

In [None]:
df[['score', 'score_normalise_mean']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-median-method'></a>

----------

# Normalise by median method (experimental)


In [None]:
def std_dev_median(data):
    variance = (data - data.median()) ** 2
    return sum(variance) ** (1 / 2)

def normalise_median(data):
    return (data - data.median()) / std_dev_median(data)

In [None]:
df['score_normalise_median'] = normalise_median(df['score'])
df['score_normalise_median']
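
One thing to notice about `std_dev_median` is that it takes the square root of the sum of squared deviations without dividing by the number of points, so it is the Euclidean length of the centred values rather than a standard deviation in the usual sense. A consequence, sketched below on made-up numbers (the array `toy` is hypothetical), is that the squares of the normalised values add up to `1`.

```python
import numpy as np

# Hypothetical toy values; the median is 2.5
toy = np.array([1.0, 2.0, 3.0, 10.0])

centred = toy - np.median(toy)
euclidean_length = np.sqrt(np.sum(centred ** 2))   # no division by N
scaled = centred / euclidean_length

print(scaled, np.sum(scaled ** 2))
```

The same observation applies to any centre that is subtracted and then divided by a length computed this way, which may help when comparing the experimental variants.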

In [None]:
df[['score','score_normalise_median']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-mad-method'></a>

----------

# Normalise by M.A.D (A.A.D) method (experimental)

M.A.D (A.A.D) = Mean Absolute Deviation (Average Absolute Deviation)

References:
- https://en.wikipedia.org/wiki/Average_absolute_deviation
- https://www.statology.org/how-to-easily-calculate-the-mean-absolute-deviation-in-excel/
- https://www.khanacademy.org/math/cc-sixth-grade-math/cc-6th-data-statistics/cc-6-mad/v/mean-absolute-deviation

In [None]:
def std_dev_mad(data):
    variance = (data - mean_abs_dev(data)) ** 2
    return sum(variance) ** (1 / 2)

def normalise_mad(data):
    return (data - mean_abs_dev(data)) / std_dev_mad(data)

In [None]:
df['score_normalise_mad'] = normalise_mad(df['score'])
df['score_normalise_mad']

In [None]:
df[['score','score_normalise_mad']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-trimean-method'></a>

----------

# Normalise by trimean method (experimental)

In [None]:
def std_dev_trimean(data):
    variance = (data - trimean(data)) ** 2
    return sum(variance) ** (1 / 2)

def normalise_trimean(data):
    return (data - trimean(data)) / std_dev_trimean(data)

In [None]:
df['score_normalise_trimean'] = normalise_trimean(df['score'])
df['score_normalise_trimean']

In [None]:
df[['score','score_normalise_trimean']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-geomean-method'></a>

----------

# Normalise by geomean method (experimental)

In [None]:
def std_dev_geomean(data):
    variance = (data - geomean(data)) ** 2
    return sum(variance) ** (1 / 2)

def normalise_geomean(data):
    return (data - geomean(data)) / std_dev_geomean(data)

In [None]:
df['score_normalise_geomean'] = normalise_geomean(df['score'])
df['score_normalise_geomean']

In [None]:
df[['score','score_normalise_geomean']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-harmonic-mean-method'></a>

----------

# Normalise by harmonic mean method (experimental)

In [None]:
def std_dev_harmonic_mean(data):
    variance = (data - harmonic_mean(data)) ** 2
    return sum(variance) ** (1 / 2)

def normalise_harmonic_mean(data):
    return (data - harmonic_mean(data)) / std_dev_harmonic_mean(data)

In [None]:
df['score_normalise_harmonic_mean'] = normalise_harmonic_mean(df['score'])
df['score_normalise_harmonic_mean']

In [None]:
df[['score','score_normalise_harmonic_mean']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-exp-method'></a>

----------

# Normalise using exp (experimental)

In [None]:
df['score_exp'] = df['score'].apply(np.exp)
df['score_exp']
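
Because `exp` grows very quickly, it is worth checking that the inputs stay within what a 64-bit float can represent. The sketch below uses the largest hard-coded score from the data above, plus an arbitrary larger input (`710.0`, chosen purely for illustration) that overflows to infinity.

```python
import numpy as np

# Largest value in the hard-coded 'score' data above
largest_default_score = 352.815156425081
finite_result = np.exp(largest_default_score)    # very large, but still finite

# 710.0 is an illustrative input just past the float64 limit for exp
with np.errstate(over='ignore'):
    overflowed_result = np.exp(np.float64(710.0))  # becomes inf

print(finite_result, overflowed_result)
```

On a shared plot, values of this magnitude will dwarf every other column, which is worth keeping in mind when reading the comparison charts.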

In [None]:
df[['score', 'score_exp']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-log2-method'></a>

----------

# Normalise using log (base 2) (experimental)

In [None]:
df['score_log_base_2'] = df['score'].apply(np.log2)
df['score_log_base_2']

In [None]:
df[['score', 'score_log_base_2']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-natural-log-method'></a>

----------

# Normalise using natural log (base e) (experimental)

In [None]:
df['score_log_base_e'] = df['score'].apply(np.log)
df['score_log_base_e']

In [None]:
df[['score', 'score_log_base_e']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='normalise-log10-method'></a>

----------

# Normalise using log (base 10) (experimental)

In [None]:
df['score_log_base_10'] = df['score'].apply(np.log10)
df['score_log_base_10']
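
The three logarithms differ only by a constant factor, since `log_b(x) = ln(x) / ln(b)`. This is why their curves have the same shape at different heights. A quick check on made-up positive values (the array `toy` is hypothetical):

```python
import numpy as np

# Hypothetical toy values; logarithms are only defined for positive numbers
toy = np.array([1.0, 8.0, 100.0])

natural_log = np.log(toy)
log2_from_natural = natural_log / np.log(2.0)
log10_from_natural = natural_log / np.log(10.0)

print(np.allclose(log2_from_natural, np.log2(toy)))
print(np.allclose(log10_from_natural, np.log10(toy)))
```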

In [None]:
df[['score', 'score_log_base_10']].plot()

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='comparing-normalisation-methods'></a>

----------


# Comparing all Normalise actions together (mean, min-max, median, M.A.D (A.A.D), trimean, geomean, harmonic mean, scikitlearn normalize, exp, log base 2, log base e, log base 10)

In [None]:
df.columns

In [None]:
columns_to_show = ['score', 'score_sklearn_normalize', 'score_normalise_mean', 'score_normalise_min_max', 'score_normalise_median',
                   'score_normalise_mad', 'score_normalise_trimean', 'score_normalise_geomean', 'score_normalise_harmonic_mean', 
                   'score_exp', 'score_log_base_2', 'score_log_base_e', 'score_log_base_10']

plt.rcParams['figure.figsize'] = [20.0, 20.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

In [None]:
columns_to_show = ['score_sklearn_normalize', 'score_normalise_mean', 'score_normalise_min_max', 'score_normalise_median',
                   'score_normalise_mad', 'score_normalise_trimean', 'score_normalise_geomean', 'score_normalise_harmonic_mean', 
                   'score_exp', 'score_log_base_2', 'score_log_base_e', 'score_log_base_10']

plt.rcParams['figure.figsize'] = [20.0, 20.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

<i><p style="font-size:20px; background-color: #FFF1D7; border: 2px solid black; margin: 20px; padding: 20px;"> `score_normalise_mean` is a lot less smooth than `score_sklearn_normalize` or `score_normalise_min_max`, while the log variants are discontinuous wherever a score is zero or negative (as can happen with the random data), because the logarithm is undefined there. Although discontinuous, they do trace the deviations of the other plots.

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='comparing-normalisation-group1-methods'></a>

----------


# Comparing some Normalise actions together (scikitlearn normalize, mean, min-max, median, M.A.D (A.A.D), trimean, geomean, harmonic mean)

In [None]:
columns_to_show = ['score', 'score_sklearn_normalize', 'score_normalise_mean', 'score_normalise_min_max', 'score_normalise_median',
                   'score_normalise_mad','score_normalise_trimean', 'score_normalise_geomean', 'score_normalise_harmonic_mean']

plt.rcParams['figure.figsize'] = [20.0, 12.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

In [None]:
columns_to_show = ['score_sklearn_normalize', 'score_normalise_mean', 'score_normalise_min_max', 'score_normalise_median',
                   'score_normalise_mad','score_normalise_trimean', 'score_normalise_geomean', 'score_normalise_harmonic_mean']

plt.rcParams['figure.figsize'] = [20.0, 12.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='comparing-normalisation-group2-methods'></a>

----------


# Comparing some Normalise actions together (scikitlearn normalize, mean, M.A.D (A.A.D), trimean, geomean, harmonic mean)

In [None]:
columns_to_show = ['score', 'score_sklearn_normalize', 'score_normalise_median', 'score_normalise_mad', 'score_normalise_trimean', 
                   'score_normalise_geomean', 'score_normalise_harmonic_mean']

plt.rcParams['figure.figsize'] = [20.0, 12.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

In [None]:
columns_to_show = ['score_sklearn_normalize', 'score_normalise_median', 'score_normalise_mad', 'score_normalise_trimean', 
                   'score_normalise_geomean', 'score_normalise_harmonic_mean']

plt.rcParams['figure.figsize'] = [20.0, 12.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='comparing-normalisation-group3-methods'></a>

----------


# Comparing some Normalise actions together (mean, min-max)

In [None]:
columns_to_show = ['score', 'score_normalise_mean', 'score_normalise_min_max']

plt.rcParams['figure.figsize'] = [20.0, 12.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

In [None]:
columns_to_show = ['score_normalise_mean', 'score_normalise_min_max']

plt.rcParams['figure.figsize'] = [20.0, 12.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>

<a id='comparing-normalisation-group4-methods'></a>

----------


# Comparing some Normalise actions together (exp, log base 2, natural log, log base 10)

In [None]:
columns_to_show = ['score', 'score_exp', 'score_log_base_2', 'score_log_base_e', 'score_log_base_10']

plt.rcParams['figure.figsize'] = [20.0, 8.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

In [None]:
columns_to_show = ['score_exp', 'score_log_base_2', 'score_log_base_e', 'score_log_base_10']

plt.rcParams['figure.figsize'] = [20.0, 8.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

In [None]:
columns_to_show = ['score_log_base_2', 'score_log_base_e', 'score_log_base_10']

plt.rcParams['figure.figsize'] = [20.0, 8.0]
plt.plot(df[columns_to_show])
plt.legend(columns_to_show)

<a href='#ToC'><span class="label label-info" style="font-size: 125%">Back to Table of Contents</span></a>