# Machine Learning 1 - Exercise Sheet 1

## Viktor Vironski (4330455), Andy Disser (5984875), Trung (Matrikelnummer)



### Exercise 1. Min-Max normalization

Apply min-max normalization to a target range [a, b] using the following formula:

$$
X_{scaled} = a + \frac{(X - X_{min}) \cdot (b - a)}{X_{max} - X_{min}}
$$
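
As a quick sanity check of the formula, the following minimal sketch applies it to a small made-up array. The helper name `min_max_value` and the sample values are assumptions chosen for illustration and are not part of the exercise code.

```python
import numpy as np

def min_max_value(x, x_min, x_max, a=0.0, b=1.0):
    """Map x from [x_min, x_max] onto [a, b] (hypothetical helper)."""
    return a + (x - x_min) * (b - a) / (x_max - x_min)

# made-up sample values, scaled to the range [-1, 1]
values = np.array([2.0, 4.0, 6.0, 10.0])
scaled = min_max_value(values, values.min(), values.max(), a=-1.0, b=1.0)

# the smallest value maps to a = -1, the largest to b = 1
print(scaled)
```

Here the minimum 2.0 maps to -1, the maximum 10.0 maps to 1, and 6.0 (the midpoint of the original range) maps to 0.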

In [194]:
from sklearn import datasets
import numpy as np

# prevent numpy exponential notation on print 
np.set_printoptions(suppress=True)

# implement Min-Max Normalization for a target range [min_range, max_range]
def min_max_scale(data: np.ndarray, min_range: float, max_range: float):
    
    # copy existing data to a new workable instance
    new_data = data.copy()
    
    # calculate global minimum and global maximum of the data
    min_value = new_data.min()
    max_value = new_data.max()
    
    # normalize data using formula
    # x_scaled = min_range + (x - min_value) * (max_range - min_range) / (max_value - min_value)
    new_data -= min_value
    new_data *= (max_range - min_range) / (max_value - min_value)
    new_data += min_range
    
    # return the normalized data set, the global minimum and the global maximum
    return new_data, min_value, max_value


# set iris to a numpy array from the iris data
iris = np.array(datasets.load_iris().data)

### a) scale all features separately
# an initial attempt at transposing the data matrix did not work as planned, hence the loop implementation
# note: each row of iris is one sample, so this loop scales the four measurements of each sample, not each feature column

iris_feature_vectors_normed = []

# for all rows in the iris data set apply Min-Max Normalization
for i in range(0, len(iris)):
    iris_feature_vectors_normed.append(min_max_scale(iris[i], 0, 1))

# print first five data points of the feature normed iris dataset
print('a) The first five data points of the feature vectors scaled dataset are:', '\n')
for i in range (0, 5):
    print(iris_feature_vectors_normed[i][0])

print('\n')


### b) scale the full dataset

# apply Min-Max Normalization to the full dataset
iris_full_dataset_normed = min_max_scale(iris, 0, 1)

# print first five data points of the fully scaled dataset
print('b) The first five data points of the fully scaled data set are:', '\n')
for j in range (0, 5):
    print(iris_full_dataset_normed[0][j])



a) The first five data points of the feature vectors scaled dataset are: 

[1.         0.67346939 0.24489796 0.        ]
[1.         0.59574468 0.25531915 0.        ]
[1.         0.66666667 0.24444444 0.        ]
[1.         0.65909091 0.29545455 0.        ]
[1.         0.70833333 0.25       0.        ]


b) The first five data points of the fully scaled data set are: 

[0.64102564 0.43589744 0.16666667 0.01282051]
[0.61538462 0.37179487 0.16666667 0.01282051]
[0.58974359 0.3974359  0.15384615 0.01282051]
[0.57692308 0.38461538 0.17948718 0.01282051]
[0.62820513 0.44871795 0.16666667 0.01282051]
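
The loop in part a) passes one row (one sample) at a time to `min_max_scale`, which is why every row of the output contains exactly one 0 and one 1. If the goal is to scale each feature column on its own, one possible approach is to take the minimum and maximum along `axis=0` and let NumPy broadcasting do the rest. The sketch below is an assumption about how this could look; the function name `min_max_scale_columns` and the toy array are hypothetical and not taken from the exercise.

```python
import numpy as np

def min_max_scale_columns(data, min_range=0.0, max_range=1.0):
    """Scale every column of a 2D array to [min_range, max_range] (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # per-column minimum and maximum, each of shape (n_features,)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # broadcasting applies each column's own min and max to that column
    return min_range + (data - col_min) * (max_range - min_range) / (col_max - col_min)

# toy data: 3 samples, 2 features on very different scales
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 40.0]])
toy_scaled = min_max_scale_columns(toy)
print(toy_scaled)
```

The same call could be applied to the `iris` array. Note that a constant column would lead to a division by zero, which this sketch does not handle.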


### Exercise 2. Z-Score Normalization
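
For reference, the z-score computed by the `zscore` function in the code cell for this exercise can be written as follows. The symbols $X_{z}$, $\mu$, $\sigma$ and $N$ are notation assumed here for illustration; the division by $N$ (population standard deviation) mirrors the code, which divides by the number of data points.

$$
X_{z} = \frac{X - \mu}{\sigma}, \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2}
$$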

In [228]:
from sklearn import datasets
import numpy as np
import math

# prevent numpy exponential notation on print 
np.set_printoptions(suppress=True)

# implement Z-Score Normalization
def zscore(data: np.ndarray):
    
    # copy existing data to a new workable instance
    new_data = data.copy()
    
    # calculate mean
    # numpy sum function adds all values in the data set
    # numpy size attribute gives the number of elements in the array, i.e. the number of data points
    mean = new_data.sum() / new_data.size
    
    # subtract mean from every data point
    new_data -= mean
    
    # calculate standard deviation
    # copy the mean-centred data into a new array and raise it to the power of two
    # sum the squared deviations of all data points and divide by the number of data points
    # calculate square root of the result
    stan_div_matrix = np.power(new_data.copy(), 2)
    stan_div = math.sqrt(stan_div_matrix.sum() / stan_div_matrix.size)
    
    # normalize data using formula x_new = (x - mean)/stan_div
    # (x - mean) has already been calculated above, so only the division by the standard deviation remains
    new_data *= 1 / stan_div
    
    # return Z-Score normalized data
    return new_data


# set iris to a numpy array from the iris data
iris = np.array(datasets.load_iris().data)


### a) scale variables separately

# calculate the Z-Score normalization of every variable separately
# transpose the resulting array to be able to concatenate the four arrays back into a full data set
sepal_length = zscore(iris[:, 0:1]).transpose()
sepal_width = zscore(iris[:, 1:2]).transpose()
petal_length = zscore(iris[:, 2:3]).transpose()
petal_width = zscore(iris[:, 3:4]).transpose()

# concatenate the four arrays and transpose to acquire the normalized dataset
iris_variables_normed = np.concatenate((sepal_length,sepal_width,petal_length,petal_width), axis=0).transpose()

# print first five datapoints of the Z-Score normalized iris dataset
print('a) The first five data points of the feature vectors scaled dataset are:', '\n')
for i in range (0, 5):
    print(iris_variables_normed[i])

print('\n')


### b) scale the full dataset

# apply Z-Score Normalization on iris dataset
iris_full_dataset_normed = zscore(iris)

# print first five datapoints of the Z-Score normalized iris dataset
print('b) The first five data points of the fully scaled dataset are:', '\n')
for j in range(0,5):
    print(iris_full_dataset_normed[j])



a) The first five data points of the feature vectors scaled dataset are: 

[-0.90068117  1.01900435 -1.34022653 -1.3154443 ]
[-1.14301691 -0.13197948 -1.34022653 -1.3154443 ]
[-1.38535265  0.32841405 -1.39706395 -1.3154443 ]
[-1.50652052  0.09821729 -1.2833891  -1.3154443 ]
[-1.02184904  1.24920112 -1.34022653 -1.3154443 ]


b) The first five data points of the fully scaled dataset are: 

[ 0.82858665  0.01798522 -1.04592915 -1.65388022]
[ 0.72726147 -0.23532773 -1.04592915 -1.65388022]
[ 0.62593629 -0.13400255 -1.09659174 -1.65388022]
[ 0.5752737  -0.18466514 -0.99526657 -1.65388022]
[ 0.77792406  0.06864781 -1.04592915 -1.65388022]
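
The slice-and-concatenate steps in part a) could also be expressed with NumPy's `mean` and `std` along `axis=0`. The following is a minimal sketch under that assumption; the function name `zscore_columns` and the toy array are hypothetical. `np.std` uses `ddof=0` by default, i.e. the population standard deviation, which corresponds to dividing by the number of data points as in `zscore`.

```python
import numpy as np

def zscore_columns(data):
    """Z-score every column of a 2D array separately (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # per-column mean and population standard deviation (ddof=0)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    # broadcasting subtracts and divides column by column
    return (data - mean) / std

# toy data: 3 samples, 2 features on different scales
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 60.0]])
toy_z = zscore_columns(toy)
print(toy_z)
```

After this transformation every column has mean 0 and standard deviation 1. Applied to the `iris` array, the result should agree with `iris_variables_normed` up to floating-point rounding.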


### Exercise 3. Plotting I