## Data transformation

#### 1. Smoothing

Let your data be

$D = \{x_i\}_{i=1}^{8} = \{1, 2, 3, 2, 3, 1, 4, 3\}$

X = [1, 2, 3, 2, 3, 1, 4, 3]

#### 2. Logarithm transform:

$x_i \to \log x_i$ (natural logarithm)

In [6]:
import numpy as np
X = [1, 2, 3, 2, 3, 1, 4, 3]
lX = [np.log(x) for x in X]
print(np.round(lX,2))

# D → {0.0, 0.69, 1.1, 0.69, 1.1, 0.0, 1.39, 1.1}

[0.   0.69 1.1  0.69 1.1  0.   1.39 1.1 ]
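The same transform can be written without the list comprehension, since `np.log` works element-wise on arrays. The sketch below is illustrative only; the names `log_X` and `recovered` are not from the original notebook. It also shows that `np.exp` undoes the transform, which assumes every value is strictly positive.

```python
import numpy as np

X = np.array([1, 2, 3, 2, 3, 1, 4, 3], dtype=float)

# Natural log applied element-wise; only meaningful for strictly positive values.
log_X = np.log(X)

# np.exp inverts the transform, recovering the original data up to rounding error.
recovered = np.exp(log_X)

print(np.round(log_X, 2))
print(np.allclose(recovered, X))
```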


#### 3. k-step simple moving average (for time series):

In [7]:
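# for k = 3: each value is replaced by the mean of itself and the two preceding values;
# the first k-1 = 2 values are kept unchanged because no full window exists for them yet
smaX = [X[0], X[1]] + [np.mean([X[i-2], X[i-1], X[i]]) for i in range(2, len(X))]
print(np.round(smaX,2))

# D → {1.00, 2.00, 2.00, 2.33, 2.67, 2.00, 2.67, 2.67}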

[1.   2.   2.   2.33 2.67 2.   2.67 2.67]
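The cell above hard-codes k = 3. A possible generalization to any window length is sketched below using `np.convolve`; the function name `simple_moving_average` is hypothetical, and copying the first k-1 values unchanged is an assumption chosen to match the output above rather than a standard rule.

```python
import numpy as np

def simple_moving_average(values, k):
    """Trailing k-step mean; the first k-1 entries are copied unchanged (assumed convention)."""
    values = np.asarray(values, dtype=float)
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    window = np.ones(k) / k
    # mode="valid" keeps only the positions where the full window fits.
    full_windows = np.convolve(values, window, mode="valid")
    return np.concatenate([values[:k - 1], full_windows])

X = [1, 2, 3, 2, 3, 1, 4, 3]
sma3 = simple_moving_average(X, 3)
print(np.round(sma3, 2))
```

Other conventions for the first k-1 positions (dropping them, or averaging over a shorter window) are equally valid; which one fits depends on how the smoothed series is used afterwards.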


#### 4. Aggregation:

In [16]:
# list the indices of X; the aggregation below pairs them as (0, 1), (2, 3), ...
for j in range(len(X)):
    print(j)

0
1
2
3
4
5
6
7


In [23]:
# X = [1, 2, 3, 2, 3, 1, 4, 3]
# average each non-overlapping pair of consecutive values: (X[0], X[1]), (X[2], X[3]), ...
agg = [1/2*(X[j]+X[j+1]) for j in range(0,len(X)-1,2)]
print(np.round(agg,2))
# D → {1.5, 2.5, 2, 3.5}

[1.5 2.5 2.  3.5]
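The pairwise averaging above is a special case of aggregating over non-overlapping blocks. A minimal sketch for an arbitrary block size follows; the helper name `aggregate_mean` is hypothetical, and dropping trailing values that do not fill a complete block is an assumption.

```python
import numpy as np

def aggregate_mean(values, block_size):
    """Mean of each consecutive, non-overlapping block of `block_size` values."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // block_size
    # Assumption: trailing values that do not fill a whole block are dropped.
    trimmed = values[:n_blocks * block_size]
    # One row per block, then average across each row.
    return trimmed.reshape(n_blocks, block_size).mean(axis=1)

X = [1, 2, 3, 2, 3, 1, 4, 3]
agg2 = aggregate_mean(X, 2)
agg4 = aggregate_mean(X, 4)
print(agg2)
print(agg4)
```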


#### 5. Min-max normalization:

In [25]:
# D → {0.0, 0.33, 0.67, 0.33, 0.67, 0.0, 1.0, 0.67}
mmX = [(x-np.min(X))/(np.max(X)-np.min(X)) for x in X]
print(np.round(mmX,2))

[0.   0.33 0.67 0.33 0.67 0.   1.   0.67]
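The comprehension above computes $(x_i - \min D) / (\max D - \min D)$ and recomputes the minimum and maximum for every element. A vectorized sketch is given below; the function name `min_max_normalize` is hypothetical, and returning all zeros for constant input (where the range is zero) is an assumed convention, since the formula is undefined in that case.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Assumed convention: constant input has zero range, so map it to all zeros.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

X = [1, 2, 3, 2, 3, 1, 4, 3]
mm = min_max_normalize(X)
print(np.round(mm, 2))
```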


#### 6. Standardization (or z-score normalization):


In [26]:
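# D → {−1.39, −0.38, 0.63, −0.38, 0.63, −1.39, 1.64, 0.63}
# subtract the mean and divide by the standard deviation;
# np.std uses the population standard deviation (ddof=0) by default
sX = [(x - np.mean(X))/(np.std(X)) for x in X]
print(np.round(sX,2))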
print(np.round(sX,2))

[-1.39 -0.38  0.63 -0.38  0.63 -1.39  1.64  0.63]
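As a quick check on the result, standardized data should have mean 0 and standard deviation 1. The sketch below verifies this with array operations; the variable names are illustrative. It also shows the sample standard deviation (`ddof=1`), which some tools use instead and which would give slightly smaller z-scores for this data.

```python
import numpy as np

X = np.array([1, 2, 3, 2, 3, 1, 4, 3], dtype=float)

# Population standard deviation (ddof=0), matching np.std's default.
z = (X - X.mean()) / X.std()

print(np.round(z, 2))
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))

# Sample standard deviation (ddof=1) divides by n-1 instead of n.
sample_std = X.std(ddof=1)
print(round(float(sample_std), 4))
```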
