## scipy.stats.zscore: standardization (z-score transform)

> https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zscore.html?highlight=zscore#scipy.stats.zscore

In [1]:
import numpy as np
from scipy.stats import zscore

a = np.array([[0, 0], 
              [0, 0], 
              [1, 1], 
              [1, 1]])
zscore(a)  # by default, each column is standardized (axis=0)

array([[-1., -1.],
       [-1., -1.],
       [ 1.,  1.],
       [ 1.,  1.]])
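
The result above can be reproduced by hand, which makes the definition explicit: subtract each column's mean and divide by that column's standard deviation. This is a minimal sketch, assuming the default `ddof=0` (divide by n); the names `mean`, `std`, and `manual` are introduced here for illustration only.

```python
import numpy as np
from scipy.stats import zscore

a = np.array([[0, 0],
              [0, 0],
              [1, 1],
              [1, 1]], dtype=float)

# Column-wise mean and standard deviation with ddof=0 (divide by n)
mean = a.mean(axis=0)
std = a.std(axis=0, ddof=0)

# z-score by hand: centre each column, then scale it
manual = (a - mean) / std

# Compare the manual result with scipy's
print(np.allclose(manual, zscore(a)))
```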

In [2]:
import numpy as np
from scipy.stats import zscore

b = np.array([[ 0.3148,  0.0478,  0.6243,  0.4608],
              [ 0.7149,  0.0775,  0.6072,  0.9656],
              [ 0.6341,  0.1403,  0.9759,  0.4064],
              [ 0.5918,  0.6948,  0.904 ,  0.3721],
              [ 0.0921,  0.2481,  0.1188,  0.1366]])
zscore(b, axis=1, ddof=1)  # standardize each row; ddof=1 divides by n-1 when computing the standard deviation

array([[-0.19264823, -1.28415119,  1.07259584,  0.40420358],
       [ 0.33048416, -1.37380874,  0.04251374,  1.00081084],
       [ 0.26796377, -1.12598418,  1.23283094, -0.37481053],
       [-0.22095197,  0.24468594,  1.19042819, -1.21416216],
       [-0.82780366,  1.4457416 , -0.43867764, -0.1792603 ]])
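
To see what `axis=1` and `ddof=1` change, the same computation can be sketched with plain NumPy: the mean and standard deviation are taken per row, and the standard deviation divides by n-1. The helper names (`row_mean`, `row_std`, `manual`) are illustrative, not part of scipy.

```python
import numpy as np
from scipy.stats import zscore

b = np.array([[0.3148, 0.0478, 0.6243, 0.4608],
              [0.7149, 0.0775, 0.6072, 0.9656],
              [0.6341, 0.1403, 0.9759, 0.4064],
              [0.5918, 0.6948, 0.904,  0.3721],
              [0.0921, 0.2481, 0.1188, 0.1366]])

# keepdims=True keeps shape (5, 1) so the result broadcasts across each row
row_mean = b.mean(axis=1, keepdims=True)
row_std = b.std(axis=1, ddof=1, keepdims=True)  # sample std: divide by n-1

manual = (b - row_mean) / row_std

# Compare with scipy, then check that every row has mean 0 and sample std 1
print(np.allclose(manual, zscore(b, axis=1, ddof=1)))
print(np.allclose(manual.mean(axis=1), 0.0))
print(np.allclose(manual.std(axis=1, ddof=1), 1.0))
```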

## $\bigstar$ sklearn.preprocessing

> https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing

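- sklearn.preprocessing.StandardScaler: standardization (equivalent to zscore with ddof=0)
- sklearn.preprocessing.MinMaxScaler: min-max (range) scaling
- sklearn.preprocessing.MaxAbsScaler: proportional scaling by the maximum absolute value
- sklearn.preprocessing.normalize: vector normalization (cannot be inverted)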
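
As a rough summary, the four transforms listed above can be written as follows. The notation is introduced here for illustration: $x_1, \dots, x_n$ are the values of one feature column, $v$ is the vector being normalized, and MinMaxScaler is assumed to use its default target range of $[0, 1]$.

$$
\begin{aligned}
\text{StandardScaler:}\quad & x_i' = \frac{x_i - \mu}{\sigma}, \qquad \mu = \frac{1}{n}\sum_{j=1}^{n} x_j, \quad \sigma = \sqrt{\frac{1}{n}\sum_{j=1}^{n} (x_j - \mu)^2} \\
\text{MinMaxScaler:}\quad & x_i' = \frac{x_i - \min_j x_j}{\max_j x_j - \min_j x_j} \\
\text{MaxAbsScaler:}\quad & x_i' = \frac{x_i}{\max_j |x_j|} \\
\text{normalize:}\quad & v' = \frac{v}{\lVert v \rVert}
\end{aligned}
$$

The first three store their statistics ($\mu$, $\sigma$, minimum, maximum) at fit time, which is what makes the inverse transform possible; normalize discards $\lVert v \rVert$.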


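StandardScaler computes the standard deviation as numpy.std(x, ddof=0), dividing by n rather than n-1, so its standard deviation and variance are both biased estimates.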

In [3]:
import numpy as np
from sklearn.preprocessing import StandardScaler

a = np.array([[0, 0], 
              [0, 0], 
              [1, 1], 
              [1, 1]])
scaler = StandardScaler().fit(a)


print(scaler.mean_)
print(scaler.scale_)  # standard deviation
print(scaler.var_, '\n')  # variance

b = scaler.transform(a)  # forward transform
print(b, '\n')
print(scaler.inverse_transform(b), '\n')  # inverse transform


print(scaler.transform([[2, 2]]))  # scaled with the mean and standard deviation learned by fit

[0.5 0.5]
[0.5 0.5]
[0.25 0.25] 

[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]] 

[[0. 0.]
 [0. 0.]
 [1. 1.]
 [1. 1.]] 

[[3. 3.]]
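
The claim that StandardScaler matches zscore only for `ddof=0` can be checked directly. The sketch below uses a small made-up array `x` (not from the examples above) so that the `ddof=0` and `ddof=1` results actually differ.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.preprocessing import StandardScaler

# Illustrative data, chosen only for this check
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [4.0, 60.0]])

scaler = StandardScaler().fit(x)

# scale_ is the per-column standard deviation with ddof=0 (divide by n)
same_scale = np.allclose(scaler.scale_, np.std(x, axis=0, ddof=0))

# The transform matches zscore with ddof=0 but not with ddof=1
same_as_ddof0 = np.allclose(scaler.transform(x), zscore(x, axis=0, ddof=0))
same_as_ddof1 = np.allclose(scaler.transform(x), zscore(x, axis=0, ddof=1))

print(same_scale, same_as_ddof0, same_as_ddof1)
```

If the n-1 estimate is needed, one option is to call zscore with `ddof=1` instead of using the scaler.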


In [4]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

a = np.array([[-1, 2], 
              [-0.5, 6], 
              [0, 10], 
              [1, 18]])
scaler = MinMaxScaler().fit(a)


b = scaler.transform(a)  # forward transform
print(b, '\n')
print(scaler.inverse_transform(b), '\n')  # inverse transform

[[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [1.   1.  ]] 

[[-1.   2. ]
 [-0.5  6. ]
 [ 0.  10. ]
 [ 1.  18. ]] 
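
MinMaxScaler's output can be reproduced by hand as (x - min) / (max - min) per column. This is a minimal sketch assuming the default `feature_range=(0, 1)`; `col_min`, `col_max`, and `manual` are illustrative names.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

a = np.array([[-1, 2],
              [-0.5, 6],
              [0, 10],
              [1, 18]])

# Per-column minimum and maximum
col_min = a.min(axis=0)
col_max = a.max(axis=0)

# Shift each column so its minimum is 0, then divide by its range
manual = (a - col_min) / (col_max - col_min)

scaler = MinMaxScaler().fit(a)
print(np.allclose(manual, scaler.transform(a)))

# The fitted scaler keeps the statistics needed for inverse_transform
print(np.allclose(scaler.data_min_, col_min), np.allclose(scaler.data_max_, col_max))
```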



In [5]:
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

a = np.array([[-1, 2], 
              [-0.5, 6], 
              [0, 10], 
              [1, 18]])
scaler = MaxAbsScaler().fit(a)


b = scaler.transform(a)  # forward transform
print(b, '\n')
print(scaler.inverse_transform(b), '\n')  # inverse transform

[[-1.          0.11111111]
 [-0.5         0.33333333]
 [ 0.          0.55555556]
 [ 1.          1.        ]] 

[[-1.   2. ]
 [-0.5  6. ]
 [ 0.  10. ]
 [ 1.  18. ]] 
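
MaxAbsScaler only divides each column by its largest absolute value, so signs and zeros are preserved and no centring takes place. A minimal sketch of the equivalent NumPy computation (the names `max_abs` and `manual` are illustrative):

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

a = np.array([[-1, 2],
              [-0.5, 6],
              [0, 10],
              [1, 18]])

# Largest absolute value in each column
max_abs = np.abs(a).max(axis=0)

# Pure rescaling: no shift, so 0 stays 0 and signs are unchanged
manual = a / max_abs

scaler = MaxAbsScaler().fit(a)
print(np.allclose(manual, scaler.transform(a)))
print(np.allclose(scaler.max_abs_, max_abs))
```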



In [6]:
import numpy as np
from sklearn.preprocessing import normalize

X = [[ 1., -1.,  2.],
     [ 2.,  0.,  0.],
     [ 0.,  1., -1.]]
X_normalized = normalize(X, axis=0, norm='l2')  # axis=0 normalizes each column; norm: {'l1', 'l2', 'max'}

X_normalized

array([[ 0.4472136 , -0.70710678,  0.89442719],
       [ 0.89442719,  0.        ,  0.        ],
       [ 0.        ,  0.70710678, -0.4472136 ]])
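
The column-wise L2 normalization above amounts to dividing each column by its Euclidean norm. The sketch below reproduces it with `np.linalg.norm` and also illustrates why the operation cannot be undone: inputs that differ only by a positive scale factor give the same output, and the norms are not stored anywhere. The names `col_norms` and `manual` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[1., -1.,  2.],
              [2.,  0.,  0.],
              [0.,  1., -1.]])

# Euclidean (L2) norm of each column
col_norms = np.linalg.norm(X, axis=0)

manual = X / col_norms
print(np.allclose(manual, normalize(X, axis=0, norm='l2')))

# Every column of the result has unit L2 norm
print(np.allclose(np.linalg.norm(manual, axis=0), 1.0))

# X and 3 * X normalize to the same matrix, so the original scale is lost
print(np.allclose(normalize(X, axis=0), normalize(3 * X, axis=0)))
```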