# How wavelet packet energy features were calculated

When applying (shallow) machine learning algorithms to a condition monitoring task, we first calculate features from raw time domain data and then apply the algorithms to the resulting feature matrix. In our code demonstrations we have extensively used wavelet packet energy and wavelet packet entropy features. In this notebook, we show how to calculate wavelet packet energy features. These features can be calculated in Python and R, but we use MATLAB here because our aim is to reproduce, from raw time domain data, the same feature matrix that we have used in all our algorithms. To the best of my knowledge, Python and R have no equivalent commands that reproduce exactly the feature matrix we use in our experiments.

**Update**: See [this notebook](https://github.com/biswajitsahoo1111/cbm_codes_open/blob/master/notebooks/Wavelet_packet_energy_features_python.ipynb) to compute wavelet packet energy features entirely in Python. As we have mentioned above, the feature matrix obtained using Python will not be exactly equal to the feature matrix that we have used in our analysis. But it will be close.

In [1]:
import numpy as np

Download the time domain data from [here](https://github.com/biswajitsahoo1111/cbm_codes_open/blob/master/notebooks/data/CWRU_48k_load_1_CNN_data.npz) and run the following cells. See [this notebook](https://github.com/biswajitsahoo1111/cbm_codes_open/blob/master/notebooks/CWRU_time_domain_data_preprocessing.ipynb) to understand how the time domain data was prepared in the first place. This data will be used later in the [deep learning demonstration](https://github.com/biswajitsahoo1111/cbm_codes_open/blob/master/notebooks/Deep_Learning_CWRU_Blog.ipynb). Here we use the same time domain data and calculate features from it.

In [3]:
file = np.load("./data/CWRU_48k_load_1_CNN_data.npz")
print(file.files)

['data', 'labels']


In [4]:
data = file['data']
labels = file['labels']
print(data.shape, labels.shape)

(4600, 32, 32) (4600,)


We divide the raw signal into segments of length 1024 each. For each fault type we collect 460 segments. There are 10 fault types, so we get 4600 segments in total. As the data were prepared for a CNN task, each segment of 1024 points was further reshaped to size $(32 \times 32)$. So the final size of the data becomes $(4600 \times 32 \times 32)$.
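The segmentation described above can be sketched as follows. This is only an illustration on a synthetic signal: the helper name `segment_signal` is hypothetical, and I assume non-overlapping segments taken from the start of the record. The preprocessing notebook linked earlier is the reference for what was actually done.

```python
import numpy as np

def segment_signal(signal, segment_length=1024, num_segments=460):
    """Cut a 1-D signal into non-overlapping segments (hypothetical helper)."""
    signal = np.asarray(signal)
    needed = segment_length * num_segments
    if signal.size < needed:
        raise ValueError("signal is too short for the requested segments")
    # Keep only the first `needed` samples and stack them row by row
    return signal[:needed].reshape(num_segments, segment_length)

# Synthetic stand-in for one fault type's raw record (not CWRU data)
rng = np.random.default_rng(0)
fake_signal = rng.standard_normal(480_000)

segments = segment_signal(fake_signal)    # one row per segment
images = segments.reshape(-1, 32, 32)     # CNN-style layout
print(segments.shape, images.shape)       # (460, 1024) (460, 32, 32)
```

Repeating this for each of the 10 fault types and stacking the results gives the $(4600 \times 32 \times 32)$ array loaded above.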

In [5]:
np.unique(labels)

array(['Ball_007', 'Ball_014', 'Ball_021', 'IR_007', 'IR_014', 'IR_021',
       'Normal', 'OR_007', 'OR_014', 'OR_021'], dtype='<U8')

In [6]:
resized_data = np.reshape(data, (2300,2048))
resized_data.shape

(2300, 2048)

We resize the data this way because for shallow learning applications we consider segments of length 2048 and calculate features from each such segment. There is no particular reason for choosing segments of length 2048 as opposed to 1024, 4096, or any other number. One consideration might be the amount of raw data available. If we select a larger segment length, we get fewer segments. And if we need more segments (as is often the case in machine learning), we keep the segment length short. However, keep in mind that reducing the segment length to an arbitrarily small number might not be useful, as very short segments might not capture the events that are characteristic of bearing faults. It so happens that the author chose a segment length of 2048 for this dataset and the resulting feature matrix yielded excellent results. Thus, the author has not changed the segment length since.
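It may help to see what this reshape does to the rows and to the labels. The sketch below uses toy arrays of the same shape rather than the real data, and it assumes the segments are stored grouped by fault type (460 consecutive segments per class). Under that assumption, each 2048-point row joins two consecutive 1024-point segments of the same class, so there are 230 rows per class.

```python
import numpy as np

num_classes, per_class = 10, 460

# Toy stand-ins with the same shapes as `data` and `labels` (not CWRU data)
toy_data = np.arange(num_classes * per_class * 1024).reshape(4600, 32, 32)
toy_labels = np.repeat(np.arange(num_classes), per_class)  # grouped by class (assumption)

merged = toy_data.reshape(2300, 2048)

# Row k of `merged` is segment 2k followed by segment 2k + 1
first_pair = np.concatenate([toy_data[0].ravel(), toy_data[1].ravel()])
rows_are_pairs = np.array_equal(merged[0], first_pair)

# Check that both halves of every row share one label before keeping one of them
pair_labels = toy_labels.reshape(2300, 2)
labels_agree = bool(np.all(pair_labels[:, 0] == pair_labels[:, 1]))
merged_labels = pair_labels[:, 0]

print(rows_are_pairs, labels_agree, merged_labels.shape)
```

The same label check can be run on the real `labels` array to confirm that no row mixes two fault types.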

Now save the data in an npy file and load it in MATLAB. [This page](https://github.com/kwikteam/npy-matlab) explains how to read npy files into MATLAB.

In [7]:
np.save("cwru_resized", resized_data)

The code in the cell below is in MATLAB. Don't run it in Python.

In [8]:
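# This is MATLAB code

# data = readNPY('cwru_resized.npy');   % readNPY comes from the npy-matlab package
# matrix = NaN(size(data,1),8);         % one row per segment, one column per level-3 packet
# for i = 1:size(data,1)
#     % Level-3 MODWPT with the 'sym8' wavelet; the 4th output is the energy of each of the 8 packets.
#     % See the MATLAB documentation of modwpt for details.
#     [~,~,~,energy] = modwpt(data(i,:),'sym8',3);
#     matrix(i,:) = energy;
# end
# csvwrite("check_cwru_energy.csv",matrix)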
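To give a feel for what a wavelet packet energy feature is, here is a minimal NumPy sketch. It is not a reimplementation of MATLAB's `modwpt`: it uses a decimated Haar wavelet packet transform instead of the undecimated transform with the `sym8` wavelet, so its numbers will not match the feature matrix, and the packets come out in splitting order rather than sorted by frequency. The function names `haar_split` and `haar_packet_energy` are made up for this illustration.

```python
import numpy as np

def haar_split(x):
    """One orthonormal Haar analysis step: returns (approximation, detail)."""
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail

def haar_packet_energy(signal, level=3):
    """Energy (sum of squared coefficients) of each of the 2**level packets."""
    signal = np.asarray(signal, dtype=float)
    if signal.size % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [signal]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            # Unlike the plain wavelet transform, every node is split again
            next_nodes.extend(haar_split(node))
        nodes = next_nodes
    return np.array([np.sum(node ** 2) for node in nodes])

# Synthetic segment of length 2048 (not CWRU data)
rng = np.random.default_rng(0)
segment = rng.standard_normal(2048)
energies = haar_packet_energy(segment, level=3)
print(energies.shape)  # (8,)
```

Because each Haar step is orthonormal, the 8 packet energies add up to the energy of the segment itself, so the features describe how the signal energy is distributed across the packets.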

Now, if you have MATLAB installed on your system, you can run the commands in the above cell in MATLAB and compare the saved csv file `check_cwru_energy.csv` with [feature_wav_energy8_48k_2048_load_1.csv](https://github.com/biswajitsahoo1111/cbm_codes_open/blob/master/notebooks/data/feature_wav_energy8_48k_2048_load_1.csv), which is available at the author's GitHub page. You will observe that the data in both files are identical. If you don't have MATLAB installed on your system, you will have to take my word that the data in both files are indeed identical. Other feature matrices can be computed in a similar way.
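The comparison itself can be done in Python. Below is a minimal sketch; the helper name `feature_files_match` is hypothetical, and it assumes both files contain only numeric, comma-separated values. If a file has a header row or a label column, the loading step has to be adapted. Since csv files store numbers as rounded text, the values are compared within a small tolerance. The demonstration uses two small temporary files instead of the real feature files.

```python
import os
import tempfile

import numpy as np

def feature_files_match(path_a, path_b, rtol=1e-5, atol=1e-8):
    """Return True if two numeric csv files hold the same matrix within tolerance."""
    a = np.loadtxt(path_a, delimiter=",", ndmin=2)
    b = np.loadtxt(path_b, delimiter=",", ndmin=2)
    return a.shape == b.shape and bool(np.allclose(a, b, rtol=rtol, atol=atol))

# Demonstration on toy matrices written to a temporary folder
rng = np.random.default_rng(0)
toy_features = rng.random((5, 8))

with tempfile.TemporaryDirectory() as tmp:
    path_1 = os.path.join(tmp, "features_1.csv")
    path_2 = os.path.join(tmp, "features_2.csv")
    path_3 = os.path.join(tmp, "features_3.csv")
    np.savetxt(path_1, toy_features, delimiter=",")
    np.savetxt(path_2, toy_features, delimiter=",")
    np.savetxt(path_3, toy_features + 1.0, delimiter=",")
    same = feature_files_match(path_1, path_2)
    different = feature_files_match(path_1, path_3)

print(same, different)
```

With the real files, the call would take the paths of `check_cwru_energy.csv` and the downloaded feature file in place of the temporary ones.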