# Preparing CWT data for deep learning 
In this notebook, we describe the detailed steps for preparing continuous wavelet transform (CWT) data to train a deep learning model. CWT is computed from raw time domain data: the time domain data are first segmented, and CWT is then computed for each segment. The CWT results are then resized so that they can be fed into a deep learning model.

All time domain preprocessing steps, including segmenting the time domain data for each fault type, are detailed in [this notebook](https://github.com/biswajitsahoo1111/cbm_codes_open/blob/master/notebooks/CWRU_time_domain_data_preprocessing.ipynb). After segmenting, the data are saved in an `npz` file named [CWRU_48k_load_1_CNN_data.npz](https://github.com/biswajitsahoo1111/cbm_codes_open/blob/master/notebooks/data/CWRU_48k_load_1_CNN_data.npz). The file is inside the `data` folder. We will use the same file and compute CWT on it.

First, we import the relevant libraries. For the wavelet transform we use the [PyWavelets](https://pywavelets.readthedocs.io/en/latest/) package. Image resizing can be done with many libraries; we use TensorFlow for it.

In [1]:
import tensorflow as tf
import numpy as np
import pywt

In [2]:
file = np.load("./data/CWRU_48k_load_1_CNN_data.npz")
file.files

['data', 'labels']

In [3]:
raw_data = file["data"]
raw_data.shape

(4600, 32, 32)

As the data file was originally intended to be used in a deep learning model, each segment had been reshaped to size $32 \times 32$. To compute CWT, we have to reshape each matrix back into a 1D array. As we are considering a total of 4600 segments, after reshaping, our data shape becomes $4600 \times 1024$.

In [4]:
raw_data = raw_data.reshape(-1, 1024)
raw_data.shape

(4600, 1024)
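
The reshape above only recovers the original 1D segments if it mirrors the way they were folded into $32 \times 32$ matrices. The following minimal sketch illustrates this on a small stand-in array. It assumes the segments were folded with NumPy's default row-major `reshape`; the names `segments`, `images` and `flattened` are hypothetical and do not come from the data file.

```python
import numpy as np

# Hypothetical stand-in for the real data: 3 segments of 1024 samples each.
segments = np.arange(3 * 1024, dtype=np.float64).reshape(3, 1024)

# Fold each segment into a 32 x 32 matrix (row-major order, NumPy's default).
images = segments.reshape(-1, 32, 32)

# Unfold again; -1 lets NumPy infer the number of segments.
flattened = images.reshape(-1, 1024)

print(images.shape)
print(flattened.shape)
print(np.array_equal(flattened, segments))
```

Because both reshapes use the same memory order, the unfolded array matches the original segments sample for sample.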

The following code can be used to compute CWT. The CWT data will be fed into a CNN model that expects an input image of size $32 \times 32$. Therefore, we resize the output of CWT to a size of $32 \times 32$.

The following code cell might take a long time to execute on a personal computer. Therefore, we print a progress message after every 100 segments are processed. This gives readers a fair idea of whether it is worth waiting for the cell to finish executing.

In [5]:
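# Preallocate the output array; NaN makes any segment that was never filled easy to spot.
wavelet_data = np.full((raw_data.shape[0], 32, 32), np.nan)
# 64 scales: 1, 33, ..., 2017
scales = np.arange(start = 1, stop = 2049, step = 32)
for i in range(raw_data.shape[0]):
    segment = raw_data[i, :]
    # coefs has shape (64, 1024): one row per scale, one column per sample
    coefs, _ = pywt.cwt(segment, scales, "morl")
    # tf.image.resize expects a channel axis, so add one, resize to 32 x 32, then drop it
    resized = tf.image.resize(coefs.reshape((64, 1024, 1)), (32, 32))
    wavelet_data[i, :, :] = tf.reshape(resized, (32, 32))
    if (i % 100) == 0 and (i != 0):
        print(f"{i} segments processed.")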

100 segments processed.
200 segments processed.
300 segments processed.
400 segments processed.
500 segments processed.
600 segments processed.
700 segments processed.
800 segments processed.
900 segments processed.
1000 segments processed.
1100 segments processed.
1200 segments processed.
1300 segments processed.
1400 segments processed.
1500 segments processed.
1600 segments processed.
1700 segments processed.
1800 segments processed.
1900 segments processed.
2000 segments processed.
2100 segments processed.
2200 segments processed.
2300 segments processed.
2400 segments processed.
2500 segments processed.
2600 segments processed.
2700 segments processed.
2800 segments processed.
2900 segments processed.
3000 segments processed.
3100 segments processed.
3200 segments processed.
3300 segments processed.
3400 segments processed.
3500 segments processed.
3600 segments processed.
3700 segments processed.
3800 segments processed.
3900 segments processed.
4000 segments processed.
4100 segm
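
To see where the intermediate shape `(64, 1024)` in the loop above comes from, the scale grid can be inspected on its own. The sketch below uses only NumPy. It relies on the fact that `pywt.cwt` returns one row of coefficients per scale and one column per input sample; the names `n_scales`, `row_factor` and `col_factor` are introduced here purely for illustration.

```python
import numpy as np

# Same scale grid as the one passed to pywt.cwt in the loop above.
scales = np.arange(start=1, stop=2049, step=32)
n_scales = scales.size
segment_length = 1024

# One row per scale, one column per sample of the segment.
cwt_shape = (n_scales, segment_length)

# Shrink factors needed to reach the 32 x 32 input expected by the CNN.
target = (32, 32)
row_factor = cwt_shape[0] // target[0]
col_factor = cwt_shape[1] // target[1]

print(n_scales, scales[0], scales[-1])
print(cwt_shape, row_factor, col_factor)
```

The resize step therefore shrinks the scale axis by a factor of 2 and the time axis by a factor of 32, so most of the time resolution of the CWT is traded away to match the CNN input size.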

For the convenience of readers, we have already saved the wavelet outputs in an `npz` file named [CWRU_48k_load_1_CNN_wavelet_morlet_data.npz](https://github.com/biswajitsahoo1111/cbm_codes_open/blob/master/notebooks/data/CWRU_48k_load_1_CNN_wavelet_morlet_data.npz). That file is saved in the `data` folder. To convince ourselves that the data available online are the same as those obtained in the last code cell, we compare the output of the previous cell with the saved data.

In [6]:
online_file = np.load("./data/CWRU_48k_load_1_CNN_wavelet_morlet_data.npz")
online_file.files

['data', 'labels']

In [7]:
online_wavelet_data = online_file["data"]
online_wavelet_data.shape

(4600, 32, 32)

We now check whether the two arrays are identical. The total number of elements is computed first, and then the number of elements that match.

In [8]:
4600 * 1024

4710400

In [9]:
np.sum(np.isclose(wavelet_data, online_wavelet_data))

4710400

All 4,710,400 elements agree within the default tolerances of `np.isclose`, so the two arrays are identical for practical purposes.
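
Counting matching elements works, but the difference between the available comparison functions is worth keeping in mind. The following minimal sketch, on small hypothetical arrays `a`, `b` and `c`, contrasts tolerance-based comparison (`np.isclose`, `np.allclose`) with exact comparison (`np.array_equal`). It also shows that `NaN` is never considered close to a number, which is why preallocating with `NaN` exposes segments that were never filled.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-12                      # differs from a only by tiny numerical noise
c = np.array([1.0, 2.0, np.nan])   # last entry left unfilled

# Element-wise comparison within default tolerances, then count the matches.
n_close = int(np.isclose(a, b).sum())

all_close = bool(np.allclose(a, b))        # single True/False answer
exactly_equal = bool(np.array_equal(a, b)) # bit-for-bit comparison
nan_close = bool(np.allclose(a, c))        # NaN never matches a number

print(n_close, all_close, exactly_equal, nan_close)
```

`np.allclose` gives the same answer as comparing the count of close elements against the array size, in a single call.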

## Save wavelet data
We can save both data and labels in a single `npz` file. We have commented out the line that actually saves the data, as we have already saved the file in the `data` folder. Readers who wish to save it again can do so by uncommenting the line.

In [10]:
fault_types = ['Ball_007', 'Ball_014', 'Ball_021', 'IR_007', 'IR_014', 'IR_021', 'OR_007', 'OR_014', 'OR_021', 'Normal']
labels = np.repeat(fault_types, 460)

# np.savez('CWRU_48k_load_1_CNN_wavelet_morlet_data', data = wavelet_data, labels = labels)   # Save wavelet data
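
To illustrate how the labels line up with the data and how the saved file can be read back, here is a minimal round-trip sketch. It uses a shortened, hypothetical label list, a placeholder array instead of `wavelet_data`, and a temporary directory with a made-up file name (`demo_wavelet_data.npz`) so that nothing in the `data` folder is overwritten.

```python
import os
import tempfile

import numpy as np

# Shortened, hypothetical label list; the notebook uses 10 fault types x 460 segments.
fault_types = ['Ball_007', 'Normal']
labels = np.repeat(fault_types, 3)       # each label repeated consecutively
data = np.zeros((labels.size, 32, 32))   # placeholder standing in for wavelet_data

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_wavelet_data.npz")
    # Keyword names become the keys of the archive.
    np.savez(path, data=data, labels=labels)
    with np.load(path) as loaded:
        restored_data = loaded["data"]
        restored_labels = loaded["labels"]

print(restored_data.shape)
print(restored_labels.tolist())
```

Since `np.repeat` repeats each entry consecutively, the labels match the data only if the segments are stored grouped by fault type in the same order as `fault_types`.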