<span style="font-size:10pt">AI-ML @ ENSPIMA / v1.3 september 2024 / Jean-Luc CHARLES (Jean-Luc.charles@mailo.com) / CC BY-SA 4.0 /</span>

<div style="color:brown;font-family:arial;font-size:26pt;font-weight:bold;text-align:center"> 
Machine Learning $-$ MiniProject</div><br>
<hr>
<div style="color:blue;font-family:arial;font-size:22pt;font-weight:bold;text-align:center">
Training a neural network to diagnose bearing faults<br><br>
Part 2/3: Pre-Processing of the CWRU dataset</div>
<hr>
Expected duration: 60 minutes

## Part-2 targeted learning objectives
Know how to
- load a `.npz` file into NumPy ndarrays,
- process the temporal dataset to get a spectral dataset,
- display a grid of spectra plots.

<div class="alert alert-block alert-danger">
<span style="color:brown;font-family:arial;font-size:12pt"> 
It is important to use a <span style="font-weight:bold;">Python Virtual Environment</span> (PVE) for your Python projects: a PVE makes it possible to control for each project the versions of the Python interpreter and the "sensitive" modules (like tensorflow).
    
All the notebooks must be loaded in a `jupyter notebook` or `jupyter lab` launched within the <b><span style="color: rgb(100, 151, 202);" >pyml</span></b> PVE specially created for the session.    
</span></div>

In [None]:
import os, sys
import scipy.io
import numpy as np
import matplotlib.pyplot as plt

# 2 $-$ Load the *CWRU* data with the `.npz` numpy file

Print the list of the `.npz` files in the current directory:

In [None]:
[ f for f in os.listdir() if f.endswith('.npz')]

`numpy.load` opens the requested `.npz` file and returns a dict-like `NpzFile` object:

In [None]:
npzfile = np.load('CWRU_dadaset.npz')
list(npzfile.keys())

Thanks to the `values` method of this dict-like object, we can name `A`, `B` and `C` the three `ndarray` objects stored in `npzfile`:

In [None]:
A, B, C = npzfile.values()
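Unpacking `values()` relies on the order in which the arrays were stored. As a minimal sketch of the save/load round trip, the example below writes three small arrays to an in-memory buffer and reads them back by key. The names `A_demo`, `B_demo`, `C_demo`, the keys `"A"`, `"B"`, `"C"` and the shapes are illustrative assumptions, not the content of the real CWRU file.

```python
import io
import numpy as np

# Hypothetical stand-ins for the three load-case arrays (arbitrary small shapes)
A_demo = np.zeros((2, 3, 8))
B_demo = np.ones((2, 3, 8))
C_demo = np.full((2, 3, 8), 2.0)

# Write the archive to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
np.savez(buffer, A=A_demo, B=B_demo, C=C_demo)
buffer.seek(0)  # rewind before reading

with np.load(buffer) as demo_npz:
    kind = type(demo_npz).__name__       # class of the object returned by np.load
    keys = list(demo_npz.keys())         # names given at save time
    # Access by key does not depend on the storage order
    A2, B2, C2 = (demo_npz[k] for k in ("A", "B", "C"))

print(kind, keys, A2.shape)
```

Reading by key is a little more verbose than `values()`, but it keeps working if the arrays were saved in a different order.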

Plot the data to verify (same Python instructions as the previous notebook, `1-load_CWRU_data.ipynb`, section 1.6):

In [None]:
# create the list of the health condition labels:
health_cond = ['N']
for def_type in 'RF', 'IF', 'OF':
    for size in '18', '36', '54':
        health_cond.append(f"{def_type}.{size}")

# define 'nb_HC', the number of health conditions:
nb_HC = len(health_cond)

# define 'nb_load', the number of load cases:
full_dataset = (A, B, C)
nb_load      = len(full_dataset)
sample_num   = 0  # the sample number

plt.rcParams['font.size'] = 6   # change the pyplot default font size
fig, axes = plt.subplots(nb_HC, nb_load, sharex=True)
fig.set_size_inches((8,12))
plt.subplots_adjust(top=.95, wspace=0.25, hspace=0.5)
plt.suptitle(f"Temporal plots for the sample #{sample_num}", fontsize=10)

for n, dataset in enumerate(full_dataset):
    for hc in range(nb_HC):
        axe = axes[hc, n]
        axe.set_title(f"Load_{n+1} / health cond {health_cond[hc]}", fontsize=8)
        axe.plot(dataset[hc, sample_num], linewidth=0.4)
        if hc == nb_HC-1: axe.set_xlabel("Rank")

plt.rcParams['font.size'] = 10  # restore the pyplot font size to its default value

# 3 $-$ Compute and plot the data in the spectral domain

## 3.1 $-$ Transform the temporal dataset into a spectral dataset

Let's retrieve the shape of the array A (temporal data):

In [None]:
nb_HC, nb_sample, sample_size = A.shape
print(f"array A has <{nb_sample}> samples of <{sample_size}> data points for each of the <{nb_HC}> health conditions")

The spectra are computed with [numpy.fft.rfft](https://numpy.org/doc/stable/reference/generated/numpy.fft.rfft.html). Its documentation page explains how to compute the size of the spectrum from the size of the input:

In [None]:
if sample_size % 2 == 0:
    spectrum_size = sample_size // 2 + 1    # even input length
else:
    spectrum_size = (sample_size + 1) // 2  # odd input length
print(f"size of spectra: {spectrum_size}")
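For a real input of length `n`, `rfft` returns `n//2 + 1` points whether `n` is even or odd. The short check below compares this expression with the actual output size for a few arbitrary lengths chosen for illustration:

```python
import numpy as np

# Arbitrary even and odd input lengths (illustrative values only)
lengths = (8, 9, 1024, 1025)

# Actual number of points returned by rfft for each length
sizes = {n: np.fft.rfft(np.zeros(n)).size for n in lengths}

for n in lengths:
    print(f"n={n}: n//2+1 = {n // 2 + 1}, rfft size = {sizes[n]}")
```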

Now let's allocate three ndarrays to store the spectra of the three temporal data arrays:

In [None]:
# np.empty allocates without initializing: every element is overwritten below
A_spectrum = np.empty((nb_HC, nb_sample, spectrum_size), dtype=float)
B_spectrum = np.empty((nb_HC, nb_sample, spectrum_size), dtype=float)
C_spectrum = np.empty((nb_HC, nb_sample, spectrum_size), dtype=float)

and let's compute the spectra with the `np.fft.rfft` function:

In [None]:
from numpy.fft import rfft
for spectrum_array, temporal_array in zip((A_spectrum, B_spectrum, C_spectrum), (A, B, C)):
    for hc in range(nb_HC):
        for s in  range(nb_sample):
            sample_spectrum = np.abs(rfft(temporal_array[hc, s]))   # we take the modulus of the Fourier spectrum
            spectrum_array[hc, s] = sample_spectrum/sample_spectrum.max()  # normalize the spectrum values in [0,1]
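The same computation can also be written without explicit loops, since `rfft` accepts an `axis` argument. The sketch below is an alternative formulation, not a replacement for the cell above. It uses random data of an arbitrary shape (an assumption made only for the demonstration) and checks that the vectorized result matches the loop version.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical temporal array with shape (nb_HC, nb_sample, sample_size)
temporal = rng.normal(size=(4, 5, 64))

# Vectorized version: FFT along the last axis, then per-sample normalization
spectra = np.abs(np.fft.rfft(temporal, axis=-1))
spectra /= spectra.max(axis=-1, keepdims=True)   # each spectrum scaled to [0, 1]

# Loop version, used here only as a reference
reference = np.empty_like(spectra)
for hc in range(temporal.shape[0]):
    for s in range(temporal.shape[1]):
        modulus = np.abs(np.fft.rfft(temporal[hc, s]))
        reference[hc, s] = modulus / modulus.max()

print(spectra.shape, np.allclose(spectra, reference))
```

Both versions divide by the maximum of each spectrum, so an all-zero sample would produce a division by zero; such samples would need separate handling.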

Let's draw the spectra of the first sample:

In [None]:
spectral_DATA = (A_spectrum, B_spectrum, C_spectrum)
nb_load       = len(spectral_DATA)
sample_num    = 0  # the sample number

plt.rcParams['font.size'] = 6   # change the pyplot default font size
fig, axes = plt.subplots(nb_HC, nb_load, sharex=True)
fig.set_size_inches((8,12))
plt.subplots_adjust(top=.95, wspace=0.25, hspace=0.5)
plt.suptitle(f"Plot of the spectra for the sample #{sample_num}", fontsize=10)

for n, spectral_dataset in enumerate(spectral_DATA):
    for hc in range(nb_HC):
        axe = axes[hc, n]
        axe.set_title(f"Load_{n+1} / health cond {health_cond[hc]}", fontsize=8)
        axe.plot(spectral_dataset[hc, sample_num], linewidth=0.4)
        if hc == nb_HC-1: axe.set_xlabel("Frequency rank")

plt.rcParams['font.size'] = 10  # restore the pyplot font size to its default value
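The x axis of these plots is a frequency rank, not a frequency in hertz. If the sampling rate of the signals is known, `numpy.fft.rfftfreq` converts ranks to hertz. In the sketch below, `fs` and `n_points` are placeholder values assumed for illustration; they are not taken from this notebook and should be replaced by the actual sampling rate and sample size.

```python
import numpy as np

fs = 12_000        # ASSUMPTION: sampling rate in Hz (placeholder value)
n_points = 1024    # ASSUMPTION: number of points per temporal sample (placeholder value)

# Frequency in Hz associated with each rank of the rfft output
freqs = np.fft.rfftfreq(n_points, d=1/fs)

print(f"{freqs.size} ranks, resolution {freqs[1]:.3f} Hz, max {freqs[-1]:.1f} Hz")
```

With these placeholder values, rank `k` corresponds to `k * fs / n_points` hertz, and the last rank is the Nyquist frequency `fs / 2`.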

## 3.2 $-$ About the spectra size

As the previous plot shows, the spectra are significant only at low frequencies: we can truncate them without losing relevant information about the vibration signature of the defect.

Hereafter we plot the spectra for sample #0 with only the first 400 spectral points:

In [None]:
spectral_DATA = (A_spectrum, B_spectrum, C_spectrum)
nb_load       = len(spectral_DATA)
sample_num    = 0  # the sample number

plt.rcParams['font.size'] = 6   # change the pyplot default font size
fig, axes = plt.subplots(nb_HC, nb_load, sharex=True)
fig.set_size_inches((8,12))
plt.subplots_adjust(top=.95, wspace=0.25, hspace=0.5)
plt.suptitle(f"Spectra for the sample #{sample_num}", fontsize=10)

for n, dataset in enumerate(spectral_DATA):
    for hc in range(nb_HC):
        axe = axes[hc, n]
        axe.set_title(f"Load_{n+1} / health cond {health_cond[hc]}", fontsize=8)
        axe.plot(dataset[hc, sample_num, :400], linewidth=0.4)
        if hc == nb_HC-1: axe.set_xlabel("Frequency rank")
            
plt.rcParams['font.size'] = 10  # restore the pyplot font size to its default value
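One possible way to put a number on how much is lost by truncation is to compute the fraction of the squared spectrum kept in the first ranks. The sketch below is only an illustration: the helper names `truncate_spectra` and `retained_energy` are hypothetical, and the flat synthetic spectra stand in for the real data.

```python
import numpy as np

def truncate_spectra(spectra, n_keep):
    """Keep the first n_keep frequency ranks along the last axis."""
    return spectra[..., :n_keep]

def retained_energy(spectra, n_keep):
    """Fraction of the sum of squared values found in the first n_keep ranks."""
    total = np.sum(spectra**2, axis=-1)
    kept = np.sum(spectra[..., :n_keep]**2, axis=-1)
    return kept / total

# Synthetic flat spectra with shape (nb_HC, nb_sample, spectrum_size): illustration only
demo_spectra = np.ones((2, 3, 500))

truncated = truncate_spectra(demo_spectra, 400)
ratio = retained_energy(demo_spectra, 400)

print(truncated.shape, ratio[0, 0])
```

On real spectra concentrated at low frequencies, this ratio would be expected to stay close to 1 for a well-chosen cut-off, whereas the flat synthetic spectra used here keep only the proportion of ranks retained.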

### Further work:

In the next notebook (`3-Mini_poject.ipynb`), you will build a Dense Neural Network and train it to classify the CWRU dataset.