
# Basics

## Structure cheat sheet

1. func: load training data (in the following order)
    1. read the descriptive dataframe produced by the feature pipeline
    2. extract the features from the feature objects that are labeled as the training dataset in the dataframe
    3. create a numpy feature array for the processing pipeline
2. Preprocessing
    1. Transformation (any combination of the following)
        + log-transform
        + PCA
        + others
    2. Scaling (one of the following)
        + StandardScaler
        + MinMaxScaler
3. Unsupervised Clustering
    1. Estimate initial hyperparameters
    2. Create a grid over various hyperparameters
    3. Train all models and choose the best one according to a metric
    
    
In all steps, the cluster-recorder object (possibly a dataframe row) records all meta-information, such as the hyperparameters.
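The cheat-sheet steps can be sketched with scikit-learn roughly as follows. This is a minimal illustration under assumptions, not the notebook's `Pipe` implementation: the feature matrix is random placeholder data, `KMeans` stands in for the clustering model, the silhouette score stands in for "the metric", and a list of dicts turned into a DataFrame plays the role of the cluster-recorder.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Placeholder feature array (assumption): 200 samples x 32 non-negative features,
# standing in for the numpy array created in step 1.
rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(200, 32)))

# Step 2: transformation (log-transform + PCA), then scaling.
preprocess = make_pipeline(
    FunctionTransformer(np.log1p),  # log1p avoids log(0) for zero-valued bins
    PCA(n_components=8),
    StandardScaler(),
)
X_prep = preprocess.fit_transform(X)

# Step 3: grid over one hyperparameter, train every model, keep the best by metric.
records = []  # cluster-recorder: one row of meta-information per fitted model
for n_clusters in [2, 3, 4, 5]:
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(X_prep)
    records.append({
        'model': 'KMeans',
        'n_clusters': n_clusters,
        'pca_components': 8,
        'scaler': 'StandardScaler',
        'silhouette': silhouette_score(X_prep, labels),
    })

recorder = pd.DataFrame(records)
best = recorder.loc[recorder['silhouette'].idxmax()]
print(recorder)
print('best n_clusters:', best['n_clusters'])
```

Keeping every grid point as a row, rather than only the winner, is what lets the recorder document all hyperparameters that were tried.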

## Data structure

There are multiple degrees of freedom in the data:

1. Signal to noise ratio (SNR)
2. Machine type
    1. pump
    2. fan
    3. valve (solenoid)
    4. slider
3. Machine ID
    1. four different machine IDs
    
The pipeline is applied separately to each fixed combination of SNR, machine type and machine ID.
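Because every run uses one fixed (SNR, machine type, machine ID) combination, the set of runs is the Cartesian product of the three dimensions. A minimal sketch with `itertools.product`; the SNR and ID strings are taken from the notebook's code cells, and the dict layout of a run is an assumption for illustration.

```python
from itertools import product

# Values taken from the notebook's code cells.
SNRs = ['6dB', 'min6dB']
machines = ['pump', 'fan', 'valve', 'slider']
IDs = ['00', '02', '04', '06']

# One pipeline run per fixed (SNR, machine type, machine ID) combination.
runs = [
    {'SNR': snr, 'machine': machine, 'ID': machine_id}
    for snr, machine, machine_id in product(SNRs, machines, IDs)
]

print(len(runs))  # 2 * 4 * 4 = 32
print(runs[0])    # {'SNR': '6dB', 'machine': 'pump', 'ID': '00'}
```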

## get features

Get the descriptive dataframe for the features.

The descriptive dataframe contains all IDs of the pump. We focus on ID '00' for now, since the modeling phase is separated per SNR, per machine and per ID anyway.

Class: `uni_<model>`

Attributes:
+ default threshold
+ roc_auc

Methods:
+ fit
+ predict
+ predict_score
+ eval_roc_auc
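A `uni_<model>` wrapper with this interface could look roughly like the sketch below, assuming it wraps a scikit-learn estimator in the pseudo-supervised setting (label 1 = anomalous). The class name `UniModelSketch`, the 0.5 default threshold and the use of `predict_proba` as the anomaly score are assumptions for illustration, not the project's actual implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


class UniModelSketch:
    """Hypothetical uni_<model> wrapper: fit / predict / predict_score / eval_roc_auc."""

    def __init__(self, estimator_cls, default_threshold=0.5, **params):
        self.model = estimator_cls(**params)
        self.default_threshold = default_threshold  # cut-off on the anomaly score
        self.roc_auc = None                          # set by eval_roc_auc

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def predict_score(self, X):
        # Anomaly score = predicted probability of class 1 (anomalous).
        return self.model.predict_proba(X)[:, 1]

    def predict(self, X, threshold=None):
        thr = self.default_threshold if threshold is None else threshold
        return (self.predict_score(X) >= thr).astype(int)

    def eval_roc_auc(self, X, y):
        self.roc_auc = roc_auc_score(y, self.predict_score(X))
        return self.roc_auc


# Usage on synthetic placeholder data (not the machine-sound features).
rng = np.random.default_rng(0)
X_normal = rng.normal(0.0, 1.0, size=(200, 5))
X_anomal = rng.normal(3.0, 1.0, size=(200, 5))
X = np.vstack([X_normal, X_anomal])
y = np.r_[np.zeros(200), np.ones(200)].astype(int)

clf = UniModelSketch(RandomForestClassifier, max_depth=2, random_state=0).fit(X, y)
print(clf.eval_roc_auc(X, y))
```

Passing the estimator class plus keyword arguments mirrors the `(uni_RandomForestClassifier, {'max_depth': 2})` tuples used in the modeling cells.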

In [1]:
#===============================================
# Basic Imports
BASE_FOLDER = '../../'
%run -i ..\..\utility\feature_extractor\JupyterLoad_feature_extractor.py
%run -i ..\..\utility\modeling\JupyterLoad_modeling.py

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, FastICA
from tqdm.auto import tqdm
import glob

load feature_extractor_mother
load feature_extractor_mel_spectra
load feature_extractor_psd
load feature_extractor_ICA2
load feature_extractore_pre_nnFilterDenoise
load extractor_diagram_mother
load Simple_FIR_HP
load TimeSliceAppendActivation
load load_data
Load split_data
Load anomaly_detection_models
Load pseudo_supervised_models
Load tensorflow models
Load detection_pipe


## Random forest classifier (RFC)

In [6]:
diagram = 'extdia_v1'
machines = ['slider',
            'pump',
            'fan',
            'valve',
            ]
SNRs = ['6dB', 'min6dB']

IDs = [#'00'
       #,'02'
        '04'
       ,'06'
        ]
features = [#('MEL_denbssm', {'function':'frame', 'frames':3})
            #, ('MEL_denbssm', {'function':'frame', 'frames':5})
            #, ('MEL_denbssm', {'function':'frame', 'frames':7})
            #, ('MEL_denbssm', {'function':'flat'})
            #('PSD_denbssm', {'function':'flat'})
            #, ('MEL_bssm', {'function':'frame', 'frames':3})
            #, ('MEL_bssm', {'function':'frame', 'frames':5})
            #, ('MEL_bssm', {'function':'frame', 'frames':7})
            #, ('MEL_bssm', {'function':'flat'})
             #('PSD_bssm', {'function':'flat'})
            #, ('MEL_raw', {'function':'frame', 'frames':3})
            #, ('MEL_raw', {'function':'frame', 'frames':5})
            #, ('MEL_raw', {'function':'frame', 'frames':7})
            #, ('MEL_raw', {'function':'flat'})
             ('PSD_raw', {'function':'flat'})
            #, ('MEL_den', {'function':'frame', 'frames':3})
            #, ('MEL_den', {'function':'frame', 'frames':5})
            #, ('MEL_den', {'function':'frame', 'frames':7})
            #, ('MEL_den', {'function':'flat'})
            #, ('PSD_den', {'function':'flat'})
            #, ('ICA_demix', {'function':'flat'})
            #, ('ICA_demix', {'function':'maxrange'})
            ]

tasks = []
for machine in machines:
    for SNR in SNRs:
        for ID in IDs:
            for feature in features:
                print(machine,SNR,ID,feature)
                if machine == 'valve':
                    diagram_patch = 'extdia_v1_sporafic'
                else:
                    diagram_patch = diagram
                path = glob.glob(BASE_FOLDER \
                    + '/dataset/{}/{}{}{}_EDiaV1*aug0'.format(diagram_patch, machine, SNR, ID) 
                    + "*pandaDisc*.pkl", recursive=True)[0]
                d = {
                'path_descr': path,
                'feat':feature[1], 
                'feat_col':feature[0], 
                'SNR':SNR, 
                'machine':machine, 
                'ID':ID,
                'BASE_FOLDER':BASE_FOLDER}
                tasks.append(d)


preprocessing = [
    #(PCA, {'n_components':64}),
    (StandardScaler, {})
]

modeling = (uni_RandomForestClassifier, {'max_depth': 2})

# modeling = (uni_EllipticEnvelope, {})

pipes = [Pipe(preprocessing, modeling, pseudo_sup=True) for _ in tasks]

tasks_failed = []
for pipe, task in tqdm(zip(pipes, tasks), total=len(tasks)):
    try:
        pipe.run_pipe(task)
    except Exception as exc:  # catch pipeline errors only, not KeyboardInterrupt
        tasks_failed.append(task)
        print('Task failed:', exc)

slider 6dB 04 ('PSD_raw', {'function': 'flat'})
slider 6dB 06 ('PSD_raw', {'function': 'flat'})
slider min6dB 04 ('PSD_raw', {'function': 'flat'})
slider min6dB 06 ('PSD_raw', {'function': 'flat'})
pump 6dB 04 ('PSD_raw', {'function': 'flat'})
pump 6dB 06 ('PSD_raw', {'function': 'flat'})
pump min6dB 04 ('PSD_raw', {'function': 'flat'})
pump min6dB 06 ('PSD_raw', {'function': 'flat'})
fan 6dB 04 ('PSD_raw', {'function': 'flat'})
fan 6dB 06 ('PSD_raw', {'function': 'flat'})
fan min6dB 04 ('PSD_raw', {'function': 'flat'})
fan min6dB 06 ('PSD_raw', {'function': 'flat'})
valve 6dB 04 ('PSD_raw', {'function': 'flat'})
valve 6dB 06 ('PSD_raw', {'function': 'flat'})
valve min6dB 04 ('PSD_raw', {'function': 'flat'})
valve min6dB 06 ('PSD_raw', {'function': 'flat'})



../..//dataset/extdia_v1\slider6dB04_EDiaV1HPaug0_pandaDisc.pkl --> Done
...loading data
data loading completed

...preprocessing data
data preprocessing finished

...fitting the model
0.9999368766569878
model fitted successfully

...evaluating model
evaluation successfull, roc_auc: 0.9991793965408409
pipe saved to pickle
../..//dataset/extdia_v1\slider6dB06_EDiaV1HPaug0_pandaDisc.pkl --> Done
...loading data
data loading completed

...preprocessing data
data preprocessing finished

...fitting the model
0.9946370407776797
model fitted successfully

...evaluating model
evaluation successfull, roc_auc: 0.8530488574674915
pipe saved to pickle
../..//dataset/extdia_v1\slidermin6dB04_EDiaV1HPaug0_pandaDisc.pkl --> Done
...loading data
data loading completed

...preprocessing data
data preprocessing finished

...fitting the model
0.9967964903421285
model fitted successfully

...evaluating model
evaluation successfull, roc_auc: 0.9455561166519378
pipe saved to pickle
../..//dataset/extdia_v1\
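In the task-building loop above, `glob.glob(...)[0]` sits outside the `try` block, so a missing descriptor file aborts the whole cell with a bare `IndexError`. The helper sketched below fails with a readable message instead; `find_descriptor` is a hypothetical name, and the file-name pattern is copied from the notebook under the assumption that it stays unchanged.

```python
import glob
import os


def find_descriptor(base_folder, diagram, machine, snr, machine_id):
    """Return the first descriptive-dataframe pickle matching the notebook's pattern."""
    pattern = os.path.join(
        base_folder, 'dataset', diagram,
        '{}{}{}_EDiaV1*aug0*pandaDisc*.pkl'.format(machine, snr, machine_id),
    )
    # Sorting makes the choice deterministic if several files match.
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError('no descriptor file matches ' + pattern)
    return matches[0]
```

Using `os.path.join` also avoids the doubled slash visible in the logged paths.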

## Support vector machine (SVM)

In [2]:
diagram = 'extdia_v1'
machines = ['slider',
            'pump',
            'fan',
            'valve',
            ]
SNRs = ['6dB', 'min6dB'
        ]
IDs = [#'00'
       #,'02'
        '04'
       ,'06'
        ]
features = [#('MEL_denbssm', {'function':'frame', 'frames':3})
            #, ('MEL_denbssm', {'function':'frame', 'frames':5})
            #, ('MEL_denbssm', {'function':'frame', 'frames':7})
            #, ('MEL_denbssm', {'function':'flat'})
            #('PSD_denbssm', {'function':'flat'})
            #, ('MEL_bssm', {'function':'frame', 'frames':3})
            #, ('MEL_bssm', {'function':'frame', 'frames':5})
            #, ('MEL_bssm', {'function':'frame', 'frames':7})
            #, ('MEL_bssm', {'function':'flat'})
             #('PSD_bssm', {'function':'flat'})
            #, ('MEL_raw', {'function':'frame', 'frames':3})
            #, ('MEL_raw', {'function':'frame', 'frames':5})
            #, ('MEL_raw', {'function':'frame', 'frames':7})
            #, ('MEL_raw', {'function':'flat'})
             ('PSD_raw', {'function':'flat'})
            #, ('MEL_den', {'function':'frame', 'frames':3})
            #, ('MEL_den', {'function':'frame', 'frames':5})
            #, ('MEL_den', {'function':'frame', 'frames':7})
            #, ('MEL_den', {'function':'flat'})
            #, ('PSD_den', {'function':'flat'})
            #, ('ICA_demix', {'function':'flat'})
            #, ('ICA_demix', {'function':'maxrange'})
            ]

tasks = []
for machine in machines:
    for SNR in SNRs:
        for ID in IDs:
            for feature in features:
                print(machine,SNR,ID,feature)
                if machine == 'valve':
                    diagram_patch = 'extdia_v1_sporafic'
                else:
                    diagram_patch = diagram
                path = glob.glob(BASE_FOLDER \
                    + '/dataset/{}/{}{}{}_EDiaV1*aug0'.format(diagram_patch, machine, SNR, ID) 
                    + "*pandaDisc*.pkl", recursive=True)[0]
                d = {
                'path_descr': path,
                'feat':feature[1], 
                'feat_col':feature[0], 
                'SNR':SNR, 
                'machine':machine, 
                'ID':ID,
                'BASE_FOLDER':BASE_FOLDER}
                tasks.append(d)


preprocessing = [
    #(PCA, {'n_components':64}),
    (StandardScaler, {})
]

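# Assuming uni_svm wraps sklearn's SVC: 'degree' is only used by the 'poly' kernel and has no effect with 'rbf'
modeling = (uni_svm, {'C': 0.1, 'degree':3,'kernel':'rbf'})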

# modeling = (uni_EllipticEnvelope, {})

pipes = [Pipe(preprocessing, modeling, pseudo_sup=True) for _ in tasks]

tasks_failed = []
for pipe, task in tqdm(zip(pipes, tasks), total=len(tasks)):
    try:
        pipe.run_pipe(task)
    except Exception as exc:  # catch pipeline errors only, not KeyboardInterrupt
        tasks_failed.append(task)
        print('Task failed:', exc)

slider 6dB 04 ('PSD_raw', {'function': 'flat'})
slider 6dB 06 ('PSD_raw', {'function': 'flat'})
slider min6dB 04 ('PSD_raw', {'function': 'flat'})
slider min6dB 06 ('PSD_raw', {'function': 'flat'})
pump 6dB 04 ('PSD_raw', {'function': 'flat'})
pump 6dB 06 ('PSD_raw', {'function': 'flat'})
pump min6dB 04 ('PSD_raw', {'function': 'flat'})
pump min6dB 06 ('PSD_raw', {'function': 'flat'})
fan 6dB 04 ('PSD_raw', {'function': 'flat'})
fan 6dB 06 ('PSD_raw', {'function': 'flat'})
fan min6dB 04 ('PSD_raw', {'function': 'flat'})
fan min6dB 06 ('PSD_raw', {'function': 'flat'})
valve 6dB 04 ('PSD_raw', {'function': 'flat'})
valve 6dB 06 ('PSD_raw', {'function': 'flat'})
valve min6dB 04 ('PSD_raw', {'function': 'flat'})
valve min6dB 06 ('PSD_raw', {'function': 'flat'})



../..//dataset/extdia_v1\slider6dB04_EDiaV1HPaug0_pandaDisc.pkl --> Done
...loading data
data loading completed

...preprocessing data
data preprocessing finished

...fitting the model
0.997695997980053
model fitted successfully

...evaluating model
evaluation successfull, roc_auc: 0.9885746749147835
pipe saved to pickle
../..//dataset/extdia_v1\slider6dB06_EDiaV1HPaug0_pandaDisc.pkl --> Done
...loading data
data loading completed

...preprocessing data
data preprocessing finished

...fitting the model
0.968913016033329
model fitted successfully

...evaluating model
evaluation successfull, roc_auc: 0.9938139123847999
pipe saved to pickle
../..//dataset/extdia_v1\slidermin6dB04_EDiaV1HPaug0_pandaDisc.pkl --> Done
...loading data
data loading completed

...preprocessing data
data preprocessing finished

...fitting the model
0.9864521525059967
model fitted successfully

...evaluating model
evaluation successfull, roc_auc: 0.9154620628708496
pipe saved to pickle
../..//dataset/extdia_v1\sl

In [90]:
tasks_failed

[]