# Libraries and Data

**TO DO:**
- Create the dataframe
    - Generate the features from several aggregation functions, each of which will be one column of the dataframe
    - Each of the parts (windows) generated earlier will be one row of the dataframe
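
A minimal sketch of the target layout, assuming each window is a one-dimensional NumPy array. The helper `build_feature_frame`, the `AGGREGATIONS` dictionary and the synthetic windows are hypothetical names used only to illustrate the "one row per window, one column per aggregation function" idea; they are not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping: column name -> aggregation function applied to one window.
AGGREGATIONS = {
    "mean": np.mean,
    "std": np.std,
    "ppv": np.ptp,  # peak-to-peak amplitude (max - min)
}

def build_feature_frame(windows):
    """Return a dataframe with one row per window and one column per aggregation."""
    rows = [
        {name: float(func(window)) for name, func in AGGREGATIONS.items()}
        for window in windows
    ]
    return pd.DataFrame(rows, columns=list(AGGREGATIONS))

# Synthetic stand-in for the windows produced by the splitting step.
signal = np.arange(8, dtype=float)
windows = np.array_split(signal, 4)
features = build_feature_frame(windows)
print(features)
```

Adding a new feature then only requires adding one entry to the dictionary, which keeps the column list and the computation in a single place.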

**Extra:** Look up the meaning of each of the aggregation functions below.

Below is an algorithm that walks through every file inside a directory and returns the name of its folder, the name of the file, and its X and Y data. This allows the analyses to be started in a more automated way.
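
The traversal described above could be sketched as a generator. This assumes a two-level layout (`root/<folder>/<file>.mat`) and that each `.mat` file stores its series under the key `tsDS`, with time in column 0 and the signal in column 1, as the notebook's own cells do. The function name `iter_recordings` and the temporary demo directory are hypothetical.

```python
import os
import tempfile

import numpy as np
import scipy.io as sc

def iter_recordings(root, key="tsDS"):
    """Yield (folder, file, t, y) for every .mat file found in root/<folder>/."""
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):
            continue
        for file in sorted(os.listdir(folder_path)):
            if not file.endswith(".mat"):
                continue
            ts = sc.loadmat(os.path.join(folder_path, file))[key]
            yield folder, file, ts[:, 0], ts[:, 1]

# Demonstration on a throwaway directory holding one synthetic recording.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2inch_stickout"))
    ts = np.column_stack([np.linspace(0.0, 1.0, 5), np.zeros(5)])
    sc.savemat(os.path.join(root, "2inch_stickout", "c_320_005.mat"), {"tsDS": ts})
    found = list(iter_recordings(root))

folder, file, t, y = found[0]
print(folder, file, t.shape, y.shape)
```

A generator keeps only one recording in memory at a time, which may matter if the real files are large.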

# Data and libraries

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!cp -r /content/drive/MyDrive/IC\ CNC/cutting_tests_processed /content/
path_df_processed = "/content/drive/MyDrive/IC CNC/cutting_tests_processed"
! cp /content/dados-chatter/timeSeries_3inchStickout/F_12-Jun-2017_rpm570_doc0p015.mat /content/

cp: cannot stat '/content/dados-chatter/timeSeries_3inchStickout/F_12-Jun-2017_rpm570_doc0p015.mat': No such file or directory


In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.io as sc
import scipy.stats
from scipy.fft import fft, ifft  # scipy.fftpack is the legacy interface
from os import listdir
import os.path

In [4]:
mat = scipy.io.loadmat('/content/cutting_tests_processed/2inch_stickout/c_320_005.mat')

In [5]:
# create the folders that will hold the new dataframes produced by splitting
# (a "! cd" has no lasting effect in a notebook, so the folders are created with os.makedirs)
os.makedirs("split_cutting_tests_processed", exist_ok=True)
list_of_folders = listdir("cutting_tests_processed")
for folder in list_of_folders:
    os.makedirs(f'split_cutting_tests_processed/{folder}', exist_ok=True)

In [6]:
list_of_folders = listdir("cutting_tests_processed")
for folder in list_of_folders:
    list_of_files = listdir(f'cutting_tests_processed/{folder}')
    for file in list_of_files:
        # Files classified as "unknown" (names starting with 'u') are not used.
        if file[0] != 'u':
            # Split each processed recording into four new files and save them
            # with the same folder organization as the original files.
            data = sc.loadmat(f'cutting_tests_processed/{folder}/{file}')
            df = pd.DataFrame(data['tsDS'])
            df.rename({0: 't', 1: 'y'}, axis=1, inplace=True)
            # Split the row positions rather than the DataFrame itself, which
            # avoids the deprecation warning np.array_split raises on DataFrames.
            row_chunks = np.array_split(np.arange(len(df)), 4)
            for split_num, rows in enumerate(row_chunks):
                df.iloc[rows].to_csv(f'split_cutting_tests_processed/{folder}/{file[:-4]}_split_{split_num}.csv')
                

In [7]:
list_of_folders = listdir("split_cutting_tests_processed")
for folder in list_of_folders:
    list_of_files = listdir(f'split_cutting_tests_processed/{folder}')
    for file in list_of_files:
        # Add noise to every split data file and save the result as a separate file.
        # Noise is added only to the "y" column; the "t" column represents time
        # and therefore must not receive noise.
        # Files classified as "unknown" for the presence of chatter are not used,
        # and files that already contain noise are skipped so the cell can be re-run.
        if file[0] != 'u' and not file.endswith('_noise_added.csv'):
            df = pd.read_csv(f'split_cutting_tests_processed/{folder}/{file}')
            df['y'] = df['y'].astype('float')
            df['t'] = df['t'].astype('float')

            # zero-mean Gaussian noise whose standard deviation is 10% of the RMS of y
            mu, sigma = 0, np.sqrt(np.mean(df['y']**2))*0.1
            noise = np.random.normal(mu, sigma, df['y'].shape)

            df_noise_added = pd.DataFrame({'t': df['t'], 'y': df['y'] + noise})
            df_noise_added.to_csv(f'split_cutting_tests_processed/{folder}/{file[:-4]}_noise_added.csv')
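
The noise scaling used in the cell above can be checked in isolation. The sketch below assumes a synthetic sine wave in place of a real recording; `add_rms_noise` and `rms` are hypothetical helper names. With a noise standard deviation of 10% of the signal RMS, the signal-to-noise ratio should come out close to 20 dB, since 20·log10(1/0.1) = 20.

```python
import numpy as np

def rms(x):
    """Root mean square of a one-dimensional array."""
    return np.sqrt(np.mean(np.asarray(x, dtype=float) ** 2))

def add_rms_noise(y, fraction=0.1, rng=None):
    """Return y plus zero-mean Gaussian noise with sigma = fraction * rms(y)."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = fraction * rms(y)
    return y + rng.normal(0.0, sigma, size=np.shape(y))

# Synthetic stand-in for one split recording.
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0.0, 20.0 * np.pi, 10_000))
noisy = add_rms_noise(y, fraction=0.1, rng=rng)

# Empirical signal-to-noise ratio in decibels.
snr_db = 20.0 * np.log10(rms(y) / rms(noisy - y))
print(f"SNR: {snr_db:.2f} dB")
```

Passing an explicit `rng` makes the augmentation reproducible, which the notebook's `np.random.normal` call is not unless a global seed is set.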


In [8]:
# list_of_folders = listdir('cutting_tests_processed') # returns a list with the name of the folders inside "cutting_tests_processed"
# for folder in list_of_folders:
#     list_of_files = listdir(f'cutting_tests_processed/{folder}') # returns a list with the name of the files in the folder "cutting_tests_processed"
#     for file in list_of_files:
#         if file[0] != "u": # ignores data that has not been categorized by the occurrence or not of chatter
#             data = sc.loadmat(f'cutting_tests_processed/{folder}/{file}') # reads .mat file
#             t = data['tsDS'][:, 0] # takes information from axis x
#             y = data['tsDS'][:, 1] # takes information from axis y
#             data_df = pd.DataFrame.from_dict(data['tsDS'][:,0])
#             # print("t =", t)
#             # print("y =", y)
        

# Examples of Aggregation Functions

## Time domain:

### Mean ($\mu_x$)

$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$$

### Standard Deviation ($\sigma_x$)

$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_x)^2$$

### Kurtosis ($\kappa_x$)

$$\kappa_x = \frac{1}{N}\sum_{i=1}^{N} \Bigg(\frac{x_i - \mu_x}{\sigma_x}\Bigg)^4$$

### Skewness ($\gamma_x$)

$$\gamma_x = \frac{1}{N}\sum_{i=1}^{N} \Bigg(\frac{x_i - \mu_x}{\sigma_x}\Bigg)^3$$

### Peak-to-Peak Amplitude ($x_{ppv}$)

$$x_{ppv} = \max(x_i) - \min(x_i)$$

### Root Mean Square ($x_{rms}$)

$$x_{rms} = \Bigg(\frac{1}{N}\sum_{i=1}^{N} x_i^2\Bigg)^{1/2}$$

### Square Root Amplitude ($x_{sra}$)

$$x_{sra} = \Bigg(\frac{1}{N}\sum_{i=1}^{N} \sqrt{\left |x_i  \right |}\Bigg)^2$$

### Crest Factor ($x_{cf}$)

$$x_{cf} = \frac{\max(\left | x_i \right |)}{x_{rms}}$$

### Impulse Factor ($x_{if}$)

$$x_{if} = \frac{\max(\left | x_i \right |)}{\frac{1}{N}\sum_{i=1}^{N} \left |x_i  \right |}$$

### Margin Factor ($x_{mf}$)

$$x_{mf} = \frac{\max(\left | x_i \right |)}{x_{sra}}$$

### Kurtosis Factor ($x_{kf}$)

$$x_{kf} = \frac{\kappa_x}{x_{rms}^4}$$
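
The time-domain definitions above translate almost line by line into NumPy. The sketch below is one possible implementation, not necessarily the one used later in this work; the function name `time_domain_features` and the dictionary keys are hypothetical. It uses the population standard deviation (dividing by $N$), as in the formulas.

```python
import numpy as np

def time_domain_features(x):
    """Compute the time-domain aggregation functions for one window."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    mean = x.mean()
    std = x.std()                        # population std (divides by N)
    z = (x - mean) / std                 # undefined for a constant window (std == 0)
    kurtosis = np.mean(z ** 4)
    skewness = np.mean(z ** 3)
    rms = np.sqrt(np.mean(x ** 2))
    sra = np.mean(np.sqrt(abs_x)) ** 2   # square root amplitude
    peak = abs_x.max()
    return {
        "mean": mean,
        "std": std,
        "kurtosis": kurtosis,
        "skewness": skewness,
        "ppv": x.max() - x.min(),
        "rms": rms,
        "sra": sra,
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_x.mean(),
        "margin_factor": peak / sra,
        "kurtosis_factor": kurtosis / rms ** 4,
    }

# A square-like toy window: every feature can be verified by hand.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(features)
```

For a cross-check, `scipy.stats.kurtosis(x, fisher=False)` and `scipy.stats.skew(x)` with their default `bias=True` should agree with the `kurtosis` and `skewness` values computed here.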


## Frequency domain:

### Mean ($\mu_X$)

$$\mu_X = \frac{1}{N}\sum_{i=1}^{N} X_i$$

### Standard Deviation ($\sigma_X$)

$$\sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu_X)^2$$

### Root Mean Square ($X_{rms}$)

$$X_{rms} = \Bigg(\frac{1}{N}\sum_{i=1}^{N} X_i^2\Bigg)^{1/2}$$

### Peak Value

$$\max(X)$$

### Peak Frequency

$$f_k, \quad \text{where } k = \arg\max_i X_i$$
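
The frequency-domain features above might be computed as in the following sketch. Two assumptions are not stated in the text: that $X$ is the magnitude of the one-sided FFT of the window, and that the sampling rate `fs` is known. The function name `frequency_domain_features` and the 50 Hz test tone are hypothetical.

```python
import numpy as np

def frequency_domain_features(x, fs):
    """Compute spectrum statistics for one window sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    X = np.abs(np.fft.rfft(x))                   # magnitude spectrum (assumption)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)  # frequency of each bin in Hz
    return {
        "mean": X.mean(),
        "std": X.std(),
        "rms": np.sqrt(np.mean(X ** 2)),
        "peak_value": X.max(),
        "peak_frequency": freqs[np.argmax(X)],
    }

# One second of a pure 50 Hz tone sampled at 1 kHz: the peak lands exactly on a bin.
fs = 1000.0
t = np.arange(1000) / fs
features = frequency_domain_features(np.sin(2.0 * np.pi * 50.0 * t), fs)
print(features["peak_frequency"], features["peak_value"])
```

If the signal has a large DC offset, bin 0 can dominate the spectrum, so removing the mean before the FFT may be worth considering.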

## Categories:

### Label 1

Folder name

### Label 2

First letter of the file name
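
Both labels can be read directly from a file path, as in this small sketch. The helper name `labels_from_path` is hypothetical; the example path follows the folder and file naming seen in the cells above, where a first letter of `u` marks recordings classified as unknown.

```python
import os

def labels_from_path(path):
    """Return (label_1, label_2): the folder name and the first letter of the file name."""
    folder = os.path.basename(os.path.dirname(path))
    first_letter = os.path.basename(path)[0]
    return folder, first_letter

label_1, label_2 = labels_from_path(
    "cutting_tests_processed/2inch_stickout/c_320_005.mat"
)
print(label_1, label_2)
```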