# File Path CSV


A Jupyter notebook that builds a CSV listing the path of every `.wav` file inside a zip archive.

First, we store the path to the zip file in `zip_file`.


```python
zip_file = 'datasets/6_dB_valve.zip'
```

Next, we import the libraries we need:

```python
import pandas as pd          # DataFrame construction and CSV export
from zipfile import ZipFile  # reading the archive's member list
```

Next, we collect the names of all `.wav` files inside the zip archive with `ZipFile.namelist()` and store them in the list `file_names`.

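```python
# Collect the archive-relative names of all .wav files in the zip.
file_names = []
with ZipFile(zip_file, 'r') as zip_obj:
    for name in zip_obj.namelist():
        # Match on the extension, not on any occurrence of "wav" in the path.
        if name.lower().endswith(".wav"):
            file_names.append(name)
```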
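The filter can be checked without the real dataset. The sketch below builds a small archive in memory with `BytesIO`; the member names and the helper `list_wav_files` are hypothetical, made up for illustration. It shows why matching the `.wav` extension is safer than a plain substring test: `"wav" in name` would also pick up an entry such as `valve/wav_notes.txt`.

```python
from io import BytesIO
from zipfile import ZipFile

def list_wav_files(zip_source):
    """Hypothetical helper: names of all members ending in .wav (case-insensitive)."""
    with ZipFile(zip_source, "r") as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".wav")]

# Build a tiny illustrative archive in memory.
buffer = BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("valve/id_00/abnormal/00000000.wav", b"")
    archive.writestr("valve/id_00/normal/00000000.wav", b"")
    archive.writestr("valve/wav_notes.txt", b"not audio")
buffer.seek(0)  # rewind so ZipFile can read what was just written

wav_files = list_wav_files(buffer)
print(wav_files)
# ['valve/id_00/abnormal/00000000.wav', 'valve/id_00/normal/00000000.wav']
```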

With all the file names in a list, we can build a DataFrame named `df`.

```python
df = pd.DataFrame(file_names, columns=["File Path"])
```

```python
df
```

| | File Path |
|---|---|
| 0 | valve/id_00/abnormal/00000000.wav |
| 1 | valve/id_00/abnormal/00000001.wav |
| 2 | valve/id_00/abnormal/00000002.wav |
| 3 | valve/id_00/abnormal/00000003.wav |
| 4 | valve/id_00/abnormal/00000004.wav |
| ... | ... |
| 4165 | valve/id_06/normal/00000987.wav |
| 4166 | valve/id_06/normal/00000988.wav |
| 4167 | valve/id_06/normal/00000989.wav |
| 4168 | valve/id_06/normal/00000990.wav |


The paths returned by `namelist()` are relative to the root of the archive. If the extracted files are not in the same folder as the notebook, each path needs the folder they were extracted to as a prefix. In my case, all data files live under `datasets`, and the contents of this archive sit in `datasets/6_dB_valve`, so I use a lambda to prepend `datasets/6_dB_valve/` (the zip path without its `.zip` extension) to every file path.

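```python
import os

# Prefix each archive-relative path with the extraction folder ('datasets/6_dB_valve').
# os.path.splitext strips only the trailing '.zip', unlike split(".")[0],
# which breaks on paths that contain other dots (e.g. './datasets/...').
df['File Path'] = df['File Path'].apply(lambda x: os.path.splitext(zip_file)[0] + "/" + x)
```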

```python
df
```

| | File Path |
|---|---|
| 0 | datasets/6_dB_valve/valve/id_00/abnormal/00000... |
| 1 | datasets/6_dB_valve/valve/id_00/abnormal/00000... |
| 2 | datasets/6_dB_valve/valve/id_00/abnormal/00000... |
| 3 | datasets/6_dB_valve/valve/id_00/abnormal/00000... |
| 4 | datasets/6_dB_valve/valve/id_00/abnormal/00000... |
| ... | ... |
| 4165 | datasets/6_dB_valve/valve/id_06/normal/0000098... |
| 4166 | datasets/6_dB_valve/valve/id_06/normal/0000098... |
| 4167 | datasets/6_dB_valve/valve/id_06/normal/0000098... |
| 4168 | datasets/6_dB_valve/valve/id_06/normal/0000099... |
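The prefixed paths only point at real files if the archive has actually been extracted into the folder named after it. A minimal sketch, assuming you want that layout: the helpers `extracted_folder` and `ensure_extracted` are hypothetical names, and the demo runs on a throwaway archive in a temporary directory rather than the real dataset. It also shows why `os.path.splitext` is the safer way to drop the `.zip` extension.

```python
import os
import tempfile
from zipfile import ZipFile

def extracted_folder(zip_path):
    """Hypothetical helper: the archive path without its final extension."""
    # splitext strips only the last extension, so dots elsewhere in the path survive.
    return os.path.splitext(zip_path)[0]

def ensure_extracted(zip_path):
    """Hypothetical helper: extract zip_path into extracted_folder(zip_path) if missing."""
    folder = extracted_folder(zip_path)
    if not os.path.isdir(folder):
        with ZipFile(zip_path, "r") as archive:
            archive.extractall(folder)
    return folder

# split(".") breaks on a leading "./": the first piece is an empty string.
print(repr("./datasets/6_dB_valve.zip".split(".")[0]))  # ''
print(extracted_folder("./datasets/6_dB_valve.zip"))    # ./datasets/6_dB_valve

# Demo on a throwaway archive in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with ZipFile(demo_zip, "w") as archive:
        archive.writestr("valve/id_00/normal/00000000.wav", b"")
    folder = ensure_extracted(demo_zip)
    extracted_ok = os.path.isfile(
        os.path.join(folder, "valve", "id_00", "normal", "00000000.wav")
    )

print(extracted_ok)  # True
```

Checking `os.path.isdir` first means re-running the notebook does not extract the archive a second time.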


Next, we derive additional columns from the path itself. Splitting the first path on `/` shows which position holds which piece of information:

```python
df['File Path'][0].split("/")
```

```
['datasets', '6_dB_valve', 'valve', 'id_00', 'abnormal', '00000000.wav']
```

```python
# Each column takes one component of the path (index 0 is 'datasets').
df['Type of SNR'] = df['File Path'].apply(lambda x: x.split("/")[1])
df['Type of Machine'] = df['File Path'].apply(lambda x: x.split("/")[2])
df['Model Number'] = df['File Path'].apply(lambda x: x.split("/")[3])
df['Status'] = df['File Path'].apply(lambda x: x.split("/")[4])
df['File Name'] = df['File Path'].apply(lambda x: x.split("/")[5])
```

```python
df
```

| | File Path | Type of SNR | Type of Machine | Model Number | Status | File Name |
|---|---|---|---|---|---|---|
| 0 | datasets/6_dB_valve/valve/id_00/abnormal/00000... | 6_dB_valve | valve | id_00 | abnormal | 00000000.wav |
| 1 | datasets/6_dB_valve/valve/id_00/abnormal/00000... | 6_dB_valve | valve | id_00 | abnormal | 00000001.wav |
| 2 | datasets/6_dB_valve/valve/id_00/abnormal/00000... | 6_dB_valve | valve | id_00 | abnormal | 00000002.wav |
| 3 | datasets/6_dB_valve/valve/id_00/abnormal/00000... | 6_dB_valve | valve | id_00 | abnormal | 00000003.wav |
| 4 | datasets/6_dB_valve/valve/id_00/abnormal/00000... | 6_dB_valve | valve | id_00 | abnormal | 00000004.wav |
| ... | ... | ... | ... | ... | ... | ... |
| 4165 | datasets/6_dB_valve/valve/id_06/normal/0000098... | 6_dB_valve | valve | id_06 | normal | 00000987.wav |
| 4166 | datasets/6_dB_valve/valve/id_06/normal/0000098... | 6_dB_valve | valve | id_06 | normal | 00000988.wav |
| 4167 | datasets/6_dB_valve/valve/id_06/normal/0000098... | 6_dB_valve | valve | id_06 | normal | 00000989.wav |
| 4168 | datasets/6_dB_valve/valve/id_06/normal/0000099... | 6_dB_valve | valve | id_06 | normal | 00000990.wav |
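The five `apply` calls split every path five times and depend on the prefix having exactly two components. A vectorized alternative, sketched here under the assumption that the archive keeps its `machine/model/status/file` layout, splits each path once with pandas' `.str.split` and indexes the components from the end, so the result does not change if the prefix gets deeper or shallower. The two sample paths are hypothetical, and the frame is named `sample` so it does not overwrite the notebook's `df`.

```python
import pandas as pd

# Hypothetical sample paths in the same layout as the notebook's 'File Path' column.
sample = pd.DataFrame({"File Path": [
    "datasets/6_dB_valve/valve/id_00/abnormal/00000000.wav",
    "datasets/6_dB_valve/valve/id_06/normal/00000990.wav",
]})

# Split every path once; each element of `parts` is a list of path components.
parts = sample["File Path"].str.split("/")

# Negative positions count from the file name backwards.
positions = {
    "Type of SNR": -5,
    "Type of Machine": -4,
    "Model Number": -3,
    "Status": -2,
    "File Name": -1,
}
for column, position in positions.items():
    sample[column] = parts.str[position]  # .str[i] indexes into each list

print(sample.drop(columns="File Path").to_string(index=False))
```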


Finally, we save the DataFrame to CSV; `index=False` keeps the row index out of the file:

```python
df.to_csv("datasets/filepath_valve_6dB.csv", index=False)
```
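Before moving on, it can be worth reading the CSV back to confirm nothing was lost. A minimal sketch on a small hypothetical frame written to a temporary directory (the real notebook would read `datasets/filepath_valve_6dB.csv` instead). Passing `dtype=str` to `pd.read_csv` is a precaution so that values which look numeric stay as text.

```python
import os
import tempfile
import pandas as pd

# Hypothetical two-row frame with the same columns as the notebook's df.
sample = pd.DataFrame({
    "File Path": ["datasets/6_dB_valve/valve/id_00/abnormal/00000000.wav",
                  "datasets/6_dB_valve/valve/id_06/normal/00000990.wav"],
    "Type of SNR": ["6_dB_valve", "6_dB_valve"],
    "Type of Machine": ["valve", "valve"],
    "Model Number": ["id_00", "id_06"],
    "Status": ["abnormal", "normal"],
    "File Name": ["00000000.wav", "00000990.wav"],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "filepath_valve_6dB.csv")
    sample.to_csv(csv_path, index=False)
    # dtype=str stops pandas from parsing any value that looks numeric.
    reloaded = pd.read_csv(csv_path, dtype=str)

# Raises an AssertionError if values, columns, or dtypes differ.
pd.testing.assert_frame_equal(sample, reloaded)
```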

Now that we have the file paths, we can use the `Preprocessing` notebook to extract the features.