
## Downloading Datasets

Neural networks need a lot of data, and that data must be diverse enough to teach the network the difference between the classes of sounds you are trying to distinguish.

For the first exercise, we will use a classic machine learning example, [Cats vs. Dogs](https://pythonprogramming.net/convolutional-neural-network-kats-vs-dogs-machine-learning-tutorial/), but in our case in audio form. Head over to the following Kaggle link, make an account, and download the data: [www.kaggle.com/mmoreaux/audio-cats-and-dogs](https://www.kaggle.com/mmoreaux/audio-cats-and-dogs/downloads/audio-cats-and-dogs.zip/). After it has finished downloading, unzip the folder and move it to the ``AudioData/`` folder in the repository.
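
If you would rather unzip from Python than from your file manager, a minimal sketch could look like the following. The helper name `extract_dataset` is hypothetical and not part of the repository, and the location of the downloaded archive is an assumption, so adjust both paths to your setup.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, target_dir):
    """Extract an archive into target_dir and return the names it contained.

    Hypothetical helper for illustration, not part of the repository.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Example call (both paths are assumptions about where your files live):
# extract_dataset(Path.home() / 'Downloads' / 'audio-cats-and-dogs.zip',
#                 '../AudioData/audio-cats-and-dogs/')
```

Afterwards, check that the extracted folder names match the paths used by the scripts further down.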

---

While the first download is running, go ahead and start downloading the Urban Sound data set [here on Kaggle](https://www.kaggle.com/pavansanagapati/urban-sound-classification). When it has finished, again unzip it and move it to the ``AudioData/`` folder.

---

While the data is downloading, please have a look at the document [Information-flow-and-the-underlying-idea](https://github.com/DavidGoedicke/RealtimeAudioClassification/wiki/Information-flow-and-the-underlying-idea), in which we explain the data flow and structure of this project.


## Transforming datasets
As mentioned before, we need to move from one folder full of data to a separate folder for each class. In some cases, we also need to split the data into test and training data.

The following script looks at the cats-dogs data folder and uses the .csv file to separate the files into the different folders.

In [None]:
import csv
import os

SOURCEFOLDER = '../AudioData/audio-cats-and-dogs/cats_dogs/'
CSV = '../AudioData/audio-cats-and-dogs/train_test_split.csv'
folderNames = ['train', 'test']  # We define a list of folders for us to store the data in
classes = ['cat', 'dog']  # As in the line above, we define the classes that will turn into folders later
ClassFolderPath = {}  # Dictionaries are a handy way to keep track of complex data like a folder path

'''
In the following lines of the script, we make sure that the target folders exist.
The outer for loop checks for the test and train folders, and the inner for loop looks for the class folders.
If they do not exist, they are created.
The paths are all added to the dictionary to use later.
'''
for i in folderNames:
    TopLevelFolder = os.path.join(SOURCEFOLDER, i)
    if not os.path.exists(TopLevelFolder):
        os.makedirs(TopLevelFolder)
    TempDict = {}
    for c in classes:
        LowLevelFolder = os.path.join(TopLevelFolder, c)
        if not os.path.exists(LowLevelFolder):
            os.makedirs(LowLevelFolder)
        TempDict[c] = LowLevelFolder
    ClassFolderPath[i] = TempDict

'''
We now have a nested dictionary with the final folder paths.
This allows us to find a folder path, e.g. for the cats training folder,
like this: ClassFolderPath['train']['cat'].

This makes it very easy to separate out the different files and move them to the correct folder.
'''
    
with open(CSV, newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for key in row.keys():
            if not key:  # Skip columns without a name
                continue
            if not row[key]:  # Skip empty cells
                continue
            # The column name tells us the split and the class of the file
            if 'cat' in key and 'train' in key:
                outDirectoryPath = ClassFolderPath['train']['cat']
            elif 'cat' in key and 'test' in key:
                outDirectoryPath = ClassFolderPath['test']['cat']
            elif 'dog' in key and 'train' in key:
                outDirectoryPath = ClassFolderPath['train']['dog']
            elif 'dog' in key and 'test' in key:
                outDirectoryPath = ClassFolderPath['test']['dog']
            else:
                continue  # Unknown column: do not move anything
            inFilePath = os.path.join(SOURCEFOLDER, row[key])
            outFilePath = os.path.join(outDirectoryPath, row[key])
            if os.path.isfile(inFilePath) and not os.path.isfile(outFilePath):
                os.rename(inFilePath, outFilePath)
                print("Moved file: " + row[key])
            else:
                print("Skipping file: " + row[key] + " (source missing or target already exists)")
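
To check the result of the script above, it can help to count how many files ended up in each class folder. The following is a minimal sketch; the helper name `count_files_per_class` is hypothetical, and it assumes the `split/class/` folder layout created above.

```python
import os


def count_files_per_class(root):
    """Return {split: {class: number_of_files}} for a root/split/class/ layout."""
    counts = {}
    for split in sorted(os.listdir(root)):
        split_path = os.path.join(root, split)
        if not os.path.isdir(split_path):
            continue  # files left over in the root folder are ignored
        counts[split] = {}
        for label in sorted(os.listdir(split_path)):
            label_path = os.path.join(split_path, label)
            if os.path.isdir(label_path):
                # Count only regular files inside the class folder
                counts[split][label] = sum(
                    os.path.isfile(os.path.join(label_path, name))
                    for name in os.listdir(label_path)
                )
    return counts


# Example call, using the same path as SOURCEFOLDER in the script above:
# print(count_files_per_class('../AudioData/audio-cats-and-dogs/cats_dogs/'))
```

If a class folder reports zero files, the column names in the .csv file probably did not match what the script expects.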

**HOT TIP:**
If you write your own code, then while testing, instead of calling ``os.rename(inFilePath,outFilePath)``, first print out the paths with ``print(inFilePath,outFilePath)`` and compare them by hand. Not only can it be a problem if the code does not work correctly, but if you are not careful and move files into or away from your root directory ``/``, you can mess up your operating system really easily. In other words, make sure your code works before you start moving around big chunks of files.
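
One way to follow this tip is a "dry run" switch that only prints the planned move until you turn it off. The sketch below is an illustration under that assumption; `move_file` and its `dry_run` parameter are hypothetical names, not part of the repository.

```python
import os


def move_file(in_path, out_path, dry_run=True):
    """Move in_path to out_path, or only print the planned move if dry_run is True.

    Returns True if the file was moved (or would be moved), False if it was skipped.
    """
    if not os.path.isfile(in_path) or os.path.exists(out_path):
        print("Skipping:", in_path)
        return False
    if dry_run:
        print("Would move:", in_path, "->", out_path)
    else:
        os.rename(in_path, out_path)
        print("Moved:", in_path, "->", out_path)
    return True
```

Run your loop once with the default ``dry_run=True``, read through the printed paths, and only then call it again with ``dry_run=False``.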


## The Next Steps
Now that we have organized our data into class folders, we can generate the spectrograms we will be using to classify the incoming sound.
Go to the folder ``01_Spectrum Generation/`` and open the notebook it contains. Also look at the section ``Squishing time  - taking a snapshot`` [here](https://github.com/DavidGoedicke/RealtimeAudioClassification/wiki/Information-flow-and-the-underlying-idea#squishing-time----taking-a-snapshot) for an explanation of what is happening.

## Urban sound preparation

The test files are not useful to us. We can only use the training data, which contains labels. This is why we only reorganize the ``Train`` folder.

In [None]:
import csv
import os

SOURCEFOLDER = '../AudioData/urban-sound-classification/train/Train/'
CSV = '../AudioData/urban-sound-classification/train/train.csv'
'''
The Urban-Sounds csv file is organized slightly differently.
Each file ID appears in the list and has the actual class written next to it.
We can use the class name to create the class folder and move the files to where they need to go.
'''

with open(CSV, newline='') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        if row[0] == 'ID':  # Skip the header row
            continue
        targetFolder = os.path.join(SOURCEFOLDER, row[1])  # row[1] contains the class name
        if not os.path.exists(targetFolder):  # Checking if the folder exists
            os.makedirs(targetFolder)  # If it does NOT, we create a new directory
        filename = row[0] + ".wav"  # row[0] contains the ID
        inFilePath = os.path.join(SOURCEFOLDER, filename)  # Creating source path
        outFilePath = os.path.join(targetFolder, filename)  # Creating target path
        if os.path.isfile(inFilePath) and not os.path.isfile(outFilePath):  # Making sure we do not overwrite anything
            os.rename(inFilePath, outFilePath)  # Moving the file
            print("Moved file: " + filename)
        else:
            # The source file is missing or the target file already exists
            print("Skipping file: " + filename + " (source missing or target already exists)")
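
Before or after moving the files, it can be useful to see how many clips each class has. The following sketch assumes the same .csv layout the script above relies on: a header row, the file ID in the first column, and the class name in the second. The helper name `count_classes` is hypothetical.

```python
import csv
from collections import Counter


def count_classes(csv_path):
    """Count how often each class name appears in the second CSV column."""
    counts = Counter()
    with open(csv_path, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) >= 2:  # ignore empty or incomplete rows
                counts[row[1]] += 1
    return counts


# Example call, using the same path as CSV in the script above:
# print(count_classes('../AudioData/urban-sound-classification/train/train.csv'))
```

A strongly unbalanced count is worth knowing about before training, because the network sees the common classes far more often than the rare ones.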
       

## Appendix

### Handy Links

[How to move a file in Python](https://stackoverflow.com/questions/8858008/how-to-move-a-file-in-python)
