# Final Project - Data Collection

This is a **project group assignment** with the teams already established.

# Project B: Ambient Sound Classification

## Step 1: Data Collection

Each team must **record at least 100 audio clips, each 5 seconds long**. We recommend collecting an equal number of samples per class, in this case **20 audio clips per class (indoor quiet, street traffic, kitchen activity, human chatter, and nature sounds)**. Record these clips in at least three (3) different environments to ensure diversity. Store the clips in *.wav* format and label each one with its ambient sound type. After data collection, all teams' data will be merged, preprocessed, and split into training and test sets.
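Class balance matters once every team's data is merged, so it can help to tally your clips per label before submitting. The following is a minimal sketch. It assumes file names follow the ID-trial-label scheme described below in this section (e.g. "2-3-5.wav"). The helper names `count_clips_per_label` and `missing_per_label` are hypothetical.

```python
from collections import Counter

def count_clips_per_label(filenames):
    """Count .wav files per label, assuming names like '2-3-5.wav' (ID-trial-label)."""
    counts = Counter()
    for name in filenames:
        if not name.endswith(".wav"):
            continue  # ignore anything that is not a .wav recording
        label = int(name[: -len(".wav")].split("-")[2])  # third field is the label
        counts[label] += 1
    return counts

def missing_per_label(counts, target=20, labels=(1, 2, 3, 4, 5)):
    """Return how many more clips each label needs to reach the target (assumed 20)."""
    return {lab: target - counts[lab] for lab in labels if counts[lab] < target}

# Example with a few hypothetical file names
names = ["1-1-1.wav", "1-2-1.wav", "2-1-5.wav", "notes.txt"]
counts = count_clips_per_label(names)
print(dict(counts))               # {1: 2, 5: 1}
print(missing_per_label(counts))  # {1: 18, 2: 20, 3: 20, 4: 20, 5: 19}
```

In practice you would pass `os.listdir(...)` of your recordings folder instead of the hand-written list.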

In order to standardize data collection as much as possible, please use the [Voice Recorder App](https://play.google.com/store/apps/details?id=com.media.bestrecorder.audiorecorder&hl=en_US) with the following settings:
* **Microphone adjustment:** Device auto control
* **File format:** .wav
* **Recording quality:** Mono - 48kHz
* **Default file name:** see instructions below
* **Duration:** Trim every recording to exactly 5 seconds (this is important!).
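You can check the settings above and trim a clip from Python as well as in the app or an audio editor. The following is a minimal sketch using the standard library's `wave` module. It assumes the app writes uncompressed PCM .wav files, which `wave` can read. The helper names `describe_wav` and `trim_wav` are hypothetical.

```python
import wave

def describe_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return wf.getnchannels(), rate, frames / rate

def trim_wav(src, dst, seconds=5.0):
    """Copy the first `seconds` of src into dst (shorter files are copied whole)."""
    with wave.open(src, "rb") as wf:
        params = wf.getparams()
        n = int(round(seconds * wf.getframerate()))
        audio = wf.readframes(n)  # reads at most n frames
    with wave.open(dst, "wb") as out:
        out.setparams(params)     # frame count in the header is corrected on close
        out.writeframes(audio)

# Usage (hypothetical paths): a correctly configured clip should report (1, 48000, 5.0)
# trim_wav("raw/2-3-5.wav", "my_data/2-3-5.wav")
# print(describe_wav("my_data/2-3-5.wav"))
```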

The **ambient sound labels** are: 
* indoor quiet - Label 1
* street traffic - Label 2
* kitchen activity - Label 3
* human chatter - Label 4 
* nature sounds - Label 5

We recommend saving your files with a **coding system**, e.g. **ID-trial-label**. First, give each team member a number from 1 to 3; this is the ID. For example, when the team member with ID 2 records their 3rd clip of nature sounds (label 5), the file name should be "2-3-5.wav".
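The naming scheme and the label list above can be captured in two small helpers. This sketch assumes the five labels listed above and plain integer IDs and trial numbers. The names `LABEL_NAMES`, `make_filename`, and `parse_filename` are hypothetical. Splitting on "-" means trial numbers of 10 or more are handled correctly.

```python
LABEL_NAMES = {1: "indoor quiet", 2: "street traffic", 3: "kitchen activity",
               4: "human chatter", 5: "nature sounds"}

def make_filename(member_id, trial, label):
    """Build an ID-trial-label file name, e.g. (2, 3, 5) -> '2-3-5.wav'."""
    if label not in LABEL_NAMES:
        raise ValueError(f"unknown label {label}")
    return f"{member_id}-{trial}-{label}.wav"

def parse_filename(name):
    """Split 'ID-trial-label.wav' back into integers (member_id, trial, label)."""
    stem = name[: -len(".wav")] if name.endswith(".wav") else name
    member_id, trial, label = (int(part) for part in stem.split("-"))
    return member_id, trial, label

print(make_filename(2, 3, 5))                       # 2-3-5.wav
print(parse_filename("1-12-4.wav"))                 # (1, 12, 4)
print(LABEL_NAMES[parse_filename("2-3-5.wav")[2]])  # nature sounds
```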

## Step 2: Install the ```librosa``` library

Follow instructions here: [https://pypi.org/project/librosa/](https://pypi.org/project/librosa/).
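For most setups the PyPI page's `pip install librosa` is all you need. Afterwards, you can confirm that the Python environment running this notebook actually sees the package. The following is a minimal standard-library sketch. `installed_version` is a hypothetical helper.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    if importlib.util.find_spec(package) is None:
        return None  # the module cannot be imported in this environment
    try:
        return version(package)
    except PackageNotFoundError:
        return None  # importable, but no installed distribution metadata

print("librosa:", installed_version("librosa") or "not installed (try: pip install librosa)")
```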

## Step 3: Prepare Data for Submission

The code below reads every .wav file in a folder and formats the data into NumPy arrays.

First, change the variable ```mydir``` below to the path of the directory where your .wav recordings are stored.

```python
import numpy as np
import os
import librosa

# CHANGE ME!
mydir = 'my_data/'
```

Using the **coding system** explained above, run the following code to save your data and the corresponding labels as two NumPy files, data.npy and labels.npy.

```python
SR = 44000                  # every clip is resampled to this rate when loaded
DESIRED_LENGTH = SR * 5     # 5-second clips -> 220,000 samples

clips = []
labels = []
# sorted() makes the column order reproducible (os.listdir order is arbitrary)
for file in sorted(os.listdir(mydir)):
    if file.endswith(".wav"):  # Only reads .wav files; add other formats here if needed.
        filename = os.path.join(mydir, file)  # avoids a doubled '/' in the path
        y, sr = librosa.load(filename, sr=SR)

        # Not all files are exactly 5 s long, so pad or trim to a fixed length
        if len(y) < DESIRED_LENGTH:
            # pad the end with zeros
            y = np.pad(y, (0, DESIRED_LENGTH - len(y)))
        else:
            # trim to the desired length
            y = y[:DESIRED_LENGTH]
        clips.append(y)

        # File names follow ID-trial-label, so the label is the third field.
        # Splitting on "-" also works for multi-digit IDs and trial numbers.
        my_label = file.replace(".wav", "").split("-")[2]
        labels.append(int(my_label))

# Stack the clips as columns: data is D x N, labels has shape (N,)
data = np.column_stack(clips)
labels = np.array(labels, dtype=float)

print('----------------------DONE-----------------------------')

# Saves data.npy and labels.npy to your current directory
np.save('data', data)
np.save('labels', labels)
```


----------------------DONE-----------------------------


```python
# Confirm that the data contains all 100 audio clips.
# data should have shape (D, N) and labels shape (N,),
# where D is the number of samples per clip (5 s x 44,000 Hz = 220,000,
# because librosa.load resamples every clip to 44 kHz)
# and N is the number of clips.

data.shape, labels.shape
```

((220000, 80), (80,))
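The output above shows 80 clips rather than the expected 100. That is the kind of mismatch worth catching before you submit. The following is a small sketch that reports such problems. It assumes the 44 kHz, 5-second, five-class setup used in this notebook. `check_arrays` is a hypothetical helper.

```python
import numpy as np

def check_arrays(data, labels, n_expected=100, clip_len=44000 * 5, n_classes=5):
    """Return a list of problems with the data/labels arrays (an empty list means OK)."""
    if data.ndim != 2:
        return [f"data should be 2-D (D x N), got {data.ndim}-D"]
    problems = []
    d, n = data.shape
    if d != clip_len:
        problems.append(f"each clip should have {clip_len} samples, got {d}")
    if n != n_expected:
        problems.append(f"expected {n_expected} clips, found {n}")
    if labels.shape != (n,):
        problems.append(f"labels should have shape ({n},), got {labels.shape}")
    # Count the clips carrying each label 1..n_classes
    counts = [int(np.sum(labels == k)) for k in range(1, n_classes + 1)]
    if len(set(counts)) > 1:
        problems.append(f"clips per label 1-{n_classes} are uneven: {counts}")
    return problems

# Usage in the notebook, after running the cells above:
# for problem in check_arrays(data, labels):
#     print("WARNING:", problem)
```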

```python
# Example audio file
from IPython.display import display, Audio
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('bmh')

# Label names (label k is stored at index k-1)
labels_names = ['indoor quiet', 'street traffic', 'kitchen activity', 'human chatter', 'nature sounds']

idx = 0  # column index of the clip to inspect (0 = first sample in data)
plt.figure(figsize=(10,2))
plt.plot(data[:,idx])
plt.title('Label: '+labels_names[int(labels[idx]-1)], fontsize=20);

# Play the clip at the rate it was loaded with (44 kHz)
display(Audio(data[:,idx], rate=44000, autoplay=True))
```
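Changing `idx` lets you inspect other clips. To spot-check that each class sounds right, the following sketch finds the first clip of every label. `first_index_per_label` is a hypothetical helper. The commented usage assumes the `data`, `labels`, `display`, and `Audio` names from the cells above.

```python
import numpy as np

def first_index_per_label(labels, n_classes=5):
    """Map each label 1..n_classes to the column index of its first clip (None if absent)."""
    first = {}
    for k in range(1, n_classes + 1):
        hits = np.flatnonzero(labels == k)
        first[k] = int(hits[0]) if hits.size else None
    return first

# Usage in the notebook:
# for label, idx in first_index_per_label(labels).items():
#     if idx is not None:
#         display(Audio(data[:, idx], rate=44000))
```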

## Step 4: Upload your data in Canvas

To receive full credit for this question, submit the following to Canvas:

1. Compressed folder (.zip) with the recordings from all team members. (100 recordings per team should be included.)
2. File "data.npy"
3. File "labels.npy"
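You can create the .zip with your operating system's file manager or from Python. The following is a minimal sketch using the standard library's `shutil.make_archive`. `zip_recordings` and its default archive name are hypothetical. `folder` is assumed to be the directory holding every team member's recordings, such as the `mydir` folder used above.

```python
import os
import shutil

def zip_recordings(folder, archive_name="recordings"):
    """Compress the contents of `folder` into archive_name.zip and return the archive path."""
    if not os.path.isdir(folder):
        raise FileNotFoundError(f"no such folder: {folder}")
    # root_dir makes the paths inside the archive relative to `folder`
    return shutil.make_archive(archive_name, "zip", root_dir=folder)

# Usage (hypothetical): zip_recordings(mydir, "team_recordings")
```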

## Submit your Solution

Confirm that you've successfully completed the assignment.

```add``` and ```commit``` the final version of your work, and ```push``` your code and data to your GitHub repository. **If you run into memory or file-size issues, skip this step and submit the data files to Canvas only.**

Submit the URL of your GitHub Repository along with all data as your assignment submission on Canvas.