

# Audio Processing using CNN in TensorFlow

In this notebook, we will walk through the basics of audio processing using a Convolutional Neural Network (CNN) in TensorFlow. We'll cover the following steps:
1. Load and preprocess the audio data
2. Convert audio data to spectrograms
3. Build a CNN model
4. Train the model
5. Evaluate the model

## 1. Load and Preprocess Audio Data

We'll use the UrbanSound8K dataset for this example. First, let's install the necessary libraries and download the dataset.



In [None]:

# Install required libraries
!pip install tensorflow numpy matplotlib

In [None]:
# Install necessary libraries
!pip install tensorflow librosa wget


In [3]:
# Download the UrbanSound8K dataset
import os
import wget



In [4]:
url = r'https://zenodo.org/record/1203745/files/UrbanSound8K.tar.gz'
output_dir = r'UrbanSound8K.tar.gz'
wget.download(url, output_dir)

'UrbanSound8K.tar.gz'

In [5]:
# Extract the dataset
import tarfile
tar = tarfile.open(output_dir, "r:gz")
tar.extractall()
tar.close()


2. Convert Audio Data to Spectrograms
We will use Librosa to load audio files and convert them into spectrograms.



In [None]:
import librosa
import numpy as np
import matplotlib.pyplot as plt
import librosa.display

def audio_to_spectrogram(file_path, max_pad_len=128):
    y, sr = librosa.load(file_path, sr=None)
    spectrogram = librosa.feature.melspectrogram(y=y, sr=sr)
    spectrogram_db = librosa.power_to_db(spectrogram, ref=np.max)

    # Padding or truncating the spectrogram to a fixed size
    if spectrogram_db.shape[1] > max_pad_len:
        spectrogram_db = spectrogram_db[:, :max_pad_len]
    else:
        pad_width = max_pad_len - spectrogram_db.shape[1]
        spectrogram_db = np.pad(spectrogram_db, pad_width=((0, 0), (0, pad_width)), mode='constant')

    return spectrogram_db, sr

# Example: Convert an audio file to a spectrogram and plot it
example_file = 'UrbanSound8K/audio/fold1/101415-3-0-2.wav'
spectrogram, sr = audio_to_spectrogram(example_file)

plt.figure(figsize=(10, 4))
librosa.display.specshow(spectrogram, x_axis='time', y_axis='mel', sr=sr)
plt.colorbar(format='%+2.0f dB')
plt.title('Mel-frequency spectrogram')
plt.tight_layout()
plt.show()


3. Build a CNN Model
We will build a simple CNN model for audio classification


In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_model(input_shape):
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(128, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))  # Assuming 10 classes in the dataset

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

input_shape = (128, 128, 1)  # Example input shape
model = build_cnn_model(input_shape)
model.summary()


4. Train the Model
We'll prepare the dataset and train the model.

In [None]:
# Prepare the dataset
from sklearn.model_selection import train_test_split

def prepare_dataset(folder, test_size=0.2, max_pad_len=128):
    X = []
    y = []
    for subdir, dirs, files in os.walk(folder):
        for file in files:
            if file.endswith('.wav'):
                file_path = os.path.join(subdir, file)
                spectrogram, sr = audio_to_spectrogram(file_path, max_pad_len=max_pad_len)
                spectrogram = np.expand_dims(spectrogram, axis=-1)  # Add channel dimension
                label = int(file.split('-')[1])  # Assuming label is in the filename
                X.append(spectrogram)
                y.append(label)
    X = np.array(X)
    y = np.array(y)
    return train_test_split(X, y, test_size=test_size, random_state=42)

X_train, X_test, y_train, y_test = prepare_dataset('UrbanSound8K/audio')

# Train the model
history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))


5. Evaluate the Model
Finally, we'll evaluate the model on the test set.

In [9]:
test_loss, test_acc = model.evaluate(X_test, y_test)
print(f'Test accuracy: {test_acc:.2f}')


[1m55/55[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 6ms/step - accuracy: 0.7256 - loss: 1.0394
Test accuracy: 0.74


This notebook provides a basic walkthrough of audio processing using a CNN in TensorFlow. You can further experiment with different model architectures, hyperparameters, and data augmentation techniques to improve the performance.