Audio Classification using TensorFlow

UrbanSound8K Audio Classifier: TensorFlow model

Overview

This repository contains a TensorFlow model for classifying audio samples from the UrbanSound8K dataset. Each clip is converted to MFCC features with Librosa and classified by a fully connected neural network, reaching roughly 82% accuracy on the held-out test set.

Dataset

The UrbanSound8K dataset consists of 8,732 audio samples across 10 classes. Each sample is labeled with the corresponding sound class, making it suitable for supervised learning tasks.
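
For reference, here is a minimal sketch of how the dataset's metadata can be loaded with pandas, assuming the standard UrbanSound8K layout (a metadata/UrbanSound8K.csv file plus audio/fold1 … audio/fold10 directories); the local path is a placeholder, not taken from this repository:

import os
import pandas as pd

# Placeholder path to the extracted UrbanSound8K dataset (assumption)
DATASET_DIR = "UrbanSound8K"

# The metadata CSV maps each clip to its fold and sound class
metadata = pd.read_csv(os.path.join(DATASET_DIR, "metadata", "UrbanSound8K.csv"))

print(metadata.shape)                    # (8732, 8)
print(metadata["class"].value_counts())  # samples per sound class

# Full path to one audio clip, built from its fold and file name
row = metadata.iloc[0]
clip_path = os.path.join(DATASET_DIR, "audio", f"fold{row['fold']}", row["slice_file_name"])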

Features

  • Mel-Frequency Cepstral Coefficients (MFCCs), extracted with Librosa, as input features.
  • Fully connected TensorFlow/Keras architecture with ReLU activations and dropout regularization.
  • Training with the Adam optimizer, early stopping, and model checkpointing.

Usage of Librosa for Audio Processing

This project relies heavily on the Librosa library for audio processing, including:

  • Loading audio files in a variety of formats.
  • Extracting MFCC features with librosa.feature.mfcc (a short sketch follows this list).
  • Visualizing audio waveforms and spectrograms.
  • Inspecting properties of the loaded audio, such as its sampling rate.
  • Applying manipulations such as time-stretching and pitch-shifting with Librosa's built-in functions, plus simple noise injection.
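
A minimal feature-extraction sketch along these lines, assuming MFCCs are averaged over time into one fixed-length vector per clip; the 40-coefficient setting, the helper name, and the file path are illustrative assumptions rather than values taken from the repository:

import librosa
import numpy as np

def extract_mfcc_features(file_path, n_mfcc=40):
    # Load the clip (resampled to librosa's default 22,050 Hz)
    signal, sample_rate = librosa.load(file_path)
    # Compute an (n_mfcc, frames) matrix of MFCCs
    mfccs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    # Average over time so every clip yields a fixed-length feature vector
    return np.mean(mfccs.T, axis=0)

# Example: one 40-dimensional feature vector per audio file (placeholder path)
features = extract_mfcc_features("UrbanSound8K/audio/fold1/example.wav")
print(features.shape)  # (40,)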

Model Architecture

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Input

# Define the model: three hidden Dense layers with ReLU and dropout,
# followed by a softmax output over the sound classes
model = Sequential(
    layers=[
        Input(shape=X[0].shape),

        # Layer 1
        Dense(100),
        Activation('relu'),
        Dropout(0.5),

        # Layer 2
        Dense(200),
        Activation('relu'),
        Dropout(0.5),

        # Layer 3
        Dense(100),
        Activation('relu'),
        Dropout(0.5),

        # Output layer: one unit per class
        Dense(num_labels),
        Activation('softmax')
    ]
)

# Compile with Adam and sparse categorical cross-entropy
# (labels are integer class indices)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
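
The architecture above references X (the feature matrix) and num_labels, and the training snippet below references X_train, X_test, y_train, and y_test. A hedged preparation sketch that connects the two, building on the metadata and extract_mfcc_features sketches above and assuming scikit-learn for label encoding and the split (the 80/20 split ratio is an assumption):

import os
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Build the feature matrix from the metadata table and the
# extract_mfcc_features helper sketched earlier (illustrative names)
X = np.array([
    extract_mfcc_features(
        os.path.join(DATASET_DIR, "audio", f"fold{row.fold}", row.slice_file_name))
    for row in metadata.itertuples()
])

# Encode class names as integer labels for the sparse loss
encoder = LabelEncoder()
y = encoder.fit_transform(metadata["class"])
num_labels = len(encoder.classes_)  # 10 classes in UrbanSound8K

# Hold out 20% of the clips for testing (split ratio is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)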

Training Configuration

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from datetime import datetime

EPOCHS = 1000
BATCH_SIZE = 32

# Save the best model (by validation loss) and stop once validation loss
# stops improving (the early-stopping patience value here is an assumption)
checkpoint = ModelCheckpoint(filepath='checkpoints/model.h5', verbose=1, save_best_only=True)
early_stop = EarlyStopping(monitor='val_loss', patience=100, restore_best_weights=True)

start = datetime.now()

model.fit(X_train, y_train,
          batch_size=BATCH_SIZE,
          epochs=EPOCHS,
          validation_data=(X_test, y_test),
          callbacks=[checkpoint, early_stop])

duration = datetime.now() - start
print("Elapsed Time: ", duration)

Results

  • Test accuracy: approximately 82% (actual: 81.96%).
  • Model checkpoint available for further experimentation.
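
To reuse the saved checkpoint, a short sketch along the following lines should work; the checkpoint path matches the training snippet above, while the evaluation call itself is illustrative:

from tensorflow.keras.models import load_model

# Reload the best checkpoint saved during training
model = load_model('checkpoints/model.h5')

# Re-compute accuracy on the held-out test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.2%}")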