## Overview

This notebook explores audio recordings of animals through statistical analysis and visualisation of the audio samples.

Author: Andrew Kudilczak

## Pre-requisites to run this notebook.

This notebook explores a dataset that is stored on the Deakin Teams channel here:

[DataSets Folder](https://deakin365.sharepoint.com/:f:/r/sites/DataBytes2/Shared%20Documents/Project%20Echo/DataSets?csf=1&web=1&e=oxQZWj)

This dataset was not uploaded to git, to keep the repository lightweight. This notebook assumes the dataset has been mapped to your local drive (see the paths below).
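
Because the notebook relies on the dataset being mapped locally, it can help to check the path before running the later cells. The following is a minimal sketch under that assumption; `check_dataset_path` is a hypothetical helper and is not part of the original notebook.

```python
from pathlib import Path


def check_dataset_path(dataset_path):
    """Return the sorted class subfolder names under dataset_path.

    Hypothetical helper: raises FileNotFoundError if the folder is missing,
    which usually means the shared dataset has not been mapped locally yet.
    """
    root = Path(dataset_path)
    if not root.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {root}")
    # each immediate subfolder is treated as one data class
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling it with the dataset path used later in the notebook should list one name per class folder, or fail early with a readable error.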

## Development environment

This notebook requires a conda environment set up in accordance with [Data Science Environment Setup](https://github.com/stephankokkas/Project-Echo/tree/main/src)

In [8]:
# some basic libraries 
from platform import python_version
import pandas as pd
import seaborn as sns
import numpy as np
import os

# plot support
import plotly.express as px
import matplotlib.pyplot as plt

# tensorflow support
import tensorflow as tf
import tensorflow_datasets as tfds

# reading audio datasets
import librosa
import librosa.display
import IPython.display as ipd

# print system information
print('Python Version     : ', python_version())
print('TensorFlow Version : ', tf.__version__)
print('Librosa Version    : ', librosa.__version__)

Python Version     :  3.9.13
TensorFlow Version :  2.10.0
Librosa Version    :  0.9.2


In [9]:
# set system parameters
DATASET_PATH = 'C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/'

In [25]:
def dataset_from_dir_structure(dataset_path=DATASET_PATH):
    # each subdir represents a data class
    subfolders = [f for f in os.scandir(dataset_path) if f.is_dir()]

    dataset_filenames = []
    dataset_labels = []

    for subfolder in subfolders:
        # now get all the files in the folder
        audiofiles = [f for f in os.scandir(subfolder.path) if f.is_file()]
        
        for audiofile in audiofiles:
            dataset_filenames.append(audiofile.path)
            dataset_labels.append(subfolder.name)
            
    # create the dataset
    dataset = tf.data.Dataset.from_tensor_slices((dataset_filenames, dataset_labels))
    
    return dataset

brant C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant
XC112697.ogg C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\XC112697.ogg
XC127691.ogg C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\XC127691.ogg
XC143518.ogg C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\XC143518.ogg
XC149188.ogg C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\XC149188.ogg
XC149189.ogg C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\XC149189.ogg
XC149310.ogg C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\XC149310.ogg
XC149312.ogg C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\XC149312.ogg
XC149389.ogg C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\XC149389.ogg
XC161414.ogg C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\XC161414.ogg
XC161421.ogg C:/Users/Andrew/O
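
The folder-per-class layout also makes it straightforward to see how many recordings each class contains, which is a useful first statistic before building any pipeline. This is a rough sketch using only the standard library; `count_files_per_class` is a hypothetical helper that is not in the original notebook, and it assumes every file in a class folder is an audio sample.

```python
import os
from collections import Counter


def count_files_per_class(dataset_path):
    """Count the files in each class subfolder of dataset_path.

    Hypothetical helper: assumes one subfolder per class and that
    every file inside a subfolder is an audio sample.
    """
    counts = Counter()
    for subfolder in os.scandir(dataset_path):
        if subfolder.is_dir():
            # count only regular files, ignoring nested folders
            counts[subfolder.name] = sum(
                1 for f in os.scandir(subfolder.path) if f.is_file()
            )
    return counts
```

The resulting `Counter` can be passed to `most_common()` to spot heavily over- or under-represented classes.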

In [31]:
# create the dataset
dataset = dataset_from_dir_structure(DATASET_PATH)

# print three elements of the dataset pipeline
for row in dataset.take(3):
    print(row)

(<tf.Tensor: shape=(), dtype=string, numpy=b'C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\\XC112697.ogg'>, <tf.Tensor: shape=(), dtype=string, numpy=b'brant'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\\XC127691.ogg'>, <tf.Tensor: shape=(), dtype=string, numpy=b'brant'>)
(<tf.Tensor: shape=(), dtype=string, numpy=b'C:/Users/Andrew/OneDrive - Deakin University/DataSets/birdclef2022/brant\\XC143518.ogg'>, <tf.Tensor: shape=(), dtype=string, numpy=b'brant'>)


In [29]:
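# dataset_labels is local to dataset_from_dir_structure, so read the labels back from the dataset
[label.numpy().decode() for _, label in dataset]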

['brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',
 'brant',


In [None]:
# inspect the structure of each (filename, label) element
dataset.element_spec