# Data Exploration for GuitarTab Project

This notebook explores the Guitar Chords V2 and GuitarSet datasets to understand their structure, distribution, and characteristics.

## Table of Contents
1. [Import Libraries](#Import-Libraries)
2. [Define Paths and Parameters](#Define-Paths-and-Parameters)
3. [Load Sample Audio Files](#Load-Sample-Audio-Files)
4. [Visualize Waveforms](#Visualize-Waveforms)
5. [Compute and Visualize Spectrograms](#Compute-and-Visualize-Spectrograms)
6. [Class Distribution](#Class-Distribution)
7. [Conclusion](#Conclusion)

## 1. Import Libraries

Import the libraries and project helpers used throughout the notebook.

In [None]:
# 1. Import Libraries
import os

import librosa
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

from src.data_preprocessing import (
    compute_spectrogram,
    load_wav_16k_mono,
    plot_spectrogram,
    plot_waveform,
    preprocess_audio,
)

## 2. Define Paths and Parameters

Set the paths to the two datasets and list the chord classes to analyze. The paths assume Google Drive is mounted at `/content/drive`, as in Google Colab.
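
Every later cell depends on these paths, so it can help to confirm that they exist before loading anything. The sketch below is one possible check; `missing_dirs` is a hypothetical helper and is not part of the project code.

```python
from pathlib import Path

def missing_dirs(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]

# Hypothetical usage with the constants defined in this notebook:
# missing = missing_dirs([GUITARCHORDS_PATH, GUITARSET_PATH])
# if missing:
#     print("Missing dataset folders:", missing)
```

An empty result means both folders are reachable, which usually also confirms that Drive is mounted.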

In [None]:
# 2. Define Paths and Parameters
GUITARCHORDS_PATH = '/content/drive/MyDrive/GuitarTab/data/raw/GuitarChordsV2/'
GUITARSET_PATH = '/content/drive/MyDrive/GuitarTab/data/raw/GuitarSet/'

chords = ['Am', 'Bb', 'Bdim', 'C', 'Dm', 'Em', 'F', 'G']

## 3. Load Sample Audio Files

Load a sample audio file from the Guitar Chords V2 dataset to inspect its content.
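
Before resampling a file, it can be useful to look at its original format. The sketch below reads the header with Python's standard `wave` module. It assumes the files are uncompressed PCM WAV, which is the only format `wave` reads, and `describe_wav` is a hypothetical helper name.

```python
import wave

def describe_wav(path):
    """Read basic properties from a PCM WAV header without decoding the audio."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        n_frames = wf.getnframes()
        return {
            "sample_rate": sample_rate,
            "channels": wf.getnchannels(),
            "bit_depth": 8 * wf.getsampwidth(),
            "duration_s": n_frames / sample_rate,
        }

# Hypothetical usage: info = describe_wav(sample_file)
```

Comparing the reported sample rate and channel count with 16 kHz mono shows how much conversion the loader has to perform.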

In [None]:
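# 3. Load Sample Audio Files
SAMPLE_RATE = 16000  # sample rate implied by the name load_wav_16k_mono

sample_chord = 'Am'
# TODO: verify this path. The class-distribution cell (section 6) reads chord
# folders under 'Training' and 'Test', but this path has no split folder.
sample_file = os.path.join(GUITARCHORDS_PATH, sample_chord, 'Am_AcousticPlug26_1.wav')

wav = load_wav_16k_mono(sample_file)
print(f"Audio loaded: {sample_file}, duration: {len(wav) / SAMPLE_RATE:.2f} seconds")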

## 4. Visualize Waveforms

Plot the waveform of the sample audio to understand its amplitude variations over time.
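
A numeric companion to the plot is a short-time RMS envelope, which summarizes loudness frame by frame. This is a minimal NumPy sketch. The frame and hop sizes are illustrative assumptions, and `rms_envelope` is not part of `src.data_preprocessing`.

```python
import numpy as np

def rms_envelope(signal, frame_length=1024, hop_length=512):
    """Root-mean-square amplitude of each frame of a 1-D signal."""
    signal = np.asarray(signal, dtype=np.float64)
    # Signals shorter than one frame are treated as a single (short) frame
    n_frames = 1 + max(0, len(signal) - frame_length) // hop_length
    rms = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop_length : i * hop_length + frame_length]
        rms[i] = np.sqrt(np.mean(frame ** 2))
    return rms

# Hypothetical usage: envelope = rms_envelope(wav)
```

For a strummed chord, the envelope would typically show a sharp attack followed by a gradual decay.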

In [None]:
# 4. Visualize Waveforms
plot_waveform(wav, title=f"Waveform of {sample_chord} Chord")

## 5. Compute and Visualize Spectrograms

Convert the audio waveform into a spectrogram and visualize it.
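
For readers unfamiliar with the transform, the sketch below computes a magnitude spectrogram with a plain NumPy short-time Fourier transform. It is not the project's `compute_spectrogram`, whose implementation is not shown here. The function name, window type, `n_fft`, and `hop_length` are assumptions chosen for illustration.

```python
import numpy as np

def magnitude_spectrogram(signal, n_fft=512, hop_length=128):
    """Magnitude STFT of a 1-D signal, shape (n_frames, n_fft // 2 + 1)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        # Zero-pad short clips so that at least one full frame exists
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    window = np.hanning(n_fft)
    # Slice overlapping frames and taper each with a Hann window
    frames = np.stack([
        signal[i * hop_length : i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # Real-input FFT along the frame axis, keeping magnitudes only
    return np.abs(np.fft.rfft(frames, axis=1))
```

At 16 kHz with `n_fft=512`, each frequency bin spans 31.25 Hz, so a larger `n_fft` gives finer frequency resolution at the cost of coarser time resolution.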

In [None]:
# 5. Compute and Visualize Spectrograms
spectrogram = compute_spectrogram(wav)
plot_spectrogram(spectrogram, title=f"Spectrogram of {sample_chord} Chord")

## 6. Class Distribution

Analyze the distribution of chords in the Guitar Chords V2 dataset.
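
Alongside a count plot, a single imbalance ratio (largest class divided by smallest) gives a quick summary of the distribution. The sketch below uses only the standard library. `class_balance` is a hypothetical helper, and the example labels are made up for illustration, not real dataset counts.

```python
from collections import Counter

def class_balance(labels):
    """Return per-class counts and the ratio of largest to smallest class."""
    counts = Counter(labels)
    if not counts:
        return counts, None
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio

# Made-up labels for illustration only
example_labels = ['Am', 'Am', 'Am', 'C', 'C', 'G']
counts, ratio = class_balance(example_labels)
print(counts, ratio)
```

A chord with no files never appears in the counts, so it is worth comparing the keys against the full `chords` list as well.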

In [None]:
# 6. Class Distribution
import glob

def get_all_files(dataset_path, chords):
    """Return parallel lists of .wav paths and chord labels (one subfolder per chord)."""
    files = []
    labels = []
    for chord in chords:
        # Sort so the file order is reproducible across runs
        chord_files = sorted(glob.glob(os.path.join(dataset_path, chord, '*.wav')))
        files.extend(chord_files)
        labels.extend([chord] * len(chord_files))
    return files, labels

# Get train and test files
train_files, train_labels = get_all_files(os.path.join(GUITARCHORDS_PATH, 'Training'), chords)
test_files, test_labels = get_all_files(os.path.join(GUITARCHORDS_PATH, 'Test'), chords)

# Combine and create a DataFrame
all_labels = train_labels + test_labels
df = pd.DataFrame({'Chord': all_labels})

# Plot
plt.figure(figsize=(10,6))
sns.countplot(data=df, x='Chord', order=chords)
plt.title('Chord Distribution in Guitar Chords V2 Dataset')
plt.xlabel('Chord')
plt.ylabel('Count')
plt.show()