## Performing tasks that Librosa does, without Librosa

### Loading/Reading an Audio Wav file and returning data and sample rate

##### WAV FILE FORMAT : 
The WAV (Waveform Audio File Format) is a popular file format for storing audio on Windows systems. It was developed by Microsoft and IBM. The WAV format is widely used because it is relatively simple and can store uncompressed audio data. Here's a brief overview of the WAV file format:

1. **Header Chunk:**
   - **Chunk ID (4 bytes):** It usually contains the ASCII characters "RIFF" (0x52494646 in hexadecimal).
   - **Chunk Size (4 bytes):** Size of the rest of the file after this field, that is, the total file size minus 8 bytes (the Chunk ID and Chunk Size fields themselves). It is little-endian, meaning the least significant byte is stored first.
   - **Format (4 bytes):** It usually contains the ASCII characters "WAVE" (0x57415645 in hexadecimal).

2. **Format Chunk:**
   - **Chunk ID (4 bytes):** It usually contains the ASCII characters "fmt " (0x666D7420 in hexadecimal).
   - **Chunk Size (4 bytes):** Size of the format chunk, which is typically 16 bytes for PCM data.
   - **Audio Format (2 bytes):** Audio format. PCM (Pulse Code Modulation) is the most common and has a value of 1.
   - **Number of Channels (2 bytes):** Number of audio channels (1 for mono, 2 for stereo, etc.).
   - **Sample Rate (4 bytes):** Number of samples per second (in hertz).
   - **Byte Rate (4 bytes):** Data rate in bytes per second. Calculated as Sample Rate * Number of Channels * Bits per Sample / 8.
   - **Block Align (2 bytes):** Number of bytes for one sample frame, that is, one sample for every channel. Calculated as Number of Channels * Bits per Sample / 8.
   - **Bits per Sample (2 bytes):** Number of bits in each sample.

3. **Data Chunk:**
   - **Chunk ID (4 bytes):** It usually contains the ASCII characters "data" (0x64617461 in hexadecimal).
   - **Chunk Size (4 bytes):** Size of the data section (number of bytes of audio data).
   - **Audio Data:** Actual audio samples follow this field.
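
The layout above can be sketched as a small chunk walker: after the 12-byte RIFF header, every chunk is a 4-byte ID, a 4-byte little-endian size, and then that many payload bytes. The helper names `iter_chunks` and `parse_fmt` and the tiny in-memory WAV are made up for this illustration; they are not part of Librosa or of the notebook's loader.

```python
import struct

def iter_chunks(buf):
    """Yield (chunk_id, payload) for each sub-chunk of a RIFF/WAVE byte string."""
    riff_id, riff_size, form = struct.unpack('<4sI4s', buf[:12])
    if riff_id != b'RIFF' or form != b'WAVE':
        raise ValueError('not a RIFF/WAVE file')
    pos = 12
    end = min(len(buf), 8 + riff_size)
    while pos + 8 <= end:
        chunk_id, size = struct.unpack('<4sI', buf[pos:pos + 8])
        yield chunk_id, buf[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length

def parse_fmt(payload):
    """Decode the first 16 bytes of a "fmt " chunk into a dict."""
    names = ('audio_format', 'channels', 'sample_rate',
             'byte_rate', 'block_align', 'bits_per_sample')
    return dict(zip(names, struct.unpack('<HHIIHH', payload[:16])))

# Build a tiny mono, 8000 Hz, 16-bit PCM file in memory (4 samples).
pcm = struct.pack('<4h', 0, 1000, -1000, 0)
fmt = struct.pack('<HHIIHH', 1, 1, 8000, 8000 * 2, 2, 16)
body = (b'WAVE'
        + b'fmt ' + struct.pack('<I', len(fmt)) + fmt
        + b'data' + struct.pack('<I', len(pcm)) + pcm)
wav = b'RIFF' + struct.pack('<I', len(body)) + body

chunks = dict(iter_chunks(wav))
info = parse_fmt(chunks[b'fmt '])
print(info)
print(len(chunks[b'data']), "bytes of audio data")
```

Walking the chunks this way does not assume that the audio data starts at a fixed offset, so it also copes with files that carry extra chunks before "data".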

The WAV file format can support various audio formats and compression schemes, but PCM is the most common. PCM is a straightforward method of digitally representing analog signals, where the amplitude of the signal is sampled at regular intervals, and each sample is represented by a binary number. PCM audio is uncompressed and provides a high-quality representation of the original audio signal.
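
As a rough illustration of PCM, the sketch below samples a sine wave at regular intervals and stores each sample as a 16-bit signed integer. The sample rate, tone frequency, and number of samples are arbitrary values chosen for the example.

```python
import numpy as np

sample_rate = 8000   # samples per second (assumed for the example)
num_samples = 80     # 10 ms of audio at 8000 Hz
freq = 440.0         # tone frequency in Hz

# Sample the "analog" signal at regular intervals.
t = np.arange(num_samples) / sample_rate
analog = np.sin(2 * np.pi * freq * t)            # values in [-1.0, 1.0]

# Quantize each sample to a 16-bit signed integer (little-endian).
pcm16 = np.round(analog * 32767).astype('<i2')

# These are the bytes that would go into the data chunk of a mono WAV file.
raw = pcm16.tobytes()
print(pcm16[:5], len(raw))
```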

It's important to note that the WAV format can also support compressed audio data using codecs like ADPCM or MP3, but these are less common and not always supported universally across all WAV file readers.

In [3]:
file_name = "./UrbanSound8K/UrbanSound8K/audio/fold5/100032-3-0-0.wav"

In [5]:
with open(file_name, 'rb') as audio_file:
    print(audio_file.read())

b'RIFF\xf4\xda\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00D\xac\x00\x00\x10\xb1\x02\x00\x04\x00\x10\x00data\xd0\xda\x00\x00j\xffl\xff`\xffc\xffi\xffi\xffh\xffe\xff`\xff`\xff[\xff_\xff]\xfff\xffk\xffq\xff}\xff|\xff\x99\xff\x93\xff\xb7\xff\xb3\xff\xd6\xff\xd9\xff\xfa\xff\xfb\xff\x07\x00\x04\x00;\x005\x00P\x00E\x00k\x00b\x00w\x00o\x00\x92\x00\x8c\x00\x98\x00\x96\x00\x9f\x00\xa0\x00\x9e\x00\x9d\x00\x91\x00\x8f\x00\x80\x00~\x00\x82\x00~\x00y\x00v\x00g\x00f\x00R\x00T\x00A\x00B\x001\x005\x00/\x001\x00\x0f\x00\r\x00 \x00\x1b\x00\x12\x00\x0c\x00\t\x00\x05\x00\x10\x00\x13\x00\x10\x00\x1a\x00\x0e\x00\x19\x00\x1d\x00%\x00$\x00\'\x00;\x008\x00O\x00P\x00S\x00T\x00_\x00`\x00[\x00]\x00_\x00c\x00e\x00j\x00g\x00m\x00]\x00b\x00w\x00w\x00j\x00g\x00i\x00e\x00j\x00e\x00M\x00H\x00^\x00^\x00H\x00M\x001\x006\x008\x007\x00\x1f\x00\x1f\x00\x08\x00\r\x00\xf2\xff\xf6\xff\xd9\xff\xdb\xff\xcd\xff\xcd\xff\xbf\xff\xbd\xff\xb0\xff\xad\xff\xb1\xff\xaf\xff\xb4\xff\xb0\xff\xcb\xff\xc0\xff\xd3\xff\xcc\xff\xcb\xff\xcd\x

#### In the above WAV file : 
##### Header Chunk : 
- Chunk ID : "RIFF"
- Chunk Size : f4 da 00 00 (0x0000DAF4 = 56052 bytes, little endian)
- Format : "WAVE"
- Entire file size : 56052 (Chunk Size) + 4 (Chunk ID field) + 4 (Chunk Size field) = 56060 bytes

##### Format Chunk
- Chunk ID : "fmt " (with a trailing space)
- Chunk Size : 10 00 00 00 (16 bytes, little endian)
- Audio Format : 01 00 (1 = PCM, little endian)
- Number of Channels : 02 00 (2 channels, little endian)
- Sample Rate : 44 ac 00 00 (44100 Hz, little endian)
- Byte Rate : 10 b1 02 00 (176400 bytes per second, little endian)
- Block Align : 04 00 (4 bytes per sample frame: 2 channels * 2 bytes, little endian)
- Bits per Sample : 10 00 (16 bits per sample, little endian)

##### Data Chunk 
- Chunk ID : "data"
- Chunk Size : d0 da 00 00 (0x0000DAD0 = 56016 bytes, little endian)
- data : The bytes that follow are the actual audio samples (16-bit little-endian integers, with the two channels interleaved)
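
The little-endian fields can be checked by decoding the 44 header bytes printed above with `int.from_bytes`. This is only a sanity-check sketch; the derived quantities (`file_size`, `frames`, `duration`) are computed here for illustration.

```python
# The 44 header bytes of the file shown above.
header = (b'RIFF\xf4\xda\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00'
          b'D\xac\x00\x00\x10\xb1\x02\x00\x04\x00\x10\x00data\xd0\xda\x00\x00')

riff_size   = int.from_bytes(header[4:8], 'little')
channels    = int.from_bytes(header[22:24], 'little')
sample_rate = int.from_bytes(header[24:28], 'little')
byte_rate   = int.from_bytes(header[28:32], 'little')
block_align = int.from_bytes(header[32:34], 'little')
bits        = int.from_bytes(header[34:36], 'little')
data_size   = int.from_bytes(header[40:44], 'little')

file_size = riff_size + 8             # add the "RIFF" ID and the size field itself
frames = data_size // block_align     # one frame = one sample per channel
duration = frames / sample_rate       # length of the clip in seconds

print(riff_size, data_size, file_size)
print(byte_rate == sample_rate * channels * bits // 8)
print(frames, round(duration, 4))
```

The RIFF chunk size exceeds the data chunk size by 36 bytes: 4 for "WAVE", 24 for the format chunk, and 8 for the data chunk's ID and size fields.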

In [32]:
import struct
import numpy as np
def load_audio(file_path):
    with open(file_path,'rb') as audio_file:
        # Read the header to get the audio file parameters.
        # A canonical PCM WAV file (16-byte fmt chunk, no extra chunks) has a 44-byte header.
        header = audio_file.read(44)
        print(f"The header is {header}")

        # Extract the relevant fields from the header (all little-endian):
        #   bytes 20-21: audio format (1 = PCM)
        #   bytes 22-23: number of channels (1 for mono, 2 for stereo, etc.)
        #   bytes 24-27: sample rate (in Hz)
        #   bytes 34-35: sample width (in bits per sample)
        format_code = struct.unpack('<H', header[20:22])[0]
        channels = struct.unpack('<H', header[22:24])[0]
        sample_rate = struct.unpack('<I', header[24:28])[0]
        sample_width = struct.unpack('<H', header[34:36])[0]  # in bits, not bytes
        # Files with a larger fmt chunk or extra chunks before "data" do not fit
        # this fixed 44-byte layout and need a chunk-by-chunk parser instead.

        # Read the data from the file
        data = audio_file.read()
    
    # Convert the raw bytes to a NumPy array of integers.
    # This file holds 16-bit PCM, so each sample is a little-endian signed 16-bit integer.
    if format_code != 1 or sample_width != 16:
        raise ValueError("This loader only handles 16-bit PCM data")
    data_array = np.frombuffer(data, dtype='<i2')
    
    return data_array, format_code, channels, sample_width, sample_rate


In [36]:
file_name = "./UrbanSound8K/UrbanSound8K/audio/fold5/100032-3-0-0.wav"
audio_data, format_code, channels, sample_width, sample_rate = load_audio(file_name)

print(f"Format Code: {format_code}")
print(f"Channels: {channels}")
print(f"Sample Width: {sample_width}")
print(f"Sample Rate: {sample_rate}")
print(f"Number of Samples: {len(audio_data)}")
print(f"First samples : {audio_data[:6]}")
print(f"Type of data array : {type(audio_data[0])}")

The header is b'RIFF\xf4\xda\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00D\xac\x00\x00\x10\xb1\x02\x00\x04\x00\x10\x00data\xd0\xda\x00\x00'
Format Code: 1
Channels: 2
Sample Width: 16
Sample Rate: 44100
Number of Samples: 28008
First samples : [-150 -148 -160 -157 -151 -151]
Type of data array : <class 'numpy.int16'>
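
By default, `librosa.load` returns floating-point samples scaled to the range [-1, 1] and mixed down to mono (it also resamples to 22050 Hz, which is not attempted here). A minimal sketch of the scaling and mixdown for 16-bit PCM follows, assuming interleaved little-endian samples as in this file. The helper names `pcm16_to_float` and `to_mono` are hypothetical, and the mixdown is a plain average across channels.

```python
import numpy as np

def pcm16_to_float(raw, channels):
    """Decode interleaved 16-bit PCM bytes into a (frames, channels) float32 array."""
    samples = np.frombuffer(raw, dtype='<i2')
    frames = samples.reshape(-1, channels)        # one row per sample frame
    return frames.astype(np.float32) / 32768.0    # scale to [-1.0, 1.0)

def to_mono(frames):
    """Average the channels of a (frames, channels) array."""
    return frames.mean(axis=1)

# First 8 data bytes of the file shown above: two stereo frames.
raw = b'j\xffl\xff`\xffc\xff'
stereo = pcm16_to_float(raw, channels=2)
mono = to_mono(stereo)
print(stereo.shape, mono.shape)
print(stereo * 32768)   # back to the integer sample values
```

Dividing by 32768 keeps the most negative 16-bit value (-32768) at exactly -1.0.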
