# MP3 audio file - binary datatype

**MP3 is a lossy audio format that keeps files small, which made it popular for computer playback and portable music players. Most MP3 files carry an ID3 tag, which holds the metadata for the audio. The tag is stored in the same file, but it sits outside the audio data itself (an ID3v2 tag typically comes before the audio).**

**The code below displays the ID3 information for the MP3 song file 'Someday', by Gary Linley. An ID3 tag can hold information such as the composer or recording time, and it supports several text encodings to deal with non-ASCII characters. You can also access images associated with the song, such as the album front cover or an image of the artist.**

**An ID3 tag must contain a header, which specifies the ID3 version, the flags and the tag size, followed by one or more frames. Each frame has a frame type, e.g. `TXXX` for user-defined text information or `APIC` for attached pictures.**
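
**As a rough illustration of the header layout described above, the sketch below unpacks a 10-byte ID3v2 header with the standard `struct` module. The function name `parse_id3v2_header` and the sample bytes are made up for this example; they are not part of the notebook code that follows.**

```python
import struct


def parse_id3v2_header(header: bytes) -> dict:
    """Split a 10-byte ID3v2 tag header into its fields (illustrative helper)."""
    if len(header) != 10:
        raise ValueError('an ID3v2 header is exactly 10 bytes')

    # 3s = 'ID3' marker, B = major version, B = revision, B = flags, 4s = size bytes
    marker, major, revision, flags, size_bytes = struct.unpack('>3sBBB4s', header)
    if marker != b'ID3':
        raise ValueError('not an ID3v2 tag')

    # Each size byte carries 7 useful bits (the top bit is always zero)
    size = 0
    for byte in size_bytes:
        size = (size << 7) | (byte & 0x7F)

    return {'version': f'2.{major}.{revision}', 'flags': flags, 'size': size}


# Made-up header for illustration: version 2.3.0, no flags, size bytes 00 00 02 01
example_header = b'ID3' + bytes([3, 0, 0, 0, 0, 2, 1])
print(parse_id3v2_header(example_header))
```

**The size of 257 matches the worked example in the `decode_size` docstring further down.**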

**See https://id3.org/id3v2.3.0 and https://id3.org/id3v2.4.0-frames for the ID3 specifications.**

**NOTE: All code below is third-party material. This exercise is meant to show the general technique for processing the binary data in an MP3 file to output the information associated with a song, such as the song name, artist, picture of the artist etc.**

In [1]:
# Encodings that ID3 supports
id3_field_encodings = ['iso-8859-1', 'utf_16', 'utf_16_be', 'utf_8']

# APIC picture types associate images to ID3 tag, e.g. front cover
apic_picture_types = {0x00: 'Other',
                      0x01: '32x32_pixels_png_file_icon',
                      0x02: 'Other_file_icon',
                      0x03: 'Front_cover',
                      0x04: 'Back_cover',
                      0x05: 'Leaflet_page',
                      0x06: 'Media_(eg_label_side_of_CD)',
                      0x07: 'Lead_artist',
                      0x08: 'Artist',
                      0x09: 'Conductor',
                      0x0A: 'Band',
                      0x0B: 'Composer',
                      0x0C: 'Lyricist',
                      0x0D: 'Recording_Location',
                      0x0E: 'During_recording',
                      0x0F: 'During_performance',
                      0x10: 'Video_screen_capture',
                      0x11: 'A_bright_coloured_fish',
                      0x12: 'Illustration',
                      0x13: 'Band_logotype',
                      0x14: 'Publisher_logotype',
                      }

# Frame types defined by ID3 specifications 
# NOTE: All dictionary keys are bytes objects (immutable)
frame_types = {
    b'AENC': 'Audio encryption',
    b'APIC': 'Attached picture',
    b'ASPI': 'Audio seek point index',
    b'COMM': 'Comments',
    b'COMR': 'Commercial frame',
    b'ENCR': 'Encryption method registration',
    b'EQU2': 'Equalisation',
    b'EQUA': 'Equalization',
    b'ETCO': 'Event timing codes',
    b'GEOB': 'General encapsulated object',
    b'GRID': 'Group identification registration',
    b'IPLS': 'Involved people list',
    b'LINK': 'Linked information',
    b'MCDI': 'Music CD identifier',
    b'MLLT': 'MPEG location lookup table',
    b'OWNE': 'Ownership frame',
    b'PRIV': 'Private frame',
    b'PCNT': 'Play counter',
    b'POPM': 'Popularimeter',
    b'POSS': 'Position synchronisation frame',
    b'RBUF': 'Recommended buffer size',
    b'RVA2': 'Relative volume adjustment',
    b'RVAD': 'Relative volume adjustment',
    b'RVRB': 'Reverb',
    b'SEEK': 'Seek frame',
    b'SIGN': 'Signature frame',
    b'SYLT': 'Synchronized lyric/text',
    b'SYTC': 'Synchronized tempo codes',
    b'TALB': 'Album/Movie/Show title',
    b'TBPM': 'BPM (beats per minute)',
    b'TCOM': 'Composer',
    b'TCON': 'Content type',
    b'TCOP': 'Copyright message',
    b'TDAT': 'Date',
    b'TDEN': 'Encoding time',
    b'TDLY': 'Playlist delay',
    b'TDOR': 'Original release time',
    b'TDRC': 'Recording time',
    b'TDRL': 'Release time',
    b'TDTG': 'Tagging time',
    b'TENC': 'Encoded by',
    b'TEXT': 'Lyricist/Text writer',
    b'TFLT': 'File type',
    b'TIME': 'Time',
    b'TIPL': 'Involved people list',
    b'TIT1': 'Content group description',
    b'TIT2': 'Title/songname/content description',
    b'TIT3': 'Subtitle/Description refinement',
    b'TKEY': 'Initial key',
    b'TLAN': 'Language(s)',
    b'TLEN': 'Length',
    b'TMCL': 'Musician credits list',
    b'TMED': 'Media type',
    b'TMOO': 'Mood',
    b'TOAL': 'Original album/movie/show title',
    b'TOFN': 'Original filename',
    b'TOLY': 'Original lyricist(s)/text writer(s)',
    b'TOPE': 'Original artist(s)/performer(s)',
    b'TORY': 'Original release year',
    b'TOWN': 'File owner/licensee',
    b'TPE1': 'Lead performer(s)/Soloist(s)',
    b'TPE2': 'Band/orchestra/accompaniment',
    b'TPE3': 'Conductor/performer refinement',
    b'TPE4': 'Interpreted, remixed, or otherwise modified by',
    b'TPOS': 'Part of a set',
    b'TPUB': 'Publisher',
    b'TRCK': 'Track number/Position in set',
    b'TRDA': 'Recording dates',
    b'TRSN': 'Internet radio station name',
    b'TRSO': 'Internet radio station owner',
    b'TSIZ': 'Size',
    b'TSOA': 'Album sort order',
    b'TSOP': 'Performer sort order',
    b'TSOT': 'Title sort order',
    b'TSRC': 'ISRC (international standard recording code)',
    b'TSSE': 'Software/Hardware and settings used for encoding',
    b'TSST': 'Set subtitle',
    b'TYER': 'Year',
    b'TXXX': 'User defined text information frame',
    b'UFID': 'Unique file identifier',
    b'USER': 'Terms of use',
    b'USLT': 'Unsynchronized lyric/text transcription',
    b'WCOM': 'Commercial information',
    b'WCOP': 'Copyright/Legal information',
    b'WOAF': 'Official audio file webpage',
    b'WOAR': 'Official artist/performer webpage',
    b'WOAS': 'Official audio source webpage',
    b'WORS': 'Official internet radio station homepage',
    b'WPAY': 'Payment',
    b'WPUB': 'Publishers official webpage',
    b'WXXX': 'User defined URL link frame',
}


In [3]:
# Constant to always move file pointer relative to current position
from os import SEEK_CUR, path

# Supports binary file types
from typing import BinaryIO

filename = 'data/Someday.mp3'

In [4]:
def decode_size(encoded_size: bytes) -> int:
    """Decode and return an ID3-encoded size as a positive integer.

    The ID3v2 tag size is encoded with four bytes, where the
    most significant bit (bit 7) is set to zero in every byte.
    This gives a total of 28 bits. The zeroed high bit is ignored.
    Numbering the bytes from the least significant (byte 0 is the last
    of the four bytes in the file), each byte is shifted left by 7 places
    per position. Thus:
        byte 3 is shifted left 21 places,
        byte 2 is shifted left 14 places,
        byte 1 is shifted left 7 places,
        byte 0 is unchanged.
    Or-ing the 4 shifted bytes gives the decoded size.

    For example, a size of 257 bytes is represented as $00 00 02 01.
    Ignoring the 2 most significant bytes for simplicity
    (because they're zero):

                    0000 0010  ($02) << 7 =
          0000 0001 0000 0000 |
                    0000 0001
          -------------------
          0000 0001 0000 0001 ($01 01, 257 in decimal)

    :param encoded_size: The 4 bytes making up the encoded size.
    :return: The decoded size, as an integer.
    """
    return encoded_size[0] << 21 \
           | encoded_size[1] << 14 \
           | encoded_size[2] << 7 \
           | encoded_size[3]
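
**To check the decoding, it can help to go the other way. The sketch below adds a hypothetical `encode_size` helper (not part of the original code) that packs an integer into four 7-bit bytes, and repeats the `decode_size` logic so that it runs on its own.**

```python
def encode_size(size: int) -> bytes:
    """Pack an integer into four 7-bit bytes (hypothetical inverse of decode_size)."""
    if not 0 <= size < 2 ** 28:
        raise ValueError('size must fit in 28 bits')
    # Take 7 bits at a time, most significant group first
    return bytes((size >> shift) & 0x7F for shift in (21, 14, 7, 0))


def decode_size(encoded_size: bytes) -> int:
    """Same logic as the notebook's decode_size, repeated to keep this sketch standalone."""
    return (encoded_size[0] << 21
            | encoded_size[1] << 14
            | encoded_size[2] << 7
            | encoded_size[3])


print(encode_size(257).hex())            # 00000201, as in the docstring example
print(decode_size(encode_size(99966)))   # 99966
```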



In [5]:
def read_c_string(binary_file: BinaryIO, c_str_encoding: str) -> str:
    """
    Read a null-terminated (C-style) string from a binary file.

    Python has no built-in way to read null-terminated strings, so this
    function reads bytes up to the terminator and decodes them to a str.

    Note: This function will probably crash if the file pointer
    isn't positioned on the first character of a c-string
    (it's fine for the terminating $00 of an empty string).

    :param binary_file: The file to read from. Must be opened in binary mode,
        and the file pointer should be positioned at the correct point
        to start reading from.
    :param c_str_encoding: The encoding to use when decoding the bytes.
    :return: A Python str corresponding to the decoded c-string.
    :raises UnicodeDecodeError: This could be raised if the file pointer
        isn't positioned at the start of a valid Unicode sequence. You
        may get an exception, or the returned string could be unintelligible.
    """
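    # Python has no built-in way to read c-strings, so read one code unit at a time
    # until we hit the terminator. UTF-16 strings end with two zero bytes, the
    # other ID3 encodings with one.
    unit_size = 2 if c_str_encoding.lower().replace('-', '_').startswith('utf_16') else 1
    terminator = b'\x00' * unit_size
    byte_array = bytearray()
    unit = binary_file.read(unit_size)
    while len(unit) == unit_size and unit != terminator:
        byte_array += unit
        unit = binary_file.read(unit_size)

    # An empty c-string decodes to '' without any special handling
    return byte_array.decode(c_str_encoding)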
    


In [7]:
with open(filename, 'rb') as mp3_file:
    header = mp3_file.read(10)

    # Do we have an ID3v2.3 tag? (major version 3, revision 0)
    if header[:5] == b'ID3\x03\x00':
        # Flags
        print(f'Flags: {header[5]:#010b}')
        # Calculate the size
        size_bytes = header[-4:]
        size = decode_size(size_bytes)
        print(f'Tag size: {size} bytes')

        # Skip extended header, if there is one
        if header[5] & 0b01000000:
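            # Extended header present. Its 4-byte size follows immediately after the 10-byte tag header.
            # In ID3v2.3 this size is a plain big-endian integer (not 7-bit encoded) and excludes itself.
            ext_size = int.from_bytes(mp3_file.read(4), 'big')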
            print(f'Extended header, size is {ext_size} bytes.')

            # We're not interested in the extended header, seek past it
            mp3_file.seek(ext_size, SEEK_CUR)

        while True:
            print('*' * 80)
            print(f'Current file position: {mp3_file.tell()}')
            
            # Read 10-byte frame header
            frame_header = mp3_file.read(10)
            frame_id = frame_header[:4]

            if frame_id in frame_types:
                print(f'Found frame type: {frame_id}')

                # We need the frame size
                frame_size = int.from_bytes(frame_header[4:8], 'big')
                print(f'Frame size: {frame_size}')

                # Only process 'T' (Text), 'WXXX' and 'APIC' frames
                if frame_id.startswith(b'T'):  # a text field
                    # Get the encoding byte
                    encoding_byte = mp3_file.read(1)[0]
                    encoding = id3_field_encodings[encoding_byte]
                    print(f'encoding is {encoding}')

                    # Read & decode the data. We've already read byte 0, so there are `frame_size - 1` bytes left
                    text = mp3_file.read(frame_size - 1).decode(encoding)
                    print(f'{frame_types[frame_id]}: {text}')
                    
                # If frame contains URL info
                elif frame_id == b'WXXX':
                    # Get the encoding byte
                    encoding_byte = mp3_file.read(1)[0]
                    encoding = id3_field_encodings[encoding_byte]
                    print(f'encoding is {encoding}')

                    # Now read & decode the data. NOTE: We've already read 1 byte of the frame
                    description_and_url = mp3_file.read(frame_size - 1)

                    # Split on 00 byte to separate description and url link
                    parts = description_and_url.split(b'\x00')
                    description = parts[0].decode(encoding)
                    url = parts[-1].decode('iso-8859-1')
                    print(f'{frame_types[frame_id]}:')
                    print(f'\tDescription: {description}')
                    print(f'\tURL: {url}')

                elif frame_id == b'APIC':
                    frame_data_start = mp3_file.tell()
                    print(f'APIC frame starts at {frame_data_start}')

                    # Get the encoding byte
                    encoding_byte = mp3_file.read(1)[0]
                    encoding = id3_field_encodings[encoding_byte]
                    print(f'APIC text encoding: {encoding}')

                    # Next we read in null-terminated string (MIME type)
                    mime_type = read_c_string(mp3_file, 'iso-8859-1')
                    
                    if mime_type == '':
                        mime_type = 'image/'
                    
                    print(f'MIME Type: {mime_type}')

                    # Read 1-byte picture type and get its 'human' name
                    picture_type = int.from_bytes(mp3_file.read(1), 'big')
                    apic_picture_name = apic_picture_types[picture_type]
                    print(f'Found {apic_picture_name} image')

                    # Description is also a null-terminated string
                    description = read_c_string(mp3_file, encoding)
                    print(f'Image Description: {description}')

                    # Now write the image to a new file
                    if mime_type.startswith('image/'):
                        image_data_start = mp3_file.tell()
                        print(f'Image data starts at {image_data_start}')
                        image_size = frame_size - (image_data_start - frame_data_start)
                        print(f'Image Size = {image_size}')
                        image_data = mp3_file.read(image_size)
                        
                        # Save image file in working folder
                        image_type = mime_type.split('/')[-1]
                        
                        # Create filename from picture name
                        base_filename = path.split(filename)[1]
                        base_filename = path.splitext(base_filename)[0] # remove extension
                        picture_filename = f'{base_filename}_{apic_picture_name}.{image_type}'
                        print(f'Writing image file {picture_filename}...')
                        
                        # Open picture file in binary writing mode
                        with open(picture_filename, 'wb') as output_file:
                            output_file.write(image_data)

                else:
                    # Any frame that we're not going to process is skipped by seeking `frame_size` bytes
                    mp3_file.seek(frame_size, SEEK_CUR)
                    
                    # Skip any zero bytes by reading them in
                    next_byte = mp3_file.read(1)
                    
                    # Check for empty bytearray, to avoid trying to read past EOF (end of file)
                    while next_byte and next_byte == b'\0':
                        next_byte = mp3_file.read(1)
                        
                    # If byte is non-zero, it will be part of the next frame. Or it will be empty
                    # Move file pointer back 1 byte, to read it again next time round.
                    if next_byte != b'':
                        mp3_file.seek(-1, SEEK_CUR)
                print(f'seek position after frame: {mp3_file.tell()}')
            else:
                # Found unrecognised frame (or we've exhausted all frames)
                break
                


Flags: 0b00000000
Tag size: 99966 bytes
********************************************************************************
Current file position: 10
Found frame type: b'TIT2'
Frame size: 8
encoding is iso-8859-1
Title/songname/content description: Someday
seek position after frame: 28
********************************************************************************
Current file position: 28
Found frame type: b'TPE1'
Frame size: 12
encoding is iso-8859-1
Lead performer(s)/Soloist(s): Gary Linley
seek position after frame: 50
********************************************************************************
Current file position: 50
Found frame type: b'TPE2'
Frame size: 12
encoding is iso-8859-1
Band/orchestra/accompaniment: Gary Linley
seek position after frame: 72
********************************************************************************
Current file position: 72
Found frame type: b'TALB'
Frame size: 22
encoding is iso-8859-1
Album/Movie/Show title: Making Up The Numbers
seek position

**Is this what you expected? This is why using plenty of print statements in your code is so useful. From the start, we know that the tag size is 99966 bytes. The flags byte is `0b00000000`, which means none of the header flags are set, so there is no extended header to skip.**
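
**The flags byte can be read bit by bit. The sketch below assumes the ID3v2.3 header layout, where the three highest bits mean unsynchronisation, extended header and experimental indicator. The function name `describe_header_flags` is made up for this example.**

```python
def describe_header_flags(flags: int) -> dict:
    """Name the three flag bits defined for an ID3v2.3 tag header (illustrative)."""
    return {
        'unsynchronisation': bool(flags & 0b10000000),  # bit 7
        'extended_header': bool(flags & 0b01000000),    # bit 6, the bit tested in the main loop
        'experimental': bool(flags & 0b00100000),       # bit 5
    }


# The value printed for Someday.mp3: no flags set
print(describe_header_flags(0b00000000))

# A made-up value with only the extended-header bit set
print(describe_header_flags(0b01000000))
```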

**The first frame starts at position 10, which makes sense because the tag header is 10 bytes long. It is a `TIT2` frame, which holds the song name as text. The next text frame, `TPE1`, starts at position 28 (10 + a 10-byte frame header + 8 bytes of frame data) and holds the lead performer.**
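
**The frame positions follow a simple rule: each frame starts where the previous one ended, after a 10-byte frame header plus the frame's data. A small sketch using the frame sizes printed above (the helper name `next_frame_position` is made up for illustration):**

```python
FRAME_HEADER_SIZE = 10  # every ID3v2.3 frame header is 10 bytes


def next_frame_position(position: int, frame_size: int) -> int:
    """Return where the following frame starts, given this frame's start and data size."""
    return position + FRAME_HEADER_SIZE + frame_size


# Frame IDs and sizes copied from the printed output; the first frame starts at 10
position = 10
for frame_id, frame_size in [('TIT2', 8), ('TPE1', 12), ('TPE2', 12), ('TALB', 22)]:
    print(f'{frame_id} starts at {position}')
    position = next_frame_position(position, frame_size)

# By the same arithmetic, the frame after TALB starts here
print(f'Next frame starts at {position}')
```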

**The code goes through the subsequent text frames, then at position 160 we get the first `WXXX` frame, which holds an Amazon URL link.**

**Directly after that, at position 241, we get an `APIC` frame which contains the front cover image for the song. Since the code wrote the image data to a new image file in the working folder, you can open the JPEG to see the ID3 image, which is the front cover (a big arrow pointing down).**
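
**The layout of an `APIC` frame body can also be explored in memory, without a file. The sketch below is a simplified illustration: it only handles an ISO-8859-1 description (encoding byte 0), and the function name `parse_apic_body` and the sample bytes are made up for this example.**

```python
def parse_apic_body(body: bytes) -> dict:
    """Split an APIC frame body into its fields (sketch: ISO-8859-1 descriptions only)."""
    if body[0] != 0:
        raise ValueError('this sketch only handles ISO-8859-1 descriptions')

    # MIME type: null-terminated, always ISO-8859-1, starts right after the encoding byte
    mime_end = body.index(b'\x00', 1)
    mime_type = body[1:mime_end].decode('iso-8859-1')

    # One byte of picture type follows the MIME terminator
    picture_type = body[mime_end + 1]

    # Description: null-terminated string in the frame's text encoding
    desc_start = mime_end + 2
    desc_end = body.index(b'\x00', desc_start)
    description = body[desc_start:desc_end].decode('iso-8859-1')

    # Everything after the description terminator is the image itself
    return {'mime_type': mime_type, 'picture_type': picture_type,
            'description': description, 'image_data': body[desc_end + 1:]}


# Made-up frame body: encoding 0, MIME type, picture type 3, description, 3 bytes of "image"
sample_body = (b'\x00' + b'image/jpeg' + b'\x00' + bytes([0x03])
               + b'cover' + b'\x00' + b'\xff\xd8\xff')
print(parse_apic_body(sample_body))
```

**Picture type 3 corresponds to `Front_cover` in the `apic_picture_types` dictionary.**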

**The ID3 tag ends on a text `TYER` frame, for the release year of the song (2019).**