There are two types that represent sequences of characters:

[Python3]
- bytes : raw 8-bit values
- str : Unicode characters

[Python2]
- str : raw 8-bit values
- unicode : Unicode characters

To convert Unicode characters to binary data, use the encode() method.
To convert binary data to Unicode characters, use the decode() method.
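A minimal round trip in Python 3, assuming UTF-8 as the encoding (any encoding works as long as encode() and decode() agree on it). The non-ASCII character shows that one str character can become several bytes:

```python
# A str holds Unicode code points; encode() turns them into bytes.
text = 'café'
data = text.encode('utf-8')
print(type(data), data)          # <class 'bytes'> b'caf\xc3\xa9'

# decode() reverses the conversion, given the same encoding.
restored = data.decode('utf-8')
print(type(restored), restored)  # <class 'str'> café

# 'é' is one character but two bytes in UTF-8.
print(len(text), len(data))      # 4 5
```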


In [1]:
import logging
from pprint import pprint
from sys import stdout as STDOUT

def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of str

print(repr(to_str(b'foo')))
print(repr(to_str('foo')))


'foo'
'foo'


In [None]:
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value  # Instance of bytes

print(repr(to_bytes(b'foo')))
print(repr(to_bytes('foo')))


The core of your program should use the Unicode character type (str in Python 3, unicode in Python 2)
and should not assume anything about character encodings.
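One way to follow this rule is to decode bytes once at the input boundary, work only with str in the core, and encode once at the output boundary. The sketch below assumes UTF-8 at both boundaries; the function names read_input, process, and write_output are hypothetical, chosen only for illustration:

```python
def read_input(raw: bytes) -> str:
    """Hypothetical input boundary: decode incoming bytes once, as early as possible."""
    return raw.decode('utf-8')  # assumption: the input is UTF-8


def process(text: str) -> str:
    """Hypothetical core logic: works only with str and never sees an encoding."""
    return text.upper()


def write_output(text: str) -> bytes:
    """Hypothetical output boundary: encode outgoing text once, as late as possible."""
    return text.encode('utf-8')  # assumption: the output should be UTF-8


payload = b'hello, caf\xc3\xa9'
result = write_output(process(read_input(payload)))
print(result)  # b'HELLO, CAF\xc3\x89'
```

Because process() only ever receives str, switching the external encoding means changing the two boundary functions and nothing else.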

[Issue1]
In Python 2, unicode and str instances appear to be the same type when the str contains only 7-bit ASCII characters.
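Python 3 removes this ambiguity: bytes and str are distinct even when the content is pure ASCII. The sketch below runs under Python 3 only and shows that the two types never compare equal and cannot be mixed in an operation:

```python
# Even for pure-ASCII content, bytes and str never compare equal in Python 3.
print(b'foo' == 'foo')   # False

# Mixing the two types in an operation raises TypeError instead of coercing.
try:
    b'foo' + 'bar'
except TypeError as exc:
    print('TypeError:', exc)

# Operations between values of the same type work as expected.
print(b'foo' + b'bar', 'foo' + 'bar')  # b'foobar' foobar
```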

[Issue2]
In Python 3, file handles returned by the built-in open function default to text mode: they read and write str
and encode it with the platform's preferred encoding, which is often, but not always, UTF-8. In Python 2, file operations default to raw 8-bit str values.
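Because the text-mode default depends on the platform, it is safer to pass encoding= explicitly. A minimal sketch, assuming UTF-8 as the desired file encoding; the file name notes.txt is hypothetical:

```python
import locale

# Text mode falls back to this encoding when encoding= is omitted (platform-dependent).
print(locale.getpreferredencoding(False))

# Passing encoding= explicitly makes the on-disk format independent of the platform.
with open('notes.txt', 'w', encoding='utf-8') as f:
    f.write('café')            # text mode: write() takes str

with open('notes.txt', 'rb') as f:
    raw = f.read()             # binary mode: read() returns bytes
print(raw)                     # b'caf\xc3\xa9'

with open('notes.txt', 'r', encoding='utf-8') as f:
    text = f.read()            # decoded back to str with the same encoding
print(repr(text))              # 'café'
```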

In [3]:
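import os
try:
    with open('random.bin', 'w') as f:  # text mode expects str, but os.urandom returns bytes
        f.write(os.urandom(10))
except TypeError:
    logging.exception('Expected')
else:
    assert False

# Binary mode ('wb') accepts bytes directly.
with open('random.bin', 'wb') as f:
    f.write(os.urandom(10))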


ERROR:root:Expected
Traceback (most recent call last):
  File "<ipython-input-3-66358035aff3>", line 4, in <module>
    f.write(os.urandom(10))
TypeError: write() argument must be str, not bytes
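Reading the file back follows the same rule: binary data needs 'rb'. Opening it with 'r' would make Python try to decode the random bytes as text, which usually fails with UnicodeDecodeError under UTF-8. A minimal round-trip sketch, reusing the random.bin file name from the cell above:

```python
import os

payload = os.urandom(10)

with open('random.bin', 'wb') as f:   # 'wb': write bytes
    f.write(payload)

with open('random.bin', 'rb') as f:   # 'rb': read the bytes back unchanged
    data = f.read()

print(type(data), len(data))          # <class 'bytes'> 10
assert data == payload
```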
