## Summary

This notebook contains notes and examples on topics in Python that I find confusing.

### strings, bytes, bytearrays

I have found this topic to be very confusing.  There are a number of changes from Python 2 to 3, which only add to the confusion.

As its name suggests, a ```bytes``` object is an immutable sequence of bytes (integers from 0 to 255).  ```bytearray``` is the mutable version of ```bytes```.  Both can be created in a few ways:

In [91]:
a = bytes([1, 2, ord('a'), 255])
b = b'\x01\x02\x61\xff'
c = bytearray('test'.encode('utf-8'))
c.append(3)
a, b, c

(b'\x01\x02a\xff', b'\x01\x02a\xff', bytearray(b'test\x03'))
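
A minimal sketch of the mutability difference described above. The variable names (```frozen```, ```mutable```) are illustrative and not part of the original cells. Indexing either type gives an ```int```, slicing gives back the same type, and item assignment only works on ```bytearray```.

```python
frozen = bytes([1, 2, ord('a'), 255])
mutable = bytearray(frozen)

# Indexing returns an int; slicing returns an object of the same type
first_item = frozen[0]
first_slice = frozen[0:1]

# In-place modification is allowed on a bytearray
mutable[0] = ord('z')

# The same assignment on a bytes object raises TypeError
try:
    frozen[0] = ord('z')
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(first_item, first_slice, mutable, assignment_failed)
```

This should print ```1 b'\x01' bytearray(b'z\x02a\xff') True```.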

Let's look at the difference between the methods available on ```bytes``` and ```bytearray```.  Since ```bytearray``` is the mutable version of ```bytes```, we would expect its methods to be a superset of those of ```bytes```.  As it turns out, this is almost the case: the additional methods are for modifying the sequence in place.  The one exception, ```__getnewargs__```, is part of the pickle protocol; it returns the arguments passed to ```__new__``` when the object is unpickled.

In [92]:
ba = bytearray(a)

# Compare the attribute names of a bytes object and a bytearray object
f_ba = set(dir(ba))
f_a = set(dir(a))
print('functions in bytes but not in bytearray:\n\t', sorted(f_a.difference(f_ba)), '\n')
print('functions in bytearray but not in bytes:\n\t', sorted(f_ba.difference(f_a)), '\n')
print('functions common to both bytes and bytearray:\n\t', sorted(f_a.intersection(f_ba)), '\n')

functions in bytes but not in bytearray:
	 ['__getnewargs__'] 

functions in bytearray but not in bytes:
	 ['__alloc__', '__delitem__', '__iadd__', '__imul__', '__setitem__', 'append', 'clear', 'copy', 'extend', 'insert', 'pop', 'remove', 'reverse'] 

functions common to both bytes and bytearray:
	 ['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'center', 'count', 'decode', 'endswith', 'expandtabs', 'find', 'fromhex', 'hex', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpart

### strings vs bytes/bytearrays

This [link](https://stackoverflow.com/questions/6224052/what-is-the-difference-between-a-string-and-a-byte-string) clearly explains the difference between a string and a byte string.  While many programmers cut their teeth in languages where strings were ASCII and thus each character was a byte, in Python 3 strings are defined to be sequences of Unicode characters, and there can be multiple ways ("encodings") to convert a sequence of characters to a sequence of bytes.  Note that the length of a string is its number of characters, which might not be the same as the number of bytes (see below).

To convert a string to bytes, use ```encode```.  The default encoding is ```utf-8```.  The built-in encodings are listed in the "Standard Encodings" section of the [codecs](https://docs.python.org/3/library/codecs.html) documentation.  Transformations that are not text encodings, such as ```base64```, are available through the functions in the ```codecs``` module.

To convert a ```bytes``` or ```bytearray``` object to a string, use ```decode```.

In [93]:
s='τoρνoς'

# encode() with no argument uses utf-8
b = s.encode()
ba = bytearray(s.encode())
print('string length is not the same as the number of bytes:', len(s), len(b), len(ba))

s_decode = ba.decode()
print('decoded version from bytes:', s_decode)

string length is not the same as the number of bytes: 6 10 10
decoded version from bytes: τoρνoς
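
To make the "multiple encodings" point concrete, here is a small sketch that encodes the same string with a few standard encodings and compares the byte counts. The choice of encodings and the names ```sizes```, ```ascii_ok``` and ```round_trip``` are mine, for illustration only.

```python
s = 'τoρνoς'  # four Greek letters plus two Latin 'o' characters

# Same six characters, different number of bytes per encoding
sizes = {name: len(s.encode(name)) for name in ('utf-8', 'utf-16-le', 'utf-32-le')}
print(len(s), sizes)

# An encoding that cannot represent the characters raises UnicodeEncodeError
try:
    s.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Decoding with the same encoding used to encode recovers the string
round_trip = s.encode('utf-16-le').decode('utf-16-le')
print(ascii_ok, round_trip == s)
```

The string length stays at 6 in every case; only the byte count changes with the encoding. Decoding with a different encoding than the one used to encode will generally give garbage or an error.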


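This is an example using a codec.  Use ```codecs.encode``` to convert bytes to other bytes, since ```bytes``` objects have no ```encode``` method of their own.  The matching ```codecs.decode``` function reverses the transformation: for ```base64``` it returns the original bytes, and for a text encoding such as ```utf-8``` it turns bytes into a string.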

Here's a [thread](https://stackoverflow.com/questions/447107/what-is-the-difference-between-encode-decode) on Stack Overflow that explains encode/decode for strings and bytes.

In [99]:
import codecs
temp = bytes([1, 2, 3, 4, 5])

temp_b64 = codecs.encode(temp, 'base64')
print(temp, temp_b64)

b'\x01\x02\x03\x04\x05' b'AQIDBAU=\n'
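
As a rough sketch of the reverse direction, ```codecs.decode``` with the same codec name undoes the ```base64``` step. The names ```restored``` and ```text``` are illustrative; the comparison with the ```base64``` module is only there to show that the trailing newline comes from the codec.

```python
import base64
import codecs

temp = bytes([1, 2, 3, 4, 5])

# bytes -> bytes: encode, then decode with the same codec
temp_b64 = codecs.encode(temp, 'base64')
restored = codecs.decode(temp_b64, 'base64')
print(temp_b64, restored == temp)

# base64.b64encode produces the same data without the trailing newline
print(base64.b64encode(temp))

# bytes -> str: codecs.decode with a text encoding returns a string
text = codecs.decode(b'\xcf\x84', 'utf-8')
print(type(text).__name__, len(text))
```

So ```codecs.decode``` is not only for strings: what it returns depends on the codec, bytes for ```base64``` and ```str``` for a text encoding.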
