## Bytes and Unicode Strings

Before using **pyserial** and communicating with external hardware over the serial interface, it is important to understand the difference between _bytes_ and _unicode strings_ in Python.

The distinction between bytes and unicode strings is important because strings in Python are _unicode_ by default. However, external hardware like Arduinos, oscilloscopes, and voltmeters transmit characters as _bytes_.

### Unicode Strings

In Python, the syntax to define a new string is:

In [1]:
ustring = 'A unicode string'

We can determine the data type of the ```ustring``` variable using the ```type()``` function:

In [2]:
print(type(ustring))

<class 'str'>


When the Python interpreter reports that the variable ```ustring``` is of ```<class 'str'>```, it indicates that ```ustring``` is a _unicode string_.

**In Python 3 all strings are _unicode strings_ by default**.

_Unicode strings_ are useful because many letters and letter-like characters are not part of the set of letters, numbers and symbols on a regular computer keyboard. For example, in Spanish, an accent mark is placed over certain vowels. Letters with accents can't be typed directly with the keys on a standard English keyboard. However, letters with accents are part of the set of letters, numbers and symbols that _unicode strings_ can represent.
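
As a small illustration of the point above, the sketch below builds a string that contains an accented letter. The word and the variable names are chosen only for this example. The escape ```\u00f3``` is simply another way to type the accented letter ó on a keyboard that lacks it.

```python
# A unicode string containing an accented letter.
# \u00f3 is the unicode code point for the letter o with an acute accent.
word = 'canci\u00f3n'

# len() counts characters, so the accented letter counts as one character
num_chars = len(word)

# The accented letter is stored in the string like any other character
accented_letter = word[5]

print(type(word), num_chars, hex(ord(accented_letter)))
```

The variable is still of ```<class 'str'>```, and the accented letter is a single character within it.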

### Byte Strings

Another way that characters such as letters, numbers and punctuation can be stored is as _bytes_. A _byte_ is a unit of computer information that has a fixed width of eight bits. Because of this fixed width, one _byte_ only has a small number of unique combinations (256). This limits the characters that can be typed directly in a _byte string_ to basically only the letters, numbers and punctuation marks on a computer keyboard. This limited set of characters is called the ASCII (pronounced _ask-ee_) character set. A table of ASCII character codes is in the appendix. For instance, ASCII code ```49``` corresponds to the number one ```1```.
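
The ASCII code mentioned above can be checked from Python with the built-in ```ord()``` and ```chr()``` functions. This is a minimal sketch, and the variable names are chosen only for this example.

```python
# ord() gives the numeric code of a character, chr() goes the other way
code = ord('1')
char = chr(49)

# Indexing a byte string returns the numeric value of that byte
first_byte = b'1'[0]

# Each byte holds 8 bits, so one byte has 2**8 possible values
values_per_byte = 2 ** 8

print(code, char, first_byte, values_per_byte)
```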

**Python does not use _byte strings_ by default**.

However, external hardware such as Arduinos, oscilloscopes, and voltmeters speak _byte strings_ by default. In fact, almost all machines speak _byte strings_ by default, including the servers that bring Netflix to your laptop. 

To define a _byte string_ in Python, a letter ```b``` is placed before the quotation marks when a string is created. 

In [5]:
bstring = b'bstring'

We can view the type of our ```bstring``` variable using the ```type()``` function.

In [6]:
print(type(bstring))

<class 'bytes'>


### Convert between unicode strings and byte strings

In order for a Python program to communicate with external hardware, we need to be able to convert between _unicode strings_ and _byte strings_. This conversion is done with the ```.encode()``` and ```.decode()``` methods. 

The ```.encode()``` method "encodes" a unicode string into a byte string.

```<byte string> = <unicode string>.encode()```

The ```.decode()``` method "decodes" a byte string into a unicode string.

```<unicode string> = <byte string>.decode()```

**Remember: machines speak bytes, Python strings are unicode by default.** 

We need to decode what machines transmit to Python before further processing. Python defaults to unicode (and machines do not), so within our Python code we need to _encode_ our unicode strings so machines can understand them.

In [3]:
ustring = 'A unicode string'
new_bstring = ustring.encode()
type(new_bstring)

bytes

In [4]:
bstring = b'bstring'
new_ustring = bstring.decode()
type(new_ustring)

str
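
When called without arguments, ```.encode()``` and ```.decode()``` use the UTF-8 encoding. An encoding can also be named explicitly. The sketch below assumes a piece of hardware that only understands ASCII; the command text ```'temp?'``` is hypothetical and does not belong to any real instrument.

```python
# Name the encoding explicitly instead of relying on the UTF-8 default
command = 'temp?'                      # hypothetical instrument command
as_bytes = command.encode('ascii')     # unicode string -> byte string
round_trip = as_bytes.decode('ascii')  # byte string -> unicode string

# A character outside the ASCII set cannot be encoded as ASCII
try:
    'canci\u00f3n'.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(as_bytes, round_trip, ascii_ok)
```

Naming the encoding makes the assumption about the hardware visible in the code, and an unexpected character raises an error instead of being sent silently.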

When a command from a Python program (a unicode string) is sent to a piece of external hardware (that reads bytes):

The ```.encode()``` method needs to be applied to the unicode string (to convert the unicode string to a byte string) before the command is sent to the piece of external hardware.

When a chunk of data comes in from a piece of external hardware (a byte string) and is read by a Python script (which speaks _unicode_ by default):

The ```.decode()``` method needs to be applied to the byte string (to convert the byte string to a unicode string) before it is processed further by the Python program.
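
The two directions described above can be sketched without any hardware attached. The ```FakeDevice``` class below is a hypothetical stand-in written for this example, not part of **pyserial**. Its command ```'MEAS?'``` and its reply are invented. It only mimics the idea that a device accepts bytes and returns bytes.

```python
class FakeDevice:
    """Hypothetical stand-in for external hardware: bytes in, bytes out."""

    def __init__(self):
        self._reply = b''

    def write(self, data):
        # Like real hardware, this stand-in refuses anything that is not bytes
        if not isinstance(data, bytes):
            raise TypeError('the device only accepts bytes')
        # Pretend the hardware answers every command with a voltage reading
        self._reply = b'3.30\r\n'
        return len(data)

    def readline(self):
        return self._reply


device = FakeDevice()

# Outgoing: encode the unicode command into a byte string before sending
device.write('MEAS?\n'.encode())

# Incoming: decode the byte string into a unicode string, then process it
raw = device.readline()
text = raw.decode().strip()
voltage = float(text)

print(type(raw), type(text), voltage)
```

Passing the unicode string ```'MEAS?\n'``` to ```write()``` without ```.encode()``` raises a ```TypeError``` in this sketch, which is the kind of mistake the encode step prevents.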