# Chapter 4. Unicode Text Versus Bytes
---

## ToC

1. [Handling Text Files](#handling-text-files)
---

## Handling Text Files

The best practice for handling text I/O is the “Unicode sandwich”:

![Figure 64](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/64.PNG)

This means that `bytes` should be decoded to `str` as early as possible on input (e.g., when opening a file for reading). The “filling” of the sandwich is the business logic of your program, where text handling is done exclusively on `str` objects. You should never encode or decode in the middle of other processing. On output, `str` objects are encoded to `bytes` as late as possible. Most web frameworks work like that, and we rarely touch `bytes` when using them. In Django, for example, your views should output Unicode `str`; Django itself takes care of encoding the response to `bytes`, using UTF-8 by default.
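The three layers of the sandwich can be sketched in a few lines. This is only an illustration: the function name `shout` and the choice of uppercasing as the "business logic" are hypothetical, not taken from any framework.

```python
def shout(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical handler that follows the Unicode sandwich."""
    text = raw.decode(encoding)      # top slice: decode bytes to str early
    result = text.upper()            # filling: logic works on str only
    return result.encode(encoding)   # bottom slice: encode str to bytes late


incoming = "caf\u00e9".encode("utf-8")   # 'café' as UTF-8 bytes
outgoing = shout(incoming)
print(outgoing)  # b'CAF\xc3\x89'
```

Because decoding and encoding happen only at the edges, the middle of the function never needs to know which encoding the bytes used.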


Python 3 makes it easier to follow the advice of the Unicode sandwich, because the `open()` built-in does the necessary decoding when reading and encoding when writing files in text mode, so all you get from `my_file.read()` and pass to `my_file.write(text)` are `str` objects.
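As a minimal sketch of this behavior, the round trip below writes and reads a file in text mode with an explicit encoding. The temporary directory and the variable names are assumptions made for the illustration.

```python
import os
import tempfile

# Hypothetical scratch location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "cafe.txt")

# Text mode: write() accepts str and returns the number of characters written.
with open(path, "w", encoding="utf-8") as fp:
    written = fp.write("caf\u00e9")   # 'café'

# Text mode: read() returns str, already decoded.
with open(path, encoding="utf-8") as fp:
    text = fp.read()

size = os.path.getsize(path)   # size on disk, in bytes
print(written, size, type(text).__name__)  # 4 5 str
```

The file holds five bytes for four characters because `é` takes two bytes in UTF-8.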

**Example:** A platform encoding issue

In [6]:
open('cafe.txt', 'w', encoding='utf_8').write('café')

4

In [7]:
open('cafe.txt').read()

'cafÃ©'

**The bug:** I specified UTF-8 encoding when writing the file but failed to do so when
reading it, so Python assumed the Windows default file encoding (code page 1252), and
the trailing bytes in the file were decoded as the characters `Ã©` instead of `é`.
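One way to check which encoding `open()` falls back to on a given machine is to inspect the `encoding` attribute of the file object it returns. The sketch below assumes a scratch file in a temporary directory; the names `implicit` and `explicit` are hypothetical, and the first printed value depends on the platform and Python configuration.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "cafe.txt")

# No encoding argument: Python picks a platform-dependent default.
with open(path, "w") as fp:
    implicit = fp.encoding

# Explicit encoding: the same on every machine.
with open(path, "w", encoding="utf-8") as fp:
    explicit = fp.encoding

print(implicit)  # platform-dependent, e.g. 'cp1252' on the Windows setup above
print(explicit)  # utf-8
```

Passing `encoding` explicitly on both the write and the read removes the dependency on the platform default.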

In [45]:
fp = open('cafe.txt', 'w', encoding='utf_8')
fp

<_io.TextIOWrapper name='cafe.txt' mode='w' encoding='utf_8'>

In [46]:
fp.encoding

'utf_8'

In [47]:
fp.write('café')

4

In [48]:
fp.close()

In [49]:
import os
os.stat('cafe.txt').st_size

5

In [50]:
fp2 = open('cafe.txt')

In [51]:
fp2

<_io.TextIOWrapper name='cafe.txt' mode='r' encoding='cp1252'>

In [52]:
fp2.encoding

'cp1252'

In [53]:
fp2.read()

'cafÃ©'

In [54]:
fp3 = open('cafe.txt', encoding='utf_8')
fp3

<_io.TextIOWrapper name='cafe.txt' mode='r' encoding='utf_8'>

In [55]:
fp3.read()

'café'

In [56]:
fp4 = open('cafe.txt', 'rb')
fp4

<_io.BufferedReader name='cafe.txt'>

In [57]:
fp4.read()

b'caf\xc3\xa9'
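The same five bytes explain both results seen earlier. As a small sketch that runs on any platform (the variable names are chosen for the illustration), decoding them with each codec gives:

```python
raw = b"caf\xc3\xa9"  # the five bytes stored in cafe.txt

as_utf8 = raw.decode("utf_8")     # \xc3\xa9 together form one character
as_cp1252 = raw.decode("cp1252")  # each byte becomes its own character

print(as_utf8, len(as_utf8))      # café 4
print(as_cp1252, len(as_cp1252))  # cafÃ© 5
```

In code page 1252, byte `0xC3` is `Ã` and byte `0xA9` is `©`, which is why the mis-decoded string ends in `Ã©`.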

![Figure 65](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/65.PNG)

![Figure 66](https://raw.githubusercontent.com/berserkhmdvhb/Training-Python/main/figures/Part_I/66.PNG)