Example 4-8. A platform encoding issue (if you try this on your machine,
you may or may not see the problem)

In [8]:
open('cafe.txt', 'w', encoding='utf_8').write('café')

4

In [9]:
open('cafe.txt').read()

'café'
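
Whether the problem shows up depends on the default encoding of the machine, so it can help to reproduce it deterministically. The following is a minimal sketch that skips the file and decodes the same bytes by hand. It assumes cp1252 as the platform default, which is only one possibility; the variable names are illustrative.

```python
# Bytes that writing 'café' with encoding='utf_8' puts on disk.
data = 'café'.encode('utf_8')

# Decoding with the same codec used for writing recovers the text.
as_utf8 = data.decode('utf_8')

# Decoding with cp1252 (assumed here as the platform default)
# turns the two bytes of 'é' into two separate characters.
as_cp1252 = data.decode('cp1252')

print(len(data))   # 5 bytes for 4 characters
print(as_utf8)     # café
print(as_cp1252)   # cafÃ©
```

The last line is the kind of garbled result a reader with a cp1252 default would see in place of the output above.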

Example 4-9. Closer inspection of Example 4-8 running on Windows
reveals the bug and how to fix it

In [18]:
fp = open('cafe.txt', 'w', encoding='utf_8')
fp

<_io.TextIOWrapper name='cafe.txt' mode='w' encoding='utf_8'>

In [19]:
fp.write('café')

4

In [20]:
fp.close()

In [21]:
import os
os.stat('cafe.txt').st_size

5

In [22]:
fp2 = open('cafe.txt')
fp2

<_io.TextIOWrapper name='cafe.txt' mode='r' encoding='UTF-8'>

In [23]:
fp3 = open('cafe.txt', 'rb')
fp3

<_io.BufferedReader name='cafe.txt'>

In [24]:
fp3.read()

b'caf\xc3\xa9'
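
The inspection above suggests the remedy: pass the same `encoding` argument when reading that was used when writing, so the result no longer depends on the platform default. Here is a small sketch of that idea. It writes to a temporary directory rather than to cafe.txt, and the variable names are my own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cafe.txt')

    # write() returns the number of characters written, not bytes.
    with open(path, 'w', encoding='utf_8') as fp:
        chars_written = fp.write('café')

    # The file is one byte longer because 'é' takes two bytes in UTF-8.
    size_in_bytes = os.stat(path).st_size

    # An explicit encoding on the reading side avoids the default.
    with open(path, encoding='utf_8') as fp:
        text = fp.read()

print(chars_written, size_in_bytes, repr(text))  # 4 5 'café'
```

Using `with` blocks also closes each file promptly, which the interactive session above does by hand with `fp.close()`.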

Beware of Encoding Defaults
Several settings affect the encoding defaults for I/O in Python. See the
default_encodings.py script in Example 4-10.

Example 4-10. Exploring encoding defaults

In [25]:
import locale
import sys

expressions = """
        locale.getpreferredencoding()
        type(my_file)
        my_file.encoding
        sys.stdout.isatty()
        sys.stdout.encoding
        sys.stdin.isatty()
        sys.stdin.encoding
        sys.stderr.isatty()
        sys.stderr.encoding
        sys.getdefaultencoding()
        sys.getfilesystemencoding()
    """

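# Open a text file without an explicit encoding to expose the default.
my_file = open('dummy', 'w')

# Evaluate each expression and show it next to its value.
for expression in expressions.split():
    value = eval(expression)
    print(f'{expression:>30} -> {value!r}')

my_file.close()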
    
 locale.getpreferredencoding() -> 'UTF-8'
                 type(my_file) -> <class '_io.TextIOWrapper'>
              my_file.encoding -> 'UTF-8'
           sys.stdout.isatty() -> False
           sys.stdout.encoding -> 'UTF-8'
            sys.stdin.isatty() -> False
            sys.stdin.encoding -> 'utf-8'
           sys.stderr.isatty() -> False
           sys.stderr.encoding -> 'UTF-8'
      sys.getdefaultencoding() -> 'utf-8'
   sys.getfilesystemencoding() -> 'utf-8'
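
To check only the default that `open()` picks for text files on a given machine, a shorter probe is enough. The sketch below is an illustration, not part of default_encodings.py. The helper name `default_text_encoding` is hypothetical, and `codecs.lookup` is used only to normalize spellings such as 'UTF-8' and 'utf-8' so the two printed values are easier to compare.

```python
import codecs
import locale
import os
import tempfile


def default_text_encoding():
    """Return the normalized codec name open() uses when encoding is omitted."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        # No encoding argument: Python falls back to its default.
        with open(path, 'w') as fp:
            detected = fp.encoding
    finally:
        os.remove(path)
    return codecs.lookup(detected).name


print('open() default:  ', default_text_encoding())
print('locale preferred:',
      codecs.lookup(locale.getpreferredencoding(False)).name)
```

No expected output is shown because the result depends on the operating system, the locale, and the Python version and settings in use.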


Unicode support in Windows itself, and in Python for Windows, has improved
since I wrote the first edition. The script in Example 4-10 used to report
four different encodings in Python 3.4 on Windows 7.

Because the encoding of sys.stdout can differ between an interactive console
and redirected output, a script like Example 4-12 works when printing to the
console, but may break when output is redirected to a file.

Example 4-12. stdout_check.py

In [27]:
import sys
from unicodedata import name

print(sys.version)
print()
print('sys.stdout.isatty():', sys.stdout.isatty())
print('sys.stdout.encoding:', sys.stdout.encoding)
print()

test_chars = [
    '\N{HORIZONTAL ELLIPSIS}',       # exists in cp1252, not in cp437
    '\N{INFINITY}',                  # exists in cp437, not in cp1252
    '\N{CIRCLED NUMBER FORTY TWO}',  # not in cp437 or in cp1252
]

for char in test_chars:
    print(f'Trying to output {name(char)}:')
    print(char)

3.10.2 (main, Jan 15 2022, 18:03:19) [GCC 7.5.0]

sys.stdout.isatty(): False
sys.stdout.encoding: UTF-8

Trying to output HORIZONTAL ELLIPSIS:
…
Trying to output INFINITY:
∞
Trying to output CIRCLED NUMBER FORTY TWO:
㊷
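
The run above succeeds because stdout is using UTF-8. To see which characters would fail under other encodings without changing the console or redirecting output, each one can be encoded by hand. This is a sketch under stated assumptions: cp1252 and cp437 are sample legacy encodings chosen for illustration, and `can_encode` is a hypothetical helper, not part of stdout_check.py.

```python
from unicodedata import name

test_chars = [
    '\N{HORIZONTAL ELLIPSIS}',
    '\N{INFINITY}',
    '\N{CIRCLED NUMBER FORTY TWO}',
]

# Sample encodings: UTF-8 plus two legacy code pages (an assumption).
encodings = ['utf_8', 'cp1252', 'cp437']


def can_encode(char, encoding):
    """Return True if char can be represented in the given encoding."""
    try:
        char.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Map each character name to the encodings that can represent it.
report = {
    name(char): [enc for enc in encodings if can_encode(char, enc)]
    for char in test_chars
}

for char_name, supported in report.items():
    print(f'{char_name:>26} -> {supported}')
```

A character missing from an encoding's list is one that `print()` would reject with `UnicodeEncodeError` if stdout used that encoding with the default strict error handling.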
