## Intro (~19:20 (5 min))


* If you can't keep up with the course pace, then SDT is probably not for you
    * Question: how many of you have some experience with Python?
* Self-learning
    * Idea: after each lecture, you should learn both the lecture material and the self-learning topics
    * Motivation: don't expect to be spoon-fed; self-learning is a big part of your career
    * Note: some topics are marked as **bonus topic** -- learn them if you feel like it

### Table of contents

* Exceptions
* IO
    * Files
    * Bytes
* Namespaces
* JSON



# Exceptions (~19:35 (15 min))

Things often go wrong in programs. If you do nothing, they crash.

In [1]:
def parse_tskv(tskv: str) -> dict[str, int]:
    """Parse tskv string"""
    kvpairs = (keyvalue.split('=') for keyvalue in tskv.strip().split('\t'))
    return {k: int(v) for k, v in kvpairs}

log = [
    'banner_id=1\tshows=10\tclicks=1',
    'banner_id=2\tshows=15\tclicks=2',
    'banner_id=3\tshows=\tclicks=1',  # empty shows
]

for row in log:
    print(parse_tskv(row))

{'banner_id': 1, 'shows': 10, 'clicks': 1}
{'banner_id': 2, 'shows': 15, 'clicks': 2}


ValueError: invalid literal for int() with base 10: ''

What mechanisms exist for handling errors?

- Special return values (Golang)
```go
i, err := strconv.Atoi("42")
if err != nil {
    fmt.Printf("couldn't convert number: %v\n", err)
    return
}
fmt.Println(i)
```

- Exceptions (Python)

- Exceptions are a special language mechanism for working with errors.
- They interrupt the normal flow of program execution.
- They report that an exceptional situation occurred (place and cause).
- They allow you to handle the error and restore the program's operation.

### Examples of exceptions

In [2]:
int('abc')

ValueError: invalid literal for int() with base 10: 'abc'

In [3]:
[0] * int(1e16)

MemoryError: 

In [4]:
open('nonexistent.file')

FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.file'

In [5]:
[1, 2, 3] + 4

TypeError: can only concatenate list (not "int") to list

In [6]:
 a = 2 * 5 + 3)

SyntaxError: unmatched ')' (2637871998.py, line 1)

Hierarchy of built-in exceptions: https://docs.python.org/3/library/exceptions.html#exception-hierarchy

<details>
<summary>Exception hierarchy</summary>

```
BaseException
 ├── BaseExceptionGroup
 ├── GeneratorExit
 ├── KeyboardInterrupt
 ├── SystemExit
 └── Exception
      ├── ArithmeticError
      │    ├── FloatingPointError
      │    ├── OverflowError
      │    └── ZeroDivisionError
      ├── AssertionError
      ├── AttributeError
      ├── BufferError
      ├── EOFError
      ├── ExceptionGroup [BaseExceptionGroup]
      ├── ImportError
      │    └── ModuleNotFoundError
      ├── LookupError
      │    ├── IndexError
      │    └── KeyError
      ├── MemoryError
      ├── NameError
      │    └── UnboundLocalError
      ├── OSError
      │    ├── BlockingIOError
      │    ├── ChildProcessError
      │    ├── ConnectionError
      │    │    ├── BrokenPipeError
      │    │    ├── ConnectionAbortedError
      │    │    ├── ConnectionRefusedError
      │    │    └── ConnectionResetError
      │    ├── FileExistsError
      │    ├── FileNotFoundError
      │    ├── InterruptedError
      │    ├── IsADirectoryError
      │    ├── NotADirectoryError
      │    ├── PermissionError
      │    ├── ProcessLookupError
      │    └── TimeoutError
      ├── ReferenceError
      ├── RuntimeError
      │    ├── NotImplementedError
      │    └── RecursionError
      ├── StopAsyncIteration
      ├── StopIteration
      ├── SyntaxError
      │    └── IndentationError
      │         └── TabError
      ├── SystemError
      ├── TypeError
      ├── ValueError
      │    └── UnicodeError
      │         ├── UnicodeDecodeError
      │         ├── UnicodeEncodeError
      │         └── UnicodeTranslateError
      └── Warning
           ├── BytesWarning
           ├── DeprecationWarning
           ├── EncodingWarning
           ├── FutureWarning
           ├── ImportWarning
           ├── PendingDeprecationWarning
           ├── ResourceWarning
           ├── RuntimeWarning
           ├── SyntaxWarning
           ├── UnicodeWarning
           └── UserWarning
```

</details>
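
The hierarchy matters in practice because an `except` clause also catches every subclass of the type it names. A minimal sketch, based on the hierarchy above (the `lookup` helper is hypothetical, introduced only for illustration):

```python
# Subclass relationships taken from the built-in exception hierarchy.
print(issubclass(FileNotFoundError, OSError))    # True
print(issubclass(KeyError, LookupError))         # True
print(issubclass(KeyboardInterrupt, Exception))  # False


def lookup(container, key):
    """Hypothetical helper: return container[key], or None if the key/index is missing."""
    try:
        return container[key]
    except LookupError:  # catches both KeyError (dict) and IndexError (list)
        return None


print(lookup({'a': 1}, 'b'))  # None (KeyError was caught)
print(lookup([1, 2, 3], 10))  # None (IndexError was caught)
```

Note that `KeyboardInterrupt` is deliberately outside `Exception`, so `except Exception` does not swallow Ctrl+C.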

### Handling exceptions
`try...except`

In [7]:
filename = 'nonexistent.file'

try:
    fd = open(filename, 'r')
except FileNotFoundError:  # catch exceptions which satisfy isinstance(exc, FileNotFoundError)
    print(f'File {filename!r} does not exist')

File 'nonexistent.file' does not exist


In [8]:
filename = 'nonexistent.file'

try:
    fd = open(filename, 'r')
except FileNotFoundError as e:  # e is the exception object
    print(f'File {filename!r} does not exist')
    print("exception type: ", type(e))
    print("exception message: ", e)

File 'nonexistent.file' does not exist
exception type:  <class 'FileNotFoundError'>
exception message:  [Errno 2] No such file or directory: 'nonexistent.file'


Handling exceptions: `try...except...except`

In [9]:
filename = 'nonexistent.file'

try:
    # [1] + 3
    # [0] * int(1e16)
    # int(10e1000)
    fd = open(filename, 'r')
except FileNotFoundError:
    print(f'File {filename!r} does not exist')
except (TypeError, ValueError, MemoryError) as e:  # if exception is any of these types, this block will be executed
    print('Just to demonstrate a tuple of exceptions')
    print("exception type: ", type(e))
except Exception as e:  # the first matching except clause is triggered
    print(f'Exception occurred while reading file {filename!r}: {e!r}')

File 'nonexistent.file' does not exist


Handling exceptions: `try...except...else...finally`

In [10]:
f = None
try:
    # f = open("filename.txt", 'r') # something dangerous
    # 1 / 0
    print('all good')
except ValueError as e:  # scope failure
    print(f'Something bad happened: {e!r}')
else:  # scope success
    print('Nothing bad happened')
finally:  # scope exit
    if f is not None:
        f.close()
    print('Print this no matter what')

all good
Nothing bad happened
Print this no matter what
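
One way to see that `finally` really runs "no matter what" is to leave the `try` statement through a `return`. A small sketch; `safe_divide` and `events` are hypothetical names used only for this illustration:

```python
events = []


def safe_divide(a, b):
    """Hypothetical helper that records which clauses were executed."""
    try:
        result = a / b
    except ZeroDivisionError:
        events.append('except')
        return None
    else:
        events.append('else')
        return result
    finally:
        events.append('finally')  # runs even though both branches above return


print(safe_divide(6, 3), events)  # 2.0 ['else', 'finally']
events.clear()
print(safe_divide(1, 0), events)  # None ['except', 'finally']
```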


_optional_

Error-handling strategies: **Look Before You Leap**


In [11]:
def ctr(shows, clicks):
    """Returns banner click-through rate"""
    if shows == 0:
        return 0
    return clicks / shows

Error-handling strategies: **It's easier to ask for forgiveness than permission**

In [12]:
def ctr(shows, clicks):
    """Returns banner click-through rate"""
    try:
        return clicks / shows
    except ZeroDivisionError:
        return 0

Try to be as specific as possible in your except clauses
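
As an illustration of a specific `except` clause, here is one possible way to make the log-parsing loop from the beginning of this section survive the malformed row. The choice to skip bad rows and report them to `stderr` is an assumption made for this sketch, not the only reasonable policy:

```python
import sys


def parse_tskv(tskv: str) -> dict[str, int]:
    """Parse tskv string (same function as at the start of the section)."""
    kvpairs = (keyvalue.split('=') for keyvalue in tskv.strip().split('\t'))
    return {k: int(v) for k, v in kvpairs}


log = [
    'banner_id=1\tshows=10\tclicks=1',
    'banner_id=3\tshows=\tclicks=1',  # empty shows
]

parsed = []
for row in log:
    try:
        parsed.append(parse_tskv(row))
    except ValueError as e:  # only the error we know how to handle
        print(f'Skipping malformed row {row!r}: {e}', file=sys.stderr)

print(parsed)  # [{'banner_id': 1, 'shows': 10, 'clicks': 1}]
```

Any other exception type (for example a `TypeError` caused by passing a non-string) still propagates, which is usually what you want for bugs.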

### `raise`

In [15]:
def foo_with_positive_argument(x: int) -> int:
    if x < 0:
        ... # what am I supposed to do?
    return x

In [16]:
foo_with_positive_argument(-1)

-1

You can raise an exception using the `raise` keyword

In [17]:
raise ValueError('Positive integer expected')

ValueError: Positive integer expected

In [18]:
def foo_with_positive_argument(x: int) -> int:
    if x < 0:
        raise ValueError("x must be positive")
    return x

foo_with_positive_argument(-1)

ValueError: x must be positive

An exception must be an object of type BaseException or a subclass of it

In [19]:
raise 42

TypeError: exceptions must derive from BaseException

_optional_

`raise` with no argument re-raises the last caught exception.

In [20]:
try:
    raise RuntimeError('Crash hard')
except Exception as e:
    print('Unknown error occurred, no chance to recover, run!')
    raise

Unknown error occurred, no chance to recover, run!


RuntimeError: Crash hard

In [21]:
raise

RuntimeError: No active exception to reraise

### Custom Exceptions

Note: we're not familiar with inheritance and classes yet

In [None]:
# sometimes built-in exceptions are not enough
class ShoeError(Exception):
    pass

class WrongFootError(ShoeError):
    def __str__(self):
        return 'Try another one!'
        
raise WrongFootError([1, 2, 3])

WrongFootError: Try another one!
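
A custom base class pays off when callers want to catch a whole family of errors at once. The sketch below redefines the two classes so that it runs on its own; `put_on` is a hypothetical function invented for the example:

```python
class ShoeError(Exception):
    """Base class for all shoe-related errors."""


class WrongFootError(ShoeError):
    def __str__(self):
        return 'Try another one!'


def put_on(shoe: str, foot: str) -> str:
    """Hypothetical function: raises WrongFootError when shoe and foot differ."""
    if shoe != foot:
        raise WrongFootError(shoe, foot)
    return f'{shoe} shoe is on'


try:
    put_on('left', 'right')
except ShoeError as e:  # the base class also catches WrongFootError
    print(type(e).__name__, e, e.args)
# WrongFootError Try another one! ('left', 'right')
```

The arguments passed to the constructor stay available in `e.args` even though `__str__` is overridden.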

# Input/Output streams (~19:50 (15 min))

### Standard streams

**Stream** — a flow of data between a program and an external source or destination.

**Motivation:**
A stream acts like a communication pipe between your program and something else —
for example:

* **Output stream** → sending data to the display or a file.
* **Input stream** → receiving data from the keyboard, a file, or the network.

![Stream](./stream_pic.png)

In [23]:
import sys

sys.stdout.write('Hello again!\n');  # stdout -- standard output stream

Hello again!


In [24]:
print("Hello again!")  # equivalent to above

Hello again!


In [25]:
sys.stderr.write('Danger!\n');  # stderr -- standard error stream

Danger!


In [26]:
print("Danger!", file=sys.stderr)

Danger!
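
Because `print` simply writes to whatever object `sys.stdout` currently refers to, the output can be captured by temporarily swapping the stream. A minimal sketch using `contextlib.redirect_stdout` from the standard library and an in-memory `io.StringIO` buffer (the `with` statement is covered in the context manager part of this lecture):

```python
import contextlib
import io

buffer = io.StringIO()  # in-memory text stream
with contextlib.redirect_stdout(buffer):
    print('Hello again!')  # goes to the buffer, not to the screen

captured = buffer.getvalue()
print(repr(captured))  # 'Hello again!\n'
```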


### Files

In [10]:
%%writefile some_text.txt
Hello, world!
Here some more text...
What else?

Writing some_text.txt


In [11]:
f = open('some_text.txt')
f

<_io.TextIOWrapper name='some_text.txt' mode='r' encoding='UTF-8'>

In [12]:
content = f.read()

In [13]:
len(content)

48

In [14]:
f.close()

Reading an entire file into memory can be very expensive

In [15]:
f = open('some_text.txt', 'r')

In [16]:
f.readline()

'Hello, world!\n'

_optional_

What if the content is not text? Or you need to read a batch of lines at once?

In [18]:
chunk_size = 4
f.read(chunk_size)

'Here'

In [43]:
f.close()

Okay, we want to read in chunks until the end...

In [44]:
f = open('some_text.txt')

In [45]:
chunk_size = 8
lines = 0
while True:
    chunk = f.read(chunk_size)
    if not chunk:
        break
    lines += chunk.count('\n')
lines

3

In [46]:
f.read()

''

In [47]:
f.close()
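
The same loop can be written more compactly with the two-argument form of `iter`, which keeps calling a function until it returns a sentinel value (here the empty string that `read` returns at the end of the stream). A sketch under the assumption that an in-memory `io.StringIO` stands in for the file; `count_newlines` is a hypothetical helper:

```python
import io
from functools import partial


def count_newlines(stream, chunk_size=8):
    """Count '\\n' characters while reading at most chunk_size characters at a time."""
    total = 0
    # iter(callable, sentinel) stops as soon as read() returns ''
    for chunk in iter(partial(stream.read, chunk_size), ''):
        total += chunk.count('\n')
    return total


text = 'Hello, world!\nHere some more text...\nWhat else?\n'
print(count_newlines(io.StringIO(text)))  # 3
```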

### Context manager

Problem:
```python
file = open('some_text.txt')
# ... a LOT of code here

# Ooops, easy to forget to close the file
```

In [20]:
with open('some_text.txt') as f:
    print(f.read())

Hello, world!
Here some more text...
What else?
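
What `with` buys us is that the file is closed when the block is left, even if an exception is raised inside it. A small sketch; it writes to a temporary directory so that it does not depend on `some_text.txt` existing:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')
    with open(path, 'w') as f:
        f.write('Hello, world!\n')
    print(f.closed)  # True: closed automatically at the end of the block

    try:
        with open(path) as f:
            int(f.readline())  # raises ValueError inside the block
    except ValueError:
        pass
    closed_after_error = f.closed
    print(closed_after_error)  # True: closed despite the exception
```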



### Modes

**Read** (default)

In [49]:
%%writefile some_text.txt
Hello, world!
Here some more text...
What else?

Overwriting some_text.txt


In [50]:
with open('some_text.txt', 'r') as f:
    print(f.readline())
    print('-' * 10)
    print(f.read())  # read the rest

Hello, world!

----------
Here some more text...
What else?



Note: use `rb` instead of `r` to read binary files.

**Write**

In [51]:
# f = open('/tmp/junk.txt', 'w')
with open('/tmp/junk.txt', 'w') as f:
    f.write('Hello, world!')

with open('/tmp/junk.txt', 'r') as f:
    print(f.read())

Hello, world!


In [52]:
with open('/tmp/junk.txt', 'w') as f:
    f.write('Oops, file is overwritten...\n')
    f.write('One more line...\n')

with open('/tmp/junk.txt', 'r') as f:
    print(f.read())

Oops, file is overwritten...
One more line...



**Append**

In [53]:
with open('/tmp/junk.txt', 'a') as f:
    f.write('Yet another line...\n')

with open('/tmp/junk.txt', 'r') as f:
    print(f.read())

Oops, file is overwritten...
One more line...
Yet another line...



**Read + write**: 'r+'
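
A short sketch of `'r+'`: the file must already exist, it is not truncated on open, and reads and writes share one position in the file. The file name and contents are made up for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'junk.txt')
    with open(path, 'w') as f:
        f.write('Hello, world!\n')

    with open(path, 'r+') as f:
        first_read = f.read()  # existing content is still there
        f.seek(0)              # move back to the beginning
        f.write('J')           # overwrites in place, does not insert

    with open(path) as f:
        final_content = f.read()

print(repr(first_read))     # 'Hello, world!\n'
print(repr(final_content))  # 'Jello, world!\n'
```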

### Bytes

[More on bytes](https://realpython.com/python-bytes/)

**What are bytes?**

* A **byte** is an integer in **0–255**.
* A `bytes` object is a **sequence** of those integers (immutable); `bytearray` is mutable.
* **Everything** is ultimately represented as bytes (text, images, network data).

Examples:
* An image is a collection of pixels (RGB values)
    * Each RGB value takes 3 bytes, one per channel, each in 0-255 (e.g. `b'\x00\x00\x00'` for black, where `\x00` is hex for 0)
* Text is a sequence of Unicode characters, stored as bytes via an encoding such as UTF-8

In [54]:
b = b"ABC"          # bytes literal
print(list(b))             # [65, 66, 67]
ba = bytearray([0, 255, 10])
ba


[65, 66, 67]


bytearray(b'\x00\xff\n')

**Why bytes for streams?**

* Streams are I/O channels; devices/files speak **bytes**.
* Text is a *view* on bytes via an **encoding** (e.g., UTF-8).

**Bytes vs. str (text)**

* `str` = Unicode characters (abstract text), that is, a sequence of _decoded_ bytes.
* `bytes` = concrete storage format.
* Files always store bytes. When you read a file in text mode, you _decode_ these bytes into `str`.
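
To make the last point concrete, the sketch below writes a short text with an explicit encoding and then reads the same file twice: once in binary mode (raw bytes) and once in text mode (decoded `str`). The sample string is arbitrary:

```python
import os
import tempfile

text = 'na\u00efve caf\u00e9'  # 10 characters, two of them non-ASCII

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    with open(path, 'rb') as f:  # binary mode: no decoding
        raw = f.read()
    with open(path, 'r', encoding='utf-8') as f:  # text mode: bytes are decoded
        decoded = f.read()

print(raw)                  # b'na\xc3\xafve caf\xc3\xa9'
print(len(raw), len(text))  # 12 10
print(decoded == text)      # True
```

The byte count is larger than the character count because each of the two non-ASCII characters takes two bytes in UTF-8.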


In [55]:
s = "Hello"
b = s.encode("utf-8")   # str -> bytes
print(b)
s2 = b.decode("utf-8")  # bytes -> str
print(s2)
s3_wrong = b.decode("utf-16")
print(s3_wrong)

b'Hello'
Hello


UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x6f in position 4: truncated data



**Encoding** is a way to convert a sequence of characters into a sequence of bytes. 

Popular encodings:
- ASCII
- UTF-8 (the default encoding for Python source files and for `str.encode()` / `bytes.decode()`)
- UTF-16
- UTF-32
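
The encodings differ in which characters they can represent and how many bytes they spend per character. A quick comparison on an arbitrary sample string (the `-le` variants are used so that no byte-order mark is added to the output):

```python
s = 'h\u00e9llo'  # 5 characters, one of them (e with an acute accent) is non-ASCII

sizes = {enc: len(s.encode(enc)) for enc in ('utf-8', 'utf-16-le', 'utf-32-le')}
print(sizes)  # {'utf-8': 6, 'utf-16-le': 10, 'utf-32-le': 20}

try:
    s.encode('ascii')
except UnicodeEncodeError as e:
    ascii_error = e  # keep a reference; `e` is unbound after the except block
    print('ASCII cannot represent this string:', e.reason)
```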

# Namespaces (~20:05 (15 min))

### Intro

* **Definition:** A **namespace** is a mapping from *names* (identifiers) to *objects*.
  In Python, namespaces are dict-like (e.g., `globals()`, `locals()`).
* **Mnemonic:** *“names → objects map”* (an address book for objects).


Why do we need namespaces?

* **Modularity:** each module has its own names; imports don’t collide by default.
* **Isolation & clarity:** local names live near where they’re used; reduces unintended interference.
* **Deterministic resolution:** LEGB makes lookups predictable and debuggable.
* **Introspection:** you can inspect/manipulate mappings (`globals()`, `locals()`), aiding tooling and debugging.

### Four namespaces in Python

![img](https://static-assets.codecademy.com/Courses/Intermediate-Python/Types-of-Namespaces_3_final.gif)

### Variable Scope & LEGB rule

* **Scope:** The region of code where a name is accessible; a local name disappears when its scope ends (e.g. when the function returns).
* **Lookup order (LEGB):** **L**ocal → **E**nclosing (in nested funcs) → **G**lobal (module) → **B**uiltins.

**Important:** `local` and `enclosing` namespaces are unique to each scope (e.g. function call).

In [56]:
x = "global"

def outer():
    x = "enclosed"
    def inner():
        x = "local"
        print('from local', x)  # -> "local" (L)
    inner()
    print('from enclosed', x)  # -> "enclosed" (E)

outer()
print('from global', x)  # -> "global" (G)


from local local
from enclosed enclosed
from global global


#### Built-In (built-in namespace)

In [57]:
print(dir(__builtins__))



In [58]:
'print' in dir(__builtins__), '__builtins__' in globals()

(True, True)

**When it is created and how many instances**

* Created at script startup and deleted when the script terminates
* One per entire program
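
Built-ins are looked up last, so any name bound in a closer scope hides them. A sketch using the standard `builtins` module; `shadowing_demo` is a hypothetical function:

```python
import builtins


def shadowing_demo():
    def len(obj):  # local name hides the built-in inside this function only
        return -1
    return len([1, 2, 3])


print(shadowing_demo())     # -1
print(len([1, 2, 3]))       # 3: the built-in is untouched outside the function
print(builtins.len is len)  # True
```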

#### Global (global namespace)

In [5]:
x = 1
'x' in globals(), 'y' in globals()

(True, False)

<details>
<summary>Fun Fact</summary>
Fun Fact: Jupyter Notebook saves the source and the result of every executed code cell in the global scope. \
For example, you can access the content and the result (if it's not None) of any executed cell like this:

```python
globals().get('_i3') # content of 3rd executed cell
globals().get('_3') # result of 3rd executed cell
```
</details>

In [60]:
print(globals().get('a', 'not found'))
a = 1
print(globals().get('a', 'not found'))
del a
print(globals().get('a', 'not found'))
a = None
print(globals().get('a', 'not found'))


not found
1
not found
None


In [61]:
# reminder: everything is an object

class A:
    pass

def foo():
    pass

'A' in globals(), 'foo' in globals(), 'non_existing' in globals()

(True, True, False)

**When it is created and how many instances**

* Created at script startup and deleted when the script terminates
* Each module has its own global namespace; created at module import time

#### Enclosing and local namespaces

In [62]:
x = 0
def f():
    x = 1
    def g():
        x = 2
        print(x)
    g()
    print(x)
f()
print(x)

2
1
0


<div class="alert alert-danger">
<b>Antipattern: </b> using the same variable name in different scopes (shadowing)
</div>

In [63]:
def f(x):
    y = 1
    print(locals())
f(10)

{'x': 10, 'y': 1}


In [64]:
def function(arg):
    print(locals())
    print(locals() == globals())

function(1)

{'arg': 1}
False


In [65]:
locals() == globals()

True

### Namespace

Functions create their own namespace

In [66]:
def function():
    inner_variable = 42

function()
inner_variable

NameError: name 'inner_variable' is not defined

Loops and conditionals do not create their own namespace

In [67]:
for k in range(10):
    in_for = k

print(in_for)  # still accessible
print(k)

9
9


In [68]:
if True:
    in_if = 2
    
print(in_if)

2


Comprehensions and generator expressions create their own namespace

In [69]:
i = 'Hello'
[i for i in range(10)]
print(i)  # i inside the list comprehension is destroyed

Hello


In [70]:
i = 'Hello'
[i for i in range(10) if (inside_value := i) > 5]  # walrus operator (:=) binds inside_value in the enclosing scope, not in the comprehension's own namespace
print(i)
inside_value

Hello


9

### LEGB Playground

In [71]:
global_var = 'global_var'
local_var = 'lol_kek'

def func(): 
    local_var = 'local_var'
    # global_var is in global namespace; it's accessible in all the scopes below
    print('func:', global_var)
    print('func:', local_var)

func()
print(global_var)
print(local_var)

func: global_var
func: local_var
global_var
lol_kek


_optional_

Nested functions

In [72]:
def outer():
    outer_var = 'foo'

    def inner():
        inner_var = 'bar'
        print('from inner:', outer_var)
        print('from inner:', inner_var)

    inner()

    print('from outer:', outer_var)
    print('from outer:', inner_var)

outer()

from inner: foo
from inner: bar
from outer: foo


NameError: name 'inner_var' is not defined

Functions have access to outer namespaces relative to where they are **defined**, not where they are **called**

In [73]:
def f():
    print(it)

def q(func):
    for it in range(10):
        func()
    print(it)

q(f)

NameError: name 'it' is not defined
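
The same rule is what makes closures work: an inner function keeps seeing the variables of the function it was defined in, wherever it is later called from. A minimal sketch with hypothetical names:

```python
def make_greeter(greeting):
    def greet(name):
        # `greeting` is resolved in the enclosing scope of make_greeter (E in LEGB)
        return f'{greeting}, {name}!'
    return greet


def caller():
    greeting = 'Bye'  # local to caller; greet never looks here
    hello = make_greeter('Hello')
    return hello('Ann')


print(caller())  # Hello, Ann!
```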

# Serialization and Deserialization (~20:15 (10 min))

<div align="center"><b><font size=6>Why do we need this?</font></b></div>

1. Web API: JSON/RPC/...
2. Application configuration
3. Caching / Storing in DB

In general: JSON is one of the most popular formats for exchanging and storing data.

<div align="center"><b><font size=6>JSON</font></b></div>
<div align="center"><img src="https://www.json.org/img/json160.gif"/></div>

```json
{
    "int_value": 1,
    "float_value": 1.0,
    "str_value": "Hello",
    "bool_value": true,
    "list_value": [1, 2, 3],
    "dict_value": {
        "a": 1,
        "b": 2
    }
}
```
---
```json
[
    1,
    2,
    "word",
    {
        "inner_key": "inner_value",
        "inner_list": [1, 2, 3]
    },
    [1, 2, 3]
]
```

JSON - JavaScript Object Notation

Formal description: https://www.json.org/json-en.html

Libraries for working with JSON:
1. json 
2. simplejson
3. simdjson
4. orjson
5. ujson

The json module has 4 main functions: 2 for working with streams and 2 for working with strings.

Stream:
1. dump
2. load

String:
1. dumps
2. loads

In [21]:
import json

data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
data_dump = json.dumps(data)
print(data_dump)

with open('result.json', 'w') as fout:
    json.dump(data, fout)
!cat result.json

["foo", {"bar": ["baz", null, 1.0, 2]}]
["foo", {"bar": ["baz", null, 1.0, 2]}]

In [22]:
data_parsed = json.loads(data_dump)
print(data_parsed)

with open('result.json') as fin:
    print(json.load(fin))

['foo', {'bar': ['baz', None, 1.0, 2]}]
['foo', {'bar': ['baz', None, 1.0, 2]}]


In [23]:
print(data == data_parsed)

False


<details>
  <summary>Answer</summary>
  <p>A tuple is serialized as a JSON Array. However, a JSON Array is converted back to a list by default.</p>
</details>


In [77]:
print(data)
print(data_parsed)

['foo', {'bar': ('baz', None, 1.0, 2)}]
['foo', {'bar': ['baz', None, 1.0, 2]}]


| Python  | JSON  |
|:---|:---|
| dict  | Object |
| list  | Array  |
| tuple  | Array  |
| str  | String  |
| int  | Number (int)  |
| float  | Number (real)  |
| True  | true  |
| False  | false  |
| None  | null  |
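
Types outside this table are rejected by `json.dumps` with a `TypeError`. One common workaround is the `default` parameter, a function that `json` calls for objects it cannot serialize. A sketch that turns a `set` into a sorted list; the conversion policy is an assumption made for this example:

```python
import json

payload = {'tags': {'b', 'a'}}  # a set has no JSON counterpart

try:
    json.dumps(payload)
except TypeError as e:
    print('Cannot serialize:', e)

encoded = json.dumps(payload, default=sorted)  # sorted() turns the set into a list
print(encoded)  # {"tags": ["a", "b"]}
```

As with tuples, the round trip is lossy: `json.loads(encoded)` gives back a list, not a set.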

**JSON pitfalls**

In [78]:
# Keys are always str

dct = {
    1: 'one',
    2: 'two',
    3: 'three',
}

dct_json = json.dumps(dct)
print('json_str:', dct_json)
print(json.loads(dct_json))

json_str: {"1": "one", "2": "two", "3": "three"}
{'1': 'one', '2': 'two', '3': 'three'}


In [79]:
# Multiple dumps 

val1 = [1, 2, 3]
val2 = {'key': 'value'}

with open('bad.json', 'w') as fout:
    json.dump(val1, fout)
    fout.write('\n')
    json.dump(val2, fout)
    
!cat bad.json

[1, 2, 3]
{"key": "value"}

In [80]:
with open('bad.json') as fin:
    json.load(fin)

JSONDecodeError: Extra data: line 2 column 1 (char 10)

In [81]:
with open('bad.json') as fin:
    for line in fin:
        print(json.loads(line))

[1, 2, 3]
{'key': 'value'}


In [82]:
# repr misuse

arr = [1, 2, 3]
print(repr(arr))
print(json.loads(repr(arr)))

arr2 = ["Hello world", "!"]
print(repr(arr2))
print(json.loads(repr(arr2)))

[1, 2, 3]
[1, 2, 3]
['Hello world', '!']


JSONDecodeError: Expecting value: line 1 column 2 (char 1)
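
The reason is that `repr` produces Python syntax (single quotes, `None`, `True`), which only occasionally coincides with JSON. Serializing with `json.dumps` is what keeps the round trip valid; a short sketch:

```python
import json

arr2 = ['Hello world', '!', None, True]

print(repr(arr2))        # ['Hello world', '!', None, True]
print(json.dumps(arr2))  # ["Hello world", "!", null, true]

restored = json.loads(json.dumps(arr2))
print(restored == arr2)  # True
```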

Lack of cross-platform behavior:

https://github.com/jqlang/jq/issues/1959

### Other Formats

* [CSV and variations](https://en.wikipedia.org/wiki/Comma-separated_values)
    * Widely used in data science
* [YAML](https://en.wikipedia.org/wiki/YAML)
    * Superset of JSON, great for configuration files
* [Pickle](https://www.geeksforgeeks.org/python/understanding-python-pickling-example/)
    * Python-specific format, not portable, but can store almost any Python object
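
As a rough illustration of two of these formats using only the standard library: `csv` stores rows of strings, while `pickle` restores the exact Python types (including the tuple that JSON turned into a list). The sample data is invented, and pickled data should only be loaded from sources you trust:

```python
import csv
import io
import pickle

# CSV: write two rows to an in-memory buffer and read them back
buffer = io.StringIO()
csv.writer(buffer).writerows([['banner_id', 'shows'], [1, 10]])
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)  # [['banner_id', 'shows'], ['1', '10']] (every value comes back as str)

# Pickle: the tuple survives the round trip
data = ['foo', {'bar': ('baz', None, 1.0, 2)}]
restored = pickle.loads(pickle.dumps(data))
print(restored == data)  # True
```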


Note: we'll talk about databases later. They are used to store big data.

Cover if time is left:
* Exceptions: BaseException antipattern
* Namespaces: global, local
* ...