# Python for Data Science

Errors, Modules, File IO

# Errors

Bugs come in three basic flavours:

- *Syntax errors:* 
    - Code is not valid Python (usually easy to fix, apart from some whitespace and indentation issues)
    
- *Runtime errors:* 
    - Syntactically valid code fails, often because variables contain wrong values

- *Semantic errors:* 
    - Errors in logic: code executes without a problem, but the result is wrong (difficult to fix)
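
A minimal sketch of the three flavours side by side. The function names `mean_wrong` and `mean_right` are made up for this illustration; `compile` is used only so that the syntax error can be shown without breaking the example itself.

```python
# Syntax error: the parser rejects the code before anything runs.
try:
    compile("print('hello'", "<example>", "exec")  # closing parenthesis is missing
except SyntaxError as err:
    syntax_problem = type(err).__name__

# Runtime error: the syntax is valid, but the operation fails when executed.
try:
    [1, 2, 3][1000]
except IndexError as err:
    runtime_problem = type(err).__name__

# Semantic error: the code runs without complaint, but the logic is wrong.
def mean_wrong(values):
    return sum(values) / (len(values) - 1)  # bug: should divide by len(values)

def mean_right(values):
    return sum(values) / len(values)

print(syntax_problem, runtime_problem)
print(mean_wrong([1, 2, 3]), mean_right([1, 2, 3]))
```

Only the first two flavours announce themselves with an exception; the semantic error has to be found by checking results against known values.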

## Runtime Errors

### Trying to access undefined variables

In [67]:
# Q was never defined
print(Q)

NameError: name 'Q' is not defined

### Trying to execute unsupported operations

In [76]:
1 + 'abc'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

### Trying to access elements in collections that don't exist

In [None]:
L = [1, 2, 3]
L[1000]

IndexError: list index out of range

### Trying to compute a mathematically ill-defined result

In [None]:
2 / 0

ZeroDivisionError: division by zero

## Catching Exceptions: ``try`` and ``except``


In [77]:
try:
    print("this gets executed first")
except:
    print("this gets executed only if there is an error")

this gets executed first


In [78]:
try:
    print("let's try something:")
    x = 1 / 0 # ZeroDivisionError
except:
    print("something bad happened!")

let's try something:
something bad happened!


In [79]:
def safe_divide(a, b):
    """
    A function that does a division and returns a half-sensible 
    value even for mathematically ill-defined results
    """
    try:
        return a / b
    except:
        return 1E100

In [80]:
print(safe_divide(1, 2))
print(safe_divide(1, 0))

0.5
1e+100


### What about errors that we didn't expect?


In [81]:
safe_divide(1, '2')

1e+100

### It's good practice to always catch errors explicitly:

Name the exception you expect in the ``except`` clause. All other errors will be raised as if there were no try/except clause.


In [82]:
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return 1E100

In [83]:
safe_divide(1, '2')

TypeError: unsupported operand type(s) for /: 'int' and 'str'
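
Several ``except`` clauses can follow one ``try``, one per expected error type. The sketch below is one possible variant, not part of the original example: the name `safe_divide_verbose` is hypothetical, and it reports a `TypeError` before re-raising it.

```python
def safe_divide_verbose(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # expected failure: fall back to a large number
        return 1E100
    except TypeError as err:
        # a different expected failure: report it, then re-raise the same error
        print("cannot divide {!r} by {!r}: {}".format(a, b, err))
        raise

print(safe_divide_verbose(1, 2))
print(safe_divide_verbose(1, 0))
```

A bare ``raise`` inside an ``except`` block re-raises the error that is currently being handled, so the caller still sees the original `TypeError`.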

## Throwing Errors

- When your code fails, make sure the error makes clear what went wrong.
- Raise [specific errors built into Python](https://docs.python.org/3/tutorial/errors.html)
- Write your own error classes

In [84]:
raise RuntimeError("my error message")

RuntimeError: my error message
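
Writing your own error class usually means subclassing a built-in exception. A minimal sketch, in which `NegativeInputError` and `safe_sqrt` are names invented for illustration:

```python
class NegativeInputError(ValueError):
    """Raised when a negative number is passed where none is allowed."""


def safe_sqrt(x):
    if x < 0:
        raise NegativeInputError(
            "expected a non-negative number, got {}".format(x))
    return x ** 0.5


try:
    safe_sqrt(-4)
except NegativeInputError as err:
    message = str(err)
    print("caught:", message)
```

Because the class derives from `ValueError`, existing code that catches `ValueError` also catches the new error.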

## Specific Errors

In [None]:
def safe_divide(a, b):
    if not isinstance(a, float) or not isinstance(b, float):
        raise ValueError("Arguments must be floats")
    try:
        return a / b
    except ZeroDivisionError:
        return 1E100

In [None]:
safe_divide(1, '2')

ValueError: Arguments must be floats

## Accessing Error Details

In [85]:
import warnings

def safe_divide(a, b):
    if not isinstance(a, float) or not isinstance(b, float):
        raise ValueError("Arguments must be floats")
    try:
        return a / b
    except ZeroDivisionError as err:
        warnings.warn("Caught Error {} with message {}".format(type(err),err) + 
                " - will just return a large number instead")
        return 1E100
    
    

In [86]:
safe_divide(1., 0.)


1e+100
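
The object bound by ``except ... as err`` carries the details of the error. A small sketch of what can be read from it, using `int("abc")` as an arbitrary failing call:

```python
try:
    int("abc")
except ValueError as err:
    error_name = type(err).__name__  # name of the exception class
    error_text = str(err)            # human-readable message
    error_args = err.args            # arguments the exception was created with

print(error_name)
print(error_text)
print(error_args)
```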

# Loading Modules: the ``import`` Statement

- Explicit imports (best)
- Explicit imports with alias (ok for long package names)
- Explicit import of module contents
- Implicit imports (to be avoided)

## Creating Modules

- Create a file called [somefilename].py
- In an (I)Python shell, change to the directory containing that file
- Type:

```python
import [somefilename]
```

Now all classes, functions and variables in the module's top-level namespace are available as attributes of the module.

Let's assume we have a file `mymodule.py` in the current working directory with the content:

```python
mystring = 'hello world'

def myfunc():
    print(mystring)
```

In [87]:
import mymodule
mymodule.mystring

'hello world'

In [88]:
mymodule.myfunc()

hello world
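
The import works because Python searches the directories listed in `sys.path`, which in an interactive session includes the current working directory. The sketch below makes this explicit under some assumptions: it writes a module with the hypothetical name `mymodule_demo` into a temporary directory (a stand-in for the working directory) and adds that directory to `sys.path` by hand.

```python
import importlib
import pathlib
import sys
import tempfile

# write a tiny module into a temporary directory
module_dir = tempfile.mkdtemp()
pathlib.Path(module_dir, "mymodule_demo.py").write_text(
    "mystring = 'hello world'\n"
    "\n"
    "def myfunc():\n"
    "    return mystring\n"
)

# make the directory visible to the import system, then import by name
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
mymodule_demo = importlib.import_module("mymodule_demo")

print(mymodule_demo.mystring)
print(mymodule_demo.myfunc())
```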


## Explicit module import

Explicit import of a module preserves the module's content in a namespace.

In [89]:
import math
math.cos(math.pi)

-1.0

## Explicit module import with aliases

For longer module names, typing the full name every time is inconvenient, so you can give the module a short alias.

In [90]:
import numpy as np
np.cos(np.pi)

-1.0

## Explicit import of module contents
You can import specific elements separately. 

In [91]:
from math import cos, pi
cos(pi)

-1.0

## Implicit import of module contents
You can import all elements of a module into the global namespace. Use with caution: the import silently overwrites existing names, such as `cos` in the example below.

In [92]:
cos = 0
from math import *
sin(pi) ** 2 + cos(pi) ** 2

1.0
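
To see how many names such an import brings in, you can list them first. This sketch relies on the fact that `math` defines no `__all__`, so a wildcard import binds every name that does not start with an underscore. The set `my_names` is a made-up example of variables you might already have defined.

```python
import math

# names that `from math import *` would bind
star_names = [name for name in dir(math) if not name.startswith("_")]
print(len(star_names), "names, for example:", star_names[:5])

# any of your own variables with the same name would be replaced
my_names = {"cos", "e", "data"}
clashes = sorted(my_names & set(star_names))
print(clashes)
```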

# File IO and Encoding


- Files are opened with ``open``
- The default mode is ``'r'``: read in text mode. Iterating over the file object yields one line at a time

## Reading Text


In [93]:
path = 'umlauts.txt'
f = open(path)
lines = [x.strip() for x in f]
f.close()
lines

['Eichhörnchen', 'Flußpferd', '', 'Löwe', '', 'Eichelhäher']

In [94]:
# for easier cleanup: the with statement closes the file automatically
with open(path) as f:
    lines = [x.rstrip() for x in f]
lines

['Eichhörnchen', 'Flußpferd', '', 'Löwe', '', 'Eichelhäher']
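
When no encoding is given, ``open`` uses a platform-dependent default, so the same file can read differently on different machines. Passing ``encoding`` explicitly avoids this. The sketch below assumes a hypothetical file `umlauts_demo.txt` in a temporary directory and shows what happens when UTF-8 bytes are read with the wrong encoding.

```python
import os
import tempfile

words = ['Eichhörnchen', 'Flußpferd', 'Löwe']

# hypothetical demo file, written as UTF-8
demo_path = os.path.join(tempfile.mkdtemp(), 'umlauts_demo.txt')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('\n'.join(words) + '\n')

# reading with the matching encoding restores the text
with open(demo_path, encoding='utf-8') as f:
    lines_utf8 = [x.rstrip() for x in f]

# reading with a different encoding does not fail, but garbles the umlauts
with open(demo_path, encoding='latin-1') as f:
    lines_latin1 = [x.rstrip() for x in f]

print(lines_utf8)
print(lines_latin1)
```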

## Detour: Context Managers

Often, as when opening files, you want to make sure that a resource (here, the file handle) is released in any case, even if an error occurs.

```python
file = open(path, 'w')
try:
    # an error
    1 / 0
finally:
    file.close()
```

Context managers are a convenient shortcut:
```python
with open(path, 'w') as opened_file:
    # an error
    1/0
```
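
You can also write your own context managers. One way is the `contextlib.contextmanager` decorator from the standard library; the sketch below uses a made-up manager called `tracked` that only records when it is entered and left, to show that the cleanup step runs even when the block raises.

```python
from contextlib import contextmanager

events = []


@contextmanager
def tracked(name):
    events.append("enter " + name)
    try:
        yield name
    finally:
        # runs even if the body of the with-block raises
        events.append("exit " + name)


try:
    with tracked("demo"):
        1 / 0
except ZeroDivisionError:
    events.append("error handled")

print(events)
```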

## Writing Text


In [95]:
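# copy only the non-empty lines (an empty line consists of just '\n')
with open(path) as source, open('tmp.txt', 'w') as handle:
    handle.writelines(x for x in source if len(x) > 1)
with open('tmp.txt') as f:
    lines = [x.rstrip() for x in f]
lines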

['Eichhörnchen', 'Flußpferd', 'Löwe', 'Eichelhäher']

## Reading Bytes


In [96]:
# 't' selects text mode (the default), so 'rt' is the same as 'r'
with open(path, 'rt') as f:
    # just the first 6 characters
    chars = f.read(6)
chars

'Eichhö'

In [97]:
# now we read the file content as bytes
with open(path, 'rb') as f:
    # just the first 6 bytes
    data = f.read(6)

In [98]:
# decoding fails: the 6th byte (0xc3) is only the first half of the two-byte 'ö'
data.decode('utf8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 5: unexpected end of data

In [99]:
# UTF-8 is a variable-length encoding; the first 4 bytes are complete one-byte characters
data[:4].decode('utf8')

'Eich'
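
The mismatch between characters and bytes can be checked without a file. A short sketch that encodes the same six characters and inspects the result:

```python
text = 'Eichhö'
data = text.encode('utf8')

# 6 characters, but 7 bytes: each ASCII letter takes one byte, 'ö' takes two
print(len(text), len(data))
print(data[5:])

# cutting after 6 bytes leaves an incomplete sequence for 'ö'
partial = data[:6]
ignored = partial.decode('utf8', errors='ignore')    # drops the broken byte
replaced = partial.decode('utf8', errors='replace')  # inserts U+FFFD instead
print(ignored)
print(replaced)
```

The ``errors`` argument only hides the problem; when reading a file in chunks, it is usually safer to read in text mode and let Python handle the character boundaries.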