# Writing Python Programs

-----

In this notebook, we introduce two important Python concepts: modules, and error and exception handling.

## Table of Contents
[Python Modules](#Python-Modules)  

[Errors and Exceptions](#Errors-and-Exceptions)  


-----
[[Back to TOC]](#Table-of-Contents)

## Python Modules

As the Python language has become more popular, individuals and organizations have invested considerable time, energy, and effort in developing Python applications. One hallmark of good application design is _reusability_. After all, who wants to rewrite the same code multiple times to solve the same problem? Fortunately, the Python language supports encapsulating code into [modules](https://docs.python.org/3/tutorial/modules.html), which are essentially files containing Python definitions, for example, functions, classes, or variables. A _module_ can be imported into another Python file, allowing the definitions to be reused. 

When one or more modules become widely used, they can be bundled together into a Python package, which can provide enhanced functionality. To import a package (or module) into another Python program, you use the `import` statement, which has the following forms:

1. `import numpy`
2. `import numpy as np`
3. `from numpy import arange`
4. `from numpy import *`

The first form brings the entire contents of the NumPy package into the current program but leaves all items in the NumPy namespace. Thus, to refer to a particular definition, like `arange`, one must use the `numpy` prefix, as in `numpy.arange()`. The second form is similar to the first, but the prefix has been shortened to `np`. Thus, you refer to a definition, like `arange`, by using `np.arange()`. The third form imports only the single, listed definition, which is brought into the current namespace and thus does not require any prefix. The last form brings the entire contents of the NumPy package into the current file and namespace. As a result, the chances for name collisions increase, and thus the last form is strongly discouraged.

Note that in this course we generally use the first form for system libraries, like `sys` (i.e., `import sys`), and the second form for third-party libraries, like `numpy` (i.e., `import numpy as np`). In general, the appropriate `import` format will be demonstrated in future lessons for each module when it is introduced.
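
As a minimal sketch of the first three forms, the following cell uses the standard library `math` module in place of NumPy. This substitution is made here only so the example runs without any third-party package, and the alias `m` is likewise just an illustrative choice.

```python
# Form 1: import the module; names stay behind the module prefix
import math
root_1 = math.sqrt(16)

# Form 2: import the module under a shorter alias
import math as m
root_2 = m.sqrt(16)

# Form 3: import a single name directly into the current namespace
from math import sqrt
root_3 = sqrt(16)

# All three calls reach the same function
print(root_1, root_2, root_3)
```

All three names refer to the same `sqrt` function; only the way the name is reached differs.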

Many popular packages have been included with the standard Python distributions and are known collectively as the Python [standard library][psl]. This includes items like [`sys`][ps] for working with the computer system directly, [`math`][pm] to work with mathematical constants and functions, [`csv`][6] to work with comma-separated value files, and [`datetime`][dt] to manipulate dates and times. 

Other packages are available from third parties and can be very useful in specific circumstances. The following table lists some of the more popular Python packages that are relevant for this course: 

| Name              | Description                              |
| ----------------- | ---------------------------------------- |
| [numpy][1]        | Fast numerical arrays and matrices       |
| [scipy][2]        | Comprehensive set of scientific and engineering functions |
| [matplotlib][3]   | Comprehensive plotting library           |
| [seaborn][4]      | Better data plotting                     |
| [pandas][5]       | Data structures that simplify data analysis tasks |
| [scikit-learn][8] | Machine learning tools                   |

In addition to these listed packages, many other packages exist. The official repository for public Python packages is PyPI, the [Python Package Index][pypi]. These libraries can generally be installed with [pip][pip], the Python package management tool, or with [conda][ac]; however, the details of doing this are beyond the scope of this course.

-----
[psl]: https://docs.python.org/3/library/index.html
[1]: http://www.numpy.org
[2]: http://www.scipy.org/scipylib/index.html
[3]: http://matplotlib.org
[4]: http://web.stanford.edu/~mwaskom/software/seaborn/index.html
[5]: http://pandas.pydata.org
[6]: https://docs.python.org/3/library/csv.html
[8]: http://scikit-learn.org/stable/index.html
[pypi]: https://pypi.python.org/pypi
[pip]: https://python-packaging-user-guide.readthedocs.org/en/latest/current.html

[ps]: https://docs.python.org/3/library/sys.html
[dt]: https://docs.python.org/3/library/datetime.html
[pm]: https://docs.python.org/3/library/math.html

[ac]: https://conda.io/docs/

### Working with Modules

Working with modules is extremely easy in Python. First, we must import the module. As stated earlier, for modules in the [standard Python library](https://docs.python.org/3/tutorial/stdlib.html), the standard approach is to simply import the module:

```python
import sys
import os
import math
import re
```

For non-standard library modules, including those we create, the standard practice is to provide a shorthand name for the module during the import:

```python
import my_module as mm
import numpy as np
```

In the following Code cell, we demonstrate how to import the `sys` module, which enables a programmer to interact with the Python  interpreter. To demonstrate working with this module, we display the `version_info` for our Python interpreter. In the second Code cell, we demonstrate how to import the `collections` module. With this module we create and display a `Counter` data structure, which maintains the count for each item in the collection. In addition to viewing the online documentation for a module, you can also use the Python interpreter to learn about a module by using the built-in `help` function. You also can see a concise list of defined objects, such as variables and functions, for a module by using the built-in `dir` function.

-----

In [1]:
import sys

print(sys.version_info)

sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)


In [2]:
import collections

my_list = [1, 2, 3, 4, 2, 3, 4, 5, 1, 4, 3, 2, 4, 5]

cnt = collections.Counter(my_list)
print(cnt)

Counter({4: 4, 2: 3, 3: 3, 1: 2, 5: 2})
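
A `Counter` behaves much like a dictionary. As a brief sketch that reuses the list from the cell above, you can look up the count for a single item or ask for the most frequent item:

```python
import collections

my_list = [1, 2, 3, 4, 2, 3, 4, 5, 1, 4, 3, 2, 4, 5]
cnt = collections.Counter(my_list)

# Look up the count of one item; items never seen have a count of 0
print(cnt[4])
print(cnt[99])

# most_common(n) returns the n most frequent items as (item, count) pairs
print(cnt.most_common(1))
```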


### `dir()` and `help()`

The two Python built-in functions `dir()` and `help()` are very handy for understanding how a module, or a function in a module, works. First, let's use the `help()` function to learn about the `dir()` function.

In [3]:
help(dir)

Help on built-in function dir in module builtins:

dir(...)
    dir([object]) -> list of strings
    
    If called without an argument, return the names in the current scope.
    Else, return an alphabetized list of names comprising (some of) the attributes
    of the given object, and of attributes reachable from it.
    If the object supplies a method named __dir__, it will be used; otherwise
    the default dir() logic is used and returns:
      for a module object: the module's attributes.
      for a class object:  its attributes, and recursively the attributes
        of its bases.
      for any other object: its attributes, its class's attributes, and
        recursively the attributes of its class's base classes.



We will demonstrate the two functions with the `math` module. We will use the `dir()` function to list all attributes and functions in the `math` module. We can ignore names that start with `_`, which are private to the module and are not supposed to be used outside of it. Once we know the attribute and function names, we can use the `help()` function to check their details. As an example, we will check the help documentation for `sqrt`.

In [4]:
import math
dir(math)

['__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'pi',
 'pow',
 'radians',
 'remainder',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'tau',
 'trunc']

In [5]:
help(math.sqrt)

Help on built-in function sqrt in module math:

sqrt(x, /)
    Return the square root of x.



From the help documentation we can see that `sqrt` is a function that takes one argument and returns the square root of that argument.

You can apply `dir()` to a data type, a class, or an object. For example, to list the methods of a string, you can execute `dir(str)` or simply pass a string object to `dir()`, for example, `dir("a")`. The `help()` function works the same way: to see how the string method `upper()` works, you may execute `help(str.upper)` or `help('a'.upper)`.

In [6]:
# View the attributes and methods of a string
dir("a")

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


In [7]:
# Check the help documentation for the string method upper()
help('a'.upper)

Help on built-in function upper:

upper() method of builtins.str instance
    Return a copy of the string converted to uppercase.
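
Because `dir()` returns an ordinary list of strings, the names that start with `_` can be filtered out with a list comprehension. The following is a small sketch; the variable name `public_names` is only illustrative.

```python
import math

# Keep only the names that do not start with an underscore
public_names = [name for name in dir(math) if not name.startswith('_')]

# Show the first few public names
print(public_names[:5])
```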



-----

<font color='red' size = '5'> Student Exercise </font>

In the following Code cell, use `help` and `dir` to explore Python module `sys`. How does the output compare to the [official online documentation](https://docs.python.org/3/library/sys.html)?

-----

-----
[[Back to TOC]](#Table-of-Contents)


## Errors and Exceptions

There are two types of errors in Python: syntax errors and exceptions.

### Syntax Errors
Syntax errors, also known as parsing errors, are perhaps the most common kind of complaint you get while you are still learning Python:

<img src='images/syntax_error.png' />

The parser repeats the offending line and displays a little ‘arrow’ pointing at the earliest point in the line where the error was detected. The error is caused by (or at least detected at) the token preceding the arrow: in the example, the error is detected at the end of the `for` statement, since a colon (`:`) is missing. The file name and line number are printed so you know where to look in case the input came from a script. Syntax errors must be fixed before the Python code can be executed.

### Exceptions

A well-written program (without syntax errors) can still fail when it runs. Errors detected at run time are called exceptions. They can come from the program's logic or from errors in the data being analyzed. You may have encountered error messages when working with a Jupyter notebook, such as the error shown in the following screenshot:

![value_error](images/value_error.png)

The code above tries to add up a list of sales figures to get the total sales. Since the sales figures are stored as strings, each one is first converted with `float(s)`, which can't handle `$99` because of the `$`. Thus a `ValueError` was raised. The **Traceback** also indicates the line that caused the exception. Often this enables the error condition to be identified and corrected. In this case it is the string `$99` that caused the error. We can strip the `$` from `s` before converting to float: `total_sales += float(s.replace('$', ''))`
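
Since the original cell is only shown as a screenshot, the following sketch reconstructs the idea with a made-up list of sales figures (the values in `sales` are hypothetical) and applies the fix described above.

```python
# Hypothetical sales figures stored as strings; one includes a dollar sign
sales = ['100.50', '$99', '25']

total_sales = 0.0
for s in sales:
    # Remove the dollar sign before converting the string to a float
    total_sales += float(s.replace('$', ''))

print(total_sales)
```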

-----

### Handling Exceptions

Unhandled exceptions terminate a Python program unexpectedly. It is possible to write programs that handle selected exceptions. Python provides several statements that enable a programmer to gracefully handle exceptions as they arise. Formally, we use a `try` statement to identify a code block that might raise an exception. Following this code block, we have one or more `except` clauses that catch an exception and either try to fix the problem or log the error so that the problem can be corrected at a later time.
Thus, this exception handling mechanism is coded in the following manner:
```python
try:
    # Do something that might cause exceptions
except ExceptionName:
    # Do something to recover or log the problem when ExceptionName occurs
```
In the following Code cell, we use this exception handling mechanism to handle a division-by-zero error. We catch the `ZeroDivisionError` exception, set our answer to `None` since no valid value could be determined, and print a descriptive message (in a production system we would write a similar message to a log file).

In [8]:
# Simple program to demonstrate exception handling

numerator = 1.0
denominator = 0.0

try:
    answer = numerator / denominator    
except ZeroDivisionError:
    answer = None
    print('You tried to divide by zero!')

print(f'My Answer = {answer}')

You tried to divide by zero!
My Answer = None


-----

This simple example demonstrates the basic approach to handling exceptions. However, in general, we won't intentionally cause an exception to occur. Thus, we must be more defensive when writing code and place critical code inside a `try` code block, followed by any relevant exception handling blocks. When an exception occurs, the first matching `except` block handles it.

The Python language also provides two optional clauses for use in exception handling: the `else` clause and the `finally` clause. The `else` clause is executed only if the `try` block raises no exception. The `finally` clause, on the other hand, is always executed, and is generally used to perform any action that is required whether an exception occurred or not, such as closing a file.
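
A minimal sketch of how the `try`, `except`, `else`, and `finally` clauses fit together is shown below. The function name `safe_divide` and the printed messages are illustrative assumptions, not part of any library.

```python
def safe_divide(numerator, denominator):
    '''Return numerator / denominator, or None when dividing by zero.'''
    try:
        answer = numerator / denominator
    except ZeroDivisionError:
        # Runs only when the try block raises ZeroDivisionError
        answer = None
        print('You tried to divide by zero!')
    else:
        # Runs only when the try block raises no exception
        print('Division succeeded.')
    finally:
        # Always runs, whether or not an exception occurred
        print('Done.')
    return answer

print(safe_divide(1.0, 2.0))
print(safe_divide(1.0, 0.0))
```

The first call reaches the `else` clause and the second reaches the `except` clause, while the `finally` clause runs in both cases.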

-----

### Exception Handling Case Study

Dates and times are among the most common features in business-related datasets. They are normally stored as strings in different formats, e.g., `Mar 31, 2019`, `2019-03-31 13:00:00`, `31.03.2019`. We usually need to convert these strings to Python `datetime` objects. The Python `datetime` class has a method, [`strptime`](https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior), specifically for this task.
The following code converts `2019-03-31 13:00:00` to a `datetime` object. Note that the `datetime` module has many classes, of which we will use the `datetime` class. That's why we import the class with `from datetime import datetime`. It's a coincidence that the class name is the same as the module name. There are other useful date- and time-related classes in the module, e.g., `timedelta`.

In [9]:
from datetime import datetime
dt_str = '2019-03-31 13:00:00'
dt = datetime.strptime(dt_str, '%Y-%m-%d %H:%M:%S')
dt

datetime.datetime(2019, 3, 31, 13, 0)

In the code above, `%Y-%m-%d %H:%M:%S` defines the format of the datetime string:
- `%Y`: Four-digit year (year with century)
- `%m`: Month as a number
- `%d`: Day of the month
- `%H`: Hour on a 24-hour clock
- `%M`: Minute
- `%S`: Second

Please refer to the Python [datetime](https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior) documentation for the complete list of format codes.

If the datetime string doesn't match the format string, a `ValueError` exception is raised, as shown below:

![dt_value_error](images/datetime_value_error.png)
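
As a sketch of the same situation in code, the mismatch can be caught with a `try`-`except` block. The input string below is an illustrative choice and may differ from the one in the screenshot.

```python
from datetime import datetime

dt_str = '03/31/2019'  # Does not match the format string used below

try:
    dt = datetime.strptime(dt_str, '%Y-%m-%d %H:%M:%S')
except ValueError as err:
    # strptime raises ValueError when the string and format disagree
    dt = None
    print('Could not convert:', err)
```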

In the following Code cell, we create a function that converts datetime strings in several different formats to `datetime` objects, with the help of Python exception handling.

In [10]:
def get_datetime(dt_str):
    '''
    Convert dt_str to datetime object
    Return: datetime object
    '''
    try:
        #'2019-03-31 13:00:00'
        return datetime.strptime(dt_str, '%Y-%m-%d %H:%M:%S')
    except:
        pass
    try:
        #'03/31/2019'
        return datetime.strptime(dt_str, '%m/%d/%Y')
    except:
        pass
    try:
        #'Mar 31, 2019'
        return datetime.strptime(dt_str, '%b %d, %Y')
    except:
        pass
    try:
        #'31.03.2019'
        return datetime.strptime(dt_str, '%d.%m.%Y')
    except:
        pass

    print("Can't convert", dt_str)
    return None

In [11]:
get_datetime('2019-03-31 13:00:00')

datetime.datetime(2019, 3, 31, 13, 0)

In [12]:
get_datetime('03/31/2019')

datetime.datetime(2019, 3, 31, 0, 0)

In [13]:
get_datetime('Mar 31, 2019')

datetime.datetime(2019, 3, 31, 0, 0)

In [14]:
get_datetime('31.03.2019')

datetime.datetime(2019, 3, 31, 0, 0)

In [15]:
get_datetime('31-03-2019')

Can't convert 31-03-2019


In the example above, the `get_datetime(dt_str)` function takes one argument, `dt_str`, which is a datetime string, and converts it to a Python `datetime` object. Each try-except block deals with one datetime format. An `except` clause without an exception name catches all exceptions, which keeps the example short; naming the expected exception (here, `ValueError`) is generally safer because it does not hide unrelated errors. In this function, the exception handling code is simply `pass`, which means do nothing and continue with the next statement. The function handles four different datetime formats. If a datetime string matches one of them, the corresponding `datetime` object is returned. Otherwise, a message is printed and the function returns `None`.
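
The same idea can be written more compactly by looping over a list of format strings and catching only `ValueError`. This is an alternative sketch rather than the function used above; the names `DATETIME_FORMATS` and `parse_datetime` are hypothetical.

```python
from datetime import datetime

# The four formats handled by get_datetime
DATETIME_FORMATS = [
    '%Y-%m-%d %H:%M:%S',  # '2019-03-31 13:00:00'
    '%m/%d/%Y',           # '03/31/2019'
    '%b %d, %Y',          # 'Mar 31, 2019'
    '%d.%m.%Y',           # '31.03.2019'
]

def parse_datetime(dt_str):
    '''Try each known format in turn; return None if none matches.'''
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(dt_str, fmt)
        except ValueError:
            # This format did not match, so try the next one
            continue
    print("Can't convert", dt_str)
    return None
```

With this layout, supporting another format only requires adding one entry to the list.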

-----

<font color='red' size = '5'> Student Exercise </font>

Improve the function `get_datetime(dt_str)` above so that it can also handle the date string `'31-03-2019'`.

-----

-----

## Ancillary Information

The following links are to additional documentation that you might find helpful in learning this material. Reading these web-accessible documents is completely optional.

1. Python3 tutorial section on [modules](https://docs.python.org/3/tutorial/modules.html)

2. The RealPython site has a nice tutorial on [modules](https://realpython.com/python-modules-packages/#reloading-a-module)

3. Python3 Tutorial section on [errors](https://docs.python.org/3/tutorial/errors.html)

4. The book _Dive into Python_ provides a detailed discussion of [writing a Python program](http://www.diveintopython3.net/your-first-python-program.html)

5. The book _A Byte of Python_ has a section on [exceptions](https://python.swaroopch.com/exceptions.html)

-----

**&copy; 2019: Gies College of Business at the University of Illinois.**

This notebook is released under the [Creative Commons license CC BY-NC-SA 4.0][ll]. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.

[ll]: https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode