# Object-Oriented Programming in Python

Whether a programming language is object-oriented is usually judged along four main dimensions: [encapsulation](#encapsulation), [data abstraction](#data-abstraction), [inheritance](#inheritance), and [polymorphism](#polymorphism).

## Encapsulation

Encapsulation is the hiding of implementation details of an object. Such details include attributes and/or methods that are used in the internal implementation and manipulation of an object. The idea is to allow objects to be implemented and manipulated _without_ exposing exactly how they are implemented and manipulated.

As we discussed last week, there is a convention within the Python community that if something, say an attribute or method, is prefixed with an `_`, it’s “private” and shouldn’t be accessed from outside its class or module. The upshot is that, if you need to change a “private” attribute or method, you won’t end up breaking anyone’s code (unless they’re accessing those `_`-prefixed attributes or methods, in which case it’s their own fault if their code breaks).

Encapsulation in OOP serves the same purpose: if you need to change an implementation detail, you should be able to do so without breaking anyone’s code.

Python doesn’t enforce strong encapsulation, which is consistent with the nothing-is-truly-hidden design of the language. As [Guido](https://en.wikipedia.org/wiki/Guido_van_Rossum) once said: “we’re all consenting adults here”. Instead of strong encapsulation being built into the language, encapsulation emerges from the community’s agreement to honor `_`-prefixed attributes, methods, and variables as private.
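A minimal sketch of the convention, using a hypothetical `Account` class that isn’t part of the original material. Callers are meant to use `deposit()` and `balance()`; the `_balance_cents` attribute is an implementation detail (the balance is stored in cents rather than dollars), so it could later change without breaking those callers. Python still lets you read `_balance_cents` directly: the underscore is a signal, not a lock.

```python
class Account:
    """Hypothetical example class; not part of any library."""

    def __init__(self):
        # "Private" by convention: stored in cents to avoid float rounding drift
        self._balance_cents = 0

    def deposit(self, dollars):
        # Public interface: callers think in dollars, not cents
        self._balance_cents += round(dollars * 100)

    def balance(self):
        return self._balance_cents / 100


acct = Account()
acct.deposit(12.50)
print(acct.balance())       # 12.5
print(acct._balance_cents)  # 1250: still reachable, just not meant for callers
```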

## Data Abstraction

Data abstraction is closely related to, and in some ways made possible by, encapsulation. Data abstraction is the defining of classes, objects, attributes, and methods in terms of their interfaces and functionality as opposed to their actual implementation details.

Because we can hide implementation details via encapsulation, we can represent objects, and their properties and functions, in an abstracted form, not altogether unlike abstraction in art. Abstraction hides all but the relevant information about an object, reducing its complexity and simplifying its use; once abstracted, an object is a representation of its original, with unwanted or unnecessary details (i.e., implementation details) omitted.

For example, let’s say I’ve arranged to have coffee with someone I’ve never met before. We’ve agreed on a time and place, but that is not enough information for the person I’m meeting to find me. So, I send them a message that says “I’m the girl with black glasses and the rainbow scarf sitting in the back corner”. I _don’t_ send them a message that says “My last name is Adams and pigeons are my favorite animals”. The latter information won’t help the person identify me, while the former information will. “The girl with black glasses and a rainbow scarf” is an abstraction of me as a person, with the most effective details for recognizing me in a coffee shop represented. “The girl with the last name Adams whose favorite animals are pigeons” is another abstraction of me as a person, with the most effective details for _some other function_ represented.

Each object has many possible abstractions, and each of those abstractions will involve the representation of different information and details. When representing objects, methods, and attributes in code, you should always aim to represent the right details for a particular interface or piece of functionality in the right way.
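The coffee-shop example translates fairly directly into code. In this sketch (the `Person` class and its method names are hypothetical), one object holds every detail, and each method exposes only the abstraction suited to a particular job:

```python
class Person:
    """Hypothetical class holding everything we might know about someone."""

    def __init__(self, last_name, favorite_animal, glasses, scarf, seat):
        self.last_name = last_name
        self.favorite_animal = favorite_animal
        self.glasses = glasses
        self.scarf = scarf
        self.seat = seat

    def meetup_description(self):
        # Abstraction for "find me in a coffee shop": appearance and location only
        return ("I'm the one with {} glasses and the {} scarf, sitting in the {}."
                .format(self.glasses, self.scarf, self.seat))

    def small_talk_description(self):
        # A different abstraction of the same person, for a different purpose
        return ("My last name is {} and {} are my favorite animals."
                .format(self.last_name, self.favorite_animal))


me = Person('Adams', 'pigeons', 'black', 'rainbow', 'back corner')
print(me.meetup_description())
print(me.small_talk_description())
```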

## Inheritance

Inheritance refers to the “is a” relationship between objects. Take, for example, a library. (This is a pretty common example of inheritance in OOP, and I can’t quite remember where I heard/read it!)

A library has a set of assets that can be loaned to a set of patrons. These assets could include books, journals, audio recordings, and so on. Though each of these objects is an asset of the library, they aren’t identical. A book has an ISBN, while a journal has a DOI, and an audio recording has a play length. Because of these differences, each of these objects should be represented by its own class.

However, unless they can all _inherit_ from a more general class of library assets, we would have to re-define and re-implement the details that are common to them all in each class definition, such as: the title, the date of acquisition, the replacement cost, and whether the object is currently checked out or is available for checkout. Rather than duplicating the functionality and details common to all library assets, we can define a “superclass” or a “base class” that books, journals, audio recordings, and microfilm can all inherit common functionality and details from. (Inheritance is one of the best ways to keep your code D.R.Y.)

```python
from datetime import date, datetime, timedelta


class LibraryAsset(object):

    def __init__(self, title, acquisition_date,
                 publication_date, replacement_cost):
        self.title = title
        self.acquisition_date = acquisition_date
        self.publication_date = publication_date
        self.replacement_cost = replacement_cost

        # Initialize assets as available for checkout
        self.availability = True
        # Initialize the borrower as None
        self.borrower_id = None
        # Initialize due date as None
        self.due = None

    def check_availability(self):
        # some code that checks availability, perhaps by querying a database
        return self.availability

    def check_overdue(self):
        # An asset that isn't checked out (due is None) can't be overdue
        return self.due is not None and self.due < datetime.now()

    def check_out(self, borrower_id):
        if self.check_availability():
            self.availability = False
            self.borrower_id = borrower_id
            self.due = datetime.now() + timedelta(days=30)

    def check_in(self):
        self.borrower_id = None
        self.due = None
        self.availability = True


class Book(LibraryAsset):

    def __init__(self, title, acquisition_date, publication_date,
                 replacement_cost, author, isbn, pages):
        super().__init__(title, acquisition_date,
                         publication_date, replacement_cost)
        self.author = author
        self.isbn = isbn
        self.pages = pages

    def __len__(self):
        # Lets len(book) return the page count
        return self.pages


class Journal(LibraryAsset):

    def __init__(self, title, acquisition_date, publication_date,
                 replacement_cost, doi, issue, pages):
        super().__init__(title, acquisition_date,
                         publication_date, replacement_cost)
        self.doi = doi
        self.issue = issue
        self.pages = pages


class AudioRecording(LibraryAsset):

    def __init__(self, title, acquisition_date, publication_date,
                 replacement_cost, recording_date, play_length):
        super().__init__(title, acquisition_date,
                         publication_date, replacement_cost)
        self.recording_date = recording_date
        self.play_length = play_length


class BareClass(LibraryAsset):
    # No __init__ of its own, so LibraryAsset.__init__ is used

    def return_title(self):
        return self.title
```

```python
new_book = Book(title='I have no init!', acquisition_date=1, publication_date=0,
                replacement_cost=100, author='Eva', isbn=12345, pages=500)
len(new_book)
```

```
500
```

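Calling `super()` inside `Book.__init__` returns a proxy object that delegates to the parent class, `LibraryAsset`, so `LibraryAsset.__init__` sets up the attributes all assets share before `Book.__init__` adds `author`, `isbn`, and `pages`.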

Now when we create a new instance of the `Book` class, it automatically has the methods `check_availability()`, `check_overdue()`, `check_out()`, and `check_in()`, even though we never defined them in the `Book` class itself.

```python
revolutionary_road = Book(title='Revolutionary Road',
                          author='Richard Yates',
                          isbn='0-375-70844-8',
                          pages=337,
                          acquisition_date=date(2015, 6, 30),
                          publication_date=date(1961, 12, 31),
                          replacement_cost='$14.00')
```

```python
print(revolutionary_road.availability)  # True
len(revolutionary_road)                 # 337
```

```python
revolutionary_road.check_out(borrower_id=1)
revolutionary_road.availability
```

```
False
```

```python
revolutionary_road.due
```

```
datetime.datetime(2019, 10, 10, 15, 22, 2, 390800)
```
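The “is a” relationship can also be checked directly. This sketch uses a trimmed-down, self-contained stand-in for the classes above (only `title` and `availability` are kept). It shows that `isinstance()` and `issubclass()` recognize a `Book` as a `LibraryAsset`, and that `Book.__mro__` (the method resolution order) lists the classes Python searches when you call an inherited method such as `check_out()`.

```python
class LibraryAsset:
    """Trimmed-down stand-in for the LibraryAsset class above."""

    def __init__(self, title):
        self.title = title
        self.availability = True

    def check_out(self):
        self.availability = False


class Book(LibraryAsset):
    pass  # inherits __init__ and check_out unchanged


book = Book('Revolutionary Road')
print(isinstance(book, LibraryAsset))          # True: a Book "is a" LibraryAsset
print(issubclass(Book, LibraryAsset))          # True
print([cls.__name__ for cls in Book.__mro__])  # ['Book', 'LibraryAsset', 'object']
book.check_out()          # not defined on Book, so found on LibraryAsset
print(book.availability)  # False
```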

## Polymorphism

In OOP, polymorphism refers to a situation in which a single method name is used to refer to multiple methods, each with slightly different functionality. This is usually the most difficult of the OOP principles to fully grasp at the beginning.

For example, the `+` operator can operate on integers, floats, strings, etc. While the `+` symbol never changes, what it does depends on the types of the operands it receives.

With respect to inheritance, we can override the `check_out()` method in each of our child classes to give it slightly different behavior. Let’s say journals cannot be checked out overnight.

```python
class Journal(LibraryAsset):

    def __init__(self, title, acquisition_date, publication_date,
                 replacement_cost, volume, issue, pages):
        super().__init__(title, acquisition_date,
                         publication_date, replacement_cost)
        self.volume = volume
        self.issue = issue
        self.pages = pages

    def check_out(self, borrower_id):
        was_available = self.check_availability()
        # Reuse the base class's checkout logic...
        super().check_out(borrower_id)
        # ...then, only if the checkout succeeded, make it due at the very end of today
        if was_available:
            self.due = datetime.combine(date.today(), datetime.max.time())
```

Now, the `Journal` class’s `check_out()` method runs the `check_out()` method of the base `LibraryAsset` class, but then overrides the `due` attribute to be the very end of the current day (23:59:59.999999, just before midnight) instead of 30 days from checkout.

```python
nature = Journal(title='Nature Plants',
                 acquisition_date=date(2015, 1, 9),
                 publication_date=date(2015, 1, 8),
                 replacement_cost='$5.00',
                 volume=1,
                 issue=1,
                 pages=108)
```

```python
nature.check_out(borrower_id=1)
nature.due
```

```
datetime.datetime(2019, 9, 10, 23, 59, 59, 999999)
```

We could also define a method `calculate_overdue_fine()` on each child class that returns a different fine for the amount of time the item is overdue, based on the type of object it is (a.k.a., based on what class it is).

Polymorphism can allow you to write functionality that is agnostic to the type of object that it operates on.

```python
def send_fine_email(item, borrower_id):
    # calculate_overdue_fine() is assumed to be defined on each asset class
    email_body = 'Dear patron, you owe %s in overdue fines for %s.' % (
        item.calculate_overdue_fine(), item.title)
    # some code to actually send the email
```

Without polymorphism of the method `calculate_overdue_fine()`, this function would be much less flexible:

```python
def send_fine_email(item, borrower_id):
    email_body_format = 'Dear patron, you owe {0} in overdue fines for {1}.'
    if isinstance(item, Book):
        email_body = email_body_format.format(
            item.calculate_book_overdue_fine(), item.title)
    elif isinstance(item, Journal):
        email_body = email_body_format.format(
            item.calculate_journal_overdue_fine(), item.title)
    elif isinstance(item, AudioRecording):
        email_body = email_body_format.format(
            item.calculate_audio_recording_overdue_fine(), item.title)
    # some code to actually send the email
```

This type-checking may not seem overly burdensome in an example with only three types of library assets, but imagine if there were hundreds of types of library assets.
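To make the `calculate_overdue_fine()` idea concrete, here is a self-contained sketch. The fine rules, the `days_overdue()` helper, and `fine_email_body()` (a stand-in for `send_fine_email()` that returns the message instead of sending it) are all hypothetical. Each subclass defines its own `calculate_overdue_fine()`, so the email function never checks types.

```python
from datetime import datetime, timedelta


class LibraryAsset:
    """Trimmed-down stand-in for the LibraryAsset class above."""

    def __init__(self, title, due):
        self.title = title
        self.due = due

    def days_overdue(self, today):
        # Whole days past the due date, never negative
        return max((today - self.due).days, 0)

    def calculate_overdue_fine(self, today):
        raise NotImplementedError('each asset type defines its own fine')


class Book(LibraryAsset):
    def calculate_overdue_fine(self, today):
        # Hypothetical rule: $0.25 per day
        return 0.25 * self.days_overdue(today)


class Journal(LibraryAsset):
    def calculate_overdue_fine(self, today):
        # Hypothetical rule: $1.00 per day, capped at $10.00
        return min(1.00 * self.days_overdue(today), 10.00)


def fine_email_body(item, today):
    # No isinstance() checks: each item knows how to compute its own fine
    return 'Dear patron, you owe ${:.2f} in overdue fines for {}.'.format(
        item.calculate_overdue_fine(today), item.title)


today = datetime(2019, 9, 20)
items = [Book('Revolutionary Road', due=today - timedelta(days=4)),
         Journal('Nature Plants', due=today - timedelta(days=30))]
for item in items:
    print(fine_email_body(item, today))
# Dear patron, you owe $1.00 in overdue fines for Revolutionary Road.
# Dear patron, you owe $10.00 in overdue fines for Nature Plants.
```

Adding a new asset type means writing one new subclass; `fine_email_body()` doesn’t change.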

Returning to the `+` operator: integers, floats, and strings can all be operated on with `+`, but the behavior differs by type. This is because `+` (roughly) calls the `__add__` method of its left operand, and each type implements `__add__` in its own way; in other words, `__add__` is polymorphic. If this weren’t true, we would either need a different version of the `+` operator for each of these three types of objects, or the underlying code that provides the functionality for `+` would resemble the type-checking `send_fine_email()` above, with different behavior specified in `if` and `elif` clauses.
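A short sketch of this dispatch. Python evaluates `a + b` roughly as `a.__add__(b)`, falling back to `b.__radd__(a)` if the first call returns `NotImplemented`. The `Money` class is hypothetical; it opts in to `+` simply by defining `__add__`.

```python
# a + b is (roughly) a.__add__(b); each built-in type supplies its own version
print((1).__add__(2))       # 3
print('ab'.__add__('cd'))   # abcd
print([1].__add__([2]))     # [1, 2]


class Money:
    """Hypothetical class that supports + by defining __add__."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # lets Python try other.__radd__, then raise TypeError
        return Money(self.cents + other.cents)

    def __repr__(self):
        return 'Money({})'.format(self.cents)


print(Money(150) + Money(275))  # Money(425)
```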

## Python Classes and OOP

You may have read or been told at some point that everything in Python is an object. All this means, in Python at least, is that everything can be assigned to a variable and/or passed as an argument to a function. Modules, functions, classes—they’re all objects, and most of them have methods and attributes.
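A quick sketch of what “everything is an object” means in practice; `shout()` is a hypothetical function used only for illustration. A function can be assigned to another name, passed to another function, and inspected for attributes, just like a module or a class.

```python
import math


def shout(text):
    return text.upper() + '!'


# Functions can be assigned to variables...
yell = shout
print(yell('hi'))                     # HI!

# ...passed as arguments to other functions...
print(list(map(shout, ['a', 'b'])))   # ['A!', 'B!']

# ...and they carry attributes, just like modules and classes do
print(shout.__name__, math.__name__)  # shout math
print(type(shout), type(math), type(int))
# <class 'function'> <class 'module'> <class 'type'>
```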

Classes are a fundamental part of the Python language. In this section, we’re going to discuss what classes are, when to use them, and how they can be useful.

### What is a class?

At its most basic, a class is a grouping of data and functions. When defined within a class, these data are usually referred to as attributes, and these functions are usually referred to as methods. In theory, we could group any attributes and functions we wanted together into a class, and this would technically be object-oriented programming. In practice, however, object-oriented programming means grouping attributes and functions according to logical connections between things, leveraging the four principles above to do so.

The `class` keyword is used to define classes, as the `def` keyword is used to define functions. When defining a class, you are designing a sort of blueprint for creating objects of that class. You are not actually creating objects of that class.

To create an object of a class, you call the class name as if it were a function, passing whatever arguments its `__init__` method expects (other than `self`). Python creates a new instance, passes it to `__init__` along with your arguments, and returns the initialized instance, which we can assign to a variable.

#### What is `self`?

The `self` argument in methods refers to the instance of the class. When defining a method `def some_method(self, arg)` inside a class, we are defining a function that applies to all instances of our class; its first argument is the particular instance the method is called on, which Python supplies automatically. To clarify, `some_instance.some_method(arg)` is the same as `SomeClass.some_method(some_instance, arg)`, though you’ll rarely see the latter. Both execute the method `some_method` on the instance `some_instance` of the class `SomeClass`.
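A small sketch of both points, using a hypothetical `Greeter` class: calling the class creates an instance and runs `__init__` on it, and calling a method through the instance is equivalent to calling it through the class with the instance passed explicitly as `self`.

```python
class Greeter:
    """Hypothetical class used only to illustrate self."""

    def __init__(self, name):
        # Greeter('Ada') creates a new instance and passes it here as self
        self.name = name

    def greet(self, greeting):
        return '{}, {}!'.format(greeting, self.name)


ada = Greeter('Ada')

# These two calls are equivalent: the instance becomes the first argument
print(ada.greet('Hello'))           # Hello, Ada!
print(Greeter.greet(ada, 'Hello'))  # Hello, Ada!
```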

#### Instance Attributes versus Class Attributes

Instance attributes are defined at, well, the instance level, and are usually defined within the class’s `__init__` method.
```python
def __init__(self, arg):
    self.arg = arg  # self.arg is an attribute of the instance self
```

If all instance attributes are defined within a class’s `__init__` method, new instances of that class are said to be fully initialized; otherwise, they are said to be partially initialized. If you find yourself writing documentation telling users that they must call one method (which sets some instance attributes) before they can call another method (which uses those attributes), new instances of your class are partially initialized. Sometimes this is okay, but consider whether you could fully initialize your class so that users don’t need to know which methods must be called before which others.
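A minimal contrast between the two, with hypothetical `PartialReport` and `FullReport` classes (the “loading” is a stand-in for reading `path`). The partially initialized version only works if callers remember to call `load()` first; the fully initialized version does that work in `__init__`.

```python
class PartialReport:
    """Partially initialized: callers must call load() before summary()."""

    def __init__(self, path):
        self.path = path

    def load(self):
        # Stand-in for actually reading self.path
        self.rows = ['row 1', 'row 2']

    def summary(self):
        return len(self.rows)  # AttributeError if load() hasn't run yet


class FullReport:
    """Fully initialized: every attribute exists once __init__ returns."""

    def __init__(self, path):
        self.path = path
        self.rows = self._load()  # loading is part of initialization

    def _load(self):
        # Stand-in for actually reading self.path
        return ['row 1', 'row 2']

    def summary(self):
        return len(self.rows)


try:
    PartialReport('data.csv').summary()
except AttributeError as err:
    print('PartialReport failed:', err)

print(FullReport('data.csv').summary())  # 2
```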

There are also class attributes that are defined at the class level.

```python
class Life(object):
    meaning = 42  # meaning is an attribute of the class Life
```

You can access the `meaning` attribute with or without creating an instance of the class `Life`. Class attributes are useful for defining attributes that hold for all instances of the class. In this case, no matter what kind of `Life` instance we have, its meaning is always 42.

```python
janes_life = Life()
janes_life.meaning  # 42
```
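Here is a sketch of both ways to reach a class attribute, plus one behavior worth knowing: assigning to the attribute *through an instance* creates a new instance attribute that shadows the class attribute for that instance only. The class attribute and every other instance are unaffected.

```python
class Life(object):
    meaning = 42  # class attribute, shared by every instance


print(Life.meaning)        # 42: no instance needed
janes_life = Life()
johns_life = Life()
print(janes_life.meaning)  # 42: not found on the instance, so looked up on the class

# Assignment through an instance creates an instance attribute that shadows the class one
johns_life.meaning = 0
print(johns_life.meaning, janes_life.meaning, Life.meaning)  # 0 42 42
```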

#### Instance Methods versus Static Methods versus Class Methods

As with attributes, there are both instance methods and class methods. There are also static methods. Static and class methods are most often used in connection with [inheritance](#inheritance).

##### Static Methods

Like class attributes, static methods are accessible even if no instance of the class has been created. In fact, static methods don’t have access to instances of the class (a.k.a., `self`s). Static methods are denoted with the `@staticmethod` decorator.

```python
class Life(object):

    @staticmethod
    def the_meaning_of():
        return 42
```

```python
Life.the_meaning_of()  # 42
```

##### Class Methods

Instance methods take `self` (a.k.a., an instance) as the first argument, while class methods take the class itself as the first argument. Like class attributes and static methods, class methods are accessible even if no instance of the class has been created. Also like static methods, class methods are denoted with a decorator: `@classmethod`.

```python
class Life(object):

    meaning = 42  # class attribute

    @classmethod
    def has_meaning(cls):
        return cls.meaning == 42
```

```python
Life.has_meaning()  # True
```
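One common reason class methods show up alongside inheritance is the “alternative constructor” pattern. Because `cls` is whichever class the method was called on, a subclass automatically gets instances of itself. This sketch uses a hypothetical, trimmed-down `LibraryAsset`; `acquired_today()` and `is_valid_title()` are illustrative names, not part of the classes above.

```python
from datetime import date


class LibraryAsset(object):
    """Hypothetical, trimmed-down asset class."""

    def __init__(self, title, acquisition_date):
        self.title = title
        self.acquisition_date = acquisition_date

    @classmethod
    def acquired_today(cls, title):
        # Alternative constructor: cls is whatever class this is called on
        return cls(title, date.today())

    @staticmethod
    def is_valid_title(title):
        # Uses neither self nor cls; it just lives in the class's namespace
        return bool(title.strip())


class Book(LibraryAsset):
    pass


book = Book.acquired_today('Revolutionary Road')
print(type(book).__name__)                 # Book, not LibraryAsset
print(LibraryAsset.is_valid_title('   '))  # False
print(book.is_valid_title('Dune'))         # True: also callable on an instance
```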

### When do I use a class?

It is always up to you, and you will develop a feel for this as you read and write more code, but here are a few circumstances in which you should consider using classes:

1. You have multiple, closely related functions that share state, and you’re passing and forwarding many arguments to and from each function.
	+ Because each method in a class has access to all instance and class attributes, via `self` or `cls`, you don’t have to pass these attributes as arguments to each method.
2. You have more than one copy of the same state variable(s).
	+ Each instance of a class has its own fully-defined state, which means instances don’t have to share the same state variables. Especially because Python is a [pass-by-object-reference](#pass-by-object-reference) language (which we’ll get to next), sharing state variables isn’t a great idea.
3. You’re using a library (for example, `unittest` or `sqlalchemy`), where the classes are meant to be sub-classed and extended in normal use.
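Point 2 is easiest to see with a hypothetical counter. A function-based version has to keep its state somewhere shared, such as a module-level global, so there can only be one count. A class gives every instance its own independent copy of that state.

```python
# Function-based version: one shared, module-level state variable
_count = 0


def increment():
    global _count
    _count += 1
    return _count


# Class-based version: each instance carries its own, independent state
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


page_views, downloads = Counter(), Counter()
page_views.increment()
page_views.increment()
downloads.increment()
print(page_views.count, downloads.count)  # 2 1
```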

More broadly, you should use classes to minimize the coupling between individual parts of your code _without_ sacrificing the D.R.Y.ness of your code (i.e., without duplicating code).

Type checking, like the example at the end of the section on [polymorphism](#polymorphism), is often a sign that parts of your code are too tightly coupled together. Every time we wanted to add a new type of library asset, we would have to add an `elif` clause to _every_ function that varied with the type of asset passed to it. This makes extending your code to new use cases painful, and if extending your code to new use cases is painful, you won’t do it, and your code will start gathering dust, so to speak.

#### Abstract Base Classes

The standard-library [`abc`](https://docs.python.org/2/library/abc.html) module provides support for defining abstract base classes. We won’t go into much depth here, but you can read about why abstract base classes were added to Python in [PEP 3119](https://www.python.org/dev/peps/pep-3119/). Unlike ordinary base classes, which you _can_ create instances of, an abstract base class with unimplemented abstract methods cannot be instantiated; only subclasses that implement all of those methods can. Sometimes disallowing instantiation is useful or even necessary.
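A minimal sketch with the `abc` module, assuming a hypothetical `LibraryAsset` that requires every concrete asset type to define `calculate_overdue_fine()`. Instantiating the abstract class (or a subclass that forgets the method) raises `TypeError`.

```python
from abc import ABC, abstractmethod


class LibraryAsset(ABC):
    """Hypothetical abstract base: every concrete asset must define a fine rule."""

    def __init__(self, title):
        self.title = title

    @abstractmethod
    def calculate_overdue_fine(self, days_overdue):
        """Return the fine owed for the given number of days overdue."""


class Book(LibraryAsset):
    def calculate_overdue_fine(self, days_overdue):
        return 0.25 * days_overdue  # hypothetical rate


try:
    LibraryAsset('Untitled')
except TypeError as err:
    print('Cannot instantiate:', err)

print(Book('Dune').calculate_overdue_fine(4))  # 1.0
```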

### Example: Python client for the Enigma API

Below is a very simple Python client for the Enigma API that provides an example of a Python class.

```python
import os
from urllib.parse import urlencode

import requests

API_KEY = os.environ.get('ENIGMA_API_KEY')
API_ENDPOINT = 'https://api.enigma.io'
API_VERSION = 'v2'


class EnigmaAPI(object):

    '''A very simple client for the Enigma API.'''

    _param_mapping = {
        'meta': ['page'],
        'data': ['limit', 'select', 'search',
                 'where', 'conjunction', 'sort', 'page'],
        'stats': ['select', 'operation', 'by', 'of', 'limit',
                  'search', 'where', 'conjunction', 'sort', 'page'],
        'export': ['select', 'search', 'where', 'conjunction', 'sort']
    }

    def __init__(self, client_key=API_KEY, endpoint=API_ENDPOINT, version=API_VERSION):
        self.client_key = client_key
        self.endpoint = endpoint
        self.version = version

    def __repr__(self):
        return '<EnigmaAPI(endpoint={endpoint}, version={version})>'.format(
            endpoint=self.endpoint, version=self.version)

    def _check_query_params(self, accepted_params, **kwargs):
        '''Returns True if the provided parameters are a subset of the accepted
        parameters for the endpoint, one of 'meta', 'data', 'stats', or
        'export', else returns False.

        ARGUMENTS
        ---------
        accepted_params     : a list of strings
        **kwargs            : a dictionary of keyword arguments corresponding to
                              provided query parameters and values
        '''
        return set(kwargs) <= set(accepted_params)

    def _url_for_datapath(self, resource, datapath, **kwargs):
        '''Returns a string corresponding to the requested URL with any query
        parameters appended in the `?k=v&k=v` pattern.

        ARGUMENTS
        ---------
        resource            : a string corresponding to the endpoint (one of
                              'meta', 'data', 'stats', 'export')
        datapath            : a string corresponding to the dataset requested
        **kwargs            : a dictionary of keyword arguments corresponding
                              to the provided query parameters and values
        '''
        url = '/'.join([self.endpoint, self.version, resource,
                        self.client_key, datapath])
        if kwargs:
            # urlencode() converts values to strings and escapes special characters
            url += '?' + urlencode(kwargs)
        return url

    def request(self, resource, datapath, **kwargs):
        '''Returns an HTTP response as decoded JSON, or an error message string
        if the response status code isn't 200.

        ARGUMENTS
        ---------
        resource            : a string corresponding to the endpoint (one of
                              'meta', 'data', 'stats', 'export')
        datapath            : a string corresponding to the dataset requested
        **kwargs            : a dictionary of keyword arguments corresponding
                              to the provided query parameters and values
        '''
        accepted_params = self._param_mapping[resource]
        if not self._check_query_params(accepted_params, **kwargs):
            invalid = sorted(set(kwargs) - set(accepted_params))
            raise ValueError('Invalid query parameters for %s: %s' % (resource, invalid))
        self.request_datapath = self._url_for_datapath(
            resource, datapath, **kwargs)
        # Network failures raise requests.exceptions.RequestException
        res = requests.get(self.request_datapath)
        if res.status_code != 200:
            return 'Request returned with status code: %s.' % res.status_code
        return res.json()

    def get_data(self, datapath, **kwargs):
        '''Returns an HTTP response from the data endpoint as decoded JSON.

        ARGUMENTS
        ---------
        datapath            : a string corresponding to the dataset requested
        **kwargs            : a dictionary of keyword arguments corresponding
                              to the provided query parameters and values
        '''
        return self.request('data', datapath, **kwargs)

    def get_metadata(self, datapath, **kwargs):
        '''Returns an HTTP response from the metadata endpoint as decoded JSON.

        ARGUMENTS
        ---------
        datapath            : a string corresponding to the dataset requested
        **kwargs            : a dictionary of keyword arguments corresponding
                              to the provided query parameters and values
        '''
        return self.request('meta', datapath, **kwargs)

    def get_stats(self, datapath, **kwargs):
        '''Returns an HTTP response from the stats endpoint as decoded JSON.

        ARGUMENTS
        ---------
        datapath            : a string corresponding to the dataset requested
        **kwargs            : a dictionary of keyword arguments corresponding
                              to the provided query parameters and values
        '''
        return self.request('stats', datapath, **kwargs)

    def get_export(self, datapath, **kwargs):
        '''Returns an HTTP response from the export endpoint as decoded JSON.

        ARGUMENTS
        ---------
        datapath            : a string corresponding to the dataset requested
        **kwargs            : a dictionary of keyword arguments corresponding
                              to the provided query parameters and values
        '''
        return self.request('export', datapath, **kwargs)
```

**QUESTIONS:**
+ Are new instances of the `EnigmaAPI` class fully or partially initialized?
+ What, if any, class attributes and class methods are there? Static methods?
+ In what ways does it or doesn’t it embody the four principles of object-oriented programming?

# Object Passing in Python

Chances are, if you don’t understand object passing in Python, eventually you’re going to write some code that behaves in a very unexpected way and you’re going to have a hard time figuring out why.

When you assign a variable to another variable in Python, what happens? Does each variable refer to a different object, or do both variables refer to the same object?

The two most common approaches to object passing are pass-by-reference and pass-by-value. Python is neither; Python is pass-by-object-reference. Okay, so how are object references passed? They’re passed by value.

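In the following scenarios, imagine that a variable is a box in which an object resides. Let’s say that box is named `list`, and the object inside the box is the list `[1, 2, 3]`. (Naming a variable `list` shadows Python’s built-in `list` type, which is fine for this illustration but best avoided in real code.)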

## Pass-by-reference

In a pass-by-reference setup, the box (a.k.a., the variable) itself is passed into a function, and its contents (a.k.a., the object represented by the variable) come with it. Anything that is done to the variable or the object represented by the variable within the function is visible to whatever called the function (which is referred to, simply, as the “caller”).

```python
>>> list = [1, 2, 3]
>>> def do_nothing(list):
...     list  # refers to the box named list and its contents
>>> do_nothing(list)
>>> print(list)  # refers to the same box named list and its contents
[1, 2, 3]
```

```python
>>> def add_item(list):
...     list.append(len(list) + 1)  # refers to the box named list and its contents
>>> add_item(list)
>>> print(list)  # refers to the same box named list and its contents
[1, 2, 3, 4]
```

## Pass-by-value

In a pass-by-value setup, a copy (stored at a different location in memory than the original) of the box and its contents is passed into a function. Anything that is done to the copy of the variable and the copy of the object represented by the variable within the function has no impact on the original variable and object.

```python
>>> def do_nothing(list):
...     list  # refers to a different box named list and its contents
>>> do_nothing(list)
>>> print(list)  # refers to the original box named list and its contents
[1, 2, 3]
```

```python
>>> def zero_out(list):
...     list = [0]  # refers to a different box named list and its contents
>>> zero_out(list)
>>> print(list)  # refers to the original box named list and its contents
[1, 2, 3]

>>> def add_item(list):
...     list.append(len(list) + 1)  # refers to a different box named list and its contents
>>> add_item(list)
>>> print(list)  # refers to the original box named list and its contents
[1, 2, 3]
```

## Pass-by-object-reference

Python is neither of these. In a pass-by-object-reference setup, a function receives a _reference to_ the same object (i.e., the original contents, the list `[1, 2, 3]`, at the same location in memory) as the caller, but it receives a _different_ box (i.e., not the original box named `list`). In essence, the object `[1, 2, 3]` represented by the original variable `list` is _simultaneously represented by_ another variable `list`. When passed a box, a function will provide its own box to represent the contents of the box that was passed to it.

```python
def add_item(list):
    list.append(len(list) + 1)  # refers to a different box named list representing the contents of the original box
```

```python
list = [1, 2, 3]
```

```python
add_item(list)
print(list)  # [1, 2, 3, 4]
```

```python
def zero_out(list):
    list = [0]  # points the function's own box at a new object; the caller's box is untouched
zero_out(list)
print(list)  # [1, 2, 3, 4]
```

In the `zero_out()` function, reassigning the function's box `list` to a different object, `[0]`, had no effect on the caller's `list`. In pass-by-object-reference, the caller doesn't care if you reassign a function's box. Different boxes, same content.
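One way to watch this happen is the built-in `id()`, which returns an integer identifying an object (unique among objects alive at the same time). In this sketch, with hypothetical function names, the parameter starts out pointing at the caller’s object; rebinding points it somewhere else, while mutating changes the shared object.

```python
def rebind_and_report(items):
    before = id(items)  # same object the caller passed in
    items = [0]         # rebinds only the function's own name (its own box)
    after = id(items)
    return before, after


def mutate(items):
    items.append(99)    # changes the shared object, so the caller sees it


numbers = [1, 2, 3]
before, after = rebind_and_report(numbers)
print(before == id(numbers))  # True: the function received the caller's object
print(after == id(numbers))   # False: after rebinding, it referred to a new list
mutate(numbers)
print(numbers)                # [1, 2, 3, 99]
```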



## `copy()`

When you assign a variable to another variable, you’re creating a different box containing the same object that the original variable represents. Often, this is not what you want; mutations on your newly assigned variable wind up reflected in the original variable as well. (Remember: different boxes, same content.)

To get around this, you can use the standard-library [`copy` module](https://docs.python.org/2/library/copy.html). “For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other.”

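```python
list_a = ['A', 'B', 'C']
list_b = list_a
list_b.append('D')
list_a  # ['A', 'B', 'C', 'D']: list_a and list_b are the same object
```

```python
import copy

list_a = ['A', 'B', 'C']
list_b = copy.copy(list_a)
list_b.append('D')
list_a  # ['A', 'B', 'C']
```

```python
list_b  # ['A', 'B', 'C', 'D']
```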

> **NOTE:** Dictionaries and sets (and, in Python 3, lists) have a `copy()` method that doesn't require the `copy` module. Like `copy.copy()`, it makes a shallow copy.

```python
d = {'name': 'Jane', 'id': 1}
d2 = d.copy()
d2['gender'] = 'Female'
d  # {'name': 'Jane', 'id': 1}
```

```python
d2  # {'name': 'Jane', 'id': 1, 'gender': 'Female'}
```
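Both `copy.copy()` and the `copy()` methods make *shallow* copies: the outer container is new, but any mutable items inside it are still shared with the original. The `copy` module’s `deepcopy()` copies those nested objects as well. A small sketch with a nested list:

```python
import copy

matrix = [[1, 2], [3, 4]]

shallow = copy.copy(matrix)   # new outer list, same inner lists
deep = copy.deepcopy(matrix)  # new outer list and new inner lists

matrix[0].append(99)

print(shallow[0])  # [1, 2, 99]: the inner list is shared with matrix
print(deep[0])     # [1, 2]: fully independent
print(shallow is matrix, shallow[0] is matrix[0])  # False True
```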

# Pandas, Continued

This section will consist mostly of examples and focus less on writing idiomatic Python (which the first and last sections of today focus on). As much as possible, you should feel that you don’t have to reinvent the wheel to do things in Python that have a dedicated library or set of functions in R. Chances are, Python and/or Pandas has implemented it. The Pandas documentation is very good, and you can google any of the methods in this section and find their documentation easily.

## Pivoting, Reshaping, and Melting

A common pattern in my team’s workflow is to go from a record-like format for doing transformations and summary statistics, to a melted format for plotting.

```python
import numpy as np
import pandas as pd

# Create a fake dataframe for us to play with, with an index of timestamps
rng = pd.date_range('1/1/2015', periods=10, freq='H')
df = pd.DataFrame(data=np.random.randn(10, 5), index=rng, columns=['A', 'B', 'C', 'D', 'E'])

# Melt the DataFrame
df = df.reset_index()
melt_df = pd.melt(df, id_vars=['index'])
melt_df.head()
```

```python
# "Unmelt" the dataframe (a.k.a., pivot it back to what it was)
unmelt_df = melt_df.pivot(index='index', columns='variable', values='value')
unmelt_df.head()
```
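To check that `melt()` and `pivot()` really are inverses, here is a small sketch on a tiny, made-up wide table. One detail: `pivot()` names the column axis after the `columns=` argument (`'variable'`), so that name is dropped before comparing.

```python
import pandas as pd

wide_df = pd.DataFrame({'day': ['mon', 'tue', 'wed'],
                        'A': [1.0, 2.0, 3.0],
                        'B': [4.0, 5.0, 6.0]})

long_df = pd.melt(wide_df, id_vars=['day'])  # columns: day, variable, value
back = long_df.pivot(index='day', columns='variable', values='value')
back = back.rename_axis(columns=None)        # drop the 'variable' axis name

# Raises an AssertionError if the round trip changed anything
pd.testing.assert_frame_equal(back, wide_df.set_index('day'))
print(long_df.shape)  # (6, 3)
```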

## `apply()`, `map()`, and `filter()`

These I also use a lot: to convert columns containing strings to dates, to truncate dates to weeks or quarters, manually create categorical variables, filter out certain groups, etc.

```python
df['date_str'] = df['index'].apply(lambda x: str(x.date()))

def f(x):
    return x**2

df['E^2'] = df['E'].apply(f)  # passing a named function instead of a lambda function
df.head()
```

```python
num_df = pd.Series([1, 2, 3, 4, 5], index=['one', 'two', 'three', 'four', 'five'])
name_df = pd.Series(['uno', 'dos', 'tres', 'cuatro', 'cinco'], index=[1, 2, 3, 4, 5])

print(num_df)
print(name_df)
```

```python
num_df.map(name_df)  # replaces each value of num_df with the name_df entry at that index label
```
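Besides another Series, `Series.map()` also accepts a dict or a function. In this sketch, values with no matching dict key become `NaN`, and a function is applied to each element:

```python
import pandas as pd

nums = pd.Series([1, 2, 3, 4, 5], index=['one', 'two', 'three', 'four', 'five'])

# A dict works like a lookup table; values missing from it become NaN
mapped = nums.map({1: 'uno', 2: 'dos', 3: 'tres'})
print(mapped.tolist())  # ['uno', 'dos', 'tres', nan, nan]

# A function is applied element-wise
print(nums.map(lambda x: x * 10).tolist())  # [10, 20, 30, 40, 50]
```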

```python
a_idx = melt_df.index[melt_df['variable'] == 'A']  # index labels where variable == 'A'
```
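Despite its name, `DataFrame.filter()` doesn’t filter on values: it selects rows or columns by *label* (`items=`, `like=`, or `regex=`). To drop whole groups based on their values, `GroupBy.filter()` is the tool. A small sketch on a made-up frame shaped like `melt_df`:

```python
import pandas as pd

melt_df = pd.DataFrame({'variable': ['A', 'A', 'B', 'B'],
                        'value': [1.0, 3.0, -2.0, -4.0]})

# DataFrame.filter() matches column names, not cell values
print(melt_df.filter(like='val').columns.tolist())  # ['value']

# GroupBy.filter() keeps or drops entire groups based on a condition
positive = melt_df.groupby('variable').filter(lambda g: g['value'].mean() > 0)
print(positive['variable'].unique().tolist())  # ['A']
```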

## Discretization and Binning

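Continuous values can be discretized using `cut()` and `qcut()`. `cut()` splits the range of the values into equal-width bins, while `qcut()` produces bins based on sample quantiles, so that each bin holds roughly the same number of observations.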

In [None]:
pd.cut(df['E^2'], 10)

In [None]:
pd.qcut(df['E^2'], [0, .25, .5, .75, 1])

In [None]:
factor = pd.cut(df['E^2'], 10)
factor.value_counts().sort_index()  # histogram counts by bin, in bin order
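The difference between the two is easiest to see on data with an outlier. In this toy example, `cut()` puts nearly everything in the first of three equal-width bins, while `qcut()` splits the values at the median so both bins hold five observations:

```python
import pandas as pd

# Toy data with one outlier, so equal-width and equal-count bins differ clearly
skewed = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

equal_width = pd.cut(skewed, 3)   # 3 bins of equal width spanning 1..100
equal_count = pd.qcut(skewed, 2)  # 2 bins split at the sample median

width_counts = equal_width.value_counts().sort_index()
count_counts = equal_count.value_counts().sort_index()

print(width_counts)
print(count_counts)
```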

## Random Sampling

In [None]:
large_df = pd.DataFrame(np.random.randn(1000, 5))
large_df.sample(n=15) # sample 15 observations
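`sample()` has a few other parameters worth knowing: `random_state` makes a sample reproducible, `frac` asks for a fraction of the rows instead of a fixed count, and `replace=True` samples with replacement (handy for bootstrapping). A quick sketch:

```python
import numpy as np
import pandas as pd

big = pd.DataFrame(np.random.randn(1000, 5))

sample_a = big.sample(n=15, random_state=42)  # the same 15 rows on every run for this DataFrame
sample_b = big.sample(n=15, random_state=42)
tenth = big.sample(frac=0.1, random_state=42)  # 10% of the rows instead of a fixed count
boot = big.sample(n=len(big), replace=True, random_state=0)  # with replacement, as in a bootstrap
```

Passing a fixed `random_state` is worth doing whenever someone else needs to reproduce your analysis.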

## Aggregations and group operations

Pandas `groupby()` is a key component of any split-apply-combine process. A split-apply-combine process, as you may already know, involves:
+ Splitting the data into groups based on some criteria
+ Applying a function to each group independently
+ Combining the results into a data structure

Splitting is usually the most straightforward part. In the apply step, you might want to aggregate, to transform the observations within each group, or to filter some groups out entirely. Finally, we bring our "applied" groups back together into a single data structure, usually another DataFrame.
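Here is a minimal sketch of the three flavors of the apply step on toy data; in each case Pandas handles the combine step and hands back a Series or DataFrame:

```python
import pandas as pd

scores = pd.DataFrame({'team': ['a', 'a', 'b', 'b', 'b'],
                       'score': [1.0, 3.0, 2.0, 4.0, 6.0]})  # toy data
by_team = scores.groupby('team')  # split

team_means = by_team['score'].mean()  # aggregate: one value per group
centered = scores['score'] - by_team['score'].transform('mean')  # transform: one value per original row
big_teams = by_team.filter(lambda g: len(g) >= 3)  # filter: keep or drop whole groups

print(team_means)
print(centered)
print(big_teams)
```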

### Splitting

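You can split a Pandas object into groups on any axis, rows or columns. A `groupby` object isn’t literally a dictionary, but it behaves a lot like one: its `groups` attribute is a dictionary whose keys are the group labels and whose values are the index labels of the rows in each group.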

In [None]:
new_df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                              'foo', 'bar', 'foo', 'foo'],
                       'B' : ['one', 'one', 'two', 'three',
                              'two', 'two', 'one', 'three'],
                       'C' : np.random.random((8,)),
                       'D' : np.random.random((8,))})
gb = new_df.groupby('A') # group by column A
gb['C'].min()  # minimum of C within each group

In [None]:
print(gb.groups)  # the index labels of the observations in each group
gb[['C', 'D']].median()  # median of columns C and D within each group

`groupby()` is one of those things you just need to play around with to get used to, but I’ve found its syntax and behavior to be much better than R’s. For instance, it’s very easy in Pandas to group by multiple columns and to count unique observations within each group, both of which are a nightmare in R.

In [None]:
gb = new_df.groupby(['A','B'])
gb['C'].nunique()

### Iterating Over Groups

Iterating over a `groupby` object is similar to iterating over other things in Python. Each iteration yields a `(key, group)` pair: the key is the group’s label (a tuple when you group by several columns) and the group is a DataFrame of the observations from your original DataFrame that belong to that group.

In [None]:
for key, value in gb:
    print(key)  # the group label, a tuple like ('bar', 'one') since we grouped by two columns
    print(value)  # a DataFrame of the rows in that group
    
gb['C'].count()

In [None]:
gb_df = gb.sum()  # already a DataFrame; its index is a MultiIndex of (A, B) pairs
gb_df.index
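When you group by more than one column, each key yielded during iteration is a tuple, which you can unpack right in the `for` statement. A small sketch on toy data:

```python
import pandas as pd

sales = pd.DataFrame({'region': ['east', 'east', 'west'],
                      'product': ['x', 'y', 'x'],
                      'units': [5, 3, 7]})  # toy data

summary = {}
for (region, product), group in sales.groupby(['region', 'product']):
    # with several grouping columns the key is a tuple, unpacked directly above
    summary[(region, product)] = int(group['units'].sum())

print(summary)
```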

# statsmodels: The Basics

[statsmodels](http://statsmodels.sourceforge.net/) is a statistical and econometric analysis library for Python. In this section, we’re going to cover a few different models to illustrate how they differ from the same models in R.

The installation page will tell you to use `easy_install` to install statsmodels, but you can and should install it with pip instead. statsmodels depends on scipy, which you may need to install first. You may also need to install patsy, which is a dependency for some parts of statsmodels.
```
$ pip install scipy
$ pip install statsmodels
$ pip install patsy
```

For this section, we’re also going to start using [matplotlib](http://matplotlib.org/), which can also be installed with pip.

`$ pip install matplotlib`

## Modeling

Modeling naturally lends itself to object-oriented design.
```python
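class Model(object):

    def __init__(self, dependent, independent=None):
        self.dependent = dependent
        self.independent = independent

    def fit(self):
        raise NotImplementedError  # each specific model implements its own fit()

    def predict(self):
        raise NotImplementedError  # each specific model implements its own predict()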
```

With this very basic blueprint, we could write many models that inherit from it, tuning the parameters to the `fit()` method depending on the specifics of the model.

Similarly, model results could inherit from a base `Results` class, extending the “out-of-the-box” functionality of the base class with model-specific metrics.

This is in fact how statsmodels implements models. When we use a statsmodels model, then, we are simply initializing and using an instance of a `Model` subclass.
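To make the blueprint idea concrete, here is a slightly reworked, hypothetical version (not statsmodels code) in which `fit()` returns a results object and a results subclass adds prediction and a model-specific metric, loosely mirroring how `ols_model.fit()` returns `ols_results` later in this section. `MeanModel`, `MeanResults`, and `sse()` are names made up for this sketch:

```python
import numpy as np


class Model(object):
    """Base blueprint: holds the data; each subclass decides how to fit it."""

    def __init__(self, dependent, independent=None):
        self.dependent = np.asarray(dependent, dtype=float)
        self.independent = independent

    def fit(self):
        raise NotImplementedError('each subclass implements its own fit()')


class Results(object):
    """Base results: the model that was fit and its estimated parameters."""

    def __init__(self, model, params):
        self.model = model
        self.params = params


class MeanModel(Model):
    """Hypothetical baseline model: the prediction is simply the mean of the data."""

    def fit(self):
        # fit() returns a results object instead of modifying the model in place
        return MeanResults(self, self.dependent.mean())


class MeanResults(Results):
    """Extends the base Results with prediction and a model-specific metric."""

    def predict(self, n=1):
        return np.full(n, self.params)

    def sse(self):
        # sum of squared errors around the fitted mean
        return float(((self.model.dependent - self.params) ** 2).sum())


results = MeanModel([1.0, 3.0, 5.0, 7.0]).fit()
print(results.params, results.predict(2), results.sse())
```

Keeping fitted state on a separate results object means one model instance can be fit several ways without overwriting earlier results.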

## Datasets

Like [scikit-learn](http://scikit-learn.org/stable/) and many libraries in R, statsmodels provides some datasets in the library itself; once you import the library, they’re available via the `datasets` module.

In [None]:
import statsmodels.api as sm  # statsmodels.api exposes the datasets subpackage
dir(sm.datasets)

In fact, the `statsmodels.datasets` module has a function [`get_rdataset()`](http://statsmodels.sourceforge.net/devel/datasets/statsmodels.datasets.get_rdataset.html) for downloading and returning a dataset from an R package. I’ve never done this, but it could be fun to play with!

## Ordinary Least Squares

Let's make a simple OLS model using one of the included datasets, the [Longley dataset](https://stat.ethz.ch/R-manual/R-patched/library/datasets/html/longley.html) of macroeconomic data for the U.S. There are only 16 observations in this dataset, so if you get a warning about a kurtosis test, just ignore it; the kurtosis test is only valid with at least 20 observations.

In [None]:
from statsmodels import datasets, regression, tools
import numpy as np
import pandas as pd

data = datasets.longley.load_pandas()
df = data['data']
df.head()

We need an intercept, so we need to add a column of `1`s. The `add_constant()` function adds one to an ndarray or a Pandas DataFrame. If there’s already a constant column (such as a column of `1`s), it returns the ndarray or DataFrame as is; otherwise it returns the ndarray or DataFrame with a column of `1`s added.
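Why does the constant matter? Without a column of `1`s, least squares has no intercept parameter to estimate, so the fitted line is forced through the origin. A NumPy-only sketch on toy data lying exactly on the line y = 5 + 2x (`np.linalg.lstsq` stands in for statsmodels here):

```python
import numpy as np

x_toy = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_toy = 5.0 + 2.0 * x_toy  # points exactly on the line y = 5 + 2x

# No constant column: the model is y = b*x, so the line must pass through the origin
slope_only = np.linalg.lstsq(x_toy[:, np.newaxis], y_toy, rcond=None)[0]

# Prepend a column of 1s (as add_constant does by default) so an intercept is estimated too
X_toy = np.column_stack([np.ones_like(x_toy), x_toy])
intercept_and_slope = np.linalg.lstsq(X_toy, y_toy, rcond=None)[0]

print(slope_only)           # about 3.67: the slope has to absorb the missing intercept
print(intercept_and_slope)  # recovers intercept 5 and slope 2
```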

In [None]:
y = df['TOTEMP']  # total employment
X = df['GNP']
X = tools.add_constant(X)  # add a constant term to fit the intercept

# Initialize a new OLS model with the Longley data
ols_model = regression.linear_model.OLS(y, X)
ols_results = ols_model.fit()
print(ols_results.summary())

The contents of the results summary are also available as attributes of the results object `ols_results`. For instance, `params` gives you the intercept and slope coefficient of the fitted line.

In [None]:
ols_results.params

### Plotting Ordinary Least Squares

We're going to generate 100 evenly spaced GNP values (within the range of the original data), predict total employment at each one, and plot our regression line through those predicted values.

In [None]:
%pylab inline

In [None]:
X_new = np.linspace(df['GNP'].min(), df['GNP'].max(), 100) # 100 evenly spaced points between the min and max GNP
X_new = tools.add_constant(X_new)  # add a constant, as before

# Calculate the predicted values at the points in X_new using our OLS model
y_hat = ols_results.predict(X_new)

# Make a figure to plot on
plt.figure(figsize=(8,6))

plt.scatter(df['GNP'], df['TOTEMP'], label='Data')  # plot the raw data
plt.xlabel("Gross National Product")
plt.ylabel("Total Employment")
plt.plot(X_new[:, 1], y_hat, 'r-', label='OLS')  # add the regression line
plt.legend(loc='best')  # add a legend wherever is best

# Save in the same cell: the inline backend closes the figure once the cell finishes,
# so calling savefig() in a later cell would save an empty image
plt.savefig('emp_by_gnp.png')

## Matplotlib

Before seeing some different models, we're going to talk about [Matplotlib](http://matplotlib.org/index.html) a bit. The Matplotlib API isn't my favorite. And, I confess, when I need to produce graphics quickly for our board or our clients, I will use R because `ggplot` makes the prettiest graphics. [yhat](http://ggplot.yhathq.com/) has begun the noble work of porting `ggplot` to Python, but there's still a lot of work that needs to be done.

### `from some_module import *`

The Matplotlib documentation, and examples of Matplotlib on the Internet, take a lot of things for granted, which can make debugging your plots a pain. In some of the documentation and code examples for Matplotlib, you'll see something that you should never do, and you'll experience firsthand why: using `from some_module import *` to import everything from a module. **Don't do this.** It clutters your namespace and makes it very difficult for someone reading your code to determine where a particular name _came_ from.
```python
from matplotlib.pyplot import *
from numpy import *

x = linspace(0, 1, 100)
```

What if that `linspace` doesn't return what you expect? Too bad! Impossible to say where it came from: the Python standard library, NumPy, or Matplotlib. You'll probably get to the source with some Googling, but this can become burdensome as your files get longer and as more modules are imported. Worse, there could be namespace collisions between the modules you're importing. What if you import everything from two different modules that both have a function named `linspace`?
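Here is a small, self-contained demonstration of such a collision using two standard-library modules, `math` and `cmath`, which both define `sqrt`. Run it as a standalone script rather than in your notebook, since the first half deliberately pollutes the namespace:

```python
# Anti-pattern: both modules define sqrt, and the second star import silently wins
from math import *
from cmath import *

bad = sqrt(4)  # this is cmath.sqrt, so the result is the complex number (2+0j)

# Fix: import the modules themselves and qualify each name, so its source is obvious
import math
import cmath

good_real = math.sqrt(4)       # 2.0
good_complex = cmath.sqrt(-1)  # 1j

print(bad, good_real, good_complex)
```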

If you need to import many submodules from a module, you can enclose a list of submodules in parentheses, which is a little nicer to look at than _n_ different import statements. Or, just import the module and alias it to a shorter name if you need to. (We often do this with NumPy: `import numpy as np`.)
```python
from some_module import (module_a, module_b, module_c,
    module_d, module_e)
import some_module
```

### Matplotlib Gallery

The [Matplotlib Gallery](http://matplotlib.org/gallery.html) is your friend. There are dozens of example plots, and references for [line](http://matplotlib.org/examples/lines_bars_and_markers/line_styles_reference.html) and [marker](http://matplotlib.org/examples/lines_bars_and_markers/marker_reference.html) styles.

### Example: histogram

In [None]:
from matplotlib import pyplot as plt
import numpy as np


# Create some fake age data
mu = 46  # mean of distribution
sigma = 15  # standard deviation of distribution
x = mu + sigma * np.random.randn(1000)

plt.hist(x, facecolor='green', alpha=0.5)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title(r'Histogram of Age: $\mu={0}$, $\sigma={1}$'.format(mu, sigma))

In [None]:
n_bins = 23  # change the number of bins
plt.hist(x, n_bins, facecolor='green', alpha=0.5)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title(r'Histogram of Age: $\mu={0}$, $\sigma={1}$'.format(mu, sigma))

In [None]:
from scipy.stats import norm  # normal distribution; older examples used mlab.normpdf, which has been removed from Matplotlib


# Assign the information returned by plt.hist to variables that we can use to
# compute the pdf
# n    : the values of the histogram bins
# bins : the edges of the bins
# _    : a "silent" list of patches used to create the histogram
# See: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist
n, bins, _ = plt.hist(x, n_bins, density=True, facecolor='green', alpha=0.5)  # density=True replaces the old normed=1

# density=True normalizes the bar heights; now add a probability density curve
y = norm.pdf(bins, mu, sigma)  # normal pdf evaluated at each bin edge
plt.plot(bins, y, 'r--')  # add a dashed red line to indicate the pdf
plt.xlabel('Age')
plt.ylabel('Probability')
plt.title(r'Histogram of Age: $\mu={0}$, $\sigma={1}$'.format(mu, sigma))
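Why does the dashed curve line up with the bars? With `density=True`, each bar's height times its width is the fraction of the data in that bin, so the bar areas sum to 1, just like the area under the normal density $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$. This NumPy-only sketch checks both facts numerically. `normal_pdf` is a hand-written helper for illustration (in practice, `scipy.stats.norm.pdf` does this), and I'm assuming `np.histogram` with `density=True` matches the bar heights `plt.hist` draws:

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Normal probability density function, written out from its formula."""
    x = np.asarray(x, dtype=float)
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))


rng = np.random.RandomState(0)  # fixed seed so the numbers are reproducible
ages = 46 + 15 * rng.randn(1000)

# density=True rescales bar heights so that height * bin width sums to 1 over all bars
heights, edges = np.histogram(ages, bins=23, density=True)
total_area = (heights * np.diff(edges)).sum()

# The pdf integrates to 1 as well (a simple Riemann sum over +/- 10 standard deviations)
grid = np.arange(46 - 150, 46 + 150, 0.01)
pdf_area = normal_pdf(grid, 46, 15).sum() * 0.01

print(total_area, pdf_area)
```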

In [None]:
mu_b = 35
sigma_b = 10
b = mu_b + sigma_b * np.random.randn(1000)


# Plot two histograms over each other
# Adjust the alpha parameter to make the histograms more or less transparent
plt.hist(x, facecolor='green', alpha=0.5, label=r'A: $\mu={0}$, $\sigma={1}$'.format(mu, sigma))
plt.hist(b, facecolor='blue', alpha=0.5, label=r'B: $\mu={0}$, $\sigma={1}$'.format(mu_b, sigma_b))
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Histogram of Age')
plt.legend(loc='best')  # add a legend

### Example: scatterplot

In [None]:
# Generate some random data
x = np.random.randn(100)
y = np.random.randn(100)

plt.scatter(x, y, color='teal', s=25, alpha=0.5)  # adjust the s parameter to make the points bigger or smaller

In [None]:
areas = np.pi * (15 * np.random.rand(100))**2  # marker areas (in points^2) from 100 random radii
plt.scatter(x, y, color='teal', s=areas, alpha=0.5)  # s must be a scalar or have one entry per point

### Example: subplots

In [None]:
x = np.linspace(-2, 2, 100) # let’s create meaningless data

plt.subplot(2, 2, 1) # 2x2 plots, selecting the 1st one
plt.plot(x, x) # plot some data on the 1st subplot

plt.subplot(2, 2, 2) # 2x2 plots, selecting the 2nd one
plt.plot(x, x**2) # plot some data on the 2nd subplot

plt.subplot(2, 2, 3) # 2x2 plots, selecting the 3rd one
plt.plot(x, x**3) # plot some data on the 3rd subplot

plt.subplot(2, 2, 4) # 2x2 plots, selecting the 4th one
plt.plot(x, x**4) # plot some data on the 4th subplot
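Matplotlib also has an object-oriented interface that fits the theme of these notes: `plt.subplots()` returns a `Figure` and an array of `Axes` objects, and you call plotting methods on each `Axes` instead of relying on whichever subplot is currently selected. A sketch of the same 2x2 grid:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-2, 2, 100)

# plt.subplots returns the Figure plus a 2x2 NumPy array of Axes objects
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# axes.flat walks the grid row by row, the same order as plt.subplot(2, 2, 1..4)
for power, ax in enumerate(axes.flat, start=1):
    ax.plot(x, x ** power)  # a method call on this Axes; no "current subplot" involved
    ax.set_title('x**{0}'.format(power))

fig.tight_layout()  # keep titles and tick labels from overlapping
```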

## Multiple Linear Regression

Fitting a linear regression model with more than one predictor variable is almost exactly the same. Again, we need to add an intercept. For this example, we’ll use the [State Crime](http://statsmodels.sourceforge.net/0.6.0/datasets/generated/statecrime.html) dataset from 2009, available in the `statsmodels.datasets` module. We'll look at the relationship between rates of violent crime and the following variables:
+ percent of the population living in urban areas
+ percent of the population living below the poverty line
+ percent of the population having at least a high school degree
+ percent of the population living in a single-parent home

Again, we can load the independent and dependent variables of this dataset straight into Pandas.

In [None]:
from statsmodels import datasets, regression, tools

crime_df = datasets.statecrime.load_pandas()
crime_df.exog.head()

In [None]:
crime_df.endog.head()

In [None]:
crime_X = crime_df.exog[['urban', 'poverty', 'hs_grad', 'single']]
crime_y = crime_df.endog
crime_X = tools.add_constant(crime_X)  # add a constant term to fit the intercept

crime_ols_model = regression.linear_model.OLS(crime_y, crime_X)
crime_ols_results = crime_ols_model.fit()
print(crime_ols_results.summary())

Including multiple predictor variables is just as straightforward as fitting the single-predictor OLS model: `X` simply has more columns.

### Interaction Terms

For adding interaction terms, it’s worth looking into [fitting models using R-style formulas](http://statsmodels.sourceforge.net/devel/example_formulas.html) in statsmodels. Many of the datasets that come with statsmodels have columns corresponding to interaction terms between the other columns in the dataset.
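For two numeric predictors, an interaction term is simply their element-wise product. A formula term like `infl:realint` builds that column for you (and `infl*realint` adds the main effects too); here is the equivalent column computed by hand on toy values:

```python
import pandas as pd

# Toy values, not the macrodata columns
econ = pd.DataFrame({'infl': [1.0, 2.0, 3.0], 'realint': [0.5, 1.0, -1.0]})

# Interaction of two numeric predictors: their row-by-row product
econ['infl_x_realint'] = econ['infl'] * econ['realint']

print(econ)
```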

### Categorical Variables

To see an example of using categorical variables, let’s look at the [U.S. Macroeconomic Data](http://statsmodels.sourceforge.net/0.6.0/datasets/generated/macrodata.html) from 1959 to 2009. We'll model the relationship between real GDP and the following variables:
+ Seasonally adjusted unemployment rate
+ End-of-quarter total population
+ Inflation rate
+ Real interest rate

Let’s say we also wanted to include the quarter in our model. While quarters are given by numbers, they aren’t quantitative. We would like to be able to handle this naturally within our model. The support for R-style formulas is really helpful here. There are a couple of different approaches to encoding categorical variables, and statsmodels [supports](http://statsmodels.sourceforge.net/devel/contrasts.html) many of them.

In [None]:
import statsmodels.formula.api as smf  # R-style formulas for fitting models

econ_data = datasets.macrodata.load_pandas()

# quarter is stored as a number (1-4), but we don't need to convert it here:
# wrapping it in C() in the formula below treats it as a categorical variable
econ_data.data.head()

In [None]:
# The C() around quarter indicates that it should be treated as a categorical variable
# infl*realint expands to infl + realint + infl:realint, adding the interaction term (repeated main effects are only included once)
econ_ols_model = smf.ols(formula="realgdp ~ C(quarter) + unemp + pop + infl + realint + infl*realint", data=econ_data.data)
econ_ols_results = econ_ols_model.fit()
print(econ_ols_results.summary())

## Logistic Regression

There is a [great tutorial](http://www.ats.ucla.edu/stat/r/dae/logit.htm) on logistic regression in R from UCLA. We’re going to use the same dataset that they used with the same objective: can we identify which factors influence graduate school admissions?

The possible factors in the dataset are:
+ gpa
+ gre scores
+ rank (of the institution)


First, we'll load the dataset into a dataframe and get a feel for it.

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("http://www.ats.ucla.edu/stat/data/binary.csv")
data.describe()

In [None]:
data.std()

In [None]:
pd.crosstab(data['admit'], data['rank'])  # frequency table of admission by rank

### Dummy Variables

Another useful thing to know how to do is create dummy variables. Dummy variables are boolean (i.e., 1 or 0, False or True) variables that indicate the presence or absence of some categorical effect. In this example, we’re going to dummify `rank`. Luckily, Pandas makes this really easy.

In [None]:
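# Drop rank_1 so it serves as the reference category: all four dummies plus a constant would be perfectly collinear.
# dtype=float gives 0/1 columns (recent Pandas versions default to True/False)
dummy_df = pd.get_dummies(data['rank'], prefix='rank', drop_first=True, dtype=float)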

data = data[['gre', 'gpa', 'admit']].join(dummy_df)  # join dummified ranks back to our data
data.head()
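Leaving one category out matters once a constant is in the model: the full set of rank dummies sums to 1 in every row, which duplicates the constant column (the "dummy variable trap"), so the design matrix loses full column rank. A toy check with made-up ranks, not the admissions data:

```python
import numpy as np
import pandas as pd

ranks = pd.Series([1, 2, 3, 4, 1, 2, 3, 4])  # made-up ranks

all_dummies = pd.get_dummies(ranks, prefix='rank', dtype=float)  # rank_1 .. rank_4
ref_dummies = pd.get_dummies(ranks, prefix='rank', drop_first=True, dtype=float)  # rank_1 is the reference

const = np.ones((len(ranks), 1))
X_full = np.hstack([const, all_dummies.values])  # 5 columns
X_ref = np.hstack([const, ref_dummies.values])   # 4 columns

# matrix_rank counts the linearly independent columns
print(np.linalg.matrix_rank(X_full), X_full.shape[1])  # fewer independent columns than columns
print(np.linalg.matrix_rank(X_ref), X_ref.shape[1])    # full column rank
```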

In [None]:
y = data.pop('admit')  # pop() removes the admit column from data and returns it
X = sm.add_constant(data)  # add a constant term

In [None]:
logit_model = sm.Logit(y, X)
logit_results = logit_model.fit()
print(logit_results.summary())

Like the other models we’ve seen in statsmodels, `summary()` gives an overview of the model’s coefficients, their standard errors and significance, the overall fit quality, and several other statistical measures.
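Logit coefficients are on the log-odds scale, so exponentiating one gives an odds ratio: the factor by which the odds of admission are multiplied for a one-unit increase in that predictor, holding the others fixed. For the fitted model, `np.exp(logit_results.params)` gives all of them at once. The sketch below uses a made-up coefficient (not from the results above) to check the arithmetic:

```python
import numpy as np


def logistic(z):
    """Convert log-odds z into a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


# Made-up values for illustration, not taken from logit_results
intercept = -1.0
coef = np.log(2.0)  # a log-odds coefficient of ln(2)

odds_ratio = np.exp(coef)  # 2.0: each one-unit increase doubles the odds

p0 = logistic(intercept)         # predicted probability when the predictor is 0
p1 = logistic(intercept + coef)  # predicted probability when the predictor is 1

print(odds_ratio, odds(p1) / odds(p0))
```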

# Unit tests

In this final section we’re going to talk about testing your code. We’re going to be learning Python’s built-in [`unittest`](https://docs.python.org/2/library/unittest.html) library. It’s less important _what_ you use for testing than that you _write tests at all_. At the end of the day, you might choose to use [nose](https://nose.readthedocs.org/en/latest/index.html) or [py.test](http://pytest.org/latest/) because it offers some testing functionality that you’re looking for. Or, after poking around with a few tests in each, you might decide you like one best, perhaps finding that one maps onto your code better than the others. That’s fine.

Whichever one you choose, take the time to learn it. One of the last things you want to do when writing unit tests is to inadvertently reinvent some testing functionality that already exists. Why? Because the whole point of writing tests is to ensure that _your code_ does what you think it does. If you’re writing a bunch of your own test code, you’d need more tests for your tests. When does it end?

## What are unit tests?

Unit tests are small functions (defined with `def`) that test small chunks of code (a function, a class, a method, or a module) in isolation.

Let’s say I have a function that determines if a number is prime and another function that determines if a number is even. To test these two functions, I need to create a test case. A [test case](https://docs.python.org/2/library/unittest.html#test-cases) is the smallest unit of testing in unittest. New test cases are created by subclassing the `unittest.TestCase` class, which has all the attributes and methods that the test runner actually needs to run the tests, check test results, and report failures.
```python
import unittest
import numpy as np


def is_prime(n):
    '''Return True if n is a prime number.'''
    return all(n % i for i in range(2, int(np.sqrt(n)) + 1))  # check divisors up to and including sqrt(n)


def is_even(n):
    '''Return True if n is an even number.'''
    return n % 2 == 0


class NumberTest(unittest.TestCase):

    def test_is_prime_non_prime(self):
        '''Does returning the primality of a non-prime number work?'''
        self.assertFalse(is_prime(12))

    def test_is_prime_prime(self):
        '''Does returning the primality of a prime number work?'''
        self.assertTrue(is_prime(67))

    def test_is_prime_one(self):
        '''Does returning the primality of 1 work?'''
        self.assertFalse(is_prime(1))  # 1 isn't a prime number

    def test_is_even_odd(self):
        '''Does returning the evenness of an odd number work?'''
        self.assertFalse(is_even(9))

    def test_is_even_even(self):
        '''Does returning the evenness of an even number work?'''
        self.assertTrue(is_even(16))

    def test_is_even_zero(self):
        '''Does returning the evenness of 0 work?'''
        self.assertTrue(is_even(0))  # 0 is an even number
```

Now we can run our tests. With the code above saved as `tests/test_numbers.py`, unittest’s test discovery will find and run it:
```
$ python -m unittest discover tests # run any unit tests you can find in tests/
....F.
======================================================================
FAIL: test_is_prime_one (test_numbers.NumberTest)
Does returning the primality of 1 work?
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/jsa/Development/python-for-enigma/tests/test_numbers.py", line 29, in test_is_prime_one
    self.assertFalse(is_prime(1))  # 1 isn't a prime number
AssertionError: True is not false

----------------------------------------------------------------------
Ran 6 tests in 0.000s

FAILED (failures=1)
```

Our test `test_is_prime_one()` failed because `is_prime(1)` returned `True` instead of `False`, which means our code doesn't work. The great thing about tests is that you _know_ when your code doesn't work, and if you write good (read: extensive) tests, you'll be able to tell _how_ it doesn't work and fix your code. Knowing what's not working is a much better place to start debugging and refactoring from than having no idea.

Let's edit our `is_prime()` function to account for 1.
```python
import unittest
import numpy as np


def is_prime(n):
    '''Return True if n is a prime number.'''
    if n < 2:
        return False
    return all(n % i for i in range(2, int(np.sqrt(n)) + 1))  # check divisors up to and including sqrt(n)


def is_even(n):
    '''Return True if n is an even number.'''
    return n % 2 == 0


class NumberTest(unittest.TestCase):

    def test_is_prime_non_prime(self):
        '''Does returning the primality of a non-prime number work?'''
        self.assertFalse(is_prime(12))

    def test_is_prime_prime(self):
        '''Does returning the primality of a prime number work?'''
        self.assertTrue(is_prime(67))

    def test_is_prime_one(self):
        '''Does returning the primality of 1 work?'''
        self.assertFalse(is_prime(1))  # 1 isn't a prime number

    def test_is_even_odd(self):
        '''Does returning the evenness of an odd number work?'''
        self.assertFalse(is_even(9))

    def test_is_even_even(self):
        '''Does returning the evenness of an even number work?'''
        self.assertTrue(is_even(16))

    def test_is_even_zero(self):
        '''Does returning the evenness of 0 work?'''
        self.assertTrue(is_even(0))  # 0 is an even number
```

Let's run our tests again (they should pass now):

```
......
----------------------------------------------------------------------
Ran 6 tests in 0.000s

OK
```
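As the number of cases grows, a table of inputs and expected outputs can be easier to maintain than one method per case. Here is a sketch using `subTest()` (available in Python 3.4+), which reports each failing input separately instead of stopping at the first one. Perfect squares such as 9 and 25 are worth including: an off-by-one in the square-root bound makes them look prime.

```python
import unittest


def is_prime(n):
    """Return True if n is a prime number."""
    if n < 2:
        return False
    return all(n % i for i in range(2, int(n ** 0.5) + 1))


class IsPrimeTableTest(unittest.TestCase):

    # (input, expected) pairs, including edge cases: 0, 1, 2, and perfect squares
    cases = [(0, False), (1, False), (2, True), (3, True), (4, False),
             (9, False), (25, False), (67, True), (97, True), (100, False)]

    def test_is_prime_table(self):
        for n, expected in self.cases:
            # each subTest is reported on its own if it fails
            with self.subTest(n=n):
                self.assertEqual(is_prime(n), expected)
```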

### Setup and Teardown

Within each `unittest.TestCase` subclass that we write, we can define whatever specific tests we want to run and what (if anything) needs to be done to set up and/or tear down the test fixture, in the `setUp()` and `tearDown()` methods. For instance, if the piece of code that you’re testing requires a database connection or a dictionary, you can mock up that database connection or that dictionary within the setup for the test case. The idea with tests is to write them such that if one of them fails, you’re almost positive that it failed because _the code it tests is broken_, not for some other reason.
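A minimal sketch of a fixture: `setUp()` runs before each test method and `tearDown()` after it, so every test gets a fresh dictionary and one test can't leak state into another. `count_words()` is a hypothetical function made up for this example:

```python
import unittest


def count_words(counts, text):
    """Hypothetical function under test: tally each word of text into the counts dict."""
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


class CountWordsTest(unittest.TestCase):

    def setUp(self):
        # Runs before *each* test method, so every test starts from the same fixture
        self.counts = {'spam': 1}

    def tearDown(self):
        # Runs after each test method, even if the test failed
        self.counts.clear()

    def test_existing_word_is_incremented(self):
        count_words(self.counts, 'spam')
        self.assertEqual(self.counts['spam'], 2)

    def test_new_word_is_added(self):
        count_words(self.counts, 'eggs eggs')
        self.assertEqual(self.counts, {'spam': 1, 'eggs': 2})
```

Because `setUp()` rebuilds the dictionary, the second test still sees `{'spam': 1}` even though the first one modified it.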


## Separate Your Tests

The above is actually an example of something you _don’t_ want to do: namely, put your testing code in the same file as the code it tests. Instead, you want to import the module(s) or files that contain the functions, classes, methods, etc. that you’re testing.
```python
import unittest

from scripts import numbers


class NumberTest(unittest.TestCase):

    def test_is_prime_non_prime(self):
        '''Does returning the primality of a non-prime number work?'''
        self.assertFalse(numbers.is_prime(12))

    def test_is_prime_prime(self):
        '''Does returning the primality of a prime number work?'''
        self.assertTrue(numbers.is_prime(67))

    def test_is_prime_one(self):
        '''Does returning the primality of 1 work?'''
        self.assertFalse(numbers.is_prime(1))  # 1 isn't a prime number

    def test_is_even_odd(self):
        '''Does returning the evenness of an odd number work?'''
        self.assertFalse(numbers.is_even(9))

    def test_is_even_even(self):
        '''Does returning the evenness of an even number work?'''
        self.assertTrue(numbers.is_even(16))

    def test_is_even_zero(self):
        '''Does returning the evenness of 0 work?'''
        self.assertTrue(numbers.is_even(0))  # 0 is an even number
```

The main reason you want to separate your tests is so that you can run them independently of the source code they test. This helps keep the testing environment clean, which will make you more confident that your tests are passing or failing for the right reasons.

The other main reason is that, if your testing strategy changes, you’re not in your source code changing things around. In general, your test code should be refactored _much less often_ than the code it tests. If your tests are in the same file as some code you’re refactoring, there can be a temptation to change the test to reflect the refactoring. **Don’t do this.** If you’re refactoring your code (i.e., making changes that don’t change the observable behavior of the code, just how that behavior is implemented), your tests shouldn’t change, because the observable behavior that you’re testing for hasn’t changed.

For changes that are not simply refactoring, you’ll obviously need to change your test code a bit, e.g., to add new test cases or remove obsolete ones, to change the names of functions or methods being tested, and so on. Keeping your tests separate from the source code can help to reinforce that what you’re testing for is observable behavior, not implementation details.

## Automate Your Tests, Run Your Tests

If your tests are burdensome to run, you won’t run them. Ideally, each test should take only a few microseconds to run. If you have several larger tests that take longer to run, it’s a good idea to put them into a separate test directory so you can run your fast tests as frequently as possible.
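One way to keep slow tests out of your everyday runs, sketched below, is to skip them unless an environment variable is set, using `unittest.skipUnless`. The variable name `RUN_SLOW_TESTS` is a hypothetical choice, not a unittest convention:

```python
import os
import unittest

# RUN_SLOW_TESTS is a hypothetical variable name; pick whatever suits your project
RUN_SLOW = os.environ.get('RUN_SLOW_TESTS') == '1'


class FastTest(unittest.TestCase):

    def test_addition(self):
        self.assertEqual(1 + 1, 2)


class SlowTest(unittest.TestCase):

    @unittest.skipUnless(RUN_SLOW, 'set RUN_SLOW_TESTS=1 to run slow tests')
    def test_big_sum(self):
        # stands in for an expensive test, e.g. one that loads a large file
        self.assertEqual(sum(range(10 ** 6)), 10 ** 6 * (10 ** 6 - 1) // 2)
```

With this in place, `RUN_SLOW_TESTS=1 python -m unittest discover tests` would include the slow test, while a plain run reports it as skipped.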

I like to run tests before and after a coding session. At the beginning, it tells me if I need to fix something. At the end, it tells me that I didn’t break anything. Also, if I didn’t finish working on something, that failing test at the beginning of my next coding session will help get me back on track with what I was working on when I left.

## Test-Driven Development (TDD)

Test-driven development (TDD) is a software development process in which you write tests for the functionality that you want to build _before_ you build the actual functionality. TDD can be very good in helping you flesh out the design of the interface that you’re building.

Sometimes this isn’t relevant or possible when you’re doing data science. However, let’s say that you know that a particular ID should take a certain value. You can write a test that checks that the ID has the right value. It’s hard to get really comprehensive with this approach, but if you’ve found some edge cases in the logic of your approach and want to make sure they make it through your full munging and analysis process, tests are a good way of doing that.
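Here is a sketch of that kind of data test. `munge()` and its columns are hypothetical stand-ins for your own processing code; the tests pin down the value you know a particular ID should have, plus an edge case (an exact duplicate row) that you want handled correctly by the pipeline:

```python
import unittest

import pandas as pd


def munge(raw):
    """Hypothetical munging step: drop exact duplicate rows, add revenue, index by id."""
    clean = raw.drop_duplicates().copy()
    clean['revenue'] = clean['price'] * clean['quantity']
    return clean.set_index('id')


class MungeTest(unittest.TestCase):

    def setUp(self):
        # Tiny hand-made input with a known edge case: id 2 appears twice, identically
        self.raw = pd.DataFrame({'id': [1, 2, 2, 3],
                                 'price': [10.0, 5.0, 5.0, 2.5],
                                 'quantity': [1, 3, 3, 4]})

    def test_known_id_has_expected_value(self):
        clean = munge(self.raw)
        self.assertEqual(clean.loc[2, 'revenue'], 15.0)  # 5.0 * 3, counted once

    def test_duplicates_are_dropped(self):
        clean = munge(self.raw)
        self.assertTrue(clean.index.is_unique)
```

You would run it like the other tests, e.g. with `python -m unittest discover tests`.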

**TASKS**:
+ Write a test for some piece of the `EnigmaAPI` class