In this project, we'll cover some best practices that will make our code much easier to use, read, and maintain, including:

* How to document our code so that others can easily understand it.
* How to create functions that are easier to test, debug, and change.
* How to set up default arguments in functions so that our code doesn't behave unexpectedly.

When a function has no documentation, we have to spend time deciphering its code to understand what it does, what its arguments are supposed to be, and what it returns.

With a **docstring**, though, it is much easier to tell what the expected inputs and outputs should be, as well as what the function does. A [docstring](https://en.wikipedia.org/wiki/Docstring) is a string literal written as the first statement in the body of a function. Because docstrings usually span multiple lines, they are enclosed in triple quotes, Python's way of writing multi-line strings:

In [1]:
import numpy as np
import pandas as pd


def split_and_stack(df, new_names):
    """Split a DataFrame's columns into two halves and then stack
    them vertically, returning a new DataFrame with `new_names` as the
    column names.

    Args:
      df (DataFrame): The DataFrame to split.
      new_names (iterable of str): The column names for the new DataFrame.

    Returns:
      DataFrame
    """
    half = len(df.columns) // 2
    left = df.iloc[:, :half]
    right = df.iloc[:, half:]
    return pd.DataFrame(
      data=np.vstack([left.values, right.values]),
      columns=new_names
    )

Every docstring has some (although usually not all) of these five key pieces of information:

1. Description of what the function does.
2. Description of the arguments, if any.
3. Description of the return value(s), if any.
4. Description of errors raised, if any.
5. Optional extra notes or examples of usage.

Docstrings make it easier for us and other data scientists or engineers to use, read, and maintain our code in the future. Remember that even though computers execute it, code is actually written for humans to read (otherwise we'd just be writing the 1s and 0s that the computer operates on).

Every function in Python comes with a [`__doc__` attribute](https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy) that holds the contents of the function's docstring.

In [2]:
print(split_and_stack.__doc__)

Split a DataFrame's columns into two halves and then stack
    them vertically, returning a new DataFrame with `new_names` as the
    column names.

    Args:
      df (DataFrame): The DataFrame to split.
      new_names (iterable of str): The column names for the new DataFrame.

    Returns:
      DataFrame
    


Notice that the `__doc__` attribute contains the raw docstring, including any tabs or spaces that were added to make the words visually line up.

To get a cleaner version, with those leading spaces removed, we can use the [`getdoc()` function](https://docs.python.org/3/library/inspect.html#retrieving-source-code) from the `inspect` module.

In [3]:
import inspect
print(inspect.getdoc(split_and_stack))

Split a DataFrame's columns into two halves and then stack
them vertically, returning a new DataFrame with `new_names` as the
column names.

Args:
  df (DataFrame): The DataFrame to split.
  new_names (iterable of str): The column names for the new DataFrame.

Returns:
  DataFrame


Consistent style makes a project easier to read, and the Python community has evolved several standards for how to format docstrings. [Google style](http://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings) and [Numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html) are the most popular formats.

However, since Numpydoc takes up more vertical space, we'll focus on Google style to keep the examples compact and legible.
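For comparison, here is a rough sketch of what a Numpydoc-style docstring looks like. The function `split_in_half` is a hypothetical, list-based helper invented for this illustration; it is not part of the project's code.

```python
def split_in_half(items):
    """Split a list into a left half and a right half.

    Parameters
    ----------
    items : list
        The list to split.

    Returns
    -------
    left : list
        The first ``len(items) // 2`` elements.
    right : list
        The remaining elements.
    """
    # Hypothetical helper: integer division puts the extra element
    # of an odd-length list into the right half.
    half = len(items) // 2
    return items[:half], items[half:]


left, right = split_in_half([1, 2, 3, 4])
print(left, right)
```

Each section header is underlined and each argument takes two lines, which is why the same information occupies more vertical space than it would in Google style.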

In Google style:

1. The docstring starts with a concise description of what the function does.
2. Next comes the "Args" section, where we list each argument name, followed by its expected type in parentheses, and then its role in the function. If an argument has a default value, mark it as "optional" when describing the type. If the function does not take any parameters, leave this section out.
3. The next section is the "Returns" section, where we list the expected type or types of what gets returned.

A docstring can also contain two additional pieces of information:

4. Description of errors raised, if any.
5. Optional extra notes or examples of usage.

If our function intentionally raises any errors, we should add a "Raises" section. We can also include any additional notes or examples of usage in free-form text at the end.
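As a minimal sketch of how all of these sections can fit together in Google style, consider the function below. The name `count_letter` and its behavior are assumptions made for this illustration, not code from the project.

```python
def count_letter(content, letter):
    """Count the number of times `letter` appears in `content`.

    Args:
      content (str): The string to search.
      letter (str): The letter to search for.

    Returns:
      int

    Raises:
      ValueError: If `letter` is not a one-character string.

    The comparison is case-sensitive, so "A" and "a" are counted
    separately.
    """
    # Validate the input up front so the documented error is actually raised.
    if not isinstance(letter, str) or len(letter) != 1:
        raise ValueError("`letter` must be a single character string.")
    return sum(1 for char in content if char == letter)


print(count_letter("Banana", "a"))
```

Anyone calling `count_letter("Banana", "an")` gets a `ValueError`, and the "Raises" section tells them so before they ever run the code.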

Now that we know how to make our functions easier to understand, let's look at how we can also make them easier to test, debug, and change. The [Don't repeat yourself](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) principle, also known as **DRY**, and the **Do One Thing principle** are good ways to ensure that our functions are well designed and easy to test.

When we write code to look for answers to a research question, it is totally normal to copy and paste a bit of code, tweak it slightly, and re-run it. However, this kind of repeated code can lead to real problems.

One of the problems with copying and pasting is that it is easy to accidentally introduce errors that are hard to spot.

Another problem with repeated code is that if we want to change something, we often have to do it in multiple places. For instance, if we realized that our CSVs used the column name "label" instead of "labels," we would have to change our code in six places. Repeated code like this is a good sign that we should write a function.

Wrapping the repeated logic in a function and then calling that function several times makes it much easier to avoid the kind of errors introduced by copying and pasting.
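A minimal sketch of this idea, using only the standard library, might look like the following. The function name `load_features_and_labels`, the `label_column` parameter, and the in-memory CSV strings are hypothetical stand-ins, not the project's actual code or data.

```python
import csv
import io


def load_features_and_labels(csv_text, label_column="labels"):
    """Split CSV text into feature rows and a list of labels.

    Args:
      csv_text (str): CSV content with a header row.
      label_column (str, optional): Name of the column holding the labels.

    Returns:
      tuple of (list of dict, list of str): The feature rows and the labels.
    """
    features, labels = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove the label from the row; what remains are the features.
        labels.append(row.pop(label_column))
        features.append(row)
    return features, labels


# Hypothetical stand-ins for the contents of three CSV files.
train_csv = "x,labels\n1,a\n2,b\n"
val_csv = "x,labels\n3,a\n"
test_csv = "x,labels\n4,b\n"

# The same logic is reused three times instead of being pasted three times.
train_X, train_y = load_features_and_labels(train_csv)
val_X, val_y = load_features_and_labels(val_csv)
test_X, test_y = load_features_and_labels(test_csv)
print(train_y, val_y, test_y)
```

With this structure, a change to the label column name happens in one place: the default value of `label_column`, or the argument passed at the call site.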

Instead of one big function that does several things, we could have several more nimble functions that each do one thing.

We get several advantages from splitting a function into smaller functions. Our code becomes:

* More flexible
* More easily understood
* Simpler to test
* Simpler to debug
* Easier to change
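To make the Do One Thing principle concrete, here is a small sketch in which loading and summarizing are kept in separate functions. The names `load_column` and `summarize` and the sample data are assumptions chosen for this illustration.

```python
import csv
import io
import statistics


def load_column(csv_text, column):
    """Read one numeric column from CSV text.

    Args:
      csv_text (str): CSV content with a header row.
      column (str): Name of the column to read.

    Returns:
      list of float
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def summarize(values):
    """Compute the mean and sample standard deviation of `values`.

    Args:
      values (list of float): At least two numbers.

    Returns:
      dict: A dictionary with the keys "mean" and "stdev".
    """
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


# Hypothetical sample data standing in for a real CSV file.
heights = load_column("height\n150\n160\n170\n", "height")
print(summarize(heights))
```

Because each function has a single responsibility, `summarize` can be tested with a plain list of numbers, and `load_column` can be reused for tasks that have nothing to do with summary statistics.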