# Building Command-Line Tools with Python 

> Multiple exclamation marks are a sure sign of a diseased mind.
>
> --- Terry Pratchett

The [Jupyter Notebook](https://jupyter.org/), PyCharm, and other graphical interfaces
are great for prototyping code and exploring data, but eventually we may need to apply our code to thousands of data files,
run it with many different parameters, or combine it with other programs as part of a data analysis pipeline.
The easiest way to do this is often to turn our code into a standalone program that can be run in the Unix shell
just like other command-line tools {cite:p}`Tasc2017`.

In this chapter we will develop some **command-line Python programs** that handle input and output in the same way as other shell commands,
can be controlled by several option flags, and provide useful information when things go wrong.
The result will have more scaffolding than useful application code, but that scaffolding stays more or less the same as programs get larger.

After the previous chapters, our Zipf's Law project should have the following files and directories:


```text
zipf/
├── bin
│   └── book_summary.sh
└── data
    ├── README.md
    ├── dracula.txt
    ├── frankenstein.txt
    ├── jane_eyre.txt
    ├── moby_dick.txt
    ├── sense_and_sensibility.txt
    ├── sherlock_holmes.txt
    └── time_machine.txt
```

> **Python Style**
>
> When writing Python code there are many style choices to make.
> How many spaces should I put between functions?
> Should I use capital letters in variable names?
> How should I order all the different elements of a Python script?
> Fortunately,
> there are well-established conventions and guidelines
> for good Python style.
> We follow those guidelines throughout this book
> and discuss them in detail in Appendix **TODO** \@ref(style).

## Programs and Modules 

To create a Python program that can run from the command line,
the first thing we do is add the following to the bottom of the file:

```python
if __name__ == '__main__':
```

This strange-looking check tells us whether the file is running as a standalone program or whether it is being imported as a module by some other program.
When we import a Python file as a module in another program, the `__name__` variable is automatically set to the name of the module, i.e., the file name without its `.py` extension.
When we run a Python file as a standalone program, on the other hand, `__name__` is always set to the special string `"__main__"`.
To illustrate this, let's consider a script named `print_name.py` that prints the value of the `__name__` variable:

```python
print(__name__)
```


When we run this file directly, it will print `__main__`: 

```bash
$ python print_name.py
```

```text
__main__
```

But if we import `print_name.py` from another file or from the Python interpreter, it will print the name of the module, i.e., the file name without its `.py` extension: `print_name`.

```bash
$ python
```

```text
Python 3.7.6 (default, Jan  8 2020, 13:42:34) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: 
Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license"
for more information.
```

```python
>>> import print_name
```
```text
print_name
```

Checking the value of the variable `__name__` therefore tells us whether our file is the top-level program or not. If it is, we can handle command-line options, print help, or whatever else is appropriate;
if it isn't, we should assume that some other code is doing this. 
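
Both cases can also be observed from a single script. The following is a minimal sketch rather than part of the Zipf's Law project: it writes a throwaway file (the name `print_name_demo.py` and the variable `observed` are made up for illustration), loads it once as a module with `importlib`, and runs it once as a top-level program with `runpy`.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical throwaway file that records the value of __name__.
    path = Path(tmp) / 'print_name_demo.py'
    path.write_text('observed = __name__\n')

    # Case 1: load the file as a module, as an import statement would.
    spec = importlib.util.spec_from_file_location('print_name_demo', str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    imported_name = module.observed

    # Case 2: execute the same file as if it were the top-level program.
    script_globals = runpy.run_path(str(path), run_name='__main__')
    script_name = script_globals['observed']

print(imported_name)  # print_name_demo
print(script_name)    # __main__
```

The same file sees two different values of `__name__` depending on how it is loaded, which is what the `if __name__ == '__main__':` check relies on.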

We could put the main program code directly under the `if` statement like this:

```python
if __name__ == "__main__":
    # code goes here
    pass
```

but that is considered poor practice, since it makes testing harder (Chapter **TODO** ref(testing)). Instead, we put the high-level logic in a function, then call that function if our file is being run directly:



```python
def main():
    print('Hello World!')


if __name__ == '__main__':
    main()
```

```text
Hello World!
```


This top-level function is usually called `main`, but we can use whatever name we want.
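
To see why this layout makes testing easier, consider the sketch below. It assumes a hypothetical helper, `build_greeting`, that does not appear elsewhere in this chapter: because the logic lives in functions rather than directly under the `if` statement, another file can import the module and check the helper's return value without triggering `main`.

```python
def build_greeting(name):
    """Return the greeting text (hypothetical helper for illustration)."""
    return f'Hello {name}!'


def main():
    # High-level logic only: delegate the work, then report the result.
    print(build_greeting('World'))


if __name__ == '__main__':
    main()
```

A test can then call `build_greeting('World')` and compare the result with `'Hello World!'`, which would not be possible if the string were only ever printed from code under the `if` statement.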

## Handling Command-Line Options 

The main function in a program usually starts by parsing any options the user gave on the command line.
The most commonly used library for doing this in Python is [`argparse`](https://docs.python.org/3/library/argparse.html), which is part of the standard library. It can handle options with or without arguments, convert arguments from strings to numbers or other types, display help, and much more.

The simplest way to explain how `argparse` works is by example. 

Let's create a short Python program called `script_template.py`:


```python
import argparse


def main(args):
    print('Input file:', args.infile)
    print('Output file:', args.outfile)


if __name__ == '__main__':
    USAGE = 'Brief description of what the script does.'
    parser = argparse.ArgumentParser(description=USAGE)
    parser.add_argument('infile', type=str,
                        help='Input file name')
    parser.add_argument('outfile', type=str,
                        help='Output file name')
    args = parser.parse_args()
    main(args)
```

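
A parser can also be exercised without a shell by passing an explicit list of strings to `parse_args`, which is convenient for experimenting. The sketch below illustrates the features mentioned earlier: an option that takes an argument and is converted to a number, and an option that takes no argument. The flags `--num` and `--verbose` and the function `build_parser` are assumptions made for this example, not part of `script_template.py`.

```python
import argparse


def build_parser():
    """Build a parser with one positional argument and two hypothetical options."""
    parser = argparse.ArgumentParser(
        description='Sketch of a parser with optional flags.')
    parser.add_argument('infile', type=str, help='Input file name')
    # Option with an argument: the string '5' is converted to the integer 5.
    parser.add_argument('-n', '--num', type=int, default=10,
                        help='Number of items to show (hypothetical)')
    # Option without an argument: present means True, absent means False.
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='Print extra detail (hypothetical)')
    return parser


# Pass a list instead of reading sys.argv, as if the user had typed:
#   python some_script.py dracula.txt --num 5 -v
args = build_parser().parse_args(['dracula.txt', '--num', '5', '-v'])
print(args.infile, args.num, args.verbose)  # dracula.txt 5 True
```

When `parse_args` is called with no arguments, as in `script_template.py`, it reads the values from the command line instead.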