# Command-Line Programs
The **Jupyter Notebook and other interactive tools are great for prototyping code and exploring data**. But sooner or later we will want to use our program in a pipeline to process thousands of data files. In order to do that, we need to make our programs work like other Unix command-line tools.


## Switching to Shell Commands

In this lesson we switch from typing commands in a Python interpreter to typing commands in a shell terminal window (such as bash). A `$` in front of a command tells you to run that command in the shell rather than in the Python interpreter.

In [8]:
# %load ../src/pycli-1.py
#!/usr/bin/env python3
# the shebang line above is only honoured on Unix-like systems
import sys
print('My command-line arguments:', sys.argv)


My command-line arguments: ['/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py', '-f', '/run/user/17802/jupyter/kernel-c828dcaa-1827-45c0-879c-975513e66be6.json']


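The strange name `argv` stands for “argument vector”, a name inherited from C. Whenever Python runs a program, it takes all of the values given on the command line and puts them in the list `sys.argv` so that the program can determine what they were. When a script is run from the shell, the first item, `sys.argv[0]`, is the name of the script. Inside a notebook, as in the output above, the list instead shows how the Jupyter kernel was launched.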
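
As a minimal sketch of how a script might take `sys.argv` apart, the hypothetical helper `split_argv` below (not part of the lesson's source files) separates the script name from the remaining arguments:

```python
import sys


def split_argv(argv):
    """Separate the script name from the remaining command-line arguments."""
    script = argv[0]       # the script as it was invoked
    arguments = argv[1:]   # everything the user typed after the script name
    return script, arguments


if __name__ == '__main__':
    script, arguments = split_argv(sys.argv)
    print('script:', script)
    print('arguments:', arguments)
```

Saved under an assumed name such as `show_args.py` and run as `$ python3 show_args.py download data.csv`, this would report `show_args.py` as the script and `['download', 'data.csv']` as the arguments.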

### Implementing a main guard

A guard around the call to `main()` is useful if your file should work both as a script and as an importable module.

In [None]:
# %load ../src/pycli-2.py
#!/usr/bin/env python3
import datetime

def main():
    print(datetime.datetime.now())

if __name__ == '__main__':
    main()


When a Python file is imported, `__name__` is set to the name of its module (the file name without `.py`). When the same file is run as a script, for example from bash, `__name__` is set to `'__main__'`, so the code can tell whether it is being imported or run directly.

In [12]:
def main():
    pass

if __name__ == '__main__':
    main()  # Or whatever function produces output
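
The difference between importing and running can be checked directly. The sketch below is an illustration only: it writes a throwaway module (the name `guard_demo` is made up) to a temporary directory, imports it with `importlib`, and then executes the same file with `runpy` the way the interpreter would run a script.

```python
import importlib.util
import os
import runpy
import tempfile

# Source of a tiny throwaway module that records its own __name__.
SOURCE = "seen_name = __name__\n"


def names_seen(module_name='guard_demo'):
    """Return (__name__ when imported, __name__ when run as a script)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, module_name + '.py')
        with open(path, 'w') as f:
            f.write(SOURCE)

        # Import the file as a module: __name__ is the module name.
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        # Execute the same file as if it had been started as a script.
        script_globals = runpy.run_path(path, run_name='__main__')

    return module.seen_name, script_globals['seen_name']


print(names_seen())  # ('guard_demo', '__main__')
```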

## Command-Line Arguments
We write a program to download the data we generated yesterday in Sebastian's session:

In [None]:
# %load ../src/pycli-3.py
#!/usr/bin/env python3
import sys
import urllib.request


def main():
    script = sys.argv[0]
    action = sys.argv[1]

    if action == 'download':
        print('Downloading data...')
        url = urllib.request.urlopen('https://heise.de')
        with open('data.csv', 'w') as f:
            f.write(url.read().decode())


if __name__ == '__main__':
    main()

We now add more functionality to our program:

In [None]:
# %load ../src/pycli-4.py
#!/usr/bin/env python3
import sys
import urllib.request
import numpy as np


filename = 'data.csv'


def main():
    if len(sys.argv) < 2:
        raise Exception('Not enough parameters')

    action = sys.argv[1]

    if action == 'download':
        print('Downloading data...')
        url = urllib.request.urlopen('https://heise.de')

        with open(filename, 'w') as f:
            f.write(url.read().decode())

    elif action == 'mean':
        values = np.loadtxt(filename, delimiter=',')
        print(values.mean())

    else:
        print('unknown action \'%s\'' % action)


if __name__ == '__main__':
    main()


The program is taking shape, but there are several things wrong with it:

1. **`main` is too large to read comfortably.**

2. The program only checks that at least one argument was given. Whatever is in `sys.argv[1]` is treated as the action, even if the user meant it as a filename, and any additional arguments are ignored without warning. Silent failures like this are always hard to debug.

3. The program should check that the submitted action is one of the recognized actions before it does any work.

In [18]:
# %load ../src/pycli-5.py
#!/usr/bin/env python3
import sys
import urllib.request
import numpy as np


filename = 'data.csv'


def download():
    print('Downloading data...')
    url = urllib.request.urlopen('https://heise.de')

    with open(filename, 'w') as f:
        f.write(url.read().decode())


def process(action):
    values = np.loadtxt(filename, delimiter=',')

    if action == 'min':
        return values.min()
    elif action == 'max':
        return values.max()
    elif action == 'mean':
        return values.mean()


def main():
    if len(sys.argv) < 2:
        raise Exception('Not enough parameters')

    action = sys.argv[1]
    assert action in ['download', 'min', 'max', 'mean']

    if action == 'download':
        download()
    else:
        print(process(action))  # show the computed statistic


if __name__ == '__main__':
    main()


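
One way to avoid both the long `if`/`elif` chain and the silent failures described above is a dispatch table combined with an explicit usage message. This is only a sketch under stated assumptions: `ACTIONS` and `parse_action` are hypothetical names, the built-ins `min`, `max` and `statistics.mean` stand in for the NumPy calls, and the `download` action is left out for brevity.

```python
import statistics
import sys

# Hypothetical dispatch table: action name -> function computing the result.
ACTIONS = {
    'min': min,
    'max': max,
    'mean': statistics.mean,
}


def parse_action(argv):
    """Return the requested action, or exit with a usage message."""
    usage = 'usage: {} {{{}}}'.format(argv[0], '|'.join(sorted(ACTIONS)))
    if len(argv) != 2:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(usage)
    action = argv[1]
    if action not in ACTIONS:
        sys.exit("unknown action '{}'\n{}".format(action, usage))
    return action


def process(action, values):
    """Apply the function registered for the given action to the values."""
    return ACTIONS[action](values)
```

Because every action is looked up in one table, a typo such as a repeated branch condition cannot hide one of the actions, and a wrong number of arguments is reported instead of being ignored.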

### Exercise: Add automatic downloading
If the file does not exist, download it automatically.
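
One possible starting point, not the only solution, is to check for the file before processing. In this sketch the helper `ensure_file` and its `download` callback parameter are hypothetical; in the lesson's program the callback would wrap the existing `download()` function.

```python
import os


def ensure_file(path, download):
    """Call download(path) only when path does not exist yet.

    Returns True if a download was triggered, False otherwise.
    """
    if os.path.exists(path):
        return False
    download(path)
    return True
```

Passing the download step in as a function keeps the check easy to try out without network access.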

## `argparse`: The easier way
Parsing `sys.argv` by hand quickly becomes an ugly mess, and writing that code is boring. The standard library therefore ships a lazier solution: `import argparse`
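
A minimal sketch of how the program's single `action` argument could be declared with `argparse`. The function name `build_parser`, the description and the help text are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Create a parser that accepts exactly one action from a fixed set."""
    parser = argparse.ArgumentParser(
        description='Download or summarise data.csv')
    parser.add_argument('action',
                        choices=['download', 'min', 'max', 'mean'],
                        help='what to do with the data')
    return parser


# An explicit list is parsed here for illustration;
# calling parse_args() without arguments reads sys.argv[1:] instead.
args = build_parser().parse_args(['mean'])
print(args.action)  # mean
```

With `choices`, an unknown action makes `argparse` print a usage message and exit, so the hand-written length check and `assert` are no longer needed.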