# Launching Programs
## Introduction

In the previous labs, you learned how to interact with your system using modules from the Python standard library such as `os` or `shutil`. But sometimes there is no library for the system operation you need, or the library is too complicated to use. This notebook shows you how to launch system commands or programs directly from Python.

*We have now arrived in the second half of the book ([chapters 11-20](https://automatetheboringstuff.com/#toc)), where we have learned all the Python knowledge we need and move on to more specialized topics for solving specific tasks. As we won't have time to look at everything, we are skipping most of the rest of the book. Remember that those chapters exist, though: you might want to look at them if you ever need to do one of those things.*

*The topic of this lab (launching external programs) is covered in book [chapter 17](https://automatetheboringstuff.com/2e/chapter17/), but that chapter teaches an older way of spawning subprocesses and also contains various other topics which won't be part of this course (dealing with times/dates and multithreading).*

**Thus, we recommend that you skip the book chapter, and instead read an excellent [Tutorial by Digital Ocean](https://www.digitalocean.com/community/tutorials/how-to-use-subprocess-to-run-external-programs-in-python-3)** on this topic.

If you have not worked in a shell / command-line environment before, it's also recommended to read [the "The Shell" chapter](https://missing.csail.mit.edu/2020/course-shell/) of ["The Missing Semester of Your CS Education"](https://missing.csail.mit.edu/), an excellent resource by the "Computer Science and Artificial Intelligence Laboratory" at MIT. While only some parts are required for this lab ("What is the shell?" and "Using the shell"), the rest will most likely be useful in your further studies and/or career.

### Optional resources

- [Python docs: subprocess — Subprocess management](https://docs.python.org/3/library/subprocess.html)
- [PyMOTW: subprocess — Spawning Additional Processes](https://pymotw.com/3/subprocess/index.html)

## Summary

### Basic Usage

In [None]:
import subprocess
subprocess.run(["uname", "-a"])

After importing the `subprocess` module, the `subprocess.run` function can be used. It takes a list of strings as its argument: the first element is always the name of the command, and each following element is one argument to that command, just like in a shell. The `uname` command used here, for example, outputs system information such as the kernel version and CPU architecture (try running it!).

**Note:** You should only need to use `subprocess.run` and perhaps `subprocess.Popen`.

Functions such as `subprocess.call`, `subprocess.check_call`, `subprocess.check_output`, `subprocess.getoutput`, `os.system`, `os.popen*`, `os.spawn` have all been superseded by `subprocess.run`. **Some of them are insecure**, so be wary of resources recommending them. For the exercises below, **use `subprocess.run` only**.
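As a rough guide for porting older code, the following sketch shows how a `subprocess.check_output` call can be expressed with `subprocess.run`. It uses the current Python interpreter (`sys.executable`) as the external program so that it does not depend on any particular system command; that choice is an assumption made for portability and is not part of the lab.

```python
import subprocess
import sys

# Hypothetical command: a child Python process that prints one line.
command = [sys.executable, "-c", "print('hello')"]

# Older style: returns the captured stdout, raises CalledProcessError on failure.
old_style = subprocess.check_output(command, text=True)

# Same behaviour with subprocess.run: capture stdout, raise on a non-zero exit code.
new_style = subprocess.run(command, stdout=subprocess.PIPE, text=True, check=True).stdout

print(old_style == new_style)
```

Both variables should hold the same string; the `subprocess.run` form has the advantage that the returned object also exposes `returncode` and `stderr`.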

### Handling of arguments

When using a shell to call tools, arguments are split by spaces. Thus, running `uname -a` gets split into:

- `uname`
- `-a`

When an argument itself contains a space, quotes can be used (`'` or `"` depending on the shell and desired outcome). For example, `rm file1.txt file2.txt` is used to remove two files, but `rm "file with spaces.txt"` gets split into:

- `rm`
- `file with spaces.txt`

and would be used to remove a file with spaces in its filename.

When calling tools with `subprocess.run`, you instead supply a list of strings, as with `["uname", "-a"]` above. Make sure you know the difference between:

- `["sometool", "-i", "filename.txt"]` (Shell: `sometool -i filename.txt`)
- `["sometool", "-i filename.txt"]` (Shell: `sometool "-i filename.txt"`)

The second example will not work properly, as it passes `-i filename.txt` as one argument instead of two arguments.
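To see the difference in practice, here is a small sketch. It assumes a stand-in for `sometool`: a child Python process (started via `sys.executable`) that simply prints the arguments it received. The helper name `show_args` is made up for this illustration. `shlex.split` from the standard library is used to show how a POSIX shell would split a quoted command line.

```python
import shlex
import subprocess
import sys

# shlex.split mimics how a POSIX shell splits a command line into arguments.
print(shlex.split('rm "file with spaces.txt"'))

def show_args(args):
    """Run a child Python process that reports the arguments it received."""
    child_code = "import sys; print(sys.argv[1:])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

print(show_args(["-i", "filename.txt"]))  # two separate arguments
print(show_args(["-i filename.txt"]))     # one argument containing a space
```

The first call reports two arguments, the second only one, which is why a real tool would not recognise `-i filename.txt` as an option followed by a file name.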

### Library vs. System Command

Usually, it is preferable to use a Python library to accomplish a task, if possible, instead of launching an external program or system command. Libraries allow you to handle input/output data with appropriate data types and structures, instead of the "flat" unstructured text data returned by executing a system command. It may take more time to get to know the library at first, and to set it up, but it will save you time in handling and parsing data in the long run.

#### Security

By default, subprocess does not invoke any shell with the command specified, but instead calls the external command directly. This mitigates some security issues, where unrelated commands could be injected if arguments are controlled by (malicious) user input. It also means that you do **not have to worry** about most special characters: While e.g. `'` and `"` inside a string have a special meaning for a shell, they are not a problem when using `subprocess` with the default setting.

You should **avoid using `shell=True`** when using subprocess. As soon as a shell is involved between Python and the command being executed, you will have to **deal with special characters**, will potentially have **security issues** when not doing so, and you will need to **take care of platform differences** between different shells/operating systems.

Even with `shell=False` (the default), calling an external command with user-controlled arguments might still bear more risk than using a purpose-built, well-designed library, so consider your options carefully.
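The following sketch illustrates why the default (no shell) is safer. The value of `user_input` is a hypothetical piece of untrusted input containing `;`, which a shell would interpret as a command separator. The child process is again the Python interpreter itself, used here only as a stand-in for an external tool.

```python
import subprocess
import sys

# Hypothetical untrusted input that would run a second command in a shell.
user_input = "notes.txt; echo injected"

# The child process simply reports the single argument it received.
child_code = "import sys; print(sys.argv[1])"
result = subprocess.run(
    [sys.executable, "-c", child_code, user_input],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.rstrip("\n")
print(received)
```

Because no shell is involved, the whole string arrives in the child process as one literal argument and the `echo` part is never executed.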

#### Example: File listing

If you need the file listing of a directory, you might be tempted to call the `ls` system command from your Python script like this:

In [None]:
import subprocess
subprocess.run(["ls"])

This seems to work fine, but the approach comes with some pitfalls. Your system may not know the `ls` command if you're running Windows. And even on minimal Linux systems, some fairly common system commands may not be available.
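If you do need an external command, one way to guard against it being missing is to look it up with `shutil.which` first. This is only a sketch; the helper name `run_if_available` and the deliberately nonsensical command name are made up for the example.

```python
import shutil
import subprocess

def run_if_available(command):
    """Run the command only if its executable can be found on the PATH."""
    executable = shutil.which(command[0])
    if executable is None:
        print(f"{command[0]!r} is not available on this system")
        return None
    return subprocess.run([executable, *command[1:]], capture_output=True, text=True)

result = run_if_available(["ls"])
missing = run_if_available(["surely-not-an-installed-tool-12345"])
```

Without such a check, `subprocess.run` raises a `FileNotFoundError` when the executable does not exist.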

Also, there are potential issues with parsing the unstructured output in your code. You may assume that every output line is another file, but what if a filename were to contain a newline character? In the previous labs, you saw how to solve common file management tasks in Python directly - for example, instead of using `ls`, something like:

In [None]:
import pathlib

for path in pathlib.Path.cwd().iterdir():
    print(path)

If you wanted to process the output further, which approach would be more convenient for you to work with: the `ls` command or the `os`/`pathlib` modules?
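As an illustration of such further processing, the sketch below collects the name and size of every `.txt` file in a directory using `pathlib` only. The function name `list_text_files` is made up for this example. Since each entry is a `Path` object, no text parsing is needed and unusual file names are not a problem.

```python
from pathlib import Path

def list_text_files(directory):
    """Return (name, size in bytes) for each .txt file, sorted by name."""
    entries = []
    for path in Path(directory).iterdir():
        if path.is_file() and path.suffix == ".txt":
            entries.append((path.name, path.stat().st_size))
    return sorted(entries)

for name, size in list_text_files("."):
    print(f"{name}: {size} bytes")
```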

### Capturing Output

Often, you'd want to not only print the subprocess output on the terminal, but parse and process it further. To achieve this, the standard output (stdout) can be captured like this:

In [24]:
import subprocess

result = subprocess.run(["echo", "psssst!"], capture_output=True, text=True)
print(f"very quiet output: {result.stdout}")

very quiet output: psssst!



As you can see, the command output is contained in the `stdout` attribute of the result, instead of being printed to the standard output directly. There is also the `stderr` attribute, which captures the standard error stream (typically used by tools to display error messages, rather than "normal" output). To receive the outputs as text strings, specify the `text=True` argument. Otherwise you will get `bytes` objects.
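A short sketch of the points above: the child process (once more the Python interpreter, chosen only so the example runs on any system) writes one line to each stream, and the result is inspected with and without `text=True`.

```python
import subprocess
import sys

# Hypothetical child process writing one line to each output stream.
child_code = "import sys; print('normal output'); print('a warning', file=sys.stderr)"
command = [sys.executable, "-c", child_code]

as_text = subprocess.run(command, capture_output=True, text=True)
print(repr(as_text.stdout))   # a str
print(repr(as_text.stderr))   # stderr is captured separately

as_bytes = subprocess.run(command, capture_output=True)
print(type(as_bytes.stdout))  # a bytes object, since text=True was not given
```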

### Errors and Exceptions

Running programs or subprocesses may result in them terminating with an error. This is indicated through the exit code of the program, which is **zero on successful termination**. Any exit code **other than zero indicates an error**, which you might want to know about in order to handle the situation.  
If given the `check=True` argument, subprocess will raise an exception if the program terminated unsuccessfully. Alternatively, you can check the exit code through the `returncode` attribute after execution.

**Note:** When using `capture_output=True`, this will also capture the error message printed by the external process, thus potentially hiding the underlying issue of the `CalledProcessError`. For debugging purposes, it's recommended to temporarily use `capture_output=False`.
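Both ways of detecting a failure can be sketched as follows. The failing command is a hypothetical one (a child Python process that prints a message to stderr and exits with code 3). Note that with `capture_output=True` the error message is still available, via the `stderr` attribute of the exception.

```python
import subprocess
import sys

# Hypothetical failing command: prints an error message and exits with code 3.
command = [sys.executable, "-c", "import sys; print('boom', file=sys.stderr); sys.exit(3)"]

# Option 1: check=True raises; the captured stderr is available on the exception.
try:
    subprocess.run(command, capture_output=True, text=True, check=True)
except subprocess.CalledProcessError as error:
    caught_code = error.returncode
    caught_message = error.stderr.strip()
    print(f"failed with exit code {caught_code}: {caught_message}")

# Option 2: without check=True, inspect returncode yourself.
result = subprocess.run(command, capture_output=True, text=True)
print(result.returncode)
```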

In [25]:
import subprocess

result = subprocess.run(["uname", "--unknown"], check=True)

uname: unrecognized option '--unknown'
Try 'uname --help' for more information.


CalledProcessError: Command '['uname', '--unknown']' returned non-zero exit status 1.

## Exercises

### Exercise 1: QR Code as string

Surely you're familiar with [QR Codes](https://en.wikipedia.org/wiki/QR_code):

![XKCD 1237: QR Code](https://imgs.xkcd.com/comics/qr_code.png)

*([XKCD 1237: QR Code](https://xkcd.com/1237/))*

[libqrencode](https://fukuchi.org/works/qrencode/) is a library, implemented in C, which can generate such codes. While there are Python packages exposing this functionality to Python, they are either archived ([pyqrencode](https://github.com/bitly/pyqrencode)) or very limited in functionality ([python-libqrencode](http://mubeta06.github.io/python/libqrencode/qrencode.html)). They both also haven't been updated since 2012/2013 and don't seem to be very popular.

To avoid locking us into outdated/archived external dependencies which could pose a problem in the future, we want to use the `qrencode` shell command from Python instead. You can find more information about how to use it [in the ubuntuusers.de wiki](https://wiki.ubuntuusers.de/qrencode/) or [in its manpage](https://manpages.ubuntu.com/manpages/jammy/en/man1/qrencode.1.html).

The `qrencode` command has a way to print a QR code "drawn" with ASCII characters, such as this:

```
##############    ##            ##  ##############
##          ##  ####  ######  ####  ##          ##
##  ######  ##    ######    ##      ##  ######  ##
##  ######  ##  ####  ##  ########  ##  ######  ##
##  ######  ##      ##  ##      ##  ##  ######  ##
##          ##  ##  ##        ##    ##          ##
##############  ##  ##  ##  ##  ##  ##############
                        ##  ####                  
##########  ##########  ########  ##  ##  ##  ##  
  ######      ######      ##                ##    
  ##  ########    ##  ##########    ####    ######
##########    ##  ######      ####  ####        ##
  ####    ######  ##  ######  ##  ############    
######  ##      ######  ##      ##    ##  ##      
##  ##      ##  ####          ##    ######    ####
##    ######  ##  ##  ##      ######  ######  ##  
##          ##    ##  ##  ####  ##################
                ##  ######  ######      ##  ####  
##############  ######      ##  ##  ##  ####  ####
##          ##        ####      ##      ####      
##  ######  ##  ######    ####################    
##  ######  ##  ########  ##      ####  ####  ####
##  ######  ##  ##  ##                ##  ##    ##
##          ##  ##      ##      ####  ##        ##
##############  ##############    ##    ##  ######
```

Write a function `qrcode_ascii(data)`, which takes a string `data` to store in the QR code. It should then use `subprocess` to run `qrencode` as follows:

- `qrencode` needs to get the proper arguments to output the code in `ASCII` format, rather than a PNG file.
- `qrencode` will print a margin (blank space) around the QR code. Don't change anything about that (and also don't change any other `qrencode` settings).
- If `qrencode` produces an error, a `subprocess.CalledProcessError` exception should be raised. To try this, you can pass a string with 3000 characters to your function, which will be too large to represent in a QR code.
- The function should **capture and return** the output as a string (**Hint:** if you see `b"..."`, that's a [bytes object](https://docs.python.org/3/library/stdtypes.html#bytes-objects)).
- Make sure you use **appropriate arguments** to your `subprocess` invocation for the requirements above; don't do things by hand which `subprocess` can do for you.
- Use modern ways of running `subprocess` (Python 3.7+).

In [19]:
import subprocess

def qrcode_ascii(data):
    # check=True raises CalledProcessError if qrencode fails; text=True yields a str.
    result = subprocess.run(["qrencode", "-t", "ASCII", data], text=True, check=True, capture_output=True)
    # Return only the captured output, not the whole CompletedProcess object.
    return result.stdout

Use this separate cell to try out your code.
Your code should work with the example below, but you're free to change it.

In [20]:
print(qrcode_ascii("test"))

                                                          
                                                          
                                                          
                                                          
        ##############      ##  ##  ##############        
        ##          ##          ##  ##          ##        
        ##  ######  ##  ##  ##      ##  ######  ##        
        ##  ######  ##          ##  ##  ######  ##        
        ##  ######  ##    ##  ####  ##  ######  ##        
        ##          ##    ######    ##          ##        
        ##############  ##  ##  ##  ##############        
                        ##  ##                            
        ######  ##########  ##  ######      ##            
          ##  ####          ####  ##  ##      ####        
            ##      ##        ##  ######  ########        
        ##  ##  ###

### Exercise 2: QR Code as image

While the output format of the previous exercise is a nice gimmick, it's rather impractical - most likely, you weren't able to scan the QR code above. We now want to generate a PNG image instead, to get a code you can actually scan (e.g. using your phone).

For this exercise, implement a `qrcode_png(path, data, margin)` function. It should call the `qrencode` tool so that a PNG file is created at the given `path`, with a QR code containing the `data` passed into it.

Further requirements:

- `path` is a `Path` object, pointing to a path where the PNG file should be written to (the desired output file, not a folder).
- `data` is a string with the data which should end up in the QR code.
- `margin` is the width of the margin around the QR code, given **as an integer**.
    - It should be an [**optional argument**](https://realpython.com/python-optional-arguments/#using-python-optional-arguments-with-default-values), that is, it should be possible to call `qrcode_png(path, data)` as well.
    - If `margin` is not given, default to the value mentioned as default in the [manpage of qrencode](https://manpages.ubuntu.com/manpages/jammy/en/man1/qrencode.1.html).
- **Don't do any manual error handling in Python.** Instead, use the error handling as done by `qrencode` already, and invoke it in a way that a `subprocess.CalledProcessError` exception is raised if the tool fails.
- The function doesn't need to return anything.

In [37]:
import subprocess

def qrcode_png(path, data, margin=4):  # margin is optional and defaults to 4
    print(path)
    print(data)
    print(margin)
    # Each option and its value must be separate list elements, and every element must be a string.
    subprocess.run(["qrencode", "-o", str(path), "-m", str(margin), data], check=True)

Use this separate cell to try out your code.
Your code should work with the example below, but you're free to change it.

In [38]:
import codecs
from pathlib import Path

data = codecs.encode("uggcf://lrjgh.or/jngpu?i=qDj4j9JtKpD", "rot13")  # ;)
qrcode_png(Path("qrcode.png"), data)

qrcode.png
https://yewtu.be/watch?v=dQw4w9WgXcQ
4


After running your code, a `qrcode.png` file should appear in the file browser sidebar. Double click it to open the file and ensure it looks correct.

### Exercise 3: Where's Waldo?

Implement a function `search(directory, pattern)` that takes a directory and a search term as arguments (both strings). The function should return a **list of text files** in the given directory that contain the search string. Consider only **text files** (`.txt` extension) in your search. If no text file contains the search string, return an empty list.

Contrary to the previous lab, launch an external process to do the work, instead of implementing this "manually" in Python. The [grep command](https://www.digitalocean.com/community/tutorials/grep-command-in-linux-unix) is a very versatile command line tool which will help you.

- Command to run: `grep -rlw testpath -e waldo --include "*.txt" --exclude-dir .ipynb_checkpoints`
- Reference: [grep manpage](https://manpages.ubuntu.com/manpages/jammy/en/man1/grep.1.html)

Use the following external commands to create two test files:

In [27]:
!echo "waldo" > plain_sight.txt
!echo "waldo" > crowd.txt

Expected output:

```python
>>> search(".", "waldo")
['./crowd.txt', './plain_sight.txt']
```

In [63]:
import subprocess

def search(directory, pattern):
    command = [
        "grep",
        "-rlw",
        str(directory),
        "-e",
        pattern,
        "--include",
        "*.txt",
        "--exclude-dir",
        ".ipynb_checkpoints",
    ]
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            check=True,
        )
    except subprocess.CalledProcessError as error:
        # grep exits with status 1 when nothing matched; other codes are real errors.
        if error.returncode == 1:
            return []
        raise
    # One matching file per output line; sort for a stable result order.
    return sorted(result.stdout.splitlines())

Use this separate cell to try out your code.
Your code should work with the example below, but you're free to change it.

In [64]:
search(".", "waldo")

['./crowd.txt', './plain_sight.txt']