# Strings and Subprocesses

---

## Overview

1. String Formatting:
    1. String operations 
    2. Olde style
    3. New style
    4. f-strings
    5. Template strings
2. Subprocesses
    1. Purpose
    2. OS dependency
    3. `subprocess`
        * `check_output`
        * `run`
        * Syntax
        * Blocking vs Non-blocking (`Popen`)

---

## String Formatting

There is *always* more than one way to do something with the same result in programming. However, there is a specific line from the Zen of Python that explains what to do in this scenario:
> There should be one-- and preferably only one --obvious way to do it.

However, when it comes to *string formatting*, there are 4 ways to do it...and each has its place.

### Setup

We will be working with a couple of variables throughout the string formatting demonstration. So, let's get them out of the way now:

In [None]:
fName = 'Sharkus'
job = 'GSI'
power = 0x2329

dict_of_types = {
    'string': 'spam',
    'int': 42,
    'exp': 1e12,
    'float': 3.14
}

### String Operations

When considering `str` objects, they have *a lot* of methods.

In [None]:
[i for i in dir(str) if not i.startswith('_')]

However, strings have very few *operations*.

In [None]:
# Concatenation
fName + job

Making simple strings with concatenation makes sense, but building complete statements, log entries, or complex outputs this way is tedious and error-prone.

In [None]:
greeting = 'Hello, my name is ' + fName +\
', and I have ' + str(power) + ''' power
 and currently working as a ''' + job
print(greeting)

In [None]:
# Multiplicity


In [None]:
# What about difference?


In [None]:
# Division


Not to be particularly rigorous, but what about `modulo` math?

In [None]:
# Modulo


That's different...<br/>
The modulo operator (`%`) behaves differently when it is used on strings. There it is called the *formatting* or *interpolation* operator, and it has its own syntax.
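
As a rough sketch of what the operator cells above might show (the exact expressions are assumptions, not the original live-coded cells): repetition works, subtraction and division raise a `TypeError`, and `%` interpolates instead of doing arithmetic.

```python
fName = 'Sharkus'
job = 'GSI'

# Multiplication by an int repeats the string
repeated = fName * 2
print(repeated)

# Subtraction and division are not defined between two strings
unsupported = []
for label, operation in [('-', lambda: fName - job), ('/', lambda: fName / job)]:
    try:
        operation()
    except TypeError as err:
        unsupported.append(label)
        print(label, 'failed:', err)

# Modulo does no arithmetic on a string; it interpolates the value instead
formatted = 'Hello, %s' % fName
print(formatted)
```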

## Ye olde style: `printf`

In [None]:
# Syntax


The biggest issue with `%` string formatting is that it requires the user to *declare* their types beforehand.

In [None]:
# Quote
print('''In order to maintain air-speed velocity, 
a %s needs to beat its wings %d times every 
second, right?''' % ('swallow', 42))

See the difference? `%s` means string, and `%d` means integer. There is a [whole list of these](https://docs.python.org/3/library/stdtypes.html#old-string-formatting). These are known as *conversion types*. Using them, the user indicates what type of object goes in each place, and the interpreter figures out how to convert it to a string.
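
To make the conversion types concrete, here is a minimal sketch that pushes the same integer through several of them. The choice of conversions is illustrative; only `power` comes from the setup cell.

```python
power = 0x2329  # same value as in the setup cell (9001 in decimal)

as_string = '%s' % power      # %s calls str() on the value
as_integer = '%d' % power     # %d formats a number as a decimal integer
as_hex = '%x' % power         # %x formats it in hexadecimal
as_exponent = '%e' % power    # %e uses scientific notation
as_float = '%.2f' % power     # %.2f keeps two decimal places

print(as_string, as_integer, as_hex, as_exponent, as_float)

# A mismatched type is rejected: %d needs a number, not a str
try:
    '%d' % 'spam'
    rejected = False
except TypeError:
    rejected = True
```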

### Placeholder Unpacking

In [None]:
'%(string)s %(int)d %(exp)e %(float)d' % (dict_of_types)

There is a lot to be seen in the last example. Let's walk through it.

1. Dictionary of values of different types
2. Placeholders that link to the dictionary keys
3. Conversion of types
4. A dictionary passed directly, with no explicit unpacking

The `%` acts as a placeholder, so we can make reusable template strings. This can be helpful for templated text such as SQL statements.

In [None]:
# Greeting example: old style
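
One way the greeting from the concatenation example could be written in the old style. This is a sketch; the wording is condensed onto one line for readability.

```python
fName = 'Sharkus'
job = 'GSI'
power = 0x2329

# Values are supplied as a tuple, in the same order as the placeholders
greeting = ('Hello, my name is %s, and I have %d power '
            'and currently working as a %s') % (fName, power, job)
print(greeting)
```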


### The Drawbacks
* Can be burdensome to read and write.
* Not very flexible or extensible
* Discouraged by the Python Devs:
> “The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly).”

## The 'new-style'

In [None]:
# Quote
quote = '''In order to maintain air-speed velocity,
a {} needs to beat its wings {} times every second, right?'''
print(quote.format('swallow', 42))

What's more is that you can use the placeholders to index the arguments.

In [None]:
# Argument Unpacking
args = dict_of_types.values()

# index placeholders (the order used here is just one possible choice)
idx_str = '{3} {2} {1} {0}'

idx_str.format(*args)

Again, I slipped in something extra here. You have seen it before, mainly in `print` calls: `*` *unpacking*.

That is cool, right? But what if you still wanted to convert the values like we did last time? We can use a 'mini-language' to control conversion and precision. The full documentation of the mini-language can be found [here](https://docs.python.org/3/library/string.html#formatspec).

In [None]:
print(
    '\n'.join(
        [
            '{:*^10}', # Fill 
            '{:08b}',  # Binary
            '{:.2e}',  # Scientific Notation
            '{:.0f}'   # Float rounded to zero decimal places
        ]  
    ).format(*args)
)

Like 'old-style', we can still use keyword arguments as well. Just as `*` unpacks a sequence into positional arguments, `**` unpacks a dictionary into keyword arguments.

In [None]:
print(
    '\n'.join(
        [
            '{string:*^10}',
            '{int:08b}',
            '{exp:.2e}',
            '{float:.0f}'
        ]
    ).format(**dict_of_types)
)

In [None]:
# Greeting example: new style
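
A sketch of how the same greeting might look with `.format()`, once with positional placeholders and once with keyword placeholders plus a format spec. The keyword names `name` and `power` are illustrative choices.

```python
fName = 'Sharkus'
job = 'GSI'
power = 0x2329

# Positional placeholders are filled in argument order
positional = ('Hello, my name is {}, and I have {} power '
              'and currently working as a {}').format(fName, power, job)

# Keyword placeholders can also carry a format spec (#x = hex with 0x prefix)
keyword = 'Hello, my name is {name}, and I have {power:#x} power'.format(
    name=fName, power=power)

print(positional)
print(keyword)
```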


Now, why does this matter? What could you use it for? Think of some use cases.

### Generating logs

In [None]:
import string
import uuid
template = 'item: {}, guid: {}, character: {}'
for i, s in enumerate(string.punctuation):
    print(template.format(i, uuid.uuid4(), s))

### The Drawbacks

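1. Having to write `.format()` every time
2. Cannot reference local variables directly; every value must be passed in as an argument
3. Placeholder evaluation is 'dumb': fields support attribute and index lookups, but not arbitrary expressions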
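
The last drawback can be illustrated with a small sketch. It assumes we want to double `power` inside the placeholder; `.format()` treats the whole field as a key name instead of evaluating it.

```python
power = 0x2329

# The field name is looked up literally, so the expression becomes a missing key
try:
    '{power * 2}'.format(power=power)
    error_key = None
except KeyError as err:
    error_key = err.args[0]

# The expression has to be evaluated first and passed in as an argument
doubled = '{}'.format(power * 2)

print(repr(error_key), doubled)
```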

## The *newest-style*: f-strings

New in Python 3.6, f-strings look and feel like `{}` formatting, but with a little extra under the hood.

In [None]:
# Greeting example: f-strings
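
A possible f-string version of the greeting, sketched with the setup variables referenced directly inside the braces:

```python
fName = 'Sharkus'
job = 'GSI'
power = 0x2329

# The f prefix lets the placeholders name variables in the current scope
greeting = (f'Hello, my name is {fName}, and I have {power} power '
            f'and currently working as a {job}')
print(greeting)
```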


The above shows that I no longer need to worry about the order of arguments within a `.format()` call, or about passing keyword arguments. I can directly reference the variables themselves.

You may ask, "Oh Senpai, what about formatting and type conversion?" and I would say, "Young padawan, there is much doubt in your voice. Still convert and format, we can."

In [None]:
# Convert and format

print(
    '\n'.join(
        [
            f'{dict_of_types["string"]:*^10}',
            f'{dict_of_types["int"]:08b}',
            f'{dict_of_types["exp"]:.2e}',
            f'{dict_of_types["float"]:.0f}'
        ]
    )
)

### The important part

Besides just doing type conversion and formatting, there was something else a little more subtle you probably can pick up on: *expression interpolation*.

This is just a fancy way of saying that expressions within `{}` are evaluated and the output is converted into strings.
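
As a small illustration (the expressions are arbitrary choices), method calls, function calls, and arithmetic can all sit inside the braces:

```python
fName = 'Sharkus'
power = 0x2329

# Each expression is evaluated first, then its result is converted to a string
summary = f'{fName.upper()} has {len(fName)} letters and {power // 1000}k power'

# A format spec can still follow the expression
as_hex = f'{power + 0:#x}'

print(summary)
print(as_hex)
```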

In [None]:
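import random
from collections import namedtuple

# bar: square a number
def bar(x):
    return x ** 2

# foo: pass back a namedtuple, or a default when no value is given
# (one possible completion of the original stub)
Squared = namedtuple('Squared', ['value', 'square'])

def foo(value=0):
    return Squared(value, bar(value))

for i in range(0, 100, 10):
    print(f'{foo(random.randint(i, 100))}')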

## Template Strings

To use Template strings, you must import them from the `string` module.

In [None]:
from string import Template

In [None]:
# Greeting example: Template strings
t_string = Template('''Hello, my name is $name, 
                    I have $power power, 
                    and currently working as a $job''')

# Template substitution


---

## Subprocesses

One of the biggest strengths of Python is its use as a *glue* language. It can 'glue' together a bunch of programs into a highly extensible & flexible pipeline.

### Purpose (Obligatory wall of text)
One of the most common, yet complicated, tasks that most programming languages need to do is spawning new processes. This could be as simple as seeing what files are present in the current working directory (`ls`) or as complicated as creating a program workflow that *pipes* output from one program into another program's input. <br/><br/>
Many such tasks are easily taken care of through the use of Python libraries and modules (`import`) that *wrap* the programs into Python code, effectively creating Application Programming Interfaces (API). <br/><br/>
However, there are many use cases that require the user to make calls to the terminal from ***within*** a Python program.

### Operating System Conundrum

First, we need to address the following issue. As many in this class have found out, Python can be installed on most operating systems, but doing the same thing in one operating system (Unix) may not always yield the same results in another (Windows).<br/><br/>
The very first step to making a program **"OS-agnostic"** is through the use of the `os` module.

In [None]:
import os

The `os` module wraps OS-specific operations into a set of standardized commands. <br/>
For instance, the Linux end-of-line (EOL) character is a `\n`, but `\r\n` in Windows. In Python, we can just use the following:

In [None]:
os.linesep

The attribute above reflects the current environment: its value is the EOL sequence of the operating system Python is running on.

Another example: in a Linux environment, one must use the following command to list the contents of a given directory:
```
ls -alh 
```

In Windows, the equivalent is as follows:
```
dir
```

Python allows users to use a single command, regardless of the OS:

In [None]:
os.listdir()

However, the biggest issue for creating an OS-agnostic program is ***paths*** <br/>
Windows: `"C:\Users\MDS\Documents/"`<br/>
Linux: `/mnt/c/Users/MDS/Documents/`<br/><br/>
Enter Python:

In [None]:
# path joining
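
A sketch of path joining with the standard library. The path components are taken from the example paths above; `pathlib` is shown as an alternative that the original cell may or may not have used.

```python
import os
from pathlib import Path

# os.path.join inserts the separator that is correct for the current OS
docs = os.path.join('Users', 'MDS', 'Documents')

# pathlib builds the same path with the / operator
same = Path('Users') / 'MDS' / 'Documents'

print(docs)
print(same)
```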


## ***NOTE***: From here on out, this notebook will *only* work on **Linux**

### `subprocess`

If you Google how to run shell commands without specifying Python 3.x, you will likely get an answer that includes `popen`, `popen2`, or `popen3`. These were once the most common ways to *open* a new *p*rocess. The `subprocess` module (added in Python 2.4) replaced them, and since Python 3.5 its recommended entry point is a single function called `run`.

In [None]:
# Import and alias
import subprocess as sp

The first things we will look at are trivial examples that demonstrate just capturing the *output* (stdout) of a program.

#### `check_output`

In [None]:
# check_output returns a bytestring by default, so I set encoding to convert it to strings.
print(
    sp.check_output(
        [
            'ls',          # Command
            '-ahl',        # Command line arguments
            os.getcwd()    # ...
        ], 
        encoding='utf_8')  # Change from bytes to string
    
)

However, while `check_output` is still in the `subprocess` module, the same call can easily be written with the more flexible `run` function.

#### `run`

In [None]:
sub = sp.run(
    [
        'ls',                # The command we want to run
        '-ahl',              # Arguments for the command
        os.getcwd()          # ...
    ],
    encoding='utf_8',        # Convert bytes to str
    stdout=sp.PIPE,          # Where to send the output
    check=True               # Whether to raise an error if the process fails
)              

In [None]:
# Let's see what sub can do
[i for i in dir(sub) if not i.startswith('_')]

In [None]:
print(sub.stdout)

The main utility of `check_output` is to capture the output (stdout) of a program. With `run`, passing `stdout=subprocess.PIPE` captures the output as well, along with the return code. A return code signifies the program's exit status: 0 for success, anything else for failure.

In [None]:
sub.returncode

With our `run` code above, our program ran to completion, exiting with status 0. The next example shows a different status.

In [None]:
print(
    sp.run(
        'exit 1',      # Command & arguments
        shell = True   # Run from the shell
    )
)

However, if the `check=True` argument is used, it will raise a `CalledProcessError` if your program exits with anything different than 0. This is helpful for detecting a pipeline failure, and exiting or correcting before attempting to continue computation.

In [None]:
print(
    sp.run(
        'exit 1',      # Command & arguments
        shell = True,  # Run from the shell
        check = True   # Check exit status
    )
)
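
In a pipeline, the `CalledProcessError` would typically be caught so the program can react to it. A minimal sketch, assuming a POSIX shell is available (as in the cells above):

```python
import subprocess as sp

failed_code = None
try:
    sp.run(
        'exit 1',      # Command that always fails
        shell=True,    # Run from the shell
        check=True     # Raise CalledProcessError on a non-zero exit status
    )
except sp.CalledProcessError as err:
    # The exception carries the command and its exit status
    failed_code = err.returncode
    print(f'Command {err.cmd!r} failed with exit status {err.returncode}')
```

From here, a pipeline could log the failure, retry, or stop before later steps run on missing data.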

#### Syntax

Hopefully, you have picked up that I seemingly used two different syntaxes when using `run`:<br/>
1. A list of arguments: `subprocess.run(['ls', '-ahl', os.getcwd()], ...)` 
2. A string and `shell`: `subprocess.run('exit 1', shell = True, ...)`

The preferred way of using `run` is the first. This is mainly for security (it prevents shell injection attacks), but it also lets the module take care of any required escaping and quoting of arguments, for a pseudo-OS-agnostic approach. That said, some programs only work on one OS, so when the command is fixed and contains no user-supplied input, the choice between the two forms often comes down to habit.
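
To illustrate the shell injection concern, here is a sketch that passes the same text to `echo` both ways. The variable `user_input` is a hypothetical stand-in for untrusted input, and a POSIX shell with `echo` is assumed.

```python
import subprocess as sp

user_input = 'hello; echo injected'   # hypothetical untrusted input

# List form: the whole string reaches echo as one literal argument
safe = sp.run(
    ['echo', user_input],
    stdout=sp.PIPE,
    encoding='utf_8'
)

# String + shell=True: the shell interprets ';' and runs a second command
unsafe = sp.run(
    'echo ' + user_input,
    shell=True,
    stdout=sp.PIPE,
    encoding='utf_8'
)

print(repr(safe.stdout))
print(repr(unsafe.stdout))
```

With the list form the semicolon is printed as plain text. With `shell=True` it separates two commands, which is exactly what an attacker would exploit.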

There are some guidelines though:
1. Sequence (list) of arguments is generally preferred
2. A str is appropriate if the user is just calling a program with no arguments
3. The user should use a str to pass arguments if `shell` is `True`

Your next question should be, "What is `shell`?"

The shell is the command interpreter behind your terminal/command prompt. This is the environment in which you call `ls`/`dir`. It is also where users can define variables. More importantly, this is where your *environment variables* are set...like `PATH`.<br/><br/>
By using `shell = True`, the user can now use shell-based environment variable expansion from within a Python program.

In [None]:
print(
    sp.run(
        'echo $PATH',            # Command
        shell = True,            # Use the shell
        stdout=sp.PIPE,          # Where to send it
        encoding='utf_8'         # Convert from bytes to string
    ).stdout.split(':')[:5]      # Look at the output
)

For the most part, you shouldn't need to use `shell`, simply because Python has modules in the standard library that can do most of what shell commands do. For example, `mkdir` can be done with `os.mkdir()`, and `$PATH` can be read from `os.environ['PATH']` (note that `sys.path` is Python's module search path, not the shell's `PATH`).
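
A sketch of the shell-free equivalents, using `os.environ` to read `PATH` and `os.mkdir()` to create a directory. The directory name `results` is a made-up example, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

# Read PATH from the environment; os.pathsep is ':' on Linux and ';' on Windows
path_dirs = os.environ.get('PATH', '').split(os.pathsep)
print(path_dirs[:5])

# Create a directory without calling the mkdir program
with tempfile.TemporaryDirectory() as tmp:
    new_dir = os.path.join(tmp, 'results')   # hypothetical directory name
    os.mkdir(new_dir)
    created = os.path.isdir(new_dir)

print(created)
```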

#### Blocking vs Non-blocking

The last topic of this lecture is "blocking". This is computer science lingo/jargon for whether or not a program ***waits*** until something is complete before moving on. Think of this like a really bad website that takes forever to load because it is waiting until it has rendered all its images first, versus the website that sets the formatting and text while it works on the images.

For the purposes of instruction, here are a few notes:
1. `subprocess.run()` is blocking (it waits until the process is complete)
2. `subprocess.Popen()` is non-blocking (it starts the command, then moves on without waiting for it to finish)

As said before, ***most*** use cases can be taken care of through the use of `run()`. However, `Popen()` allows the user a more flexible control of the subprocess call. `run()` is just a *wrapped* version of `Popen()` that simplifies use. However, `Popen()` can be used *almost* exactly the same way (albeit with more optional parameters).

An example use case for `Popen()` is if the user has some intermediate data that needs to get processed, but the output of that data doesn't necessarily affect the rest of the pipeline.
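
The difference can be sketched with a child process that sleeps briefly. The child command is a made-up stand-in for a slow job; `sys.executable` is used so the sketch does not depend on any particular external program.

```python
import subprocess as sp
import sys

# Hypothetical slow job: a child Python process that sleeps, then prints
proc = sp.Popen(
    [sys.executable, '-c', 'import time; time.sleep(0.5); print("done")'],
    stdout=sp.PIPE,
    encoding='utf_8'
)

# Popen returned immediately; poll() gives None while the child is running
still_running = proc.poll() is None
print('Child still running:', still_running)
print('Doing other work in the meantime...')

# communicate() blocks until the child exits and collects its output
output, _ = proc.communicate()
print(output.strip(), proc.returncode)
```

Replacing `Popen` with `run` here would pause the parent for the full half second before any of the later lines executed.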

#### `Popen`

In [None]:
# Use context manager to handle process while it is running,
# and gracefully close it
with sp.Popen(
    [
        'ls',         # Command
        '-ahl',       # Command line arguments
        os.getcwd()   # ...
    ],
    encoding='utf_8', # Convert from byte to string
    stdout=sp.PIPE    # Where to send it
) as proc:            # Enclose and alias the context manager
    print(
        proc.stdout.read() # Look at the output
    )

Furthermore, `Popen` can create background processes. As such, `Popen` has a lot more functionality than `run`.

In [None]:
sub_popen = sp.Popen(
    [
        'ls',          # Command
        '-ahl',        # Command line arguments
        os.getcwd()    # ...
    ],
    encoding='utf_8',  # Convert from byte to string
    stdout=sp.PIPE     # Where to send it
)
for j in dir(sub_popen):
    if not j.startswith('_'):
        print(j)
sub_popen.kill()         # Terminate the process
sub_popen.communicate()  # Drain the pipe and wait for the process to exit