# Python: peeking under the hood

In [None]:
!git clone https://github.com/biof309/fall2019.git oct_31

First, let's delve into some behavior that has been demonstrated but not yet explained in depth. We want to understand the difference between

a) the observed output of a statement when a Python expression is not assigned to a variable
b) the observed output of using the print function on the same expression

By way of example:

#### a)

In [1]:
2 + 2

4

#### b)

In [2]:
print(2 + 2)

4


### Are they different?

The two examples above have completely different outcomes, although they look the same superficially. This becomes clearer if we assign each expression to a variable:

#### a)

In [3]:
a = 2 + 2

In [4]:
a

4

#### b)

In [5]:
b = print(2 + 2)

4


In [6]:
b

In the case of a, the result is assigned to the label "a" and when "a" is evaluated as an expression we observe the value we stored.

In contrast, the expression in b displays the result of "2 + 2" as a side effect, but since the print function always returns "None", that is what gets stored in "b". This is why evaluating "b" shows no output, and why we can't do much with the variable.
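A minimal sketch of the same point outside the notebook display machinery. The variable names follow the cells above; the arithmetic on `a` is only there to show that a stored value remains usable.

```python
# print() displays its argument as a side effect and returns None.
b = print(2 + 2)   # shows 4

# Evaluating the expression directly gives a value we can keep working with.
a = 2 + 2

print(b is None)   # True
print(a * 10)      # 40
```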

### Why might I care about this difference?

A simple reason to care is if we like the look of the output and want to capture it as a string to use it elsewhere. We can use the `str` function for that...

In [3]:
my_list = [float(x) for x in range(2,10)]
str(my_list)

'[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]'

In [4]:
print(my_list)

[2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]


In [5]:
str(my_list)[1:-1]

'2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0'

That's nice. It makes some things easier, but it doesn't quite explain what is going on. For example, using different methods of seeing a Path object gives different results:

In [6]:
from pathlib import Path

type

In [8]:
test_path = Path('a_test_file.txt')

In [9]:
test_path

PosixPath('a_test_file.txt')

In [10]:
print(test_path)

a_test_file.txt


In [11]:
str(test_path)

'a_test_file.txt'

### Ok. Enough said. What's under the hood?

It turns out that when we print a variable, `print` calls the object's "hidden" `__str__` method and displays the resulting string. To capture the string, we can call this method ourselves. This is still not exactly the same as `print`, which writes the characters of the string to the output, so escape sequences such as `\n` are rendered and no quotes are shown. It's close enough, though.

In [12]:
test_path.__str__()

'a_test_file.txt'
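The gap between a captured string and what `print` shows is easiest to see with a string that contains an escape sequence. This is a small sketch; `text` is just an illustrative name.

```python
text = "first line\nsecond line"

# repr() shows the string as a literal: quotes plus the escaped newline.
print(repr(text))

# print() writes the characters themselves, so the newline becomes a line break.
print(text)

# For a plain string, str() returns the same string unchanged.
print(str(text) == text)
```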

And finally, the `__repr__` method gives us a more formal representation, which is often valid Python code that can be used to recreate the object we currently have:

In [14]:
test_path.__repr__()

"PosixPath('a_test_file.txt')"

In [16]:
import sys
sys.stdout.write(repr(test_path))

PosixPath('a_test_file.txt')
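To see how the two methods divide the work, here is a sketch with a made-up class. `Sample` and its attributes are hypothetical and exist only for illustration. The last lines use `PurePosixPath` (rather than `Path`) so the result is the same on every operating system.

```python
from pathlib import PurePosixPath


class Sample:
    """Hypothetical class used only to illustrate __str__ and __repr__."""

    def __init__(self, name, volume_ml):
        self.name = name
        self.volume_ml = volume_ml

    def __str__(self):
        # Reader-friendly text, used by print() and str().
        return f"{self.name} ({self.volume_ml} mL)"

    def __repr__(self):
        # Unambiguous text that doubles as code for rebuilding the object.
        return f"Sample({self.name!r}, {self.volume_ml!r})"


s = Sample("buffer", 50)
print(s)         # buffer (50 mL)
print(repr(s))   # Sample('buffer', 50)

# The repr of a pure path is valid Python, so eval() can rebuild an equal object.
p = PurePosixPath("a_test_file.txt")
rebuilt = eval(repr(p), {"PurePosixPath": PurePosixPath})
print(rebuilt == p)   # True
```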

There are lots of hidden methods like this that enable all Python objects to behave the way they do:

In [15]:
[x for x in dir(test_path) if x.startswith('__')]

['__bytes__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__fspath__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rtruediv__',
 '__setattr__',
 '__sizeof__',
 '__slots__',
 '__str__',
 '__subclasshook__',
 '__truediv__']
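A few of the names in this list map directly onto syntax we have already been using. The sketch below pairs three of them with their everyday spelling; it uses `PurePosixPath` so the printed path looks the same on any platform, and `base` is an illustrative name.

```python
import os
from pathlib import PurePosixPath

base = PurePosixPath("testoutput")

# The / operator on a path is a call to __truediv__ behind the scenes.
print(base / "cat" == base.__truediv__("cat"))   # True

# When the path is on the right-hand side of /, __rtruediv__ handles it.
print("data" / base)                             # data/testoutput

# os.fspath() asks the object for its __fspath__ result.
print(os.fspath(base) == base.__fspath__())      # True
```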

### I'm still curious about some of the details of the print function
[Here](https://snarky.ca/why-print-became-a-function-in-python-3/) is a description that you might like.

# Finishing the exercise to generate a tree of data

In [18]:
from pathlib import Path
import itertools
import random
import shutil

def generate_text():
    # Generate a random number of random numbers between 0 and 1
    nums = range(random.randint(1, 100))
    numlist = [random.random() for _ in nums]
    # Turn it into a string, one number per line
    output = ',\n'.join(str(x) for x in numlist)
    return output

A "fixed" version of the tree-generating function would be:

In [35]:
def generate_test_tree(test_dir, overwrite=False):
    """Create test_dir/<animal>/<season>/data.txt files filled with random text."""
    seasons = 'spring summer autumn winter'.split()
    animals = 'cat dog bat monkey elephant'.split()

    # Only remove an existing tree when explicitly asked to
    if test_dir.exists() and overwrite:
        shutil.rmtree(test_dir)
    for animal, season in itertools.product(animals, seasons):
        this_loop_dir = test_dir / animal / season
        text_path = this_loop_dir / 'data.txt'
        # Raises FileExistsError if the directory is already there
        text_path.parent.mkdir(parents=True)
        text_string = generate_text()
        text_path.write_text(text_string)



This now works when we want it to:

In [38]:
tdir = Path('testoutput')
generate_test_tree(tdir)

Fails when we want it to, that is, when the directory already exists and we have not asked to overwrite it:

In [39]:
generate_test_tree(tdir)

FileExistsError: [Errno 17] File exists: 'testoutput/cat/spring'

But it can still run when the directory already exists, as long as we explicitly allow overwriting:

In [40]:
generate_test_tree(tdir,overwrite=True)

### Making this better...

There are many things we can do to make this code better. Let's discuss some of them.

### Breaking down the problem. 

Breaking the problem down into smaller pieces should ease rather than hinder the debugging process. Apart from creating separate functions and making sure each works as we expect before composing them into a "higher level" function, there are other ways to make sure we are solving our problems efficiently.

Executing a function in the same cell as it is defined is a useful way of iterating rapidly to make sure it is working.

Better yet, we could formally define tests that we expect to pass and run them each time...
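As a rough sketch of what such a test could look like, the block below repeats the two functions from above (lightly condensed, so that it runs on its own) and checks the three behaviors we care about. The test name and the use of a temporary directory are choices made for this example; a test runner such as pytest could collect a function like this, but plain `assert` statements are enough to get started.

```python
import itertools
import random
import shutil
import tempfile
from pathlib import Path


def generate_text():
    nums = range(random.randint(1, 100))
    return ',\n'.join(str(random.random()) for _ in nums)


def generate_test_tree(test_dir, overwrite=False):
    seasons = 'spring summer autumn winter'.split()
    animals = 'cat dog bat monkey elephant'.split()
    if test_dir.exists() and overwrite:
        shutil.rmtree(test_dir)
    for animal, season in itertools.product(animals, seasons):
        text_path = test_dir / animal / season / 'data.txt'
        text_path.parent.mkdir(parents=True)
        text_path.write_text(generate_text())


def test_generate_test_tree():
    # Work in a throwaway directory so the test never touches real data.
    with tempfile.TemporaryDirectory() as tmp:
        tdir = Path(tmp) / 'testoutput'

        # 1. A fresh run creates one non-empty file per animal/season pair.
        generate_test_tree(tdir)
        files = list(tdir.glob('*/*/data.txt'))
        assert len(files) == 5 * 4
        assert all(f.read_text() for f in files)

        # 2. A second run without overwrite refuses to touch the tree.
        try:
            generate_test_tree(tdir)
        except FileExistsError:
            pass
        else:
            raise AssertionError('expected FileExistsError')

        # 3. With overwrite=True the tree is rebuilt from scratch.
        generate_test_tree(tdir, overwrite=True)
        assert len(list(tdir.glob('*/*/data.txt'))) == 5 * 4


test_generate_test_tree()
print('all tests passed')
```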

### Carefully thinking about idempotence.

The function should be re-runnable, giving the same result regardless of previous state. In this case we don't want to erase data by default, so we have set "overwrite" to False. If we set "overwrite" to True, we can rerun our function without error. Since we are generating random output, the file contents will change each time. That's worth thinking about...
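One possible way to make the random output repeatable is to let the caller supply a seed. This is only a sketch: the `seed` keyword is an assumption added for this example and is not part of the function defined earlier.

```python
import random


def generate_text(seed=None):
    # A private generator: the same seed always yields the same numbers.
    # seed=None keeps the earlier behavior (different output on each call).
    rng = random.Random(seed)
    count = rng.randint(1, 100)
    numlist = [rng.random() for _ in range(count)]
    return ',\n'.join(str(x) for x in numlist)


# Two calls with the same seed produce identical text.
print(generate_text(seed=0) == generate_text(seed=0))   # True
```

Because `seed` defaults to `None`, existing calls of the form `generate_text()` keep working unchanged.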

### Carefully thinking about breaking backwards compatibility as we expand the functionality of our code.

Keyword arguments are a great way of adding optional extras to our functions without breaking all prior usage of it.
For changes that don't affect the arguments of the function but do modify its behavior, we should try to ensure that the changes don't break the previous use cases of that function. Having tests for that function is a quick and easy way of determining whether we have inadvertently changed its behavior.

### Saving our code in a way that is more reusable, shareable.

Creating functions is a great way to write some code once but reuse it many times. We can push this further, though. We can move our functions from a specific notebook to a "module" (a `.py` text file, ideally with a short description at the top of the file and, if it will also be run as a script, a shebang line).

As we begin to accumulate more modules we may consider moving them into their own directory so we do not mix them with our data/notebooks.

Finally, creating our own packages is an excellent way to reuse our code across multiple projects and to share it with others.
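As an illustration, a module holding one of our functions might look like the sketch below. The file name `tree_tools.py` is hypothetical, and the function body is a condensed version of the one defined earlier.

```python
#!/usr/bin/env python3
"""tree_tools.py: helpers for generating random test data (hypothetical module)."""
import random


def generate_text():
    """Return a random number of random floats, one per line, as one string."""
    count = random.randint(1, 100)
    return ',\n'.join(str(random.random()) for _ in range(count))


if __name__ == '__main__':
    # Runs only when the file is executed as a script, not when it is imported.
    print(generate_text())
```

With a file like this saved next to a notebook, `from tree_tools import generate_text` would make the function available there without copying the code.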

### Reassessing whether we solved the problem in the correct way. Would we attack the problem differently?

+ Do not have lots of code on the same line.
+ Add documentation/comments. It is most likely your future self whom you are being considerate to.
+ Consider whether you have solved/are solving a problem that someone has already thought about much more than you have.

### The end of course project

A rough rubric that is subject to change is available on the [course repository](https://github.com/biof309/fall2019.git)