<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Refactoring" data-toc-modified-id="Refactoring-1">Refactoring</a></span></li><li><span><a href="#Learning-Outcomes" data-toc-modified-id="Learning-Outcomes-2">Learning Outcomes</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#By-the-end-of-this-session,-you-should-be-able-to:" data-toc-modified-id="By-the-end-of-this-session,-you-should-be-able-to:-2.0.1">By the end of this session, you should be able to:</a></span></li></ul></li></ul></li><li><span><a href="#What-is-refactoring?" data-toc-modified-id="What-is-refactoring?-3">What is refactoring?</a></span></li><li><span><a href="#Lines-of-Code-(loc)" data-toc-modified-id="Lines-of-Code-(loc)-4">Lines of Code (loc)</a></span></li><li><span><a href="#Unit-tests-are-a-safety-net-for-refactoring" data-toc-modified-id="Unit-tests-are-a-safety-net-for-refactoring-5">Unit tests are a safety net for refactoring</a></span></li><li><span><a href="#Testing-Bunny-Ears" data-toc-modified-id="Testing-Bunny-Ears-6">Testing Bunny Ears</a></span></li><li><span><a href="#Remember-our-recipe-for-writing-software" data-toc-modified-id="Remember-our-recipe-for-writing-software-7">Remember our recipe for writing software</a></span></li><li><span><a href="#Student-Activity" data-toc-modified-id="Student-Activity-8">Student Activity</a></span></li><li><span><a href="#Brian's-programming-principle" data-toc-modified-id="Brian's-programming-principle-9">Brian's programming principle</a></span></li><li><span><a href="#in-for-membership-tests-" data-toc-modified-id="in-for-membership-tests--10"><code>in</code> for membership tests </a></span></li><li><span><a href="#and-and-any-built-ins" data-toc-modified-id="and-and-any-built-ins-11"><code>and</code> and <code>any</code> built-ins</a></span></li><li><span><a href="#De-Morgan's-laws" data-toc-modified-id="De-Morgan's-laws-12">De Morgan's laws</a></span></li><li><span><a href="#Is-De-Morgan's-Law-Pythonic?" 
data-toc-modified-id="Is-De-Morgan's-Law-Pythonic?-13">Is De Morgan's Law Pythonic?</a></span></li><li><span><a href="#De-Morgan's-Law-Summary" data-toc-modified-id="De-Morgan's-Law-Summary-14">De Morgan's Law Summary</a></span></li><li><span><a href="#Student-Activity" data-toc-modified-id="Student-Activity-15">Student Activity</a></span></li><li><span><a href="#Summary" data-toc-modified-id="Summary-16">Summary</a></span></li><li><span><a href="#Bonus-Material" data-toc-modified-id="Bonus-Material-17">Bonus Material</a></span></li><li><span><a href="#Vectorization" data-toc-modified-id="Vectorization-18">Vectorization</a></span></li><li><span><a href="#IDEs-are-required-for-serious-refactoring" data-toc-modified-id="IDEs-are-required-for-serious-refactoring-19">IDEs are required for serious refactoring</a></span></li><li><span><a href="#Text-searching" data-toc-modified-id="Text-searching-20">Text searching</a></span></li><li><span><a href="#Student-Activity" data-toc-modified-id="Student-Activity-21">Student Activity</a></span></li><li><span><a href="#Solutions" data-toc-modified-id="Solutions-22">Solutions</a></span></li></ul></div>

Refactoring
----

<center><img src="images/refactoring.jpg" width="55%"/></center>

<center><h2>Learning Outcomes</h2></center>

#### By the end of this session, you should be able to:

- Define refactoring in your own words.
- List best practices for refactoring.
- Use the `in` operator and the `all` and `any` built-ins.
- Refactor code to be more Pythonic.

What is refactoring?
-----

Refactoring means restructuring code without changing its observable behavior.

For example, rewriting a function from a procedural style to a functional style.

In [186]:
def total_procedural(nums):
    result = 0
    for n in nums:
        result += n
    return result

In [187]:
from functools import reduce

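def total_functional(nums):
    # The initial value 0 keeps the result defined for an empty input,
    # matching total_procedural([]) == 0.
    return reduce(lambda x, y: x + y, nums, 0)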

In [188]:
# Same result
nums = [42, 42, 42]

assert total_procedural(nums) == sum(nums)

assert total_functional(nums) == sum(nums)

Lines of Code (loc)
------

Generally, we want fewer lines of code.

Fewer lines are faster to read and leave less information to hold in human memory (human and computer memory are the limiting factors of programming).

Even as we minimize loc, we still have to be clear.

> “Readability counts”  
> — Tim Peters, Zen of Python

Unit tests are a safety net for refactoring
-----

<center><img src="https://assets.atlasobscura.com/media/W1siZiIsInVwbG9hZHMvcGxhY2VfaW1hZ2VzLzM3NjAzNzAzNzVfNTk0ZjM0ZDdlYy5qcGciXSxbInAiLCJ0aHVtYiIsIngzOTA-Il0sWyJwIiwiY29udmVydCIsIi1xdWFsaXR5IDgxIC1hdXRvLW9yaWVudCJdXQ/3760370375_594f34d7ec.jpg" width="75%"/></center>

<center>Circus Center in San Francisco</center>

Software "regression": code that worked before does not work now!

If you have a green test, keep it green.
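
A minimal sketch of that safety net, assuming a hypothetical `check_total` helper that is not part of the original notebook: pin down the current behavior with assertions before touching the code, then rerun the same assertions against every new version.

```python
def total(nums):
    """Sum the numbers in nums (the version before refactoring)."""
    result = 0
    for n in nums:
        result += n
    return result


def check_total(func):
    """Hypothetical regression check; works for any candidate implementation."""
    assert func([42, 42, 42]) == 126
    assert func([]) == 0       # edge case: empty input
    assert func([-1, 1]) == 0  # edge case: values cancel out


check_total(total)  # green before refactoring
check_total(sum)    # still green after swapping in the built-in
```

If either call raised an `AssertionError`, the refactoring would have changed observable behavior.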

Testing Bunny Ears
-------

<center><img src="http://www.peterprovost.org/images/blog/2012-05-02-the-only-way-to-learn-tdd-kata/Red-Green-Refactor-Bunny.png" width="55%"/></center>

Remember our recipe for writing software
-----

1. Make it run.
1. Make it right.
1. Make it better.

<center><h2>Student Activity</h2></center>

1. Write a function that checks if a word ends with any of the following: 'ly', 'ed', 'ing', 'ers'

1. Write a function that checks if a number is within a range of numbers.

1. Write a function that checks if a sequence contains a False. For example: `(True, True, True, False)`

Hints:

- First, write your own tests. Then, write the function.    
- Use a combination of methods and built-ins.
- Minimize iteration.

__1) Write a function that checks if a word ends with any of the following: 'ly', 'ed', 'ing', 'ers'__

In [189]:
# Procedural
# Gets the job done but lots of code.
def ends_with_suffix(word, suffixes):
    for suffix in suffixes:
        if word.endswith(suffix):
            return True
    else:
        return False

suffixes = ('ly', 'ed', 'ing', 'ers')
assert ends_with_suffix('swimming', suffixes)
assert ends_with_suffix('swimmingly', suffixes)
assert not ends_with_suffix('swim', suffixes)

In [190]:
# Pythonic
# Leverage methods to their full power
def ends_with_suffix(word, suffixes):
    return word.endswith(suffixes)

suffixes = ('ly', 'ed', 'ing', 'ers')
assert ends_with_suffix('swimming', suffixes)
assert ends_with_suffix('swimmingly', suffixes)
assert not ends_with_suffix('swim', suffixes)
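
One detail worth knowing: `str.endswith` accepts a single string or a tuple of strings, but not a list. The sketch below shows two possible ways to handle other iterables; the helper names `ends_with_any` and `ends_with_any_verbose` are made up for illustration.

```python
def ends_with_any(word, suffixes):
    """Return True if word ends with at least one suffix.

    suffixes may be any iterable of strings (list, set, generator, ...).
    """
    return word.endswith(tuple(suffixes))


def ends_with_any_verbose(word, suffixes):
    """Equivalent version built on any(); stops at the first match."""
    return any(word.endswith(suffix) for suffix in suffixes)


suffix_list = ['ly', 'ed', 'ing', 'ers']  # a list, not a tuple
assert ends_with_any('swimming', suffix_list)
assert not ends_with_any('swim', suffix_list)
assert ends_with_any_verbose('swimmers', suffix_list)
assert not ends_with_any_verbose('swim', suffix_list)
```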

__2) Write a function that checks if a number is within a range of numbers.__

In [191]:
# Procedural: Clear & Verbose

def is_in_range(n, lb, ub):
    if (n >= lb) and (n <= ub):
        return True
    else:
        return False

lb, ub = -3.13, 10.75    
assert is_in_range(5.25, lb, ub)
assert not is_in_range(lb-.001, lb, ub)
assert not is_in_range(ub+.001, lb, ub)

__Sidebar - Inclusive or exclusive range?__

Programmer's choice!

That is a good thing to put in the docstring so people know what to expect.

Also a good thing to be clear in a unit test:

In [192]:
assert is_in_range(lb, lb, ub) # Test for inclusive
assert is_in_range(ub, lb, ub) # Test for inclusive
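
As a sketch of what such a docstring could look like (the wording is only a suggestion), here is the same verbose function with the inclusive choice written down and pinned by tests:

```python
def is_in_range(n, lb, ub):
    """Return True if lb <= n <= ub.

    Both bounds are inclusive, so is_in_range(lb, lb, ub) is True.
    """
    if (n >= lb) and (n <= ub):
        return True
    else:
        return False


assert is_in_range(0, 0, 10)       # lower bound is included
assert is_in_range(10, 0, 10)      # upper bound is included
assert not is_in_range(11, 0, 10)  # just past the upper bound
```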

Brian's programming principle
-----

Let Python do the work.

In [193]:
# Let Python do the work

def is_in_range(n, lb, ub):
    return lb <= n <= ub
    
lb, ub = -3.13, 10.75    
assert is_in_range(5.25, lb, ub)
assert not is_in_range(lb-.001, lb, ub)
assert not is_in_range(ub+.001, lb, ub)
assert is_in_range(lb, lb, ub) # Test for inclusive
assert is_in_range(ub, lb, ub) # Test for inclusive

[Source](https://www.techbeamers.com/python-check-integer-in-range/)

__3) Write a function that checks if a sequence contains a False.__

In [194]:
from typing import Sequence

def contains_false(iterable: Sequence[bool]) -> bool:
    return bool(iterable.count(False))

assert contains_false((True, True, True, False))
assert not contains_false((True, True, True, True))

In [195]:
def contains_false(iterable: Sequence[bool]) -> bool:
    return False in iterable

assert contains_false((True, True, True, False))
assert not contains_false((True, True, True, True))

`in` for membership tests 
----

`in` is not just for iteration.

`in` can also test for membership.

`x in s` evaluates to `True` if `x` is a member of `s`, and `False` otherwise.

`not in` is the negation, aka the boolean flip.

> `x in y` calls `y.__contains__(x)` if `y` has a `__contains__` member function. Otherwise, `x in y` tries iterating through `y.__iter__()` to find `x`, or calls `y.__getitem__(x)` if `__iter__` doesn't exist.

[Python docs for in](https://docs.python.org/3/reference/expressions.html#membership-test-details)

[Source](https://stackoverflow.com/questions/38204342/python-in-keyword-in-expression-vs-in-for-loop)
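
The three lookup paths can be sketched with toy classes (`Evens`, `Countdown`, and `Letters` are hypothetical names invented for this example). Note that, per the Python docs linked above, the `__getitem__` fallback is called with successive integer indexes starting at 0 until `IndexError` is raised, not with `x` itself.

```python
class Evens:
    """Defines __contains__, so `in` calls it directly."""
    def __contains__(self, x):
        return isinstance(x, int) and x % 2 == 0


class Countdown:
    """Defines only __iter__, so `in` walks the items until one matches."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


class Letters:
    """Defines only __getitem__, so `in` tries indexes 0, 1, 2, ... until IndexError."""
    def __getitem__(self, index):
        return "abc"[index]


assert 4 in Evens()
assert 3 not in Evens()
assert 2 in Countdown(3)
assert 5 not in Countdown(3)
assert "b" in Letters()
assert "z" not in Letters()
```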
                     
                     

`all` and `any` built-ins
-----

The `all()` function is a chain of `and`s.

The `any()` function is a chain of `or`s.

In [196]:
# and
print(True and True)
print(True and False)
print(False and True)
print(False and False)

True
False
False
False


In [197]:
# all
print(all([True, True, True]))
print(all([True, True, False]))
print(all([False, False, False]))

True
False
False


In [198]:
# or
print(True or True)
print(True or False)
print(False or True)
print(False or False)

True
True
True
False


In [199]:
# any
print(any([True, True, True]))
print(any([True, True, False]))
print(any([False, False, False]))

True
True
False


In [200]:
# Extends to truthiness
all([1, 1, 1])

True

In [201]:
# Extends to truthiness
all(["stuff", 
     "",
     "stuff"])

False

In [202]:
# all / any can apply to iterables of conditionals
any([True,
     2 > 1,
     " "])

True

In [203]:
# Extends to any iterable
all((True, True, True)) # tuple

True

In [204]:
all({True, True, False}) # set

False
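
One edge case is worth a quick check: with an empty iterable, `all` returns `True` and `any` returns `False`. The sketch below uses a hypothetical `all_positive` helper to show why that matters in practice.

```python
assert all([]) is True   # no falsy element exists
assert any([]) is False  # no truthy element exists


def all_positive(nums):
    """Return True if every number is greater than zero."""
    return all(n > 0 for n in nums)


assert all_positive([1, 2, 3])
assert not all_positive([1, -2, 3])
assert all_positive([])  # vacuously true; guard against empty input if that is not what you want
```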

De Morgan's laws
-----

If you are serious about performance and elegance, it is worth learning to apply [De Morgan's laws](https://en.wikipedia.org/wiki/De_Morgan%27s_laws) to boolean expressions.

Performance - A chain of `and` stops evaluating its operands at the first falsy one, whereas `all([...])` over a list evaluates every element before `all` even starts (`all` itself also stops at the first falsy item).

Elegance - One version of a complex boolean expression might be easier to read and reason about than another.
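
One way to see where the performance difference comes from, sketched with a hypothetical `check` helper that records every call:

```python
calls = []


def check(value):
    """Record each call so we can see which checks actually ran."""
    calls.append(value)
    return value


# A chain of `and` stops at the first falsy operand.
calls.clear()
result_and = check(False) and check(True) and check(True)
calls_and = list(calls)

# all() over a list: the list is built first, so every check runs.
calls.clear()
result_list = all([check(False), check(True), check(True)])
calls_list = list(calls)

# all() over a generator: checks run lazily, so it stops early too.
calls.clear()
result_gen = all(check(v) for v in (False, True, True))
calls_gen = list(calls)

assert calls_and == [False]
assert calls_list == [False, True, True]
assert calls_gen == [False]
```

So the cost lies in building the list up front; feeding `all` or `any` a generator expression gives the same early exit as the operator chain.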

De Morgan's laws are an easy way to find the negation of a boolean expression.

<center><img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/b55ab78fcd4c3b617df4e2195d487dda13c09e7d" width="55%"/></center>

Each term is complemented: ORs become ANDs, and ANDs become ORs.


In [205]:
True # Boolean

True

In [206]:
not True # Complement

False

In [207]:
# Using and
(not True) and (not True)

False

In [208]:
# Using or
not (True or True)

False

In [209]:
# Let's explore other options

a, b = True, True
# a, b = True, False
# a, b = False, True
# a, b = False, False

print((not a) and (not b))
print(not (a or b))

False
False


In [210]:
# Let's explore other options

a, b = True, True
# a, b = True, False
# a, b = False, True
# a, b = False, False

print((not a) or (not b))
print(not (a and b))

False
False
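
Rather than uncommenting one assignment at a time, all four combinations can be checked in one pass. This is a small sketch using `itertools.product`; the helper name `de_morgan_holds` is invented for the example.

```python
from itertools import product


def de_morgan_holds(a, b):
    """Return True if both of De Morgan's laws hold for this pair of booleans."""
    first = ((not a) and (not b)) == (not (a or b))
    second = ((not a) or (not b)) == (not (a and b))
    return first and second


# product yields (True, True), (True, False), (False, True), (False, False)
assert all(de_morgan_holds(a, b) for a, b in product([True, False], repeat=2))
```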


Is De Morgan's Law Pythonic?
------

> Use the version that is easiest to read, depending on what a and b are.

[Source](https://stackoverflow.com/questions/13012459/is-de-morgans-law-pythonic)

__De Morgan's laws generalization__

Any number of boolean comparisons

(not A and not B and not C) == not (A or B or C)   
(not A or not B or not C) == not (A and B and C)

In [211]:
# Let's explore not `any`
# bools = [True, True, True]
# bools = [True, True, False]
bools = [False, False, False]

print(not any(bools))
print(all(not b for b in bools))


True
True


In [212]:
# Let's explore not `all`
# bools = [True, True, True]
# bools = [True, True, False]
bools = [False, False, False]

print(not all(bools))
print(any(not b for b in bools))

True
True
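
The generalized laws can also be checked exhaustively for short sequences. This is a sketch, with `laws_hold` as a hypothetical helper name; it covers every combination of up to three booleans, including the empty sequence.

```python
from itertools import product


def laws_hold(bools):
    """Check both generalized De Morgan's laws for one sequence of booleans."""
    none_true = (not any(bools)) == all(not b for b in bools)
    some_false = (not all(bools)) == any(not b for b in bools)
    return none_true and some_false


for n in range(4):  # sequences of length 0, 1, 2, and 3
    for combo in product([True, False], repeat=n):
        assert laws_hold(combo)
```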


In [213]:
# 0 is falsy
# Use extended boolean comparisons with `all`

def contains_false(iterable):
    return not all(iterable)

assert contains_false([1, 1, 0])
assert not contains_false([1, 1, 1])

De Morgan's Law Summary
-----
- Since readability counts, use De Morgan's laws to refactor boolean logic into its clearest form.

- Here is a handy guide:
    - `all()` is True when every value is True.
    - `not any()` is True when every value is False.
    - `any()` is True when at least one value is True.
    - `not all()` is True when at least one value is False.

[Source](http://thepythonreport.blogspot.com/2016/03/mini-lesson-de-morgans-laws.html)
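
The four idioms in the guide can be sketched on some made-up data (the `scores` list and the pass mark of 60 are assumptions for illustration only):

```python
scores = [72, 88, 95, 61]

everyone_passed = all(s >= 60 for s in scores)       # every value satisfies the check
nobody_failed = not any(s < 60 for s in scores)      # the same claim via De Morgan
someone_aced = any(s >= 90 for s in scores)          # at least one value satisfies the check
someone_below_90 = not all(s >= 90 for s in scores)  # at least one value fails the check

assert everyone_passed and nobody_failed
assert someone_aced and someone_below_90
```

`everyone_passed` and `nobody_failed` always agree; pick whichever reads more naturally at the call site.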

<center><h2>Student Activity</h2></center>

- Refactor this code:

```python
results = []
for item in iterable:
    if item == match:
        results.append(item)
```

- Refactor this code:

```python
def contains_error(data):
    for item in data:
        if isinstance(item, str):
            if item == "error":
                return True
    else:
        return False
```

__Hints__:

- "Understand" what this code does. There is no documentation so you must guess.
- Did you start by writing unit tests? Both positive and negative ones?

“Flat is better than nested”

— Tim Peters, Zen of Python

__Follow refactoring steps:__

1. Wrap it in a function for testing.
1. Write positive and negative tests.
1. Refactor.
1. Keep tests passing.


In [214]:
# 1. Wrap it in a function for testing
def return_found_target(iterable, target): # change variable name
    results = []
    for item in iterable:
        if item == target:
            results.append(item)
    return results

In [215]:
# 2. Then test it
assert return_found_target((1, 2, 3), 1) == [1]
assert return_found_target((1, 2, 3, 1), 1) == [1, 1]
assert return_found_target((1, 2, 3, 1), 42) == []

In [216]:
# 3. Refactor: Make Pythonic:
#      - Let Python do the work of building up a list
#      - Flatten
def return_found_target(iterable, target):
    return [item for item in iterable if item == target]

In [217]:
# 4. Keep tests passing
assert return_found_target((1, 2, 3), 1) == [1]
assert return_found_target((1, 2, 3, 1), 1) == [1, 1]
assert return_found_target((1, 2, 3, 1), 42) == []

Sidebar - The code is silly.

Most likely, this is the functionality that is needed:

In [218]:
# Just use counts (metadata)
# Passing the actual data around is wasteful
(1, 2, 3, 1).count(1)

2


[Source](https://realpython.com/python-refactoring/)

In [219]:
# Refactor contains_error code

def contains_error(data):
    for item in data:
        if isinstance(item, str):
            if item == "error":
                return True
    else:
        return False

In [220]:
assert contains_error('bah bah error bah'.split())
assert not contains_error('bah bah bah'.split())

__Observations__:
    
- Too much code for too little functionality.
- The type check is redundant: `item == "error"` is already `False` for any non-string item. Generally, Python does not need type checking.

In [221]:
def contains_error(data):
    return "error" in data

assert contains_error('bah bah error bah'.split())
assert not contains_error('bah bah bah'.split())
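
To gain some confidence that dropping the `isinstance` check preserves behavior, both versions can be run side by side on mixed-type input. This is a sketch: the names `contains_error_nested` and `contains_error_flat` are invented here, and it assumes `data` is a list or tuple of items rather than a single string (for a plain string, `in` performs a substring search, which would behave differently).

```python
def contains_error_nested(data):
    """Original logic, with the type check and nested ifs."""
    for item in data:
        if isinstance(item, str):
            if item == "error":
                return True
    return False


def contains_error_flat(data):
    """Refactored logic using a membership test."""
    return "error" in data


cases = [
    ["bah", "error"],
    ["bah", "bah"],
    [1, None, 3.5, "error"],  # mixed types, error present
    [1, None, 3.5],           # mixed types, no error
    [],                       # empty input
]
for case in cases:
    assert contains_error_nested(case) == contains_error_flat(case)
```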

In [222]:
# Now that we have refactored, we can see an easy generalization

def contains(container, element):
    return element in container

assert contains('bah bah error bah'.split(), "error")
assert not contains('bah bah bah'.split(), "error")
assert contains([1, 2, 3], 3)
assert not contains([1, 2, 3], 0)

# Note - This is useful. We might want the functionality of a built-in operator wrapped in a function.
# Functions can be passed to other functions and tested more easily.
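
A short sketch of what "passed to other functions" can look like, using `functools.partial` and `filter`. The `logs` data and the `has_error` name are made up for the example.

```python
from functools import partial


def contains(container, element):
    return element in container


logs = [
    "ok ok ok".split(),
    "ok error ok".split(),
    "error error".split(),
]

# Fix the element argument, leaving a one-argument predicate for filter().
has_error = partial(contains, element="error")
bad_logs = list(filter(has_error, logs))

assert bad_logs == [["ok", "error", "ok"], ["error", "error"]]
```

The standard library already ships this wrapper as `operator.contains(a, b)`, which returns `b in a`.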

Summary
-----

- Refactoring is improving code without changing its outward behavior.
- Refactoring could be:
    + Improving clarity.
    + Making it faster.
    + Making it easier to extend in the future.
- Unit tests are the guides for refactoring.
- `in` can test for membership.
- `all` is a chain of `and` operations.
- `any` is a chain of `or` operations.


Bonus Material
----

Vectorization
-----

> Life is too short for for-loops

It is often easy to speed up code with vectorization.

In [223]:
import numpy as np

In [224]:
m = np.array([[ 0.,  1.,  2.],
              [ 3.,  4.,  5.],
              [ 6.,  7.,  8.]])

In [225]:
for i in range(m.shape[0]):
    for j in range(m.shape[1]):
        m[i, j] = m[i, j]**.5

m

array([[0.        , 1.        , 1.41421356],
       [1.73205081, 2.        , 2.23606798],
       [2.44948974, 2.64575131, 2.82842712]])

What is a better way?


In [226]:
# Let NumPy do the work for you with broadcasting
m = np.array([[ 0.,  1.,  2.],
              [ 3.,  4.,  5.],
              [ 6.,  7.,  8.]])
m**.5

array([[0.        , 1.        , 1.41421356],
       [1.73205081, 2.        , 2.23606798],
       [2.44948974, 2.64575131, 2.82842712]])

In [227]:
np.sqrt(m)

array([[0.        , 1.        , 1.41421356],
       [1.73205081, 2.        , 2.23606798],
       [2.44948974, 2.64575131, 2.82842712]])

[Source](https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html)
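
To measure the difference on your own machine, a rough sketch with `timeit` could look like the following. The function names and the 100 x 100 array size are arbitrary choices for illustration, and the timings will vary by machine.

```python
import timeit

import numpy as np


def sqrt_loop(m):
    """Element-by-element square root with explicit Python loops."""
    out = np.empty_like(m)
    for i in range(m.shape[0]):
        for j in range(m.shape[1]):
            out[i, j] = m[i, j] ** 0.5
    return out


def sqrt_vectorized(m):
    """The same computation handed to NumPy in a single call."""
    return np.sqrt(m)


big = np.arange(10_000, dtype=float).reshape(100, 100)

# Same result first, then compare speed.
assert np.allclose(sqrt_loop(big), sqrt_vectorized(big))

loop_time = timeit.timeit(lambda: sqrt_loop(big), number=10)
vec_time = timeit.timeit(lambda: sqrt_vectorized(big), number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```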

IDEs are required for serious refactoring
------

IDEs have many tools to help you.

__De Morgan's laws generalizations__

1. Any number of boolean comparisons
1. Any notion of set

<center><img src="https://upload.wikimedia.org/wikipedia/commons/0/06/Demorganlaws.svg" width="55%"/></center>
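
The set version can be sketched with Python's built-in `set` type. The `universe`, `a`, and `b` values and the `complement` helper are assumptions chosen for illustration; a complement only makes sense relative to some universe.

```python
universe = set(range(10))
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}


def complement(s):
    """Everything in the universe that is not in s."""
    return universe - s


# The complement of a union is the intersection of the complements.
assert complement(a | b) == complement(a) & complement(b)
# The complement of an intersection is the union of the complements.
assert complement(a & b) == complement(a) | complement(b)
```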

Text searching
-----

De Morgan’s laws commonly apply to text searching using Boolean operators AND, OR, and NOT. 

Consider a set of documents containing the words “cars” and “trucks”. 

De Morgan’s laws hold that these two searches will return the same set of documents:

Search 1: (NOT cars) AND (NOT trucks)  
Search 2: NOT (cars OR trucks)  
 
With respect to the words “cars” and “trucks”, any corpus of documents can be represented by four documents:

1. Contains only the word “cars”.
2. Contains only “trucks”.
3. Contains both “cars” and “trucks”.
4. Contains neither “cars” nor “trucks”.

To evaluate Search 1, the search “(NOT cars)” will hit on documents that do not contain “cars”, which are Documents 2 and 4. Similarly, the search “(NOT trucks)” will hit on Documents 1 and 4. Applying the AND operator to these two searches (which is Search 1) will hit on the documents that are common to both, which is Document 4.

To evaluate Search 2, the search “(cars OR trucks)” will hit on Documents 1, 2, and 3. So the negation of that search (which is Search 2) will hit everything else, which is Document 4.



In [228]:
d1 = ['cars', 'cars']
d2 = ['trucks', 'trucks']
d3 = ['cars', 'trucks']
d4 = ['moped', 'scooter']

docs = (d1, d2, d3, d4)

for doc_num, doc in enumerate(docs, 1):
    if (not ('cars' in doc)) and (not ('trucks' in doc)):
        print(doc_num, doc)
        
for doc_num, doc in enumerate(docs, 1):
    if not (('cars' in doc) or ('trucks' in doc)):
        print(doc_num, doc)

4 ['moped', 'scooter']
4 ['moped', 'scooter']


<center><h2>Student Activity</h2></center>

With respect to the words “cars” and “trucks”, the corpus can be represented by four documents:

1. ['cars', 'cars']
2. ['trucks', 'trucks']
3. ['cars',  'trucks']
4. ['moped', 'scooter']

The search query is: `(not cars) or (not trucks)`.

1. Which documents will it return?
1. Rewrite it with De Morgan’s laws.

Just pen & paper - No computers.

Solutions
-----

1. Documents 1, 2, 4
1. `NOT (cars AND trucks)`.

In [229]:
d1 = ['cars', 'cars']
d2 = ['trucks', 'trucks']
d3 = ['cars', 'trucks']
d4 = ['moped', 'scooter']

docs = (d1, d2, d3, d4)

for doc_num, doc in enumerate(docs, 1):
    if (not ('cars' in doc)) or (not ('trucks' in doc)):
        print(doc_num, doc)

1 ['cars', 'cars']
2 ['trucks', 'trucks']
4 ['moped', 'scooter']
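
As a final check, here is a sketch confirming that the rewritten query returns the same documents as the original one (the `original` and `rewritten` names are introduced just for this comparison):

```python
docs = (
    ['cars', 'cars'],
    ['trucks', 'trucks'],
    ['cars', 'trucks'],
    ['moped', 'scooter'],
)

# Original query: (not cars) or (not trucks)
original = [n for n, doc in enumerate(docs, 1)
            if ('cars' not in doc) or ('trucks' not in doc)]

# De Morgan rewrite: not (cars and trucks)
rewritten = [n for n, doc in enumerate(docs, 1)
             if not ('cars' in doc and 'trucks' in doc)]

assert original == rewritten == [1, 2, 4]
```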
