# Schedule
* Today - Review & Boring stuff
* Tomorrow - Intro. to Numpy/Pandas

# Class Expectations

* Too much to cover - class materials and exercises will be provided for offline study
  * We will stop when we run out of time, and you can review the additional material on your own after class
  * **Do the practice exercises!**
* Make sure questions are relevant for the whole class
* Give feedback afterwards! 
* Make use of the documentation pages - good engineers use Google!

# Acknowledgements

# IPython

IPython is a custom Python shell (`ipython`, from the command line) that includes rich features like tab-completion and inline documentation, all in the same REPL (Read, Evaluate, Print, Loop) format that the normal `python` shell has. It's configurable, customizable, and overall the best environment for interactive Python work.

IPython *also* has what we're using now - a notebook feature. IPython notebooks are web-based frontends to a Python session that you set up and start from the command line. The notebook is cell-oriented, meaning we execute code in chunks and can go back and re-execute code selectively as needed. This setup provides a great way to work on data - load it in once, and then work on it iteratively, without re-loading or re-extracting.

Finally, IPython has great built-in documentation features for sharing projects - just as we're doing now for this class!


## Highlights
- Toolbar
- Cell-related options
- Printed vs. Returned values
- Selective re-execution
- Help features (?, tab, shift tab)
- Magics
- Export features

### Reference material
IPython Home - http://ipython.org/

IPython Cheatsheet - https://damontallen.github.io/IPython-quick-ref-sheets/

In [1]:
1 + 5

6

In [2]:
print(5 + 5)

10


In [4]:
import sys
print(sys.version)

3.4.3 |Anaconda 2.0.1 (64-bit)| (default, Mar  6 2015, 12:03:53) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]


# A Brief Note on Python 3

Most of the R&D network software uses Python 2 by default. This is the default `python` executable in your path. We will be using a distribution of Python provided by DEG Soft, primarily through the `python3`/`ipython3` executables. This is Python version 3.4 - the most recent stable release.

Python 2 and Python 3 aren't that different for the most part, but Python 3 is not backwards compatible. As a result, we're focusing on Python 3 going forward, because our intention is to use the best tools available to us, and the best tools happen to be developed primarily on Python 3 now.

You can find a number of guides and tutorials online about Python 3 differences, but here are a quick few differences people may run into right away:

- `print()` needs parentheses - it's a function
- iterate over `dict`s using `dict.items()`
- strings and bytestrings are explicitly different, and strings by default are typically unicode
- different syntax for exception handling
- many functions return iterable objects/generators, not lists (map, zip, etc)
- `xrange()` has been replaced with `range()`
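
A minimal sketch of a few of these differences as they look under Python 3. The variable names are illustrative only, and the snippet is not meant as a complete porting guide:

```python
# print is a function, so its arguments go in parentheses
print('hello', 'world', sep=', ')

# map() and zip() return lazy iterators; wrap them in list() to see the values
doubled = map(lambda x: x * 2, [1, 2, 3])
doubled_list = list(doubled)

# range() is lazy as well, taking over the role of Python 2's xrange()
first_three = list(range(3))

# text strings and bytestrings are distinct types; convert explicitly
text = 'probe'
raw = text.encode('utf-8')
round_trip = raw.decode('utf-8')

# exceptions are caught with the "as" keyword
try:
    int('not a number')
except ValueError as err:
    message = str(err)

print(doubled_list, first_three, raw, round_trip)
```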

# Quick Syntax Reminder

In [None]:
# variable assignment
variable = "value"

# simple function definition
def function(arg1, arg2):    
    result = arg1 + arg2
    return result

# lists + iteration
my_list = [1, 2, 3, 4, 5, 6]

for i in my_list:
    j = i * 2

# dicts and sets
my_dict = { 'key1': 'value1', 'key2': 'value2' }
my_set = {'Design', 'PE', 'Test', 'Apps'}

# iterate over a dict
for key, value in my_dict.items():
    print(key, ' => ', value)

# strings and fancy indexing
words = 'this is my string made of words'.split(' ')
first_word = words[0]
even_words = words[::2]
odd_words = words[1::2]
last_two_words = words[-2:]
sentence = ' '.join(words)

# write a file
with open('/tmp/myfile.txt', 'w') as f:
    f.write('my super simple text file string\n')
    print(" -- this also works", file=f)
    
# read a file (slurp)
with open('/tmp/myfile.txt') as f:
    contents = f.read()
    
# read a file (line by line)
with open('/tmp/myfile.txt') as f:
    for line in f:
        print(line)

# Advanced Python Topics

Python is a big language, well-equipped for a number of tasks. Introductory courses tend to cover a lot of the basics - data types, simple math operations, defining functions, list basics, and so on. In this session, we'll take a look at a number of 'advanced' Python features and practices targeted at problems that are common with Micron data.

We will be skipping between a number of topics, but today's session is just to get everyone familiar with some useful topics and get people caught back up to a reasonable level on Python. The biggest topic of interest to us - for the purposes of working with data - is working with collections of data. There is always more than one way to get things done, but we'll specifically be looking at features and syntax that make these operations expressive and concise.

Make sure to refer to the official Python documentation page, specifically for Python version 3.4:

https://docs.python.org/3/


### Collections

Lists and dicts are the most core data structures in Python, but there are a number of ways to utilize them in the standard library that tend to get overlooked in a first dive into the language. There are also a number of additional data structures in Python's core that we find useful for a number of purposes.

First, let's take a look at what we can do with lists. List comprehensions are a simple way to define or modify a list in an expressive manner.

In [6]:
# start by copying a list
base_list = [1, 2, 3, 4, 5, 6]
copied = [x for x in base_list]
copied

[1, 2, 3, 4, 5, 6]

In [7]:
# or apply some code element-wise
modified = [x ** 2 for x in base_list]

# apply a function directly
def my_func(x):
    return x ** 2

applied = [my_func(x) for x in base_list]

modified, applied

([1, 4, 9, 16, 25, 36], [1, 4, 9, 16, 25, 36])

In [8]:
# apply conditionals inline, performing filtering
modified = [x for x in base_list if x % 2 == 0]
modified

[2, 4, 6]

In [None]:
# common use case: remove items that exist in another list
filter_out = [1, 4]
filtered = [x for x in base_list if x not in filter_out]
filtered

List comprehensions typically offer a simple, concise way to express behavior, relative to an iterative approach. They can also get pretty ugly, in which case it's always a question of doing what makes the most sense - whether dictated by performance needs or by readability.

Here's a (relatively) simple example of using a list comprehension to repeat every element in a list N times, but note how its intention isn't necessarily immediately clear:

In [11]:
N = 4
[x
 for x in base_list
     for i in range(N)]

[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6]

As you can expect, it's easy to start composing more and more complex expressions that descend quickly into uselessness. Who cares how clever you can be with a bit of code if you can never read or understand it? While iteration may be more verbose, it inherently forces you to decompose problems a little bit further, which may help with the debug process.
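
For comparison, here is a sketch of the same repeat-each-element logic written as explicit nested loops. The names `repeated` and `repeated_comp` are illustrative; the point is that the `for` clauses of a comprehension read in the same order as the equivalent nested loops:

```python
base_list = [1, 2, 3, 4, 5, 6]
N = 4

# comprehension form: the outer loop comes first, the inner loop second
repeated_comp = [x for x in base_list for i in range(N)]

# equivalent nested loops, in the same order as the "for" clauses above
repeated = []
for x in base_list:
    for i in range(N):
        repeated.append(x)

print(repeated == repeated_comp)
```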


In many cases, somewhat manual iteration (read: for loops) may be necessary. Let's take a look at the `enumerate()` function to aid in simple loops over lists.

In [12]:
# iterating with an index, C-style
for i in range(len(base_list)):
    elem = base_list[i]
    print(i, elem * 2)

0 2
1 4
2 6
3 8
4 10
5 12


In [13]:
for i, elem in enumerate(base_list):
    print(i, elem * 2)

0 2
1 4
2 6
3 8
4 10
5 12


From this simple example, it should be immediately apparent what the `enumerate` function does: for every element in some kind of iterable object, return a tuple of (index, value). This is used very frequently to express a C-style loop with an index. 
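
As a small sketch, `enumerate()` also accepts an optional `start` argument for cases where counting should begin somewhere other than zero. The list of test names below is made up for illustration:

```python
tests = ['tWR', 'Refresh', 'tRCD']  # hypothetical test names

# enumerate yields (index, value) tuples; list() makes them visible
pairs = list(enumerate(tests))

# start=1 shifts the counter without touching the data
numbered = ['{}. {}'.format(i, name) for i, name in enumerate(tests, start=1)]

print(pairs)
print(numbered)
```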

Let's look at another handy function - `zip()` - starting with an example.

In [15]:
indices = [0, 1, 2, 3, 4]
my_list = [1, 10, 100, 1000, 10000]

zip(indices, my_list)

<zip at 0x2af7bce6df48>

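`zip()` takes N iterable objects and gives us back a series of N-tuples of their elements. In Python 3 the result is a lazy iterator, which is why the cell above displays a `zip` object rather than a list. So with three lists: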

In [16]:
# return a list of 3-tuples
zipped = list(zip(indices, my_list, [x * 2 for x in my_list]))
zipped

[(0, 1, 2), (1, 10, 20), (2, 100, 200), (3, 1000, 2000), (4, 10000, 20000)]

Let's look at one more useful application of `zip()`. Say we need a `dict` to use as a replacement for column names in a table. 

In [None]:
columns = ['PRB_FID', 'PRB_X', 'PRB_Y', 'PRB_WAFER']
fixed = [x.replace('PRB_', '') for x in columns]
mapping = dict(zip(columns, fixed))
mapping

Let's go back and look at comprehensions a little further. We've shown the power of list comprehensions, but the same syntax exists for both `dict`s and `set`s. Below are some simple examples of each:

In [None]:
# create a dict entry for each element in a list
k = { x: x * 2 for x in base_list }

print(k)
print('----')

# replicate the column naming example above
columns = ['PRB_FID', 'PRB_X', 'PRB_Y', 'PRB_WAFER']
mapping = {x: x.replace('PRB_', '') for x in columns}
print(mapping)


In [20]:
repeated_list = [1, 1, 1, 2, 3, 4, 5, 5, 5, 8, 9, 10, 10]
k = set(repeated_list)

l = set([1,2,3])

k & l

{1, 2, 3}

In [None]:
# note that we can create a set exactly how we perform a 
# list comprehension - the end result is just a set
{x for x in repeated_list}, set([x for x in repeated_list])

`dicts` are incredibly valuable in Python, just as they are in Perl, but a common need is to have a default value for keys that don't yet exist. This is in part due to Python's strict handling of this behavior with exceptions. Take the simple example below:

In [21]:
dd = {'a': 1, 'b': 2, 'c': 3}

first = dd['a']
second = dd['x']

KeyError: 'x'

There are two different ways to address this. One, `dicts` have a `.get()` method that allows you to provide a default value for missing keys (rather than throwing an exception) as shown below:

In [None]:
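# .get() returns the default instead of raising a KeyError
dd.get('x', 'no value')

# a nested assignment such as dd['x']['y'] = 5 still needs the key to exist first
if 'x' not in dd:
    dd['x'] = {}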

Another alternative is to start exploring the `collections` module provided in Python which contains a variety of useful variations on existing types. Several of the containers this module provides are intended for easy subclassing of built-in types, but the two most useful objects it contains are **defaultdict** and **OrderedDict**.

`defaultdict` is a class which has a 1-argument constructor, expecting a zero-argument function that provides a default behavior for a value when a key is missing. This is kind of confusing at first, so let's see how this looks:

In [22]:
from collections import defaultdict

# 0-argument function
def cons():
    return 'no value'

d = defaultdict(cons)
d['a'] = 5

value = d['x']
value

'no value'

So when do you use which approach? It's mostly a matter of choice - either one provides the same basic behavior. `defaultdict` is a bit nicer when you want to use something naively as a `dict` and just get back a default value, whereas using the normal `dict.get()` method may make your intention to provide a default value more explicit.
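
To make the comparison concrete, here is a minimal sketch that counts repeated items both ways. The sample data and variable names are illustrative only:

```python
from collections import defaultdict

words = ['tWR', 'Refresh', 'tWR', 'tWR', 'Refresh']  # illustrative data

# plain dict with .get(): the default is spelled out at each use
counts_get = {}
for w in words:
    counts_get[w] = counts_get.get(w, 0) + 1

# defaultdict: int() takes no arguments and returns 0,
# so it works as the zero-argument default function
counts_dd = defaultdict(int)
for w in words:
    counts_dd[w] += 1

print(counts_get == dict(counts_dd))
```

One difference worth knowing: looking up a missing key in a `defaultdict` stores the default under that key, while `dict.get()` leaves the dict unchanged.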

But there is one more interesting case to consider - what if you want a `dict` where each default value is actually a `defaultdict` as well? We can use recursion to give us a `defaultdict` that can build itself:

In [27]:
def shane_is_a_loser():
    return defaultdict(shane_is_a_loser)

d = defaultdict(shane_is_a_loser)
d['x']['y']['z'] = 0

# the visual representation is pretty ugly, but should still make sense:
print(d)

defaultdict(<function shane_is_a_loser at 0x2af7bce6a400>, {'x': defaultdict(<function shane_is_a_loser at 0x2af7bce6a400>, {'y': defaultdict(<function shane_is_a_loser at 0x2af7bce6a400>, {'z': 0})})})


In one (admittedly confusing) line, we defined something a little more like what Perl users may be familiar with, regarding hash behavior. Sometimes we need a flexible dict object where we can assign sub-dicts freely without worrying about whether certain keys have been initialized yet.

**OrderedDict** is only useful in a handful of cases, but it's useful nonetheless. Remember that dicts are functionally *hash tables* behind the scenes. Whatever key we provide for association is first hashed and then used as an index into a 'raw' array to pull out a value. In a naive implementation, there's no understanding or bookkeeping of the order in which values are assigned. But there are a number of cases where this may be something we care about. For that purpose, `OrderedDict` is a type provided in the `collections` module that keeps track of the order in which keys are assigned. (This describes the Python 3.4 used in this class; starting with Python 3.7, plain `dict`s also preserve insertion order.)

In [None]:
# first, an example of the randomness of iterating over dicts

d = {}
d[1] = 0
d['200'] = 3
d[4] = 10
d[100] = 17
d[5] = 10

for k, v in d.items():
    print(k, ' \t= ', v)

In [None]:
# now, using an OrderedDict
from collections import OrderedDict

d = OrderedDict()
d[1] = 0
d['200'] = 3
d[4] = 100
d[100] = 17
d[5] = 10

for k, v in d.items():
    print(k, ' \t= ', v)

# Functions

There are a number of bits of 'advanced' Python syntax that relate directly to how we use and define functions. Let's dive in first by looking at how we deal with complex arguments. First, the argument list of a function definition takes the form we're used to:

function(arg1, arg2)

But one of Python's strengths is in its flexibility in arguments that may be more complex (or at least less expressive) in other languages. We can easily define a function that takes any number of arguments, such as a function that prints each of its inputs, no matter how many you give it. Let's see this below:

In [28]:
def deg(x):
    return x + "is awesome"

def my_func(*args):
    for arg in args:
        print(arg)
        
my_func('one', 'two', 'three')
print('-----')
my_func('or just one')

one
two
three
-----
or just one


This `*args` syntax allows us to take a flexible number of arguments, which the function sees as a tuple. Conveniently, we can use this to provide some optional arguments and some required ones. In this case, the `*args` must come after the required positional arguments.

In [29]:
def my_func(arg1, arg2, *args):
    print(arg1, ' and ', arg2)
    print('But you also gave ', len(args), ' more arguments!')

my_func('one', 'two')
print('-----')
my_func('or one', 'two', 'and three', 'maybe a fourth')

one  and  two
But you also gave  0  more arguments!
-----
or one  and  two
But you also gave  2  more arguments!


There's an inverse operation to this, however, called argument unpacking. Say we have a function that takes three arguments, and we have a list of three items. We can use this syntax to intelligently feed each element of the list (or tuple, etc) to the function, as shown below:

In [None]:
def my_func(arg1, arg2, arg3):
    return arg1 + arg2 + arg3

my_vals = [1, 2, 3]
my_func(*my_vals)

The syntax, as expected, is basically the same as when we provide optional arguments. This is convenient for a number of reasons, especially when you feed data into a variable-argument function.

Remember before when we discussed `zip()` and finding an inverse function? We can use `zip()` as its own inverse function, simply by using this argument unpacking syntax.

In [30]:
l1, l2 = [1,2,3,4], [4,5,6,7]
zipped = list(zip(l1, l2))
print(zipped)

unzipped = list(zip(*zipped))
print(unzipped)

[(1, 4), (2, 5), (3, 6), (4, 7)]
[(1, 2, 3, 4), (4, 5, 6, 7)]


Similar to how we can provide optional arguments using this `*args` syntax, we can provide optional arguments as key-value pairs using a dict-like syntax. Let's define a function that uses all of these together, and look at some clever ways to take advantage of it.

In [31]:
def complex_func(arg1, arg2, *args, **kwargs):
    print(arg1, arg2)
    print('variable length = ', args)
    print('keywords = ', kwargs)

complex_func(1, 2, 4, 5, 6, 7, my_var='x', other_var='y')

1 2
variable length =  (4, 5, 6, 7)
keywords =  {'other_var': 'y', 'my_var': 'x'}


Notice that using `**kwargs` provides a `dict`-like interface to access named parameters of our function. These are inherently optional, and we can iterate over them just as we would in a dict. We can provide default values for keyword arguments in a function definition as well:

In [32]:
def func_defaults(key=5, value=1):
    print(key, ' = ', value)

func_defaults()
print('----')
func_defaults(1, 2)
print('----')
func_defaults(key=10, value=5)
print('----')
func_defaults(value=5, key=10)

5  =  1
----
1  =  2
----
10  =  5
----
10  =  5


And as expected, argument unpacking works on keyword arguments just the same:

In [33]:
d = {'key': 5, 'value': 15}
func_defaults(**d)

5  =  15


Python supports a number of more useful features related to functions. Below we'll look at a few examples. The first note to make is that Python has first-class functions - that is, functions are values just like numbers are. Functions can accept other functions as arguments or return them as results (such functions are called higher-order functions). We saw a simple example of this with `defaultdict` - it required a function as an argument.
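
A minimal sketch of both directions, passing a function in and getting a function back. The names `apply_twice` and `make_adder` are invented for this illustration:

```python
def apply_twice(func, value):
    # a function received as an argument is called like any other
    return func(func(value))

def make_adder(n):
    # a function can build and return another function
    def add(x):
        return x + n
    return add

add5 = make_adder(5)
result = apply_twice(add5, 1)
print(result)   # 11
```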

As a result, there are a number of times we need a quick little function defined without necessarily giving it a name. This concept is known as an 'anonymous function', and Python provides the `lambda` keyword to support it. The `lambda` syntax provides a quick way to define an anonymous function of 0 to N arguments in-line. Let's use the `map()` function for an example: `map()` takes two arguments - a function to apply, and an iterable object on which to apply it.

In [34]:
def multiply2(x):
    return x * 2

data = [1,2,4,5,6]
list(map(multiply2, data))

[2, 4, 8, 10, 12]

Using a `lambda` expression, we can encode this simple behavior in one line:

In [35]:
s = ["this is my string", "this is my other string"]
list(map(lambda x: x[:7], s))

['this is', 'this is']

Lambda expressions are very useful, but their use can be a little tricky. Lambda expressions consider their scope, so they can lead to some complicated scenarios, for better or worse. Thankfully, you can't assign values from within a lambda expression, so we can't do too much harm, but you have to be mindful of values in scope:
    

In [None]:
some_value = 10

# pull in a variable from our current scope
list(map(lambda x: x + some_value, data))
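
Here is a small sketch of why values in scope deserve attention: a lambda looks up outside names when it is *called*, not when it is defined. The wrapper function `demo` is only there to keep the example self-contained:

```python
def demo():
    data = [1, 2, 4, 5, 6]
    some_value = 10

    add_value = lambda x: x + some_value
    before = list(map(add_value, data))

    # rebinding the outer name changes what the existing lambda computes
    some_value = 100
    after = list(map(add_value, data))
    return before, after

before, after = demo()
print(before)   # [11, 12, 14, 15, 16]
print(after)    # [101, 102, 104, 105, 106]
```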

Our long-term goal is to work with tabular data. Do you suppose this approach will be valuable in this context?

Lambda functions can be assigned as variables, making them look just like a named function:

In [None]:
multiply = lambda x, y: x * y

multiply(5, 10)

We will be using `lambda` expressions **very** heavily throughout the rest of the course, so make sure you understand the general idea before moving on.

Let's look at a few more topics on functions before moving on. The `functools` module provides us a number of useful function-related functions that we can use to address a lot of common behavior. First, the `partial` function gives us a way to create a new function, based on a provided function, with some of its arguments filled in. 

In [None]:
from functools import partial

my_func = lambda x, y: x + y
add2 = partial(my_func, 2)

print(add2(5))

# we can also fill in keyword arguments
def my_func(x=1, y=2, z=3):
    return x * y + z

somefunc = partial(my_func, y=10)
print(somefunc(x=1, z=5))

**Decorators** are a cute syntax for applying some high-level behavior around a function, such as logging its arguments when it's called. This topic goes beyond the scope of this class, but we want to touch on one useful case. The `functools` module provides a handy decorator - **lru_cache** - which caches a function's return values for recently-seen arguments. Let's use it to speed up a naive recursive Fibonacci function, using the `@` symbol to apply the decorator to our function definition.

In [None]:
from functools import lru_cache

@lru_cache(maxsize=255)
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

In [None]:
%%timeit
fib(100)

Effectively, `lru_cache` wraps our function with a look-up table for `{arg: return_value}` to speed up our function. This requires that a function be well-behaved - every time you call the function with a given argument, you always get the same value. Side effects, such as writing to a file, throw a wrench into this.
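
As a sketch of how to confirm that the cache is doing its job, a function decorated with `lru_cache` exposes a `cache_info()` method reporting hits and misses. The function is repeated here so the snippet stands alone:

```python
from functools import lru_cache

@lru_cache(maxsize=255)
def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)

result = fib(30)

# each distinct argument is computed once (a miss);
# every repeated call is answered from the cache (a hit)
info = fib.cache_info()
print(result)
print(info)
```

Without the decorator, the same call would recompute the smaller values over and over; with it, each of the 31 distinct arguments is evaluated only once.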

# A Quick Note on Concurrency

Concurrency is hard. Simultaneous access to shared state by multiple actors (threads, processes, etc.) can lead to race conditions, deadlocks, or starvation, either giving incorrect results or killing a program in its tracks. Managing these effects well is worthy of an entire college course or more, so we'll only look at an easy place to exploit concurrency in our programs.

CPython, Python's default implementation, effectively prevents threads from running Python code in parallel using the Global Interpreter Lock, or GIL: only one thread may execute Python bytecode at a time. The GIL is necessary for the current implementation since the majority of its memory management is not thread safe. So if our threads can never execute Python code in parallel, how do we take advantage of multiple cores/threads for any performance gains?

The simple answer is that, in our workflow in PE, a lot of the work that we may want to speed up is actually done by other programs, meaning our code just sits there waiting for a result much of the time. This applies for data extraction commands, like `tc`, as well as IO-bound operations, or any case where our Python code isn't the bottleneck.

Let's look at one very simple example of how to take advantage of simple, "embarrassingly parallel" problems, using Python's `concurrent.futures` module. The `multiprocessing` and `concurrent.futures` modules provide a lot of flexible solutions for parallelism, but we have two objects of interest we'll use - `ThreadPoolExecutor` and `ProcessPoolExecutor`.

These two objects create resource pools that we can exploit as workers for code we write, but more importantly, they implement a `map()` method that lets us feed functions data and naively parallelize the work. As always, let's look at a simple example:

In [None]:
from concurrent.futures import ThreadPoolExecutor

e = ThreadPoolExecutor(max_workers=7)

def my_func(x):
    
    # let's pretend this function goes and calls a 
    #   complicated data extraction command, like prbext
    
    return x + 5

data = list(range(10))

results = list(e.map(my_func, data))
results

Just as when we used `map()` before, we can simply feed it a function and some data, and the thread pool goes and parallelizes the application of the function. If your function spends most of its time waiting, this can provide a quick speedup for data extraction.
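
To see the effect of waiting, here is a sketch in which `time.sleep()` stands in for a slow external command. The function `slow_lookup` and the timings are illustrative, not a measurement of any real extraction tool:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_lookup(x):
    # time.sleep stands in for waiting on an external command or file read
    time.sleep(0.1)
    return x + 5

data = list(range(8))

start = time.time()
serial = [slow_lookup(x) for x in data]
serial_time = time.time() - start

start = time.time()
# the with-block shuts the pool down once the work is finished
with ThreadPoolExecutor(max_workers=8) as pool:
    threaded = list(pool.map(slow_lookup, data))
threaded_time = time.time() - start

print(serial == threaded)
print(round(serial_time, 2), round(threaded_time, 2))
```

The results come back in input order either way; only the elapsed time differs, because the threads spend their waiting time concurrently.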

One common workflow I personally use is with `prbext`. This command can be very slow if called on an entire lot to collect die-level probe data, but if you instead call it on each probe log individually (from psums), you can parallelize the extraction and benefit significantly.

So my workflow often resembles the following:

`psums = get_psums_logs()`

`def get_data(log): # go and call prbext`

`results = list(e.map(get_data, psums))`

`final_data = concatenate(results)`

One gotcha: when you write code in a notebook, Python can sometimes have some trouble feeding your code to child threads/processes. It's easier to write your `map`'ed function in a module that your notebook may call - just keep this in mind if you run into any hiccups!


# StringIO

StringIO/BytesIO are useful classes for reading from and writing to in-memory, file-like objects. This can be really useful when some kind of API you're given expects a file to write to, but you really want to get back a string (to modify before saving).

In [None]:
from io import StringIO

# create a "file" object from a string
s = 'this is my amazing string that should be treated like a file \n this is a thing'
sio = StringIO(s)
sio.read()

# a number of interfaces return bytes rather than a string
from io import BytesIO
some_bytes = BytesIO(b'this is a bytestring').read()
some_string = some_bytes.decode('utf-8')

swrite = StringIO()
swrite.write('this is something i wrote to like it was a file')
swrite.getvalue()
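
As a sketch of the "API expects a file" case, the standard library's `csv.writer` wants a file-like object to write to, and a `StringIO` satisfies it. The rows written here are sample values chosen for illustration:

```python
import csv
from io import StringIO

buffer = StringIO()

# csv.writer expects a file-like object; a StringIO satisfies that
writer = csv.writer(buffer)
writer.writerow(['design', 'quantity'])
writer.writerow(['v80a', 1400])

# pull the text back out and modify it before saving anywhere
text = buffer.getvalue()
upper = text.upper()
print(upper)
```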

# Regular Expressions

Regex is a common tool in Product Engineering given the vast number of data formats we use and its flexibility for parsing. While PE tends to use regex from Perl (as well as in our typical data extraction tools), the Python support is comparable, though the interface is a little different. We'll look at an example, starting first from some Perl run through IPython, then look at the Python equivalents.

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski

In [None]:
%%script perl

$line = "An_example_line_of_complex: Text!.";
if ($line =~ m/([A-Za-z_]+):\s(.+)/g) {
    print($1, "\n");
    print($2, "\n");
}

Here's the quickest, simplest example of comparable matching in Python:

In [None]:
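# import the regex module
import re

line = "An_example_line_of_complex: Text!."

# raw string so that \s reaches the regex engine untouched;
# [A-Za-z_] avoids the punctuation that the range A-z also covers
results = re.findall(r'([A-Za-z_]+):\s(.+)', line)

# findall returns a list of tuples, one per match
if results:
    first, rest = results[0]
    print(first)
    print(rest)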

Non-matches simply produce an empty list. Let's see that by allowing only uppercase characters (and underscores) before the colon in our example string:

In [None]:
# non-matches simply produce an empty list
re.findall(r'([A-Z_]+):\s(.+)', line)

### re.compile() and .search()

It's a common use case to re-use a single regular expression multiple times on new input data. While other languages feature the concept of pre-compiling regular expressions, all of the various Python regular expression functions compile and cache the most recently used expressions, giving the same benefit one would otherwise see with compilation.

Nonetheless, when reusing a single regular expression repeatedly, it's preferable to use the formal `re.compile()` function and call the compiled pattern's `.match()`/`.search()` methods on each new set of data.

In [None]:
re_summary = re.compile(r'([A-Za-z_]+):\s(.+)')
matches = re_summary.search(line)
if matches:
    # group 0 has full match
    print(matches.group(0))
    
    # other groups have other matches:
    print(matches.group(1))
    print(matches.group(2))
else:
    print("no match!")

Our `matches` variable contains the result of the applied regular expression, and we can use the `.group()` method to access matched elements directly, or we can use the `.groups()` method to return a tuple of all match groups.

In [None]:
match = re_summary.search("another_Simple_example_piece: of_text.")
if match:
    print(match.groups())
else:
    print("no matches")

### Named Fields

One of the best practices for working with regular expressions in Python, however, is to use named fields and access the results using the `groupdict()` method to retrieve a `dict` of all your matches. Remember - regular expressions are hard! Save yourself the cost of maintaining ugly code by making your intentions a little more clear!

In [None]:
data_regex = re.compile(r'^(?P<FIRST>[A-Za-z_]+):\s' +
                        r'(?P<OTHER>.+)$')
match = data_regex.search(line)
if match:
    print(match.groupdict())
    print(match.groupdict()['FIRST'])
    print(match.groupdict()['OTHER'])

In [None]:
# compile a long regex, spanning multiple lines
# make it readable!
data_regex = re.compile(r'^(?P<FIRST_WORD>[A-Za-z]+)_' +
                        r'(?P<SECOND_WORD>[A-Za-z]+)_' +
                        r'(?P<THIRD_WORD>[A-Za-z]+)_' +
                        r'(?P<FOURTH_WORD>[A-Za-z]+)_' +
                        r'(?P<FIFTH_WORD>[A-Za-z]+):\s' +
                        r'(?P<OTHER>.+)$')
match = data_regex.search(line)
if match:
    print(match.groupdict())
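
Putting the pieces together, here is a sketch that applies one compiled, named-group pattern to several lines and collects a `dict` per matching line. The group names `KEY` and `VALUE` and the sample lines are chosen for illustration:

```python
import re

# raw strings keep the backslash in \s from being treated as a string escape
line_regex = re.compile(r'^(?P<KEY>[A-Za-z_]+):\s(?P<VALUE>.+)$')

lines = [
    'An_example_line_of_complex: Text!.',
    'this line has no key',
    'another_Simple_example_piece: of_text.',
]

records = []
for text in lines:
    match = line_regex.search(text)
    if match:  # search() returns None when nothing matches
        records.append(match.groupdict())

print(records)
```

Lines that do not fit the pattern are skipped, so the result holds two records for the three input lines.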

# End of Lecture 1

# Exercises

Remember, the intention of the exercises is to get you writing some code and solving some problems for practice. Try your hand at every problem, but don't spend too long on each if you can't come up with a good solution. Any practice you get is beneficial. Instructor solutions will be distributed the following day.

Try and avoid the urge to look up a solution to your problem, but feel free to look at the documentation for a particular function or use available resources to work toward your own solution. We want you thinking in Python and solving code problems, but we also want you to learn the standard library of functions available to you that make your job easier!

#### Exercise 1

Write some code that prints the numbers from 1 to 50 (inclusive). But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Note: this is a frequent interview question for programming candidates; it's a little deceptive in terms of difficulty.

In [None]:
# write your code here!


In [None]:
# instructor simple solution
# use range(1, 51) for the real solution
#  just saving space
for i in range(1, 21):
    if i % 15 == 0:
        print('FizzBuzz')
    
    elif i % 3 == 0:
        print('Fizz')
        
    elif i % 5 == 0:
        print('Buzz')
    
    else:
        print(i)
    

#### Exercise 2

Write a function to reverse the order of words in a string (but don't reverse the letters in the words).

Input:

```sentence = 'the quick brown fox'```

Output:

```flipped = 'fox brown quick the'```

##### Hints
- How would you reverse the elements in a list?
- Strings have a `.split()` function that may be of interest
- Strings also have a `.join()` function that may be of interest

In [None]:
sentence = 'the quick brown fox jumped over the lazy dog'

# write your code here!
flipped = ''

# return the output
flipped

In [None]:
# instructor iterative solution

words = sentence.split(' ')
flipped = ''

# iterate over the list of words
for i in range(len(words)):
    flipped += words[len(words) - i - 1] + ' '

# remove the trailing space
flipped = flipped[:-1]
print(flipped)

# -------
# instructor one-line solution
flipped = ' '.join(sentence.split(' ')[::-1])
print(flipped)

# alternative one-line using reversed()
flipped = ' '.join(reversed(sentence.split(' ')))
print(flipped)

#### Exercise 3

Given a comma-separated value (CSV) string, generate a list of dictionaries which represents each record, as shown below:

Input:

```csv = """design,quantity,status
v80a,1400,on hold
v90b,2524,moving
v00h,159,on hold"""```

Output:

```records = [
               {'design': 'v80a', 'quantity': 1400, 'status': 'on hold' },
               {'design': 'v90b', 'quantity': 2524, 'status': 'moving'  },
               {'design': 'v00h', 'quantity': 159,  'status': 'on hold' },
          ]```
        
##### Hints:
- Strings have a `.split()` function that may be of interest
- Line separators are still '\n'

##### Hard modes:
- Try it without using loops (hint: list or dict comprehensions)
- Try it using `zip()` (hint: `dict()` called on a list of tuples is useful)
- Can you do it in under 5 lines? Under 3?

In [None]:
# sample input string
csv = """fid,test,bitcounts
089154K:10:P11:26,tWR,15
089154K:10:P11:26,Refresh,4
089154K:11:P11:26,tWR,19
089154K:11:P11:26,Refresh,161
089154K:12:P11:26,tWR,2509
089154K:12:P11:26,Refresh,97"""

# write your code here!
records = []

# return result to display in cell below:
records

In [None]:
# instructor verbose solution

# split text into a list of lines
lines = csv.split('\n')

headers = lines[0].split(',')
lines = lines[1:]

records = []

for line in lines:
    
    # split each line by comma
    tokens = line.split(',')
    dd = {}
    
    for i, token in enumerate(tokens):
        dd[ headers[i] ] = token
    
    records.append(dd)
    
records
    

In [None]:
# instructor short solution
lines = [x.split(',') for x in csv.split('\n')]
# pair each row with the header row to build one dict per record
records = [dict(zip(lines[0], x)) for x in lines[1:]]
records
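
The Output example in the exercise statement shows `quantity` as an integer, while both solutions above leave every value as a string. The following is a minimal sketch of one way to convert digit-only fields, using the example input from the exercise statement. The names `csv_text`, `to_number` and `typed_records` are hypothetical and chosen only for this illustration.

```python
csv_text = """design,quantity,status
v80a,1400,on hold
v90b,2524,moving
v00h,159,on hold"""

def to_number(token):
    # hypothetical helper: digit-only tokens become ints, everything else stays a string
    return int(token) if token.isdigit() else token

# first row is the header; the remaining rows are the records
header, *rows = [line.split(',') for line in csv_text.split('\n')]

typed_records = [
    dict(zip(header, [to_number(token) for token in row]))
    for row in rows
]

print(typed_records[0]['quantity'] + typed_records[1]['quantity'])
```

Note that `str.isdigit()` only recognizes non-negative whole numbers, so negative or decimal values would need a different check.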

#### Exercise 4

Write a function that accepts a nested list of strings (between 1 and N levels deep) and returns a flattened version of the list.

Input:

```
nested = [
    ['this', 'is', 'a', 'list'],
    'but i am a string',
    [['extra', 'nested'], ['two', 'levels']]
]
```

Output:

```
flattened = ['this', 'is', 'a', 'list',
             'but i am a string', 'extra',
             'nested', 'two', 'levels']
```


##### Hints
- Write helper functions if necessary
- Recursion can be a useful tool as well
- Look through the `itertools` doc pages
- You can check if an item is a list or string using the `isinstance()` function
- You can add a new element to a list using `mylist.append(value)`
- You can concatenate two lists simply using addition: `newlist = firstlist + secondlist`

In [None]:
nested = [
                ['this', 'is', 'a', 'list'],
                'but i am a string',
                [['extra', 'nested'], ['two', 'levels']]
         ]

def flatten(x):
    
    # write your code here! 
    
    return 

flatten(nested)

In [None]:
# instructor recursive solution

def flatten(x):

    vals = []
    
    for i in x:
        
        if isinstance(i, str):
            vals.append(i)
        else:
            vals += flatten(i)
    
    return vals

flatten(nested)
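
The recursive solution above builds a new list at every level. As an alternative sketch (not the instructor's solution), a generator with `yield from` walks the same structure and hands back one string at a time. The name `iter_flatten` is hypothetical, and the sketch assumes that anything that is not a string is a nested list.

```python
def iter_flatten(items):
    for item in items:
        if isinstance(item, str):
            # strings are leaves: hand them back as-is
            yield item
        else:
            # assumed to be a nested list: recurse and pass its values through
            yield from iter_flatten(item)

nested = [
    ['this', 'is', 'a', 'list'],
    'but i am a string',
    [['extra', 'nested'], ['two', 'levels']]
]

print(list(iter_flatten(nested)))
```

Checking for `str` first matters: a string is itself iterable, so treating it like a list would recurse one character at a time.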

#### Exercise 5

Write a function `reducel(func, sequence, optional_initial_value)` that reduces a sequence to a single value, according to a user-provided, 2-argument function. For example, the `sum()` function could be written `reducel(lambda x, y: x + y, [1,2,3,4], 0)`, and a `product()` function could be written `reducel(lambda x, y: x * y, [1,2,3,4], 1)`.

The function should be applied left-to-right down the sequence. For the sum case, this should be evaluated as:

((((1) + 2) + 3) + 4)

**Take note**: would this be different if we applied our function right-to-left? What functions could you provide that would behave differently? When might you use a left-to-right versus a right-to-left function?

Note: this is a critical feature in *functional programming languages*, and is referred to as a *fold* or *reduce* operation. Python provides `reduce()` for this purpose; in Python 3 it lives in the `functools` module.

Read more: http://en.wikipedia.org/wiki/Fold_(higher-order_function)

##### Test cases

`reducel(lambda x, y: x + y, [1,2,3,4,5])`

`reducel(lambda x, y: x + x * y, [1,2,3,4,5], 1)`

`reducel(lambda x, y: x + ' ' + y, ['words', 'become', 'a', 'sentence'], '')`

##### Hard modes
- Can you make the initial value optional? What do you do if it's omitted?
- How could you leverage this pattern to rewrite the `string.split()` function? Hint: write a named function to do some work, and think about what values you might return. What if you use an intermediate return value (like a tuple)?

In [1]:
# using standard library
from functools import reduce
print(reduce(lambda x, y: x + y, [1,2,3,4,5], 0))

# instructor iterative solution
def reducel(func, sequence, *args):
    
    # handle the optional argument
    # if not included, use first val.
    if len(args) == 0:
        accum = sequence[0]
        sequence = sequence[1:]
    else:
        accum = args[0]
    
    for item in sequence:
        accum = func(accum, item)
    return accum

# same call as above, this time through our own reducel
print(reducel(lambda x, y: x + y, [1,2,3,4,5], 0))

# for the hardmode case #2
def split_string(s, x):
    """
    This is a little bit tricky, so pay attention!
    At each step, we see two parameters:
     what we've calculated so far, and the next
     character in the string.
    Our in-progress data is simply a list of words
    but we'll temporarily store an empty word
    to signal that we saw a space.
    """

    if x != ' ' and len(s) == 0:
        return [x]
    
    # create an empty token to hold onto, as long as
    #  the last character we saw wasn't a space
    elif x == ' ' and len(s) > 0 and len(s[-1]) > 0:
        return s + ['']
    
    # what if our string starts with a space?
    #  then start with an empty token
    elif x == ' ' and len(s) == 0:
        return ['']
    
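    # the last token is empty, so replace it with the current character
    #  (a second consecutive space also lands here and becomes the first
    #   character of the next token, e.g. ' my' in the output)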
    elif s[-1] == '':
        return s[:-1] + [x]
    
    # we see a valid character and have a partial string
    else:
        return s[:-1] + [s[-1] + x]
    

split = reduce(split_string, 'this is  my string', [])
recombined = reduce(lambda x, y: x + ' ' + y, split)

# we've just written split and join as reduce functions
split, recombined

15
15


(['this', 'is', ' my', 'string'], 'this is  my string')

In [2]:
reducel(lambda x, y: x + x * y, [1,2,3,4,5], 1)

720
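
On the *Take note* question from the exercise: for addition the direction does not matter, but for a non-associative function such as subtraction it does. The following is a minimal sketch, assuming a hypothetical right-to-left counterpart named `reduce_right` built on `functools.reduce`. It is not part of the standard library.

```python
from functools import reduce

def reduce_right(func, sequence, initial):
    # hypothetical helper: fold right-to-left,
    # i.e. func(x1, func(x2, ... func(xn, initial)))
    return reduce(lambda acc, item: func(item, acc), reversed(sequence), initial)

def subtract(x, y):
    return x - y

left = reduce(subtract, [1, 2, 3, 4], 0)         # ((((0 - 1) - 2) - 3) - 4)
right = reduce_right(subtract, [1, 2, 3, 4], 0)  # (1 - (2 - (3 - (4 - 0))))

print(left, right)  # -10 -2
```

With `lambda x, y: x + y` both directions give the same total, which is why the `sum()` example hides the difference.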

# "Good" Code

We're not software engineers, but we have to use our own code, and that means reading code we've written. For this class and its exercises, follow this short set of best practices:

* Give variables sane, expressive names
* Write lots of comments!
* But comment on why, not how, unless necessary
* Never write "tricky" code!
* Regex should be commented just like any other code
* **DRY** (Don't Repeat Yourself) - Decompose things into functions whenever relevant
* **YAGNI** (You Ain't Gonna Need It) - Don't abstract things unless you really should
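
As one illustration of the regex guideline above, `re.VERBOSE` lets a pattern carry its own comments. This is only a sketch: the pattern and the name `RECORD_PATTERN` are assumptions made for this example, applied to a row from the Exercise 3 statement.

```python
import re

# hypothetical pattern for one "design,quantity,status" row
RECORD_PATTERN = re.compile(r"""
    (?P<design>[^,]+) ,    # design code: everything up to the first comma
    (?P<quantity>\d+) ,    # quantity: digits only, so int() is safe later
    (?P<status>.+)         # status: the rest of the line (may contain spaces)
""", re.VERBOSE)

match = RECORD_PATTERN.fullmatch('v80a,1400,on hold')
print(match.group('design'), int(match.group('quantity')), match.group('status'))
```

The comments say why each piece looks the way it does; in verbose mode the whitespace inside the pattern is ignored, so the layout costs nothing at match time.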

**Excerpts From the Zen of Python:**
* Beautiful is better than ugly.
* Explicit is better than implicit.
* Simple is better than complex.
* Complex is better than complicated.
* Flat is better than nested.
* Readability counts.
* In the face of ambiguity, refuse the temptation to guess.
* There should be one-- and preferably only one --obvious way to do it.
* If the implementation is hard to explain, it's a bad idea.
* If the implementation is easy to explain, it may be a good idea.