# Agenda

1. Sorting + passing functions (maybe even `lambda`)
2. Modules + packages
3. boto3 -- working with AWS

# Sorting 

There are many times when we might want to sort our data! In Python, lists have a `sort` method that actually changes the list, and sorts its elements from lowest to highest. (You can indicate that you want to sort in reverse order, if you want.)

Don't use this method.  Don't sort things with `list.sort`!  There are two main reasons:

1. It changes the list itself, in place (and returns `None`, which trips up people who write `mylist = mylist.sort()`).
2. It only works on lists, and you often want to sort other types of data.

We can instead use the `sorted` built-in function. It's *not* a method, but a function that takes any iterable as an argument.  It returns a new list of the input's elements, sorted from lowest to highest.
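A quick sketch of the difference (the variable names are just for illustration): `list.sort` rearranges the list in place and returns `None`, while `sorted` accepts any iterable and always returns a new list.

```python
# list.sort modifies the list in place and returns None
scores = [30, 10, 20]
result = scores.sort()
print(result)    # None
print(scores)    # [10, 20, 30] -- the original list has changed

# sorted works on any iterable and always returns a new list
print(sorted('hello'))             # ['e', 'h', 'l', 'l', 'o']
print(sorted((3, 1, 2)))           # [1, 2, 3]
print(sorted({'b': 1, 'a': 2}))    # ['a', 'b'] -- iterating over a dict gives its keys
```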

In [2]:
# let's create a list of random integers

import random
random.seed(0)   # reset the random-number generator to a known state

numbers = [random.randint(-50, 50)
           for i in range(10)]

numbers

[-1, 47, 3, -45, -17, 15, 12, 1, 50, -12]

In [4]:
# let's sort our numbers from lowest to highest, using sorted

sorted(numbers)

[-45, -17, -12, -1, 1, 3, 12, 15, 47, 50]

# Things to notice/know about `sorted`

1. It assumes that all of the values in the input can be compared with one another.  Integers, floats, strings, lists, and tuples all support `<` and `>`, and integers and floats can be compared with each other.  But you cannot compare integers with strings, or strings with lists.
2. When `sorted` compares two strings, lists, or tuples, it compares index 0 in both. If they're the same, it checks index 1. If those are the same, it checks index 2. This continues until (a) two elements differ, in which case the smaller element decides which sequence comes first, or (b) one sequence runs out, in which case the shorter one comes first. If both run out at the same time, they're equal.
3. `sorted` returns a new list, and doesn't modify its input argument.
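A few comparisons that illustrate these rules, with each expected result in a comment:

```python
# sequences compare element by element, starting at index 0
print([1, 2, 3] < [1, 3])      # True: index 1 decides, since 2 < 3
print('abc' < 'abd')           # True: index 2 decides, since 'c' < 'd'
print((1, 2) < (1, 2, 0))      # True: equal so far, and (1, 2) is shorter

# so sorted puts a prefix before its longer versions
print(sorted(['ab', 'b', 'a']))    # ['a', 'ab', 'b']

# integers and floats can be mixed
print(sorted([2, 1.5, 3]))         # [1.5, 2, 3]

# integers and strings cannot, so sorted raises TypeError
try:
    sorted([10, 'abc'])
except TypeError as e:
    print('TypeError:', e)
```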

If you're curious, `sorted` uses TimSort, written by ... Tim (Peters).  It combines merge sort and insertion sort, and works especially well on real-world data, where some of the values are often already in order.

In [5]:
# if I want to sort in reverse order, I can pass the keyword argument reverse=True

sorted(numbers, reverse=True)

[50, 47, 15, 12, 3, 1, -1, -12, -17, -45]

In [6]:
# what if I want to sort these numbers by absolute value? (That is, by their distance from zero?)

# this does *not* mean that I want to convert all of the numbers to positive
# I want to keep the original values, but sort them as if they were all positive



# TimSort compares two elements at a time

It doesn't compare every element against every other element. But each time it does compare, it looks at two elements and checks one against the other:

    a < b
    
Given two elements, `a` and `b`, TimSort checks whether `a` is less than `b`, and uses the answer to decide which of the two should come first in the output.

What we want to do is this:

    abs(a) < abs(b)
    
We want to run `abs`, Python's built-in function for absolute values, on each element. It returns a non-negative number. If we compare those numbers instead of the originals, we'll have things sorted the way we want.

More generally, we might want to do this:

    func(a) < func(b)
    
That is, take a function -- any function! -- and apply it to each value in our comparison, and then find out which comes first.

`sorted` supports this, thanks to our ability to pass functions as arguments to functions.

Remember that functions in Python are objects, just like integers, strings, lists, etc. Just as we can pass a list as an argument to a function, we can pass a function as an argument to a function. Then the function we call invokes the function we passed.  We won't be invoking `abs` ourselves. Rather, `sorted` will do it for us.
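Here's a small illustration of passing a function as an argument. `apply_to_each` is a hypothetical helper written just for this example; notice that we pass `abs` itself, with no parentheses, and `apply_to_each` is the one that calls it.

```python
def apply_to_each(func, items):
    # func is an ordinary parameter that happens to refer to a function
    return [func(one_item) for one_item in items]

print(apply_to_each(abs, [-3, 2, -1]))       # [3, 2, 1]
print(apply_to_each(len, ['a', 'bb', '']))   # [1, 2, 0]

# a function is an object, so we can also assign it to another variable
f = abs
print(f(-10))    # 10
```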

We do this by passing a function as the value of the `key` keyword argument.

Note that we don't want to call our key function. Rather, we just want to pass it: we write `key=abs`, not `key=abs()`.
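One way to picture what `key` does is the "decorate-sort-undecorate" pattern described in Python's Sorting HOWTO: compute the key once for each element, sort by those keys, then throw the keys away. The sketch below is a simplified model (`sorted_with_key` is a hypothetical name), not necessarily how CPython implements it internally. Including each element's original index keeps ties in their original order, and means the original values themselves never need to be compared.

```python
def sorted_with_key(items, key):
    # decorate: pair each item with its key and its original position
    decorated = [(key(one_item), index, one_item)
                 for index, one_item in enumerate(items)]
    # sort: tuples compare by key first, then by position (so ties keep their order)
    decorated = sorted(decorated)
    # undecorate: keep only the original items
    return [one_item for _, _, one_item in decorated]

numbers = [-1, 47, 3, -45, -17, 15, 12, 1, 50, -12]
print(sorted_with_key(numbers, abs))   # [-1, 1, 3, 12, -12, 15, -17, -45, 47, 50]
print(sorted_with_key(numbers, abs) == sorted(numbers, key=abs))   # True
```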

In [7]:
sorted(numbers)

[-45, -17, -12, -1, 1, 3, 12, 15, 47, 50]

In [8]:
sorted(numbers, key=abs)   # sorted runs abs on each number, then compares those results

[-1, 1, 3, 12, -12, 15, -17, -45, 47, 50]

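TimSort is a "stable sort," meaning that if two values compare as equal (here, if their keys are equal), they will be put in our output in their original order. That's why `-1` comes before `1`, and `12` before `-12`, in the output above: that's the order they have in `numbers`.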
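Another small sketch of stability, using made-up pairs and a hypothetical key function `by_first` that looks only at the letter in each pair:

```python
def by_first(pair):
    # hypothetical key function: sort only by the letter in each pair
    return pair[0]

pairs = [('b', 1), ('a', 2), ('b', 0), ('a', 1)]

# ties (same letter) keep their original relative order
print(sorted(pairs, key=by_first))   # [('a', 2), ('a', 1), ('b', 1), ('b', 0)]

# without a key, whole tuples are compared, so the numbers break the ties instead
print(sorted(pairs))                 # [('a', 1), ('a', 2), ('b', 0), ('b', 1)]
```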

In [9]:
numbers

[-1, 47, 3, -45, -17, 15, 12, 1, 50, -12]

# What can we pass as a key function?

1. It must be a function that takes exactly one argument, and
2. It must return a value that TimSort can compare with the values returned for the other elements.

The function doesn't need to return the same type as it got in its input.
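For example, `int` takes one string and returns an integer, so it can serve as a key function for strings of digits. The sample strings here are made up; the result still contains the original strings, and only the comparisons use the integers.

```python
sizes = ['10', '9', '100', '2']

# as strings, these compare character by character: '1' < '2' < '9'
print(sorted(sizes))            # ['10', '100', '2', '9']

# int takes one argument and returns an integer, so it works as a key;
# the output still contains the original strings
print(sorted(sizes, key=int))   # ['2', '9', '10', '100']
```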

In [11]:
words = 'This is a bunch of words for an example sentence in my Python course'.split()

words

['This',
 'is',
 'a',
 'bunch',
 'of',
 'words',
 'for',
 'an',
 'example',
 'sentence',
 'in',
 'my',
 'Python',
 'course']

In [12]:
# sorted compares strings by their characters' Unicode code points (which match ASCII values for ASCII characters)
# capital letters all come before lowercase letters

sorted(words)

['Python',
 'This',
 'a',
 'an',
 'bunch',
 'course',
 'example',
 'for',
 'in',
 'is',
 'my',
 'of',
 'sentence',
 'words']

In [13]:
# we can pass str.lower as our key function
# the comparison will be made between lowercase strings, not the original strings

sorted(words, key=str.lower)   # s.lower() == str.lower(s)

['a',
 'an',
 'bunch',
 'course',
 'example',
 'for',
 'in',
 'is',
 'my',
 'of',
 'Python',
 'sentence',
 'This',
 'words']

In [14]:
def by_very_loud_lower(one_word):    # I like to call my key functions by_ something, for readability
    print(f'Now looking at {one_word}')
    return one_word.lower()

sorted(words, key=by_very_loud_lower)

Now looking at This
Now looking at is
Now looking at a
Now looking at bunch
Now looking at of
Now looking at words
Now looking at for
Now looking at an
Now looking at example
Now looking at sentence
Now looking at in
Now looking at my
Now looking at Python
Now looking at course


['a',
 'an',
 'bunch',
 'course',
 'example',
 'for',
 'in',
 'is',
 'my',
 'of',
 'Python',
 'sentence',
 'This',
 'words']
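Notice that each word was printed exactly once, even though TimSort compares many pairs. Python's Sorting HOWTO notes that the key function is called exactly once for each input element, and those results are reused for every comparison. A small sketch that counts the calls (the names are just for this example):

```python
calls = []

def by_lower_recording(one_word):
    calls.append(one_word)     # remember every call
    return one_word.lower()

sample_words = 'This is a bunch of words'.split()
result = sorted(sample_words, key=by_lower_recording)

print(result)                          # ['a', 'bunch', 'is', 'of', 'This', 'words']
print(len(calls), len(sample_words))   # 6 6
```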

# Exercise: Sorting

1. Define a list containing 10+ strings (words).
2. Sort the list by word length, from longest to shortest.
3. Sort the list by the number of vowels, from 0 vowels to the maximum.

In [15]:
words = 'This is yet another attempt to define a ridiculous sentence that is wonderful and extensive and full of vowels'.split()

words

['This',
 'is',
 'yet',
 'another',
 'attempt',
 'to',
 'define',
 'a',
 'ridiculous',
 'sentence',
 'that',
 'is',
 'wonderful',
 'and',
 'extensive',
 'and',
 'full',
 'of',
 'vowels']

In [16]:
# basic sort -- alphabetical, except that capitalized words come first
sorted(words)

['This',
 'a',
 'and',
 'and',
 'another',
 'attempt',
 'define',
 'extensive',
 'full',
 'is',
 'is',
 'of',
 'ridiculous',
 'sentence',
 'that',
 'to',
 'vowels',
 'wonderful',
 'yet']

In [19]:
# sort by length -- meaning, we want to run len() on each word, and sort them by that result

sorted(words, key=len)

['a',
 'is',
 'to',
 'is',
 'of',
 'yet',
 'and',
 'and',
 'This',
 'that',
 'full',
 'define',
 'vowels',
 'another',
 'attempt',
 'sentence',
 'wonderful',
 'extensive',
 'ridiculous']

In [20]:
sorted(words, key=len, reverse=True)

['ridiculous',
 'wonderful',
 'extensive',
 'sentence',
 'another',
 'attempt',
 'define',
 'vowels',
 'This',
 'that',
 'full',
 'yet',
 'and',
 'and',
 'is',
 'to',
 'is',
 'of',
 'a']

In [22]:
# sort by the number of vowels
# I'll write a function!

def count_vowels(one_word):
    total = 0
    
    for one_letter in one_word.lower():
        if one_letter in 'aeiou':
            total += 1
            
    print(f'{one_word}: {total}')
            
    return total

sorted(words, key=count_vowels)

This: 1
is: 1
yet: 1
another: 3
attempt: 2
to: 1
define: 3
a: 1
ridiculous: 5
sentence: 3
that: 1
is: 1
wonderful: 3
and: 1
extensive: 4
and: 1
full: 1
of: 1
vowels: 2


['This',
 'is',
 'yet',
 'to',
 'a',
 'that',
 'is',
 'and',
 'and',
 'full',
 'of',
 'attempt',
 'vowels',
 'another',
 'define',
 'sentence',
 'wonderful',
 'extensive',
 'ridiculous']

In [23]:
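def by_count_lowercase_vowels(one_word):
    # like count_vowels above, but without the debugging print
    return sum(1 for one_letter in one_word.lower() if one_letter in 'aeiou')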

In [29]:
mylist = [[2, 4, 6, 8, 10, 50], [3,4,5,6,7,8,9], [10, 20,30], [9, 11, 13, 15],
         [2,3,4,5], [10,11,12,13]]

mylist

[[2, 4, 6, 8, 10, 50],
 [3, 4, 5, 6, 7, 8, 9],
 [10, 20, 30],
 [9, 11, 13, 15],
 [2, 3, 4, 5],
 [10, 11, 12, 13]]

In [30]:
sorted(mylist)

[[2, 3, 4, 5],
 [2, 4, 6, 8, 10, 50],
 [3, 4, 5, 6, 7, 8, 9],
 [9, 11, 13, 15],
 [10, 11, 12, 13],
 [10, 20, 30]]

In [31]:
sorted(mylist, key=sum)    # sort the inner lists by their totals

[[2, 3, 4, 5],
 [3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13],
 [9, 11, 13, 15],
 [10, 20, 30],
 [2, 4, 6, 8, 10, 50]]
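Any built-in function that takes one argument and returns something comparable works the same way. For instance, `len` and `max` on the same nested lists:

```python
mylist = [[2, 4, 6, 8, 10, 50], [3, 4, 5, 6, 7, 8, 9], [10, 20, 30], [9, 11, 13, 15],
          [2, 3, 4, 5], [10, 11, 12, 13]]

# sort by the number of elements in each inner list (ties keep their original order)
print(sorted(mylist, key=len))
# [[10, 20, 30], [9, 11, 13, 15], [2, 3, 4, 5], [10, 11, 12, 13],
#  [2, 4, 6, 8, 10, 50], [3, 4, 5, 6, 7, 8, 9]]

# sort by the largest element in each inner list
print(sorted(mylist, key=max))
# [[2, 3, 4, 5], [3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13], [9, 11, 13, 15],
#  [10, 20, 30], [2, 4, 6, 8, 10, 50]]
```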

In [32]:
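# let's define "shoes" to be a list of dicts, our shoes from last time
# each line of the file is tab-separated: brand, color, size

filename = 'shoe-data.txt'

def line_to_dict(s):
    fields = s.strip().split('\t')

    return {'brand': fields[0],
            'color': fields[1],
            'size': fields[2]}    # note: size stays a string

with open(filename) as infile:    # "with" closes the file when we're done
    shoes = [line_to_dict(one_line)
             for one_line in infile]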

In [33]:
shoes

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'New Balance', 'color': 'pink', 'size': '41'},
 {'brand': 'Nike', 'color': 'white', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '38'},
 {'brand': 'Nike', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '39'},
 {'brand': 'New Balance', 'color': 'black', 'size': '43'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '44'},
 {'brand': 'Nike', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '41'},
 ...]

In [34]:
# sort our list of dicts

sorted(shoes)

TypeError: '<' not supported between instances of 'dict' and 'dict'

In [35]:
{} < {}

TypeError: '<' not supported between instances of 'dict' and 'dict'

In [37]:
# say I want to sort them by brand

def by_brand(shoe_dict):
    return shoe_dict['brand']

sorted(shoes, key=by_brand)

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'Adidas', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '40'},
 {'brand': 'Adidas', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'black', 'size': '40'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '38'},
 {'brand': 'Adidas', 'color': 'white', 'size': '39'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 ...]

In [38]:
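# say I want to sort them by size
# (size is a string here; string comparison gives the right order only while all sizes have the same number of digits)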

def by_size(shoe_dict):
    return shoe_dict['size']

sorted(shoes, key=by_size)

[{'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'Nike', 'color': 'black', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Nike', 'color': 'black', 'size': '35'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'New Balance', 'color': 'white', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Nike', 'color': 'black', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '36'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Nike', 'color': 'orange', 'size': '36'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '37'},
 {'brand': 'Nike', 'color': 'white', 'size': '37'},
 {'brand': 'Nike', 'color': 'white', 'size': '37'},
 ...]

In [39]:
# say I want to sort them first by brand, then by size

def by_brand_and_size(shoe_dict):
    return shoe_dict['brand'], shoe_dict['size']    # here, I return a tuple

sorted(shoes, key=by_brand_and_size)

[{'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '38'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'white', 'size': '39'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '39'},
 ...]
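Because the key here is a tuple, Python compares brands first and looks at sizes only to break ties. A related trick, sketched below on a small made-up sample (not the real shoe data): to get one field in descending order, negate it, which requires converting the size to a number first. `by_brand_then_largest_size` is a hypothetical name.

```python
sample_shoes = [{'brand': 'Nike', 'color': 'black', 'size': '41'},
                {'brand': 'Adidas', 'color': 'pink', 'size': '38'},
                {'brand': 'Nike', 'color': 'white', 'size': '44'},
                {'brand': 'Adidas', 'color': 'black', 'size': '43'}]

def by_brand_then_largest_size(shoe_dict):
    # brand ascending; -int(size) makes larger sizes sort first within a brand
    return shoe_dict['brand'], -int(shoe_dict['size'])

for one_shoe in sorted(sample_shoes, key=by_brand_then_largest_size):
    print(one_shoe['brand'], one_shoe['size'])
# Adidas 43
# Adidas 38
# Nike 44
# Nike 41
```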

In [41]:
# this need is so common that Python has a standard way to solve it...
# we can use the "operator" module, and specifically operator.itemgetter

# when you call operator.itemgetter, it returns a function that applies [] to whatever argument it's given

# meaning:

import operator

x = operator.itemgetter(3)   # x is a function that runs [3] on any data

x('abcde')   # same as 'abcde'[3]

'd'

In [42]:
# want to sort by brand? We don't need to define our own, new function!

sorted(shoes, key=operator.itemgetter('brand'))

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'Adidas', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '40'},
 {'brand': 'Adidas', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'black', 'size': '40'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '38'},
 {'brand': 'Adidas', 'color': 'white', 'size': '39'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 ...]

In [43]:
# let's do it by size

sorted(shoes, key=operator.itemgetter('size'))

[{'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'Nike', 'color': 'black', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Nike', 'color': 'black', 'size': '35'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'New Balance', 'color': 'white', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Nike', 'color': 'black', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Nike', 'color': 'pink', 'size': '36'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Nike', 'color': 'orange', 'size': '36'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'New Balance', 'color': 'orange', 'size': '37'},
 {'brand': 'Nike', 'color': 'white', 'size': '37'},
 {'brand': 'Nike', 'color': 'white', 'size': '37'},
 ...]

In [44]:
# brand and size?

sorted(shoes, key=operator.itemgetter('brand', 'size'))    # itemgetter takes *args! With several, it returns a tuple

[{'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '38'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'white', 'size': '39'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '39'},
 ...]
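To see what the key function returns in these calls, we can apply `itemgetter` objects to a single dict directly (the dict and pairs here are just samples). With one argument, the result returns one value; with several, it returns a tuple, which is the same thing our `by_brand_and_size` function returned. With integer arguments, it indexes into lists and tuples.

```python
import operator

one_shoe = {'brand': 'Nike', 'color': 'black', 'size': '41'}

get_brand = operator.itemgetter('brand')
get_brand_and_size = operator.itemgetter('brand', 'size')

print(get_brand(one_shoe))            # Nike
print(get_brand_and_size(one_shoe))   # ('Nike', '41')

# with an integer argument, itemgetter indexes into sequences
numbered_letters = [(3, 'c'), (1, 'a'), (2, 'b')]
print(sorted(numbered_letters, key=operator.itemgetter(1)))   # [(1, 'a'), (2, 'b'), (3, 'c')]
```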

# What the heck is `lambda`?

When we use `def` to define a function, we're doing two things:

1. Creating a function object
2. Assigning that function object to a variable.

`lambda` is a special keyword in Python that does the first of these (i.e., creating a function object) but not the second (i.e., assigning it).

That is, it allows us to create an *anonymous function*. Why would we want that? Typically, to pass a small, one-off function as an argument, such as the `key` argument to `sorted`.

`lambda` has a bunch of restrictions:

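1. No statements -- no `for` loops, no `if` statements, no assignment. (You can use comprehensions and conditional expressions such as `a if cond else b`, but the body must be a single expression.)
2. It's meant to fit on one line. (Parentheses let you spread it across lines, but if you need that, a `def` is usually clearer.)
3. No `return` -- the expression's value is returned automatically.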
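A sketch to make the comparison with `def` concrete (the sample dict and numbers are made up). The first lambda builds the same kind of function as `by_brand`; the second shows that a conditional *expression* is allowed inside a lambda even though an `if` statement is not.

```python
def by_brand(shoe_dict):
    return shoe_dict['brand']

one_shoe = {'brand': 'Adidas', 'color': 'pink', 'size': '38'}

# calling a lambda directly, just to show it returns the same value as by_brand
print((lambda shoe_dict: shoe_dict['brand'])(one_shoe))   # Adidas
print(by_brand(one_shoe))                                 # Adidas

# a conditional expression inside a lambda:
# negative numbers first, then each group ordered by absolute value
sample_numbers = [3, -1, 2, -5]
print(sorted(sample_numbers, key=lambda n: (0 if n < 0 else 1, abs(n))))   # [-1, -5, 2, 3]
```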

In [45]:
# the syntax of lambda is:

lambda shoe_dict: shoe_dict['brand']

<function __main__.<lambda>(shoe_dict)>

In [46]:
# if we want to use a lambda, we typically pass it as an argument to another function
# this way, we don't have to create a new function with a name

# this sorts by brand, because our lambda does the same thing as by_brand and operator.itemgetter('brand')

sorted(shoes, key=lambda shoe_dict: shoe_dict['brand'])

[{'brand': 'Adidas', 'color': 'orange', 'size': '43'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '44'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 {'brand': 'Adidas', 'color': 'black', 'size': '38'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '36'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '36'},
 {'brand': 'Adidas', 'color': 'pink', 'size': '35'},
 {'brand': 'Adidas', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'white', 'size': '35'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '40'},
 {'brand': 'Adidas', 'color': 'black', 'size': '41'},
 {'brand': 'Adidas', 'color': 'black', 'size': '39'},
 {'brand': 'Adidas', 'color': 'black', 'size': '40'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '38'},
 {'brand': 'Adidas', 'color': 'white', 'size': '39'},
 {'brand': 'Adidas', 'color': 'orange', 'size': '37'},
 ...]

In [47]:
vowels = 'aeiouAEIOU'
sorted_by_vowels = sorted(words, 
                          key=lambda word: sum(1 for letter in word if letter in vowels))

# Modules, packages, etc.

DRY -- don't repeat yourself

1. If you have several nearly identical lines of code in a row, use a loop instead.
