# Section 2

* [Helper Functions](#Helper-Functions)
* [Sophisticated Math Syntax](#Sophisticated-Math-Syntax)
* [Lists](#Lists)
    * [String `.split()` and `.join()`](#.split()-and-.join())
    * [`range()`](#range())
        * [`range()` function arguments](#range()-function-arguments)
    * [List Slicing](#List-Slicing)
    * [Other List/Sequence Operations](#Other-List/Sequence-operations)
        * [The `in` keyword](#The-in-keyword)
        * [The `not in` keywords](#The-not-in-keywords)
        * [`min()` and `max()` functions](#min()-and-max()-functions)
        * [`.index()` method](#.index()-method)
        * [`.count()` method](#.count()-method)
        * [`sorted()` function](#sorted()-function)
        * [`.sort()` method](#.sort()-method)
    * [Adding Elements to a List](#Adding-Elements-to-a-List)
* [For..in](#For..in)
* [List Comprehensions](#List-Comprehensions)
    * [Filtering List Comprehensions](#Filtering-List-Comprehensions)
* [2D Lists (and beyond)](#2D-Lists-(and-beyond))
    * [2-D Examples](#2-D-Examples)
* [Tuples](#Tuples)
    * [Tuple Comprehensions](#Tuple-Comprehensions)
    * [Destructuring](#Destructuring)
    * [`enumerate()`](#enumerate())
* [Dictionaries](#Dictionaries)
    * [Creating A Dictionary](#Creating-A-Dictionary)
    * [The Wizard Of Oz](#The-Wizard-Of-Oz)
    * [Stupid Dictionary Tricks](#Stupid-Dictionary-Tricks)
    * [Dictionaries From Comprehensions](#Dictionaries-From-Comprehensions)
    * [Frankenstein Exercises](#Frankenstein-Exercises)
* [Old Information, New Perspective](#Old-Information,-New-Perspective)
    * [`.splitlines()`](#.splitlines())
    * [`.readlines()`](#.readlines())
    * [The `with..as:` Block](#The-with..as:-Block)
    * [Functions: Named Arguments And Default Values](#Functions:-Named-Arguments-And-Default-Values)
* [Lambda Expressions](#Lambda-Expressions)
    * [Structure Of A Function](#Structure-Of-A-Function)
    * [`.sort()` And `sorted()`](#.sort()-And-sorted())
    * [`map()`](#map())
    * [`filter()`](#filter())
    * [Perspective Thoughts On `map()` And `filter()`](#Perspective-Thoughts-On-map()-And-filter())
* [Zipping And Unzipping Lists](#Zipping-And-Unzipping-Lists)
    * [The Elusive Unzip](#The-Elusive-Unzip)
* [Sets](#Sets)
    * [The Important Parts](#The-Important-Parts)
* [Baby Names Dataset](#Baby-Names-Dataset)

# Helper Functions

The following cell contains function definitions that will be needed at some point in this Notebook.

All that you need to do is to run the cell and then proceed through the Notebook like normal.

In [1]:
def loadBook(url, filename, encoding="utf-8"):
    '''
    The function will download a file from the specified `url`
    and return its contents as a string.
    
    The file will be cached on the disk in the current directory
    using the provided `filename`.
    
    Arguments:
    url      - The URL of the file to be downloaded.
    filename - The name used to cache the file locally.
    encoding - The encoding of the file when reading.
               Defaults to "utf-8"
    '''

    file = None  # Ensures `file` is defined for the cleanup code below.
    try:
        # Assume the file exists.  Try to open it.
        file = open(filename, 'r', encoding=encoding)
    except:
        # The file open() failed.
        try:
            # Try to download the file.
            from urllib.request import urlopen
            response = urlopen(url)
            text = response.read().decode(encoding, errors='ignore')

            # Write the text of the book to a file.
            file = open(filename, 'w', encoding=encoding)
            file.write(text)
            file.close()
            
            # Open the file on disk for reading.
            file = open(filename, 'r', encoding=encoding)

        except:
            # Something else went wrong.  Do minor cleanup,
            # then raise another Exception, which will cause
            # the script to stop running (with an error).
            if file:
                file.close()
            raise Exception("Could not load file.")

    # Read the contents of the file from the disk.
    text = file.read()
    file.close()
    return text


In [2]:
#########################################################
###    Run This Code To Download And Load The Data    ###
#########################################################

from zipfile import ZipFile
def readnames(path):
    '''
    This function creates a list of baby name data, which
    is the dataset on baby names from the Social Security
    Administration.  Data is grouped by state, sex, year,
    name, and then finally the number of babies born that
    year with that name.  Names that occurred fewer than 5
    times that year in that state are not included.
    The code also saves a copy of the dataset in a file on
    your computer.  The next time that you run the script,
    it will first try to load the dataset directly from your
    computer.  If it doesn't find it, then it will go ahead
    and try to download the file.
    
    Arguments:
    path - The path to the babynames zip file
    
    Returns:
    A list of tuples containing the data in the form of:
      (state, sex, year, name, count)
    '''

    babyNamesRaw = []
    
    # Open the .zip file.  `babyzip` is the variable that
    # represents the .zip file.
    with ZipFile(path) as babyzip:
        
        # .zip files are a compressed container for
        # potentially many other files.
        # Iterate through the files and process the ones
        # that represent state data.
        for filename in [filename for filename in babyzip.namelist() if ".TXT" in filename]:
            print("Loading data for " + (filename[:2]))
            
            # Open the file from the .zip file.
            # The file is opened in RAM, so no additional
            # disk space is required.
            with babyzip.open(filename) as state:
                
                # Iterate through each line in the file
                # and append to the `babyNamesRaw` list.
                for line in [line for line in state.read().decode().splitlines() if line != '']:
                    
                    # Convert numerical data (year and count).
                    state, sex, year, name, count = line.split(',')
                    babyNamesRaw.append((state, sex, int(year), name, int(count)))

    return babyNamesRaw


zipname = "babynames.zip"
try:
    # Assume the file exists.  Try to open it.
    babyList = readnames(zipname)
except:
    # The file open() failed.
    try:
        # Try to download the file.
        from urllib.request import urlopen
        print("Fetching the file from the interwebs...")
        print("NOTE: This will take a while. The file is around 20 megabytes!  And that's compressed, too!")
        response = urlopen("https://www.ssa.gov/oact/babynames/state/namesbystate.zip")

        # Write the downloaded data to a file.
        with open(zipname, 'wb') as babyfile:
            babyfile.write(response.read())
        babyList = readnames(zipname)
    except:
        # Something else went wrong.
        # Raise another Exception, which will cause
        # the script to stop running (with an error).
        raise Exception("Could not load file.")

print(f'Baby Names data is loaded.  There are {len(babyList)} records.')


Fetching the file from the interwebs...
NOTE: This will take a while. The file is around 20 megabytes!  And that's compressed, too!
Loading data for AK
Loading data for AL
Loading data for AR
Loading data for AZ
Loading data for CA
Loading data for CO
Loading data for CT
Loading data for DC
Loading data for DE
Loading data for FL
Loading data for GA
Loading data for HI
Loading data for IA
Loading data for ID
Loading data for IL
Loading data for IN
Loading data for KS
Loading data for KY
Loading data for LA
Loading data for MA
Loading data for MD
Loading data for ME
Loading data for MI
Loading data for MN
Loading data for MO
Loading data for MS
Loading data for MT
Loading data for NC
Loading data for ND
Loading data for NE
Loading data for NH
Loading data for NJ
Loading data for NM
Loading data for NV
Loading data for NY
Loading data for OH
Loading data for OK
Loading data for OR
Loading data for PA
Loading data for RI
Loading data for SC
Loading data for SD
Loading data for TN
Loading 

# Sophisticated Math Syntax

As stated many times in the class, "programmers are lazy".  And, since programmers are lazy, they come up with shortcuts for any repetitive task.

As you have already seen, it is very common for us to write lines such as:

`index = index + 1`

This is an example of a line of code that is so often used that programmers decided to shorten it.  This same line can be written as:

`index += 1`

* Note that the **variable name** only needs to be written once.
* Note that the **mathematical operation** appears before the equal sign.

This shorthand syntax exists for each of the arithmetic operators: `+=` `-=` `*=` `/=` `//=` `%=` `**=`

And, of course, they can be used anywhere that the standard mathematical operators can be used.

In [None]:
index = 1
print(index)

index = index * 3
print(index)

index *= 4
print(index)

In [None]:
message = "Ho"
print(message)

message += "Ho" # message = message + "Ho"
print(message)

message *= 3 # message = message * 3
print(message)
print("Wait... it's not Christmas!")
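A short sketch of the other shorthand operators in action, including `**=` for exponentiation.  The variable name `total` is only for illustration.

```python
# Each line below is shorthand for `total = total <operator> value`.
total = 10
total -= 4      # total = total - 4   -> 6
total **= 2     # total = total ** 2  -> 36
total //= 5     # total = total // 5  -> 7
total %= 4      # total = total % 4   -> 3
total /= 2      # total = total / 2   -> 1.5 (true division gives a float)
print(total)
```

Notice that `/=` changes the type of `total` from `int` to `float`, just as `/` would.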

We will use the `+=` syntax from now on in our examples.

# Lists

**Lists** ([documentation](https://docs.python.org/3.6/library/stdtypes.html#list)) are one of the things that make Python *very* powerful (expressive).

Informally, you can think of a list as being similar to a shopping list.  It is just a bunch of "things" grouped together.

More formally, a **list** is a type of **Collection** in Python.  A list can contain zero or more other values, and these values are kept in a deterministic (*i.e.*, predictable) order.  The initial order of the **elements** of the list is the order in which they are written when the list is created.

For example, a list could be a group of numbers (int or float) or a group of student names (strings).

Lists are created by putting a series of **expressions**, separated by commas, all inside square brackets (*i.e.*, **[]**).

In [None]:
# Creating a simple list:

myList = [1, 2, 3]

i = 0
while i < len(myList):
    print(myList[i])
    i += 1

print(myList)
print()

print(myList[0])
print(myList[1])
print(myList[2])

The preceding example tells us several things about lists:

* A list is represented by a **variable**.  In this example, we gave it the variable name `myList`
* We can use the **`len()` function** to tell us how many items are in the list.
* We can access individual elements in the list using the **bracket (or, square brace) notation**, just like strings.
* We can print out the entire list using a **`print()`** statement.
* Lists use **0-based counting** for their element index.

In [None]:
# What does this tell us about lists?
myList = [1, 3, -8, 42.5, "a"]
print(myList)
print('=' * 30)
counter = 0
while counter < len(myList):
    print(f"{counter}: {myList[counter]:>8}     {type(myList[counter])}")
    counter += 1

The preceding example demonstrates:

* Lists can have mixed types.  In formal terms, their elements can be **heterogeneous**.
* Indexing of a list can appear inside **format strings**.

In [None]:
myList = [1, 3, -8, 42.5, "a"]
print(f"{myList}")
print(type(myList))

Evidently, we can even put the entire list variable inside of a **format string**!

Also, notice that the **type** of `myList` is `list`.  So a list is a formal **type** in the Python language.

In [None]:
# In this instance, x is all multiples of 2, from 2 to 20
x = [2,4,6,8,10,12,14,16,18,20]
z = 3
index = 0
while index < len(x):
    if (x[index] % z) == 0:
        print(f"{x[index]} is a multiple of {z} and a multiple of 2")
    index += 1

## .split() and .join()

Strings have a [`.split()` method](https://docs.python.org/3.6/library/stdtypes.html#str.split) which will split the string into a list of smaller strings.  This operation is often called **tokenization**, where each of the smaller strings is called a **token**.

The default behavior of `.split()` is to split on whitespace; however, you can split on any arbitrary value.

`.join()` is a bit weirder.  It is a **method** on a string.  The easiest way to think about the method is that it takes 2 parts: a **list** of words, and a **string** to put between the words.  Because `.join()` is a method on a string, the string is the **glue** that will be put between the elements of the list.

In [None]:
message = "The quick brown fox jumps over the lazy dog."

print(message)
print()

print(message.split())
print()

print(message.split()[::-1])
print()

# Here, the space character (' ') is the glue that will
# be used to put all of the words into a single string.
backwardsString = ' '.join(message.split()[::-1])
print(backwardsString)

In [None]:
word = 'abracadabra'
print(word.split())
print()

print(word.split('a'))
print()

print(word.split('ab'))
print()

print('a'.join(word.split('a')))

The preceding example demonstrates a couple of important points about the `.split()` method:

* `.split()` *always* returns a **list**.  Even if that list only contains the original string.
* The character that is used to split the string apart will, in turn, **not be included** in the list.
* You can split on a **sequence of characters**, but the split will occur only where that exact character sequence appears.
* If you split apart a string, and then join it back together by the same character sequence, you should end up with a string of **equivalent value**.
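The points above can be checked directly.  The following is a small sketch; the variable names `no_match`, `pieces`, and `rebuilt` are chosen here only for illustration.

```python
word = "abracadabra"

# Splitting on text that never appears still returns a list (of one string).
no_match = word.split("xyz")
print(no_match)              # ['abracadabra']

# Splitting on "a" leaves empty strings where the word starts or ends with "a",
# and the "a" characters themselves are not included.
pieces = word.split("a")
print(pieces)                # ['', 'br', 'c', 'd', 'br', '']

# Joining with the same separator rebuilds an equivalent string.
rebuilt = "a".join(pieces)
print(rebuilt == word)       # True
```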

## range()

The [`range()` function](https://docs.python.org/3.6/library/functions.html#func-range) returns an object that can produce a sequence of integers.

In [None]:
help(range)

`range()` is not the same as a list.  It does **not** store a collection of numbers; it produces them on demand.

In [None]:
# Unfortunately, range() is not a list, so printing it does not show its contents.
print(range(10))

`range()`, however, can **generate** a list!

In [None]:
# We can convert a range into a list
print(list(range(10)))

In [None]:
# We can also use this shorthand (as of Python 3.5)
# It is often called the "splat" operator, but that is not
# an official name.  The Python documentation refers to it
# as "iterable unpacking".
# It has the following format: [* <some sequence> ]
print([*range(10)])

We will use this **splat operator** quite often!

### range() function arguments

The `range()` function can take one, two, or three arguments.

If only **one** argument is given, then it is interpreted as the `stop` value.  The result is a generator of the numbers from 0 up to, but not including, the `stop` value.

If **two** arguments are given, then they are interpreted as the `start` and `stop` values.  The result is a generator of the numbers from the `start` value up to, but not including, the `stop` value.

If all **three** arguments are given, then they are interpreted as the `start`, `stop`, and `skip` values.  The `start` and `stop` values are used as described above, and the `skip` value (called `step` in the Python documentation) is the increment between values (its default is 1).

In [None]:
print([*range(10)])

In [None]:
print([*range(5,10)])

In [None]:
print([*range(-5,10,2)])

In [None]:
# We can have a decreasing range if we use a negative skip.
print([*range(10,-8, -1)])

In [None]:
# Just another example of a negative skip.
print([*range(10,-8,-3)])
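Two details that the cells above do not show, sketched here with a hypothetical variable named `countdown`: a range can answer some questions without being converted to a list, and a range whose `start` is already past its `stop` is simply empty.

```python
countdown = range(10, 0, -3)   # produces 10, 7, 4, 1

# len(), indexing, and `in` all work directly on the range object.
print(len(countdown))          # 4
print(countdown[1])            # 7
print(7 in countdown)          # True

# With the default skip of 1, counting "up" from 10 to 0 produces nothing.
print(list(range(10, 0)))      # []
```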

In [None]:
x = [*range(15,22)]
print("x        =", x)
print("x[0]     =", x[0])
print("x[1]     =", x[1])
print("x[2]     =", x[2])
print("x[3]     =", x[3])
print("x[1]+x[2]=", x[1] + x[2])
print("x[-1]    =", x[-1])


## List Slicing

Just like strings, we can slice lists, using the same bracket notation (*e.g.*, **[from:to:skip]**) to get a part of a list.  The behavior is almost identical.

In [None]:
# The general format is [<from>:<to>]
x = [*range(15,22)]
print("x        =", x)
print("x[3:5]   =", x[3:5])

In [None]:
# If <to> is omitted, then it will return up to the end of the list.
print("x        =", x)
print("x[3:]    =", x[3:])

In [None]:
# If <from> is omitted, then it will return beginning with the
# first element of the list.
print("x        =", x)
print("x[:3]    =", x[:3])
print("x[3:5]   =", x[3:5])

In [None]:
# A negative index means to start counting from the back of the list.
print("x        =", x)
print("x[-1]    =", x[-1])
print("x[-3]    =", x[-3])
print("x[-3:]   =", x[-3:])
print("x[:-3]   =", x[:-3])
print("x[-5:-3] =", x[-5:-3])

In [None]:
# Compare the slicing behavior to that of strings:
x = "FizzBuzz!!!"
print("x      =", x)
print("x[3:7] =", x[3:7])

In [None]:
# Lists can be added together
# This will create a new list containing everything
# from the first list, followed by everything from
# the second list.
# Notice that this behavior is similar to strings, too!
print([5,6,7] + [1,2,3])

In [None]:
# What do you think will happen if we multiply a list?
# Why is it that this behavior might be expected?
print([5,6,7] * 3)

In [None]:
# Notice the philosophy that Python uses for the slicing
# boundaries.  `x[:3]` will return elements 0,1,2, but not 3.
# `x[3:6]` will return 3,4,5.  In this way, you can use
# the same boundary as the end of one list, and the start of
# another (in this case, the boundary is `3`), and the
# `x[3]` will not be duplicated.
#
# That is why Python behaves in the way that it does.
x = [*range(10)]
print(x)
print(x[:3])
print(x[3:6])
print(x[:3] + x[3:6])

In [None]:
# Of course, lists can be combined in any way that you want.
x = [*range(10)]
print(x)
print(x[:3] + x[-2:])

In [None]:
# How would you describe this behavior?

x = [*range(10)]

x = x[-1:] + x[:-1]

print(x)

In [None]:
x = [*range(142)]

x = x[len(x)//2:] + x[:len(x)//2]

print(x)
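Both of the preceding cells rotate a list by gluing two slices together.  As a rough sketch, the same idea can be wrapped in a function; `rotate_right` is a name made up for this example, not something built into Python.

```python
def rotate_right(items, k):
    """Return a new list with the last k elements moved to the front."""
    if not items:
        return []
    k %= len(items)                  # allow k larger than the list length
    return items[-k:] + items[:-k]   # when k is 0 this is items[0:] + []

print(rotate_right([*range(10)], 1))   # [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
print(rotate_right([1, 2, 3], 4))      # [3, 1, 2]
```

Because slicing always builds a new list, the original list passed to the function is left unchanged.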

## Other List/Sequence operations

As you can see, a **string** is a sequence of **characters**, and a **list** is a sequence of **elements**.

Python makes it very easy to manipulate and ask questions about sequences and collections, whether the type is a string, list, or something that we haven't learned about yet.  This "consistency" is one reason for Python's popularity.

While not an exhaustive set of examples, here are a few more things that you might like to be aware of.

See https://docs.python.org/3/library/stdtypes.html#common-sequence-operations for more operations on lists/sequences!

### The `in` keyword

Expressions using `in` are evaluated as returning a boolean `True` or `False` value.

In [None]:
# the "in" keyword (strings)

word = "abracadabra"

print("a" in word)
print("dab" in word)

letter = ord("a")
while letter <= ord("z"):
    if chr(letter) in word:
        print(f"The letter \"{chr(letter)}\" is in \"{word}\"")
    else:
        print(f"The letter \"{chr(letter)}\" is NOT in \"{word}\"")
    letter += 1

In [None]:
# the "in" keyword (lists)

nonsense = ["foo", "bar", "baz"]

testList = ["The", "quick", "foo", "dog"]

index = 0
while index < len(testList):
    if testList[index] in nonsense:
        print(f'"{testList[index]}" is nonsense!')
    else:
        print(f'"{testList[index]}" seems alright to me.')
    index += 1

In [None]:
message = "The quick brown fox jumps over the lazy dog."
myList = message.split()

if 'fox' in myList:
    print('The list contains a fox.  What does the fox say?')


### The `not in` keywords

There is also a `not in` syntax, and it does exactly what you would expect:

In [None]:
word = "abracadabra"

print("a" not in word)
print(not ("a" in word))
print("dab" not in word)

letter = ord("a")
while letter <= ord("z"):
    if chr(letter) not in word:
        print(f"The letter \"{chr(letter)}\" is NOT in \"{word}\"")
    else:
        print(f"The letter \"{chr(letter)}\" is in \"{word}\"")
    letter += 1

### `min()` and `max()` functions

Return the smallest and the largest elements of the sequence/container.

In [None]:
# min() and max() functions

myList = [1,2,3,4,5]

print(min(myList))
print(max(myList))

In [None]:
word = "abracadabra"

print(min(word))
print(max(word))

In [None]:
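# BEWARE!!!
# min() and max() compare characters by their character codes
# (ASCII/Unicode), so every uppercase letter is "smaller" than
# every lowercase letter!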

word = "Foo"
print(word)
print("min is:", min(word))
print("max is:", max(word))
print()

word = "Zoom"
print(word)
print("min is:", min(word))
print("max is:", max(word))
print()

# one solution:
word = "Zoom"
print(word)
print("min is:", min(word.lower()))
print("max is:", max(word.lower()))

### `.index()` method

Finds a value in the sequence/collection and returns its index.

In [None]:
help(str.index)

In [None]:
# the `.index()` method will tell you where the value exists in the sequence
message = "The quick brown fox jumps over the lazy dog."
print(message)

i = message.index('fox')
print(message[i:])

In [None]:
message = "The quick brown fox jumps over the lazy dog."
myList = message.split()

print(myList)
print(myList.index('fox'))
print()

# This will error:
#print(myList.index('Fox'))

# potential fix:
if 'Fox' in myList:
    print(myList.index('Fox'))
else:
    print('"Fox" is not in the list.')
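An alternative to checking with `in` first is to attempt the lookup and catch the `ValueError` that `.index()` raises when the value is missing.  This is a minimal sketch; the helper name `find_index` and the choice of `-1` as the "not found" result are assumptions made for this example.

```python
def find_index(items, value):
    """Return the index of value in items, or -1 if it is not present."""
    try:
        return items.index(value)
    except ValueError:
        # .index() raises ValueError when the value is not found.
        return -1

myList = "The quick brown fox jumps over the lazy dog.".split()
print(find_index(myList, "fox"))   # 3
print(find_index(myList, "Fox"))   # -1
```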


In [None]:
message = "The quick brown fox jumps over the lazy dog."
myList = message.upper().split()

print(myList)
print(myList.index("THE"))

### `.count()` method

Used to count the number of times that a value appears in the sequence/container.

In [None]:
# .count() method (strings)

word = "abracadabra"

print(word.count("a"))
print(word.count("ab"))
print(word.count("dog"))


In [None]:
# .count() method (lists)

message = "The quick brown fox jumps over the lazy dog."

myList = message.split()
print(myList)
print(myList.count("the"))
print()

myList = message.upper().split()
print(myList)
print(myList.count("THE"))


### `sorted()` function

Returns a new copy of the sequence/container in which all values are sorted by value.  The original sequence/container is not altered in any way.

**Warning: Uses ASCII value!**

In [None]:
message = "The quick brown fox jumps over the lazy dog."

myList = message.split()
print(myList)
print(sorted(myList))
print(myList)

In [None]:
word = "abracadabra"
print(word)
print(sorted(word))
print(''.join(sorted(word)))
print(word)
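To see what the warning means in practice, here is a small sketch with a made-up list named `names`.  The `key=str.lower` argument is a preview of a technique covered in more detail later; it asks `sorted()` to compare the lowercased form of each word.

```python
names = ["banana", "Apple", "Cherry", "apple"]

# Default ordering: every capitalized word comes before every lowercase word.
by_code = sorted(names)
print(by_code)       # ['Apple', 'Cherry', 'apple', 'banana']

# Case-insensitive ordering: compare the words as if they were lowercase.
ignoring_case = sorted(names, key=str.lower)
print(ignoring_case) # ['Apple', 'apple', 'banana', 'Cherry']

# The original list is untouched in both cases.
print(names)         # ['banana', 'Apple', 'Cherry', 'apple']
```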

### `.sort()` method

Does not return any value.  Alters the original sequence/container by reordering elements in ascending order.

**Warning: Uses ASCII values!!**

In [None]:
message = "The quick brown fox jumps over the lazy dog."

myList = message.split()
print(myList)
print(myList.sort())
print(myList)

In [None]:
word = "abracadabra"
print(word)
print(word.sort())  # Raises AttributeError: 'str' object has no attribute 'sort'
print(word)

Oops!  Strings don't have a `.sort()` method!

In [None]:
# There is a workaround for this functionality, but we
# haven't learned enough yet to understand it properly.
# You can see here, though, that the solution will use
# the `sorted()` function.

# Don't worry about the rest, we will cover it in the
# next few lectures.  You will find that it is quite easy!
word = "abracadabra"
print(word)
word = ''.join(sorted([char for char in word]))
print(word)

## Adding Elements to a List

You can add elements to the end of a list.  And, because programmers like to use big words to show off how smart they are, this process is called **appending** to a list.

In [None]:
# Example of the .append() method on a list
x = []
print(x)
x.append(42)
print(x)
x.append(-28)
print(x)

In [None]:
# Other ways to append to a list
x = []
print(x)
x = x + [42]
print(x)
x += [-28]
print(x)
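The two approaches are not always interchangeable.  The sketch below (with throwaway variables `x`, `y`, and `z`) shows that `.append()` always adds exactly one element, while `+=` adds every element of the sequence on its right-hand side.

```python
x = [1, 2]
x.append([3, 4])     # adds ONE element, which happens to be a list
print(x)             # [1, 2, [3, 4]]

y = [1, 2]
y += [3, 4]          # adds each element of the right-hand list
print(y)             # [1, 2, 3, 4]

z = [1, 2]
z += "ab"            # a string is a sequence too, so each character is added
print(z)             # [1, 2, 'a', 'b']
```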

# For..in

`for..in:` is a looping structure that is very popular in Python.  [Official documentation here.](https://docs.python.org/3/tutorial/controlflow.html#for-statements)

`for..in:` is considered a graceful way to loop through any sequence container (lists, strings) or generator (`range()`) without having to use a counter (like we do with the `while:` loop).

The format for this loop is:

```
for <variable> in <sequence>:
    # code block here
```

In this structure, `<variable>` is assigned each item of the sequence in turn, and the code block runs once per item, until every item has been visited.

In [None]:
# old way
message = "The quick brown fox jumps over the lazy dog."
myList = message.split()

index = 0
while index < len(myList):
    print(myList[index])
    index += 1

In [None]:
# old way
# In this example, I'm adding a variable "word" just to make it clearer
message = "The quick brown fox jumps over the lazy dog."
myList = message.split()

index = 0
while index < len(myList):
    word = myList[index]
    print(word)
    index += 1

In [None]:
# new way
message = "The quick brown fox jumps over the lazy dog."
myList = message.split()

for word in myList:
    print(word)


In [None]:
# With a string:
for letter in "abracadabra":
    print(letter)

In [None]:
# with the range() generator:
for foo in range(11):
    print(foo)

In [None]:
# Equivalent while: statement
foo = 0
while foo < 11:
    print(foo)
    foo = foo + 1

In [None]:
listOfWords = "Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say they were perfectly normal, thank you very much.".split()
print(listOfWords)
print("==================")
for word in listOfWords:
    print(word)

In [None]:
for num in range(10,-1,-2):
    print(num)

# List Comprehensions

Python is powerful in how it can quickly and concisely handle lists.  It has special syntax to build lists on the fly.  These are called **List Comprehensions**.

List Comprehensions are a combination of the list syntax (*i.e.*, the square braces **[]**) and the `for..in` syntax.

`[<expression> for <variable> in <sequence>]`

Where

1. `<sequence>` is a sequence, container, or generator of some sort,
2. `<variable>` is a variable that will be assigned to each element of the sequence, one at a time, and
3. `<expression>` is a valid Python expression which, when evaluated (often using `<variable>` in some way) results in some value that will be added as an element to the list which is being generated.

As is usual with Python, it may seem that you have to read the code backwards in order to understand what is happening.  When read in order, however, it often sounds the same as you would say it in spoken English.

In [None]:
# Consider this loop to build a list of integers from 0 to 9:
integers = []
for x in range(10):
    integers.append(x)
print(integers)

In [None]:
# The preceding loop to build a list can be
# condensed down to a single line, called a
# list comprehension.
#
# The syntax for a list comprehension is:
# [<expression> for <var> in <sequence>]

print([x for x in range(10)])

In [None]:
# Consider the impact of the change in this example:
print([x * 2 for x in range(10)])

In [None]:
# You can use the <expression> part of the list
# comprehension in order to build more complex lists.
print([x**2 for x in range(10)])

In [None]:
# How about this?

word = "abracadabra"

newWord = ''.join([chr(ord(letter) + 1) for letter in word])

print(word)
print(newWord)

## Filtering List Comprehensions

OK, so perhaps you agree that list comprehensions are helpful in some cases.  

But what if you only want to add some things to the list from the sequence?

Consider the following loop that builds a list of elements whose value is odd.

In [None]:
oddNumbers = []
for x in range(10):
    if x % 2 != 0:
        oddNumbers.append(x)
print(oddNumbers)

The List Comprehension can emulate this behavior, too!  How? You can add an `if` to the end of the list comprehension syntax which will act as a filter!

`[<expression> for <variable> in <sequence> if <expression>]`

Notice that there are 3 parts to this particular list comprehension:

1. the `<expression>`
2. the `for <variable> in <sequence>`
3. the `if <expression>`, which is evaluated as a `bool` (Boolean) value.

For each element produced by **part 2**, the test in **part 3** is evaluated first.  **If** the test passes, **then part 1** is evaluated for that element and the result is added to the new list.  If the test does not pass, then nothing is added to the new list for that element.


In [None]:
oddNumbers = [x for x in range(10) if x % 2]
print(oddNumbers)

In [None]:
# Here is an additional example of this extended, 3-part syntax.
# Notice, in this example, that the list is filtered
# *before* the `x + 1` is evaluated.
print([x + 1 for x in range(10) if x % 2])
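One way to convince yourself of the evaluation order is to expand the filtered comprehension back into an ordinary loop.  The following sketch uses a variable named `result`, chosen only for this example.

```python
# Loop form of: [x + 1 for x in range(10) if x % 2]
result = []
for x in range(10):          # part 2: visit each element
    if x % 2:                # part 3: the filter is tested on x itself
        result.append(x + 1) # part 1: the expression runs only if the test passed
print(result)                # [2, 4, 6, 8, 10]

# Both forms build the same list.
print(result == [x + 1 for x in range(10) if x % 2])   # True
```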

In [None]:
print([chr(ord("A") + x) for x in range(10) if x % 2])

In [None]:
# You can do some interesting things.
# What does this code produce?
print([x for x in "abracadabra" if x not in "abc"])

In [None]:
# See the behavior of ''.join()
print('\n'.join([x for x in "abracadabra" if x not in "abc"]))

In [None]:
[chr(ch) for ch in range(65, 75)]

What's the moral of the story?  The **List Comprehension** is an elegant, flexible Python code syntax that can simplify your code and make it easier to reason about and faster to write.

*As long as it is not abused, that is!!!*

## Do you remember the "splat" operator?

In [None]:
print(range(10))

In [None]:
print(list(range(10)))

In [None]:
print([*range(10)])

In [None]:
print([i for i in range(10)])

Basically, the "splat" operator (`[* <sequence> ]`) can function like an ultra-shorthand list comprehension!

(It's not exactly the same as a list comprehension, but in this example, it functions like one.)

In [None]:
print("Hello")

In [None]:
print([character for character in "Hello"])

In [None]:
print([*"Hello"])

The following example is just a silly example of how the splat operator and list comprehension are similar.

In [None]:
print([*[1,2,3]])

In [None]:
print([x for x in [1,2,3]])

Of course, the splat operator can't change the elements in any way as it copies them, but the list comprehension can.

In [None]:
print([-x for x in [1,2,3]])

In conclusion, the splat operator and list comprehension both do the job of converting a sequence into a list.
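One thing the splat operator can do that a single list comprehension cannot do as directly is combine several sequences, and even individual values, inside one pair of square brackets.  A brief sketch, with `combined` as an illustrative variable name:

```python
# Several sequences can be unpacked into the same list, in order.
combined = [*range(3), *"ab", 99]
print(combined)   # [0, 1, 2, 'a', 'b', 99]
```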

In [None]:
sentence = "It was a dark and stormy night."
print(sentence)
print(sentence.upper())
print([x.upper() for x in sentence.split()])
print(' '.join(sorted(sentence.upper().split())))

In [None]:
# Other weird ways that lists and comprehensions can be used.
print([x for x in sentence.split()])
print([len(x) for x in sentence.split()])
# Advanced, please don't worry about it now:
print(sorted(sentence.split(), key=len))

# 2D Lists (and beyond)

Lists are a **1-dimensional** structure.  This means that, although the list in its entirety is referenced using a single variable name, you must provide an **index** in order to access any particular element of that list.  This is nothing new.

What is new is the realization that each list item is, in fact, a variable that can contain *any* type of value... **including another list!!!**


In [None]:
myList = [0, 1, 2, ['a', 'b']]

print(myList)
print()

for item in myList:
    print(item)

**Remember:** Lists are heterogeneous.  That means that one item in the list may be a string, and another item can be an integer, and another item can be a list.

Let's see how we can access that "inner" list.

In [None]:
myList = [42, "foo bar baz", [1, 2, 3, 4, 5]]

print(myList)
print(type(myList))
print()

print("List Items:")
print("-="*15)

for item in myList:
    print(item)
    print(type(item))
    print()

In [None]:
# Notice that we can access each item in the list by using
# the bracket (square brace) notation.
print(myList[0])
print(type(myList[0]))
print()
print(myList[1])
print(type(myList[1]))
print()
print(myList[2])
print(type(myList[2]))

In [None]:
# Because the bracket notation returns a value
# (and that value has a type, etc.), we are
# able to further break down that value, when
# that value is a sequence type.

print(myList[1])
print(myList[1][0])
print()

print(myList[2])
print(myList[2][-1])

These examples show why we call this type of structure a **2-D** (or 2-Dimensional) structure.  It represents a mental model of how the data is arranged.

**Is this truly a 2-D structure?**  No.

Remember that lists are heterogeneous.  In the preceding example, `myList[0]` is an `int`.  If you try to access a 2nd dimension like this: `myList[0][0]`, you will receive a runtime error (a `TypeError`), because an `int` cannot be indexed with `[0]`.

**Why do we say that it is 2-D, then?**

Because it helps us to visualize the data.  Remember the quote from the first lecture: "All models are wrong, but some are useful."  A list is never truly a 2-D structure, but it can behave like one if set up properly.  And, when set up properly, it can help us to think through problems.

## 2-D Examples

In [None]:
# Creating a 2D list:
myList = [[0,1,2],[3,4,5],[6,7,8]]

print(myList)

In [None]:
# Traversing a 2D list:
myList = [[0,1,2],[3,4,5],[6,7,8]]

for row in myList:
    print("row:", row)
    
    print("This row contains: ", end="")
    for column in row:
        print(column, end=" ")
    print()
    print()

In [None]:
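# Creating a 3x3 list quickly.
# Caution: * repeats a reference to the SAME inner list, so
# changing one row changes all three rows.
myList = [[0, 0, 0]] * 3
print(myList)

In [None]:
# Creating an n x n list quickly (the same caution applies):
n = 5
myList = [[0] * n] * n
print(myList)

In [None]:
# Creating a 3x3 list with a list comprehension.
# Each row is a separate list, so rows can be changed independently.
myList = [[0, 0, 0] for _ in range(3)]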
print(myList)
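The `*` version and the comprehension version print the same thing at first, but they are not the same structure.  The following sketch (the names `aliased` and `independent` are chosen only for this illustration) changes one cell in each and shows the difference.

```python
# Built with *: all three rows are the same list object.
aliased = [[0, 0, 0]] * 3
aliased[0][0] = 99
print(aliased)

# Built with a comprehension: each row is its own list.
independent = [[0, 0, 0] for _ in range(3)]
independent[0][0] = 99
print(independent)

# `is` checks whether two names refer to the very same object.
print(aliased[0] is aliased[1])
print(independent[0] is independent[1])
```

If you plan to modify individual cells of a 2D list, the comprehension form is usually the safer choice.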

In [None]:
# Of course, once that you realize that list
# comprehensions can create 2D lists, you just
# have to use your imagination to come up with
# interesting applications.

myList = [
    [1 if col == row else 0 for col in range(10)]
    for row in range(10)]
print(myList)
print()
for row in myList:
    print(row)

In [None]:
# Rotating characters
message = "Go Mustangs!"

block = [[character for character in message[row:] + message[:row]] for row in range(0, len(message))]
block *= 2
print(block)
print()
for row in block:
    print(''.join(row))

In [8]:
# Of course, the previous code could be written a bit more concisely:
message = "Go Mustangs!"

for row in [[*(message[row:] + message[:row])] for row in range(0, len(message))] * 2:
    print(''.join(row))

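Go Mustangs!
o Mustangs!G
 Mustangs!Go
Mustangs!Go 
ustangs!Go M
stangs!Go Mu
tangs!Go Mus
angs!Go Must
ngs!Go Musta
gs!Go Mustan
s!Go Mustang
!Go Mustangs
Go Mustangs!
o Mustangs!G
 Mustangs!Go
Mustangs!Go 
ustangs!Go M
stangs!Go Mu
tangs!Go Mus
angs!Go Must
ngs!Go Musta
gs!Go Mustan
s!Go Mustang
!Go Mustangs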


In [None]:
# The same output, written even more concisely by joining the rows with newlines:
message = "Go Mustangs!"

print('\n'.join([message[row:] + message[:row] for row in range(0, len(message))] * 2))

In [None]:
# How would you change the code to give you a list of strings?
# (instead of a 2-d list of characters)
message = "Go Mustangs!"



In [None]:
# And lastly, if you want to only print out the last
# block of text and condense the code all down to a single line:



# Tuples

First, a review of what we know about strings and lists.

In [None]:
# l is a list of individual words taken from this string:
l = "Hello, how are you today".split()
print(l)

In [None]:
# We can add (append) to a list
l.append(3)
print(l)

In [None]:
# We can access an individual element of the list
print(l[2])

In [None]:
# We can change an individual element of a list
l[2] = "blablabla"
print(l)

### Compare this behavior to strings

In [None]:
# Let's compare that to how strings operate
message = "hello how are you"
print(message)

In [None]:
# We can access a specific element of a string
print(message[3])

In [None]:
# We can append something to the end of a string
message += "5"
print(message)

In [None]:
# We can't change something in the middle of a string, so the
# following code is not permitted.  If you un-comment it and try
# to run it, you will receive an error (a TypeError).
# message[3] = "X"

### Key takeaway:

**Lists are mutable.** That is, they can be changed.  The values of the items in the list can be altered.  A list can have items added to or removed from it.

**Strings are immutable.**  Once they are created, they cannot be changed.  When we "added to" a string in the preceding example (`message += "5"`), we were not actually modifying the old string.  Rather, we were creating a new string that was equal to the value of the two preceding strings being concatenated together.
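One way to observe the difference is to give the same object two names and then "change" it through one of them.  This is a small sketch; the variable names are made up for the illustration.

```python
# Two names for the same list: a change made through one name
# is visible through the other, because the list itself changed.
firstList = [1, 2]
secondList = firstList
firstList.append(3)
print(secondList)

# Two names for the same string: += builds a NEW string and
# rebinds only firstString.  The original string is untouched.
firstString = "hello"
secondString = firstString
firstString += "5"
print(firstString)
print(secondString)
```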

## Tuples, what they really are...

Tuples are very much like lists, but they are unchangeable (just like strings are unchangeable).  As programmers, we use **lists** to imply that the data in the sequence is expected to change.  Conversely, we use **tuples** to imply that the data in the sequence is expected to **not** be changed.

A **Tuple** implies that the data inside the tuple is interconnected in some way.

In [None]:
# First, notice how a list is declared (using square braces)
xList = [1,2,3]

# Tuples are declared using parentheses
xTuple = (1,2,3)

# Notice the difference when we print them out.
print(xList)
print(xTuple)

In [None]:
# We can append() to a list
xList.append('Hi')
print(xList)

# We cannot append() to a tuple
# xTuple.append('Hi')
print(xTuple)

In [None]:
# We can reassign the value of one element in a list
xList[1] = "changed"
print(xList)

# We cannot reassign any value in a tuple.
# xTuple[1] = "changed"
print(xTuple)

In [None]:
# Notice the difference now when we print them out.
print(xList)
print(xTuple)

In [None]:
# A tuple can be declared explicitly.
(1,1,1,1,1)

In [None]:
# Just like lists, concatenating tuples will result in a new tuple being created.
(1,1) + (2,2)

In [None]:
# Just like lists (and strings), we can multiply them by an integer, which makes them repeat
(1,2) * 5

In [None]:
# Example of multiplying a list.
[0] * 26

In [None]:
# Example of creating a list with a single element.
l = [23]
print(l)

In [None]:
# Example of creating a tuple with a single element (notice the comma that is
# required when the tuple only has a single element).
t = (23, )

print(t)

In [None]:
#############################################################


In [None]:
# In a surprising twist, we can leave the parentheses out, but because we have commas,
# Python will still understand that we are implying a tuple.
t = 23,4,5,.17,"why Python.... why???"
print(t)
print(t[4])

In [None]:
# This also works for a tuple with only one value in it.
t = 23,
print(t)

In [None]:
# Here is a two-dimensional list.  Notice how we access the elements of the deeper dimensions.
tt = [[0, "a"], [0,42]]
print(tt)
print(tt[1])
print(tt[1][1])

In [None]:
# Here is a two-dimensional tuple.
tt = (0, "a"), (0,42)
print(tt)
print(tt[1])
print(tt[1][1])

In [None]:
# Here is a list that contains tuples.
tt = [(0, "a"), (0,42)]
print(tt)
print(tt[1])
print(tt[1][1])

In [None]:
# Here is a list containing tuples and other value types.
tt = [(0, "a"), (0,42), 123456789, "Python is weird!"]
print(tt)
print(tt[1][1])
print(tt[3][:6])


## Tuple Comprehensions

There is no **tuple comprehension** *per se*.  But, you can get the same effect by passing a `for..in` expression (or an existing list) to the `tuple()` constructor.

In [None]:
print([x for x in "Hello"])
print(tuple(x for x in "Hello"))

Some people also wanted a splat-like syntax for tuples, so, since Python 3.5, you can do this:

In [None]:
# Ugly syntax when assigning the tuple to a variable.
t = *(x for x in "Hello"),

print(t)

In [None]:
# Even uglier syntax when using directly as an argument in a function.
print((*(x for x in "Hello"),))

As you can see in the last line of the above example, though, this is messy. :(

Specifically, the format for a tuple comprehension is `*( <for..in expression> ),`.

Notice that it starts with `*(` and ends with `),`.  It's not a true comprehension, but rather a hack (IMO).

When using the splat tuple comprehension as an argument to a function, you must surround it with an additional pair of parentheses.

In short, it might be better to just use the `tuple( <for..in expression> )` syntax, for both readability and neatness.

## Destructuring

Python has a neat trick called **destructuring**.  Destructuring itself is a huge topic, so we will focus on the part that we are most interested in.

In short, **destructuring** is a syntax that allows us to assign variable values based on the *structure* of a list.

An example will make this much easier to understand.

In [None]:
(a, b) = (1, 2)

print(f'a is {a}')
print(f'b is {b}')

As you can see, the tuple `(1, 2)` on the right of the equal assignment is **destructured** into the left hand structure of `(a, b)`, so that the value `1` is assigned to `a` and `2` is assigned to `b`.

To put it another way, Python observes the parallel structure that is given on either side of the assignment operator and uses it to assign each value to its corresponding variable.

In [None]:
# It works for multiple values
(a, b, c) = ("foo", "bar", "baz")

print(a)
print(b)
print(c)

In [None]:
# It even works if you use a list instead of a tuple!!!
(a, b, c) = ["foo", "bar", "baz"]
print(a)
print(b)
print(c)
print()

[d, e, f] = ("foo", "bar", "baz")
print(d)
print(e)
print(f)
print()

[g, h, i] = ["foo", "bar", "baz"]
print(g)
print(h)
print(i)


**Remember** that tuples **don't** need to use parentheses, though!

In [None]:
a, b = 1, 2

print(a)
print(b)
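A common use of this parenthesis-free form is swapping two variables without a temporary variable.  In this short sketch, the right-hand side `b, a` is evaluated into a tuple first, and only then destructured into the names on the left.

```python
a, b = 1, 2
print(a, b)

# The tuple (b, a) is built from the current values,
# then unpacked into a and b.
a, b = b, a
print(a, b)
```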

In [None]:
# Destructuring even works when the tuple (or list) is in a variable.

import math

coordinates = math.pi, math.sin(math.pi)
print(coordinates)
print()

x, y = coordinates
print(f'The coordinates are ({x}, {y})')
print(f'The coordinates are ({x:.3f}, {y:.3f})') # Why?

Do you remember the Fibonacci example?

```
total = int(input("How many Fibonacci numbers should I generate? "))

num1 = 1
num2 = 1
num3 = num1 + num2
counter = 0
while counter < total:
    print(num1, end=" ")
    num1 = num2
    num2 = num3
    num3 = num1 + num2
    counter = counter + 1
```

Re-write the code so that the assignment of `num1`, `num2`, and `num3` all occur on the same line.

In [None]:
# Re-write Fibonacci here:
total = int(input("How many Fibonacci numbers should I generate? "))

(num1, num2, num3) = (1, 1, 2)
for _ in range(total):
    print(num1, end=" ")
    num1, num2, num3 = num2, num3, (num2 + num3)
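As a further variation (not the only possible answer to the exercise), destructuring makes `num3` unnecessary, because the sum can be computed on the right-hand side before anything is reassigned.  This sketch fixes `total` at 10 instead of calling `input()` and collects the values in a list named `generated`; both choices are just for the illustration.

```python
total = 10  # fixed here instead of asking with input()

num1, num2 = 1, 1
generated = []
for _ in range(total):
    generated.append(num1)
    # Both right-hand values are computed from the OLD num1 and num2.
    num1, num2 = num2, num1 + num2

print(*generated)
```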


In [None]:
foo = (1,2)
print(foo)
foo = (3,4)
print(foo)
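# Rebinding foo to a new tuple (above) is fine, but tuples have
# no .append() method, so this line raises an AttributeError.
foo.append(5)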

In [None]:
s = "abc"
print(s)

s += "foo"
print(s)

# Strings are immutable, so this line raises a TypeError.
s[0] = "Z"

In [None]:
a, b = ((0, 1), (2,3))

print(a)
print(b)

a = (10,20)
print(a)

Python has another use of the splat operator.  Consider the following (which produces an error).

In [None]:
a, b = 1, 2, 3

print(a)
print(b)

If we use the splat operator, we can mark one variable as a "catch-all" that collects the rest of the tuple/list.

In [None]:
a, *b = 1, 2, 3

print(a)
print(b)

In [None]:
# This can lead to some interesting (valid) Python notation.
a, *b, c = 1, 2, 3

print(a)
print(b)
print(c)

In [13]:
# Does it make sense why `b` is a list?
a, *b, c = 1, 2, 3, 4

print(a)
print(b)
print(c)

1
[2, 3]
4


In [11]:
# And, in case you were wondering, Python does not allow ambiguity.
a, *b, *c = 1, 2, 3, 4

print(a)
print(b)
print(c)

SyntaxError: multiple starred expressions in assignment (2562304086.py, line 2)

Lastly (for now), we should observe that destructuring works as expected when the right hand side has mixed types.

In [12]:
a, b, *c = 42, [a for a in range(10)], "foo bar baz"

print(a)
print(b)
print(c)

42
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
['foo bar baz']


In [None]:
theList = [*"hello"]
print(theList)

index = 0
while index < len(theList):
    print(theList[index])
    index += 1
print()
    
for item in theList:
    print(item)

## enumerate()

**`enumerate()`** is a function that takes a sequence as its argument and returns an iterator (an `enumerate` object).  For each item in the source sequence, that iterator produces a tuple holding the index and the value of that item.

Because `enumerate()` produces its values lazily, printing the object directly does not show them.  Just like with `range()`, however, we can expand the `enumerate()` object into a list of values by using the splat operator.

In [None]:
help(enumerate)

In [None]:
myList = [0,1,2]
print(enumerate(myList))
print([*enumerate(myList)])
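To make the idea concrete, here is a hand-written sketch that builds the same list of `(index, value)` tuples with an ordinary loop and a counter.  It is only meant to show what `enumerate()` produces, not how Python implements it; the names `pairs` and `index` are chosen for the illustration.

```python
myList = ['a', 'b', 'c']

pairs = []
index = 0
for value in myList:
    pairs.append((index, value))
    index += 1

print(pairs)

# The hand-built list matches what enumerate() produces.
print(pairs == [*enumerate(myList)])
```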

In [10]:
# The point of this code is to walk you through the concept of what the enumerate()
# function does.  That is, once expanded into a list, each element is a tuple
# with two elements.  The first value in the tuple is the index from the original
# list.  The second value is the corresponding value from the original list.

import string
print(string.ascii_lowercase)
print()

lowalpha = [*string.ascii_lowercase]
print(lowalpha)
print()

print([*enumerate(lowalpha)])


abcdefghijklmnopqrstuvwxyz

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), (6, 'g'), (7, 'h'), (8, 'i'), (9, 'j'), (10, 'k'), (11, 'l'), (12, 'm'), (13, 'n'), (14, 'o'), (15, 'p'), (16, 'q'), (17, 'r'), (18, 's'), (19, 't'), (20, 'u'), (21, 'v'), (22, 'w'), (23, 'x'), (24, 'y'), (25, 'z')]


Notice how in the last example (the splatted line) that the resulting list has **tuples** for its elements.

Combine this observation with the fact that we can iterate through generated sequences using the `for..in:` construct, and we get the following:

In [9]:
for pair in enumerate("Go Mustangs!"):
    print(pair)

(0, 'G')
(1, 'o')
(2, ' ')
(3, 'M')
(4, 'u')
(5, 's')
(6, 't')
(7, 'a')
(8, 'n')
(9, 'g')
(10, 's')
(11, '!')


In this example, `pair` is a **tuple**!  That means that we can use **destructuring** on it and end up with something like this:

In [None]:
for index, character in enumerate("Go Mustangs!"):
    print(f"'{character}' is in index position {index}.")

Of course, the same trick applies to lists!

In [None]:
message = 'The quick brown fox jumps over the lazy dog.'

for word in message.split():
    print(word)

print('------------------------')

for i, word in enumerate(message.split()):
    print(i, word)

As you can see, **destructuring** and the **`enumerate()`** function make it easy for us to get at the index of a list/sequence item.

**This is a trick that you want to remember!**

# Dictionaries

https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries

A **dictionary** is a collection object that allows us to store **key-value pairs**.  That is, a **value** can be associated with some identifier, called a **key**.

For comparison, think about how a list operates.  In a list, the "key" is the index (number), starting with 0.  In a dictionary, the key can be almost any value that we want to use, such as a number or a string.


In [None]:
myList = ['a', 'b', 'c']
print(myList[2])

In the case of the previous example, the list `myList` contains 3 items.

`myList[0]` has the value `a`, `myList[1]` has the value `b`, and `myList[2]` has the value `c`.

In this sense, the **keys** are the indexes `0`, `1`, and `2`, and their associated values are `a`, `b`, and `c`.

Lists can only have numeric indexes.  Dictionaries can use **any** hashable (in practice, immutable) value as a key.

**To sum it up:**

**List indexes** are:

1. **Automatic**.  List indexes are assigned automatically when building the list.
2. **Numeric, Zero-based**.  The first index is `0`.
3. **Sequential**.  Index values are numerically sequential.

**Dictionary keys** are:
1. **Manual**.  Dictionary keys must be declared manually.
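2. **Any hashable type**.  Dictionary keys can be any *hashable* (in practice, immutable) value, such as a string, a number, or a tuple.  A list cannot be used as a key.
3. **Non-sequential**.  Keys have no inherent numeric sequence.  In **Python 3.6**, the version of Python that we are using, CPython happens to preserve **insertion order**, but only as an implementation detail that you should not rely on.  As of **Python 3.7**, preserving insertion order is a guaranteed part of the language.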
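One caveat worth seeing in code: a key has to be a *hashable* value, which in practice means an immutable one such as a string, a number, or a tuple.  The sketch below uses a made-up `grid` dictionary and a `try`/`except` block (used only to catch the error) to show a tuple key working and a list key being rejected.

```python
# Tuples are immutable, so they work as dictionary keys.
grid = {(0, 0): "origin", (1, 2): "point"}
print(grid[(1, 2)])

# Lists are mutable, so Python refuses to use one as a key.
rejected = False
try:
    bad = {[0, 0]: "origin"}
except TypeError as error:
    rejected = True
    print("TypeError:", error)
```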

## Creating A Dictionary

Dictionaries can be created in several ways.  Some are more common than others.  Here are a few examples:

In [None]:
# The least-used method:
myDictionary = dict(foo="hey", bar=42, baz=True)

print(myDictionary)
print()

print(myDictionary["foo"])

As you can see, a dictionary can be recognized by its use of curly braces (**{}**).  You can also see the **key** and **value** pairs.  Lastly, you see that we access the individual elements in the dictionary using the familiar square braces (**[]**).

In [None]:
# Create a dictionary from a list of tuples
myDictionary = dict([(1, "foo"), (2, "bar"), (3, "baz")])
print(myDictionary)

In [None]:
# Creating a dictionary *using* the curly braces.
# This way is quite common to use.
myDictionary = {"a": 1, "b": 42, "c": "foo", "6": "good", 6: "evil"}
print(myDictionary)
print()

print(myDictionary["b"])
print(myDictionary["6"])
print(myDictionary[6])

In the above example, notice that keys `'6'` (a string) and `6` (an int) represent different key-value pairs.  It is because the **type** matters!

In [None]:
# Changing a value in the dictionary
print(myDictionary)
print()

print(myDictionary["b"])
print("Change myDictionary['b']")
myDictionary["b"] = "LOOK AT ME!"
print(myDictionary["b"])
print()

print(myDictionary)

In [None]:
nums = {"a" : 0, "b" : 0}
print(nums)
print()

nums["a"] += 1
print(nums)

## The Wizard Of Oz

Let's do something fun.  The code here will download the text of the book **The Wonderful Wizard of Oz** and put it into a variable called `oz`.

In [None]:
oz = loadBook("http://www.gutenberg.org/cache/epub/55/pg55.txt", 'oz.txt', "utf-8-sig")

print('The Wonderful Wizard of Oz is loaded.  It is %d characters long.' % len(oz))
print('-' * 80)
print(oz)

As a warm-up, let's count the number of times that the words `the` and `oz` each appear in the text.

In [None]:
theCount = 0
ozCount = 0

for word in oz.lower().split():
    if word == "the":
        theCount += 1
    if word == "oz":
        ozCount += 1

print(f"`the` occurs {theCount} time(s)")
print(f"`oz` occurs {ozCount} time(s)")


Let's use a **dictionary** to count the number of times that each word appears in the text.

For the key-value pair, the key will be the word and the value will be the number of times that the word appears.

In this example, we are only going to count occurrences of these two specific words.  The **`in`** keyword here tests whether or not the `word` exists as a **key** in the dictionary.

In [None]:
wordCount = {"oz": 0, "the": 0}

for word in oz.lower().split():
    if word in wordCount:
        wordCount[word] += 1

print(wordCount)

Why do we have to test whether or not the key already exists in the dictionary?

Let's see what happens when we forget to test whether or not the key exists.

In [None]:
wordCount = {"oz": 0, "the": 0}

for word in oz.lower().split():
    wordCount[word] += 1

print(wordCount)

As you can see, the **KeyError** indicates that you tried to get the value of a key that doesn't exist in the dictionary.

The code `wordCount[word] += 1` is the problem.  In order to do a `+=`, there must be a value that exists first.

How can we fix this?  By checking to see whether or not a word is in the dictionary and taking the appropriate action.

In [None]:
wordCount = {}

for word in oz.lower().split():
    if word not in wordCount:
        wordCount[word] = 1
    else:
        wordCount[word] += 1

print(wordCount)
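The full book produces far too much output to check by eye.  As a sanity check, here is the same counting pattern run on a short, made-up sentence (the `sample` string is an assumption for the illustration, not part of the book), where the expected counts are easy to verify by hand.

```python
sample = "the cat and the hat and the bat"

wordCount = {}

for word in sample.lower().split():
    if word not in wordCount:
        # First time we see this word: create the key.
        wordCount[word] = 1
    else:
        # The key already exists, so += is safe.
        wordCount[word] += 1

print(wordCount)
```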

## Stupid Dictionary Tricks

* `.keys()` returns a view of the dictionary **keys**.  In Python 3.6 the order is not guaranteed by the language; from Python 3.7 on it is the insertion order.
* `.values()` returns a view of the dictionary **values**, in the same order as `.keys()`.
* `.items()` returns a view of `(key, value)` tuples, one per key-value pair.
* `.setdefault(key, value)` stores `value` under `key` only if `key` is not already in the dictionary, and returns the value that ends up stored there.
* `len()` will give the number of key-value pairs in the dictionary.
* `del` can delete a key/value pair from the dictionary.  E.g., `del myDictionary["a"]`
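Of the items above, `.setdefault()` is the one whose behavior is least obvious from its name.  This small sketch, using a made-up `inventory` dictionary, shows that it leaves an existing key alone and only stores the default when the key is missing.

```python
inventory = {"apples": 3}

# The key exists: nothing changes, and the current value is returned.
first = inventory.setdefault("apples", 0)

# The key is missing: it is stored with the default, which is returned.
second = inventory.setdefault("pears", 0)

print(first, second)
print(inventory)
print(len(inventory))
```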


In [None]:
# Getting at the contents of the dictionary
myDictionary = dict([(0,'a'), (1,'b'), (2,'c'), (3,'d'), (4,'e'), (5,'f'), (6,'g'), (7,'h'), (8,'i'), (9,'j')])
print(myDictionary)
print()

print(".keys()")
print(myDictionary.keys())
print()

print(".values()")
print(myDictionary.values())
print()

print(".items()")
print(myDictionary.items())

Don't let the names `dict_keys()`, `dict_values()`, or `dict_items()` throw you off.  They are iterable *view* objects that behave much like sequences: you can loop over them or convert them to a list, although you cannot index them directly.

For example:

In [None]:
myDictionary = dict([(0,'a'), (1,'b'), (2,'c'), (3,'d'), (4,'e'), (5,'f'), (6,'g'), (7,'h'), (8,'i'), (9,'j')])
print(myDictionary)
print()

for key, value in myDictionary.items():
    print(f"key `{key}` has value `{value}`")
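One detail that can be surprising: the object returned by `.keys()` (and likewise `.values()` and `.items()`) stays connected to the dictionary.  The sketch below uses a made-up `ages` dictionary to show that a key added later appears in a view that was created earlier, while a list made from the view is a frozen copy.

```python
ages = {"ann": 31, "bob": 27}

keys = ages.keys()      # a view connected to the dictionary
snapshot = list(keys)   # an independent copy of the current keys

ages["cy"] = 45

print(list(keys))   # the view reflects the new key
print(snapshot)     # the copy does not
```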

In [None]:
wordCount = {}

for word in oz.lower().split():
    if word not in wordCount:
        wordCount[word] = 0
    wordCount[word] += 1

#del wordCount["the"]
#del wordCount["and"]
lookFor = max(wordCount.values())
for word in wordCount.keys():
    if (wordCount[word] == lookFor):
        print(word)

In [None]:
# Simple code to count the number of occurrences of a character in a string.
characters = {}
for ch in oz:
    if ch not in characters:
        characters[ch] = 0
    characters[ch] += 1
print(characters)
print(characters["A"] + characters['a'])

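# Print out the count of all the letters in the alphabet.
# Both print() lines below produce the same text; the second uses
# an f-string.  .get(ch, 0) returns 0 for a letter that never
# appears, where characters[ch] would raise a KeyError.
import string
for ch in string.ascii_letters:
    print(ch + " : " + str(characters.get(ch, 0)))
    print(f"{ch} : {characters.get(ch, 0)}")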

In [None]:
# Example using .setdefault()
wordCount = {}

for word in oz.lower().split():
    wordCount.setdefault(word, 0)
    wordCount[word] += 1

print(wordCount)

In [None]:
# Deleting values from the dictionary by their key
myDictionary = dict([(0,'a'), (1,'b'), (2,'c'), (3,'d'), (4,'e'), (5,'f'), (6,'g'), (7,'h'), (8,'i'), (9,'j')])
print(myDictionary)
print()

print("Deleting keys 1, 5, and 8.")
print()

del myDictionary[1]
del myDictionary[5]
del myDictionary[8]

print(myDictionary)

In [None]:
# Of course, if the key is not in the dictionary, then trying to
# delete it will result in a KeyError.
myDictionary = dict([(0,'a'), (1,'b'), (2,'c'), (3,'d'), (4,'e'), (5,'f'), (6,'g'), (7,'h'), (8,'i'), (9,'j')])
print(myDictionary)
print()

print("Deleting keys 1, 5, and 42.")
print()

del myDictionary[1]
del myDictionary[5]
del myDictionary[42]

print(myDictionary)

In [None]:
# Deleting keys safely: check that each key exists before deleting it.
myDictionary = dict([(0,'a'), (1,'b'), (2,'c'), (3,'d'), (4,'e'), (5,'f'), (6,'g'), (7,'h'), (8,'i'), (9,'j')])
print(myDictionary)
print()

print("Deleting keys 1, 5, and 42.")
print()

if 1 in myDictionary:
    del myDictionary[1]
if 5 in myDictionary:
    del myDictionary[5]
if 42 in myDictionary:
    del myDictionary[42]

print(myDictionary)

In [None]:
# Another way to write it:
myDictionary = dict([(0,'a'), (1,'b'), (2,'c'), (3,'d'), (4,'e'), (5,'f'), (6,'g'), (7,'h'), (8,'i'), (9,'j')])
print(myDictionary)
print()

print("Deleting keys 1, 5, and 42.")
print()

for num in [1, 5, 42]:
    if num in myDictionary:
        del myDictionary[num]

print(myDictionary)

## Dictionaries From Comprehensions

You can be very imaginative with this.

The pure comprehension uses the pattern: `{key:value for <> in <>}`

In [None]:
# From a dictionary comprehension
d = {x: x**2 for x in range(10)}
print(d)

In [None]:
# same result as dictionary comprehension above
e = dict([(x, x**2) for x in range(10)])
print(e)

In [None]:
# From a dictionary comprehension with a filter (even numbers only)
d = {x: x**2 for x in range(10) if x % 2 == 0}
print(d)
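Dictionary comprehensions combine naturally with destructuring and `enumerate()`.  As a small sketch (the name `positions` is made up for the illustration), this builds a dictionary that maps each letter of a short string to its index.

```python
# enumerate() yields (index, letter) tuples, which are destructured
# in the for clause and used as value and key respectively.
positions = {letter: index for index, letter in enumerate("abc")}
print(positions)
print(positions["c"])
```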

## Frankenstein Exercises

Quick side note: observe the difference at the beginning of the string when using different encodings.

In [None]:
loadBook("http://www.gutenberg.org/files/84/84-0.txt", 'frank-without-sig.txt', "utf-8")[:250]

In [None]:
loadBook("http://www.gutenberg.org/files/84/84-0.txt", 'frank.txt', "utf-8-sig")[:250]

As you can see, decoding the file with the `utf-8` codec leaves a [byte order mark](https://en.wikipedia.org/wiki/Byte_order_mark) (`\ufeff`) at the very start of the string, which we don't want.  The `utf-8-sig` codec strips it while decoding.

This brings us to an important question: **How do you know which codec to use when doing stuff on your own?**

The simple answer is this: you look at the data yourself, and make an educated guess.  You now know that if you see a string starting with `\ufeff`, then that is a byte order mark, and you need to remove it by using the correct encoding.

In other words, there is no hard-and-fast rule.  Your job is to be aware of the problems of different data sources so that you know what to look for (search online for) to get your code working correctly.
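The effect of the two codecs can be reproduced without downloading anything.  In this sketch, a short made-up string is encoded with `utf-8-sig` (which writes a byte order mark in front of the text) and then decoded both ways; the variable names are chosen only for the illustration.

```python
# Encoding with "utf-8-sig" puts a byte order mark before the text.
raw = "Frankenstein".encode("utf-8-sig")
print(raw[:3])

withMark = raw.decode("utf-8")         # keeps the mark as '\ufeff'
withoutMark = raw.decode("utf-8-sig")  # strips the mark

print(repr(withMark[:5]))
print(repr(withoutMark[:5]))
```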

In [None]:
frank = loadBook("http://www.gutenberg.org/files/84/84-0.txt", 'frank.txt', "utf-8-sig")

print(frank)

In [None]:
# Get the most common word.
import string
'''
slimFrank = ''

for ch in frank.lower():
    if ch in (string.ascii_lowercase + ' \n'):
        slimFrank += ch
        
print(slimFrank)
'''

slimFrank = ''.join([ch for ch in frank.lower() if ch in (string.ascii_lowercase + ' \n')])

counts = {}

for word in slimFrank.split():
    counts.setdefault(word, 0)
    counts[word] += 1
    
most = max(counts.values())

print(most)

for key, value in counts.items():
    if value == most:
        print(key)
        
print([key for key,value in counts.items() if value == most])

### Q1: What is the longest word(s) in the text?

In [None]:
import string

slimFrank = ''.join([ch for ch in frank.lower() if ch in (string.ascii_lowercase + ' \n')])

wordLengths = [len(word) for word in slimFrank.split()]

maxLen = max(wordLengths)

for word in slimFrank.split():
    if len(word) == maxLen:
        print(word)


In [None]:
import string

slimFrank = ''.join([ch for ch in frank.lower() if ch in (string.ascii_lowercase + ' \n')])

wordLengths = [len(word) for word in slimFrank.split()]

maxLen = max(wordLengths)

displayed = {}

for word in slimFrank.split():
    if len(word) == maxLen:
        if word not in displayed:
            displayed[word] = True
            print(word)


In [None]:
import string

slimFrank = ''.join([ch for ch in frank.lower() if ch in (string.ascii_lowercase + ' \n')])

wordLen = {word:len(word) for word in slimFrank.split()}

maxLen = max(wordLen.values())

[word for word, length in wordLen.items() if length == maxLen]


### Q2: What are the 20 most commonly used words in the text?

In [None]:
slimFrank = ''.join([ch for ch in frank.lower() if ch in (string.ascii_lowercase + ' \n')])

counts = {}

for word in slimFrank.split():
    counts.setdefault(word, 0)
    counts[word] += 1

most = max(counts.values())

print(most)

mostFrequent = []

for length in range(most, 0, -1):
    mostFrequent += [word for word in counts.keys() if counts[word] == length]

print(mostFrequent[:20])

In [None]:
slimFrank = ''.join([ch for ch in frank.lower() if ch in (string.ascii_lowercase + ' \n')])

counts = {}

for word in slimFrank.split():
    counts.setdefault(word, 0)
    counts[word] += 1
 
reverseDictionary = {}

for word in counts.keys():
    key = counts[word]
    reverseDictionary.setdefault(key, [])
    if word not in reverseDictionary[key]:
        reverseDictionary[key].append(word)
    

most = max(reverseDictionary.keys())

print(most)

mostFrequent = []
for length in range(most, 0, -1):
    if length in reverseDictionary:
        mostFrequent += reverseDictionary[length]

print(mostFrequent[:20])


### Q3: What are the 20 most commonly used words in the text, ignoring the 100 most commonly used words in the English language?

https://en.wikipedia.org/wiki/Most_common_words_in_English

### Q4: The text has a lot of boilerplate that might be skewing the data.  How can you remove the boilerplate text?

**Hint:** look up the string [.splitlines()](https://docs.python.org/3.6/library/stdtypes.html#str.splitlines) method.

# Old Information, New Perspective

* string `.splitlines()`
* reading a file with `.readlines()`
* reading a file using `with..as`
* function argument naming and default values

## `.splitlines()`

Strings have a method called `.splitlines()` which will convert the string into a **list** of strings, using the **newline** characters in the original text as the split points.

The following two examples show the difference between `.split()` and `.splitlines()`.

**Note:** We could not cover this information earlier, when we were learning about strings, because we did not know about lists yet.

In [None]:
oz = loadBook("http://www.gutenberg.org/cache/epub/55/pg55.txt", 'oz.txt', "utf-8-sig")

print(oz.split())

In [None]:
print(oz.splitlines())
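With a whole book the difference is hard to see in the output, so here is the same comparison on a short, made-up string (the `text` value is an assumption for the illustration).  Note how `.splitlines()` keeps each line intact, including the empty one.

```python
text = "first line\nsecond line\n\nfourth line"

words = text.split()        # splits on any whitespace
lines = text.splitlines()   # splits only at line boundaries

print(words)
print(lines)
```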

## `.readlines()`

Similar to the string `.splitlines()` method, there is a corresponding method for dealing with text from files.

Before, we saw that the `.read()` method would put the contents of a file into a string.  Now we will look at the **`.readlines()`** method, which will return the contents of a file as a **list** of strings, using the **newline** characters in the original text as the split points.

In [None]:
# Create a file that we can read from.
outputFile = open('out.txt', 'w')

print('Testing', file=outputFile)
print(file=outputFile)

for i in range(20):
    print('*' * i, file=outputFile)
    
outputFile.close()

In [None]:
# Read the file contents
inputFile = open('out.txt', 'r')

contents = inputFile.readlines()
inputFile.close()

print(f'{"File Contents":-^30}')
print(contents)
print(f'{"End File":-^30}')


Notice that the **newline** at the end of each string still exists.  In the next example, we will print out each line, but because each line already has a newline, we will use the `end=''` argument to avoid double-spacing.

In [None]:
# Alternatively, you can loop through the list:
inputFile = open('out.txt', 'r')

contents = inputFile.readlines()
inputFile.close()

print(f'{"File Contents":-^30}')
for line in contents:
    print(line, end='')
print(f'{"End File":-^30}')


In [None]:
# Lastly, here is an example using `enumerate()`.
# Also notice a different tactic for handling the
# newline at the end of each line.
inputFile = open('out.txt', 'r')

print(f'{"File Contents":-^30}')

for number, line in enumerate(inputFile.readlines()):
    print(f"{number:03}: {line[:-1]}")

print(f'{"End File":-^30}')

inputFile.close()
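The `line[:-1]` slice used above removes the last character whatever it is, which is fine as long as every line ends with a newline.  As a hedged alternative, the string method `.rstrip("\n")` removes only trailing newline characters.  The sketch below uses a made-up list in the shape that `.readlines()` returns, where the final line has no newline.

```python
# The shape of a .readlines() result; the last line lacks a newline.
contents = ["alpha\n", "beta\n", "gamma"]

sliced = [line[:-1] for line in contents]
stripped = [line.rstrip("\n") for line in contents]

print(sliced)     # the slice also removed the final 'a' of 'gamma'
print(stripped)   # rstrip removed only the newlines
```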


## The `with..as:` Block

As you have seen, Python provides many advanced syntax shortcuts, the goal of which is to minimize mistakes on the part of the programmer.  One of these common mistakes is to open a file, and then forget to close it.

Python provides a syntax that helps with this oversight: `with..as:`

The format is: `with <expression> as <variable>:`, followed by a code block.  Inside the code block, `<variable>` refers to the object produced by `<expression>` (for `open()`, that is the file object).

How does this help us with files?  Well, the `with..as:` construct will **automatically close the file for you** when its code block is exited, even if an error occurs inside the block.

Consider the next two, equivalent cells:

In [None]:
# Create a file that we can read from.
outputFile = open('out.txt', 'w')

print('Testing', file=outputFile)
print(file=outputFile)

for i in range(20):
    print('*' * i, file=outputFile)
    
outputFile.close()

In [None]:
# Create a file that we can read from.

with open('out.txt', 'w') as outputFile:
    print('Testing', file=outputFile)
    print(file=outputFile)

    for i in range(20):
        print('*' * i, file=outputFile)


The two preceding cells are **equivalent**, but the second is cleaner.

There are more [sophisticated](https://docs.python.org/3.6/reference/compound_stmts.html#the-with-statement) uses for this syntax, but file operations are the most commonly encountered.
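File objects have a `.closed` attribute, which makes the automatic closing easy to observe.  This sketch writes into a temporary folder (via the standard `tempfile` module, an addition for the illustration so that no file is left behind) and records `.closed` inside and after the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'out.txt')

    with open(path, 'w') as outputFile:
        print('Testing', file=outputFile)
        inside = outputFile.closed   # still open inside the block

    after = outputFile.closed        # closed as soon as the block ends

print(inside)
print(after)
```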

## Functions: Named Arguments And Default Values

Now that you have a bit more experience with functions, let's see how Python can make them easier to use.

First, we will look at **Named Arguments**.

**Named Arguments** is the ability to change the calling order of the arguments of a function by providing the expected **variable name** of that argument.

As always, an example will explain this idea most effectively.

In [None]:
def printThreeThings(a, b, c):
    print(a, b, c)

printThreeThings('foo', 'bar', 'baz')

This example doesn't surprise us in its output.  `'foo'` is assigned to the variable `a`, etc.

Now look at this example of calling the same function, but **naming** the arguments.

In [None]:
printThreeThings(c='foo', b='bar', a='baz')

Here, you can see that we can supply the arguments in any order, so long as we **name** which argument each value is intended for.

In [None]:
# This example shows that the named arguments cannot
# precede non-named (or, "positional") arguments.
# It is a syntax error.
printThreeThings(c='foo', 'bar', 'baz')

In [None]:
# This example shows that named arguments cannot
# replace the value of the positional arguments.
# It is a runtime error (Python raises a TypeError).
printThreeThings('foo', 'bar', b='baz')
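
The two failures behave differently.  The first is rejected before any code runs, while the second only fails when the call is made, so it can be caught with `try`/`except`.  A small sketch, redefining `printThreeThings` so that the block stands on its own (the variable `errorMessage` is just for illustration):

```python
def printThreeThings(a, b, c):
    print(a, b, c)

try:
    # 'bar' already fills b by position, so b='baz' is a second value for b.
    printThreeThings('foo', 'bar', b='baz')
except TypeError as error:
    errorMessage = str(error)

print(errorMessage)
```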

It may seem as though **named arguments** aren't very useful, given the limitations observed above.  But they can be combined with the next concept, **default argument values**, to provide sophisticated behavior.  Let's cover the default values by themselves first, then we will combine the two.

When declaring a function, you can provide a **default value** for an argument in order to simplify the code for the majority of use cases.

Consider the following:

In [None]:
def doubleIt(number=42):
    return number * 2

print(doubleIt(10))
print(doubleIt(20))
print()
print(doubleIt())

In this example, `number` is given the default value **only if** an argument is not provided by the caller.

A function may contain multiple arguments with a declared default value, but they must always appear later in the argument list than those without a default value.

In [None]:
def addUp(a, b=0):
    return a + b

print(addUp(4,5))
print(addUp(42))


In [None]:
# This cell will not compile (syntax error).
def addUp(a=0, b):
    return a + b

print(addUp(4,5))
print(addUp(42))


**Combining Named Arguments with Default Values!**

Named arguments allow us to provide arguments in any order.  Default values fill in any argument that the caller does not specify.

Look at the resulting behavior:

In [None]:
def printThreeThings(a='foo', b='bar', c='baz'):
    print(a, b, c)
    
    
print("A:", end="")
printThreeThings()
print()

print("B:", end="")
printThreeThings(c="HOTDOGS!")
print()

print("C:", end="")
printThreeThings("Klondike", c="is a good snack!")

As you can see, we can either **(A)** provide no arguments, **(B)** provide only named arguments, or **(C)** provide some positional and some named arguments.

In all 3 examples, though, we left out some or all of the arguments.

**Do named arguments look familiar?**  They should!

You have been using named arguments with the `print()` function since the beginning, with the arguments `end=`, `sep=`, and `file=` (for writing to files).  If you look at the `help()` information for the `print()` function, you will see that it shows all of these arguments, as well as their default values.

In [None]:
help(print)
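
To see those defaults being overridden, here is a minimal sketch.  It uses `io.StringIO` as a stand-in for a file so that the exact characters written can be inspected; the names `buffer` and `written` are illustrative.

```python
import io

# io.StringIO behaves like a file that lives in memory.
buffer = io.StringIO()

print('a', 'b', 'c', file=buffer)                      # defaults: sep=' ', end='\n'
print('a', 'b', 'c', sep='-', end='!\n', file=buffer)  # named arguments override them

written = buffer.getvalue()
print(written, end='')
```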

# Lambda Expressions

Before talking about Lambda expressions, let's think about **functions**.

Some functions are **stand-alone**.  Some functions are attached to a variable, and we call these types of functions **methods**.

Fundamentally, though, a function represents some computational unit.  Just like a variable, a function can be re-assigned.

In [None]:
# First declaration of foo()
def foo():
    print("bar")
    
print(foo)
foo()

print()

# Second declaration of foo()
def foo():
    print("baz")
    
print(foo)
foo()

As you can see, the name `foo` is a **variable**, and the **value** that it contains is a **function**!

The value that `foo` represents can be changed.  (As a reminder, we talked about this behavior earlier, and warned you that it is possible to re-define the built-in functions, which would cause you to have to restart your kernel in order to fix it!)

We can even assign the "function value" to another variable name!

In [None]:
differentName = foo

differentName()

**That's amazing!**

What does all of this have to do with **Lambda Expressions**?

Well, **Lambda expressions** are a way of writing **short functions that don't have a (variable) name**.  Just like the value `3` can exist without being assigned to a variable, so can functions exist without being given a name.

**Why are they called "Lambda expressions"?**  Mathematical history.  Lambda expressions are a part of "Lambda Calculus", where "Lambda" is a Greek letter (λ) representing arbitrary mathematical functions that could be defined.  **Alonzo Church** published about it in the 1930s, and we're stuck with the name!

Of course, you may want to know **why** we care about lambda expressions.  The answer is that, similar to Comprehensions, lambda expressions allow us to express small computational ideas quickly and elegantly in Python.

## Structure Of A Function

Think about what is needed to define a function:

In [None]:
def doubleIt(num):
    return num * 2

We see:

* The `def` keyword, letting Python know that we are defining a function.
* `doubleIt`, the name of the function.
* `(num)`, the argument list.
* `:`, indicating that the following will be the code block
* `return num * 2`, which is the code block with a `return` value.

We have already said that lambda expressions don't have a name, so let's see the same function expressed as a lambda expression.


In [None]:
lambda num: num * 2

Notice the differences:

* There is the keyword `lambda` instead of `def`.
* There is no function name.
* There are no parentheses around the argument list.
* There is no code block, but rather just a single expression.
* There is no `return` statement.  The body of a lambda expression must be a **single expression**, and whatever that expression evaluates to is the value that is returned.  Then again, we have already seen that Python can do **a lot** in only 1 line of code!

Of course, we can save this lambda expression to a variable and use it just like a function.

In [None]:
doubleIt = lambda num: num * 2

print(doubleIt(42))

In [None]:
# Example of a lambda expression with two arguments
addUp = lambda a,b: a + b

print(addUp(3,5))
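
Because a lambda contains an expression rather than a code block, statements such as an `if`/`else` block or an assignment are not allowed inside it.  A conditional expression still works, since it is an expression.  A minimal sketch (the name `describe` is only for illustration):

```python
# "X if condition else Y" is an expression, so it may appear in a lambda.
describe = lambda num: 'even' if num % 2 == 0 else 'odd'

print(describe(42))
print(describe(7))
```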

The most eloquent description in the world won't be enough to convince you of the utility of lambda expressions.  For that, we need to see examples of **where** we can use lambda expressions, and to do that, we need to learn more Python!

## `.sort()` And `sorted()`

You have already seen that `.sort()` and `sorted()` both serve to sort a sequence.  `.sort()` only exists for lists (not strings), and alters the associated list so that its items are rearranged.  `sorted()` accepts any sequence and returns a new **list** in which the items are rearranged.

By default, these work using ASCII (or numerical) order.  Both functions, however, expose a `key=` argument so that you can customize what is being used to perform the sort.

The `key=` argument expects a **function** as the value.  Compare the following:

In [None]:
message = "The quick brown fox jumps over the lazy dog"

print(sorted(message.split()))

Notice that the word `'brown'` is *after* the word `'The'`, because of ASCII ordering (uppercase letters come before lowercase letters).

We can change this.  We can change what the sort function sees when it compares the items during the sorting process.

In [None]:
def changeToUppercase(word):
    return word.upper()

message = "The quick brown fox jumps over the lazy dog"

print(sorted(message.split(), key=changeToUppercase))

It worked!

What is happening is that every value is passed through the `key=` function before Python decides which string should go first.  Instead of comparing `'The'` and `'brown'`, it compares `'THE'` and `'BROWN'`.

Also, notice that `'The'` and `'the'` are next to one another.  The `sorted()` function sees them both as `'THE'`, so neither outranks the other, and they keep the order in which they appeared in the original list: `'The'` first, then `'the'`.
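
Python's sort is documented as **stable**: items whose keys compare equal stay in the order they had in the input.  A quick check of that behavior, using the same sentence (`upperSorted` is just an illustrative name):

```python
message = "The quick brown fox jumps over the lazy dog"

upperSorted = sorted(message.split(), key=lambda word: word.upper())
print(upperSorted)

# 'The' comes before 'the' in the sentence, so it also comes first here.
print(upperSorted.index('The') < upperSorted.index('the'))
```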

Let's see what this looks like as a **lambda expression**, using the next two cells.

In [None]:
# First, convert the function to a lambda expression.

changeToUppercase = lambda word: word.upper()

message = "The quick brown fox jumps over the lazy dog"

print(sorted(message.split(), key=changeToUppercase))

In [None]:
# Second, replace the variable `changeToUppercase` with the lambda expression.

message = "The quick brown fox jumps over the lazy dog"

print(sorted(message.split(), key=lambda word: word.upper()))

We can use pre-defined functions, too.  Here is an example using the **`len()`** function so that we can sort the words by their length.

In [None]:
message = "The quick brown fox jumps over the lazy dog"

print(sorted(message.split(), key=len))

sortedWordsByLen = sorted(message.split(), key=len)
longestWord = sortedWordsByLen[-1]
longestWordLength = len(longestWord)

print(f"The longest word has {longestWordLength} characters.")

print(f"The longest word has {len(longestWord)} characters.")

print(f"The longest word has {len(sortedWordsByLen[-1])} characters.")

print(f"The longest word has {len(sorted(message.split(), key=len)[-1])} characters.")



Notice that this example doesn't need a lambda expression at all, because `len()` already exists by default!

**`.sort()`** works the same way with respect to the `key=` argument.  Remember that `.sort()` is a method on a list, and it doesn't return anything, but rather modifies the list in-place.

In [None]:
# Example with .sort()

message = "The quick brown fox jumps over the lazy dog"

messageList = message.split()

print(messageList)
print()

messageList.sort(key=lambda word: word.upper())

print(messageList)

Both `sorted()` and `.sort()` have additional arguments that you should explore, including an option to reverse the sort order.
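
As a starting point for that exploration, here is a small sketch of the `reverse=` argument combined with `key=`.  The variable name `longestFirst` is illustrative.

```python
message = "The quick brown fox jumps over the lazy dog"

# Longest words first.  Words of equal length keep their original order.
longestFirst = sorted(message.split(), key=len, reverse=True)
print(longestFirst)

# .sort() accepts the same arguments, but changes the list in place.
messageList = message.split()
messageList.sort(key=len, reverse=True)
print(messageList == longestFirst)
```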

## `filter()`

`filter()` is a function that will filter a sequence according to some test.  The test is provided in the form of a **function** that takes an item from the list and returns either a `True` or `False` value, which decides whether or not the item will be included in the filtered version of the list.  Of course, the only items which will be included in the final list are those items that receive a `True` response from the filter function.

The form is `filter(<function>, <sequence>)`.

As usual, an example is most helpful.  Let's filter a list of words so that only those of length 4 or longer are allowed.

In [None]:
message = "The quick brown fox jumps over the lazy dog"

def wordIsLongEnough(word):
    return len(word) >= 4

print(filter(wordIsLongEnough, message.split()))

As you can see, we need to convert this "filter object" to an actual list if we want to print it out.  Or, we could use it as a sequence in some other context.

In [None]:
message = "The quick brown fox jumps over the lazy dog"

def wordIsLongEnough(word):
    return len(word) >= 4

print([*filter(wordIsLongEnough, message.split())])

In [None]:
message = "The quick brown fox jumps over the lazy dog"

def wordIsLongEnough(word):
    return len(word) >= 4

for word in filter(wordIsLongEnough, message.split()):
    print(word)

And now, let's write this same code using Lambda expressions.

In [None]:
message = "The quick brown fox jumps over the lazy dog"

print([*filter(lambda word: len(word) >= 4, message.split())])

In [None]:
message = "The quick brown fox jumps over the lazy dog"

for word in filter(lambda word: len(word) >= 4, message.split()):
    print(word)
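
One detail to keep in mind: a filter object hands out its items only once.  After it has been looped over or splatted, it is empty.  A short sketch (the names `longWords`, `firstPass`, and `secondPass` are illustrative), which also shows `list()` as an alternative to the splat:

```python
message = "The quick brown fox jumps over the lazy dog"

longWords = filter(lambda word: len(word) >= 4, message.split())

firstPass = list(longWords)    # consumes the filter object
secondPass = list(longWords)   # nothing is left to hand out

print(firstPass)
print(secondPass)
```

If the filtered items are needed more than once, store the list rather than the filter object.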

### Challenge

Can you write code that will filter out all words that do not contain an "o"?

In [None]:
message = "The quick brown fox jumps over the lazy dog"

for word in filter(lambda word: 'o' in word, message.split()):
    print(word)


## `map()`

`map()` is a sister function to `filter()`, and they are often taught together.

`map()` will iterate through the items of a list and apply some function (that you provide) to each item individually.  The results from that function will then become a new sequence.

Similar to `filter()`, the syntax of `map()` is like this: `map(<function>, <sequence>)`

In [None]:
message = "The quick brown fox jumps over the lazy dog"

def changeToUppercase(word):
    return word.upper()

print(map(changeToUppercase, message.split()))

As with its sibling, `map()` also returns a "map object" that can either be converted to a list, or used as a sequence elsewhere.

In [None]:
message = "The quick brown fox jumps over the lazy dog"

def changeToUppercase(word):
    return word.upper()

print([*map(changeToUppercase, message.split())])

In [None]:
message = "The quick brown fox jumps over the lazy dog"

def changeToUppercase(word):
    return word.upper()

for word in map(changeToUppercase, message.split()):
    print(word)

And, of course, we can use lambda expressions here as well.

In [None]:
message = "The quick brown fox jumps over the lazy dog"

print([*map(lambda word: word.upper(), message.split())])

In [None]:
message = "The quick brown fox jumps over the lazy dog"

for word in map(lambda word: word.upper(), message.split()):
    print(word)

Yes, you can use built-in functions, too:

In [None]:
# This will print out the length of each word.

message = "The quick brown fox jumps over the lazy dog"

print([*map(len, message.split())])
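
Methods can be passed in the same way by naming them through their type.  For example, `str.upper` can stand in for a lambda that calls `.upper()` on each word.  A small sketch (`shouted` is an illustrative name):

```python
message = "The quick brown fox jumps over the lazy dog"

# str.upper(word) is the same call as word.upper().
shouted = [*map(str.upper, message.split())]
print(shouted)
```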

### Challenge

Can you print the average word length for the words in this string?

In [None]:
message = "The quick brown fox jumps over the lazy dog"

wordLengths = [*map(len, message.split())]

print(wordLengths)

print(sum(wordLengths) / len(wordLengths))

## Perspective Thoughts On `map()` And `filter()`

`map()` and `filter()` are not complicated ideas.  In fact, you have been doing them for a while now, just in another form.  "Where?", you may ask.  **Comprehensions.**

Consider the following comprehension:

In [None]:
# Create a list of tuples, where the tuple is a word and its length.
# Only allow words with a length greater than 3.
message = "The quick brown fox jumps over the lazy dog."

print([(word, len(word)) for word in message.split() if len(word) > 3])

This comprehension is first performing a **filter** behavior, followed by a **mapping** behavior.  It could be re-written as this:

In [None]:
# Create a list of tuples, where the tuple is a word and its length.
# Only allow words with a length greater than 3.
message = "The quick brown fox jumps over the lazy dog."

print([*map(lambda word: (word, len(word)), filter(lambda word: len(word) > 3, message.split()))])

Admittedly, it might be a bit harder to read, but the essential parts are there!
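
To confirm that the two forms really do produce the same result, they can be compared directly.  This sketch stores each version in a variable (the names `fromComprehension` and `fromMapFilter` are illustrative) and splits the long line for readability:

```python
message = "The quick brown fox jumps over the lazy dog."

fromComprehension = [(word, len(word)) for word in message.split() if len(word) > 3]

fromMapFilter = [*map(lambda word: (word, len(word)),
                      filter(lambda word: len(word) > 3, message.split()))]

print(fromComprehension)
print(fromComprehension == fromMapFilter)
```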

**Why would you use one approach over the other?**

It's all about expressing your ideas.  Sometimes it's easier (or more readable) to express your idea using a list comprehension.  Sometimes it's easier to use either `filter()`, or `map()`, or both!  Python provides the flexibility and the syntax.  You, as the programmer, provide the choice and the reasoning.

In [None]:
import string

frank = loadBook("http://www.gutenberg.org/files/84/84-0.txt", 'frank.txt', "utf-8-sig")

slimFrank = ''.join([ch for ch in frank.lower() if ch in (string.ascii_lowercase + ' \n')])

wordCount = {}

for word in slimFrank.split():
    wordCount.setdefault(word, 0)
    wordCount[word] += 1
 
sortedWords = sorted(wordCount.items(), key=lambda pair: pair[1], reverse=True)

print(sortedWords[:20])
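
The counting loop in the cell above is common enough that the standard library offers `collections.Counter` for it.  The sketch below swaps that in for the `setdefault()` loop.  It runs on a short made-up string rather than the Frankenstein text, so it does not need `loadBook()`; `sampleText` is purely illustrative.

```python
from collections import Counter

# A made-up sentence standing in for the book text.
sampleText = "the cat and the dog and the bird"

wordCount = Counter(sampleText.split())

# most_common(n) returns the n highest counts as (word, count) tuples.
topTwo = wordCount.most_common(2)
print(topTwo)
```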


# Zipping And Unzipping Lists

What is "zipping"?  It's a way to combine two different lists into a single list, pairing up their elements side-by-side in tuples.

https://docs.python.org/3.6/library/functions.html?highlight=sorted#zip

First, consider these two lists:

In [None]:
a = [0, 1, 2, 3, 4, 5]
b = ['a', 'b', 'c', 'd', 'e', 'f']

print(a)
print(b)

Now, think of how you could combine these two lists so that their structure is like this:

```
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]
```

You could do it clumsily using loops:

In [None]:
combined = []

for i in range(len(a)):
    combined.append((a[i], b[i]))
    
print(combined)

That would work, but building a list one `.append()` at a time is wordy, and it is typically **slower** than a comprehension.  What if you wanted to use a **comprehension** instead?

In [None]:
print([(a[i], b[i]) for i in range(len(a))])

That works, too, but it's a bit ugly.  And what if your lists are different lengths?

The `zip()` function takes care of all of these problems in a beautifully short syntax:

In [None]:
print(zip(a,b))

Wait... of course, this is Python, and **a lot** of sequences are an object of some sort.  We can splat it into a list, though, if we want to print it out:

In [None]:
print([*zip(a,b)])

And, as usual, we don't have to splat it if we are just using it as a sequence:

In [None]:
print(a)
print(b)
print()

for number, letter in zip(a,b):
    print(f"{number} is in a tuple with '{letter}'.")

What if they have different numbers of elements?

In [None]:
a = [0, 1, 2, 3, 4, 5]
b = ['a', 'b', 'c', 'd', 'e', 'f']
c = "foo bar baz".split()

print([*zip(a,c)])
print([*zip(c,b)])

OK, so it truncates to the length of the shorter list.
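
If truncating is not what you want, the standard library's `itertools.zip_longest()` pads the shorter sequence instead.  A minimal sketch; the fill value `'?'` and the name `padded` are arbitrary choices for illustration:

```python
from itertools import zip_longest

a = [0, 1, 2, 3, 4, 5]
c = "foo bar baz".split()

# Missing values are filled with fillvalue (None if it is not given).
padded = [*zip_longest(a, c, fillvalue='?')]
print(padded)
```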

Can you zip more than two lists?

In [None]:
a = [0, 1, 2, 3, 4, 5]
b = ['a', 'b', 'c', 'd', 'e', 'f']
c = "foo bar baz".split()

print([*zip(a,b,c)])


Indeed you can!

### Challenge

Imagine that you are trying to analyze text.  For whatever reason, you need to loop through the text and examine each word, but you also need to know what the word before it is.

You realize that you could do this with a clever `zip()`, so that the end result is:

```
[('', 'The'), ('The', 'quick'), ('quick', 'brown'), ... ('the', 'lazy'), ('lazy', 'dog.')]
```

Can you write this code?

(Yes, I realize that the `message` variable has the sentence in there twice.)

In [None]:
message = "The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog."

print([*zip([''] + message.split(), message.split())])

## The Elusive Unzip

Logically speaking, **unzipping** a list should be the opposite of **zipping**.  That is, to unzip a list, you should start with a list of tuples, and end up with two lists, each list containing the respective parts of the tuples.

If you start with:

`[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]`

Then you should be able to **separate** out the parts into two distinct lists:

`[0, 1, 2, 3, 4, 5]`

and

`['a', 'b', 'c', 'd', 'e', 'f']`

**Problem:** There is no **unzip** function.

But, it is easy to achieve the effect using list comprehensions!

In [None]:
originalList = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]
print(originalList)
print()

a = [item[0] for item in originalList]
print(a)
print()

b = [item[1] for item in originalList]
print(b)

Using **destructuring**, we can make this idea a bit more compact.

In [None]:
originalList = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]
print(originalList)
print()

a, b = [item[0] for item in originalList], [item[1] for item in originalList]

print(a)
print()

print(b)
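
Another common idiom for the same job combines `zip()` with the splat operator.  `zip(*originalList)` passes every tuple as a separate argument, so `zip()` pairs up all of the first parts and all of the second parts.  Note that the results are tuples rather than lists.  A sketch:

```python
originalList = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f')]

# zip(*originalList) is the same as zip((0, 'a'), (1, 'b'), ...).
a, b = zip(*originalList)

print(a)
print(b)

# Convert with list() if lists are needed.
aList = list(a)
print(aList)
```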

Of course, what happens if the `originalList` is not **well-formed**?  That is to say, what if `originalList` has an item that is not a tuple?

In [None]:
originalList = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e'), (5, 'f'), 42]
print(originalList)
print()

a, b = [item[0] for item in originalList], [item[1] for item in originalList]

print(a)
print()

print(b)

As you can see, it will fail.  It is up to you, the programmer, to ensure that your data is in a predictable state (or, **sanitized**) before trying to process it.
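
One possible way to sanitize the data is to keep only the items that have the expected shape before unzipping.  This is a sketch under the assumption that a "good" item is a tuple with exactly two parts; the sample list and the name `cleanList` are illustrative.

```python
originalList = [(0, 'a'), (1, 'b'), (2, 'c'), 42, (3,), (4, 'e')]

# Keep only the items that are tuples with exactly two parts.
cleanList = [item for item in originalList
             if isinstance(item, tuple) and len(item) == 2]

a, b = [item[0] for item in cleanList], [item[1] for item in cleanList]

print(a)
print(b)
```

Whether bad items should be skipped silently, reported, or treated as an error depends on where the data comes from.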

# Sets

This part on Sets will be quite brief, because it overlaps greatly with other material covered in this Section.

So far, we have seen **Lists**, **Tuples**, and **Dictionaries**.  Now, we have one last (main) container type: **Sets**.

A **Set** is much like the mathematical idea, in that it is a collection of items in which each **value** is only stored once.

For example, let's suppose that I have the list:

```
[1,3,5,5,7,7,9,9,9]
```

The **Set** of these numbers would be:

```
{1, 3, 5, 7, 9}
```

Notice that it uses the curly brace syntax, just like a Dictionary, but it does not have any **key-value** pairs.  Also notice that each value is only present once.


## The Important Parts

* The items in the set are **unordered**.
* Sets support comprehensions and syntax such as **`x in set`**, **`x not in set`**, **`len(set)`**, **`min(set)`**, **`max(set)`**, and **`for x in set`**.
* Sets **do not** support indexing, slicing, or any other sequence-like behaviors.
* There are actually two containers: **`set()`** and **`frozenset()`**.  The `frozenset()` is immutable (*i.e.*, cannot be changed once created).  We will only be using the `set()` in our examples.
* Sets can represent the standard mathematical operations: intersection, union, difference, symmetric difference.  They can also be used for membership testing and removing duplicates from a sequence.  See [the documentation](https://docs.python.org/3.6/library/stdtypes.html#set-types-set-frozenset) for the most complete information and syntax.
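
The mathematical operations in the last bullet each have an operator form.  A brief sketch, using two small made-up sets (`odds` and `primes` are illustrative names):

```python
odds = {1, 3, 5, 7, 9}
primes = {2, 3, 5, 7}

print(odds & primes)   # intersection: items in both sets
print(odds | primes)   # union: items in either set
print(odds - primes)   # difference: items in odds but not in primes
print(odds ^ primes)   # symmetric difference: items in exactly one set
```

Each operator also has a method form, such as `odds.intersection(primes)`.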

In [None]:
# Create a set() using a list.

a = set([1,3,5,5,7,7,9,9,9])

print(a)

In [None]:
# Create a set using curly braces.
a = {1, 3, 5, 5, 7, 7, 9, 9, 9}

print(a)

In [None]:
# Create a set from a Comprehension
a = {a for a in "abracadabra"}

print(a)
print(type(a))

In [None]:
# Create a set using a string
a = set("abracadabra")

print(a)

In [None]:
# Create a set that contains a single string
a = set(["abracadabra"])
b = {"abracadabra"}

print(a)
print(b)

The most common methods are:

* `.add(<item>)` - Add the item to the set.
* `.remove(<item>)` - Remove the item from the set.  Raise a KeyError if the item is not in the set.
* `.discard(<item>)` - Like `.remove()`, but won't raise a KeyError.
* `.pop()` - Remove and return an arbitrary item from the set.  Takes no argument.
* `.clear()` - Remove all elements from the set.
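
A short walk through those methods, as a sketch.  The set `letters` and the other variable names are illustrative.

```python
letters = set("abc")

letters.add("d")         # the set now holds a, b, c, d
letters.remove("a")      # 'a' is present, so this succeeds
letters.discard("z")     # 'z' is absent, but discard() does not complain

try:
    letters.remove("z")  # remove() does complain
except KeyError:
    print("remove() raised a KeyError")

beforePop = set(letters) # copy, so we can see what pop() chose from
popped = letters.pop()   # some item; which one is not specified
print(popped in beforePop)

letters.clear()
print(len(letters))
```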

# Baby Names Dataset

https://www.ssa.gov/oact/babynames/limits.html

The **Social Security Administration** makes available data about the names given to babies.  It is downloadable as a .zip file, and data is split across multiple files included in that .zip file.

I provided a cell (at the top of this notebook) that will download the data and put it all into a single list called `babyList`.  Run that cell first, and then continue with this section.

In [None]:
# The first 10 records in the dataset:
print(babyList[:10])

print()

# The last 10 records in the dataset:
print(babyList[-10:])

Given this **real** data, what kind of questions could you answer?

**Warm-Up Question: How many distinct names are in the dataset?**

In [5]:
names = set()

for state, sex, year, name, count in babyList:
    names.add(name)
    
print(len(names))

32403


In [6]:
print(len({name for state, sex, year, name, count in babyList}))

32403


In [7]:
nameSet = {name for state, sex, year, name, count in babyList}

print(sum([len(name) for name in nameSet]))

203954


**Q1: Given a name and a year, how many births match that constraint?**  Write this as a function.

In [None]:
def nameAndYear(searchName, searchYear):
    return sum([count for state, sex, year, name, count in babyList if name == searchName and year == searchYear])


print(nameAndYear("Charles", 2018))

**Q2: Given a name and a year, how many births match that constraint, *separated by sex*?**  Write this as a function.

In [None]:
def nameAndYearBySex(searchName, searchYear):
    maleCount = sum([count for state, sex, year, name, count in babyList if name == searchName and year == searchYear and sex == "M"])
    femaleCount = sum([count for state, sex, year, name, count in babyList if name == searchName and year == searchYear and sex == "F"])
    return {"M": maleCount, "F": femaleCount}

print(nameAndYearBySex("Joanne", 1957))

**Q3: Given a name, how has the use of that name changed, for all years in the dataset?** Write this as a function.

In [4]:
def change(searchName):
    changed = {}
    for state, sex, year, name, count in babyList:
        if name == searchName:
            changed.setdefault(year,0)
            changed[year] += count
    return changed

print(change("Cailey"))

nameChangeData = change("Cailey")

for year in sorted(nameChangeData.keys()):
    print(f"{year}: {nameChangeData[year]}")

{2000: 132, 2004: 134, 1998: 136, 1999: 146, 1996: 38, 1991: 5, 1992: 9, 1993: 32, 1994: 39, 1995: 39, 1997: 72, 2001: 139, 2002: 93, 2003: 105, 2005: 121, 2006: 104, 2007: 81, 2008: 53, 2009: 79, 2010: 90, 2011: 40, 2012: 67, 2013: 62, 2014: 34, 2015: 22, 2016: 31, 2017: 26, 2019: 21, 2020: 11, 2021: 8, 1990: 6, 2018: 7}
1990: 6
1991: 5
1992: 9
1993: 32
1994: 39
1995: 39
1996: 38
1997: 72
1998: 136
1999: 146
2000: 132
2001: 139
2002: 93
2003: 105
2004: 134
2005: 121
2006: 104
2007: 81
2008: 53
2009: 79
2010: 90
2011: 40
2012: 67
2013: 62
2014: 34
2015: 22
2016: 31
2017: 26
2018: 7
2019: 21
2020: 11
2021: 8


In [None]:
# Problem: year is over-written so that only
# the last record examined will set the value.
# Should be adding the values together instead.

searchName = "Cailey"

dictionary = {year: count for state, sex, year, name, count in babyList if name == searchName}
print(dictionary)

for record in babyList:
    if record[3] == "Cailey":
        print(record)
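
The overwriting problem is easier to see on a tiny example.  The records below are made up, in the same `(state, sex, year, name, count)` layout as `babyList`; they are not taken from the real dataset.

```python
# Made-up records, for illustration only.
sampleList = [
    ("AK", "F", 1998, "Cailey", 5),
    ("CA", "F", 1998, "Cailey", 20),
    ("CA", "F", 1999, "Cailey", 30),
]

# The comprehension keeps only the last count seen for each year.
overwritten = {year: count for state, sex, year, name, count in sampleList
               if name == "Cailey"}

# Accumulating in a loop adds the counts together instead.
totals = {}
for state, sex, year, name, count in sampleList:
    if name == "Cailey":
        totals[year] = totals.get(year, 0) + count

print(overwritten)
print(totals)
```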

**Q4: What is the most popular name of all time?**

In [None]:
nameCount = {}
for state, sex, year, name, count in babyList:
    nameCount.setdefault(name,0)
    nameCount[name] += count

orderedData = sorted(nameCount.items(), key=lambda record: record[1])

print(orderedData[-10:][::-1])
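
If only the single most popular name is needed, `max()` accepts the same kind of `key=` function and avoids sorting the whole dictionary.  The sketch below uses made-up totals in place of the real `nameCount`, so the numbers mean nothing; it also shows `reverse=True` as an alternative to slicing from the end and reversing.

```python
# Made-up totals standing in for the real nameCount dictionary.
sampleCount = {"James": 50, "Mary": 70, "John": 60}

# max() with key= finds the largest entry without sorting everything.
mostPopular = max(sampleCount.items(), key=lambda record: record[1])
print(mostPopular)

# reverse=True puts the largest entries first, so a plain slice is enough.
topTwo = sorted(sampleCount.items(), key=lambda record: record[1], reverse=True)[:2]
print(topTwo)
```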