# Python Data Structures

This notebook will work through examples from the Python
[Standard Types](https://docs.python.org/3/library/stdtypes.html) documentation.

## Three Ways to Match a Value

We motivate dictionaries, or maps, as one of three ways to match a value.

### Matching with `if` Statements

`if` statements allow the most flexibility in defining the logic at design time. The
tradeoff is that at runtime we can only work with the choices we have coded.

This is acceptable when you are checking a very small list of values, generally fewer than
half a dozen. This limit is a readability convention.

In [None]:
# Get some data
response = input("Type Anything")

# Decide what to do with it
if response == "y":
    print("Hello.")
elif response == "n":
    print("Goodbye.")
else:
    print("I did not understand.")

### Matching with `match` Statements

`match` statements (available from Python 3.10) have less flexible logic at design time.
However, they idiomatically represent testing a single variable against a fixed set of
values. Like the `if` statement, the set of cases cannot change at runtime.

Instead of catching all the other conditions with an `else`, we catch them with `case _`,
where the underscore matches all other values.

This is best used for no more than a couple dozen values.

In [None]:
# Get some data
response = input("Type Anything")
# response = [ response, 1 ]

# Decide what to do with it
match response:
    case "y":
        print("Hello.")
    case "n":
        print("Goodbye.")
    case [ "n", 1 ]:
        print("Deep comparison match.")
    case _:
        print("I did not understand.")

### Matching with a Dictionary

A dictionary, or map, is like a list, except it maps a unique key to a value (which may be
a duplicate). Importantly, dictionaries are a built-in data structure, not a control syntax.

Dictionaries are great for matching arbitrarily large numbers of values. They act like an
index or look-up table in a database. The keys and values in a dictionary can be changed at
runtime.

In [None]:
# Get some data
response = input("Type Anything")

# Initialize our look-up table
lookup = {
    "y": "Hello.",
    "n": "Goodbye.",
    "r": "Delete me.",
    -45: [ "this is also an item.", 78 ],
    78.09: { "this": "is", "a": "map" }
}
print(lookup)

# Display the output of looking up the response
print(lookup.get(response, "I did not understand."))

if response not in lookup:
    newvalue = input("What value do you want associated with your new key?")
    lookup[response] = newvalue

print(lookup)

print(lookup.pop("k", "Not found."))
print(lookup.pop("r", "Not found."))
print(lookup)
print(lookup[78.09]["a"])
print(lookup[-45][1])
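
The difference between the two ways of reading from a dictionary can be sketched briefly. Square brackets raise a `KeyError` for a missing key, while `.get()` returns a fallback. The `greetings` table below is a hypothetical stand-in, not the `lookup` table from the cell above.

```python
# Hypothetical look-up table, for illustration only
greetings = {"y": "Hello.", "n": "Goodbye."}

# .get() returns the fallback instead of raising an error
fallback = greetings.get("q", "I did not understand.")
print(fallback)

# Square brackets raise KeyError when the key is missing
try:
    greetings["q"]
except KeyError as err:
    missing_key = err.args[0]
    print(f"No entry for {missing_key!r}")

# Values can be replaced at runtime
greetings["y"] = "Welcome back."
print(greetings["y"])
```

Using `.get()` suits cases where a missing key is expected. Square brackets suit cases where a missing key indicates a bug.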

## Loops and Iterables

We motivate iterables with `for` loops. We have already seen a `while` loop used to repeat
a task. If we have a way of computing how many times to do something, we can use a `for`
loop instead.

Explicit `for` loops in Python, as in many interpreted languages such as R or Matlab, are
comparatively slow. We will demonstrate some of the simpler techniques to replace `for`
loops with idiomatic Python constructs that are optimized for iteration. These techniques
are commonly called vectorization. Explicit `for` loops remain the best choice for complex
logic that must run in strict order and that refers back to its own earlier results.
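
As a small illustration of replacing an explicit loop, the built-in `sum` function performs the iteration internally. This sketch only shows that the two forms agree and makes no timing claim. The names `total_loop` and `total_builtin` are chosen for illustration.

```python
# Explicit loop: the interpreter executes the body once per item
total_loop = 0
for n in range(1, 101):
    total_loop += n

# Built-in: the iteration happens inside sum()
total_builtin = sum(range(1, 101))

print(total_loop, total_builtin)
```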

Python `for` loops iterate over any object that is iterable. In the language of computer
science, the object must support the
[iterable interface](https://docs.python.org/3/glossary.html#term-iterator). The most basic
object that supports iteration is the
[`range`](https://docs.python.org/3/library/functions.html#func-range) object. The main
characteristic of the `range` object is that it stores only the data that defines the
range: the boundaries and the step size. The `range` object does not store the actual data
that is iterated over. In the language of object-oriented programming, the `range` object
[encapsulates](https://en.wikipedia.org/wiki/Encapsulation_(computer_programming))
the means of
[generating](https://docs.python.org/3/howto/functional.html#generator-expressions-and-list-comprehensions)
the data, rather than the data.

In [None]:
for item in range(8):
    print(item)
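
To make the point about `range` storing only its boundaries more concrete, the sketch below compares a `range` with the list built from it. It assumes CPython, where `sys.getsizeof` reports the size of the object itself. The exact byte counts vary by build, so none are shown.

```python
import sys

big_range = range(1_000_000)
big_list = list(big_range)

# The range keeps only start, stop, and step
print(big_range.start, big_range.stop, big_range.step)

# Sizes in bytes; the range is far smaller than the list
print(sys.getsizeof(big_range))
print(sys.getsizeof(big_list))

# The length is computed from the boundaries, not by counting items
print(len(big_range))
```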

A dictionary is an iterable as well. By default a `for` loop will iterate over the keys of
the dictionary.

In [None]:
for key in {"first": 45, 87: "hello", "list": [3,4]}:
    print(f"key: {key}")

We can use the
[`.items()`](https://docs.python.org/3/library/stdtypes.html#dict.items) method to iterate
over both the keys and values. Note that we are implicitly unpacking each item into a key
and a value.

In [None]:
for key, value in {"first": 45, 87: "hello", "list": [3,4]}.items():
    print(f"key: {key}, value: {value}")

Strings are iterables as well. The `for` loop iterates over each character.

In [None]:
for c in "Hello World!":
    print(c)

The `range` object uses a default step size of 1. Any combination of integers for the
boundaries creates a valid `range` object; it simply might produce no data! If the start of
the range is larger than the end of the range, then we need a negative step (a decrement)
to produce data.

Take a look at what each `for` loop produces. Also note that the `range` object produces
data that does **NOT** contain the stop boundary.

In [None]:
print("Default step size")
for x in range(4, -5):
    print(x)
print("Decrement steps")
for x in range(4, -5, -2):
    print(x)

To see that the `range` object is a type distinct from objects that contain data, we can
display the type of a `range` object, a `list` object, and a `range` converted to a list.

In [1]:
print(type(range(4, -5, -2)))
print(type([4,2,0,-2,-4]))
print(type(list(range(4, -5, -2))))

<class 'range'>
<class 'list'>
<class 'list'>


Note how we called the `list`
[constructor](https://docs.python.org/3/library/stdtypes.html#list), passing in a
`range` object. We can accomplish the same thing by using the Python
[unpacking](https://docs.python.org/3/reference/expressions.html#expression-lists) `*`
operator inside square brackets.

In [6]:
print(list(range(4, -5, -2)))
print(range(4,-5,-2))
print([ *range(4, -5, -2) ])
print([ *{"first": 45, 87: "hello", "list": [3,4]}.items() ])


[4, 2, 0, -2, -4]
range(4, -5, -2)
[4, 2, 0, -2, -4]
[('first', 45), (87, 'hello'), ('list', [3, 4])]


## Lists and Comprehensions

[List comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions)
allow us to build lists from combinations of other iterable objects, and thus replace
explicit iteration with faster implicit iteration.

In [None]:
# This will go from 4 to -4 in steps of 0.25
xs = [ x/4 for x in range(16, -17, -1) ]

# Here is the same thing as a for loop
l = list()
for x in range(16, -17, -1):
    l.append(x/4)
print(l)
print(xs)

In this more complex example we will generate a matrix as the tensor (outer) product of two
lists. Note that with nested list comprehensions like this one, the rightmost `for` belongs
to the enclosing comprehension and so acts as the outer loop.

In [9]:
# Initial ranges
rs = range(3, 0, -1)
cs = range(1, 4, 1)
print([ *cs ])
print([ *rs ])

# Matrix from a list comprehension
a = [ [ r*c for c in cs ] for r in rs ]
print(a)

# The same way using nested for loops
s = list()
for r in rs:
    t = list()
    for c in cs:
        t.append(r*c)
    s.append(t)
print(s)

[1, 2, 3]
[3, 2, 1]
[[3, 6, 9], [2, 4, 6], [1, 2, 3]]
[[3, 6, 9], [2, 4, 6], [1, 2, 3]]


We can also conditionally filter out combinations by using an `if` clause. Note that
the `if` clause can only see the loop variables assigned to the left of it. So in the
example the `if` clause can reference `z` and `c`, but not `r`.

In [10]:
# Initial ranges
rs = range(3, 0, -1)
cs = range(1, 4, 1)
zs = range(2, 8, 2)

# Note the difference between a list of lists
a = [ [ r*c for c in cs ] for r in rs ]
print(a)

# And a single list
b = [ r*c+z for z in zs for c in cs if (c+z)%3 == 0 for r in rs ]
print(b)

[[3, 6, 9], [2, 4, 6], [1, 2, 3]]
[5, 4, 3, 10, 8, 6, 15, 12, 9]
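
One way to read a single comprehension with several clauses is to expand it into nested statements, taking the clauses from left to right. The sketch below does this for the list `b` from the cell above. The name `expanded` is introduced here for illustration.

```python
rs = range(3, 0, -1)
cs = range(1, 4, 1)
zs = range(2, 8, 2)

# Comprehension: the clauses are read left to right
b = [r*c + z for z in zs for c in cs if (c + z) % 3 == 0 for r in rs]

# Equivalent loops: the leftmost clause becomes the outermost loop
expanded = []
for z in zs:
    for c in cs:
        if (c + z) % 3 == 0:
            for r in rs:
                expanded.append(r*c + z)

print(b == expanded)
```

Because the `if` sits before `for r in rs`, the filter runs before `r` exists, which is why it cannot reference `r`.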


## Indexing

Python sequences such as strings, lists, tuples, and ranges support indexing and slicing.
Dictionaries are indexed by key only, and sets are not indexable at all. Indexing can be
used to access one or more values and, for mutable sequences such as lists, to assign them.
Slices take arguments similar to those of ranges.

* A single index `[where]` gets or assigns a single element.
* A pair `[start:stop]` gets or assigns a range of elements.
* A triple `[start:stop:step]` gets or assigns a range of elements, skipping by the amount
  `step`.
* The element at `stop` is never included.
* Python sequences are 0-indexed, meaning index 0 is the first element.
* Leaving out any value from the index while including the colon means use the default.
* `[:stop]` defaults to starting from the beginning.
* `[start:]` defaults to ending at the last element. This is really helpful when you do
  not know how long a sequence is.
* Negative numbers count back from the end; for example, `[-2:]` starts at the
  second-to-last element.

In [12]:
p = "Hello World"

# The last two elements
print(p[-2:])

# All but the last element
print(p[0:-1])

# Every fifth element
print(p[::5])

ld
Hello Worl
H d
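
The slicing rules above also apply to assignment, but only for mutable sequences. The following sketch uses a hypothetical list named `letters` to show slice assignment and a negative step. It also shows that assigning into a string raises a `TypeError`.

```python
letters = list("Hello World")

# Replace a range of elements; the element at index 5 is excluded
letters[0:5] = list("HOWDY")
print("".join(letters))

# A negative step walks backward from the end
print("".join(letters[::-1]))

# Strings are immutable, so assignment by index raises TypeError
text = "Hello World"
try:
    text[0] = "J"
except TypeError:
    print("strings cannot be assigned by index")
```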


Finally we can check to see if an element is in an iterable using the `in`
[membership operator](https://docs.python.org/3/reference/expressions.html#membership-test-operations).
We can get the number of elements using the
[`len()`](https://docs.python.org/3/library/functions.html#len) function.

In [15]:
q = "Goodbye Class"
print("z" in q)
print("u" not in q)
print("o" in q)
print(len(q))

False
True
True
13
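
The `in` operator and `len()` behave slightly differently depending on the container. As a brief sketch with a hypothetical `lookup` dictionary: membership on a dictionary tests its keys, and membership on a string also matches substrings.

```python
lookup = {"y": "Hello.", "n": "Goodbye."}

# Membership on a dictionary tests the keys, not the values
key_found = "y" in lookup
value_found_as_key = "Hello." in lookup
value_found = "Hello." in lookup.values()
print(key_found, value_found_as_key, value_found)

# len() counts key/value pairs
print(len(lookup))

# On strings, `in` matches whole substrings, not just single characters
substring_found = "bye" in "Goodbye Class"
print(substring_found)
```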
