<img src="https://github.com/christopherhuntley/BUAN5405-docs/blob/master/Slides/img/Dolan.png?raw=true" width="180px" align="right">

# **Lesson 8: Lists**
_If you can only have one collection type, make it a list_

## **Learning Objectives**

### Theory / Be able to explain ...
- The list as a mutable sequential type
- Slice Assignment
- The various list methods and functions
- The effects of list aliasing
- The function and syntax of list comprehensions

### Skills / Know how to  ...
- Create and modify lists in place
- Splice one list into another
- Make shallow and deep copies of lists
- Use list comprehensions as list-generating expressions

**What follows is adapted from Chapter 8 of the _Python For Everybody_ book. If you have not read it, then please do so before continuing on.**

---

In [None]:
#@title Lesson 8 Introduction
%%html
<div style="max-width: 1000px">
  <div style="position: relative;padding-bottom: 56.25%;height: 0;">
    <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  src="https://www.youtube.com/embed/hnCWyEfndO0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
  </div>
</div>


## **C Arrays: A Lingering Legacy**
> "Real Programmers write in FORTRAN."    
> -- [The Story of Mel](http://www.catb.org/~esr/jargon/html/story-of-mel.html) in _The Jargon File_

If we dig deep enough into the [Python libraries](https://github.com/python/cpython), we eventually find that much of [Python itself is actually written in C](https://github.com/python/cpython/blob/master/Objects/listobject.c), a language that is still commonly used for operating systems and certain high performance applications. C is about as close to programming in assembly language, one step up from machine code, as we might ever want to venture. Writing C code is like driving a 1960s muscle car: it runs everything as fast as possible but lacks any and all safety features. It is, as Ralph Nader famously described one muscle car, _Unsafe at Any Speed_.  

Writing code in C that dealt with collections of things meant using *arrays*. A C array was a block of raw memory (RAM) set aside in advance to fit whatever data was poured into it (as bytes). The array data itself was always of one data type (usually `int`, `float`, or `char`), and it was darn near impossible to safely resize the array after the memory was allocated. Fortunately, in the heyday of C for data analysis, datasets tended to be small enough that we could estimate in advance how much memory was needed. 

To avoid the need to think too hard when creating arrays, smart and lazy programmers wrote utility code like this (from _Numerical Recipes in C_) to handle basic memory management with some degree of safety:
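```c
float *vector(nl,nh)
int nl, nh;
/* Allocates a float vector with range [nl..nh] */
{
    float *v;   /* pointer to the newly allocated block */

    v=(float *)malloc((unsigned) (nh-nl+1)*sizeof(float));
    if (!v) nrerror("allocation failure in vector()");   /* nrerror() is the book's error handler */
    return v-nl;
}
```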

What a great innovation! We could specify that we wanted a `float` array of a given size (`nh-nl+1`) and it would do all the tricky math for us. 

Now for the scary part ... What if we allocated an array to hold 100 items but then tried to insert 101 items into it? C would just do it anyway, overwriting whatever was already in memory just beyond the end of the array. The result? Usually either a major security bug or a random system crash. It was possible to write code that would run flawlessly for days at a time before randomly crashing. The programmer then might spend weeks hunting through the code looking for the error. Imagine if you had to run something a week at a time just to test it one time! You would make sure you looked over every line of code a dozen times before kicking off the next test. 

So, it is perhaps no surprise that smart and lazy programmers tend to write their code in Python these days. It's just so much more time efficient. 

Python has a data type called `array` that is like the old C arrays but with all the proper data protections built in. Not many people use it, however. Instead they just use a `list` which is so much more convenient:

- A list can contain **any number of items**, subject to how much memory is available.
- Each list item can be of **any data type**. We can even **mix data types** within the list. 
- Lists can be **extended**, **sliced**, and even **truncated** as needed. 
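
As a rough illustration of the difference, the sketch below contrasts the standard library's `array` type with a plain list. The variable names (`typed`, `mixed`, `rejected`) are made up for this example.

```python
from array import array

# A typed array: the "i" type code means every item must be an int.
typed = array("i", [1, 2, 3])

rejected = False
try:
    typed.append("four")      # a string does not fit an int array
except TypeError:
    rejected = True           # Python refuses instead of corrupting memory

# A list accepts any mix of types and grows or shrinks on demand.
mixed = [1, "two", 3.0]
mixed.append([4, 5])          # extend by one item (here a nested list)
first_two = mixed[:2]         # slice out a new list
del mixed[2:]                 # truncate everything from position 2 onward

print(rejected)    # True
print(first_two)   # [1, 'two']
print(mixed)       # [1, 'two']
```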

We have already seen lists in action in the previous lessons. In this lesson we will fill in the details.  

---
## **Lists as Mutable Sequences**
The two types of sequences we have looked at so far, files and strings, are very hard or impossible to modify in place. A list, however, is designed to be modified. Mutability is sort of why we make lists.

So, for example, consider the following snippet:

In [None]:
print("Go Stags!")

go_stags = list("Go Stags!")   # Convert to a list
print(go_stags)

go_stags[3:3]="Lady "          # Insert items into the middle of the list
print(go_stags)

del go_stags[len(go_stags)-1]  # Truncate the list 
print(go_stags)

go_stags += list("! Go!")      # Extend the list
print(go_stags)

print("".join(go_stags))       # convert back to a string

Go Stags!
['G', 'o', ' ', 'S', 't', 'a', 'g', 's', '!']
['G', 'o', ' ', 'L', 'a', 'd', 'y', ' ', 'S', 't', 'a', 'g', 's', '!']
['G', 'o', ' ', 'L', 'a', 'd', 'y', ' ', 'S', 't', 'a', 'g', 's']
['G', 'o', ' ', 'L', 'a', 'd', 'y', ' ', 'S', 't', 'a', 'g', 's', '!', ' ', 'G', 'o', '!']
Go Lady Stags! Go!


### **Appending and Deleting List Items**
The `+=` operator works just like with strings except that it modifies the list in place. The expression to the right of the `+=` must evaluate to a sequence of items to add, normally a list, even if it is just one item. 

To remove an item from a specific position in a list we use the `del` statement as shown in the example above, which removed the trailing exclamation point from the list.  
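
Here is a small sketch of both operations. The list `cheer` is invented for the example, and `id()` is used only to show that the list object is the same before and after the `+=`.

```python
cheer = ["Go", "Stags"]
before = id(cheer)          # identity of the list object

cheer += ["!"]              # a single item still goes inside brackets
print(cheer)                # ['Go', 'Stags', '!']
print(id(cheer) == before)  # True: the list was extended in place

del cheer[-1]               # remove the last item
print(cheer)                # ['Go', 'Stags']

del cheer[0]                # remove the item at position 0
print(cheer)                # ['Stags']
```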

### **Lists of Lists**
Lists can contain items of any data type, including lists. To refer to elements in the inner ("nested") list you just add another bracket operator [] to the end. This works for string items as well.

In [None]:
a_list = [1, ["alpha","beta","gamma"], 3, 4]

print(a_list)           # the full list
print(a_list[1])        # the nested sublist
print(a_list[1][2])     # an item ('gamma') in the nested sublist 
print(a_list[1][2][3])  # the second 'm' from the string 'gamma'

[1, ['alpha', 'beta', 'gamma'], 3, 4]
['alpha', 'beta', 'gamma']
gamma
m


### **Slice Assignment (Splicing)**
We used slices with strings. They work almost the same with lists, with one notable exception: we can **use the slice operator in assignment statements**. 

When we slice a list, the part "sliced out" is replaceable with something else. It's like Python creates a temporary variable (representing the gap in the list) that we can assign list values to as we please.

In [None]:
my_list = [1,2,3,4]

print(my_list[1:3])

my_list[1:3] = ["a","b","c","d"]
print(my_list)

[2, 3]
[1, 'a', 'b', 'c', 'd', 4]


In essence we cut the list just before positions 1 and 3 (the slice) and then spliced a new sequence into the gap. If we want to do the splicing without losing any items in the list, then we just use a zero-length slice (with the same number on either side of the `:`).

In [None]:
my_list = [1,2,3,4]
my_list[1:1] = ["a","b","c","d"]
print(my_list)

[1, 'a', 'b', 'c', 'd', 2, 3, 4]
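
The spliced-in sequence does not have to match the length of the slice it replaces. As a hedged sketch (the list `letters` is made up for illustration), slice assignment can also shrink a list, delete items, or add to the end:

```python
letters = ["a", "b", "c", "d", "e"]

letters[1:4] = ["X"]                 # replace three items with one
print(letters)                       # ['a', 'X', 'e']

letters[1:2] = []                    # splice in nothing: deletes the slice
print(letters)                       # ['a', 'e']

letters[len(letters):] = ["f", "g"]  # zero-length slice at the end
print(letters)                       # ['a', 'e', 'f', 'g']
```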


### **Pulse Check ...**
**Rewrite the code below so that it inserts `["a","b","c","d"]` as a nested list.** The result should be `[1, ['a', 'b', 'c', 'd'], 2, 3, 4]`

In [None]:
# REWRITE THIS CODE CELL
my_list = [1,2,3,4]
my_list[1:1] = ["a","b","c","d"]
print(my_list)

[1, 'a', 'b', 'c', 'd', 2, 3, 4]


In [None]:
#@title <--- Check your work
%%html
<div style="max-width: 1000px">
   <div style="position: relative;padding-bottom: 56.25%;height: 0;">
     <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  
     src="https://www.youtube.com/embed/dGFyz4xe6xo"
     frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>
</div>

---
## **List Methods**
While the methods available for lists are not as impressive as for strings, they are more than adequate:
- `count()` counts the number of times a  given item appears in the list
- `index()` returns the first position where a given item appears in the list
- `reverse()` and `sort()` reorder the items in the list
- `append()`, `extend()`, and `insert()` splice new items into the list
- `remove()`, `pop()`, and `clear()` delete items from the list
- `copy()` returns a **shallow copy** of the list; we'll come back to this in a bit
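
The sketch below walks through these methods on a made-up list of numbers called `scores`. The comments show the state of the list after each call.

```python
scores = [70, 85, 70, 92]

n_70 = scores.count(70)        # 2: the value 70 appears twice
first_85 = scores.index(85)    # 1: first position holding 85

scores.append(60)              # [70, 85, 70, 92, 60]
scores.extend([88, 75])        # [70, 85, 70, 92, 60, 88, 75]
scores.insert(0, 100)          # [100, 70, 85, 70, 92, 60, 88, 75]

scores.remove(70)              # drops the first 70 only
last = scores.pop()            # removes and returns the last item (75)

scores.sort()                  # [60, 70, 85, 88, 92, 100]
scores.reverse()               # [100, 92, 88, 85, 70, 60]

backup = scores.copy()         # a new list with the same items
scores.clear()                 # scores is now empty; backup is not

print(n_70, first_85, last)    # 2 1 75
print(backup)                  # [100, 92, 88, 85, 70, 60]
print(scores)                  # []
```

Note that most of these methods change the list in place and return `None`; only `count()`, `index()`, `pop()`, and `copy()` return something useful.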

### **Pulse Check ...**
**Write a function called `mirror()` that returns the reverse of a string appended to itself.**
`mirror("Go Stags!")` returns `'Go Stags!!sgatS oG'`

In [None]:
# YOUR CODE HERE

In [None]:
#@title <--- Check your work
%%html
<div style="max-width: 1000px">
   <div style="position: relative;padding-bottom: 56.25%;height: 0;">
     <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  
     src="https://www.youtube.com/embed/dRXTMQdVBdE"
     frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>
</div>

---
## **List Functions**
Many of Python's built-in functions work with lists.
- `len()` counts the items in the list.
- `sum()` totals the values of the items in the list.
- `min()` and `max()` do what they appear to do.
- `sorted()` returns a new list with the items in sorted order; `reversed()` returns an iterator over the items in reverse order. Neither changes the original list.

Other functions that expect an iterator work too, once you wrap the list with `iter()` or `reversed()`. 
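
A quick sketch of these built-ins, using an invented list of temperatures called `temps`:

```python
temps = [72, 65, 80, 68]

print(len(temps))              # 4
print(sum(temps))              # 285
print(min(temps), max(temps))  # 65 80

ordered = sorted(temps)        # a new, sorted list
print(ordered)                 # [65, 68, 72, 80]

backwards = reversed(temps)    # an iterator, not a list
as_list = list(backwards)      # materialize the iterator to see the items
print(as_list)                 # [68, 80, 65, 72]

print(temps)                   # [72, 65, 80, 68] (the original is untouched)
```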

### **Pulse Check ...**
**Rewrite your `mirror()` function so that it uses the `reversed()` function.**  

In [None]:
# YOUR CODE HERE

In [None]:
#@title <--- Check your work
%%html
<div style="max-width: 1000px">
   <div style="position: relative;padding-bottom: 56.25%;height: 0;">
     <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  
     src="https://www.youtube.com/embed/rPqt8rM-3EA"
     frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>
</div>

---
## **Lists as Objects**

### **Aliases vs Copies**
All of the data types we have looked at so far can be treated as singular, (somewhat) immutable entities. In order to divide them we have to create new entities. While 2 = 1 + 1, that does not mean we can turn 1 into 2. Or, though "Go" is composed of the characters 'G' and 'o', each character is a different string from the original. 

A list is different. It exists to be a **container** for other entities. So, mull over the following: 

In [None]:
x = 1 
y = x     # assignment y = value of x
y = 2     # modify y
print(x)  # x is unchanged

a = [1,2,3,4]
b = a     # assignment b = value of a
b[2]="a"  # modify b
print(a)  # a is changed too!

1
[1, 2, 'a', 4]


It's like voodoo action at a distance. By changing `b` we also change `a`. How is that possible? Because the assignment `b = a` sets the value of `b` to the _container_ `a`. If we alter the contents of the container, we modify the value of both `a` and `b`. In Python terms we say that `a` and `b` are **aliases** for the same list.

To eliminate such alias effects make a **copy** of the list. 

In [None]:
a = [1,2,3,4]
b = list(a)
b[2]="a"
print(a)

[1, 2, 3, 4]


Calling `list()` to construct a list from the original list creates a **shallow copy**. The copy has _exactly_ the same items as the original list (but is nonetheless a new list). If any of those copied items is a nested list, then **the same container** (the nested list) is in both copies. 

In [None]:
a_list = [1, ["alpha","beta","gamma"], 3, 4]
b_shallow = list(a_list)         # a shallow copy of a_list
b_shallow[1][1]=2                # modify the nested list
print(a_list)                    # also modifies a_list

[1, ['alpha', 2, 'gamma'], 3, 4]


To make a **deep copy** (without any aliasing of nested lists) we need the `copy` module from the standard library.

In [None]:
import copy
a_list = [1, ["alpha","beta","gamma"], 3, 4]
b_deep = copy.deepcopy(a_list)   # deep copy of a_list
b_deep[1][1]=2                   # modify the deep copy
print(a_list)                    # a_list is unchanged

[1, ['alpha', 'beta', 'gamma'], 3, 4]
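
One way to check which situation you are in is to compare identities with the `is` operator, which is true only when two names refer to the very same object. The sketch below is an illustration with made-up variable names, not a required technique.

```python
import copy

original = [1, ["alpha", "beta"], 3]
alias = original                   # another name for the same list
shallow = list(original)           # new outer list, shared inner list
deep = copy.deepcopy(original)     # new outer list and new inner list

print(alias is original)           # True:  same container
print(shallow is original)         # False: a different outer list ...
print(shallow == original)         # True:  ... with equal contents
print(shallow[1] is original[1])   # True:  the nested list is shared
print(deep[1] is original[1])      # False: the nested list was copied too
```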


### **Impact of List Aliasing on Functions**
Just as assigning a list to a variable creates an alias, so does passing a list to a function. After all, a function parameter is just a kind of local variable. The parameters get set via assignment from the arguments just before executing the function body. So, if the function modifies the list in any way, then the modifications live on after the function is done. If that is not what you want, then be sure to pass copies as arguments instead of the lists themselves. That way the _copies_ get aliased and then discarded after the function returns.  
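
To make this concrete, here is a minimal sketch. The function `add_total()` and the list `data` are hypothetical names invented for the example.

```python
def add_total(values):
    """Append the sum of the items to the list itself (modifies it in place)."""
    values.append(sum(values))
    return values

data = [1, 2, 3]

safe = add_total(list(data))   # the function receives a shallow copy
print(data)                    # [1, 2, 3]     (the original is untouched)
print(safe)                    # [1, 2, 3, 6]

add_total(data)                # the function receives the list itself
print(data)                    # [1, 2, 3, 6]  (the change lives on)
```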

### **Pulse Check ...**
**The code below has an infinite loop. Debug it to eliminate the loop.** (You will need to scroll down to the code cell below this text cell.)

```python
def add_0(lst):
    lst += [0]

x = [1,2,3,4]
for i in x:
    add_0(x)
    print(x)
```
After fixing the loop, the correct output is:
```
[1, 2, 3, 4, 0]
[1, 2, 3, 4, 0, 0]
[1, 2, 3, 4, 0, 0, 0]
[1, 2, 3, 4, 0, 0, 0, 0]
```
Hints
- You will need to make a shallow copy of `x` somewhere in your code.
- The fix only affects one line of code and it is outside the loop body.
- If you get stuck in an infinite loop, then restart the runtime.

In [None]:
# REWRITE THIS CODE CELL

# The Infinite Loop Code
def add_0(lst):
    lst += [0]

x = [1,2,3,4]
for i in x:
    add_0(x)
    print(x)

In [None]:
#@title <--- Check your work
%%html
<div style="max-width: 1000px">
   <div style="position: relative;padding-bottom: 56.25%;height: 0;">
     <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  
     src="https://www.youtube.com/embed/HLDxiZ1DCHQ"
     frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>
</div>

---
## **Pro Tips**

### **List Comprehensions**
A list comprehension is a quirky one-line combination of a `for` loop with a conditional expression. The result is a list. The syntax is
```python
[ expression for item in sequence if condition]
```
- the `for` loop iterates through the sequence
- the value of `expression` (which likely includes `item`) is added to the list
- the `expression` is skipped whenever the `condition` (which also likely includes `item`) is False

A comprehension is essentially equivalent to 
```python
lst = [] # an empty list
for item in sequence:
    if condition:
        lst.append(expression)  # add the value as a single new item
```
Except, of course, that the comprehension doesn't need to create a local variable for the list. A comprehension is an expression to be evaluated, not a statement to be executed. If we want the comprehension to be remembered then we use an assignment statement. 

List comprehensions are very handy at times, especially when you only need a list one time, say as an argument to a function call. You may never need to use one, but when you do, it can save a lot of effort. 
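
As a small worked example (the data and variable names are invented for illustration), the comprehension below keeps the even numbers and squares them. The loop beneath it builds the same list the long way.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Comprehension: square each even number
even_squares = [n * n for n in numbers if n % 2 == 0]
print(even_squares)                   # [4, 16, 36]

# The same result using an explicit loop
loop_result = []
for n in numbers:
    if n % 2 == 0:
        loop_result.append(n * n)
print(loop_result == even_squares)    # True

# A throwaway list used once as a function argument
total_letters = sum([len(word) for word in ["Go", "Stags"]])
print(total_letters)                  # 7
```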

#### **Slicing as a sublist operator**
In Lesson 6 we saw how slicing could be used to extract substrings of characters.

In [None]:
'Google'[2:4]

'og'

Here is, conceptually, what Python is doing behind the scenes. 

In [None]:
''.join( ['Google'[i] for i in range(2,4)] )  # we are using join() to reassemble the characters into a string

'og'

The logic is the same for lists, of course, only without the `join()`. 

In [None]:
lst = ["bread", "peanut butter", "jelly", "chips"]   # don't judge!
print( lst[1:3] )

['peanut butter', 'jelly']


In Lesson 11, we'll see how pandas can slice sequences with non-integer keys (e.g., 'fname', 'lname','bdate') instead of position numbers. The logic behind the scenes uses something like a list comprehension, pretty much like this:

In [None]:
print( [lst[i] for i in range(1,3)] )

['peanut butter', 'jelly']


## **Exercises**
**1. Write a code snippet that applies your `waist2hip_ratio()` function to each (W, H, G) triplet in the following list of lists:**
`[[28,40,'F'],[23, 35, 'F'],[30,40,'M'],[30,37,'M'],[32,39,'M']]`

In [None]:
# YOUR CODE HERE

In [None]:
#@title <--- Check your work
%%html
<div style="max-width: 1000px">
   <div style="position: relative;padding-bottom: 56.25%;height: 0;">
     <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  
     src="https://www.youtube.com/embed/XdCg1yZ7FpM"
     frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>
</div>

**2. Write a function called `inside_out()` that works as follows:** 
- It takes as input a list with a nested inner list. 
- Call the original list `list_outer` and the nested one `list_inner`.
- Remove `list_inner` from `list_outer`, remembering the `position` of `list_inner` within `list_outer`. 
- Insert `list_outer` (as a nested list) into the `list_inner` in the same `position` as `list_inner` was inside `list_outer`.
- If `position` >= `len(list_inner)` then append `list_outer` (as a nested list) to the end of `list_inner`.
- Return `list_inner` after insertion. 

For example, `inside_out([1,2,['a','b','c','d'],3, 4])` returns `['a','b',[1,2,3,4],'c','d']`.

In [None]:
# YOUR CODE HERE

In [None]:
#@title <--- Check your work
%%html
<div style="max-width: 1000px">
   <div style="position: relative;padding-bottom: 56.25%;height: 0;">
     <iframe style="position: absolute;top: 0;left: 0;width: 100%;height: 100%;" rel="0" modestbranding="1"  
     src="https://www.youtube.com/embed/QALK5jhxIKk"
     frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>
</div>

---
## **Before you go ... Submit your work on Google Classroom**
- Save your notebook to be sure it is up to date.
- Go to the assignment in Google Classroom. 
- Turn in your notebook. Your notebook will become read-only. 
- Once it has been reviewed it will be returned and will no longer be read-only.

---
> ## Every Tee Shirt Has a Story
> ABOUT DB2    
> When I was in engineering school, a bunch of my classmates (including my future spouse) were taking jobs on Wall Street working with this new thing called DB2. It sounded pretty silly to me. The software was supposed to make it easy to read and write data to and from a hard disk? I could do that already ... just write a file system like the real programmers do it. DB2, of course, was a pioneering relational database package from IBM. Given that I now teach relational database design (and a few other things) for a living, I suppose I should have listened to them. IBM, Oracle, and the other RDBMS vendors won out in the end, as they should have, and I learned a great lesson about humility and keeping an open mind about technology. Try everything you can and try not to prejudge what you don't understand.     

![L8 Tee Front](https://github.com/christopherhuntley/BUAN5405-docs/raw/master/Photos/L08_TeeFront.jpeg)

## Copyright &copy; 2020 Christopher Huntley. All rights reserved. 