# Contents
- [Python Lists](#Python-Lists)
    - [Slicing and Indexing](#Slicing-and-Indexing)
    - [Concatenation](#Concatenation)
    - [Mutability](#Mutability)
    - [Nested Lists](#Nested-Lists)
    - [Shallow Copies](#Shallow-Copies)
- [Exercises](#Exercises)

# Python Lists

![ds_list.png](attachment:ds_list.png)


A [Python list](https://docs.python.org/3.8/library/stdtypes.html#list) can be written as a list of comma-separated values (items) between square brackets.  The items of a list may be of different types (heterogeneous data), but typically they are all of the same type.  Lists are useful when the order of elements matters or when elements can be addressed by an integer position (index).

Lists differ from strings in that they can hold more than just characters, and lists are [mutable](#Mutability) whereas strings are immutable.  More on this later in these notes.

In [1]:
squares = [1, 4, 9, 16, 25]
squares

[1, 4, 9, 16, 25]

In [2]:
days_of_the_week = ["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"]
days_of_the_week

['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']

In [3]:
mixed_bag = ["Space", 42, "Towel"]
mixed_bag

['Space', 42, 'Towel']

## Slicing and Indexing
Lists may be indexed and sliced just like strings (and all other built-in [sequence](https://docs.python.org/3.8/glossary.html#term-sequence) types).  All slice operations return a new list containing a [shallow copy](https://docs.python.org/3.8/library/copy.html#shallow-vs-deep-copy) of the requested elements (more on this below).

In [4]:
squares[0]

1

In [5]:
squares[-1]

25

In [6]:
squares[:-1]

[1, 4, 9, 16]

In [7]:
squares[-3:]

[9, 16, 25]

In [8]:
days_of_the_week[2]

'Tuesday'
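Slicing always returns a new list, so a full slice `squares[:]` is a common way to copy a list. The sketch below also uses the optional third slice value (the step), which the cells above do not show. The variable names are only for illustration.

```python
squares = [1, 4, 9, 16, 25]

full_copy = squares[:]         # every element, start to end, in a NEW list
every_other = squares[::2]     # a step of 2 skips every second element
reversed_copy = squares[::-1]  # a negative step walks backwards

print(full_copy == squares)  # True: same values
print(full_copy is squares)  # False: a different list object
print(every_other)           # [1, 9, 25]
print(reversed_copy)         # [25, 16, 9, 4, 1]
```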

## Concatenation

Lists may be concatenated using the plus (`+`) operator.

In [9]:
squares + [36, 49, 64, 81, 100]

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
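Concatenation builds a brand-new list and leaves both operands untouched. This point matters again in the later sections on mutability. Here is a small sketch with illustrative variable names. It also shows that `+` only joins a list to another list.

```python
squares = [1, 4, 9, 16, 25]
more = [36, 49]

combined = squares + more  # creates a new list; neither operand is modified

print(combined)  # [1, 4, 9, 16, 25, 36, 49]
print(squares)   # [1, 4, 9, 16, 25] (unchanged)
print(more)      # [36, 49] (unchanged)

# Adding a non-list (here an int) to a list is an error
try:
    squares + 36
except TypeError as err:
    print("TypeError:", err)
```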

## Mutability

Unlike strings, lists are [mutable](https://docs.python.org/3.8/glossary.html#term-mutable), meaning that their contents can be changed.  The values of a list can be modified using their index, slicing and even through methods like [`append()`](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists) (functions and methods will be covered later).

In [10]:
cubes = [1, 8, 27, 65, 125] # 65 is incorrect here
cubes[3] = 4 ** 3 # replace the incorrect value
cubes

[1, 8, 27, 64, 125]

In [11]:
letters = ['a', 'b', 'c', 'd', 'e']
letters

['a', 'b', 'c', 'd', 'e']

In [12]:
# replace some letters
letters[2:4] = ['C', 'D']
letters

['a', 'b', 'C', 'D', 'e']

In [13]:
# remove some letters
letters[:3] = []
letters

['D', 'e']

In [14]:
letters = letters + ['q'] # build a new list ending in 'q' and rebind letters to it
letters

['D', 'e', 'q']

In [15]:
letters += ['z'] # append 'z' in place using the compound operator
letters

['D', 'e', 'q', 'z']

The length of a list can be found using the [`len()`](https://docs.python.org/3/library/functions.html#len) function.

In [16]:
len(letters)

4
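The text above mentions `append()` without showing it. Here is a short sketch using `append()` and its sibling `extend()`, both standard `list` methods. It starts from the value `letters` holds after the cells above. Both methods modify the list in place rather than creating a new one.

```python
letters = ['D', 'e', 'q', 'z']

letters.append('y')         # adds ONE item to the end, in place
letters.extend(['w', 'x'])  # adds each item of another iterable, in place
print(letters)  # ['D', 'e', 'q', 'z', 'y', 'w', 'x']

# append() adds its argument as a single element, even when it is a list
nested = [1, 2]
nested.append([3, 4])
print(nested)   # [1, 2, [3, 4]]
```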

## Nested Lists

Lists may also be nested, meaning that lists can contain other lists.  

In [17]:
a = ['a', 'b', 'c']
n = [1, 2, 3]
x = [a, n]
x

[['a', 'b', 'c'], [1, 2, 3]]

In [18]:
x[0]

['a', 'b', 'c']

In [19]:
x[0][1]

'b'
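One detail is worth noticing before the next section. `x` does not hold copies of `a` and `n`; it holds references to the very same list objects. A change made to an inner list through `x` is therefore visible through `a`, and vice versa:

```python
a = ['a', 'b', 'c']
n = [1, 2, 3]
x = [a, n]

x[0][1] = 'B'  # change an element of the inner list through x
print(x)       # [['a', 'B', 'c'], [1, 2, 3]]
print(a)       # ['a', 'B', 'c'] (a changed too)

a.append('d')  # change a directly
print(x[0])    # ['a', 'B', 'c', 'd'] (visible through x)
```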

## Shallow Copies

A [shallow copy](https://docs.python.org/3.8/library/copy.html#shallow-vs-deep-copy) creates a new compound object and then inserts into it references to the objects found in the original.  This can be beneficial when dealing with large objects in memory, since you do not always need a full (deep) copy.  However, if you are unaware that a shallow copy is being used, it can lead to unexpected and hard-to-find errors.
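A minimal sketch contrasts a shallow copy made by slicing with a deep copy made by `copy.deepcopy()` from the standard `copy` module linked above. The variable names are illustrative.

```python
import copy

original = [[1, 2, 3], 42, "adf"]

shallow = original[:]           # new outer list, same inner objects
deep = copy.deepcopy(original)  # new outer list AND new copies of inner lists

print(shallow is original)        # False: the outer lists differ
print(shallow[0] is original[0])  # True: the inner list is shared
print(deep[0] is original[0])     # False: the inner list was copied

original[0].append(10)
print(shallow[0])  # [1, 2, 3, 10] (shared with original)
print(deep[0])     # [1, 2, 3] (independent)
```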

To demonstrate how shared references, such as those held by a shallow copy made by slicing or the reference returned by indexing, can lead to unexpected errors, consider the following:

In [20]:
a = [[1,2,3], 42, "adf"]
print(a)
b = a[0]
b += [10]
print(a)

[[1, 2, 3], 42, 'adf']
[[1, 2, 3, 10], 42, 'adf']


In the above example the list `a` contains another list as its first element.  The first `print` statement outputs the list `a` as expected.  The next line binds `b` to `a[0]`.  No copy is made at all: `b` refers to the very same list object stored in `a[0]`, just as each element of a shallow copy refers to the corresponding object in the original.  After this we append `10` to the list `b`, i.e. `[1,2,3]` -> `[1,2,3,10]`.  However, when we print `a` again after this change to `b`, we see that `a` also reflects the change even though we never explicitly modified `a`.  This is a consequence of the shared reference and the mutability of list objects.  The modification `b += [10]` changes the existing list in place rather than creating a new one, as would happen with immutable objects such as integers and strings.  Since `a[0]` and `b` refer to the same list, both reflect the change.

Let us try the same example again, but instead referring to an immutable type to see the difference:

In [21]:
a = [[1,2,3], 42, "adf"]
print(a)
b = a[1]
b += 10
print(a)

[[1, 2, 3], 42, 'adf']
[[1, 2, 3], 42, 'adf']
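The second example leaves `a` unchanged because integers are immutable. `b += 10` cannot modify the integer `42`, so it produces a new integer and rebinds `b` to it. The `is` operator tests whether two names refer to the same object, and it makes the difference visible:

```python
a = [[1, 2, 3], 42, "adf"]

b = a[1]
print(b is a[1])  # True: b and a[1] refer to the same int object

b += 10           # ints are immutable, so a new int (52) is created...
print(b)          # 52
print(b is a[1])  # False: ...and b is rebound to it; a[1] is untouched
print(a)          # [[1, 2, 3], 42, 'adf']
```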


# Exercises

1. What does the following code do, and why?
```
a = [[1,2,3], 42, "adf"]
print(a)
b = a[2]
b[1] = 'D'
print(a)
```
1. What does the following code do, and why?
```
a = [[1,2,3], 42, "adf"]
print(a)
b = a[0]
b = [3,2,1]
print(a)
```
1. If all slice operations return a shallow copy, why do we not run into the same issues with strings that were laid out for lists in the [Shallow Copies](#Shallow-Copies) section above?
1. Store the list `["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]` in a variable and complete the following:
    1. Print the middle three items.
    1. Print a single list containing the first two and last two items.
    1. Add "grapefruit" to the list.
    1. Remove "banana" and "cherry" from the list.


# Additional Material: Did you notice?

Why do the following examples give different results?

In [22]:
a = [[1,2,3], 42, "adf"]  # line 1
print(a)
b = a[0]  # line 2
b += [4]  # line 3
print(a)

[[1, 2, 3], 42, 'adf']
[[1, 2, 3, 4], 42, 'adf']


In [23]:
a = [[1,2,3], 42, "adf"] # line 1
print(a)
b = a[0] # line 2
b = b + [4] # line 4 - note we replace += with the expanded version
print(a)

[[1, 2, 3], 42, 'adf']
[[1, 2, 3], 42, 'adf']


In the first example above we bind the variable `b` to (that is, make it point to) the first element of `a`, which is a list.  We then use the `+=` operator to modify `b` by concatenating it with the list `[4]`.  We never explicitly modify `a`, yet we see that `a` has changed when we print it out.


What is happening?

We will have a section on name resolution later (07_name_resolution) that will give you additional detail, but I will attempt to give some insight here.

Typically, people think of variables as holding some value, even 'physically' containing the value by some means.  This is inaccurate.  In Python when we discuss variables we are really referring to [name binding](https://docs.python.org/3/reference/executionmodel.html#naming-and-binding), and a variable effectively 'points' to a region of memory that actually does contain the value.  When we use the assignment operator `=` we are telling the interpreter to add the name of our variable to an internal table and point it to the object from the right hand side.  You could implement this as a dictionary where the keys are variable names, and the values are the objects to which the variables 'point'.
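The dictionary analogy can be made concrete with a toy model. This is an illustration only; the dict named `namespace` is hypothetical and is not how CPython stores local variables. Module-level names, however, really do live in a dict, which `globals()` returns.

```python
# Toy model: a plain dict standing in for the interpreter's table of names
namespace = {}

namespace['a'] = [[1, 2, 3], 42, "adf"]  # like: a = [[1, 2, 3], 42, "adf"]
namespace['b'] = namespace['a'][0]       # like: b = a[0] (stores a reference, no copy)

# Both entries now refer to the same list object
print(namespace['b'] is namespace['a'][0])  # True

namespace['b'].append(4)                 # mutate the shared list
print(namespace['a'])                    # [[1, 2, 3, 4], 42, 'adf']

namespace['b'] = namespace['b'] + [5]    # like: b = b + [5] (new list, rebind 'b')
print(namespace['a'])                    # [[1, 2, 3, 4], 42, 'adf'] (unchanged)
print(namespace['b'])                    # [1, 2, 3, 4, 5]
```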

After `line 1` and `line 2` in the example above, we have something like the following:

![Picture1.png](attachment:Picture1.png)

The `+=` operator in `line 3` performs an **in-place** operation that appends `4` to the existing list (pointed to by `b`) rather than returning a new list (see [In-place Operators](https://docs.python.org/3/library/operator.html#in-place-operators) in the manual for much more information).  The result of the in-place operation is shown in the following figure:

![Picture2.png](attachment:Picture2.png)
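The `operator` module from the In-place Operators documentation provides `operator.iadd()`. It performs the same operation as `+=` but returns the result instead of assigning it, so the result can be compared with the original object. A small sketch contrasting a mutable list with an immutable string:

```python
import operator

b = [1, 2, 3]
result = operator.iadd(b, [4])  # same operation as b += [4], but returned
print(result is b)  # True: the list was modified in place and returned
print(b)            # [1, 2, 3, 4]

s = "abc"                       # strings are immutable
result = operator.iadd(s, "d")  # no in-place version exists, so s + "d" is used
print(result is s)  # False: a brand-new string was created
print(s)            # abc
print(result)       # abcd
```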



At this point you are probably fine.  However, in our discussion yesterday we saw that `x += 5` and `x = x + 5` are equivalent operations for integers, and if we try both forms with the list `b` on its own, they appear to be equivalent as well.

In [24]:
b = [1,2,3]
b += [4]
b

[1, 2, 3, 4]

In [25]:
b = [1,2,3]
b = b + [4]
b

[1, 2, 3, 4]


But wait, why does the second example (repeated below) give a different result?  

In [26]:
a = [[1,2,3], 42, "adf"] # line 1
print(a)
b = a[0] # line 2
b = b + [4] # line 4 - note we replace += with the expanded version
print(a)

[[1, 2, 3], 42, 'adf']
[[1, 2, 3], 42, 'adf']


The expression `b + [4]` is evaluated on its own (not in-place as with `+=`), thus creating a new list containing the values from `b` and `[4]`.  The assignment, `b = b + [4]` then redirects `b` to point at the new list and no longer at the list in `a[0]`.  

![Picture3.png](attachment:Picture3.png)



Similarly, let us consider another example.  In the following we replace the assignment/concatenation of `b` with `b = [3,2,1]`.



In [27]:
a = [[1,2,3], 42, "adf"] # line 1
print(a)
b = a[0] # line 2
b = [3,2,1] # line 5 - note we use an explicitly created list
print(a)


[[1, 2, 3], 42, 'adf']
[[1, 2, 3], 42, 'adf']


Again we see that `a` remains unchanged.  The assignment in `line 5`, just as in `line 4`, is re-assigning the variable `b` to point to the new list `[3,2,1]`.

### Not for the faint of heart

To get a better idea of what is happening under the hood, we need to introduce the Python built-in function [`id()`](https://docs.python.org/3/library/functions.html#id), which returns the "identity" of an object.  This identity is an integer that is guaranteed to be unique and constant for the object during its lifetime (two objects whose lifetimes do not overlap may have the same `id()` value).  If you are familiar with languages such as C++ where memory addresses can be used directly, these identities are somewhat similar.  In CPython the identity happens to be the object's memory address, but this is an implementation detail.  The language does not guarantee that identities represent any particular arrangement in memory, and they should not be used as such.

We will use the `id()` function to show how the variables change when using `+=` and `+` operators.

First we create our list `b` with `[1,2,3]` and output its identity.

In [28]:
b = [1,2,3]
print('id =', id(b))
print(b)

id = 140504704889152
[1, 2, 3]


We perform the in-place concatenation and again output the identity.

In [29]:
b += [4]
print('id =', id(b))
print(b)

id = 140504704889152
[1, 2, 3, 4]


Note that the identity of `b` does not change, but the contents of the list do.  Let us try this again using the `=` and `+` operators separately.

In [30]:
b = [1,2,3]
print('id =', id(b))
print(b)

id = 140504704874848
[1, 2, 3]


In [31]:
b = b + [4]
print('id =', id(b))
print(b)

id = 140504704874608
[1, 2, 3, 4]


Two things to notice:

- The id of `b` changed in the first block even though the list has the same values as the one we created at the start.  We are binding `b` to a new list, so this makes sense.

- The id of `b` changed between these two blocks for the same reason.  The expression `b + [4]` created a new list, and the separate `=` operator then bound `b` to it.

With this new understanding of how variables (or names) work in Python, let us redraw our figure of the lists `a` and `b` from the original example (repeated below).

In [32]:
a = [[1,2,3], 42, "adf"]  # line 1
print(a)
b = a[0]  # line 2
b += [4]  # line 3
print(a)

[[1, 2, 3], 42, 'adf']
[[1, 2, 3, 4], 42, 'adf']




![Picture4.png](attachment:Picture4.png)


A list is a compound data structure that can store heterogeneous data.  In our example we are storing a list, an integer and a string.  As such, it should make sense that the list is really a collection of references (much like names) that point to the objects we assign.


We can use the `id()` function to give us some additional insight.

In [33]:
a = [[1,2,3], 42, "adf"]  # line 1
print('a =', a)
print('id(a)    =', id(a))
print('id(a[0]) =', id(a[0]))
print('id(a[1]) =', id(a[1]))
print('id(a[2]) =', id(a[2]))

a = [[1, 2, 3], 42, 'adf']
id(a)    = 140504705532544
id(a[0]) = 140504705532304
id(a[1]) = 4532730272
id(a[2]) = 140504704078704


Again, notice that the identities of these objects are not necessarily adjacent or evenly spaced, as one would expect from the memory addresses of the elements of a C++ vector, for example.

In [34]:
b = a[0]  # line 2
print('b =', b)
print('id(b)    =', id(b))
print('id(a[0]) =', id(a[0]))


b = [1, 2, 3]
id(b)    = 140504705532304
id(a[0]) = 140504705532304


When we make the assignment from `line 2` we see that `b` is bound to the same identity as `a[0]`.  Thus `a[0]` and `b` both point to the same object.
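In everyday code the `is` operator is a more readable way to ask the same question. `x is y` is true exactly when `x` and `y` are the same object, i.e. when their identities match. A short sketch follows; `c` is an illustrative extra name.

```python
a = [[1, 2, 3], 42, "adf"]
b = a[0]

# `is` compares identities, just like comparing id() values
print(b is a[0])          # True
print(id(b) == id(a[0]))  # True

c = a[0][:]               # a slice builds a new list with equal values
print(c == a[0])          # True: same values
print(c is a[0])          # False: different objects
```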

### More Confusing Still - Small Integer Caching

Small integers are used so often in code that it is more efficient to create the objects for those values once, at startup, so they are available whenever needed.  This is a CPython implementation detail, and the exact range may vary between versions and implementations; current CPython caches the integers from -5 to 256 inclusive.  So what does this mean?

In [35]:
x = 42
print(id(42))
print(id(x))
print(id(a[1]))

4532730272
4532730272
4532730272


In the above example we assign the variable `x` to the value `42` and then output the identities of `42`, `x` and `a[1]` (since `a[1]` also contains the value `42`).  Notice that they all have exactly the same identity.  This is due to small integer caching.

While the above holds for integers in the 'small integer' range, it is not necessarily true that all variables that have the same value share the same identity.  Consider the following:

In [36]:
x = '123a'
print(id('123a'))
print(id(x))
x = '12,,3a'
print(id('12,,3a'))
print(id(x))

140504705548528
140504705548528
140504705550064
140504705549744


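The first two strings share the same identity above, but the second two do not.  This can happen with some immutable types.  CPython may reuse a single object for equal values; for example, it automatically interns many short strings that look like identifiers, such as `'123a'`.  Whether it does so is an implementation detail you should not rely on.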
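CPython's reuse of a single object for equal strings is called interning. It can also be requested explicitly with the standard library function `sys.intern()`. In the sketch below the strings are built at run time, so any compile-time sharing is ruled out. Whether `s1 is s2` holds is deliberately not relied on.

```python
import sys

# Build two equal strings at run time
s1 = "".join(["12,,", "3a"])
s2 = "".join(["12,,", "3a"])
print(s1 == s2)  # True: equal values

# sys.intern() returns one canonical object for each distinct string value
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)  # True: both names now refer to the same object
```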

In [37]:
x = [1,2,3]
print(id([1,2,3]))
print(id(x))

140504705532064
140504704873808


In this last example the two lists do not share the same identity even though their values are equivalent.  
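The practical lesson is to use `==` to compare values and reserve `is` for identity checks, for example `x is None`. For lists the distinction is easy to see, because every list display creates a new list object:

```python
x = [1, 2, 3]
y = [1, 2, 3]

print(x == y)  # True: == compares values
print(x is y)  # False: two separate list objects

y.append(4)    # so changing one cannot affect the other
print(x)       # [1, 2, 3]
print(y)       # [1, 2, 3, 4]
```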