# Week 4 : Lecture A 
 ## Data structures: Lists, sorting and tuples
 ##### CS1P - University of Glasgow - John H. Williamson - 2017/2018 Python 3.x

## Data structures
Organising data into **data structures** is a key part of programming. Most languages support a number of different types of data structures which **aggregate** (collect together) simple data like numbers, strings or other data structures. They are sometimes called **compound** data structures because they are made up of elements.

## Lists
Lists are probably the most important compound data type in Python and are very widely used. They are a **sequence type** and represent an ordered sequence of values. Sequence types are a specialised kind of **collection type** (they impose **order**); a **collection type** just means a data type that holds multiple elements.

I'll use **element** to mean a value that can be in a list, and **sequence** to refer to any ordered collection of **elements**. For example the list 

    [1,2,3]
    
is a **sequence** of **elements** 1, 2 and 3.

----

# Why lists?
How would you use lists to design a program? Lists are extremely versatile, and most Python programs will use them extensively. 

Lists are so useful that some other languages, like **Lisp** (LISt Processing), make lists the primary datatype. Programs in Lisp are centered around working with data in list form.

### Unordered collections
A common use is just to aggregate data together. Say you wanted a collection of all the albums on Spotify released by a particular artist. A list is a reasonable way to store this. We can then test if an album is in this list, or add new ones, and so on.

In [2]:
import spotipycaching 

# this actually queries spotify directly
artist = spotipycaching.get_artist("Boards of Canada")
albums = spotipycaching.get_albums(artist)

for album in albums:
    print(album)

Tomorrow's Harvest
The Campfire Headphase
Geogaddi
Music Has The Right To Children
Twoism
Reach For The Dead
In A Beautiful Place Out In The Country
Trans Canada Highway
Peel Session
Hi Scores
Hi Scores 2014 Edition
DJ-Kicks (Lone) [Mixed Tracks]
Late Night Tales: BADBADNOTGOOD
Mr Mistake (Boards Of Canada Remix Instrumental)
Mr Mistake (feat. Boards Of Canada) [Boards Of Canada Remix]
Late Night Tales: Nils Frahm
DJ-Kicks (DJ Koze) [Mixed Tracks]
Balance 026
Late Night Tales: Franz Ferdinand
Warp20 (Unheard)
Warp20 (Chosen)
Pretty Swell Explode
Guerolito (UK Only Version)
Remix EP #1
Corymb


In [3]:
# test if this album is in the list
print("Geogaddi" in albums)

True


In [4]:
# get a list of image URLs and show them
artist_images = spotipycaching.get_album_images('Stars of the Lid')
for image_url in artist_images:
    spotipycaching.show_image(image_url)


### Time-ordered data
Because lists are sequences, we can use them to store ordered data. For example, we might capture the rainfall each day in a measuring cup, and store it as a sequence:
<img src="imgs/rain_chart.jpg" width="600px">
*[Image credit: CambridgeBayWeather. Public Domain]*

In [9]:
week_rainfall = [0.5, 2.5, 10.7, 2.9, 0.0, 0.0, 1.8]
week_days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
for i in range(len(week_days)):
    print("On", week_days[i], week_rainfall[i], "mm of rain fell")

On Sun 0.5 mm of rain fell
On Mon 2.5 mm of rain fell
On Tue 10.7 mm of rain fell
On Wed 2.9 mm of rain fell
On Thu 0.0 mm of rain fell
On Fri 0.0 mm of rain fell
On Sat 1.8 mm of rain fell


### Vector
We can use lists to store numerical vectors. By a vector I mean a numerical data structure where we can perform operations like vector addition, scalar multiplication, and compute aggregated results like the sum or mean.
<img src="imgs/vector.png">

*[Image credit: User:Acdx via wikimedia commons https://commons.wikimedia.org/wiki/File:3D_Vector.svg]*

In [10]:
pt_1 = [0.0, 5.0, 7.0]
pt_2 = [1.0, 0.0, -1.0]
pt_3 = []

# add together two vectors
for i in range(3):
    pt_3.append(pt_1[i] + pt_2[i])
print(pt_3)

# sum is built in to Python
print(sum(pt_3))

[1.0, 5.0, 6.0]
12.0


## Irregular iteration
`for i in range()` is fine for numerical loops (e.g. every number from 0 to 20). But often you want to iterate over an irregular sequence of values. Lists make this trivial:

In [11]:
for bird in ["duck", "duck", "goose"]:
    print("You're a %s" % bird)

You're a duck
You're a duck
You're a goose


### Task queues
Imagine we have to schedule some actions to be performed in the future. This happens all the time when you press keys on your keyboard -- what actually happens is that a request to do something (like say copy some text) is put onto a queue. 

When the OS is ready to do something about that request it "pops" the request off and does some action. Lists are the natural data structure for this.

This "pattern" where a queue is used to hold tasks that might come in asynchronously (e.g. requests from a browser to a web server) and process them in a sensible order (e.g. the order they arrived in) is extremely common.

<img src="imgs/keyboard.jpg">
*[Image credit: Jeroen Bennink via flickr.com CC-BY 2.0]*


In [36]:
import subprocess

queue = []  # list of things to do
queue.append(["cd", "imgs"])
queue.append(["dir"])
queue.append(["cd", ".."])

# now we service those requests
while len(queue) > 0:
    # take the first task off the queue
    next_task = queue.pop(0)
    print(str(" ".join(next_task)))
    # some slight hackery to make this work on windows
    print(subprocess.check_output(["cmd", "/k"] + next_task).decode(), end=' ')

cd imgs

C:\Users\jhw___000\Dropbox\cs1p-2017\week_4\lecture_a\imgs> dir
 Volume in drive C is OS
 Volume Serial Number is F2AC-1EA1

 Directory of C:\Users\jhw___000\Dropbox\cs1p-2017\week_4\lecture_a

09/10/2017  11:11    <DIR>          .
09/10/2017  11:11    <DIR>          ..
07/09/2017  11:56    <DIR>          .ipynb_checkpoints
07/09/2017  11:56    <DIR>          imgs
09/10/2017  11:10             6,434 spotipycaching.py
21/09/2016  09:00             7,032 spotipycaching.pyc
09/10/2017  11:11            66,853 week_4_a.ipynb
09/10/2017  11:10    <DIR>          __pycache__
               3 File(s)         80,319 bytes
               5 Dir(s)  56,179,118,080 bytes free

C:\Users\jhw___000\Dropbox\cs1p-2017\week_4\lecture_a> cd ..

C:\Users\jhw___000\Dropbox\cs1p-2017\week_4> 
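
The same first-in, first-out pattern can be sketched without running any shell commands. This is only a minimal illustration: the task names are made up, and it uses `collections.deque` from the standard library (not used elsewhere in these notes) because popping from the front of a plain list gets slower as the list grows.

```python
from collections import deque

# hypothetical task names, standing in for real requests
tasks = deque()
tasks.append("copy text")    # requests arrive at the back of the queue...
tasks.append("paste text")
tasks.append("save file")

completed = []
while len(tasks) > 0:
    next_task = tasks.popleft()   # ...and are serviced from the front
    completed.append(next_task)
    print("Servicing:", next_task)
```

The tasks are serviced in the order they arrived, just as with `queue.pop(0)` on a list.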

### Data stack
**Stacks** are data structures which can have elements **pushed** onto them and **popped** off. 

The last element pushed is the first one popped (like stacking sheets of paper -- the first one you lift off the table ("pop") is the last one you put down ("push")). 

This is often a very convenient way of storing temporary results, if you need to be able to go back to a temporary calculation later. 

Ye olde HP calculators used to use this structure to do computations; which meant you never had to worry about brackets or precedence, but you did have to learn to write expressions down in **Reverse Polish Notation**.

<img src="imgs/hp35.jpg">
*[Image credit: Seth Morabito via flickr.com CC-BY-SA 2.0]*


    4 + (9 * 10 + 3 * 10) * 100
    
    becomes
    
    9 10 * 3 10 * + 100 * 4 +
    
    [push 9 and 10] 
    [pop two elements and multiply them, push the result] 
    [push 3 and 10] 
    [pop two elements and multiply them, push the result] 
    [pop two elements, add and push result] 
    [push 100] 
    [pop two elements and multiply them, push the result] 
    [push 4] 
    [pop two elements, add and push result] 
        


In [None]:
def add(stack):
    # pop two values, add and push the result
    stack.append(stack.pop() + stack.pop())


def mul(stack):
    # pop two values, multiply and push the result
    stack.append(stack.pop() * stack.pop())


# empty stack
stack = []

# push is called append in Python!
stack.append(9)  # 9
stack.append(10)  # 10
mul(stack)  # *
stack.append(3)  # 3
stack.append(10)  # 10
mul(stack)  # *
add(stack)  # +
stack.append(100)  # 100
mul(stack)  # *
stack.append(4)  # 4
add(stack)  # +

# print the output
print(stack.pop())
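
The step-by-step pushes and pops above can also be driven directly from the Reverse Polish Notation string. The sketch below is one possible way to do it; `evaluate_rpn` is a hypothetical helper name, and it assumes the expression contains only integers, `+` and `*`, separated by spaces.

```python
def evaluate_rpn(expression):
    # evaluate a space-separated Reverse Polish Notation string
    # containing only integers, + and *
    stack = []
    for token in expression.split():
        if token == "+":
            # pop two values, add and push the result
            stack.append(stack.pop() + stack.pop())
        elif token == "*":
            # pop two values, multiply and push the result
            stack.append(stack.pop() * stack.pop())
        else:
            # anything else is assumed to be an integer: push it
            stack.append(int(token))
    return stack.pop()


# same calculation as 4 + (9 * 10 + 3 * 10) * 100
print(evaluate_rpn("9 10 * 3 10 * + 100 * 4 +"))  # 12004
```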

### Spreadsheet-like tables
A spreadsheet or 2D table is a common way of representing data. A table can be seen as a list of rows, each of which is a list of elements, one per column. This representation is a *list-of-lists*, one list nested inside the other.


In [14]:
# a simple list
columns = ["Month", "Sales", "UnitCost", "Tax"]

# a list of lists
sheet = [["Jun", 205, 0.96, 0.2], 
         ["Jul", 193, 0.94, 0.2],
         ["Aug", 141, 1.02, 0.2], 
         ["Sep", 290, 0.88, 0.2]]

for col in columns:
    # ljust left justifies text to 12 characters -- i.e. it makes
    # each column exactly the same width
    print(col.ljust(12), end=' ')

# we will calculate this column from the existing data
print("Gross")

for row in sheet:
    for col in row:
        print(str(col).ljust(12), end=' ')
    # compute the gross receipts given the current month's figures
    print(row[columns.index("Sales")] * row[columns.index("UnitCost")] *
          (1 + row[columns.index("Tax")]))

    # When we see dictionaries we will see a much more efficient way of doing this.
    # We could instead write the line below, but then we hard-code the position of the columns!
    # print(row[1] * row[2] * (1 + row[3]))

Month        Sales        UnitCost     Tax          Gross
Jun          205          0.96         0.2          236.16
Jul          193          0.94         0.2          217.704
Aug          141          1.02         0.2          172.584
Sep          290          0.88         0.2          306.24


-----

# Lists in Python

#### Dynamically sized
Python lists are dynamically sized: this means you can add and remove elements as you wish, without having to declare the size of the list in advance.

In [15]:
a = [1, 2, 3]
a.append(4)  # we can just append values as we want
print(a)

[1, 2, 3, 4]


#### Dynamically typed 
Like all built in Python collection types, lists don't care about what type their constituent elements are, so you can have lists of numbers, strings, other lists, or any Python data type.

In [16]:
int_list = [1, 2, 3]
string_list = ["one", "two", "three"]

# note that it's fine for a list to hold more lists
# or any other collection type
list_of_lists = [int_list, string_list]

print(int_list)
print(string_list)
print(list_of_lists)

[1, 2, 3]
['one', 'two', 'three']
[[1, 2, 3], ['one', 'two', 'three']]


*Note that you can freely **mix** types within a single list!* In Python, data types are associated with individual values (e.g. `"three"` has type `str`) and collection types hold collections of values, each with their own individual type.

We don't have, for example, a special "list-of-string" type in Python -- you can mix in strings into any list.

In [17]:
mixed_list = [1, 2, "three", "four", 5, int_list]
print(mixed_list)

[1, 2, 'three', 'four', 5, [1, 2, 3]]


This is a consequence of Python's dynamic typing model: types belong to values, not to the containers that hold them, so no checks are imposed on what goes into a list. A list is just a sequence of any values.

## Syntax
Lists have a **literal syntax** (this means you can directly create a list in a single step in Python). 

Just put values between square brackets `[ ]` and separate with commas:

In [38]:
single_element_list = [1]
valid_list = [5, 6, 7, 8]
string_list = ["first", "last"]

# note that we can put lists inside lists using this syntax
nested_list = [1, [2, [3]], 4]

# we might write the card hand
# ace-of-hearts, ace-of-clubs, king-of-clubs, king-of-spades
# like this, as a list of four pairs
two_pair = [["a", "hearts"], 
             ["a", "clubs"], 
             ["k", "clubs"],
             ["k", "spades"]]

empty_list = []  # a list can have nothing at all in it

## Length
The length of a list -- the number of values it contains --  can be returned using `len()`. `len()` actually works for **any** sequence type, e.g. for strings.

In [19]:
a = []
print(a, len(a))
b = [1]
print(b, len(b))
c = [1,2,3]
print(c, len(c))

[] 0
[1] 1
[1, 2, 3] 3


In [20]:
## tricky: this list has one element -- which is itself a list
d = [[1,2,3]]
print(d, len(d))

[[1, 2, 3]] 1


## Indexing
To access a single element of a list, we use square brackets after the list. This is called **indexing**.

In [39]:
elements = ["air", "earth", "fire", "water"]
print(elements[0])  # first element of elements
print(elements[3])  # last element of elements

air
water


#### 0-based indexing
Python indexes lists beginning at 0 (not 1!), so the first element of a list is indexed by `[0]` and the last element by `[len(list)-1]`.

In [22]:
print(elements[4]) # this is an error at runtime -- there are only 4 elements!

IndexError: list index out of range

## Chained indices
If we have a list-of-lists, we can just stick more index operators on the end. This just means index the first list (this produces another list) and then index the result of the first indexing operation.
    
    

In [6]:
l = [[1,2,3], 
     [4,5,6], 
     [7,8,9]]

print(l[0][0], l[0][1], l[1][0])

1 2 4


In [8]:
# l[0][1] is just the same as
# indexing in two separate steps:
row = l[0]
elt = row[1]
print(elt)

2


Note that it's completely fine to have any other operation after the indexing operator:

In [25]:
# count the number of occurrences of 1 in the first row
print(l[0].count(1))

1


## Negative indices
If you use a negative index, Python treats it as counting backwards from the end of the list. For example `elements[-1]` means the last element in the list; `elements[-2]` means the second-to-last, and so on.


In [41]:
print(elements[-1]) # you might think this would be an error... but it isn't!
print(elements[-2])

water
fire


## Slicing

As well as **indexing** which extracts a specific element, Python lets you **slice** sequence types like lists. This makes it very easy to pick out subsections of a list. Slicing "chops out" a subsequence from a sequence. It works for lists, strings, arrays and many other sequence types.

Slicing uses the syntax:

    my_list[start:end]

and will return all elements starting at `start` and ending at **but not including** `end`.

In [9]:
a = list(range(20))   # list(range(20)) creates a list of the numbers 0..19

print("a", a)

a [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]


In [43]:
print("0:5 ", a[0:5])   # creates a new list with 
                        # first five elements of a
    
# We can just omit the start
# it will default to 0
print(":5 ", a[:5])     # creates a new list 
                        # with first five elements of a

0:5  [0, 1, 2, 3, 4]
:5  [0, 1, 2, 3, 4]


In [None]:
print("5:10", a[5:10])  # 6th to 10th element (indices 5 to 9)

In [10]:
print("10:", a[10:])  # After the tenth element

10: [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]


In [44]:
print(":-5 ", a[:-5])   # omitting the first index means the same as zero; 
               # negative numbers work exactly as in indexing
               # this takes everything except the last five elements
        

:-5  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]


In [46]:
print("-5: ", a[-5:])    # omitting the last index means until the end
               # this takes the last five elements only

-5:  [15, 16, 17, 18, 19]


## Advanced slicing
You can specify **three-element** slices in Python:
   
       my_list[a:b:c]

which means take the elements from `a:b` (as before), **taking only every `c`th element**

In [48]:
a = list(range(20))   # list(range(20)) creates a list of the numbers 0..19

print("0:10", a[0:10])   # simple 2 element slice
print("0:10:1", a[0:10:1]) # exactly the same as a[0:10]
print("0:10:2", a[0:10:2]) # every second element of a[0:10]
print("0:10:20", a[0:10:20]) # every twentieth element of a[0:10]; 
print("::3", a[::3]) # every third element of a
# note that the count starts at zero, so we will always get the first element

0:10 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
0:10:1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
0:10:2 [0, 2, 4, 6, 8]
0:10:20 [0]
::3 [0, 3, 6, 9, 12, 15, 18]


#### Slicing backwards
If the step is negative, we will step **backwards** through the list. This is exactly how the `range()` function worked in defining `for` loops.

In [57]:
a = list(range(20))

print(a[10:0:-1]) # from index 10 down to (but not including) index 0
print(a[10:0:-2]) # the same range, taking only every second element

## shorthand to reverse a list
print(a[::-1])   # the whole list, in reverse order

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[10, 8, 6, 4, 2]
[19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


### Item and slice assignment

We can assign directly to list elements. This has an indexed list on the left-hand side (LHS) and any value on the right-hand side (RHS):

    list[index] = value
    
    

In [2]:
l = [1,2,3]
l[0] = "Wonderful"
print(l)

['Wonderful', 2, 3]


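This also works with slices. The RHS can be any sequence; its elements replace the elements selected by the slice on the LHS. For a plain `a:b` slice the two sides do not need to be the same length (the list grows or shrinks to fit), but a slice with a step (`a:b:c`) must be given exactly as many elements as it selects.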

    mylist[a:b] = otherlist

In [3]:
# you can assign to slices -- here both sides happen to have two elements,
# so the list stays the same length
l[0:2] = ["seven", "nine"]
print(l)

['seven', 'nine', 3]
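
Here is a small sketch of what happens when the two sides of a slice assignment differ in length. The variable names are just for illustration. With a plain `start:end` slice the list grows or shrinks to fit; with a stepped slice the number of values has to match the number of slots.

```python
letters = ["a", "b", "c", "d"]

# replace two elements with three: the list grows
letters[1:3] = ["X", "Y", "Z"]
print(letters)  # ['a', 'X', 'Y', 'Z', 'd']

# replace three elements with none: the list shrinks
letters[1:4] = []
print(letters)  # ['a', 'd']

# with a step, the lengths really do have to match
numbers = [0, 1, 2, 3, 4, 5]
numbers[::2] = [10, 20, 30]  # three slots, three values: fine
print(numbers)  # [10, 1, 20, 3, 30, 5]
```

Assigning, say, two values to `numbers[::2]` would raise a `ValueError` instead.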


In [11]:
# any sequence will do
# does not have to be a list on both sides
l[0:2] = "ab"
print(l)

['a', 'b', 3]


In [4]:
l1 = [1,2,3,4,5]
l2 = ["A", "B", "C", "D", "E"]
l1[2:4] = l2[3:5]
print(l1)

[1, 2, 'D', 'E', 5]


In [12]:
# I did not actually know this would work, but it does
x = list(range(20))
# assign 0 to every even-indexed element
x[::2] = [0] * 10
print(x)

[0, 1, 0, 3, 0, 5, 0, 7, 0, 9, 0, 11, 0, 13, 0, 15, 0, 17, 0, 19]


## Iterating
One of the most common uses of a sequence data type is to do something to every element in order. For example, to sum all the numbers in a list, or to add an apostrophe to every string in a list.

In Python, we use the `for` loop to iterate over lists

    for element in my_list:
        some_fn(element)
        

In [2]:
# simple use of for loop
chinese_elements = ['earth', 'fire', 'water', 'wood', 'metal']
def print_all(my_list):
    for elt in my_list:
        print(elt)
        
print_all(chinese_elements)

earth
fire
water
wood
metal


In [76]:
# add together all of the elements in a list
def sum_list(my_list):
    total = 0
    for elt in my_list:
        total += elt
    return total


print(sum_list([1, 1, 10, 1]))

13


In [None]:
# actually, Python already has sum() built in because it is so often used
print(sum([1,1,1,1]))

A very common problem is to do something to every element of a list and create a *new list* of the transformed elements. One way to do this is by creating an empty list and appending the transformed elements to it:


In [78]:
## multiply each element of a list by a constant
def mul_list(x, a):
    new_list = []  # a shiny, new empty list
    for n in x:
        new_list.append(n * a)  # add the transformed elements
    return new_list

In [79]:
print(mul_list([1, 2, 3], 2))

[2, 4, 6]


This works fine, but we'll see a much cleaner way of doing this type of operation when we cover **list comprehensions** later in the term.

### Counted loops
It's possible to use a **counted loop** to iterate through the elements of a list, where we have an **index** and increment it to go through the elements. However, this is much clumsier than the `for x in l:` syntax and should be avoided where possible.

Always prefer to use:
    
    for elt in list:
        # do something to elt
        
instead of the counted loop alternative

    for i in range(len(list)):
        # do something to list[i]
        

In [48]:
# the use of range() with len() works
# for any sequence type
for i in range(len(chinese_elements)):
    print(i, chinese_elements[i])

0 earth
1 fire
2 water
3 wood
4 metal


### Enumerate
Sometimes it is useful to both get the elements of a list and their index. This is particularly handy if you want to index two lists at the same time. The handy `enumerate()` function makes that easy:


In [3]:
chemical_elements = ["hydrogen", "helium", "lithium", 
                     "beryllium", "boron"]
for i, elt in enumerate(chinese_elements):
    print(i, elt, chemical_elements[i])

0 earth hydrogen
1 fire helium
2 water lithium
3 wood beryllium
4 metal boron


# List operations

We can do many things with lists:
* index and slice `l[0]` `l[0:5]`
* join `list_a + list_b`
* add elements  `list_a.append("one")`
* remove elements `list_a.remove("one")` or `del list_a[0]`
* sort  `sorted(list_a)`
* test for membership `elt in list_a`
* find the index of elements `list_a.index("one")`
* copy them `list_a.copy()`
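
As a quick tour, the sketch below runs each of these operations once on a couple of throwaway lists (the names and values are made up for illustration). Note that finding the position of a value is done with the list's `index` method.

```python
list_a = ["one", "two"]
list_b = ["three"]

joined = list_a + list_b        # join: makes a new list
list_a.append("four")           # add an element in place
list_a.remove("one")            # remove by value
del list_a[0]                   # remove by index
in_order = sorted(joined)       # sort into a new list
has_two = "two" in joined       # membership test
where = joined.index("three")   # index of the first match
duplicate = joined.copy()       # an independent copy

print(joined)     # ['one', 'two', 'three']
print(list_a)     # ['four']
print(in_order)   # ['one', 'three', 'two']
print(has_two, where)   # True 2
```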

## Joining
The `+` operator is **overloaded** for use with lists (and for other sequence types such as strings and tuples). This means that using `+` on lists will join them together. Note that `+` does **not** add each element of one list to the corresponding element of another; it joins two lists into a new one.

Joining of sequences is called **concatenation** in computer science; the `+` operator is sometimes called the "concatenation operator" when used on lists.

In [29]:
chinese_elements = ['earth', 'fire', 'water'] + ["wood", "metal"]
print(chinese_elements)

['earth', 'fire', 'water', 'wood', 'metal']


In [30]:
a = [1, 2, 3]

print(a + a + a)  # we can concatenate many lists

[1, 2, 3, 1, 2, 3, 1, 2, 3]


#### The * operator on lists
Somewhat less usefully, the `*` operator is also overloaded; it repeats a list multiple times. One operand must be a list and the other must be an integer:


In [31]:
print(a*3)   # same as a + a + a
print(a*1)   # one repetition, same as a[:]
print(a*0)   # empty list -- note that it is quite ok to have an empty list

[1, 2, 3, 1, 2, 3, 1, 2, 3]
[1, 2, 3]
[]


## Adding and removing
As well as creating new lists, lists can be modified **in place**. **This is a major difference from types we have seen so far**. We can add new elements to a list with `.append()`

In [63]:
l = []
l.append(1)
l.append(2)
l.append(3)
print(l)

[1, 2, 3]


This doesn't make a new list. It changes the elements of the existing list. This is a very important difference.

#####  Append is not +

In [5]:
a = [1,2,3]
b = a+a
print("b", b)

c = []
c.append(a)
c.append(a)
print("c", c)

b [1, 2, 3, 1, 2, 3]
c [[1, 2, 3], [1, 2, 3]]


We can insert elements at any position using `list.insert(index, element)`

In [17]:
l = ["one", "two"]

# the argument to insert
# says to insert *just before* the
# element that is currently
# in that position
l.insert(1, "one and a half")
# insert at start
l.insert(0, "a half")   
print(l)


['a half', 'one', 'one and a half', 'two']


### Popping: lists like stacks
If we want to treat a list like a stack, we can "pop" the elements off using .pop(). Note that this **removes** the last element and returns it.

In [64]:
l = [1, 2, 3]
print(l)
print(l.pop(), l)
print(l.pop(), l)
print(l.pop(), l)

[1, 2, 3]
3 [1, 2]
2 [1]
1 []


#### Popping off the front
If we want to use a list like a queue, we can tell `pop()` to pop elements off the start of the list instead of the end.

In [65]:
l = [1,2,3]
print(l.pop(0), l)   # now we pop from the start
print(l.pop(0), l)   # now we pop from the start
print(l.pop(0), l)   # now we pop from the start

1 [2, 3]
2 [3]
3 []


### Deleting by value
We can remove elements with a specific value using `.remove(val)`. This will:
* search the list to find `val`
* remove it from the list

Each call to `remove` will only remove *one* occurrence of `val`. 

In [26]:
l = ["alpha", "bravo", "charlie", "bravo"]
l.remove("bravo")
print(l)
l.remove("bravo")
print(l)


['alpha', 'charlie', 'bravo']
['alpha', 'charlie']


In [27]:
l.remove("bravo")
print(l)

ValueError: list.remove(x): x not in list

### Deleting at index
We can remove elements using the `del` statement (note that `del` is a **statement**, not a function, so it is written without brackets). We must specify where in the list we want to remove the element from.

In [28]:
l = [1,2,3]
del l[1]   # remove the second element
print(l)
del l[1]
print(l)

[1, 3]
[1]


Using `.remove()` is generally slower than using `del` because it requires a **linear search** through the list to find an element that matches.

## WARNING: Do not modify a list while iterating over it
If you try to modify a list while iterating over it, you can get some very unexpected results, often without any error message:

In [29]:
num_list = [2,2,3,2,1,2,3,2,2,2,5,2,2,2,2]
for i,num in enumerate(num_list):
    if num==2:
        # NO!
        del num_list[i]
print(num_list)

[2, 3, 1, 3, 2, 5, 2, 2]
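
Deleting shifts the remaining elements down while the loop keeps counting up, so some of the 2s get skipped. One safe alternative, sketched below, is to leave the original list alone and append the elements you want to keep to a new list (`kept` is just an illustrative name).

```python
num_list = [2, 2, 3, 2, 1, 2, 3, 2, 2, 2, 5, 2, 2, 2, 2]

# collect the elements we want to keep in a fresh list
kept = []
for num in num_list:
    if num != 2:
        kept.append(num)

print(kept)  # [3, 1, 3, 5]
```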


### Mutability
Note that we have actually changed the lists using `.append()` and `del`. The property of a data structure being changeable is called **mutability** (the ability to **mutate** or change). This mutability of `list`s has important consequences, which we'll discuss below.

## Membership test
We can see if an element is present in a sequence using the `in` operator. This returns a Boolean value which will be `True` if the value on the left of `in` is in the sequence on the right.

In [1]:
print("earth" in chinese_elements)
print("helium" in chinese_elements)

True
False

In [69]:
print(1 in [1,2,3]) # ok

True


In [70]:
print(1 in [[1,2,3]]) # not true: the element on the right is a list of lists

False


In [71]:
my_list = [1,2,3]

# define a function to test if an element is inside a sequence
def print_if_in(elt, l):    
    if elt in l:
        print("%s is in %s" % (elt,l))
    else:
        print("%s is not in %s" % (elt, l))
        
print_if_in(1, my_list)
print_if_in(10, my_list)

1 is in [1, 2, 3]
10 is not in [1, 2, 3]


## Finding elements
We can find which index an element is at using `index()`, which takes a value and returns the *first* index where that element appears:

In [72]:
print("Fire is at index", chinese_elements.index("fire"))

Fire is at index 1

It will cause an error if you attempt to find the index of an element that is not in a list.

In [54]:
# calling .index("helium") directly would cause an exception, since helium
# ain't there! Testing for membership first avoids the error
if "helium" in chinese_elements:
    print("Helium is at index", chinese_elements.index("helium"))

We can also count how many times an element appears in a list using `list.count(value)`

In [73]:
fruits = ["apple", "apple", "apple", "pear", "orange", "pear", "apple"]

for name in ["apple", "pear", "orange", "kumquat"]:
    print("There are %d %ss" % (fruits.count(name), name))

There are 4 apples
There are 2 pears
There are 1 oranges
There are 0 kumquats


## Sorting, reversing and shuffling
### Sorting
Many sequences can be **ordered** according to some attribute; a list of numbers can be ordered according to their magnitude, for example. A list of strings can be ordered lexicographically (i.e. alphabetically).

Python provides very efficient sorting functionality to sort sequences into order. Sorting sequences is one of the essential operations in computer science. It is used very extensively.

There are two ways to sort: **in-place** which changes the list directly; or by **copying** which creates a new list with the sorted elements in it.

`list.sort()` sorts the elements of a list in-place. `sorted(seq)` returns a new list with the elements in order and leaves the original unchanged.

In [83]:
anagram = ["s", "m", "t", "o", "a", "l"]
anagram.sort()
print(anagram)

['a', 'l', 'm', 'o', 's', 't']


In [84]:
anagram = ["s", 'm', "t", "o", "a", "l"]
sorted_anagram = sorted(anagram)
print(anagram)          # this will not have changed
print(sorted_anagram)   # this will be the sorted version

['s', 'm', 't', 'o', 'a', 'l']
['a', 'l', 'm', 'o', 's', 't']


### The order of lists
Sorting uses whatever the result of the comparison operators (`<`, `>`, `==`) would be. Any type you can compare can be sorted. Lists are compared element by element: first by their first elements, then, if those are equal, by their second elements, and so on. So sorting a list of lists has a well-defined and useful effect:

In [55]:
menu_items = [[5,"pork"], [7,"beef"], 
              [13, "chicken"], [11, "tofu"], 
              [3, "prawn"], [1, "vegetable"]]
print(sorted(menu_items))

[[1, 'vegetable'], [3, 'prawn'], [5, 'pork'], [7, 'beef'], [11, 'tofu'], [13, 'chicken']]
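
To see what happens when the first elements tie, here is a small sketch with made-up data. When two inner lists start with the same number, the comparison moves on to the second element.

```python
orders = [[5, "pork"], [5, "beef"], [1, "vegetable"], [5, "chicken"]]

print(sorted(orders))
# [[1, 'vegetable'], [5, 'beef'], [5, 'chicken'], [5, 'pork']]

# the 5s tie, so "beef" < "pork" decides the comparison
print([5, "beef"] < [5, "pork"])  # True
```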


## Shuffling
Another useful operation on lists is **random shuffling**. This is often important in games (e.g. shuffling a deck of cards) and in statistical models (to estimate the distribution of data). The Python module `random` provides a `shuffle()` function:

In [86]:
l = [2, 3, 4, 5, 6, 7, 8, 9, 10, "jack", "king", "queen", "ace"]

import random  # make shuffle available

# NB shuffle works **in-place**, like .sort() does
random.shuffle(l)
print(l)

[5, 9, 'queen', 'king', 6, 2, 'ace', 'jack', 3, 10, 7, 8, 4]


In [87]:
random.shuffle(l)
print(l)

[6, 5, 'queen', 8, 9, 'ace', 4, 10, 7, 'jack', 3, 'king', 2]


## Copying and references
Let's return to the idea of **mutability**. In Python, simple types like numbers and strings are **immutable** (they cannot be changed after they have been created).

In [57]:
## integers
a = 32
b = a
b = b + 1
print("a=",a)
print("b=", b)
print()

## strings
c = "hello"
d = c
d += " world"
print("c=",c)
print("d=", d)

a= 32
b= 33

c= hello
d= hello world


But lists are **mutable**; they can be changed after they have been created.
The list itself lives off in some bit of memory -- variables just **point at the** list.  They are **references** to the list that has been created.

If multiple variables point at one single list then making a change to the list will appear for every variable that refers to it!

In [88]:
a = [1,2,3]
b = a
b.append("!")

# note: both a and b will be [1,2,3,"!"]
# because they both refer to the *same* list
print("a=",a)
print("b=",b)

a= [1, 2, 3, '!']
b= [1, 2, 3, '!']


## Side effects of mutability
This can have subtle side effects:

In [89]:
# x is the list [0] repeated 10 times: ten references to the *same* inner list
x = [[0]] * 10
print(x)

[[0], [0], [0], [0], [0], [0], [0], [0], [0], [0]]


In [90]:
y = x
# this means to take the first element of the 
# first element of y and set it to 1
y[0][0] = 1

# what happens to x?
print(x)

[[1], [1], [1], [1], [1], [1], [1], [1], [1], [1]]
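
Every entry changed because `*` repeats *references* to one inner list rather than making ten separate lists. A minimal sketch of the difference, with illustrative names, is below; building each inner list inside a loop gives independent entries.

```python
# every entry of shared refers to the *same* inner list
shared = [[0]] * 3
print(shared[0] is shared[1])  # True: both names refer to one object

# building each inner list separately gives independent entries
separate = []
for i in range(3):
    separate.append([0])  # a brand new [0] each time round the loop

separate[0][0] = 1
print(separate)  # [[1], [0], [0]]
```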


### Copying
If you want to work with a list without affecting existing variables which refer to it, you need to make a **copy** of the list. 

This is true for all data structures in Python (it's true for numbers and strings too, but there is no way to modify them, so no need to ever copy them).

**Handy note: slicing a list returns a new list with the same elements.**

Thus, the syntax [:] (a slice taking the whole list) can be used to copy a list:

In [91]:
a = [1,2,3]
b = a[:]          # create a new list with the same entries as are in a
b.append("!")

# Now, a and b refer to *different* lists
print("a=",a)
print("b=",b)

a= [1, 2, 3]
b= [1, 2, 3, '!']
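
The slice is not the only way to copy. As a sketch, the three spellings below should all give a new list with the same elements (the variable names are only for illustration).

```python
original = [1, 2, 3]

copy_a = original[:]      # slice of the whole list
copy_b = original.copy()  # the list's own copy method
copy_c = list(original)   # build a new list from the old one

original.append("!")
print(original)                # [1, 2, 3, '!']
print(copy_a, copy_b, copy_c)  # [1, 2, 3] [1, 2, 3] [1, 2, 3]
```

All three copy only the outer list. If the elements are themselves lists, the copies still share those inner lists.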


You can test two lists for **equality**, which tests if they have the same elements:

In [62]:
a = [1,2,3]
b = [1,2,3]
print(a==b)

True


But if you want to test whether two variables refer to the very same list, rather than one being a **copy** of the other, you can use the `is` operator. This tests if two variables refer to the same object, not if their elements are equal. 

**Make sure you understand this difference!**

In [63]:
a = [1,2,3]
b = a      # A *reference* to a
c = a[:]   # A *copy* of a

print("a==b", a==b)
print("a==c", a==c)
print("a is b", a is b)
print("a is c", a is c)

a==b True
a==c True
a is b True
a is c False


In [64]:
# This means that if we changed b, the value of a would change as well. 
# But has no effect on c, which is a separate list
b.append("sentinel")
print(a)
print(b) 
print(c)

[1, 2, 3, 'sentinel']
[1, 2, 3, 'sentinel']
[1, 2, 3]


### Operators
Operators like `+` (concatenate) return new lists and do **not** modify lists in place:

In [65]:
a = [1,2,3]
b = [4,5,6]
c = a + b         # c is now a new list which has the elements of a and b
c.append("!") 
print(a)
print(b)
print(c)

[1, 2, 3]
[4, 5, 6]
[1, 2, 3, 4, 5, 6, '!']


### Mutable vs immutable operations
In Python, there is a simple rule:
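* By convention, if a function or method (e.g. `a.sort()`) changes a data structure **in place** (i.e. **mutates** it), it returns `None`. (A few methods, such as `.pop()`, mutate the list and return the element they removed.)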
* If it does not change the original data structure, then it **creates an entirely new data structure** (e.g. `sorted(a)`), fills it with elements from the original list and returns the new data structure.
* Operators like +, * etc. always return new lists, and don't modify lists in place.

You should follow this convention in any code that you write!
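
This convention can be checked directly with `sorted()` and `.sort()`. The sketch below uses a made-up list `numbers`; the point is what each call returns and what happens to the original list.

```python
numbers = [3, 1, 2]

new_list = sorted(numbers)   # returns a new sorted list, numbers is unchanged
print(new_list)              # [1, 2, 3]
print(numbers)               # [3, 1, 2]

result = numbers.sort()      # sorts numbers in place and returns None
print(result)                # None
print(numbers)               # [1, 2, 3]
```

A common mistake is to write `numbers = numbers.sort()`, which replaces the list with `None`.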


-----
## Tuples
Sometimes it is useful to have sequences which are **immutable**; they cannot change after creation. There are several use cases for this; we will see one important case when we cover *dictionaries*.

Python provides **tuples** to fill this role. The **literal syntax** is just like for lists, except that round brackets `(` and `)` are used:


In [67]:
a_tuple = (1,2,3) # this is a tuple (round brackets)
a_list = [1,2,3]  # this is a list (square brackets)

In [68]:
a_list[0] = 5 # fine, lists can be mutated
print(a_list)

[5, 2, 3]


In [69]:
a_tuple[0] = 5 # this will be an error, because tuples cannot be mutated

TypeError: 'tuple' object does not support item assignment

You can create a new tuple from a list (or any other sequence type) using `tuple()`,
and since tuples are sequence types themselves, you can convert tuples to lists using `list()`


In [93]:
a = [1,2,3]
b = tuple(a)  # new tuple with same values as a
c = list(b)   # new list with same values as b
print(a,b,c)

[1, 2, 3] (1, 2, 3) [1, 2, 3]


Converting a list to a tuple "freezes" it, as the tuple version can no longer be modified. The original list is unaffected and can still be changed.

In [94]:
b[0] = 10 # error, b is frozen

TypeError: 'tuple' object does not support item assignment

In [95]:
a[0] = 10 # fine
print(a, b)

[10, 2, 3] (1, 2, 3)


The round brackets are there just to avoid ambiguity, and you can often leave them off:

In [59]:
a = 1,2,3
print(a)

# this doesn't generate a tuple
# because commas mean "separate arguments" inside
# a call. 
print(1,2,3)

# we need the brackets to make it unambiguous that we want to print a tuple
print((1,2,3))

(1, 2, 3)
1 2 3
(1, 2, 3)
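
Since it is really the commas that make a tuple, a tuple with a single element needs a trailing comma. A minimal sketch (the variable names are just for illustration):

```python
not_a_tuple = (5)     # just the number 5 in brackets
one_tuple = (5,)      # a tuple with one element: note the trailing comma
also_one = 5,         # the brackets are optional here too
empty = ()            # the empty tuple is written with brackets alone

print(type(not_a_tuple))   # <class 'int'>
print(one_tuple)           # (5,)
print(also_one)            # (5,)
print(len(empty))          # 0
```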


### Converting to lists
Any sequence can be converted to a list using `list()`

In [60]:
list('string')

['s', 't', 'r', 'i', 'n', 'g']

There's a built-in function `reversed()` which reverses a sequence. For good reasons, it returns an **iterator** rather than a list; this is a data type which we haven't seen yet. But we can easily convert it to a list with `list()`:

In [61]:
forward = ["for", "wa", "rd"]
backward = reversed(forward)
print(backward) # what is this?!
print(list(backward))

<list_reverseiterator object at 0x000002707B7262B0>
['rd', 'wa', 'for']
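
A brief sketch of a few related points, using the same `forward` list: `reversed()` accepts other sequence types, a slice with step `-1` gives a reversed copy directly, and the object returned by `reversed()` can only be run through once.

```python
forward = ["for", "wa", "rd"]

# reversed() works on other sequence types too
print(list(reversed((1, 2, 3))))   # [3, 2, 1]
print(list(reversed("abc")))       # ['c', 'b', 'a']

# a slice with step -1 gives a reversed copy directly
print(forward[::-1])               # ['rd', 'wa', 'for']

# the object returned by reversed() can only be consumed once
backward = reversed(forward)
first_pass = list(backward)        # ['rd', 'wa', 'for']
second_pass = list(backward)       # [] because nothing is left
print(first_pass, second_pass)
```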


## Multiple return values and unpacking
We saw that Python can return **multiple values from a function** using syntax like:

    def two_names():
        return "First", "Last"
        
What's actually happening is that the `return a,b,c` syntax is creating a new **tuple** and returning it. 

In [19]:
def two_names():
    return "First", "Last"
print(two_names())

('First', 'Last')


In [20]:
# here we **unpack** into two variables
first, last = two_names()
print(first)
print(last)

First
Last


In [98]:
# here we keep the return value as a single tuple
ret_val = two_names()
print(ret_val)

('First', 'Last')
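
As a slightly more practical sketch, here is a hypothetical helper `min_max()` (not a built-in) that returns two results at once. The caller can either keep the tuple or unpack it.

```python
def min_max(values):
    # hypothetical helper: returns the smallest and largest value as a tuple
    return min(values), max(values)

result = min_max([4, 1, 9, 7])
print(result)            # (1, 9)
print(type(result))      # <class 'tuple'>

lowest, highest = min_max([4, 1, 9, 7])
print(lowest, highest)   # 1 9
```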


### Unpacking
This **unpacking** syntax, where we can write

    a,b,c = (1,2,3)
    
or more generally:

    var_1, var_2, ... = some_tuple
        
works for any sequence type. But there must be the same number of variables on the left hand side as there are elements in the sequence!

In [99]:
a, b = [500, 1000]
print(a)
print(b)

500
1000


In [100]:
a, b = list(range(20)) # create a list of numbers 0..19
print(a,b)

ValueError: too many values to unpack (expected 2)
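
Unpacking also works in the header of a `for` loop, which is handy when iterating over a list of tuples. A small sketch with made-up data (`pairs` is just an example list):

```python
pairs = [("a", 1), ("b", 2), ("c", 3)]

total = 0
for letter, number in pairs:    # each tuple is unpacked into two variables
    print(letter, number)
    total = total + number
print(total)                    # 6

# strings are sequences too, so they can be unpacked character by character
x, y, z = "abc"
print(x, y, z)                  # a b c
```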

# Finally
Please send me feedback via YACRS. You have one tweet worth of text!

-----

## Week Review

* Lists are a data structure for **sequences**.
* They can hold any type of data -- anything that can go in a variable can go in a list.
* You can write a list in code using comma separated values within square brackets `[1,2,3]`.
* You can get the length of a list using `len`
* You can join lists with `+`
* They can be changed after they have been created, using methods like `.append()` and the `del` statement.
* `del` removes elements by *index*, `.remove()` will remove an element by *value*.
* Slices `[start:end]` allow you to chop out sections of lists (e.g. `l[5:8]`), and in the two colon form `[start:end:step]` also allow you to reverse lists and/or take every *nth* item.
* Iterating over lists is the most common control structure in Python `for x in l:`
* `in` will tell you if an element is in a list.
* `.index()` will tell you where an element is in a list.
* `.sort()` and `sorted()` can sort a list for you. The `key` parameter lets you choose how to transform list items before they are compared for sorting.
* Lists are mutable. They can be changed after creation.
* The fact that you can change them after they have been created is handy, but you must be careful when multiple variables refer to one list.
* Tuples are like lists, but are immutable. They cannot be changed after creation.
* Returning multiple values and parallel assignment are really using tuples.
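
As a quick reminder of two of the points above, here is a minimal sketch of `del` versus `.remove()` and of sorting with `key` (the lists are made up for illustration):

```python
letters = ["a", "b", "c", "b"]
del letters[0]          # removes by index: ['b', 'c', 'b']
letters.remove("b")     # removes the first matching value: ['c', 'b']
print(letters)          # ['c', 'b']

words = ["pear", "fig", "banana"]
print(sorted(words))            # ['banana', 'fig', 'pear']
print(sorted(words, key=len))   # ['fig', 'pear', 'banana']
```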

## Syntax review [from learnxinyminutes.com]

In [None]:
# Lists store sequences
li = []
# You can start with a prefilled list
other_li = [4, 5, 6]

# Add stuff to the end of a list with append
li.append(1)    # li is now [1]
li.append(2)    # li is now [1, 2]
li.append(4)    # li is now [1, 2, 4]
li.append(3)    # li is now [1, 2, 4, 3]
# Remove from the end with pop
li.pop()        # => 3 and li is now [1, 2, 4]
# Let's put it back
li.append(3)    # li is now [1, 2, 4, 3] again.

# Access a list like you would any array
li[0]  # => 1
# Assign new values to indexes that have already been initialized with =
li[0] = 42
li[0]  # => 42
li[0] = 1  # Note: setting it back to the original value
# Look at the last element
li[-1]  # => 3

# Looking out of bounds is an IndexError
li[4]  # Raises an IndexError

# You can look at ranges with slice syntax.
# (It's a closed/open range for you mathy types.)
li[1:3]  # => [2, 4]
# Omit the beginning
li[2:]  # => [4, 3]
# Omit the end
li[:3]  # => [1, 2, 4]
# Select every second entry
li[::2]   # =>[1, 4]
# Reverse a copy of the list
li[::-1]   # => [3, 4, 2, 1]
# Use any combination of these to make advanced slices
# li[start:end:step]

# Remove arbitrary elements from a list with "del"
del li[2]   # li is now [1, 2, 3]

# You can add lists
li + other_li   # => [1, 2, 3, 4, 5, 6]
# Note: values for li and for other_li are not modified.

# Concatenate lists with "extend()"
li.extend(other_li)   # Now li is [1, 2, 3, 4, 5, 6]

# Remove first occurrence of a value
li.remove(2)  # li is now [1, 3, 4, 5, 6]
li.remove(2)  # Raises a ValueError as 2 is not in the list

# Insert an element at a specific index
li.insert(1, 2)  # li is now [1, 2, 3, 4, 5, 6] again

# Get the index of the first item found
li.index(2)  # => 1
li.index(7)  # Raises a ValueError as 7 is not in the list

# Check for existence in a list with "in"
1 in li   # => True

# Examine the length with "len()"
len(li)   # => 6


# Tuples are like lists but are immutable.
tup = (1, 2, 3)
tup[0]   # => 1
tup[0] = 3  # Raises a TypeError

# You can do all those list thingies on tuples too
len(tup)   # => 3
tup + (4, 5, 6)   # => (1, 2, 3, 4, 5, 6)
tup[:2]   # => (1, 2)
2 in tup   # => True

# You can unpack tuples (or lists) into variables
a, b, c = (1, 2, 3)     # a is now 1, b is now 2 and c is now 3
d, e, f = 4, 5, 6       # you can leave out the parentheses
# Tuples are created by default if you leave out the parentheses
g = 4, 5, 6             # => (4, 5, 6)
# Now look how easy it is to swap two values
e, d = d, e     # d is now 5 and e is now 4