## Module 4: Python


# What We Can Do With Data
## LISTS, SETS, TUPLES

## Smart SORTING
<br>

Asel Kushkeyeva<br>
Data Science Institute, University of Toronto<br>
2022

### Jupyter Notebook as a Slideshow

To see this notebook as a live slideshow, we need to install RISE (Reveal.js - Jupyter/IPython Slideshow Extension):

1. Insert a cell and execute `%conda install -c conda-forge rise` (or run `conda install -c conda-forge rise` in a terminal).
2. Restart the Jupyter Notebook.
3. At the top of your notebook you will see a new icon that looks like a bar chart; hover over it to see 'Enter/Exit RISE Slideshow'.
4. Click the RISE icon and enjoy the slideshow.
5. You can edit the notebook in slideshow mode by double-clicking a cell.
*This is done only once. Now all your notebooks will have the RISE extension (unless you re-install the Jupyter Notebook).*

# Agenda

1. Lists
2. Sorting
3. Sets and Tuples

# We Can Store Data in LISTS

A *list* lets us store many values under one variable name.

In [None]:
list_a = [10, 15, 46, 97]

In [None]:
type(list_a)

list

A list can store various types of data. In other words, lists can be __heterogeneous__:

In [None]:
list_b = ['power', 34, 'just do it', 0.9, True]

In [None]:
list_c = [] # this is an empty list

Lists are __ordered__, so their items can be accessed by index. The first item has index 0, the second item has index 1, and so on.

In [None]:
list_a[0]

10

In [None]:
list_a[4]

IndexError: list index out of range

How do we change the previous code to avoid the error?
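As a minimal sketch of how index bounds work (the list is redefined here so the cell runs on its own): valid indices run from `0` to `len(list_a) - 1`, and negative indices count backwards from the end.

```python
list_a = [10, 15, 46, 97]

# The last valid index is len(list_a) - 1, i.e. 3 for a four-item list
last_index = len(list_a) - 1
print(list_a[last_index])  # 97

# Negative indices count from the end: -1 is the last item
print(list_a[-1])  # 97
print(list_a[-4])  # 10, the first item
```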

Lists are __changeable__, or __mutable__.

In [None]:
list_a

[10, 15, 46, 97]

In [None]:
list_a[1] = 8

In [None]:
list_a

[10, 8, 46, 97]

### Type Annotations

In [None]:
def lowest_number(L: list) -> float:
    """Return the lowest number in the list L.
    
    >>> lowest_number([3.4, 7.4, 2.5, 1.3])
    1.3
    """
    return min(L)

The annotation `list` does not say that this function expects a list of floats (rather than, say, a list of strings or booleans). To express this more precisely, Python provides the *typing* module:

In [None]:
from typing import List
def lowest_number(L: List[float]) -> float:
    """Return the lowest number in the list L.
    
    >>> lowest_number([3.4, 7.4, 2.5, 1.3])
    1.3
    """
    return min(L)
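Two details worth knowing, sketched below under the assumption that you run Python 3.9 or later: the built-in `list` can be subscripted directly, so `list[float]` works without importing `typing.List`; and annotations are not enforced when the function runs (they are hints for readers and tools such as type checkers). The body `return min(L)` is just one possible implementation, used here for illustration.

```python
# Requires Python 3.9+ for the built-in generic list[float]
def lowest_number(L: list[float]) -> float:
    """Return the lowest number in the list L.

    >>> lowest_number([3.4, 7.4, 2.5, 1.3])
    1.3
    """
    return min(L)

print(lowest_number([3.4, 7.4, 2.5, 1.3]))  # 1.3

# Annotations are not checked at run time: passing strings raises no error,
# min() simply compares the strings alphabetically
print(lowest_number(['pear', 'apple']))  # apple
```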

## Operations on Lists

In [None]:
len(list_a)

4

In [None]:
max(list_a)

97

In [None]:
min(list_a)

8

In [None]:
sum(list_a)

161

In [None]:
sorted(list_b)

TypeError: '<' not supported between instances of 'int' and 'str'

Why did we get this error?
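A small sketch of what is going on: `sorted()` compares elements pairwise with `<`. In Python 3, `<` works between numbers of different numeric types (`int`, `float`, and `bool`, where `True` counts as 1) and between two strings, but not between a number and a string.

```python
# Numbers of different numeric types can be compared, so this sorts fine
numbers = sorted([34, 0.9, True])
print(numbers)  # [0.9, True, 34]

# Strings compare with strings, character by character
words = sorted(['power', 'just do it'])
print(words)  # ['just do it', 'power']

# Mixing strings and numbers fails at the first str-vs-number comparison
try:
    sorted(['power', 34, 'just do it', 0.9, True])
except TypeError as error:
    print('TypeError:', error)
```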

We can concatenate lists...

In [None]:
new_list = list_a + list_b
new_list

[10, 8, 46, 97, 'power', 34, 'just do it', 0.9, True]

In [None]:
list_n = [1, 2, 3] + ['a', 'b', 'c']
list_n

[1, 2, 3, 'a', 'b', 'c']

... and multiply them by an integer, which repeats their items.

In [None]:
list_a * 2

[10, 8, 46, 97, 10, 8, 46, 97]

In [None]:
list_a

[10, 8, 46, 97]

`list_a` remained unchanged because `list_a * 2` creates a new list, and we did not assign that new list to any variable.
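To keep the repeated list, assign it to a name. Multiplying a list by an integer is also a handy way to build a list of repeated starting values. A small sketch (the names `doubled` and `counters` are just for illustration):

```python
list_a = [10, 8, 46, 97]

# list_a * 2 builds a new list; assigning it keeps the result
doubled = list_a * 2
print(doubled)  # [10, 8, 46, 97, 10, 8, 46, 97]
print(list_a)   # [10, 8, 46, 97], the original is unchanged

# A list of five zeros, e.g. as starting counters
counters = [0] * 5
print(counters)  # [0, 0, 0, 0, 0]
```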

The *in* operator checks whether an object is an item of the list:

In [None]:
list_b

['power', 34, 'just do it', 0.9, True]

In [None]:
word = input('Enter a word: ')

Enter a word: power


In [None]:
if word in list_b:
    print('{} is in the list_b.'.format(word))

power is in the list_b.


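The *in* operator with a list on its left checks whether that whole list is a single item of the other list, not whether each of its elements appears in it: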

In [None]:
[34, 0.9] in list_b

False

In [None]:
[1, 2] in [0, [1, 2], 3]

True
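Because `in` looks for one item, checking whether *every* element of `[34, 0.9]` appears in `list_b` needs a loop. A minimal sketch using the built-in `all()` with a generator expression (the name `wanted` is illustrative):

```python
list_b = ['power', 34, 'just do it', 0.9, True]
wanted = [34, 0.9]

# The list [34, 0.9] itself is not an item of list_b
print(wanted in list_b)  # False

# all() returns True only if each element of wanted is found in list_b
print(all(item in list_b for item in wanted))  # True
```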

## Slicing Lists

In [None]:
rainbow_colors = ['red', 'orange', 'yellow', 'green', 'light blue', 'blue', 'violet']

In [None]:
rainbow_colors[:3]

['red', 'orange', 'yellow']

In [None]:
rainbow_colors[3:]

['green', 'light blue', 'blue', 'violet']
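Slices also accept negative positions and an optional third *step* value, `list[start:stop:step]`. A short sketch:

```python
rainbow_colors = ['red', 'orange', 'yellow', 'green', 'light blue', 'blue', 'violet']

# The last three items
print(rainbow_colors[-3:])  # ['light blue', 'blue', 'violet']

# Every second item, starting from index 0
print(rainbow_colors[::2])  # ['red', 'yellow', 'light blue', 'violet']

# A step of -1 walks the list backwards, giving a reversed copy
print(rainbow_colors[::-1])
```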

To preserve the original list, we can make a copy of it with a full slice `[:]` and safely change the copy's content:

In [None]:
rainbow_copy = rainbow_colors[:]

In [None]:
rainbow_copy[0] = 'fire red'

In [None]:
rainbow_copy

['fire red', 'orange', 'yellow', 'green', 'light blue', 'blue', 'violet']

In [None]:
rainbow_colors

['red', 'orange', 'yellow', 'green', 'light blue', 'blue', 'violet']

### Be aware:

In [None]:
rainbow_copy = rainbow_colors

In [None]:
rainbow_copy[0] = 'fire red'

In [None]:
rainbow_copy

['fire red', 'orange', 'yellow', 'green', 'light blue', 'blue', 'violet']

In [None]:
rainbow_colors

['fire red', 'orange', 'yellow', 'green', 'light blue', 'blue', 'violet']

In this case, changes made through `rainbow_copy` also appear in `rainbow_colors`. Plain assignment does not copy a list: `rainbow_copy` and `rainbow_colors` are *aliases*, two names that refer to the same list object in memory.
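The `is` operator tells us whether two names refer to the same object, and `id()` returns an object's identity (in CPython, its memory address). A sketch comparing an alias with real copies; `list.copy()` does the same as slicing with `[:]`:

```python
rainbow_colors = ['red', 'orange', 'yellow']

alias = rainbow_colors          # no copy: both names refer to one list
copy_1 = rainbow_colors[:]      # a new list with the same items
copy_2 = rainbow_colors.copy()  # another new list, same as [:]

print(alias is rainbow_colors)          # True
print(copy_1 is rainbow_colors)         # False
print(id(alias) == id(rainbow_colors))  # True

# Equal content, different objects
print(copy_2 == rainbow_colors)         # True
```

Both kinds of copy are *shallow*: if the list contains other lists, those inner lists are still shared between the original and the copy.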

## List Methods

In [None]:
rainbow_colors.extend(['purple', 'magenta'])

In [None]:
rainbow_colors

['fire red',
 'orange',
 'yellow',
 'green',
 'light blue',
 'blue',
 'violet',
 'purple',
 'magenta']

In [None]:
rainbow_colors.append('pink')

In [None]:
rainbow_colors

['fire red',
 'orange',
 'yellow',
 'green',
 'light blue',
 'blue',
 'violet',
 'purple',
 'magenta',
 'pink']

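Notice the difference: `append()` always adds its argument as one single item, so appending the list `['pink']` puts a nested list inside `rainbow_colors` (whereas `extend()`, used above, adds the items one by one):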

In [None]:
rainbow_colors.append(['pink'])

In [None]:
rainbow_colors

['fire red',
 'orange',
 'yellow',
 'green',
 'light blue',
 'blue',
 'violet',
 'purple',
 'magenta',
 'pink',
 ['pink']]

In [None]:
rainbow_colors.insert(6, 'navy')

In [None]:
rainbow_colors

['fire red',
 'orange',
 'yellow',
 'green',
 'light blue',
 'blue',
 'navy',
 'violet',
 'purple',
 'magenta',
 'pink',
 ['pink']]

In [None]:
rainbow_colors.remove(['pink'])

In [None]:
rainbow_colors

['fire red',
 'orange',
 'yellow',
 'green',
 'light blue',
 'blue',
 'navy',
 'violet',
 'purple',
 'magenta',
 'pink']

For more commonly used list methods, please see p. 142 of the *Practical Programming: An Introduction to Computer Science Using Python 3.6*.
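A few of those other methods, sketched on a small made-up list `letters`. Note that `reverse()`, like `sort()`, changes the list in place and returns `None`.

```python
letters = ['a', 'b', 'c', 'b']

print(letters.count('b'))  # 2, the number of times 'b' occurs
print(letters.index('b'))  # 1, the index of the first 'b'

last = letters.pop()       # removes and returns the last item
print(last, letters)       # b ['a', 'b', 'c']

first = letters.pop(0)     # pop(i) removes and returns the item at index i
print(first, letters)      # a ['b', 'c']

letters.reverse()          # reverses the list in place
print(letters)             # ['c', 'b']
```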

As we have just seen, a list can contain other lists: `rainbow_colors` briefly had the list `['pink']` as one of its items.

Lists inside other lists are called *nested lists*.

In [None]:
students_per_class = [['Grade 9', 20], ['Grade 10', 17], ['Grade 11', 13], ['Grade 12', 22]]

In [None]:
students_per_class[1]

['Grade 10', 17]

In [None]:
students_per_class[1][1]

17
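A `for` loop can unpack each inner list into separate names. A minimal sketch that prints each class and adds up the head counts:

```python
students_per_class = [['Grade 9', 20], ['Grade 10', 17], ['Grade 11', 13], ['Grade 12', 22]]

total = 0
for grade, count in students_per_class:
    # each inner list has two items: a label and a head count
    print(grade, 'has', count, 'students')
    total += count

print('Total:', total)  # Total: 72
```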

## PRACTICE IN YOUR NOTEBOOK

List `books` contains the following items: ['War and Peace', 'Pride and Prejudice', 'Mockingjay', 'Three Musketeers', 'The Adventures of Robinson Crusoe', 'Yevgeniy Onegin'].
<br>

1. Using slicing or indexing, create the following:
    - An empty list
    - The last item of `books`
    - A list of three items: 'Three Musketeers', 'The Adventures of Robinson Crusoe', 'Yevgeniy Onegin'.

<br>
    
2. Using list methods:
    - Remove 'Pride and Prejudice' from the list.
    - Insert 'Harry Potter and the Chamber of Secrets' after 'Mockingjay'.
    - Reverse the list.
  

Complete the examples in the docstring and then write the body of the following function:

In [None]:
def same_first_last(L: list) -> bool:
    """Precondition: len(L) >= 2
    Return True if and only if the first item of the list is the same as
    the last item.
    >>> same_first_last([3, 4, 2, 8, 3])
    True
    >>> same_first_last(['apple', 'banana', 'pear'])
    >>> same_first_last([4.0, 4.5])
    """

# Sorting

There are two ways to sort a list: the `sort()` method and the `sorted()` function.

In [None]:
fruits = ['apple', 'pineapple', 'kiwi', 'banana']

In [None]:
fruits.sort()

In [None]:
# The sort() method sorts the list's elements in place, meaning it changes the original list.
fruits

['apple', 'banana', 'kiwi', 'pineapple']

In [None]:
veggies = ['potato', 'celery', 'cabbage', 'bell pepper', 'onion']

In [None]:
sorted(veggies)

['bell pepper', 'cabbage', 'celery', 'onion', 'potato']

In [None]:
# sorted() function sorts the list but does not change the original list
veggies

['potato', 'celery', 'cabbage', 'bell pepper', 'onion']
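Two more details, shown in a short sketch: `sort()` returns `None` (so writing `fruits = fruits.sort()` would lose the list), and both `sort()` and `sorted()` accept `reverse=True` for descending order.

```python
fruits = ['apple', 'pineapple', 'kiwi', 'banana']

result = fruits.sort()
print(result)  # None: the method changed fruits and returned nothing
print(fruits)  # ['apple', 'banana', 'kiwi', 'pineapple']

veggies = ['potato', 'celery', 'cabbage', 'bell pepper', 'onion']
print(sorted(veggies, reverse=True))
# ['potato', 'onion', 'celery', 'cabbage', 'bell pepper']
```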

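The `sorted()` function (and the `sort()` method) has a *key* parameter that changes the sorting criterion: the key function is called on each element, and its results are compared instead of the elements themselves.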

In [None]:
sorted(veggies, key=len)

['onion', 'potato', 'celery', 'cabbage', 'bell pepper']

We can pass any one-argument function as *key*, depending on how we want to sort the list.

In [None]:
def last_letter(item):
    return item[-1]

In [None]:
# arranged according to the last letter of each list element
sorted(veggies, key=last_letter)

['cabbage', 'onion', 'potato', 'bell pepper', 'celery']
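Built-in functions and methods work as keys too. For example, uppercase letters sort before all lowercase ones, so passing `str.lower` gives a case-insensitive order. The list `names` below is made up for illustration:

```python
names = ['bob', 'Alice', 'carol', 'Dave']

# Default order compares character codes: 'A' and 'D' come before 'b' and 'c'
print(sorted(names))                 # ['Alice', 'Dave', 'bob', 'carol']

# key=str.lower compares lowercase versions but returns the original strings
print(sorted(names, key=str.lower))  # ['Alice', 'bob', 'carol', 'Dave']
```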

Sorting nested lists:

In [None]:
students_per_class = [['Grade 9', 20], ['Grade 10', 17], ['Grade 11', 13], ['Grade 12', 22]]

In [None]:
def second_element(item):
    return item[1]

In [None]:
sorted(students_per_class, key=second_element)

[['Grade 11', 13], ['Grade 10', 17], ['Grade 9', 20], ['Grade 12', 22]]

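The good news is that we do not need to write such functions every time we want to sort by position. The standard library's `operator` module provides `itemgetter`: `itemgetter(1)` returns a function that fetches item 1 of whatever it is given, which is exactly what `second_element` does.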

In [None]:
from operator import itemgetter

In [None]:
sorted(students_per_class, key=itemgetter(1))

[['Grade 11', 13], ['Grade 10', 17], ['Grade 9', 20], ['Grade 12', 22]]

In [None]:
sorted(students_per_class, key=itemgetter(0))

[['Grade 10', 17], ['Grade 11', 13], ['Grade 12', 22], ['Grade 9', 20]]
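Notice that `'Grade 9'` comes last: strings are compared character by character, and `'9'` is greater than `'1'`. One way to get the natural order is a key function that extracts the grade as an integer. A sketch, where `grade_number` is a hypothetical helper that assumes every label has the form `'Grade <number>'`, applied to a shuffled copy of the data:

```python
from operator import itemgetter

classes = [['Grade 12', 22], ['Grade 9', 20], ['Grade 11', 13], ['Grade 10', 17]]

def grade_number(item):
    """Return the grade in item[0] as an int, e.g. 'Grade 10' -> 10.

    Assumes the label is 'Grade', a space, then a whole number.
    """
    return int(item[0].split()[1])

# String order: 'Grade 9' ends up last
print(sorted(classes, key=itemgetter(0)))
# [['Grade 10', 17], ['Grade 11', 13], ['Grade 12', 22], ['Grade 9', 20]]

# Numeric order of the grade
print(sorted(classes, key=grade_number))
# [['Grade 9', 20], ['Grade 10', 17], ['Grade 11', 13], ['Grade 12', 22]]
```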

## PRACTICE IN YOUR NOTEBOOK

Given the list `people`, sort it by first name, by last name, and by age. Store the sorted lists under the names `by_first_name`, `by_last_name`, and `by_age`, respectively.

people = [('Mark', 'Harrison', 56),
          ('Ken', 'Wolseley', 23),
          ('Emily', 'Robinson', 77)]

Sort the `colors` list, keeping the original list unchanged.

colors = ['purple', 'black', 'maroon', 'mauve', 'aquamarine']

# We Can Store Data in SETS and TUPLES

A *__set__* is:
- unordered;
- made up of distinct elements (no duplicates);
- mutable.

In [None]:
things = {'coat', 'lock', 'box', 'book', 'apple', 'hair','xylophone', 'lock', 'book'}
print(things)

{'coat', 'xylophone', 'apple', 'box', 'hair', 'lock', 'book'}
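Duplicates are dropped when the set is built: `things` was written with nine items, but `'lock'` and `'book'` each appear twice, so the set holds seven. The printed order may differ between runs, because a set keeps no order. A quick check:

```python
things = {'coat', 'lock', 'box', 'book', 'apple', 'hair', 'xylophone', 'lock', 'book'}

print(len(things))       # 7
print('lock' in things)  # True: membership tests work as with lists

# A common idiom: remove duplicates from a list by converting it to a set
print(sorted(set([3, 1, 3, 2, 1])))  # [1, 2, 3]
```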


An empty *set*:

In [None]:
a = set()
a

set()
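Note that empty braces do not create an empty set: `{}` is an empty *dictionary*, so `set()` is the way to write an empty set. A quick check:

```python
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>
```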

In [None]:
rainbow_set = set(rainbow_colors)
rainbow_set

{'blue',
 'fire red',
 'green',
 'light blue',
 'magenta',
 'navy',
 'orange',
 'pink',
 'purple',
 'violet',
 'yellow'}

In [None]:
set(range(7))

{0, 1, 2, 3, 4, 5, 6}

## Operations on Sets

In [None]:
things.add('napkin')
print(things)

{'napkin', 'coat', 'xylophone', 'apple', 'box', 'hair', 'lock', 'book'}


In [None]:
things.remove('apple')
print(things)

{'napkin', 'coat', 'xylophone', 'box', 'hair', 'lock', 'book'}
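`remove()` raises a `KeyError` if the element is not in the set, while `discard()` removes an element only if it is present and never raises. A sketch on a small made-up set `tools`:

```python
tools = {'hammer', 'saw'}

tools.discard('drill')  # 'drill' is not there: nothing happens
print(tools == {'hammer', 'saw'})  # True

raised = False
try:
    tools.remove('drill')
except KeyError:
    raised = True
    print("remove('drill') raised KeyError")
```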


In [None]:
other_things = {'napkin', 'phone', 'tree', 'xylophone', 'hair', 'book', 'coin'}

In [None]:
things.difference(other_things)

{'box', 'coat', 'lock'}

In [None]:
things.intersection(other_things)

{'book', 'hair', 'napkin', 'xylophone'}
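Sets also support union and symmetric difference, and each of these methods has an operator form: `-` (difference), `&` (intersection), `|` (union), and `^` (symmetric difference). A sketch on two small sets:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a - b)  # {1, 2}, same as a.difference(b)
print(a & b)  # {3, 4}, same as a.intersection(b)
print(a | b)  # {1, 2, 3, 4, 5}, same as a.union(b)
print(a ^ b)  # {1, 2, 5}, same as a.symmetric_difference(b): in exactly one set
```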

For other commonly used set operations, see p. 206 of *Practical Programming: An Introduction to Computer Science Using Python 3.6*.

## PRACTICE IN YOUR NOTEBOOK

Write a function called `find_dups` that takes a list of integers as its input argument and returns a set of those integers occurring two or more times in the list.

## Tuples

A *__tuple__* is:
- ordered;
- immutable;
- indexable, sliceable, and iterable (it can be looped over).

In [None]:
mutable_synonyms = ('changeable', 'fluctuating', 'inconstant', 'variable')
mutable_synonyms

('changeable', 'fluctuating', 'inconstant', 'variable')
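Immutability means a tuple's items cannot be reassigned. A minimal sketch: item assignment raises `TypeError`, while building a new tuple from pieces of the old one works (`'alterable'` is just an example value).

```python
mutable_synonyms = ('changeable', 'fluctuating', 'inconstant', 'variable')

assignment_failed = False
try:
    mutable_synonyms[0] = 'alterable'
except TypeError as error:
    assignment_failed = True
    print('TypeError:', error)  # 'tuple' object does not support item assignment

# To "change" a tuple, create a new one, e.g. by concatenation and slicing
new_synonyms = ('alterable',) + mutable_synonyms[1:]
print(new_synonyms)  # ('alterable', 'fluctuating', 'inconstant', 'variable')
```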

An empty *tuple*:

In [None]:
e = ()
type(e)

tuple

Parentheses alone do not create a tuple: `(17)` is just the integer 17. A *tuple* with only one element needs a trailing comma:

In [None]:
(17)
type((17))

int

In [None]:
(17,)
type((17,))

tuple

We can assign values to several variables at once by unpacking a tuple:

In [None]:
(color, shape) = ('red', 'round')

In [None]:
color

'red'

In [None]:
shape

'round'
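Tuple assignment also gives an idiomatic way to swap two variables without a temporary one: the right-hand side is packed into a tuple first, then unpacked into the names on the left. A short sketch:

```python
color, shape = 'red', 'round'  # parentheses are optional here

color, shape = shape, color    # swap the two values
print(color, shape)            # round red
```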

# References

- Gries, Campbell, and Montojo (2017). *Practical Programming: An Introduction to Computer Science Using Python 3.6*, Chapter 8.
- Python documentation, `operator` module (`itemgetter`). https://docs.python.org/3/library/operator.html