# Collection Data Structures: Lists (and Tuples)


We have covered in detail much of the basics of Python’s primitive data types: numeric types, strings, and booleans. 
We are now going to examine how these basic types can be composed together, in more complex structures. 
We will start by examining lists.


## Basics of Lists

### Creating Lists

A list, sometimes called an array or a vector, is an ordered collection of values. 

In Python, lists are specified by square brackets, `[ ]`, containing zero or more values, separated by commas. 

For example:

In [None]:
number_list = [1, 2, 3, 0, 5, 10, 11]
print(number_list)
type(number_list) # check the type of the list!

In [None]:
name_list = ["Elsa", "Anna", "Olaf"]
print(name_list)

In [None]:
mixed_list = ["Elsa", 21, "Anna", 18]

In [None]:
empty_list = []
print(empty_list)

In [None]:
empty_list = list()
print(empty_list)

In [None]:
type(number_list)

Lists are a very common data structure and are often produced by other functions. For instance, the string method `split(" ")` splits a string on spaces and returns a list of the resulting substrings.

In [None]:
my_string = "The cold never bothered me anyway"
list_of_words = my_string.split(" ")
print(list_of_words)
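As a small aside, `split` can also be called with no argument, in which case it splits on any run of whitespace. The sketch below shows how the two forms differ when a string contains consecutive spaces. The variable names are made up for illustration.

```python
# A string with two spaces between "cold" and "never"
spaced_string = "The cold  never bothered me"

# Splitting on a single space keeps an empty string between the two spaces
split_on_space = spaced_string.split(" ")
print(split_on_space)

# Splitting with no argument treats any run of whitespace as one separator
split_on_whitespace = spaced_string.split()
print(split_on_whitespace)
```

Which form is appropriate depends on the data: `split(" ")` is predictable for cleanly formatted text, while `split()` is more forgiving of irregular spacing.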

### Accessing parts of a list: Indexing and Slicing revisited



We can retrieve a particular element of a list by writing its position (its index) in square brackets after the list name. In Python, as in most programming languages, list indices start at 0: to get the first element of a list, request the element at index 0.

In [1]:
my_string = "The cold never bothered me anyway"
list_of_words = my_string.split(" ")
print(list_of_words)
print(list_of_words[0])

['The', 'cold', 'never', 'bothered', 'me', 'anyway']
The


Negative indices can be used to traverse the list from the right.

In [2]:
print(list_of_words[-2])

me
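One way to think about negative indices, sketched below with an illustrative list called `words`: the index `-k` refers to the same element as the index `len(words) - k`.

```python
words = ["The", "cold", "never", "bothered", "me", "anyway"]

# The last element can be reached from either direction
last_from_right = words[-1]
last_from_left = words[len(words) - 1]
print(last_from_right, last_from_left)

# In general, words[-k] is the same element as words[len(words) - k]
second_to_last = words[-2]
print(second_to_last == words[len(words) - 2])
```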


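If you remember how we accessed the individual characters of a string, the concept is exactly the same. Strings and lists are both sequences in Python, so they follow the same indexing rules. (The main difference is that a string cannot be modified in place, while a list can.)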

This also means that the slicing operator works for accessing parts of the list as well:

In [3]:
# Retrieves the elements at positions 2, 3, and 4.
# Remember these are the 3rd, 4th, and 5th elements of the list
print(list_of_words)
print(list_of_words[2:5])

['The', 'cold', 'never', 'bothered', 'me', 'anyway']
['never', 'bothered', 'me']


In [4]:
# Retrieves the last 3 elements of the list
print(list_of_words[-3:])

['bothered', 'me', 'anyway']


We can even update the elements of a list by assigning to an index or to a slice:

In [5]:
list_of_words[-2] = "you"
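The cell above updates a single element. As a minimal sketch of updating through a slice as well (using a fresh, illustrative list called `words`), note that the replacement does not need to have the same length as the slice it replaces:

```python
words = ["The", "cold", "never", "bothered", "me", "anyway"]

# Replace a single element by index
words[-2] = "you"
print(words)

# Replace a slice: the elements at positions 0 and 1 are swapped
# for three new elements, so the list grows by one
words[0:2] = ["A", "little", "frost"]
print(words)
```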

### Exercise

You are given a list of names of penguins as one big, multiline string. Each line contains one name.

In [8]:
penguins_string = '''Emperor
King
Gentoo
Magellanic
Chinstrap
Adelie
Macaroni
Rockhopper'''


* Use the `split` command, appropriately configured, to separate `penguins_string` into a list of penguins. 
* Extract the 3rd penguin from the list
* Extract the second from the last penguin, using negative indexing
* Retrieve the last 2 penguins.

In [10]:
# Use the split command, appropriately configured, to separate penguins_string into a list of names.
# We use the newline character to split the string into one element per *line*
penguins_list = penguins_string.split("\n")
print(penguins_list)

['Emperor', 'King', 'Gentoo', 'Magellanic', 'Chinstrap', 'Adelie', 'Macaroni', 'Rockhopper']


In [11]:
# Extract the 3rd penguin from the list
print(penguins_list[2])


Gentoo


In [12]:
# Extract the second from the last penguin, using negative indexing
print(penguins_list[-2])

Macaroni


In [13]:
# Retrieve the last 2 penguins
print(penguins_list[-2:])

['Macaroni', 'Rockhopper']


## Functions that apply to lists


### Common Functions




* `len`: The function `len(list)` returns the number of elements in a list.
* `sorted`: The function `sorted(list)` returns a new, sorted copy of the list; the original list is left unchanged
* `max`: Returns the maximum element of a list
* `min`: Returns the minimum element of a list
* `sum`: The function `sum(list)` sums up all the (numeric) elements of a list


In [None]:
muppets = ["Big Bird", "Oscar", "Ernie", "Kermit", "Cookie Monster", "Julia", "Rosita", "Elmo", "Bert"]

In [None]:
print("Length of muppets list:", len(muppets))

In [None]:
numbers = [3, 41, 12, 9, 74, 15]

In [None]:
print("Length of numbers list:", len(numbers))

In [None]:
muppets = ["Big Bird", "Oscar", "Ernie", "Kermit", "Cookie Monster", "Julia", "Rosita", "Elmo", "Bert"]
print("Sorted List:", sorted(muppets))
print("Original List:", muppets)

In [None]:
print("Sorted List:", sorted(numbers))
print("Original List:", numbers)

In [None]:
print("We have", len(numbers), "numbers")
print("Max number:", max(numbers))
print("Min number:", min(numbers))
print("Sum:", sum(numbers))

In [None]:
# Min and max also operate on strings
print("Min muppets:", min(muppets))
print("Max muppets:", max(muppets))
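When `min`, `max`, and `sorted` are applied to strings, the comparison is alphabetical but case-sensitive: uppercase letters come before lowercase ones. The sketch below uses a made-up list, `names`, to show the effect, and uses the `key` argument of `sorted` as one possible way to ignore case.

```python
names = ["elmo", "Bert", "Zoe"]

# Uppercase letters sort before lowercase ones, so "elmo" comes last
smallest = min(names)
default_order = sorted(names)
print(smallest, default_order)

# Comparing the lowercase version of each name ignores case
ignore_case_order = sorted(names, key=str.lower)
print(ignore_case_order)
```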

### Exercise


* Write code that computes the average value of a list of numbers
* Write code that computes the median value of a list of numbers (for simplicity, assume the list contains an odd number of items)
  * Hint: first obtain the sorted numbers, then use the sorted list to pick the element in the middle position.

In [None]:
numbers = [3, 41, 12, 9, 74, 15, 5]

print("mean of the numbers", YOUR_CODE_HERE)
print("sorted nums",YOUR_CODE_HERE)
print("median of the numbers",YOUR_CODE_HERE)

## Adding / Removing Elements to a List



### Appending items at the end of a list

+ `list.append(x)`: add an element to the end of a list
+ `list_1.extend(list_2)`: add all elements of the second list to the end of the first list. Alternatively, we can use the `+` operator to concatenate two lists, which creates a new list.

In [14]:
# Example of append
muppets = ["Big Bird", "Oscar", "Ernie", "Kermit", "Cookie Monster", "Julia", "Rosita", "Elmo", "Bert"]
muppets.append("Zoe")
muppets.append("Abby")
print(muppets)

['Big Bird', 'Oscar', 'Ernie', 'Kermit', 'Cookie Monster', 'Julia', 'Rosita', 'Elmo', 'Bert', 'Zoe', 'Abby']


In [16]:
# Example of extend
muppets = ["Big Bird", "Oscar", "Ernie", "Kermit", "Cookie Monster", "Julia", "Rosita", "Elmo", "Bert"]
muppets_to_add = ["Abby", "Zoe"]
print("Length of list:", len(muppets))
muppets.extend(muppets_to_add)
print(muppets_to_add)

print("Length of list:", len(muppets))

Length of list: 9
['Abby', 'Zoe']
Length of list: 11


In [8]:
# List concatenation. This is similar to "extend"
muppets = ["Big Bird", "Oscar", "Ernie", "Kermit", "Cookie Monster", "Julia", "Rosita", "Elmo", "Bert"]
muppets_to_add = ["Abby", "Zoe"]
new_muppets = muppets + muppets_to_add
print(new_muppets)

['Big Bird', 'Oscar', 'Ernie', 'Kermit', 'Cookie Monster', 'Julia', 'Rosita', 'Elmo', 'Bert', 'Abby', 'Zoe']
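Although the results look alike, `+` and `extend` differ in one respect, sketched below with short illustrative lists: `+` builds a new list and leaves its operands alone, while `extend` changes the list it is called on and returns `None`.

```python
first = ["Big Bird", "Oscar"]
second = ["Abby", "Zoe"]

# "+" builds a new list; "first" is left untouched
combined = first + second
print(len(first), len(combined))

# extend() changes "first" itself and returns None
result = first.extend(second)
print(len(first), result)
```

A common slip is to write `first = first.extend(second)`, which would replace the list with `None`.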


In [7]:
# Notice that append does not behave like extend when we pass it a list:
# the whole list is added as a single element.
# We have now created a "nested" list. We will examine nested lists later
muppets = ["Big Bird", "Oscar", "Ernie", "Kermit", "Cookie Monster", "Julia", "Rosita", "Elmo", "Bert"]
#muppets.append("Abby")
#muppets.append("Zoe")
muppets.append(["Abby","Zoe"])
print(muppets)
# Notice that the two lists, created by append and extend, have different lengths
print("Length of list:", len(muppets))

['Big Bird', 'Oscar', 'Ernie', 'Kermit', 'Cookie Monster', 'Julia', 'Rosita', 'Elmo', 'Bert', ['Abby', 'Zoe']]
Length of list: 10


### Inserting and removing items in the list at any position (may skip this)



* `list.insert(index, x)`: insert element `x` into the list at the specified index. Elements from that index onwards are shifted one position to the right
* `list.pop(index)`: remove the element at the specified position and return it


In [2]:
penguins_string = '''Emperor
King
Gentoo
Magellanic
Chinstrap
Adelie
Macaroni
Rockhopper'''
penguins_list = penguins_string.split("\n")

In [3]:
# Insert Little Penguin in position 7
penguins_list.insert(7, 'Little')
print(penguins_list)

['Emperor', 'King', 'Gentoo', 'Magellanic', 'Chinstrap', 'Adelie', 'Macaroni', 'Little', 'Rockhopper']


In [4]:
# It is there
print(penguins_list[7])

Little


In [5]:
# pop returns the element at the given position and removes it from the list
penguins_list.pop(7)

'Little'

In [6]:
# Name at position 7 changed
print(penguins_list)

['Emperor', 'King', 'Gentoo', 'Magellanic', 'Chinstrap', 'Adelie', 'Macaroni', 'Rockhopper']


In [None]:
# If we repeat the operation with pop, we will get back the new name that is now in that location
penguins_list.pop(7)

In [None]:
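# After the second pop the list has only 7 elements (positions 0 to 6),
# so position 7 no longer exists and penguins_list[7] would raise an IndexError
print(penguins_list)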
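As a complement to the cells above, here is a small sketch using a short illustrative list called `penguins`: called with no argument, `pop()` removes and returns the last element, and the valid positions shrink accordingly.

```python
penguins = ["Emperor", "King", "Gentoo"]

# With no argument, pop() removes and returns the last element
last_penguin = penguins.pop()
print(last_penguin)
print(penguins)

# Valid positions now run from 0 to len(penguins) - 1
largest_valid_index = len(penguins) - 1
print("Largest valid index:", largest_valid_index)
```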

## Finding items in lists

### Common functions

* `x in list`: checks if `x` appears in the list 
* `list.index(x)`: looks through the list for the specified element and returns the position of its first occurrence; if the element is not found, it raises an error (a `ValueError`)
* `list.count(x)`: counts the number of occurrences of the input element

In [23]:
muppets = ["Big Bird", "Oscar", "Ernie", "Kermit", "Cookie Monster", "Julia", "Rosita", "Elmo", "Bert"]

In [24]:
"Elmo" in muppets

True

In [25]:
"Zoe" not in muppets

True

In [None]:
# Index
muppet = 'Cookie Monster'
print("Location of", muppet, "in the list:", muppets.index(muppet))
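The `count` method has not been demonstrated yet, so here is a minimal sketch. The list `votes` is made up for illustration and contains a repeated name, which also shows that `index` reports only the first match.

```python
votes = ["Elmo", "Bert", "Elmo", "Ernie", "Elmo"]

# count() returns how many times a value appears (0 if it never does)
elmo_votes = votes.count("Elmo")
zoe_votes = votes.count("Zoe")
print(elmo_votes, zoe_votes)

# index() returns the position of the first match only
first_elmo = votes.index("Elmo")
print(first_elmo)
```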

### Exercise


Now let's practice the `if-else` statement:

* Define a variable `search` with the name of a muppet that you want to search for.
* Check if the name appears **in** the list
    * If yes, then print the **index** number of its first appearance
    * If not, print that the name does not appear in the list

In [None]:
muppets = ["Big Bird", "Oscar", "Ernie", "Kermit", "Cookie Monster", "Julia", "Rosita", "Elmo", "Bert"]

In [None]:
# I have already created the skeleton of the code. Complete it!

search = 'Zoe' # we want to search Zoe
if YOUR_CODE_HERE:
    location = muppets.index(search)
    print(search + " appeared at location " + str(location))
else:
    print("The name", search, "does not appear in the list")

Once you have finished writing the code above, test it with each of the following names by assigning them, one at a time, to `search`.


```
Big Bird
Cookie Monster
CookieMonster <- without space!
Zoe <- Zoe is not in muppets
```



### Exercise



Let's apply some of the things that we have learned so far to the article below.

In [None]:
news = \
"""Johnson&Johnson said that it planned to apply for emergency authorization of the vaccine from the Food and Drug Administration as soon as next week, putting it on track to receive clearance later in February.
"This is the pandemic vaccine that can make a difference with a single dose," said Dr.Paul Stoffels, the chief scientific officer of Johnson&Johnson.
The company's announcement comes as the Biden administration is pushing to immunize Americans faster even with a tight vaccine supply. White House officials have been counting on Johnson&Johnson's vaccine to ease the shortfall. But the company may only have about seven million doses ready when the F.D.A. decides whether to authorize it, according to federal health officials familiar with its production, and about 30 million doses by early April."""

* What is the length of the document above in characters?
  * Hint: What is the function that gives the length of a string?
* How many paragraphs are in the news article above? 
  * Hint: Paragraphs are divided by newline.
* What is the average length of a word in characters in the first paragraph?
  * Hint: Words are divided by space.

In [None]:
# Your code here

## Tuples (we will do if we have time)



A tuple is similar to a list and consists of a number of values separated by commas. For instance:

In [None]:
t = (12345, 54321, 54321, 'hello!')
print(t)

The usual slicing and indexing operators still apply:

In [None]:
print(t[3])

And similarly, we can use the `count` and `index` methods. 

In [None]:
t.index('hello!')

However, a tuple is *immutable*. This means that we cannot modify its contents, so the operations that modify a list do not apply to a tuple.

In [None]:
t[1] = 246 # This raises a TypeError: tuples do not support item assignment
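To see the error without stopping the notebook, one option is to wrap the assignment in `try`/`except` (a construct that may not have been covered yet). The sketch below uses an illustrative tuple called `point` and also shows one way to get a "modified" tuple: building a new one.

```python
point = (3, 4)

# Assigning to an element of a tuple raises a TypeError
try:
    point[0] = 10
    assignment_worked = True
except TypeError as error:
    assignment_worked = False
    print("Cannot modify a tuple:", error)

# To "change" a tuple, build a new one instead; the original is untouched
moved_point = (10,) + point[1:]
print(point, moved_point)
```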

In [None]:
# Check here that you can change the element of a list by index-assignment.

Note: In many cases, tuples and lists can be used interchangeably. In this course we mainly use lists.