# Lecture 2b -- Collections
This lecture discusses a subset of data structures known as collections. Collections, also called containers, are data types that hold multiple elements in a single object. In this lecture, students will learn about and use four main types of collections:
- lists
- ranges 
- tuples
- dictionaries 

They will learn about their differences, their use cases, and various ways each collection can be manipulated, with a particular focus on lists. 

The structure of this lecture closely follows the QuantEcon lectures on [Python Fundamentals](https://datascience.quantecon.org/python_fundamentals/index.html) sans the economics content. 

## Lists
Lists are the most common type of collection in Python. Lists are **ordered** which means there is a first element and a last element. You can define a list using `[]`. In the cell below, we define a list of groceries and assign it the name `grocery_list`. 

In [146]:
grocery_list = ["bread", "milk", "cheese", "apples"]
print(grocery_list)
print(type(grocery_list)) # Note the type is the `list`

['bread', 'milk', 'cheese', 'apples']
<class 'list'>


Lists can also contain any combination of types and even functions!

In [147]:
list_of_many_types = [1, 1.5, "orange", print, [1,2,3]]
print(list_of_many_types)

[1, 1.5, 'orange', <built-in function print>, [1, 2, 3]]


## Properties of Lists
Since lists are so commonly used, there are many basic properties of lists that are worth knowing well, especially when compared to the other collection types. 
### Accessing Elements of a List
After being defined, you will want to access individual elements of a list. This can be done using bracket indexing notation. For instance, we can print the first item on `grocery_list`:

In [148]:
print(grocery_list[0])
print(grocery_list[-4])

bread
bread


### 0-based indexing
Many programming languages, including Python, use 0-based indexing. This means that to access the first element of `grocery_list`, we use index 0. Under this convention, the second element is accessed using index 1, the third using index 2, etc.

### Negative Indexing
We can also index using negative numbers, although this is needed less often. An index of $-1$ refers to the last element of the list, $-2$ refers to the second to last, etc.

### Out of Bounds Indices
If a list has $n$ elements, then the only valid indices are:
$$
-n, -(n-1),..., 0, ..., n-1
$$
If an index is not one of these, you will receive an "index out of range" error (an `IndexError`). See below.

### Type of Indices
Indices must have an integer type! A float such as `3.0` is not accepted, even though its value is a whole number. See below.

In [149]:
# Uncomment these lines, run the code, and observe the errors
# grocery_list[5]
# grocery_list[3.0]
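
The two errors can also be observed without stopping the cell. The sketch below is only an illustration: it uses a stand-in list called `items` (not part of the lecture's running example) and a `try`/`except` block, which catches an error and lets the program continue.

```python
items = ["bread", "milk", "cheese", "apples"]  # stand-in list with 4 elements

error_names = []
for bad_index in (5, 3.0):
    try:
        items[bad_index]
    except (IndexError, TypeError) as err:
        # Record and print the error instead of halting the program
        error_names.append(type(err).__name__)
        print(type(err).__name__, "-", err)
```

The out-of-bounds index produces an `IndexError`, while the float index produces a `TypeError`.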

### Multiple Indexing
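We can access multiple elements of a list at once by using **slices**. Slices, like our single index examples, go inside brackets and take the form `start_index:stop_index`. The element at `start_index` is included, but the element at `stop_index` is not. To include the element at index `k`, the stop value must therefore be `k+1`.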

In [150]:
dairy_list = grocery_list[1:3]
print(grocery_list)
print(dairy_list)

['bread', 'milk', 'cheese', 'apples']
['milk', 'cheese']


It may seem unintuitive that the slice `1:3` does not also return the 4th element (indexed by 3), but this convention is useful because of 0-based indexing. Since the last element of a list is indexed by its length minus 1, the list length can be used as the end point of the slice when we want to include the last element of the list.

In [151]:
list_length = len(grocery_list)
print(list_length)
print(grocery_list[2:list_length])

4
['cheese', 'apples']
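
Slices have a few shorthand forms that follow from the same convention. The sketch below uses a stand-in list named `items`, defined only for this illustration, to show omitted endpoints, a negative start, and an optional step.

```python
items = ["bread", "milk", "cheese", "apples"]  # stand-in list for this example

first_two = items[:2]        # start omitted: begin at index 0
last_two = items[2:]         # stop omitted: run through the end of the list
also_last_two = items[-2:]   # negative start: count from the end
every_other = items[::2]     # a third number sets the step size

print(first_two)      # ['bread', 'milk']
print(last_two)       # ['cheese', 'apples']
print(also_last_two)  # ['cheese', 'apples']
print(every_other)    # ['bread', 'cheese']
```

One consequence of excluding the stop index is that `items[a:b]` always has `b - a` elements, as long as both indices are within bounds and `a <= b`.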


### Changing the Value of an Element
You can use this indexing to change individual elements of a list. A data structure whose elements can be changed is called **mutable**. Below, we change the third item of `grocery_list` ("cheese") to "cheddar cheese".

In [152]:
grocery_list[2] = "cheddar cheese"
print(grocery_list)

['bread', 'milk', 'cheddar cheese', 'apples']


### Length

As we discussed in Lecture 2, lists have a length that is defined by the number of elements in the list. This can be obtained by using `len()`:

In [153]:
print(len(grocery_list))
print(len(list_of_many_types))

4
5


### Element Membership
We can check if a given object is an element of a given list using the `in` keyword. If the expression evaluates to `True`, the element is in the list. Otherwise, it is not.

In [154]:
list_of_many_types[3]((print in list_of_many_types))
print("bread" in grocery_list)  # "bread" is in grocery_list
print("squash" in grocery_list) # "squash" is not 

True
True
False


**Quick Discussion**: What's happening on the first line in the cell above?

### Appending, Inserting, & Extending
`.append()` is among the most useful methods for lists and we have already seen it in Lecture 2. This method adds a new element to the end of an already existing list. It modifies the list in place, so no assignment with `=` is needed.

`.insert()` is similar to `.append()`, but it allows you to choose where in the list the element will be placed. Its first argument is the index at which the new element is inserted.

`.extend()` is also similar to `.append()`, but it adds each element of another list onto the end of an already existing list.

In [155]:
# Appending
grocery_list.append("squash") 
list_of_many_types.append(True)

# Inserting
grocery_list.insert(2, "garlic")
list_of_many_types.insert(2, len)

# Extending
grocery_list.extend(["noodles", "turkey"]) # both "noodles" and "turkey" are added to grocery_list
list_of_many_types.extend([8/5, str])

print(grocery_list)
print(list_of_many_types)

['bread', 'milk', 'garlic', 'cheddar cheese', 'apples', 'squash', 'noodles', 'turkey']
[1, 1.5, <built-in function len>, 'orange', <built-in function print>, [1, 2, 3], True, 1.6, <class 'str'>]
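
The difference between `.append()` and `.extend()` is easiest to see when both are given the same list as an argument. The short lists `a` and `b` below are stand-ins used only for this comparison.

```python
a = [1, 2]
b = [1, 2]

a.append([3, 4])  # adds ONE new element, which happens to be a list
b.extend([3, 4])  # adds each element of the argument separately

print(a)               # [1, 2, [3, 4]]
print(b)               # [1, 2, 3, 4]
print(len(a), len(b))  # 3 4
```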


### Sorting
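`.sort()` allows us to sort both numerical and string lists in the expected way: numbers in ascending order and strings alphabetically. Passing `reverse = True` sorts in descending order instead.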

In [156]:
grocery_list.sort()
print(grocery_list)

num_list = [4, 2, 3, 1, 6, 5]
num_list.sort(reverse = True)

print(num_list)

['apples', 'bread', 'cheddar cheese', 'garlic', 'milk', 'noodles', 'squash', 'turkey']
[6, 5, 4, 3, 2, 1]


Sorting lists with many types will generally not work though! Why not? Uncomment the code below, run the cell, and look at the resulting error.

In [157]:
# list_of_many_types.sort()
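
As a rough illustration of the failure, the sketch below tries to sort a small mixed list (a stand-in named `mixed`) and catches the resulting error. It also shows the built-in `sorted()` function, which is not used elsewhere in this lecture: unlike `.sort()`, it returns a new list and leaves the original unchanged.

```python
mixed = [3, "orange", 1.5]  # stand-in list mixing numbers and a string
try:
    mixed.sort()
except TypeError as err:
    # Python does not know how to order a string relative to a number
    sort_error = str(err)
    print("TypeError:", err)

nums = [4, 2, 3, 1]
sorted_nums = sorted(nums)  # new sorted list; nums itself is not modified
print(sorted_nums)  # [1, 2, 3, 4]
print(nums)         # [4, 2, 3, 1]
```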

If you want to see the rest of the methods for lists, you can create a list, type its name followed by a dot, and use tab completion to see the options.

## Ranges
Ranges are a series of evenly spaced integers, but they are distinct from lists in that these numbers are not all stored in your computer's memory. Instead, the numbers are generated as you iterate through the range or access an index of that range.

In [158]:
n = 5
r1 = range(n) # range of numbers from 0 up to, but not including, n
r2 = range(2, n) # range of numbers from 2 up to, but not including, n
r3 = range(-5, n, 2) # range of numbers from -5 up to, but not including, n in increments of 2
print(r1)
print(r2)
print(r3)
print(r1[2]) # prints third element of r1
print(r2[2]) # prints third element of r2
print(r3[2]) # prints third element of r3

print(list(r1)) # convert range r1 to list and print

range(0, 5)
range(2, 5)
range(-5, 5, 2)
2
4
-1
[0, 1, 2, 3, 4]


In the last line, we convert a range into a list. Viewing the elements of a range as a list makes it clear that the range excludes the last number (in this case, n=5). Why do you think this convention is used? 

**Hint:** Think of slices from earlier. 
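
One way to explore this question is to check a few properties that follow from excluding the end point. The sketch below is only an illustration and reuses the value `n = 5` from the cell above.

```python
n = 5
r = range(n)

count = len(r)                                   # number of values is exactly n
joined = list(range(0, 2)) + list(range(2, n))   # adjacent ranges meet with no overlap or gap
membership = (3 in r, n in r)                    # n itself is not in the range

print(count)       # 5
print(joined)      # [0, 1, 2, 3, 4]
print(membership)  # (True, False)
```

Because `range(n)` produces exactly the valid non-negative indices of a list of length `n`, `range(len(some_list))` lines up with list indexing.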

## Tuples
Tuples are basically **immutable** lists. That means once the tuple has been defined, elements cannot be changed, reordered, removed, or added. Tuples are defined with parentheses `()` instead of brackets. 

In most use cases, lists are a better choice, but tuples can be useful if you want to ensure the data does not change. Ultimately, it is easy to convert from one to the other as needed with `list()` and `tuple()`.

In [159]:
simple_grocery_tuple = ("bread", "milk", "eggs") # manually define tuple
print(simple_grocery_tuple[1:3]) # access elements just like lists
grocery_tuple  = tuple(grocery_list) # convert grocery_list into a tuple
print(grocery_tuple)
print(list(grocery_tuple)) # convert the tuple back into a list again

('milk', 'eggs')
('apples', 'bread', 'cheddar cheese', 'garlic', 'milk', 'noodles', 'squash', 'turkey')
['apples', 'bread', 'cheddar cheese', 'garlic', 'milk', 'noodles', 'squash', 'turkey']
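
To see immutability in action, the sketch below attempts to change an element of a tuple and catches the resulting error. The name `small_tuple` is a stand-in used only here. It then shows the conversion route mentioned above for producing a modified copy.

```python
small_tuple = ("bread", "milk", "eggs")  # stand-in tuple for this example
try:
    small_tuple[0] = "rye bread"
except TypeError as err:
    # Tuples do not support item assignment
    print("TypeError:", err)

# To get a modified version, convert to a list, edit, and convert back
as_list = list(small_tuple)
as_list[0] = "rye bread"
new_tuple = tuple(as_list)

print(small_tuple)  # ('bread', 'milk', 'eggs')
print(new_tuple)    # ('rye bread', 'milk', 'eggs')
```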


## Dictionaries
Dictionaries are mutable associative collections. Each item or value in the collection is associated with a **key** instead of an integer index. Dictionaries are initialized using curly braces `{}` or by passing a list of keys and a list of values, paired up with the `zip()` function, to `dict()`. We will learn more about `zip()` when we cover control flow.

In [160]:
# Manually create a dictionary  where the keys are the good 
# and the values are the price of that good.
simple_price_dictionary = {
    "bread": 3.50,
    "milk": 5.25,
    "eggs": 6.00
} 

# create long price_list
price_list = [2.00, 3.50, 4.99, .88, 4.85,  2.50, 1.88, 13]
# use two lists and zip() to construct a dictionary
full_price_dict = dict(zip(grocery_tuple, price_list)) 


# add new grocery-price pair
full_price_dict["soy sauce"] = 2.75

# update the price of bread -- this is similar to how we do this for lists but we use a key instead of an integer index
full_price_dict["bread"] = 3.75
print(full_price_dict)

{'apples': 2.0, 'bread': 3.75, 'cheddar cheese': 4.99, 'garlic': 0.88, 'milk': 4.85, 'noodles': 2.5, 'squash': 1.88, 'turkey': 13, 'soy sauce': 2.75}


The `.keys()` method gives us all of the keys in our dictionary. It returns a `dict_keys` view object rather than a list; wrap it in `list()` if you need an actual list.

In [161]:
print(full_price_dict.keys())

dict_keys(['apples', 'bread', 'cheddar cheese', 'garlic', 'milk', 'noodles', 'squash', 'turkey', 'soy sauce'])
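
A few related dictionary operations are sketched below using a small stand-in dictionary called `prices`. The methods `.values()`, `.items()`, and `.get()` are standard dictionary methods, though they are not otherwise used in this lecture.

```python
prices = {"bread": 3.50, "milk": 5.25, "eggs": 6.00}  # stand-in dictionary

key_list = list(prices.keys())      # keys as a regular list
value_list = list(prices.values())  # values, in the same order as the keys
pair_list = list(prices.items())    # (key, value) tuples

print(key_list)    # ['bread', 'milk', 'eggs']
print(value_list)  # [3.5, 5.25, 6.0]
print(pair_list)   # [('bread', 3.5), ('milk', 5.25), ('eggs', 6.0)]

print("milk" in prices)      # True: `in` checks the keys, not the values
print(prices.get("squash"))  # None: .get() does not raise an error for a missing key
```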


## Sets
We will not cover sets in this course as they are not commonly used in data science applications. All you need to know about them is that sets try to mimic mathematical sets and as a result are unordered collections of unique elements. 

### A Quick Note on Mutability
We saw that lists and dictionaries are mutable and that tuples are not. It may not be intuitive, but all of the single-item data types we have been working with are also **immutable**. 

Mutability is useful because we can change some elements of a list (or more complicated data structure) without redefining the entire structure every time. Single-item data types do not have elements to change.

Recall the following code from Lecture 2

In [162]:
x = 10
y = x
x = 5
print(y) 

10


Now we will do something similar with a mutable type, in two different ways. First, we reassign `x` just as before.

In [163]:
x = [10]
y = x
x = [8]
print(y) 

[10]


This is the same behavior for the same reason. `x` was assigned to the value `[10]` in memory. By assigning `y` to `x`, we just assigned it to the same value `x` is assigned to, `[10]`. Reassigning `x` just creates a new object in memory which `x` is then assigned to. `y` remains unchanged.

If instead, we change the value of 10 using the following bracket notation, we get a different behavior:

In [164]:
x = [10]
y = x
x[0] = 8
print(y) 

[8]


Now `y` changes! This is because `x[0] = 8` changes the first element of the list `x` points to from 10 to 8. It does **not** reassign `x`! Since `x` and `y` are pointing to the same object and we edited the object `x` was pointing to, `print(y)` now displays the new object, `[8]`.
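
The sketch below is one way to check whether two names point to the same object, using the `is` operator, and to avoid the shared-object behavior by making a copy with the list method `.copy()`. The name `z` is introduced only for this illustration.

```python
x = [10]
y = x          # y refers to the same list object as x
z = x.copy()   # z is a new list object with the same contents

x[0] = 8       # edits the object that both x and y point to

print(y)       # [8]
print(z)       # [10]
print(x is y)  # True: same object
print(x is z)  # False: different objects
```

Making a copy is a reasonable choice when you want to keep the original values around while editing the list.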