# Collection Data Types

The four data types we'll cover next are:

* Lists
* Tuples
* Sets
* Dictionaries

These data types allow for a greater number of operations because they can hold multiple elements.  To ease into dealing with collections, we will first recall some properties of another collection data type: strings.


In [None]:
introduction = "Well, I thought that, maybe, you know, like, possibly?"

Now this introduction has a little lyrical flourish, if you will: it has a large number of pauses, as indicated by the commas. What if you wanted to get rid of those pesky commas?

You learned earlier that the `strip()` function can remove characters from the beginning or from the end of a string. Unfortunately, the commas are in the middle of the introduction, so `strip()` will not work.

There is, however, another function that we can use to split apart a `string`. It is aptly labelled the `split()` function.

In [None]:
print( type( introduction ) )
new_variable = introduction.split(',')
print( new_variable )
print( type( new_variable ) )

`introduction` is a string. Check.

`new_variable` is **not** a string. It is a list! 

Unlike the `strip()` function, the `split()` function returns a list. The list holds the pieces of the string, split apart at every occurrence of the character we gave as input.

We can tell that it's a list because it has multiple elements and it starts and ends with the brackets. The bracket symbol (`[]`) is used to denote a list.
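To actually get rid of the commas, one option is to glue the pieces back together. This is only a minimal sketch: the string method `join()` has not been introduced above, and the names `pieces` and `no_commas` are made up for this example.

```python
introduction = "Well, I thought that, maybe, you know, like, possibly?"

# split(',') cuts the string at every comma and returns a list of pieces
pieces = introduction.split(',')
print(pieces)

# ''.join(...) glues the pieces back together with nothing in between
no_commas = ''.join(pieces)
print(no_commas)
```

Each piece after the first still starts with the space that followed its comma, which is why the words stay separated in the joined string.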

## Lists

A list is an ordered sequence of elements, with that order being specified by the order that the elements are in when the list is created or as elements are added to the list. We create a list by using the `[]` syntax.

Let's go back to our obsession with pets, and imagine we own a pet store that carries a number of different pets.

In [None]:
pets = ['dogs', 'cats', 'fish']
pets

Notice that the list printed the elements in the same order that they were in when we created the list.

If we want to add an element to the list we can simply `append()` the element to the list.

In [None]:
pets.append('hedgehog')

pets

Notice that `append()` changes the value of the variable itself.

If we had multiple additional values that we wanted to add, we could append them individually. Or we could just `extend()` our list `pets` with the list of additional pets that our pet shop carries.

In [None]:
pets.extend(['parrot', 'iguana'])

pets

We can also add two lists together.

In [None]:
pets + ['chameleon', 'hamster']

But notice that the addition symbol doesn't change the value of `pets`.

In [None]:
pets

If we want to add to the list variable, we just set it equal to itself and the new list.

In [None]:
pets = pets + ['chameleon', 'hamster']
pets

And just like with strings, we can use multiplication as well as addition. I'll use a small number here to not flood the screen, but we can multiply the list too.

In [None]:
print( pets * 2 )

But just like before, subtraction and division don't work, since it's not clear what should happen. The cell below raises a `TypeError`.

In [None]:
pets - ['hamster']

As with strings, you cannot access an index that has not been defined.  To avoid that kind of mistake, it is useful to keep track of the number of elements in a list.  You can obtain this number using the built-in function `len()`. Because indices start at 0, the last valid index is `len(pets) - 1`, so the second cell below raises an `IndexError`:

In [None]:
len(pets)

In [None]:
print( pets[len(pets)] )

### Indexing and Slicing

Since a list is made up of multiple elements, it makes sense that we will be able to get a single element by its index or a slice of elements. Remember, to access an element by its index we write `variable[index]`.

In [None]:
pets[1]

And then slicing is just `variable[start_index : stop_index : step]`, where the element at `stop_index` is not included.

In [None]:
pets[1 : 3]

Both of these operations work exactly like they did when we used strings.
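Since the cell above only uses a start and a stop index, here is a small sketch of the other slicing options. The list `example_pets` is made up for this example so that it does not interfere with `pets`.

```python
example_pets = ['dogs', 'cats', 'fish', 'hedgehog', 'parrot', 'iguana']

# Start at index 1 and stop before index 3
print(example_pets[1:3])

# Leave out start and stop, and take every second element
print(example_pets[::2])

# Negative indices count from the end of the list
print(example_pets[-1])

# A step of -1 walks through the list backwards
print(example_pets[::-1])
```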

### Lists are mutable. Strings are not!

Now let's get to a big difference: `list` variables are **mutable**. That means that we can change a single element of the list, and this is the first type we've encountered that will let us do that. Let's say that we stopped selling cats because everyone just looks at them on the internet and now we carry sea monkeys. We could delete `cats` and append `sea monkeys`, or we could just overwrite the position that `cats` is in with `sea monkeys`.

In [None]:
pets[1] = 'sea monkeys'
pets

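Strings, by contrast, are immutable. Trying the same kind of assignment on `introduction` raises a `TypeError`:

In [None]: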
introduction[4] = 'x'
introduction

If we want to find out the index of a value, we can use the list method `index()`. We use indices to access specific parts of a list a lot, so this is helpful.

In [None]:
pets

In [None]:
pets.index('fish')

### Adding and removing elements from a list

Python provides several methods for changing the contents of a list.  We will first look at these two:

* `insert()` - inserts a new element at the specified index
* `pop()` - removes and returns the element at the specified index

In [None]:
pets.insert(5, 'burger')
pets

Wait, that's not a type of pet!

In [None]:
pets.pop(5)

In [None]:
pets

There is also the `remove()` function. With it, we give the value that we want removed rather than its index.

In [None]:
pets.insert(2, 'hamster')
pets

In [None]:
pets.remove('hamster')
pets

The `remove()` function will only remove the **first** instance of the value in a list. If there is more than one entry then you would need to use remove again.

In [None]:
pets.remove('hamster')
pets

If you try to remove a value and it doesn't exist, then Python will raise an error (a `ValueError`) and stop execution of the code.

In [None]:
pets.remove('hamster')
pets

We can also use `del` to remove an item from a `list`. If we don't specify an index, though, it will delete the **entire** variable from memory! So be careful when you use `del`.

In [None]:
del pets[-1]
pets
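To make the warning about `del` concrete, here is a small sketch on a throwaway list. The names `scratch`, `remaining`, and `still_defined` are made up for this example, and the `try`/`except` statement (not covered above) is only there so that the cell does not stop with an error.

```python
scratch = ['dogs', 'fish', 'parrot']

# With an index, del removes just that one element
del scratch[0]
remaining = list(scratch)
print(remaining)

# Without an index, del removes the name itself
del scratch

# Any later use of the name now fails with a NameError
try:
    scratch
    still_defined = True
except NameError:
    still_defined = False

print(still_defined)
```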

So there we go, we can add elements to a list with:

* `append()`
* `extend()`
* `insert()`
* `+`

We can remove elements from a list with:

* `pop()`
* `remove()`
* `del`

### More list methods

Lists have other built-in methods too besides the maintenance functions of adding and deleting elements. For example, you can reverse a list:

In [None]:
print('Initial pets', pets)
pets.reverse()
print('Reversed pets', pets)

And we can also sort the list:

In [None]:
print('Initial pets', pets)
pets.sort()
print('Sorted pets', pets)

The methods `sort()` and `reverse()` act on the variable directly.  If you want to keep the original variable unchanged, use the built-in functions `sorted()` and `reversed()` instead.

Note that `sorted()` returns a new list, while `reversed()` returns an iterator, so to print the reversed elements I have to cast the result as a list.

In [None]:
print( "Reversing the list", list( reversed(pets) ) )

print( "The original list", pets )
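The same check can be made for `sorted()`. The sketch below uses a made-up list, `example_pets`, and made-up variable names to show that both functions leave the original untouched, and that only `reversed()` needs the cast to a list.

```python
example_pets = ['iguana', 'dogs', 'parrot', 'fish']

# sorted() hands back a brand new list
alphabetical = sorted(example_pets)
print(type(alphabetical))
print(alphabetical)

# reversed() hands back an iterator, so we cast it to see the elements
backwards = reversed(example_pets)
print(type(backwards))
backwards_list = list(backwards)
print(backwards_list)

# The original list is unchanged by both calls
print(example_pets)
```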

### Nesting

So far the elements that I've shown in a `list` have been `string`, `float`, or `integer` data types. Does that mean that a `list` can only hold the basic data types?

Nope! Lists can contain other `list` variables, just like this:
<img width='400' style="float:center" src='../images/nesting_dolls.jpg'></img>

Now just to prove it, I'll change our `pets` list to have the different breeds of dogs as an element.

In [None]:
dog_breeds = ['bulldog', 'terrier', 'greyhound']
pets.append(dog_breeds)
pets

But wait, what if we want to access just one of the dog breeds and print it to the screen? How do we do that?

Well I appended the `dog_breeds` variable to `pets`, so why don't we just try to access the last element of `pets` and see what happens.

In [None]:
pets[-1]

A-ha! So the last element of `pets` is the entire `dog_breeds` `list`. 

I bet we can access a single element by its index in that list too. Let's give a `0` index so that we print `bulldog`. Since these `list` variables are nested though, we need to give nested directions to the value. We do that just by putting one set of `[]` next to another `[]`. When Python navigates through your list variable it will always read the indices from left to right, with the leftmost `[]` being the topmost `list`.

In [None]:
pets[-1][0]

Visually I like to think of nested lists like this:

<img src='../images/nesting.png'></img>

Also, we can change the values inside our nested lists just like at our topmost `list` and use all of our normal `list` functions that we used before.

In [None]:
pets[-1][0] = 'golden retriever'
pets

In [None]:
pets[-1].append('pug')
pets

## Tuples

On the surface, tuples appear similar to `list`:

* Both Tuples and Lists contain a sequence of individual elements
* Both Tuples and Lists are stored in the order that they were added
* Both Tuples and Lists can store mixed data types
* We access individual elements with the syntax `variable[index]`

So what is the difference? 

The big difference is that tuples are an **immutable** data type (like strings). This means that none of the built-in functions that modify a variable can be applied to tuples.

In order to indicate this difference between lists and tuples, Python uses a different syntax to create a tuple.
The syntax for creating a `list` uses `[]`. The syntax to create a `tuple` uses `()`. 


In [None]:
# Create a tuple that stores the attributes of our golden retriever, `penny`.

penny = (60, 75, 'yellow') #Length (in), Weight (lbs), Color 
print( type(penny) )
penny

Notice that since it isn't necessarily clear what each of the individual elements stands for in the tuple, I added a comment listing the meaning of each field. As we discussed earlier, comments are signaled in Python by the symbol `#`. Whatever you type in a line after the `#` will be ignored by the Python interpreter.

Because tuples are immutable, element assignment does not work, and neither does deleting an element. Both cells below raise a `TypeError`.

In [None]:
penny[-1] = 'amber'

In [None]:
del penny[1]

We can still use the additive operators with tuples, though. The result is a new tuple; `penny` itself is not changed.

In [None]:
penny + ('amber', 'stinky') #Eye color, breath smell

We just need to be careful if there is only one element in the tuple that we want to add. To specify a tuple with only one element we have to use the syntax `(value, )`, because without the comma Python will treat the parentheses as ordinary grouping parentheses, as in a mathematical expression.

In [None]:
penny + ('amber', )
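A quick way to convince yourself that the comma matters is to check the types. The variable names in this sketch are made up for the example.

```python
# Without the comma, the parentheses only group: this is a plain string
not_a_tuple = ('amber')
print(type(not_a_tuple))

# With the trailing comma, we get a tuple holding one element
one_tuple = ('amber', )
print(type(one_tuple))
print(len(one_tuple))
```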

Otherwise, we can put any data type inside a `tuple`, nest `lists` and `tuples` inside a `tuple`, and index nested `lists` and `tuples` exactly like we do with a `list`.

#### So wait, why are there both lists and tuples? 

I know, right now it's pretty hard to see why you would use one data type over the other. To be honest, it isn't even always that clear after programming for a little bit and typically most beginners just prefer to use `lists` for their ability to be manipulated.

But this is why we have both types of variables in theory.

A `list` will typically hold homogeneous elements. This means that, in theory, every element within a `list` should be of the same category. That's because when we have a `list` we may want to perform an action on every element within it.

A `tuple` will typically hold heterogeneous elements. This means that, in theory, the elements will not all be of the same category.

To make this concrete, with our `pets` list every single element was a pet species. They were different values, but they were all within the same category.

With our `penny` tuple, each element was different. We stored her length, weight, and color. If we changed the color value, the new variable likely wouldn't be `Penny` the dog at all, unless some mischievous kids broke into the store and dyed her fur!

It's this line of thinking that causes some people to refer to a `tuple` as a `record`. The `penny` variable is a record for a specific animal, and if we were to change one of its values then it would no longer be the record for `penny`.

In practice though, Python doesn't enforce your usage of these data types and you can choose to use either one. It's important to be aware of the distinction though, because many functions and programs written by others that we will use later will return a `tuple` back. Just remember to not try and change the contents of a `tuple`!
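As one example of a function that returns a `tuple`, the built-in `divmod()` gives back a quotient and a remainder together. The sketch below also unpacks the tuple into two names, a technique that has not been covered above; the variable names are made up for the example.

```python
# divmod(17, 5) returns the pair (quotient, remainder)
result = divmod(17, 5)
print(type(result))
print(result)

# We can read the elements by index...
print(result[0])

# ...or unpack the tuple into two separate variables
quotient, remainder = result
print(quotient, remainder)
```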

## Sets

A Python set is a collection that prevents the repetition of values from occurring.  This is particularly useful if you are organizing a set of records that are assigned different keys that you do not want to repeat. 

Another important characteristic of sets is that they are unordered collections of elements. 

The syntax to create a `set` uses `{}`. 

In [None]:
unique_pets = {'bulldog', 'hamster', 'parrot'}

unique_pets

If we try to make a set in which the same element appears more than once, the repeated occurrences will be discarded.

In [None]:
not_so_unique_pets = {'bulldog', 'hamster', 'parrot', 'hamster'}
not_so_unique_pets

We can cast a `list` or `tuple` into a `set`, thus discarding all duplicate elements.

In [None]:
not_so_unique_pets_list = ['bulldog', 'hamster', 'parrot', 'hamster']
print(not_so_unique_pets_list)
set(not_so_unique_pets_list)

Sets are mutable like a `list`, but they don't use the same built-in functions. To add an item, you use the built-in function `add()`:

In [None]:
unique_pets.add('gerbil')
unique_pets

To remove an element from a `set`, you use the built-in function `discard()`: 

In [None]:
unique_pets.discard('parrot')
unique_pets
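One detail worth knowing is what happens when the element is not in the set. As a sketch (the set `example_set` and the flag `raised` are made up for this example, and `try`/`except` has not been covered above): `discard()` quietly does nothing, while the related set method `remove()` raises a `KeyError`.

```python
example_set = {'bulldog', 'hamster'}

# discard() on a missing element: nothing happens, no error
example_set.discard('unicorn')
print(example_set)

# remove() on a missing element raises a KeyError
try:
    example_set.remove('unicorn')
    raised = False
except KeyError:
    raised = True

print(raised)
```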

**Sets cannot be added or multiplied!**

The built-in operations for acting on sets are  `intersection`, `union`, and `difference`.

In [None]:
{'parrot', 'bulldog'} + {'gerbil'}

In [None]:
{'parrot', 'bulldog'} * 2

In [None]:
store_one = {'bulldog', 'parrot', 'hamster', 'fish'}
store_two = {'fish', 'parrot', 'terrier', 'cat'}

print( store_one.intersection(store_two) )

print( store_one.union(store_two) )

print( store_one.difference(store_two) )

`intersection` and `union` give the same result when the order of the variables is reversed, but `difference` does not. For example:

In [None]:
store_two.difference(store_one)
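To check directly what swapping the two sets does, we can compare the results with `==`. This sketch uses made-up names (`shop_a`, `shop_b`) and also shows the operator spellings `&`, `|`, and `-`, which are not introduced above but do the same job as the three methods.

```python
shop_a = {'bulldog', 'parrot', 'hamster', 'fish'}
shop_b = {'fish', 'parrot', 'terrier', 'cat'}

# Swapping the sets makes no difference for intersection and union
print(shop_a.intersection(shop_b) == shop_b.intersection(shop_a))
print(shop_a.union(shop_b) == shop_b.union(shop_a))

# For difference, the order matters
print(shop_a.difference(shop_b))
print(shop_b.difference(shop_a))

# Operator versions of the same three operations
print(shop_a & shop_b)
print(shop_a | shop_b)
print(shop_a - shop_b)
```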

## Converting sets, lists, and tuples

Since `sets`, `lists`, and `tuples` are all collections of elements, Python allows us to easily convert variables from one type to another as needed. You just need to cast the variable.

In [None]:
list_unique_pets = list( unique_pets )
print(list_unique_pets)
type(list_unique_pets)

In [None]:
tuple_unique_pets = tuple( list(unique_pets) )
print(tuple_unique_pets)
type(tuple_unique_pets)

In [None]:
reset_unique_pets = set(tuple_unique_pets)
print(reset_unique_pets)
type(reset_unique_pets)

## Dictionaries

A Python dictionary is an extraordinarily useful data type that expands on the possibilities offered by lists.  In a list one keeps track of the elements by an index that must be an integer.  **Dictionaries keep track of elements by `key`!**

Each element in a dictionary is an `item`, and every `item` has both a `key` and a `value`. You use the `key` to "look up" the `value`. This concept is just like if we wanted to look up the meaning of a word in a real dictionary. Also, just like in a real dictionary, it means that all of the `keys` **must** be unique. If we had a `key` multiple times, then we wouldn't know where to go look up its `value`. Moreover, values are looked up by `key`, never by position. (Since Python 3.7 a dictionary remembers the order in which its keys were inserted, but you still cannot index it by position.)

**Thus, the keys in a dictionary form a set!**

The syntax to create a dictionary also uses `{}`. `key-value` pairs are separated by commas, and the key is separated from the value by a colon:

`a_dict = {key : value, another_key : another_value}`


In [None]:
# Create a dictionary with the weights of our pets

weights = {'penny': 75, 'jenny': 0.5, 'benny': 10}
weights

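In older versions of Python (before 3.7), the `keys` in a dictionary were **unordered**, so your variable might not print out in the same order as that of your seatmates. From Python 3.7 on, a dictionary keeps its `keys` in the order in which they were inserted.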


In [None]:
# Access the value corresponding to 'jenny'

weights['jenny']

Dictionaries are **mutable**. You can change the value of an element by re-assigning its value.

Let's say that Jenny the fish ate all the food in the fish tank and has now doubled in size.

In [None]:
weights['jenny'] = 1.0

weights

If you want to add a new `key-value` pair to the dictionary, you access a new `key` and assign it a value:

In [None]:
weights['wendy'] = 5
weights

**Dictionaries also cannot be added or multiplied!**

The built-in method for adding multiple `key-value` pairs to a dictionary is `update()`. Trying to use `+` instead raises a `TypeError`.


In [None]:
weights + {'jeff': 10, 'bobby': 8}

In [None]:
weights.update({'jeff': 10, 'bobby': 8})
weights

To remove a `key-value` pair from a `dict` variable, you can use `del` and give the `key` (make sure that the `key` exists, or else Python will raise a `KeyError`!).

In [None]:
del weights['jenny']
weights
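One way to make sure that a `key` exists before touching it is the `in` operator. This is only a sketch on a made-up dictionary, `example_weights`; the `if` statement and the dictionary method `get()` have not been covered above.

```python
example_weights = {'penny': 75, 'wendy': 5}

# `in` checks whether a key is present
print('penny' in example_weights)
print('jenny' in example_weights)

# Only delete the key if it is really there
if 'jenny' in example_weights:
    del example_weights['jenny']

# get() returns None instead of raising an error for a missing key
print(example_weights.get('jenny'))
print(example_weights.get('penny'))
```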

Dictionaries also have the `pop()` function built in. Given a `key`, it deletes the `key-value` pair and returns the `value`.

In [None]:
weights.pop('benny')

In [None]:
weights

### Accessing all keys or all values

Python has built-in functions that can retrieve all keys or all values used in a dictionary.  In Python 2, these functions returned lists. In Python 3, they return view objects, which cannot be indexed.

In order to obtain lists, you cast the views as lists.

In [None]:
weights.keys()

In [None]:
weights.keys()[1]

In [None]:
list(weights.keys())[1]

In [None]:
weights.values()

To retrieve all the `key-value` pairs, you use the built-in function `items()`:

In [None]:
weights.items()

This is a view object that can be used to create a list of tuples.
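As a small sketch of that last step, casting the result of `items()` gives a list in which every element is a `(key, value)` tuple. The dictionary `example_weights` and the name `pairs` are made up for this example.

```python
example_weights = {'penny': 75, 'wendy': 5}

# Cast the result of items() to get a list of (key, value) tuples
pairs = list(example_weights.items())
print(pairs)

# Each element of the list is a tuple
print(type(pairs[0]))

# Nested indexing works as usual: first pair, then its value
print(pairs[0][1])
```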

### Nesting

Just like a `list` or `tuple`, we can nest collection data types inside a `dict` variable. 

In [None]:
our_pets = {
    'penny' : {'weight' : 75,
               'length' : 60,
               'color' : 'yellow'
              },
    'bobby' : {'weight' : 10,
               'length' : 30,
               'color' : 'brown'
              }
}

To retrieve information about `penny` we can now use keys that are informative:

In [None]:
print( our_pets['penny'] )
print( our_pets['penny']['weight'] )

## Copying variables

Variable names are human readable ways to point the computer toward specific positions in memory. Because memory is limited, it makes sense to not waste it.

Unfortunately, this has some consequences when we copy variables.

Let's go back to our multiple pet stores example. You're a pet shop owner:

In [None]:
store_one = ['beagle', 'terrier', 'bulldog', 'fish']
store_one

You open a second store with the same inventory as the first one.

So, you can just copy the inventory of the first store.

In [None]:
store_two = store_one
print( store_one )
print( store_two )

In the meantime, `store_one` sold the `beagle`, and acquired a `parrot` and a `gerbil`:

In [None]:
store_one.remove('beagle')
store_one.extend(['parrot', 'gerbil'])

Let's check your inventories:

In [None]:
print( store_one )
print( store_two )

**Both inventories were changed!** 

The reason is that the statement

`store_two = store_one`

did not copy the value of `store_one` to a new place in memory labelled by `store_two`. It just made the name `store_two` point to the same position in memory as `store_one`. 
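One way to see this for yourself is the `is` operator, which checks whether two names point to the very same object in memory. This is only a small sketch: `is` has not been introduced above, and the names `first`, `second`, and `third` are made up for the example.

```python
first = ['beagle', 'fish']

# Plain assignment: both names point to the same list
second = first
print(second is first)

# list(...) builds a new list with the same elements
third = list(first)
print(third is first)
print(third == first)

# Changing the list through one name is visible through the other
first.append('parrot')
print(second)
print(third)
```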

If you actually want to copy the value of `store_one` to a new place in memory labelled by `store_two`, you have to use a function named `copy()`, or `deepcopy()` if the variable we are copying is nested.

**Every distribution of Python comes with these two functions, but unlike other built-in functions, we have to import them before we can use them!**

In [None]:
from copy import copy

store_one = ['beagle', 'terrier', 'bulldog', 'fish']
store_two = copy(store_one)
store_one.remove('beagle')
store_one.extend(['parrot', 'gerbil'])

print('Store One', store_one)
print('Store Two', store_two)

Right now it's not important to understand libraries or anything like that, I just wanted to show you that **it is important** to understand how copying variables works. Accidentally setting one variable name equal to another and not remembering it is an easy way for errors to creep into your programs. Make sure to be careful!
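The difference between `copy()` and `deepcopy()` only shows up with nested variables. Here is a minimal sketch with a made-up nested inventory (`nested_store`, `shallow`, and `deep` are names invented for this example):

```python
from copy import copy, deepcopy

# An inventory with a nested list of dog breeds
nested_store = ['fish', ['bulldog', 'terrier']]

shallow = copy(nested_store)      # new outer list, same inner list
deep = deepcopy(nested_store)     # new outer list and new inner list

# Change the nested list through the original variable
nested_store[1].append('pug')

print('Original', nested_store)
print('Shallow ', shallow)
print('Deep    ', deep)
```

The shallow copy still shares the inner list with the original, so the `pug` shows up there too. Only the deep copy is fully independent.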

# Exercises

You are now the manager of the pet store! I'm going to need you to manage the inventory of pets that we have at the store.

In [None]:
pet_store = ['beagle', 'parrot', 'iguana', 'gerbil', 'chameleon', 'fish']

We've sold out of `chameleon` and `iguana`, so you need to remove them from the inventory. Use both `remove()` and `del` to do so.

We just got in a `terrier`, `chameleon`, `bulldog`, and another `terrier`. Please add all of those animals to the `pet_store`.

Now, how many **unique** animal types do we currently have in the store?

How many `terrier` dogs do we have in the store? (Hint: the `list` data type has a built-in function called `count()`)

As an owner, I actually have a very bad obsessive-compulsive disorder. Could you please reverse-alphabetically sort the pet types at the store? Thanks so much! Also going to need you to work this entire weekend. For free.

Could you print from `beagle` to `fish` but display it in alphabetical order?

What is the index of `parrot`?

Now let's go back to our pet attribute dictionary.

In [None]:
attributes = {
    'penny' : {'weight' : 75,
               'length' : 60,
               'color' : 'yellow',
               'animal' : 'golden retriever',
              },
    'bobby' : {'weight' : 10,
               'length' : 20,
               'color' : 'brown',
               'animal' : 'cat',
              }
}

Please add `peter` the `parrot` to the `attributes` dictionary (make up your own `weight`, `length`, and `color` attributes).

What are all the names of the animals that we currently have in `attributes`?

Does `bobby` weigh more than `peter`? (Remember equivalencies from the basic data types?)

`bobby` actually got into the catnip while you were on break, and then ripped open a box of treats and ate all of them. lol, he's just so cute!

Please update his weight to be twice as much as before.

Actually `bobby` is so cute I'm just going to take him home with me. Please take him out of the `attributes` dictionary so no one else will take him!

Great job today! Keep this up and I just might promote you!

Exercises completed!
