# Part 3 - Collections & Iterables
This is a pretty big lesson and we have a lot of ground to cover. 

You're familiar with arrays and objects from JS. We can group data together in various useful ways like dynamic arrays, sets, key value pairs, and so on. These different structures are referred to as collections.

Looping through the elements of an array or the keys in an object is a very common operation, common enough that it has been generalized into the concept of an "iterable". 

In JavaScript, collections have their own methods: functions bound to the object in question. Python has these too, but if a concept is more general it is made into a standalone function that can be passed several different types of objects and knows how to work with each one. This concept is referred to as "duck typing" (if it looks like a duck and quacks like a duck it's a duck, seriously) and if it seems abstract now I'll give lots of examples.
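
To make that a bit more concrete, here's a minimal sketch (the variable names and values are made up for illustration) of two built in functions each happily accepting several different types:

```python
# Hypothetical sample data, just for illustration
numbers = [3, 1, 2]      # a list
word = "duck"            # a string
pair = (10, 20)          # a tuple (covered later in this lesson)

# len doesn't care which type it gets, as long as the object has a length
print(len(numbers))  # 3
print(len(word))     # 4
print(len(pair))     # 2

# sorted works on anything it can loop through and always returns a new list
print(sorted(numbers))  # [1, 2, 3]
print(sorted(word))     # ['c', 'd', 'k', 'u']
```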

## Lists
Python calls dynamic arrays lists. **Python lists are not linked lists**. They are regular dynamic arrays with the expected time complexities for different operations. 

In [10]:
nums = [5, 7, 20, 14, 9, 8] # A list of integers.

We can access a value at an index with brackets like so.

In [11]:
print( nums[3] ) # prints 14, the integer at index 3
nums[3] = 16 # sets the integer at index 3 to 16
print(nums)

14
[5, 7, 20, 16, 9, 8]


Unlike JS, negative indices are supported. Check it out:

In [12]:
print( nums[-1] ) # prints 8, the last thing in the list
print( nums[-2] ) # prints 9, the second to last thing in the list

8
9


So that's handy. We can add a number to the end of the list with append. It's the equivalent of push.

In [6]:
nums.append(10)
print(nums)

[5, 7, 20, 16, 9, 8, 10]


We can remove something from the end of a list with pop.

In [7]:
val = nums.pop()
print(nums) # the original list
print(val) # the value we popped from the list

[5, 7, 20, 16, 9, 8]
10


If we want to remove a particular element we can pass pop an optional argument, an index to pop from.

In [8]:
nums.pop(2) # pop whatever is at index 2 (20)
print(nums) # no more 20!

[5, 7, 16, 9, 8]


We can also insert a number at an arbitrary point. Let's say we want to put 20 back at position 2.

In [9]:
nums.insert(2, 20) # first argument is the index, second argument is the object to insert.
print(nums)

[5, 7, 20, 16, 9, 8]
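
The same two methods also cover the front of a list, which is the job JS gives to unshift and shift. A small sketch, using a throwaway list so we don't disturb nums:

```python
letters = ["b", "c", "d"]  # throwaway example list

letters.insert(0, "a")     # like unshift: add to the front
print(letters)             # ['a', 'b', 'c', 'd']

first = letters.pop(0)     # like shift: remove from the front
print(first)               # a
print(letters)             # ['b', 'c', 'd']
```

Both of these have to move every other element over by one, so they are O(n) operations on a dynamic array.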


None of that shift unshift crap. You're welcome. To get the length of a list we don't use a length property like in JS. We use a built in function called len.

In [14]:
print(nums)
print( len(nums) )

[5, 7, 20, 16, 9, 8]
6


There are six things in the list! How can you determine if something is in a list? Easy! We have the in operator.

In [15]:
if 7 in nums:
    print("found a 7!")
if 50 not in nums:
    print("no 50 here")
print( 9 in nums) # True
print( 13 in nums) # False

found a 7!
no 50 here
True
False


## Slicing and Concatenation

How do you concatenate lists? Couldn't be simpler.

In [1]:
list1 = [1, 2, 3]
list2 = [4, 5, 6]
print(list1 + list2)

[1, 2, 3, 4, 5, 6]


It's just a plus sign! Like with strings. So that's nice and tidy. You might wonder what happens if you do that in JavaScript? Don't. Maintain your innocence.
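
One thing worth spelling out: `+` builds a brand new list and leaves both originals alone. If you'd rather grow a list in place, the list method `extend` does that. A quick sketch:

```python
list1 = [1, 2, 3]
list2 = [4, 5, 6]

combined = list1 + list2   # a new list; list1 and list2 are untouched
print(combined)            # [1, 2, 3, 4, 5, 6]
print(list1)               # [1, 2, 3]

list1.extend(list2)        # modifies list1 in place
print(list1)               # [1, 2, 3, 4, 5, 6]
```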

So let's talk slicing. Slicing is a big deal in Python. Manipulating strings and lists and things in a clear, concise way is one of the things Python excels at. Let's introduce the slice syntax:

In [3]:
ls = [4, 2, 10, 11, 7, 8, 9]
sublist = ls[2: 4]
print(sublist)

[10, 11]


So we create a list as normal. But then we see some special syntax: `ls[2: 4]`. What's up with that?

Within the brackets we supply two numbers separated by a colon. The first number is the index to start on. You can see that 10 is at index 2 in ls. The second number is the index to end on, meaning the index of the first element *not* included. 7 is at index 4 in our list so our slice only contains two numbers: 10 and 11.

These start and end indices are actually optional. They have the default values of 0 and the length of the list respectively. Let's look at some examples:

In [4]:
ls = [4, 2, 10, 11, 7, 8, 9]
print( ls[:] ) # Get a shallow copy of the list
print( ls[:4] ) # Starting from index 0 get the first four elements in the list up to index 3.
print( ls[4:] ) # Starting from index 4 get the rest of the list.

[4, 2, 10, 11, 7, 8, 9]
[4, 2, 10, 11]
[7, 8, 9]
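
That "shallow copy" comment deserves a quick illustration. Here's a sketch, assuming a made up list with another list nested inside it: the copy is a new list, but any mutable objects inside it are shared with the original.

```python
original = [1, 2, [3, 4]]   # hypothetical list with a nested list inside
copy = original[:]          # shallow copy via slicing

copy[0] = 100               # replacing an element only affects the copy
print(original)             # [1, 2, [3, 4]]
print(copy)                 # [100, 2, [3, 4]]

copy[2].append(5)           # the nested list is shared by both
print(original)             # [1, 2, [3, 4, 5]]
print(copy)                 # [100, 2, [3, 4, 5]]
```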


Negative indices are still supported. They are really handy for working with the end of a list. Like so:

In [7]:
ls = [4, 2, 10, 11, 7, 8, 9]
print( ls[-1:] ) # Get a list with just the last element of ls in it.
print( ls[:-1] ) # Get a list with everything *but* the last element of ls.
print( ls[-4:] ) # Get the last four elements from the list.

[9]
[4, 2, 10, 11, 7, 8]
[11, 7, 8, 9]


So that's nifty. I'm introducing this syntax now so it doesn't blindside you later. As I say it's a very powerful and concise way of working with lists and other collections. Stay tuned!
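
As a small preview (a sketch with made up values), the same slice syntax works on strings and on tuples, which we'll meet later in this lesson. Each slice comes back as the same type you started with:

```python
word = "Pythonic"
point = (1, 2, 3, 4)

print(word[:6])    # Python
print(word[-3:])   # nic
print(point[1:3])  # (2, 3)
```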

## For Loops
You might have been surprised earlier when I covered while loops but not for loops. Well guess what hot stuff now's the time.

For loops in Python work a lot like `for...of` loops in JS. Let's look at an example:

In [8]:
# You want a sum of numbers? I got one right here for ya
ls = [4, 2, 10, 11, 7, 8, 9]
total = 0

for number in ls:
    print(number)
    total += number

print("Your sum is", total, "ya doofus. Whaddya want from me?")

4
2
10
11
7
8
9
Your sum is 51 ya doofus. Whaddya want from me?


As you can see we loop through ls, printing each number and adding it to total. Then we print out the total in an exceedingly broad East Coast accent. As you do. Nice and tidy isn't it? Consider how we would do this with a while loop:

In [12]:
ls = [4, 2, 10, 11, 7, 8, 9]
total = 0
# Gross
i = 0

while i < len(ls):
    print(ls[i])
    total += ls[i]
    i += 1

print("Total:", total)

4
2
10
11
7
8
9
Total: 51


Gross. Repulsive. Gauche. Forget to increment i? Infinite loop. This approach is also a little slower for what it's worth.

If you spend any time online reading about Python you will hear the word "Pythonic" a lot. Python programmers are more concerned with beauty, aesthetics, and philosophy than trivialities like "will it run?" or "is it useful?"

I kid. Still, a lot of focus is put on best practices and clarity in the Python community.

What if we want to change a list in place though? Let's write a loop that doubles each number in a list.

In [7]:
ls = [4, 2, 10, 11, 7, 8, 9]
i = 0

while i < len(ls):
    ls[i] *= 2
    i += 1

print(ls)

[8, 4, 20, 22, 14, 16, 18]


How could we do that with a for loop? We'll see soon enough.

## Ranges
What we would like to do is print out some indices, let's say from 0 to 9. As we've seen we could do this with a while loop but that's not ideal. How do we do it with a for loop? Introducing: the range function!

In [8]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


So what is range? Well in Python 2 it would generate and return a list, something like this:

```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

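Nowadays it returns a range object, which works the numbers out on demand instead of building the whole list up front. (You'll often hear it called a generator. Strictly speaking it isn't one, but generators are a concept we'll get to eventually.) What matters now is that ranges are *iterable* just like lists are. You can loop through them with a for loop and Python will handle keeping track of your place in an iterable and stop at the end automatically.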

In the above example we just passed it a number. That produces a range object iterating from 0 to one less than that number. Let's look at how we could sum the numbers between 1 and 10.

In [11]:
total = 0
for i in range(1, 11):
    print(i)
    total += i
print("Total:", total)

1
2
3
4
5
6
7
8
9
10
Total: 55


What if you want to skip numbers? We can pass it a third optional argument to set the step size like so:

In [12]:
for i in range(1, 11, 2):
    print(i)

1
3
5
7
9


That just prints the odd numbers between 1 and 10. Nice!
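
If you want to see what a range will produce without writing a loop, one option is to hand it to the list function. A small sketch, including a negative step for counting down:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(1, 11, 2)))   # [1, 3, 5, 7, 9]
print(list(range(10, 0, -2)))  # [10, 8, 6, 4, 2]
```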

Now I hesitate to show you this next bit because it is *not* Pythonic, heaven forbid. But we don't have quite enough knowledge for the Pythonic way yet. Let's double the numbers in a list in place.

In [14]:
ls = [4, 2, 10, 11, 7, 8, 9]

for i in range( len(ls) ):
    ls[i] *= 2

print(ls)

[8, 4, 20, 22, 14, 16, 18]


So this is how we can write a C style (and JavaScript style) for loop in Python. We get the length of the list (7), create a range object from 0 to 6, and iterate through it using those numbers as indices into the list. This is a common pattern for people just learning Python, but it's not great. Later on we will learn the enumerate function and everything will be well.

## Strings and String Formatting
Ah strings. Nice and simple. Familiar. You got your hello worlds, print your name, that kind of fun stuff. For the following examples we'll be using the following string:

In [20]:
s = "I am the very model of the modern Major General."

How long is the string s? Is there a .length property? Well you've already seen how to get the length of a list. It was the standalone len function. Let's give it a shot:

In [21]:
print( len(s) )

48


And just like that we have our length. The len function works on anything that has a length; it isn't tied to a specific type. Let's slice off the front of the string. Starting from index 0 and going to index 18:

In [24]:
print( s[:19] )

I am the very model


Tidy! Now we can loop through the characters in a string. We can even iterate through our slice! Let's do that:

In [25]:
for c in s[:19]:
    print(c)

I
 
a
m
 
t
h
e
 
v
e
r
y
 
m
o
d
e
l


So how do we split a string? Is there a standalone function for that? No as it happens, that's a string method.

In [26]:
print(s.split())

['I', 'am', 'the', 'very', 'model', 'of', 'the', 'modern', 'Major', 'General.']


We called split and that split the string by its spaces. Splitting is specific to strings. We can ask for the length of all sorts of things but we can only split strings. That's why split is a string method.

We can pass split a separator if we want to split on something else.

In [27]:
csv = "It's,a,me,a,CSV"
print(csv.split(","))

["It's", 'a', 'me', 'a', 'CSV']


Back to our original string, let's print out every word in our sentence.

In [28]:
for word in s.split():
    print(word)

I
am
the
very
model
of
the
modern
Major
General.


Ok so we've talked about splitting, how about joining? Is there a list method for join like there is an array method in JS? In Python join is actually... a string method.

In [29]:
ls = ["Join", "me", "up", "before", "you", "go-go"]
print( "".join(ls) ) # Empty separator
print( " ".join(ls) ) # Single space separator
print( ",".join(ls) ) # Comma separator
print( "\t".join(ls) ) # Tab separator

Joinmeupbeforeyougo-go
Join me up before you go-go
Join,me,up,before,you,go-go
Join	me	up	before	you	go-go


This might be a little weird. It's pretty unique to Python as far as I'm aware. But it follows that joining is something you only do with strings. If you added a join method to lists Python would have to decide how to join lists of almost anything *and* it wouldn't work on other iterables. This is Python sticking to its guns and once you get used to it it should come naturally.
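
To see that in action, here's a sketch of join accepting things other than lists, plus what happens when the elements aren't strings:

```python
print("-".join(("a", "b", "c")))   # a tuple works: a-b-c
print("-".join("abc"))             # so does a string, one character at a time: a-b-c

# split and join undo each other when the separator matches
csv = "It's,a,me,a,CSV"
print(",".join(csv.split(",")) == csv)  # True

# every element has to be a string already, so numbers are rejected
try:
    ", ".join([1, 2, 3])
except TypeError as err:
    print("TypeError:", err)
```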

There are a bunch more string methods, lower, upper, capitalize. But I'd say it's time to move on. We'll pick up the stragglers as we need them.
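
For the curious, here's a quick sketch of those three (the example string is made up). Each one returns a new string and leaves the original alone:

```python
title = "hello WORLD"   # made up example string

print(title.lower())       # hello world
print(title.upper())       # HELLO WORLD
print(title.capitalize())  # Hello world

# the original string is unchanged
print(title)               # hello WORLD
```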

## Tuples, just worse lists?
"Riley?" you ask me "Lists just have too many damn features. Could I have a list that does less?"

Well my hypothetical student you are in luck. Python includes a datatype called the tuple, which is like a list but can't be changed once created.

In [31]:
tu = (5, 10, 20, 40) # just looks like a list, but with parens instead of brackets
print(tu)
print(tu[2])

(5, 10, 20, 40)
20


So far so good. Tuples look a lot like lists but they use parentheses instead of square brackets. We can print them, read from them, all good. But what if you try to modify one?

In [32]:
tu = (5, 10, 20, 40)
tu[2] = 60

TypeError: 'tuple' object does not support item assignment

No dice. Tuples are immutable. In fact the only mutable type we have looked at so far is the list. Everything else (integers, floats, bools, strings, and of course tuples) is immutable. This is a very important and fundamental feature of Python so it's worth spending some time on.

Note that immutable objects are not like constant variables in JavaScript. Python does not have const. What immutability means is that the object cannot be modified *in place*. If we want to change the contents of a tuple we have to create a new one entirely.

In [12]:
tu = (5, 10, 20, 40)
ls = list(tu) # create a new list based on the contents of tu
ls[2] = 60 # modify the contents of the list
tu = tuple(ls) # create a *new* tuple based on ls and assign it to the same variable as before
print(tu)

(5, 10, 60, 40)


This is just an example. The above is not very *Pythonic*.
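
One arguably tidier route, sketched here with the slicing and concatenation from earlier, is to stitch a new tuple together from pieces of the old one:

```python
tu = (5, 10, 20, 40)

# everything before index 2, then the new value, then everything after index 2
tu = tu[:2] + (60,) + tu[3:]   # (60,) is a one element tuple, the comma is required
print(tu)                      # (5, 10, 60, 40)
```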

We can do something like JavaScript's destructuring to tuples (it works for lists too). Observe:

In [13]:
a, b, c, d = tu
print(a)
print(b)
print(c)
print(d)

5
10
60
40
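
The same unpacking trick gives Python a neat way to swap two variables without a temporary. A small sketch with made up values:

```python
x, y = 1, 2
x, y = y, x     # the right side is packed into a tuple, then unpacked
print(x, y)     # 2 1

first, second = [10, 20]   # lists unpack the same way
print(first, second)       # 10 20
```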


Here is the question though: why. Why bother with a collection type that is like a list but has fewer features? All shall be revealed.

## Sets
Let me *set* you up for success. Ahem. Let's start with a formal definition for once.

> A set is an unordered collection of unique, immutable objects.

Let's break that down a bit. Unordered means that, unlike a list, elements in a set don't have indices. They can't be accessed by their position in the set.

Each element has to be unique, simple enough. An element is either in the set or not. There cannot be duplicates.

Finally each element has to be immutable. We just talked about this but almost every type we have discussed so far is immutable, the exception being lists. Well now we have a new one: sets themselves are mutable. That means, sad to say for math nerds, in Python sets cannot contain sets, at least without a little work. Alack, alas.
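
Here's a sketch of both halves of that. Adding a list to a set fails, and one way to do the "little work" for nesting is the built in frozenset, an immutable version of set:

```python
numbers = set([1, 2, 3])

# A list is mutable, so it can't be an element of a set
try:
    numbers.add([4, 5])
except TypeError as err:
    print("TypeError:", err)

# A regular set can't go in either, but a frozenset can
nested = set([frozenset([1, 2]), frozenset([3])])
print(frozenset([1, 2]) in nested)   # True
print(len(nested))                   # 2
```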

In [6]:
# We have a list of names but we have some duplicates. Let's get rid of those rascals.
names = ["Jane", "Sally", "Ted", "Sally", "Alex", "Sally", "Ted"]
names_set = set(names) # we can convert most things into a set with the set function
print(names_set)

{'Sally', 'Ted', 'Jane', 'Alex'}


What happens if we try to loop through a set? Is that allowed?

In [7]:
for name in names_set:
    print(name)

Sally
Ted
Jane
Alex


Well would you look at that. When you print out a set its elements come out in an order. And when you iterate through it you do so in an order. So how is this an unordered collection?

Well for certain applications Python needs to decide on an order. For example if we took the set and converted it into a list then it would become ordered. What matters is that the order Python picks is *arbitrary*. It comes down to some arcane implementation details deep in the Python interpreter. Conceptually, philosophically, sets are unordered.
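
If you do need a predictable order, one common approach is to pass the set to sorted, which returns a new list. A sketch:

```python
names_set = set(["Jane", "Sally", "Ted", "Sally", "Alex"])

ordered = sorted(names_set)   # always a list, sorted alphabetically here
print(ordered)                # ['Alex', 'Jane', 'Sally', 'Ted']
```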

## Dictionaries
Dictionaries are what Python calls hashmaps with key value pairs. Like the (poorly named) objects of JavaScript. There are some important differences though. For one, dictionaries do not support JS style dot notation for looking up keys. They are objects (in the object oriented sense, see what a bad name it is!) and so their methods are accessed by dot syntax. Let's make a simple phone book.

In [2]:
# A phone book
pb = {
    "Jane": 1111,
    "Sally": 2222,
    "Ted": 3333,
    "Alex": 4444,
}
print(pb["Jane"])

1111


So far so good. What happens if we try to read from an entry that doesn't exist? In JS it just comes back as undefined. In Python?

In [3]:
print( pb["Clair"] )

KeyError: 'Clair'

Oh snap! In Python that's a straight up error, hot stuff. Now there is a data structure in the standard library called a "defaultdict" that lets you set a default value in this case, but Python makes it an error because it loves you. It doesn't want you making silly mistakes babe. So how do we check if something is a key? We use the in operator.

In [4]:
if "Alex" in pb:
    print("Alex is here!")
if "Clair" not in pb:
    print("Clair isn't")
print("Jane" in pb) # True
print("Frank" in pb) # False

Alex is here!
Clair isn't
True
False
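
Besides checking with in, there are a couple of other ways to deal with a missing key. This sketch shows the dict method get, which returns a fallback instead of raising, and the defaultdict mentioned above (it lives in the collections module). The `visits` counter is made up for illustration:

```python
from collections import defaultdict

pb = {"Jane": 1111, "Sally": 2222}

print(pb.get("Jane"))        # 1111
print(pb.get("Clair"))       # None, no KeyError
print(pb.get("Clair", 0))    # 0, our own fallback value

# defaultdict calls the function we give it (int() returns 0) for missing keys
visits = defaultdict(int)    # hypothetical counter of visits per name
visits["Jane"] += 1
print(visits["Jane"])        # 1
print(visits["Ted"])         # 0, and "Ted" has now been added as a key
```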


`in` works for other kinds of data structures too.

In [14]:
names = ["Jane", "Sally", "Ted", "Sally", "Alex", "Sally", "Ted"]
names_set = set(names) # we can convert most things into a set with the set function
print("Jane" in names) # true
print("Jane" in names_set) # also true

True
True


So if you can check if something exists in a list then why even bother with sets? In short because they are much faster. Looking up if something is in a set or a key in a dictionary is an O(1) operation on average, whereas looking up if something is in a list is an O(n) operation. Keep that in mind if you're working with larger datasets.
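
To get a feel for the difference, here's a rough sketch using timeit from the standard library. The sizes are arbitrary and the exact timings will vary from machine to machine, so treat the numbers as illustrative only:

```python
from timeit import timeit

n = 100_000                      # arbitrary size for the experiment
big_list = list(range(n))
big_set = set(big_list)
target = n - 1                   # worst case for the list: the last element

# Run each membership test 100 times and report the total seconds
list_time = timeit(lambda: target in big_list, number=100)
set_time = timeit(lambda: target in big_set, number=100)

print("list:", list_time)
print("set: ", set_time)
```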

A common use case for dictionaries is tallying up data. There is another built in object for this in the standard library but we'll roll our own. We're cool that way.

In [5]:
names = ["Jane", "Sally", "Ted", "Sally", "Alex", "Sally", "Ted"]

tally = {} # we can create an empty dict with a pair of curly braces

for name in names:
    if name not in tally:
        tally[name] = 0
    tally[name] += 1

print(tally)

{'Jane': 1, 'Sally': 3, 'Ted': 2, 'Alex': 1}
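
For comparison, Counter from the standard library's collections module handles this kind of tallying. A sketch of the same count (the name `counts` is just chosen here so it doesn't clobber tally):

```python
from collections import Counter

names = ["Jane", "Sally", "Ted", "Sally", "Alex", "Sally", "Ted"]

counts = Counter(names)        # counts every element in one go
print(counts["Sally"])         # 3
print(counts["Nobody"])        # 0, missing keys count as zero instead of raising
print(counts.most_common(1))   # [('Sally', 3)]
```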


Coolbeans. How do we get the size of a dictionary? If you guessed with len... you're correct.

In [8]:
print( len(tally) )

4


No `Object.keys(tally).length` garbage here. What about looping?

## Fancy Looping

In [9]:
for key in tally:
    print(key, tally[key])

Jane 1
Sally 3
Ted 2
Alex 1


So we can loop through the keys in a dict no problem. In the above example we're also printing out the values by looking each up in tally by its key.

However.

That's not Pythonic. We must be Pythonic. Guido's divine majesty demands it. Behold:

In [10]:
for item in tally.items():
    print(item)

('Jane', 1)
('Sally', 3)
('Ted', 2)
('Alex', 1)


Hey look at those! Are they... tuples perchance? Are you accusing me of *planning* this lesson or something? Well guess what we can write this in the Pythonic way now.

In [15]:
for name, value in tally.items():
    print(name, value)

Jane 1
Sally 3
Ted 2
Alex 1


So if you need to loop through a *list* and you need the indices? Check this out:

In [17]:
ls = [4, 2, 10, 11, 7, 8, 9]

for idx, num in enumerate(ls):
    ls[idx] = num * 2

print(ls)

[8, 4, 20, 22, 14, 16, 18]


Sadly this isn't very Pythonic either although we are getting there. So if you want to double everything in a list how do?

## Comprehensions, Comprende?
Comprehensions are a big deal in Python and there is a lot to them. I'll be covering the tl;dr here. If you haven't used a language with this feature before it can be a little hard to get used to but once you do you'll be writing beautiful, concise, and above all Pythonic code. Let's dive right into an example:

In [3]:
ls = [4, 2, 10, 11, 7, 8, 9]

ls_doubled = []
for num in ls:
    ls_doubled.append(num * 2)
print(ls_doubled)

ls_doubled2 = [num * 2 for num in ls]
print(ls_doubled2)

[8, 4, 20, 22, 14, 16, 18]
[8, 4, 20, 22, 14, 16, 18]


Witness the elegance and beauty of a list comprehension. While Python does have the familiar map and filter functions, comprehensions are much preferred. This line is doing a lot of work:
```python
ls_doubled2 = [num * 2 for num in ls]
```
In one line we are:
- Creating a new list
- Looping through ls
- Taking each element from ls and doubling it
- Appending that value to our new list
- Returning the new list so we can use it later.

The order of the elements in the comprehension might be a little confusing. Let's break it apart:

```python
[(an expression using a) for (a, an arbitrary variable name) in (the collection we are looping over)]
```
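
The collection on the right can be any iterable, not just a list. A quick sketch with a range and a string (both made up for illustration):

```python
squares = [i * i for i in range(5)]
print(squares)   # [0, 1, 4, 9, 16]

shouted = [c.upper() for c in "abc"]
print(shouted)   # ['A', 'B', 'C']
```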

I mentioned filter a second ago. How do we eliminate elements from the list? Let's get rid of all the numbers greater than 8.

In [4]:
ls = [4, 2, 10, 11, 7, 8, 9]
ls_filtered = [num for num in ls if num <= 8]
print(ls_filtered)

[4, 2, 7, 8]


Splendid. Perhaps we want to do both! Let's combine them!

In [5]:
ls = [4, 2, 10, 11, 7, 8, 9]
ls_filtered_doubled = [num * 2 for num in ls if num <= 8]
print(ls_filtered_doubled)

[8, 4, 14, 16]
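
For comparison, here's a sketch of the same thing written with map and filter. It gives the same result, though the comprehension is usually considered easier to read:

```python
ls = [4, 2, 10, 11, 7, 8, 9]

# filter keeps the numbers <= 8, map doubles them, list collects the results
with_map_filter = list(map(lambda n: n * 2, filter(lambda n: n <= 8, ls)))
with_comprehension = [num * 2 for num in ls if num <= 8]

print(with_map_filter)                         # [8, 4, 14, 16]
print(with_map_filter == with_comprehension)   # True
```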


## Exercise
Document analysis