# Session 3: loops and lists

*Data Structures and Algorithms*

*Achyuthuni Sri Harsha*

------------------------------------------------------------------------

So far, the data we have seen has been simple numbers and strings. In
this session, we will learn to work with compound data using *lists*. We
will also use *for loops* to loop through such sequences, and work more
on functions, specifically *methods* associated with data types like
lists and strings.

------------------------------------------------------------------------

## Preparation

**Readings**:

Guttag: Chapters 3.2, 4.3, 5.1-5.5.

**OR**

Sweigart, Al. Automate the Boring Stuff with Python.

-   Chapter 2 – Control flow
    <https://automatetheboringstuff.com/chapter2/>

-   Chapter 4 – Lists <https://automatetheboringstuff.com/chapter4/>

**Questions:**

Please read the material above, and think about how you would explain to
your classmates:

-   How does a `for` loop differ from a `while` loop?
-   What does the `range` function do?
-   A Python `list` is an *ordered*, *mutable* data *collection* with
    many *built-in methods*. What do we mean by these terms?

------------------------------------------------------------------------

## Recap

Let's first recap some of the things we have learned.

## For loops and range()

We have previously started building algorithms based on repetition with
`while` loops. For example, we might want to print the squared values of
all integers from 0 to 4.

In [1]:
number = 0
while number <= 4:
    print(number*number)
    number += 1

0
1
4
9
16


Here we have to be careful to remember to increase the value of `number`
at every iteration - otherwise the while loop will continue running
forever. But is there an easier way of solving the same problem?

When we know that we want to iterate over a specific sequence of items,
such as the integers from 0 through 4, we often want to use another
repetition structure instead, a *for loop*. Here's the for loop for
solving the same problem:

In [3]:
for number in range(5):
    print(number*number)

0
1
4
9
16


Compared to the while loop, this is more compact and easier to read. How
does it work?

The general structure of a for loop is:


    for element in sequence:
        # statements

A for loop goes through the elements in a sequence in order, executing
the code block for each `element` until the sequence is exhausted. The
keyword `in` is special and used as follows. One by one, starting from
the first value in the sequence, the for loop assigns the value to the
variable `element`, after which the code block is executed. In the
example above, `number` gets assigned the values 0 to 4 in ascending
order. After the loop finishes, the value of `number` remains at 4.
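
As a small sketch of the last point, the loop below adds up the values that `number` takes and then inspects `number` once the loop has ended. The variable `total` is introduced here only for illustration.

```python
total = 0
for number in range(5):
    total += number  # number takes the values 0, 1, 2, 3, 4 in turn

print(total)   # 0 + 1 + 2 + 3 + 4 = 10
print(number)  # 4: the loop variable keeps its last value
```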

Here we generate the sequence of values using the built-in function
`range`. The `range` function creates a sequence of integers based on
its input. When called as `range(n)`, it includes the integers from zero
to `n-1`, so `n` values in total, here 0 through 4. More generally,
`range` accepts up to three arguments. If we pass two, they are
interpreted as the start and end of the sequence. If we add a third, it
is the step size between consecutive values. The end value is always
excluded from the sequence (5 is not included in the example).

In sum, we can do this:

    range(start, end)
    range(start, end, step)

So, for example:


In [4]:
for number in range(1, 3):
    print(number)

1
2


In [5]:
for number in range(3, 8, 2):
    print(number)

3
5
7


We can loop from high to low numbers by specifying a start higher than
the end and a negative step size.

In [6]:
for number in range(9, 5, -1):
    print(number)

9
8
7
6


Sometimes, we want to use nested loops.

In [7]:
for i in range(3):
    print('Outer loop, for value i=' + str(i))
    for j in range(2):
        print('Inner loop, for value j=' + str(j))

Outer loop, for value i=0
Inner loop, for value j=0
Inner loop, for value j=1
Outer loop, for value i=1
Inner loop, for value j=0
Inner loop, for value j=1
Outer loop, for value i=2
Inner loop, for value j=0
Inner loop, for value j=1


Here, we're running everything in the indented for-loop code block three
times. And within that block, we have another indented for loop the body
of which runs twice - every time the outer loop repeats. So in total,
the inner loop body runs $3 \cdot 2 = 6$ times. Nested loops like this can be
very useful for looping through multi-dimensional data like matrices. We
can in principle nest many times, but in practice we rarely have more
than two or three levels of nesting.
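
As a rough sketch of the matrix use case, the nested loops below add up all entries of a small table of numbers stored as a list of rows. Lists are introduced later in this session; the name `matrix` and its values are made up for this example.

```python
# A 2x3 "matrix" stored as a list of rows (values are made up for illustration)
matrix = [[1, 2, 3],
          [4, 5, 6]]

total = 0
for i in range(2):        # outer loop: rows
    for j in range(3):    # inner loop: columns
        total += matrix[i][j]

print(total)  # 21
```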

Sometimes we want to stop a loop before the sequence is finished. Python
provides flow control tools for this, among them the *break* statement.
We can use it like this:


In [8]:
a = 10
for i in range(a):
    print(i)
    if i == 2:
        break

0
1
2


Most looping tasks can be done with either for or while loops. **For
loops are often more convenient than while loops when we are working
with specific sequences, and we know how many times we need to iterate.
While loops can be very useful in applications where we don't know how
many iterations our calculation will require.**
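
To illustrate the second case, here is a minimal sketch of a calculation where the number of iterations is not known in advance: we keep halving a value until it drops below 1. The starting value 100 and the variable names are chosen only for this example.

```python
# How many times must we halve 100 before the value drops below 1?
value = 100
halvings = 0
while value >= 1:
    value = value / 2
    halvings += 1

print(halvings)  # 7
```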

## Lists

We've so far mainly dealt with *scalar* data such as integers and
floats. In practice, we're often working with *sequences* of scalar
data, or want to organize several pieces of data together. Python offers
a few options to build sequences, the most common of which are *lists*.
A list is defined using square brackets, and the items within the list
are separated by a comma. Type the following commands into Spyder's
console:

In [9]:
L = [1, 2, 3]
type(L)

list

A list is an ordered collection of items. These items can be anything:
numbers, strings, variables, or other lists.

In [10]:
number_list = [1, 2, 3]
word_list = ['this', 'is', 'a', 'list']
a = 5
b = 'good'
mixed_list = [a, b]

Let's look at some of the things we can do with a list. The list
`prices` below contains Kensington and Chelsea mean house prices from
2003 through 2008 (in thousands of pounds). The data are from the
[London
Datastore](https://data.london.gov.uk/dataset/average-house-prices-borough).

First, let's see how we can access the elements of a list through their
indices. *The index of the first element is zero.*

Here are some ways to access a list: try them out!

In [11]:
prices = [400, 450, 465, 525, 650, 700]

prices[0]
prices[1:3]
prices[-1]
prices[:-1]
prices[::2] # every other element

[400, 465, 650]

We can access a "slice" of the list `prices` using the syntax
`prices[begin:end:step]`, for example `prices[1:5:2]`. We need to be
careful with this notation: the slice includes the element at index
`begin` but excludes the element at index `end` (just like range),
picking elements with a step size of `step`. So `prices[1:5:2]` gives
the values at indices 1 and 3 but not 5, that is, `[450, 525]`. We can
omit the step size to get the default step of one. We can also omit the
start or the end (but keep the colons), for example to take every other
element of the entire list.
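
A notebook cell displays only the value of its last expression. As a sketch, printing each slice separately makes every result visible:

```python
prices = [400, 450, 465, 525, 650, 700]

print(prices[0])      # 400
print(prices[1:3])    # [450, 465]
print(prices[-1])     # 700
print(prices[:-1])    # [400, 450, 465, 525, 650]
print(prices[::2])    # [400, 465, 650]
print(prices[1:5:2])  # [450, 525]
```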

We can change the elements of a list and create new lists from existing
ones. Suppose the first value of `prices` was erroneous and the correct
value is 405. To change it, we type in

In [12]:
prices[0] = 405

We can then type `prices` into the console to see that the value has
changed.

How do we add another price, for 2009?

In [13]:
prices.append(675) # append adds an item to the end
print(prices)

[405, 450, 465, 525, 650, 700, 675]


We might also learn a bunch of new prices. Here are the prices from 2010
through 2015, in a different list. We can then combine the lists:

In [14]:
new_prices = [750, 790, 890, 975, 1195, 1200]

all_prices = prices + new_prices # add new_prices to the end of prices, create new result list
prices.extend(new_prices) # append all items of new_prices to prices 

We can get the length of a list using the function `len()`.

In [15]:
len(prices)

13

### Looping through lists

Next, let's loop through a list using a for loop. This is very
convenient:

In [16]:
for price in prices:
    print('The price was ' + str(price))

The price was 405
The price was 450
The price was 465
The price was 525
The price was 650
The price was 700
The price was 675
The price was 750
The price was 790
The price was 890
The price was 975
The price was 1195
The price was 1200


The loop will go through each value of the list in order, assign this
value to the variable `price`, and execute the loop body statements with
this value. When it has gone through all the prices, it will stop, and
the program will move on.

Sometimes we want to loop through the *indices* of a list instead of its
*elements*. For example, we might want to loop through the house prices
and say whether the price went up compared to the previous year. We then
need to compare two adjacent values in the list, which is convenient to
do with indices, checking the value at index 1 vs 0, index 2 vs 1, and
so on.

To loop through a list using indices, we first need the length of the
list to know how many iterations we need to do. We can then use `range`
to go through the indices in the list. Let's first just repeat the
preceding loop:


In [17]:
# We use range on the length of prices to make sure we go through all list indices.
for index in range(len(prices)):
    price = prices[index]
    print('The price was ' + str(price))

The price was 405
The price was 450
The price was 465
The price was 525
The price was 650
The price was 700
The price was 675
The price was 750
The price was 790
The price was 890
The price was 975
The price was 1195
The price was 1200


The output is equivalent to our previous loop. So instead of directly
looping through prices, here we access them through each of the integer
indices we loop through, from zero to the length of the list minus one.

To check whether prices went up or down, we can build on this idea:

In [18]:
for index in range(len(prices) - 1): # why do we have -1 here? what happens if we remove it?
    if prices[index + 1] > prices[index]:
        print('Prices went up at index ' + str(index))
    else:
        print('Prices went down at index ' + str(index))

Prices went up at index 0
Prices went up at index 1
Prices went up at index 2
Prices went up at index 3
Prices went up at index 4
Prices went down at index 5
Prices went up at index 6
Prices went up at index 7
Prices went up at index 8
Prices went up at index 9
Prices went up at index 10
Prices went up at index 11


This is a slightly more complicated loop. At each iteration, we check
whether the price at the next index is higher than the price at the
current index, and then print out the corresponding text.

When we loop through indices, we need to be careful to go through the
exact right indices. Here we must be careful to check index against
index plus one, because there is no price to compare to before index
zero.

What is the difference between the two types of looping (through
`prices` vs `range(len(prices))`)? Looping through values is easier when
all we need to do is something based on each value. But the loop with
the comparison would be a bit tricky to write by simply going through
the values of the list. That's because we need to make comparisons
between two adjacent prices, and looping through indices allows us to
easily access the two prices. (We could achieve this in the other loop
too - can you think of how to do so?)

We will use both types of looping through sequences throughout the
module.

### Lists within lists

Python lists are flexible, and can contain different types of data,
other lists, etc.

In [20]:
L = [1, 2, 4, 5, 'a', 5, [2, 3]]

A list that contains another list is sometimes called a *nested* list.
Let's look at another example. Suppose we're keeping track of European
cities we've visited, as well as the respective countries. One way to do
this is with a list of lists each containing city-country pairs as
follows.

In [21]:
visited_cities = [['Paris', 'France'], ['Berlin', 'Germany'], ['Bucharest', 'Romania'], ['Reykjavik', 'Iceland'], ['Munich', 'Germany']]

We can now access and slice the items through "nested" notation: for
example `visited_cities[1][0]` will give us `Berlin`. Here
`visited_cities[1]` is the list `['Berlin', 'Germany']`, and the value
at index `0` of this list is `Berlin`.
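
The two-step lookup can be sketched as follows. The list is shortened to three pairs for this example, and the variable `pair` is introduced only to show the intermediate step.

```python
visited_cities = [['Paris', 'France'], ['Berlin', 'Germany'], ['Bucharest', 'Romania']]

pair = visited_cities[1]      # the inner list at index 1
print(pair)                   # ['Berlin', 'Germany']
print(pair[0])                # Berlin
print(visited_cities[1][0])   # the same lookup in one step: Berlin
print(visited_cities[-1][1])  # country of the last pair: Romania
print(visited_cities[:2])     # slicing the outer list gives a list of pairs
```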

Let's get all cities in Germany that we have visited. We want to store
them in a list. We'll first create an empty list, and then loop through
the visited cities, picking the German ones.


In [22]:
german_cities = [] # creates empty list
for pair in visited_cities:
    if pair[1] == 'Germany':
        german_cities.append(pair[0])
print(german_cities) 

['Berlin', 'Munich']


Many common tasks involve looping through lists. We will see more
examples throughout the module.

## Methods

Data types in Python come associated with actions that work specifically
for them. For example, we can perform calculations on numbers and append
values to lists. Actions associated with a data type are called methods.
They are functions that you can only call for certain types of data. We
have already encountered some list methods, such as `append()`. There
are many more; here are some common ones:

| Method                  | Description                                                          |
|-------------------------|----------------------------------------------------------------------|
| `L = [1,2,3,4,5,'Joe']` | create a list containing the items inside the brackets               |
| `L = list(range(0,6))`  | create a list containing a sequence of integers                      |
| `len(L)`                | returns the number of items in the list                              |
| `L.append('Mike')`      | add an item ('Mike') at the end of the list                          |
| `L.remove('Joe')`       | remove the first item in the list with this value ('Joe')            |
| `L.pop(4)`              | remove the item of the list with this index (4)                      |
| `L.insert(3,'Kate')`    | insert the item with this value ('Kate') in this position (3)        |
| `L.extend([1,4])`       | add items from an iterable e.g list (\[1,4\]) at the end of the list |
| `L.sort()`              | sorts the list                                                       |
| `L.reverse()`           | reverses the list                                                    |

Methods are called using dot notation, like `L.append('Mike')`. Some
common operations, like `len(L)`, are instead provided as built-in
functions that take the list as an argument.
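
As a sketch, here is how the list from the table changes when several of these methods are applied one after another. The comments show the list after each step; the names `removed` and `numbers` are introduced only for this example.

```python
L = [1, 2, 3, 4, 5, 'Joe']

L.append('Mike')     # [1, 2, 3, 4, 5, 'Joe', 'Mike']
L.remove('Joe')      # [1, 2, 3, 4, 5, 'Mike']
removed = L.pop(4)   # removes and returns the item at index 4 (the value 5)
L.insert(3, 'Kate')  # [1, 2, 3, 'Kate', 4, 'Mike']
L.extend([1, 4])     # [1, 2, 3, 'Kate', 4, 'Mike', 1, 4]
L.reverse()          # [4, 1, 'Mike', 4, 'Kate', 3, 2, 1]

print(L)
print(removed)       # 5

# sort() needs items that can be compared with each other, so we use numbers only
numbers = [3, 1, 2]
numbers.sort()
print(numbers)       # [1, 2, 3]
```

Note that these methods change the list in place rather than creating a new list. Sorting a list that mixes numbers and strings, such as `L` above, raises a `TypeError` in Python 3.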

Both the `append` and `extend` methods add values to a list. What is the
difference between `L.append([1,4])` and `L.extend([1,4])`?

One final note on lists, for now. Sometimes we want to make copies of
lists. This can lead to surprising results:

In [23]:
list1 = [1,2,3]
list2 = list1
list1.append(5)
print(list2) # ?!?!??

[1, 2, 3, 5]


In Python, copying a list using `'='` does not create a new copy of the
list data in the computer's memory. What we are copying here is the
*reference* from the name *list1* to data - not the actual data.
Therefore, the names *list1* and *list2* both actually point to the same
list in memory. In order not to end up changing the original values
later, we have to create a new list that contains the same items. This
can be done by slicing or by using the built-in function `list()`.


In [24]:
list1 = [1,2,3]
list2 = list1[:] # slice that gets all items
list2 = list(list1) # function that creates a new list of the items

> **Advanced** These methods copy only the top-level list elements. If
> we have nested lists and want to copy the nested elements too (instead
> of references to them), we need to "deep copy" the list. Search online
> for "python list deepcopy".
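
A minimal sketch of the difference, using the function `deepcopy` from Python's standard `copy` module. The list `nested` and its values are made up for this example.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = list(nested)        # new outer list, but the same inner lists
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0].append(99)

print(shallow[0])  # [1, 2, 99]: the inner list is shared with nested
print(deep[0])     # [1, 2]: the deep copy is unaffected
```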


## Working with text

Manipulating, parsing, or otherwise working with text is a very common
task we encounter in analytics. We have already seen some strings in the
first tutorial: we've printed and added together strings.

### Slicing and combining strings

Strings are like lists in that we can access and go through particular
elements (characters). The similarity means that we can slice strings by
index just like lists, with the characters of the string as our
"elements". On the console, try the following commands for different
values of `test_word`.

In [26]:
test_word = 'lovely'

# String slicing
test_word[1:5]
test_word[:-1]
test_word[5:1:-2]
test_word[::-1]
# Changing the value
# test_word[0] = 'a' # gives an ERROR (if we need to change the string, we can use the method `replace`)

'ylevol'

We can "add" strings together using the + sign.

In [27]:
print(test_word + ' ' + test_word[:2])

lovely lo


### Looping through strings

We can loop through a string just like a list using a `for` loop and the
operator `in`.

In [28]:
vowels = 'aeiouy'

for letter in vowels:
    print(letter) 

a
e
i
o
u
y


### Immutable strings

Even though strings are very similar to lists, there is an important
difference. We cannot change the elements of strings. In programming
lingo, strings are *immutable*.

In [29]:
a_list = [1, 5, 9]
a_list[1] = 3 # OK with a list
print(a_list) 

a_string = 'fire'
a_string[1] = 'a' # not OK with a string
print(a_string)

[1, 3, 9]


TypeError: 'str' object does not support item assignment
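
Since we cannot change a string in place, the usual workaround is to build a new string. The sketch below shows two ways to do this, using slicing and the `replace` method; the name `new_string` is introduced only for this example.

```python
a_string = 'fire'

# Build a new string from slices of the old one
new_string = a_string[:1] + 'a' + a_string[2:]
print(new_string)  # fare

# Or use the replace method, which also returns a new string
print(a_string.replace('i', 'a'))  # fare
print(a_string)                    # fire: the original is unchanged
```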

### String methods

Like with lists, there exist many built-in methods that work
specifically for strings. Here are some common ones.

| Method                 | Description                                         |
|------------------------|-----------------------------------------------------|
| `s = 'lovely weather'` | Create string                                       |
| `s.capitalize()`       | Capitalizes first letter of `s`                     |
| `s.lower()`            | Converts all uppercase letters in `s` to lowercase  |
| `s.upper()`            | Converts all lowercase letters in `s` to uppercase  |
| `s.startswith('love')` | Checks if `s` starts with specified string          |
| `len(s)`               | Returns the length of `s`                           |
| `s.find('v')`          | Finds the first index at which `v` occurs in `s`    |
| `s.replace(old, new)`  | Replaces all occurrences of old in `s` with new     |
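
As a sketch, here are the methods from the table applied to the example string. Because strings are immutable, each method returns a new value and leaves `s` itself unchanged.

```python
s = 'lovely weather'

print(s.capitalize())        # Lovely weather
print(s.upper())             # LOVELY WEATHER
print(s.upper().lower())     # lovely weather
print(s.startswith('love'))  # True
print(len(s))                # 14
print(s.find('v'))           # 2
print(s.replace('e', 'E'))   # lovEly wEathEr
print(s)                     # lovely weather
```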

Sometimes we want to split strings into parts: for example a sentence
into words. We can then use the convenient `split` method, which breaks
a string into a list. By default it splits on whitespace, but we can
also specify the delimiter.

In [31]:
sentence = 'The Python language was created by Guido van Rossum. It was named after Monty Python.'
sent_list = sentence.split() # default is to split on whitespace
print(sent_list)
sent_list = sentence.split('.') # split by full stop
print(sent_list)

['The', 'Python', 'language', 'was', 'created', 'by', 'Guido', 'van', 'Rossum.', 'It', 'was', 'named', 'after', 'Monty', 'Python.']
['The Python language was created by Guido van Rossum', ' It was named after Monty Python', '']


We often also want to do the opposite operation, joining together list
values into a single string:

In [32]:
german_cities = ['Berlin', 'Munich', 'Frankfurt'] 
print(', '.join(german_cities))

Berlin, Munich, Frankfurt


The `str.format()` method is useful for displaying values and
variables.

In [33]:
x = 3
print('The numbers are {0} and {1}.'.format(x, 4.2)) # 0 and 1 are indices for input of format()

The numbers are 3 and 4.2.


Let's combine `join` and `format`.

In [34]:
print('We visited {0} during the trip.'.format(', '.join(german_cities)))

We visited Berlin, Munich, Frankfurt during the trip.


From Python 3.6 onwards, we can do the same in a more convenient way
using so-called f-strings. If we put the letter f before the opening
quote of the string, we can insert variables directly:

In [35]:
print(f'We visited {german_cities[0]} first.')

We visited Berlin first.
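
The curly braces of an f-string can hold any expression, not just a variable name. The sketch below rewrites the earlier `format` examples this way; the names `summary` and `squared` are introduced only for this example.

```python
german_cities = ['Berlin', 'Munich', 'Frankfurt']
x = 3

summary = f'We visited {len(german_cities)} cities: {", ".join(german_cities)}.'
squared = f'{x} squared is {x * x}.'

print(summary)  # We visited 3 cities: Berlin, Munich, Frankfurt.
print(squared)  # 3 squared is 9.
```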


Built-in methods are very useful, as we don't need to write algorithms
from scratch to perform these common tasks. There are many further
built-in string methods available. Indeed, the built-in methods are so
numerous that even experienced programmers don't try to memorize all of
them. So how should we approach a programming problem when we don't know
if a convenient method for our problem already exists? How do we know
what methods exist and how they work?

There are several good sources of information. One is search engines: if
you need to do a specific thing with strings, search for it on Google,
for example "python string replace". More often than not, you're not the
first person to try to figure out the same problem, and the results will
usually point you to very good resources, such as Stack Overflow.
Another useful resource is Python's help, which we can access in the
console as follows.

In [36]:
help(str.split) # help on specific method
# help(str) # help on the str data type - lists all methods

Help on method_descriptor:

split(self, /, sep=None, maxsplit=-1)
    Return a list of the words in the string, using sep as the delimiter string.
    
    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.



**We'll now start working on functions in Spyder. Open the file
`ses03.py` in Spyder**. In Spyder, you can open a file by selecting
File -> Open and locating your tutorial files in the file system. You
can then run all the code in the file by clicking on the green play
button in Spyder's toolbar, or hitting the `F5` key.

> **Important.** In the following exercises, please edit the file
> `ses03.py`, **but do not change the name of this file.**

1 - Implement the function `item_lengths` in `ses03.py`.

**Hint**: you may want to loop through the elements of the input list
using a loop similar to

    for item in input_list:
        print(item)

and use the list method `append`.

2 - Implement the function `longest_item` in `ses03.py`.

**Hint**: you may want to loop through the indices of the input list
using a loop similar to

    for index in range(len(input_list)):
        print(index, input_list[index]) # this will print out the index and the corresponding element

and use variables to keep track of where the longest item so far was
found.

You may call your function `item_lengths` from within `longest_item`,
but don't need to do so.

## Errors and debugging

By now, you may have become quite familiar with some of Python's error
messages. You may also have spent time trying to figure out why your
program produces a specific error, or does not produce the correct
result. Such problems are called bugs, and the process of finding and
fixing them is called debugging.

Fortunately, Python is quite expressive in dealing with errors, telling
us what went wrong and where this happened in our code. Together with
the fact that Python code is interpreted line by line, this
expressiveness makes finding and dealing with code errors convenient
compared to many other languages.

When an error occurs, the Python interpreter tries to give us as much
information as possible about the error. For instance, consider the
following code:

In [37]:
text = 'hello'
print(txt)

NameError: name 'txt' is not defined

In the full error message, the first line shows that our instruction
`print(txt)` raised a *NameError*, which quite explicitly tells us that
something is wrong with a variable name. The *Traceback* then points to
the line at which the error occurred. Here the code was run in the
console. If we were running a complex program, it would point to the
specific line in the script file. Finally, details about the `NameError`
are given, here, `name 'txt' is not defined`.

This is a fairly simple case: the name is misspelt (`txt` instead of
`text`). Of course, if we had defined a variable `txt` earlier for some
other purpose, the error wouldn't have been raised, and we would have
printed the value of that variable instead.

Another kind of error occurs when we attempt invalid operations, such as
adding an integer to a string:


In [38]:
s = 'a little string of size '
'This is ' + s + '.' # fine: both operands are strings
s + 26 # raises the TypeError below: 26 is an integer, not a string

TypeError: can only concatenate str (not "int") to str

Again, Python is explicit about the problem with converting types.

These kinds of bugs are easy to spot as the program crashes. But often
bugs are more difficult to spot: the program may run through but produce
an incorrect result. Or it may perhaps work in the most common input
cases, but fail spectacularly for other less common cases. Such glitches
can potentially cause a lot of damage, as examples of stock-trading
algorithm malfunctions show.

Debugging is connected to the broader process of *testing* your program.
The purpose of testing is to find any bugs that might exist. This
involves running your program for various inputs and checking whether
the behaviour is as intended. When designing complex programs, this
process starts with the design of the program, and plays an important
role in the modularization of the program into smaller parts, such that
the functioning of each can be tested separately from the rest.

Program testing is a deep topic and we will not go into much detail
here; you can read more in Guttag's Chapter 6.1. The key, for our
purposes, is finding a set of inputs that you believe would be very
likely to catch most problems in your program, much like our OK tests.
So for example, if calculating the absolute value of a number, you might
require tests with both positive and negative numbers, and perhaps zero,
and reasonably believe that this would capture most errors.


In [39]:
def abs_value(x):
    """
    Returns absolute value of parameter x

    Example use:
    >>> abs_value(2)
    2
    >>> abs_value(-1)
    1
    >>> abs_value(0)
    0
    """
    # code here

> **Advanced** Tests of this type, written in the function's docstring,
> are called *doctests*. You can run the doctests in the file
> `file_name.py` using the following command on the command line:
>
>     python -m doctest -v file_name.py
>
> In Spyder, you can quickly run a single doctest by placing the cursor
> on a line starting with `>>>` and hitting `F9`. (Remember to define
> the function itself first!)
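
The same three test cases can also be checked with an ordinary loop. The sketch below fills in one possible body for `abs_value` (an assumption, not necessarily the implementation intended above) and compares its result with the expected value for a positive input, a negative input, and zero. The list `test_cases` is introduced only for this example.

```python
def abs_value(x):
    """Returns absolute value of parameter x (one possible implementation)."""
    if x < 0:
        return -x
    return x

# [input, expected output] pairs: a positive number, a negative number, and zero
test_cases = [[2, 2], [-1, 1], [0, 0]]

for case in test_cases:
    value = case[0]
    expected = case[1]
    result = abs_value(value)
    if result == expected:
        print('OK:', value, 'gave', result)
    else:
        print('FAILED:', value, 'gave', result, 'but expected', expected)
```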

When working on a program and testing it on some input, we may then run
into a problem: either there's an error, as in the above simple
examples, or the program produces the wrong result.

Let's look at an example. Suppose we're working on the piece of code to
calculate percentage changes. Here we haven't written any tests for our
code yet.

This code looks like it should work:

In [40]:
def pct_change(L):
    """
    Calculates per-period percentage changes (returns)

    Parameters:
        L: a list of numbers 

    Returns a new list of percentage changes
    """
    pct_change_list = []
    for ind in range(len(L)):
        change = (L[ind]-L[ind-1])/L[ind-1]
        pct_change_list.append(change)

    return pct_change_list

# test with a specific input
pct_change([4, 2, 1, 2])

[1.0, -0.5, -0.5, 1.0]

What should happen with our test? If we calculate the percentage changes
by hand, the first change should be -50% (from 4 to 2), then the second
again -50% (from 2 to 1), and then 100% (from 1 to 2).

But the result we get, shown above, is different: we have four values
instead of three, and the first value is 100% (1.0 is equal to 100%)!

Let's try to debug the problem. Often, the first debugging step is
**adding a print statement** to understand what our program is actually
doing. Here, it looks like the variable `change` is somehow calculated
wrong. So, let's print out some details on how it is calculated.

In [41]:
def pct_change(L):
    """
    Calculates per-period percentage changes (returns)

    Parameters:
        L: a list of numbers 

    Returns a new list of percentage changes
    """
    pct_change_list = []
    for ind in range(len(L)):
        print('ind', ind, 'L[ind-1]', L[ind-1], 'L[ind]', L[ind])
        change = (L[ind]-L[ind-1])/L[ind-1]
        pct_change_list.append(change)

    return pct_change_list

Now, we get the following output:

In [42]:
pct_change([4, 2, 1, 2])

ind 0 L[ind-1] 2 L[ind] 4
ind 1 L[ind-1] 4 L[ind] 2
ind 2 L[ind-1] 2 L[ind] 1
ind 3 L[ind-1] 1 L[ind] 2


[1.0, -0.5, -0.5, 1.0]

At index 0, the calculation is wrong: it calculates the difference from
2 to 4. That is, when we access `L[ind-1]` at `ind=0`, we're accessing
`L[-1]`, which is the *last value of the list*. So, we're calculating
the change from the last to the first value, which does not make sense
here.

One way to fix this is with a condition. We'll say that our program
should not make the calculation at index zero, but produce the value
`None` instead. Then, it should work.

In [43]:
def pct_change(L):
    """
    Calculates per-period percentage changes (returns)

    Parameters:
        L: a list of numbers 

    Returns a new list of percentage changes
    """
    pct_change_list = []
    for ind in range(len(L)):
        if ind == 0:
            change = None
        else:
            change = (L[ind]-L[ind-1])/L[ind-1]
        pct_change_list.append(change)
    return pct_change_list
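
Another possible fix, sketched here as an alternative rather than as the intended solution, is to start the loop at index 1. The result then has three values, matching the calculation by hand. The function name `pct_change_from_second` is made up for this sketch.

```python
def pct_change_from_second(L):
    """
    Variant of pct_change that skips index 0 instead of storing None.

    Parameters:
        L: a list of numbers

    Returns a new list of percentage changes, one shorter than L
    """
    pct_change_list = []
    for ind in range(1, len(L)):  # start at 1, so L[ind-1] is never L[-1]
        change = (L[ind] - L[ind-1]) / L[ind-1]
        pct_change_list.append(change)
    return pct_change_list

print(pct_change_from_second([4, 2, 1, 2]))  # [-0.5, -0.5, 1.0]
```

Which version is preferable depends on whether the output should line up index by index with the input list (keep the `None`) or contain only actual changes (skip index 0).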

### Debugging in Spyder

Sometimes adding print statements is not enough to figure out the
problem in the code. We can then use Spyder's built-in debugging
facilities.

Let's try this by creating a new empty Python script file and copying
the code above to the file. Then save the file and give it a name of
your choice, for example `debug_test.py`.

Spyder has a debugging menu: we can start debugging the file by clicking
on the blue play/pause icon or using `Ctrl+F5`. Notice the change in the
console, which now prompts you with `ipdb>` to signify we've entered the
debugger.

We can then run the file line-by-line using `Ctrl+F10`, or the next
button in the toolbar. Start the debugger and use the command to run the
next line. The color of the line where the debugger is will change as
you go along. When you reach the line

    pct_change([4, 2, 1, 2])

instead of stepping to the next line, let's step *into* the function.
This moves the execution point inside the function, where we can move
line by line with `Ctrl+F10`. This will define the variables as we go
along. Stepping into functions like this is useful for seeing what
happens inside our code and what may be causing an error. The Spyder
variable explorer on the top right hand side of the window will show you
the values and types of the variables at any given point in time. This
can be very useful for spotting problems. We can stop the debugger with
`Ctrl+Shift+F12` or the blue stop button.

If we want to skip to a specific point of code to debug a problem, we
can also set a breakpoint on that line (using `F12`), and run the code
until that breakpoint in the debugger.

Alternatively, if we run code in Spyder and our program crashes, we can
write `%debug` into the console. This will bring the execution back to
the point where the problem happened and we can inspect the variables at
that point. This is useful for "post-mortem" debugging.

The value of debugging using Spyder is being able to step through the
code and seeing how the variables change and where the problem occurs.

Debugging is a skill that takes a while to learn. Some common types of
bugs include:

-   Referring to a wrong variable name
-   Passing arguments to a function in the wrong order
-   Off-by-one errors when indexing lists
-   Mixing list values and indices when looping
-   Forgetting parentheses () when calling a function - this will just
    refer to the function object
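
Two of these bugs can be sketched in a few lines. The variable names below are made up for this example.

```python
word = 'hello'

method_itself = word.upper  # no parentheses: refers to the method, nothing is called
result = word.upper()       # with parentheses: the method runs and returns a string

print(callable(method_itself))  # True
print(result)                   # HELLO

prices = [400, 450, 465]
for index in range(len(prices)):
    # index is a position (0, 1, 2); prices[index] is the value stored there
    print(index, prices[index])
```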

**Exercise.** Consider the following function `fact(n)`. Given an
integer $n$, it should return the factorial $n!$, that is, multiply
together the integers from 1 to $n$.

In [44]:
def fact(n):
    """ A buggy factorial"""
    accumulator = 1 # fact(0) = 1
    for i in range(1, n):
        accumulator *= i
    return accumulator

However, the code does not quite work. Copy the code into Spyder and use
the debugger to find the mistake.

## All done!


## Review

-   We use for loops and while loops for repeating actions.
-   Python *data types* like strings and lists combine data with methods
    that work specifically on that kind of data.
-   Lists are a commonly used and flexible ordered compound data type in
    Python.
-   Both strings and lists have a number of methods that make coding
    much easier.

How would you explain to a classmate:

-   How does a for loop work in Python? How about a while loop?
-   How can we loop through lists of items?
-   What is slicing?
-   What are methods and how do we use them?

## Trump tweets and libraries

In the next optional exercises, we will apply the ideas we have learned
through a concrete example. The US President Donald Trump is a prolific
Twitter user. The file `trump_tweets.csv` contains a set of his tweets
up to July 2016 collected by Github user
[sashaperigo](https://github.com/sashaperigo/Trump-Tweets).

The data are in `csv` format, which stands for "comma-separated values".
A csv file is much like a table, and we could open it in Microsoft
Excel. But let's read the data into Python. For this, we'll need to use
a Python *library*.

We have previously used some of Python's built-in functions, like
`print` and `abs`. In fact, our Python installation comes with a vast
number of readymade functions that we can use. But because of the sheer
number of options, most of these are not made immediately available like
`print`, but stored in libraries that we *import* as we need them.

For the case at hand, Python has built-in functions for reading csv
files. They are stored in a library called `csv`. When we need these
functions, we use the command `import csv`, which makes the functions
available for use.
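To get a feel for what the `csv` library does, here is a minimal sketch that reads csv text from memory instead of from a file. The sample row and the column names are invented for illustration and are not taken from `trump_tweets.csv`.

```python
import csv
import io

# Hypothetical sample data standing in for a csv file on disk.
sample = io.StringIO(
    'Text,Date,Favorites,Retweets\n'
    '"Hello, world",2016-01-01,10,3\n'
)

rows = list(csv.reader(sample))  # each row becomes a list of strings
print(rows[0])  # the header row
print(rows[1])  # the quoted comma stays inside the first value
```

Note that `csv.reader` returns every value as a string, including the numbers.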

Find the function `read_trump_tweets` in the file. We will use this
function to load the data to analyse. It takes as argument the name of
the csv file containing the tweets. Our file is trump_tweets.csv, so
type in Spyder's console:


    tweets = read_trump_tweets('trump_tweets.csv')

This will read the tweets into a list. If the file does not load, check
that Spyder's working directory is the one where the csv file is. You
can change the directory by clicking on the folder icon in the top right
corner of the window.

Here's the code we use for reading Trump's tweets into a list. We'll
return to the details of reading and writing files later in the module.
For now, we're happy just to use the end result.


In [None]:
import csv
def read_trump_tweets(file_name):
    """
    Reads Trump tweets csv file into a list
    """
    with open(file_name, encoding="utf8") as csvfile:
        data = csv.reader(csvfile, delimiter=',')
        tweet_list = list(data)
        return tweet_list
tweets = read_trump_tweets('trump_tweets.csv')

Let's see what's in the list by printing out the first couple of lines.
Run the following code in the console:

In [None]:
print(tweets[0]) # headers
for tweet in tweets[1:4]: # first tweets: do not repeat the header line, so index from 1
    print(tweet)

The first line is a list containing headers, and the next lines have the
corresponding elements for the first tweets. Now let's analyse the
tweets.

### Question 6: Tweet analysis

1 - Looking at the first tweets, we see that the values of Favorites and
Retweets are integers, but they are in string form in the list. Complete
the function `str_to_int` to convert them to integers in Python. 

The function should return a new list with just the integer values of
each item at a desired index. You'll need to create a new list, loop
through the elements of the input list, and append values to the new
list in the loop. You can create an empty list using `new_list = []`.

Recall from above that you can loop through elements of a list with

    for elem in L:
        print(elem) # this will print each element in L in order

To effect the conversions, run the following commands in Spyder's
console:

In \[ \]:

    favorites = str_to_int(tweets, 2)
    retweets = str_to_int(tweets, 3)

2 - Having converted the integer columns, let's look at the most popular
tweets. Complete the function `value_at_least` to find the tweets with
at least a specific number of retweets. 

**Hint**: In writing this function, we care about the indices of the
underlying list. To loop through list indices, use

    for index in range(len(L)):
        print(index, L[index]) # this will print out the index and the corresponding element

To test the function on the tweet data, run the following commands in
Spyder's console. Test with different thresholds. What are the tweets
about?

In \[ \]:

    high_retweets = value_at_least(retweets, 20000) # get tweet indices with high number of retweets
    for tw_ind in high_retweets: # print the text at these indices
        print(tweets[tw_ind + 1][0])

## More on Functions

We've learned to use functions as names for groups of statements to
perform a specific task. For example, we might define a function to
calculate the squared value of a number.

In \[ \]:

    def square(x):
        return x*x

Defining the function allows us to give a name to the process of
obtaining the square. With the function, we can abstract from the
details of the implementation and focus on the concept of squaring,
which we can now repeat for any number. Now we're going to build on this
idea to create functions that manipulate other functions to build
higher-order abstractions.

Suppose we want to create lists of the squared and cubed values of the
integers from 0 up to, but not including, `n`. We could do the
following.

In \[ \]:

    def create_list_squares(n):
        num_list = []
        for number in range(n):
            num_list.append(number*number)
        return num_list
            
    def create_list_cubes(n):
        num_list = []
        for number in range(n):
            num_list.append(number*number*number)
        return num_list

This is repetitive. They are more or less the same function, just with
different calculations. Because the functions share a common pattern, we
can create an abstraction: each function creates a list of terms, but
applying a different calculation. We do this by making the specific
calculation an argument to the function.

In \[ \]:

    def create_list_calc(n, func):
        num_list = []
        for number in range(n):
            num_list.append(func(number))
        return num_list

    def square(x):
        return x*x

    square_list = create_list_calc(5, square)

The function now takes two arguments: the number up to which we create
the list, and the function that we use in the calculation. Functions,
just like lists, integers, and strings, are objects in Python, meaning
we can pass them as arguments. Here we define the `square` function and
pass it as an argument to `create_list_calc`. When the latter is
executed, it will call the function `square` with each number from zero
through four.

We could now define any calculation on a number, and create a list
applying the calculation.
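As a sketch of this idea, the block below repeats `create_list_calc` so that it runs on its own and passes it two other calculations. The functions `cube` and `double_plus_one` are made up for this example.

```python
def create_list_calc(n, func):
    num_list = []
    for number in range(n):
        num_list.append(func(number))
    return num_list

def cube(x):
    return x*x*x

def double_plus_one(x):  # hypothetical calculation, for illustration
    return 2*x + 1

cube_list = create_list_calc(5, cube)
odd_list = create_list_calc(5, double_plus_one)
print(cube_list)  # [0, 1, 8, 27, 64]
print(odd_list)   # [1, 3, 5, 7, 9]
```

Note that we pass `cube` without parentheses: we hand over the function itself, and `create_list_calc` is the one that calls it.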

**We can also define local functions within another function.** This
works just like defining a local variable inside a function. This is
sometimes convenient if we don't need the function globally, but only
briefly within another function. Another use is that sometimes we may
wish to use a function to generate new functions.

Let's look at an example. The function `power_creator` defines and
returns a new function that calculates the power of a number. The
parameter of `power_creator` is `n`, which is the power that we'd like
the new function to calculate.

We can use `power_creator` to generate functions to calculate different
powers. Each time we call it, it defines a new function with the
argument we give, and returns it. We can then call the resulting
function, which calculates the square, or the cube, of whatever
argument we give it.

In \[ \]:

    def power_creator(n):
        def f(x):
            return x**n
        return f 

    square = power_creator(2)
    print(square(5))
    cube = power_creator(3)
    print(cube(2))

User-defined functions are powerful because they allow us to create
abstractions. We can take a procedure, an algorithm, and give it a name,
adding it to our programming language. We can then use our functions to
express more complex operations, calculations, and patterns. Here we've
seen two slightly more advanced examples of how we can manipulate
functions: we can use them as arguments to other functions, and return
them as results of other functions.

Let us now practice some real-world problem solving using these
concepts.

## More on tweets

Let's think about tweets again. They are contained in lists within a
list, and there are many computations we might want to apply to each of
the tweets in the list. Just now, we looped through them to do some
conversions. Other examples could include finding hashtags or mentions,
or counting the number of capital letters used. With the idea of using a
function as an argument, we can create a function that applies any
function to each item in a list, and then pass different functions to
it.

Below is a function that applies a function on a list, `apply_function`.

In \[ \]:

    def apply_function(a_list, function, start_from):
        """
        Applies parameter function to items of a_list. 
        
        Parameters:
            a_list: a list of lists
            function: a function to be applied to each item of a_list,
            start_from: index of a_list to start from

        Returns: 
            a new list with the results.
        """
        new_list = []
        for item in a_list[start_from:]:
            new_list.append(function(item))
        return new_list

For example, we could define the following function to get the length of
the tweet text.

In \[ \]:

    def tweet_length(tweet):
        return len(tweet[0])

Then we can use `apply_function` to compute the length of every tweet,
skipping the header row at index 0.

In \[ \]:

    tw_lengths = apply_function(tweets, tweet_length, 1)

If we wanted to add this data to our `tweets` list, we could loop
through it to append this information.
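A minimal sketch of such a loop is shown below. It uses a tiny made-up `tweets` list so that it runs on its own; the rows and the extra header name `'Length'` are assumptions for illustration, not part of the real data file.

```python
# Made-up miniature data: a header row followed by two "tweets".
tweets = [
    ['Text', 'Date', 'Favorites', 'Retweets'],
    ['Hello', '2016-01-01', '10', '3'],
    ['Good morning!', '2016-01-02', '5', '1'],
]

# Lengths of the tweet texts, skipping the header row.
tw_lengths = []
for tweet in tweets[1:]:
    tw_lengths.append(len(tweet[0]))

tweets[0].append('Length')  # extend the header row (hypothetical name)
for index in range(len(tw_lengths)):
    # tw_lengths[index] belongs to tweets[index + 1] because of the header
    tweets[index + 1].append(tw_lengths[index])

for row in tweets:
    print(row)
```

The `index + 1` offset is needed because `tw_lengths` has no entry for the header row.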

### Question 7: Counting capitals

Let's write another function to apply to our data, this time counting
the capital letters in the tweet. Complete the function
`count_capital_letters` in `ses03.py`. 

**Hint:** the tweet text string is contained in the first element of the
list. Loop through the string and use the string method
`string.isupper()` to check if each character is uppercase. The method
works like this:

In \[ \]:

    for char in 'Hello':
        print(char.isupper())

To create the list of capital letter counts in the tweet data, run the
following command in Spyder's console.

    capital_counts = apply_function(tweets, count_capital_letters, 1)

How can we now check which tweets have high capital letter counts? One
way to do so would be to run our previous function `value_at_least`.

The advantage of `apply_function` is that we could now write many other
functions to work similarly to `count_capital_letters`. You might try
for example counting hashtags, mentions, specific words, etc.
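As one possible sketch, here is a hashtag counter used with `apply_function`. The helper `count_hashtags` and the sample rows are made up for illustration; counting `'#'` characters is only a rough proxy for counting hashtags.

```python
def apply_function(a_list, function, start_from):
    new_list = []
    for item in a_list[start_from:]:
        new_list.append(function(item))
    return new_list

def count_hashtags(tweet):
    """Hypothetical helper: counts '#' characters in the tweet text."""
    return tweet[0].count('#')

# Made-up rows in the same shape as the tweet data: header first.
sample_tweets = [
    ['Text', 'Date'],
    ['Learning #python with #lists', '2016-01-01'],
    ['No tags here', '2016-01-02'],
]

hashtag_counts = apply_function(sample_tweets, count_hashtags, 1)
print(hashtag_counts)  # [2, 0]
```

Only the small function passed in changes; the looping logic in `apply_function` stays the same.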

We will learn about more Python data structures in the coming sessions.
Building on the ideas introduced today, these will allow us to
conveniently do much more powerful data analysis.

### Note: Advanced Looping

The above looping methods allow us to go through any data. For some
cases, Python provides more convenient advanced built-in methods to do
so.

When looping through a list in Python we often need to keep track of the
index. This can be done like this:

    some_list = [3,6,4,8,'joe',875,'76'] 

    index = 0
    for item in some_list:
        print('The value is:',item,', the index is:',index)
        index += 1

However, this is not a *'pythonic'* way to do this. Python provides us
with the built-in function `enumerate()`, which returns the index along
with the value of each item:

    for index, item in enumerate(some_list):
        print('The value is:',item,', the index is:',index)

In many cases we need to loop through two lists at the same time.
Instead of using the same index for two different lists, this can be
done using the built-in function `zip()`:

    names = ['Mary', 'Nick', 'John', 'Sue'] 
    grades = [87, 45, 75, 91] 

    for name, grade in zip(names, grades): 
        print(name, ' got ', grade)
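The two functions can also be combined. The sketch below is an illustrative variation of the example above in which one grade is deliberately missing, to show that `zip()` stops as soon as the shorter list runs out.

```python
names = ['Mary', 'Nick', 'John', 'Sue']
grades = [87, 45, 75]  # one grade is missing on purpose

report = []
for position, (name, grade) in enumerate(zip(names, grades)):
    report.append([position, name, grade])

print(report)  # [[0, 'Mary', 87], [1, 'Nick', 45], [2, 'John', 75]]
```

'Sue' is silently dropped, so it is worth checking that both lists have the same length before zipping them.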

### Question 8: Palindromes

A *palindrome* is a string that reads the same forwards and backwards,
e.g. 'radar'. Write a function that tests whether an arbitrarily long
string is a palindrome. How should we go about this? How do we reverse a
string? There is a hiccup though: a string might contain capital
letters and punctuation. These will cause trouble in comparisons. For
palindrome checking, we need to compare only lower-case strings with no
punctuation. Complete the functions `to_chars` and `is_palindrome` in
the skeleton file.

In \[ \]:

    def to_chars(s):
        """
        Strips the string s from any non-letter characters (English alphabet), 
        and makes it lower-case.
        
        Parameters:
            string s
        Returns: 
            string with only lower-case chars of the English alphabet
            
        Example use
        >>> to_chars('HeLlo!')
        'hello'
        >>> to_chars("Never (1) odd or (2) even...")
        'neveroddoreven'
        """
        # DON'T CHANGE ANYTHING ABOVE
        # YOUR CODE BELOW THIS
        alphabet = 'abcdefghijklmnopqrstuvwxyz' # English alphabet


    def is_palindrome(s):
        """ 
        Palindrome checker
        
        Parameters:
            s is string that has gone through to_chars()
        
        Returns True if s is palindrome, False otherwise
        
        Examples:
        >>> is_palindrome('neveroddoreven')
        True
        >>> is_palindrome(to_chars('A man, a plan, a canal: Panama.'))
        True
        >>> is_palindrome('Hello')
        False
        """
        # DON'T CHANGE ANYTHING ABOVE
        # YOUR CODE BELOW THIS


### Question 9: Strings within strings

**Exercise.** Write a function `search_word(s1, s2)` that takes two
strings (`s1` and `s2`) and returns the longest sequence in `s1` (a
substring) that also appears in `s2`. This works as follows:

    >>> search_word("searching for a substring", "subway")
    'sub'

Complete the function `search_word` in `ses03_extra.py`.

**Hint:** This is a challenging exercise. You will probably need to
perform nested looping...