## Introduction

These exercises touch only a very small subset of Python. This subset has been chosen because:
1. It contains basic Python idioms that deviate substantially from what you are used to in languages such as C# and Java.
2. You need to be familiar with these idioms, as we will use them extensively in solving data science problems.

For help in solving the exercises you can consult this online [Python tutorial](https://docs.python.org/3.4/tutorial/).

## String Exercises

Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes ('...') or double quotes ("...") with the same result.

#### String Indexing

In [1]:
# Get the substring 'Fontys' from 'Fontys Machine Learning'

# Solution without variables
'Fontys Machine Learning'[:6]

# Solution with variables
a = 'Fontys Machine Learning'
b = a[0:6]        # Characters from position 0 (included) to 6 (excluded)
print(b)

# As you already saw, if we start from 0, we usually leave it out
b = a[:6]  

Fontys


In [3]:
# Get the substring 'Machine' from 'Fontys Machine Learning'
print('Fontys Machine Learning'[7:14])   # 'Machine' occupies positions 7 to 13; [7:15] would include the trailing space

Machine


``s[start:end:stride]`` slices the string ``s`` from position start to position end, taking steps of length stride. Note how the start is always included and the end always excluded. This makes sure that ``s[:i] + s[i:]`` is always equal to ``s``. Negative strides go backwards through the string. ``[:]`` (single colon) refers to all elements of a string. ``[1:]`` is equivalent to "1 to end".

One way to remember how slices work is to think of the indices as pointing between characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of n characters has index n, for example:

<pre>
 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
 0   1   2   3   4   5   6
-6  -5  -4  -3  -2  -1
</pre>
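The slicing rules above can be checked with a minimal sketch that uses the same word as the diagram. The variable names are only illustrative:

```python
s = 'Python'

# Splitting at any position i and gluing the halves back together gives s again,
# because the start index is included and the end index is excluded.
for i in range(len(s) + 1):
    assert s[:i] + s[i:] == s

first_two = s[0:2]     # characters between edges 0 and 2
last_two = s[-2:]      # negative indices count from the right edge
every_other = s[::2]   # stride 2 picks positions 0, 2 and 4
backwards = s[::-1]    # negative stride walks from right to left

print(first_two, last_two, every_other, backwards)
```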

In [3]:
# What is the result of this expression? Explain!
# The string is sliced from 0 to len('Fontys') with a stride of 1; these are exactly the defaults that [:] uses
'Fontys'[:] == 'Fontys'[0:len('Fontys'):1]

True

In [5]:
# Using slicing, create the string 'Bye' from 'Bicycle' (you have to collect characters at positions 0, 3 and 6)
print('Bicycle'[::3])

Bye


In [12]:
# Reverse the string 'Fontys'; slicing with stride -1 is both more idiomatic and substantially faster than reversed()
print('Fontys'[::-1])

sytnoF


In [14]:
# Using reversed() seems obvious but is tricky, as reversed() returns an iterator, not a string.
# In the end it leads to this contrived, though instructive, example.
# Why this complexity?
# 1st: reversed() returns an iterator object
# 2nd: you can use str.join() to concatenate the strings in an iterable
''.join(reversed('Fontys'))

'sytnoF'

In [19]:
# now using this reversal idiom: decide if a string is a palindrome
s = 'legermeetsysteemregel'
s == ''.join(reversed(s))

True

#### Create Formatted String Output

In [21]:
# Create the string 'The %s at %s has %d students!' where the placeholders
# are substituted with 'ADS Minor', 'FHICT' and 40

# See: https://docs.python.org/3.4/library/string.html#format-string-syntax
"The {0} at {1} has {2} students!".format("ADS Minor", "FHICT", 40)

'The ADS Minor at FHICT has 40 students!'
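The same sentence can be produced in several ways. The sketch below compares positional fields, named fields and the older `%` operator; the variable names `course`, `school` and `students` are made up for the illustration:

```python
course, school, students = 'ADS Minor', 'FHICT', 40

# Positional fields refer to the arguments of format() by index
by_position = 'The {0} at {1} has {2} students!'.format(course, school, students)

# Named fields refer to keyword arguments; ':d' asks for integer formatting
by_name = 'The {course} at {school} has {n:d} students!'.format(
    course=course, school=school, n=students)

# The older printf-style operator uses the %s and %d placeholders from the exercise text
old_style = 'The %s at %s has %d students!' % (course, school, students)

print(by_position)
```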

## List Exercises

Python knows a number of compound data types, used to group together other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.

#### List Creation

In [9]:
# Create a list of numbers 1,2,3
[1,2,3]

[1, 2, 3]

In [10]:
# Create a list of (1-character) strings from the string 'Fontys'
list('Fontys')

# It is worth thinking about why this works: the list class constructor/converter takes an iterable as argument.
# A string can be iterated over: You can use a string as collection in a for loop, hence the iterator protocol 
# has been implemented for string (as you might have expected).

# The following list comprehension should therefore also work
[c for c in 'Fontys']

['F', 'o', 'n', 't', 'y', 's']

In [23]:
# Create a list of the objects 1, [], 'Fontys', ['Fontys', 1]
[1, [], 'Fontys', ['Fontys', 1]]

[1, [], 'Fontys', ['Fontys', 1]]

In [27]:
# Create a list for the integers between 0 and 20 ([0..20), or 0 <= n < 20)
# Use the range() function, which is actually a type (immutable sequence) that can be iterated over.
list(range(20))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

In [30]:
# Create the integer list [11..21)
list(range(11, 21))

[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

In [31]:
# Create a list for all even integer numbers between 90 and 100.
[x for x in range(90, 101) if x % 2 == 0]

[90, 92, 94, 96, 98, 100]

In [32]:
# Create a list of all integers between 0 and 100 that are divisible by 3 and odd
[x for x in range(0, 101) if x % 3 == 0 and x % 2 != 0]

[3, 9, 15, 21, 27, 33, 39, 45, 51, 57, 63, 69, 75, 81, 87, 93, 99]

#### List assignments

In [3]:
# Suppose the following list of lists
ll = [[1],[2],[3]]
ll2 = ll

# Assign to named variables using sequence unpacking
l1, l2, l3 = ll
l2[0] = 3
ll[1][0] = 5

# Unpacking assigns (copies) references, not values: l2 and ll[1] are the same list object
ll[1][0] == l2[0]

True

In [10]:
l = [1,2,3]
l1 = l
l2 = l[:]
l1[0] = 3
l2[0] = 5
# Surprise: names are always references to objects
# Assignment never copies the object itself, it only binds another name to it: both l and l1 refer 
# to the same list! The slice l[:] does create a new list (a shallow copy, as opposed to a deep copy).
l1 == l, l2 == l, l[0]== 3

(True, False, True)
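The difference between a second name, a shallow copy and a deep copy shows up most clearly with a nested list. The following is a minimal sketch; the names `nested`, `alias`, `shallow` and `deep` are only illustrative:

```python
import copy

nested = [[1], [2], [3]]

alias = nested                   # no copy: a second name for the same list
shallow = nested[:]              # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)     # new outer list and new inner lists

nested[0][0] = 99                # mutate an inner list through the original name

print(alias is nested)                               # same object?
print(shallow is nested, shallow[0] is nested[0])    # outer differs, inner is shared
print(shallow[0], deep[0])                           # only the deep copy kept the old value
```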

#### List Indexing and Slicing

In [15]:
# First some examples. Always remember:
# - the list (sequence) is traversed (processed) left to right, unless the stride is negative
# - the first index specifies the first item you want in your slice
# - the second index specifies the first item you don’t want in your slice
# - the third slice index is the stride length with which you step through the list to pick the elements of your slice

print(1, [1,2,3,4][:])
print(2, [1,2,3,4][1:])
print(3, [1,2,3,4][:1])
print(4, [1,2,3,4][:-1])
print(5, [1,2,3,4][0:4])
print(6, [1,2,3,4][0:4])
print(7, [1,2,3,4][1::-1])
print(8, [1,2,3,4][1::-3])
print(9, [1,2,3,4][:])
print(10, [1,2,3,4][::])        # often used to make a complete (shallow) copy of the list
print(11, [1,2,3,4][::1])
print(12, [1,2,3,4][::-1])      # often used to make a reversed copy of the list
print(13, [1,2,3,4][0:4:-1])    # if stride is negative and start < stop, the slice is empty

1 [1, 2, 3, 4]
2 [2, 3, 4]
3 [1]
4 [1, 2, 3]
5 [1, 2, 3, 4]
6 [1, 2, 3, 4]
7 [2, 1]
8 [2]
9 [1, 2, 3, 4]
10 [1, 2, 3, 4]
11 [1, 2, 3, 4]
12 [4, 3, 2, 1]
13 []
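How the three slice indices are resolved can be made explicit with the built-in `slice` object and its `indices()` method. The helper name `manual_slice` is hypothetical; this is a sketch that mimics `seq[start:stop:step]`, not a description of how Python implements slicing internally:

```python
def manual_slice(seq, start=None, stop=None, step=None):
    """Mimic seq[start:stop:step] by visiting the selected indices one by one."""
    # slice.indices(n) fills in the defaults and clips out-of-range values
    # for a sequence of length n, returning a (start, stop, step) triple.
    first, end, stride = slice(start, stop, step).indices(len(seq))
    return [seq[i] for i in range(first, end, stride)]

numbers = [1, 2, 3, 4]
print(manual_slice(numbers, 1, None, -1))   # same as numbers[1::-1]
print(manual_slice(numbers, 0, 4, -1))      # same as numbers[0:4:-1]
```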


In [34]:
# Get list [1,4] from list [1,2,3,4]
[1, 2, 3, 4][::3]

[1, 4]

In [39]:
# Get first and last element of any list in variables first and last
l = [1,2,3]
first, last = l[0], l[-1]    # index -1 is the last element, same as l[len(l) - 1]
print(first, last)

1 3


#### List and String

In [41]:
# Create the string 'ads' from list ['a', 'd', 's']
''.join(['a', 'd', 's'])

'ads'

In [7]:
# Why does str(['a', 'd', 's']) not work in previous exercise?
# Answer: str() returns the printable representation of its argument (here the whole list), creating: "['a', 'd', 's']"
str(['a', 'd', 's']) == str(['a', 'd', 's'].__str__())  

True

In [43]:
# Create the string 'a d s' from string 'ads'
' '.join('ads')

'a d s'

In [44]:
# Create the string '123' from list [1,2,3]
# Try: "".join([1,2,3]); why does it not work?                  # Type error: expected str instance, int found
"".join([1,2,3])

TypeError: sequence item 0: expected str instance, int found

In [45]:
# Working solution for the string '123' from list [1,2,3]: convert each integer to str before joining
''.join([str(x) for x in [1,2,3]])

'123'
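The reverse direction, from the string '123' to the list [1, 2, 3], can be sketched with a list comprehension that converts each character with `int()`. The names `digits` and `numbers` are only illustrative:

```python
digits = '123'

# Iterating over a string yields its characters; int() converts each one
numbers = [int(c) for c in digits]
print(numbers)
```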

The method ``split(sep, maxsplit)`` returns a list of the words in the string, using ``sep`` as the separator (splits on runs of whitespace if left unspecified), optionally limiting the number of splits to ``maxsplit``.

In [46]:
# Create the list ['a', 'd', 's'] from 'a d s' with split()
'a d s'.split(' ')

['a', 'd', 's']
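Calling `split()` without a separator and calling it with an explicit `' '` behave differently as soon as the text contains more than one blank in a row. A small sketch, with made-up variable names:

```python
text = 'a  d s'                       # note the two spaces between 'a' and 'd'

on_whitespace = text.split()          # no separator: a run of whitespace counts as one
on_single_space = text.split(' ')     # explicit separator: an empty string appears
limited = 'a d s'.split(' ', 1)       # at most one split, the rest stays together

print(on_whitespace, on_single_space, limited)
```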

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.

For example, assume we want to create a list of squares, like:

<pre>
>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
</pre>

Note that this creates (or overwrites) a variable named x that still exists after the loop completes. We can calculate the list of squares without any side effects using:

<pre>
squares = [x**2 for x in range(10)]
</pre>
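The side effect mentioned above can be observed directly. The sketch assumes Python 3, where the variable of a list comprehension is local to the comprehension; the names `n`, `m`, `leaked` and `visible` are only illustrative:

```python
squares = []
for n in range(10):
    squares.append(n ** 2)
leaked = n                      # the loop variable is still bound after the loop

squares_lc = [m ** 2 for m in range(10)]
try:
    m                           # the comprehension variable does not exist out here
    visible = True
except NameError:
    visible = False

print(leaked, visible, squares == squares_lc)
```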

In [1]:
# Or, starting from the string 'ads', with a list comprehension
[x for x in 'ads']

['a', 'd', 's']

In [12]:
# Create a function is_palindrome() that returns whether a sentence (ignoring blanks, character case and punctuation) is a 
# palindrome, e.g. is_palindrome('A man, a plan, a canal, Panama') == True

def is_palindrome(s):
    s = [x.lower() for x in s if x.isalpha()]
    return s == s[::-1]
    
is_palindrome('A man, a plan, a canal, Panama')

True

## Tuple Exercises

We saw that lists and strings have many common properties, such as indexing and slicing operations. They are two examples of sequence data types. Since Python is an evolving language, other sequence data types may be added. There is also another standard sequence data type: the tuple.

A tuple consists of a number of values separated by commas.

In [13]:
# Create the list of tuples [(1, 2), (3, 4)] from [1, 2, 3, 4]
l = [1, 2, 3, 4]

# the non-idiomatic way would be:
l1 = []
for i in range(0, len(l)-1, 2):
    l1.append((l[i], l[i+1]))
l1

# Test what happens if l is the empty list, has 1, 2 or any uneven number of elements

# An empty or 1-element list gives []; with an odd number of elements the last,
# unpaired element is dropped (e.g. [1, 2, 3] gives [(1, 2)])

[(1, 2), (3, 4)]

In [18]:
# Create the list of tuples the idiomatic way, using list comprehension
# run your tests from previous exercise
a = [1,2,3,4]
[(a[i], a[i+1]) for i in range(0, len(a)-1, 2)]

[(1, 2), (3, 4)]

In [17]:
# Now, create the same list of tuples using zip()
# Again: Run your tests
a = [1,2,3,4]

list(zip(a[::2], a[1::2]))

[(1, 2), (3, 4)]
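The suggested tests can be run in one go. The sketch below wraps the `zip()` solution in a function and adds a variant based on `itertools.zip_longest` for the case where an unpaired last element should be kept; the function names `pairs` and `pairs_padded` are hypothetical and not part of the exercise:

```python
from itertools import zip_longest

def pairs(items):
    """Pair up neighbours; zip() stops at the shorter slice, so an odd tail is dropped."""
    return list(zip(items[::2], items[1::2]))

def pairs_padded(items, fill=None):
    """Variant that keeps an unpaired last element, padded with `fill`."""
    return list(zip_longest(items[::2], items[1::2], fillvalue=fill))

for case in ([], [1], [1, 2], [1, 2, 3], [1, 2, 3, 4]):
    print(case, pairs(case), pairs_padded(case))
```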

In [28]:
# Flatten the list of tuples [(1, 2), (3, 4)] to [1, 2, 3, 4]
lt = [(1, 2), (3, 4)]

# With a loop
lst = []
for x,y in lt:
    lst.append(x)
    lst.append(y)
    
lst

[1, 2, 3, 4]

In [41]:
from itertools import chain
# With itertools.chain (a list comprehension would be [x for t in lt for x in t])
lt = [(1, 2), (3, 4)]
list(chain(*lt))

[1, 2, 3, 4]

In [32]:
# Explain (by consulting the ref manual) why this also works
list(sum(lt, ()))

# Under the hood, it translates to:
() + (1,2) + (3,4)

(1, 2, 3, 4)

In [51]:
# Now stretch this and sum all numbers in lt
# make use of the fact that sum() works for any type that supports +:
# with start value () it concatenates tuples, with the default start value 0 it adds numbers. See ?sum
lt = [(1,2), (3, 4)]
sum(sum(lt, ()))

10
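What `sum()` does with its second argument can be illustrated with a hand-written equivalent. The function `my_sum` is hypothetical and only mirrors the documented behaviour (start value first, then `+` for each item); it is not the built-in's actual implementation:

```python
def my_sum(iterable, start=0):
    """Hand-written stand-in for the built-in sum(), for illustration only."""
    total = start
    for item in iterable:
        total = total + item     # works for any type that supports +
    return total

lt = [(1, 2), (3, 4)]
flat = my_sum(lt, ())            # () + (1, 2) + (3, 4)
total = my_sum(flat)             # 0 + 1 + 2 + 3 + 4
print(flat, total)
```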

In [35]:
'lgh@fontys.nl'.split('@')

['lgh', 'fontys.nl']

In [53]:
# Split mail address lgh@fontys.nl in user name (lgh) and domain
# using a format string specifier, create the string 'lgh(at)fontys.nl' from addr
# you could use the parameter unpacking operator *
addr = 'lgh@fontys.nl'
'{}(at){}'.format(*addr.split('@'))

'lgh(at)fontys.nl'

## List as Matrix Exercises

In [None]:
# A matrix constructed from lists is a list of lists

In [36]:
# Add a P(ass), F(ail) mark to a list of numeric marks
# transform [6,4,7] into the matrix [[6,'P'], [4,'F'], [7,'P']]

marks = [6, 4, 7]

# With a for loop
l = []
for m in marks:
    l.append([m, 'P' if m > 5 else 'F'])
l

[[6, 'P'], [4, 'F'], [7, 'P']]

In [None]:
# With list comprehension (more idiomatic)
[[m, 'P' if m > 5 else 'F'] for m in marks]

In [41]:
# Transpose the 2 x 3 matrix [[1,2,3], [4,5,6]] to the 3 x 2 matrix [[1,4], [2,5], [3,6]]
# Generalize to transposing n x m matrix to m x n matrix; a straightforward list traversal solution would be nested 
# for loops
mx = [[1,2,3], [4,5,6]]
rl = []
for i in range(len(mx[0])):
    rll = []
    for l in mx:
        rll.append(l[i])
    rl.append(rll)
rl

[[1, 4], [2, 5], [3, 6]]

In [42]:
# We can morph this nested for loop solution into the more pythonic nested list comprehension solution
[[l[i] for l in mx] for i in range(len(mx[0]))]

[[1, 4], [2, 5], [3, 6]]

In [43]:
# But the seasoned python pro will probably use zip and the unary unpack (*) operator (https://goo.gl/bHWS2S)
# on top of that: the zip() solution is faster
[list(t) for t in zip(*mx)]     # cool!

[[1, 4], [2, 5], [3, 6]]
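Why `zip(*mx)` transposes becomes clearer when the unpacking is written out by hand. A minimal sketch for the 2 x 3 matrix used above; the names `columns`, `same` and `transposed` are only illustrative:

```python
mx = [[1, 2, 3], [4, 5, 6]]

# zip(*mx) is the same call as zip(mx[0], mx[1]): each row becomes one argument,
# and zip() then groups the first items, the second items, and so on.
columns = list(zip(*mx))
same = list(zip(mx[0], mx[1]))

# zip() yields tuples; convert each one if a list of lists is required
transposed = [list(col) for col in columns]
print(columns, transposed)
```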

In [1]:
# Sneak preview for next week, the easy way with numpy!!
import numpy as np
a = np.array([[1, 2], [3, 4]])
print(a)
a.transpose()      # a.T also works!

[[1 2]
 [3 4]]


array([[1, 3],
       [2, 4]])

In [44]:
# Create the matrix [[1,0],[2,0],[3,2]] by merging the lists [1,2,3] and ['O', 'O', 'G'], where each character is 
# mapped to its position in the lookup list ['O', 'V', 'G']
# Tip: use the index() method to determine the position of a value in a sequence
[list(t) for t in zip([1,2,3],
                      [['O', 'V', 'G'].index(c) for c in ['O', 'O', 'G']])]  # whoa, can you do that in your other language?

[[1, 0], [2, 0], [3, 2]]

## Dictionary Exercises

In [4]:
# You all know dictionaries (associative arrays) from C# and Java; here is how you define them in Python
d = {'k1': [1], 'k2': [2], 'k3': [3]}
d['k2'] = 3
d['k4'] = [4]
d

{'k1': [1], 'k2': 3, 'k3': [3], 'k4': [4]}

In [11]:
# Or:
keys = ['k1','k2','k3']
values = [[1],[2],[3]]
dict(zip(keys, values))    # zip() returns an iterator of tuples that can be converted into a dictionary

{'k1': [1], 'k2': [2], 'k3': [3]}

In [12]:
# Or, using dict comprehension
# You might be tempted to do {k: v for k in keys for v in values}
# Why doesn't this work?
{k: v for k, v in zip(keys, values)}

{'k1': [1], 'k2': [2], 'k3': [3]}
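The tempting version with two `for` clauses can simply be tried out. The sketch below builds both dictionaries side by side; the names `nested` and `paired` are only illustrative:

```python
keys = ['k1', 'k2', 'k3']
values = [[1], [2], [3]]

# Two `for` clauses form a nested loop: every key is combined with every value,
# and each later assignment to the same key overwrites the earlier one.
nested = {k: v for k in keys for v in values}

# zip() walks both lists in lockstep, producing one (key, value) pair per position.
paired = {k: v for k, v in zip(keys, values)}

print(nested)
print(paired)
```

Every key ends up with the last value, because the inner loop runs to completion for each key.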

In [6]:
list(range(2,1))

[]

In [2]:
# Create a Python function that returns a word count dictionary for a piece of text
# e.g. count_words("This and this and this also") == {'this': 3, 'and': 2, 'also': 1}
def count_words(s):
    d = {}
    for w in s.split(' '):     # This is a very simplistic tokenizer
        w = w.lower()
        d[w] = d.get(w, 0) + 1   # get() returns 0 for a word not seen before
    return d

# note that a dict is not a sequence type: == compares the key/value pairs, regardless of their order
count_words("This and this and this also") == {'this': 3, 'and': 2, 'also': 1}

True
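The standard library offers `collections.Counter` for exactly this kind of counting. The following is a sketch of an alternative, not the intended solution of the exercise; the function name `count_words_counter` is hypothetical, and it splits on any whitespace rather than on single blanks:

```python
from collections import Counter

def count_words_counter(text):
    """Word count using collections.Counter; splits on any run of whitespace."""
    return Counter(word.lower() for word in text.split())

counts = count_words_counter("This and this and this also")
print(dict(counts))              # convert to a plain dict for display
print(counts.most_common(1))     # the most frequent word with its count
```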