# Python beginners workshop

Hello and welcome to the Python user group beginners workshop with Jupyter notebooks. You can write your code into the code cells. To execute code, click on the cell and press Ctrl+Enter (or the play button above).

In order to make code more readable, most programming languages have a style guide. Before writing serious code, make yourself familiar with the style guide for python here: 
https://google.github.io/styleguide/pyguide.html

## output 

We start with the canonical "hello world" example, as in every programming language tutorial.
Printing output to the console in Python is very easy. Just use the __print__ command.

In [None]:
# for commenting use #
print "hello world"

if you feel very helpless, just call for help. (type "quit" into the search field to leave the help)

In [None]:
help()

## variables and data types

Python knows the following data types: 
**string, numerical, list, tuple and dictionary.**
Python automatically assigns a type to the variable. Python is dynamically typed.
Choose any name for your variables you like. However, it's good style to name them in a meaningful way.
Do not choose any of these reserved keywords as variable names:

and       del       from      not       while    
as        elif      global    or        with     
assert    else      if        pass      yield    
break     except    import    print              
class     exec      in        raise              
continue  finally   is        return             
def       for       lambda    try   
(https://docs.python.org/2/reference/lexical_analysis.html)
#### Exercise:
Try to initialize variables of different types. Can you initialize variables of type string (str), float, integer (int), boolean (bool)? 

In [None]:
# initiate a variable by setting it equal to some value or string. Strings are enclosed by "" or ''. 
myname = "anna" 
# you can concatenate strings and variables with comma for printing (in python 2.7, python 3 is a bit stricter)
print "hello my name is", myname, " - welcome to the python tutorial!"  # just in case I forgot to introduce myself ;)
type(myname)  # find out about the type of a variable


### Numerical variables
Numerical variables can have different formats. Integer, float, long, complex. We will only look at integer and floating point numbers. Python will automatically assign a reasonable type. But the type can also be changed.

#### Exercise:
check the type of the given variables. What do you expect? What happens if you change the type? print the variables and their corresponding type. Can you change string variables to numerical variables? 

In [None]:
wort1 = 'hallo'
zahl1 = 123
zahl2 = 123.4
zahl3 = int(zahl2)  # transforms variable zahl2 to type integer
zahl4 = float(zahl1)
wort2 = str(zahl2)
print type(zahl3), zahl3
print type(zahl4), zahl4
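
The last question of the exercise (string to number) can be sketched as follows. This is only an illustration: the variable names are made up, and `print(...)` is written with brackets and a single argument so that the cell should run under Python 2.7 as well as Python 3.

```python
# illustrative values, not taken from the cell above
text_int = '123'
text_float = '123.4'

as_int = int(text_int)         # a string of digits converts to int
as_float = float(text_float)   # a decimal string converts to float

# int() does not accept a decimal string directly, it raises a ValueError
try:
    int(text_float)
    direct_ok = True
except ValueError:
    direct_ok = False

# going through float first works; the fractional part is cut off
via_float = int(float(text_float))

print("%s %s" % (type(as_int), as_int))
print("%s %s" % (type(as_float), as_float))
print("int('123.4') works directly: %s" % direct_ok)
print("int(float('123.4')) = %d" % via_float)
```

Strings that do not look like a number at all, such as 'hallo', raise the same ValueError.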


#### Excursion for python 2.7:
Be careful, in python 2.7 dividing two integers results in an integer (the result is rounded down). This is not always the result we want. However, if one of the variables in the operation is float, the result will be float. We can cast the variables to float. When a floating point number is cast to integer, the fractional part is simply cut off. The result is not always the closest integer. => rounding error
Try out other operations:  + - / \*  \** 


In [None]:
zahl = 246
bruch = zahl/10
print bruch
bruch = float(zahl)/10
print bruch
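
A small sketch of the different kinds of rounding mentioned above. It uses the explicit floor division operator `//` instead of `/`, on the assumption that the example should give the same numbers under Python 2.7 and Python 3; the variable names are only illustrative.

```python
zahl = 246

floor_div = zahl // 10          # floor division: 24
float_div = float(zahl) / 10    # one float operand gives a float: 24.6

# int() cuts off the fractional part, i.e. it rounds toward zero
cut_pos = int(2.7)              # 2
cut_neg = int(-2.7)             # -2

# floor division rounds down, which differs for negative numbers
floor_neg = -27 // 10           # -3

# round() returns the nearest whole number instead
nearest = round(2.7)

print("246 // 10 = %d" % floor_div)
print("float(246) / 10 = %s" % float_div)
print("int(2.7) = %d, int(-2.7) = %d" % (cut_pos, cut_neg))
print("-27 // 10 = %d" % floor_neg)
print("round(2.7) = %d" % nearest)
```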

### string

A string is similar to a list of characters. We will treat lists later.  
Every single character can be accessed.
The first character sits on position 0, the second character on position 1. If the string has length N, the last character has position N-1.
We can also access characters counting from the back of the string. The last one has position -1, the second last -2 etc.

In [None]:
mystring = "schwertfischflossen"
N = len(mystring)
print "length of string is", N
print "the last character is", mystring[N-1]
print mystring[-1]

#### Exercise
try accessing all the vowels in the following string

In [None]:
mystring = "schwertfischflossen"
print mystring[-2]
print mystring[4]

You can also access substrings using a range of positions, e.g. for accessing the 3rd, 4th and 5th letter, use [2:5].
Note that the last position of the range is excluded.
Try to understand what the code below will return before executing it. Try accessing single letters and substrings at different positions. For example, access 'zog' and 'Her' in the last string.

In [None]:
mysentence = "Norbert legte die Schere auf den Bug"
print mysentence[:2], mysentence[-3:], mysentence[20:24]
sent = "Der Herzog"
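
To see how ranges behave without giving away the exercise, here is a sketch on the word from the earlier cells. The variable names are made up for this illustration, and `print(...)` is written so that it should work under Python 2.7 and Python 3.

```python
# same example word as in the cells above
word = "schwertfischflossen"

first_part = word[:7]      # positions 0..6   -> "schwert"
middle_part = word[7:12]   # positions 7..11  -> "fisch"
last_part = word[-7:]      # the last 7 characters -> "flossen"

# the parts fit together without gap or overlap, because the end
# position of a range is excluded
rebuilt = word[:7] + word[7:12] + word[12:]

print(first_part)
print(middle_part)
print(last_part)
print(rebuilt == word)
```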

Python prints numerical values just as well as strings without the need of prior transformations. 

#### Exercise:
What happens if you add a numeric variable to a string of numbers? What if you cast the string to integer first? Or if you cast the integer to string? 

In [2]:
myvar = 42
mystring = '57'
print myvar, '+', mystring, '=', str(myvar) + mystring

 42 + 57 = 4257


strings are concatenated, integers are added.
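
The three cases from the exercise can be put side by side in a short sketch. The names `as_number`, `as_text` and `mixed_ok` are only illustrative, and the prints are written to run on Python 2.7 and Python 3.

```python
myvar = 42
mystring = '57'

as_number = myvar + int(mystring)   # both integers: addition -> 99
as_text = str(myvar) + mystring     # both strings: concatenation -> '4257'

# mixing the two types without a cast is an error
try:
    myvar + mystring
    mixed_ok = True
except TypeError:
    mixed_ok = False

print("as number: %d" % as_number)
print("as text: %s" % as_text)
print("int + str without cast works: %s" % mixed_ok)
```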

#### string functions and importing modules 
we can use a large quantity of libraries written in python. We import them with the "import" command.  
You can find a list of possible libraries here: https://docs.python.org/2/library/index.html  
Once we have imported the module, we can access all its functions via *modulename.functionname*

In [None]:
word = '  Tomorrow, tomorrow '
import string  # importing the library for string functions
print string.lower(word)  # turns all upper case letters to lower case letters
print string.strip(word)  # removes trailing and leading whitespaces
print string.replace(word, 'morrow','night') # replaces parts of the string 
print string.split(word)  # splits the string into a list of substrings. If no separator is given, whitespaces are used.

In order to better understand how functions work, we can either use the Python documentation at docs.python.org, or quickly check with help(functionname) or ?functionname.

In [None]:
help(string.replace) # quick overview of the function replace()
?string.replace  # special jupyter notebook way of accessing help (for the string function replace())

In [None]:
help(string)  # very thorough information on the string module and all its functions. This is quite long.

If writing the whole module name becomes tiresome, we can also change the name when importing the module.

In [None]:
import string as sg  
print sg.lower(word)

If we only need one or two functions from the module, but we use them very often, we can import the function directly, so we do not need to also write the module name.

In [None]:
from string import split
print split(word, 'o') # uses the letter 'o' as splitting criterion
print string.lower(word)
print lower(word)  # this should give an error. it will disappear, once you've run the next cell.

Lazy people like to just import all functions. This is not very good practice though. 

In [None]:
from string import *
print lower(word)
print replace(word,'orrow','')

Since word is a string, there is a much more practical way: we can call these functions directly on the string as methods. Use this form from now on for string functions.

In [None]:
print word.lower()
print word.split(',')



#### improve printing

If we want to have more power over the output of the print function, we can use a slightly modified format. We can define the format our variables should be in. %s = string, %i = integer, %f = float. 
The alternative way of printing "hello world" is then:

In [4]:
name = 'welt'
print 'hallo', name
print("hello %s"%name)


hallo welt
hello welt


When we expect a type that's different from the corresponding variable type, e.g. using %f with an integer variable, then Python tries to convert the variable into the correct type before printing it. 
If we print numbers, we can now define the number of decimal places. For two places, use %.2f 
Play around a bit with the example, try to add variables, see what happens if you use %22.3s  instead of %s etc.
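
A few format strings to start playing with, as a rough sketch. The example values and variable names are chosen only for illustration; the results in the comments follow from the rule that the number before the dot is the minimum width and the number after the dot is the maximum number of characters (or the number of decimal places for %f).

```python
name = 'starling'
zahl2 = 3.142592

cut = "%.3s" % name            # at most 3 characters -> 'sta'
padded = "%12s" % name         # at least 12 wide, padded on the left
both = "%5.1s" % name          # 1 character in a field of width 5 -> '    s'
left = "%-5.1s|" % name        # a minus sign aligns to the left -> 's    |'
two_places = "%.2f" % zahl2    # two decimal places -> '3.14'
wide_float = "%8.3f" % zahl2   # 8 wide, 3 decimal places -> '   3.143'

for text in (cut, padded, both, left, two_places, wide_float):
    print("[%s]" % text)       # brackets make the padding visible
```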

If we want to print a new line, use "\n".  (also, "\t" for tabs)  

In [5]:
zahl = 2
zahl2 = 3.142592
type(zahl)
print("%s mal"%zahl)  # integer to string
print("%i macht"%zahl2) # float to integer. This does a conversion before printing: int(zahl2)
print("%f - widdewiddewitt und"%(zahl+zahl))  # integer to float
print("%.2f macht neune"%zahl2) #float with fewer positions
# Note:  within brackets we can continue the expression in the next line. 
# However, strings cannot be continued with single "". 
print("%i mal %i macht %s widewidewitt und %s macht neune - \nich mach mir "
      "die %s wide wie sie mir gefaellt"%( zahl, zahl2, zahl*2, zahl2, name))
#print("hello %f"%name) #==>error # string to float 


2 mal
3 macht
4.000000 - widdewiddewitt und
3.14 macht neune
2 mal 3 macht 4 widewidewitt und 3.142592 macht neune - 
ich mach mir die welt wide wie sie mir gefaellt


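As we have seen in the last exercise, we can manipulate the position and the length of a string with %2.4s: the first number is the minimum width of the field, the second is the maximum number of characters that are printed. (Any number can be used for 2 or 4.)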
There's another way to print, which becomes a little easier for strings with many variables. 

In [6]:
print("{zwei} mal {drei} macht {vier} widewidewitt "
      "und {drei} macht neune").format(zwei=zahl, drei=int(zahl2), vier=zahl*2) 

print("%(zwei)i mal %(drei)i macht %(vier)i widewidewitt "
      "und %(drei)i macht neune")%{'zwei':zahl, 'drei':zahl2, 'vier':zahl*2}

2 mal 3 macht 4 widewidewitt und 3 macht neune
2 mal 3 macht 4 widewidewitt und 3 macht neune


### tuple

If we want to treat several objects together, we can use tuples. We have already seen one in the last box. Here it is again. Tuples are immutable: we can access objects at different positions in the tuple, but we cannot change, delete or add objects. Again, as with strings, the first position has index 0 and the last position has index N-1, where N is the length of the tuple. The last position can also be accessed with index -1.
#### Example: 

In [None]:
mytuple = ( zahl, zahl2, zahl*2, zahl2, name)
print("%i mal %i macht %s widewidewitt und %s macht neune - \n"
      "ich mach mir die %s wide wie sie mir gefaellt"%mytuple)

print "hallo", mytuple[4]
print type(mytuple)
#namedtuple.
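
What "immutable" means in practice can be sketched like this. The tuple values and the names `changed`, `message` and `longer` are only illustrative; the cell is written to run on Python 2.7 and Python 3.

```python
mytuple = (2, 3.142592, 'welt')

# reading an entry works like for strings
last_entry = mytuple[-1]

# assigning to a position is not allowed and raises a TypeError
try:
    mytuple[0] = 5
    changed = True
except TypeError as err:
    changed = False
    message = str(err)

# "changing" a tuple therefore means building a new one
longer = mytuple + ('neu',)

print("last entry: %s" % last_entry)
print("assignment worked: %s" % changed)
print("length before: %d, length of new tuple: %d" % (len(mytuple), len(longer)))
```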

#### optional Exercise, better skip this for now:
Try to experiment with tuples and print formatting. 
For example, with the given tuple print 10 stars, and then in the next line, print the word "star" at the first position without a star overhead. Remember: multiplying a string with a number repeats it. Take the given example as a hint.


In [None]:
# this is also possible ;)
mt = ('*','starling')
print("%s\n%5.1s")%(4*mt[0], mt[1]) # print the first entry of mt 4times, newline,
                                  # print the first letter of the second entry after 4 whitespaces (at position5)
                                  #check what happens if you print 2 or 3 letters of the second entry.

### list

if we want more control, if we want to change, add or remove objects, we can use lists.

In [7]:
mylist = []  # initializing an empty list

As with tuples, objects in a list can have different types. 
Objects in a list can be accessed through the index of their position. The first element has index 0, the second index 1 etc.
We can also initialize a list with objects

In [8]:
rezept = ['200', 'Kilo', 'Schokolade', 200, 'gute']  # initiating list with objects.
print rezept

['200', 'Kilo', 'Schokolade', 200, 'gute']


we can change single objects within the list.

In [9]:
rezept[1] = 'Gramm'
print rezept[0], rezept[1]

200 Gramm


#### Exercise: 
Print the type of the third object in the list and the type of the variable mylist. 
change the last object to integer.
change the second object to 2.

In [None]:
mylist = [0,1,True, 'three','',5.5]  
print type(mylist[0])

The size of the list can be dynamically changed by appending or deleting objects.
Accessing positions that are outside the list results in errors.

In [11]:
rezept.append('Butter')
print rezept
del rezept[4]
print "rezept:", rezept

['200', 'Gramm', 'Schokolade', 200, 'Butter', 'Butter']
rezept: ['200', 'Gramm', 'Schokolade', 200, 'Butter']


lists can be converted to tuples and tuples can be converted to lists.

In [12]:
mytuple = (4, "Huehner")
print type(mytuple)
eggs = list(mytuple)
print type(eggs)

<type 'tuple'>
<type 'list'>


When we want to add more than one element to a list, we can use the method "extend" and add an entire list of elements.

In [13]:
eggs.extend(['80','Gramm','Staub'])
print eggs
rezept.extend(eggs)
print rezept

[4, 'Huehner', '80', 'Gramm', 'Staub']
['200', 'Gramm', 'Schokolade', 200, 'Butter', 4, 'Huehner', '80', 'Gramm', 'Staub']


If we want to print a list in a prettier way, we can use the command "join". All objects of the list have to be strings for this to work.
We change the type of the two numerical entries to string and print the list with entries separated by a comma.

In [17]:
rezept[3]=str(rezept[3])
rezept[5]=str(rezept[5])
# an easy way to print a list of strings:
print ", ".join(rezept) #choose a separator you prefer

200, Gramm, Schokolade, 200, Butter, 4, Huehner, 80, Gramm, Staub


Members of a list are stored in memory. The list variable (here: rezept) holds a reference (a pointer) to the list object in memory, and the index is used to find the individual objects in the list. 
When assigning a list to a second variable, only this reference is copied. The stored objects are not duplicated. This saves time and memory, but can also lead to errors in sloppy code: when manipulating the list through the new variable, the old one changes as well, because both names refer to the same list. We can check this with the id() function. It returns a unique identifier of the object (in CPython, its memory address).  
**Careful**: executing the code below multiple times will lead to an error. Why?   
Executing cells in the wrong order or multiple times can lead to funny results. Always check the values of your variables.

In [18]:
kuchen = rezept  # Attention: we do not copy the entire list here, we only copy the pointers.
print "kuchen:", kuchen
print "rezept:", rezept
del kuchen[1]
kuchen[5] = 'Eier'
print "rezept:", rezept, "\nkuchen:", kuchen  # see how both lists have changed, even though we only manipulated the "kuchen" 
print id(kuchen)
print id(rezept)

kuchen: ['200', 'Gramm', 'Schokolade', '200', 'Butter', '4', 'Huehner', '80', 'Gramm', 'Staub']
rezept: ['200', 'Gramm', 'Schokolade', '200', 'Butter', '4', 'Huehner', '80', 'Gramm', 'Staub']
rezept: ['200', 'Schokolade', '200', 'Butter', '4', 'Eier', '80', 'Gramm', 'Staub'] 
kuchen: ['200', 'Schokolade', '200', 'Butter', '4', 'Eier', '80', 'Gramm', 'Staub']
68400008
68400008


check out:  
[jakevdp.github.io/blog/2014/05/09/why-python-is-slow](http://jakevdp.github.io/blog/2014/05/09/why-python-is-slow)  
for interesting information on pythons internal workings.
![](pypoint.png)

If we want a true copy of the list, i.e. a new list object with its own set of references, we need to state which elements to copy by giving a range of the list.
When we want to copy all elements, use [:].
The copy is a separate list, so changing, adding or removing elements of the new list will not influence the original list.

In [19]:
kuchen = rezept[0:7]
print kuchen
kuchen.append('Mehl')
print "rezept:", " ".join(rezept), "\nkuchen:", " ".join(kuchen)
print id(kuchen), id(rezept)

['200', 'Schokolade', '200', 'Butter', '4', 'Eier', '80']
rezept: 200 Schokolade 200 Butter 4 Eier 80 Gramm Staub 
kuchen: 200 Schokolade 200 Butter 4 Eier 80 Mehl
36079752 68400008
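
The difference between a second name for the same list and a copy made with [:] can also be checked without comparing id() values by eye. The following is a minimal sketch with a shortened, made-up recipe list; `is` tests whether two names refer to the same object, `==` compares the content.

```python
rezept = ['200', 'Gramm', 'Schokolade']

alias = rezept       # second name for the same list
kopie = rezept[:]    # new list with the same entries

same_object = alias is rezept           # True
copy_is_same_object = kopie is rezept   # False
copy_is_equal = kopie == rezept         # True: same content

alias.append('Butter')   # changes rezept as well
kopie.append('Mehl')     # changes only the copy

print("rezept: %s" % ", ".join(rezept))
print("kopie: %s" % ", ".join(kopie))
```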


At this point, the kuchen row should read: kuchen: 200 Schokolade 200 Butter 4 Eier 80 Mehl  
*Obviously 150 gramm of sugar is missing* :)
#### Exercise: 
copy the entire list "kuchen" to a new variable. You can name it "kuchen_mit_zucker" or any way you like.
Make sure you do not only copy the pointers. Add "150" and "Zucker" to your new list. Make sure you did not change anything in list "kuchen". What is the length of the list? (use function len())

In [20]:
kuchen_mit_zucker = kuchen[:]
kuchen_mit_zucker.extend(['150','zucker'])
print kuchen_mit_zucker
print kuchen

['200', 'Schokolade', '200', 'Butter', '4', 'Eier', '80', 'Mehl', '150', 'zucker']
['200', 'Schokolade', '200', 'Butter', '4', 'Eier', '80', 'Mehl']


Btw: melt chocolate & butter, stir all ingredients and bake at 150 degree for ~20 min. It's a great chocolate cake :)  

we can easily create a list of a range of integers with the command "range". It will come in useful later :)

In [21]:
mylist = range(1,20,1) # first integer, upper value (not inclusive), incremental value
print mylist
mylist = range(20,1,-1)
print mylist

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2]


#### optional exercise:
create a list of every 5th integer from 50 to 5 (including 50 and 5)

In [22]:
range(50, 0, -5)

[50, 45, 40, 35, 30, 25, 20, 15, 10, 5]

### list of lists (optional)

we can also create and access nested lists.
Here special care needs to be taken, when copying them. When a list is represented by a pointer to its objects, then a list of a list is a pointer of pointers.
Let's see what happens, if we create and copy a list of lists and then try to change its elements.

In [27]:
# we create a list of lists which consists of 4 lists of 7 elements each. (explanation will be given later)
lol = [range(k,k+7) for k in range(1,5)]
print lol
newlol = lol[:]  # we copy the pointers of the first list to the new list.
print 'newlol', newlol
newlol[0] = range(7,0,-1) # we replace the list that belongs to the first pointer.
print 'newlol',newlol
print 'lol',lol

[[1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
newlol [[1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
newlol [[7, 6, 5, 4, 3, 2, 1], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
lol [[1, 2, 3, 4, 5, 6, 7], [2, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]


This worked out fine. We copied the pointers of the list to a new variable. We can now replace them by a new pointer, which points to a different list. This works fine and does not influence the old list. However, if we change elements of the list they point to, it's a different thing. Let's see:

In [28]:
lol[1][0] = 100
print 'newlol',newlol
print 'lol',lol

newlol [[7, 6, 5, 4, 3, 2, 1], [100, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
lol [[1, 2, 3, 4, 5, 6, 7], [100, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]


The new entry will also be found in the old list of lists. If we want to copy the list of lists with all its elements, we need to not only copy the pointers to the lists, but also the elements of the lists. There are two ways: doing this by hand, or using the function deepcopy from the copy module.

In [29]:
# by hand:
print 'lol',lol
newlol = lol[:]
for l in range(0,len(lol)):  # loop over all pointers to lists
    newlol[l] = lol[l][:]    # copy all elements of the sublists.
print 'newlol', newlol

# with function deepcopy 
import copy
newlol2 = copy.deepcopy(lol)
print 'newlol2', newlol2

lol[0][2] = "brezel"
print 'lol',lol
print 'newlol', newlol
print 'newlol2', newlol2

lol [[1, 2, 3, 4, 5, 6, 7], [100, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
newlol [[1, 2, 3, 4, 5, 6, 7], [100, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
newlol2 [[1, 2, 3, 4, 5, 6, 7], [100, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
lol [[1, 2, 'brezel', 4, 5, 6, 7], [100, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
newlol [[1, 2, 3, 4, 5, 6, 7], [100, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
newlol2 [[1, 2, 3, 4, 5, 6, 7], [100, 3, 4, 5, 6, 7, 8], [3, 4, 5, 6, 7, 8, 9], [4, 5, 6, 7, 8, 9, 10]]
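
The same comparison in compact form, as a sketch with a smaller, made-up list of lists. `copy.copy` is used here as a stand-in for the [:] copy, since both copy only the outer list.

```python
import copy

lol = [[1, 2, 3], [4, 5, 6]]

shallow = copy.copy(lol)     # copies only the outer list, like lol[:]
deep = copy.deepcopy(lol)    # copies the sublists as well

lol[0][0] = 'brezel'         # change an element inside a sublist

shares_sublist = shallow[0] is lol[0]     # True: same sublist object
deep_shares_sublist = deep[0] is lol[0]   # False: independent sublist

print("lol: %s" % lol)
print("shallow: %s" % shallow)   # shows the 'brezel' too
print("deep: %s" % deep)         # unchanged
```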


###  array (optional)

Lists do not have the numeric properties we'd expect from arrays of numbers. When multiplying a list with a number, the list is repeated instead of each of its values being multiplied. This makes sense, because a list can consist of many diverse objects, and multiplying with a number is not meaningful for all of them. If we want to do mathematical calculations, we can use arrays from the numpy library. Numpy = numerical python. 
For calculations, the objects of a numpy array should be of numerical type. A list can be transformed into a numpy array.

In [30]:
import numpy as np

mylist = range(1,10)
print mylist*2

myarray = np.array(mylist)
print myarray*2

[1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[ 2  4  6  8 10 12 14 16 18]


### dictionary 
a dictionary can speed up access to specific entries. Each entry has a key. We can add as many keys with values as we want.

In [31]:
mydict = {} #initializing empty dictionary
# adding keys and values
mydict['chocolate'] = '200 gramm'
mydict['eggs'] = 4
print mydict

{'eggs': 4, 'chocolate': '200 gramm'}


In [32]:
#initializing a dictionary with keys and values
mydict = {'flour': '80 gramm', 'butter': '200 gramm', 'sugar' : '150 gramm'}
print mydict['sugar']  # accessing the value that corresponds to a key

150 gramm


Each key can only hold one value. Multiple identical keys are not possible.

In [33]:
mydict['sugar'] = '100 gramm'
print mydict

{'butter': '200 gramm', 'flour': '80 gramm', 'sugar': '100 gramm'}


we can access keys and values of the dictionary separately in lists.

In [34]:
print mydict.keys()
print mydict.values()

['butter', 'flour', 'sugar']
['200 gramm', '80 gramm', '100 gramm']


if we want to copy dictionaries, we have to again take care. 

In [35]:
yourdict = mydict
yourdict['chocolate']= 200  # here we now change the value to the key "chocolate" for mydict and yourdict
print  mydict
yourdict['nuts'] = 50  # here we add a key - value pair to yourdict and mydict
print mydict

{'butter': '200 gramm', 'flour': '80 gramm', 'chocolate': 200, 'sugar': '100 gramm'}
{'butter': '200 gramm', 'flour': '80 gramm', 'chocolate': 200, 'nuts': 50, 'sugar': '100 gramm'}


use the deepcopy function from the copy library instead.

In [36]:
import copy
yourdict = copy.deepcopy(mydict)
yourdict['eggs'] = 4
print 'mine', mydict
print 'yours', yourdict

mine {'butter': '200 gramm', 'flour': '80 gramm', 'chocolate': 200, 'nuts': 50, 'sugar': '100 gramm'}
yours {'butter': '200 gramm', 'flour': '80 gramm', 'eggs': 4, 'nuts': 50, 'sugar': '100 gramm', 'chocolate': 200}
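
For a dictionary whose values are plain strings or numbers, the dictionary's own copy() method is usually enough; deepcopy only makes a difference once the values are themselves changeable objects such as lists. A sketch with made-up entries (the key 'extras' is purely illustrative):

```python
import copy

mydict = {'flour': '80 gramm', 'extras': ['nuts']}

alias = mydict                  # second name for the same dictionary
shallow = mydict.copy()         # new dictionary, values are shared
deep = copy.deepcopy(mydict)    # new dictionary, values are copied too

mydict['eggs'] = 4                   # new key: only visible via mydict and alias
mydict['extras'].append('raisins')   # changes the list that shallow shares

print("alias has eggs: %s" % ('eggs' in alias))
print("shallow has eggs: %s" % ('eggs' in shallow))
print("shallow extras: %s" % shallow['extras'])
print("deep extras: %s" % deep['extras'])
```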


### conditions

If we want to execute code only under the condition that a statement is true or false, we can use conditional statements.
 ![](./condition.png) 
 
In Python, we do not use curly brackets to mark code blocks. Anything with the same indentation is in one code block.
This makes the look of the code much nicer. Indent your code with exactly 4 spaces. Do not use tabs, or set your editor to convert tabs to 4 spaces automatically.
After the if-condition or the loop we need a colon ":".
The condition does not need to be in brackets.

In [None]:
if True:
    print 'this is true'
else:
    print 'this is false'

Often it is used to test if a variable has a particular value or is within a range of values.

In [37]:
a=1
somevalues = [2, 5, 7]
print 'a==1', a==1 #is true        True if a is equal to 1
print 'a!=1', a!=1 #is false       True if a is not equal to 1
print 'a>4', a>4 #this is false    True if a is greater than 4
print 'a<=1', a<=1 #this is true   True if a is less than or equal to 1
print 'a<1', a<1 #this is false    True if a is less than 1

print 'a in somevalues', a in somevalues  # is false   True if a is in the list "somevalues"
print 'a not in somevalues', a not in somevalues #is true True if a is not in the list "somevalues"

a==1 True
a!=1 False
a>4 False
a<=1 True
a<1 False
a in somevalues False
a not in somevalues True
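
Testing whether a value lies within a range means combining two comparisons. A small sketch, with limits chosen only for illustration; Python allows chaining the comparisons directly or joining them with `and` / `or`.

```python
a = 1

in_range = 0 < a <= 4                  # chained comparison
same_with_and = (a > 0) and (a <= 4)   # the same test written with "and"
outside = a < 0 or a > 4               # True if a lies outside 0..4

if in_range:
    label = 'a is greater than 0 and at most 4'
else:
    label = 'a is not in that range'

print(label)
```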


The if block is executed whenever the condition evaluates to True. To understand how the if-condition reacts to normal variables, 
we need to know what value the variable has when it is converted to boolean.
Here are some examples:

In [39]:
zero = 0
number = 5
anystring = 'bla'
emptystring = ''
#0, empty string, None or False are false, rest is true..
if zero:
    print 'zero is true:', bool(zero)
else:
    print 'zero is false:', bool(zero)
if number:
    print 'numbers are true:', bool(number)
else:
    print 'numbers are false:', bool(number)
if anystring:
    print "strings are true:", bool(anystring)
else:
    print "strings are false:", bool(anystring)
if emptystring:
    print "empty strings are true:", bool(emptystring)
else:
    print "empty strings are false:", bool(emptystring)

zero is false: False
numbers are true: True
strings are true: True
empty strings are false: False


#### Exercise:
a) create a condition that tests whether a string variable is equal to the string "hello".
Equality is tested with ==.
If the condition is True, print something. If it is false, print something else.
b) create a condition that tests, whether a string variable is greater than "hello". 
Can you see how order is interpreted in strings? 

In [42]:
name = 'ad'
if name > 'hello':
    print 'yes'
else:
    print 'no'

no
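
A few more comparisons hint at how the order works: strings are compared character by character using the character codes, which can be looked up with ord(). The example words below are made up for this sketch.

```python
lower_first = 'ad' < 'hello'       # 'a' comes before 'h'
later_letter = 'hello' < 'help'    # first difference: 'l' comes before 'p'
prefix = 'hell' < 'hello'          # a prefix is smaller than the longer word
upper_case = 'Zebra' < 'apple'     # upper case letters have smaller codes

code_Z = ord('Z')   # 90
code_a = ord('a')   # 97

ordered = sorted(['hello', 'Hello', 'ad', 'help'])

print("ord('Z') = %d, ord('a') = %d" % (code_Z, code_a))
print(", ".join(ordered))
```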


### loops
If we want to execute the same code multiple times, but for different variables, we can either copy the same line multiple times and change the variables, or we use a loop.
Same as with the conditions: all code that belongs to the same block, here the block of the loop, needs to be indented.
Here we loop over many objects and test if they are false or true if cast to boolean. As an easy rule of thumb, we see that empty objects (0,[],{},'') are usually false. 

In [43]:
# list of variables we want to use
mylist = [0,1,2,
          '', 'hello', 
          False, None, 
          [], ['sublist','ha!'], 
          {}, {'dictentry': 'somevalue'},
          (), ('tuple',1,2)]  
for entry in mylist:                # for each variable in the list, we want to execute the code in the block below
    if entry:                      # we want to know if the variable converted to boolean is true or false
        print entry, 'is true'
    else:
        print entry, 'is false'

0 is false
1 is true
2 is true
 is false
hello is true
False is false
None is false
[] is false
['sublist', 'ha!'] is true
{} is false
{'dictentry': 'somevalue'} is true
() is false
('tuple', 1, 2) is true


In [44]:
for k in range(10):              
    print k, 'squared is', k*k  

0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
6 squared is 36
7 squared is 49
8 squared is 64
9 squared is 81


#### Exercise: 
Loop through integers in the range 0 to 20. If the integer is a multiple of 5, print the integer squared. Otherwise, just print the integer value. 

In [45]:
for i in range(21):
    if i%5==0:
        print i*i
    else:
        print i

0
1
2
3
4
25
6
7
8
9
100
11
12
13
14
225
16
17
18
19
400


#### list comprehension
If we loop through a list and the result should be a new list, e.g. if we want to change the elements of a list one by one, 
list comprehensions are a fast and pretty way to do this.

In [46]:
newlist = [k*k for k in range(10)] #transforms a list with integers from 0 to 9 to integers squared 
print newlist


[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


List comprehensions can also contain conditions. There are two possibilities:  
We change the object according to some condition,

In [47]:
[k if k%5 else k*k for k in range(1,21)]

[1, 2, 3, 4, 25, 6, 7, 8, 9, 100, 11, 12, 13, 14, 225, 16, 17, 18, 19, 400]

 or we use a condition in order to decide which objects will be in the list.

In [48]:
[k*k for k in range(1,21) if not k%5]

[25, 100, 225, 400]
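
For comparison, here are both variants written out as ordinary loops. This is only meant as a reading aid for the two comprehensions; the list names are made up.

```python
# variant 1: every element is kept, but changed depending on a condition
changed_loop = []
for k in range(1, 21):
    if k % 5:                  # remainder is not 0: keep k as it is
        changed_loop.append(k)
    else:                      # multiple of 5: store the square
        changed_loop.append(k * k)
changed_comp = [k if k % 5 else k * k for k in range(1, 21)]

# variant 2: the condition decides which elements end up in the list
filtered_loop = []
for k in range(1, 21):
    if not k % 5:
        filtered_loop.append(k * k)
filtered_comp = [k * k for k in range(1, 21) if not k % 5]

print(changed_loop == changed_comp)
print(filtered_loop == filtered_comp)
```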

#### Exercise:
using list comprehension, create the following list: ['ha','haha','hahaha','hahahaha',.... ] with 8 entries.

In [49]:
[k*'ha' for k in range(1,9)]

['ha',
 'haha',
 'hahaha',
 'hahahaha',
 'hahahahaha',
 'hahahahahaha',
 'hahahahahahaha',
 'hahahahahahahaha']

### generators 
If we want to loop over a list that is really long and would take up lots of space in memory, we can use a generator.
A generator generates the entries to iterate over on the fly. They do not have to be stored anywhere. A generator function returns an iterator.  
We can create a generator just as easily as a list with list comprehensions. Instead of [] we use ().   
A generator can only be used once. If we want to use it very often, a list could be a better choice.  
Note: if we use range, then a list object is created, which also takes up space in memory. If you iterate over a huge number of integers, use xrange instead. xrange produces its values on demand, similar to a generator. (In Python 3.x, range itself produces its values on demand and xrange no longer exists.)


In [62]:
#generator 
glist = [k*k for k in xrange(10)]
g = (k*k for k in xrange(10))
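
The two claims above (less memory, usable only once) can be checked with a rough sketch. It uses range instead of xrange so that it also runs on Python 3, and sys.getsizeof, which reports the size of the container object itself; the exact byte counts depend on the Python version and platform, so none are shown here.

```python
import sys

n = 100000
as_list = [k * k for k in range(n)]    # all squares are stored
as_gen = (k * k for k in range(n))     # squares are produced on demand

list_bytes = sys.getsizeof(as_list)    # grows with n
gen_bytes = sys.getsizeof(as_gen)      # small, independent of n

first_pass = sum(as_gen)     # uses up the generator
second_pass = sum(as_gen)    # nothing left, so the sum is 0

print("generator is smaller than list: %s" % (gen_bytes < list_bytes))
print("second pass over the generator gives: %d" % second_pass)
```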

### iterators
An iterator always provides the next element. We can iterate over a list, or use a generator to provide the next element. Elements do not need to be saved in memory. However, we can only run through a generator once; then we need to initialize it again. If you run the code below twice, you will get a StopIteration error.

In [63]:
print g.next()
print 'and next:', g.next()
print 'and the loop will continue with the next element:'
for k in g:
    print k

0
and next: 1
and the loop will continue with the next element:
4
9
16
25
36
49
64
81
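
What happens at the end of a generator can be sketched with the built-in function next(), which works on Python 2.7 and Python 3 alike (g.next() from the cell above exists only in Python 2). The generator is kept very short here and the variable names are illustrative.

```python
g = (k * k for k in range(3))

first = next(g)    # 0
second = next(g)   # 1
third = next(g)    # 4

# the generator is used up now, another call raises StopIteration
try:
    next(g)
    exhausted = False
except StopIteration:
    exhausted = True

# with a second argument, next() returns that value instead of raising
fallback = next(g, 'nothing left')

print("values: %d %d %d" % (first, second, third))
print("exhausted: %s" % exhausted)
print("fallback: %s" % fallback)
```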


To keep track when looping over an iterator, we can use enumerate. It adds a counter, that is increased by one for each step in the loop. 

In [65]:
#generator 
g = (k*k for k in xrange(10))

for (i,k) in enumerate(g):
    print i,k
    if i==4:
        break

0 0
1 1
2 4
3 9
4 16


### reading/writing of files

We can read files and write to files. In order to do so, first the file needs to be opened. After the operations on the file have been done, the file needs to be closed again. This can be done with:

In [None]:
f = open('D:\\private\\python\\the_raven.txt', 'r')  # r - open for reading, a - append, w - write ..
f.close()

Watch out: paths are written differently on Linux/Mac and Windows. 
A much nicer way to open a file is the following expression:  

In [66]:
with open('D:\\private\\python\\the_raven.txt') as f:   
    #..some code here..
    pass  # pass just says: don't do anything

it opens the file and closes it automatically at the end of the block. Within the indented block, we can execute operations on the opened file. The file object can function as an iterator. For reading the file line by line, we can iterate over the file

In [67]:
with open('D:\\private\\python\\the_raven.txt') as f: 
    for line in f:
        print line

Once upon a midnight dreary, while I pondered, weak and weary,

Over many a quaint and curious volume of forgotten lore—

    While I nodded, nearly napping, suddenly there came a tapping,

As of some one gently rapping, rapping at my chamber door.

“’Tis some visitor,” I muttered, “tapping at my chamber door—

            Only this and nothing more.”



    Ah, distinctly I remember it was in the bleak December;

And each separate dying ember wrought its ghost upon the floor.

    Eagerly I wished the morrow;—vainly I had sought to borrow

    From my books surcease of sorrow—sorrow for the lost Lenore—

For the rare and radiant maiden whom the angels name Lenore—

            Nameless here for evermore.



    And the silken, sad, uncertain rustling of each purple curtain

Thrilled me—filled me with fantastic terrors never felt before;

    So that now, to still the beating of my heart, I stood repeating

    “’Tis some visitor entreating entrance at my chamber door—

Some late visit

If we use enumerate, we have some control over which line we are currently reading and can step out of the loop at a defined point. When we use "break", the loop is ended prematurely. 

In [69]:
with open('D:\\private\\python\\the_raven.txt') as f: 
    for (i,line) in enumerate(f):
       
        if i>4:
            break # we end the loop when we reach the 6th line (i=5), before it is printed
        print i,line

0 Once upon a midnight dreary, while I pondered, weak and weary,

1 Over many a quaint and curious volume of forgotten lore—

2     While I nodded, nearly napping, suddenly there came a tapping,

3 As of some one gently rapping, rapping at my chamber door.

4 “’Tis some visitor,” I muttered, “tapping at my chamber door—



#### Exercise: 
1) In the previous code, move the condition and the break and observe how the number of printed lines changes; as written, the code prints the first 5 lines (i=0 to 4).  
2) Print out the second stanza of the poem (lines 8-13, or 7-12 when counting from zero). Store the fifth line (i=4) of the first stanza in a string variable named "sentence".

### regular expressions
Regular expressions (regex) are very handy when working with text. A regular expression is a sequence of characters that defines a search pattern.
This pattern can then be used by search functions or by find-and-replace functions. 

- [a-zA-Z] - a single letter 
- [0-9] - a single digit
- ? - zero or one occurrence of the previous element
- \* - zero or more occurrences of the previous element 
- . - any single character (except a newline)  
https://www.debuggex.com/cheatsheet/regex/python
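
To make the building blocks listed above concrete, here is a small sketch that tries each of them with re.search. The test strings are made up for illustration, and the code is written so that it should run under both Python 2.7 and Python 3.

```python
import re

# [0-9] matches exactly one digit: the first digit found is returned
one_digit = re.search("[0-9]", "room 42").group()
# * repeats the previous element zero or more times
digits = re.search("[0-9]*", "42 rooms").group()
# ? makes the previous element optional, so both spellings match
british = re.search("colou?r", "my colour").group()
american = re.search("colou?r", "my color").group()
# . stands for any single character
raven = re.search("r.v.n", "the raven").group()

print(one_digit)  # 4
print(digits)     # 42
print(british)    # colour
print(american)   # color
print(raven)      # raven
```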

Below we search the sentence for a pattern that starts with v or u, continues with any number of arbitrary characters and ends with a, d or e. Because .\* is greedy, the match extends to the last a, d or e in the sentence.

In [73]:
import re
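# re.search looks for the pattern anywhere in the string.
# "sentence" must already be defined (see exercise 2 above); otherwise this cell raises a NameError.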
print sentence
pattern = "[vu].*[ade]"
#newword = re.sub("\"", "", word)  #word without "
result = re.search(pattern, sentence)
if result:
    print 'found:', result.group()
else:
    print 'not found'

NameError: name 'sentence' is not defined
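
The NameError appears because "sentence" only exists once the exercise above has been solved. As a self-contained sketch, the search can be tried with a stand-in string. The sentence below is the fifth line of the poem simplified to plain ASCII, which is an assumption made for this example. The lazy pattern with `.*?` is not covered in the list above; it is added here only for comparison with the greedy `.*`.

```python
import re

# Stand-in for the variable from the exercise: the fifth line of the poem,
# simplified to plain ASCII (an assumption made for this sketch).
sentence = "'Tis some visitor, I muttered, tapping at my chamber door"

# greedy: .* grabs as much as possible, so the match ends at the LAST a, d or e
greedy = re.search("[vu].*[ade]", sentence)
# lazy: .*? grabs as little as possible, so the match ends at the FIRST a, d or e
lazy = re.search("[vu].*?[ade]", sentence)

print(greedy.group())  # visitor, I muttered, tapping at my chamber d
print(lazy.group())    # visitor, I mutte
```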

We can also use patterns to replace parts of a string. In the following code, we replace all special characters with an empty string. A ^ at the start of a bracket expression negates it, meaning: all characters except the ones in the brackets.

In [None]:
print sentence
newsent= re.sub('[^a-zA-Z0-9 ]','', sentence)
print newsent

#### Exercise:
in the above code: what happens if you remove the whitespace in the bracket?
what happens if you remove the ^ ?

### functions

In order to reuse code blocks, we can write functions. We have already used functions before, such as re.sub() and re.search(). If we want to write our own function, we have to think about several things:
which variables do we need in the function?  
what type of value do we want to return?  
what is the function going to do?  
We define our function with the keyword **def**, followed by the function name and, in parentheses, the variables that are passed to the function. We can make a variable optional by giving it a default value. Never use a mutable object such as a list or dictionary as a default value: the default is created only once, when the function is defined, and is then shared between all calls.

In [70]:
# my_funny_function has one mandatory argument "joke": it must be passed in every call,
# and one optional argument "optionalvariable", which is set to "ha" if no other value is given.
def my_funny_function(joke, optionalvariable='ha'): 
    print joke
    print "{ho}!!".format(ho=optionalvariable*10)
    

we can then call the function.

In [71]:
my_funny_function('A neutron wants to get into a club. The bouncer says: Sorry, charged guests only.')
my_funny_function('Helium walks into a bar and orders a beer. '
                  'The bartender says: "Sorry, we do not serve noble gases here." '
                  'The helium does not react.', optionalvariable='he')

A neutron wants to get into a club. The bouncer says: Sorry, charged guests only.
hahahahahahahahahaha!!
Helium walks into a bar and orders a beer. The bartender says: "Sorry, we do not serve noble gases here." The helium does not react.
hehehehehehehehehehe!!
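
The warning about mutable default arguments is easier to see in action. The following sketch uses two hypothetical helper functions (append_bad and append_good are names invented for this example) and should run under both Python 2.7 and Python 3.

```python
def append_bad(item, collected=[]):
    # the default list is created once, when the function is defined,
    # and is shared by every call that does not pass "collected"
    collected.append(item)
    return collected

def append_good(item, collected=None):
    # use None as a marker and create a fresh list on each call
    if collected is None:
        collected = []
    collected.append(item)
    return collected

first_bad = list(append_bad("a"))   # copy the result so later calls cannot change it
second_bad = list(append_bad("b"))  # the "a" from the first call is still in the list
first_good = append_good("a")
second_good = append_good("b")

print(first_bad)    # ['a']
print(second_bad)   # ['a', 'b']
print(first_good)   # ['a']
print(second_good)  # ['b']
```

Using None as the default and creating the list inside the function is a common way around the problem.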


Usually we also return something, for example a string, a value or a list.
In the following function we return a string.  
It is also good practice to document what your function does.
The place to do that is directly below the function definition.  
Use a string enclosed in triple quotes (""") for the documentation. Such a string can span several lines.

*attention*: the scope of variables that are defined within a function does not extend outside the function.
In particular, the function argument ('word1') is a local variable. It is **not** the same as the "word1" we defined outside.

In [74]:
word1 = '  hahaha HAAAAA !!!'
word2 = 'hoHOhoHO??   '
word3 = 'huihui'
print id(word1)

def cleanword(word1):
    """ function to clean strings. 
    
    it removes weird characters, strips off whitespace 
    and sets all characters to lower case."""  #  some documentation string to explain what the function does.
    word1 = word1.strip()  
    word1 = word1.lower()   
    word1 = re.sub('[^a-z0-9 ]','', word1)
    return word1

word3 = cleanword(word2)
print word1  # word1 was not changed. word1 in the function was a local variable
print word2  # word2 was not changed.
print word3  # word3 was set to the result of the function

36016848
  hahaha HAAAAA !!!
hoHOhoHO??   
hohohoho
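
The documentation string is more than a comment: Python stores it on the function, where help() and the `__doc__` attribute can find it. A minimal sketch, using a shortened stand-in function (cleanword_demo is a name invented for this example) that should run under both Python 2.7 and Python 3:

```python
def cleanword_demo(word):
    """Strip whitespace and convert the word to lower case."""
    word = word.strip()  # rebinding the local name does not touch the caller's variable
    return word.lower()

original = "  HoHo  "
cleaned = cleanword_demo(original)
doc = cleanword_demo.__doc__

print(doc)             # Strip whitespace and convert the word to lower case.
print(repr(original))  # '  HoHo  '  (unchanged)
print(cleaned)         # hoho
```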


Whenever Python encounters a variable, it checks whether the variable has been initiated. If it has not, Python raises an error.
Python first checks whether the variable was defined within the function or as a function argument. If it was not, Python looks one level up, in this case outside the function. (Loops and if blocks do not create a level of their own; they share the scope of the surrounding function or notebook.)
Here we use a variable within our function (word2) that was initiated before. It is not initiated within the function, nor is it an argument of the function. Python therefore assumes that we want to use the variable that was defined outside the function. 

In [75]:
def cleanword():
    """ function to clean strings. 
    
    it removes weird characters, strips off whitespace 
    and sets all characters to lower case."""  #  some documentation string to explain what the function does.
  
    word1 = word2.strip()  #word2 is a global variable
    word1 = word1.lower()  #word1 is a local variable
    word1 = re.sub('[^a-z0-9 ]','', word1)
    return word1

word3 = cleanword()
print word1  # word1 was not changed. word1 in the function was a local variable
print word2  # word2 was not changed.
print word3  # word3 was changed to the result of the function

  hahaha HAAAAA !!!
hoHOhoHO??   
hohohoho


Here we set the variable word2 within the function, so Python uses this definition first. word2 is now a local variable that is only valid within the function and does not interfere with our definition of word2 outside the function.

In [76]:
def cleanword2():
    word2 = ' hoHOhoHO ?'
    word2 = word2.strip() #word2 is a local variable
    word2 = word2.lower()  
    word2 = re.sub('["!\?"]','', word2)
    return word2

word3 = cleanword2()
print word1 # not changed.
print word2 # not changed
print word3 # result of function

  hahaha HAAAAA !!!
hoHOhoHO??   
hohohoho 


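If we try to use our previously defined variable "word2" to initiate a local variable with the same name "word2", this results in an error:  *local variable 'word2' referenced before assignment*
Because word2 is assigned somewhere inside the function, Python treats it as a local variable throughout the whole function, independent of the variable word2 that was defined outside. On the right-hand side of the first assignment, this local variable does not have a value yet.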

In [77]:
def cleanword2():
    word2 = word2.strip() #word2 is a local variable. the local variable is not initiated. This results in an error.
    word2 = word2.lower()  
    word2 = re.sub('["!\?"]','', word2)
    return word2

word3 = cleanword2()
print word1
print word2
print word3

UnboundLocalError: local variable 'word2' referenced before assignment

We can force Python to use the variable word2 that was defined outside the function by declaring word2 as global.
Now the variable in the function is the same as the word2 outside, so our function changes word2 outside the function as well.

In [78]:
def cleanword3():
    global word2
    word2 = word2.strip() # word2 is now a global variable
    word2 = word2.lower()  
    word2 = re.sub('["!\?"]','', word2)
    return word2
word3 = cleanword3()
print word1  # not changed
print word2  # changed in the function!! 
print word3  # result of function

  hahaha HAAAAA !!!
hohohoho
hohohoho
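
Changing a global variable from inside a function works, but it makes the function harder to reuse. A common alternative, sketched here as a suggestion rather than as part of the workshop material, is to pass the value in as an argument and let the caller decide what to do with the result. The function name clean is made up for this example, which should run under both Python 2.7 and Python 3.

```python
import re

def clean(word):
    # works only on its argument; no global variable is read or modified
    word = word.strip().lower()
    return re.sub('[!?"]', '', word)

word2 = 'hoHOhoHO??   '
word2 = clean(word2)  # the caller explicitly overwrites word2 with the result

print(word2)  # hohohoho
```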


#### Exercise:
1) Write a function that takes a list as argument and returns a list with the values squared.  
2) (optional) For the different versions of the function above, check whether several of the variables are identical by printing their id.

## exercise

In this exercise you will read in a text file line by line, split each line into words, set the characters to lowercase and remove all non-letter symbols.
Then you will count how often each word appears in the text. 

#### 1) create a cleaning function

the function takes a word (string) as input. It transforms it to lower case and filters out all unwanted symbols. Only keep a-zA-Z0-9 and whitespace. Tip: the right function is somewhere in the code above.

#### 2) open a file, read in all words and count them
open the file "trump_inaugural_adr.txt". 
create an empty dictionary "mydict".  
Read in line by line. 
Split the lines into words, apply the cleaning function to the words  
and then add them to a dictionary, counting their appearance. 
print out all keys and all values of the dictionary.

In [None]:
with open(u'./trump_inaugural_adres.txt', 'r') as f: #trump speech, the raven, dearest creatures of creation.
    pass

#### 3) Print out the 15 most common words of the text. 
Use the sorted function for this. Here is an example:

In [None]:
exampledict = {'word': 5, 'anotherword': 25, 'niceword': 3, 'funnyword':8}
print sorted(exampledict.keys())  # prints the keys sorted alphabetically, lowest first
print sorted(exampledict.values(), reverse=True)  # prints the values, sorted numerically, largest first
print sorted(exampledict, key=exampledict.__getitem__, reverse=True)  # prints the keys, sorted according to their numerical value

#### 4) Find the frequency of words of length N
Find out how often words of a certain length appear in the text.   
Example: for all words of length 4, sum their respective frequencies in the text.

Step by step:    
- Create a new dictionary lengthdict.
- Loop over the keys of the old dictionary (mydict).
- For each key of mydict, calculate its length, "length".
- Use this length as the key for your new dictionary.
- Test: if the key already exists in your new dictionary, add the value from the old dictionary to the value in the new dictionary. Otherwise, initiate the key with this value.

#### 5) Plot the frequency of words against their length
create a list x with the keys of your wordlength dictionary.
create a list y with the values of your wordlength dictionary.
Plot x against y.
Here is an example for how to plot: 

In [None]:
# uncomment the next line to display plots inside the notebook
#%matplotlib inline
x=[1,2,3,4,5]
y=[3,5,4,6,2]

import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()