# Lab 1: Python basics

__Student I:__ Koggala Mahavidanalage Mudith Chathuranga Silva (kogsi273)

__Student II:__ Mohammed Bakheet (mohba508)

### A word of caution

There are currently two versions of Python in common use, Python 2 and Python 3, which are not 100% compatible. Python 2 is slowly being phased out but has a large enough install base to still be relevant. This course uses the more modern Python 3 but while searching for help online it is not uncommon to find help for Python 2. Especially older posts on sources such as Stack Exchange might refer to Python 2 as simply "Python". This should not cause any serious problems but keep it in mind whenever googling. With regards to this lab, the largest differences are how `print` works and the best practice recommendations for string formatting.
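
As a minimal sketch of the two differences mentioned above (the variable names and values are made up for illustration): in Python 3 `print` is a function, and `str.format` or f-strings are generally preferred over the older `%` operator that often appears in Python 2 answers.

```python
# Python 3: print is a function, so the parentheses are required.
# (In Python 2, `print "hello"` was a statement; in Python 3 it is a SyntaxError.)
course = "Python"  # hypothetical example values
version = 3

old_style = "This course uses %s %d" % (course, version)      # %-formatting, common in Python 2 code
new_style = "This course uses {} {}".format(course, version)  # str.format
f_style = f"This course uses {course} {version}"              # f-string, Python 3.6 and later

print(old_style)
print(new_style)
print(f_style)
```

All three lines print the same sentence; only the formatting mechanism differs.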

### References to R

Most students taking this course who are not already familiar with Python will probably have some experience of the R programming language. For this reason, there will be intermittent references to R throughout this lab. For those of you with a background in R (or MATLAB/Octave, or Julia) the most important thing to remember is that indexing starts at 0, not at 1.
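
A small sketch of zero-based indexing, using a made-up list. Note that negative indices count from the end in Python, which is different from R, where a negative index drops elements.

```python
letters = ["a", "b", "c", "d"]  # hypothetical example list

first = letters[0]     # 'a' (R would use letters[1])
last = letters[-1]     # 'd' (negative indices count from the end)
middle = letters[1:3]  # ['b', 'c'] (the end index is excluded)

print(first, last, middle)
```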

### Recommended Reading

This course is not built on any specific source and no specific literature is required. However, for those who prefer to have a printed reference book, we recommend the books by Mark Lutz:

* Learning Python by Mark Lutz, 5th edition, O'Reilly. Recommended for those who have no experience of Python. This book is called LP in the text below.

* Programming Python by Mark Lutz, 4th edition, O'Reilly. Recommended for those who have some experience with Python, it generally covers more advanced topics than what is included in this course but gives you a chance to dig a bit deeper if you're already comfortable with the basics. This book is called PP in the text.

For the student interested in Python as a language, it is also worth mentioning:
* Fluent Python by Luciano Ramalho (also O'Reilly). Note that it is - at the time of writing - still in its first edition, from 2015. Thus newer features will be missing.

### A note about notebooks

When using this notebook, you can enter Python code in the empty cells, then press ctrl-enter. The code in the cell is executed and any output is displayed below the cell. Code executed in this manner will use the same environment regardless of where in the notebook document it is placed. This means that variables and functions assigned values in one cell will thereafter be accessible from all other cells in your notebook session.

Note that the programming environments described in section 1 of LP are not applicable when you run Python in this notebook.

### A note about the structure of this lab

This lab will contain tasks of varying difficulty. There might be cases when the solution seems too simple to be true (in retrospect), and cases where you have seen similar material elsewhere in the course. Don't be fooled by this. In many cases, the task might just serve to remind us of things that are worthwhile to check out, or to find out how to use a specific method.

We will be returning to, and using, several of the concepts in this lab.

### 1. Strings and string handling

The primary datatype for storing raw text in Python is the string. Note that there is no character datatype, only strings of length 1. This can be compared to how there are no atomic numbers in R, only vectors of length 1. A reference to the string datatype can be found __[here](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)__.

[Literature: LP: Part II, especially Chapter 4, 7.]

a) Define the variable `parrot` as the string containing the sentence _It is dead, that is what is wrong with it. This is an ex-"Parrot"!_. 

[Note: If you have been programming in a language such as C or Java, you might be a bit confused about the term "define". Different languages use different terms when creating variables, such as "define", "declare", "initialize", etc. with slightly different meanings. In statically typed languages such as C or Java, declaring a variable creates a name connected to a container which can contain data of a specific type, but does not put a value in that container. Initialization is then the act of putting an initial value in such a container. Defining a variable is often used as a synonym to declaring a variable in statically typed languages but as Python is dynamically typed, i.e. variables can contain values of any type, there is no need to declare variables before initializing them. Thus, defining a variable in python entails simply assigning a value to a new name, at which point the variable is both declared and initialized. This works exactly as in R.]

In [1]:
parrot = 'It is dead, that is what is wrong with it. This is an ex-"Parrot"!'
parrot

'It is dead, that is what is wrong with it. This is an ex-"Parrot"!'

b) What methods does the string now called `parrot` (or indeed any string) seem to support? Write Python commands below to find out.

In [2]:
dir(parrot)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

c) Count the number of characters (letters, blank space, commas, periods
etc) in the sentence.

In [3]:
len(parrot)

66

d) If we type `parrot + parrot`, should it change the string itself, or merely produce a new string? How would you test your intuition? Write expressions below.

In [4]:
# parrot + parrot produces a new string (two concatenated copies of parrot); parrot itself is not changed
parrot + parrot

'It is dead, that is what is wrong with it. This is an ex-"Parrot"!It is dead, that is what is wrong with it. This is an ex-"Parrot"!'
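
One possible way to test the intuition, sketched here with a helper name (`before`) that is not part of the lab: keep a second reference to the string, evaluate the sum, and check that `parrot` still refers to the same, unchanged object.

```python
parrot = 'It is dead, that is what is wrong with it. This is an ex-"Parrot"!'
before = parrot            # a second reference to the same string object

doubled = parrot + parrot  # builds a new string

print(parrot == before)           # True: the contents are unchanged
print(parrot is before)           # True: parrot still refers to the same object
print(len(parrot), len(doubled))  # 66 132
```

Strings are immutable in Python, so no operation on `parrot` can modify it in place; `+` always returns a new string.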

e) Separate the sentence into a list of words (possibly including separators) using a built-in method. Call the list `parrot_words`.

In [5]:
parrot_words = parrot.split()
print(parrot_words, type(parrot_words))

['It', 'is', 'dead,', 'that', 'is', 'what', 'is', 'wrong', 'with', 'it.', 'This', 'is', 'an', 'ex-"Parrot"!'] <class 'list'>


f) Merge (concatenate) `parrot_words` into a string again.

In [6]:
" ".join(parrot_words)

'It is dead, that is what is wrong with it. This is an ex-"Parrot"!'

g) Create a string `parrot_info` which consists of "The length of parrot_info is 66." (the length of the string should be calculated automatically, and you may not write any numbers in the string). Use f-string syntax!

In [7]:
parrot_info = f"The length of parrot_info is {len(parrot)}."
parrot_info

'The length of parrot_info is 66.'

### 2. Iteration, sequences and string formatting

Loops are not as painfully slow in Python as they are in R and thus, not as critical to avoid. However, for many use cases, _comprehensions_, like _list comprehensions_ or _dict comprehensions_ are faster. In this assignment we will see both traditional loop constructs and comprehensions. For an introduction to comprehensions, __[this](https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html)__ might be a good place to start.
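
For orientation, here is a minimal sketch (with made-up data) of the same computation written as a traditional loop, a list comprehension and a dict comprehension. It only illustrates the syntax and makes no claim about timing.

```python
numbers = [1, 2, 3, 4, 5]  # hypothetical example data

# Traditional loop: build the list step by step.
squares_loop = []
for x in numbers:
    squares_loop.append(x ** 2)

# List comprehension: the same result in one expression.
squares_list = [x ** 2 for x in numbers]

# Dict comprehension: map each number to its square.
squares_dict = {x: x ** 2 for x in numbers}

# A comprehension can also filter with a trailing if.
even_squares = [x ** 2 for x in numbers if x % 2 == 0]

print(squares_list, squares_dict, even_squares)
```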

It should also be noted that what Python calls lists are unnamed sequences. As in R, a Python list can contain elements of many types, however, these can only be accessed by indexing or sequence, not by name as in R.
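
A short sketch of what this means in practice, using made-up values: list elements are reached by position or slice, and when access by name is wanted a dictionary is the usual choice.

```python
mixed = [1, "two", 3.0, [4, 5]]  # hypothetical list with elements of several types

second = mixed[1]     # 'two'
tail = mixed[2:]      # [3.0, [4, 5]]
nested = mixed[3][0]  # 4

# For access by name, use a dictionary instead of a list.
named = {"count": 1, "label": "two"}
label = named["label"]

print(second, tail, nested, label)
```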

a) Write a `for`-loop that produces the following output on the screen:<br>
> `The next number in the loop is 5`<br>
> `The next number in the loop is 6`<br>
> ...<br>
> `The next number in the loop is 10`<br>

[Hint: the `range` function has more than one argument.]<br>
[Literature: For the range construct see LP part II chapter 4 (p.112).]

In [8]:
for i in range(5, 11):
    print(f"The next number in the loop is {i}")

The next number in the loop is 5
The next number in the loop is 6
The next number in the loop is 7
The next number in the loop is 8
The next number in the loop is 9
The next number in the loop is 10


b) Write a `for`-loop that for a given `n` sets `first_n_squared` to the sum of squares of the first `n` numbers (0..n-1). Call the iteration variable `i`.

In [9]:
n = 100  # If we change this and run the code, the value of first_n_squared should change afterwards!
# your code goes here
first_n_squared = 0
for i in range(n):
    first_n_squared+=i**2

first_n_squared   # should return 0^2 + 1^2 + ... + 99^2 = 328350 if n = 100

328350

Hint (not mandatory): iteration is often about a gradual procedure of updating or computing. Write out, on paper, how you would compute $0^2$, $0^2 + 1^2$, $0^2 + 1^2 + 2^2$, and consider what kinds of gradual updates you might want to perform.

c) It is often worth considering what a piece of code actually contributes. Think about a single loop iteration (when we go through the body of the loop). What should the variable `first_n_squared` contain _before_ a loop iteration? What should the loop iteration contribute? What does it contain _after_ ? A sentence or two for each is enough. Write this as a code comment in the box below:

In [10]:
# Before an iteration: first_n_squared holds the sum of squares of all numbers
# handled so far, i.e. 0^2 + 1^2 + ... + (i-1)^2. Before the first iteration
# (i = 0) no numbers have been handled, so it holds 0.
#
# Contributed by the iteration: the square of the current number, i^2, is added.
#
# After the iteration: first_n_squared holds 0^2 + 1^2 + ... + i^2.
#
# Example, third iteration (i = 2): before, the value is 0^2 + 1^2 = 1;
# the iteration adds 2^2 = 4; after, the value is 0^2 + 1^2 + 2^2 = 5.

Hint: 
* Your answer might involve the iteration variable `i` (informally: the current number we're looking at in the loop).
* After all the loop iterations are done (and your iteration variable has reached _n - 1_ ), it should contain the sum $0^2 + 1^2 + ... + (n-1)^2$. Does your explanation suggest that this should be the case?

[Tangent: this form of reasoning can form the basis of a mathematical correctness proof for algorithms, which enables us to formally check that code does what it should. This is quite beyond the scope of this course, but the (CS-)interested reader might want to consider reading up on e.g. [loop invariants](https://en.wikipedia.org/wiki/Loop_invariant). We only go into it at the level of detail that actually forces us to think about what our (simple) code does.]
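
As an informal illustration of the idea (not a proof, and not required for the lab), the before/after reasoning can be written down as `assert` statements inside the loop. The small value of `n` is chosen only to keep the check cheap.

```python
n = 10  # hypothetical small value, chosen for illustration
first_n_squared = 0
for i in range(n):
    # Before the iteration: the sum of squares of 0..i-1.
    assert first_n_squared == sum(k ** 2 for k in range(i))
    first_n_squared += i ** 2  # the contribution of this iteration
    # After the iteration: the sum of squares of 0..i.
    assert first_n_squared == sum(k ** 2 for k in range(i + 1))

print(first_n_squared)  # 285
```

If either assertion failed, Python would raise an `AssertionError` at the first iteration where the explanation and the code disagree.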

d) Write a code snippet that counts the number of __letters__ (alphabetic characters) in `parrot` (as defined above). Use a `for` loop.

In [22]:
#parrot = "It is dead, that is what is wrong with it. This is an ex-"Parrot"!"
counter = 0
for char in parrot:
    if (char.isalpha()):
        counter+=1
print('the counter number is: ',counter)

the counter number is:  47


e) Explain your letter-counting code in the same terms as above (before, after, contributed).

In [None]:
"""
Before a loop iteration:
counter holds the number of letters among the characters examined so far. It starts at 0, since no
characters have been examined before the first iteration.

Contributed:
The iteration looks at the current character char. If char is alphabetic, counter increases by 1;
otherwise counter is left unchanged.

After:
counter holds the number of letters among all characters up to and including char.

Example: before the third iteration the counter is 2, because the two characters seen so far ("It") are
both letters. The third character is a space, which is not alphabetic, so the counter is still 2 after the
iteration. The fourth character is "i", so the fourth iteration increases the counter to 3.
"""

f) Write a for-loop that iterates over the list `names` below and presents them on the screen in the following fashion:

> `The name Tesco is nice`<br>
> ...<br>
> `The name Zeno is nice`<br>

Use Python's string formatting capabilities (the `format` function in the string class) to solve the problem.

[Warning: The best practices for string formatting differ between Python 2 and 3; make sure you use the Python 3 approach.]<br>
[Literature: String formatting is covered in LP part II chapter 7.]

In [25]:
names = ['Tesco', 'Forex', 'Alonzo', 'Zeno']
for name in names:
    print("The name {} is nice".format(name))

The name Tesco is nice
The name Forex is nice
The name Alonzo is nice
The name Zeno is nice



g) Write a for-loop that iterates over the list `names` and produces the list `n_letters` (`[5,5,6,4]`) with the length of each name.

In [26]:
n_letters = []
for name in names:
    n_letters.append(len(name))
print(n_letters)

[5, 5, 6, 4]


h) How would you - in a Python interpreter/REPL or in this Notebook - retrieve the help for the built-in function `max`?

In [None]:
help(max)

i) Show an example of how `max` can be used with an iterable of your choice.

In [27]:
largest_number = max(n_letters)
print('The largest number is: ', largest_number)

The largest number is:  6


j) Use a comprehension (or generator) to calculate the sum 0^2 + ... + (n-1)^2 as above.

In [28]:
n = 100
# A generator expression passed to sum; no intermediate list is built.
first_n_squared = sum(x**2 for x in range(n))
# Should return the same result as the for-loop above.
print(first_n_squared)



    

328350


k) Solve the previous task using a list comprehension.

[Literature: Comprehensions are covered in LP part II chapter 4.]

In [29]:
n = 100
# The list comprehension builds the full list of squares, which sum then adds up.
first_n_squared = sum([x**2 for x in range(n)])

print(first_n_squared)

328350


l) Use a list comprehension to produce a list `short_long` that indicates if the name (in the list `names`) has more than four letters. The answer should be `['long', 'long', 'long', 'short']`.

In [30]:
short_long = [('long' if len(x)>4 else 'short') for x in names]
short_long

['long', 'long', 'long', 'short']

m) Use a comprehension to count the number of letters in `parrot`. You may not use a `for`-loop. (The comprehension will contain the word `for`, but it isn't a `for ... in ...:`-statement.)

In [31]:
count_comprehension = sum([1 for x in parrot if x.isalpha()])
count_comprehension

47

[Note: this is fairly similar to the long/short task, but note how we access member functions of the values.]

n) Below we have the string `datadump`. Retrieve the substring starting at character 27 (that is "o") and ending at character 34 ("l") by means of slicing.

In [32]:
datadump = "The name of the game is <b>old html</b>. That is <b>so cool</b>."
datadump[27:35]  # the end index is exclusive, so 35 is needed to include character 34

'old html'

o) Write a loop that uses indices to __simultaneously__ loop over the lists `names` and `short_long` to write the following to the screen:

> `The name Tesco is a long name`<br>
> ...<br>
> `The name Zeno is a short name`<br>

In [33]:
for i in range(len(names)):
    print(f"The name {names[i]} is a {short_long[i]} name")

The name Tesco is a long name
The name Forex is a long name
The name Alonzo is a long name
The name Zeno is a short name


Note: this is a common programming pattern, though not particularly Pythonic in this use case. We do however need to know how to use indices in lists to work properly with Python.
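
When an index really is needed, a common alternative to `range(len(...))` is the built-in `enumerate`, which yields the index together with the element. The sketch below repeats the two lists so that it runs on its own; the list `lines` is a name introduced only for this illustration.

```python
names = ['Tesco', 'Forex', 'Alonzo', 'Zeno']
short_long = ['long', 'long', 'long', 'short']

lines = []  # hypothetical helper list, collected so the result can be inspected
for i, name in enumerate(names):
    # i is the position of name, so it can be used to index the other list.
    lines.append(f"The name {name} is a {short_long[i]} name")

for line in lines:
    print(line)
```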

p) Do the task above once more, but this time without the use of indices.

In [57]:
for (name,length) in zip(names, short_long):
    print("The name",name,"is a",length, "name")

The name Tesco is a long name
The name Forex is a long name
The name Alonzo is a long name
The name Zeno is a short name


[Hint: Use the `zip` function.]<br>
[Literature: zip usage with dictionary is found in LP part II chapter 8 and dictionary comprehensions in the same place.]

q) Among the built-in datatypes, it is also worth mentioning the tuple. Construct two tuples, `one` containing the number one and `two` containing the number 1 and the number 2. What happens if you add them? Name some method that a list with similar content (such as `two_list` below) would support, that `two` doesn't and explain why this makes sense.

In [39]:
one = (1,)    # the trailing comma is what makes this a one-element tuple
two = (1, 2)
print(type(one))
print(type(two))
two_list = [1, 2]
adding_tuple = one + two
print(adding_tuple)
print(type(adding_tuple))

list_one = [1]
list_two = [1, 2]
print(list_one + list_two)

# Adding two tuples concatenates them: the elements of the second tuple (1, 2)
# are appended after the elements of the first tuple (1,), and the result is a
# new tuple (1, 1, 2). Neither of the original tuples is changed.
# Methods such as pop() and remove() exist for lists but not for tuples, and a
# list like two_list supports item assignment while a tuple does not. This makes
# sense because tuples are immutable whereas lists are mutable.

<class 'tuple'>
<class 'tuple'>
(1, 1, 2)
<class 'tuple'>
[1, 1, 2]

### 3. Conditionals, logic and while loops

a) Below we have an integer called `n`. Write code that prints "It's even!" if it is even, and "It's odd!" if it's not.

In [None]:
n = 4 # Change this to other values and run your code to test.
# Your code here.
if (n%2 == 0):
    print("It's even!")
else:
    print("It's odd!")

b) Below we have the list `options`. Write code (including an `if` statement) that ensures that the boolean variable `OPTIMIZE` is True _if and only if_ the list contains the string `--optimize` (exactly like that).

In [None]:
OPTIMIZE = False       # Or some value which we are unsure of.
options = ['--print-results', '--optimize', '-x']  # This might have been generated by a GUI or command line option

# Your code goes here.
if "--optimize" in options:
    OPTIMIZE = True
else:
    OPTIMIZE = False   # reset explicitly; the initial value may not be False
OPTIMIZE
# Here OPTIMIZE should be True if and only if we found '--optimize' in the list.

Note: It might be tempting to use a `for` loop. In this case, we will not be needing this, and you may _not_ use it. Python has some useful built-ins to test for membership.

You may use an `else`-free `if` statement if you like.

c) Redo the task above, but now consider the case where the boolean `OPTIMIZE` is True _if and only if_ the `options` list contains either `--optimize` or `-o` (or both). **You may only use one if-statement**.

In [None]:
OPTIMIZE = False       # Or some value which we are unsure of.
options = ['--print-results', '-o', '-x']  # This might have been generated by a GUI or command line option

# Your code goes here.
if "--optimize" in options or "-o" in options:
    OPTIMIZE = True
else:
    OPTIMIZE = False   # reset explicitly; the initial value may not be False
OPTIMIZE
# Here OPTIMIZE should be True if and only if we found '--optimize' or '-o' in the list.

[Hint: Don't forget to test your code with different versions of the options list! 

If you find something that seems strange, you might want to check what the value of the _condition itself_ is.]

[Note: This extension of the task is included as it includes a common source of hard-to-spot bugs.]
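
The kind of bug hinted at here is presumably the shortened condition `"--optimize" or "-o" in options`. The sketch below, with an options list chosen so that neither flag is present, shows why that version is always truthy.

```python
options = ['--print-results', '-x']  # contains neither '--optimize' nor '-o'

# Buggy: this is parsed as  "--optimize" or ("-o" in options).
# A non-empty string is truthy, so the result is the string itself.
buggy_condition = "--optimize" or "-o" in options

# Correct: each flag gets its own membership test.
correct_condition = "--optimize" in options or "-o" in options

print(repr(buggy_condition))  # '--optimize'
print(correct_condition)      # False
```

Printing the value of the condition itself, as the hint suggests, makes the problem visible immediately: it is a string, not a boolean.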

d) Sometimes we can avoid using an `if` statement altogether. The task above is a prime example of this (and was introduced to get some practice with the `if` statement). Solve the task above in a one-liner without resorting to an `if` statement. (You may use an `if` expression, but you don't have to.)

In [None]:
options = ['--print-results', '-o', '-x']  # This might have been generated by a GUI or command line option

OPTIMIZE = "--optimize" in options or "-o" in options
# Replace None with your single line of code.


OPTIMIZE
# Here OPTIMIZE should be True if and only if we found '--optimize' or '-o' in the list.

[Hint: What should the value of the condition be when you enter the then-branch of the `if`? When you enter the else-branch?]

e) Write a `while`-loop that repeatedly generates a random number from a uniform distribution over the interval [0,1], and prints the sentence 'The random number is smaller than 0.9' on the screen until the generated random number is greater than 0.9.

[Hint: Python has a `random` module with basic random number generators.]<br/>

[Literature: Introduction to the Random module can be found in LP part III chapter 5 (Numeric Types). Importing modules is introduced in part I chapter 3  and covered in depth in part IV.]

In [47]:
import random

num = random.uniform(0, 1)   # draw the first number before testing it
while num <= 0.9:
    print("The random number is smaller than 0.9")
    num = random.uniform(0, 1)

The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9
The random number is smaller than 0.9


### 4. Dictionaries

Dictionaries are association tables, or maps, connecting a key to a value. For instance a name represented by a string as key with a number representing some attribute as a value. Dictionaries can themselves be values in other dictionaries, creating nested or hierarchical data structures. This is similar to named lists in R but keys in Python dictionaries can be more complex than just strings.
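
A minimal sketch of both points, using made-up data: a dictionary nested inside another dictionary, and a dictionary whose keys are tuples rather than strings.

```python
# Nested dictionaries: the value stored under "window" is itself a dictionary.
settings = {"window": {"width": 800, "height": 600}}  # hypothetical example data
width = settings["window"]["width"]

# Keys do not have to be strings; any hashable value works, for example a tuple.
grid = {(0, 0): "origin", (1, 2): "some point"}
label = grid[(1, 2)]

print(width, label)
```

Mutable values such as lists cannot be used as keys, since keys must be hashable.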

[Literature: Dictionaries are found in LP section II chapter 4.]

a) Make a dictionary named `amadeus` containing the information that the student Amadeus is a male, scored 8 on the Algebra exam and 13 on the History exam. The dictionary should NOT include a name entry.

In [49]:
amadeus = {'Gender':'Male','Algebra':8,'History':13}
amadeus

{'Algebra': 8, 'Gender': 'Male', 'History': 13}

b) Make three more dictionaries, one for each of the students: Rosa, Mona and Ludwig, from the information in the following table:

| Name          | Gender        | Algebra       | History | 
| :-----------: | :-----------: |:-------------:| :------:|
| Rosa          | Female        | 19            | 22      |
| Mona          | Female        | 6             | 27      |
| Ludwig        | Other         | 12            | 18      |

In [50]:
rosa = {'Gender':'Female','Algebra':19,'History':22}
mona = {'Gender':'Female','Algebra':6,'History':27}
ludwig = {'Gender':'Other','Algebra':12,'History':18}
print(rosa,"\n",mona,"\n",ludwig)


{'Gender': 'Female', 'Algebra': 19, 'History': 22} 
 {'Gender': 'Female', 'Algebra': 6, 'History': 27} 
 {'Gender': 'Other', 'Algebra': 12, 'History': 18}


c) Combine the four students in a dictionary named `students` such that a user of your dictionary can type `students['Amadeus']['History']` to retrieve Amadeus's score on the history test.

[HINT: The values in a dictionary can be dictionaries.]

In [51]:
students = {'Amadeus':amadeus,'Rosa':rosa,'Mona':mona,'Ludwig':ludwig}
students['Amadeus']['History']

13

d) Add the new male student Karl to the dictionary `students`. Karl scored 14 on the Algebra exam and 10 on the History exam.

In [52]:
students['Karl'] = {'Gender':'Male','Algebra':14,'History':10}
students

{'Amadeus': {'Algebra': 8, 'Gender': 'Male', 'History': 13},
 'Karl': {'Algebra': 14, 'Gender': 'Male', 'History': 10},
 'Ludwig': {'Algebra': 12, 'Gender': 'Other', 'History': 18},
 'Mona': {'Algebra': 6, 'Gender': 'Female', 'History': 27},
 'Rosa': {'Algebra': 19, 'Gender': 'Female', 'History': 22}}

e) Use a `for`-loop to print out the names and scores of all students on the screen. The output should look something like this (the order of the students doesn't matter):

> `Student Amadeus scored 8 on the Algebra exam and 13 on the History exam`<br>
> `Student Rosa scored 19 on the Algebra exam and 22 on the History exam`<br>
> ...

[Hint: Dictionaries are iterables, also, check out the `items` function for dictionaries.]

In [58]:
for name, scores in students.items():
    print(f"Student {name} scored {scores['Algebra']} on the Algebra exam and {scores['History']} on the History exam")



Student Amadeus scored 8 on the Algebra exam and 13 on the History exam
Student Rosa scored 19 on the Algebra exam and 22 on the History exam
Student Mona scored 6 on the Algebra exam and 27 on the History exam
Student Ludwig scored 12 on the Algebra exam and 18 on the History exam
Student Karl scored 14 on the Algebra exam and 10 on the History exam


f) Use a dict comprehension and the lists `names` and `short_long` from assignment 2 to create a dictionary of names and whether they are short or long. The result should be a dictionary equivalent to {'Forex':'long', 'Tesco':'long', ...}.

In [59]:
#Task f is done here:
dic_names_short_long = {name:length for name,length in zip(names,short_long)}
dic_names_short_long

{'Alonzo': 'long', 'Forex': 'long', 'Tesco': 'long', 'Zeno': 'short'}

### 5. Introductory file I/O

File I/O in Python is a bit more general than what most R programmers are used to. In R, reading and writing files are usually performed using file type specific functions such as `read.csv` while in Python we usually start with reading standard text files. However, there are lots of specialized functions for different file types in Python as well, especially when using the __[pandas](http://pandas.pydata.org/)__ library which is built around a datatype similar to R DataFrames. Pandas will not be covered in this course though.

[Literature: Files are introduced in LP part II chapter 4 and chapter 9.]

The file `students.tsv` contains tab separated values corresponding to the students in previous assignments.

a) Iterate over the file, line by line, and print each line. The result should be something like this:

> `Amadeus	Male	8	13`<br>
> `Rosa	Female	19	22`<br>
> ...

The file should be closed when reading is complete.

[Hint: Files are iterable in Python.]

In [61]:
students_file = open('students.tsv', 'r')
for line in students_file:
    print(line.rstrip('\n'))   # each line already ends with a newline
students_file.close()

Amadeus	Male	8	13
Rosa	Female	19	22
Mona	Female	6	27
Ludwig	Other	12	18
Karl	Male	14	10


b) Working with many files can be problematic, especially when you forget to close files or errors interrupt programs before files are closed. Python thus has a special `with` statement which automatically closes files for you, even if an error occurs. Redo the assignment above using the `with` statement.

[Literature: With is introduced in LP part II chapter 9 page 294.]

In [62]:
with open('students.tsv', 'r') as students_file:
    for line in students_file:
        print(line.rstrip('\n'))   # each line already ends with a newline

Amadeus	Male	8	13
Rosa	Female	19	22
Mona	Female	6	27
Ludwig	Other	12	18
Karl	Male	14	10
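
To see that `with` really closes the file even when something goes wrong, here is a small self-contained sketch. It writes its own throwaway file (the file name and the simulated error are assumptions made for the example) instead of relying on `students.tsv`.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on students.tsv.
path = os.path.join(tempfile.mkdtemp(), "example.tsv")  # hypothetical file name
with open(path, "w") as out_file:
    out_file.write("Amadeus\tMale\t8\t13\n")

try:
    with open(path, "r") as in_file:
        first_line = in_file.readline()
        raise ValueError("simulated error while the file is open")
except ValueError:
    pass  # the error is expected here; we only care about the file state

print(in_file.closed)  # True: the with statement closed the file despite the error
```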


c) If you are going to open text files that might have different character encodings, a useful habit might be to use the [`codecs`](https://docs.python.org/3/library/codecs.html) module. Redo the task above, but using codecs.open. You might want to find out the character encoding of the file (for instance in an edit

In [65]:
import codecs
with codecs.open('students.tsv', mode='r', encoding='UTF-8', errors='strict', buffering=-1) as lines:
    for i in lines:
        print(i)

Amadeus	Male	8	13

Rosa	Female	19	22

Mona	Female	6	27

Ludwig	Other	12	18

Karl	Male	14	10


d) Recreate the dictionary from the previous assignment by reading the data from the file. Using a dedicated csv-reader is not permitted.

In [66]:
import codecs
with codecs.open('students.tsv', mode='r', encoding='UTF-8', errors='strict', buffering=-1) as lines:
    dictionary = {}
    for line in lines:
        name, gender, algebra, history = line.split()
        # Convert the scores to integers so they match the original dictionary.
        dictionary[name] = {'Gender': gender, 'Algebra': int(algebra), 'History': int(history)}
dictionary


{'Amadeus': {'Algebra': 8, 'Gender': 'Male', 'History': 13},
 'Karl': {'Algebra': 14, 'Gender': 'Male', 'History': 10},
 'Ludwig': {'Algebra': 12, 'Gender': 'Other', 'History': 18},
 'Mona': {'Algebra': 6, 'Gender': 'Female', 'History': 27},
 'Rosa': {'Algebra': 19, 'Gender': 'Female', 'History': 22}}

e) Using the dictionary above, write sentences from task 4e above to a new file, called `students.txt`.

In [70]:
with open('students.txt', 'w') as students_file:
    for name, details in dictionary.items():
        students_file.write(f"Student {name} scored {details['Algebra']} on the Algebra exam and {details['History']} on the History exam\n")