<h1 align="center">Python for DATA SCIENCE</h1><Br/>
<img src="https://goo.gl/ZKX5FF" style="width:15%; float:centre"><Br/>
<h2 align="center">Dr Mazen Gabriel Alhrishy</h2>
<h5 align="center"><i>MAZEN.ALHRISHY@GMAIL.COM</i></h5><Br/>

<table width=25%>
    <tr>
        <td>
            <a href="https://goo.gl/BTtR3C"><img src="https://goo.gl/rMsKok"></a>
        </td>
        <td>
            <a href="https://goo.gl/XaRDbH"><img src="https://goo.gl/KyMZcj"></a>
        </td>
        <td>
            <a href="https://goo.gl/9uCqS6"><img src="https://goo.gl/a8gcDK"></a>
        </td>
        <td>
            <a href="https://goo.gl/bnt2EL"><img src="https://goo.gl/1rT18x"></a>
        </td>
        <td>
            <a href="https://goo.gl/VmfU3S"><img src="https://goo.gl/WFFkxn"></a>
        </td>
    </tr>
</table>

<h1>Week 5- Data Structures and File Handling</h1>

<ol style="list-style-type:none">
    <li><h2>I- More data structures</h2></li>
    <li><h2>II- Reading and writing files</h2></li>
    <li><h2>- Exercises</h2></li>
    <li><h2>- Solutions</h2></li>
</ol>

<h2>I- More data structures</h2>

<ol style="list-style-type:none">
    <li><h3>1- List comprehensions</h3></li>
    <li><h3>2- Tuples</h3></li>
    <li><h3>3- Dictionaries</h3></li>
</ol>

<h3>1- List comprehensions</h3>

<blockquote>
"List comprehensions provide a concise way to create lists. They are usually used to create a new list where each element is the result of some operations applied to each element of another sequence"
</blockquote>

<ul><li>The basic syntax of a list comprehension is:</li></ul>

<b>
[<font color="green">expression</font> for <font color="red">element</font> in <font color="red">sequence</font>]
</b>

<ul><li>To see the appeal of list comprehensions, let's assume that we want to square each number from 1 to 10 and store the results in a list</li></ul>

<b>Option 1</b>: We can do this using a <b>for</b> loop. However, this creates (or overwrites) a variable named x that still exists after the loop completes

In [None]:
squares = list()

for x in range(1, 11):
    squares.append(x**2)
    
print(squares)

<b>Option 2</b>: We can create the list without any side effects by using the <b>map()</b> function introduced before, and converting the resulting object into a list

In [None]:
map_object = map(lambda x: x**2, range(1, 11))
squares = list(map_object)

print(squares)

<b>Option 3</b>: However, using a list comprehension is more concise and readable!

In [None]:
squares = [x**2 for x in range(1, 11)]

print(squares)
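A list comprehension can also take an optional <b>if</b> clause, <b>[expression for element in sequence if condition]</b>, which keeps only the elements for which the condition is true. The sketch below shows that form, and also checks the point made in Option 1: in Python 3 the loop variable of a comprehension is local to it, so an existing variable with the same name is not overwritten.

```python
# Optional if clause: square only the even numbers from 1 to 10
even_squares = [x**2 for x in range(1, 11) if x % 2 == 0]
print(even_squares)  # [4, 16, 36, 64, 100]

# The comprehension's loop variable x is local to the comprehension,
# so an existing variable with the same name is left unchanged
x = 'unchanged'
squares = [x**2 for x in range(1, 11)]
print(x)  # unchanged
```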

<h3>2- Tuples</h3>

<blockquote>
"A tuple is another standard sequence data type similar to lists. However, the important difference is that, unlike lists, tuples are <b>immutable</b>"
</blockquote>

<ul><li>A tuple consists of a number of values separated by commas, with or without surrounding parentheses (although parentheses are often used anyway)</li></ul>

In [None]:
t = (12345, 54321, 'hello!')
print(t, type(t))
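Since the parentheses are optional, writing comma-separated values on the right-hand side of an assignment is enough to pack them into a tuple. A minimal sketch (the name t2 is just for illustration):

```python
# Tuple packing: no parentheses needed
t2 = 12345, 54321, 'hello!'
print(t2, type(t2))  # (12345, 54321, 'hello!') <class 'tuple'>

# It is the same tuple as the parenthesised version
print(t2 == (12345, 54321, 'hello!'))  # True
```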

<ul><li>We can create an empty tuple by using either empty parentheses <b>( )</b>, or the built-in function <b>tuple()</b></li></ul>

In [None]:
empty_t = ()
empty_t = tuple()

print(empty_t, type(empty_t))

<ul><li>To create a tuple with one item only, the value must be followed by a comma (it is not sufficient to enclose a single value in parentheses)</li></ul>

In [None]:
singleton_t = ('hello!')  # creates a string object not a tuple!
print(singleton_t, type(singleton_t))

singleton_t = ('hello!',)  # note the trailing comma
print(singleton_t, type(singleton_t))

<ul><li>Unlike lists, tuples are immutable (i.e. you can't change the values of their items), so the cell below raises a <b>TypeError</b></li></ul>

In [None]:
t[0] = 88888  # value reassignment is not allowed

<ul><li>Like lists, tuples can be indexed and sliced</li></ul>

In [None]:
print(t[0])  # indexing returns the item
print(t[-1])
print(t[-2:])  # slicing returns a new tuple

<ul><li>Like lists, you can also unpack a tuple by using as many variables on the left side of the equals sign as there are elements in the tuple</li></ul>

In [None]:
x, y, z = t
print(x)
print(y)
print(z)
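Packing and unpacking together give a common idiom for swapping two variables without a temporary one. Python 3 also supports extended unpacking, where a starred name collects any remaining items into a list. A short sketch with illustrative names:

```python
# Swapping: the right side is packed into a tuple (2, 1),
# which is then unpacked into the names on the left
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1

# Extended unpacking: the starred name collects the remaining items as a list
first, *rest = (12345, 54321, 'hello!')
print(first)  # 12345
print(rest)   # [54321, 'hello!']
```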

<h3>3- Dictionaries</h3>

<blockquote>
"A dictionary is a set of <b>key: value</b> pairs, with the requirement that the keys are immutable and unique within one dictionary"
</blockquote>
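The requirement that keys be immutable (strictly speaking, hashable) is why a tuple can be used as a dictionary key while a list cannot. A small sketch; the coordinates dictionary is purely illustrative:

```python
# Tuples are immutable, so they can be used as dictionary keys
coordinates = {(0, 0): 'origin', (1, 0): 'east'}
print(coordinates[(0, 0)])  # origin

# Lists are mutable (unhashable), so using one as a key raises a TypeError
list_key_failed = False
try:
    bad = {[0, 0]: 'origin'}
except TypeError as err:
    list_key_failed = True
    print('TypeError:', err)
```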

<ul><li>A dictionary is written as a comma-separated sequence of <b>key: value</b> pairs enclosed in braces</li></ul>

In [None]:
d = {'one': 1, 'two': 2, 'three': 3}
print(d, type(d))

<ul><li>This can also be done using the built-in function <b>dict()</b></li></ul>

In [None]:
d = dict(one=1, two=2, three=3)  # using comma-separated key=value
print(d, type(d))

d = dict([('one', 1), ('two', 2), ('three', 3)])  # using a list of (key, value) tuples
print(d, type(d))

<ul><li><b>dict()</b> can also be used to create an empty dictionary, similar to using empty braces <b>{ }</b></li></ul>

In [None]:
empty_d = {}
empty_d = dict()

print(empty_d, type(empty_d))

<ul><li>To understand the motivation behind dictionaries, consider the task of storing some countries with their corresponding capitals</li></ul>

<b>Option 1</b>: Using two lists, the first to store the countries and the second to store the corresponding capitals. To access the capital of a country, we first need to find the index of that country in the first list, then use that index to access the capital in the second list (i.e. there is no direct access)

In [None]:
countries = ['spain', 'france', 'germany', 'norway']
capitals = ['madrid', 'paris', 'berlin', 'oslo']

ind_ger = countries.index('germany')
print(capitals[ind_ger])

<b>Option 2</b>: Using a dictionary on the other hand allows direct access to a value using its key as an index!

In [None]:
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin', 'norway':'oslo'}
print(europe['germany'])

<ul><li>To check whether a single key is in the dictionary, use the membership operators <b>in</b> and <b>not in</b></li></ul>

In [None]:
print('germany' in europe)
print('germany' not in europe)
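Indexing a dictionary with a key it does not contain raises a <b>KeyError</b>. When a key may be missing, the <b>get()</b> method is a common alternative: it returns None, or a default value you supply, instead of raising. A sketch reusing the europe dictionary ('italy' is simply a key that is not in it):

```python
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}

# Indexing with a missing key raises a KeyError...
missing = False
try:
    europe['italy']
except KeyError:
    missing = True
    print("'italy' is not a key")

# ...whereas get() returns None, or the default passed as second argument
print(europe.get('italy'))               # None
print(europe.get('italy', 'unknown'))    # unknown
print(europe.get('germany', 'unknown'))  # berlin
```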

<ul><li>To get all the keys in the dictionary, use the <b>keys()</b> method together with the <b>list()</b> function, which returns a list of the keys in insertion order (guaranteed since Python 3.7). If you want the list sorted, use the <b>sorted()</b> function instead of <b>list()</b></li></ul>

In [None]:
print(list(europe.keys()))
print(sorted(europe.keys()))

<ul><li>The <b>items()</b> method together with the <b>list()</b> function returns a list of tuples, one per key: value pair, again in insertion order. To sort the list of tuples, use the <b>sorted()</b> function instead of <b>list()</b></li></ul>

In [None]:
print(list(europe.items()))
print(sorted(europe.items()))  # sorting the tuple pairs using the keys
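In the same way, the <b>values()</b> method gives access to the values alone. And because <b>sorted()</b> accepts a <b>key</b> argument, the pairs can be ordered by value rather than by key; the sketch below passes a lambda that picks the second item of each pair:

```python
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}

print(list(europe.values()))  # ['madrid', 'paris', 'berlin', 'oslo']

# Sort the (country, capital) pairs by capital instead of by country
by_capital = sorted(europe.items(), key=lambda pair: pair[1])
print(by_capital)
# [('germany', 'berlin'), ('spain', 'madrid'), ('norway', 'oslo'), ('france', 'paris')]
```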

<ul><li>The <b>items()</b> method is also useful for looping through dictionaries, because it retrieves the key: value pairs</li></ul>

In [None]:
for k, v in europe.items():
    print(k, v)
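Dictionaries also have a comprehension syntax that mirrors list comprehensions, using braces and a <b>key: value</b> expression. Combined with <b>items()</b>, it can build an inverted dictionary that maps each capital back to its country. This sketch assumes the values are unique; otherwise later pairs would overwrite earlier ones:

```python
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin', 'norway': 'oslo'}

# {new_key: new_value for ... in ...}: here each key and value are swapped
capitals_to_countries = {capital: country for country, capital in europe.items()}
print(capitals_to_countries['oslo'])  # norway
print(capitals_to_countries)
```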

<ul><li>To add a <b>key: value</b> pair to the dictionary</li></ul>

In [None]:
europe['latvia'] = 'riga'
print(europe)

<ul><li>The <b>del</b> statement can be used to remove a <b>key: value</b> pair from a dictionary; while the <b>clear()</b> method can be used to clear the entire dictionary</li></ul>

In [None]:
del europe['germany']
print(europe)

europe.clear()
print(europe)

<a href="https://docs.python.org/3/library/stdtypes.html#dict"> Here</a> are all of the methods you can call on a dictionary object

<h2>II- Reading and writing files</h2>

<ul><li>The <b>open()</b> function opens a file and returns a file object. It is most commonly used with two arguments, as follows</li></ul>

<b>
file_object = <font color="red">open</font>(file, mode='rt')<br>
</b>
    
    file: a string containing the file name (or path)
    mode: an optional string that specifies the mode in which the file is opened. Available modes are:
          'r' open for reading only (default)
          'r+' open for reading and writing
          'w' open for writing only, creating the file if needed or truncating it first (i.e. erasing its contents)
          'a' open for writing only, appending to the end of the file if it exists (creating it otherwise)
          't' appended to any mode above to open the file in text mode (default)
          'b' appended to any mode above to open the file in binary mode

<ul><li>Files that contain text are normally opened in text mode 't' (i.e. you read and write strings from and to the file)</li></ul>

In [None]:
f1 = open(r'Text_Files/file_(3_lines).txt', 'rt')

<ul><li>Files that don’t contain text (e.g. image files) are normally opened in binary mode 'b' (i.e. data is read and written from and to the file in the form of bytes objects)</li></ul>

In [None]:
f2 = open(r'Images/Guido_van_Rossum.jpg', 'rb')

<ul><li>To read the entire contents of the file, the <b>read()</b> method can be used</li></ul>

In [None]:
data1 = f1.read()
print(data1)

In [None]:
data2 = f2.read()
print(data2)

<ul><li>If the end of the file has been reached, <b>read()</b> returns an empty string ('') in text mode, or an empty bytes object (b'') in binary mode</li></ul>

In [None]:
f1.read()

In [None]:
f2.read()
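<b>read()</b> also accepts an optional size argument, in which case it returns at most that many characters (text mode) or bytes (binary mode), continuing from where the previous read stopped. To keep the sketch runnable without the files used above, it swaps in <b>io.StringIO</b>, an in-memory text stream that supports the same reading methods as a text file object; its contents are made up:

```python
import io

# In-memory text stream standing in for a file opened with 'rt' (made-up contents)
f = io.StringIO('first line\nsecond line\n')

chunk = f.read(5)      # at most 5 characters
remainder = f.read()   # everything from the current position to the end
end = f.read()         # nothing left, so an empty string is returned
f.close()

print(repr(chunk))      # 'first'
print(repr(remainder))  # ' line\nsecond line\n'
print(repr(end))        # ''
```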

<ul><li>Once we are done processing the opened file, it should be closed to free up any system resources used by it. For this, the <b>close()</b> method is called</li></ul>

In [None]:
f1.close()
f2.close()

<ul><li>To check whether a file has been closed, inspect its <b>closed</b> attribute (it is an attribute, not a method, so no parentheses are needed)</li></ul>

In [None]:
print(f1.closed)
print(f2.closed)

<ul><li>To read a single line from the file, the <b>readline()</b> method can be used. The newline character (\n) is kept at the end of the returned string, and is only omitted on the last line if the file does not end with a newline. When the end of the file has been reached, <b>readline()</b> returns an empty string</li></ul>

In [None]:
f = open(r'Text_Files/file_(2_lines).txt', 'rt')

In [None]:
f.readline()

In [None]:
f.readline()

In [None]:
f.readline()

In [None]:
f.close()

<ul><li>For reading all lines from a file, one at a time, you can loop over the file object. This is memory efficient, fast, and leads to simple code</li></ul>

In [None]:
f = open(r'Text_Files/file_(3_lines).txt', 'rt')
for line in f:
    print(line, end='')
f.close()

<ul><li>If you want to read all the lines of a file into a list at once, you can also use <b>list()</b> or <b>readlines()</b></li></ul>

In [None]:
f = open(r'Text_Files/file_(3_lines).txt', 'rt')
print(list(f))
f.close()

In [None]:
f = open(r'Text_Files/file_(3_lines).txt', 'rt')
print(f.readlines())
f.close()
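Because each line keeps its trailing newline, a list comprehension is a convenient way to read the lines and strip it at the same time. The sketch again uses an in-memory <b>io.StringIO</b> stream with made-up contents as a stand-in for a real text file:

```python
import io

# Stand-in for a text file with three lines (made-up contents)
f = io.StringIO('line one\nline two\nline three\n')

# Iterating yields each line with its '\n'; rstrip('\n') removes it
lines = [line.rstrip('\n') for line in f]
f.close()

print(lines)  # ['line one', 'line two', 'line three']
```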

<ul><li>It is good practice to use the <b>with</b> keyword when dealing with file objects. The advantage is that the file is properly closed once the <b>with</b> block finishes, even if an exception is raised inside it (no need for you to call <b>close()</b>)</li></ul>

In [None]:
with open(r'Text_Files/file_(3_lines).txt', 'rt') as f:
    print(f.readlines())
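The file is closed even when an error interrupts the <b>with</b> block. So as not to depend on the Text_Files folder, the sketch below creates a throwaway file (the name demo.txt is hypothetical) in a temporary directory, raises an exception while the file is open, and then checks the <b>closed</b> attribute:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

    # Create the file so that it can be opened for reading
    with open(path, 'wt') as f:
        f.write('some text\n')

    caught = False
    try:
        with open(path, 'rt') as f:
            raise ValueError('something went wrong while processing')
    except ValueError as err:
        caught = True
        print('Caught:', err)

    # The with statement closed the file even though the block failed
    print(f.closed)  # True
```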

<ul><li>To write a string to the file, the <b>write()</b> method is used. It also returns the number of characters written</li></ul>

In [None]:
with open(r'Text_Files/file_write_test.txt', 'wt') as f:
    f.write('This is a test\n')

print(f.closed)  # check if file was closed
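The difference between the 'w' and 'a' modes, and the value returned by <b>write()</b>, can be seen by writing a file, appending to it, and reading it back. As before, the sketch works in a temporary directory, and the file name log.txt is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'log.txt')  # hypothetical file name

    # 'wt' creates the file (or erases an existing one);
    # write() returns the number of characters written
    with open(path, 'wt') as f:
        n = f.write('first line\n')
    print(n)  # 11

    # 'at' keeps the existing contents and writes at the end of the file
    with open(path, 'at') as f:
        f.write('second line\n')

    # Read everything back to confirm both lines are there
    with open(path, 'rt') as f:
        contents = f.read()

print(contents, end='')
```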

<a href="https://docs.python.org/3/library/io.html#io.IOBase"> Here</a> are all of the methods you can call on a file object

<h2>- Exercises</h2>

1- Using a list comprehension, create a new list 'new_list' from the list 'numbers', containing each number in 'numbers' converted to an integer. Print out the result.

    numbers = [34.6, -203.4, 44.9, 68.3, -12.2, 44.6, 12.7]
    
<ul><li>Modify your answer so that 'new_list' contains only the positive numbers from 'numbers', again as integers. Print out the result.</li></ul>

2- Using a list comprehension, create a list of integers which specify the length of each word in the sentence, but only if the word is not 'the'

    sentence = 'the quick brown fox jumps over the lazy dog'

<ul><li>Print out the resulting list</li></ul>

3- Just as lists can contain other lists, dictionaries can contain nested dictionaries, where a value is itself a dictionary of key: value pairs. Write a 'europe' dictionary of the following countries, where each key is a country, and the value is a dictionary with 2 keys: 'capital' and 'population'

    Spain:(Madrid, 46.77 Million)
    France:(Paris, 66.03 Million)
    Germany:(Berlin, 80.62 Million)
    Norway:(Oslo, 5.084 Million)
    
<ul><li>Print out the capital of France</li></ul>
<ul><li>Add Latvia to the dictionary</li></ul>

    Latvia:(Riga, 1.96 Million)
<ul><li>Print out the dictionary</li></ul>

4- Write a Python script that reads through the lines of a file (e.g. Text_Files/romeo.txt), splits each line into a list of words, loops through each of the words, and counts how many times each word occurs using a dictionary

<h2>- Solutions</h2>

In [None]:
# Exercise 1

numbers = [34.6, -203.4, 44.9, 68.3, -12.2, 44.6, 12.7]
new_list = [int(number) for number in numbers]
print(new_list)

new_list = [int(number) for number in numbers if number > 0]
print(new_list)

In [None]:
# Exercise 2

sentence = 'the quick brown fox jumps over the lazy dog'
words = sentence.split()
print(words)
word_lengths = [len(word) for word in words if word != "the"]
print(word_lengths)

In [None]:
# Exercise 3
europe = { 'spain': { 'capital':'madrid', 'population':46.77 },
           'france': { 'capital':'paris', 'population':66.03 },
           'germany': { 'capital':'berlin', 'population':80.62 },
           'norway': { 'capital':'oslo', 'population':5.084 } }


# print out the capital of France
print(europe['france']['capital'])

# add Latvia to europe (lowercase key, consistent with the other countries)
europe['latvia'] = {'capital': 'riga', 'population': 1.96}

# Print europe
print(europe)

In [None]:
# Exercise 4

with open(r'Text_Files/romeo.txt', 'rt') as f:

    word_counts = dict()

    for line in f:
        words = line.split()  # split each line into a list of words
        for word in words:
            if word not in word_counts:
                word_counts[word] = 1
            else:
                word_counts[word] += 1

print(word_counts)
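A more compact variant of the counting loop uses <b>get()</b> with a default of 0, and the standard library's <b>collections.Counter</b> can do the whole count in one step. Since romeo.txt is not reproduced here, the sketch uses made-up sample text in an <b>io.StringIO</b> stream as a stand-in for the file:

```python
import io
from collections import Counter

# Made-up sample text standing in for Text_Files/romeo.txt
text = 'but soft what light\nthrough yonder window breaks\nbut soft\n'

# Variant 1: get() returns 0 the first time a word is seen
word_counts = dict()
with io.StringIO(text) as f:
    for line in f:
        for word in line.split():
            word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)

# Variant 2: Counter counts every word yielded by the generator expression
with io.StringIO(text) as f:
    counter = Counter(word for line in f for word in line.split())
print(counter['but'], counter['soft'], counter['light'])  # 2 2 1
```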