# Data Science Self-Assessment - Solutions

For the corresponding solutions (by Galvanize), see my [GitHub page](https://github.com/RoyKlaasseBos/self-study-resources/blob/master/DSI-Self-Assessment.pdf).

## Spot the Differences

### For Loops

**Script 1**  
The program prints the value of `total` on every iteration of the `for`-loop. Since `total` is reset to 0 on every iteration, the output is:
```python
1
2
3
```

**Script 2**  
After each iteration of the `for`-loop, the running value of `total` is printed, which gives:
```python
1
3
6
```

**Script 3**  
The difference from script 2 is that `total` is printed only once (after the `for`-loop).
```python
6
```
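The three scripts are not reproduced in this write-up, so the sketch below is a hypothetical reconstruction, assuming each one loops over `[1, 2, 3]` and adds the current number to `total`. The function names `script1` to `script3` are made up, and instead of printing, each function returns what the corresponding script would print, which makes the effect of where `total` is initialised and printed easy to compare.

```python
def script1(numbers):
    # hypothetical reconstruction: total is reset inside the loop
    printed = []
    for num in numbers:
        total = 0
        total += num
        printed.append(total)
    return printed

def script2(numbers):
    # total is initialised once; its value is recorded on every iteration
    printed = []
    total = 0
    for num in numbers:
        total += num
        printed.append(total)
    return printed

def script3(numbers):
    # total is initialised once; its value is recorded only after the loop
    total = 0
    for num in numbers:
        total += num
    return [total]

print(script1([1, 2, 3]))  # [1, 2, 3]
print(script2([1, 2, 3]))  # [1, 3, 6]
print(script3([1, 2, 3]))  # [6]
```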

### For Loops in Functions

**Script 1**  
As you can see below, only `'cat'` is returned, since the `return` statement immediately exits the function (and with it the `for`-loop) during the first iteration.

```python
# Script 1
def my_function1(my_list):
    output = []
    for item in my_list:
        output.append(item)
        return item

my_function1(['cat', 'bad', 'dad'])
```

```python
'cat'
```

**Script 2**  
Script 2 has the same problem as script 1 (the `return` statement is inside the `for`-loop), so the function returns during the first iteration and the output is only `cat`.
```python
'cat'
```

**Script 3**  
This script returns the last item of `my_list` (`dad`) because that is the value of `item` once the `for`-loop finishes. 
```python
'dad'
```
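To make the difference between scripts 1 and 3 concrete, here is a minimal sketch with two hypothetical functions (`return_inside_loop` and `return_after_loop` are illustrative names, not the original scripts). It assumes the only thing that matters is where the `return` sits relative to the loop.

```python
def return_inside_loop(my_list):
    # return inside the loop exits the function on the first item
    for item in my_list:
        return item

def return_after_loop(my_list):
    # after the loop, item is still bound to the last value it took
    for item in my_list:
        pass
    return item

animals = ['cat', 'bad', 'dad']
print(return_inside_loop(animals))  # cat
print(return_after_loop(animals))   # dad
```

Note that `return_after_loop([])` raises `UnboundLocalError`, because `item` is never assigned when the list is empty, while `return_inside_loop([])` simply returns `None`.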

**Script 4**  
Because the list `output` is reset to `[]` on every iteration of the `for`-loop, `my_function4()` returns `['dad']`: an empty `output` list to which only the last item of `my_list` has been appended.
```python
['dad']
```

**Script 5**  
This time the list is not reset inside the loop, so the return value is the list `output` containing all elements of `my_list`. Calling `my_function5()` twice gives exactly the same result, because `output` is a local variable that starts out as an empty list on every call.
```python
['cat', 'bad', 'dad']
['cat', 'bad', 'dad']
```

**Script 6**  
Now `output` has become a global variable (a list defined outside the function), which is not reset between calls to `my_function6()`. In other words, on the second call the elements of `my_list` are appended to the existing contents of `output`.
```python
['cat', 'bad', 'dad']
['cat', 'bad', 'dad', 'cat', 'bad', 'dad']
```
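A minimal sketch contrasting scripts 5 and 6, assuming they differ only in where `output` is created (the names `collect_local` and `collect_global` are hypothetical). Appending to a module-level list does not require a `global` statement, because the function only mutates the list and never rebinds the name.

```python
def collect_local(my_list):
    # output is created fresh on every call
    output = []
    for item in my_list:
        output.append(item)
    return output

output = []  # module-level list shared by every call to collect_global

def collect_global(my_list):
    # appends to the module-level list; nothing resets it between calls
    for item in my_list:
        output.append(item)
    return output

animals = ['cat', 'bad', 'dad']
print(collect_local(animals))   # ['cat', 'bad', 'dad']
print(collect_local(animals))   # ['cat', 'bad', 'dad']
print(collect_global(animals))  # ['cat', 'bad', 'dad']
print(collect_global(animals))  # ['cat', 'bad', 'dad', 'cat', 'bad', 'dad']
```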

### Make a function

(1) We want a function that takes a list of numbers and returns a new list with 10 added to each number.

```python
# using a list comprehension
def list_add_10(list_num):
    return [num + 10 for num in list_num]

list_add_10([1, 2, 3])
```

```python
[11, 12, 13]
```

```python
# alternatively, with an explicit for-loop
def list_add_10(list_num):
    output = []
    for num in list_num:
        output.append(num + 10)
    return output

list_add_10([1, 2, 3])
```

```python
[11, 12, 13]
```

(2) We want a function that takes a list of strings and returns a list with the length of each word.

```python
# using a list comprehension
def list_length_words(list_words):
    return [len(word) for word in list_words]

list_length_words(['great', 'job', 'so', 'far'])
```

```python
[5, 3, 2, 3]
```

```python
# alternatively, with an explicit for-loop
def list_length_words(list_words):
    output = []
    for word in list_words:
        output.append(len(word))
    return output

list_length_words(['great', 'job', 'so', 'far'])
```

```python
[5, 3, 2, 3]
```
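A third option, not used in the solutions above, is the built-in `map`, which applies a function to every element. A minimal sketch of both functions written that way:

```python
def list_add_10(list_num):
    # map applies the lambda to every element; list() materialises the lazy iterator
    return list(map(lambda num: num + 10, list_num))

def list_length_words(list_words):
    # len is already a function, so it can be passed to map directly
    return list(map(len, list_words))

print(list_add_10([1, 2, 3]))                            # [11, 12, 13]
print(list_length_words(['great', 'job', 'so', 'far']))  # [5, 3, 2, 3]
```

In practice the list comprehension is usually preferred when a `lambda` would be needed, while `map(len, ...)` reads well when an existing function can be passed as-is.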

## More Advanced Python Challenges

**Challenge 1**

```python
def read_file(file_name):
    with open(file_name) as file_object:
        # read the whole file, not just its first line
        return file_object.read()

# dictionary comprehension
def letter_counter(file_name, letters_to_count='aeiou'):
    '''Returns the number of times specified letters appear in a file'''
    # lower-case once so that 'A' and 'a' are counted together
    text = read_file(file_name).lower()
    return {letter: text.count(letter) for letter in letters_to_count.lower()}

letter_counter('lorem_ipsum.txt')
```

```python
def read_file(file_name):
    with open(file_name) as file_object:
        return file_object.read()

# alternatively, with an explicit for-loop
def letter_counter(file_name, letters_to_count='aeiou'):
    '''Returns the number of times specified letters appear in a file'''
    text = read_file(file_name).lower()
    letters_to_count = letters_to_count.lower()

    letter_frequency = {}
    for letter in text:
        # only tally the requested letters; text is already lower-case
        if letter in letters_to_count:
            letter_frequency[letter] = letter_frequency.get(letter, 0) + 1
    return letter_frequency

letter_counter('lorem_ipsum.txt')
```
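The solutions above count with `str.count` and a manual loop. The standard library's `collections.Counter` tallies every character in a single pass; the sketch below is an alternative, not the original solution. Because `lorem_ipsum.txt` is not included here, it runs on a small temporary file containing a made-up line of lorem-ipsum text.

```python
import os
import tempfile
from collections import Counter

def letter_counter(file_name, letters_to_count='aeiou'):
    '''Count how often each of letters_to_count appears in a file (case-insensitive).'''
    with open(file_name) as file_object:
        counts = Counter(file_object.read().lower())
    # a Counter returns 0 for missing keys, so absent letters still get an entry
    return {letter: counts[letter] for letter in letters_to_count.lower()}

# demo on a small temporary file instead of lorem_ipsum.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Lorem ipsum dolor sit amet,\nconsectetur adipiscing elit.\n')
try:
    result = letter_counter(tmp.name)
finally:
    os.remove(tmp.name)

print(result)  # {'a': 2, 'e': 5, 'i': 6, 'o': 4, 'u': 2}
```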

**Challenge 2**

```python
def remove_item(list_items, item_to_remove):
    '''Remove first occurrence of item from list'''
    if item_to_remove not in list_items:
        return 'The item is not in the list'
    index = list_items.index(item_to_remove)  # position of the first occurrence
    return list_items[:index] + list_items[index + 1:]

print(remove_item([1, 3, 7, 8, 0, 7], 3))
print(remove_item([1, 3, 7, 8, 0, 7], 2))
```

```python
[1, 7, 8, 0, 7]
The item is not in the list
```
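An alternative sketch relies on the built-in `list.remove`, which deletes only the first matching element and raises `ValueError` when the item is absent. Working on a copy keeps the caller's list unchanged, matching the slicing solution above.

```python
def remove_item(list_items, item_to_remove):
    '''Return a copy of list_items without the first occurrence of item_to_remove.'''
    result = list(list_items)  # copy, so the caller's list is left untouched
    try:
        result.remove(item_to_remove)  # list.remove deletes only the first match
    except ValueError:  # raised when the item is not in the list
        return 'The item is not in the list'
    return result

print(remove_item([1, 3, 7, 8, 0, 7], 3))  # [1, 7, 8, 0, 7]
print(remove_item([1, 3, 7, 8, 0, 7], 2))  # The item is not in the list
```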


**Challenge 3**

```python
def cipher(text, cipher_alphabet, option='encipher'):
    '''
    Encipher or decipher text with a substitution alphabet.
    It is assumed that option is always 'encipher' or 'decipher':
    any value other than 'encipher' is treated as 'decipher'.
    '''
    # for deciphering, look letters up in the inverted alphabet
    if option == 'encipher':
        alphabet = cipher_alphabet
    else:
        alphabet = {value: key for key, value in cipher_alphabet.items()}
    try:
        return ''.join(alphabet[letter.lower()] for letter in text)
    except KeyError:  # a character that is not in the alphabet
        return "Please enter valid input (only a-z, A-Z and spaces allowed)"

# create cipher alphabet; note the additional space
cipher_alphabet = dict(zip('abcdefghijklmnopqrstuvwxyz ', 'phqgiumeaylnofdxjkrcvstzwb '))

print(cipher('defend the east wall of the castle', cipher_alphabet))
print(cipher('giuifg cei iprc tpnn du cei qprcni', cipher_alphabet, 'decipher'))
```

```python
giuifg cei iprc tpnn du cei qprcni
defend the east wall of the castle
```
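For a one-to-one substitution like this, Python's built-in `str.maketrans` and `str.translate` do the lookup for you. A minimal sketch, using the same cipher alphabet; unlike `cipher()` above, it does not lower-case the input, and characters missing from the table (such as spaces or digits) pass through unchanged instead of producing an error message.

```python
plain = 'abcdefghijklmnopqrstuvwxyz'
cipher_letters = 'phqgiumeaylnofdxjkrcvstzwb'

# translation tables in both directions; unmapped characters are left as they are
encipher_table = str.maketrans(plain, cipher_letters)
decipher_table = str.maketrans(cipher_letters, plain)

secret = 'defend the east wall of the castle'.translate(encipher_table)
print(secret)                            # giuifg cei iprc tpnn du cei qprcni
print(secret.translate(decipher_table))  # defend the east wall of the castle
```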


**Challenge 4**

```python
def count_isograms(list_of_words):
    '''Count the number of strings without repeating characters in a list.
       A word is an isogram if every letter occurs exactly once.
    '''
    num_isograms = 0

    for word in list_of_words:
        # convert to upper() to account for case insensitivity
        if all(word.upper().count(letter.upper()) == 1 for letter in word):
            num_isograms += 1

    return num_isograms


count_isograms(['conduct', 'letter', 'contract', 'hours', 'interview', 'Conduct'])
```

```python
1
```
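A shorter alternative compares the number of distinct characters (a `set` drops duplicates) with the word length. Like the solution above, this sketch counts every character, so a hyphen or space that appears twice would also disqualify a word.

```python
def count_isograms(list_of_words):
    '''Count words in which no character occurs more than once (case-insensitive).'''
    # True counts as 1 and False as 0, so sum() gives the number of isograms
    return sum(len(set(word.lower())) == len(word) for word in list_of_words)

print(count_isograms(['conduct', 'letter', 'contract', 'hours', 'interview', 'Conduct']))  # 1
```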

**Challenge 5**

```python
VOWELS = 'aeiou'

def matching_pairs(data_list):
    '''Return index pairs whose numbers sum to a multiple of 3 and whose
    letters are both vowels or both consonants.'''
    pairs = []
    for index1, (letter1, number1) in enumerate(data_list):
        # start at index1 + 1: no item is paired with itself,
        # and (A, B) is not repeated as (B, A)
        for index2 in range(index1 + 1, len(data_list)):
            letter2, number2 = data_list[index2]
            # only vowel-vowel or consonant-consonant combinations
            same_kind = (letter1 in VOWELS) == (letter2 in VOWELS)
            # the sum of the numbers is a multiple of 3
            if (number1 + number2) % 3 == 0 and same_kind:
                pairs.append((index1, index2))
    return pairs

matching_pairs([('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f', 6)])
```

```python
[(0, 4), (1, 2), (3, 5)]
```
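The "every pair once, never an item with itself" requirement is exactly what `itertools.combinations` provides. A sketch of the same check as a list comprehension; `combinations` yields pairs in index order, so the result comes out in the same order as above.

```python
from itertools import combinations

VOWELS = 'aeiou'

def matching_pairs(data_list):
    '''Index pairs (i, j), i < j, whose numbers sum to a multiple of 3 and
    whose letters are both vowels or both consonants.'''
    return [
        (i, j)
        for (i, (letter1, num1)), (j, (letter2, num2)) in combinations(enumerate(data_list), 2)
        if (num1 + num2) % 3 == 0 and (letter1 in VOWELS) == (letter2 in VOWELS)
    ]

print(matching_pairs([('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f', 6)]))
# [(0, 4), (1, 2), (3, 5)]
```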

:fireworks:

```python
data_list = [('a', 4), ('b', 5), ('c', 1), ('d', 3), ('e', 2), ('f', 6)]
data_list[:3] + data_list[4:]
```

```python
[('a', 4), ('b', 5), ('c', 1), ('e', 2), ('f', 6)]
```


```sql
SELECT * FROM
```

```
c(1,2,3)
```

```python
for word in words:
    print(word*2)
```

```python
a = 12,  # the trailing comma makes this a one-element tuple
type(a)
```

```python
tuple
```

```python
a = {'a': 'roy', 'b': "Rick"}
[[key for key in a.keys()], 'b']
```

```python
[['a', 'b'], 'b']
```

```python
z = lambda z: z * 2
z(3)
```

```python
6
```

```python
a = lambda x: abs(5 - x)
a(2)  # 3
a(3)  # 2
a(4)  # 1
a(5)  # 0
a(6)  # 1; only the last expression of a notebook cell is displayed
```

```python
1
```

```python
x = 0
# key must be a callable (e.g. a function), not a plain value
sorted([1, 2, 3], key=x)
```

```
TypeError: 'int' object is not callable
```
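For comparison, a minimal sketch of `sorted` with valid `key` arguments: each element is passed to the callable, and the returned value decides the order.

```python
numbers = [3, -1, 2]

# key must map each element to the value used for sorting
print(sorted(numbers, key=abs))           # [-1, 2, 3]
print(sorted(numbers, key=lambda n: -n))  # [3, 2, -1]
```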

```python
# in Python 3, reduce lives in functools and must be imported
from functools import reduce

reduce(lambda a, b: '{}, {}'.format(a, b), [1, 2, 3, 4, 5, 6, 7, 8, 9])
```

```python
'1, 2, 3, 4, 5, 6, 7, 8, 9'
```
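For building a separator-delimited string, `str.join` is the more common idiom. A short sketch showing that both approaches give the same string; `join` needs string elements, hence `map(str, ...)`.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# reduce folds the list pairwise: '1, 2' -> '1, 2, 3' -> ...
joined_reduce = reduce(lambda a, b: '{}, {}'.format(a, b), numbers)
# str.join builds the same string in one pass
joined_join = ', '.join(map(str, numbers))

print(joined_reduce == joined_join)  # True
```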

```python
def transform(n):
    # returns a function that adds n to its argument (a closure over n)
    return lambda x: x + n

f = transform(3)
f(4)
```

```python
7
```

```python
list(filter(lambda x: x % 2 == 0, [1, 2, 3]))
```

```python
[2]
```

```python
from datetime import date

d0 = date(2008, 8, 18)
d1 = date(2008, 9, 26)
delta = d0 - d1
print(delta.days)
```

```python
-39
```

```python
date_start = '01-02-2013'
date_stop = '07-28-2015'
```

```python
date_start[3:5]
```

```python
'02'
```

```python
from datetime import date

def convert_to_date_dash(date_string):
    # expects 'MM-DD-YYYY'
    return date(int(date_string[-4:]), int(date_string[0:2]), int(date_string[3:5]))

def calculate_date_difference(date_start, date_stop):
    return (convert_to_date_dash(date_stop) - convert_to_date_dash(date_start)).days

calculate_date_difference('01-02-2013', '07-28-2015')
```

```python
937
```
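Instead of slicing the string by position, `datetime.strptime` can parse it against a format string, and it raises `ValueError` for malformed or impossible dates. A minimal sketch, assuming the same `MM-DD-YYYY` layout:

```python
from datetime import datetime

def calculate_date_difference(date_start, date_stop, fmt='%m-%d-%Y'):
    '''Days from date_start to date_stop; both strings must match fmt.'''
    start = datetime.strptime(date_start, fmt).date()
    stop = datetime.strptime(date_stop, fmt).date()
    return (stop - start).days

print(calculate_date_difference('01-02-2013', '07-28-2015'))  # 937
```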


```python
date_start = '15-Jan-1994'
date_stop = '14-Jul-2015'
month = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'Jul': 7}
```

```python
int(date_start[-4:])
month[date_start[3:6]]
```

```python
1
```

```python
date_start[3:6]
```

```python
'Jan'
```
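For the `DD-Mon-YYYY` strings, the `%b` directive of `datetime.strptime` parses abbreviated month names, which removes the need for a hand-written `month` dictionary. This sketch (the name `days_between` is hypothetical) assumes an English/C locale, since `%b` is locale-dependent.

```python
from datetime import datetime

def days_between(date_start, date_stop, fmt='%d-%b-%Y'):
    '''Days from date_start to date_stop, e.g. '15-Jan-1994' to '14-Jul-2015'.'''
    # %b matches abbreviated month names such as 'Jan' and 'Jul' (locale-dependent)
    start = datetime.strptime(date_start, fmt)
    stop = datetime.strptime(date_stop, fmt)
    return (stop - start).days

print(days_between('15-Jan-1994', '14-Jul-2015'))  # 7850
```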

```python
with open('dsp/python/football.csv') as file_object:
    football_object = file_object.readlines()

football_data = [line.strip() for line in football_object]
```

```python
columns = football_data[0].split(',')
columns
```

```python
['Team',
 'Games',
 'Wins',
 'Losses',
 'Draws',
 'Goals',
 'Goals Allowed',
 'Points']
```


```python
[abs( - columns.index('Goals Allowed')) for team in football_data]
```

```python
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

```python
football_data
```

```python
['Team,Games,Wins,Losses,Draws,Goals,Goals Allowed,Points',
 'Arsenal,38,26,9,3,79,36,87',
 'Liverpool,38,24,8,6,67,30,80',
 'Manchester United,38,24,5,9,87,45,77',
 'Newcastle,38,21,8,9,74,52,71',
 'Leeds,38,18,12,8,53,37,66',
 'Chelsea,38,17,13,8,66,38,64',
 'West_Ham,38,15,8,15,48,57,53',
 'Aston_Villa,38,12,14,12,46,47,50',
 'Tottenham,38,14,8,16,49,53,50',
 'Blackburn,38,12,10,16,55,51,46',
 'Southampton,38,12,9,17,46,54,45',
 'Middlesbrough,38,12,9,17,35,47,45',
 'Fulham,38,10,14,14,36,44,44',
 'Charlton,38,10,14,14,38,49,44',
 'Everton,38,11,10,17,45,57,43',
 'Bolton,38,9,13,16,44,62,40',
 'Sunderland,38,10,10,18,29,51,40',
 'Ipswich,38,9,9,20,41,64,36',
 'Derby,38,8,6,24,33,63,30',
 'Leicester,38,5,13,20,30,64,28']
```

```python
rows = [team.split(',') for team in football_data[1:]]  # skip the header row
diff_for_against_goals = {
    row[0]: abs(int(row[columns.index('Goals')]) - int(row[columns.index('Goals Allowed')]))
    for row in rows
}
diff_for_against_goals
```

```python
{'Arsenal': 43,
 'Aston_Villa': 1,
 'Blackburn': 4,
 'Bolton': 18,
 'Charlton': 11,
 'Chelsea': 28,
 'Derby': 30,
 'Everton': 12,
 'Fulham': 8,
 'Ipswich': 23,
 'Leeds': 16,
 'Leicester': 34,
 'Liverpool': 37,
 'Manchester United': 42,
 'Middlesbrough': 12,
 'Newcastle': 22,
 'Southampton': 8,
 'Sunderland': 22,
 'Tottenham': 4,
 'West_Ham': 9}
```

```python
min(diff_for_against_goals.values())
```

```python
1
```

```python
min(diff_for_against_goals, key=diff_for_against_goals.get)
```

```python
'Aston_Villa'
```

```python
diff_for_against_goals.items()
```

```python
dict_items([('Arsenal', 43), ('Liverpool', 37), ('Manchester United', 42), ('Newcastle', 22), ('Leeds', 16), ('Chelsea', 28), ('West_Ham', 9), ('Aston_Villa', 1), ('Tottenham', 4), ('Blackburn', 4), ('Southampton', 8), ('Middlesbrough', 12), ('Fulham', 8), ('Charlton', 11), ('Everton', 12), ('Bolton', 18), ('Sunderland', 22), ('Ipswich', 23), ('Derby', 30), ('Leicester', 34)])
```

```python
min(diff_for_against_goals.items(), key=lambda x: x[1])[0]
```

```python
'Aston_Villa'
```


```python
min(diff_for_against_goals, key=lambda x: diff_for_against_goals[x])
```

```python
'Aston_Villa'
```
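Parsing the rows by hand means tracking column positions with `columns.index(...)`. As an alternative sketch, the standard library's `csv.DictReader` maps each row to the header names. To stay self-contained, it reads three rows copied from the data above through `io.StringIO`; with the real file you would pass `open('dsp/python/football.csv', newline='')` instead.

```python
import csv
import io

# a few rows in the same layout as football.csv (values copied from the data above)
sample = io.StringIO(
    'Team,Games,Wins,Losses,Draws,Goals,Goals Allowed,Points\n'
    'Arsenal,38,26,9,3,79,36,87\n'
    'Aston_Villa,38,12,14,12,46,47,50\n'
    'Tottenham,38,14,8,16,49,53,50\n'
)

# DictReader uses the header row as keys, so columns are looked up by name
goal_diff = {
    row['Team']: abs(int(row['Goals']) - int(row['Goals Allowed']))
    for row in csv.DictReader(sample)
}
print(min(goal_diff, key=goal_diff.get))  # Aston_Villa
```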

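```python
import pandas as pd
df = pd.read_csv('dsp/python/football.csv')
```

```python
df.head()
```

|   | Team | Games | Wins | Losses | Draws | Goals | Goals Allowed | Points |
|---|------|-------|------|--------|-------|-------|---------------|--------|
| 0 | Arsenal | 38 | 26 | 9 | 3 | 79 | 36 | 87 |
| 1 | Liverpool | 38 | 24 | 8 | 6 | 67 | 30 | 80 |
| 2 | Manchester United | 38 | 24 | 5 | 9 | 87 | 45 | 77 |
| 3 | Newcastle | 38 | 21 | 8 | 9 | 74 | 52 | 71 |
| 4 | Leeds | 38 | 18 | 12 | 8 | 53 | 37 | 66 |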


```python
# row label of the smallest absolute goal difference, then that row's 'Team'
df.loc[(abs(df['Goals'] - df['Goals Allowed'])).idxmin(), 'Team']
```

```python
'Aston_Villa'
```
