### Exercises

#### Question 1

Alongside this notebook, four CSV files are provided (one is in fact a TSV file).

For each file, load it using the `csv` module and find the smallest and largest numbers in the data.

All of these files contain only numbers, apart from a possible header row.

##### Solution 1

In [1]:
file1 = 'file1.csv'
file2 = 'file2.csv'
file3 = 'file3.tsv'
file4 = 'file4.csv'

In [5]:
def open_file(file):
    with open(file) as f:
        for row in f.readlines():
            print(row.strip())

In [10]:
def open_file_raw(file):
    with open(file) as f:
        print(f.readlines())

In [6]:
open_file(file1)

col1,col2,col3
10,20,30
30,40,50
60,70,80


In [7]:
open_file(file2)

1, 3.14, 500
20, 1, -2
-1.1, -2.2, -3.3


In [23]:
open_file_raw(file2)

['1, 3.14, 500\n', '20, 1, -2\n', '-1.1, -2.2, -3.3']


In [15]:
open_file(file3)

col1	col2	col3
10	20	30
40	50	60
100	200	300


In [16]:
open_file_raw(file3)

['col1\tcol2\tcol3\n', '10\t20\t30\n', '40\t50\t60\n', '100\t200\t300']


In [9]:
open_file(file4)

-col1-|-col2-|-col3-
10|20|30
-3.14-|-25-|-100-
---3.14-|20|-30-


In [17]:
open_file_raw(file4)

['-col1-|-col2-|-col3-\n', '10|20|30\n', '-3.14-|-25-|-100-\n', '---3.14-|20|-30-']


In [18]:
import csv

In [20]:
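# single function to parse all files; extra keyword arguments are passed to csv.reader
def parse_file(file, has_header_row=True, **kwargs):
    # newline='' is what the csv module recommends for files handed to csv.reader
    with open(file, newline='') as f:
        reader = csv.reader(f, **kwargs)
        if has_header_row:
            next(reader)  # skip the header row
        flattened = [num for row in reader for num in row]
    return min_max(flattened)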

In [21]:
# separate min_max function to keep parse_file simple
def min_max(data):
    converted = []
    for num in data:
        # try int first and fall back to float, so whole numbers stay integers;
        # both int() and float() ignore surrounding whitespace (e.g. ' 500')
        try:
            converted.append(int(num))
        except ValueError:
            converted.append(float(num))
    return min(converted), max(converted)

In [22]:
parse_file(file1)

(10, 80)

In [25]:
parse_file(file2, has_header_row=False)

(-3.3, 500)
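
The spaces after each comma in `file2` survive parsing: `csv.reader` yields strings such as `' 3.14'`, and the conversion only works because `int()` and `float()` ignore surrounding whitespace. As a minimal sketch, assuming an in-memory copy of the file's contents (`sample` is a stand-in, not one of the exercise files), the reader's `skipinitialspace` option can strip that whitespace during parsing instead:

```python
import csv
import io

# In-memory stand-in for file2.csv (same contents as printed above)
sample = "1, 3.14, 500\n20, 1, -2\n-1.1, -2.2, -3.3"

# Default dialect: whitespace after the delimiter stays in the field
raw_rows = list(csv.reader(io.StringIO(sample)))

# skipinitialspace=True drops whitespace immediately following the delimiter
clean_rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(raw_rows[0])    # ['1', ' 3.14', ' 500']
print(clean_rows[0])  # ['1', '3.14', '500']
```

Because `parse_file` forwards keyword arguments to `csv.reader`, the same option could be passed as `parse_file(file2, has_header_row=False, skipinitialspace=True)`.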

In [26]:
parse_file(file3, delimiter='\t')

(10, 300)

In [27]:
parse_file(file4, delimiter='|', quotechar='-')

(-3.14, 100)
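
Why `quotechar='-'` works for `file4` may not be obvious. The sketch below, which assumes an in-memory copy of the file's contents (`sample` is a stand-in), shows the parsed rows: a field wrapped in `-` is treated as quoted, and with the reader's default `doublequote=True` a doubled `--` inside a quoted field stands for one literal `-`. That is how `---3.14-` becomes `-3.14`.

```python
import csv
import io

# In-memory stand-in for file4.csv (same contents as printed above)
sample = "-col1-|-col2-|-col3-\n10|20|30\n-3.14-|-25-|-100-\n---3.14-|20|-30-"

reader = csv.reader(io.StringIO(sample), delimiter='|', quotechar='-')
next(reader)  # skip the header row
rows = list(reader)

for row in rows:
    print(row)
# ['10', '20', '30']
# ['3.14', '25', '100']
# ['-3.14', '20', '30']
```

Note that `-25-` and `-100-` are read as the positive values `25` and `100`, which is why the maximum reported above is `100`.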

#### Question 2

Given a data structure consisting of a list of dictionaries, write a function that writes this data out to a file. The column headers (in the first row) are based on the dictionary keys, and the values are flattened out to one row per dictionary, each under its corresponding column header.

Note that not all dictionaries contain all the same keys, nor are the keys necessarily in the same order when present.

For "missing" values, your function should just write an empty string.

For example, given this `data`:

In [28]:
data = [
    {'a': '1_a', 'b': '1_b', 'c': '1_c'},
    {'c': '2_c', 'd': '2_d'},
    {'a': '3_a', 'c': '3_c', 'e': '3_e'}
]

the output file should contain:

```
a,b,c,d,e
1_a,1_b,1_c,,
,,2_c,2_d,
3_a,,3_c,,3_e
```

The order of the columns and rows is not important, as long as each value sits under its corresponding column header.

##### Solution 2

In [51]:
def transform_data(data):
    # collect every key that appears in any dictionary; sort for a stable column order
    header_row = sorted({key for row in data for key in row})
    # one list per dictionary, with '' for any key the dictionary lacks
    rows = [[row.get(key, '') for key in header_row] for row in data]
    return [header_row, *rows]

In [52]:
transform_data(data)

[['a', 'b', 'c', 'd', 'e'],
 ['1_a', '1_b', '1_c', '', ''],
 ['', '', '2_c', '2_d', ''],
 ['3_a', '', '3_c', '', '3_e']]

In [53]:
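def write_csv(file, data):
    # write the rows to disk; newline='' keeps the csv module's own line endings
    # from being translated again (which shows up as blank lines on Windows)
    with open(file, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(transform_data(data))
    # read the file back and print it to check the result
    with open(file) as f:
        for row in f:
            print(row.strip())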

In [56]:
assign_file = 'test.csv'

In [57]:
write_csv(assign_file, data)

a,b,c,d,e
1_a,1_b,1_c,,
,,2_c,2_d,
3_a,,3_c,,3_e
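
As an alternative sketch (not the solution above), `csv.DictWriter` can write the dictionaries directly: its `restval` argument is the value written for any key a row lacks. The function name `write_dicts` is a hypothetical choice, and an in-memory `io.StringIO` buffer stands in for a real file so the result can be inspected.

```python
import csv
import io

def write_dicts(f, data):
    """Write a list of dicts to the open text file `f` as CSV."""
    # Union of all keys, sorted for a deterministic column order
    fieldnames = sorted({key for row in data for key in row})
    # restval is written for any key missing from a row
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(data)

data = [
    {'a': '1_a', 'b': '1_b', 'c': '1_c'},
    {'c': '2_c', 'd': '2_d'},
    {'a': '3_a', 'c': '3_c', 'e': '3_e'}
]

buffer = io.StringIO()  # stands in for open('test.csv', 'w', newline='')
write_dicts(buffer, data)
lines = buffer.getvalue().splitlines()
print(lines)
# ['a,b,c,d,e', '1_a,1_b,1_c,,', ',,2_c,2_d,', '3_a,,3_c,,3_e']
```

This avoids building the intermediate list of lists, at the cost of hiding the missing-value handling inside the writer.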
