## Tables and Dictionaries

## `unzip` is the inverse of `zip`

### `unzip` a table = transpose

In [64]:
# zip is lazy
# We use list to force completion here
unzip = lambda seq: list(zip(*seq))

hours = [['name', 'hours', 'rate'],
        ['Ann', '42', '37.5'],
        ['Bob', '55', '7.5'],
        ['Alice', '12', '225']]
transpose = unzip(hours)
transpose

[('name', 'Ann', 'Bob', 'Alice'),
 ('hours', '42', '55', '12'),
 ('rate', '37.5', '7.5', '225')]
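The heading's claim that `unzip` undoes `zip` can be checked directly. The sketch below uses only built-ins; the names `names`, `hrs`, `pairs`, `unzipped`, and `round_trip` are illustrative and not part of the original notebook.

```python
# Same definition as in the cell above
unzip = lambda seq: list(zip(*seq))

names = ['Ann', 'Bob', 'Alice']
hrs = ['42', '55', '12']

pairs = list(zip(names, hrs))   # rows: one (name, hours) tuple per person
unzipped = unzip(pairs)         # back to columns (as tuples, not lists)
round_trip = unzip(unzipped)    # transposing twice restores the rows

print(unzipped)
print(round_trip == pairs)
```

Note that `unzip` returns tuples, so the round trip matches the original only up to the list/tuple distinction.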

## Making a column dictionary

#### Task: Turn the transpose into a dictionary

* header string is key
* rest of the list is val

In [65]:
from toolz import get
columns = {get(0, col):col[1:] for col in transpose}
columns

{'hours': ('42', '55', '12'),
 'name': ('Ann', 'Bob', 'Alice'),
 'rate': ('37.5', '7.5', '225')}

## Use `first` and `drop` to clean up the code

* `get(0, col):col[1:]` is hard to read
* Use `first` and `drop` instead

In [17]:
from toolz import first, drop
help(first)

Help on function first in module toolz.itertoolz:

first(seq)
    The first element in a sequence
    
    >>> first('ABC')
    'A'



In [24]:
help(drop)
# Note: drop is lazy
rest = lambda seq: list(drop(1, seq))

Help on function drop in module toolz.itertoolz:

drop(n, seq)
    The sequence following the first n elements
    
    >>> list(drop(2, [10, 20, 30, 40, 50]))
    [30, 40, 50]
    
    See Also:
        take
        tail



### Use `first` and `rest` in place of `get(0,col):col[1:]` 

In [49]:
columns = {get(0, col):col[1:] for col in transpose}
columns

{'hours': ('42', '55', '12'),
 'name': ('Ann', 'Bob', 'Alice'),
 'rate': ('37.5', '7.5', '225')}

In [50]:
columns = {first(col):rest(col) for col in transpose}
columns

{'hours': ['42', '55', '12'],
 'name': ['Ann', 'Bob', 'Alice'],
 'rate': ['37.5', '7.5', '225']}
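The values changed from tuples to lists between the last two outputs because `col[1:]` slices a tuple, while `rest` wraps the lazy `drop` in `list`. A minimal sketch of the difference, using `itertools.islice` from the standard library as a stand-in for `drop` (an assumption made so the block runs without `toolz`):

```python
from itertools import islice

col = ('hours', '42', '55', '12')

sliced = col[1:]                   # slicing a tuple gives a tuple
lazy_rest = islice(col, 1, None)   # stand-in for drop(1, col): a lazy iterator
forced = list(lazy_rest)           # list() consumes the iterator into a list

print(sliced)
print(forced)
print(list(lazy_rest))             # the iterator is now exhausted
```

Because a lazy iterator can be consumed only once, wrapping it in `list` inside `rest` keeps the column reusable.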

## Column transformations

* Now we can transform a column
    * Apply conversion functions
    * Perform column operations like Excel

### <font color='red'> Task: Change hours to ints</font>

#### Use `get` to get the hours column

In [38]:
get('hours', columns)

['42', '55', '12']

#### Write a list comprehension to convert the hours to ints

In [39]:
[int(val) for val in get('hours', columns)]

[42, 55, 12]

#### Turn your expression into a function

In [40]:
col_to_int_col = lambda col_key, col_dict: [int(val) for val in get(col_key, col_dict)]
col_to_int_col('hours', columns)

[42, 55, 12]

#### Use `assoc` and your conversion function to make a new updated dictionary

In [41]:
from toolz import assoc
assoc(columns, 'hours', col_to_int_col('hours', columns))

{'hours': [42, 55, 12],
 'name': ['Ann', 'Bob', 'Alice'],
 'rate': ['37.5', '7.5', '225']}

#### Turn your expression into a function

In [42]:
convert_col_to_int = lambda col_key, col_dict: assoc(col_dict, col_key, col_to_int_col(col_key, col_dict))
convert_col_to_int('hours', columns)

{'hours': [42, 55, 12],
 'name': ['Ann', 'Bob', 'Alice'],
 'rate': ['37.5', '7.5', '225']}

### Task: Change rates to floats

#### Use `get` to get the rates column

In [44]:
get('rate', columns)

['37.5', '7.5', '225']

#### Write a list comprehension to convert the rates to floats

In [45]:
[float(val) for val in get('rate', columns)]

[37.5, 7.5, 225.0]

#### Turn your expression into a function

In [51]:
col_to_float_col = lambda col_key, col_dict: [float(val) for val in get(col_key, col_dict)]
col_to_float_col('rate', columns)

[37.5, 7.5, 225.0]

#### Use `assoc` and your conversion function to make a new updated dictionary

In [52]:
from toolz import assoc
assoc(columns, 'rate', col_to_float_col('rate', columns))

{'hours': ['42', '55', '12'],
 'name': ['Ann', 'Bob', 'Alice'],
 'rate': [37.5, 7.5, 225.0]}

#### Turn your expression into a function

In [55]:
convert_col_to_float = lambda col_key, col_dict: assoc(col_dict, col_key, col_to_float_col(col_key, col_dict))
convert_col_to_float('rate', columns)

{'hours': ['42', '55', '12'],
 'name': ['Ann', 'Bob', 'Alice'],
 'rate': [37.5, 7.5, 225.0]}
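In the output above, `hours` is still a list of strings even though it was converted earlier. That is because `assoc` returns a new dictionary and leaves `columns` untouched. The sketch below shows that behavior with a hypothetical plain-Python stand-in, `assoc_sketch`, which is not the `toolz` implementation.

```python
def assoc_sketch(d, key, value):
    """Hypothetical stand-in for toolz.assoc: copy d, then set key on the copy."""
    new = dict(d)
    new[key] = value
    return new

columns = {'name': ['Ann', 'Bob', 'Alice'],
           'hours': ['42', '55', '12'],
           'rate': ['37.5', '7.5', '225']}

updated = assoc_sketch(columns, 'rate', [float(v) for v in columns['rate']])

print(updated['rate'])   # converted in the new dictionary
print(columns['rate'])   # original is unchanged
```

To keep a conversion, bind the result to a name, for example `columns = convert_col_to_float('rate', columns)`.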

## TIME TO UP OUR GAME!!!

Write a few sentences comparing and contrasting `col_to_int_col` and `col_to_float_col`.

Write a few sentences comparing `convert_col_to_float` and `convert_col_to_int`.

Explain how the DRY (Don't Repeat Yourself) principle applies to these two pairs of functions.

### <font color='red'>Task: Write a more general column conversion function</font>

In [56]:
col_to_float_col = lambda col_key, col_dict: [float(val) for val in get(col_key, col_dict)]
col_to_int_col = lambda col_key, col_dict: [int(val) for val in get(col_key, col_dict)]

In [57]:
col_to_type = lambda convert, col_key, col_dict: [convert(val) for val in get(col_key, col_dict)]

### <font color='red'>Task: Write a more general convert and update function</font>

In [58]:
convert_col_to_int = lambda col_key, col_dict: assoc(col_dict, col_key, col_to_int_col(col_key, col_dict))
convert_col_to_float = lambda col_key, col_dict: assoc(col_dict, col_key, col_to_float_col(col_key, col_dict))

In [61]:
convert_col = lambda convert, col_key, col_dict: assoc(col_dict, col_key, col_to_type(convert, col_key, col_dict))

In [62]:
convert_col(int, 'hours', columns)

{'hours': [42, 55, 12],
 'name': ['Ann', 'Bob', 'Alice'],
 'rate': ['37.5', '7.5', '225']}

In [63]:
convert_col(float, 'rate', columns)

{'hours': ['42', '55', '12'],
 'name': ['Ann', 'Bob', 'Alice'],
 'rate': [37.5, 7.5, 225.0]}
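Each call to `convert_col` converts one column, as the two outputs above show. To convert both, one option is to pass the result of one call into the next. The sketch below rewrites `convert_col` as a plain-Python function (an assumption, so the block runs without `toolz`); it is meant to behave like the lambda above.

```python
def convert_col(convert, col_key, col_dict):
    # Plain-Python equivalent of the lambda above: build a new dict
    # in which only col_key is replaced by its converted values.
    return {**col_dict, col_key: [convert(val) for val in col_dict[col_key]]}

columns = {'name': ['Ann', 'Bob', 'Alice'],
           'hours': ['42', '55', '12'],
           'rate': ['37.5', '7.5', '225']}

# Inner call converts hours; outer call converts rate on that result
both = convert_col(float, 'rate', convert_col(int, 'hours', columns))
print(both)
```

Nesting works for two columns but gets unwieldy for more, which motivates the conversion dictionary introduced next.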

# TIME TO BLOW YOUR MIND!!!!111!!one!!

**Dewey:** *Alright, let's pray. God of ~~Rock~~ Code, thank you for this chance to kick ~~ass~~butt. We are your humble servants, please give us the power to blow people's minds with our high voltage ~~rock~~ code. In your name, we pray. Amen.*

### <font color='red'>Task: Write a function to perform multiple conversions</font>

* arguments
    * conversion dictionary
        * keys: col_keys to be converted
        * values: conversion functions
    * column_dictionary
* returns an updated column dictionary

#### Write a column conversion function

* Arguments
    * key
    * conversion dictionary
    * column
* Returns
    * converted column, if key is in the conversion dictionary
    * original column, if not

Hint: Use the `identity` function from `toolz` as the default for `get`

#### Verify that `identity` is an identity function

* Always returns the unchanged argument

In [None]:
from toolz import identity
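One way to carry out this check is to apply `identity` to a handful of values and confirm that each comes back as the very same object. The sample values and the names `samples` and `results` are illustrative; the fallback definition is an assumption that only matters where `toolz` is not installed.

```python
try:
    from toolz import identity
except ImportError:
    # Fallback (assumption) so the sketch still runs without toolz
    def identity(x):
        return x

samples = [42, 'Ann', 37.5, None, ('a', 'b'), ['42', '55']]

# `is` checks that the same object is returned, not merely an equal one
results = [identity(s) is s for s in samples]
print(all(results))
```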

#### Write an expression for getting the right function for a key

* Use `identity` as the default

In [80]:
convert_dict = {'hours':int, 'rate':float}

get('hours', convert_dict, identity)

int
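The default only matters when the key is missing, as it is for `'name'`. The sketch below illustrates both cases using `dict.get` and a local `identity` as stand-ins for the `toolz` functions (assumptions made so the block runs without `toolz`); for dictionary lookups with a default, `dict.get` behaves the same way.

```python
identity = lambda x: x   # stand-in for toolz.identity
convert_dict = {'hours': int, 'rate': float}

found = convert_dict.get('hours', identity)    # key present: the stored function
missing = convert_dict.get('name', identity)   # key absent: falls back to identity

print(found('42'))      # converted to an int
print(missing('Ann'))   # returned unchanged
```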

#### Write an expression that gets and applies the conversion function to the column

In [84]:
key = 'hours'
col = get(key, columns)
func = get(key, convert_dict, identity)
return_value = [func(item) for item in col]
return_value

[42, 55, 12]

#### Compose func and return_value expressions into one expression

In [86]:
[get(key, convert_dict, identity)(item) for item in col]

[42, 55, 12]

#### Convert the last expression into a function

In [90]:
from toolz import identity
maybe_convert = lambda key, convert_dict, column: [get(key, convert_dict, identity)(val) for val in column]

In [91]:
assert maybe_convert('hours', convert_dict, get('hours', columns)) == [42, 55, 12]
assert maybe_convert('name', convert_dict, get('name', columns)) == ['Ann', 'Bob', 'Alice']
assert maybe_convert('rate', convert_dict, get('rate', columns)) == [37.5, 7.5, 225.0] 

#### `get(key, convert_dict, identity)(item)` is hard to read. Refactor

In [104]:
get_and_apply_conversion = lambda key, item, convert_dict: get(key, convert_dict, identity)(item)
maybe_convert = lambda key, convert_dict, column: [get_and_apply_conversion(key, val, convert_dict) for val in column]

In [105]:
assert maybe_convert('hours', convert_dict, get('hours', columns)) == [42, 55, 12]
assert maybe_convert('name', convert_dict, get('name', columns)) == ['Ann', 'Bob', 'Alice']
assert maybe_convert('rate', convert_dict, get('rate', columns)) == [37.5, 7.5, 225.0] 

#### Write a dictionary comprehension to convert all columns in the conversion dictionary

* Use the function from above.

In [108]:
list(columns.items())

[('name', ['Ann', 'Bob', 'Alice']),
 ('hours', ['42', '55', '12']),
 ('rate', ['37.5', '7.5', '225'])]

In [106]:
convert_columns = lambda convert_dict, col_dict: {key:maybe_convert(key, convert_dict, col) for key, col in col_dict.items()}

In [109]:
assert convert_columns(convert_dict, columns) == {'hours': [42, 55, 12], 'name': ['Ann', 'Bob', 'Alice'], 'rate': [37.5, 7.5, 225.0]}

## <font color='red'>Task: Write a function that turns a column dictionary back into a table.</font>
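
One possible approach, offered as a sketch rather than the intended solution: the keys become the header row, and zipping the columns together yields the data rows. The function name `columns_to_table` is hypothetical, and the column order relies on dictionaries preserving insertion order (Python 3.7+).

```python
def columns_to_table(col_dict):
    """Hypothetical helper: rebuild a row-oriented table from a column dictionary."""
    header = list(col_dict)           # keys become the header row
    rows = zip(*col_dict.values())    # transpose the columns back into rows
    return [header] + [list(row) for row in rows]

columns = {'name': ['Ann', 'Bob', 'Alice'],
           'hours': ['42', '55', '12'],
           'rate': ['37.5', '7.5', '225']}

table = columns_to_table(columns)
for row in table:
    print(row)
```

With string-valued columns, this reproduces the `hours` table from the start of the section.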