# Multi-dimensional data
A note on the format of these exercises: 
* This is an exercise in *functional programming.* 
* Thus, I will ask you to *write functions* to accomplish specific things. 
* These functions should work on any input I give them, within bounds. 
* They will be tested on arbitrary test cases.

First, watch the video and log in to the grader:

In [None]:
# Don't change this cell; just run it. 
from IPython.display import IFrame
IFrame('https://1813261-1.kaf.kaltura.com/media/t/1_a4xfr7hl/133896931', width=800, height=560)

from client.api.notebook import Notebook
ok = Notebook('03-02-multi-dimensional-data.ok')
ok.auth(inline=True)

1. **Write a function `clean_rows`** that takes a two-dimensional `array` and deletes the *rows* that contain -1s. Hint: compare the whole array against -1 at once, then use `all(axis=1)` to collapse each row to a single True/False value. (-1 is a conventional code for *missing data* in public data corpora.) 
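The hint can be unpacked one step at a time. The following is a minimal sketch on a made-up array; the names `sample`, `present`, and `keep` are illustrative and not part of the exercise.

```python
import numpy as np

# Hypothetical sample: rows 0 and 2 contain the missing-data code -1.
sample = np.array([[5, -1],
                   [2, 3],
                   [-1, -1],
                   [8, 0]])

# An element-wise comparison acts on the whole array at once.
present = sample != -1

# all(axis=1) collapses each row to one boolean:
# True only if every element in that row is present.
keep = present.all(axis=1)

# Boolean indexing keeps just the rows flagged True.
cleaned = sample[keep]

print(keep)
print(cleaned)
```

Because no row is deleted while the array is being scanned, the result does not depend on where the -1s happen to sit.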

In [None]:
def clean_rows(data): 
    # Flag every element that is not the missing-data code -1 ...
    present = data != -1
    # ... then keep only the rows in which every element is present.
    return data[present.all(axis=1)]

In [156]:
# Test your code on this example.
import numpy as np
data = np.array([[1,2,3],[4,-1,6],[7,8,9]])
print("Before:")
print(data)
print("After:")
print(clean_rows(data))

Before:
[[ 1  2  3]
 [ 4 -1  6]
 [ 7  8  9]]
After:
[[1 2 3]
 [7 8 9]]


In [157]:
_ = ok.grade('q01')  # check that your solution works. 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



2. **Write a function `clean_columns`** that removes all columns containing -1 in any row. Hint: this is the transpose of the first problem. 
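To see what "the transpose of the first problem" means, here is a small sketch on a made-up array (all names are illustrative). It filters columns directly with `all(axis=0)`, and then again by transposing, filtering rows, and transposing back.

```python
import numpy as np

# Hypothetical sample: only the middle column contains -1.
sample = np.array([[5, -1, 2],
                   [2, 3, 4],
                   [8, 0, 6]])

# Direct route: all(axis=0) yields one boolean per column.
keep_cols = (sample != -1).all(axis=0)
by_columns = sample[:, keep_cols]

# Transpose route: columns become rows, so the row filter applies,
# and a second transpose restores the original orientation.
flipped = sample.T
by_transpose = flipped[(flipped != -1).all(axis=1)].T

print(by_columns)
print(np.array_equal(by_columns, by_transpose))
```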

In [158]:
def clean_columns(data): 
    # Flag every element that is not the missing-data code -1 ...
    present = data != -1
    # ... then keep only the columns in which every element is present.
    return data[:, present.all(axis=0)]

In [159]:
# Test your code on this example
data = np.array([[1,2,3],[4,-1,6],[7,8,9]])
print("Before:")
print(data)
print("After:")
print(clean_columns(data))

Before:
[[ 1  2  3]
 [ 4 -1  6]
 [ 7  8  9]]
After:
[[1 3]
 [4 6]
 [7 9]]


In [160]:
_ = ok.grade('q02')  # check that your solution works. 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



3. **Write a function `masked`** that masks missing data using a masked array.  See this documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ma.masked_where.html
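As a quick orientation, a masked array pairs the data with a boolean mask, and reductions skip the masked entries. This is a minimal sketch on a made-up array; `sample` and `hidden` are illustrative names.

```python
import numpy as np

# Hypothetical sample with one missing value coded as -1.
sample = np.array([[4, -1],
                   [1, 2]])

# masked_where(condition, a) masks the entries of a where condition is True.
hidden = np.ma.masked_where(sample == -1, sample)

print(hidden.mask)     # boolean array, True where an entry is masked
print(hidden.count())  # number of unmasked entries
print(hidden.sum())    # reductions ignore masked entries
```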

In [182]:
def masked(data):
    # Mask exactly the entries equal to the missing-data code -1.
    return np.ma.masked_where(data == -1, data)

In [183]:
# Test your code on this example
import numpy as np
data = np.array([[4,5,-1,2],[1,2,3,1],[7,-1,9,8],[4,2,3,6],[2,4,6,2]])
print("Before:")
print(data)
print("After:")
print(masked(data))

Before:
[[ 4  5 -1  2]
 [ 1  2  3  1]
 [ 7 -1  9  8]
 [ 4  2  3  6]
 [ 2  4  6  2]]
After:
[[4 5 -- 2]
 [1 2 3 1]
 [7 -- 9 8]
 [4 2 3 6]
 [2 4 6 2]]


In [184]:
_ = ok.grade('q03')  # check that your solution works. 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



4. **Write a function `column_averages`** that computes the average of each column, skipping missing data in each column. Read about this here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html . Use masking to skip missing data.
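The difference masking makes is easiest to see side by side. In this sketch (made-up data, illustrative names), the plain mean treats -1 as a real value, while the mean of the masked array leaves it out of both the sum and the count.

```python
import numpy as np

# Hypothetical sample: the second column has one missing value.
sample = np.array([[2, -1],
                   [4, 3],
                   [6, 5]])

# Plain mean: the -1 is averaged in, dragging the second column down.
raw = np.mean(sample, axis=0)

# Masked mean: the -1 is skipped, so the second column averages 3 and 5 only.
hidden = np.ma.masked_where(sample == -1, sample)
skipping = hidden.mean(axis=0)

print(raw)
print(skipping)
```

Note that the mean has to be taken on the masked array itself; building the mask and then averaging the original array has no effect.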

In [189]:
def column_averages(data):
    # Mask the -1 entries, then average down each column (axis 0);
    # the mean of a masked array skips the masked values.
    return np.ma.masked_where(data == -1, data).mean(axis=0)

In [190]:
data = np.array([[4, 5, -1, 2], [1, 2, 3, 1],
                 [6, -1, 9, 8], [4, 2, 3, 6], [2, 4, 6, 2]])
print("Before:")
print(data)
print("After:")
print(column_averages(data))

Before:
[[ 4  5 -1  2]
 [ 1  2  3  1]
 [ 6 -1  9  8]
 [ 4  2  3  6]
 [ 2  4  6  2]]
After:
[3.4 3.25 5.25 3.8]


In [191]:
_ = ok.grade('q04')  # check that your solution works. 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



5. (Advanced) **Write a function `default_missing`** that replaces each missing value with the mean of the non-missing values in its column. *This won't change the column means!*

Hint: my steps in solving this included: 

   a. Create a masked array using your function `masked`.
    
   b. Compute the `axis=0` mean of that masked array; these are the column means of the non-missing data. Pass `keepdims=True` so that the result broadcasts in the next step. 
   
   c. Use `np.select` to replace the -1s with averages. Read about this here:  https://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html
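Steps b and c can be sketched on a made-up array as follows. The names are illustrative, and converting the means to a plain array with `filled(np.nan)` is a choice made for this sketch rather than something the hint requires.

```python
import numpy as np

# Hypothetical sample: each column has one missing value coded as -1.
sample = np.array([[4.0, -1.0],
                   [2.0, 6.0],
                   [-1.0, 8.0]])

hidden = np.ma.masked_where(sample == -1, sample)

# keepdims=True keeps the result two-dimensional: shape (1, 2), not (2,).
col_means = hidden.mean(axis=0, keepdims=True)

# np.select takes each element from the first choice whose condition is True.
# The (1, 2) means broadcast against the (3, 2) conditions.
filled = np.select([sample != -1, sample == -1],
                   [sample, col_means.filled(np.nan)])

print(col_means.shape)
print(filled)
print(filled.mean(axis=0))  # same column means as before filling
```

Filling a gap with the column mean adds one more value equal to the mean, which is why the mean of each column is unchanged.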

In [211]:
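def default_missing(data): 
    # Column means of the non-missing data (masked entries are skipped).
    average = np.mean(masked(data), axis=0)
    # Keep present values as they are; replace each -1 with its column mean.
    condition = [data != -1, data == -1]
    choicelist = [data, average]
    return np.select(condition, choicelist)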

In [212]:
# Test your code on this example
data = np.array([[4, 5, -1], [1, 2, 3], [7,-1,9], [-1, 2, -1], [2, 4, 6]])
print("Before:")
print(data)
print("After:")
print(default_missing(data))

Before:
[[ 4  5 -1]
 [ 1  2  3]
 [ 7 -1  9]
 [-1  2 -1]
 [ 2  4  6]]
After:
[[4.   5.   6.  ]
 [1.   2.   3.  ]
 [7.   3.25 9.  ]
 [3.5  2.   6.  ]
 [2.   4.   6.  ]]


In [213]:
_ = ok.grade('q05')  # check that your solution works. 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



# When you are done with this notebook, 
* Save and checkpoint. 
* Ensure that the name of this file is precisely `03-02-multi-dimensional-data.ipynb`. 
* <del>Change `ready` to `True` in the cell below. </del>
* <del>Run the cell below to submit your work for grading. </del>
* Save and checkpoint the notebook. 

ready = True  # change to True when ready to submit
print("submitting file {} for assignment {} as {}".format(ok.assignment.src[0], 
                                                          ok.assignment.name, 
                                                          ok.assignment.get_student_email()))
if not ready: 
    raise Exception("change ready to True when ready to submit")
_ = ok.submit()