# Data abstraction

The lecture notes for this assignment are written in Jupyter. See the adjoining file [video notes](03-01-data-abstraction-video-notes.ipynb) for details. Let's check your understanding of these concepts. 

## For a handy reference
**[Python Data Science Handbook:](http://shop.oreilly.com/product/0636920034919.do)** Essential Tools for Working with Data *By Jake VanderPlas*

Covers the following topics:
* **IPython and Jupyter:** provide computational environments for data scientists using Python
* **NumPy:** includes the ndarray for efficient storage and manipulation of dense data arrays in Python
* **Pandas:** features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
* **Matplotlib:** includes capabilities for a flexible range of data visualizations in Python
* **Scikit-Learn:** for efficient and clean Python implementations of the most important and established machine learning algorithms

It's available for reading in electronic form on the Tufts Library website.

## For Test Prep

* [Array Basics Notebook](03-00-array-basics.ipynb) adapted from Jake VanderPlas' book (cited above)
* [Array Computation Notebook](03-01-array-computation.ipynb) adapted from Jake VanderPlas' book (cited above)
* [Numpy Array Basics Notebook](03-02-numpy-array-basics.ipynb) adapted from Jake VanderPlas' book (cited above)

In [1]:
# Don't change this cell; just run it. 
from IPython.display import IFrame
IFrame('https://1813261-1.kaf.kaltura.com/media/t/1_govlzyqa/133896931', width=800, height=560)

from client.api.notebook import Notebook
ok = Notebook('03-01-data-abstraction.ok')
ok.auth(inline=True)

Assignment: 03-01 Data abstraction
OK, version v1.14.15

Successfully logged in as jameswang1222@gmail.com


1. Make up an array of the numbers 1 to 5. Put into a variable x.

In [2]:
# your answer: 
import numpy as np
x = np.array([i + 1 for i in range(5)])  # loop variable renamed so it does not shadow x
x

array([1, 2, 3, 4, 5])
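
As a possible alternative (a sketch, not the required answer), `np.arange` can build the same array without a list comprehension. The name `x_alt` is introduced here only for illustration.

```python
import numpy as np

# np.arange(start, stop) excludes stop, so stop must be 6 to include 5.
x_alt = np.arange(1, 6)
print(x_alt)
```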

In [3]:
_ = ok.grade('q01')  # test that your answer is correct 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



2. Write code that sets `y` to the vector created by adding 5 to each element of `x`. 

In [4]:
# Your answer: 
y = x + 5
y

array([ 6,  7,  8,  9, 10])
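
To see what `x + 5` is doing, the sketch below compares it with an explicit loop. The names `y_vectorized` and `y_loop` are assumptions made for this illustration; both should hold the same values.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# Vectorized form: the scalar 5 is applied to every element of x.
y_vectorized = x + 5

# Equivalent explicit loop, shown only for comparison.
y_loop = np.array([value + 5 for value in x])

print(np.array_equal(y_vectorized, y_loop))  # True
```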

In [5]:
_ = ok.grade('q02')  # test that your answer is correct 

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



# Is the 'for' loop obsolete? 

Sort of. There are very efficient ways to do things in `numpy` without `for` loops. You could certainly check whether 7 is a member of `y` with a `for` loop, but an `array` lets you do that much more simply:

3. (Advanced) Consider that `y` *is an iterable* and write an expression that is True if 7 is in `y`, and False if not. Put that value into `z`.

In [6]:
# your answer: 
z = 7 in y
z

True
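
For comparison, here is a sketch of three ways to ask the same membership question. The variable names are hypothetical; `np.isin` and the `.any()` method are standard NumPy, but whether the grader accepts them is not something the notebook states.

```python
import numpy as np

y = np.array([6, 7, 8, 9, 10])

z_in = 7 in y                  # Python membership test on the array
z_any = bool((y == 7).any())   # elementwise comparison, then "is any True?"
z_isin = bool(np.isin(7, y))   # NumPy's own membership function

print(z_in, z_any, z_isin)
```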

# Whoa there! 
The advanced problem shows that an `array` gets some of its behavior from being a Python iterable. For example, the following also works:

In [7]:
for i in y: 
    print(i)

6
7
8
9
10
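
One way to confirm the "iterable" claim, sketched under the assumption that the standard-library `collections.abc.Iterable` check is an acceptable test of it:

```python
from collections.abc import Iterable

import numpy as np

y = np.array([6, 7, 8, 9, 10])

# An ndarray defines __iter__, so it passes the Iterable check.
is_iterable = isinstance(y, Iterable)

# Anything that accepts an iterable can therefore consume the array.
as_list = [int(value) for value in y]

print(is_iterable, as_list)
```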


# The treasure hunt
Almost every common thing that you might want to do to an `array` with a `for` loop is easier to do with a `numpy.ndarray` method, or with some combination of those methods and native Python. A very large user community has gone to great effort to make using an `array` as simple as possible!

In practical terms, this means that it is often simpler to search the *numpy user manual* for a solution than to code it yourself. Thus, programming with `numpy` requires both knowledge of native Python and "treasure hunting" in the `numpy` documentation!

Let's have some fun with a few treasure hunts through https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html 
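
As a small taste of what such a hunt can turn up, the sketch below calls a few `ndarray` methods that replace common loops. The array and variable names are assumptions chosen for illustration.

```python
import numpy as np

y = np.array([6, 7, 8, 9, 10])

largest = y.max()       # largest element, no loop needed
position = y.argmax()   # index of the largest element
running = y.cumsum()    # running totals: 6, 13, 21, 30, 40

print(largest, position, running)
```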

4. Complete the function below so that it always returns the sum of the one-dimensional array `x` passed to it. Beware: I will test it on multiple arrays `x`!

In [13]:
def mysum(x): 
    # your answer:
    return sum(x)

In [14]:
# run this to check your code
mysum(np.array([1,2,3,4,5]))

15

In [15]:
_ = ok.grade('q04')  # test that your answer is correct

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed
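
The built-in `sum` works here because an array is iterable. A treasure hunt through the documentation would likely also turn up the `ndarray.sum` method; the variant below, with the hypothetical name `mysum_np`, is a sketch of that approach rather than the graded answer.

```python
import numpy as np

def mysum_np(x):
    # Hypothetical variant of mysum that uses the ndarray.sum method.
    return x.sum()

example = np.array([1, 2, 3, 4, 5])
print(mysum_np(example))              # 15
print(mysum_np(example) == sum(example))
```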



5. In the function below, return a normalized set of data whose mean is 0.0, by subtracting the current mean from x. 

In [16]:
def renorm(x): 
    # your answer: 
    return x-sum(x)/len(x)

In [17]:
# run this to check your code
x = np.array([5, 6, 7, 8, 9])
renorm(x)

array([-2., -1.,  0.,  1.,  2.])

In [18]:
_ = ok.grade('q05')  # test that your answer is correct

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed
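
The same idea applies to the mean. The sketch below uses the `ndarray.mean` method in place of `sum(x)/len(x)`; `renorm_np` and `centered` are hypothetical names used only for this illustration.

```python
import numpy as np

def renorm_np(x):
    # Hypothetical variant of renorm that uses the ndarray.mean method.
    return x - x.mean()

centered = renorm_np(np.array([5, 6, 7, 8, 9]))
print(centered)
print(centered.mean())  # 0.0
```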



6. (Advanced) What happens if you apply the same operations to a list instead of an array?

In [19]:

def renorm(x): 
    # your answer: 
    return x-sum(x)/len(x)
x = [5,6,7,8,9]
renorm(x)

TypeError: unsupported operand type(s) for -: 'list' and 'float'

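___Your answer:___ Python raises `TypeError: unsupported operand type(s) for -: 'list' and 'float'`, because a list does not support elementwise arithmetic the way an array does.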
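
Two possible ways around that error are sketched below, under the assumption that the goal is still a zero-mean result: convert the list to an array first, or do the arithmetic element by element in plain Python. Both function names are hypothetical.

```python
import numpy as np

def renorm_any(x):
    # Hypothetical variant: convert a list (or other sequence) to an array first.
    arr = np.asarray(x, dtype=float)
    return arr - arr.mean()

def renorm_list(values):
    # Pure-Python variant: subtract the mean from each element explicitly.
    mean = sum(values) / len(values)
    return [value - mean for value in values]

data = [5, 6, 7, 8, 9]
print(renorm_any(data))
print(renorm_list(data))
```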

# When you are done with this notebook, 
* Save and checkpoint. 
* Ensure that the name of this file is precisely `03-01-data-abstraction.ipynb`. 
* <del>Change `ready` to `True` in the cell below. </del>
* <del>Run the cell below to submit your work for grading. </del>
* Save and checkpoint the notebook. 

ready = True  # change to True when ready to submit
print("submitting file {} for assignment {} as {}".format(ok.assignment.src[0], 
                                                          ok.assignment.name, 
                                                          ok.assignment.get_student_email()))
if not ready: 
    raise Exception("change ready to True when ready to submit")
_ = ok.submit()