### Intro to Numpy

What is Numpy?

From the docs...
> NumPy is the fundamental package for scientific computing with Python.

[Numpy](http://www.numpy.org/) is a critical library in the Python ecosystem. Although we won't be using it directly that often, it is used extensively in both Pandas and Scikit-Learn, two libraries that we *will* be seeing a lot of while exploring Machine Learning. As such, it's worth learning at least a little bit about it.

We also have access to a pretty great Numpy resource within the Safari Online book 'Python for Data Analysis'. Specifically, [chapter 4](https://www.safaribooksonline.com/library/view/python-for-data/9781449323592/ch04.html) has a very thorough rundown of Numpy, and later chapters go into even more depth.

### The NDArray

One class lies at the core of Numpy: [numpy.ndarray](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html#). The ND-Array (N-Dimensional array) is an optimized structure for the storage and manipulation of large multi-dimensional arrays. Python has native support for lists without Numpy of course, and it is trivial to move back and forth between Numpy arrays and Python lists. The actual reasons that Numpy arrays perform so much better than Python lists are beyond the scope of this lesson, but some of the more easily understood benefits are worth examining.
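
As a rough illustration of moving between the two, here is a minimal sketch. The variable names (`py_list`, `arr`, `round_trip`) are made up for this example; `np.array` and `ndarray.tolist` are the standard conversion calls.

```python
import numpy as np

# A plain Python list (hypothetical sample data)
py_list = [1, 2, 3, 4]

# List -> Numpy array
arr = np.array(py_list)

# Numpy array -> list
round_trip = arr.tolist()

print(type(arr).__name__)      # ndarray
print(round_trip == py_list)   # True

# One easily observed difference: `*` repeats a list,
# but multiplies every element of an array.
print(py_list * 2)             # [1, 2, 3, 4, 1, 2, 3, 4]
print(arr * 2)                 # [2 4 6 8]
```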




#### Array Creation

Before you can start doing anything interesting with Numpy arrays you'll have to create one.


In Python you might create a list of the numbers 0 to 99 as follows:
```python
[i for i in range(100)]
```
In Numpy, an equivalent array can be created like:

```python
import numpy as np

# from a python list
np.asarray([i for i in range(100)])

# more simply using built-in numpy methods
np.arange(100)
```

In fact, numpy has a huge [suite of methods](https://docs.scipy.org/doc/numpy-1.12.0/reference/routines.array-creation.html) for creating arrays. Some of the interesting ones include (try running these!):

In [None]:
import numpy as np

# an array with even steps throughout a given range
np.arange(start=10, stop=50, step=3)

In [None]:
# an array of a given # of samples, evenly distributed across a desired range
np.linspace(2.0, 10.0, num=20)

In [None]:
# an array of a million zeroes
# (a trillion 8-byte floats would need roughly 8 TB of memory)
np.zeros(1000000)

In [None]:
# a 10x10 identity matrix (2D-array)
np.eye(10)
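
To check what these calls actually produce, one option is to look at each array's `shape` and `dtype`. The sketch below reuses the arguments from the cells above; the variable names (`stepped`, `spaced`, `identity`) are only for illustration.

```python
import numpy as np

stepped = np.arange(start=10, stop=50, step=3)
spaced = np.linspace(2.0, 10.0, num=20)
identity = np.eye(10)

# arange excludes `stop`, so the values run 10, 13, ..., 49
print(stepped.shape, stepped[0], stepped[-1])   # (14,) 10 49

# linspace includes both endpoints by default
print(spaced.shape, spaced[0], spaced[-1])      # (20,) 2.0 10.0

# eye builds a 2D array with ones on the diagonal
print(identity.shape, identity.dtype)           # (10, 10) float64
```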

Try creating a code block below and build some arrays using the methods described above.

In [None]:
### Your code here!

#### Vectorization

Numpy comes supplied with a bunch of functions known as 'Universal Functions' (ufuncs), which are special functions that operate on Numpy arrays element-by-element in an extremely fast fashion known as 'vectorization'. We can, for instance, multiply two arrays together, multiply an array by some scalar value, compare two arrays element-by-element for equality, etc. There are a slew of ufuncs listed [here](https://docs.scipy.org/doc/numpy-1.12.0/reference/ufuncs.html). Let's see some examples (try running these!):
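
The three operations just mentioned can be sketched on two small arrays. The sample values and variable names here are arbitrary, and the list comprehension at the end is only there to show the loop that vectorization replaces.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# array * array: element-wise product (same as np.multiply(a, b))
product = a * b
print(product.tolist())   # [10, 40, 90, 160]

# array * scalar: every element is scaled
scaled = a * 2.5
print(scaled.tolist())    # [2.5, 5.0, 7.5, 10.0]

# array == array: element-wise comparison, giving an array of booleans
equal = a == np.array([1, 0, 3, 0])
print(equal.tolist())     # [True, False, True, False]

# The plain-Python loop that `a * b` stands in for
loop_product = [x * y for x, y in zip(a.tolist(), b.tolist())]
print(loop_product == product.tolist())   # True
```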

In [None]:
# 0..9, multiplied by 2
(np.arange(10) * 2)

In [None]:
# sin function from 0 to 2*PI rads
x_range = np.linspace(0.0, 2 * np.pi, num=10)
np.sin(x_range)

In [None]:
# the same function above, plotted
import matplotlib.pyplot as plt

plt.plot(x_range, np.sin(x_range))
plt.xlabel('Angle [rad]')
plt.ylabel('sin(x)')
plt.axis('tight')
plt.show()

In [None]:
# boolean check against elements of an array
list_a = np.arange(50)
list_a < 25
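
The comparison above returns a new array of booleans, one per element. A small sketch of what can be done with it follows; the name `mask` is just a label for this example, and using the result to select elements is an extra step not covered in this lesson.

```python
import numpy as np

list_a = np.arange(50)
mask = list_a < 25

print(mask.dtype)               # bool
print(np.count_nonzero(mask))   # 25

# A boolean array can also be used to pick out the matching elements
print(list_a[mask][-1])         # 24
```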

#### Performance

[The Numpy API](https://docs.scipy.org/doc/numpy/reference/routines.html) is pretty extensive, but what really makes it so critical is how much faster it is than regular Python. In many cases, using Numpy arrays will result in code that is 10 to 100 times faster than regular Python.

Let's see this for ourselves. The following code contains two implementations of a method that takes two arrays and multiplies them together element-wise to create a third array containing the product. The 'classic' method uses Python lists, and 'numpy' uses the NDArray. Initially the arrays will be of length 10,000.

In [None]:
n = 10000

def classic_array_multiply():
    product = []
    list_a = [i for i in range(n)]
    list_b = [i + n for i in range(n)]
    for a, b in zip(list_a, list_b):
        product.append(a * b)
    return product

    
def numpy_array_multiply():
    list_a = np.arange(n)
    list_b = np.arange(start=n, stop=n + n)
    return list_a * list_b

We'll test these methods using an IPython (interactive-python) "magic" called 'timeit'. There are many more magics listed [here](https://ipython.org/ipython-doc/3/interactive/magics.html). 'timeit' will run a block of code repeatedly, and display some performance statistics about it. Run the following code and try to understand the results. Then, try increasing the 'n' value in the above code and see how the performance differs.

In [None]:
print('numpy timing')
%timeit numpy_array_multiply()
print('classic timing')
%timeit classic_array_multiply()
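
The `%timeit` magic only works inside IPython or a notebook. If you want to try the same comparison in a plain Python script, a rough equivalent can be sketched with the standard library's `timeit` module. The run count of 100 is an arbitrary choice for this example, and the numbers you get will depend on your machine.

```python
import timeit

import numpy as np

n = 10000


def classic_array_multiply():
    # Build two Python lists and multiply them pair by pair
    list_a = [i for i in range(n)]
    list_b = [i + n for i in range(n)]
    return [a * b for a, b in zip(list_a, list_b)]


def numpy_array_multiply():
    # Build two Numpy arrays and multiply them element-wise
    list_a = np.arange(n)
    list_b = np.arange(start=n, stop=n + n)
    return list_a * list_b


runs = 100  # arbitrary number of repetitions for this sketch
classic_seconds = timeit.timeit(classic_array_multiply, number=runs)
numpy_seconds = timeit.timeit(numpy_array_multiply, number=runs)

print(f"classic: {classic_seconds / runs * 1e6:.1f} microseconds per call")
print(f"numpy:   {numpy_seconds / runs * 1e6:.1f} microseconds per call")
```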

In case you're unfamiliar with the symbols in the timing output, here's a refresher on the metric scale. Most important here would be 'milli', 'micro', and 'nano'.

![metric scale](http://www.bustatech.com/wp-content/uploads/2011/11/image.png)
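
To compare two timings reported in different units, it can help to convert both to seconds first. The sketch below does that for the three prefixes named above. The `to_seconds` helper and the sample timings (250 µs and 4 ms) are made up for illustration and are not real measurements.

```python
# Metric prefixes expressed as fractions of a second
PREFIXES = {
    "ms": 1e-3,   # milli: one thousandth
    "µs": 1e-6,   # micro: one millionth
    "ns": 1e-9,   # nano: one billionth
}


def to_seconds(value, unit):
    """Convert a timing such as (250, 'µs') to seconds (hypothetical helper)."""
    return value * PREFIXES[unit]


# Hypothetical timings, not real measurements
fast = to_seconds(250, "µs")
slow = to_seconds(4, "ms")

# How many times slower is the second timing?
print(round(slow / fast))   # 16
```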

#### And so much more !!

We've only touched on the very basics of Numpy, but this should be enough so that you at least have a point of reference if you are exposed to the Numpy APIs on your machine learning quest.

If you are interested in digging even deeper into the mathematical/scientific angle of all this then you may also want to look at [SciPy](https://docs.scipy.org/doc/scipy/reference/), a computing library built on top of Numpy that includes many advanced features such as:

- Calculus/Integration
- Fourier Transforms
- Signal Processing
- and much more

#### Onwards !

You are now a Numpy savant! Why not move on to Part 4, [Intro to Pandas](pandas.ipynb)?