# Intro to Numpy 

Tonight we're going to start off with a little detour into the `Numpy` library, which is the fundamental package for scientific computing in Python. It turns out that the `Pandas DataFrames` that we worked with last class are actually built off the `numpy array` (which we'll get to), so it's important to have some basic knowledge of what's running under the hood of our `DataFrames`. We started with `Pandas DataFrames` as opposed to `Numpy` and `numpy arrays` because they are a little bit more intuitive, and we're able to interact with them from a much higher level.

While `Numpy` offers a number of things (see the [docs](http://www.numpy.org/) for a better idea), one of its mainstays is the `numpy array`, which is what we'll focus on tonight.

## The basics

What's so special about a `numpy array`? From a high level, they are kind of like lists - they just store a bunch of stuff in a container. Okay, great, so what's the big deal? Well, it turns out that a `numpy array` is much faster to interact with and perform calculations with than a standard list. Why is that, though? The two main reasons that they are faster are: 

1. They are stored as one contiguous block of memory, whereas a list only holds references to objects that can be spread out across multiple locations in memory.
2. Each item in a `numpy array` is of the same data type (i.e. all integers, all floats, etc.), rather than a conglomerate of any number of data types (as a list is). 
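
To make those two points a bit more concrete, here is a small sketch that inspects a list and a `numpy array`. The variable names (`mixed`, `arr`, `upcast`) are just made up for illustration, and the explicit `dtype=np.int64` is an assumption added so the numbers come out the same on every platform.

```python
import numpy as np

# A list can hold objects of different types side by side.
mixed = [1, 2.5, "three"]
list_types = [type(x).__name__ for x in mixed]
print(list_types)                  # ['int', 'float', 'str']

# A numpy array has one dtype shared by every element.
arr = np.array([1, 2, 3], dtype=np.int64)
print(arr.dtype)                   # int64
print(arr.itemsize)                # 8 (bytes per element)
print(arr.nbytes)                  # 24 (3 elements * 8 bytes, in one block)
print(arr.flags['C_CONTIGUOUS'])   # True

# Mixing ints and floats does not give a mixed array;
# numpy upcasts everything to a single common type instead.
upcast = np.array([1, 2.5, 3])
print(upcast.dtype)                # float64
```

Because every element is the same size and sits right next to its neighbor, numpy can step through the data without looking up each object individually.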

Just how much faster are they? Well, let's take the numbers from 0 to 1 million, and sum those numbers, timing it with both a list and a numpy array.

In [6]:
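import numpy as np

def sum_np_array():
    # Build a numpy array of the integers 0..999999 and sum it with numpy.
    a = np.arange(0, 1000000)
    return a.sum()

def sum_lst():
    # range is the Python 3 spelling of Python 2's xrange (a lazy sequence).
    a = range(1000000)
    return sum(a)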

In [5]:
%timeit sum_np_array()

1000 loops, best of 3: 1.63 ms per loop


In [7]:
%timeit sum_lst()

100 loops, best of 3: 7.91 ms per loop
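
If you're not working in IPython, the `%timeit` magic isn't available, but the standard library's `timeit` module can do roughly the same job. The following is only a sketch of that idea: `sum_range`, the `number=10` and `repeat=3` settings, and the explicit `dtype=np.int64` are all assumptions made for illustration, and the times you see will depend on your machine.

```python
import timeit
import numpy as np

def sum_np_array():
    # dtype is set explicitly so the sum cannot overflow on any platform.
    a = np.arange(0, 1000000, dtype=np.int64)
    return a.sum()

def sum_range():
    # Plain Python: sum the same numbers one at a time.
    return sum(range(1000000))

# Run each function 10 times per trial, 3 trials, and keep the best trial.
runs = 10
np_time = min(timeit.repeat(sum_np_array, number=runs, repeat=3)) / runs
py_time = min(timeit.repeat(sum_range, number=runs, repeat=3)) / runs

print("numpy: %.3f ms per call" % (np_time * 1000))
print("plain: %.3f ms per call" % (py_time * 1000))
```

Taking the best of several trials, as `%timeit` does, helps filter out noise from whatever else the computer is doing at the moment.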


Whoa, so it's about 5 times faster! This is because of those two points above. Because numpy arrays store data in contiguous blocks of memory, numpy is able to take advantage of **vectorization**, which is the ability of a CPU to perform one operation on multiple pieces of data at once. In addition, since a numpy array knows the type of every object it stores (and those types don't change), it doesn't have to waste time checking what type each object is (as Python must for a list). The combo of these two things speeds up our calculation quite a bit.
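
In day-to-day code, this shows up as being able to write one expression over a whole array instead of looping over its elements yourself. Here is a minimal sketch of that contrast; the names `values`, `doubled_loop`, and `doubled_np` are made up for this example.

```python
import numpy as np

values = np.arange(5)            # array([0, 1, 2, 3, 4])

# Pure-Python approach: visit each element one at a time.
doubled_loop = [v * 2 for v in values.tolist()]

# numpy approach: one expression applied to the whole array,
# with the looping handled inside numpy's compiled code.
doubled_np = values * 2

print(doubled_loop)              # [0, 2, 4, 6, 8]
print(doubled_np)                # [0 2 4 6 8]
```

Both give the same numbers, but the numpy version never runs a Python-level loop, which is where the time savings come from on larger arrays.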

It's also worth noting that all we did above was a sum - just a **simple** sum. When we move to doing more complicated operations, we'll save even more time! Let's look at what else numpy arrays can do...