<figure>
  <IMG SRC="https://www.colorado.edu/cs/profiles/express/themes/cuspirit/logo.png" WIDTH=50 ALIGN="right">
</figure>
# Basic NumPy
*CSCI 3022 - Dirk Grunwald*

## Overview

Statisticians and data scientists primarily deal with structured numeric data.

While tuples, lists and dictionaries are useful for general programming, *vectors* and *arrays* are more useful for mathematical calculations.

[NumPy](https://docs.scipy.org/doc/numpy-1.13.0/index.html) is an *extension module* to the Python language that provides vectors and arrays. In order to use NumPy, we must *import* the numpy module.

In [None]:
import numpy as np

In this course, we will almost always use the same method to refer to NumPy operations -- we will import NumPy and use the name ```np``` to refer to parts of numpy.

In this tutorial, we focus on the parts of NumPy we use in this course. The NumPy site has [a quickstart tutorial](https://docs.scipy.org/doc/numpy-1.13.0/user/quickstart.html) that is more complete.

There is also a complete [reference manual](https://docs.scipy.org/doc/numpy-1.13.0/reference/index.html).

I usually use [Google to search for reference pages](https://www.google.com/search?source=hp&ei=jrdCWpeHPMumjwSr-JngCA&q=numpy+ones&oq=numpy+ones&gs_l=psy-ab.3..0l2.5642.6767.0.7015.12.11.0.0.0.0.113.801.9j1.11.0....0...1c.1.64.psy-ab..1.11.891.6..35i39k1j0i67k1j0i131k1j0i20i264k1j0i20i263i264k1j0i20i263k1.92.hgIku2CyNOo) and also check out answers presented in [Stack Overflow](https://stackoverflow.com/questions/tagged/numpy).

## Creating a vector

A NumPy vector is similar to a list, but every element must have exactly the same type. In this course, we will mainly use vectors of floating point numbers, although using the [different data types](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html) can be much more efficient.

In [None]:
z = np.zeros(5)
print('z is', z)

In [None]:
o = np.ones(3)
print('o is', o)

In [None]:
evens = np.arange(0, 12, 2)
print('evens is', evens)

In [None]:
ls = np.linspace(2,10,9)
print('ls is', ls)
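The element type of a vector is stored in its `dtype` attribute. The following is a minimal sketch of how to inspect and change it; the names `small` and `as_float32` are only illustrative and are not used elsewhere in this tutorial.

```python
import numpy as np

# np.zeros produces 64-bit floating point numbers by default.
z = np.zeros(5)
print('z has dtype', z.dtype, 'and occupies', z.nbytes, 'bytes')

# The dtype argument requests a different element type.
small = np.zeros(5, dtype=np.int8)
print('small has dtype', small.dtype, 'and occupies', small.nbytes, 'bytes')

# astype returns a converted copy; z itself is unchanged.
as_float32 = z.astype(np.float32)
print('as_float32 has dtype', as_float32.dtype)
print('z still has dtype', z.dtype)
```

Smaller element types use less memory, which is one reason choosing a data type deliberately can pay off for large data sets.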

Although it's not very efficient, you can create a vector from a Python list or tuple.

In [None]:
evens = np.array([0, 2, 4, 6, 8, 10])
print('evens is', evens)

You can also extract or slice elements from a vector just like a list, and you can even skip elements at a regular step.

In [None]:
print('Middle part', evens[1:3])
print('Front part', evens[:2])
print('Back part', evens[2:])
print('Every other', evens[::2])
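Slices follow the same `start:stop:step` pattern as Python lists. As a small sketch of a few more combinations (the vector is recreated here so the example runs on its own):

```python
import numpy as np

evens = np.array([0, 2, 4, 6, 8, 10])

# start:stop:step, where the element at index stop is excluded.
every_other_from_1 = evens[1::2]
print('Every other, starting at index 1:', every_other_from_1)

# Negative indices count from the end of the vector.
last_two = evens[-2:]
print('Last two:', last_two)

# A negative step walks through the vector backwards.
backwards = evens[::-1]
print('Reversed:', backwards)
```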

## Arrays and shapes

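A vector is a 1-dimensional array (like a list), but NumPy also handles arrays with two or more dimensions.

Each array has a size and a shape. For a vector, the size tells you how many items are in the vector. While it might appear that ```len``` and ```size``` give you the same information, they only agree for vectors: for arrays with more dimensions, ```len``` counts the entries along the first dimension, while ```size``` counts every element.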

In [None]:
print('evens has', len(evens), 'elements')
print('That is also the size of evens:', evens.size)

The *shape* is needed for arrays where you need to determine the size of each dimension, such as the two-dimensional array in the next cell.

In [None]:
print('the shape of evens is', evens.shape)
twod = np.array( [[0, 2, 4], [6,8,10]])
print('twod is', twod)
print('It has length', len(twod), 'size', twod.size, 'and shape', twod.shape)
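The relationship between `len`, `size` and `shape` can be checked directly. This is a brief sketch; the names `rows` and `cols` are introduced only for illustration.

```python
import numpy as np

twod = np.array([[0, 2, 4], [6, 8, 10]])

# shape is a tuple with one entry per dimension.
rows, cols = twod.shape
print('rows:', rows, 'cols:', cols)

# len reports only the first dimension; size counts every element.
print('len(twod) equals the number of rows:', len(twod) == rows)
print('twod.size equals rows * cols:', twod.size == rows * cols)

# ndim is the number of dimensions, i.e. the length of the shape tuple.
print('number of dimensions:', twod.ndim)
```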

It is possible to *reshape* an array from one shape to another. We use this sparingly in this class because it's often confusing to people.

In [None]:
flat = twod.reshape((6,))
print('flat is ', flat)
print('twod is now', twod)
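Reshaping only works when the new shape holds the same number of elements as the old one. The sketch below illustrates that rule under the assumption of a six-element vector; the variable names are hypothetical.

```python
import numpy as np

flat = np.arange(0, 12, 2)   # six elements, shape (6,)

# Any shape whose dimensions multiply to 6 is allowed.
three_by_two = flat.reshape((3, 2))
print('three_by_two has shape', three_by_two.shape)

# A dimension given as -1 is inferred by NumPy from the total size.
two_by_three = flat.reshape((2, -1))
print('two_by_three has shape', two_by_three.shape)

# A shape with the wrong number of elements raises a ValueError.
try:
    flat.reshape((4, 2))
except ValueError as err:
    print('cannot reshape:', err)
```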

In [None]:
print('Second row, first element of twod is', twod[1, 0])

## Operations on Vectors

NumPy has functions that operate on all the elements of a vector or array. Some of the more useful ones are:
* [Mathematical functions](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.math.html) - sum, prod, log, absolute value and others
* [Logic functions](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.logic.html) - such as determining if all or any elements of a vector are true
* [Random sampling](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html) - to generate random numbers
* [Statistics](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.statistics.html) - mean, median and the like

In [None]:
evens = np.arange(0,12,2)
print('evens are', evens,'and they sum to', np.sum(evens))
print('mean is', np.sum(evens)/evens.size)
print('which you could write as', np.mean(evens))
print('the largest value is', np.max(evens))
print('and the largest value is at location', np.argmax(evens))
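The logic, statistics and random sampling functions from the list above work the same way. The following sketch shows a few of them, along with the `axis` argument that many of these functions accept for arrays with more than one dimension. The seed value is arbitrary and only makes the random numbers repeatable.

```python
import numpy as np

evens = np.arange(0, 12, 2)

# Logic functions reduce a vector of Booleans to a single answer.
print('are all elements even?', np.all(evens % 2 == 0))
print('is any element larger than 8?', np.any(evens > 8))

# Statistics functions.
print('the median is', np.median(evens))

# For a 2-dimensional array, axis selects the dimension to collapse.
twod = np.array([[0, 2, 4], [6, 8, 10]])
column_sums = np.sum(twod, axis=0)   # one sum per column
row_sums = np.sum(twod, axis=1)      # one sum per row
print('column sums', column_sums, 'row sums', row_sums)

# Random sampling: five values drawn uniformly from [0, 1).
np.random.seed(0)
samples = np.random.random(5)
print('random samples', samples)
```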

Some of the functions combine multiple vectors element by element.

In [None]:
odds = np.arange(1,12,2)
print('evens are', evens, 'and odds are', odds)
print('the sum of evens and odds is', np.add(evens,odds))
print('the maximum of evens and odds is', np.maximum(evens,odds))
print('or, more compactly', evens + odds)

## Operations on vector elements

Certain [NumPy functions](https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html) operate on the elements of the vector in ways that make mathematical sense for vectors and arrays.

In [None]:
data = np.arange(0,12)
print('data is', data)
print('data * 2 is', data * 2)
print('data + data is', data + data)
print('data squared is', data**2)

In [None]:
twod = np.array( [[0, 2, 4], [6,8,10]])
print('twod is', twod)
print('twod*2 is', twod*2)
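Named functions such as `np.sqrt`, `np.abs` and `np.exp` apply element by element in the same way as the arithmetic operators. A short sketch, using a smaller vector than the cells above to keep the output readable:

```python
import numpy as np

data = np.arange(0, 5)

# Each function is applied to every element and returns a new array.
roots = np.sqrt(data)
print('square roots:', roots)

distances = np.abs(data - 2)
print('distance of each element from 2:', distances)

# Functions and operators can be mixed in one expression.
print('exp(data) - 1:', np.exp(data) - 1)
```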

## Operations and Slicing

You can combine operations on elements to extract and manipulate data. Logical operations produce a True or False result and we can use a vector of Boolean values to extract values from an existing vector.

In [None]:
print('Elements less than 5 are', data < 5)

In [None]:
print('Those elements are: ', data[ data < 5 ])

We will use this all the time to winnow or reduce data sets. You don't need to just use the comparison in the same vector. For example, assume we have some ages and salaries, and then want to extract the salaries corresponding to people older than 40.

In [None]:
ages = np.array([20, 39, 45, 18, 56, 90])
salary = np.array([10000, 40000, 50000, 8000, 55000, 5000])
print('salaries for folks older than 40:', salary[ ages > 40 ])
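Several conditions can be combined into one Boolean vector with `&` (and) and `|` (or). Each comparison needs its own parentheses because these operators bind more tightly than the comparisons. The sketch below reuses the ages and salaries from above; the upper age limit of 65 is an arbitrary choice for illustration.

```python
import numpy as np

ages = np.array([20, 39, 45, 18, 56, 90])
salary = np.array([10000, 40000, 50000, 8000, 55000, 5000])

# True only where both conditions hold.
in_range = (ages > 40) & (ages < 65)
selected = salary[in_range]
print('salaries for ages between 40 and 65:', selected)

# The selected values can be passed straight to other functions.
print('their mean salary is', np.mean(selected))

# Summing a Boolean vector counts the True entries.
older_count = np.sum(ages > 40)
print('number of people older than 40:', older_count)
```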