# Introduction to numpy

The numpy package is the preferred way to interact directly with arrays in Python. We won't need much of it, but it is worth going through some key points.

In [1]:
import numpy as np

x = np.array([1,2,3,4,5])
x

array([1, 2, 3, 4, 5])

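Don't forget that integers and floats are different types in Python. A numpy array stores all of its elements with a single type, reported by `dtype`, so including even one float makes the whole array floating-point.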

In [2]:
x.dtype

dtype('int64')

In [3]:
np.array([1.0,2,3,4,5]).dtype

dtype('float64')
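
As a small illustration of how the element type is determined, the sketch below converts between types with `astype` and mixes an integer array with a float. The variable names are ours, chosen for illustration only.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints.astype(float)   # explicit conversion to float64
mixed = ints + 0.5            # integer array plus a float gives a float array

print(ints.dtype.kind)        # 'i' means signed integer (the width can vary by platform)
print(floats.dtype)           # float64
print(mixed.dtype)            # float64
```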

The `shape` of an array is what we would usually call the *size* or *dimensions* of the array.

In [4]:
x.shape

(5,)

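Informally, an array with one dimension may be referred to as a vector, but that is not an official designation in numpy.

The helper function below prints each expression in a list alongside its value. It is used for the rest of the examples in this section.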

In [5]:
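def showthem(lines):
    # Print each expression string, followed by the value it evaluates to.
    # eval runs arbitrary code, so pass only strings you wrote yourself.
    for line in lines:
        print()
        print(line, "=")
        print(eval(line))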

The components of the array are referenced starting at 0.

In [6]:
showthem(["x[0]","x[[0,1,2]]"])


x[0] =
1

x[[0,1,2]] =
[1 2 3]


In [7]:
showthem(["x[:3]"])


x[:3] =
[1 2 3]


The use of `x[:3]` above is referred to as **slicing** the array. The `:3` means to take the first three elements. If we want to reference from the end instead of the beginning, we can use negative indexes.

In [8]:
showthem(["x[-1]","x[-3:]"])


x[-1] =
5

x[-3:] =
[3 4 5]


The confusing part of numpy slicing is that in the range `i:j`, *the last index is not part of the slice.* The slice runs from index `i` through index `j-1`, so it contains $j-i$ values.

In [9]:
showthem(["x[1:4]"])


x[1:4] =
[2 3 4]
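
One way to get comfortable with this convention is to check the count directly. The sketch below also shows a convenient consequence: splitting at any index `k` (a name we introduce here) gives two pieces that rejoin into the original array with no overlap or gap.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
i, j = 1, 4

print(len(x[i:j]) == j - i)          # True

k = 2
left, right = x[:k], x[k:]           # index k belongs only to the right piece
rejoined = np.concatenate([left, right])
print(np.array_equal(rejoined, x))   # True
```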


Also, numpy is happy to let you slice past the end of the array. It just returns as much as is available, without complaint. This can be a source of frustrating bugs. (Asking for a single element past the end, by contrast, is an error.)

In [10]:
showthem(["x[1:10]"])


x[1:10] =
[2 3 4 5]
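
To see the difference between a slice and a single index that run past the end, here is a minimal sketch. The flag `raised` is just a name we use to record what happened.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

clipped = x[1:10]      # a slice is silently truncated at the end of the array
print(clipped)         # [2 3 4 5]

try:
    x[10]              # a single integer index must be in range
    raised = False
except IndexError:
    raised = True
print(raised)          # True
```

If a slice must have a particular length, comparing `len(clipped)` with the expected count is a cheap safeguard.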


## 2D arrays

For practical purposes, a 2D array behaves like an array of rows, each of which is a 1D array. All of the rows must have the same length.

In [11]:
A = np.array( [[1,2],[3,4],[5,6]] )
A.shape

(3, 2)

You can slice in each dimension individually.

In [12]:
showthem(["A[-2:,:2]"])


A[-2:,:2] =
[[3 4]
 [5 6]]


A `:` in one slice position means to keep everything in that dimension.

In [13]:
showthem(["A[:2,:]","A[:,0]"])


A[:2,:] =
[[1 2]
 [3 4]]

A[:,0] =
[1 3 5]
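
Notice that `A[:,0]` came back as a 1D array. An integer index removes a dimension, while a slice keeps it. The sketch below rebuilds the same `A` and compares the shapes; `col_1d` and `col_2d` are names we made up for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])

col_1d = A[:, 0]     # integer index: the column comes back as a 1D array
col_2d = A[:, :1]    # slice of width 1: the result is still 2D

print(col_1d.shape)  # (3,)
print(col_2d.shape)  # (3, 1)
```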


Here are some other common ways to build arrays.

In [14]:
showthem(["np.ones(5)","np.zeros((3,6))","np.repeat(np.pi,3)"])


np.ones(5) =
[1. 1. 1. 1. 1.]

np.zeros((3,6)) =
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]

np.repeat(np.pi,3) =
[3.14159265 3.14159265 3.14159265]
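
A few more constructors tend to come up often. The following sketch is not from the original notebook; it shows `np.arange`, `np.linspace`, and `np.full`, all standard numpy functions, with arguments chosen arbitrarily.

```python
import numpy as np

evens = np.arange(0, 10, 2)     # start, stop (excluded), step
grid = np.linspace(0, 1, 5)     # 5 evenly spaced values, both endpoints included
sevens = np.full((2, 3), 7)     # a 2-by-3 array filled with the value 7

print(evens)    # [0 2 4 6 8]
print(grid)     # [0.   0.25 0.5  0.75 1.  ]
print(sevens)
```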


## Reductions

A common task is to *reduce* an array along one direction, called an *axis* in numpy, resulting in an array of one less dimension. It's easiest to explain by some examples.

In [15]:
showthem(["A","np.sum(A,axis=0)","np.sum(A,axis=1)"])


A =
[[1 2]
 [3 4]
 [5 6]]

np.sum(A,axis=0) =
[ 9 12]

np.sum(A,axis=1) =
[ 3  7 11]


If you don't give an axis, the reduction occurs over all directions at once, resulting in a single number.

In [16]:
showthem(["np.sum(A)"])


np.sum(A) =
21


You can also do reductions with maximum, minimum, mean, etc.
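
For instance, here is a short sketch using the same `A` as above with `np.max`, `np.min`, and `np.mean`. The result names are ours.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])

col_max = np.max(A, axis=0)     # largest value in each column
row_min = np.min(A, axis=1)     # smallest value in each row
col_mean = np.mean(A, axis=0)   # average of each column
overall = np.mean(A)            # no axis given: average of all entries

print(col_max)    # [5 6]
print(row_min)    # [1 3 5]
print(col_mean)   # [3. 4.]
print(overall)    # 3.5
```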

## Broadcasting

You can add together arrays of the same shape, just as you would expect. Generally, you cannot do that for arrays with different shapes.

In [17]:
x = np.array([1,2,3,4,5])
y = np.array([10,20])
showthem(["x+x","x+y"])


x+x =
[ 2  4  6  8 10]

x+y =


ValueError: operands could not be broadcast together with shapes (5,) (2,) 

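There is a major exception, though: when one of the operands can be repeated along an axis to become the same shape as the other, numpy does that repetition automatically. This is called *broadcasting*. In the examples that follow, the number `3` is repeated to match every element of `x`, and `y` (shape `(2,)`) is repeated down the 3 rows of `A`.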

In [18]:
showthem(["x+3","A-y"])


x+3 =
[4 5 6 7 8]

A-y =
[[ -9 -18]
 [ -7 -16]
 [ -5 -14]]


For example, suppose that for every element in a column, we want to subtract off the average value in that column. This becomes easy via broadcasting.

In [19]:
showthem(["A-np.mean(A,axis=0)"])


A-np.mean(A,axis=0) =
[[-2. -2.]
 [ 0.  0.]
 [ 2.  2.]]
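
The same idea works along the other axis, with one wrinkle. The sketch below assumes we want to subtract each row's mean instead. It uses `keepdims=True`, a standard argument of `np.mean`, to keep the reduced axis with length 1 so that the result can be repeated across the columns.

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])

# Row means as a plain 1D array have shape (3,), which does not line up
# with the last axis of A (length 2), so broadcasting fails.
try:
    A - np.mean(A, axis=1)
    failed = False
except ValueError:
    failed = True

# With keepdims=True the row means have shape (3, 1), which can be
# repeated across the 2 columns of A.
row_means = np.mean(A, axis=1, keepdims=True)
centered = A - row_means

print(failed)            # True
print(row_means.shape)   # (3, 1)
print(centered)
```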


## Random numbers

Generating truly random numbers on a computer is not simple. Mostly we rely on *pseudorandom* numbers, which are generated by deterministic functions having extremely long periods. One nice consequence is repeatability. Given the starting "seed" or state of the random generator, you should be able to get exactly the same pseudorandom sequence every time.

We will rely on pseudorandom numbers in two ways. First, many algorithms in data science have at least one random aspect (dividing data into subsets, for example). The library routines we will be using allow you to specify the random state and get repeatable results. Second, we might occasionally want to generate random values for our own use.

In [20]:
from numpy.random import default_rng
rng = default_rng(19716)  # giving an initial state to the generator

The `uniform` generator method produces numbers distributed uniformly (i.e., every value is equally likely) between two limits you specify.

In [21]:
rng.uniform(-1,1,size=(4,3))

array([[ 0.95168753,  0.31539479, -0.665158  ],
       [ 0.4292572 ,  0.96076254,  0.35726498],
       [-0.95950348, -0.83727892, -0.88239937],
       [ 0.57263239,  0.80400543, -0.38772561]])
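
Repeatability is easy to check for yourself. In the sketch below, two generators (named `rng_a` and `rng_b` here for illustration) are created from the same seed and produce identical values, while continuing to draw from one of them moves on to new values.

```python
import numpy as np
from numpy.random import default_rng

rng_a = default_rng(19716)
rng_b = default_rng(19716)      # same seed, so same starting state

a = rng_a.uniform(-1, 1, size=3)
b = rng_b.uniform(-1, 1, size=3)
print(np.array_equal(a, b))     # True

c = rng_a.uniform(-1, 1, size=3)   # the state has advanced, so these are new values
print(np.array_equal(a, c))     # False
```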

Another common type of random value is generated by `normal`, which produces real values distributed according to the "bell curve" (normal or Gaussian distribution).

In [22]:
rng.normal(size=(2,4))

array([[ 2.07503   , -0.07714901,  1.19144657, -0.65962782],
       [-0.85564896,  1.68707045,  1.19491051,  1.44790912]])

By default, `normal` uses a mean of 0 and a standard deviation of 1, so in the long run, the average of these values tends to zero.

In [23]:
x = rng.normal(size=400000)
np.mean(x)

-0.0025048371736856203

Results with absolute value greater than 3 (that is, more than three standard deviations from the mean) should occur at a rate of about 0.27%, well below 1%.

In [24]:
np.mean( np.abs(x)>3 )

0.00272
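
For comparison, the exact tail probability of a standard normal distribution can be computed with the complementary error function from Python's standard library. This is a side calculation of ours, not part of the original notebook; it uses the identity $P(|Z|>t) = \operatorname{erfc}(t/\sqrt{2})$.

```python
import math

# Probability that a standard normal value lies more than 3 units from zero
p_tail = math.erfc(3 / math.sqrt(2))
print(round(p_tail, 5))   # 0.0027
```

The simulated rate above agrees with this value to within ordinary sampling error.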
