# A Single Variable: Shape and Distribution

## Workshop: NumPy

NumPy objects are of type `ndarray`. You can create an `ndarray` by:

- Converting a Python list
- Using a factory function that returns a populated vector
- Reading data from a file directly into a NumPy object

In [1]:
import numpy as np

### Creating vectors

#### From a list

In [2]:
vec1 = np.array([0., 1., 2., 3., 4.,])

#### numpy.arange( start_inclusive, stop_exclusive, step )

In [3]:
vec2 = np.arange(0,5,1, dtype=float)

#### numpy.linspace( start_inclusive, stop_inclusive, number_elements )

In [4]:
vec3 = np.linspace(0,4,5)

#### numpy.zeros( n )

In [5]:
vec4 = np.zeros(5)
for i in range(5):
    vec4[i] = i

#### Read from a text file, one number per row

In [6]:
vec5 = np.loadtxt("ch2/data")
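
`np.loadtxt` also accepts a file-like object, so the expected layout (one number per row) can be checked without the `ch2/data` file. A minimal sketch, using an in-memory `io.StringIO` buffer as a stand-in for the file; the buffer contents are illustrative.

```python
import io

import numpy as np

# Stand-in for a text file with one number per row (illustrative contents)
fake_file = io.StringIO("0\n1\n2\n3\n4\n")

vec_from_text = np.loadtxt(fake_file)
print(vec_from_text)        # [0. 1. 2. 3. 4.]
print(vec_from_text.dtype)  # float64: loadtxt parses values as floats by default
```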

In [7]:
vectors = [vec1, vec2, vec3, vec4, vec5]
for v in vectors:
    print(v)

[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]
[0. 1. 2. 3. 4.]


Part of the appeal of NumPy vectors is the ability to do math across whole vectors without writing loops. An explicit Python loop makes the interpreter do extra work for every element, while NumPy performs the element-by-element work in compiled code.

#### Adding a vector to another

In [8]:
v1 = vec1 + vec2
print(v1)

[0. 2. 4. 6. 8.]


#### The equivalent loop (unnecessary with NumPy)

In [9]:
v2 = np.zeros(5)
for i in range(5):
    v2[i] = vec1[i] + vec2[i]

print(v2)

[0. 2. 4. 6. 8.]


#### Adding vectors in place

In [10]:
vec1 += vec2
print(vec1)
print(vec2)

[0. 2. 4. 6. 8.]
[0. 1. 2. 3. 4.]


#### Broadcasting: scalar multiplication and addition

In [11]:
v3 = 2*vec3
v4 = vec4 + 3

print("v3: ", v3)
print("v4: ", v4)

v3:  [0. 2. 4. 6. 8.]
v4:  [3. 4. 5. 6. 7.]
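
A scalar is broadcast to every element whatever the array's shape, and the same mechanism lets a 1-D vector combine with each row of a 2-D array whose last dimension matches. A short sketch; the array names and values are illustrative.

```python
import numpy as np

grid = np.arange(12, dtype=float).reshape(3, 4)  # shape (3, 4)
row = np.array([10., 20., 30., 40.])             # shape (4,)

# The scalar is applied to all 12 elements
scaled = 2 * grid

# The row vector is stretched across each of the 3 rows
shifted = grid + row

print(scaled.shape, shifted.shape)  # (3, 4) (3, 4)
print(shifted[0])                   # [10. 21. 32. 43.]
```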


#### Ufuncs: applying a function to a vector, element by element

In [27]:
v5 = np.sin(vec5)
print(v5)

def testfunc(n):
    return 2*n
v5_2 = testfunc(v5)
print(v5_2)

[ 0.          0.84147098  0.90929743  0.14112001 -0.7568025 ]
[ 0.          1.68294197  1.81859485  0.28224002 -1.51360499]
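
`np.sin` is a true ufunc. `testfunc`, by contrast, is an ordinary Python function; it works on a whole vector only because `2*n` is itself a vectorized operation. A function that branches with `if` would fail on an array, and `np.where` is one way to express such a branch element by element. The helper names `double` and `clip_negative` are hypothetical.

```python
import numpy as np

x = np.array([0., 1., 2., 3., 4.])

print(isinstance(np.sin, np.ufunc))  # True

def double(n):
    # Works on arrays because 2*n is already element-wise
    return 2 * n

def clip_negative(n):
    # "if n < 0" would raise ValueError for arrays with more than one element;
    # np.where evaluates the condition element by element instead
    return np.where(n < 0, 0.0, n)

print(double(x))                 # [0. 2. 4. 6. 8.]
print(clip_negative(np.sin(x)))  # negative sine values replaced by 0.0
```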


#### Converting back to python list

In [14]:
lst = v5.tolist()
print(lst)

[0.0, 0.8414709848078965, 0.9092974268256817, 0.1411200080598672, -0.7568024953079282]


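As with addition, multiplication with `*` happens on a per-element basis. The dot product (vector multiplication) is computed with the `dot()` function; summing the element-wise products, as below, gives the same value.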

In [17]:
v6 = v3 * v4
print(v6)
print(sum(v6))

[ 0.  8. 20. 36. 56.]
120.0
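
Summing the element-wise products, as above, is exactly the dot product. `np.dot` and the `@` operator compute it in a single call:

```python
import numpy as np

v3 = np.array([0., 2., 4., 6., 8.])
v4 = np.array([3., 4., 5., 6., 7.])

print(np.sum(v3 * v4))  # 120.0: element-wise product, then sum
print(np.dot(v3, v4))   # 120.0
print(v3 @ v4)          # 120.0: same operation via the matmul operator
```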


### Calculating kernel density estimates (KDEs)

I've added the length of each presidential term, in days, to ch2/presidents.txt (third tab-separated column). The kde function does all the work of generating the estimate, and the loop that follows evaluates it at 1000 evenly spaced points running from one bandwidth below the shortest term to one bandwidth above the longest.
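
Written out, the `kde` function in the next cell evaluates, at position $z$, a sum of Gaussian bumps of width $w$ (the bandwidth), one centered on each data point $x_i$ in `xv`:

$$
\hat{f}(z) = \sum_{i=1}^{n} \frac{1}{\sqrt{2\pi w^2}} \exp\!\left(-\frac{1}{2}\left(\frac{z - x_i}{w}\right)^2\right)
$$

Because the bumps are summed rather than averaged, this curve integrates to $n$ rather than 1; dividing by $n$ turns it into a normalized density.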

In [18]:
from numpy import exp,sqrt,pi

# z: position, w: bandwidth, xv: vector of points
def kde(z, w, xv):
    return sum(exp(-0.5*((z-xv)/w)**2) / sqrt(2*pi*w**2))

In [24]:
data = np.loadtxt("ch2/presidents.txt", usecols=(2,), delimiter="\t")
#print(data)

In [28]:
w = 2.5

for x in np.linspace(min(data)-w, max(data)+w, 1000):
    print(x, kde(x,w,data))

28.5 0.09678828980765734
32.930430430430434 0.11843935793037211
37.36086086086086 0.006269408064452273
41.79129129129129 1.4355366927568755e-05
46.22172172172172 1.4218681392245661e-09
50.652152152152155 6.0920241115248776e-15
55.08258258258258 1.1290707847336816e-21
59.51301301301301 9.051863411341183e-30
63.94344344344344 3.139150082067488e-39
68.37387387387388 4.7091598551027585e-50
72.80430430430431 3.0558501371219356e-62
77.23473473473473 8.577839910889595e-76
81.66516516516516 1.0415523112647514e-90
86.0955955955956 5.470682404151574e-107
90.52602602602602 1.242965814942318e-124
94.95645645645645 1.221614367100489e-143
99.38688688688688 5.193574246131471e-164
103.81731731731732 9.551153461835636e-186
108.24774774774775 7.598057349448386e-209
112.67817817817817 2.614607948109948e-233
117.1086086086086 1.605899079816814e-234
121.53903903903904 5.428171972927913e-210
125.96946946946946 7.936814189375364e-187
130.39989989989988 5.019911878360797e-165
134.83033033033033 1.373419595395
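
The loop above calls `kde` once per grid point. Broadcasting can evaluate all grid points at once by building a (grid × data) matrix of differences. A minimal sketch, assuming the same Gaussian kernel; `kde_grid` and the sample values are illustrative, and unlike `kde` it divides by the number of data points so the result integrates to roughly 1.

```python
import numpy as np

def kde_grid(grid, w, xv):
    """Gaussian KDE evaluated at every point of `grid` in one pass.

    grid: 1-D array of positions, w: bandwidth, xv: 1-D array of data points.
    Returns the density normalized by len(xv).
    """
    diffs = (grid[:, None] - xv[None, :]) / w  # shape (len(grid), len(xv))
    bumps = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi * w**2)
    return bumps.sum(axis=1) / len(xv)

# Illustrative data, not the presidents file
sample = np.array([1.0, 2.0, 2.5, 4.0])
w = 0.5
grid = np.linspace(sample.min() - 5 * w, sample.max() + 5 * w, 2001)
density = kde_grid(grid, w, sample)

# Rough check: a Riemann sum of the density should be close to 1
dx = grid[1] - grid[0]
print(density.sum() * dx)
```

The trade-off is memory: the intermediate matrix has `len(grid) * len(xv)` entries, which is negligible here but grows quickly for large grids and datasets.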

### NumPy Data Types

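It can be helpful to view NumPy as a wrapper around underlying C buffers of memory. When you work with NumPy objects, you are working with blocks of memory allocated by C. This means that when you specify a type (a `dtype`) for an array or a NumPy universal function, you are choosing a fixed-size C type such as `float64` or `int32` rather than a Python type; a Python type such as `float` passed as a `dtype` is mapped to its C equivalent (`float64`).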
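
Every array has a single `dtype` with a fixed size per element. A short sketch of what that means in practice; the values are illustrative.

```python
import numpy as np

a = np.arange(0, 5, 1, dtype=float)
print(a.dtype)     # float64: Python float maps to a C double
print(a.itemsize)  # 8 bytes per element
print(a.nbytes)    # 40 bytes for 5 elements

b = np.array([1, 2, 3], dtype=np.int32)
print(b.itemsize)  # 4

# Fixed-width integers wrap around on overflow instead of growing like Python ints
c = np.array([127], dtype=np.int8)
c += 1
print(c)  # [-128]
```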

### NumPy vector shapes

In [30]:
d1 = np.linspace(0,11,12)
d2 = np.linspace(0,11,12)

d1.shape = (3,4)
print(d1)
print(d2)

[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]


#### reshape() vs shape

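The reshape() method returns a new array object with the requested shape (a view onto the same data whenever the memory layout allows) and leaves the original array's shape unchanged. Setting the `shape` attribute, as above, reshapes the array itself in place.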

In [32]:
view = d2.reshape((3,4))
total = d1 + view

print(d2,end="\n\n")
print(view, end="\n\n")
print(total)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]

[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]

[[ 0.  2.  4.  6.]
 [ 8. 10. 12. 14.]
 [16. 18. 20. 22.]]
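
Because a view shares the original's data buffer, writing through the view changes the original too, even though the original keeps its shape. A short sketch using `np.shares_memory` to confirm the aliasing; the names are illustrative.

```python
import numpy as np

flat = np.linspace(0, 11, 12)
grid = flat.reshape((3, 4))

print(flat.shape, grid.shape)        # (12,) (3, 4)
print(np.shares_memory(flat, grid))  # True: same underlying buffer

grid[0, 0] = 100.0  # write through the view...
print(flat[0])      # ...100.0: the original sees the change
```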


#### Element access: not `[row][col]` as with nested lists, but `[row, col]`

In [34]:
print(d1[0,1])
print()
print(view[0,1])
print()
print(d2[1])

1.0

1.0

1.0
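
For comparison with a nested Python list: NumPy takes all indices inside one pair of brackets. Chained brackets also work on an array, but they first build an intermediate row view. Negative indices count from the end, as with lists. `d2[1]` matches `view[0, 1]` above because NumPy lays the 12 values out row by row by default. A short sketch; `m` and `nested` are illustrative names.

```python
import numpy as np

m = np.linspace(0, 11, 12).reshape((3, 4))
nested = m.tolist()  # plain Python list of lists

print(m[0, 1])       # 1.0: NumPy style, [row, col]
print(nested[0][1])  # 1.0: Python list style, [row][col]
print(m[0][1])       # 1.0: works, but m[0] builds an intermediate row first
print(m[-1, -1])     # 11.0: last row, last column
```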


#### Shape or layout information

In [36]:
print(d1.shape)
print(d2.shape)
print(view.shape)

(3, 4)
(12,)
(3, 4)


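#### Number of elements

`size` counts every element in the array, while `len()` returns only the length of the first axis: the number of rows for a 2-D array.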

In [37]:
print(d1.size)
print(len(d1))
print(len(d2))

12
3
12
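
These quantities can be checked against `shape` directly; `ndim` additionally reports the number of axes:

```python
import numpy as np

d1 = np.linspace(0, 11, 12).reshape((3, 4))

print(d1.ndim)                       # 2 axes
print(d1.size == np.prod(d1.shape))  # True: 3 * 4 = 12
print(len(d1) == d1.shape[0])        # True: len() counts rows only
```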


#### Slicing and advanced indexing

In [40]:
print(d1, end="\n\n")

# Slicing
print(d1[0,:],end="\n\n")
print(d1[:,1])

[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]

[0. 1. 2. 3.]

[1. 5. 9.]


In [42]:
# Individual element: scalar
print(d1[0,1])

# Subvector of shape (1,)
print(d1[0:1,1])

# Subarray of shape (1, 1)
print(d1[0:1,1:2])

1.0
[1.]
[[1.]]


In [48]:
# Integer-array indexing: columns 2 and 0, in that order
print(d1[ :, [2,0] ])

[[ 2.  0.]
 [ 6.  4.]
 [10.  8.]]


In [51]:
# Boolean indexing
truths = np.array( [False, True, True] )
print( d1[truths, :])

[[ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
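
In practice the boolean mask usually comes from a comparison rather than being typed out. A sketch reproducing the selection above from a condition on the first column (the condition itself is illustrative):

```python
import numpy as np

d1 = np.linspace(0, 11, 12).reshape((3, 4))

mask = d1[:, 0] > 0  # array([False,  True,  True])
print(d1[mask, :])   # same two rows as d1[truths, :]

# A mask with the array's full shape selects individual elements
# and returns them as a flat 1-D array
print(d1[d1 > 8])    # [ 9. 10. 11.]
```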


Two important notes:

- **slicing** returns **views**
- **advanced indexing** returns **copies**
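
The difference matters as soon as you write to the result. A short check, using `np.shares_memory` to confirm which results alias the original:

```python
import numpy as np

d1 = np.linspace(0, 11, 12).reshape((3, 4))

row = d1[0, :]        # basic slice -> view
cols = d1[:, [2, 0]]  # integer-array (advanced) index -> copy

print(np.shares_memory(d1, row))   # True
print(np.shares_memory(d1, cols))  # False

row[0] = -1.0      # changes d1 as well
cols[0, 0] = -2.0  # leaves d1 untouched
print(d1[0, 0], d1[0, 2])  # -1.0 2.0
```

When an independent array is needed from a slice, calling `.copy()` on it breaks the link to the original.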