# Intro to NumPy

* NumPy is the main scientific computing package for Python - it allows you to easily work with large arrays of data and supports functionality for many common operations (including linear algebra)

* It is all about doing computations on large data sets all at once - you can do many, many things without looping, which is much more efficient

-  [based on this numpy quickstart guide](https://docs.scipy.org/doc/numpy/user/quickstart.html)

-  [NumPy main page](http://www.numpy.org/)

- [NumPy and SciPy doc page](https://docs.scipy.org/doc/)

In [0]:
# import numpy and other stuff for this tutorial
import numpy as np

# import a specific function from NumPy cause we'll use it a lot
from numpy import pi

# functionality for plotting
import matplotlib.pyplot as plt

## initialize array and a few basic operations

In [0]:
# set up an array and figure out shape...  
# np.arange method works just like the built in range function
my_array = np.arange(10)   
print(my_array)

# the interval includes `start` but excludes `stop`, so the overall interval is [start...stop-1]

my_array.shape     

In [0]:
# can specify start, stop and step
seq_array = np.arange(0,30,5)     # start, stop (stop at < X), step size
print(seq_array)
# note that 30 is not in there...

In [0]:
# reshape array
my_array = np.arange(36)
my_array = my_array.reshape(6,6)    # 3,12,  9,4
print(my_array.shape)   
print(my_array)
# why are (3,12) and (9,4) ok but (5,7) not ok? 
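A reshape only succeeds when the product of the new dimensions equals the number of elements in the array. The following is a minimal sketch of that rule (the variable names `a`, `inferred` and `reshape_failed` are illustrative, not part of the tutorial); passing `-1` asks NumPy to infer one dimension from the others.

```python
import numpy as np

a = np.arange(36)

# any shape whose dims multiply to 36 works
print(a.reshape(3, 12).shape)
print(a.reshape(9, 4).shape)

# -1 tells NumPy to infer that dimension from the others
inferred = a.reshape(4, -1)
print(inferred.shape)

# a shape whose product is not 36 raises a ValueError
try:
    a.reshape(5, 7)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('reshape failed:', err)
```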

In [0]:

# reshape array - more complex...
my_array = np.arange(100)
my_array = my_array.reshape(5,5,4)   # 2,5,10
print(my_array.shape)   
print(my_array)

# NOTICE how the dims stack on top of each other! there are five 5x4 matrices
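To make the stacking concrete, here is a small sketch (the names `cube` and `first_block` are made up for illustration) showing that indexing the first dimension returns one 5x4 matrix, and how a 3D index maps back to the original 0-99 sequence.

```python
import numpy as np

cube = np.arange(100).reshape(5, 5, 4)

# indexing only the first dim returns one of the five 5x4 matrices
first_block = cube[0]
print(first_block.shape)

# element [i, j, k] holds the value i*20 + j*4 + k from the original sequence
i, j, k = 1, 2, 3
print(cube[i, j, k], i * 20 + j * 4 + k)
```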

## data types (and remember - Python is a strongly typed language)

In [0]:
print('Dims of data:', my_array.ndim)              # number of dims
print('Name of data type:', my_array.dtype)   # name of data type (float, int32, int64 etc)
print('Size of each element (bytes):', my_array.itemsize)          # size of each element in bytes
print('Total number of elements in array:', my_array.size)         # total number of elements in array
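These attributes are related: the memory used by the data is `size * itemsize`, which NumPy also exposes as `nbytes`. A short sketch, assuming an explicit `int64` dtype so the numbers are the same on every platform (the names `a` and `small` are illustrative):

```python
import numpy as np

a = np.arange(100, dtype=np.int64).reshape(5, 5, 4)

print(a.itemsize)            # bytes per element for int64
print(a.size * a.itemsize)   # total bytes, computed by hand
print(a.nbytes)              # the same number, provided by NumPy

# a smaller dtype uses proportionally less memory
small = a.astype(np.int16)
print(small.nbytes)
```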

### infer data types upon array creation

In [0]:
# will infer data type based on input values...here we have 1 float so the whole thing is float
float_array = np.array([1.2,2,3])  
float_array.dtype             # the dtype attribute (float64 here)

In [0]:
# can also specify type upon array creation
# here make a 2D array of int32s
int_array = np.array([[1,2], [6,7]], dtype = 'int32')   # complex, float32, float64, int32, uint32 (unsigned int32), etc

# can also use tuples! any array-like input of numerical values
int_array = np.array(((1,2), (6,7)), dtype = 'int32')   # complex, float32, float64, int32, uint32 (unsigned int32), etc

print(int_array.dtype)
print(int_array)

### what happens if you initialize with floating point numbers but you declare an int data type?

In [0]:
int_array = np.array([1.1,7.5], dtype = 'int32')   # complex, float32, float64, int32, uint32 (unsigned int32), etc
int_array
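The answer is that the fractional part is dropped (truncation toward zero), not rounded. A minimal sketch of the difference, using an extra negative value that is not in the cell above to show the direction of truncation; `np.rint` is one way to round first if that is what you want:

```python
import numpy as np

vals = [1.1, 7.5, -1.7]

# declaring an int dtype truncates toward zero: 1, 7, -1
truncated = np.array(vals, dtype='int32')
print(truncated)

# round to the nearest integer first, then convert: 1, 8, -2
rounded = np.rint(vals).astype('int32')
print(rounded)
```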

## Allocate arrays of zeros, ones or rand to reserve the memory before filling up later 

<div class="alert alert-info">
handy when you know what size you need, but you're not ready to fill it up yet...saves you from dynamically resizing the matrix during analysis, which is very slow
</div>
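As a rough illustration of the point above, the sketch below fills a preallocated array in place and, for comparison, grows an array with `np.append`, which copies the whole array on every call. The size `n` and the variable names are arbitrary choices for the example.

```python
import numpy as np

n = 1000

# preallocate once, then fill in place
filled = np.zeros(n)
for i in range(n):
    filled[i] = i ** 2

# growing with np.append allocates and copies a new array each iteration
grown = np.array([])
for i in range(n):
    grown = np.append(grown, i ** 2)

# both approaches give the same values; only the cost differs
print(np.array_equal(filled, grown))
```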

In [0]:
# note the () around the dims because you usually specify as a tuple...
# default type is float64...can also pass in a list
zero_array = np.zeros( (3,4) )   
print('Data type:', zero_array.dtype)

# explicitly declare data type
zero_array = np.zeros( (3,4), dtype=np.int32)    
print(zero_array.dtype)
print(zero_array)

In [0]:
# ones
# note the 3D output below...4, 4x4 squares of floating point 1s...
np.ones( (4,4,4), dtype=np.float64 )

In [0]:

# what if you want to initialize an array of 10s?
np.ones( (4,4,4), dtype=np.float64 ) * 10

In [0]:
# and empty...not really 'empty' but initialized with variable output determined 
# by current state of memory
np.empty( (2,2,2), dtype = np.float32)

In [0]:
# an alternate way to initialize an array with arbitrary values
# note that 'full' will guess best data type given init value
arr = np.full((2,2), np.nan)

## Can also create sequences of numbers using arange...

In [0]:
seq_array = np.arange(0,10,.56788)    # decimal input is ok too 
# (and again - stop is NOT included)
print(seq_array)

Because of machine precision issues, it is sometimes hard to predict how many elements will end up in an array created with arange. It is therefore often better to specify a sequence by its start point, stop point, and the exact number of elements that you want. linspace (linear spacing) is the function for this. Note that unlike arange, which ends before the stop point, linspace by default ends exactly at the specified stop point. 

In [0]:
# start, stop, number of linearly spaced points from start to stop...
# note that start AND stop are included!
lin_array = np.linspace(0,180,9) 
print(lin_array)
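The precision issue with arange can be seen in a small case. The sketch below assumes standard IEEE-754 double precision; with a step of 0.1, rounding error lets the stop value slip into the arange result, while linspace returns exactly the number of points requested. The `endpoint=False` argument is shown as one way to exclude the stop point with linspace.

```python
import numpy as np

# stop (1.3) should be excluded, but rounding error adds an extra element
a = np.arange(1, 1.3, 0.1)
print(a, a.size)

# linspace: exactly 4 points from 1 to 1.3, both ends included
b = np.linspace(1, 1.3, 4)
print(b, b.size)

# to exclude the stop point, pass endpoint=False
c = np.linspace(1, 1.3, 3, endpoint=False)
print(c, c.size)
```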

## Common use of linspace...eval a function over an interval. quick intro to basic plotting here too...

In [0]:
# eval sin function over an interval
lin_arr = np.linspace(0, 2*pi, 360)
sw = np.sin(lin_arr)

# plotting - can play with formatting here...change line color and other 
# properties
# note we assign a handle
h = plt.plot(lin_arr*180/pi, sw, 'ro-', linewidth = 4)    # specify x,y data...convert rad to deg for x-axis

# label each axis and give it a title
plt.xlabel('angle (deg)')
plt.ylabel('Amplitude')
plt.title('Sin Wave')
plt.grid(1)
plt.show()

In [0]:
# Controlling figure properties. 

# figure out all settings to tweak...
plt.setp(h)

In [0]:
# eval sin function over an interval
lin_arr = np.linspace(0, 2*pi, 360)
sw = np.sin(lin_arr)

# plot
h = plt.plot(lin_arr*180/pi, sw, 'ko-', linewidth = 4)    # specify x,y data...convert rad to deg for x-axis

plt.xlabel('angle (deg)')
plt.ylabel('Amplitude')
plt.title('Sin Wave')
plt.grid(1)
plt.setp(h, 'markersize', 15) 
plt.setp(h, 'alpha', .05) 

plt.show()

### More plotting...scatterplots and legends

In [0]:
# Scatter plots..
N = 30
x = np.linspace(0,9,N)

# random method! - randn is like rand but draws from the standard normal N(0,1)
# What does the *3 do here?
y = x + np.random.randn(x.size)*3   # make a second vector: x + some randn noise 

plt.scatter(x, y, s=50, c='green', alpha=.5, label="X vs Y")  # note alpha or transparency
plt.xlabel("X")
plt.ylabel("Y")

# add a legend!
plt.legend(loc=2)   # 1-4 for each corner of the plot

# show the plot
plt.show()

# all, any, apply_along_axis, argmax, argmin, argsort, average, ...
# bincount, ceil, clip, conj, corrcoef, cov, cross, cumprod, cumsum, ...
# diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, ...
# min, minimum, nonzero, outer, prod, re, round, sort, std, sum, trace, ...
# transpose, var, vdot, vectorize, where

## BIG POINT - ARRAY-LIKE OPERATIONS: Simple elementwise arithmetic operations like + and - work on corresponding elements of arrays.

In [0]:
x = np.linspace(0,2*pi,360)

y = np.sin(x)

print(x-y)
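To show what "elementwise" means, the sketch below compares the vectorized expression with the equivalent explicit loop. The names `vec` and `loop` are illustrative; the loop is only there for comparison and is the slow way to do it.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 360)
y = np.sin(x)

# vectorized: one expression, no Python loop
vec = x - y

# the equivalent explicit loop over corresponding elements
loop = np.empty_like(x)
for i in range(x.size):
    loop[i] = x[i] - y[i]

print(np.allclose(vec, loop))

# a scalar operand is applied to every element
print((x * 2)[:3])
```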


### when dealing with multiple arrays of different data types, the resulting array will take the data type of the highest precision input array (upcasting)!


In [0]:
# declare dtype as int32
x = np.arange(10, dtype='int32')

# this will default to float64
y = np.random.randn(1,10)

# now multiply the int32 array with the float64 array and answer should be the 
# higher precision of the two (float64)
z = x * y 
print('z data type: ', z.dtype)
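You can ask NumPy which type an operation will produce without running it, using `np.result_type`. The sketch below is an illustration only; note that the result is the smallest type that can safely hold both inputs, so combining `int32` with `float32` arrays gives `float64`, not `float32`.

```python
import numpy as np

# the type NumPy would use when combining these dtypes
print(np.result_type(np.int32, np.float64))
print(np.result_type(np.int32, np.float32))

# same rule applied to actual arrays
ints = np.arange(3, dtype='int32')
floats = np.ones(3, dtype='float32')
mixed = ints * floats
print(mixed.dtype)
```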

## Unary operations implemented as methods of the ndarray class

In [0]:
# note the method chain...
x = np.arange(10).reshape(2,5)   # 2 x 5 matrix

print(x.sum())                   # sum of all elements
print(x.sum(axis=0))             # sum of each column (across 1st dim)
print(x.sum(axis=1))             # sum of each row (across 2nd dim)
print(x.sum(0))                  # don't need the axis keyword, can just pass the axis number
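A useful way to remember the axis argument: the axis you name is the one that gets collapsed. The sketch below checks the shapes for the same 2 x 5 matrix; `keepdims=True` and the `mean`/`max` calls are additions for illustration, and they follow the same axis convention.

```python
import numpy as np

x = np.arange(10).reshape(2, 5)

col_sums = x.sum(axis=0)   # collapses the 2 rows -> 5 column sums
row_sums = x.sum(axis=1)   # collapses the 5 columns -> 2 row sums
print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)

# keepdims=True keeps the collapsed axis with length 1
kept = x.sum(axis=1, keepdims=True)
print(kept.shape)

# other reductions take the same axis argument
print(x.mean(axis=0))
print(x.max(axis=1))
```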

## Other common operations...

## Set logic....

In [0]:
x = np.arange(20)
y = np.linspace(0, 20, 21)
print(x.size)
print(y.size)

z = np.union1d(x,y)
print(z, z.size)

# z = np.intersect1d(x,y)
# print(z)

# z = np.unique([np.append(x,y)])
# print(z)
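The other set operations work the same way. A short sketch using the same two arrays; `np.setdiff1d` is not used in the cell above and is included here as one more example of the family.

```python
import numpy as np

x = np.arange(20)             # integers 0..19
y = np.linspace(0, 20, 21)    # floats 0..20

# values present in both arrays
common = np.intersect1d(x, y)
print(common.size)

# values in y that are not in x
only_y = np.setdiff1d(y, x)
print(only_y)

# unique on the concatenated arrays gives the same result as union1d
merged = np.unique(np.append(x, y))
print(np.array_equal(merged, np.union1d(x, y)))
```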

## Slicing...

In [0]:
# create a 1d array
x = np.linspace(0,9,10)
print(x)
x[1]                     # just the second entry, remember 0 based indexing

# specific start and stop points (exclusive)
x[0:2]                   # the first and second entries in the array, so N>=0 and N<2 (note the < upper bound - not inclusive)

# assign the 2nd - 4th element to 100 (index 1,2,3)
x[1:4] = 100               
print(x[1:4])

# start, stop, step interval
print(x[0:8:2])

# reverse x
print(x[::-1])

# iterate over all elements in 1D array x
for i in x:
    print(i*3)    # then i takes the value of each element in x
    

## multidimensional array indexing, slicing etc

In [0]:
x = np.round(np.random.rand(10,5)*10)   # generate a matrix of uniformly distributed random numbers over 0:10
print(x)

x[0,0]     # first row, first column
x[2,3]     # third row, 4th column

x[:, 3]    # all entries in the 4th column 
x[3, :]    # all entries in the 4th row
x[0:2, 4]  # first two entries of the 5th column
x[6, 2:4]  # 7th row, 3rd and 4th entries. 

# if not all dims specified then missing values are considered complete slices
# these three ways of writing all do the same thing...
x[6]       
x[6,]
x[6,:]

# tricks...
print('last row: ', x[-1,:])     # last row
print('last column: ', x[:,-1])  # last column
print('last entry: ', x[-1,-1])  # last value

# iterating goes over the first dim (rows)
for r in x:
    print(r)
        
# can also iterate over all entries in the array using 'flat'
# will proceed along 1st row, then to 2nd row, etc. 
for a in x.flat:
    print(a)

## pull out subset of rows and columns

In [0]:
# generate a matrix of random numbers over 0-1
x = np.random.rand(4,3) 
print(x)

# first two rows - note that you don't have to specify the 2nd dim - and note that 
# '2' here means rows 0 and 1 (not 0 through 2!)
y = x[:2] 
print('\n', y)

# can also take the last two rows...in the same manner...in this case the 3rd and 4th rows (index 2 and 3)
y = x[2:] 
print('\n', y)

# first two rows, 1st column
y = x[:2,0] 
print('\n', y)

# 3rd row to the end, 2nd column to the end
y = x[2:,1:]
print('\n', y)


## important - slicing an array creates a view of it! if you change the view, you also will change the original data!


In [0]:
z = x[2,]
print(z.shape)

# change all values in z using [:]
z[:]=100     # so if you change data in z it will also change in x

print(x)
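If you want to modify a slice without touching the original, take an explicit copy. The sketch below uses an array of zeros instead of random numbers so the effect is easy to see; `np.shares_memory` is one way to check whether two arrays overlap in memory.

```python
import numpy as np

x = np.zeros((4, 3))

# a slice is a view: it shares memory with x
view = x[2]
print(np.shares_memory(x, view))

# .copy() makes an independent array, so x is untouched
safe = x[2].copy()
safe[:] = 100
print(x[2])

# writing through the view changes x
view[:] = 100
print(x[2])
```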

In [0]:
# using logical indexing!
x = np.arange(0,10)
# note that for elementwise logic NumPy uses & (and | and ~) instead of `and`;
# each comparison needs its own parentheses
y = x[(x>3) & (x<7)]
print(y)
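The comparison itself produces a boolean array (a mask), which can be stored and combined. A small sketch with the other two operators, `|` (or) and `~` (not); `np.where` is added here as one way to keep the array's shape while replacing the entries that fail the test.

```python
import numpy as np

x = np.arange(10)

# the mask is a boolean array with the same shape as x
mask = (x > 3) & (x < 7)
print(mask)

# elementwise OR: values below 2 or above 7
either = x[(x < 2) | (x > 7)]
print(either)

# elementwise NOT: everything the mask excludes
negated = x[~mask]
print(negated)

# keep values where the mask is True, put -1 elsewhere
replaced = np.where(mask, x, -1)
print(replaced)
```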

## Fancy indexing...using arrays to index arrays - used all the time in data analysis...

<div class="alert alert-info">
fancy indexing always makes a COPY of the data (unlike slicing which creates a view)!!!
</div>

In [0]:
# define an array
x = np.random.rand(3,4)

# index - a tuple is read as one index per dimension, so x[(2,3)] is the same as x[2,3]
y = (2,3)

# index
print(x)
print('\n x indexed at tuple y: ', x[y])

In [0]:
# can use fancy indexing to extract elements in a particular order
print(x)

# this will extract the 3rd row, then the 2nd row, then the first row
x[[2,1,0]]

# and this will extract all rows from the 2nd, 3rd and then 1st column. 
x[:,[1,2,0]]

In [0]:
# or can pass in multiple arrays...will return a 1D array 
# corresponding to each set of tuples (1,1) and (2,2) in this case
print(x)
x[[1,2],[1,2]]
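The sketch below checks the copy behavior of fancy indexing and contrasts paired indexing with selecting a full sub-block. It uses `np.arange(12)` instead of random numbers so the values are predictable; `np.ix_` is an addition for illustration, one way to get every combination of the listed rows and columns.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# fancy indexing returns a copy: changing it leaves x alone
picked = x[[0, 2]]
picked[:] = -1
print(np.shares_memory(x, picked))
print(x)

# paired index arrays pick individual elements: x[1,1] and x[2,2]
pairs = x[[1, 2], [1, 2]]
print(pairs)

# np.ix_ selects the full block: rows 1,2 crossed with columns 1,2
block = x[np.ix_([1, 2], [1, 2])]
print(block)
```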