![data-x](http://oi64.tinypic.com/o858n4.jpg)

---
# NumPy Data X 

**Author:** Alexander Fred-Ojala & Ikhlaq Sidhu

**License Agreement:** Feel free to do whatever you want with this code

___

# Introduction to NumPy

# What is NumPy?

NumPy stands for Numerical Python and it is the fundamental package for scientific computing with Python. It is a package that lets you efficiently store and manipulate numerical arrays. It contains among other things:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities


# NumPy contains an array object that is "fast"


<img src="https://github.com/ikhlaqsidhu/data-x/raw/master/imgsource/threefundamental.png">


It stores:
* location of a memory block (allocated all at one time)
* a shape (3 x 3 or 1 x 9, etc)
* data type / size of each element

The core feature of NumPy is its multi-dimensional array. In NumPy, dimensions are called axes, and the number of axes is called the rank (available as the `ndim` attribute).
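Each of the three pieces of information listed above can be read directly from any array. The sketch below is a minimal example. The array `arr` is only an illustration, and `np.int32` is chosen so that the byte counts come out the same on every platform.

```python
import numpy as np

# A 3 x 3 array of 32-bit integers (illustrative example)
arr = np.zeros((3, 3), dtype=np.int32)

print(arr.ndim)      # number of axes (the rank): 2
print(arr.shape)     # length of each axis: (3, 3)
print(arr.dtype)     # element type: int32
print(arr.itemsize)  # bytes per element: 4
print(arr.nbytes)    # bytes in the whole memory block: 36
```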

In [1]:
# written for Python 3.6
import numpy as np

In [2]:
np.__version__ # made for v. 1.13.3

'1.12.1'


## Creating a NumPy array
### 1. Simplest possible: pass a Python list to `np.array()`


In [3]:
# Create array from Python list
list1 = [1, 2, 3, 4]
data = np.array(list1)
data

array([1, 2, 3, 4])

In [4]:
# Find out object type
type(data)

numpy.ndarray

In [5]:
# See data type that is stored in the array
data.dtype

dtype('int64')

In [6]:
# The data type is fixed for the whole array: if we store
# a float in an int array, the float is truncated (cast down) to an int
data[0] = 3.14159
print(data)

[3 2 3 4]
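A float stored in an integer array is truncated toward zero, not rounded. You can see this above, where `3.14159` became `3`. It is therefore worth being explicit when you convert. This small sketch uses arbitrary input values to compare `astype` (which truncates) with `np.round` followed by `astype`.

```python
import numpy as np

floats = np.array([3.14159, 2.71828, -1.5, -0.9])

# astype truncates toward zero, like storing floats into an int array
truncated = floats.astype(np.int64)

# np.round rounds to the nearest integer (ties go to the even neighbour)
rounded = np.round(floats).astype(np.int64)

print(truncated)  # [ 3  2 -1  0]
print(rounded)    # [ 3  3 -2 -1]
```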


In [7]:
# NumPy infers the most general data type that can hold every element
list2 = [1.2, 2, 3, 4]
data2 = np.array(list2)
print(data2)
print(data2.dtype) # if any value is a float, all values become floats

[ 1.2  2.   3.   4. ]
float64


In [8]:
# We can manually specify the datatype
list3 = [1, 2, 3]
data3 = np.array(list3, dtype=str) #manually specify data type
print(data3)
print(data3.dtype)

['1' '2' '3']
<U1
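You can check the inference rule above with `dtype.kind`. This is a one-letter code for the type family: 'i' for signed integer, 'f' for float, 'c' for complex, and 'U' for Unicode string. Checking the kind rather than the full dtype avoids platform differences such as `int32` vs `int64`. The sample lists below are only illustrations.

```python
import numpy as np

samples = {
    "ints": [1, 2, 3],
    "ints and a float": [1, 2.0, 3],
    "a complex number": [1, 2.0, 3j],
    "a string": [1, 2.0, "3"],
}

# dtype.kind is a one-letter code for the type family NumPy chose
kinds = {label: np.array(values).dtype.kind for label, values in samples.items()}

for label, kind in kinds.items():
    print(f"{label:>16}: {kind!r}")
```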


In [9]:
# lists can also be much longer
list4 = range(100001)
data = np.array(list4)
data

array([     0,      1,      2, ...,  99998,  99999, 100000])

In [10]:
len(data) # to see the length of the full array

100001

In [12]:
# data = np.array(1,2,3,4, 5,6,7,8,9) # wrong
data = np.array([1,2,3,4,5,6,7,8,9]) # right
data

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [13]:
# see the documentation: the first argument is the object to convert to an array
np.array?

More info on data types can be found here:
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html

# Accessing elements: Slicing and indexing

In [15]:
# Similar to indexing and slicing Python lists:
print(data[:])
print (data[0:3])
print (data[3:])
print (data[::-2])

[1 2 3 4 5 6 7 8 9]
[1 2 3]
[4 5 6 7 8 9]
[9 7 5 3 1]


In [17]:
# more slicing
x = np.array(range(25))
print ('x:',x)
print()
print (x[5:15:2])
print (x[15:5:-1])

x: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]

[ 5  7  9 11 13]
[15 14 13 12 11 10  9  8  7  6]
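One-dimensional slicing follows the same `start:stop:step` rules as Python lists, negative indices included. Here is a quick consistency check on the same 25-element range used above.

```python
import numpy as np

x = np.arange(25)
lst = list(range(25))

slices = [slice(5, 15, 2), slice(15, 5, -1), slice(-3, None), slice(None, None, -5)]
for s in slices:
    # Each array slice holds the same values as the corresponding list slice
    assert x[s].tolist() == lst[s]

print(x[-3:])    # [22 23 24]
print(x[::-5])   # [24 19 14  9  4]
```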


## Arrays are like lists, but different
NumPy stores the full array in a single block of memory, every element has the same type, and the array object keeps a pointer to that block. A Python list, on the other hand, is an array of pointers to separate Python objects scattered around memory.
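You can observe the single-block storage described above through the `flags` and `strides` attributes. `strides` gives the number of bytes to step in memory to reach the next element along each axis. The sketch uses an explicit `int64` dtype so the byte counts are fixed.

```python
import numpy as np

arr = np.arange(10, dtype=np.int64)

print(arr.flags["C_CONTIGUOUS"])  # True: one contiguous block
print(arr.itemsize, arr.strides)  # 8 (8,): consecutive elements are 8 bytes apart
print(arr.nbytes)                 # 80 = 10 elements * 8 bytes

# A slice with a step reuses the same block; it just steps further each time
every_other = arr[::2]
print(every_other.strides)                 # (16,)
print(every_other.flags["C_CONTIGUOUS"])   # False
```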

In [18]:
# Slicing returns a view in NumPy,
# not a copy as is the case with Python lists

data = np.array(range(10))
view = data[0:3]
view

array([0, 1, 2])

In [20]:
l = list(range(10))
copy = l[0:3]
copy

[0, 1, 2]

In [21]:
copy[0] = 99
view[0] = 99
print(copy)
print(view)

[99, 1, 2]
[99  1  2]


In [22]:
print('Python list:',l) # has not changed
print('NumPy array:',data) # has changed

Python list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
NumPy array: [99  1  2  3  4  5  6  7  8  9]


In [23]:
# Creating copies of the array instead of views
arr_copy = data[:3].copy() 
print('Array copy',arr_copy)
arr_copy[0] = 555
print('New array copy',arr_copy)
print('Original array',data) # unchanged, because arr_copy is a copy and not a view

Array copy [99  1  2]
New array copy [555   1   2]
Original array [99  1  2  3  4  5  6  7  8  9]


In [25]:
# the same goes for assignment: it's not a copy, it's the same data
x = np.array(range(25))
print (x)
y = x
y[:] = 0
print (x)
x is y

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]


True
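Three cases come up here: an alias, a view and a copy. To tell them apart, `is` checks object identity and `np.shares_memory` checks whether two arrays overlap in memory. The sketch below starts from fresh arrays. It also shows that indexing with a list of integers returns a copy, unlike slicing.

```python
import numpy as np

data = np.arange(10)
alias = data             # the same object
view = data[0:3]         # a new array object over the same memory
copy = data[0:3].copy()  # a new object with its own memory
fancy = data[[0, 1, 2]]  # integer-list indexing also returns a copy

print(alias is data, np.shares_memory(alias, data))  # True True
print(view is data, np.shares_memory(view, data))    # False True
print(copy is data, np.shares_memory(copy, data))    # False False
print(np.shares_memory(fancy, data))                 # False
print(view.base is data)                             # True
```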

# Arrays are a lot faster than lists

In [26]:
# Arrays are faster and more efficient than lists

x = list(range(100000))
y = [i**2 for i in x]
print (y[0:5])

[0, 1, 4, 9, 16]


In [27]:
# Time the operation with some IPython magic command
print('Time for Python lists:')
list_time = %timeit -o -n 20 [i**2 for i in x]

Time for Python lists:
20.2 ms ± 636 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)


In [28]:
z = np.array(x)
w = z**2
print(w[:5])

[ 0  1  4  9 16]


In [29]:
print('Time for NumPy arrays:')
np_time = %timeit -o -n 20 z**2

Time for NumPy arrays:
70.8 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 20 loops each)


In [30]:
print('NumPy is ' + str(list_time.all_runs[0]//np_time.all_runs[0]) + ' times faster than lists at squaring 100 000 elements.')

NumPy is 208.0 times faster than lists at squaring 100 000 elements.
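`%timeit` only works inside IPython/Jupyter. Outside a notebook, you can make roughly the same comparison with the standard-library `timeit` module. This is a sketch only: timings depend on the machine, so no numbers are shown. The explicit `dtype=np.int64` is an added assumption. It prevents integer overflow when squaring values near 100 000 on platforms where the default integer is 32-bit (for example Windows with NumPy < 2).

```python
import timeit

import numpy as np

x = list(range(100_000))
z = np.array(x, dtype=np.int64)

# Best of 3 repeats, each timing 20 executions; divide to get seconds per execution
list_time = min(timeit.repeat(lambda: [i**2 for i in x], number=20, repeat=3)) / 20
np_time = min(timeit.repeat(lambda: z**2, number=20, repeat=3)) / 20

print(f"list: {list_time * 1e3:.2f} ms per loop")
print(f"NumPy: {np_time * 1e6:.1f} µs per loop")
print(f"NumPy is about {list_time / np_time:.0f} times faster")
```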


# Universal functions
A universal function (ufunc) is a function that operates on an `ndarray` element by element. The available universal functions are listed in the NumPy documentation here:
https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html
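Every ufunc is an object of type `np.ufunc`. Besides element-wise calls, it offers methods such as `reduce` and `accumulate`, plus an optional `out=` argument for writing into an existing array. Here is a short sketch with `np.add` and `np.multiply`.

```python
import numpy as np

xn = np.arange(5)
yn = np.arange(5, 10)

print(isinstance(np.add, np.ufunc))  # True
print(np.add(xn, yn))                # element-wise: [ 5  7  9 11 13]
print(np.add.reduce(xn))             # sum of all elements: 10
print(np.add.accumulate(xn))         # running sum: [ 0  1  3  6 10]

# Write the result into a preallocated array instead of creating a new one
out = np.empty_like(xn)
np.multiply(xn, 3, out=out)
print(out)                           # [ 0  3  6  9 12]
```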

In [31]:
# Arrays are different than lists in another way:
# x and y are lists

x = list(range(5))
y = list(range(5,10))
print ("list x = ", x)
print ("list y = ", y)
print ("x + y = ", x+y)

list x =  [0, 1, 2, 3, 4]
list y =  [5, 6, 7, 8, 9]
x + y =  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [32]:
# now let's try with NumPy arrays:
xn = np.array(x)
yn = np.array(y)
print ('np.array xn =', xn)
print ('np.array yn =', yn)
print ("xn + yn = ", xn + yn)

np.array xn = [0 1 2 3 4]
np.array yn = [5 6 7 8 9]
xn + yn =  [ 5  7  9 11 13]


In [33]:
# + for np.arrays is a wrapper around the function np.add
np.add(xn,yn)

array([ 5,  7,  9, 11, 13])

In [34]:
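# An array is a sequence that can be manipulated easily:
# an arithmetic operation is applied to each element individually.
# When two arrays are added, they must have the same shape
# (or shapes that are compatible under broadcasting).
# Note the difference below: multiplying a list by 3 repeats the list,
# while multiplying an array by 3 multiplies every element.

print (3 * x)
print (3 * xn)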

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
[ 0  3  6  9 12]
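Broadcasting is what lets arrays of different shapes combine. NumPy compares shapes from the last axis backwards, and two axis lengths are compatible if they are equal or one of them is 1. In the sketch below, a (3, 1) column and a length-4 row broadcast to a (3, 4) result. Shapes (3,) and (4,), by contrast, are incompatible.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                # shape (4,), treated as (1, 4)

grid = col * 10 + row             # broadcast to shape (3, 4)
print(grid.shape)  # (3, 4)
print(grid)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

# Lengths 3 and 4 are neither equal nor 1, so this raises ValueError
try:
    np.arange(3) + np.arange(4)
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```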


# Join, add, concatenate

In [35]:
print(xn)
print(yn)

[0 1 2 3 4]
[5 6 7 8 9]


In [36]:
# if you need to join numpy arrays, try hstack, vstack, column_stack, or concatenate
print (np.hstack((xn,yn)))

[0 1 2 3 4 5 6 7 8 9]


In [37]:
print (np.vstack((xn,yn)))

[[0 1 2 3 4]
 [5 6 7 8 9]]


In [38]:
print (np.column_stack((xn,yn)))

[[0 5]
 [1 6]
 [2 7]
 [3 8]
 [4 9]]


In [41]:
print (np.concatenate((xn, yn), axis = 0))

[0 1 2 3 4 5 6 7 8 9]
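These stacking helpers are conveniences for joining along an axis. For 1-D inputs the equivalences below hold. This sketch confirms them with `np.array_equal`.

```python
import numpy as np

xn = np.arange(5)
yn = np.arange(5, 10)

# hstack of 1-D arrays joins them end to end along axis 0
same_h = np.array_equal(np.hstack((xn, yn)), np.concatenate((xn, yn)))

# vstack first turns each 1-D array into a row of shape (1, 5), then joins along axis 0
same_v = np.array_equal(
    np.vstack((xn, yn)),
    np.concatenate((xn[np.newaxis, :], yn[np.newaxis, :]), axis=0),
)

# column_stack of 1-D arrays places them side by side as columns
same_c = np.array_equal(np.column_stack((xn, yn)), np.stack((xn, yn), axis=1))

print(same_h, same_v, same_c)  # True True True
```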


In [42]:
# the elements of an array must be of a type that is valid to perform
# a specific mathematical operation on

data = np.array([1,2,'cat', 4])
print(data)
print(data.dtype)
print (data+1)  # results in error

['1' '2' 'cat' '4']
<U21


TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U21') dtype('<U21') dtype('<U21')
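One way to recover from a mixed array like this is to keep only the elements that look like numbers and convert them explicitly. The sketch uses `np.char.isnumeric`, which tests each string element. Dropping 'cat' is just one possible choice; a real dataset might need a different rule.

```python
import numpy as np

data = np.array([1, 2, 'cat', 4])    # every element was converted to a string

is_number = np.char.isnumeric(data)  # [ True  True False  True]
numbers = data[is_number].astype(np.int64)

print(numbers + 1)  # [2 3 5]
```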

### Creating arrays with 2 axes:


In [43]:
# This list has two dimensions
list3 = [[1, 2, 3],
         [4, 5, 6]]
list3 # nested list

[[1, 2, 3], [4, 5, 6]]

In [44]:
# data = np.array([[1, 2, 3], [4, 5, 6]])
data = np.array(list3)
data

array([[1, 2, 3],
       [4, 5, 6]])

# Attributes of a multidimensional array

In [45]:
print('Dimensions:',data.ndim)
print ('Shape:',data.shape)
print('Size:', data.size)

Dimensions: 2
Shape: (2, 3)
Size: 6


In [46]:
# You can also transpose an array (matrix) with either
# np.transpose(arr) or arr.T
print ('Transpose:')
data.T

# print (list3.T) # note, this would not work

Transpose:


array([[1, 4],
       [2, 5],
       [3, 6]])
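Transposing reverses the shape without copying any data. On a 1-D array it has no effect, so a column vector needs an explicit reshape. Here is a short sketch.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])
t = data.T

print(t.shape)                    # (3, 2): the shape is reversed
print(np.shares_memory(data, t))  # True: .T is a view, no data is copied

vec = np.arange(3)
print(vec.T.shape)                # (3,): transposing a 1-D array changes nothing

col = vec.reshape(3, 1)           # an explicit column vector instead
print(col.shape)                  # (3, 1)
```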

# Other ways to create NumPy arrays

In [47]:
# np.arange() is similar to built in range()
# Creates array with a range of consecutive numbers
# starts at 0 and step=1 if not specified. Exclusive of stop.

np.arange(12)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [48]:
#Array increasing from start to end: np.arange(start, end)
np.arange(10, 20)

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [49]:
# Array increasing from start to end by step: np.arange(start, end, step)
# The range always includes start but excludes end
np.arange(1, 10, 2)

array([1, 3, 5, 7, 9])
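With integer arguments, `np.arange(start, stop, step)` returns `ceil((stop - start) / step)` elements. With a non-integer step, floating-point rounding can make the element count surprising. For that case the `np.arange` documentation suggests `np.linspace`, where you specify the number of points directly. Here is a short comparison.

```python
import math

import numpy as np

a = np.arange(1, 10, 2)
print(a)                                  # [1 3 5 7 9]
print(len(a) == math.ceil((10 - 1) / 2))  # True

# linspace(start, stop, num): num points, both endpoints included
b = np.linspace(0, 1, 11)
print(len(b), b[0], b[-1])                # 11 0.0 1.0
```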

In [50]:
# Returns a new array of the specified shape, filled with zeros.
array=np.zeros((2,5), dtype=np.int8)
array

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]], dtype=int8)

In [51]:
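# Returns a new array of the specified shape, filled with ones.
# np.float128 is not available on every platform (e.g. Windows);
# np.longdouble is the portable name for extended precision.
array=np.ones((2,5), dtype=np.float128)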
array

array([[ 1.0,  1.0,  1.0,  1.0,  1.0],
       [ 1.0,  1.0,  1.0,  1.0,  1.0]], dtype=float128)

In [52]:
# Returns an identity matrix of the given (square) size
array = np.eye(5)
array

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])
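NumPy has a few related constructors in the same spirit. `np.zeros_like` copies the shape and dtype of an existing array, `np.full` fills an array with any value, and `np.eye` accepts a `k` offset for off-diagonal ones. The template array below is only an illustration.

```python
import numpy as np

template = np.arange(6).reshape(2, 3)

zeros = np.zeros_like(template)  # same shape and dtype as template, all zeros
sevens = np.full((2, 3), 7)      # every element set to 7
upper = np.eye(3, k=1)           # ones on the diagonal just above the main one

print(zeros)
print(sevens)
print(upper)
```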

## Some useful indexing strategies

### There are two main types of indexing: Integer and Boolean

In [53]:
x = np.array([[1, 2], [3, 4], [5, 6]]) 
x

array([[1, 2],
       [3, 4],
       [5, 6]])

#### Integer indexing

In [54]:
# the first index selects the row, the second selects the column
print(x[1,0])

3


In [55]:
print(x[1:,:]) # rows from index 1 onward, all columns

[[3 4]
 [5 6]]


In [56]:
# the first list holds row indices, the second holds column indices,
# so this selects the elements at (0,0), (1,1) and (2,1)
idx = x[[0,1,2], [0,1,1]]
print (idx)


[1 4 6]
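Integer-array indexing pairs up the row and column lists element by element. It is therefore equivalent to a loop over `zip(rows, cols)`. To select a whole sub-block instead (every listed row crossed with every listed column), `np.ix_` builds the matching index arrays. Here is a sketch on the same `x`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
rows = [0, 1, 2]
cols = [0, 1, 1]

picked = x[rows, cols]  # elements (0,0), (1,1), (2,1)
manual = np.array([x[r, c] for r, c in zip(rows, cols)])

block = x[np.ix_([0, 2], [0, 1])]  # rows 0 and 2, columns 0 and 1
print(picked)  # [1 4 6]
print(block)
# [[1 2]
#  [5 6]]
```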


#### Boolean indexing

In [57]:
print('Comparison operator, find all values greater than 3:\n')
print(x>3)

Comparison operator, find all values greater than 3:

[[False False]
 [False  True]
 [ True  True]]


In [58]:
print('Boolean indexing, only extract elements greater than 3:\n')
print(x[x>3])

Boolean indexing, only extract elements greater than 3:

[4 5 6]
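You can combine conditions element-wise with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `>` and `<`. Python's `and`/`or` do not work element-wise on arrays. `np.where` picks values from two alternatives based on a condition. Here is a sketch on the same `x`.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])

between = x[(x > 1) & (x < 6)]          # both conditions true
odd_or_big = x[(x % 2 == 1) | (x > 4)]  # either condition true
not_even = x[~(x % 2 == 0)]             # negation
capped = np.where(x > 3, 0, x)          # 0 where x > 3, otherwise keep x

print(between)     # [2 3 4 5]
print(odd_or_big)  # [1 3 5 6]
print(not_even)    # [1 3 5]
print(capped)
# [[1 2]
#  [3 0]
#  [0 0]]
```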


### Masks

In [59]:
arr = np.arange(10)
mask = arr>5
print(mask)
arr[mask]

[False False False False False False  True  True  True  True]


array([6, 7, 8, 9])

In [60]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [61]:
# Functions any / all
print( np.any( arr==9 ) )
print( np.all( arr>-1 ) )

True
True
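A boolean mask can also count matches, since `True` counts as 1. It can also be used on the left of an assignment to change only the selected elements in place. Here is a short sketch.

```python
import numpy as np

arr = np.arange(10)
mask = arr > 5

print(mask.sum())              # 4: True counts as 1
print(np.count_nonzero(mask))  # 4

arr[mask] = 0                  # assignment through a mask modifies arr in place
print(arr)                     # [0 1 2 3 4 5 0 0 0 0]
print(np.any(arr == 9), np.all(arr < 6))  # False True
```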


## Extra NumPy array methods

In [62]:
# Reshape is used to change the shape
a = np.arange(0, 15)

print('Original:',a)
a = a.reshape(3, 5)
# a = np.arange(0, 15).reshape(3, 5)  # same thing

print ('Reshaped:')
print(a)


Original: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
Reshaped:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
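Passing `-1` for one dimension lets NumPy infer its length from the total size. On a contiguous array like this one, `reshape` returns a view rather than a copy. The total number of elements must stay the same. Here is a sketch.

```python
import numpy as np

a = np.arange(15)

b = a.reshape(3, -1)           # -1 means "infer this length": 15 / 3 = 5
print(b.shape)                 # (3, 5)
print(np.shares_memory(a, b))  # True: reshape returns a view here

# 15 elements cannot fill a 4 x 4 array, so this raises ValueError
try:
    a.reshape(4, 4)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)          # True
```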


In [64]:
# We can also easily compute the sum, min, max, etc.
print (a)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


In [65]:
print ('Sum:',a.sum())
print('Min:', a.min())
print('Max:', a.max())

Sum: 105
Min: 0
Max: 14


In [66]:
print ('Column sums (axis=0):',a.sum(axis=0))
print ('Row sums (axis=1):',a.sum(axis=1))

# Note: axis specifies which dimension is "collapsed" by the sum

Column sums (axis=0): [15 18 21 24 27]
Row sums (axis=1): [10 35 60]
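Summing over an axis removes that axis from the shape. With `keepdims=True`, the axis is kept with length 1 instead, so the result broadcasts back against the original. A common use is normalising each row by its own total, sketched here on the same 3 x 5 array.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

col_sums = a.sum(axis=0)                 # shape (5,): axis 0 collapsed
row_sums = a.sum(axis=1, keepdims=True)  # shape (3, 1): axis 1 kept with length 1

# keepdims makes row_sums broadcast against a, dividing each row by its own total
shares = a / row_sums
print(col_sums.shape, row_sums.shape)    # (5,) (3, 1)
print(shares.sum(axis=1))                # each row now sums to 1 (up to rounding)
```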


## Array axes
<img src= "https://github.com/ikhlaqsidhu/data-x/raw/master/imgsource/anatomyarray.png">



To get the cumulative product:

In [67]:
print (np.arange(1, 10))
print (np.cumprod(np.arange(1, 10)))

[1 2 3 4 5 6 7 8 9]
[     1      2      6     24    120    720   5040  40320 362880]


To get the cumulative sum:

In [68]:
print (np.arange(1, 10))
np.cumsum(np.arange(1, 10))

[1 2 3 4 5 6 7 8 9]


array([ 1,  3,  6, 10, 15, 21, 28, 36, 45])
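A running sum and `np.diff` undo each other: the last running total is the full sum, and differencing the totals gives back the original values after the first. `np.cumsum` also takes an `axis` argument for 2-D arrays. Here is a short sketch.

```python
import numpy as np

v = np.arange(1, 10)
c = np.cumsum(v)

print(c[-1] == v.sum())  # True: the last running total is the full sum
print(np.diff(c))        # [2 3 4 5 6 7 8 9], i.e. v[1:]

m = np.arange(1, 7).reshape(2, 3)
cm = np.cumsum(m, axis=1)  # running sums within each row
print(cm)
# [[ 1  3  6]
#  [ 4  9 15]]
```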

Creating a 3D array:

In [69]:
a = np.arange(0, 96).reshape(2, 6, 8)
print(a)

[[[ 0  1  2  3  4  5  6  7]
  [ 8  9 10 11 12 13 14 15]
  [16 17 18 19 20 21 22 23]
  [24 25 26 27 28 29 30 31]
  [32 33 34 35 36 37 38 39]
  [40 41 42 43 44 45 46 47]]

 [[48 49 50 51 52 53 54 55]
  [56 57 58 59 60 61 62 63]
  [64 65 66 67 68 69 70 71]
  [72 73 74 75 76 77 78 79]
  [80 81 82 83 84 85 86 87]
  [88 89 90 91 92 93 94 95]]]


In [70]:
# The same methods typically apply in multiple dimensions
print (a.sum(axis = 0))
print ('---')
print (a.sum(axis = 1))

[[ 48  50  52  54  56  58  60  62]
 [ 64  66  68  70  72  74  76  78]
 [ 80  82  84  86  88  90  92  94]
 [ 96  98 100 102 104 106 108 110]
 [112 114 116 118 120 122 124 126]
 [128 130 132 134 136 138 140 142]]
---
[[120 126 132 138 144 150 156 162]
 [408 414 420 426 432 438 444 450]]
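The same rule holds in three dimensions: the summed axis disappears from the shape. A tuple of axes collapses several at once. Here that gives one total per 6 x 8 block.

```python
import numpy as np

a = np.arange(0, 96).reshape(2, 6, 8)

# Summing over an axis removes that axis from the shape
print(a.sum(axis=0).shape)  # (6, 8)
print(a.sum(axis=1).shape)  # (2, 8)
print(a.sum(axis=2).shape)  # (2, 6)

# Collapse axes 1 and 2 together: one total per block of 48 numbers
print(a.sum(axis=(1, 2)))   # [1128 3432]
```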


# More ufuncs and Basic Operations

One of the most useful parts of NumPy is that you can run mathematical operations directly on whole arrays, element by element. Here are some basic operations:

In [71]:
a = np.arange(11, 21)
b = np.arange(0, 10)
print ("a = ",a)
print ("b = ",b)
print (a + b)

a =  [11 12 13 14 15 16 17 18 19 20]
b =  [0 1 2 3 4 5 6 7 8 9]
[11 13 15 17 19 21 23 25 27 29]


In [72]:
a * b

array([  0,  12,  26,  42,  60,  80, 102, 126, 152, 180])

In [73]:
a ** 2

array([121, 144, 169, 196, 225, 256, 289, 324, 361, 400])

You can also do matrix operations, such as the dot (inner) product of two vectors:

In [74]:
a.dot(b)

780
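For 1-D arrays the dot product is the sum of the element-wise products. The `@` operator (Python 3.5+) computes the same thing. Here is a quick check with the same `a` and `b`.

```python
import numpy as np

a = np.arange(11, 21)
b = np.arange(0, 10)

print(a.dot(b))       # 780
print((a * b).sum())  # 780: multiply element-wise, then add up
print(a @ b)          # 780: the @ operator
```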

In [75]:
# Matrix multiplication
c = np.arange(1,5).reshape(2,2)
print ("c = \n", c)
print()
d = np.arange(5,9).reshape(2,2)
print ("d = \n", d)

c = 
 [[1 2]
 [3 4]]

d = 
 [[5 6]
 [7 8]]


In [76]:
print (d.dot(c))

[[23 34]
 [31 46]]


In [77]:
np.matmul(d,c)

array([[23, 34],
       [31, 46]])
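The `@` operator is shorthand for `np.matmul`. It is easy to confuse with `*`, which multiplies element-wise. Matrix products also do not commute. Here is a sketch with the same `c` and `d`.

```python
import numpy as np

c = np.arange(1, 5).reshape(2, 2)
d = np.arange(5, 9).reshape(2, 2)

print(d @ c)  # matrix product, same as np.matmul(d, c)
print(d * c)  # element-wise product, NOT matrix multiplication
print(np.array_equal(d @ c, c @ d))  # False: order matters for matrix products
```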

# Random numbers

In [78]:
# Random numbers
np.random.seed(0)  # set the seed to zero for reproducibility
print(np.random.uniform(1,5,10))   # 10 random uniform numbers in [1, 5)
print()
print (np.random.exponential(1,5)) # 5 random exponential numbers with scale 1 (rate 1)

[ 3.19525402  3.86075747  3.4110535   3.17953273  2.6946192   3.58357645
  2.75034885  4.567092    4.85465104  2.53376608]

[ 1.56889614  0.75267411  0.83943285  2.59825415  0.07368535]


In [79]:
print (np.random.random(8).reshape(2,4)) # 8 random numbers in [0, 1) in a 2 x 4 array

[[ 0.0871293   0.0202184   0.83261985  0.77815675]
 [ 0.87001215  0.97861834  0.79915856  0.46147936]]


If you want to learn more about "random" numbers in NumPy go to: https://docs.scipy.org/doc/numpy-1.12.0/reference/routines.random.html
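Since NumPy 1.17, the recommended interface is a `Generator` created with `np.random.default_rng(seed)`. The functions used above belong to the older (legacy) interface and still work. The sketch below makes the same three draws with the newer API. The generators use different algorithms, so the numbers will not match the output above even with seed 0.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded for reproducibility

u = rng.uniform(1, 5, 10)               # 10 uniform numbers in [1, 5)
e = rng.exponential(scale=1.0, size=5)  # 5 exponential numbers, scale 1 (rate 1)
r = rng.random((2, 4))                  # 2 x 4 array of numbers in [0, 1)

print(u)
print(e)
print(r)

# The same seed reproduces the same sequence
print(np.array_equal(np.random.default_rng(0).uniform(1, 5, 10), u))  # True
```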

# Trigonometric functions

In [81]:
# linspace: create an array of n equally spaced numbers
# from a to b (both endpoints included)

data = np.linspace(0,10,5)
print (data)

[  0.    2.5   5.    7.5  10. ]


In [82]:
from numpy import pi
x = np.linspace(0,pi, 3)
print('x = ', x)
print()
print ("sin(x) = ", np.sin(x))

x =  [ 0.          1.57079633  3.14159265]

sin(x) =  [  0.00000000e+00   1.00000000e+00   1.22464680e-16]
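The last value above is about 1.2e-16 rather than 0. That is because `pi` is stored as the nearest floating-point number, not the exact value. When comparing floating-point results, a tolerance-based check such as `np.isclose` or `np.allclose` is safer than `==`.

```python
import numpy as np

x = np.linspace(0, np.pi, 3)
s = np.sin(x)

print(s[2] == 0)                  # False: a tiny rounding error remains
print(np.isclose(s[2], 0))        # True: equal within the default tolerance
print(np.allclose(s, [0, 1, 0]))  # True
```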


In [83]:
# flatten matrices using ravel()
x = np.array(range(24))
x = x.reshape(4,6)
print('Original:\n',x)
print()
x = x.ravel() # make it flat
print ('Flattened:\n',x)

Original:
 [[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]

Flattened:
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
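`ravel()` returns a view when the data is already laid out contiguously, while `flatten()` always makes a copy. A transposed array is not contiguous in row order, so `ravel()` has to copy in that case. Here is a sketch.

```python
import numpy as np

m = np.arange(24).reshape(4, 6)

r = m.ravel()    # a view, because m is contiguous
f = m.flatten()  # always a new copy

print(np.shares_memory(m, r))  # True
print(np.shares_memory(m, f))  # False

# m.T is not contiguous in row order, so ravel must copy here
print(np.shares_memory(m.T, m.T.ravel()))  # False
```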
