![data-x](http://oi64.tinypic.com/o858n4.jpg)

---
# Data-X: Introduction to Numpy

**Author:** Alexander Fred Ojala, Ikhlaq Sidhu

**License Agreement:** Feel free to do whatever you want with this code

___

# What is NumPy?

NumPy stands for **Numerical Python** and it is the fundamental package for scientific computing in Python. It is a package that lets you efficiently store and manipulate numerical arrays. It contains among other things:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities


# NumPy contains an array object that is "fast"


<img src="https://github.com/ikhlaqsidhu/data-x/raw/master/imgsource/threefundamental.png">


**It stores / consists of**:
* location of a memory block (allocated all at one time)
* a shape (3 x 3 or 1 x 9, etc)
* data type / size of each element

The core feature that NumPy supports is its multi-dimensional array. In NumPy, dimensions are called axes, and the number of axes is called the rank (available as the `ndim` attribute).
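You can see these three ingredients (memory block, shape, data type) on a concrete array by inspecting its attributes. This is a minimal sketch; the array `arr` and its `int32` dtype are chosen only for illustration.

```python
import numpy as np

# A 3 x 3 array of 32-bit integers, all zeros
arr = np.zeros((3, 3), dtype=np.int32)

print(arr.ndim)      # number of axes (the "rank"): 2
print(arr.shape)     # size along each axis: (3, 3)
print(arr.dtype)     # element type: int32
print(arr.itemsize)  # bytes per element: 4
print(arr.nbytes)    # size of the data block: 9 * 4 = 36 bytes
```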

In [None]:
# written for Python 3.x
import numpy as np

In [None]:
np.__version__


## Creating a NumPy array
### 1. Simplest case: pass a Python list to `np.array`


In [None]:
# Create array from Python list
list1 = [1, 2, 3, 4]
data = np.array(list1)
data

In [None]:
# Find out object type
type(data)

In [None]:
# See data type that is stored in the array
data.dtype

In [None]:
data

In [None]:
# The data type applies to the whole array: if we store
# a float in an int array, the float is truncated to an int (3.14159 -> 3)
data[0] = 3.14159
data
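Assigning a float into an integer array drops the fractional part (truncation toward zero). If you actually want floats, convert the whole array explicitly with `astype`, which returns a new array. A small sketch; the variable names are illustrative:

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
ints[0] = -2.7                     # stored as -2: the fractional part is dropped
print(ints)

floats = ints.astype(np.float64)   # new array with a float dtype; ints is unchanged
floats[0] = 3.14159                # now the value is kept as-is
print(floats)
```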

In [None]:
# NumPy picks a single data type that can represent all of the elements
data2 = np.array([1.2, 2, 3, 4])
print(data2)
print(data2.dtype) # all values will be converted to floats

In [None]:
# If one element is a string, NumPy converts every element to a string,
# so the array no longer supports numeric arithmetic

data = np.array([1,2,'cat', 4])
print(data)
print(data.dtype)
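Because the array above now holds strings (its dtype kind is `'U'`, for Unicode), numeric operations no longer apply. One way to recover a numeric array, sketched here under the assumption that only the purely numeric entries are wanted, is to filter them and convert back with `astype`:

```python
import numpy as np

mixed = np.array([1, 2, 'cat', 4])
print(mixed.dtype.kind)   # 'U': every element was converted to a string
print(mixed[0] == '1')    # True: the integer 1 is now the string '1'

# Keep only the entries that look like integers, then convert back to int
numeric = np.array([s for s in mixed if s.isdigit()]).astype(int)
print(numeric)            # [1 2 4]
print(numeric.sum())      # 7
```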

In [None]:
# lists can also be much longer
data = np.array(range(1,100001))
print(data)
print()
print(len(data)) # to see the length of the full array

### More info on data types can be found here:
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html

# Accessing elements: Slicing and indexing
<img src='https://qph.fs.quoracdn.net/main-qimg-6400819662432f726e2b29e2dd40b646' style='width:600px'>

In [None]:
# Similar to indexing and slicing Python lists:
data = np.array(range(10))
print(data[:])
print (data[0:3])
print (data[3:])

In [None]:
print (data[::-1]) # [start : end : step_size]
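The `[start : end : step_size]` pattern also accepts negative indices, which count from the end, and any step size. A few more examples on the same `np.array(range(10))`:

```python
import numpy as np

data = np.array(range(10))
print(data[-3:])     # last three elements: [7 8 9]
print(data[1:8:2])   # every second element from index 1 up to (not incl.) 8: [1 3 5 7]
print(data[8:2:-2])  # walking backwards with step -2: [8 6 4]
```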

## Arrays are like lists, but different
A NumPy array stores all of its elements in one contiguous block of memory, described by a single data type. A Python list, on the other hand, stores pointers to many separate Python objects scattered in memory.
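One way to see the single-memory-block layout is through an array's `nbytes`, `strides`, and `flags` attributes; a list, by contrast, only holds references to separate Python objects. A minimal sketch, with `int64` chosen explicitly so the byte counts are predictable:

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)
print(arr.nbytes)                 # 40: five 8-byte integers stored back to back
print(arr.strides)                # (8,): jump 8 bytes to reach the next element
print(arr.flags['C_CONTIGUOUS'])  # True: one contiguous block

lst = list(range(5))
# Each list entry is a reference to its own Python int object
print([type(item).__name__ for item in lst])
```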

In [None]:
# Slicing a NumPy array returns a view (it shares memory with the original),
# not a copy as is the case with Python lists
data = np.array(range(10))
np_slice = data[0:3]
print(np_slice)

In [None]:
l = list(range(10))
list_slice = l[0:3]
print(list_slice)

In [None]:
np_slice[0] = 99
list_slice[0] = 99
print(np_slice)
print(list_slice)

In [None]:
print('Python list:',l) # has not changed
print('NumPy array:',data) # has changed

In [None]:
# Creating copies of the array instead of views
data = np.array(range(10))
arr_copy = data[:3].copy()
print('Array copy',arr_copy)

In [None]:
arr_copy[0] = 555
print('Array copy',arr_copy)
print('Original array',data) # now it is not a view any more
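If you are unsure whether an array is a view or a copy, `np.shares_memory` and the `.base` attribute can tell you. A short sketch:

```python
import numpy as np

data = np.array(range(10))
arr_view = data[:3]           # basic slicing -> view
arr_copy = data[:3].copy()    # explicit copy

print(np.shares_memory(data, arr_view))  # True
print(np.shares_memory(data, arr_copy))  # False
print(arr_view.base is data)             # True: the view borrows data's memory
print(arr_copy.base is None)             # True: the copy owns its memory
```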

# Question
Making train and test sets: Create two arrays from array a, one with 2/3 and the other with 1/3 of the elements. 

Note that you don't want to mess up your original data set when you (later) make transformations on the train / test set.

In [None]:
a = np.arange(1000)
np.random.shuffle(a) # shuffles a in place

# Answer: slice first, then .copy(), so train and test don't share memory with a
train = a[:2*len(a)//3].copy()
test = a[2*len(a)//3:].copy()
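An alternative that leaves `a` in its original order is to shuffle an index array instead and select with it; integer-array ("fancy") indexing always returns a copy, so `train` and `test` cannot modify `a`. This is a sketch; the seed `0` and the use of `np.random.default_rng` (available since NumPy 1.17) are choices made here, not part of the exercise.

```python
import numpy as np

a = np.arange(1000)
rng = np.random.default_rng(0)        # seeded generator for reproducibility
idx = rng.permutation(len(a))         # shuffled indices; a itself is untouched

split = 2 * len(a) // 3
train = a[idx[:split]]                # fancy indexing -> copy
test = a[idx[split:]]

print(len(train), len(test))          # 666 334
```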

# Arrays are also a lot faster than lists

In [None]:
# Arrays are faster and more efficient than lists
x = list(range(100000))

# Say that we want to square all elements
y = [i**2 for i in x[0:10]]
print (y)

In [None]:
# Time the operation with some IPython magic command
print('Time for Python lists:')
list_time = %timeit -o -n 20 [i**2 for i in x]

In [None]:
z = np.array(x)
w = z**2
print(w[:10])

In [None]:
print('Time for NumPy arrays:')
np_time = %timeit -o -n 20 z**2

In [None]:
# Compare the average time per run of the two measurements
print('NumPy is ' + str(round(list_time.average / np_time.average)) +
      ' times faster than lists at squaring 100 000 elements.')
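The `%timeit` magic only works inside IPython/Jupyter. In a plain Python script, the standard-library `timeit` module gives a comparable measurement. In this sketch each operation is repeated 20 times; the resulting ratio will vary by machine.

```python
import timeit
import numpy as np

x = list(range(100000))
z = np.array(x)

# Total seconds for 20 repetitions of each operation
list_seconds = timeit.timeit(lambda: [i**2 for i in x], number=20)
np_seconds = timeit.timeit(lambda: z**2, number=20)

print(f'lists: {list_seconds:.4f} s, NumPy: {np_seconds:.4f} s')
print(f'speed-up: {list_seconds / np_seconds:.1f}x')
```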

# Universal functions
A universal function (ufunc) is a function that operates on an `ndarray` element by element. A list of the available universal functions can be found in the NumPy documentation:
https://docs.scipy.org/doc/numpy-1.13.0/reference/ufuncs.html
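Every ufunc is an instance of `np.ufunc` and can be called directly; ufuncs also accept an `out` argument that writes the result into an existing array instead of allocating a new one. A short sketch:

```python
import numpy as np

xn = np.arange(5)
yn = np.arange(5, 10)

print(isinstance(np.add, np.ufunc))  # True
print(np.add(xn, yn))                # same as xn + yn: [ 5  7  9 11 13]
print(np.maximum(xn, 2))             # element-wise max with a scalar: [2 2 2 3 4]

result = np.empty(5, dtype=xn.dtype)
np.multiply(xn, yn, out=result)      # writes into result in place
print(result)                        # [ 0  6 14 24 36]
```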

In [None]:
# Arrays are different than lists in another way:
# x and y are lists

x = list(range(5))
y = list(range(5,10))
print ("list x = ", x)
print ("list y = ", y)

In [None]:
print ("x + y = ", x+y)

In [None]:
# now let's try with NumPy arrays:
xn = np.array(x)
yn = np.array(y)
print ('np.array xn =', xn)
print ('np.array yn =', yn)

In [None]:
print ("xn + yn = ", xn + yn)

In [None]:
# + for NumPy arrays is a wrapper around the ufunc np.add
np.add(xn,yn)

In [None]:
# An array is a sequence that can be manipulated easily:
# an arithmetic operation is applied to each element individually.
# When two arrays are added, they must have the same shape
# (or shapes that are compatible under broadcasting)

# Python lists: * repeats the list
print (3* x)

In [None]:
# NumPy arrays: * multiplies each element
print (3 * xn)
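Broadcasting generalizes the scalar case above: NumPy compares shapes from the last axis backwards, and two axes are compatible when they are equal or one of them is 1. A sketch with a (3, 1) column and a (4,) row, plus an incompatible pair:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

table = col * 10 + row             # broadcast to shape (3, 4)
print(table.shape)                 # (3, 4)
print(table)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

raised = False
try:
    np.arange(3) + np.arange(4)    # shapes (3,) and (4,) are incompatible
except ValueError as err:
    raised = True
    print('ValueError:', err)
```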

# Join, add, concatenate

In [None]:
print(xn)
print(yn)

In [None]:
# if you need to join numpy arrays, 
# try hstack, vstack, column_stack, or concatenate
np.hstack([xn, yn])

In [None]:
np.vstack([xn, yn])

In [None]:
np.column_stack([xn, yn])

In [None]:
np.concatenate([xn, yn],axis=0)
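The joining functions differ mainly in the shape they produce, which is easiest to see by comparing shapes for the same `xn` and `yn` (length 5 each). `np.concatenate` with `axis=1` needs 2-D inputs, so the sketch reshapes first:

```python
import numpy as np

xn = np.arange(5)
yn = np.arange(5, 10)

print(np.hstack([xn, yn]).shape)        # (10,): one long 1-D array
print(np.vstack([xn, yn]).shape)        # (2, 5): inputs become rows
print(np.column_stack([xn, yn]).shape)  # (5, 2): inputs become columns

# concatenate along axis 1 with explicit 2-D column vectors
cols = np.concatenate([xn.reshape(-1, 1), yn.reshape(-1, 1)], axis=1)
print(cols.shape)                       # (5, 2), same as column_stack
```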

### Creating arrays with 2 axes:


In [None]:
# This list has two dimensions
list3 = [[1, 2, 3],
         [4, 5, 6]]
list3 # nested list

In [None]:
# data = np.array([[1, 2, 3], [4, 5, 6]])
data = np.array(list3)
data

### Attributes of a multidim array

In [None]:
print('Dimensions:',data.ndim)
print ('Shape:',data.shape)
print('Size:', data.size)

In [None]:
np.transpose(data)

# Other ways to create NumPy arrays

In [None]:
# np.arange() is similar to built in range()
# Creates array with a range of consecutive numbers
# starts at 0 and step=1 if not specified. Exclusive of stop.

print(np.arange(12))

In [None]:
# Array increasing from start to end by step: np.arange(start, end, step)
# The range always includes start but excludes end
print(np.arange(1, 10, 2))

In [None]:
# Returns a new array of the specified shape, filled with zeros.
print(np.zeros((2,5), dtype=np.int8))

In [None]:
# Returns a new array of the specified shape, filled with ones.
# (np.float128 would also work on some platforms, but it is not available on Windows)
print(np.ones((4,2), dtype=np.float64))

In [None]:
# Returns the square identity matrix of the given size
np.eye(5)
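A few related constructors follow the same pattern: `np.full` fills with any value, and the `*_like` variants copy the shape and dtype of an existing array. A short sketch:

```python
import numpy as np

sevens = np.full((2, 3), 7)                  # 2 x 3 array filled with 7
template = np.arange(6).reshape(2, 3)
zeros = np.zeros_like(template)              # same shape and dtype as template
ones = np.ones_like(template, dtype=float)   # same shape, dtype overridden

print(sevens)
print(zeros.shape, zeros.dtype == template.dtype)   # (2, 3) True
print(np.eye(3, k=1))                        # ones on the first superdiagonal
```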

## Some useful indexing strategies

### There are two main types of indexing: Integer and Boolean

In [None]:
x = np.array([[1, 2], [3, 4], [5, 6]]) 
x

In [None]:
## Integer indexing
# first index is the row, second index is the column
x[1,0]

In [None]:
x[1:,:] # rows from index 1 onward, all columns

In [None]:
## Boolean indexing
print('Comparison operator, find all values greater than 3:\n')
x>3

In [None]:
print('Boolean indexing, only extract elements greater than 3:\n')
print(x[x>3])
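Conditions can be combined with `&` (and), `|` (or), and `~` (not); each comparison needs its own parentheses because these operators bind more tightly than `>` or `<`. `np.where` then picks values element by element. A sketch with the same `x`:

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])

between = x[(x > 1) & (x < 5)]        # elements strictly between 1 and 5
print(between)                        # [2 3 4]

outside = x[(x < 2) | (x > 5)]
print(outside)                        # [1 6]

clipped = np.where(x > 3, 3, x)       # replace values above 3 with 3
print(clipped)
```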

## Extra NumPy array methods

In [None]:
# Reshape is used to change the shape
a = np.arange(0, 15)
a

In [None]:
a = a.reshape(3, 5)
print ('Reshaped:')
print(a)

In [None]:
# We can also flatten matrices using ravel()
x = np.array(range(24))
x = x.reshape(4,6)
print('Original:\n',x)
print()
x = x.ravel() # make it flat
print ('Flattened:\n',x)
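Two related details: passing `-1` to `reshape` lets NumPy infer that axis's length, and `ravel()` returns a view when it can, whereas `flatten()` always returns a copy. A short sketch:

```python
import numpy as np

x = np.arange(24).reshape(4, -1)      # -1 is inferred as 6
print(x.shape)                        # (4, 6)

flat_view = x.ravel()                 # view of x's memory (x is contiguous)
flat_copy = x.flatten()               # always an independent copy

flat_view[0] = 100                    # also changes x
flat_copy[1] = 200                    # does not change x
print(x[0, :2])                       # [100   1]
```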

In [None]:
# Finding the sum, min, and max is easy
print (a)
print()
print ('Sum:',a.sum())
print('Min:', a.min())
print('Max:', a.max())

In [None]:
print ('Column sum:',a.sum(axis=0))
print ('Row sum:',a.sum(axis=1))

# Note: axis specifies which dimension to "collapse" (sum over)
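The "collapse" rule shows up in the result shapes: summing a (3, 5) array over `axis=0` leaves shape (5,), and over `axis=1` leaves (3,). Passing `keepdims=True` keeps the collapsed axis with length 1, which makes it easy to broadcast the result back against the original, as in this row-normalization sketch:

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
print(a.sum(axis=0).shape)                 # (5,)
print(a.sum(axis=1).shape)                 # (3,)

row_sums = a.sum(axis=1, keepdims=True)    # shape (3, 1)
normalized = a / row_sums                  # each row now sums to 1
print(normalized.sum(axis=1))
```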

In [None]:
# To get the cumulative product:
print (np.arange(1, 10))
print (np.cumprod(np.arange(1, 10)))

In [None]:
# To get the cumulative sum:
print (np.arange(1, 10))
print(np.cumsum(np.arange(1, 10)))

In [None]:
# Creating a 3D array:
a = np.arange(0, 96).reshape(2, 6, 8)
print(a)

In [None]:
# The same methods typically apply in multiple dimensions
print (a.sum(axis = 0))
print ('---')
print (a.sum(axis = 1))

## Array Axes
<img src= "https://github.com/ikhlaqsidhu/data-x/raw/master/imgsource/anatomyarray.png">



---
# More ufuncs and Basic Operations

One of the coolest parts of NumPy is the ability for you to run mathematical operations on top of arrays. Here are some basic operations:

In [None]:
a = np.arange(11, 21)
b = np.arange(0, 10)
print ("a = ",a)
print ("b = ",b)

In [None]:
print (a + b)
print (a * b) # Hadamard product
print (a ** 2)

In [None]:
# Dot product / matrix multiplications
print (a.dot(b))

In [None]:
print ('Matrix multiplication')
c = np.arange(1,5).reshape(2,2)
print ("c = \n", c)
print()
d = np.arange(5,9).reshape(2,2)
print ("d = \n", d)
print()
print (np.matmul(c,d)) # not commutative
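Since Python 3.5, the `@` operator is shorthand for `np.matmul`. Comparing `c @ d` with `d @ c` confirms that the order matters:

```python
import numpy as np

c = np.arange(1, 5).reshape(2, 2)
d = np.arange(5, 9).reshape(2, 2)

print(c @ d)             # [[19 22]
                         #  [43 50]]
print(d @ c)             # [[23 34]
                         #  [31 46]]
print(np.array_equal(c @ d, np.matmul(c, d)))  # True
```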

# Random numbers

In [None]:
# Random numbers
np.random.seed(1337)  # set the seed to 1337 for reproducibility
print(np.random.uniform(1,5,10))   # 10 random uniform numbers 1 to 5

In [None]:
print (np.random.exponential(1,5)) # 5 random exponential numbers with scale 1 (i.e. rate 1)

In [None]:
print (np.random.randn(8).reshape(2,4)) # 8 random numbers from the standard normal distribution, as a 2 x 4 array

### If you want to learn more about "random" numbers in NumPy go to: https://docs.scipy.org/doc/numpy-1.12.0/reference/routines.random.html
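Newer NumPy versions (1.17+) also provide a `Generator` object created with `np.random.default_rng`, which the NumPy documentation recommends for new code. Two generators built from the same seed produce the same stream, which is what makes results reproducible. In this sketch the seed value is arbitrary, and the numbers will differ from the legacy `np.random.seed` stream used above.

```python
import numpy as np

rng1 = np.random.default_rng(1337)
rng2 = np.random.default_rng(1337)

u = rng1.uniform(1, 5, 10)          # 10 uniform numbers in [1, 5)
u_again = rng2.uniform(1, 5, 10)
print(np.array_equal(u, u_again))   # True: same seed, same numbers

e = rng1.exponential(scale=1.0, size=5)   # scale = 1 / rate
n = rng1.standard_normal((2, 4))          # 2 x 4 standard normal draws
print(e.shape, n.shape)                   # (5,) (2, 4)
```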

# Trigonometric functions

In [None]:
# linspace: Create an array of numbers from a to b 
# with n equally spaced numbers (inclusive)

data = np.linspace(0,10,5)
print (data)
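`linspace` fixes the number of points and includes the end point by default, while `arange` fixes the step and excludes the end. With a non-integer step, `arange` results can be affected by floating-point rounding, so `linspace` is usually the safer choice for evenly spaced floats. A short sketch:

```python
import numpy as np

inclusive = np.linspace(0, 10, 5)                  # 0, 2.5, 5, 7.5, 10
half_open = np.linspace(0, 10, 5, endpoint=False)  # 0, 2, 4, 6, 8 (10 excluded)
stepped = np.arange(0, 10, 2.5)                    # 0, 2.5, 5, 7.5 (10 excluded)

print(inclusive)
print(half_open)
print(stepped)
```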

In [None]:
x = np.linspace(0,np.pi, 3)
print('x = ', x)
print()
print ("sin(x) = ", np.sin(x))

In [None]:
x = np.linspace(0,4*np.pi,1000)
y = np.sin(x)-np.cos(x)**2

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(x,y)

# Prediction Example (Ordinary Least Squares)

In [None]:
# Generate data with a linear trend
x = np.linspace(0,10,50)
Y = 5 + x*2 + np.random.normal(0,2.5,len(x))

In [None]:
plt.scatter(x,Y);

In [None]:
X = np.column_stack([np.ones(len(x)),x])

In [None]:
X[:4,:]

Remember:
$Y \in \mathbb{R}^{50 \times 1}, \; X \in \mathbb{R}^{50 \times 2}, \; W \in \mathbb{R}^{2 \times 1}$
$$\hat{Y} = XW$$
$$SE = (Y - XW)^T (Y - XW)$$
$$\nabla_W SE = -2X^T(Y - XW) = 0$$
$$W_{ols} = (X^TX)^{-1}X^TY$$

In [None]:
W = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(Y)

In [None]:
W
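Explicitly inverting $X^TX$ works for this small, well-conditioned problem, but `np.linalg.lstsq` (or `np.linalg.solve` on the normal equations) computes the same least-squares solution without forming the inverse and is generally more numerically stable. This self-contained sketch regenerates similar data with its own seeded generator, so its numbers will differ from `W` above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
Y = 5 + 2 * x + rng.normal(0, 2.5, len(x))   # same linear trend as above
X = np.column_stack([np.ones(len(x)), x])    # intercept column + x

W_inv = np.linalg.inv(X.T @ X) @ X.T @ Y          # formula from the text
W_solve = np.linalg.solve(X.T @ X, X.T @ Y)       # solve the normal equations
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least squares directly

print(W_inv, W_solve, W_lstsq)
print(np.allclose(W_inv, W_lstsq))   # True
```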

In [None]:
plt.scatter(x,Y);
plt.plot(x,W[0]+W[1]*x,c='red');