Important Python Libraries
==

Python, by default, only has access to a small number of built-in types and functions. The vast majority of functions are located in modules, and before a function can be accessed, the module which contains the
function must be imported. 

For example, when using ipython --pylab (or any variants), a large number of modules are automatically imported, including NumPy and matplotlib. This style of importing is useful for learning and interactive use, but care is needed to make sure that the correct module is imported when designing more complex programs.

- NumPy
NumPy provides a set of array and matrix data types that are essential for statistical data analysis.
- SciPy
SciPy contains a large number of routines needed for the analysis of data. The most important include a wide range of random number generators, linear algebra routines and optimizers. SciPy depends on NumPy.
- Matplotlib or Seaborn
Matplotlib provides a plotting environment for 2D plots, with limited support for 3D plotting. Seaborn is a Python package built on Matplotlib that improves the default appearance of plots with very little additional code.
- Pandas
Pandas provides high-performance data structures, such as Series and DataFrame, for working with labeled and tabular data.


# Magic functions in IPython

Magic functions make tasks such as navigating the local file system (using %cd) or running other Python programs (using %run program.py) simple.

Entering %magic inside an IPython session will produce a detailed description of the available functions. Alternatively, %lsmagic produces a succinct list of available magic commands.

The most useful magic functions are:
- cd: change directory
- edit *filename*: launch an editor to edit *filename*
- ls: list the contents of a directory
- run *filename*: run a Python program
- timeit: time the execution of a piece of code or function

# Arrays and Matrices

NumPy provides two important data types: arrays and matrices. The differences between these two data types are:

- Arrays can have 1, 2, 3 or more dimensions, while matrices always have 2 dimensions. This means that a 1 by n vector stored as an array has 1 dimension and n elements, while the same vector stored as a matrix has 2 dimensions where the sizes of the dimensions are 1 and n.
- Standard mathematical operators on arrays operate element-by-element. This is not the case for matrices, where multiplication (\*) follows the rules of linear algebra. With arrays, matrix multiplication is performed by numpy.dot().
- Arrays are more common than matrices, and all functions are thoroughly tested with arrays. Functions in some libraries may return unexpected results when given matrices.
- Arrays can be quickly treated as a matrix using either *asmatrix* or *mat* without copying the underlying data.

The best practice is to use arrays and to use the *asmatrix* view when writing linear algebra-heavy code. It is also important to test any custom function with both arrays and matrices to ensure that false assumptions about the behaviour of multiplication have not been made.
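
The difference in how \* behaves can be seen in a small sketch. The variable names below are illustrative only and do not come from the text above.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])

elementwise = a * a            # arrays: element-by-element product
matrix_product = np.dot(a, a)  # arrays: linear-algebra product via dot

m = np.asmatrix(a)             # matrix view of the same data
matrix_star = m * m            # matrices: * is the linear-algebra product

print(elementwise)
print(matrix_product)
# The matrix result holds the same values as np.dot on the array
print(np.array_equal(matrix_product, np.asarray(matrix_star)))
```

Testing a custom function with both `a` and `m` is a quick way to catch an unintended reliance on one meaning of \*.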


## Arrays

Arrays are the base data type in NumPy. In some ways arrays are very similar to lists - they both contain collections of elements.

Arrays, unlike lists, are always rectangular, so that all rows have the same number of elements.



In [1]:
import numpy as np

x = [0.0, 1, 2, 3, 4]
y = np.array(x)
print(y, type(y))

# Two- (or higher-) dimensional arrays are initialized using nested lists
y = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
print(y)

# Find out the array-size
print(y.shape)

y = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
print(y.shape)
y

[ 0.  1.  2.  3.  4.] <class 'numpy.ndarray'>
[[0 1 2 3 4]
 [5 6 7 8 9]]
(2, 5)
(2, 2, 2)


array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

## 1.1 Array dtypes

Homogeneous arrays can contain a variety of numeric data types. The most useful is 'float64', which corresponds to the Python built-in data type float (and C/C++ double). By default, calls to array will preserve the type of the input, if possible. If an input contains only integers, it will have an integer dtype: 'int64' on most 64-bit platforms, as in the output below, and 'int32' on some others. If an input contains floats, or a mix of integers and floats, the array's data type will be float64. If the input contains a mix of integers, floats and complex types, the array will be initialized to hold complex data.

In [11]:
x = [0, 1, 2, 3, 4] # Integers
y = np.array(x)
print(y.dtype)

int64


In [5]:
x = [0.0, 1, 2, 3, 4] # 0.0 is a float
y = np.array(x)
y.dtype


x = [0.0 + 1j, 1, 2, 3, 4] # (0.0 + 1j) is a complex
y = np.array(x)
print(y)

""" 
NumPy attempts to find the smallest data type which can represent the data when constructing an array.
It is possible to force NumPy to select a particular dtype by using the keyword argument dtype=datatype
when initializing the array.
"""

x = [0, 1, 2, 3, 4] # Integers
y = np.array(x)
print(y.dtype)

y = np.array(x, dtype="float64") # String dtype
print(y.dtype)

y = np.array(x, dtype=np.float64) # NumPy type dtype
print(y.dtype)

[ 0.+1.j  1.+0.j  2.+0.j  3.+0.j  4.+0.j]
int64
float64
float64


## 2 Matrix

Matrices are essentially a subset of arrays, and behave in a virtually identical manner. The two differences are:

- Matrices always have 2 dimensions
- Matrices follow the rules of linear algebra for multiplication (\*)

There are two ways to transform a 1- or 2-dimensional array into a matrix:

- Calling *matrix* on an array creates a matrix that holds a copy of the data
- Calling *mat* or *asmatrix* creates a matrix view of the array (a faster method that does not copy any data).
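
The copy-versus-view distinction can be checked directly. The following is a minimal sketch with illustrative variable names. It uses asmatrix rather than mat, on the assumption that the mat alias may not be available in recent NumPy releases.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])

m_copy = np.matrix(a)     # independent copy of the data
m_view = np.asmatrix(a)   # view that shares memory with a

a[0, 0] = -1.0            # modify the original array

print(m_copy[0, 0])       # the copy still holds the old value
print(m_view[0, 0])       # the view reflects the change
print(np.shares_memory(a, m_copy), np.shares_memory(a, m_view))
```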

In [8]:
x = [0.0, 1, 2, 3, 4] # Any float makes all float
y = np.array(x)
print(type(y))

print(y * y) # element-by-element

z = np.asmatrix(x)
print(type(z))

print(z * z) # error: a (1, 5) matrix cannot be multiplied by a (1, 5) matrix

<class 'numpy.ndarray'>
[  0.   1.   4.   9.  16.]
<class 'numpy.matrixlib.defmatrix.matrix'>


ValueError: matrices are not aligned

# 3. Vectors and Matrices

## 3.1 Row vectors (1-dimensional arrays)

The following examples show how to create a row vector using a 1-dimensional array, a 2-dimensional array and a matrix.

$$
x =
  \begin{bmatrix}
    1 & 2 & 3 & 4 & 5 \\
  \end{bmatrix}
$$

In [11]:
x = np.array([1.0, 2, 3, 4, 5])
print(np.ndim(x)) # the number of dimensions of the vector x

# If an array with 2 dimensions is required,
# it is necessary to use a trivial nested list.
x = np.array([[1.0, 2, 3, 4, 5]])
print(x)
print(np.ndim(x))

# A matrix is always 2-dimensional, so a nested list is not required.
x = np.matrix([1.0, 2, 3, 4, 5])
print(x, type(x))
print(np.ndim(x))

1
[[ 1.  2.  3.  4.  5.]]
2
[[ 1.  2.  3.  4.  5.]] <class 'numpy.matrixlib.defmatrix.matrix'>
2


Notice that the output matrix representation uses nested lists [[ ]] to emphasize the 2-dimensional structure of all matrices.

## 3.2 Column vectors (2-dimensional arrays)

A column vector is entered as a matrix or a 2-dimensional array using a set of nested lists.

$$
x =
  \begin{bmatrix}
    1 \\ 2 \\ 3 \\ 4 \\ 5 \\
  \end{bmatrix}
$$

In [13]:
x = np.array([[1.0], [2], [3], [4], [5]])
print(x)

x = np.matrix([[1.0], [2], [3], [4], [5]])
print(x)

x = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(x)

[[ 1.]
 [ 2.]
 [ 3.]
 [ 4.]
 [ 5.]]
[[ 1.]
 [ 2.]
 [ 3.]
 [ 4.]
 [ 5.]]
[[1 2 3]
 [4 5 6]
 [7 8 9]]


## 3.3 Matrix (2-dimensional arrays)

Matrices and 2-dimensional arrays are rows of columns, and so:

$$
x =
  \begin{bmatrix}
    2 & 1 & 1 \\
    1 & 3 & 2 \\
    1 & 0 & 0
  \end{bmatrix}
$$

is input by entering the matrix one row at a time, each in a list, and then encapsulating the row lists in another list.

In [19]:
x = np.array([[2, 1, 1], [1, 3, 2], [1, 0, 0.0]])
print(x)

x = np.matrix([
    [2, 1, 1],
    [1, 3, 2],
    [1, 0, 0.0]
])
print(x)

y = np.zeros([2, 2])
print(y, type(y))

# empty() allocates memory without initializing it, so the values are arbitrary
z = np.empty([3, 3])
print(z, type(z))

y = np.identity(5)
print(y, type(y))

[[ 2.  1.  1.]
 [ 1.  3.  2.]
 [ 1.  0.  0.]]
[[ 2.  1.  1.]
 [ 1.  3.  2.]
 [ 1.  0.  0.]]
[[ 0.  0.]
 [ 0.  0.]] <class 'numpy.ndarray'>
[[  6.91756912e-310   2.21148210e-316   6.91754718e-310]
 [  6.91754724e-310   6.91754723e-310   6.91754721e-310]
 [  0.00000000e+000   0.00000000e+000   3.95252517e-322]] <class 'numpy.ndarray'>
[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]] <class 'numpy.ndarray'>


## 3.4 Concatenation

Concatenation is the process by which one vector or matrix is appended to another.

Arrays and matrices can be concatenated horizontally or vertically.


In [22]:
# Method 1: concatenate
x = np.array([[1.0, 2],
              [3, 4]])

y = np.array([[5.0, 6],
              [7, 8]])

# Vertically (axis=0)
z = np.concatenate([x, y], axis=0)
# print(z)

# Horizontally (axis=1)
z = np.concatenate((x, y), axis=1)
print(z)


# Method 2: vstack and hstack
z = np.vstack((x, y))
print(z)
z = np.hstack((x, y))
print(z)

[[ 1.  2.  5.  6.]
 [ 3.  4.  7.  8.]]
[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]
 [ 7.  8.]]
[[ 1.  2.  5.  6.]
 [ 3.  4.  7.  8.]]


# 4. Accessing elements of an array

Four methods are available for accessing elements contained within an array:
- scalar selection
- slicing
- numerical indexing
- logical (or Boolean) indexing

## 4.1 Scalar selection

Pure scalar selection is the simplest method to select elements from an array, and is implemented using [i] for 1-dimensional arrays, [i, j] for 2-dimensional arrays.

In [2]:
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(x[0])

x = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
print(x[0, 2])

print(type(x[0, 2]))

I = np.identity(4)
print(I[2, 2], type(I[2, 2]))

1.0
3.0
<class 'numpy.float64'>
1.0 <class 'numpy.float64'>


In [45]:
# You can always assign a value to an element of an array
x[0, 2] = -9

print(x)

[[ 1.  2. -9.  4.  5.]]


## 4.2 Array slicing

Arrays, like lists and tuples, can be sliced. Array slicing is virtually identical to list slicing except that a simpler slicing syntax is available, since arrays are explicitly multidimensional and rectangular.

Arrays are sliced using the syntax [:,:, . . . ,:] (where the number of dimensions of the array determines the size of the slice). Recall that the slice notation $a$:$b$:$s$ will select every $s$th element where the indices $i$ satisfy $a \le i < b$, so that the starting value $a$ is always included in the list and the ending value $b$ is always excluded.

Additionally, a number of shorthand notations are commonly encountered:
- : and :: are the same as $0$:$n$:$1$ where $n$ is the length of the array (or list).
- $a$: and $a$:$n$ are the same as $a$:$n$:$1$ where $n$ is the length of the array (or list).
- :$b$ is the same as $0$:$b$:$1$.
- ::$s$ is the same as $0$:$n$:$s$ where $n$ is the length of the array (or list).
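
The shorthand forms listed above can be compared against their explicit equivalents in a short sketch. The array and variable names are illustrative.

```python
import numpy as np

x = np.arange(10)      # the integers 0 through 9
n = len(x)

full = x[:]            # same as x[0:n:1]
tail = x[3:]           # same as x[3:n:1]
head = x[:4]           # same as x[0:4:1]
every_third = x[::3]   # same as x[0:n:3]
middle = x[2:8:2]      # indices 2, 4, 6 (8 is excluded)

print(np.array_equal(full, x[0:n:1]))
print(np.array_equal(tail, x[3:n:1]))
print(np.array_equal(head, x[0:4:1]))
print(np.array_equal(every_third, x[0:n:3]))
print(middle)
```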


In [5]:
x = np.array([[1.0, 2, 3],
              [4.0, 5, 6],
              [7, 8, 9]])

# Pull everything from the matrix
print(x[:, :])

# Pull the second row from the matrix
print(x[1:2, :])

# Pull the top-right 2x2 square matrix
print(x[0:2, 1:3])

# Pull the bottom-right 2x2 square matrix
print(x[1:3, 1:3])


[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]]
[[ 4.  5.  6.]]
[[ 2.  3.]
 [ 5.  6.]]
[[ 5.  6.]
 [ 8.  9.]]


In 2-dimensional arrays, the first dimension specifies the row or rows of the slice and the second dimension specifies the column or columns.

Note that the 2-dimensional slice syntax y[a:b,c:d] is the same as y[a:b,:][:,c:d] or y[a:b][:,c:d], although clearly the shorter form is preferred. In the case where only row slicing is needed, y[a:b], which is equivalent to y[a:b,:], is the shortest syntax.

In [48]:
y = np.array([[0.0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
print(y)

y[:1, :] # row 0, all columns
y[:1] # same as before

y[:, :1] # all rows, column 0

y[:1, 0:3] # row 0, columns 0, 1, 2
y[:1][:, 0:3] # same as before

y[:, 3:] # all rows, columns 3 to last


[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]]


array([[ 3.,  4.],
       [ 8.,  9.]])

### 4.2.1 Mixed selection using scalar and slice selections

When arrays have more than 1 dimension, it is often useful to mix scalar and slice selectors to select an entire row, column or panel of a 3-dimensional array. This is similar to pure slicing with one important caveat: dimensions selected using scalar selectors are eliminated.

For example, if x is a 2-dimensional array, then x[0,:] will select the first row. However, unlike the 2-dimensional array constructed using the
slice x[:1,:] , x[0,:] will be a 1-dimensional array.

In [9]:
x = np.array([[1.0, 2], [3, 4]])

# Use slice syntax
print(x[:1, :]) # first row (index 0), all columns, 2-dimensional
print(np.shape(x[:1, :]), np.ndim(x[:1, :]))

# Use mixed syntax
print(x[0, :]) # same elements, but 1-dimensional

print(np.shape(x[0, :]), np.ndim(x[0, :]))

[[ 1.  2.]]
(1, 2) 2
[ 1.  2.]
(2,) 1


While these two selections appear similar, the first produces a 2-dimensional array (note the [[ ]] syntax) while the second is a 1-dimensional array. In most cases where a single row or column is required, using scalar selectors such as y[0,:] is the best practice. It is important to be aware of the dimension reduction since scalar selections from a 2-dimensional array will no longer have 2 dimensions. This type of dimension reduction may matter when evaluating linear algebra expressions.

The principle adopted by NumPy is that slicing should always preserve the dimension of the underlying array, while scalar indexing should always collapse the dimension(s). This is consistent with x[0,0]
returning a scalar (or 0-dimensional array) since both selections are scalar. This is demonstrated in the next example which highlights the differences between pure slicing, mixed slicing and pure scalar selection. 

In [57]:
x = np.array([[0.0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])
print(x)

y2 = x[:1, :] # pure slicing, stays 2-dimensional
print(y2, np.ndim(y2))

y1 = x[0, :] # mixed selection, reduced to 1-d array
print(y1, np.ndim(y1))

z = x[0, 0] # top left element, dim reduction to scalar (0-d array)
print(z, np.ndim(z))

c = x[:, 0] # All rows, 1 column, dim reduction to 1-d array
print(c, np.ndim(c))

[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]]
[[ 0.  1.  2.  3.  4.]] 2
[ 0.  1.  2.  3.  4.] 1
0.0 0
[ 0.  5.] 1


### 4.2.2 Assignment using slicing

Slicing and scalar selection can be used to assign arrays that have the same dimension as the slice.

In [14]:
x = np.array([[0.0] * 3] * 3)  # Square 3x3 matrix of zeros
print(x)

x[0, :] = np.array([1.0, 2.0, 3.0])
print(x)

x[::2, ::2] = np.array([[-99.0, -99], [-99, -99]])
print(x)

x[1, 1] = np.pi
print(x)

[[ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
[[ 1.  2.  3.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]]
[[-99.   2. -99.]
 [  0.   0.   0.]
 [-99.   0. -99.]]
[[-99.           2.         -99.        ]
 [  0.           3.14159265   0.        ]
 [-99.           0.         -99.        ]]


In [16]:
z = np.array([[1, 2, 3]] * 3)
print(z)

[[1 2 3]
 [1 2 3]
 [1 2 3]]


NumPy attempts automatic (silent) data type conversion if an element with one data type is inserted into an array with a different data type.

For example, if an array has an integer data type, placing a float into the array results in the float being truncated and stored as an integer. This is dangerous, and so in most cases, arrays should be initialized to contain floats unless a considered decision is taken to use a different data type.

In [8]:
x = [0, 1, 2, 3, 4] # Integers
y = np.array(x)
print(y.dtype)

y[0] = np.pi # silently truncated to 3
print(y)

int64
[3 1 2 3 4]
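
One way to avoid the silent truncation is to create the array with a float dtype, or to convert an existing integer array with astype. The following is a minimal sketch with illustrative variable names.

```python
import numpy as np

# Integer array: assigning a float silently drops the fractional part
y_int = np.array([0, 1, 2, 3, 4])
y_int[0] = np.pi

# Float array created up front: the value is stored as a float64
y_float = np.array([0, 1, 2, 3, 4], dtype=np.float64)
y_float[0] = np.pi

# astype returns a new array with the requested dtype
converted = y_int.astype(np.float64)

print(y_int[0], y_float[0], converted.dtype)
```

Note that astype converts the values already stored, so the truncated 3 in `y_int` stays 3.0 in `converted`.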


## 4.3 Linear slicing using flat

Data in matrices is stored in row-major order - elements are indexed by first counting across rows and then down columns.  For example, in the matrix below, the elements are:

$$
A =
  \begin{bmatrix}
    \text{1st} & \text{2nd} & \text{3rd} \\
    \text{4th} & \text{5th} & \text{6th} \\
    \text{7th} & \text{8th} & \text{9th}
  \end{bmatrix}
$$

In addition to slicing using the [:,:, . . . ,:] syntax, k-dimensional arrays can be linearly sliced. Linear slicing assigns an index to each element of the array, starting with the first (0), the second (1), and so on until the final element (n − 1).

In 2 dimensions, linear slicing works by first counting across rows, and then down columns. To use linear slicing, the flat attribute of the array must be used.

In [5]:
# arange() is better than range() because it supports floats
y = np.reshape(np.arange(0.0, 25), (5, 5))
print(y)

print(y[0])

print(y.flat[0]) # scalar selection, flat is 1-dimensional
print(y.flat[6])
print(y.flat[12:15])
print(y.flat[:])

[[  0.   1.   2.   3.   4.]
 [  5.   6.   7.   8.   9.]
 [ 10.  11.  12.  13.  14.]
 [ 15.  16.  17.  18.  19.]
 [ 20.  21.  22.  23.  24.]]
[ 0.  1.  2.  3.  4.]
0.0
6.0
[ 12.  13.  14.]
[  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.  14.
  15.  16.  17.  18.  19.  20.  21.  22.  23.  24.]
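
Because counting runs across rows first, the linear index of element [i, j] in an array with a given number of columns is i times the number of columns plus j. The sketch below checks this for one element. The variable names are illustrative, and np.unravel_index is used only to map the linear index back to a row and column.

```python
import numpy as np

y = np.reshape(np.arange(0.0, 25), (5, 5))
n_rows, n_cols = y.shape

i, j = 2, 3
linear_index = i * n_cols + j   # row-major position of element [i, j]

print(y[i, j], y.flat[linear_index])

# Recover the (row, column) pair from the linear index
row, col = np.unravel_index(linear_index, y.shape)
print(row, col)
```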


## 4.4 Slicing and memory management

Unlike lists, slices of arrays do not copy the underlying data. 

Instead a slice of an array returns a view of the array, which shares the data in the sliced array. This is important since changes in slices will propagate to the underlying array and to any other slices which share the same element.

In [1]:
x = np.reshape(np.arange(4.0), (2, 2))
print(x)

s1 = x[0, :]
s2 = x[:, 0]

print("s1", s1)
print("s2", s2)

s1[0] = -np.pi # s1, s2 and x all share element [0, 0]

print("s1", s1)
print("s2", s2)
print("x", x)

[[ 0.  1.]
 [ 2.  3.]]
s1 [ 0.  1.]
s2 [ 0.  2.]
s1 [-3.14159265  1.        ] 
s2 [-3.14159265  2.        ] 
x [[-3.14159265  1.        ]
 [ 2.          3.        ]]


If changes should not propagate to parent and sibling arrays, it is necessary to call copy on the slice. 

Alternatively, they can also be copied by calling array on arrays, or matrix on matrices.

In [18]:
x = np.reshape(np.arange(4.0), (2, 2))
print(x)

s1 = np.copy(x[0, :]) # Using the function copy
s2 = x[:, 0].copy() # Using the method copy
s3 = np.array(x[0, :]) # Creating a new array

s1[0] = -np.pi
s2[0] = -np.pi
s3[0] = -np.pi

print(s1, s2, s3)
print(x)

[[ 0.  1.]
 [ 2.  3.]]
[-3.14159265  1.        ] [-3.14159265  2.        ] [-3.14159265  1.        ]
[[ 0.  1.]
 [ 2.  3.]]


There is one notable exception to this rule – when using pure scalar selection the (scalar) value returned is always a copy.



In [19]:
x = np.arange(5.0)
y = x[0] # Pure scalar selection
z = x[:1] # A pure slice
y = -3.14
print(y) # y changes

print(x) # No propagation

# No changes to z either
print(z)

z[0] = -2.79
print(y) # No propagation since y used pure scalar selection
print(x) # z is a view of x, so changes propagate

-3.14
[ 0.  1.  2.  3.  4.]
[ 0.]
-3.14
[-2.79  1.    2.    3.    4.  ]


Finally, assignments from functions which change values will automatically create a copy of the underlying array.

In [23]:
x = np.reshape(np.arange(4.0), (2, 2))
y = x

print(id(x), id(y)) # Same memory location

y = x + 1
print(y)

print(id(x), id(y)) # Different locations
print(x)

28815472 28815472
[[ 1.  2.]
 [ 3.  4.]]
28815472 26474976
[[ 0.  1.]
 [ 2.  3.]]


Even a trivial operation such as y = x + 0.0 creates a copy of x, and so the only scenario where explicit copying is required is when y is directly assigned using a slice of x, and changes to y should not propagate to x.
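
Whether two arrays share data can be checked with np.shares_memory instead of comparing id values. The following is a minimal sketch with illustrative variable names.

```python
import numpy as np

x = np.reshape(np.arange(4.0), (2, 2))

view = x[0, :]                  # slice: a view of x
explicit_copy = x[0, :].copy()  # explicit copy of the slice
derived = x + 0.0               # arithmetic result: a new array

print(np.shares_memory(x, view))
print(np.shares_memory(x, explicit_copy))
print(np.shares_memory(x, derived))

view[0] = -1.0                  # propagates to x only
print(x[0, 0], explicit_copy[0], derived[0, 0])
```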

Exercise
==
Q1. Input the following matrices into Python as both arrays and matrices.

$$
u =
  \begin{bmatrix}
    1 & 1 & 2 & 3 & 5 & 8
  \end{bmatrix}
$$

$$ v = u^T $$


$$
x = I_{2 \times 2}
$$


$$
y =
  \begin{bmatrix}
    1 & 2 \\
    3 & 4 \\
  \end{bmatrix}
$$


$$
z =
  \begin{bmatrix}
    1 & 2 & 1 & 2 \\
    3 & 4 & 3 & 4 \\
    1 & 2 & 1 & 2 \\
  \end{bmatrix}
$$

$$
w =
  \begin{bmatrix}
    x & x \\
    y & y \\
  \end{bmatrix}
$$


In [8]:
u = np.array([[1, 1, 2, 3, 5, 8]])
v = np.transpose(u)

x = np.identity(2)
y = np.reshape(np.arange(1, 5), (2, 2))
z = np.array([[1, 2, 1, 2],
              [3, 4, 3, 4],
              [1, 2, 1, 2]])
w = np.vstack([np.hstack([x, x]), np.hstack([y, y])])

print("u", u)
print("v", v)
print("x", x)
print("y", y)
print("z", z)
print("w", w)

u [[1 1 2 3 5 8]]
v [[1]
 [1]
 [2]
 [3]
 [5]
 [8]]
x [[ 1.  0.]
 [ 0.  1.]]
y [[1 2]
 [3 4]]
z [[1 2 1 2]
 [3 4 3 4]
 [1 2 1 2]]
w [[ 1.  0.  1.  0.]
 [ 0.  1.  0.  1.]
 [ 1.  2.  1.  2.]
 [ 3.  4.  3.  4.]]


Q2. What command would pull x out of w ?

In [11]:
print(w[:2, :2])
print(x)

[[ 1.  0.]
 [ 0.  1.]]
[[ 1.  0.]
 [ 0.  1.]]


Q3. What command would pull $[x^T, y^T]^T$ out of $w$?

In [14]:
print(w[:, :2])
print(np.transpose(np.hstack([np.transpose(x), np.transpose(y)])))

[[ 1.  0.]
 [ 0.  1.]
 [ 1.  2.]
 [ 3.  4.]]
[[ 1.  0.]
 [ 0.  1.]
 [ 1.  2.]
 [ 3.  4.]]


Q4. What command would pull $y$ out of $z$?

In [15]:
print(z[:2, :2])
print(y)

[[1 2]
 [3 4]]
[[1 2]
 [3 4]]


Q5. Write an OLS estimator.
$$ \beta = (X^{T}X)^{-1}X^TY $$

In [18]:
def ols_estimator(x, y):
    """Return the OLS coefficients (X'X)^{-1} X'y."""
    term1 = np.linalg.inv(np.dot(np.transpose(x), x))
    term2 = np.dot(np.transpose(x), y)
    beta = np.dot(term1, term2)
    return beta

# Generate x: 20 observations of 5 regressors plus a column of ones
x = np.reshape(np.random.random(100), (20, 5))
unos = np.ones((20, 1))
x = np.hstack([unos, x])

# Generate y
y = np.reshape(np.random.random(20), (20, 1))

print(ols_estimator(x, y))

[[ 0.80220748]
 [-0.04419834]
 [ 0.18866963]
 [-0.27413913]
 [-0.25524798]
 [-0.15792029]]
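
As a cross-check, the normal-equations formula can be compared with np.linalg.lstsq, which solves the same least-squares problem without forming the inverse explicitly. This is a sketch under stated assumptions rather than part of the exercise solution: the seed, the random generator and the variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# 20 observations: a column of ones plus 5 random regressors
x = np.hstack([np.ones((20, 1)), rng.random((20, 5))])
y = rng.random((20, 1))

# Normal equations, as in the formula for beta above
beta_normal = np.linalg.inv(x.T @ x) @ (x.T @ y)

# Least-squares solver applied to the same data
beta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(x, y, rcond=None)

print(np.allclose(beta_normal, beta_lstsq))
```

For a well-conditioned design matrix such as this one the two estimates agree to within floating-point tolerance. The solver is often the safer choice when the columns of x are nearly collinear.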
