# Lecture 2: Numpy Array and Matplotlib Visualisation

In the last lecture, we studied the different containers of **PYTHON** like the list, string, dictionary, etc. 
We also saw how to control the flow in the code using *for* and *while* loops and *if-else* statements. 

In this lecture our focus will be on one of the most widely used **PYTHON OBJECTS**, called the *numpy.ndarray*. 
The standard package for array manipulation in **PYTHON** is *Numpy*. It is a third-party package, not part of the Python standard library. We will also see how we can visualize such arrays using a very versatile plotting library called *Matplotlib*. 

Advantages of *Numpy* : 
> 1. Extension package to Python for multi-dimensional arrays
> 2. More efficient in terms of memory access due to same type of items
> 3. Ideally suited for scientific computations. 

In [None]:
import numpy as np                              # Recommended way to import numpy 
import matplotlib.pyplot as plt                 # Recommended way to import all plotting commands

# Magic command to display matplotlib figures inline in the notebook
%matplotlib inline

In [None]:
a = np.array([0., 1., 2., 3.])      # Creating a 1D array of 4 elements
print type(a), len(a), a.shape, a.ndim, a.T

The efficiency gain can be easily demonstrated by timing an elementwise square. 

In [None]:
L = range(1000)
print type(L)
%timeit [i**2 for i in range(1000)]  # Using for loop to square element wise. 

In [None]:
a = np.arange(1000)
print type(a)
%timeit a**2                       # Element wise squaring in array
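
The timings above show the speed benefit. The memory footprint can be compared in a similar way. The following is a minimal sketch, assuming CPython and an explicit 64-bit integer dtype; the names `py_list` and `arr` are only illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype='int64')

# A list stores pointers to separate integer objects, so count both parts
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(i) for i in py_list)

# An array stores the raw values back to back: size * itemsize bytes
array_bytes = arr.nbytes

print("list  : about %d bytes" % list_bytes)
print("array : %d bytes (%d items x %d bytes)" % (array_bytes, arr.size, arr.itemsize))
```

The exact list size depends on the interpreter, but the array size is always the number of items times the item size.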

Creating multi-dimensional arrays :
> 1. Typing each element individually -- Not Practical!
> 2. Using the start:stop and spacing information
> 3. Using the start:stop and number of elements information.

In [None]:
# Creating multi-dimensional arrays. 
b = np.array([[0,1,2], [3,4,5]])
print b, b.shape

In [None]:
print len(b)                  # Returns the size of the first dimension. 
print b.shape                 # Returns the shape as a tuple: (number of rows, number of columns)
print b.ndim                  # Returns the number of dimensions of the array.

In [None]:
c = np.array([[[1], [2]], [[3], [4]]])
print c

In [None]:
print len(c)
print c.shape
print c.ndim

In [None]:
# Evenly spaced arrays using spacing information 

a = np.arange(10)                   # 0,1,...9 by default the spacing is 1.
print a

b = np.arange(1,9,2)                # start, end (not included), spacing
print b

In [None]:
# Evenly spaced arrays using number of points.

a = np.linspace(0,1,6)            # start, end (included), number of points
print a

b = np.linspace(0,1,6,endpoint=False)  # start, end (not included), number of points
print b

d = np.linspace(-1.0,5.0,7)
print d, 10**d
c = np.logspace(-1.0, 5.0, 7)
print c
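
The relation between spacing and number of points can be checked directly. The sketch below uses the `retstep` option of `np.linspace` and assumes nothing beyond numpy itself; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed
pts, step = np.linspace(0.0, 1.0, 6, retstep=True)
print("points = %s, step = %.1f" % (pts, step))

# The same points built from integer steps scaled by that spacing
same_pts = np.arange(6) * step
print(np.allclose(same_pts, pts))

# logspace(a, b, n) gives 10**x for each x in linspace(a, b, n)
exponents = np.linspace(-1.0, 5.0, 7)
decades = np.logspace(-1.0, 5.0, 7)
print(np.allclose(decades, 10.0**exponents))
```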


Most of the time, arrays are created with the methods above. However, in certain cases it may be required to initialize an array first and later replace its elements based on some criteria. **Numpy** provides various functions to initialize such arrays.

In [None]:
# Other common types of arrays needed for initialization.
a = np.ones(shape=(3,3))
print a

In [None]:
b = np.zeros(shape=(2,4,2))
print b

In [None]:
c = np.diag([1,3,5])
print c

In [None]:
A = np.array([[2,3],[3,4]])
print A.shape
B = np.zeros_like(A)
print B
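
A typical use of such an initialized array is to allocate it once and fill it in later. The sketch below is only an illustration: the criterion (even indices) and the name `squares` are hypothetical.

```python
import numpy as np

n = 6
squares = np.zeros(n)               # placeholder values, all 0.0
for i in range(n):
    if i % 2 == 0:                  # hypothetical criterion: even indices only
        squares[i] = i**2           # replace the placeholder

print(squares)
```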

**Numpy** has a dedicated submodule, *numpy.random*, for generating **random numbers** using the *Mersenne Twister Pseudo Random Number Generator*.

In [None]:
a = np.random.rand(4)                 # 1D array with 4 uniform random numbers in [0, 1)
print a

l1 = 3.0
l2 = 6.0
a = l1 + (l2 - l1)*np.random.rand(10) # 1D array with 10 uniform random numbers in [3.0, 6.0)
print a

In [None]:
np.random.seed(1234)
sigma = 1.0
mu = 0.0
b = sigma*np.random.randn(2,4) + mu     # 2D array of normal random numbers, i.e., Gaussian distribution with mean mu and standard deviation sigma

print b
np.random.seed(1234)                  # This sets the seed for generating random numbers, useful for repeatability.
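
The effect of the seed can be verified by drawing the same numbers twice. This is a small sketch using the same seed value as above; the names `first`, `second` and `third` are illustrative.

```python
import numpy as np

np.random.seed(1234)
first = np.random.rand(3)

np.random.seed(1234)                # reset to the same starting state
second = np.random.rand(3)

third = np.random.rand(3)           # continues the sequence, so it differs

print(np.array_equal(first, second))
print(np.array_equal(second, third))
```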

In [None]:
np.random?

A numpy.ndarray object always holds elements of a **single type**; values of mixed types are converted to a common type when the array is created. 

One can check the type of elements inside the numpy array using the attribute *dtype*.

In [None]:
a = np.array([1,2,3])
print a.dtype

b = np.array([1.,2.,3.])
bi = np.array([1.,2.,3.], dtype='int64')
print b.dtype, bi.dtype

c = np.array([True, False, True, False , False])
print c.dtype

d = np.zeros(shape=(4,), dtype='complex128')
d.real = np.random.rand(4) 
d.imag = np.random.randn(4)
print d.dtype, d
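
The single-type rule has practical consequences when values of different types meet. The sketch below shows two of them, assuming an explicit `int64` dtype so that the result does not depend on the platform default; the variable names are illustrative.

```python
import numpy as np

mixed = np.array([1, 2.5, 3])              # the integers are upcast to float
print(mixed.dtype)

ints = np.array([1, 2, 3], dtype='int64')
ints[0] = 2.7                              # stored in an integer array: the fraction is dropped
print(ints)

floats = ints.astype('float64')            # astype returns a converted copy
floats[0] = 2.7
print(floats)
```

Converting with `astype` before assigning avoids the silent truncation.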

**Slicing and Indexing of Numpy Arrays**

The items in the array can be accessed and assigned in the same manner as for *lists*. 
*NOTE* : Indexing begins from 0, just like in the *C* language. 

In [None]:
#1D array -- exactly same as list.

a = np.arange(2.,42.,3)
print a
indxarr = np.arange(1,len(a),2)
for j in indxarr:
    print "The %d element of Array is %f"%(j,a[j])
    
b = a[::-1]                            # Reversing the array 'a' and storing the result in 'b'
print b[:4]                            # printing first 4 elements of b. 

In [None]:
#2D Array -- Interesting things can be done using some very cool indexing.

a = np.ones(shape=(3,4))
a[2,3] = a[1,2] = 0                    # setting individual elements 
a[-1] = -1                             # Complete last row of the array
a[:, -1] = -20                         # Complete last column of the array --> roots of Broadcasting!!
print a
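
Slices can be combined along both axes to pick rows, columns or sub-blocks. The sketch below uses a small array whose entries are all different so that each selection is easy to follow; the name `grid` is illustrative.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

print(grid[1])          # second row
print(grid[:, 1])       # second column
print(grid[:2, 1:3])    # first two rows, columns 1 and 2
print(grid[::2, ::3])   # every 2nd row, every 3rd column
```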

**BROADCASTING**

As we saw above, the basic operations on numpy arrays work elementwise, so one would expect that two arrays must have the same shape to be added. With the technique of *Broadcasting*, it is also possible to operate on arrays of different shapes, provided *numpy* can stretch them to a common shape.

<img src='numpy_broadcasting.png'>

In [None]:
a = np.arange(0,40,10)
b = np.array([0.,1., 2.])
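#METHOD 0: direct addition fails with a ValueError, shapes (4,) and (3,) cannot be broadcast
#c = a + b
#print c

#METHOD 1: build the 4x3 matrix explicitly by copying 'a' into each column
#A = np.zeros([4,3])
#for i in range(3): A[:,i] = a
#c = A + b
#print c


#METHOD 2: add a new axis so that shapes (4,1) and (3,) broadcast to (4,3)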
a = np.arange(0,40,10)
b = np.array([0.,1., 2.])
A = a[:, np.newaxis]
print a.shape, A.shape
c = A + b  

print a 
print A 
print b
print c
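
The shapes involved decide whether broadcasting works. The sketch below repeats the example with the shapes written out and catches the error for the incompatible case; the names `col`, `total` and `failed` are illustrative.

```python
import numpy as np

a = np.arange(0, 40, 10)         # shape (4,)
b = np.array([0., 1., 2.])       # shape (3,)

col = a[:, np.newaxis]           # shape (4, 1)
total = col + b                  # (4, 1) and (3,) broadcast to (4, 3)
print(total.shape)

try:
    a + b                        # (4,) and (3,): sizes 4 and 3 differ and neither is 1
    failed = False
except ValueError as err:
    failed = True
    print("cannot broadcast: %s" % err)
```

Two dimensions are compatible when they are equal or when one of them is 1.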

**!IMPORTANT!: Copies and Views**

A slicing operation like the ones we saw above only creates a view of the original array; no new memory is allocated for it. Therefore one has to be very careful when modifying a view, because the original array changes as well. This is tricky, but it is designed to ensure efficient memory handling. 

**REMEMBER** - VIEWS SHARE THE SAME MEMORY WITH THE ARRAY AND COPIES DON'T. SO MODIFY THE COPIES!

This is clarified by the following example. 

In [None]:
a = np.arange(2.,42.,3)
b = a[::2]
print "The array a = ",a
print "The array b = ",b
b[0] = -100.
print "The new array b = ",b                           # This is expected. 
print "The new array a = ",a                             # !!!!!

print np.may_share_memory(a,b)

In [None]:
a = np.arange(2.,42.,3)
b = a[::2].copy()                                      # This forces numpy to allocate new memory for b
print "The array a = ",a
print "The array b = ",b
b[0] = -100.
print "The new array b = ",b                           # This is expected. 
print "The new array a = ",a                             # Unchanged, because b is a copy.

print np.may_share_memory(a,b)
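
Besides `np.may_share_memory`, the `base` attribute of an array tells whether it is a view. The following is a minimal sketch; the names `view` and `copied` are illustrative.

```python
import numpy as np

a = np.arange(10)

view = a[2:5]                    # basic slice: a view of a
copied = a[2:5].copy()           # explicit copy with its own memory

print(view.base is a)            # a view remembers the array it came from
print(copied.base is None)       # a copy owns its data
print(np.shares_memory(a, view))
print(np.shares_memory(a, copied))
```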

**Fancy Indexing with Numpy**

Indexing and slicing in numpy offer many flexible options that can be used for smart manipulation. 
For example, the index of a numpy array need not be a single integer; it can also be an array of *booleans* or a *list* of integers.

In [None]:
np.random.seed(3)

a = np.random.rand(15)                    # An array of 15 random floats with a chosen seed.
print a
a[4:8] = np.log(0)                        # log(0) evaluates to -inf (numpy emits a RuntimeWarning)
# Now the task is to find all values in this array that are infinite 
print (np.isinf(a))
mask = (np.isinf(a))

# Select the infinite values using the boolean mask
masked_array = a[mask]                           # or simply masked_array = a[np.isinf(a)]

print "The original array : ", a 
print "The masked array : ",masked_array

# Replace these values by -1
a[mask] = -1 

print "The manipulated array ", a
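
Both kinds of fancy index can be tried on a small array. The sketch below uses a list of integer positions and a boolean mask built from a condition; the names `picked` and `large` are illustrative.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

picked = a[[0, 2, 4]]            # list of integer positions
large = a[a > 25]                # boolean mask built from a condition
print(picked)
print(large)

a[a > 25] = 0                    # a mask can also be used for assignment
print(a)
```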

**MATHEMATICAL OPERATIONS**

*Numpy* provides a suite of mathematical, logical and reduction methods that can be used to manipulate arrays 
**elementwise**. It is very important that the shapes of the arrays involved either match or can be broadcast to a common shape. 

*BASIC OPERATIONS*

In [None]:
a = np.arange(6, dtype='float64')
b = np.linspace(1.0,2.0,6)
print a, "+ 2*", b, '=',  a+2.0*b

c = np.linspace(-2.0, -6.0, 5)
print a + c                               # Raises ValueError: shapes (6,) and (5,) do not match

In [None]:
# Simple power, multiplication etc are done elementwise.

c = np.ones(shape=(3,3))
print "Elementwise multiplication "
print c*c             

print "Matrix Multiplication"
print np.dot(c,c)                         # c.dot(c) will also be fine.
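
For a matrix of ones the different products are hard to tell apart. The sketch below uses two small non-symmetric matrices (chosen only for illustration) to compare the elementwise product, `np.dot` and `np.inner`.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [2, 3]])

elementwise = A * B              # multiplies matching entries
matrix_product = np.dot(A, B)    # rows of A times columns of B
inner_product = np.inner(A, B)   # sums over the last axis of both: equals np.dot(A, B.T)

print(elementwise)
print(matrix_product)
print(inner_product)
```

For 2D arrays `np.inner(A, B)` matches the matrix product only when `B` is symmetric.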

*TRANSCENDENTAL FUNCTIONS* : Numpy also provides transcendental functions such as sine, cosine, tangent, log, exp etc.

In [None]:
a = np.linspace(0.,2.*np.pi,12)
plt.plot(a, np.sin(a), 'r-o')                           # The first 1D plot using matplotlib... more later.


*LOGICAL FUNCTIONS* : All sorts of logical operations can be done elementwise between two arrays of the same size.

In [None]:
a = np.linspace(2.0,6.0,5)
b = np.linspace(1.0,10.0,5)
b[2] = 0
a[2] = a[-1] = 0
print np.logical_or(a, b)                 # Nonzero entries count as True
print (a == b)
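
The logical functions combine naturally with `np.any` and `np.all`. The sketch below writes out the same values that the cell above produces for `a` and `b`, and adds a note on comparing floats; the names `either`, `both` and `x` are illustrative.

```python
import numpy as np

a = np.array([2.0, 3.0, 0.0, 5.0, 0.0])
b = np.array([1.0, 3.25, 0.0, 7.75, 10.0])

either = np.logical_or(a, b)     # True where at least one entry is nonzero
both = np.logical_and(a, b)      # True where both entries are nonzero
print(either)
print(both)

print(np.any(a == b))            # is at least one pair equal?
print(np.all(a == b))            # are all pairs equal?

# Exact comparison of floats can surprise; isclose allows a small tolerance
x = np.array([0.1 + 0.2])
print(x == 0.3)
print(np.isclose(x, 0.3))
```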

*REDUCTION OPERATIONS* : Very useful in most of the cases that we will encounter. They involve summing, finding the min and max of an array, finding the index of a particular element etc.

In [None]:
x = np.arange(1,8,2)
print x

print "The sum is : %d"%np.sum(x)
print x.sum()

In [None]:
x2D = np.array([[1,1,3],[3,4,7]])
print x2D.shape
print "sum all rows for a given column", np.sum(x2D, axis=0)
print "this is same as ", np.array([np.sum(x2D[:,0]), np.sum(x2D[:,1]), np.sum(x2D[:,2])])

print "sum all columns for a given row", np.sum(x2D, axis=1)
print "this is same as ", np.array([np.sum(x2D[0,:]), np.sum(x2D[1,:])])
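
The other reductions follow the same `axis` convention as `np.sum`. The sketch below applies a few of them to the same `x2D` array; the names `flat_index`, `row` and `col` are illustrative.

```python
import numpy as np

x2D = np.array([[1, 1, 3], [3, 4, 7]])

smallest = x2D.min()                 # over the whole array
col_max = x2D.max(axis=0)            # maximum of each column
col_mean = x2D.mean(axis=0)          # mean of each column

flat_index = x2D.argmax()            # position of the maximum in the flattened array
row, col = np.unravel_index(flat_index, x2D.shape)

print(smallest)
print(col_max)
print(col_mean)
print("maximum is at row %d, column %d" % (row, col))
```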


In [None]:
# Flattening from multi-dimensional array to 1D. 
x1D = x2D.flatten()
print np.shape(x1D), x1D

# Reshaping the array for a given size. Note n*m*l = number of elements in array.
x2Dn = x1D.reshape(3,2)
print x2Dn
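
Flattening ties back to copies and views: `flatten` always returns a copy, while `ravel` returns a view when the memory layout allows it. The following is a minimal sketch with illustrative names.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

flat_copy = x.flatten()          # always a new array
flat_view = x.ravel()            # a view here, because x is contiguous in memory

flat_view[0] = 99                # also changes x
flat_copy[1] = -1                # x is unaffected

auto = x.reshape(3, -1)          # -1 lets numpy work out this dimension

print(x)
print(auto.shape)
```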



**MATPLOTLIB BASIC PLOTTING**

We will start with a basic line plot and then modify it step by step, learning how to tinker with a matplotlib figure. 

In [None]:
x = np.linspace(-np.pi, np.pi, 256, endpoint=True)          # define an array using linspace.
C, S = np.cos(x), np.sin(x)
plt.plot(x, C, color='blue')
plt.plot(x, S, color='green')

In [None]:
plt.plot(x, C, color='black', linestyle='--')
plt.plot(x, S, color='green', linewidth=2)

In [None]:
plt.figure(figsize=[4,4], dpi=80)
plt.subplot(1,1,1)                  # No. of rows, No of columns and index. 
plt.plot(x, C, color='black', linestyle='--', label=r'Cosine : cos$\theta$')
plt.plot(x, S, color='green', linewidth=2, label=r'Sine : sin$\theta$')
plt.xlim([-5.0,5.])
plt.xticks(np.linspace(-5.,5.,9,endpoint=True), fontsize=12)
plt.ylim([-2.0,2.])
plt.yticks(np.linspace(-2.,2.,5,endpoint=True), ['Mon', 'Tue', 'Wed', 'Thur', 'Fri'])
#plt.legend(loc='upper right')
plt.xlabel('x')
plt.ylabel(r'$\rho , \eta$')

ax = plt.gca()
ax.spines['right'].set_color('none')
ax.yaxis.set_ticks_position('left')

#ax = plt.gca()
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')

plt.figtext(0.5, 0.2,r'Trigonometry',fontsize=24)

#Annotation!! Slightly Complex. 
t = 2.0*np.pi/3.
plt.scatter([t,], [np.sin(t),], 50,color='k',marker='o')
plt.annotate(r'Value = %.2f'%np.sin(t), 
             xy=(t, np.sin(t)), xycoords='data', 
             xytext=(+10, 30), textcoords='offset points', 
             arrowprops=dict(arrowstyle='<->',connectionstyle='arc3,rad=0.2'))
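
The three arguments of `plt.subplot` (rows, columns, index) matter once a figure has more than one panel. The sketch below places the cosine and sine curves in two stacked panels; the figure size and the names `ax_top` and `ax_bottom` are illustrative choices.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 256)

fig = plt.figure(figsize=[6, 4])

ax_top = plt.subplot(2, 1, 1)            # 2 rows, 1 column, first panel
ax_top.plot(x, np.cos(x), color='black', linestyle='--', label='cos')
ax_top.legend(loc='upper right')

ax_bottom = plt.subplot(2, 1, 2)         # 2 rows, 1 column, second panel
ax_bottom.plot(x, np.sin(x), color='green', linewidth=2, label='sin')
ax_bottom.set_xlabel('x')
ax_bottom.legend(loc='upper right')
```

Each call to `plt.subplot` returns an axes object, so the panels can be customised independently.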