### Data Analysis Packages
There are four key packages that are most widely used for data analysis.

• NumPy

• SciPy

• Matplotlib

• Pandas

Pandas, NumPy, and Matplotlib play a major role and are used in almost
all data analysis tasks.

![1.png](data/1.png)

### Data analysis packages

### NumPy
NumPy is the core library for scientific computing in Python. It provides a high-performance
multidimensional array object and tools for working with these arrays. It is the
successor to the Numeric package.

### Array
A NumPy array is a collection of values of the same data type, indexed by a tuple of
non-negative integers.

#### Check the versions of libraries

In [1]:
# numpy
import numpy
print('numpy: {}'.format(numpy.__version__))

numpy: 1.16.5


#### Example code for initializing NumPy array

In [2]:
import numpy as np

# Create a rank 1 array
a = np.array([0, 1, 2])
print (type(a))

# this will print the shape (size along each dimension) of the array
print (a.shape)
print (a[0])
print (a[1])
print (a[2])

# Change an element of the array
a[0] = 5
print (a)

<class 'numpy.ndarray'>
(3,)
0
1
2
[5 1 2]


In [3]:
# Create a rank 2 array
b = np.array([[0,1,2],[3,4,5]])
print (b.shape)
print (b)
print (b[0, 0], b[0, 1], b[1, 0])

(2, 3)
[[0 1 2]
 [3 4 5]]
0 1 3


### Creating NumPy Array
NumPy also provides many built-in functions to create arrays.

#### Creating NumPy array

In [4]:
# Create a 3x3 array of all zeros
a = np.zeros((3,3))
print (a)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [5]:
# Create a 2x2 array of all ones
b = np.ones((2,2))
print (b)

[[1. 1.]
 [1. 1.]]


In [6]:
# Create a 3x3 constant array
c = np.full((3,3), 7)
print (c)

[[7 7 7]
 [7 7 7]
 [7 7 7]]


In [7]:
# Create a 3x3 array filled with random values
d = np.random.random((3,3))
print (d)

[[0.53024278 0.5150547  0.56480921]
 [0.6477523  0.65582511 0.27755771]
 [0.5426302  0.27742299 0.57152831]]


In [8]:
# Create a 3x3 identity matrix
e = np.eye(3)
print (e)

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [9]:
# convert list to array
f = np.array([2, 3, 1, 0])
print (f)

[2 3 1 0]


In [10]:
# arange() will create arrays with regularly incrementing values
g = np.arange(20)
print (g)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [11]:
# note mix of tuple and lists
h = np.array([[0, 1,2.0],[0,0,0],(1+1j,3.,2.)])
print (h)

[[0.+0.j 1.+0.j 2.+0.j]
 [0.+0.j 0.+0.j 0.+0.j]
 [1.+1.j 3.+0.j 2.+0.j]]


In [13]:
# create an array of range with float data type
i = np.arange(1, 8, dtype=float)
print (i)

[1. 2. 3. 4. 5. 6. 7.]


In [14]:
# linspace() will create arrays with a specified number of items which are
# spaced equally between the specified beginning and end values
j = np.linspace(2., 4., 5)
print (j)

[2.  2.5 3.  3.5 4. ]


In [15]:
# indices() will create a set of arrays stacked as a one-higher
# dimensioned array, one per dimension with each representing variation
# in that dimension
k = np.indices((2,2))
print (k)

[[[0 0]
  [1 1]]

 [[0 1]
  [0 1]]]


### Data Types
All items in an array share the same data type. NumPy's array-construction functions
accept an optional argument to specify the required data type explicitly; otherwise
NumPy infers one from the input.

#### NumPy datatypes

In [16]:
# Let numpy choose the datatype
x = np.array([0, 1])
y = np.array([2.0, 3.0])

# Force a particular datatype
z = np.array([5, 6], dtype=np.int64)
print (x.dtype, y.dtype, z.dtype)

int32 float64 int64
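
The default integer type chosen by NumPy depends on the platform and NumPy version, so the `int32` shown above may appear as `int64` elsewhere. As a minimal sketch of controlling the type after an array already exists, the following uses `astype()`; the sample values are illustrative only.

```python
import numpy as np

x = np.array([1.7, 2.2, 3.9])
print(x.dtype)               # float64

# astype() returns a new array converted to the requested type;
# converting float to int truncates toward zero
xi = x.astype(np.int64)
print(xi, xi.dtype)          # [1 2 3] int64

# Mixing integers and floats in one array upcasts to a common type
m = np.array([1, 2.5])
print(m.dtype)               # float64
```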


### Array Indexing
NumPy offers several ways to index into arrays. The standard Python x[obj] syntax can be
used to index a NumPy array, where x is the array and obj is the selection.

There are three kinds of indexing available:

• Field access

• Basic slicing

• Advanced indexing

### Field Access
If the ndarray object is a structured array, its fields can be accessed by
indexing the array with strings, much like a dictionary. Indexing x['field-name'] returns a new
view of the array that contains only the data in the specified field. The view has data type
x.dtype['field-name'] and the same shape as x, except when the field is a subarray.
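
The listings in this section work on plain arrays, so here is a minimal sketch of field access on a structured array. The field names `name` and `age` and the sample records are made up for illustration.

```python
import numpy as np

# Hypothetical structured array with two fields: 'name' and 'age'
people = np.array([('Ann', 31), ('Bob', 25)],
                  dtype=[('name', 'U10'), ('age', 'i4')])

# Indexing with a field name returns a view of just that field
ages = people['age']
print(ages)            # [31 25]
print(ages.dtype)      # int32

# The view shares memory with the original array,
# so writing through it changes the original records
ages[0] = 32
print(people['age'])   # [32 25]
```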

#### Basic slicing (one-dimensional array)

In [17]:
x = np.array([5, 6, 7, 8, 9])
x[1:7:2]

array([6, 8])

In [18]:
print (x[-2:5])
print (x[-1:1:-1])

[8 9]
[9 8 7]


In [19]:
x[4:]

array([9])

#### Basic slicing

In [20]:
y = np.array([[[1],[2],[3]], [[4],[5],[6]]])
print ("Shape of y: ", y.shape)
y[1:3]

Shape of y:  (2, 3, 1)


array([[[4],
        [5],
        [6]]])

In [21]:
# Create a rank 2 array with shape (3, 4)
a = np.array([[5,6,7,8], [1,2,3,4], [9,10,11,12]])
print ("Array a:", a)

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[6 7]
#  [2 3]]
b = a[:2, 1:3]
print ("Array b:", b)

Array a: [[ 5  6  7  8]
 [ 1  2  3  4]
 [ 9 10 11 12]]
Array b: [[6 7]
 [2 3]]


In [22]:
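# A slice of an array is a view into the same data, so modifying it
# will modify the original array
print (a[0, 1])
b[0, 0] = 77    # b[0, 0] is the same piece of data as a[0, 1]
print (a[0, 1])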

6
77
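
The listing above changes `a` through `b` because a basic slice is a view on the same data. When an independent subarray is wanted, one option is an explicit `copy()`. This is a minimal sketch; the names `src`, `view`, and `indep` are illustrative.

```python
import numpy as np

src = np.array([[5, 6, 7, 8], [1, 2, 3, 4], [9, 10, 11, 12]])

view = src[:2, 1:3]          # basic slice: shares memory with src
indep = src[:2, 1:3].copy()  # explicit copy: owns its data

indep[0, 0] = 77             # does not touch src
print(src[0, 1])             # 6

view[0, 0] = 77              # writes through to src
print(src[0, 1])             # 77

print(np.shares_memory(src, view), np.shares_memory(src, indep))  # True False
```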


In [23]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print (row_r1, row_r1.shape) # Prints "[1 2 3 4] (4,)"
print (row_r2, row_r2.shape) # Prints "[[1 2 3 4]] (1, 4)"

[1 2 3 4] (4,)
[[1 2 3 4]] (1, 4)


In [24]:
# We can make the same distinction when accessing columns of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print (col_r1, col_r1.shape,'\n') # Prints "[77  2 10] (3,)"
print (col_r2, col_r2.shape)

[77  2 10] (3,) 

[[77]
 [ 2]
 [10]] (3, 1)


### Advanced Indexing
Integer array indexing: Whereas basic slicing always yields a subarray of the original,
integer array indexing allows you to construct arbitrary arrays from the data of another array.

#### Advanced indexing

In [25]:
# An example of integer array indexing on the array a defined above.
# The returned array will have shape (2,) and contains a[0, 0] and a[1, 1]
print (a[[0, 1], [0, 1]])

[5 2]


In [26]:
# The above example of integer array indexing is equivalent to this:
print (np.array([a[0, 0], a[1, 1]]))

[5 2]


In [27]:
# When using integer array indexing, you can reuse the same
# element from the source array:
print (a[[0, 0], [1, 1]])

[77 77]


In [28]:
# Equivalent to the previous integer array indexing example
print (np.array([a[0, 1], a[0, 1]]))

[77 77]
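
A common use of integer array indexing is to select or modify one element from each row of a matrix. The following is a minimal sketch under that assumption; the array `m` and the column indices `cols` are illustrative.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])

# One column index per row of m
cols = np.array([0, 2, 0, 1])

# Pair row i with column cols[i]
print(m[np.arange(4), cols])   # [ 1  6  7 11]

# The same index pairs can be used to modify the selected elements in place
m[np.arange(4), cols] += 10
print(m)
```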


#### Boolean array indexing

In [29]:
a=np.array([[1,2], [3, 4], [5, 6]])

# Find the elements of a that are bigger than 2
print (a > 2, '\n')

# to get the actual value
print (a[a > 2])

[[False False]
 [ True  True]
 [ True  True]] 

[3 4 5 6]
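
Boolean masks can also be combined and used on the left-hand side of an assignment. This is a short sketch; the conditions chosen here are arbitrary examples.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) or | (or); the parentheses are required
mask = (a > 2) & (a % 2 == 0)
print(a[mask])       # [4 6]

# Assigning through a mask changes only the selected elements
a[a > 4] = 0
print(a)
```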


### Array Math
Basic mathematical functions are available both as operators and as functions in NumPy.
They operate element-wise on arrays.

#### Array math

In [30]:
import numpy as np
x=np.array([[1,2],[3,4],[5,6]])
y=np.array([[7,8],[9,10],[11,12]])

# Elementwise sum; both produce the same array
print(x+y,'\n')
print(np.add(x, y))

[[ 8 10]
 [12 14]
 [16 18]] 

[[ 8 10]
 [12 14]
 [16 18]]


In [31]:
# Elementwise difference; both produce the same array
print(x-y,'\n')
print(np.subtract(x, y))

[[-6 -6]
 [-6 -6]
 [-6 -6]] 

[[-6 -6]
 [-6 -6]
 [-6 -6]]


In [32]:
# Elementwise product; both produce the same array
print(x*y,'\n')
print(np.multiply(x, y))

[[ 7 16]
 [27 40]
 [55 72]] 

[[ 7 16]
 [27 40]
 [55 72]]


In [33]:
# Elementwise division; both produce the same array
print(x/y,'\n')
print(np.divide(x, y))

[[0.14285714 0.25      ]
 [0.33333333 0.4       ]
 [0.45454545 0.5       ]] 

[[0.14285714 0.25      ]
 [0.33333333 0.4       ]
 [0.45454545 0.5       ]]


In [34]:
# Elementwise square root; produces the array
print(np.sqrt(x))

[[1.         1.41421356]
 [1.73205081 2.        ]
 [2.23606798 2.44948974]]


We can use the dot function to calculate inner products of vectors, to multiply matrices, or to multiply a vector by a matrix.

#### Array math (continued)

In [35]:
x=np.array([[1,2],[3,4]])
y=np.array([[5,6],[7,8]])

a=np.array([9,10])
b=np.array([11, 12])

# Inner product of vectors; both produce 219
print(a.dot(b))
print(np.dot(a, b))

219
219


In [36]:
# Matrix / vector product; both produce the rank 1 array [29 67]
print(x.dot(a),'\n')
print(np.dot(x, a))

[29 67] 

[29 67]


In [37]:
# Matrix / matrix product; both produce the rank 2 array
print(x.dot(y),'\n')
print(np.dot(x, y))

[[19 22]
 [43 50]] 

[[19 22]
 [43 50]]
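
On Python 3.5 or later, the `@` operator is an alternative spelling for the same products on one- and two-dimensional arrays. A minimal sketch, reusing the values from the listings above:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
a = np.array([9, 10])

# Matrix / matrix and matrix / vector products with the @ operator
print(x @ y)
print(x @ a)                                 # [29 67]
print(np.array_equal(x @ y, np.dot(x, y)))   # True

# Note that * is the elementwise product, not the matrix product
print(x * y)
```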


NumPy provides many useful functions for performing computations on arrays. One
of the most useful is sum.

#### Sum function

In [38]:
x=np.array([[1,2],[3,4]])

# Compute sum of all elements
print (np.sum(x))

# Compute sum of each column
print (np.sum(x, axis=0))

# Compute sum of each row
print (np.sum(x, axis=1))

10
[4 6]
[3 7]
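
Other reductions accept the same `axis` argument as `sum`. The sketch below also uses `keepdims=True`, which keeps the reduced axis with length 1 so the result lines up with the original array; dividing each row by its sum is just an illustrative use.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

print(np.mean(x, axis=0))    # mean of each column: [2. 3.]
print(np.max(x, axis=1))     # maximum of each row: [2 4]

# keepdims=True keeps the reduced axis, giving shape (2, 1) instead of (2,)
row_sums = np.sum(x, axis=1, keepdims=True)
print(row_sums.shape)        # (2, 1)

# Divide each row by its own sum so that every row adds up to 1
normalized = x / row_sums
print(normalized)
```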


Transposing is a common matrix operation; it can
be done using the T attribute of an array object.

#### Transpose function

In [39]:
x=np.array([[1,2], [3,4]])
print(x,'\n')
print(x.T)

[[1 2]
 [3 4]] 

[[1 3]
 [2 4]]


In [40]:
# Note that taking the transpose of a rank 1 array does nothing:
v=np.array([1,2,3])
print(v,'\n')
print(v.T)

[1 2 3] 

[1 2 3]
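
If a column vector is needed from a rank 1 array, one approach is to add an axis first. A minimal sketch using `np.newaxis` and `reshape`:

```python
import numpy as np

v = np.array([1, 2, 3])

# Two equivalent ways to turn a rank 1 array into a (3, 1) column
col = v[:, np.newaxis]
col2 = v.reshape(-1, 1)
print(col.shape, col2.shape)   # (3, 1) (3, 1)

# A rank 2 array does change under transpose
print(col.T.shape)             # (1, 3)
```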


### Broadcasting
Broadcasting enables arithmetic operations to be performed between arrays of
different shapes, for example adding a vector to every row of a matrix.
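
Two shapes are compatible for broadcasting when, compared from the trailing dimension backwards, each pair of sizes is either equal or contains a 1; missing leading dimensions are treated as 1. The sketch below checks a few shapes with `np.broadcast`; the shapes are arbitrary examples.

```python
import numpy as np

a = np.ones((3, 3))
v = np.ones(3)
# (3, 3) with (3,): v is treated as shape (1, 3) and stretched over the rows
print(np.broadcast(a, v).shape)       # (3, 3)

col = np.ones((3, 1))
row = np.ones((1, 4))
# (3, 1) with (1, 4): both arrays are stretched
print(np.broadcast(col, row).shape)   # (3, 4)

# (2, 3) with (2,): trailing sizes 3 and 2 differ and neither is 1
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as err:
    print("cannot broadcast:", err)
```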

#### Broadcasting

In [41]:
# create a matrix
a = np.array([[1,2,3], [4,5,6], [7,8,9]])

# create a vector
v = np.array([1, 0, 1])

# create an empty (uninitialized) matrix with the same shape as a
b = np.empty_like(a)

# Add the vector v to each row of the matrix a with an explicit loop
for i in range(3):
    b[i, :] = a[i, :] + v
print (b)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]]


#### Broadcasting for large matrix

In [42]:
# Stack 3 copies of v on top of each other
vv = np.tile(v, (3, 1))
print (vv)

[[1 0 1]
 [1 0 1]
 [1 0 1]]


In [43]:
# Add a and vv elementwise
b = a + vv
print (b)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]]


#### Broadcasting using NumPy

In [44]:
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
v = np.array([1, 0, 1])

# Add v to each row of a using broadcasting
b = a + v
print (b)

[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]]


#### Applications of broadcasting

In [45]:
# Compute outer product of vectors
# v has shape (3,)
v = np.array([1,2,3])

# w has shape (2,)
w = np.array([4,5])

# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:
print (np.reshape(v, (3, 1)) * w)

[[ 4  5]
 [ 8 10]
 [12 15]]


In [46]:
# Add a vector to each row of a matrix
x = np.array([[1,2,3],[4,5,6]])

# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3)
print(x + v)

[[2 4 6]
 [5 7 9]]


In [47]:
# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcast
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column
print((x.T + w).T)


[[ 5  6  7]
 [ 9 10 11]]


In [48]:
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w,(2,1)))

[[ 5  6  7]
 [ 9 10 11]]


In [49]:
# Multiply a matrix by a constant:
# x has shape (2, 3). NumPy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3)
print(x * 2)

[[ 2  4  6]
 [ 8 10 12]]
