# Data wrangling & manipulation using NumPy and Pandas in Python - Part 1

NumPy, which stands for Numerical Python, is a library consisting of multidimensional array objects and a collection of routines for processing those arrays. Using NumPy, mathematical and logical operations on arrays can be performed. This tutorial explains the basics of NumPy such as its architecture and environment.

### Operations using NumPy

 * Mathematical and logical operations on arrays.
 * Operations related to linear algebra. NumPy has built-in functions for linear algebra and random number generation.
 
The ndarray object consists of a contiguous one-dimensional segment of computer memory, combined with an indexing scheme that maps each item to a location in the memory block.
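As a rough illustration of the memory block and indexing scheme described above, the sketch below inspects an array's `strides` (the number of bytes to step along each axis). The dtype is fixed to `np.int64` here purely so the numbers are predictable; that choice is an assumption of this example.

```python
import numpy as np

# A 2x3 array of 8-byte integers stored in one contiguous block of 48 bytes
a = np.arange(6, dtype=np.int64).reshape(2, 3)

print(a.flags['C_CONTIGUOUS'])  # True: rows are laid out one after another
print(a.strides)                # (24, 8): bytes to the next row / next column
print(a.nbytes)                 # 48
```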

In [143]:
# in order to use numpy we should import it first
import numpy as np
# Here we import the numpy package under the alias "np". Whenever we want to use numpy, we refer to it as "np"

a = np.array([1,2,3]) # We are creating a 1-d array 
print (a) # we are printing the array

[1 2 3]


In [144]:
a = np.array([[1, 2], [3, 4]]) # We are creating a 2d array
print (a)

[[1 2]
 [3 4]]


### 1) numpy.arange
**Syntax:** numpy.arange(start, stop, step)

* **start** : number, optional

Start of interval. The interval includes this value. The default start value is 0.

* **stop** : number

End of interval. The interval does not include this value, except in some cases where step is not an integer and floating point round-off affects the length of out.

* **step** : number, optional

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified as a positional argument, start must also be given.

In [145]:
# Example 1
import numpy as np
np.arange( 10, 30, 5 )

array([10, 15, 20, 25])

In [146]:
a = np.arange(15) # Creates an array of numbers from 0 to 14. Similar to the built-in "range" function in Python.
a

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
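The note on `stop` above mentions that a non-integer step can change the length of the result. A minimal sketch of that effect follows, together with `np.linspace`, which takes the number of points instead of a step. The specific values are chosen only for illustration.

```python
import numpy as np

# With a fractional step, round-off decides whether the end point is included
print(np.arange(1.0, 1.3, 0.1))

# linspace takes the number of points instead of a step, so the length is fixed
points = np.linspace(1.0, 1.3, 4)
print(points)
```

When the exact number of elements matters, `np.linspace` is usually the safer choice for floating-point ranges.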

### 2) numpy.reshape
**Syntax:** numpy.reshape(a, newshape)

_Gives a new shape to an array without changing its data._


* **a** : Array to be reshaped.

* **newshape** : int or tuple of ints
The new shape should be compatible with the original shape.

In [147]:
a = np.arange(15).reshape(3, 5) # Change the 1-d array into 3*5 matrix
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
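To show what "compatible" means for the new shape, here is a short sketch: one dimension may be given as `-1` so that NumPy infers it, and a shape whose total size differs from the original raises a `ValueError`.

```python
import numpy as np

a = np.arange(15)

b = a.reshape(3, -1)   # -1 asks NumPy to work out the missing dimension
print(b.shape)         # (3, 5)

try:
    a.reshape(4, 4)    # 16 slots cannot hold 15 elements
except ValueError as err:
    print("reshape failed:", err)
```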

### 3) ndarray.shape

The dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.

In [148]:
a.shape

(3, 5)

### 4) ndarray.ndim
The number of axes (dimensions) of the array.

In [149]:
a.ndim

2

### 5) ndarray.dtype
An object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.

In [150]:
a.dtype

dtype('int32')
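The default integer dtype can differ between platforms and NumPy versions, so the `int32` shown above may appear as `int64` elsewhere. A minimal sketch of requesting a dtype explicitly and converting between dtypes:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)   # request 64-bit integers explicitly
print(a.dtype)                            # int64

b = a.astype(np.float32)                  # astype returns a converted copy
print(b.dtype)                            # float32
print(a.dtype)                            # int64 (the original is unchanged)
```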

### 6) ndarray.itemsize
The size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8), while one of type int32 has itemsize 4 (=32/8). It is equivalent to ndarray.dtype.itemsize.

In [151]:
a.itemsize

4

### 7) ndarray.size
The total number of elements of the array. This is equal to the product of the elements of shape.

In [152]:
a.size

15
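The attributes above are related: `size` is the product of `shape`, and `size * itemsize` gives the total memory used by the elements (`nbytes`). A small sketch, with the dtype fixed to `np.int32` so the byte counts match the outputs shown above:

```python
import numpy as np

a = np.arange(15, dtype=np.int32).reshape(3, 5)

print(a.size)                              # 15 elements
print(a.itemsize)                          # 4 bytes per element
print(a.nbytes)                            # 60 bytes in total
print(a.size == a.shape[0] * a.shape[1])   # True
```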

### 8) type(ndarray)
Returns the type of the object.


In [153]:
type(a)

numpy.ndarray

**Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.
The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array whose initial content is uninitialized: the values are arbitrary and depend on the state of the memory. By default, the dtype of the created array is float64.**

### 9) numpy.zeros, numpy.ones, numpy.empty, numpy.eye, numpy.full and numpy.random.random

In [154]:
np.zeros( (3,4) ) # creates an array of all zeros

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [155]:
np.ones( (2,3), dtype=np.int16 ) #  creates an array of all ones. dtype can also be specified 

array([[1, 1, 1],
       [1, 1, 1]], dtype=int16)

In [156]:
np.empty( (2,3) )  # Creates an uninitialized array. The numbers shown are whatever values were already in that memory location

array([[2.12199579e-314, 6.36598737e-314, 1.06099790e-313],
       [1.48539705e-313, 1.90979621e-313, 2.33419537e-313]])

In [157]:
np.eye(3) # Creates identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [158]:
np.full((2,2),7) # Creates a matrix filled with the specified value

array([[7, 7],
       [7, 7]])

In [159]:
a = np.random.random((2,3)) # Creates a matrix of random numbers drawn uniformly from [0, 1)
a

array([[0.94059626, 0.06910856, 0.65390767],
       [0.29684543, 0.31569933, 0.56992573]])
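The point about avoiding array growth can be sketched as follows: allocate the array once at its final size, then fill it in. The variable names are illustrative only. The second part uses `np.random.default_rng` with a seed, which is one way (not used elsewhere in this tutorial) to make random draws repeatable.

```python
import numpy as np

n = 5
squares = np.empty(n)          # uninitialized storage; contents are arbitrary
for i in range(n):
    squares[i] = i ** 2        # every slot is overwritten before it is read
print(squares)                 # [ 0.  1.  4.  9. 16.]

rng = np.random.default_rng(seed=0)   # seeded generator for repeatable draws
sample = rng.random((2, 3))           # uniform values in [0, 1)
print(sample.shape)                   # (2, 3)
```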

### 10) Basic Operations
### 10.1) Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

In [160]:
# Let's create two arrays
a = np.array( [20,30,40,50] )
b = np.arange( 4 )
print(a)
print(b)

c = a-b # Simple subtraction
print(c)

print(b**2) # Elementwise power: each element is squared

print(10*np.sin(a)) # trigonometric functions are available in the numpy package

print(a<35) # Returns True or False for each element based on the condition

a[0] +=10 # This is equivalent to a[0]=a[0]+10
print(a)

[20 30 40 50]
[0 1 2 3]
[20 29 38 47]
[0 1 4 9]
[ 9.12945251 -9.88031624  7.4511316  -2.62374854]
[ True  True False False]
[30 30 40 50]
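One consequence of "a new array is created" is worth a sketch: an ordinary operation can produce a result of a wider dtype, whereas an in-place operation such as `+=` must store the result in the existing array and fails if the dtype cannot hold it.

```python
import numpy as np

a = np.array([20, 30, 40, 50])    # integer array
b = a * 0.5                       # a new float array is created
print(b.dtype)                    # float64

try:
    a += 0.5                      # in-place: a float result cannot be stored in an int array
except TypeError as err:
    print("in-place add failed:", type(err).__name__)

print(a)                          # [20 30 40 50]
```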


### 10.2) Unlike in many matrix languages, the product operator * operates elementwise in NumPy arrays. The matrix product can be performed using the @ operator (in Python >= 3.5) or the dot function or method:

In [161]:
A = np.array( [[1,1],[0,1]] )
B = np.array( [[2,0],[3,4]] )

print(A)
print(B)

print(A * B)  # elementwise product

print(A @ B)  # matrix product

print(A.dot(B)) # another matrix product

print(np.dot(A,B)) # another matrix product

[[1 1]
 [0 1]]
[[2 0]
 [3 4]]
[[2 0]
 [0 4]]
[[5 4]
 [3 4]]
[[5 4]
 [3 4]]
[[5 4]
 [3 4]]
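The difference between the two products is easiest to see with non-square matrices. In this sketch (array contents are arbitrary), the matrix product only needs the inner dimensions to match, while the elementwise product needs compatible shapes.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)    # shape (2, 3)
B = np.arange(6).reshape(3, 2)    # shape (3, 2)

print((A @ B).shape)              # (2, 2): inner dimensions (3 and 3) match
print((B @ A).shape)              # (3, 3)

try:
    A * B                         # elementwise product needs compatible shapes
except ValueError as err:
    print("elementwise product failed:", err)
```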


### 10.3) Some of the math functions on arrays: use of axis

In [106]:
b=np.arange(12).reshape(3,4)
print(b)

print(b.sum()) # Addition of all elements in b

print(b.sum(axis=0)) # axis=0 runs down the rows, so this gives the sum of each column

print(b.sum(axis=1)) # axis=1 runs across the columns, so this gives the sum of each row

print(b.min(axis=1)) # Minimum element from each row

print(b.cumsum(axis=1)) # Cumulative sum along each row

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
66
[12 15 18 21]
[ 6 22 38]
[0 4 8]
[[ 0  1  3  6]
 [ 4  9 15 22]
 [ 8 17 27 38]]
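A quick way to keep the axes straight is to look at the shape of the result: the axis passed to the function is the one that disappears. The sketch below also shows the `keepdims` argument, which keeps the reduced axis with length 1.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

print(b.sum(axis=0).shape)                 # (4,): one total per column
print(b.sum(axis=1).shape)                 # (3,): one total per row
print(b.sum(axis=1, keepdims=True).shape)  # (3, 1): reduced axis kept with length 1
print(b.max(axis=0))                       # [ 8  9 10 11]
```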


### 10.4) Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

In [162]:
a=np.arange(5)
print(a)

b=np.random.random(5)
print(b)

print(np.add(a,b))# Elementwise sum; both produce the same array
print(a + b)

print(a - b) # Elementwise difference; both produce the same array
print(np.subtract(a, b))

print(a * b)# Elementwise product; both produce the same array
print(np.multiply(a, b))

print(a / b)# Elementwise division; both produce the same array
print(np.divide(a, b))

[0 1 2 3 4]
[0.6481224  0.22973983 0.2310506  0.11142461 0.31478754]
[0.6481224  1.22973983 2.2310506  3.11142461 4.31478754]
[0.6481224  1.22973983 2.2310506  3.11142461 4.31478754]
[-0.6481224   0.77026017  1.7689494   2.88857539  3.68521246]
[-0.6481224   0.77026017  1.7689494   2.88857539  3.68521246]
[0.         0.22973983 0.4621012  0.33427384 1.25915014]
[0.         0.22973983 0.4621012  0.33427384 1.25915014]
[ 0.          4.35274971  8.65611253 26.92403329 12.70698343]
[ 0.          4.35274971  8.65611253 26.92403329 12.70698343]


### 11) Slicing and Iterating

### 11.1) Slicing: In addition to accessing array elements one at a time, NumPy (like Python lists) provides concise syntax to access subarrays; this is known as slicing:

In [163]:
# Slicing a 1-D array

a=np.arange(10)**3
print(a)

print(a[2]) # Prints the element at index 2, i.e. the 3rd element of the array a
print(a[2:4]) # Prints the elements at index 2 and index 3. Note that a[2:4] means elements from index 2 up to, but not including, index 4.
print(a[2:3]) # Note that this returns an array with 1 element, whereas a[2] returns a scalar

[  0   1   8  27  64 125 216 343 512 729]
8
[ 8 27]
[8]


In [164]:
# Slicing a 2-D array

a=np.arange(12).reshape(4,3)**2 # Create an array "a" with square of numbers from 0 to 11.
print(a)

print(a[1:3]) # Prints second and third rows
print(a[1:3,1:2]) # from second and third rows, print second column
print(a[2,]) # print 3rd row
print(a[:2,]) # print 1st and 2nd row
print(a[:3,]) # print 1st, 2nd and 3rd row
print(a[:3,:2]) # from 1st, 2nd and 3rd row, print 1st and 2nd column
print(a[:,2]) # from all rows, print 3rd column
print(a[2,:]) # print 3rd row with all columns

[[  0   1   4]
 [  9  16  25]
 [ 36  49  64]
 [ 81 100 121]]
[[ 9 16 25]
 [36 49 64]]
[[16]
 [49]]
[36 49 64]
[[ 0  1  4]
 [ 9 16 25]]
[[ 0  1  4]
 [ 9 16 25]
 [36 49 64]]
[[ 0  1]
 [ 9 16]
 [36 49]]
[  4  25  64 121]
[36 49 64]
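Slices also accept a third value, the step, in the form `start:stop:step`. A brief sketch on the same array, including a negative step to reverse an axis:

```python
import numpy as np

a = np.arange(12).reshape(4, 3) ** 2

print(a[::2])        # every second row: rows 0 and 2
print(a[::-1, 0])    # first column, read from the last row to the first
print(a[1:, ::2])    # rows 1 onward, columns 0 and 2
```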


### Using negative indexing for slicing the array

In [27]:
# Slicing a 1-D array using negative indexing

a=np.arange(10)**3
print(a)

print(a[-1]) # prints the last element
print(a[-2]) # prints the second-to-last element
print(a[-5:-2]) # prints the elements at indices -5, -4 and -3 (index -2 is excluded), i.e. indices 5, 6 and 7

[  0   1   8  27  64 125 216 343 512 729]
729
512
[125 216 343]


In [28]:
# Slicing a 2-D array using negative indexing

a=np.arange(12).reshape(4,3)**2 
print(a)

print(a[-1,]) # Prints last row
print(a[-1,-1]) # print element at last row and last column
print(a[-3:-1,-2:]) # Note: -3:-1 means from the 3rd row from the end up to (but not including) the last row,
                    # and -2: means the last two columns.

[[  0   1   4]
 [  9  16  25]
 [ 36  49  64]
 [ 81 100 121]]
[ 81 100 121]
121
[[16 25]
 [49 64]]


### A slice of an array is a view into the same data, so modifying it will modify the original array.

In [67]:
a=np.array([[1,2,3],[4,5,6],[7,8,9]]) # Create an array "a"
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [68]:
b=a[:1,:1] # Use slicing to pull out the subarray consisting of 1st element in an array
b

array([[1]])

In [70]:
# Now let's change the array "b"

b[0]=12
a # Note that original array is changed


array([[12,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9]])
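When the original array must stay untouched, an explicit `.copy()` breaks the link. The sketch below contrasts a slice with a copy of the same slice; `np.shares_memory` is used only to confirm which one shares data with the original.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

view = a[:1, :1]          # a slice shares memory with "a"
copy = a[:1, :1].copy()   # an explicit copy owns its own data

copy[0, 0] = 99           # does not touch "a"
print(a[0, 0])            # 1
print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, copy))   # False
```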

### One useful trick with integer array indexing is selecting or mutating (changing) one element from each row of a matrix.

In [78]:
a=np.array([[1,2,3],[4,5,6],[7,8,9]])
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [79]:
b=np.array([0,1,2])
b

array([0, 1, 2])

In [82]:
# Select one element from each row of "a" using the indices in b.
a[np.arange(3),b]

array([1, 5, 9])

In [83]:
a[np.arange(3),b]+=10 # We have changed one element from each row of a using the indices in b
a

array([[11,  2,  3],
       [ 4, 15,  6],
       [ 7,  8, 19]])
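The column indices do not have to be written by hand. As one possible use of the trick above, this sketch picks the largest value in each row using `argmax`. It also shows that, unlike a slice, reading with integer array indexing returns a copy. The array values are made up for the example.

```python
import numpy as np

a = np.array([[1, 9, 3], [4, 5, 6], [8, 7, 2]])

cols = a.argmax(axis=1)            # column index of the largest value in each row
print(cols)                        # [1 2 0]
print(a[np.arange(3), cols])       # [9 6 8]

picked = a[np.arange(3), cols]     # integer array indexing returns a copy
picked[0] = -1
print(a[0, 1])                     # 9 (the original is unchanged)
```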

### 11.2 Iterating

**Iterating over multidimensional arrays is done with respect to the first axis.**

In [29]:
a=np.arange(12).reshape(4,3)**2 # Create an array "a" with square of numbers from 0 to 11.
print(a)

for i in a:
    print(i)

[[  0   1   4]
 [  9  16  25]
 [ 36  49  64]
 [ 81 100 121]]
[0 1 4]
[ 9 16 25]
[36 49 64]
[ 81 100 121]


**However, if one wants to perform an operation on each element in the array, one can use the "flat" attribute which is an iterator over all the elements of the array.**

In [30]:
for i in a.flat:
    print(i)

0
1
4
9
16
25
36
49
64
81
100
121
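If the position of each element is needed as well as its value, `np.ndenumerate` yields both. The sketch also shows the elementwise alternative, which is usually preferable to a Python loop when the operation can be written as array arithmetic.

```python
import numpy as np

a = np.arange(6).reshape(2, 3) ** 2

for index, value in np.ndenumerate(a):   # yields ((row, col), element) pairs
    print(index, value)

print(a + 1)    # the elementwise alternative needs no Python loop
```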


### 12) Matrix functions for matrix shape manipulation

### 12.1) Difference between resize and reshape


In [165]:
a=np.floor(10*np.random.random((3,4))) # Create a random 3*4 matrix.
a

array([[8., 5., 0., 1.],
       [5., 2., 7., 0.],
       [3., 4., 5., 0.]])

In [166]:
a.reshape(2,6) 

array([[8., 5., 0., 1., 5., 2.],
       [7., 0., 3., 4., 5., 0.]])

In [167]:
a # Note that actual array is not changed

array([[8., 5., 0., 1.],
       [5., 2., 7., 0.],
       [3., 4., 5., 0.]])

In [168]:
a.resize(2,6)

In [169]:
a # The reshape function returns its argument with a modified shape, 
  #whereas the ndarray.resize method modifies the array itself:

array([[8., 5., 0., 1., 5., 2.],
       [7., 0., 3., 4., 5., 0.]])
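Although `reshape` leaves the shape of the original array alone, the array it returns is normally a view of the same data when the memory layout allows it. A minimal sketch with a freshly created (contiguous) array:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

r = a.reshape(2, 6)     # for a contiguous array this is a view, not a copy
r[0, 0] = 99
print(a[0, 0])          # 99: the change is visible through "a"
print(a.shape)          # (3, 4): the shape of "a" itself is untouched
```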

### 12.2) Function to flatten an array

In [36]:
a.ravel() # Returns a flattened 1-D array; "a" itself is not modified
a # So "a" still has its 2-D shape

array([[0., 7., 2., 3., 1., 3.],
       [9., 6., 2., 5., 8., 2.]])
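To see the flattened result itself, it has to be printed or assigned. The sketch below does that and compares `ravel` with the related `flatten` method: `ravel` returns a view when it can, while `flatten` always returns a copy.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat_view = a.ravel()      # a view when the memory layout allows it
flat_copy = a.flatten()    # always a copy

print(flat_view)                         # [0 1 2 3 4 5]
print(np.shares_memory(a, flat_view))    # True
print(np.shares_memory(a, flat_copy))    # False
```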

### 12.3) To find transpose of a matrix.

In [37]:
a.T

array([[0., 9.],
       [7., 6.],
       [2., 2.],
       [3., 5.],
       [1., 8.],
       [3., 2.]])

### 12) Stacking together different arrays

In [38]:
# Create 2 matrices

a=np.floor(10*np.random.random((2,2)))
print(a)

b=np.floor(10*np.random.random((2,2)))
print(b)

[[7. 0.]
 [7. 0.]]
[[1. 7.]
 [4. 3.]]


In [39]:
np.vstack((a,b)) # Stacking one array above another.

array([[7., 0.],
       [7., 0.],
       [1., 7.],
       [4., 3.]])

In [40]:
np.hstack((a,b)) # Stacking one array beside another

array([[7., 0., 1., 7.],
       [7., 0., 4., 3.]])
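For 2-D arrays, both kinds of stacking can also be written with `np.concatenate` and an explicit axis, which some readers find easier to generalize. A short sketch using integer versions of the arrays above:

```python
import numpy as np

a = np.array([[7, 0], [7, 0]])
b = np.array([[1, 7], [4, 3]])

rows = np.concatenate((a, b), axis=0)   # same result as np.vstack((a, b))
cols = np.concatenate((a, b), axis=1)   # same result as np.hstack((a, b))

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)
```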

### 13) Splitting one array into several smaller ones

In [41]:
a=np.array([[11,12,13,14,15,16],[17,19,23,24,25,34]])
print(a)

np.hsplit(a,3) # Split "a" column-wise into 3 equal parts

[[11 12 13 14 15 16]
 [17 19 23 24 25 34]]


[array([[11, 12],
        [17, 19]]), array([[13, 14],
        [23, 24]]), array([[15, 16],
        [25, 34]])]

In [42]:
np.hsplit(a,(3,4)) # Split "a" after the third and the fourth column

[array([[11, 12, 13],
        [17, 19, 23]]), array([[14],
        [24]]), array([[15, 16],
        [25, 34]])]

In [43]:
np.vsplit(a,2) # Splits along vertical axis

[array([[11, 12, 13, 14, 15, 16]]), array([[17, 19, 23, 24, 25, 34]])]
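Passing an integer to `hsplit` or `vsplit` requires the axis to divide evenly. When it does not, `np.array_split` is one alternative that allows parts of unequal size. The 2x7 array below is made up to show the difference.

```python
import numpy as np

a = np.arange(14).reshape(2, 7)    # 7 columns cannot be divided into 3 equal parts

try:
    np.hsplit(a, 3)
except ValueError as err:
    print("hsplit failed:", err)

parts = np.array_split(a, 3, axis=1)   # allows parts of unequal width
print([p.shape for p in parts])        # [(2, 3), (2, 2), (2, 2)]
```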

### 14) Broadcasting

Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes when performing arithmetic operations. Frequently we have a smaller array and a larger array, and we want to use the smaller array multiple times to perform some operation on the larger array.

In [129]:
a=np.array([[11,22,33],[44,55,66],[12,14,34]])
a

b=np.array([1,2,3])
b

array([1, 2, 3])

In [130]:
c=a+b
c

array([[12, 24, 36],
       [45, 57, 69],
       [13, 16, 37]])
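The example above adds a 1-D array to every row. Reshaping the same values into a column of shape (3, 1) adds them to every column instead, and shapes that cannot be matched raise a `ValueError`. A minimal sketch (the names `row` and `col` are just for this example):

```python
import numpy as np

a = np.array([[11, 22, 33], [44, 55, 66], [12, 14, 34]])   # shape (3, 3)
row = np.array([1, 2, 3])                                   # shape (3,)
col = row.reshape(3, 1)                                     # shape (3, 1)

print(a + row)   # "row" is added to every row of "a"
print(a + col)   # "col" is added to every column of "a"

try:
    a + np.array([1, 2])       # shapes (3, 3) and (2,) are not compatible
except ValueError as err:
    print("broadcast failed:", err)
```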