# Operations on NumPy Arrays

The learning objectives of this section are:

* Manipulate arrays
    * Reshape arrays
    * Stack arrays
* Perform operations on arrays
    * Perform basic mathematical operations
    * Apply built-in functions 
    * Apply your own functions 
    * Apply basic linear algebra operations 


### Manipulating Arrays

Let's look at some ways to manipulate arrays, i.e. changing the shape, combining and splitting arrays, etc.   

#### Reshaping Arrays

Reshaping is done using the ```reshape()``` function.


In [1]:
import numpy as np

# Reshape a 1-D array to a 3 x 4 array
some_array = np.arange(0, 12).reshape(3, 4)
print(some_array)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [2]:
# Can reshape it further 
some_array.reshape(2, 6)

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [4]:
# If you specify -1 as a dimension, the dimensions are automatically calculated
# -1 means "whatever dimension is needed" 
some_array.reshape(4, -1)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
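Here is a small sketch of how ```-1``` works. NumPy infers the missing dimension from the total number of elements. A shape whose sizes cannot multiply to that total is rejected with a ```ValueError```.

```python
import numpy as np

some_array = np.arange(0, 12)

# -1 lets NumPy infer the missing dimension: 12 elements / 4 rows = 3 columns
inferred = some_array.reshape(4, -1)
print(inferred.shape)  # (4, 3)

# -1 can stand in for any single dimension
print(some_array.reshape(-1, 6).shape)  # (2, 6)

# 12 elements cannot be arranged into 5 equal rows, so this raises ValueError
try:
    some_array.reshape(5, -1)
except ValueError as err:
    print("ValueError:", err)
```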

```array.T``` returns the transpose of an array.

In [5]:
# Transposing an array
print(some_array)
some_array.T

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])

### Stacking and Splitting Arrays

#### Stacking: ```np.hstack()``` and ```np.vstack()```

Stacking is done using the ```np.hstack()``` and ```np.vstack()``` functions. For horizontal stacking, the arrays must have the same number of rows. For vertical stacking, they must have the same number of columns.

In [6]:
# Creating two arrays
array_1 = np.arange(12).reshape(3, 4)
array_2 = np.arange(20).reshape(5, 4)

print(array_1)
print("\n")
print(array_2)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]


In [7]:
# vstack
# Note that np.vstack(a, b) raises an error: the arrays must be passed as a single sequence, e.g. a tuple
np.vstack((array_1, array_2))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

Similarly, two arrays having the same number of rows can be horizontally stacked using ```np.hstack((a, b))```.
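A minimal horizontal-stacking sketch follows. ```array_3``` is a hypothetical array introduced only for this illustration. It has 3 rows like ```array_1```, so the two can be placed side by side. ```array_2``` has 5 rows and cannot be stacked horizontally with ```array_1```.

```python
import numpy as np

# array_1 as above; array_3 is a hypothetical 3 x 2 array with the same number of rows
array_1 = np.arange(12).reshape(3, 4)
array_3 = np.arange(6).reshape(3, 2)

# hstack joins them side by side: 4 + 2 = 6 columns
stacked = np.hstack((array_1, array_3))
print(stacked)
print(stacked.shape)  # (3, 6)

# array_2 has 5 rows, so horizontal stacking with array_1 raises ValueError
array_2 = np.arange(20).reshape(5, 4)
try:
    np.hstack((array_1, array_2))
except ValueError as err:
    print("ValueError:", err)
```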

### Perform Operations on Arrays

Performing mathematical operations on arrays is extremely simple. Let's see some common operations.


#### Basic Mathematical Operations

NumPy provides most basic math functions, such as exp, sin, cos, log and sqrt. Each function is applied element-wise, i.e. to every element of the array.


In [8]:
# Basic mathematical operations
a = np.arange(1, 20)

# sin, cos, exp, log
print(np.sin(a))
print(np.cos(a))
print(np.exp(a))
print(np.log(a))

[ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427 -0.2794155
  0.6569866   0.98935825  0.41211849 -0.54402111 -0.99999021 -0.53657292
  0.42016704  0.99060736  0.65028784 -0.28790332 -0.96139749 -0.75098725
  0.14987721]
[ 0.54030231 -0.41614684 -0.9899925  -0.65364362  0.28366219  0.96017029
  0.75390225 -0.14550003 -0.91113026 -0.83907153  0.0044257   0.84385396
  0.90744678  0.13673722 -0.75968791 -0.95765948 -0.27516334  0.66031671
  0.98870462]
[2.71828183e+00 7.38905610e+00 2.00855369e+01 5.45981500e+01
 1.48413159e+02 4.03428793e+02 1.09663316e+03 2.98095799e+03
 8.10308393e+03 2.20264658e+04 5.98741417e+04 1.62754791e+05
 4.42413392e+05 1.20260428e+06 3.26901737e+06 8.88611052e+06
 2.41549528e+07 6.56599691e+07 1.78482301e+08]
[0.         0.69314718 1.09861229 1.38629436 1.60943791 1.79175947
 1.94591015 2.07944154 2.19722458 2.30258509 2.39789527 2.48490665
 2.56494936 2.63905733 2.7080502  2.77258872 2.83321334 2.89037176
 2.94443898]
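The same element-wise behaviour holds for arrays of any shape. Here is a small sketch using ```np.sqrt```, which is listed above but not shown in the cell. It runs on a made-up 2 x 3 array of perfect squares, and the result keeps the input's shape.

```python
import numpy as np

# A made-up 2 x 3 array of perfect squares
squares = np.array([[1, 4, 9],
                    [16, 25, 36]])

# np.sqrt is applied to every element; the result keeps the 2 x 3 shape
roots = np.sqrt(squares)
print(roots)
# [[1. 2. 3.]
#  [4. 5. 6.]]
```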


#### Apply User Defined Functions

You can also apply your own functions to arrays. For example, you might apply the function ```x/(x+1)``` to each element of an array.

One way to do that is to loop through the array, which is the non-NumPy way. Instead, you should aim to write **vectorised code**.

A convenient option is to vectorise the function with ```np.vectorize()``` and then apply it to the array. Note that ```np.vectorize()``` is provided mainly for convenience: internally it still loops over the elements in Python. For plain arithmetic like this, writing the expression directly on the array (```a / (a + 1)```) is usually faster.

Let's look at both ways.

In [9]:
print(a)

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [10]:
# The non-numpy way, not recommended
a_list = [x/(x+1) for x in a]
print(a_list)

[0.5, 0.6666666666666666, 0.75, 0.8, 0.8333333333333334, 0.8571428571428571, 0.875, 0.8888888888888888, 0.9, 0.9090909090909091, 0.9166666666666666, 0.9230769230769231, 0.9285714285714286, 0.9333333333333333, 0.9375, 0.9411764705882353, 0.9444444444444444, 0.9473684210526315, 0.95]


In [11]:
# The numpy way: vectorize the function, then apply it
f = np.vectorize(lambda x: x/(x+1))
f(a)

array([0.5       , 0.66666667, 0.75      , 0.8       , 0.83333333,
       0.85714286, 0.875     , 0.88888889, 0.9       , 0.90909091,
       0.91666667, 0.92307692, 0.92857143, 0.93333333, 0.9375    ,
       0.94117647, 0.94444444, 0.94736842, 0.95      ])

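In [9]:
import time

# Compare running times on a large array (timings depend on the machine)
a1 = np.arange(1, 10000000)

# List comprehension over the array
start = time.time()
l1 = [x/(x+1) for x in a1]
end = time.time()
print(end - start)

# np.vectorize on the same array
start = time.time()
f = np.vectorize(lambda x: x/(x+1))
f(a1)
end = time.time()
print(end - start)

3.625281572341919
1.8700263500213623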


In [12]:
# Apply the vectorised function to another (1-D, float) array: it is applied to each element
b = np.linspace(1, 100, 10)
print(b)
f(b)

[  1.  12.  23.  34.  45.  56.  67.  78.  89. 100.]


array([0.5       , 0.92307692, 0.95833333, 0.97142857, 0.97826087,
       0.98245614, 0.98529412, 0.98734177, 0.98888889, 0.99009901])

This also has the advantage that you can vectorize the function once, and then apply it as many times as needed. 
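You can compute the same values without ```np.vectorize()``` by writing the expression directly on the array. NumPy then evaluates it element-wise in compiled code rather than calling a Python function once per element. The sketch below reuses the ```x/(x+1)``` example to check that both approaches agree.

```python
import numpy as np

a = np.arange(1, 20)

# Wrapper approach: np.vectorize calls the Python lambda once per element
f = np.vectorize(lambda x: x / (x + 1))
wrapped = f(a)

# Array-arithmetic approach: the expression is evaluated on the whole array at once
direct = a / (a + 1)

print(np.allclose(wrapped, direct))  # True
```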

#### Apply Basic Linear Algebra Operations

NumPy provides the ```np.linalg``` package to apply common linear algebra operations, such as:
* ```np.linalg.inv```: Inverse of a matrix
* ```np.linalg.det```: Determinant of a matrix
* ```np.linalg.eig```: Eigenvalues and eigenvectors of a matrix
    
Also, you can multiply matrices using ```np.dot(a, b)```.
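A quick way to see what these functions return is to check their defining properties on a small matrix. The 2 x 2 matrix ```m``` below is made up for illustration. The sketch checks the ```a*d - b*c``` rule for the determinant, checks that ```m``` times its inverse gives the identity (up to floating-point error), and checks that each eigenpair satisfies ```m v = lambda v```.

```python
import numpy as np

# A made-up invertible 2 x 2 matrix
m = np.array([[4.0, 7.0],
              [2.0, 6.0]])

# Determinant of [[a, b], [c, d]] is a*d - b*c = 4*6 - 7*2 = 10
m_det = np.linalg.det(m)
print(m_det)

# The inverse satisfies m . inv(m) == identity (up to floating-point error)
m_inv = np.linalg.inv(m)
print(np.allclose(np.dot(m, m_inv), np.eye(2)))  # True

# Each column of `eigenvectors` is an eigenvector: m . v == lambda * v
eigenvalues, eigenvectors = np.linalg.eig(m)
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(np.allclose(np.dot(m, v), eigenvalues[i] * v))  # True
```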


In [12]:
# np.linalg documentation
help(np.linalg)

Help on package numpy.linalg in numpy:

NAME
    numpy.linalg

DESCRIPTION
    Core Linear Algebra Tools
    -------------------------
    Linear algebra basics:
    
    - norm            Vector or matrix norm
    - inv             Inverse of a square matrix
    - solve           Solve a linear system of equations
    - det             Determinant of a square matrix
    - lstsq           Solve linear least-squares problem
    - pinv            Pseudo-inverse (Moore-Penrose) calculated using a singular
                      value decomposition
    - matrix_power    Integer power of a square matrix
    
    Eigenvalues and decompositions:
    
    - eig             Eigenvalues and vectors of a square matrix
    - eigh            Eigenvalues and eigenvectors of a Hermitian matrix
    - eigvals         Eigenvalues of a square matrix
    - eigvalsh        Eigenvalues of a Hermitian matrix
    - qr              QR decomposition of a matrix
    - svd             Singular value decomposition 

In [1]:
# Creating arrays
import numpy as np
a = np.arange(10, 19).reshape(3, 3)
b= np.arange(1, 13).reshape(3, 4)
print(a)
print(b)

[[10 11 12]
 [13 14 15]
 [16 17 18]]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [2]:
# Use a separate name so that `a` still refers to the 3 x 3 matrix created above
l = [[32, 3, 42], [34, 56, 88], [464, 424, 24]]
m = np.array(l)
m

array([[ 32,   3,  42],
       [ 34,  56,  88],
       [464, 424,  24]])

In [3]:
# Inverse
np.linalg.inv(m)

array([[ 0.02371333, -0.01169316,  0.0013766 ],
       [-0.02638213,  0.0123419 ,  0.00091509],
       [ 0.00762666,  0.00802751, -0.0011142 ]])

In [18]:
# Determinant
np.linalg.det(m)

-1516784.0000000026

In [14]:
# Eigenvalues and eigenvectors of a (a is singular, so one eigenvalue is ~0)
np.linalg.eig(a)

(array([ 4.24242853e+01, -4.24285286e-01, -8.76087811e-16]),
 array([[-0.44819574, -0.73921067,  0.40824829],
        [-0.5688793 , -0.03327957, -0.81649658],
        [-0.68956285,  0.67265152,  0.40824829]]))

In [17]:
# Multiply matrices: (3 x 3) . (3 x 4) gives a 3 x 4 result
np.dot(a, b)

array([[173, 206, 239, 272],
       [218, 260, 302, 344],
       [263, 314, 365, 416]])

In [13]:
A=np.random.randint(1,1000,(3,4))
A

array([[857, 799, 722, 982],
       [475,  73, 949, 136],
       [799,  81, 749, 839]])

In [14]:
b=list(map(tuple,A))
b

[(857, 799, 722, 982), (475, 73, 949, 136), (799, 81, 749, 839)]

In [15]:
dt = np.dtype([('n1', int),('n2', int),('n3', int),('n4', int)]) 
A1=np.array(b,dtype=dt)
print(A1)

[(857, 799, 722, 982) (475,  73, 949, 136) (799,  81, 749, 839)]


In [16]:
np.sort(A1,order='n1')

array([(475,  73, 949, 136), (799,  81, 749, 839), (857, 799, 722, 982)],
      dtype=[('n1', '<i4'), ('n2', '<i4'), ('n3', '<i4'), ('n4', '<i4')])

#### Stacking Along Different Axes

Several arrays can be stacked together along different axes:

* ```np.vstack```: To stack arrays along the vertical axis (row-wise).
* ```np.hstack```: To stack arrays along the horizontal axis (column-wise).
* ```np.column_stack```: To stack 1-D arrays as columns into a 2-D array.
* ```np.concatenate```: To stack arrays along a specified axis (the axis is passed as an argument).



In [4]:
a = np.array([[1, 2], 
              [3, 4]]) 
  
b = np.array([[5, 6], 
              [7, 8]]) 
  
# vertical stacking 
print("Vertical stacking:\n", np.vstack((a, b))) 
  
# horizontal stacking 
print("\nHorizontal stacking:\n", np.hstack((a, b))) 
  
c = [5, 6] 
  
# stacking columns 
print("\nColumn stacking:\n", np.column_stack((a, c))) 
  
# concatenation method: axis 0 is the first axis, so this stacks the arrays row-wise
print("\nConcatenating along axis 0:\n", np.concatenate((a, b), 0))
# OR print("\nConcatenating along axis 0:\n", np.concatenate((a, b), axis=0))

Vertical stacking:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]

Horizontal stacking:
 [[1 2 5 6]
 [3 4 7 8]]

Column stacking:
 [[1 2 5]
 [3 4 6]]

Concatenating along axis 0:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
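The ```axis``` argument of ```np.concatenate``` decides which existing axis grows. The sketch below reuses the same ```a``` and ```b```. For 2-D arrays, ```axis=0``` reproduces ```np.vstack``` and ```axis=1``` reproduces ```np.hstack```.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# axis=0 joins along the first axis (rows), like np.vstack for 2-D arrays
rows_joined = np.concatenate((a, b), axis=0)
print(np.array_equal(rows_joined, np.vstack((a, b))))  # True

# axis=1 joins along the second axis (columns), like np.hstack for 2-D arrays
cols_joined = np.concatenate((a, b), axis=1)
print(cols_joined)
# [[1 2 5 6]
#  [3 4 7 8]]
print(np.array_equal(cols_joined, np.hstack((a, b))))  # True
```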


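### Splitting

For splitting, we have these functions:

* ```np.hsplit```: Split an array horizontally, i.e. into groups of columns (axis 1).
* ```np.vsplit```: Split an array vertically, i.e. into groups of rows (axis 0).
* ```np.array_split```: Split an array along a specified axis; it also accepts a number of parts that does not divide the axis length evenly.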


In [8]:
import numpy as np
a = np.array([[1, 3, 5, 7, 9, 11], 
              [2, 4, 6, 8, 10, 12]]) 
  
# horizontal splitting 
print("Splitting along horizontal axis into 3 parts:\n", np.hsplit(a, 3)) 
  
# vertical splitting 
print("\nSplitting along vertical axis into 2 parts:\n", np.vsplit(a, 2))

Splitting along horizontal axis into 3 parts:
 [array([[1, 3],
       [2, 4]]), array([[5, 7],
       [6, 8]]), array([[ 9, 11],
       [10, 12]])]

Splitting along vertical axis into 2 parts:
 [array([[ 1,  3,  5,  7,  9, 11]]), array([[ 2,  4,  6,  8, 10, 12]])]
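The difference between ```np.hsplit``` and ```np.array_split``` shows up when the number of parts does not divide the axis evenly. In the sketch below, the 6 columns of ```a``` cannot be split into 4 equal parts. ```np.hsplit``` raises a ```ValueError```, while ```np.array_split``` gives the first parts one extra column each.

```python
import numpy as np

a = np.array([[1, 3, 5, 7, 9, 11],
              [2, 4, 6, 8, 10, 12]])

# np.hsplit needs an equal division: 6 columns cannot form 4 equal parts
try:
    np.hsplit(a, 4)
except ValueError as err:
    print("ValueError:", err)

# np.array_split accepts an uneven division along the given axis
parts = np.array_split(a, 4, axis=1)
print([p.shape for p in parts])  # [(2, 2), (2, 2), (2, 1), (2, 1)]
```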


### Sorting in NumPy

In [1]:
import numpy as np  
a = np.array([[30,17,15],[19,90,16],[69,53,21]]) 
a

array([[30, 17, 15],
       [19, 90, 16],
       [69, 53, 21]])

#### Applying the sort() function

In [3]:
# Sort the first column (a[..., 0] selects column 0)
np.sort(a[..., 0])

array([19, 30, 69])

In [13]:
# Sort each column independently (along axis 0)
np.sort(a, axis=0)

array([[19, 17, 15],
       [30, 53, 16],
       [69, 90, 21]])
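Two other variants are worth knowing, shown here on the same array. Without an ```axis``` argument, ```np.sort``` sorts along the last axis, so each row is sorted on its own. With ```axis=None```, it sorts the flattened array. In every case ```np.sort``` returns a sorted copy. ```np.argsort``` returns the indices that would sort the data, which is useful for reordering whole rows by one column.

```python
import numpy as np

a = np.array([[30, 17, 15],
              [19, 90, 16],
              [69, 53, 21]])

# Default axis is the last one: each row is sorted independently
print(np.sort(a))
# [[15 17 30]
#  [16 19 90]
#  [21 53 69]]

# axis=None flattens the array first and returns a sorted 1-D array
print(np.sort(a, axis=None))
# [15 16 17 19 21 30 53 69 90]

# argsort gives the row order that sorts the first column
order = np.argsort(a[:, 0])
print(order)     # [1 0 2]
print(a[order])  # whole rows reordered by their first column
```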

In [10]:
A=np.random.randint(1,1000,(3,4))
A
b=list(map(tuple,A))
b

[(480, 822, 511, 24), (838, 543, 915, 683), (562, 195, 196, 74)]

In [11]:
dt = np.dtype([('n1', int),('n2', int),('n3', int),('n4', int)]) 
A1=np.array(b,dtype=dt)
print(A1)

[(480, 822, 511,  24) (838, 543, 915, 683) (562, 195, 196,  74)]


In [12]:
np.sort(A1,order='n1')

array([(480, 822, 511,  24), (562, 195, 196,  74), (838, 543, 915, 683)],
      dtype=[('n1', '<i4'), ('n2', '<i4'), ('n3', '<i4'), ('n4', '<i4')])

#### Order parameter in sort function 

In [16]:
dt = np.dtype([('name', 'S10'),('age', int)]) 
a = np.array([("Karan",21),("Arpit",25),("Ashish", 17), ("Sam",27),("Robin",22)], dtype = dt)  
a 

array([(b'Karan', 21), (b'Arpit', 25), (b'Ashish', 17), (b'Sam', 27),
       (b'Robin', 22)], dtype=[('name', 'S10'), ('age', '<i4')])

#### Order by name

In [17]:
np.sort(a, order = 'name')

array([(b'Arpit', 25), (b'Ashish', 17), (b'Karan', 21), (b'Robin', 22),
       (b'Sam', 27)], dtype=[('name', 'S10'), ('age', '<i4')])

#### Order by age:

In [18]:
data=np.sort(a, order = 'age')
data

array([(b'Ashish', 17), (b'Karan', 21), (b'Robin', 22), (b'Arpit', 25),
       (b'Sam', 27)], dtype=[('name', 'S10'), ('age', '<i4')])
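The ```order``` parameter also accepts a list of field names, which sorts by the first field and breaks ties with the next. The records below are hypothetical. Two people share an age so that the second key matters. ```np.sort``` has no descending flag, so this sketch simply reverses the ascending result.

```python
import numpy as np

dt = np.dtype([('name', 'S10'), ('age', int)])
# Hypothetical records: Karan and Asha share the same age
people = np.array([("Karan", 21), ("Arpit", 25), ("Asha", 21), ("Sam", 27)], dtype=dt)

# Sort by age first, then break ties by name
by_age_then_name = np.sort(people, order=['age', 'name'])
print(by_age_then_name['name'].tolist())  # [b'Asha', b'Karan', b'Arpit', b'Sam']

# Descending order: reverse the ascending result
oldest_first = np.sort(people, order='age')[::-1]
print(oldest_first['age'].tolist())  # [27, 25, 21, 21]
```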

### Read from a CSV File

In [8]:
import numpy as np
a = np.genfromtxt('data.csv', delimiter=',', dtype=str)
a

array([['1', 'Angeli', 'Mapes', 'amapes0@chronoengine.com', 'Male'],
       ['2', 'Petronille', 'Helmke', 'phelmke1@walmart.com', 'Female'],
       ['3', 'Humfrid', 'Sainsberry', 'hsainsberry2@disqus.com', 'Male'],
       ['4', 'Kaja', 'Carnson', 'kcarnson3@amazon.de', 'Female'],
       ['5', 'Basia', 'Narraway', 'bnarraway4@bloomberg.com', 'Female'],
       ['6', 'Hy', 'Robiot', 'hrobiot5@quantcast.com', 'Male'],
       ['7', 'Angelika', 'Pedrocco', 'apedrocco6@quantcast.com',
        'Female'],
       ['8', 'Brigit', 'Olivie', 'bolivie7@so-net.ne.jp', 'Female'],
       ['9', 'Marjorie', 'Coope', 'mcoope8@microsoft.com', 'Female'],
       ['10', 'Demetra', 'Gumey', 'dgumey9@gmpg.org', 'Female'],
       ['11', 'Geneva', 'Vasentsov', 'gvasentsova@wsj.com', 'Female'],
       ['12', 'Llywellyn', 'Bullant', 'lbullantb@sfgate.com', 'Male'],
       ['13', 'Ruperta', 'Dudin', 'rdudinc@dailymotion.com', 'Female'],
       ['14', 'Page', 'Ferrotti', 'pferrottid@mashable.com', 'Female'],
       [