# NumPy Tutorial

### What is NumPy?
NumPy stands for "Numerical Python". It is a Python package that provides fast, memory-efficient operations on arrays of homogeneous data. It is the core library for scientific computing in Python, providing a high-performance multidimensional array object and tools for working with these arrays. NumPy is essential in data science work because its array data structure offers several benefits over built-in Python data structures such as lists.

### Creating NumPy arrays
The syntax for creating a NumPy array is:

numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

Here, the arguments are:

- object: Any object exposing the array interface, such as a list, a tuple or another array
- dtype: Optional. Desired data type of the array
- copy: Optional. By default (True), the object is copied
- order: Optional. Memory layout: C (row-major), F (column-major), A (any) or K (keep the layout of the input; the default in current NumPy releases)
- subok: Optional. By default (False), the returned array is forced to be a base-class array. If True, sub-classes are passed through
- ndmin: Optional. Specifies the minimum number of dimensions of the resulting array

Let's see how you can create a simple array using NumPy by first importing the package numpy as np.


In [1]:
import numpy as np
a = np.array([1,2,3,4]) # creates a 1-dimensional array
b = np.array([[1,2,3,4], [5,6,7,8]]) # creates a 2-dimensional array
print(a)
print('-'*15)
print(b)

[1 2 3 4]
---------------
[[1 2 3 4]
 [5 6 7 8]]
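
The optional arguments listed above can be tried out in a small sketch like the one below. The variable names (floats, row, source, copied) are made up for illustration; only np.array and its documented dtype, ndmin and copy parameters are used.

```python
import numpy as np

# dtype forces the element type; here the integers are stored as 64-bit floats
floats = np.array([1, 2, 3], dtype=np.float64)
print(floats, floats.dtype)        # [1. 2. 3.] float64

# ndmin pads the shape with leading axes until the array has at least 2 dimensions
row = np.array([1, 2, 3], ndmin=2)
print(row, row.shape)              # [[1 2 3]] (1, 3)

# copy=True (the default) means later changes to the source do not leak into the copy
source = np.array([1, 2, 3])
copied = np.array(source)
source[0] = 99
print(copied)                      # [1 2 3]
```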


### Advantages of using NumPy
- Free and open source
- Faster reading and writing of items than with Python lists
- Many tasks take less time and memory than with traditional Python data structures
- Has many built-in functions for linear algebra
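
As a rough illustration of the memory point, the sketch below compares the storage used by a Python list and by a NumPy array holding the same 1000 integers. It assumes CPython (where sys.getsizeof reports per-object sizes); the exact list figure varies between Python versions, so only the array figure is stated.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
# An array stores raw 8-byte integers in one contiguous buffer
array_bytes = np_array.nbytes

print("list bytes :", list_bytes)
print("array bytes:", array_bytes)   # 8000 = 1000 elements * 8 bytes
```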

In [2]:
import numpy as np
a = np.array([1,2,3,4])
b = np.array([[1,2,3,4],[5,6,7,8]])

### Shape
The shape is a tuple of array dimensions, i.e. it tells us how many items are present along each dimension. It can be found using the .shape attribute of the ndarray object.

In [3]:
print("The shape of the array a is: ", a.shape)
print("The shape of the array b is: ", b.shape)

The shape of the array a is:  (4,)
The shape of the array b is:  (2, 4)


### Dimensions
It gives the number of dimensions (axes) of the array and can be found using the .ndim attribute of the ndarray object.

In [4]:
print("The number of dimensions of array a is: ", a.ndim)
print("The number of dimensions of array b is: ", b.ndim)

The number of dimensions of array a is:  1
The number of dimensions of array b is:  2


### Size
It tells the total number of items in the array as a whole. More precisely, it is the product of the elements of the .shape attribute, and it can be found using the .size attribute of the array.

In [5]:
print("The size of the array a is: ", a.size)
print("The size of the array b is: ", b.size)

The size of the array a is:  4
The size of the array b is:  8


### Datatype
As the name suggests, it informs about the type of data in the array. Since a NumPy array consists of homogeneous data only, you will get only a single dtype. The default integer type depends on the platform and NumPy version, so you may see int64 where the output below shows int32.

- NumPy offers support for a much greater variety of numerical types than base Python does, such as int8, int16, float16, float32, bool_ and complex64.

In [6]:
print("The datatype of the array a is: ", a.dtype)
print("The datatype of the array b is: ", b.dtype)

The datatype of the array a is:  int32
The datatype of the array b is:  int32
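
A small sketch of how a single dtype is chosen and changed is given below. The array names are illustrative; astype is a standard ndarray method that is not otherwise used in this tutorial.

```python
import numpy as np

# Mixed ints and floats are upcast to one common dtype
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)                  # float64

# Request a specific dtype at creation time
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype)                  # int8

# astype returns a converted copy; the original array is unchanged
as_float = small.astype(np.float32)
print(as_float.dtype, small.dtype)  # float32 int8
```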


### Itemsize
It represents the number of bytes in each element of the array.

In [7]:
print('The number of bytes in each element of the array a is: ', a.itemsize)
print('The number of bytes in each element of the array b is: ', b.itemsize)

The number of bytes in each element of the array a is:  4
The number of bytes in each element of the array b is:  4
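
The attributes covered so far are related to each other, which the following sketch checks. It fixes the dtype to int32 so the byte counts do not depend on the platform, and it uses the nbytes attribute (total bytes of the data buffer), which is not introduced above.

```python
import math
import numpy as np

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int32)

print(b.ndim == len(b.shape))            # True: one shape entry per dimension
print(b.size == math.prod(b.shape))      # True: 2 * 4 = 8 elements
print(b.nbytes == b.size * b.itemsize)   # True: 8 elements * 4 bytes = 32 bytes
```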


## Creating New NumPy Arrays
You already know how to create a NumPy array using the numpy.array() command. Now, let's look at some other functions that create NumPy arrays of a given shape (we will be using np as an alias for numpy).

In [8]:
# np.empty()
# Creates an uninitialized array of specified shape and dtype; its contents are arbitrary (the zeros shown below are not guaranteed)
np.empty((3,4), dtype='int8')

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

In [9]:
# np.zeros()
# Creates a new array of specified size, filled with zeros
np.zeros((3,4), dtype='int8')

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

In [10]:
# np.ones()
# Creates a new array of specified size and type, filled with ones

np.ones((3,4), dtype='int8')

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int8)

In [11]:
# np.full()
# Creates a new array of given shape and type, filled with a constant value
np.full((3,4), 7)

array([[7, 7, 7, 7],
       [7, 7, 7, 7],
       [7, 7, 7, 7]])

In [12]:
# np.eye()
# Creates a 2-D array with ones on the diagonal and zeros elsewhere
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
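
A few details of these constructors are easy to miss, so here is a short sketch of them. The name buf is illustrative, and the k parameter of np.eye (a diagonal offset) is an addition not shown in the cells above.

```python
import numpy as np

# np.empty only allocates memory, so assign every entry before reading it
buf = np.empty((2, 3), dtype=np.int8)
buf[:] = 5
print(buf)

# Without an explicit dtype, np.full infers it from the fill value
print(np.full((2, 2), 7).dtype.kind)     # i  (some integer type)
print(np.full((2, 2), 7.5).dtype)        # float64

# k shifts the diagonal of np.eye: k=1 is the first diagonal above the main one
print(np.eye(3, k=1))
```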

## Arrays with Existing Data
New NumPy arrays can also be created from already existing ones. Let us look at some ways of doing so.

In [13]:
# np.asarray()
'''
This command is very similar to the np.array() command, except that it does not
copy the input when it is already an ndarray of a matching dtype. Some examples are: '''
# Converting a list to an array
a = [1,2,3,4,5]
b = np.asarray(a)
print(b)

[1 2 3 4 5]


In [14]:
# Converting a tuple of tuples to an array
a = ((1,2),(3,4))
b = np.asarray(a)
print(b)

[[1 2]
 [3 4]]
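
The practical difference between np.asarray() and np.array() shows up when the input is already an array. The sketch below uses illustrative variable names to show that np.asarray() hands back the same object while np.array() makes a copy by default.

```python
import numpy as np

original = np.array([1, 2, 3])

same = np.asarray(original)    # input is already an ndarray: no copy is made
copy = np.array(original)      # np.array copies by default (copy=True)

print(same is original)        # True
print(copy is original)        # False

original[0] = 100
print(same[0], copy[0])        # 100 1
```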


In [15]:
# np.fromiter()
'''
This function creates a NumPy ndarray from an iterable.
'''
a = np.fromiter([1,2,3,4,5], dtype='int8')
b = np.fromiter([1,2,3,4,5], dtype='int8')
c = np.fromiter(range(1,5), dtype='int8')
d = np.fromiter('String', dtype='S50')
print("Array a is ",a)
print("Array b is ",b)
print("Array c is ",c)
print("Array d is ",d)

Array a is  [1 2 3 4 5]
Array b is  [1 2 3 4 5]
Array c is  [1 2 3 4]
Array d is  [b'S' b't' b'r' b'i' b'n' b'g']
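
np.fromiter() is most useful with iterables that are not already stored in memory, such as generators. The following is a minimal sketch; the generator and the use of the optional count argument are illustrative choices.

```python
import numpy as np

# A generator yields values one at a time; no intermediate list is built
squares = (i * i for i in range(5))

# count is optional; when given, NumPy can allocate the output once up front
arr = np.fromiter(squares, dtype=np.int64, count=5)
print(arr)                     # [ 0  1  4  9 16]
```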


# Creating new arrays
In vector mathematics, it is often necessary to generate a sequence of numbers within some predefined range. NumPy provides several functions that make this easy. Let's look at some of them.

### np.arange( )
It returns an array containing evenly spaced values within the half-open interval [start, stop), so stop itself is not included.

Syntax: numpy.arange(start, stop, step, dtype)


In [16]:
print(np.arange(1,20,dtype='int32')) # Numpy array from 1 to 19
print(np.arange(1,20,2,dtype='int8')) # Numpy array from 1 to 19 with step size 2

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[ 1  3  5  7  9 11 13 15 17 19]
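
A few more np.arange() calls, sketched below, show that stop is excluded, what the defaults are, and that the step can be fractional or negative.

```python
import numpy as np

# stop is excluded: 1.0 does not appear in the result
quarters = np.arange(0, 1, 0.25)
print(quarters)                    # [0.   0.25 0.5  0.75]

# With a single argument, start defaults to 0 and step to 1
print(np.arange(5))                # [0 1 2 3 4]

# A negative step counts down; stop is still excluded
print(np.arange(10, 0, -3))        # [10  7  4  1]
```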


### np.linspace( )

It also returns an array of values within a range, but instead of a step size as in .arange(), you specify the number of values you want within that range.

*Syntax: numpy.linspace(start, stop, num, endpoint, retstep, dtype)*

Here, start and stop mean the same as in .arange(), but the difference lies in num, which gives the number of evenly spaced values to generate over the interval [start, stop]. The endpoint argument is True by default, so stop is included as the last value; setting endpoint=False excludes it (try changing it for some interesting results).


In [17]:
# 100 evenly spaced numbers from 1 to 20 (both included)
print(np.linspace(1,20,100))

[ 1.          1.19191919  1.38383838  1.57575758  1.76767677  1.95959596
  2.15151515  2.34343434  2.53535354  2.72727273  2.91919192  3.11111111
  3.3030303   3.49494949  3.68686869  3.87878788  4.07070707  4.26262626
  4.45454545  4.64646465  4.83838384  5.03030303  5.22222222  5.41414141
  5.60606061  5.7979798   5.98989899  6.18181818  6.37373737  6.56565657
  6.75757576  6.94949495  7.14141414  7.33333333  7.52525253  7.71717172
  7.90909091  8.1010101   8.29292929  8.48484848  8.67676768  8.86868687
  9.06060606  9.25252525  9.44444444  9.63636364  9.82828283 10.02020202
 10.21212121 10.4040404  10.5959596  10.78787879 10.97979798 11.17171717
 11.36363636 11.55555556 11.74747475 11.93939394 12.13131313 12.32323232
 12.51515152 12.70707071 12.8989899  13.09090909 13.28282828 13.47474747
 13.66666667 13.85858586 14.05050505 14.24242424 14.43434343 14.62626263
 14.81818182 15.01010101 15.2020202  15.39393939 15.58585859 15.77777778
 15.96969697 16.16161616 16.35353535 16.54545455 ...]
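
The effect of the endpoint and retstep arguments is easier to see on a short range. The sketch below uses 5 values between 0 and 1; the variable names are illustrative.

```python
import numpy as np

# endpoint=True (default): both 0 and 1 are included
print(np.linspace(0, 1, 5))                   # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False: stop is left out, so the spacing shrinks to 1/5
print(np.linspace(0, 1, 5, endpoint=False))   # [0.  0.2 0.4 0.6 0.8]

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0, 1, 5, retstep=True)
print(step)                                   # 0.25
```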

### np.logspace( )
This function returns an array containing numbers that are evenly spaced on a log scale. The start and stop endpoints of the scale are exponents of the base, which is 10 by default.

Syntax: numpy.logspace(start, stop, num, endpoint, base, dtype)

Here, the range of values is $[\text{base}^{\text{start}}, \text{base}^{\text{stop}}]$, with num being the number of values, evenly spaced on a log scale, within that range.

In [18]:
# NumPy array from 10^0 to 10^2 with 100 numbers in log scale
print(np.logspace(0,2,100))

[  1.           1.04761575   1.09749877   1.149757     1.20450354
   1.26185688   1.32194115   1.38488637   1.45082878   1.51991108
   1.59228279   1.66810054   1.7475284    1.83073828   1.91791026
   2.009233     2.10490414   2.20513074   2.3101297    2.42012826
   2.53536449   2.65608778   2.7825594    2.91505306   3.05385551
   3.19926714   3.35160265   3.51119173   3.67837977   3.85352859
   4.03701726   4.22924287   4.43062146   4.64158883   4.86260158
   5.09413801   5.33669923   5.59081018   5.85702082   6.13590727
   6.42807312   6.73415066   7.05480231   7.39072203   7.74263683
   8.11130831   8.49753436   8.90215085   9.32603347   9.77009957
  10.23531022  10.72267222  11.23324033  11.76811952  12.32846739
  12.91549665  13.53047775  14.17474163  14.84968262  15.55676144
  16.29750835  17.07352647  17.88649529  18.73817423  19.6304065
  20.56512308  21.5443469   22.5701972   23.64489413  24.77076356
  25.95024211  27.18588243  28.48035868  29.8364724   31.2571585
  32.7454916  ...]
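
A smaller call makes the role of the exponents and the base clearer. The sketch below also checks that np.logspace() matches raising the base to the values produced by np.linspace().

```python
import numpy as np

# Exponents 0, 1, 2 with the default base 10
print(np.logspace(0, 2, 3))               # [  1.  10. 100.]

# Exponents 0, 1, 2, 3 with base 2
print(np.logspace(0, 3, 4, base=2))       # [1. 2. 4. 8.]

# logspace is equivalent to raising the base to linspace exponents
exponents = np.linspace(0, 2, 3)
print(np.allclose(np.logspace(0, 2, 3), 10 ** exponents))   # True
```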

## Indexing
You now know how to create different types of NumPy arrays and check their features. But how about accessing a particular value or taking a chunk of values from the array itself? In this topic, we are going to discuss exactly that. Like Python lists, the index starts at 0 for arrays as well.

Array indexing and slicing work much like Python list indexing and slicing. They follow the same array[start:stop:step] pattern, with one index or slice per dimension separated by commas. Let us look at an example to observe this behaviour.

In [19]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
# Pull out the second element of the third row
print(a[2, 1])
print('='*10)
# Pull out the first two rows and first two columns
print(a[:2, :2])
print('='*10)
# Pull out all elements of the third row
print(a[2,:])

8
==========
[[1 2]
 [4 5]]
==========
[7 8 9]
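
The start:stop:step pattern also supports negative indices and steps, and a slice of a NumPy array is a view rather than a copy. The following sketch illustrates both points on the same 3x3 array; top_row is an illustrative name.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(a[-1, -1])     # 9: negative indices count from the end
print(a[:, 1])       # [2 5 8]: every row, second column
print(a[::2, ::2])   # rows 0 and 2, columns 0 and 2 -> [[1 3] [7 9]]

# Unlike list slices, a basic NumPy slice is a view onto the same data
top_row = a[0, :]
top_row[0] = 100
print(a[0, 0])       # 100: the change is visible through the original array
```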


### Integer array indexing
Integer array indexing allows you to construct arbitrary arrays using the data from another array. Let us understand it with an example.

In [20]:
# An example of integer array indexing
a=np.array([[1,2],[3,4],[5,6]])

print(a[[0,1,2],[0,1,0]])
print('==========')

print(np.array([a[0,0],a[1,1],a[2,0]]))
print('==========')

print(a[[0,0],[1,1]])
print('==========')

print(np.array([a[0,1],a[0,1]]))

[1 4 5]
==========
[1 4 5]
==========
[2 2]
==========
[2 2]
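
Two common uses of integer array indexing are picking one element per row and selecting or reordering whole rows. The sketch below shows both, with illustrative names (cols, picked), and also shows that the result is a copy rather than a view.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Pair row i with column cols[i]: picks a[0, 0], a[1, 1], a[2, 0]
cols = np.array([0, 1, 0])
picked = a[np.arange(3), cols]
print(picked)                 # [1 4 5]

# A single index list selects (and can reorder or repeat) whole rows
print(a[[2, 0, 2]])           # rows 2, 0, 2

# The result is a copy, so editing it leaves the original untouched
picked[0] = 99
print(a[0, 0])                # 1
```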


### Boolean indexing
This type of indexing is generally used to filter values by a condition. For example, how would you check how many numbers in an array are greater than, say, 50? This can be done using a simple comparison operator (>=, >, ==, <, <=).

A boolean index array has the same shape as the array to be filtered and contains only True and False values. You can use it as a mask to keep only the values you want. For example, for some array a and boolean condition condition = a > 2, a[condition] results in an array that contains only the numbers in a that are greater than 2.

Let us look at an example below.

In [21]:
a = np.array([[4,5,6],[2,5,7],[7,2,1]])
# Boolean condition for value greater than 3
mask= a > 3
print(mask)

# Masking for the above boolean condition in the array.
print(a[mask])

[[ True  True  True]
 [False  True  True]
 [ True False False]]
[4 5 6 5 7 7]
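
The counting question raised above (how many numbers are greater than 50) can be answered by summing the mask. The sketch below uses made-up sample data and also shows combining conditions and assigning through a mask.

```python
import numpy as np

scores = np.array([12, 75, 50, 93, 4])   # hypothetical sample data

# True counts as 1, so summing a mask counts the matching elements
print((scores > 50).sum())               # 2

# Combine conditions with & (and) and | (or); the parentheses are required
in_range = (scores >= 10) & (scores <= 80)
print(scores[in_range])                  # [12 75 50]

# A mask can also select the elements to overwrite
capped = scores.copy()
capped[capped > 50] = 50
print(capped)                            # [12 50 50 50  4]
```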


# What is vectorization?
Vectorization is NumPy's ability to perform operations on entire arrays at once rather than one element at a time. When looping over an array or any data structure in Python, there’s a lot of overhead involved. Vectorized operations in NumPy delegate the looping internally to highly optimized C and Fortran functions, making for cleaner and faster Python code.

In [1]:
import numpy as np
a  = np.array([1,2,3,4,5,6,7])
print(a[a > 2])

[3 4 5 6 7]
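
To make the contrast concrete, the sketch below doubles every element twice: once with an explicit Python loop and once with a single vectorized expression. Both give the same values; the variable names are illustrative.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7])

# Explicit Python loop: one interpreted iteration per element
doubled_loop = []
for v in values:
    doubled_loop.append(v * 2)

# Vectorized: a single expression, with the loop running in compiled code
doubled_vec = values * 2

print(doubled_vec)                                  # [ 2  4  6  8 10 12 14]
print(np.array_equal(doubled_loop, doubled_vec))    # True
```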


## Vectorized operations
* Now, let us look at how you can do some elementary vectorized operations like addition, subtraction, multiplication, etc. The cells below show each operation and its output.

### Addition
* Two ways to go about it, using either + or np.add()

In [3]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b=np.array([[10,11,12],[13,14,15],[16,17,18]])

print(a+b) # first way
print('='*15)
print(np.add(a,b)) # second way

[[11 13 15]
 [17 19 21]
 [23 25 27]]
===============
[[11 13 15]
 [17 19 21]
 [23 25 27]]


### Subtraction
* Two ways, - or np.subtract()

In [4]:
print(a-b)
print('='*15)
print(np.subtract(a,b))

[[-9 -9 -9]
 [-9 -9 -9]
 [-9 -9 -9]]
===============
[[-9 -9 -9]
 [-9 -9 -9]
 [-9 -9 -9]]


### Multiplication
* Two ways, * or np.multiply()

In [5]:
print(a*b)
print('='*15)
print(np.multiply(a,b))

[[ 10  22  36]
 [ 52  70  90]
 [112 136 162]]
===============
[[ 10  22  36]
 [ 52  70  90]
 [112 136 162]]
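
Note that * and np.multiply() multiply matching elements; they do not compute the matrix product. As a brief aside, the sketch below contrasts the two on small 2x2 arrays, using the @ operator and np.matmul(), which are not covered elsewhere in this tutorial.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a * b)    # element-wise: [[ 5 12] [21 32]]
print(a @ b)    # matrix product: [[19 22] [43 50]]
print(np.array_equal(a @ b, np.matmul(a, b)))   # True
```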


### Division
* Two ways, / or np.divide()

In [6]:
print(a/b)
print('='*15)
print(np.divide(a,b))

[[0.1        0.18181818 0.25      ]
 [0.30769231 0.35714286 0.4       ]
 [0.4375     0.47058824 0.5       ]]
===============
[[0.1        0.18181818 0.25      ]
 [0.30769231 0.35714286 0.4       ]
 [0.4375     0.47058824 0.5       ]]


### Square root transform
* Use np.sqrt()

In [7]:
ar = np.array([[6,7,8,9,10],[1,2,3,4,5]])
print("Square root: ",np.sqrt(ar))

Square root:  [[2.44948974 2.64575131 2.82842712 3.         3.16227766]
 [1.         1.41421356 1.73205081 2.         2.23606798]]


### Log transform
* Use np.log()

In [8]:
# element-wise natural log transform
a = np.array([[1,4,9],[16,25,36]])
print(np.log(a))

[[0.         1.38629436 2.19722458]
 [2.77258872 3.21887582 3.58351894]]
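
np.log() computes the natural logarithm, so np.exp() undoes it. The sketch below checks that round trip and shows the base-10 and base-2 variants, np.log10() and np.log2(), which are not used elsewhere in this tutorial.

```python
import numpy as np

a = np.array([[1, 4, 9], [16, 25, 36]])

# np.exp undoes np.log (up to floating-point rounding)
print(np.allclose(np.exp(np.log(a)), a))      # True

# Base-10 and base-2 variants
print(np.log10(np.array([1, 10, 100])))       # [0. 1. 2.]
print(np.log2(np.array([1, 2, 8])))           # [0. 1. 3.]
```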


### Aggregate operations
* Aggregate operations are those that reduce an entire array, or each row or column of it, to summary values. Some commonly used aggregate operations are listed below:

| Command | Description |
| --- | --- |
| a.sum() | Array-wise sum |
| a.min() | Array-wise minimum value |
| a.max(axis=0) | Maximum value of each column |
| a.cumsum(axis=1) | Cumulative sum of the elements along each row |
| a.mean() | Mean |
| np.median(a) | Median |
| np.corrcoef(a) | Correlation coefficients |
| np.std(a) | Standard deviation |


In [13]:
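arr = np.array([1,2,3,4])
a = np.array([[1,4,9],[16,25,36]]) # same 2-D array as in the log transform example
print("Sum: ",arr.sum())
print("Minimum: ",a.min())
print("Maximum: ",a.max(axis=0)) # maximum of each column
a.cumsum(axis=1) # cumulative sum along each row (not displayed, since it is not printed)
print("Mean: ",a.mean())
print("Median: ",np.median(a))
np.corrcoef(a) # correlation coefficients (not displayed, since it is not printed)
np.std(a) # Standard deviation; displayed as the result of the cell's last expression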

Sum:  10
Minimum:  1
Maximum:  [16 25 36]
Mean:  15.166666666666666
Median:  12.5


12.212243401148248
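
The cell above does not display the results of a.cumsum(axis=1) and np.corrcoef(a), because only printed values and the last expression of a cell are shown. The sketch below prints them for the same 2-D array.

```python
import numpy as np

a = np.array([[1, 4, 9], [16, 25, 36]])

# Running total along each row
print(a.cumsum(axis=1))        # [[ 1  5 14] [16 41 77]]

# Each row is treated as one variable, so the result is a 2x2 matrix
corr = np.corrcoef(a)
print(corr.shape)              # (2, 2)

# Population standard deviation of all six values
print(np.std(a))
```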

### Array comparison
* You already saw how you can perform element-wise comparison of array elements. With NumPy you can also compare entire arrays. Use the command np.array_equal() for array comparison. It is illustrated with examples below:

In [15]:
a = np.array([1,2,3,4])
b = np.array([1,2,3,4])
c = np.array([2,3,54,5])
print(np.array_equal(a,b))
print(np.array_equal(b,c))

True
False
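
For contrast, the sketch below shows what the element-wise == operator returns, and why exact comparison can be too strict for floating-point data. np.allclose(), which compares within a tolerance, is an addition not covered above.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
c = np.array([2, 3, 54, 5])

print(a == c)                   # [False False False False]: one result per element
print((a == a).all())           # True: every element matches

# Floating-point results can differ in the last bits, so exact equality may fail
x = np.array([0.1 + 0.2])
y = np.array([0.3])
print(np.array_equal(x, y))     # False
print(np.allclose(x, y))        # True: equal within a small tolerance
```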


### Understanding Axes notation
* In NumPy, an axis refers to a single dimension of a multidimensional array. By specifying axis you choose the dimension along which a computation runs, whereas not specifying axis results in computation over the entire array.

In [17]:
a = np.array([[1,4,9],[16,25,36]])

# computes the sum of each column (adds along axis 0)
print(a.sum(axis=0))
print('==========')

# computes the sum of each row (adds along axis 1)
print(a.sum(axis=1))
print('==========')

# computes total sum
print(a.sum())

[17 29 45]
==========
[14 77]
==========
91
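
A useful way to remember the axis argument is that the named axis is the one that disappears from the result. The sketch below checks this for the same 2x3 array using max and sum.

```python
import numpy as np

a = np.array([[1, 4, 9], [16, 25, 36]])   # shape (2, 3)

# axis=0 collapses the rows: one result per column
print(a.max(axis=0))          # [16 25 36]
print(a.sum(axis=0).shape)    # (3,)

# axis=1 collapses the columns: one result per row
print(a.max(axis=1))          # [ 9 36]
print(a.sum(axis=1).shape)    # (2,)
```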
