
# What is NumPy?

NumPy stands for "Numerical Python". It is a Python module that provides fast, efficient operations on arrays of homogeneous data. It is the core library for scientific computing in Python, providing a high-performance multidimensional array object and tools for working with these arrays.

NumPy is one of the essential packages in your data science journey because it equips you with an array data structure that offers several benefits over traditional Python data structures such as lists.

## NumPy Arrays
The central feature of NumPy is the array object class, also called the ndarray. Arrays are very similar to lists in Python, except that every element of an array must be of the same type (in lists you can hold data which have different types), typically a numeric type like float or int. 

Arrays make operations on large amounts of numeric data very fast and are generally much more memory-efficient than lists. You can choose to create arrays of n dimensions. The difference comes from how the data is stored. A Python list is an array of pointers to Python objects: on a 32-bit build that is at least 4 bytes per pointer plus 16 bytes for even the smallest Python object (4 for the type pointer, 4 for the reference count, 4 for the value, and the memory allocator rounds up to 16). On 64-bit builds the pointers and objects are larger still. A NumPy array is an array of uniform values: single-precision numbers take 4 bytes each and double-precision ones take 8 bytes.
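
The memory difference described above can be checked directly. The following is a minimal sketch, assuming a CPython interpreter; the list figure varies with the Python version and platform, so only the array figure (8 bytes per float64 value) is fixed.

```python
import sys
import numpy as np

n = 1000
py_list = [float(i) for i in range(n)]
arr = np.arange(n, dtype=np.float64)

# A list stores pointers, and every float is a separate Python object,
# so we add the size of the list itself to the size of each element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw values contiguously: 8 bytes per float64.
array_bytes = arr.nbytes

print("list (pointers + objects):", list_bytes, "bytes")
print("array data buffer:        ", array_bytes, "bytes")
```

The array figure covers only the data buffer; the ndarray object itself adds a small fixed overhead on top.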



# Creating NumPy arrays

The syntax of creating a NumPy array is:

In [0]:
numpy.array(object, dtype=None, copy=True, order='K', subok=False, ndmin=0)

Here, the arguments are:

* **object**: An array, any object exposing the array interface, or a (nested) sequence such as a list or tuple
* **dtype**: Desired data type of the array, optional. If omitted, NumPy infers a type from the data
* **copy**: Optional. By default (True), the object is copied
* **order**: Memory layout: C (row major), F (column major), A (any) or K (keep the layout of the input, the default in current NumPy releases)
* **subok**: By default (False), the returned array is forced to be a base-class array. If True, sub-classes are passed through
* **ndmin**: Specifies the minimum number of dimensions of the resultant array
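
As an illustration of how a few of these arguments behave, here is a short sketch. The variable names x, y and z are made up for this example.

```python
import numpy as np

# dtype forces the element type instead of letting NumPy infer it
x = np.array([1, 2, 3], dtype=np.float32)
print(x, x.dtype)

# ndmin prepends axes until the array has at least that many dimensions
y = np.array([1, 2, 3], ndmin=2)
print(y.shape)

# order='F' requests a column-major (Fortran-style) memory layout
z = np.array([[1, 2], [3, 4]], order='F')
print(z.flags['F_CONTIGUOUS'])
```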


# Sample Code

In [1]:
import numpy as np
a = np.array([1,2,3,4])               # creates a 1-dimensional array
b = np.array([[1,2,3,4], [5,6,7,8]])    # creates a 2-dimensional array
print(a)
print('----')
print(b)

[1 2 3 4]
----
[[1 2 3 4]
 [5 6 7 8]]


# Advantages of using NumPy
* Free and open source
* Faster reading and writing of items than with Python lists
* Lower running time and memory use for numeric tasks than with traditional Python data structures
* Has a lot of built-in functions for linear algebra
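
The speed claim can be checked with a rough timing comparison. This is only a sketch: the helper names squares_list and squares_numpy are made up for this example, and the measured times depend on your machine.

```python
import timeit
import numpy as np

n = 100_000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)

def squares_list():
    # one Python-level multiplication per element
    return [v * v for v in values]

def squares_numpy():
    # a single vectorized call; the loop runs in compiled code
    return arr * arr

t_list = timeit.timeit(squares_list, number=10)
t_numpy = timeit.timeit(squares_numpy, number=10)

print(f"list comprehension: {t_list:.4f} s")
print(f"NumPy vectorized:   {t_numpy:.4f} s")
```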

# Attributes of NumPy arrays

## Shape
It returns a tuple of the array's dimensions, i.e. how many items are present along each dimension. It is available through the .shape attribute of the ndarray object.

In [2]:
print('The shape of the array a is ', a.shape)
print('The shape of the array b is ', b.shape)

The shape of the array a is  (4,)
The shape of the array b is  (2, 4)


## Dimensions
It gives the number of dimensions and can be found using the .ndim attribute of ndarray object.

In [3]:
print('The dimensions of array a is ', a.ndim)
print('The dimensions of array b is ', b.ndim)

The dimensions of array a is  1
The dimensions of array b is  2


## Size
It tells the total number of items in the array as a whole. More precisely it is the product of the elements of the .shape attribute of the array.


In [4]:
print('The size of the array a is ', a.size)
print('The size of the array b is ', b.size)

The size of the array a is  4
The size of the array b is  8


## Datatype
As the name suggests, it informs about the type of data in the array. Since a NumPy array consists of homogeneous data only, you will get only a single dtype.



In [5]:
print('The datatype of the array a is ', a.dtype)
print('The datatype of the array b is ', b.dtype)

The datatype of the array a is  int64
The datatype of the array b is  int64
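
Because an array holds a single dtype, mixed input is converted to one common type. A minimal sketch of this behaviour, with variable names chosen for this example:

```python
import numpy as np

# Mixing ints and floats upcasts every element to a float type
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)

# astype returns a converted copy; float to int conversion drops the fraction
as_int = mixed.astype(np.int32)
print(as_int, as_int.dtype)
```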


## Itemsize
It represents the number of bytes in each element of the array.

In [6]:
print('The number of bytes in each element of the array a is  ', a.itemsize)
print('The number of bytes in each element of the array b is ', b.itemsize)

The number of bytes in each element of the array a is   8
The number of bytes in each element of the array b is  8
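
The size and itemsize attributes are tied to shape and to the total memory used by the data. The sketch below checks both relationships; the dtype is set explicitly to int64 so the numbers do not depend on the platform default.

```python
import math
import numpy as np

b = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int64)

# size is the product of the entries of shape
print(b.size == math.prod(b.shape))

# nbytes (total bytes of array data) is size times itemsize
print(b.nbytes == b.size * b.itemsize)
```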


## Reshape
It gives a new shape to an array without changing its data.



In [8]:
# initialize NumPy array
array = np.arange(1,11)
print(array)

# check dimensions
dim = array.ndim

# reshaped array
reshaped = array.reshape(5,2)

# check dimensions of the reshaped array
new_dim = reshaped.ndim

#print
print('Original Dimension: ',dim)
print('New Dimension: ',new_dim)

[ 1  2  3  4  5  6  7  8  9 10]
Original Dimension:  1
New Dimension:  2
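
A few further properties of reshape are worth seeing in code. This is a sketch, assuming the same 10-element array as above: one dimension can be given as -1 to have it inferred, a reshaped contiguous array shares memory with the original, and a shape that does not match the total size is rejected.

```python
import numpy as np

array = np.arange(1, 11)

# -1 asks NumPy to infer that dimension from the total size
two_rows = array.reshape(2, -1)
print(two_rows.shape)

# For a contiguous array, reshape returns a view on the same data,
# so writing through the view changes the original
print(np.shares_memory(array, two_rows))
two_rows[0, 0] = 99
print(array[0])

# A shape whose product differs from the size raises ValueError
try:
    array.reshape(3, 4)
except ValueError as err:
    print("reshape failed:", err)
```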


# Creating with Low-level ndarray Constructor

## np.empty()
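Creates an uninitialized array of the specified shape and dtype. The contents are arbitrary (whatever happened to be in memory), so the values in the output will vary from run to run.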


In [9]:
np.empty((3,4),dtype='int8')

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int8)

## np.zeros()
Creates a new array of the specified shape and type, filled with zeros



In [10]:
np.zeros((3,4),dtype='int8')

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

## np.ones()
Creates a new array of the specified shape and type, filled with ones

In [11]:
np.ones((3,4),dtype='int8')

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]], dtype=int8)

## np.full()
Creates a new array of given shape and type, filled with a constant value

In [12]:
np.full((2,2),7)

array([[7, 7],
       [7, 7]])

## np.eye()
Creates a 2-D array with ones on the diagonal and zeros elsewhere


In [14]:
np.eye(3, dtype='int')

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
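
The constructors above take a few extra options that are easy to miss. The following sketch, with variable names made up for this example, fills an np.empty() buffer before reading it, passes a dtype to np.full(), and uses the k argument of np.eye() to shift the diagonal.

```python
import numpy as np

# np.empty only allocates memory; assign every element before reading it
buf = np.empty((2, 3), dtype=np.float64)
buf[:] = 0.5
print(buf)

# np.full accepts a dtype, like the other constructors
sevens = np.full((2, 2), 7, dtype=np.int8)
print(sevens.dtype)

# k shifts the diagonal of np.eye: k=1 is the first diagonal above the main one
upper = np.eye(3, k=1, dtype=int)
print(upper)
```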

# Creating with Existing Data

New NumPy arrays can also be created from existing data such as lists, tuples, iterables or other arrays.

## np.asarray( )
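This function is very similar to np.array(). The main difference is that np.asarray() does not copy its input when the input is already an ndarray of a compatible dtype, whereas np.array() copies by default.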
* Converting a list to array

In [15]:
#python list
a=[1,2,3]
#convert to NumPy array
b=np.asarray(a)
print(b)

[1 2 3]


* Converting tuples into array

In [16]:
#python tuples
a=((1,2),(3,4))

#convert to NumPy array
b=np.asarray(a)
print(b)

[[1 2]
 [3 4]]


## np.fromiter( )
This function creates a NumPy ndarray from an iterable

In [17]:
a=np.fromiter([1,2,3,4],dtype='int8')
b=np.fromiter((1,2,3,4),dtype='int8')
c=np.fromiter(range(1,5),dtype='int8')
d=np.fromiter('string',dtype='S50')

print("Array a is ",a)
print("Array b is ",b)
print("Array c is ",c)
print("Array d is ",d)

Array a is  [1 2 3 4]
Array b is  [1 2 3 4]
Array c is  [1 2 3 4]
Array d is  [b's' b't' b'r' b'i' b'n' b'g']
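
np.fromiter() is most useful with generators, where the values are never held in an intermediate list. Here is a minimal sketch; the generator squares is a hypothetical helper written for this example, and the optional count argument lets NumPy allocate the output in one go.

```python
import numpy as np

def squares(limit):
    # hypothetical helper: yields 0, 1, 4, 9, ... one value at a time
    for i in range(limit):
        yield i * i

# count tells NumPy how many items to read, so it can pre-allocate
arr = np.fromiter(squares(5), dtype=np.int64, count=5)
print(arr)
```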


# Creating with Numerical Ranges

## np.arange( )
It returns an array containing evenly spaced values within a given range.

In [3]:
# NumPy array from 1 to 19
print(np.arange(1,20,dtype='int32'))

# NumPy array from 1 to 19 with step size 2
print(np.arange(1,20,2,dtype='int8'))

[ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[ 1  3  5  7  9 11 13 15 17 19]


## np.linspace( )
It also returns evenly spaced values within a range, but instead of a step size as in np.arange() you specify how many values you want. By default the stop value is included.

**Syntax:** numpy.linspace(start, stop, num, endpoint, retstep, dtype)



In [4]:
# NumPy array of 100 evenly spaced values from 1 to 20 (both included)
print(np.linspace(1,20,100))

[ 1.          1.19191919  1.38383838  1.57575758  1.76767677  1.95959596
  2.15151515  2.34343434  2.53535354  2.72727273  2.91919192  3.11111111
  3.3030303   3.49494949  3.68686869  3.87878788  4.07070707  4.26262626
  4.45454545  4.64646465  4.83838384  5.03030303  5.22222222  5.41414141
  5.60606061  5.7979798   5.98989899  6.18181818  6.37373737  6.56565657
  6.75757576  6.94949495  7.14141414  7.33333333  7.52525253  7.71717172
  7.90909091  8.1010101   8.29292929  8.48484848  8.67676768  8.86868687
  9.06060606  9.25252525  9.44444444  9.63636364  9.82828283 10.02020202
 10.21212121 10.4040404  10.5959596  10.78787879 10.97979798 11.17171717
 11.36363636 11.55555556 11.74747475 11.93939394 12.13131313 12.32323232
 12.51515152 12.70707071 12.8989899  13.09090909 13.28282828 13.47474747
 13.66666667 13.85858586 14.05050505 14.24242424 14.43434343 14.62626263
 14.81818182 15.01010101 15.2020202  15.39393939 15.58585859 15.77777778
 15.96969697 16.16161616 16.35353535 16.54545455 ...
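
A smaller example makes the endpoint and retstep arguments easier to follow. This is a sketch using 5 values between 0 and 1; the variable names are chosen for this example.

```python
import numpy as np

# 5 values from 0 to 1, stop included: spacing is 1 / (5 - 1)
closed = np.linspace(0, 1, 5)
print(closed)

# endpoint=False leaves out the stop value: spacing is 1 / 5
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open)

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0, 1, 5, retstep=True)
print(step)
```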

## np.logspace( )
This function returns an array of numbers that are evenly spaced on a log scale. The start and stop arguments are exponents of the base, which defaults to 10, so the sequence runs from base^start to base^stop.

**Syntax:** numpy.logspace(start, stop, num, endpoint, base, dtype)

In [5]:
# NumPy array from 10^0 to 10^2 with 100 numbers in log scale
print(np.logspace(0,2,100))

[  1.           1.04761575   1.09749877   1.149757     1.20450354
   1.26185688   1.32194115   1.38488637   1.45082878   1.51991108
   1.59228279   1.66810054   1.7475284    1.83073828   1.91791026
   2.009233     2.10490414   2.20513074   2.3101297    2.42012826
   2.53536449   2.65608778   2.7825594    2.91505306   3.05385551
   3.19926714   3.35160265   3.51119173   3.67837977   3.85352859
   4.03701726   4.22924287   4.43062146   4.64158883   4.86260158
   5.09413801   5.33669923   5.59081018   5.85702082   6.13590727
   6.42807312   6.73415066   7.05480231   7.39072203   7.74263683
   8.11130831   8.49753436   8.90215085   9.32603347   9.77009957
  10.23531022  10.72267222  11.23324033  11.76811952  12.32846739
  12.91549665  13.53047775  14.17474163  14.84968262  15.55676144
  16.29750835  17.07352647  17.88649529  18.73817423  19.6304065
  20.56512308  21.5443469   22.5701972   23.64489413  24.77076356
  25.95024211  27.18588243  28.48035868  29.8364724   31.2571585
  32.7454916   ...
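
The relationship between np.logspace() and np.linspace() can be seen on a small example. The sketch below uses 3 points so the values are easy to read, and also shows the base argument.

```python
import numpy as np

# exponents 0, 1, 2 with the default base 10
decades = np.logspace(0, 2, 3)
print(decades)

# the same values, built by raising 10 to linearly spaced exponents
same = 10 ** np.linspace(0, 2, 3)
print(np.allclose(decades, same))

# base=2 with exponents 0, 1, 2, 3
powers_of_two = np.logspace(0, 3, 4, base=2)
print(powers_of_two)
```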

# Indexing
Array indexing and slicing follow the same syntax as Python list indexing and slicing.
They use the same pattern of array[start:stop:step].
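
The start:stop:step pattern can be tried on a 1-dimensional array. The sketch below also points out one behaviour that differs from lists: a slice of a NumPy array is a view on the same data, while a slice of a list is a copy.

```python
import numpy as np

a = np.arange(10)

print(a[2:8:2])   # every second element from index 2 up to (not including) 8
print(a[::-1])    # the whole array in reverse order

# A NumPy slice is a view: writing to it changes the original array
window = a[:3]
window[0] = -1
print(a[0])

# A list slice is a copy: the original list is unchanged
lst = list(range(10))
part = lst[:3]
part[0] = -1
print(lst[0])
```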
## Integer array indexing
Integer array indexing allows you to construct arbitrary arrays using the data from another array.

## Boolean indexing
This type of indexing is generally used to filter an array by a condition. For example, how many numbers in an array are greater than 50? You can find out by applying a comparison operator (>=, >, ==, <, <=) to the array.

A boolean index array has the same shape as the array to be filtered and contains only True and False values. You can keep the elements you want using the concept of masking. For example, for some array a and boolean condition condition = a > 2, a[condition] results in an array that contains only the numbers in a that are greater than 2.


In [6]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])

# Pull out second element of third row (a[2, 1] indexes both axes in one step)
print(a[2, 1])
print('==========')
# Pull out first two rows and columns
print(a[:2,:2])
print('==========')
# Pull all elements of the third row
print(a[2,:])

8
==========
[[1 2]
 [4 5]]
==========
[7 8 9]


In [7]:
# An example of integer array indexing
a=np.array([[1,2],[3,4],[5,6]])

print(a[[0,1,2],[0,1,0]])
print('==========')

print(np.array([a[0,0],a[1,1],a[2,0]]))
print('==========')

print(a[[0,0],[1,1]])
print('==========')

print(np.array([a[0,1],a[0,1]]))

#Explanation:
#The first and second print statements yield the same result, as do the third and fourth.
#In the first case, a[[0, 1, 2], [0, 1, 0]] means we are indexing the values at first row-first column,
#second row-second column and third row-first column, which is the same as [a[0, 0], a[1, 1], a[2, 0]].
#In the second case, a[[0, 0], [1, 1]] picks the first row-second column element twice, i.e. [a[0, 1], a[0, 1]].

[1 4 5]
==========
[1 4 5]
==========
[2 2]
==========
[2 2]
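
A common use of integer array indexing is to pick one element from each row. The sketch below assumes the same 3x2 array as above; the names rows, cols and picked are made up for this example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# choose column 0 in row 0, column 1 in row 1, column 0 in row 2
cols = np.array([0, 1, 0])
rows = np.arange(a.shape[0])
picked = a[rows, cols]
print(picked)

# The result of integer array indexing is a copy, not a view
picked[0] = 99
print(a[0, 0])

# Assigning through the index expression does modify the array
a[rows, cols] += 10
print(a)
```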


In [8]:
#Boolean indexing
a = np.array([[4,7,1],[2,5,7],[7,1,1]])

# Boolean condition for values greater than 3
mask = a > 3
print(mask)

# Masking for the above boolean condition in the array
print(a[mask])

[[ True  True False]
 [False  True  True]
 [ True False False]]
[4 7 5 7 7]
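
The question of how many numbers are greater than 50 can be answered by counting the True values in the mask. This is a sketch with made-up data; it also shows combining two conditions and assigning through a mask.

```python
import numpy as np

a = np.array([12, 67, 50, 88, 3, 51])

# True counts as 1, so summing the mask counts the matches
mask = a > 50
print(mask.sum())
print(np.count_nonzero(mask))

# Combine conditions with & (and) or | (or); each condition needs parentheses
between = a[(a >= 50) & (a < 80)]
print(between)

# A mask can also be used on the left-hand side of an assignment
a[a < 10] = 0
print(a)
```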


# Filter imaginary numbers

* Create a (4,) array with values 3, 4.5, 3 + 5j and 0 using "np.array()". Save it to a variable array
* Create a boolean condition real to retain only real numbers using np.isreal(array). (Note: np.isreal(array) returns a boolean array that is True where the element is a real number and False otherwise)
* Now apply this boolean condition real to array using boolean indexing (explained above) with array[real] and store the result in the variable real_array.
* Similarly, create a boolean condition imag to retain only complex numbers using np.iscomplex(array). Then create an array imag_array that contains only the complex numbers using array[imag]

In [9]:
# initialize array
array = np.array([ 3, 4.5, 3 + 5j , 0])

# boolean mask: True where the imaginary part is zero
real = np.isreal(array)
print(real)

# apply the mask to keep only the real numbers
real_array = array[real]
print(real_array)
print(real_array)


imag = np.iscomplex(array)
print(imag)

imag_array = array[imag]
print(imag_array)

[ True  True False  True]
[3. +0.j 4.5+0.j 0. +0.j]
[False False  True False]
[3.+5.j]
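
Note that real_array in the output above still has a complex dtype, because filtering does not change the element type. If plain floats are wanted, one option is the .real attribute, sketched below with the same input values.

```python
import numpy as np

array = np.array([3, 4.5, 3 + 5j, 0])

# filtering keeps the dtype of the original array
real_array = array[np.isreal(array)]
print(real_array.dtype)

# .real exposes the real parts as a float array
real_values = real_array.real
print(real_values, real_values.dtype)
```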
