# Numpy
[NumPy](http://www.numpy.org/) (short for *Numerical Python*) is a numerical computing library for Python. It lets us perform computations on arrays effectively and efficiently. Almost all data science libraries (SciPy, pandas, scikit-learn, etc.) are built on top of NumPy. NumPy takes advantage of a processor feature called Single Instruction Multiple Data (SIMD) to process data faster. SIMD allows a processor to apply the same operation to multiple data points with a single instruction.

When choosing between a high- and low-level language, you make a trade-off between being able to work quickly, and creating programs that run quickly and efficiently. Luckily, there are two Python libraries that give us the best of both worlds: NumPy and pandas. Together, pandas and NumPy provide a powerful toolset for working with data in Python because they allow us to write code quickly without sacrificing performance.
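
To make the "write code quickly without sacrificing performance" point concrete, here is a minimal sketch comparing an explicit Python loop with the equivalent vectorized NumPy expression. The variable names are illustrative only, and no timing claim is made here; the point is that one array expression replaces the loop.

```python
import numpy as np

values = list(range(1_000))   # plain Python list
arr = np.arange(1_000)        # equivalent NumPy array

# Pure-Python approach: visit every element explicitly
squares_loop = [v * v for v in values]

# NumPy approach: one vectorized expression, no explicit loop
squares_vec = arr * arr

print(squares_vec[:5])  # [ 0  1  4  9 16]
```

Both approaches produce the same numbers; the NumPy version hands the looping to compiled code.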

In [1]:
import numpy as np #importing the libraries

## Numpy Array
### Methods to create numpy array

In [3]:
my_list = [1,2,3] # this is a list
type(my_list)

list

In [5]:
# np.array() returns a new NumPy array; my_list itself is still a list

print(np.array(my_list))
type(my_list)

[1 2 3]


list

In [6]:
arr = np.array(my_list)
arr

array([1, 2, 3])

In [9]:
#2-d array
mylist =[[1,2,3],[4,5,6],[7,8,9]] #nested list
arr_2d = np.array(mylist)
arr_2d #the number of dimensions equals the number of opening (or closing) brackets in the output

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [14]:
# To explore an array's attributes and methods, type the array name followed by a . (dot) and press Tab
arr_2d.shape #shape of the array as (rows, columns); shape is an attribute, so no parentheses

(3, 3)
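
Besides `shape`, a few other attributes are handy when inspecting an array. The sketch below rebuilds the same 3 x 3 array so it can run on its own; note that the exact integer `dtype` depends on the platform.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr_2d.ndim)   # number of dimensions: 2
print(arr_2d.shape)  # (3, 3)
print(arr_2d.size)   # total number of elements: 9
print(arr_2d.dtype)  # an integer type such as int64 or int32, depending on platform
```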

#### Built-in Methods
Built-in methods to generate NumPy arrays

#### arange

In [17]:
np.arange(0,10) #hit shift + tab to see the parameter of the method
#Start is inclusive and end is exclusive

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [19]:
print(np.arange(0,10,2))
print(np.arange(0,11,2))

[0 2 4 6 8]
[ 0  2  4  6  8 10]


#### zeros

In [20]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [21]:
type(0.)

float

In [22]:
np.zeros((4,10)) #row x columns

array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

#### ones

In [23]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [24]:
np.ones((5,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [28]:
print(np.ones((5,5))+5) #broadcasting: the scalar is applied to every element; plain lists do not support this
print('\n\n')
print(np.ones((5,5)) *5) 

[[6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]]



[[5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]]
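
The comment above says broadcasting does not work with lists. A small sketch of what lists do instead (the names here are illustrative):

```python
import numpy as np

my_list = [1, 2, 3]
arr = np.array(my_list)

list_times_two = my_list * 2   # repeats the list: [1, 2, 3, 1, 2, 3]
arr_times_two = arr * 2        # multiplies each element: [2 4 6]

try:
    my_list + 5                # a list cannot be added to a number
except TypeError as exc:
    list_error = type(exc).__name__   # 'TypeError'

arr_plus_five = arr + 5        # adds 5 to each element: [6 7 8]
```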


#### linspace
Use `linspace` to create an array of evenly spaced values.

In [33]:
print(np.linspace(0,10,5)) #Shift + tab
#5 numbers evenly spaced between 0 and 10
print('\n\n')
print(np.linspace(0,20,9))
#9 numbers evenly spaced between 0 and 20
#keep in mind the difference between arange and linspace:
#linspace includes both start and stop
#arange includes start but excludes stop

[ 0.   2.5  5.   7.5 10. ]



[ 0.   2.5  5.   7.5 10.  12.5 15.  17.5 20. ]
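
The inclusive/exclusive difference noted in the comments can be checked directly. This sketch also uses the `endpoint` parameter of `np.linspace`, which is not used elsewhere in this notebook; setting it to `False` excludes the stop value.

```python
import numpy as np

with_stop = np.linspace(0, 10, 5)                     # [ 0.   2.5  5.   7.5 10. ]
without_stop = np.linspace(0, 10, 5, endpoint=False)  # [0. 2. 4. 6. 8.]
stepped = np.arange(0, 10, 2)                         # [0 2 4 6 8]

# With endpoint=False, linspace lands on the same values as arange with step 2
print(np.allclose(without_stop, stepped))  # True
```

As a rule of thumb, `arange` is convenient when you know the step size and `linspace` when you know how many points you want.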


#### Identity matrix

In [35]:
#identity matrix, if you remember linear algebra
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])
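
As a quick reminder of why it is called the identity matrix, the sketch below multiplies an arbitrary example matrix by `np.eye` using the `@` matrix-multiplication operator. The matrix `a` is made up for illustration.

```python
import numpy as np

a = np.array([[2, 0, 1],
              [1, 3, 5],
              [4, 1, 6]])
identity = np.eye(3)

# Multiplying by the identity matrix leaves the values unchanged
# (the result is a float array because np.eye returns floats)
product = a @ identity
print(np.array_equal(product, a))  # True
```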

#### Creating array with random values
`np.random.rand` creates an array of the given shape and populates it with random values from a [uniform distribution](https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)) over \[0,1)

In [47]:
print('np.random.rand(): ',np.random.rand()) 
print('\n np.random.rand(1): ',np.random.rand(1)) #creates an array with 1 element 
print('\n np.random.rand(4): ',np.random.rand(4)) # 4 random numbers
print('\n np.random.rand(2,2): ',np.random.rand(2,2)) #2 x 2 matrix of random numbers, notice the format difference from ones or zeros

np.random.rand():  0.8960276855086365

 np.random.rand(1):  [0.72612054]

 np.random.rand(4):  [0.2353357  0.37214931 0.34807536 0.92820463]

 np.random.rand(2,2):  [[0.76666932 0.94036669]
 [0.10475954 0.13941332]]


To create an array of a given shape filled with random values from the [standard normal distribution](https://en.wikipedia.org/wiki/Normal_distribution), use `np.random.randn`:

In [48]:
np.random.randn()

1.2707737263554346

In [49]:
np.random.randn(5)

array([ 0.40293   , -0.24993836,  0.61915481, -0.61760215, -1.25321853])

In [50]:
np.random.randn(5,5)

array([[ 1.75175749, -0.29667004,  1.45760242, -0.26893354,  0.03248742],
       [ 1.06571421,  0.15259204, -1.13651486, -1.30311071, -0.50521058],
       [-0.14577198, -1.18548374,  2.09769754,  0.80522109,  1.71843754],
       [ 1.05308866,  1.20735662, -0.78673114, -0.61097596,  0.77928464],
       [ 0.29482047, -2.21726467,  2.01080421,  1.27914407, -1.75332682]])

In [52]:
#to draw random numbers from a normal distribution with a specified mean and stdev
np.random.normal(loc = 5,scale=1,size = (2,2)) #shift + tab , loc = mean and scale = stdev

array([[6.31441693, 6.06944229],
       [4.26118199, 4.15789035]])

In [56]:
#to create random integers
np.random.randint(0,5) #random integer from 0 to 4 (the upper bound 5 is exclusive)
np.random.randint(1,100,11) #11 random integers from 1 to 99 (low is inclusive, high is exclusive)

array([54, 99, 87, 15, 20, 49, 27, 63, 44,  4, 81])
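
Random outputs differ on every run, which is why the values shown in this notebook will not match yours. If repeatable results are needed, one option is to fix the seed first. The sketch below assumes the legacy `np.random.seed` interface, to match the `np.random.*` functions used here; the seed value 42 is arbitrary. It also checks the exclusive upper bound of `randint`.

```python
import numpy as np

np.random.seed(42)                     # fix the seed so the sequence repeats
first = np.random.randint(1, 100, 11)

np.random.seed(42)                     # same seed, same sequence
second = np.random.randint(1, 100, 11)

print(np.array_equal(first, second))   # True

# The upper bound is exclusive: values drawn from randint(0, 5) never reach 5
samples = np.random.randint(0, 5, 1000)
print(samples.min() >= 0, samples.max() <= 4)  # True True
```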

### Array Attributes and methods

In [57]:
arr = np.arange(25)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [58]:
ranarr = np.random.randint(0,50,10)
ranarr

array([27, 13, 38, 18, 21, 23,  1, 16, 15, 43])

#### Reshape
Returns an array containing the same data with a new shape [reference](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.reshape.html)

In [59]:
arr.shape

(25,)

In [61]:
arr.reshape(5,5) #the new shape must hold exactly the same number of elements (5 x 5 = 25)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [62]:
arr.reshape(5,10) #fails: 5 x 10 = 50 elements, but arr has only 25

ValueError: cannot reshape array of size 25 into shape (5,10)

In [70]:
arr.reshape(1,25)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24]])

In [71]:
arr.reshape(1,25).shape

(1, 25)
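
Because the total number of elements is fixed, `reshape` can work out one dimension for you if you pass `-1` in its place. A short sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(25)

inferred = arr.reshape(5, -1)   # NumPy infers the second dimension: 25 / 5 = 5
column = arr.reshape(-1, 1)     # a single column with 25 rows

print(inferred.shape)  # (5, 5)
print(column.shape)    # (25, 1)
```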

#### Min,Max,argmin,argmax
These methods find the maximum or minimum value of an array; `argmax` and `argmin` return the index at which that value is located.

In [63]:
ranarr

array([27, 13, 38, 18, 21, 23,  1, 16, 15, 43])

In [64]:
ranarr.max() #max value

43

In [65]:
ranarr.min()#minimum value

1

In [67]:
ranarr.argmax() #index at which the maximum value resides; keep in mind Python indexing starts from 0

9

In [69]:
ranarr.argmin() #index at which the minimum value resides; keep in mind Python indexing starts from 0

6
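
On a 2D array, `argmax` without arguments returns an index into the flattened array. The sketch below reuses the values of `ranarr` shown above (hard-coded so the result is repeatable), reshapes them into a hypothetical 2 x 5 grid, and converts the flat index back to a row and column with `np.unravel_index`.

```python
import numpy as np

ranarr = np.array([27, 13, 38, 18, 21, 23, 1, 16, 15, 43])
grid = ranarr.reshape(2, 5)

flat_idx = grid.argmax()                            # 9, position in the flattened array
row, col = np.unravel_index(flat_idx, grid.shape)   # row 1, column 4
per_row = grid.argmax(axis=1)                       # index of the max in each row: [2 4]

print(flat_idx, row, col, per_row)
```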

#### NumPy Indexing and Selection

In [73]:
arr = np.arange(11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [74]:
arr[8] #index location 8, remember that Python indexing starts from 0


8

In [75]:
arr[1:5] #from index 1 up to, but not including, index 5

array([1, 2, 3, 4])

In [76]:
arr[1:] #from index 1 till the end of the array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [78]:
arr[:5] #from the start of array upto but not including index 5

array([0, 1, 2, 3, 4])

#### Broadcasting

One of the key differences between NumPy arrays and lists is NumPy's ability to broadcast.

In [79]:
#broadcasting returns a new array; arr itself is not modified
arr+100

array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110])

In [80]:
arr * 5

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50])

In [84]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [85]:
new_arr = arr+100
new_arr

array([100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110])

In [86]:
print(arr,'\n')
slice_of_arr = arr[:6]
print(slice_of_arr)

[ 0  1  2  3  4  5  6  7  8  9 10] 

[0 1 2 3 4 5]


In [87]:
slice_of_arr[:] = 99

In [88]:
slice_of_arr

array([99, 99, 99, 99, 99, 99])

In [89]:
# arr should not change,right?
arr

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])

`arr` also changes! Slicing a NumPy array does not copy the data; it creates a view of the original array. This lets NumPy operate quickly and saves memory.

If you want to copy the data without affecting the original array, you need to do so explicitly.

In [90]:
arr_copy = arr.copy()
arr_copy

array([99, 99, 99, 99, 99, 99,  6,  7,  8,  9, 10])
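
The view-versus-copy behaviour can be verified with `np.shares_memory`, which reports whether two arrays use the same underlying data. A minimal sketch with illustrative names:

```python
import numpy as np

arr = np.arange(11)
view = arr[:6]              # slicing gives a view of arr
copied = arr[:6].copy()     # an explicit, independent copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False

copied[:] = 99                           # arr is untouched
unchanged_after_copy = arr[:6].tolist()  # [0, 1, 2, 3, 4, 5]

view[:] = 99                             # arr changes too
changed_after_view = arr[:6].tolist()    # [99, 99, 99, 99, 99, 99]
```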

### NumPy Indexing and Selection in 2D array

In [92]:
arr_2d =np.array([[5,10,15],[20,25,30],[35,40,45]])
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [93]:
arr_2d.shape

(3, 3)

There are multiple ways to index a 2D array. Choose whichever you are comfortable with. Remember that Python indexing starts from 0.

In [97]:
#1st way to index
arr_2d[1] #row on index 1

array([20, 25, 30])

In [95]:
arr_2d[1][1] #row on index 1,column on index 1

25

In [104]:
#2nd way to index, I use this!
arr_2d[2,2]

45

In [105]:
#getting a slice of array

arr_2d[1:,1:]

array([[25, 30],
       [40, 45]])
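
A few more slicing patterns on the same array, as a sketch. A bare `:` selects everything along that axis.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

second_col = arr_2d[:, 1]    # every row, column 1: [10 25 40]
top_right = arr_2d[:2, 1:]   # rows 0-1, columns 1-2: [[10 15] [25 30]]

# Both indexing styles reach the same element
same_element = arr_2d[1][1] == arr_2d[1, 1]   # True
```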

### Conditional Selection

In [106]:
arr = np.arange(11)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [107]:
arr > 5

array([False, False, False, False, False, False,  True,  True,  True,
        True,  True])

In [110]:
bool_arr = arr>5

In [112]:
arr[bool_arr] #returns the elements where the boolean array is True, i.e. everything greater than 5

array([ 6,  7,  8,  9, 10])

In [114]:
#shorter way to do it
arr[arr>5] #pretty neat!!

array([ 6,  7,  8,  9, 10])
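
Conditions can also be combined. With NumPy arrays this is done with `&` (and) and `|` (or) rather than Python's `and`/`or`, and each condition needs its own parentheses. A short sketch:

```python
import numpy as np

arr = np.arange(11)

between = arr[(arr > 2) & (arr < 8)]   # [3 4 5 6 7]
outside = arr[(arr < 2) | (arr > 8)]   # [ 0  1  9 10]
evens = arr[arr % 2 == 0]              # [ 0  2  4  6  8 10]
```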

## Numpy operations

### Arithmetic

In [115]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [116]:
arr + 10

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

In [117]:
arr /100

array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ])

In [118]:
arr **2 

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100], dtype=int32)

In [119]:
(arr+2)**5

array([    32,    243,   1024,   3125,   7776,  16807,  32768,  59049,
       100000, 161051, 248832], dtype=int32)

In [120]:
arr + arr

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

In [122]:
1/arr #one might expect an error because of division by 0,
# but NumPy only issues a RuntimeWarning and returns inf


array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111,
       0.1       ])

In [123]:
arr/arr #0/0 is undefined, so NumPy issues a RuntimeWarning and returns nan for the first element


array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

- nan = not a number (here, the result of 0/0)
- inf = infinity (here, the result of 1/0)
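
If the warnings get in the way, or if you would rather have an exception, `np.errstate` lets you choose the behaviour for a block of code. The sketch below also uses `np.isinf` and `np.isnan` to locate the problem values; the variable names are illustrative.

```python
import numpy as np

arr = np.arange(11)

# Silence the warnings for this block only
with np.errstate(divide="ignore", invalid="ignore"):
    reciprocal = 1 / arr   # first element is inf
    ratio = arr / arr      # first element is nan

inf_mask = np.isinf(reciprocal)
nan_mask = np.isnan(ratio)
cleaned = ratio[~nan_mask]   # drop the nan entry

# Or turn division by zero into a real exception
try:
    with np.errstate(divide="raise"):
        1 / arr
    raised = False
except FloatingPointError:
    raised = True
```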

### Universal Array Functions

In [124]:
# Taking Square Roots
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ,
       3.16227766])

In [125]:
# Calculating exponential (e^)
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03, 2.20264658e+04])

In [126]:
# Trigonometric Functions like sine
np.sin(arr)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849,
       -0.54402111])

In [127]:
# Taking the Natural Logarithm
np.log(arr)

  


array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436,
       1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458,
       2.30258509])

### Summary Statistics on Arrays

In [128]:
arr.sum() #summation of all the element in array

55

In [129]:
arr.mean() #mean of array

5.0

In [130]:
arr.max() #max element in array

10

In [131]:
arr.var() #variance of array

10.0

In [132]:
arr.std() #standard deviation of array

3.1622776601683795
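
Note that `var` and `std` compute the population statistics by default (dividing by N). For the sample versions (dividing by N - 1), pass `ddof=1`. A sketch on the same array:

```python
import numpy as np

arr = np.arange(11)

pop_var = arr.var()            # 10.0, divides by N = 11
sample_var = arr.var(ddof=1)   # 11.0, divides by N - 1 = 10
sample_std = arr.std(ddof=1)   # square root of the sample variance
```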

### Concept of array axis

In [137]:
arr_2d =np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
arr_2d

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [136]:
arr_2d.sum() #will sum up all the elements

78

What if we want the sum of each row, each column, or only selected rows/columns?

In [139]:
arr_2d.sum(axis = 0) #shift+tab to read the parameters; axis = 0 sums across the rows,
#i.e. down each column, giving one sum per column

array([15, 18, 21, 24])

In [140]:
arr_2d.sum(axis =1) #axis = 1 sums across the columns,
#i.e. along each row, giving one sum per row


array([10, 26, 42])

Confusing as it seems, the concept just takes some practice to get used to. Read this Stack Overflow [post](https://stackoverflow.com/questions/22149584/what-does-axis-in-pandas-mean) for further clarification.
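
One way to build intuition is to think of `axis` as the dimension that disappears from the result. The sketch below compares the axis sums on the same 3 x 4 array with their written-out equivalents; the `*_manual` names are illustrative.

```python
import numpy as np

arr_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])  # shape (3, 4)

col_sums = arr_2d.sum(axis=0)   # axis 0 (rows) disappears, shape (4,)
row_sums = arr_2d.sum(axis=1)   # axis 1 (columns) disappears, shape (3,)

# The same results written out by hand
col_manual = arr_2d[0] + arr_2d[1] + arr_2d[2]
row_manual = arr_2d[:, 0] + arr_2d[:, 1] + arr_2d[:, 2] + arr_2d[:, 3]

print(col_sums)  # [15 18 21 24]
print(row_sums)  # [10 26 42]
```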