#1. Introduction to Numpy
Numpy is the most fundamental, and a very powerful, package for working with data in python.


If you are going to work on data analysis or machine learning projects, then having a solid understanding of numpy is nearly mandatory.


That is because other packages for data analysis (like pandas) are built on top of numpy, and the scikit-learn package, which is used to build machine learning applications, also works heavily with numpy.

So what does numpy provide?

At its core, numpy provides the ndarray object, short for n-dimensional array.

In an ‘ndarray’ object, aka ‘array’, you can store multiple items of the same data type. It is the facilities around the array object that make numpy so convenient for performing math and data manipulations.

You might wonder, ‘I can store numbers and other objects in a python list itself and do all sorts of computations and manipulations through list comprehensions, for-loops etc. What do I need a numpy array for?’

Well, there are very significant advantages to using numpy arrays over lists.

To understand this, let’s first see how to create a numpy array.

#2. How to create a numpy array?
Arrays in Numpy can be created in multiple ways and with any number of dimensions (also called ranks). A common way is to pass an existing Python sequence, such as a list or a tuple, to np.array().

In [1]:
# Create an 1d array from a list
import numpy as np
list1 = [0,1,2,3,4]
arr1d = np.array(list1)

# Print the array and its type
print( type(arr1d))
print('\n' , arr1d)

# Creating an array from tuple
arr = np.array((1, 3, 2))
print("\nArray created using "
      "passed tuple:\n", arr)

<class 'numpy.ndarray'>

 [0 1 2 3 4]

Array created using passed tuple:
 [1 3 2]


The key difference between an array and a list is that arrays are designed to handle vectorized operations, while a python list is not.

That means that if you apply a function or an arithmetic operation to an array, it is performed on every item in the array, rather than on the array object as a whole.

Let’s suppose you want to add the number 2 to every item in the list. The intuitive way to do it is something like this:

In [0]:
list1 + 2  # error

That is not possible with a list: Python raises a TypeError, because + on a list only concatenates it with another list. But you can do it with an ndarray.

In [3]:
# Add 2 to each element of arr1d
arr1d + 2


array([2, 3, 4, 5, 6])
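For comparison, getting the same result from a plain list needs an explicit loop or list comprehension. A minimal sketch contrasting the two, reusing the values of list1:

```python
import numpy as np

list1 = [0, 1, 2, 3, 4]

# With a plain list, you must loop over the items yourself
list_plus_2 = [item + 2 for item in list1]

# With an ndarray, the + operator is applied to every item automatically
arr_plus_2 = np.array(list1) + 2

print(list_plus_2)  # [2, 3, 4, 5, 6]
print(arr_plus_2)   # [2 3 4 5 6]
```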

You may also specify the datatype by setting the dtype argument. Some of the most commonly used numpy dtypes are: 'float', 'int', 'bool', 'str' and 'object'.

To control memory usage more precisely, you can choose a dtype with an explicit size, such as ‘float32’, ‘float64’, ‘int8’, ‘int16’ or ‘int32’ (the number is the size of each item in bits).
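To see the effect of the item size, a small sketch can store the same five values with different dtypes and compare the nbytes attribute (total bytes used by the items):

```python
import numpy as np

list1 = [0, 1, 2, 3, 4]
dtypes = ['int8', 'int16', 'int32', 'float32', 'float64']

# Bytes used by the items for each dtype (5 items * bytes per item)
sizes = {dtype: np.array(list1, dtype=dtype).nbytes for dtype in dtypes}
print(sizes)  # {'int8': 5, 'int16': 10, 'int32': 20, 'float32': 20, 'float64': 40}
```

The trade-off is range and precision: an 'int8' item, for example, can only hold values from -128 to 127.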

In [2]:
# Create a 1d array of floats
arr2d_f = np.array(list1, dtype='float')
print(arr2d_f)

[0. 1. 2. 3. 4.]


The decimal point after each number is indicative of the float datatype. You can also convert it to a different datatype using the astype method.




In [0]:
# Convert to 'int' datatype
arr2d_f.astype('int')

array([0, 1, 2, 3, 4])

Unlike a list, a numpy array requires all of its items to be of the same data type. This is another significant difference.
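If you pass mixed values without a dtype, numpy picks one common dtype for all items. A short sketch showing two typical cases, an int mixed with a string and ints mixed with a float:

```python
import numpy as np

# An int and a string: numpy converts both to a unicode string dtype
mixed = np.array([1, 'a'])
print(mixed)             # ['1' 'a']
print(mixed.dtype.kind)  # U  (unicode string)

# Ints and a float: every item is upcast to float
nums = np.array([1, 2.5, 3])
print(nums.dtype)        # float64
```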

However, if you are uncertain about what datatype your array will hold or if you want to hold characters and numbers in the same array, you can set the dtype as 'object'.

In [4]:
# Create an object array to hold numbers as well as strings
arr1d_obj = np.array([1, 'a'], dtype='object')
arr1d_obj

array([1, 'a'], dtype=object)

Finally, you can always convert an array back to a python list using tolist().



In [5]:
# Convert an array back to a list
arr1d_obj.tolist()

[1, 'a']


To summarise, the main differences with python lists are:


1.   Arrays support vectorised operations, while lists don’t.
2.   Once an array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.
3.   Every array has one and only one dtype. All items in it should be of that dtype.
4.   An equivalent numpy array occupies much less space than a python list of lists.
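To illustrate the point about size, np.append does not grow an array in place; it returns a new, larger array and leaves the original untouched. A minimal sketch:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4])

# np.append returns a new array; arr itself keeps its 5 items
bigger = np.append(arr, 5)

print(arr.size)     # 5
print(bigger.size)  # 6
print(bigger)       # [0 1 2 3 4 5]
```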





#3. How to inspect the size and shape of a numpy array?

Once an array is created, you will often want to inspect it to find out:

1.   Whether it is a 1D, a 2D or a higher-dimensional array (ndim)
2.   How many items are present in each dimension (shape)
3.   What its datatype is (dtype)
4.   What the total number of items in it is (size)


In [8]:
# Creating a rank 2 Array
arr2 = np.array([[1, 2, 3] , [4 , 5 , 6]])
print(arr2)

# shape
print('\nShape: ', arr2.shape)

# dtype
print('\nDatatype: ', arr2.dtype)

# size
print('\nSize: ', arr2.size)

# ndim
print('\nNum Dimensions: ', arr2.ndim)

[[1 2 3]
 [4 5 6]]

Shape:  (2, 3)

Datatype:  int64

Size:  6

Num Dimensions:  2


#4. How to extract specific items from an array?

You can extract specific portions of an array using indexing, starting at 0, much as you would with python lists.

In [9]:
# Extract the first 2 rows and columns
arr2[:2, :2]

array([[1, 2],
       [4, 5]])

Additionally, numpy arrays support boolean indexing.

A boolean index array is of the same shape as the array-to-be-filtered and it contains only True and False values. The values corresponding to True positions are retained in the output.

In [10]:
b = arr2 > 4
b

array([[False, False, False],
       [False,  True,  True]])
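The cell above only builds the boolean array. A short sketch of actually using it as an index on arr2 (same values as above), which keeps only the items at the True positions and returns them as a 1d array:

```python
import numpy as np

arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# Boolean array of the same shape as arr2
b = arr2 > 4

# Indexing with the mask keeps the items where b is True
print(arr2[b])              # [5 6]

# The condition can also be written inline
print(arr2[arr2 % 2 == 0])  # [2 4 6]
```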

#4.2 How to represent missing values and infinity?
Missing values can be represented using the np.nan object, while np.inf represents infinity.

In [14]:
# Print nan and inf
print('not a number', np.nan)
print('\ninfinite', np.inf)

not a number nan

infinite inf


#4.3 How to compute mean, min, max on the ndarray?
The ndarray has methods to compute each of these, as well as the standard deviation, over the whole array.

In [16]:
# mean, max and min
print("Mean value is: ", arr2.mean())

print("\nMax value is: ", arr2.max())

print("\nMin value is: ", arr2.min())

print("\nStandard Deviation is: ", arr2.std())

Mean value is:  3.5

Max value is:  6

Min value is:  1

Standard Deviation is:  1.707825127659933
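These methods also accept an axis argument to compute one value per column or per row, and numpy has nan-aware variants such as np.nanmean that skip missing values. A sketch using the original values of arr2:

```python
import numpy as np

arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 works down the rows: one value per column
print(arr2.max(axis=0))   # [4 5 6]
# axis=1 works across the columns: one value per row
print(arr2.min(axis=1))   # [1 4]
print(arr2.mean(axis=1))  # [2. 5.]

# mean() propagates nan, while np.nanmean ignores nan items
with_nan = np.array([1.0, np.nan, 3.0])
print(with_nan.mean())       # nan
print(np.nanmean(with_nan))  # 2.0
```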


#5. How to create a new array from an existing array?
If you just assign a portion (slice) of an array to a new variable, the new array actually refers to the parent array's data in memory; it is a view, not a copy.

That means that if you make any changes to the new array, they will be reflected in the parent array as well.

So to avoid disturbing the parent array, you need to make a copy of it using copy(). All numpy arrays come with the copy() method.

In [17]:
# Assign portion of arr2 to arr2a. Doesn't really create a new array.
arr2a = arr2[:2,:2]  
arr2a[:1, :1] = 100  # 100 will reflect in arr2
arr2

array([[100,   2,   3],
       [  4,   5,   6]])

In [18]:
# Copy portion of arr2 to arr2b
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101  # 101 will not reflect in arr2
arr2

array([[100,   2,   3],
       [  4,   5,   6]])
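If you are unsure whether an array is a view or a copy, np.shares_memory can check whether two arrays overlap in memory, and a view's base attribute points back at its parent. A minimal sketch:

```python
import numpy as np

arr2 = np.array([[1, 2, 3], [4, 5, 6]])

view_a = arr2[:2, :2]         # a slice: a view on arr2's data
copy_b = arr2[:2, :2].copy()  # an independent copy

print(np.shares_memory(view_a, arr2))  # True
print(np.shares_memory(copy_b, arr2))  # False

# A view keeps a reference to its parent; a copy owns its data
print(view_a.base is arr2)  # True
print(copy_b.base is None)  # True
```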

#6. Reshaping and Flattening Multidimensional arrays
Reshaping changes the arrangement of items so that the shape of the array changes, while the total number of items stays the same. The number of dimensions can change too.

Flattening, on the other hand, always converts a multi-dimensional array into a flat 1d array, and not into any other shape.

First, let’s reshape the arr2 array from 2×3 to 3×2 shape.

In [22]:
# Reshape the 2x3 array to a 3x2 array
arr2.reshape(3, 2)

array([[100,   2],
       [  3,   4],
       [  5,   6]])
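Two more details of reshape() are worth a quick sketch: one dimension can be given as -1 and numpy infers it from the number of items, and the new shape may have a different number of dimensions as long as the item count matches.

```python
import numpy as np

arr = np.arange(6)  # shape (6,), 1 dimension

# -1 lets numpy work out that dimension from the 6 items
print(arr.reshape(2, -1).shape)   # (2, 3)

# Going from 1 to 3 dimensions: still 6 items in total
print(arr.reshape(1, 2, 3).ndim)  # 3

# A shape that does not hold exactly 6 items is rejected
try:
    arr.reshape(4, 2)
except ValueError as err:
    print('ValueError:', err)
```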

#6.1 What is the difference between flatten() and ravel()?
There are 2 popular ways to flatten an array: the flatten() method and the ravel() method.

The difference between them is that flatten() always returns a new copy, whereas the array returned by ravel() is a reference (view) to the parent array whenever possible. So any changes to the raveled array will usually affect the parent as well. The upside is that ravel() is more memory efficient, since it does not need to create a copy.

In [23]:
# Flatten it to a 1d array
arr2.flatten()

array([100,   2,   3,   4,   5,   6])

In [24]:
# Changing the flattened array does not change parent
b1 = arr2.flatten()  
b1[0] = 99  # changing b1 does not affect arr2
arr2

array([[100,   2,   3],
       [  4,   5,   6]])

In [25]:
# Changing the raveled array changes the parent also.
b2 = arr2.ravel()  
b2[0] = 101  # changing b2 changes arr2 also
arr2

array([[101,   2,   3],
       [  4,   5,   6]])
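The "whenever possible" matters: ravel() can only return a view when the items are already laid out row by row in memory. A transposed array is not, so ravel() quietly falls back to a copy. A sketch checking this with np.shares_memory:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Regular row-major array: ravel() returns a view
print(np.shares_memory(arr.ravel(), arr))    # True

# The transpose is not stored row by row, so ravel() must copy
print(np.shares_memory(arr.T.ravel(), arr))  # False

# flatten() always copies
print(np.shares_memory(arr.flatten(), arr))  # False
```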

#7. How to create sequences, repetitions and random numbers using numpy?
The np.arange function comes in handy for creating customised number sequences as ndarrays.

In [34]:
# Lower limit is 0 by default
print(np.arange(5))  

# 0 to 9
print('\n' , np.arange(0, 10))  

# 0 to 9 with step of 2
print('\n' , np.arange(0, 10, 2))  

# 10 to 1, decreasing order
print('\n' , np.arange(10, 0, -1))

[0 1 2 3 4]

 [0 1 2 3 4 5 6 7 8 9]

 [0 2 4 6 8]

 [10  9  8  7  6  5  4  3  2  1]


You can set the start and end positions using np.arange (note that the end value itself is excluded). But if you care about the number of items in the array, you will have to calculate the appropriate step value manually.

Say you want to create an array of exactly 10 numbers between 1 and 50. Can you compute what the step value would be?

Well, I am going to use np.linspace instead.

In [35]:
# Start at 1 and end at 50
np.linspace(start=1, stop=50, num=10, dtype=int)

array([ 1,  6, 11, 17, 22, 28, 33, 39, 44, 50])
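np.linspace works the step out for you: with num points including both ends, the step is (stop - start) / (num - 1), here (50 - 1) / (10 - 1) = 49/9 ≈ 5.44. Passing retstep=True returns that step alongside the array. Since dtype=int truncates each value, the gaps in the integer output above are 5 or 6.

```python
import numpy as np

values, step = np.linspace(start=1, stop=50, num=10, retstep=True)
print(step)    # about 5.444 (= 49 / 9)
print(values)

# The same step computed by hand
manual_step = (50 - 1) / (10 - 1)
print(np.isclose(step, manual_step))  # True
```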

The np.zeros and np.ones functions let you create arrays of the desired shape where all the items are either 0’s or 1’s.

In [36]:
np.zeros([2,2])

array([[0., 0.],
       [0., 0.]])

In [37]:
np.ones([2,2])

array([[1., 1.],
       [1., 1.]])
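For repetitions, np.tile repeats a whole sequence, while np.repeat repeats each item in turn; np.full fills a given shape with any constant. A short sketch:

```python
import numpy as np

a = [1, 2, 3]

# Repeat the whole sequence twice
print(np.tile(a, 2))     # [1 2 3 1 2 3]

# Repeat each item twice
print(np.repeat(a, 2))   # [1 1 2 2 3 3]

# A 2x2 array where every item is 7
print(np.full((2, 2), 7))
```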

#7.2 How to generate random numbers?
The np.random module provides functions to generate random numbers (and samples from statistical distributions) of any given shape.

In [40]:
# Random numbers between [0,1) of shape 2,2
print(np.random.rand(2,2))

# Normal distribution with mean=0 and variance=1 of shape 2,2
print('\n' , np.random.randn(2,2))

# Random integers between [0, 10) of shape 2,2
print('\n' , np.random.randint(0, 10, size=[2,2]))

# One random number between [0,1)
print('\n' , np.random.random())

# Random numbers between [0,1) of shape 2,2
print('\n' , np.random.random(size=[2,2]))

# Pick 10 items from a given list, with equal probability
print('\n' , np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))  


[[0.11293927 0.90182994]
 [0.55347829 0.43346377]]

 [[-1.16982018 -0.57790422]
 [-0.09951061  0.44328968]]

 [[1 3]
 [6 6]]

 0.9587565027061363

 [[0.94616945 0.95918836]
 [0.02005605 0.26668104]]

 ['a' 'a' 'i' 'e' 'o' 'a' 'i' 'a' 'e' 'u']
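Your numbers will differ from the output above on every run. To make results reproducible, you can fix a seed. A sketch showing both the Generator interface (np.random.default_rng, available in newer numpy versions) and the global seed used by the legacy functions above; the seed values 42 and 100 are arbitrary:

```python
import numpy as np

# Two Generators with the same seed produce the same numbers
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)
a = rng1.integers(0, 10, size=(2, 2))  # random ints in [0, 10)
b = rng2.integers(0, 10, size=(2, 2))
print(np.array_equal(a, b))  # True

# The legacy functions can be seeded globally instead
np.random.seed(100)
x = np.random.rand(2, 2)
np.random.seed(100)
y = np.random.rand(2, 2)
print(np.array_equal(x, y))  # True
```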


#7.3 How to get the unique items and the counts?
The np.unique function can be used to get the unique items. If you also want the number of times each item is repeated, set the return_counts parameter to True.

In [44]:
# Create random integers of size 10 between [0,10)
arr_rand = np.random.randint(0, 10, size=10)
print(arr_rand)

[2 2 1 0 8 4 0 9 6 2]


In [45]:
# Get the unique items and their counts
uniqs, counts = np.unique(arr_rand, return_counts=True)
print("Unique items : ", uniqs)
print("Counts       : ", counts)

Unique items :  [0 1 2 4 6 8 9]
Counts       :  [2 1 3 1 1 1 1]


#8. Array math
Basic mathematical functions operate elementwise on arrays, and are available both as operator overloads and as functions in the numpy module:

In [47]:
import numpy as np

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both lines produce the same array
print(x + y)
print(np.add(x, y))

# Elementwise difference; both lines produce the same array
print('\n', x - y)
print(np.subtract(x, y))

# Elementwise product; both lines produce the same array
print('\n', x * y)
print(np.multiply(x, y))

# Elementwise division; both lines produce the same array
print('\n', x / y)
print(np.divide(x, y))

# Elementwise square root
print('\n', np.sqrt(x))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]

 [[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]

 [[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]

 [[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]

 [[1.         1.41421356]
 [1.73205081 2.        ]]


Note that '*' is elementwise multiplication, not matrix multiplication. Instead, we use the dot function to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. dot is available both as a function in the numpy module and as an instance method of array objects:

In [55]:
x = np.array([[1,2],[3,4]])
v = np.array([9,10])

# Matrix-vector product
print(np.dot(x, v))

[29 67]
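The instance-method form mentioned above gives the same result, and in Python 3.5+ the @ operator also performs matrix multiplication. A sketch with the same x and v, including a vector inner product and a comparison of matrix and elementwise products:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
v = np.array([9, 10])

print(x.dot(v))  # [29 67], the method form of np.dot
print(x @ v)     # [29 67], the @ operator

# Inner product of two vectors: 9*9 + 10*10
print(v.dot(v))  # 181

# Matrix product vs elementwise product
print(x @ x)     # matrix product: [[7, 10], [15, 22]]
print(x * x)     # elementwise: [[1, 4], [9, 16]]
```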


Numpy provides many useful functions for performing computations on arrays; one of the most useful is sum:








In [0]:
x = np.array([[1,2],[3,4]])

print(np.sum(x))  # Compute sum of all elements; prints "10"
print(np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print(np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object:

In [57]:
x = np.array([[1,2], [3,4]])
print(x)

print('\n' , x.T)




[[1 2]
 [3 4]]

 [[1 3]
 [2 4]]
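One caveat: .T swaps axes, so on a 1d array, which has only one axis, it does nothing. To get a column vector you first need a second axis, for example via reshape. A minimal sketch:

```python
import numpy as np

v = np.array([1, 2, 3])

# Transposing a 1d array leaves its shape unchanged
print(v.T.shape)    # (3,)

# Reshape to a 3x1 column first; then .T gives a 1x3 row
col = v.reshape(-1, 1)
print(col.shape)    # (3, 1)
print(col.T.shape)  # (1, 3)
```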


#9. Inverse matrix
We use the numpy.linalg.inv() function to calculate the inverse of a matrix. The inverse of a matrix is the matrix that, when multiplied by the original matrix, gives the identity matrix. Only square matrices with a non-zero determinant have an inverse.


In [64]:
x = np.array([[1,4],[1,9]]) 
print(x)


y = np.linalg.inv(x) 
print('\n' , y)


print('\n' ,np.dot(x,y))

[[1 4]
 [1 9]]

 [[ 1.8 -0.8]
 [-0.2  0.2]]

 [[ 1.00000000e+00  0.00000000e+00]
 [-5.55111512e-17  1.00000000e+00]]
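The tiny -5.55e-17 entry is floating-point round-off, so comparing the product with the identity using == would fail. A sketch of a tolerant check with np.allclose and np.eye, plus what happens for a singular matrix, for which numpy raises np.linalg.LinAlgError:

```python
import numpy as np

x = np.array([[1, 4], [1, 9]])
y = np.linalg.inv(x)

# Compare x @ y with the 2x2 identity, allowing for round-off error
is_identity = np.allclose(x @ y, np.eye(2))
print(is_identity)  # True

# A singular matrix (second row is twice the first) has no inverse
singular = np.array([[1, 2], [2, 4]])
raised = False
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as err:
    raised = True
    print('LinAlgError:', err)
```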
