# NumPy

## Installation of NumPy

* If Python and pip are already installed on your system, install NumPy from the command line:
* pip install numpy

In [1]:
import numpy as np
import pandas as pd

In [2]:
print(np.__version__)

1.16.5


### Dimensions in Arrays

* A dimension of an array is one level of array depth (nesting).
* Nested arrays are arrays that have arrays as their elements.

In [3]:
# np.array() is used to create an array (ndarray) object

In [94]:
# 0-D arrays, or Scalars,
arr = np.array(42)
print(arr)
print(arr.dtype)

42
int32


In [5]:
# 1D array
arr=[1,2,3,8,9,6]
numpy_array=np.array(arr)
numpy_array

array([1, 2, 3, 8, 9, 6])

In [99]:
# 2D array
arr=[[1,2,3],[4,5,6],[6,7,8]]
numpy_array=np.array(arr)
print("Size of array:",numpy_array.size)
print("itemsize of array:",numpy_array.itemsize)
print(numpy_array)

Size of array: 9
itemsize of array: 4
[[1 2 3]
 [4 5 6]
 [6 7 8]]


In [7]:
# 3D array
arr=[[[1,2,3],[4,5,6],[6,7,8]]]
numpy_array=np.array(arr)
numpy_array

array([[[1, 2, 3],
        [4, 5, 6],
        [6, 7, 8]]])

In [8]:
# n dimensional array
arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('number of dimensions :', arr.ndim)

[[[[[1 2 3 4]]]]]
number of dimensions : 5


In [9]:
# Find the dimension of array
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

0
1
2
3


### Access Array Elements

In [10]:
arr = np.array([1, 2, 3, 4])

print(arr[0])

1


In [11]:
arr = np.array([1, 2, 3, 4])

print(arr[2] + arr[3])

7


In [12]:
# 2-D array
# Think of a 2-D array as a table with rows and columns:
# the first index selects the row and the second index selects the column.
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('2nd element on 1st row: ', arr[0, 1])

2nd element on 1st row:  2


In [13]:
# 3D array access
# To access elements of a 3-D array, use comma-separated integers,
# one index for each dimension.


In [14]:
# Access the third element of the second array of the first array:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr)
print(arr[0, 1, 2])
# Access the second element of the first array of the second array:
print(arr[1, 0, 1])

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
6
8


"""
Example Explained
arr[0, 1, 2] prints the value 6.

And this is why:

The first number represents the first dimension, which contains two arrays:
[[1, 2, 3], [4, 5, 6]]
and:
[[7, 8, 9], [10, 11, 12]]
Since we selected 0, we are left with the first array:
[[1, 2, 3], [4, 5, 6]]

The second number represents the second dimension, which also contains two arrays:
[1, 2, 3]
and:
[4, 5, 6]
Since we selected 1, we are left with the second array:
[4, 5, 6]

The third number represents the third dimension, which contains three values:
4
5
6
Since we selected 2, we end up with the third value:
6
"""
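
The walkthrough above can be checked by indexing one dimension at a time. This is only a minimal sketch; the names `first`, `second`, and `value` are illustrative and do not appear in the original notebook.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

first = arr[0]       # first dimension, index 0 -> [[1, 2, 3], [4, 5, 6]]
second = first[1]    # second dimension, index 1 -> [4, 5, 6]
value = second[2]    # third dimension, index 2 -> 6

print(value)
print(value == arr[0, 1, 2])   # chained indexing agrees with arr[0, 1, 2]
```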

## Slicing

In [15]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[-3:-1])

[5 6]


In [16]:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5:2])

[2 4]


In [17]:
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[1, 1:4])

[7 8 9]


## Data Types in NumPy

Below is a list of the kinds of data types in NumPy and the characters used to represent them.

* i - integer
* b - boolean
* u - unsigned integer
* f - float
* c - complex float
* m - timedelta
* M - datetime
* O - object
* S - string
* U - unicode string
* V - fixed chunk of memory for other type ( void )
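
As a rough illustration of the list above, the `kind` attribute of a dtype reports the one-character code for an array. The `samples` dictionary and its labels are made up for this sketch.

```python
import numpy as np

# dtype.kind gives the one-character kind code of an array's data type.
samples = {
    "integers": np.array([1, 2, 3]),
    "floats": np.array([1.5, 2.5]),
    "booleans": np.array([True, False]),
    "complex": np.array([1 + 2j]),
    "unicode strings": np.array(["a", "b"]),
}

kinds = {name: values.dtype.kind for name, values in samples.items()}
for name, kind in kinds.items():
    print(name, "->", kind)
```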

In [18]:
arr = np.array([1, 2, 3, 4])

print(arr.dtype)

int32


In [19]:
# Create an array with data type string:
arr = np.array([1, 2, 3, 4], dtype='S')

print(arr)
print(arr.dtype)

[b'1' b'2' b'3' b'4']
|S1


In [20]:
# Create an array with data type 4 bytes integer:
arr = np.array([1, 2, 3, 4], dtype='i4')

print(arr)
print(arr.dtype)

[1 2 3 4]
int32


#### Converting Data Type on Existing Arrays

* The best way to change the data type of an existing array is to make a copy of the array with the astype() method.

* The astype() method creates a copy of the array and lets you specify the data type as a parameter.

* The data type can be specified using a string, like 'f' for float or 'i' for integer, or you can use the Python type directly, like float for float and int for integer.

In [21]:
arr = np.array([1.1, 2.1, 3.1])

newarr = arr.astype('i')

print(newarr)
print(newarr.dtype)

[1 2 3]
int32


In [22]:
arr = np.array([1, 0, 3])

newarr = arr.astype(bool)

print(newarr)
print(newarr.dtype)

[ True False  True]
bool


**The Difference Between Copy and View**
* The main difference between a copy and a view of an array is that the copy is a new array with its own data, while the view is a different way of looking at the original array's data.

* The copy owns the data and any changes made to the copy will not affect original array, and any changes made to the original array will not affect the copy.

* The view does not own the data and any changes made to the view will affect the original array, and any changes made to the original array will affect the view.

In [23]:
# Make a copy, change the original array, and display both arrays:
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42

print(arr)
print(x)

[42  2  3  4  5]
[1 2 3 4 5]
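
The view side of the comparison can be sketched the same way. This example assumes the `view()` method of ndarray; the variable name `v` is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
v = arr.view()    # v shares arr's data instead of copying it
arr[0] = 42       # a change to the original shows up in the view
v[1] = 31         # a change through the view shows up in the original

print(arr)
print(v)
print(v.base is arr)   # the view's base attribute points at the original array
```

Both prints show the same values, because there is only one underlying data buffer.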


## Shape of an Array

* The shape of an array is the number of elements in each dimension.

In [24]:
# Print the shape of a 2-D array:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)

(2, 4)


In [25]:
# Create a 5-dimensional array with ndmin from a vector with values 1, 2, 3, 4
# and verify that the last dimension has size 4:
arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print('shape of array :', arr.shape)

[[[[[1 2 3 4]]]]]
shape of array : (1, 1, 1, 1, 4)


**What does the shape tuple represent?**
* The integer at each index gives the number of elements in the corresponding dimension.

* In the example above, index 4 holds the value 4, so the 5th dimension (index 4 + 1) has 4 elements.

## Reshaping Arrays
* Reshaping means changing the shape of an array.

* The shape of an array is the number of elements in each dimension.

* By reshaping we can add or remove dimensions or change number of elements in each dimension.

In [26]:
# Convert the following 1-D array with 12 elements into a 2-D array.
# The outermost dimension will have 4 arrays, each with 3 elements:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)

print(newarr)


[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [27]:
# Convert the following 1-D array with 12 elements into a 3-D array.
# The outermost dimension will have 2 arrays that each contain 3 arrays, each with 2 elements:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(2, 3, 2)

print(newarr)

[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]


## Iterating Arrays
* Iterating means going through elements one by one.

* As we deal with multi-dimensional arrays in NumPy, we can do this using Python's basic for loop.

* If we iterate on a 1-D array it will go through each element one by one.

In [28]:
# Iterate on the elements of the following 1-D array:
arr = np.array([1, 2, 3])

for x in arr:
  print(x)

1
2
3


In [29]:
# Iterate on the elements of the following 2-D array:
arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
  print(x)

[1 2 3]
[4 5 6]


If we iterate on an n-D array, the loop goes through the (n-1)-D sub-arrays along the first dimension one by one.

To return the actual values, the scalars, we have to iterate the arrays in each dimension.

In [30]:
# Iterate on each scalar element of the 2-D array:
arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
  for y in x:
    print(y)

1
2
3
4
5
6


In [31]:
# Iterate on the elements of the following 3-D array:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
  print(x)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]


In [32]:
# Iterate down to the scalars:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

for x in arr:
  for y in x:
    for z in y:
      print(z)

1
2
3
4
5
6
7
8
9
10
11
12


## Iterating Arrays Using nditer()
* The function nditer() is a helper function that can be used for anything from very basic to very advanced iteration.
* It solves some basic issues that we face in iteration; the examples below go through them.

In [33]:
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for x in np.nditer(arr):
  print(x)

1
2
3
4
5
6
7
8


In [34]:
# Iterate through the scalar elements of the 2-D array, skipping every other element in each row:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for x in np.nditer(arr[:, ::2]):
  print(x)

1
3
5
7


## Enumerated Iteration Using ndenumerate()
* Enumeration means listing the sequence number of items one by one.

* Sometimes we need the index of each element while iterating; the ndenumerate() function can be used for those use cases.

In [35]:
# Enumerate the elements of the following 1-D array:
arr = np.array([1, 2, 3])

for idx, x in np.ndenumerate(arr):
  print(idx, x)

(0,) 1
(1,) 2
(2,) 3


In [36]:
# Enumerate the elements of the following 2-D array:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for idx, x in np.ndenumerate(arr):
  print(idx, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(0, 3) 4
(1, 0) 5
(1, 1) 6
(1, 2) 7
(1, 3) 8


## Joining NumPy Arrays
* Joining means putting contents of two or more arrays in a single array.

* In SQL we join tables based on a key, whereas in NumPy we join arrays by axes.

* We pass a sequence of arrays that we want to join to the concatenate() function, along with the axis. If axis is not explicitly passed, it is taken as 0.

In [37]:
# Join two arrays
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr)

[1 2 3 4 5 6]


In [38]:
# Join two 2-D arrays along axis 1 (side by side, so each row gets longer):
arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)

print(arr)

[[1 2 5 6]
 [3 4 7 8]]


### Joining Arrays Using Stack Functions
* Stacking is the same as concatenation, except that stacking is done along a new axis.

* For example, two 1-D arrays can be joined along a new axis, which puts them one over the other, i.e. stacks them.

* We pass a sequence of arrays that we want to join to the stack() function along with the axis. If axis is not explicitly passed, it is taken as 0.

In [39]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.stack((arr1, arr2), axis=1)

print(arr)

[[1 4]
 [2 5]
 [3 6]]
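
To make the "new axis" point concrete, the sketch below compares `stack()` with its default axis against `concatenate()` on the same inputs. The names `stacked` and `joined` are illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

stacked = np.stack((arr1, arr2))        # default axis=0: a new leading axis is created
joined = np.concatenate((arr1, arr2))   # joins along the existing axis

print(stacked.shape)   # two dimensions
print(joined.shape)    # still one dimension
```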


In [40]:
# hstack() stacks arrays horizontally (side by side).
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.hstack((arr1, arr2))

print(arr)

[1 2 3 4 5 6]


In [41]:
# vstack() stacks arrays vertically (one on top of the other).
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.vstack((arr1, arr2))

print(arr)

[[1 2 3]
 [4 5 6]]


In [42]:
# dstack() stacks arrays along the third axis (depth).
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.dstack((arr1, arr2))

print(arr)

[[[1 4]
  [2 5]
  [3 6]]]


## Splitting NumPy Arrays
* Splitting is the reverse operation of joining.

* Joining merges multiple arrays into one, and splitting breaks one array into multiple arrays.

* We use array_split() for splitting arrays; we pass it the array we want to split and the number of splits.

In [43]:
arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3) # The return value is a list containing three arrays.

print(newarr) # If the array cannot be divided evenly, the later sub-arrays are made smaller (see the next example).

[array([1, 2]), array([3, 4]), array([5, 6])]


In [44]:
arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 4)

print(newarr)

[array([1, 2]), array([3, 4]), array([5]), array([6])]


**Note:**
* The split() function is also available, but it does not adjust when the source array cannot be divided into equal parts.
* In the example above, array_split() worked properly, but split() would raise an error.
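
A small sketch of that difference, using the same six-element array. The flag `split_failed` and the name `parts` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

try:
    np.split(arr, 4)     # 6 elements cannot be divided into 4 equal parts
    split_failed = False
except ValueError as err:
    split_failed = True
    print("split() raised:", err)

parts = np.array_split(arr, 4)   # uneven parts are allowed here
print([p.tolist() for p in parts])
```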

In [45]:
# Use the hsplit() function to split the 2-D array into three 2-D arrays along columns (axis 1).
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])

newarr = np.hsplit(arr, 3)

print(newarr)

[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]


## Searching Arrays
* You can search an array for a certain value and return the indexes where it matches.

* To search an array, use the where() function.

In [46]:
# Find the indexes where the value is 4:
arr = np.array([1, 2, 3, 4, 5, 4, 4])

x = np.where(arr == 4)

print(x)

(array([3, 5, 6], dtype=int64),)


In [47]:
# Find the indexes where the values are even:
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

x = np.where(arr%2 == 0)

print(x)

(array([1, 3, 5, 7], dtype=int64),)


## Sorting Arrays
* Sorting means putting elements in an ordered sequence.

* An ordered sequence is any sequence that has an order corresponding to its elements, like numeric or alphabetical, ascending or descending.

* NumPy provides the np.sort() function, which returns a sorted copy of the specified array. The ndarray object also has a sort() method, which sorts the array in place.

In [48]:
arr = np.array([3, 2, 0, 1])

print(np.sort(arr))

[0 1 2 3]
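
The example above calls `np.sort()`, which leaves its input untouched. The sketch below contrasts it with the in-place `sort()` method of the array; the names `sorted_copy`, `unchanged`, and `returned` are illustrative.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

sorted_copy = np.sort(arr)   # returns a new, sorted array
unchanged = arr.tolist()     # arr still has its original order here
print(unchanged)

returned = arr.sort()        # sorts arr itself and returns None
print(arr)
print(returned)
```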


In [49]:
# If you use np.sort() on a 2-D array, each row is sorted (sorting runs along the last axis by default):
arr = np.array([[3, 2, 4], [5, 0, 1]])

print(np.sort(arr))

[[2 3 4]
 [0 1 5]]


## Filtering Arrays
* Getting some elements out of an existing array and creating a new array out of them is called filtering.

* In NumPy, you filter an array using a boolean index list.

* A boolean index list is a list of booleans corresponding to indexes in the array.

* If the value at an index is True, that element is included in the filtered array; if the value at that index is False, that element is excluded.

In [50]:
# Create an array from the elements on index 0 and 2:
arr = np.array([41, 42, 43, 44])

x = [True, False, True, False]

newarr = arr[x]

print(newarr)

[41 43]


In [51]:
# Create a filter array that will return only values higher than 42:
arr = np.array([41, 42, 43, 44])

# Create an empty list
filter_arr = []

# go through each element in arr
for element in arr:
  # if the element is higher than 42, set the value to True, otherwise False:
  if element > 42:
    filter_arr.append(True)
  else:
    filter_arr.append(False)

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False, False, True, True]
[43 44]


In [52]:
# Create a filter array that will return only values higher than 42:
arr = np.array([41, 42, 43, 44])

filter_arr = arr > 42

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False False  True  True]
[43 44]


In [53]:
# Create a filter array that will return only even elements from the original array:
arr = np.array([1, 2, 3, 4, 5, 6, 7])

filter_arr = arr % 2 == 0

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False  True False  True False  True False]
[2 4 6]


## Random Numbers in NumPy

* Random number does NOT mean a different number every time. Random means something that can not be predicted logically.
* NumPy offers the random module to work with random numbers.
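
One way to see that these numbers come from a deterministic generator is to seed it twice with the same value. This is a sketch only: the seed 42 is arbitrary and the names `first` and `second` are illustrative.

```python
from numpy import random

random.seed(42)                        # arbitrary seed chosen for this sketch
first = random.randint(100, size=5)

random.seed(42)                        # reseeding restarts the same sequence
second = random.randint(100, size=5)

print((first == second).all())         # both draws are identical
```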

In [54]:
from numpy import random

In [55]:
# Generate a random integer from 0 to 99 (the upper bound 100 is exclusive):
x = random.randint(100)

print(x)

10


In [56]:
# Generate a random float from 0 (inclusive) to 1 (exclusive):
x = random.rand()

print(x)

0.20305472166090432


In [57]:
# Generate a 1-D array containing 5 random integers from 0 to 99:
# The randint() function takes a size parameter that specifies the shape of the returned array.
x=random.randint(100, size=(5))

print(x)

[76 50 63 60 61]


In [58]:
# Generate a 2-D array with 3 rows, each row containing 5 random integers from 0 to 99:
x = random.randint(100, size=(3, 5))

print(x)

[[44 45 15 12 96]
 [51 72 21 29 49]
 [ 4 46 83 68 13]]


In [59]:
# Generate a 1-D array containing 5 random floats: 
x = random.rand(5)

print(x)

[0.56457952 0.38843029 0.70906519 0.61032181 0.81927047]


In [60]:
# Generate a 2-D array with 3 rows, each row containing 5 random numbers:
x = random.rand(3, 5)

print(x)

[[0.04764545 0.98586617 0.45499138 0.57769879 0.06607519]
 [0.3644219  0.90703593 0.28920151 0.10381248 0.71028932]
 [0.50666742 0.6552872  0.89085733 0.3494739  0.72764263]]


## Generate Random Number From Array
* The choice() method allows you to generate a random value based on an array of values.

* The choice() method takes an array as a parameter and randomly returns one of the values.

In [61]:
# Return one of the values in an array:
x = random.choice([3, 5, 7, 9])

print(x)

9


In [62]:
# Generate a 2-D array that consists of the values in the array parameter (3, 5, 7, and 9):
x = random.choice([3, 5, 7, 9], size=(3, 5))

print(x)

[[9 5 5 7 3]
 [9 5 5 3 3]
 [5 7 7 3 5]]


## What is Data Distribution?
* Data Distribution is a list of all possible values, and how often each value occurs.

* Such lists are important when working with statistics and data science.

* The random module offers methods that return randomly generated data distributions.

In [63]:
# Generate a 1-D array containing 100 values, where each value has to be 3, 5, 7 or 9.

# The probability for the value to be 3 is set to be 0.1

# The probability for the value to be 5 is set to be 0.3

# The probability for the value to be 7 is set to be 0.6

# The probability for the value to be 9 is set to be 0
x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(100))

print(x)

[7 7 7 5 5 7 7 5 7 7 7 7 7 5 3 7 7 7 7 7 7 7 7 7 5 5 7 7 7 7 7 5 7 5 5 7 7
 3 5 7 3 7 7 7 7 7 7 7 7 7 5 5 7 5 7 5 5 7 3 7 7 5 5 5 5 7 7 7 7 3 5 7 7 7
 5 7 3 5 7 7 7 7 3 7 5 7 3 5 7 7 5 5 5 7 5 7 7 7 7 3]
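
To check how closely a sample follows the requested probabilities, the draws can be counted with `np.unique()`. This is a rough sketch: the sample size of 1000 and the names `drawn` and `counts` are choices made here, and the frequencies will vary from run to run.

```python
import numpy as np
from numpy import random

values = [3, 5, 7, 9]
x = random.choice(values, p=[0.1, 0.3, 0.6, 0.0], size=1000)

# Count how often each value was drawn and print its relative frequency.
drawn, counts = np.unique(x, return_counts=True)
for value, count in zip(drawn, counts):
    print(value, count / x.size)
```

The value 9 never appears because its probability is 0.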


In [64]:
# Randomly shuffle elements of following array:
arr = np.array([1, 2, 3, 4, 5])

random.shuffle(arr) # The shuffle() method makes changes to the original array.

print(arr)

[5 2 3 1 4]


In [65]:
# Generate a random permutation of elements of following array:
arr = np.array([1, 2, 3, 4, 5])

print(random.permutation(arr)) # The permutation() method returns a re-arranged array (and leaves the original array un-changed).

[2 5 3 4 1]


In [69]:
# create numpy array
arr = np.array([1, 2, 4, 5, 6])

# flipud() reverses the order of the elements along the first axis
reverse_arr = np.flipud(arr)
print(reverse_arr)

[6 5 4 2 1]


In [70]:
# The bincount() function counts how many times each non-negative integer value occurs in the array.
import numpy as np
arr = np.array([1, 2, 1, 3, 5, 0, 0, 0, 2, 3])
result = np.bincount(arr)
print(result)

[3 2 2 2 0 1]
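
Reading that output takes a moment: position `i` of the result holds the number of times the value `i` occurs. A short sketch that spells this out, using the same array as above:

```python
import numpy as np

arr = np.array([1, 2, 1, 3, 5, 0, 0, 0, 2, 3])
result = np.bincount(arr)

# result[i] is how many times the value i appears in arr.
for value, count in enumerate(result):
    print(value, "appears", count, "times")

# The result has one slot for every integer from 0 up to the largest value.
print(len(result) == arr.max() + 1)
```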


In [92]:
# arr[:,0] - Returns 0th index elements of all rows. In other words, return the first column elements.
arr = np.array([[1,2,3,4],[5,6,7,8]])
new_arr =arr[:,0]
print(new_arr)

[1 5]


In [72]:
# arr[:,[0]] - Returns the elements of the first column but keeps the column dimension, so the result is 2-D.
arr = np.array([[1,2,3,4],[5,6,7,8]])
new_arr =arr[:,[0]]
print(new_arr)

[[1]
 [5]]
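
The difference between the two forms is easiest to see in the shapes. A minimal sketch; the length-1 slice in the last line is an extra variant added here for comparison.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

col_1d = arr[:, 0]       # an integer index drops the column axis
col_2d = arr[:, [0]]     # a list index keeps it
col_slice = arr[:, 0:1]  # a length-1 slice also keeps it

print(col_1d.shape)
print(col_2d.shape)
print(col_slice.shape)
```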


In [73]:
# NumPy matrices
A = np.arange(15,24).reshape(3,3)
B = np.arange(20,29).reshape(3,3)
print("A: ",A)
print("B: ",B)

# Matrix-multiply A and B with the dot() method
result = A.dot(B)
print("Result: ", result)

A:  [[15 16 17]
 [18 19 20]
 [21 22 23]]
B:  [[20 21 22]
 [23 24 25]
 [26 27 28]]
Result:  [[1110 1158 1206]
 [1317 1374 1431]
 [1524 1590 1656]]


In [74]:
# NumPy matrices
A = np.arange(15,24).reshape(3,3)
B = np.arange(20,29).reshape(3,3)
print("A: ",A)
print("B: ",B)

# Matrix-multiply A and B with the np.dot() function (same result as A.dot(B))
result = np.dot(A,B)
print("Result: ", result)

A:  [[15 16 17]
 [18 19 20]
 [21 22 23]]
B:  [[20 21 22]
 [23 24 25]
 [26 27 28]]
Result:  [[1110 1158 1206]
 [1317 1374 1431]
 [1524 1590 1656]]
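
Note that "multiply" here means the matrix product, not element-by-element multiplication. The sketch below contrasts the two, assuming Python 3.5 or later for the `@` operator; the names `matrix_product` and `elementwise` are illustrative.

```python
import numpy as np

A = np.arange(15, 24).reshape(3, 3)
B = np.arange(20, 29).reshape(3, 3)

matrix_product = A @ B   # same result as A.dot(B) and np.dot(A, B)
elementwise = A * B      # multiplies matching entries only

print(np.array_equal(matrix_product, np.dot(A, B)))
print(matrix_product[0])
print(elementwise[0])
```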


In [75]:
# The concatenate() function joins two arrays by appending one to the end of the other along the given axis

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Concatenate with axis 0
c = np.concatenate((a,b), axis=0)
print("With axis 0: \n",c )

# Concatenate with axis 1 (b.T is the transpose of b, which gives it the 2 rows needed to match a)
d = np.concatenate((a,b.T), axis=1)
print("With axis 1: \n",d )

With axis 0: 
 [[1 2]
 [3 4]
 [5 6]]
With axis 1: 
 [[1 2 5]
 [3 4 6]]


#### How do you convert Pandas DataFrame to a NumPy array?

In [76]:
# Pandas DataFrame
df = pd.DataFrame(data={'A': [3, 2, 1], 'B': [6,5,4], 'C': [9, 8, 7]}, 
                  index=['i', 'j', 'k'])
print("Pandas DataFrame: ")
print(df)

# Convert Pandas DataFrame to NumPy Array
np_arr = df.to_numpy()
print("Pandas DataFrame to NumPy array: ")
print(np_arr)


# Convert specific columns of Pandas DataFrame to NumPy array
arr = df[['B', 'C']].to_numpy()
print("Convert B and C columns of Pandas DataFrame to NumPy Array: ")
print(arr)

Pandas DataFrame: 
   A  B  C
i  3  6  9
j  2  5  8
k  1  4  7
Pandas DataFrame to NumPy array: 
[[3 6 9]
 [2 5 8]
 [1 4 7]]
Convert B and C columns of Pandas DataFrame to NumPy Array: 
[[6 9]
 [5 8]
 [4 7]]
