
# Numpy Arrays
The numpy library is one of the core packages in Python's data science software stack. Many other Python data analysis libraries require numpy as a prerequisite, because they use its array data structure as a building block. The Kaggle Python environment has numpy available by default; if you are running Python locally, the Anaconda Python distribution comes with numpy as well.

Numpy implements a data structure called the N-dimensional array, or ndarray. ndarrays are similar to lists in that they contain a collection of items that can be accessed via indexes. Unlike lists, ndarrays are homogeneous, meaning every element has the same type, and they can be multi-dimensional, making it easy to store 2-dimensional tables or matrices.
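As a small illustration of homogeneity, the sketch below shows what numpy does when a list with mixed types is passed to np.array(). The variable names are made up for this example; the point is only that numpy picks one common type for every element.

```python
import numpy as np

# Mixed ints and floats are upcast to a single floating-point type.
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)        # float64

# Mixing numbers and strings stores every item as a string.
mixed_text = np.array([1, "two", 3.0])
print(mixed_text.dtype.kind)      # U (unicode string)
print(mixed_text[0])              # 1 (now the string "1", not an integer)
```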

To work with ndarrays, we need to load the numpy library. It is standard practice to load numpy with the alias "np" like so:

In [3]:
import numpy as np

In [5]:
my_list = [1,2,3,4]

my_array = np.array(my_list)
print(my_array)
type(my_array)

[1 2 3 4]


numpy.ndarray

To create an array with more than one dimension, pass a nested list to np.array():

In [7]:
# Build a 2-D array from two lists
second_list = [5,6,7,8]

two_d_array = np.array([my_list, second_list]) # 2d array

print(two_d_array)
type(two_d_array)

[[1 2 3 4]
 [5 6 7 8]]


numpy.ndarray

In [8]:
two_d_array.shape

(2, 4)

In [9]:
two_d_array.size

8

Check the type of the data in an ndarray with the dtype attribute:

In [10]:
two_d_array.dtype

dtype('int64')
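The attributes above can be read together. The following sketch, using an illustrative array named `table`, shows how ndim, shape and size relate, and how astype() can convert to another type. Note that the default integer dtype depends on the platform and numpy version, so it may not always be int64.

```python
import numpy as np

table = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(table.ndim)     # 2      (number of dimensions)
print(table.shape)    # (2, 4) (rows, columns)
print(table.size)     # 8      (total elements: 2 * 4)

# astype() returns a new array with the requested type;
# the original array keeps its own dtype.
as_float = table.astype(np.float64)
print(as_float.dtype)  # float64
```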

## Special Matrices
1. Identity matrix
2. Zero matrix
3. Ones matrix

In [11]:
# Identity matrix
np.identity(n = 5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [12]:
np.zeros(shape  = [4,6])

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [17]:
# np.eye() creates a 2d array with 1's along a specified diagonal
np.eye(N = 3,  # Number of rows
       M = 5,  # Number of columns
       k = 2)  # Index of the diagonal (main diagonal (0) is default)

array([[0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [18]:
# ones
np.ones(shape = [3,4])

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
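A few related points, sketched under the assumption that a square identity matrix and a non-float result are wanted: np.identity(n) and np.eye(n) agree for square matrices, the dtype argument overrides the float default, and np.full() fills an array with any constant. The names `int_zeros` and `sevens` are illustrative.

```python
import numpy as np

# np.identity(n) and np.eye(n) build the same square identity matrix.
print(np.array_equal(np.identity(3), np.eye(3)))   # True

# The dtype argument overrides the floating-point default.
int_zeros = np.zeros(shape=(2, 3), dtype=int)
print(int_zeros)

# np.full() fills an array with any constant value.
sevens = np.full(shape=(2, 3), fill_value=7)
print(sevens)
```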

## Array Indexing and Slicing

In [22]:
# One-dimensional array
one_d_array  = np.array([1,2,3,4,5,6])
print(one_d_array[3]) # element at index 3 (the 4th element)

print(one_d_array[3:]) # from index 3 to the end

print(one_d_array[:]) # all elements

print(one_d_array[::-1]) # reversed array

4
[4 5 6]
[1 2 3 4 5 6]
[6 5 4 3 2 1]


In [23]:
# new 2d array
two_d_arry = np.array([one_d_array, one_d_array+6, one_d_array+12])
two_d_arry

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12],
       [13, 14, 15, 16, 17, 18]])

In [26]:
print(two_d_arry[1,1]) # element at row 1, column 1

print(two_d_arry[:, :]) # complete matrix

print(two_d_arry[1:, 4:]) # rows 1 onward, columns 4 onward



8
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]
 [13 14 15 16 17 18]]
[[11 12]
 [17 18]]


In [27]:
# Reverse both dimensions (180 degree rotation)
print(two_d_arry[::-1, ::-1])


[[18 17 16 15 14 13]
 [12 11 10  9  8  7]
 [ 6  5  4  3  2  1]]
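One detail worth knowing about slicing: a basic slice of an ndarray is a view onto the same data rather than a copy, so writing through the slice changes the original. The sketch below uses a small illustrative array named `grid` to show the difference between a view and an explicit copy.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

first_row_view = grid[0, :]          # basic slice: a view onto grid
first_row_copy = grid[0, :].copy()   # explicit, independent copy

first_row_view[0] = 99               # also changes grid[0, 0]
first_row_copy[1] = -1               # grid is unaffected

print(grid)
# [[99  2  3]
#  [ 4  5  6]]
```

Call .copy() whenever the slice needs to be modified without touching the source array.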


## Reshaping Arrays
Numpy has a variety of built-in functions to help you manipulate arrays quickly without having to use complicated indexing operations.

Reshape an array into a new array with the same data but different structure with np.reshape():

In [28]:
# Pass the shape positionally (the newshape keyword is deprecated in recent NumPy releases)
np.reshape(two_d_arry, (6, 3))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])
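Reshaping only works when the new shape holds exactly the same number of elements. A minimal sketch, using an illustrative array `values` with the same 18 numbers, shows the -1 placeholder (which asks numpy to infer one dimension) and what happens when the sizes do not match.

```python
import numpy as np

values = np.arange(1, 19)            # the integers 1 through 18

six_rows = values.reshape(6, -1)     # -1 lets numpy infer 3 columns
print(six_rows.shape)                # (6, 3)

# A shape whose product differs from the element count raises ValueError.
try:
    values.reshape(5, 4)             # 5 * 4 = 20, but there are 18 elements
except ValueError as err:
    print("reshape failed:", err)
```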

Unravel a multi-dimensional array into 1 dimension with np.ravel():

In [29]:
np.ravel(a=two_d_arry, order = 'C')

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18])

In [30]:
np.ravel(a=two_d_array,
         order='F')         # Use Fortran-style unraveling (by columns)

array([1, 5, 2, 6, 3, 7, 4, 8])

Alternatively, use ndarray.flatten() to flatten a multi-dimensional array into 1 dimension and return a copy of the result:

In [31]:
two_d_arry.flatten()

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18])
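The practical difference between the two is memory: np.ravel() returns a view when it can, while flatten() always returns a copy. The sketch below checks this with np.shares_memory() on an illustrative contiguous array named `matrix`; for non-contiguous inputs np.ravel() may have to copy as well.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

raveled = np.ravel(matrix)      # a view here, because matrix is contiguous
flattened = matrix.flatten()    # always a new copy

print(np.shares_memory(matrix, raveled))     # True
print(np.shares_memory(matrix, flattened))   # False
```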

## Transpose
Get the transpose of an array with ndarray.T:

In [32]:
two_d_arry.T

array([[ 1,  7, 13],
       [ 2,  8, 14],
       [ 3,  9, 15],
       [ 4, 10, 16],
       [ 5, 11, 17],
       [ 6, 12, 18]])

Flip an array vertically or horizontally with np.flipud() and np.fliplr() respectively:

In [34]:
np.flipud(two_d_arry)

array([[13, 14, 15, 16, 17, 18],
       [ 7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6]])

In [35]:
np.fliplr(two_d_arry)

array([[ 6,  5,  4,  3,  2,  1],
       [12, 11, 10,  9,  8,  7],
       [18, 17, 16, 15, 14, 13]])

Rotate an array 90 degrees counter-clockwise with np.rot90():

In [36]:
np.rot90(two_d_arry, k =1)

array([[ 6, 12, 18],
       [ 5, 11, 17],
       [ 4, 10, 16],
       [ 3,  9, 15],
       [ 2,  8, 14],
       [ 1,  7, 13]])

Shift elements in an array along a given dimension with np.roll():

In [37]:
np.roll(a= two_d_array,
        shift = 2,        # Shift elements 2 positions
        axis = 1)         # In each row

array([[3, 4, 1, 2],
       [7, 8, 5, 6]])
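These functions are closely related to the slicing tricks shown earlier. As a quick check on an illustrative array named `grid`, the sketch below confirms that two 90-degree rotations, a pair of flips, and reversing both axes by slicing all produce the same array, and that rolling by a full row length is a no-op.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
reversed_both = grid[::-1, ::-1]

# Two 90-degree rotations equal reversing both axes.
print(np.array_equal(np.rot90(grid, k=2), reversed_both))            # True

# Flipping top-to-bottom, then left-to-right gives the same result.
print(np.array_equal(np.fliplr(np.flipud(grid)), reversed_both))     # True

# Rolling each row by its full length (3) returns the original array.
print(np.array_equal(np.roll(grid, shift=3, axis=1), grid))          # True
```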

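## Concatenation
Join arrays along an existing axis with np.concatenate(). The arrays must match in every dimension except the one being joined: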


In [41]:
array_to_join = np.array([[10,20,30],[40,50,60],[70,80,90]])
np.concatenate((two_d_arry, array_to_join), axis = 1)

array([[ 1,  2,  3,  4,  5,  6, 10, 20, 30],
       [ 7,  8,  9, 10, 11, 12, 40, 50, 60],
       [13, 14, 15, 16, 17, 18, 70, 80, 90]])

In [42]:
np.concatenate((two_d_arry, array_to_join), axis = 0)  # Fails: the column counts differ (6 vs. 3)

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 6 and the array at index 1 has size 3
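The error arises because joining along axis 0 stacks rows, so both arrays need the same number of columns. A minimal sketch with two illustrative arrays, `top` and `bottom`, that do share a column count:

```python
import numpy as np

top = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
bottom = np.array([[10, 20, 30]])         # shape (1, 3): same column count

stacked = np.concatenate((top, bottom), axis=0)
print(stacked.shape)   # (3, 3)

# np.vstack() is a shorthand for this row-wise join.
print(np.array_equal(np.vstack((top, bottom)), stacked))   # True
```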

## Array Math Operations

Creating and manipulating arrays is useful, but the true power of numpy arrays is the ability to perform mathematical operations on many values quickly and easily. Unlike built-in Python lists, ndarrays support math operators like +, -, / and * as element-wise operations:

In [49]:
print(two_d_arry + 100)

print(two_d_arry - 100)

print(two_d_arry * 100)

print(two_d_arry ** 2)

print(two_d_arry / 100)

print(two_d_arry // 100)

print(two_d_arry % 3)

[[101 102 103 104 105 106]
 [107 108 109 110 111 112]
 [113 114 115 116 117 118]]
[[-99 -98 -97 -96 -95 -94]
 [-93 -92 -91 -90 -89 -88]
 [-87 -86 -85 -84 -83 -82]]
[[ 100  200  300  400  500  600]
 [ 700  800  900 1000 1100 1200]
 [1300 1400 1500 1600 1700 1800]]
[[  1   4   9  16  25  36]
 [ 49  64  81 100 121 144]
 [169 196 225 256 289 324]]
[[0.01 0.02 0.03 0.04 0.05 0.06]
 [0.07 0.08 0.09 0.1  0.11 0.12]
 [0.13 0.14 0.15 0.16 0.17 0.18]]
[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]
[[1 2 0 1 2 0]
 [1 2 0 1 2 0]
 [1 2 0 1 2 0]]


In [50]:
small_array1 = np.array([[1,2],[3,4]])

small_array1 + small_array1

array([[2, 4],
       [6, 8]])

In [51]:
small_array1 - small_array1

array([[0, 0],
       [0, 0]])

In [52]:
small_array1 * small_array1

array([[ 1,  4],
       [ 9, 16]])

In [53]:
small_array1 ** small_array1

array([[  1,   4],
       [ 27, 256]])
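To make the contrast with plain lists concrete, the sketch below applies the same operator to a list and to an array, then adds a 1-dimensional array to each row of a 2-dimensional one (numpy calls this broadcasting). The names `numbers`, `grid` and `row` are illustrative.

```python
import numpy as np

numbers = [1, 2, 3]
print(numbers * 2)             # [1, 2, 3, 1, 2, 3]  (list repetition)
print(np.array(numbers) * 2)   # [2 4 6]             (element-wise multiply)

# Arrays with compatible shapes are combined by broadcasting:
# here the row is added to every row of grid.
grid = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
print(grid + row)
# [[11 22 33]
#  [14 25 36]]
```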

Numpy also offers a variety of named math functions for ndarrays. There are too many to cover in detail here, so we'll just look at a selection of some of the most useful ones for data analysis:

In [54]:
np.mean(two_d_arry)

9.5

In [56]:
np.mean(two_d_arry, axis = 1) # row wise mean

array([ 3.5,  9.5, 15.5])

In [58]:
np.mean(two_d_arry, axis = 0) # col wise mean

array([ 7.,  8.,  9., 10., 11., 12.])

In [59]:
# Get the standard deviation of all the elements in an array with np.std()

np.std(two_d_arry)

5.188127472091127

In [61]:
# Provide an axis argument to get standard deviations across a dimension

np.std(two_d_arry,
        axis = 0)     # Get stdev for each column

array([4.89897949, 4.89897949, 4.89897949, 4.89897949, 4.89897949,
       4.89897949])

In [62]:
# Sum the elements of an array across an axis with np.sum()

np.sum(two_d_arry,
       axis=1)        # Get the row sums

array([21, 57, 93])

In [63]:
# Take the log of each element in an array with np.log()

np.log(two_d_arry)

array([[0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
        1.79175947],
       [1.94591015, 2.07944154, 2.19722458, 2.30258509, 2.39789527,
        2.48490665],
       [2.56494936, 2.63905733, 2.7080502 , 2.77258872, 2.83321334,
        2.89037176]])

In [64]:
# Take the square root of each element with np.sqrt()

np.sqrt(two_d_arry)

array([[1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
        2.44948974],
       [2.64575131, 2.82842712, 3.        , 3.16227766, 3.31662479,
        3.46410162],
       [3.60555128, 3.74165739, 3.87298335, 4.        , 4.12310563,
        4.24264069]])
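One point that matters for data analysis: np.std() computes the population standard deviation by default (dividing by N). Passing ddof=1 gives the sample standard deviation (dividing by N - 1). The small data set below is made up so that the population result comes out to a round number.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

population_std = np.std(data)        # divides by N (default, ddof=0)
sample_std = np.std(data, ddof=1)    # divides by N - 1

print(population_std)                # 2.0
print(sample_std > population_std)   # True
```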

## Dot Product
Take the dot product of two arrays with np.dot(). This function performs an element-wise multiply and then a sum for 1-dimensional arrays (vectors) and matrix multiplication for 2-dimensional arrays.

In [65]:
# Take the vector dot product of row 0 and row 1

np.dot(two_d_arry[0,0:],  # Slice row 0
       two_d_arry[1,0:])  # Slice row 1

217

In [68]:
# Do a matrix multiply

np.dot(small_array1, small_array1)

array([[ 7, 10],
       [15, 22]])
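For comparison, the sketch below uses the @ operator, which performs the same vector dot product and matrix multiplication for 1-dimensional and 2-dimensional arrays, and contrasts it with the element-wise * operator. The names `a`, `v` and `w` are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
v = np.array([1, 2])
w = np.array([3, 4])

print(v @ w)    # 11, the same as np.sum(v * w)

print(a @ a)    # matrix product, same as np.dot(a, a)
# [[ 7 10]
#  [15 22]]

print(a * a)    # element-wise product, not a matrix product
# [[ 1  4]
#  [ 9 16]]
```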