# NumPy

### NumPy, short for Numerical Python, is the fundamental package for high-performance scientific computing and data analysis in Python.


### One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large data sets in Python. The ndarray is a space-efficient multidimensional array that provides vectorized arithmetic operations and sophisticated broadcasting capabilities.

#### Installation of numpy

conda install numpy

In [4]:
import numpy as np
np.__version__

'1.18.5'

### How to create ndarrays?

#### Let us start with a 1-Dimensional (1-D) array

It is straightforward: call the function "array", which accepts any sequence-like object, such as a list, as its argument.

In [3]:
temp_list = [1,2,3]
temp_arr_id = np.array(temp_list)

In [4]:
temp_arr_id

array([1, 2, 3])

#### Built-in attributes

In [6]:
temp_arr_id.ndim # shows the dimension

1

In [7]:
temp_arr_id.shape # shape of the array

(3,)

In [8]:
temp_arr_id.dtype # data type of the array (int32 here; the default integer type depends on the platform)

dtype('int32')
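The attributes above can be collected in one small sketch. The names `size` and `itemsize` are two further ndarray attributes that this notebook does not otherwise cover, and the dtype is set explicitly here (an assumption made so the result does not vary by platform).

```python
import numpy as np

# Set the dtype explicitly so the result does not depend on the platform.
arr = np.array([1, 2, 3], dtype="int32")

print(arr.ndim)      # number of dimensions
print(arr.shape)     # length along each dimension
print(arr.dtype)     # data type of the elements
print(arr.size)      # total number of elements
print(arr.itemsize)  # bytes used by one element
```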

#### 2-Dimensional (2-D) array: 

In [9]:
temp_arr_2d = np.array([[1,2,3],
                       [4,5,6]])
temp_arr_2d

array([[1, 2, 3],
       [4, 5, 6]])

In [10]:
temp_arr_2d.shape # i.e. 2 lists, and each list has 3 values.

(2, 3)

#### 3-Dimensional (3-D) array: 

In [11]:
temp_arr_3d = np.array([[[1,2,3],
                        [4,5,6]],
                        [[11,21,31],
                        [41,51,61]]])
temp_arr_3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[11, 21, 31],
        [41, 51, 61]]])

##### A simple way to identify the dimension is to count the square brackets at the start of the output: 3 brackets indicate a 3-D array.

In [12]:
temp_arr_3d.shape # 2 super-lists, within each super-list there are 2 lists,
#and each list has three elements.

(2, 2, 3)
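One way to check the reading of the shape is to compare it with the lengths of the nested Python lists at each level. This is only an illustrative sketch; the variable names `nested` and `lengths` are made up for the example.

```python
import numpy as np

nested = [[[1, 2, 3], [4, 5, 6]],
          [[11, 21, 31], [41, 51, 61]]]
arr_3d = np.array(nested)

# Length of the list at each nesting level, outermost first.
lengths = (len(nested), len(nested[0]), len(nested[0][0]))

print(lengths)       # lengths read from the plain Python lists
print(arr_3d.shape)  # shape reported by NumPy
```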

### Creating array by explicitly mentioning the data type

In [13]:
np.array([1.0,2.0,3.0], dtype= 'int32')

array([1, 2, 3])

In [14]:
np.array([1.0,2.0,3.0], dtype= 'float64')

array([1., 2., 3.])

In [15]:
# If we don't explicitly mention the data type, numpy infers the data type based on the value
np.array([1.0,2.0,3.0]).dtype   # Array of float

dtype('float64')

In [16]:
np.array(['xyz','a','b']).dtype # Unicode string; the number is the length of the longest string

dtype('<U3')

For more information on data types, refer to:

https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html
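An existing array can also be converted to another data type with the `astype` method. The sketch below is a small illustration with made-up values; note that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

floats = np.array([1.7, 2.2, -3.9])

# astype returns a new array; the fractional part is discarded (truncation toward zero).
ints = floats.astype("int32")

# Strings that hold numbers can be converted as well.
from_text = np.array(["1.5", "2.5"]).astype("float64")

print(ints)
print(from_text)
```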

### Other ways of creating arrays

In [17]:
np.zeros(10)
# Create a length-10 array filled with zeros (the default data type is float64)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [18]:
np.ones(5)      # 1-D array of ones of length 5.

array([1., 1., 1., 1., 1.])

In [19]:
np.ones((5,5))  # 2-D array of shape 5,5; which is mentioned as a tuple.

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [21]:
# Numpy has an equivalent function to built-in Python range function:
np.arange(15) # range in Python, arange in NumPy; arange returns an ndarray instead of a range object

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [22]:
# Array of five values evenly spaced between 0 and 10 (both endpoints included).
np.linspace(0, 10, 5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])
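The difference between `arange` and `linspace` is easy to mix up, so here is a short side-by-side sketch: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value by default.

```python
import numpy as np

# arange: start, stop (excluded), step
stepped = np.arange(0, 10, 2.5)

# linspace: start, stop (included by default), number of points
spaced = np.linspace(0, 10, 5)

print(stepped)
print(spaced)
```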

In [23]:
# 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

Besides the need to just initialize arrays with zeros or ones, we might need to initialize the arrays with some random values.

In [24]:
# Create a 4x4 array of uniformly distributed random values between 0 (inclusive) and 1 (exclusive).
np.random.random((4,4))

array([[0.49351445, 0.51722843, 0.32036858, 0.83107131],
       [0.51693348, 0.53763052, 0.23776301, 0.68297049],
       [0.31658914, 0.41423634, 0.84914253, 0.63443662],
       [0.9584373 , 0.18150804, 0.50868468, 0.49327797]])

In [25]:
# Array of random integers between 0 (inclusive) and 10 (exclusive)

np.random.randint(0, 10, (4, 4))
# To include 10 as a possible value, raise the upper bound: np.random.randint(0, 11, (4, 4))

array([[4, 6, 4, 1],
       [1, 3, 2, 4],
       [0, 6, 5, 1],
       [0, 0, 3, 9]])
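Random values change on every run, which makes results hard to reproduce. A possible approach, sketched here under the assumption that NumPy 1.17 or later is installed, is to use a seeded generator from `np.random.default_rng`; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

uniform_a = rng_a.random((4, 4))  # uniform values in [0, 1)
uniform_b = rng_b.random((4, 4))

# Random integers between 0 (inclusive) and 10 (exclusive).
ints = rng_a.integers(0, 10, size=(4, 4))

print(np.array_equal(uniform_a, uniform_b))
```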

In [26]:
# The reshape function allows us to change the shape of the array; for instance, a 1-D array can be converted to a 2-D array.
np.arange(15).reshape((3,5))

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
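Two details of `reshape` are worth a quick sketch: one dimension may be given as -1 so that NumPy infers it, and the new shape must hold exactly the same number of elements. The variable names below are illustrative only.

```python
import numpy as np

flat = np.arange(15)

# -1 asks NumPy to infer this dimension: 15 elements / 3 rows = 5 columns.
inferred = flat.reshape(3, -1)

# A shape that does not match the number of elements raises a ValueError.
try:
    flat.reshape(4, 4)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(inferred.shape)
print(mismatch_error)
```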

### Vectorization

Let us say we want to multiply each element of the array by the scalar value 2. In plain Python, we would iterate through each element in a for loop and multiply it by 2.
NumPy arrays, however, perform this operation efficiently and concisely: the scalar is broadcast across the whole array.

In [27]:
temp_array_2d = np.array([[1.,2.,3.],
                         [4.,5.,6.]])

temp_array_2d * 2 # Does element-wise operation.

array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]])

In [28]:
# Squaring the elements of the array
temp_array_2d * temp_array_2d

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [29]:
temp_array_2d ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])
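To make the contrast concrete, the sketch below performs the same multiplication with an explicit loop and with a vectorized expression, and then shows broadcasting of a 1-D row across a 2-D array. The names `looped`, `vectorized`, and `row` are chosen for this example.

```python
import numpy as np

values = np.array([[1., 2., 3.],
                   [4., 5., 6.]])

# Explicit loop: visit every element and multiply it by 2.
looped = np.empty_like(values)
for i in range(values.shape[0]):
    for j in range(values.shape[1]):
        looped[i, j] = values[i, j] * 2

# Vectorized: one expression, same result.
vectorized = values * 2

# Broadcasting also works between arrays: the row is added to every row of values.
row = np.array([10., 20., 30.])
shifted = values + row

print(np.array_equal(looped, vectorized))
print(shifted)
```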

### Access

Accessing elements in an array is very similar to the way we access elements of a Python list.

#### 1-D array

In [5]:
temp_array_1d = np.arange(10)
temp_array_1d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [35]:
temp_array_1d[:5]

array([0, 1, 2, 3, 4])

In [36]:
temp_array_1d[:-1] # Gets all the elements till the last item (exclusive)

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [37]:
temp_array_1d[-5:] # last 5 items

array([5, 6, 7, 8, 9])

In [38]:
temp_array_1d[1::2] # Alternate element, starting with second element.

array([1, 3, 5, 7, 9])

#### 2-D array

In [39]:
temp_array_2d = np.arange(10).reshape(2,5)
temp_array_2d

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [40]:
temp_array_2d[0, 0] # Accessing the first cell, i.e. the cell at the intersection of the 1st row and 1st column.

0

In [41]:
temp_array_2d[:,2] # All rows, 3rd column

array([2, 7])

In [42]:
temp_array_2d[1,:] # Second row, all columns

array([5, 6, 7, 8, 9])

In [43]:
temp_array_2d[1::2] # Every alternate row, starting with the second row

array([[5, 6, 7, 8, 9]])

In [44]:
temp_array_2d[::-1, ::-1] # Reversing the order of both the rows and the columns

array([[9, 8, 7, 6, 5],
       [4, 3, 2, 1, 0]])

In [45]:
temp_array_2d = temp_array_2d.reshape(5,2)
temp_array_2d

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

To select out a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order:

In [47]:
temp_array_2d[[2,3,0]] # Fancy Indexing

array([[4, 5],
       [6, 7],
       [0, 1]])
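One difference between the two indexing styles is worth keeping in mind: a basic slice is a view that shares memory with the original array, whereas fancy indexing returns a copy. The sketch below illustrates this with made-up values; `np.shares_memory` is used only to check the relationship.

```python
import numpy as np

base = np.arange(10)

# Basic slicing returns a view: writing to it changes the original array.
view = base[:5]
view[0] = 100

# Fancy indexing returns a copy: writing to it leaves the original untouched.
picked = base[[1, 2, 3]]
picked[0] = -1

print(base)
print(np.shares_memory(base, view))
print(np.shares_memory(base, picked))
```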

### Boolean Indexing

In [48]:
# Logical Indexing
temp_array_1d > 5  # Output: Boolean array based on the condition

array([False, False, False, False, False, False,  True,  True,  True,
        True])

In [6]:
temp_array_1d[temp_array_1d > 5] # This returns the elements at the positions where the condition is True.

array([6, 7, 8, 9])

In [7]:
temp_data = np.arange(10).reshape(5,2)
cond_array = np.array([1,2,3,4,5])

temp_data[cond_array > 2]
# This type of filtering is commonly used when we want to filter an array based on values of another array.

array([[4, 5],
       [6, 7],
       [8, 9]])
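Conditions can also be combined. A minimal sketch, assuming the usual element-wise operators `&` (and), `|` (or), and `~` (not); each condition needs its own parentheses because these operators bind more tightly than the comparisons.

```python
import numpy as np

data = np.arange(10)

both = data[(data > 2) & (data < 7)]    # values greater than 2 AND less than 7
either = data[(data < 2) | (data > 7)]  # values less than 2 OR greater than 7
negated = data[~(data > 5)]             # values NOT greater than 5

print(both)
print(either)
print(negated)
```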

In [8]:
# One of the common operations in data analysis is replacing some of the values of
#an array with a specific value (for instance, 0).

temp_array_1d = np.arange(-5,10)
temp_array_1d[temp_array_1d < 0] = 0 # Replacing all negative values with 0.
temp_array_1d

array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [None]:
# Note: This modifies the original array in place and does not create a copy. So be careful before performing this operation.
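If the original values must be kept, one option is `np.where`, which builds a new array instead of modifying the existing one. This is a sketch of that alternative, not the approach used in the cell above; the names `original` and `clipped` are illustrative.

```python
import numpy as np

original = np.arange(-5, 10)

# np.where(condition, value_if_true, value_if_false) returns a new array.
clipped = np.where(original < 0, 0, original)

print(original)  # still contains the negative values
print(clipped)   # negatives replaced with 0
```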

#### Common statistical functions

#### 1-D array

In [10]:
temp_array_1d = np.arange(5)
np.sqrt(temp_array_1d)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ])

In [11]:
np.min(temp_array_1d)

0

In [12]:
np.max(temp_array_1d)

4

In [13]:
np.mean(temp_array_1d)

2.0

In [14]:
np.sum(temp_array_1d)

10

#### 2-D array

In [15]:
temp_array_2d = np.arange(15).reshape(5,3)
temp_array_2d

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [16]:
np.min(temp_array_2d)

0

In [17]:
np.min(temp_array_2d, axis= 0 ) # minimum of each column (collapses the rows)

array([0, 1, 2])

In [18]:
np.min(temp_array_2d, axis= 1) # minimum of each row (collapses the columns)

array([ 0,  3,  6,  9, 12])

##### The axis parameter can also be passed to functions such as max, sum, and mean.
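As a short illustration of the axis parameter with other functions, the sketch below computes column sums and row means on the same 5x3 layout. The variable names are chosen for the example.

```python
import numpy as np

table = np.arange(15).reshape(5, 3)

# axis=0 collapses the rows, leaving one value per column.
col_sums = np.sum(table, axis=0)

# axis=1 collapses the columns, leaving one value per row.
row_means = np.mean(table, axis=1)

print(col_sums)
print(row_means)
```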

In [19]:
temp_array_1d = np.arange(10)
temp_array_1d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [20]:
# If you want to identify the index which has the maximum value.
temp_array_1d.argmax()

9

In [22]:
# To find the indices that would sort the array, we can use the argsort function.

temp_array_1d = np.array([2,1,4,3,5]) # i.e. the least value (1) is at index 1
i = np.argsort(temp_array_1d)  # so the first entry of the result is 1.
i

array([1, 0, 3, 2, 4], dtype=int64)
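The indices returned by `argsort` can be passed back to the array to obtain the sorted values, or reversed for descending order. A minimal sketch with illustrative variable names:

```python
import numpy as np

values = np.array([2, 1, 4, 3, 5])
order = np.argsort(values)

# Indexing with the argsort result yields the values in ascending order.
ascending = values[order]

# Reversing the index array yields descending order.
descending = values[order[::-1]]

print(ascending)
print(descending)
```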

#### Storing of arrays

In [24]:
# Arrays can also be stored on disk for future use.

temp_array_1d = np.arange(10)
np.save('temp_array_uncompressed', temp_array_1d) # Uncompressed binary raw format

In [27]:
load_array_uncomp = np.load('temp_array_uncompressed.npy')
load_array_uncomp  # .npy is NumPy's binary file format

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [30]:
# Compressed (np.savez alone writes an uncompressed archive; np.savez_compressed compresses it)

np.savez_compressed('temp_array_compressed.npz', arr = temp_array_1d)

In [31]:
load_array_compressed = np.load('temp_array_compressed.npz')
load_array_compressed['arr']

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
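A `.npz` archive can hold several arrays, each stored under the keyword name given when saving. The sketch below is a self-contained round trip; the temporary directory, the file name `bundle.npz`, and the keys `first` and `second` are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

first = np.arange(10)
second = np.ones((2, 3))

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bundle.npz")
    np.savez_compressed(path, first=first, second=second)

    # The loaded archive behaves like a dictionary keyed by the saved names.
    with np.load(path) as archive:
        names = sorted(archive.files)
        first_back = archive["first"]
        second_back = archive["second"]

print(names)
print(first_back)
```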