# NumPy Basics: Arrays and Vectorized Computation

- NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python.
- While NumPy by itself does not provide modeling or scientific functionality, having an understanding of NumPy arrays and array-oriented computing will help you use tools with array-oriented semantics, like pandas, much more effectively.


One of the reasons NumPy is so important for numerical computations in Python is that it is designed for efficiency on large arrays of data. There are several reasons for this:

- NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.

- NumPy operations perform complex computations on entire arrays without the need for Python *for* loops.
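
As a rough illustration of the last point, the sketch below doubles one million numbers twice: once as a single array expression and once with an explicit Python loop. The names `my_arr` and `my_list` are made up for this example, and no timings are quoted because they depend on the machine.

```python
import numpy as np

# Hypothetical example data: one million integers in both representations
my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

# Vectorized: one expression operates on the whole array
doubled_arr = my_arr * 2

# Pure Python: a loop (here a list comprehension) visits every element
doubled_list = [x * 2 for x in my_list]

# Both approaches produce the same values
print(doubled_arr[:5])   # [0 2 4 6 8]
print(doubled_list[:5])  # [0, 2, 4, 6, 8]
```

In a notebook, prefixing each of the two statements with `%timeit` is one way to compare their speed on your own machine.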

## The NumPy ndarray: A Multidimensional Array Object

- One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python.
- Arrays enable you to perform mathematical operations on whole blocks of data.

In [1]:
import numpy as np

In [2]:
# generate random data

In [5]:
data = np.random.rand(1)  # creates a 1-element array holding a random value between 0 and 1

In [4]:
data

array([0.96908927])

In [7]:
print(np.random.rand(5))  # prints 5 random values between 0 and 1

[0.03732057 0.80391374 0.67015069 0.86161672 0.91889504]


In [21]:
data2 = np.random.rand(3, 3)  # creates a 3x3 array of random values between 0 and 1
data2

array([[0.77811185, 0.40545395, 0.78255862],
       [0.92206072, 0.20131135, 0.82882704],
       [0.82399096, 0.04995174, 0.55542524]])

In [11]:
data * 10  # multiplies every element of data by 10

array([7.22584447])

In [13]:
data + data  # adds the array to itself, element by element

array([1.44516889])

- An *ndarray* is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type.
- Every array has a shape, a tuple indicating the size of each dimension, and a *dtype*, an object describing the data type of the array:

In [16]:
data.shape  # tuple giving the size of each dimension

(1,)

In [22]:
data2.shape

(3, 3)

In [24]:
data2.dtype  # gives the data type stored in the array

dtype('float64')
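
To make the "homogeneous data" point concrete, here is a minimal sketch (the variable name `mixed` is made up for this example). When a list mixes integers and floats, NumPy picks one dtype that can hold every element rather than storing each element with its own type.

```python
import numpy as np

# A list mixing ints and a float is stored with a single common dtype
mixed = np.array([1, 2.5, 3])

print(mixed.dtype)  # float64
print(mixed.shape)  # (3,)
print(mixed.ndim)   # 1
```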

## Creating ndarrays


- The easiest way to create an array is to use the array function.
- This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.

In [25]:
lst = [2 , 5 , 89 , 45 , 10 , 3]

In [29]:
arr1 = np.array(lst)  # passing a list to the array function to create an array

In [31]:
arr1

array([ 2,  5, 89, 45, 10,  3])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [32]:
lst2 = [[1, 2, 3], [4, 5, 6]]

In [33]:
arr2 = np.array(lst2)

In [34]:
arr2

array([[1, 2, 3],
       [4, 5, 6]])

In [36]:
arr2.ndim  # gives the number of dimensions of the array

2

In [38]:
arr2.shape  # gives the rows and the columns 

(2, 3)

In [41]:
np.zeros(5) # generates an array of 5 zeros

array([0., 0., 0., 0., 0.])

In [44]:
np.zeros((3, 3))  # generates an m x n array of zeros (the shape must be passed as a tuple)

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [49]:
np.empty((3, 3))  # allocates an array without initializing its values (the shape must be passed as a tuple)

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

**It’s not safe to assume that np.empty will return an array of all zeros. In some cases, it may return uninitialized “garbage” values.**
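
A minimal sketch of a safe pattern, assuming you only want `np.empty` as a way to allocate a buffer (the names `buf` and `zeros` are made up for this example): write to every element before reading any of them, or use `np.zeros` when you actually need zeros.

```python
import numpy as np

# Allocate without initializing: the contents are arbitrary at this point
buf = np.empty((2, 3))

# Fill every element before using the array
buf[:] = 0.0

# np.zeros gives the same result in a single step
zeros = np.zeros((2, 3))

print(np.array_equal(buf, zeros))  # True
```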

In [51]:
np.arange(15)  # generates an array of integers from 0 up to (but not including) 15

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [54]:
np.eye(3)  # generates a 3x3 identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [55]:
#                                        Summary Table 

![image.png](attachment:image.png)

***

## Data Types for ndarrays

- The data type, or dtype, is a special object containing the information (or metadata, data about data) that the ndarray needs to interpret a chunk of memory as a particular type of data.

In [56]:
array1 = np.array([1, 2, 3], dtype=np.int32)  # explicitly requests a 32-bit integer dtype

In [57]:
array1.dtype

dtype('int32')

In [58]:
array2 = np.array([4, 5, 6], dtype=np.float64)  # explicitly requests a 64-bit float dtype

In [59]:
array2.dtype

dtype('float64')

In [60]:
array2

array([4., 5., 6.])

**Summary table of NumPy data types**

![image.png](attachment:image.png)

- You can explicitly convert or cast an array from one dtype to another using ndarray’s *astype* method.

In [62]:
arr = np.array([99, 98, 97])
arr

array([99, 98, 97])

In [63]:
arr.dtype

dtype('int32')

In [64]:
float_arr = arr.astype(np.float64)  # converts the array elements to float
float_arr

array([99., 98., 97.])

In [65]:
float_arr2 = np.array([1.25, 6.57, 5.69])
float_arr2

array([1.25, 6.57, 5.69])

In [66]:
float_arr2.dtype

dtype('float64')

In [76]:
int_arr = float_arr2.astype(np.int64)  # converts the float array into an int array; the decimal part is truncated
int_arr


array([1, 6, 5], dtype=int64)

In [69]:
int_arr.dtype

dtype('int64')

In [71]:
string_arr = np.array(['1.25', '3.25', '6.25'])
string_arr.dtype

dtype('<U4')

In [74]:
float_arr3 = string_arr.astype(np.float64)  # converting strings of numbers into floats
float_arr3

array([1.25, 3.25, 6.25])

- If casting fails for some reason (for example, a string that cannot be converted to float64), a ValueError is raised.
- Instead of writing *np.float64*, plain *float* also works (the same goes for *int*), because NumPy aliases the Python types to its own equivalent dtypes.
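
Both points can be sketched as follows. The array contents and variable names are made up for this example; the second array deliberately contains a string that is not a number.

```python
import numpy as np

numeric_strings = np.array(['1.25', '3.25', '6.25'])

# The Python type float is accepted in place of np.float64
as_floats = numeric_strings.astype(float)
print(as_floats.dtype)  # float64

# A string that cannot be parsed as a number makes the cast fail
bad_strings = np.array(['1.25', 'not a number'])
cast_failed = False
try:
    bad_strings.astype(float)
except ValueError as exc:
    cast_failed = True
    print("casting failed:", exc)
```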

## Arithmetic with NumPy Arrays

- Arrays are important because they enable you to express batch operations on data without writing any *for* loops.
- Any arithmetic operation between equal-size arrays applies the operation element-wise:

In [79]:
arr = np.arange(9).reshape(3,3)

In [80]:
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [81]:
arr * arr

array([[ 0,  1,  4],
       [ 9, 16, 25],
       [36, 49, 64]])

In [82]:
arr + arr

array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

In [83]:
arr - arr


array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

- Arithmetic operations with scalars (a single number rather than another array, as shown previously) propagate the scalar argument to each element in the array.

In [84]:
1 / arr  # the zero at arr[0, 0] produces inf and a RuntimeWarning


array([[       inf, 1.        , 0.5       ],
       [0.33333333, 0.25      , 0.2       ],
       [0.16666667, 0.14285714, 0.125     ]])

**In plain Python, 1/0 raises a ZeroDivisionError (0/1 is simply 0.0). NumPy instead shows a RuntimeWarning and returns inf for that element.**
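
The difference can be sketched side by side. `np.errstate` is NumPy's context manager for controlling floating-point warnings; using it here to silence the divide-by-zero warning is just one option, and the variable names are made up for this example.

```python
import numpy as np

values = np.arange(3)  # array([0, 1, 2])

# Plain Python raises an exception on division by zero
try:
    1 / 0
except ZeroDivisionError as exc:
    print("Python:", exc)

# NumPy returns inf for that element; errstate silences the warning in this block
with np.errstate(divide='ignore'):
    reciprocals = 1 / values

print(reciprocals)
```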

In [85]:
arr * 0.5

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])

arr2 = np.array([[2 , 5 , 1], [1 , 8, 10], [54, 20, 2]])

In [93]:
arr2.shape

(3, 3)

In [95]:
arr > arr2  # compares each element of arr with the matching element of arr2: True where it is greater, False otherwise

array([[False, False,  True],
       [ True, False, False],
       [False, False,  True]])

## Basic Indexing and Slicing

- NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements.

In [96]:
arr = np.arange(5)

In [97]:
arr

array([0, 1, 2, 3, 4])

In [99]:
arr[2]   # select the element at index 2

2

In [102]:
arr[1:3]  # select the elements from index 1 (inclusive) up to index 3 (exclusive)

array([1, 2])

In [105]:
arr[0:3] = 10  # set the elements at indices 0 to 2 (index 3 is excluded) to 10

In [104]:
arr

array([10, 10, 10,  3,  4])

In [106]:
arr[0:4] = 1

In [107]:
arr

array([1, 1, 1, 1, 4])

An important first distinction from Python’s built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.

In [108]:
arr2 = np.arange(10)

In [109]:
arr2


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [112]:
arr_slice = arr2[5:9]  # extract the elements from index 5 up to index 9, excluding the element at 9

In [111]:
arr_slice

array([5, 6, 7, 8])

In [113]:
arr_slice[1] = 100  # updating the value at index 1 

In [114]:
arr_slice

array([  5, 100,   7,   8])

In [116]:
arr2  # the update made through arr_slice is reflected in the original array as well

array([  0,   1,   2,   3,   4,   5, 100,   7,   8,   9])

In [119]:
arr2[:] = 100  # assigns all the values in the array to 100

In [120]:
arr2

array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100])

As NumPy has been designed to be able to work with very large arrays, you could imagine performance and memory problems if NumPy insisted on always copying data.
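
If you do want independent data instead of a view, the `copy` method makes an explicit copy. The sketch below contrasts the two (the names `source`, `view`, and `independent` are made up for this example); `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

source = np.arange(10)

view = source[5:9]                # a slice shares memory with source
independent = source[5:9].copy()  # an explicit copy owns its own data

independent[1] = 100
print(source[6])  # 6, the copy does not touch source

view[1] = 100
print(source[6])  # 100, the view writes through to source

print(np.shares_memory(source, view))         # True
print(np.shares_memory(source, independent))  # False
```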

In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays.

In [121]:
arr = np.arange(9).reshape(3 , 3)

In [122]:
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [124]:
arr[0]  # returns a whole 1D array instead of a single element

array([0, 1, 2])

In [127]:
arr[0][1]  # returns the element at index 1 of the first row, i.e. first row, second column (indexing starts from 0)

1

In [130]:
arr[0 , 1]  # efficient way of doing the same thing

1

### Indexing with slices

Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:

In [131]:
arr = np.arange(10)

In [132]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [134]:
arr[0:5]  # elements from 0 to 5 (excluding element at 5)

array([0, 1, 2, 3, 4])

In [135]:
arr[1:2]

array([1])

In [136]:
arr3 = np.arange(9).reshape(3 ,3)

In [137]:
arr3

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [140]:
arr3[1:]  # gives the rows from row 1 to the end (all columns are included because no column slice is given)

array([[3, 4, 5],
       [6, 7, 8]])

In [154]:
arr3[:, 1:]  # gives the columns from column 1 to the end (all rows are included because the row slice is a bare colon)

array([[1, 2],
       [4, 5],
       [7, 8]])

**- arr[ x: , : ] means all rows from row x to the end**

**- arr[ :x , : ] means all rows up to, but not including, row x**

**- arr[ : , x: ] means all columns from column x to the end**

**- arr[ : , :x ] means all columns up to, but not including, column x**
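
The four slice patterns above can be checked on a small array. This is a minimal sketch with a made-up 4x4 array `grid` and `x = 2`.

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)
x = 2

print(grid[x:, :])  # rows 2 and 3
print(grid[:x, :])  # rows 0 and 1
print(grid[:, x:])  # columns 2 and 3
print(grid[:, :x])  # columns 0 and 1
```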

In [173]:
arr3

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [178]:
arr3[1:, 1:] 

array([[4, 5],
       [7, 8]])

In [181]:
arr3[1:, 1:] = 0  # assigning a single value to the whole sliced section

In [180]:
arr3

array([[0, 1, 2],
       [3, 0, 0],
       [6, 0, 0]])

In [184]:
arr3[:2, :2]  # getting the top left corner 

array([[0, 1],
       [3, 0]])

In [187]:
arr3[:2, 1:]  # getting the top right corner

array([[1, 2],
       [0, 0]])

In [189]:
arr3[1: , :2]  # getting the bottom left corner 

array([[3, 0],
       [6, 0]])

In [191]:
arr3[1:, 1:]  # getting the bottom right corner

array([[0, 0],
       [0, 0]])

## Fancy Indexing

[Fancy Indexing Tutorial](https://www.youtube.com/watch?v=iTL6g2yfBzU)

In [195]:
arr = np.empty((10, 5))  # creating an uninitialized array (it holds arbitrary values until filled)

In [196]:
arr

array([[             nan,  6.79038653e-310,  6.12641401e-321,
         4.94065646e-324,  4.48611606e-321],
       [ 2.03060980e-321,  4.96041908e-321,  2.03060980e-321,
         4.98018171e-321,  5.02034522e+175],
       [ 5.13828272e-321,  1.39838039e-076,  2.70747974e-321,
         1.14354099e-071,  5.98807563e-321],
       [ 1.12855799e+277,  6.22522714e-321,  3.77716546e+233,
         6.12641401e-321,  8.37170362e-144],
       [ 3.33988377e-321,  9.30537139e+199,  6.34380289e-321,
         9.15563409e-072,  5.69163624e-321],
       [ 2.34352921e-056,  4.26416588e-096,  8.37170584e-144,
         7.72819855e-091,  3.22241147e-057],
       [ 2.32023351e-052,  5.74020278e+180,  8.37174974e-144,
         5.81224723e+294,  0.00000000e+000],
       [-3.25953926e-311,  8.13123409e-312,  9.68368666e-322,
         0.00000000e+000,  0.00000000e+000],
       [ 0.00000000e+000,  5.02034658e+175,  1.21540734e-046,
         3.53852369e-057,  9.60790485e-071],
       [ 3.27233009e+179,  1.47763641

In [199]:
for i in range(10):  # assigning values with a for loop; each row is filled with its own row index
    arr[i] = i

In [200]:
arr

array([[0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9.]])

Now, if you want the row of all 2s or the row of all 5s, you pass the indices of the rows you want to retrieve as a list, which basically acts as a filter.

In [202]:
arr[[2, 5]]  # gives you only rows 2 and 5

array([[2., 2., 2., 2., 2.],
       [5., 5., 5., 5., 5.]])

In [203]:
arr[[9, 1, 5, 4]]

array([[9., 9., 9., 9., 9.],
       [1., 1., 1., 1., 1.],
       [5., 5., 5., 5., 5.],
       [4., 4., 4., 4., 4.]])

Negative indices can also be passed; they select rows counting from the bottom of the array.

In [205]:
arr[[-1, -8]]  # negative indices start at -1 (the last row) and count upward from the bottom of the array

array([[9., 9., 9., 9., 9.],
       [2., 2., 2., 2., 2.]])

In [206]:
arr[[-1, -10]]

array([[9., 9., 9., 9., 9.],
       [0., 0., 0., 0., 0.]])

In [207]:
arr1 = np.arange(40).reshape(8 , 5)

In [208]:
arr1

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39]])

Now, if you want to extract a particular element or elements from a 2D array, you pass in two lists: one for the rows and one for the columns in which the elements are located. For example, if you pass in:
**arr[[a, b, c], [e, f, g]]**
then the elements at positions *(a, e), (b, f), (c, g)* are returned.

In [209]:
# Extracting 3, 22, 12, 30 through fancy indexing

In [211]:
# 3 = (0,3) , 22 = (4,2) , 12 = (2,2), 30 = (6,0)

arr1[[0 , 4 , 2 , 6], [3, 2, 2, 0]]  # first list is of rows, second is of columns 

array([ 3, 22, 12, 30])

In [212]:
arr1[[0 , 1 , 6, 7], [2, 4, 1, 0]]

array([ 2,  9, 31, 35])

When one integer list is passed per axis, as here, the result of fancy indexing is always one-dimensional, regardless of how many dimensions the array has (here, only 2).
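
If what you want is instead the rectangular block formed by a set of rows and a set of columns, two common options are to index in two steps or to use `np.ix_`. The sketch below is an illustration on a made-up array `table` with the same shape as `arr1`; the row and column lists are arbitrary choices.

```python
import numpy as np

table = np.arange(40).reshape(8, 5)
rows = [1, 5]
cols = [0, 3]

# One list per axis: picks the elements at (1, 0) and (5, 3), result is 1-D
pairs = table[rows, cols]
print(pairs)  # [ 5 28]

# Select the rows first, then the columns: result is a 2-D block
block_two_step = table[rows][:, cols]

# np.ix_ builds an index for the same rows-by-columns block
block_ix = table[np.ix_(rows, cols)]

print(block_two_step)
print(np.array_equal(block_two_step, block_ix))  # True
```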

## Transposing Arrays and Swapping Axes

- Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything.
- Arrays have the *transpose* method and also the special *T* attribute.

In [215]:
arr4 = np.arange(16).reshape(4 , 4)
arr4

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [218]:
arr4.T  # transposes the array, i.e. rows become columns and vice versa

array([[ 0,  4,  8, 12],
       [ 1,  5,  9, 13],
       [ 2,  6, 10, 14],
       [ 3,  7, 11, 15]])
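
That the transpose is a view rather than a copy can be checked with a short sketch (the names `m` and `t` are made up for this example): a write through the transposed array shows up in the original.

```python
import numpy as np

m = np.arange(16).reshape(4, 4)
t = m.T

# The transpose shares memory with the original array
print(np.shares_memory(m, t))  # True

# Element (0, 1) of the transpose is element (1, 0) of the original
t[0, 1] = -1
print(m[1, 0])  # -1

# The transpose method with no arguments gives the same result as .T
print(np.array_equal(m.transpose(), m.T))  # True
```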

When doing matrix computations, you may do this very often—for example, when computing the inner matrix product using *np.dot:*

In [220]:
np.dot(arr4, arr4.T)  # computes the matrix product of the two arrays

array([[ 14,  38,  62,  86],
       [ 38, 126, 214, 302],
       [ 62, 214, 366, 518],
       [ 86, 302, 518, 734]])

In [221]:
np.dot(arr4, arr4)

array([[ 56,  62,  68,  74],
       [152, 174, 196, 218],
       [248, 286, 324, 362],
       [344, 398, 452, 506]])
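
For two-dimensional arrays, the same matrix product can also be written with the `@` operator or the `dot` method. A minimal sketch, using a made-up array `a` with the same contents as `arr4`:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

with_function = np.dot(a, a.T)  # function form
with_operator = a @ a.T         # infix matrix-multiplication operator
with_method = a.dot(a.T)        # method form

print(np.array_equal(with_function, with_operator))  # True
print(np.array_equal(with_function, with_method))    # True
```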

## Universal Functions: Fast Element-Wise Array Functions

- A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays.
- You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results. 
- Many ufuncs are simple element-wise transformations, like *sqrt* or *exp*.
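
A short sketch of both kinds of ufunc: `np.sqrt` and `np.exp` take one array (unary), while `np.maximum` takes two arrays and returns the element-wise larger value (binary). The input values are made up for this example.

```python
import numpy as np

values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Unary ufuncs: applied to every element of one array
roots = np.sqrt(values)
exps = np.exp(values)

# Binary ufunc: combines two arrays element by element
left = np.array([1, 7, 3])
right = np.array([4, 2, 9])
larger = np.maximum(left, right)

print(roots)
print(exps)
print(larger)  # [4 7 9]
```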