# NumPy Basics: Arrays and Vectorized Computation

- NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python.
- While NumPy by itself does not provide modeling or scientific functionality, having an understanding of NumPy arrays and array-oriented computing will help you use tools with array-oriented semantics, like pandas, much more effectively.


One of the reasons NumPy is so important for numerical computations in Python is that it is designed for efficiency on large arrays of data. There are a number of reasons for this:

- NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.

- NumPy operations perform complex computations on entire arrays without the need for Python *for loops.*
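As a rough illustration of the two points above, the sketch below doubles one million integers, once with a vectorized NumPy expression and once with a Python list comprehension. The variable names are made up for this example, and the timings depend entirely on the machine, so treat the printed numbers as indicative only.

```python
import time
import numpy as np

n = 1_000_000
values = np.arange(n)          # NumPy array holding 0 .. n-1
values_list = list(range(n))   # equivalent plain Python list

# Vectorized: a single expression, the loop runs in compiled code
start = time.perf_counter()
doubled_array = values * 2
array_seconds = time.perf_counter() - start

# Pure Python: an explicit loop over every element
start = time.perf_counter()
doubled_list = [v * 2 for v in values_list]
list_seconds = time.perf_counter() - start

print(f"NumPy: {array_seconds:.4f} s, list: {list_seconds:.4f} s")
```

Both approaches produce the same numbers; only the time taken differs.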

## The NumPy ndarray: A Multidimensional Array Object

- One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast, flexible container for large datasets in Python.
- Arrays enable you to perform mathematical operations on whole blocks of data.

In [2]:
import numpy as np

In [2]:
#generate random data

In [5]:
data = np.random.rand(1)  # generates one random value drawn uniformly from [0, 1)

In [4]:
data

array([0.96908927])

In [7]:
print(np.random.rand(5))  # prints 5 random values drawn uniformly from [0, 1)

[0.03732057 0.80391374 0.67015069 0.86161672 0.91889504]


In [21]:
data2 = np.random.rand(3 , 3)  # generates random values in [0, 1) and arranges them in a 3x3 array
data2

array([[0.77811185, 0.40545395, 0.78255862],
       [0.92206072, 0.20131135, 0.82882704],
       [0.82399096, 0.04995174, 0.55542524]])

In [11]:
data * 10  # multiplies every element of the array by 10

array([7.22584447])

In [13]:
data + data  # adds the array to itself, element by element

array([1.44516889])

- An *ndarray* is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type.
- Every array has a shape, a tuple indicating the size of each dimension, and a *dtype*, an object describing the data type of the array:

In [16]:
data.shape  # gives a tuple with the size of each dimension (rows and columns for a 2D array)

(1,)

In [22]:
data2.shape

(3, 3)

In [24]:
data2.dtype  # gives the data type stored in the array

dtype('float64')

## Creating ndarrays


- The easiest way to create an array is to use the array function.
- This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data.

In [25]:
lst = [2 , 5 , 89 , 45 , 10 , 3]

In [29]:
arr1 = np.array(lst)  # passing a list to the array function to create an array

In [31]:
arr1

array([ 2,  5, 89, 45, 10,  3])

Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:

In [32]:
lst2 = ([[1 , 2 , 3], [4 , 5 , 6]])

In [33]:
arr2 = np.array(lst2)

In [34]:
arr2

array([[1, 2, 3],
       [4, 5, 6]])

In [36]:
arr2.ndim  # gives the number of dimensions (axes) of the array

2

In [38]:
arr2.shape  # gives the rows and the columns 

(2, 3)

In [41]:
np.zeros(5) # generates an array of 5 zeros

array([0., 0., 0., 0., 0.])

In [44]:
np.zeros((3 , 3))  # generates an m x n array of zeros (the shape must be passed as a tuple)

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [49]:
np.empty((3 , 3))  # allocates an array without initializing its values (the shape must be passed as a tuple)

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

**It’s not safe to assume that np.empty will return an array of all zeros. In some cases, it may return uninitialized “garbage” values.**
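A minimal sketch of how to use np.empty safely, assuming the usual pattern of filling the buffer before reading it. When a known starting value is needed, np.zeros or np.full is the safer choice. The names `buffer`, `zeros` and `sevens` are only illustrative.

```python
import numpy as np

buffer = np.empty((2, 3))       # contents are arbitrary until assigned
buffer[:] = 7.5                 # overwrite every element before reading it

zeros = np.zeros((2, 3))        # guaranteed to contain only 0.0
sevens = np.full((2, 3), 7.5)   # guaranteed to contain only 7.5

print(buffer)
print(zeros)
```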

In [51]:
np.arange(15) # generates an array of integers from 0 up to, but not including, 15

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [54]:
np.eye(3)  # generates a 3x3 identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

**Summary table**

![image.png](attachment:image.png)

***

## Data Types for ndarrays

- The data type, or dtype, is a special object containing the information (or metadata, data about data) that the ndarray needs to interpret a chunk of memory as a particular type of data.

In [56]:
array1 = np.array([1 , 2 , 3], dtype = np.int32) # explicitly requests 32-bit integers

In [57]:
array1.dtype

dtype('int32')

In [58]:
array2 = np.array([4 , 5 , 6], dtype = np.float64) # explicitly requests 64-bit floats

In [59]:
array2.dtype

dtype('float64')

In [60]:
array2

array([4., 5., 6.])

**Summary table of NumPy data types**

![image.png](attachment:image.png)

- You can explicitly convert or cast an array from one dtype to another using ndarray’s *astype*  method

In [62]:
arr = np.array([99, 98, 97])
arr

array([99, 98, 97])

In [63]:
arr.dtype

dtype('int32')

In [64]:
float_arr = arr.astype(np.float64)  # converts the array elements to float
float_arr

array([99., 98., 97.])

In [65]:
float_arr2 = np.array([1.25, 6.57, 5.69])
float_arr2

array([1.25, 6.57, 5.69])

In [66]:
float_arr2.dtype

dtype('float64')

In [76]:
int_arr = float_arr2.astype(np.int64) # converts the float array into an int array; the decimal part is truncated
int_arr


array([1, 6, 5], dtype=int64)

In [69]:
int_arr.dtype

dtype('int64')

In [71]:
string_arr = np.array(['1.25', '3.25', '6.25'])
string_arr.dtype

dtype('<U4')

In [74]:
float_arr3 = string_arr.astype(np.float64)  # converting strings of numbers into floats
float_arr3

array([1.25, 3.25, 6.25])

- If casting fails for some reason (like a string that cannot be converted to float64), a ValueError is raised.
- Instead of writing *np.float64*, plain *float* will also do (the same goes for *int*), because NumPy aliases the Python types to its own equivalent dtypes.
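The two points above can be sketched as follows. The array contents are made up for illustration; the second cast is expected to fail because "oops" is not a number.

```python
import numpy as np

numeric_strings = np.array(["1.25", "3.25", "6.25"])
as_float = numeric_strings.astype(float)   # Python's float is accepted in place of np.float64

bad_strings = np.array(["1.25", "oops"])
try:
    bad_strings.astype(np.float64)
    cast_failed = False
except ValueError as err:
    cast_failed = True
    print("cast failed:", err)
```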

## Arithmetic with NumPy Arrays

- Arrays are important because they enable you to express batch operations on data without writing any *for loops*.
- Any arithmetic operation between equal-size arrays applies the operation element-wise:

In [79]:
arr = np.arange(9).reshape(3,3)

In [80]:
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [81]:
arr * arr

array([[ 0,  1,  4],
       [ 9, 16, 25],
       [36, 49, 64]])

In [82]:
arr + arr

array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

In [83]:
arr - arr


array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

- Arithmetic operations between an array and a scalar (a single number, as opposed to the arrays used above) propagate the scalar to each element in the array:

In [84]:
1/arr  # divides 1 by each element; the 0 at position [0, 0] gives inf and triggers a RuntimeWarning


array([[       inf, 1.        , 0.5       ],
       [0.33333333, 0.25      , 0.2       ],
       [0.16666667, 0.14285714, 0.125     ]])

**In plain Python, 1/0 raises a ZeroDivisionError (0/1 is simply 0.0), but NumPy only emits a RuntimeWarning and returns inf for that element.**
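If the warning is unwanted, one option is np.errstate, which changes the floating-point error handling for a single block. Another is to skip the zero entries with the `where` argument of np.divide. This is a sketch; the choice of 0.0 as the fill value in `safe` is an assumption made for the example.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

# Silence the divide-by-zero warning inside this block only
with np.errstate(divide="ignore"):
    reciprocal = 1 / arr          # element [0, 0] becomes inf

# Divide only where arr is non-zero; other positions keep the value from `out`
safe = np.divide(1.0, arr, out=np.zeros(arr.shape), where=arr != 0)

print(reciprocal)
print(safe)
```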

In [85]:
arr * 0.5

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])

arr2 = np.array([[2 , 5 , 1], [1 , 8, 10], [54, 20, 2]])

In [93]:
arr2.shape

(3, 3)

In [95]:
arr > arr2  # compares the arrays element-wise: True where the element of arr is greater than that of arr2, else False

array([[False, False,  True],
       [ True, False, False],
       [False, False,  True]])

## Basic Indexing and Slicing

- NumPy array indexing is a rich topic, as there are many ways you may want to select a subset of your data or individual elements.

In [96]:
arr = np.arange(5)

In [97]:
arr

array([0, 1, 2, 3, 4])

In [99]:
arr[2]   # select the element at index 2

2

In [102]:
arr[1:3]  # selects the elements from index 1 (included) up to index 3 (excluded)

array([1, 2])

In [105]:
arr[0:3] = 10  # sets the elements at indices 0, 1 and 2 to 10 (index 3 is excluded)

In [104]:
arr

array([10, 10, 10,  3,  4])

In [106]:
arr[0:4] = 1

In [107]:
arr

array([1, 1, 1, 1, 4])

An important first distinction from Python’s built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.

In [108]:
arr2 = np.arange(10)

In [109]:
arr2


array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [112]:
arr_slice = arr2[5:9]  # extracts the elements from index 5 up to index 9, excluding the element at index 9

In [111]:
arr_slice

array([5, 6, 7, 8])

In [113]:
arr_slice[1] = 100  # updating the value at index 1 

In [114]:
arr_slice

array([  5, 100,   7,   8])

In [116]:
arr2  # the update made through arr_slice is reflected in the original array as well

array([  0,   1,   2,   3,   4,   5, 100,   7,   8,   9])

In [119]:
arr2[:] = 100  # assigns 100 to every element of the array

In [120]:
arr2

array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100])

As NumPy has been designed to be able to work with very large arrays, you could imagine performance and memory problems if NumPy insisted on always copying data.
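When an independent copy of a slice is needed, it has to be requested explicitly with the copy method. A minimal sketch with illustrative names; np.shares_memory is used only to confirm which arrays point at the same data.

```python
import numpy as np

original = np.arange(10)

view = original[5:9]                 # a view: shares memory with original
independent = original[5:9].copy()   # a copy: owns its own data

independent[0] = -1   # does not touch original
view[1] = 100         # changes original[6]

print(original)
print(np.shares_memory(original, view), np.shares_memory(original, independent))
```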

In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:

In [121]:
arr = np.arange(9).reshape(3 , 3)

In [122]:
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [124]:
arr[0]  # returns a whole 1D array instead of a single element

array([0, 1, 2])

In [127]:
arr[0][1] # returns the element at index 1 of the first row, i.e. first row, second column
          # (indexing starts from 0)

1

In [130]:
arr[0 , 1]  # efficient way of doing the same thing

1

### Indexing with slices

Like one-dimensional objects such as Python lists, ndarrays can be sliced with the familiar syntax:

In [131]:
arr = np.arange(10)

In [132]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [134]:
arr[0:5]  # elements from 0 to 5 (excluding element at 5)

array([0, 1, 2, 3, 4])

In [135]:
arr [1:2]

array([1])

In [136]:
arr3 = np.arange(9).reshape(3 ,3)

In [137]:
arr3

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [140]:
arr3[1:]  # gives the rows from row 1 to the end (all columns are included because the column index is omitted)

array([[3, 4, 5],
       [6, 7, 8]])

In [154]:
arr3[:, 1:]  # gives the columns from column 1 to the end (all rows are included because the row slice is left blank)

array([[1, 2],
       [4, 5],
       [7, 8]])

**- arr[ x: , : ] means all the rows from row x to the end**

**- arr[ :x , : ] means all the rows up to, but excluding, row x**

**- arr[ : , x: ] means all the columns from column x to the end**

**- arr[ : , :x ] means all the columns up to, but excluding, column x**
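The four patterns can be checked on a small array. This sketch uses a 3x4 array named `grid` (a name chosen for this example) so that rows and columns are easy to tell apart.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# grid is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

rows_from_1 = grid[1:, :]     # rows 1 and 2, all columns
rows_before_2 = grid[:2, :]   # rows 0 and 1, all columns
cols_from_2 = grid[:, 2:]     # all rows, columns 2 and 3
cols_before_3 = grid[:, :3]   # all rows, columns 0, 1 and 2

print(rows_from_1.shape, rows_before_2.shape, cols_from_2.shape, cols_before_3.shape)
```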

In [173]:
arr3

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [178]:
arr3[1:, 1:] 

array([[4, 5],
       [7, 8]])

In [181]:
arr3[1:, 1:] = 0  # assigning a single value to the whole sliced section

In [180]:
arr3

array([[0, 1, 2],
       [3, 0, 0],
       [6, 0, 0]])

In [184]:
arr3[:2, :2]  # getting the top left corner 

array([[0, 1],
       [3, 0]])

In [187]:
arr3[:2, 1:]  # getting the top right corner 

array([[1, 2],
       [0, 0]])

In [189]:
arr3[1: , :2]  # getting the bottom left corner 

array([[3, 0],
       [6, 0]])

In [191]:
arr3[1:, 1:]  # getting the bottom right corner 

array([[0, 0],
       [0, 0]])

## Fancy Indexing

[Fancy Indexing Tutorial](https://www.youtube.com/watch?v=iTL6g2yfBzU)

In [195]:
arr = np.empty((10 , 5)) # creating an uninitialized array (its contents are arbitrary)

In [196]:
arr

array([[             nan,  6.79038653e-310,  6.12641401e-321,
         4.94065646e-324,  4.48611606e-321],
       [ 2.03060980e-321,  4.96041908e-321,  2.03060980e-321,
         4.98018171e-321,  5.02034522e+175],
       [ 5.13828272e-321,  1.39838039e-076,  2.70747974e-321,
         1.14354099e-071,  5.98807563e-321],
       [ 1.12855799e+277,  6.22522714e-321,  3.77716546e+233,
         6.12641401e-321,  8.37170362e-144],
       [ 3.33988377e-321,  9.30537139e+199,  6.34380289e-321,
         9.15563409e-072,  5.69163624e-321],
       [ 2.34352921e-056,  4.26416588e-096,  8.37170584e-144,
         7.72819855e-091,  3.22241147e-057],
       [ 2.32023351e-052,  5.74020278e+180,  8.37174974e-144,
         5.81224723e+294,  0.00000000e+000],
       [-3.25953926e-311,  8.13123409e-312,  9.68368666e-322,
         0.00000000e+000,  0.00000000e+000],
       [ 0.00000000e+000,  5.02034658e+175,  1.21540734e-046,
         3.53852369e-057,  9.60790485e-071],
       ...])

In [199]:
for i in range(10):  # assigning values with a for loop; row i is filled with the number i
    arr[i] = i

In [200]:
arr

array([[0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9.]])

Now, if you want the row of all 2s or all 5s, pass the indices of the rows you want to retrieve as a list; the list basically acts as a filter:

In [202]:
arr[[2 , 5]]  # gives you only rows 2 and 5

array([[2., 2., 2., 2., 2.],
       [5., 5., 5., 5., 5.]])

In [203]:
arr[[9, 1, 5, 4]]

array([[9., 9., 9., 9., 9.],
       [1., 1., 1., 1., 1.],
       [5., 5., 5., 5., 5.],
       [4., 4., 4., 4., 4.]])

Negative indices can also be passed; they select rows counting from the bottom:

In [205]:
arr[[-1 , -8]]  # negative indices start from -1 (the last row) and count back from the bottom of the array

array([[9., 9., 9., 9., 9.],
       [2., 2., 2., 2., 2.]])

In [206]:
arr[[-1, -10]]

array([[9., 9., 9., 9., 9.],
       [0., 0., 0., 0., 0.]])

In [207]:
arr1 = np.arange(40).reshape(8 , 5)

In [208]:
arr1

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39]])

Now, if you want to extract one or more particular elements from a 2D array, pass in two lists: one with the row indices and one with the column indices of the elements. For example, if you index with
**arr[[a, b, c], [e, f, g]]**
then the elements at positions *(a, e), (b, f), (c, g)* are returned.

In [209]:
# Extracting 3, 22, 12, 30 through fancy indexing

In [211]:
# 3 = (0,3) , 22 = (4,2) , 12 = (2,2), 30 = (6,0)

arr1[[0 , 4 , 2 , 6], [3, 2, 2, 0]]  # first list is of rows, second is of columns 

array([ 3, 22, 12, 30])

In [212]:
arr1[[0 , 1 , 6, 7], [2, 4, 1, 0]]

array([ 2,  9, 31, 35])

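When one index array is passed per axis, as here, the result of fancy indexing is one-dimensional regardless of how many dimensions the array has (here, only 2): it holds one element per index pair. Unlike slicing, fancy indexing always copies the data into a new array.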
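To select a rectangular block (every chosen row crossed with every chosen column) instead of individual index pairs, np.ix_ can be used. The sketch below contrasts the two and also shows that the fancy-indexed result is a copy. The names `table`, `picked` and `block` are illustrative.

```python
import numpy as np

table = np.arange(40).reshape(8, 5)

# Paired index arrays: picks (0, 3), (4, 2), (2, 2), (6, 0), giving a 1-D result
picked = table[[0, 4, 2, 6], [3, 2, 2, 0]]

# np.ix_ crosses rows 1, 5, 7 with columns 0 and 2, giving a 2-D result
block = table[np.ix_([1, 5, 7], [0, 2])]

# The result is a copy, so editing it leaves the source array untouched
picked[0] = -1

print(picked)
print(block)
print(table[0, 3])
```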

## Transposing Arrays and Swapping Axes

- Transposing is a special form of reshaping that similarly returns a view on the underlying data without copying anything
- Arrays have the *transpose* method and also the special T attribute

In [215]:
arr4 = np.arange(16).reshape(4 , 4)
arr4

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [218]:
arr4.T  # transposes the array, i.e. rows become columns and vice versa

array([[ 0,  4,  8, 12],
       [ 1,  5,  9, 13],
       [ 2,  6, 10, 14],
       [ 3,  7, 11, 15]])

When doing matrix computations, you may do this very often—for example, when computing the inner matrix product using *np.dot:*

In [220]:
np.dot(arr4, arr4.T)  # computes the matrix product of the two 2D arrays

array([[ 14,  38,  62,  86],
       [ 38, 126, 214, 302],
       [ 62, 214, 366, 518],
       [ 86, 302, 518, 734]])

In [221]:
np.dot(arr4, arr4)

array([[ 56,  62,  68,  74],
       [152, 174, 196, 218],
       [248, 286, 324, 362],
       [344, 398, 452, 506]])
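A short sketch confirming two points from this section: the transpose is a view on the same data, and for 2D arrays the `@` operator gives the same matrix product as np.dot. The name `m` is illustrative.

```python
import numpy as np

m = np.arange(16).reshape(4, 4)

mt = m.T
is_view = np.shares_memory(m, mt)   # the transpose shares memory with m

gram_dot = np.dot(m, m.T)   # matrix product via np.dot
gram_at = m @ m.T           # same product via the @ operator

print(is_view)
print(gram_at)
```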

## Universal Functions: Fast Element-Wise Array Functions

- A universal function, or ufunc, is a function that performs element-wise operations on data in ndarrays.
- You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results. 
- Many ufuncs are simple element-wise transformations, like *sqrt* or *exp*

In [4]:
arr = np.arange(9).reshape(3,3)

In [5]:
arr

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [7]:
np.sqrt(arr)   # computes the square root of every element in the array

array([[0.        , 1.        , 1.41421356],
       [1.73205081, 2.        , 2.23606798],
       [2.44948974, 2.64575131, 2.82842712]])

In [9]:
np.exp(arr)  # computes the exponential e**x of every element in the array

array([[1.00000000e+00, 2.71828183e+00, 7.38905610e+00],
       [2.00855369e+01, 5.45981500e+01, 1.48413159e+02],
       [4.03428793e+02, 1.09663316e+03, 2.98095799e+03]])

These are referred to as unary ufuncs. Others, such as add or maximum, take two arrays (thus, binary ufuncs) and return a single array as the result:

In [10]:
x = np.random.randn(10)
y = np.random.randn(10)

In [11]:
x

array([-0.01593979, -0.5895287 ,  0.01869394,  1.0417857 , -1.60525499,
       -0.36324334,  0.81629869,  2.21079897,  0.83062258,  0.78688076])

In [12]:
y

array([-1.61906909, -0.81422874, -0.24264635, -1.1662876 , -0.45601508,
        0.87840636,  1.32362918, -0.64425527,  1.10270287,  0.0153084 ])

In [14]:
np.maximum(x , y)  # compares the two arrays element-wise and stores the larger of each pair in a new array

array([-0.01593979, -0.5895287 ,  0.01869394,  1.0417857 , -0.45601508,
        0.87840636,  1.32362918,  2.21079897,  1.10270287,  0.78688076])

- While not common, a ufunc can return multiple arrays.
- modf is one example, a vectorized version of Python's *math.modf* (loosely comparable to the built-in divmod): it returns the fractional and integral parts of a floating-point array.

In [15]:
z = np.random.randn(10) * 5

In [16]:
z

array([-1.45913615,  9.21802084,  9.56331663, -3.8720521 ,  5.74154132,
       -7.12671803, -4.14697808, -5.5548274 ,  3.83992644, 11.12152102])

In [20]:
fraction_part ,  whole_part = np.modf(z)  # unpacking into two variables, since the function returns two arrays

In [18]:
fraction_part # displaying the fractional parts returned by the function

array([-0.45913615,  0.21802084,  0.56331663, -0.8720521 ,  0.74154132,
       -0.12671803, -0.14697808, -0.5548274 ,  0.83992644,  0.12152102])

In [21]:
whole_part  # displaying the whole-number parts returned by the function



array([-1.,  9.,  9., -3.,  5., -7., -4., -5.,  3., 11.])

In [22]:
d = np.random.randn(8) * 5

In [23]:
d

array([ -4.69483998,  -3.20425347,   4.9939706 ,  -9.9362727 ,
         7.2605267 ,  -5.00957067, -11.05240088,   0.9720298 ])

In [28]:
fractional_part , whole_part = np.modf(d) 

In [29]:
whole_part

array([ -4.,  -3.,   4.,  -9.,   7.,  -5., -11.,   0.])

In [30]:
fractional_part


array([-0.69483998, -0.20425347,  0.9939706 , -0.9362727 ,  0.2605267 ,
       -0.00957067, -0.05240088,  0.9720298 ])

**np.modf returns the fractional parts first and the whole-number parts second, so the result must be unpacked in that order; i.e. it should always be:**

***fractional_part , whole_part = np.modf( )* and not the other way around**
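A small deterministic check of the return order, using values chosen for this example because they are exactly representable in floating point. Note that both parts carry the sign of the input.

```python
import numpy as np

values = np.array([-1.5, 9.25, 3.0, -7.125])

fractional_part, whole_part = np.modf(values)   # fractional parts come first

# Adding the two parts back together reproduces the input
reconstructed = fractional_part + whole_part

print(fractional_part)
print(whole_part)
```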


Summary table of unary functions

![image.png](attachment:image.png)

***

## Expressing Conditional Logic as Array Operations

- The numpy.where function is a vectorized version of the ternary expression x if condition else y.
- Suppose there is a boolean array and two ordinary arrays, arr1 and arr2, and we want to take the value from arr1 wherever the boolean array is True and from arr2 otherwise. This is exactly what np.where() does.
- It is like the ternary operator in Java 

arr1 = [1 , 2 , 3] , arr2 = [4 , 5 , 6] , bool_arr = [True, False, True]

arr_result = [1 , 5, 3] -> each element comes from arr1 where bool_arr is True, otherwise from arr2
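The small example above can be written out both ways: as a Python list comprehension with one ternary expression per element, and as a single np.where call. The names `looped` and `vectorized` are illustrative.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
bool_arr = np.array([True, False, True])

# Pure Python: one ternary expression per element
looped = [a if flag else b for a, b, flag in zip(arr1, arr2, bool_arr)]

# Vectorized: the same selection in a single call
vectorized = np.where(bool_arr, arr1, arr2)

print(looped)
print(vectorized)
```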

In [31]:
arr1 = np.arange(9).reshape(3 , 3)
arr1

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [34]:
arr2 = np.arange(1,10).reshape(3 , 3) * 5
arr2

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

In [14]:
boolean_array = np.array([True, False, True, True, False, True, False, False, False]).reshape(3 , 3)

Syntax of np.where()  -> np.where(condition ,  what to do if true , what to do if false)

In [41]:
result = np.where(boolean_array, arr1, arr2) # take the value from arr1 where boolean_array is True, else take it from arr2

In [42]:
result

array([[ 0, 10,  2],
       [ 3, 25,  5],
       [35, 40, 45]])

In [43]:
arr3 = np.random.randn(9).reshape(3 , 3)
arr3

array([[-0.84452518,  0.01214621, -0.26625194],
       [-0.91120457, -0.59808645, -0.85628334],
       [ 0.7794432 ,  0.05186561,  1.5581357 ]])

In [45]:
arr4 = np.random.randn(9).reshape(3 , 3)
arr4

array([[-0.88077941,  2.26967609,  0.94107124],
       [-0.80255878, -1.61629225, -0.37419459],
       [ 2.11361942,  1.0764008 ,  1.16471868]])

In [48]:
result = np.where(arr3 > arr4, 0 , 100) # compare 2 arrays and fill 0 if element of array3 > array4; else fill 100

In [49]:
result 

array([[  0, 100, 100],
       [100,   0, 100],
       [100, 100,   0]])

In [52]:
result = np.where(arr3 > 0, arr, 0) # take the value from arr where arr3 is positive, else fill in 0

In [53]:
result 

array([[0, 1, 0],
       [0, 0, 0],
       [6, 7, 8]])

## Mathematical and Statistical Methods

In [56]:
arr = np.random.randn(4 , 4)

In [57]:
arr

array([[-1.11001899, -0.64181847,  0.13564078, -0.73726988],
       [-1.39344235,  1.03827315, -1.1343597 , -1.81638529],
       [-0.33016285,  2.17487118, -0.06375906,  0.24557283],
       [ 0.52077522,  0.51365897,  0.37956285,  0.9928947 ]])

In [60]:
arr.mean()   # computes the mean of the array

-0.07662293193514762

In [61]:
np.mean(arr)

-0.07662293193514762

In [63]:
arr.sum()  # computes the sum of the array

-1.2259669109623619

Functions like mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one fewer dimension

In [65]:
arr.sum(axis = 1) # sums across the columns of each row; index 0 of the result is the sum of the first row

array([-2.35346657, -3.30591419,  2.02652211,  2.40689174])

In [66]:
arr.mean(axis = 1) # calculates the mean across the columns of each row; index 0 of the
                   # result array is the mean of the first row

array([-0.58836664, -0.82647855,  0.50663053,  0.60172293])

**Here, arr.mean(1) means “compute mean across the columns” where arr.sum(0) means “compute sum down the rows.”**
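The axis argument is easier to follow on a small array with known values. In this sketch (the `scores` array is invented for the example), axis=0 collapses the rows and axis=1 collapses the columns; keepdims=True keeps the reduced axis with length 1.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)

col_sums = scores.sum(axis=0)     # collapses the rows: one value per column
row_means = scores.mean(axis=1)   # collapses the columns: one value per row
row_means_2d = scores.mean(axis=1, keepdims=True)   # same values, shape (2, 1)

print(col_sums)
print(row_means)
print(row_means_2d.shape)
```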

In [67]:
arr = np.array([1 , 5 , 8 , 9 , 10 , 99 , 4])

In [69]:
np.cumsum(arr)  # calculates the cumulative (running) sum: each element is the sum of all elements up to that position

array([  1,   6,  14,  23,  33, 132, 136], dtype=int32)

In [4]:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

In [5]:
arr


array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [9]:
np.cumsum(arr, axis = 0)  # calculates the cumulative sum of each column from the top to the bottom

array([[ 0,  1,  2],
       [ 3,  5,  7],
       [ 9, 12, 15]], dtype=int32)

In [11]:
np.cumsum(arr, axis = 1) # calculates the cumulative sum of each row from the left to right 

array([[ 0,  1,  3],
       [ 3,  7, 12],
       [ 6, 13, 21]], dtype=int32)

In [12]:
np.cumprod(arr, axis = 1) #  calculates the cumulative product of each row from the left to right 

array([[  0,   0,   0],
       [  3,  12,  60],
       [  6,  42, 336]], dtype=int32)

In [13]:
np.cumprod(arr, axis = 0) #  calculates the cumulative product of each column from top to bottom

array([[ 0,  1,  2],
       [ 0,  4, 10],
       [ 0, 28, 80]], dtype=int32)

Summary Table 

![image.png](attachment:image.png)

***

## Methods for Boolean Arrays

Boolean values are coerced to 1 (True) and 0 (False) in the preceding methods

In [6]:
arr = np.random.randn(100).reshape(20 , 5)
arr

array([[ 7.32348410e-01,  1.38801774e+00,  1.13247098e+00,
         1.09149798e+00, -2.72228404e+00],
       [ 1.41652627e-01,  3.54491518e-01, -2.04118622e+00,
        -7.36739208e-01, -1.18840012e-01],
       [ 7.44830928e-01, -1.09227971e+00,  8.25878775e-01,
         7.90909543e-02,  4.07867519e-01],
       [ 7.49269531e-01,  2.01776978e+00, -1.44791873e-01,
        -3.49168330e-01,  5.14464212e-01],
       [ 2.27027071e-01,  1.43676804e+00, -7.70967632e-01,
        -1.76831246e+00,  3.55088417e-01],
       [-1.03389404e+00, -1.64469212e+00,  9.89418086e-01,
         6.24754940e-01,  8.21430981e-01],
       [-3.07025642e-01, -1.04800476e+00,  1.34907328e+00,
        -2.26893483e-01,  1.24917522e-03],
       [ 1.26581569e+00, -4.53139347e-01, -1.10425128e-01,
        -9.96187236e-02, -3.85077258e-01],
       [ 8.30123535e-01,  1.28336371e+00,  1.45493593e+00,
         1.28106337e-01, -5.45609413e-01],
       [ 1.76021952e-02, -2.72530930e-01, -2.97055911e-01,
         5.19788237e-01, ...],
       ...])

In [4]:
result = (arr > 0).sum()  # counts the positive elements in the array (each True counts as 1)
result

47

In [9]:
result = (arr < 0).sum() # counts the negative elements in the array
result

44

In [10]:
mean = (arr > 0).mean()  # calculates the fraction of elements that are positive

In [11]:
mean

0.56

In [12]:
mean = (arr < 0).mean()  # calculates the fraction of elements that are negative

In [13]:
mean

0.44
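Because the cells above use random data, here is the same idea on a small fixed array (the `readings` values are invented for the example). Summing a boolean mask counts the True values, taking its mean gives their share, and indexing with the mask is what actually sums the selected elements.

```python
import numpy as np

readings = np.array([-2.0, 0.5, 3.1, -0.7, 4.2])
mask = readings > 0

n_positive = mask.sum()                   # number of positive elements
share_positive = mask.mean()              # fraction of positive elements
sum_of_positives = readings[mask].sum()   # sum of the positive values themselves

print(n_positive, share_positive, sum_of_positives)
```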

- There are two additional methods, any and all, useful especially for boolean arrays.
- *any* tests whether one or more values in an array is True
- while *all* checks if every value is True

In [17]:
boolean_array = np.array([True, False, True, True])
boolean_array.all()  # False because not every value is True

False

In [18]:
boolean_array.any()  # True because at least one value is True

True

In [20]:
boolean_array = np.array([True, True, True, True])
boolean_array.all()  # True because all the values are True

True

These methods also work with non-boolean arrays, where non-zero elements evaluate to True


In [33]:
arr = np.arange(9)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [31]:
arr.all() # False because one element is zero

False

In [24]:
arr.any()

True

In [28]:
arr = np.arange(1 , 10)
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [27]:
arr.all() # all the values in the array are non-zero, hence True

True

## Sorting

Like Python’s built-in list type, NumPy arrays can be sorted in-place with the sort method

In [34]:
arr = np.random.rand(25).reshape(5 , 5)

In [35]:
arr

array([[0.07259965, 0.41169117, 0.34568741, 0.49454511, 0.82568804],
       [0.20197946, 0.93345674, 0.0147684 , 0.2257458 , 0.78935302],
       [0.36263129, 0.96458006, 0.02510008, 0.20983304, 0.2412587 ],
       [0.60071682, 0.2227295 , 0.72285969, 0.3854583 , 0.85553789],
       [0.08075877, 0.50495592, 0.00725133, 0.67565866, 0.10684084]])

In [48]:
arr.sort()  # sorts each row in the ascending order 
arr

array([[0.07259965, 0.34568741, 0.41169117, 0.49454511, 0.82568804],
       [0.0147684 , 0.20197946, 0.2257458 , 0.78935302, 0.93345674],
       [0.02510008, 0.20983304, 0.2412587 , 0.36263129, 0.96458006],
       [0.2227295 , 0.3854583 , 0.60071682, 0.72285969, 0.85553789],
       [0.00725133, 0.08075877, 0.10684084, 0.50495592, 0.67565866]])

In [74]:
arr2 = np.array([[1 , 98 , 2], [55, 67, 5], [87, 32, 7]])
arr2

array([[ 1, 98,  2],
       [55, 67,  5],
       [87, 32,  7]])

In [75]:
arr2.sort()  # sorts each row in ascending order 

In [76]:
arr2

array([[ 1,  2, 98],
       [ 5, 55, 67],
       [ 7, 32, 87]])

In [77]:
arr2.sort(0)  # sorts each column in ascending order 

In [78]:
arr2

array([[ 1,  2, 67],
       [ 5, 32, 87],
       [ 7, 55, 98]])

In [81]:
arr2.sort(1)  # sorts each row (axis 1, same as the default); the rows are already sorted here, so nothing changes

In [82]:
arr2

array([[ 1,  2, 67],
       [ 5, 32, 87],
       [ 7, 55, 98]])

In [83]:
arr3 = np.random.randn(5 , 5)
arr3

array([[-0.62286201, -0.53627242, -0.03495898, -0.52696104, -0.41523113],
       [ 0.82681651, -1.77809106, -0.71362689, -0.07272242, -0.64488803],
       [ 1.20258865,  0.67688718,  1.78387351,  1.91254002,  0.57115513],
       [ 0.91499133, -0.37042607,  1.57859272,  0.50118628, -1.36174308],
       [-0.47496658,  0.08992814,  0.47455978,  2.17493878, -0.1263652 ]])

In [86]:
arr3.sort()  # sorts each row of the array in ascending order

In [87]:
arr3

array([[-0.62286201, -0.53627242, -0.52696104, -0.41523113, -0.03495898],
       [-1.77809106, -0.71362689, -0.64488803, -0.07272242,  0.82681651],
       [ 0.57115513,  0.67688718,  1.20258865,  1.78387351,  1.91254002],
       [-1.36174308, -0.37042607,  0.50118628,  0.91499133,  1.57859272],
       [-0.47496658, -0.1263652 ,  0.08992814,  0.47455978,  2.17493878]])

In [90]:
arr3.sort(0) # sorts each column of the array in ascending order

In [91]:
arr3

array([[-1.77809106, -0.71362689, -0.64488803, -0.41523113, -0.03495898],
       [-1.36174308, -0.53627242, -0.52696104, -0.07272242,  0.82681651],
       [-0.62286201, -0.37042607,  0.08992814,  0.47455978,  1.57859272],
       [-0.47496658, -0.1263652 ,  0.50118628,  0.91499133,  1.91254002],
       [ 0.57115513,  0.67688718,  1.20258865,  1.78387351,  2.17493878]])
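The sort method used above changes the array in place. When the original order must be kept, the top-level np.sort function returns a sorted copy instead. The sketch below also shows one common way to get descending order (reversing the sorted result) and np.argsort, which returns the sorting indices. The variable names are illustrative.

```python
import numpy as np

unsorted = np.array([[1, 98, 2],
                     [55, 67, 5],
                     [87, 32, 7]])

sorted_copy = np.sort(unsorted, axis=1)           # sorted copy; unsorted is left untouched
descending = np.sort(unsorted, axis=1)[:, ::-1]   # reverse each sorted row
order = np.argsort(unsorted[0])                   # indices that would sort the first row

print(sorted_copy)
print(descending)
print(order)
```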

## Unique and Other Set Logic

- NumPy has some basic set operations for one-dimensional ndarrays. A commonly used one is *np.unique*, which returns the sorted unique values in an array


In [92]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [97]:
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [98]:
np.unique(names)  # returns the sorted unique values of the array (i.e. no duplicates are returned)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [99]:
arr = np.array([1 , 5 , 66, 7 , 8 , 66 , 1 , 7 , 99, 5 , 5 , 66])

In [100]:
arr

array([ 1,  5, 66,  7,  8, 66,  1,  7, 99,  5,  5, 66])

In [102]:
np.unique(arr)  # unique elements

array([ 1,  5,  7,  8, 66, 99])

In [103]:
arr 

array([ 1,  5, 66,  7,  8, 66,  1,  7, 99,  5,  5, 66])

In [108]:
arr2 = np.array([1 , 5, 65, 47, 87 , 66, 12, 34, 54])

np.in1d tests whether each element of a 1-D array is also present in a second array. It returns a boolean array with the same length as the first array that is True where an element of the first array is in the second and False otherwise. The NumPy documentation recommends np.isin for new code.

In [110]:
result = np.in1d(arr , arr2) # for each element of arr, checks whether it is present in arr2: True if it is,
                            # False otherwise

In [111]:
result 

array([ True,  True,  True, False, False,  True,  True, False, False,
        True,  True,  True])

In [112]:
arr3 = np.array([1 , 2 , 3])

In [113]:
arr4 = np.array([3 , 4, 5])

In [114]:
np.in1d(arr3 , arr4) # each element of arr3 is tested for membership in arr4

array([False, False,  True])

In [116]:
np.in1d(arr4 , arr3)  # each element of arr4 is tested for membership in arr3

array([ True, False, False])
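For reference, the same membership test written with np.isin, together with a few of the other set functions NumPy offers for 1-D arrays. This is a sketch using the same small inputs as above, under the names `left` and `right`.

```python
import numpy as np

left = np.array([1, 2, 3])
right = np.array([3, 4, 5])

membership = np.isin(left, right)       # True where an element of left is in right
common = np.intersect1d(left, right)    # sorted values present in both arrays
combined = np.union1d(left, right)      # sorted values present in either array
only_left = np.setdiff1d(left, right)   # values in left that are not in right

print(membership, common, combined, only_left)
```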

Array Set Operations Summary Table 


![image.png](attachment:image.png)

## Linear Algebra

In [118]:
arr = np.array([[1 , 5 , 6], [4, 5 , 7], [88, 41, 65]])
arr

array([[ 1,  5,  6],
       [ 4,  5,  7],
       [88, 41, 65]])

In [119]:
arr2 = np.array([[5 , 88 , 64], [38, 7 , 4], [14, 52, 73]])
arr2

array([[ 5, 88, 64],
       [38,  7,  4],
       [14, 52, 73]])

In [121]:
arr.dot(arr2)  # computes the matrix product of the two arrays

array([[  279,   435,   522],
       [  308,   751,   787],
       [ 2908, 11411, 10541]])

In [123]:
np.dot(arr, arr2)  # gives the same result as arr.dot(arr2)

array([[  279,   435,   522],
       [  308,   751,   787],
       [ 2908, 11411, 10541]])

In [138]:
arr_ones = np.ones(3)    # 1D array of ones; the dot product of a 1D array with a 2D array is a 1D array

**When a 1D array is on the left of the dot product, its length must equal the number of rows of the 2D array; only then is the dot product possible, otherwise a ValueError is raised.**

In [139]:
arr

array([[ 1,  5,  6],
       [ 4,  5,  7],
       [88, 41, 65]])

In [140]:
arr_ones.dot(arr) 

array([93., 51., 78.])

In [141]:
arr4 = np.random.randn(3) 
arr4

array([ 0.29155993, -0.86450732, -1.03456882])

In [142]:
arr


array([[ 1,  5,  6],
       [ 4,  5,  7],
       [88, 41, 65]])

In [143]:
arr4.dot(arr)  # dot product of a random 1D array with a 2D array

array([-94.20852525, -45.28205845, -71.54916477])
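The shape rule for 1D-by-2D products can be checked directly. In this sketch a vector of ones on the left yields the column sums and on the right yields the row sums, while a vector of the wrong length raises a ValueError. The names `matrix` and `weights` are illustrative.

```python
import numpy as np

matrix = np.array([[1, 5, 6],
                   [4, 5, 7],
                   [88, 41, 65]])
weights = np.ones(3)

left_product = weights.dot(matrix)    # (3,) with (3, 3) gives (3,): the column sums
right_product = matrix.dot(weights)   # (3, 3) with (3,) gives (3,): the row sums

try:
    np.ones(4).dot(matrix)            # length 4 does not match the 3 rows
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(left_product, right_product, mismatch_raised)
```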

*numpy.linalg*  has a standard set of matrix decompositions and things like inverse and determinant

In [154]:
from numpy import linalg as la  # importing NumPy's linear algebra submodule

In [155]:
x = np.random.randn(4 , 4) # making a random array of 4 x 4 

In [146]:
x

array([[ 0.93770492,  1.1059701 , -0.00487374, -0.22185888],
       [-1.16306403,  0.06461573, -0.77982641, -1.96764805],
       [-1.66828622,  0.70283777,  1.22549572, -0.18953298],
       [ 0.11516718,  0.13257027, -0.75220915,  1.30296033]])

In [156]:
x.T        # finding the Transpose of x

array([[ 0.963362  , -0.33736306, -1.0071258 , -1.23918682],
       [ 0.81747814, -0.22056612, -0.05307619, -0.42963922],
       [ 0.30030429, -1.02286257, -0.60652685, -3.14607743],
       [-1.01706194, -0.03568138, -0.19776963, -0.08430614]])

In [157]:
result = x.T.dot(x)   # taking the matrix product of the transpose of x with x

In [149]:
result

array([[ 5.02845084, -0.19534545, -1.22868952,  2.54671603],
       [-0.19534545,  1.73890085,  0.70582485, -0.33298743],
       [-1.22868952,  0.70582485,  2.67581134,  0.32313466],
       [ 2.54671603, -0.33298743,  0.32313466,  5.65448859]])

In [170]:
la.inv(result)   # inverse of the result 

array([[  2.64393399, -13.25598782,   1.04546615,  -8.00754029],
       [-13.25598782, 126.61002364, -15.47852649,  84.43546899],
       [  1.04546615, -15.47852649,   2.24344724, -10.71329968],
       [ -8.00754029,  84.43546899, -10.71329968,  57.78699135]])

In [174]:
result.dot(la.inv(result))  # product of the result with its inverse; close to the identity matrix up to rounding error

array([[ 1.00000000e+00,  1.42108547e-14, -1.77635684e-15,
         0.00000000e+00],
       [ 1.77635684e-15,  1.00000000e+00,  0.00000000e+00,
        -3.55271368e-15],
       [ 0.00000000e+00, -2.84217094e-14,  1.00000000e+00,
         0.00000000e+00],
       [ 4.44089210e-16,  5.32907052e-15, -2.77555756e-16,
         1.00000000e+00]])

In [176]:
x

array([[ 0.963362  ,  0.81747814,  0.30030429, -1.01706194],
       [-0.33736306, -0.22056612, -1.02286257, -0.03568138],
       [-1.0071258 , -0.05307619, -0.60652685, -0.19776963],
       [-1.23918682, -0.42963922, -3.14607743, -0.08430614]])

In [182]:
x.trace()  

0.05196288678522215

In [184]:
iden = np.eye(4)

In [185]:
iden

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [187]:
iden.trace()

4.0

In [181]:
la.det(x)

0.27836455311406727

In [188]:
la.det(iden)

1.0
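Since the cells above use random matrices, here is a small fixed example of the same numpy.linalg functions, plus la.solve for a linear system. The matrix `a` and vector `b` are made up for illustration. Comparisons use np.allclose because floating-point results are rarely exact.

```python
import numpy as np
from numpy import linalg as la

a = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

a_inv = la.inv(a)
identity_check = np.allclose(a @ a_inv, np.eye(2))   # inverse times matrix is the identity

determinant = la.det(a)      # 4*3 - 1*2 = 10, up to rounding
solution = la.solve(a, b)    # solves a @ x = b without forming the inverse explicitly

print(identity_check, determinant, solution)
```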

Commonly used numpy.linalg functions

![image.png](attachment:image.png)

![image.png](attachment:image.png)

***