## <I>What is NumPy?</I>

NumPy stands for Numerical Python. It is a library for scientific computing that runs its operations in fast, compiled C code hidden inside it, which is what makes them so fast.

Regular Python lists are slow for numerical work because every element is a separate Python object handled by the interpreter. NumPy arrays are fast and memory-efficient because it is our beloved C under the hood doing all the heavy lifting.


Reference: https://wesmckinney.com/book/numpy-basics

In [1]:
import numpy as np

The following code compares the time it takes to double each element of a NumPy array with the time it takes to double each element of a Python list.

In [3]:
my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

def double_arr(a):
    return a * 2

def double_list(a):
    return [x * 2 for x in a]

%timeit double_arr(my_arr)
%timeit double_list(my_list)

747 μs ± 2.94 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
26 ms ± 849 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


The NumPy version runs in microseconds (about 747 μs), while the list version takes milliseconds (about 26 ms), which makes it roughly 35 times slower in this run.

### The np.ndarray Object

An ndarray is a generic multidimensional container for homogeneous data; that is, all of the elements must be the same type. Every array has a shape, a tuple indicating the size of each dimension, and a dtype, an object describing the data type of the array:



In [6]:
arr = np.array([
    [1 , 2 , 3],
    [-0.1, -0.2, -0.3]
])

# element-wise and scalar arithmetic
print(arr + arr)
print(arr * 10)

# shape and data type
print(arr.shape)
print(arr.dtype)

[[ 2.   4.   6. ]
 [-0.2 -0.4 -0.6]]
[[10. 20. 30.]
 [-1. -2. -3.]]
(2, 3)
float64


### Creating ndarrays 

Use the `np.array()` function to create an array from a sequence-like data structure such as a list:

In [11]:
ls = [0.1, -3.4, 5.6 , 10]

arr = np.array(ls)


print(arr.ndim)  # <-- number of dimensions: this is a 1-D array
print(arr.shape) # <-- size along each dimension: 4 elements along the only axis

1
(4,)


In [20]:
# 2D array 

arr1 = np.array(
    [
        [1 , 6 , 3],
        [2 , -5, -10]
    ]
)

print("Array =", arr1)
print("arr.ndim  =", arr1.ndim)
print("arr.shape =", arr1.shape)
print("Data type =", arr1.dtype)

Array = [[  1   6   3]
 [  2  -5 -10]]
arr.ndim  = 2
arr.shape = (2, 3)
Data type = int64


In addition to `numpy.array`, there are a number of other functions for creating new arrays. For example, `numpy.zeros` and `numpy.ones` create arrays of 0s or 1s, respectively, with a given length or shape. `numpy.empty` creates an array without initializing its values to any particular value. <b>To create a higher dimensional array with these functions, pass a tuple for the shape:
</b>


In [24]:
arr1 = np.zeros(10)
arr2 = np.zeros((10, 6)) # <--- for higher dimensional case, pass a tuple with the shape 
print("arr1 =", arr1)
print("arr2 =", arr2)

arr1 = [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
arr2 = [[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
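The same shape convention applies to `numpy.ones` and `numpy.empty`. A minimal sketch (the variable names are illustrative and not part of the original notebook):

```python
import numpy as np

ones_1d = np.ones(4)         # an integer gives a 1-D array of that length
ones_2d = np.ones((2, 3))    # a tuple gives one size per dimension
scratch = np.empty((2, 3))   # same shape, but the contents are uninitialized

print(ones_1d.shape, ones_2d.shape, scratch.shape)
print(ones_2d.dtype)  # float64 is the default dtype for these functions
```

Do not rely on the values inside `scratch`; only its shape and dtype are defined.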


In [29]:
arr_3 = np.zeros((2, 3, 5)) # <--- the first number is how many matrices are stacked; each one has shape (3, 5)
print("arr_3.ndim =", arr_3.ndim)
print("arr_3.shape =", arr_3.shape)
print("arr_3 =\n\n", arr_3)

arr_3.ndim = 3
arr_3.shape = (2, 3, 5)
arr_3 =

 [[[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]

 [[0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]
  [0. 0. 0. 0. 0.]]]


In [30]:
arr_4 = np.zeros((2 ,3, 4 , 5))
print("arr_4.ndim =", arr_4.ndim)
print("arr_4.shape =", arr_4.shape)
print("arr_4 =\n\n", arr_4)

arr_4.ndim = 4
arr_4.shape = (2, 3, 4, 5)
arr_4 =

 [[[[0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]]

  [[0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]]

  [[0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]]]


 [[[0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]]

  [[0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]]

  [[0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]
   [0. 0. 0. 0. 0.]]]]


`numpy.arange` is an array-valued version of the built-in Python range function:

In [31]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
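Like `range`, `numpy.arange` also accepts a start, a stop and a step. The short sketch below shows those forms; the variable names are only for illustration.

```python
import numpy as np

a = np.arange(2, 10)       # start, stop (the stop value is excluded)
b = np.arange(0, 10, 3)    # start, stop, step
c = np.arange(0, 1, 0.25)  # unlike range, a float step is allowed

print(a)  # values 2 through 9
print(b)  # values 0, 3, 6, 9
print(c)  # values 0, 0.25, 0.5, 0.75
```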

In [32]:
np.full((2, 3), dtype=int , fill_value=3)

array([[3, 3, 3],
       [3, 3, 3]])

In [35]:
np.eye(5) # <-- np.eye(n) creates an n by n identity matrix  

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [37]:
np.identity(5) # same result as np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

## Important Table: Array Creation Functions

| Function                  | Description                                                                                                                       |
| ------------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| **array**                 | Converts input data (list, tuple, array, or any sequence) to an `ndarray`. Infers dtype unless specified. Copies data by default. |
| **asarray**               | Converts input to an `ndarray`, but *avoids copying* if the input is already an `ndarray`.                                        |
| **arange**                | Like Python’s `range`, but returns an `ndarray` instead of a list.                                                                |
| **ones**, **ones_like**   | Creates an array filled with 1s. `ones_like` uses the shape and dtype of another array.                                           |
| **zeros**, **zeros_like** | Same idea as `ones` and `ones_like`, but filled with 0s.                                                                          |
| **empty**, **empty_like** | Allocates new arrays without initializing the entries. The values are arbitrary leftovers from memory, but allocation is fast.    |
| **full**, **full_like**   | Creates an array filled with a specified value. `full_like` uses another array’s shape and dtype.                                 |
| **eye**, **identity**     | Builds a square `N × N` identity matrix with 1s on the diagonal.                                                                  |
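A small sketch of the copy behaviour and the `_like` variants from the table. The array `base` and the other names are made up for this example.

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]])

copied = np.array(base)      # np.array copies its input by default
aliased = np.asarray(base)   # input is already an ndarray, so no copy is made

base[0, 0] = 99
print(copied[0, 0])     # 1, the copy is independent
print(aliased[0, 0])    # 99, same underlying data
print(aliased is base)  # True

filled = np.full_like(base, 7)            # same shape and dtype as base
zeros = np.zeros_like(base, dtype=float)  # the dtype can be overridden
print(filled.shape, filled.dtype == base.dtype, zeros.dtype)
```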



You can explicitly convert or cast an array from one data type to another using ndarray’s `astype` method:



In [40]:
arr = np.array([1, 2, 3 , 4])
print(arr)
print(arr.dtype)

float_arr = arr.astype(np.float64)

print(float_arr)
print(float_arr.dtype)

[1 2 3 4]
int64
[1. 2. 3. 4.]
float64


Note:
Calling `astype` always creates a new array (a copy of the data), even if the new data type is the same as the old data type.
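The sketch below illustrates the copy behaviour, plus two other common casts (float to integer, and numeric strings to float). The sample values are assumptions chosen for the example.

```python
import numpy as np

floats = np.array([3.7, -1.2, 0.5, 10.1])

same_type = floats.astype(np.float64)  # same dtype, but still a new array
same_type[0] = 0.0
print(floats[0])  # 3.7, the original is untouched

ints = floats.astype(np.int64)  # the fractional part is truncated toward zero
print(ints)  # [ 3 -1  0 10]

numeric_strings = np.array(["1.25", "-9.6", "42"])
as_floats = numeric_strings.astype(float)  # strings that represent numbers convert too
print(as_floats)
```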

Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operations between equal-size arrays apply the operation element-wise:


In [51]:
arr = np.array([[0.5, 0.2, 1.3], [4, 5.9 , 6]])
brr = np.asarray(arr**0.5)


brr > arr

array([[ True,  True, False],
       [False, False, False]])
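For a plainer view of element-wise arithmetic, here is a minimal sketch with two equal-size arrays and the loop it replaces. The arrays `a` and `b` are illustrative values.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = np.array([[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]])

print(a + b)   # element-wise sum
print(a * b)   # element-wise product, not matrix multiplication
print(1 / a)   # a scalar is combined with every element
print(b > 25)  # comparisons are element-wise too and give a Boolean array

# The equivalent pure-Python loop, for comparison
looped = [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
print(np.array_equal(looped, a + b))  # True
```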

### Basic Indexing and Slicing

In [54]:
arr = np.arange(10)
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

An important first distinction from Python's built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array.

In [66]:
arr = np.arange(10)
arr_slice = arr[5:8]
arr_slice[:] = 123456
arr

array([     0,      1,      2,      3,      4, 123456, 123456, 123456,
            8,      9])

If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the array—for example, arr[5:8].copy(). As you will see, pandas works this way, too.

In [68]:
arr = np.arange(10)
arr_slice = arr[5:8].copy()
arr_slice[:] = 123456

print(arr_slice)
print(arr)

[123456 123456 123456]
[0 1 2 3 4 5 6 7 8 9]
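If you are unsure whether you hold a view or a copy, `np.shares_memory` and the `.base` attribute can tell you. A small sketch, with a plain Python list shown for contrast:

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]          # basic slice: a view on arr
copy = arr[5:8].copy()   # explicit copy: independent data

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False
print(view.base is arr)             # True, the view points back at arr

# By contrast, slicing a Python list always copies
py_list = list(range(10))
py_slice = py_list[5:8]
py_slice[0] = 123456
print(py_list[5])  # 5, the list is unchanged
```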


<img src="axis.png" width="300">

In [69]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [73]:
# indexing in multi-dimensional arrays 

arr3d[0, 1]

array([4, 5, 6])

Slicing in multidimensional arrays:

In [82]:
arr2d = np.array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

arr2d[:,:2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [84]:
lower_dim_slice = arr2d[1, :2]
lower_dim_slice

array([4, 5])

Note that a colon by itself means to take the entire axis:

In [87]:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

In [89]:
arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

## Boolean Indexing 

In [90]:
names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])

print(names)
print(data)


['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']
[[  4   7]
 [  0   2]
 [ -5   6]
 [  0   0]
 [  1   2]
 [-12  -4]
 [  3   4]]


Suppose each name corresponds to a row in the data array and we wanted to select all the rows with the corresponding name "Bob". Like arithmetic operations, comparisons (such as ==) with arrays are also vectorized. Thus, comparing names with the string "Bob" yields a Boolean array:

In [95]:
names == "Bob"
print(names)
print(data[names == "Bob"])

print("\n")
print(data[names == "Bob", 1]) 
print("\n")
print(data[names == "Bob", 1:])

['Bob' 'Joe' 'Will' 'Bob' 'Will' 'Joe' 'Joe']
[[4 7]
 [0 0]]


[7 0]


[[7]
 [0]]


## Explanation 

1. <b>data[names == "Bob", 1]</b>

   This means:

   - Use the Boolean mask to select the rows.
   - Use column index 1 (the second column).

   So you're grabbing:

   - Row 0, col 1 → 7
   - Row 3, col 1 → 0

2. <b>data[names == "Bob", 1:]</b>

   Here's where NumPy does the classic "surprise, it still works" move.

   `1:` means all columns starting from index 1. But `data` only has two columns, col 0 and col 1, so `1:` selects column 1 only, as a slice rather than a single integer index.

   Slices preserve dimensions, so:

   - For row 0, columns `1:` → [7]
   - For row 3, columns `1:` → [0]

   NumPy keeps the last axis because slicing means "keep this dimension". The result therefore has shape (2, 1) instead of (2,).
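Checking the shapes makes the difference concrete. This sketch rebuilds the same `names` and `data` arrays so it can run on its own; `by_integer` and `by_slice` are illustrative names.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])

mask = names == "Bob"

by_integer = data[mask, 1]  # an integer index drops the column axis
by_slice = data[mask, 1:]   # a slice keeps the column axis

print(by_integer.shape)  # (2,)
print(by_slice.shape)    # (2, 1)
print(np.array_equal(by_slice[:, 0], by_integer))  # True, same values either way
```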

In [97]:
# Using the `~` operator 

cond = names == "Bob"

print(cond)

data[~(cond)]

[ True False False  True False False False]


array([[  0,   2],
       [ -5,   6],
       [  1,   2],
       [-12,  -4],
       [  3,   4]])

To select two of the three names, combine multiple Boolean conditions with Boolean arithmetic operators like & (and) and | (or):

The Python keywords `and` and `or` do not work with Boolean arrays. Use & (and) and | (or) instead.

In [100]:
mask = (names == "Bob") | (names == "Will")

print(mask)

data[mask]

[ True False  True  True  True False False]


array([[ 4,  7],
       [-5,  6],
       [ 0,  0],
       [ 1,  2]])
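To see why the keywords fail, here is a minimal sketch. `or` asks the whole array for a single truth value, which NumPy refuses to guess. The names `is_bob` and `is_will` are illustrative.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])

is_bob = names == "Bob"
is_will = names == "Will"

try:
    combined = is_bob or is_will  # `or` needs one truth value for the whole array
except ValueError as exc:
    print("ValueError:", exc)

combined = is_bob | is_will  # element-wise OR works
print(combined)

# The parentheses matter: `|` binds more tightly than `==`
also_combined = (names == "Bob") | (names == "Will")
print(np.array_equal(combined, also_combined))  # True
```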

### NOTE : 

Selecting data from an array by Boolean indexing and assigning the result to a new variable always creates a copy of the data, even if the returned array is unchanged.
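A short sketch of that difference, using a smaller made-up `data` array: reading through a mask gives a copy, while assigning through a mask changes the original in place.

```python
import numpy as np

data = np.array([[4, 7], [0, 2], [-5, 6]])

selected = data[data[:, 0] >= 0]  # Boolean selection returns a copy
selected[:] = -1
print(data)  # unchanged by the write to `selected`

data[data < 0] = 0  # assignment through a mask modifies data itself
print(data)
```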

In [101]:
# set all negative values in the data to 0:

data[data < 0] = 0

data

array([[4, 7],
       [0, 2],
       [0, 6],
       [0, 0],
       [1, 2],
       [0, 0],
       [3, 4]])

## Fancy Indexing
Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays. Suppose we had an 8 × 4 array:

In [106]:
arr = np.zeros((8,4))

for i in range(8):
    arr[i] = i 

print(arr)

[[0. 0. 0. 0.]
 [1. 1. 1. 1.]
 [2. 2. 2. 2.]
 [3. 3. 3. 3.]
 [4. 4. 4. 4.]
 [5. 5. 5. 5.]
 [6. 6. 6. 6.]
 [7. 7. 7. 7.]]


To select a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order:

In [108]:
arr[[4 , 2, 1, 0 ,7]] # <--- selects the rows in this order.. 

array([[4., 4., 4., 4.],
       [2., 2., 2., 2.],
       [1., 1., 1., 1.],
       [0., 0., 0., 0.],
       [7., 7., 7., 7.]])

In [109]:
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

Passing multiple index arrays does something slightly different; it selects a one-dimensional array of elements corresponding to each tuple of indices:

In [113]:
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [115]:
arr[[4, 5, 6], [0, 3, 3]] # <-- gives an array of arr[4, 0], arr[5, 3], arr[6, 3]

array([16, 23, 27])
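If you want the rectangular block formed by some rows and some columns instead of individual elements, two common approaches are chained indexing and `np.ix_`. A sketch, where the `rows` and `cols` lists are arbitrary choices for illustration:

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

pairs = arr[rows, cols]  # elements (1, 0), (5, 3), (7, 1), (2, 2)
print(pairs)             # [ 4 23 29 10]

block = arr[rows][:, cols]  # pick the rows first, then reorder the columns
print(block.shape)          # (4, 4)

same_block = arr[np.ix_(rows, cols)]  # np.ix_ builds the same selection in one step
print(np.array_equal(block, same_block))  # True

block[0, 0] = -1  # fancy indexing copies the data, so arr is unchanged
print(arr[1, 0])  # 4
```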

## Pseudorandom number generation

The `numpy.random` module supplements the built-in Python `random` module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions. Python's built-in `random` module samples only one value at a time, so `numpy.random` is well over an order of magnitude faster for generating very large samples.

For example, you can get a 4 × 4 array of samples from the standard normal distribution using `numpy.random.standard_normal`:

In [116]:
samples = np.random.standard_normal((4 , 4))
samples

array([[-1.23635473, -0.99871211,  0.20360771,  0.94153481],
       [ 0.74785826, -1.50846761, -2.56487061, -0.95042524],
       [ 0.72039396, -0.01331134,  2.12469812, -0.81293543],
       [ 1.72272384, -0.5976224 ,  0.09357443, -0.9016288 ]])
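A rough way to check the speed claim yourself is to time both approaches with `timeit`. This is only a sketch: the sample size `N`, the helper function names and the seed are arbitrary choices, and the timings depend on your machine. The last lines also show a seeded generator from `np.random.default_rng`, which gives reproducible samples.

```python
import random
import timeit

import numpy as np

N = 100_000  # arbitrary sample size for the comparison

def with_builtin():
    # one Python-level call per sample
    return [random.normalvariate(0, 1) for _ in range(N)]

def with_numpy():
    # a single call fills the whole array
    return np.random.standard_normal(N)

t_builtin = timeit.timeit(with_builtin, number=5)
t_numpy = timeit.timeit(with_numpy, number=5)
print(f"built-in random: {t_builtin:.4f} s, numpy.random: {t_numpy:.4f} s")

# A seeded generator produces the same samples on every run
rng = np.random.default_rng(seed=12345)
samples = rng.standard_normal((2, 3))
print(samples.shape)  # (2, 3)
```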

## Universal Functions 

In [126]:
arr = np.array([ 4.5146, -8.1079, -0.7909,  2.2474, -6.718 , -0.4084,  8.6237])
out = np.zeros_like(arr)

print(arr)

np.add(arr, 1, out=out)

print(out)

[ 4.5146 -8.1079 -0.7909  2.2474 -6.718  -0.4084  8.6237]
[ 5.5146 -7.1079  0.2091  3.2474 -5.718   0.5916  9.6237]


In [None]:
np.greater(arr, 5, out=out)
out

array([0., 0., 0., 0., 0., 0., 1.])

## Some Common Functions

| Function                                     | Description                                                             |
|----------------------------------------------|-------------------------------------------------------------------------|
| add                                          | Add corresponding elements in arrays                                    |
| subtract                                     | Subtract elements in second array from first array                      |
| multiply                                     | Multiply array elements                                                 |
| divide, floor_divide                         | Divide or floor divide (truncating the remainder)                       |
| power                                        | Raise elements in first array to powers indicated in second array       |
| maximum, fmax                                | Element-wise maximum; fmax ignores NaN                                  |
| minimum, fmin                                | Element-wise minimum; fmin ignores NaN                                  |
| mod                                          | Element-wise modulus (remainder of division)                            |
| copysign                                     | Copy sign of values in second argument to values in first argument      |
| greater, greater_equal, less, less_equal, equal, not_equal | Element-wise comparisons yielding Boolean arrays            |
| logical_and                                  | Element-wise truth value of AND (&) operation                           |
| logical_or                                   | Element-wise truth value of OR (\|) operation                           |
| logical_xor                                  | Element-wise truth value of XOR (^) operation                           |
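A few of the functions from the table in action. The input values are made up for illustration; the main point is the difference between `maximum` and `fmax` when NaN is present.

```python
import numpy as np

x = np.array([1.0, np.nan, -3.0])
y = np.array([2.0, 5.0, np.nan])

largest = np.maximum(x, y)       # NaN propagates: 2, nan, nan
largest_ignoring = np.fmax(x, y) # NaN is ignored when the other value is a number: 2, 5, -3
print(largest)
print(largest_ignoring)

print(np.floor_divide(7, 2), np.mod(7, 2))  # 3 1

signed = np.copysign([1.0, 2.0], [-1.0, 1.0])  # magnitudes from the first, signs from the second
print(signed)  # values -1 and 2

xor = np.logical_xor([True, True], [True, False])
print(xor)  # False, True
```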


## Expressing Conditional Logic as Array Operations

The `numpy.where` function is a vectorized version of the ternary expression `x if condition else y`. Suppose we had a Boolean array and two arrays of values:
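A minimal sketch of that setup follows. The arrays `xarr`, `yarr` and `cond` hold illustrative values and are not taken from the original notebook.

```python
import numpy as np

xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

# Pure-Python version: take x where cond is True, otherwise y
result_loop = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]

# Vectorized version with numpy.where
result = np.where(cond, xarr, yarr)
print(result)  # [1.1 2.2 1.3 1.4 2.5]

# The second and third arguments can also be scalars
arr = np.array([[4, -7], [-2, 5]])
signs = np.where(arr > 0, 2, -2)  # 2 where positive, -2 elsewhere
print(signs)
```

The list comprehension gives the same values, but it runs in the interpreter one element at a time and does not extend to multidimensional arrays.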