##### CHAPTER 4 -> NumPy Basics and Vectorized Computation

**NumPy -> Numerical Python**
- pandas, SciPy, and scikit-learn are built on NumPy behind the scenes
- Extremely fast (C-based algorithms)
- ndarray (a Python list on steroids)
- vectorized math -> whole-array operations without needing a loop
- I/O -> read/write arrays to disk
- C API (Python meets C/C++)
------
**WHY NUMPY IS FAST**
- Contiguous memory layout
- No per-element type checking, since every element shares one dtype (C behind the scenes saying "I got it, captain" 😋)
- vectorized operations
- we can write math on a whole ndarray as if we were working with a single scalar value
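
A minimal sketch of the points above, using a made-up array called `values`: the same doubling is written once as a Python loop and once as a vectorized expression, and the array's flags and item size show the contiguous, single-dtype storage.

```python
import numpy as np

values = np.arange(1_000_000, dtype=np.float64)

# Loop version: Python handles one element at a time (only 5 elements here).
doubled_loop = [v * 2 for v in values[:5]]

# Vectorized version: the whole array is processed in compiled code.
doubled_vec = values * 2

# A freshly created array lives in one contiguous block of memory.
is_contiguous = values.flags["C_CONTIGUOUS"]

# Every element shares one dtype, so each one takes the same number of bytes.
element_size = values.itemsize  # 8 bytes for float64
```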

In [None]:
"""SOME BASIC FUNCTIONALITIES"""
import numpy as np
array = np.array([[1,2,3],[3,4,5],[7,8,9]])
array.dtype #checks the data type
array.shape #checks the shape (tuple)
array.ndim #checks the dimensions

**CREATING NUMPY ARRAYS**

In [None]:
python_list = [1,2,3,4,5,6]
python_nested_list = [[1,2,3],[4,5,6]]
numpy_array = np.array(python_list) #converts list to array
numpy_muldim_array = np.array(python_nested_list) #converts nested list to a multidimensional array

np.zeros((2,3),dtype=np.float64) #shape (or length), dtype (optional)
np.ones(6,dtype=np.int64) #same idea, filled with ones
np.empty((4,5)) #uninitialized memory: don't assume zeros, it can contain garbage values
np.arange(6) #array-valued sibling of Python's range function
np.asarray([1, 2, 3])  # Converts input to ndarray (doesn't copy if already ndarray)
np.ones_like(np.array([[1, 2], [3, 4]]))  # Returns an array of ones with the same shape and type as the input array
np.zeros_like(np.array([[1, 2], [3, 4]]))  # Returns an array of zeros with the same shape and type as the input array
np.empty_like(np.array([[1, 2], [3, 4]]))  # Returns an uninitialized array with the same shape and type as the input array (could be garbage values)
np.full((3, 4), 7)  # Returns a new array of shape (3, 4) filled with the value 7
np.full_like(np.array([[1, 2], [3, 4]]), 9)  # Returns an array with the same shape and type as the input, filled with 9
np.eye(4)  # Creates a 4x4 identity matrix (ones on diagonal, zeros elsewhere)
np.identity(3)  # Creates a 3x3 identity matrix (square only; eye can also build non-square N x M arrays)
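
The difference between `np.array` and `np.asarray` is easy to miss, so here is a small sketch with illustrative variable names (`source`, `copied`, `aliased`, `template` are not from the notes above). It also shows that the `*_like` helpers take their shape and dtype from the input.

```python
import numpy as np

source = np.array([1, 2, 3])

copied = np.array(source)      # np.array copies by default
aliased = np.asarray(source)   # input is already an ndarray, so no copy is made

source[0] = 99                 # visible through aliased, not through copied

template = np.zeros((2, 3), dtype=np.float32)
ones = np.ones_like(template)  # same shape and dtype as template
```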

**DATA TYPES**
- You can explicitly choose the data type for an array
- Calling astype always creates a new array (a copy), even if the new dtype is the same as the old one
- The bytes string dtype (`np.bytes_`, called `numpy.string_` before NumPy 2.0) is fixed length, so if you try to store <br>
data longer than the allocated length it gets chopped off without a warning
- Float-to-integer conversion truncates the fractional part
-----
- You can use a type code to specify the data type <br>
int (i/u 1/2/4/8) <br> float (f 2/4/8/16) <br> complex (c 8/16/32) <br> bool (?), object (O), string (S), unicode (U)



In [17]:
array_1 = np.array([1,2,3,4],dtype=np.float64) #normal way
array_2 = np.array([5,6,7],dtype="int64") #dtype given as a string name
array_3 = array_2.astype(np.float64) #int64 -> float64, returns a new array
array_4 = array_2.astype(array.dtype) #reuses the dtype of another array (array from the first cell)

In [None]:
string_array = np.array(["hello","world"],dtype="S4")
#array([b'hell', b'worl'], dtype='|S4')
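
A short sketch of the dtype rules above, with made-up sample values: float-to-int truncation, `astype` returning a copy even for an unchanged dtype, converting numeric strings, and a type code.

```python
import numpy as np

floats = np.array([3.7, -1.2, 0.5, 9.9])
ints = floats.astype(np.int32)             # fractional part dropped: [3, -1, 0, 9]

same_dtype = floats.astype(floats.dtype)   # same dtype, still a new array
shares = np.shares_memory(floats, same_dtype)  # False

numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(np.float64)    # strings that look like numbers convert cleanly

small = np.array([1, 2, 3], dtype="i2")    # type code i2 means 2-byte integer (int16)
```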

**ARITHMETIC OPERATIONS**
- Arithmetic between arrays of equal shape is applied element-wise
- Evaluating operations between differently shaped arrays is called broadcasting

In [None]:
#basically the operation is applied to corresponding elements
array_5 = np.array([[2,3,4],[5,6,7],[7,8,9]])
array_5 * 5 #scalar applied to every element; same for division, subtraction, exponent etc
array_5 + array_5 #element-wise; same for multiplication, division, subtraction, exponent etc
array_5 > array_4 #boolean array; array_4 (shape (3,)) is broadcast across each row of array_5
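
To make the broadcasting bullet concrete, here is a small sketch with illustrative names (`grid`, `row`): a scalar and a shape `(3,)` array are each combined with a shape `(3, 3)` array.

```python
import numpy as np

grid = np.array([[2, 3, 4], [5, 6, 7], [7, 8, 9]])  # shape (3, 3)
row = np.array([5, 6, 7])                           # shape (3,)

scaled = grid * 5     # the scalar is applied to every element
shifted = grid + row  # row is broadcast across each row of grid
mask = grid > row     # comparisons broadcast too and return booleans
```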

**BASIC INDEXING AND SLICING**
- NumPy slice assignment accepts scalars (broadcast to all elements) or arrays (assigned element-wise).
- Python lists require an iterable for slice assignment; scalars aren't allowed.
- A NumPy slice is a view, not a copy, so changes affect the original array (pandas often behaves the same way, though that depends on the version and its copy-on-write settings).
- To create an independent copy of a slice: arr[5:8].copy()
- In one dimension, the bare slice [:] refers to all the values in the array
- The multidimensional index syntax of NumPy arrays (e.g. arr[0, 1]) does not work on Python lists

In [None]:
arr = np.arange(7)
view = arr[0:5] #creates a view of the original arr
view[0:3] = 5 #modifies the original arr as well (no copy was made)
arr #array([5, 5, 5, 3, 4, 5, 6]) -> a Python list slice would be a copy instead
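
For contrast with the view behaviour above, a minimal sketch (variable names are illustrative) of an explicit `.copy()` and of how a plain Python list treats the same operations.

```python
import numpy as np

arr = np.arange(7)
independent = arr[0:5].copy()  # explicit copy, detached from arr
independent[:] = -1            # arr is left unchanged

py_list = list(range(7))
py_slice = py_list[0:5]        # slicing a list already makes a copy
py_slice[0] = 99               # py_list is left unchanged

try:
    py_list[0:3] = 5           # lists reject a scalar in slice assignment
    list_accepts_scalar = True
except TypeError:
    list_accepts_scalar = False

py_list[0:3] = [5, 5, 5]       # an iterable works
```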

**2D Arrays**
- Indexing with a single integer gives you a 1D array (one row), not a scalar.
- axis 0 indexes the rows, axis 1 indexes the columns.

**Multidimensional Arrays**
- A single index returns an array with one fewer dimension (not just 2D or scalar).
- Use multiple indices to drill deeper into the array.

**EXTRA**
- When assigning a scalar to an arr2d or arr3d slice, the value is broadcast to<br>
every element of that slice, just like in 1D

In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr2d[0][1]     #2
arr2d[0,1]      #same element, written with one comma-separated index
arr3d[0][1][1]  #5
arr3d[0,1,1]    #same element (indices run block, row, column)
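
A short sketch of drilling into `arr3d` one index at a time, plus the scalar and array assignment mentioned under EXTRA (the names `block`, `row`, `scalar`, and `saved` are illustrative).

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])  # shape (2, 2, 3)

block = arr3d[0]           # one index  -> 2D array of shape (2, 3)
row = arr3d[1, 0]          # two indices -> 1D array [7, 8, 9]
scalar = arr3d[1, 0, 2]    # three indices -> single element 9

saved = arr3d[0].copy()    # keep the original values
arr3d[0] = 42              # scalar is broadcast to every element of the first block
all_42 = bool((arr3d[0] == 42).all())
arr3d[0] = saved           # an array of matching shape is assigned element-wise
```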


##### INDEXING WITH SLICING IN 2D
| **Expression** (`arr` is 3x3) | **Shape** |
|------------------|-----------|
| `arr[:2, 1:]`    | `(2, 2)`  |
| `arr[2]`         | `(3,)`    |
| `arr[2, :]`      | `(3,)`    |
| `arr[2:, :]`     | `(1, 3)`  |
| `arr[:, :2]`     | `(3, 2)`  |
| `arr[1, :2]`     | `(2,)`    |
| `arr[1:2, :2]`   | `(1, 2)`  |
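
The shapes in the table can be checked directly. This sketch assumes `arr` is a 3x3 array (the notes do not define it at this point): an integer index drops an axis, while a slice keeps it.

```python
import numpy as np

arr = np.arange(1, 10).reshape((3, 3))  # assumed 3x3 example array

shapes = {
    "arr[:2, 1:]": arr[:2, 1:].shape,
    "arr[2]": arr[2].shape,
    "arr[2, :]": arr[2, :].shape,
    "arr[2:, :]": arr[2:, :].shape,    # slice keeps the row axis
    "arr[:, :2]": arr[:, :2].shape,
    "arr[1, :2]": arr[1, :2].shape,    # integer drops the row axis
    "arr[1:2, :2]": arr[1:2, :2].shape,
}
```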

**BOOLEAN INDEXING**
- Selecting data with a boolean index always creates a copy, not a view (unlike slicing).
- Pass boolean arrays directly as the index.
- The boolean array's length must match the length of the axis it indexes.
- Use & (and), | (or) to combine conditions; the Python keywords and / or do not work on boolean arrays.
- Use != or ~ for negation (~ is handy for inverting a boolean array saved in a variable).
- Assigning through a boolean index (scalars or arrays, broadcasting applies) modifies the original array in place.
- You can mix and match boolean arrays with slices or integers as well.

In [None]:
names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
scores = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])

is_bob = names == "Bob"                        # Boolean array (True where Bob)
bob_scores = scores[is_bob]                    # Rows where name is Bob (copy)
bob_or_will = (names == "Bob") | (names == "Will")  # Combine conditions
bw_scores = scores[bob_or_will]                # Rows where Bob or Will (copy)
bw_col1 = scores[bob_or_will, 1]               # Select 2nd column for Bob/Will rows
high_scores = scores[scores >= 7]              # Elements >= 7 (flattened)
not_joe = scores[~(names == "Joe")]            # Exclude Joe (negation)
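
The copy-versus-assignment distinction is the part that tends to surprise, so here is a minimal sketch reusing the `names` and `scores` data (the helper names `original`, `bob_rows`, and `untouched` are illustrative).

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
scores = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])
original = scores.copy()

bob_rows = scores[names == "Bob"]  # selection returns a copy
bob_rows[:] = 100                  # so this does not touch scores
untouched = np.array_equal(scores, original)

scores[scores < 0] = 0             # assignment through a mask edits scores in place
scores[names != "Joe"] = 7         # a 1D mask over rows sets whole rows
```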

**FANCY INDEXING**
- Term used in NumPy for indexing with a list or ndarray of integers to get a particular subset of values, rows, or columns
- Negative indices select from the end
- Always copies the selected data into a new array (unlike slicing)

In [None]:
arr = np.arange(32).reshape((8,4))

arr[[1, 5, 7]]                          # Select rows 1, 5, 7 → returns 2D array
arr[[-1, -4, -6]]                      # Select rows from end → last, 4th last, 6th last
arr[[1, 3, 7], [1, 2, 0]]              # Picks (1,1), (3,2), (7,0) → returns 1D array of elements
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]     # Fancy indexing: picks rows (1,5,7,2) and reorders columns → cols 0,3,1,2
arr[[1, 5, 7, 2], [0, 3, 1, 2]] = 0    # Assignment works too: sets (1,0), (5,3), (7,1), (2,2) to 0 in the original arr
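
A small sketch confirming the points above on a fresh 8x4 array (names such as `picked`, `pairs`, and `block` are illustrative): the selection is a copy, paired index lists pick individual elements, and chaining gives a rectangular block.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[1, 5, 7]]                      # rows 1, 5, 7 in a new array
picked[:] = -1                               # arr is left unchanged
shares = np.shares_memory(arr, picked)       # False: fancy indexing copied the data

pairs = arr[[1, 3, 7], [1, 2, 0]]            # elements (1,1), (3,2), (7,0)
block = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]   # 4 rows, columns reordered
```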

**TRANSPOSING ARRAYS AND SWAPPING AXES**
- Transposing is a special form of reshaping; it returns a view of the array without copying data
- Comes in handy for matrix multiplication -> use np.dot() or the @ operator to multiply two matrices
- The .T attribute gives the transpose
- For higher-dimensional arrays, use swapaxes and specify the two axes to swap

In [None]:
arr = np.arange(15).reshape((3,5))
arr.T               #transpose, shape (5, 3)
arr.swapaxes(0,1)   #same result for a 2D array
np.dot(arr.T, arr)  #matrix multiplication, shape (5, 5)
arr.T @ arr         #same product using the @ operator
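
A minimal sketch of the transpose bullets: `.T` shares memory with the original, the two multiplication orders give different shapes, and `swapaxes` handles a 3D array (the `cube` array is made up for illustration).

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

transposed = arr.T                          # shape (5, 3)
shares = np.shares_memory(arr, transposed)  # True: a view, no data copied

cols_product = arr.T @ arr                  # (5, 3) @ (3, 5) -> (5, 5)
rows_product = arr @ arr.T                  # (3, 5) @ (5, 3) -> (3, 3)
same_as_dot = np.array_equal(cols_product, np.dot(arr.T, arr))

cube = np.arange(24).reshape((2, 3, 4))
swapped = cube.swapaxes(1, 2)               # shape (2, 4, 3)
```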

**PSEUDORANDOM NUMBER GENERATION**
- NumPy has a built-in random number generator (RNG) → its output comes from a deterministic algorithm (hence pseudorandom).
- You can make the sequence reproducible by supplying your own seed.
- The seed sets the RNG’s initial state → without a fixed seed, the state (and so the output) differs on every run.
- A custom generator (rng) is isolated from the global np.random state, avoiding interference from other code.
- This rng is used throughout the rest of the book

In [None]:
np.random.standard_normal((4, 4)) #NumPy's default module-level RNG

#python vs numpy -> one by one vs vectorized operation😉
from random import normalvariate
N = 1_000_000
%timeit samples = [normalvariate(0,1) for _ in range(N)]
# 895 ms ± 40 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit np.random.standard_normal(N)
# 25 ms ± 284 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

rng = np.random.default_rng(seed=12345) #returns a Generator, e.g. Generator(PCG64) at 0x2152E5D34C0
data = rng.standard_normal((2,3)) #drawn from our own seeded generator
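
To illustrate the seed and isolation bullets, here is a small sketch with two generators built from the same seed (`rng_a` and `rng_b` are illustrative names). Calls to the module-level `np.random` functions in between do not disturb either one.

```python
import numpy as np

rng_a = np.random.default_rng(seed=12345)
rng_b = np.random.default_rng(seed=12345)

draw_a = rng_a.standard_normal((2, 3))
draw_b = rng_b.standard_normal((2, 3))
same_first_draw = np.array_equal(draw_a, draw_b)   # same seed, same numbers

next_a = rng_a.standard_normal(3)
np.random.standard_normal(10)        # advances only the global RNG state
next_b = rng_b.standard_normal(3)
still_in_sync = np.array_equal(next_a, next_b)
```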

#### NumPy Random Methods & Descriptions

- **`permutation`** → Returns a random permutation of a sequence or permuted range  
- **`shuffle`** → Randomly permutes a sequence in place  
- **`uniform`** → Draws samples from a uniform distribution (the default range is `[0, 1)`)  
- **`integers`** → Draws random integers from a given low-to-high range  
- **`standard_normal`** → Draws samples from a normal distribution (mean = 0, std = 1)  
- **`binomial`** → Draws samples from a binomial distribution  
- **`normal`** → Draws samples from a normal (Gaussian) distribution  
- **`beta`** → Draws samples from a beta distribution  
- **`chisquare`** → Draws samples from a chi-square distribution  
- **`gamma`** → Draws samples from a gamma distribution  
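
A brief sketch calling a few of these methods on a seeded generator. The parameter values (die faces, mean 170, and so on) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=12345)

order = rng.permutation(5)            # new array holding 0..4 in random order
deck = np.arange(5)
rng.shuffle(deck)                     # shuffles deck in place, returns None
dice = rng.integers(1, 7, size=10)    # integers from 1 to 6 (high is exclusive)
unit = rng.uniform(size=4)            # uniform on [0, 1) by default
heights = rng.normal(loc=170.0, scale=10.0, size=4)  # mean 170, std 10
heads = rng.binomial(n=10, p=0.5, size=4)            # successes out of 10 trials
```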


**UNIVERSAL FUNCTIONS -> FAST ELEMENT-WISE ARRAY FUNCTIONS**
- Perform element-wise operations and return element-wise results.
- **Unary ufunc** → Operates on one array.
- **Binary ufunc** → Operates on two arrays, returns one array.
- By default, a ufunc returns a **new array** and leaves its inputs unchanged.
- Some ufuncs (like `np.modf(arr)`) return **two arrays**.
- Use `out` to store results in existing arrays (saves memory, avoids temp arrays).

In [None]:
arr = np.random.standard_normal(8) * 5
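khemu = np.zeros_like(arr) #preallocated output array
np.add(arr , 1 , out = khemu) #writes arr + 1 into khemu instead of allocating a new array
remainder , whole_part = np.modf(arr) #fractional and integral parts as two arrays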
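
The same calls on fixed values make the results easy to check by hand. This is a sketch with an illustrative `buffer` array, not the random data used above.

```python
import numpy as np

arr = np.array([-3.5, -0.25, 0.0, 1.75, 4.0])

buffer = np.zeros_like(arr)
result = np.add(arr, 1, out=buffer)   # result is the same object as buffer

frac, whole = np.modf(arr)            # both parts keep the sign of the input
rebuilt = frac + whole                # adds back up to arr
```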

### Unary Universal Functions (Table 4-4)

- **`abs`, `fabs`**: Absolute value (`abs` handles int, float, complex; `fabs` is for non-complex data)
- **`sqrt`**: Square root (equivalent to `arr ** 0.5`)
- **`square`**: Square (equivalent to `arr ** 2`)
- **`exp`**: Exponential (`e^x`)
- **`log`, `log10`, `log2`, `log1p`**: Natural log, base-10, base-2, `log(1 + x)`
- **`sign`**: Sign (1 if positive, 0 if zero, -1 if negative)
- **`ceil`**: Smallest int ≥ element
- **`floor`**: Largest int ≤ element
- **`rint`**: Round to nearest int (preserves dtype)
- **`modf`**: Fractional & integral parts (returns two arrays)
- **`isnan`**: Check for NaN values
- **`isfinite`, `isinf`**: Check for finite or infinite values
- **`cos`, `cosh`, `sin`, `sinh`, `tan`, `tanh`**: Trigonometric & hyperbolic functions
- **`arccos`, `arcsin`, `arctan`, `arccosh`, `arcsinh`, `arctanh`**: Inverse trigonometric & hyperbolic functions
- **`logical_not`**: Element-wise NOT (equivalent to `~arr` for boolean arrays)
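
A few of the rounding and checking ufuncs side by side, as a sketch on made-up values chosen to include negatives, ties, NaN, and infinity.

```python
import numpy as np

values = np.array([-2.5, -0.4, 0.0, 1.5, 2.5])

signs = np.sign(values)     # -1, -1, 0, 1, 1
up = np.ceil(values)        # smallest integer >= each element
down = np.floor(values)     # largest integer <= each element
nearest = np.rint(values)   # ties go to the nearest even integer; dtype stays float

special = np.array([1.0, np.nan, np.inf])
nan_mask = np.isnan(special)        # True only for NaN
finite_mask = np.isfinite(special)  # False for NaN and for infinity
```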

---

### Binary Universal Functions (Table 4-5)

- **`add`**: Element-wise addition
- **`subtract`**: Subtract second array from first
- **`multiply`**: Element-wise multiplication
- **`divide`, `floor_divide`**: Element-wise division or floor division (quotient rounded down, remainder dropped)
- **`power`**: Raise first array to powers from second array
- **`maximum`, `fmax`**: Element-wise max (`fmax` ignores NaN)
- **`minimum`, `fmin`**: Element-wise min (`fmin` ignores NaN)
- **`mod`**: Element-wise modulus (remainder)
- **`copysign`**: Copy the sign of values in the second array onto values in the first array
- **`greater`, `greater_equal`, `less`, `less_equal`, `equal`, `not_equal`**: Element-wise comparison (Boolean output)
- **`logical_and`**: Element-wise AND (equivalent to `&` for boolean arrays)
- **`logical_or`**: Element-wise OR (equivalent to `|` for boolean arrays)
- **`logical_xor`**: Element-wise XOR (equivalent to `^` for boolean arrays)
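
Some entries in this table differ only in edge cases, so here is a minimal sketch on made-up inputs: NaN handling in `maximum` versus `fmax`, floor division and modulus on a negative number, `copysign`, and `logical_and` next to `&`.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 2.0, np.nan])
with_nan = np.maximum(a, b)      # NaN propagates: [2, nan, nan]
skip_nan = np.fmax(a, b)         # NaN ignored:    [2, 2, 3]

ints = np.array([7, -7])
quotient = np.floor_divide(ints, 2)   # rounds down: [3, -4]
remainder = np.mod(ints, 2)           # sign follows the divisor: [1, 1]

signed = np.copysign(np.array([1.0, 2.0]), np.array([-1.0, 1.0]))  # [-1, 2]

x = np.array([True, True, False])
y = np.array([True, False, False])
same_and = np.array_equal(np.logical_and(x, y), x & y)
```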
