## 📦 What is NumPy?

[NumPy](https://numpy.org/doc/stable/user/whatisnumpy.html) is a fundamental package for scientific computing in Python. It provides:
- Support for large, multi-dimensional arrays and matrices
- A collection of mathematical functions to operate on these arrays
- Internals written in C, making it highly efficient for numerical operations

NumPy is typically imported with `import numpy as np`. Unlike Python lists, NumPy arrays hold elements of a single data type, which is what makes numerical operations on them efficient.
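
As a small sketch of what "consistent data types" means in practice (the variable names below are illustrative only), converting a mixed list to an array forces every element into one common dtype:

```python
import numpy as np

# A Python list can hold values of different types side by side.
mixed_list = [1, 2.5, 3]

# Converting it to an array forces one common dtype (here float64),
# so the integers are stored as floats.
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)   # float64

# An all-integer list becomes an integer array.
int_array = np.array([1, 2, 3])
print(int_array.dtype)     # platform default integer, often int64
```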

In [None]:
import numpy as np

arr = np.array([1,2,3])
arr
arr.shape

In [None]:
b = np.array([[1,2,3], [4,5,6]])
b.shape

NumPy arrays are "N-dimensional," meaning they can have any number of dimensions: 1D, 2D, 3D, and so on. However, NumPy requires multi-dimensional arrays to be regular. Each row in a 2D array must have the same number of columns, and each sub-array in higher dimensions must also be uniform in size. Uneven or jagged shapes are not allowed: in recent NumPy versions (1.24 and later), trying to build an array from rows of different lengths raises a `ValueError`.
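
When the data really is ragged, two common workarounds are to store the rows as Python objects with `dtype=object`, or to pad the rows to a regular shape. The sketch below shows both; the names `rows`, `ragged`, and `padded` are illustrative, and padding with zeros is an assumption that may not suit every dataset. Note that an object array gives up most of NumPy's speed advantages.

```python
import numpy as np

rows = [[1, 2, 3], [4, 5], [6]]

# Option 1: an object array holds three Python lists; it is 1D, not 2D.
ragged = np.array(rows, dtype=object)
print(ragged.shape)   # (3,)

# Option 2: pad every row with zeros up to the longest row
# to get a regular 2D array.
width = max(len(r) for r in rows)
padded = np.array([r + [0] * (width - len(r)) for r in rows])
print(padded.shape)   # (3, 3)
```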

In [None]:
#Valid NumPy 2D Array (All rows same length)
# All rows have the same number of columns
valid_array = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(valid_array)

In [None]:
# ❌ Invalid ragged array (different row lengths)
# The rows have different lengths, so this raises
# "ValueError: setting an array element with a sequence. The requested array
# has an inhomogeneous shape after 1 dimensions..."
ragged_array = np.array([
    [1, 2, 3],
    [4, 5],
    [6]
])

## Why is NumPy fast?
NumPy is fast for three key reasons, all of which come down to avoiding Python's usual per-object overhead and relying on efficient, low-level code.

#### 1. Contiguous, Typed Memory (Homogeneous Arrays)
Python lists:
- Store references to objects
- Can contain mixed types (e.g., `[3, "cat", True]`)
- Require dynamic type resolution at runtime

NumPy arrays:
- Store data in contiguous blocks of memory
- Use fixed, uniform types (like `int32`, `float64`)
- Allow for cache-friendly access and low-level optimization
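
The difference in memory layout can be inspected directly. The following is a rough sketch, not a precise benchmark: `sys.getsizeof` only approximates the footprint of a list on CPython, and the variable names are illustrative.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# Every element occupies the same number of bytes in one contiguous buffer.
print(arr.dtype, arr.itemsize)       # int64 8
print(arr.nbytes)                    # 8000 = 1000 elements * 8 bytes
print(arr.flags["C_CONTIGUOUS"])     # True

# The list stores pointers; each int object lives elsewhere on the heap.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes)                    # noticeably larger than arr.nbytes on CPython
```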
#### 2. Precompiled C/Fortran Backend
Much of NumPy's heavy lifting is done by compiled code rather than by the Python interpreter.
Examples:
- Matrix operations → BLAS, LAPACK
- Fast transforms → a bundled FFT implementation (pocketfft), not FFTW
- Linear algebra → optimized C routines

These libraries have been heavily optimized over decades for high-performance computing, and Python simply calls them through a clean interface.
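
As a small illustration, matrix multiplication with the `@` operator is one operation that NumPy typically hands off to a BLAS library for floating-point inputs. The sketch below checks the result against a plain Python triple loop; the helper `matmul_loops` is hypothetical and written only for comparison.

```python
import numpy as np

def matmul_loops(x, y):
    """Naive triple-loop matrix product on nested lists (for comparison only)."""
    n, k, m = len(x), len(y), len(y[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i][j] += x[i][p] * y[p][j]
    return out

rng = np.random.default_rng(0)
left = rng.random((20, 30))
right = rng.random((30, 10))

fast = left @ right                                  # compiled matrix product
slow = np.array(matmul_loops(left.tolist(), right.tolist()))
print(np.allclose(fast, slow))                       # True
```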
#### 3. Vectorization (No Python Loops)
Vectorization means applying an operation to a whole array at once instead of writing an explicit loop over its elements. Rather than running Python's slow, interpreted `for` loops, NumPy performs the work using:
- C or Fortran-level loops under the hood
- Single function calls that process data in bulk

This avoids Python's loop overhead and type checking on every iteration.

In [None]:
#Non-vectorized Code (Using an Explicit Loop)
import time
import random

# Create two lists of 1 million random numbers.
size = 10**6
a = [random.random() for _ in range(size)]
b = [random.random() for _ in range(size)]

# Time the explicit loop multiplication
start_time = time.perf_counter()  # perf_counter is better suited to timing short intervals
c = []
for i in range(size):
    c.append(a[i] * b[i])
end_time = time.perf_counter()

print("Time with explicit loop:", end_time - start_time, "seconds")


In [None]:
# Vectorized Code (Using NumPy)
import time
import numpy as np

# Create two NumPy arrays of 1 million random numbers.
size = 10**6
a = np.random.random(size)
b = np.random.random(size)

# Time the vectorized multiplication
start_time = time.perf_counter()
c = a * b
end_time = time.perf_counter()

print("Time with vectorized operation:", end_time - start_time, "seconds")

What You'll Observe:
- Speed: The vectorized operation is often one to two orders of magnitude faster than the explicit loop, though the exact ratio depends on the machine.
- Simplicity: The code is much more concise, making it easier to read and maintain.
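
A single timing measurement can be noisy. As a hedged sketch, the standard-library `timeit` module can repeat each measurement and keep the best run. The array size and repeat counts below are arbitrary choices, the helper functions are illustrative, and the exact speedup will vary by machine.

```python
import timeit
import numpy as np

size = 10**5
rng = np.random.default_rng(0)
a = rng.random(size)
b = rng.random(size)
a_list, b_list = a.tolist(), b.tolist()

def loop_multiply():
    # Element-by-element multiplication in pure Python
    return [x * y for x, y in zip(a_list, b_list)]

def vectorized_multiply():
    # One call that multiplies the whole arrays
    return a * b

# Best of 5 repeats, each running the function 10 times.
loop_time = min(timeit.repeat(loop_multiply, number=10, repeat=5)) / 10
vec_time = min(timeit.repeat(vectorized_multiply, number=10, repeat=5)) / 10

print(f"loop:       {loop_time:.6f} s per call")
print(f"vectorized: {vec_time:.6f} s per call")
```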

In [None]:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr.shape

We can use `reshape` to change the shape of a NumPy array. The `arr` above has shape `(3, 3)`, which is 3 rows and 3 columns. Let's change it to `(9, 1)`, which is 9 rows and 1 column. The one rule to remember is that the total number of elements must stay the same. Our array has 9 elements, so we can reshape it to any of the following:
```
arr.reshape(1, 9)   # 1 row, 9 columns
arr.reshape(9, 1)   # 9 rows, 1 column
arr.reshape(3, 3)   # same as original
arr.reshape(9,)     # flatten into 1D array
```
Trying to do `arr.reshape(3, 2)` will raise a `ValueError`, because the original shape `(3, 3)` holds 9 elements but the new shape `(3, 2)` holds only 6.
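
One related convenience, not covered above, is that `reshape` accepts `-1` for a single dimension and infers its size from the element count. The sketch below also shows the `ValueError` for a mismatched shape.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# -1 asks NumPy to work out that dimension: 9 elements / 1 column = 9 rows.
column = arr.reshape(-1, 1)
print(column.shape)   # (9, 1)

# A lone -1 flattens the array to 1D.
flat = arr.reshape(-1)
print(flat.shape)     # (9,)

# A shape with the wrong number of elements is rejected.
try:
    arr.reshape(3, 2)
except ValueError as exc:
    print("ValueError:", exc)
```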

In [None]:
arr.reshape(9,1)

Creating NumPy arrays by hand every time is tedious, so NumPy provides a few built-in functions for creating arrays quickly.

In [None]:
# np.zeros(shape, dtype) quickly generates an array of the given shape filled with zeros
np_zeros_default = np.zeros((3,3))
print(f"np_zeros_default = {np_zeros_default}")
# The default dtype is float64; pass dtype to get a specific data type
np_zeros_int = np.zeros((2, 3), dtype=int)
print(f"np_zeros_int = {np_zeros_int}")

In [None]:
# np.ones(shape, dtype) quickly generates an array of the given shape filled with ones
np_ones_default = np.ones((3,3))
print(f"np_ones_default = {np_ones_default}")
# The default dtype is float64; pass dtype to get a specific data type
np_ones_int = np.ones((2, 3), dtype=int)
print(f"np_ones_int = {np_ones_int}")

In [None]:
# np.arange() creates evenly spaced values within a given interval (the stop value is excluded),
# similar to Python's built-in range(), but it returns a NumPy array instead.

print(np.arange(5))
# → array([0, 1, 2, 3, 4])
print(np.arange(2, 10))
# → array([2, 3, 4, 5, 6, 7, 8, 9])
print(np.arange(1, 10, 2))
# → array([1, 3, 5, 7, 9])
print(np.arange(0, 1, 0.2))
# → array([0. , 0.2, 0.4, 0.6, 0.8])

In [None]:
# np.linspace() generates a specified number of evenly spaced values
# over a defined interval, including the endpoint by default.
# np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
print(np.linspace(0, 1, 5))
# → array([0.  , 0.25, 0.5 , 0.75, 1.  ])
print(np.linspace(1, 10, 4))
# → array([ 1.,  4.,  7., 10.])
print(np.linspace(0, 1, 5, endpoint=False))
# → array([0. , 0.2, 0.4, 0.6, 0.8])
values, step = np.linspace(0, 1, 5, retstep=True)
print(values)  # → array([0.  , 0.25, 0.5 , 0.75, 1.  ])
print(step)    # → 0.25

We can access elements of a NumPy array by row and column index, just as in a 2D matrix:

In [None]:
arr = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
# Get the first row
first_row = arr[0]
print(f"first_row = {first_row}")
# First element (row 0, col 0)
first_element = arr[0,0]
print(f"first_element = {first_element}")
# We can use slicing to get specific columns
first_column = arr[:, 0]
print(f"first_column= {first_column}")

# we can also use slicing to get specific rows
first_row = arr[0, :]
print(f"first row = {first_row}")
last_row = arr[-1, :]
print(f"last row = {last_row}")

# Getting multiple rows and columns
# Top-left 2x2 block
top_left_2 = arr[:2, :2]
print(f"top left = {top_left_2}")


### Pairwise operations and parallelization in NumPy
One of NumPy's biggest strengths is its ability to perform pairwise (element-wise) operations in a vectorized way, so it is fast and efficient compared to traditional Python loops.

If we have a 3x3 array and want to multiply every element in it by 2, the traditional Python way looks like this:
```
result = []
for row in arr:
    for val in row:
        result.append(val * 2)
```

The same can be done in NumPy with a single expression:

` arr * 2 `

Unlike the loop, which builds a flat list, this keeps the array's 3x3 shape.

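It's cleaner and much faster, because NumPy operations are written in optimized C code under the hood and can use SIMD (Single Instruction, Multiple Data) instructions where the hardware supports them. Multithreading mostly comes from the BLAS/LAPACK libraries behind operations such as matrix multiplication; simple element-wise operations generally run on a single thread.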
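
The same idea applies to two arrays of the same shape, where the operation pairs up elements at matching positions. A minimal sketch, with illustrative array names `x` and `y`:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])
y = np.array([[10, 20, 30],
              [40, 50, 60]])

# Each output element combines the elements at the same position in x and y.
total = x + y
product = x * y

print(total)
# [[11 22 33]
#  [44 55 66]]
print(product)
# [[ 10  40  90]
#  [160 250 360]]
```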

In [None]:
# We can perform operations like +, -, *, / directly on the array.
# NumPy applies them to every element at once:
arr = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

result = arr * 2
print(result)
# → array([[ 2,  4,  6],
#          [ 8, 10, 12],
#          [14, 16, 18]])
result = arr + 1
print(result)
# → array([[ 2,  3,  4],
#          [ 5,  6,  7],
#          [ 8,  9, 10]])

# Logical comparisons work the same way — applied to each element:
result = arr > 7
print(result)
# → array([[False, False, False],
#          [False, False, False],
#          [False,  True,  True]])

# We can use the result as a mask to filter values:
arr[arr > 7]
# → array([8, 9])

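In NumPy, `axis` refers to the dimension along which an operation is performed.

Axis 0 ↓  (runs down the rows, vertical)

Axis 1 →  (runs across the columns, horizontal)


 - Axis 0: the operation moves down the rows, so it combines the values within each column (one result per column)
 - Axis 1: the operation moves across the columns, so it combines the values within each row (one result per row)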
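
One way to keep the two axes straight is to watch the shapes: the axis passed to a reduction is the one that disappears from the result. The sketch below illustrates this; `keepdims=True` is an optional argument not discussed above that keeps the collapsed axis with length 1.

```python
import numpy as np

B = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

col_sums = B.sum(axis=0)       # collapses axis 0 (the 2 rows) → shape (3,)
row_sums = B.sum(axis=1)       # collapses axis 1 (the 3 columns) → shape (2,)
print(col_sums.shape, row_sums.shape)   # (3,) (2,)

# keepdims=True keeps the collapsed axis with length 1.
kept = B.sum(axis=0, keepdims=True)
print(kept.shape)              # (1, 3)
```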

In [None]:
B = np.array([[1, 2, 3],
              [4, 5, 6]])

# np.sum(B, axis=0) → Sum down each column (axis 0)
result = np.sum(B, axis=0)
print(result)
# → array([5, 7, 9]) → [1+4, 2+5, 3+6]

# np.sum(B, axis=1) → Sum across each row (axis 1)
result = np.sum(B, axis=1)
print(result)
# → array([6, 15]) → [1+2+3, 4+5+6]


`np.mean()` calculates the average (arithmetic mean) of array elements. You can compute the mean of the entire array or along a specific axis.

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Mean of the entire array
print(np.mean(A))
# → 5.0

# Mean across each row → axis=1
print(np.mean(A, axis=1))
# → array([2., 5., 8.])
# For each row: Row 0: (1+2+3)/3 = 2, Row 1: (4+5+6)/3 = 5, Row 2: (7+8+9)/3 = 8

# Mean down each column → axis=0
print(np.mean(A, axis=0))
# → array([4., 5., 6.])
# For each column: Column 0: (1+4+7)/3 = 4, Column 1: (2+5+8)/3 = 5, Column 2: (3+6+9)/3 = 6

`np.concatenate()` is used to join multiple NumPy arrays **along an existing axis**.


##### Syntax
```python
np.concatenate((arr1, arr2, ...), axis=0)
```

Rule: all arrays must have the same shape except in the dimension corresponding to `axis`.  
When concatenating:
- Along axis=0 (rows): the number of columns must match.
- Along axis=1 (columns): the number of rows must match.
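
The shape rule can be checked with two arrays whose row counts differ. This is a minimal sketch with illustrative arrays `A` and `C`; the exact wording of the error message depends on the NumPy version.

```python
import numpy as np

A = np.ones((2, 3))
C = np.ones((4, 3))   # same number of columns as A, different number of rows

# axis=0 stacks rows, so only the column counts need to match.
stacked = np.concatenate((A, C), axis=0)
print(stacked.shape)   # (6, 3)

# axis=1 would need matching row counts (2 vs 4), so this fails.
try:
    np.concatenate((A, C), axis=1)
except ValueError as exc:
    print("ValueError:", exc)
```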

In [None]:
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

result = np.concatenate((A, B), axis=0)
print(result)
# → array([[1, 2],
#          [3, 4],
#          [5, 6],
#          [7, 8]])

result = np.concatenate((A, B), axis=1)
print(result)
# → array([[1, 2, 5, 6],
#          [3, 4, 7, 8]])