# 2.1 Introduction to NumPy

**NumPy** stands for **Numerical Python**.  
It is a Python library widely used for **scientific computing**, **data analysis**, and other **numerical work**.

---

## Installation
pip install numpy

---

 ## Core Concept
The fundamental object in NumPy is the N-dimensional array, known as an `ndarray`.

---

 ## NumPy Arrays
 - Arrays are used to store homogeneous data elements
 - Data is stored in a contiguous block of memory
 - This makes NumPy arrays faster and more memory-efficient than Python lists
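
The points above can be checked with a minimal sketch. The variable names are illustrative only: mixed input is converted to one common dtype, and the data buffer occupies `size * itemsize` bytes.

```python
import numpy as np

# Mixed int/float input is upcast to a single dtype (homogeneous storage).
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                  # float64

# Every element occupies the same number of bytes in one buffer.
ints = np.arange(5, dtype=np.int64)
print(ints.itemsize)                # 8
print(ints.nbytes)                  # 40  (5 elements * 8 bytes)
print(ints.flags["C_CONTIGUOUS"])   # True
```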

# 2.2 Creating Arrays

 ## 2.2.1 1D Array

In [2]:
import numpy as np

lst1 = [10, 20, 30, 40, 50]
arr1 = np.array(lst1)

np.array(lst1, dtype=float)   # dtype can be set explicitly; this result is neither stored nor printed

print(arr1)
print(type(arr1))

[10 20 30 40 50]
<class 'numpy.ndarray'>


 ## 1D Array using range

In [6]:
arr2 = np.arange(1, 8)
print(arr2)

[1 2 3 4 5 6 7]


## Array of Zeros - 1D Array

In [9]:
arr3 = np.zeros(4)
print(arr3)

[0. 0. 0. 0.]


 ## 2.2.2 2D Array

In [10]:
lst4 = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
arr4 = np.array(lst4)
print(arr4)

[[10 20 30]
 [40 50 60]
 [70 80 90]]


 ## 2D Array using range

In [11]:
arr5 = np.arange(11, 17).reshape((2, 3))
print(arr5)

[[11 12 13]
 [14 15 16]]


 ## Array of Zeros - 2D Array

In [12]:
arr6 = np.zeros((4, 2))
print(arr6)

[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]


 ## Array of Ones - 1D & 2D

In [13]:
arr7 = np.ones(5)
print(arr7)

arr8 = np.ones((5,3))
print(arr8)

[1. 1. 1. 1. 1.]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


# 2.3 Attributes of NumPy Arrays
 NumPy arrays (`ndarray`) have several important attributes that provide information about the array.

 ## a). `ndim`
Returns the number of dimensions of the array.

In [14]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.ndim)   # Output: 2

2


 ## b). `shape`
Returns a tuple with the size of the array along each dimension; for a 2D array this is (rows, columns).

In [15]:
print(arr.shape)  # Output: (2, 3)

(2, 3)


 ## c). `size`
Returns the total number of elements in the array.

In [16]:
print(arr.size)   # Output: 6

6


 ## d). `dtype`
Returns the data type of the elements in the array.

In [17]:
print(arr.dtype)  # Output: int32 / int64 (depends on system)

int64


 ## e). `itemsize`
Returns the size (in bytes) of each element of the array.

In [18]:
print(arr.itemsize)  # Output: 4 or 8 bytes

8


 ## f). `nbytes`
Returns the total number of bytes consumed by the array.

In [19]:
print(arr.nbytes)  # Output: arr.size * arr.itemsize

48


 ## g). `T` (Transpose)
Returns the transpose of the array.

In [20]:
print(arr.T)

[[1 4]
 [2 5]
 [3 6]]


## Summary Table: NumPy Array Attributes

| Attribute | Description |
|---|---|
| `ndim` | Number of dimensions |
| `shape` | Dimensions of the array |
| `size` | Total number of elements |
| `dtype` | Data type of elements |
| `itemsize` | Bytes per element |
| `nbytes` | Total memory used (in bytes) |
| `T` | Transpose of the array |

# 2.4 Indexing in NumPy Arrays
 - Indexing in NumPy is used to access or modify elements of an array.  
 - NumPy supports **1D**, **2D**, and **multi-dimensional** array indexing.

## 2.4.1 Indexing in 1D Array

In [21]:
arr1 = np.array([10, 20, 30, 40, 50])
print(arr1[0])
print(arr1[-1])  # Negative index: last element

10
50


 ## 2.4.2 Indexing in 2D Array

In [25]:
arr2 = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print(arr2)

print(arr2[1, 2])
print(arr2[0, :])  # entire row 0
print(arr2[:, 1])  # entire column 1

[[10 20 30]
 [40 50 60]
 [70 80 90]]
60
[10 20 30]
[20 50 80]
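
Indexing is also used to modify elements, which the cells above do not show. A minimal sketch, using an illustrative array named `grid`:

```python
import numpy as np

grid = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

grid[1, 2] = 65          # change a single element
grid[0, :] = 0           # assign a scalar to a whole row
grid[:, 1] = [1, 2, 3]   # assign a sequence to a whole column
print(grid)
# [[ 0  1  0]
#  [40  2 65]
#  [70  3 90]]
```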


# 2.5 Slicing in NumPy Arrays
 - Slicing in NumPy is used to extract a **portion of an array**.  
 - It allows accessing multiple elements at once.
 - `array[start:stop:step]` - 1D Array
    - start → starting index (inclusive)
    - stop → ending index (exclusive)
    - step → stride between selected indices (optional, default 1)
 - `array[row_start:row_stop, col_start:col_stop]` - 2D Array

---

 ### Important Notes
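 - Basic slicing returns a view, not a copy
 - Changes made through the slice therefore affect the original array; call `.copy()` when an independent array is needed
 - Because no data is copied, slicing is cheap even for large arrays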
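
The view behaviour can be seen in a short sketch. The names `base`, `view`, and `independent` are illustrative.

```python
import numpy as np

base = np.array([10, 20, 30, 40, 50])

view = base[1:4]                      # basic slice: shares memory with base
view[0] = 99
print(base)                           # [10 99 30 40 50]
print(np.shares_memory(base, view))   # True

independent = base[1:4].copy()        # explicit copy: separate memory
independent[0] = -1
print(base)                           # [10 99 30 40 50]  (unchanged by the copy)
```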

 ## 2.5.1 Slicing in 1D Array

In [30]:
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])
print(arr)

print(arr[1:4])    # [20 30 40]
print(arr[:3])     # [10 20 30]
print(arr[2:])     # [30 40 50 60]
print(arr[::2])    # [10 30 50]
print(arr[::-1])   # [60 50 40 30 20 10]

[10 20 30 40 50 60]
[20 30 40]
[10 20 30]
[30 40 50 60]
[10 30 50]
[60 50 40 30 20 10]


 ## 2.5.2 Slicing in 2D Array

In [31]:
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
print(arr2d)

print(arr2d[0:2, 1:3])
# Output:
# [[2 3]
#  [5 6]]

print(arr2d[:, 0])     # First column
print(arr2d[1, :])     # Second row

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[2 3]
 [5 6]]
[1 4 7]
[4 5 6]


## 2.5.3 Slicing in 3D Array

In [33]:
arr3d = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])
print(arr3d)

print(arr3d[0, :, 1])   # [2 4]

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
[2 4]


# 2.6 Arithmetic Operations on NumPy Arrays
 - NumPy allows **element-wise arithmetic operations** on arrays.  
 - These operations are fast and efficient due to vectorization.

---

 ### Key Points
 - Operations are element-wise
 - Arrays must have the same shape (or be broadcastable)
 - Faster than Python loops

### 1. Addition

In [34]:
arr1 = np.array([10, 20, 30])
arr2 = np.array([1, 2, 3])

result = arr1 + arr2
print(result)   # [11 22 33]

[11 22 33]


 ### 2. Subtraction

In [35]:
result = arr1 - arr2
print(result)   # [9 18 27]

[ 9 18 27]


 ### 3. Multiplication

In [36]:
result = arr1 * arr2
print(result)   # [10 40 90]

[10 40 90]


 ### 4. Division

In [37]:
result = arr1 / arr2
print(result)   # [10. 10. 10.]

[10. 10. 10.]


### 5. Floor Division

In [39]:
result = arr1 // arr2
print(result)   # [10 10 10]

[10 10 10]


 ### 6. Modulus (Remainder)

In [40]:
result = arr1 % arr2
print(result)

[0 0 0]


 ### 7. Power

In [41]:
result = arr1 ** 2
print(result)   # [100 400 900]

[100 400 900]


 ### 8. Scalar Operations

In [42]:
print(arr1 + 5)   # [15 25 35]
print(arr1 * 2)   # [20 40 60]

[15 25 35]
[20 40 60]


 ### 9. Comparison Operations

In [43]:
print(arr1 > 15)     # [False  True  True]
print(arr1 == 20)    # [False  True False]

[False  True  True]
[False  True False]


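### 10. Universal Functions (ufuncs) and Aggregations

In [44]:
print(np.sqrt(arr1))   # Square root: a ufunc, applied element-wise
print(np.sum(arr1))    # Sum of elements: an aggregation, not a ufunc
print(np.mean(arr1))   # Mean: an aggregation, not a ufunc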

[3.16227766 4.47213595 5.47722558]
60
20.0
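
Of the three functions in this cell, only `np.sqrt` is a ufunc in the strict sense; `np.sum` and `np.mean` reduce the array to a single value. A small sketch of the difference, with an illustrative array named `values`:

```python
import numpy as np

values = np.array([10, 20, 30])

# np.sqrt is a ufunc: it maps over elements and keeps the shape.
print(isinstance(np.sqrt, np.ufunc))   # True
print(np.sqrt(values).shape)           # (3,)

# np.sum is an ordinary function that reduces the array to a scalar.
print(isinstance(np.sum, np.ufunc))    # False
print(np.add.reduce(values))           # 60, the ufunc-based equivalent of np.sum
```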


### 11. Matrix Multiplication
Matrix multiplication `A @ B` is defined only when:
 - Number of columns in matrix A = number of rows in matrix B

Two methods:
 - Using the `np.dot()` function
 - Using the `@` operator (recommended)

In [45]:
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

result = np.dot(A, B)
print(result)
# Output:
# [[19 22]
#  [43 50]]


result = A @ B
print(result)

[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
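
The shape rule is easier to see with non-square matrices. This is a minimal sketch; `M` and `N` are illustrative names chosen so that `A` and `B` from the cell above are left untouched.

```python
import numpy as np

M = np.arange(6).reshape(2, 3)    # shape (2, 3)
N = np.arange(6).reshape(3, 2)    # shape (3, 2)

print((M @ N).shape)              # (2, 2): inner dimensions 3 and 3 match
print(M @ N)
# [[10 13]
#  [28 40]]

try:
    M @ M                         # (2, 3) @ (2, 3): inner dimensions 3 and 2 differ
except ValueError as err:
    print("shape mismatch:", type(err).__name__)
```

Note that `M * M` would still work here, because `*` is element-wise multiplication rather than matrix multiplication.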


 ### 12. Transpose of a Matrix

In [48]:
print(A)

transpose_A = A.T
print(transpose_A)
# Output:
# [[1 3]
#  [2 4]]

print(B)
transpose_B = np.transpose(B)
print(transpose_B)

[[1 2]
 [3 4]]
[[1 3]
 [2 4]]
[[5 6]
 [7 8]]
[[5 7]
 [6 8]]


 # 2.7 Sorting in NumPy Arrays
NumPy provides multiple ways to sort arrays. Sorting can be done **in-place**, by **returning a sorted copy**, or by sorting along a specific **axis**.

## 1. `np.sort()` – Returns a Sorted Copy

- Does **not** modify the original array
- Returns a new sorted array

In [49]:
arr = np.array([40, 10, 30, 20])

sorted_arr = np.sort(arr)
print(sorted_arr)   # [10 20 30 40]
print(arr)          # Original array remains unchanged

[10 20 30 40]
[40 10 30 20]


 ## 2. `ndarray.sort()` – In-place Sorting
 - Sorts the array in-place
 - Modifies the original array

In [50]:
arr = np.array([40, 10, 30, 20])

arr.sort()
print(arr)   # [10 20 30 40]

[10 20 30 40]


 ## 3. `np.argsort()` – Indices that Would Sort the Array
 - Returns the indices that would sort the array
 - Useful for indirect sorting

In [51]:
arr = np.array([40, 10, 30, 20])

index = np.argsort(arr)
print(index)          # [1 3 2 0]
print(arr[index])    # [10 20 30 40]

[1 3 2 0]
[10 20 30 40]
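
Indirect sorting is most useful when one array should be reordered by the values of another. A minimal sketch, in which the `names` labels are hypothetical:

```python
import numpy as np

names = np.array(["Asha", "Ben", "Chen", "Dia"])   # hypothetical labels
scores = np.array([40, 10, 30, 20])

order = np.argsort(scores)        # indices that sort scores in ascending order
print(names[order])               # ['Ben' 'Dia' 'Chen' 'Asha']
print(names[order[::-1]])         # ['Asha' 'Chen' 'Dia' 'Ben']  (descending)
```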


 ## 4. Sorting 2D Arrays Using axis
 - Row-wise Sorting (axis=1)
 - Column-wise Sorting (axis=0)

In [53]:
#Row-wise Sorting (axis=1)
arr2d = np.array([[3, 1, 2],
                  [6, 5, 4]])

print(np.sort(arr2d, axis=1))
# Output:
# [[1 2 3]
#  [4 5 6]]

#Column-wise Sorting (axis=0); each column here is already in ascending order, so the output equals the input
print(np.sort(arr2d, axis=0))
# Output:
# [[3 1 2]
#  [6 5 4]]

[[1 2 3]
 [4 5 6]]
[[3 1 2]
 [6 5 4]]


## Key Differences Summary: Sorting Methods

| Method | In-place | Returns | Use Case |
|---|---|---|---|
| `np.sort()` | ❌ | ✅ Sorted copy | Keep original array unchanged |
| `ndarray.sort()` | ✅ | ❌ (modifies self) | Save memory by sorting in-place |
| `np.argsort()` | ❌ | ✅ Array of indices | Indirect sorting (get sorted indices) |
| `axis` parameter | — | — | Control row/column sorting |

 **Note**: The `axis` parameter is accepted by `np.sort()`, `ndarray.sort()`, and `np.argsort()`. For a 2D array, `axis=0` sorts down each column and `axis=1` sorts across each row; the default is the last axis (`axis=-1`).
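
A sketch with an array whose columns are not already sorted makes the effect of each axis visible. The array `table` is illustrative.

```python
import numpy as np

table = np.array([[9, 1, 5],
                  [2, 8, 4]])

print(np.sort(table, axis=1))     # sort within each row
# [[1 5 9]
#  [2 4 8]]

print(np.sort(table, axis=0))     # sort within each column
# [[2 1 4]
#  [9 8 5]]

print(np.sort(table, axis=None))  # flatten first, then sort
# [1 2 4 5 8 9]
```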

# 2.8 Statistical Operations on 1D NumPy Array
NumPy provides several built-in statistical functions to perform **data analysis** on 1D arrays efficiently.

---

 ### Key Points
 - Most statistical functions reduce the array to a single summary value; `cumsum` and `cumprod` return an array of running results
 - Suitable for EDA (Exploratory Data Analysis)
 - Much faster than Python loops

In [54]:
arr = np.array([10, 20, 30, 40, 50])

 ### 1. Mean (Average)

In [55]:
print(np.mean(arr))    # 30.0

30.0


 ### 2. Median

In [56]:
print(np.median(arr))  # 30.0

30.0


 ### 3. Sum

In [57]:
print(np.sum(arr))     # 150

150


 ### 4. Minimum & Maximum

In [58]:
print(np.min(arr))     # 10
print(np.max(arr))     # 50

10
50


 ### 5. Range (Max - Min)

In [59]:
print(np.ptp(arr))     # 40

40


 ### 6. Variance

In [60]:
print(np.var(arr))

200.0


 ### 7. Standard Deviation

In [61]:
print(np.std(arr))

14.142135623730951
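
By default `np.var` and `np.std` compute the population statistics (dividing by `n`). The sketch below shows the `ddof` parameter, which switches to the sample versions; the array name `data` is illustrative.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

print(np.var(data))            # 200.0  population variance (divides by n)
print(np.var(data, ddof=1))    # 250.0  sample variance (divides by n - 1)
print(np.std(data, ddof=1))    # sample standard deviation, the square root of 250
```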


 ### 8. Percentile

In [62]:
print(np.percentile(arr, 25))   # 25th percentile
print(np.percentile(arr, 50))   # Median
print(np.percentile(arr, 75))   # 75th percentile

20.0
30.0
40.0


 ### 9. Cumulative Sum

In [63]:
print(np.cumsum(arr))  # [ 10  30  60 100 150]

[ 10  30  60 100 150]


 ### 10. Cumulative Product

In [64]:
print(np.cumprod(arr))

[      10      200     6000   240000 12000000]


 ### 11. Count Non-Zero Elements

In [65]:
print(np.count_nonzero(arr))

5


# 2.9 NumPy Built-in Functions
NumPy provides a wide range of built-in functions for **array creation**, **mathematical operations**, **statistical analysis**, and **linear algebra**.

---

 ### Key Points
 - Built-in functions are optimized and fast
 - Support vectorized operations
 - Widely used in data analysis & machine learning

### 1. Array Creation Functions

In [67]:
arr1 = np.array([1, 2, 3])
print(arr1)

arr2 = np.zeros(5)
print(arr2)

arr3 = np.ones(4)
print(arr3)

arr4 = np.full(3, 7)
print(arr4)

arr5 = np.arange(1, 10, 2)
print(arr5)

arr6 = np.linspace(0, 1, 5)
print(arr6)

arr7 = np.eye(3)
print(arr7)

arr8 = np.random.rand(3)
print(arr8)

arr9 = np.random.randint(1, 10, 5)
print(arr9)

[1 2 3]
[0. 0. 0. 0. 0.]
[1. 1. 1. 1.]
[7 7 7]
[1 3 5 7 9]
[0.   0.25 0.5  0.75 1.  ]
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
[0.47255093 0.19715665 0.75615159]
[4 7 8 6 1]


### 2. Mathematical & Statistical Functions

In [68]:
arr = np.array([10, 20, 30, 40])

print(np.sum(arr))
print(np.mean(arr))
print(np.median(arr))
print(np.min(arr))
print(np.max(arr))
print(np.std(arr))
print(np.var(arr))

100
25.0
25.0
10
40
11.180339887498949
125.0


 ### 3. Logical & Comparison Functions

In [69]:
print(np.any(arr > 3))
print(np.all(arr > 0))
print(np.isfinite(arr))
print(np.isnan(arr))

True
True
[ True  True  True  True]
[False False False False]


 ### 4. Linear Algebra Functions

In [70]:
A = np.array([[1, 2], [3, 4]])

print(np.dot(A, A))
print(np.linalg.det(A))
print(np.linalg.inv(A))

[[ 7 10]
 [15 22]]
-2.0000000000000004
[[-2.   1. ]
 [ 1.5 -0.5]]
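
The determinant printed above (`-2.0000000000000004` rather than `-2`) shows floating-point round-off, so results are best compared with a tolerance. A minimal sketch; the right-hand side `b` is a made-up example:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)

# A @ A_inv is only approximately the identity, so compare with a tolerance.
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# Solving A x = b directly is usually preferred over forming the inverse.
b = np.array([5, 11])                      # made-up right-hand side
x = np.linalg.solve(A, b)
print(x)                                   # approximately [1. 2.]
```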


 ### 5. Random Functions

In [72]:
print(np.random.rand(3, 3))
print(np.random.randn(5))
print(np.random.randint(1, 100, 5))
print(np.random.choice([10, 20, 30], 2))

[[0.69891288 0.33497075 0.41098496]
 [0.76441382 0.41200458 0.94500114]
 [0.20394915 0.53740466 0.45697909]]
[0.05035937 0.62625739 0.01456257 0.36831438 0.6832998 ]
[54 68 20 30  2]
[20 10]
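
The outputs above change on every run. For reproducible results, NumPy also offers the `Generator` interface created by `np.random.default_rng`. A sketch, with arbitrary seed values:

```python
import numpy as np

rng = np.random.default_rng(seed=42)      # seed value is arbitrary

print(rng.random((2, 2)))                 # uniform floats in [0, 1)
print(rng.standard_normal(3))             # samples from the standard normal
print(rng.integers(1, 100, size=5))       # integers in [1, 100)
print(rng.choice([10, 20, 30], size=2))   # random picks from a sequence

# Re-creating the generator with the same seed reproduces the same stream.
first = np.random.default_rng(7).random(3)
second = np.random.default_rng(7).random(3)
print(np.array_equal(first, second))      # True
```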


# 2.10 Shape Manipulation on NumPy Arrays
 - Shape manipulation in NumPy is used to **change the dimensions of arrays** without changing the data.  
 - It is very important for **data preprocessing** and **matrix operations**.

---

### Key Points
 - Shape manipulation does not change data
 - `reshape()` requires compatible dimensions
 - `flatten()` → copy, `ravel()` → view when possible
 - Essential for machine learning & data analysis

### 1. Checking Shape

In [73]:
arr = np.array([1, 2, 3, 4, 5, 6])
print(arr.shape)   # (6,)

(6,)


 ### 2. Reshape
Changes the shape of an array without changing data.
Total number of elements must remain the same.

In [74]:
reshaped = arr.reshape(2, 3)
print(reshaped)
# [[1 2 3]
#  [4 5 6]]

[[1 2 3]
 [4 5 6]]


 ### 3. Using -1 (Automatic Dimension)

In [75]:
arr.reshape(3, -1)

array([[1, 2],
       [3, 4],
       [5, 6]])
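
A short sketch of how `-1` is inferred, and what happens when the sizes do not divide evenly. The array `six` is illustrative.

```python
import numpy as np

six = np.arange(1, 7)             # 6 elements

print(six.reshape(2, -1).shape)   # (2, 3): -1 is inferred as 6 / 2
print(six.reshape(-1, 1).shape)   # (6, 1): a single column

try:
    six.reshape(4, -1)            # 6 is not divisible by 4
except ValueError as err:
    print("cannot reshape:", err)
```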

 ### 4. Flatten
Converts multi-dimensional array into a 1D array (returns copy).

In [76]:
arr2d = np.array([[1, 2], [3, 4]])

flat = arr2d.flatten()
print(flat)   # [1 2 3 4]

[1 2 3 4]


 ### 5. Ravel
Converts array to 1D (returns view when possible).

In [77]:
ravel_arr = arr2d.ravel()
print(ravel_arr)

[1 2 3 4]
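
The copy/view difference between `flatten()` and `ravel()` can be checked with `np.shares_memory`. A minimal sketch with an illustrative array named `matrix`:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])

flat_copy = matrix.flatten()
flat_view = matrix.ravel()

print(np.shares_memory(matrix, flat_copy))   # False: flatten always copies
print(np.shares_memory(matrix, flat_view))   # True: contiguous input, so ravel returns a view

flat_view[0] = 100
print(matrix[0, 0])                          # 100: the write went through the view

# A transposed array is not C-contiguous, so ravel has to copy here.
print(np.shares_memory(matrix, matrix.T.ravel()))   # False
```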


 ### 6. Expand Dimensions
Adds a new axis to the array.

In [78]:
np.expand_dims(arr, axis=0)  # Row vector, shape (1, 6); not displayed, since only the last expression is echoed
np.expand_dims(arr, axis=1)  # Column vector, shape (6, 1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])
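
Since the cell above only displays its last expression, this sketch prints the shape of both results. `vec` is an illustrative name, and `np.newaxis` indexing is shown as an equivalent spelling.

```python
import numpy as np

vec = np.array([1, 2, 3, 4, 5, 6])

print(np.expand_dims(vec, axis=0).shape)   # (1, 6): row vector
print(np.expand_dims(vec, axis=1).shape)   # (6, 1): column vector
print(vec[np.newaxis, :].shape)            # (1, 6): same result via indexing
print(vec[:, np.newaxis].shape)            # (6, 1)
```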

 ### 7. Squeeze
Removes axes of length 1.

In [79]:
arr3d = np.array([[[1, 2, 3]]])
print(np.squeeze(arr3d))

[1 2 3]


 ### 8. Stacking Arrays

In [80]:
# Vertical stack: shape (2, 6); not displayed, since only the last expression is echoed
np.vstack((arr, arr))

# Horizontal stack: shape (12,)
np.hstack((arr, arr))

array([1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6])
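
A sketch comparing the shapes produced by the two stacking functions, with `np.concatenate` as the general form. The names `row` and `two_d` are illustrative.

```python
import numpy as np

row = np.array([1, 2, 3, 4, 5, 6])

v = np.vstack((row, row))
h = np.hstack((row, row))
print(v.shape)   # (2, 6): each input becomes a row
print(h.shape)   # (12,): inputs are joined end to end

# np.concatenate with an explicit axis covers both cases for 2D inputs.
two_d = row.reshape(2, 3)
print(np.concatenate((two_d, two_d), axis=0).shape)   # (4, 3)
print(np.concatenate((two_d, two_d), axis=1).shape)   # (2, 6)
```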

# 2.11 Conditional Selection on NumPy Array
 - Conditional selection in NumPy is used to **filter elements** from an array based on given conditions.  
 - It is widely used in **data analysis and preprocessing**.

---

 ### Key Points
 - Conditions return boolean arrays
 - Boolean indexing is fast and efficient
 - Use `np.where()` for conditional replacement
 - Widely used in EDA and data cleaning

### 1. Using Comparison Operators

In [81]:
arr = np.array([10, 25, 30, 15, 40])

result = arr > 20
print(result)
# [False  True  True False  True]

[False  True  True False  True]


 ### 2. Selecting Elements Based on Condition

In [82]:
print(arr[arr > 20])
# [25 30 40]

[25 30 40]


 ### 3. Multiple Conditions
Combine conditions with the element-wise operators `&` (and), `|` (or), and `~` (not), wrapping each condition in parentheses.

In [83]:
print(arr[(arr > 20) & (arr < 40)])
# [25 30]

[25 30]
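
The `|` and `~` operators work the same way as `&`. A minimal sketch, including a note on why the parentheses matter; `values` is an illustrative name.

```python
import numpy as np

values = np.array([10, 25, 30, 15, 40])

print(values[(values < 15) | (values > 30)])   # [10 40]
print(values[~(values > 20)])                  # [10 15]

# Without parentheses, & binds tighter than the comparisons:
# values > 20 & values < 40 is parsed as values > (20 & values) < 40,
# a chained comparison that raises ValueError on arrays.
```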


 ### 4. Using `np.where()`

In [84]:
result = np.where(arr > 20, arr, 0)
print(result)
# [ 0 25 30  0 40]

[ 0 25 30  0 40]
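
`np.where` has two calling forms. The sketch below shows both; the `"high"` and `"low"` labels are made up for illustration.

```python
import numpy as np

values = np.array([10, 25, 30, 15, 40])

# Three-argument form: choose between two alternatives element by element.
labels = np.where(values > 20, "high", "low")
print(labels)                     # ['low' 'high' 'high' 'low' 'high']

# One-argument form: a tuple of index arrays where the condition is True.
(idx,) = np.where(values > 20)
print(idx)                        # [1 2 4]
```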


 ### 5. Conditional Selection in 2D Array

In [85]:
arr2d = np.array([[10, 20, 30],
                  [40, 50, 60]])

print(arr2d[arr2d > 30])
# [40 50 60]

[40 50 60]


 ### 6. Replacing Values Using Condition

In [86]:
arr[arr < 20] = 0
print(arr)

[ 0 25 30  0 40]


 ### 7. Using `np.any()` and `np.all()`

In [87]:
print(np.any(arr > 50))   # False
print(np.all(arr >= 0))   # True

False
True


# 2.12 NumPy Array Broadcasting

Broadcasting in NumPy allows **arithmetic operations on arrays of different shapes**  
without explicitly reshaping them.  
It makes code **simpler, faster, and more memory-efficient**.

---

## What is Broadcasting?

Broadcasting conceptually **stretches the smaller array** so that it matches the shape  
of the larger array during element-wise operations, without actually copying its data.

---

## Broadcasting Rules

1. If the arrays have different numbers of dimensions, **prepend 1s** to the shape of the array with fewer dimensions.
2. Compared from the trailing (rightmost) dimension, two sizes are compatible when:
   - They are equal, or  
   - One of them is **1**
3. If any pair of sizes is not compatible, NumPy raises a `ValueError`.
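
The three rules can be written out as a small function. This is only a sketch: `broadcast_shape` is a hypothetical helper written for illustration, not part of NumPy, and the last line cross-checks it against NumPy's own `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shape tuples."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: left-pad the shorter shape with 1s.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        # Rule 2: sizes are compatible if equal, or if one of them is 1.
        if da == db or da == 1 or db == 1:
            result.append(db if da == 1 else da)
        else:
            # Rule 3: incompatible sizes.
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))     # (2, 3)
print(broadcast_shape((3, 1), (3,)))     # (3, 3)

# Cross-check against NumPy itself.
print(np.broadcast(np.empty((3, 1)), np.empty(3)).shape)   # (3, 3)
```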

---

 ### Key Points
 - Broadcasting avoids unnecessary loops
 - Improves performance and readability
 - Very important for data analytics & machine learning

### 1. Scalar Broadcasting

In [88]:
arr = np.array([1, 2, 3, 4])

print(arr + 10)
# [11 12 13 14]

[11 12 13 14]


 ### 2. Broadcasting with 1D and 2D Arrays

In [89]:
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

arr1d = np.array([10, 20, 30])

print(arr2d + arr1d)
# [[11 22 33]
#  [14 25 36]]

[[11 22 33]
 [14 25 36]]


 ### 3. Column-wise Broadcasting

In [90]:
col = np.array([[10],
                [20]])

print(arr2d + col)
# [[11 12 13]
#  [24 25 26]]

[[11 12 13]
 [24 25 26]]


 ### 4. Broadcasting with Different Shapes

In [91]:
A = np.array([[1],
              [2],
              [3]])

B = np.array([10, 20, 30])

print(A + B)

[[11 21 31]
 [12 22 32]
 [13 23 33]]


 ### 5. Broadcasting Error Example

In [93]:
A = np.array([1, 2, 3])
B = np.array([1, 2])

# Uncommenting the next line raises ValueError: shapes (3,) and (2,) cannot be broadcast together
# A + B
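
The failure can be observed safely by catching the exception. A minimal sketch, with lowercase names `a` and `b` chosen for illustration; it also shows one way to make the two arrays compatible by adding an axis.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2])

try:
    a + b
except ValueError as err:
    print(err)   # the message reports that shapes (3,) and (2,) cannot be broadcast together

# Adding an axis makes the shapes (3, 1) and (2,), which broadcast to (3, 2).
print((a[:, np.newaxis] + b).shape)   # (3, 2)
```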

## Summary Table: Broadcasting Examples

| Array A Shape | Array B Shape | Result Shape |
|---|---|---|
| `(4,)` | Scalar | `(4,)` |
| `(2, 3)` | `(3,)` | `(2, 3)` |
| `(2, 3)` | `(2, 1)` | `(2, 3)` |
| `(3, 1)` | `(3,)` | `(3, 3)` |