#### Understanding NumPy array axes
- **1D Array (1 axis)**
    ```
        arr = np.array([1, 2, 3, 4])
        # Shape: (4,)
        # axis 0 → goes along the elements
    ```
- **2D Array (2 axes)**
  ```
    A = np.array([[1, 2, 3],
                  [4, 5, 6]])
    # Shape: (2, 3)
    # axis 0 → goes DOWN (rows)
    # axis 1 → goes ACROSS (columns)

    #        ← axis 1 →
    #      col0 col1 col2
    #  ↓   [  1,   2,   3 ]  row 0
    # axis [  4,   5,   6 ]  row 1
    #  0
  ```       


In [None]:
import numpy as np

oned_arr = np.array([1, 2, 3, 4])
print(oned_arr.shape)  # (4,), not (1, 4): a shape of (4,) means a 1-D vector, not a 2-D matrix

print(np.sum(oned_arr, axis=0))  # sums over the only axis and returns a scalar -> 10


twod_arr = np.array([[1, 2, 3],
                     [4, 5, 6]])
print(twod_arr.shape)

print(np.sum(twod_arr, axis=0))  # sums along axis 0 (down the rows): one total per column, a 1D array -> [5 7 9]
print(np.sum(twod_arr, axis=1))  # sums along axis 1 (across the columns): one total per row, a 1D array -> [ 6 15]




(4,)
10
(2, 3)
[5 7 9]
[ 6 15]


#### Understanding axes in a 3D array
```
    B = np.array([[[1, 2, 3, 4],
                   [3, 4, 5, 6]],

                  [[5, 6, 7, 8],
                   [7, 8, 9, 2]],

                  [[5, 3, 4, 2],
                   [4, 5, 8, 6]]])
    # Shape: (3, 2, 4)
    # axis 0 → depth (which "layer")
    # axis 1 → rows (going down)
    # axis 2 → columns (going across)
```

In [None]:
B = np.array([[[1, 2, 3, 4],
               [3, 4, 5, 6]],

              [[5, 6, 7, 8],
               [7, 8, 9, 2]],

              [[5, 3, 4, 2],
               [4, 5, 8, 6]]])
print(B.shape)

print(np.sum(B, axis=0))  # sums along axis 0 (depth), returns a 2D array -> [[11 11 14 14] [14 17 22 14]]
print(np.sum(B, axis=1))  # sums along axis 1 (rows), returns a 2D array -> [[4 6 8 10] [12 14 16 10] [9 8 12 8]]
print(np.sum(B, axis=2))  # sums along axis 2 (columns), returns a 2D array -> [[10 18] [26 26] [14 23]]


(3, 2, 4)
[[11 11 14 14]
 [14 17 22 14]]
[[ 4  6  8 10]
 [12 14 16 10]
 [ 9  8 12  8]]
[[10 18]
 [26 26]
 [14 23]]
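
One way to keep the axes straight is to watch the shape: reducing along an axis removes that axis from the result's shape. The sketch below rebuilds the same `B` and checks this for each axis. The `keepdims=True` call at the end is an optional extra, not used elsewhere in these notes, that keeps the reduced axis with size 1.

```python
import numpy as np

B = np.array([[[1, 2, 3, 4],
               [3, 4, 5, 6]],
              [[5, 6, 7, 8],
               [7, 8, 9, 2]],
              [[5, 3, 4, 2],
               [4, 5, 8, 6]]])   # shape (3, 2, 4)

# Summing along an axis drops that axis from the shape.
shapes = {axis: np.sum(B, axis=axis).shape for axis in range(B.ndim)}
print(shapes)   # {0: (2, 4), 1: (3, 4), 2: (3, 2)}

# keepdims=True keeps the reduced axis as a size-1 axis instead of dropping it.
kept_shape = np.sum(B, axis=0, keepdims=True).shape
print(kept_shape)   # (1, 2, 4)
```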


##### Quick recap on Axis

`arr = np.random.rand(3, 4, 5, 6)  # 4D array`

The shape tells you the axes:

    (3, 4, 5, 6)
     ↑  ↑  ↑  ↑
     │  │  │  └─ axis 3 (size 6)
     │  │  └──── axis 2 (size 5)
     │  └─────── axis 1 (size 4)
     └────────── axis 0 (size 3)

- `axis=0` → operate along the first dimension
- `axis=-1` → operate along the last dimension (same as `axis=3` here)
- `axis=-2` → operate along the second-to-last dimension (same as `axis=2` here)
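
A quick check of the negative-axis rule, as a minimal sketch. It uses `np.arange(...).reshape(...)` instead of `np.random.rand`, a substitution made here only so the values are reproducible.

```python
import numpy as np

# Deterministic 4D array with the same shape as in the recap.
arr = np.arange(3 * 4 * 5 * 6).reshape(3, 4, 5, 6)

last = np.sum(arr, axis=-1)          # reduce the last axis (size 6)
second_last = np.sum(arr, axis=-2)   # reduce the second-to-last axis (size 5)

print(last.shape)          # (3, 4, 5)
print(second_last.shape)   # (3, 4, 6)

# Negative axes count from the end, so they match their positive counterparts.
same_last = np.array_equal(last, np.sum(arr, axis=3))
same_second_last = np.array_equal(second_last, np.sum(arr, axis=2))
print(same_last, same_second_last)   # True True
```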

### Dot Product

##### `np.dot()`, `np.matmul()`, and `@`
The rules below describe `np.dot(a, b)`. `np.matmul` and `@` give the same results for 1-D and 2-D inputs, but they do not accept scalars.

- If both `a` and `b` are 1-D arrays, it is the inner product of the vectors (the ordinary vector dot product).
    - `np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))` -> `1*4 + 2*5 + 3*6 = 32`
- If both `a` and `b` are 2-D arrays, it is matrix multiplication.
- If either `a` or `b` is 0-D (a scalar), it is equivalent to multiplying every element of the other array by that scalar.
- If `a` is an N-D array and `b` is a 1-D array, it is a sum product over
      the last axis of `a` and `b`.

In [None]:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
p = np.dot(a, b)        # 1*4 + 2*5 + 3*6 = 32
print(p)
print(a @ b)
print(np.matmul(a, b))


32
32
32


In [5]:
# A -> 3 x 2
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
# B -> 2 x 3
B = np.array([[2,3,4],
              [5,6,7]])
# C -> 3 x 3
C = np.dot(A, B)
print(C)
C = np.matmul(A, B)
print(C)
C = A @ B
print(C)


[[12 15 18]
 [26 33 40]
 [40 51 62]]
[[12 15 18]
 [26 33 40]
 [40 51 62]]
[[12 15 18]
 [26 33 40]
 [40 51 62]]
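
To see where each entry of `C` comes from, here is a small sketch that rebuilds the product by hand: entry `C[i, j]` is the dot product of row `i` of `A` with column `j` of `B`. The name `C_manual` is introduced here only for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])        # shape (3, 2)
B = np.array([[2, 3, 4],
              [5, 6, 7]])     # shape (2, 3)

# Inner dimensions must match: (3, 2) @ (2, 3) -> (3, 3)
C_manual = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        # row i of A dotted with column j of B
        C_manual[i, j] = np.dot(A[i, :], B[:, j])

print(C_manual[0, 0])                    # 1*2 + 2*5 = 12
print(np.array_equal(C_manual, A @ B))   # True
```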


In [None]:
# A -> 3 x 2
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
# B -> scalar
B = 2
C = np.dot(A, B)
print(C)
# C = np.matmul(A, B) -> raises an error: matmul does not accept scalar operands
# C = A @ B           -> raises an error for the same reason; use np.dot(A, B) or A * B instead


[[ 2  4]
 [ 6  8]
 [10 12]]


In [10]:
A = np.array([1,2])
B = 2
C = np.dot(A, B)
print(C)
# C = np.matmul(A, B) -> raises an error: matmul does not accept scalar operands
# C = A @ B           -> raises an error for the same reason; use np.dot(A, B) or A * B instead


[2 4]
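
The scalar case can be checked directly. In this sketch, `np.dot` with a scalar gives the same result as elementwise multiplication, while `np.matmul` rejects a scalar operand and raises a `ValueError`. The variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# With a scalar, np.dot behaves like elementwise multiplication.
dot_result = np.dot(A, 2)
same_as_multiply = np.array_equal(dot_result, A * 2)
print(same_as_multiply)   # True

# np.matmul does not accept scalar (0-D) operands.
matmul_failed = False
try:
    np.matmul(A, 2)
except ValueError as exc:
    matmul_failed = True
    print("matmul raised ValueError:", exc)
```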


##### If `a` is an N-D array and `b` is a 1-D array, it is a sum product over the last axis of `a` and `b`.

In [None]:
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])  # shape (3, 2)

B = np.array([2, 3])    # shape (2,)

# The last axis of A is axis 1 (the columns), which has size 2.
# Along that axis, A holds 3 vectors: [1, 2], [3, 4], [5, 6].
# Each of these vectors is dotted with B, which has shape (2,):
# result[0] = 1*2 + 2*3 = 8   # first row of A · B
# result[1] = 3*2 + 4*3 = 18  # second row of A · B
# result[2] = 5*2 + 6*3 = 28  # third row of A · B

result = np.dot(A, B)
print(result)  # -> [ 8 18 28]
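
The same rule applies when `a` has more than two dimensions. As a sketch, take a hypothetical 3-D array `T` of shape (3, 2, 4) and a 1-D `b` of shape (4,), both made up for illustration: `np.dot(T, b)` multiplies each length-4 vector along the last axis of `T` with `b` and sums, leaving shape (3, 2).

```python
import numpy as np

T = np.array([[[1, 2, 3, 4],
               [3, 4, 5, 6]],
              [[5, 6, 7, 8],
               [7, 8, 9, 2]],
              [[5, 3, 4, 2],
               [4, 5, 8, 6]]])   # shape (3, 2, 4)
b = np.array([1, 0, 0, 1])       # shape (4,)

result = np.dot(T, b)            # sum product over the last axis of T and b
print(result.shape)              # (3, 2)
print(result[0, 0])              # 1*1 + 2*0 + 3*0 + 4*1 = 5

# Equivalent formulation: broadcast-multiply by b, then sum over the last axis.
equivalent = np.array_equal(result, (T * b).sum(axis=-1))
print(equivalent)                # True
```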







### Intuitions

#### net_input calculation
- We have a matrix `X` of shape (n_samples, n_features).
- We initialize one weight per feature, so `W` has shape (n_features,).
- We need to calculate the net input for each sample.
  - For example, `X[0]` is one sample, with shape (n_features,).
  - Multiplying `X[0]` with `W` via `np.dot(X[0], W)` gives one scalar.
    - We need the same calculation for `X[1]`, `X[2]`, and so on, so that every sample gets its own net_input.
    - `np.dot(X, W)`, where `X` is (n_samples, n_features) and `W` is (n_features,), does this for all samples at once and returns shape (n_samples,). It is not (n_samples, 1): with a 1-D `W`, NumPy returns a 1-D result.
- Mathematically this is xᵀw for each sample x, but NumPy doesn't need you to write `x.T` because 1-D arrays have no row/column orientation.
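
The shape argument above can be sketched in a few lines. The values of `X` and `w` below are made up for illustration. The sketch computes the net input per sample with a loop and then with a single `np.dot(X, w)` call, and checks that both agree and have shape `(n_samples,)`. A bias term is left out, as in the text.

```python
import numpy as np

# Illustrative data: 3 samples, 2 features (values are made up).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])     # shape (n_samples, n_features) = (3, 2)
w = np.array([0.5, -1.0])      # shape (n_features,) = (2,)

# One scalar per sample, computed sample by sample.
net_input_loop = np.array([np.dot(X[i], w) for i in range(X.shape[0])])

# The same thing in one call.
net_input = np.dot(X, w)

print(net_input)         # [-1.5 -2.5 -3.5]
print(net_input.shape)   # (3,)
print(np.allclose(net_input, net_input_loop))   # True
```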

#### gradient update
- We found the net_input for each sample; it has shape (n_samples,).
- The errors are `errors = (y - net_input)`.
- `errors` also has shape (n_samples,): each sample has one error.
- Our goal is to find the gradient for **each feature**, so that each weight is updated by its own gradient. So we need a gradient of shape (n_features,).
- `X` has shape (n_samples, n_features).
- `errors` has shape (n_samples,), one error per sample.
- **Now let's see how each feature and its weight affected the error**
  - In other words: **How correlated is this feature with the errors?**
  - Or more specifically: **"When this feature is large, do the errors tend to be large? When it's small, are errors small?"**
- Suppose we have 3 samples and look at a single feature, feature_0.
  -  `feature_0 = [1, 4, 7]   # Feature 0 across all samples`
  -  `errors = [0.5, -0.3, 0.2] # Errors for all samples`
  -  `feature_0 · errors = 1*0.5 + 4*(-0.3) + 7*0.2  # Dot product = 0.7`
  -  **Let's analyze each multiplication:**
     - **Sample 1**: `1 * 0.5 = 0.5`
       - Feature value: 1
       - Error: +0.5 (predicted too low; recall that error = actual target - predicted value)
       - **Interpretation**: "In sample 1, feature 0 had value 1, and we under-predicted. Since the feature value was positive, maybe we should INCREASE weight 0 to predict higher next time."
     - **Sample 2**: `4 * (-0.3) = -1.2`
       - Feature value: 4
       - Error: -0.3 (predicted too high)
       - **Interpretation**: "In sample 2, feature 0 had value 4 (larger!), and we over-predicted. Since the feature value was positive AND large, we should DECREASE weight 0 significantly to predict lower next time."
     - **Sample 3**: `7 * 0.2 = 1.4`
       - Feature value: 7
       - Error: +0.2 (predicted too low)
       - **Interpretation**: "In sample 3, feature 0 had value 7 (even larger!), and we under-predicted. Since the feature value was large and positive, we should INCREASE weight 0 a lot."
     - **Overall**: Sample 1 says increase the weight for feature 0, sample 2 says decrease it, and sample 3 says increase it. (`W[0]` is the weight for feature 0.)
       - **The sum (0.5 - 1.2 + 1.4 = 0.7):**
         - This total tells you the net direction and strength of the update:
         - **Positive result** (+0.7): "Overall, across all samples, increasing this weight will reduce errors"
         - **Negative result**: "Overall, decreasing this weight will reduce errors"
         - **Large magnitude**: "This feature has strong correlation with errors - adjust it a lot!"
         - **Small magnitude**: "This feature doesn't correlate much with errors - adjust it a little"
       - **Notice**: the size of this sum depends on the number of samples. With 100K samples, the sum of all `feature_0 * error` terms can become very large, so we take the mean by dividing by the number of samples.
- To summarize what we have:
  - `X` has shape (n_samples, n_features).
  - `errors` has shape (n_samples,), one error per sample.
  - To get the gradient for each weight, we need the dot product of each feature vector (e.g. feature_0 across all samples) with the error vector (one error per sample).
  - Xᵀ has shape (n_features, n_samples); each row holds one feature's values across all samples.
  - `X.T.dot(errors)` takes the dot product of each feature row with the error vector, so the result has shape (n_features,). Each entry is positive or negative, and the corresponding weight is adjusted accordingly.
  - Take the mean: **X.T.dot(errors) / X.shape[0]**, where `X.shape[0]` is the number of rows, i.e. the number of samples.
  - We should not use this value directly to update the weights, because it only indicates direction and magnitude. We multiply it by the learning rate: **self.eta * 2.0 * X.T.dot(errors) / X.shape[0]**. This gives a smaller step, which is added to w. The factor 2.0 comes from differentiating the squared error (assuming a mean-squared-error loss).
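
Putting the steps above together, as a minimal sketch. `feature_0` and `errors` are the values from the example; the other two columns of `X`, the starting weights `w`, and the learning rate `eta` are made up for illustration. The update line mirrors the `self.eta * 2.0 * X.T.dot(errors) / X.shape[0]` expression from the text, without the surrounding class.

```python
import numpy as np

# Column 0 is feature_0 = [1, 4, 7] from the example; the other columns are made up.
X = np.array([[1.0, 0.0, 2.0],
              [4.0, 1.0, 0.0],
              [7.0, 0.0, 1.0]])          # shape (n_samples, n_features) = (3, 3)
errors = np.array([0.5, -0.3, 0.2])     # shape (n_samples,)

# One dot product per feature: row k of X.T is feature k across all samples.
feature_error_sums = X.T.dot(errors)    # shape (n_features,)
print(feature_error_sums.shape)         # (3,)
print(np.isclose(feature_error_sums[0], 0.7))   # True: 0.5 - 1.2 + 1.4

# Divide by the number of samples so the scale does not grow with dataset size.
mean_per_feature = feature_error_sums / X.shape[0]

# Scale by the learning rate (and the factor 2.0 from the text) and add to the weights.
eta = 0.01                               # assumed learning rate
w = np.zeros(X.shape[1])                 # assumed starting weights
w = w + eta * 2.0 * mean_per_feature
print(w.shape)                           # (3,)
```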
