In [3]:
import numpy as np

## Statistical Functions
- `np.percentile(arr, q)` - Percentile of the array.
- `np.median(arr)` - Median value.
- `np.mean(arr)` - Mean (average) value.
- `np.std(arr)` - Standard deviation.
- `np.var(arr)` - Variance.
- `np.corrcoef(arr)` - Correlation coefficient matrix.
- `np.histogram(arr)` - Computes the histogram.

### `np.percentile(arr, q)` - Percentile of the array.
#### The q-th percentile is the value below which about q percent of the data falls. When that value lies between two data points, NumPy interpolates linearly by default.

In [5]:
arr = np.array([1, 2, 3, 4, 5])
percentile = np.percentile(arr, 60)
print(percentile)

3.4
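The result 3.4 comes from NumPy's default linear interpolation (`method="linear"`; NumPy releases before 1.22 call this parameter `interpolation`). The q-th percentile is placed at the fractional position `q/100 * (N - 1)` in the sorted data, and the value is interpolated between the two neighbouring points. The sketch below reproduces the calculation by hand for q = 60 only. Names such as `pos` and `frac` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
q = 60

sorted_arr = np.sort(arr)
# Fractional position of the q-th percentile among indices 0 .. N-1
pos = (q / 100) * (len(sorted_arr) - 1)   # 0.6 * 4 = 2.4
lower = int(np.floor(pos))                # index 2 holds the value 3
frac = pos - lower                        # 0.4 of the way to the next value
# Linear interpolation between sorted_arr[2] and sorted_arr[3]
manual = sorted_arr[lower] + frac * (sorted_arr[lower + 1] - sorted_arr[lower])

print(manual)
print(np.percentile(arr, q))
```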


### `np.median(arr)` - Median value.

In [6]:
median_value = np.median(arr)
print(median_value)

3.0
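With an odd number of elements the median is simply the middle value. With an even number there is no single middle value, so `np.median` returns the mean of the two middle values. A short sketch using hypothetical data `even_arr`:

```python
import numpy as np

even_arr = np.array([7, 1, 3, 5])   # sorted: [1, 3, 5, 7]
print(np.median(even_arr))          # 4.0, the mean of 3 and 5
```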


### `np.mean(arr)` - Mean (average) value.

In [7]:
mean_value = np.mean(arr)
print(mean_value)

3.0
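For multi-dimensional arrays, `np.mean` averages every element unless you pass `axis`. The sketch below uses a hypothetical 2×3 array `matrix`. The same `axis` argument also works with `np.median`, `np.std` and `np.var`.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.mean(matrix))          # 3.5, mean of all six values
print(np.mean(matrix, axis=0))  # [2.5 3.5 4.5], mean of each column
print(np.mean(matrix, axis=1))  # [2. 5.], mean of each row
```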


### `np.std(arr)` - Standard deviation.
#### Standard Deviation is a measure of how spread out the numbers in a data set are. It tells you how much the values in a dataset vary from the mean (average).
#### A low standard deviation means that most of the numbers are close to the mean (less spread out).
#### A high standard deviation means that the numbers are spread out over a wider range.



### Standard Deviation (σ) Formula:

$$
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
$$

Where:
- **σ** = Standard deviation
- **N** = Number of data points
- **xᵢ** = Each individual data point
- **μ** = Mean (average) of all data points
- **Σ** = Summation (sum of all squared differences)


### Example: Calculating Standard Deviation

We have the following data points: **[2, 4, 4, 4, 5, 5, 7, 9]**.

#### Step-by-Step Solution:

1. **Find the mean (μ)**:

   Mean (μ) = $$\frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = 5$$

2. **Subtract the mean from each data point** and square the result:

   - $$(2 - 5)^2 = 9$$
   - $$(4 - 5)^2 = 1$$
   - $$(4 - 5)^2 = 1$$
   - $$(4 - 5)^2 = 1$$
   - $$(5 - 5)^2 = 0$$
   - $$(5 - 5)^2 = 0$$
   - $$(7 - 5)^2 = 4$$
   - $$(9 - 5)^2 = 16$$

3. **Find the average of these squared differences**:

   $$\frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = \frac{32}{8} = 4$$

4. **Take the square root of this average**:

   $$\sigma = \sqrt{4} = 2$$

Thus, the **standard deviation (σ)** is **2**.


In [9]:
std_deviation = np.std(arr)
print(std_deviation)

1.4142135623730951
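The cell above runs on `arr = [1, 2, 3, 4, 5]`, which is not the worked example's data. The sketch below follows the formula step by step on the worked example and reproduces σ = 2. By default `np.std` divides by N (population standard deviation, `ddof=0`), which matches the formula above. Passing `ddof=1` divides by N - 1 instead and gives the sample standard deviation.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Step-by-step, following the formula above
mu = data.mean()                     # 5.0
squared_diffs = (data - mu) ** 2     # [9. 1. 1. 1. 0. 0. 4. 16.]
manual_std = np.sqrt(squared_diffs.mean())

print(manual_std)            # 2.0
print(np.std(data))          # 2.0 (population, divides by N)
print(np.std(data, ddof=1))  # sample std, divides by N - 1 = 7
```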


### Mean Absolute Deviation (MAD) Formula

$$
MAD = \frac{1}{N} \sum_{i=1}^{N} |x_i - \mu|
$$

Where:
- **MAD** = Mean Absolute Deviation
- **N** = Number of data points
- **xᵢ** = Each individual data point
- **μ** = Mean (average) of the data points
- **Σ** = Summation (sum of absolute deviations)

### Example: Calculating MAD

We have the following data points: **[2, 4, 4, 4, 5, 5, 7, 9]**.

#### Step-by-Step Solution:

1. **Find the mean (μ)**:

   Mean (μ) = $$\frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = 5$$

2. **Find the absolute deviations** from the mean:

   - $$|2 - 5| = 3$$
   - $$|4 - 5| = 1$$
   - $$|4 - 5| = 1$$
   - $$|4 - 5| = 1$$
   - $$|5 - 5| = 0$$
   - $$|5 - 5| = 0$$
   - $$|7 - 5| = 2$$
   - $$|9 - 5| = 4$$

3. **Find the average of these absolute deviations**:

   $$MAD = \frac{3 + 1 + 1 + 1 + 0 + 0 + 2 + 4}{8} = \frac{12}{8} = 1.5$$

Thus, the **Mean Absolute Deviation (MAD)** is **1.5**.
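NumPy does not appear to ship a dedicated function for the mean absolute deviation as defined above. You can build it from `np.abs` and `np.mean` instead. A minimal sketch reproducing the worked example:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mu = np.mean(data)              # 5.0
abs_devs = np.abs(data - mu)    # [3. 1. 1. 1. 0. 0. 2. 4.]
mad = np.mean(abs_devs)

print(mad)  # 1.5
```

Here MAD (1.5) is smaller than σ (2). For any data set, MAD is at most σ, because squaring gives larger deviations more weight.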


### `np.var(arr)` - Variance.
#### Variance measures how far the data points spread from the mean: it is the average of the squared deviations from the mean, and the standard deviation is its square root.

### Variance Formula

$$
\text{Variance} (\sigma^2) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
$$

Where:
- **Variance** ($\sigma^2$) = Variance of the data
- **N** = Number of data points
- **xᵢ** = Each individual data point
- **μ** = Mean (average) of the data points
- **Σ** = Summation (sum of squared deviations)

### Example: Calculating Variance

We have the following data points: **[2, 4, 4, 4, 5, 5, 7, 9]**.

#### Step-by-Step Solution:

1. **Find the mean (μ)**:

   Mean (μ) = $$\frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = 5$$

2. **Subtract the mean from each data point** and square the result:

   - $$(2 - 5)^2 = 9$$
   - $$(4 - 5)^2 = 1$$
   - $$(4 - 5)^2 = 1$$
   - $$(4 - 5)^2 = 1$$
   - $$(5 - 5)^2 = 0$$
   - $$(5 - 5)^2 = 0$$
   - $$(7 - 5)^2 = 4$$
   - $$(9 - 5)^2 = 16$$

3. **Find the average of these squared deviations**:

   $$\text{Variance} = \frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8} = \frac{32}{8} = 4$$

Thus, the **variance** is **4**.


In [10]:
variance = np.var(arr)
print(variance)

2.0
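As with `np.std`, the cell above uses `arr = [1, 2, 3, 4, 5]`. The sketch below applies `np.var` to the worked example's data to reproduce the variance of 4. It also checks that the variance equals the standard deviation squared. `ddof=1` again switches to the sample version, which divides by N - 1.

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print(np.var(data))          # 4.0, matching the worked example
print(np.std(data) ** 2)     # 4.0, variance is the square of the standard deviation
print(np.var(data, ddof=1))  # sample variance, 32 / 7
```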


### `np.corrcoef(arr)` - Correlation coefficient matrix.
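#### A correlation coefficient matrix is a table of the (Pearson) correlation coefficients between every pair of variables in a dataset. Entry [i, j] is the correlation between variable i and variable j, so the diagonal is always 1.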
#### The correlation coefficient ranges from -1 to 1:
#### +1: Perfect positive correlation (when one variable increases, the other increases in a perfectly linear manner).
#### -1: Perfect negative correlation (when one variable increases, the other decreases in a perfectly linear manner).
#### 0: No linear correlation (there's no predictable linear relationship between the two variables).




In [11]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
correlation = np.corrcoef(arr1, arr2)
print(correlation)

[[1. 1.]
 [1. 1.]]
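The coefficient between the two inputs is the off-diagonal entry `[0, 1]`. The sketch below uses hypothetical arrays `x` and `y`, where `y` falls as `x` rises, to show a perfect negative correlation.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([6, 5, 4])   # decreases linearly as x increases

matrix = np.corrcoef(x, y)
r = matrix[0, 1]          # off-diagonal entry: correlation between x and y
print(matrix)
print(r)                  # -1.0, perfect negative correlation
```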


### `np.histogram(arr)` - Computes the histogram: the count of values in each bin, plus the bin edges.

In [12]:
histogram, bin_edges = np.histogram(arr, bins=3)
print(histogram)
print(bin_edges)

[2 1 2]
[1.         2.33333333 3.66666667 5.        ]
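`bins` can also be a sequence of bin edges instead of a bin count. Each bin is half-open, `[left, right)`, except the last one, which also includes its right edge. A sketch using the worked-example data and hypothetical edges `[0, 3, 6, 9]`:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
counts, edges = np.histogram(data, bins=[0, 3, 6, 9])

# [0, 3) holds 2; [3, 6) holds 4, 4, 4, 5, 5; [6, 9] holds 7 and 9
print(counts)  # [1 5 2]
```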


## Other Functions

### `np.equal()` - Compares two arrays element-wise.

In [13]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 4])

result = np.equal(arr1, arr2)
print(result)

[ True  True False]
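The `==` operator on NumPy arrays performs the same element-wise comparison as `np.equal`. Both also broadcast, so an array can be compared against a single scalar. A short sketch:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 4])

# == on arrays is element-wise, like np.equal
print(arr1 == arr2)       # [ True  True False]

# Broadcasting: compare every element against the scalar 2
print(np.equal(arr1, 2))  # [False  True False]
```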


### `np.array_equal()` - Compares two arrays in their entirety.
#### Returns a single boolean value (True or False) indicating whether the two arrays are exactly the same, including their shape and elements.

In [14]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 2, 4])

result = np.array_equal(arr1, arr2)
print(result)

False
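Because `np.array_equal` also checks shape, it can disagree with `(a == c).all()`. In that expression, `==` broadcasts a `(3,)` array against a `(1, 3)` array before comparing. A sketch with hypothetical arrays `a`, `b` and `c`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
c = np.array([[1, 2, 3]])   # same values, but shape (1, 3) instead of (3,)

print(np.array_equal(a, b))  # True
print(np.array_equal(a, c))  # False, shapes differ
print((a == c).all())        # True, because == broadcasts a against c
```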


### `arr.flatten()` - Converts a multi-dimensional array into a one-dimensional array (similar to `ravel()`). Note that `flatten` is an array method, not a top-level `np.flatten()` function.

In [16]:
arr = np.array([[1, 2, 3], 
                [4, 5, 6]])

# Flatten the array
flat_arr = arr.flatten()

print(flat_arr)

[1 2 3 4 5 6]
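`flatten()` reads the elements row by row (`order='C'`) by default. Passing `order='F'` reads them column by column instead. A sketch on the same 2×3 array:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.flatten())           # [1 2 3 4 5 6], row by row (default order='C')
print(arr.flatten(order='F'))  # [1 4 2 5 3 6], column by column
```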


### Array slicing in NumPy allows you to extract a subset of elements from an array.
### `arr[start:stop:step]` (`start` is included, `stop` is excluded)

In [17]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Slicing elements from index 2 to 5
slice_1 = arr[2:6]
print(slice_1)
# Output: [2 3 4 5]

# Slicing from the beginning to index 4
slice_2 = arr[:5]
print(slice_2)
# Output: [0 1 2 3 4]

# Slicing from index 4 to the end
slice_3 = arr[4:]
print(slice_3)
# Output: [4 5 6 7 8 9]

# Slicing with a step of 2
slice_4 = arr[1:8:2]
print(slice_4)
# Output: [1 3 5 7]

# Reverse slicing (negative step)
reverse_slice = arr[::-1]
print(reverse_slice)
# Output: [9 8 7 6 5 4 3 2 1 0]


[2 3 4 5]
[0 1 2 3 4]
[4 5 6 7 8 9]
[1 3 5 7]
[9 8 7 6 5 4 3 2 1 0]
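Slice bounds can also be negative, in which case they count from the end of the array (`-1` is the last element). A short sketch on the same array:

```python
import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(arr[-3:])     # [7 8 9], the last three elements
print(arr[:-2])     # [0 1 2 3 4 5 6 7], everything except the last two
print(arr[-1::-3])  # [9 6 3 0], start at the end and step backwards by 3
```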


### 2D Matrix
### `arr[row_start:row_end, col_start:col_end]`

In [18]:
# 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Slice the first two rows and all columns
slice_5 = arr_2d[:2, :]
print(slice_5)
# Output:
# [[1 2 3]
#  [4 5 6]]

# Slice all rows and the first two columns
slice_6 = arr_2d[:, :2]
print(slice_6)
# Output:
# [[1 2]
#  [4 5]
#  [7 8]]

# Slice rows 1 onward and columns 1 onward
slice_7 = arr_2d[1:, 1:]
print(slice_7)
# Output:
# [[5 6]
#  [8 9]]


[[1 2 3]
 [4 5 6]]
[[1 2]
 [4 5]
 [7 8]]
[[5 6]
 [8 9]]


In [21]:

# Creating a 2D array
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# Slice rows 0 to 1 and columns 1 to 2
slice_1 = arr[0:2, 1:3]
print(slice_1)
# Output:
# [[2 3]
#  [6 7]]

# Slice all rows and columns 1 to 2
slice_2 = arr[:, 1:3]
print(slice_2)
# Output:
# [[ 2  3]
#  [ 6  7]
#  [10 11]]

# Slice rows 1 to 2 and all columns
slice_3 = arr[1:3, :]
print(slice_3)
# Output:
# [[ 5  6  7  8]
#  [ 9 10 11 12]]

# Reverse the row order
reverse_rows = arr[::-1, :]
print(reverse_rows)
# Output:
# [[ 9 10 11 12]
#  [ 5  6  7  8]
#  [ 1  2  3  4]]

# Reverse the column order
reverse_cols = arr[:, ::-1]
print(reverse_cols)
# Output:
# [[ 4  3  2  1]
#  [ 8  7  6  5]
#  [12 11 10  9]]


[[2 3]
 [6 7]]
[[ 2  3]
 [ 6  7]
 [10 11]]
[[ 5  6  7  8]
 [ 9 10 11 12]]
[[ 9 10 11 12]
 [ 5  6  7  8]
 [ 1  2  3  4]]
[[ 4  3  2  1]
 [ 8  7  6  5]
 [12 11 10  9]]
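Mixing an integer index with a slice changes the result's shape. An integer drops that dimension, while a length-1 slice keeps it. A sketch on the same 3×4 array:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

row_int = arr[1, :]      # integer index drops the row dimension
row_slice = arr[1:2, :]  # a length-1 slice keeps it

print(row_int, row_int.shape)      # [5 6 7 8] (4,)
print(row_slice, row_slice.shape)  # [[5 6 7 8]] (1, 4)
```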


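## `ravel()` vs `flatten()`
### `ravel()` - Returns a flattened 1D array that is a view of the original whenever possible (for example, when the array is contiguous). In that case, changes to the raveled array affect the original array; otherwise `ravel()` returns a copy.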

In [22]:
arr = np.array([[1, 2], [3, 4]])

# Using ravel
print(arr)
raveled = arr.ravel()
raveled[0] = 10  # Changes will reflect in the original array
print(arr)  

[[1 2]
 [3 4]]
[[10  2]
 [ 3  4]]


### `flatten()` - Always returns a new flattened 1D array (a copy), so changes made to the flattened array won't affect the original array.

In [24]:
arr = np.array([[1, 2], [3, 4]])
flattened = arr.flatten()
flattened[0] = 20  # Changes will NOT affect the original array
print(arr)

[[1 2]
 [3 4]]
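`np.shares_memory` lets you check whether a result shares data with the original instead of guessing. The sketch below also shows a case where `ravel()` has to copy. The transpose `arr.T` is not laid out row by row in memory, so a row-major flattening of it cannot be a view.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])

print(np.shares_memory(arr, arr.ravel()))    # True: ravel() returned a view
print(np.shares_memory(arr, arr.flatten()))  # False: flatten() always copies

# arr.T is not C-contiguous, so ravel() must copy to produce row-major order
print(np.shares_memory(arr, arr.T.ravel()))  # False
```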


## View (Shallow Copy):
### A view is a new array object that references the same data as the original array.
### Changing an element through the view also changes the original array, since both arrays share the same underlying data.



In [27]:
arr = np.array([1, 2, 3, 4])
view_arr = arr.view()  # Creates a view (shallow copy)

view_arr[0] = 10
print(arr)

[10  2  3  4]


## Copy (Deep Copy):
### A copy creates a completely new array with its own separate data.
### Changes made to the copy will not affect the original array because the data is stored independently.


In [26]:
arr = np.array([1, 2, 3, 4])
copy_arr = arr.copy()  # Creates a copy (deep copy)

copy_arr[0] = 10
print(arr)  # Output: [1 2 3 4] (Original array is NOT modified)


[1 2 3 4]
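An array's `.base` attribute shows whether it borrows its data. A view's `.base` points to the array that owns the data, while a copy owns its data and has `.base` set to `None`. Basic slicing also produces a view, which is why the slices earlier in this notebook behave like `view()`.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
view_arr = arr.view()
copy_arr = arr.copy()

print(view_arr.base is arr)   # True: the view borrows arr's data
print(copy_arr.base is None)  # True: the copy owns its own data
print(arr[1:3].base is arr)   # True: basic slicing also returns a view
```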


## Boolean Masking (Filtering by Condition)

In [28]:
arr = np.array([1, 2, 3, 4, 5, 6])

# Filter elements greater than 3
filtered_arr = arr[arr > 3]
print(filtered_arr) 

[4 5 6]
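The expression `arr > 3` produces a boolean array (the mask), and indexing with it keeps the elements where the mask is `True`. Combine conditions with `&` (and) or `|` (or). Each condition needs parentheses because these operators bind more tightly than comparisons. A mask can also be used to assign values in place.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

mask = arr > 3
print(mask)  # [False False False  True  True  True]

# Combine conditions with & (and) / | (or); the parentheses are required
even_above_2 = arr[(arr > 2) & (arr % 2 == 0)]
print(even_above_2)  # [4 6]

# Assign through a mask: set every element greater than 4 to 0
arr[arr > 4] = 0
print(arr)  # [1 2 3 4 0 0]
```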


### `np.insert(arr, index, values, axis=None)` - insert values into an array at a specified index.

In [29]:
arr = np.array([1, 2, 3, 4])
result = np.insert(arr, 2, 99)  # Insert 99 at index 2
print(result)

[ 1  2 99  3  4]
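`np.insert` returns a new array and leaves the input unchanged. With `axis` given, it inserts a whole row or column. Without `axis`, the input is flattened first. A sketch on a hypothetical 2×2 array `grid`:

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4]])

# Insert a new row before row index 1
with_row = np.insert(grid, 1, [9, 9], axis=0)
print(with_row)
# [[1 2]
#  [9 9]
#  [3 4]]

# Without axis, grid is flattened before inserting
print(np.insert(grid, 1, 9))  # [1 9 2 3 4]
```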


### `np.unique(arr)` - returns the sorted unique elements of an array.

In [30]:
arr = np.array([1, 2, 2, 3, 3, 3, 4])
result = np.unique(arr)
print(result)

[1 2 3 4]
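Passing `return_counts=True` to `np.unique` also returns how often each unique value occurs, which gives a quick frequency count:

```python
import numpy as np

arr = np.array([1, 2, 2, 3, 3, 3, 4])
values, counts = np.unique(arr, return_counts=True)

print(values)  # [1 2 3 4]
print(counts)  # [1 2 3 1], occurrences of each value
```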


### `np.append(arr, values, axis=None)` - adds values to the end of an array and returns a new array.

In [31]:
arr = np.array([1, 2, 3])
result = np.append(arr, [4, 5])
print(result)

[1 2 3 4 5]
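For multi-dimensional arrays, the `axis` argument matters. With the default `axis=None`, both inputs are flattened. With `axis=0`, the values are added as a new row, and they must then have the same number of dimensions (hence `[[5, 6]]` rather than `[5, 6]`). A sketch on a hypothetical array `grid`:

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4]])

print(np.append(grid, [[5, 6]]))  # [1 2 3 4 5 6], flattened because axis=None
print(np.append(grid, [[5, 6]], axis=0))
# [[1 2]
#  [3 4]
#  [5 6]]
```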


### `np.delete(arr, obj, axis=None)` - removes elements from an array at the specified index

In [33]:
arr = np.array([1, 2, 3, 4, 5])
result = np.delete(arr, 2)  # Removes the element at index 2 (which is 3)
print(result)

[1 2 4 5]


In [34]:
arr = np.array([1, 2, 3, 4, 5])
result = np.delete(arr, [1, 3])  # Removes elements at indices 1 and 3 (which are 2 and 4)
print(result)

[1 3 5]


In [35]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result = np.delete(arr, 1, axis=0)  # Deletes the second row
print(result) 

[[1 2 3]
 [7 8 9]]


In [36]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result = np.delete(arr, 1, axis=1)  # Deletes the second column
print(result)

[[1 3]
 [4 6]
 [7 9]]


### `np.intersect1d(arr1, arr2)` - returns the sorted, unique values that are common to both arrays

In [39]:
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([4, 5, 6, 7, 8])

result = np.intersect1d(arr1, arr2)
print(result)

[4 5]
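With `return_indices=True`, `np.intersect1d` also returns where each common value appears in each input. For repeated values, it returns the first occurrence. A sketch on the same arrays:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([4, 5, 6, 7, 8])

common, idx1, idx2 = np.intersect1d(arr1, arr2, return_indices=True)
print(common)  # [4 5]
print(idx1)    # [3 4], positions of 4 and 5 in arr1
print(idx2)    # [0 1], positions of 4 and 5 in arr2
```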


### `np.setdiff1d(arr1, arr2)` - returns the unique values in the first array that are not present in the second array

In [40]:
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([4, 5, 6, 7, 8])

result = np.setdiff1d(arr1, arr2)
print(result)

[1 2 3]
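Unlike `np.intersect1d`, `np.setdiff1d` is not symmetric, so swapping the arguments gives a different result:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([4, 5, 6, 7, 8])

print(np.setdiff1d(arr1, arr2))  # [1 2 3], in arr1 but not in arr2
print(np.setdiff1d(arr2, arr1))  # [6 7 8], in arr2 but not in arr1
```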
