# Assignment 2: NumPy
- **you will learn:** how to create and manipulate NumPy arrays, perform vectorized computations, and use basic NumPy functions for data analysis
- **task:**  See section 2.9 below
- **deadline:** 20.10.2025
- [NumPy documentation](https://numpy.org/doc/stable/)
- 📝 **Reminder:** Sync your GitHub repository with the main course repository, update your project in PyCharm, and after completing the assignment, commit and push your changes back to GitHub.
---

## 2.0 PEP 8 and Code Commenting

### What is PEP 8?
**PEP 8** is the official **style guide** for Python code.
It defines conventions that make your code **clean, consistent, and easy to read**.
While not mandatory, following PEP 8 is considered a sign of **professional and readable coding**.

### Some Important PEP 8 Rules
- ✅ **Line length:** limit lines to **at most 79 characters**.
- ✅ **Spacing:**
  - add spaces around operators (`a + b`, not `a+b`)
  - add a space **after** commas, not before (`[1, 2, 3]`, not `[1 ,2 ,3]`)
- ✅ **Variable and function names:** use lowercase with underscores (`calculate_mean`, not `CalculateMean`).
- ✅ **Class names:** use `CamelCase` (`DataProcessor`).
- ✅ **Imports:** put them at the top of the file, each module on its own `import` line.
- ✅ **Blank lines:** use two blank lines between functions.

 **Official guide:** [PEP 8 – Style Guide for Python Code](https://peps.python.org/pep-0008/)
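
To make the rules above concrete, here is a minimal sketch that tries to follow them. The names `DataProcessor`, `scale_values`, and `calculate_spread` are invented for this illustration and are not part of the assignment.

```python
import math


class DataProcessor:
    """Hypothetical example class; class names use CamelCase."""

    def __init__(self, values):
        self.values = list(values)

    def scale_values(self, factor=2.0):
        """Return the stored values multiplied by ``factor``."""
        return [value * factor for value in self.values]


def calculate_spread(values):
    """Return the population standard deviation of ``values``."""
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return math.sqrt(variance)


processor = DataProcessor([1, 2, 3])
print(processor.scale_values())  # [2.0, 4.0, 6.0]
print(calculate_spread([1, 2, 3, 4]))
```

Note the two blank lines around top-level definitions, the spaces around operators, and the lowercase function names with underscores.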

### Comments and Docstrings
Comments explain **what** your code does and **why**.
Every function should include a **docstring** — a text enclosed in triple quotes `""" ... """` that briefly describes the function’s purpose, parameters, and return value.

#### Example:

```python
def calculate_mean(values):
    """
    Compute the arithmetic mean of a list of numbers.

    Parameters
    ----------
    values : list of float
        Input numbers.

    Returns
    -------
    float
        The arithmetic mean of the input values.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)
```

---
## 2.1 What is a NumPy Array?

- In computer programming, an **array** is a structure for storing and retrieving data.
- Arrays are the foundation for **data science, machine learning, and scientific computing** in Python.
- We often visualize an array as a **grid in space**, with each cell storing one element of data.
- Arrays can be **1D (vectors), 2D (matrices), or higher-dimensional (tensors)**.

Most NumPy arrays follow a few rules:

1. **Homogeneous type:** All elements must be of the same data type.
2. **Fixed size:** Once created, the total size cannot change.
3. **Rectangular shape:** All rows (in 2D arrays) must have the same number of columns — no jagged arrays.

When these conditions are met, NumPy can exploit them to make arrays:

- **Faster** (optimized C loops under the hood)
- **More memory efficient** (contiguous memory storage)
- **More convenient to use** (vectorized operations without explicit loops)
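
The three rules can be observed directly. The sketch below is only an illustration with made-up variable names; the last part assumes NumPy 1.24 or newer, where jagged input raises a `ValueError`.

```python
import numpy as np

# 1. Homogeneous type: mixing ints and a float upcasts everything to float64.
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# 2. Fixed size: np.append does not grow the array in place; it returns
#    a new, larger array and leaves the original untouched.
small = np.array([1, 2, 3])
grown = np.append(small, 4)
print(small.size, grown.size)

# 3. Rectangular shape: rows of different lengths are rejected
#    (assumption: NumPy 1.24 or newer, where this raises ValueError).
try:
    np.array([[1, 2, 3], [4, 5]])
except ValueError as err:
    print("Jagged input rejected:", err)
```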

In [2]:
import numpy as np

print(np.__version__)

2.3.3


In [19]:
# Create 1D array (vector)
vector = np.array([10, 20, 30, 40, 50])
print("1D array (vector):", vector)
print("Shape:", vector.shape, "Dtype:", vector.dtype)

# Create 2D array (matrix)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
print("2D array (matrix):\n", matrix)
print("Shape:", matrix.shape, "Dtype:", matrix.dtype)

# Vectorized operations: multiply every element by 2, and square every element
matrix2 = 2 * matrix
matrix_sq = matrix ** 2
print("Matrix after multiplying by 2:\n", matrix2)
print("Matrix after squaring:\n", matrix_sq)

1D array (vector): [10 20 30 40 50]
Shape: (5,) Dtype: int64
2D array (matrix):
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Shape: (3, 3) Dtype: int64
Matrix after multiplying by 2:
 [[ 2  4  6]
 [ 8 10 12]
 [14 16 18]]
Matrix after squaring:
 [[ 1  4  9]
 [16 25 36]
 [49 64 81]]


---
## 2.2 Constructing arrays

There are several mechanisms for creating arrays. Among others:

1. **Conversion from other Python structures**
   Arrays can be created directly from existing **lists or tuples** using `np.array()`.
   This is the most common and straightforward way to build an array from existing data.

In [4]:
# Conversion from Python structures
a = np.array([1, 2, 3, 4, 5])
b = np.array(((1, 0), (0, 1)))
c = np.array([([1, 2], [2, 1]), ([3, 1], [1, 3])])
print("From list:\n", a)
print("From tuples of tuples:\n", b)
print("From list of tuples or lists:\n", c)

From list:
 [1 2 3 4 5]
From tuples of tuples:
 [[1 0]
 [0 1]]
From list of tuples or lists:
 [[[1 2]
  [2 1]]

 [[3 1]
  [1 3]]]



2. **NumPy array creation functions**
   NumPy provides a set of **built-in constructors** such as `np.zeros`, `np.ones`, `np.arange`, and `np.linspace`
   to generate arrays of a specific shape or with evenly spaced values.

In [5]:
# np.empty(shape, dtype)
# Creates a new array *without initializing* its entries (values are arbitrary).
arr_empty = np.empty((2, 3), dtype="int32")
print("np.empty:\n", arr_empty)
print("np.empty type:", arr_empty.dtype)

np.empty:
 [[         5 1074266112        -22]
 [1073741823         13 1072693248]]
np.empty type: int32


In [6]:
# np.identity(n)
# Shortcut for creating a square identity matrix (ones on the main diagonal).
arr_identity = np.identity(4)
print("np.identity:\n", arr_identity)

np.identity:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [7]:
# np.eye(N, M, k)
# Creates a 2D array with ones on the main (or k-th) diagonal, zeros elsewhere.
arr_eye = np.eye(4, 4, k=-1)
print("np.eye:\n", arr_eye)

np.eye:
 [[0. 0. 0. 0.]
 [1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]]


In [8]:
# np.ones(shape, dtype)
# Creates an array of given shape filled with ones.
arr_ones = np.ones((2, 4))
print("np.ones:\n", arr_ones)

np.ones:
 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [9]:
# np.zeros(shape, dtype)
# Creates an array filled with zeros.
arr_zeros = np.zeros((3, 3))
print("np.zeros:\n", arr_zeros)

np.zeros:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [10]:
# np.full(shape, fill_value)
# Creates an array filled with a specified constant value.
arr_full = np.full((2, 3), fill_value=7)
print("np.full:\n", arr_full)

np.full:
 [[7 7 7]
 [7 7 7]]


In [11]:
# np.zeros_like(prototype)
# Creates an array of zeros with the *same shape and dtype* as another array.
prototype = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
arr_zeros_like = np.zeros_like(prototype)
print("np.zeros_like:\n", arr_zeros_like)

# np.empty_like(a), np.ones_like(a), and np.full_like(a, fill_value) work the same way.

np.zeros_like:
 [[0. 0. 0.]
 [0. 0. 0.]]


In [12]:
# np.arange([start,] stop[, step,][, dtype])
# Returns evenly spaced values within a given interval.
# Similar to Python's range(), but returns a NumPy array.
arr_arange = np.arange(0, 10, 2, dtype=float)
print("np.arange:\n", arr_arange)

np.arange:
 [0. 2. 4. 6. 8.]


In [13]:
# np.linspace(start, stop[, num, endpoint])
# Returns evenly spaced numbers over a specified interval.
# Unlike arange, it lets you specify the number of samples.
arr_linspace = np.linspace(0, 1, num=5)
print("np.linspace:\n", arr_linspace)

np.linspace:
 [0.   0.25 0.5  0.75 1.  ]


In [14]:
# np.diag(v[, k])
# Construct a diagonal matrix from a 1D array, or extract a diagonal from a 2D array.
v = np.array([1, 2, 3])
arr_diag = np.diag(v)
print("np.diag (construct from 1D):\n", arr_diag)

np.diag (construct from 1D):
 [[1 0 0]
 [0 2 0]
 [0 0 3]]


In [22]:
# np.tril(m[, k])
# Return the lower triangle of an array (elements above the k-th diagonal are zeroed).
m = np.arange(1, 10).reshape(3, 3)
arr_tril = np.tril(m)
print("np.tril (lower triangle):\n", arr_tril)

# similarly with np.triu(m[, k])

np.tril (lower triangle):
 [[1 0 0]
 [4 5 0]
 [7 8 9]]



3. **Replicating, joining, or mutating existing arrays**
   Arrays can be **copied, concatenated, reshaped, or repeated** to create new ones.
   For example, you can use `np.tile`, `np.concatenate`, or `reshape` for this purpose.

In [None]:
# np.reshape(a, shape)
# Changes the shape of an array without changing its data.
a = np.arange(6)
print("a:\n", a)
reshaped = np.reshape(a, (3, 2))
print("np.reshape:\n", reshaped)

In [None]:
# a.flatten()
# Flatten a multi-dimensional array into 1D.
a2 = np.array([[1, 2], [3, 4]])
print("a.flatten:\n", a2.flatten())

In [None]:
# np.transpose(a) or a.T
# Swaps axes, e.g., turns rows into columns.
print("np.transpose:\n", np.transpose(a2))

In [15]:
# np.swapaxes(a, axis1, axis2)
# Swaps any two axes in a multi-dimensional array.
a3 = np.arange(8).reshape(2, 2, 2)
print("a3:\n", a3)
print("np.swapaxes:\n", np.swapaxes(a3, 0, 2))

a3:
 [[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]
np.swapaxes:
 [[[0 4]
  [2 6]]

 [[1 5]
  [3 7]]]


In [16]:
# np.moveaxis(a, source, destination)
# Moves a given axis to a new position.
a4 = np.zeros((2, 3, 4))
print("np.moveaxis shape:", np.moveaxis(a4, 0, -1).shape)

np.moveaxis shape: (3, 4, 2)


In [17]:
# np.squeeze(a)
# Removes axes of length 1.
a5 = np.zeros((1, 3, 1))
print("a5:\n", a5)
print("np.squeeze shape:", np.squeeze(a5).shape)
print("a5 squeezed:\n", np.squeeze(a5))

a5:
 [[[0.]
  [0.]
  [0.]]]
np.squeeze shape: (3,)
a5 squeezed:
 [0. 0. 0.]


In [None]:
# np.expand_dims(a, axis)
# Adds a new dimension (axis) to the array.
a6 = np.array([1, 2, 3])
print("np.expand_dims shape:", np.expand_dims(a6, axis=0).shape)

In [None]:
# np.concatenate((a1, a2, ...), axis)
# Joins arrays along an existing axis.
a7 = np.ones((2, 2))
b7 = np.zeros((2, 2))
print("np.concatenate:\n", np.concatenate((a7, b7), axis=1))

In [None]:
# np.stack((a1, a2, ...), axis)
# Stacks arrays along a new axis.
a8 = np.array([1, 2])
b8 = np.array([3, 4])
print("np.stack:\n", np.stack((a8, b8), axis=0))

In [None]:
# np.split(a, sections, axis)
# Splits an array into multiple subarrays.
x = np.arange(9)
print("np.split:\n", np.split(x, 3))

In [None]:
# np.copy(a) or a.copy()
# Creates an independent copy of the array; changing the copy leaves the original intact.
a11 = np.array([1, 2, 3])
b11 = a11.copy()
b11[0] = 99
print("Original:", a11, " | Copy:", b11)

In [None]:
# a.astype(dtype)
# Converts array elements to a new type.
a12 = np.array([1, 2, 3])
print("astype to float:\n", a12.astype(float))

In [None]:
# np.clip(a, min, max)
# Limits values to a given range.
a13 = np.array([-1, 0, 2, 5])
print("np.clip:\n", np.clip(a13, 0, 3))

In [32]:
# np.where(condition, x, y)
# Selects elements based on a condition.
a14 = np.array([1, 2, 3])
print("np.where (a > 1 -> 100):\n", np.where(a14 > 1, 100, a14))

np.where (a > 1 -> 100):
 [  1 100 100]



4. **Creating Arrays from Other Libraries**

Many Python libraries — such as **SciPy**, **Pandas**, and **OpenCV** — use NumPy `ndarray` objects as a **common format for data exchange**.
These libraries can **create**, **manipulate**, and **interoperate with** NumPy arrays directly.


---
## 2.3 Indexing arrays

Note that Python indices start at 0, not 1 (unlike, for example, R).

### Basic Indexing

In [47]:
# Create a 2D array for demonstration
x = np.arange(1, 13).reshape(3, 4)
print("Array x:\n", x)

# 1. Single element indexing
print("Single element x[1, 2]:", x[1, 2])
print("Same by chained indexing x[1][2]:", x[1][2])

Array x:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Single element x[1, 2]: 7
Same by chained indexing x[1][2]: 7


In [None]:
# 2. If fewer indices than dimensions, returns a subarray (view)
print("x[0] returns first row (view):", x[0])

In [None]:
# 3. Slicing (rows, columns)
print("x[0:2, 1:4] → rows 0 and 1, columns 1 to 3:\n", x[0:2, 1:4])

In [None]:
# 4. Striding with step (convention start:stop:step)
print("x[:, ::2] → all rows, every second column:\n", x[:, ::2])

In [None]:
# 5. Using negative indices
print("x[-1, -2]:", x[-1, -2])  # last row, second-last column

In [None]:
# 6. Ellipsis (`...`) and `newaxis` (alias None)
# Ellipsis expands to as many ":" as needed
print("x[..., 2] → same as x[:, 2]:", x[..., 2])

# newaxis introduces a new dimension
y = x[:, 1]  # shape (3,)
y2 = x[:, 1, np.newaxis]  # shape (3,1)
print("y shape:", y.shape, "   y2 shape:", y2.shape)

- Basic slicing produces views, not copies: the slice refers to the same underlying data.
- Because of this, modifying a slice also modifies the original array.
- Indexing an axis with a single integer (rather than a slice) removes that axis from the result.
- `:` means "select all elements along this axis".
- `...` is a placeholder that expands to as many `:` as are needed for the remaining axes.
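
The view behaviour is easy to check with `np.shares_memory`. The following is a small sketch with illustrative variable names (`row_view`, `block_copy`), not a prescribed workflow.

```python
import numpy as np

x = np.arange(1, 13).reshape(3, 4)

# Basic indexing returns a view: writing to it changes x as well.
row_view = x[0]
row_view[0] = 100
print(x[0, 0])  # 100

# An explicit .copy() owns its data: x is not affected.
block_copy = x[1:, :2].copy()
block_copy[:] = 0
print(x[1:, :2])  # still [[5 6] [9 10]]

print(np.shares_memory(x, row_view))    # True
print(np.shares_memory(x, block_copy))  # False
```

When a slice must be modified without touching the original, calling `.copy()` on it first is the usual way to stay safe.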

### Advanced Indexing

In [None]:
x = np.arange(1, 13).reshape(3, 4)
print("Array x:\n", x)

# 1. Integer array indexing
row_idx = [0, 2]
col_idx = [1, 3]
# Select elements (0,1) and (2,3)
print("x[row_idx, col_idx]:", x[row_idx, col_idx])

# Equivalent to
print("The same as:", np.array([x[0,1],x[2,3]]))

In [None]:
# 2. Broadcasting integer indices
# Index arrays and scalars are broadcast against each other to a common shape
print("x[row_idx, 2]:", x[row_idx, 2])
print("The same as:", x[[0,2], [2,2]])

In [121]:
# 3. Boolean masking (Boolean indexing)
mask = x % 2 == 0  # True for even numbers
print("Boolean mask:\n", mask)
print("x[mask] → all even elements:", x[mask])

# Example modification using Boolean mask
x2 = x.copy()
x2[x2 % 2 == 1] = -1
print("x2 with odd elements replaced by –1:\n", x2)

Boolean mask:
 [[False  True False  True]
 [False  True False  True]
 [False  True False  True]]
x[mask] → all even elements: [ 2  4  6  8 10 12]
x2 with odd elements replaced by –1:
 [[-1  2 -1  4]
 [-1  6 -1  8]
 [-1 10 -1 12]]


In [None]:
# 4. Combining basic and advanced indexing
# e.g. select rows 0 and 2, but columns 1:3
print("x[[0, 2], 1:3]:\n", x[[0, 2], 1:3])

---
## 2.4 Array attributes

- Every NumPy array is a Python object of class `numpy.ndarray`.
- Besides storing the actual data, it also stores various attributes that describe its structure and memory layout.

In [None]:
# Let's create a simple 2D array
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

print("Type of numpy array object:", type(a))

In [None]:
print("Number of dimensions (ndim):", a.ndim)
print("Shape (rows, columns):", a.shape)
print("Total number of elements (size):", a.size)
print("Data type (dtype):", a.dtype)
print("Size of one element in bytes (itemsize):", a.itemsize)
print("Total size in bytes (nbytes):", a.nbytes)
print("Transposed array (T):\n", a.T)

Explanation of the attributes:
- `ndim` → number of axes (dimensions) of the array
- `shape` → length of each dimension, as a tuple
- `size` → total number of elements, equal to the product of the `shape` entries
- `dtype` → data type of the elements (e.g. `int32`, `float64`)
- `itemsize` → bytes per element; depends on `dtype`
- `nbytes` → total bytes used by the array's elements (`size * itemsize`)
- `T` → shorthand for the transposed view (rows ↔ columns)
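
The relationships between these attributes can be verified on a small array. This is a sketch only; the `int32` dtype is chosen here so that the item size is the same on every platform.

```python
import numpy as np

a = np.arange(1, 13, dtype=np.int32).reshape(3, 4)

print(a.ndim == len(a.shape))            # True: one shape entry per axis
print(a.size == a.shape[0] * a.shape[1])  # True: 3 * 4 = 12
print(a.itemsize)                         # 4 bytes per int32 element
print(a.nbytes == a.size * a.itemsize)    # True: 12 * 4 = 48
print(a.T.shape)                          # (4, 3)
print(np.shares_memory(a, a.T))           # True: T is a view, not a copy
```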


---
## 2.5 Array methods

- A NumPy ndarray provides many built-in methods that operate on the array or return information about it. Most of these methods return a new array or a computed value derived from the data.

In [None]:
# Create a 2D array
a = np.array([[1, 2, 3], [4, 5, 6]])

# Reshape the array to 3 rows and 2 columns
reshaped = a.reshape(3, 2)
print("Reshaped array:\n", reshaped)

# Flatten the array to 1D
flattened = a.flatten()
print("\nFlattened array:", flattened)

In [None]:
# --- Max and Min ---
print("Max element:", a.max())                  # ndarray.max()
print("Index of max (flattened):", a.argmax())  # ndarray.argmax()
print("Min element:", a.min())                  # ndarray.min()
print("Index of min (flattened):", a.argmin())  # ndarray.argmin()

In [None]:
# --- Rounding ---
arr_float = np.array([[1.234, 2.567], [3.891, 4.456]])
rounded = arr_float.round(1)             # Round to 1 decimal
print("Rounded array:\n", rounded)

In [None]:
# --- Trace ---
print("Trace (sum of diagonal):", a.trace())  # Sum along main diagonal

In [None]:
# --- Sum, Cumsum, Mean ---
print("Sum of all elements:", a.sum())
print("Cumulative sum along rows:\n", a.cumsum(axis=1))
print("Mean along columns:", a.mean(axis=0))

In [None]:
# --- Variance and Standard Deviation ---
print("Variance of all elements:", a.var())
print("Standard deviation:", a.std())

In [None]:
# --- Logical checks ---
print("All elements > 0?", (a > 0).all())
print("Any element > 5?", (a > 5).any())
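
Several of the methods above accept an `axis` argument. As a rough rule, `axis` names the dimension that is collapsed by the operation. The sketch below illustrates this on the same 2×3 array; the variable names are chosen only for this example.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

col_sums = a.sum(axis=0)  # collapse the rows: one value per column
row_sums = a.sum(axis=1)  # collapse the columns: one value per row
print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]

# keepdims=True keeps the collapsed axis with length 1, so the result
# can be broadcast against the original array.
row_means = a.mean(axis=1, keepdims=True)  # shape (2, 1)
centered = a - row_means
print(centered)
```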

---
## 2.6 Arithmetic and linear algebra


In [None]:
# Create example arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print("Array a:\n", a)
print("Array b:\n", b)


In [None]:
# Addition (elementwise)
print("Addition (a + b):\n", a + b)

# Subtraction (elementwise)
print("Subtraction (a - b):\n", a - b)

# Multiplication (elementwise)
print("Elementwise multiplication (a * b):\n", a * b)

# Division (elementwise)
print("Elementwise division (a / b):\n", a / b)

# Exponentiation (elementwise)
print("Elementwise power (a ** 2):\n", a ** 2)

# Modulo (elementwise)
print("Elementwise modulo (b % a):\n", b % a)

In [None]:
# Matrix multiplication
matmul = a @ b  # or np.matmul(a, b)
print("Matrix multiplication (a @ b):\n", matmul)

# Dot product
dot = np.dot(a[0], b[0])
print("Dot product (np.dot(a[0], b[0])):\n", dot)

# Transpose
print("Transpose of a (a.T):\n", a.T)

# Determinant
det = np.linalg.det(a)
print("Determinant of a:", det)

# Inverse
inv = np.linalg.inv(a)
print("Inverse of a:\n", inv)

# Eigenvalues and eigenvectors
eigvals, eigvecs = np.linalg.eig(a)
print("Eigenvalues of a:", eigvals)
print("Eigenvectors of a:\n", eigvecs)

# Norms (for a 1D vector, np.linalg.norm returns the Euclidean norm by default)
norm_a0 = np.linalg.norm(a[0])
print("Euclidean (L2) norm of a[0]:", norm_a0)
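
Results of linear algebra routines are floating-point approximations, so it is worth checking them with `np.allclose` rather than `==`. The sketch below verifies the inverse and the eigen-decomposition of the same matrix `a`; the right-hand side `rhs` is an arbitrary vector made up for the example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# The inverse times the matrix should give the identity.
inv = np.linalg.inv(a)
print(np.allclose(a @ inv, np.eye(2)))  # True

# Each column v of eigvecs satisfies a @ v = lambda * v.
eigvals, eigvecs = np.linalg.eig(a)
print(np.allclose(a @ eigvecs, eigvecs * eigvals))  # True

# To solve a @ x = rhs, np.linalg.solve is usually preferred over
# computing the inverse explicitly.
rhs = np.array([5, 6])
x = np.linalg.solve(a, rhs)
print(x)  # approximately [-4.   4.5]
```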

---
## 2.7 Miscellaneous

**NumPy** provides optimized **mathematical functions** that work directly on entire arrays (`np.ndarray` objects). These functions are implemented in **compiled C code**, which makes them **very fast**. They automatically apply the operation **elementwise** to every element of the array; this is called **vectorization**.
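
As a minimal sketch of what vectorization replaces, the snippet below computes the sine of a few angles twice: once with an explicit Python loop over `math.sin`, once with a single `np.sin` call. The variable names are illustrative.

```python
import math

import numpy as np

angles = np.linspace(0.0, np.pi, 5)

# Explicit loop: one Python-level function call per element.
looped = np.array([math.sin(value) for value in angles])

# Vectorized: one call, the loop runs inside NumPy's compiled code.
vectorized = np.sin(angles)

print(np.allclose(looped, vectorized))  # True
```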

In [None]:
# These are elementwise functions that operate efficiently on ndarrays.

# Create an example array
a = np.array([0, np.pi/4, np.pi/2, np.pi])
print("Array a:\n", a)

# Elementwise trigonometric functions
print("sin(a):", np.sin(a))

In [None]:
b = np.array([1, 2, 3, 4])
print("Array b:\n", b)

# Exponential
print("Exponential (e^b):", np.exp(b))

# Logarithms
print("Natural log (ln(b)):", np.log(b))
print("Log base 10:", np.log10(b))

# Power
print("b cubed:", np.power(b, 3))

c = np.array([1.234, 5.678, -9.1011])

# Round to nearest integer
print("Rounded:", np.round(c))

# Floor and ceiling
print("Floor:", np.floor(c))
print("Ceil:", np.ceil(c))

# Absolute values
print("Absolute values:", np.abs(c))

d = np.array([1, 2, 3, 4, 5])

# Sum/product
print("Sum:", np.sum(d))
print("Product:", np.prod(d))

---
## 2.8 Python lists vs NumPy arrays

Python lists are flexible but **slow and memory-inefficient** for numerical computations.
NumPy arrays (`ndarray`) store data in **contiguous memory** and support **vectorized operations**, making them much faster and smaller in memory footprint.

Let's compare both in terms of **execution speed** and **memory usage**.

In [None]:
import sys
import time

# Create a large list and a NumPy array
n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n)

# Compare memory usage of both objects
list_mem = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_mem = np_array.nbytes
print(f"Python list memory: {list_mem / 1e6:.2f} MB")
print(f"NumPy array memory: {array_mem / 1e6:.2f} MB")
print(f"Memory ratio (list / array): {list_mem / array_mem:.1f}×")

# Double every element of the Python list
# (time.perf_counter() has a higher resolution than time.time(), which
# avoids a zero elapsed time for the fast NumPy operation)
list_start = time.perf_counter()
list_result = [x * 2 for x in py_list]
list_end = time.perf_counter()
print(f"Python list time: {list_end - list_start:.5f} s")

# Double every element of the NumPy array
array_start = time.perf_counter()
array_result = np_array * 2
array_end = time.perf_counter()
print(f"NumPy array time: {array_end - array_start:.5f} s")
print(f"Execution speed ratio (list / array): {(list_end - list_start) / (array_end - array_start):.1f}×")

**Observation:**

- NumPy operations are much faster because they run in optimized C loops rather than in the Python interpreter.
- NumPy arrays use far less memory because all elements share one data type and are stored as raw values in a contiguous buffer, whereas a list stores a pointer to a separate Python object for each element.

➡️ This demonstrates the two biggest advantages of NumPy:
1. **Vectorization** (no explicit loops)
2. **Efficient memory representation**
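
A single timing measurement can be noisy. One possible refinement, sketched below, is to repeat the measurement with the standard-library `timeit` module and keep the best run. The helper names `double_list` and `double_array` and the smaller `n` are choices made for this example only.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_array = np.arange(n)


def double_list():
    """Double every element with a list comprehension."""
    return [x * 2 for x in py_list]


def double_array():
    """Double every element with one vectorized operation."""
    return np_array * 2


# Best of 3 repeats, each repeat timing 10 calls.
list_time = min(timeit.repeat(double_list, number=10, repeat=3))
array_time = min(timeit.repeat(double_array, number=10, repeat=3))

print(f"list:  {list_time / 10 * 1e3:.3f} ms per call")
print(f"array: {array_time / 10 * 1e3:.3f} ms per call")
print(f"speed ratio (list / array): {list_time / array_time:.1f}x")
```

The exact ratio depends on the machine, the Python version, and the array size, so the numbers should be read as an order of magnitude rather than a benchmark.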

---
## 2.9  🏠 Homework: NumPy Arrays in Data Science

### Task Overview
In this assignment, you will practice working with **NumPy arrays** and **mathematical functions** to perform a mini data analysis. You will simulate a small part of a **data preprocessing pipeline** — a common step in data science when dealing with multivariate datasets.

### Your Task

1. **Generate synthetic data:**
   - Create a NumPy array `data` of shape **(100, 10)** — representing 100 samples and 10 features.
   - The values should be drawn from a **normal distribution** with mean = 50 and standard deviation = 10 using
      `np.random.normal(loc=50, scale=10, size=(100, 10))`.
   - Print the shape, data type, and the **first 5 rows** of the array.

2. **Data cleaning:**
   - Replace all values **smaller than 20** or **larger than 80** with `np.nan` (treat them as outliers).
   - Print how many `np.nan` values are now in the array.

3. **Handle missing values:**
   - Compute the **mean of each column** ignoring missing values (`np.nanmean`).
   - Replace all `np.nan` values in each column with that column’s mean.

4. **Data transformation:**
   - **Standardize each column** so that it has mean 0 and standard deviation 1.
   - Create a new array where:
     - all positive standardized values are replaced with their **square roots**,
     - negative values remain unchanged.
   - For the first 5 rows, also compute the **exponential (`np.exp`)** of all standardized values and print the result.

5. **Array indexing and logical operations:**
   - Compute the **75th percentile** for each column.
   - Create a Boolean mask that marks all values above the 75th percentile.
   - Print how many such “high” values there are in total.
   - Replace all values **below the 25th percentile** (computed column-wise) with the 25th percentile value (a simple form of *winsorization*).

6. **Descriptive statistics:**
   - Compute and print for the final cleaned dataset:
     - column-wise **mean**, **median**, **variance**, and **standard deviation**,
     - the **overall mean** of the entire array,
     - and the **minimum and maximum** values per column.

### ✍️ Hints
- Use functions such as `np.mean`, `np.std`, `np.nanmean`, `np.isnan`, `np.where`, `np.percentile`, `np.sqrt`, and `np.exp`.
- Remember to specify the `axis` argument when computing column-wise statistics (`axis=0`).
- Use **vectorized operations** — avoid `for` loops.
- Include **comments or docstrings** to make your code clear and readable.
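
As a rough illustration of how some of these functions combine, the sketch below uses a tiny made-up array `toy` (not the assignment data) to fill missing values with column means and then standardize the columns.

```python
import numpy as np

# Toy data with two missing values (illustrative, not the assignment data).
toy = np.array([[1.0, 10.0],
                [np.nan, 20.0],
                [3.0, np.nan]])

print(np.isnan(toy).sum())  # 2 missing values

# Column means that ignore NaN, then fill each NaN with its column mean.
# The 1D array of means broadcasts across the rows.
col_means = np.nanmean(toy, axis=0)  # [ 2. 15.]
filled = np.where(np.isnan(toy), col_means, toy)
print(filled)

# Standardize each column to mean 0 and standard deviation 1.
z = (filled - filled.mean(axis=0)) / filled.std(axis=0)
print(np.allclose(z.mean(axis=0), 0), np.allclose(z.std(axis=0), 1))
```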


---
## Your solution:

In [4]:
import numpy as np
## 1 ##
vyber = np.random.normal(loc=50, scale=10, size=(100, 10))
print("Shape:", vyber.shape, "Dtype:", vyber.dtype)
print("Rows 1 through 5:\n", vyber[:5, :])


Shape: (100, 10) Dtype: float64
Rows 1 through 5:
 [[59.12654283 43.18531932 60.60039162 56.43056976 51.30733059 47.21319191
  54.28542097 46.20269849 33.41721962 47.16505394]
 [48.27954317 52.95940339 39.03176088 60.93442531 55.97096534 45.32438574
  49.75916768 55.18216609 42.16457374 55.51821516]
 [59.85643049 55.73963206 59.36470724 49.91613448 41.50653193 55.39307427
  69.45458753 56.18948693 43.43661722 49.59680086]
 [39.73204569 53.26736794 30.95713894 51.74340908 50.99319628 31.25024922
  56.01668079 59.93401612 46.77128464 41.56112267]
 [48.77256623 33.58144831 54.08513689 50.65139985 44.8139263  41.67627272
  59.00928679 64.47546581 44.63171442 47.24406491]]


In [5]:
## 2 ##
# Mark outliers (values below 20 or above 80) as missing
vyber = np.where((vyber < 20) | (vyber > 80), np.nan, vyber)

# Count the missing values
print("number of nan elements:", np.sum(np.isnan(vyber)))

number of nan elements: 1


In [6]:
## 3 ##
# Column means, ignoring missing values
col_means = np.nanmean(vyber, axis=0)
# print(col_means)

# Replace each missing value with the mean of its column
vyber = np.where(np.isnan(vyber), col_means, vyber)


In [8]:
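## 4 ##
# Standardize each column (imputing with the column mean in step 3
# leaves the column means unchanged, so col_means is still valid)
vyb_stand = (vyber - col_means) / np.std(vyber, axis=0)
# Check: column means should be ~0 and standard deviations ~1
# print(np.mean(vyb_stand, axis=0))
# print(np.std(vyb_stand, axis=0))

# Square root of the positive values; negative values stay unchanged
vyb_sqrt = vyb_stand.copy()
positive = vyb_sqrt > 0
vyb_sqrt[positive] = np.sqrt(vyb_sqrt[positive])

# Exponential of the first 5 rows
print(np.exp(vyb_stand[:5, :]))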

[[2.68441903 0.47637079 2.66138191 1.72706183 1.26648444 0.70973331
  1.49051602 0.60887231 0.17376099 0.95165443]
 [0.81506866 1.43574312 0.27803944 2.65363752 2.0770838  0.57857528
  0.93142042 1.42138776 0.42925138 2.04915079]
 [2.90859369 1.96501226 2.33832497 0.9279041  0.44779166 1.71942251
  7.20537602 1.56320482 0.48958367 1.18973099]
 [0.31862382 1.4865289  0.11935767 1.10454223 1.22497622 0.12623009
  1.78417816 2.22614441 0.69112201 0.56887158]
 [0.86044468 0.16112297 1.34516467 0.99530269 0.63598668 0.3899178
  2.43468259 3.41796036 0.55397126 0.95858351]]


In [9]:
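## 5 ##
# 75th percentile of each column
perc = np.percentile(vyb_stand, 75, axis=0)

# Boolean mask of the values above their column's 75th percentile
mask = vyb_stand > perc
print(np.sum(mask))

# Winsorize: raise values below the 25th percentile to that percentile
perc1 = np.percentile(vyb_stand, 25, axis=0)
vyb_stand = np.clip(vyb_stand, a_min=perc1, a_max=None)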


250


In [10]:
## 6 ##
# Column-wise descriptive statistics
col_means1 = np.mean(vyb_stand, axis=0)
print("Column means:", col_means1)
col_median1 = np.median(vyb_stand, axis=0)
print("Column medians:", col_median1)
col_vars1 = np.var(vyb_stand, axis=0)
print("Column variance:", col_vars1)
col_sd1 = np.std(vyb_stand, axis=0)
print("Column SD:", col_sd1)

# Overall mean of the entire array
print("Overall mean:", np.mean(vyb_stand))

# Column-wise minimum and maximum
print("Column minimums:", np.min(vyb_stand, axis=0))
print("Column maximums:", np.max(vyb_stand, axis=0))

Column means: [0.12087104 0.13881719 0.16416365 0.22503401 0.16544283 0.16976939
 0.15424769 0.11649983 0.11970501 0.16509867]
Column medians: [-0.01081629  0.05784684 -0.05565818  0.02345305 -0.02149482  0.10069135
  0.06356606 -0.0981708   0.03597742  0.00990746]
Column variance: [0.68475455 0.65534461 0.60950301 0.44242861 0.59409323 0.54884265
 0.59196432 0.71844537 0.6921017  0.6018594 ]
Column SD: [0.82749897 0.80953358 0.78070674 0.66515307 0.77077443 0.74083915
 0.76939218 0.84761157 0.8319265  0.77579598]
Overall mean: 0.1539649329777104
Column minimums: [-0.8159082  -0.7570459  -0.58850661 -0.42913396 -0.62493242 -0.61309882
 -0.68683911 -0.76780932 -0.8063089  -0.64011036]
Column maximums: [2.25377313 2.1558068  2.40331887 2.37279521 2.58609196 2.243183
 2.73139708 2.3378215  2.05765655 2.49381728]
