# Theoretical Questions:

## 1. Explain the purpose and advantages of NumPy in scientific computing and data analysis. How does it enhance Python's capabilities for numerical operations?

### The Purpose and Advantages of NumPy in Scientific Computing and Data Analysis

NumPy (Numerical Python) is a fundamental library for scientific computing and data analysis in Python. It provides an efficient and powerful way to handle large, multi-dimensional arrays of numerical data, significantly enhancing Python's capabilities for numerical operations.

#### Purpose of NumPy:

The primary purpose of NumPy is to offer an optimized way to perform numerical computations on vast datasets. While Python's built-in lists can store collections of data, they are not designed for high-performance numerical operations on large arrays. NumPy addresses this by providing:

* **N-dimensional Array Object (`ndarray`):** At its core, NumPy introduces the `ndarray`, a homogeneous, fixed-size array that stores elements of the same data type. Unlike Python lists, `ndarrays` are stored contiguously in memory, enabling highly efficient operations.
* **Mathematical Functions:** A comprehensive collection of mathematical functions is available, allowing element-wise operations on arrays. This includes trigonometric, logarithmic, exponential, statistical, and many other functions.
* **Linear Algebra Routines:** NumPy provides efficient implementations of essential linear algebra operations, such as dot products, matrix multiplication, decompositions (e.g., SVD, QR, Cholesky), and solvers for systems of linear equations. LU decomposition is provided by SciPy rather than NumPy.
* **Fourier Transforms:** It includes modules for performing Fast Fourier Transforms (FFTs) and related operations, which are crucial for signal processing and spectral analysis.
* **Random Number Generation:** Powerful tools for generating various types of random numbers are provided, which are indispensable for simulations, statistical modeling, and machine learning.
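
As a rough illustration of the building blocks listed above, the sketch below touches each one in turn. The array values, the signal, and the seed are arbitrary choices made for this example only.

```python
import numpy as np

# ndarray: homogeneous, fixed-size, n-dimensional
a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(a.shape, a.dtype)          # (2, 2) float64

# Element-wise mathematical functions
print(np.sqrt(a))

# Linear algebra: solve a @ x = b
b = np.array([5.0, 6.0])
x = np.linalg.solve(a, b)
print(np.allclose(a @ x, b))     # True

# Fourier transform of a short signal
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print(spectrum.shape)            # (4,)

# Random numbers via the Generator API (seed chosen arbitrarily)
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=3)
print(samples.shape)             # (3,)
```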

### Advantages of NumPy:

#### 1. Performance:

* **Vectorization:** NumPy's core routines are implemented in compiled C, and its linear algebra functions delegate to BLAS/LAPACK libraries. Operations are applied to entire arrays at once, which is typically far faster than looping over elements in pure Python.
* **Memory Efficiency:** `ndarrays` consume less memory than equivalent Python lists because they store homogeneous data types and have a fixed size, leading to better memory management.

#### 2. Expressiveness and Conciseness:

* NumPy's array-oriented syntax allows you to express complex mathematical operations with concise and readable code. For example, adding two arrays `a + b` is far simpler and faster than iterating through elements with a `for` loop in Python.
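
A minimal sketch of that contrast, using two small arrays whose values are picked purely for illustration:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Pure-Python approach: explicit element-by-element iteration
loop_result = [x + y for x, y in zip(a.tolist(), b.tolist())]

# NumPy approach: one vectorized expression
vec_result = a + b

print(loop_result)   # [11, 22, 33]
print(vec_result)    # [11 22 33]
```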

#### 3. Foundation for Other Libraries:

* NumPy serves as the bedrock for many other prominent scientific computing and data analysis libraries in the Python ecosystem, including:
    * **Pandas:** Leverages NumPy arrays internally for its DataFrame and Series objects.
    * **SciPy:** Builds upon NumPy to provide more advanced scientific and technical computing functionalities (optimization, interpolation, signal processing, image processing, etc.).
    * **Matplotlib:** Uses NumPy arrays extensively for plotting data.
    * **Scikit-learn:** Relies heavily on NumPy arrays for representing data and performing machine learning algorithms.
    * **Deep Learning Frameworks (TensorFlow/PyTorch):** While they have their own tensor objects, the concepts of vectorized operations and array manipulation are heavily influenced by NumPy.

#### 4. Broadcasting:

* NumPy's broadcasting mechanism enables operations between arrays of different shapes, provided they are compatible. This simplifies code and eliminates the need for explicit loops or tiling operations in many cases.
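
The following sketch shows what "compatible shapes" means in practice. The arrays are made up for this example; the rule being illustrated is that dimensions are compared from the right and must either match or be 1.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
col = np.array([[100], [200]])       # shape (2, 1)

# (2, 3) + (3,): the row is stretched across both rows
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# (2, 3) + (2, 1): the column is stretched across all three columns
print(matrix + col)
# [[101 102 103]
#  [204 205 206]]

# Incompatible shapes raise a ValueError instead of guessing
try:
    matrix + np.array([1, 2])        # (2, 3) vs (2,) cannot broadcast
except ValueError as exc:
    print("Broadcast error:", exc)
```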

#### 5. C/C++ and Fortran Integration:

* NumPy offers mechanisms to integrate with existing high-performance C/C++ and Fortran codebases, allowing users to leverage optimized legacy code.

### How NumPy Enhances Python's Capabilities for Numerical Operations:

Python, as a general-purpose programming language, offers great readability and rapid development. However, its native data structures (like lists) and interpreted loops can be slow for computationally intensive numerical tasks. NumPy significantly enhances Python's capabilities by:

1.  **Overcoming the "Python Loop Problem":** NumPy's vectorized operations execute computations on entire arrays within highly optimized C code, effectively bypassing the performance bottleneck of Python's interpreter overhead for numerical tasks.
2.  **Introducing the `ndarray` for Efficient Data Storage:** Standard Python lists are heterogeneous and store pointers to objects, incurring significant overhead. `ndarrays` store contiguous blocks of homogeneous data, leading to better cache utilization and reduced memory footprint.
3.  **Providing a Rich Set of Optimized Numerical Algorithms:** Instead of requiring users to reimplement common mathematical and statistical algorithms from scratch in Python, NumPy provides highly optimized, pre-built functions for these operations. This not only saves development time but also ensures numerical stability and efficiency.
4.  **Enabling High-Performance Data Manipulation:** Operations like slicing, reshaping, stacking, and splitting arrays are exceptionally fast in NumPy, facilitating efficient preparation and transformation of data for analysis and modeling.

In essence, NumPy transforms Python into a powerful and efficient environment for scientific computing, making it a strong contender against specialized numerical languages like MATLAB and R for a wide range of tasks.

## 2. Compare and contrast np.mean() and np.average() functions in NumPy. When would you use one over the other?

### Comparison of `np.mean()` and `np.average()` in NumPy

#### `np.mean()`

#### Purpose:
The `np.mean()` function computes the **arithmetic mean** (simple average) of elements along a specified axis or over the entire array. It treats all elements equally, assuming they have the same "weight" or importance.

#### Key Characteristics:
* **Arithmetic Mean:** Always calculates the sum of elements divided by the count of elements.
* **No Weights:** Does not accept a `weights` parameter. Every element contributes equally to the mean.
* **`dtype` Parameter:** Supports a `dtype` argument that sets the type used to compute and return the mean. For integer inputs the default is already `float64`; for floating-point inputs the default is the input's own type, so passing `dtype=np.float64` can noticeably improve accuracy when averaging large `float32` or `float16` arrays.
* **Masked Arrays:** When used with `numpy.ma.MaskedArray`, `np.mean()` considers the mask and computes the mean only over the unmasked (valid) values.
* **Syntax:** `numpy.mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)`
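
To make the `dtype` behavior concrete, here is a small sketch that inspects the result type of `np.mean()` for different inputs. The arrays are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3, 4], dtype=np.int32)
floats32 = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)

# Integer input: the mean is computed and returned as float64 by default
print(np.mean(ints).dtype)                        # float64

# float32 input: the result keeps the input precision by default
print(np.mean(floats32).dtype)                    # float32

# Requesting float64 for the computation on float32 data
print(np.mean(floats32, dtype=np.float64).dtype)  # float64
```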

In [1]:
### Example:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
simple_mean = np.mean(arr)
print(f"Array: {arr}")
print(f"np.mean(arr): {simple_mean}")

matrix = np.array([[1, 2, 3], [4, 5, 6]])
mean_axis0 = np.mean(matrix, axis=0) # Mean of columns
mean_axis1 = np.mean(matrix, axis=1) # Mean of rows
print(f"\nMatrix:\n{matrix}")
print(f"np.mean(matrix, axis=0): {mean_axis0}")
print(f"np.mean(matrix, axis=1): {mean_axis1}")

Array: [1 2 3 4 5]
np.mean(arr): 3.0

Matrix:
[[1 2 3]
 [4 5 6]]
np.mean(matrix, axis=0): [2.5 3.5 4.5]
np.mean(matrix, axis=1): [2. 5.]

#### `np.average()`

#### Purpose:
The `np.average()` function computes the **weighted average** of elements along a specified axis or over the entire array. When no weights are supplied, every element counts equally and the result matches `np.mean()`.

#### Key Characteristics:
* **Optional Weights:** Accepts a `weights` array; the result is `sum(a * weights) / sum(weights)`.
* **`returned` Parameter:** With `returned=True`, returns a tuple of the average and the sum of the weights.
* **No `dtype` Parameter:** Unlike `np.mean()`, it offers no direct control over the accumulator type.
* **Zero Weights:** Raises `ZeroDivisionError` when the weights along the axis sum to zero.
* **Syntax:** `numpy.average(a, axis=None, weights=None, returned=False, *, keepdims=<no value>)`


### When to Use One Over the Other

#### Use `np.mean()` when:
- You need to calculate the **simple arithmetic mean** where all data points are considered equally important. This is the most common use case for averages in basic statistical analysis.
- You are concerned about **numerical precision** and need to explicitly control the `dtype` used for the computation (e.g., forcing `float64` when averaging a large `float32` array).
- You are working with `numpy.ma.MaskedArray` and want the mean to automatically **ignore masked values**.

#### Use `np.average()` when:
- You need to calculate a **weighted average**, where different data points contribute unequally to the overall average. This is common in scenarios like:
  - **Statistics**: Calculating GPA (grades have different credit hours).
  - **Finance**: Portfolio returns (different assets have different allocations).
  - **Survey Data**: Adjusting for sample biases.
  - **Signal Processing**: Applying windowing functions.
- You want a single function that can **optionally handle weighted averages**, simplifying your code if you sometimes need weighted and sometimes simple averages.
- You need to **retrieve the sum of weights along with the average** (using `returned=True`).
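
The weighted case can be sketched as follows. The grades and credit hours are hypothetical values invented for this example.

```python
import numpy as np

# Hypothetical grades and credit hours, chosen only for illustration
grades = np.array([4.0, 3.0, 2.0])
credits = np.array([3, 4, 1])

# Without weights, np.average matches np.mean
print(np.average(grades))                    # 3.0
print(np.mean(grades))                       # 3.0

# Weighted average: sum(grades * credits) / sum(credits) = 26 / 8
gpa = np.average(grades, weights=credits)
print(gpa)                                   # 3.25

# returned=True also gives back the sum of the weights
avg, weight_sum = np.average(grades, weights=credits, returned=True)
print(avg, weight_sum)
```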


## 3. Describe the methods for reversing a NumPy array along different axes. Provide examples for 1D and 2D arrays.

### 1D Array Reversal

- Slicing with a negative step (`arr[::-1]`) is the most **concise** way to reverse a 1D NumPy array. It returns a view rather than copying the data.
- `np.flip(arr)` produces the same result and generalizes to any axis of a multi-dimensional array.
- `np.roll()` performs **cyclic shifts**, not reversals: elements pushed off one end wrap around to the other, so it is not a substitute for the methods above.

In [2]:
####  Code
import numpy as np

# 1D Array Example
arr_1d = np.array([1, 2, 3, 4, 5])
print(f"Original 1D array: {arr_1d}")

# Reversing with slicing (preferred)
reversed_1d_slicing = arr_1d[::-1]
print(f"Slicing reversal: {reversed_1d_slicing}")

# Example showing what roll does (not reversal)
rolled_arr_1d = np.roll(arr_1d, 2)  # Shift elements by 2 positions to the right
print(f"np.roll(arr_1d, 2): {rolled_arr_1d}")  # Output: [4 5 1 2 3]

rolled_arr_1d_neg = np.roll(arr_1d, -1)  # Shift elements by 1 position to the left
print(f"np.roll(arr_1d, -1): {rolled_arr_1d_neg}")  # Output: [2 3 4 5 1]

Original 1D array: [1 2 3 4 5]
Slicing reversal: [5 4 3 2 1]
np.roll(arr_1d, 2): [4 5 1 2 3]
np.roll(arr_1d, -1): [2 3 4 5 1]
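
As a complement to the slicing example, this sketch compares `np.flip()` with slicing on a 1D array and checks whether the reversed result shares memory with the original. The variable names are illustrative.

```python
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5])

# np.flip reverses along the given axis (a 1D array has only axis 0)
flipped = np.flip(arr_1d)
print(flipped)                            # [5 4 3 2 1]

# Slicing gives the same values
sliced = arr_1d[::-1]
print(np.array_equal(flipped, sliced))    # True

# The sliced result is a view: it shares memory with the original
print(np.shares_memory(arr_1d, sliced))   # True

# Take a copy when the reversed array must be independent
independent = arr_1d[::-1].copy()
independent[0] = 99
print(arr_1d)                             # [1 2 3 4 5] (unchanged)
```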


### 2D Array Reversal

- `np.flip(array, axis=0)` reverses the **order of the rows** (a vertical, top-to-bottom flip).
- `np.flip(array, axis=1)` reverses the **order of the columns** (a horizontal, left-to-right flip).
- `np.roll()` again does **not reverse** the array; it just **shifts rows or columns cyclically**.


In [3]:
print("\n--- 2D Array Example ---")
matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print("Original 2D array:\n", matrix)

# Reverse rows using np.flip (preferred)
reversed_rows_flip = np.flip(matrix, axis=0)
print(f"\nReversed rows using np.flip(axis=0):\n{reversed_rows_flip}")

# Trying to simulate row reversal with np.roll()
rolled_rows = np.roll(matrix, shift=1, axis=0)
print(f"np.roll(matrix, shift=1, axis=0):\n{rolled_rows}")
# Output:
# [[7 8 9]
#  [1 2 3]
#  [4 5 6]]

# Reverse columns using np.flip (preferred)
reversed_cols_flip = np.flip(matrix, axis=1)
print(f"\nReversed columns using np.flip(axis=1):\n{reversed_cols_flip}")

# Trying to simulate column reversal with np.roll()
rolled_cols = np.roll(matrix, shift=1, axis=1)
print(f"np.roll(matrix, shift=1, axis=1):\n{rolled_cols}")
# Output:
# [[3 1 2]
#  [6 4 5]
#  [9 7 8]]


--- 2D Array Example ---
Original 2D array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Reversed rows using np.flip(axis=0):
[[7 8 9]
 [4 5 6]
 [1 2 3]]
np.roll(matrix, shift=1, axis=0):
[[7 8 9]
 [1 2 3]
 [4 5 6]]

Reversed columns using np.flip(axis=1):
[[3 2 1]
 [6 5 4]
 [9 8 7]]
np.roll(matrix, shift=1, axis=1):
[[3 1 2]
 [6 4 5]
 [9 7 8]]
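
The same 2D reversals can also be written with slicing. The sketch below checks that equivalence and shows `np.flip()` without an `axis` argument, which reverses every axis.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Reverse row order: equivalent to np.flip(matrix, axis=0)
print(np.array_equal(matrix[::-1, :], np.flip(matrix, axis=0)))   # True

# Reverse column order: equivalent to np.flip(matrix, axis=1)
print(np.array_equal(matrix[:, ::-1], np.flip(matrix, axis=1)))   # True

# With no axis argument, np.flip reverses every axis at once
both = np.flip(matrix)
print(both)
# [[9 8 7]
#  [6 5 4]
#  [3 2 1]]
```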


## 4. How can you determine the data type of elements in a NumPy array? Discuss the importance of data types in memory management and performance.

### Determining Data Type in NumPy Arrays

In **NumPy**, the data type of elements in an array is a crucial attribute that impacts:

- How the array is stored in memory.
- How numerical operations are performed.

#### How to Determine the Data Type

You can determine the data type of elements in a NumPy array using the `.dtype` attribute of the array object.

#### Key Points

- If **no type is specified**, NumPy infers the type from the input values.
- If **mixed types** are given, NumPy will **upcast** to the most general type that can contain all values  
  _(e.g., `int` + `float` → `float`)_.
- You can **explicitly set** the data type using the `dtype` parameter.

#### Common NumPy Data Types

- `int8`, `int16`, `int32`, `int64`
- `float16`, `float32`, `float64`
- `complex64`, `complex128`
- `bool`


In [4]:
# Example 1: Integer array
arr_int = np.array([1, 2, 3])
print(f"Array: {arr_int}")
print(f"Data type (arr_int.dtype): {arr_int.dtype}")  # Output: int64 (or int32 depending on system)

# Example 2: Float array
arr_float = np.array([1.0, 2.5, 3.7])
print(f"\nArray: {arr_float}")
print(f"Data type (arr_float.dtype): {arr_float.dtype}")  # Output: float64

# Example 3: Mixed type (NumPy usually upcasts)
arr_mixed = np.array([1, 2.5, 3])
print(f"\nArray: {arr_mixed}")
print(f"Data type (arr_mixed.dtype): {arr_mixed.dtype}")  # Output: float64 (upcasted to accommodate float)

# Example 4: Explicitly specified data type
arr_explicit_int8 = np.array([10, 20, 30], dtype=np.int8)
print(f"\nArray: {arr_explicit_int8}")
print(f"Data type (arr_explicit_int8.dtype): {arr_explicit_int8.dtype}")  # Output: int8

arr_explicit_complex = np.array([1+2j, 3-4j], dtype=np.complex64)
print(f"\nArray: {arr_explicit_complex}")
print(f"Data type (arr_explicit_complex.dtype): {arr_explicit_complex.dtype}")  # Output: complex64

# Example 5: Boolean array
arr_bool = np.array([True, False, True])
print(f"\nArray: {arr_bool}")
print(f"Data type (arr_bool.dtype): {arr_bool.dtype}")  # Output: bool

Array: [1 2 3]
Data type (arr_int.dtype): int64

Array: [1.  2.5 3.7]
Data type (arr_float.dtype): float64

Array: [1.  2.5 3. ]
Data type (arr_mixed.dtype): float64

Array: [10 20 30]
Data type (arr_explicit_int8.dtype): int8

Array: [1.+2.j 3.-4.j]
Data type (arr_explicit_complex.dtype): complex64

Array: [ True False  True]
Data type (arr_bool.dtype): bool
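
Beyond reading `.dtype`, a few related tools are often useful when inspecting or changing types. This is a small sketch with an arbitrary array; the default integer type printed on the first line depends on the platform.

```python
import numpy as np

arr = np.array([1, 2, 3])

# Inspect the dtype and the bytes used per element (platform dependent)
print(arr.dtype, arr.itemsize)

# astype returns a converted copy; the original is unchanged
as_float32 = arr.astype(np.float32)
print(as_float32.dtype, as_float32.itemsize)          # float32 4

# Check which family a dtype belongs to
print(np.issubdtype(arr.dtype, np.integer))           # True
print(np.issubdtype(as_float32.dtype, np.floating))   # True
```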


### Importance of Data Types in Memory Management and Performance

Data types are fundamental to how **NumPy** optimizes operations. Their importance stems from two major areas:

---

#### 1. Memory Management

- **Fixed-Size Elements**:  
  NumPy arrays are **homogeneous**, meaning all elements must be of the exact same data type.  
  This fixed size per element allows NumPy to store array data **contiguously** in memory as a block.

- **Predictable Memory Allocation**:  
  Knowing the `dtype` (e.g., `int8` takes 1 byte, `float64` takes 8 bytes) allows NumPy to allocate the **precise amount of memory** needed for the entire array.  
  This is **much more efficient** than Python’s native lists, where each element is a separate Python object with its own memory overhead and pointers.

- **Reduced Overhead**:  
  Since NumPy doesn’t need to store type information for each individual element (only once for the entire array),  
  it significantly reduces memory overhead compared to Python lists — especially for large datasets.

- **🔍 Example**:  
  A Python list of 1 million integers would store 1 million Python `int` objects,  
  each with its own reference count, type information, and value.  
  A NumPy `int32` array of 1 million integers would simply store 1 million 4-byte integers **contiguously** —  
  a massive memory saving.
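
The effect of `dtype` on memory can be checked directly with `itemsize` and `nbytes`. In this sketch the element count is an arbitrary choice.

```python
import numpy as np

n = 1_000_000  # element count chosen for illustration

# Same number of elements, different dtypes
for dtype in (np.int8, np.int32, np.float64):
    arr = np.zeros(n, dtype=dtype)
    # nbytes = number of elements * itemsize
    print(f"{arr.dtype}: {arr.itemsize} byte(s)/element, {arr.nbytes:,} bytes total")
```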

---

#### 2. Performance

- **Vectorized Operations (C/Fortran Speed)**:  
  The **homogeneity** and **contiguous memory layout** of NumPy arrays enable highly optimized, **vectorized operations**.  
  NumPy delegates these element-wise operations to **highly optimized C or Fortran code**,  
  avoiding slow Python `for` loops and yielding **orders of magnitude faster** performance.

- **CPU Cache Efficiency**:  
  When data is stored contiguously and accessed sequentially (as in vectorized operations),  
  the CPU can efficiently **prefetch data into its cache**, reducing latency and improving speed.

- **No Type Checking Overhead**:  
  Since all elements are known to be of the same type, NumPy **doesn't need to perform runtime type checking** on each element,  
  saving significant processing time.

- **Optimized Algorithms**:  
  Many numerical algorithms (e.g., **linear algebra**, **Fourier transforms**) are implemented in **compiled C/Fortran**,  
  leveraging the fixed-size data types for **maximum efficiency**.

  Choosing an appropriate data type (e.g., `float32` vs. `float64`) can sometimes lead to **faster computations**  
  if lower precision is acceptable, as 32-bit operations may be faster than 64-bit ones on some hardware.

- **Avoiding Type Coercion**:  
  If you perform operations between arrays of different `dtype`s, NumPy might perform **type coercion**  
  (e.g., upcasting to a larger type) before executing the operation.  

  While automatic, this can introduce a **minor performance penalty** if done frequently.  
  Explicitly defining `dtype` or ensuring consistent types can help **avoid unnecessary coercion**.
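
A short sketch of the coercion described above, using small arrays of ones as placeholder data:

```python
import numpy as np

i32 = np.ones(3, dtype=np.int32)
f32 = np.ones(3, dtype=np.float32)

# Mixing int32 and float32 upcasts to float64, since float32 cannot
# represent every int32 value exactly
mixed = i32 + f32
print(mixed.dtype)                            # float64

# Keeping both operands float32 avoids the upcast
same = f32 + f32
print(same.dtype)                             # float32

# np.result_type reports the promoted dtype without doing any arithmetic
print(np.result_type(np.int32, np.float32))   # float64
```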


## 5. Define ndarrays in NumPy and explain their key features. How do they differ from standard Python lists?

#### Definition  
A **NumPy ndarray** (n-dimensional array) is a homogeneous, multidimensional array used for efficient numerical computations in Python. It is the core data structure provided by the NumPy library, optimized for performance and mathematical operations.  

---

#### Key Features of NumPy ndarrays  

1. **Homogeneous Data Type**  
   - All elements must be of the same data type (e.g., `int`, `float`, `bool`), ensuring efficient memory usage and computation.  

2. **Fixed Size**  
   - Once created, the size of an ndarray cannot be changed in place (unlike Python lists); functions such as `np.append` return a new array instead.  

3. **Multidimensional**  
   - Supports n-dimensional arrays (1D, 2D, 3D, etc.), making it suitable for matrices, tensors, and other numerical structures.  

4. **Efficient Operations**  
   - Optimized for **vectorized operations** (element-wise computations without loops), leading to faster execution than Python lists.  

5. **Memory Efficiency**  
   - Stores data in **contiguous memory blocks**, reducing overhead and improving cache utilization.  

6. **Broadcasting Support**  
   - Allows arithmetic operations between arrays of different shapes under certain conditions.  

7. **Rich Functionality**  
   - Supports advanced operations (linear algebra, Fourier transforms, random number generation, etc.).  

8. **Indexing & Slicing**  
   - Provides flexible indexing (boolean, integer array indexing) and slicing capabilities.  
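
Several of the features listed above can be seen together in a few lines. This is a minimal sketch on a small made-up array.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)    # 2D array with shape (3, 4)
print(data.ndim, data.shape)          # 2 (3, 4)

# Slicing: second row, then the last two columns
print(data[1])                        # [4 5 6 7]
print(data[:, -2:])

# Boolean indexing: keep elements greater than 8
print(data[data > 8])                 # [ 9 10 11]

# Integer array indexing: pick rows 0 and 2
print(data[[0, 2]])

# Vectorized arithmetic applies to every element without a loop
print((data * 2).sum())               # 132
```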

---

#### Differences Between NumPy ndarrays and Python Lists  

| Feature               | NumPy ndarray                          | Python List                          |  
|----------------------|----------------------------------------|--------------------------------------|  
| **Data Type**        | Homogeneous (same type)                | Heterogeneous (mixed types)          |  
| **Performance**      | Faster (C backend + vectorization)     | Slower for numerical ops             |  
| **Memory Usage**     | Contiguous storage (efficient)         | Stores pointers (less efficient)     |  
| **Functionality**    | Built-in math ops (e.g., `np.sin()`)   | Requires loops/list comprehensions   |  
| **Size Flexibility** | Fixed after creation                   | Dynamically resizable                |  
| **Dimensionality**   | Supports n-dimensions                  | Typically 1D (nested for higher)     |  

---

In [5]:
#### Example Comparison    
import numpy as np  
# NumPy array (homogeneous, efficient)  
arr = np.array([1, 2, 3])  # Typically int64 on 64-bit platforms, stored in contiguous memory  
# Python list (heterogeneous, flexible)  
lst = [1, "two", 3.0]      # Mixed types, slower for math  
print(arr)
print(lst)

[1 2 3]
[1, 'two', 3.0]


#### When to Use Which?
- **Use NumPy ndarrays** for:
  - Numerical computations
  - Large datasets
  - Math-heavy tasks
  - Machine learning applications
  - Operations requiring vectorization

- **Use Python lists** for:
  - General-purpose collections
  - Mixed data types
  - Dynamic resizing needs
  - Non-numerical data storage
  - Situations where list-specific methods are needed

## 6. Analyze the performance benefits of NumPy arrays over Python lists for large-scale numerical operations.

### NumPy Arrays vs Python Lists: Performance Advantages

NumPy arrays offer significant performance advantages over standard Python lists for large-scale numerical operations. These benefits stem from fundamental differences in their design and implementation.

#### Memory Efficiency and Contiguous Storage

##### NumPy Arrays (ndarray):
- **Homogeneous**: All elements must be the same data type (e.g., all int32 or float64)
- **Contiguous Memory Allocation**: Elements stored in a single, contiguous block
- **Reduced Overhead**: Type information stored once for entire array
- **Cache Locality**: Contiguous storage enables better CPU cache utilization

##### Python Lists:
- **Heterogeneous**: Can store elements of different data types
- **Non-Contiguous Storage**: Stores pointers to scattered Python objects
- **High Overhead**: Each element has its own memory overhead
- **Poor Cache Locality**: Scattered memory locations cause cache misses

#### Vectorization (Elimination of Python Loops)

##### NumPy Arrays:
- **Vectorized Operations**: Functions/operators work on entire arrays
- **C/Fortran Implementation**: Core operations in optimized compiled code
- **Loops Run in Compiled Code**: The per-element loop moves from the Python interpreter into C

##### Python Lists:
- **Explicit Loops**: Require `for` loops or list comprehensions
- **Interpreter Overhead**: Each iteration has Python interpreter costs

#### Optimized Algorithms and Libraries

##### NumPy Arrays:
- **Mature Implementations**: Optimized numerical algorithms in C/Fortran
- **Scientific Stack Foundation**: Used by SciPy, Pandas, ML libraries

##### Python Lists:
- **Limited Numerical Support**: Lacks specialized numerical algorithms
- **Manual Implementation**: Complex operations need custom Python code

#### Broadcasting

##### NumPy Arrays:
- **Automatic Broadcasting**: Handles operations between different-shaped arrays
- **Simplified Code**: Eliminates need for explicit looping

##### Python Lists:
- **No Broadcasting**: Requires manual iteration for different structures
- **Complex Logic**: Needs conditional checks for shape mismatches

#### Performance Comparison Summary

| Feature                | NumPy Arrays (ndarray)                  | Python Lists                          |
|------------------------|-----------------------------------------|---------------------------------------|
| **Data Storage**       | Homogeneous, contiguous                 | Heterogeneous, scattered pointers     |
| **Memory Usage**       | Low overhead                            | High overhead                         |
| **Operation Speed**    | Very fast (C/Fortran backend)           | Slow (Python interpreter)             |
| **CPU Cache**          | Excellent locality                      | Poor locality                         |
| **Numerical Ops**      | Built-in optimized algorithms           | Requires explicit loops               |
| **Flexibility**        | Fixed size, homogeneous                 | Dynamic size, heterogeneous           |

In [6]:
import sys
import time

# --- Configuration ---
NUM_ELEMENTS = 10_000_000 # Ten million elements for large-scale comparison

print(f"--- Performance Comparison for {NUM_ELEMENTS:,} Elements ---\n")

# --- 1. Memory Usage Comparison ---
print("### 1. Memory Usage Comparison\n")

# Python list
python_list = list(range(NUM_ELEMENTS))
# NumPy array (using int32 for comparison, or int64 by default on most systems)
numpy_array_int32 = np.arange(NUM_ELEMENTS, dtype=np.int32)
numpy_array_int64 = np.arange(NUM_ELEMENTS, dtype=np.int64)

# sys.getsizeof for a list usually only gives the size of the list object itself,
# not the elements it points to, which are separate Python int objects.
# For a NumPy array, .nbytes gives the actual size of the data block.
print(f"#### Python List Memory Usage:")
print(f"Size of Python list object (approx, excluding element objects): {sys.getsizeof(python_list)} bytes")
# To get a more realistic estimate for Python list of integers:
# Each Python integer object typically takes about 28 bytes (on 64-bit systems)
# plus the list's own overhead.
estimated_python_list_memory = sys.getsizeof(python_list) + NUM_ELEMENTS * sys.getsizeof(0)
print(f"Estimated total memory for Python list (elements + pointers): {estimated_python_list_memory:,} bytes")
print(f"  (This estimation might still be conservative compared to actual memory usage due to object overheads)")

print(f"\n#### NumPy Array Memory Usage:")
print(f"NumPy array (int32) data size: {numpy_array_int32.nbytes:,} bytes ({numpy_array_int32.itemsize} bytes/element)")
print(f"NumPy array (int64) data size: {numpy_array_int64.nbytes:,} bytes ({numpy_array_int64.itemsize} bytes/element)")

print(f"\nObservation: NumPy arrays consume significantly less memory due to homogeneous, contiguous storage.")

# --- 2. Performance Comparison: Element-wise Addition ---
print("\n### 2. Performance Comparison: Element-wise Addition\n")

# Prepare data for addition
py_list1 = list(range(NUM_ELEMENTS))
py_list2 = list(range(NUM_ELEMENTS))

np_array1 = np.arange(NUM_ELEMENTS)
np_array2 = np.arange(NUM_ELEMENTS)

print("#### Python List Addition (using list comprehension):")
start_time = time.perf_counter()
result_py_list = [x + y for x, y in zip(py_list1, py_list2)]
end_time = time.perf_counter()
python_list_time = end_time - start_time
print(f"Time taken for list addition: {python_list_time:.6f} seconds")
# print(f"First 5 elements of result_py_list: {result_py_list[:5]}") # Optional: verify results

print("\n#### NumPy Array Addition (vectorized operation):")
start_time = time.perf_counter()
result_np_array = np_array1 + np_array2
end_time = time.perf_counter()
numpy_array_time = end_time - start_time
print(f"Time taken for NumPy array addition: {numpy_array_time:.6f} seconds")
# print(f"First 5 elements of result_np_array: {result_np_array[:5]}") # Optional: verify results

print(f"\nObservation: NumPy addition is approximately {python_list_time / numpy_array_time:.2f} times faster.")


# --- 3. Performance Comparison: Summation ---
print("\n### 3. Performance Comparison: Summation\n")

print("#### Python List Summation (using built-in sum()):")
start_time = time.perf_counter()
sum_python = sum(py_list1)
end_time = time.perf_counter()
python_sum_time = end_time - start_time
print(f"Time taken for list summation: {python_sum_time:.6f} seconds")

print("\n#### NumPy Array Summation (using np.sum()):")
start_time = time.perf_counter()
sum_numpy = np.sum(np_array1)
end_time = time.perf_counter()
numpy_sum_time = end_time - start_time
print(f"Time taken for NumPy array summation: {numpy_sum_time:.6f} seconds")

print(f"\nObservation: NumPy summation is approximately {python_sum_time / numpy_sum_time:.2f} times faster.")

print("\n--- Conclusion ---")
print("These comparisons clearly demonstrate the significant performance and memory efficiency benefits of NumPy arrays over standard Python lists for large-scale numerical operations.")
print("This is primarily due to NumPy's underlying C/Fortran implementations and contiguous memory storage, which enable highly optimized vectorized computations and better CPU cache utilization.")

--- Performance Comparison for 10,000,000 Elements ---

### 1. Memory Usage Comparison

#### Python List Memory Usage:
Size of Python list object (approx, excluding element objects): 80000056 bytes
Estimated total memory for Python list (elements + pointers): 360,000,056 bytes
  (This estimation might still be conservative compared to actual memory usage due to object overheads)

#### NumPy Array Memory Usage:
NumPy array (int32) data size: 40,000,000 bytes (4 bytes/element)
NumPy array (int64) data size: 80,000,000 bytes (8 bytes/element)

Observation: NumPy arrays consume significantly less memory due to homogeneous, contiguous storage.

### 2. Performance Comparison: Element-wise Addition

#### Python List Addition (using list comprehension):
Time taken for list addition: 0.273212 seconds

#### NumPy Array Addition (vectorized operation):
Time taken for NumPy array addition: 0.017106 seconds

Observation: NumPy addition is approximately 15.97 times faster.

### 3. Performance Compar

## 7. Compare vstack() and hstack() functions in NumPy. Provide examples demonstrating their usage and output.

### Vertical Stacking (vstack())

#### Definition
- **`np.vstack()`** (vertical stack) concatenates arrays along the first axis (row-wise)
- Stacks arrays vertically, increasing the row count
- All input arrays must have the same number of columns

#### Key Characteristics
- Input arrays must have matching shapes in all dimensions except the first
- Equivalent to `np.concatenate(arrays, axis=0)` for inputs with two or more dimensions; 1D inputs of length N are first reshaped to `(1, N)`
- Useful for combining data records with the same features

### Horizontal Stacking (hstack())

#### Definition
- **`np.hstack()`** (horizontal stack) concatenates arrays along the second axis (column-wise); 1D arrays are concatenated along their only axis
- Stacks arrays horizontally, increasing the column count
- All input arrays must have the same number of rows

#### Key Characteristics
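- Input arrays must have matching shapes in all dimensions except the second (1D arrays may have any length)
- Equivalent to `np.concatenate(arrays, axis=1)` for inputs with two or more dimensions, and to `np.concatenate(arrays, axis=0)` for 1D inputs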
- Useful for combining different features of the same observations

### Key Differences

| Feature               | vstack()                     | hstack()                                   |
|-----------------------|------------------------------|--------------------------------------------|
| **Axis**              | Stack along axis 0 (rows)    | Stack along axis 1 (columns); axis 0 for 1D inputs |
| **Shape Requirement** | Same column count            | Same row count                             |
| **Result Shape**      | Rows increase                | Columns increase                           |
| **Use Case**          | Adding more data samples     | Adding more features                       |

In [7]:
# Example 1: Basic stacking
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Vertical stacking (row-wise)
v_result = np.vstack((a, b))
print("Vertical Stack:")
print(v_result)
"""
Output:
[[1 2 3]
 [4 5 6]]
"""

# Horizontal stacking (column-wise)
h_result = np.hstack((a, b))
print("\nHorizontal Stack:")
print(h_result)
"""
Output:
[1 2 3 4 5 6]
"""

# Example 2: 2D arrays
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# Vertical stacking of 2D arrays
v_result_2d = np.vstack((x, y))
print("\nVertical Stack (2D):")
print(v_result_2d)
"""
Output:
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
"""

# Horizontal stacking of 2D arrays
h_result_2d = np.hstack((x, y))
print("\nHorizontal Stack (2D):")
print(h_result_2d)
"""
Output:
[[1 2 5 6]
 [3 4 7 8]]
"""

# Example 3: Shape requirements
try:
    # This will fail for hstack (different row counts)
    np.hstack((np.array([[1, 2]]), np.array([[3, 4], [5, 6]])))
except ValueError as e:
    print(f"\nError: {e}")
    # Output: Error: all the input array dimensions except for the concatenation axis must match exactly

try:
    # This will fail for vstack (different column counts)
    np.vstack((np.array([[1, 2]]), np.array([[3, 4, 5]])))
except ValueError as e:
    print(f"Error: {e}")
    # Output: Error: all the input array dimensions except for the concatenation axis must match exactly

Vertical Stack:
[[1 2 3]
 [4 5 6]]

Horizontal Stack:
[1 2 3 4 5 6]

Vertical Stack (2D):
[[1 2]
 [3 4]
 [5 6]
 [7 8]]

Horizontal Stack (2D):
[[1 2 5 6]
 [3 4 7 8]]

Error: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1 and the array at index 1 has size 2
Error: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 2 and the array at index 1 has size 3
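
The relationship between the stacking helpers and `np.concatenate()` can be verified directly, including the 1D special cases. A brief sketch with the same small arrays used above:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

# For 2D inputs, vstack/hstack match concatenate along axis 0/1
print(np.array_equal(np.vstack((x, y)), np.concatenate((x, y), axis=0)))  # True
print(np.array_equal(np.hstack((x, y)), np.concatenate((x, y), axis=1)))  # True

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# For 1D inputs, vstack first promotes each array to shape (1, N)
print(np.vstack((a, b)).shape)   # (2, 3)

# For 1D inputs, hstack joins along the only axis (axis 0)
print(np.hstack((a, b)).shape)   # (6,)
```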


## 8. Explain the differences between fliplr() and flipud() methods in NumPy, including their effects on various array dimensions.

### `np.fliplr()` (Flip Left-Right)

#### Definition
- Flips array **horizontally** (left to right)
- Operates along the **second axis** (columns)
- Only works on arrays with **2 or more dimensions**

#### Behavior
- Reverses the order of elements in each row
- Maintains the same row order but mirrors column positions
- For 2D arrays: Flips column indices (e.g., column 0 ↔ last column)

---

### `np.flipud()` (Flip Up-Down)

#### Definition
- Flips array **vertically** (top to bottom)
- Operates along the **first axis** (rows)
- Works on arrays of **any dimension**

#### Behavior
- Reverses the order of elements in each column
- Maintains the same column order but mirrors row positions
- For 2D arrays: Flips row indices (e.g., row 0 ↔ last row)

---

### Key Differences

| Feature               | `fliplr()`                          | `flipud()`                          |
|-----------------------|-------------------------------------|-------------------------------------|
| **Axis**              | Second axis (columns)               | First axis (rows)                   |
| **Minimum Dimensions**| Requires ≥2D arrays                 | Works on 1D+ arrays                 |
| **Visual Effect**     | Mirror image horizontally           | Mirror image vertically             |
| **Common Use**        | Image processing (left-right flip)  | Matrix operations (row reversal)    |

In [8]:
# Example 1: 2D array operations
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

print("Original array:")
print(arr)

print("\nfliplr (horizontal flip):")
print(np.fliplr(arr))
"""
[[3 2 1]
 [6 5 4]
 [9 8 7]]
"""

print("\nflipud (vertical flip):")
print(np.flipud(arr))
"""
[[7 8 9]
 [4 5 6]
 [1 2 3]]
"""

# Example 2: 1D array behavior
vec = np.array([1, 2, 3])

print("\nOriginal 1D array:")
print(vec)

print("\nflipud works on 1D:")
print(np.flipud(vec))  # [3 2 1]

try:
    print("\nAttempting fliplr on 1D:")
    print(np.fliplr(vec))  # Raises ValueError
except ValueError as e:
    print(f"Error: {e}")  # "Input must be >= 2-d"

# Example 3: Higher dimensions
cube = np.array([[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]])

print("\nOriginal 3D array (shape: 2,2,2):")
print(cube)

print("\nfliplr on 3D array (flips axis 1, the second dimension):")
print(np.fliplr(cube))
"""
[[[3 4], [1 2]]
 [[7 8], [5 6]]]
"""

print("\nflipud on 3D array (flips first dimension):")
print(np.flipud(cube))
# Expected:
# [[[5 6], [7 8]]
#  [[1 2], [3 4]]]

Original array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

fliplr (horizontal flip):
[[3 2 1]
 [6 5 4]
 [9 8 7]]

flipud (vertical flip):
[[7 8 9]
 [4 5 6]
 [1 2 3]]

Original 1D array:
[1 2 3]

flipud works on 1D:
[3 2 1]

Attempting fliplr on 1D:
Error: Input must be >= 2-d.

Original 3D array (shape: 2,2,2):
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

fliplr on 3D array (flips axis 1, the second dimension):
[[[3 4]
  [1 2]]

 [[7 8]
  [5 6]]]

flipud on 3D array (flips first dimension):
[[[5 6]
  [7 8]]

 [[1 2]
  [3 4]]]

### Key Observations

#### Dimensionality Matters
- `fliplr()` requires ≥2D arrays
- `flipud()` works on any dimension (including 1D vectors)

#### Axis Behavior
- `fliplr()` always operates on axis 1 (columns)
- `flipud()` always operates on axis 0 (rows)

#### Higher Dimensions
- For 3D+ arrays:
  - `fliplr()` still flips along axis 1 (the second axis), not the last axis
  - `flipud()` flips along the first axis regardless of total dimensions

#### Performance Characteristics
- Both functions:
  - Return views (not copies) of the input array
  - Run in constant time, because only the strides change and no elements are moved
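
A short check of the axis and view behaviour listed above. This is a sketch on an illustrative array `m` that is not part of the original notebook; it relies on `np.flip` with an explicit `axis`, which is the general form of both functions.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

# fliplr is a reversal of axis 1; flipud is a reversal of axis 0
lr_matches = np.array_equal(np.fliplr(m), np.flip(m, axis=1)) and \
             np.array_equal(np.fliplr(m), m[:, ::-1])
ud_matches = np.array_equal(np.flipud(m), np.flip(m, axis=0)) and \
             np.array_equal(np.flipud(m), m[::-1])
print(lr_matches, ud_matches)

# The result is a view: it shares memory with m, so writes go through
flipped = np.fliplr(m)
print(np.shares_memory(flipped, m))
flipped[0, 0] = 99               # same element as m[0, 2]
print(m[0, 2])
```

Call `.copy()` on the result when an independent array is needed.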

## 9. Discuss the functionality of the array_split() method in NumPy. How does it handle uneven splits?

### `numpy.array_split()` Method

#### Definition
- Splits an array into multiple sub-arrays (sections)
- More flexible than `split()` as it allows **uneven divisions**
- Works along specified axis (default is `axis=0`)

#### Key Features
- Accepts **any division number** (can be greater than array length)
- Handles **indivisible cases gracefully** by varying section sizes
- Returns a **list of sub-arrays** (views when possible)
- Supports **multi-dimensional arrays** along any axis

#### Uneven Split Handling
When the array can't be divided equally:
1. Distributes **extra elements evenly** across early sections
2. Results in sections that differ by **at most 1 element**
3. Follows the pattern: First sections get extra elements

#### Comparison with `split()`

| Feature          | `array_split()`            | `split()`                |
|------------------|----------------------------|--------------------------|
| **Uneven Divisions** | Allowed                    | Raises ValueError        |
| **Output Sections**  | May vary in size           | Must be equal            |
| **Use Case**         | Flexible partitioning      | Strict equal divisions   |

In [11]:
# Example 1: Basic uneven split
arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

# Split into 3 parts (10/3 = 3 with remainder 1)
result = np.array_split(arr, 3)
print("Uneven split (3 parts):")
for i, section in enumerate(result):
    print(f"Section {i+1}: {section}")

"""
Output:
Section 1: [0 1 2 3]
Section 2: [4 5 6]
Section 3: [7 8 9]
"""

# Example 2: More sections than elements
small_arr = np.array([1, 2, 3])
print("\nMore sections than elements:")
print(np.array_split(small_arr, 5))  # Returns 5 sections (some empty)

"""
Output:
[array([1]), array([2]), array([3]), array([], dtype=int64), array([], dtype=int64)]
(the integer dtype shown depends on the platform's default integer size)
"""

# Example 3: Multi-dimensional splitting
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Column-wise split (axis=1)
print("\nColumn-wise split:")
col_splits = np.array_split(matrix, 2, axis=1)
for i, col_section in enumerate(col_splits):
    print(f"Columns {i+1}:\n{col_section}")

"""
Output:
Columns 1:
[[1 2]
 [4 5]
 [7 8]]
Columns 2:
[[3]
 [6]
 [9]]
"""

# Example 4: Comparison with split()
try:
    np.split(arr, 3)  # Will raise ValueError for uneven division
except ValueError as e:
    print(f"\nsplit() error: {e}")

Uneven split (3 parts):
Section 1: [0 1 2 3]
Section 2: [4 5 6]
Section 3: [7 8 9]

More sections than elements:
[array([1]), array([2]), array([3]), array([], dtype=int64), array([], dtype=int64)]

Column-wise split:
Columns 1:
[[1 2]
 [4 5]
 [7 8]]
Columns 2:
[[3]
 [6]
 [9]]

split() error: array split does not result in an equal division


### Key Observations

#### Intelligent Distribution
- For 10 elements into 3 sections → sizes `[4, 3, 3]`
- For 7 elements into 3 sections → sizes `[3, 2, 2]`

#### Edge Cases
- Returns empty arrays when `N > array length`
- Handles zero-length arrays gracefully

#### Memory Efficiency
- Returns **views** of the original array, since the sections are produced by slicing
- Modifying a section therefore modifies the original; call `.copy()` on a section when an independent array is needed

#### Multi-axis Support
- Works identically along **any specified axis**
- Maintains array **dimensionality** in splits
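
The memory behaviour can be checked with `np.shares_memory`. A minimal sketch, assuming a small illustrative array `data` (not from the original notebook):

```python
import numpy as np

data = np.arange(10)
parts = np.array_split(data, 3)

# Each section overlaps the original buffer
print([np.shares_memory(part, data) for part in parts])

# Writing through a section changes the original array
parts[0][0] = -1
print(data[0])

# Copy the sections when they must be independent of the source
independent = [part.copy() for part in parts]
print([np.shares_memory(part, data) for part in independent])
```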

## 10. Explain the concepts of vectorization and broadcasting in NumPy. How do they contribute to efficient array operations?

### Vectorization in NumPy

#### Definition
* **Vectorization** refers to performing operations on entire arrays rather than individual elements.
* It eliminates the need for explicit Python loops, which can be slow.
* It's implemented via optimized C/Fortran backends, making operations much faster.

#### Key Benefits
* **Performance:**
    * Avoids Python interpreter overhead, leading to significant speed gains.
    * Can leverage the CPU's **SIMD** (Single Instruction Multiple Data) instructions where the compiled loops support them, so several elements are processed per instruction.
* **Readability:**
    * Promotes a more concise and mathematical notation, making your code easier to understand.
* **Functionality:**
    * Enables element-wise operations with clean and intuitive syntax, simplifying array manipulations.

In [22]:
# Non-vectorized (slow)
a = [1, 2, 3]
b = [4, 5, 6]
result = [x+y for x,y in zip(a,b)]
print(result)

[5, 7, 9]


In [24]:
# Vectorized (fast)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a + b  # [5 7 9]
print(result)

[5 7 9]
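
The three-element arrays above are too small to show a speed difference. The sketch below times the same two approaches on larger inputs with `timeit`; the array size and repeat count are arbitrary choices, and the measured times depend on the machine, so no expected figures are given.

```python
import timeit
import numpy as np

n = 100_000
xs = list(range(n))
ys = list(range(n))
ax = np.arange(n)
ay = np.arange(n)

def loop_add():
    # Element-by-element addition in the Python interpreter
    return [x + y for x, y in zip(xs, ys)]

def vector_add():
    # One call; the loop runs in NumPy's compiled code
    return ax + ay

loop_time = timeit.timeit(loop_add, number=20)
vector_time = timeit.timeit(vector_add, number=20)
print(f"list comprehension: {loop_time:.4f} s")
print(f"NumPy vectorized:   {vector_time:.4f} s")
```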


### Broadcasting in NumPy

#### Definition
* **Broadcasting** allows operations between arrays of different shapes.
* It automatically expands the smaller array to match the larger array's shape without actually duplicating data.
* It follows strict rules for dimension compatibility to determine if an operation is valid.

#### Broadcasting Rules
* **Shape Alignment:**
    * When comparing two arrays, it starts with the trailing dimensions (from right to left).
    * For an operation to be valid, corresponding dimensions must either be equal, or one of them must be 1.
* **Dimension Expansion:**
    * If arrays have different numbers of dimensions, the smaller array is padded with size-1 dimensions on its left side until the number of dimensions matches the larger array.
* **Size Matching:**
    * Any dimension of size 1 is "stretched" or repeated to match the corresponding dimension's size in the other array.
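
The three rules can be expressed as a small function. `broadcast_shape` below is a hypothetical helper written for this illustration, checked against NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later).

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the broadcasting rules to two shapes (illustrative helper)."""
    ndim = max(len(shape_a), len(shape_b))
    # Dimension expansion: pad the shorter shape with 1s on the left
    padded_a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    padded_b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(padded_a, padded_b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)      # equal sizes, or b is stretched
        elif dim_a == 1:
            result.append(dim_b)      # a is stretched
        else:
            raise ValueError(f"incompatible dimensions: {dim_a} vs {dim_b}")
    return tuple(result)

print(broadcast_shape((3, 1), (3,)), np.broadcast_shapes((3, 1), (3,)))
print(broadcast_shape((3, 3), ()), np.broadcast_shapes((3, 3), ()))

try:
    broadcast_shape((2, 3), (3, 2))
except ValueError as e:
    print(f"Error: {e}")
```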

In [26]:
# Array (3,3) + Scalar → Scalar broadcasts to (3,3)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
result = matrix + 10  
print(matrix)
print(result)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[11 12 13]
 [14 15 16]
 [17 18 19]]


In [27]:
# Array (3,1) + Array (3,) → b is treated as (1,3), then both broadcast to (3,3)
a = np.array([[1], [2], [3]])  # shape (3,1)
b = np.array([4, 5, 6])        # shape (3,)
result = a + b
print(a)
print(b)
print(result)

[[1]
 [2]
 [3]]
[4 5 6]
[[5 6 7]
 [6 7 8]
 [7 8 9]]


### Efficiency Contributions

#### Performance Optimization

| Feature      | Benefit                                | Rough Speed Gain (workload-dependent) |
|--------------|----------------------------------------|---------------------------------------|
| **Vectorization** | Eliminates Python loop overhead        | Often 10-100x over a Python loop      |
| **Broadcasting** | Avoids explicit array expansion        | Often 2-10x over manual expansion     |

#### Memory Efficiency

* **Vectorization:**
    * Processes contiguous memory blocks, leading to more efficient data access.
    * Results in better cache utilization, as relevant data is often already in the CPU's cache.
* **Broadcasting:**
    * Avoids creating temporary copies of arrays, especially when operating on arrays of different shapes.
    * Uses "virtual expansion," meaning the smaller array is logically stretched without consuming additional memory for the expanded view.
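
The "virtual expansion" can be observed with `np.broadcast_to`, which returns a read-only view whose stride along the stretched axis is zero. A minimal sketch; the shape `(1000, 3)` is an arbitrary choice, and `np.tile` is used only to show the cost of a real copy.

```python
import numpy as np

row = np.array([4, 5, 6])

# Logical shape (1000, 3), but every "row" points at the same 3 elements
virtual = np.broadcast_to(row, (1000, 3))
print(virtual.shape)
print(virtual.strides)                   # first stride is 0: no step between rows
print(np.shares_memory(virtual, row))

# An explicit expansion allocates all 3000 elements
materialized = np.tile(row, (1000, 1))
print(row.nbytes, materialized.nbytes)
```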

In [28]:
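# Traditional approach (slow)
def slow_add(arr, scalar):
    result = np.empty_like(arr)
    # Visit every element explicitly in Python
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            result[i,j] = arr[i,j] + scalar
    return result

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(slow_add(matrix, 10))

[[11 12 13]
 [14 15 16]
 [17 18 19]]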


In [29]:
# NumPy approach (fast)
def fast_add(arr, scalar):
    result = arr + scalar  # Uses both vectorization and broadcasting
    return result

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(fast_add(matrix, 10))

[[11 12 13]
 [14 15 16]
 [17 18 19]]


### Key Differences

#### Concept & Application

| Concept        | Primary Advantage        | When It Applies                                  |
|----------------|--------------------------|--------------------------------------------------|
| **Vectorization** | Loop elimination         | Primarily for element-wise operations on arrays. |
| **Broadcasting** | Shape compatibility      | When performing operations between arrays of different, but compatible, dimensions. |

# Practical Question

## 1. Create a 3x3 NumPy array with random integers between 1 and 100. Then, interchange its rows and columns.

In [33]:
arr = np.random.randint(1, 101, size = (3,3))
arr

array([[55, 66, 12],
       [62, 56, 87],
       [39, 49, 59]])

In [34]:
arr.T

array([[55, 62, 39],
       [66, 56, 49],
       [12, 87, 59]])

## 2. Generate a 1D NumPy array with 10 elements. Reshape it into a 2x5 array, then into a 5x2 array.

In [46]:
arr = np.array([11,14,12,34,63,725,56,24,51,46])
arr

array([ 11,  14,  12,  34,  63, 725,  56,  24,  51,  46])

In [47]:
arr.size

10

In [48]:
np.reshape(arr, (2,5))

array([[ 11,  14,  12,  34,  63],
       [725,  56,  24,  51,  46]])

In [49]:
np.reshape(arr, (5,2))

array([[ 11,  14],
       [ 12,  34],
       [ 63, 725],
       [ 56,  24],
       [ 51,  46]])

## 3. Create a 4x4 NumPy array with random float values. Add a border of zeros around it, resulting in a 6x6 array.

In [85]:
arr = np.random.rand(4,4)
arr

array([[0.64364751, 0.23728421, 0.56141876, 0.12342034],
       [0.23215511, 0.88599638, 0.76521042, 0.08366925],
       [0.87258684, 0.10665718, 0.11210163, 0.23499065],
       [0.28573075, 0.01508316, 0.36382986, 0.4655406 ]])

In [86]:
brr = np.zeros((6,6))
brr

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [88]:
brr[1:5, 1:5] = arr
brr

array([[0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ],
       [0.        , 0.64364751, 0.23728421, 0.56141876, 0.12342034,
        0.        ],
       [0.        , 0.23215511, 0.88599638, 0.76521042, 0.08366925,
        0.        ],
       [0.        , 0.87258684, 0.10665718, 0.11210163, 0.23499065,
        0.        ],
       [0.        , 0.28573075, 0.01508316, 0.36382986, 0.4655406 ,
        0.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ,
        0.        ]])

In [90]:
# Repeat with an array of ones so the zero border is easy to see
arr = np.ones((4,4))
arr

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [92]:
brr = np.zeros((6,6))
brr

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [93]:
brr[1:5, 1:5] = arr
brr

array([[0., 0., 0., 0., 0., 0.],
       [0., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 0.],
       [0., 1., 1., 1., 1., 0.],
       [0., 0., 0., 0., 0., 0.]])
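
An alternative to allocating a zero array and assigning into its centre is `np.pad`. The sketch below reuses the all-ones case so the border is visible; the names `inner` and `bordered` are introduced here for illustration.

```python
import numpy as np

inner = np.ones((4, 4))

# pad_width=1 adds one row/column on every side; constant mode fills with 0
bordered = np.pad(inner, pad_width=1, mode="constant", constant_values=0)

print(bordered.shape)
print(bordered)
```

Both approaches produce the same 6x6 result; `np.pad` avoids hard-coding the slice bounds when the inner array size changes.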

## 4. Using NumPy, create an array of integers from 10 to 60 with a step of 5.

In [101]:
arr = np.arange(10,61,5)
arr

array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60])

## 5. Create a NumPy array of strings ['python', 'numpy', 'pandas']. Apply different case transformations (uppercase, lowercase, title case, etc.) to each element.

In [102]:
arr = np.array(["Python","Numpy","Pandas"])
arr

array(['Python', 'Numpy', 'Pandas'], dtype='<U6')

In [103]:
np.char.upper(arr)

array(['PYTHON', 'NUMPY', 'PANDAS'], dtype='<U6')

In [104]:
np.char.lower(arr)

array(['python', 'numpy', 'pandas'], dtype='<U6')

In [105]:
np.char.title(arr)

array(['Python', 'Numpy', 'Pandas'], dtype='<U6')

In [106]:
np.char.swapcase(arr)

array(['pYTHON', 'nUMPY', 'pANDAS'], dtype='<U6')

In [107]:
np.char.capitalize(arr)

array(['Python', 'Numpy', 'Pandas'], dtype='<U6')

## 6. Generate a NumPy array of words. Insert a space between each character of every word in the array.

In [125]:
arr = np.array(['hello', 'numpy', 'world', 'python'])
arr

array(['hello', 'numpy', 'world', 'python'], dtype='<U6')

In [126]:
def add_spaces_between_chars(word):
    # Insert a space between each character of the given word.
    return ' '.join(word)

In [127]:
# Apply the function to each element of the NumPy array
add_spaces_vectorized = np.vectorize(add_spaces_between_chars)
spaced_words_array = add_spaces_vectorized(arr)

In [128]:
print("\nArray with spaces between each character of every word:")
print(spaced_words_array)


Array with spaces between each character of every word:
['h e l l o' 'n u m p y' 'w o r l d' 'p y t h o n']
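
The same result can be obtained without a custom function by using `np.char.join`, which applies `str.join` to each element. A minimal sketch with the same words (the variable names are illustrative):

```python
import numpy as np

words = np.array(['hello', 'numpy', 'world', 'python'])

# The separator ' ' is inserted between the characters of each element
spaced = np.char.join(' ', words)
print(spaced)
```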


## 7. Create two 2D NumPy arrays and perform element-wise addition, subtraction, multiplication, and division.

In [129]:
ary = np.random.randint(1, 11, size = (3,3))
bry = np.random.randint(1, 11, size = (3,3))

In [130]:
ary, bry

(array([[ 2,  1,  3],
        [ 4,  7,  3],
        [ 6, 10,  3]]),
 array([[5, 2, 3],
        [7, 5, 7],
        [4, 8, 4]]))

In [131]:
ary + bry

array([[ 7,  3,  6],
       [11, 12, 10],
       [10, 18,  7]])

In [133]:
ary - bry

array([[-3, -1,  0],
       [-3,  2, -4],
       [ 2,  2, -1]])

In [134]:
ary * bry

array([[10,  2,  9],
       [28, 35, 21],
       [24, 80, 12]])

In [135]:
ary / bry

array([[0.4       , 0.5       , 1.        ],
       [0.57142857, 1.4       , 0.42857143],
       [1.5       , 1.25      , 0.75      ]])

In [137]:
ary @ bry  # Matrix product (not element-wise), shown for contrast with ary * bry

array([[ 29,  33,  25],
       [ 81,  67,  73],
       [112,  86, 100]])

## 8. Use NumPy to create a 5x5 identity matrix, then extract its diagonal elements.

In [154]:
arr = np.eye(5)
arr

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [156]:
diagonal_elements = np.diag(arr)
diagonal_elements

array([1., 1., 1., 1., 1.])

## 9. Generate a NumPy array of 100 random integers between 0 and 1000. Find and display all prime numbers in this array.

In [157]:
def is_prime(num):
    """
    Checks if a number is prime.
    A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself.
    """
    if num <= 1:
        return False
    if num <= 3:
        return True
    if num % 2 == 0 or num % 3 == 0:
        return False
    i = 5
    while i * i <= num:
        if num % i == 0 or num % (i + 2) == 0:
            return False
        i += 6
    return True

In [161]:
arr = np.random.randint(0, 1001, 100) # 0 to 1000 inclusive
arr

array([959, 678, 255, 612, 790, 298, 437,  53, 997, 870, 650, 660, 636,
       360, 458, 440, 392, 870, 481, 552, 637, 805, 589, 223, 496,  30,
       708, 116, 926, 725,  63,  84, 456, 380, 528,  92, 790, 379, 152,
       848, 377, 734, 384, 391,  36,  14, 709, 825, 732, 566, 583, 819,
       218, 151,  29, 343, 620, 648,  92, 572, 481, 173, 196, 585, 366,
       676, 824, 504, 399, 925, 959, 560, 205, 123, 468, 752, 169, 698,
       903,  99, 828, 665, 388,  40, 652, 416, 794, 321, 618, 506, 439,
       325,   4, 755, 154, 735, 839, 737, 293, 685])

In [163]:
arr.size

100

In [169]:
# np.vectorize applies is_prime element-wise (a convenience wrapper around a Python loop, not a speed-up)
vectorized_is_prime = np.vectorize(is_prime)

# Create a boolean mask where True indicates a prime number
prime_mask = vectorized_is_prime(arr)

# Use the boolean mask to filter the original array
prime_numbers_in_array = arr[prime_mask]

In [170]:
print("\nPrime Numbers found in the array:")
if len(prime_numbers_in_array) > 0:
    print(np.sort(prime_numbers_in_array)) # Display them sorted for better readability
else:
    print("No prime numbers found in the generated array.")

print(f"\nTotal prime numbers found: {len(prime_numbers_in_array)}")


Prime Numbers found in the array:
[ 29  53 151 173 223 293 379 439 709 839 997]

Total prime numbers found: 11
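
Because the values are bounded by 1000, a Sieve of Eratosthenes built with array slicing is an alternative to testing each number separately. This is a swapped-in technique, not the approach used in the cells above; `primes_mask_up_to` is a hypothetical helper, and the seeded generator is used only to make the sketch repeatable.

```python
import numpy as np

def primes_mask_up_to(limit):
    """Boolean array where mask[k] is True when k is prime (assumes limit >= 2)."""
    is_prime = np.ones(limit + 1, dtype=bool)
    is_prime[:2] = False                      # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p :: p] = False      # strike out multiples of p
    return is_prime

rng = np.random.default_rng(seed=0)
values = rng.integers(0, 1001, size=100)      # 0 to 1000 inclusive

sieve = primes_mask_up_to(1000)
primes_found = np.sort(values[sieve[values]]) # look up each value in the sieve
print(primes_found)
print(f"Total prime numbers found: {len(primes_found)}")
```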


## 10. Create a NumPy array representing daily temperatures for a month. Calculate and display the weekly averages.

In [171]:
# 1. Create a NumPy array representing daily temperatures for a month (e.g., 28 days)
# Let's simulate temperatures between 15.0 and 35.0 degrees Celsius.
# np.random.uniform(low, high, size) generates float values.
daily_temperatures = np.random.uniform(low=15.0, high=35.0, size=28)

print("--- Daily Temperatures for the Month (28 days) ---")
print(daily_temperatures)
print(f"Total days: {len(daily_temperatures)}")

# 2. Reshape the array to group temperatures by week
# Since there are 28 days and 7 days in a week, we reshape to (4 weeks, 7 days/week)
# This assumes the total number of days is a multiple of 7.
temperatures_reshaped_by_week = daily_temperatures.reshape(4, 7)

print("\n--- Daily Temperatures Reshaped by Week (4x7 matrix) ---")
print(temperatures_reshaped_by_week)

# 3. Calculate the weekly averages
# We use np.mean() along axis=1 to get the average for each row (each week).
weekly_averages = np.mean(temperatures_reshaped_by_week, axis=1)

# For better readability, let's round the averages
weekly_averages_rounded = np.round(weekly_averages, decimals=2)


print("\n--- Weekly Averages ---")
for i, avg in enumerate(weekly_averages_rounded):
    print(f"Week {i+1} Average Temperature: {avg:.2f}°C")

# --- Handling months with more than 28 days (e.g., 30 or 31 days) ---
print("\n" + "="*50)
print("--- Handling Months with More Than 28 Days (e.g., 31 days) ---")

total_days_in_month = 31 # Example for a 31-day month
daily_temperatures_31_days = np.random.uniform(low=15.0, high=35.0, size=total_days_in_month)
print(f"\nDaily Temperatures for {total_days_in_month} days:")
print(daily_temperatures_31_days)

# We can't simply reshape if it's not a perfect multiple of 7.
# One common approach is to pad the array or calculate for full weeks and then handle remaining days.

# Method 1: Calculate averages for full weeks and then for remaining days
num_full_weeks = total_days_in_month // 7
remaining_days = total_days_in_month % 7

print(f"\nNumber of full weeks: {num_full_weeks}")
print(f"Number of remaining days: {remaining_days}")

# Averages for full weeks
full_weeks_temps = daily_temperatures_31_days[:num_full_weeks * 7].reshape(num_full_weeks, 7)
full_weeks_averages = np.mean(full_weeks_temps, axis=1)

print("\n--- Averages for Full Weeks ---")
for i, avg in enumerate(np.round(full_weeks_averages, 2)):
    print(f"Week {i+1} Average Temperature: {avg:.2f}°C")

# Average for remaining days (if any)
if remaining_days > 0:
    remaining_temps = daily_temperatures_31_days[num_full_weeks * 7:]
    avg_remaining = np.mean(remaining_temps)
    print(f"Average Temperature for remaining {remaining_days} days (partial week): {avg_remaining:.2f}°C")
else:
    print("No remaining days to average.")

# Method 2: Using a loop with slicing (more flexible for non-exact weeks)
print("\n--- Weekly Averages (Loop with Slicing for any month length) ---")
weekly_averages_flexible = []
for i in range(0, total_days_in_month, 7):
    week_data = daily_temperatures_31_days[i : i+7]
    weekly_averages_flexible.append(np.mean(week_data))

for i, avg in enumerate(np.round(weekly_averages_flexible, 2)):
    print(f"Week {i+1} Average Temperature: {avg:.2f}°C")

--- Daily Temperatures for the Month (28 days) ---
[22.90432719 20.61589972 26.12367959 34.63671863 20.28844261 27.15215826
 16.52472675 24.71137929 31.26430077 22.47197575 33.70887657 17.64448671
 17.40881875 20.24691557 33.78226153 18.93552502 15.53170926 27.72833626
 33.73535895 24.99494329 21.00181721 25.88952112 24.75209362 33.68176479
 15.02575965 26.19548839 25.86467994 22.56279669]
Total days: 28

--- Daily Temperatures Reshaped by Week (4x7 matrix) ---
[[22.90432719 20.61589972 26.12367959 34.63671863 20.28844261 27.15215826
  16.52472675]
 [24.71137929 31.26430077 22.47197575 33.70887657 17.64448671 17.40881875
  20.24691557]
 [33.78226153 18.93552502 15.53170926 27.72833626 33.73535895 24.99494329
  21.00181721]
 [25.88952112 24.75209362 33.68176479 15.02575965 26.19548839 25.86467994
  22.56279669]]

--- Weekly Averages ---
Week 1 Average Temperature: 24.04°C
Week 2 Average Temperature: 23.92°C
Week 3 Average Temperature: 25.10°C
Week 4 Average Temperature: 24.85°C

--- Han