# Introduction To NumPy
NumPy (short for Numerical Python) is the fundamental package for scientific computing in Python. It provides a multidimensional array object (an array with one or more dimensions), along with vectorized operations for mathematics, logic, shape manipulation, sorting, and more.

At the core of the NumPy package is the `ndarray` object. It encapsulates n-dimensional arrays of a homogeneous data type, with many operations performed in compiled code for performance. There are several differences between NumPy arrays and Python's built-in `list` type:
- NumPy arrays have a fixed size at creation, unlike Python lists, which can grow dynamically.
- All of the elements in a NumPy array are required to be of the same data type, and thus take up the same amount of memory each.
- NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible with Python's built-in `list`.
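
The three differences listed above can be illustrated with a short sketch. The variable names (`my_list`, `my_array`, `bigger`) are chosen here only for illustration and are not part of the notebook cells that follow.

```python
import numpy as np

# A Python list can hold mixed types and grow in place.
my_list = [1, 2.5, "three"]
my_list.append(4)

# A NumPy array stores a single dtype; mixed numeric input is upcast to a common type.
my_array = np.array([1, 2, 3.5])
print(my_array.dtype)               # float64

# Arrays have a fixed size: np.append returns a new array and leaves the original unchanged.
bigger = np.append(my_array, 4.0)
print(my_array.size, bigger.size)   # 3 4

# Element-wise arithmetic needs no explicit loop.
print(my_array * 2)                 # [2. 4. 7.]
```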

The following cells compare NumPy arrays with Python's built-in `list` type in code:

In [1]:
# Import the numpy package and set the alias name to np
import numpy as np

# Import getsizeof from the sys module to measure object sizes
from sys import getsizeof

In [2]:
# Create two list objects
my_list1 = list(range(1, 100))
my_list2 = list(range(1, 100))

In [3]:
# Create two NumPy array objects
my_array1 = np.arange(1, 100)
my_array2 = np.arange(1, 100)

In [4]:
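# Check the memory consumption of both data structures.
# Note: getsizeof counts only the list container (not its elements); nbytes counts only the array's data buffer.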
print(f"my_list1 consumed memory for about = {getsizeof(my_list1)} Bytes")
print(f"my_array1 consumed memory for about = {my_array1.nbytes} Bytes")

my_list1 consumed memory for about = 856 Bytes
my_array1 consumed memory for about = 396 Bytes
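
The two numbers above are not measured in quite the same way, so the following sketch breaks them down. It assumes CPython, where `sys.getsizeof` on a list reports the list object and its pointer slots but not the integer objects it refers to; the names `values`, `array`, `container_bytes`, and `element_bytes` are illustrative. The 396 bytes reported above correspond to 99 elements of 4 bytes each; on a platform where the default integer type is 8 bytes, the same array would report 792 bytes.

```python
import sys
import numpy as np

values = list(range(1, 100))
array = np.arange(1, 100)

# getsizeof on a list counts the list object and its pointer slots only.
container_bytes = sys.getsizeof(values)

# Adding the size of each int object gives a rough total
# (CPython may share small int objects, so this is an estimate).
element_bytes = sum(sys.getsizeof(v) for v in values)

# nbytes is the size of the data buffer: number of elements times bytes per element.
assert array.nbytes == array.size * array.itemsize

print(f"list container only: {container_bytes} bytes")
print(f"list plus elements (estimate): {container_bytes + element_bytes} bytes")
print(f"array data buffer: {array.nbytes} bytes ({array.size} x {array.itemsize}, dtype {array.dtype})")
```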


In [5]:
# Benchmark element-wise multiplication of two NumPy arrays
%timeit array_result = my_array1 * my_array2

685 ns ± 9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


## NumPy Array Benchmark
- **%timeit**: The %timeit magic command in Jupyter notebooks is used to measure the execution time of a single line of code. It automatically runs the code multiple times to provide a more reliable measurement by averaging over multiple iterations.

- **array_result = my_array1 * my_array2**: This is the code being timed. Here, my_array1 and my_array2 are the NumPy arrays created earlier, and the `*` operator performs element-wise multiplication between them. NumPy carries out such operations in compiled code, which explains the faster performance compared to lists.

- **685 ns ± 9 ns per loop**:

    - 685 ns: This is the average time taken per loop (or per execution of the line of code) over multiple iterations. The time unit here, ns, stands for nanoseconds (1 billionth of a second).

    - ± 9 ns: This is the standard deviation, indicating how much variation there was between individual runs. A smaller standard deviation means more consistent results.

- **7 runs, 1,000,000 loops each**:

    - 7 runs: The %timeit command ran the code seven separate times, calculating an average execution time across these runs.

    - 1,000,000 loops each: During each of the 7 runs, the code was executed 1,000,000 times. Repeating a very short operation this many times makes the per-loop time measurable and smooths out timer noise.

Reference: ChatGPT.
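
Outside Jupyter, a similar measurement can be sketched with the standard-library `timeit` module. This is only an approximation of what `%timeit` reports: the loop count is fixed by hand here (`%timeit` chooses it automatically), the lambda adds a small call overhead, and the use of the population standard deviation is an assumption about how the spread is summarized.

```python
import statistics
import timeit

import numpy as np

my_array1 = np.arange(1, 100)
my_array2 = np.arange(1, 100)

loops = 10_000   # executions per run (chosen by hand for this sketch)
runs = 7         # number of independent runs

# timeit.repeat returns one total time in seconds per run.
totals = timeit.repeat(lambda: my_array1 * my_array2, repeat=runs, number=loops)

# Convert each run's total into a per-loop time, then summarize across runs.
per_loop = [t / loops for t in totals]
mean_ns = statistics.mean(per_loop) * 1e9
std_ns = statistics.pstdev(per_loop) * 1e9
print(f"{mean_ns:.0f} ns ± {std_ns:.0f} ns per loop (mean ± std. dev. of {runs} runs, {loops:,} loops each)")
```

The absolute numbers depend on the machine and the NumPy version, so they will not match the 685 ns shown above exactly.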

In [7]:
%%timeit
# Benchmark the same element-wise multiplication with Python lists
result = []
for i in range(len(my_list1)):
    result.append(my_list1[i] * my_list2[i])

8.57 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


## Python List Benchmark
- **%%timeit**: The %%timeit cell magic command is used to measure the execution time of multiple lines of code in a Jupyter cell.

- **result = [] ... result.append(my_list1[i] * my_list2[i])**:
    - This is the code being measured. Here, my_list1 and my_list2 are standard Python lists, and this block creates a new list, result, to store the product of each corresponding element in my_list1 and my_list2.
  
    - The code loops over the indices of the lists and appends the product of each pair of elements to result.
 
    - Since every iteration of this loop is executed by the Python interpreter rather than in compiled code, this approach is notably slower than the equivalent NumPy array operation: 8.57 µs versus 685 ns, or roughly 12 times slower in this measurement.
 
- **8.57 µs ± 146 ns per loop**:

    - 8.57 µs: The average time taken per loop here is 8.57 microseconds. µs is the unit for microseconds (1 millionth of a second).

    - ± 146 ns: The standard deviation here is 146 nanoseconds, reflecting the variability in execution times.

- **7 runs, 100,000 loops each**:

    - 7 runs: This code was run 7 times to gather multiple measurements.

    - 100,000 loops each: During each run, the code was executed 100,000 times, ensuring that the reported average is reliable.
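
As a sanity check on the comparison, the sketch below confirms that the list-based loop and the NumPy expression compute the same values. It also shows a list comprehension with `zip`, which is a common alternative way to write the loop; this variant is not part of the benchmark above, and no timing is claimed for it here.

```python
import numpy as np

my_list1 = list(range(1, 100))
my_list2 = list(range(1, 100))

# Index-based loop, as in the benchmarked cell.
result_loop = []
for i in range(len(my_list1)):
    result_loop.append(my_list1[i] * my_list2[i])

# Equivalent list comprehension that pairs elements with zip.
result_zip = [a * b for a, b in zip(my_list1, my_list2)]

# Vectorized NumPy version.
result_array = np.array(my_list1) * np.array(my_list2)

# All three approaches produce the same values.
assert result_loop == result_zip == result_array.tolist()
print(result_zip[:5])   # [1, 4, 9, 16, 25]
```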

## Summary
NumPy arrays are much faster for two reasons:
- **Vectorization**: Vectorization describes the absence of any explicit looping, indexing, etc. in the code. These things still take place, but behind the scenes in optimized, pre-compiled C code.
- **Broadcasting**: Broadcasting is the term used to describe the implicit element-by-element behaviour of operations.

In summary, these two features are the key reasons why operations on NumPy arrays are faster than the equivalent operations on Python's built-in `list`.
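
Both ideas can be seen in a minimal sketch. The arrays `a` and `b` and the names `column` and `table` are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# Vectorization: one expression replaces the explicit loop over indices.
print(a * b)        # [10 40 90]

# Broadcasting a scalar: 2 is treated as if it were [2, 2, 2].
print(a * 2)        # [2 4 6]

# Broadcasting shapes (3, 1) and (3,) produces a (3, 3) result.
column = a.reshape(3, 1)
table = column * b
print(table.shape)  # (3, 3)
print(table)
```

In the last case, each row of `table` is `b` multiplied by one element of `a`, without either array being copied or looped over in Python code.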