# What is NumPy?

**NumPy** (short for *Numerical Python*) is the fundamental package for scientific computing in Python. It is an open-source library that provides:

*   A powerful **N-dimensional array object** (`ndarray`).
*   Sophisticated **broadcasting** functions.
*   Tools for integrating C/C++ and Fortran code.
*   Useful linear algebra, Fourier transform, and random number capabilities.

It serves as the building block for many other data science libraries like **Pandas**, **SciPy**, **Matplotlib**, and **scikit-learn**.
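To make the list above concrete, here is a minimal sketch that touches each capability using only standard NumPy calls. It uses `np.array` for an N-dimensional array, plain arithmetic for broadcasting, `np.linalg.solve`, `np.fft.fft` and the `np.random.default_rng` generator. The arrays, the linear system and the seed are arbitrary illustrative values.

```python
import numpy as np

# N-dimensional array: a 2x3 array of 64-bit floats
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(a.shape, a.ndim, a.dtype)  # (2, 3) 2 float64

# Broadcasting: the 1-D `row` is stretched across both rows of `a`
row = np.array([10.0, 20.0, 30.0])
print(a + row)

# Linear algebra: solve the system m @ x = b (solution is approximately [2, 3])
m = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(m, b)
print(x)

# Fourier transform of a short signal
spectrum = np.fft.fft([1.0, 0.0, -1.0, 0.0])
print(spectrum)

# Random numbers via the Generator API, seeded for reproducibility
rng = np.random.default_rng(seed=42)
draws = rng.integers(0, 10, size=5)
print(draws)
```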

## Why use Jupyter Notebooks?

You might wonder why we are using **Jupyter Notebooks** (`.ipynb`) instead of standard Python scripts (`.py`).

1.  **Interactive Coding**: You can run code in small chunks (cells) and see the output immediately. This is perfect for data science where you want to inspect data at every step.
2.  **Rich Text Support**: You can write formatted text (Markdown), add images, and write mathematical equations (LaTeX) alongside your code.
3.  **Visualizations**: Charts and graphs appear directly below the code that generated them, making it easier to analyze results.
4.  **Documentation**: It acts as a "lab notebook" where you document your thought process, code, and results in one document.

**Note on GitHub**:
*   GitHub renders notebooks as static web pages. You cannot edit them directly in the standard GitHub view.
*   If you don't see your outputs on GitHub, make sure you **run the cells** and **save** the notebook before pushing your changes.

```python
import numpy as np
import time

# Check the version of NumPy
print(f"NumPy version: {np.__version__}")
```

```
NumPy version: 2.4.0
```


## Performance Comparison: NumPy vs. Python Lists

One of the main reasons to use NumPy is its **speed**. For element-wise numerical operations on large datasets, NumPy array operations are typically an order of magnitude or more faster than equivalent loops over Python lists.

Let's verify this claim by squaring a large dataset (just under 100 million numbers) using both methods.

### 1. Using Standard Python Lists

First, we create a list of the integers from 1 to 99,999,999 (`range` excludes its upper bound) and square each element with a list comprehension.

```python
# range() excludes its upper bound, so this holds 1 .. 99,999,999
list_a = list(range(1, 100_000_000))
```

```python
list_a[:10]
```

```
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

```python
st_time = time.time()
sqr_list = [x ** 2 for x in list_a]
end_time = time.time()
print(f"Time taken using list comprehension: {end_time - st_time} seconds")
```

```
Time taken using list comprehension: 2.870865821838379 seconds
```

```python
sqr_list[:10]
```

```
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```

### 2. Using NumPy Arrays

Now, we will perform the same operation using NumPy. Notice how concise the syntax is (no explicit loops).

```python
arr_numpy = np.arange(1, 100_000_000)
```

```python
arr_numpy[:10]
```

```
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])
```
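Because `arr_numpy` has a single fixed-width integer dtype, squaring its elements can in principle overflow, which Python's arbitrary-precision `int` never does. The default integer dtype of `np.arange` is platform-dependent (64-bit on most current platforms). The largest value here, 99,999,999, squares to about 10^16, which fits comfortably in `int64`.

The sketch below forces a 32-bit dtype purely to illustrate the silent wraparound. The specific values are chosen for illustration.

```python
import numpy as np

# 99,999,999 is the largest value in arr_numpy; its square fits in int64
largest = np.array([99_999_999], dtype=np.int64)
print(largest ** 2)  # [9999999800000001]

# With int32 the same operation wraps around silently:
# 46341**2 exceeds the int32 maximum of 2,147,483,647
small = np.array([46_340, 46_341], dtype=np.int32)
wrapped = small ** 2
print(wrapped)  # the second value comes out negative

# Python ints never overflow
print([int(v) ** 2 for v in small])  # [2147395600, 2147488281]
```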

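```python
st_time = time.time()
sqr_arr = arr_numpy ** 2  # vectorized: no Python-level loop
end_time = time.time()
print(f"Time taken using NumPy array operations: {end_time - st_time} seconds")
```

```
Time taken using NumPy array operations: 0.1054227352142334 seconds
```

In this run, the NumPy version was about 27 times faster than the list comprehension (0.11 s versus 2.87 s).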
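A single `time.time()` measurement is sensitive to system noise and clock resolution. A somewhat more careful sketch uses the standard-library `timeit` module, which runs each statement several times with a high-resolution clock (`time.perf_counter` by default). Taking the minimum of the repeats is a common convention.

The size `n = 1_000_000` is an arbitrary, smaller value so the example runs quickly. Absolute timings will differ by machine.

```python
import timeit

import numpy as np

n = 1_000_000  # smaller than the notebook's dataset so this runs quickly
runs = 5       # executions per timing repeat

list_data = list(range(1, n))
arr_data = np.arange(1, n)

# timeit.repeat runs the callable `runs` times per repeat and returns
# the total elapsed seconds for each of the 3 repeats
list_times = timeit.repeat(lambda: [x ** 2 for x in list_data], number=runs, repeat=3)
numpy_times = timeit.repeat(lambda: arr_data ** 2, number=runs, repeat=3)

# The minimum is the least noisy estimate; divide by `runs` for per-run time
list_best = min(list_times) / runs
numpy_best = min(numpy_times) / runs
print(f"list comprehension: {list_best:.4f} s per run")
print(f"NumPy vectorized:   {numpy_best:.4f} s per run")
print(f"speedup: about {list_best / numpy_best:.0f}x")

# Sanity check: both approaches produce the same values
assert np.array_equal(np.array([x ** 2 for x in list_data]), arr_data ** 2)
```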


### Why is NumPy Faster?

1.  **Vectorization**: NumPy runs the loop in precompiled C code, avoiding the per-element overhead of the Python interpreter.
2.  **Contiguous Memory**: A NumPy array stores its values in a single contiguous block of memory. A Python list is an array of pointers to separate objects that can be scattered across memory. The contiguous layout makes much better use of the CPU cache.
3.  **Fixed Type**: Every element of a NumPy array shares one fixed data type (e.g., `int64`), so operations need no per-element type checks. Python lists can contain mixed types, so the interpreter must check each element's type during an operation.
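The memory-layout and fixed-type points above can be observed directly. This is a minimal sketch assuming 64-bit CPython, where list slots are 8-byte pointers, and an explicit `int64` dtype.

`sys.getsizeof` reports shallow sizes, so the list total below also sums the element objects. CPython caches small integers, so this is only a rough estimate. The 1,000-element size is arbitrary.

```python
import sys

import numpy as np

arr = np.arange(1, 1_001, dtype=np.int64)
lst = list(range(1, 1_001))

# Contiguous memory: one buffer holding 1,000 x 8-byte integers
print(arr.flags["C_CONTIGUOUS"])  # True
print(arr.itemsize, arr.nbytes)   # 8 8000

# A list holds pointers (8 bytes each on 64-bit CPython); each element is a
# separate int object with its own header, so the total is much larger
pointer_bytes = sys.getsizeof(lst)
object_bytes = sum(sys.getsizeof(v) for v in lst)
print(pointer_bytes + object_bytes)  # several times arr.nbytes

# Fixed type: one dtype for every element
print(arr.dtype)                          # int64
print(np.array([1, 2.5, 3]).dtype)        # float64 (ints upcast to float)
mixed = [1, 2.5, "three"]                 # a list happily mixes types
print([type(v).__name__ for v in mixed])  # ['int', 'float', 'str']
```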