<a href="https://colab.research.google.com/github/Dasaru-t/My-Machine-Learning-Course/blob/main/Section%201-%20Python%20Crash%20Course/2_NumPy_The_Engine_of_Data_Science.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>NumPy: The Engine of Data Science</h1>
<p><b>NumPy</b> (Numerical Python) is the library that makes Python fast enough for Data Science. It provides the <code>ndarray</code> object, which for numerical work is often cited as up to 50x faster than traditional Python lists; the actual speedup depends on the operation and the array size.</p>

<p><b>Why is it faster?</b></p>
<ul>
  <li><b>Locality of Reference:</b> NumPy arrays store their elements in one contiguous block of memory, whereas a Python list stores pointers to objects that may be scattered across memory.</li>
  <li><b>SIMD (Single Instruction, Multiple Data):</b> Modern CPUs can apply a single instruction to several values at once, and NumPy's compiled loops can take advantage of this on contiguous data.</li>
</ul>
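
<p>The two points above can be checked with a rough sketch like the one below. The array size, the sum-of-squares workload, and the function names are illustrative assumptions, and the timings will vary by machine, so treat the printed numbers as indicative only.</p>

```python
import timeit

import numpy as np

n = 200_000  # illustrative size, chosen arbitrarily
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)


def list_sum_sq():
    # Pure-Python loop: every element is a separate int object
    return sum(x * x for x in py_list)


def numpy_sum_sq():
    # Vectorized: the loop runs in compiled code over one memory block
    return int((np_arr * np_arr).sum())


# Both approaches compute the same value
print(list_sum_sq() == numpy_sum_sq())

# Rough timing comparison (results depend on hardware and NumPy build)
print(f"list : {timeit.timeit(list_sum_sq, number=5):.4f} s")
print(f"numpy: {timeit.timeit(numpy_sum_sq, number=5):.4f} s")

# Memory layout: one contiguous block, 8 bytes per int64 element
print(np_arr.flags["C_CONTIGUOUS"], np_arr.itemsize, np_arr.strides)
```

<p>The <code>strides</code> value shows how many bytes NumPy steps to reach the next element, which is what makes the contiguous layout visible from Python.</p>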

<p>In this tutorial, we will move beyond basic syntax and learn how to manipulate data like a Data Engineer.</p>

In [2]:
import numpy as np

# 1. Creating Arrays: List vs NumPy
# A standard Python list (Good for general purpose, bad for math)
py_list = [1, 2, 3, 4, 5]

# A NumPy Array (Optimized for calculation)
np_arr = np.array(py_list)

print(f"Type: {type(np_arr)}")
print(f"Array: {np_arr}")

# 2. Multi-Dimensional Arrays (Matrices)
# Think of this as a dataset with Rows and Columns
data_matrix = np.array([
    [1, 2, 3],  # Row 0
    [4, 5, 6],  # Row 1
    [7, 8, 9]   # Row 2
])

print("\n--- Matrix Shape ---")
print(f"Shape: {data_matrix.shape}") # Output: (3, 3) -> (Rows, Columns)
print(f"Dimensions: {data_matrix.ndim}") # Output: 2

Type: <class 'numpy.ndarray'>
Array: [1 2 3 4 5]

--- Matrix Shape ---
Shape: (3, 3)
Dimensions: 2
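
<p>To see why a list is "bad for math", here is a small sketch that reuses the same values as the cell above. The variable names <code>doubled_list</code>, <code>doubled_arr</code>, <code>first_row</code>, <code>second_col</code>, and <code>col_means</code> are introduced here for illustration only.</p>

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]
np_arr = np.array(py_list)

# On a list, * means repetition; on an array, it means element-wise math
doubled_list = py_list * 2   # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
doubled_arr = np_arr * 2     # [ 2  4  6  8 10]
print(doubled_list)
print(doubled_arr)

data_matrix = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

first_row = data_matrix[0]             # Row 0 -> [1 2 3]
second_col = data_matrix[:, 1]         # Column 1 -> [2 5 8]
col_means = data_matrix.mean(axis=0)   # Mean of each column -> [4. 5. 6.]
print(first_row, second_col, col_means)
```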


<h2>1. Modern Random Number Generation</h2>
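<p>In older tutorials, you will see <code>np.random.rand()</code>, which belongs to the legacy API built on hidden global state. Since NumPy 1.17, the recommended approach is to create a Generator with <code>default_rng()</code>. It uses the PCG64 bit generator by default, which has better statistical properties than the legacy Mersenne Twister and is often faster.</p>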
<p>We use this to simulate datasets, initialize neural network weights, or split data into Train/Test sets.</p>
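
<p>As a minimal side-by-side sketch of the two styles (the names <code>legacy</code>, <code>rng_a</code>, <code>rng_b</code>, <code>modern_a</code>, and <code>modern_b</code> are illustrative): the legacy call relies on a global seed and takes dimensions as separate arguments, while the Generator is an explicit object and takes the shape as a tuple.</p>

```python
import numpy as np

# Legacy API: seeds a hidden global state; dimensions are separate arguments
np.random.seed(42)
legacy = np.random.rand(2, 3)

# Generator API: each generator carries its own state; shape is a tuple
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
modern_a = rng_a.random((2, 3))
modern_b = rng_b.random((2, 3))

# The same seed gives the same stream, which makes experiments reproducible
print(np.array_equal(modern_a, modern_b))

# The bit generator behind default_rng()
print(type(rng_a.bit_generator).__name__)
```

<p>Because each Generator owns its state, passing <code>rng</code> into functions explicitly avoids one part of the code silently changing the random stream of another.</p>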

In [3]:
# Initialize the modern random number generator
rng = np.random.default_rng(seed=42)

# Generate a mock dataset: 5 Rows (Users), 3 Columns (Features: Age, Income, Score)
# random(shape) gives floats in the half-open interval [0.0, 1.0)
mock_data = rng.random((5, 3))

print("--- Mock Normalized Data (0-1) ---")
print(mock_data)

# Generate Integers (e.g., Random User IDs from 1000 to 9998; `high` is exclusive by default)
user_ids = rng.integers(low=1000, high=9999, size=5)
print(f"\nUser IDs: {user_ids}")

--- Mock Normalized Data (0-1) ---
[[0.77395605 0.43887844 0.85859792]
 [0.69736803 0.09417735 0.97562235]
 [0.7611397  0.78606431 0.12811363]
 [0.45038594 0.37079802 0.92676499]
 [0.64386512 0.82276161 0.4434142 ]]

User IDs: [5053 3044 1829 5990 8990]
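
<p>The use cases mentioned earlier (Train/Test splits and weight initialization) could look roughly like the sketch below. The 80/20 ratio, the dataset size, the weight shape, and the <code>scale=0.01</code> value are arbitrary assumptions for illustration, not recommendations.</p>

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical dataset: 10 rows, 3 features
features = rng.random((10, 3))

# Train/Test split: shuffle the row indices, then cut at 80%
indices = rng.permutation(len(features))
split = int(0.8 * len(features))
train = features[indices[:split]]
test = features[indices[split:]]
print(train.shape, test.shape)

# Weight initialization: small values drawn from a normal distribution
weights = rng.normal(loc=0.0, scale=0.01, size=(3, 4))
print(weights.shape)

# integers() excludes `high` by default; endpoint=True makes it inclusive
ids = rng.integers(low=1000, high=9999, size=5, endpoint=True)
print(ids)
```

<p>Shuffling indices rather than the data itself keeps the original array untouched, so the same permutation can also be applied to a matching label array.</p>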
