
# **🧠 Summary Notes – Introduction to NumPy (with Real-World Analogies)**


---


**✅ Core Idea**

NumPy is the foundation of numerical computing in Python. It's essential for working with data—whether it's images, sound, text, or measurements.

**📦 Why NumPy?**

Real-world analogy: Think of data science as cooking, and data as ingredients.

But before you cook, you need to prepare (clean, slice, measure) the ingredients—this is what NumPy helps with.

Regardless of whether you're working with:


📸 Images → 2D arrays (rows and columns of pixel values)


🔊 Sound clips → 1D arrays (intensity over time)


📃 Text → Transformed into numbers (e.g., word frequency counts)


The common format: all become arrays of numbers.


**🧰 What is NumPy?**

NumPy = Numerical Python

It's a library that provides:

Efficient storage of numbers

Fast operations on arrays

It’s the engine that powers many data tools (like Pandas, Scikit-learn, etc.)

**🆚 NumPy Arrays vs Python Lists**

| Feature    | Python List  | NumPy Array          |
| ---------- | ------------ | -------------------- |
| Speed      | Slower       | Much faster          |
| Memory     | Inefficient  | Efficient            |
| Operations | Manual loops | Vectorized, bulk ops |


Memory Tip: Imagine carrying books 🧠:

Lists = carrying books one by one 📚

NumPy = using a trolley to carry them all at once 🚛
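
To make the "Manual loops" vs "Vectorized" row of the table concrete, here is a minimal sketch that doubles every value both ways. The variable names (`values`, `doubled_list`, `doubled_arr`) are illustrative and not from the original notes.

```python
import numpy as np

values = list(range(5))

# Python list: each element is handled one at a time in an explicit loop
doubled_list = [v * 2 for v in values]

# NumPy array: a single expression is applied to the whole array at once
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)  # [0, 2, 4, 6, 8]
print(doubled_arr)   # [0 2 4 6 8]
```

Note that `values * 2` on a plain list would repeat the list rather than double its elements, which is one reason the loop is needed there.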

# **🧠 Summary Notes: Understanding Data Types in Python**


---



**🎯 Why it Matters:**

To write efficient, scalable data science code, you must understand:

How Python stores data (high-level, flexible but slower)

How NumPy improves performance (low-level, fixed-type arrays)

# **1. Dynamic vs Static Typing**

Python is dynamically typed -> you can reassign different types to the same variable

In [30]:
x = 4       # x refers to an int
x = 'four'  # now x refers to a str
type(x)

str

C/Java are statically typed -> you must declare the type of the variable

In [31]:
# int x = 4;
# x = 'four'; // error

**Real world analogy:**

Think of a Python variable as a multipurpose box that can hold anything (books, shoes, gadgets). In C, each box can hold only one type of item, such as books.

# **2. Python Integer Is a Complex Object**

In CPython (the standard Python implementation), an int is not just a raw number. It is an object that carries metadata:

Reference count (ob_refcnt)

Type info (ob_type)

Size info (ob_size)

Actual value (ob_digit)

**📦 Real-world analogy:**

A gift box with a label (type), how many people are using it (ref count), and the actual gift inside (value)
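
The overhead of that metadata can be observed roughly as follows. This is a sketch that assumes CPython; the exact number returned by `sys.getsizeof` depends on the Python version and platform, so no specific value is claimed for it here.

```python
import sys
import numpy as np

x = 4

# Size in bytes of the whole Python int object (value plus metadata);
# the exact figure is implementation-dependent
obj_size = sys.getsizeof(x)

# Size in bytes of a single element in a 64-bit integer NumPy array
item_size = np.array([4], dtype=np.int64).itemsize

print(obj_size, item_size)
```

On CPython the first number is larger than the second, because the array stores only the raw 8-byte value for each element.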

# **3. Python List vs NumPy Array**

🔹 Python List:

Flexible, can mix types

In [32]:
L  = [1, "two", 3.0]
L

# Each element is a full object ➝ More memory & slower processing.

[1, 'two', 3.0]

**🔹 NumPy Array:**

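Fixed-type, faster, and more memory-efficient. If the inputs have mixed types, NumPy converts them to one common type; in the cell below, everything becomes a string.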

In [33]:
import numpy as np

np.array([1, "two", 3.0])

array(['1', 'two', '3.0'], dtype='<U32')

**✅ Analogy:**


Python List = A shelf of boxes of all shapes/sizes (hard to manage).

NumPy Array = A row of same-size containers (easy to process, efficient).
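
The "same-size containers" idea can be checked directly. In this small sketch (variable names are illustrative), each list element keeps its own type, while the array has a single dtype and a fixed number of bytes per element.

```python
import numpy as np

mixed = [1, "two", 3.0]
# Every list element is its own object with its own type
element_types = [type(item).__name__ for item in mixed]
print(element_types)  # ['int', 'str', 'float']

arr = np.array([1, 2, 3])
# One dtype for the whole array (the default integer width is platform-dependent)
print(arr.dtype)
# Total bytes = number of elements * bytes per element
print(arr.nbytes == arr.size * arr.itemsize)  # True
```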

# **4. Creating NumPy Arrays**

In [34]:
np.array([1, 3, 4])

array([1, 3, 4])

**💡 Type Upcasting Example**

In [35]:
np.array([3.14, 4, 2]) # will be all floats

array([3.14, 4.  , 2.  ])
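
To see which type the upcast lands on, you can inspect `dtype`, or ask NumPy for the common type with `np.result_type`. This is a short sketch; the variable names are illustrative.

```python
import numpy as np

mixed = np.array([3.14, 4, 2])
print(mixed.dtype)  # float64

# np.result_type reports the common type NumPy picks when combining dtypes
common = np.result_type(np.int64, np.float64)
print(common)  # float64
```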

# **🎯 Setting Data Type**

In [36]:
np.array([1, 2, 3], dtype='float32')

array([1., 2., 3.], dtype=float32)
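
An existing array can also be converted to another type with `astype`. The sketch below (illustrative names) shows one thing to watch for: converting floats to integers drops the fractional part instead of rounding.

```python
import numpy as np

floats = np.array([1.7, 2.2, 3.9])

# astype returns a new array; float -> int truncates toward zero
as_ints = floats.astype(np.int32)

print(as_ints)        # [1 2 3]
print(as_ints.dtype)  # int32
```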

# **🧱 Multidimensional Arrays**

In [37]:
np.array([[1,2], [3,4]])

array([[1, 2],
       [3, 4]])

In [38]:
# Nested lists result in multidimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])
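
A quick way to confirm the structure of a multidimensional array is to look at its `ndim`, `shape`, and `size` attributes. This sketch reuses the nested-range example; the name `grid` is illustrative.

```python
import numpy as np

grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(grid.ndim)   # 2  (number of dimensions)
print(grid.shape)  # (3, 3)  (rows, columns)
print(grid.size)   # 9  (total number of elements)
```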

# **5. Creating Arrays from Scratch (Faster)**

| Purpose                    | Code                               |
| -------------------------- | ---------------------------------- |
| All Zeros (int)            | `np.zeros(10, dtype=int)`          |
| All Ones (float)           | `np.ones((3, 5), dtype=float)`     |
| Constant value             | `np.full((3, 5), 3.14)`            |
| Range with steps           | `np.arange(0, 20, 2)`              |
| Evenly spaced              | `np.linspace(0, 1, 5)`             |
| Random float (0–1)         | `np.random.random((3, 3))`         |
| Random normal distribution | `np.random.normal(0, 1, (3, 3))`   |
| Random integers (0–9)      | `np.random.randint(0, 10, (3, 3))` |
| Identity matrix            | `np.eye(3)`                        |
| Uninitialized array        | `np.empty(3)`                      |


In [39]:
# Create a length-10 integer array filled with 0s

np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [40]:
# Create a 3x5 floating-point array filled with 1s

np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [41]:
# Create a 3x5 array filled with 3.14

np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [42]:
# Create an array filled with a linear sequence
# starting at 0, stopping before 20, stepping by 2
# (this is similar to the built-in range function)

np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [43]:
# Create an array of five values evenly spaced between 0 and 1

np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
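
`np.arange` and `np.linspace` differ in how they treat the end value, which is easy to mix up. A small side-by-side sketch (illustrative names):

```python
import numpy as np

# arange takes a step size and excludes the stop value
stepped = np.arange(0, 1, 0.25)

# linspace takes a number of points and includes the stop value by default
spaced = np.linspace(0, 1, 5)

print(stepped)  # [0.   0.25 0.5  0.75]
print(spaced)   # [0.   0.25 0.5  0.75 1.  ]
```

As a rule of thumb, use `arange` when you know the step and `linspace` when you know how many points you need.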

In [44]:
# Create a 3x3 array of uniformly distributed
# pseudorandom values between 0 and 1

np.random.random((3, 3))

array([[0.42167387, 0.60651111, 0.11765056],
       [0.53340084, 0.94662295, 0.13820143],
       [0.30152812, 0.97228414, 0.0385225 ]])

In [45]:
# Create a 3x3 array of normally distributed pseudorandom
# values with mean 0 and standard deviation 1

np.random.normal(0, 1, (3, 3))

array([[-1.56056461,  1.79520689, -0.08655393],
       [-0.05194797,  0.12470909,  0.59294043],
       [ 1.08498   ,  0.29776221, -1.05048891]])

In [46]:
# Create a 3x3 array of pseudorandom integers in the interval [0, 10)

np.random.randint(0, 10, (3, 3))

array([[6, 8, 4],
       [4, 0, 1],
       [2, 3, 4]])
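
The random outputs above change on every run. If you need repeatable results, one option is NumPy's `Generator` interface created with `np.random.default_rng` and a seed. This is a sketch of that alternative, not the approach used in the cells above; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# integers(low, high) draws from [low, high), like randint above
sample_a = rng_a.integers(0, 10, size=(3, 3))
sample_b = rng_b.integers(0, 10, size=(3, 3))

print(np.array_equal(sample_a, sample_b))  # True
```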

In [47]:
# Create a 3x3 identity matrix

np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [48]:
# Create an uninitialized array of three floats (the default dtype); the values
# will be whatever happens to already exist at that memory location

np.empty(3)

array([1., 1., 1.])

✅ Real-world analogy: Like choosing pre-made furniture vs custom-building everything.

# **6. NumPy Data Types (dtypes)**

| NumPy Type                            | Description                                       |
| ------------------------------------- | ------------------------------------------------- |
| `bool_`                               | Boolean (True/False)                              |
| `int8`, `int16`, `int32`, `int64`     | Signed integers (8 to 64 bits)                    |
| `uint8`, `uint16`, `uint32`, `uint64` | Unsigned integers (no negative values)            |
| `float16`, `float32`, `float64`       | Floating point (half/single/double precision)     |
| `complex64`, `complex128`             | Complex numbers (two 32-bit or two 64-bit floats) |


In [49]:
# The dtype can be given as a NumPy object or as a string; both lines are equivalent
np.zeros(10, dtype=np.int16)
np.zeros(10, dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

✅ Tip: Use smaller dtypes (like float32, int16) when working with large datasets to save memory, provided the reduced precision or value range is acceptable for your data.
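
The saving can be measured with the `nbytes` attribute, and the cost of a smaller integer type can be checked with `np.iinfo`. This is a minimal sketch; the one-million-element size is an arbitrary choice for illustration.

```python
import numpy as np

n = 1_000_000
big64 = np.zeros(n, dtype=np.float64)
big32 = np.zeros(n, dtype=np.float32)

print(big64.nbytes)  # 8000000  (8 bytes per element)
print(big32.nbytes)  # 4000000  (4 bytes per element)

# The trade-off: smaller integer types cover a smaller range of values
int16_max = np.iinfo(np.int16).max
print(int16_max)  # 32767
```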

**🧠 Memory Hook Example:**

Python List = Garage with different boxes (flexible, messy)

NumPy Array = Warehouse with identical containers (efficient, fast)