# NumPy
NumPy is a fundamental library for scientific computing in Python. It provides support for arrays and matrices, along with a collection of mathematical functions that operate on these data structures. In this lesson, we will cover the basics of NumPy, focusing on arrays and vectorized operations.

In [2]:
import numpy as np

# creating arrays using numpy
arr1np = np.array([1, 2, 3, 4, 5])
print("Numpy array:", arr1np)
print("Type of arr1np:", type(arr1np))

Numpy array: [1 2 3 4 5]
Type of arr1np: <class 'numpy.ndarray'>


In [3]:
np.ones((3, 3))  # creates a 3x3 array filled with ones

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [4]:
# Identity matrix
np.eye(3)  # creates a 3x3 identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [7]:
# Attributes of Numpy arrays
arr = np.array([[1, 2, 3], [4, 5, 6]])

print("Array:\n", arr)
print("Array shape:", arr.shape)  # prints the shape of the array
print("Array size:", arr.size)    # prints the total number of elements in the array
print("Array data type (may vary on platform):", arr.dtype)  # prints the data type of the array elements
print("Array itemsize (in bytes):", arr.itemsize)  # prints the size in bytes of each element in the array
print("Number of dimensions:", arr.ndim)  # prints the number of dimensions of the array

Array:
 [[1 2 3]
 [4 5 6]]
Array shape: (2, 3)
Array size: 6
Array data type (may vary on platform): int64
Array itemsize (in bytes): 8
Number of dimensions: 2
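
Because the default integer dtype can differ between platforms, it can help to request a dtype explicitly. The following is a minimal sketch; the names `arr_i32` and `arr_f64` are illustrative and not part of the lesson's code.

```python
import numpy as np

# Request explicit dtypes instead of relying on the platform default.
arr_i32 = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)
arr_f64 = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)

print("int32 dtype:", arr_i32.dtype)
print("int32 itemsize (bytes):", arr_i32.itemsize)      # 4 bytes per element
print("float64 itemsize (bytes):", arr_f64.itemsize)    # 8 bytes per element
print("int32 total bytes:", arr_i32.nbytes)             # size * itemsize
```

With an explicit dtype, `itemsize` and `nbytes` are the same on every platform.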


In [8]:
# Numpy Vectorized Operations
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([10, 20, 30, 40, 50])
arr3 = arr1 + arr2  # element-wise addition
print("Element-wise addition:", arr3)
arr4 = arr1 - arr2  # element-wise subtraction
print("Element-wise subtraction:", arr4)
arr5 = arr1 * arr2  # element-wise multiplication
print("Element-wise multiplication:", arr5)
arr6 = arr1 / arr2  # element-wise division
print("Element-wise division:", arr6)

Element-wise addition: [11 22 33 44 55]
Element-wise subtraction: [ -9 -18 -27 -36 -45]
Element-wise multiplication: [ 10  40  90 160 250]
Element-wise division: [0.1 0.1 0.1 0.1 0.1]
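
To make "vectorized" concrete, the sketch below compares an explicit Python loop with the equivalent NumPy expression. The names `a`, `b`, `loop_sum`, and `vec_sum` are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

# Explicit Python loop: one addition per iteration.
loop_sum = np.array([x + y for x, y in zip(a, b)])

# Vectorized form: the same additions, performed inside NumPy.
vec_sum = a + b

print("Loop result:      ", loop_sum)
print("Vectorized result:", vec_sum)
print("Same values:", np.array_equal(loop_sum, vec_sum))
```

Both forms give the same values. The vectorized form is shorter and is usually much faster on large arrays, since the loop runs in compiled code rather than in the Python interpreter.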


In [12]:
# Universal Functions (ufuncs)
# What are ufuncs?
# Universal functions (ufuncs) are functions that operate element-wise on arrays.
# They are optimized for performance and can handle broadcasting.

arr = np.array([1,2,3,4,5,6])
print("Original array:", arr)
# square root
arr_sqrt = np.sqrt(arr)
print("Square root of arr:", arr_sqrt)
# exponential
arr_exp = np.exp(arr)
print("Exponential of arr (e^x):", arr_exp)
# logarithm
arr_log = np.log(arr)
print("Logarithm of arr:", arr_log)
# trigonometric functions
arr_sin = np.sin(arr)
print("Sine of arr:", arr_sin)
# rounding functions
arr_round = np.round(arr, 2)  # rounds to 2 decimal places; arr holds integers, so the values are unchanged
print("Rounded arr:", arr_round)

Original array: [1 2 3 4 5 6]
Square root of arr: [1.         1.41421356 1.73205081 2.         2.23606798 2.44948974]
Exponential of arr (e^x): [  2.71828183   7.3890561   20.08553692  54.59815003 148.4131591
 403.42879349]
Logarithm of arr: [0.         0.69314718 1.09861229 1.38629436 1.60943791 1.79175947]
Sine of arr: [ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427 -0.2794155 ]
Rounded arr: [1 2 3 4 5 6]
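
The comments above mention that ufuncs can handle broadcasting. As a small sketch of what that means, the example below combines arrays of different shapes. The names `matrix`, `row`, `scaled`, and `shifted` are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)

# A scalar is broadcast to every element of the array.
scaled = np.multiply(matrix, 2)   # same as matrix * 2

# The 1-D row is broadcast across both rows of the matrix.
shifted = np.add(matrix, row)     # same as matrix + row

print("Scaled:\n", scaled)
print("Shifted:\n", shifted)
```

Here `np.add` and `np.multiply` are the ufuncs behind the `+` and `*` operators.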


In [22]:
# Array Slicing and Indexing
arr = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print("Original array:\n", arr)

# accessing a specific element
print("Element at (0, 0):", arr[0, 0])  # element in the first row, first column (indices start at 0); arr[0][0] also works

# slicing a row
print("First row:", arr[0, :])  # accessing the first row

# slicing a column
print("First column:", arr[:, 0])  # accessing the first column
# slicing a sub-array
print("Sub-array (first 2 rows, first 2 columns):\n", arr[:2, :2])  # accessing a sub-array
print("Sub-array (last 2 rows, last 2 columns):\n", arr[1:, 2:])  # accessing a sub-array 

print("Sub-array (center bottom 2x2):\n", arr[1:, 1:3])

Original array:
 [[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Element at (0, 0): 1
First row: [1 2 3 4]
First column: [1 5 9]
Sub-array (first 2 rows, first 2 columns):
 [[1 2]
 [5 6]]
Sub-array (last 2 rows, last 2 columns):
 [[ 7  8]
 [11 12]]
Sub-array (center bottom 2x2):
 [[ 6  7]
 [10 11]]


In [23]:
# Modifying multiple elements
arr[1:, 1:3] = 0  # setting the center bottom 2x2 sub-array to zero
print("Modified array:\n", arr)

Modified array:
 [[ 1  2  3  4]
 [ 5  0  0  8]
 [ 9  0  0 12]]


In [25]:
# Modifying the last 2 rows in our array
arr[1:] = 777
print("Array after modifying last 2 rows:\n", arr)

Array after modifying last 2 rows:
 [[  1   2   3   4]
 [777 777 777 777]
 [777 777 777 777]]
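
One detail worth knowing when assigning through slices: a basic slice is a view that shares memory with the original array, so writing to the slice also changes the original. The sketch below illustrates this with a separate array so the lesson's `arr` is left alone; `grid`, `sub_view`, and `sub_copy` are illustrative names.

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

sub_view = grid[1:, 1:3]          # basic slice: a view onto grid's data
sub_copy = grid[1:, 1:3].copy()   # independent copy of the same region

sub_view[:] = 0    # writes through to grid
sub_copy[:] = -1   # leaves grid untouched

print("grid after edits:\n", grid)
print("view shares memory:", np.shares_memory(grid, sub_view))
print("copy shares memory:", np.shares_memory(grid, sub_copy))
```

Calling `.copy()` is the usual way to get a sub-array that can be modified without affecting the source.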


# Practical Examples

In [26]:
# Statistical Concepts - Normalization
# Rescale the data so it has a mean of 0 and a standard deviation of 1 (z-score normalization)
data = np.array([1, 2, 3, 4, 5])
mean = np.mean(data)
std_dev = np.std(data)
normalized_data = (data - mean) / std_dev
print("Original data:", data)
print("Normalized data:", normalized_data)

Original data: [1 2 3 4 5]
Normalized data: [-1.41421356 -0.70710678  0.          0.70710678  1.41421356]


In [27]:
# Real-life example: Why normalize data?
# Suppose you are building a machine learning model to predict house prices.
# Your dataset includes features like 'square footage' (ranging from hundreds to thousands)
# and 'number of bedrooms' (ranging from 1 to 5).
# If you do not normalize, 'square footage' can dominate scale-sensitive models because of its larger values.
# Normalization puts all features on a comparable scale.

import numpy as np

# Example data: [square footage, number of bedrooms]
houses = np.array([
    [1500, 3],
    [2500, 4],
    [800, 2],
    [1200, 2],
    [2000, 3]
])

# Normalize each feature (column)
mean = np.mean(houses, axis=0)
std = np.std(houses, axis=0)
houses_normalized = (houses - mean) / std

print("Original data:\n", houses)
print("\nNormalized data:\n", houses_normalized)

# Now both features have mean 0 and std 1, so the model treats them fairly.

Original data:
 [[1500    3]
 [2500    4]
 [ 800    2]
 [1200    2]
 [2000    3]]

Normalized data:
 [[-0.16760038  0.26726124]
 [ 1.50840343  1.60356745]
 [-1.34080305 -1.06904497]
 [-0.67040152 -1.06904497]
 [ 0.67040152  0.26726124]]
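
As a quick sanity check, the sketch below recomputes the normalization and confirms that each column ends up with mean 0 and standard deviation 1, up to floating-point error. The names `col_means` and `col_stds` are illustrative.

```python
import numpy as np

houses = np.array([[1500, 3], [2500, 4], [800, 2], [1200, 2], [2000, 3]])
houses_normalized = (houses - houses.mean(axis=0)) / houses.std(axis=0)

# Per-column statistics of the normalized data.
col_means = houses_normalized.mean(axis=0)
col_stds = houses_normalized.std(axis=0)

# Compare with a tolerance, since floating-point results are rarely exactly 0 or 1.
print("Column means close to 0:", np.allclose(col_means, 0.0))
print("Column stds close to 1: ", np.allclose(col_stds, 1.0))
```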


When we say "'square footage' will dominate the model because of its larger scale," we mean that if you have features (columns) in your data with very different ranges, the feature with the largest numbers (like square footage, which might be in the thousands) will have a much bigger influence on the model's calculations than features with smaller numbers (like number of bedrooms, which might be between 1 and 5).

This is not always desirable. Many machine learning algorithms are sensitive to the scale of the input features, including those that rely on distances (like k-nearest neighbors) and those trained with gradient-based methods (like neural networks, or linear regression fitted with gradient descent). If you don't normalize, the model might "pay more attention" to square footage simply because its values are numerically larger, not necessarily because it's more important for predicting the outcome.

However, the true importance of a feature should be learned from the data, not just from its scale. Normalization puts all features on the same scale, so the model can learn the real relationship between each feature and the target variable, rather than being biased by the units or magnitude of the numbers.
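
To see the effect of scale on a distance calculation, the sketch below measures how much each feature contributes to the squared Euclidean distance between the first two houses, before and after normalization. It is an illustration under simple assumptions, not a full model; `raw_share` and `norm_share` are illustrative names.

```python
import numpy as np

houses = np.array(
    [[1500, 3], [2500, 4], [800, 2], [1200, 2], [2000, 3]], dtype=float
)

# Share of the squared distance between house 0 and house 1 due to each feature.
raw_sq_diff = (houses[0] - houses[1]) ** 2
raw_share = raw_sq_diff / raw_sq_diff.sum()

# Same calculation after normalizing each column to mean 0 and std 1.
normalized = (houses - houses.mean(axis=0)) / houses.std(axis=0)
norm_sq_diff = (normalized[0] - normalized[1]) ** 2
norm_share = norm_sq_diff / norm_sq_diff.sum()

print("Raw share [sqft, bedrooms]:       ", raw_share)
print("Normalized share [sqft, bedrooms]:", norm_share)
```

On the raw data, square footage accounts for almost all of the distance. After normalization, both features make a comparable contribution.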

In summary:

We normalize so that all features contribute fairly to the model, regardless of their original scale.
This helps the model learn which features are actually important, rather than just focusing on those with larger numbers.