NumPy (short for Numerical Python) is a Python library for numerical and scientific computing.

Problem with Python lists

A list stores references to Python objects, not raw numbers.

Each element carries its own type info and reference count in addition to its value, and the list holds a pointer to it → takes more memory.

Operations loop through the elements in interpreted Python code, which is slow.
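
The memory overhead described above can be estimated with a rough sketch like the one below. It assumes CPython, where `sys.getsizeof` reports the list's pointer array and each integer object separately. The variable names are illustrative, and the list figure somewhat overstates real usage because CPython shares small integer objects.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# A list holds pointers; each int is a separate Python object with its own header.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array holds the raw 8-byte values back to back.
arr_bytes = arr.nbytes  # 8 bytes × 1000 elements

print("list (approx.):", list_bytes, "bytes")
print("array data:    ", arr_bytes, "bytes")
```

The exact list size varies by Python version, so only the array figure (8000 bytes) is fixed.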

In [2]:
import numpy as np
import time

# Python list
list_data = list(range(1_000_000))
start = time.time()
list_result = [x * 2 for x in list_data]
end = time.time()
print("List time:", end - start)

# NumPy array
arr_data = np.arange(1_000_000)
start = time.time()
arr_result = arr_data * 2
end = time.time()
print("NumPy time:", end - start)

# Observation: in this run NumPy was about 13x faster. Speedups of roughly 10–50x are common for large element-wise operations, depending on hardware.


List time: 0.05140805244445801
NumPy time: 0.003913164138793945


In [None]:
# create a numpy array from python list
arr=np.array([1,2,3,434,6])
print(arr)
print(type(arr))

[  1   2   3 434   6]
<class 'numpy.ndarray'>


Why are NumPy arrays faster than Python lists?

Answer:

Stored in contiguous memory (better cache locality).

Homogeneous data → no type checking for each element.

Uses vectorized operations in compiled C instead of Python loops.
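
Two of these properties can be checked directly. The snippet below is a small sketch with illustrative variable names: it shows that mixed inputs are converted to one common dtype and that an array created this way reports a contiguous memory layout.

```python
import numpy as np

# Mixed ints and floats are upcast to a single dtype for the whole array.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# The array's data sits in one contiguous block (C order).
grid = np.arange(6).reshape(2, 3)
print(grid.flags["C_CONTIGUOUS"])  # True
```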

In [None]:
# from list,tuples to numpy array
arr1=np.array([1,2,3,4])
arr2=np.array((3,4,5,6))
print(arr1)
print(arr2)
print(type(arr2))

[1 2 3 4]
[3 4 5 6]
<class 'numpy.ndarray'>


In [None]:
# Using np.arange()  -- np.arange(start, stop, step)
# Creates evenly spaced values within a range (like Python’s range() but returns a NumPy array).
np.arange(0,10,2)

array([0, 2, 4, 6, 8])

In [None]:
# Using np.linspace() -- np.linspace(start, stop, num_points)
# Creates evenly spaced values between two points (inclusive).
np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [None]:
# Creating Special Arrays -- Zeros and Ones
np.zeros((3,4))


array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [None]:
np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [None]:
# Full array with a constant value
np.full((3,3),17)

array([[17, 17, 17],
       [17, 17, 17],
       [17, 17, 17]])

In [None]:
# Identity Matrix -- Used in linear algebra (matrix inverses, transformations).
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [None]:
# Empty Array (Uninitialized Memory)   -- Faster than zeros when you plan to overwrite all values.
# The contents are whatever was left in memory, so the 1s shown below are not guaranteed.
np.empty((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [None]:
# Random floats from a uniform distribution in [0, 1), with shape (3, 2)
np.random.rand(3,2)

array([[0.06393974, 0.22561991],
       [0.940464  , 0.66746637],
       [0.17280264, 0.7997398 ]])

In [None]:
# copy vs view -- Avoiding unintended data modifications in preprocessing.
arr=np.array([1,2,3])
view = arr.view()   # changes affect original
copy = arr.copy()   # independent
print(view)
print(copy)

[1 2 3]
[1 2 3]
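
Printing a view and a copy right after creation gives identical output, so the difference only shows up once the original is modified. The following sketch makes that visible; `np.shares_memory` is used here as a convenient check.

```python
import numpy as np

arr = np.array([1, 2, 3])
view = arr.view()   # shares arr's data buffer
copy = arr.copy()   # owns a separate buffer

arr[0] = 99         # modify the original

print(view)  # [99  2  3]  -> the view sees the change
print(copy)  # [1 2 3]     -> the copy does not
print(np.shares_memory(arr, view), np.shares_memory(arr, copy))  # True False
```

Basic slicing such as arr[1:] also returns a view, which is a common source of unintended modifications.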


.shape – Shape of the Array

Returns a tuple of (rows, columns) for 2D arrays, or the size of each dimension for nD arrays.

Can be assigned to reshape an array in place (the number of elements must stay the same), although NumPy's documentation discourages this in favor of reshape().

In [None]:
arr=np.array([[1,2,3],[2,34,2],[1,2,5],[4,5,8]])
print(arr,"shape: ",arr.shape)
print(" ")
# assign a shape in place; (4, 3) is already the current shape, so the array prints unchanged
arr.shape=(4,3)
print(arr)


[[ 1  2  3]
 [ 2 34  2]
 [ 1  2  5]
 [ 4  5  8]] shape:  (4, 3)
 
[[ 1  2  3]
 [ 2 34  2]
 [ 1  2  5]
 [ 4  5  8]]
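
To see the shape actually change, and what happens when the element count does not match, a minimal sketch using `reshape()` on the same data could look like this:

```python
import numpy as np

arr = np.array([[1, 2, 3], [2, 34, 2], [1, 2, 5], [4, 5, 8]])  # shape (4, 3)

reshaped = arr.reshape(3, 4)   # 12 elements -> still 12 elements
print(reshaped.shape)          # (3, 4)

try:
    arr.reshape(5, 2)          # 10 slots cannot hold 12 elements
except ValueError as err:
    print("reshape failed:", err)
```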


.ndim – Number of Dimensions

1D array → ndim = 1

2D array (matrix) → ndim = 2

3D array (tensor) → ndim = 3

In [None]:
# In deep learning, you must know if you’re feeding 1D, 2D, or 4D tensors into the model.
print(np.array([3,4,5]).ndim)
print(np.array([[2,3],[4,2]]).ndim)

1
2


In [None]:
# .size – Total Number of Elements --- For a 2D array this is rows × columns (3 × 4 = 12).
arr=np.arange(12).reshape(3,4)
print(arr.size,arr)

12 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [None]:
# .dtype – Data Type --- Shows the type of elements (e.g., int32, float64).
arr=np.array([34,52,45])
print(arr.dtype)
arr1=np.array([34,23,52],dtype=np.float32)
print(arr1.dtype)

int64
float32


In [None]:
# .itemsize – Memory per Element --- Shows the bytes consumed by each element.
arr = np.array([1, 2, 3], dtype=np.int32)
print(arr.itemsize)

4


In [None]:
# .nbytes – Total Memory Size --- Total bytes consumed by the array.
arr = np.arange(1000, dtype=np.float64)
print(arr.nbytes)  # 8000 bytes (8 bytes × 1000 elements)

8000


In [None]:
# .T – Transpose --- Swaps rows and columns (useful for matrix algebra).
arr=np.array([[3,4],[1,2]])
print(arr.T)

[[3 1]
 [4 2]]


Indexing and Slicing

In [None]:
# 1D
arr=np.array([2,3,4,5,6])
print(arr[2])
print(arr[-1])

4
6


In [None]:
# slicing -- arr[start:end:step]
print(arr[1:4])
print(arr[3:])
print(arr[::2])

[3 4 5]
[5 6]
[2 4 6]


In [None]:
# 2D
arr = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
print(arr[1,1])

5


In [None]:
print(arr[:2,:2])
print(arr[:,2])

[[1 2]
 [4 5]]
[3 6 9]


In [None]:
# Boolean indexing --- A mask of True/False values selects the elements where the condition holds.
arr = np.array([10, 20, 30, 40, 50])
mask = arr > 25
print(arr[mask])

[30 40 50]
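
Masks can also be combined or used to replace values. The sketch below is illustrative (the thresholds are arbitrary) and uses the element-wise operators `&` and `|` together with `np.where`.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Combine conditions with & (and) / | (or); each condition needs parentheses.
in_range = (arr > 15) & (arr < 45)
print(arr[in_range])            # [20 30 40]

# np.where picks between two values element by element.
capped = np.where(arr > 25, 25, arr)
print(capped)                   # [10 20 25 25 25]
```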


Mathematical Operations

NumPy allows you to perform element-wise mathematical operations directly on arrays without writing loops.
This is much faster than using Python loops because NumPy uses vectorized operations implemented in C.

In [None]:
# Element-wise Arithmetic --- When two arrays have the same shape:
a=np.array([3,4,5])
b=np.array([1,5,8])

print(a+b)
print(a-b)
print(a*b)
print(a/b)
print(a**2)

[ 4  9 13]
[ 2 -1 -3]
[ 3 20 40]
[3.    0.8   0.625]
[ 9 16 25]


In [None]:
# Scalar Operations -- When you combine an array with a single number, the number is applied to every element (broadcasting):
print(a+5)
print(b*10)

[ 8  9 10]
[10 50 80]


In [None]:
# Universal Functions --- NumPy provides optimized mathematical functions that work element-wise:
arr=np.array([1,4,9,16])
print(np.sqrt(arr))
print(np.log(arr))
print(np.exp(arr))

[1. 2. 3. 4.]
[0.         1.38629436 2.19722458 2.77258872]
[2.71828183e+00 5.45981500e+01 8.10308393e+03 8.88611052e+06]


In [None]:
# Rounding Function
arr = np.array([1.23, 4.56, 7.89])
print(np.round(arr,1))
print(np.floor(arr))
print(np.ceil(arr))

[1.2 4.6 7.9]
[1. 4. 7.]
[2. 5. 8.]


Statistical & Aggregation Functions

In [None]:
# basic descriptive statistics
arr = np.array([1, 2, 3, 4, 5])
print(np.sum(arr))
print(np.mean(arr))
print(np.median(arr))
print(np.std(arr))
print(np.var(arr))
print(np.min(arr))
print(np.max(arr))

15
3.0
3.0
1.4142135623730951
2.0
1
5


In [None]:
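# Axis-based Calculations --- For 2D arrays, axis selects the dimension that is collapsed: axis=0 runs down the rows (one result per column), axis=1 runs across the columns (one result per row).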
matrix=np.array([[1,2,3],
                 [4,5,6]])
print(matrix.sum(axis=0))  # column-wise sum → [5 7 9]
print(matrix.sum(axis=1))  # row-wise sum → [6 15]

[5 7 9]
[ 6 15]
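
The same axis argument works for other aggregations. As a small sketch, the example below tracks the resulting shapes and uses `keepdims=True`, which keeps the collapsed axis with length 1 so the result can be subtracted from the original matrix.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])      # shape (2, 3)

col_means = matrix.mean(axis=0)      # collapses the 2 rows -> shape (3,)
row_means = matrix.mean(axis=1)      # collapses the 3 columns -> shape (2,)
print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]

# Subtract each row's mean from that row; keepdims gives shape (2, 1).
centered = matrix - matrix.mean(axis=1, keepdims=True)
print(centered)
```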


In [None]:
# Percentiles & Quantiles -- Used to find thresholds in data (important in outlier detection).
data = np.array([1,3,5,7,9])
print(np.percentile(data,50))
print(np.percentile(data, 80))

5.0
7.4
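
One way percentiles are used for outlier detection is the interquartile range (IQR) rule. The sketch below assumes the conventional 1.5 × IQR fences, which is a common rule of thumb rather than something defined in this notebook, and the data (with one extreme value added) is made up for illustration.

```python
import numpy as np

data = np.array([1, 3, 5, 7, 9, 40])   # 40 is an artificial extreme value

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the quartiles are flagged.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(q1, q3)     # 3.5 8.5
print(outliers)   # [40]
```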


In [None]:
# Cumulative Functions
arr=np.array([1,2,3,4,5])
print(np.cumsum(arr))
print(np.cumprod(arr))

[ 1  3  6 10 15]
[  1   2   6  24 120]


In [None]:
# Arg Functions (Return Indexes)
arr = np.array([10, 20, 5, 30])

print("index of max value: ",np.argmax(arr))
print("index of min value: ",np.argmin(arr))


index of max value:  3
index of min value:  2


In [None]:
# flatten()  -- Purpose: Returns a copy of the array collapsed into 1D.

arr = np.array([[1, 2], [3, 4]])
flat = arr.flatten()
print(flat)

[1 2 3 4]


In [None]:
X = np.array([1, 2, 3, 4, 5])
X = X.reshape(-1, 1)  # (5, 1) — 5 samples, 1 feature
print(X.shape)
print(X)

# Why do we use reshape(-1, 1) in machine learning preprocessing?
# The -1 tells NumPy to infer that dimension from the array's length, so the data ends up in (n_samples, 1) shape, which ML models expecting 2D inputs require.

(5, 1)
[[1]
 [2]
 [3]
 [4]
 [5]]


In [None]:
# Resizing --- Unlike reshape, the in-place method arr.resize() can change the number of elements: it pads with zeros when growing and drops trailing values when shrinking.

arr = np.array([1, 2, 3])
arr.resize(5)
print(arr)  # [1 2 3 0 0]

[1 2 3 0 0]
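
NumPy also has a function form, `np.resize`, which behaves differently from the method when the array grows. The sketch below contrasts the two. Passing `refcheck=False` is a choice made here to skip the reference check that can raise an error in interactive sessions.

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.resize returns a new array and fills extra slots by repeating the data.
repeated = np.resize(arr, 5)
print(repeated)  # [1 2 3 1 2]

# ndarray.resize works in place and fills extra slots with zeros.
padded = np.array([1, 2, 3])
padded.resize(5, refcheck=False)
print(padded)    # [1 2 3 0 0]
```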


Stacking & Splitting Arrays

In [None]:
# Stacking Arrays --  Stacking means joining arrays along a new or existing axis.

# a) np.vstack() — Vertical Stack
# Stacks arrays vertically (row-wise).
# Equivalent to axis=0 concatenation for 2D arrays.

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
v_stacked=np.vstack((a,b))
print(v_stacked)

[[1 2]
 [3 4]
 [5 6]]


In [None]:
# np.hstack() — Horizontal Stack
# Stacks arrays horizontally (column-wise).
# Equivalent to axis=1 concatenation for 2D arrays.

h_stacked = np.hstack((a, b.T))
print(h_stacked)



[[1 2 5]
 [3 4 6]]


In [None]:
# np.concatenate()
# General-purpose function for stacking.
# Can specify axis=0, axis=1, etc.

concat = np.concatenate((a, b), axis=0)  # same as vstack
print(concat)

[[1 2]
 [3 4]
 [5 6]]
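
The examples so far join arrays along an existing axis. For joining along a new axis, `np.stack` can be used. A minimal sketch with two illustrative 1D arrays:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# concatenate joins along an existing axis: the result stays 1D.
joined = np.concatenate((x, y))
print(joined.shape)   # (6,)

# stack inserts a new axis: two (3,) arrays become one (2, 3) array.
stacked = np.stack((x, y), axis=0)
print(stacked.shape)  # (2, 3)
```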


In [None]:
# Splitting Arrays --- Splitting is the opposite of stacking — breaking arrays into smaller sub-arrays.

# np.vsplit() — Vertical Split
arr = np.array([[1, 2], [3, 4], [5, 6]])
v_split = np.vsplit(arr, 3)
print(v_split)

[array([[1, 2]]), array([[3, 4]]), array([[5, 6]])]


In [None]:
# np.hsplit() — Horizontal Split
# Splits along columns.

arr = np.array([[1, 2, 3, 4]])
h_split = np.hsplit(arr, 2)
print(h_split)

[array([[1, 2]]), array([[3, 4]])]
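
`np.hsplit` and `np.vsplit` with an integer require the array to divide evenly. When it does not, `np.array_split` is one alternative, sketched here on the same 1 × 4 array:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4]])

try:
    np.hsplit(arr, 3)              # 4 columns cannot be divided into 3 equal parts
except ValueError as err:
    print("hsplit failed:", err)

# array_split allows uneven pieces: here the column counts are 2, 1, 1.
parts = np.array_split(arr, 3, axis=1)
print([p.shape for p in parts])    # [(1, 2), (1, 1), (1, 1)]
```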


Broadcasting

When performing arithmetic operations between two arrays, NumPy tries to match their shapes.

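If the shapes differ, NumPy applies its broadcasting rules: dimensions are compared from the right, and each pair must be equal or one of them must be 1. Compatible arrays are then combined without physically copying data.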

In [None]:
a = np.array([1, 2, 3])
b = 2
print(a + b)
# Here, b (scalar) is broadcast to [2, 2, 2].

[3 4 5]


In [None]:
# Why do (3, 2) and (2,) work but (3, 2) and (3,) fail?
# Answer:
# (3, 2) and (2,) → match from right → (3, 2) & (1, 2) → compatible.
# (3, 2) and (3,) → match from right → (3, 2) & (1, 3) → 2 ≠ 3, incompatible.
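
The two cases above can be reproduced in a short sketch. The array names are illustrative, and the last line shows one possible fix: adding an axis so the (3,) array becomes (3, 1).

```python
import numpy as np

m = np.ones((3, 2))
row = np.array([10, 20])        # shape (2,)
col = np.array([1, 2, 3])       # shape (3,)

print((m + row).shape)          # (3, 2): row is treated as (1, 2)

try:
    m + col                     # (3, 2) vs (1, 3): trailing 2 != 3
except ValueError as err:
    print("broadcast failed:", err)

# Adding an axis turns col into (3, 1), which is compatible with (3, 2).
fixed = m + col[:, np.newaxis]
print(fixed.shape)              # (3, 2)
```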

Random Numbers

Random numbers are essential in data science for simulations, bootstrapping, train-test splits, shuffling datasets, and initializing machine learning model weights.

In [4]:
# Random floats
arr=np.random.rand(3,6)
print(arr)
# Difference between rand() and randn()?
# rand() → Uniform distribution in [0, 1)
# randn() → Normal distribution with mean 0, std 1

[[0.88586963 0.32162163 0.59033103 0.2010988  0.40291945 0.08236755]
 [0.66331357 0.35333382 0.94839196 0.36571165 0.4584438  0.37025965]
 [0.5950736  0.53647601 0.37129964 0.88482317 0.80468214 0.27575492]]


In [5]:
# Random Integers
nums=np.random.randint(10,30,size=(2,3))
print(nums)

[[26 14 20]
 [15 28 27]]


In [6]:
np.random.seed(42)
print(np.random.rand(3))
# Why use np.random.seed()?
# To ensure reproducibility of results when debugging or comparing algorithms.

[0.37454012 0.95071431 0.73199394]
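
NumPy's documentation recommends the newer Generator interface, `np.random.default_rng`, for new code. The sketch below shows the equivalent calls under that API. Each generator carries its own seed instead of relying on global state, and its values differ from those produced by `np.random.seed(42)`.

```python
import numpy as np

rng_a = np.random.default_rng(42)   # independent generator with its own seed
rng_b = np.random.default_rng(42)   # same seed -> same sequence

first = rng_a.random(3)             # uniform floats in [0, 1)
second = rng_b.random(3)
print(np.array_equal(first, second))  # True

ints = rng_a.integers(10, 30, size=(2, 3))   # integers in [10, 30)
normal = rng_a.standard_normal(4)            # mean 0, std 1
print(ints.shape, normal.shape)
```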


Linear Algebra

In [7]:
# Dot product
a=np.array([2,3,4])
b=np.array([4,5,3])
print(np.dot(a,b))

35


In [8]:
# Matrix Multiplication – np.matmul()
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(np.matmul(A, B))

[[19 22]
 [43 50]]


In [9]:
# Matrix Inverse – np.linalg.inv()   ---  Only works for square matrices with non-zero determinant.
a=np.array([[1,2],
            [3,4]])
inv_a=np.linalg.inv(a)
print(inv_a)

[[-2.   1. ]
 [ 1.5 -0.5]]
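
A quick sanity check for an inverse is to multiply it back against the original matrix. The sketch below does that, and also shows `np.linalg.solve` as an alternative for solving a linear system without forming the inverse. The right-hand side `b` is an illustrative value.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
inv_a = np.linalg.inv(a)

# A matrix times its inverse should give the identity (up to rounding error).
print(np.allclose(a @ inv_a, np.eye(2)))  # True

# To solve a x = b, np.linalg.solve avoids forming the inverse explicitly.
b = np.array([5, 6])                       # illustrative right-hand side
x = np.linalg.solve(a, b)
print(np.allclose(a @ x, b))               # True
```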


In [10]:
# Determinant – np.linalg.det()  --- Useful for checking if a matrix is invertible (det ≠ 0).
det_A=np.linalg.det(A)
print(det_A)

-2.0000000000000004


In [11]:
# Eigenvalues & Eigenvectors – np.linalg.eig()
# Eigenvalues tell you the stretch factor of the transformation.
# Eigenvectors give the directions that are only stretched, not rotated (returned as the columns of eig_vecs).

eig_vals, eig_vecs = np.linalg.eig(A)
print("Eigenvalues:", eig_vals)
print("Eigenvectors:\n", eig_vecs)

Eigenvalues: [-0.37228132  5.37228132]
Eigenvectors:
 [[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]
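
The defining relation A v = λ v can be verified numerically. A minimal sketch, using the same matrix as above and comparing with `np.allclose` to tolerate rounding error:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
eig_vals, eig_vecs = np.linalg.eig(A)

# Each column of eig_vecs pairs with the eigenvalue at the same index.
checks = []
for i in range(len(eig_vals)):
    v = eig_vecs[:, i]
    checks.append(bool(np.allclose(A @ v, eig_vals[i] * v)))

print(checks)  # [True, True]
```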


In [12]:
# Vector Norm -- np.linalg.norm()
# Norm is the length or magnitude of a vector
# L2 -- Euclidean length (the default): square root of the sum of squares
# L1 -- Manhattan length: sum of absolute values

v=np.array([3,4])
print(np.linalg.norm(v)) #l2
print(np.linalg.norm(v,1)) #l1

5.0
7.0
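
Applied to the difference of two vectors, the same norms give the distance between two points. A small sketch with two illustrative points:

```python
import numpy as np

p = np.array([1, 2])
q = np.array([4, 6])

euclidean = np.linalg.norm(p - q)      # sqrt(3**2 + 4**2)
manhattan = np.linalg.norm(p - q, 1)   # |3| + |4|
print(euclidean)  # 5.0
print(manhattan)  # 7.0
```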
