## What is NumPy?
- NumPy stands for Numerical Python. It is a Python library used for working with arrays.
- It also has functions for linear algebra, Fourier transforms, and matrices.
- It is an open-source project and you can use it freely.

## Why Use NumPy?
- For numerical work on large arrays, it is much faster than Python lists.
- Supports large multi-dimensional arrays and matrices.
- Has powerful mathematical functions.


In [19]:
# Installation
!pip3 install numpy



##  Why NumPy is Faster than Python Lists

**Homogeneous vs Heterogeneous**

- A Python list can contain mixed data types (e.g., [1, "a", 3.14]), so it stores references (pointers) to separately allocated objects rather than the values themselves.
- NumPy arrays store elements of a single data type, which allows compact memory usage and SIMD (single instruction, multiple data) operations.


**Memory Efficiency**
- A Python list stores a pointer to each object; the objects themselves can be scattered across memory, so reading the values means following each pointer.
- NumPy stores the values themselves in one contiguous memory block, which the CPU can process more efficiently (better cache locality).
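The difference between the two storage models can be checked directly. The snippet below is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

# A Python list may mix types because it stores references to separate objects.
mixed = [1, "a", 3.14]
mixed_types = [type(item).__name__ for item in mixed]
print(mixed_types)  # ['int', 'str', 'float']

# A NumPy array picks one dtype for all elements; here the ints are upcast to float64.
arr = np.array([1, 2, 3.5])
print(arr.dtype)                   # float64
print(arr.itemsize, arr.nbytes)    # 8 bytes per element, 24 bytes in total
print(arr.flags["C_CONTIGUOUS"])   # True: the values sit in one contiguous block
```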

**Vectorization – No More Loops!**
- NumPy applies an operation to an entire array at once. The loop still happens, but in compiled C code that can use SIMD (Single Instruction, Multiple Data) instructions and other low-level optimizations. SIMD is a CPU-level feature of modern processors.
- By using vectorized operations, you avoid the overhead of a Python for-loop.

In [12]:
# Speed test: NumPy vs. Python lists
import time

size = 10_000_000
arr1 = [i for i in range(size)]
arr2 = [i for i in range(size)]
start = time.time()
result = [arr1[i] + arr2[i] for i in range(size)]  # element-wise add in a Python loop
print(result[0:10])
end = time.time()
print(f"time taken = {end-start}")


[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
time taken = 0.5284678936004639


In [23]:
# Same addition using NumPy (reuses arr1, arr2, size, and time from the previous cell)
import numpy as np

arr1 = np.array(arr1)
arr2 = np.array(arr2)
start = time.time()
result = arr1 + arr2
print(result[0:10])
end = time.time()
print(f"time taken = {end-start}")

[ 0  2  4  6  8 10 12 14 16 18]
time taken = 0.05883026123046875
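A single `time.time()` measurement is noisy, and the cells above also time the `print` call. A more robust comparison could use the standard-library `timeit` module, as in the sketch below. The smaller `size` and the helper names `add_lists` and `add_arrays` are assumptions made for illustration, so the absolute timings will differ from the ones shown above.

```python
import timeit
import numpy as np

size = 100_000  # hypothetical: smaller than the notebook's size so it runs quickly
list_a = list(range(size))
list_b = list(range(size))
array_a = np.array(list_a)
array_b = np.array(list_b)

def add_lists():
    # element-wise addition in a Python-level loop
    return [x + y for x, y in zip(list_a, list_b)]

def add_arrays():
    # element-wise addition inside NumPy's compiled code
    return array_a + array_b

# repeat() returns one total time per round; the minimum is the least disturbed run.
list_time = min(timeit.repeat(add_lists, number=10, repeat=3))
array_time = min(timeit.repeat(add_arrays, number=10, repeat=3))
print(f"lists: {list_time:.4f}s, arrays: {array_time:.4f}s")
```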


## Memory Efficiency – NumPy vs. Lists

In [30]:
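import sys
python_list = list(range(size))
numpy_array = np.array(python_list)

# sys.getsizeof(list) counts only the list's pointer array; each int element
# is a separate object (typically 28 bytes on 64-bit CPython) on top of that.
print(f"python list (pointer array only) takes {sys.getsizeof(python_list)} bytes")
print(f"numpy array takes {numpy_array.nbytes} bytes")

python list (pointer array only) takes 80000056 bytes
numpy array takes 80000000 bytes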
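To estimate the full footprint of a list, the size of the list object and the sizes of its elements need to be added rather than multiplied. The sketch below does this for a deliberately small `size` (an assumption, chosen so the per-element sum is cheap). The total is an approximation, since CPython shares some small integer objects between lists.

```python
import sys
import numpy as np

size = 1_000  # hypothetical small size for illustration
python_list = list(range(size))
numpy_array = np.array(python_list, dtype=np.int64)

container_bytes = sys.getsizeof(python_list)                # the list's pointer array
element_bytes = sum(sys.getsizeof(x) for x in python_list)  # the int objects it points to
list_total_bytes = container_bytes + element_bytes

print(f"list container: {container_bytes} bytes")
print(f"list elements:  {element_bytes} bytes")
print(f"list total:     {list_total_bytes} bytes")
print(f"numpy array:    {numpy_array.nbytes} bytes")  # 1000 elements * 8 bytes each
```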


### Ways of creating arrays

In [100]:
# 1D array Creation
array = np.array([1,2,3,4,5])
print(f"array = {array}")
print(f"array dimensions = {array.ndim}D")

array = [1 2 3 4 5]
array dimensions = 1D


In [105]:
# 2D array creation
array2D = np.array([[1,2,3],[4,5,6]])
print(f"array: \n {array2D}")
print(f"array dimensions = {array2D.ndim}D")

array: 
 [[1 2 3]
 [4 5 6]]
array dimensions = 2D


In [106]:
# 3D array creation
array3D = np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
print(f"array: \n {array3D}")
print(f"array dimensions = {array3D.ndim}D")

array: 
 [[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
array dimensions = 3D


In [36]:
# array of all zeros; the shape is (rows, columns)
np.zeros((4, 5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [37]:
## array of all ones; the shape is (rows, columns)
np.ones((4,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [107]:
## array filled with one specific value: np.full(shape, fill_value)
np.full((4,5), 9)

array([[9, 9, 9, 9, 9],
       [9, 9, 9, 9, 9],
       [9, 9, 9, 9, 9],
       [9, 9, 9, 9, 9]])

In [52]:
## random floats in [0, 1)
np.random.rand(4,5)

array([[0.18534437, 0.58830666, 0.14293662, 0.33455079, 0.2912479 ],
       [0.57044469, 0.63390952, 0.42709829, 0.71063637, 0.93441605],
       [0.42916881, 0.91516013, 0.65887148, 0.64256461, 0.88848978],
       [0.07240305, 0.8590866 , 0.48998229, 0.81288722, 0.69857762]])

In [50]:
## random integers in [0, 100): the upper bound is exclusive
np.random.randint(0,100, size=(4,5))

array([[ 5, 39, 40,  4, 48],
       [75, 16, 17, 80, 53],
       [54, 79,  7, 48,  1],
       [ 8, 10, 19, 55, 62]])
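NumPy also offers the newer `Generator` interface, created with `np.random.default_rng`, which makes seeding explicit. The sketch below is an alternative to the `np.random.rand` and `np.random.randint` calls above, not what the notebook itself uses; the seed value 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

random_floats = rng.random((4, 5))               # floats in [0, 1)
random_ints = rng.integers(0, 100, size=(4, 5))  # integers in [0, 100)

# Re-creating the generator with the same seed reproduces the same numbers.
same_floats = np.random.default_rng(seed=42).random((4, 5))
print(np.array_equal(random_floats, same_floats))  # True
```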

In [53]:
## uninitialized array: np.empty skips initialization, so the values are whatever was already in memory
np.empty((4,5))

array([[0.18534437, 0.58830666, 0.14293662, 0.33455079, 0.2912479 ],
       [0.57044469, 0.63390952, 0.42709829, 0.71063637, 0.93441605],
       [0.42916881, 0.91516013, 0.65887148, 0.64256461, 0.88848978],
       [0.07240305, 0.8590866 , 0.48998229, 0.81288722, 0.69857762]])

In [54]:
## identity matrix
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [55]:
## arange(start, stop, step): stop is exclusive
np.arange(1, 50, 5)

array([ 1,  6, 11, 16, 21, 26, 31, 36, 41, 46])

In [56]:
## linspace(start, stop, num): num evenly spaced values, stop included
np.linspace(1, 100, 5)

array([  1.  ,  25.75,  50.5 ,  75.25, 100.  ])
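The two functions differ in what is held fixed: `np.arange` fixes the step and excludes the stop value, while `np.linspace` fixes the number of points and includes the stop value by default. A small side-by-side sketch (the values are chosen only for illustration):

```python
import numpy as np

# arange: the step is fixed, the stop value (10) is excluded
stepped = np.arange(0, 10, 2)
print(stepped)        # [0 2 4 6 8]

# linspace: the number of points is fixed, the stop value (10) is included;
# retstep=True also returns the spacing that was computed
spaced, step = np.linspace(0, 10, 6, retstep=True)
print(spaced, step)   # [ 0.  2.  4.  6.  8. 10.] 2.0
```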

In [57]:
## build an array from a function of the indices (i, j)
np.fromfunction(lambda i,j: i + j, (4,5))

array([[0., 1., 2., 3., 4.],
       [1., 2., 3., 4., 5.],
       [2., 3., 4., 5., 6.],
       [3., 4., 5., 6., 7.]])

## Attributes of ndarray
NumPy array properties help you analyze and manipulate arrays more effectively.
Here is a list of key properties every NumPy array (ndarray) has:
- **ndarray.shape**: Returns a tuple representing the shape (dimensions) of the array.
- **ndarray.ndim**: Returns the number of dimensions (axes) of the array.
- **ndarray.size**: Returns the total number of elements in the array.
- **ndarray.dtype**: Provides the data type of the array elements.
- **ndarray.itemsize**: Returns the size in bytes of each element.

In [60]:
array = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])

In [62]:
array

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [63]:
print(f"shape of the array (rows, columns) : {array.shape}")

shape of the array (rows, columns) : (3, 4)


In [65]:
print(f"dimensions of the array : {array.ndim}D")

dimensions of the array : 2D


In [66]:
print(f"size of the array (rows * columns) : {array.size}")

size of the array (rows * columns) : 12


In [67]:
print(f"datatype of the array : {array.dtype}")

datatype of the array : int64


In [69]:
print(f"itemsize of the array : {array.itemsize} bytes")

itemsize of the array : 8 bytes


### Common NumPy Array Operations

In [71]:
# reshape() — Reshapes an array without changing the data.
reshaped_array = array.reshape((2,6))
reshaped_array


array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [73]:
# flatten() — Returns a copy of the array collapsed into one dimension.
flat_array = array.flatten()
flat_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
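Two related details are worth seeing next to `reshape()` and `flatten()`: passing `-1` lets NumPy infer one dimension, and `ravel()` is the view-returning counterpart of `flatten()`. The sketch below uses a small illustrative array named `grid`, which is not part of the notebook.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

inferred = grid.reshape(-1, 2)      # -1 lets NumPy infer the row count (3 here)
flat_copy = grid.flatten()          # always a copy
flat_view = grid.ravel()            # a view when the memory layout allows it

flat_copy[0] = 100                  # does not touch grid
flat_view[1] = 200                  # writes through to grid[0, 1]

print(inferred.shape)               # (3, 2)
print(grid)                         # grid[0, 0] is still 0, grid[0, 1] is now 200
```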

In [75]:
# transpose() - Swaps rows and columns in 2D arrays
trans_array = array.T
trans_array

array([[ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11],
       [ 4,  8, 12]])

In [87]:
# resize() -  Change shape in-place
array.resize((2,6))
array

array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

In [91]:
# concatenate() — Join Arrays
arr1 = [[1, 2], [3, 4]]
arr2 = [[9, 8], [7, 6]]
np.concatenate((arr1, arr2))

array([[1, 2],
       [3, 4],
       [9, 8],
       [7, 6]])
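`np.concatenate` joins along axis 0 (rows) by default, as in the cell above. Passing `axis=1` joins side by side instead. A short sketch with the same values, using the illustrative names `left` and `right`:

```python
import numpy as np

left = np.array([[1, 2], [3, 4]])
right = np.array([[9, 8], [7, 6]])

stacked_rows = np.concatenate((left, right), axis=0)  # shape (4, 2), the default
stacked_cols = np.concatenate((left, right), axis=1)  # shape (2, 4)
print(stacked_cols)
# [[1 2 9 8]
#  [3 4 7 6]]

# np.vstack and np.hstack are shorthands for these two cases on 2D arrays.
print(np.array_equal(np.vstack((left, right)), stacked_rows))  # True
print(np.array_equal(np.hstack((left, right)), stacked_cols))  # True
```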

### Datatypes in NumPy

- In NumPy, dtype stands for "data type" and is a fundamental attribute of every NumPy array (ndarray).
- NumPy arrays are homogeneous: all elements share one type, unlike Python lists.

NumPy supports a wide variety of data types. Here are the most common ones:
1. **int8, int16, int32, int64**: Signed integer types with different bit sizes.
2. **float16, float32, float64**: Floating-point types with different precision.
3. **bool**: Boolean data type.
4. **complex64, complex128**: Complex number types.
5. **object**: For storing arbitrary Python objects (e.g., mixed types). Plain strings get a fixed-width Unicode dtype such as <U5 by default.

In [120]:
# NumPy infers the dtype from the values (the default integer size can vary by platform)
arr = np.array([1,2,3,4])
print(arr.dtype)   #int64

arr2 = np.array([1.2, 2.1, 3.4])
print(arr2.dtype)  #float64

arr3 = np.array([True, False])
print(arr3.dtype)  #bool

arr4 = np.array([1 + 2j, 3 + 4j, 5 + 6j])
print(arr4.dtype)  #complex128

arr5 = np.array(['ram', 'shyam', 'jay'])
print(arr5.dtype)  # <U5: Unicode strings of up to 5 characters, not object


int64
float64
bool
complex128
<U5


**Type Casting**
Before going further, let's look at type casting, since later examples rely on it.

### What is Type Casting in NumPy?
**Type casting** is the process of converting one data type to another. In NumPy:
* You can cast during array creation using **'dtype='**.
* Or convert later using **.astype()**.


In [108]:
# cast float values to int at creation (the fractional part is dropped, not rounded)
arr = np.array([1.5, 2.8, 3.9], dtype=np.int32) 
print(arr)         # Output: [1 2 3]
print(arr.dtype)   # int32


[1 2 3]
int32


In [114]:
# convert back to float data type
arr_float = arr.astype(float)
print(arr_float)
print(arr_float.dtype)

arr_float32 = arr.astype(np.float32)
print(arr_float32.dtype)

[1. 2. 3.]
float64
float32
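Casting floats to integers truncates toward zero rather than rounding, which matters for negative values. If rounding is wanted, it has to be done explicitly first, for example with `np.rint`. A minimal sketch with illustrative values:

```python
import numpy as np

values = np.array([1.5, 2.8, -3.9])

truncated = values.astype(np.int32)         # drops the fractional part
rounded = np.rint(values).astype(np.int32)  # rounds to the nearest integer first

print(truncated)  # [ 1  2 -3]
print(rounded)    # [ 2  3 -4]
```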


### Integer Types: int8, int16, int32, int64

In [121]:
import numpy as np

a = np.array([100], dtype=np.int8)
b = np.array([1000], dtype=np.int16)
c = np.array([100000], dtype=np.int32)
d = np.array([10000000000], dtype=np.int64)

print("int8:", a, a.dtype)       # int8: [100] int8
print("int16:", b, b.dtype)      # int16: [1000] int16
print("int32:", c, c.dtype)      # int32: [100000] int32
print("int64:", d, d.dtype)      # int64: [10000000000] int64


int8: [100] int8
int16: [1000] int16
int32: [100000] int32
int64: [10000000000] int64


### Float Types: float16, float32, float64

In [122]:
e = np.array([3.14], dtype=np.float16)
f = np.array([3.1415926], dtype=np.float32)
g = np.array([3.14159265358979], dtype=np.float64)

print("float16:", e, e.dtype)    # float16: [3.14] float16
print("float32:", f, f.dtype)    # float32: [3.1415925] float32
print("float64:", g, g.dtype)    # float64: [3.14159265] float64

float16: [3.14] float16
float32: [3.1415925] float32
float64: [3.14159265] float64
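The practical difference between the float types is how many significant digits survive. `np.finfo` reports this per type, and the sketch below also shows one concrete case where float32 silently loses an increment that float64 keeps. The variable names are illustrative.

```python
import numpy as np

# bits and approximate decimal precision of each float type
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(dtype.__name__, "bits:", info.bits, "approx. decimal digits:", info.precision)

# 2**24 is the point where float32 can no longer represent every integer
limit32 = np.array([16_777_216], dtype=np.float32)
limit64 = np.array([16_777_216], dtype=np.float64)

lost = (limit32 + np.float32(1))[0] == limit32[0]  # True: the +1 is rounded away
kept = (limit64 + 1.0)[0] == limit64[0]            # False: float64 still holds it
print(lost, kept)
```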


### Boolean Type: bool

In [123]:
h = np.array([1, 0, 3], dtype=np.bool_)
print("bool:", h, h.dtype)       # bool: [ True False  True] bool

bool: [ True False  True] bool


### Complex Types: complex64, complex128
Stores complex numbers (real + imag j)

In [124]:
i = np.array([1 + 2j], dtype=np.complex64)
j = np.array([3.5 + 4.5j], dtype=np.complex128)

print("complex64:", i, i.dtype)   # complex64: [1.+2.j] complex64
print("complex128:", j, j.dtype) # complex128: [3.5+4.5j] complex128


complex64: [1.+2.j] complex64
complex128: [3.5+4.5j] complex128


### Object Type: object
Can store Python objects like strings, lists, even mixed types.

In [126]:
k = np.array(["Hello", 42, 3.14], dtype=object)
print("object:", k, k.dtype)     # object: ['Hello' 42 3.14] object

object: ['Hello' 42 3.14] object


## What is Downcasting?
**Downcasting** means converting a data type to a smaller or less precise type. It is essentially casting from a wider type to a narrower one.
For example, suppose you have a column named age whose values range from 1 to 100. Storing it as int64 wastes memory, because int8 (range -128 to 127) already covers every value: **smaller data types use less memory**.
Examples:
* float64 → float32, int64 → int32, or even int32 → int8

[Note: **Downcasting can lead to data loss if the new type can't store the original values**]

#### Good Practice for Safe Downcasting:
First check the value range of the target data type:
- np.iinfo() for integer types
- np.finfo() for floating-point types

In [131]:
np.iinfo(np.int16)

iinfo(min=-32768, max=32767, dtype=int16)

In [132]:
np.finfo(np.float16)

finfo(resolution=0.001, min=-6.55040e+04, max=6.55040e+04, dtype=float16)
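The range check can be turned into a small guard before calling `.astype()`. The function `downcast_if_safe` below is a hypothetical helper written for this illustration, not a NumPy API; it handles integer targets only and assumes a non-empty array.

```python
import numpy as np

def downcast_if_safe(values, target_dtype):
    """Hypothetical helper: cast to target_dtype only if every value fits."""
    info = np.iinfo(target_dtype)
    if values.min() >= info.min and values.max() <= info.max:
        return values.astype(target_dtype)
    return values  # leave unchanged rather than corrupt the data

ages = np.array([1, 35, 100], dtype=np.int64)
small_ages = downcast_if_safe(ages, np.int8)
print(small_ages.dtype, small_ages.nbytes)  # int8 3

too_big = np.array([1, 300], dtype=np.int64)
unchanged = downcast_if_safe(too_big, np.int8)
print(unchanged.dtype)                      # int64: 300 does not fit in int8

# Without the check, the out-of-range value silently wraps around.
wrapped = too_big.astype(np.int8)
print(wrapped)                              # [ 1 44]
```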

In [145]:
# array of age values, stored as int64
import time

array = np.random.randint(0, 100, 100_000_000).astype(np.int64)
start = time.time()
np.sqrt(array)
print(array[0:10])
stop = time.time()
print(f"Time taken: {stop - start}")

[ 0 67 84 65 71 90 90 60 25 89]
Time taken: 0.2505629062652588


In [148]:
# same value range stored as int16 (np.sqrt then computes in float32 instead of float64)
array = np.random.randint(0, 100, 100_000_000).astype(np.int16)
start = time.time()
np.sqrt(array)
print(array[0:10])
stop = time.time()
print(f"Time taken: {stop - start}")

[65 34 94 97 19 25 77  7 81 96]
Time taken: 0.12869501113891602


## Slicing and Indexing in NumPy

- **Indexing**: accesses a single element of the array.
- **Slicing**: extracts a portion of the array.

In [153]:
# 1D indexing
arr1 = np.array([1,3,4,5,6])
print(f'element: {arr1[2]}')

# 2D indexing
arr2 = np.array([[1,2,3],[4,5,6]])
print(f'element: {arr2[0,2]}')  # row 0, column 2

# 3D indexing
arr3 = np.array([
    [[ 1,  2,  3], [ 4,  5,  6]],
    [[ 7,  8,  9], [10, 11, 12]]
])
print(f'element: {arr3[0,1,2]}')

element: 4
element: 3
element: 6


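#### 3D index selection
arr3[0, 1, 2] is resolved one axis at a time:

* Block 0:
[[1, 2, 3],
 [4, 5, 6]]

* Block 1:
[[7, 8, 9],
 [10, 11, 12]]

- The first index (0) selects block 0 (the "blocks" are the depth layers).
- Each block has 2 rows [0, 1]; the second index (1) selects row 1: [4, 5, 6].
- Each row has 3 columns [0, 1, 2]; the third index (2) selects column 2: the element 6.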
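The same selection can be written out one axis at a time, which makes each step visible. This sketch rebuilds `arr3` with the values used above; `block`, `row`, and `element` are illustrative names.

```python
import numpy as np

arr3 = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])

block = arr3[0]    # first axis: block 0, shape (2, 3)
row = block[1]     # second axis: row 1 of that block
element = row[2]   # third axis: column 2 of that row

print(block.shape)                   # (2, 3)
print(row)                           # [4 5 6]
print(element, arr3[0, 1, 2])        # 6 6
```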

#### Fancy Indexing & Boolean Masking

In [155]:
# select multiple elements by passing a list of indices
indices = [0, 1, 2]
print(arr1[indices])

[1 3 4]


In [157]:
# Boolean Masking (Filter Data)
mask = arr1 > 3        # True where the value is greater than 3
print(arr1[mask])      # keeps only the elements where mask is True

[4 5 6]
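Masks can be combined with `&` (and) and `|` (or), and they also work on the left-hand side of an assignment. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons. A short sketch using the same values as `arr1`:

```python
import numpy as np

values = np.array([1, 3, 4, 5, 6])

between = values[(values > 2) & (values < 6)]  # both conditions must hold
outside = values[(values < 3) | (values > 5)]  # either condition may hold
print(between)  # [3 4 5]
print(outside)  # [1 6]

# Assigning through a mask changes only the selected elements.
capped = values.copy()
capped[capped > 4] = 4
print(capped)   # [1 3 4 4 4]
```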


#### Slicing
Basic slicing returns a view (not a copy): changes to the slice affect the original array unless you call .copy(). Fancy indexing and boolean masking, by contrast, return copies.
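Whether a result is a view or a copy can be checked with `np.shares_memory`. The sketch below compares a basic slice with fancy indexing and boolean masking on an illustrative array named `data`.

```python
import numpy as np

data = np.arange(10)

basic = data[2:5]          # basic slice
fancy = data[[2, 3, 4]]    # fancy indexing with a list of indices
masked = data[data > 6]    # boolean masking

print(np.shares_memory(data, basic))   # True: a view onto data
print(np.shares_memory(data, fancy))   # False: an independent copy
print(np.shares_memory(data, masked))  # False: an independent copy

fancy[0] = -1              # leaves data untouched
print(data[2])             # 2
basic[0] = -1              # writes through to data[2]
print(data[2])             # -1
```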


In [162]:
# slicing: arr[start:stop:step] (stop is exclusive)
arr = np.array([1,2,3,4,5,6,7])
print(arr[1:5])  # indices 1 to 4
print(arr[:6]) # first 6 elements
print(arr[::2])  # step = 2

[2 3 4 5]
[1 2 3 4 5 6]
[1 3 5 7]


In [165]:
# changes to a slice affect the original array
arr2 = arr[:6]
arr2[3] = 123654
print(f"arr = {arr[3]}, arr2 = {arr2[3]}")  # affect the original

arr = 123654, arr2 = 123654


In [167]:
# use copy
arr3 = arr2.copy()
arr3[2] = 111111
print(f"arr3 = {arr3[2]}, arr2 = {arr2[2]}") # unaffected 

arr3 = 111111, arr2 = 3


In [170]:
# 2D slicing: arr_2d[row_slice, column_slice]
arr_2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr_2d[:, 1])     # [2 5 8] → all rows, column 1
print(arr_2d[1, :])     # [4 5 6] → row 1, all columns
print(arr_2d[0:2, 0:2]) # [[1 2]
                        #  [4 5]]

[2 5 8]
[4 5 6]
[[1 2]
 [4 5]]


## What is Vectorization?

- **Vectorization** means performing operations on entire arrays without using Python loops.
- The loop still runs, but inside NumPy's compiled C code rather than in the Python interpreter, which is where the speed boost comes from.
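Vectorization is not limited to arithmetic. A loop with an if/else inside can often be replaced by `np.where`, as in the sketch below. The `readings` data is invented for illustration; the loop version is kept only to show that both produce the same result.

```python
import numpy as np

readings = [3, -1, 4, -1, 5, -9]

# Python loop: replace negative readings with 0, one element at a time
cleaned_loop = [r if r >= 0 else 0 for r in readings]

# Vectorized: the condition and the selection are applied to the whole array
arr = np.array(readings)
cleaned = np.where(arr >= 0, arr, 0)

print(cleaned_loop)  # [3, 0, 4, 0, 5, 0]
print(cleaned)       # [3 0 4 0 5 0]
```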

In [172]:
# Traditional Python way:
import time
size = 10_000_000
a = list(range(size))
b = list(range(size))
start = time.time()
c = [a[i] + b[i] for i in range(size)]  # slow
stop = time.time()
print(f"time Taken: {stop - start}")

time Taken: 0.9273388385772705


In [173]:
# using Vectorization
d = np.array(a)
e = np.array(b)
start = time.time()
f = d + e   # fast
stop = time.time()
print(f"time Taken: {stop - start}")

time Taken: 0.05406999588012695


## What is Broadcasting?

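**Broadcasting** in NumPy allows us to perform arithmetic operations on arrays of different shapes without reshaping or copying them manually. Shapes are compared from the trailing dimension backwards; two dimensions are compatible when they are equal or when one of them is 1.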


In [176]:
# add 10 to every element
arr = np.array([1, 2, 3, 4, 5])
print(arr + 10)

[11 12 13 14 15]


In [177]:
# add 1D + 2D array
arr = np.array([[1, 2, 3],
              [4, 5, 6]])
brr = np.array([10, 20, 30])

print(arr + brr)

[[11 22 33]
 [14 25 36]]


In [178]:
# add a (2, 1) column vector: 10 is added to every element of row 0, 20 to row 1
arr = np.array([[1, 2, 3],
              [4, 5, 6]])
brr = np.array([[10],
              [20]])

print(arr + brr)

[[11 12 13]
 [24 25 26]]
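Whether two shapes are compatible can be checked without building the arrays, using `np.broadcast_shapes` (available in NumPy 1.20 and later). The sketch below reuses the shapes from the examples above and adds one incompatible pair to show the error that results.

```python
import numpy as np

# (2, 3) with (3,): trailing dimensions match (3 and 3)
print(np.broadcast_shapes((2, 3), (3,)))    # (2, 3)

# (2, 3) with (2, 1): the 1 is stretched to 3
print(np.broadcast_shapes((2, 3), (2, 1)))  # (2, 3)

# (2, 3) with (2,): trailing dimensions are 3 and 2, neither is 1
failed = False
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as error:
    failed = True
    print(error)
```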


## Aggregation & Reduction Operations

- **np.sum**	-> Sum of elements
- **np.mean**	-> Mean (average)
- **np.min**	-> Minimum value
- **np.max**	-> Maximum value
- **np.std**	-> Standard deviation
- **np.var**	-> Variance
- **np.prod**	-> Product of all elements
- **np.argmin**	-> Index of minimum value
- **np.argmax**	-> Index of maximum value

In [188]:
arr = np.array([1,2,2,3,4,4,6,7,8,9])

# sum
print(f"sum : {np.sum(arr)}")

# mean
print(f"mean : {np.mean(arr)}")

# min
print(f"minimum value : {np.min(arr)}")

# max
print(f"maximum value : {np.max(arr)}")

# std
print(f"standard deviation : {np.std(arr)}")

# var
print(f"variance : {np.var(arr)}")

# prod
print(f"prod : {np.prod(arr)}")

# argmin
print(f"min value index : {np.argmin(arr)}")

# argmax
print(f"max value index : {np.argmax(arr)}")


sum : 46
mean : 4.6
minimum value : 1
maximum value : 9
standard deviation : 2.6153393661244038
variance : 6.839999999999999
prod : 580608
min value index : 0
max value index : 9
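On arrays with more than one dimension, these functions accept an `axis` argument that controls which dimension is reduced. The sketch below uses a small illustrative 2D array named `scores`.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

total = np.sum(scores)                # all elements
column_sums = np.sum(scores, axis=0)  # collapse the rows: one sum per column
row_sums = np.sum(scores, axis=1)     # collapse the columns: one sum per row
row_means = np.mean(scores, axis=1)

print(total)        # 21
print(column_sums)  # [5 7 9]
print(row_sums)     # [ 6 15]
print(row_means)    # [2. 5.]

# Without axis, argmax returns an index into the flattened array;
# np.unravel_index converts it back to (row, column).
flat_index = np.argmax(scores)
position = np.unravel_index(flat_index, scores.shape)
print(flat_index, position)
```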
