# Data Types & Broadcasting in NumPy

In [1]:
import numpy as np

In [2]:
arr = np.arange(0,101)
arr

array([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
        26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
        39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
        52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
        65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
        78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
        91,  92,  93,  94,  95,  96,  97,  98,  99, 100])

In [3]:
arr.dtype

dtype('int64')

In [4]:
arr1 = arr
arr1.astype("float64")

array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,
        22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.,
        33.,  34.,  35.,  36.,  37.,  38.,  39.,  40.,  41.,  42.,  43.,
        44.,  45.,  46.,  47.,  48.,  49.,  50.,  51.,  52.,  53.,  54.,
        55.,  56.,  57.,  58.,  59.,  60.,  61.,  62.,  63.,  64.,  65.,
        66.,  67.,  68.,  69.,  70.,  71.,  72.,  73.,  74.,  75.,  76.,
        77.,  78.,  79.,  80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,
        88.,  89.,  90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,
        99., 100.])
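
Note that the cell above only displays the converted array; `astype` does not change the original in place. A minimal sketch of that behavior, where `arr_float` is a name introduced here for illustration and the dtype is passed explicitly so the result does not depend on the platform default:

```python
import numpy as np

arr = np.arange(0, 101, dtype=np.int64)

# astype returns a new array; the original keeps its dtype.
arr_float = arr.astype(np.float64)

print(arr.dtype)        # int64
print(arr_float.dtype)  # float64

# The converted array owns its own buffer, so the two do not share memory.
print(np.shares_memory(arr, arr_float))  # False
```

To keep the converted values, assign the result to a name, as done with `arr_float` here.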

## Downcasting to Save Memory

Downcasting reduces memory use by converting an array to a data type with fewer bits, such as from $\text{int64}$ (8 bytes per element) to $\text{int32}$ (4 bytes per element), which halves the array's size. It is safe only when every value fits in the smaller type's range; for $\text{int32}$ that range is $-2^{31}$ to $2^{31}-1$ (about ±2.1 billion). A smaller footprint can also speed up loading and processing, because more of the data fits in the CPU cache and less memory has to be moved. Values outside the new type's range are not preserved, so check the minimum and maximum before converting.
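
One way to enforce the "values must fit" condition is to compare the array's minimum and maximum against the limits reported by `np.iinfo` before casting. The sketch below is only an illustration; `safe_downcast` is a hypothetical helper, not a NumPy function.

```python
import numpy as np

def safe_downcast(arr, target=np.int32):
    """Hypothetical helper: cast to `target` only if every value fits."""
    info = np.iinfo(target)  # integer limits of the target type
    if arr.size and (arr.min() < info.min or arr.max() > info.max):
        raise OverflowError(f"values do not fit in {np.dtype(target).name}")
    return arr.astype(target)

ok = np.array([100000, 200000, 300000], dtype=np.int64)
small = safe_downcast(ok)
print(small.dtype, small.nbytes)  # int32 12

# 3 billion exceeds the int32 maximum (2147483647), so the helper refuses.
too_big = np.array([3_000_000_000], dtype=np.int64)
try:
    safe_downcast(too_big)
except OverflowError as exc:
    print("refused:", exc)
```

A plain `astype(np.int32)` on out-of-range integers typically wraps around silently instead of raising, which is why an explicit check like this is worth the extra lines.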

In [5]:
arr_large = np.array([100000, 200000, 300000], dtype=np.int64)
arr_small = arr_large.astype(np.int32)  # Downcast to a smaller integer type
print(arr_small)
print(arr_small.dtype)

[100000 200000 300000]
int32


In [6]:
print(f"Small: {arr_small.nbytes}")
print(f"Large: {arr_large.nbytes}")

Small: 12
Large: 24


In [7]:
arr = np.array([1,2,3,4,5])
result = arr ** 2  # Vectorized operation: squares every element
print(result)

[ 1  4  9 16 25]


In [8]:
result + 10

array([11, 14, 19, 26, 35])

## Broadcasting with a 2D Array

In [9]:
arr1 = np.array([[1,2,3], [4,5,6]])
arr2 = np.array([1,2,3])
result1 = arr1 + arr2
print(result1)

[[2 4 6]
 [5 7 9]]
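
The addition above works because NumPy compares shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or when one of them is 1 (a missing dimension counts as 1). A short sketch of the rule, assuming NumPy 1.20 or newer for `np.broadcast_shapes`; the names `row`, `col`, and `bad` are introduced here for illustration:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([1, 2, 3])                # shape (3,)   -> treated as (1, 3)
col = np.array([[10], [20]])             # shape (2, 1)

# Shape of the result without performing the operation.
print(np.broadcast_shapes(arr1.shape, row.shape))  # (2, 3)

# A column vector is stretched across the columns instead of the rows.
print(arr1 + col)
# [[11 12 13]
#  [24 25 26]]

# Shapes (2, 3) and (2,) disagree in the trailing dimension (3 vs 2).
bad = np.array([1, 2])
try:
    arr1 + bad
except ValueError as exc:
    print("cannot broadcast:", exc)
```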


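## Normalizing Data Using Broadcasting

Imagine we have a dataset where each row represents a sample and each column represents a feature. You can normalize the data by subtracting the mean of each column and dividing by that column's standard deviation. Because the mean and standard deviation have shape (3,) and the data has shape (5, 3), broadcasting applies them to every row.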

In [10]:
data = np.array([[10,20,30],
                 [15,25,35],
                 [20,30,40],
                 [25,35,45],
                 [30,40,50]])

mean = data.mean(axis=0)
std = data.std(axis=0)

print(f"Data: {data}")
print(f"Mean of the data: {mean}")
print(f"Std Deviation of the data: {std}")

# Use parentheses, not square brackets: brackets would wrap the result
# in a list and add an unwanted extra dimension.
normalized_data = (data - mean) / std
print(f"Normalized Data: {normalized_data}")


Data: [[10 20 30]
 [15 25 35]
 [20 30 40]
 [25 35 45]
 [30 40 50]]
Mean of the data: [20. 30. 40.]
Std Deviation of the data: [7.07106781 7.07106781 7.07106781]
Normalized Data: [[-1.41421356 -1.41421356 -1.41421356]
 [-0.70710678 -0.70710678 -0.70710678]
 [ 0.          0.          0.        ]
 [ 0.70710678  0.70710678  0.70710678]
 [ 1.41421356  1.41421356  1.41421356]]


## Summary
Broadcasting in NumPy lets arrays of different but compatible shapes be combined in arithmetic operations without explicit Python loops and without materializing expanded copies of the smaller array. Python loops are slow because the interpreter handles one element at a time, while NumPy's compiled C code operates on whole blocks of data. Broadcasting builds on this vectorization, so operations over large arrays run efficiently. In real-world data science it is essential for tasks like normalizing datasets, where the per-column mean and standard deviation are applied to millions of values without first tiling them to the full shape of the data. The result is still a new array, but no intermediate copies of the statistics are needed, which keeps both memory use and run time low.
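
The point about avoiding copies can be inspected with `np.broadcast_to`, which exposes the stretched array that broadcasting uses conceptually. This is a small sketch; the `mean` values are taken from the normalization example and `view` is a name introduced here:

```python
import numpy as np

mean = np.array([20.0, 30.0, 40.0])    # shape (3,), float64

# Read-only view that looks like 5 identical rows.
view = np.broadcast_to(mean, (5, 3))

print(view.shape)    # (5, 3)
# A stride of 0 along the first axis means every row reuses the same 24 bytes.
print(view.strides)  # (0, 8)
print(np.shares_memory(mean, view))  # True
```

The smaller operand is never physically repeated; only the output of an operation such as subtracting `mean` from the data allocates new memory.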