# NumPy Comprehensive Guide

This notebook covers essential NumPy concepts with detailed explanations and examples.

In [1]:
# Basic print statement to verify execution
print("hello")

hello


In [2]:
# Import NumPy and check version
import numpy as np
print(np.__version__)

2.4.1


## Creating NumPy Arrays

NumPy arrays (`ndarray` objects) are the core data structure in NumPy. Each array is homogeneous: every element has the same data type (dtype).

In [3]:
# Creating a 1-dimensional array from a Python list
arr = np.array([1,2,3,4,5])
print(arr)

[1 2 3 4 5]


In [4]:
# Check the type of the array
type(arr)

numpy.ndarray

In [5]:
# Creating a 2-dimensional array (matrix)
arr2 = np.array([[1,2,3], [4,5,6]])
print(arr2)
arr2

[[1 2 3]
 [4 5 6]]


array([[1, 2, 3],
       [4, 5, 6]])
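
Because an array holds a single dtype, mixed inputs are converted to one common type. The following is a minimal sketch of that behaviour; the variable names are illustrative and not part of the notebook above.

```python
import numpy as np

# Mixing ints and a float upcasts every element to float64
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)        # float64

# An explicit dtype overrides the inferred one
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)   # float32
```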

## Array Creation Functions

NumPy provides various functions to create arrays with specific patterns or values.

In [6]:
# Create an array filled with zeros
arr3 = np.zeros((2,3))
print(arr3)

[[0. 0. 0.]
 [0. 0. 0.]]


In [7]:
# Create an array filled with ones
arr4 = np.ones((3,4))
print(arr4)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


In [8]:
# Create an identity matrix
arr5 = np.identity(4)
print(arr5)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


In [9]:
# Create an array of evenly spaced values using a fixed step (the stop value is excluded)
arr6 = np.arange(1,10,2)  # Start=1, Stop=10 (exclusive), Step=2
print(arr6)

[1 3 5 7 9]


In [10]:
# Create a fixed number of evenly spaced values over an interval (both endpoints included)
arr7 = np.linspace(10,20,5)  # Start=10, End=20 (inclusive), Number of elements=5
print(arr7)

# Formula: step = (end - start) / (n - 1)
# Values generated: 10.0, 12.5, 15.0, 17.5, 20.0

[10.  12.5 15.  17.5 20. ]


In [11]:
# Create a copy of an array
arr8 = arr7.copy()
print(arr8)

[10.  12.5 15.  17.5 20. ]
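
The reason `copy()` matters is that plain assignment does not duplicate the data. A small sketch of the difference, using illustrative names (`original`, `alias`, `independent`) that do not appear in the notebook:

```python
import numpy as np

original = np.linspace(10, 20, 5)

# Plain assignment only creates a second name for the same array
alias = original
# copy() allocates new memory for the data
independent = original.copy()

independent[0] = -1.0
print(original[0])   # 10.0, the copy does not affect the original

alias[0] = 99.0
print(original[0])   # 99.0, the alias does
```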


## Array Properties

Understanding the properties of NumPy arrays is crucial for effective array manipulation.

In [12]:
arr

array([1, 2, 3, 4, 5])

In [13]:
# Shape of the array - returns tuple showing size of each dimension
arr.shape

(5,)

In [14]:
arr2.shape

(2, 3)

In [15]:
# Total number of elements in the array
arr2.size

6

In [16]:
arr3.size

6

In [17]:
# Creating a 3-dimensional array
arr9 = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
arr9

array([[[1, 2],
        [3, 4]],

       [[5, 6],
        [7, 8]]])

In [18]:
# Shape of 3D array
arr9.shape

(2, 2, 2)

In [19]:
# Number of dimensions in the array
arr9.ndim

3

In [20]:
arr2.ndim

2

In [21]:
arr.ndim

1

In [22]:
# Total number of elements
arr9.size

8

In [23]:
# Size in bytes of each element
arr9.itemsize

8

In [24]:
# Data type of the array elements
arr8.dtype

dtype('float64')

In [25]:
arr9.dtype

dtype('int64')

In [26]:
# Converting data type of array elements
arr9.astype('float')

array([[[1., 2.],
        [3., 4.]],

       [[5., 6.],
        [7., 8.]]])

In [27]:
arr9.astype('float32')

array([[[1., 2.],
        [3., 4.]],

       [[5., 6.],
        [7., 8.]]], dtype=float32)
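
Note that `astype` returns a new array rather than changing the original in place. The sketch below illustrates this along with the effect on memory use; the array `ints` is a stand-in chosen for the example.

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]], dtype=np.int64)

# astype returns a new array; the original keeps its dtype
floats = ints.astype(np.float32)
print(ints.dtype, floats.dtype)    # int64 float32

# nbytes equals itemsize * size
print(ints.nbytes, floats.nbytes)  # 32 16

# Converting floats to ints truncates toward zero
print(np.array([1.9, -1.9]).astype(np.int64))   # [ 1 -1]
```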

## NumPy Arrays vs Python Lists

NumPy arrays store their elements in one contiguous, fixed-type block of memory, so they typically use less memory than Python lists and support fast vectorized operations.

In [28]:
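# Memory comparison: Python lists vs NumPy arrays
import sys

lista = list(range(100))  # build an actual list; range() alone is a lazy sequence
arr11 = np.arange(100)

# Rough estimate: size of one int object times the number of elements
# (this ignores the list's own array of pointers)
print(sys.getsizeof(5) * len(lista))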

2800


In [29]:
print(arr11.itemsize * arr11.size)  # Memory used by the NumPy array's data (same value as arr11.nbytes)

800
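
A somewhat fuller estimate can be sketched by adding the list container itself to the size of each element. This is only an approximation, since CPython shares small integer objects and exact sizes vary by Python version and platform. The names below are illustrative.

```python
import sys
import numpy as np

n = 100
py_list = list(range(n))
np_arr = np.arange(n)

# List container (pointer table) plus every int object it references
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# Raw data buffer of the array (excludes the small ndarray header)
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```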


In [30]:
# Performance comparison: Element-wise operations
import time

x = list(range(100000))
y = list(range(100000,200000))

start_time = time.time()
c = [a+b for a,b in zip(x,y)]  # distinct loop names avoid shadowing x and y
print("Time taken using Python list:", time.time() - start_time)

Time taken using Python list: 0.0039968490600585938


In [31]:
import numpy as np
import time

x = np.arange(100000)
y = np.arange(100000,200000)

start_time = time.time()
c = x + y
print("Time taken NumPy:", time.time() - start_time)

Time taken NumPy: 0.001003265380859375
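
A single `time.time()` measurement is noisy, so the figures above should be read as rough indications. One possible way to get steadier numbers is the standard-library `timeit` module, sketched here with helper functions (`add_lists`, `add_arrays`) that are assumptions for this example. Actual timings depend on the machine, so no expected values are shown.

```python
import timeit
import numpy as np

n = 100_000
list_x, list_y = list(range(n)), list(range(n, 2 * n))
arr_x, arr_y = np.arange(n), np.arange(n, 2 * n)

def add_lists():
    # Pure-Python element-wise addition
    return [a + b for a, b in zip(list_x, list_y)]

def add_arrays():
    # Vectorized addition executed inside NumPy
    return arr_x + arr_y

# Take the best of several repeats to reduce noise from other processes
list_time = min(timeit.repeat(add_lists, number=10, repeat=5))
array_time = min(timeit.repeat(add_arrays, number=10, repeat=5))

print(f"list: {list_time:.4f}s  numpy: {array_time:.4f}s")
```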


## Indexing, Slicing and Iteration

NumPy provides powerful indexing and slicing capabilities for accessing array elements.

In [32]:
# Create a 6x4 array with values from 0 to 23
arr12 = np.arange(24).reshape(6,4)
arr12

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [33]:
arr

array([1, 2, 3, 4, 5])

In [34]:
# Accessing individual elements
arr[3]

4

In [35]:
# Slicing to get a range of elements
arr[2:3]

array([3])

In [36]:
# Negative indexing
arr[-1]

5

In [37]:
# Accessing a row in 2D array
arr12[2]

array([ 8,  9, 10, 11])

In [38]:
# Slicing rows
arr12[:2]

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [39]:
# Slicing rows and columns
arr12[2:3,0:2]

array([[8, 9]])

In [40]:
# Accessing entire column
arr12[:,2]

array([ 2,  6, 10, 14, 18, 22])

In [41]:
# Mixed indexing and slicing
arr12[3,1:3]

array([13, 14])

In [42]:
# Slicing sub-blocks
arr12[4:6,2:4]

array([[18, 19],
       [22, 23]])
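
One detail worth keeping in mind is that basic slices are views onto the original data, not copies. A short sketch, using an illustrative array named `grid`:

```python
import numpy as np

grid = np.arange(24).reshape(6, 4)

# Basic slicing returns a view onto the same memory
block = grid[4:6, 2:4]
block[:] = 0
print(grid[4:6])     # the last two columns of rows 4 and 5 are now 0

# Use copy() when the original must stay unchanged
safe = grid[:2].copy()
safe[:] = -1
print(grid[:2])      # still the original values
```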

In [43]:
# Iterating over rows
for i in arr12:
    print(i)

[0 1 2 3]
[4 5 6 7]
[ 8  9 10 11]
[12 13 14 15]
[16 17 18 19]
[20 21 22 23]


In [44]:
# Iterating over all elements
for i in np.nditer(arr12):
    print(i)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
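
When the position of each element is needed as well as its value, `np.ndenumerate` may be a convenient alternative. The sketch below uses a smaller illustrative array; explicit Python loops like these give up the speed of vectorized operations, so they are best kept for small arrays or debugging.

```python
import numpy as np

small = np.arange(6).reshape(2, 3)

# ndenumerate yields (index_tuple, value) pairs in row-major order
pairs = [(idx, int(val)) for idx, val in np.ndenumerate(small)]
print(pairs[:3])   # [((0, 0), 0), ((0, 1), 1), ((0, 2), 2)]

# flat is a 1-D iterator over the same elements
flat_values = [int(v) for v in small.flat]
print(flat_values)  # [0, 1, 2, 3, 4, 5]
```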


## Array Operations

NumPy supports vectorized operations, which apply to whole arrays at once and are typically much faster than equivalent Python loops over lists.

In [45]:
import numpy as np
arr1 = np.array([1,2,3,4,5,6])
arr2 = np.array([4,5,6,7,8,9])

In [46]:
# Element-wise subtraction
arr1 - arr2

array([-3, -3, -3, -3, -3, -3])

In [47]:
# Element-wise addition
arr1 + arr2  # vectorized addition

array([ 5,  7,  9, 11, 13, 15])

In [48]:
# Element-wise multiplication
arr1 * arr2  # vectorized multiplication

array([ 4, 10, 18, 28, 40, 54])

In [49]:
# Scalar multiplication
arr1*2  # scalar multiplication

array([ 2,  4,  6,  8, 10, 12])

In [50]:
# Boolean operations
arr2>3  # returns boolean array

array([ True,  True,  True,  True,  True,  True])

In [51]:
# Matrix multiplication
arr3 = np.arange(6).reshape(2,3)
arr4 = np.arange(6,12).reshape(3,2)

In [52]:
arr3.dot(arr4)  # matrix multiplication

array([[ 28,  31],
       [100, 112]])
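
For 2-D arrays, the `@` operator gives the same result as `dot`. The sketch below rebuilds the two matrices under illustrative names and also shows what happens when the inner dimensions do not match.

```python
import numpy as np

left = np.arange(6).reshape(2, 3)       # shape (2, 3)
right = np.arange(6, 12).reshape(3, 2)  # shape (3, 2)

# For 2-D arrays, @ (np.matmul) and dot give the same product
product = left @ right
print(product)
# [[ 28  31]
#  [100 112]]

# Inner dimensions must match: (2, 3) @ (2, 3) raises ValueError
try:
    left @ left
except ValueError as exc:
    print("shape mismatch:", exc)
```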

## Statistical Operations

NumPy provides built-in functions for statistical analysis of data, along with element-wise mathematical functions such as `np.sin` and `np.exp`.

In [53]:
arr4.max()

11

In [54]:
arr4.min()

6

In [55]:
arr4.min(axis=0)  # column wise min

array([6, 7])

In [56]:
arr4.max(axis=1)  # row wise max

array([ 7,  9, 11])

In [57]:
arr4.sum(axis=0)  # column wise sum

array([24, 27])

In [58]:
arr4.mean()

8.5

In [59]:
arr4.mean(axis=1)

array([ 6.5,  8.5, 10.5])

In [60]:
arr4.std()  # standard deviation

1.707825127659933
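
By default `std()` computes the population standard deviation (dividing by N). A brief sketch of the `ddof` and `keepdims` parameters follows; the array `data` simply repeats the values of `arr4` under an illustrative name.

```python
import numpy as np

data = np.arange(6, 12).reshape(3, 2)

# Default ddof=0: population standard deviation (divide by N)
print(data.std())

# ddof=1: sample standard deviation (divide by N - 1)
print(data.std(ddof=1))

# keepdims=True keeps the reduced axis so the result broadcasts back
centered = data - data.mean(axis=1, keepdims=True)
print(centered)
```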

In [61]:
np.sin(arr4)

array([[-0.2794155 ,  0.6569866 ],
       [ 0.98935825,  0.41211849],
       [-0.54402111, -0.99999021]])

In [62]:
np.median(arr4)

8.5

In [63]:
np.exp(arr4)

array([[  403.42879349,  1096.63315843],
       [ 2980.95798704,  8103.08392758],
       [22026.46579481, 59874.1417152 ]])

## Array Reshaping

Reshaping arrays is a common operation in data processing and analysis.

In [64]:
arr4.ndim

2

In [65]:
arr4.ravel()  # flatten the array to 1D

array([ 6,  7,  8,  9, 10, 11])

In [66]:
arr4.T  # transpose of the array

array([[ 6,  8, 10],
       [ 7,  9, 11]])

In [67]:
arr4.transpose()

array([[ 6,  8, 10],
       [ 7,  9, 11]])

In [68]:
arr5 = np.arange(12,18).reshape(2,3)

In [69]:
np.hstack((arr3, arr5))  # horizontal stacking

array([[ 0,  1,  2, 12, 13, 14],
       [ 3,  4,  5, 15, 16, 17]])

In [70]:
np.vstack((arr3, arr5))  # vertical stacking

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [12, 13, 14],
       [15, 16, 17]])
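
A few related details can be sketched as follows: `reshape` accepts `-1` to infer a dimension, `ravel` returns a view where the memory layout allows it while `flatten` always copies, and for 2-D inputs `hstack` and `vstack` correspond to `np.concatenate` along axis 1 and axis 0. The names `m`, `a`, and `b` are illustrative.

```python
import numpy as np

m = np.arange(6, 12).reshape(3, 2)

# -1 lets NumPy infer one dimension from the total size
print(m.reshape(2, -1).shape)             # (2, 3)

# ravel returns a view when possible; flatten always copies
print(np.shares_memory(m, m.ravel()))     # True
print(np.shares_memory(m, m.flatten()))   # False

# hstack / vstack match concatenate along axis 1 / axis 0 for 2-D input
a = np.arange(6).reshape(2, 3)
b = np.arange(12, 18).reshape(2, 3)
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
print(same_h, same_v)                     # True True
```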

## Fancy Indexing

Fancy indexing allows using arrays of indices to access multiple array elements simultaneously.

In [71]:
arr8 = np.arange(24).reshape(4,6)
arr8

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

In [72]:
arr8[[0,1]]  # Select rows 0 and 1

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [73]:
arr8[:, [0,2,5]]  # Select columns 0, 2, and 5

array([[ 0,  2,  5],
       [ 6,  8, 11],
       [12, 14, 17],
       [18, 20, 23]])
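
When two index lists are supplied together, NumPy pairs them element by element rather than selecting a sub-block. The sketch below contrasts that with `np.ix_`, and shows that fancy indexing returns a copy; `table` is an illustrative name for the same 4x6 array.

```python
import numpy as np

table = np.arange(24).reshape(4, 6)

# Two index lists are paired element-wise: picks (0, 2) and (1, 3)
print(table[[0, 1], [2, 3]])            # [2 9]

# np.ix_ builds the cross product instead: rows 0,1 x columns 2,3
print(table[np.ix_([0, 1], [2, 3])])
# [[2 3]
#  [8 9]]

# Fancy indexing returns a copy, so edits do not reach the original
picked = table[[0, 1]]
picked[:] = -1
print(table[0, 0])                      # 0
```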

## Boolean Indexing

Boolean indexing allows selecting array elements based on conditions.

In [74]:
arr13 = np.random.randint(low=1, high=100, size=20).reshape(4,5)
arr13

array([[53, 26, 22, 35, 71],
       [47, 29, 77,  5, 22],
       [ 5, 48, 48, 40, 55],
       [45, 37, 42, 90,  3]], dtype=int32)

In [75]:
arr13>50  # Boolean mask for values greater than 50

array([[ True, False, False, False,  True],
       [False, False,  True, False, False],
       [False, False, False, False,  True],
       [False, False, False,  True, False]])

In [76]:
(arr13>50).shape

(4, 5)

In [77]:
arr13[arr13>50]  # Extract values greater than 50

array([53, 71, 77, 55, 90], dtype=int32)

In [78]:
arr13[(arr13>50) & (arr13%2!=0)]  # Extract odd values greater than 50

array([53, 71, 77, 55], dtype=int32)

In [79]:
arr13[(arr13>50) & (arr13%2!=0)] = 0  # Set odd values greater than 50 to 0

In [80]:
arr13

array([[ 0, 26, 22, 35,  0],
       [47, 29,  0,  5, 22],
       [ 5, 48, 48, 40,  0],
       [45, 37, 42, 90,  3]], dtype=int32)
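
Assigning through a mask changes the array in place. If the original should be preserved, `np.where` is one alternative, sketched here on the first two rows of the sample output above (copied into an illustrative array `values`, since the random data will differ on each run). The parentheses around each condition are required because `&` binds more tightly than the comparison operators.

```python
import numpy as np

values = np.array([[53, 26, 22, 35, 71],
                   [47, 29, 77,  5, 22]])

mask = (values > 50) & (values % 2 != 0)

# np.where builds a new array and leaves `values` untouched
replaced = np.where(mask, 0, values)
print(replaced)
# [[ 0 26 22 35  0]
#  [47 29  0  5 22]]

# Count matches without extracting them
print(mask.sum())   # 3
```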

## Visualization with Matplotlib

NumPy arrays work seamlessly with visualization libraries like Matplotlib.

In [81]:
x = np.linspace(-40,40,100)

In [82]:
y = np.sin(x)

In [83]:
import matplotlib.pyplot as plt
%matplotlib inline

In [84]:
plt.plot(x,y)

[<matplotlib.lines.Line2D at 0x7f0c04065a90>]

In [85]:
y = x*x+2*x+6
plt.plot(x,y)

[<matplotlib.lines.Line2D at 0x7f0c04054810>]

## Broadcasting

Broadcasting is a powerful feature that allows NumPy to work with arrays of different shapes during arithmetic operations.

In [86]:
# Broadcasting is NumPy's ability to combine arrays of different shapes in arithmetic operations.
# Arithmetic on arrays is normally performed on corresponding elements,
# which works directly when both arrays have exactly the same shape.
# When the shapes differ, a plain element-to-element pairing is not possible.
# NumPy can still carry out the operation if the shapes are compatible:
# the smaller array is broadcast (virtually stretched) to the shape of the larger one.

In [87]:
a1=np.arange(8).reshape(2,4)
a2=np.arange(8,16).reshape(2,4)

print(a1)
print(a2)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [88]:
a1 + a2  # Same shape - normal addition

array([[ 8, 10, 12, 14],
       [16, 18, 20, 22]])

In [89]:
a3 = np.arange(9).reshape(3,3)
a4 = np.arange(3).reshape(1,3)

print(a3,a4)

[[0 1 2]
 [3 4 5]
 [6 7 8]] [[0 1 2]]


In [90]:
a3 + a4  # Broadcasting: a4 is broadcast to match a3's shape

array([[ 0,  2,  4],
       [ 3,  5,  7],
       [ 6,  8, 10]])

In [91]:
# Case A: Adding a scalar to an array
a = np.array([1, 2, 3])
b = 10
print(a + b) 
# Result: [11, 12, 13]

[11 12 13]


In [92]:
# Case B: Adding 1D array to 2D matrix
matrix = np.array([[1, 1, 1], 
                   [2, 2, 2]]) # Shape (2, 3)
row = np.array([10, 20, 30])    # Shape (3,)

print(matrix + row)

[[11 21 31]
 [12 22 32]]


## Rules for Broadcasting

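1. Rule 1: Same Shape - If arrays have the same shape, operations are performed element-wise.
2. Rule 2: One Dimension is 1 - If the arrays differ along a dimension and one of them has size 1 there, that array is stretched along the dimension to match the other. If the sizes differ and neither is 1, NumPy raises a ValueError.
3. Rule 3: Different Dimensions - If the arrays have a different number of dimensions, the shape of the array with fewer dimensions is padded with 1s on the left, and Rules 1 and 2 are then applied.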
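
The rules above can be sketched as a small pure-Python function that computes the resulting shape. The helper `broadcast_shape` is hypothetical and written only for illustration; NumPy's own `np.broadcast_shapes` is shown alongside it for comparison.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 3: left-pad the shorter shape with 1s
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            # Rule 1 (equal sizes) or Rule 2 (b is stretched)
            result.append(da)
        elif da == 1:
            # Rule 2 (a is stretched)
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))      # (2, 3)
print(broadcast_shape((4, 1), (4, 3)))    # (4, 3)
print(np.broadcast_shapes((2, 3), (3,)))  # (2, 3)
```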

In [93]:
# Rule 1: Same Shape
a1 = np.arange(8).reshape(2,4)
a2 = np.arange(8,16).reshape(2,4)

a1 + a2

array([[ 8, 10, 12, 14],
       [16, 18, 20, 22]])

In [94]:
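# Rule 2: One Dimension is 1
a5 = np.arange(3).reshape(1,3)
a6 = np.arange(12).reshape(4,3)

print(a5, a6)
print(a5 + a6)  # a5's single row is stretched across the 4 rows of a6

[[0 1 2]] [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[ 0  2  4]
 [ 3  5  7]
 [ 6  8 10]
 [ 9 11 13]]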


In [95]:
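# Rule 2 (column case): a7 has shape (4,1), so its single column is stretched across a8's 3 columns
# Both arrays are 2-D here; Rule 3 (different number of dimensions) is what applies in Case B above, (2,3) + (3,)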
a7 = np.arange(4).reshape(4,1)
a8 = np.arange(12).reshape(4,3)

print(a7)
print(a8)

print(a7 + a8)

[[0]
 [1]
 [2]
 [3]]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[ 0  1  2]
 [ 4  5  6]
 [ 8  9 10]
 [12 13 14]]
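
For a case where the number of dimensions really differs, and for a case where broadcasting fails, the following sketch may help. The names `grid`, `row`, and `col_values` are illustrative.

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)    # shape (4, 3)
row = np.array([10, 20, 30])          # shape (3,), treated as (1, 3)

print((grid + row).shape)             # (4, 3)

# A length-4 vector does not line up with the trailing axis of size 3
col_values = np.arange(4)             # shape (4,)
try:
    grid + col_values
except ValueError as exc:
    print("cannot broadcast:", exc)

# Adding an explicit axis gives a (4, 1) column, which does broadcast
column_sum = grid + col_values[:, np.newaxis]
print(column_sum)
```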


## Random Number Generation

NumPy provides various functions for generating random numbers.

In [96]:
np.random.random()  # Random float in the half-open interval [0, 1)

0.06296341447304132

In [97]:
np.random.seed(1)  # Set seed for reproducible results
np.random.random()

0.417022004702574

In [98]:
np.random.uniform(1, 100,10).reshape(2,5)  # 10 floats drawn uniformly from [1, 100)

array([[87.9336262 ,  3.71137173, 67.37628351, 42.31317543, 56.31029302],
       [14.89830692, 20.61204742, 80.2737123 , 96.857896  , 32.02899364]])

In [99]:
np.random.randint(1,10,15).reshape(3,5)  # Random integers from 1 to 9 (high=10 is exclusive)

array([[4, 7, 9, 1, 3],
       [8, 8, 8, 4, 1],
       [9, 8, 8, 2, 2]], dtype=int32)
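
The functions above use NumPy's global random state. NumPy also offers the `Generator` interface via `np.random.default_rng`, which keeps the state in an object instead. A minimal sketch, with the seed values chosen arbitrarily for illustration; the printed numbers depend on the seed and NumPy version, so none are shown.

```python
import numpy as np

# A Generator carries its own state instead of relying on the global seed
rng = np.random.default_rng(1)

single = rng.random()                          # float in [0, 1)
uniform = rng.uniform(1, 100, size=(2, 5))     # floats in [1, 100)
integers = rng.integers(1, 10, size=(3, 5))    # ints in [1, 10), high is exclusive
print(single, uniform.shape, integers.shape)

# The same seed reproduces the same stream
first = np.random.default_rng(42).random(3)
second = np.random.default_rng(42).random(3)
print(np.array_equal(first, second))           # True
```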

## Conclusion

This notebook covered the fundamental concepts of NumPy including:

1. Array creation and properties
2. Indexing, slicing, and iteration
3. Array operations and broadcasting
4. Statistical functions
5. Reshaping and stacking
6. Boolean and fancy indexing
7. Random number generation
8. Integration with visualization libraries

These concepts form the foundation for scientific computing with Python.