NumPy - Data Analysis Library in Python


NumPy is a fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. NumPy is widely used for data analysis, numerical computations, and as a foundation for other scientific libraries such as pandas, SciPy, and scikit-learn.

Key features of NumPy:

Powerful N-dimensional array object (ndarray)

Broadcasting functions for arithmetic operations

Tools for integrating C, C++, and Fortran code

Useful linear algebra, Fourier transform, and random number capabilities

In [None]:
# Installing the numpy package
# Run this command in a terminal:
#   pip install numpy

# Importing the numpy package
import numpy as np

# Creating a NumPy array
arr = np.array([1, 2, 3, 4, 5])
print(arr)

print(f"The type of the array is {type(arr)}") # checking the type of the numpy array

# Checking the shape of the array
print(f"the shape of the array is {arr.shape}")
# The shape is a tuple giving the size of the array along each dimension.
#  For a 1D array it is (n,), where n is the number of elements in the array.
#  For a 2D array it is (n, m), where n is the number of rows and m is the number of columns.




[1 2 3 4 5]
The type of the array is <class 'numpy.ndarray'>
the shape of the array is (5,)
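
The shape tuple described in the comments above can be inspected next to a few related attributes. The following is a minimal sketch; the names `vec` and `grid` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

vec = np.array([1, 2, 3, 4, 5])           # 1D array with 5 elements
grid = np.array([[1, 2, 3], [4, 5, 6]])   # 2D array: 2 rows, 3 columns

# shape: size along each dimension, ndim: number of dimensions, size: total elements
print(vec.shape, vec.ndim, vec.size)      # (5,) 1 5
print(grid.shape, grid.ndim, grid.size)   # (2, 3) 2 6

# dtype is the element type; the default integer width depends on the platform
print(vec.dtype)
```

Note that `size` always equals the product of the entries in `shape`.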


Reshaping the Array 

Reshaping a NumPy array means changing its shape (i.e., the number of rows and columns) without changing its data. For example, you can convert a 1D array into a 2D array or vice versa, as long as the total number of elements remains the same.

In [14]:
# Reshaping a 1D array
arr1 = np.array([1, 2, 3, 4, 5, 6, 7, 8])
arr2 = arr1.reshape(2, 4)  # 8 elements arranged as 2 rows and 4 columns
print(arr2)

print("-------------------------------")

# Creating a 2D array
arr3 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr3)
print()

# Checking the shape of the 2D array
print(f"The shape of 2D array is {arr3.shape}")  # (2, 3): 2 rows and 3 columns
print()

# Reshaping the 2D array into 3 rows and 2 columns
arr4 = arr3.reshape(3, 2)
print(arr4)


[[1 2 3 4]
 [5 6 7 8]]
-------------------------------
[[1 2 3]
 [4 5 6]]

The shape of 2D array is (2, 3)

[[1 2]
 [3 4]
 [5 6]]
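
The rule that the total number of elements must stay the same can be sketched as follows. This example assumes a 12-element array named `data` (an illustrative name) and uses `-1`, which asks NumPy to infer one dimension from the others.

```python
import numpy as np

data = np.arange(1, 13)          # 12 elements: 1, 2, ..., 12

# -1 lets NumPy infer the missing dimension: 12 elements / 3 rows = 4 columns
inferred = data.reshape(3, -1)
print(inferred.shape)            # (3, 4)

# reshape(-1) flattens back to a 1D array
flat = inferred.reshape(-1)
print(flat.shape)                # (12,)

# A shape whose element count differs (5 * 3 = 15, not 12) raises ValueError
try:
    data.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```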


Creating NumPy Arrays with Built-in Functions

Here are some common built-in functions in NumPy for creating arrays:

In [None]:
# Creating arrays with some built-in NumPy functions
zeros_arr = np.zeros((2, 3))         # 2D array (2 rows, 3 columns) filled with zeros
ones_arr = np.ones((3, 2))           # 2D array (3 rows, 2 columns) filled with ones

# Creating arrays with specific ranges and values
arange_arr = np.arange(0, 10, 2)     # values from 0 up to (but not including) 10, in steps of 2
linspace_arr = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, both endpoints included
eye_arr = np.eye(3)                  # 3x3 identity matrix

# Printing the arrays and their types
print("zeros_arr:\n", zeros_arr)
print("Type:", type(zeros_arr), "\n")

print("ones_arr:\n", ones_arr)
print("Type:", type(ones_arr), "\n")

print("arange_arr:\n", arange_arr)
print("Type:", type(arange_arr), "\n")

print("linspace_arr:\n", linspace_arr)
print("Type:", type(linspace_arr), "\n")

print("eye_arr:\n", eye_arr)
print("Type:", type(eye_arr), "\n")
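
One detail worth checking when using these functions is how `np.arange` and `np.linspace` treat the end of the range. The sketch below uses the same arguments as the cell above; the names `stepped` and `spaced` are illustrative.

```python
import numpy as np

stepped = np.arange(0, 10, 2)    # start, stop (excluded), step
spaced = np.linspace(0, 1, 5)    # start, stop (included), number of points

print(stepped)                   # [0 2 4 6 8]   (10 itself is not included)
print(spaced)                    # [0.   0.25 0.5  0.75 1.  ]

# zeros, ones, and eye produce floating-point arrays unless a dtype is passed
print(np.zeros((2, 3)).dtype)    # float64
print(np.eye(3, dtype=int))      # integer identity matrix
```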

Some Commonly Used NumPy Functions

In [17]:
# Some commonly used NumPy functions for data analysis.
# These functions perform statistical, mathematical, and structural operations efficiently on NumPy arrays.


arr = np.array([1, 2, 3, 4, 5, 6])

print(np.sum(arr))        # Sum of all elements
print(np.mean(arr))       # Mean (average) of elements
print(np.median(arr))     # Median value
print(np.std(arr))        # Standard deviation (population form, ddof=0 by default)
print(np.min(arr))        # Minimum value
print(np.max(arr))        # Maximum value
print(np.argmin(arr))     # Index of minimum value
print(np.argmax(arr))     # Index of maximum value
print(np.sort(arr))       # Sorted array
print(np.unique(arr))     # Unique elements



print("------------------------------")
# For 2D arrays
mat = np.array([[1, 2], [3, 4]])
print(np.dot(mat, mat))           # Matrix product for 2D inputs (same as mat @ mat)
print(np.transpose(mat))          # Transpose of the matrix
print(np.concatenate([arr, arr])) # Concatenate arrays end to end
print(arr.reshape(2, 3))          # Reshape the 6-element 1D array into 2 rows and 3 columns

21
3.5
3.5
1.707825127659933
1
6
0
5
[1 2 3 4 5 6]
[1 2 3 4 5 6]
------------------------------
[[ 7 10]
 [15 22]]
[[1 3]
 [2 4]]
[1 2 3 4 5 6 1 2 3 4 5 6]
[[1 2 3]
 [4 5 6]]
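
The standard deviation printed above (about 1.708) is the population form, which divides by N. As a hedged sketch, the `ddof` parameter of `np.std` switches to the sample form, which divides by N - 1; the names `values`, `population_std`, and `sample_std` are illustrative.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

population_std = np.std(values)        # ddof=0 (default): divide by N
sample_std = np.std(values, ddof=1)    # ddof=1: divide by N - 1

print(population_std)   # about 1.708
print(sample_std)       # about 1.871
```

Which form is appropriate depends on whether the array holds the whole population or a sample drawn from it.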


Vectorized Operations on NumPy Arrays


Vectorized operations allow you to perform element-wise computations on entire NumPy arrays without the need for explicit loops. This makes your code more concise, readable, and efficient. For example, you can add, subtract, multiply, or divide arrays directly, and NumPy will apply the operation to each element automatically. Vectorized operations leverage low-level optimizations, resulting in faster execution compared to traditional Python loops.
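
To make the contrast with explicit loops concrete, here is a minimal sketch that computes the same element-wise product both ways. The arrays `prices` and `quantities` are made-up illustrative data.

```python
import numpy as np

prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Explicit Python loop: one multiplication per iteration
totals_loop = []
for price, quantity in zip(prices, quantities):
    totals_loop.append(price * quantity)

# Vectorized form: a single expression applied to every element
totals_vec = prices * quantities

print(totals_vec)                                      # [10. 40. 90.]
print(np.array_equal(totals_vec, np.array(totals_loop)))  # True
```

Both versions give the same result; the vectorized one avoids per-element Python overhead, which matters most on large arrays.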

In [None]:
# Vectorized Operations on NumPy Arrays

# 1. Arithmetic Operations (element-wise)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

add = a + b           # Addition: array([5, 7, 9])
sub = a - b           # Subtraction: array([-3, -3, -3])
mul = a * b           # Multiplication: array([ 4, 10, 18])
div = a / b           # Division: array([0.25, 0.4 , 0.5 ])
exp = a ** 2          # Exponentiation: array([1, 4, 9])

print("Addition:", add)
print("Subtraction:", sub)
print("Multiplication:", mul)
print("Division:", div)
print("Exponentiation:", exp)

# 2. Universal Functions (ufuncs)
sqrt = np.sqrt(a)     # Square root: array([1.        , 1.41421356, 1.73205081])
log = np.log(a)       # Natural log: array([0.        , 0.69314718, 1.09861229])
sin = np.sin(a)       # Sine: array([0.84147098, 0.90929743, 0.14112001])

print("Square root:", sqrt)
print("Natural log:", log)
print("Sine:", sin)

# 3. Comparison Operations (element-wise)
comp = a > 2          # array([False, False,  True])
eq = a == b           # array([False, False, False])

print("Greater than 2:", comp)
print("Equal to b:", eq)

# 4. Aggregation Functions
sum_a = a.sum()       # Sum: 6
mean_a = a.mean()     # Mean: 2.0
max_a = a.max()       # Max: 3
min_a = a.min()       # Min: 1

print("Sum:", sum_a)
print("Mean:", mean_a)
print("Max:", max_a)
print("Min:", min_a)

# 5. Broadcasting
c = np.array([[1], [2], [3]])
d = np.array([4, 5, 6])
broadcasted_add = c + d  # c is (3, 1), d is (3,): result is (3, 3) with result[i, j] = c[i, 0] + d[j]

print("Broadcasted addition:\n", broadcasted_add)

# 6. Logical Operations
logical_and = np.logical_and(a > 1, b < 6)  # array([False,  True, False])
logical_or = np.logical_or(a < 2, b > 5)    # array([ True, False,  True])

print("Logical AND:", logical_and)
print("Logical OR:", logical_or)

# 7. In-place Operations
a += 1  # a is now array([2, 3, 4])
print("In-place addition:", a)

# Explanation:
# - Vectorized operations allow you to perform computations on entire arrays without explicit loops.
# - They are typically much faster and more concise than looping through elements in Python.
# - NumPy runs these operations in optimized, compiled C code, which is what makes them efficient.
# - Broadcasting lets you combine arrays of different but compatible shapes: dimensions of size 1
#   (and missing leading dimensions) are stretched to match, without copying the data.
# - Universal functions (ufuncs) are functions that operate element-wise on arrays (e.g., np.sqrt, np.exp).
# - Aggregation functions compute summary statistics (sum, mean, min, max, etc.).
# - Logical and comparison operations return boolean arrays for filtering or conditional logic.
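
The broadcasting step can be examined shape by shape. The sketch below repeats the (3, 1) plus (3,) addition and then uses `np.broadcast_to` to spell out the stretching that NumPy otherwise performs implicitly; the names `column`, `row`, and `explicit` are illustrative.

```python
import numpy as np

column = np.array([[1], [2], [3]])   # shape (3, 1)
row = np.array([4, 5, 6])            # shape (3,), treated as (1, 3)

result = column + row                # both operands are stretched to (3, 3)
print(result.shape)                  # (3, 3)
print(result)
# [[5 6 7]
#  [6 7 8]
#  [7 8 9]]

# The same computation with the stretching written out explicitly
explicit = np.broadcast_to(column, (3, 3)) + np.broadcast_to(row, (3, 3))
print(np.array_equal(result, explicit))   # True
```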

Array Slicing and Indexing

In [None]:
# Array Slicing and Indexing in NumPy

# 1. Indexing
arr = np.array([10, 20, 30, 40, 50])
print("Element at index 0:", arr[0])      # 10
print("Element at last index:", arr[-1])  # 50

# 2. Slicing (1D array)
print("Elements from index 1 to 3:", arr[1:4])   # [20 30 40]
print("Elements from start to index 2:", arr[:3]) # [10 20 30]
print("Elements from index 2 to end:", arr[2:])   # [30 40 50]
print("Every second element:", arr[::2])          # [10 30 50]

# 3. Indexing and Slicing (2D array)
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Element at row 1, col 2:", mat[1, 2])      # 6
print("First row:", mat[0])                       # [1 2 3]
print("First column:", mat[:, 0])                 # [1 4 7]
print("Submatrix (rows 0-1, cols 1-2):\n", mat[0:2, 1:3])  # [[2 3], [5 6]]

# 4. Boolean Indexing
bool_idx = arr > 25
print("Elements greater than 25:", arr[bool_idx]) # [30 40 50]

# 5. Fancy Indexing
indices = [0, 2, 4]
print("Elements at indices 0, 2, 4:", arr[indices]) # [10 30 50]

# 6. Modifying arrays using slicing
arr[1:4] = 99
print("Modified array:", arr)  # [10 99 99 99 50]

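# Notes:
# - Basic slicing (e.g. arr[1:4]) creates a view, not a copy, so modifying the slice affects the original array.
# - Boolean and fancy indexing (e.g. arr[arr > 25], arr[[0, 2, 4]]) return copies, not views.
# - Use arr.copy() if you want a separate copy of a slice.
# - Indexing and slicing work similarly for higher-dimensional arrays.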
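
The difference between a view and a copy is easiest to see by modifying each and checking the original. This is a minimal sketch with illustrative names (`original`, `view`, `independent`, `picked`); `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

original = np.array([10, 20, 30, 40, 50])

# A basic slice is a view: writing through it changes the original
view = original[1:4]
view[0] = -1
print(original)                      # [10 -1 30 40 50]

# .copy() gives independent data: the original is untouched
independent = original[1:4].copy()
independent[0] = 999
print(original)                      # [10 -1 30 40 50]

# Fancy indexing also returns a copy
picked = original[[0, 2, 4]]
picked[0] = 0
print(original[0])                   # 10

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
print(np.shares_memory(original, picked))       # False
```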