This article shows how `NumPy` can make data analysis much faster by replacing slow Python loops with vectorized array operations. It follows a fictional company, **EnviroTech Dynamics**, which processes over a million sensor readings daily but struggles with slow code.

To demonstrate NumPy’s power, this project uses simulated sensor data:
- Temperature readings
- Pressure readings
- Status codes (0 = OK, 1 = Warning, 2 = Critical, 3 = Faulty)

The goal is to show beginners how NumPy handles large datasets quickly and efficiently, which makes it well suited to real-world data tasks. The project covers four objectives:

- Performance and efficiency benchmark
- Foundational statistical baseline
- Critical anomaly detection
- Data cleaning and imputation

By the end of this article, you should have a solid grasp of NumPy's core strengths and its usefulness in data analysis.

In [1]:
# REQUIRED LIBRARIES
import numpy as np
import matplotlib.pyplot as plt


#### Objective 1: Performance and Efficiency Benchmark

In [8]:
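# Number of simulated temperature readings
temp_num_readings = 1_000_000

# Fix the seed so the random data is reproducible
np.random.seed(42)
mean_temp = 45.0      # mean of the simulated temperatures
std_dev_temp = 12.0   # standard deviation of the simulated temperatures
temp_data = np.random.normal(loc=mean_temp, scale=std_dev_temp, size=temp_num_readings)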

print(f"Data Array Size: {temp_data.size} elements")
print(f"First 5 Temperatures: {temp_data[:5]}")

Data Array Size: 1000000 elements
First 5 Temperatures: [50.96056984 43.34082839 52.77226246 63.27635828 42.1901595 ]


- The next step is to calculate the average of all these elements.
- `np.mean()` will be used.
- NumPy's built-in mean function performs the operation (averaging, in this case) on the entire array at once instead of looping over the elements in Python. This is possible because of NumPy `vectorization`.
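
As a minimal sketch of what vectorization replaces, the snippet below computes the same average twice: once with an explicit Python loop and once with `np.mean()`. The helper name `loop_mean` and the smaller array size are assumptions made for illustration; they are not part of the original notebook.

```python
import math
import numpy as np

def loop_mean(values):
    """Average computed one element at a time in pure Python (hypothetical helper)."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)

# Smaller sample than the article's 1,000,000 readings, to keep the loop quick (assumption)
rng_state = np.random.RandomState(42)
sample = rng_state.normal(loc=45.0, scale=12.0, size=100_000)

loop_result = loop_mean(sample)        # element-by-element accumulation
numpy_result = float(np.mean(sample))  # single vectorized call

# The two approaches should agree up to floating-point rounding
print(f"Loop mean:  {loop_result:.4f}")
print(f"NumPy mean: {numpy_result:.4f}")
print("Results match:", math.isclose(loop_result, numpy_result, rel_tol=1e-9))
```

Both approaches give the same answer; the difference is that the loop does its work in interpreted Python, while `np.mean()` hands the whole array to compiled code in one call.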

In [18]:
# Define a helper that returns the mean of an array
def calculate_mean_data(data):
    return np.mean(data)

# Call the helper on the temperature data
temp_data_mean = calculate_mean_data(temp_data)
print(f"Mean (NumPy Method):{temp_data_mean:.4f}")      # {:.4f} formats the value to 4 decimal places; adjust as needed

Mean (NumPy Method):44.9808


In [25]:
# Measure how long the vectorized NumPy mean takes

print("--- Timing the NumPy Vectorization ---")

# 5 runs of 10 loops each; 1 ms = 1000 μs, so divide μs by 1000 to get ms
%timeit -n 10 -r 5 calculate_mean_data(temp_data)


--- Timing the NumPy Vectorization ---
248 μs ± 74.2 μs per loop (mean ± std. dev. of 5 runs, 10 loops each)
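
`%timeit` is an IPython magic, so it is only available inside a notebook or an IPython shell. As a rough sketch for plain Python scripts, the standard-library `timeit` module can time the vectorized mean against a plain loop. The `loop_mean` helper and the reduced array size are assumptions for illustration, and the measured times will vary from machine to machine.

```python
import timeit
import numpy as np

def loop_mean(values):
    """Pure-Python average used as the slow baseline (hypothetical helper)."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)

np.random.seed(42)
# Reduced size so the loop baseline finishes quickly (assumption)
sample = np.random.normal(loc=45.0, scale=12.0, size=200_000)

calls_per_run = 3
# timeit.repeat returns the total seconds for `number` calls, once per repeat
loop_runs = timeit.repeat(lambda: loop_mean(sample), number=calls_per_run, repeat=3)
numpy_runs = timeit.repeat(lambda: np.mean(sample), number=calls_per_run, repeat=3)

# Take the best run and convert it to seconds per call
loop_best = min(loop_runs) / calls_per_run
numpy_best = min(numpy_runs) / calls_per_run

print(f"Loop:  {loop_best * 1000:.3f} ms per call")
print(f"NumPy: {numpy_best * 1000:.3f} ms per call")
print(f"Speedup: {loop_best / numpy_best:.0f}x")
```

Taking the minimum over the repeats is a common choice because it is the measurement least disturbed by other activity on the machine.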


#### Objective 2: Foundational Statistical Baseline