This article shows how `NumPy` can make data analysis much faster by replacing slow Python loops. It follows a fictional company, **EnviroTech Dynamics**, which processes over a million sensor readings daily but struggles with slow code.

To demonstrate NumPy’s power, a simple project using fake sensor data has been created. The data includes:
- Temperature readings
- Pressure readings
- Status codes (0 = OK, 1 = Warning, 2 = Critical, 3 = Faulty)

The goal is to show beginners how NumPy handles large datasets quickly and efficiently, which makes it well suited to real-world data tasks. The project covers four objectives:

- Performance and efficiency benchmark
- Foundational statistical baseline
- Critical anomaly detection
- Data cleaning and imputation

By the end of this article, you should have a solid grasp of how NumPy works and why it is useful in data analysis.

In [2]:
# REQUIRED LIBRARIES
import numpy as np
import matplotlib.pyplot as plt


#### Objective 1: Performance and Efficiency Benchmark

In [3]:
# setting size of data
temp_num_readings = 1_000_000
# print(temp_num_readings)

# Generate the Temperature array (1 million random floating-point numbers)
# np.random.seed() fixes NumPy's random number generator so that
# the same random numbers are produced every time the code runs.
np.random.seed(42)
mean_temp = 45.0
std_dev_temp = 12.0
temp_data = np.random.normal(loc=mean_temp, scale=std_dev_temp, size=temp_num_readings)

print(f"Data Array Size: {temp_data.size} elements")
print(f"First 5 Temperatures: {temp_data[:5]}")

Data Array Size: 1000000 elements
First 5 Temperatures: [50.96056984 43.34082839 52.77226246 63.27635828 42.1901595 ]
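To see what fixing the seed actually does, here is a minimal sketch that draws the same five values twice. The variable names (`first_draw`, `second_draw`, `rng_draw`) are illustrative and not part of the project code. The last lines show `np.random.default_rng`, the newer generator interface in NumPy, as an alternative; it produces a different stream of numbers from `np.random.seed(42)`, so it would not reproduce the temperatures printed above.

```python
import numpy as np

# Draw five readings with the legacy global seed
np.random.seed(42)
first_draw = np.random.normal(loc=45.0, scale=12.0, size=5)

# Reset the seed and draw again: the values repeat exactly
np.random.seed(42)
second_draw = np.random.normal(loc=45.0, scale=12.0, size=5)

print(np.array_equal(first_draw, second_draw))

# Alternative: a local Generator object with its own seed.
# Its numbers differ from the legacy np.random.seed(42) stream.
rng = np.random.default_rng(42)
rng_draw = rng.normal(loc=45.0, scale=12.0, size=5)
print(rng_draw.shape)
```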


- The next step is to calculate the average of all these elements.
- **np.mean()** will be used.
- NumPy’s built-in mean function performs the operation on the entire array at once instead of visiting each element in a Python loop. This is possible because of `NumPy vectorization`.

In [4]:
# define a function that returns the average of an array
def calculate_mean_data(data):
    return np.mean(data)

# call the function on the temperature data
temp_data_mean = calculate_mean_data(temp_data)
print(f"Mean (NumPy Method):{temp_data_mean:.4f}")      # {:.4f} formats the value to 4 decimal places; adjust as needed.

Mean (NumPy Method):44.9808


In [5]:
# Calculating how much time it takes for NumPy Vectorization

print("— — Timing the NumPy Vectorization — -")

%timeit -n 10 -r 5 calculate_mean_data(temp_data)       # 1 ms = 1000 μs    #convert µs → ms by dividing by 1000


— — Timing the NumPy Vectorization — -
242 μs ± 68.4 μs per loop (mean ± std. dev. of 5 runs, 10 loops each)
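The timing above only measures the NumPy side. For comparison, here is a rough sketch that times a plain Python loop against `np.mean` on the same array. It uses the standard-library `timeit` module rather than the `%timeit` magic so that it also runs outside a notebook. The `loop_mean` function and the smaller 200,000-element sample are assumptions made for this illustration, and the exact timings will vary from machine to machine.

```python
import timeit
import numpy as np

def loop_mean(values):
    """Average computed one element at a time in pure Python."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

# Smaller sample than the article's 1,000,000 readings, to keep the loop quick
np.random.seed(42)
sample = np.random.normal(loc=45.0, scale=12.0, size=200_000)

# Best of 3 runs for each approach, one call per run
loop_seconds = min(timeit.repeat(lambda: loop_mean(sample), number=1, repeat=3))
numpy_seconds = min(timeit.repeat(lambda: np.mean(sample), number=1, repeat=3))

print(f"Python loop: {loop_seconds * 1e3:.3f} ms")
print(f"np.mean    : {numpy_seconds * 1e3:.3f} ms")
print(f"Same result: {np.isclose(loop_mean(sample), np.mean(sample))}")
```

Both approaches return the same average; the difference lies in where the loop runs. `np.mean` iterates in compiled code, while `loop_mean` pays Python interpreter overhead on every element.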


#### Objective 2: Foundational Statistical Baseline
NumPy can compute statistics ranging from basic to advanced, which gives a good overview of what’s going on in your dataset. Some of the most useful functions are listed below:

- **np.mean()**: calculates the average.
- **np.median()**: returns the middle value of the data.
- **np.std()**: shows how spread out your numbers are from the average.
- **np.percentile()**: tells you the value below which a certain percentage of your data falls.

**Next step, now that large datasets can be processed efficiently with NumPy:**
- Pressure data will be generated just like the temperature data.
- This helps demonstrate NumPy’s ability to handle multiple large arrays quickly.
- Pressure readings also provide important system health insights for the client.
- Temperature and pressure are often linked: changes in one can affect the other.
- Calculating baselines for both helps identify abnormal patterns or drifting behavior.

In [9]:
np.random.seed(43)
pressure_data = np.random.uniform(low=100.0, high=500.0, size=1_000_000)

print("Data Arrays Ready")
print("Pressure Data:\n",pressure_data)

Data Arrays Ready
Pressure Data:
 [146.02182656 343.62661571 153.35638567 ... 455.21937053 303.91397541
 293.53768099]


**CALCULATING TEMPERATURE STATISTICS**

In [7]:
print("\n- - - - Temperature Stats - - - - ")

# 1. Calculate the mean and median of temperature
temp_mean = np.mean(temp_data)
temp_median = np.median(temp_data)

# 2. Calculate the standard deviation
temp_std = np.std(temp_data)

# 3. Calculate percentiles to define the 90% normal range.
#    (A percentile is the value below which a certain percentage of the data falls.)
temp_p5 = np.percentile(temp_data, 5)    # 5th percentile
temp_p95 = np.percentile(temp_data, 95)  # 95th percentile

#Printing Results:
print(f"Temperature Mean(Average):{temp_mean:.2f}°C")
print(f"Temperature Median(Middle):{temp_median:.2f}°C")
print(f"Standard Deviation(Spread): {temp_std:.2f}°C")
print(f"90% Normal Range: {temp_p5:.2f}°C to {temp_p95:.2f}°C")


- - - - Temperature Stats - - - - 
Temperature Mean(Average):44.98°C
Temperature Median(Middle):44.99°C
Standard Deviation(Spread): 12.00°C
90% Normal Range: 25.24°C to 64.71°C


- `The mean (average), 44.98°C`, gives us a central point around which most readings are expected to fall.
- `The median (middle), 44.99°C`, is nearly identical, meaning the dataset is well balanced, with no extreme outliers pulling the average to one side.
- `The standard deviation, 12°C`, indicates that temperatures fluctuate quite widely rather than remaining stable.
- `90% of all temperature readings fall between about 25°C and 65°C.` This range represents what can be considered the “normal operating window.”
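The "90% normal range" claim can be checked directly with a boolean mask. The sketch below regenerates the temperature array with the same seed and parameters used earlier, then counts the share of readings that lie between the 5th and 95th percentiles. The names `temps`, `inside`, and `fraction_inside` are illustrative only.

```python
import numpy as np

# Recreate the temperature data with the same seed and parameters as above
np.random.seed(42)
temps = np.random.normal(loc=45.0, scale=12.0, size=1_000_000)

# np.percentile accepts a list of percentiles and returns one value per entry
p5, p95 = np.percentile(temps, [5, 95])

# Boolean mask: True where a reading lies inside the normal range
inside = (temps >= p5) & (temps <= p95)

# The mean of a boolean array is the fraction of True values
fraction_inside = inside.mean()
print(f"Share of readings inside the 5th-95th percentile range: {fraction_inside:.3f}")
```

The printed share should come out very close to 0.90, which is what the two percentiles imply by definition.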

**CALCULATING PRESSURE STATISTICS**

- To keep the code tidy, all the calculations are stored in a dictionary called `pressure_stats`, and the results are printed by looping over its key-value pairs.

In [12]:
print("\n- - - - Pressure Stats - - - - ")

# Same calculations for pressure
pressure_stats = {
    "Mean": np.mean(pressure_data),
    "Median": np.median(pressure_data),
    "Standard Deviation": np.std(pressure_data),
    "5th %tile": np.percentile(pressure_data, 5),
    "95th %tile": np.percentile(pressure_data, 95),
}

for label, value in pressure_stats.items():
    print(f"{label:<12}: {value:.2f} kPa")


- - - - Pressure Stats - - - - 
Mean        : 300.09 kPa
Median      : 300.04 kPa
Standard Deviation: 115.47 kPa
5th %tile   : 120.11 kPa
95th %tile  : 480.09 kPa
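In the output above, the `Standard Deviation` label is longer than the 12-character field width, so its colon does not line up with the others. One possible refinement, sketched here with two hypothetical helpers (`summarize` and `print_stats`, neither of which is part of the original notebook), is to build the dictionary in a reusable function and derive the column width from the longest label.

```python
import numpy as np

def summarize(data):
    """Return the five baseline statistics as a dictionary (hypothetical helper)."""
    return {
        "Mean": np.mean(data),
        "Median": np.median(data),
        "Standard Deviation": np.std(data),
        "5th %tile": np.percentile(data, 5),
        "95th %tile": np.percentile(data, 95),
    }

def print_stats(stats, unit):
    """Print each statistic, padding labels to the width of the longest one."""
    width = max(len(label) for label in stats)
    for label, value in stats.items():
        print(f"{label:<{width}}: {value:.2f} {unit}")

# Same seed and range as the pressure data generated earlier
np.random.seed(43)
pressure = np.random.uniform(low=100.0, high=500.0, size=1_000_000)

pressure_summary = summarize(pressure)
print_stats(pressure_summary, "kPa")
```

The same two functions could be reused for the temperature array by calling `print_stats(summarize(temp_data), "°C")`.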


- Pressure readings average around `300 kilopascals (kPa)`.
- The median (middle value) is almost the same.
- The standard deviation is about `115 kPa`, which means there’s a lot of variation between readings. In other words, some readings are much higher or lower than the typical 300 kPa level.
- According to the percentiles, `90% of our readings fall between 120 and 480 kPa.` That’s a wide range.

This suggests that pressure conditions are not stable, possibly fluctuating between low and high states during operation. So while the average looks fine, the variability could point to inconsistent performance or environmental factors affecting the system.
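One caveat when reading these numbers: the pressure array was generated with `np.random.uniform(low=100.0, high=500.0)`, so the wide spread is built into the fake data rather than measured from a real system. As a sanity check, the sketch below computes the values a uniform distribution on that range has in theory, using the standard formulas for its mean, standard deviation, and percentiles. The variable names are illustrative.

```python
import numpy as np

low, high = 100.0, 500.0  # bounds passed to np.random.uniform earlier

# Standard results for a uniform distribution on [low, high]
theoretical_mean = (low + high) / 2                 # midpoint of the range
theoretical_std = (high - low) / np.sqrt(12)        # width divided by sqrt(12)
theoretical_p5 = low + 0.05 * (high - low)          # 5% of the way along the range
theoretical_p95 = low + 0.95 * (high - low)         # 95% of the way along the range

print(f"Mean              : {theoretical_mean:.2f} kPa")   # 300.00 kPa
print(f"Standard Deviation: {theoretical_std:.2f} kPa")    # 115.47 kPa
print(f"5th %tile         : {theoretical_p5:.2f} kPa")     # 120.00 kPa
print(f"95th %tile        : {theoretical_p95:.2f} kPa")    # 480.00 kPa
```

These theoretical values sit very close to the measured ones (300.09, 115.47, 120.11, and 480.09 kPa), which confirms that the statistics are describing the generator faithfully.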