# Industrial Failure Detection System Data Generation

In this notebook, we generate synthetic sensor data for an industrial failure detection system. The generated dataset simulates the behavior of multiple sensors under normal operating conditions, with some anomalies introduced to mimic failure events.

---

## Step 1: Import Necessary Libraries

We begin by importing the required libraries: `numpy` and `pandas`. These are used for numerical computations and data manipulation, respectively.

```python
import numpy as np
import pandas as pd
```

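## Step 2: Define Helper Functions

### generate_normal_sensor_data

This function generates synthetic sensor data under normal operating conditions. It models cyclic machine behavior as a sine wave between `min_val` and `max_val` and adds Gaussian noise on top.

Parameters:
- `sensor_name`: Name of the sensor. The function body does not use it; it serves only as a label at the call site.
- `num_points`: Number of data points to generate.
- `min_val` and `max_val`: Range of the noiseless sine wave. Noise can push individual readings outside this range.
- `noise_factor`: Standard deviation of the noise, expressed as a fraction of `max_val - min_val` (default `0.1`).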

In [2]:
def generate_normal_sensor_data(sensor_name, num_points, min_val, max_val, noise_factor=0.1):
    """Generates normal operating sensor data with periodic fluctuations."""
    time = np.arange(0, num_points)
    # Simulate cyclic behavior (like sinusoidal pattern) with some random noise
    data = (np.sin(time / 50) + 1) * (max_val - min_val) / 2 + min_val
    noise = np.random.normal(0, (max_val - min_val) * noise_factor, num_points)
    return data + noise
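
As a quick check of the sine formula above, the following sketch computes the noiseless baseline on its own. It suggests that the baseline stays within `[min_val, max_val]` and repeats roughly every 2π × 50 ≈ 314 samples. The helper name `baseline` is hypothetical and not part of the notebook.

```python
import numpy as np

def baseline(num_points, min_val, max_val):
    """Noiseless sinusoid from generate_normal_sensor_data (hypothetical helper)."""
    time = np.arange(0, num_points)
    return (np.sin(time / 50) + 1) * (max_val - min_val) / 2 + min_val

values = baseline(1000, 20, 40)

# sin(t / 50) completes one cycle when t / 50 reaches 2*pi
period = 2 * np.pi * 50

print(values.min(), values.max())  # both lie inside [20, 40]
print(round(period))               # samples per cycle
```

At 1-minute intervals, one cycle therefore lasts a little over five hours.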

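### introduce_failures

This function injects anomalies into the dataset to simulate failure events. Each event covers a contiguous window of rows: temperature rises by 20 to 49 units, vibration rises by 3 to 5 units, and the `failure` flag is set to 1. The function modifies `df` in place and also returns it.

Parameters:
- `df`: Input DataFrame containing sensor data. It must have `temperature`, `vibration`, and `failure` columns and the default integer index.
- `num_failures`: Number of failure events to introduce.
- `failure_duration`: Duration of each failure event, in data points (default `60`).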
In [3]:
def introduce_failures(df, num_failures, failure_duration=60):
    """Inject anomalous readings and flag them as failure events.

    Modifies `df` in place and returns it. Assumes the default 0..n-1 index.
    """
    # Exclusive upper bound for start indices, so every event fits in the DataFrame
    max_start_index = len(df) - failure_duration
    # replace=False yields distinct start indices; the windows themselves may still overlap
    failure_indices = np.random.choice(max_start_index, num_failures, replace=False)

    for idx in failure_indices:
        end_idx = idx + failure_duration  # exclusive end, never past len(df)
        n = end_idx - idx

        # .loc slicing is label-based and inclusive, hence end_idx - 1
        df.loc[idx:end_idx - 1, 'temperature'] += np.random.randint(20, 50, n)  # +20 to +49
        df.loc[idx:end_idx - 1, 'vibration'] += np.random.uniform(3, 5, n)
        df.loc[idx:end_idx - 1, 'failure'] = 1  # flag every row in the event window

    return df
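
Sampling start indices with `replace=False` guarantees distinct starts, but two events can still overlap when their starts are closer than `failure_duration`. If non-overlapping events matter, one possible approach is to sample from fixed-size slots instead. This is a minimal sketch, not the notebook's method; `pick_non_overlapping_starts` is a hypothetical helper, and it aligns events to slot boundaries, which the original function does not do.

```python
import numpy as np

def pick_non_overlapping_starts(n_rows, num_failures, failure_duration, rng):
    """Pick start indices whose windows [start, start + duration) never overlap.

    Hypothetical helper: splits the rows into slots of `failure_duration`
    rows each and samples distinct slots.
    """
    num_slots = n_rows // failure_duration
    if num_failures > num_slots:
        raise ValueError("not enough rows for that many non-overlapping events")
    slots = rng.choice(num_slots, size=num_failures, replace=False)
    return np.sort(slots) * failure_duration

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
starts = pick_non_overlapping_starts(43200, 5, 60, rng)

# Consecutive starts are at least one full event apart
print(np.diff(starts).min() >= 60)
```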

## Step 3: Generate Sensor Data

### Normal Operating Data

We generate synthetic sensor data for temperature, vibration, pressure, and humidity under normal conditions. Each sensor's data exhibits periodic fluctuations with added noise.

Parameters:
- `num_points`: Number of data points, set to 43,200 (30 days × 24 hours × 60 minutes at 1-minute intervals).

In [4]:
# Generate sensor data for one device
num_points = 43200  # 30 days of 1-minute intervals
temperature_data = generate_normal_sensor_data('temperature', num_points, 20, 40)
vibration_data = generate_normal_sensor_data('vibration', num_points, 0, 2)
pressure_data = generate_normal_sensor_data('pressure', num_points, 100, 200)
humidity_data = generate_normal_sensor_data('humidity', num_points, 30, 70)
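
The notebook does not set a random seed, so each run produces a different dataset. If repeatable output is needed, one option is to seed NumPy's global generator before generating data. The sketch below only illustrates the effect; the seed value `42` is an arbitrary assumption.

```python
import numpy as np

np.random.seed(42)  # assumed seed value; any fixed integer works
first = np.random.normal(0, 1, 5)

np.random.seed(42)  # reseeding restarts the same random sequence
second = np.random.normal(0, 1, 5)

same = np.array_equal(first, second)
print(same)
```

Since `generate_normal_sensor_data` and `introduce_failures` both draw from `np.random`, a single `np.random.seed(...)` call at the top of the notebook would fix the noise and the failure positions alike.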


### Create Initial DataFrame

We combine the generated sensor data into a pandas DataFrame and initialize the failure flag to 0 (`failure = 0`) for every row.

In [5]:
# Create DataFrame
df = pd.DataFrame({
    'timestamp': pd.date_range(start='2024-12-01', periods=num_points, freq='1min'),
    'temperature': temperature_data,
    'vibration': vibration_data,
    'pressure': pressure_data,
    'humidity': humidity_data,
    'failure': 0  # Default no failure
})
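
To confirm the time span of the `timestamp` column, the sketch below rebuilds the same `pd.date_range` call on its own and inspects its endpoints. The variable names `timestamps`, `first`, `last`, and `span` are illustrative.

```python
import pandas as pd

num_points = 43200
timestamps = pd.date_range(start='2024-12-01', periods=num_points, freq='1min')

first, last = timestamps[0], timestamps[-1]
span = last - first  # one minute short of 30 days, since both ends are included

print(first, last, span)
```

The last reading falls at 23:59 on 2024-12-30, so the 43,200 rows cover exactly 30 calendar days.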

## Step 4: Introduce Failures

We introduce 5 failure events, each lasting 60 minutes (60 data points at 1-minute intervals). During these events, temperature and vibration exhibit abnormal spikes and the `failure` flag is set to 1.

In [6]:
# Introduce failures
df = introduce_failures(df, num_failures=5, failure_duration=60)
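
Because failure windows are placed at random and may overlap, the number of flagged rows can be lower than 5 × 60 = 300, and the number of distinct events can be lower than 5. The sketch below shows one way to count contiguous failure runs. It uses a small toy Series standing in for `df['failure']`; the data is illustrative only.

```python
import pandas as pd

# Toy failure column standing in for df['failure'] (illustrative data only)
failure = pd.Series([0, 1, 1, 0, 0, 1, 1, 1, 0, 1])

# A run starts wherever the flag is 1 and the previous row is not 1
run_starts = (failure == 1) & (failure.shift(fill_value=0) != 1)

num_events = int(run_starts.sum())     # contiguous failure runs
num_failure_rows = int(failure.sum())  # total flagged rows

print(num_events, num_failure_rows)
```

Applied to the real column, the same two lines would report how many separate events and flagged rows a given run actually produced.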

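## Step 5: Simulate Multiple Devices

We simulate data for 10 machines by copying the DataFrame from Step 4 and assigning a unique `machine_id` to each copy. Because each copy is identical, all 10 machines share the same readings and failure windows; only `machine_id` differs.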

In [7]:
devices = []
for device_id in range(1, 11):  # 10 devices
    df_device = df.copy()
    df_device['machine_id'] = device_id
    devices.append(df_device)

df_full = pd.concat(devices, ignore_index=True)
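
Copying one DataFrame gives every machine the same readings. If per-machine variation is wanted, an alternative is to generate each machine's data independently. The following is a minimal sketch of that idea, not the notebook's approach: `make_device_frame` is a hypothetical helper, it covers only the temperature sensor, and it uses a short series of 500 points to stay small.

```python
import numpy as np
import pandas as pd

def make_device_frame(device_id, num_points, rng):
    """Build one device's temperature series with its own noise (hypothetical helper)."""
    time = np.arange(num_points)
    baseline = (np.sin(time / 50) + 1) * (40 - 20) / 2 + 20
    noise = rng.normal(0, (40 - 20) * 0.1, num_points)
    return pd.DataFrame({
        'timestamp': pd.date_range(start='2024-12-01', periods=num_points, freq='1min'),
        'temperature': baseline + noise,
        'machine_id': device_id,
    })

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
frames = [make_device_frame(device_id, 500, rng) for device_id in range(1, 4)]
df_varied = pd.concat(frames, ignore_index=True)

print(df_varied.groupby('machine_id')['temperature'].mean())
```

Each machine still follows the same sine baseline here; only the noise differs. Failures would then need to be introduced per machine as well.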


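## Step 6: Export the Data

We write the combined dataset (10 machines × 43,200 rows = 432,000 rows) to a CSV file, omitting the DataFrame index.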

In [8]:
df_full.to_csv('synthetic_iot_sensor_data.csv', index=False)
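
CSV files store timestamps as plain text, so anyone loading the file later will likely want to parse the `timestamp` column back into datetimes. The sketch below demonstrates the round trip with a three-row sample written to an in-memory buffer rather than to disk; the sample values are made up for illustration.

```python
import io
import pandas as pd

# Tiny stand-in for df_full (illustrative values only)
sample = pd.DataFrame({
    'timestamp': pd.date_range(start='2024-12-01', periods=3, freq='1min'),
    'temperature': [30.1, 30.5, 29.8],
    'failure': [0, 0, 1],
    'machine_id': [1, 1, 1],
})

buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates converts the text column back to datetime64
restored = pd.read_csv(buffer, parse_dates=['timestamp'])
print(restored.dtypes)
```

For the real file, the same call would be `pd.read_csv('synthetic_iot_sensor_data.csv', parse_dates=['timestamp'])`.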
