# Sensor Reading

## Objectives
- Load and preprocess an automotive engine dataset.
- Simulate timestamps with a 1-minute interval.
- Create new time-based features (Hour, Day_of_week, Is_weekend).
- Detect outliers in Engine rpm using the IQR method.
- Analyze related sensor readings for detected outliers.

- Load and preprocess an automotive engine dataset.

In [1]:
# Loading Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import timedelta
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Loading Dataset
df = pd.read_csv(r'C:\Users\hp\OneDrive\Desktop\Data_Analyst\Automotive_Vehicles_Engine\Dataset\engine_data.csv')

In [None]:
# Standardize column names
df.columns = [col.strip().lower().replace(" ", "_") for col in df.columns]

# Convert data types (if needed)
df['engine_rpm'] = df['engine_rpm'].astype(float)

df.head()

   engine_rpm  lub_oil_pressure  fuel_pressure  coolant_pressure  lub_oil_temp  coolant_temp  engine_condition
0       700.0          2.493592      11.790927          3.178981     84.144163     81.632187                 1
1       876.0          2.941606      16.193866          2.464504     77.640934     82.445724                 0
2       520.0          2.961746       6.553147          1.064347     77.752266     79.645777                 1
3       473.0          3.707835      19.510172          3.727455     74.129907     71.774629                 1
4       619.0          5.672919      15.738871          2.052251     78.396989     87.000225                 0


- Feature Engineering

In [4]:
df['temp_diff'] = df['coolant_temp'] - df['lub_oil_temp']

df['pressure_ratio'] = df['lub_oil_pressure'] / df['fuel_pressure']
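
The ratio above divides by fuel_pressure, so any zero reading in that column would yield inf. Whether the real dataset contains zeros is not known from this notebook; the following is a minimal sketch on made-up values that shows one way to guard the division by turning zero denominators into NaN first.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not taken from the dataset)
toy = pd.DataFrame({
    'lub_oil_pressure': [2.5, 3.0, 4.0],
    'fuel_pressure': [10.0, 0.0, 8.0],
})

# Replace zero denominators with NaN so the ratio becomes NaN rather than inf
safe_fuel = toy['fuel_pressure'].replace(0, np.nan)
toy['pressure_ratio'] = toy['lub_oil_pressure'] / safe_fuel

print(toy)
```

NaN values are then easy to count with isna() or to drop before further analysis.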


- Simulating Timestamp

In [None]:
# Simulate a starting timestamp
start_time = pd.Timestamp('2025-01-01 00:00:00')

# Generate timestamps with a 1-minute interval
df['Timestamp'] = pd.date_range(start=start_time, periods=len(df), freq='min')


- Create new time-based features

In [12]:
df['Hour'] = df['Timestamp'].dt.hour
df['Day_of_week'] = df['Timestamp'].dt.dayofweek
# dayofweek uses Monday=0 ... Sunday=6, so 5 and 6 are the weekend
df['Is_weekend'] = (df['Day_of_week'] >= 5).astype(int)
print(df.head())

   engine_rpm  lub_oil_pressure  fuel_pressure  coolant_pressure  \
0       700.0          2.493592      11.790927          3.178981   
1       876.0          2.941606      16.193866          2.464504   
2       520.0          2.961746       6.553147          1.064347   
3       473.0          3.707835      19.510172          3.727455   
4       619.0          5.672919      15.738871          2.052251   

   lub_oil_temp  coolant_temp  engine_condition  temp_diff  pressure_ratio  \
0     84.144163     81.632187                 1  -2.511976        0.211484   
1     77.640934     82.445724                 0   4.804790        0.181649   
2     77.752266     79.645777                 1   1.893511        0.451958   
3     74.129907     71.774629                 1  -2.355278        0.190046   
4     78.396989     87.000225                 0   8.603237        0.360440   

            Timestamp  Hour  Day_of_week  Is_weekend  
0 2025-01-01 00:00:00     0            2           0  
1 2025-01-01
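
To make the weekday encoding concrete, here is a small sketch on a toy frame with one row per day (not the notebook's 1-minute spacing). It starts on the same simulated date, 2025-01-01, which is a Wednesday, so Day_of_week begins at 2 as in the output above.

```python
import pandas as pd

# Five daily timestamps starting on the notebook's simulated start date
ts = pd.Series(pd.date_range('2025-01-01', periods=5, freq='D'))

demo = pd.DataFrame({'Timestamp': ts})
demo['Day_of_week'] = demo['Timestamp'].dt.dayofweek   # Monday=0 ... Sunday=6
demo['Is_weekend'] = (demo['Day_of_week'] >= 5).astype(int)

print(demo)
```

With one row per minute, the first Saturday row is at index 4320 (3 days x 1440 minutes), so Is_weekend stays 0 for any rows before that point.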

- Detect outliers in Engine rpm using the IQR method.

In [13]:
# Calculate Q1, Q3, and IQR for 'engine_rpm'
Q1 = df['engine_rpm'].quantile(0.25)
Q3 = df['engine_rpm'].quantile(0.75)
IQR = Q3 - Q1

# Define outlier bounds
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
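
As a worked illustration of the bounds computed above, the sketch below applies the same formula to a small, made-up RPM series. The helper name iqr_bounds and the sample values are hypothetical and are not part of the original notebook.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) IQR fences for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy RPM values (hypothetical), with one obviously high reading
rpm = pd.Series([500, 600, 700, 800, 900, 1000, 1100, 1200, 3000], dtype=float)

lower, upper = iqr_bounds(rpm)
flagged = rpm[(rpm < lower) | (rpm > upper)]

print(lower, upper)      # 100.0 1700.0
print(flagged.tolist())  # [3000.0]
```

Here Q1 = 700 and Q3 = 1100, so IQR = 400 and the fences sit at 700 - 600 = 100 and 1100 + 600 = 1700; only the 3000 reading falls outside them.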

In [14]:
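# Flag rows that fall outside either IQR bound
outliers = df[(df['engine_rpm'] < lower_bound) | (df['engine_rpm'] > upper_bound)]

# Count and percentage of outliers
num_outliers = len(outliers)
percent_outliers = (num_outliers / len(df)) * 100
print(f"Outliers: {num_outliers} ({percent_outliers:.2f}%)")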

- Analyze related sensor readings for detected outliers.

In [15]:
# Investigate corresponding values in other columns
related_stats = outliers[['lub_oil_pressure', 'fuel_pressure', 'coolant_pressure', 'engine_condition']].describe()
related_stats
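
Summary statistics for the outlier rows are easier to interpret next to the same statistics for the remaining rows. The sketch below shows one possible comparison on a toy frame with made-up values. The column 'group' and its labels are illustrative names, and the flag uses both IQR bounds, which is an assumption of this sketch.

```python
import pandas as pd

# Toy frame (hypothetical values) standing in for df
toy = pd.DataFrame({
    'engine_rpm': [600.0, 700.0, 800.0, 900.0, 3000.0],
    'lub_oil_pressure': [3.0, 3.2, 2.8, 3.1, 5.0],
    'engine_condition': [1, 0, 1, 1, 0],
})

# Flag rows outside the IQR fences
q1, q3 = toy['engine_rpm'].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (toy['engine_rpm'] < q1 - 1.5 * iqr) | (toy['engine_rpm'] > q3 + 1.5 * iqr)
toy['group'] = mask.map({True: 'outlier', False: 'normal'})

# Mean of each related column, split by group
comparison = toy.groupby('group')[['lub_oil_pressure', 'engine_condition']].mean()
print(comparison)
```

A large gap between the two rows of the result would suggest that extreme RPM readings coincide with unusual values in the other sensors; a small gap would suggest they do not.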

In [26]:
# Save the processed data; to_csv returns None when given a path, so the result is not assigned
df.to_csv('cleaned_engine_data.csv', index=False)
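
When the saved file is loaded again, datetime columns come back as plain text unless they are parsed explicitly. The sketch below demonstrates the round trip on a toy frame with made-up values, written to a temporary directory; only the file name matches the notebook.

```python
import os
import tempfile
import pandas as pd

# Toy frame (hypothetical values) with a datetime column like the notebook's 'Timestamp'
toy = pd.DataFrame({
    'engine_rpm': [700.0, 876.0],
    'Timestamp': pd.date_range('2025-01-01', periods=2, freq='min'),
})

# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cleaned_engine_data.csv')
    toy.to_csv(path, index=False)
    # parse_dates restores the datetime dtype; without it the column is read as text
    restored = pd.read_csv(path, parse_dates=['Timestamp'])

print(restored.dtypes)
```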
