# Sensor Fault Detection and Prediction System

**💡 Overview:**

Build an AI-powered tool that ingests time-series sensor data from industrial systems (like railway automation, HVAC, or manufacturing), detects anomalies (faults), and predicts potential failures using machine learning.

**🔧 Technologies:**

- *Python, Pandas, NumPy*

- *Scikit-learn / XGBoost / Isolation Forest*

- **Streamlit** for UI dashboard

- **SQLite or PostgreSQL** for storing sensor logs

- **Matplotlib / Seaborn / Plotly** for visualization

- (Optional) **Power BI** for external reports

--------------------------------------------------------------
**🎯 Key Features:**

📂 Upload sensor CSV files (with time, sensor values)

📉 Auto-detect outliers and sensor drift using **statistical & ML models**

⏳ Handle varying time intervals and **resample/align sensor streams**

⚠️ Classify sensor readings as Normal, Faulty, or At-Risk

📊 Visual charts: time series, moving averages, anomaly markers

🗃️ Save processed logs with results in a database

🔍 Filter logs by sensor type, timestamp, or fault severity
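The save-and-filter features above can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch, not the project's actual schema: the table name `sensor_logs`, its columns, and the helper names are assumptions made for illustration.

```python
import sqlite3

def init_db(conn):
    # Hypothetical schema: one row per processed sensor reading.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sensor_logs (
               timestamp TEXT NOT NULL,
               sensor_id TEXT NOT NULL,
               value     REAL NOT NULL,
               severity  TEXT NOT NULL
           )"""
    )

def save_logs(conn, rows):
    # rows: iterable of (timestamp, sensor_id, value, severity) tuples.
    conn.executemany("INSERT INTO sensor_logs VALUES (?, ?, ?, ?)", rows)
    conn.commit()

def query_logs(conn, sensor_id=None, start=None, end=None, severity=None):
    # Build the WHERE clause only from the filters that were supplied.
    clauses, params = [], []
    for column, operator, value in [
        ("sensor_id", "=", sensor_id),
        ("timestamp", ">=", start),
        ("timestamp", "<=", end),
        ("severity", "=", severity),
    ]:
        if value is not None:
            clauses.append(f"{column} {operator} ?")
            params.append(value)
    sql = "SELECT timestamp, sensor_id, value, severity FROM sensor_logs"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql + " ORDER BY timestamp, sensor_id", params).fetchall()

conn = sqlite3.connect(":memory:")  # use a file path to persist the logs
init_db(conn)
save_logs(conn, [
    ("2024-01-01 00:00:00", "SENSOR_1", 51.2, "Normal"),
    ("2024-01-01 00:01:00", "SENSOR_1", 101.7, "Critical"),
    ("2024-01-01 00:01:00", "SENSOR_2", 48.9, "Normal"),
])
critical = query_logs(conn, severity="Critical")
```

Timestamps are stored as ISO-formatted text here, so plain string comparison is enough for the time filter.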

-----------------------------------------------------------------------
**🔬 AI/ML Components:**
    
- Isolation Forest / One-Class SVM for anomaly detection

- Random Forest / XGBoost for supervised fault prediction (if labeled data available)

- Rolling statistics and Z-score detection for signal deviation
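As a rough illustration of the rolling-statistics idea, the sketch below scores each reading against the mean and standard deviation of the readings just before it. The function name `rolling_zscore`, the 30-sample window, and the threshold of 3 are assumptions for the example, not tuned values.

```python
import numpy as np
import pandas as pd

def rolling_zscore(values: pd.Series, window: int = 30) -> pd.Series:
    # Use only the previous `window` readings (shift(1)) so that a spike
    # is not part of the baseline it is compared against.
    baseline = values.shift(1).rolling(window, min_periods=window)
    mean = baseline.mean()
    std = baseline.std().replace(0, np.nan)  # avoid division by zero
    return (values - mean) / std

# Synthetic signal: noise around 50 with one injected spike at position 150.
rng = np.random.default_rng(0)
signal = pd.Series(rng.normal(50, 1, 200))
signal.iloc[150] += 20

z = rolling_zscore(signal)
flags = z.abs() > 3  # True where a reading deviates strongly from its recent past
```

The first `window` readings have no score (NaN) because there is not enough history yet, and a large spike inflates the baseline for the readings that immediately follow it.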

---------------------------------------------------------------------
**🚀 Bonus Features:**

- Export fault logs for Power BI

- Build alerting system (email/Slack) for high-risk readings

- Train and deploy ML model using .pkl file in the app

-----------------------------------------------------------------------------------
**📂 GitHub Folder Structure Suggestion:**

Sensor_Fault_Predictor/
├── data/
│   └── sample_sensor_data.csv
├── models/
│   └── fault_model.pkl
├── streamlit_app/
│   ├── app.py
│   └── init_db.py
├── utils/
│   └── preprocess.py
├── requirements.txt
├── README.md
└── .gitignore


--------------------------------------------------------------------------------------
**🧠 Why This Project is Great:**

- Combines **instrumentation signals + AI** — exactly my domain

- Highlights **anomaly detection**, a hot topic in predictive maintenance

- Demonstrates **real-time data analysis**, which is valuable in any automation/QA role

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import random

# Set seed for reproducibility
np.random.seed(42)
random.seed(42)

sensor_ids = [f"SENSOR_{i}" for i in range(1, 6)]
start_time = datetime(2024, 1, 1, 0, 0, 0)
timestamps = [start_time + timedelta(minutes=i) for i in range(0, 1440)]  # 1 day at 1-min intervals

data = []
for timestamp in timestamps:
    for sensor in sensor_ids:
        value = round(np.random.normal(loc=50, scale=10), 2)
        if random.random() < 0.01:
            value += random.choice([50, -50])  # simulate fault spike/drop
        data.append([timestamp, sensor, value])

df = pd.DataFrame(data, columns=["Timestamp", "SensorID", "SensorValue"])
df.to_csv("sensor_fault_data.csv", index=False)
print("✅ Dataset saved as 'sensor_fault_data.csv'")



--------------------------------------------------------------------------
**🎯 Objective**

We are building a tool that:

- **Ingests sensor data** (like temperature, pressure, voltage, etc.) from industrial systems (railways, factories, etc.)

- **Detects faults or anomalies** in the data (like sudden spikes or drops)

- **Predicts potential future failures** using ML (optional advanced phase)

- **Displays everything in a user-friendly dashboard** using Streamlit or Power BI

-------------------------------------------------------------------------------------------
**🧱 What Each Part Does**

| Phase                                   | Description                                                                |
| --------------------------------------- | -------------------------------------------------------------------------- |
| **1. Upload Sensor Data**               | Upload a `.csv` file containing timestamped sensor readings                |
| **2. Preprocess Data**                  | Handle time formatting, missing data, etc.                                 |
| **3. Fault Detection (rule-based logic)** | Use **rules** (e.g., readings more than 3 std dev from the mean) or ML to detect faulty readings |
| **4. Visualization**                    | Show sensor values over time, highlight faults                             |
| **5. Filtering & Export**               | Allow filtering by sensor, time, or fault type — export to CSV or Power BI |
| **6. (Bonus) Fault Prediction**         | Use a trained ML model to predict fault likelihood                         |



--------------------------------------------------------------------------
**🛠️ Real-Life Analogy**

Imagine we’re monitoring sensors inside a train (track pressure, signal relay voltage, door temperature).

**1**. We want to **automatically find abnormal readings** (e.g., voltage drops, temperature spikes).

**2**. Then we **classify or predict** which sensor might fail soon, **before** the failure happens.

---------------------------------
**✅ What we’ll Learn/Showcase**

- Real-world **time-series data handling**
- Use of **rule-based or ML-based anomaly detection**
- *Streamlit app development*
- Data filtering, fault visualization, and **dashboard deployment**
- Optional: ML model training and integration (.pkl)

------------------------------------------------------------
# ✅ Phase 1: Set Up Your Project Folder Structure

Sensor_Fault_Detector/
├── data/
│   └── sensor_fault_data.csv      ← sample data file
├── streamlit_app/
│   └── app.py                     ← Streamlit dashboard app
├── utils/
│   └── data_processing.py         ← data cleaning and fault detection functions
├── models/
│   └── model.pkl                  ← trained ML model (optional)
├── requirements.txt
├── README.md
└── .gitignore


**✅ Phase 2:** Add the Dataset

Place the generated `sensor_fault_data.csv` file in the `data/` folder.

**✅ Phase 3:** Create Streamlit App Skeleton (app.py)

Let’s create a basic working Streamlit app:

📍 File: streamlit_app/app.py
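A minimal skeleton for this file might look like the sketch below. It is not the original `app.py`; the helper `load_sensor_csv`, the widget labels, and the layout are assumptions. It only uses the Streamlit calls `st.title`, `st.file_uploader`, `st.info`, `st.write`, and `st.dataframe`.

```python
import pandas as pd

def load_sensor_csv(file) -> pd.DataFrame:
    """Read a CSV (path or file-like object) and parse the Timestamp column."""
    return pd.read_csv(file, parse_dates=["Timestamp"])

def main() -> None:
    # Imported here so load_sensor_csv can be reused and tested
    # in environments where Streamlit is not installed.
    import streamlit as st

    st.title("Sensor Fault Detection and Prediction System")
    uploaded = st.file_uploader("Upload sensor CSV", type=["csv"])
    if uploaded is None:
        st.info("Upload a CSV with Timestamp, SensorID, and SensorValue columns to begin.")
        return

    df = load_sensor_csv(uploaded)
    st.write(f"Loaded {len(df)} readings from {df['SensorID'].nunique()} sensors.")
    st.dataframe(df.head(50))
```

Streamlit executes the script from top to bottom, so `app.py` would end with a call to `main()`, and the app would be started with `streamlit run streamlit_app/app.py`.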

**✅ Phase 4:** Data Preprocessing & Fault Detection

We'll build a Python utility file to:

1. Load the sensor data
2. Detect faults (e.g., using Z-score method)
3. Return a DataFrame with an added FaultType column
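The three steps above can be sketched as follows. This is one possible version of the utility, assuming the long-format columns from the sample dataset; the labels `Spike` and `Drop`, the extra `ZScore` column, and the default threshold of 3 are assumptions.

```python
import pandas as pd

def load_sensor_data(path) -> pd.DataFrame:
    # Step 1: load the CSV and parse timestamps.
    return pd.read_csv(path, parse_dates=["Timestamp"])

def detect_faults(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    # Step 2: Z-score of each reading relative to its own sensor's mean and std.
    out = df.copy()
    grouped = out.groupby("SensorID")["SensorValue"]
    out["ZScore"] = (out["SensorValue"] - grouped.transform("mean")) / grouped.transform("std")
    # Step 3: add the FaultType column.
    out["FaultType"] = "Normal"
    out.loc[out["ZScore"] > threshold, "FaultType"] = "Spike"
    out.loc[out["ZScore"] < -threshold, "FaultType"] = "Drop"
    return out

# Small demo: 20 steady readings followed by one large spike.
demo = pd.DataFrame({
    "Timestamp": pd.date_range("2024-01-01", periods=21, freq="min"),
    "SensorID": "SENSOR_1",
    "SensorValue": [50.0] * 20 + [100.0],
})
result = detect_faults(demo)
```

Because the mean and standard deviation are computed over the whole series, a single extreme value needs enough surrounding data to stand out; with very few readings no point can reach a Z-score of 3.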

**✅ Phase 5:** Use This in Streamlit (app.py)

Now, go back to app.py and update it like this:

📍 Modified app.py

-----------------------------------------------------------------------

**🚀 Top Feature Additions (to make your project stand out)**

**✅ Level 1: Must-Have Enhancements**

| Feature                     | What it does                               | Why it’s useful                         |
| --------------------------- | ------------------------------------------ | --------------------------------------- |
| 📥 CSV Export               | Download filtered or processed data        | Helps users save their results          |
| ⏳ Date Range Filter         | Filter logs by date (using `Timestamp`)    | Enables time-based fault analysis       |
| ⚠️ Fault Severity Score     | Classify faults into Critical/Warning/Info | Adds business relevance                 |
| 📈 Sensor Trend Line Charts | Line plot for sensor values over time      | Helps spot drifts or anomalies visually |
| 🔄 Refresh Database Button  | Add button to clear/reload DB              | Useful for testing or restarting        |


**✅ Step 1:** Date Range Filter (on Timestamp)

**🎯 Goal:**

Allow users to filter the sensor data by a start and end date using the Timestamp column.

📝 This ensures all downstream charts, tables, and fault summaries will reflect only the selected date range.
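The filtering logic itself is plain pandas and could be sketched like this. The function name `filter_by_date` is hypothetical; in the app, the `start` and `end` values would typically come from a Streamlit `st.date_input` widget, which returns `datetime.date` objects.

```python
from datetime import date

import pandas as pd

def filter_by_date(df: pd.DataFrame, start: date, end: date) -> pd.DataFrame:
    timestamps = pd.to_datetime(df["Timestamp"])
    start_ts = pd.Timestamp(start)
    # Extend the end bound to the next midnight so the end date is inclusive.
    end_ts = pd.Timestamp(end) + pd.Timedelta(days=1)
    return df.loc[(timestamps >= start_ts) & (timestamps < end_ts)]

logs = pd.DataFrame({
    "Timestamp": ["2024-01-01 23:59:00", "2024-01-02 12:00:00", "2024-01-03 00:00:00"],
    "SensorID": ["SENSOR_1"] * 3,
    "SensorValue": [49.5, 51.0, 50.2],
})
selected = filter_by_date(logs, date(2024, 1, 2), date(2024, 1, 2))
```

Applying this filter once, right after loading, and passing the result to every chart and table keeps the views consistent with the selected range.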

**✅ Step 2:** Sensor Trend Line Charts

**🎯 Goal:**

Visualize time-series trends of sensor readings to identify drifts, spikes, or patterns.

💡 What this does:

1. Auto-detects sensor columns like Sensor_1, Sensor_2, etc.
2. Lets the user select which sensors to visualize.
3. Plots trend over time using Streamlit’s built-in line_chart().
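A sketch of the data side of this step is shown below. It assumes the long-format data is first pivoted to one column per sensor; the helper name `detect_sensor_columns` and the `sensor_` prefix rule are assumptions.

```python
import pandas as pd

def detect_sensor_columns(wide: pd.DataFrame, prefix: str = "sensor_") -> list:
    # Case-insensitive match on the column name prefix (Sensor_1, SENSOR_2, ...).
    return [col for col in wide.columns if str(col).lower().startswith(prefix)]

long_df = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:00",
                                 "2024-01-01 00:01", "2024-01-01 00:01"]),
    "SensorID": ["Sensor_1", "Sensor_2", "Sensor_1", "Sensor_2"],
    "SensorValue": [50.1, 49.8, 50.4, 49.5],
})
# One column per sensor, indexed by time.
wide = long_df.pivot(index="Timestamp", columns="SensorID", values="SensorValue")
wide["FaultCount"] = 0  # a non-sensor column that should be ignored

options = detect_sensor_columns(wide)
chart_data = wide[options[:1]]  # stand-in for the user's selection
```

In the app, `options` could feed `st.multiselect` and the selected columns could be passed to `st.line_chart`, which plots each column against the DataFrame index.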

**🔍 Level 2: Smart Analysis Features**

| Feature                         | Description                                                                            |
| ------------------------------- | -------------------------------------------------------------------------------------- |
| 📉 **Sensor Drift Detection**   | Detect gradual deviation from normal sensor behavior                                   |
| 🔄 **Compare Two Sensors**      | Side-by-side trend comparison of two sensors                                           |
| 🧠 **ML Prediction (optional)** | Use ML model (e.g. IsolationForest or Logistic Regression) to predict fault likelihood |
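For drift detection, one simple approach (an assumption here, not necessarily the method this project will use) is to compare a rolling mean against a baseline taken from the start of the series. The function name `detect_drift` and all parameter defaults are illustrative.

```python
import numpy as np
import pandas as pd

def detect_drift(values: pd.Series, baseline_size: int = 100,
                 window: int = 50, k: float = 3.0) -> pd.Series:
    # The first `baseline_size` readings define "normal" behavior.
    baseline = values.iloc[:baseline_size]
    base_mean, base_std = baseline.mean(), baseline.std()
    # A rolling mean smooths out noise, so isolated spikes do not count as drift.
    rolling_mean = values.rolling(window, min_periods=window).mean()
    return (rolling_mean - base_mean).abs() > k * base_std

# Synthetic signal: 300 stable readings, then a slow upward ramp of +10 units.
rng = np.random.default_rng(1)
noise = rng.normal(50, 1, 500)
ramp = np.concatenate([np.zeros(300), np.linspace(0, 10, 200)])
drift = detect_drift(pd.Series(noise + ramp))
```

Unlike the Z-score check, this reacts to a sustained shift in level rather than to single outliers, which makes it a useful complement for slowly degrading sensors.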


**🌐 Level 3: Dashboard Polish**

| Feature               | Description                                    |
| --------------------- | ---------------------------------------------- |
| 🌙 Dark Mode Toggle   | Theme switch for user preference               |
| 🗂️ Tabbed Sections   | Use `st.tabs()` for Upload, Analyze, Download  |
| 📷 Export Plot as PNG | Save fault charts directly from UI             |
| 📌 Notes Section      | Add manual annotations for faults from the app |


Let’s pick 3 enhancements to start with:

1. Date Range Filter
2. Sensor Trend Line Chart
3. Fault Severity Classification
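For the third item, severity can be derived from how far a reading deviates. The sketch below maps absolute Z-scores to the Critical/Warning/Info levels; the cut-off values 2, 3, and 4 and the `Normal` fallback are assumptions that would need tuning for real sensors.

```python
import numpy as np
import pandas as pd

def classify_severity(z: pd.Series, info: float = 2.0,
                      warning: float = 3.0, critical: float = 4.0) -> pd.Series:
    abs_z = z.abs()
    # Conditions are checked in order, so the most severe match wins.
    conditions = [abs_z >= critical, abs_z >= warning, abs_z >= info]
    labels = ["Critical", "Warning", "Info"]
    return pd.Series(np.select(conditions, labels, default="Normal"), index=z.index)

severity = classify_severity(pd.Series([0.5, -2.4, 3.1, -5.0]))
```

Here the four example scores map to Normal, Info, Warning, and Critical respectively; missing scores (NaN) fall through to `Normal`.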

-----------------------------------------------------------------
**🎤 Project Pitch (30 seconds)**

"I built a Streamlit-based dashboard called Sensor Fault Detection and Prediction System, which simulates real-world sensor monitoring using a custom dataset.

It includes end-to-end data processing: from ingestion, cleaning, fault detection using domain logic, to visualizations like trend lines, fault distribution, and severity analysis.

I also added smart filters like date range and sensor selection, and allowed export of processed data.
This project showcases my skills in Python, pandas, Streamlit, and real-time fault monitoring — blending my instrumentation background with AI/data capabilities.

I designed it to be Git-ready with modular code, clear folder structure, and a user-friendly interface."

✅ Customization Options

If applying for a Data Analyst role, add:

"It simulates live sensor data analysis, mimicking a production monitoring dashboard for anomaly trends and classification."

If applying for AI/ML or Automation roles, add:

"This lays the foundation for integrating ML models for predictive maintenance in future iterations."

🔧 2. Add Unit Test for detect_faults() Function

This function is responsible for labeling faults and severity per sensor reading. We’ll:

- Create a small mock dataset with expected faulty values.
- Validate that it assigns the correct FaultType and Severity.
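A test along these lines might look like the sketch below. The `detect_faults` defined here is only a stand-in so the example runs on its own; in the project the function would presumably be imported from `utils/data_processing.py`, and its real labels and thresholds may differ.

```python
import pandas as pd

def detect_faults(df: pd.DataFrame, warning: float = 3.0, critical: float = 4.0) -> pd.DataFrame:
    # Stand-in implementation (assumption): per-sensor Z-score with two thresholds.
    out = df.copy()
    grouped = out.groupby("SensorID")["SensorValue"]
    z = (out["SensorValue"] - grouped.transform("mean")) / grouped.transform("std")
    out["FaultType"] = "Normal"
    out.loc[z > warning, "FaultType"] = "Spike"
    out.loc[z < -warning, "FaultType"] = "Drop"
    out["Severity"] = "Normal"
    out.loc[z.abs() > warning, "Severity"] = "Warning"
    out.loc[z.abs() > critical, "Severity"] = "Critical"
    return out

def test_detect_faults_classification():
    # Mock dataset: 29 steady readings and one obvious spike at the end.
    df = pd.DataFrame({
        "Timestamp": pd.date_range("2024-01-01", periods=30, freq="min"),
        "SensorID": "SENSOR_1",
        "SensorValue": [50.0] * 29 + [120.0],
    })
    result = detect_faults(df)
    assert result["FaultType"].iloc[-1] == "Spike"
    assert result["Severity"].iloc[-1] == "Critical"
    assert (result["FaultType"].iloc[:-1] == "Normal").all()
```

Saved in a file such as `tests/test_data_processing.py` (a hypothetical path), this can be run with `pytest`.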

✅ You’ve Completed:
1. Improved Unit Test Coverage

test_preprocess_sensor_data_basic ✅ Passed

test_detect_faults_classification ✅ Passed

🔜 Next Step:

2. Add Logging to data_processing.py for Better Debugging

Logging will help track how sensor data is processed, which thresholds triggered faults, and similar details.
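As a rough sketch of what that could look like, the example below adds standard-library `logging` calls to a simplified fault check. The logger name, the message wording, and the `IsFault` column are assumptions for illustration.

```python
import logging

import pandas as pd

logger = logging.getLogger("data_processing")

def detect_faults(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    logger.info("Checking %d readings from %d sensors (threshold=%.1f)",
                len(df), df["SensorID"].nunique(), threshold)
    out = df.copy()
    grouped = out.groupby("SensorID")["SensorValue"]
    z = (out["SensorValue"] - grouped.transform("mean")) / grouped.transform("std")
    out["IsFault"] = z.abs() > threshold
    # One warning per sensor that has at least one flagged reading.
    for sensor_id, count in out.loc[out["IsFault"]].groupby("SensorID").size().items():
        logger.warning("%s: %d reading(s) beyond %.1f standard deviations",
                       sensor_id, count, threshold)
    logger.info("Flagged %d of %d readings", int(out["IsFault"].sum()), len(out))
    return out

# The application entry point (not the utility module) configures the output.
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s: %(message)s")
demo = pd.DataFrame({"SensorID": ["SENSOR_1"] * 21, "SensorValue": [50.0] * 20 + [100.0]})
flagged = detect_faults(demo)
```

Using a named logger in the utility module and calling `basicConfig` only in the entry point keeps the module reusable in tests and other scripts.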

3. 🛡️ Add Input Validation & Error Handling (High Priority)

This step ensures the app gracefully handles:

- Missing required columns
- Empty files
- Wrong formats (e.g. Excel instead of CSV)
- Invalid sensor values

✅ What This Adds:

| Check                     | Result                                 |
| ------------------------- | -------------------------------------- |
| Missing columns           | Stops and shows missing ones           |
| Empty file                | Shows error                            |
| Corrupted or wrong format | Informs user to upload a proper CSV    |
| Unexpected error          | Logs and shows a general error message |
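The checks in the table could be sketched as a single validation function. The exception class `SensorFileError`, the function name, and the message texts are hypothetical; the required column names follow the sample dataset.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["Timestamp", "SensorID", "SensorValue"]

class SensorFileError(ValueError):
    """Raised when an uploaded file cannot be used as sensor data."""

def validate_sensor_csv(file) -> pd.DataFrame:
    try:
        df = pd.read_csv(file)
    except pd.errors.EmptyDataError as exc:
        raise SensorFileError("The uploaded file is empty.") from exc
    except (pd.errors.ParserError, UnicodeDecodeError) as exc:
        raise SensorFileError("The file could not be read as CSV. Please upload a valid .csv file.") from exc

    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise SensorFileError(f"Missing required columns: {', '.join(missing)}")
    if df.empty:
        raise SensorFileError("The file has a header row but no readings.")

    # Anything that cannot be converted to a number counts as invalid.
    bad_values = pd.to_numeric(df["SensorValue"], errors="coerce").isna().sum()
    if bad_values:
        raise SensorFileError(f"{int(bad_values)} row(s) have missing or non-numeric SensorValue entries.")
    return df

good = validate_sensor_csv(io.StringIO(
    "Timestamp,SensorID,SensorValue\n2024-01-01 00:00:00,SENSOR_1,50.0\n"
))
```

In the app, the call would sit inside a `try` block: a `SensorFileError` message can be shown to the user directly, while any other exception is logged and replaced by a general error message.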
