# ONE WARE - Predictive Maintenance Task 

### The Task

For an overview of the task, please refer to [the README](https://github.com/Friedrich-Mueller/ai_solutions/tree/master/predictive_maintenance).

Since the task is rather comprehensive, I chose to document only my thought process in this notebook and leave out the task's instructions.

### Imports


In [1]:
import numpy as np
import pandas as pd

import torch


import kagglehub
import shutil
from pathlib import Path

from torch.utils.data import Dataset




### Verify CUDA Functionality

In [2]:
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version PyTorch was built with: {torch.version.cuda}")
    print(f"Current GPU: {torch.cuda.get_device_name(0)}")
    
    # Check compute capability
    capability = torch.cuda.get_device_capability(0)
    print(f"GPU Compute Capability: {capability[0]}.{capability[1]}")
else:
    print("No GPU detected. PyTorch will run on CPU.")

PyTorch version: 2.5.1
CUDA available: True
CUDA version PyTorch was built with: 12.1
Current GPU: NVIDIA GeForce RTX 3060 Ti
GPU Compute Capability: 8.6


### Download Dataset 

The dataset of choice is [One Year Industrial Component Degradation](https://www.kaggle.com/datasets/inIT-OWL/one-year-industrial-component-degradation/data).

In [3]:
download_path = Path(
    kagglehub.dataset_download("inIT-OWL/one-year-industrial-component-degradation")
)

source_path = download_path / "oneyeardata"

print(f"Extracting data from: {source_path}")

# Target directory
data_dir = Path("..") / "data"
data_dir.mkdir(parents=True, exist_ok=True)

# Copy files
for item in source_path.iterdir():
    dest = data_dir / item.name
    if item.is_dir():
        shutil.copytree(item, dest, dirs_exist_ok=True)
    else:
        shutil.copy2(item, dest)

print("Dataset successfully copied to:", data_dir.resolve())

Extracting data from: /home/fjunpop/.cache/kagglehub/datasets/inIT-OWL/one-year-industrial-component-degradation/versions/1/oneyeardata
Dataset successfully copied to: /home/fjunpop/ai_solutions/predictive_maintenance/data


### Extract/Engineer the data

In [4]:
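# Master dataset: concatenate all CSV files in sorted (assumed chronological) order
data_dir = Path("..") / "data"
file_list = sorted(data_dir.glob("*.csv"))

all_dfs = []

for file_path in file_list:
    file_df = pd.read_csv(file_path)
    # The operating mode is taken from the last character of the file name (without extension)
    file_df['mode'] = int(file_path.stem[-1])
    all_dfs.append(file_df)

master_df = pd.concat(all_dfs, ignore_index=True)
total_rows = len(master_df)

# Linear health label: 1.0 for the first sample, 0.0 for the last
master_df['health'] = np.linspace(1.0, 0.0, total_rows)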

In [8]:
print("Columns before cleaning:", master_df.columns.tolist())

master_df.drop(columns="timestamp", inplace=True, errors='ignore')

print("Columns after cleaning:", master_df.columns.tolist())

Columns before cleaning: ['timestamp', 'pCut::Motor_Torque', 'pCut::CTRL_Position_controller::Lag_error', 'pCut::CTRL_Position_controller::Actual_position', 'pCut::CTRL_Position_controller::Actual_speed', 'pSvolFilm::CTRL_Position_controller::Actual_position', 'pSvolFilm::CTRL_Position_controller::Actual_speed', 'pSvolFilm::CTRL_Position_controller::Lag_error', 'pSpintor::VAX_speed', 'mode', 'health']
Columns after cleaning: ['pCut::Motor_Torque', 'pCut::CTRL_Position_controller::Lag_error', 'pCut::CTRL_Position_controller::Actual_position', 'pCut::CTRL_Position_controller::Actual_speed', 'pSvolFilm::CTRL_Position_controller::Actual_position', 'pSvolFilm::CTRL_Position_controller::Actual_speed', 'pSvolFilm::CTRL_Position_controller::Lag_error', 'pSpintor::VAX_speed', 'mode', 'health']


### Understand the Data

The "One Year Industrial Component Degradation" dataset is a high-quality, real-world predictive maintenance (PdM) dataset from which the state (health) of a component can be inferred from raw sensor signals. The data describes a cutting blade used in an industrial machine. Because the blade is enclosed in a metal housing and rotates at high speed, it cannot be inspected visually during operation, so predictive maintenance is used to monitor its degradation.

The data covers the state of the blade over a whole year. We assume that at the beginning of the year in which the data was collected, the blade was in some 'best' state: perfect condition, acceptable condition, or something in between. For the sake of the task, we also assume that in the last 5% of the time (i.e. at the end of the year) the blade reaches a state of 'maximum' degradation that requires maintenance.
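
As a minimal sketch of this assumption: with a health label that falls linearly from 1.0 to 0.0 over the timeline, the last 5% of samples are those with health at or below 0.05. The names `MAINTENANCE_THRESHOLD` and `needs_maintenance` are hypothetical and chosen for illustration only, and the 1000-sample timeline is a toy stand-in for the real data.

```python
import numpy as np

# Assumption: 5% threshold taken from the text above; the name is hypothetical.
MAINTENANCE_THRESHOLD = 0.05


def needs_maintenance(health, threshold=MAINTENANCE_THRESHOLD):
    """Return a boolean mask that is True where health is at or below the threshold."""
    return np.asarray(health) <= threshold


# Toy timeline of 1000 samples, health falling linearly from 1.0 to 0.0.
toy_health = np.linspace(1.0, 0.0, 1000)
flags = needs_maintenance(toy_health)

print(f"Flagged samples: {flags.sum()} of {flags.size}")
```

On this toy timeline the flagged samples are exactly the final 5% of the sequence.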


This could be framed as a classification task, where samples are grouped into classes via equal-frequency (equal-depth) binning along the timeline. It could also be framed as a regression task, where each sample is assigned a degradation percentage based on its position on the timeline.
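
The two framings can be sketched side by side on a toy timeline. This is an illustration under assumptions, not the notebook's actual labelling code: the choice of 20 bins (5% each) and the names `position`, `class_label` and `degradation` are hypothetical.

```python
import numpy as np
import pandas as pd

n_samples = 1000
position = np.arange(n_samples)  # index of each sample on the timeline

# Classification target: equal-frequency binning into 20 classes (5% of samples each).
n_bins = 20
class_label = np.asarray(pd.qcut(position, q=n_bins, labels=False))

# Regression target: fraction of the timeline that has elapsed (0.0 = start, 1.0 = end).
degradation = position / (n_samples - 1)

print("Samples per class:", np.bincount(class_label).tolist())
print("Degradation range:", degradation.min(), "to", degradation.max())
```

The class label only says which 5% slice a sample belongs to, while the regression target keeps the exact position within the timeline.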

Because the blade is operated in different 'modes', some modes may be used only rarely. For example, suppose one mode was used only once, near the end of the year. With equal-frequency binning, all samples of that rare mode would land in the last bin, which represents 95-100% degradation and signals a need for maintenance. A classifier could then learn to associate the mode itself with that bin, so if the mode were used early in a following year, the monitoring would immediately suggest maintenance.
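
The concern can be reproduced on synthetic data. In the sketch below, the timeline, the mode numbers and the point at which the rare mode appears are all made up for illustration and do not come from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic timeline: mode 1 is used for most of the year,
# a hypothetical rare mode 8 appears only in the final 3% of samples.
n_samples = 1000
toy = pd.DataFrame({"position": np.arange(n_samples), "mode": 1})
toy.loc[toy["position"] >= 970, "mode"] = 8

# Equal-frequency binning into 20 classes along the timeline.
toy["bin"] = pd.qcut(toy["position"], q=20, labels=False)

# For each mode, which bins does it occur in?
bins_per_mode = toy.groupby("mode")["bin"].agg(["min", "max"])
print(bins_per_mode)
```

Here mode 1 spans every bin, whereas mode 8 occurs only in the last one, so the mode alone would be enough to predict the "needs maintenance" class.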

Therefore, a regression-based solution makes a lot more sense, under the assumption that the blade degrades continuously regardless of mode.