# **PREDICTION OF MILLING MACHINE BEHAVIOR**

Dataset chosen: https://www.kaggle.com/datasets/stephanmatzka/predictive-maintenance-dataset-ai4i-2020

In [1]:
import pandas as pd

In [9]:
df = pd.read_csv('../data/raw/ai4i2020.csv', index_col=0)

## Structure of the Synthetic Dataset

### **General Description**
- **Dataset Type**: Synthetic, modeled from an existing milling machine.
- **Dataset Size**: 10,000 data points.
- **Structure**: Each row represents a data point with 14 features (columns).
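
Because the loading cell passes `index_col=0`, the first column of the file becomes the DataFrame index and is no longer counted in `df.shape`. The sketch below illustrates that effect on a tiny in-memory CSV. The three-column header and the row values are made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt; header and values are illustrative only.
csv_text = "UID,Product ID,Type\n1,M00001,M\n2,L00002,L\n"

# Same call pattern as the loading cell: first column becomes the index.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without index_col, the identifier stays a regular column.
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.shape)     # (2, 2)
print(without_index.shape)  # (2, 3)
```

Applied to the full file, the same reasoning suggests that `df` holds one column fewer than the 14 listed here, since the identifier sits in the index.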

---

### **Features (Columns)**

1. **UID (Unique Identifier)**:
   - *Range*: 1 to 10,000.
   - *Description*: Unique identifier for each data point.

2. **Product ID (Product Identifier)**:
   - *Format*: Letter (L, M, H) + Serial number.
     - **L**: Low quality (50% of products).
     - **M**: Medium quality (30% of products).
     - **H**: High quality (20% of products).
   - *Description*: Identifies the quality variant of the product and its serial number.

3. **Type (Product Type)**:
   - *Values*: L, M, H.
   - *Description*: Represents the product quality (low, medium, high).

4. **Air Temperature [K]**:
   - *Generation*: Random walk process, normalized to a standard deviation of 2 K around 300 K.
   - *Description*: Air temperature in Kelvin.

5. **Process Temperature [K]**:
   - *Generation*: Random walk process, normalized to a standard deviation of 1 K and added to the air temperature plus 10 K.
   - *Description*: Process temperature in Kelvin.

6. **Rotational Speed [rpm]**:
   - *Calculation*: Derived from a power of 2860 W, with normally distributed noise.
   - *Description*: Rotational speed in revolutions per minute (rpm).

7. **Torque [Nm]**:
   - *Distribution*: Values normally distributed around 40 Nm with a standard deviation of 10 Nm.
   - *Constraint*: No negative values.
   - *Description*: Torque in Newton-meters (Nm).

8. **Tool Wear [min]**:
   - *Calculation*: Depends on product quality:
     - **H**: Adds 5 minutes of wear.
     - **M**: Adds 3 minutes of wear.
     - **L**: Adds 2 minutes of wear.
   - *Description*: Tool wear time in minutes.

9. **Machine Failure**:
   - *Values*: 0 (no failure) or 1 (failure).
   - *Description*: Label indicating whether the machine has failed at that data point due to any of the failure modes.
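
The generation rules above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure used to build the real dataset: the function name `simulate_features`, the snake_case column names, the seed, and the use of clipping to enforce non-negative torque are all hypothetical choices. Rotational speed is left out because the description does not give enough detail to reproduce it, and tool wear is shown only as the per-process increment, without modeling tool replacement.

```python
import numpy as np
import pandas as pd


def simulate_features(n=10_000, seed=0):
    """Sketch of the described feature generation (assumptions noted inline)."""
    rng = np.random.default_rng(seed)

    def scaled_walk(std):
        # Gaussian random walk, then centered and rescaled to the target std.
        walk = np.cumsum(rng.normal(size=n))
        return (walk - walk.mean()) / walk.std() * std

    # Quality variants with the stated 50/30/20 split.
    quality = rng.choice(["L", "M", "H"], size=n, p=[0.5, 0.3, 0.2])

    air = 300.0 + scaled_walk(2.0)           # std 2 K around 300 K
    process = air + 10.0 + scaled_walk(1.0)  # air temperature + 10 K + walk with std 1 K

    # Normal around 40 Nm with std 10 Nm; clipping at 0 is an assumed way
    # to satisfy the "no negative values" constraint.
    torque = np.clip(rng.normal(40.0, 10.0, size=n), 0.0, None)

    # Minutes of wear added per process, by quality variant.
    wear_added = pd.Series(quality).map({"H": 5, "M": 3, "L": 2}).to_numpy()

    return pd.DataFrame({
        "type": quality,
        "air_temp_K": air,
        "process_temp_K": process,
        "torque_Nm": torque,
        "wear_added_min": wear_added,
    })


sim = simulate_features(n=1_000, seed=1)
print(sim.describe())
```

Comparing `sim.describe()` with `df.describe()` on the real data is one quick way to see how closely these simplified rules match the published columns.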

---

### **Independent Failure Modes**

1. **Tool Wear Failure (TWF)**:
   - *Condition*: The tool fails or is replaced at a randomly selected tool wear time between 200 and 240 minutes.

2. **Heat Dissipation Failure (HDF)**:
   - *Condition*: Both of the following hold at the same time:
     - Difference between air temperature and process temperature < 8.6 K.
     - Rotational speed < 1380 rpm.

3. **Power Failure (PWF)**:
   - *Condition*: Power (product of torque and rotational speed in rad/s) is outside the range [3500 W, 9000 W].

4. **Overstrain Failure (OSF)**:
   - *Condition*: The product of tool wear and torque exceeds

5. **Random Failures (RNF)**:
   - *Condition*: Each process has a 0.1% chance of failing regardless of parameters.
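
The two fully deterministic rules in this list, HDF and PWF, can be expressed as boolean masks. The sketch below is illustrative only: the function name `flag_failures`, the snake_case column names, and the four example rows are assumptions rather than part of the dataset. TWF and RNF are omitted because they involve randomness, and OSF is omitted because its threshold is not given here.

```python
import numpy as np
import pandas as pd


def flag_failures(frame):
    """Return HDF and PWF flags derived from the stated conditions."""
    out = pd.DataFrame(index=frame.index)

    # HDF: temperature difference below 8.6 K AND speed below 1380 rpm.
    delta_t = frame["process_temp_K"] - frame["air_temp_K"]
    out["HDF"] = (delta_t < 8.6) & (frame["speed_rpm"] < 1380)

    # PWF: power = torque * angular speed, with rpm converted to rad/s.
    omega = frame["speed_rpm"] * 2 * np.pi / 60
    out["power_W"] = frame["torque_Nm"] * omega
    out["PWF"] = (out["power_W"] < 3500) | (out["power_W"] > 9000)
    return out


# Hypothetical operating points chosen to exercise each branch.
example = pd.DataFrame({
    "air_temp_K":     [300.0, 302.0, 300.0, 300.0],
    "process_temp_K": [310.0, 310.0, 310.0, 310.0],
    "speed_rpm":      [1500, 1300, 2800, 1400],
    "torque_Nm":      [40.0, 50.0, 10.0, 70.0],
})

flags = flag_failures(example)
print(flags)
```

In this example the second row trips HDF only, the third row falls below 3500 W, and the fourth exceeds 9000 W. Flags computed this way can be compared against the dataset's own failure columns as a sanity check.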