TFAIRCHI/machine_failure_api

Machinery Failure Prediction API

Overall Objective

This project builds and serves a machine learning API that predicts the probability of machinery failure from operating sensor measurements.

The workflow has two connected parts:

  1. asset_failure.ipynb trains, evaluates, and saves a machine failure prediction model.
  2. app.py loads the saved .pkl model and exposes it through a FastAPI application.

The final goal is to make the trained model easy to use from another application, dashboard, script, or automated maintenance workflow.

Project Files

  • ai4i2020.csv: Source dataset used to train the model.
  • asset_failure.ipynb: Jupyter notebook that prepares the data, trains the model, evaluates performance, and saves the model pipeline.
  • xgb_smote_pipeline.pkl: Saved machine learning pipeline used by the API for predictions.
  • app.py: FastAPI application that serves the trained model.
  • testapi.py: Small Python script that sends a sample prediction request to the API.
  • requirements.txt: Python dependencies needed to run the notebook and API.

What the API Predicts

The API predicts whether a machine is likely to fail based on five operating conditions:

  • Air temperature in Kelvin
  • Process temperature in Kelvin
  • Rotational speed in RPM
  • Torque in Newton-meters
  • Tool wear in minutes

For each request, the API returns:

  • prediction: 1 for predicted machine failure, 0 for no predicted machine failure
  • failure_probability: probability that the machine will fail
  • prediction_label: readable label for the prediction

Setup

Create and activate a virtual environment, then install the dependencies. The activation command below is for Windows PowerShell; on macOS or Linux, use source .venv/bin/activate instead.

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Run the API

Start the FastAPI application with Uvicorn:

uvicorn app:app --reload

The API will run locally at:

http://127.0.0.1:8000

FastAPI also provides interactive API documentation at:

http://127.0.0.1:8000/docs

API Endpoints

Health Check

GET /health

Example response:

{
  "status": "ok"
}

Prediction Endpoint

POST /predict

Example request body:

{
  "Air temperature [K]": 300,
  "Process temperature [K]": 310,
  "Rotational speed [rpm]": 1500,
  "Torque [Nm]": 40,
  "Tool wear [min]": 10
}

Example response:

{
  "prediction": 0,
  "failure_probability": 0.0234,
  "prediction_label": "No Machine Failure"
}

The exact probability may change if the model is retrained.

Test the API

After starting the API, run:

python testapi.py

The script sends a sample machine operating profile to http://127.0.0.1:8000/predict and prints the API response.
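The contents of testapi.py are not reproduced in this README; a minimal equivalent using only the standard library (the actual script may use the requests package instead) could look like:

```python
import json
import urllib.request

# Local development URL used by the API examples in this README.
API_URL = "http://127.0.0.1:8000/predict"

def build_payload() -> dict:
    # Sample operating profile using the API's expected field aliases.
    return {
        "Air temperature [K]": 300,
        "Process temperature [K]": 310,
        "Rotational speed [rpm]": 1500,
        "Torque [Nm]": 40,
        "Tool wear [min]": 10,
    }

def send_prediction_request() -> dict:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires the API to be running locally first.
    print(send_prediction_request())
```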

How the Model Was Built

The model is built in asset_failure.ipynb and saved as xgb_smote_pipeline.pkl.

1. Import libraries

The notebook imports tools for:

  • Loading and manipulating data with pandas
  • Splitting data into training and test sets with train_test_split
  • Building a preprocessing and modeling pipeline
  • Handling missing values with SimpleImputer
  • Addressing class imbalance with SMOTE
  • Training an XGBClassifier
  • Evaluating the model with accuracy, classification report, ROC curve, and AUC
  • Saving the final pipeline with joblib

This keeps the full model-building process reproducible from data loading through model export.

2. Load the dataset

df = pd.read_csv('./ai4i2020.csv')

The notebook loads ai4i2020.csv, which contains machine operating measurements and failure labels.

3. Define features and target

X = df.drop(columns=['UDI', 'Product ID', 'Machine failure', 'TWF', 'HDF', 'PWF', 'OSF', 'RNF'])
y = df['Machine failure']

The target variable is Machine failure.

The notebook removes:

  • UDI: an identifier column that does not describe machine behavior
  • Product ID: a product identifier that is not a direct operating measurement
  • Machine failure: the target column, which must not be included as an input feature
  • TWF, HDF, PWF, OSF, RNF: specific failure-mode flags

Removing the specific failure-mode flags is important because the API is designed to predict overall machine failure from operating conditions, not from columns that already describe failure events.

The final model uses these numeric input features:

numeric_features = [
    'Air temperature [K]',
    'Process temperature [K]',
    'Rotational speed [rpm]',
    'Torque [Nm]',
    'Tool wear [min]'
]

These are the same fields required by the API.

4. Build preprocessing

preprocessor = ColumnTransformer(
    transformers=[
        ('num', SimpleImputer(strategy='median'), numeric_features)
    ]
)

The preprocessing step fills missing numeric values using the median.

Median imputation is useful because it is less sensitive to extreme values than mean imputation. This is a practical choice for machinery data, where unusual readings can occur and should not overly influence how missing values are filled.

ColumnTransformer is used so preprocessing is tied directly to the expected feature columns. This makes the training workflow and API prediction workflow consistent.
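As a quick illustration of that behavior (toy values, not taken from the dataset), median imputation fills a missing reading without being pulled toward an outlier:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# One numeric column with an outlier (100.0) and a missing value.
torque = np.array([[38.0], [40.0], [42.0], [100.0], [np.nan]])

imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(torque)

# The NaN is filled with 41.0, the median of the observed values;
# mean imputation would have used 55.0, pulled up by the outlier.
print(filled.ravel())
```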

5. Handle class imbalance with SMOTE

SMOTE(sampling_strategy=0.2, random_state=42, k_neighbors=5)

Machine failure events are usually much less common than normal operation. If the model were trained directly on the raw class distribution, it could achieve high accuracy by mostly predicting "no failure" while missing many actual failures.

SMOTE creates synthetic examples of the minority class in the training data. This helps the model learn the failure pattern more effectively.

The notebook uses:

  • sampling_strategy=0.2: increases the number of failure examples without forcing a fully balanced dataset
  • random_state=42: makes the training process repeatable
  • k_neighbors=5: controls how nearby minority-class examples are used to create synthetic samples
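With illustrative class counts (not the real dataset figures), the float sampling_strategy translates into a concrete resampling target:

```python
# Illustrative training-set counts, not taken from ai4i2020.csv.
n_majority = 8000   # normal-operation rows
n_minority = 250    # failure rows

# For binary classification, a float sampling_strategy resamples the
# minority class up to sampling_strategy * n_majority.
sampling_strategy = 0.2
target_minority = int(sampling_strategy * n_majority)
synthetic_needed = target_minority - n_minority

print(target_minority)   # failure rows after resampling: 1600
print(synthetic_needed)  # synthetic rows SMOTE must create: 1350
```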

6. Train an XGBoost classifier

XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
    eval_metric='logloss'
)

XGBoost is used because it performs well on structured tabular data and can capture nonlinear relationships between operating conditions and failure risk.

The chosen settings aim to balance predictive performance and generalization:

  • n_estimators=300: builds enough trees to learn useful patterns
  • learning_rate=0.05: uses smaller learning steps to reduce overfitting risk
  • max_depth=4: limits tree complexity
  • subsample=0.8: trains each tree on a sample of rows
  • colsample_bytree=0.8: trains each tree on a sample of features
  • random_state=42: makes results reproducible
  • eval_metric='logloss': uses a probability-aware classification metric during training

7. Combine preprocessing, SMOTE, and model into one pipeline

xgb_smote_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('smote', SMOTE(sampling_strategy=0.2, random_state=42, k_neighbors=5)),
    ('model', XGBClassifier(...))
])

The notebook saves the complete workflow as one pipeline, not just the XGBoost model. Note that a pipeline containing a SMOTE step must be imbalanced-learn's Pipeline, since scikit-learn's Pipeline does not accept resampling steps; SMOTE is applied only during fitting and is skipped at prediction time.

This matters because the API needs to apply the same preprocessing steps used during training. Saving the full pipeline ensures that incoming API data is imputed and transformed correctly before prediction.

8. Split into training and test data

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y
)

The dataset is split into:

  • 80% training data
  • 20% test data

stratify=y preserves the same failure/non-failure ratio in both sets. This is especially important for imbalanced datasets because the test set needs to represent the rare failure class fairly.
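A toy example (synthetic labels, not the project data) shows the effect: both splits keep the original 5% failure rate.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy imbalanced labels: 950 "no failure" (0) and 50 "failure" (1).
y = [0] * 950 + [1] * 50
X = list(range(len(y)))  # stand-in features

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Training set keeps 40 of the 50 failures; test set keeps 10,
# so both preserve the 5% failure ratio.
print(Counter(y_tr))
print(Counter(y_te))
```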

9. Fit and evaluate the model

xgb_smote_pipeline.fit(X_train, y_train)
y_pred = xgb_smote_pipeline.predict(X_test)

The notebook trains the pipeline on the training data and evaluates it on the held-out test data.

Reported test results from the notebook:

Accuracy: 0.978

Class 0 - No Machine Failure:
precision: 0.99
recall:    0.98
f1-score:  0.99

Class 1 - Machine Failure:
precision: 0.64
recall:    0.81
f1-score:  0.71

ROC AUC Score: 0.9727

The recall of 0.81 for the failure class means the model detected many of the actual failures in the test set. For a failure prediction use case, recall is important because missing a real failure can be more costly than flagging a machine for inspection.

The ROC AUC score of about 0.973 shows that the model separates failure and non-failure cases well across different probability thresholds.
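The reported numbers are internally consistent; for example, the failure-class F1 score follows from the listed precision and recall:

```python
# F1 is the harmonic mean of precision and recall.
precision = 0.64
recall = 0.81

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.715, consistent with the reported 0.71
# (precision and recall above are themselves rounded values)
```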

10. Review feature importance

The notebook prints feature importance from the trained XGBoost model:

Feature                    Importance
Torque [Nm]                0.287694
Rotational speed [rpm]     0.286066
Tool wear [min]            0.210199
Air temperature [K]        0.129629
Process temperature [K]    0.086412

This helps explain which machine measurements had the most influence on the model. In this run, torque, rotational speed, and tool wear were the strongest predictors.

11. Save the trained pipeline

joblib.dump(xgb_smote_pipeline, './xgb_smote_pipeline.pkl')

The final trained pipeline is saved as xgb_smote_pipeline.pkl.

This .pkl file contains:

  • The median imputation preprocessing step
  • The trained XGBoost model
  • The feature structure expected at prediction time

The API loads this file with:

model = joblib.load("xgb_smote_pipeline.pkl")

When /predict receives a request, the API converts the request body into a one-row pandas DataFrame, passes it into the saved pipeline, and returns the predicted class plus the failure probability.

How the API Works

app.py defines a FastAPI application and a Pydantic input schema.

The input schema expects the same feature names used during model training:

class AssetInput(BaseModel):
    air_temperature_k: float = Field(..., alias="Air temperature [K]")
    process_temperature_k: float = Field(..., alias="Process temperature [K]")
    rotational_speed_rpm: float = Field(..., alias="Rotational speed [rpm]")
    torque_nm: float = Field(..., alias="Torque [Nm]")
    tool_wear_min: float = Field(..., alias="Tool wear [min]")

Inside the /predict endpoint, the API:

  1. Receives JSON input from the user.
  2. Converts the input into a DataFrame with the exact column names used during training.
  3. Calls model.predict(df) to get the predicted class.
  4. Calls model.predict_proba(df) to get the probability of machine failure.
  5. Returns the prediction, probability, and readable prediction label.
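The exact handler code lives in app.py; a sketch of steps 2 and 5 under the assumptions above (the helper names and the Pydantic v2 model_dump call are illustrative, not taken from the source) could look like:

```python
import pandas as pd

# Column names must match those used during training / the Pydantic aliases.
FEATURE_COLUMNS = [
    "Air temperature [K]",
    "Process temperature [K]",
    "Rotational speed [rpm]",
    "Torque [Nm]",
    "Tool wear [min]",
]

def to_model_frame(payload: dict) -> pd.DataFrame:
    # Step 2: one-row DataFrame with the exact training column names.
    return pd.DataFrame([{col: payload[col] for col in FEATURE_COLUMNS}])

def build_response(prediction: int, failure_probability: float) -> dict:
    # Step 5: prediction, probability, and a readable label.
    label = "Machine Failure" if prediction == 1 else "No Machine Failure"
    return {
        "prediction": prediction,
        "failure_probability": round(failure_probability, 4),
        "prediction_label": label,
    }

# Inside the /predict endpoint the saved pipeline would then be used as:
#   df = to_model_frame(data.model_dump(by_alias=True))
#   prediction = int(model.predict(df)[0])              # step 3
#   probability = float(model.predict_proba(df)[0][1])  # step 4
#   return build_response(prediction, probability)
```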

Example curl Request

The command below uses Windows PowerShell syntax (curl.exe with backtick line continuations); on macOS or Linux, use curl and replace the backticks with backslashes.

curl.exe -X POST "http://127.0.0.1:8000/predict" `
  -H "Content-Type: application/json" `
  -d "{\"Air temperature [K]\":300,\"Process temperature [K]\":310,\"Rotational speed [rpm]\":1500,\"Torque [Nm]\":40,\"Tool wear [min]\":10}"

Important Notes

  • The .pkl file must stay in the same folder as app.py unless the load path is changed.
  • Input JSON field names must match the expected API aliases shown in the example request.
  • The model predicts probability, not certainty. A high probability means higher predicted risk based on the training data.
  • Retraining the notebook can produce a new .pkl file and slightly different prediction probabilities.
  • The current notebook is focused on model development and export. For production use, consider adding model versioning, logging, monitoring, authentication, and threshold tuning based on business risk.
