TFAIRCHI/machine_failure_api

Machinery Failure Prediction API

Overall Objective

This project builds and serves a machine learning API that predicts the probability of machinery failure from operating sensor measurements.

The workflow has two connected parts:

  1. asset_failure.ipynb trains, evaluates, and saves a machine failure prediction model.
  2. app.py loads the saved .pkl model and exposes it through a FastAPI application.

The final goal is to make the trained model easy to use from another application, dashboard, script, or automated maintenance workflow.

Project Files

  • ai4i2020.csv: Source dataset used to train the model.
  • asset_failure.ipynb: Jupyter notebook that prepares the data, trains the model, evaluates performance, and saves the model pipeline.
  • xgb_smote_pipeline.pkl: Saved machine learning pipeline used by the API for predictions.
  • app.py: FastAPI application that serves the trained model.
  • testapi.py: Small Python script that sends a sample prediction request to the API.
  • requirements.txt: Python dependencies needed to run the notebook and API.

What the API Predicts

The API predicts whether a machine is likely to fail based on five operating conditions:

  • Air temperature in Kelvin
  • Process temperature in Kelvin
  • Rotational speed in RPM
  • Torque in Newton-meters
  • Tool wear in minutes

For each request, the API returns:

  • prediction: 1 for predicted machine failure, 0 for no predicted machine failure
  • failure_probability: probability that the machine will fail
  • prediction_label: readable label for the prediction

Setup

Create and activate a virtual environment, then install the dependencies. The activation command below is for Windows PowerShell; on macOS or Linux, use source .venv/bin/activate instead.

python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

Run the API

Start the FastAPI application with Uvicorn:

uvicorn app:app --reload

The API will run locally at:

http://127.0.0.1:8000

FastAPI also provides interactive API documentation at:

http://127.0.0.1:8000/docs

API Endpoints

Health Check

GET /health

Example response:

{
  "status": "ok"
}

Prediction Endpoint

POST /predict

Example request body:

{
  "Air temperature [K]": 300,
  "Process temperature [K]": 310,
  "Rotational speed [rpm]": 1500,
  "Torque [Nm]": 40,
  "Tool wear [min]": 10
}

Example response:

{
  "prediction": 0,
  "failure_probability": 0.0234,
  "prediction_label": "No Machine Failure"
}

The exact probability may change if the model is retrained.

Test the API

After starting the API, run:

python testapi.py

The script sends a sample machine operating profile to http://127.0.0.1:8000/predict and prints the API response.
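The contents of testapi.py are not reproduced in this README; a minimal equivalent using only the standard library (the actual script may use the requests package instead) could look like:

```python
import json
import urllib.request

# Local development URL used by the API examples in this README.
API_URL = "http://127.0.0.1:8000/predict"

def build_payload() -> dict:
    # Sample operating profile using the API's expected field aliases.
    return {
        "Air temperature [K]": 300,
        "Process temperature [K]": 310,
        "Rotational speed [rpm]": 1500,
        "Torque [Nm]": 40,
        "Tool wear [min]": 10,
    }

def send_prediction_request() -> dict:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires the API to be running locally first.
    print(send_prediction_request())
```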

How the Model Was Built

The model is built in asset_failure.ipynb and saved as xgb_smote_pipeline.pkl.

1. Import libraries

The notebook imports tools for:

  • Loading and manipulating data with pandas
  • Splitting data into training and test sets with train_test_split
  • Building a preprocessing and modeling pipeline
  • Handling missing values with SimpleImputer
  • Addressing class imbalance with SMOTE
  • Training an XGBClassifier
  • Evaluating the model with accuracy, classification report, ROC curve, and AUC
  • Saving the final pipeline with joblib

This keeps the full model-building process reproducible from data loading through model export.

2. Load the dataset

df = pd.read_csv('./ai4i2020.csv')

The notebook loads ai4i2020.csv, which contains machine operating measurements and failure labels.

3. Define features and target

X = df.drop(columns=['UDI', 'Product ID', 'Machine failure', 'TWF', 'HDF', 'PWF', 'OSF', 'RNF'])
y = df['Machine failure']

The target variable is Machine failure.

The notebook removes:

  • UDI: an identifier column that does not describe machine behavior
  • Product ID: a product identifier that is not a direct operating measurement
  • Machine failure: the target column, which must not be included as an input feature
  • TWF, HDF, PWF, OSF, RNF: specific failure-mode flags

Removing the specific failure-mode flags is important because the API is designed to predict overall machine failure from operating conditions, not from columns that already describe failure events.

The final model uses these numeric input features:

numeric_features = [
    'Air temperature [K]',
    'Process temperature [K]',
    'Rotational speed [rpm]',
    'Torque [Nm]',
    'Tool wear [min]'
]

These are the same fields required by the API.

4. Build preprocessing

preprocessor = ColumnTransformer(
    transformers=[
        ('num', SimpleImputer(strategy='median'), numeric_features)
    ]
)

The preprocessing step fills missing numeric values using the median.

Median imputation is useful because it is less sensitive to extreme values than mean imputation. This is a practical choice for machinery data, where unusual readings can occur and should not overly influence how missing values are filled.

ColumnTransformer is used so preprocessing is tied directly to the expected feature columns. This makes the training workflow and API prediction workflow consistent.
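As a quick illustration of that behavior (toy values, not taken from the dataset), median imputation fills a missing reading without being pulled toward an outlier:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# One numeric column with an outlier (100.0) and a missing value.
torque = np.array([[38.0], [40.0], [42.0], [100.0], [np.nan]])

imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(torque)

# The NaN is filled with 41.0, the median of the observed values;
# mean imputation would have used 55.0, pulled up by the outlier.
print(filled.ravel())
```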

5. Handle class imbalance with SMOTE

SMOTE(sampling_strategy=0.2, random_state=42, k_neighbors=5)

Machine failure events are usually much less common than normal operation. If the model were trained directly on the raw class distribution, it could achieve high accuracy by mostly predicting "no failure" while missing many actual failures.

SMOTE creates synthetic examples of the minority class in the training data. This helps the model learn the failure pattern more effectively.

The notebook uses:

  • sampling_strategy=0.2: increases the number of failure examples without forcing a fully balanced dataset
  • random_state=42: makes the training process repeatable
  • k_neighbors=5: controls how nearby minority-class examples are used to create synthetic samples
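With illustrative class counts (not the real dataset figures), the float sampling_strategy translates into a concrete resampling target:

```python
# Illustrative training-set counts, not taken from ai4i2020.csv.
n_majority = 8000   # normal-operation rows
n_minority = 250    # failure rows

# For binary classification, a float sampling_strategy resamples the
# minority class up to sampling_strategy * n_majority.
sampling_strategy = 0.2
target_minority = int(sampling_strategy * n_majority)
synthetic_needed = target_minority - n_minority

print(target_minority)   # failure rows after resampling: 1600
print(synthetic_needed)  # synthetic rows SMOTE must create: 1350
```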

6. Train an XGBoost classifier

XGBClassifier(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=4,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
    eval_metric='logloss'
)

XGBoost is used because it performs well on structured tabular data and can capture nonlinear relationships between operating conditions and failure risk.

The chosen settings aim to balance predictive performance and generalization:

  • n_estimators=300: builds enough trees to learn useful patterns
  • learning_rate=0.05: uses smaller learning steps to reduce overfitting risk
  • max_depth=4: limits tree complexity
  • subsample=0.8: trains each tree on a sample of rows
  • colsample_bytree=0.8: trains each tree on a sample of features
  • random_state=42: makes results reproducible
  • eval_metric='logloss': uses a probability-aware classification metric during training

7. Combine preprocessing, SMOTE, and model into one pipeline

xgb_smote_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('smote', SMOTE(sampling_strategy=0.2, random_state=42, k_neighbors=5)),
    ('model', XGBClassifier(...))
])

The notebook saves the complete workflow as one pipeline, not just the XGBoost model. Note that a pipeline containing a SMOTE step must be imbalanced-learn's Pipeline, since scikit-learn's Pipeline does not accept resampling steps; SMOTE is applied only during fitting and is skipped at prediction time.

This matters because the API needs to apply the same preprocessing steps used during training. Saving the full pipeline ensures that incoming API data is imputed and transformed correctly before prediction.

8. Split into training and test data

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
    stratify=y
)

The dataset is split into:

  • 80% training data
  • 20% test data

stratify=y preserves the same failure/non-failure ratio in both sets. This is especially important for imbalanced datasets because the test set needs to represent the rare failure class fairly.
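A toy example (synthetic labels, not the project data) shows the effect: both splits keep the original 5% failure rate.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy imbalanced labels: 950 "no failure" (0) and 50 "failure" (1).
y = [0] * 950 + [1] * 50
X = list(range(len(y)))  # stand-in features

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Training set keeps 40 of the 50 failures; test set keeps 10,
# so both preserve the 5% failure ratio.
print(Counter(y_tr))
print(Counter(y_te))
```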

9. Fit and evaluate the model

xgb_smote_pipeline.fit(X_train, y_train)
y_pred = xgb_smote_pipeline.predict(X_test)

The notebook trains the pipeline on the training data and evaluates it on the held-out test data.

Reported test results from the notebook:

Accuracy: 0.978

Class 0 - No Machine Failure:
precision: 0.99
recall:    0.98
f1-score:  0.99

Class 1 - Machine Failure:
precision: 0.64
recall:    0.81
f1-score:  0.71

ROC AUC Score: 0.9727

The recall of 0.81 for the failure class means the model detected many of the actual failures in the test set. For a failure prediction use case, recall is important because missing a real failure can be more costly than flagging a machine for inspection.

The ROC AUC score of about 0.973 shows that the model separates failure and non-failure cases well across different probability thresholds.
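The reported numbers are internally consistent; for example, the failure-class F1 score follows from the listed precision and recall:

```python
# F1 is the harmonic mean of precision and recall.
precision = 0.64
recall = 0.81

f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.715, consistent with the reported 0.71
# (precision and recall above are themselves rounded values)
```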

10. Review feature importance

The notebook prints feature importance from the trained XGBoost model:

Feature                    Importance
Torque [Nm]                0.287694
Rotational speed [rpm]     0.286066
Tool wear [min]            0.210199
Air temperature [K]        0.129629
Process temperature [K]    0.086412

This helps explain which machine measurements had the most influence on the model. In this run, torque, rotational speed, and tool wear were the strongest predictors.

11. Save the trained pipeline

joblib.dump(xgb_smote_pipeline, './xgb_smote_pipeline.pkl')

The final trained pipeline is saved as xgb_smote_pipeline.pkl.

This .pkl file contains:

  • The median imputation preprocessing step
  • The trained XGBoost model
  • The feature structure expected at prediction time

The API loads this file with:

model = joblib.load("xgb_smote_pipeline.pkl")

When /predict receives a request, the API converts the request body into a one-row pandas DataFrame, passes it into the saved pipeline, and returns the predicted class plus the failure probability.

How the API Works

app.py defines a FastAPI application and a Pydantic input schema.

The input schema expects the same feature names used during model training:

class AssetInput(BaseModel):
    air_temperature_k: float = Field(..., alias="Air temperature [K]")
    process_temperature_k: float = Field(..., alias="Process temperature [K]")
    rotational_speed_rpm: float = Field(..., alias="Rotational speed [rpm]")
    torque_nm: float = Field(..., alias="Torque [Nm]")
    tool_wear_min: float = Field(..., alias="Tool wear [min]")

Inside the /predict endpoint, the API:

  1. Receives JSON input from the user.
  2. Converts the input into a DataFrame with the exact column names used during training.
  3. Calls model.predict(df) to get the predicted class.
  4. Calls model.predict_proba(df) to get the probability of machine failure.
  5. Returns the prediction, probability, and readable prediction label.
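The exact handler code lives in app.py; a sketch of steps 2 and 5 under the assumptions above (the helper names and the Pydantic v2 model_dump call are illustrative, not taken from the source) could look like:

```python
import pandas as pd

# Column names must match those used during training / the Pydantic aliases.
FEATURE_COLUMNS = [
    "Air temperature [K]",
    "Process temperature [K]",
    "Rotational speed [rpm]",
    "Torque [Nm]",
    "Tool wear [min]",
]

def to_model_frame(payload: dict) -> pd.DataFrame:
    # Step 2: one-row DataFrame with the exact training column names.
    return pd.DataFrame([{col: payload[col] for col in FEATURE_COLUMNS}])

def build_response(prediction: int, failure_probability: float) -> dict:
    # Step 5: prediction, probability, and a readable label.
    label = "Machine Failure" if prediction == 1 else "No Machine Failure"
    return {
        "prediction": prediction,
        "failure_probability": round(failure_probability, 4),
        "prediction_label": label,
    }

# Inside the /predict endpoint the saved pipeline would then be used as:
#   df = to_model_frame(data.model_dump(by_alias=True))
#   prediction = int(model.predict(df)[0])              # step 3
#   probability = float(model.predict_proba(df)[0][1])  # step 4
#   return build_response(prediction, probability)
```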

Example curl Request

The command below uses Windows PowerShell syntax (curl.exe with backtick line continuations); on macOS or Linux, use curl and replace the backticks with backslashes.

curl.exe -X POST "http://127.0.0.1:8000/predict" `
  -H "Content-Type: application/json" `
  -d "{\"Air temperature [K]\":300,\"Process temperature [K]\":310,\"Rotational speed [rpm]\":1500,\"Torque [Nm]\":40,\"Tool wear [min]\":10}"

Important Notes

  • The .pkl file must stay in the same folder as app.py unless the load path is changed.
  • Input JSON field names must match the expected API aliases shown in the example request.
  • The model predicts probability, not certainty. A high probability means higher predicted risk based on the training data.
  • Retraining the notebook can produce a new .pkl file and slightly different prediction probabilities.
  • The current notebook is focused on model development and export. For production use, consider adding model versioning, logging, monitoring, authentication, and threshold tuning based on business risk.
