# Workshop: Predicting FabLab Equipment Usage and Maintenance Needs

**Objective:**  
In this workshop, you will apply regression and classification algorithms to predict equipment usage patterns and identify maintenance needs in a FabLab.

**Scenario:**  
FabLabs rely on various types of equipment, such as 3D printers, CNC machines, and laser cutters. Each machine logs usage data, including operating hours, frequency of use, and the number of errors or maintenance issues. Your task is to build models to:
- Predict the daily operating hours of each machine (regression).
- Classify machines into high-maintenance or low-maintenance categories (classification).

## 1. Data Preparation

Load the dataset, handle missing values, encode categorical features, and scale numerical data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Load data
df = pd.read_csv('./data/fablab_equipment_usage.csv')
df.head()
```

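Impute missing values (if any), one-hot encode `machine_type`, and standardize the numeric features. These steps are wrapped in a `ColumnTransformer` so that each model pipeline fits them on its own training data only.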

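```python
from sklearn.impute import SimpleImputer

# Preprocessing pipeline (used as-is for classification, where daily hours are a valid feature)
numeric_features = ['daily_operating_hours', 'error_frequency', 'days_since_last_maintenance', 'total_operating_hours']
categorical_features = ['machine_type']

# Numeric columns: fill gaps with the median, then standardize
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode
# (drop='first' avoids a redundant dummy column for the linear models)
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(drop='first'))
])

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', categorical_transformer, categorical_features)
])

# No global split here: each task below creates its own train/test split,
# and the preprocessor is fit inside each model pipeline on training data only.
```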
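To sanity-check the preprocessing recipe before pointing it at the real CSV, the sketch below applies the same steps (median imputation and scaling for numeric columns, most-frequent imputation and one-hot encoding for `machine_type`) to a tiny hand-made frame. The rows and the category names `3d_printer`, `cnc`, and `laser_cutter` are hypothetical, and the imputation strategies are reasonable defaults rather than something the dataset dictates. `sparse_threshold=0` forces a dense output so the result is easy to inspect.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy rows using the notebook's column names; all values are made up.
toy = pd.DataFrame({
    'machine_type': pd.Series(['3d_printer', 'cnc', 'laser_cutter', np.nan], dtype=object),
    'error_frequency': [2.0, np.nan, 5.0, 1.0],
    'days_since_last_maintenance': [10, 40, 25, 5],
    'total_operating_hours': [1200.0, 3400.0, np.nan, 800.0],
})

numeric = ['error_frequency', 'days_since_last_maintenance', 'total_operating_hours']
categorical = ['machine_type']

pre = ColumnTransformer(
    transformers=[
        # Numeric: median-impute, then standardize to zero mean / unit variance.
        ('num', Pipeline([('imputer', SimpleImputer(strategy='median')),
                          ('scaler', StandardScaler())]), numeric),
        # Categorical: most-frequent-impute, then one-hot with the first category dropped.
        ('cat', Pipeline([('imputer', SimpleImputer(strategy='most_frequent')),
                          ('onehot', OneHotEncoder(drop='first'))]), categorical),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

Xt = pre.fit_transform(toy)
print(Xt.shape)  # 3 scaled numeric columns + 2 one-hot columns
```

With three machine types and `drop='first'`, the categorical block contributes two columns, and no `NaN` survives imputation.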


## 2. Regression Task

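Predict `daily_operating_hours` with linear regression and a random forest, and evaluate both on a held-out 20% test split using mean squared error (MSE) and the coefficient of determination (R²). Because `daily_operating_hours` is the target here, it must not also appear among the input features.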

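```python
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Prepare regression data: neither the target nor the classification label is a feature
X_reg = df.drop(['daily_operating_hours', 'high_maintenance'], axis=1)
y_reg = df['daily_operating_hours']

# The shared preprocessor expects 'daily_operating_hours', which is now the target,
# so build a regression-specific preprocessor without it
reg_numeric_features = [c for c in numeric_features if c != 'daily_operating_hours']
reg_preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, reg_numeric_features),
    ('cat', categorical_transformer, categorical_features)
])

# Split (preprocessing is fit inside each pipeline, on the training portion only)
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Regression pipelines; clone() gives each model its own unfitted preprocessor
reg_pipeline_lr = Pipeline(steps=[('preprocessor', clone(reg_preprocessor)),
                                  ('regressor', LinearRegression())])
reg_pipeline_rf = Pipeline(steps=[('preprocessor', clone(reg_preprocessor)),
                                  ('regressor', RandomForestRegressor(random_state=42))])

# Train and evaluate on the held-out test set
for name, model in [('Linear Regression', reg_pipeline_lr), ('Random Forest', reg_pipeline_rf)]:
    model.fit(X_reg_train, y_reg_train)
    preds = model.predict(X_reg_test)
    print(f"{name} -- MSE: {mean_squared_error(y_reg_test, preds):.2f}, R²: {r2_score(y_reg_test, preds):.2f}")
```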
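Both regression metrics are easy to compute by hand, which helps when reading the printed numbers. MSE is the mean of the squared residuals (in squared hours), and R² = 1 - SS_res / SS_tot compares the model's squared error with that of always predicting the mean. The sketch below uses made-up true and predicted daily hours for five machines and checks the hand computation against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true and predicted daily operating hours for five machines.
y_true = np.array([4.0, 6.0, 8.0, 5.0, 7.0])
y_pred = np.array([4.5, 5.5, 7.0, 5.0, 8.0])

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)                   # average squared error, in hours²
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot                        # fraction of variance explained

print(f"MSE: {mse:.2f}, R²: {r2:.2f}")
print(f"sklearn -> MSE: {mean_squared_error(y_true, y_pred):.2f}, R²: {r2_score(y_true, y_pred):.2f}")
```

An R² of 0 means the model does no better than predicting the average daily hours; a negative R² means it does worse.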

## 3. Classification Task

Classify each machine as high-maintenance or low-maintenance using the binary `high_maintenance` label. Evaluate on the held-out test split with accuracy, precision, recall, and F1-score.

```python
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Prepare classification data (high_maintenance is assumed to be binary, with 1 = high)
X_clf = df.drop('high_maintenance', axis=1)
y_clf = df['high_maintenance']

# stratify keeps the high/low ratio the same in the train and test splits
X_clf_train, X_clf_test, y_clf_train, y_clf_test = train_test_split(
    X_clf, y_clf, test_size=0.2, random_state=42, stratify=y_clf)

# Candidate classifiers
clf_models = [
    ('Logistic Regression', LogisticRegression(max_iter=1000)),
    ('KNN', KNeighborsClassifier()),
    ('SVM', SVC())
]

for name, clf in clf_models:
    # clone() gives each pipeline its own unfitted copy of the shared preprocessor
    pipe = Pipeline(steps=[('preprocessor', clone(preprocessor)), ('classifier', clf)])
    pipe.fit(X_clf_train, y_clf_train)
    preds = pipe.predict(X_clf_test)
    print(f"{name} -- Accuracy: {accuracy_score(y_clf_test, preds):.2f}, "
          f"Precision: {precision_score(y_clf_test, preds):.2f}, "
          f"Recall: {recall_score(y_clf_test, preds):.2f}, "
          f"F1: {f1_score(y_clf_test, preds):.2f}")
```
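All four classification metrics derive from the confusion-matrix counts. In a maintenance setting a false negative (a high-maintenance machine that goes unflagged) is often the costlier mistake, so recall deserves particular attention. The sketch below uses made-up labels (1 = high-maintenance) and checks the formulas against scikit-learn.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up labels: 1 = high-maintenance, 0 = low-maintenance.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0])

# For binary 0/1 labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of machines flagged high-maintenance, the share that really are
recall = tp / (tp + fn)     # of truly high-maintenance machines, the share that were flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy={accuracy:.3f} Precision={precision:.3f} Recall={recall:.3f} F1={f1:.3f}")
```

Here two of the four high-maintenance machines are missed, so recall is only 0.5 even though accuracy looks respectable.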

## 4. Model Comparison and Optimization

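- Compare the test-set metrics from the regression and classification tasks above.
- Tune hyperparameters with `GridSearchCV`, running cross-validation on the training split only so the test set stays untouched.
- Choose the best model for each task by its cross-validated score, then report its performance on the test set.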
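A minimal sketch of the tuning step, assuming synthetic data from `make_classification` as a stand-in for the FabLab features, a `StandardScaler` in place of the notebook's `preprocessor`, and an illustrative grid for `SVC`. Pipeline parameters are addressed as `<step name>__<parameter>`, and F1 is used as the selection metric here as one reasonable choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): replace with the FabLab features and labels.
X, y = make_classification(n_samples=300, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

pipe = Pipeline(steps=[('scaler', StandardScaler()), ('classifier', SVC())])

# Illustrative grid; the step name 'classifier' plus '__' selects the SVC's parameters.
param_grid = {
    'classifier__C': [0.1, 1, 10],
    'classifier__kernel': ['linear', 'rbf'],
}

# 5-fold cross-validation on the training split only, selecting by F1.
search = GridSearchCV(pipe, param_grid, cv=5, scoring='f1')
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print(f"CV F1: {search.best_score_:.2f}, test F1: {search.score(X_test, y_test):.2f}")
```

In the notebook, the same pattern should carry over by swapping the scaler step for `('preprocessor', clone(preprocessor))`; for the random-forest regressor, a grid over keys such as `regressor__n_estimators` and `regressor__max_depth` with `scoring='neg_mean_squared_error'` follows the same naming rule.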

## 5. Insights and Recommendations

- Identify the machines most likely to become high-maintenance.
- Suggest maintenance schedules based on the predicted usage and maintenance risk.
- Recommend how to allocate maintenance resources across machines and machine types.
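One simple way to turn predictions into a maintenance queue is to rank machines by their predicted probability of being high-maintenance and service those above a cutoff first. The sketch below is a heuristic under stated assumptions, not a prescribed method: the training data, its labelling rule, the machine IDs, and the 0.5 threshold are all hypothetical, and it uses `LogisticRegression` because it provides `predict_proba` out of the box (`SVC` only does so when built with `probability=True`).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: columns are [error_frequency, days_since_last_maintenance].
rng = np.random.default_rng(0)
X_train = rng.uniform([0, 0], [10, 90], size=(200, 2))
# Made-up labelling rule for this toy data only: many errors or a long gap => high maintenance.
y_train = ((X_train[:, 0] > 6) | (X_train[:, 1] > 60)).astype(int)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hypothetical current state of four machines.
fleet = pd.DataFrame({
    'machine_id': ['printer_1', 'cnc_1', 'laser_1', 'printer_2'],
    'error_frequency': [1.0, 8.5, 3.0, 7.0],
    'days_since_last_maintenance': [10, 20, 75, 70],
})

# Probability of class 1 (high maintenance) for each machine.
features = fleet[['error_frequency', 'days_since_last_maintenance']].to_numpy()
fleet['p_high_maintenance'] = model.predict_proba(features)[:, 1]

# Keep machines at or above the assumed threshold and service the riskiest first.
THRESHOLD = 0.5
schedule = (fleet[fleet['p_high_maintenance'] >= THRESHOLD]
            .sort_values('p_high_maintenance', ascending=False))
print(schedule[['machine_id', 'p_high_maintenance']])
```

Lowering the threshold trades extra inspections for fewer missed high-maintenance machines, so the cutoff is best chosen with the lab's maintenance capacity in mind.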
