# Exercise: Predicting Surface Roughness in Milling

In this exercise, we use **AI-based approaches** to model a manufacturing process.  
The goal is to predict **surface roughness `Rz`** based on:

- Feed `f`
- Cutting depth `ap`
- Cutting width `ae`

> Note: Additional factors like tool wear are ignored in this simplified example.


# Step 1: Loading the Dataset

In this step, you will load the dataset for the milling process.  
**Hint:** The Excel file is called `dataset_roughness.xlsx`.  
Select only the columns: `f`, `ap`, `ae`, `Rz`.



```python
# Import the necessary library to handle Excel files
import pandas as pd

# Load the dataset from 'dataset_roughness.xlsx'
df = __________  # Load the Excel file into a DataFrame

# Select the relevant columns for the analysis
columns = ['f', 'ap', 'ae', 'Rz']
df = df[columns]

# Inspect the data
print(____)  # Show the first rows of the dataset
print(f'Dataset size: {______} datapoints')  # How many datapoints are in the dataset?
```
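For reference, the snippet below runs the same column selection and inspection on a small hand-made DataFrame. The values and the extra `tool` column are invented for illustration only. In the exercise, the DataFrame comes from the Excel file instead (pandas reads Excel files with `pd.read_excel`, which needs an engine such as `openpyxl` installed for `.xlsx` files).

```python
import pandas as pd

# Hypothetical toy data standing in for the Excel file; the extra 'tool'
# column shows that selecting columns drops everything not listed
df = pd.DataFrame({
    'f':    [0.10, 0.10, 0.20, 0.20],
    'ap':   [1.0, 1.0, 2.0, 2.0],
    'ae':   [5.0, 5.0, 10.0, 10.0],
    'tool': ['A', 'A', 'B', 'B'],
    'Rz':   [2.1, 2.3, 4.0, 4.4],
})

# Keep only the three features and the target
columns = ['f', 'ap', 'ae', 'Rz']
df = df[columns]

print(df.head())                              # first rows (up to 5 by default)
print(f'Dataset size: {len(df)} datapoints')  # number of rows
```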



# Step 2: Visualizing the Data

Create a 3D scatter plot of the data.  
Use `f` for x-axis, `ap` for y-axis, `Rz` for z-axis, and color the points by `ae`.

**Hint:** Use a color map like `Blues` to visualize cutting width.




```python
# Import the library for plotting
import _________ as plt  # Commonly used plotting library

# Visualize the dataset in 3D
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
graph = ax.scatter(_____, _____, _____, c=_____, cmap='____')  # Fill in: x-, y-, z-axis values, color values (passed as c=) and colormap
cbar = fig.colorbar(graph, ax=ax)

ax.set_xlabel('__________')  # x-axis label
ax.set_ylabel('__________')  # y-axis label
ax.set_zlabel('__________')  # z-axis label
cbar.set_label('__________')  # colorbar label

plt.show()
```
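The sketch below draws the same kind of colored 3D scatter plot on synthetic data. The parameter ranges and the relationship used to generate `rz` are made up and do not come from the exercise dataset. Note that the color values are passed with the keyword `c=`: the fourth positional argument of the 3D `scatter` method is `zdir`, not the colors.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic parameter values, assumed for illustration only
rng = np.random.default_rng(0)
f = rng.uniform(0.05, 0.3, 30)
ap = rng.uniform(0.5, 3.0, 30)
ae = rng.uniform(2.0, 12.0, 30)
rz = 10 * f + 0.5 * ap + rng.normal(0, 0.2, 30)  # made-up relationship

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# x = feed, y = cutting depth, z = roughness, color = cutting width
graph = ax.scatter(f, ap, rz, c=ae, cmap='Blues')
cbar = fig.colorbar(graph, ax=ax)

ax.set_xlabel('Feed f')
ax.set_ylabel('Cutting depth ap')
ax.set_zlabel('Surface roughness Rz')
cbar.set_label('Cutting width ae')

plt.show()
```

The colorbar gets its own axes, so the figure ends up with two axes: the 3D plot and the colorbar.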


# Step 3: Data Aggregation

The dataset contains repeated measurements for the same parameter combinations.  
Aggregate the data using the **mean** to have one reference value per combination.





```python
# Aggregate repeated measurements using the mean
df_agg = df.groupby(['f', 'ap', 'ae'])['Rz']._____('____').reset_index()  # Use an aggregation function to combine repeated measurements

print(f'Aggregated dataset size: {_____} datapoints')  # How many datapoints are in the aggregated dataset?

# Visualize the aggregated data in 3D (same way as in Step 2)
```
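To illustrate what the aggregation does, the example below applies `groupby` with a mean to a tiny hypothetical dataset containing two parameter combinations, one measured three times and one measured twice. The numbers are invented.

```python
import pandas as pd

# Hypothetical toy data with repeated measurements per combination
df = pd.DataFrame({
    'f':  [0.1, 0.1, 0.1, 0.2, 0.2],
    'ap': [1.0, 1.0, 1.0, 2.0, 2.0],
    'ae': [5.0, 5.0, 5.0, 10.0, 10.0],
    'Rz': [2.0, 2.2, 2.4, 4.0, 4.4],
})

# One row per (f, ap, ae) combination holding the mean Rz of its repetitions;
# reset_index() turns the group keys back into ordinary columns
df_agg = df.groupby(['f', 'ap', 'ae'])['Rz'].agg('mean').reset_index()
print(df_agg)
print(f'Aggregated dataset size: {len(df_agg)} datapoints')  # 2
```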


# Step 4: Data Preparation for Modeling

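Split the dataset into a **training set** (70%) and a **test set** (30%).  
Standardize the input features with `StandardScaler` (zero mean and unit variance per feature), fitting the scaler on the training set only so that no information from the test set leaks into the preprocessing.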




```python
# Import scaler
from sklearn.preprocessing import StandardScaler

# Split dataset into training and test sets (70% training, 30% testing)
data_train = ______  # Use 70% of the data for training, reproducibly (Hint: use .sample(...))
data_test = _______  # Use the remaining 30% of the data for testing (Hint: use .drop(...))

# Separate input features and labels
train_input, train_label = data_train.drop('Rz', axis=1), data_train['__________']  # Input: all columns except the target column; label: the target column
test_input, test_label = data_test.drop('Rz', axis=1), data_test['__________']      # Input: all columns except the target column; label: the target column

print(______)  # Print the size of the training set
print(______)  # Print the size of the test set

# Scale the input features
scaler = StandardScaler()
scaler.fit(train_input)
train_input = _____  # Transform the training input
test_input = _____   # Transform the test input
```
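A minimal sketch of the split-and-scale pattern on a hypothetical 10-row DataFrame with random values follows. It assumes `random_state=42` as the seed; any fixed value makes the split reproducible. Because `drop` removes exactly the sampled index labels, the two sets are disjoint and together cover all rows.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical aggregated data with 10 rows of random values
rng = np.random.default_rng(0)
df_agg = pd.DataFrame({
    'f':  rng.uniform(0.05, 0.3, 10),
    'ap': rng.uniform(0.5, 3.0, 10),
    'ae': rng.uniform(2.0, 12.0, 10),
    'Rz': rng.uniform(1.0, 6.0, 10),
})

# 70/30 split: a fixed random_state makes the sample reproducible,
# and drop() removes the sampled rows by index label
data_train = df_agg.sample(frac=0.7, random_state=42)
data_test = df_agg.drop(data_train.index)

train_input, train_label = data_train.drop('Rz', axis=1), data_train['Rz']
test_input, test_label = data_test.drop('Rz', axis=1), data_test['Rz']
print(train_input.shape, test_input.shape)  # (7, 3) (3, 3)

# Fit the scaler on the training inputs only, then apply it to both sets
scaler = StandardScaler()
scaler.fit(train_input)
train_input = scaler.transform(train_input)
test_input = scaler.transform(test_input)
```

After scaling, each training feature has mean 0. The test features generally do not, because they are transformed with the training statistics.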



# Step 5: Model Setup and Hyperparameter Optimization

Set up an **MLPRegressor** (a multilayer perceptron for regression) and use **GridSearchCV** to optimize its hyperparameters.




```python
# Import necessary libraries
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import _________  # Class for grid search with cross-validation

# Define the neural network model
neuralnet = _____(_______)  # Hint: use ReLU activation, Adam solver, and a high number of iterations (e.g., max_iter=5000)

# Set hyperparameter grid
param_grid = {
    'hidden_layer_sizes': [(10, 10), (25, 25), (50, 50), (100, 100)],
    'alpha': [0.0001, 0.001, 0.01],
    'learning_rate_init': [0.001, 0.01, 0.1]
}

# Set up the grid search
# Hint: use the defined neural network, param_grid, 5-fold CV, scoring='neg_root_mean_squared_error',
# and parallel processing on all available cores
# Note: cv, scoring and n_jobs are keyword-only arguments of GridSearchCV
gridsearch_cv = GridSearchCV(
    estimator=_______,
    param_grid=_______,
    cv=_______,
    scoring=_______,
    n_jobs=_______
)

gridsearch_cv.fit(____, _____)  # Fit the grid search on the training data
```
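The sketch below shows the same grid-search setup on synthetic, already-standardized inputs with a made-up linear target. It uses a deliberately smaller grid and `max_iter=2000` so that it runs quickly; these values, and `random_state=0`, are assumptions for illustration and not the exercise settings. `activation='relu'` and `solver='adam'` are also scikit-learn's defaults for `MLPRegressor`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic standardized inputs and a made-up target, for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 60)

neuralnet = MLPRegressor(activation='relu', solver='adam',
                         max_iter=2000, random_state=0)

# Small grid to keep the example fast (4 combinations x 5 folds = 20 fits)
param_grid = {
    'hidden_layer_sizes': [(10, 10), (25, 25)],
    'alpha': [0.0001, 0.01],
}

# cv, scoring and n_jobs must be passed as keywords
gridsearch_cv = GridSearchCV(
    estimator=neuralnet,
    param_grid=param_grid,
    cv=5,
    scoring='neg_root_mean_squared_error',
    n_jobs=-1,  # use all available CPU cores
)
gridsearch_cv.fit(X, y)

print(gridsearch_cv.best_params_)
# The scorer is a negated RMSE (higher is better), so flip the sign to report it
print(f'Best CV RMSE: {-gridsearch_cv.best_score_:.3f}')
```

With the default `refit=True`, the grid search retrains the best configuration on all the data passed to `fit` and exposes it as `best_estimator_`.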



# Step 6: Evaluating the Model

Evaluate the **best model from GridSearchCV** using the **RMSE** on the test set.  
Compare this value with the best cross-validated RMSE that the grid search obtained on the training data.

**Questions:**  
1. What does RMSE indicate in a regression task?  
2. Why might the test RMSE be higher than the training RMSE?


```python
# Import the MSE metric and numpy
from sklearn.metrics import _________  # Mean squared error metric (RMSE is its square root)
import numpy as np

# Use the best model from grid search
best_model = gridsearch_cv.__________  # Attribute holding the best estimator

# Use the best model to predict the test set
test_prediction = best_model.__________(__________)  # Method to generate predictions

# Compute RMSE
test_rmse = np.__________(__________(__________, __________))  # Hint: first compute the mean squared error between test labels and predictions, then take the square root
print(f'Test set RMSE: {__________}')  # Print the computed metric

# Print best grid search results
print(f'Best grid search RMSE: {-gridsearch_cv.__________}')  # Print the attribute with the best score (negated, since the scorer returns negative RMSE)
print(f'Best hyperparameters: {gridsearch_cv.__________}')    # Print the attribute with the best parameters
```
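As a small worked example related to question 1, the snippet below computes the RMSE by hand and with scikit-learn's `mean_squared_error` on three invented label/prediction pairs. The RMSE is the square root of the mean squared difference between labels and predictions. It is expressed in the same unit as the target (here, that of `Rz`), and squaring makes large errors weigh more than small ones.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up labels and predictions: only the last prediction is off, by 2
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])

# RMSE = sqrt(mean((y_true - y_pred)^2)) = sqrt(4 / 3)
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse_manual, rmse_sklearn)  # both approximately 1.155
```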

