Description
- Preprocessing Data
• Missing Values:
na_percentage <- sapply(data, function(x) mean(is.na(x)))
data_clean <- data[, na_percentage < 0.9] # Keep only columns with less than 90% NA
• Remove Non-Predictive Columns:
library(dplyr) # provides %>% and select()
data_clean <- data_clean %>%
  select(-c(X, user_name, raw_timestamp_part_1, raw_timestamp_part_2,
            cvtd_timestamp, new_window, num_window))
• Variable Conversion:
data_clean$classe <- as.factor(data_clean$classe)
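The missing-value filter above can be tried on a small data frame first. This is only a sketch: the `toy` data frame and its values are made up for illustration and are not part of the project data.

```r
# Hypothetical toy data: column b is entirely missing, column c has one NA.
toy <- data.frame(
  a = 1:10,
  b = rep(NA_real_, 10),   # 100% missing
  c = c(NA, 2:10)          # 10% missing
)

# Fraction of NA values per column, same idea as na_percentage above.
na_fraction <- sapply(toy, function(x) mean(is.na(x)))

# Keep only columns whose NA fraction is below 0.9.
# drop = FALSE keeps the result a data frame even if one column survives.
toy_clean <- toy[, na_fraction < 0.9, drop = FALSE]

names(toy_clean)
```

Only the columns below the threshold remain, so `b` is dropped while `a` and `c` are kept.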
- Feature Selection and Data Division
• Data Partition (70% training, 30% testing):
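library(caret) # provides createDataPartition(), train(), confusionMatrix()
set.seed(123) # make the split reproducible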
trainIndex <- createDataPartition(data_clean$classe, p = 0.7, list = FALSE)
training <- data_clean[trainIndex, ]
testing <- data_clean[-trainIndex, ]
- Model Building with Random Forest
• Model Training with 5-Fold Cross-Validation:
trControl <- trainControl(method = "cv", number = 5)
model_rf <- train(classe ~ .,
data = training,
method = "rf",
trControl = trControl,
verbose = FALSE)
• Model Results (5-fold cross-validation on the training set):
Cross-Validated Accuracy: 0.9923
Out-of-Sample Error Estimate: 0.77%
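To make the 5-fold idea concrete, here is a minimal base-R sketch of what cross-validation does: split the rows into five folds, hold each fold out once, and average the held-out accuracy. It is not the Random Forest run above. It assumes the built-in `iris` data and a toy nearest-centroid classifier (`nearest_centroid` is a hypothetical helper written for this illustration) so that it runs without extra packages.

```r
# Manual 5-fold cross-validation on the built-in iris data (illustration only).
set.seed(123)
k <- 5
# Assign every row to one of k folds at random, with equal fold sizes.
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

# Toy classifier (assumption, not the model above): predict the class whose
# feature centroid is closest in squared Euclidean distance.
nearest_centroid <- function(train, test) {
  centroids <- aggregate(. ~ Species, data = train, FUN = mean)
  x <- as.matrix(test[, 1:4])
  cm <- as.matrix(centroids[, -1])
  d <- sapply(seq_len(nrow(cm)), function(j) rowSums(sweep(x, 2, cm[j, ])^2))
  as.character(centroids$Species)[max.col(-d)]
}

# Train on k - 1 folds, score on the held-out fold, repeat for each fold.
fold_accuracy <- sapply(seq_len(k), function(i) {
  held_out <- iris[folds == i, ]
  pred <- nearest_centroid(iris[folds != i, ], held_out)
  mean(pred == as.character(held_out$Species))
})

cv_accuracy <- mean(fold_accuracy)  # cross-validated accuracy
cv_error <- 1 - cv_accuracy         # estimated out-of-sample error
```

With caret, `trainControl(method = "cv", number = 5)` asks `train()` to perform this kind of resampling internally, so no manual loop is needed.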
- Model Validation and Evaluation
• Prediction on Testing Set:
predictions <- predict(model_rf, newdata = testing)
confusionMatrix(predictions, testing$classe)
• Evaluation Results:
Accuracy : 0.993
95% CI : (0.991, 0.994)
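The accuracy and interval reported above come from the confusion matrix. As a rough sketch of the arithmetic, accuracy is the sum of the diagonal divided by the total count, and an exact binomial interval can be obtained with base R's `binom.test`, which is the approach caret's documentation describes for `confusionMatrix`. The 3-class table below is hypothetical and is not the project's result.

```r
# Hypothetical confusion table (rows = predicted class, columns = actual class).
cm <- matrix(c(50,  2,  0,
                1, 45,  3,
                0,  3, 46),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("A", "B", "C"),
                             Reference  = c("A", "B", "C")))

correct  <- sum(diag(cm))   # predictions that match the actual class
total    <- sum(cm)         # all predictions
accuracy <- correct / total

# Exact (Clopper-Pearson) 95% confidence interval for the accuracy.
ci <- binom.test(correct, total)$conf.int
```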
- Out-of-Sample Error Estimation
• Estimated using cross-validation during model training:
model_rf$results$Accuracy # Shows the cross-validated accuracy for each tuning setting tried
• Out-of-Sample Error = 1 - Accuracy = 1 - 0.9923 = 0.0077
• Conclusion:
Model Built: Random Forest with 99.23% cross-validated accuracy on the training data and 99.3% accuracy on the testing set.
Error Estimation: Out-of-sample error estimated at 0.77% using cross-validation; the held-out testing set gives a consistent 0.7% (1 - 0.993).
Prediction Quality: The model was highly accurate in classifying the five activity classes (A-E).
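The error figures can be checked with a few lines of arithmetic. The sketch below only reuses the two accuracy values reported above; the variable names are chosen for illustration.

```r
# Accuracies as reported above.
cv_accuracy   <- 0.9923  # 5-fold cross-validation on the training set
test_accuracy <- 0.993   # held-out testing set

# Out-of-sample error is one minus accuracy.
cv_error   <- 1 - cv_accuracy
test_error <- 1 - test_accuracy

# Express both as percentages.
round(100 * c(cv = cv_error, test = test_error), 2)
```

The two estimates are about 0.77% and 0.7%, which suggests the cross-validated figure was a reasonable guide to performance on unseen data.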