The objective of the projects is to prepare you to apply different machine learning algorithms to real-world tasks and to deepen your understanding of the machine learning workflow. You will learn how to clean your data and how to apply pre-processing, feature engineering, and classification methods.
- **Check for missing values.**
- Map the class labels: transform the class labels from their original string representation (M and B) into integers.
- Use sklearn to scale and transform the data.
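The three pre-processing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the toy DataFrame, the column name `diagnosis`, the mapping M to 1 and B to 0, and the choice of `StandardScaler` are all assumptions, since the text only says that sklearn is used for scaling.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame standing in for the tumor dataset.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "M"],
    "f1": [17.9, 12.4, 11.8, 20.1],
    "f2": [10.4, 15.7, 18.2, 9.9],
})

# 1. Check for missing values: number of NaN entries per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# 2. Map the class labels from strings to integers (assumed encoding).
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

# 3. Scale the feature columns to zero mean and unit variance.
feature_columns = ["f1", "f2"]
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df[feature_columns])
print(scaled_features)
```

In a real workflow the scaler would usually be fitted on the training split only and then applied to the validation split, so that no validation statistics leak into training.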
- `DataFrame.describe()`
Calculates summary statistics such as the percentiles, mean, and standard deviation of the numerical values of a Series or DataFrame.
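As a small illustration of what `describe()` returns, the sketch below runs it on a made-up two-column frame (the column names and values are hypothetical, not taken from the dataset).

```python
import pandas as pd

# Hypothetical numeric columns; the real dataset has many more features.
df = pd.DataFrame({
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [10.0, 20.0, 30.0, 40.0],
})

# One row per statistic: count, mean, std, min, 25%, 50%, 75%, max.
summary = df.describe()
print(summary)
```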
- The f2 and f4 features have a strong positive correlation with the f6, f8, and f9 features.
- The f21 and f22 features have a weak correlation with the f24, f8, and f19 features.
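Correlations like these are typically read off a correlation matrix. The sketch below assumes Pearson correlation (the default of `DataFrame.corr()`) and uses synthetic columns that merely borrow the names `f2`, `f6`, and `f21`; it does not reproduce the dataset's actual values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)

# Synthetic stand-ins: f6 is built to track f2 closely, f21 is independent noise.
df = pd.DataFrame({
    "f2": base,
    "f6": base + rng.normal(scale=0.1, size=200),
    "f21": rng.normal(size=200),
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()
print(corr.round(2))
```

Values close to 1 indicate a strong positive correlation, while values close to 0 indicate a weak one.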
- There are more benign tumor samples than malignant tumor samples.
- Most feature values tend to be higher for malignant tumors than for benign ones.
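Both observations can be checked with a class count and a per-class mean. The frame below is a hypothetical miniature with an assumed `diagnosis` column; only the pattern of the calls is the point.

```python
import pandas as pd

# Hypothetical miniature of the dataset: three benign rows, two malignant rows.
df = pd.DataFrame({
    "diagnosis": ["B", "B", "B", "M", "M"],
    "f1": [11.0, 12.0, 13.0, 18.0, 20.0],
})

# Number of samples per class.
class_counts = df["diagnosis"].value_counts()
print(class_counts)

# Mean of a feature within each class.
mean_by_class = df.groupby("diagnosis")["f1"].mean()
print(mean_by_class)
```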
The data is split into 80% for training and the remaining 20% for validation.
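A minimal sketch of such a split with scikit-learn is shown below. It uses `load_breast_cancer` as a stand-in dataset, and the `random_state` and `stratify` arguments are assumptions; the text does not say how the original split was seeded or whether it was stratified.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Stand-in dataset bundled with scikit-learn (not necessarily the project's data).
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the rows for validation; stratify keeps the class ratio similar.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_val))
```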
- SVM
  - kernel:
    - sigmoid: accuracy score = 0.945054945054945
    - linear: accuracy score = 0.978021978021978
    - rbf: accuracy score = 0.978021978021978
  - gamma:
    - 0.001: accuracy score = 0.9560439560439561
    - 0.0001: accuracy score = 0.7362637362637363
    - 0.01: accuracy score = 0.978021978021978
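A sweep over the kernel and gamma values listed above could look like the following sketch. It assumes scikit-learn's `SVC`, the bundled `load_breast_cancer` data as a stand-in, standard scaling, and the `rbf` kernel for the gamma sweep; none of these details are confirmed by the text, so the printed accuracies will not match the reported ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data and an assumed 80/20 split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)

# Compare kernels with all other settings left at their defaults.
kernel_scores = {}
for kernel in ["sigmoid", "linear", "rbf"]:
    model = SVC(kernel=kernel).fit(X_train_s, y_train)
    kernel_scores[kernel] = accuracy_score(y_val, model.predict(X_val_s))

# Compare gamma values (assumption: with the rbf kernel).
gamma_scores = {}
for gamma in [0.001, 0.0001, 0.01]:
    model = SVC(kernel="rbf", gamma=gamma).fit(X_train_s, y_train)
    gamma_scores[gamma] = accuracy_score(y_val, model.predict(X_val_s))

print(kernel_scores)
print(gamma_scores)
```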
- Decision Tree
  - max_depth:
    - None: accuracy score = 0.9340659340659341
    - 2: accuracy score = 0.9560439560439561
    - 4: accuracy score = 0.945054945054945
  - min_samples_leaf:
    - 10: accuracy score = 0.9560439560439561
    - 6: accuracy score = 0.967032967032967
    - 4: accuracy score = 0.9340659340659341
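The `max_depth` and `min_samples_leaf` values above can be swept in the same way. The sketch assumes scikit-learn's `DecisionTreeClassifier`, the bundled `load_breast_cancer` data as a stand-in, and a fixed `random_state`; the resulting accuracies are therefore illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data and an assumed 80/20 split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Vary the maximum tree depth (None lets the tree grow until leaves are pure).
depth_scores = {}
for max_depth in [None, 2, 4]:
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    depth_scores[max_depth] = accuracy_score(y_val, tree.predict(X_val))

# Vary the minimum number of samples required in each leaf.
leaf_scores = {}
for min_samples_leaf in [10, 6, 4]:
    tree = DecisionTreeClassifier(min_samples_leaf=min_samples_leaf, random_state=0)
    tree.fit(X_train, y_train)
    leaf_scores[min_samples_leaf] = accuracy_score(y_val, tree.predict(X_val))

print(depth_scores)
print(leaf_scores)
```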
- xgboost
  - max_depth:
    - 3: accuracy score = 0.978021978021978
    - 2: accuracy score = 0.967032967032967
    - 4: accuracy score = 0.967032967032967
  - learning_rate:
    - 0.05: accuracy score = 0.978021978021978
    - 0.5: accuracy score = 0.967032967032967
    - 0.10: accuracy score = 0.978021978021978
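The boosting sweep can be sketched in the same style. The code assumes the `xgboost` package's `XGBClassifier`; if that package is not installed, it falls back to scikit-learn's `GradientBoostingClassifier`, which is a different implementation swapped in only so the example still runs. The stand-in dataset, `n_estimators=50`, and the seeds are also assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier as Booster
except ImportError:
    # Fallback (not XGBoost): scikit-learn's own gradient boosting.
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Stand-in data and an assumed 80/20 split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def validation_accuracy(**params):
    """Train one boosted model with the given parameters and score it."""
    model = Booster(n_estimators=50, random_state=0, **params)
    model.fit(X_train, y_train)
    return accuracy_score(y_val, model.predict(X_val))


# Vary the depth of the individual trees.
depth_scores = {d: validation_accuracy(max_depth=d) for d in [3, 2, 4]}

# Vary the learning rate (shrinkage applied to each tree's contribution).
rate_scores = {r: validation_accuracy(learning_rate=r) for r in [0.05, 0.5, 0.10]}

print(depth_scores)
print(rate_scores)
```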
- PCA
  - SVM:
    - 0.90: accuracy score = 0.978021978021978
    - 0.50: accuracy score = 0.9340659340659341
    - 25: accuracy score = 0.978021978021978
  - Decision Tree:
    - 0.90: accuracy score = 0.9340659340659341
    - 25: accuracy score = 0.9340659340659341
    - 0.70: accuracy score = 0.9230769230769231
  - xgboost:
    - 0.90: accuracy score = 0.945054945054945
    - 0.50: accuracy score = 0.9340659340659341
    - 24: accuracy score = 0.9560439560439561
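One plausible reading of the PCA results is that each value was passed as `n_components` to scikit-learn's `PCA`, where a float between 0 and 1 means "keep enough components to explain that fraction of the variance" and an integer means "keep exactly that many components". That interpretation is an assumption, as are the stand-in dataset, the scaling step, and the use of `SVC` as the downstream classifier in this sketch.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (30 features) and an assumed 80/20 split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pca_scores = {}
kept_components = {}
for n_components in [0.90, 0.50, 25]:
    # Scale, project onto the principal components, then classify.
    pipeline = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components, svd_solver="full"),
        SVC(),
    )
    pipeline.fit(X_train, y_train)
    pca_scores[n_components] = accuracy_score(y_val, pipeline.predict(X_val))
    # Number of components actually retained by the fitted PCA step.
    kept_components[n_components] = pipeline.named_steps["pca"].n_components_

print(pca_scores)
print(kept_components)
```

Swapping `SVC()` for a decision tree or a boosted model in the pipeline would give the corresponding comparison for the other two classifiers.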
We applied the Decision Tree, XGBoost, and Support Vector Machine (SVM) algorithms to the Tumor Cancer dataset to predict whether a tumor is malignant or benign. We compared the performance of all the algorithms based on their accuracy values and showed that the XGBoost classifier is the best among them at distinguishing benign from malignant tumors.