Supervised Learning: You have the output column (the target or label) and you train the model on it.
-
Classification: Predict a class label or category (e.g., true/false, spam/ham, cat/dog).
- Logistic Regression (binary or multi-class)
- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- Naive Bayes
- Gradient Boosting Machines (GBM)
- Neural Networks
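The sketch below shows classification with K-Nearest Neighbors in plain Python. The labels in `train_y` play the role of the output column the model learns from. The function name `knn_predict`, the Euclidean distance, `k=3`, and the toy points are illustrative assumptions, not a library API.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a class label for `query` by majority vote of its k nearest training points."""
    # Euclidean distance from the query to every labeled training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep only the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among their labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up labeled data: two features per point, one class label per point
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_X, train_y, (1.2, 1.4)))  # cat
print(knn_predict(train_X, train_y, (6.8, 6.1)))  # dog
```

KNN has no real training step: it stores the labeled data and does all the work at prediction time, which is why it gets slow on large datasets.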
-
Regression: Predict a continuous value (e.g., price, temperature, age).
- Linear Regression
- Decision Tree Regression
- Random Forest Regression
- Support Vector Regression (SVR)
- K-Nearest Neighbors Regression
- Ridge Regression
- Lasso Regression
- Polynomial Regression
- Neural Networks (e.g., Multilayer Perceptron)
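For regression, a minimal sketch is simple linear regression with a single input feature, fitted by ordinary least squares in closed form (slope = covariance(x, y) / variance(x)). The function name and the house-size/price numbers are made up for illustration; the prices are exactly 3 × size, so the fitted line is easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y ~ slope * x + intercept by ordinary least squares (one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: size in m^2 -> price in thousands
sizes = [50, 60, 80, 100, 120]
prices = [150, 180, 240, 300, 360]

slope, intercept = fit_simple_linear_regression(sizes, prices)
print(slope, intercept)          # 3.0 0.0
print(slope * 90 + intercept)    # 270.0, the predicted (continuous) price for 90 m^2
```

With several features the same idea is usually solved in matrix form; Ridge and Lasso add a penalty on the coefficient sizes to that objective.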
-
Unsupervised Learning: You don't have the output column, so you train the model on the input features alone.
-
Dimensionality Reduction: Reduce the number of features while preserving important information.
- Principal Component Analysis (PCA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Singular Value Decomposition (SVD)
- Linear Discriminant Analysis (LDA) (note: LDA is supervised, since it needs class labels to find the projection)
- Independent Component Analysis (ICA)
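PCA can be sketched without libraries: center the data, build the covariance matrix, and find its top eigenvector, which is the direction of greatest variance. This sketch uses power iteration to get only the first component, an assumption made to stay dependency-free; library implementations typically compute PCA through an SVD instead. The function names and the sample points are hypothetical.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (direction, variance, means) for the top principal component via power iteration."""
    n, d = len(points), len(points[0])
    # 1. Center the data: PCA works on mean-centered features
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # 2. Sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    # 3. Power iteration: multiply by the covariance matrix and renormalize; the vector
    #    converges to the eigenvector with the largest eigenvalue
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v = variance of the data along v
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    variance = sum(v[i] * cov_v[i] for i in range(d))
    return v, variance, means

def project(points, direction, means):
    """Reduce each point to one number: its coordinate along `direction`."""
    return [sum((p[j] - means[j]) * direction[j] for j in range(len(p))) for p in points]

points = [(2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8), (6.0, 6.1)]
direction, variance, means = first_principal_component(points)
print(direction)                          # roughly [0.71, 0.71]: the points lie near y = x
print(project(points, direction, means))  # 2 features reduced to 1 per point
```

Almost all the variance lies along that single direction here, so keeping one coordinate per point preserves most of the information.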
-
Density Estimation: Estimate the probability distribution of data.
- Gaussian Mixture Models (GMM)
- Kernel Density Estimation (KDE)
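Kernel Density Estimation can be sketched in a few lines: the estimated density is the average of one Gaussian "bump" centered on each observation. The samples and the bandwidth of 0.5 below are arbitrary assumptions; in practice the bandwidth strongly affects how smooth the estimate is.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a 1-D density function: the average of Gaussian kernels centered on the samples."""
    n = len(samples)
    # Normalizing constant so the density integrates to 1
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

samples = [1.0, 1.2, 1.9, 2.1, 5.0]  # made-up 1-D observations
f = gaussian_kde(samples, bandwidth=0.5)
# Density is high inside the cluster around 1-2 and low in the gap near 3.5
print(f(1.5), f(3.5), f(5.0))
```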
-
Market Basket Analysis: Identify associations between items, i.e., items that frequently appear together in the same transactions.
- Apriori Algorithm
- Eclat Algorithm
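A simplified Apriori can be sketched with sets: count the support (fraction of transactions containing an itemset), keep frequent single items, then grow candidates one item at a time, pruning any candidate with an infrequent subset. The join step here simply unions pairs of frequent itemsets, a simplification of the textbook join; the function names, baskets, and `min_support=0.4` threshold are hypothetical. Eclat is not shown.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset of a frequent itemset is frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: support(A | C) / support(A)."""
    a, c = frozenset(antecedent), frozenset(consequent)
    return frequent[a | c] / frequent[a]

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]
freq = apriori(baskets, min_support=0.4)
print(freq[frozenset({"bread", "butter"})])    # 0.6
print(confidence(freq, {"butter"}, {"bread"}))  # 1.0: every basket with butter has bread
```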
-
Clustering: Group similar data points together.
- K-Means
- Hierarchical Clustering
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- Mean Shift
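K-Means is usually run as Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. This sketch initializes centroids with a random sample of the data points, an assumption for simplicity (library implementations often use the smarter k-means++ initialization). The function name and points are illustrative.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for K-Means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

points = [(1.0, 1.0), (1.5, 1.8), (1.2, 0.8), (8.0, 8.0), (8.5, 7.5), (7.8, 8.3)]
centroids, labels = k_means(points, k=2)
print(labels)  # the first three points share one label, the last three share the other
```

K must be chosen up front; DBSCAN and Mean Shift, by contrast, find the number of clusters from the data.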
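-
Classification Metrics: Evaluate a classifier's predictions against the true labels.
- Confusion Matrix breaks down the types of correct and incorrect predictions: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
- Accuracy gives an overall measure of correct predictions: (TP + TN) / all predictions. It can be misleading when one class dominates.
- Precision focuses on how many predicted positive cases are actually positive: TP / (TP + FP).
- Recall (or sensitivity) focuses on correctly identifying positive cases: TP / (TP + FN).
- Specificity (or true negative rate) focuses on correctly identifying negative cases: TN / (TN + FP).
- F1 Score balances precision and recall into a single metric, their harmonic mean 2 · precision · recall / (precision + recall); it is useful when there's an uneven class distribution.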
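These metrics can all be computed from the four confusion-matrix counts. The sketch below assumes a binary task where the positive class is `1`, and it reports 0.0 whenever a denominator is zero; both are assumptions, and the function name and labels are made up.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, recall, specificity, and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def safe_div(num, den):
        return num / den if den else 0.0  # convention: 0.0 when the denominator is 0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # sensitivity, true positive rate
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),  # true negative rate
        "f1": safe_div(2 * precision * recall, precision + recall),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # imbalanced: 3 positives, 7 negatives
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))

# A model that always predicts the majority class still gets 0.7 accuracy,
# but its recall and F1 are 0, which is why accuracy alone can mislead.
print(classification_metrics(y_true, [0] * 10))
```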