# Machine Learning Algorithms Grouped by Regression vs Classification

| **Regression Algorithms**       | **Classification Algorithms**       |
| ------------------------------- | ----------------------------------- |
| Linear Regression               | Logistic Regression                 |
| Ridge Regression                | Decision Tree Classifier            |
| Lasso Regression                | Random Forest Classifier            |
| ElasticNet                      | Gradient Boosting Classifier        |
| Support Vector Regression (SVR) | Support Vector Classification (SVC) |
| Decision Tree Regressor         | K-Nearest Neighbors (KNN)           |
| Random Forest Regressor         | Naive Bayes                         |
| Gradient Boosting Regressor     | Linear Discriminant Analysis (LDA)  |
| K-Nearest Neighbors Regressor   | Quadratic Discriminant Analysis     |
| Bayesian Ridge Regression       | Neural Network Classifier (MLP)     |
| XGBoost Regressor               | XGBoost Classifier                  |
| CatBoost Regressor              | CatBoost Classifier                 |
| LightGBM Regressor              | LightGBM Classifier                 |


## Regression Algorithms

| **Algorithm**                   | **Example**                              |
| ------------------------------- | ---------------------------------------- |
| **Linear Regression**           | `LinearRegression().fit(X, y)`           |
| **Ridge Regression**            | `Ridge().fit(X, y)`                      |
| **Lasso Regression**            | `Lasso().fit(X, y)`                      |
| **ElasticNet**                  | `ElasticNet().fit(X, y)`                 |
| **SVR**                         | `SVR().fit(X, y)`                        |
| **Decision Tree Regressor**     | `DecisionTreeRegressor().fit(X, y)`      |
| **Random Forest Regressor**     | `RandomForestRegressor().fit(X, y)`      |
| **Gradient Boosting Regressor** | `GradientBoostingRegressor().fit(X, y)`  |
| **KNN Regressor**               | `KNeighborsRegressor().fit(X, y)`        |
| **Bayesian Ridge**              | `BayesianRidge().fit(X, y)`              |
| **XGBoost Regressor**           | `XGBRegressor().fit(X, y)`               |
| **CatBoost Regressor**          | `CatBoostRegressor(verbose=0).fit(X, y)` |
| **LightGBM Regressor**          | `LGBMRegressor().fit(X, y)`              |
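
The one-liners in the table all follow the same fit/score pattern. A minimal sketch on synthetic data is shown below; the dataset from `make_regression`, the 80/20 split, and the chosen hyperparameters are assumptions for illustration only. XGBoost, CatBoost, and LightGBM are left out to avoid extra dependencies, but their regressors expose the same `fit`/`predict` interface.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regressors = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

r2_scores = {}
for name, model in regressors.items():
    model.fit(X_train, y_train)
    # For scikit-learn regressors, .score() returns R^2 on the given data.
    r2_scores[name] = model.score(X_test, y_test)
    print(f"{name}: R^2 = {r2_scores[name]:.3f}")
```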


## Classification Algorithms

| **Algorithm**                    | **Example**                                 |
| -------------------------------- | ------------------------------------------- |
| **Logistic Regression**          | `LogisticRegression().fit(X, y)`            |
| **Decision Tree Classifier**     | `DecisionTreeClassifier().fit(X, y)`        |
| **Random Forest Classifier**     | `RandomForestClassifier().fit(X, y)`        |
| **Gradient Boosting Classifier** | `GradientBoostingClassifier().fit(X, y)`    |
| **SVC**                          | `SVC().fit(X, y)`                           |
| **KNN Classifier**               | `KNeighborsClassifier().fit(X, y)`          |
| **Naive Bayes**                  | `GaussianNB().fit(X, y)`                    |
| **LDA**                          | `LinearDiscriminantAnalysis().fit(X, y)`    |
| **QDA**                          | `QuadraticDiscriminantAnalysis().fit(X, y)` |
| **MLP Classifier (Neural Net)**  | `MLPClassifier().fit(X, y)`                 |
| **XGBoost Classifier**           | `XGBClassifier().fit(X, y)`                 |
| **CatBoost Classifier**          | `CatBoostClassifier(verbose=0).fit(X, y)`   |
| **LightGBM Classifier**          | `LGBMClassifier().fit(X, y)`                |
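
As a rough illustration of how the classifiers in the table are used, the sketch below fits three of them on the iris dataset bundled with scikit-learn. The dataset, the stratified 80/20 split, and the hyperparameters are assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

classifiers = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNeighborsClassifier": KNeighborsClassifier(n_neighbors=5),
    "GaussianNB": GaussianNB(),
}

accuracies = {}
for name, model in classifiers.items():
    model.fit(X_train, y_train)
    # For scikit-learn classifiers, .score() returns mean accuracy.
    accuracies[name] = model.score(X_test, y_test)
    print(f"{name}: accuracy = {accuracies[name]:.3f}")
```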


# General Flow of ML Model Building

1. Collecting Data

2. Cleaning Data
  - Perform light EDA
  - Remove outliers
  - Fill or drop NA values
  - Remove duplicates
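
The cleaning steps above could look roughly like this in pandas. The toy DataFrame, its column names, and the 1.5 × IQR outlier rule are all assumptions made for the example; the right imputation and outlier strategy depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing age, one missing city,
# one exact duplicate row, and one implausible age (250).
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 32, 29, 38, 250],
    "city": ["Paris", "Lyon", "Paris", None, "Lyon", "Nice", "Paris", "Nice"],
    "income": [40_000, 52_000, 48_000, 61_000, 52_000, 45_000, 58_000, 50_000],
})

# Light EDA: count missing values per column.
print(df.isna().sum())

# Remove exact duplicate rows.
df = df.drop_duplicates()

# Fill missing numeric values with the median; drop rows missing a category.
df["age"] = df["age"].fillna(df["age"].median())
df = df.dropna(subset=["city"])

# Remove outliers with the 1.5 * IQR rule (one common heuristic among several).
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[in_range].reset_index(drop=True)

print(df)
```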

3. Feature Engineering
  - Create new features
  - Encode categorical variables (OneHotEncoder or OrdinalEncoder for features; LabelEncoder is intended for the target)
  - Scale/normalize numeric features (StandardScaler or MinMaxScaler)
  - Apply PCA to reduce dimensionality and multicollinearity
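
One possible way to combine these steps is a `ColumnTransformer` that scales and PCA-reduces the numeric columns while one-hot encoding the categorical ones. The DataFrame, the derived `bmi` feature, and the choice of two principal components are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data.
df = pd.DataFrame({
    "height_cm": [170.0, 182.0, 165.0, 175.0, 190.0, 160.0],
    "weight_kg": [65.0, 80.0, 55.0, 72.0, 95.0, 50.0],
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Nice"],
})

# Create a new feature from existing columns.
df["bmi"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2

numeric_cols = ["height_cm", "weight_kg", "bmi"]
categorical_cols = ["city"]

# Scale first, then project the correlated numeric columns onto 2 components.
numeric_steps = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

features = preprocess.fit_transform(df)
print(features.shape)  # 2 PCA components + 3 one-hot columns
```

In a real project the transformer would be fitted on the training split only and then applied to the test split.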

4. Exploratory Data Analysis (EDA)
  - Visualize target variables
  - Explore relationships (correlation matrix, heatmap, scatter plots, histograms)
  - Plot histograms for all numeric columns/features
  - Run `value_counts` on all categorical columns/features and on the target
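
A text-only sketch of these checks is shown below, using the iris data as a stand-in. It prints histogram bin counts instead of drawing them so that it runs without a plotting backend; in a notebook, `df.hist()` and a seaborn heatmap of `corr` would be the usual graphical equivalents (both require matplotlib).

```python
import numpy as np
from sklearn.datasets import load_iris

# load_iris(as_frame=True) needs pandas installed.
iris = load_iris(as_frame=True)
df = iris.frame  # four numeric feature columns plus an integer "target" column

# Summary statistics for every numeric column.
print(df.describe())

# Class balance of the target.
target_counts = df["target"].value_counts()
print(target_counts)

# Pairwise correlations, including each feature's correlation with the target.
corr = df.corr()
print(corr["target"].sort_values(ascending=False))

# Histogram bin counts per feature (a text alternative to df.hist()).
for col in df.columns.drop("target"):
    counts, edges = np.histogram(df[col], bins=5)
    print(col, counts.tolist())
```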

5. Model Selection
  - Choose 2-3 model types for the problem
    - Logistic Regression, Random Forest, XGBoost (for classification)
    - Linear Regression, SVR, Gradient Boosting (for regression)
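
A quick way to shortlist candidates is to compare them with the same cross-validation setup. The sketch below is one possible version for classification: the breast cancer dataset is a stand-in, and scikit-learn's `GradientBoostingClassifier` is used in place of XGBoost to avoid an extra dependency. In practice the comparison would be run on the training portion only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    # Logistic regression benefits from scaled inputs, so it gets a scaler.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

mean_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

best_name = max(mean_scores, key=mean_scores.get)
print("best candidate:", best_name)
```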

6. Split the Data
  - Use `train_test_split`, usually 80/20 or 70/30; fit scalers and encoders on the training split only to avoid data leakage
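
A minimal sketch of an 80/20 split follows. The `stratify=y` argument (which keeps class proportions equal in both splits) and the fixed `random_state` are choices made for this example; the scaler is fitted on the training data only and then reused on the test data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))

# Learn scaling statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```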
  
7. Train and Tune Models
  - Use cross-validation (cross_val_score, GridSearchCV, or RandomizedSearchCV)
  - Tune hyperparameters
  - Evaluate with proper metrics:
    - Classification: accuracy, precision, recall, F1, ROC AUC
    - Regression: RMSE, MAE, R²
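
The tuning and evaluation steps might be sketched as follows. The parameter grid is a small hypothetical one, the breast cancer dataset is a stand-in, and the regression metrics at the end are computed on a handful of made-up numbers purely to show the function calls.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, mean_absolute_error, mean_squared_error,
    precision_score, r2_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid; real grids depend on the model and the compute budget.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="f1"
)
search.fit(X_train, y_train)  # cross-validation uses the training split only
print("best params:", search.best_params_)

best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
y_proba = best_model.predict_proba(X_test)[:, 1]  # probability of class 1

clf_metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "roc_auc": roc_auc_score(y_test, y_proba),  # needs scores, not labels
}
print(clf_metrics)

# Regression metrics on made-up values, to show the calls only.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 3.0, 8.0])
rmse = np.sqrt(mean_squared_error(y_true, y_hat))
mae = mean_absolute_error(y_true, y_hat)
r2 = r2_score(y_true, y_hat)
print(rmse, mae, r2)
```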

8. Ensemble If Needed
  - Combine best models using Bagging, Boosting, or Stacking
  - Use voting classifier or blending if appropriate
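
For combining models, scikit-learn provides `VotingClassifier` and `StackingClassifier`. The sketch below shows both with the same three base models; the choice of base models, soft voting, and a logistic regression meta-learner are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    RandomForestClassifier, StackingClassifier, VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

base_models = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("nb", GaussianNB()),
]

# Soft voting averages the predicted class probabilities of the base models.
voting = VotingClassifier(estimators=base_models, voting="soft")
voting.fit(X_train, y_train)
voting_acc = voting.score(X_test, y_test)

# Stacking trains a meta-learner on out-of-fold predictions of the base models.
stacking = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stacking.fit(X_train, y_train)
stacking_acc = stacking.score(X_test, y_test)

print(f"voting: {voting_acc:.3f}, stacking: {stacking_acc:.3f}")
```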

9. Evaluate Final Model
  - Test final model on hold-out test set
  - Compare with baseline performance
  - Check for overfitting (train vs test performance gap)
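
One way to run these final checks is to compare the model against a trivial baseline and look at the gap between train and test scores. In this sketch the `DummyClassifier` that always predicts the majority class serves as the baseline, and the random forest is a stand-in for whatever final model was chosen; both are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Baseline: always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
gap = train_acc - test_acc  # a large positive gap suggests overfitting

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"train accuracy:    {train_acc:.3f}")
print(f"test accuracy:     {test_acc:.3f}")
print(f"train-test gap:    {gap:.3f}")
```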

10. Export and Use
  - Save model with joblib or pickle
  - Document features and data processing steps
  - Optional: wrap preprocessing and the model in a single `Pipeline` from sklearn so they are saved and applied together
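
A minimal sketch of the export step, assuming a scaler plus logistic regression pipeline and a temporary file path (both hypothetical): the whole pipeline is saved with `joblib.dump` and restored with `joblib.load`, so the preprocessing travels with the model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing and model in one object, so both are saved together.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# Hypothetical location; a real project would use a versioned artifact path.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(pipeline, model_path)

# Later, possibly in another process: load and predict on raw features.
restored = joblib.load(model_path)
same_predictions = np.array_equal(pipeline.predict(X), restored.predict(X))
print("predictions match after reload:", same_predictions)
```

Files written by joblib or pickle should only be loaded from trusted sources, and ideally with the same library versions that were used to save them.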