Predicting Colorado forest cover types using diverse ML models for classification. Baseline creation, feature selection, comparison, and tuning optimize accuracy in this University of Ottawa Master's Machine Learning course final project (2023).

RimTouny/User-Forest-Cover-Type-Prediction

User Forest Cover Type Prediction

Predicting Colorado forest cover types using diverse ML models for classification. Baseline creation, feature selection, model comparison, and tuning are used to optimize accuracy on the Forest Cover Type Prediction dataset in this University of Ottawa Master's Machine Learning course final project (2023).

  • Required libraries: scikit-learn, pandas, matplotlib.
  • Execute cells in a Jupyter Notebook environment.
  • The uploaded code has been executed and tested successfully within the Google Colab environment.

Multi-class classification problem

The task is to classify each sample in the Forest Cover Type Prediction dataset into one of seven cover types: Spruce/Fir, Lodgepole Pine, Ponderosa Pine, Cottonwood/Willow, Aspen, Douglas-fir, and Krummholz.

Independent Variables:

  • The 54 geographical features are 'Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology', 'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways', 'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm', 'Horizontal_Distance_To_Fire_Points', 'Wilderness_Area1' to 'Wilderness_Area4', and 'Soil_Type1' to 'Soil_Type40'.

Target variable:

  • The 'Cover Type' column is the target, with 7 classes.
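
The column layout above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's code: the demo rows are random stand-ins, the helper `split_features_target` is hypothetical, and the target name is taken as written in this README (adjust it if your CSV header differs).

```python
import numpy as np
import pandas as pd

# Column layout as listed above: 10 numeric columns plus 4 + 40 indicator columns.
NUMERIC = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]
WILDERNESS = [f"Wilderness_Area{i}" for i in range(1, 5)]
SOIL = [f"Soil_Type{i}" for i in range(1, 41)]
FEATURES = NUMERIC + WILDERNESS + SOIL
TARGET = "Cover Type"  # assumption: name as written in this README


def split_features_target(df):
    """Hypothetical helper: return (X, y) from a frame holding the 54 features and the target."""
    return df[FEATURES], df[TARGET]


# Random stand-in rows, only to make the sketch runnable (not real data).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.integers(0, 2, size=(5, len(FEATURES))), columns=FEATURES)
demo[TARGET] = rng.integers(1, 8, size=5)  # assumed label encoding 1..7

X, y = split_features_target(demo)
print(X.shape, y.shape)  # (5, 54) (5,)
```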

Key Tasks Undertaken

  1. Problem’s Overview:

    • Create a conceptual figure showcasing the end-to-end data flow.
    • Illustrate insights into the problem through data flow visualization.
  2. Dataset’s Overview (EDA):

    • Present numerical information about the dataset.
  3. General Flowchart:

    • Develop a detailed flowchart illustrating each step of the project's implementation.

  4. Visualize Training and Test Sets:

    • Generate t-SNE plots separately for the training and test sets to understand the problem's complexity.
      • Problem Complexity (figure)

      • Reduction and Transformation (figure)
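
A rough sketch of the t-SNE step, assuming scikit-learn's `TSNE` and a synthetic stand-in dataset from `make_classification` (the real notebook uses the forest cover data). The scaling step, perplexity, and output file name are assumptions. Each split is embedded separately because `TSNE` has no `transform` method for new points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 54 features, 7 classes (not the real dataset).
X, y = make_classification(n_samples=400, n_features=54, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)  # fit scaling on the training split only

embeddings = {}
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
splits = [("train", X_train, y_train), ("test", X_test, y_test)]
for ax, (name, X_part, y_part) in zip(axes, splits):
    # Each split gets its own independent 2-D embedding.
    tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
    emb = tsne.fit_transform(scaler.transform(X_part))
    embeddings[name] = emb
    ax.scatter(emb[:, 0], emb[:, 1], c=y_part, cmap="tab10", s=8)
    ax.set_title(f"t-SNE ({name} set)")
fig.savefig("tsne_train_test.png")  # hypothetical output file name
```

Heavily overlapping class clusters in these plots suggest a harder classification problem.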

  5. Obtain Baseline Performance:

    • Apply diverse ML methods (KNN, LogisticRegression, SVM, DecisionTreeClassifier, Naive Bayes Classifier) to establish a baseline.

    • Champion Model (figure)
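
The baseline comparison could look roughly like the sketch below. It assumes synthetic stand-in data, default hyperparameters, `GaussianNB` as the Naive Bayes variant, and standard scaling for the distance- and margin-based models. None of these choices are confirmed by the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 54 features, 7 classes (not the real dataset).
X, y = make_classification(n_samples=400, n_features=54, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The five baseline families named above, with default settings (assumed).
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "LogisticRegression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

baseline = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    baseline[name] = 100 * accuracy_score(y_test, model.predict(X_test))  # percent

champion = max(baseline, key=baseline.get)  # model with the highest test accuracy
for name, acc in sorted(baseline.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")
print("Champion:", champion)
```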

  6. First Improvement Strategy: Feature Selection:

    • Implement feature selection methods, including

      • Filter Selection Methods (Information Gain/Mutual Information, Feature Selection, Variance Threshold, Chi-Square)
      • Wrapper Selection Methods (Forward Feature Selection, Backward Feature Elimination, Recursive Feature Elimination)
    • Proceed with the best-performing feature subset and ML model for subsequent stages.

      • Champion Model in Filter Selection: Information Gain
           Maximum of Feature Selection-K-Nearest Neighbors: 73.96721311475409
           Best number of n_components Feature Selection-K-Nearest Neighbors: 12
        
           Maximum of Feature Selection-Decision Tree Classifier: 76.65573770491804
           Best number of n_components Feature Selection-Decision Tree Classifier: 8

      • Champion Model in Wrapper Selection: Recursive
           Maximum of Recursive_FE-K-Nearest Neighbors: 73.96721311475409
           Best number of n_components Recursive_FE-K-Nearest Neighbors: 12
        
           Maximum of Recursive_FE-Decision Tree Classifier: 76.26229508196721
           Best number of n_components Recursive_FE-Decision Tree Classifier: 10
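
A sketch of how a filter method (mutual information) and a wrapper method (recursive feature elimination) might be swept over the number of kept features. It assumes synthetic stand-in data, a decision tree as both the RFE ranking estimator and the evaluated classifier, and a hypothetical helper `sweep`. Because `RFE` needs an estimator that exposes `coef_` or `feature_importances_`, KNN cannot rank features directly, so how the notebook produced its Recursive_FE-KNN numbers is not something this sketch reproduces.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 54 features, 7 classes (not the real dataset).
X, y = make_classification(n_samples=400, n_features=54, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

ks = [4, 8, 12]  # candidate feature counts (assumed grid)


def sweep(make_selector, ks):
    """Hypothetical helper: {k: test accuracy in percent} for a tree trained on k selected features."""
    scores = {}
    for k in ks:
        selector = make_selector(k).fit(X_train, y_train)  # select on training data only
        clf = DecisionTreeClassifier(random_state=0)
        clf.fit(selector.transform(X_train), y_train)
        scores[k] = 100 * accuracy_score(y_test, clf.predict(selector.transform(X_test)))
    return scores


# Filter: rank features by mutual information with the target.
mi = partial(mutual_info_classif, random_state=0)
filter_scores = sweep(lambda k: SelectKBest(score_func=mi, k=k), ks)

# Wrapper: recursively drop the least important feature according to a decision tree.
rfe_scores = sweep(
    lambda k: RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=k, step=1),
    ks,
)

for label, scores in [("Information Gain", filter_scores), ("Recursive_FE", rfe_scores)]:
    best_k = max(scores, key=scores.get)
    print(f"{label}: best k = {best_k}, accuracy = {scores[best_k]:.2f}")
```

Picking the feature count on test accuracy, as done here for brevity, gives an optimistic estimate. A separate validation split or cross-validation is the safer choice.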

  7. Adding More Machine Learning Models:

    • Implement advanced models (Random Forest, ensemble techniques) to enhance performance. (figure)

    • Compare the performance of the new models with the feature-selection results using confusion matrices. (figure)
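
A minimal sketch of this stage, assuming a `RandomForestClassifier` and a hard-voting `VotingClassifier` as the ensemble technique, evaluated on synthetic stand-in data. The ensemble members and hyperparameters are assumptions, not the notebook's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 54 features, 7 classes (not the real dataset).
X, y = make_classification(n_samples=400, n_features=54, n_informative=10,
                           n_classes=7, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
labels = np.unique(y)  # fixed class order for every confusion matrix

models = {
    "Decision Tree (earlier stage)": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Voting ensemble": VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
            ("dt", DecisionTreeClassifier(random_state=0)),
        ],
        voting="hard",  # majority vote over predicted labels
    ),
}

results = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "accuracy": 100 * accuracy_score(y_test, pred),
        # Rows are true classes, columns are predicted classes.
        "confusion": confusion_matrix(y_test, pred, labels=labels),
    }
    print(f"{name}: {results[name]['accuracy']:.2f}")
    print(results[name]["confusion"])
```

Comparing the off-diagonal cells between models shows which class pairs each model confuses, which a single accuracy number hides.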
