## 6.11 Explore more

* For this dataset we didn't do EDA or feature engineering. You can do that to get more insights into the problem.
* For random forest, there are more parameters that we can tune. Check `max_features` and `bootstrap` (see the first sketch after this list).
* There's a variation of random forest called "extremely randomized trees", or "extra trees". Instead of searching for the best split among all possible thresholds, it draws a few thresholds at random and picks the best one among them. This extra randomness makes the trees less correlated and usually helps against overfitting. In Scikit-Learn, they are implemented in `ExtraTreesClassifier`. Try it for this project (also covered in the first sketch below).
* XGBoost can deal with missing values, so we don't have to do `fillna` for it. Check whether not filling NAs helps improve performance (see the XGBoost sketch below).
* Experiment with other XGBoost parameters: `subsample` and `colsample_bytree` (also in the XGBoost sketch below).
* When selecting the best split, decision trees find the most useful features. This information can be used for understanding which features are more important than others. See the feature-importance sketch below; the approach is the same for plain decision trees, random forest and XGBoost.
* Trees can also be used for solving regression problems: check `DecisionTreeRegressor`, `RandomForestRegressor` and the `objective=reg:squarederror` parameter for XGBoost (see the regression sketch below).
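
A minimal sketch of tuning `max_features` and `bootstrap`, and of trying `ExtraTreesClassifier`. It assumes `X_train`, `y_train`, `X_val` and `y_val` were prepared earlier in the chapter (e.g. with `DictVectorizer`); the parameter values are illustrative, not the tuned ones.

```python
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.metrics import roc_auc_score

# Assumes X_train, y_train, X_val, y_val from the earlier steps of the chapter.
for max_features in ['sqrt', 'log2', None]:
    for bootstrap in [True, False]:
        rf = RandomForestClassifier(
            n_estimators=100,
            max_depth=10,               # illustrative, use your tuned value
            max_features=max_features,  # features considered at each split
            bootstrap=bootstrap,        # sample rows with replacement or not
            random_state=1,
            n_jobs=-1,
        )
        rf.fit(X_train, y_train)
        auc = roc_auc_score(y_val, rf.predict_proba(X_val)[:, 1])
        print(f'max_features={max_features}, bootstrap={bootstrap}, auc={auc:.3f}')

# Extra trees: same interface, but splits use randomly drawn thresholds.
et = ExtraTreesClassifier(n_estimators=100, max_depth=10, random_state=1, n_jobs=-1)
et.fit(X_train, y_train)
print('extra trees auc:', roc_auc_score(y_val, et.predict_proba(X_val)[:, 1]))
```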
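
A sketch of training XGBoost without `fillna` and with the extra sampling parameters. It assumes feature matrices `X_train` and `X_val` that still contain NaN for missing values, plus labels `y_train` and `y_val`; the remaining parameter values are typical starting points, not tuned ones.

```python
import xgboost as xgb
from sklearn.metrics import roc_auc_score

# Missing values are left as NaN; XGBoost treats them as "missing" and learns
# a default direction for them at every split, so no fillna is needed.
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

xgb_params = {
    'eta': 0.1,
    'max_depth': 6,
    'min_child_weight': 1,
    'subsample': 0.8,          # fraction of rows sampled for each tree
    'colsample_bytree': 0.8,   # fraction of columns sampled for each tree
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'seed': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=100)
y_pred = model.predict(dval)
print('xgboost auc:', roc_auc_score(y_val, y_pred))
```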
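
A sketch of inspecting feature importances. It assumes the `rf` and `model` objects from the sketches above and a fitted `DictVectorizer` called `dv` from earlier in the chapter; `get_feature_names_out` needs a recent scikit-learn (older versions use `get_feature_names`).

```python
import pandas as pd

feature_names = dv.get_feature_names_out()

# For random forest (and plain decision trees) importances are an attribute:
rf_importances = pd.Series(rf.feature_importances_, index=feature_names)
print(rf_importances.sort_values(ascending=False).head(10))

# For XGBoost, the booster reports a score per feature; keys are f0, f1, ...
# unless feature_names were passed when creating the DMatrix.
xgb_scores = model.get_score(importance_type='gain')
print(sorted(xgb_scores.items(), key=lambda item: item[1], reverse=True)[:10])
```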
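
A sketch of the regression counterparts. It assumes a dataset with a numeric target (not the classification target used in this chapter), again split into `X_train`, `y_train`, `X_val` and `y_val`.

```python
import numpy as np
import xgboost as xgb
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

dt = DecisionTreeRegressor(max_depth=6, random_state=1)
dt.fit(X_train, y_train)
print('decision tree rmse:', np.sqrt(mean_squared_error(y_val, dt.predict(X_val))))

rf = RandomForestRegressor(n_estimators=100, random_state=1, n_jobs=-1)
rf.fit(X_train, y_train)
print('random forest rmse:', np.sqrt(mean_squared_error(y_val, rf.predict(X_val))))

# For XGBoost, switch the objective to squared-error regression:
xgb_params = {
    'eta': 0.1,
    'max_depth': 6,
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
    'seed': 1,
}
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
model = xgb.train(xgb_params, dtrain, num_boost_round=100)
print('xgboost rmse:', np.sqrt(mean_squared_error(y_val, model.predict(dval))))
```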

## Notes

Add notes from the video (PRs are welcome)

> ⚠️ The notes are written by the community.
> If you see an error here, please create a PR with a fix.
