
Conditional Inference Trees #56

Closed
andrewnolanhall opened this issue Mar 21, 2020 · 3 comments

@andrewnolanhall

I enjoyed the code you have here, as well as the descriptions in the accompanying slideshow. I have two suggestions.

  1. Would you consider adding a section on conditional inference trees? These are trees that use statistical tests of association between the features and the outcome variable to make splits, rather than the information-gain (or impurity/error-reduction) criterion used here. Conditional inference trees reduce the variable-selection bias present in CART methods (the decision-tree framework used here), in which variables offering more candidate splits are preferentially selected over variables offering fewer. They are also less risky to interpret, since each split is backed by a statistical association rather than an absolute difference in some information metric. My one worry is obfuscating your very clear presentation of the decision tree algorithm, but it might reduce confusion down the line when people over-interpret CART trees. Note that apart from the segmentation criterion, conditional inference trees are identical to CART trees.
  2. In the Module 5-Decision Tree slideshow, on slide 53 (the slide right before "Model Performance"), you state that the RMSE algorithm proceeds by "Calculating the variance for each node" and "calculating the variance for each split as weighted average of each node variance." Unless I'm misunderstanding, I believe this is incorrect: you are calculating the root mean squared error (RMSE), not the variance. The equations are similar, but for the variance you subtract the average value from each observation in the summation, whereas for RMSE you subtract the predicted value.
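To make the distinction in point 2 concrete, here is a small numeric sketch (the target values are invented for illustration). The two quantities coincide only when the prediction is the node mean, which is exactly what a CART regression leaf predicts:

```python
import numpy as np

# Hypothetical target values falling in one regression-tree node
# (illustrative numbers, not from the course materials)
y = np.array([3.0, 5.0, 7.0, 9.0])

# Variance: squared deviations from the node's own mean
variance = np.mean((y - y.mean()) ** 2)           # 5.0

# (R)MSE: squared deviations from the model's *prediction*.
# A CART regression leaf predicts the node mean, so there
# MSE == variance; with any other prediction they differ.
mse_at_mean = np.mean((y - y.mean()) ** 2)        # same as variance
rmse_at_mean = np.sqrt(mse_at_mean)               # sqrt(5)

other_prediction = 5.0                            # some non-mean prediction
mse_other = np.mean((y - other_prediction) ** 2)  # 6.0 > variance

print(variance, rmse_at_mean, mse_other)
```

So the slide's recipe happens to give the right numbers inside a leaf that predicts the mean, but calling the quantity "variance" is only accidentally correct there.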

Thank you for your consideration.
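For point 1, the core of the conditional-inference splitting rule can be sketched in a few lines. This is a simplified illustration, not the full ctree algorithm of Hothorn, Hornik & Zeileis (2006): it swaps in a permutation test on the absolute correlation as the association test, applies a Bonferroni correction over features, and `permutation_p_value` / `select_split_variable` are names I made up for the sketch:

```python
import numpy as np

def permutation_p_value(x, y, n_perm=999, rng=None):
    """P-value for the association between one feature x and the
    outcome y, via a permutation test on the absolute correlation."""
    if rng is None:
        rng = np.random.default_rng(0)
    observed = abs(np.corrcoef(x, y)[0, 1])
    exceed = sum(
        abs(np.corrcoef(x, rng.permutation(y))[0, 1]) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)

def select_split_variable(X, y, alpha=0.05):
    """Pick the feature most significantly associated with y
    (Bonferroni-corrected). Returning None means stop splitting,
    which is how conditional inference trees avoid both the CART
    selection bias and the need for post-hoc pruning."""
    p = np.array([permutation_p_value(X[:, j], y) for j in range(X.shape[1])])
    best = int(np.argmin(p))
    return best if p[best] * X.shape[1] <= alpha else None

# Toy data: only feature 1 is truly associated with y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 1] + 0.5 * rng.normal(size=200)
print(select_split_variable(X, y))  # picks feature 1
```

The rest of the tree-growing loop (choosing the cut point within the selected variable, recursing on the two children) is unchanged from CART, which is the "identical apart from the segmentation criterion" point above.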

@CloudChaoszero
Collaborator

Hello @andrewnolanhall,

Hope you are doing well! I am one of the Teaching Fellows helping out at Delta Analytics.

Thanks for reaching out about these two points, and Delta Analytics appreciates your feedback! These are great conceptual topics I can ask the technical lead or other leads about, for sure.

I will keep you posted! Have a good one until then.

@andrewnolanhall
Author

andrewnolanhall commented Mar 23, 2020 via email

@brianspiering
Member

Closing this issue. Conditional inference trees are outside the scope of this introductory course.
