I enjoyed the code that you have here as well as the descriptions in the accompanying slideshow. I have two suggestions.
Would you consider adding a section on conditional inference trees? These are trees that use probabilistic associations between the features and the outcome variable to make splits, rather than the information-gain (or impurity/error-reduction) criterion used here. Conditional inference trees are effective at reducing the variable-selection bias present in CART methods (the decision-tree framework used here), in which variables offering more candidate split points are preferentially selected over variables offering fewer. They are also less risky to interpret, since each split rests on a statistical association rather than on an absolute difference in some information metric. The one thing I worry about is obfuscating your very clear presentation of the decision tree algorithm, but this might help reduce confusion down the line when people try to over-interpret CART trees. Note that aside from the splitting criterion, conditional inference trees are identical to CART trees.
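To make the contrast concrete, here is a minimal sketch of the conditional-inference style of variable selection: instead of scoring every candidate split by information gain, you first test each feature's association with the outcome and only split on the feature with the smallest adjusted p-value, stopping when nothing is significant. This is only an illustration of the idea, not the actual `ctree` algorithm — I'm using a Pearson correlation test with a Bonferroni correction as a simple stand-in for the permutation-based association tests the real method uses, and the function name and `alpha` threshold are my own invention.

```python
import numpy as np
from scipy import stats

def select_split_feature(X, y, alpha=0.05):
    """Conditional-inference-style variable selection (sketch).

    Tests each feature's association with y and returns the column
    index with the smallest Bonferroni-adjusted p-value, or None if
    no association is significant (i.e., stop splitting this node).
    """
    p_values = []
    for j in range(X.shape[1]):
        # Pearson correlation test as a simple stand-in for the
        # permutation tests used by real conditional inference trees.
        _, p = stats.pearsonr(X[:, j], y)
        p_values.append(p)
    # Bonferroni adjustment across the candidate features.
    adjusted = np.minimum(np.array(p_values) * X.shape[1], 1.0)
    best = int(np.argmin(adjusted))
    return best if adjusted[best] < alpha else None
```

The key point this illustrates is that the significance test, not the size of an impurity drop, decides both *which* variable splits and *whether* to split at all — which is why the selection is not biased toward variables with many possible split points.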
In the Module 5 Decision Tree slideshow, on slide 53 (the slide right before "Model Performance"), you state that the RMSE algorithm proceeds by "calculating the variance for each node" and "calculating the variance for each split as weighted average of each node variance." Unless I'm misunderstanding, I believe this is incorrect: you are not calculating the variance of each node or of each split, but the root mean squared error (RMSE). The equations are similar, but for variance you subtract the node's average value from each observation in the summation, whereas for RMSE you subtract the predicted value from each observation.
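A small sketch may make the distinction concrete (function names here are mine, purely for illustration): variance measures spread around the node's own mean, RMSE measures error around whatever the model predicts, and the two only coincide in the special case where the prediction happens to be the node mean — as it is in a standard regression-tree leaf.

```python
import numpy as np

def node_variance(y):
    # Variance: deviations are taken from the node's own mean.
    return np.mean((y - np.mean(y)) ** 2)

def node_rmse(y, y_pred):
    # RMSE: deviations are taken from the model's predictions,
    # which need not equal the node mean.
    return np.sqrt(np.mean((y - y_pred) ** 2))

def split_score(y_left, y_right):
    # The slide's split criterion: weighted average of the
    # per-child scores, weighted by child size.
    n = len(y_left) + len(y_right)
    return (len(y_left) * node_variance(y_left)
            + len(y_right) * node_variance(y_right)) / n
```

So if a slide plugs the node mean in as the prediction, "variance" and "squared error" give the same number for that node; the wording only becomes wrong once the prediction differs from the node mean.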
Thank you for your consideration.
Hope you are doing well! I am one of the Teaching Fellows helping out at Delta Analytics.
Thanks for reaching out about these two points, and Delta Analytics appreciates your feedback! These are great conceptual topics I can ask the technical lead or other leads about, for sure.
I will keep you posted! Have a good one until then.
Thank you for your response! And thanks for considering it. I couldn't decide if adding the conditional inference trees would add more confusion than they are worth, but thought it might be good to point them out.
Best,
Andrew N Hall
PhD Candidate, Northwestern University
Data Science Research Consultant
Northwestern IT Research Computing Services