This was a three-hour workshop on machine learning model interpretation, held on May 27, 2018, during the workshop day of Data Science Summit 2018 at the IDC, Herzliya.
- First Session: Hanan Shteingart, PhD
- Second Session: Yigal Wienberger
A. Interpretable Models, by Hanan Shteingart, PhD
- Introduction slides - the motivation for this workshop: cases where you want to understand why the model predicted whatever it predicted.
- Naive Bayes notebook - an interpretation example for a multinomial naive Bayes classifier on the 20 Newsgroups dataset. It shows how one can easily compute $P(\text{class}=c \mid \text{feature } x_i)$, and thus mark the words that support the true and the predicted classes.
- Tree Ensemble notebook - a random forest is commonly regarded as a black box. This is false: one can use decision paths to learn how much each feature contributes to the final decision. I will show how this method can be used on the iris dataset.
- Linear pitfalls notebook - many believe a linear model is easily interpretable. However, linear coefficients are far from intuitive. Specifically, coefficients are sensitive to feature scaling. Moreover, even if you normalize your features, multicollinearity can leave features that are positively correlated with the class with a negative coefficient, and vice versa. I show how the importance of each feature can be estimated using bootstrap shuffling.
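
To make the Naive Bayes item above concrete, here is a minimal sketch of the per-word posterior $P(\text{class}=c \mid x_i)$, obtained from the multinomial likelihoods and class priors via Bayes' rule. It is not the notebook's code: the four-document corpus, the labels, and the helper names (`p_word_given_class`, `p_class_given_word`, `explain`) are hypothetical stand-ins for the 20 Newsgroups setup, and Laplace smoothing with `alpha = 1.0` is an assumption.

```python
from collections import Counter
import math

# Hypothetical toy corpus standing in for the 20 Newsgroups data.
docs = [
    ("the rocket launch reached orbit", "space"),
    ("nasa plans a new orbit mission", "space"),
    ("the team won the hockey game", "sport"),
    ("a late goal decided the game", "sport"),
]

classes = sorted({label for _, label in docs})
vocab = sorted({w for text, _ in docs for w in text.split()})

# Per-class word counts and per-class document counts.
word_counts = {c: Counter() for c in classes}
class_counts = Counter()
for text, label in docs:
    word_counts[label].update(text.split())
    class_counts[label] += 1

alpha = 1.0  # Laplace smoothing (assumed value)
prior = {c: class_counts[c] / len(docs) for c in classes}


def p_word_given_class(word, c):
    """Smoothed multinomial likelihood P(word | class)."""
    total = sum(word_counts[c].values())
    return (word_counts[c][word] + alpha) / (total + alpha * len(vocab))


def p_class_given_word(word):
    """Bayes' rule for a single word: P(c | w) is proportional to P(w | c) P(c)."""
    joint = {c: p_word_given_class(word, c) * prior[c] for c in classes}
    z = sum(joint.values())
    return {c: joint[c] / z for c in classes}


def explain(text):
    """Return the predicted class and, for each known word, P(class | word)."""
    words = [w for w in text.split() if w in vocab]
    log_post = {
        c: math.log(prior[c]) + sum(math.log(p_word_given_class(w, c)) for w in words)
        for c in classes
    }
    predicted = max(log_post, key=log_post.get)
    support = {w: p_class_given_word(w) for w in words}
    return predicted, support


predicted, support = explain("the orbit mission")
for word, posterior in support.items():
    # Mark each word with the class it supports most strongly.
    best = max(posterior, key=posterior.get)
    print(f"{word!r} supports {best!r} (P = {posterior[best]:.2f})")
print("predicted class:", predicted)
```

Words whose posterior favors the predicted class can be highlighted one way and words favoring the true class another, which is the kind of marking the notebook item describes.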
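
The multicollinearity pitfall from the Linear pitfalls item above can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not the notebook's code: the data-generating process is made up, a least-squares regression stands in for the classifier, and "bootstrap shuffling" is read here as permutation importance (shuffle one feature column repeatedly and measure how much the error grows).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical synthetic data: x2 is nearly a copy of x1.
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + 0.1 * rng.normal(size=n)

# Standardize the features so that scaling is not the issue.
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y_c = y - y.mean()

# Ordinary least squares fit (no intercept needed after centering).
coef, *_ = np.linalg.lstsq(X, y_c, rcond=None)

# Marginal correlation of each feature with the target.
corr = np.array([np.corrcoef(X[:, j], y_c)[0, 1] for j in range(X.shape[1])])


def mse(features, weights):
    """Mean squared error of the linear prediction."""
    return float(np.mean((y_c - features @ weights) ** 2))


def permutation_importance(features, weights, n_repeats=20):
    """Average increase in MSE when one feature column is shuffled."""
    base = mse(features, weights)
    scores = np.zeros(features.shape[1])
    for j in range(features.shape[1]):
        for _ in range(n_repeats):
            shuffled = features.copy()
            shuffled[:, j] = rng.permutation(shuffled[:, j])
            scores[j] += mse(shuffled, weights) - base
    return scores / n_repeats


# Computed on the training data for brevity; a held-out set is preferable.
importance = permutation_importance(X, coef)

print("correlation with y:", corr.round(2))
print("coefficients:      ", coef.round(2))
print("importance:        ", importance.round(2))
```

In this setup the second feature is positively correlated with the target yet receives a negative coefficient, while the shuffling-based score still reports that both features matter to the fitted model.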
B. Black Box approach using LIME, by Yigal Wienberger
- Deck: Peering into the blackbox
- notebooks: