An Introduction to Statistical Learning (James, Witten, Hastie, Tibshirani, 2013): Python code
Switch branches/tags
Nothing to show
Clone or download
Latest commit 47136ed May 20, 2018
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
Notebooks Update Chapter 6 & 7 May 20, 2018
LICENSE.md Create LICENSE.md Jun 15, 2017
README.md Update README.md Apr 22, 2018

README.md

ISLR-python

This repository contains Python code for a selection of tables, figures and LAB sections from the book 'An Introduction to Statistical Learning with Applications in R' by James, Witten, Hastie, Tibshirani (2013).

2018-01-15:
Minor updates to the repository due to changes/deprecations in several packages. The notebooks have been tested with these package versions. Thanks @lincolnfrias and @telescopeuser.

2016-08-30:
Chapter 6: I included Ridge/Lasso regression code using the new python-glmnet library. This is a python wrapper for the Fortran library used in the R package glmnet.

Chapter 3 - Linear Regression
Chapter 4 - Classification
Chapter 5 - Resampling Methods
Chapter 6 - Linear Model Selection and Regularization
Chapter 7 - Moving Beyond Linearity
Chapter 8 - Tree-Based Methods
Chapter 9 - Support Vector Machines
Chapter 10 - Unsupervised Learning

Extra: Misclassification rate simulation - SVM and Logistic Regression

This great book gives a thorough introduction to the field of Statistical/Machine Learning. The book is available for download (see link below), but I think this is one of those books that is definitely worth buying. The book contains sections with applications in R based on public datasets available for download or which are part of the R-package ISLR. Furthermore, there is a Stanford University online course based on this book and taught by the authors (See course catalogue for current schedule).

Since Python is my language of choice for data analysis, I decided to try and do some of the calculations and plots in Jupyter Notebooks using:

  • pandas
  • numpy
  • scipy
  • scikit-learn
  • python-glmnet
  • statsmodels
  • patsy
  • matplotlib
  • seaborn

It was a good way to learn more about Machine Learning in Python by creating these notebooks. I created some of the figures/tables of the chapters and worked through some LAB sections. At certain points I realize that it may look like I tried too hard to make the output identical to the tables and R-plots in the book. But I did this to explore some details of the libraries mentioned above (mostly matplotlib and seaborn). Note that this repository is not a standalone tutorial and that you probably should have a copy of the book to follow along. Suggestions for improvement and help with unsolved issues are welcome! See Hastie et al. (2009) for an advanced treatment of these topics.

For Bayesian data analysis, take a look at this notebook.

References:

James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R, Springer Science+Business Media, New York. http://www-bcf.usc.edu/~gareth/ISL/index.html

Hastie, T., Tibshirani, R., Friedman, J. (2009). Elements of Statistical Learning, Second Edition, Springer Science+Business Media, New York. http://statweb.stanford.edu/~tibs/ElemStatLearn/