Statistical learning: an introduction

  • Regression vs. K-nearest neighbors
  • Cross-validation (see the sketch after this list)
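
As a rough illustration of the regression-vs.-KNN comparison, here is a minimal cross-validation sketch using scikit-learn. The simulated data and the choice of k = 5 neighbors are assumptions made for this example, not part of the course scripts.

```python
# Compare linear regression and K-nearest neighbors with 5-fold cross-validation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)  # nonlinear signal + noise

models = {
    "linear regression": LinearRegression(),
    "KNN (k=5)": KNeighborsRegressor(n_neighbors=5),
}
for name, model in models.items():
    # scikit-learn reports negative MSE; flip the sign to get MSE.
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE = {mse:.3f}")
```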

Linear models

Regularization and feature selection

Classification

Trees and ensembles

Clustering

Basics of clustering; K-means clustering; hierarchical clustering.
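
A minimal sketch of the two clustering methods named above, using scikit-learn and SciPy on simulated data. The toy data set and the choice of three clusters are assumptions for illustration, not part of the course scripts.

```python
# K-means and hierarchical (agglomerative) clustering on a toy data set.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Three Gaussian blobs; the centers are arbitrary choices for the example.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [4, 0], [2, 3])])

# K-means with K = 3.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("K-means cluster sizes:", np.bincount(km.labels_))

# Hierarchical clustering with complete linkage, cut into 3 clusters.
Z = linkage(X, method="complete")
hc_labels = fcluster(Z, t=3, criterion="maxclust")
print("Hierarchical cluster sizes:", np.bincount(hc_labels)[1:])
```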

Scripts and data:

Readings:

Latent-feature models

Principal component analysis (PCA). Using PCA for dimensionality reduction in regression.
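
A minimal sketch of PCA used for dimensionality reduction in regression (principal components regression) with scikit-learn. The simulated low-rank data and the choice of two components are assumptions for the example.

```python
# Standardize, project onto the leading principal components, then regress.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 200, 20
# Features are noisy combinations of 2 latent factors (an assumption for the demo).
Z = rng.normal(size=(n, 2))                  # latent factors
W = rng.normal(size=(2, p))                  # loadings
X = Z @ W + 0.3 * rng.normal(size=(n, p))    # observed features
y = Z[:, 0] - 2 * Z[:, 1] + rng.normal(scale=0.5, size=n)

pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
r2 = cross_val_score(pcr, X, y, cv=5, scoring="r2").mean()
print(f"PCR with 2 components: cross-validated R^2 = {r2:.3f}")
```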

Scripts and data:

If time:

Readings:

  • ISL Section 10.2

Supplemental readings (optional and more advanced):

  • Elements Chapter 14.5
  • Shalizi Chapters 18 and 19. In particular, Chapter 19 has a lot more advanced material on factor models, beyond what we covered in class.

Networks and association rules

Networks and association rule mining.
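
A minimal sketch of the support / confidence / lift calculations behind association rule mining, computed by hand on a tiny made-up set of baskets. The transactions and the candidate rule are assumptions for illustration.

```python
# Support, confidence, and lift for one candidate rule, computed directly.

# Toy market baskets (an assumption for this sketch).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]
n = len(baskets)

def support(itemset):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / n

# Candidate rule: {bread} -> {milk}
lhs, rhs = {"bread"}, {"milk"}
supp = support(lhs | rhs)
conf = supp / support(lhs)
lift = conf / support(rhs)
print(f"support={supp:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```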

Scripts and data:

Readings:

Miscellaneous:

Monte Carlo simulation

Using the bootstrap to approximate value at risk (VaR).
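
A minimal bootstrap sketch of value at risk for a simulated portfolio. The return distribution, portfolio value, 20-day horizon, and 95% level are assumptions for the example, not the course script.

```python
# Bootstrap the distribution of a 20-day portfolio value and read off
# the 5th percentile as an approximate 95% value at risk (VaR).
import numpy as np

rng = np.random.default_rng(0)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=1000)  # simulated return history
initial_value = 10_000                                         # assumed portfolio value

n_boot, horizon = 5000, 20
final_values = np.empty(n_boot)
for b in range(n_boot):
    # Resample 20 daily returns with replacement and compound them.
    sample = rng.choice(daily_returns, size=horizon, replace=True)
    final_values[b] = initial_value * np.prod(1 + sample)

var_95 = initial_value - np.percentile(final_values, 5)
print(f"Approximate 20-day 95% VaR: ${var_95:,.0f}")
```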

Scripts:

Readings:

  • Section 2 of these notes, on bootstrap resampling. You can ignore the stuff about utility if you want.
  • Any basic explanation of the concept of value at risk (VaR) for a financial portfolio, e.g. here, here, or here.

Text data

Co-occurrence statistics; naive Bayes; TF-IDF; topic models; vector-space models of text (if time allows).
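
A minimal text-classification sketch combining TF-IDF features with naive Bayes in scikit-learn. The tiny corpus and its labels are assumptions for illustration only.

```python
# TF-IDF features fed into a multinomial naive Bayes classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus (an assumption for this sketch).
docs = [
    "the market rallied on strong earnings",
    "stocks fell as bond yields rose",
    "the team won the championship game",
    "injury report ahead of tonight's game",
]
labels = ["finance", "finance", "sports", "sports"]

clf = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["earnings and yields moved the market",
                   "the game ended with a late win"]))
```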

Scripts and data:

Readings:

Further topics

Causal inference meets statistical learning.

Neural networks.