This is a collection of Jupyter notebooks covering introductory topics in statistics for data science, such as basic probability distributions, the difference between Bayesian and frequentist statistics, and some links with the optimisation of functions.
A worked example is included: locating a lighthouse with Bayesian inference, using a Markov chain Monte Carlo (MCMC) method.
1 - Introduction to basic probability and statistical conventions and concepts
- Mode vs Median vs Mean
- Variance
- Correlation
- Linear Regression
2 - Probability Distributions
- Probability Distributions
- Frequentist vs Bayesian Statistics
3 - Component Analysis
4 - Markov Chain Monte Carlo
5 - Clustering Data
6 - Optimisation
7 - Worked Bayesian Example with the lighthouse problem.
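To give a flavour of the lighthouse problem, here is a minimal sketch, not the code from the notebooks. It assumes the usual textbook setup: a lighthouse at position `alpha_true` along a straight shore and a known distance `beta` out to sea, flashing at uniformly random angles, with a flat prior on the position. The names `alpha_true`, `beta`, `log_posterior` and `metropolis`, the step size, and the number of flashes are all illustrative choices. The sampler is a plain random-walk Metropolis, which is only one of several MCMC methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: lighthouse at alpha_true along the shore, beta out to sea (beta known).
alpha_true, beta = 1.0, 1.0

# A flash at angle theta hits the shore at x = alpha + beta * tan(theta),
# so the recorded positions follow a Cauchy distribution centred on alpha.
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=500)
x = alpha_true + beta * np.tan(theta)


def log_posterior(alpha):
    """Log posterior for alpha (up to a constant): flat prior, Cauchy likelihood."""
    return -np.sum(np.log(beta**2 + (x - alpha) ** 2))


def metropolis(n_steps=5000, step=0.2, start=0.0):
    """Random-walk Metropolis sampler over alpha."""
    samples = np.empty(n_steps)
    current, current_lp = start, log_posterior(start)
    for i in range(n_steps):
        proposal = current + step * rng.normal()
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples


samples = metropolis()
alpha_est = samples[1000:].mean()  # discard the first 1000 steps as burn-in
print(f"posterior mean of alpha: {alpha_est:.3f} (true value {alpha_true})")
```

The sample mean of `x` is a poor estimator here because the Cauchy distribution has no defined mean, which is why the problem is a popular motivation for the Bayesian treatment.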
This should run with an out-of-the-box Jupyter install. I personally use brew, but conda is the best way to get up and running. The dependencies are:
- numpy
- scipy
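A quick way to confirm that both dependencies are visible to your Python environment is sketched below. It uses only the standard library; the variable names `required` and `missing` are illustrative.

```python
import importlib.util

# Packages the notebooks depend on, as listed above.
required = ["numpy", "scipy"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found.")
```

Run it in the same environment (or kernel) that Jupyter uses, since a package installed elsewhere will not be visible to the notebooks.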
By far the easiest way to get going, and currently the lingua franca of Python package managers, is conda: