# Probability and statistics

In some form or another, machine learning is all about making predictions.  
We might want to predict the probability of a patient suffering a heart attack in the next year, given their clinical history.  
In anomaly detection, we might want to assess how likely a set of readings from an airplane’s jet engine would be, were it operating normally.  
In reinforcement learning, we want an agent to act intelligently in an environment.  
This means we need to think about the probability of getting a high reward under each of the available actions.  
And when we build recommender systems we also need to think about probability.  
For example, if we hypothetically worked for a large online bookseller, we might want to estimate the probability that a particular user would buy a particular book, if prompted.  

For this we need to use the language of probability and statistics.  
Entire courses, majors, theses, careers, and even departments are devoted to probability.  
So our goal here isn’t to teach the whole subject.  
Instead we hope to get you off the ground: to teach you just enough to start building your first machine learning models, and to give you enough of a flavor for the subject that you can begin to explore it on your own if you wish.  

We’ve talked a lot about probabilities so far without articulating what precisely they are or giving a concrete example.  
Let’s get more serious by considering the problem of distinguishing cats and dogs based on photographs.  
This might sound simple, but it’s actually a formidable challenge.  
To start with, the difficulty of the problem may depend on the resolution of the image.

![](img/cats_and_dogs.png)

While it’s easy for humans to recognize cats and dogs at 320 pixel resolution, it becomes challenging at 40 pixels and next to impossible at 20 pixels.  
In other words, our ability to tell cats and dogs apart at a large distance (and thus low resolution) might approach uninformed guessing.  
Probability gives us a formal way of reasoning about our level of certainty.  
If we are completely sure that the image depicts a cat, we say that the probability that the corresponding label $l$ is $\mathrm{cat}$, denoted $P(l=\mathrm{cat})$, equals 1.0.  
If we had no evidence to suggest that $l=\mathrm{cat}$ or that $l=\mathrm{dog}$, then we might say that the two possibilities were equally likely, expressing this as $P(l=\mathrm{cat})=0.5$.  
If we were reasonably confident, but not sure, that the image depicted a cat, we might assign a probability $0.5<P(l=\mathrm{cat})<1.0$.
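
To make the three levels of certainty concrete, here is a minimal sketch in plain Python. The dictionary names, the helper `is_valid_belief`, and the value 0.8 are assumptions chosen for illustration; any value strictly between 0.5 and 1.0 would serve for the "reasonably confident" case. Since the label must be either cat or dog, we take $P(l=\mathrm{dog}) = 1 - P(l=\mathrm{cat})$.

```python
# Hypothetical beliefs about the label l of a single image, stored as P(l = label).
certain = {"cat": 1.0, "dog": 0.0}            # completely sure the image is a cat
no_evidence = {"cat": 0.5, "dog": 0.5}        # both labels equally likely
fairly_confident = {"cat": 0.8, "dog": 0.2}   # 0.8 is an arbitrary value in (0.5, 1.0)


def is_valid_belief(belief, tol=1e-9):
    """Check that each probability lies in [0, 1] and that they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in belief.values())
    return in_range and abs(sum(belief.values()) - 1.0) <= tol


for name, belief in [
    ("certain", certain),
    ("no evidence", no_evidence),
    ("fairly confident", fairly_confident),
]:
    print(f"{name}: P(l=cat) = {belief['cat']}, valid = {is_valid_belief(belief)}")
```

The validity check encodes only the constraint that the label is one of the two options; it says nothing about whether a particular belief is well calibrated.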

Now consider a second case: given some weather monitoring data, we want to predict the probability that it will rain in Taipei tomorrow.  
If it’s summertime, the rain might come with probability $0.5$.  
In both cases, we have some value of interest.  
And in both cases we are uncertain about the outcome.  
But there’s a key difference between the two cases.  
In the first case, the image in fact depicts either a dog or a cat; we just don’t know which.  
In the second case, the outcome may actually be a random event, if you believe in such things (and most physicists do).  
So probability is a flexible language for reasoning about our level of certainty, and it can be applied effectively in a broad set of contexts.  
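
To illustrate the second case, where the outcome itself is treated as random, the sketch below simulates many independent days on which it rains with probability 0.5. The function name `simulate_rain`, the fixed seed, and the number of days are assumptions made for illustration. Real weather is not independent from day to day, so this is a toy model rather than a forecast.

```python
import random


def simulate_rain(p_rain, num_days, seed=0):
    """Draw num_days independent outcomes; True means it rained that day."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.random() < p_rain for _ in range(num_days)]


days = simulate_rain(p_rain=0.5, num_days=10_000)
frequency = sum(days) / len(days)
print(f"fraction of rainy days: {frequency:.3f}")
```

Any single simulated day is unpredictable, but over many days the fraction of rainy ones should settle close to the probability we started with.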

## Basic probability theory