# Intro to Artificial Intelligence with Python

## Part III - Uncertainty

Harvard's CS50 Introduction to Artificial Intelligence with Python is an online course that I took in the spring of 2020. It consisted of 6 lectures, and I have a notebook for each. Each lecture had 2 projects, which are located in the projects folder in the same directory as this notebook.

[Course Link](https://cs50.harvard.edu/ai/)

[Lecture Link](https://www.youtube.com/watch?v=uQmYZTTqDC0&list=PLhQjrBD2T382Nz7z1AEXmioc27axa19Kv&index=4)

---
## Probability


**Probability** - is a numerical description of how likely an event is to occur or how likely it is that a proposition is true. Probability is a number between 0 and 1, where, roughly speaking, 0 indicates impossibility and 1 indicates certainty. The higher the probability of an event, the more likely it is that the event will occur.


**Probability Theory** - the branch of mathematics concerned with probability.


* ($\omega$) - indicates a possible world, i.e. one potential outcome. For example, when you roll a die, there are six possible worlds that could result: 1, 2, 3, 4, 5, or 6. Each possible world has some probability of being true.


* P($\omega$) - indicates the probability of world $\omega$ occurring or being true


**Probability Range** - every probability satisfies 0 <= P($\omega$) <= 1, where 0 means an impossible event (rolling a 7 with a single six-sided die, for example) and 1 means a certain event. The probabilities of all possible worlds have to sum to 1. A fair die has 6 possible worlds, one per side, so each has probability 1/6 on a single roll.
* Example: P(2) = 1/6 (the probability of rolling a 2 with a 6-sided die is 1/6)
   
  
If two dice are rolled and we want the probabilities of the combined result, then we need to consider all possible worlds that the two dice can create together: (1,1), (2,1), (3,1), ..., (6,6). There are 36 such worlds and, with fair dice, each pair is equally likely (probability 1/36), just as each side is equally likely with a single die.

However, if the values of the two dice are summed rather than taken individually, then things change. For example, only one world sums to 12, (6, 6), and only one sums to 2, (1, 1), whereas 7 is produced by the most worlds, 6 of them. So it stands to reason that rolling a 7 should be more probable than rolling a 12 or a 2.

* Example: P(sum to 12) = 1/36 and P(sum to 7) = 6/36 or 1/6
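
The sum probabilities can be checked by listing every possible world. This is a minimal sketch, assuming two fair six-sided dice; the names `worlds` and `sum_probabilities` are my own and not from the course.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every possible world for two fair six-sided dice: (1, 1), (1, 2), ..., (6, 6)
worlds = list(product(range(1, 7), repeat=2))

# Count how many worlds produce each sum
sum_counts = Counter(die1 + die2 for die1, die2 in worlds)

# Every world is equally likely (an assumption of fair dice), so
# P(sum) = (number of worlds with that sum) / (number of worlds)
sum_probabilities = {
    total: Fraction(count, len(worlds)) for total, count in sum_counts.items()
}

print(sum_probabilities[12])  # 1/36
print(sum_probabilities[7])   # 1/6
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand-computed values.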


**Unconditional Probability** - degree of belief in a proposition in the absence of any other evidence


**Conditional Probability** - degree of belief in a proposition given some evidence that has already been revealed, formally:
* P(a | b), the probability that a is true, given b (already known truth)
    * Example 1: P(rain today | rain yesterday)
    * Example 2: P(route change | traffic conditions)
    * Example 3: P(disease | test results)
    

**When using probability with AI, conditional probability is usually what is being used**


### Calculating Conditional Probability
* P(a | b) = P(a and b) / P(b): the probability of a given b is the probability of a and b both being true divided by the probability of b

Previously, with unconditional probability, we saw that you have a 1/36 chance of rolling a 12 with two dice. Below, the dice example is used with conditional probability.

* P(sum 12 | roll6), what is the probability that two dice sum to 12 given we already know that one particular die (say, the first) rolled a 6?
    * first we need the probability of the given evidence, a 6 on that die, which is P(roll6) = 1/6
    * then we need the probability of both events together; the only world where that die is a 6 and the sum is 12 is (6, 6), so P(sum 12 and roll6) = 1/36
    * then we divide the probability of both by the probability of the evidence: (1/36) / (1/6) = 1/6, so P(sum 12 | roll6) = 1/6
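
The same calculation can be sketched in code by counting worlds. This assumes fair dice and that the known 6 is on one specific die (here the first); the helpers `probability` and `conditional` are hypothetical names for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely worlds for two fair six-sided dice
worlds = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(event), where event is a true/false test on a world (die1, die2)."""
    favorable = sum(1 for world in worlds if event(world))
    return Fraction(favorable, len(worlds))


def conditional(a, b):
    """P(a | b) = P(a and b) / P(b)."""
    return probability(lambda world: a(world) and b(world)) / probability(b)


def sum_is_12(world):
    return world[0] + world[1] == 12


def first_is_6(world):
    # Assumption: the evidence is about the first die specifically
    return world[0] == 6


print(probability(sum_is_12))              # 1/36 (unconditional)
print(conditional(sum_is_12, first_is_6))  # 1/6  (conditional)
```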


**Random Variable** - a variable in probability theory with a domain of possible values it can take on
* Example 1: roll = {1,2,3,4,5,6}, where roll is the random variable and the values in the set are all possible values the random variable can take on
* Example 2: weather = {sun, cloud, rain, wind, snow}, note that variable values do not have to be numerical.

**Probability Distribution** - the probability of each value a random variable can take on
 
**Probability Distribution Example:**
* P(Flight = on time)   = 0.6
* P(Flight = delayed)   = 0.3
* P(Flight = cancelled) = 0.1
* All the probabilities for the possible values (or worlds) of the variable Flight sum to 1

**Vectors** are often used to display probability distributions like:
* **P**(Flight) = <0.6, 0.3, 0.1>, bold **P** denotes probability distribution
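
One way to hold such a distribution in code is a mapping from each value to its probability. This is only a sketch; the dictionary layout and the `is_valid_distribution` helper are my own assumptions, not something from the course.

```python
import math

# P(Flight) stored as a mapping from each value to its probability
flight = {"on time": 0.6, "delayed": 0.3, "cancelled": 0.1}


def is_valid_distribution(distribution):
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    in_range = all(0 <= p <= 1 for p in distribution.values())
    # isclose avoids false failures from floating-point rounding
    sums_to_one = math.isclose(sum(distribution.values()), 1.0)
    return in_range and sums_to_one


print(is_valid_distribution(flight))  # True
print(list(flight.values()))          # [0.6, 0.3, 0.1], the vector form
```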


**Independence** - the knowledge that one event occurs does not affect the probability of the other event. For independent events, P(a and b) = P(a) * P(b)
* Example 1: P(roll 6 on die1 AND roll 6 on die2) = P(6) * P(6) = 1/6 * 1/6 = 1/36

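**Important: the formula P(a and b) = P(a) * P(b) only holds when the events are independent. Even though each individual outcome has the same probability, some combinations of outcomes are NOT independent.**

* Example 2: P(roll 6 on die1 AND roll 4 on die1) != P(6) * P(4). A single roll of one die cannot show both a 6 and a 4, so the probability of both is 0, not 1/36. These events are NOT independent, and the general formula P(a and b) = P(b) * P(a | b) is needed instead: P(4) * P(6 | 4) = 1/6 * 0 = 0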
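
Independence can be tested directly from its definition by comparing P(a and b) with P(a) * P(b). The sketch below assumes two fair six-sided dice; the helper names are hypothetical. It compares outcomes on different dice with two outcomes on the same die, which can never happen together.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely worlds for two fair six-sided dice
worlds = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(event), where event is a true/false test on a world (die1, die2)."""
    return Fraction(sum(1 for world in worlds if event(world)), len(worlds))


def independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    p_both = probability(lambda world: a(world) and b(world))
    return p_both == probability(a) * probability(b)


def die1_is_6(world):
    return world[0] == 6


def die2_is_6(world):
    return world[1] == 6


def die1_is_4(world):
    return world[0] == 4


print(independent(die1_is_6, die2_is_6))  # True: separate dice
print(independent(die1_is_6, die1_is_4))  # False: same die, P(both) is 0
```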


### Probability Rules

**General Multiplication Rule** - consists of the two formulas used in the previous examples:
* independent events: P(a and b) = P(a) * P(b)
* dependent events: P(a and b) = P(b) * P(a | b)
* note that P(a and b) = P(a) * P(b | a) is equivalent to the line above

**Bayes' Rule (or theorem)** - In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) describes the probability of an event based on prior knowledge of conditions that might be related to the event. For example, if the risk of developing health problems is known to increase with age, Bayes' theorem allows the risk to an individual of a known age to be assessed more accurately than by simply assuming that the individual is typical of the population as a whole.
* Formula: P(b | a) = P(a | b) * P(b) / P(a)
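
The formula follows from the two equivalent forms of the multiplication rule. A short derivation, assuming P(a) > 0 so the division is defined:

$$
\begin{aligned}
P(a \land b) &= P(b)\,P(a \mid b) \\
P(a \land b) &= P(a)\,P(b \mid a) \\
P(a)\,P(b \mid a) &= P(b)\,P(a \mid b) \\
P(b \mid a) &= \frac{P(a \mid b)\,P(b)}{P(a)}
\end{aligned}
$$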


**Bayes' Rule Example**
* Given clouds in the morning, what's the probability of rain in the afternoon?
* Assume we know this info: 80% of rainy afternoons start with cloudy mornings
* Assume we know this info: 40% of days have cloudy mornings
* Assume we know this info: 10% of days have rainy afternoons
* Formula: P(rain|clouds) = P(clouds|rain) * P(rain) / P(clouds)
* Solution: (.80)(.1) / .4 = 0.2, or a 20% chance of rain in the afternoon
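
The rain example can be written as a small function. This is a minimal sketch; the function name `bayes` and its parameter names are my own choices for illustration.

```python
def bayes(p_evidence_given_hypothesis, p_hypothesis, p_evidence):
    """Bayes' rule: P(hypothesis | evidence) = P(e | h) * P(h) / P(e)."""
    if p_evidence <= 0:
        raise ValueError("P(evidence) must be greater than 0")
    return p_evidence_given_hypothesis * p_hypothesis / p_evidence


# Numbers from the example: P(clouds | rain), P(rain), P(clouds)
p_rain_given_clouds = bayes(0.8, 0.1, 0.4)

# Rounded because floating-point arithmetic is not exact
print(round(p_rain_given_clouds, 2))  # 0.2
```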

**Ultimately this means that knowing:**
* P(cloudy morning | rainy afternoon)

**Allows us to calculate:**
* P(rainy afternoon | cloudy morning)

### Or more generally:

**Knowing**
* P(visible effect | unknown cause)

**We can calculate:**
* P(unknown cause | visible effect)

## Continue notes at joint probability, 34 min mark on video