<a href="https://colab.research.google.com/github/brendanpshea/logic-prolog/blob/main/The_ProbabilityOfMurder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Probability of...Murder
In this chapter, we'll explore the role that **probability** plays in arguments and reasoning. Probability is, at its most basic, the measure of how likely something is to occur, a concept that is as pivotal in statistics as it is in the realm of detective work. In order to explore probability, we'll be enlisting the help of some famous (fictional) detectives, from the sharp Sherlock Holmes to the perceptive Nancy Drew, to see how we can use probability to weave together bits of reality into a coherent picture of truth.

To get started, let's imagine that detective Adrian Monk, known for his keen observational skills, stands in the midst of a crime scene. He recalls that 80% of similar crimes were committed using a particular method, and he observes the same pattern at his current crime scene. This statistical insight leads Monk to a calculated conclusion: there's a high probability that these crimes are linked. Here, probability involves concrete numbers and known frequencies. We'll later call this "frequency-type probability."

In contrast, let's consider the investigative approach of Velma from Scooby-Doo. In a mysterious mansion, she uncovers a concealed passage, subtly shifting the odds in favor of her hypothesis: the supposed ghost is merely a person exploiting these hidden corridors. Each clue Velma encounters – be it an unusual footprint or a specific thread of fabric – doesn’t just add to her evidence pile; it incrementally adjusts the likelihood of her theories being accurate. Her method is less about direct calculations and more about intuitively assessing how each piece of evidence modifies her hypotheses. We'll later call this "belief-type probability."

For both Monk and Velma, probability is an ever-present guide. It helps them navigate through a landscape of uncertainty and ambiguity, turning each clue into a stepping stone towards the truth. This chapter will take us on a journey through the nuanced streets of probability and logic, where every clue carries its weight in the grand scheme of things.


## What is Probability, Part 1: The Kolmogorov Axioms
Tucked away in the back alleys of mathematical theory, like a cryptic clue in a detective's notebook, are the **Kolmogorov Axioms**. These axioms are the backbone of probability theory, named after the Russian mathematician Andrey Kolmogorov, who laid down the fundamental principles of probability in a rigorous mathematical way. But fear not, for these axioms are not as daunting as they might seem and can be understood without delving deep into complex mathematics.

To begin, we use notation to simplify our discussion. When we talk about the probability of an event, we use the notation *Pr(Event)*. Think of it like saying, "What are the odds of this happening?" For example, Sherlock Holmes might calculate the probability of a suspect being at the crime scene, which we could write as *Pr(Suspect at Crime Scene)*. Similarly, when we want to talk about the probability of a hypothesis given a specific event, we use the notation *Pr(Hypothesis|Event)*. It's like asking, "Given that this clue or event has occurred, what's the probability that my hypothesis is true?" This is something a detective like Nancy Drew might ponder when she finds a new clue and reassesses her theories.

The Kolmogorov Axioms *define* the mathematical notion of probability. They are as follows:

1. **Non-negativity.** Every event E has a probability that is a non-negative number:
  - $Pr(E) \geq 0$.
2. **Certainty.** The probability of a certain (or guaranteed) event is 1. For example, the probability of "an event E either happens or it doesn't happen" should be 1.
  - $Pr(E \vee \neg E) = 1$, where E is any event.
3. **Additivity.** For any two mutually exclusive events (events that cannot both occur at the same time), the probability of either event occurring is the sum of their individual probabilities:
  - $Pr(A \vee B) = Pr(A) + Pr(B)$, for mutually exclusive events A and B.

The first axiom of Kolmogorov is that the probability of any event is a non-negative number. This simply means that you can't have a negative chance of something happening. It's either going to happen, or it isn't, or somewhere in between, but it's never less than zero. It's like saying, "There's no chance that the victim committed the crime," which would be a probability of zero, or "There's a certain chance that the butler did it," which might be a probability close to one, but never negative.

The second axiom states that the probability of a certain event (one that is guaranteed to happen) is 1. In our detective story, this would be akin to saying, "The crime definitely happened here," which is an absolute certainty and thus has a probability of 1.

The third axiom is a bit more complex. It involves the probability of the union of two mutually exclusive events. In simple terms, if you have two events that cannot happen at the same time (like the suspect can't be both in the library and the dining room at the same moment), then the probability that either one happens is the sum of the probabilities of each happening individually. For example, if there's a 30% chance the suspect was in the library and a 40% chance they were in the dining room, and these two events are mutually exclusive, the probability of the suspect being in either location is 70%.
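One way to make the three axioms concrete is to check them against a small, made-up probability assignment. The sketch below is illustrative only: it extends the library/dining room example with a hypothetical third option ("elsewhere") so that the probabilities sum to 1, and `pr` and `check_axioms` are helpers written for this example, not part of `logic_util`.

```python
import math

# Hypothetical distribution over where the suspect was.
# The three outcomes are mutually exclusive and cover every possibility.
location = {"library": 0.30, "dining room": 0.40, "elsewhere": 0.30}

def pr(event, dist):
    """Probability of an event, where an event is a set of outcomes."""
    return sum(dist[outcome] for outcome in event)

def check_axioms(dist):
    """Return True if dist satisfies the three axioms on this small example."""
    outcomes = set(dist)
    # Axiom 1: no outcome has a negative probability.
    non_negative = all(p >= 0 for p in dist.values())
    # Axiom 2: the certain event (some outcome occurs) has probability 1.
    certain = math.isclose(pr(outcomes, dist), 1.0)
    # Axiom 3: additivity for two mutually exclusive events.
    a, b = {"library"}, {"dining room"}
    additive = math.isclose(pr(a | b, dist), pr(a, dist) + pr(b, dist))
    return non_negative and certain and additive

print(check_axioms(location))
print(round(pr({"library", "dining room"}, location), 2))
```

Try changing one of the numbers to a negative value, or so that the total is no longer 1, and the check should fail.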

### Some Rules for Calculating Probabilities

It isn't easy to directly apply the Kolmogorov axioms to calculate probabilities. Luckily, we don't have to! Instead, we can use various derived rules (logicians might call them *theorems*) to make our lives easier. Here are a few that might come in handy.

### Complement Rule

The complement rule states that the probability of an event not occurring is 1 minus the probability of the event occurring.

- **Complement Rule.** Pr(not E) = 1 - Pr(E), where E is any event.

For example, Enola and Mycroft Holmes (Sherlock's sister and brother) are investigating a case where they know the probability of a suspect being in London is 0.65. Using the complement rule, they deduce that the probability of the suspect not being in London is 1 - 0.65 = 0.35. This calculation helps the Holmes team strategize their investigation based on the suspect's likely whereabouts.

In [1]:
%%capture
# This chapter uses some helper functions
!wget https://github.com/brendanpshea/logic-prolog/raw/main/logic_util.py
from logic_util import *

In [2]:
# Computer code to do this. Try changing the number!
complement_rule(pr_e = 0.65)

P(not E) = 1 - P(E)
         = 1 - 0.65
         = 0.35


### Simple Addition Rule (for Mutually Exclusive Events)
The simple addition rule applies to mutually exclusive events, meaning two events that cannot happen at the same time. The rule states that the probability of either event occurring is the sum of their individual probabilities:

- **Simple Addition Rule.** Pr(A or B) = Pr(A) + Pr(B).

For example, Nancy Drew is trying to determine the likelihood that a clue comes from either the attic (30% probability) or the basement (20% probability), knowing these locations cannot be involved in the clue's origin simultaneously. Applying the simple addition rule, she calculates a 50% probability (0.30 + 0.20) that the clue originates from either the attic or the basement.

In [None]:
# Some python code
simple_addition(pr_e1 = 0.30, # attic
                pr_e2 = 0.20) # basement

P(E1 or E2) = P(E1) + P(E2)
            = 0.3 + 0.2
            = 0.5


### General Addition Rule
The general addition rule is used when events can occur simultaneously. It states that the probability of either event A or event B occurring is the sum of their individual probabilities minus the probability of both events occurring together:

- **General Addition.** Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B).

Suppose that Agent Scully is assessing the chances that a suspect has either a red scarf (40%) or a blue hat (50%), with a 15% chance the suspect has both. Using the general addition rule, she calculates a 75% chance (0.40 + 0.50 - 0.15) that the suspect has either a red scarf or a blue hat.

In [None]:
general_addition(pr_e1=0.4, # red scarf
                 pr_e2=0.5, # blue hat
                 pr_e1_and_e2=0.15) # both

P(E1 or E2) = P(E1) + P(E2) - P(E1 and E2)
            = 0.4 + 0.5 - 0.15
            = 0.75
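To see why the overlap has to be subtracted, it can help to count. The sketch below assumes a hypothetical pool of 100 equally likely suspects whose counts match Scully's percentages (40 with a red scarf, 50 with a blue hat, 15 with both). The ID ranges are made up purely for illustration.

```python
# Hypothetical pool of 100 equally likely suspects, identified by number.
suspects = set(range(100))
red_scarf = set(range(0, 40))    # 40 suspects have a red scarf
blue_hat = set(range(25, 75))    # 50 suspects have a blue hat; IDs 25-39 have both

def pr(event):
    """Probability of an event as a fraction of the equally likely suspects."""
    return len(event) / len(suspects)

# Count the union directly: each suspect is counted once.
direct = pr(red_scarf | blue_hat)

# Apply the general addition rule: the 15 suspects with both items would be
# counted twice by Pr(A) + Pr(B), so their probability is subtracted once.
by_rule = pr(red_scarf) + pr(blue_hat) - pr(red_scarf & blue_hat)

print(direct)
print(round(by_rule, 2))
```

Both approaches should agree at 0.75. Dropping the subtraction would give 0.90, which overstates the chance because the overlap is counted twice.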


### Simple Multiplication Rule (for Independent Events)
The simple multiplication rule applies to independent events, which are events where the occurrence of one does not affect the occurrence of the other. The rule states that the probability of both events occurring is the product of their individual probabilities:

- **Simple Multiplication Rule.** Pr(A and B) = Pr(A) * Pr(B)

Suppose Adrian Monk is investigating two unrelated leads. The probability that the first witness is telling the truth is 70%, and the probability that a second, slightly less trustworthy, witness is telling the truth is 50%. To determine the probability that both are telling the truth, he would multiply 0.7 x 0.5 = 0.35. This gives the chance both leads are accurate.

In [None]:
simple_multiplication(pr_e1=.7,  # First witness
                      pr_e2=.5) # Second witness

P(E1 and E2) = P(E1) * P(E2)
             = 0.7 * 0.5
             = 0.35


### Conditional Probability
**Conditional Probability** explores "what ifs" within the universe of probability, focusing on the likelihood of one event occurring under the precondition that another specific event has already taken place. It's a measure that answers questions of the form, "Given that B has occurred, what is the chance of A happening?" This concept is mathematically represented as Pr(A|B), signifying the probability of event A given that B is known to have occurred.

The formula for calculating conditional probability is given by:
$$
Pr(A|B) = \frac{Pr(A \text{ and } B)}{Pr(B)}
$$

This equation highlights that the probability of both A and B happening together, divided by the probability of B happening, gives us the conditional probability of A given B. It's a way to refine our predictions or expectations about an event based on new information or given conditions.

To bring this concept to life, let's suppose that Sherlock is investigating a case where the presence of fingerprints at a crime scene could be crucial evidence. However, the night before the investigation, it rained, potentially washing away any fingerprints. Here, Sherlock is interested in calculating the conditional probability of finding fingerprints given that it rained. If historical data or his deductive reasoning suggests that the chance of finding fingerprints after rain is 25%, then we can denote this as Pr(Fingerprints | Rain) = 0.25. This means, according to Holmes' estimation, even after rain, there's a 25% chance that fingerprints, resilient or protected enough from the weather, could still be found at the crime scene.
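A figure like Pr(Fingerprints | Rain) = 0.25 could come from applying the formula to case records. The sketch below assumes a hypothetical log of 200 past crime scenes. The counts are invented for illustration, and `conditional` is a helper written for this example rather than a `logic_util` function.

```python
# Hypothetical case log: all counts are invented for illustration.
total_scenes = 200
rained = 80              # scenes where it rained beforehand
rained_and_prints = 20   # scenes where it rained AND fingerprints were still found

def conditional(pr_a_and_b, pr_b):
    """Pr(A|B) = Pr(A and B) / Pr(B); undefined when Pr(B) is 0."""
    if pr_b == 0:
        raise ValueError("Pr(B) must be greater than 0")
    return pr_a_and_b / pr_b

pr_rain = rained / total_scenes                         # Pr(Rain) = 0.4
pr_prints_and_rain = rained_and_prints / total_scenes   # Pr(Fingerprints and Rain) = 0.1
pr_prints_given_rain = conditional(pr_prints_and_rain, pr_rain)

print(round(pr_prints_given_rain, 2))
```

Notice that conditioning on rain amounts to ignoring the 120 dry scenes entirely: 20 of the 80 rainy scenes still had fingerprints, which is the same 25%.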

### Table: Sample Conditional Probabilities
To help you get a better sense of how conditional probability works, here are some simple examples:

| Conditional Probability Claim | Description |
| --- | --- |
| Pr(Truth \| KnownLiar) = 0.2 | The probability of a known liar telling the truth is 20%. |
| Pr(FingerprintMatch \| SuspectPresent) = 0.9 | There's a 90% chance of finding a matching fingerprint if the suspect was present at the crime scene. |
| Pr(Confession \| Guilty) = 0.5 | If a suspect is guilty, there's a 50% probability that they will confess to the crime. |
| Pr(PoisonDetected \| LabTest) = 0.95 | There's a 95% chance that poison will be detected if a proper lab test is conducted. |
| Pr(Confession \| Guilty AND UnderPressure) = 0.85 | The probability that a guilty suspect confesses when under pressure increases to 85%. |
| Pr(AlibiVerified \| NOT CCTVFootage) = 0.3 | If there is no CCTV footage, the probability of an alibi being verified drops to 30%. |
| Pr(FingerprintMatch \| CleanedRoom OR WoreGloves) = 0.5 | There's a 50% chance of finding a matching fingerprint if the suspect cleaned the room or wore gloves, accounting for the possibility of gloves leaving no prints. |
| Pr(NoEvidenceLeft \| ProfessionalThief AND NightTime) = 0.95 | The probability that no evidence is left behind increases to 95% if the crime was committed by a professional thief during the night. |
| Pr(SuspectFlees \| Confronted AND NOT Armed) = 0.6 | If confronted and not armed, the probability that the suspect will attempt to flee increases to 60%. |
| Pr(PoisonDetected \| LabTest AND NOT ContaminatedSample) = 0.98 | There's a 98% chance that poison will be detected if a lab test is conducted on a sample that is not contaminated. |

### Complete Multiplication Rule (for Dependent Events)

The complete multiplication rule applies when calculating the probability of sequential, dependent events occurring. In scenarios where one event's outcome influences another's, the probability of both events happening is the product of the first event's probability and the conditional probability of the second event given the first.

$$
Pr(A \text{ and } B) = Pr(A) * Pr(B|A)
$$

Imagine Boba Fett tracking down two targets in the galaxy, where the capture of the first target significantly increases the chances of locating the second due to intel gathered. If the probability of capturing the first target is 70% (Pr(A) = 0.7), and this success boosts the probability of securing the second target to 80% (Pr(B|A) = 0.8), then the probability of Boba Fett capturing both targets, one after the other, can be calculated as 0.7 * 0.8 = 0.56. Thus, there's a 56% chance Boba Fett will successfully apprehend both targets, showcasing the interdependency of these events in his mission.

In [3]:
complete_multiplication(
    pr_e1 = 0.7,
    pr_e2_given_e1 = 0.8
)

P(E1 and E2) = P(E1) * P(E2|E1)
             = 0.7 * 0.8
             = 0.5599999999999999
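The last line of output shows `0.5599999999999999` rather than `0.56`. This is a general artifact of binary floating-point arithmetic in Python, not a flaw in the rule. Here is a minimal sketch of two common workarounds using only the standard library. It is not how `logic_util` is implemented and does not change that module's output.

```python
from fractions import Fraction

pr_a = 0.7            # Pr(A): first target captured
pr_b_given_a = 0.8    # Pr(B|A): second target captured, given the first

raw = pr_a * pr_b_given_a
print(raw)            # float result carries a tiny rounding error
print(round(raw, 2))  # option 1: round for display

# Option 2: do the arithmetic with exact fractions, then convert at the end.
exact = Fraction(7, 10) * Fraction(8, 10)
print(exact, float(exact))
```

Rounding is usually enough for hand-checkable problems like these. Exact fractions are handy when many small probabilities are multiplied together and the rounding errors might otherwise accumulate.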


### Basic Rules of Probability
Here are the basic rules of probability we've discussed so far. These are all simple enough that you should be able to compute them with a simple calculator app on your phone. However, you are also welcome to try out the "interactive" version of this chapter, which has custom functions. You'll just need to click here:

https://colab.research.google.com/github/brendanpshea/logic-prolog/blob/main/The_ProbabilityOfMurder.ipynb

And then select "Runtime: run all".


| Rule Name | Description in English | Definition | Python Function Call |
| --- | --- | --- | --- |
| Complement Rule | Calculates the chance of an event not happening. | `Pr(not E) = 1 - Pr(E)` | `complement_rule(pr_e)` |
| Conditional Probability | Determines the likelihood of an event A occurring given that event B has already occurred. | `Pr(A given B) = Pr(A and B) / Pr(B)` |  |
| Simple Addition | Finds the chance of either event happening, assuming they are mutually exclusive. | `Pr(E1 or E2) = Pr(E1) + Pr(E2)` | `simple_addition(pr_e1, pr_e2)` |
| General Addition | Adds probabilities of two events, subtracting the overlap to avoid double counting. | `Pr(E1 or E2) = Pr(E1) + Pr(E2) - Pr(E1 and E2)` | `general_addition(pr_e1, pr_e2, pr_e1_and_e2)` |
| Simple Multiplication | Multiplies the probabilities of two independent events to find the chance of both occurring. | `Pr(E1 and E2) = Pr(E1) * Pr(E2)` | `simple_multiplication(pr_e1, pr_e2)` |
| Complete Multiplication | For dependent events, multiplies the probability of one event by the conditional probability of the second. | `Pr(E1 and E2) = Pr(E1) * Pr(E2 given E1)` | `complete_multiplication(pr_e1, pr_e2_given_e1)` |
| Total Probability | Calculates overall probability of an event by considering all exclusive scenarios. | `Pr(E) = Pr(E given H1) * Pr(H1) + Pr(E given H2) * Pr(H2)` | `total_probability(pr_e_given_h1, pr_h1, pr_e_given_h2, pr_h2)` |
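The last row of the table, total probability, is not worked through in the sections above, so here is a minimal sketch of what it computes. It assumes H1 and H2 are mutually exclusive and together cover every possibility. The function name `total_probability_sketch` and the scenario (a muddy footprint, with a gardener as one of two possible culprits) are hypothetical; this is a stand-in written for this example, not the `logic_util` version.

```python
import math

def total_probability_sketch(pr_e_given_h1, pr_h1, pr_e_given_h2, pr_h2):
    """Pr(E) = Pr(E|H1) * Pr(H1) + Pr(E|H2) * Pr(H2)."""
    # The two hypotheses must cover all possibilities between them.
    if not math.isclose(pr_h1 + pr_h2, 1.0):
        raise ValueError("Pr(H1) and Pr(H2) should sum to 1")
    return pr_e_given_h1 * pr_h1 + pr_e_given_h2 * pr_h2

# Hypothetical case: E = "a muddy footprint is found at the scene".
# H1 = the gardener did it (25%); H2 = someone else did it (75%).
pr_mud = total_probability_sketch(
    pr_e_given_h1=0.8, pr_h1=0.25,   # the gardener would likely leave mud
    pr_e_given_h2=0.2, pr_h2=0.75,   # anyone else probably would not
)
print(round(pr_mud, 2))
```

Each term is a complete multiplication (the chance of a scenario times the chance of the evidence within it), and the two terms are then combined by simple addition because the scenarios cannot both hold.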

## Exercises
Here are some exercises to practice the basic rules of probability.

1. Detective Holmes is investigating a high-profile case and estimates the probability of the suspect being in London is 75%. What is the probability that the suspect is not in London?

2. In the case of a stolen artifact, Inspector Gadget wants to know how much a suspect's access to the museum raises the probability of guilt. Given that 30% of all suspects had access to the museum, and 12% of all suspects both had access and are guilty, what is the probability that a suspect is guilty given that they had access?

3. Sherlock Holmes is investigating a case and determines that the probability the thief took a cab away from the scene is 50% and the probability of leaving fingerprints at the scene is 20%. Assuming these events are independent, what is the probability the thief both took a cab and left fingerprints?

4. Veronica Mars is trying to determine who pranked the principal. There are two suspects: Lilly and Wallace. She knows that if Lilly did it, there's a 50% chance she would use a stink bomb. If Wallace did it, there's a 30% chance of him using the same method. Given Lilly is 60% likely and Wallace 40% likely to be the prankster, what is the total probability a stink bomb was used?

5. Detective Pikachu is on the trail of two separate clues regarding the location of a hidden item. He estimates a 20% chance the item is in the city park and a 15% chance it is at the local museum. Assuming the item cannot be in both places at once, what is the probability the item is at one of these two locations?

6. Sam Spade is tracking two leads. The probability the first lead pans out is 60%. If the first lead is successful, the probability the second lead will also be successful increases to 70%. What is the probability both leads will be successful?

7. Nancy Drew is investigating a case with two possible suspects. The probability suspect A is involved is 25%, and the probability suspect B is involved is 35%. If the probability that both A and B are involved is 10%, what is the probability that either A or B is involved?

## What is Probability, Part 2: Two Types of Probability

The previous sections tell us how probability works "mathematically." However, the equations we looked at don't tell us what these numbers *mean*. In probability, much like in a detective's investigation, the interpretation of evidence can take different forms. Two fundamental types of probability interpretations are Frequency-Type Probability and Belief-Type Probability. Each type offers a different lens through which we can understand the meaning behind the numbers in probability.

### Frequency-Type Probability

**Frequency-Type Probability** is grounded in the long-run frequency of events. It is defined as the limit of the relative frequency of an event occurring after many trials. It is always relative to a **reference class** describing the results of these trials. In simpler terms, it's about how often something happens over repeated trials or occurrences. It is sometimes called *objective probability*, *chance*, or *physical probability*.

- Ex1: Imagine Sherlock Holmes is investigating a series of burglaries. He finds that in 60 out of 100 past burglary cases in the area (our reference class), the perpetrator left behind a specific clue. Here, the frequency-type probability of finding this clue in a burglary is 0.6 (or 60%). Holmes might use this probability to gauge the likelihood of encountering this clue in future burglary cases.
- Ex2: Veronica Mars has discovered that someone has become gravely ill after taking two aspirin. She wonders what the probability is that this might happen by chance (for example, because of a drug allergy or manufacturing defect). She looks up the data and finds that this happens to only 1 out of 1,000,000 people, which gives the (very low!) probability of 0.000001. She begins to suspect poison.
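The phrase "limit of the relative frequency" can be illustrated with a simulation. The sketch below assumes, hypothetically, that each burglary independently leaves the clue with a true chance of 0.6, and then reports the observed fraction for reference classes of different sizes. The seed and the sample sizes are arbitrary choices.

```python
import random

def relative_frequency(n_cases, true_chance=0.6, seed=42):
    """Fraction of n_cases simulated burglaries in which the clue appears."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(rng.random() < true_chance for _ in range(n_cases))
    return hits / n_cases

# Small reference classes can land well away from 0.6;
# larger ones tend to settle close to it.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

This is one reason the size of the reference class matters: Holmes's 60-out-of-100 figure is an estimate of the long-run frequency, and it could shift somewhat if more cases were added.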

### Belief-Type Probability

**Belief-Type Probability**, on the other hand, reflects a degree of belief or confidence that a person has in the occurrence of an event. It is always relative to a person's **total evidence**. It is sometimes called *subjective probability*, *inductive probability* or *logical probability*.

- Ex1: Nancy Drew is investigating a mysterious disappearance. Based on her investigation, she estimates there's a 0.7 (70%) chance that the person disappeared voluntarily. This belief is based on the person's behavior patterns and the other evidence she has gathered about the case. Her colleague, however, might assess the situation differently based on his perspective and information, assigning a different belief-type probability to the same hypothesis.
- Ex2: Lisbeth Salander is analyzing whether a leak within a company was an inside job. She thinks about her total evidence. Given the restricted access to the information, the employees' profiles, and recent unusual network activities, she approximates a 65% probability that the leak was internal. Her estimation is based on her analysis of network security data, employee access levels, and behavior patterns. These factors collectively shape her belief about the likelihood of an internal leak, though it remains an educated guess.

### Table: Two Types of Probability
In detective work, as in the rest of life, both interpretations of probability are important.

| Feature | Frequency-Type Probability | Belief-Type Probability |
| --- | --- | --- |
| Definition | Defined as the limit of the relative frequency of an event occurring after many trials. | Reflects a degree of belief or confidence in the occurrence of an event, relative to a person's total evidence. |
| Reference | Always relative to a specific reference class describing the results of trials or occurrences. | Always relative to a person's total evidence, incorporating both objective data and subjective interpretation. |
| Terminology | Also known as objective probability, chance, or physical probability. | Also known as subjective probability, inductive probability, or logical probability. |
| Mathematical Formulation | No difference. Probability values range from 0 to 1, where 0 indicates impossibility and 1 indicates certainty. | No difference. Probability values range from 0 to 1, where 0 indicates impossibility and 1 indicates certainty.  |
| Nature | Considered more objective, as it is based on empirical data and observed frequencies. | Considered more subjective, as it incorporates personal judgment and interpretation of evidence. |
| Application | Useful in situations where data from repeated trials or historical patterns are available. | Useful in decision-making processes where personal judgment and the assessment of all available evidence are key. |

In [4]:
bayes_theorem(p_h = 0.5,
              pr_e_given_h = .9,
              pr_e_given_not_h = .2)

P(H|E) = (P(E|H) * P(H)) / [P(E|H) * P(H) + P(E|not H) * P(not H)]
       = (0.9 * 0.5) / (0.9 * 0.5 + 0.2 * 0.5)
       = 0.82
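The cell above prints a formula that has not been introduced in this section. As a rough guide to what it is doing, the sketch below rebuilds the same number from rules that do appear in the summary table: the complement rule, complete multiplication, total probability, and conditional probability. The function name `bayes_sketch` is hypothetical, and this is not the `logic_util` implementation.

```python
def bayes_sketch(pr_h, pr_e_given_h, pr_e_given_not_h):
    """Pr(H|E) = Pr(E and H) / Pr(E), with Pr(E) from total probability."""
    pr_not_h = 1 - pr_h                                # complement rule
    pr_e_and_h = pr_e_given_h * pr_h                   # complete multiplication
    pr_e = pr_e_and_h + pr_e_given_not_h * pr_not_h    # total probability
    if pr_e == 0:
        raise ValueError("Pr(E) must be greater than 0")
    return pr_e_and_h / pr_e                           # conditional probability

# Same inputs as the cell above: a 50% prior belief in hypothesis H,
# and evidence that is far more likely if H is true (0.9) than if it is not (0.2).
print(round(bayes_sketch(0.5, 0.9, 0.2), 2))
```

Read as belief-type probability, this describes a detective who starts out undecided about a hypothesis (50%) and becomes considerably more confident once a clue turns up that fits the hypothesis much better than its alternative.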
