# How Numbers Deceive

Adapted from _Using and Understanding Mathematics: A Quantitative Reasoning Approach_

## What is this?

_Source: [Owlcation](https://owlcation.com/social-sciences/How-To-Beat-A-Polygraph-Test)_
![polygraph](https://usercontent1.hubstatic.com/13722112_f520.jpg)

This is a polygraph test, also known as a lie detector. You may have seen one in movies where the police or FBI are trying to determine whether a suspect is lying. These are real machines, and they have 90% accuracy. Most people would assume that means only 10% of the people who fail the test are actually telling the truth. However, the _actual_ percentage of false accusations can be significantly higher.

### Question: How high do you think the percentage actually is?

a) 20%

b) 120%

c) 90%

d) 60%


### Answer

The answer is c): in some cases, _90% of the people who fail a polygraph are falsely accused_.

### What does all of this mean?

While numbers themselves can't lie, they can be presented or interpreted in a way that is deceiving, and we need to be careful in our own interpretations of them. Today, we'll go over some of the ways in which numbers can deceive and how we can spot the deception.


## Learning Outcomes

By the end of today's class, you should be able to...

1. Critically think about reported percentages and investigate their honesty
1. Differentiate between True Positives, False Positives, True Negatives, and False Negatives

## Acne Treatment

Let's say a pharmaceutical company creates an updated formula for their acne-curing product. They want to know if their new formula is better than their old formula.

- 90 people are given the old formula
- 110 people are given the new formula
- Those with mild acne: 2/10 who got the old formula were cured (20%), 30/90 who got the new formula were cured (33%)
- Those with severe acne: 40/80 who got the old formula were cured (50%), and 12/20 who got the new formula were cured (60%)

_Source: Using and Understanding Mathematics_
![acne](acne.png)

Given that the new formula had a higher cure rate than the old one in both groups, many would say that the new formula is the superior product.

### But let's dig a little deeper:

- 90 patients total (10 mild, 80 severe acne) got the **old formula** and 42 of them were cured (2 + 40) for an overall **cure rate of 42/90 = 46.7%**
- 110 patients total (90 mild, 20 severe acne) got the **new formula**, and 42 of them were also cured (30 + 12) for an **overall cure rate of 42/110 = 38.2%**

This shows that the old formula actually outperformed the new formula by 8.5 percentage points! Even though the new formula appeared to outperform the old one in each group comparison, the old formula did better overall.

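This is what's known as **Simpson's Paradox:** a result that holds within each group (the new formula had the higher cure rate for both mild and severe acne) reverses when the groups are combined (the old formula had the higher cure rate overall). It happens here because the groups are unevenly sized: most old-formula patients had severe acne, which was cured at a higher rate under either formula, while most new-formula patients had mild acne.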
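The acne figures above can be checked with a short script. This is only a sketch: the `results` dictionary and the `overall_rate` helper are names made up for illustration, and the counts are copied from the bullets above.

```python
# (cured, treated) counts copied from the acne example above.
# The dictionary layout and function name are illustrative assumptions.
results = {
    "old": {"mild": (2, 10), "severe": (40, 80)},
    "new": {"mild": (30, 90), "severe": (12, 20)},
}


def overall_rate(groups):
    """Cure rate after pooling every group given one formula."""
    cured = sum(c for c, _ in groups.values())
    treated = sum(t for _, t in groups.values())
    return cured / treated


for formula, groups in results.items():
    # Cure rate within each acne-severity group, as a percentage
    per_group = {name: round(100 * c / t, 1) for name, (c, t) in groups.items()}
    # Cure rate across all patients given this formula
    pooled = round(100 * overall_rate(groups), 1)
    print(formula, per_group, "overall:", pooled)
```

The per-group rates favor the new formula, while the pooled rates favor the old one, which is the reversal described above.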

## Activity: Basketball Shots

We are tracking two players in a basketball game. We're given the following information:

- Kevin scored 4/10 of his shots in the first half, and 3/4 of his shots in the second half
- Kobe scored 1/4 of his shots in the first half, and 7/10 of his shots in the second half

_Source: Using and Understanding Mathematics_
![basketball](basketball.png)

At first glance, it looks like Kevin did better overall than Kobe. Is this true?

### Think, Pair, Share

- 5 min: Take some time to work out the answer on your own
- 2 min: Discuss your solution and reasoning with a partner.
- 5 min: The class will go over the solution with the instructor

## Breast Cancer Tests

Imagine you are a doctor who treats patients with breast tumors. You know that the mammogram screening test is 85% accurate at identifying whether a tumor is malignant (cancerous) or benign (not cancerous).

If the test says a patient has a malignant tumor, most people would think that there's a very high chance that the patient has cancer.

_But we're not most people!_

Let's consider the following information:

- We have 10,000 patients with tumors
- We know that 1% of tumors are malignant, so we know that 10,000 * 0.01 = 100 patients actually have cancer
- Therefore, we also know that 9900 patients do _not_ have cancer

So now we need to determine which 100 of the 10,000 patients are the ones with a cancerous tumor. Given that our test is 85% accurate, and that we consider a test result _positive_ if it says the tumor is malignant, we can determine the following:

- Of the 100 patients who have cancer, the test will correctly identify 85 of them (85%) as having cancer. These are what we call **true positives**
- Of the 100 patients who have cancer, 15 of them (15%) will be incorrectly identified as not having cancer, even though their tumors are indeed malignant. These are examples of **false negatives**
- Of the 9900 patients who do not have cancer, 8415 of them (85%) will be correctly identified as not having cancer. These are our **true negatives**
- Of the 9900 patients who do not have cancer, 1485 (15%) of them will be incorrectly identified as having cancer. These are examples of **false positives**

See the table below for a summary:


_Source: Using and Understanding Mathematics_
![cancer](cancer.png)

Therefore, the screening test will say that a total of 1,570 (85 + 1,485) patients have cancer. So when a mammogram screening test comes back positive, there is only an 85/1570 = 5.4% chance that the patient actually has cancer!
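The four counts and the final percentage can be reproduced in a few lines of Python. This is a minimal sketch: `screening_counts` is a hypothetical helper, and it assumes (as the example above does) that the same 85% accuracy applies both to patients with cancer and to patients without it.

```python
def screening_counts(population, prevalence, accuracy):
    """Split a population into TP, FN, TN and FP counts.

    Assumption: a single accuracy figure applies to both the patients
    who have the condition and the patients who do not.
    """
    sick = round(population * prevalence)
    healthy = population - sick
    tp = round(sick * accuracy)     # sick and correctly flagged
    fn = sick - tp                  # sick but missed
    tn = round(healthy * accuracy)  # healthy and correctly cleared
    fp = healthy - tn               # healthy but flagged
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}


counts = screening_counts(population=10_000, prevalence=0.01, accuracy=0.85)
all_positives = counts["TP"] + counts["FP"]
chance_of_cancer = counts["TP"] / all_positives

print(counts)
print(f"Chance a positive result is really cancer: {chance_of_cancer:.1%}")
```

Changing `prevalence` shows how strongly the result depends on how rare the condition is: the rarer it is, the more the false positives outnumber the true positives.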

## Activity: False Negatives

Given the previous information, what is the percentage of patients with negative test results who actually have cancer (false negatives)?

### Think, Pair, Share

- 5 min: Take some time to work out the answer on your own
- 2 min: Discuss your solution and reasoning with a partner.
- 5 min: The class will go over the solution with the instructor

## Polygraph Revisited

Now that we've had some practice investigating our numbers more thoroughly, let's revisit the polygraph example from the beginning.

### Flowchart

Review [this polygraph flowchart](https://docs.google.com/presentation/d/1aIfy_eewNDPtktaxQ1jO6RijkHBB9ZoRSTgGK745fH0/edit?usp=sharing), being careful to step through it piece by piece, building it up as we go along. Make sure to view it in presentation mode so that you get the full benefits of the animations!

### Results

From this flowchart, we see that of the people marked as lying by the polygraph (108), 99/108 = 91.7% were actually telling the truth! This means that almost 92% of the rejected applicants for the job would actually have been qualified!
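The flowchart's inputs are not reproduced in these notes, so the sketch below uses assumed ones: 1,000 applicants, of whom 1% (10) are lying. Those figures are a guess, chosen only because, together with the 90% accuracy from the introduction, they reproduce the 99 and 108 quoted above.

```python
# ASSUMED inputs (not stated in these notes): 1,000 applicants, 10 of them lying.
applicants = 1_000
liars = 10
accuracy_pct = 90  # the "90% accuracy" figure from the introduction

truthful = applicants - liars

# Liars the polygraph correctly marks as lying
liars_flagged = liars * accuracy_pct // 100
# Truthful applicants the polygraph wrongly marks as lying
truthful_flagged = truthful * (100 - accuracy_pct) // 100

flagged = liars_flagged + truthful_flagged
false_accusation_rate = truthful_flagged / flagged

print(f"{truthful_flagged}/{flagged} = {false_accusation_rate:.1%} falsely accused")
```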

## Activity: Airline Arrivals

Review the following table about arrival data for two airlines in five cities (this is real data, but the airline names have been changed):

_Source: Using and Understanding Mathematics_
![airplane](airplane.png)

Given this information, answer the following questions:

1. Based on the `% On Time` columns in the above table, which airline has the higher percentage of on-time flights to the five cities? Note that you shouldn't need to calculate anything; just read the columns in the table.
1. Find the percentage of on-time flights for each airline across the total number of arrivals in all five cities. For example, Excelsior Airlines had 3,775 total arrivals. What percentage of them were on time? Answer the same question for Paradise Airlines.
1. Explain why the results from Question 2 and Question 1 differ.

## Stretch Challenge: High School Drug Tests

A high school is administering a drug test for its upcoming nationals volleyball game. Athletes who test positive for drugs are eliminated from the game.

- From previous studies, we know that the tests are 95% accurate
- You can safely assume that 4% of athletes are using drugs
- There will be 1000 athletes participating in this game

What fraction of the athletes who fail the test are falsely accused?

## Stretch Challenge: Political Math

People will often focus on the numbers that will benefit their side. Neither side is more fair than the other, as each one will generally put a deliberate focus on a percentage in order to mask the absolute change, which may be less favorable towards their position. This is known as **Selective Truth**.

For example, let's say that government spending for a popular education program was \\$100 million this year. Spending for the program next year is slated to be \\$102 million. Note that inflation is expected to be 3% over the next year, meaning \\$1 this year is equivalent to \\$1.03 next year.

Supporters of the program are claiming that this is actually a cut to the program, whereas those who oppose the program are complaining the budget should be cut, not increased. Is one side lying?