
**NGG 6050: Quant Neuro**

**HW1: Frequentist vs. Bayesian Approaches**

Anna Leonard

2023.09.02


---



**Exercise #1: If someone gets a positive test, is it "statistically significant" at the p<0.05 level? Why or why not?**

Technically, yes.

Data set: 1000 HIV tests (some positive, some negative, # not known)

Hypothesis: The test indicates positive when a person is infected, and negative when they are not.

NULL Hypothesis: The test may indicate negative or positive regardless of a person's infection status.

**Probability of a positive test if the null hypothesis is true** (the p-value) = probability of a false positive (5% or p=.05) = **.05**. The probability of a false negative (0) does not add anything here.

Therefore (if we allow p=.05 rather than p<.05 to indicate "significance"), we can reject our null hypothesis and accept a positive test as a "statistically significant" indicator of HIV+ status.

(However, for an *individual*, the matter of statistical significance has no bearing on the personal significance of their own test result. A large group of people may take this test and, given the small percentage of false positives, use it as an estimate of how many individuals are infected. A single person taking the test has no way of knowing whether their positive is true or false unless they consider other factors influencing how likely they are to be infected, including, as we see in Exercise 2, what proportion of the population is infected.)
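As a sanity check on the 5% figure, here is a minimal simulation sketch. It assumes the null hypothesis can be read as "the person is not infected", so a positive test can only be a false positive. The seed and the sample size `n_tests` are arbitrary choices for illustration, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed (arbitrary) so the run is repeatable
f_pos_rate = 0.05                # false positive rate given in the exercise
n_tests = 100_000                # hypothetical number of uninfected people tested

# Under the assumed null (not infected), a test is positive only by error
positives = rng.random(n_tests) < f_pos_rate
p_value_estimate = positives.mean()

print("Fraction of uninfected people testing positive:", round(p_value_estimate, 3))
```

The simulated fraction should land close to .05, which is the probability of seeing a positive test when the null is true. It says nothing yet about how likely the person is to be infected.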





**Exercise #2: What is the probability that if someone gets a positive test, that person is infected?**

This code loops over population infection rates from 0 to 0.9 in increments of 0.1. For each rate, given a population size, a false positive rate, and a false negative rate, it calculates the number of true and false positives and negatives, the percentage of tests that are positive, and the percentage of positive tests that are TRUE positives.

Credit to Bing Chat (not ChatGPT) for helping me figure out how to increment by 0.1 when the default range() function takes integers only. I was going to use ChatGPT, but it requires using a phone number to sign up, and my phone is out of commission this weekend. All other code I wrote myself with slight formatting help from Stack Overflow.
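For reference, here is a short sketch of two other ways to step through fractional values. NumPy's documentation recommends `np.linspace` over `np.arange` when the step is not an integer. The variable names are my own, and the range shown here (0 to 1 inclusive) is just an example.

```python
import numpy as np

# Option 1: ask for the number of points; both endpoints are included
rates_linspace = np.linspace(0, 1, 11)

# Option 2: count with integers and divide, so no error builds up across steps
rates_int = [i / 10 for i in range(11)]

print(rates_linspace)
print(rates_int)
```

Both options give 11 values from 0 to 1, and the endpoint does not depend on floating-point rounding of the step.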

As we can see from these results, the probability that a positive test is a true positive varies widely with the actual infection rate of the population. Even with ZERO infections, positive tests will still happen 5% of the time. For a 10% infection rate, about 31% of positive results (45 of 145, or nearly *1 in 3*) will be false positives.
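The same numbers follow from Bayes' theorem. The notation is my own: $+$ stands for a positive test, and the worked line uses the 10% infection rate with the exercise's false positive rate of 0.05 and false negative rate of 0.

$$
P(\text{infected} \mid +) = \frac{P(+ \mid \text{infected})\,P(\text{infected})}{P(+ \mid \text{infected})\,P(\text{infected}) + P(+ \mid \text{not infected})\,P(\text{not infected})}
$$

$$
P(\text{infected} \mid +) = \frac{1 \times 0.1}{1 \times 0.1 + 0.05 \times 0.9} = \frac{0.1}{0.145} \approx 0.69
$$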

The main takeaway for science in general:

If you are testing a hypothesis that you think is very likely to be true, a cutoff of p=.05 may be sufficient to suggest a real result.
But if the hypothesis is, based on background research, quite unlikely to be true, or the phenomenon occurs very rarely, a much more stringent cutoff should be used.
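To illustrate the point about cutoffs, here is a rough sketch that treats the cutoff as a false positive rate and applies Bayes' rule. The function name `prob_true_given_positive`, the 1% prior, and the assumption that a true effect is always detected (false negative rate of 0, as in the exercise) are mine, chosen only for illustration.

```python
def prob_true_given_positive(prior, f_pos_rate, f_neg_rate=0.0):
    """Bayes' rule: P(hypothesis true | positive result)."""
    true_pos = prior * (1 - f_neg_rate)
    false_pos = (1 - prior) * f_pos_rate
    return true_pos / (true_pos + false_pos)

prior = 0.01  # hypothetical: the hypothesis is thought to be true 1% of the time
for cutoff in (0.05, 0.01, 0.001):
    posterior = prob_true_given_positive(prior, cutoff)
    print(f"cutoff {cutoff}: P(true | positive) = {posterior:.2f}")
```

Under these assumptions, a positive result at the .05 cutoff leaves the hypothesis more likely false than true, while the .001 cutoff makes a positive result much more trustworthy.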

In [1]:
import numpy as np

pop = 1000          # population size
f_pos_rate = 0.05   # probability of a positive test for an uninfected person
f_neg_rate = 0      # probability of a negative test for an infected person

for x in np.arange(0, 1, 0.1):   # infection rate: 0.0, 0.1, ..., 0.9
  # Expected number of people with each test outcome
  p_pos_t = pop * x * (1 - f_neg_rate)         # true positives
  p_pos_f = pop * (1 - x) * f_pos_rate         # false positives

  p_neg_t = pop * (1 - x) * (1 - f_pos_rate)   # true negatives
  p_neg_f = pop * x * f_neg_rate               # false negatives

  percent_pos = ((p_pos_t + p_pos_f)/pop)*100
  percent_pos_t = (p_pos_t / (p_pos_t + p_pos_f))*100

  #You may be wondering, why did she use the round function so much?
  #It's because of floating-point error: the increments of 0.1 were not always *exactly* 0.1 and had extraneous digits at the end
  print("For a population with an infection rate of:", round(x,1), "\n")
  print("Number of true positives equals", round(p_pos_t,0))
  print("Number of false positives equals", round(p_pos_f,0))
  print("Number of true negatives equals", round(p_neg_t,0))
  print("Number of false negatives equals", round(p_neg_f,0), "\n")
  print(round(percent_pos,0),"% of tests are positive")
  print(round(percent_pos_t,0),"% of positive tests are true positives \n")

For a population with an infection rate of: 0.0 

Number of true positives equals 0.0
Number of false positives equals 50.0
Number of true negatives equals 950.0
Number of false negatives equals 0.0 

5.0 % of tests are positive
0.0 % of positive tests are true positives 

For a population with an infection rate of: 0.1 

Number of true positives equals 100.0
Number of false positives equals 45.0
Number of true negatives equals 855.0
Number of false negatives equals 0.0 

14.0 % of tests are positive
69.0 % of positive tests are true positives 

For a population with an infection rate of: 0.2 

Number of true positives equals 200.0
Number of false positives equals 40.0
Number of true negatives equals 760.0
Number of false negatives equals 0.0 

24.0 % of tests are positive
83.0 % of positive tests are true positives 

For a population with an infection rate of: 0.3 

Number of true positives equals 300.0
Number of false positives equals 35.0
Number of true negatives equals 665.0
Numbe

After I finished the code above, I started thinking about how to make the output more easily readable and visually appealing by printing it as a table.

As I said above, I don't have access to ChatGPT because of my phone, and this was a little beyond Bing Chat, so I decided to use Google Bard.

My entire exchange with Bard can be viewed at these two links:
  
  Attempt 1: https://g.co/bard/share/03be308e8360
  
  Attempt 2: https://g.co/bard/share/31728664c6fe

When Attempt 1 kept returning errors and delivering really wonky results, I started a new chat (Attempt 2), simplified my vision for the table, and made my request much more specific.

It took probably more than an hour, and a lot of random error troubleshooting, before I finally ended up with a result that was basically what I had been aiming for.

This was very similar to the process of putting random error codes into Stack Overflow, trying someone's suggestion, and then troubleshooting again. It was much easier overall, though, because I didn't have to parse unfamiliar code and adapt it to my own purpose: it was ready for me to plug in and test. It was also very nice to have one point of contact instead of having to jump between different forum entries.

On the other hand, Bard sometimes made simple errors or returned the same non-working code it had already given me. It seemed to be making errors I might not have made had I been trying things out entirely on my own. I also think I was much lazier about checking the code and learning what its suggestions did until I got to the final version that worked. I found myself just blindly running the code and then sending the error back to Bard without thinking about it.

Overall, I would say it was a net positive experience that helped me write better code, but it is very important to be clear and specific about what you're looking for, and to watch the results carefully so you don't end up down a rabbit hole because the AI made a simple error.

Below is the final version of my code that generates a nice table of the true positive probability:

In [None]:
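import numpy as np
import tabulate as t

pop = 1000          # population size
f_pos_rate = 0.05   # false positive rate (the false negative rate is taken to be 0)

table = []
for x in np.arange(0, 1.1, 0.1):   # infection rate: 0.0, 0.1, ..., 1.0
  p_pos_t = pop * x                      # true positives (every infected person tests positive)
  p_pos_f = pop * (1 - x) * f_pos_rate   # false positives

  percent_pos = ((p_pos_t + p_pos_f)/pop)*100
  percent_pos_t = round((p_pos_t / (p_pos_t + p_pos_f))*100, 1)

  table.append([(x*100), percent_pos, percent_pos_t])

# The last column is the percentage of positive tests that are true positives
print(t.tabulate(table, headers=['HIV Prevalence (%)', 'Positive Tests (%)', 'True Positive Tests (%)'], tablefmt='fancy_grid'))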



╒══════════════════════╤══════════════════════╤═══════════════════════════╕
│   HIV Prevalence (%) │   Positive Tests (%) │   True Positive Tests (%) │
╞══════════════════════╪══════════════════════╪═══════════════════════════╡
│                    0 │                  5   │                       0   │
├──────────────────────┼──────────────────────┼───────────────────────────┤
│                   10 │                 14.5 │                      69   │
├──────────────────────┼──────────────────────┼───────────────────────────┤
│                   20 │                 24   │                      83.3 │
├──────────────────────┼──────────────────────┼───────────────────────────┤
│                   30 │                 33.5 │                      89.6 │
├──────────────────────┼──────────────────────┼───────────────────────────┤
│                   40 │                 43   │                      93   │
├──────────────────────┼──────────────────────┼───────────────────────────┤
│           