# CS50 and CS51 Review

# Import necessary libraries
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt

# --- CS50: Introduction to Computer Science ---

## Session 1 (1.1): Critical Thinking

### Qualitative Notes
* **Critical thinking** involves analyzing information objectively, identifying assumptions, evaluating evidence, and forming judgments. 
* It's about **questioning everything**, not just accepting information at face value.

### Quantitative Notes
* **No direct quantitative applications** in this introductory session. 
* However, critical thinking forms the foundation for analyzing data and drawing accurate conclusions in later sessions.

## Session 2 (1.2): Logical Sentences

### Qualitative Notes
* **Logical sentences** are statements that can be either true or false. 
* We use **propositions** (statements represented by letters) to build logical sentences.

### Quantitative Notes
* **Truth values:**
    *  True (T)
    *  False (F)
* **Example:**
    * Proposition: "It is raining." (Let's represent this with the letter 'R')
    * If it's raining, R is True (T). If it's not raining, R is False (F).

## Session 3 (2.1): Logical Connectives and Truth Tables

### Qualitative Notes
* **Logical connectives** combine logical sentences:
    * **AND (∧):** True only if both statements are true.
    * **OR (∨):** True if at least one statement is true.
    * **NOT (¬):** Reverses the truth value of a statement.

### Quantitative Notes
* **Truth tables** systematically list all possible truth values for logical sentences.
* **Example:**
    * P: "It is sunny."
    * Q: "It is warm."
    * P ∧ Q (Sunny AND Warm) is true only if both P and Q are true.

| P | Q | P ∧ Q |
|---|---|-------|
| T | T | T     |
| T | F | F     |
| F | T | F     |
| F | F | F     |
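
The table above can also be generated programmatically. The following is a minimal sketch; the helper name `truth_table` is an assumption made for illustration and does not come from the course materials.

```python
from itertools import product

def truth_table(connective):
    """Return one (P, Q, result) row per combination of truth values."""
    return [(p, q, connective(p, q)) for p, q in product([True, False], repeat=2)]

# Rows come out in the same order as the table: TT, TF, FT, FF.
and_rows = truth_table(lambda p, q: p and q)
or_rows = truth_table(lambda p, q: p or q)

for p, q, result in and_rows:
    print(p, q, result)
```

Passing a different function (for example `lambda p, q: p or q`) produces the table for another connective.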

## Session 4 (2.2): De Morgan's Laws

### Qualitative Notes
* **De Morgan's Laws** provide rules for negating compound logical sentences:
    * ¬(P ∧ Q) is logically equivalent to (¬P) ∨ (¬Q)
    * ¬(P ∨ Q) is logically equivalent to (¬P) ∧ (¬Q)
* They are essential for simplifying complex logical expressions.

### Quantitative Notes
* You can verify De Morgan's Laws using truth tables. 
* **Example:** 
    * ¬("It is sunny" AND "It is warm") is the same as ("It is not sunny" OR "It is not warm").
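
Because two propositions have only four combinations of truth values, both laws can be checked exhaustively. This is a small sketch of that check; the function name `de_morgan_holds` is hypothetical.

```python
from itertools import product

def de_morgan_holds():
    """Check both De Morgan equivalences for every assignment of P and Q."""
    for p, q in product([True, False], repeat=2):
        # First law: not (P and Q) is equivalent to (not P) or (not Q)
        if (not (p and q)) != ((not p) or (not q)):
            return False
        # Second law: not (P or Q) is equivalent to (not P) and (not Q)
        if (not (p or q)) != ((not p) and (not q)):
            return False
    return True

print(de_morgan_holds())
```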

## Session 5 (3.1): Deductive Validity

### Qualitative Notes
* **Deductive validity:** An argument is deductively valid if the conclusion logically follows from the premises (the statements given).
* If the premises are true and the argument is valid, the conclusion *must* be true.

### Quantitative Notes
* No direct numerical calculations are involved.
* **Example:**
    * Premise 1: All dogs are mammals.
    * Premise 2: Fido is a dog.
    * Conclusion: Fido is a mammal. (This argument is deductively valid) 

## Session 6 (3.2): Induction

### Qualitative Notes
* **Induction** involves making generalizations based on observed patterns.
* **Strong vs. Weak Induction:**
    * **Strong:** The premises make the conclusion highly probable, though never certain.
    * **Weak:** The premises give the conclusion only limited support, for example a generalization drawn from a small or unrepresentative set of observations.

### Quantitative Notes
* Inductive reasoning underlies statistical inference, where we generalize from samples to populations. (Mathematical induction, used in proofs and algorithms, is a deductive technique despite its name.)
* **Example (Weak Induction):** You observe 100 swans, and they are all white. You might conclude that all swans are white (though this is not necessarily true).

## Session 7 (4.1): Fallacy Detection

### Qualitative Notes
* **Fallacies** are errors in reasoning that undermine an argument: formal fallacies make it invalid, while informal fallacies leave its conclusion poorly supported.
* Common Fallacies:
    * **Ad hominem:** Attacking the person instead of the argument.
    * **Straw man:** Misrepresenting the opponent's argument.
    * **False dilemma:** Presenting only two options when more exist.

### Quantitative Notes
* No direct quantitative aspects, but recognizing fallacies is crucial for interpreting data and statistical claims. 
* **Example:** "You can't trust their study on climate change; they are funded by an oil company" (ad hominem fallacy: it attacks the source instead of the study's evidence).

## Session 8 (4.2): Logic Synthesis

### Qualitative Notes
* **Logic synthesis** involves combining basic logical elements (gates) to create more complex circuits.
* Fundamental to computer science, as it forms the basis of how computers process information.

### Quantitative Notes
* Boolean algebra is used to represent and manipulate logical expressions.
* **Truth tables** are essential tools in logic synthesis. 
* **Example:** Designing a logic circuit that outputs True only when two input signals are both True (this would be an AND gate).
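
One way to see how simple gates combine into larger circuits is to model each gate as a function. The sketch below builds NOT, AND, and OR from a single NAND gate; the choice of NAND as the building block and all function names are illustrative assumptions, not something specified in these notes.

```python
def nand(a, b):
    """Output is False only when both inputs are True."""
    return not (a and b)

def not_gate(a):
    # Feeding the same signal to both NAND inputs inverts it.
    return nand(a, a)

def and_gate(a, b):
    # AND is NAND followed by NOT.
    return not_gate(nand(a, b))

def or_gate(a, b):
    # By De Morgan's law, A OR B equals NOT(NOT A AND NOT B).
    return nand(not_gate(a), not_gate(b))

for a in (True, False):
    for b in (True, False):
        print(a, b, and_gate(a, b), or_gate(a, b))
```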

## Session 9 (5.1): Estimation: Fermi Problems

### Qualitative Notes
* **Fermi problems** involve making order-of-magnitude estimations for quantities that seem difficult or impossible to calculate directly.
* Focus on breaking down the problem into smaller, more manageable parts.

### Quantitative Notes
* Emphasizes using approximations and back-of-the-envelope calculations.
* **Example:** Estimate the number of piano tuners in Chicago. You might consider the population of Chicago, the percentage of people who own pianos, how often pianos need tuning, etc. 
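
The decomposition above might be sketched as a chain of multiplications. Every default below is a rough guess chosen for illustration (only the population figure is an approximate real value), so the result should be read as an order of magnitude, not a measurement.

```python
def estimate_piano_tuners(
    population=2_700_000,              # rough population of Chicago
    people_per_household=2.5,          # assumed
    piano_share=0.05,                  # assumed fraction of households with a piano
    tunings_per_piano_per_year=1,      # assumed
    tunings_per_tuner_per_year=1_000,  # assumed: about 4 a day for 250 working days
):
    """Back-of-the-envelope estimate of the number of piano tuners."""
    households = population / people_per_household
    pianos = households * piano_share
    tunings_needed = pianos * tunings_per_piano_per_year
    return tunings_needed / tunings_per_tuner_per_year

estimate = estimate_piano_tuners()
print(f"Roughly {estimate:.0f} piano tuners")
```

Changing any single assumption by a factor of two changes the answer by the same factor, which is why Fermi estimates are only trusted to within an order of magnitude.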

## Session 10 (5.2): Variables

### Qualitative Notes
* **Variables** are symbols that represent values that can change within a program or calculation.
* Essential for storing and manipulating data.

### Quantitative Notes
* Variables can store various data types:
    * **Integers (int):** Whole numbers (e.g., 5, -10, 1000)
    * **Floats (float):** Numbers with decimal points (e.g., 3.14, -2.5, 10.0)
    * **Strings (str):** Text (e.g., "hello", "CS50")
* **Example:**
```python
x = 10  # Assign the integer 10 to the variable 'x'
y = 3.14  # Assign the float 3.14 to 'y'
name = "Alice"  # Assign the string "Alice" to 'name'

# Show the data type stored in each variable
print(type(x), type(y), type(name))
```

## Session 11 (6.2): Descriptive Statistics

### Qualitative Notes
* **Descriptive statistics** summarize and visualize data to reveal patterns and insights.
* **Measures of central tendency:**
    * **Mean:** Average value.
    * **Median:** Middle value when data is ordered.
    * **Mode:** Most frequent value.
* **Measures of dispersion:**
    * **Range:** Difference between the highest and lowest values.
    * **Variance:** Average squared deviation from the mean.
    * **Standard deviation:** Square root of the variance (measures spread).

### Quantitative Notes
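* **Example:** Consider the following dataset of exam scores: [75, 80, 85, 90, 95]
    * Mean: 85
    * Median: 85
    * Mode: none (every score appears exactly once)
    * Range: 20
    * Standard deviation: ≈ 7.07 (population formula, dividing by n; the sample formula, dividing by n − 1, gives ≈ 7.91)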
* **Visualizations:** Histograms, box plots, scatter plots.
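
These summary values can be reproduced with Python's standard `statistics` module. The sketch below computes the standard deviation two ways, since the population formula (divide by n) and the sample formula (divide by n − 1) give different answers on the same data.

```python
import statistics

scores = [75, 80, 85, 90, 95]

mean = statistics.mean(scores)
median = statistics.median(scores)
value_range = max(scores) - min(scores)
population_sd = statistics.pstdev(scores)  # divides by n
sample_sd = statistics.stdev(scores)       # divides by n - 1

print(mean, median, value_range)
print(round(population_sd, 2), round(sample_sd, 2))
```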

## Session 12 (7.1): Probability Rules and Interpretations

### Qualitative Notes
* **Probability** quantifies the likelihood of an event occurring.
* **Probability scale:** Ranges from 0 (impossible) to 1 (certain).
* **Basic probability rules:**
    * Sum rule: P(A or B) = P(A) + P(B) - P(A and B)
    * Complement rule: P(not A) = 1 - P(A)

### Quantitative Notes
* **Example:**
    * If you roll a fair six-sided die, the probability of rolling a 4 is 1/6.
    * The probability of rolling an even number is 1/2 (2, 4, or 6).
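
For a small sample space such as one die roll, the rules can be verified by counting outcomes. This is a minimal sketch using exact fractions; the second event, "rolling higher than 3", is a hypothetical addition used only to exercise the sum rule.

```python
from fractions import Fraction

FACES = range(1, 7)  # outcomes of one roll of a fair six-sided die

def prob(event):
    """Probability that a single roll satisfies the predicate `event`."""
    favorable = sum(1 for face in FACES if event(face))
    return Fraction(favorable, len(FACES))

def is_even(face):
    return face % 2 == 0

def is_high(face):
    return face > 3

p_four = prob(lambda face: face == 4)
p_even = prob(is_even)
p_high = prob(is_high)
p_both = prob(lambda face: is_even(face) and is_high(face))
p_either = prob(lambda face: is_even(face) or is_high(face))

print(p_four, p_even)
print(p_either == p_even + p_high - p_both)                # sum rule
print(prob(lambda face: not is_even(face)) == 1 - p_even)  # complement rule
```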

## Session 13 (7.2): Conditional Probability

### Qualitative Notes
* **Conditional probability** is the probability of an event happening given that another event has already occurred.
* Notation: P(A|B) represents the probability of event A given event B.

### Quantitative Notes
* Formula: P(A|B) = P(A and B) / P(B)
* **Example:** 
    * A bag contains 3 red balls and 2 blue balls.
    * Event A: Drawing a red ball on the second draw.
    * Event B: Drawing a blue ball on the first draw (without replacement).
    * P(A and B) = (2/5) * (3/4) = 3/10 and P(B) = 2/5, so P(A|B) = (3/10) / (2/5) = 3/4. (After a blue ball is removed, 3 of the 4 remaining balls are red.)
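
A conditional probability in a small problem like this can be checked by listing every equally likely ordered pair of draws and applying the formula directly. The sketch below assumes A means "red on the second draw" and B means "blue on the first draw"; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R", "R", "R", "B", "B"]

# Each ball is treated as distinct, giving 5 * 4 = 20 equally likely
# ordered pairs (first draw, second draw) without replacement.
draws = list(permutations(balls, 2))

count_b = sum(1 for first, second in draws if first == "B")
count_a_and_b = sum(1 for first, second in draws if first == "B" and second == "R")

p_b = Fraction(count_b, len(draws))
p_a_and_b = Fraction(count_a_and_b, len(draws))
p_a_given_b = p_a_and_b / p_b  # P(A|B) = P(A and B) / P(B)

print(p_a_given_b)
```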

## Session 14 (8.1): Distributions of Discrete Random Variables

### Qualitative Notes
* **Random variable:** A variable whose value is a numerical outcome of a random phenomenon.
* **Discrete random variable:** Takes values from a finite or countably infinite set (e.g., the counts 0, 1, 2, ...).
* **Probability distribution:** Assigns probabilities to each possible value of the random variable.

### Quantitative Notes
* **Common discrete distributions:**
    * Bernoulli: Models the probability of success or failure in a single trial.
    * Binomial: Models the number of successes in a fixed number of independent trials that share the same success probability.
    * Poisson: Models the number of events occurring in a fixed interval of time or space, given a constant average rate.
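
Each of these distributions has a probability mass function (PMF) giving the probability of each possible value. The sketch below writes out the three standard formulas using only the standard library; the function names are chosen for illustration.

```python
from math import comb, exp, factorial

def bernoulli_pmf(k, p):
    """P(X = k) for one trial with success probability p; k is 0 or 1."""
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, rate):
    """P(exactly k events) when the average count per interval is `rate`."""
    return exp(-rate) * rate**k / factorial(k)

# Probability of exactly 5 heads in 10 fair coin flips
print(binomial_pmf(5, 10, 0.5))
# Probabilities over all possible values of a binomial sum to 1
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))
```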

## Session 15 (8.2): Probability, Distributions, and Simulations

### Qualitative Notes
* **Simulations** use random numbers and computational models to mimic real-world processes.
* Useful for estimating probabilities and understanding complex systems when analytical solutions are difficult.

### Quantitative Notes
* **Monte Carlo simulations** are a common type that involve repeated random sampling.
* **Example:** Simulating coin flips to estimate the probability of getting a certain number of heads in a row.
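
The coin-flip example might be sketched as follows. The specific question (a run of at least 3 heads within 10 flips), the trial count, and the seed are all assumptions picked for illustration.

```python
import random

def has_run_of_heads(flips, run_length):
    """Return True if `flips` contains `run_length` consecutive heads."""
    streak = 0
    for is_heads in flips:
        streak = streak + 1 if is_heads else 0
        if streak >= run_length:
            return True
    return False

def estimate_run_probability(num_flips=10, run_length=3, trials=20_000, seed=0):
    """Monte Carlo estimate of P(at least one run of heads of the given length)."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = 0
    for _ in range(trials):
        flips = [rng.random() < 0.5 for _ in range(num_flips)]
        if has_run_of_heads(flips, run_length):
            hits += 1
    return hits / trials

print(estimate_run_probability())
```

Increasing `trials` narrows the gap between the estimate and the true probability, at the cost of more computation.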

## Session 16 (9.1): The Normal Distribution

### Qualitative Notes
* The **normal distribution** (bell curve) is a fundamental probability distribution in statistics.
* Characterized by its mean (μ) and standard deviation (σ).
* **68-95-99.7 rule:** 
    * Approximately 68% of data falls within 1 standard deviation of the mean.
    * 95% within 2 standard deviations.
    * 99.7% within 3 standard deviations.

### Quantitative Notes
* **Probability density function (PDF):** Describes the relative likelihood of each value of a continuous random variable; the probability of falling within a given range is the area under the PDF over that range.
* **Example:** Heights, IQ scores, and many natural phenomena often follow a normal distribution.
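
The 68-95-99.7 rule can be checked numerically as the area under the standard normal curve between −k and +k standard deviations. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later); the helper name `prob_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```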

## Session 17 (9.2): Sampling Distributions and the Central Limit Theorem

### Qualitative Notes
* **Sampling distribution:** The distribution of a statistic (e.g., the mean) calculated from multiple random samples.
* **Central Limit Theorem (CLT):** As sample size increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the original population distribution (provided the population has finite variance).

### Quantitative Notes
* CLT is crucial for making inferences about populations based on samples.
* **Example:** If you repeatedly sample from a population (even if it's not normally distributed) and calculate the mean of each sample, the distribution of those sample means will start to look normal as the sample size gets larger.
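
The repeated-sampling experiment can be simulated directly. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a deliberately skewed choice); the sample sizes, number of samples, and seed are arbitrary. In theory the sample means should center on 1 with a spread of about 1/sqrt(n).

```python
import random
import statistics

def sample_means(sample_size, num_samples=2_000, seed=1):
    """Draw many samples from a skewed population and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means

means_small = sample_means(sample_size=2)
means_large = sample_means(sample_size=30)

# The spread of the sample means shrinks as the sample size grows.
print(statistics.mean(means_small), statistics.stdev(means_small))
print(statistics.mean(means_large), statistics.stdev(means_large))
```

Plotting a histogram of `means_small` and `means_large` would show the second looking far more symmetric and bell-shaped than the first.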

## Session 18 (10.1): CLT Continued and Regression to the Mean

### Qualitative Notes
* **Regression to the mean:** Extreme values in a dataset tend to be followed by values closer to the average.
* Explained by random variation and the tendency for extreme events to be less likely.

### Quantitative Notes
* **Example:**  A student who scores very high on one exam is likely to score slightly lower on the next exam, simply due to random variation. 

## Session 19 (10.2): Confidence Intervals for Means with the Normal Distribution

### Qualitative Notes
* **Confidence interval:** A range of values within which we are confident that a population parameter (e.g., the mean) lies.
* Expressed with a confidence level (e.g., 95% confident).
* Wider intervals indicate more uncertainty.

### Quantitative Notes
* Formula (for a 95% confidence interval of the mean, assuming a normal distribution):
    *  `Confidence Interval = sample_mean ± (1.96 * standard_error)` 
    *  `standard_error = sample_standard_deviation / sqrt(sample_size)`
* **Example:** If a 95% confidence interval for the average height of a group is 160 cm to 170 cm, we are 95% confident that the true average height falls within this range, meaning that intervals built this way capture the true mean in about 95% of repeated samples.
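
The formula translates almost line for line into code. In this sketch the height data are simulated (hypothetical, not real measurements), and the function name `normal_ci_95` is an assumption.

```python
import random
from math import sqrt
from statistics import mean, stdev

def normal_ci_95(sample):
    """Return (lower, upper) for a 95% confidence interval of the mean."""
    standard_error = stdev(sample) / sqrt(len(sample))
    margin = 1.96 * standard_error
    center = mean(sample)
    return center - margin, center + margin

# Hypothetical data: 50 simulated heights in cm
rng = random.Random(0)
heights = [rng.gauss(165, 7) for _ in range(50)]

lower, upper = normal_ci_95(heights)
print(f"95% CI: {lower:.1f} cm to {upper:.1f} cm")
```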

## Session 20 (11.1): Confidence Intervals for Means with the T-Distribution and Sampling Independently

### Qualitative Notes
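* **T-distribution:** Similar to the normal distribution, but used when the population standard deviation is unknown and must be estimated from the sample; the difference matters most when the sample size is small.
* **Degrees of freedom:** A parameter of the t-distribution, related to the sample size (n − 1 for a confidence interval of a single mean).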
* **Independent sampling:** Observations in one sample do not influence observations in another.

### Quantitative Notes
* The t-distribution has heavier tails than the normal distribution for smaller sample sizes, reflecting greater uncertainty, and approaches the normal distribution as the degrees of freedom increase.
* **Example:**  Calculating a confidence interval for the average income of a small town when you don't know the population standard deviation.
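
A t-based interval has the same shape as the normal-based one, with the multiplier 1.96 replaced by a critical value that depends on the degrees of freedom. The sketch below hard-codes a few two-sided 95% critical values from a standard t-table to stay within the standard library; the income figures are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values from a standard t-table, keyed by degrees
# of freedom. Only a few entries are included for illustration.
T_CRITICAL_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def t_ci_95(sample):
    """Return (lower, upper) for a 95% t-based confidence interval of the mean."""
    degrees_of_freedom = len(sample) - 1
    t_star = T_CRITICAL_95[degrees_of_freedom]  # KeyError if not in the table
    margin = t_star * stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - margin, center + margin

# Hypothetical annual incomes (in thousands) for 10 residents
incomes = [32, 41, 38, 55, 47, 29, 36, 60, 44, 39]
lower_t, upper_t = t_ci_95(incomes)
print(f"95% CI: {lower_t:.1f} to {upper_t:.1f}")
```

With 9 degrees of freedom the multiplier is 2.262, not 1.96, so the interval is wider than the normal-based one for the same data.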

## Session 21 (11.2): Hypothesis Tests for Single Means, Type I & II Errors

### Qualitative Notes
* **Hypothesis test:** A statistical method for testing a claim about a population parameter.
* **Null hypothesis (H0):** The default assumption we are trying to disprove.
* **Alternative hypothesis (H1):**  The claim we are trying to find evidence for.
* **Type I error:** Rejecting the null hypothesis when it's actually true (false positive).
* **Type II error:** Failing to reject the null hypothesis when it's actually false (false negative).

### Quantitative Notes
* **P-value:** The probability of observing data at least as extreme as ours if the null hypothesis were true.
    * Low p-value (typically < 0.05) leads to rejecting the null hypothesis.
* **Example:**  Testing whether a new drug lowers blood pressure more effectively than a placebo.
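
A p-value for a single mean can be sketched with a two-sided z-test, which assumes the sample is large enough for the normal approximation to hold. The function name and the numbers in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(sample_mean, null_mean, sample_sd, sample_size):
    """Two-sided p-value for H0: population mean equals null_mean."""
    standard_error = sample_sd / sqrt(sample_size)
    z = (sample_mean - null_mean) / standard_error
    # Probability of a z-score at least this far from 0 in either direction
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical summary statistics from a sample of 36 observations
p_value = z_test_p_value(sample_mean=106, null_mean=100, sample_sd=15, sample_size=36)
print(round(p_value, 4))
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```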

## Session 22 (13.1): Hypothesis Tests, Confidence Intervals, and Multiple Comparisons

### Qualitative Notes
* **Connection between CI and hypothesis tests:** If a 95% CI for the difference between two groups doesn't contain 0, it suggests a statistically significant difference (we would reject the null hypothesis of no difference).
* **Multiple comparisons problem:** When conducting multiple hypothesis tests, the chance of a Type I error (false positive) increases.
* **Corrections (e.g., Bonferroni):** Methods for adjusting p-values to account for multiple comparisons. 

### Quantitative Notes
* It's essential to choose an appropriate hypothesis test based on the data and research question. 
* **Example:** Testing the effectiveness of different marketing campaigns on website traffic requires adjusting for multiple comparisons.
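
The Bonferroni correction compares each p-value to alpha divided by the number of tests. The sketch below also shows how quickly the chance of at least one false positive grows without a correction, under the assumption that the tests are independent; the p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return, for each test, whether H0 is rejected at the corrected threshold."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

def familywise_error_rate(num_tests, alpha=0.05):
    """P(at least one false positive) across independent tests with all nulls true."""
    return 1 - (1 - alpha) ** num_tests

# Hypothetical p-values from four campaign comparisons
p_values = [0.001, 0.02, 0.04, 0.30]
print(bonferroni_reject(p_values))
print(round(familywise_error_rate(10), 3))
```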

## Session 23 (13.2): Difference of Means Tests and Effect Size

### Qualitative Notes
* **Difference of means tests:** Used to determine if there's a significant difference between the means of two groups.
* **Independent samples t-test:** For independent groups.
* **Paired samples t-test:** For related groups (e.g., before-and-after measurements).
* **Effect size:**  Measures the magnitude of the difference between groups (not just whether it's statistically significant).

### Quantitative Notes
* **Cohen's d** is a common effect size measure for the difference of means.
* **Example:** A large effect size indicates a substantial practical difference between groups, whereas a small p-value alone does not: with a large enough sample, even a trivial difference can be statistically significant.
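
Cohen's d divides the difference in group means by a pooled standard deviation. This is a minimal sketch of the pooled-variance version for two independent groups; the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference between two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)

# Hypothetical scores for two small groups
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d)
```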

## Session 24 (14.1): Effect Size and Statistical Power

### Qualitative Notes
* **Statistical power:** The probability of correctly rejecting the null hypothesis when it is false (finding a real effect).
* Power is influenced by:
    * Effect size: Larger effects are easier to detect.
    * Sample size: Larger samples provide more power.
    * Significance level (alpha): Smaller alpha reduces power (more conservative). 

### Quantitative Notes
* Power analysis is used to determine the required sample size to achieve a desired level of power.
* **Example:**  A study with low power might fail to detect a real difference between groups due to insufficient sample size.
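
The three influences on power can be explored with a normal approximation. The sketch below assumes a two-sided one-sample z-test with the effect size expressed in standard-deviation units; it is an approximation for illustration, not a full power analysis.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, sample_size, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the true shift in the mean, in standard-deviation units.
    """
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(sample_size)
    # Probability that the test statistic lands in either rejection region
    return standard_normal.cdf(shift - critical) + standard_normal.cdf(-shift - critical)

for n in (16, 32, 64):
    print(n, round(z_test_power(0.5, n), 3))
```

Running the loop shows power rising with sample size; lowering `alpha` or `effect_size` lowers it.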

## Session 25 (14.2): Review and Synthesis: All HCs

### Qualitative Notes
* Final review of key concepts and connections between them.
* Emphasizes the application of statistical thinking to real-world problems.

### Quantitative Notes
* **Example:**  You would synthesize your understanding of hypothesis testing, confidence intervals, effect size, and power to design and interpret a research study. 